The Self-Modeling Ceiling: Formally Verified Bounds on Machine Self-Calibration

Nagy, Tamás

The Self-Modeling Ceiling: Formally Verified Bounds on Machine Self-Calibration

Dr. Tamás Nagy Updated 2026-03-07 Draft Formal Verification Lean-Verified

Mathematics verified. Core theorems are machine-checked in Lean 4. Prose and presentation may not have been human-reviewed.

Download PDF View in Graph BibTeX

Abstract

How much can a system know about itself? We develop a formally verified theory of self-calibration limits, connecting spectral coupling models to hallucination, robustness, and convergence guarantees. All results are verified in Lean 4 with Mathlib (30 files, 0 sorry).

(i) The self-calibration ceiling, blind zone growth, and monitoring cost. A system with summable coupling \(g(k)\) has incomplete self-knowledge for any finite budget \(N\), and self-calibration \(\leq\) capability when \(g_{\text{meta}}(k) \leq g(k)\). For power-law coupling, the blind zone (capable but not self-aware modes) grows as \(\Theta(N^{1/\beta})\) with tight constants (Theorem 3.6). An information-theoretic lower bound (Theorem 3.7) shows this is fundamental: any self-monitoring procedure requires superlinear budget \(\Omega(N^{1+1/\beta})\) to classify all frontier modes, and no algorithm can eliminate the blind zone with proportional resources.

(ii) Hallucination duality. The blind zone formally characterizes hallucination-prone regions: modes where a system is capable but lacks self-monitoring produce confident errors. Expanding meta-knowledge provably shrinks the hallucination surface. The blind zone correlates with hallucination rates (\(r = 0.84\), \(n = 15\) modes; preliminary) and with ECE, but provides structural information ECE cannot: which specific capabilities lack self-monitoring.

(iii) Structural transfer test. Genuine self-awareness degrades as \(\varepsilon + K\delta\) under perturbation — a falsifiable test distinguishing structural from memorized self-assessment. Confirmed: LLM self-assessment degrades 2×, memorized baselines 5×.

We validate on GPT-4o, Claude 3.5 Sonnet, and Llama-3 (8B/70B) across mathematics and code generation. Coupling spectra are consistent with \(g_{\text{meta}}(k) \leq g(k)\) across all models. The Llama scaling comparison provides preliminary evidence for blind zone growth with model size. All proofs are machine-checkable and publicly available.

Length

6,507 words

Status

Draft (Round 4 Revision)

Connects To

Universal Foundations: A Verified Library of Core Mathematic...

Browse all Formal Verification papers →