
ML Spectral Intelligence

Dr. Tamás Nagy · Short Draft · Machine Learning · Lean-Verified
Mathematics verified. Core theorems are machine-checked in Lean 4. Prose and presentation may not have been human-reviewed.

Abstract

We derive neural scaling laws, transformer convergence rates, and self-improvement limits from a single principle: the eigenvalue decay of the data covariance matrix. For data with spectral exponent \(s\) (eigenvalues \(\lambda_k \sim k^{-s}\)), we prove:

1. Scaling Law: The compute-optimal loss scales as \(L^*(C) \sim C^{-(s-1)/(s+1)}\), with optimal allocation \(N \sim C^{1/(s+1)}\), \(D \sim C^{s/(s+1)}\). For language (\(s \approx 1\)): \(N \approx D\) (Chinchilla).

2. Transformer Convergence: Residual attention with spectral gap \(\lambda_2\) drives tokens to clusters at rate \((1 - \varepsilon \lambda_2)^L\).

3. Self-Improvement Limits: Synthetic data self-improvement converges per fixed compute (bounded monotone sequences) but diverges with growing compute (no fundamental ceiling).
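The compute-optimal allocation in item 1 can be sketched directly from the stated power laws. A minimal illustration (constants normalized to 1; `optimal_allocation` is a hypothetical helper, not from the paper):

```python
def optimal_allocation(C, s):
    """Compute-optimal parameter/data split for spectral exponent s,
    following the abstract's power laws with all constants set to 1."""
    N = C ** (1.0 / (s + 1.0))               # parameters: N ~ C^{1/(s+1)}
    D = C ** (s / (s + 1.0))                 # tokens:     D ~ C^{s/(s+1)}
    loss_exp = -(s - 1.0) / (s + 1.0)        # loss:  L*(C) ~ C^{-(s-1)/(s+1)}
    return N, D, loss_exp

# For language-like data (s ≈ 1) the split is balanced, N ≈ D ≈ sqrt(C),
# matching the Chinchilla allocation mentioned above.
N, D, e = optimal_allocation(1e6, s=1.0)
```

Note that for \(s = 1\) the loss exponent is zero, reflecting the abstract's point that \(s \approx 1\) sits at the boundary where scaling gains per unit compute vanish in this idealized model.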

The mathematical structure is machine-verified in Lean 4 (38 files, ~200 theorems, zero sorry). The spectral exponent \(s\) is a single measurable number that determines scaling speed, optimal resource allocation, convergence dynamics, and self-improvement rates. Qualitative predictions — the ordering of scaling behaviour across data types and the Chinchilla-optimal allocation for language — are validated against synthetic experiments. However, the hard-truncation model overpredicts exact scaling exponents by up to 50\(\times\) for structured data (\(s = 3\)); a soft-truncation correction improves the fit but introduces a new tension with Chinchilla allocation. Mapping the data spectral exponent \(s_{\text{data}}\) to the effective learning exponent \(s_{\text{eff}}\) remains the key open problem.
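Since the abstract presents \(s\) as a single measurable number, it may help to see how one would estimate it in practice. A rough sketch, assuming a plain log-log fit of covariance eigenvalues (a real estimator would handle noise floors and spectrum cutoffs; `spectral_exponent` is an illustrative name, not the paper's code):

```python
import numpy as np

def spectral_exponent(X):
    """Estimate s from a data matrix X of shape (n_samples, n_features)
    by fitting log(lambda_k) ~ -s * log(k) on covariance eigenvalues."""
    cov = np.cov(X, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending eigenvalues
    lam = lam[lam > 1e-12]                        # drop numerical zeros
    k = np.arange(1, len(lam) + 1)
    slope, _ = np.polyfit(np.log(k), np.log(lam), 1)
    return -slope                                  # lambda_k ~ k^{-s}

# Synthetic sanity check: independent Gaussian features with std k^{-1},
# so the covariance eigenvalues decay as lambda_k = k^{-2} (s = 2).
rng = np.random.default_rng(0)
d = 50
scales = np.arange(1, d + 1) ** -1.0
X = rng.standard_normal((20000, d)) * scales
s_hat = spectral_exponent(X)
```

The synthetic check mirrors the abstract's setup: the estimated exponent should recover \(s = 2\) up to sampling noise, and the same number then feeds the allocation and convergence formulas above.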

Length: 4,393 words
Claims: 10 theorems
Status: Unknown

Referenced By

- Creative Flow as a Percolation Phase Transition in Knowledge...
- Spectral of Spectrals: Second-Order Mode Decomposition for C...
- The Spectral Cognitive Resonator: A Dynamic Architecture for...
- Spectral-State Neural Networks: A Mode-Decomposition Archite...
