Why Neural Networks Scale: A Complete Latent-Theoretic Foundation
We present a unified mathematical theory of neural scaling laws derived from the spectral structure of data distributions. The central object is the Latent Number $\rho \in (0, \infty)$, which measures the rate at which a distribution's spectral coefficients decay.
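To make the decay-rate idea concrete, here is a minimal sketch that recovers $\rho$ from a synthetic spectrum. It assumes the spectral coefficients follow an exact power law $\lambda_k \propto k^{-\rho}$, which is an illustrative assumption, not necessarily the paper's precise spectral model:

```python
import numpy as np

# Hypothetical decay exponent (the "Latent Number" in this sketch)
rho_true = 1.5

# Assumed power-law spectrum: lambda_k = k^{-rho}
k = np.arange(1, 1001)
coeffs = k ** (-rho_true)

# Estimate rho as the negative slope of log(lambda_k) versus log(k)
slope, _ = np.polyfit(np.log(k), np.log(coeffs), 1)
rho_est = -slope
print(rho_est)  # prints 1.5 (up to floating-point error)
```

In practice the coefficients would come from the data distribution itself (e.g. eigenvalues of an empirical covariance or kernel operator), and the log-log fit would be restricted to the range of $k$ where the power law holds.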