Why do large language models work?

Neural Scaling Laws

Why does model performance follow power laws in compute, data, and parameter count? And when should we expect those laws to break down?

Progress

85% (machine-verified derivation)
Current approach
Formal derivation of Chinchilla scaling from information-theoretic bounds; Lean-verified power-law emergence.
Status notes
Chinchilla exponents derived from first principles and verified in Lean. Convergence of the Adam optimizer has been formalized separately.
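For context, the derivation targets the standard Chinchilla parameterization of loss (Hoffmann et al., 2022); the sketch below uses that published form and is not this project's Lean formalization. Loss is modeled as a function of parameter count $N$ and training tokens $D$, and minimized under a fixed compute budget $C \approx 6ND$:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Substituting $D = C/(6N)$ and setting $\partial L / \partial N = 0$ yields the compute-optimal allocation $N_{\mathrm{opt}} \propto C^{\beta/(\alpha+\beta)}$ and $D_{\mathrm{opt}} \propto C^{\alpha/(\alpha+\beta)}$, which is the power-law emergence the machine-checked derivation establishes.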

Direct contributions

3 papers