Why Does LoRA Work? The Spectral Theory of Low-Rank Adaptation

Nagy, Tamás

Why Does LoRA Work? The Spectral Theory of Low-Rank Adaptation

Tamás Nagy, Ph.D. Updated 2026-03-08 Short Draft Machine Learning

Unreviewed draft. This paper has not been human-reviewed. Mathematical claims may be unverified. Use with appropriate caution.

Download PDF View in Graph BibTeX

Abstract

Low-Rank Adaptation (LoRA; Hu et al., 2021) fine-tunes large language models by adding rank-\(r\) updates \(\Delta W = AB\) with \(r \ll d\). In practice, \(r = 4\)--\(16\) works remarkably well, but no theory explains why or predicts the optimal \(r\) for a given task. We provide a spectral theory: the fine-tuning data's eigenvalue spectrum decays at rate \(\rho\), and the optimal LoRA rank is \(K^ = \lceil\log(1/\tau_{\text{MP}})/\log\rho\rceil\), where \(\tau_{\text{MP}}\) is the Marchenko--Pastur noise threshold. This counts the number of eigenvalues (signal modes) above the random matrix noise floor. On 10 synthetic fine-tuning tasks with controlled \(\rho\), \(K^\) matches the empirically optimal rank in 9 out of 10 cases (90\% match rate, mean error 0.8 ranks). The spectral decay rate \(\rho\) is estimable from the data alone via SVD of the OLS solution, with 1--12\% accuracy. The practical implication: compute \(\rho\) from your fine-tuning dataset, calculate \(K^\), set LoRA rank = \(K^\). No hyperparameter search needed. The theoretical foundation is the Universal Spectral Representation Theorem (Nagy, 2026b), which guarantees dimension-free convergence.

Length

2,310 words

Claims

3 theorems

Status

Draft

Target

NeurIPS 2026 / ICML 2026

Connects To

Universal Foundations: A Verified Library of Core Mathematic...

Browse all Machine Learning papers →