Spectral Model Compression: Provably Optimal Knowledge Extraction via Eigendecomposition
Abstract
We present spectral model compression: any trained model — neural network, tree ensemble, or kernel machine — can be distilled into \(K^*\) spectral coefficients via eigendecomposition of its learned function on the data manifold. By the Eckart-Young theorem [Lean-verified], this \(K^*\)-mode representation is the provably optimal compression among all representations of equal dimension. The number of required modes is determined by the URRT: \(K^* = \Theta(\log(n/\sigma^2)/\log\rho)\), where \(\rho\) is the spectral decay rate of the data kernel — a quantity computable from the eigenvalue spectrum alone, without access to the true function.
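The quantity \(\rho\) and the resulting mode count \(K^*\) can be estimated directly from an empirical eigenvalue spectrum. The following is a minimal sketch, not the paper's implementation: it assumes geometric decay \(\lambda_k \approx \rho^k\) (so the log-spectrum is fit by a line), interprets the \(\Theta(\cdot)\) bound with unit constant, and reads \(\log\rho\) as \(\log(1/\rho)\) so the count is positive for a decaying spectrum.

```python
import numpy as np

def estimate_spectral_decay(eigvals):
    """Estimate rho from an eigenvalue spectrum, assuming lambda_k ~ rho^k."""
    eigvals = np.asarray(eigvals, dtype=float)
    eigvals = eigvals[eigvals > 1e-12]          # drop numerically-zero modes
    k = np.arange(len(eigvals))
    slope, _ = np.polyfit(k, np.log(eigvals), 1)  # slope of log-spectrum = log(rho)
    return float(np.exp(slope))

def required_modes(n, sigma2, rho):
    """K* = Theta(log(n / sigma^2) / log(1/rho)); constant factor assumed 1."""
    return int(np.ceil(np.log(n / sigma2) / np.log(1.0 / rho)))

rho = estimate_spectral_decay(0.5 ** np.arange(20))  # synthetic rho = 0.5 spectrum
K_star = required_modes(n=1000, sigma2=1.0, rho=rho)
```

On the synthetic spectrum above the fit recovers \(\rho = 0.5\) exactly, and the mode count grows only logarithmically in the signal-to-noise ratio \(n/\sigma^2\), as the bound states.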
On synthetic benchmarks, spectral compression of a 200-tree Random Forest (126,074 tree nodes) into 346 effective spectral parameters achieves 364x compression while retaining 78% of the learned function (\(R^2 = 0.78\)) and improving test RMSE (1.61 vs 1.72) by removing noise modes. With RBF kernel spectral regression, the method outperforms gradient boosting by 27% on smooth nonlinear functions — a direct consequence of exponential vs polynomial convergence rates.
The compressed model retains full diagnostic power: each mode has a statistical significance level (t-statistic), an eigenvalue (variance explained), and a shrinkage weight (GCV-optimal). The spectral representation enables model comparison (\(\|A_1 - A_2\|\) in eigenspace), ensembling via coefficient averaging, and anomaly detection via noise-mode projections — capabilities absent from the original black-box model.
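The compression pipeline itself can be sketched in a few lines. This is an illustrative toy, not the benchmarked implementation: the black-box model is stood in for by its predictions \(\hat f\) on the training sample, the RBF bandwidth and the cutoff `K_star` are arbitrary assumptions (the paper selects \(K^*\) via the URRT and shrinks via GCV, neither of which is reproduced here), and only the projection/truncation step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
f_hat = np.sin(3 * X[:, 0]) + X[:, 1] ** 2   # stand-in for black-box predictions

# RBF kernel matrix on the training sample (bandwidth 0.5 is an assumption)
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / 0.5)

# Eigendecomposition of the kernel: modes ordered by variance explained
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Spectral coefficients: projection of the learned function onto each mode
coeffs = eigvecs.T @ f_hat

# Truncate to K_star modes; the discarded tail plays the role of noise modes
K_star = 40
compressed = coeffs[:K_star]
recon = eigvecs[:, :K_star] @ compressed
r2 = 1 - np.sum((f_hat - recon) ** 2) / np.sum((f_hat - f_hat.mean()) ** 2)
noise_energy = np.sum(coeffs[K_star:] ** 2)   # basis for anomaly detection
```

In this representation the diagnostic operations reduce to coefficient arithmetic: comparing two models is a norm of the difference of their `compressed` vectors, ensembling is averaging them, and a test point whose projection concentrates in the discarded tail is flagged as anomalous.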