Spectral Knowledge Distillation: From Black Box to Certified White Box
Abstract
Knowledge distillation (Hinton et al., 2015) compresses a large teacher model into a smaller student model by training the student on the teacher's soft outputs. The student, however, is still a black box: a smaller neural network with no guarantee on how much of the teacher's knowledge was preserved. We introduce spectral knowledge distillation: the teacher's learned function is eigendecomposed on the data manifold, producing \(K^*\) spectral coefficients that provably capture the maximum possible information per parameter (Eckart-Young theorem, verified in Lean 4). The student is not a neural network but an explicit formula with certified error bounds.
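The distillation step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes an RBF Gram matrix as a stand-in for the data-manifold operator (the paper's exact operator is not specified here), and the function names `spectral_distill` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def spectral_distill(X, teacher_outputs, K, gamma=1.0):
    """Distill a teacher's outputs into K spectral coefficients.

    Sketch: eigendecompose a Gram matrix over the data points (an RBF
    kernel here, as a stand-in for the data-manifold operator), project
    the teacher's outputs onto the top-K eigenvectors, and reconstruct
    the student's prediction on those points from the coefficients.
    """
    sq = np.sum(X**2, axis=1)
    G = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    lam, V = np.linalg.eigh(G)        # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]    # sort descending: dominant modes first
    coeffs = V[:, :K].T @ teacher_outputs   # K spectral coefficients
    recon = V[:, :K] @ coeffs               # student prediction on the data
    return lam[:K], coeffs, recon
```

With `K` equal to the number of data points the reconstruction is exact (the eigenvectors form an orthonormal basis); truncating to \(K^*\) keeps only the certified signal modes.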
We show that the eigenvalue spectrum plays the role of Hinton's temperature parameter: large eigenvalues correspond to hard targets (dominant patterns), small eigenvalues to soft targets (subtle "dark knowledge"). The GCV-optimal shrinkage filter \(h_k = \lambda_k / (\lambda_k + \alpha)\) replaces manual temperature tuning with an analytic optimum.
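The filter \(h_k = \lambda_k / (\lambda_k + \alpha)\) and its GCV-based selection of \(\alpha\) can be sketched directly. The grid search below is a generic GCV recipe, not necessarily the paper's procedure; the names `gcv_alpha` and `shrink` are illustrative.

```python
import numpy as np

def gcv_alpha(eigvals, coeffs, alphas):
    """Pick the shrinkage level alpha by generalized cross-validation.

    In the eigenbasis the filter factors are h_k = lam_k / (lam_k + alpha),
    the residual on mode k is (1 - h_k) * c_k, and the effective degrees of
    freedom is sum(h_k). GCV minimizes n * ||residual||^2 / (n - sum h_k)^2.
    """
    n = len(coeffs)
    best_alpha, best_score = None, np.inf
    for a in alphas:
        h = eigvals / (eigvals + a)
        resid = (1.0 - h) * coeffs
        score = n * np.sum(resid**2) / (n - h.sum())**2
        if score < best_score:
            best_alpha, best_score = a, score
    return best_alpha

def shrink(eigvals, coeffs, alpha):
    """Apply the filter h_k = lam_k / (lam_k + alpha) to the coefficients."""
    return eigvals / (eigvals + alpha) * coeffs
```

Note how the filter reproduces the hard/soft-target split analytically: a mode with \(\lambda_k \gg \alpha\) passes through almost untouched, while a mode with \(\lambda_k \ll \alpha\) is suppressed toward zero.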
Experiments on six neural network architectures (513 to 139,009 parameters) demonstrate that: (1) spectral distillation improves test accuracy over the original neural network in every case by removing noise modes; (2) the distilled form captures 77–95% of the teacher's learned function (\(R^2\) against the teacher) in 250–390 effective parameters; (3) the mode-by-mode decomposition reveals exactly which patterns the neural network learned, which it underlearned, and which are spurious noise; and (4) for the largest network (512-256, 139,009 parameters), spectral distillation achieves 359x compression while improving prediction.
The spectral diagnostic additionally detects overfitting without holdout data: a neural network that uses 249 effective modes when only \(K^* = 76\) are signal is provably overfitting 173 noise modes, which explains the observed train-test gap of 2.32 RMSE.
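The mode-counting behind this diagnostic is the standard effective-degrees-of-freedom computation: sum the filter factors and compare against \(K^*\). A minimal sketch, with the hypothetical names `effective_modes` and `overfit_report`:

```python
import numpy as np

def effective_modes(eigvals, alpha):
    """Effective number of modes in use: the sum of the filter
    factors h_k = lam_k / (lam_k + alpha), i.e. the effective
    degrees of freedom of the shrunk model."""
    return float(np.sum(eigvals / (eigvals + alpha)))

def overfit_report(eigvals, alpha, k_star):
    """Compare the modes a model actually uses to the certified
    signal count K*; the excess is spent fitting noise modes."""
    used = effective_modes(eigvals, alpha)
    excess = max(0.0, used - k_star)
    return used, excess
```

On the abstract's example, a model whose spectrum yields roughly 249 effective modes against \(K^* = 76\) reports about 173 excess noise modes, with no holdout set required.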