Verified Adversarial Robustness: Lipschitz Certificates for Neural Networks in Lean 4
Abstract
Can we prove a neural network is safe? Adversarial examples — imperceptible input perturbations that cause misclassification — remain the most persistent failure mode of deep learning. Existing defenses rely on empirical testing, which cannot guarantee safety. Certification methods based on Lipschitz constants offer provable guarantees, but the mathematical chain from definition to certificate has never been formally verified.
We present the first Lean 4 verification of end-to-end adversarial robustness certificates for feedforward ReLU networks. The proof chain spans 22 files and 172 machine-checked declarations with no `sorry` placeholders (every assertion is fully proved). The key results:
1. Network Lipschitz bound. For an \(L\)-layer ReLU network with weight matrices \(W_1, \ldots, W_L\), the Lipschitz constant satisfies \(\text{Lip}(f) \leq \prod_{\ell=1}^L \|W_\ell\|_{\text{op}}\).
2. Certified radius. For any input \(x\) with classification margin \(m(x) > 0\), every perturbation \(\delta\) with \(\|\delta\| < m(x) / (2 \cdot \text{Lip}(f))\) preserves the predicted class.
3. Tightness. The certified radius is exact for linear networks: the bound is attained by perturbations along the top right singular vector of the weight matrix.
4. Spectral improvement. Replacing the spectral norm \(\|J\|_{\text{op}} = \sigma_{\max}\) with the effective Lipschitz constant \(L_{\text{eff}} = \|J\|_F / \sqrt{n}\) yields tighter certificates, with improvement factor \(\sigma_{\max} \cdot \sqrt{n} / \|J\|_F \geq 1\); the improvement is strict unless the singular value spectrum is flat (all singular values equal).
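The four results above can be checked numerically. The following sketch is illustrative only and is independent of the paper's Lean development: all function and variable names are our own, and the improvement factor is computed for the Jacobian of a small random two-layer ReLU network.

```python
# Numerical sketch of results 1, 2, and 4 on a random ReLU network.
# Illustrative code, not the paper's verified Lean development.
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU network f(x) = W2 @ relu(W1 @ x).
W1 = rng.standard_normal((8, 5))
W2 = rng.standard_normal((3, 8))

def f(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# Result 1: Lip(f) <= product of operator (spectral) norms.
lip = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Empirical check: finite-difference slopes never exceed the bound.
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    slope = np.linalg.norm(f(x) - f(y)) / np.linalg.norm(x - y)
    assert slope <= lip + 1e-9

# Result 2: certified radius m(x) / (2 * Lip(f)), where m(x) is the
# gap between the top logit and the runner-up.
x = rng.standard_normal(5)
logits = f(x)
top2 = np.sort(logits)[-2:]
margin = top2[1] - top2[0]
radius = margin / (2 * lip)

# Perturbations strictly inside the radius preserve the argmax class.
pred = np.argmax(logits)
for _ in range(1000):
    d = rng.standard_normal(5)
    d *= 0.999 * radius / np.linalg.norm(d)
    assert np.argmax(f(x + d)) == pred

# Result 4: improvement factor sigma_max * sqrt(n) / ||J||_F >= 1,
# with equality iff the singular value spectrum is flat. J here is
# the Jacobian of f at x (ReLU activation pattern frozen).
J = W2 @ np.diag((W1 @ x > 0).astype(float)) @ W1
n = min(J.shape)
sigma_max = np.linalg.svd(J, compute_uv=False)[0]
factor = sigma_max * np.sqrt(n) / np.linalg.norm(J, "fro")
assert factor >= 1.0

print(f"Lip bound = {lip:.3f}, certified radius = {radius:.4f}, "
      f"improvement factor = {factor:.3f}")
```

The slope check passes by construction, since ReLU is 1-Lipschitz and composition multiplies Lipschitz constants; the margin check passes because each logit moves by at most \(\text{Lip}(f)\,\|\delta\|\), so the gap of \(m(x)\) cannot close before \(\|\delta\|\) reaches \(m(x)/(2\,\text{Lip}(f))\).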
The spectral connection is the paper's unique contribution: the same eigenvalue conditioning that replaces Monte Carlo simulation for financial risk computation (Spectral Fenton) yields provably tighter adversarial certificates for neural networks. Both applications reduce to a single mathematical operation — projecting high-dimensional uncertainty onto its principal spectral components.
One-sentence summary: We provide the first machine-verified proof that neural networks can be certified adversarially robust, with a spectral bridge showing that eigenvalue conditioning from financial mathematics yields strictly tighter safety certificates.