Contaminated by Construction: Separating Simulation Noise from Model Risk in ES Backtests
Abstract
Expected Shortfall backtesting under Basel III/IV has a structural weakness: when ES is estimated by Monte Carlo, simulation noise enters the Acerbi-Székely (2014) test statistic, yet the size of this contamination has not been quantified.
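For concreteness, one variant of the Acerbi-Székely (2014) statistic, their \(Z_2\), is written out below with P&L \(X_t\) (losses negative), VaR and ES reported as positive numbers, and tail probability \(\alpha\) (2.5% under FRTB). The abstract does not say which Acerbi-Székely statistic the paper uses, so the choice of \(Z_2\) is an assumption, as is the hat notation for the Monte Carlo plug-in version built from \(M\) simulated paths.

```latex
Z_2 = \sum_{t=1}^{T} \frac{X_t\, I_t}{T\,\alpha\,\mathrm{ES}_{\alpha,t}} + 1,
\qquad I_t = \mathbf{1}\{X_t + \mathrm{VaR}_{\alpha,t} < 0\},
\qquad \mathbb{E}_{H_0}[Z_2] = 0,

\hat Z_2 = \sum_{t=1}^{T} \frac{X_t\, \hat I_t}{T\,\alpha\,\widehat{\mathrm{ES}}^{(M)}_{\alpha,t}} + 1,
\qquad \hat I_t = \mathbf{1}\{X_t + \widehat{\mathrm{VaR}}^{(M)}_{\alpha,t} < 0\}.
```

Under a correct model \(Z_2\) has mean zero and negative values signal underestimated tail risk. \(\hat Z_2\) inherits randomness from two sources, the realized returns \(X_t\) and the simulation behind \(\widehat{\mathrm{ES}}^{(M)}\); the second source is the contamination at issue.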
We prove that the test statistic's variance decomposes additively as \(\text{Var}_{\text{returns}} + \text{Var}_{\text{MC}}\), separating irreducible statistical uncertainty from eliminable computational noise. This decomposition is the paper's central result: it tells practitioners how much of a backtest's variability reflects the returns themselves and how much is Monte Carlo artifact. In simulation, MC noise contributes roughly a third of total test variance at \(M = 1{,}000\) paths and 3–5% at \(M = 10{,}000\); the literature suggests contributions of up to 10% for heavier-tailed portfolios.
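A small Monte Carlo experiment can make the decomposition concrete. The sketch below is ours, not the paper's code: it assumes a correctly specified standard normal P&L model, the \(Z_2\) form of the Acerbi-Székely statistic, fresh MC paths for each day's VaR/ES estimate, and reads the two components through the law of total variance, \(\text{Var}(\hat Z) = \text{Var}(\mathbb{E}[\hat Z \mid X]) + \mathbb{E}[\text{Var}(\hat Z \mid X)]\). All function names are hypothetical. The helper `extrapolate_mc_share` assumes \(\text{Var}_{\text{MC}}\) scales as \(1/M\); under that assumption a one-third share at \(M = 1{,}000\) projects to about 4.8% at \(M = 10{,}000\), consistent with the 3–5% range reported above.

```python
import numpy as np

ALPHA = 0.025  # tail probability for the 97.5% ES used under FRTB


def mc_var_es(sim_pnl, alpha=ALPHA):
    """Row-wise Monte Carlo VaR and ES, both reported as positive loss numbers."""
    m = sim_pnl.shape[1]
    k = max(1, int(np.floor(alpha * m)))      # number of tail scenarios per day
    worst = np.sort(sim_pnl, axis=1)[:, :k]   # k worst simulated P&L values per day
    return -worst[:, -1], -worst.mean(axis=1)


def z2(pnl, var, es, alpha=ALPHA):
    """Acerbi-Szekely Z_2 statistic; its mean is 0 when VaR and ES are exact."""
    hits = pnl + var < 0                      # VaR exceedances
    return np.sum(pnl * hits / es) / (len(pnl) * alpha) + 1.0


def decompose(n_days=250, m_paths=1000, n_outer=30, n_inner=8, seed=0):
    """Law-of-total-variance split of Var(Z_2) for a correct N(0,1) P&L model.

    Outer loop: independent realized return histories.
    Inner loop: independent MC re-estimations of VaR/ES for the same history.
    """
    rng = np.random.default_rng(seed)
    z = np.empty((n_outer, n_inner))
    for i in range(n_outer):
        pnl = rng.standard_normal(n_days)                  # realized P&L history
        for j in range(n_inner):
            sims = rng.standard_normal((n_days, m_paths))  # fresh MC paths per day
            var_hat, es_hat = mc_var_es(sims)
            z[i, j] = z2(pnl, var_hat, es_hat)
    total = z.var()                                        # ddof=0 throughout
    var_mc = z.var(axis=1).mean()                          # E[Var(Z | returns)]
    var_returns = z.mean(axis=1).var()                     # Var(E[Z | returns])
    return total, var_returns, var_mc


def extrapolate_mc_share(share_ref, m_ref, m_new):
    """Project the MC share of total variance to m_new paths, assuming Var_MC ~ 1/M."""
    ratio = share_ref / (1.0 - share_ref) * m_ref / m_new  # Var_MC / Var_returns
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    total, v_ret, v_mc = decompose()
    print(f"total={total:.4f} returns={v_ret:.4f} MC={v_mc:.4f} MC share={v_mc / total:.1%}")
    print(f"1/3 share at M=1,000 -> {extrapolate_mc_share(1 / 3, 1_000, 10_000):.1%} at M=10,000")
```

With equal group sizes the split is an exact algebraic identity of the sample variances. With only a few inner replications, however, the between-history term also absorbs roughly \(\text{Var}_{\text{MC}}/n_{\text{inner}}\), so it slightly overstates \(\text{Var}_{\text{returns}}\); raising `n_inner` reduces that bias.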
For portfolios of correlated lognormal assets (the majority of linear equity and FX books), the MC component can be eliminated entirely with the Hermite-COS method, which computes ES, the CDF, and the density in closed form from Fourier coefficients. Setting \(\text{Var}_{\text{MC}} = 0\) yields relative power gains of 11–26% and enables two qualitatively new tests: (i) a Probability Integral Transform (PIT) test that detects distributional misspecifications invisible to any ES-only backtest; and (ii) a tail likelihood ratio test that, by the Neyman-Pearson lemma, is most powerful against simple tail alternatives. The algebraic chain from assumptions to conclusions is formally verified in Lean 4 (10 files, 53 lemmas, no uses of sorry; proof depth is classified in Section 8.2a); probabilistic assumptions enter as explicit hypotheses.
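The Hermite-COS method itself is not reproduced here. As a simpler stand-in, a single lognormal position has closed-form VaR, ES, and CDF, which is enough to illustrate what \(\text{Var}_{\text{MC}} = 0\) means and how a PIT backtest uses an exact CDF. The sketch assumes a P&L of the form \(V(e^{\mu+\sigma Z} - 1)\) with \(Z\) standard normal, and uses a Kolmogorov-Smirnov test of uniformity as one possible PIT test; the function names, parameters, and the KS choice are ours, not the paper's. For correlated multi-asset books the sum of lognormals has no closed-form distribution, which is where a method such as Hermite-COS becomes necessary.

```python
import numpy as np
from scipy.stats import kstest, norm

ALPHA = 0.025  # tail probability for the 97.5% ES


def lognormal_var_es(mu, sigma, value=1.0, alpha=ALPHA):
    """Closed-form VaR and ES (positive losses) of X = value * (exp(mu + sigma*Z) - 1)."""
    z_a = norm.ppf(alpha)
    var = value * (1.0 - np.exp(mu + sigma * z_a))
    # E[exp(mu + sigma*Z) | Z <= z_a] = exp(mu + sigma^2 / 2) * Phi(z_a - sigma) / alpha
    es = value * (1.0 - np.exp(mu + 0.5 * sigma**2) * norm.cdf(z_a - sigma) / alpha)
    return var, es


def pit(pnl, mu, sigma, value=1.0):
    """Probability integral transform u_t = F_X(x_t) under the lognormal model."""
    return norm.cdf((np.log1p(pnl / value) - mu) / sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 0.02
    var, es = lognormal_var_es(mu, sigma)
    # MC estimate of the same ES, for comparison: this noisy step is what closed form removes
    sims = np.expm1(mu + sigma * rng.standard_normal(1_000_000))
    k = int(ALPHA * sims.size)
    es_mc = -np.sort(sims)[:k].mean()
    print(f"closed-form VaR={var:.5f} ES={es:.5f}  MC ES={es_mc:.5f}")
    # PIT test against data whose true volatility is twice the model's
    realized = np.expm1(mu + 2 * sigma * rng.standard_normal(1000))
    print(kstest(pit(realized, mu, sigma), "uniform"))
```

Because the CDF is exact, the PIT values carry no simulation noise, so any departure from uniformity can be attributed to the model rather than to the estimator.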
Keywords: Expected Shortfall, backtesting, regulatory capital, Monte Carlo noise, formal verification, Lean 4, Basel III, FRTB
JEL Classification: G32, C12, C15
Novelty
First explicit variance decomposition of the Acerbi-Székely ES backtest statistic into irreducible (returns) and eliminable (Monte Carlo) components, quantifying a contamination that the literature acknowledged but never measured.