
The Structured Latent Basis: Feature Engineering as Basis Selection

Tamás Nagy, Ph.D. · Updated 2026-04-22 · Working Paper · Machine Learning

Abstract

We introduce the Structured Latent Basis (SLB) framework, a perspective on supervised learning that treats feature engineering and modeling as a single problem: selecting a mathematical basis in which the target function is linear. When the basis is well matched to the function class (smooth modes for smooth targets, threshold functions for step targets, spectral decompositions for sequential data), Ridge regression on the basis features is sufficient, and the model inherits the approximation properties of the basis itself.
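As a minimal sketch of the "basis plus Ridge" idea, the example below fits a smooth one-dimensional toy target with cosine modes and a closed-form Ridge solve. The function names `cosine_basis` and `ridge_fit`, the toy target, the number of modes, and the regularization strength are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cosine_basis(x, n_modes):
    """Cosine modes cos(pi * k * x), k = 0..n_modes-1, for x scaled to [0, 1]."""
    k = np.arange(n_modes)
    return np.cos(np.pi * np.outer(x, k))

def ridge_fit(Phi, y, alpha=1e-3):
    """Closed-form Ridge: solve (Phi^T Phi + alpha * I) w = Phi^T y."""
    gram = Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(gram, Phi.T @ y)

def target(x):
    # Hypothetical smooth toy target, chosen so that cosine modes suit it well.
    return np.exp(np.cos(np.pi * x))

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=200)
Phi_train = cosine_basis(x_train, n_modes=12)
w = ridge_fit(Phi_train, target(x_train))

# The fitted model is linear in the basis features.
x_test = np.linspace(0.0, 1.0, 50)
y_pred = cosine_basis(x_test, n_modes=12) @ w
max_err = np.max(np.abs(y_pred - target(x_test)))
```

With a basis suited to the target, a dozen modes and a plain linear solve are enough here; a poorly matched basis (for example, cosine modes on a step function) would need many more terms for the same accuracy.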

We present two instantiations:

1. Tabular (S³M): Cosine modes plus multi-scale sigmoids with data-learned thresholds. Across 9 benchmark datasets, the S³M basis achieves statistically significant improvements over XGBoost on 4, and the tree-free variant (Ridge only) beats XGBoost by 5.7% on the Kaggle Ames Housing benchmark (Nagy, 2026a).

2. Text (Spectral Text): Per-token embeddings from a pretrained transformer, followed by DCT-II along the position axis, produce frequency-domain features that decompose text into semantic scales. On 4 NLP benchmarks (SST-2, AG News, Rotten Tomatoes, IMDB), DCT features combined with sentence embeddings improve classification accuracy on 3 of 4 datasets, and pure DCT features (256 dimensions, no sentence pooling) match the 384-dimensional sentence embedding baseline.
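The text instantiation can be illustrated with a short sketch: apply an orthonormal DCT-II along the position axis of a per-token embedding matrix and keep the lowest-frequency coefficients. The random matrix stands in for transformer outputs, and the choice of `n_freq`, the flattening step, and the function names are assumptions for illustration; the abstract does not specify how the paper selects or arranges its DCT coefficients.

```python
import numpy as np

def dct2_matrix(n_positions, n_freq):
    """First n_freq rows of the orthonormal DCT-II matrix, shape (n_freq, n_positions)."""
    n = np.arange(n_positions)
    k = np.arange(n_freq)[:, None]
    mat = np.cos(np.pi * (n + 0.5) * k / n_positions)
    mat *= np.sqrt(2.0 / n_positions)
    mat[0] /= np.sqrt(2.0)  # the k = 0 row gets the extra 1/sqrt(2) factor
    return mat

def spectral_text_features(token_embeddings, n_freq=4):
    """DCT-II over positions; keep the n_freq lowest frequencies and flatten.

    token_embeddings: array of shape (sequence_length, embedding_dim),
    with sequence_length >= n_freq (padding is not handled in this sketch).
    """
    n_positions = token_embeddings.shape[0]
    coeffs = dct2_matrix(n_positions, n_freq) @ token_embeddings  # (n_freq, dim)
    return coeffs.reshape(-1)

rng = np.random.default_rng(0)
# Stand-in for per-token embeddings from a pretrained transformer (assumption).
tokens = rng.normal(size=(20, 8))
features = spectral_text_features(tokens, n_freq=4)
```

The zero-frequency block equals the mean-pooled embedding scaled by the square root of the sequence length, so the higher-frequency blocks are what this representation adds beyond ordinary mean pooling.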

Both instantiations reduce the model to Ridge regression, so the complexity lives entirely in the basis. The framework thus gives a constructive answer to the question "what is a good feature?": a good feature is a basis function matched to the target's regularity structure.
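For the tabular case, a basis of cosine modes plus multi-scale sigmoids might be assembled roughly as follows. This is a sketch under stated assumptions: thresholds are placed at per-column quantiles as a simple stand-in for the paper's "data-learned thresholds", and the function name `tabular_basis`, the number of modes, and the scales are hypothetical.

```python
import numpy as np

def tabular_basis(X, n_modes=4, scales=(0.05, 0.2), n_thresholds=3):
    """Per-column cosine modes plus sigmoids at several scales and thresholds."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Scale each column to [0, 1]; in practice lo/hi would be stored from training data.
    U = (X - lo) / np.where(hi > lo, hi - lo, 1.0)

    blocks = [np.cos(np.pi * k * U) for k in range(1, n_modes + 1)]

    # Assumption: quantile thresholds approximate "data-learned" thresholds.
    qs = np.linspace(0.0, 1.0, n_thresholds + 2)[1:-1]
    thresholds = np.quantile(U, qs, axis=0)  # shape (n_thresholds, n_features)
    for s in scales:
        for t in thresholds:
            # Small s gives a near-step function, large s a gentle ramp.
            blocks.append(1.0 / (1.0 + np.exp(-(U - t) / s)))
    return np.hstack(blocks)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
Phi_demo = tabular_basis(X_demo)
```

Each input column contributes `n_modes + len(scales) * n_thresholds` features, and the resulting matrix can be passed directly to any Ridge solver.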

Keywords: feature engineering, basis selection, spectral methods, DCT, Ridge regression, tabular learning, text classification, structured representation

Length: 4,963 words
Claims: 2 theorems
Status: Working Paper
Target: JMLR / ICML
