LUT-KAN: Segment-wise LUT Quantization for Fast KAN Inference

View PDF HTML (experimental)

Abstract:Kolmogorov–Arnold Networks (KAN) replace scalar weights by learnable univariate functions, often implemented with B-splines. This design can be accurate and interpretable, but it makes inference expensive on CPU because each layer requires many spline evaluations. Standard quantization toolchains are also hard to apply because the main computation is not a matrix multiply but repeated spline basis evaluation. This paper introduces LUT-KAN, a segment-wise lookup-table (LUT) compilation and quantization method for PyKAN-style KAN layers. LUT-KAN converts each edge function into a per-segment LUT with affine int8/uint8 quantization and linear interpolation. The metho…

View PDF HTML (experimental)

Abstract:Kolmogorov–Arnold Networks (KAN) replace scalar weights by learnable univariate functions, often implemented with B-splines. This design can be accurate and interpretable, but it makes inference expensive on CPU because each layer requires many spline evaluations. Standard quantization toolchains are also hard to apply because the main computation is not a matrix multiply but repeated spline basis evaluation. This paper introduces LUT-KAN, a segment-wise lookup-table (LUT) compilation and quantization method for PyKAN-style KAN layers. LUT-KAN converts each edge function into a per-segment LUT with affine int8/uint8 quantization and linear interpolation. The method provides an explicit and reproducible inference contract, including boundary conventions and out-of-bounds (OOB) policies. We propose an ``honest baseline’’ methodology for speed evaluation: B-spline evaluation and LUT evaluation are compared under the same backend optimization (NumPy vs NumPy and Numba vs Numba), which separates representation gains from vectorization and JIT effects. Experiments include controlled sweeps over LUT resolution L in 16, 32, 64, 128 and two quantization schemes (symmetric int8 and asymmetric uint8). We report accuracy, speed, and memory metrics with mean and standard deviation across multiple seeds. A two-by-two OOB robustness matrix evaluates behavior under different boundary modes and OOB policies. In a case study, we compile a trained KAN model for DoS attack detection (CICIDS2017 pipeline) into LUT artifacts. The compiled model preserves classification quality (F1 drop below 0.0002) while reducing steady-state CPU inference latency by 12x under NumPy and 10x under Numba backends (honest baseline). The memory overhead is approximately 10x at L=64. All code and artifacts are publicly available with fixed release tags for reproducibility.


Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2601.03332 [cs.LG]
	(or arXiv:2601.03332v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2601.03332 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Oleksandr Kuznetsov [view email] [v1] Tue, 6 Jan 2026 18:00:45 UTC (765 KB)

Submission history

Similar Posts