The Curved Spacetime of Transformer Architectures

Abstract: We present a geometric framework for understanding Transformer-based language models, drawing an explicit analogy to General Relativity. Queries and keys induce an effective metric on representation space, and attention acts as a discrete connection that implements parallel transport of value vectors across tokens. Stacked layers provide discrete time-slices through which token representations evolve on this curved manifold, while backpropagation plays the role of a least-action principle that shapes loss-minimizing trajectories in parameter space. If this analogy is correct, token embeddings should not traverse straight paths in feature space; instead, their layer-wise steps should bend and reorient as interactions mediated by embedding space curvature. To test this prediction, we design experiments that expose both the presence and the consequences of curvature: (i) we visualize a curvature landscape for a full paragraph, revealing how local turning angles vary across tokens and layers; (ii) we show through simulations that excess counts of sharp/flat angles and longer length-to-chord ratios are not explainable by dimensionality or chance; and (iii) inspired by Einstein’s eclipse experiment, we probe deflection under controlled context edits, demonstrating measurable, meaning-consistent bends in embedding trajectories that confirm attention-induced curvature.
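The abstract names two trajectory statistics, local turning angles and length-to-chord ratios, without defining them. The sketch below shows one plausible reading and is not taken from the paper: it assumes a token's trajectory is the list of its hidden-state vectors across layers, that the turning angle is the angle between consecutive layer-to-layer steps, and that the length-to-chord ratio is total path length divided by the straight-line distance between the first and last states. The function names are hypothetical.

```python
import math

def _steps(traj):
    # Layer-to-layer displacement vectors of one token's trajectory.
    return [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def turning_angles(traj):
    """Angles (radians) between consecutive steps; assumes no zero-length step."""
    steps = _steps(traj)
    angles = []
    for u, v in zip(steps, steps[1:]):
        cos = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
        # Clamp to [-1, 1] to guard against floating-point overshoot.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles

def length_to_chord(traj):
    """Path length over end-to-end distance; assumes distinct endpoints."""
    path = sum(_norm(s) for s in _steps(traj))
    chord = _norm([b - a for a, b in zip(traj[0], traj[-1])])
    return path / chord

# Toy 2-D trajectory with a single right-angle bend (illustrative data only).
bent = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
print(turning_angles(bent), length_to_chord(bent))
```

Under these assumed definitions, a perfectly straight trajectory has turning angles of zero and a ratio of exactly 1, so larger angles and ratios above 1 indicate bending.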


Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Differential Geometry (math.DG)
Cite as:	arXiv:2511.03060 [cs.LG]
	(or arXiv:2511.03060v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2511.03060 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jairo Diaz-Rodriguez
[v1] Tue, 4 Nov 2025 22:58:40 UTC (1,644 KB)
