Context Collapse: In-Context Learning and Model Collapse

Computer Science > Artificial Intelligence

arXiv:2601.00923 (cs)

Abstract: This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes. We strengthen existing results by proving almost sure convergence, showing that collapse occurs unless the data grows sufficiently fast or is retained over time. Finally, we introduce the notion of context collapse: a degradation of context during long generations, especially in chain-of-thought reasoning. This concept links the dynamics of ICL with long-term stability challenges in generative models.
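The reduction to preconditioned gradient descent can be illustrated with a minimal sketch. It assumes the standard single-layer form used in the linear-transformer ICL literature, in which the prediction equals one preconditioned gradient step from w = 0 on the in-context least-squares loss, so that y_hat = x_q^T Gamma (1/n) sum_i y_i x_i. The helper names (`pgd_prediction`, `matvec`, `dot`), the 2-D task and the preconditioner Gamma = c*I + s*J are hypothetical choices for illustration, not the thesis's parametrisation under weight tying. The point shown is why a skew-symmetric part S of Gamma rotates the gradient direction: v^T S v = 0 for every v, so S only adds a component orthogonal to the gradient.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pgd_prediction(xs, ys, x_query, gamma):
    """Predict y at x_query after one preconditioned GD step from w = 0.

    Assumed loss: L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2, so the negative
    gradient at w = 0 is g = (1/n) * sum_i y_i x_i and the updated weight is
    w_1 = Gamma g (any step size is absorbed into Gamma).
    """
    n, d = len(xs), len(x_query)
    g = [sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    w1 = matvec(gamma, g)
    return dot(x_query, w1), g


# Hypothetical in-context task: d = 2, noise-free targets y_i = w_true . x_i.
rng = random.Random(0)
w_true = [1.0, -0.5]
xs = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(32)]
ys = [dot(w_true, x) for x in xs]
x_query = [0.3, 0.7]

# Preconditioner = symmetric part c*I plus skew-symmetric part s*J.
c, s = 1.0, 0.5
sym = [[c, 0.0], [0.0, c]]
skew = [[0.0, s], [-s, 0.0]]  # transpose(skew) == -skew
gamma = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(sym, skew)]

y_hat, g = pgd_prediction(xs, ys, x_query, gamma)

# Angle between the gradient direction g and the preconditioned step Gamma g.
gamma_g = matvec(gamma, g)
cos_angle = dot(g, gamma_g) / (math.hypot(*g) * math.hypot(*gamma_g))
print(f"prediction={y_hat:.4f}, rotation={math.degrees(math.acos(cos_angle)):.2f} deg")
```

For this choice of Gamma the angle between g and Gamma g is atan2(s, c), about 26.57 degrees, regardless of the sampled data; setting s = 0 removes the rotation and recovers plain (scaled) gradient descent.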
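The contrast between the replacing and cumulative regimes in the Gaussian-fitting setting can be reproduced with a small simulation. This is a sketch under stated assumptions, not the thesis's own code or proof: each generation fits a Gaussian by maximum likelihood (sample mean and 1/n variance), and the function names (`fit_gaussian`, `replacing_regime`, `cumulative_regime`), the sample size n = 10 and the 1000 generations are hypothetical illustrative choices. In the replacing regime each fit sees only fresh synthetic samples from the previous fit; in the cumulative regime all earlier data, including the original real samples, is retained.

```python
import math
import random


def fit_gaussian(samples):
    """Maximum-likelihood fit: sample mean and (biased, 1/n) sample variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


def replacing_regime(n, generations, rng):
    """Each generation is fitted only to n samples drawn from the previous fit."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generation 0: real data
    mean, var = fit_gaussian(data)
    for _ in range(generations):
        data = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
        mean, var = fit_gaussian(data)
    return var


def cumulative_regime(n, generations, rng):
    """Each generation adds n synthetic samples; the fit uses all data so far."""
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # real data, kept forever
    count, total, total_sq = len(data), sum(data), sum(x * x for x in data)
    mean = total / count
    var = total_sq / count - mean ** 2
    for _ in range(generations):
        new = [rng.gauss(mean, math.sqrt(var)) for _ in range(n)]
        count += n
        total += sum(new)
        total_sq += sum(x * x for x in new)
        mean = total / count
        var = total_sq / count - mean ** 2  # MLE variance from running sums
    return var


rng = random.Random(42)
var_rep = replacing_regime(n=10, generations=1000, rng=rng)
var_cum = cumulative_regime(n=10, generations=1000, rng=rng)
print(f"replacing: {var_rep:.3e}, cumulative: {var_cum:.3f}")
```

In the replacing regime each step multiplies the variance by an independent factor distributed as chi^2_{n-1}/n, so log sigma^2 is a random walk with negative drift (by Jensen, E[log(chi^2_{n-1}/n)] < log((n-1)/n) < 0) and the variance collapses towards zero. When data is retained, each new batch is a shrinking fraction of the pool, so the fitted variance settles near its initial value instead.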


Comments:	Master’s thesis
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2601.00923 [cs.AI]
	(or arXiv:2601.00923v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2601.00923 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Josef Ott
[v1] Thu, 1 Jan 2026 17:33:47 UTC (3,363 KB)
