Computer Science > Artificial Intelligence
arXiv:2601.00923 (cs)
Abstract: This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes. We strengthen existing results by proving almost sure convergence, showing that collapse occurs unless the data grows sufficiently fast or is retained over time. Finally, we introduce the notion of context collapse: a degradation of context during long generations, especially in chain-of-thought reasoning. This concept links the dynamics of ICL with long-term stability challenges in generative models.
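As a rough illustration of the Gaussian-fitting setting mentioned in the abstract, the following sketch simulates repeated refitting of a one-dimensional Gaussian on its own samples. It is not taken from the thesis: the function name `simulate`, the sample size, the number of generations, and the use of the maximum-likelihood variance estimate are all assumptions made for illustration. In the "replace" regime each generation is trained only on the previous model's samples; in the "cumulative" regime all earlier data is retained.

```python
import numpy as np


def simulate(regime, generations=500, n=10, seed=0):
    """Refit a 1-D Gaussian on its own samples and return the final variance.

    Hypothetical toy setup (an assumption, not the thesis's exact model):
    regime="replace"    -> each generation sees only the latest synthetic data
    regime="cumulative" -> each generation sees all data produced so far
    """
    rng = np.random.default_rng(seed)
    pool = rng.normal(0.0, 1.0, size=n)      # generation 0: real data from N(0, 1)
    mu, var = pool.mean(), pool.var()        # maximum-likelihood fit
    for _ in range(generations):
        # Sample synthetic data from the currently fitted model.
        synthetic = rng.normal(mu, np.sqrt(var), size=n)
        if regime == "replace":
            pool = synthetic                 # discard everything older
        elif regime == "cumulative":
            pool = np.concatenate([pool, synthetic])  # retain all data
        else:
            raise ValueError(f"unknown regime: {regime!r}")
        mu, var = pool.mean(), pool.var()    # refit on the current pool
    return float(var)


if __name__ == "__main__":
    print("replace   :", simulate("replace"))
    print("cumulative:", simulate("cumulative"))
```

Under these assumptions the fitted variance in the "replace" regime shrinks towards zero, while retaining the data keeps it bounded away from zero, which matches the qualitative statement in the abstract that collapse is avoided when data is retained over time.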
| Comments: | Master’s thesis |
| Subjects: | Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2601.00923 [cs.AI] (or arXiv:2601.00923v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2601.00923 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Josef Ott
[v1] Thu, 1 Jan 2026 17:33:47 UTC (3,363 KB)