A First-Order Mean Field Control Analysis of Transformer Layers under Cross-Entropy Training (opens in new tab)
We study Transformer-type residual layers under cross-entropy training through a continuous-depth mean field control viewpoint. Depth is treated as time, layer parameters as controls, and the residual Transformer recursion as an explicit Euler scheme for a controlled hidden-state flow. For fixed controls, we prove an $O(\varepsilon)$ pathwise approximation of finite-depth trajectories by the continuous flow and combine this with high-probabili...
Read the original article