Circuit Synchronization Precedes Generalization: A Causal Precursor to Grokking
Grokking is the delayed-generalisation phenomenon in which a transformer trained on modular arithmetic, long after fitting its training set, abruptly jumps from near-chance to near-perfect validation accuracy. It has been attributed to a Fourier-based algorithmic circuit, but its timing, causal structure, and controllability remain poorly understood. We introduce the Frequency Synchronization Degree (FSD), a normalised, permutation-tested metric for Fourier circuit synchronis...
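The teaser does not describe the Fourier-based circuit, and the FSD definition is cut off. As a hedged illustration only, the sketch below assumes the circuit is the idealised "trig identity" algorithm reported in earlier interpretability work on grokked modular-addition transformers: each input is represented by cos/sin at a few frequencies k, angle-addition identities give cos/sin of w(a+b), and each candidate answer c is scored by cos(w(a+b-c)), which peaks only at c = (a+b) mod p when p is prime. The function name `fourier_mod_add_logits` and the chosen frequencies are hypothetical; this is not the FSD metric and not the authors' code.

```python
import numpy as np


def fourier_mod_add_logits(a: int, b: int, p: int, freqs: list[int]) -> np.ndarray:
    """Score every candidate c in 0..p-1 for (a + b) mod p.

    Hypothetical sketch of an idealised Fourier circuit for modular addition,
    assuming p is prime and every k in freqs satisfies 0 < k < p.
    """
    c = np.arange(p)
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Angle-addition identities: build cos/sin of w(a+b) from per-input terms,
        # as the embedding and MLP layers are thought to do.
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc);
        # it equals 1 only when c == (a + b) mod p, and is strictly smaller otherwise.
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return logits


if __name__ == "__main__":
    p = 113
    logits = fourier_mod_add_logits(50, 90, p, freqs=[3, 17, 41])
    print(int(np.argmax(logits)), (50 + 90) % p)  # both are 27
```

Because every frequency's term is maximised at the same c, summing several frequencies sharpens the argmax rather than blurring it. That agreement across frequencies is one intuitive reading of "synchronisation", though how FSD actually quantifies it is not stated in this excerpt.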