Which tokens does a hybrid model predict better? (opens in new tab)
New token-level analyses of Olmo 3 and Olmo Hybrid show that hybrid models predict meaning-bearing, context-dependent tokens better than transformers, while transformers retain an edge on verbatim copying.
Read the original article