AI Safety
Trajectory Geometry of Transformer Representations Across Layers
🔍AI Interpretability Content type: Academic
VFUSE: Virulent Feature Understanding with Sparse autoEncoders
🔍AI Interpretability Content type: Academic
When Attribution Patching Lies: Diagnosis and a Second-Order Correction
🔍AI Interpretability Content type: Academic
Less-relevant results