Performance Archaeology, Access Patterns, Memory Hierarchies, Optimization Forensics
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
arxiv.org·18h