Introducing AMS: Activation-based model scanner for open-weight LLM safety verification
Is your open-weight model safe? AMS is a new open-source scanner that verifies LLM safety by measuring activation geometry in under a minute.