Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks

Covered by 6 sources including Simon Willison's Newsletter, red.anthropic.com

Discussed on Hacker News

Covered in 7 articles

Simon Willison's Newsletter

Datasette Apps: Host custom HTML applications inside Datasette

Discussed on Substack

red.anthropic.com

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

Discussed on Hacker News

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

lesswrong.com

Why I think evals are pretty important and most worth working on (for me)

lesswrong.com

2B scoring model flags out-of-domain misalignment, suggesting specialist judges have potential for audits

Interconnects

Claude Fable 5 and new AI safety fables

Discussed on Hacker News

eido-askayo.blogspot.com

Anthropic’s Series H and Draft S-1 Point to a Bigger Shift in Frontier AI

Discussed on Hacker News