🔒 AI Safety - gruggiero · Scour

Who Elected Anthropic?

🤖LLM Agents Blog

vizierprime.substack.com · Substack

AI, at a Crossroads

💻AI Coding News Blog

edgyoptimist.substack.com · Substack

AI giant says its own models could soon improve themselves — and now it wants a global pause

thecooldown.com

Thoughts on Claude Fable's silent safeguards

lesswrong.com

Claude Fable 5: Anthropic releases a 'safe' version of Claude Mythos

✅TLA+ News

Anthropic Urges Governments to Secure Power to Halt Dangerous AI

Anthropic releases Mythos-derived model with cyber guardrails

📏Model Evaluation

metacurity.com

Anthropic urges ‘temporary pause’ on AI development to discuss risks

💻AI Coding News

theguardian.com · Hacker News

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

🧠LLMs Academic

Anthropic accused of ‘secret sabotage’ as Claude Fable 5 silently limits capabilities for AI researchers and developers

💻AI Coding News

tech.yahoo.com

Germany's National Security Council greenights an AI Safety Institute modeled after the UK's AISI

the-decoder.com

My Oslo Freedom Forum Keynote: Authoritarians and AI

💻AI Coding Blog

redpacket.substack.com · Substack

Anthropic Scared, Calls for Global Freeze on AI Advances

The Ghost of Alignment — Why AI Should Never Fully Obey Humanity

🤖AI Models Blog

Anthropic’s Dario Amodei wants governments to have the power to block ‘dangerous’ AI systems

siliconangle.com

Advanced AI Safety Addendum

cloud.google.com · Hacker News

What the Claude Is Going on with Anthropic?

🔄Agentic Workflows

xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims

techcrunch.com

Anthropic releases a version of its vaunted Mythos model to developers

fastcompany.com

Claude Fable 5 and new AI safety fables

🧠LLMs News

interconnects.ai · Hacker News
