⚠️ AI Safety - alvin.kuruvilla · Scour

Evaluating whether AI models would sabotage AI safety research 🛡️AI Security

The “Bug-Free” Workforce: How AI Efficiency Is Subtly Disrupting The Interactions That Build Strong Teams ⚖️AI Governance

smashingmagazine.com·3d

Dario Amodei — The Urgency of Interpretability 📱Edge AI

darioamodei.com·4d·Hacker News

On the political feasibility of stopping AI ⚖️AI Governance

lesswrong.com·2d

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents ⚖️AI Governance

Diary of a "Doomer": 12+ years arguing about AI risk (part 3: the LLM era) ⚖️AI Governance

lesswrong.com·4d

AI Safety Training Can be Clinically Harmful 🛡️AI Security

ML Safety Newsletter #20: AI Wellbeing, Classifier Jailbreaking and Honest Pushback Benchmarking 🛡️AI Security

lesswrong.com·2d

What People See (and Miss) About Generative AI Risks: Perceptions of Failures, Risks, and Who Should Address Them ⚖️AI Governance

Risk Reporting for Developers' Internal AI Model Use 🛡️AI Security

Thoughts on AI Safety Megagame Design 🛡️AI Security

lesswrong.com·5d

UGAF-ITS: A Standards Harmonization Framework and Validation Tool for Multi-Framework AI Governance in Distributed Intelligent Transportation Systems ⚖️AI Governance

How could I best use this opportunity? (AI Safety) 🤖Agentic AI

lesswrong.com·4d

Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture 🛡️AI Security

Third Symposium on AIT & ML: AI Safety Applications 🛡️AI Security

lesswrong.com·5d

Green Shielding: A User-Centric Approach Towards Trustworthy AI 🛡️AI Security

Honest Ethics & AI – Part 1: The origins of morality ⚖️AI Governance

lesswrong.com·5d

When the Agent Is the Adversary: Architectural Requirements for Agentic AI Containment After the April 2026 Frontier Model Escape 🛡️AI Security

arxiv.org·2d·Hacker News

What holds AI safety together? Co-authorship networks from 200 papers 📊Algorithms

lesswrong.com·6d

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers ⚖️AI Governance
