🛡️ AI Safety - vabsw · Scour

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

🔄Transformers Academic

Less-relevant results

Complete Drosophila Nervous System Mapped

neurosciencenews.com

Who Elected Anthropic?

📜Tech Policy Blog

vizierprime.substack.com · Substack

scMTG reconstructs single-cell temporal dynamics with Markov transition generators

🆓Free Software Academic

Finding Inner Stillness at the Jinmandir

srmdwpsitelive.kinsta.cloud

Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers

🔄Transformers Academic

Iliad is Hiring

lesswrong.com

High Dynamic Range DIY Air Testing

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.

🧠Philosophy Code

github.com · r/SideProject

Interactions Between Crosscoder Features: A Compact Proofs Perspective

🤖LLMs Academic

The Rise of Agentic AI Threats: How Attackers Are Weaponizing AI Agents Against Your Business

🔐Cybersecurity Blog

Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms

🔄Transformers Academic

Sequent: scale and automation for higher confidence in alignment

🎮Reinforcement Learning

lesswrong.com

Beyond Safety Through Filtering: Toward Responsible Training on Human Distress

🔄Transformers Blog

compliancearchitecture.substack.com · r/OpenAI

FoldSAE: Learning to Steer Protein Folding Through Sparse Representations

🤖AI Academic

Claude vs GPT-4: Which AI API Is Better for Developers? (2026)

kalyna.pro · DEV

Coelho Mollo and Millière: The Vector Grounding Problem

philosophyofbrains.com

Guardian Angels: LLM Personalization for Productivity and Security

🔄Transformers

gwern.net · Hacker News

‘We Did Our Best!’ | Meghan O’Gieblyn

🔄Transformers

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

🤖AI Academic
