🛡️ AI Safety - taylor · Scour

Sixteen schemes for AI safety

lesswrong.com

Advanced AI Safety Addendum

cloud.google.com · Hacker News

AI red teaming comes of age

🌐Distributed Systems

csoonline.com

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

🔬Mech Interp Academic

Matador-og/huntbot: AI offensive security harness for bug bounty, pentesting, red teaming.

✅Formal Verification Code

github.com · Hacker News

Autonomous Pentesting vs Autonomous Red Teaming: What's the Difference?

🌐Distributed Systems

Musk's xAI accused of illegally firing engineer who raised safety concerns

🐧Operating Systems News

ca.finance.yahoo.com

My Oslo Freedom Forum Keynote: Authoritarians and AI

📊Observability Blog

redpacket.substack.com · Substack

Claude Fable 5 and new AI safety fables

🌐Distributed Systems News

interconnects.ai · Hacker News

Mechanistic Interpretability: The Key to Trusting Agentic AI

🔍Interpretability Discussion

bradenkelley.com

[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"

✅Formal Verification Blog

meditationsondigitalminds.substack.com · Substack

AI giant says its own models could soon improve themselves — and now it wants a global pause

🔍Interpretability

thecooldown.com

Germany to create AI safety agency

🎲Procedural Generation

techxplore.com

AI policy scholar Dean W. Ball shares a text from his mother recommending he focus on frontier AI policy

🐧Operating Systems

Assessing the Polyglot Chatbot: Multilingual Safety in AI Systems

✅Formal Verification

The Stoic Path to Actual AI Safety: Three Practical Steps for Industry and Individuals

🌐Distributed Systems

From oversight to coercion: How authoritarian governments are twisting AI safety to get tech companies to fall in line

🔍Interpretability

theconversation.com

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

🔍Interpretability Academic

Criti-hyping is the best thing that happened to Big Tech

🌐Distributed Systems

reveriesofahuman.com

OpenAI says it will comply with Trump's order to let the government review AI models before release

✅Formal Verification
