🛡️ AI Safety - kevincrane · Scour

Mechanistic Interpretability: The Key to Trusting Agentic AI

🤝AI Agents Discussion

bradenkelley.com

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

🧠LLMs Academic

The Pope Found the Missing Layer in AI Alignment

🤖AI Engineering Blog

chrisperkins505.medium.com

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

lesswrong.com

[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"

🤖AI Engineering Blog

meditationsondigitalminds.substack.com · Substack

Anthropic’s Bet: Interview with Dario Amodei

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

🧠LLMs Academic

Criti-hyping is the best thing that happened to Big Tech

🕸️Distributed Systems

reveriesofahuman.com

#5: Advertising is broken. It’s time to move your brand inside the model.

Guardian Angels: LLM Personalization for Productivity and Security

gwern.net · Hacker News · Cited by 3 articles

Less-relevant results

Adam Smith's Creation of a "Large Model" - 36 Kr

AI Will Not Start a Nuclear War, but Humans Might: Conclusions and Policy Recommendations

The notion that AI could start a nuclear war may be attention-grabbing...

🤖AI Engineering

ai-frontiers.org


Solsong Chord Updates

Neglected Basics of AI Alignment

lesswrong.com

Cisco AI Defense Policy Studio: Turning Unwritten Policy into Adaptive AI Guardrails

🧠LLMs Blog

blogs.cisco.com

ERTS: Adversarial Robustness Testing of Ethical AI via Semantic Perturbation in a Bounded Consequence Space

🧠LLMs Academic

Designer babies. Self-improving AI. Are we ready for either?

🧠LLMs News


Op Ed: Consultant Tony O’Connor On The Agentic Trojan Horse

thecompanydime.com

Is the Space Pope Reptilian?

🧠LLMs News

tearsinrain.ai · Hacker News

Seven big ideas from 7x7
