🛡️ AI Safety - obaqueiro · Scour

Sixteen schemes for AI safety

🧠AI Research

lesswrong.com

My Oslo Freedom Forum Keynote: Authoritarians and AI

🤖AI Blog

redpacket.substack.com · Substack

Advanced AI Safety Addendum

cloud.google.com · Hacker News

[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"

🤖AI Blog

meditationsondigitalminds.substack.com · Substack

Reward Hacking, The Loophole Lesson: Winning the Signal, Losing the Reason

🧬Biohacking Blog

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

🧬Biohacking Academic

Germany to create AI safety agency

techxplore.com

Less-relevant results

The Best Politician In A Generation

🤖AI News Blog

benthams.substack.com · Substack

Mechanistic Interpretability: The Key to Trusting Agentic AI

🤖AI Discussion

bradenkelley.com

Claude Fable 5 and new AI safety fables

🔐Security News

interconnects.ai · Hacker News

Criti-hyping is the best thing that happened to Big Tech

reveriesofahuman.com

Model Evaluations: Prove Your Routing Policy Actually Works

✍️Prompt Engineering Blog

digitalocean.com

Assessing the Polyglot Chatbot: Multilingual Safety in AI Systems

Leaderboard Integrity Update at terminal-bench

🕵️Vulnerability Research

tbench.ai · Hacker News

The Stoic Path to Actual AI Safety: Three Practical Steps for Industry and Individuals

OpenAI says it will comply with Trump's order to let the government review AI models before release

Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing

🤖AI News Blog

importai.substack.com · Substack

Iliad is Hiring

✍️Prompt Engineering

lesswrong.com

Aaronontheweb/dotnet-slopwatch: Catch naughty LLM reward-hacking and bad behavior for .NET coding

⚙️MLOps Code

github.com · Hacker News

AI policy scholar Dean W. Ball shares a text from his mother recommending he focus on frontier AI policy
