🛡️ AI Safety - obaqueiro · Scour

Assessing the Polyglot Chatbot: Multilingual Safety in AI Systems

Cheap Reward Hacking Detection

🕵️Vulnerability Research Academic

arxiv.org · Hacker News

From oversight to coercion: How authoritarian governments are twisting AI safety to get tech companies to fall in line

theconversation.com

The Stoic Path to Actual AI Safety: Three Practical Steps for Industry and Individuals

The technical community can't be the main character in AI safety anymore

substackcdn.com · Substack

Germany's National Security Council greenights an AI Safety Institute modeled after the UK's AISI

the-decoder.com

Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing

🤖AI News Blog

importai.substack.com · Substack

AI Scientist Bengio on Engineering Safer Agents

🤖AI News

Clearing Up The Confusion About What Anthropic Really Said On Globally Pausing The Unrelenting Race Toward AI That Builds AI

new mantra just dropped

Complex Objects: Why AI Safety Can’t Just Think in Posts

🤖AI Blog

Paving the way for agents in biology

anthropic.com · Hacker News

AI Scientist Bengio: Building Systems We Don't Know How to Control

🤖AI News

Anthropic's Model Naming, Extrapolated

samwilkinson.io · Hacker News

I Started an AI Safety Research Org and Think These 7 Things Matter

🚀Startup Strategy

lesswrong.com

Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization

✍️Prompt Engineering Academic

What Will Canada’s AI Strategy Mean for Jobs and Safety?

🚀Startup Strategy News

KiloBench - Because Your Benchmark Score Doesn't Pay the Bill

✍️Prompt Engineering News Blog

AI Safety — Genuine or Performative?

🤖AI Blog

In policy paper, OpenAI diverges from White House on AI safety

siliconangle.com
