🛡️ AI Safety - Nakeddave02 · Scour

🔬Science lesswrong.com·

Thoughts on Likelihood of Existential Risks by Misaligned AIs

🤖LLMs arxiv.org·

Reward Hacking in Language Model Agents: Revisiting AI Safety Gridworlds

🎮Gamification medium.com·

Reward hacking in Reinforcement learning

⚖️AI Regulation E-International Relations·

Interview – Andrea Miotti

Covers 2 stories including When AI Builds Itself

🔬ROM Hacking medium.com·

What I Learned Studying Whether Fine-Tuning Breaks a Transformer’s “Copy Mechanism”

⚖️AI Governance science.org·

Researchers caught in the crossfire as companies and government grapple over AI safety

🤖AI GitHub·

Open source AI projects from Banco Santander

Discussed on Hacker News

⚖️AI Governance tehnologijaviews.medium.com·

Is the US Government’s Anthropic Ban Actually Helping the Brand? A Surprising Turn in AI Regulation

⚖️AI Ethics stevekinney.com·

Some Thoughts on AI Safety

Covers 10 stories including Goodhart's Law

Discussed on Hacker News

⚖️AI Ethics medium.com·

Ninety Percent of Physicians Trust Their Clinical AI. They Catch a Third of Its Dangerous Errors.

⚖️AI Regulation lesswrong.com·

On revolutionary love in AI safety

🤖AI TechRadar·

'AI will probably most likely lead to the end of the world, but in the meantime, there’ll be great companies' - quote of the day by OpenAI CEO Sam Altman

Covers Sam Altman May Control Our Future—Can He Be Trusted?

⚖️AI Regulation ft.com·

Letter: Argentina’s AI fix widens the gap it is meant to close

⚖️AI Governance CNBC·

Synthesia CEO: Creating a coalition around an AI code of conduct will help build the AI future we all want

⚖️AI Governance highcapacity.org·

Mythos, China, and a New Era of AI Regulation

🔭Futurism lesswrong.com·

The Cookie Monster Explains AI Safety

Covers 10 stories including Anthropic confidentially submits draft S-1 to the SEC

⚖️AI Regulation BetaKit·

AI experts want a middle-power coalition to set safety guardrails. Some think Canada could lead it

⚖️AI Regulation lesswrong.com·

We made a map of the doom debate. Here's how the breakdown works.

⚖️Tech Policy tehnologijaviews.medium.com·

The Trump Administration’s Push to Block AI Jailbreaks: A Safety Measure or Political Theater?

📊Statistics arxiv.org·

Circuit Tracing in Autoregressive Protein Language Models
