🛡️ AI Safety - Nakeddave02 · Scour

Affective AI Safety: The Missing Piece in LLM Safety

⚖️AI Governance Science·

Researchers caught in the crossfire as companies and government grapple over AI safety

🎮Gamification medium.com·

Reward hacking in Reinforcement learning

⚖️AI Ethics stevekinney.com·

Some Thoughts on AI Safety

Covers 11 stories including Goodhart's Law

Discussed on Hacker News

⚖️AI Governance SiliconANGLE·

Nvidia introduces Halos for Robotics to bridge the physical AI safety gap

⚖️AI Ethics medium.com·

Ninety Percent of Physicians Trust Their Clinical AI. They Catch a Third of Its Dangerous Errors.

⚖️AI Regulation E-International Relations·

Interview – Andrea Miotti

Covers 2 stories including When AI Builds Itself

🤖AI GitHub·

Open source AI projects from Banco Santander

Covered by Interesting Engineering++

Discussed on Hacker News

⚖️AI Regulation Financial Times·

Letter: Argentina’s AI fix widens the gap it is meant to close

🤖AI Forbes·

OpenAI Tricks AI Into Revealing Its True Nature Prior To Being Unleashed Into The Real World

⚖️Tech Policy tehnologijaviews.medium.com·

The Trump Administration’s Push to Block AI Jailbreaks: A Safety Measure or Political Theater?

🤖Machine Learning arXiv·

Are Safety Guarantees in Neural Networks Safe? How to Compute Trustworthy Robustness Certifications

🔬ROM Hacking medium.com·

What I Learned Studying Whether Fine-Tuning Breaks a Transformer’s “Copy Mechanism”

⚖️AI Governance CNBC·

Synthesia CEO: Creating a coalition around an AI code of conduct will help build the AI future we all want

⚖️AI Regulation Forward Future·

Anthropic Thinks “FOOM” Is Near

⚖️AI Ethics medium.com·

Beyond AI Safety: Alignment Through Positive Psychology

·

The Invisible Guardrail: How Commercial LLMs Enforce Algorithmic Paternalism

Discussed on DEV

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

♟️Game Theory Bentham's Newsletter·

Effective Altruists Are Underestimating Politics

Discussed on Substack

🤖LLMs arXiv·

Reinforcement Learning Towards Broadly and Persistently Beneficial Models

