🛡️ AI Safety - aibrain0x01 · Scour

We Don't Talk Enough About The Aftermath Of Death Note ✨Generative AI

aftermath.site·1d

Automated Alignment is Harder Than You Think ⚖️AI Ethics

lesswrong.com·6d

From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents 🎮Reinforcement Learning

Representation Engineering: a New Way of Understanding Models ⚙️Transformers

The mistake of conflating intelligence and power ⚖️AI Ethics

dwarkesh.com·4d·Hacker News

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs 📝LLMs

How the Grande Panda is strengthening Serbia’s car industry and where its weaknesses lie 📝LLMs

serbianmonitor.com·5d

China Yuan Hits Three Year High as Markets Watch Trump Xi Summit 🔗Deep Learning

moderndiplomacy.eu·6d

The Case for Evaluating Model Behaviors ⚖️AI Ethics

lesswrong.com·9h

Trump goes to Beijing as Washington faces a changed world 🔐Cybersecurity

mronline.org·5d

How China interprets Trump’s visit: Balance and boundaries 📝LLMs

dailysabah.com·3d

At summit, China’s Xi eased tensions with Trump without giving ground ✨Generative AI

washingtonpost.com·3d

'Thucydides trap': Xi warns Trump against 'mishandling' Taiwan 📝LLMs

XSearch: Explainable Code Search via Concept-to-Code Alignment 🔍RAG

Document-tuning instills durable animal compassion in LLMs (and generalizes to humans) ⚖️AI Ethics

lesswrong.com·1h

President Xi to pay first state visit to US in more than a decade, Trump says 📝LLMs

·5d·r/SCMPauto

In Beijing, Trump faced America’s equal 🔐Cybersecurity

chellaney.net·4d

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness 🎮Reinforcement Learning

lesswrong.com·6d

Risk reports need to address deployment-time spread of misalignment 🔐Cybersecurity

lesswrong.com·5d
