AI Safety

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 Generative AI  Content type: Academic
arxiv.org·

Mechanistic Interpretability: The Key to Trusting Agentic AI

 🤖Agentic AI  Content type: Discussion
bradenkelley.com·

White House restricts public AI testing to prioritize national security

 ⚖️AI Regulation
4sysops.com·

Sequent: scale and automation for higher confidence in alignment

 🧠AGI
lesswrong.com·

[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"

 🏢Enterprise AI  Content type: Blog

Model Evaluations: Prove Your Routing Policy Actually Works

 🔓Open Source AI  Content type: Blog
digitalocean.com·

Anthropic releases Mythos-derived model with cyber guardrails

 🔓Open Source AI
metacurity.com·

Criti-hyping is the best thing that happened to Big Tech

 ✍️Prompt Engineering

How To Keep Giant A.I. Robots From Killing Us All

 🧠AGI
dailywire.com·

KiloBench - Because Your Benchmark Score Doesn't Pay the Bill

 ✍️Prompt Engineering  Content type: News  Content type: Blog
blog.kilo.ai·

SONAR Sitrep: How nuclear verdicts are reshaping carrier economics

 ⚖️AI Regulation
freightwaves.com·

Ask HN: What happens when humans become as dumb as AI?

 🤖Agentic AI  Content type: Discussion

Lawmakers Are Aiming To Regulate AI-Builds-AI Before AI Gets Entirely Beyond Human Control

 ⚖️AI Regulation
forbes.com·

Controversial smut as an AI alignment issue

 🧠AGI  Content type: News  Content type: Blog

My Data Science Internship Journey at Oasis Infobyte: Building Real-World Machine Learning Projects

 👨‍💻Coding Assistants  Content type: Blog
medium.com·

A new chapter of efficient foundation models for medical imaging

 🔓Open Source AI

Quote of the day by Nvidia CEO, Jensen Huang: "I appreciate that many of us grew up and enjoyed science fiction, but it's not helpful" — on quantifying the existential risks posed by AI

 🧠AGI
techradar.com·

Why LLMs (still) lack taste

 Generative AI

Hidden Consensus: Preference-Validity Compression in Human Feedback

 Generative AI  Content type: Academic
arxiv.org·

Anish-185/Production-Line-Performance-Checker

 🏢Enterprise AI  Content type: Code
github.com·r/coding
