Scour
🛡️ AI Safety
alignment, AI risk, RLHF, model safety
Scoured 74 posts in 6.5 ms
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Mechanistic Interpretability: The Key to Trusting Agentic AI
🤖 AI Agents · Content type: Discussion · bradenkelley.com · 4 days ago
Less-relevant results
The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably
🤖 AI Agents · lesswrong.com · 11 hours ago
[Recorded talk] "AI Alignment Versus AI Ethical Treatment: 10 Challenges"
🤖 AI Agents · Content type: Blog · meditationsondigitalminds.substack.com · 1 day ago · Substack
Controversial smut as an AI alignment issue
✍️ Prompt Engineering · Content type: News, Blog · thingofthings.substack.com · 5 days ago · Substack
Criti-hyping is the best thing that happened to Big Tech
🔭 Tech Research · reveriesofahuman.com · 1 day ago
Why LLMs (still) lack taste
🧠 LLMs · beyondtheprior.com · 1 day ago · Hacker News
Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…
🧠 LLMs · Content type: Blog · medium.com · 5 days ago
umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.
🧠 LLMs · Content type: Code · github.com · 5 days ago · r/SideProject
Nvidia Nemotron 3 Ultra
🧠 LLMs · research.nvidia.com · 6 days ago · Hacker News
Sequent: scale and automation for higher confidence in alignment
🧠 LLMs · lesswrong.com · 5 hours ago
Hidden Consensus: Preference-Validity Compression in Human Feedback
🧠 LLMs · Content type: Academic · arxiv.org · 17 hours ago
From oversight to coercion: How authoritarian governments are twisting AI safety to get tech companies to fall in line
✍️ Prompt Engineering · theconversation.com · 6 days ago
Is the Space Pope Reptilian?
✍️ Prompt Engineering · Content type: News · tearsinrain.ai · 7 hours ago · Hacker News
Guardian Angels: LLM Personalization for Productivity and Security
✍️ Prompt Engineering · gwern.net · 3 days ago · Hacker News
A Unifying Lens on Reward Uncertainty in RLHF
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
The crucial human component in computing and AI
🤖 AI Agents · Content type: Academic · news.mit.edu · 5 days ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🧠 LLMs · turingpost.com · 3 days ago
Complete Drosophila Nervous System Mapped
🤖 AI Agents · neurosciencenews.com · 2 days ago
Stack Overflow didn't just help AI learn to code
🧠 LLMs · zozo123.github.io · 3 days ago · Hacker News