Alignment Research
AI alignment, RLHF, value alignment, reward modeling
Scoured 45 posts in 15.8 ms
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
✨ LLMs · Content type: Academic · arxiv.org · 1 day ago
A Deep Dive into Calibration of Language Models: Platt Scaling, Isotonic Regression, Temperature Scaling
✨ LLMs · kdnuggets.com · 4 days ago
Less-relevant results
A free diagnostic for the Claude Certified Architect exam
🔌 Claude Plugins · Content type: Discussion, Tutorial · claudecertifiedarchitects.com · 23 hours ago · Hacker News
Hidden Consensus: Preference-Validity Compression in Human Feedback
✨ LLMs · Content type: Academic · arxiv.org · 8 hours ago
Anthropic Urges Global Pause in AI Development, Flags 'Self-Improvement' Risk
🎭 Claude · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
✨ LLMs · Content type: Academic · arxiv.org · 1 day ago
Stack Overflow didn't just help AI learn to code
🤖 AI · zozo123.github.io · 3 days ago · Hacker News
Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling
🇨🇳 Chinese AI · Content type: Academic · arxiv.org · 2 days ago
🔮 AI’s growth impact; recursive risks; Unitree #577
🎭 Claude · Content type: News · exponentialview.co · 3 days ago
VFUSE: Virulent Feature Understanding with Sparse autoEncoders
🔍 AI Interpretability · Content type: Academic · arxiv.org · 8 hours ago
Emergent alignment and the projectability of ethical personas
🛡️ Anthropic PBC · Content type: Academic · arxiv.org · 1 day ago
Anthropic proposes a global slowdown of AI development
🎭 Claude · Content type: News · engadget.com · 5 days ago · Hacker News
When Attribution Patching Lies: Diagnosis and a Second-Order Correction
🔍 AI Interpretability · Content type: Academic · arxiv.org · 8 hours ago
Epiplexity
⚠️ Existential Risk · Content type: Blog · andys.blog · 6 days ago · Hacker News
PAFO: Pareto Fairness Optimization for Personalized Reward Modeling
👤 Search Personalization · Content type: Academic · arxiv.org · 1 day ago
Anthropic calls for global pause in AI development before humans lose control
🛡️ Anthropic PBC · siliconangle.com · 5 days ago · Hacker News
A Unifying Lens on Reward Uncertainty in RLHF
✨ LLMs · Content type: Academic · arxiv.org · 1 day ago
Nvidia Nemotron 3 Ultra
✨ LLMs · research.nvidia.com · 6 days ago · Hacker News
Trajectory Geometry of Transformer Representations Across Layers
🔍 AI Interpretability · Content type: Academic · arxiv.org · 1 day ago
Show HN: Hive Trust – Ed25519-signed benchmarks for every AI inference primitive
🔐 Cryptography · thehiveryiq.com · 6 days ago · Hacker News