🎯 Alignment Research
AI alignment, RLHF, value alignment, reward modeling
Scoured 4880 posts in 11.1 ms
Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook · ✨ LLMs · arxiv.org · 1d
Show HN: ECX, a 'Jail-Fix' for RLHF Neutrality Loops in LLMs · 🕳 LLM Vulnerabilities · zenodo.org · 5d · Hacker News
Low-Rank Key Value Attention: Reducing KV Cache Memory and Maintaining Head Diversity · 🤖 LLM · fin.ai · 17h · Hacker News
🥇Top AI Papers of the Week · 🛡️ AI Safety · nlp.elvissaravia.com · 4d
AI Safety at the Frontier: Paper Highlights of February & March 2026 · 🛡️ AI Safety · lesswrong.com · 5d · Hacker News
Learning What Matters: Dynamic Dimension Selection and Aggregation for Interpretable Vision-Language Reward Modeling · ✨ Gemini · arxiv.org · 2d
Show HN: Pre-training, fine-tuning, and evals platform · ⚙️ MLOps · oumi.ai · 6d · Hacker News
Generalization Limits of Reinforcement Learning Alignment · 🛡️ AI Safety · arxiv.org · 4d
Simulating the Evolution of Alignment and Values in Machine Intelligence · 🛡️ AI Safety · arxiv.org · 2d
One Model for All: Multi-Objective Controllable Language Models · ✨ LLMs · arxiv.org · 3d
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness · 🛡️ AI Security · arxiv.org · 4d
TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models · 🤖 LLM · arxiv.org · 2d
ARM: Advantage Reward Modeling for Long-Horizon Manipulation · ♟️ Game Theory · arxiv.org · 4d
TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering · ✨ Gemini · arxiv.org · 3d
Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment · 🤖 LLM · arxiv.org · 2d
Beyond Semantic Manipulation: Token-Space Attacks on Reward Models · 🛡️ AI Security · arxiv.org · 4d
UniCreative: Unifying Long-form Logic and Short-form Sparkle via Reference-Free Reinforcement Learning · 🧠 Agent Memory · arxiv.org · 2d
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents · 🧠 Agent Memory · arxiv.org · 2d
Information as Structural Alignment: A Dynamical Theory of Continual Learning · 🧠 Agent Memory · arxiv.org · 1d
Discrete Flow Matching Policy Optimization · 📱 Edge AI Optimization · arxiv.org · 1d