🎯 Alignment Research
AI alignment, RLHF, value alignment, reward modeling
Scoured 7196 posts in 19.5 ms
The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers
🛡️ AI Safety · arxiv.org · 2d
AI & Alignment
🛡️ AI Safety · chriscoyier.net · 5d · Hacker News
AI Infrastructure Architect · Builder · Author
🇨🇳 Chinese AI · markferraz.com · 4h · Hacker News
reward-lens: A Mechanistic Interpretability Library for Reward Models
🔍 AI Interpretability · arxiv.org · 19h
The AI Flippening Is Here
🔎 AI Auditing · maximepeabody.substack.com · 2d · Substack
The Human Creativity Benchmark – Evaluating Generative AI in Creative Work
🎭 Claude · contralabs.com · 4h · Hacker News
Flow generation through natural language: An agentic modeling approach (11 minute read)
⚙️ MLOps · shopify.engineering · 23h
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
🔍 AI Interpretability · arxiv.org · 2d
A new GitHub repo to detect reward hacking in RL models
🛡️ AI Security · github.com · 4d · Hacker News
AI could help human scientists pick promising research topics
🧭 Content Discovery · physicsworld.com · 14h · Hacker News
Reward Models Are Secretly Value Functions: Temporally Coherent Reward Modeling
🧠 Agent Memory · arxiv.org · 2d
The AI Productivity Scorecard Is Broken
🔎 AI Auditing · engineeredinsight.substack.com · 8h · Substack
Monitoring LLM behavior: Drift, retries, and refusal patterns
🛡️ AI Safety · venturebeat.com · 5d · Hacker News
Three Models of RLHF Annotation: Extension, Evidence, and Authority
⚙️ MLOps · arxiv.org · 1d
Hidden States Know Where Reasoning Diverges: Credit Assignment via Span-Level Wasserstein Distance
🔢 BitNet · arxiv.org · 2d
Building a (mostly) reliable research assistant
🪄 Prompt Engineering · blog.dark.ca · 6d · Hacker News
Reaching SOTA Without Breaking the Bank: Using AI21 Maestro to optimize deep research agents
📱 Edge AI Optimization · ai21.com · 2d · Hacker News
Structural Enforcement of Goal Integrity in AI Agents via Separation-of-Powers Architecture
🛡️ AI Security · arxiv.org · 2d
Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture
🤖 LLM · ynarwal.github.io · 6d · Hacker News
Hodlatoor/SyntheticOutlaw: 🤖 Bug bounty for AI misalignment. Submit real-world instances of AI systems behaving contrary to human intent, values, or safety — win up to $2,500.
🛡️ AI Safety · github.com · 1d · Hacker News