RLHF

Reinforcement Learning from Human Feedback, Alignment, Reward Models

Feeds to Scour
Scoured 183 posts in 6.6 ms

Training Deliberative Monitors for Black-Box Scheming Detection

 🎮Reinforcement Learning
lesswrong.com

Emergence of Context Characteristics Sensitivity in Large Language Models

 🤖LLM  Content type: Academic
arxiv.org

Raize Orion Multi-framework GRC with anchored NIS2 reporting clocks

 🎯Fine-tuning
raizehq.dev · Hacker News

The Art of Interrogation: Consistency Amplifies Factuality in Spatial Reasoning

 🎮Reinforcement Learning  Content type: Academic
arxiv.org

Cohere open-sources a coding agent that runs on a single H100

 🤖Agent
venturebeat.com

Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators

 🧠OpenAI  Content type: News
the-decoder.com

magenta/magenta-realtime: Magenta RealTime 2: An Open-Weights Live Music Model

 🎯Fine-tuning  Content type: Code
github.com

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding

 Gemini  Content type: Academic
arxiv.org

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

 🤖AI  Content type: Academic
arxiv.org

A free diagnostic for the Claude Certified Architect exam

 🎭Anthropic Claude  Content type: Discussion  Content type: Tutorial

Optimisation over non-stationary distributions creates weirder minds

 🎮Reinforcement Learning
lesswrong.com

PriFT: Prior-Support Guided Supervised Fine-Tuning

 🎮Reinforcement Learning  Content type: Academic
arxiv.org

The sample efficiency black hole

 ✍️Prompt Engineering  Content type: News
dwarkesh.com · Hacker News

Job Searcher

 🎯Fine-tuning  Content type: Blog
huggingface.co

A Regret Minimization Framework on Preference Learning in Large Language Models

 🤖AI  Content type: Academic
arxiv.org

happy monday

 🎭Anthropic Claude
world.hey.com

Would a prepaid pass for a coding agent solve a real need or is it just my itch?

 🦙Llama

The EU Cloud Sovereignty Framework Sets a New Benchmark - for Everyone

 🎯Fine-tuning  Content type: Blog
cirran.eu · r/devops

Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance

 🤖AI  Content type: Academic
arxiv.org

Neglected Basics of AI Alignment

 🤖AI
lesswrong.com