RLHF

Reinforcement Learning from Human Feedback, Alignment, Reward Models

Feeds to Scour
Scoured 167 posts in 6.9 ms

Posting for authoring

 🤖Agent
turingpost.com

EDPB meets with EU Commissioner McGrath and adopts common data breach notification template

 🎯Fine-tuning
edpb.europa.eu

Some Interesting Papers on RLVR

 🎮Reinforcement Learning
lesswrong.com

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 🤖AI  Content type: Code
github.com · Hacker News

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🎯Fine-tuning  Content type: Academic
arxiv.org

The Substitution Wave in AI

 🎮Reinforcement Learning
tomtunguz.com

Cisco AI Defense Policy Studio: Turning Unwritten Policy into Adaptive AI Guardrails

 🤖Agent  Content type: Blog
blogs.cisco.com

A Unifying Lens on Reward Uncertainty in RLHF

 🤖AI  Content type: Academic
arxiv.org

PSA: Convoy offers SFT-70 4000K, CRI 90 (pre-production)

 ✍️Prompt Engineering

Can You Hide From a Natural Language Autoencoder?

 🎯Fine-tuning  Content type: Blog
yogesh.bearblog.dev

Clipping Businesses: Pay-Per-View Distribution, Clip Armies, View Verification

 🔓Open Source
trends.vc

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

 🤖LLM  Content type: Academic
arxiv.org

I built a machine that turns AI papers into interactive explainers

 🎮Reinforcement Learning  Content type: Blog
blog.skz.dev

GPT-2: Too Dangerous To Release (2019)

 🤖LLM  Content type: Blog

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding

 Gemini  Content type: Academic
arxiv.org

X-VPN proves its privacy credentials with new independent no-logs audit

 🎯Fine-tuning  Content type: News
techradar.com

AWS Destroyed the Value Proposition for Bedrock

 🎭Anthropic Claude  Content type: Blog
securosis.com

Emergence of Context Characteristics Sensitivity in Large Language Models

 🤖LLM  Content type: Academic
arxiv.org

Training Deliberative Monitors for Black-Box Scheming Detection

 🎮Reinforcement Learning
lesswrong.com