🎯 RLHF
Reinforcement Learning, Human Feedback, LLM Alignment
Reinforcement Learning for LLMs
mesuvash.github.io · 2d · Discuss: Hacker News
🎮 Reinforcement Learning
Generalisation of RLHF under Reward Shift and Clipped KL Regularisation
arxiv.org · 2d
🎮 Reinforcement Learning
New AI Steering Method Exposes Flaws and Potential Improvements
nationaltoday.com · 10h
🎯 AI Agents
The Topology of LLM Behavior
lesswrong.com · 15h
🎛️ Fine-tuning
Reinforcement-aware Knowledge Distillation for LLM Reasoning
arxiv.org · 1d
⚡ Speculative Decoding
AI Learns To Self-Correct And Reduce False Claims Using Internal Knowledge
quantumzeitgeist.com · 1d
🎛️ Fine-tuning
Simulation for Agentic Evaluation
yortuc.com · 13h · Discuss: Hacker News
🎯 AI Agents
Instant LLM Updates with Doc-to-LoRA and Text-to-LoRA
pub.sakana.ai · 20h · Discuss: Lobsters, Hacker News
🎛️ Fine-tuning
I fine-tuned Qwen 14B to beat GPT-4o on NYT Connections (30% vs 22.7%)
john463212.substack.com · 2d · Discuss: r/LocalLLaMA
⚡ Speculative Decoding
Learning about automated prompts
marcabraham.com · 8h
✍️ Prompt Engineering
Driving the Edge: YOLOv11 Autonomous Mastery with MentorPi
hackster.io · 8h
🧠 Deep Learning
A Coding Implementation to Build a Hierarchical Planner AI Agent Using Open-Source LLMs with Tool Execution and Structured Multi-Agent Reasoning
marktechpost.com · 13h
🤖 Agentic AI
A multi-objective graph reinforcement learning framework for urban public facility location problem
sciencedirect.com · 1d
🤖 Game AI
Can LLM Embeddings Improve Time Series Forecasting? A Practical Feature Engineering Approach
machinelearningmastery.com · 1d
⚡ Speculative Decoding
Breaking through safety performance stagnation in autonomous vehicles with dense learning
nature.com · 3d
🎮 Reinforcement Learning
Asura: Looped Language Models done better
neel04.github.io · 2d · Discuss: Hacker News
💬 LLMs
Stop Asking if a Model Is Interpretable
towardsdatascience.com · 1d
🤖 AI
Mercury 2: The AI Model That Feels Instant
analyticsvidhya.com · 23h
🤖 Game AI
Accuracy vs. Speed in Local LLMs: Finding Your Sweet Spot
grigio.org · 6h · Discuss: Hacker News
🎛️ Fine-tuning
Show HN: A framework to observe epistemic drift in AI outputs
app.guardianai.fr · 2d · Discuss: Hacker News
⚡ Speculative Decoding