🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Target Policy Optimization
🧠 LLMs · arxiv.org · 2d
Markov Decision Processes: The Language of Reinforcement Learning
🕵️ AI Agents · medium.com · 4d
Trustworthy agents in practice
🕵️ AI Agents · anthropic.com · 17h
Model organisms researchers should check whether high LRs defeat their model organisms
🧠 LLMs · lesswrong.com · 10h
Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow
🧠 LLMs · semiengineering.com · 1d
Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
🧠 LLMs · pub.towardsai.net · 6d
How HN: We were wrong about AI capability floors (and why smart triggers matter)
🕵️ AI Agents · zenodo.org · 8h · Hacker News
Formalizing the "generative crash" via inverse reinforcement learning
🧠 LLMs · news.ycombinator.com · 2d · Hacker News
Show HN: Agent Tuning, using recursion to achieve predictable agent output
✍️ Prompt Engineering · github.com · 12h · Hacker News
Hyperparameter optimization impact and tuning guidelines for decentralized multi-agent reinforcement learning in multi-energy neighborhoods
🕵️ AI Agents · sciencedirect.com · 2d
Three Ways Machines Learn
🤖 Machine Learning · medium.com · 3d
The Dark Factory Harness: Turning Autonomous Hill-Climbing into Autonomous Research
🕵️ AI Agents · sotaverified.org · 2d · Hacker News
Continual learning for AI agents
🕵️ AI Agents · bestblogs.dev · 4d
Neural circuits encode prior knowledge of temporal statistics
🧠 LLMs · nature.com · 2d
Autonomous Rocket Landing with Reinforcement Learning (YouTube)
🕵️ AI Agents · youtube.com · 1d · Hacker News
Google DeepMind's Research Lets an LLM Rewrite Its Own Game Theory Algorithms — And It Outperformed the Experts
🤖 AI · marktechpost.com · 6d · r/singularity
Better Harness: A Recipe for Harness Hill-Climbing with Evals
🕵️ AI Agents · blog.langchain.com · 1d
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training-Inference Mismatch
🧠 LLMs · arxiv.org · 6h
ALTK‑Evolve: On‑the‑Job Learning for AI Agents
🕵️ AI Agents · huggingface.co · 1d · Hacker News
Tavily
🕵️ AI Agents · tavily.com · 5d