🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 438 posts in 7.0 ms
TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
📈 Optimization · Content type: Academic · arxiv.org · 2 days ago
Less-relevant results
Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing
🚀 Bootstrapping · jack-clark.net · 3 days ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
🤖 AI Agents · thiagolira.blot.im · 4 days ago · Hacker News
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🚀 Bootstrapping · Content type: Academic · arxiv.org · 1 day ago
China women’s volleyball team finish Nations League leg on a high after opening defeat
📈 Economics · Content type: News · scmp.com · 3 days ago · r/SCMPauto
Why LLMs (still) lack taste
🤖 LLM · beyondtheprior.com · 3 days ago · Hacker News
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
🔥 PyTorch · Content type: Academic · arxiv.org · 22 hours ago
Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations
🔢 Embeddings · Content type: Academic · nature.com · 4 days ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
📈 Optimization · Content type: Academic · arxiv.org · 1 day ago
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
🤖 Machine Learning · Content type: Blog · huggingface.co · 4 days ago · Hacker News, r/LocalLLaMA
Comp.compilers: Paper: MileStone: A Multi-Objective Compiler Phase Ordering Framework for Graph-based IR-Level Optimization
🕸️ Knowledge Graphs · compilers.iecc.com · 6 days ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🤖 LLM · Content type: Academic · arxiv.org · 2 days ago
Combermere and Harrison College reach Under-15 basketball final
🔤 Tokenization · cbc.bb · 5 days ago
CCKS: Consensus-based Communication and Knowledge Sharing
🧠 Knowledge Management · Content type: Academic · arxiv.org · 22 hours ago
Central College News
📈 Economics · Content type: Academic · news.central.edu · 4 days ago
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
🔥 PyTorch · Content type: Academic · arxiv.org · 22 hours ago
Risk Has an Owner, and It's Not the AI
🤖 Automation · Content type: Blog · aaddrick.com · 5 days ago · Hacker News
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
📈 Optimization · Content type: Academic · arxiv.org · 1 day ago
What is MBPO? A Beginner’s Guide to Efficient Reinforcement Learning
🧠 LLM Inference · Content type: Blog · ujangriswanto08.medium.com · 6 days ago
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
🔥 PyTorch · Content type: Academic · arxiv.org · 1 day ago