Scour
🎯 Reinforcement learning, Post training
Scoured 103 posts in 29.0 ms
Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators
🏋 Training · Content type: News · the-decoder.com · 3 days ago
Research Is Not Engineering at a Slower Speed
🤖 AI, LLM · voiceinthemachine.com · 1 day ago · Hacker News
A Regret Minimization Framework on Preference Learning in Large Language Models
🤖 AI, LLM · Content type: Academic · arxiv.org · 2 days ago
AWS Destroyed the Value Proposition for Bedrock
🤖 AI, LLM · Content type: Blog · securosis.com · 1 day ago
Show HN: The Deterministic Core Architecture for AI-Augmented Applications
🤖 AI, LLM · brandonbellsystems.com · 6 days ago · Hacker News
As Trump turns 80, who are the oldest – and youngest – current world leaders?
🤖 AI, LLM · pewresearch.org · 3 days ago
Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions
🤖 AI, LLM · Content type: Academic · arxiv.org · 2 days ago
Government proposes new integration allowance for migrants
🏋 Training · helsinkitimes.fi · 6 days ago
Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization
🤖 AI, LLM · Content type: Academic · arxiv.org · 1 day ago
Turkish Navy Confirms 2032 Delivery Date for MUGEM Aircraft Carrier
🏋 Training · navalnews.com · 4 days ago
PAFO: Pareto Fairness Optimization for Personalized Reward Modeling
🏋 Training · Content type: Academic · arxiv.org · 2 days ago
happy monday
🤖 AI, LLM · world.hey.com · 3 days ago
The EU Cloud Sovereignty Framework Sets a New Benchmark - for Everyone
🤖 AI, LLM · Content type: Blog · cirran.eu · 3 days ago · r/devops
SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation
🏋 Training · Content type: Academic · arxiv.org · 1 day ago
Four insights you might have missed from theCUBE’s coverage of IBM Think
🤖 AI, LLM · siliconangle.com · 6 days ago
To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending
🤖 AI, LLM · Content type: Academic · arxiv.org · 22 hours ago
Macron’s nuclear pact expands across Scandinavia as global forces surges
🏋 Training · Content type: News · breakingdefense.com · 3 days ago
Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO
🏋 Training · Content type: Academic · arxiv.org · 2 days ago
(VERY PARTIAL) CROSSPOST: ALEX HEATH: SubStack Is Opening Up to AI: Interviewing CEO Chris Best
🤖 AI, LLM · Content type: News, Blog · braddelong.substack.com · 1 week ago · Substack
Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation
🤖 AI, LLM · Content type: Academic · arxiv.org · 2 days ago