Reinforcement learning, Post training

Feeds to Scour

Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators

 🏋Training  Content type: News
the-decoder.com·

A Regret Minimization Framework on Preference Learning in Large Language Models

 🤖AI, LLM,  Content type: Academic
arxiv.org·

AWS Destroyed the Value Proposition for Bedrock

 🤖AI, LLM,  Content type: Blog
securosis.com·

Beyond AI Firewalls: The Rise of Runtime Governance

 🤖AI, LLM,  Content type: Blog
medium.com·

Show HN: The Deterministic Core Architecture for AI-Augmented Applications

 🤖AI, LLM,

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

 🤖AI, LLM,  Content type: Academic
arxiv.org·

As Trump turns 80, who are the oldest – and youngest – current world leaders?

 🤖AI, LLM,
pewresearch.org·

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

 🤖AI, LLM,  Content type: Academic
arxiv.org·

Government proposes new integration allowance for migrants

 🏋Training
helsinkitimes.fi·

PAFO: Pareto Fairness Optimization for Personalized Reward Modeling

 🤖AI, LLM,  Content type: Academic
arxiv.org·

Turkish Navy Confirms 2032 Delivery Date for MUGEM Aircraft Carrier

 🏋Training
navalnews.com·

happy monday

 🤖AI, LLM,
world.hey.com·

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

 🏋Training  Content type: Academic
arxiv.org·

The EU Cloud Sovereignty Framework Sets a New Benchmark - for Everyone

 🤖AI, LLM,  Content type: Blog
cirran.eu·r/devops

Four insights you might have missed from theCUBE’s coverage of IBM Think

 🤖AI, LLM,
siliconangle.com·

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

 🏋Training  Content type: Academic
arxiv.org·

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending

 🤖AI, LLM,  Content type: Academic
arxiv.org·

Macron’s nuclear pact expands across Scandinavia as global forces surges

 🏋Training  Content type: News
breakingdefense.com·

Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

 🤖AI, LLM,  Content type: Academic
arxiv.org·

(VERY PARTIAL) CROSSPOST: ALEX HEATH: SubStack Is Opening Up to AI: Interviewing CEO Chris Best

 🤖AI, LLM,  Content type: News, Blog