Reinforcement learning, Post training

EDPB meets with EU Commissioner McGrath and adopts common data breach notification template

AI, LLM
edpb.europa.eu

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.

AI, LLM, Content type: Code
github.com, r/SideProject

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

AI, LLM, Content type: Academic
arxiv.org

Ukraine is ready to share drone technology with Nordic and Baltic countries, Zelenskyy says

Training
the-journal.com

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

AI, LLM, Content type: Academic
arxiv.org

How to Train Your Goblin

AI, LLM

Harmfulness Directions in OLMo

AI, LLM
lesswrong.com

Zelenskyy meets with leaders | Arkansas Democrat Gazette

Training, Content type: News
arkansasonline.com

Room360: Video-to-3D Spatial Reconstruction Platform

Training, Content type: Blog
huggingface.co

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

AI, LLM, Content type: Academic
arxiv.org

X-VPN proves its privacy credentials with new independent no-logs audit

Training, Content type: News
techradar.com

Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance

AI, LLM, Content type: Academic
arxiv.org

Clipping Businesses: Pay-Per-View Distribution, Clip Armies, View Verification

Training
trends.vc

Would a prepaid pass for a coding agent solve a real need or is it just my itch?

AI, LLM
codehamr.com, r/SideProject

At Netroots Nation, Progressives Divided on AI

AI, LLM
techpolicy.press

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

Training, Content type: Academic
arxiv.org

From pew to pitch, Cameroonian priest builds bridges in French banlieue

Training, Content type: News
rfi.fr

Mankirat47/Dao-Heart-v3.14: Dao Heart v3.14 : a bounded symbolic AI value governance research scaffold for studying value drift, oversight, warmth preservation, and identity stability under pressure.

AI, LLM, Content type: Code
github.com, Hacker News

Neglected Basics of AI Alignment

AI, LLM
lesswrong.com

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

AI, LLM, Content type: Academic
arxiv.org
