Reinforcement learning, Post training
EDPB meets with EU Commissioner McGrath and adopts common data breach notification template
AI, LLM · edpb.europa.eu · 1 day ago

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.
AI, LLM · Content type: Code · github.com · 6 days ago · r/SideProject

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
AI, LLM · Content type: Academic · arxiv.org · 1 day ago

Ukraine is ready to share drone technology with Nordic and Baltic countries, Zelenskyy says
Training · the-journal.com · 3 days ago

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
AI, LLM · Content type: Academic · arxiv.org · 1 day ago

How to Train Your Goblin
AI, LLM · goblins.mchen.workers.dev · 4 days ago · Hacker News

Harmfulness Directions in OLMo
AI, LLM · lesswrong.com · 2 days ago

Zelenskyy meets with leaders | Arkansas Democrat Gazette
Training · Content type: News · arkansasonline.com · 1 day ago

Room360: Video-to-3D Spatial Reconstruction Platform
Training · Content type: Blog · huggingface.co · 4 days ago

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
AI, LLM · Content type: Academic · arxiv.org · 2 days ago

X-VPN proves its privacy credentials with new independent no-logs audit
Training · Content type: News · techradar.com · 3 days ago

Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
AI, LLM · Content type: Academic · arxiv.org · 2 days ago

Clipping Businesses: Pay-Per-View Distribution, Clip Armies, View Verification
Training · trends.vc · 1 day ago

Would a prepaid pass for a coding agent solve a real need or is it just my itch?
AI, LLM · codehamr.com · 6 days ago · r/SideProject

At Netroots Nation, Progressives Divided on AI
AI, LLM · techpolicy.press · 1 day ago

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity
Training · Content type: Academic · arxiv.org · 2 days ago

From pew to pitch, Cameroonian priest builds bridges in French banlieue
Training · Content type: News · rfi.fr · 4 days ago

Mankirat47/Dao-Heart-v3.14: Dao Heart v3.14: a bounded symbolic AI value governance research scaffold for studying value drift, oversight, warmth preservation, and identity stability under pressure.
AI, LLM · Content type: Code · github.com · 1 day ago · Hacker News

Neglected Basics of AI Alignment
AI, LLM · lesswrong.com · 4 days ago

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment
AI, LLM · Content type: Academic · arxiv.org · 2 days ago