🎯 Post-training
RLHF, fine-tuning, DPO, instruction tuning, model alignment
🤖 AI Development · arXiv · 19 hours ago
The Hitchhiker's Guide to Agentic AI: From Foundations to Systems
Less-relevant results
🧠 LLM Research · GitHub · 6 days ago
Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch
Discussed on Hacker News
🧪 AI Labs · IT之家 · 5 days ago
Google Gemini co-lead Shazeer moves to OpenAI; Altman personally posts a welcome
🧪 AI Labs · mittrchina.com · 5 days ago
Why are America's three strongest AI companies all moving into life sciences?
🧠 LLM Training · arXiv · 2 days ago
Attention-Spectrum Regularization for Replay-Free Continual Multimodal LLMs
🤖 AI Development · Hacker News · 6 days ago
Ask HN: Can Monte Carlo Tree Search Improve AI Outputs?
Discussed on Hacker News
🧠 LLM Training · arXiv · 1 day ago
Towards Spec Learning: Inference-Time Alignment from Preference Pairs
🧠 LLM Training · arXiv · 1 day ago
Weight-Space Geometry of Offline Reasoning Training
🧪 AI Labs · cnbeta.com.tw · 4 days ago
Two legends lost in three days: is Google's AI talent dam bursting?
🧠 LLM Training · arXiv · 2 days ago
Repeated post-training is not Self-improving: Diagnosing Scientific Amnesia in Continual DPO Pipelines
🧠 LLM Training · arXiv · 19 hours ago
Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
🎯 RLHF · arXiv · 2 days ago
Self-Evolution for Multi-Turn Tool-Calling Agents via Divergence-Point Preference Learning
🗣️ Large Language Models · arXiv · 2 days ago
Can LLMs Reliably Self-Report Adversarial Prefills, and How?
🧠 LLM Training · arXiv · 19 hours ago
The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order
🤖 AI · arXiv · 2 days ago
A Markov Chain Approach to Preference Alignment
🎯 Alignment Research · arXiv · 6 days ago
Uncertainty-Aware Reward Modeling for Stable RLHF
🧠 LLM Training · arXiv · 19 hours ago
Speculative Decoding at Temperature Zero: A Scoped Safety-Invariance Screen with a 48,072-Sample Expansion
🧠 LLM Training · arXiv · 6 days ago
Emergent Alignment
Covered by 何夕2077的个人站
📄 AI Papers · arXiv · 19 hours ago
Memory Retrieval in Visuomotor Policies for Long-Horizon Robot Control
🤖 AI Development · arXiv · 1 day ago
Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment