🎯 Post-training
fine-tuning, RLHF, instruction tuning, alignment
Scoured 157 posts in 9.3 ms
Emergence of Context Characteristics Sensitivity in Large Language Models
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🎮 RL · turingpost.com · 4 days ago
Sequent: scale and automation for higher confidence in alignment
🧠 AI · lesswrong.com · 21 hours ago
Why LLMs (still) lack taste
💬 LLMs · beyondtheprior.com · 2 days ago · Hacker News
The week AI infrastructure crossed from a technology story to a financial one
💬 LLMs · Content type: News · mlwhiz.com · 13 hours ago
Deep Learning Weekly: Issue 458
🤖 AI Agents · deeplearningweekly.com · 6 days ago
Introducing North Mini Code: Cohere’s First Model For Developers
🌐 World Models · Content type: Blog · huggingface.co · 1 day ago · Hacker News
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
📊 ML · Content type: Code · github.com · 4 days ago · Hacker News
Less-relevant results
Vibe Diaries: Training Nanochat
📊 ML · vibediary.dev · 2 days ago · Hacker News
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
🌐 World Models · venturebeat.com · 2 days ago · Hacker News
Compatibility-Aware Dynamic Fine-Tuning for Large Language Models
🎮 RL · Content type: Academic · arxiv.org · 9 hours ago
DiffusionGemma: The Developer Guide - Google Developers Blog
💬 LLMs · Content type: Blog · developers.googleblog.com · 1 day ago · r/LocalLLaMA
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
🌐 World Models · Content type: Blog · developer.nvidia.com · 6 days ago · Hacker News
GPT-2: Too Dangerous To Release (2019)
💬 LLMs · Content type: Blog · naokishibuya.github.io · 1 day ago · Hacker News
SFT & the Locus Awards
💬 LLMs · sfintranslation.com · 5 days ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🏋️ Pretraining · lesswrong.com · 1 day ago
I built a machine that turns AI papers into interactive explainers
🎮 RL · Content type: Blog · blog.skz.dev · 6 days ago
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning
🏋️ Pretraining · Content type: Academic · arxiv.org · 9 hours ago
SLUUG Talk: Demystifying Large Language Models on Linux
🧠 AI · Content type: Code · github.com · 4 days ago · DEV
AI2's Nathan Lambert says Nvidia's multi-teacher on-policy distillation for Nemotron 3 Ultra is the post-training industry standard
💬 LLMs · digg.com · 6 days ago