LLM Evaluation
LLM benchmarks, evals, model evaluation, Harness, lm-eval
🧠 LLMs · arXiv · 1 day ago
The Origins of Stochasticity: Comprehensive Investigations on Uncertainty Quantification for Large Language Models
🎯 Post-training · fareedkhan-dev.github.io · 3 days ago
Train LLM from Scratch
Discussed on Hacker News
🔄 MLOps · blog.doubleword.ai · 2 days ago
Prediction: A Frontier open-source LLM Will Be Released On 3rd December 2026
Covered by whyopensource.ai
Discussed on Hacker News
🏗️ AI Infra · GitHub · 1 day ago
I built a Rust entropy monitor to route LLM inference — here's what the benchmark showed
Discussed on DEV
🏗️ AI Infra · tai.shadie-oneapi.com · 1 day ago
Building an AI Side Project That Actually Ships — Lessons from Shipping 3 MVPs
Covered by DEV Community, api.deepseek.com
Discussed on DEV
Less-relevant results
🎯 Post-training · medium.com · 6 days ago
GRPO vs PPO vs DPO on GSM8K: What I Learned Building RL Training from Scratch
🏗️ AI Infra · NVIDIA Technical Blog · 1 day ago
Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding
Covers 3 stories, including NVIDIA/TensorRT-LLM
🔄 MLOps · arXiv · 22 hours ago
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
🏗️ AI Infra · Deep Learning Weekly · 6 days ago
Deep Learning Weekly: Issue 460
Covers 4 stories, including GLM-5.2 (6 minute read)
🤖 AI Agents · Context Window · 11 hours ago
Transcript: ‘What It Will Mean to Be Human When AI Can Do Everything’
🧠 LLMs · arXiv · 22 hours ago
Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization
🏗️ AI Infra · Red Hat Developer · 2 days ago
Connect EvalHub to protected production model servers
🔌 MCP · Microsoft for Developers · 2 days ago
Models don’t have preferences, they have context
🧠 LLMs · arXiv · 5 days ago
Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models
🔌 MCP · redhat.com · 3 days ago
Introducing Project Navigator: From AI intent to optimized deployment on Red Hat OpenShift AI
🧠 LLMs · arXiv · 1 day ago
MINCE: Shrinking LLM Evaluation Datasets via Few-Model Monte Carlo Calibration
🔄 MLOps · arXiv · 22 hours ago
You Don't Need to Run Every Eval
⚙️ Backend Engineering · wowhow.cloud · 4 days ago
Claude Opus 4.8 vs Gemini 3.5 Pro vs GPT-5.6: Developer Model Selection Guide (June 2026)
Discussed on DEV
🧠 LLMs · arXiv · 1 day ago
Investigating Linguistic Steering: An Analysis of Adjectival Effects Across Large Language Model Architectures
🏗️ AI Infra · arXiv · 1 day ago
Uncertainty-based Debiasing and Unlearning for Decontamination