LLM Evaluation
LLM benchmarks, evals, model evaluation, Harness, lm-eval
Scoured 31 posts in 13.2 ms
LLMs · arXiv · 3 days ago
The Origins of Stochasticity: Comprehensive Investigations on Uncertainty Quantification for Large Language Models
AI Infra · tai.shadie-oneapi.com · 2 days ago
Building an AI Side Project That Actually Ships - Lessons from Shipping 3 MVPs
Covered by: DEV Community, api.deepseek.com
Discussed on: DEV
Less-relevant results
LLMs · Hugging Face · 19 hours ago
HRM-Text: Efficient Pretraining Beyond Scaling
Covers: sapientinc/HRM-Text: HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.
Discussed on: Hacker News
AI Infra · NVIDIA Technical Blog · 2 days ago
Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding
Covers: 4 stories, including NVIDIA Blackwell Architecture
MLOps · blog.doubleword.ai · 4 days ago
Prediction: A Frontier open-source LLM Will Be Released On 3rd December 2026
Covered by: whyopensource.ai
Discussed on: Hacker News
AI Infra · GitHub · 1 day ago
For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)
Discussed on: r/LocalLLaMA
Post-training · fareedkhan-dev.github.io · 5 days ago
Train LLM from Scratch
Discussed on: Hacker News
Post-training · Liquid AI · 6 hours ago
LFM2.5-230M: Built to Run Anywhere
Covered by: VentureBeat
LLMs · arXiv · 9 hours ago
Dynamic-dLLM: Dynamic Cache-Budget and Adaptive Parallel Decoding for Training-Free Acceleration of Diffusion LLM
AI Agents · Context Window · 1 day ago
Transcript: "What It Will Mean to Be Human When AI Can Do Everything"
AI Infra · Red Hat Developer · 3 days ago
Connect EvalHub to protected production model servers
MLOps · arXiv · 2 days ago
Holistic Data Scheduler for LLM Pre-training via Multi-Objective Reinforcement Learning
MCP · Microsoft for Developers · 3 days ago
Models don't have preferences, they have context
AI Infra · GitHub · 3 days ago
I built a Rust entropy monitor to route LLM inference - here's what the benchmark showed
Discussed on: DEV
Post-training · arXiv · 1 day ago
Cliff Tokens: Identifying Single-Token Failure Triggers in LLM Mathematical Reasoning
LLMs · arXiv · 2 days ago
Reasoning as Attractor Dynamics: Latent Memory Retrieval via Gibbs-Weighted Energy Minimization
MCP · jvm-weekly.com · 1 day ago
The Rest of the Story: June Edition - JVM Weekly vol. 181
Covers: 4 stories, including Where are the uploaded skill folders stored on the MacOS file system?
MCP · redhat.com · 4 days ago
Introducing Project Navigator: From AI intent to optimized deployment on Red Hat OpenShift AI
Post-training · arXiv · 1 day ago
Riazi-8B: An Urdu Large Language Model for Mathematical Reasoning
LLMs · arXiv · 3 days ago
MINCE: Shrinking LLM Evaluation Datasets via Few-Model Monte Carlo Calibration