LLM Evaluation
LLM benchmarks, evals, model evaluation, Harness, lm-eval
Scoured 31 posts in 9.5 ms
⚙️ Backend Engineering · wowhow.cloud · 5 days ago
Claude Opus 4.8 vs Gemini 3.5 Pro vs GPT-5.6: Developer Model Selection Guide (June 2026)
Discussed on DEV
🧠 LLMs · arXiv · 3 days ago
Investigating Linguistic Steering: An Analysis of Adjectival Effects Across Large Language Model Architectures
🔄 MLOps · arXiv · 2 days ago
You Don't Need to Run Every Eval
🏗️ AI Infra · arXiv · 3 days ago
Uncertainty-based Debiasing and Unlearning for Decontamination
🎯 Post-training · arXiv · 2 days ago
Weight-Space Geometry of Offline Reasoning Training
🧠 LLMs · arXiv · 3 days ago
In LLM Reasoning, there is Irrationality on top of Value Misalignment
🎯 Post-training · arXiv · 1 day ago
The Geometry of Sequential Learning: Lie-Bracket Prediction of Transfer Order
🎯 Post-training · arXiv · 1 day ago
SARA: Unlocking Multilingual Knowledge in Mixture-of-Experts via Semantically Anchored Routing Alignment
🧠 LLMs · arXiv · 3 days ago
Beyond Fixed Budgets: Characterizing the Inelasticity and Limitations of Tree-of-Thought Reasoning Strategies
🎯 Post-training · arXiv · 3 days ago
L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling
🧠 LLMs · arXiv · 3 days ago
Scheduling Thoughts: Learning the Order of Thought in Diffusion Language Models