Scour
📊 LLM Evaluation
Specific
benchmarks, evals, LLM scoring, evaluation metrics
How to run evals for the model router
🚀
LLM Deployment
devblogs.microsoft.com
·
1d
Context pruning: cut LLM tokens without losing quality (9 minute read)
🎯
LLM Finetuning
redis.io
·
3d
Less-relevant results https://research.perplexity.ai/articles/query-aware-context-compression-for-better-snippets
🔍
RAG
research.perplexity.ai
·
12h
Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA
🧠
LLMs
arxiv.org
·
2d
Agentic evals or LLM as a judge? considering cost, time and quality
🎯
LLM Finetuning
news.ycombinator.com
·
5d
·
Hacker News
3DAeroRelief: The first 3D Benchmark UAV Dataset for Post-Disaster Assessment
🛡️
AI Safety
nature.com
·
2h
sapientinc/HRM-Text: HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.
🚀
LLM Deployment
github.com
·
1d
·
r/singularity
Researchers train AI model that hits near-full performance with just 12.5 percent of its experts
🧠
LLMs
the-decoder.com
·
4d
NLA Verbalizations on AuditBench: Llama 70B
🧠
LLMs
lesswrong.com
·
5d
tokenspeed - feel LLM tokens-per-second
🎯
LLM Finetuning
mikeveerman.github.io
·
2h
Discover the Red Hat OpenShift AI model catalog
🚀
LLM Deployment
redhat.com
·
3d
Eval engineering: The missing piece of agentic AI governance
🤖
Agentic AI
siliconangle.com
·
3d
Beyond the Runbook: How to Scale SRE Operations for Cloud-Native Infrastructure
🤖
Agentic AI
cloudnativenow.com
·
2d
Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)
🚀
LLM Deployment
huggingface.co
·
5d
·
r/LocalLLaMA
Fine-Grained Benchmark Generation for Comprehensive Evaluation of Foundation Models
🧠
LLMs
arxiv.org
·
1d
AI researchers flag bias risks in LLM judging
🧠
LLMs
kite.kagi.com
·
5d
AI researchers push reliability tests for agent systems
🤖
AI Agents
kite.kagi.com
·
4d
May 20, 2026 (#4672)
🤖
AI Agents
alvinashcraft.com
·
18h
Build custom code-based evaluators in Amazon Bedrock AgentCore
🤖
AI Agents
aws.amazon.com
·
2d
LLM-as-a-Judge: How to Become a Preferred Content Source for AI Answers
🏢
LLM Adoption
lumar.io
·
5d