Scour
📊 Model Evals
Specific
LLM evaluation, benchmarks, model evaluation, evals
Scoured 92 posts in 6.2 ms
Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Less-relevant results
Adarsh Divakaran: Building AI Agents in Python
🤖 AI Agents · Content type: Blog · blog.adarshd.dev · 6 days ago
Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs
🧠 LLMs · lesswrong.com · 5 days ago
Mr Vegas World Cup offer 2026: Bet £10, Get £30 in free bets
🎮 Gaming · Content type: News · thesun.co.uk · 1 day ago
Why Shrinking an AI Model Often Makes It More Useful
🧠 LLMs · siliconopera.com · 3 days ago
SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Law Professors Prefer AI over Peer Answers
🧠 LLMs · Content type: Academic · law.stanford.edu · 4 days ago · Hacker News
The Vanta AI Quality Eval Maturity Model
⚡ AI Apps · vanta.com · 7 hours ago · Hacker News
History says one of these five teams will win the 2026 World Cup
🌍 Geopolitics · Content type: News · nytimes.com · 6 days ago
LLM Routing: From Strategy Selection to Production Architecture
🧠 LLMs · Content type: Blog · blog.n8n.io · 7 hours ago
Apple WWDC On-Device AI Deep Dive - Google Docs
🍎 Apple · gist.is · 40 minutes ago · Hacker News
Cybersecurity M&A Roundup: 26 Deals Announced in May 2026
🖥️ Hardware · securityweek.com · 2 days ago
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
🖥️ GPUs · Content type: Blog · huggingface.co · 6 days ago
Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation
🔧 MLOps · Content type: Academic · arxiv.org · 1 day ago
How to Train Your Goblin
🧠 LLMs · goblins.mchen.workers.dev · 3 days ago · Hacker News
AI Governance Tools: How To Achieve Compliance and Visibility
🔧 MLOps · Content type: Blog · blog.n8n.io · 7 hours ago
FanGraphs Power Rankings: June 1–7
📱 Tech Reviews · Content type: News, Blog · blogs.fangraphs.com · 2 days ago
What does a reranker even do?
📚 RAG · Content type: Blog · anima-mundi.bearblog.dev · 5 days ago
LLM-Based Visualization Evaluation: How Well Do Literacy-Stratified Personas Approximate Human Judgments?
🧠 LLMs · Content type: Academic · arxiv.org · 18 hours ago
LLM Research Papers: The 2026 List (January to May)
🖥️ GPUs · Content type: News · magazine.sebastianraschka.com · 4 days ago · Hacker News