Scour
📊 Model Evaluation
Benchmarks, Metrics, Testing, Performance Analysis

Scoured 146491 posts in 70.4 ms

Block-Bench: A Framework for Controllable and Transparent Discrete Optimization Benchmarking
Multi-Agent Systems · arxiv.org · 5h

Show HN: Pre-training, fine-tuning, and evals platform
Multi-Agent Systems · oumi.ai · 5d · Hacker News

A Hands-On Guide to Testing Agents with RAGAs and G-Eval
Multi-Agent Systems · machinelearningmastery.com · 21h

Live Life on the Edge: A Layered Strategy for Testing Data Models
Multi-Agent Systems · chiply.dev · 2d · Hacker News, r/programming

smoothyy3/willitrun: CLI to tell you if an ML model will fit and run on your device, using real benchmarks + lightweight estimation.
reinforcement learning · github.com · 2d · Hacker News

Fast Isn’t Fast Enough: Redefining Metrics for Edge AI
Multi-Agent Systems · semiengineering.com · 2h

Better Harness: A Recipe for Harness Hill-Climbing with Evals
Multi-Agent Systems · blog.langchain.com · 14h

benchmarking inference of popular models on consumer hardware
reinforcement learning · inferena.tech · 4d · Hacker News

AI to ROI Metrics: Infrastructure Cost Optimization
Multi-Agent Systems · ai2roi.substack.com · 18h · Substack

I benchmarked my own product, published everything, and 0.2.0 is basically the list of things I had to fix.
Multi-Agent Systems · blog.routerly.ai · 1d · r/SideProject

You Fine-Tuned Your Model. Now It’s Worse. Here’s the Concept You Were Never Taught.
reinforcement learning · pub.towardsai.net · 14h

AXI: Agent EXperience Interface
Multi-Agent Systems · axi.md · 4h · Hacker News

April 7, 2026 (#4641)
Multi-Agent Systems · alvinashcraft.com · 1d

Thoughts on causal isolation of AI evaluation benchmarks
reinforcement learning · lesswrong.com · 6d

Introducing Metrics SQL: A SQL-based semantic layer for humans and agents
Multi-Agent Systems · rilldata.com · 11h · Hacker News

The case for Model-as-a-Service over self-managed inference
Multi-Agent Systems · news.ycombinator.com · 2d · Hacker News

NL2SQLBench: A Modular Benchmarking Framework for LLM-Enabled NL2SQL Solutions
llm · vldb.org · 7h

Give an LLM an API and It'll Thrive. Give It a Touchscreen and It Struggles
llm · blog.allada.com · 3d · Hacker News

Why a High Accuracy Model Can Still Be Useless
Multi-Agent Systems · medium.com · 1d

Introducing workload simulation workbench for Amazon MSK Express broker
Multi-Agent Systems · aws.amazon.com · 1d