📊 Model Evaluation
Benchmarking, Performance Metrics, A/B Testing, Quality Assessment
Scoured 111975 posts in 295.7 ms
Beyond ATE: Multi-Criteria Design for A/B Testing
arxiv.org · 14h
🤖 AI Agent
BalatroBench Benchmarks Large Language Models Playing Balatro
balatrobench.com · 7h · Discuss: Hacker News
🔧 Functional Programming
Studying Quality Improvements Recommended via Manual and Automated Code Review
arxiv.org · 14h
🔧 Functional Programming
SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance
swe-rebench.com · 53m · Discuss: r/LocalLLaMA
🤖 AI Agent
Clean Architecture in .NET 10: Testing What Matters
dev.to · 14h · Discuss: DEV
🔧 Functional Programming
The case for industrial evals
lesswrong.com · 17h
🤖 AI Agent
ml-rust/fluxbench: Benchmarking framework with crash isolation, bootstrap statistics, and CI integration
github.com · 5h · Discuss: r/rust
🔧 Functional Programming
Joint optimization of maintenance and spare parts management in upstream–downstream systems under quality control
sciencedirect.com · 2h
🔧 Functional Programming
BinaryAudit: Can AI find backdoors in raw machine code?
quesma.com · 3h · Discuss: Hacker News
🤖 AI Agent
My Skill Makes Claude Code GREAT At TDD
aihero.dev · 3h
🔧 Functional Programming
Quality Assurance in AI Assisted Software Development: Risks and Implications
dev.to · 17h · Discuss: DEV
🤖 AI Agent
Beyond the Prompt - Why and How to Fine-tune Your Own Models
devblogs.microsoft.com · 2d
🤖 AI Agent
MiniMaxAI MiniMax-M2.5 has 230b parameters and 10b active parameters
openhands.dev · 21h · Discuss: r/LocalLLaMA
🤖 AI Agent
Olmix: A framework for data mixing throughout LM development
allenai.org · 2h
🔧 Functional Programming
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
machinelearning.apple.com · 19h
🔧 Functional Programming
5 Days, One GPU Gameboy Swarm
bkase.io · 4h · Discuss: Hacker News
🤖 AI Agent
Product Forecasting through Time Series Analysis (Modelling)
pub.towardsai.net · 18h
🤖 AI Agent
CCBench: How do agents perform on codebases that aren't part of training data?
ccbench.org · 20h · Discuss: Hacker News
🤖 AI Agent
Design Decision: Technical Debt in BillaBear
iain.rocks · 7h · Discuss: Hacker News, r/programming
🔧 Functional Programming
Benchmark & Compare the Best AI Models
arena.ai · 2d
🤖 AI Agent