📊 Model Evaluation
Benchmarking, Performance Metrics, A/B Testing, Quality Assessment
Beyond ATE: Multi-Criteria Design for A/B Testing
arxiv.org · 21h · 🤖 AI Agent
AI dev tool power rankings & comparison [Feb. 2026]
blog.logrocket.com · 9h · 🤖 AI Agent
guard0-ai/TrustVector: Independent, evidence-based trust evaluations for 100+ AI models, agents, and tools.
github.com · 3h · Discuss: Hacker News · 🤖 AI Agent
BalatroBench Benchmarks Large Language Models Playing Balatro
balatrobench.com · 15h · Discuss: Hacker News · 🔧 Functional Programming
Constructing Industrial-Scale Optimization Modeling Benchmark
arxiv.org · 1d · 🔧 Functional Programming
You are probably overpaying for intelligence
residuals.bearblog.dev · 5h · 🤖 AI Agent
Clean Architecture in .NET 10: Testing What Matters
dev.to · 21h · Discuss: DEV · 🔧 Functional Programming
SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance
swe-rebench.com · 8h · Discuss: r/LocalLLaMA · 🤖 AI Agent
The case for industrial evals
lesswrong.com · 1d · 🤖 AI Agent
Joint optimization of maintenance and spare parts management in upstream–downstream systems under quality control
sciencedirect.com · 10h · 🔧 Functional Programming
Data Engineering for Large Models: Architecture, Algorithms & Projects
github.com · 1h · 🔧 Functional Programming
BinaryAudit: Can AI find backdoors in raw machine code?
quesma.com · 11h · Discuss: Hacker News · 🤖 AI Agent
Quality Assurance in AI Assisted Software Development: Risks and Implications
dev.to · 1d · Discuss: DEV · 🤖 AI Agent
Beyond the Prompt - Why and How to Fine-tune Your Own Models
devblogs.microsoft.com · 2d · 🤖 AI Agent
Reflections on prototyping a sysadmin benchmark
samek.fyi · 6h · 🔧 Functional Programming
My Skill Makes Claude Code GREAT At TDD
aihero.dev · 10h · 🔧 Functional Programming
MiniMaxAI MiniMax-M2.5 has 230b parameters and 10b active parameters
openhands.dev · 1d · Discuss: r/LocalLLaMA · 🤖 AI Agent
5 Days, One GPU Gameboy Swarm
bkase.io · 12h · Discuss: Hacker News · 🤖 AI Agent
Olmix: A framework for data mixing throughout LM development
allenai.org · 10h · 🔧 Functional Programming
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
machinelearning.apple.com · 1d · 🔧 Functional Programming