📊 LLM Evaluation
Model Benchmarking, Quality Metrics, Human Evaluation, Testing
Scoured 112668 posts in 367.5 ms
PELLI: Framework to effectively integrate LLMs for quality software generation
arxiv.org · 1d · 🤖 AI
When LLMs get significantly worse: A statistical approach to detect model degradations
arxiv.org · 1d · 🤖 AI
Generative LLMs as Automatic Proofreaders of Radiology Reports - Radiological Society of North America
rsna.org · 2d · 🤖 AI
SWE-rebench Jan 2026: GLM-5, MiniMax M2.5, Qwen3-Coder-Next, Opus 4.6, Codex Performance
swe-rebench.com · 3h · Discuss: r/LocalLLaMA · 🤖 AI
How To Utilize LMS Data: Use Cases For Enhancing L&D Insights
elearningindustry.com · 1d · 🤖 AI
My Skill Makes Claude Code GREAT At TDD
aihero.dev · 5h · 🤖 AI
Analysis of systems with dependent components through a variance-based index and regression importance signature
sciencedirect.com · 1d · 🤖 AI
Quality Assurance in AI Assisted Software Development: Risks and Implications
dev.to · 19h · Discuss: DEV · 🤖 AI
The AI hater’s guide to code with LLMs. This is an interesti...
kottke.org · 52m · 🤖 AI
The case for industrial evals
lesswrong.com · 19h · 🤖 AI
The Problem With LLMs
deobald.ca · 2d · Discuss: Lobsters, Hacker News · 🤖 AI
Intelligence analysis platform for AI Agents (~OpenClaw)
blog.lukaszolejnik.com · 6h · 🤖 AI
AI dev tool power rankings & comparison [Feb. 2026]
blog.logrocket.com · 4h · 🤖 AI
feat: implement LLM decision engine (Task 10) by meleantonio · Pull Request #36
github.com · 1d · 🤖 AI
Your AI sounds confident. But is it right?
truthlayer.netlify.app · 1d · Discuss: Hacker News · 🤖 AI
Scaling LLM Post-Training at Netflix
netflixtechblog.com · 13h · 🤖 AI
Quality and understandability after AI
federicopereiro.com · 1d · Discuss: Hacker News · 🤖 AI
Design Decision: Technical Debt in BillaBear
iain.rocks · 9h · Discuss: Hacker News, r/programming · 🤖 AI
LangChain Agent Testing Guide Tool (Free)
news.ycombinator.com · 5h · Discuss: Hacker News · 🤖 AI
I used a local LLM to analyze my journal entries
ankursethi.com · 5h · Discuss: Lobsters · ✍ longform travel writing