Scour
📊 LLM Evaluation
Model Benchmarking, Quality Metrics, Human Evaluation, Testing
Scoured 112424 posts in 277.8 ms
PELLI: Framework to effectively integrate LLMs for quality software generation
arxiv.org · 2d · 🤖 AI
Analysis of systems with dependent components through a variance-based index and regression importance signature
sciencedirect.com · 2d · 🤖 AI
AI usage in popular open source projects
tirkarthi.github.io · 11h · Discuss: Hacker News, r/programming · 🤖 AI
Karpathy's Micro LLM in JavaScript
github.com · 2d · Discuss: Hacker News · 🤖 AI
The Evolving Role of the ML Engineer
towardsdatascience.com · 1d · 🤖 AI
Reflections on making a video game whose core mechanic is talking to LLMs
alanmunirji.dev · 18h · Discuss: Hacker News · 🤖 AI
Agentic Engineering: What Actually Works After Hundreds of Sessions
muhammadhammadkhan.substack.com · 21h · Discuss: Substack · 🤖 AI
Building an ARC-2 Solver — From Socratic Panels to a Single Oracle
pub.towardsai.net · 1d · 🤖 AI
Intelligence analysis platform for AI Agents (~OpenClaw)
blog.lukaszolejnik.com · 1d · 🤖 AI
Quality and understandability after AI
federicopereiro.com · 2d · Discuss: Hacker News · 🤖 AI
Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
arxiv.org · 1d · 🤖 AI
The AI hater’s guide to code with LLMs. This is an interesti...
kottke.org · 23h · 🤖 AI
The Problem With LLMs
deobald.ca · 3d · Discuss: Lobsters, Hacker News · 🤖 AI
Generative LLMs as Automatic Proofreaders of Radiology Reports - Radiological Society of North America
rsna.org · 2d · 🤖 AI
LangChain Agent Testing Guide Tool (Free)
news.ycombinator.com · 1d · Discuss: Hacker News · 🤖 AI
I used a local LLM to analyze my journal entries
ankursethi.com · 1d · Discuss: Lobsters · ✍ longform travel writing
Painless Activation Steering (PAS): Automated, Lightweight Post‑Training for LLM Behavior
sashacui.substack.com · 13h · Discuss: Substack · 🤖 AI
🤖AI Agents Weekly: GPT-5.3-Codex-Spark, GLM-5, MiniMax M2.5, Recursive Language Models, Harness Engineering, Agentica, and More
nlp.elvissaravia.com · 4h · 🤖 AI
LLM Performance in Astro, React, Tailwind and Cloudflare
10xbench.ai · 3d · Discuss: Hacker News · 🤖 AI
The case for industrial evals
lesswrong.com · 1d · 🤖 AI