🧪 LLM Testing
LLM eval, model evaluation, evals, harness, benchmarks
A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair
🧠 LLMs · arxiv.org · 6d
Continually improving our agent harness
🤖 AI Agent · cursor.com · 5h
Evals in practice for an AI coding agent
🤖 AI Agent · ministryoftesting.com · 6h
Vibe-train evaluations and guardrails tailored to your specific use case
🤖 AI Agent · plurai.ai · 1d
Granite 4.1: IBM's 8B Model Is Competing With Models Four Times Its Size
🧠 LLMs · firethering.com · 6h · Hacker News
The harness matters more than the model
✍️ Prompt Engineering · troyjarv00.bearblog.dev · 2h
Vibing, Harness and OODA loop
✍️ Prompt Engineering · architecture-weekly.com · 3d
1jehuang/jcode: Coding Agent Harness
🤖 AI Agent · github.com · 15h · Hacker News
Bun’s Zig fork got 4x faster compilation times
🐹 Go · ziggit.dev · 2d
Gemini-3-Flash: My ai agent benchmark terminalbench Win & 3 Fixes
✍️ Prompt Engineering · buildzn.com · 2d · DEV
Best Cheap Open Source Models for Hermes Agent in 2026
🤖 AI Agent · bitdoze.com · 17h
Getting Up to Speed on Multi-Agent Systems, Part 7: Benchmarks and What They Miss
🕵️ AI Agents · christophermeiklejohn.com · 5h
How My RLM Tool Works
🧠 LLMs · isaacflath.com · 3d · Hacker News
not much happened today
✍️ Prompt Engineering · news.smol.ai · 2d
ForgeCode: Top open source coding agent in Terminal-Bench 2.0
🦀 Rust · tensorlake.ai · 22h · Hacker News
OpenAI and the New Cognitive Architecture of Software Repositories
🕵️ AI Agents · openai.com · 2d · DEV
Structured CoT: Shorter Reasoning with a Grammar File
🧠 LLMs · andthattoo.dev · 5d · r/LocalLLaMA
Harness teams of agentic coders with Squad
🕵️ AI Agents · infoworld.com · 8h
DO NOT BUY: AMD Ryzen 9 9950X3D2 CPU Review & Benchmarks | 24 Charts in 24 Hours
💾 AI Hardware · gamersnexus.net · 2h
ExaBench: An Open Database Performance Leaderboard
🐘 PostgreSQL · exasol.com · 1d · Hacker News