🧪 LLM Testing
LLM eval, model evaluation, evals, harness, benchmarks
Scoured 184452 posts in 29.5 ms
A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair
🧠 LLMs · arxiv.org · 6d
Continually improving our agent harness
🤖 AI Agent · cursor.com · 1d
Evals in practice for an AI coding agent
🤖 AI Agent · ministryoftesting.com · 3h
1jehuang/jcode: Coding Agent Harness
🤖 AI Agent · github.com · 13h · Hacker News
Granite 4.1: IBM's 8B Model Is Competing With Models Four Times Its Size
🧠 LLMs · firethering.com · 4h · Hacker News
Vibe-train evaluations and guardrails tailored to your specific use case
🤖 AI Agent · plurai.ai · 1d
Vibing, Harness and OODA loop
✍️ Prompt Engineering · architecture-weekly.com · 3d
Best Cheap Open Source Models for Hermes Agent in 2026
🤖 AI Agent · bitdoze.com · 14h
If Claude Feels Worse, Fix Your Harness
✍️ Prompt Engineering · mdelcaro.substack.com · 2d · Substack
Plurai, a vibe training platform for evals, is launching today on Product Hunt! https://meooow.link/plurai
✍️ Prompt Engineering · producthunt.com · 23h · DEV
ForgeCode: Top open source coding agent in Terminal-Bench 2.0
🦀 Rust · tensorlake.ai · 20h · Hacker News
Getting Up to Speed on Multi-Agent Systems, Part 7: Benchmarks and What They Miss
🕵️ AI Agents · christophermeiklejohn.com · 2h
How My RLM Tool Works
🧠 LLMs · isaacflath.com · 3d · Hacker News
Bun’s Zig fork got 4x faster compilation times
🐹 Go · ziggit.dev · 2d
Harness teams of agentic coders with Squad
🕵️ AI Agents · infoworld.com · 5h
Gemini-3-Flash: My ai agent benchmark terminalbench Win & 3 Fixes
✍️ Prompt Engineering · buildzn.com · 2d · DEV
VoidZero’s Experimental Oxc Angular Compiler with Up to 20x Faster Build Performance
🐹 Go · infoq.com · 23h
Structured CoT: Shorter Reasoning with a Grammar File
🧠 LLMs · andthattoo.dev · 5d · r/LocalLLaMA
local-first MCP code intelligence (and the runs we lose)
🐹 Go · sverklo.com · 2d · Hacker News
Innovid expands measurement with purchase data and publisher attribution
🖥️ Fullstack · ppc.land · 21h