📊 LLM Evaluation
Model Benchmarking, Quality Metrics, Human Evaluation, Testing
Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs
arxiv.org · 17h
🤖 AI
Assessing LLM Reliability on Temporally Recent Open-Domain Questions
arxiv.org · 17h
✍ longform travel writing
AI Proactively Finds Software Bugs Before Failures In Realistic Codebases
quantumzeitgeist.com · 1d
🤖 AI
8 Standards for Shipping Production LLM Features
teotti.com · 22h · Discuss: Hacker News
🤖 AI
The OWASP Top 10 for LLMs — A Pentester's Practical Guide
dev.to · 2h · Discuss: DEV
🤖 AI
BalatroBench Benchmarks Large Language Models Playing Balatro
balatrobench.com · 11h · Discuss: Hacker News
🤖 AI
Olmix: A framework for data mixing throughout LM development
allenai.org · 6h
🤖 AI
GLM-5: Targeting complex systems engineering and long-horizon agentic tasks
news.ycombinator.com · 2h · Discuss: Hacker News
🤖 AI
Karpathy's Micro LLM in JavaScript
github.com · 1d · Discuss: Hacker News
🤖 AI
LLMs will either be the best or worst thing to happen to software engineering. They will free us from whittling programs by hand. But will we use that freedom t...
bsky.app · 12h · Discuss: Bluesky
🤖 AI
You are probably overpaying for intelligence
residuals.bearblog.dev · 1h
🤖 AI
Reflections on prototyping a sysadmin benchmark
samek.fyi · 2h
🤖 AI
MiniMaxAI/MiniMax-M2.5
huggingface.co · 8h · Discuss: Hacker News, r/LocalLLaMA
🤖 AI
Find the right local LLM for your exact hardware
localclaw.io · 15h · Discuss: Hacker News
🤖 AI
Are Multi-Agent LLM Workflows Quietly Amplifying Mistakes?
medium.com · 10h · Discuss: DEV
🤖 AI
Securing LLM Applications: Using LLM-as-a-Judge to Block Prompt Injection Attacks
infosecwriteups.com · 15h
🤖 AI
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
machinelearning.apple.com · 22h
🤖 AI
Study: Platforms that rank the latest LLMs can be unreliable
digitalinformationworld.com · 2d
🤖 AI
Building an ARC-2 Solver — From Socratic Panels to a Single Oracle
pub.towardsai.net · 18h
🤖 AI
The Evolving Role of the ML Engineer
towardsdatascience.com · 7h
🤖 AI