📊 LLM Evaluation
Model Benchmarking, Quality Metrics, Human Evaluation, Testing
TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering · arxiv.org · 1d · 🤖 AI
Building LLMs in Resource-Constrained Environments: A Hands-On Perspective · infoq.com · 19h · 🤖 AI
Study: Platforms that rank the latest LLMs can be unreliable · news.mit.edu · 1d · 🤖 AI
Some thoughts on LLM coding · blog.dave.tf · 1d · Discuss: Hacker News · 🤖 AI
Implementing Automated Rules-Based Evaluations for LLM Applications · github.com · 4d · Discuss: DEV · 🤖 AI
LLMs Refuse High-Cost Attacks but Stay Vulnerable to Cheap, Real-World Harm · expectedharm.github.io · 1h · Discuss: Hacker News · 🤖 AI
SAE Feature Matchmaking (Layer-to-Layer) by Mitali M · greaterwrong.com · 2h · 🤖 AI
Stop Silent Failures: Using LLMs to Validate Web Scraper Output · dev.to · 1d · Discuss: DEV · 🤖 AI
Why Spec-Driven Development Breaks at Scale (and How to Fix It) · arcturus-labs.com · 9h · Discuss: Hacker News · 🤖 AI
Custom AI Tool Development in Regulated Industries: Why Off-The-Shelf LLM Solutions Fall Short · analyticsvidhya.com · 18h · 🤖 AI
The Potential of RLMs · dbreunig.com · 13h · Discuss: Hacker News · 🤖 AI
RAG vs. Fine-Tuning: Why Your LLM Strategy is Probably Half-Baked · pub.towardsai.net · 1d · 🤖 AI
Show HN: C-CMCP – Validated AI development workflow with quality gates · news.ycombinator.com · 15h · Discuss: Hacker News · 🤖 AI
Reliability of LLMs as medical assistants for the general public: a randomized preregistered study · nature.com · 14h · Discuss: Hacker News · 🤖 AI
Implementing Automated Rules-Based Evaluations for LLM Applications · dev.to · 4d · Discuss: DEV · 🤖 AI
Agent Evaluation: How to Test and Measure Agentic AI Performance · machinelearningmastery.com · 4d · 🤖 AI
Property-based testing as executable specs for agentic coding · kiro.dev · 5h · Discuss: Hacker News · 🤖 AI
The LLM Judge Controversy · mlfrontiers.substack.com · 1d · Discuss: Substack · 🤖 AI
Code vs Serialized AST Inputs for LLM-Based Code Summarization: An Empirical Study · arxiv.org · 1d · 🤖 AI
Why the “Best LLM for Marketing” Doesn’t Exist · unite.ai · 13h · 🤖 AI