📊 LLM Evaluation
Model Benchmarking, Quality Metrics, Human Evaluation, Testing
When LLMs get significantly worse: A statistical approach to detect model degradations
arxiv.org · 2d
🤖 AI
PELLI: Framework to effectively integrate LLMs for quality software generation
arxiv.org · 2d
🤖 AI
How Today’s AI Models Are Leaving Enterprises in the Dark
modernghana.com · 21h
🤖 AI
Leetcode for ML
pixelbank.dev · 4h
🤖 AI
Here’s Our First Gemini Deep Think LLM-Assisted Hardware Design
blog.adafruit.com · 4h
🤖 AI
Ask HN: What explains the recent surge in LLM coding capabilities?
news.ycombinator.com · 3h · Discuss: Hacker News
🤖 AI
Data Engineering for Large Models: Architecture, Algorithms & Projects
github.com · 1d · Discuss: Hacker News
🤖 AI
Challenges of revision control in the LLM era
gist.github.com · 9h · Discuss: Hacker News
⚖ English law
Reflections on prototyping a sysadmin benchmark
samek.fyi · 1d
🤖 AI
AgentRE-Bench: Can LLM Agents Reverse Engineer Malware?
agentre-bench.ai · 1d · Discuss: Hacker News
🤖 AI
Comprehensive Code Review
agenticoding.ai · 5h
🤖 AI
Securing LLM Applications: Using LLM-as-a-Judge to Block Prompt Injection Attacks
infosecwriteups.com · 1d
🤖 AI
Are Multi-Agent LLM Workflows Quietly Amplifying Mistakes?
medium.com · 1d · Discuss: DEV
🤖 AI
Study: Platforms that rank the latest LLMs can be unreliable
digitalinformationworld.com · 3d
🤖 AI
A New LLM System for Synthesis Planning
science.org · 1d
🤖 AI
LLMs will either be the best or worst thing to happen to software engineering. They will free us from whittling programs by hand. But will we use that freedom t...
bsky.app · 1d · Discuss: Bluesky
🤖 AI
The role of large language models in emergency care: a comprehensive benchmarking study
nature.com · 1d
🤖 AI
Building an ARC-2 Solver — From Socratic Panels to a Single Oracle
pub.towardsai.net · 1d
🤖 AI
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration
machinelearning.apple.com · 2d
🤖 AI
🤖AI Agents Weekly: GPT-5.3-Codex-Spark, GLM-5, MiniMax M2.5, Recursive Language Models, Harness Engineering, Agentica, and More
nlp.elvissaravia.com · 12h
🤖 AI