📏 LLM Evaluation
Specific
Benchmarks, Evaluation Frameworks, Metrics, LLM Testing
Scoured 187227 posts in 16.1 ms
BLAST: Benchmarking LLMs with ASP-based Structured Testing
🔄 MLOps · arxiv.org · 3d
Cyborg evals
🔄 MLOps · lesswrong.com · 9h · Hacker News
google-deepmind/proeval: Proactive failure discovery and efficient performance estimation for GenAI evaluation.
🔄 MLOps · github.com · 1d
Evals in practice for an AI coding agent
🔌 Claude Plugins · ministryoftesting.com · 16h
Vibe-train evaluations and guardrails tailored to your specific use case
📈 Prometheus · plurai.ai · 1d
Better audio and a decent chair do more for gaming than 100 extra FPS
📊 Load Testing · xda-developers.com · 4h
Introducing SOB: A Multi-Source Structured Output Benchmark for LLMs
🔄 MLOps · interfaze.ai · 3d · Hacker News
Granite 4.1: IBM's 8B Model Is Competing With Models Four Times Its Size
🔄 MLOps · firethering.com · 16h · Hacker News
Temporal Language Models
🔄 MLOps · calcifercomputing.com · 2d · Hacker News
Getting Up to Speed on Multi-Agent Systems, Part 7: Benchmarks and What They Miss
🌐 Distributed Systems · christophermeiklejohn.com · 15h
Intel Arc G3 Extreme CPU Shows Promising Performance in Benchmark Leak
📊 Load Testing · techpowerup.com · 3h
not much happened today
🔄 MLOps · news.smol.ai · 2d
Introducing ARFBench: A time series question-answering benchmark based on real incidents
📈 Prometheus · blog.ml.cmu.edu · 3d
Assessing the Viability of Open Source Projects
📈 Prometheus · fastwonderblog.com · 11h
Why real-time teamwork dashboards can backfire instead of improving collaboration
📈 Prometheus · phys.org · 3h
ExaBench: An Open Database Performance Leaderboard
📈 Prometheus · exasol.com · 1d · Hacker News
Diagnosing protein sequence search in the era of language models
🧮 Vector Databases · biorxiv.org · 7h
Introducing the Apitally CLI and skill for agents
🔌 Claude Plugins · apitally.io · 5d · r/node
Load balancer for vLLM server instances?
📊 Load Testing · docs.vllm.ai · 2d · r/LocalLLaMA
Training on Fiction While the Real Threat is in Your Inbox
🔄 MLOps · cofense.com · 22h