Scour
📊 AI Evals
benchmarks, evaluation, MMLU, leaderboard, harness
Scoured 149456 posts in 14.0 ms
Better Harness: A Recipe for Harness Hill-Climbing with Evals
🔀 LoRA · blog.langchain.com · 1d
Position: Science of AI Evaluation Requires Item-level Benchmark Data
📄 AI Papers · arxiv.org · 3d
Benchmarking LLMs with Marimo Pair
🚀 MLOps · ericmjl.github.io · 11h · Hacker News
My personal research agenda, version 1.
🧠 AI · lesswrong.com · 2h
How to perform a structured evaluation of AI conversational solutions
💬 LLMs · thoughtworks.com · 1d
Show HN: Proposal for a real long-term AI memory benchmark
💬 LLMs · penfieldlabs.substack.com · 16h · Substack
What is AI Harness Engineering?
🕵️ AI Agents · medium.com · 5d
alexfleetcommander/smokehouse-eval: The BBQ Benchmark - Competition-judging framework for AI model evaluation. Applies BBQ judging mechanics (double-blind, drop-scoring, weighted multi-dimensional criteria) to AI evaluation.
🚀 MLOps · github.com · 15h · DEV
The art of AI harness engineering
🕵️ AI Agents · bdtechtalks.substack.com · 2d · Substack
Stealth Alibaba Video AI Model Tops Global Ranking on Debut
🔀 LoRA · bloomberg.com · 4h
Benchmarks are the new stars
🚀 MLOps · mercurialsolo.github.io · 14h
The Dark Factory Harness: Turning Autonomous Hill-Climbing into Autonomous Research
🚀 MLOps · sotaverified.org · 2d · Hacker News
Harness Engineering: The Missing Layer in AI Systems
🕵️ AI Agents · medium.com · 3d
The Real AI Race Isn't About Models or Data. It's About Context.
🕵️ AI Agents · blog.hubspot.com · 19h
Alibaba leads $293m round in Chinese AI start-up after HappyHorse reveal
🧠 AI · siliconrepublic.com · 35m
AI Coding and Best Tools Compared
🕵️ AI Agents · altexsoft.com · 2d
Meta Muse Spark: What the Benchmarks Actually Mean
🚀 MLOps · medium.com · 15h
A mysterious video generation model that swept to the top of global benchmarks was developed by a team under Alibaba, sending ripples across China’s AI industry...
🔀 LoRA · twitter.macworks.dev · 4h
Agent Labs: Workload-Harness Fit
🎯 Fine-Tuning · akashbajwa.co · 6d · Hacker News
Presentation: State of Play: AI Coding Assistants
🕵️ AI Agents · infoq.com · 2d