Evals
📊 Evals
Specific
LLM evaluation, harness, benchmarking, eval framework
Scoured 125 posts in 4.5 ms
PhysMetrics.Weather: An Evaluation Framework for Physical Consistency in ML Weather Models
🎛️ Fine-tuning
Content type: Academic
arxiv.org · 15 hours ago
Bring your own evaluation framework to EvalHub
✍️ Prompt Engineering
developers.redhat.com · 1 day ago
MLPerf and the rise of latency-aware LLM benchmarking
🧠 LLMs
edn.com · 5 days ago
Less-relevant results
AMD's Lemonade SDK For Local AI Adds NVIDIA CUDA Support
🌐 Open Source AI
phoronix.com · 3 hours ago
1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
🌐 Open Source AI
smolhub.com · 2 days ago · r/LocalLLaMA
Daimon Robotics and Galbot jointly launches RobOmni for benchmarking tactile perception and dexterous manipulation
🏆 SOTA Models
therobotreport.com · 2 days ago
What Does Abliteration Actually Cost?
✍️ Prompt Engineering
lesswrong.com · 5 days ago
How to Select Your POI Data Provider | Evaluation Framework for Quality & Coverage
🎛️ Fine-tuning
Content type: Blog
mapbox.com · 2 days ago
An information-theoretic evaluation framework for CNN–LSTM-based Alzheimer’s disease classification from structural MRI
🧠 LLMs
Content type: Academic
nature.com · 1 day ago
StereoTales: Multilingual Open-Ended Stereotype Discovery in LLMs
🧠 LLMs
Content type: Blog
research.giskard.ai · 6 days ago · Hacker News
The State of LLM Evaluation (2026): Why Evals Became the New Unit Tests
🧠 LLMs
Content type: Blog
medium.com · 2 days ago
For Robotaxis, Safety Must Be Built In, Not Bolted On
✍️ Prompt Engineering
Content type: Blog
blogs.nvidia.com · 46 minutes ago
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
🏆 SOTA Models
Content type: Blog
huggingface.co · 6 days ago
Comprehensive evaluation of LLM capabilities for interpretation and analysis of genome-scale metabolic models in metabolic engineering
⚡ Inference
Content type: Academic
biorxiv.org · 1 day ago
Flaws in the LLM Automation Narrative
🏆 SOTA Models
Content type: Academic
arxiv.org · 15 hours ago
Evaluate your Amazon Nova Sonic voice agent at scale, no microphone required
✍️ Prompt Engineering
Content type: Blog
aws.amazon.com · 2 days ago
Google Deepmind's Gemma 4 12B squeezes multimodal AI onto a laptop with just 16 GB of RAM
🌐 Open Source AI
the-decoder.com · 6 days ago
The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has
🌐 Open Source AI
xda-developers.com · 2 hours ago
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
🏆 SOTA Models
Content type: Discussion
news.ycombinator.com · 5 days ago · Hacker News
LLM-Based Visualization Evaluation: How Well Do Literacy-Stratified Personas Approximate Human Judgments?
🧠 LLMs
Content type: Academic
arxiv.org · 15 hours ago