Scour
LLM Evaluation
Specific
Benchmarks, Model Testing, Performance Metrics, HELM
Scoured 65 posts in 6.3 ms
What Does Abliteration Actually Cost?
🔄 Transformers · lesswrong.com · 6 days ago
A Controlled Study of Decoding-Time Truthfulness Methods on Instruction-Tuned LLMs
⚡ Inference Optimization · Content type: Academic · arxiv.org · 6 hours ago
Less-relevant results
The total number of possible chess games is so large that it exceeds the number of atoms in the observable universe — by some estimates, there are more possible chess games than there are atoms in approximately a trillion trillion trillion universes like ours — and despite this near-infinite possibility space, modern chess engines can now defeat any human grandmaster who has ever lived, in any opening position they care to attempt
🔲 TPU Architecture · spacedaily.com · 2 days ago · Hacker News
The State of LLM Evaluation (2026): Why Evals Became the New Unit Tests
🤖 LLM Agents · Content type: Blog · medium.com · 3 days ago
I built a dashboard ranking all 48 World Cup 2026 teams by travel difficulty
📐 Linear Algebra · jetlagxi.com · 2 days ago · r/SideProject
Researchers say they trained a foundation model from scratch for about $1,500
🎛️ Fine-Tuning · venturebeat.com · 12 hours ago
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
🔄 Transformers · Content type: Blog · huggingface.co · 6 days ago
USMNT World Cup bracket scenarios, odds to advance, predicted path to knockouts
📐 Linear Algebra · Content type: Video, News · espn.com · 23 hours ago
Mr Vegas World Cup offer 2026: Bet £10, Get £30 in free bets
🎯 RLHF · Content type: News · thesun.co.uk · 1 day ago
What does a reranker even do ?
🔍 RAG · Content type: Blog · anima-mundi.bearblog.dev · 6 days ago
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
⚡ Inference Optimization · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News
Bring your own evaluation framework to EvalHub
🔥 PyTorch Internals · developers.redhat.com · 2 days ago
Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation
🔧 MLIR · Content type: Academic · arxiv.org · 6 hours ago
Beat the Oracle
🔍 RAG · Content type: Code · github.com · 4 days ago · DEV
AI Governance Tools: How To Achieve Compliance and Visibility
🤖 LLM Agents · Content type: Blog · blog.n8n.io · 19 hours ago
Cybersecurity M&A Roundup: 26 Deals Announced in May 2026
🤖 agentic system · securityweek.com · 2 days ago
Context windows in AI: why every token is a budget decision
🔍 RAG · Content type: Blog · redis.io · 15 hours ago
FanGraphs Power Rankings: June 1–7
🎯 RLHF · Content type: News, Blog · blogs.fangraphs.com · 2 days ago
The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has
🤖 agentic system · xda-developers.com · 17 hours ago
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
⚡ Inference Optimization · Content type: Academic · arxiv.org · 2 days ago