📈 Benchmarking
Model Evaluation, Performance Metrics, MMLU, HumanEval
Scoured 39 posts in 5.7 ms
Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation
🤖 AI · Content type: Academic · arxiv.org · 2 days ago
What Does Abliteration Actually Cost?
🤖 LLM · lesswrong.com · 6 days ago
Researchers say they trained a foundation model from scratch for about $1,500
🤖 LLM · venturebeat.com · 8 hours ago
The biggest local LLM on your machine is useless if it can't call a single tool, no matter how many parameters it has
🤖 LLM · xda-developers.com · 13 hours ago
Adarsh Divakaran: Building AI Agents in Python
🤖 LLM · Content type: Blog · blog.adarshd.dev · 6 days ago
Context windows in AI: why every token is a budget decision
🤖 LLM · Content type: Blog · redis.io · 11 hours ago
LLM Research Papers: The 2026 List (January to May)
🤖 LLM · Content type: News · magazine.sebastianraschka.com · 4 days ago · Hacker News
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
🤖 LLM · huggingface.co · 6 days ago · Hacker News, Hacker News, r/LocalLLaMA
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
🤖 AI · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News
Multilingual Refusal Alignment for Safer Large Language Models
🤖 LLM · Content type: Academic · arxiv.org · 2 days ago
Why Shrinking an AI Model Often Makes It More Useful
🤖 LLM · siliconopera.com · 3 days ago
Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
🤖 LLM · latent.space · 6 days ago · Hacker News
Back on Track: Aligning Rewards and States for Reasoning in Diffusion Large Language Models
🤖 LLM · Content type: Academic · arxiv.org · 2 days ago
🧾 Weekly Wrap Sheet (06/05/2026): Prospectuses & Platforms
💰 Finance · Content type: News, Blog · saanyaojha.substack.com · 3 days ago · Substack
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
🤖 LLM · Content type: Academic · arxiv.org · 2 days ago
Revisiting GSM-Symbolic: Do 2026 Frontier Models Still Fail at Confounded Grade School Math?
🤖 LLM · lesswrong.com · 5 days ago
When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference
🤖 LLM · Content type: Academic · arxiv.org · 2 days ago
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
🤖 LLM · Content type: Blog · huggingface.co · 6 days ago
Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity
🤖 LLM · Content type: Academic · arxiv.org · 1 day ago
Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs
🤖 LLM · lesswrong.com · 5 days ago