📊 LLM Evals
Specific
evaluation, benchmarking, LLM testing, model assessment
Beyond English benchmarks: clinical llm evaluation in Brazilian Portuguese
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Less-relevant results
Claude Fable 5 is Here — Anthropic's Most Powerful Public Model Yet
🏗️ Agent Design Patterns · Content type: Blog · dev.to · 1 day ago · DEV
Flaws in the LLM Automation Narrative
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
One AI Vendor Is a Single Point of Failure. Treat It Like One.
💾 Agent Memory · Content type: Blog · dev.to · 4 days ago · DEV
Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity
⚙️ MLOps · Content type: Academic · arxiv.org · 2 days ago
Detect AI Agent Hallucinations: Zero-Shot Methods
🤖 AI Agents · Content type: Blog · dev.to · 6 days ago · DEV
The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning
🧠 LLMs · Content type: Academic · arxiv.org · 4 days ago
rag-explained-how-it-works
🔍 RAG · Content type: Blog · dev.to · 2 days ago · DEV
Voting Protocols as Coordination Mechanisms for Role-Constrained Multi-Agent Tutoring Systems
🎼 Agent Orchestration · Content type: Academic · arxiv.org · 3 days ago
The Search Engine Renaissance: How Apache Lucene and Elasticsearch Are Reclaiming the AI-Native Future
🔍 RAG · Content type: Blog · dev.to · 2 days ago · DEV
Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
🧠 LLMs · Content type: Academic · arxiv.org · 3 days ago
Gemma 4 makes on-device multimodal AI good enough to ship
🔐 AI Security · Content type: Blog · dev.to · 6 days ago · DEV
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs
⚙️ MLOps · Content type: Academic · arxiv.org · 3 days ago
Cline + LM Studio 2026: complete setup guide, the 32k context trap, and which coding models actually hold up
🌐 Open Source AI · Content type: Blog · dev.to · 5 hours ago · DEV
Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning
✍️ Prompt Engineering · Content type: Academic · arxiv.org · 2 days ago
Prompt Engineering Is Systems Design, Not a User Skill
🧠 LLMs · Content type: Blog · dev.to · 19 hours ago · DEV
When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference
⚙️ MLOps · Content type: Academic · arxiv.org · 3 days ago
I Built an Adversarial Eval Framework and Attacked 5 LLMs — Every Single One Failed
🌐 Open Source AI · Content type: Blog · dev.to · 4 days ago · DEV