LLM Evals
AI evaluation, benchmarking LLMs, model assessment, AI harness
Less is MoE: Trimming Experts in Domain-Specialist Language Models
MLOps · Academic · arxiv.org · 5 days ago
Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models
AI Research · Academic · arxiv.org · 6 days ago
Lightweight Language Models are Prone to Reasoning Errors for Complex Computational Phenotyping Tasks
MLOps · Academic · arxiv.org · 2 days ago
Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models
MLOps · Academic · arxiv.org · 6 days ago
Evidence Markets
AI Research · Academic · arxiv.org · 2 days ago
Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving
AI Research · Academic · arxiv.org · 5 days ago
PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents
AI Research · Academic · arxiv.org · 1 day ago
The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search
AI Research · Academic · arxiv.org · 2 days ago
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
AI Research · Academic · arxiv.org · 5 days ago
SemBlock: Semantic Boundary Dynamic Blocks for Diffusion LLMs
AI Research · Academic · arxiv.org · 6 days ago
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
AI Research · Academic · arxiv.org · 5 days ago
Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking
AI Research · Academic · arxiv.org · 5 days ago
Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models
AI Research · Academic · arxiv.org · 6 days ago