📈 Benchmarking
Model Evaluation, Performance Metrics, MMLU, HumanEval
Scoured 39 posts in 11.3 ms
MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models
LLM · Academic · arxiv.org · 2 days ago
FASE: Fast Adaptive Semantic Entropy for Code Quality
LLM · Academic · arxiv.org · 2 days ago
Attention-Discounted Adaptive Sampler for Masked Diffusion Language Models
AI · Academic · arxiv.org · 1 day ago
Null-Space Constrained Low-Rank Adaptation for Response-Specified Large Language Model Unlearning
LLM · Academic · arxiv.org · 1 day ago
Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling
AI · Academic · arxiv.org · 1 day ago
UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding
LLM · Academic · arxiv.org · 3 days ago
Voting Protocols as Coordination Mechanisms for Role-Constrained Multi-Agent Tutoring Systems
AI · Academic · arxiv.org · 2 days ago
CodeAlchemy: Synthetic Code Rewriting at Scale
AI · Academic · arxiv.org · 1 day ago
Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving
LLM · Academic · arxiv.org · 6 days ago
Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning
LLM · Academic · arxiv.org · 1 day ago
From 0-to-1 to 1-to-N: Reproducible Engineering Evidence for MetaAI Recursive Self-Design
AI · Academic · arxiv.org · 2 days ago
Scaffold, Not Vocabulary? A Controlled, Two-Tier, Pre-Registered Study of a Popperian Code-Generation Skill
LLM · Academic · arxiv.org · 6 days ago
PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents
AI · Academic · arxiv.org · 2 days ago
Less is MoE: Trimming Experts in Domain-Specialist Language Models
AI · Academic · arxiv.org · 6 days ago
The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning
LLM · Academic · arxiv.org · 3 days ago
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
AI · Academic · arxiv.org · 6 days ago
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
AI · Academic · arxiv.org · 6 days ago
Self-Commitment Latency: A Reward-Free Probe for Prompted Implicit Hacking
AI · Academic · arxiv.org · 6 days ago
SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation
AI · Academic · arxiv.org · 6 days ago