Agent Evaluation

REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

🧩 Neural-Symbolic AI
Content type: Academic
arxiv.org

LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics

💬 LLMs
Content type: Academic
arxiv.org

Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures

🃏 Imperfect Information Games
Content type: Academic
arxiv.org

H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions

💬 LLMs
Content type: Academic
arxiv.org

Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

🌳 Decision-Time Planning
Content type: Academic
arxiv.org

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

🌳 Decision-Time Planning
Content type: Academic
arxiv.org

Self-evolving LLM agents with in-distribution Optimization

💬 LLMs
Content type: Academic
arxiv.org

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

💬 LLMs
Content type: Academic
arxiv.org

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

💬 LLMs
Content type: Academic
arxiv.org

TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

🌳 Decision-Time Planning
Content type: Academic
arxiv.org

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

🃏 Imperfect Information Games
Content type: Academic
arxiv.org

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

🃏 Imperfect Information Games
Content type: Academic
arxiv.org

Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

🧩 Neural-Symbolic AI
Content type: Academic
arxiv.org

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

💬 LLMs
Content type: Academic
arxiv.org

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

🌳 Decision-Time Planning
Content type: Academic
arxiv.org

Assessing Automated Prompt Injection Attacks in Agentic Environments

🃏 Imperfect Information Games
Content type: Academic
arxiv.org

Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

💬 LLMs
Content type: Academic
arxiv.org

Toward Agentic Governance: What Shapes LLM-Agent Intervention in Public Forums?

🃏 Imperfect Information Games
Content type: Academic
arxiv.org

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

💬 LLMs
Content type: Academic
arxiv.org

OpenSkill: Open-World Self-Evolution for LLM Agents

💬 LLMs
Content type: Academic
arxiv.org
