🧪 Agent Evaluation - sworddish · Scour

REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

🧩Neural-Symbolic AI Academic

LLM Agent-Assisted Reverse Engineering with Quantitative Readability Metrics

💬LLMs Academic

Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures

🃏Imperfect Information Games Academic

H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions

💬LLMs Academic

Tree-of-Experience: A Structured Experience-Management Solution for Self-Evolving Agents under Low-Repetition and Implicit-Reward Environments

🌳Decision-Time Planning Academic

HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning

🌳Decision-Time Planning Academic

Self-evolving LLM agents with in-distribution Optimization

💬LLMs Academic

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

💬LLMs Academic

Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

💬LLMs Academic

TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

🌳Decision-Time Planning Academic

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

🃏Imperfect Information Games Academic

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

🃏Imperfect Information Games Academic

Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents

🧩Neural-Symbolic AI Academic

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

💬LLMs Academic

When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents

🌳Decision-Time Planning Academic

Assessing Automated Prompt Injection Attacks in Agentic Environments

🃏Imperfect Information Games Academic

Infini Memory: Maintainable Topic Documents for Long-Term LLM Agent Memory

💬LLMs Academic

Toward Agentic Governance: What Shapes LLM-Agent Intervention in Public Forums?

🃏Imperfect Information Games Academic

Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents

💬LLMs Academic

OpenSkill: Open-World Self-Evolution for LLM Agents

💬LLMs Academic
