Agent Evaluation
ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents
💬LLMs Content type: Academic
LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake
✓Formal Verification Content type: Academic
Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads
💬LLMs Content type: Academic
POISE: Position-Aware Undetectable Skill Injection on LLM Agents
🃏Imperfect Information Games Content type: Academic
Less-relevant results
Data Agents Under Attack: Vulnerabilities in LLM-Driven Analytical Systems
💬LLMs Content type: Academic
3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🃏Imperfect Information Games Content type: Academic
Collective Hallucination in Multi-Agent LLMs: Modeling and Defense
🃏Imperfect Information Games Content type: Academic
Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?
📐Formal Languages Content type: Academic
Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents
🃏Imperfect Information Games Content type: Academic