🧪 Agent Evaluation - sworddish · Scour

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

🃏Imperfect Information Games Academic

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

💬LLMs Academic

Rosetta Memory: Adaptive Memory for Cross-LLM Agents

💬LLMs Academic

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

🃏Imperfect Information Games Academic

LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake

✓Formal Verification Academic

Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

💬LLMs Academic

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

🃏Imperfect Information Games Academic

Less-relevant results

Data Agents Under Attack: Vulnerabilities in LLM-Driven Analytical Systems

💬LLMs Academic

RedEdit: Agentic Red-Teaming of Image Safety Classifiers via MCTS-Guided Photo-Editing

🌳Decision-Time Planning Academic

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

🃏Imperfect Information Games Academic

Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration

🃏Imperfect Information Games Academic

Collective Hallucination in Multi-Agent LLMs: Modeling and Defense

🃏Imperfect Information Games Academic

TOKI: A Bitemporal Operator Algebra for Contradiction Resolution in LLM-Agent Persistent Memory

🃏Imperfect Information Games Academic

HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents

📐Formal Languages Academic

SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection

🃏Imperfect Information Games Academic

Hierarchical Certified Semantic Commitment for Byzantine-Resilient LLM-Agent Collaboration

✓Formal Verification Academic

Mind the Gap: Can Frontier LLMs Pass a Standardized Office Proficiency Exam?

📐Formal Languages Academic

Critic-Guided Heterogeneous Multi-Agent Reasoning for Reliable Mathematical Problem Solving

🧩Neural-Symbolic AI Academic

Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy

🃏Imperfect Information Games Academic

Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents

🃏Imperfect Information Games Academic
