Evaluating Long-Context Reasoning in LLM-Based WebAgents
arxiv.org·2d



Abstract: As large language model (LLM)-based agents become increasingly integrated into daily digital interactions, their ability to reason across long interaction histories becomes crucial for providing personalized and contextually aware assistance. However, the performance of these agents in long-context scenarios, particularly for action-taking WebAgents operating in realistic web environments, remains largely unexplored. This paper introduces a benchmark for evaluating the long-context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieval and application of informatio…
