PPTArena: A Benchmark for Agentic PowerPoint Editing


Abstract: We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images. Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific constraints. In our experiments, PPTPilot outperforms strong proprietary agents and frontier VLM systems by over 10 percentage points on compound, layout-sensitive, and cross-slide edits, with particularly large gains in visual fidelity and deck-wide consistency. Despite these improvements, existing agents still underperform on long-horizon, document-scale tasks in PPTArena, highlighting the remaining challenges in reliable PPT editing.
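
The abstract describes a plan-edit-check loop that routes each edit to either a high-level programmatic tool or a deterministic XML operation, but gives no implementation details. The following is only a toy sketch of that control flow under stated assumptions: the deck is modeled as a plain dictionary rather than a real .pptx file, and every name here (`Edit`, `apply_high_level`, `apply_low_level`, `run`) is hypothetical and not taken from PPTPilot.

```python
from dataclasses import dataclass
from typing import Callable

# Toy stand-in for a deck (assumption): slide index -> {shape name -> text}.
Deck = dict[int, dict[str, str]]
Check = Callable[[Deck], bool]


@dataclass
class Edit:
    slide: int
    shape: str
    text: str
    precise: bool = False  # True routes the step to the low-level path


def apply_high_level(deck: Deck, e: Edit) -> None:
    """Stand-in for a programmatic tool call; only edits shapes that already exist."""
    if e.shape in deck.get(e.slide, {}):
        deck[e.slide][e.shape] = e.text


def apply_low_level(deck: Deck, e: Edit) -> None:
    """Stand-in for a deterministic XML operation; creates the shape if it is missing."""
    deck.setdefault(e.slide, {})[e.shape] = e.text


def run(deck: Deck, plan: list[Edit], checks: list[Check], max_rounds: int = 3) -> bool:
    """Apply the plan, verify task constraints, and retry until they hold or rounds run out."""
    for _ in range(max_rounds):
        for e in plan:
            # Route each step to one of the two edit paths.
            (apply_low_level if e.precise else apply_high_level)(deck, e)
        if all(check(deck) for check in checks):
            return True
        # Naive repair strategy (assumption): retry every step via the precise path.
        plan = [Edit(e.slide, e.shape, e.text, precise=True) for e in plan]
    return False


deck: Deck = {0: {"title": "Q1 resuls"}}
plan = [Edit(0, "title", "Q1 Results"), Edit(0, "footer", "Confidential")]
checks: list[Check] = [
    lambda d: d[0].get("title") == "Q1 Results",
    lambda d: d[0].get("footer") == "Confidential",
]
ok = run(deck, plan, checks)
```

In this toy run the first round fixes the title but cannot add the missing footer through the high-level path, so the check fails and the second round retries through the low-level path. A real agent would replan from the failed checks rather than blindly rerouting every step.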


Comments:	25 pages, 26 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2512.03042 [cs.CV]
	(or arXiv:2512.03042v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2512.03042 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Michael Ofengenden
[v1] Tue, 2 Dec 2025 18:59:50 UTC (37,000 KB)
