Evaluating LLMs' Reasoning Over Ordered Procedural Steps
arxiv.org·3h



Abstract: Reasoning over procedural sequences, where the order of steps directly impacts outcomes, is a critical capability for large language models (LLMs). In this work, we study the task of reconstructing globally ordered sequences from shuffled procedural steps, using a curated dataset of food recipes, a domain where correct sequencing is essential for task success. We evaluate several LLMs under zero-shot and few-shot settings and present a comprehensive evaluation framework that adapts established metrics from ranking and sequence alignment. These include Kendall’s Tau, Normalized Longest Common Subsequence…
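
To make the two metrics named in the abstract concrete, here is a minimal Python sketch of how a predicted step ordering could be scored against a reference ordering. It is an illustration only, not the paper's evaluation code: the function names `kendall_tau` and `normalized_lcs`, the toy recipe steps, the tau-a variant (no ties), and the choice to normalize the LCS length by the longer sequence are all assumptions, since the abstract does not specify them.

```python
from itertools import combinations


def kendall_tau(predicted, reference):
    """Kendall's tau-a between two orderings of the same distinct steps.

    Assumption: both lists contain exactly the same items, with no ties.
    """
    pos = {step: i for i, step in enumerate(predicted)}
    # Predicted positions, listed in reference order.
    ranks = [pos[step] for step in reference]
    n = len(ranks)
    if n < 2:
        return 1.0
    total = n * (n - 1) // 2
    # A pair is concordant when both orderings agree on which step comes first.
    concordant = sum(1 for a, b in combinations(ranks, 2) if a < b)
    discordant = total - concordant
    return (concordant - discordant) / total


def normalized_lcs(predicted, reference):
    """LCS length divided by the longer sequence length (assumed normalization)."""
    m, n = len(predicted), len(reference)
    if max(m, n) == 0:
        return 1.0
    # Standard dynamic-programming table for longest common subsequence.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if predicted[i - 1] == reference[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n] / max(m, n)


# Hypothetical four-step recipe; the model swaps the first two steps.
reference = ["chop", "saute", "simmer", "serve"]
predicted = ["saute", "chop", "simmer", "serve"]

print(round(kendall_tau(predicted, reference), 3))  # 0.667
print(normalized_lcs(predicted, reference))         # 0.75
```

In this toy case one of the six step pairs is inverted, so Kendall's Tau is (5 - 1) / 6, while the longest subsequence in the correct relative order has three of the four steps. The two scores penalize the same swap differently, which is one reason an evaluation might report both.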
