LLM-Guided Reinforcement Learning with Representative Agents for Traffic Modeling

Abstract: Large language models (LLMs) are increasingly used as behavioral proxies for self-interested travelers in agent-based traffic models. Although more flexible and generalizable than conventional models, these approaches remain limited in practice by scalability, since one LLM must be called for every traveler. Moreover, LLM agents have been found to make opaque choices and produce unstable day-to-day dynamics. To address these challenges, we propose to model each homogeneous traveler group facing the same decision context with a single representative LLM agent who behaves like the population’s average, maintaining and updating a mixed strategy over routes that coincides with the group’s aggregate flow proportions. Each day, the LLM reviews the travel experience and flags, with positive reinforcement, the routes it hopes to use more often; an interpretable update rule then converts this judgment into strategy adjustments using a tunable (progressively decaying) step size. The representative-agent design improves scalability, and the separation of reasoning from updating clarifies the decision logic while stabilizing learning. In classic traffic assignment settings, we find that the proposed approach converges rapidly to the user equilibrium. In richer settings with income heterogeneity, multi-criteria costs, and multi-modal choices, the generated dynamics remain stable and interpretable, reproducing plausible behavioral patterns that are well documented in psychology and economics, such as the decoy effect in toll versus non-toll road selection and a higher willingness to pay for convenience among higher-income travelers when choosing between driving, transit, and park-and-ride options.
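The day-to-day loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule itself, so everything below is an assumption made for illustration: the two-route network and its linear costs in `route_costs`, the rule-based `flag_routes` that stands in for the LLM's judgment, the convex-combination form of `update_strategy`, and the `step0 / (day + 1)` decay schedule. It should not be read as the authors' method.

```python
# Illustrative sketch only: all names, costs, and the update rule are assumptions.

def route_costs(flows):
    """Hypothetical two-route network with linear congestion costs."""
    return [10.0 + 2.0 * flows[0], 20.0 + 1.0 * flows[1]]


def flag_routes(strategy, costs):
    """Stand-in for the LLM: flag routes cheaper than the average experienced cost."""
    average = sum(p * c for p, c in zip(strategy, costs))
    return [i for i, c in enumerate(costs) if c < average]


def update_strategy(strategy, flagged, step):
    """Shift probability mass toward flagged routes by a fraction `step`."""
    if not flagged:
        return list(strategy)  # nothing flagged: keep the current strategy
    share = 1.0 / len(flagged)
    target = [share if i in flagged else 0.0 for i in range(len(strategy))]
    return [(1.0 - step) * p + step * t for p, t in zip(strategy, target)]


def simulate(days=500, demand=10.0, step0=0.5):
    """Run the day-to-day loop for one representative agent."""
    strategy = [0.5, 0.5]  # mixed strategy = aggregate flow proportions
    for day in range(days):
        flows = [demand * p for p in strategy]
        costs = route_costs(flows)
        flagged = flag_routes(strategy, costs)
        step = step0 / (day + 1)  # progressively decaying step size
        strategy = update_strategy(strategy, flagged, step)
    final_costs = route_costs([demand * p for p in strategy])
    return strategy, final_costs


if __name__ == "__main__":
    strategy, costs = simulate()
    print("strategy:", strategy)
    print("route costs:", costs)
```

In this toy network the user equilibrium places two thirds of the demand on the first route, where both route costs are equal; the sketch drifts toward that split as the step size shrinks. Replacing `flag_routes` with a call to a language model would leave the update step unchanged, which is the separation of reasoning from updating that the abstract describes.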


Subjects:	Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Cite as:	arXiv:2511.06260 [cs.GT]
	(or arXiv:2511.06260v1 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.2511.06260 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jiayang Li
[v1] Sun, 9 Nov 2025 07:36:46 UTC (353 KB)
