ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning

Abstract: Reliable geospatial information on road accidents is vital for safety analysis and infrastructure planning, yet most low- and middle-income countries continue to face a critical shortage of accurate, location-specific crash data. Existing text-based geocoding tools perform poorly in multilingual and unstructured news environments, where incomplete place descriptions and mixed Bangla-English scripts obscure spatial context. To address these limitations, this study introduces ALIGN (Accident Location Inference through Geo-Spatial Neural Reasoning), a vision-language framework that emulates human spatial reasoning to infer accident coordinates directly from textual and map-based cues. ALIGN integrates large language and vision-language models within a multi-stage pipeline that performs optical character recognition, linguistic reasoning, and map-level verification through grid-based spatial scanning. The framework systematically evaluates each predicted location against contextual and visual evidence, yielding interpretable, fine-grained geolocation outcomes without requiring model retraining. Applied to Bangla-language news data, ALIGN demonstrates consistent improvements over traditional geoparsing methods, accurately identifying crash sites at the district and sub-district level. Beyond its technical contribution, the framework establishes a high-accuracy foundation for automated crash mapping in data-scarce regions, supporting evidence-driven road-safety policymaking and the broader integration of multimodal artificial intelligence in transportation analytics. The code for this paper is open-source and available at: this https URL
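The abstract does not say how the grid-based spatial scanning is implemented, so the following is only a minimal sketch of one plausible reading: split a map region into a grid, score each cell, keep the best one, and repeat at a finer scale. Every name here (BBox, split_grid, grid_scan, toy_score) is hypothetical and not taken from the ALIGN code. The scoring function in particular is a toy stand-in; in a pipeline like the one described, a vision-language model judging a rendered map tile against the news text would presumably take that role.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BBox:
    """Latitude/longitude bounding box (hypothetical helper type)."""
    south: float
    west: float
    north: float
    east: float

    def center(self) -> Tuple[float, float]:
        return ((self.south + self.north) / 2, (self.west + self.east) / 2)


def split_grid(box: BBox, rows: int, cols: int) -> List[BBox]:
    """Divide a bounding box into rows x cols equally sized cells."""
    dlat = (box.north - box.south) / rows
    dlon = (box.east - box.west) / cols
    return [
        BBox(
            box.south + r * dlat,
            box.west + c * dlon,
            box.south + (r + 1) * dlat,
            box.west + (c + 1) * dlon,
        )
        for r in range(rows)
        for c in range(cols)
    ]


def grid_scan(
    box: BBox,
    score_cell: Callable[[BBox], float],
    rows: int = 3,
    cols: int = 3,
    depth: int = 4,
) -> BBox:
    """Coarse-to-fine search: keep the best-scoring cell at each level."""
    for _ in range(depth):
        box = max(split_grid(box, rows, cols), key=score_cell)
    return box


# Toy scorer (assumption): prefers cells whose center is close to a known
# point. A real system would score visual and textual evidence instead.
TARGET = (23.81, 90.41)


def toy_score(cell: BBox) -> float:
    lat, lon = cell.center()
    return -((lat - TARGET[0]) ** 2 + (lon - TARGET[1]) ** 2)


if __name__ == "__main__":
    region = BBox(south=20.0, west=88.0, north=27.0, east=93.0)
    cell = grid_scan(region, toy_score)
    print(cell.center())
```

A greedy search of this kind evaluates only rows x cols cells per level rather than every fine-grained cell, at the cost of committing to one branch at each scale; whether ALIGN makes the same trade-off is not stated in the abstract.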


Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2511.06316 [cs.AI]
	(or arXiv:2511.06316v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2511.06316 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: MD Thamed Bin Zaman Chowdhury
[v1] Sun, 9 Nov 2025 10:44:26 UTC (3,621 KB)
