September-December 2025
Introduction
Non-Traditional Data (NTD) — data that is digitally captured, mediated, or observed through sources such as satellites, sensors, online platforms, mobility traces, and crowdsourcing — continues to feature in public-interest research and decision support as a complement to traditional statistics and administrative records. Earlier updates in this series documented how these data sources have been used in areas such as public health, economic measurement, urban systems and mobility, environmental monitoring, and governance. This edition adds further evidence on where NTD is now being applied most consistently and the types of questions it is being used to address.
This update, covering work published between September and December 2025, is organized around the following thematic areas:
- Public Health Intelligence and Surveillance;
- Crisis Response and Humanitarian Decision Support;
- Environment, Climate Resilience, and Air Quality;
- Urban Systems, Mobility, and Public Space;
- Online Political Sentiment;
- Economic Opportunity and Labor Dynamics; and
- Migration.
Across these areas, the cases draw on a recurring set of non-traditional data sources, including:
Open-source digital information
- News media, online reports, blogs, and transcribed radio broadcasts (e.g. epidemic intelligence systems)
- Public web content scraped to detect emerging signals and trends
Wastewater and environmental biosignals
- Viral RNA measurements from wastewater at treatment plants and upstream sewer catchments
- Multi-pathogen wastewater surveillance (COVID-19, influenza, RSV, norovirus)
Online behavior and digital traces
- Search engine query data (e.g. symptom- and disease-related searches)
- Social media posts, comments, engagement metrics, and timestamps
- Platform activity metadata linked (with consent) to cohort or survey data
Crowdsourced and community-generated data
- Citizen reports submitted via mobile phones during disasters (flooding, landslides, infrastructure damage)
- Participatory mapping contributions, photos, and local observations
Mobile network and location data
- Aggregated, anonymized mobile phone connectivity data to infer population movement and evacuation behavior
- Platform-based mobility data (e.g. traffic and routing applications)
Remote sensing and satellite imagery
- Optical, radar, and thermal satellite data used to assess disaster damage; agricultural disruption in conflict zones; and road quality, passability, and infrastructure change
- AI-derived features extracted from high-resolution imagery
Low-cost sensor data
- Mobile air-quality sensors carried by delivery workers or mounted on vehicles
- High-frequency environmental measurements outside fixed monitoring networks
Platform-generated economic and labor data
- Aggregated professional profile and career-transition data from online labor platforms
- Transactional data from supermarket loyalty card programs
Operational and administrative by-product data
- Border management systems, asylum case records, service-provider datasets
- NGO and humanitarian operational data not originally collected for statistical purposes
Taken together, the cases offer a snapshot of recent practice in the reuse of non-traditional data for public-interest analysis. They illustrate how different data sources are being applied in specific institutional and geographic contexts, without seeking to draw broader conclusions about effectiveness, impact, or long-term sustainability.
Public Health Intelligence and Surveillance
World Health Organization. “WHO Upgrades Its Public Health Intelligence System to Boost Global Health Security.” October 13, 2025. https://www.who.int/news/item/13-10-2025-who-upgrades-its-public-health-intelligence-system-to-boost-global-health-security
**Focus: **Announces the launch of version 2.0 of the Epidemic Intelligence from Open Sources (EIOS) system, WHO’s global platform for early detection and monitoring of emerging public health threats.
**Role of Non-Traditional Data: **EIOS analyzes large volumes of publicly available information, including news media, social media, online reports, and transcribed radio broadcasts, using automated analytics and multilingual processing. The WHO operates EIOS as a public good, with more than 110 national public health agencies using the system to identify early signals that may not yet appear in laboratory or clinical data. The upgraded version integrates additional open-source data streams and applies automated analytics to help users identify, verify, and assess potential health threats.
**Why This Matters: **The updated system expands the scale, diversity, and timeliness of global health intelligence available to countries. New features, including AI-supported signal detection, multilingual access, the ability to analyze additional sources such as radio broadcasts, and improved collaborative analysis, strengthen the ability of national and international actors to monitor health risks linked to disease emergence, climate impacts, and conflict.
Bebinger, Martha. “Boston-Based AI Disease Tracker Aims to Be an ‘Alarm Bell’ as the Trump Administration Severs Global Health Ties.” WBUR, September 12, 2025. https://www.wbur.org/news/2025/09/12/boston-ai-biothreat-tracker-beacon-cdc-diseases-global-health
**Focus: **Profiles the development and early deployment of BEACON, an independent disease surveillance platform designed to detect and communicate emerging public health threats.
**Role of Non-Traditional Data: **Researchers at Boston University designed BEACON to aggregate publicly available web-based information, including news reports, online sources, and user submissions, alongside inputs from a global network of infectious disease experts. Automated web scraping generates roughly half of all alerts, while expert contributions and verified public reports account for the remainder. Medical and public health professionals review all signals before the platform publishes them in a continuously updated, open-access feed that maps outbreaks across countries and pathogens.
**Why This Matters: **BEACON shortens the time between initial reporting of disease events and public communication. In its first months of operation, the platform produced hundreds of alerts across multiple pathogens and regions, demonstrating its ability to operate at global scale. The system provides openly accessible outbreak intelligence that supports monitoring and decision-making even when traditional surveillance capacity is reduced or disrupted.
Oswald, Claire, Stephanie Melles, Kimberley Gilbride, Eyerusalem Goitom, Sarah Ariano, Alexandra Johnston, Eden Hataley, Amir Tehrani, Nora Dannah, Hussain Aqeel, Christopher Wellen, James Li & Steven Liss. “Identification of Sentinel Upstream Community Sites for Wastewater Surveillance of SARS-CoV-2 in a Large Urban Area.” Water Research, Volume 284, 123958. September 15, 2025. https://doi.org/10.1016/j.watres.2025.123958
**Focus: **Evaluates how upstream wastewater sampling locations improve neighborhood-level detection of COVID-19 transmission.
**Role of Non-Traditional Data: **The research team analyzed high-frequency viral RNA measurements from untreated wastewater collected across multiple upstream sewer catchments in Toronto. The analysis combined wastewater signals with geospatial sewer network data and community marginalization indices to assess where wastewater signals most closely aligned with reported clinical cases. Results show that upstream sites, particularly in dense and socially vulnerable communities with shorter pipe lengths, captured infection dynamics earlier and more clearly than centralized treatment plants.
**Why This Matters: **Wastewater surveillance performance depends in part on where data collection occurs and how infrastructure and social context shape the collected signals. Accounting for these factors can improve early warning capacity and support more equitable surveillance design.
Amirali, Ayaaz, Mark E. Sharkey, Shruti Choudhary, Kristina M. Babler, Cynthia C. Beaver, Pratim Biswas, Kate R. Bowie, Taylor Burke, Benjamin B. Currall, George S. Grills, Hannah G. Healy, Alexander G. Lucaci, Christopher E. Mason, Michaela McGuire, Rosemarie Ramos, Madelena Ruedaflores, Natasha Schaefer Solle, Stephan C. Schürer, Bhavarth S. Shukla, Mario Stevenson, Helena M. Solo-Gabriele. “Long-Term Assessment of SARS-CoV-2 in Wastewater and the Transition to Evaluate Additional Viral Targets.” Science of the Total Environment, Volume 995 (2025), 180096. September 15, 2025. https://doi.org/10.1016/j.scitotenv.2025.180096
**Focus: **Assesses the long-term reliability of wastewater-based surveillance and examines the feasibility of expanding surveillance beyond SARS-CoV-2 to additional viral targets.
**Role of Non-Traditional Data: **Researchers analyzed nearly four years of wastewater viral concentration data collected from three regional treatment plants in Miami-Dade County and processed by multiple laboratories. The study compared wastewater signals with several clinical health metrics across time and geography. The analysis also expanded monitoring to influenza A and B, norovirus, respiratory syncytial virus, and human metapneumovirus to evaluate multi-pathogen surveillance capacity.
**Why This Matters: **Long-term evidence on wastewater surveillance remains limited despite widespread adoption. The study shows that correlations with disease prevalence vary by geography, population mobility, variant dynamics, and choice of clinical comparator. Consistent detection of multiple pathogens across laboratories supports the feasibility of distributed surveillance systems while underscoring the importance of spatial and temporal context.
Deshpande, Gargi, Bijay Rimal, Kristen Shelton, Jason Vogel, Bradley Stevenson, Katrin Gaardbo Kuhn. “Wastewater-Based Surveillance for Influenza and Respiratory Syncytial Virus: Insights from a 21-Month Study in Oklahoma.” Epidemics, Volume 53, 100861. October 2025. https://doi.org/10.1016/j.epidem.2025.100861
**Focus: **Tests whether wastewater surveillance can track seasonal circulation of influenza and RSV at statewide scale.
**Role of Non-Traditional Data: **The study analyzed weekly wastewater samples from 18 treatment plants across Oklahoma to measure viral RNA concentrations for influenza A, influenza B, and RSV over a 21-month period. Researchers compared these signals with hospitalization data and test positivity across urban, rural, and underserved communities.
**Why This Matters: **Clinical surveillance often undercounts mild or asymptomatic infections and lags behind transmission trends. Wastewater data offers a population-representative view of respiratory virus circulation across large and diverse regions that does not depend on individual testing or care-seeking behavior. Statewide surveillance supports more comprehensive situational awareness, particularly in communities with limited access to healthcare or testing.
Farkas, Kata, Devrim Kaya, Rasha Maal-Bared, Ahmad I. Al-Mustapha, Sarmila Tandukar, Ishi Keenum, Teemu Gunnar, Aaron Bivins, Matthew J. Wade, Kyle Bibby, Tarja M. Pitkänen & Ananda Tiwari. “Communicating Wastewater-Based Surveillance Data to Drive Action.” Journal of Water and Health, Volume 23, Issue 9. September 1, 2025. https://doi.org/10.2166/wh.2025.080
**Focus: **Examines how communication practices shape the public health value of wastewater surveillance data.
**Role of Non-Traditional Data: **The study reviews how public health agencies, utilities, and research institutions generate and share aggregated wastewater pathogen signals through dashboards, standardized reports, online repositories, and near-real-time data feeds. These systems link laboratory outputs with decision-makers across local, national, and international contexts.
**Why This Matters: **Effective public health surveillance depends not only on data collection but on whether decision-makers and communities can interpret and use the information. Inconsistent formats, unclear thresholds, and fragmented digital infrastructure limit the usability of wastewater surveillance. Moving away from treating laboratory results as standalone scientific outputs, towards integrating wastewater data into standardized digital communication systems, can improve interpretation, coordination, and trust across sectors as countries institutionalize wastewater monitoring beyond COVID-19.
Alshahrani, Abdulrahman M., Areej A. Alahmadi, Fahad S. Alzahrani, and Saeed S. Alqahtani. “Enhancing the Accuracy of COVID-19 Incidence and Mortality Predictions Using Google Trends Data Across the 50 U.S. States and the District of Columbia.” Data & Policy, Volume 7, e77. November 3, 2025. https://www.cambridge.org/core/journals/data-and-policy/article/enhancing-the-accuracy-of-covid19-incidence-and-mortality-predictions-using-google-trends-data-across-the-50-us-states-and-the-district-of-columbia/A855E95979B30B4130F14C8FBBF3F0BD
**Focus: **Tests whether online search behavior can improve short-term predictions of COVID-19 incidence and mortality.
**Role of Non-Traditional Data: **The authors used aggregated Google Trends queries related to symptoms, testing, and disease awareness at the state level. The study integrated these indicators into predictive models alongside reported case and death data and evaluated multiple lag structures.
**Why This Matters: **Search behavior reflects population concern and symptom experience earlier than formal reporting. Incorporating these signals can improve predictive accuracy during periods of rapid change and illustrates how behavioral data can complement epidemiological surveillance.
Dahiya, Liza & Rachit Bagga. “Digital Epidemiology: Leveraging Social Media for Insight into Epilepsy and Mental Health.” Journal of Computational Social Science, Volume 9, Article 1. November 4, 2025. https://doi.org/10.1007/s42001-025-00402-x
**Focus: **Analyzes social media discussions to identify mental health risks and support needs among people with epilepsy and their caregivers.
**Role of Non-Traditional Data: **The researchers analyzed approximately 57,000 posts and over 530,000 comments from Reddit’s r/Epilepsy community over three years. The dataset captures unsolicited expressions of symptoms, treatment experiences, emotional distress, and caregiving challenges. Researchers used text analysis to extract themes, temporal patterns, and indicators of depression across demographic groups.
**Why This Matters: **Mental health comorbidities in epilepsy often remain under-detected in clinical settings. By linking linguistic patterns and engagement metrics to reported experiences across age, gender, and caregiver relationships, researchers can use social media data to provide complementary insight into emerging concerns and vulnerable groups, particularly younger adults and caregivers.
Joinson, David, Claire M. A. Haworth, Emma Simpson, Nello Cristianini, Nicholas H. Di Cara, & Oliver S. P. Davis. “Active Night-Time Tweeting Is Associated with Meaningfully Lower Mental Wellbeing in a UK Birth Cohort Study.” Scientific Reports 15: 34301. October 9, 2025. https://doi.org/10.1038/s41598-025-14745-y
**Focus: **Examines associations between night-time social media use and mental wellbeing in adults.
**Role of Non-Traditional Data: **The research team linked Twitter activity metadata, shared with participants’ consent, to records from a UK longitudinal birth cohort. The study used tweet timestamps to calculate each participant’s average posting time in the two weeks preceding validated mental health assessments. The researchers combined these behavioral traces with longitudinal cohort data to analyze associations between digital activity patterns and mental health outcomes.
**Why This Matters: **Platform-generated behavioral traces allow researchers to study mental health risks that self-report methods often miss. The findings are directly relevant to ongoing policy debates on digital wellbeing, platform design, and online safety regulation.
Burgess, Romana, Ayesha Suhag, and Anna Skatova. “Exploring the use of supermarket loyalty card data in health research: A scoping review.” Public Health, Volume 247. October 2025. https://doi.org/10.1016/j.puhe.2025.105848
**Focus: **Synthesizes evidence on how researchers use supermarket loyalty card data to study health outcomes and inequalities.
**Role of Non-Traditional Data: **The review covers 44 studies that used loyalty card transaction data to infer dietary patterns, alcohol and tobacco consumption, medication use, and responses to policy interventions such as sugar taxes and pricing reforms. Many studies linked retail data with surveys, deprivation indices, food composition databases, or health records.
**Why This Matters: **Loyalty card data enable longitudinal observation of health-related behavior at population scale. The review highlights both the analytical value and governance challenges involved in integrating private-sector data into public health research and decision-making.
Crisis Response and Humanitarian Decision Support
PetaBencana.id. “Understanding Sumatra’s Extreme Floods and How Communities Are Responding” and “Bali Under Water: Communities Map Floods in Real Time to Guide Evacuations.” PetaBencana Blog. 2025. https://blog.petabencana.id/category/uncategorized
**Focus: **Documents community-led flood mapping that supported evacuations and response operations during major flood events in Sumatra and Bali in late 2025.
**Role of Non-Traditional Data: **Residents submitted geolocated reports through PetaBencana.id during unfolding emergencies. Contributors reported flooded roads, blocked bridges, landslides, rising water levels, and unsafe zones via mobile devices. The platform aggregated reports into a live public map that reflected on-the-ground conditions as infrastructure failed and official updates lagged. Emergency services and volunteer groups used the map alongside official channels to identify priority evacuation areas, deploy rescue assets, coordinate sandbagging, and manage traffic.
**Why This Matters: **Flood response requires timely, granular situational awareness when conditions change faster than official alerts can circulate. Community-generated reports can fill critical information gaps and provide operationally relevant detail that traditional monitoring systems often cannot deliver during fast-moving disasters.
Abid, Sheikh Kamran, Ruhizal Roosli, Umber Nazir, and Nur Shazwani Kamarudin. “AI-Enhanced Crowdsourcing for Disaster Management: Strengthening Community Resilience Through Social Media.” International Journal of Emergency Medicine 18, Article 201. October 13, 2025. https://doi.org/10.1186/s12245-025-01009-9
**Focus: **Synthesizes evidence on how machine learning and social media crowdsourcing can improve disaster management and strengthen community resilience.
**Role of Non-Traditional Data: **The study examines citizen-generated social media content shared during disasters, including text, images, video, and location-tagged updates. The authors review how machine learning methods can sort, classify, and prioritize these data streams to support preparedness and response, with particular attention to disaster cooperation in Pakistan. The paper highlights use cases where automated processing can help identify urgent needs, improve situational assessments, and support coordination between communities and response organizations.
**Why This Matters: **Many disaster systems struggle to convert high-volume citizen reporting into actionable intelligence. The study shows how automated analytics can connect community-generated signals to decision-making in rapidly evolving emergencies. The Pakistan focus reflects broader patterns across the Global South, where responders increasingly rely on AI-enabled tools to make crowdsourced information usable at operational speeds.
Elejalde, Erick, Timur Naushirvanov, Kyriaki Kalimeri, Elisa Omodei, Márton Karsai, Loreto Bravo, and Leo Ferres. “Use of Mobile Phone Data to Measure Behavioral Response to SMS Evacuation Alerts.” International Journal of Disaster Risk Reduction, Volume 131, Article 105919. December 2025. https://doi.org/10.1016/j.ijdrr.2025.105919
**Focus: **Measures real-time evacuation behavior following emergency SMS alerts during the February 2024 wildfires in Valparaíso, Chile.
**Role of Non-Traditional Data: **The researchers used anonymized mobile network data from approximately 580,000 devices to infer population movement before and after evacuation alerts. Changes in mobile tower connectivity served as a proxy for evacuation timing, intensity, and recovery. The study evaluated spillover movement into non-warned areas and examined differences across socioeconomic groups. The analysis revealed patterns consistent with alert fatigue, voluntary evacuation outside targeted zones, and unequal capacity to evacuate and return.
**Why This Matters: **Emergency managers often lack direct evidence on whether alerts trigger movement and which communities face constraints in responding. Mobile network data enables high-frequency observation of evacuation behavior at operational timescales. The findings show how repeated alerts can reduce responsiveness, how evacuation extends beyond designated zones, and how socioeconomic differences shape evacuation and recovery outcomes.
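The idea of using tower connectivity as a proxy for evacuation can be made concrete with a simple before-and-after comparison: count the devices connected to each tower in a baseline window, count them again after the alert, and express the difference as a fraction of the baseline. The sketch below assumes hypothetical tower identifiers and counts and a helper named `relative_change`; it is not the study's method, which would also need to handle baselines by hour and weekday, network outages, and anonymization constraints.

```python
def relative_change(baseline_counts, observed_counts):
    """Per-tower fractional change in connected devices versus a baseline."""
    changes = {}
    for tower, baseline in baseline_counts.items():
        if baseline <= 0:
            continue  # no usable baseline for this tower
        # A tower missing from the observed window is treated as zero devices.
        observed = observed_counts.get(tower, 0)
        changes[tower] = (observed - baseline) / baseline
    return changes


# Hypothetical device counts per tower, before and after an alert.
baseline = {"warned_zone": 1000, "neighbor_zone": 800, "distant_zone": 500}
after_alert = {"warned_zone": 400, "neighbor_zone": 1000, "distant_zone": 500}

changes = relative_change(baseline, after_alert)
for tower, change in changes.items():
    print(f"{tower}: {change:+.0%}")
```

A sharp drop in the warned zone paired with a rise next door is the kind of pattern that would be read as movement out of the alerted area. A drop alone is ambiguous, since damaged towers or power cuts also reduce connections.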
Ramachandran, Anu, Akash Yadav, and Andrew Schroeder. “Implementation of Remote-Sensing Models to Identify Post-Disaster Health Facility Damage: Comparative Approaches to the 2023 Earthquake in Turkey.” PLOS Digital Health, Volume 4, Issue 10, e0001060. October 27, 2025. https://doi.org/10.1371/journal.pdig.0001060
**Focus: **Evaluates AI-based damage detection models that estimate health facility damage from post-disaster imagery after the 2023 earthquake in Türkiye.
**Role of Non-Traditional Data: **The researchers combined AI-generated building damage estimates with open and semi-crowdsourced health facility location data, including hospitals, dialysis centers, and pharmacies. Two machine learning models produced building-level damage outputs from post-event imagery. The team intersected model outputs with facility location points to estimate likely facility damage, and it tested both facility-level overlays and spatially aggregated approaches. Open mapping sources expanded facility coverage, particularly for pharmacies that official post-disaster inventories often omit.
**Why This Matters: **Response teams need early insight into damage to health infrastructure to allocate medical resources and plan recovery, yet ground assessments often take weeks. The study shows that AI-derived damage layers combined with facility location data can generate scalable early indicators and support prioritization. The results also show that spatially aggregated approaches can improve performance, while current models still fall short of replacing on-the-ground validation.
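The difference between a facility-level overlay and a spatially aggregated approach can be illustrated with a toy example. In the sketch below, buildings are reduced to points carrying a model damage score, a facility is matched either to the nearest scored building within a tolerance or to the mean score of its grid cell. Everything here is an assumption for illustration: the coordinates (treated as projected metres), the scores, the 5 m tolerance, the 100 m cell size, and both function names. The study worked with real model outputs and facility datasets, and its matching rules may differ.

```python
import math
from collections import defaultdict


def nearest_building_damage(facility, buildings, max_dist):
    """Facility-level overlay: score of the closest building within max_dist."""
    best = None
    for bx, by, score in buildings:
        dist = math.hypot(bx - facility[0], by - facility[1])
        if dist <= max_dist and (best is None or dist < best[0]):
            best = (dist, score)
    return None if best is None else best[1]


def cell_mean_damage(buildings, cell_size):
    """Spatial aggregation: mean damage score per square grid cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for bx, by, score in buildings:
        key = (int(bx // cell_size), int(by // cell_size))
        totals[key][0] += score
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical (x, y, damage score) triples and one facility location.
buildings = [(10, 10, 0.9), (40, 20, 0.1), (60, 70, 0.8), (130, 40, 0.2)]
facility = (12, 11)

point_score = nearest_building_damage(facility, buildings, max_dist=5)
cells = cell_mean_damage(buildings, cell_size=100)
area_score = cells[(int(facility[0] // 100), int(facility[1] // 100))]
print(point_score, round(area_score, 2))
```

The point match depends entirely on one building's score and on the facility coordinates being accurate, while the cell average smooths over both kinds of error at the cost of spatial detail.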
Heidelberg Institute for Geoinformation Technology (HeiGIT). “New Global Satellite Dataset for Humanitarian Routing and Tracking Infrastructure Change.” EurekAlert! November 19, 2025. https://www.eurekalert.org/news-releases/1106576
**Focus: **Introduces a global dataset that characterizes road surface quality, width, and change over time to support humanitarian routing and infrastructure monitoring.
**Role of Non-Traditional Data: **HeiGIT built the dataset from high-resolution commercial satellite imagery from PlanetScope and applied machine learning to classify road surface type and width across 9.2 million kilometers of routes worldwide. The pipeline generates a Humanitarian Passability Score that estimates accessibility under varying conditions. The dataset tracks infrastructure change from 2020 to 2024 and converts imagery into a dynamic view of road quality rather than a static map. The team distributed outputs as an open dataset through the Humanitarian Data Exchange to enable reuse by humanitarian organizations, governments, and researchers.
**Why This Matters: **Humanitarian logistics and disaster response depend on reliable information about which routes remain usable, especially during seasonal disruption and extreme weather. Many regions lack current or detailed road condition data. The dataset offers globally consistent road quality indicators and supports routing decisions, investment planning, and monitoring of infrastructure change, including in rural and underserved areas.
Environment, Climate Resilience, and Air Quality
Lyons, David. “Mapping a fairer future: The open-source movement that’s mobilising for climate resilience.” Pioneers Post. October 21, 2025. https://immersives.pioneerspost.com/openstreetmap-climate-resilience/index.html
**Focus: **Shows how OpenStreetMap contributors and partners use open geospatial data to support disaster preparedness, climate resilience planning, and infrastructure investment in underserved regions.
**Role of Non-Traditional Data: **Volunteers and local organizations generate and update geospatial data through participatory field surveys, drone imagery, and locally collected observations. Contributors map roads, buildings, drainage, shade cover, flood exposure, and emergency resources in places where commercial maps and official registries remain incomplete. The Humanitarian OpenStreetMap Team combines these inputs with AI-assisted mapping tools to expand coverage while local participants validate results and correct errors.
**Why This Matters: **Emergency response and climate adaptation depend on accurate, current information about what exists on the ground and where risks concentrate. Locally produced maps can speed response during floods, fires, and heat events and can inform investment choices such as drainage upgrades or targeted flood mitigation. The work also strengthens local capacity to generate and use data, which can reduce dependence on external actors and support longer-term, community-owned data ecosystems.
Perry, Niál, Peter P. Pedersen, Charles N. Christensen, Emanuel Nussli, Sanelma Heinonen, Lorena Gordillo Dagallier, Raphaël Jacquat, Sebastian Horstmann & Christoph Franck. “Detecting Urban PM Hotspots with Mobile Sensing and Gaussian Process Regression.” arXiv. September 21, 2025. https://arxiv.org/abs/2509.17175
**Focus: **Develops a method that identifies urban particulate matter (PM2.5) pollution hotspots using mobile low-cost sensors and probabilistic spatial modeling.
**Role of Non-Traditional Data: **Delivery workers in Kigali, Rwanda carried low-cost sensors mounted on electric motorbikes during routine routes, generating high-frequency, geolocated pollution measurements that differ from structured readings from fixed monitoring stations. The researchers normalized sensor readings to reduce background effects and applied Gaussian process regression to estimate city-wide PM2.5 patterns. The method produces hotspot scores that represent the probability that pollution in a given area exceeds the city-wide median. The approach also uses open-source mapping data and software to support replication without access to formal monitoring infrastructure.
**Why This Matters: **Many rapidly growing cities lack dense air quality monitoring networks, which limits understanding of exposure and weakens the evidence base for intervention. Mobility-based sensing can produce actionable, high-resolution pollution maps without reliance on fixed sensors or satellite-only estimates. Hotspot identification can support targeted clean air interventions, transport planning, and public health assessment for populations facing persistent exposure.
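A hotspot score defined as the probability of exceeding the city-wide median has a simple form if the model's prediction at each location is treated as Gaussian, which is what Gaussian process regression yields for a single point. With predictive mean μ and standard deviation σ, the score is 1 − Φ((m − μ)/σ), where m is the median and Φ is the standard normal CDF. The sketch below assumes the predictive means and standard deviations are already available; the cell names, the values, and the shortcut of taking the median over predicted means are illustrative assumptions, not the paper's pipeline.

```python
from statistics import NormalDist, median


def hotspot_score(pred_mean, pred_std, city_median):
    """P(value > city_median) under a Gaussian predictive distribution."""
    return 1.0 - NormalDist(mu=pred_mean, sigma=pred_std).cdf(city_median)


# Hypothetical (predictive mean, predictive std) per grid cell.
cells = {
    "cell_a": (42.0, 5.0),   # high and well sampled
    "cell_b": (30.0, 5.0),   # exactly at the median
    "cell_c": (25.0, 15.0),  # lower mean but very uncertain
}

# Assumption: use the median of predicted means as the city-wide median.
city_median = median(mean for mean, _ in cells.values())
scores = {
    name: hotspot_score(mean, std, city_median)
    for name, (mean, std) in cells.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Framing the output as a probability means that sparsely sampled areas with wide uncertainty are pulled toward 0.5 rather than being labeled confidently in either direction.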
Deininger, Klaus, Daniel Ayalew Ali, Nataliia Kussul, Guido Lemoine, Andrii Shelestov, and Leonid Shumilo. “Using Remotely Sensed Data to Assess War-Induced Damage to Agricultural Cultivation: Evidence from Ukraine.” World Bank Group, Policy Research Working Paper 11221. September 25, 2025. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/099726309252542219
**Focus: **Quantifies war-related damage to agricultural cultivation in Ukraine and measures changes in crop activity across conflict-affected regions.
**Role of Non-Traditional Data: **The researchers used high-frequency satellite-derived indicators to detect damage and abandonment in farmland. The analysis drew on Sentinel optical and radar imagery, thermal fire-detection data, and satellite-based crop classification maps to identify burned fields, artillery impacts, vehicle tracks, trenches, and disrupted planting. These remotely sensed measures supported tracking of winter and summer crop cultivation across nearly 10,000 local administrative units, including areas with limited access, shifting frontlines, and incomplete official reporting.
**Why This Matters: **Conflict-driven agricultural losses affect food security, rural livelihoods, and post-war recovery, yet traditional reporting often cannot measure impacts at scale. Satellite-derived indicators provide timely, consistent evidence that can reveal losses beyond those captured in media-based conflict datasets. The results can inform humanitarian assistance, compensation design, demining priorities, and reconstruction planning.
Urban Systems, Mobility, and Public Space
Gambrell, Dane. “Vibe Coding the City: How One Developer Used Open Data to Map Every Public Space in New York City.” Rebooting Democracy in the Age of AI. October 14, 2025. https://rebootdemocracy.ai/blog/vibe-coding-the-city-how-one-developer-used-open-data-to-map-every-public-space-in-new-york-city
**Focus: **Describes how civic technologist Chris Whong consolidated fragmented public datasets to map New York City’s public spaces in a single searchable tool.
**Role of Non-Traditional Data: **The project combined open government datasets on parks, plazas, waterfront access areas, schoolyards, and privately owned public spaces. The developer cleaned and standardized records into a unified spatial inventory and supplemented official data with community-submitted updates, photos, and amenity details collected through the app. Generative AI assisted coding and produced initial descriptions for thousands of spaces, while user contributions and moderation corrected errors and added context.
**Why This Matters: **Residents often cannot use public space information effectively because agencies publish it in fragmented formats with inconsistent structures. A consolidated and enriched map can improve access to everyday amenities and support navigation, accessibility planning, and community use of the public realm.
Pirlea, Ana Florina, and Divyanshi Wadhwa. “Understanding Traffic Changes During COVID-19 Through Waze Data.” Development Data Partnership, World Bank. November 24, 2025. https://datapartnership.org/updates/understanding-traffic-changes-during-covid-19-through-waze-data/
**Focus:** Analyzes how city road traffic changed during COVID-19 lockdowns and reopening using real-time Waze mobility data.
**Role of Non-Traditional Data:** The World Bank accessed anonymized, aggregated traffic data from Waze through the Development Data Partnership. The team measured changes in road traffic volume in New York, Bogotá, Mumbai, and Manila following stay-at-home orders and reopening phases. Platform-based signals enabled city-level observation of behavioral responses to policy interventions over short time horizons.
**Why This Matters:** Transport planners and policymakers need timely evidence on how crises reshape mobility patterns to manage congestion, emissions, and system resilience. Platform data can reveal immediate and uneven impacts across cities with different income levels and governance contexts. Integrating these signals with economic and environmental indicators can support more responsive and resilient urban planning.
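To make the idea of "changes in road traffic volume" concrete, the sketch below shows one common way to express such changes: as a percent difference from a pre-intervention baseline. It is an illustration only and may not match the method used in the cited analysis. The function name `pct_change_from_baseline`, the `baseline_days` parameter, and the index values are all assumptions invented for this example, not Waze data.

```python
from statistics import mean


def pct_change_from_baseline(daily_volume, baseline_days):
    """Percent change of each day's volume relative to the mean of the
    first `baseline_days` observations (a hypothetical pre-lockdown window)."""
    if not 0 < baseline_days <= len(daily_volume):
        raise ValueError("baseline_days must be between 1 and len(daily_volume)")
    baseline = mean(daily_volume[:baseline_days])
    if baseline == 0:
        raise ValueError("baseline volume must be non-zero")
    return [round(100 * (v - baseline) / baseline, 1) for v in daily_volume]


# Made-up traffic index for one city: three baseline days, then a
# stay-at-home order, then a partial reopening.
example_volume = [100, 104, 96, 60, 45, 50, 70]
changes = pct_change_from_baseline(example_volume, baseline_days=3)
print(changes)
```

Expressing each city's series relative to its own baseline is what makes cities of very different size comparable on one chart.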
Online Political Sentiment
Askarizade, Mojgan, and Ensieh Davoodijam. “Analyzing Public Sentiment in Iranian Presidential Elections on Twitter Using Large Language Models.” Journal of Computational Social Science, Volume 8, Article 102. October 17, 2025. https://link.springer.com/article/10.1007/s42001-025-00431-6
**Focus:** Measures shifts in public sentiment and candidate attention during the 2024 Iranian presidential election using Persian-language Twitter activity.
**Role of Non-Traditional Data:** The researchers analyzed 111,386 election-related Persian-language tweets collected during the election period. The dataset included tweet text, user metadata, and engagement indicators, which captured political expression in an environment where polling and open survey research face constraints. The authors applied large language models to classify sentiment at scale and traced hourly and daily sentiment dynamics across the election cycle.
**Why This Matters:** Policymakers and researchers often lack reliable, timely measures of public opinion in settings where political expression is sensitive and traditional data sources are limited. Social media analysis can offer an additional lens on political engagement and sentiment dynamics. The study also illustrates how large language models can support analysis of non-English political discourse.
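As a rough illustration of what tracing hourly sentiment dynamics can involve, the sketch below aggregates already-classified tweets into an hourly net sentiment score. It assumes the classification step (done with large language models in the study) has already produced one label per tweet. The function `hourly_net_sentiment`, the three-label scheme, the net score formula, and the sample timestamps are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict
from datetime import datetime

LABELS = ("positive", "negative", "neutral")  # assumed label scheme


def hourly_net_sentiment(labeled_tweets):
    """Map each hour to (positive - negative) / total for that hour.

    `labeled_tweets` is an iterable of (ISO 8601 timestamp, label) pairs.
    """
    counts = defaultdict(lambda: dict.fromkeys(LABELS, 0))
    for timestamp, label in labeled_tweets:
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[hour][label] += 1
    return {
        hour.isoformat(): (c["positive"] - c["negative"]) / sum(c.values())
        for hour, c in sorted(counts.items())
    }


# Invented sample records, not drawn from the study's dataset.
sample = [
    ("2024-06-28T09:05:00", "positive"),
    ("2024-06-28T09:40:00", "negative"),
    ("2024-06-28T09:55:00", "positive"),
    ("2024-06-28T10:10:00", "negative"),
    ("2024-06-28T10:30:00", "neutral"),
]
net = hourly_net_sentiment(sample)
print(net)
```

The same grouping works for daily series by truncating to the date instead of the hour.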
Economic Opportunity and Labor Dynamics
Yañez-Pagans, Patricia, Jimena Serrano, Mattia Chiapello, Magdalena Barafani, Casey Weston, Silvia Lara, and Alejandra Barrientos. “Fixing the Broken Rung: How Data Can Help Advance Women’s Careers in Latin America and the Caribbean.” IDB Group Blogs (Sustainable Businesses Blog). November 5, 2025. https://idbinvest.org/en/blog/gender/fixing-broken-rung-how-data-can-help-advance-womens-careers-latin-america-and-caribbean
**Focus:** Identifies where women’s representation drops most sharply along the career ladder in Latin America and the Caribbean, with emphasis on the move from entry-level roles into management.
**Role of Non-Traditional Data:** The analysis uses aggregated and anonymized LinkedIn profile data across 18 countries, including job titles, sectors, seniority levels, and transitions. The dataset enables comparisons across industries and countries at a scale and granularity that labor force surveys typically cannot provide. IDB Invest accessed the data through an agreement with LinkedIn under the Development Data Partnership.
**Why This Matters:** Policymakers and employers need clearer evidence on where career progression breaks down to design effective interventions that close gender gaps in leadership and earnings. Traditional labor statistics rarely capture internal transitions with sufficient detail. Platform-based data can support targeted responses such as leadership pipelines, sector-specific inclusion strategies, and reskilling programs grounded in observed career pathways.
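The "broken rung" idea can be sketched as a simple calculation: compute women's share at each seniority level, then find the step between adjacent levels where that share falls the most. The example below is a minimal sketch under that assumption and is not the method used in the IDB analysis. The function `largest_drop`, the level names, and all counts are hypothetical.

```python
def largest_drop(levels):
    """Find the step between adjacent seniority levels where women's share
    falls the most.

    `levels` is a list of (level_name, women_count, total_count) tuples,
    ordered from most junior to most senior. Returns a tuple of
    (from_level, to_level, drop_in_percentage_points).
    """
    if len(levels) < 2:
        raise ValueError("need at least two seniority levels")
    shares = [(name, 100 * women / total) for name, women, total in levels]
    drops = [
        (lower[0], upper[0], lower[1] - upper[1])
        for lower, upper in zip(shares, shares[1:])
    ]
    return max(drops, key=lambda d: d[2])


# Hypothetical profile counts for one sector in one country.
example_levels = [
    ("entry", 480, 1000),
    ("manager", 350, 1000),
    ("director", 300, 1000),
    ("executive", 270, 1000),
]
broken_rung = largest_drop(example_levels)
print(broken_rung)
```

Running the same calculation per sector or per country is what allows the kind of cross-industry comparison the entry describes.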
Aliu, Toluwani. “How AI Is Powering Grassroots Solutions for Underserved Communities.” World Economic Forum. September 2, 2025. https://africa.businessinsider.com/news/a-nonprofit-used-ai-to-document-77-million-miles-of-unmapped-waterways-heres-why-that/bwdn0gh
**Focus:** Describes how a nonprofit maps unmapped waterways to identify missing bridge infrastructure and help decision-makers prioritize investments that expand access to services and markets.
**Role of Non-Traditional Data:** The initiative uses geospatial signals derived from satellite imagery and related passively collected sources to map waterways and identify likely bridge sites at scale. Machine learning helps analyze elevation, vegetation, and hydrological patterns and reduces reliance on time-intensive field surveys. The organization combines these outputs with information on population locations, destinations such as schools and clinics, and travel-time proxies to estimate who lacks safe crossings and where investments yield the highest access gains.
**Why This Matters:** Spatial data quality influences which communities receive infrastructure investment and how quickly needs assessments translate into action. Waterway mapping linked to access indicators can improve transparency and speed for site selection and planning. The work has informed infrastructure planning in parts of East Africa and has reduced the time and cost associated with identifying high-impact bridge locations.
Migration
Kierans, Denis, and Albert Kraler, eds. “Handbook on Irregular Migration Data. Concepts, Methods and Practices.” MIrreM Project, University of Krems Press. October 9, 2025. https://door.donau-uni.ac.at/detail/o:5665
**Focus:** Provides practical guidance on how institutions collect, interpret, and govern data on irregular migration.
**Role of Non-Traditional Data:** The handbook covers data sources that extend beyond censuses and surveys, including border management systems, asylum and case-processing records, visa and permit databases, return and detention records, service-provider data, NGO operational datasets, and digitally mediated information flows. It discusses methods that combine fragmented datasets across institutions and jurisdictions and estimation approaches that address undercounting and invisibility. The text also emphasizes ethical safeguards for handling sensitive personal information.
**Why This Matters:** Irregular migration remains difficult to measure and politically contested, yet it shapes humanitarian protection, labor markets, and border governance. Clearer definitions and transparent methods can improve consistency and credibility in estimates. Shared standards for data access, interpretation, and protection can strengthen accountability and support evidence-informed policy in high-stakes settings.
Reflections
- **Public health intelligence is the most prominent application area:** Many cases focus on early detection and situational awareness rather than long-term epidemiological measurement. Open-source epidemic intelligence systems, wastewater surveillance, search behavior, and online discussion are used to complement clinical reporting, especially where testing, coverage, or timeliness are limited.
- **Most uses support monitoring and operational decision-making:** The strongest examples emphasize understanding conditions as they evolve, including tracking outbreaks, observing evacuation behavior, mapping flood impacts, assessing damage to health facilities, and identifying accessible transport routes during crises.
- **Specific data types recur in consistent, task-oriented roles:** Remote sensing is repeatedly used for environmental damage, agricultural disruption, and infrastructure assessment in inaccessible or conflict-affected areas. Crowdsourced data appears most often in localized crisis response and resilience planning. Platform and mobility data are used to observe behavior at scale, including movement during disasters, traffic patterns, labor market dynamics, and online political engagement.
- **Translation and integration matter as much as data collection:** Some cases focus on how signals are processed and presented through dashboards, hotspot probabilities, passability scores, damage overlays, or standardized reporting formats. These design choices often determine whether non-traditional data can inform real-world decisions.
- **Most applications remain research- or mission-driven rather than fully institutionalized:** Universities, public agencies, and non-profit actors dominate the cases, even when privately held or platform-generated data is involved. Questions of access, representativeness, and governance remain present but unresolved across many examples.
Together, these cases show non-traditional data being used repeatedly for a limited set of tasks, particularly where timeliness, spatial granularity, and the ability to fill gaps in official data are most critical.