Abstract
This paper introduces the BostonWalks (BWS) study, detailing its methodology, the resulting dataset, and an initial analysis. The BWS study is a smartphone-based GNSS-tracking study in the Boston metropolitan area, designed to generate an up-to-date dataset on travel behavior, with a particular focus on non-auto travel behavior and its representativeness across all population segments. The dataset encompasses approximately 155,000 trips from 990 participants, making it one of the most extensive datasets of its kind in North America. It includes both raw trajectory data and comprehensive socio-demographic information about participants. The paper outlines the survey methodology, including the technical infrastructure, recruitment strategy, and data processing techniques. A comparison of the socio-demographic and travel behavior characteristics of BWS participants with those from the National Household Travel Survey is provided. Lastly, the paper highlights the richness of the data through correlation and cluster analysis.

Introduction
Understanding travel patterns and how people move is a critical component of transportation modeling, urban planning, and policy research. For over three decades, transportation planners and researchers have relied on household travel surveys to gain insights into people’s travel behavior (Axhausen 1995; Stopher and Greaves 2007). In more recent years, advances in information technology have substantially enhanced the collection of both survey-elicited and passive mobility data (Schönfelder et al. 2002; Palmer et al. 2013). Compared to self-reported travel behavior obtained from surveys, travel behavior captured in a semi-automated manner by urban information and mobile systems has been found to be superior in terms of both representativeness and accuracy (Murakami and Wagner 1999; Stopher et al. 2007; Chapleau et al. 2018; Su et al. 2022).
Compared to other mobile data sources such as cellular, WiFi, or Bluetooth signals, geolocation data gathered through devices with GNSS (Global Navigation Satellite System) technology offer unprecedented resolution while reducing data collection costs, especially for large-scale surveys with several thousand participants (Toole et al. 2015; Wang et al. 2018). GNSS tracking can help identify trips that are often under-reported in traditional travel diaries due to memory lapses. It also captures spatial data related to route selection with high precision (Janzen et al. 2018; Marra et al. 2019; Buehler and Pucher 2024). Therefore, GNSS tracking is now considered the state-of-the-art data collection method for large-scale travel behavior data, especially in academic contexts. It is also increasingly used by public agencies to replace conventional paper-based, phone-assisted, or computer-assisted household travel surveys. We refer readers to Lawson et al. (2023), Lynch et al. (2019), and Lawson et al. (2016) for a comprehensive overview of GNSS tracking applications developed in the academic, private, and public sectors, as well as their global use cases. Recent academic projects, such as Molloy et al. (2023), Heinonen et al. (2023), and Winkler et al. (2022), have used the same tracking technology as this study (BostonWalks). In the U.S. context, the recent household travel survey conducted by the City of New York is notable for providing a similar dataset with regard to sample size, data characteristics, and geographical extent, with a particularly strong focus on non-auto modes of transportation in highly urbanized areas.
The use of GNSS tracking as a new method for collecting data on travel behavior is not without drawbacks. The discrepancy between self-reported and passively sensed data can vary according to the travel mode (Loidl et al. 2020). Also, issues with sample (non-)representativeness can lead to inaccurate and biased conclusions. Studies have reported that mobile phone ownership and participation in travel surveys are biased towards individuals with higher income and higher education levels (Wesolowski et al. 2013; Schmid et al. 2019). Furthermore, the high resolution offered by GNSS tracking has raised data privacy concerns, which in turn have led to such data typically being shared (and sometimes collected) in an anonymized format. As a result, currently available datasets often reflect the travel behaviors of specific demographic groups (Wesolowski et al. 2013; Wardle et al. 2023). Moreover, anonymization practices and differences in the processing algorithms used to interpret and segment raw travel data can introduce further biases in GNSS tracking (Wang et al. 2025). This problem is particularly relevant in the context of sustainable and equitable transport planning, where greater emphasis is placed on understanding non-auto travel behavior and differentiating among various sociodemographic groups, especially those who have been historically disadvantaged.
This paper presents the core methodology of the BostonWalks study, which was initiated due to the lack of up-to-date travel behavior data in the Boston metropolitan area. The last large-scale household travel survey for the Boston metropolitan area was conducted in 2010 (MPO 2017). That dataset includes GNSS trajectories, but their accuracy is poor, mostly due to the low sampling frequency of the collection device; together with a relatively small sample size and the lack of sociodemographic indicators, this makes the dataset inadequate for exploring current or future research and policy-making questions. The study was further motivated by the growing need for highly detailed spatiotemporal data, such as GNSS trajectories, combined with comprehensive sociodemographic indicators. This combination aims to improve understanding of non-auto travel behavior (particularly walking) and its variations across different sociodemographic groups. In this paper, we provide summary statistics of the resulting data, examine the representativeness of the participants and their travel behavior, and identify high-level travel patterns and dependencies between these elements. The contributions of this paper include a detailed description of the developed survey methodology, which is relevant for other researchers and public agencies planning to conduct similar studies. Additionally, we provide a thorough explanation of the data processing methods. Finally, we present key findings from the collected data, which can enable the investigation of pressing research and policy questions related to sustainable and equitable urban transportation planning.
Study design
The BostonWalks study, conducted by the MIT City Form Lab in partnership with the Ryan Wang Lab at Northeastern University, consists of two parts: an initial online questionnaire for screening, followed by a smartphone-based tracking phase covering at least two weeks. The study design was approved by MIT's ethics committee (protocol number 2211000806). The initial questionnaire was set up on the Qualtrics survey platform and designed to gather sociodemographic information about potential participants and to screen their eligibility for participation in the study. The survey started with the consent form displayed over two separate pages, both of which had to be explicitly accepted by the user. The consent pages were structured according to MIT’s internal guidelines. They comprehensively outline the study’s motivation and methods, detail the risks and benefits, and include sections on privacy, security, participant rights, and contact information. The consent section was followed by a first section of questions related to all quota-relevant indicators (such as education level, household income, age, gender, race, and ethnicity). The questions in this section could neither be skipped nor answered using evasive options. The second section included questions regarding household composition, occupation, mobility tool ownership, and health conditions (including height and weight). The third section included several Likert-scale questions regarding attitudes towards walking. The survey concluded with questions to verify eligibility for participation in the tracking phase. Participants were required to have an iOS or Android smartphone, be able to walk a quarter mile without pain, reside within the study region, and not be employed full-time as a professional driver. Participants were also asked whether they worked as drivers for ride-hailing services like Uber and, if so, how much time and distance they typically covered during an average week.
If eligible, participants were redirected to a dedicated website with detailed explanations on how to install and initialize the tracking app. Participants had to track their movements for a minimum of 14 days to be eligible for a $25 Dunkin’ gift card incentive. Dunkin’ was specifically chosen over alternatives like Amazon because of its strong local presence in the Boston metropolitan area. Participants were further encouraged to continue tracking voluntarily beyond the mandatory two-week period until the end of the project. The average respondent took around 13 minutes to complete the initial questionnaire. We also used the scheme of Schmid and Axhausen (2019) to estimate the response burden of the questionnaire and found it to be 225 points. The response burden score is designed to quantify the overall strain that a survey places on its respondents. It is a point-based system that rates different types of questions or survey tasks based on their length/duration, complexity, and related effort. It can be used to derive expected response rates and to compare the relative burden of different surveys.
All relevant study materials (such as the website, user guide, manuals, and FAQs) and communication were made available in the three most common languages in the Boston area: English, Spanish, and Brazilian Portuguese. Although we identified a few other relevant languages for the study area (including Creole, Chinese, and Arabic), we were unable to provide support for those languages, primarily due to capacity constraints on the part of the research team and technical challenges when working with non-Latin alphabets. We formulated all study materials in plain language (keeping the reading level at or below grade 6 on the Coleman-Liau index; Coleman and Liau 1975) to make our outreach as inclusive as possible and to encourage the participation of historically marginalized groups. The complete study pipeline, including the tracking component and all related communication, was pretested within our research team. The main purpose of the pretest was to evaluate the functionality of the technical infrastructure; it did not lead to any major changes to the structure or content of the study.
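As a rough illustration of the readability check, the Coleman-Liau index is computed from letter, word, and sentence counts as CLI = 0.0588 L - 0.296 S - 15.8, where L is the average number of letters per 100 words and S the average number of sentences per 100 words. The Python sketch below assumes simple tokenization rules of our own (whitespace-separated words, sentences ending in '.', '!' or '?'); it is not the tool used to prepare the study materials.

```python
import re


def coleman_liau_index(text: str) -> float:
    """Approximate U.S. grade level via the Coleman-Liau index.

    CLI = 0.0588 * L - 0.296 * S - 15.8, where L = letters per 100 words and
    S = sentences per 100 words. Tokenization is a simplifying assumption:
    words are whitespace-separated tokens containing at least one letter,
    and each run of '.', '!' or '?' ends a sentence.
    """
    words = [tok for tok in text.split() if any(ch.isalpha() for ch in tok)]
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for tok in words for ch in tok)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    per_100_words = 100.0 / len(words)
    L = letters * per_100_words
    S = sentences * per_100_words
    return 0.0588 * L - 0.296 * S - 15.8
```

A draft text would then meet the plain-language target if `coleman_liau_index(draft) <= 6`, where `draft` is any candidate text. Because the index counts letters rather than syllables, it is cheap to compute, but its grade calibration was derived from English text.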
Recruitment
We focused on the inner core of the Boston metropolitan area for this study. The study region comprises 21 cities and towns, which form the Inner Core Committee (ICC) subregion as defined by the Metropolitan Area Planning Council (MAPC), the regional planning agency for Metro Boston. The inner core subregion is home to approximately 1.8 million residents (just over half of the metro area's population) living in around 720,000 households. The study region extends north to Lynn and Saugus, south to Quincy and Milton, and west to Needham and Waltham (see Fig. 8 in the Appendix). We set the initial target sample size to 2,000 participants. One-year estimates from the 2022 American Community Survey (ACS) were used to generate quotas based on key sociodemographic indicators, such as age, gender, household income, race, ethnicity, and education. This resulted in over 700 quota buckets, which were aggregated into 71 categories with a minimum size of 10. The initial screening survey was designed to end automatically for users whose quota bucket was already filled with active participants.
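The quota logic described above can be sketched in two steps: ACS population counts per bucket are turned into integer quotas that sum exactly to the target sample size, and the screening survey checks whether a respondent's bucket still has room. Largest-remainder rounding, the bucket keys, and the function names are our assumptions for illustration; the aggregation of the roughly 700 raw buckets into 71 categories is not reproduced here.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical bucket key, e.g. ("25-34", "female", "$50-75k", ...).
Bucket = Tuple[str, ...]


def allocate_quotas(population: Dict[Bucket, float], target: int) -> Dict[Bucket, int]:
    """Split `target` across buckets in proportion to population counts, using
    largest-remainder rounding so that the quotas sum exactly to `target`."""
    total = sum(population.values())
    exact = {b: target * n / total for b, n in population.items()}
    quotas = {b: int(x) for b, x in exact.items()}  # floor of each exact share
    shortfall = target - sum(quotas.values())
    # Hand the remaining slots to the buckets with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda b: exact[b] - quotas[b], reverse=True)
    for b in by_remainder[:shortfall]:
        quotas[b] += 1
    return quotas


def screen_in(bucket: Bucket, quotas: Dict[Bucket, int], active: Counter) -> bool:
    """True if the respondent may continue, i.e. their bucket is not yet full."""
    return active[bucket] < quotas.get(bucket, 0)
```

In this sketch, a respondent whose bucket has no quota at all is screened out, which mirrors the survey ending automatically once a bucket is filled.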
We started the recruitment phase in June 2023. Our initial strategy was to distribute physical and/or digital recruitment flyers through partner organizations. At the time, we had not allocated any explicit budget for recruitment. Potential partners included local public/governmental organizations (such as planning agencies, transportation departments, public libraries, and community engagement offices), transportation providers (the Massachusetts Bay Transportation Authority, MBTA), as well as NGOs from various fields (including equitable city development, low-income support, and walking/biking advocacy). Our research team conducted extensive outreach efforts with very limited returns, forcing us to reconsider our recruitment strategy.
We consequently reduced the target sample size to 1,000 participants and reallocated the saved funds to recruitment. In partnership with the local public transportation agency (MBTA), we launched a campaign using large 45 × 60-inch recruitment posters installed at selected major subway stations in the Boston metropolitan area. These posters had the highest stated number of impressions (views, regardless of engagement), around 6 million over two months, compared to smaller posters placed within buses and subway cars. We started this campaign in late August and ran it for nine weeks. Figure 1 shows the temporal development of user participation from the beginning of the campaign. By the end of the campaign, around 800 participants had successfully started tracking. We supplemented the campaign with further recruitment through social media (primarily Facebook) to reach the target sample size. We used a professional Facebook account to request membership in around 100 local groups, focusing on parts of the study region with low response rates. This turned out to be more effective than we had originally anticipated, and by the end of the year, we had reached our target of 1,000 participants who had started tracking.
Fig. 1
Participant recruitment over time from August 28, 2023
Figure 1 further shows a substantial share (around 25%) of participants who finished the initial introductory survey but did not end up activating the app, even after being sent multiple reminder emails with relevant materials. This can be attributed to a variety of reasons, such as technical problems while downloading and initializing the app, a loss of interest, or psychological barriers to installing an application that collects highly sensitive data.
Tracking
GNSS traces were collected using the Catch-my-Day smartphone application developed by MotionTag. The system has previously been used for several academic GNSS tracking studies (see Molloy et al. (2023), Heinonen et al. (2023), Loder et al. (2024)) and is actively developed and maintained. The app collects relevant smartphone sensor data (such as the GNSS signal, gyroscope data, and accelerometer data) and sends it to a back-end server. The data stream is then transformed into an activity-based travel diary, characterized by a continuous timeline with stationary (stay) and non-stationary (track) events. These track events are processed by a mode-detection algorithm that automatically infers which mode of transportation was used (among car, walking, biking, rail transit, and airplane) with an average accuracy of over 90% (Molloy et al. 2023). The stay events are subject to a routine detection algorithm that, once the user has labeled them, detects home and work locations automatically. After processing, the resulting timeline is visualized within the app. Figure 2 shows the main user interface screens for the iOS version.
Fig. 2
Catch-my-Day user interface (iOS version)
For each day, the app visualizes the detected timeline on a map with a chronological list of events (Fig. 2, left). The user can correct any of the detected information, such as the transportation mode of a track event or the purpose of a stay event. Apart from the auto-inferred modes, the user can choose from a variety of other relevant transport modes (such as e-bikes; see Table 2 for the full list of modes). The list of possible purposes for stay events is shown in Table 3. Users can also merge events or delete events that have not been correctly recorded or that they do not want to have recorded for any reason. After making any adjustments, users can explicitly validate each day by tapping the red check mark, indicating that all the shown information is correct. Validation was not formally required for a user's participation to count as successful, but we consistently encouraged participants to validate through various communication channels. The app also includes a summary screen (Fig. 2, right) that provides the user with aggregated statistics about their travel behavior. Previous research has demonstrated the importance of this feature in motivating users to use such an app (Dastjerdi et al. 2019; Winkler et al. 2023).
Participants received communication materials via an automated email system throughout the tracking phase. This system would identify users for whom the app stopped recording data (operating systems commonly block continuous access to sensitive data), send reminders to validate the data, or notify participants about reaching the required number of tracking days. Participants could directly respond to these emails to get in touch with a help desk managed by our research team.
Data
The data we collected through the BostonWalks study can be divided into three components: (1) user data, (2) trip data, and (3) activity data. Of the 1,001 participants who started tracking, 990 successfully completed the mandatory 14-day tracking period. Figure 3 shows how long participants actively tracked. The median tracking period is around 23 days for both operating systems; the average is around 37 days (40 days for iOS and 35 days for Android). iOS users tend to track slightly longer, likely because the app generally works better on iOS than on Android, which runs on hardware from many different manufacturers. About 15% of the sample tracked for the complete project duration of about 120 days.
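Figure 3 summarizes tracking duration as a Kaplan-Meier survival curve, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i participants stop tracking at t_i out of n_i still tracking. The minimal estimator below assumes that participants who were still tracking when the project ended are treated as right-censored rather than as having stopped; the censoring rule actually used for Fig. 3 is not stated here, and the function name is illustrative.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the share of participants still tracking.

    durations: tracking days per participant.
    observed:  True if the participant stopped tracking (an event), False if
               right-censored (assumption: still tracking at project end).
    Returns (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= events + censored  # censored cases leave the risk set after t
    return curve
```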
Fig. 3
Kaplan-Meier survival curve of BWS respondents
MotionTag applies its proprietary backend algorithm to incoming smartphone sensor data, conducting an initial round of outlier filtering, trip segmentation, and mode detection. After this initial processing, we receive the raw data from MotionTag as mode-annotated GNSS traces, including timestamped latitude and longitude values for each trip and each user. We cleaned the raw data, i.e., the collected stream of events, to ensure accurate and reliable analysis results. A first filtering step ensured that all traces fall within the study boundaries (see Fig. 8; shapefiles sourced from MassGIS). A second filtering step was applied based on threshold speeds and distances, as described in the dataset documentation available on our website. These thresholds were defined based on expert knowledge and may require custom adjustments depending on the use case of the analysis. A third filtering step addressed urban water bodies: trajectories that were partially located within water bodies were excluded, whereas trajectories entirely within water bodies were retained, as they may represent legitimate water-based transport modes (e.g., ferries).
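The three filtering steps can be sketched as a single keep/drop predicate over one trajectory. This is a simplified illustration under several assumptions: polygons are plain lists of (longitude, latitude) vertices rather than MassGIS shapefiles, the water test is evaluated on GNSS points rather than line segments, a trajectory is dropped if any point leaves the study boundary, and the speed and length thresholds are hypothetical placeholders (the study's thresholds are documented separately).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray cast to the right of p."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def keep_trajectory(points: List[Point], times_s: List[float],
                    boundary: Sequence[Point], water_bodies: List[Sequence[Point]],
                    max_speed_mps: float = 50.0, min_length_m: float = 50.0) -> bool:
    """Return True if a trajectory passes all three filters.

    The threshold defaults are hypothetical placeholders, not the study's values.
    """
    if len(points) < 2:
        return False
    # Step 1: the whole trajectory must lie within the study boundary.
    if not all(point_in_polygon(p, boundary) for p in points):
        return False
    # Step 2: minimum length and maximum average speed.
    length = sum(haversine_m(a, b) for a, b in zip(points, points[1:]))
    duration = times_s[-1] - times_s[0]
    if length < min_length_m or duration <= 0 or length / duration > max_speed_mps:
        return False
    # Step 3: drop trajectories only partially on water; keep those entirely
    # on water (e.g., ferries) or entirely on land.
    on_water = [any(point_in_polygon(p, w) for w in water_bodies) for p in points]
    return all(on_water) or not any(on_water)
```

A production pipeline would more likely rely on a spatial library such as Shapely or GeoPandas for projected geometries and spatial indexing; the ray-casting test here only keeps the example dependency-free.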
The resulting clean dataset comprises around 407,000 events, of which 235,000 are track events and 172,000 are stay events, from a total of 984 users over around 34,000 tracked days. Figure 7 in the Appendix shows how many of these days are completely tracked, i.e., have events that cover the full 24 hours of the day. This holds for around 75% of days, while only around 9% have no events at all. Incomplete days, i.e., days with interrupted tracking, can have several causes; apart from the previously mentioned filtering, which accounts for around 5%, these include loss of connectivity, loss of access to the sensor data (usually through the smartphone OS), smartphones running out of battery, or the app being temporarily disabled by the user. We refer to Mesaric et al. (2022), Zhao et al. (2021), and Harrison et al. (2020) for a more detailed discussion of spatiotemporal gaps in GNSS tracking systems and how they can be filled or imputed.
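Whether a day counts as completely tracked can be determined by splitting each event at midnight and measuring the union of event intervals per calendar day. The sketch below assumes timezone-naive local timestamps and ignores daylight-saving transitions; the function name and input format are illustrative.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta
from typing import Dict, List, Tuple


def daily_coverage(events: List[Tuple[datetime, datetime]]) -> Dict[date, float]:
    """Share (0..1) of each calendar day covered by at least one event.

    events: (start, end) pairs of stay and track events for one participant,
    as naive local timestamps. Days without any event do not appear and must
    be added from the participant's tracking period if needed.
    """
    per_day: Dict[date, List[Tuple[datetime, datetime]]] = defaultdict(list)
    for start, end in events:
        cur = start
        while cur < end:  # split events that cross midnight
            next_midnight = datetime.combine(cur.date() + timedelta(days=1), time.min)
            seg_end = min(end, next_midnight)
            per_day[cur.date()].append((cur, seg_end))
            cur = seg_end
    coverage: Dict[date, float] = {}
    for day, segments in per_day.items():
        segments.sort()
        covered = timedelta(0)
        run_start, run_end = segments[0]
        for s, e in segments[1:]:  # union of possibly overlapping intervals
            if s > run_end:
                covered += run_end - run_start
                run_start, run_end = s, e
            else:
                run_end = max(run_end, e)
        covered += run_end - run_start
        coverage[day] = covered / timedelta(days=1)
    return coverage
```

A day would then be considered complete if its coverage equals 1.0 (or exceeds a chosen tolerance).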
We ensured data security and participant privacy through multiple safeguards. All sensor data collected via the mobile app were transmitted to the MotionTag platform using encrypted connections and anonymized user IDs, in full compliance with the EU General Data Protection Regulation (GDPR). User-specific information (e.g., age, gender, income) was collected through Qualtrics and linked to the anonymized movement data only within encrypted computing environments at MIT and Northeastern University. Access to these servers was restricted to researchers who had signed a non-disclosure agreement. All published results are presented in aggregated form to protect individual privacy. In addition, participants retained the right to withdraw from the study or permanently delete their data at any time by contacting the project team.
User data
Table 1 shows the composition of the unweighted and weighted samples from this study with regard to key sociodemographic indicators and compares them against the ACS sample as a reference. We observe that the unweighted sample is skewed towards younger individuals with higher levels of education. Education level is the indicator with the largest discrepancy: more than 80% of the (unweighted) sample have at least a bachelor’s degree, compared to about 45% of the general population. This bias is largely expected, as younger, better-educated individuals not only have higher digital literacy but also tend to have a greater interest in participating in academic studies. The BWS sample and the ACS sample are largely similar with regard to household income and ethnicity. Black or African-American respondents are underrepresented in our sample, while Asian or Asian-American respondents are slightly overrepresented. Figure 8 in the Appendix shows the spatial distribution of participants across the study area by ZIP code. We see a radial pattern: we obtained a large number of participants in and around downtown Boston, and the number of participants decreases with distance from downtown. This is likely due to our recruitment strategy, in which we installed posters in major subway stations in and around downtown Boston, so that the spatial distribution of participants correlates with subway station catchment areas. We were aware of this a priori and tried to partially compensate through social media outreach toward the end of the recruitment phase. There are some ZIP codes on the outskirts of the study area, in less dense municipalities, for which we do not have any participants.
To ensure comparability for subsequent analyses, we weighted participants using Iterative Proportional Fitting (IPF) on the following variables: age, gender, education level, household income, race, and ethnicity. The relatively large differences between some variables and their respective target marginals resulted in a small number of users having very large weights. Considering the sample size, we capped the weights at a maximum value of 10 to make the following analysis less prone to outliers and to prevent the over-amplification of rare observed behaviors. Table 1 shows that the weighted BWS sample matches the ACS sample fairly well. Apart from two attribute levels (the shares of American Indians/Alaska Natives and other races), all levels show smaller discrepancies from the target marginals than in the unweighted sample. In particular, the distributions of age and education levels are much better aligned. The largest remaining discrepancies relate to the lowest education level (less than high school) and older age groups (over 75 years), as our original sample did not include enough individuals from these categories. It should be noted that post-stratification can correct first-order marginals but does not guarantee unbiased behavioral insights for sparsely sampled groups, which explains some of the differences we observe here. Each finding must therefore be interpreted in light of the effective sample size of the respective group.
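The weighting procedure can be illustrated with a minimal raking (IPF) implementation followed by a weight cap, together with Kish's effective sample size, n_eff = (Σw)² / Σw², one common way to quantify how much weighting reduces the information in a (sub)sample. Normalizing weights to a mean of 1 before capping, and applying the cap once after convergence, are our assumptions; note that capping after raking means the target marginals are no longer matched exactly.

```python
from collections import defaultdict
from typing import Dict, List


def rake(records: List[Dict[str, str]], targets: Dict[str, Dict[str, float]],
         cap: float = 10.0, max_iter: int = 100, tol: float = 1e-8) -> List[float]:
    """Iterative proportional fitting (raking) of person weights to target marginals.

    records: one dict per participant, e.g. {"age": "25-34", "gender": "female"}.
    targets: for each variable, the target share of each level (summing to 1);
             every level is assumed to occur at least once in the sample.
    Weights are rescaled to mean 1 and then capped at `cap` (assumption).
    """
    w = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            total = sum(w)
            level_totals: Dict[str, float] = defaultdict(float)
            for r, wi in zip(records, w):
                level_totals[r[var]] += wi
            # Scale each level so that its weighted share matches the target.
            for i, r in enumerate(records):
                factor = shares[r[var]] * total / level_totals[r[var]]
                w[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all marginals matched within tolerance
            break
    mean = sum(w) / len(w)
    return [min(wi / mean, cap) for wi in w]


def kish_effective_n(weights: List[float]) -> float:
    """Kish's effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(x * x for x in weights)
```

For example, three women and one man raked to a 50/50 gender split receive weights of 2/3 and 2, which reduces the effective sample size from 4 to 3.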
Trip data
The GNSS data we collected are in the form of sequences of events. Track events are defined as continuous movements while using a specific transportation mode. Compared to traditional self-reported travel diaries, GNSS tracking has a much higher spatiotemporal granularity, to the point where even longer stops at red lights can be detected as individual events. Travel behavior studies typically use trips (i.e., journeys from an origin to a destination) as the unit of analysis (see Axhausen (2007) for a detailed trip definition). The aggregation of track events (often referred to as stages) into trips is typically done using a simple time-based approach, where track events are merged into a trip as long as any stay events in between do not exceed a certain threshold, e.g., 120 seconds (2 minutes) or 300 seconds (5 minutes) (Wolf et al. 2004; Stopher et al. 2008). This threshold needs to be defined based on the characteristics of the data and the urban context in which the data were collected. Public transit trips are particularly prone to incorrect aggregation, as they can include substantial waiting time between two track events, especially when people make transfers. After manually inspecting the data, we distinguished the following cases when aggregating event sequences into trips:
1. We merged sequences of successive track events into trips and defined the stage with the longest distance as the main mode of the trip.
2. We merged sequences of alternating stay and track events in the following manner, again with the main mode defined as the longest-distance stage.
   (a) We merged any track events before and after a stay event labeled as “wait” into a trip, independent of the stay event duration.
   (b) For public transit track events, we merged them with the preceding track event when the stay event in between was labeled as “wait” or “unknown” and was under 900 seconds (15 minutes).
   (c) For non-transit track events, we adopted the same approach as for transit track events but used a lower stay event duration threshold of 180 seconds (3 minutes).
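These aggregation rules can be expressed as a single pass over one participant's chronologically ordered events. In this Python sketch, the `Event` and `Trip` types, the set of transit mode labels, and the treatment of two or more consecutive stay events (always a trip break) are our assumptions. We read rule (b) as applying the 15-minute threshold when the track event following the stay is a transit stage; stays labeled with any purpose other than “wait” or “unknown” always end the trip.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mode labels treated as public transit; the app's labels may differ.
TRANSIT_MODES = {"subway", "light_rail", "tram", "bus", "commuter_rail", "ferry"}


@dataclass
class Event:
    kind: str                    # "track" (movement) or "stay" (stationary)
    duration_s: float
    mode: Optional[str] = None   # track events only
    distance_m: float = 0.0      # track events only
    label: Optional[str] = None  # stay events only, e.g. "wait", "unknown", "home"


@dataclass
class Trip:
    stages: List[Event] = field(default_factory=list)  # track events
    waits: List[Event] = field(default_factory=list)   # stays absorbed into the trip

    @property
    def main_mode(self) -> Optional[str]:
        # Main mode = mode of the longest-distance stage.
        return max(self.stages, key=lambda e: e.distance_m).mode

    @property
    def intermodal(self) -> bool:
        return len({e.mode for e in self.stages}) > 1


def bridges(stay: Event, next_track: Event,
            transit_max_s: float = 900, other_max_s: float = 180) -> bool:
    """Decide whether a single stay between two track events is absorbed into one trip."""
    if stay.label == "wait":      # rule 2(a): any duration
        return True
    if stay.label != "unknown":   # a labeled activity (home, work, ...) ends the trip
        return False
    limit = transit_max_s if next_track.mode in TRANSIT_MODES else other_max_s
    return stay.duration_s < limit  # rules 2(b) and 2(c)


def aggregate_trips(events: List[Event]) -> List[Trip]:
    """Aggregate one user's chronologically ordered events into trips."""
    trips: List[Trip] = []
    current: Optional[Trip] = None
    gap: List[Event] = []  # stay events since the last track event
    for ev in events:
        if ev.kind == "stay":
            gap.append(ev)
            continue
        if current is not None and not gap:  # rule 1: successive track events
            current.stages.append(ev)
        elif current is not None and len(gap) == 1 and bridges(gap[0], ev):
            current.waits.append(gap[0])
            current.stages.append(ev)
        else:  # start a new trip
            if current is not None:
                trips.append(current)
            current = Trip(stages=[ev])
        gap = []
    if current is not None:
        trips.append(current)
    return trips
```

Keeping the absorbed stays in `Trip.waits` makes it possible to report them separately as wait events, as done for the activity data.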
This approach yielded a total of around 155,000 trips, of which about a quarter (26%) are intermodal, i.e., merged from multiple events with different modes. Figure 4 shows the resulting distribution of the number of daily trips for the BWS sample and, as a reference, the Massachusetts sample from the 2017 National Household Travel Survey (NHTS) (FHWA 2017). The BWS values are calculated based on the tracked days with full 24-hour coverage (see Fig. 7).
Fig. 4
Distribution of number of daily trips for the BWS and 2017 NHTS samples, both weighted with regard to the 2022 ACS sample
We chose the 2017 NHTS as the reference because its data can be filtered by state, allowing us to focus only on the Massachusetts sample. While still not a perfect comparison owing to different study areas (the entire state for NHTS vs. only 21 cities and towns for BWS), the NHTS provides the most comparable and recent data, as opposed to the Massachusetts household travel survey from 2011 (MassDOT 2011) or the latest NHTS from 2022, which allows users to filter by region (e.g., the Northeast) but not by state (FHWA 2022). We observe an average of around 3.75 daily trips in the NHTS compared to 4.9 in the BWS sample. This difference is expected, and we caution against attributing it entirely to a change in travel behavior between 2017 and 2022 (which has occurred to some extent) or to seasonality effects (both samples include hotter and cooler seasons). Rather, the primary reason for this difference is the underlying data collection method itself. The NHTS uses (mostly) web-based self-reporting, which tends to be prone to under-reporting, especially for short trips and/or when numerous daily trips are made. We also see from Fig. 4 that a substantial share of the NHTS sample (around 12.9%) reports not making any trips at all, which can be explained by the fact that the NHTS considers only one specific day per respondent as opposed to a multi-day period (as was used in our BostonWalks study).
The resulting mode shares of the BWS sample, again compared to the 2017 NHTS data, are reported in Table 2. The first three columns show how different transport modes from both surveys are aggregated to facilitate this comparison. The mode shares as well as the mode-dependent average trip distances show expected differences between the largely urban travel behavior of the BWS sample and travel patterns measured across the entire state of Massachusetts. We observe the largest difference in walking mode share: the BWS sample has a walking trip share of 21.3%, approximately ten times the 2.1% observed in the NHTS data. In the BWS sample, four in ten trips (39.2%) are made by car, while the average car trip distance (5.0 mi) is about a third of that in the NHTS sample (15.2 mi). Urban rail transit (subway, light rail, and tram) has the second-largest mode share (26.9%) within the BWS sample, compared to only 3.4% for the NHTS sample, with almost identical average distances. This large share is likely partly related to our recruitment method, which relied on posters in subway stations. Urban road transit (bus) trips are almost twice as frequent in the BWS data (4.1%) as in the NHTS data (2.2%), again with similar average distances. Bike trips are substantially more frequent (5.6%) in the BWS data, with slightly longer average distances. Relative to regular bike trips, the shares of electric and shared bike trips are unexpectedly low. This is likely because the Catch-my-Day app cannot automatically differentiate between these bike categories, which are also sometimes mislabeled by participants.
Table 3 shows the same statistics as Table 2, but differentiated by trip purpose instead of mode of transport. The purpose of any given trip is defined by the label of the succeeding stay event. Labeling these stay events requires substantial user interaction within the app, which was not an official requirement for participation. Therefore, we computed summary statistics only for trips with an available purpose label (anything except “unknown”), amounting to around 55,000 trips (36% of all trips). The first three columns again show how the different labels from both datasets are aggregated for the sake of comparison. We find the shares of trip purposes to be fairly similar, with home, work, and shopping being the three most frequent trip purposes in both datasets. However, the average distances differ substantially, with those in the BWS data being much shorter, reflecting the difference in the size of the study areas.
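Purpose assignment and weighted shares of the kind reported in Tables 2 and 3 can be sketched as follows. We assume that each trip carries its participant's survey weight and that trips not directly followed by a labeled stay event are treated as “unknown”; the timeline format and function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def assign_purposes(timeline: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Purpose of each trip = label of the stay event that directly follows it.

    timeline: one user's chronological items, either ("trip", None) or
              ("stay", label). Trips not followed by a stay get "unknown".
    """
    purposes = []
    for i, (kind, _) in enumerate(timeline):
        if kind == "trip":
            nxt = timeline[i + 1] if i + 1 < len(timeline) else None
            labeled = nxt is not None and nxt[0] == "stay" and nxt[1]
            purposes.append(nxt[1] if labeled else "unknown")
    return purposes


def weighted_shares(items: Iterable[Tuple[str, float]],
                    exclude: Tuple[str, ...] = ("unknown",)) -> Dict[str, float]:
    """Weighted share of each category from (category, person_weight) pairs."""
    totals: Dict[str, float] = defaultdict(float)
    for category, weight in items:
        if category not in exclude:
            totals[category] += weight
    grand_total = sum(totals.values())
    return {c: t / grand_total for c, t in totals.items()}
```

The same `weighted_shares` helper, called with `exclude=()` on (main mode, weight) pairs, would yield weighted mode shares.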
Activity data
In addition to trip data, GNSS tracking also provides information about stationary periods. These can be used to derive trip purposes, but more generally they are a valuable source of information for many time-use and transport-related applications. Table 4 shows the frequency and average duration of the different stay events. As mentioned above, around 60% of these events were not labeled by participants. The labeled events show a distribution similar to that of the trip purposes in Table 3. The wait (PT) and wait (non-PT) events (marked *) result from the trip aggregation methodology and represent stay events that were merged into trips.
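Per-label frequencies and average durations of the kind reported in Table 4 can be derived from stay start and end times. The sketch below is a minimal illustration, assuming each stay is given as a hypothetical `(label, start, end)` tuple of a string and two `datetime` values, with unlabeled stays carrying the label "unknown"; it does not reproduce the trip aggregation that produces the wait events.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stay_summary(stays):
    """Summarize (label, start, end) stay events.

    Returns ({label: (count, mean_duration_min)}, share_unlabeled), where
    unlabeled stays carry the label "unknown".
    """
    counts = defaultdict(int)
    total_minutes = defaultdict(float)
    for label, start, end in stays:
        counts[label] += 1
        total_minutes[label] += (end - start).total_seconds() / 60.0
    n = sum(counts.values())
    summary = {label: (counts[label], total_minutes[label] / counts[label])
               for label in counts}
    share_unlabeled = counts.get("unknown", 0) / n if n else 0.0
    return summary, share_unlabeled


# Usage with made-up events that all start at the same time.
t0 = datetime(2023, 5, 1, 8, 0)
summary, share_unlabeled = stay_summary([
    ("home", t0, t0 + timedelta(hours=1)),
    ("unknown", t0, t0 + timedelta(minutes=30)),
    ("home", t0, t0 + timedelta(hours=2)),
])
```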
Results and discussion
In this section, we discuss findings from a high-level correlation and cluster analysis of mode share patterns to demonstrate the richness of the data and to highlight the importance of granular, disaggregated data collection and analysis for equity-focused urban planning.
Correlation analysis
Using correlation analysis, we first explored which sociodemographic indicators have statistically significant relationships with transport mode share and trip distance. We focused on income, age, race, education level, and gender as indicators of interest, and on the predominant modes of transport in the data, i.e., car, subway, walking, and biking (see Table 2). Using the weighted dataset, we performed a Kruskal-Wallis test (a nonparametric one-way analysis of variance on ranks) to compare the distributions across the categories of each indicator. A p-value below 0.05 is considered significant, rejecting the null hypothesis at the 5% significance level; this indicates that the distribution of at least one category differs from those of the others.
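The Kruskal-Wallis test ranks all pooled observations and compares the mean rank of each category, so it does not assume normally distributed trip distances or mode shares. A minimal pure-Python sketch of the unweighted test follows: ties receive average ranks, the H statistic is divided by the standard tie correction, and the p-value uses the chi-square approximation with k - 1 degrees of freedom for k groups. The study's survey weighting is not reproduced here, and the helper names are illustrative; in practice `scipy.stats.kruskal` implements the same unweighted test.

```python
import math


def _gamma_p_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series (x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-14:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_cf(a, x)


def kruskal_wallis(*groups):
    """Return (H, p_value) for two or more non-empty samples, with average ranks for ties."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(group) for r, group in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)  # standard tie correction
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

For three made-up groups [1, 2, 3], [4, 5, 6], and [7, 8, 9], the sketch yields H = 7.2 and p ≈ 0.027, below the 0.05 threshold.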
The results show that all considered indicators are significantly associated with the distribution of mean walking trip distances. We observe a highly significant relationship with income category (p < 0.001): on average, higher-income groups walk about 20% farther per trip than lower-income groups (Fig. 10). Education has a comparable effect: individuals with the highest education level walk about 20% farther than those with the lowest (Fig. 16). This is unsurprising given the well-known correlation between education and income, which also holds for the participants in this study (Fig. 9). Walking distances also vary significantly across age and race categories (Figs. 12 and 14). Cycling distances differ significantly only across income levels: higher-income individuals bike about twice as far per trip on average as lower-income individuals (Fig. 10). The income effect on both walking and cycling distances may be explained by higher-income individuals being better able to afford centrally located neighborhoods, which are generally more walkable and have better cycling infrastructure.
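Percentage statements such as "about 20% farther" compare category means of trip distance. The sketch below computes weighted mean distances per category and the relative difference between two categories. It assumes each trip is a hypothetical `(category, distance_mi, weight)` tuple carrying a participant-level survey weight; the category names and tuple layout are illustrative, not the study's data format or weighting scheme.

```python
from collections import defaultdict


def weighted_group_means(trips):
    """trips: iterable of (category, distance_mi, weight). Returns {category: weighted mean}."""
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for category, distance, weight in trips:
        weighted_sums[category] += weight * distance
        weight_totals[category] += weight
    # Categories whose weights sum to zero have no defined mean and are skipped.
    return {c: weighted_sums[c] / weight_totals[c]
            for c in weight_totals if weight_totals[c] > 0}


def relative_difference(means, high, low):
    """Percentage by which the mean of category `high` exceeds that of category `low`."""
    return 100.0 * (means[high] - means[low]) / means[low]


# Usage with made-up walking trips for two income categories.
means = weighted_group_means([
    ("low", 0.5, 1.0), ("low", 1.0, 1.0), ("high", 0.9, 2.0),
])
```

Here the lower-income category averages 0.75 mi and the higher-income category 0.9 mi, so `relative_difference(means, "high", "low")` returns 20%; a value of 100% would correspond to "twice as far".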
Income emerged as a key factor influencing mode share across all transport modes, with the strongest effects on subway and car use. Individuals in the highest income category have a car mode share approximately 50% higher than those in lower income groups. Conversely, subway mode share declines steadily as income increases, with a difference of roughly 50% between the highest and lowest income levels. These patterns may be linked to the cost of vehicle ownership (Basu and Ferreira 2021; Klein et al. 2023). Income also significantly affects walking and cycling mode shares, with higher income levels associated with more walking and cycling (Fig. 11). For cycling, the mode share of the highest income group is approximately four times that of the lowest. As with trip distances, the effect of income on walking and cycling may reflect better walkability and bikeability in wealthier, more central neighborhoods. Broader sociocultural factors, such as the local mobility culture, the social image or status associated with certain travel modes, and longstanding personal or community-level preferences, may also shape non-auto travel choices. In addition to income, age and education level notably influence subway mode choice, while race significantly affects walking trip shares (Figs. 13,15 a