National Transit Atlas Technical Documentation

Overview

This document provides information on the National Transit Atlas, including project background, data sources and methodology, and additional context to help people use the platform with confidence. Additional discussion of the methodology used in the project is located on the Transit-Oriented Discoveries Blog.

Project Timeframe

The National Transit Gazetteer and Atlas was developed between July 1, 2024 and January 10, 2026. Transit agency and station data included in the platform represents fixed guideway modes (i.e., heavy rail, light rail, commuter rail, bus rapid transit, ferry, streetcar rail, monorail, inclined plane, and aerial tramway) in operation as of December 31, 2025. We anticipate updating the platform in the fall of 2026 to incorporate stations that are newly opened or newly closed as of September 2025.

If you believe the platform is missing modes or stations, please contact connect@transitdiscoveries.com.

Transit Station Data Elements and Sources

The project’s database relies on General Transit Feed Specification (GTFS) data for station locations and associated transit routes, and on data from the National Transit Database (NTD) Facility Inventory for station age, size, and configuration. Data from Wikipedia articles and transit agency websites was used to confirm information from both sources and to selectively fill in gaps when necessary. Data on a station’s street address, city, county, and state was generated using Google Maps and reverse geocoding. The chart below summarizes the data used in the Gazetteer’s station summaries:

A Note About Bus Rapid Transit (BRT)

The Bus Rapid Transit systems included in the Gazetteer and Atlas are documented in submissions to the National Transit Map hosted by the Bureau of Transportation Statistics or in the 2023 National Transit Database Facility Inventory. In a few instances (such as the Madison Rapid Line A and the King County G Line, both of which opened in the fall of 2024), data that had not yet been reported to either of these databases was included in this platform. Some users of the Gazetteer may not find the BRT in their community included on the map. This is likely because of a lack of standardization in how the mode is defined, leading to inconsistencies in what qualifies as “BRT.” If you believe a BRT system is missing from the Gazetteer and Atlas, please contact connect@transitdiscoveries.com.

Use of Artificial Intelligence (AI) for Station Summaries

This project uses OpenAI's GPT-3.5 model to craft a conversational, easy-to-understand summary of each station being queried. These summaries are grounded in the GTFS and NTD data used in the project along with the Wikipedia entries generated with separate code. AI summaries may also include information about station usage or characteristics of transit agencies and surrounding areas not included in the GTFS and NTD datasets. Much of the information provided by the AI chatbot can be verified by examining the corresponding station area map, the Wikipedia sources generated, or the data in the database. This platform uses AI as an interpretive layer, not an independent source of information. If you have concerns that any of the information in the paragraphs is incorrect, please contact connect@transitdiscoveries.com.

Use of Wikipedia

The platform uses the Wikipedia Application Programming Interface (API) along with the name of the transit station selected, the name of the agency selected, the agency’s location, and the term “transit station” to search Wikipedia for up to three relevant articles. This function includes a scoring system that prioritizes articles that mention both the station name and location in their title or early content. The articles are incorporated into the AI prompt to provide additional context and returned alongside the generated summary for users to explore further.
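The scoring idea described above can be illustrated with a minimal sketch. The function names, the point values, and the 300-character cutoff for "early content" are all assumptions made for illustration; the platform's actual weights are not documented here.

```python
def _term_points(term, title_l, lead_l):
    """Hypothetical weights: a title match counts more than an early-content match."""
    if term in title_l:
        return 3
    if term in lead_l:
        return 1
    return 0


def score_article(title, extract, station, location, lead_chars=300):
    """Score one candidate article; a higher score means a more relevant article."""
    title_l = title.lower()
    lead_l = extract[:lead_chars].lower()  # "early content" (assumed cutoff)
    station_pts = _term_points(station.lower(), title_l, lead_l)
    location_pts = _term_points(location.lower(), title_l, lead_l)
    # Extra credit when both the station name and the location are mentioned.
    bonus = 2 if station_pts and location_pts else 0
    return station_pts + location_pts + bonus


def top_articles(candidates, station, location, limit=3):
    """Return up to `limit` (title, extract) pairs, best score first."""
    ranked = sorted(
        candidates,
        key=lambda c: score_article(c[0], c[1], station, location),
        reverse=True,
    )
    return ranked[:limit]
```

With a scheme like this, an article titled "Convention Center station (Pittsburgh)" would outrank a same-named station in another city when the selected agency is located in Pittsburgh.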

Users may find that Wikipedia returns irrelevant articles for some station searches. Reasons for this may include non-unique station names (for example, a “Convention Center Station” could exist in multiple transit systems) and the search algorithm’s limitations.

In addition, the platform is more likely to return Wikipedia articles for larger stations in big cities, such as subway stations, than for smaller stations, such as those on bus rapid transit lines in smaller communities. Larger metropolitan areas tend to have more comprehensive Wikipedia coverage from volunteer contributors. Regions with more active local historians, transit enthusiasts, or Wikipedia editors will have more comprehensive articles. Stations with historical importance or unique architectural features are more likely to have dedicated Wikipedia pages.

Use of OpenStreetMap Points of Interest

The platform connects with the Overpass API, which is part of the OpenStreetMap (OSM) ecosystem, to fetch nearby points of interest (POIs) dynamically. The query searches for amenities (such as public facilities), shops, and tourist attractions within 800 meters of the station's longitude and latitude coordinates. The code then sorts the POIs by proximity and returns the closest three points of interest.
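The query-and-sort step can be sketched as follows. This is an illustration only: the function names are hypothetical, the `amenity`, `shop`, and `tourism` tag keys are assumed to correspond to the categories named above, and the sketch builds the query text without sending it.

```python
import math

EARTH_RADIUS_M = 6_371_000


def build_overpass_query(lat, lon, radius_m=800):
    """Overpass QL for nodes near a point (tag keys are assumptions)."""
    return f"""[out:json];
(
  node["amenity"](around:{radius_m},{lat},{lon});
  node["shop"](around:{radius_m},{lat},{lon});
  node["tourism"](around:{radius_m},{lat},{lon});
);
out body;"""


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_pois(station_lat, station_lon, pois, radius_m=800, limit=3):
    """Keep POIs within the radius, sort by distance, return the nearest few."""
    with_distance = [
        (haversine_m(station_lat, station_lon, p["lat"], p["lon"]), p) for p in pois
    ]
    in_range = [(d, p) for d, p in with_distance if d <= radius_m]
    in_range.sort(key=lambda pair: pair[0])
    return [p for _, p in in_range[:limit]]
```

Because the result depends on whatever nodes exist in OSM at query time, the same call can return different POIs on different days.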

Because OSM data is edited continuously and queried live, the three closest points of interest to a particular station today may not be the same a month from now.

Frequent users of the National Transit Gazetteer and Atlas may notice that some stations are associated with somewhat trivial or quirky points of interest such as “a mailbox” or “a statue”. This is a direct result of how OpenStreetMap is crowd-sourced and how different contributors document their local environments. Some contributors are extremely detailed, mapping even very minor features like individual mailboxes, street furniture, or small landmarks, while others focus on more major landmarks. There are very few strict rules about what can or cannot be mapped.

In addition, OSM has a documentation bias: affluent areas tend to have more detailed POI documentation, and regions with active tech communities or university campuses often have more comprehensive mapping.

Nevertheless, we hope you find the OSM points of interest valuable insofar as they add “local color” to station area summaries and enhance the information included in the GTFS or NTD data. 

Use of Concentric Circle Overlays

The Gazetteer and Atlas allows users to visualize areas within a 200-meter, 400-meter, and 800-meter radius of the longitude and latitude coordinates of the transit stations. These areas were chosen consistent with common practice for transit-oriented analysis, with a 200-meter radius representing the station and immediate surrounding area, an 800-meter radius incorporating the furthest distance that most people are willing to walk to a station, and a 400-meter (¼ mile) radius providing an intermediary threshold. Future enhancements may provide more sophisticated information about walking distance to a station that takes into account local barriers to access (such as rivers or highways).

Some platform users may notice that the dots representing station areas, along with their concentric circles, are not precisely aligned with the station location on the OpenStreetMap base layer. This typically has to do with how station location coordinates are captured in GTFS stops data versus how OSM volunteers chose to render the station location on the map. These differences are typically less than 100 meters. If you spot larger discrepancies, please contact connect@transitdiscoveries.com.

Built Environment Data

The National Transit Station Gazetteer and Atlas contains a thumbnail sketch of the urban form and development patterns around each station. The data underlying these summaries was queried using the OpenStreetMap Overpass API.

Data on the number of buildings, building footprint, and building heights are also organized into three bands based on their distance from the transit station. Users can analyze data on buildings within 200 meters, 400 meters, and 800 meters of a station to better understand how (if at all) urban form changes with distance from the transit hub.
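As a rough illustration of the banding, the sketch below groups buildings by distance from the station. It assumes the bands are cumulative (a building 150 meters away counts in all three bands), and the record fields `distance_m`, `footprint_m2`, and `height_m` are hypothetical names, not the platform's schema.

```python
def summarize_bands(buildings, radii_m=(200, 400, 800)):
    """Summarize building count, footprint, and mean height per distance band."""
    summary = {}
    for radius in radii_m:
        # Cumulative band: everything at or inside this radius (assumption).
        inside = [b for b in buildings if b["distance_m"] <= radius]
        # Only buildings with a known height contribute to the mean.
        heights = [b["height_m"] for b in inside if b.get("height_m") is not None]
        summary[radius] = {
            "count": len(inside),
            "footprint_m2": sum(b["footprint_m2"] for b in inside),
            "mean_height_m": sum(heights) / len(heights) if heights else None,
        }
    return summary
```

Comparing the three entries side by side shows whether building count and height fall off with distance from the station.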

Data on parking lots and garages, including their square footage, also identifies the number of lots and garages that are 2,000 square feet or larger. This subset of the total parking data is used to estimate the number of buildings that could be built within 1/2 mile of a transit station.

Built Environment Renderings

This web application visualizes transit stations and their surrounding urban environments in an interactive 3D interface. When a user selects a transit agency and station, the system makes API calls to a Flask backend, which retrieves geographic data from OpenStreetMap's Overpass API. This data includes detailed information on buildings, parking structures, and other urban elements within 800 meters of the selected station. The application processes this data to extract critical attributes like building heights (calculated from explicit height tags or estimated from building levels) and parking designations. The processed data is then rendered in the browser using the deck.gl visualization framework, which creates a three-dimensional map with color-coded layers: red for the station location, gray for buildings, and light red for parking structures.
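The height logic described above (an explicit height tag first, then an estimate from building levels) might look like the sketch below. The function name and the 3 meters per level conversion are assumptions for illustration; the platform's actual conversion factor is not stated here.

```python
import re

METERS_PER_LEVEL = 3.0  # assumed story height, not a documented platform value


def building_height_m(tags):
    """Return a height in meters from OSM tags, or None if nothing usable exists."""
    raw = tags.get("height")
    if raw:
        # Accept a plain number, optionally followed by "m"; other units are skipped.
        match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*m?\s*", raw)
        if match:
            return float(match.group(1))
    levels = tags.get("building:levels")
    if levels:
        try:
            return float(levels) * METERS_PER_LEVEL
        except ValueError:
            pass  # Non-numeric level values are ignored.
    return None
```

Buildings for which this returns `None` have no usable height information, which is one reason building statistics can understate development in sparsely mapped areas.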

Under-mapped Station Areas

The building information in the platform comes from OpenStreetMap data, which is created and maintained by a global community of volunteers. OpenStreetMap (OSM) completeness varies by location, and areas with fewer active contributors often have incomplete or outdated building data (especially heights, footprints, or even presence of buildings). This can impact both the raw statistics and the AI-generated summaries derived from them. As a result, the statistics and summaries may underestimate the actual amount of development near some transit stations.

The transit-oriented-discoveries dataset includes a flag identifying which station areas are “under-mapped,” meaning building data has not yet been added to the OSM database. The land use summaries for these stations contain a sentence noting that building information is incomplete. Under-mapped stations were identified based on a visual inspection of all 5,116 station areas. About 15% of the stations in the database are under-mapped, and data gaps are more frequent in stations serving suburban and exurban areas (such as commuter rail). In addition, a column in the “urban form” dataset identifies the number of buildings that were tagged with height data, which can help users identify the extent to which development profiles may lack complete building height information.

Housing Estimates

The Gazetteer and Atlas provide estimates of the number of housing units that could be built on surface parking and parking garages within 1/2 mile of a station. These estimates are zoning-agnostic in that they do not take into account existing regulations on building height, setbacks, or land uses. They also assume no minimum parking requirements within 1/2 mile of a station. The housing estimates limit development to surface parking and/or parking garages that are 2,000 square feet or more, under the assumption that building on larger lots may be more straightforward. The estimates also restrict the height of new housing to no higher than the tallest building located within 1/2 mile of the station (i.e., if the tallest building is 2 stories, any new development would not exceed 2 stories, but if the tallest building is 44 stories, a new apartment building could rise up to that height).

The Gazetteer and Atlas also provide “conservative” and “aggressive” housing scenarios based on building height, lot coverage, space for public and commercial use, and housing unit size. The conservative estimate assumes that a new building's height would not exceed 60% of the tallest nearby building and that a new development occupies 75% of the lot, which accounts for setbacks, landscaping, walkways, and driveways. This estimate also assumes 20% of a building is reserved for commercial/public use, such as ground-floor commercial/retail space, lobbies, stairwells, elevators, and community facilities (e.g., daycares, offices, civic space). Finally, the conservative scenario assumes each housing unit is 1,000 square feet.

In contrast, the aggressive scenario assumes building heights at 100% of the tallest nearby building, 85% lot coverage, 10% of space reserved for public use, and each housing unit is 800 square feet.
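To make the two scenarios concrete, here is a minimal sketch of how the parameters could combine. The scenario values come from the text above, but the formula itself (buildable floor area = lot area × coverage × floors, minus the reserved share, divided by unit size) is an assumption about how the estimate is assembled, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    height_factor: float   # share of the tallest nearby building's height
    lot_coverage: float    # share of the lot occupied by the new building
    reserved_share: float  # share of floor area for commercial/public use
    unit_sqft: float       # floor area per housing unit


CONSERVATIVE = Scenario(0.60, 0.75, 0.20, 1000)
AGGRESSIVE = Scenario(1.00, 0.85, 0.10, 800)
MIN_LOT_SQFT = 2000  # smaller parking lots and garages are excluded


def estimate_units(lot_sqft, tallest_stories, scenario):
    """Estimate housing units on one parking lot (assumed formula)."""
    if lot_sqft < MIN_LOT_SQFT:
        return 0
    # Round floors down, but allow at least one story (assumption).
    floors = max(1, math.floor(tallest_stories * scenario.height_factor))
    floor_area = lot_sqft * scenario.lot_coverage * floors
    residential_area = floor_area * (1 - scenario.reserved_share)
    return math.floor(residential_area / scenario.unit_sqft)
```

Under these assumptions, a 40,500 square foot lot near a 12-story building yields 170 units in the conservative scenario and 464 in the aggressive one.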

These estimates can translate into hundreds of thousands of new housing units around stations, especially those surrounded by large amounts of surface parking and existing high-rise buildings. It is unlikely that any station area will be entirely redeveloped to the parameters used here. Rather, the scenarios illustrate the potential for context-sensitive development near transit and how future development may vary depending on the surrounding built environment.

Please contact connect@transitdiscoveries.com for additional questions about this methodology or to discuss modeling different scenarios based on additional or new parameters.

Estimating Housing Across Multiple Stations

Atlas users also have access to data that estimates the total number of housing units that can be built across multiple stations, such as a transit line, a transit system, ZIP code, city, county, or state. The methodology for these housing estimates is the same as described above, except that the calculations pro-rate the amount of parking over 2,000 square feet associated with stations that are located within 1/2 mile of one another to avoid double counting. Approximately 2,700 out of 4,900 stations are located within 1/2 mile of another station. In these instances, the Transit-Oriented Discoveries algorithm identifies the number of other stations within 1/2 mile of the reference station and pro-rates the parking associated with the reference station across the reference station and those neighbors. For example, if a reference station is located within 1/2 mile of two other stations, that station’s total parking square footage is assigned a coefficient of 0.333 (one of three stations), and the amount of housing is based on the new (smaller) parking area.
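The pro-rating coefficient in the example works out to 1 / (number of other stations + 1). A minimal sketch follows, assuming straight-line (great-circle) distance between station coordinates; the function names and the input layout are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000
HALF_MILE_M = 804.672


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def parking_coefficients(stations, radius_m=HALF_MILE_M):
    """Map each station id to 1 / (neighbors within the radius + 1).

    `stations` maps a station id to a (lat, lon) tuple.
    """
    coefficients = {}
    for sid, (lat, lon) in stations.items():
        neighbors = sum(
            1
            for other, (olat, olon) in stations.items()
            if other != sid and haversine_m(lat, lon, olat, olon) <= radius_m
        )
        coefficients[sid] = 1 / (neighbors + 1)
    return coefficients
```

A station with two neighbors gets a coefficient of about 0.333, so 30,000 square feet of qualifying parking would count as roughly 10,000 square feet in the multi-station total; an isolated station keeps a coefficient of 1.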

Roadway Barrier Detection

The National Transit Station Atlas uses computer vision technology to identify transportation barriers (motorways, arterial roads, and primary roads) that may affect pedestrian access to transit stations, along with comfort and convenience. Computer vision is a form of artificial intelligence that can "see" and interpret images, similar to how humans recognize objects in photographs. The platform retrieves satellite map imagery covering an 800-meter radius around each transit station and applies computer vision algorithms to automatically detect and measure major roadways (such as highways, arterials, and trunk roads). The system calculates the total footprint of these barriers in meters and generates a visualization showing where barriers are located relative to the station and the amount of surface area occupied by each barrier.

Street Network Analysis

The National Transit Station Atlas analyzes the pedestrian street network within an 800-meter radius of each transit station to assess how easy it is to walk in the surrounding area. The platform uses street and intersection data from OpenStreetMap queries for all street segments excluding motorways. Block length—the average distance pedestrians must walk between intersections—is calculated by measuring the geographic distance between consecutive intersection points along each street segment and then averaging these distances across all streets in the analysis area.
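The block length calculation can be sketched as below, assuming each street is available as an ordered list of intersection coordinates and that distances are pooled across all streets before averaging. The function names and input layout are illustrative, not the platform's code.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_block_length_m(streets):
    """Average distance between consecutive intersections across all streets.

    `streets` is a list of streets; each street is an ordered list of
    (lat, lon) intersection points. Returns None if no blocks exist.
    """
    block_lengths = []
    for intersections in streets:
        # Pair each intersection with the next one along the same street.
        for (lat1, lon1), (lat2, lon2) in zip(intersections, intersections[1:]):
            block_lengths.append(haversine_m(lat1, lon1, lat2, lon2))
    if not block_lengths:
        return None
    return sum(block_lengths) / len(block_lengths)
```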

The platform calculates key metrics that urban planners use to evaluate walkability, including the total number of streets and intersections, average block length, and the density of intersections and streets per square kilometer. These measurements are important because areas with shorter blocks, more intersections, and denser street networks generally provide pedestrians with more route choices and shorter walking distances to their destinations.

The platform calculates a walkability score on a 0-100 scale using a weighted composite of five factors. The two primary factors, block length and intersection density, each contribute up to 45 points. Block length measures the average distance between intersections along streets, with shorter blocks (under 75 meters) receiving maximum points because they provide pedestrians with more frequent opportunities to change direction or access buildings. Intersection density, measured as intersections per square mile, rewards areas with more connected street networks, with the highest scores (380+ intersections per square mile) typically found in dense urban grids like Manhattan or downtown San Francisco.

A synergy bonus of up to 10 additional points rewards station areas that score well on both metrics simultaneously, recognizing that truly walkable environments excel on multiple dimensions. Street density contributes up to 5 points based on the number of streets per square kilometer, favoring areas with more route options. Finally, the system applies a connectivity penalty (up to -5 points) for networks with poor connectivity, calculated using the average degree of intersections: areas where streets frequently dead-end rather than forming a connected grid receive lower scores.

Based on the final score, each station receives one of five categories: "Highly Walkable Grid" (92-100 points, reserved for the top 5-10% of stations with exceptional urban form), "Walkable Urban" (82-91 points), "Transitional/Mixed" (65-81 points), "Suburban" (45-64 points), or "Car-Oriented" (below 45 points). While the data represents an 800-meter radius, the Atlas presents the viewer with visuals representing a 400-meter radius from the station in order to reduce API overload.
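The category thresholds can be expressed as a short lookup. In the sketch below, the thresholds come from the text, while the `combine_score` helper is an assumption: because the maximum component values (45 + 45 + 10 + 5) exceed 100, it clamps the total to the 0-100 range, which may not match how the platform handles this.

```python
def combine_score(block_pts, intersection_pts, synergy_pts, street_density_pts,
                  connectivity_penalty):
    """Sum the five components and clamp to 0-100 (clamping is an assumption)."""
    total = (block_pts + intersection_pts + synergy_pts
             + street_density_pts + connectivity_penalty)
    return max(0, min(100, total))


def walkability_category(score):
    """Map a 0-100 walkability score to its documented category label."""
    if score >= 92:
        return "Highly Walkable Grid"
    if score >= 82:
        return "Walkable Urban"
    if score >= 65:
        return "Transitional/Mixed"
    if score >= 45:
        return "Suburban"
    return "Car-Oriented"
```

For example, a station area earning 30 points for block length, 25 for intersection density, 4 for synergy, and 3 for street density, with the full -5 connectivity penalty, totals 57 and falls in the "Suburban" category.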

Walkshed Analysis

The National Transit Gazetteer and Atlas uses isochrone mapping to visualize and measure the geographic area pedestrians can reach from each transit station within specific time intervals. An isochrone (from the Greek words for "equal" and "time") is a line or polygon that connects all points reachable within a given travel time. The platform generates three isochrones for each station representing 5-minute, 10-minute, and 15-minute walking distances using the Mapbox Isochrone API, which calculates these areas based on the actual street network, sidewalk connectivity, and a standard walking speed of approximately 3 miles per hour (4.8 kilometers per hour).

Unlike a simple circular buffer that assumes people can walk in any direction, isochrones account for real-world constraints such as highways that cannot be crossed, rivers without pedestrian bridges, dead-end streets, and gaps in the sidewalk network. The platform measures the total area (in square miles and acres) that falls within each time band and calculates an "efficiency" percentage by comparing the actual 15-minute walkshed area to the theoretical maximum area achievable if pedestrians could walk in straight lines in all directions.

Stations are categorized as "Well Connected" (60%+ efficiency), "Moderately Constrained" (40-59% efficiency), or "Highly Constrained" (below 40% efficiency) based on how much physical barriers and street network gaps limit pedestrian access. This analysis helps identify which stations serve larger or smaller catchment areas and reveals where infrastructure improvements could expand the number of residents and destinations within comfortable walking distance of transit.
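The efficiency calculation can be sketched as follows. It assumes the theoretical maximum is a circle whose radius is walking speed multiplied by time (0.75 miles for 15 minutes at 3 miles per hour); the function names are hypothetical, and the isochrone area itself would come from the Mapbox response rather than from this code.

```python
import math

WALK_SPEED_MPH = 3.0
ACRES_PER_SQ_MILE = 640


def theoretical_area_sq_mi(minutes, speed_mph=WALK_SPEED_MPH):
    """Area of a circle reachable by walking in a straight line in any direction."""
    radius_mi = speed_mph * minutes / 60
    return math.pi * radius_mi ** 2


def walkshed_efficiency(actual_area_sq_mi, minutes=15):
    """Actual walkshed area as a percentage of the theoretical maximum."""
    return 100 * actual_area_sq_mi / theoretical_area_sq_mi(minutes)


def walkshed_category(efficiency_pct):
    """Map an efficiency percentage to its documented category label."""
    if efficiency_pct >= 60:
        return "Well Connected"
    if efficiency_pct >= 40:
        return "Moderately Constrained"
    return "Highly Constrained"
```

Under this assumption the 15-minute theoretical maximum is about 1.77 square miles (roughly 1,131 acres), so a station with a 0.9 square mile walkshed would score about 51% and be labeled "Moderately Constrained".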

Pedestrian and Bicycle Infrastructure

The National Transit Gazetteer and Atlas evaluates the quality and extent of pedestrian and cycling infrastructure within an 800-meter radius of each transit station. The system queries OpenStreetMap for street segments tagged as pedestrian facilities, such as "footway" (sidewalks along streets), "pedestrian" (pedestrian-only zones), and "path" (multi-use paths). For cycling infrastructure, the platform identifies dedicated cycleways (bike lanes separated from vehicle traffic) and streets tagged with bicycle="yes" or bicycle="designated", indicating they are designed to safely accommodate cyclists. The platform calculates the total number of street segments in the transit station area, excluding motorways, and identifies the number and percentage of segments that include a sidewalk and/or bike infrastructure. While the data represents an 800-meter radius, the Atlas presents the viewer with visuals representing a 400-meter radius from the station in order to reduce API overload.
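A minimal sketch of the segment classification follows, working on tag dictionaries such as those returned by an Overpass query. The tag values match those named above; the function names and the exact way a segment is counted are assumptions for illustration.

```python
PEDESTRIAN_HIGHWAYS = {"footway", "pedestrian", "path"}
BICYCLE_ALLOWED = {"yes", "designated"}


def is_pedestrian(tags):
    """True when the segment is tagged as a pedestrian facility."""
    return tags.get("highway") in PEDESTRIAN_HIGHWAYS


def is_cycling(tags):
    """True for dedicated cycleways or streets tagged as bicycle-friendly."""
    return tags.get("highway") == "cycleway" or tags.get("bicycle") in BICYCLE_ALLOWED


def infrastructure_share(segments):
    """Count and percentage of non-motorway segments with each facility type."""
    considered = [t for t in segments if t.get("highway") != "motorway"]
    total = len(considered)
    ped = sum(1 for t in considered if is_pedestrian(t))
    bike = sum(1 for t in considered if is_cycling(t))
    return {
        "total_segments": total,
        "pedestrian_segments": ped,
        "pedestrian_pct": 100 * ped / total if total else 0.0,
        "cycling_segments": bike,
        "cycling_pct": 100 * bike / total if total else 0.0,
    }
```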

Accessing the Underlying Data and Data Dictionary

Additional methodological notes are included on each page of the Atlas, and the underlying data can be downloaded in CSV format from the “Data” tab.

Future Updates

Transit-Oriented Discoveries is a work in progress, and we plan to add functionality based on user experience and feedback. Planned features include transportation network data and analytics to determine the extent to which transit areas are safe and attractive for pedestrians and cyclists; information on additional land uses, such as residential, commercial, retail, and industrial development; data on points of interest and civic infrastructure such as schools, hospitals, and entertainment near stations; and demographic information. Feel free to reach out to connect@transitdiscoveries.com with suggestions for new features.