This study simulates park-and-ride trips to Midtown Station in Atlanta by combining driving from Census Tract centroids to the nearest MARTA rail station with subsequent rail travel. Using GTFS transit data, OpenStreetMap road networks, and American Community Survey demographic data, the analysis quantifies travel times and explores how transit accessibility varies across neighborhoods. The objective is to identify spatial and socioeconomic patterns that influence commute times, highlighting areas with longer or shorter travel durations and potential disparities in transit access. Such insights can inform urban planning decisions, promote equitable transit solutions, and support strategies for improving multimodal connectivity.
Importing the necessary packages is part of this assignment. Add any required packages to the code chunk below as you progress through the tasks.
library(tidycensus)
library(tidytransit)
library(sf)
library(tmap)
library(tidyverse)
library(osmdata)
library(sfnetworks)
library(tidygraph)
library(gtfsrouter)
library(units)
library(ggplot2)
# TASK ////////////////////////////////////////////////////////////////////////
# Download MARTA GTFS data and assign it to `gtfs` object
gtfs <- read_gtfs("https://www.itsmarta.com/google_transit_feed/google_transit.zip")
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
filter(dist_max > 200)
gtfs$stops <- gtfs$stops %>%
group_by(stop_name) %>%
mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
TRUE ~ stop_name))
# Create a transfer table
gtfs <- gtfsrouter::gtfs_transfer_table(gtfs,
d_limit = 200,
min_transfer_time = 120)
# NOTE: Converting to sf format uses stop_lat and stop_lon columns contained in gtfs$stops.
# In the conversion process, stop_lat and stop_lon are converted into a geometry column, and
# the output sf object do not have the lat lon column anymore.
# But many other functions in tidytransit look for stop_lat and stop_lon.
# So I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = 4326)
gtfs$stops <- gtfs$stops %>%
ungroup() %>%
mutate(stop_lat = st_coordinates(.)[,2],
stop_lon = st_coordinates(.)[,1])
# Get stop_id for rails and buses
rail_stops <- gtfs$routes %>%
filter(route_type %in% c(1)) %>%
inner_join(gtfs$trips, by = "route_id") %>%
inner_join(gtfs$stop_times, by = "trip_id") %>%
inner_join(gtfs$stops, by = "stop_id") %>%
group_by(stop_id) %>%
slice(1) %>%
pull(stop_id)
# Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)
# Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")
# Create a bounding box to which we limit our analysis
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101),
crs = st_crs(4326)) %>%
st_as_sfc()
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
census_api_key(Sys.getenv("CENSUS_API"))
# TASK ////////////////////////////////////////////////////////////////////////
# Required data from the Census ACS:
# 1) Median Household Income (name the column `hhinc`)
# 2) Minority Population (%) (name the column `pct_minority`)
# Note: You may need to download two or more Census ACS variables to calculate minority population (%). "Minority" here can refer to either racial minorities or racial+ethnic minorities -- it's your choice.
census <- get_acs(
geography = "tract", state="GA",
county=c("Fulton","Dekalb","Clayton"),
variables = c(hhinc="B19013_001",
non_hispanic_white="B03002_003E",
race_ethnic_total="B03002_001E"),
year=2022, survey="acs5", geometry=TRUE, output="wide"
) %>% mutate(pct_minority = (race_ethnic_total - non_hispanic_white)/race_ethnic_total)
# //TASK //////////////////////////////////////////////////////////////////////
# Convert the CRS of `census` to GCS
census <- st_transform(census, 4326)
# //TASK //////////////////////////////////////////////////////////////////////
# get centroids of `census` polygons,
# extract centroids that fall inside the `bbox`,
# and assign it to `home` object.
home <- census %>% st_centroid() %>% st_intersection(bbox)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM road data for all road types for the `bbox` area.
# 3. Convert the OSM data into an sf object
# 4. Convert osmdata polygons into lines
osm_road <- opq(bbox=bbox) %>%
add_osm_feature("highway", value=c("motorway","motorway_link","trunk","trunk_link",
"primary","primary_link","secondary","secondary_link",
"tertiary","residential")) %>%
osmdata_sf() %>% osm_poly2line()
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_line into sfnetwork
# 2. Activate edges
# 3. Clean the network using the methods we learned
# 4. Assign the cleaned network to an object named 'osm'
osm <- osm <- osm_road$osm_line %>%
select(geometry) %>%
sfnetworks::as_sfnetwork(directed=FALSE) %>%
activate("edges") %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(to_spatial_subdivision) %>%
convert(to_spatial_smooth)
# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column named 'length' to the edges part of the object `osm`.
osm <- osm %>% mutate(length=edge_length())
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest paths from `home_1` to all other stations
paths <- st_network_paths(osm, from=home_1, to=station, type="shortest")
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get shortest network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
osm %>%
activate("nodes") %>%
slice(paths$node_paths[[x]]) %>%
st_as_sf("edges") %>%
pull(length) %>%
sum()
}) %>% unlist()
# Replace zeros with a large value.
if (any(dist_all == 0)){
dist_all[dist_all == 0] <- max(dist_all)
}
# TASK ////////////////////////////////////////////////////////////////////////
# Find the closest station and assign the value to `closest_station`
closest_index <- which.min(dist_all)
closest_station <- station[closest_index,]
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Get the distance to the closest station.
closest_dist <- min(dist_all)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Calculate travel time based on the `closest_dist` assuming we drive at 20 miles/hour speed.
# Assign the value to `trvt_osm_m` object.
car_speed <- set_units(20, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>% as.numeric()
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Create a subset of stop_times data table for date = 2025-11-10,
# minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs, "2025-11-10","07:00:00","10:00:00")
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Calculate travel times from the `closest_station` to all other stations
# during time specified in `am_stop_time`. Allow ONE transfer.
# 2. Filter the row where the destination is Midtown station.
# 3. Assign it to `trvt` object.
trvt <- travel_times(am_stop_time, closest_station, max_transfers=1) %>%
filter(to_stop_name==midtown$stop_name)
if(nrow(trvt)==0) trvt_gtfs_m <- 0 else trvt_gtfs_m <- trvt$travel_time/60
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60
# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================
# Function definition (do not modify other parts of the code in this code chunk except for those inside the TASK section)
get_trvt <- function(home, osm, station, midtown){
# TASK ////////////////////////////////////////
# If the code in Step 4 runs fine,
# Replace where it says **YOUR CODE HERE..** below with
# the ENTIRETY of the previous code chunk (i.e., Step 4)
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest paths from `home_1` to all other stations
paths <- st_network_paths(
osm,
from = home_1,
to = station,
type = "shortest"
)
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get shortest network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
osm %>%
activate("nodes") %>%
slice(paths$node_paths[[x]]) %>%
st_as_sf("edges") %>%
pull(length) %>%
sum()
}) %>% unlist()
# Replace zeros with a large value.
if (any(dist_all == 0)){
dist_all[dist_all == 0] <- max(dist_all)
}
# Find the closest station and assign the value to `closest_station`
closest_index <- which.min(dist_all)
closest_station <- station[closest_index,]
# Get the distance to the closest station.
closest_dist <- min(dist_all)
car_speed <- set_units(20, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%
as.vector(.)
# TASK ////////////////////////////////////////////////////////////////////////
# Create a subset of stop_times data table for date = 2025-11-10,
# minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs, "2025-11-10", "07:00:00", "10:00:00" )
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Calculate travel times from the `closest_station` to all other stations
# during time specified in `am_stop_time`. Allow ONE transfer.
# 2. Filter the row where the destination is Midtown station.
# 3. Assign it to `trvt` object.
trvt <- travel_times(am_stop_time,
closest_station,
max_transfers = 1) %>%
filter(to_stop_name == midtown$stop_name)
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60
# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
# //TASK //////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
if (length(total_trvt) == 0) {total_trvt = 0}
return(total_trvt)
# =========== NO MODIFY ZONE ENDS HERE ========================================
}
# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))
# Apply the function to all Census Tracts
# Fill `total_trvt` object with the calculated time
for (i in 1:nrow(home)){
total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}
## Stop distance check took longer than 1 second (1.1s). Set stop_dist_check=FALSE to skip it.
## Stop distance check took longer than 1 second (1s). Set stop_dist_check=FALSE to skip it.
# cbind the calculated travel time to `home`
home <- home %>%
cbind(trvt = total_trvt)
Create two maps and two plots by following the instructions in the code chunk below. Write a brief (maximum 200 words) description summarizing your observations from the maps and plots
# TASK ////////////////////////////////////////
# Create an interactive map displaying `census` (polygons) and `home` (points), effectively visualizing household income and travel time to Midtown Station, respectively.
tmap_mode("view")
## ℹ tmap modes "plot" - "view"
## ℹ toggle with `tmap::ttm()`
# Household income vs travel time
tm_shape(census[census$GEOID %in% home$GEOID,]) +
tm_polygons("hhincE", palette="Blues") +
tm_shape(home) + tm_dots("trvt", palette="Reds", size=0.5)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_tm_polygons()`: migrate the argument(s) related to the scale of
## the visual variable `fill` namely 'palette' (rename to 'values') to fill.scale
## = tm_scale(<HERE>).[cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
# Create an interactive map displaying `census` (polygons) and `home` (points) effectively visualizing the percentage of minority population and travel time to Midtown Station, respectively.
tm_shape(census[census$GEOID %in% home$GEOID,]) +
tm_polygons("pct_minority", palette="PuRd") +
tm_shape(home) + tm_dots("trvt", palette="Blues", size=0.5)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_tm_polygons()`: migrate the argument(s) related to the scale of
## the visual variable `fill` namely 'palette' (rename to 'values') to fill.scale
## = tm_scale(<HERE>).[cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "PuRd" is named
## "brewer.pu_rd"Multiple palettes called "pu_rd" found: "brewer.pu_rd", "matplotlib.pu_rd". The first one, "brewer.pu_rd", is returned.
# Create a scatter plot with a trend line showing the relationship between household income (x-axis) and travel time to Midtown Station (y-axis).
ggplot(home, aes(x=hhincE, y=trvt)) + geom_point() + geom_smooth(method="lm", se=FALSE) +
labs(x="Median Household Income", y="Travel Time (min)") + theme_bw()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
# Create a scatter plot with a trend line showing the relationship between the percentage of minority population (x-axis) and travel time to Midtown Station (y-axis).
# Minority population vs travel time
ggplot(home, aes(x=pct_minority, y=trvt)) + geom_point() + geom_smooth(method="lm", se=FALSE) +
labs(x="Percent Minority", y="Travel Time (min)") + theme_bw()
## `geom_smooth()` using formula = 'y ~ x'
#//TASK //////////////////////////////////////
The simulation reveals notable spatial variations in travel time. Northern and northeastern tracts generally experience longer commutes due to their distance from MARTA stations, whereas southern and central neighborhoods benefit from closer proximity to transit infrastructure. Wealthier areas show slightly longer travel times, reflecting suburban locations with limited rail access, while tracts with higher percentages of minority residents tend to have shorter travel times, likely due to stronger reliance on public transit and better connectivity to transit corridors. Scatter plots indicate a modest positive correlation between household income and commute duration, and minimal correlation between minority population share and travel time, suggesting that geographic location is the primary driver of accessibility differences. Overall, the findings highlight a socio-spatial divide: northern, wealthier, and less diverse neighborhoods face longer travel times, while southern, lower-income, and more diverse communities enjoy faster transit access, reflecting Atlanta’s historic urban development and transit patterns.