In this assignment, you will simulate a journey that begins at a Census Tract centroid, drives to the nearest MARTA station, and then take the MARTA to the Midtown station. You will then compare how travel time varies by different Census Tracts. The steps and data required for this analysis are outlined below.
Step 1. Download the required GTFS data. Convert it to sf format, extract MARTA rail stations, and clean the stop names to remove duplicates. Also, extract the destination station.
Step 2. Download the required Census data. Convert Census Tract polygons into centroids and create a subset for analysis.
Step 3. Download the required OSM data. Convert it into an sfnetwork object and clean the network.
Step 4. Simulate a park-and-ride trip (from a test origin → closest station → Midtown station).
Step 5. Turn the simulation process from Step 4 into a reusable function.
Step 6. Apply the function from Step 5 iteratively to all home locations.
Step 7.Finally, create maps and plots to analyze whether there are any disparities in transit accessibility when commuting to Midtown.
Importing the necessary packages is part of this assignment. Add any required packages to the code chunk below as you progress through the tasks.
#For Census data & geometry task
library(tidycensus)
library(sf)
library(dplyr)
library(tidyr)
library(nominatimlite)
#For GTFS (transit schedule) analysis task
library(tidytransit)
library(gtfsrouter)
#For OpenStreetMap and routing task
library(osmdata)
library(sfnetworks)
library(tidygraph)
#For visualization task
library(tmap)
library(leaflet)
library(ggplot2)
library(purrr)
#For handling measurement units (miles, meters, etc.)
library(units)
# TASK ////////////////////////////////////////////////////////////////////////
#Download MARTA GTFS data and assign it to `gtfs` object
gtfs <- read_gtfs("https://www.itsmarta.com/google_transit_feed/google_transit.zip")
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
filter(dist_max > 200)
gtfs$stops <- gtfs$stops %>%
group_by(stop_name) %>%
mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
TRUE ~ stop_name))
#Create a transfer table
gtfs <- gtfsrouter::gtfs_transfer_table(gtfs,
d_limit = 200,
min_transfer_time = 120)
# NOTE: Converting to sf format uses stop_lat and stop_lon columns contained in gtfs$stops.
# In the conversion process, stop_lat and stop_lon are converted into a geometry column, and
# the output sf object do not have the lat lon column anymore.
# But many other functions in tidytransit look for stop_lat and stop_lon.
# So I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = 4326)
gtfs$stops <- gtfs$stops %>%
ungroup() %>%
mutate(stop_lat = st_coordinates(.)[,2],
stop_lon = st_coordinates(.)[,1])
#Get stop_id for rails and buses
rail_stops <- gtfs$routes %>%
filter(route_type %in% c(1)) %>%
inner_join(gtfs$trips, by = "route_id") %>%
inner_join(gtfs$stop_times, by = "trip_id") %>%
inner_join(gtfs$stops, by = "stop_id") %>%
group_by(stop_id) %>%
slice(1) %>%
pull(stop_id)
#Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)
#Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")
#Create a bounding box
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101),
crs = st_crs(4326)) %>%
st_as_sfc()
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# Specify Census API key whichever you prefer using census_api_key() function
census_api_key("877289a17bb63edb87bc6cdad7415c6fbd1d4189", install = FALSE)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Using get_acs() function, download Census Tract level data for 2022 for Fulton, DeKalb, and Clayton in GA.
# and assign it to `census` object.
# Make sure you set geometry = TRUE.
# Required data from the Census ACS:
# 1) Median Household Income (name the column `hhinc`)
# 2) Minority Population (%) (name the column `pct_minority`)
# Note: You may need to download two or more Census ACS variables to calculate minority population (%). "Minority" here can refer to either racial minorities or racial+ethnic minorities -- it's your choice.
census_api_key("877289a17bb63edb87bc6cdad7415c6fbd1d4189", install = FALSE)
#Median Household Income
income <- get_acs(
geography = "tract",
state = "GA",
county = c("Fulton", "DeKalb", "Clayton"),
variables = c(hhinc = "B19013_001"),
year = 2022,
geometry = TRUE
)
## | | | 0% | |= | 1% | |= | 2% | |== | 3% | |=== | 4% | |=== | 5% | |==== | 5% | |==== | 6% | |===== | 7% | |====== | 8% | |====== | 9% | |======= | 11% | |======== | 11% | |======== | 12% | |========= | 13% | |========== | 14% | |========== | 15% | |=========== | 16% | |============ | 16% | |============ | 17% | |============= | 18% | |============= | 19% | |============== | 19% | |============== | 20% | |=============== | 21% | |=============== | 22% | |================ | 22% | |================ | 23% | |================= | 24% | |================= | 25% | |================== | 25% | |================== | 26% | |=================== | 27% | |==================== | 28% | |==================== | 29% | |===================== | 30% | |====================== | 31% | |====================== | 32% | |======================= | 33% | |======================== | 34% | |======================== | 35% | |========================= | 36% | |========================== | 37% | |========================== | 38% | |=========================== | 39% | |============================ | 40% | |============================= | 41% | |============================== | 42% | |============================== | 43% | |=============================== | 44% | |================================ | 45% | |================================ | 46% | |================================= | 47% | |================================== | 48% | |================================== | 49% | |=================================== | 50% | |==================================== | 51% | |==================================== | 52% | |===================================== | 53% | |====================================== | 54% | |====================================== | 55% | |======================================= | 55% | |======================================= | 56% | |======================================== | 57% | |======================================== | 58% | |========================================= | 58% | |========================================= | 59% | |========================================== | 60% | |========================================== | 61% | |=========================================== | 61% | |=========================================== | 62% | |============================================ | 63% | |============================================= | 64% | |============================================== | 65% | |============================================== | 66% | |=============================================== | 67% | |================================================ | 68% | |================================================ | 69% | |================================================= | 69% | |================================================= | 70% | |================================================== | 71% | |================================================== | 72% | |=================================================== | 72% | |=================================================== | 73% | |==================================================== | 74% | |==================================================== | 75% | |===================================================== | 75% | |===================================================== | 76% | |====================================================== | 77% | |====================================================== | 78% | |======================================================= | 78% | |======================================================= | 79% | |======================================================== | 80% | |======================================================== | 81% | |========================================================= | 81% | |========================================================= | 82% | |========================================================== | 83% | |=========================================================== | 84% | |=========================================================== | 85% | |============================================================ | 86% | |============================================================= | 86% | |============================================================= | 87% | |============================================================== | 88% | |============================================================== | 89% | |=============================================================== | 89% | |=============================================================== | 90% | |================================================================ | 91% | |================================================================ | 92% | |================================================================= | 92% | |================================================================= | 93% | |================================================================== | 94% | |================================================================== | 95% | |=================================================================== | 95% | |=================================================================== | 96% | |==================================================================== | 97% | |===================================================================== | 98% | |===================================================================== | 99% | |======================================================================| 100%
#Race variables (Total and White Alone)
race <- get_acs(
geography = "tract",
state = "GA",
county = c("Fulton", "DeKalb", "Clayton"),
variables = c(total = "B02001_001", white = "B02001_002"),
year = 2022,
geometry = FALSE
)
#Step 1: Reshape race data into wide format
race_wide <- race %>%
select(GEOID, variable, estimate) %>%
tidyr::spread(variable, estimate)
#Step 2: Calculate % minority
race_wide <- race_wide %>%
mutate(pct_minority = 100 * (1 - white / total)) %>%
select(GEOID, pct_minority)
#Step 3: Join back with income
census <- income %>%
select(GEOID, hhinc = estimate, geometry) %>%
left_join(race_wide, by = "GEOID")
#Step 4: Check names and values
names(census)
## [1] "GEOID" "hhinc" "pct_minority" "geometry"
summary(census$pct_minority)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.9512 37.4959 71.5204 64.7885 93.1448 100.0000 3
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Convert the CRS of `census` to GCS
census <- census %>%
st_transform(crs = 4326)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# get centroids of `census` polygons,
# extract centroids that fall inside the `bbox`,
# and assign it to `home` object.
#Get bounding box coordinates for Atlanta (From in-class module)
bb <- nominatimlite::geo_lite_sf('Atlanta, GA', points_only = FALSE) %>%
st_bbox()
bb_sf <- bb %>% st_as_sfc()
home <- census %>%
st_centroid() %>%
st_intersection(bbox)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM road data for all road types for the `bbox` area.
# 3. Convert the OSM data into an sf object
# 4. Convert osmdata polygons into lines
osm_road <- opq(bbox = bbox) %>%
add_osm_feature(
key = "highway",
value = c(
"motorway", "motorway_link",
"trunk", "trunk_link",
"primary", "primary_link",
"secondary", "secondary_link",
"tertiary", "residential"
)
) %>%
osmdata_sf() %>%
osm_poly2line()
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_line into sfnetwork
# 2. Activate edges
# 3. Clean the network using the methods we learned
# 4. Assign the cleaned network to an object named 'osm'
osm <- osm_road$osm_line %>%
select(geometry)%>%
sfnetworks::as_sfnetwork(directed = FALSE) %>%
activate("edges") %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(sfnetworks::to_spatial_subdivision) %>%
convert(sfnetworks::to_spatial_smooth)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column named 'length' to the edges part of the object `osm`.
osm <- osm %>%
activate("edges") %>%
mutate(length = edge_length())
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest paths from `home_1` to all other stations
paths <- sfnetworks::st_network_paths(
x = osm,
from = home_1,
to = station,
type = "shortest"
)
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get shortest network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
osm %>%
activate("nodes") %>%
slice(paths$node_paths[[x]]) %>%
st_as_sf("edges") %>%
pull(length) %>%
sum()
}) %>% unlist()
# Replace zeros with a large value.
if (any(dist_all == 0)){
dist_all[dist_all == 0] <- max(dist_all)
}
# TASK ////////////////////////////////////////////////////////////////////////
# Find the closest station and assign the value to `closest_station`
dist_all <- sapply(paths$edge_paths, function(x) {
osm %>%
activate("edges") %>%
slice(x) %>%
st_as_sf() %>%
pull(length) %>%
sum()
})
closest_station <- which.min(dist_all)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Get the distance to the closest station.
closest_dist <- dist_all[closest_station]
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Calculate travel time based on the `closest_dist` assuming we drive at 20 miles/hour speed.
# Assign the value to `trvt_osm_m` object.
#1 mile = 1609.34 meters
trvt_osm_m <- (closest_dist / 1609.34) / 20 * 60
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Create a subset of stop_times data table for date = 2025-11-10,
# minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- tidytransit::filter_stop_times(
gtfs_obj = gtfs,
extract_date = "2025-11-10",
min_departure_time = 7 * 3600,
max_arrival_time = 10 * 3600
)
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Calculate travel times from the `closest_station` to all other stations
# during time specified in `am_stop_time`. Allow ONE transfer.
# 2. Filter the row where the destination is Midtown station.
# 3. Assign it to `trvt` object.
trvt <- tidytransit::travel_times(
filtered_stop_times = am_stop_time,
stop_name = station$stop_name[closest_station],
max_transfers = 1,
return_coords = TRUE
) %>%
dplyr::filter(to_stop_name == "MIDTOWN STATION")
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60
# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================
# Function definition (do not modify other parts of the code in this code chunk except for those inside the TASK section)
get_trvt <- function(home, osm, station, midtown){
# TASK ////////////////////////////////////////
# If the code in Step 4 runs fine,
# Replace where it says **YOUR CODE HERE..** below with
# the ENTIRETY of the previous code chunk (i.e., Step 4)
home_1 <- home[1,]
#Re-paste code from step 4 below (works)
#Find the shortest paths from home_1 to all other stations
paths <- sfnetworks::st_network_paths(
x = osm,
from = home_1,
to = station,
type = "shortest"
)
#Compute total distance for each path
dist_all <- sapply(paths$edge_paths, function(x) {
osm %>%
activate("edges") %>%
slice(x) %>%
st_as_sf() %>%
pull(length) %>%
sum()
})
#Find the closest station
closest_station <- which.min(dist_all)
#Get distance to the closest station
closest_dist <- dist_all[closest_station]
#Calculate driving travel time (20 mph → minutes)
trvt_osm_m <- (closest_dist / 1609.34) / 20 * 60
#Create subset of stop_times (7–10AM)
am_stop_time <- tidytransit::filter_stop_times(
gtfs_obj = gtfs,
extract_date = "2025-11-10",
min_departure_time = 7 * 3600,
max_arrival_time = 10 * 3600
)
#Calculate MARTA travel time from the closest station to Midtown
trvt <- tidytransit::travel_times(
filtered_stop_times = am_stop_time,
stop_name = station$stop_name[closest_station],
max_transfers = 1,
return_coords = TRUE
) %>%
dplyr::filter(to_stop_name == "MIDTOWN STATION")
#Combine driving and transit travel time
trvt_gtfs_m <- trvt$travel_time / 60
total_trvt <- trvt_osm_m + trvt_gtfs_m
# //TASK //////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
if (length(total_trvt) == 0) {total_trvt = 0}
return(total_trvt)
# =========== NO MODIFY ZONE ENDS HERE ========================================
}
# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))
# Fill `total_trvt` object with the calculated time
for (i in 1:nrow(home)){
total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}
# cbind the calculated travel time to `home`
home <- home %>%
cbind(trvt = total_trvt)
#Code kept taking a long time so pasted this in:
options(gtfsrouter.stop_dist_check = FALSE)
#Join home points with census polygons to bring in minority data
home_minority <- st_join(home, census %>% select(GEOID, hhinc, pct_minority))
Create two maps and two plots by following the instructions in the code chunk below. Write a brief (maximum 200 words) description summarizing your observations from the maps and plots
# TASK ////////////////////////////////////////
#Map 1: Create an interactive map displaying `census` (polygons) and `home` (points), effectively visualizing household income and travel time to Midtown Station, respectively.
library(dplyr)
library(tidyverse)
census <- census %>%
mutate(hhinc = as.numeric(hhinc))
home <- home %>%
mutate(trvt = as.numeric(trvt))
#Define color palettes
pal_income <- leaflet::colorNumeric(palette = "YlOrRd", domain = census$hhinc, na.color = "transparent")
pal_trvt <- leaflet::colorNumeric(palette = "Blues", domain = home$trvt, na.color = "transparent")
#Create the leaflet map object
map_income_trvt <- leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
setView(lng = -84.3854, lat = 33.7618, zoom = 11) %>%
#Add census polygons (colored by median household income)
addPolygons(
data = census,
color = "white",
weight = 0.5,
fillColor = ~pal_income(hhinc),
fillOpacity = 0.7,
popup = ~paste0(
"<b>Census Tract: </b>", GEOID, "<br>",
"<b>Median HH Income: </b>$", round(hhinc, 0)
)
) %>%
#Add home centroids (colored by travel time-light blue theme)
addCircles(
data = home,
fillColor = ~pal_trvt(trvt),
color = "lightblue",
stroke = FALSE,
fillOpacity = 0.8,
radius = 70,
popup = ~paste0(
"<b>Travel Time: </b>", round(trvt, 1), " minutes"
)
) %>%
#Add legends
addLegend(
position = "bottomright",
pal = pal_income,
values = census$hhinc,
title = "Median HH Income"
) %>%
addLegend(
position = "bottomleft",
pal = pal_trvt,
values = home$trvt,
title = "Travel Time (min)"
)
#Print leaflet map (per in-class module)
map_income_trvt
htmlwidgets::saveWidget(map_income_trvt, "map_income_trvt.html", selfcontained = TRUE)
browseURL("map_income_trvt.html")
#Map 2: Create an interactive map displaying `census` (polygons) and `home` (points) effectively visualizing the percentage of minority population and travel time to Midtown Station, respectively.
names(census)
## [1] "GEOID" "hhinc" "pct_minority" "geometry"
summary(census$pct_minority)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.9512 37.4959 71.5204 64.7885 93.1448 100.0000 3
census <- census %>%
mutate(pct_minority = as.numeric(pct_minority))
home <- home %>%
mutate(trvt = as.numeric(trvt))
pal_minority <- leaflet::colorNumeric(palette = "Purples", domain = census$pct_minority, na.color = "transparent")
pal_trvt <- leaflet::colorNumeric(palette = "Blues", domain = home$trvt, na.color = "transparent")
#Build map
map_minority_trvt <- leaflet() %>%
addProviderTiles(providers$CartoDB.DarkMatter) %>%
setView(lng = -84.3854, lat = 33.7618, zoom = 11) %>%
#Add census polygons (shaded by % minority)
addPolygons(
data = census,
color = "white",
weight = 0.5,
fillColor = ~pal_minority(pct_minority),
fillOpacity = 0.7,
popup = ~paste0(
"<b>Census Tract: </b>", GEOID, "<br>",
"<b>Minority Population: </b>", round(pct_minority, 1), "%"
)
) %>%
#Add home centroids (colored by travel time)
addCircles(
data = home,
fillColor = ~pal_trvt(trvt),
color = "lightblue",
stroke = FALSE,
fillOpacity = 0.8,
radius = 70,
popup = ~paste0("<b>Travel Time: </b>", round(trvt, 1), " minutes")
) %>%
#Add legends
addLegend(
position = "bottomright",
pal = pal_minority,
values = census$pct_minority,
title = "% Minority Population"
) %>%
addLegend(
position = "bottomleft",
pal = pal_trvt,
values = home$trvt,
title = "Travel Time (min)"
)
#Display map in browser to double check
htmlwidgets::saveWidget(map_minority_trvt, "map_minority_trvt.html", selfcontained = TRUE)
browseURL("map_minority_trvt.html")
map_minority_trvt
#Plot 1: Create a scatter plot with a trend line showing the relationship between household income (x-axis) and travel time to Midtown Station (y-axis).
#Scatter plot: Household income vs Travel time
plot_income_trvt <- ggplot(home, aes(x = hhinc, y = trvt)) +
geom_point(alpha = 0.5, color = "steelblue") +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
labs(
title = "Household Income vs Travel Time to Midtown Station",
x = "Median Household Income ($)",
y = "Travel Time (minutes)"
) +
theme_minimal(base_size = 12)
#Print the plot
print(plot_income_trvt)
#Plot 2: Create a scatter plot with a trend line showing the relationship between the percentage of minority population (x-axis) and travel time to Midtown Station (y-axis).
names(home_minority)
## [1] "GEOID.x" "hhinc.x" "pct_minority.x" "trvt"
## [5] "GEOID.y" "hhinc.y" "pct_minority.y" "geometry"
home_minority <- home_minority %>%
mutate(
# keep whichever column version is not NA
GEOID = coalesce(GEOID.x, GEOID.y),
hhinc = coalesce(hhinc.x, hhinc.y),
pct_minority = coalesce(pct_minority.x, pct_minority.y)
) %>%
select(GEOID, hhinc, pct_minority, trvt, geometry)
#Drop geometry and filter missing values
home_minority_df <- home_minority %>%
st_drop_geometry() %>%
filter(!is.na(pct_minority), !is.na(trvt))
#Create scatter plot
plot_minority_trvt <- ggplot(home_minority_df, aes(x = pct_minority, y = trvt)) +
geom_point(alpha = 0.5, color = "purple") +
geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1) +
labs(
title = "% Minority Population vs Travel Time to Midtown Station",
x = "Minority Population (%)",
y = "Travel Time (minutes)"
) +
theme_minimal(base_size = 12)
#Print the plot
print(plot_minority_trvt)
# //TASK //////////////////////////////////////
Summary of findings: Map 1: The first map, “Median Household Income vs Travel Time to Midtown Station,” shows how wealth and access are distributed across Atlanta. Dark red and maroon tracts represent lower-income areas, while tan and gold tracts mark higher-income neighborhoods. The blue circles show estimated travel times to Midtown Station. Wealthier neighborhoods, especially north of downtown and around areas like Buckhead, generally have lighter colors and shorter travel times. In contrast, darker tracts in the south and west, such as near College Park and East Point, have longer travel times, showing a mild income-access gap.
Map 2: The second map, “% Minority Population vs Travel Time to Midtown Station,” uses shades of purple to display racial composition. Dark purple areas indicate higher minority populations, primarily in south and west Atlanta, while light gray areas to the north show lower minority shares. Overlaying travel times suggests that minority-heavy areas experience slightly longer average travel times than predominantly white neighborhoods.
ScatterPlot 1: The third chart, “Household Income vs Travel Time to Midtown Station,” displays a subtle upward trend in the red regression line, meaning that as income increases, travel time slightly rises, though the correlation is weak.
ScatterPlot 2: The fourth chart, “% Minority Population vs Travel Time to Midtown Station,” shows a nearly flat red trend line, suggesting little to no relationship between minority percentage and travel time, implying fairly uniform accessibility across demographic groups and no significant correlation.