Park-and-Ride Simulation

This assignment focuses on simulating a multimodal journey that starts from the centroid of each Census Tract, involves driving to the nearest MARTA station, and continues by taking the MARTA to the Midtown station. The analysis aims to compare how travel times differ across Census Tracts.

R Code Steps

Step 0 - Importing the necessary packages that will be used throughout the workflow

library(tidycensus)
library(tidytransit)
library(sf)
library(tmap)
library(tidyverse)
library(osmdata)
library(sfnetworks)
library(tidygraph)
library(gtfsrouter)
library(units)
library(ggplot2)

Step 1- Download the required GTFS data. Convert it to sf format, extract MARTA rail stations, and clean the stop names to remove duplicates. Also, extract the destination station.

# TASK ////////////////////////////////////////////////////////////////////////
# Download MARTA GTFS data and assign it to `gtfs` object
gtfs <- read_gtfs("https://www.itsmarta.com/google_transit_feed/google_transit.zip")
# //TASK //////////////////////////////////////////////////////////////////////



# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
  filter(dist_max > 200)

gtfs$stops <- gtfs$stops %>% 
  group_by(stop_name) %>% 
  mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
                               TRUE ~ stop_name))

# Create a transfer table
gtfs <- gtfsrouter::gtfs_transfer_table(gtfs, 
                                        d_limit = 200, 
                                        min_transfer_time = 120)

# NOTE: Converting to sf format uses stop_lat and stop_lon columns contained in gtfs$stops.
#       In the conversion process, stop_lat and stop_lon are converted into a geometry column, and
#       the output sf object do not have the lat lon column anymore.
#       But many other functions in tidytransit look for stop_lat and stop_lon.
#       So I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = 4326)

gtfs$stops <- gtfs$stops %>% 
  ungroup() %>% 
  mutate(stop_lat = st_coordinates(.)[,2],
         stop_lon = st_coordinates(.)[,1]) 

# Get stop_id for rails and buses
rail_stops <- gtfs$routes %>% 
  filter(route_type %in% c(1)) %>% 
  inner_join(gtfs$trips, by = "route_id") %>% 
  inner_join(gtfs$stop_times, by = "trip_id") %>% 
  inner_join(gtfs$stops, by = "stop_id") %>% 
  group_by(stop_id) %>% 
  slice(1) %>% 
  pull(stop_id)

# Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)

# Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")

# Create a bounding box to which we limit our analysis
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101), 
                 crs = st_crs(4326)) %>% 
  st_as_sfc()

# =========== NO MODIFY ZONE ENDS HERE ========================================

Step 2- Download the required Census data. Convert Census Tract polygons into centroids and create a subset for analysis.

# TASK ////////////////////////////////////////////////////////////////////////
# Specify Census API key whichever you prefer using census_api_key() function
census_api_key(Sys.getenv('CENSUS_API'))
# //TASK //////////////////////////////////////////////////////////////////////



# TASK ////////////////////////////////////////////////////////////////////////
# Using get_acs() function, download Census Tract level data for 2022 for Fulton, DeKalb, and Clayton in GA.
# and assign it to `census` object.
# Make sure you set geometry = TRUE.

# Required data from the Census ACS:
#  1) Median Household Income (name the column `hhinc`)
#  2) Minority Population (%) (name the column `pct_minority`)
# Note: You may need to download two or more Census ACS variables to calculate minority population (%). "Minority" here can refer to either racial minorities or racial+ethnic minorities -- it's your choice.

census <- get_acs(geography = "tract", 
                      state = "GA",
                      county = c("Fulton","Dekalb","Clayton"),
                      variables = c(hhinc = 'B19013_001',
                                    non_hispanic_white = "B03002_003E",
                                    race_ethnic_total = "B03002_001E"),
                      year = 2022,
                      survey = "acs5",      
                      geometry = TRUE, 
                      output = "wide")%>% 
  mutate(pct_minority = ((race_ethnic_total - non_hispanic_white)/race_ethnic_total)*100)

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Convert the CRS of `census` to GCS

census <- census %>%  st_transform(4326)

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# get centroids of `census` polygons, 
# extract centroids that fall inside the `bbox`, 
# and assign it to `home` object.

home <- census %>%
  st_centroid() %>%
  st_intersection(bbox)

# //TASK //////////////////////////////////////////////////////////////////////

Step 3- Download the required OSM data. Convert it into an sfnetwork object and clean the network.

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM road data for all road types for the `bbox` area.
# 3. Convert the OSM data into an sf object
# 4. Convert osmdata polygons into lines

osm_road <- opq(bbox = bbox) %>%
  add_osm_feature(key = 'highway', 
                  value = c("motorway", "motorway_link",
                            "trunk", "trunk_link", 
                            "primary", "primary_link",
                            "secondary", "secondary_link",
                            "tertiary", "residential")) %>%
  osmdata_sf() %>% 
  osm_poly2line()
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_line into sfnetwork
# 2. Activate edges
# 3. Clean the network using the methods we learned
# 4. Assign the cleaned network to an object named 'osm'

osm <- osm_road$osm_line %>%
  select(geometry)%>%
  sfnetworks::as_sfnetwork(directed = FALSE) %>%
  activate("edges") %>%
  filter(!edge_is_multiple()) %>% 
  filter(!edge_is_loop()) %>%
  convert(sfnetworks::to_spatial_subdivision) %>%
  convert(sfnetworks::to_spatial_smooth)

message(str_c("After simplification, there are ", osm %>% st_as_sf("edges") %>% nrow(), " edges."))
## After simplification, there are 7471 edges.
  # ...
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column named 'length' to the edges part of the object `osm`.
osm <- osm %>% mutate(length = edge_length())

  # ...
# //TASK //////////////////////////////////////////////////////////////////////

Step 4- Simulate a park-and-ride trip (from a test origin → closest station → Midtown station).

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================


# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest paths from `home_1` to all other stations
paths <- st_network_paths(
  osm,
  from = home_1,
  to = station,
  type = "shortest"
)
# //TASK //////////////////////////////////////////////////////////////////////

  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get shortest network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
  osm %>%
    activate("nodes") %>% 
    slice(paths$node_paths[[x]]) %>% 
    st_as_sf("edges") %>% 
    pull(length) %>% 
    sum()
}) %>% unlist()

# Replace zeros with a large value.
if (any(dist_all == 0)){
  dist_all[dist_all == 0] <- max(dist_all)
}

# TASK ////////////////////////////////////////////////////////////////////////
# Find the closest station and assign the value to `closest_station`
closest_index <- which.min(dist_all)
closest_station <- station[closest_index,]
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Get the distance to the closest station.

closest_dist <- min(dist_all)

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Calculate travel time based on the `closest_dist` assuming we drive at 20 miles/hour speed.
# Assign the value to `trvt_osm_m` object.
car_speed <- set_units(20, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  
  as.vector(.)
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Create a subset of stop_times data table for date = 2025-11-10, 
# minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs, "2025-11-10", "07:00:00", "10:00:00" )
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# 1. Calculate travel times from the `closest_station` to all other stations 
#    during time specified in `am_stop_time`. Allow ONE transfer.
# 2. Filter the row where the destination is Midtown station. 
# 3. Assign it to `trvt` object.
trvt <-  travel_times(am_stop_time,
                     closest_station,
                     max_transfers = 1) %>%
        filter(to_stop_name == midtown$stop_name)
# //TASK //////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60

# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================

Step 5- Turn the simulation process from Step 4 into a reusable function.

# Function definition (do not modify other parts of the code in this code chunk except for those inside the TASK section)

get_trvt <- function(home, osm, station, midtown){
  
  # TASK ////////////////////////////////////////
  # If the code in Step 4 runs fine,
  # Replace where it says **YOUR CODE HERE..** below with 
  # the ENTIRETY of the previous code chunk (i.e., Step 4)
  
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================


# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest paths from `home_1` to all other stations
paths <- st_network_paths(
  osm,
  from = home_1,
  to = station,
  type = "shortest"
)
# //TASK //////////////////////////////////////////////////////////////////////

  
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Get shortest network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
  osm %>%
    activate("nodes") %>% 
    slice(paths$node_paths[[x]]) %>% 
    st_as_sf("edges") %>% 
    pull(length) %>% 
    sum()
}) %>% unlist()

# Replace zeros with a large value.
if (any(dist_all == 0)){
  dist_all[dist_all == 0] <- max(dist_all)
}

# TASK ////////////////////////////////////////////////////////////////////////
# Find the closest station and assign the value to `closest_station`
closest_index <- which.min(dist_all)
closest_station <- station[closest_index,]
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Get the distance to the closest station.

closest_dist <- min(dist_all)

# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Calculate travel time based on the `closest_dist` assuming we drive at 20 miles/hour speed.
# Assign the value to `trvt_osm_m` object.
car_speed <- set_units(20, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  
  as.vector(.)
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Create a subset of stop_times data table for date = 2025-11-10, 
# minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs, "2025-11-10", "07:00:00", "10:00:00" )
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# 1. Calculate travel times from the `closest_station` to all other stations 
#    during time specified in `am_stop_time`. Allow ONE transfer.
# 2. Filter the row where the destination is Midtown station. 
# 3. Assign it to `trvt` object.
trvt <-  travel_times(am_stop_time,
                     closest_station,
                     max_transfers = 1) %>%
        filter(to_stop_name == midtown$stop_name)
# //TASK //////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60

# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
  
  # //TASK //////////////////////////////////////

  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  if (length(total_trvt) == 0) {total_trvt = 0}

  return(total_trvt)
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}

Step 6- Apply the function from Step 5 iteratively to all home locations.

# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))

# Apply the function to all Census Tracts
# Fill `total_trvt` object with the calculated time
for (i in 1:nrow(home)){
  total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}

# cbind the calculated travel time to `home`
home <- home %>% 
  cbind(trvt = total_trvt)

Step 7- Finally, create maps and plots to analyze whether there are any disparities in transit accessibility when commuting to Midtown.

# TASK ////////////////////////////////////////
# Create an interactive map displaying `census` (polygons) and `home` (points), effectively visualizing household income and travel time to Midtown Station, respectively.

tmap_mode('view')
## ℹ tmap modes "plot" - "view"
## ℹ toggle with `tmap::ttm()`
tm_shape(census[census$GEOID %in% home$GEOID,]) + 
  tm_polygons(col = "hhincE", palette = 'Blues',
              title = "Median HH Income") + 
  tm_shape(home) + 
  tm_dots(col = "trvt", palette = 'Reds', size = 0.5) +
  tm_layout(title = "Map 1: Median Household Income and Travel Time(trvt) to Midtown Station")
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_tm_polygons()`: migrate the argument(s) related to the scale of
## the visual variable `fill` namely 'palette' (rename to 'values') to fill.scale
## = tm_scale(<HERE>).[v3->v4] `tm_polygons()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').[v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`[cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Reds" is named
## "brewer.reds"Multiple palettes called "reds" found: "brewer.reds", "matplotlib.reds". The first one, "brewer.reds", is returned.
# Create an interactive map displaying `census` (polygons) and `home` (points) effectively visualizing the percentage of minority population and travel time to Midtown Station, respectively.

tm_shape(census[census$GEOID %in% home$GEOID,]) + 
  tm_polygons(col = "pct_minority", palette = 'PuRd',
              title = "Minority Population (%)") + 
  tm_shape(home) + 
  tm_dots(col = "trvt", palette = 'Blues', size = 0.5)+
  tm_layout(title = "Map 2: Percentage of Minority Population and Travel Time(trvt) to Midtown Station")
## 
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_tm_polygons()`: migrate the argument(s) related to the scale of
## the visual variable `fill` namely 'palette' (rename to 'values') to fill.scale
## = tm_scale(<HERE>).[v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`[cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "PuRd" is named
## "brewer.pu_rd"Multiple palettes called "pu_rd" found: "brewer.pu_rd", "matplotlib.pu_rd". The first one, "brewer.pu_rd", is returned.
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Blues" is named
## "brewer.blues"Multiple palettes called "blues" found: "brewer.blues", "matplotlib.blues". The first one, "brewer.blues", is returned.
# Create a scatter plot with a trend line showing the relationship between household income (x-axis) and travel time to Midtown Station (y-axis).

income_plot <- ggplot(data = home,
              aes(x = hhincE, y = trvt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter Plot 1: Travel Time vs. Median Household Income",
       x = "Median Annual Household Income",
       y = "Travel Time from Home to Midtown Station") +
  theme_bw()
income_plot
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).

# Create a scatter plot with a trend line showing the relationship between the percentage of minority population (x-axis) and travel time to Midtown Station (y-axis).

minority_plot <- ggplot(data = home,
                   aes(x = pct_minority, y = trvt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Scatter Plot 2: Travel Time vs. Percentage of Minority Population",
       x = "Percentage of Minority Population",
       y = "Travel Time from Home to Midtown Station") +
  theme_bw()

minority_plot
## `geom_smooth()` using formula = 'y ~ x'

# //TASK //////////////////////////////////////

Inferences

The analysis reveals distinct spatial and socioeconomic patterns in travel time to Midtown Station across Atlanta’s census tracts. Travel times vary notably by location, with longer durations generally observed in the northeastern and some southern neighborhoods. A slight positive relationship between median annual household income and travel time suggests that higher-income areas tend to have longer commutes to Midtown, likely because these neighborhoods are located farther from MARTA stations or within more suburban, car-oriented zones. In contrast, tracts with higher percentages of minority populations, mainly concentrated in the southwestern and south-central parts of the city, exhibit shorter travel times, indicating closer proximity to transit corridors and greater reliance on public transportation. However, the scatter plot comparing travel time with minority population percentage shows no strong correlation, implying that demographic concentration alone does not fully explain accessibility differences.

The results highlight a clear socio-spatial divide. The northern, wealthier, and less diverse neighborhoods experience longer travel times, while southern, lower-income, and more racially diverse areas benefit from better transit access. This reflects the historical development of Atlanta and similar U.S. cities shaped by car-oriented suburban sprawl, where affluent households moved outward, leaving lower-income and minority communities near dense urban cores with stronger public transit connectivity.