In this assignment, I simulated a park-and-ride journey that starts at an origin (e.g., home), drives to the nearest MARTA rail station, transfers to MARTA rail, and arrives at Midtown station. The assignment has four main components: home locations, the road network, the transit network, and the destination.

Steps

Step 1. Download the required data from GTFS. Convert it to sf format, extract MARTA rail stations, and clean the stop names to remove duplicates. Also extract the destination station.

Step 2. Download the required data from the Census. Convert Census polygons into centroids and create a subset.

Step 3. Download the required data from OSM. Convert it into an sfnetwork object and clean the network.

Step 4. Simulate a park-and-ride trip (home -> closest station -> Midtown station).

Step 5. Convert what we did in Step 4 into a function so that we can repeat it in a loop.

Step 6. Run a loop that applies the function from Step 5 to all other home locations. Once finished, merge the simulation output back into the Census data.

Step 7. Finally, examine whether there is any disparity in using transit to commute to Midtown.

library(tidyverse)
library(tmap)
library(units)
library(sf)
library(leaflet)
library(dbscan)
library(sfnetworks)
library(tigris)
library(tidygraph)
library(plotly)
library(osmdata)
library(here)
library(tidytransit)
library(tidycensus)
library(leafsync)
library(ggpubr)
library(gtfsrouter)
library(nominatimlite)
library(dplyr)
library(ggplot2)
epsg <- 4326 # Set EPSG code

Step 1. Download Required data from GTFS.

# TASK ////////////////////////////////////////////////////////////////////////
# Download MARTA (Metropolitan Atlanta Rapid Transit Authority) GTFS data using `read_gtfs()` function and assign it to `gtfs` object
gtfs_url <- "https://www.itsmarta.com/google_transit_feed/google_transit.zip"
gtfs <- read_gtfs(gtfs_url)
# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
  filter(dist_max > 200)

gtfs$stops <- gtfs$stops %>% 
  group_by(stop_name) %>% 
  mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
                               TRUE ~ stop_name))

# Create a transfer table
gtfs <- gtfsrouter::gtfs_transfer_table(gtfs, 
                                        d_limit = 200, 
                                        min_transfer_time = 120)

# NOTE: Converting to sf format uses stop_lat and stop_lon columns contained in gtfs$stops.
#       In the conversion process, stop_lat and stop_lon are converted into a geometry column, and
#       the output sf object does not have the lat lon columns anymore.
#       But many other functions in tidytransit look for stop_lat and stop_lon.
#       So I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = epsg)

gtfs$stops <- gtfs$stops %>% 
  ungroup() %>% 
  mutate(stop_lat = st_coordinates(.)[,2],
         stop_lon = st_coordinates(.)[,1]) 

# Get stop_id for rail stops (route_type 1)
rail_stops <- gtfs$routes %>% 
  filter(route_type %in% c(1)) %>% 
  inner_join(gtfs$trips, by = "route_id") %>% 
  inner_join(gtfs$stop_times, by = "trip_id") %>% 
  inner_join(gtfs$stops, by = "stop_id") %>% 
  group_by(stop_id) %>% 
  slice(1) %>% 
  pull(stop_id)

# Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)

# Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")

# Create a bounding box to which we limit our analysis
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101), 
                 crs = st_crs(4326)) %>% 
  st_as_sfc()

# =========== NO MODIFY ZONE ENDS HERE ========================================
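
The stop-name cleaning above appends a serial number to names that are shared by stops located far apart. A minimal base-R sketch of the same idea on a toy table follows. The stop names and the `needs_suffix` vector are invented for illustration (the latter stands in for `stop_dist$stop_name`), so this is not the tidytransit workflow itself.

```r
# Toy stops table: "Main St" appears twice at locations assumed to be far apart
stops <- data.frame(
  stop_id   = c("a1", "a2", "b1", "c1"),
  stop_name = c("Main St", "Main St", "Oak Ave", "Pine Rd"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for stop_dist$stop_name:
# names whose stops are assumed to be more than 200 m apart
needs_suffix <- c("Main St")

# Running index within each stop_name group (1, 2, ...)
idx <- ave(seq_along(stops$stop_name), stops$stop_name, FUN = seq_along)

# Append " (1)", " (2)", ... only to the flagged names
stops$stop_name <- ifelse(
  stops$stop_name %in% needs_suffix,
  paste0(stops$stop_name, " (", idx, ")"),
  stops$stop_name
)

print(stops$stop_name)
```

Names that are not flagged stay unchanged, so only the ambiguous stops receive a suffix.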

Step 2. Download Required data from Census

tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))
# TASK ////////////////////////////////////////////////////////////////////////
# Using get_acs() function, download Census Tract level data for 2022 for Fulton, DeKalb, and Clayton in GA.
# and assign it to `census` object.
# Make sure you set geometry = TRUE.

# Required data from the Census ACS:
#  1) Median Household Income (name the column `hhinc`)
#  2) Minority Population (%) (name the column `pct_minority`)
# Note: You may need to download two or more Census ACS variables to calculate minority population (%). "Minority" here can refer to either racial minorities or racial+ethnic minorities -- it's your choice.

census <- get_acs(geography = "tract", 
          state = "GA",
          county = c("Dekalb", "Fulton", "Clayton"), 
          variables = c(hhinc = "B19019_001", white = "B02001_002", total_pop = "B02001_001"),
          year = 2022,
          survey = "acs5", 
          geometry = TRUE, 
          output = "wide" ) %>%  
  select(GEOID, NAME, geometry,
         hhinc = hhincE, white = whiteE, total_pop = total_popE) %>%
  mutate(pct_minority = 100 * (total_pop - white) / total_pop) # in percent, matching the column name
## Getting data from the 2018-2022 5-year ACS
## Downloading feature geometry from the Census website.  To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
census <- census %>% 
  st_transform(crs = 4326) %>% 
  # The 2022 ACS tract names use "; " as the separator (tract; county; state),
  # so splitting on ", " would fill every row with NA.
  separate(col = NAME, into = c("tract", "county", "state"), sep = "; ")
# Convert it to POINT at polygon centroids and extract those that fall into bbox
# and assign it into `home` object
home <- census %>% st_centroid() %>% .[bbox,]
## Warning: st_centroid assumes attributes are constant over geometries
# =========== NO MODIFY ZONE ENDS HERE ========================================

Step 3. Download Required data from OSM.

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM data using opq() function and bbox object defined in the previous code chunk.
# 2. Specify arguments for add_osm_feature() function using 
#    key = 'highway' and 
#    value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential", 
#              "motorway_link", "trunk_link", "primary_link", "secondary_link", 
#              "tertiary_link", "residential_link", "unclassified")
# 3. Convert the OSM data into an sf object using osmdata_sf() function
# 4. Convert osmdata polygons into lines using osm_poly2line() function
  
# Get OSM road data using the previously defined bbox
osm_road <- opq(bbox = bbox) %>%
  add_osm_feature(key = 'highway', 
                  value = c("motorway", "motorway_link",
                            "trunk", "trunk_link", 
                            "primary", "primary_link",
                            "secondary", "secondary_link",
                            "tertiary", "tertiary_link",
                            "residential", "residential_link",
                            "unclassified")) %>%
  osmdata_sf() %>% 
  osm_poly2line()

names(osm_road)
## [1] "bbox"              "overpass_call"     "meta"             
## [4] "osm_points"        "osm_lines"         "osm_polygons"     
## [7] "osm_multilines"    "osm_multipolygons"
# //TASK //////////////////////////////////////////////////////////////////////

# Inspect interactively
tmap_mode('view')
## tmap mode set to interactive viewing
# Plot OSM lines with highways colored
tm_shape(osm_road$osm_lines) +
  tm_lines(col = "highway")
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_lines into sfnetworks using as_sfnetwork() function
# 2. Activate edges
# 3. Clean the network using edge_is_multiple(), edge_is_loop(), to_spatial_subdivision(), to_spatial_smooth()
# 4. Assign the cleaned network to an object named 'osm'

osm <- osm_road$osm_lines %>%
  select(geometry) %>% # Keep only essential columns
  sfnetworks::as_sfnetwork(directed = FALSE) %>% 
  activate("edges") %>% 
  filter(!edge_is_multiple()) %>%
  filter(!edge_is_loop()) %>%
  convert(sfnetworks::to_spatial_subdivision) %>%
  convert(sfnetworks::to_spatial_smooth)
# //TASK //////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column 'length' to the edges part of the object 'osm'
osm <- osm %>%
  activate("edges") %>%
  mutate(length = edge_length())

# //TASK //////////////////////////////////////////////////////////////////////

Step 4. Simulate a park-and-ride trip (home -> closest station -> Midtown station).

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it `home_1`
home_1 <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================

# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest path from `home_1` to all stations
# using st_network_paths() function.
paths <- st_network_paths(osm,
                          from = home_1,
                          to = station,
                          type = "shortest")
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Using the `paths` object, get network distances from `home_1` to all other stations.
dist_all <- map_dbl(1:nrow(paths), function(x){
  osm %>% 
    activate("nodes") %>% 
    slice(paths$node_paths[[x]]) %>% 
    st_as_sf("edges") %>% 
    pull(length) %>% 
    sum()
}) %>% unlist() 

# Replace zeros with a large value.
if (any(dist_all == 0)){
  dist_all[dist_all == 0] <- max(dist_all)
}
# Find the index of the closest station.
closest_index <- which.min(dist_all)

# Provide closest_station object
closest_station <- station[closest_index,]

# Find the distance to the closest station.
closest_dist <- min(dist_all)

# Calculate how long it takes to traverse `closest_dist` 
# assuming we drive at 30 miles/hour speed.
# Store the output in trvt_osm_m.
car_speed <- set_units(30, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  # Distance divided by 30 mile/h
  as.vector(.)
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# 1. From `osm` object, activate nodes part and
# 2. use `closest_index` to extract the selected path
paths_closest <- osm %>%
  activate("nodes") %>%
  slice(paths$node_paths[[closest_index]])
# //TASK //////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////
# Use filter_stop_times() function to create a subset of stop_times data table
# for date = 2024-11-14, minimum departure time of 7AM, maximum arrival time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs,
                                  extract_date = "2024-11-14",
                                  min_departure_time = "07:00:00",
                                  max_arrival_time = "10:00:00")
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Use travel_times() function to calculate travel times from the `closest_station` 
#    to all other stations during time specified in am_stop_time. 
# 2. Filter the row for which the value of 'to_stop_name' column 
#    equals midtown$stop_name. Assign it into `trvt` object.
# Calculate travel times from the closest station, then keep the row for Midtown
trvt <- travel_times(am_stop_time,
                     stop_name = closest_station$stop_name,
                     max_transfers = 1) %>%
  filter(to_stop_name == midtown$stop_name)
  
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60

# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- trvt_osm_m + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================
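
The travel-time arithmetic in Step 4 can be checked by hand. The sketch below uses base R only and invented inputs (a 2,500 m drive and a 780-second rail trip); these numbers are assumptions for illustration, not results from the analysis.

```r
# Convert 30 miles/hour to meters/minute (1 mile = 1609.344 m by definition)
speed_m_per_min <- 30 * 1609.344 / 60

# Hypothetical network distance to the closest station, in meters
closest_dist_m <- 2500

# Driving time in minutes: distance divided by speed
drive_min <- closest_dist_m / speed_m_per_min

# Hypothetical rail travel time in seconds, converted to minutes
rail_min <- 780 / 60

# Total park-and-ride time in minutes
total_min <- drive_min + rail_min

print(round(c(drive = drive_min, rail = rail_min, total = total_min), 2))
```

Both legs must be in the same unit (minutes) before they are added, which is why the GTFS travel time is divided by 60.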

Step 5. Convert Step 4 into a function

# Function definition (do not modify other parts of the code in this code chunk except for those inside the TASK section)

get_trvt <- function(home, osm, station, midtown){
  # NOTE: this function also relies on the global `gtfs` object created in Step 1.

  # Extract the first row from `home` object and store it in `home_1`
  home_1 <- home[1,]

  # Find the shortest path from `home_1` to all stations
  # using st_network_paths() function.
  paths <- st_network_paths(osm,
                            from = home_1,
                            to = station,
                            type = "shortest")

  # Using the `paths` object, get network distances from `home_1` to all stations.
  dist_all <- map_dbl(1:nrow(paths), function(x){
    osm %>% 
      activate("nodes") %>% 
      slice(paths$node_paths[[x]]) %>% 
      st_as_sf("edges") %>% 
      pull(length) %>% 
      sum()
  }) %>% unlist() 

  # Replace zeros with a large value.
  if (any(dist_all == 0)){
    dist_all[dist_all == 0] <- max(dist_all)
  }

  # Find the index of the closest station.
  closest_index <- which.min(dist_all)

  # Provide closest_station object
  closest_station <- station[closest_index,]

  # Find the distance to the closest station.
  closest_dist <- min(dist_all)

  # Calculate how long it takes to traverse `closest_dist` 
  # assuming we drive at 30 miles/hour speed.
  # Store the output in trvt_osm_m.
  car_speed <- set_units(30, mile/h)
  trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  # Distance divided by 30 mile/h
    as.vector(.)

  # From `osm`, activate the nodes part and use `closest_index`
  # to extract the selected path
  paths_closest <- osm %>%
    activate("nodes") %>%
    slice(paths$node_paths[[closest_index]])

  # Subset stop_times for date = 2024-11-14,
  # minimum departure time of 7AM, maximum arrival time of 10AM.
  am_stop_time <- filter_stop_times(gtfs,
                                    extract_date = "2024-11-14",
                                    min_departure_time = "07:00:00",
                                    max_arrival_time = "10:00:00")

  # Calculate travel times from the closest station, then keep the row for Midtown
  trvt <- travel_times(am_stop_time,
                       stop_name = closest_station$stop_name,
                       max_transfers = 1) %>%
    filter(to_stop_name == midtown$stop_name)

  # Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
  trvt_gtfs_m <- trvt$travel_time/60

  # Add the travel time from home to the nearest station and
  # the travel time from the nearest station to Midtown station
  total_trvt <- trvt_osm_m + trvt_gtfs_m

  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  if (length(total_trvt) == 0) {total_trvt = 0}

  return(total_trvt)
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}

Step 6. Apply the function for the whole study area

# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))

# Apply the function for all Census Tracts
# Fill `total_trvt` object with the calculated time
for (i in seq_len(nrow(home))){
  total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}

# Cbind the calculated travel time back to `home`
home_done <- home %>% 
  cbind(trvt = total_trvt)
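
A loop over many tracts stops at the first error, for example when no route is found for one home location. One possible safeguard, sketched below with base R only, is to wrap each call in `tryCatch()` so that a failure yields `NA` instead of aborting the run. `toy_trvt()` is a hypothetical stand-in for `get_trvt()` and its failure condition is invented for illustration.

```r
# Hypothetical stand-in for get_trvt(): fails for one input to mimic a routing error
toy_trvt <- function(x) {
  if (x == 3) stop("no route found")
  x * 10
}

inputs <- 1:5

# vapply() guarantees exactly one numeric value per input;
# any error is converted to NA_real_ so the remaining inputs still run
trvt <- vapply(inputs, function(i) {
  tryCatch(toy_trvt(i), error = function(e) NA_real_)
}, numeric(1))

print(trvt)
```

Recording failures as `NA` rather than 0 also keeps them out of later plots and summary statistics, since a travel time of 0 would otherwise look like a very short trip.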

Step 7. Create maps and plots

# Map
# Set tmap mode
tmap_mode('view')
## tmap mode set to interactive viewing

Create map

# Map for household income
tm_shape(census[census$GEOID %in% home$GEOID,]) + 
  tm_polygons(col = "hhinc", palette = 'GnBu') + 
  tm_shape(home_done) + 
  tm_dots(col = "trvt", palette = 'Reds', size = 0.1)
# Second map showing Percentage Minority
tm_shape(census[census$GEOID %in% home$GEOID,]) + 
  tm_polygons(col = "pct_minority", palette = 'GnBu') + 
  tm_shape(home_done) + 
  tm_dots(col = "trvt", palette = 'Reds', size = 0.1)

Create ggplots

# Plot for household income
inc <- ggplot(data = home_done,
              aes(x = hhinc, y = trvt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Median Annual Household Income",
       y = "Park-and-ride Travel Time from Home to Midtown Station") +
  theme_bw()
# Plot for minority population
minority <- ggplot(data = home_done,
                   aes(x = pct_minority, y = trvt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Minority Population (%)",
       y = "Park-and-ride Travel Time from Home to Midtown Station") +
  theme_bw()
# Arrange the plots using ggpubr
ggpubr::ggarrange(inc, minority, ncol = 2, nrow = 1)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).
## `geom_smooth()` using formula = 'y ~ x'

Conclusions

Transit equity is a core aspect of urban planning. This analysis examines how park-and-ride travel time to Midtown Station relates to median household income and to the share of minority population in each Census tract.

The first plot reveals a slightly positive correlation between income and travel time, suggesting that travel time to Midtown increases with income. Income does not appear to have a major effect on travel time, however; the trend is slight, with long travel times in some high-income areas and short ones in some low-income areas.

The second plot shows a slightly negative relationship between the percentage of minority population and travel time. This suggests that areas with a higher minority population have shorter travel times to Midtown, although the correlation is weak.

Ultimately, the analysis suggests that areas with higher household incomes tend to be farther from Atlanta’s “city center” of Midtown, and thus have longer park-and-ride travel times to Midtown Station. Given the slight trends and weak relationships, factors other than income or race may be stronger determinants of commuting times in the Atlanta region. Including variables such as transit infrastructure, vehicle ownership, or job location distribution in the analysis may provide further insight into commuting patterns in Atlanta and beyond.
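
The "slight" trends described above could be quantified with correlation coefficients rather than read off the plots. The sketch below shows one way to do this in base R. The `toy` data frame is an invented stand-in for `home_done`, so the values it produces say nothing about the actual results.

```r
# Hypothetical stand-in for home_done: one row per tract (values are invented)
toy <- data.frame(
  hhinc        = c(42000, 55000, 61000, 78000, 90000, NA, 120000, 135000),
  pct_minority = c(82, 75, 60, 55, 40, 35, 22, 15),
  trvt         = c(18, 22, 19, 25, 21, 24, 27, 23)
)

# Pearson correlations; rows with missing values are dropped
r_inc <- cor(toy$hhinc, toy$trvt, use = "complete.obs")
r_min <- cor(toy$pct_minority, toy$trvt, use = "complete.obs")

# cor.test() adds a confidence interval and a p-value for one pair
ct <- cor.test(toy$hhinc, toy$trvt)

print(round(c(income = r_inc, minority = r_min), 2))
print(ct$conf.int)
```

A coefficient near zero with a confidence interval that spans zero would support the reading that income or minority share alone explains little of the variation in travel time.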