Task description

There are a few main components in this assignment - home location, road networks, transit network, and destination. We will simulate a journey that starts from the starting point (e.g., home), drives to nearest MARTA rail station, transfers to MARTA rail transit, and finally arrives at Midtown station (i.e., an employment center). The following is a list of tasks and data we need for this analysis.

Step 1. Download Required data from GTFS. Convert it to sf format, extract MARTA rail stations, and clean the stop names to delete duplicate names. Also extract the destination station.

Step 2. Download Required data from Census. Convert Census polygons into centroids and subsetting.

Step 3. Download Required data from OSM. Convert it to sfnetwork object and clean the network.

Step 4. Try the simulation for just one home location as a pilot test.

Step 5. Convert the steps we identified in Step 4 into a function so that we can use it to repeat it in a loop.

Step 6. Run a loop to repeat what we did in Step 5 to all other home location using the function from Step 6. Once finished, merge the simulation output back to Census data.

Step 7. Finally, examine whether there is any disparity in using transit to commute to midtown.

Before we start, libraries first..

library(tidyverse)
library(tmap)
library(ggplot2)
library(units)
library(sf)
library(leaflet)
library(tidycensus)
library(leafsync)
library(dbscan)
library(sfnetworks)
library(tigris)
library(tidygraph)
library(plotly)
library(osmdata)
library(here)
library(tidytransit)
library(units)
library(leaflet)
library(tidycensus)
library(leafsync)
library(gtfsrouter)
library(knitr)
library(ggpubr)
epsg <- 4326

Step 1. Download Required data from GTFS.

# TASK ////////////////////////////////////////////////////////////////////////
# Download GTFS data from [here](https://opendata.atlantaregional.com/datasets/marta-gtfs-latest-feed/about) and save it in your hard drive. Read the file using `read_gtfs()` function and assign it in `gtfs` object
gtfs <- read_gtfs('/Users/helenalindsay/Documents/Fall_23/CP8883/MARTA_GTFS_Latest_Feed.zip')
# //TASK //////////////////////////////////////////////////////////////////////


# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
  filter(dist_max > 200)

gtfs$stops <- gtfs$stops %>% 
  group_by(stop_name) %>% 
  mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
                               TRUE ~ stop_name))

# Create a transfer table
gtfs$transfers <- gtfsrouter::gtfs_transfer_table(gtfs, 
                                     d_limit = 200, 
                                     min_transfer_time = 120)

# NOTE: Converting to sf format uses stop_lat and stop_lon columns contained in gtfs$stops.
#       In the conversion process, stop_lat and stop_lon are converted into a geometry column, and
#       the output sf object do not have the lat lon column anymore.
#       But many other functions in tidytransit look for stop_lat and stop_lon.
#       So I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = epsg)

gtfs$stops <- gtfs$stops %>% 
  ungroup() %>% 
  mutate(stop_lat = st_coordinates(.)[,2],
         stop_lon = st_coordinates(.)[,1]) 

# Get stop_id for rails and buses
rail_stops <- gtfs$routes %>% 
  filter(route_type %in% c(1)) %>% 
  inner_join(gtfs$trips, by = "route_id") %>% 
  inner_join(gtfs$stop_times, by = "trip_id") %>% 
  inner_join(gtfs$stops, by = "stop_id") %>% 
  group_by(stop_id) %>% 
  slice(1) %>% 
  pull(stop_id)

# Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)

# Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")
save(gtfs, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/gtfs.RData")
# Create a bounding box to which we limit our analysis
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101), 
                 crs = st_crs(4326)) %>% 
  st_as_sfc()

# =========== NO MODIFY ZONE ENDS HERE ========================================
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/gtfs.RData")
kable(head(gtfs$stops))
stop_id stop_code stop_name geometry stop_lat stop_lon
907933 27 HAMILTON E HOLMES STATION POINT (-84.4693 33.75455) 33.75455 -84.46930
908023 28 WEST LAKE STATION POINT (-84.44533 33.75333) 33.75333 -84.44533
907906 39 WEST LAKE STATION POINT (-84.44557 33.75325) 33.75325 -84.44557
907907 40 HAMILTON E HOLMES STATION POINT (-84.46982 33.75452) 33.75452 -84.46982
908051 53 DUNWOODY STATION POINT (-84.34421 33.92086) 33.92086 -84.34421
908054 57 LINDBERGH CENTER STATION POINT (-84.36936 33.82339) 33.82339 -84.36936

Step 2. Download Required data from Census

# TASK ////////////////////////////////////////////////////////////////////////
# Specify Census API key whichever you prefer using census_api_key() function
tidycensus::census_api_key(Sys.getenv('census_api'))
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Using get_acs() function, download Census Tract level data for 2020 for Fulton, DeKalb, and Clayton in GA.
# and assign it into `census` object.
# Make sure you set geometry = TRUE.
#variables to download = c("hhinc" = 'B19013_001',
#                           "r_tot" = "B02001_001",
#                           "r_wh" = "B02001_002",
#                           "r_bl" = "B02001_003",
#                           "tot_hh" = "B25044_001",
#                           "own_novhc" = "B25044_003",
#                           "rent_novhc" = "B25044_010")
census <- tidycensus::get_acs(geography = "tract",
                   variables = c("hhinc" = 'B19013_001',
                           "r_tot" = "B02001_001",
                           "r_wh" = "B02001_002",
                           "r_bl" = "B02001_003",
                           "tot_hh" = "B25044_001",
                           "own_novhc" = "B25044_003",
                           "rent_novhc" = "B25044_010"),
                   year = 2020,
                   output = "wide",
                   state = "GA",
                   county = c("Fulton", "DeKalb", "Clayton"),
                   geometry = TRUE)
# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
census <- census %>% 
  st_transform(crs = 4326) %>% 
  separate(col = NAME, into = c("tract", "county", "state"), sep = ", ")

# Convert it to POINT at polygon centroids and extract those that fall into bbox
# and assign it into `home` object
home <- census %>% st_centroid() %>% .[bbox,]
# =========== NO MODIFY ZONE ENDS HERE ========================================
save(census, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/census.RData")
save(home, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/home.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/home.RData")
kable(head(home))
GEOID tract county state hhincE hhincM r_totE r_totM r_whE r_whM r_blE r_blM tot_hhE tot_hhM own_novhcE own_novhcM rent_novhcE rent_novhcM geometry
4 13121002300 Census Tract 23 Fulton County Georgia 28611 16783 1384 322 63 70 1293 323 596 151 52 40 208 110 POINT (-84.4186 33.76721)
5 13121004200 Census Tract 42 Fulton County Georgia 23906 4494 2675 577 252 146 2423 606 1470 235 39 54 647 99 POINT (-84.41824 33.73906)
6 13121003100 Census Tract 31 Fulton County Georgia 82309 39452 2290 397 1324 243 900 361 1251 332 10 14 311 293 POINT (-84.35289 33.75186)
8 13121003600 Census Tract 36 Fulton County Georgia 68859 19118 1132 267 190 96 891 264 893 221 14 11 279 127 POINT (-84.40295 33.75188)
27 13121001204 Census Tract 12.04 Fulton County Georgia 90040 28919 1845 244 1470 189 54 38 1216 198 134 165 93 78 POINT (-84.38016 33.77878)
33 13121002802 Census Tract 28.02 Fulton County Georgia 25054 5296 1487 350 325 160 1028 346 587 120 0 14 229 62 POINT (-84.38011 33.75689)

Step 3. Download Required data from OSM.

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM data using opq() function and bbox object defined in the previous code chunk.
# 2. Specify arguments for add_osm_feature() function using 
#    key = 'highway' and 
#    value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential", 
#              "motorway_link", "trunk_link", "primary_link", "secondary_link", 
#              "tertiary_link", "residential_link", "unclassified")
# 3. Convert the OSM data into a sf object using osmdata_sf() function
# 4. Convert osmdata polygons into lines using osm_poly2line() function

osm_road <- opq(bbox = bbox) %>%
  add_osm_feature(key = 'highway', 
                  value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential", "motorway_link", "trunk_link", "primary_link", "secondary_link", "tertiary_link", "residential_link", "unclassified")) %>%
  osmdata_sf() %>% 
  osm_poly2line()
# //TASK //////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_lines to sfnetworks using as_sfnetwork() function
# 2. Activate edges
# 3. Clean the network using edge_is_multiple(), edge_is_loop(), to_spatial_subdivision(), to_spatial_smooth()
# 4. Assign the cleaned network to an object named 'osm'

osm <- osm_road$osm_lines %>% 
  sfnetworks::as_sfnetwork(directed = FALSE) %>%
  activate("edges") %>%
  filter(!edge_is_multiple()) %>%
  filter(!edge_is_loop())%>%
  convert(.,sfnetworks::to_spatial_subdivision)

osm <- osm %>%
  rename_all(~gsub(":", "_", .))%>%
  convert(sfnetworks::to_spatial_smooth)
# //TASK //////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column named 'length' to the edges part of the object `osm`.
osm <- osm %>% mutate(length = edge_length())
# //TASK //////////////////////////////////////////////////////////////////////
tmap_mode('plot')

tm_shape(osm_road$osm_lines) + tm_lines(col = "highway")

Step 4. Try the simulation for just one home location as a pilot test.

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/home.RData")
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it as `origin`
origin <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================


# TASK ////////////////////////////////////////////////////////////////////////
# Find a station that is closest to the origin by Euclidean distance
# using st_distance() function.
dist_to_stations <- st_distance(station, origin)
closest_station <- station[which.min(dist_to_stations), ]
# //TASK //////////////////////////////////////////////////////////////////////


# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest path from origin to station
# using st_network_paths() function.
paths <- st_network_paths(osm, from = origin, to = closest_station, type = "shortest")
# //TASK //////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Calculate the length of edges in the shortest route to the closest MARTA station
closest_dist <- osm %>% 
  activate("nodes") %>% 
  # Slice the part that corresponds with the shortest route
  slice(paths$node_paths[[1]]) %>% 
  # Extract "edges" from the sfnetworks object as a separate sf object
  st_as_sf("edges") %>% 
  # Extract 'length' column and calculate sum
  pull(length) %>% 
  sum()

# If the routing function is not working, assume the route length is 150% of Euclidean distance
if (closest_dist == set_units(0, m)){
  closest_dist <- dist_to_stations[which.min(dist_to_stations)] * 1.5
}

# Calculate how to long it takes to traverse `closest_dist` 
# assuming we drive at 30 miles/hour speed.
# Store the output in `trvt_osm_m`.
car_speed <- set_units(30, mile/h)
trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  # Distance divided by 30 mile/h
  as.vector(.)


# =========== NO MODIFY ZONE ENDS HERE ========================================

# TASK ////////////////////////////////////////////////////////////////////////
# Use filter_stop_times() function to create a subset of stop_times data table
# for date = 2021-08-14, minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
am_stop_time <- filter_stop_times(gtfs_obj = gtfs, 
                                  extract_date = "2021-08-14",
                                  min_departure_time = "07:00:00", 
                                  max_arrival_time = "10:00:00"
                                  )
# TASK ////////////////////////////////////////////////////////////////////////

# TASK ////////////////////////////////////////////////////////////////////////
# 1. Use travel_times() function to calculate travel times from the `closest_station` 
#    to all other stations during time specified in am_stop_time. 
# 2. Filter the row for which the value of 'to_stop_name' column 
#    equals midtown$stop_name. Assign it into `trvt` object.
# 1. Use travel_times() function to calculate travel times
trvt <-  travel_times(filtered_stop_times = am_stop_time,
                      stop_name = closest_station$stop_name,
                      return_coords = TRUE) %>% filter(to_stop_name == midtown$stop_name)
# TASK ////////////////////////////////////////////////////////////////////////

# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60

# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- drop_units(trvt_osm_m) + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================
#save(trvt, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/trvt.RData")
#kable(head(total_trvt))

Step 5. Converting it into a function & Apply it for the whole study area

# Function definition (do not modify other parts of the code in this code chunk except for those inside the TASK section)

get_trvt <- function(home, osm, station, midtown){
  
  # TASK ////////////////////////////////////////
  # If the code in Step 4 runs fine,
  # Replace where it says **YOUR CODE HERE..** below with 
  # the entirety of the code in the previous code chunk (i.e., Step 4)
  
  # **YOUR CODE HERE..**
  origin <- home[1,]
  dist_to_stations <- st_distance(station, origin)
  closest_station <- station[which.min(dist_to_stations), ]
  paths <- st_network_paths(osm, from = origin, to = closest_station, type = "shortest")
  closest_dist <- osm %>% 
    activate("nodes") %>% 
    slice(paths$node_paths[[1]]) %>% 
    st_as_sf("edges") %>% 
    pull(length) %>% 
    sum()

  if (closest_dist == set_units(0, m)){
    closest_dist <- dist_to_stations[which.min(dist_to_stations)] * 1.5
  }
  car_speed <- set_units(30, mile/h)
  trvt_osm_m <- closest_dist/set_units(car_speed, m/min) %>%  # Distance divided by 30 mile/h
    as.vector(.)
  closest_index <- which.min(closest_dist)
  paths_closest <- osm %>% 
    activate("nodes") %>% 
    slice(paths$node_paths[[closest_index]])
  am_stop_time <- filter_stop_times(gtfs_obj = gtfs, 
                                  extract_date = "2021-08-14",
                                  min_departure_time = "07:00:00", 
                                  max_arrival_time = "10:00:00"
                                  )
  trvt <-  travel_times(filtered_stop_times = am_stop_time,
                      stop_name = closest_station$stop_name,
                      return_coords = TRUE) %>% filter(to_stop_name == midtown$stop_name)
  trvt_gtfs_m <- trvt$travel_time/60
  total_trvt <- drop_units(trvt_osm_m) + trvt_gtfs_m
  # //TASK //////////////////////////////////////

  # =========== NO MODIFICATION ZONE STARTS HERE ===============================
  if (length(total_trvt) == 0) {total_trvt = 0}

  return(total_trvt)
  # =========== NO MODIFY ZONE ENDS HERE ========================================
}

This is the end of the section where you need to code

Step 6 - 7. Run the loop, create a map and plots

load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/census.RData")
#load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/trvt.RData")
# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))

# Apply the function for all Census Tracts
# Fill `total_trvt` object with the calculated time
for (i in 1:nrow(home)){
  total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}
#save(total_trvt, file = "/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/total_trvt.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/total_trvt.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/home.RData")
load("/Users/helenalindsay/Documents/Fall_23/CP8883/Major2/census.RData")
# Cbind the calculated travel time back to `home`
home_done <- home %>% 
  cbind(trvt = total_trvt)

kable(head(total_trvt))
x
17.880377
10.547920
17.712230
12.923612
3.134930
5.004586
kable(head(home_done))
GEOID tract county state hhincE hhincM r_totE r_totM r_whE r_whM r_blE r_blM tot_hhE tot_hhM own_novhcE own_novhcM rent_novhcE rent_novhcM trvt geometry
4 13121002300 Census Tract 23 Fulton County Georgia 28611 16783 1384 322 63 70 1293 323 596 151 52 40 208 110 17.880377 POINT (-84.4186 33.76721)
5 13121004200 Census Tract 42 Fulton County Georgia 23906 4494 2675 577 252 146 2423 606 1470 235 39 54 647 99 10.547920 POINT (-84.41824 33.73906)
6 13121003100 Census Tract 31 Fulton County Georgia 82309 39452 2290 397 1324 243 900 361 1251 332 10 14 311 293 17.712230 POINT (-84.35289 33.75186)
8 13121003600 Census Tract 36 Fulton County Georgia 68859 19118 1132 267 190 96 891 264 893 221 14 11 279 127 12.923612 POINT (-84.40295 33.75188)
27 13121001204 Census Tract 12.04 Fulton County Georgia 90040 28919 1845 244 1470 189 54 38 1216 198 134 165 93 78 3.134930 POINT (-84.38016 33.77878)
33 13121002802 Census Tract 28.02 Fulton County Georgia 25054 5296 1487 350 325 160 1028 346 587 120 0 14 229 62 5.004586 POINT (-84.38011 33.75689)
# Map!
tmap_mode('view')

wh <- tm_shape(census[census$GEOID %in% home$GEOID,] %>% mutate(pct_white = r_whE/r_totE)) + 
  tm_polygons(col = "pct_white", palette = 'GnBu') + 
  tm_shape(home_done) + 
  tm_dots(col = "trvt", palette = 'Reds', size = 0.1)

black <- tm_shape(census[census$GEOID %in% home$GEOID,] %>% mutate(pct_blk = r_blE/r_totE)) + 
  tm_polygons(col = "pct_blk", palette = 'GnBu') + 
  tm_shape(home_done) + 
  tm_dots(col = "trvt", palette = 'Reds', size = 0.1)

wh
black
  # ggplot!
inc <- ggplot(data = home_done %>% 
         mutate(hhinc = hhincE),
       aes(x = hhinc, y = trvt)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Median Annual Household Income",
       y = "Travel Time from Home to Midtown Station",
       title = "Median Annual Household Income & Travel Time from Home to Midtown Station") +
  stat_cor(aes(x=hhinc,y = trvt),  method = "pearson",  color = 'orange') +
  theme(plot.title = element_text(size = 8))


white <- ggplot(data = home_done %>% 
         mutate(pct_white = r_whE/r_totE),
       aes(x = pct_white, y = trvt)) +
  geom_point(aes(x = pct_white, y = trvt)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Percent White",
       y = "Travel Time from Home to Midtown Station",
       title = "Percent White & Travel Time from Home to Midtown Station") +
  stat_cor(aes(x=pct_white,y = trvt),  method = "pearson",  color = 'orange') +
  theme(plot.title = element_text(size = 8))

blk <- ggplot(data = home_done %>% 
         mutate(pct_blk = r_blE/r_totE),
       aes(x = pct_blk, y = trvt)) +
  geom_point(aes(x = pct_blk, y = trvt)) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Percent Black",
       y = "Travel Time from Home to Midtown Station",
       title = "Percent Black & Travel Time from Home to Midtown Station") +
  stat_cor(aes(x=pct_blk,y = trvt),  method = "pearson",  color = 'orange') +
  theme(plot.title = element_text(size = 8))


ggpubr::ggarrange(white, blk,inc)

Conclusion

Analyzing the thematic maps reveals an interesting trend: areas with a higher proportion of white residents generally exhibit shorter travel times from home to the midtown station. Conversely, tracts with a lower percentage of white residents tend to experience longer travel times. This relationship appears to be reversed when considering areas with varying percentages of black residents.

Upon closer examination of the plots, the trends observed in the maps become more evident. The plots reveal that the negative relationship between the percentage of white residents and travel time from home to the midtown station is statistically significant, with a correlation coefficient of -0.3 and a p-value of 0.011. Conversely, the percentage of black residents and travel time from home to the midtown station exhibit a positive relationship, supported by a correlation coefficient of 0.38 and a low p-value of 0.00078.

Another intriguing finding was the inverse correlation between median annual household income and travel time from home to the midtown station. This relationship was also statistically significant, with a correlation coefficient of -0.23 and a p-value of 0.049.

In conclusion, the maps and plots clearly indicate disparities among tracts with different racial and income compositions when it comes to using transit for commuting to midtown. The findings from this analysis indicate that there is significant work to be done by the City of Atlanta in creating a fair and equitable environment for all residents.