## Major Assignment 2
Katherine Ginensky
There are a few main components in this assignment: home locations, the road network, the transit network, and the destination. We will simulate a journey that starts from a starting point (e.g., home), drives to the nearest MARTA rail station, transfers to MARTA rail transit, and finally arrives at Midtown Station (i.e., an employment center). The tasks and data needed for this analysis are listed below.
Step 1. Download the required GTFS data. Convert it to sf format, extract the MARTA rail stations, and clean the stop names so that duplicate names become unique. Also extract the destination station.
Step 2. Download the required Census data. Convert the Census polygons into centroids and subset them to the study area.
Step 3. Download the required OSM data. Convert it to an sfnetwork object and clean the network.
Step 4. Try the simulation for just one home location as a pilot test.
Step 5. Convert the steps identified in Step 4 into a function so that they can be repeated in a loop.
Step 6. Run a loop that applies the function from Step 5 to all other home locations. Once finished, merge the simulation output back into the Census data.
Step 7. Finally, examine whether there is any disparity in using transit to commute to Midtown.
Before we start, load the libraries.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tmap)
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
library(ggplot2)
library(units)
## udunits database from /usr/share/xml/udunits/udunits2.xml
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
library(leaflet)
library(tidycensus)
library(leafsync)
library(dbscan)
##
## Attaching package: 'dbscan'
##
## The following object is masked from 'package:stats':
##
## as.dendrogram
library(sfnetworks)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(tidygraph)
##
## Attaching package: 'tidygraph'
##
## The following object is masked from 'package:stats':
##
## filter
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(here)
## here() starts at /home/rstudio
library(tidytransit)
epsg <- 4326
# TASK ////////////////////////////////////////////////////////////////////////
# Download GTFS data from [here](https://opendata.atlantaregional.com/datasets/marta-gtfs-latest-feed/about) and save it on your hard drive. Read the file using the `read_gtfs()` function and assign it to the `gtfs` object
gtfs <- read_gtfs("https://raw.githubusercontent.com/ujhwang/UrbanAnalytics2023/main/Lab/module_3/MARTA_GTFS_Latest_Feed.zip")
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Edit stop_name to append serial numbers (1, 2, etc.) to remove duplicate names
stop_dist <- stop_group_distances(gtfs$stops, by='stop_name') %>%
filter(dist_max > 200)
gtfs$stops <- gtfs$stops %>%
group_by(stop_name) %>%
mutate(stop_name = case_when(stop_name %in% stop_dist$stop_name ~ paste0(stop_name, " (", seq(1,n()), ")"),
TRUE ~ stop_name))
# Create a transfer table
gtfs$transfers <- gtfsrouter::gtfs_transfer_table(gtfs,
d_limit = 200,
min_transfer_time = 120)
## Registered S3 method overwritten by 'gtfsrouter':
## method from
## summary.gtfs gtfsio
## ▶ Finding neighbouring services for each stop
## Loading required namespace: pbapply
## ✔ Found neighbouring services for each stop
## ▶ Expanding to include in-place transfers
## ✔ Expanded to include in-place transfers
# NOTE: Converting to sf format uses the stop_lat and stop_lon columns in gtfs$stops.
# During the conversion, stop_lat and stop_lon are turned into a geometry column, and
# the output sf object no longer has the lat/lon columns.
# But many other functions in tidytransit look for stop_lat and stop_lon,
# so I re-create them using mutate().
gtfs <- gtfs %>% gtfs_as_sf(crs = epsg)
gtfs$stops <- gtfs$stops %>%
ungroup() %>%
mutate(stop_lat = st_coordinates(.)[,2],
stop_lon = st_coordinates(.)[,1])
# Get stop_ids for rail stops (GTFS route_type 1 = subway/metro)
rail_stops <- gtfs$routes %>%
filter(route_type %in% c(1)) %>%
inner_join(gtfs$trips, by = "route_id") %>%
inner_join(gtfs$stop_times, by = "trip_id") %>%
inner_join(gtfs$stops, by = "stop_id") %>%
group_by(stop_id) %>%
slice(1) %>%
pull(stop_id)
# Extract MARTA rail stations
station <- gtfs$stops %>% filter(stop_id %in% rail_stops)
# Extract Midtown Station
midtown <- gtfs$stops %>% filter(stop_id == "134")
# Create a bounding box to which we limit our analysis
bbox <- st_bbox(c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101),
crs = st_crs(4326)) %>%
st_as_sfc()
# =========== NO MODIFY ZONE ENDS HERE ========================================
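The stop-name cleaning step appends a serial number to every stop whose name is shared by stops more than 200 m apart. The same idea can be sketched in base R on a toy table. The stop names and the `far_apart` vector are made up for illustration; in the real code that list comes from `stop_group_distances()`.

```r
# Toy stops table (hypothetical names, not real MARTA data)
stops <- data.frame(
  stop_name = c("PEACHTREE ST", "FIVE POINTS", "PEACHTREE ST", "FIVE POINTS", "MIDTOWN"),
  stringsAsFactors = FALSE
)
# Names whose stops are far apart (stand-in for stop_dist$stop_name)
far_apart <- "PEACHTREE ST"

# Within each name group, append " (1)", " (2)", ... only for far-apart names
stops$stop_name <- ave(stops$stop_name, stops$stop_name, FUN = function(x) {
  if (x[1] %in% far_apart) paste0(x, " (", seq_along(x), ")") else x
})
stops$stop_name
```

Names shared by nearby stops (here "FIVE POINTS") are left alone, which mirrors the `TRUE ~ stop_name` branch of the `case_when()` call.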
# TASK ////////////////////////////////////////////////////////////////////////
# Specify your Census API key with the census_api_key() function (here it is read from the census_api environment variable)
tidycensus::census_api_key(Sys.getenv("census_api"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# Using the get_acs() function, download Census tract-level data for 2020 for Fulton, DeKalb, and Clayton counties in GA,
# and assign it to the `census` object.
# Make sure you set geometry = TRUE.
# variables to download = c("hhinc" = 'B19013_001',
# "r_tot" = "B02001_001",
# "r_wh" = "B02001_002",
# "r_bl" = "B02001_003",
# "tot_hh" = "B25044_001",
# "own_novhc" = "B25044_003",
# "rent_novhc" = "B25044_010")
census <- get_acs(geography = "tract",
variables = c("hhinc" = 'B19013_001',
"r_tot" = "B02001_001",
"r_wh" = "B02001_002",
"r_bl" = "B02001_003",
"tot_hh" = "B25044_001",
"own_novhc" = "B25044_003",
"rent_novhc" = "B25044_010"),
year = 2020,
output = "wide",
state = "GA",
county = c("Fulton", "DeKalb", "Clayton"),
geometry = TRUE)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
census <- census %>%
st_transform(crs = 4326) %>%
separate(col = NAME, into = c("tract", "county", "state"), sep = ", ")
# Convert it to POINT at polygon centroids and extract those that fall into bbox
# and assign it into `home` object
home <- census %>% st_centroid() %>% .[bbox,]
## Warning: st_centroid assumes attributes are constant over geometries
# =========== NO MODIFY ZONE ENDS HERE ========================================
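`.[bbox, ]` keeps the tract centroids that intersect the bounding-box polygon. For points and an axis-aligned box this is essentially a coordinate range check, sketched below in base R with three made-up points. One caveat, stated as an assumption: with `sf_use_s2()` on, sf treats the box edges as geodesics, so a point lying right on the boundary could be classified slightly differently.

```r
# Bounding box used in the analysis (lon/lat, EPSG:4326)
bb <- c(xmin = -84.45241, ymin = 33.72109, xmax = -84.35009, ymax = 33.80101)

# Hypothetical tract centroids
pts <- data.frame(id  = c("a", "b", "c"),
                  lon = c(-84.39, -84.50, -84.36),
                  lat = c(33.75, 33.76, 33.85))

# A point is kept when both coordinates fall inside the box
inside <- pts$lon >= bb[["xmin"]] & pts$lon <= bb[["xmax"]] &
          pts$lat >= bb[["ymin"]] & pts$lat <= bb[["ymax"]]
kept_pts <- pts[inside, ]
kept_pts$id
```

Point "b" lies west of `xmin` and point "c" lies north of `ymax`, so only "a" is kept.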
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Get OSM data using opq() function and bbox object defined in the previous code chunk.
# 2. Specify arguments for add_osm_feature() function using
# key = 'highway' and
# value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential",
# "motorway_link", "trunk_link", "primary_link", "secondary_link",
# "tertiary_link", "residential_link", "unclassified")
# 3. Convert the OSM data into a sf object using osmdata_sf() function
# 4. Convert osmdata polygons into lines using osm_poly2line() function
osm_road <- opq(bbox = bbox) %>%
add_osm_feature(key = 'highway',
value = c("motorway", "trunk", "primary", "secondary", "tertiary", "residential",
"motorway_link", "trunk_link", "primary_link", "secondary_link",
"tertiary_link", "residential_link", "unclassified")) %>%
osmdata_sf() %>%
osm_poly2line()
# //TASK //////////////////////////////////////////////////////////////////////
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Convert osm_road$osm_lines to sfnetworks using as_sfnetwork() function
# 2. Activate edges
# 3. Clean the network using edge_is_multiple(), edge_is_loop(), to_spatial_subdivision(), to_spatial_smooth()
# 4. Assign the cleaned network to an object named 'osm'
osm <- osm_road$osm_lines %>%
select(osm_id, highway) %>%
sfnetworks::as_sfnetwork(directed = FALSE) %>%
activate("edges") %>%
filter(!edge_is_multiple()) %>%
filter(!edge_is_loop()) %>%
convert(., sfnetworks::to_spatial_subdivision) %>%
convert(., sfnetworks::to_spatial_smooth)
## Warning: to_spatial_subdivision assumes attributes are constant over geometries
# //TASK //////////////////////////////////////////////////////////////////////
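`edge_is_multiple()` and `edge_is_loop()` drop duplicate edges between the same pair of nodes and edges that start and end at the same node. On an undirected edge list the logic looks roughly like the base R sketch below (toy node ids). It keeps the first edge of each parallel set, which as far as I know matches igraph's `which_multiple()`; subdivision and smoothing are not reproduced here.

```r
# Toy undirected edge list (hypothetical node ids)
edge_list <- data.frame(from = c(1, 2, 2, 3, 4),
                        to   = c(2, 1, 3, 3, 5))

# Undirected: (1, 2) and (2, 1) join the same pair of nodes
pair_key    <- paste(pmin(edge_list$from, edge_list$to),
                     pmax(edge_list$from, edge_list$to))
is_multiple <- duplicated(pair_key)              # later copies of a node pair
is_loop     <- edge_list$from == edge_list$to    # edge starts and ends at one node

clean_edges <- edge_list[!is_multiple & !is_loop, ]
clean_edges
```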
# TASK ////////////////////////////////////////////////////////////////////////
# Add a new column named 'length' to the edges part of the object `osm`.
osm <- osm %>%
activate("edges") %>%
mutate(length = edge_length())
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Extract the first row from `home` object and store it as `origin`
origin <- home[1,]
# =========== NO MODIFY ZONE ENDS HERE ========================================
# TASK ////////////////////////////////////////////////////////////////////////
# Find a station that is closest to the origin by Euclidean distance
# using st_distance() function.
dist_to_stations <- st_distance(origin, station)
closest_station <- station[which.min(dist_to_stations), ]
# //TASK //////////////////////////////////////////////////////////////////////
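With `sf_use_s2()` enabled (as reported when sf loaded), `st_distance()` on EPSG:4326 points returns great-circle distances in metres. A base R sketch of the same "closest station" logic uses the haversine formula; the earth radius (6,371,008.8 m) and the station coordinates are assumptions for illustration, not values taken from sf or the GTFS feed.

```r
# Great-circle distance in metres (haversine formula, spherical earth)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Hypothetical origin and candidate stations
origin_pt   <- c(lon = -84.420, lat = 33.750)
stations_df <- data.frame(name = c("A", "B", "C"),
                          lon  = c(-84.428, -84.387, -84.400),
                          lat  = c(33.756, 33.781, 33.700))

# Distance to every station, then pick the minimum (as which.min() does above)
d_m <- haversine_m(origin_pt[["lon"]], origin_pt[["lat"]],
                   stations_df$lon, stations_df$lat)
nearest <- stations_df[which.min(d_m), ]
nearest$name
```

Station "A" is roughly 1 km away, while "B" and "C" are several kilometres away.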
# TASK ////////////////////////////////////////////////////////////////////////
# Find the shortest path from origin to the closest station
# using st_network_paths() function.
paths <- st_network_paths(osm, from = origin, to = closest_station, type = "shortest")
# //TASK //////////////////////////////////////////////////////////////////////
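`st_network_paths()` with `type = "shortest"` finds the route with the smallest total edge weight. As far as I understand, sfnetworks delegates this to igraph's shortest-path routines, which use Dijkstra's algorithm for non-negative weights such as edge lengths. The textbook Dijkstra below (a hypothetical helper, not the sfnetworks implementation) shows the idea on a four-node toy graph.

```r
# Dijkstra's shortest path on a small undirected, weighted graph.
# toy_edges: data.frame with columns from, to (node ids 1..n) and length (metres)
dijkstra_path <- function(toy_edges, n, from, to) {
  dist    <- rep(Inf, n)            # best known distance to each node
  prev    <- rep(NA_integer_, n)    # predecessor on the best path
  visited <- rep(FALSE, n)
  dist[from] <- 0
  repeat {
    cand <- which(!visited)
    if (length(cand) == 0) break
    u <- cand[which.min(dist[cand])]   # closest unvisited node
    if (is.infinite(dist[u])) break    # remaining nodes are unreachable
    visited[u] <- TRUE
    if (u == to) break                 # destination settled
    # Neighbours of u (edges are undirected, so check both ends)
    out <- toy_edges$from == u
    inn <- toy_edges$to == u
    nb  <- c(toy_edges$to[out], toy_edges$from[inn])
    w   <- c(toy_edges$length[out], toy_edges$length[inn])
    for (j in seq_along(nb)) {
      if (dist[u] + w[j] < dist[nb[j]]) {
        dist[nb[j]] <- dist[u] + w[j]
        prev[nb[j]] <- u
      }
    }
  }
  # Walk predecessors back from the destination
  path <- to
  while (!is.na(prev[path[1]])) path <- c(prev[path[1]], path)
  list(path = path, length = dist[to])
}

toy_edges <- data.frame(from = c(1, 2, 1, 3), to = c(2, 4, 3, 4),
                        length = c(100, 100, 50, 300))
res <- dijkstra_path(toy_edges, n = 4, from = 1, to = 4)
res$path
res$length
```

The route through node 3 starts with the cheaper edge (50 m) but totals 350 m, so the algorithm settles on 1 → 2 → 4 (200 m).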
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Calculate the length of edges in the shortest route to the closest MARTA station
closest_dist <- osm %>%
activate("nodes") %>%
# Slice the part that corresponds with the shortest route
slice(paths$node_paths[[1]]) %>%
# Extract "edges" from the sfnetworks object as a separate sf object
st_as_sf("edges") %>%
# Extract 'length' column and calculate sum
pull(length) %>%
sum()
# If the routing function is not working, assume the route length is 150% of Euclidean distance
if (closest_dist == set_units(0, m)){
closest_dist <- dist_to_stations[which.min(dist_to_stations)] * 1.5
}
# Calculate how long it takes to traverse `closest_dist`
# assuming we drive at 30 miles/hour speed.
# Store the output in `trvt_osm_m`.
car_speed <- set_units(30, mile/h)
# Distance (m) divided by speed (m/min) gives minutes; the units are kept
# so that drop_units() below returns the value in minutes
trvt_osm_m <- set_units(closest_dist / set_units(car_speed, m/min), min)
# =========== NO MODIFY ZONE ENDS HERE ========================================
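The drive leg is distance divided by speed: 30 mph = 30 × 1609.344 m / 60 min = 804.672 m/min. When routing returns 0 m, the fallback is 150% of the straight-line distance. A plain-numeric sketch of that arithmetic follows; the helper names are hypothetical.

```r
# Convert miles per hour to metres per minute (1 mile = 1609.344 m)
mph_to_m_per_min <- function(mph) mph * 1609.344 / 60

# Drive time in minutes; falls back to detour * straight-line distance
# when the network route length is zero (routing failed)
drive_minutes <- function(route_m, euclid_m, speed_mph = 30, detour = 1.5) {
  dist_m <- if (route_m == 0) euclid_m * detour else route_m
  dist_m / mph_to_m_per_min(speed_mph)
}

drive_minutes(1609.344, 500)   # one mile at 30 mph: 2 minutes
drive_minutes(0, 1000)         # fallback: 1500 m at 804.672 m/min
```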
# TASK ////////////////////////////////////////////////////////////////////////
# Use filter_stop_times() function to create a subset of stop_times data table
# for date = 2021-08-14, minimum departure time of 7AM, maximum departure time of 10AM.
# Assign the output to `am_stop_time` object
gtfs$transfers <- gtfsrouter::gtfs_transfer_table(gtfs,
d_limit = 200,
min_transfer_time = 120) #have to add transfer table
## ▶ Finding neighbouring services for each stop
## ✔ Found neighbouring services for each stop
## ▶ Expanding to include in-place transfers
## ✔ Expanded to include in-place transfers
am_stop_time <- filter_stop_times(gtfs, "2021-08-14", "07:00:00", "10:00:00")
# //TASK //////////////////////////////////////////////////////////////////////
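`filter_stop_times()` keeps the stop times of trips that run on the given date within the time window. The time-window part can be sketched in base R by turning GTFS `HH:MM:SS` strings into seconds after midnight; GTFS allows hours past 24 for trips that run after midnight. The service-date filtering that tidytransit also performs is left out, and the trip ids below are made up.

```r
# Convert GTFS "HH:MM:SS" strings to seconds after midnight (hours may exceed 24)
hms_to_sec <- function(x) {
  parts <- do.call(rbind, lapply(strsplit(x, ":", fixed = TRUE), as.numeric))
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
}

# Hypothetical stop_times rows
toy_stop_times <- data.frame(
  trip_id        = c("t1", "t2", "t3", "t4"),
  arrival_time   = c("06:54:00", "07:10:00", "09:59:30", "25:05:00"),
  departure_time = c("06:55:00", "07:10:30", "09:59:45", "25:05:30")
)

# Keep rows departing at or after 07:00 and arriving by 10:00
min_dep <- hms_to_sec("07:00:00")
max_arr <- hms_to_sec("10:00:00")
keep <- hms_to_sec(toy_stop_times$departure_time) >= min_dep &
        hms_to_sec(toy_stop_times$arrival_time) <= max_arr
toy_am <- toy_stop_times[keep, ]
toy_am$trip_id
```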
# TASK ////////////////////////////////////////////////////////////////////////
# 1. Use travel_times() function to calculate travel times from the `closest_station`
# to all other stations during time specified in am_stop_time.
# 2. Filter the row for which the value of 'to_stop_name' column
# equals midtown$stop_name. Assign it into `trvt` object.
trvt <- travel_times(am_stop_time, closest_station$stop_name, return_coords = TRUE) %>%
filter(to_stop_name == midtown$stop_name)
# //TASK //////////////////////////////////////////////////////////////////////
# =========== NO MODIFICATION ZONE STARTS HERE ===============================
# Divide the calculated travel time by 60 to convert the unit from seconds to minutes.
trvt_gtfs_m <- trvt$travel_time/60
# Add the travel time from home to the nearest station and
# the travel time from the nearest station to Midtown station
total_trvt <- drop_units(trvt_osm_m) + trvt_gtfs_m
# =========== NO MODIFY ZONE ENDS HERE ========================================
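The loop that follows calls `get_trvt(home[i,], osm, station, midtown)`, but no definition of that function (Step 5) appears in this section. Below is a minimal sketch that wraps the Step 4 code into such a function. Several choices are my own assumptions: the argument names, the `stop_times` default (which expects `am_stop_time` to exist in the global environment), treating the edge `length` column as metres, and returning `NA` when no rail trip reaches the destination in the time window. Calls are namespaced so the function can be defined without attaching the packages.

```r
# Hypothetical helper; argument order matches get_trvt(home[i,], osm, station, midtown)
get_trvt <- function(origin_pt, network, stations, destination,
                     stop_times = am_stop_time,
                     speed_mph = 30, detour = 1.5) {
  # 1. Closest rail station by straight-line distance (metres)
  d_station <- sf::st_distance(origin_pt, stations)
  nearest <- stations[which.min(d_station), ]

  # 2. Shortest road path from the origin to that station
  route <- sfnetworks::st_network_paths(network, from = origin_pt,
                                        to = nearest, type = "shortest")
  route_net <- dplyr::slice(tidygraph::activate(network, "nodes"),
                            route$node_paths[[1]])
  drive_m <- sum(as.numeric(sf::st_as_sf(route_net, "edges")$length))

  # Fall back to 150% of the straight-line distance if routing found nothing
  if (drive_m == 0) drive_m <- as.numeric(min(d_station)) * detour
  drive_min <- drive_m / (speed_mph * 1609.344 / 60)

  # 3. Rail travel time from the nearest station to the destination (seconds)
  tt <- tidytransit::travel_times(stop_times, nearest$stop_name)
  tt <- tt[tt$to_stop_name == destination$stop_name, ]
  if (nrow(tt) == 0) return(NA_real_)   # no rail connection in the time window

  drive_min + tt$travel_time[1] / 60
}
```

Returning `NA` rather than stopping keeps the loop running for every tract; such tracts are then dropped from the fitted lines, much like the rows ggplot2 reports as removed.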
This is the end of the section where you need to code.
Run the code below to generate a thematic map and a plot.
Write a short description of the pattern you see in the map and the plot.
knitr::opts_chunk$set(eval = FALSE)
# Prepare an empty vector
total_trvt <- vector("numeric", nrow(home))
# Apply the function for all Census Tracts
# Fill `total_trvt` object with the calculated time
for (i in 1:nrow(home)){
total_trvt[i] <- get_trvt(home[i,], osm, station, midtown)
}
# Cbind the calculated travel time back to `home`
home_done <- home %>%
cbind(trvt = total_trvt)
# Map!
tmap_mode('view')
## tmap mode set to interactive viewing
tm_shape(census[census$GEOID %in% home$GEOID,] %>% mutate(pct_white = r_whE/r_totE)) +
tm_polygons(col = "pct_white", palette = 'GnBu') +
tm_shape(home_done) +
tm_dots(col = "trvt", palette = 'Reds', size = 0.1)
# ggplot!
inc <- ggplot(data = home_done %>%
mutate(hhinc = hhincE),
aes(x = hhinc, y = trvt)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Median Annual Household Income",
y = "Travel Time from Home to Midtown Station") +
theme_bw()
wh <- ggplot(data = home_done %>%
mutate(pct_white = r_whE/r_totE),
aes(x = pct_white, y = trvt)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(x = "Percent White",
y = "Travel Time from Home to Midtown Station") +
theme_bw()
ggpubr::ggarrange(inc, wh)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## `geom_smooth()` using formula = 'y ~ x'
The map shows a roughly concentric pattern: travel times increase as you move further away from Downtown Atlanta. Although there is a clear east/west divide between census tracts with a higher and a lower proportion of white residents, travel times for residents who park-and-ride to Midtown Station do not appear to be clearly related to the racial composition of the tracts. Given the orthogonal “+” shape of the MARTA rail network, it also makes sense that the tracts with the longest travel times are in the diagonal corners of the map. The map may also reflect a characteristic of Atlanta culture, in which access to public transportation is not a highly valued aspect of residential location.

The two plots show that longer travel times tend to go with higher median household income and with a higher percentage of white residents in a census tract. Because both majority-white and majority-non-white tracts have some of the shortest travel times to Midtown Station, it is hard to say whether there is a racial disparity. There does, however, seem to be an association between lower-income tracts and shorter travel times, which somewhat reduces dependence on cars to commute to work. Atlanta is well known for being a “car city,” and in this respect there is some equity in lower-income census tracts having faster access to a downtown business center station. However, these travel times assume park-and-ride, so commuters still rely on a car to reach their closest MARTA station.
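The description reads the fitted lines qualitatively. A small hypothetical helper can put numbers on them: Pearson's r and the least-squares slope of travel time on one tract attribute, skipping tracts with missing values as `geom_smooth()` does. It is checked here on made-up numbers only; on `home_done` you would first add `pct_white` with `mutate()` as in the plot code.

```r
# Pearson correlation and OLS slope of travel time on one tract attribute
trvt_gradient <- function(df, x, y = "trvt") {
  xv <- as.numeric(df[[x]])
  yv <- as.numeric(df[[y]])
  ok <- is.finite(xv) & is.finite(yv)     # skip tracts with missing values
  fit <- lm(yv[ok] ~ xv[ok])
  c(r = cor(xv[ok], yv[ok]), slope = unname(coef(fit)[2]), n = sum(ok))
}

# Made-up example: travel time rises 2 minutes per 10 points of pct_white
toy <- data.frame(pct_white = c(0.1, 0.2, 0.3, NA),
                  trvt      = c(12, 14, 16, 20))
res_grad <- trvt_gradient(toy, "pct_white")
res_grad
```

A positive slope for `hhincE` or `pct_white` would correspond to the upward lines in the plots, and r indicates how tightly the tracts cluster around that line.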