Mini projects are designed to allow you to work on real-world planning-related data. The second mini project presents THREE questions related to recent discussions in (transportation) planning.

  1. Atlanta BeltLine and Changes in Local Economies
  2. Transit catchment areas in the US
  3. Measuring consumption amenities

Your tasks include:

  1. Understand data and variables
  2. Transform data
  3. Visualize data
  4. Interpret patterns in your visualization

That is, you not only wrangle data but also make sense of their patterns. To help you do so, each question has several subquestions that guide your process of data processing and analysis. Hopefully, this is a fun and useful exercise in your geographic data analysis and visualization career.

Submission instructions:

  1. Due: 11:59:00 PM on March 26 (Tuesday), 2019
  2. Where: Canvas
  3. Deliverables: your R script file(s) and images. Include your discussion as #comments in your script file.

Happy R scripting!

install.packages("tidyverse", dependencies = TRUE)
install.packages("stringr", dependencies = TRUE)

install.packages("sf", dependencies = TRUE)
install.packages("sp", dependencies = TRUE)
install.packages("mapview", dependencies = TRUE)
install.packages("tmap", dependencies = TRUE)

install.packages("tidycensus", dependencies = TRUE)
install.packages("tigris", dependencies = TRUE)
install.packages("devtools") # for this, you need Rtools installed on your machine 
devtools::install_github("jamgreen/lehdr")

install.packages("httr", dep = TRUE)
install.packages("jsonlite", dep = TRUE)
install.packages("ggpubr", dependencies = TRUE)
library(tidyverse)
library(stringr)     # for working with strings (pattern matching)

library(sf)          # classes and functions for vector data
library(raster)      # classes and functions for raster data
library(sp)          # legacy spatial classes (needed by raster)
library(mapview)     # quick interactive maps
library(tmap)        # thematic maps

library(tidycensus)  # US Census ACS data via the Census API
library(tigris)      # US Census TIGER/Line boundary files
library(lehdr)       # LEHD LODES job count data

library(httr)        # HTTP requests for API calls
library(jsonlite)    # parse JSON responses
library(viridis)     # color palettes
library(ggpubr)      # arrange multiple ggplots

1. Atlanta BeltLine and Changes in Local Economies

Background

Workflow

Step 1. Import an official BeltLine shapefile as an sf object (line geometry)
- Data source
- Download this zipped file and unzip it in your working directory: via Dropbox or on Canvas.
- Stephanie Lamm’s map for AJC: Her files are available under subareas_shapefiles

getwd() # put the shapefile in your working directory 
beltline <- st_read("Export_Output.shp")
class(beltline)

east_trail <- beltline %>%
  filter(Status == "Completed", Name == "Eastside Trail")
mapview(st_geometry(east_trail))

Step 2. Read in the 2010 US Census blockgroup shapefile in Fulton and DeKalb Counties
- tigris GitHub
- tigris on R journal
- tigris technical documentation
- Georgia County FIPS code

library(tigris)
?tigris
tigris_cache_dir("PATH TO YOUR CUSTOM CACHE DIRECTORY")
# Restart R
options(tigris_use_cache = TRUE)
?block_groups
bg_fd <- st_as_sf(
  block_groups(state = "13", county = c("121", "089"), cb = TRUE)
  )
# useful intro: http://rgooglemaps.r-forge.r-project.org/Mapview.html
mapView(bg_fd, zcol = c("ALAND"), legend = FALSE, alpha = 0.5)

Step 3. Read in the 2002 & 2015 LEHD job count data in Georgia
- LEHD homepage
- Technical documentation on LEHD
- lehdr GitHub

library(lehdr)
?grab_lodes
job2002 <- grab_lodes(state = "GA", year = 2002, lodes_type = "wac",
                      job_type = "JT00", segment = "S000", agg_geo = "bg",
                      download_dir = file.path(getwd(), "lodes_raw"))
job2015 <- grab_lodes(state = "GA", year = 2015, lodes_type = "wac",
                      job_type = "JT00", segment = "S000", agg_geo = "bg",
                      download_dir = file.path(getwd(), "lodes_raw"))

Step 4. Read in the 2006-2010 & 2013-2017 ACS 5-year estimates at the census tract level in Fulton and DeKalb Counties
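No skeleton code is given for this step; here is a minimal sketch with tidycensus, assuming your Census API key is already set (see Step 3 of Question 2). The variable B19013_001 (median household income) is only a placeholder; swap in whichever variables your SES index in Step 7 requires.

# a minimal sketch: year = 2010 returns the 2006-2010 estimates, year = 2017 the 2013-2017 ones
acs0610 <- get_acs(state = "GA", county = c("121", "089"), geography = "tract",
                   year = 2010, survey = "acs5",
                   variables = c(medhhinc = "B19013_001"), # placeholder variable
                   output = "wide", geometry = TRUE)
acs1317 <- get_acs(state = "GA", county = c("121", "089"), geography = "tract",
                   year = 2017, survey = "acs5",
                   variables = c(medhhinc = "B19013_001"), # placeholder variable
                   output = "wide", geometry = TRUE)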

Step 5. Calculate the distance from the completed Eastside Trail to individual blockgroups in miles (note that st_distance returns meters for EPSG:3857, so convert the values if you want miles)

5.1 Make the crs of the two sf objects the same for further spatial data operations.

st_crs(east_trail) # 3857 projected 
st_crs(bg_fd)      # 4269 unprojected: NAD83 

bg_fd_P <- st_transform(bg_fd, 3857)
st_crs(bg_fd_P)

5.2 Calculate distances from Beltline Eastside Trail to individual blockgroups, and append them to the blockgroup sf object.

  • Note that st_distance generates a matrix. If you want to use it as a vector, subset the matrix by selecting an appropriate column number, e.g., .[, 1].
distances <- st_distance(bg_fd_P, east_trail)
bg_fd_P$dist <- distances[, 1]

5.3 Create a common key for joining with the job2002 and job2015 data frames.

  • Note that dplyr::select may be masked by raster::select in your R session. Thus, you need to specify the package name.
bg_fd_Pb <- bg_fd_P %>%
  mutate(w_bg = paste0(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE)) %>%
  dplyr::select(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE, w_bg, dist, everything())

5.4 Join job counts in 2002 and 2015 to the blockgroup sf object.

  • Since job2002 and job2015 share the same column names, first compute retailservice2002 and retailservice2015 separately.
  • For retail/service jobs, add CNS07, CNS17, CNS18, and CNS19 (refer to the LEHD technical documentation).
  • When joining, reassign NA to 0 with ifelse: what does NA mean here?
job2002b <- job2002 %>%
  mutate(
    retailservice2002 = CNS07 + CNS17 + CNS18 + CNS19 
  ) %>%
  dplyr::select(w_bg, retailservice2002)

job2015b <- job2015 %>%
  mutate(
    retailservice2015 = CNS07 + CNS17 + CNS18 + CNS19 
  ) %>%
  dplyr::select(w_bg, retailservice2015)

bg_fd_Pc <- bg_fd_Pb %>%
  left_join(job2002b, by = "w_bg") %>%
  left_join(job2015b, by = "w_bg") %>% 
  mutate(
   retailservice2002 = ifelse(is.na(retailservice2002), 0, retailservice2002), 
   retailservice2015 = ifelse(is.na(retailservice2015), 0, retailservice2015)   
  ) %>%
  dplyr::select(w_bg, dist, retailservice2002, retailservice2015)

Step 6. Plot a chart: x denotes the distance from individual blockgroups to the Eastside Trail (i.e., air distance, not taking street networks into account) and y indicates # of retail/service jobs at the blockgroup level.

bg_fd_Pc %>%
  ggplot() + 
  geom_point(aes(x = as.numeric(dist), y = retailservice2002), color = "grey") + 
  geom_smooth(aes(x = as.numeric(dist), y = retailservice2002), color = "grey", se = FALSE) + 
  geom_point(aes(x = as.numeric(dist), y = retailservice2015)) + 
  geom_smooth(aes(x = as.numeric(dist), y = retailservice2015), se = FALSE) + 
  coord_cartesian(ylim = c(0, 1000)) 

ggsave("ggplot_output.png", width = 7, height = 5, units = "in", dpi = 300)

Step 7. Compare patterns of the SES index between 2006-2010 and 2013-2017

  • This is your job! (no skeleton code is given here, but you can refer to the complete code above and below.)
  • Compute the socioeconomic status (SES) index for 2006-2010 and 2013-2017, and create charts similar to that of Step 6.
  • Note that when you calculate distances to the BeltLine, you calculate them using tract sf objects.
  • Discuss any similar/different patterns between economic trends and neighborhood change (e.g., which causes which or which may have happened first).

2. Transit catchment areas in the US

Before data analysis, let’s prepare the main data sets.
- Fixed guideway transit facilities in the US (updated as of 1/1/2019) from the TOD database.
- The Urban Area boundary shapefile from the US Census (tigris).
- The number of workers commuting by public transit, walking, and biking at the block group level from the US Census American Community Survey (tidycensus).

Step 1. Download tod_database_download.csv from https://toddata.cnt.org/downloads.php. If you do not want to register on their website, download the same file from Canvas or via a Dropbox link. Save it in your working directory and read it in as a csv file.

# getwd()
stations_df <- read_csv("tod_database_download.csv") 

Create an sf object out of the csv file.
- how to plot xy coordinates: https://ryanpeek.github.io/2017-08-03-converting-XY-data-with-sf-package/
- crs of gtfs data: https://developers.google.com/transit/gtfs/reference/#stopstxt

Then, project it for the Atlanta area.
- NAD83/Georgia West crs: http://epsg.io/26967

stations_sf <- st_as_sf(stations_df, coords = c("Longitude", "Latitude"), crs = 4326)
stations_sf <- stations_sf %>%
  filter(Buffer == "Existing Transit") %>%
  st_transform(26967)

tmap_mode("view")
tm_shape(us_states) +
  tm_borders() + 
  tm_shape(stations_sf) + 
  tm_dots()

Step 2. Next, download the Urban Area boundary for the Atlanta region, delineated by the US Census Bureau. For this task, we use tigris::urban_areas to retrieve a UA boundary shapefile.

ua <- st_as_sf(
  tigris::urban_areas(cb = TRUE, year = 2017)
  )
ua_atl <- ua %>%
  filter(UACE10 == "03817") %>%
  st_transform(26967)

Spatially subset those stations within the boundary of the Atlanta UA ua_atl.

stations_atl <- stations_sf[ua_atl, ]

# for the full list of basemaps, visit http://leaflet-extras.github.io/leaflet-providers/preview/
tmap_mode("view")
tm_shape(stations_atl) + tm_dots() + 
  tm_shape(ua_atl) + tm_borders() 

Step 3. Download the US Census ACS 5-year estimates for # of workers living in individual block groups and commuting by various means of transportation. To obtain your Census API key, visit http://api.census.gov/data/key_signup.html, and insert your key into the current environment.

census_api_key("YOUR API KEY GOES HERE")
var_acs17 <- load_variables(2017, "acs5", cache = TRUE)
View(var_acs17)
acs17 <- get_acs(state = "GA", county = c("Fulton", "DeKalb"), 
                 geography = "block group", year = 2017, 
                 variables = c("B08301_001", "B08301_010", "B08301_018", "B08301_019"), 
                 output = "wide",
                 geometry = TRUE) 
blkgrp_atl <- acs17 %>%
  st_transform(26967)

blkgrp_atl$area_whole <- unclass(st_area(blkgrp_atl))

Step 4. Create half-mile buffers for individual stations. Check the unit of the crs in stations_atl, as shown below.
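A quick check (EPSG:26967 is a meter-based crs, so dist = 800 roughly approximates a half mile, i.e., 804.7 m):

st_crs(stations_atl) # the output should show "+units=m", i.e., meters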

buffer_atl <- st_buffer(stations_atl, dist = 800)

Step 5. Clip census block groups by the buffer polygons.

seg_atl <- st_intersection(buffer_atl, blkgrp_atl)
seg_atl$area_seg <- unclass(st_area(seg_atl))
seg_atl$area_prop <- seg_atl$area_seg / seg_atl$area_whole
hist(seg_atl$area_prop)

# View(stations_atl)

seg_midtown <- seg_atl %>%
  filter(Station.Name == "MIDTOWN STATION")
tmap_mode("view")
tm_shape(seg_midtown) +
  tm_polygons(col = "area_prop", alpha = 0.5)

Step 6. Compute summary statistics for several commuting mode shares (e.g., % transit commuters) for individual stations. Note that the code below shows how such statistics are computed for the Atlanta stations.

union_midtown <- 
  seg_midtown %>% 
  mutate(workers = B08301_001E * area_prop, 
         workers_tr = B08301_010E * area_prop, 
         workers_bk = B08301_018E * area_prop,
         workers_wk = B08301_019E * area_prop) %>%
  summarize(n_seg = n(), 
            workers_sum = sum(workers), 
            workers_sum_tr = sum(workers_tr),
            workers_sum_bk = sum(workers_bk),
            workers_sum_wk = sum(workers_wk), 
            workers_pct_tr = workers_sum_tr/workers_sum*100, 
            workers_pct_active = (workers_sum_bk + workers_sum_wk)/workers_sum*100)  

tmap_mode("view")
tm_shape(union_midtown) +
  tm_borders()
union_atl <-
  seg_atl %>% 
  group_by(Station.Name) %>% 
  mutate(workers = B08301_001E * area_prop, 
         workers_tr = B08301_010E * area_prop, 
         workers_bk = B08301_018E * area_prop,
         workers_wk = B08301_019E * area_prop
         ) %>%
  summarize(n_seg = n(), 
            workers_sum = sum(workers), 
            workers_sum_tr = sum(workers_tr),
            workers_sum_bk = sum(workers_bk),
            workers_sum_wk = sum(workers_wk), 
            workers_pct_tr = workers_sum_tr/workers_sum*100, 
            workers_pct_active = (workers_sum_bk + workers_sum_wk)/workers_sum*100) %>%
  arrange(desc(n_seg))

tmap_mode("view")
tm_shape(union_atl) +
  tm_borders()

Step 7. Compute the same summary statistics for the entire country with the same data sets.
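No skeleton code is given here either; one possible scaffold (a sketch under assumptions, not a full solution) is to assemble nationwide block groups state by state and then reuse the buffer and intersection logic of Steps 4-6. Depending on your tidycensus version, block-group queries may also require a county argument, and EPSG:5070 (CONUS Albers) below is an assumed national crs that drops AK/HI stations.

# a sketch: nationwide block-group commute data, state by state
states_all <- c(state.abb, "DC")
bg_list <- purrr::map(states_all, function(st)
  get_acs(state = st, geography = "block group", year = 2017,
          variables = c("B08301_001", "B08301_010", "B08301_018", "B08301_019"),
          output = "wide", geometry = TRUE))
bg_us <- do.call(rbind, bg_list) %>% # rbind keeps the sf class
  st_transform(5070)                 # assumed national crs (CONUS Albers)
# then buffer all stations_sf points, st_intersection, and group_by/summarize as in Steps 4-6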

3. Measuring consumption amenities

Background

As a simple example, we’ll use Google Places API to examine the spatial distribution of various businesses in cities, and their possible connections to neighborhood change.

Step 1 Create xy points for API calls.

# import urban area boundaries for the entire US 
ua00 <- urban_areas(cb = FALSE, year = 2017) # https://cran.r-project.org/web/packages/tigris/tigris.pdf
# convert to sf to use tidyverse verbs 
ua01 <- ua00 %>% st_as_sf 

# subset Atlanta UA 
ua_atl <- ua01 %>%
  filter(grepl("Atlanta, GA", NAME10)) %>%
  st_transform(crs = 26967)  # https://epsg.io/26967
plot(ua_atl)

# convert Atlanta UA to raster with resolution 800 meters 
raster_template  <- raster(extent(ua_atl), resolution = 800, 
                           crs = st_crs(ua_atl)$proj4string) # http://strimas.com/spatial/hexagonal-grids/
atl_r <- rasterize(ua_atl, raster_template , field = 1)
plot(atl_r)

# convert raster to polygons (i.e., fishnet)
atl_fishnet <- rasterToPolygons(atl_r) %>%
  st_as_sf()
plot(atl_fishnet)

# convert polygons to their centroids 
atl_fishnet_cent <- st_centroid(atl_fishnet)
plot(atl_fishnet_cent)

# a proxy for the Atlanta CBD: 33.7662, -84.38726 (lat, long) - unprojected coordinates on WGS84
cbd_point <- c(-84.38726, 33.7662) %>% st_point()  
cbd <- st_sfc(cbd_point, crs = 4326)
cbd_proj <- cbd %>% st_transform(crs = 26967)
cbd_bf <- cbd_proj %>% st_buffer(dist = 8046.72) # 5 miles 
# write_rds(cbd_bf, paste0(getwd(), "/Week10/cbd_bf.rds"))

inner_points <- atl_fishnet_cent[cbd_bf, ]
# write_rds(inner_points, paste0(getwd(), "/Week10/inner_points.rds"))
nrow(inner_points)

tmap_mode("view")
tm_shape(cbd_bf) +
  tm_borders() + 
  tm_shape(inner_points) +
  tm_dots()

# to extract xy coordinates in Google API's epsg, first reproject to 4326, and then extract xy coordinates 
inner_coord <- inner_points %>% 
  st_transform(crs = 4326) %>% 
  st_coordinates() %>% 
  as.data.frame()

Step 2 Send Google Places API calls.

I misinformed you of Google’s pricing policies in class. Here is a more accurate explanation, as far as I understand:

  • Each month up to 5,000 calls for Places - Nearby Search are free.
  • For more details, visit this website and search for “nearby”.
  • However, note that they first charge your account and then also give you credits up to $200/month.
  • That is, you are still not charged up to the 5,000 calls (equivalent to $200), but you will see transaction records in your account.
  • Check your transaction records by visiting Google Cloud Platform > Billing > transactions. Make sure you select the right project on the top.
  • For this assignment, you will send around 2,000 API calls, so 5,000 calls are more than enough.
  • I know this is not a very comfortable situation; next time, I will explore other API options that work better.
# prepare URLs for API calls 
lat_lng <- paste(inner_coord$Y, inner_coord$X, sep=",")
# write_rds(lat_lng, paste0(getwd(), "/Week10/lat_lng.rds"))
# temp <- read_rds(paste0(getwd(), "/Week10/lat_lng.rds"))
# identical(lat_lng, temp)

url_1 <- "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location="
# url_2 <- "&radius=600&type="
url_2 <- "&rankby=distance&type="
business <- "cafe"
my_api <- "YOUR API KEY GOES HERE"
token <- "&pagetoken="

# define a function 
google_places <- function(lat_lng){
  
  # read in output from Google 
  result1_raw <<-
    paste0(url_1, lat_lng, url_2, business, "&key=", my_api) %>% 
    GET() %>% 
    content(as = "text") %>%
    fromJSON() 
  # transform to a data frame 
  if (result1_raw$status == "OK"){
    result1_df <<- 
      result1_raw$results %>% 
      dplyr::select(name, place_id, types, rating, user_ratings_total) 
    result1_df$x <<- result1_raw$results$geometry$location$lng
    result1_df$y <<- result1_raw$results$geometry$location$lat
    # result1_df$geometry <- NULL
  } else {
    result1_df <<- NA
  }
  print("Result #1 finished.")
  Sys.sleep(2)
  
  # run if results have more than 20 businesses 
  if (is.null(result1_raw$next_page_token) == FALSE){
    # read in output from Google 
    result2_raw <<-
      paste0(url_1, lat_lng, url_2, business, "&key=", my_api, token, result1_raw$next_page_token) %>% 
      GET() %>% 
      content(as = "text") %>%
      fromJSON() 
    # transform to a data frame 
    if (result2_raw$status == "OK"){
      result2_df <<- 
        result2_raw$results %>% 
        dplyr::select(name, place_id, types, rating, user_ratings_total) 
      result2_df$x <<- result2_raw$results$geometry$location$lng
      result2_df$y <<- result2_raw$results$geometry$location$lat
      # result2_df$geometry <- NULL
    } else {
      result2_df <<- NA
    }
  } else {
    result2_raw <<- list(status = "SKIPPED") # placeholder so the checks on result2_raw below do not fail
    result2_df <<- NA
  }
  print("Result #2 finished.")
  Sys.sleep(2)
  
  # run if results have more than 40 businesses 
  if (is.null(result2_raw$next_page_token) == FALSE){
    # read in output from Google 
    result3_raw <<-
      paste0(url_1, lat_lng, url_2, business, "&key=", my_api, token, result2_raw$next_page_token) %>% 
      GET() %>% 
      content(as = "text") %>%
      fromJSON() 
    # transform to a data frame 
    if (result3_raw$status == "OK"){
      result3_df <<- 
        result3_raw$results %>% 
        dplyr::select(name, place_id, types, rating, user_ratings_total) 
      result3_df$x <<- result3_raw$results$geometry$location$lng
      result3_df$y <<- result3_raw$results$geometry$location$lat
      # result3_df$geometry <- NULL
    } else {
      result3_df <<- NA
    }
  } else {
    result3_raw <<- list(status = "SKIPPED") # placeholder so the status check below does not fail
    result3_df <<- NA
  }
  print("Result #3 finished.")
  
  result_invalid <<- list(result1_raw$status, result2_raw$status, result3_raw$status) %>%
    map_lgl(~ . == "INVALID_REQUEST")
  Sys.sleep(2) # originally placed after return(), where it was never reached
  if (any(result_invalid)){
    print("INVALID_REQUEST")
    return("INVALID_REQUEST")
  } else {
    result_list  <<- list(result1_df, result2_df, result3_df)
    result_final <<- result_list[!is.na(result_list)] %>% 
      dplyr::bind_rows()
    print("Successful collection")
    return(result_final)
  }
}

# make an empty output list 
output <- vector("list", length(lat_lng))

# run a for loop 
for (i in seq_along(lat_lng)) {
  output[[i]] <- google_places(lat_lng[[i]])  
}

# how many points failed to return a data frame?
map_lgl(output, ~ !is.data.frame(.)) %>% sum()
# total number of businesses collected, relative to the 60-result cap per point
map_dbl(output, ~ ifelse(is.data.frame(.), nrow(.), 0)) %>% sum() / 60

# keep only the data-frame elements before binding
output_df <- output[map_lgl(output, is.data.frame)] %>%
  dplyr::bind_rows() %>%
  as_tibble()

output_df2 <- 
  output_df %>%  
  group_by(place_id) %>%
  summarize(rating = mean(rating, na.rm = TRUE), 
            rating_n = mean(user_ratings_total, na.rm = TRUE), 
            x = mean(x), 
            y = mean(y)) 
  
output_sf <- st_as_sf(output_df2, coords = c("x", "y"), crs = 4326)
output_sf$ln_rating_n <- log(output_sf$rating_n +1)

tmap_mode("view")
tm_basemap("Stamen.TonerLite", alpha = 0.25) +
  tm_shape(output_sf) +
  tm_dots(col = "ln_rating_n", size = 0.05, palette = "-viridis") +
  tm_shape(inner_points) +
  tm_dots(col = "red", size = 0.01, alpha = 0.25) + 
  tm_shape(cbd_bf) +
  tm_borders(col = "red") + 
  tm_tiles("Stamen.TonerLabels")  

Step 3 Clean the Google Places API results

tigris_cache_dir(paste0(getwd(), "/tigris"))
readRenviron('~/.Renviron')

# read in census tracts in GA with the tigris package 
tracts_ga <- tracts(
  state = "GA", cb = FALSE, year = 2016) %>% 
  st_as_sf() %>%
  mutate(
    sqmt = as.numeric(ALAND), # for now ALAND is character
    sqmi = sqmt/1000000*0.386102159  # convert from sq meter to sq mile 
  ) %>%
  dplyr::select(GEOID, sqmi) %>%
  st_transform(crs = 26967)

# subset tracts within the Atlanta UA boundary: 885 tracts  
tracts_atl <- tracts_ga[ua_atl, ]

# subset tracts within the CBD 5-mile buffer: 124 tracts  
tracts_cbd <- tracts_ga[cbd_bf, ] 

# project output_sf (430 businesses) with crs = 26967 
output_sf_proj <- output_sf %>%
  st_transform(crs = 26967) 

tmap_mode("view")
tm_shape(tracts_cbd) +
  tm_polygons(col = "red", alpha = 0.25) + 
  tm_shape(tracts_atl) +
  tm_borders() + 
  tm_shape(output_sf_proj) + 
  tm_dots(col = "blue")

den_cafe <- st_join(output_sf_proj, tracts_cbd) %>% #  only 74 tracts have any cafe
  filter(is.na(GEOID) == FALSE) %>% 
  group_by(GEOID) %>%
  summarize(
    n = n(), 
    sqmi = mean(sqmi)
    ) %>%
  mutate(
    den_cafe = n/sqmi
    ) %>%
  dplyr::select(GEOID, den_cafe)

Step 4 Collect an alternative measure.

# read in job counts in GA with the lehdr package 
jobs_ga <- grab_lodes(
  state = "GA", year = 2015, lodes_type = "wac", job_type = "JT00", 
  segment = "S000", state_part = "main", agg_geo = "tract", 
  download_dir = paste0(getwd(), "/Week10/lodes"))

retail_ga <- jobs_ga %>%
  dplyr::select(GEOID = w_tract, retail = CNS07)

# compute retail density at the census tract level within the CBD buffer
den_retail <- tracts_cbd %>%
  left_join(retail_ga) %>%
  mutate(
    den_retail = retail/sqmi
  ) %>%
  dplyr::select(GEOID, den_retail) 

Step 5 Compute the number of college graduates aged 25-34 as of 2017

var_acs17 <- load_variables(2017, "acs5", cache = TRUE)
View(var_acs17)

# census_api_key("3b1d8912e33aa2d4c01bf1abc84729cfeb7cd6cd", install = TRUE, overwrite = TRUE)
# readRenviron("~/.Renviron")
# Sys.getenv("CENSUS_API_KEY")

# B15001_017: Male!!25 to 34 years!!Bachelor's degree
# B15001_018: Male!!25 to 34 years!!Graduate or professional degree
# B15001_058: Female!!25 to 34 years!!Bachelor's degree
# B15001_059: Female!!25 to 34 years!!Graduate or professional degree
young_ga <- get_acs(state = "GA", geography = "tract", year = 2017, 
        variables = c("B15001_017", "B15001_018", "B15001_058", "B15001_059"), 
        output = "wide") 

young_ga2 <- young_ga %>%
  mutate(young = B15001_017E + B15001_018E + B15001_058E + B15001_059E) %>%
  dplyr::select(GEOID, young)

young_cbd <- tracts_cbd %>%
  left_join(young_ga2) %>%
  mutate(
    sum_young = sum(young, na.rm = TRUE), 
    pct_young = young/sum_young*100 
  ) %>%
  dplyr::select(GEOID, pct_young) 

tmap_mode("view")
tm_shape(young_cbd) +
  tm_polygons(col = "pct_young", alpha = 0.5) + 
  tm_shape(output_sf) +
  tm_dots(col = "red", size = 0.03)

Step 6 Create plots to compare the performance of two measures.

den_cafe_df <- den_cafe 
st_geometry(den_cafe_df) <- NULL

den_retail_df<- den_retail
st_geometry(den_retail_df) <- NULL

young_cbd_df <- young_cbd
st_geometry(young_cbd_df) <- NULL

data <- left_join(den_retail_df, den_cafe_df) %>%
  mutate(
    den_retail = ifelse(is.na(den_retail), 0, den_retail), 
    ln_retail  = log(den_retail + 1), 
    den_cafe   = ifelse(is.na(den_cafe), 0, den_cafe), 
    ln_cafe    = log(den_cafe + 1)
    ) %>%
  left_join(young_cbd_df)

data$ln_retail <- scale(data$ln_retail)
data$ln_cafe <- scale(data$ln_cafe)

# https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph
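# NOTE: stat_smooth_func() below is not part of ggplot2; it is a user-defined stat
# shared in the Stack Overflow thread above. Source its definition before running
# the plots, e.g. via the gist linked from that thread (the gist ID here is an
# assumption; check the thread):
# devtools::source_gist("524eade46135f6348140", filename = "ggplot_smooth_func.R")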
lm_retail <- ggplot(data) + 
  stat_smooth_func(aes(ln_retail, pct_young), geom="text",method="lm",hjust=0,parse=TRUE) +
  geom_smooth(aes(ln_retail, pct_young), method="lm",se=FALSE, color = "coral1", size = 0.1) +
  geom_point(aes(ln_retail, pct_young), color = "coral4") 

lm_cafe <- ggplot(data) + 
  stat_smooth_func(aes(ln_cafe, pct_young), geom="text",method="lm",hjust=0,parse=TRUE) +
  geom_smooth(aes(ln_cafe, pct_young), method="lm",se=FALSE, color = "deepskyblue1", size = 0.1) +
  geom_point(aes(ln_cafe, pct_young), color = "deepskyblue4")


ggarrange(lm_retail, lm_cafe, 
          ncol = 2, nrow = 1)

Step 7 With the same workflow, extract cafes for the inner cities of the Boston Urban Area, MA, and the San Francisco Urban Area, CA.

  • This is your job! Submit your script that does the following tasks (no skeleton code is given here, but you can refer to the complete code above).
  • Limit the inner city as those census tracts within 5 miles from the CBD.
  • Above, the CBD of the Atlanta UA was the centroid of the census tract with the highest job density based on the 2015 LEHD (I extracted the total # of jobs, or C000, from the WAC table at the census tract level; the code used for this task is not included here, but a hedged sketch follows below). Follow the same approach when you define five-mile buffers for Boston and San Francisco.
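For the CBD-finding step described in the last bullet, a hedged sketch for Boston (not the exact code used above; EPSG:26986, NAD83 / Massachusetts Mainland, is an assumed projected crs, and San Francisco works the same way with an appropriate crs):

# sketch: centroid of the census tract with the highest 2015 job density (Boston)
jobs_ma <- grab_lodes(state = "MA", year = 2015, lodes_type = "wac",
                      job_type = "JT00", segment = "S000",
                      state_part = "main", agg_geo = "tract")
cbd_bos <- tracts(state = "MA", cb = FALSE, year = 2016) %>%
  st_as_sf() %>%
  st_transform(26986) %>% # assumed projected crs for MA
  mutate(sqmi = as.numeric(ALAND)/1000000*0.386102159) %>%
  left_join(jobs_ma %>% dplyr::select(GEOID = w_tract, jobs = C000), by = "GEOID") %>%
  mutate(den_jobs = jobs/sqmi) %>%
  filter(den_jobs == max(den_jobs, na.rm = TRUE)) %>%
  st_centroid()
cbd_bf_bos <- st_buffer(cbd_bos, dist = 8046.72) # 5 miles, as above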