Mini projects are designed to allow you to work on real-world planning-related data. The second mini project presents THREE questions related to recent discussions in (transportation) planning.

  1. Atlanta BeltLine and Changes in Local Economies
  2. Transit catchment areas in the US
  3. Measuring consumption amenities

Your tasks include:

  1. Understand data and variables
  2. Transform data
  3. Visualize data
  4. Interpret patterns in your visualization

That is, you not only wrangle data but also make sense of their patterns. To help you do so, each question has several subquestions that guide your process of data processing and analysis. Hopefully, this is a fun and useful exercise in your geographic data analysis and visualization career.

Submission instructions:

  1. Due: 11:59:00 PM on March 26 (Tuesday), 2019
  2. Where: Canvas
  3. Deliverables: your R script file(s) and images. Include your discussion as #comments in your script file.

Happy R scripting!

install.packages("tidyverse", dependencies = TRUE)
install.packages("stringr", dependencies = TRUE)

install.packages("sf", dependencies = TRUE)
install.packages("sp", dependencies = TRUE)
install.packages("mapview", dependencies = TRUE)
install.packages("tmap", dependencies = TRUE)

install.packages("tidycensus", dependencies = TRUE)
install.packages("tigris", dependencies = TRUE)
install.packages("devtools") # for this, you need Rtools installed on your machine 
devtools::install_github("jamgreen/lehdr")

install.packages("httr", dep = TRUE)
install.packages("jsonlite", dep = TRUE)
install.packages("ggpubr", dependencies = TRUE)
library(tidyverse)
library(stringr)     # for working with strings (pattern matching)

library(sf)          # classes and functions for vector data
library(raster)      # classes and functions for raster data
library(sp)          # legacy spatial classes (needed by raster)
library(mapview)     # quick interactive maps
library(tmap)        # thematic maps

library(tidycensus)  # US Census ACS data via the Census API
library(tigris)      # US Census TIGER/Line boundary files
library(lehdr)       # LEHD LODES job count data

library(httr)        # HTTP requests for API calls
library(jsonlite)    # parse JSON responses
library(viridis)     # color palettes
library(ggpubr)      # arrange multiple ggplots

1. Atlanta BeltLine and Changes in Local Economies

Background

Workflow

Step 1. Import an official BeltLine shapefile as an sf object (line geometry)
- Data source
- Download this zipped file and unzip it in your working directory: via Dropbox or on Canvas.
- Stephanie Lamm’s map for AJC: Her files are available under subareas_shapefiles

getwd() # put the shapefile in your working directory 
beltline <- st_read("Export_Output.shp")
class(beltline)

east_trail <- beltline %>%
  filter(Status == "Completed", Name == "Eastside Trail")
mapview(st_geometry(east_trail))

Step 2. Read in the 2010 US Census blockgroup shapefile in Fulton and DeKalb Counties
- tigris GitHub
- tigris on R journal
- tigris technical documentation
- Georgia County FIPS code

library(tigris)
?tigris
tigris_cache_dir("PATH TO YOUR CUSTOM CACHE DIRECTORY")
# Restart R
options(tigris_use_cache = TRUE)
?block_groups
bg_fd <- st_as_sf(
  block_groups(state = "13", county = c("121", "089"), cb = TRUE)
  )
# useful intro: http://rgooglemaps.r-forge.r-project.org/Mapview.html
mapView(bg_fd, zcol = c("ALAND"), legend = FALSE, alpha = 0.5)

Step 3. Read in the 2002 & 2015 LEHD job count data in Georgia
- LEHD homepage
- Technical documentation on LEHD
- lehdr GitHub

library(lehdr)
?grab_lodes
job2002 <- grab_lodes(state = "GA", year = 2002, lodes_type = "wac",
                      job_type = "JT00", segment = "S000", agg_geo = "bg",
                      download_dir = file.path(getwd(), "lodes_raw"))
job2015 <- grab_lodes(state = "GA", year = 2015, lodes_type = "wac",
                      job_type = "JT00", segment = "S000", agg_geo = "bg",
                      download_dir = file.path(getwd(), "lodes_raw"))

Step 4. Read in the 2006-2010 & 2013-2017 ACS 5-year estimates at the census tract level in Fulton and DeKalb Counties
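No skeleton code is given for this step; here is a minimal sketch with tidycensus, assuming your Census API key is already set (see Step 3 of Question 2). The variable B19013_001 (median household income) is only a placeholder; swap in whichever variables your SES index in Step 7 requires.

# a minimal sketch: year = 2010 returns the 2006-2010 estimates, year = 2017 the 2013-2017 ones
acs0610 <- get_acs(state = "GA", county = c("121", "089"), geography = "tract",
                   year = 2010, survey = "acs5",
                   variables = c(medhhinc = "B19013_001"), # placeholder variable
                   output = "wide", geometry = TRUE)
acs1317 <- get_acs(state = "GA", county = c("121", "089"), geography = "tract",
                   year = 2017, survey = "acs5",
                   variables = c(medhhinc = "B19013_001"), # placeholder variable
                   output = "wide", geometry = TRUE)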

Step 5. Calculate the distance from the completed Eastside Trail to individual blockgroups in miles (note that st_distance returns meters for EPSG:3857, so convert the values if you want miles)

5.1 Make the crs of the two sf objects the same for further spatial data operations.

st_crs(east_trail) # 3857 projected 
st_crs(bg_fd)      # 4269 unprojected: NAD83 

bg_fd_P <- st_transform(bg_fd, 3857)
st_crs(bg_fd_P)

5.2 Calculate distances from Beltline Eastside Trail to individual blockgroups, and append them to the blockgroup sf object.

  • Note that st_distance generates a matrix. If you want to use it as a vector, subset the matrix by selecting an appropriate column number, e.g., .[, 1].
distances <- st_distance(bg_fd_P, east_trail)
bg_fd_P$dist <- distances[, 1]

5.3 Create a common key for joining with the job2002 and job2015 data frames.

  • Note that dplyr::select may be masked by raster::select in your R session. Thus, you need to specify the package name.
bg_fd_Pb <- bg_fd_P %>%
  mutate(w_bg = paste0(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE)) %>%
  dplyr::select(STATEFP, COUNTYFP, TRACTCE, BLKGRPCE, w_bg, dist, everything())

5.4 Join job counts in 2002 and 2015 to the blockgroup sf object.

  • Since job2002 and job2015 share the same column names, first compute retailservice2002 and retailservice2015 separately.
  • For retail/service jobs, add CNS07, CNS17, CNS18, and CNS19 (refer to the LEHD technical documentation).
  • When joining, reassign NA to 0 with ifelse: what does NA mean here?
job2002b <- job2002 %>%
  mutate(
    retailservice2002 = CNS07 + CNS17 + CNS18 + CNS19 
  ) %>%
  dplyr::select(w_bg, retailservice2002)

job2015b <- job2015 %>%
  mutate(
    retailservice2015 = CNS07 + CNS17 + CNS18 + CNS19 
  ) %>%
  dplyr::select(w_bg, retailservice2015)

bg_fd_Pc <- bg_fd_Pb %>%
  left_join(job2002b, by = "w_bg") %>%
  left_join(job2015b, by = "w_bg") %>% 
  mutate(
   retailservice2002 = ifelse(is.na(retailservice2002), 0, retailservice2002), 
   retailservice2015 = ifelse(is.na(retailservice2015), 0, retailservice2015)   
  ) %>%
  dplyr::select(w_bg, dist, retailservice2002, retailservice2015)

Step 6. Plot a chart: x denotes the distance from individual blockgroups to the Eastside Trail (i.e., air distance, not taking street networks into account) and y indicates # of retail/service jobs at the blockgroup level.

bg_fd_Pc %>%
  ggplot() + 
  geom_point(aes(x = as.numeric(dist), y = retailservice2002), color = "grey") + 
  geom_smooth(aes(x = as.numeric(dist), y = retailservice2002), color = "grey", se = FALSE) + 
  geom_point(aes(x = as.numeric(dist), y = retailservice2015)) + 
  geom_smooth(aes(x = as.numeric(dist), y = retailservice2015), se = FALSE) + 
  coord_cartesian(ylim = c(0, 1000)) 

ggsave("ggplot_output.png", width = 7, height = 5, units = "in", dpi = 300)

Step 7. Compare patterns of the SES index between 2006-2010 and 2013-2017

  • This is your job! (no skeleton code is given here, but you can refer to the complete code above and below.)
  • Compute the socioeconomic status (SES) index for 2006-2010 and 2013-2017, and create charts similar to that of Step 6.
  • Note that when you calculate distances to the BeltLine, you calculate them using tract sf objects.
  • Discuss any similar/different patterns between economic trends and neighborhood change (e.g., which causes which or which may have happened first).

2. Transit catchment areas in the US

Before data analysis, let’s prepare the main data sets.
- Fixed guideway transit facilities in the US (updated as of 1/1/2019) from the TOD database.
- The Urban Area boundary shapefile from the US Census (tigris).
- The number of workers commuting by public transit, walking, and biking at the block group level from the US Census American Community Survey (tidycensus).

Step 1. Download tod_database_download.csv from https://toddata.cnt.org/downloads.php. If you do not want to register on their website, download the same file from Canvas or via a Dropbox link. Save it in your working directory and read it in as a csv file.

# getwd()
stations_df <- read_csv("tod_database_download.csv") 

Create an sf object out of the csv file.
- how to plot xy coordinates: https://ryanpeek.github.io/2017-08-03-converting-XY-data-with-sf-package/
- crs of gtfs data: https://developers.google.com/transit/gtfs/reference/#stopstxt

Then, project it for the Atlanta area.
- NAD83/Georgia West crs: http://epsg.io/26967

stations_sf <- st_as_sf(stations_df, coords = c("Longitude", "Latitude"), crs = 4326)
stations_sf <- stations_sf %>%
  filter(Buffer == "Existing Transit") %>%
  st_transform(26967)

tmap_mode("view")
tm_shape(us_states) +
  tm_borders() + 
  tm_shape(stations_sf) + 
  tm_dots()

Step 2. Next, download the Urban Area boundary for the Atlanta region, delineated by the US Census Bureau. For this task, we use tigris::urban_areas to retrieve a UA boundary shapefile.

ua <- st_as_sf(
  tigris::urban_areas(cb = TRUE, year = 2017)
  )
ua_atl <- ua %>%
  filter(UACE10 == "03817") %>%
  st_transform(26967)

Spatially subset those stations within the boundary of the Atlanta UA ua_atl.

stations_atl <- stations_sf[ua_atl, ]

# for the full list of basemaps, visit http://leaflet-extras.github.io/leaflet-providers/preview/
tmap_mode("view")
tm_shape(stations_atl) + tm_dots() + 
  tm_shape(ua_atl) + tm_borders() 

Step 3. Download the US Census ACS 5-year estimates for # of workers living in individual block groups and commuting by various means of transportation. To obtain your Census API key, visit http://api.census.gov/data/key_signup.html, and insert your key into the current environment.

census_api_key("YOUR API KEY GOES HERE")
var_acs17 <- load_variables(2017, "acs5", cache = TRUE)
View(var_acs17)
acs17 <- get_acs(state = "GA", county = c("Fulton", "DeKalb"), 
                 geography = "block group", year = 2017, 
                 variables = c("B08301_001", "B08301_010", "B08301_018", "B08301_019"), 
                 output = "wide",
                 geometry = TRUE) 
blkgrp_atl <- acs17 %>%
  st_transform(26967)

blkgrp_atl$area_whole <- unclass(st_area(blkgrp_atl))

Step 4. Create half-mile buffers for individual stations. Check the unit of the crs in stations_atl, as shown below.
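A quick check (EPSG:26967 is a meter-based crs, so dist = 800 roughly approximates a half mile, i.e., 804.7 m):

st_crs(stations_atl) # the output should show "+units=m", i.e., meters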

buffer_atl <- st_buffer(stations_atl, dist = 800)

Step 5. Clip census block groups by the buffer polygons.

seg_atl <- st_intersection(buffer_atl, blkgrp_atl)
seg_atl$area_seg <- unclass(st_area(seg_atl))
seg_atl$area_prop <- seg_atl$area_seg / seg_atl$area_whole
hist(seg_atl$area_prop)

# View(stations_atl)

seg_midtown <- seg_atl %>%
  filter(Station.Name == "MIDTOWN STATION")
tmap_mode("view")
tm_shape(seg_midtown) +
  tm_polygons(col = "area_prop", alpha = 0.5)

Step 6. Compute summary statistics for several commuting mode shares (e.g., % transit commuters) for individual stations. Note that the code below shows how such statistics are computed for the Atlanta stations.

union_midtown <- 
  seg_midtown %>% 
  mutate(workers = B08301_001E * area_prop, 
         workers_tr = B08301_010E * area_prop, 
         workers_bk = B08301_018E * area_prop,
         workers_wk = B08301_019E * area_prop) %>%
  summarize(n_seg = n(), 
            workers_sum = sum(workers), 
            workers_sum_tr = sum(workers_tr),
            workers_sum_bk = sum(workers_bk),
            workers_sum_wk = sum(workers_wk), 
            workers_pct_tr = workers_sum_tr/workers_sum*100, 
            workers_pct_active = (workers_sum_bk + workers_sum_wk)/workers_sum*100)  

tmap_mode("view")
tm_shape(union_midtown) +
  tm_borders()
union_atl <-
  seg_atl %>% 
  group_by(Station.Name) %>% 
  mutate(workers = B08301_001E * area_prop, 
         workers_tr = B08301_010E * area_prop, 
         workers_bk = B08301_018E * area_prop,
         workers_wk = B08301_019E * area_prop
         ) %>%
  summarize(n_seg = n(), 
            workers_sum = sum(workers), 
            workers_sum_tr = sum(workers_tr),
            workers_sum_bk = sum(workers_bk),
            workers_sum_wk = sum(workers_wk), 
            workers_pct_tr = workers_sum_tr/workers_sum*100, 
            workers_pct_active = (workers_sum_bk + workers_sum_wk)/workers_sum*100) %>%
  arrange(desc(n_seg))

tmap_mode("view")
tm_shape(union_atl) +
  tm_borders()

Step 7. Compute the same summary statistics for the entire country with the same data sets.
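No skeleton code is given here either; one possible scaffold (a sketch under assumptions, not a full solution) is to assemble nationwide block groups state by state and then reuse the buffer and intersection logic of Steps 4-6. Depending on your tidycensus version, block-group queries may also require a county argument, and EPSG:5070 (CONUS Albers) below is an assumed national crs that drops AK/HI stations.

# a sketch: nationwide block-group commute data, state by state
states_all <- c(state.abb, "DC")
bg_list <- purrr::map(states_all, function(st)
  get_acs(state = st, geography = "block group", year = 2017,
          variables = c("B08301_001", "B08301_010", "B08301_018", "B08301_019"),
          output = "wide", geometry = TRUE))
bg_us <- do.call(rbind, bg_list) %>% # rbind keeps the sf class
  st_transform(5070)                 # assumed national crs (CONUS Albers)
# then buffer all stations_sf points, st_intersection, and group_by/summarize as in Steps 4-6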

3. Measuring consumption amenities

Background

As a simple example, we’ll use Google Places API to examine the spatial distribution of various businesses in cities, and their possible connections to neighborhood change.

Step 1 Create xy points for API calls.

# import urban area boundaries for the entire US 
ua00 <- urban_areas(cb = FALSE, year = 2017) # https://cran.r-project.org/web/packages/tigris/tigris.pdf
# convert to sf to use tidyverse verbs 
ua01 <- ua00 %>% st_as_sf 

# subset Atlanta UA 
ua_atl <- ua01 %>%
  filter(grepl("Atlanta, GA", NAME10)) %>%
  st_transform(crs = 26967)  # https://epsg.io/26967
plot(ua_atl)

# convert Atlanta UA to raster with resolution 800 meters 
raster_template  <- raster(extent(ua_atl), resolution = 800, 
                           crs = st_crs(ua_atl)$proj4string) # http://strimas.com/spatial/hexagonal-grids/
atl_r <- rasterize(ua_atl, raster_template , field = 1)
plot(atl_r)

# convert raster to polygons (i.e., fishnet)
atl_fishnet <- rasterToPolygons(atl_r) %>%
  st_as_sf()
plot(atl_fishnet)

# convert polygons to their centroids 
atl_fishnet_cent <- st_centroid(atl_fishnet)
plot(atl_fishnet_cent)

# a proxy for the Atlanta CBD: 33.7662, -84.38726 (lat, long) - unprojected coordinates on WGS84
cbd_point <- c(-84.38726, 33.7662) %>% st_point()  
cbd <- st_sfc(cbd_point, crs = 4326)
cbd_proj <- cbd %>% st_transform(crs = 26967)
cbd_bf <- cbd_proj %>% st_buffer(dist = 8046.72) # 5 miles 
# write_rds(cbd_bf, paste0(getwd(), "/Week10/cbd_bf.rds"))

inner_points <- atl_fishnet_cent[cbd_bf, ]
# write_rds(inner_points, paste0(getwd(), "/Week10/inner_points.rds"))
nrow(inner_points)

tmap_mode("view")
tm_shape(cbd_bf) +
  tm_borders() + 
  tm_shape(inner_points) +
  tm_dots()

# to extract xy coordinates in Google API's epsg, first reproject to 4326, and then extract xy coordinates 
inner_coord <- inner_points %>% 
  st_transform(crs = 4326) %>% 
  st_coordinates() %>% 
  as.data.frame()

Step 2 Send Google Places API calls.

I misinformed you of Google’s pricing policies in class. Here is a more accurate explanation, as far as I understand:

  • Each month up to 5,000 calls for Places - Nearby Search are free.
  • For more details, visit this website and search for “nearby”.
  • However, note that they first charge your account and then also give you credits up to $200/month.
  • That is, you are still not charged up to the 5,000 calls (equivalent to $200), but you will see transaction records in your account.
  • Check your transaction records by visiting Google Cloud Platform > Billing > transactions. Make sure you select the right project on the top.
  • For this assignment, you will send around 2,000 API calls, so 5,000 calls are more than enough.
  • I know this is not a very comfortable situation; next time, I will explore other API options that work better.
# prepare URLs for API calls 
lat_lng <- paste(inner_coord$Y, inner_coord$X, sep=",")
# write_rds(lat_lng, paste0(getwd(), "/Week10/lat_lng.rds"))
# temp <- read_rds(paste0(getwd(), "/Week10/lat_lng.rds"))
# identical(lat_lng, temp)

url_1 <- "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location="
# url_2 <- "&radius=600&type="
url_2 <- "&rankby=distance&type="
business <- "cafe"
my_api <- "YOUR API KEY GOES HERE"
token <- "&pagetoken="

# define a function 
google_places <- function(lat_lng){
  
  # read in output from Google 
  result1_raw <<-
    paste0(url_1, lat_lng, url_2, business, "&key=", my_api) %>% 
    GET() %>% 
    content(as = "text") %>%
    fromJSON() 
  # transform to a data frame 
  if (result1_raw$status == "OK"){
    result1_df <<- 
      result1_raw$results %>% 
      dplyr::select(name, place_id, types, rating, user_ratings_total) 
    result1_df$x <<- result1_raw$results$geometry$location$lng
    result1_df$y <<- result1_raw$results$geometry$location$lat
    # result1_df$geometry <- NULL
  } else {
    result1_df <<- NA
  }
  print("Result #1 finished.")
  Sys.sleep(2)
  
  # run if results have more than 20 businesses 
  if (is.null(result1_raw$next_page_token) == FALSE){
    # read in output from Google 
    result2_raw <<-
      paste0(url_1, lat_lng, url_2, business, "&key=", my_api, token, result1_raw$next_page_token) %>% 
      GET() %>% 
      content(as = "text") %>%
      fromJSON() 
    # transform to a data frame 
    if (result2_raw$status == "OK"){
      result2_df <<- 
        result2_raw$results %>% 
        dplyr::select(name, place_id, types, rating, user_ratings_total) 
      result2_df$x <<- result2_raw$results$geometry$location$lng
      result2_df$y <<- result2_raw$results$geometry$location$lat
      # result2_df$geometry <- NULL
    } else {
      result2_df <<- NA
    }
  } else {
    result2_raw <<- list(status = "SKIPPED") # placeholder so the checks on result2_raw below do not fail
    result2_df <<- NA
  }
  print("Result #2 finished.")
  Sys.sleep(2)
  
  # run if results have more than 40 businesses 
  if (is.null(result2_raw$next_page_token) == FALSE){
    # read in output from Google 
    result3_raw <<-
      paste0(url_1, lat_lng, url_2, business, "&key=", my_api, token, result2_raw$next_page_token) %>% 
      GET() %>% 
      content(as = "text") %>%
      fromJSON() 
    # transform to a data frame 
    if (result3_raw$status == "OK"){
      result3_df <<- 
        result3_raw$results %>% 
        dplyr::select(name, place_id, types, rating, user_ratings_total) 
      result3_df$x <<- result3_raw$results$geometry$location$lng
      result3_df$y <<- result3_raw$results$geometry$location$lat
      # result3_df$geometry <- NULL
    } else {
      result3_df <<- NA
    }
  } else {
    result3_raw <<- list(status = "SKIPPED") # placeholder so the status check below does not fail
    result3_df <<- NA
  }
  print("Result #3 finished.")
  
  result_invalid <<- list(result1_raw$status, result2_raw$status, result3_raw$status) %>%
    map_lgl(~ . == "INVALID_REQUEST")
  Sys.sleep(2) # originally placed after return(), where it was never reached
  if (any(result_invalid)){
    print("INVALID_REQUEST")
    return("INVALID_REQUEST")
  } else {
    result_list  <<- list(result1_df, result2_df, result3_df)
    result_final <<- result_list[!is.na(result_list)] %>% 
      dplyr::bind_rows()
    print("Successful collection")
    return(result_final)
  }
}

# make an empty output list 
output <- vector("list", length(lat_lng))

# run a for loop 
for (i in seq_along(lat_lng)) {
  output[[i]] <- google_places(lat_lng[[i]])  
}

# how many points failed to return a data frame?
map_lgl(output, ~ !is.data.frame(.)) %>% sum()
# total number of businesses collected, relative to the 60-result cap per point
map_dbl(output, ~ ifelse(is.data.frame(.), nrow(.), 0)) %>% sum() / 60

# keep only the data-frame elements before binding
output_df <- output[map_lgl(output, is.data.frame)] %>%
  dplyr::bind_rows() %>%
  as_tibble()

output_df2 <- 
  output_df %>%  
  group_by(place_id) %>%
  summarize(rating = mean(rating, na.rm = TRUE), 
            rating_n = mean(user_ratings_total, na.rm = TRUE), 
            x = mean(x), 
            y = mean(y)) 
  
output_sf <- st_as_sf(output_df2, coords = c("x", "y"), crs = 4326)
output_sf$ln_rating_n <- log(output_sf$rating_n +1)

tmap_mode("view")
tm_basemap("Stamen.TonerLite", alpha = 0.25) +
  tm_shape(output_sf) +
  tm_dots(col = "ln_rating_n", size = 0.05, palette = "-viridis") +
  tm_shape(inner_points) +
  tm_dots(col = "red", size = 0.01, alpha = 0.25) + 
  tm_shape(cbd_bf) +
  tm_borders(col = "red") + 
  tm_tiles("Stamen.TonerLabels")  

Step 3 Clean the Google Places API results

tigris_cache_dir(paste0(getwd(), "/tigris"))
readRenviron('~/.Renviron')

# read in census tracts in GA with the tigris package 
tracts_ga <- tracts(
  state = "GA", cb = FALSE, year = 2016) %>% 
  st_as_sf() %>%
  mutate(
    sqmt = as.numeric(ALAND), # for now ALAND is character
    sqmi = sqmt/1000000*0.386102159  # convert from sq meter to sq mile 
  ) %>%
  dplyr::select(GEOID, sqmi) %>%
  st_transform(crs = 26967)

# subset tracts within the Atlanta UA boundary: 885 tracts  
tracts_atl <- tracts_ga[ua_atl, ]

# subset tracts within the CBD 5-mile buffer: 124 tracts  
tracts_cbd <- tracts_ga[cbd_bf, ] 

# project output_sf (430 businesses) with crs = 26967 
output_sf_proj <- output_sf %>%
  st_transform(crs = 26967) 

tmap_mode("view")
tm_shape(tracts_cbd) +
  tm_polygons(col = "red", alpha = 0.25) + 
  tm_shape(tracts_atl) +
  tm_borders() + 
  tm_shape(output_sf_proj) + 
  tm_dots(col = "blue")

den_cafe <- st_join(output_sf_proj, tracts_cbd) %>% #  only 74 tracts have any cafe
  filter(is.na(GEOID) == FALSE) %>% 
  group_by(GEOID) %>%
  summarize(
    n = n(), 
    sqmi = mean(sqmi)
    ) %>%
  mutate(
    den_cafe = n/sqmi
    ) %>%
  dplyr::select(GEOID, den_cafe)

Step 4 Collect an alternative measure.

# read in job counts in GA with the lehdr package 
jobs_ga <- grab_lodes(
  state = "GA", year = 2015, lodes_type = "wac", job_type = "JT00", 
  segment = "S000", state_part = "main", agg_geo = "tract", 
  download_dir = paste0(getwd(), "/Week10/lodes"))

retail_ga <- jobs_ga %>%
  dplyr::select(GEOID = w_tract, retail = CNS07)

# compute retail density at the census tract level within the CBD buffer
den_retail <- tracts_cbd %>%
  left_join(retail_ga) %>%
  mutate(
    den_retail = retail/sqmi
  ) %>%
  dplyr::select(GEOID, den_retail) 

Step 5 Compute the number of college graduates aged 25-34 as of 2017

var_acs17 <- load_variables(2017, "acs5", cache = TRUE)
View(var_acs17)

# census_api_key("3b1d8912e33aa2d4c01bf1abc84729cfeb7cd6cd", install = TRUE, overwrite = TRUE)
# readRenviron("~/.Renviron")
# Sys.getenv("CENSUS_API_KEY")

# B15001_017: Male!!25 to 34 years!!Bachelor's degree
# B15001_018: Male!!25 to 34 years!!Graduate or professional degree
# B15001_058: Female!!25 to 34 years!!Bachelor's degree
# B15001_059: Female!!25 to 34 years!!Graduate or professional degree
young_ga <- get_acs(state = "GA", geography = "tract", year = 2017, 
        variables = c("B15001_017", "B15001_018", "B15001_058", "B15001_059"), 
        output = "wide") 

young_ga2 <- young_ga %>%
  mutate(young = B15001_017E + B15001_018E + B15001_058E + B15001_059E) %>%
  dplyr::select(GEOID, young)

young_cbd <- tracts_cbd %>%
  left_join(young_ga2) %>%
  mutate(
    sum_young = sum(young, na.rm = TRUE), 
    pct_young = young/sum_young*100 
  ) %>%
  dplyr::select(GEOID, pct_young) 

tmap_mode("view")
tm_shape(young_cbd) +
  tm_polygons(col = "pct_young", alpha = 0.5) + 
  tm_shape(output_sf) +
  tm_dots(col = "red", size = 0.03)

Step 6 Create plots to compare the performance of two measures.

den_cafe_df <- den_cafe 
st_geometry(den_cafe_df) <- NULL

den_retail_df<- den_retail
st_geometry(den_retail_df) <- NULL

young_cbd_df <- young_cbd
st_geometry(young_cbd_df) <- NULL

data <- left_join(den_retail_df, den_cafe_df) %>%
  mutate(
    den_retail = ifelse(is.na(den_retail), 0, den_retail), 
    ln_retail  = log(den_retail + 1), 
    den_cafe   = ifelse(is.na(den_cafe), 0, den_cafe), 
    ln_cafe    = log(den_cafe + 1)
    ) %>%
  left_join(young_cbd_df)

data$ln_retail <- scale(data$ln_retail)
data$ln_cafe <- scale(data$ln_cafe)

# https://stackoverflow.com/questions/7549694/adding-regression-line-equation-and-r2-on-graph
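# NOTE: stat_smooth_func() below is not part of ggplot2; it is a user-defined stat
# shared in the Stack Overflow thread above. Source its definition before running
# the plots, e.g. via the gist linked from that thread (the gist ID here is an
# assumption; check the thread):
# devtools::source_gist("524eade46135f6348140", filename = "ggplot_smooth_func.R")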
lm_retail <- ggplot(data) + 
  stat_smooth_func(aes(ln_retail, pct_young), geom="text",method="lm",hjust=0,parse=TRUE) +
  geom_smooth(aes(ln_retail, pct_young), method="lm",se=FALSE, color = "coral1", size = 0.1) +
  geom_point(aes(ln_retail, pct_young), color = "coral4") 

lm_cafe <- ggplot(data) + 
  stat_smooth_func(aes(ln_cafe, pct_young), geom="text",method="lm",hjust=0,parse=TRUE) +
  geom_smooth(aes(ln_cafe, pct_young), method="lm",se=FALSE, color = "deepskyblue1", size = 0.1) +
  geom_point(aes(ln_cafe, pct_young), color = "deepskyblue4")


ggarrange(lm_retail, lm_cafe, 
          ncol = 2, nrow = 1)

Step 7 With the same workflow, extract cafes for the inner cities of the Boston Urban Area, MA, and the San Francisco Urban Area, CA.

  • This is your job! Submit your script that does the following tasks (no skeleton code is given here, but you can refer to the complete code above).
  • Limit the inner city as those census tracts within 5 miles from the CBD.
  • Above, the CBD of the Atlanta UA was the centroid of the census tract with the highest job density based on the 2015 LEHD (I extracted the total # of jobs, or C000, from the WAC table at the census tract level; the code used for this task is not included here, but a hedged sketch follows below). Follow the same approach when you define five-mile buffers for Boston and San Francisco.
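For the CBD-finding step described in the last bullet, a hedged sketch for Boston (not the exact code used above; EPSG:26986, NAD83 / Massachusetts Mainland, is an assumed projected crs, and San Francisco works the same way with an appropriate crs):

# sketch: centroid of the census tract with the highest 2015 job density (Boston)
jobs_ma <- grab_lodes(state = "MA", year = 2015, lodes_type = "wac",
                      job_type = "JT00", segment = "S000",
                      state_part = "main", agg_geo = "tract")
cbd_bos <- tracts(state = "MA", cb = FALSE, year = 2016) %>%
  st_as_sf() %>%
  st_transform(26986) %>% # assumed projected crs for MA
  mutate(sqmi = as.numeric(ALAND)/1000000*0.386102159) %>%
  left_join(jobs_ma %>% dplyr::select(GEOID = w_tract, jobs = C000), by = "GEOID") %>%
  mutate(den_jobs = jobs/sqmi) %>%
  filter(den_jobs == max(den_jobs, na.rm = TRUE)) %>%
  st_centroid()
cbd_bf_bos <- st_buffer(cbd_bos, dist = 8046.72) # 5 miles, as above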