Mini-Assignment1_Yifan Wang

Step1: Install Packages

Packages	Description
tidycensus	Allows users to interact with the U.S. Census APIs and returns tidyverse-ready data frames, with the option to include simple element geometry.
sf	Introducing simple features objects in R
tmap	A specialized geographic mapping toolkit in the R
jsonlite	Specialized for converting, reading and saving json file
tidyverse	The core functions are ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats, which provide functions for modeling, transforming, and visualizing data
httr	For HTTP requests in R
reshape2	Convert data between wide-format and long-format
here	Quickly get the path to the target file
yelpr	An R library for the Yelp Fusion API
knitr	Quick report generation
tigris	Allow users to download and use the TIGER / Line shapefile() directly from the U.S. Census Bureau

# install.packages("tidycensus")
# install.packages("sf")
# install.packages("tmap")

# install.packages("jsonlite")
# install.packages("tidyverse")
# install.packages("httr")
# install.packages("reshape2")
# install.packages("here")

# install.packages("yelpr") Error: package ‘yelpr’ is not available for this version of R
# install.packages("devtools")
# devtools::install_github("OmaymaS/yelpr")

# install.packages("knitr")
# install.packages("tigris")

library(tidycensus)
library(sf)
library(tmap)

library(jsonlite)
library(tidyverse)
library(httr)
library(jsonlite)
library(reshape2)
library(here)
library(yelpr)
library(knitr)
library(tigris)

Step2: Getting Census Tract Boundary

Before calling the boundary data, it needs to be authenticated with the census_api. Since the code content is going to be public on Rpubs, store the key as an environment variable and call it with the Sys.getenv() function.

tidycensus::census_api_key(Sys.getenv("census_api"))

## To install your API key for use in future sessions, run this function with `install = TRUE`.

In my code, the area selected is District of Columbia in 2020.But later we will see ## Retrieving data for the year 2021. this is because the administrative boundary data is changing all the time and is updated every 5 years. Here it is updated for the year 2021.

#### Tract polygons for the Yelp query
tract <- suppressMessages(
  get_acs(geography = "tract", # or "block group", "county", "state" etc. 
          state = "District of Columbia", 
          county = c("District of Columbia"), 
          variables = c(hhincome = 'B19019_001',
                        race.tot = "B02001_001", 
                        race.white = "B02001_002", 
                        race.black = 'B02001_003'
          ),
          year = 2020,
          survey = "acs5", # American Community Survey 5-year estimate
          geometry = TRUE, # returns sf objects
          output = "wide") # wide vs. long  
          # Wide version's each variable have its own column but long version does not.
)

washington <- places('District of Columbia') %>% 
  filter(NAME == 'Washington')

## Retrieving data for the year 2021

tract <- tract[washington,]

# View the data
message(sprintf("nrow: %s, ncol: %s", nrow(tract), ncol(tract)))

## nrow: 206, ncol: 11

# kable() This function is for neatly displaying tables on HTML document.
tract %>% head() %>% knitr::kable() #head(tract)

GEOID	NAME	hhincomeE	hhincomeM	race.totE	race.totM	race.whiteE	race.whiteM	race.blackE	race.blackM	geometry
11001001002	Census Tract 10.02, District of Columbia, District of Columbia	140772	20766	3489	630	2710	596	211	111	POLYGON ((-77.08563 38.9382…
11001002801	Census Tract 28.01, District of Columbia, District of Columbia	62323	22218	4104	607	1727	412	1104	370	POLYGON ((-77.03645 38.9349…
11001003100	Census Tract 31, District of Columbia, District of Columbia	133408	18433	3825	606	1820	452	987	243	POLYGON ((-77.02826 38.9318…
11001004001	Census Tract 40.01, District of Columbia, District of Columbia	153650	5886	4785	822	4126	850	309	260	POLYGON ((-77.05018 38.9212…
11001010500	Census Tract 105, District of Columbia, District of Columbia	92083	20106	3927	704	2283	528	1164	516	POLYGON ((-77.01756 38.8850…
11001004600	Census Tract 46, District of Columbia, District of Columbia	129896	18312	3169	551	1517	418	1384	403	POLYGON ((-77.01811 38.9146…

In the data list, there are two variables for a data type, the one ending in E is the actual value and the one ending in M is the associated parameter error. For simplicity, only the value ending in E is retained.

# Retaining only those I want.
# Notice that select function can also change names when it selects columns.
tract <- tract %>% 
  select(GEOID, 
         hhincome = hhincomeE, # New name = old name
         race.tot = race.totE,
         race.white = race.whiteE,
         race.black = race.blackE)

tmap_mode("view")

## tmap mode set to interactive viewing

tm_shape(tract) + tm_borders() + tm_shape(washington) + tm_borders(col = 'red')

Step3: Preparing Census Tract Boundary for Yelp API

Because yelp does range searches in points and radii, and it provides up to 1,000 return values at a time, we break the target range into smaller portions for split batch searches using census tract boundaries: create a minimum outer circle for each census tract.

# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
  #---------------------
  # Takes: a single POLYGON or LINESTRTING
  # Outputs: distance between the centroid of the boundingbox and a corner of the bounding box
  #---------------------
  
  # Get bounding box of a given polygon
  bb <- st_bbox(poly)
  # Get lat & long coordinates of any one corner of the bounding box.
  bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
  # Get centroid of the bb
  bb_center_x <- (bb[3]+bb[1])/2
  bb_center_y <- (bb[4]+bb[2])/2
  bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
  
  # Get the distance between bb_p and c
  r <- st_distance(bb_corner, bb_center)
  # Multiply 1.1 to make the circle a bit larger than the Census Tract.
  # See the Yelp explanation of their radius parameter to see why we do this.
  bb_center$radius <- r*1.2
  return(bb_center)
}

## Using a loop -----------------------------------------------------------------
# Creating an empty vector of NA. 
# Results will fill this vector
epsg_id <- 4326

r4all_loop <- vector("list", nrow(tract))

# Starting a for-loop

for (i in 1:nrow(tract)){
  r4all_loop[[i]] <- tract %>% 
    st_transform(crs = epsg_id) %>% 
    st_geometry() %>% 
    .[[i]] %>% 
    get_r(epsg_id = epsg_id)
}

r4all_loop <- bind_rows(r4all_loop)


# Using a functional -----------------------------------------------------------
# We use a functional (sapply) to apply this custom function to each Census Tract.
r4all_apply <- tract %>%
  st_geometry() %>% 
  st_transform(crs = epsg_id) %>% 
  lapply(., function(x) get_r(x, epsg_id = epsg_id))

r4all_apply <- bind_rows(r4all_apply)

# Are these two identical?
identical(r4all_apply, r4all_loop)

## [1] TRUE

# Appending X Y coordinates as seprate columns
ready_4_yelp <- r4all_apply %>% 
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2])

Selecting the first ten regions for visualization.

tmap_mode('view')

## tmap mode set to interactive viewing

# Select the first 10 rows
ready_4_yelp[1:10,] %>% 
  # Draw a buffer centered at the centroid of Tract polygons.
  # Radius of the buffer is the radius we just calculated using loop
  st_buffer(., dist = .$radius) %>% 
  # Display this buffer in red
  tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
  # Display the original polygon in blue
  tm_shape(tract[1:10,]) + tm_borders(col= 'blue')

Step4: Yelp API for accessing census tracts

Here is an example of calling the business_search() function with yelpr.

which_tract <- 1
test <- business_search(api_key = Sys.getenv('yelp_api'), # like we did for census, store your api key
                        categories = 'hospitals',
                        latitude = ready_4_yelp$y[which_tract],
                        longitude = ready_4_yelp$x[which_tract],
                        offset = 0, # 1st page, 1st obs
                        radius = round(ready_4_yelp$radius[which_tract]), # radius requires integer value
                        limit = 50) # how many business per page

names(test)

# Business
paste0("is it a data.frame?: ", is.data.frame(test$businesses), ", ",
       " how many rows?: ", nrow(test$businesses), ", ",
       " how many columns?: ", ncol(test$businesses))

Next, creating a function that accesses the Yelp API for a census tract.

# FUNCTION
get_yelp <- function(tract, category){
  # ----------------------------------
  # Gets one row of tract information (1,) and category name (str),
  # Outputs a list of business data.frame
  Sys.sleep(1)
  n <- 1
  # First request --------------------------------------------------------------
  resp <- business_search(api_key = Sys.getenv("yelp_api"), 
                          categories = category, 
                          latitude = tract$y, 
                          longitude = tract$x, 
                          offset = (n - 1) * 50, # = 0 when n = 1
                          radius = round(tract$radius), 
                          limit = 50)
  # Calculate how many requests are needed in total
  required_n <- ceiling(resp$total/50)
  
  # out is where the results will be appended to.
  out <- vector("list", required_n)
  
  # Store the business information to nth slot in out
  out[[n]] <- resp$businesses
  
  # Change the name of the elements to the total required_n
  # This is to know if there are more than 1000 businesses,
  # we know how many.
  names(out)[n] <- required_n
  
  # Throw error if more than 500
  if (resp$total >= 500)
  {
    # glue formats string by inserting {n} with what's currently stored in object n.
    print(glue::glue("{n}th row has >= 1000 businesses."))
    # Stop before going into the loop because we need to
    # break down Census Tract to something smaller.
    return(out)
  } 
  else 
  {
    # add 1 to n
    n <- n + 1
    
    # Now we know required_n -----------------------------------------------------
    # Starting a loop
    while(n <= required_n){
      resp <- business_search(api_key = Sys.getenv("yelp_api"), 
                              categories = category, 
                              latitude = tract$y, 
                              longitude = tract$x, 
                              offset = (n - 1) * 50, 
                              radius = round(tract$radius), 
                              limit = 50)
      
      out[[n]] <- resp$businesses
      
      n <- n + 1
    } #<< end of while loop
    
    # Merge all elements in the list into a single data frame
    out <- out %>% bind_rows()
    
    return(out)
  }
}

Apply the function for the first Census Tract.

# Apply the function for the first Census Tract
yelp_first_tract <- get_yelp(ready_4_yelp[1,], "hospitals") %>% 
  as_tibble()

## No encoding supplied: defaulting to UTF-8.

# Print
yelp_first_tract %>% print

## # A tibble: 2 × 15
##   id        alias name  image_url is_closed url   review_count categories rating
##   <chr>     <chr> <chr> <chr>     <lgl>     <chr>        <int> <list>      <dbl>
## 1 _RaDtIOF… meds… MedS… "https:/… FALSE     http…           20 <df>          1.5
## 2 YmDNjmiY… geor… Geor… ""        FALSE     http…            1 <df>          3  
## # ℹ 6 more variables: coordinates <df[,2]>, transactions <list>,
## #   location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>

Apply the function for each Census Tract.

# Prepare a collector
yelp_all_list <- vector("list", nrow(ready_4_yelp))

# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_all_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "hospitals"))
  if (row %% 10 == 0){
    print(paste0("Current row: ", row))
  }
}

## [1] "Current row: 10"
## [1] "Current row: 20"
## [1] "Current row: 30"
## [1] "Current row: 40"
## [1] "Current row: 50"
## [1] "Current row: 60"
## [1] "Current row: 70"
## [1] "Current row: 80"
## [1] "Current row: 90"
## [1] "Current row: 100"
## [1] "Current row: 110"
## [1] "Current row: 120"
## [1] "Current row: 130"
## [1] "Current row: 140"
## [1] "Current row: 150"
## [1] "Current row: 160"
## [1] "Current row: 170"
## [1] "Current row: 180"
## [1] "Current row: 190"
## [1] "Current row: 200"

# Collapsing the list into a data.frame
yelp_all <- yelp_all_list %>% bind_rows() %>% as_tibble()

# print
yelp_all %>% print(width=1000)

## # A tibble: 166 × 15
##    id                    
##    <chr>                 
##  1 _RaDtIOFuyYCUHLKe59YfA
##  2 YmDNjmiYY2iWTZD-NkD1gQ
##  3 AVKphCn6rkDKRrnaBQnGGg
##  4 UGEDcMyuV2aH52RKfMIWeg
##  5 KmkgdnIOuKzw6DZ-j0pJxQ
##  6 AQyKAawtsnSkuCYidRs7tw
##  7 _T_yuBcKg2I5y2WIpACFzA
##  8 VLqdfhwTWkX-u4UjvBZtXA
##  9 a4SoZrsTqgMGIuL7-6wTTw
## 10 AVKphCn6rkDKRrnaBQnGGg
##    alias                                                   
##    <chr>                                                   
##  1 medstar-georgetown-pediatrics-washington-2              
##  2 georgetown-university-hospital-washington-2             
##  3 georgetown-university-hospital-washington-4             
##  4 doctors-groover-christie-and-merritt-washington         
##  5 kaiser-health-plan-of-the-mid-atlantic-states-washington
##  6 columbia-hospital-for-women-medical-center-washington   
##  7 bridgepoint-capitol-hill-washington-2                   
##  8 capitol-hill-nursing-home-washington                    
##  9 premier-surgery-center-of-washington-dc-washington      
## 10 georgetown-university-hospital-washington-4             
##    name                                         
##    <chr>                                        
##  1 MedStar Georgetown Pediatrics                
##  2 Georgetown University Hospital               
##  3 Georgetown University Hospital               
##  4 Doctors Groover Christie & Merritt           
##  5 Kaiser Health Plan of the MID-Atlantic States
##  6 Columbia Hospital For Women Medical Center   
##  7 BridgePoint - Capitol Hill                   
##  8 Capitol Hill Nursing Home                    
##  9 Premier Surgery Center of Washington DC      
## 10 Georgetown University Hospital               
##    image_url                                                             
##    <chr>                                                                 
##  1 "https://s3-media3.fl.yelpcdn.com/bphoto/LELlyZqWQAMVqEd6JOF57Q/o.jpg"
##  2 ""                                                                    
##  3 ""                                                                    
##  4 ""                                                                    
##  5 "https://s3-media3.fl.yelpcdn.com/bphoto/_WEVQTve5o4VXw6Mb6Ii0Q/o.jpg"
##  6 ""                                                                    
##  7 "https://s3-media1.fl.yelpcdn.com/bphoto/t8h0FiUGWBxfHxAnahJT8Q/o.jpg"
##  8 ""                                                                    
##  9 ""                                                                    
## 10 ""                                                                    
##    is_closed
##    <lgl>    
##  1 FALSE    
##  2 FALSE    
##  3 FALSE    
##  4 FALSE    
##  5 FALSE    
##  6 FALSE    
##  7 FALSE    
##  8 FALSE    
##  9 FALSE    
## 10 FALSE    
##    url                                                                          
##    <chr>                                                                        
##  1 https://www.yelp.com/biz/medstar-georgetown-pediatrics-washington-2?adjust_c…
##  2 https://www.yelp.com/biz/georgetown-university-hospital-washington-2?adjust_…
##  3 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
##  4 https://www.yelp.com/biz/doctors-groover-christie-and-merritt-washington?adj…
##  5 https://www.yelp.com/biz/kaiser-health-plan-of-the-mid-atlantic-states-washi…
##  6 https://www.yelp.com/biz/columbia-hospital-for-women-medical-center-washingt…
##  7 https://www.yelp.com/biz/bridgepoint-capitol-hill-washington-2?adjust_creati…
##  8 https://www.yelp.com/biz/capitol-hill-nursing-home-washington?adjust_creativ…
##  9 https://www.yelp.com/biz/premier-surgery-center-of-washington-dc-washington?…
## 10 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
##    review_count categories   rating coordinates$latitude $longitude transactions
##           <int> <list>        <dbl>                <dbl>      <dbl> <list>      
##  1           20 <df [2 × 2]>    1.5                 38.9      -77.1 <list [0]>  
##  2            1 <df [2 × 2]>    3                   38.9      -77.1 <list [0]>  
##  3            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##  4            1 <df [1 × 2]>    2                   38.9      -77.0 <list [0]>  
##  5            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##  6            3 <df [1 × 2]>    5                   38.9      -77.1 <list [0]>  
##  7           12 <df [1 × 2]>    2.5                 38.9      -77.0 <list [0]>  
##  8            1 <df [1 × 2]>    5                   38.9      -77.0 <list [0]>  
##  9            1 <df [1 × 2]>    5                   39.0      -77.0 <list [0]>  
## 10            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##    location$address1         $address2 $address3           $city         
##    <chr>                     <chr>     <chr>               <chr>         
##  1 "4200 Wisconsin Ave NW"   "Fl 4"    ""                  Washington, DC
##  2 "3301 New Mexico Ave NW"  ""        ""                  Washington, DC
##  3 ""                        ""        ""                  Washington, DC
##  4 ""                        ""        ""                  Washington, DC
##  5 "700 Second St NE"        ""        "Fl 6-  Nephrology" Washington, DC
##  6 "2440 M St NW"            "Ste 224" ""                  Washington, DC
##  7 "223 7th St NE"           ""        ""                  Washington, DC
##  8 "700 Constitution Ave NE" ""        ""                  Washington, DC
##  9 "6323 Georgia Ave NW"     ""        ""                  Washington, DC
## 10 ""                        ""        ""                  Washington, DC
##    $zip_code $country $state $display_address phone        display_phone 
##    <chr>     <chr>    <chr>  <list>           <chr>        <chr>         
##  1 20016     US       DC     <chr [3]>        +12022433400 (202) 243-3400
##  2 20016     US       DC     <chr [2]>        +12026861316 (202) 686-1316
##  3 20001     US       DC     <chr [1]>        +12024442000 (202) 444-2000
##  4 20001     US       DC     <chr [1]>        +12025374781 (202) 537-4781
##  5 20002     US       DC     <chr [3]>        +12023463550 (202) 346-3550
##  6 20037     US       DC     <chr [3]>        +12026590240 (202) 659-0240
##  7 20002     US       DC     <chr [2]>        +12025465700 (202) 546-5700
##  8 20002     US       DC     <chr [2]>        +12025464800 (202) 546-4800
##  9 20011     US       DC     <chr [2]>        +12022910126 (202) 291-0126
## 10 20001     US       DC     <chr [1]>        +12024442000 (202) 444-2000
##    distance
##       <dbl>
##  1    448. 
##  2    846. 
##  3    524. 
##  4    524. 
##  5   1709. 
##  6    440. 
##  7    292. 
##  8    277. 
##  9    394. 
## 10     39.8
## # ℹ 156 more rows

Replace ‘hospitals’ with ‘Medical Centers’ and repeat the data acquisition steps

# Apply the function for the first Census Tract
yelp_first_tract_2 <- get_yelp(ready_4_yelp[1,], "medcenters") %>% 
  as_tibble()

# Prepare a collector
yelp_all_list_2 <- vector("list", nrow(ready_4_yelp))

# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_all_list_2[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "medcenters"))
  if (row %% 10 == 0){
    print(paste0("Current row: ", row))
  }
}

# Collapsing the list into a data.frame
yelp_all_2 <- yelp_all_list_2 %>% bind_rows() %>% as_tibble()

# print
yelp_all_2 %>% print(width=1000)

Step5: Map yelp information in data frame format

Hospitals

# Extract coordinates
yelp_sf <- yelp_all %>% 
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$latitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Map
tm_shape(yelp_sf) +
  tm_dots(col = "review_count", style="quantile")

Medical Centers

# Extract coordinates
yelp_sf_2 <- yelp_all_2 %>% 
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$ltitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Map
tm_shape(yelp_sf_2) +
  tm_dots(col = "review_count", style="quantile")

Step6: Overview

What’s the county and state of your choice? State : District of Columbia County: District of Columbia
How many businesses are there in total? Total:
How many businesses are there for each business category? Hospitals: Medical Centers:
Upon visual inspection, can you see any noticeable spatial patterns to the way they are distributed across the county (e.g., clustering of businesses at some parts of the county)?
Are there any other interesting findings?