Step1: Install Packages

Packages Description
tidycensus Allows users to interact with the U.S. Census APIs and returns tidyverse-ready data frames, with the option to include simple element geometry.
sf Introducing simple features objects in R
tmap A specialized geographic mapping toolkit in the R
jsonlite Specialized for converting, reading and saving json file
tidyverse The core functions are ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats, which provide functions for modeling, transforming, and visualizing data
httr For HTTP requests in R
reshape2 Convert data between wide-format and long-format
here Quickly get the path to the target file
yelpr An R library for the Yelp Fusion API
knitr Quick report generation
tigris Allow users to download and use the TIGER / Line shapefile() directly from the U.S. Census Bureau
# install.packages("tidycensus")
# install.packages("sf")
# install.packages("tmap")

# install.packages("jsonlite")
# install.packages("tidyverse")
# install.packages("httr")
# install.packages("reshape2")
# install.packages("here")

# install.packages("yelpr") Error: package ‘yelpr’ is not available for this version of R
# install.packages("devtools")
# devtools::install_github("OmaymaS/yelpr")

# install.packages("knitr")
# install.packages("tigris")

library(tidycensus)
library(sf)
library(tmap)

library(jsonlite)
library(tidyverse)
library(httr)
library(jsonlite)
library(reshape2)
library(here)
library(yelpr)
library(knitr)
library(tigris)

Step2: Getting Census Tract Boundary

Before calling the boundary data, it needs to be authenticated with the census_api. Since the code content is going to be public on Rpubs, store the key as an environment variable and call it with the Sys.getenv() function.

tidycensus::census_api_key(Sys.getenv("census_api"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.

In my code, the area selected is District of Columbia in 2020.But later we will see ## Retrieving data for the year 2021. this is because the administrative boundary data is changing all the time and is updated every 5 years. Here it is updated for the year 2021.

#### Tract polygons for the Yelp query
tract <- suppressMessages(
  get_acs(geography = "tract", # or "block group", "county", "state" etc. 
          state = "District of Columbia", 
          county = c("District of Columbia"), 
          variables = c(hhincome = 'B19019_001',
                        race.tot = "B02001_001", 
                        race.white = "B02001_002", 
                        race.black = 'B02001_003'
          ),
          year = 2020,
          survey = "acs5", # American Community Survey 5-year estimate
          geometry = TRUE, # returns sf objects
          output = "wide") # wide vs. long  
          # Wide version's each variable have its own column but long version does not.
)

washington <- places('District of Columbia') %>% 
  filter(NAME == 'Washington') 
## Retrieving data for the year 2021
tract <- tract[washington,]
# View the data
message(sprintf("nrow: %s, ncol: %s", nrow(tract), ncol(tract)))
## nrow: 206, ncol: 11
# kable() This function is for neatly displaying tables on HTML document.
tract %>% head() %>% knitr::kable() #head(tract)
GEOID NAME hhincomeE hhincomeM race.totE race.totM race.whiteE race.whiteM race.blackE race.blackM geometry
11001001002 Census Tract 10.02, District of Columbia, District of Columbia 140772 20766 3489 630 2710 596 211 111 POLYGON ((-77.08563 38.9382…
11001002801 Census Tract 28.01, District of Columbia, District of Columbia 62323 22218 4104 607 1727 412 1104 370 POLYGON ((-77.03645 38.9349…
11001003100 Census Tract 31, District of Columbia, District of Columbia 133408 18433 3825 606 1820 452 987 243 POLYGON ((-77.02826 38.9318…
11001004001 Census Tract 40.01, District of Columbia, District of Columbia 153650 5886 4785 822 4126 850 309 260 POLYGON ((-77.05018 38.9212…
11001010500 Census Tract 105, District of Columbia, District of Columbia 92083 20106 3927 704 2283 528 1164 516 POLYGON ((-77.01756 38.8850…
11001004600 Census Tract 46, District of Columbia, District of Columbia 129896 18312 3169 551 1517 418 1384 403 POLYGON ((-77.01811 38.9146…

In the data list, there are two variables for a data type, the one ending in E is the actual value and the one ending in M is the associated parameter error. For simplicity, only the value ending in E is retained.

# Retaining only those I want.
# Notice that select function can also change names when it selects columns.
tract <- tract %>% 
  select(GEOID, 
         hhincome = hhincomeE, # New name = old name
         race.tot = race.totE,
         race.white = race.whiteE,
         race.black = race.blackE)

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(tract) + tm_borders() + tm_shape(washington) + tm_borders(col = 'red')

Step3: Preparing Census Tract Boundary for Yelp API

Because yelp does range searches in points and radii, and it provides up to 1,000 return values at a time, we break the target range into smaller portions for split batch searches using census tract boundaries: create a minimum outer circle for each census tract.

# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
  #---------------------
  # Takes: a single POLYGON or LINESTRTING
  # Outputs: distance between the centroid of the boundingbox and a corner of the bounding box
  #---------------------
  
  # Get bounding box of a given polygon
  bb <- st_bbox(poly)
  # Get lat & long coordinates of any one corner of the bounding box.
  bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
  # Get centroid of the bb
  bb_center_x <- (bb[3]+bb[1])/2
  bb_center_y <- (bb[4]+bb[2])/2
  bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
  
  # Get the distance between bb_p and c
  r <- st_distance(bb_corner, bb_center)
  # Multiply 1.1 to make the circle a bit larger than the Census Tract.
  # See the Yelp explanation of their radius parameter to see why we do this.
  bb_center$radius <- r*1.2
  return(bb_center)
}

## Using a loop -----------------------------------------------------------------
# Creating an empty vector of NA. 
# Results will fill this vector
epsg_id <- 4326

r4all_loop <- vector("list", nrow(tract))

# Starting a for-loop

for (i in 1:nrow(tract)){
  r4all_loop[[i]] <- tract %>% 
    st_transform(crs = epsg_id) %>% 
    st_geometry() %>% 
    .[[i]] %>% 
    get_r(epsg_id = epsg_id)
}

r4all_loop <- bind_rows(r4all_loop)


# Using a functional -----------------------------------------------------------
# We use a functional (sapply) to apply this custom function to each Census Tract.
r4all_apply <- tract %>%
  st_geometry() %>% 
  st_transform(crs = epsg_id) %>% 
  lapply(., function(x) get_r(x, epsg_id = epsg_id))

r4all_apply <- bind_rows(r4all_apply)

# Are these two identical?
identical(r4all_apply, r4all_loop)
## [1] TRUE
# Appending X Y coordinates as seprate columns
ready_4_yelp <- r4all_apply %>% 
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2])

Selecting the first ten regions for visualization.

tmap_mode('view')
## tmap mode set to interactive viewing
# Select the first 10 rows
ready_4_yelp[1:10,] %>% 
  # Draw a buffer centered at the centroid of Tract polygons.
  # Radius of the buffer is the radius we just calculated using loop
  st_buffer(., dist = .$radius) %>% 
  # Display this buffer in red
  tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
  # Display the original polygon in blue
  tm_shape(tract[1:10,]) + tm_borders(col= 'blue')

Step4: Yelp API for accessing census tracts

Here is an example of calling the business_search() function with yelpr.

which_tract <- 1
test <- business_search(api_key = Sys.getenv('yelp_api'), # like we did for census, store your api key
                        categories = 'hospitals',
                        latitude = ready_4_yelp$y[which_tract],
                        longitude = ready_4_yelp$x[which_tract],
                        offset = 0, # 1st page, 1st obs
                        radius = round(ready_4_yelp$radius[which_tract]), # radius requires integer value
                        limit = 50) # how many business per page

names(test)

# Business
paste0("is it a data.frame?: ", is.data.frame(test$businesses), ", ",
       " how many rows?: ", nrow(test$businesses), ", ",
       " how many columns?: ", ncol(test$businesses))

Next, creating a function that accesses the Yelp API for a census tract.

# FUNCTION
get_yelp <- function(tract, category){
  # ----------------------------------
  # Gets one row of tract information (1,) and category name (str),
  # Outputs a list of business data.frame
  Sys.sleep(1)
  n <- 1
  # First request --------------------------------------------------------------
  resp <- business_search(api_key = Sys.getenv("yelp_api"), 
                          categories = category, 
                          latitude = tract$y, 
                          longitude = tract$x, 
                          offset = (n - 1) * 50, # = 0 when n = 1
                          radius = round(tract$radius), 
                          limit = 50)
  # Calculate how many requests are needed in total
  required_n <- ceiling(resp$total/50)
  
  # out is where the results will be appended to.
  out <- vector("list", required_n)
  
  # Store the business information to nth slot in out
  out[[n]] <- resp$businesses
  
  # Change the name of the elements to the total required_n
  # This is to know if there are more than 1000 businesses,
  # we know how many.
  names(out)[n] <- required_n
  
  # Throw error if more than 500
  if (resp$total >= 500)
  {
    # glue formats string by inserting {n} with what's currently stored in object n.
    print(glue::glue("{n}th row has >= 1000 businesses."))
    # Stop before going into the loop because we need to
    # break down Census Tract to something smaller.
    return(out)
  } 
  else 
  {
    # add 1 to n
    n <- n + 1
    
    # Now we know required_n -----------------------------------------------------
    # Starting a loop
    while(n <= required_n){
      resp <- business_search(api_key = Sys.getenv("yelp_api"), 
                              categories = category, 
                              latitude = tract$y, 
                              longitude = tract$x, 
                              offset = (n - 1) * 50, 
                              radius = round(tract$radius), 
                              limit = 50)
      
      out[[n]] <- resp$businesses
      
      n <- n + 1
    } #<< end of while loop
    
    # Merge all elements in the list into a single data frame
    out <- out %>% bind_rows()
    
    return(out)
  }
}

Apply the function for the first Census Tract.

# Apply the function for the first Census Tract
yelp_first_tract <- get_yelp(ready_4_yelp[1,], "hospitals") %>% 
  as_tibble()
## No encoding supplied: defaulting to UTF-8.
# Print
yelp_first_tract %>% print
## # A tibble: 2 × 15
##   id        alias name  image_url is_closed url   review_count categories rating
##   <chr>     <chr> <chr> <chr>     <lgl>     <chr>        <int> <list>      <dbl>
## 1 _RaDtIOF… meds… MedS… "https:/… FALSE     http…           20 <df>          1.5
## 2 YmDNjmiY… geor… Geor… ""        FALSE     http…            1 <df>          3  
## # ℹ 6 more variables: coordinates <df[,2]>, transactions <list>,
## #   location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>

Apply the function for each Census Tract.

# Prepare a collector
yelp_all_list <- vector("list", nrow(ready_4_yelp))

# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_all_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "hospitals"))
  if (row %% 10 == 0){
    print(paste0("Current row: ", row))
  }
}
## [1] "Current row: 10"
## [1] "Current row: 20"
## [1] "Current row: 30"
## [1] "Current row: 40"
## [1] "Current row: 50"
## [1] "Current row: 60"
## [1] "Current row: 70"
## [1] "Current row: 80"
## [1] "Current row: 90"
## [1] "Current row: 100"
## [1] "Current row: 110"
## [1] "Current row: 120"
## [1] "Current row: 130"
## [1] "Current row: 140"
## [1] "Current row: 150"
## [1] "Current row: 160"
## [1] "Current row: 170"
## [1] "Current row: 180"
## [1] "Current row: 190"
## [1] "Current row: 200"
# Collapsing the list into a data.frame
yelp_all <- yelp_all_list %>% bind_rows() %>% as_tibble()

# print
yelp_all %>% print(width=1000)
## # A tibble: 166 × 15
##    id                    
##    <chr>                 
##  1 _RaDtIOFuyYCUHLKe59YfA
##  2 YmDNjmiYY2iWTZD-NkD1gQ
##  3 AVKphCn6rkDKRrnaBQnGGg
##  4 UGEDcMyuV2aH52RKfMIWeg
##  5 KmkgdnIOuKzw6DZ-j0pJxQ
##  6 AQyKAawtsnSkuCYidRs7tw
##  7 _T_yuBcKg2I5y2WIpACFzA
##  8 VLqdfhwTWkX-u4UjvBZtXA
##  9 a4SoZrsTqgMGIuL7-6wTTw
## 10 AVKphCn6rkDKRrnaBQnGGg
##    alias                                                   
##    <chr>                                                   
##  1 medstar-georgetown-pediatrics-washington-2              
##  2 georgetown-university-hospital-washington-2             
##  3 georgetown-university-hospital-washington-4             
##  4 doctors-groover-christie-and-merritt-washington         
##  5 kaiser-health-plan-of-the-mid-atlantic-states-washington
##  6 columbia-hospital-for-women-medical-center-washington   
##  7 bridgepoint-capitol-hill-washington-2                   
##  8 capitol-hill-nursing-home-washington                    
##  9 premier-surgery-center-of-washington-dc-washington      
## 10 georgetown-university-hospital-washington-4             
##    name                                         
##    <chr>                                        
##  1 MedStar Georgetown Pediatrics                
##  2 Georgetown University Hospital               
##  3 Georgetown University Hospital               
##  4 Doctors Groover Christie & Merritt           
##  5 Kaiser Health Plan of the MID-Atlantic States
##  6 Columbia Hospital For Women Medical Center   
##  7 BridgePoint - Capitol Hill                   
##  8 Capitol Hill Nursing Home                    
##  9 Premier Surgery Center of Washington DC      
## 10 Georgetown University Hospital               
##    image_url                                                             
##    <chr>                                                                 
##  1 "https://s3-media3.fl.yelpcdn.com/bphoto/LELlyZqWQAMVqEd6JOF57Q/o.jpg"
##  2 ""                                                                    
##  3 ""                                                                    
##  4 ""                                                                    
##  5 "https://s3-media3.fl.yelpcdn.com/bphoto/_WEVQTve5o4VXw6Mb6Ii0Q/o.jpg"
##  6 ""                                                                    
##  7 "https://s3-media1.fl.yelpcdn.com/bphoto/t8h0FiUGWBxfHxAnahJT8Q/o.jpg"
##  8 ""                                                                    
##  9 ""                                                                    
## 10 ""                                                                    
##    is_closed
##    <lgl>    
##  1 FALSE    
##  2 FALSE    
##  3 FALSE    
##  4 FALSE    
##  5 FALSE    
##  6 FALSE    
##  7 FALSE    
##  8 FALSE    
##  9 FALSE    
## 10 FALSE    
##    url                                                                          
##    <chr>                                                                        
##  1 https://www.yelp.com/biz/medstar-georgetown-pediatrics-washington-2?adjust_c…
##  2 https://www.yelp.com/biz/georgetown-university-hospital-washington-2?adjust_…
##  3 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
##  4 https://www.yelp.com/biz/doctors-groover-christie-and-merritt-washington?adj…
##  5 https://www.yelp.com/biz/kaiser-health-plan-of-the-mid-atlantic-states-washi…
##  6 https://www.yelp.com/biz/columbia-hospital-for-women-medical-center-washingt…
##  7 https://www.yelp.com/biz/bridgepoint-capitol-hill-washington-2?adjust_creati…
##  8 https://www.yelp.com/biz/capitol-hill-nursing-home-washington?adjust_creativ…
##  9 https://www.yelp.com/biz/premier-surgery-center-of-washington-dc-washington?…
## 10 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
##    review_count categories   rating coordinates$latitude $longitude transactions
##           <int> <list>        <dbl>                <dbl>      <dbl> <list>      
##  1           20 <df [2 × 2]>    1.5                 38.9      -77.1 <list [0]>  
##  2            1 <df [2 × 2]>    3                   38.9      -77.1 <list [0]>  
##  3            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##  4            1 <df [1 × 2]>    2                   38.9      -77.0 <list [0]>  
##  5            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##  6            3 <df [1 × 2]>    5                   38.9      -77.1 <list [0]>  
##  7           12 <df [1 × 2]>    2.5                 38.9      -77.0 <list [0]>  
##  8            1 <df [1 × 2]>    5                   38.9      -77.0 <list [0]>  
##  9            1 <df [1 × 2]>    5                   39.0      -77.0 <list [0]>  
## 10            1 <df [1 × 2]>    4                   38.9      -77.0 <list [0]>  
##    location$address1         $address2 $address3           $city         
##    <chr>                     <chr>     <chr>               <chr>         
##  1 "4200 Wisconsin Ave NW"   "Fl 4"    ""                  Washington, DC
##  2 "3301 New Mexico Ave NW"  ""        ""                  Washington, DC
##  3 ""                        ""        ""                  Washington, DC
##  4 ""                        ""        ""                  Washington, DC
##  5 "700 Second St NE"        ""        "Fl 6-  Nephrology" Washington, DC
##  6 "2440 M St NW"            "Ste 224" ""                  Washington, DC
##  7 "223 7th St NE"           ""        ""                  Washington, DC
##  8 "700 Constitution Ave NE" ""        ""                  Washington, DC
##  9 "6323 Georgia Ave NW"     ""        ""                  Washington, DC
## 10 ""                        ""        ""                  Washington, DC
##    $zip_code $country $state $display_address phone        display_phone 
##    <chr>     <chr>    <chr>  <list>           <chr>        <chr>         
##  1 20016     US       DC     <chr [3]>        +12022433400 (202) 243-3400
##  2 20016     US       DC     <chr [2]>        +12026861316 (202) 686-1316
##  3 20001     US       DC     <chr [1]>        +12024442000 (202) 444-2000
##  4 20001     US       DC     <chr [1]>        +12025374781 (202) 537-4781
##  5 20002     US       DC     <chr [3]>        +12023463550 (202) 346-3550
##  6 20037     US       DC     <chr [3]>        +12026590240 (202) 659-0240
##  7 20002     US       DC     <chr [2]>        +12025465700 (202) 546-5700
##  8 20002     US       DC     <chr [2]>        +12025464800 (202) 546-4800
##  9 20011     US       DC     <chr [2]>        +12022910126 (202) 291-0126
## 10 20001     US       DC     <chr [1]>        +12024442000 (202) 444-2000
##    distance
##       <dbl>
##  1    448. 
##  2    846. 
##  3    524. 
##  4    524. 
##  5   1709. 
##  6    440. 
##  7    292. 
##  8    277. 
##  9    394. 
## 10     39.8
## # ℹ 156 more rows

Replace ‘hospitals’ with ‘Medical Centers’ and repeat the data acquisition steps

# Apply the function for the first Census Tract
yelp_first_tract_2 <- get_yelp(ready_4_yelp[1,], "medcenters") %>% 
  as_tibble()

# Prepare a collector
yelp_all_list_2 <- vector("list", nrow(ready_4_yelp))

# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_all_list_2[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "medcenters"))
  if (row %% 10 == 0){
    print(paste0("Current row: ", row))
  }
}
# Collapsing the list into a data.frame
yelp_all_2 <- yelp_all_list_2 %>% bind_rows() %>% as_tibble()

# print
yelp_all_2 %>% print(width=1000)

Step5: Map yelp information in data frame format

Hospitals

# Extract coordinates
yelp_sf <- yelp_all %>% 
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$latitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Map
tm_shape(yelp_sf) +
  tm_dots(col = "review_count", style="quantile")

Medical Centers

# Extract coordinates
yelp_sf_2 <- yelp_all_2 %>% 
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$ltitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)

# Map
tm_shape(yelp_sf_2) +
  tm_dots(col = "review_count", style="quantile")

Step6: Overview

  1. What’s the county and state of your choice? State : District of Columbia County: District of Columbia

  2. How many businesses are there in total? Total:

  3. How many businesses are there for each business category? Hospitals: Medical Centers:

  4. Upon visual inspection, can you see any noticeable spatial patterns to the way they are distributed across the county (e.g., clustering of businesses at some parts of the county)?

  5. Are there any other interesting findings?