| Packages | Description |
|---|---|
| tidycensus | Allows users to interact with the U.S. Census APIs and returns tidyverse-ready data frames, with the option to include simple element geometry. |
| sf | Introducing simple features objects in R |
| tmap | A specialized geographic mapping toolkit in the R |
| jsonlite | Specialized for converting, reading and saving json file |
| tidyverse | The core functions are ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats, which provide functions for modeling, transforming, and visualizing data |
| httr | For HTTP requests in R |
| reshape2 | Convert data between wide-format and long-format |
| here | Quickly get the path to the target file |
| yelpr | An R library for the Yelp Fusion API |
| knitr | Quick report generation |
| tigris | Allow users to download and use the TIGER / Line shapefile() directly from the U.S. Census Bureau |
# install.packages("tidycensus")
# install.packages("sf")
# install.packages("tmap")
# install.packages("jsonlite")
# install.packages("tidyverse")
# install.packages("httr")
# install.packages("reshape2")
# install.packages("here")
# install.packages("yelpr") Error: package ‘yelpr’ is not available for this version of R
# install.packages("devtools")
# devtools::install_github("OmaymaS/yelpr")
# install.packages("knitr")
# install.packages("tigris")
library(tidycensus)
library(sf)
library(tmap)
library(jsonlite)
library(tidyverse)
library(httr)
library(jsonlite)
library(reshape2)
library(here)
library(yelpr)
library(knitr)
library(tigris)
Before calling the boundary data, it needs to be authenticated with the census_api. Since the code content is going to be public on Rpubs, store the key as an environment variable and call it with the Sys.getenv() function.
tidycensus::census_api_key(Sys.getenv("census_api"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
In my code, the area selected is District of Columbia in 2020.But later we will see ## Retrieving data for the year 2021. this is because the administrative boundary data is changing all the time and is updated every 5 years. Here it is updated for the year 2021.
#### Tract polygons for the Yelp query
tract <- suppressMessages(
get_acs(geography = "tract", # or "block group", "county", "state" etc.
state = "District of Columbia",
county = c("District of Columbia"),
variables = c(hhincome = 'B19019_001',
race.tot = "B02001_001",
race.white = "B02001_002",
race.black = 'B02001_003'
),
year = 2020,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
# Wide version's each variable have its own column but long version does not.
)
washington <- places('District of Columbia') %>%
filter(NAME == 'Washington')
## Retrieving data for the year 2021
tract <- tract[washington,]
# View the data
message(sprintf("nrow: %s, ncol: %s", nrow(tract), ncol(tract)))
## nrow: 206, ncol: 11
# kable() This function is for neatly displaying tables on HTML document.
tract %>% head() %>% knitr::kable() #head(tract)
| GEOID | NAME | hhincomeE | hhincomeM | race.totE | race.totM | race.whiteE | race.whiteM | race.blackE | race.blackM | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 11001001002 | Census Tract 10.02, District of Columbia, District of Columbia | 140772 | 20766 | 3489 | 630 | 2710 | 596 | 211 | 111 | POLYGON ((-77.08563 38.9382… |
| 11001002801 | Census Tract 28.01, District of Columbia, District of Columbia | 62323 | 22218 | 4104 | 607 | 1727 | 412 | 1104 | 370 | POLYGON ((-77.03645 38.9349… |
| 11001003100 | Census Tract 31, District of Columbia, District of Columbia | 133408 | 18433 | 3825 | 606 | 1820 | 452 | 987 | 243 | POLYGON ((-77.02826 38.9318… |
| 11001004001 | Census Tract 40.01, District of Columbia, District of Columbia | 153650 | 5886 | 4785 | 822 | 4126 | 850 | 309 | 260 | POLYGON ((-77.05018 38.9212… |
| 11001010500 | Census Tract 105, District of Columbia, District of Columbia | 92083 | 20106 | 3927 | 704 | 2283 | 528 | 1164 | 516 | POLYGON ((-77.01756 38.8850… |
| 11001004600 | Census Tract 46, District of Columbia, District of Columbia | 129896 | 18312 | 3169 | 551 | 1517 | 418 | 1384 | 403 | POLYGON ((-77.01811 38.9146… |
In the data list, there are two variables for a data type, the one ending in E is the actual value and the one ending in M is the associated parameter error. For simplicity, only the value ending in E is retained.
# Retaining only those I want.
# Notice that select function can also change names when it selects columns.
tract <- tract %>%
select(GEOID,
hhincome = hhincomeE, # New name = old name
race.tot = race.totE,
race.white = race.whiteE,
race.black = race.blackE)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(tract) + tm_borders() + tm_shape(washington) + tm_borders(col = 'red')
Because yelp does range searches in points and radii, and it provides up to 1,000 return values at a time, we break the target range into smaller portions for split batch searches using census tract boundaries: create a minimum outer circle for each census tract.
# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
#---------------------
# Takes: a single POLYGON or LINESTRTING
# Outputs: distance between the centroid of the boundingbox and a corner of the bounding box
#---------------------
# Get bounding box of a given polygon
bb <- st_bbox(poly)
# Get lat & long coordinates of any one corner of the bounding box.
bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
# Get centroid of the bb
bb_center_x <- (bb[3]+bb[1])/2
bb_center_y <- (bb[4]+bb[2])/2
bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
# Get the distance between bb_p and c
r <- st_distance(bb_corner, bb_center)
# Multiply 1.1 to make the circle a bit larger than the Census Tract.
# See the Yelp explanation of their radius parameter to see why we do this.
bb_center$radius <- r*1.2
return(bb_center)
}
## Using a loop -----------------------------------------------------------------
# Creating an empty vector of NA.
# Results will fill this vector
epsg_id <- 4326
r4all_loop <- vector("list", nrow(tract))
# Starting a for-loop
for (i in 1:nrow(tract)){
r4all_loop[[i]] <- tract %>%
st_transform(crs = epsg_id) %>%
st_geometry() %>%
.[[i]] %>%
get_r(epsg_id = epsg_id)
}
r4all_loop <- bind_rows(r4all_loop)
# Using a functional -----------------------------------------------------------
# We use a functional (sapply) to apply this custom function to each Census Tract.
r4all_apply <- tract %>%
st_geometry() %>%
st_transform(crs = epsg_id) %>%
lapply(., function(x) get_r(x, epsg_id = epsg_id))
r4all_apply <- bind_rows(r4all_apply)
# Are these two identical?
identical(r4all_apply, r4all_loop)
## [1] TRUE
# Appending X Y coordinates as seprate columns
ready_4_yelp <- r4all_apply %>%
mutate(x = st_coordinates(.)[,1],
y = st_coordinates(.)[,2])
Selecting the first ten regions for visualization.
tmap_mode('view')
## tmap mode set to interactive viewing
# Select the first 10 rows
ready_4_yelp[1:10,] %>%
# Draw a buffer centered at the centroid of Tract polygons.
# Radius of the buffer is the radius we just calculated using loop
st_buffer(., dist = .$radius) %>%
# Display this buffer in red
tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
# Display the original polygon in blue
tm_shape(tract[1:10,]) + tm_borders(col= 'blue')
Here is an example of calling the business_search() function with yelpr.
which_tract <- 1
test <- business_search(api_key = Sys.getenv('yelp_api'), # like we did for census, store your api key
categories = 'hospitals',
latitude = ready_4_yelp$y[which_tract],
longitude = ready_4_yelp$x[which_tract],
offset = 0, # 1st page, 1st obs
radius = round(ready_4_yelp$radius[which_tract]), # radius requires integer value
limit = 50) # how many business per page
names(test)
# Business
paste0("is it a data.frame?: ", is.data.frame(test$businesses), ", ",
" how many rows?: ", nrow(test$businesses), ", ",
" how many columns?: ", ncol(test$businesses))
Next, creating a function that accesses the Yelp API for a census tract.
# FUNCTION
get_yelp <- function(tract, category){
# ----------------------------------
# Gets one row of tract information (1,) and category name (str),
# Outputs a list of business data.frame
Sys.sleep(1)
n <- 1
# First request --------------------------------------------------------------
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50, # = 0 when n = 1
radius = round(tract$radius),
limit = 50)
# Calculate how many requests are needed in total
required_n <- ceiling(resp$total/50)
# out is where the results will be appended to.
out <- vector("list", required_n)
# Store the business information to nth slot in out
out[[n]] <- resp$businesses
# Change the name of the elements to the total required_n
# This is to know if there are more than 1000 businesses,
# we know how many.
names(out)[n] <- required_n
# Throw error if more than 500
if (resp$total >= 500)
{
# glue formats string by inserting {n} with what's currently stored in object n.
print(glue::glue("{n}th row has >= 1000 businesses."))
# Stop before going into the loop because we need to
# break down Census Tract to something smaller.
return(out)
}
else
{
# add 1 to n
n <- n + 1
# Now we know required_n -----------------------------------------------------
# Starting a loop
while(n <= required_n){
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50,
radius = round(tract$radius),
limit = 50)
out[[n]] <- resp$businesses
n <- n + 1
} #<< end of while loop
# Merge all elements in the list into a single data frame
out <- out %>% bind_rows()
return(out)
}
}
Apply the function for the first Census Tract.
# Apply the function for the first Census Tract
yelp_first_tract <- get_yelp(ready_4_yelp[1,], "hospitals") %>%
as_tibble()
## No encoding supplied: defaulting to UTF-8.
# Print
yelp_first_tract %>% print
## # A tibble: 2 × 15
## id alias name image_url is_closed url review_count categories rating
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl>
## 1 _RaDtIOF… meds… MedS… "https:/… FALSE http… 20 <df> 1.5
## 2 YmDNjmiY… geor… Geor… "" FALSE http… 1 <df> 3
## # ℹ 6 more variables: coordinates <df[,2]>, transactions <list>,
## # location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>
Apply the function for each Census Tract.
# Prepare a collector
yelp_all_list <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
yelp_all_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "hospitals"))
if (row %% 10 == 0){
print(paste0("Current row: ", row))
}
}
## [1] "Current row: 10"
## [1] "Current row: 20"
## [1] "Current row: 30"
## [1] "Current row: 40"
## [1] "Current row: 50"
## [1] "Current row: 60"
## [1] "Current row: 70"
## [1] "Current row: 80"
## [1] "Current row: 90"
## [1] "Current row: 100"
## [1] "Current row: 110"
## [1] "Current row: 120"
## [1] "Current row: 130"
## [1] "Current row: 140"
## [1] "Current row: 150"
## [1] "Current row: 160"
## [1] "Current row: 170"
## [1] "Current row: 180"
## [1] "Current row: 190"
## [1] "Current row: 200"
# Collapsing the list into a data.frame
yelp_all <- yelp_all_list %>% bind_rows() %>% as_tibble()
# print
yelp_all %>% print(width=1000)
## # A tibble: 166 × 15
## id
## <chr>
## 1 _RaDtIOFuyYCUHLKe59YfA
## 2 YmDNjmiYY2iWTZD-NkD1gQ
## 3 AVKphCn6rkDKRrnaBQnGGg
## 4 UGEDcMyuV2aH52RKfMIWeg
## 5 KmkgdnIOuKzw6DZ-j0pJxQ
## 6 AQyKAawtsnSkuCYidRs7tw
## 7 _T_yuBcKg2I5y2WIpACFzA
## 8 VLqdfhwTWkX-u4UjvBZtXA
## 9 a4SoZrsTqgMGIuL7-6wTTw
## 10 AVKphCn6rkDKRrnaBQnGGg
## alias
## <chr>
## 1 medstar-georgetown-pediatrics-washington-2
## 2 georgetown-university-hospital-washington-2
## 3 georgetown-university-hospital-washington-4
## 4 doctors-groover-christie-and-merritt-washington
## 5 kaiser-health-plan-of-the-mid-atlantic-states-washington
## 6 columbia-hospital-for-women-medical-center-washington
## 7 bridgepoint-capitol-hill-washington-2
## 8 capitol-hill-nursing-home-washington
## 9 premier-surgery-center-of-washington-dc-washington
## 10 georgetown-university-hospital-washington-4
## name
## <chr>
## 1 MedStar Georgetown Pediatrics
## 2 Georgetown University Hospital
## 3 Georgetown University Hospital
## 4 Doctors Groover Christie & Merritt
## 5 Kaiser Health Plan of the MID-Atlantic States
## 6 Columbia Hospital For Women Medical Center
## 7 BridgePoint - Capitol Hill
## 8 Capitol Hill Nursing Home
## 9 Premier Surgery Center of Washington DC
## 10 Georgetown University Hospital
## image_url
## <chr>
## 1 "https://s3-media3.fl.yelpcdn.com/bphoto/LELlyZqWQAMVqEd6JOF57Q/o.jpg"
## 2 ""
## 3 ""
## 4 ""
## 5 "https://s3-media3.fl.yelpcdn.com/bphoto/_WEVQTve5o4VXw6Mb6Ii0Q/o.jpg"
## 6 ""
## 7 "https://s3-media1.fl.yelpcdn.com/bphoto/t8h0FiUGWBxfHxAnahJT8Q/o.jpg"
## 8 ""
## 9 ""
## 10 ""
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/medstar-georgetown-pediatrics-washington-2?adjust_c…
## 2 https://www.yelp.com/biz/georgetown-university-hospital-washington-2?adjust_…
## 3 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
## 4 https://www.yelp.com/biz/doctors-groover-christie-and-merritt-washington?adj…
## 5 https://www.yelp.com/biz/kaiser-health-plan-of-the-mid-atlantic-states-washi…
## 6 https://www.yelp.com/biz/columbia-hospital-for-women-medical-center-washingt…
## 7 https://www.yelp.com/biz/bridgepoint-capitol-hill-washington-2?adjust_creati…
## 8 https://www.yelp.com/biz/capitol-hill-nursing-home-washington?adjust_creativ…
## 9 https://www.yelp.com/biz/premier-surgery-center-of-washington-dc-washington?…
## 10 https://www.yelp.com/biz/georgetown-university-hospital-washington-4?adjust_…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 20 <df [2 × 2]> 1.5 38.9 -77.1 <list [0]>
## 2 1 <df [2 × 2]> 3 38.9 -77.1 <list [0]>
## 3 1 <df [1 × 2]> 4 38.9 -77.0 <list [0]>
## 4 1 <df [1 × 2]> 2 38.9 -77.0 <list [0]>
## 5 1 <df [1 × 2]> 4 38.9 -77.0 <list [0]>
## 6 3 <df [1 × 2]> 5 38.9 -77.1 <list [0]>
## 7 12 <df [1 × 2]> 2.5 38.9 -77.0 <list [0]>
## 8 1 <df [1 × 2]> 5 38.9 -77.0 <list [0]>
## 9 1 <df [1 × 2]> 5 39.0 -77.0 <list [0]>
## 10 1 <df [1 × 2]> 4 38.9 -77.0 <list [0]>
## location$address1 $address2 $address3 $city
## <chr> <chr> <chr> <chr>
## 1 "4200 Wisconsin Ave NW" "Fl 4" "" Washington, DC
## 2 "3301 New Mexico Ave NW" "" "" Washington, DC
## 3 "" "" "" Washington, DC
## 4 "" "" "" Washington, DC
## 5 "700 Second St NE" "" "Fl 6- Nephrology" Washington, DC
## 6 "2440 M St NW" "Ste 224" "" Washington, DC
## 7 "223 7th St NE" "" "" Washington, DC
## 8 "700 Constitution Ave NE" "" "" Washington, DC
## 9 "6323 Georgia Ave NW" "" "" Washington, DC
## 10 "" "" "" Washington, DC
## $zip_code $country $state $display_address phone display_phone
## <chr> <chr> <chr> <list> <chr> <chr>
## 1 20016 US DC <chr [3]> +12022433400 (202) 243-3400
## 2 20016 US DC <chr [2]> +12026861316 (202) 686-1316
## 3 20001 US DC <chr [1]> +12024442000 (202) 444-2000
## 4 20001 US DC <chr [1]> +12025374781 (202) 537-4781
## 5 20002 US DC <chr [3]> +12023463550 (202) 346-3550
## 6 20037 US DC <chr [3]> +12026590240 (202) 659-0240
## 7 20002 US DC <chr [2]> +12025465700 (202) 546-5700
## 8 20002 US DC <chr [2]> +12025464800 (202) 546-4800
## 9 20011 US DC <chr [2]> +12022910126 (202) 291-0126
## 10 20001 US DC <chr [1]> +12024442000 (202) 444-2000
## distance
## <dbl>
## 1 448.
## 2 846.
## 3 524.
## 4 524.
## 5 1709.
## 6 440.
## 7 292.
## 8 277.
## 9 394.
## 10 39.8
## # ℹ 156 more rows
Replace ‘hospitals’ with ‘Medical Centers’ and repeat the data acquisition steps
# Apply the function for the first Census Tract
yelp_first_tract_2 <- get_yelp(ready_4_yelp[1,], "medcenters") %>%
as_tibble()
# Prepare a collector
yelp_all_list_2 <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
yelp_all_list_2[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "medcenters"))
if (row %% 10 == 0){
print(paste0("Current row: ", row))
}
}
# Collapsing the list into a data.frame
yelp_all_2 <- yelp_all_list_2 %>% bind_rows() %>% as_tibble()
# print
yelp_all_2 %>% print(width=1000)
Hospitals
# Extract coordinates
yelp_sf <- yelp_all %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# Map
tm_shape(yelp_sf) +
tm_dots(col = "review_count", style="quantile")
Medical Centers
# Extract coordinates
yelp_sf_2 <- yelp_all_2 %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$ltitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# Map
tm_shape(yelp_sf_2) +
tm_dots(col = "review_count", style="quantile")
What’s the county and state of your choice? State : District of Columbia County: District of Columbia
How many businesses are there in total? Total:
How many businesses are there for each business category? Hospitals: Medical Centers:
Upon visual inspection, can you see any noticeable spatial patterns to the way they are distributed across the county (e.g., clustering of businesses at some parts of the county)?
Are there any other interesting findings?