Introduction
This R Markdown document provides an R script for retrieving business points of interest (POIs) from the Yelp API, focusing on the restaurant and shopping categories. It uses Census Tract geographic data to determine the coordinates passed to the Yelp API.
This document is divided into 4 parts:
- Getting Census Tract and City Boundary
- Preparation for Yelp API calling
- Getting Business POIs from Yelp API
- Data Cleaning and Visualization
Import libraries
library(tidycensus)
library(sf)
library(tmap)
library(jsonlite)
library(tidyverse)
library(httr)
library(reshape2)
library(here)
library(yelpr)
library(knitr)
1. Census Tract and City Boundary
In this study, Valdosta, GA is the city of interest.
Select City
city <- "Valdosta"
state <- "GA"
county <- "Lowndes"
Load Census Tracts of Lowndes County, GA using the Census API
# Activate census api key
tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
tract <- suppressMessages(
get_acs(geography = "tract", # or "block group", "county", "state" etc.
state = state,
county = c(county),
variables = c(hhincome = 'B19019_001'),
year = 2022,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
)
Get the polygon of Valdosta, GA from tigris
city_polygon <- tigris::places(state) %>%
filter(NAME == city)
## Retrieving data for the year 2022
Filter Census Tract for Valdosta, GA
census_tract <- tract[city_polygon, ]
# View the number of rows and columns of the census tract
message(sprintf("nrow: %s, ncol: %s", nrow(census_tract), ncol(census_tract)))
## nrow: 25, ncol: 5
Adjusting table for nice visualization
census_tract %>% head() %>% knitr::kable()
GEOID | NAME | hhincomeE | hhincomeM | geometry |
---|---|---|---|---|
13185011100 | Census Tract 111; Lowndes County; Georgia | 38413 | 22958 | MULTIPOLYGON (((-83.30259 3… |
13185010404 | Census Tract 104.04; Lowndes County; Georgia | 40216 | 7321 | MULTIPOLYGON (((-83.28802 3… |
13185010601 | Census Tract 106.01; Lowndes County; Georgia | 33234 | 8684 | MULTIPOLYGON (((-83.29609 3… |
13185011200 | Census Tract 112; Lowndes County; Georgia | 58635 | 12487 | MULTIPOLYGON (((-83.33315 3… |
13185010900 | Census Tract 109; Lowndes County; Georgia | 31974 | 11686 | MULTIPOLYGON (((-83.32097 3… |
13185010403 | Census Tract 104.03; Lowndes County; Georgia | 37402 | 31935 | MULTIPOLYGON (((-83.26592 3… |
# Select only GEOID and hhincomeE
census_tract <- census_tract %>%
select(GEOID,
hhincome = hhincomeE)
Display Valdosta City Polygon and Census Tract in the City
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(census_tract) + tm_borders(lwd = 2) +
tm_shape(city_polygon) + tm_polygons(col = '#ffa500', alpha = 0.4)
From the map, we can see that the city overlaps only small parts of the outer census tracts. Therefore, when retrieving POIs based on these census tracts, we will get many points outside the city.
2. Preparation for Yelp API calling
Define function to calculate radius of the census tract
get_r <- function(poly, epsg_id){
#---------------------
# Takes: a single POLYGON or LINESTRING
# Outputs: distance between the center of the bounding box and a corner of the bounding box
#---------------------
# Get bounding box of a given polygon
bb <- st_bbox(poly)
# Get lat & long coordinates of any one corner of the bounding box.
bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
# Get centroid of the bb
bb_center_x <- (bb[3]+bb[1])/2
bb_center_y <- (bb[4]+bb[2])/2
bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
# Get the distance between bb_corner and bb_center
r <- st_distance(bb_corner, bb_center)
# Multiply 1.1 to make the circle a bit larger than the Census Tract.
# See the Yelp explanation of their radius parameter to see why we do this.
bb_center$radius <- r*1.1
return(bb_center)
}
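As a quick check of the geometry behind `get_r`, the sketch below applies the same bounding-box logic to a made-up square of about 0.02 degrees per side near Valdosta. The square and the names `toy_poly`, `half_diag`, and `search_radius` are illustrative assumptions and are not used elsewhere in this document.

```r
library(sf)

# Hypothetical square, roughly 0.02 degrees on each side (illustration only)
toy_poly <- st_polygon(list(rbind(
  c(-83.30, 30.82), c(-83.28, 30.82),
  c(-83.28, 30.84), c(-83.30, 30.84),
  c(-83.30, 30.82)
)))

bb <- st_bbox(toy_poly)

# Lower-left corner and center of the bounding box, both in EPSG:4326
corner <- st_sfc(st_point(c(bb[["xmin"]], bb[["ymin"]])), crs = 4326)
center <- st_sfc(st_point(c((bb[["xmin"]] + bb[["xmax"]]) / 2,
                            (bb[["ymin"]] + bb[["ymax"]]) / 2)), crs = 4326)

# Half-diagonal of the bounding box (in meters for a geographic CRS),
# then the same 10% margin that get_r applies
half_diag <- st_distance(corner, center)
search_radius <- half_diag * 1.1
search_radius
```

Because the circle is drawn around the bounding box rather than the polygon itself, it always covers the whole tract, at the cost of also covering some area outside it.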
Calculate Radius of Each Tract using lapply
# Specify CRS
epsg_id <- 4326
tract_radius <- census_tract %>%
st_geometry() %>%
st_transform(crs = epsg_id) %>%
lapply(., function(x) get_r(x, epsg_id = epsg_id))
tract_radius <- bind_rows(tract_radius)
Preparing Coordinates of Census Tract for calling YELP API
tract_4_yelp <- tract_radius %>%
mutate(x = st_coordinates(.)[,1],
y = st_coordinates(.)[,2])
# Visualize the coverages
tmap_mode('view')
## tmap mode set to interactive viewing
tract_radius %>%
# Draw a buffer centered at the center of each tract's bounding box.
st_buffer(., dist = .$radius) %>%
# Display this buffer in green
tm_shape(.) + tm_polygons(alpha = 0.5, col = '#50cc68') +
# Display the original tract boundaries in dark blue
tm_shape(census_tract) + tm_borders(col= '#112a5c')
3. Getting Business POIs from Yelp API
Function for Iterative calling YELP API
# YELP API Function
get_yelp <- function(tract, category){
# ----------------------------------
# Takes one row of tract information (1,) and a category name (str).
# Returns a data frame of businesses, or a list holding the first page of results if total >= 1000.
Sys.sleep(1)
n <- 1
# First request --------------------------------------------------------------
resp <- business_search(api_key = Sys.getenv("YELP_API_KEY"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50, # = 0 when n = 1
radius = round(tract$radius),
limit = 50)
# Calculate how many requests are needed in total
required_n <- ceiling(resp$total/50)
# out is where the results will be appended to.
out <- vector("list", required_n)
# Store the business information to nth slot in out
out[[n]] <- resp$businesses
# Change the name of the elements to the total required_n
# This is to know if there are more than 1000 businesses,
# we know how many.
names(out)[n] <- required_n
# Print a notice and return early if there are 1000 or more businesses
if (resp$total >= 1000)
{
# glue inserts the values of tract$x and tract$y into the string.
print(glue::glue("Tract centered at ({tract$x}, {tract$y}) has >= 1000 businesses."))
# Stop before going into the loop because we need to
# break down Census Tract to something smaller.
return(out)
}
else
{
# add 1 to n
n <- n + 1
# Now we know required_n -----------------------------------------------------
# Starting a loop
while(n <= required_n){
resp <- business_search(api_key = Sys.getenv("YELP_API_KEY"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50,
radius = round(tract$radius),
limit = 50)
out[[n]] <- resp$businesses
n <- n + 1
} #<< end of while loop
# Merge all elements in the list into a single data frame
out <- out %>% bind_rows()
return(out)
}
}
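The paging arithmetic in `get_yelp` can be checked without calling the API. The sketch below assumes a hypothetical total of 120 matches and the page size of 50 used above, and lists the offsets that the first request and the `while` loop would send. The names `total`, `page_size`, and `offsets` are illustrative only.

```r
# Hypothetical number of matches reported by the first response
total <- 120
page_size <- 50

# Number of requests needed to page through all matches
required_n <- ceiling(total / page_size)

# Offset sent with each request: 0 for the first page, then 50, 100, ...
offsets <- (seq_len(required_n) - 1) * page_size
offsets
```

With these assumed numbers, three requests are needed, and the last one returns only the remaining 20 businesses.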
Iteratively Call Yelp API for Each Census Tract and Each POI Category (Restaurants, Shopping)
# Prepare a collector
restaurant_all_list <- vector("list", nrow(tract_4_yelp))
shopping_all_list <- vector("list", nrow(tract_4_yelp))
poi_cat = c('restaurant', 'shopping')
for (row in 1:nrow(tract_4_yelp)){
restaurant_all_list[[row]] <- suppressMessages(get_yelp(tract_4_yelp[row,], poi_cat[1]))
shopping_all_list[[row]] <- suppressMessages(get_yelp(tract_4_yelp[row,], poi_cat[2]))
print(paste0("Current row: ", row))
}
Save POI Lists
# (For later use to prevent calling API during knitting)
saveRDS(restaurant_all_list, file="restaurant_yelp_valdosta_ga.rds")
saveRDS(shopping_all_list, file="shopping_yelp_valdosta_ga.rds")
Load POI Lists from File
restaurant_all_list <- readRDS("restaurant_yelp_valdosta_ga.rds")
shopping_all_list <- readRDS("shopping_yelp_valdosta_ga.rds")
Create DataFrames from lists
restaurant_poi <- restaurant_all_list %>%
bind_rows() %>%
mutate(main_category = "restaurant") #create a new column to specify main_category
shopping_poi <- shopping_all_list %>%
bind_rows() %>%
mutate(main_category = "shopping")
Merge two dataframes
all_poi <- bind_rows(restaurant_poi, shopping_poi) %>% as_tibble()
4. Data Cleaning and Visualization
Take a look at the class of each column
sapply(all_poi, class) %>% print()
## id alias name image_url is_closed
## "character" "character" "character" "character" "logical"
## url review_count categories rating coordinates
## "character" "integer" "list" "numeric" "data.frame"
## transactions price location phone display_phone
## "list" "character" "data.frame" "character" "character"
## distance business_hours attributes main_category
## "numeric" "list" "data.frame" "character"
Clean Data and Prepare for Visualization
# Remove duplicate POIs that share the same id
all_poi_unique <- all_poi %>%
distinct(id, .keep_all = TRUE)
print(paste0("Before dropping duplicated id: ", nrow(all_poi)))
## [1] "Before dropping duplicated id: 6034"
print(paste0("After dropping duplicated id: ", nrow(all_poi_unique)))
## [1] "After dropping duplicated id: 1010"
# Drop records without coordinate information
all_poi_nona <- all_poi_unique %>%
filter(!is.na(coordinates$longitude))
print(paste0("Before dropping na: ", nrow(all_poi_unique)))
## [1] "Before dropping na: 1010"
print(paste0("After dropping na: ", nrow(all_poi_nona)))
## [1] "After dropping na: 1010"
# Extract Coordinates
poi_sf <- all_poi_unique %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
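Duplicates are expected here because the search circles of neighboring tracts overlap, so the same business can be returned by more than one request. The sketch below replays the cleaning steps on a small made-up table that mimics the nested `coordinates` data frame column of the Yelp results. The table `toy_poi` and its values are hypothetical.

```r
library(dplyr)
library(sf)

# Hypothetical results: business "a" is returned by two overlapping searches,
# and business "c" has no coordinates
toy_poi <- tibble(
  id = c("a", "b", "a", "c"),
  name = c("Cafe A", "Shop B", "Cafe A", "Shop C"),
  # A named data frame argument becomes a nested data frame column
  coordinates = data.frame(
    latitude = c(30.83, 30.85, 30.83, NA),
    longitude = c(-83.28, -83.30, -83.28, NA)
  )
)

toy_sf <- toy_poi %>%
  distinct(id, .keep_all = TRUE) %>%   # keep the first row for each id
  mutate(x = coordinates$longitude,
         y = coordinates$latitude) %>%
  filter(!is.na(x), !is.na(y)) %>%     # drop rows without a location
  st_as_sf(coords = c("x", "y"), crs = 4326)

nrow(toy_sf)
```

Only businesses "a" and "b" remain: the repeated "a" is dropped by `distinct()`, and "c" is dropped because it has no coordinates.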
Remove POIs out of City Boundary
# Convert CRS of city polygon to EPSG:4326
city_polygon <- city_polygon %>% st_transform(crs=4326)
city_poi_sf <- poi_sf[city_polygon, ]
print(paste0("All Business POIs: ", nrow(poi_sf)))
## [1] "All Business POIs: 1010"
print(paste0("Business POIs in the City Boundary: ", nrow(city_poi_sf)))
## [1] "Business POIs in the City Boundary: 708"
The Yelp API returns 708 business POIs in the restaurant and shopping categories within Valdosta, GA.
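The spatial filter `poi_sf[city_polygon, ]` keeps only the points that intersect the polygon, which is why both layers must share the same CRS first. A minimal sketch with a made-up square boundary and two made-up points shows the behavior. The objects `toy_city` and `toy_pts` are hypothetical and stand in for `city_polygon` and `poi_sf`.

```r
library(sf)

# Hypothetical city boundary: a small square in EPSG:4326
toy_city <- st_sfc(st_polygon(list(rbind(
  c(-83.30, 30.82), c(-83.28, 30.82),
  c(-83.28, 30.84), c(-83.30, 30.84),
  c(-83.30, 30.82)
))), crs = 4326)

# Two hypothetical POIs: the first inside the square, the second outside
toy_pts <- st_sf(
  name = c("inside", "outside"),
  geometry = st_sfc(st_point(c(-83.29, 30.83)),
                    st_point(c(-83.25, 30.83)), crs = 4326)
)

# x[y, ] keeps the rows of x that intersect y (st_intersects by default)
toy_inside <- toy_pts[toy_city, ]
toy_inside
```

Only the point named "inside" is kept, in the same way that POIs outside the Valdosta boundary are dropped above.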
Count POIs in the City by Category
city_poi_sf %>%
st_drop_geometry() %>%
count(main_category)
## # A tibble: 2 × 2
## main_category n
## <chr> <int>
## 1 restaurant 310
## 2 shopping 398
There are 310 restaurant POIs and 398 shopping POIs in Valdosta, GA.
Visualize POIs with City Boundary
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(city_poi_sf) +
tm_dots(col = "main_category", size = "rating", palette="Set2",
alpha=0.7, scale=1, size.max=50, id="name") +
tm_shape(city_polygon) + tm_borders()
## Legend for symbol sizes not available in view mode.
The interactive map shows the locations of business POIs, where green bubbles represent restaurant businesses and orange bubbles represent shopping businesses. The size of each bubble indicates the rating that the business received from Yelp’s users. Hovering the mouse over a point displays the name of the business.
From the map above, we can see that there are 2 main clusters of business POIs. The POIs in the first cluster are located along Highway 41, which runs through the middle of the city. The second cluster, on the left side of the map, is dense near the Valdosta Mall and close to the I-75 highway.
Conclusion
This R document demonstrates how to retrieve business POIs within a specific city using the Yelp API, incorporating the US Census API to help determine the boundary of the city.
In this exploration, the city of Valdosta, which sits in Lowndes County, Georgia, is selected, and the restaurant and shopping POI categories are considered. After removing duplicates and POIs outside the city, there are 310 restaurant POIs and 398 shopping POIs in Valdosta, GA. The majority of the POIs are located along the highways.