Exploring Restaurant and Shopping Business POIs in Valdosta, GA using Yelp API

CP8883 Intro to Urban Analytics Fall 2024 - Mini Assignment 1

Thanawit Suwannikom

2024-09-19

Introduction

This R Markdown document provides an R script for retrieving business POIs (points of interest) from the Yelp API, focusing on the restaurant and shopping categories. It uses Census Tract geographic data to determine the coordinates passed to the Yelp API.

This document is divided into four parts:

  1. Getting Census Tract and City Boundary
  2. Preparation for Yelp API calling
  3. Getting Business POIs from Yelp API
  4. Data Cleaning and Visualization

Import libraries

library(tidycensus)
library(sf)
library(tmap)
library(jsonlite)
library(tidyverse)
library(httr)
library(reshape2)
library(here)
library(yelpr)
library(knitr)

1. Census Tract and City Boundary

In this study, Valdosta, GA is the city of interest.

Select City

city <- "Valdosta"
state <- "GA"
county <- "Lowndes"

Load Census Tracts of Lowndes County, GA using the Census API

# Activate census api key
tidycensus::census_api_key(Sys.getenv("CENSUS_API_KEY"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
tract <- suppressMessages(
  get_acs(geography = "tract", # or "block group", "county", "state" etc. 
          state = state,
          county = c(county), 
          variables = c(hhincome = 'B19019_001'),
          year = 2022,
          survey = "acs5", # American Community Survey 5-year estimate
          geometry = TRUE, # returns sf objects
          output = "wide") # wide vs. long
)

Get the polygon of Valdosta, GA from tigris

city_polygon <- tigris::places(state) %>% 
            filter(NAME == city)
## Retrieving data for the year 2022

Filter Census Tract for Valdosta, GA

census_tract <- tract[city_polygon, ]

# View the number of rows and columns of the census tract
message(sprintf("nrow: %s, ncol: %s", nrow(census_tract), ncol(census_tract)))
## nrow: 25, ncol: 5
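
The bracket subset `tract[city_polygon, ]` keeps every tract that intersects the city polygon, even if the overlap is small. The sketch below illustrates this on toy squares; `make_square`, `toy_tracts`, and `toy_city` are hypothetical names used only for this example, and the `op = st_within` variant is shown as one possible stricter alternative, not as the method used in this document.

```r
library(sf)

# Hypothetical helper: build a square polygon from its lower-left corner and side length.
make_square <- function(x0, y0, side) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
    c(x0, y0 + side), c(x0, y0)
  )))
}

# Three toy "tracts" laid out from left to right (no CRS, planar geometry).
toy_tracts <- st_sf(
  tract_id = c("A", "B", "C"),
  geometry = st_sfc(make_square(0, 0, 1), make_square(1, 0, 1), make_square(3, 0, 1))
)

# A toy "city" that partly overlaps tracts A and B but not C.
toy_city <- st_sf(geometry = st_sfc(make_square(0.5, 0.25, 1)))

# Default predicate (st_intersects): any overlap is enough to keep a tract.
intersecting_ids <- toy_tracts[toy_city, ]$tract_id

# Stricter predicate: keep only tracts lying entirely within the city.
within_ids <- toy_tracts[toy_city, , op = st_within]$tract_id

print(intersecting_ids)
print(within_ids)
```

With the default predicate, tracts A and B are kept although each is only partly covered by the toy city, which is the same effect that produces tracts extending beyond the Valdosta boundary.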

Format the table for display

census_tract %>% head() %>% knitr::kable()
| GEOID       | NAME                                         | hhincomeE | hhincomeM | geometry                     |
|:------------|:---------------------------------------------|----------:|----------:|:-----------------------------|
| 13185011100 | Census Tract 111; Lowndes County; Georgia    |     38413 |     22958 | MULTIPOLYGON (((-83.30259 3… |
| 13185010404 | Census Tract 104.04; Lowndes County; Georgia |     40216 |      7321 | MULTIPOLYGON (((-83.28802 3… |
| 13185010601 | Census Tract 106.01; Lowndes County; Georgia |     33234 |      8684 | MULTIPOLYGON (((-83.29609 3… |
| 13185011200 | Census Tract 112; Lowndes County; Georgia    |     58635 |     12487 | MULTIPOLYGON (((-83.33315 3… |
| 13185010900 | Census Tract 109; Lowndes County; Georgia    |     31974 |     11686 | MULTIPOLYGON (((-83.32097 3… |
| 13185010403 | Census Tract 104.03; Lowndes County; Georgia |     37402 |     31935 | MULTIPOLYGON (((-83.26592 3… |
# Select only GEOID and hhincomeE
census_tract <- census_tract %>% 
  select(GEOID, 
         hhincome = hhincomeE)

Display Valdosta City Polygon and Census Tract in the City

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(census_tract) + tm_borders(lwd = 2) + 
  tm_shape(city_polygon) + tm_polygons(col = '#ffa500', alpha = 0.4)

From the map, we can see that only small parts of the city intersect the outer census tracts. Therefore, when we retrieve POIs based on these census tracts, many of the points will fall outside the city.

2. Preparation for Yelp API calling

Define a function to calculate the search radius of a census tract

get_r <- function(poly, epsg_id){
  #---------------------
  # Takes: a single POLYGON or LINESTRING
  # Outputs: the center of the bounding box, with the distance from that
  #          center to a corner of the bounding box stored in `radius`
  #---------------------
  
  # Get bounding box of a given polygon
  bb <- st_bbox(poly)
  # Get lat & long coordinates of any one corner of the bounding box.
  bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
  # Get centroid of the bb
  bb_center_x <- (bb[3]+bb[1])/2
  bb_center_y <- (bb[4]+bb[2])/2
  bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
  
  # Get the distance between bb_corner and bb_center
  r <- st_distance(bb_corner, bb_center)
  # Multiply by 1.1 to make the circle a bit larger than the Census Tract.
  # See the Yelp explanation of their radius parameter to see why we do this.
  bb_center$radius <- r*1.1
  return(bb_center)
}
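
As a quick sanity check of the half-diagonal idea behind `get_r`, the sketch below repeats the same arithmetic on a hypothetical 3 km by 4 km rectangle. It assumes a projected CRS in metres (UTM zone 17N, EPSG:32617) so the result can be verified by hand; the document itself works in EPSG:4326, where `st_distance` returns geodesic distances instead.

```r
library(sf)

# Hypothetical 3 km x 4 km rectangle in UTM zone 17N (units: metres).
x0 <- 280000
y0 <- 3410000
rect <- st_sfc(
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 3000, y0), c(x0 + 3000, y0 + 4000),
    c(x0, y0 + 4000), c(x0, y0)
  ))),
  crs = 32617
)

# Same steps as get_r: bounding box, one corner, and the bounding-box center.
bb <- st_bbox(rect)
corner <- st_sfc(st_point(c(bb[["xmin"]], bb[["ymin"]])), crs = 32617)
center <- st_sfc(
  st_point(c((bb[["xmin"]] + bb[["xmax"]]) / 2, (bb[["ymin"]] + bb[["ymax"]]) / 2)),
  crs = 32617
)

# Half of the diagonal: sqrt(1500^2 + 2000^2) = 2500 m.
half_diagonal <- as.numeric(st_distance(corner, center))

# Enlarge by 10% so the search circle fully covers the rectangle.
search_radius <- half_diagonal * 1.1

print(c(half_diagonal = half_diagonal, search_radius = search_radius))
```

Because the circle is built on the bounding box rather than the polygon itself, it always covers the whole tract but can extend well beyond it for irregularly shaped tracts.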

Calculate Radius of Each Tract using lapply

# Specify CRS
epsg_id <- 4326

tract_radius <- census_tract %>%
  st_geometry() %>% 
  st_transform(crs = epsg_id) %>% 
  lapply(., function(x) get_r(x, epsg_id = epsg_id))

tract_radius <- bind_rows(tract_radius)

Prepare census tract coordinates for calling the Yelp API

tract_4_yelp <- tract_radius %>% 
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2])

# Visualize the coverages
tmap_mode('view')
## tmap mode set to interactive viewing
tract_radius %>% 
  # Draw a buffer centered at the bounding-box center of each tract polygon.
  st_buffer(., dist = .$radius) %>% 
  # Display this buffer in green
  tm_shape(.) + tm_polygons(alpha = 0.5, col = '#50cc68') +
  # Display the original tract borders in dark blue
  tm_shape(census_tract) + tm_borders(col= '#112a5c')
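
Before sending these radii to Yelp, it can be worth checking them against the API's upper limit. The sketch below assumes the limit is 40,000 metres, which is the value given in Yelp's business search documentation at the time of writing; `yelp_max_radius` and `cap_radius` are hypothetical names introduced here, not part of `yelpr`.

```r
# Assumed maximum search radius in metres (taken from Yelp's documentation;
# verify against the current API reference before relying on it).
yelp_max_radius <- 40000

# Hypothetical helper: round radii to whole metres and cap them at the limit.
cap_radius <- function(radius_m, max_radius = yelp_max_radius) {
  as.integer(pmin(round(as.numeric(radius_m)), max_radius))
}

# Toy radii: one small tract and one very large tract.
toy_radius <- c(1234.6, 52000)
capped <- cap_radius(toy_radius)

# Flag tracts whose circle was truncated and may need to be subdivided.
was_capped <- toy_radius > yelp_max_radius

print(capped)
print(was_capped)
```

A capped radius means the search circle no longer covers the whole tract, so such tracts would need to be split into smaller areas to avoid missing POIs.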

3. Getting Business POIs from Yelp API

Function for iteratively calling the Yelp API

# YELP API Function
get_yelp <- function(tract, category){
  # ----------------------------------
  # Takes one row of tract information and a category name (string).
  # Returns a data frame of businesses, or a list holding only the first
  # page of results if the tract has 1000 or more businesses.
  Sys.sleep(1)
  n <- 1
  # First request --------------------------------------------------------------
  resp <- business_search(api_key = Sys.getenv("YELP_API_KEY"), 
                          categories = category, 
                          latitude = tract$y, 
                          longitude = tract$x, 
                          offset = (n - 1) * 50, # = 0 when n = 1
                          radius = round(tract$radius), 
                          limit = 50)
  # Calculate how many requests are needed in total
  required_n <- ceiling(resp$total/50)
  
  # out is where the results will be appended to.
  out <- vector("list", required_n)
  
  # Store the business information to nth slot in out
  out[[n]] <- resp$businesses
  
  # Name the first element with required_n so that, if there are
  # 1000 or more businesses, we know how many requests would have been needed.
  names(out)[n] <- required_n
  
  # Print a notice and return early if there are 1000 or more businesses
  if (resp$total >= 1000)
  {
    # glue formats a string by replacing {...} with the value of the expression inside.
    print(glue::glue("This tract has {resp$total} businesses (>= 1000)."))
    # Stop before going into the loop because we need to
    # break down Census Tract to something smaller.
    return(out)
  } 
  else 
  {
    # add 1 to n
    n <- n + 1
    
    # Now we know required_n -----------------------------------------------------
    # Starting a loop
    while(n <= required_n){
      resp <- business_search(api_key = Sys.getenv("YELP_API_KEY"), 
                              categories = category, 
                              latitude = tract$y, 
                              longitude = tract$x, 
                              offset = (n - 1) * 50, 
                              radius = round(tract$radius), 
                              limit = 50)
      
      out[[n]] <- resp$businesses
      
      n <- n + 1
    } #<< end of while loop
    
    # Merge all elements in the list into a single data frame
    out <- out %>% bind_rows()
    
    return(out)
  }
}
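
The pagination logic in `get_yelp` boils down to a small piece of arithmetic: with 50 results per request, the number of requests is `ceiling(total / 50)` and the n-th request uses `offset = (n - 1) * 50`. The sketch below isolates that calculation; `yelp_offsets` is a hypothetical helper written for illustration only.

```r
# Hypothetical helper: list the offsets needed to page through `total` results.
yelp_offsets <- function(total, limit = 50) {
  n_pages <- ceiling(total / limit)
  # Page n starts at (n - 1) * limit, exactly as in get_yelp.
  (seq_len(n_pages) - 1) * limit
}

offsets_120 <- yelp_offsets(120)  # three requests: 0, 50, 100
offsets_50  <- yelp_offsets(50)   # one request: 0
offsets_0   <- yelp_offsets(0)    # no results, so no offsets

print(offsets_120)
print(offsets_50)
print(length(offsets_0))
```

Using `seq_len` means a total of zero yields an empty vector of offsets rather than an invalid sequence.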

Iteratively call the Yelp API for each census tract and each POI category (restaurants, shopping)

# Prepare a collector
restaurant_all_list <- vector("list", nrow(tract_4_yelp))
shopping_all_list <- vector("list", nrow(tract_4_yelp))

poi_cat <- c('restaurant', 'shopping')

for (row in seq_len(nrow(tract_4_yelp))){
  restaurant_all_list[[row]] <- suppressMessages(get_yelp(tract_4_yelp[row,], poi_cat[1]))
  shopping_all_list[[row]] <- suppressMessages(get_yelp(tract_4_yelp[row,], poi_cat[2]))
  print(paste0("Current row: ", row))
}
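
A loop like this makes dozens of network requests, and a single failed request stops the whole run. One possible safeguard, sketched below, is to wrap each call in `tryCatch` so that a failure yields `NULL` instead of an error. The names `safely_call` and `flaky_request` are hypothetical, and `flaky_request` only simulates an API call; it does not contact Yelp.

```r
# Hypothetical wrapper: run f(...) and return NULL (with a message) on error.
safely_call <- function(f, ...) {
  tryCatch(
    f(...),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Simulated request: fails for row 2, succeeds otherwise.
flaky_request <- function(i) {
  if (i == 2) stop("simulated API error")
  data.frame(id = paste0("biz_", i))
}

# Collect results row by row; the failed row is stored as NULL.
results <- lapply(1:3, function(i) safely_call(flaky_request, i))

# Drop the NULL entries before combining into a single data frame.
combined <- do.call(rbind, Filter(Negate(is.null), results))
print(combined)
```

The failed rows can then be identified with `which(sapply(results, is.null))` and retried later.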

Save POI Lists

# (For later use to prevent calling API during knitting)
saveRDS(restaurant_all_list, file="restaurant_yelp_valdosta_ga.rds")
saveRDS(shopping_all_list, file="shopping_yelp_valdosta_ga.rds")

Load POI Lists from File

restaurant_all_list <- readRDS("restaurant_yelp_valdosta_ga.rds")
shopping_all_list <- readRDS("shopping_yelp_valdosta_ga.rds")
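
The save-then-load steps above can also be combined into one caching pattern, so the expensive step runs only when no saved file exists. The following is a minimal sketch under that assumption; `cached` and `expensive_step` are hypothetical names, and a temporary file stands in for the `.rds` files used in this document.

```r
# Hypothetical helper: read the result from `path` if it exists,
# otherwise compute it, save it, and return it.
cached <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, file = path)
  result
}

# Stand-in for the API loop; counts how many times it actually runs.
n_calls <- 0
expensive_step <- function() {
  n_calls <<- n_calls + 1
  list(businesses = c("a", "b", "c"))
}

cache_file <- tempfile(fileext = ".rds")
first_run  <- cached(cache_file, expensive_step)  # computes and saves
second_run <- cached(cache_file, expensive_step)  # reads from disk

print(n_calls)
```

Deleting the cached file forces a fresh download the next time the document is knitted.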

Create data frames from the lists

restaurant_poi <- restaurant_all_list %>% 
                  bind_rows() %>%
                  mutate(main_category = "restaurant") #create a new column to specify main_category

shopping_poi <- shopping_all_list %>% 
                bind_rows() %>%
                mutate(main_category = "shopping")

Merge the two data frames

all_poi <- bind_rows(restaurant_poi, shopping_poi) %>% as_tibble()

4. Data Cleaning and Visualization

Check the class of each column

sapply(all_poi, class) %>% print()
##             id          alias           name      image_url      is_closed 
##    "character"    "character"    "character"    "character"      "logical" 
##            url   review_count     categories         rating    coordinates 
##    "character"      "integer"         "list"      "numeric"   "data.frame" 
##   transactions          price       location          phone  display_phone 
##         "list"    "character"   "data.frame"    "character"    "character" 
##       distance business_hours     attributes  main_category 
##      "numeric"         "list"   "data.frame"    "character"

Clean Data and Prepare for Visualization

# Remove duplicate POIs that share the same id
all_poi_unique <- all_poi %>%
                  distinct(id, .keep_all = TRUE)
print(paste0("Before dropping duplicated id: ", nrow(all_poi)))
## [1] "Before dropping duplicated id: 6034"
print(paste0("After dropping duplicated id: ", nrow(all_poi_unique)))
## [1] "After dropping duplicated id: 1010"
# Drop records without coordinate information
all_poi_nona <- all_poi_unique %>% 
  filter(!is.na(coordinates$longitude))
print(paste0("Before dropping na: ", nrow(all_poi_unique)))
## [1] "Before dropping na: 1010"
print(paste0("After dropping na: ", nrow(all_poi_nona)))
## [1] "After dropping na: 1010"
# Extract Coordinates
poi_sf <- all_poi_unique %>% 
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$latitude) %>% 
  filter(!is.na(x) & !is.na(y)) %>% 
  st_as_sf(coords = c("x", "y"), crs = 4326)
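
The cleaning steps above rely on the fact that `coordinates` is a data frame nested inside a column. The toy example below walks through the same sequence (deduplicate by `id`, pull out longitude and latitude, drop missing values, convert to `sf`) on four made-up records; all values and the `toy_*` names are hypothetical.

```r
library(dplyr)
library(sf)

# Toy Yelp-like table: "a" is returned twice (overlapping search circles)
# and "c" has no coordinates. `coordinates` is a nested data frame column.
toy_poi <- tibble(
  id = c("a", "b", "a", "c"),
  coordinates = data.frame(
    latitude  = c(30.83, 30.85, 30.83, NA),
    longitude = c(-83.28, -83.30, -83.28, NA)
  )
)

# Keep the first record for each id.
toy_unique <- toy_poi %>% distinct(id, .keep_all = TRUE)

# Flatten the nested coordinates, drop incomplete rows, and build points.
toy_sf <- toy_unique %>%
  mutate(x = coordinates$longitude,
         y = coordinates$latitude) %>%
  filter(!is.na(x) & !is.na(y)) %>%
  select(-coordinates) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)

print(nrow(toy_unique))
print(toy_sf)
```

Here the duplicate "a" is removed first, and "c" is dropped at the `filter` step because it cannot be placed on a map.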

Remove POIs outside the city boundary

# Convert CRS of city polygon to EPSG:4326
city_polygon <- city_polygon %>% st_transform(crs=4326)

city_poi_sf <- poi_sf[city_polygon, ]

print(paste0("All Business POIs: ", nrow(poi_sf)))
## [1] "All Business POIs: 1010"
print(paste0("Business POIs in the City Boundary: ", nrow(city_poi_sf)))
## [1] "Business POIs in the City Boundary: 708"
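
The clipping step depends on both layers sharing a CRS, which is why the city polygon is transformed to EPSG:4326 first. A minimal sketch of the same two steps on toy data follows; the square "city" and the two points are hypothetical, and the polygon is given EPSG:4269 (NAD83) on the assumption that this matches what tigris returns.

```r
library(sf)

# Hypothetical city boundary in NAD83 (EPSG:4269).
toy_city <- st_sfc(
  st_polygon(list(rbind(
    c(-83.30, 30.80), c(-83.25, 30.80), c(-83.25, 30.85),
    c(-83.30, 30.85), c(-83.30, 30.80)
  ))),
  crs = 4269
)

# Hypothetical POIs in WGS84 (EPSG:4326): one inside the square, one outside.
toy_points <- st_as_sf(
  data.frame(name = c("inside", "outside"),
             x = c(-83.28, -83.40),
             y = c(30.83, 30.83)),
  coords = c("x", "y"), crs = 4326
)

# Align the CRS first, then keep only the points that intersect the polygon.
toy_city_4326 <- st_transform(toy_city, st_crs(toy_points))
toy_in_city <- toy_points[toy_city_4326, ]

print(toy_in_city$name)
```

Skipping the `st_transform` step would make the subset fail, because sf refuses to compare geometries whose CRS differ.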

The Yelp API returns 708 business POIs in the restaurant and shopping categories within Valdosta, GA.

Count POIs in the City by Category

city_poi_sf %>% 
    st_drop_geometry() %>%
    count(main_category)
## # A tibble: 2 × 2
##   main_category     n
##   <chr>         <int>
## 1 restaurant      310
## 2 shopping        398

There are 310 restaurant POIs and 398 shopping POIs in Valdosta, GA.

Visualize POIs with City Boundary

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(city_poi_sf) +
  tm_dots(col = "main_category", size = "rating", palette="Set2",
          alpha=0.7, scale=1, size.max=50, id="name") +
  tm_shape(city_polygon) + tm_borders()
## Legend for symbol sizes not available in view mode.

The interactive map shows the locations of the business POIs: green bubbles represent restaurants and orange bubbles represent shopping businesses. The size of each bubble indicates the rating the business received from Yelp users. Hovering the mouse over a point displays the name of the business.

From the map above, we can see two main clusters of business POIs. The first cluster lies along Highway 41, which runs through the middle of the city. The second cluster is on the left side of the map; it is densest near the Valdosta Mall and lies close to the I-75 highway.

Conclusion

This R document demonstrates how to retrieve business POIs within a specific city using the Yelp API, with the US Census API helping to determine the boundary of the city.

In this exploration, the city of Valdosta, which sits in Lowndes County, Georgia, was selected, and business POIs in the restaurant and shopping categories were considered. After removing duplicates and POIs outside the city, there are 310 restaurant POIs and 398 shopping POIs in Valdosta, GA. The majority of the POIs are located along the highways.