Choose two categories of businesses. See the Yelp API documentation if you need help with the choice.
Because you will have Yelp business information for two different categories, you may need to merge them into a single data frame by using bind_rows().
Write R code to get business data from Yelp using the yelpr package for the selected county and business categories. Make sure you leave comments using hash signs (#) to explain what each code chunk is doing.
Map the locations of the businesses using the downloaded data.
library(tidycensus) # to access census api
library(sf) # to read/write shp files
library(tmap)
library(tidyverse)
library(httr) # to use api requests
library(jsonlite) # to read/write json files
library(reshape2)
library(yelpr)
library(knitr)
library(here) # to use relative paths
I chose Champaign County, Illinois as my geographical boundary. To check if the boundary file is downloaded properly, I visualized the tract boundaries on a map.
#### Tract polygons for the Yelp query
tract <- suppressMessages(
get_acs(geography = "tract", # or "block group", "county", "state" etc.
state = "IL",
county = c("Champaign"),
variables = c(population = "B01003_001"),
year = 2019,
survey = "acs5", # American Community Survey 5-year estimate
geometry = TRUE, # returns sf objects
output = "wide") # wide vs. long
)
# Keep only the columns needed.
# Note that select() can also rename columns as it selects them.
tract <- tract %>%
select(GEOID,
population = populationE)
# visualize tract
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(tract) + tm_borders()
I will reuse the get_r function from the given RMD file.
# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
#---------------------
# Takes: a single POLYGON or LINESTRING
# Outputs: distance between the centroid of the bounding box and a corner of the bounding box
#---------------------
# Get bounding box of a given polygon
bb <- st_bbox(poly)
# Get lat & long coordinates of any one corner of the bounding box.
bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
# Get centroid of the bb
bb_center_x <- (bb[3]+bb[1])/2
bb_center_y <- (bb[4]+bb[2])/2
bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
# Get the distance between the corner and the center of the bounding box
r <- st_distance(bb_corner, bb_center)
# Multiply by 1.2 to make the circle a bit larger than the Census Tract.
# See the Yelp explanation of their radius parameter to see why we do this.
bb_center$radius <- r*1.2
return(bb_center)
}
epsg_id <- 4326
r4all <- tract %>%
st_geometry() %>%
st_transform(crs = epsg_id) %>%
lapply(., function(x) get_r(x, epsg_id = epsg_id))
r4all <- bind_rows(r4all)
# Appending X Y coordinates as separate columns
ready_4_yelp <- r4all %>%
mutate(x = st_coordinates(.)[,1],
y = st_coordinates(.)[,2])
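Because the radius is later rounded and passed to Yelp's radius parameter, it can be worth checking that no tract exceeds the API's limit before sending requests. The sketch below uses made-up radii in place of ready_4_yelp$radius, and it assumes the 40,000 meter maximum listed in Yelp's business search documentation at the time of writing; both the values and the cap should be checked against the real data and the current documentation.

```r
# Hypothetical radii in meters, standing in for as.numeric(ready_4_yelp$radius)
radius_m <- c(850.4, 12000.7, 45210.2)

# Assumed maximum search radius in meters (verify against the Yelp docs)
yelp_max_radius <- 40000

# Row indices whose rounded radius would exceed the cap
too_large <- which(round(radius_m) > yelp_max_radius)
too_large
```

Any tract flagged this way would need to be split into smaller search areas, since a single circle could not cover it.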
Checking whether ready_4_yelp was created successfully:
tmap_mode('view')
## tmap mode set to interactive viewing
# Select the first 10 rows
ready_4_yelp[1:10,] %>%
# Draw a buffer centered at the centroid of Tract polygons.
# Radius of the buffer is the radius we just calculated with lapply()
st_buffer(., dist = .$radius) %>%
# Display this buffer in red
tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
# Display the original polygon in blue
tm_shape(tract[1:10,]) + tm_borders(col= 'blue')
I used the get_yelp function from the given RMD file. For each tract and category, get_yelp queries the Yelp API, pages through the results 50 at a time, and returns the businesses it finds as a single data frame.
# FUNCTION
get_yelp <- function(tract, category){
# ----------------------------------
# Takes one row of tract information (1,) and a category name (str).
# Outputs a data.frame of businesses (or a list if there are >= 1000).
n <- 1
# First request --------------------------------------------------------------
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50, # = 0 when n = 1
radius = round(tract$radius),
limit = 50)
# Calculate how many requests are needed in total
required_n <- ceiling(resp$total/50)
# out is where the results will be appended to.
out <- vector("list", required_n)
# Store the business information to nth slot in out
out[[n]] <- resp$businesses
# Change the name of the elements to the total required_n
# This is to know if there are more than 1000 businesses,
# we know how many.
names(out)[n] <- required_n
# Print a message and return early if there are 1000 or more businesses
if (resp$total >= 1000)
{
# glue formats string by inserting {n} with what's currently stored in object n.
print(glue::glue("{n}th row has >= 1000 businesses."))
# Stop before going into the loop because we need to
# break down Census Tract to something smaller.
return(out)
}
else
{
# add 1 to n
n <- n + 1
# Now we know required_n -----------------------------------------------------
# Starting a loop
while(n <= required_n){
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50,
radius = round(tract$radius),
limit = 50)
out[[n]] <- resp$businesses
n <- n + 1
} #<< end of while loop
# Merge all elements in the list into a single data frame
out <- out %>% bind_rows()
return(out)
}
}
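The paging logic in get_yelp comes down to simple arithmetic: the number of requests is the total divided by 50, rounded up, and each request starts 50 results after the previous one. A minimal sketch of that calculation follows; page_offsets is a hypothetical helper written for illustration and is not part of yelpr or the given RMD file.

```r
# Offsets needed to page through `total` results, `limit` at a time
page_offsets <- function(total, limit = 50) {
  n_pages <- ceiling(total / limit)   # same formula as required_n in get_yelp
  (seq_len(n_pages) - 1) * limit      # same formula as the offset argument
}

page_offsets(70)    # two requests: offsets 0 and 50
page_offsets(3)     # one request: offset 0
page_offsets(0)     # no requests needed
```

With 70 results, as for shopping in the first tract, two requests are enough; with 3, as for hotels, one request covers everything.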
For example, for the first tract in Champaign County, there are 70 shopping businesses and 3 hotel businesses.
# Apply the function for the first Census Tract
yelp_first_tract_shopping <- get_yelp(ready_4_yelp[1,], "shopping") %>%
as_tibble
## No encoding supplied: defaulting to UTF-8.
## No encoding supplied: defaulting to UTF-8.
# Print
yelp_first_tract_shopping %>% print
## # A tibble: 70 × 16
## id alias name image…¹ is_cl…² url revie…³ categ…⁴ rating coord…⁵
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl> <dbl>
## 1 KuJEyXesWoy… the-… The … "https… FALSE http… 22 <df> 4.5 40.1
## 2 mqW1uq15v8i… bake… Bake… "https… FALSE http… 33 <df> 4.5 40.1
## 3 e0tj2Jip560… fyxi… FYXIT "https… FALSE http… 36 <df> 4.5 40.1
## 4 ckdGk8ForF9… stra… Stra… "https… FALSE http… 30 <df> 3.5 40.1
## 5 POuzQLJuPWx… inte… Inte… "" FALSE http… 9 <df> 4.5 40.1
## 6 7Vaj54SeGM3… heel… Heel… "https… FALSE http… 32 <df> 3.5 40.1
## 7 z-jVqx3Wx9f… klos… Klos… "https… FALSE http… 12 <df> 4.5 40.1
## 8 QOg8cBYeaub… reco… Reco… "https… FALSE http… 16 <df> 4 40.1
## 9 BM1_iNKkC1t… camp… Camp… "https… FALSE http… 24 <df> 4.5 40.1
## 10 Dkhg2ClBOP0… robe… Robe… "" FALSE http… 16 <df> 5 40.1
## # … with 60 more rows, 7 more variables: coordinates$longitude <dbl>,
## # transactions <list>, price <chr>, location <df[,8]>, phone <chr>,
## # display_phone <chr>, distance <dbl>, and abbreviated variable names
## # ¹image_url, ²is_closed, ³review_count, ⁴categories, ⁵coordinates$latitude
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
# Apply the function for the first Census Tract
yelp_first_tract_hotels <- get_yelp(ready_4_yelp[1,], "hotels") %>%
as_tibble
## No encoding supplied: defaulting to UTF-8.
# Print
yelp_first_tract_hotels %>% print
## # A tibble: 3 × 16
## id alias name image…¹ is_cl…² url revie…³ categ…⁴ rating coord…⁵ trans…⁶
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl> <dbl> <list>
## 1 9lnu… the-… The … "https… FALSE http… 16 <df> 2.5 40.1 <list>
## 2 LMF8… urba… Urba… "https… FALSE http… 15 <df> 1.5 40.1 <list>
## 3 SKAQ… cour… Cour… "" FALSE http… 2 <df> 4.5 40.1 <list>
## # … with 6 more variables: coordinates$longitude <dbl>, price <chr>,
## # location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>, and
## # abbreviated variable names ¹image_url, ²is_closed, ³review_count,
## # ⁴categories, ⁵coordinates$latitude, ⁶transactions
## # ℹ Use `colnames()` to see all variable names
# Combine the two categories into one tibble and print
yelp_first_tract <- yelp_first_tract_shopping %>% bind_rows(yelp_first_tract_hotels) %>% print
## # A tibble: 73 × 16
## id alias name image…¹ is_cl…² url revie…³ categ…⁴ rating coord…⁵
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl> <dbl>
## 1 KuJEyXesWoy… the-… The … "https… FALSE http… 22 <df> 4.5 40.1
## 2 mqW1uq15v8i… bake… Bake… "https… FALSE http… 33 <df> 4.5 40.1
## 3 e0tj2Jip560… fyxi… FYXIT "https… FALSE http… 36 <df> 4.5 40.1
## 4 ckdGk8ForF9… stra… Stra… "https… FALSE http… 30 <df> 3.5 40.1
## 5 POuzQLJuPWx… inte… Inte… "" FALSE http… 9 <df> 4.5 40.1
## 6 7Vaj54SeGM3… heel… Heel… "https… FALSE http… 32 <df> 3.5 40.1
## 7 z-jVqx3Wx9f… klos… Klos… "https… FALSE http… 12 <df> 4.5 40.1
## 8 QOg8cBYeaub… reco… Reco… "https… FALSE http… 16 <df> 4 40.1
## 9 BM1_iNKkC1t… camp… Camp… "https… FALSE http… 24 <df> 4.5 40.1
## 10 Dkhg2ClBOP0… robe… Robe… "" FALSE http… 16 <df> 5 40.1
## # … with 63 more rows, 7 more variables: coordinates$longitude <dbl>,
## # transactions <list>, price <chr>, location <df[,8]>, phone <chr>,
## # display_phone <chr>, distance <dbl>, and abbreviated variable names
## # ¹image_url, ²is_closed, ³review_count, ⁴categories, ⁵coordinates$latitude
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
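Once the two results are combined, nothing in the table says which category a row came from. One possible way to keep that information, sketched here with small made-up tibbles rather than the real Yelp results, is to pass a named list to bind_rows() together with its .id argument. The column name category and the toy values are assumptions for illustration.

```r
library(dplyr)

# Toy stand-ins for the per-category results (hypothetical values)
shopping <- tibble(id = c("a1", "a2"), name = c("Shop A", "Shop B"))
hotels   <- tibble(id = "b1", name = "Hotel C")

# The list names become the values of the new `category` column
combined <- bind_rows(list(shopping = shopping, hotels = hotels),
                      .id = "category")
combined
```

With the category stored as a column, later steps such as counting or coloring points on a map can work from the single combined table.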
Using a for loop, I repeat the same task for all census tracts.
# Prepare a collector
yelp_all_list <- vector("list", nrow(ready_4_yelp))
yelp_shopping_list <- vector("list", nrow(ready_4_yelp))
yelp_hotels_list <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in seq_len(nrow(ready_4_yelp))){
# Query each category once per tract
yelp_shopping <- suppressMessages(get_yelp(ready_4_yelp[row,], "shopping"))
yelp_hotels <- suppressMessages(get_yelp(ready_4_yelp[row,], "hotels"))
# Store the combined and per-category results, reusing the responses above
# instead of sending the same requests to the API a second time
yelp_all_list[[row]] <- yelp_shopping %>% bind_rows(yelp_hotels)
yelp_shopping_list[[row]] <- yelp_shopping
yelp_hotels_list[[row]] <- yelp_hotels
if (row %% 10 == 0){
print(paste0("Current row: ", row))
}
}
## [1] "Current row: 10"
## [1] "Current row: 20"
## [1] "Current row: 30"
## [1] "Current row: 40"
# Collapsing the list into a data.frame
yelp_all <- yelp_all_list %>% bind_rows() %>% as_tibble()
yelp_shopping_df <- yelp_shopping_list %>% bind_rows() %>% as_tibble()
yelp_hotels_df <- yelp_hotels_list %>% bind_rows() %>% as_tibble()
# print
yelp_all %>% print(width=1000)
## # A tibble: 3,286 × 16
## id alias
## <chr> <chr>
## 1 KuJEyXesWoyOm3GQm0rqcg the-idea-store-urbana
## 2 mqW1uq15v8iABsh9BzDZjA bakers-bikes-urbana
## 3 e0tj2Jip560QbC8N9pF6xw fyxit-champaign-2
## 4 ckdGk8ForF9zEczmu4-tTA strawberry-fields-urbana-2
## 5 POuzQLJuPWx0i-dUNdRsWQ international-galleries-urbana
## 6 7Vaj54SeGM3RupuvNlccOw heel-to-toe-urbana
## 7 z-jVqx3Wx9fKOK6fn86Csg klose-knit-urbana
## 8 QOg8cBYeaublNpmPWV-k_w record-swap-urbana-4
## 9 BM1_iNKkC1tbda-29O99JA campus-mobile-solutions-champaign-2
## 10 Dkhg2ClBOP0efXaoL8yNNg roberts-fine-art-of-jewelry-champaign
## name
## <chr>
## 1 The Idea Store
## 2 Baker's Bikes
## 3 FYXIT
## 4 Strawberry Fields
## 5 International Galleries
## 6 Heel To Toe
## 7 Klose Knit
## 8 Record Swap
## 9 Campus Mobile Solutions
## 10 Robert's Fine Art of Jewelry
## image_url
## <chr>
## 1 "https://s3-media2.fl.yelpcdn.com/bphoto/Rc6DmJoBp9zdjrsdURIKPg/o.jpg"
## 2 "https://s3-media4.fl.yelpcdn.com/bphoto/yvbHZ0fmRfvuFRzxwuTCUw/o.jpg"
## 3 "https://s3-media1.fl.yelpcdn.com/bphoto/3u2wpWz6vU4geIOI9K61VA/o.jpg"
## 4 "https://s3-media2.fl.yelpcdn.com/bphoto/zO4Ie2IkCY9Asifnboy82w/o.jpg"
## 5 ""
## 6 "https://s3-media2.fl.yelpcdn.com/bphoto/sVy8H_AQ0Zy4jX7Dhq5CJA/o.jpg"
## 7 "https://s3-media4.fl.yelpcdn.com/bphoto/weHG-Pp-SSTwdlOid20SwQ/o.jpg"
## 8 "https://s3-media3.fl.yelpcdn.com/bphoto/KA7L4qCRBduN1XVTjDIxfw/o.jpg"
## 9 "https://s3-media3.fl.yelpcdn.com/bphoto/38aK7NV3GtA0ni4keQqQfQ/o.jpg"
## 10 ""
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/the-idea-store-urbana?adjust_creative=-To_AbVIHKh8m…
## 2 https://www.yelp.com/biz/bakers-bikes-urbana?adjust_creative=-To_AbVIHKh8mMA…
## 3 https://www.yelp.com/biz/fyxit-champaign-2?adjust_creative=-To_AbVIHKh8mMAun…
## 4 https://www.yelp.com/biz/strawberry-fields-urbana-2?adjust_creative=-To_AbVI…
## 5 https://www.yelp.com/biz/international-galleries-urbana?adjust_creative=-To_…
## 6 https://www.yelp.com/biz/heel-to-toe-urbana?adjust_creative=-To_AbVIHKh8mMAu…
## 7 https://www.yelp.com/biz/klose-knit-urbana?adjust_creative=-To_AbVIHKh8mMAun…
## 8 https://www.yelp.com/biz/record-swap-urbana-4?adjust_creative=-To_AbVIHKh8mM…
## 9 https://www.yelp.com/biz/campus-mobile-solutions-champaign-2?adjust_creative…
## 10 https://www.yelp.com/biz/roberts-fine-art-of-jewelry-champaign?adjust_creati…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 22 <df [2 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 2 33 <df [2 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 3 36 <df [3 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 4 30 <df [3 × 2]> 3.5 40.1 -88.2 <chr [2]>
## 5 9 <df [3 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 6 32 <df [3 × 2]> 3.5 40.1 -88.2 <chr [0]>
## 7 12 <df [1 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 8 16 <df [1 × 2]> 4 40.1 -88.2 <chr [0]>
## 9 24 <df [3 × 2]> 4.5 40.1 -88.2 <chr [0]>
## 10 16 <df [3 × 2]> 5 40.1 -88.2 <chr [0]>
## price location$address1 $address2 $address3 $city $zip_code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 $ 125 Lincoln Square "" "" Urbana 61801
## 2 $ 1003 S Lynn St "" "" Urbana 61801
## 3 $$ 202 E Green St "Ste 3" "" Champaign 61820
## 4 $$ 306 W Springfield Ave "" "" Urbana 61801
## 5 $$ 118 Lincoln Square "" "" Urbana 61801
## 6 $$$ 106 W Main St "" "" Urbana 61801
## 7 $$$$ 311 W Springfield Ave "" "" Urbana 61801
## 8 $$$ 119 Lincoln Square Mall "" "" Urbana 61801
## 9 $$ 616 E Green St "Ste F" "" Champaign 61820
## 10 $ 28 E Chester St "" "" Champaign 61820
## $country $state $display_address phone display_phone distance
## <chr> <chr> <list> <chr> <chr> <dbl>
## 1 US IL <chr [2]> +12173527878 (217) 352-7878 613.
## 2 US IL <chr [2]> +12173650318 (217) 365-0318 1125.
## 3 US IL <chr [3]> +12176974171 (217) 697-4171 2216.
## 4 US IL <chr [2]> +12173281655 (217) 328-1655 783.
## 5 US IL <chr [2]> +12173282254 (217) 328-2254 725.
## 6 US IL <chr [2]> +12173672880 (217) 367-2880 844.
## 7 US IL <chr [2]> +12173442123 (217) 344-2123 744.
## 8 US IL <chr [2]> +12173677927 (217) 367-7927 626.
## 9 US IL <chr [3]> +12176075048 (217) 607-5048 1665.
## 10 US IL <chr [2]> +12173528618 (217) 352-8618 2897.
## # … with 3,276 more rows
## # ℹ Use `print(n = ...)` to see more rows
# Extract coordinates
yelp_sf <- yelp_all %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# Map
tm_shape(yelp_sf) +
tm_dots(col = "review_count", style="quantile")
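Because the search circles of neighboring tracts overlap, the same business may be returned for more than one tract, so row counts can overstate the number of distinct businesses. Whether and how much that happens here has not been checked. The sketch below shows one way to check, using a toy table with a hypothetical category column instead of the real results.

```r
library(dplyr)

# Toy results: "a1" and "b1" were each returned by two overlapping searches
yelp_toy <- tibble(
  id       = c("a1", "a2", "a1", "b1", "b1"),
  category = c("shopping", "shopping", "shopping", "hotels", "hotels")
)

# Keep one row per Yelp business id
yelp_unique <- yelp_toy %>% distinct(id, .keep_all = TRUE)

# Count distinct businesses in each category
counts <- yelp_unique %>% count(category)
counts
```

Comparing nrow() before and after distinct() shows how many rows were repeats.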
Answer the following questions:
1. What’s the county and state of your choice?
Champaign County, Illinois
2. How many businesses are there in total?
There are 3286 businesses in total.
3. How many businesses are there for each business category?
Among 3286 businesses, 358 are hotels, and 2928 are shopping businesses.
4. Upon visual inspection, can you see any noticeable spatial patterns to the way they are distributed across the county (e.g., clustering of businesses at some parts of the county)?
The businesses are highly clustered around Champaign, Rantoul, Mahomet, and Bloomington. Businesses with high review counts are especially concentrated in the City of Champaign. Zooming in on Champaign, I found that most businesses are located along Neil Street, which runs north-south through the city. Businesses with high review counts sit along that street and within Downtown Champaign, in the top-left portion of the city. Businesses outside the major cities usually have low review counts.
END.