For mini-assignment 1, I requested business data for two Yelp business categories (banks and carpenters), using tidycensus to get the Census Tract polygons that define the search areas.
First, I get the Census Tract polygons for DeKalb County, GA. Because the county includes parts of both the city of Atlanta and the city of Stone Mountain, I used c() to build a vector of both city names, filtered the Georgia places by it, and kept only the tracts that intersect those two cities.
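# Load packages (business_search() used later is assumed to come from the yelpr package)
library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(tmap)
library(yelpr)
# Get DeKalb County tract polygons with total population and median household income
tract <- tidycensus::get_acs(geography = "tract",
                             state = "GA",
                             county = "Dekalb",
                             variables = c(population = "B01003_001",
                                           medianincome = "B19013_001"),
                             year = 2019,
                             survey = "acs5",
                             geometry = TRUE, # returns sf objects
                             output = "wide")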
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
atlanta <- places('GA') %>%
  filter(NAME %in% c('Stone Mountain', 'Atlanta')) ## DeKalb County stretches into two cities
## Retrieving data for the year 2021
tract <- tract[atlanta,]
## View acs data
tract
## Simple feature collection with 36 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.35022 ymin: 33.64706 xmax: -84.09704 ymax: 33.90272
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME populationE
## 7 13089021906 Census Tract 219.06, DeKalb County, Georgia 5572
## 11 13089020500 Census Tract 205, DeKalb County, Georgia 3161
## 15 13089023209 Census Tract 232.09, DeKalb County, Georgia 5569
## 21 13089020600 Census Tract 206, DeKalb County, Georgia 2050
## 23 13089021102 Census Tract 211.02, DeKalb County, Georgia 6533
## 35 13089020802 Census Tract 208.02, DeKalb County, Georgia 4411
## 37 13089023801 Census Tract 238.01, DeKalb County, Georgia 4125
## 38 13089023601 Census Tract 236.01, DeKalb County, Georgia 3188
## 40 13089022401 Census Tract 224.01, DeKalb County, Georgia 3873
## 45 13089022800 Census Tract 228, DeKalb County, Georgia 4361
## populationM medianincomeE medianincomeM geometry
## 7 570 46448 4613 MULTIPOLYGON (((-84.187 33....
## 11 314 65694 19701 MULTIPOLYGON (((-84.34919 3...
## 15 500 59983 10992 MULTIPOLYGON (((-84.18113 3...
## 21 337 63313 16471 MULTIPOLYGON (((-84.34185 3...
## 23 493 110504 20795 MULTIPOLYGON (((-84.34831 3...
## 35 421 73981 16256 MULTIPOLYGON (((-84.31964 3...
## 37 372 81103 11921 MULTIPOLYGON (((-84.34944 3...
## 38 578 72303 16421 MULTIPOLYGON (((-84.30967 3...
## 40 338 110775 11426 MULTIPOLYGON (((-84.34842 3...
## 45 216 136711 23494 MULTIPOLYGON (((-84.2965 33...
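The download message above names an option for caching shapefiles. A minimal sketch of using it, assuming a tigris version that accepts the `progress_bar` argument (it is passed through `...` to the package's internal loader), so that repeated runs reuse the files and the progress bar stays out of the knitted output:

```r
library(tigris)
library(dplyr)

# Cache downloaded shapefiles so later sessions reuse them
# (this is the option named in the download message above)
options(tigris_use_cache = TRUE)

# progress_bar = FALSE keeps the download progress bar out of knitted output
atlanta <- places("GA", progress_bar = FALSE) %>%
  filter(NAME %in% c("Stone Mountain", "Atlanta"))
```

This still needs an internet connection the first time it runs.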
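The function below computes what Yelp needs for each tract: the center of the tract's bounding box and a search radius, taken as the distance from that center to a corner of the bounding box and enlarged by 20%. These values also allow the search areas to be displayed later.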
# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
  #---------------------
  # Takes: a single POLYGON or LINESTRING
  # Outputs: distance between the center of the bounding box and a corner of the bounding box
  #---------------------
  # Get bounding box of a given polygon
  bb <- st_bbox(poly)
  # Get lat & long coordinates of any one corner of the bounding box.
  bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
  # Get the center of the bounding box
  bb_center_x <- (bb[3]+bb[1])/2
  bb_center_y <- (bb[4]+bb[2])/2
  bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
  # Get the distance between bb_corner and bb_center
  r <- st_distance(bb_corner, bb_center)
  # Multiply by 1.2 to make the circle a bit larger than the Census Tract.
  # See the Yelp explanation of their radius parameter to see why we do this.
  bb_center$radius <- r*1.2
  return(bb_center)
}
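To see what the radius calculation does, here is a minimal sketch on a hypothetical rectangle (the coordinates are made up and only roughly tract-sized). It repeats the same steps as get_r(): bounding box, one corner, the center, and the corner-to-center distance multiplied by 1.2.

```r
library(sf)

# Hypothetical rectangle in lon/lat (EPSG:4326), roughly the size of a small tract
toy_poly <- st_polygon(list(rbind(
  c(-84.30, 33.75), c(-84.28, 33.75), c(-84.28, 33.77),
  c(-84.30, 33.77), c(-84.30, 33.75)
)))

# Bounding box of the polygon
bb <- st_bbox(toy_poly)

# Lower-left corner and center of the bounding box
corner <- st_sfc(st_point(c(bb[["xmin"]], bb[["ymin"]])), crs = 4326)
center <- st_sfc(st_point(c((bb[["xmin"]] + bb[["xmax"]]) / 2,
                            (bb[["ymin"]] + bb[["ymax"]]) / 2)), crs = 4326)

# Distance in meters from the center to the corner, enlarged by 20%
radius <- st_distance(corner, center) * 1.2
radius
```

Because the CRS is geographic, st_distance() returns meters, which is the unit the Yelp radius parameter is given in after rounding.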
## Using a loop -----------------------------------------------------------------
# Creating an empty list with one slot per tract.
# Results will fill this list
epsg_id <- 4326
r4all_loop <- vector("list", nrow(tract))
# Starting a for-loop
for (i in 1:nrow(tract)){
  r4all_loop[[i]] <- tract %>%
    st_transform(crs = epsg_id) %>%
    st_geometry() %>%
    .[[i]] %>%
    get_r(epsg_id = epsg_id)
}
r4all_loop <- bind_rows(r4all_loop)
# Using a functional -----------------------------------------------------------
# We use a functional (lapply) to apply this custom function to each Census Tract.
r4all_apply <- tract %>%
  st_geometry() %>%
  st_transform(crs = epsg_id) %>%
  lapply(., function(x) get_r(x, epsg_id = epsg_id))
r4all_apply <- bind_rows(r4all_apply)
# Are these two identical?
identical(r4all_apply, r4all_loop) ## checking because we used two functions that do the same thing
## [1] TRUE
# Appending X Y coordinates as separate columns
ready_4_yelp <- r4all_apply %>%
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2])
The following code chunk maps the circular search areas (red) on top of the original tract boundaries (blue). These circles are the areas that will be sent to Yelp when requesting business data.
tmap_mode('view')
## tmap mode set to interactive viewing
# Select all 36 rows
ready_4_yelp[1:36,] %>%
  # Draw a buffer centered at the center of each tract's bounding box.
  # Radius of the buffer is the radius we just calculated with get_r()
  st_buffer(., dist = .$radius) %>%
  # Display this buffer in red
  tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
  # Display the original polygon in blue
  tm_shape(tract[1:36,]) + tm_borders(col= 'blue')
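The following function sends one tract's center point and radius to the Yelp API for a given business category and collects the results, 50 businesses per request.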
# FUNCTION
get_yelp <- function(tract, category){
  # ----------------------------------
  # Takes one row of tract information (1,) and a category name (str),
  # Outputs a data.frame of businesses
  # (or a list holding only the first page if there are >= 1000 businesses)
  # ----------------------------------
  # Pause for a second so requests are not sent too quickly
  Sys.sleep(1)
  n <- 1
  # First request --------------------------------------------------------------
  resp <- business_search(api_key = Sys.getenv("yelp_api"),
                          categories = category,
                          latitude = tract$y,
                          longitude = tract$x,
                          offset = (n - 1) * 50, # = 0 when n = 1
                          radius = round(tract$radius),
                          limit = 50)
  # Calculate how many requests are needed in total
  required_n <- ceiling(resp$total/50)
  # out is where the results will be appended to.
  out <- vector("list", required_n)
  # Store the business information to nth slot in out
  out[[n]] <- resp$businesses
  # Name the first element with required_n, so that if there are
  # 1000 or more businesses we know how many requests would be needed.
  names(out)[n] <- required_n
  # Stop early (with a message, not an error) if there are 1000 or more
  if (resp$total >= 1000)
  {
    # glue formats the string by replacing {resp$total} with its current value.
    print(glue::glue("This search area has {resp$total} businesses (>= 1000)."))
    # Stop before going into the loop because we need to
    # break down Census Tract to something smaller.
    return(out)
  }
  else
  {
    # add 1 to n
    n <- n + 1
    # Now we know required_n -----------------------------------------------------
    # Starting a loop
    while(n <= required_n){
      resp <- business_search(api_key = Sys.getenv("yelp_api"),
                              categories = category,
                              latitude = tract$y,
                              longitude = tract$x,
                              offset = (n - 1) * 50,
                              radius = round(tract$radius),
                              limit = 50)
      out[[n]] <- resp$businesses
      n <- n + 1
    } #<< end of while loop
    # Merge all elements in the list into a single data frame
    out <- out %>% bind_rows()
    return(out)
  }
}
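The paging arithmetic inside get_yelp() can be checked on its own. A minimal sketch, assuming a hypothetical total of 120 businesses reported by the first response (the value is made up for illustration):

```r
# Hypothetical total reported by the first response (resp$total)
total <- 120
page_size <- 50

# Number of requests needed to page through all results
required_n <- ceiling(total / page_size)

# Offset sent with each request: 0 for the first page, 50 for the second, ...
offsets <- (seq_len(required_n) - 1) * page_size

required_n  # 3
offsets     # 0 50 100
```

The first request uses offset 0, and the while loop in get_yelp() covers the remaining offsets.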
# Prepare a collector
yelp_bank_list <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_bank_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "banks"))
  if (row %% 36 == 0){
    print(paste0("Current row: ", row))
  }
}
## [1] "Current row: 36"
# Collapsing the list into a data.frame
yelp_bank <- yelp_bank_list %>% bind_rows() %>% as_tibble() %>%
mutate(business_type = "banks")
# print
yelp_bank %>% print(width=1000)
## # A tibble: 124 × 16
## id alias
## <chr> <chr>
## 1 Rd5vXW3GpBxyGbJBZGoTCQ citizens-trust-bank-stone-mountain
## 2 sFLTL1srlad-uwBixN8MVw chase-bank-stone-mountain-3
## 3 8EzNyE_7S4jpTA6z6iVfzQ wells-fargo-bank-stone-mountain-11
## 4 YFs1mwUAjFFFuKPcc1JZEA wells-fargo-bank-atlanta-59
## 5 tDmsP7eTlh6Ic_r9N7aQgQ bond-community-federal-credit-union-atlanta-2
## 6 I4ajnYe2miujNkLuYojdLA chase-bank-atlanta-11
## 7 C6iSJji5JBGtHMbXf4b8dw bank-of-america-atlanta-24
## 8 aiSmo01KU7KNrCc6q4Whzw synovus-atlanta
## 9 Rd5vXW3GpBxyGbJBZGoTCQ citizens-trust-bank-stone-mountain
## 10 FBDOrMpNPoRAX0bUDq9eDA bank-ozk-brookhaven-2
## name
## <chr>
## 1 Citizens Trust Bank
## 2 Chase Bank
## 3 Wells Fargo Bank
## 4 Wells Fargo Bank
## 5 BOND Community Federal Credit Union
## 6 Chase Bank
## 7 Bank of America Financial Center
## 8 Synovus
## 9 Citizens Trust Bank
## 10 Bank OZK
## image_url
## <chr>
## 1 "https://s3-media1.fl.yelpcdn.com/bphoto/sn7fhdl7gp4lc8xzR6jScQ/o.jpg"
## 2 "https://s3-media1.fl.yelpcdn.com/bphoto/qcIuFjhtA6SDjxH0J1AFHA/o.jpg"
## 3 "https://s3-media2.fl.yelpcdn.com/bphoto/5wdwyabLljllUN0hEuEuWg/o.jpg"
## 4 "https://s3-media1.fl.yelpcdn.com/bphoto/dKCnwa10SS7EBO_HqNxKjQ/o.jpg"
## 5 "https://s3-media3.fl.yelpcdn.com/bphoto/9S2I73gOiI_mRxLv0dNSeQ/o.jpg"
## 6 "https://s3-media1.fl.yelpcdn.com/bphoto/65ALlNfJhyqrdWA93h-8EQ/o.jpg"
## 7 "https://s3-media4.fl.yelpcdn.com/bphoto/zzfwz221TtMT9YEHZ6pKSA/o.jpg"
## 8 "https://s3-media3.fl.yelpcdn.com/bphoto/gTuXCRU4AuruabrULGDBmA/o.jpg"
## 9 "https://s3-media1.fl.yelpcdn.com/bphoto/sn7fhdl7gp4lc8xzR6jScQ/o.jpg"
## 10 ""
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/citizens-trust-bank-stone-mountain?adjust_creative=…
## 2 https://www.yelp.com/biz/chase-bank-stone-mountain-3?adjust_creative=VuthNv6…
## 3 https://www.yelp.com/biz/wells-fargo-bank-stone-mountain-11?adjust_creative=…
## 4 https://www.yelp.com/biz/wells-fargo-bank-atlanta-59?adjust_creative=VuthNv6…
## 5 https://www.yelp.com/biz/bond-community-federal-credit-union-atlanta-2?adjus…
## 6 https://www.yelp.com/biz/chase-bank-atlanta-11?adjust_creative=VuthNv6lmGi4h…
## 7 https://www.yelp.com/biz/bank-of-america-atlanta-24?adjust_creative=VuthNv6l…
## 8 https://www.yelp.com/biz/synovus-atlanta?adjust_creative=VuthNv6lmGi4hXZmh35…
## 9 https://www.yelp.com/biz/citizens-trust-bank-stone-mountain?adjust_creative=…
## 10 https://www.yelp.com/biz/bank-ozk-brookhaven-2?adjust_creative=VuthNv6lmGi4h…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 2 <df [1 × 2]> 5 33.8 -84.2 <list [0]>
## 2 2 <df [1 × 2]> 2 33.8 -84.2 <list [0]>
## 3 4 <df [1 × 2]> 2 33.8 -84.2 <list [0]>
## 4 22 <df [1 × 2]> 2.5 33.8 -84.3 <list [0]>
## 5 12 <df [2 × 2]> 2.5 33.8 -84.3 <list [0]>
## 6 12 <df [1 × 2]> 2.5 33.8 -84.3 <list [0]>
## 7 26 <df [1 × 2]> 2 33.7 -84.3 <list [0]>
## 8 5 <df [1 × 2]> 3 33.8 -84.3 <list [0]>
## 9 2 <df [1 × 2]> 5 33.8 -84.2 <list [0]>
## 10 3 <df [1 × 2]> 2.5 33.9 -84.3 <list [0]>
## location$address1 $address2 $address3 $city $zip_code $country
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 5771 Rockbridge Rd "" "" Stone Mountain 30087 US
## 2 933 N Hairston Rd "Ste 1" <NA> Stone Mountain 30083 US
## 3 6063 Memorial Dr <NA> <NA> Stone Mountain 30083 US
## 4 1270 Caroline St NE "Ste D" "" Atlanta 30307 US
## 5 433 Moreland Ave NE "" "" Atlanta 30307 US
## 6 1215 Caroline St NE "Bldg H" <NA> Atlanta 30307 US
## 7 411 Flat Shoals Ave SE "" "" Atlanta 30316 US
## 8 144 Moreland Av NE "Ste E" <NA> Atlanta 30307 US
## 9 5771 Rockbridge Rd "" "" Stone Mountain 30087 US
## 10 104 Town Blvd NE <NA> <NA> Brookhaven 30319 US
## $state $display_address phone display_phone distance business_type
## <chr> <list> <chr> <chr> <dbl> <chr>
## 1 GA <chr [2]> +17704988777 (770) 498-8777 1934. banks
## 2 GA <chr [3]> +17704653876 (770) 465-3876 1676. banks
## 3 GA <chr [2]> +14048652675 (404) 865-2675 1603. banks
## 4 GA <chr [3]> +14045889857 (404) 588-9857 851. banks
## 5 GA <chr [2]> +14045250619 (404) 525-0619 1551. banks
## 6 GA <chr [3]> +14045220495 (404) 522-0495 892. banks
## 7 GA <chr [2]> +14043300750 (404) 330-0750 1557. banks
## 8 GA <chr [3]> +17705764471 (770) 576-4471 955. banks
## 9 GA <chr [2]> +17704988777 (770) 498-8777 1657. banks
## 10 GA <chr [2]> +14704221020 (470) 422-1020 399. banks
## # ℹ 114 more rows
The purpose of the code chunk above is to collect business data for the first Yelp category, banks, tract by tract. It creates an empty list and fills it in a loop, one element per tract. For banks, this returned 124 observations.
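In the printed bank results, rows 1 and 9 share the same id, so the same business can come back from more than one tract search. A minimal sketch of dropping such repeats with dplyr's distinct(), using a hypothetical three-row table built from values in the printout (the name toy_bank is made up for illustration):

```r
library(dplyr)

# Three rows copied from the printout above; rows 1 and 3 are the same business
toy_bank <- tibble(
  id = c("Rd5vXW3GpBxyGbJBZGoTCQ", "sFLTL1srlad-uwBixN8MVw", "Rd5vXW3GpBxyGbJBZGoTCQ"),
  name = c("Citizens Trust Bank", "Chase Bank", "Citizens Trust Bank"),
  distance = c(1934, 1676, 1657)
)

# Keep the first row for each id and all of its columns
toy_bank_unique <- toy_bank %>% distinct(id, .keep_all = TRUE)

nrow(toy_bank_unique)  # 2
```

The same call could be applied to the full data frame, in which case the observation counts reported here would go down by the number of repeats.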
# Prepare a collector
yelp_carpenters_list <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_carpenters_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "carpenters"))
  if (row %% 36 == 0){
    print(paste0("Current row: ", row))
  }
}
## [1] "Current row: 36"
# Collapsing the list into a data.frame
yelp_carpenters <- yelp_carpenters_list %>% bind_rows() %>% as_tibble() %>%
mutate(business_type = "carpenters")
# print
yelp_carpenters %>% print(width=1000)
## # A tibble: 294 × 16
## id alias
## <chr> <chr>
## 1 NfiAXBdm0jq8kMdTbTlilg three-brothers-painting-woodstock
## 2 phVCavHyxoE3e-Iotxed3w atlanta-window-door-glass-stone-mountain-3
## 3 86hewzjrZ13MXZn_LsAB4A homefix-pro-johns-creek-3
## 4 zTVw5Ma-Se-aBJk7HA67uA certapro-painters-of-atlanta-atlanta-3
## 5 k2FUcWUQk7FWz1rx8hDEVg atlanta-handyman-atlanta-5
## 6 53MYt8bD8rnDOPwY8BF7FQ property-medics-of-georgia-peachtree-corners
## 7 1m3fOdoC5yqCkLfKmAKFRw modern-ideas-custom-interiors-conyers
## 8 FAysX0NqNGvVaGYd7bRKkg bellas-multi-services-lawrenceville-3
## 9 z5QFbk2bw9-_eP16VnpDyQ jai-lee-dezignz-atlanta
## 10 v7PTBSeRUlVPSf8GgkJoNg garrity-construction-alpharetta-3
## name
## <chr>
## 1 Three Brothers Painting
## 2 Atlanta Window Door + Glass
## 3 HomeFix pro
## 4 CertaPro Painters of Atlanta
## 5 Atlanta Handyman
## 6 Property Medics of Georgia
## 7 Modern Ideas Custom Interiors
## 8 Bella's Multi Services
## 9 Jai Lee Dezignz
## 10 Garrity Construction
## image_url
## <chr>
## 1 https://s3-media4.fl.yelpcdn.com/bphoto/L54VkBtt_c-mGv9hJAPmCA/o.jpg
## 2 https://s3-media2.fl.yelpcdn.com/bphoto/yC769_HwYvevOLWRtUiaoA/o.jpg
## 3 https://s3-media4.fl.yelpcdn.com/bphoto/LUerpW_Clq5FNf2xWgXRlw/o.jpg
## 4 https://s3-media2.fl.yelpcdn.com/bphoto/2zmYKPrD7PQUhOAaycROvw/o.jpg
## 5 https://s3-media1.fl.yelpcdn.com/bphoto/M3TVw2KHL1oVuBBIOQqjBQ/o.jpg
## 6 https://s3-media4.fl.yelpcdn.com/bphoto/C9xWMgplYQENtqXdAEpCaA/o.jpg
## 7 https://s3-media4.fl.yelpcdn.com/bphoto/GSV9hJBvpuhZlWiZmPDqkA/o.jpg
## 8 https://s3-media2.fl.yelpcdn.com/bphoto/1x4A5jh8XTqGYpNvc0Ah0A/o.jpg
## 9 https://s3-media3.fl.yelpcdn.com/bphoto/m1ni2HeDXPTTZN4iPNPSyA/o.jpg
## 10 https://s3-media4.fl.yelpcdn.com/bphoto/r2ugSP_mKJH1qRakpEmdSg/o.jpg
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/three-brothers-painting-woodstock?adjust_creative=V…
## 2 https://www.yelp.com/biz/atlanta-window-door-glass-stone-mountain-3?adjust_c…
## 3 https://www.yelp.com/biz/homefix-pro-johns-creek-3?adjust_creative=VuthNv6lm…
## 4 https://www.yelp.com/biz/certapro-painters-of-atlanta-atlanta-3?adjust_creat…
## 5 https://www.yelp.com/biz/atlanta-handyman-atlanta-5?adjust_creative=VuthNv6l…
## 6 https://www.yelp.com/biz/property-medics-of-georgia-peachtree-corners?adjust…
## 7 https://www.yelp.com/biz/modern-ideas-custom-interiors-conyers?adjust_creati…
## 8 https://www.yelp.com/biz/bellas-multi-services-lawrenceville-3?adjust_creati…
## 9 https://www.yelp.com/biz/jai-lee-dezignz-atlanta?adjust_creative=VuthNv6lmGi…
## 10 https://www.yelp.com/biz/garrity-construction-alpharetta-3?adjust_creative=V…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 53 <df [3 × 2]> 4.5 34.1 -84.5 <list [0]>
## 2 3 <df [3 × 2]> 5 33.8 -84.2 <list [0]>
## 3 5 <df [3 × 2]> 5 34.0 -84.2 <list [0]>
## 4 14 <df [3 × 2]> 3.5 33.7 -84.3 <list [0]>
## 5 1 <df [3 × 2]> 5 33.8 -84.4 <list [0]>
## 6 1 <df [3 × 2]> 5 33.9 -84.3 <list [0]>
## 7 1 <df [2 × 2]> 5 33.7 -84.1 <list [0]>
## 8 3 <df [3 × 2]> 3.5 33.9 -84.1 <list [0]>
## 9 2 <df [3 × 2]> 5 34.0 -83.6 <list [0]>
## 10 1 <df [2 × 2]> 5 34.1 -84.3 <list [0]>
## location$address1 $address2 $address3 $city $zip_code
## <chr> <chr> <chr> <chr> <chr>
## 1 "314 Creekstone Ridge" "" "" Woodstock 30189
## 2 "925 Main St" "300-32" "" Stone Mountain 30083
## 3 "" <NA> "" Johns Creek 30097
## 4 "2960 Alston Dr SE" "" "" Atlanta 30317
## 5 "" "" <NA> Atlanta 30303
## 6 "3250 Peachtree Corners Cir" "Ste A" "" Peachtree Corners 30092
## 7 "2592 Jeremiah Industrial Rd" "" <NA> Conyers 30012
## 8 <NA> <NA> <NA> Lawrenceville 30044
## 9 <NA> <NA> <NA> Atlanta 30303
## 10 "" <NA> <NA> Alpharetta 30004
## $country $state $display_address phone display_phone distance
## <chr> <chr> <list> <chr> <chr> <dbl>
## 1 US GA <chr [2]> "+17709283667" "(770) 928-3667" 45949.
## 2 US GA <chr [3]> "+14049130313" "(404) 913-0313" 1123.
## 3 US GA <chr [1]> "+16788302125" "(678) 830-2125" 27957.
## 4 US GA <chr [2]> "+14043771867" "(404) 377-1867" 12129.
## 5 US GA <chr [1]> "" "" 20464.
## 6 US GA <chr [3]> "+14044768080" "(404) 476-8080" 17586.
## 7 US GA <chr [2]> "+14044418865" "(404) 441-8865" 14770.
## 8 US GA <chr [1]> "+17708828667" "(770) 882-8667" 17922.
## 9 US GA <chr [1]> "+19138208422" "(913) 820-8422" 56978.
## 10 US GA <chr [1]> "+17702039404" "(770) 203-9404" 43467.
## business_type
## <chr>
## 1 carpenters
## 2 carpenters
## 3 carpenters
## 4 carpenters
## 5 carpenters
## 6 carpenters
## 7 carpenters
## 8 carpenters
## 9 carpenters
## 10 carpenters
## # ℹ 284 more rows
For the second business category, carpenters, I used the same method as for banks, querying Yelp tract by tract. This returned 294 observations.
yelp_final <- bind_rows(yelp_bank, yelp_carpenters)
I then bound the two data frames together into yelp_final, which has 418 observations in total (124 banks + 294 carpenters).
# Extract coordinates
yelp_sf <- yelp_final %>%
  mutate(x = .$coordinates$longitude,
         y = .$coordinates$latitude) %>%
  mutate(x = if_else(x > 0, 0 - x, x)) %>% ## some rows had a positive longitude and were mapping to India
  filter(!is.na(x) & !is.na(y)) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)
# Map
tm_shape(yelp_sf) +
  tm_dots(col = "business_type") ## color the dots by business type so the two categories are easy to tell apart
Next, I mapped the combined data on an interactive map (tmap's view mode, which uses leaflet) using the coordinates. Some points were ending up in India because of incorrect data: in yelp_bank I noticed that 4 rows had a positive longitude, so I added a mutate() step that flips the sign of any positive longitude.
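A minimal sketch of the longitude fix on a hypothetical three-row table (the values are made up): one row has the wrong sign and one has a missing longitude.

```r
library(dplyr)

# Hypothetical coordinates: row 2 has a positive longitude, row 3 is missing one
toy_xy <- tibble(x = c(-84.3, 84.2, NA),
                 y = c(33.8, 33.8, 33.7))

toy_fixed <- toy_xy %>%
  # Flip the sign of positive longitudes; negative and missing values are left alone
  mutate(x = if_else(x > 0, -x, x)) %>%
  # Drop rows that still lack a coordinate
  filter(!is.na(x) & !is.na(y))

toy_fixed$x  # -84.3 -84.2
```

This only works because every business in the study area has a negative longitude; it is not a general-purpose correction.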
Looking at the spatial patterns of the two business categories, I notice that even though there are fewer banks, they appear denser, and they cluster near the borders with neighboring counties rather than in central DeKalb County. The banks also seem to be concentrated along major streets and interstates that carry car traffic.
The banks conform to the polygons I set much more closely than the carpentry businesses do, falling approximately within the DeKalb County, GA boundaries. The carpentry businesses are spread across metropolitan Atlanta. There is not enough data here to know why for certain, but I have some educated guesses that might guide further exploration:
* Carpenters may more often be self-owned businesses that use multiple addresses (UPS box, P.O. box, owner's home address, etc.).
* Banks are more corporate and may have stricter census reporting standards.
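One way to put a number on how well each category conforms to the study area is to count the points that fall inside it. A minimal sketch, assuming a hypothetical rectangular study area loosely based on the bounding box printed for tract and four made-up points (the names study_area and toy_pts are not from the analysis above):

```r
library(sf)

# Hypothetical study area: a rectangle in lon/lat (EPSG:4326)
study_area <- st_sfc(st_polygon(list(rbind(
  c(-84.35, 33.65), c(-84.10, 33.65), c(-84.10, 33.90),
  c(-84.35, 33.90), c(-84.35, 33.65)
))), crs = 4326)

# Hypothetical business locations, two per category
toy_pts <- st_as_sf(
  data.frame(
    business_type = c("banks", "banks", "carpenters", "carpenters"),
    x = c(-84.2, -84.3, -84.5, -84.2),
    y = c(33.8, 33.8, 34.1, 33.8)
  ),
  coords = c("x", "y"), crs = 4326
)

# TRUE when a point intersects the study area
toy_pts$inside <- lengths(st_intersects(toy_pts, study_area)) > 0

# Share of each business type that falls inside the study area
tapply(toy_pts$inside, toy_pts$business_type, mean)
```

In the real data, the union of the tract polygons transformed to EPSG:4326 could stand in for study_area and yelp_sf for toy_pts.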