KP Wells Urban Analytics Mini Assignment 1 Due 9/20/22
First, we need to decide what area we’re covering and what businesses we want to scrape Yelp for. I live in DeKalb County, Georgia, which is part of the Atlanta metro, so let’s go with that. As for businesses, let’s say I want to have brunch and a spa day with my friends. So I’m going to be looking at restaurants that serve brunch and medical spas. That means I’ll look at the categories breakfast_brunch and medicalspa. Let’s start by loading some libraries I might need.
library(tidycensus) #Lets us use Census api
library(sf) #allows us to read and write shapefiles
## Linking to GEOS 3.9.1, GDAL 3.4.3, PROJ 7.2.1; sf_use_s2() is TRUE
library(tmap) #visualizes simple maps
library(jsonlite) #reads and writes json files
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
library(httr) #lets us make api requests
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(here) #stores paths
## here() starts at C:/Users/kwells65/OneDrive - Georgia Institute of Technology/Assignments
library(yelpr)
library(knitr)
Next, I need to set my Census API key for this session.
tidycensus::census_api_key(Sys.getenv("census_api"))
## To install your API key for use in future sessions, run this function with `install = TRUE`.
#NOTE: My census api is stored as an environment variable for security. Also, I didn't actually have my census api installed on this machine, but next time I need it, I'll add "install = TRUE" to this function.
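Because the key is read from an environment variable, it helps to know that `Sys.getenv()` returns an empty string rather than an error when the variable is unset, so a missing key can go unnoticed until a request fails. Below is a minimal sketch of a guard. `get_env_or_stop()` is a hypothetical helper name, not part of tidycensus or yelpr.

```r
# Hypothetical helper (not part of tidycensus or yelpr): fetch an environment
# variable and stop with a clear message if it is missing or empty.
get_env_or_stop <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop(sprintf("Environment variable '%s' is not set.", name), call. = FALSE)
  }
  value
}

# Usage, assuming the same variable name as in this write-up:
# tidycensus::census_api_key(get_env_or_stop("census_api"))
```

The same guard could wrap `Sys.getenv("yelp_api")` before any Yelp requests are made.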
Now I’m ready to get my polygonal data. The block of code below gets the census tract boundaries I want for DeKalb.
dek_tracts <- suppressMessages(
get_acs(geography = "tract", #you can use other geographies like 'county' or 'state' here
state = "GA",
county = "Dekalb",
variables = c(hhincome = "B19019_001", population = "B01003_001"),
year = 2019,
survey = "acs5", #This is American Community Survey 5-yr estimates
geometry = TRUE, #allows us to return sf objects
output = "wide"))
I like to name data frames as specifically as possible, which is why I used ‘dek_tracts’ instead of just ‘tracts’. It’s easier for me to avoid confusion when I’m repeating the analysis multiple times, especially if I’m using multiple geographies or looking at different locations.
#Let me check the output before moving on
message(sprintf("nrow: %s, ncol: %s", nrow(dek_tracts), ncol(dek_tracts)))
## nrow: 145, ncol: 7
dek_tracts %>% head() %>% knitr::kable()
| GEOID | NAME | hhincomeE | hhincomeM | populationE | populationM | geometry |
|---|---|---|---|---|---|---|
| 13089021213 | Census Tract 212.13, DeKalb County, Georgia | 154063 | 19674 | 3526 | 204 | MULTIPOLYGON (((-84.34783 3… |
| 13089023506 | Census Tract 235.06, DeKalb County, Georgia | 45924 | 13793 | 6465 | 927 | MULTIPOLYGON (((-84.25237 3… |
| 13089021305 | Census Tract 213.05, DeKalb County, Georgia | 55109 | 4607 | 4970 | 391 | MULTIPOLYGON (((-84.28811 3… |
| 13089023313 | Census Tract 233.13, DeKalb County, Georgia | 55143 | 5672 | 5294 | 576 | MULTIPOLYGON (((-84.14593 3… |
| 13089021604 | Census Tract 216.04, DeKalb County, Georgia | 159306 | 38073 | 3237 | 254 | MULTIPOLYGON (((-84.31051 3… |
| 13089021913 | Census Tract 219.13, DeKalb County, Georgia | 32983 | 3760 | 4450 | 559 | MULTIPOLYGON (((-84.1905 33… |
Next, I’m going to make it so I only keep the variables I want. The ‘E’ in this case stands for ‘Estimate’.
dek_tracts2 <- dek_tracts %>%
select(GEOID,
hhincome = hhincomeE, # New name = old name
population = populationE)
I want to visualize my data periodically to make sure everything looks okay. So let’s take a look at the tracts on a map.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(dek_tracts2) + tm_borders()
#The above block of code should show us all the census tracts for DeKalb.
In this block of code, we’re going to get the bounding box (bb) of our polygon, get the lat/long coordinates of one corner of that bounding box, and get the center of the bounding box. The distance between the corner and the center gives us a search radius.
get_r <- function(poly, epsg_id){
bb <- st_bbox(poly)
bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
bb_center_x <- (bb[3]+bb[1])/2
bb_center_y <- (bb[4]+bb[2])/2
bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
#Next, I'm going to get the distance between bb_corner and bb_center and multiply it by 1.2 to make the circle a bit bigger than the tract
r <- st_distance(bb_corner, bb_center)
bb_center$radius <- r*1.2
return(bb_center)
}
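To make the geometry inside get_r() concrete, here is a base-R sketch of the same arithmetic without sf. It is only an illustration under stated assumptions: the haversine formula on a sphere stands in for the geodesic distance that sf computes, `haversine_m()` and `search_circle()` are hypothetical helper names, and the bounding box is made up rather than taken from a real tract.

```r
# Great-circle distance in meters between two lon/lat points.
# Assumption: spherical Earth with mean radius 6,371,008.8 m; sf uses its own
# geodesic routines, so its results can differ slightly.
haversine_m <- function(lon1, lat1, lon2, lat2, earth_r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_r * asin(sqrt(a))
}

# bbox is c(xmin, ymin, xmax, ymax), the same order st_bbox() returns.
search_circle <- function(bbox, scale = 1.2) {
  center_x <- (bbox[[1]] + bbox[[3]]) / 2
  center_y <- (bbox[[2]] + bbox[[4]]) / 2
  corner_to_center <- haversine_m(bbox[[1]], bbox[[2]], center_x, center_y)
  list(x = center_x, y = center_y, radius = corner_to_center * scale)
}

# Made-up bounding box roughly in the Atlanta area (not a real tract).
circle <- search_circle(c(-84.35, 33.90, -84.33, 33.92))
```

Because the radius is the corner-to-center distance scaled up by 1.2, the circle always covers the whole bounding box, at the cost of overlapping with neighboring tracts.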
Now that I’ve written the function, let’s apply it to each of our polygons using a for loop. First, I need to create an empty list for our results to fill.
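epsg_id <- 4326 #NOTE: 4326 is WGS 84 (longitude/latitude in degrees); sf still reports st_distance() in meters for it. A projected CRS like 26967 also works for the distance, but the x/y we send to Yelp must be longitude/latitude.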
r4all_loop <- vector("list", nrow(dek_tracts2))
#for loop starts here
for (i in 1:nrow(dek_tracts2)){
r4all_loop[[i]] <- dek_tracts2 %>%
st_transform(crs = epsg_id) %>%
st_geometry() %>%
.[[i]] %>%
get_r(epsg_id = epsg_id)
}
r4all_loop <- bind_rows(r4all_loop)
#Now, I append my x/y coordinates
ready_4_yelp <- r4all_loop %>%
mutate(x = st_coordinates(.)[,1],
y = st_coordinates(.)[,2])
Let’s map it!
tmap_mode('view')
## tmap mode set to interactive viewing
#To check my data, I'll take the first 10 rows, draw a red buffer around the center of each tract's bounding box, and display the original polygons in blue
ready_4_yelp[1:10,] %>%
st_buffer(., dist = .$radius) %>%
tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
tm_shape(dek_tracts2[1:10,]) + tm_borders(col= 'blue')
I’ll start off by defining my function. It takes one row of tract information (a 1-row data frame) and a category name (a string). The output is a data frame of businesses or, if there are 1,000 or more results, the list holding only the first page.
get_yelp <- function(tract, category){
n <- 1
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50, # = 0 when n = 1
radius = round(tract$radius),
limit = 50)
required_n <- ceiling(resp$total/50)
out <- vector("list", required_n)
#'out' is where our results will be appended to.
# Store the business information to nth slot in out
out[[n]] <- resp$businesses
#Next, I name the first element with required_n (the total number of pages).
#That way, if we stop early because there are 1000 or more businesses, we still know how many pages there were.
names(out)[n] <- required_n
#print a message and return early if there are 1000 or more results
if (resp$total >= 1000)
{
#glue formats strings of text by replacing {...} with the value of the expression inside the braces.
print(glue::glue("This tract has >= 1000 businesses ({resp$total} total)."))
#Now, I need to stop before going into the loop because we need to break down Census Tract to something smaller.
return(out)
}
else
{
# add 1 to n
n <- n + 1
#here's where the while loop starts
while(n <= required_n){
resp <- business_search(api_key = Sys.getenv("yelp_api"),
categories = category,
latitude = tract$y,
longitude = tract$x,
offset = (n - 1) * 50,
radius = round(tract$radius),
limit = 50)
out[[n]] <- resp$businesses
n <- n + 1
} #<< this signifies the end of the while loop
#Finally, we merge all elements in the list into a single data frame
dek_out <- out %>% bind_rows()
return(dek_out)
}
}
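The paging logic in get_yelp() comes down to a little arithmetic: each request returns at most 50 businesses, so ceiling(total/50) requests are needed, with offsets 0, 50, 100, and so on. Here is a small base-R sketch of just that step. `page_offsets()` is a hypothetical helper for illustration, not a yelpr function.

```r
# Offsets needed to page through `total` results, `limit` per request.
page_offsets <- function(total, limit = 50) {
  n_pages <- ceiling(total / limit)
  # seq_len(0) is empty, so a total of 0 yields no offsets at all.
  (seq_len(n_pages) - 1) * limit
}

page_offsets(120)  # three requests: offsets 0, 50, 100
page_offsets(0)    # no requests needed
```

The first request in get_yelp() corresponds to offset 0, and the while loop covers the remaining offsets.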
First, let’s test things by applying the function for the first Census Tract.
yelp_first_tract_brunch <- get_yelp(ready_4_yelp[1,], "breakfast_brunch") %>%
as_tibble()
## No encoding supplied: defaulting to UTF-8.
yelp_first_tract_spa <- get_yelp(ready_4_yelp[1,], "medicalspa") %>%
as_tibble()
## No encoding supplied: defaulting to UTF-8.
#This gets the data for both business types I want to look at. Let's combine them.
yelp_first_tract <- bind_rows(yelp_first_tract_brunch, yelp_first_tract_spa)
#As always, let's check it!
yelp_first_tract %>% print
## # A tibble: 17 × 16
## id alias name image…¹ is_cl…² url revie…³ categ…⁴ rating coord…⁵
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl> <dbl>
## 1 Sh7BBAHsDkN… firs… Firs… "https… FALSE http… 268 <df> 3.5 33.9
## 2 C8SrEYsWjjG… j-ch… J Ch… "https… FALSE http… 88 <df> 3 33.9
## 3 iJ0DwsHhE75… bell… Bell… "https… FALSE http… 5 <df> 3.5 33.9
## 4 HW36mkQQdcX… park… Park… "https… FALSE http… 35 <df> 5 34.0
## 5 aaajyKLRtLL… atla… Atla… "https… FALSE http… 48 <df> 5 33.8
## 6 Cldc9nU5XRY… hydr… Hydr… "https… FALSE http… 63 <df> 4 33.8
## 7 W4_ucUE30B4… hydr… Hydr… "https… FALSE http… 37 <df> 4.5 33.8
## 8 Ns4Cu0YlZlv… b-ne… B Ne… "https… FALSE http… 10 <df> 4.5 33.8
## 9 pLaH_zlvbF1… lux-… LUX … "https… FALSE http… 16 <df> 4 33.9
## 10 5-OUTmlfQwB… hydr… Hydr… "https… FALSE http… 11 <df> 4 33.9
## 11 Sl34mZJVY7B… mirr… Mirr… "https… FALSE http… 1 <df> 5 33.9
## 12 WOcIAFJsOtG… adva… Adva… "https… FALSE http… 3 <df> 5 33.9
## 13 OfyqEX-Mb3j… body… Body… "https… FALSE http… 2 <df> 5 33.8
## 14 l5DR2U_9o4T… body… Body… "https… FALSE http… 4 <df> 5 33.6
## 15 1PKvJcXAkBv… the-… The … "https… FALSE http… 2 <df> 5 33.9
## 16 0rJVLBKHycm… bell… Bell… "" FALSE http… 3 <df> 2.5 34.0
## 17 WunHdCt2Rpr… 360-… 360 … "https… FALSE http… 1 <df> 1 33.8
## # … with 7 more variables: coordinates$longitude <dbl>, transactions <list>,
## # price <chr>, location <df[,8]>, phone <chr>, display_phone <chr>,
## # distance <dbl>, and abbreviated variable names ¹image_url, ²is_closed,
## # ³review_count, ⁴categories, ⁵coordinates$latitude
## # ℹ Use `colnames()` to see all variable names
First, I’ll prepare my collectors.
yelp_all_list <- vector("list", nrow(ready_4_yelp))
yelp_brunch_list <- vector("list", nrow(ready_4_yelp))
yelp_spa_list <- vector("list", nrow(ready_4_yelp))
Now I can write my loop. My advice is to really think about how to structure the loop when you’re working with multiple variables.
for (row in 1:nrow(ready_4_yelp)){
yelp_brunch <- suppressMessages(get_yelp(ready_4_yelp[row,], "breakfast_brunch"))
yelp_spa <- suppressMessages(get_yelp(ready_4_yelp[row,], "medicalspa"))
yelp_all_list[[row]] <- yelp_brunch %>% bind_rows(yelp_spa)
#reuse the results from above instead of calling the API a second time for each category
yelp_brunch_list[[row]] <- yelp_brunch
yelp_spa_list[[row]] <- yelp_spa
if (row %% 10 == 0){
print(paste0("Current row: ", row))
}
}
## Warning: Outer names are only allowed for unnamed scalar atomic inputs
## [1] "Current row: 10"
## [1] "Current row: 20"
## [1] "Current row: 30"
## [1] "Current row: 40"
## [1] "Current row: 50"
## [1] "Current row: 60"
## [1] "Current row: 70"
## [1] "Current row: 80"
## [1] "Current row: 90"
## [1] "Current row: 100"
## [1] "Current row: 110"
## [1] "Current row: 120"
## [1] "Current row: 130"
## [1] "Current row: 140"
#Finally, we can collapse everything into a single df
yelp_all_dek <- yelp_all_list %>% bind_rows() %>% as_tibble()
yelp_brunch_df <- yelp_brunch_list %>% bind_rows() %>% as_tibble()
yelp_spa_df <- yelp_spa_list %>% bind_rows() %>% as_tibble()
#let's take a look
yelp_all_dek %>% print(width=1000)
## # A tibble: 1,529 × 16
## id alias
## <chr> <chr>
## 1 Sh7BBAHsDkNKIv91xLetgg first-watch-dunwoody-3
## 2 C8SrEYsWjjGYfeF41g9wjw j-christophers-atlanta-3
## 3 iJ0DwsHhE75_MwjZI54_Sg bellas-g-kitchen-sandy-spring
## 4 HW36mkQQdcXEqZn6zlgRjA park-ave-cosmetic-center-roswell-3
## 5 aaajyKLRtLL_MjzLjRH-sA atlanta-medical-aesthetics-atlanta
## 6 Cldc9nU5XRYehC3-ByS-_Q hydra-buckhead-atlanta-2
## 7 W4_ucUE30B4mtGpBYAFldg hydra-virginia-highlands-atlanta-5
## 8 Ns4Cu0YlZlvGfpYTzO5QAg b-new-beauty-studios-atlanta
## 9 pLaH_zlvbF1hGRB8pIQS_w lux-med-spa-atlanta
## 10 5-OUTmlfQwBhfohoh9-MOA hydra-sandy-springs-sandy-springs-2
## name
## <chr>
## 1 First Watch
## 2 J Christopher's
## 3 Bella's G. Kitchen
## 4 Park Ave Cosmetic Center
## 5 Atlanta Medical Aesthetics
## 6 Hydra+ Buckhead
## 7 Hydra+ Virginia Highlands
## 8 B New Beauty Studio
## 9 LUX Med Spa
## 10 Hydra+ Sandy Springs
## image_url
## <chr>
## 1 https://s3-media4.fl.yelpcdn.com/bphoto/S8_zbrjLpaStDXJ_rlTMWA/o.jpg
## 2 https://s3-media2.fl.yelpcdn.com/bphoto/Qx_bVCh68BPassOe-EDRQw/o.jpg
## 3 https://s3-media2.fl.yelpcdn.com/bphoto/Tb8Kg_uDjwcFbCOM8MVLfg/o.jpg
## 4 https://s3-media4.fl.yelpcdn.com/bphoto/CRpFyrLc1jt3UNMMESH_AQ/o.jpg
## 5 https://s3-media4.fl.yelpcdn.com/bphoto/ya_3PALu7D7kJMYFIOYuDg/o.jpg
## 6 https://s3-media1.fl.yelpcdn.com/bphoto/2x3IGy4wCQPnhxe4VVzaeQ/o.jpg
## 7 https://s3-media4.fl.yelpcdn.com/bphoto/Wkv45cHYZYSTVWguGctIDQ/o.jpg
## 8 https://s3-media1.fl.yelpcdn.com/bphoto/9QgXi2PdnpXBw7m8q5xO_A/o.jpg
## 9 https://s3-media1.fl.yelpcdn.com/bphoto/fW7Dwiv1xbfSedpdhscaFQ/o.jpg
## 10 https://s3-media3.fl.yelpcdn.com/bphoto/6nFtxxpwtnU4v_Vic2Z-gg/o.jpg
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/first-watch-dunwoody-3?adjust_creative=D_azJCkzTpdR…
## 2 https://www.yelp.com/biz/j-christophers-atlanta-3?adjust_creative=D_azJCkzTp…
## 3 https://www.yelp.com/biz/bellas-g-kitchen-sandy-spring?adjust_creative=D_azJ…
## 4 https://www.yelp.com/biz/park-ave-cosmetic-center-roswell-3?adjust_creative=…
## 5 https://www.yelp.com/biz/atlanta-medical-aesthetics-atlanta?adjust_creative=…
## 6 https://www.yelp.com/biz/hydra-buckhead-atlanta-2?adjust_creative=D_azJCkzTp…
## 7 https://www.yelp.com/biz/hydra-virginia-highlands-atlanta-5?adjust_creative=…
## 8 https://www.yelp.com/biz/b-new-beauty-studios-atlanta?adjust_creative=D_azJC…
## 9 https://www.yelp.com/biz/lux-med-spa-atlanta?adjust_creative=D_azJCkzTpdR6H0…
## 10 https://www.yelp.com/biz/hydra-sandy-springs-sandy-springs-2?adjust_creative…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 268 <df [3 × 2]> 3.5 33.9 -84.3 <chr [1]>
## 2 88 <df [3 × 2]> 3 33.9 -84.3 <chr [1]>
## 3 5 <df [1 × 2]> 3.5 33.9 -84.4 <chr [1]>
## 4 35 <df [3 × 2]> 5 34.0 -84.3 <list [0]>
## 5 48 <df [2 × 2]> 5 33.8 -84.4 <list [0]>
## 6 63 <df [3 × 2]> 4 33.8 -84.4 <list [0]>
## 7 37 <df [2 × 2]> 4.5 33.8 -84.4 <list [0]>
## 8 10 <df [3 × 2]> 4.5 33.8 -84.4 <list [0]>
## 9 16 <df [3 × 2]> 4 33.9 -84.4 <list [0]>
## 10 11 <df [3 × 2]> 4 33.9 -84.4 <list [0]>
## price location$address1 $address2 $address3 $city $zip_code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 $$ 1317 Dunwoody Village Pkwy "Ste 101" "" Dunwoody 30338
## 2 $$ 5482 Chamblee Dunwoody Rd "" "" Atlanta 30338
## 3 <NA> 6600 Peachtree Dunwoody Rd "" "" Sandy Spring 30328
## 4 $$$ 633 Holcomb Bridge Rd "Ste A" "" Roswell 30076
## 5 <NA> 77 12th St NE "Loft 6" <NA> Atlanta 30309
## 6 $$ 2221 Peachtree Rd NE "Ste Q" "" Atlanta 30309
## 7 <NA> 675 N Highland Ave NE "Ste 4000" <NA> Atlanta 30306
## 8 <NA> 1465 Howell Mill Rd 200A "Ste 200A" <NA> Atlanta 30318
## 9 $$$ 4684 Roswell Rd NE "Ste A1" "" Atlanta 30342
## 10 <NA> 6400 Blue Stone Rd "Ste 120" "" Sandy Springs 30328
## $country $state $display_address phone display_phone distance
## <chr> <chr> <list> <chr> <chr> <dbl>
## 1 US GA <chr [3]> +16784433447 (678) 443-3447 706.
## 2 US GA <chr [2]> +17703951642 (770) 395-1642 831.
## 3 US GA <chr [2]> +14043887873 (404) 388-7873 2400.
## 4 US GA <chr [3]> +17702991493 (770) 299-1493 8856.
## 5 US GA <chr [3]> +17706535173 (770) 653-5173 19301.
## 6 US GA <chr [3]> +14049486780 (404) 948-6780 16179.
## 7 US GA <chr [3]> +14046206915 (404) 620-6915 20180.
## 8 US GA <chr [3]> +16788206955 (678) 820-6955 18947.
## 9 US GA <chr [3]> +14043679005 (404) 367-9005 8705.
## 10 US GA <chr [3]> +14049961135 (404) 996-1135 4989.
## # … with 1,519 more rows
## # ℹ Use `print(n = ...)` to see more rows
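One thing worth checking before counting businesses: the search circles for neighboring tracts overlap, so the same business can be returned for more than one tract. Whether the 1,529 rows above contain repeats is not checked in this write-up. The sketch below shows one way to look, using a toy data frame as a stand-in for yelp_all_dek. The `id` column is Yelp's business id from the output above; the toy values are made up.

```r
# Toy stand-in for yelp_all_dek: the same business id returned by two tracts.
toy <- data.frame(
  id = c("a1", "b2", "a1", "c3"),
  name = c("Cafe A", "Spa B", "Cafe A", "Cafe C"),
  stringsAsFactors = FALSE
)

# Keep the first occurrence of each Yelp business id.
toy_unique <- toy[!duplicated(toy$id), ]

nrow(toy)         # rows before removing repeats
nrow(toy_unique)  # rows after removing repeats
```

With dplyr already loaded, `distinct(id, .keep_all = TRUE)` does the same thing inside a pipe.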
First, I’ll extract coordinates.
dek_yelp_sf <- yelp_all_dek %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$latitude) %>%
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
Now to visualize it using tmap.
tm_shape(dek_yelp_sf) +
tm_dots(col = "review_count", style="quantile")
Q1. What’s the county and state of your choice? DeKalb County, GA
Q2. How many businesses are there in total? 1,529 total businesses.
Q3. How many businesses are there for each business category? There were 734 places that serve brunch and 795 medical spas.
Q4. Upon visual inspection, can you see any noticeable spatial patterns to the way they are distributed across the county (e.g., clustering of businesses at some parts of the county)? The clustering seems to be in and around what I think are the more affluent parts of the county. I don’t have a shapefile to confirm it, but this looks to especially be the case in and around Commission District 2 and incorporated areas, particularly near the City of Atlanta and to the north. Historically, the move towards incorporation in the north of the county has been driven by persistent NIMBYism among more affluent, predominantly white residents.
Q5. (Optional) Are there any other interesting findings? It’s not exactly a finding, but I think it’d be interesting to dig into demographics more. I’d like to look at things like housing stock characteristics, housing tenure, commute characteristics, etc. to see if I can figure out whether these are places where the built environment is more walkable or more car-centered (it’s Atlanta, so I’m guessing the latter haha). Moreover, pulling crime statistics and looking at DUIs might be interesting. A lot of brunch places do bottomless cocktails and many spas offer their patrons wine. If it’s a car-centric environment, then there might be instances where patrons are unwittingly over the limit and get behind the wheel.