Scenario: For this assignment, I’m a bicycle enthusiast who wants to set up a bike rental store somewhere within the City of Atlanta.
The question is where this store should be located. The locations of all bike rental stores are available from the Yelp API (categories: bikerentals). Census data showing how many commuters travel by bike could also help, since it can serve as a proxy for how bike-friendly a community and its environment are. The goal is to find places where there is a gap between bike stores (rentals in this case, though hopefully these also offer other bike-related services) and tracts with bike commuters.
I load the libraries and set up API access here, but the code is hidden so that my API keys are not revealed.
Monitoring my Yelp API limit.
Loading the data that I saved locally, so I don’t need to pull from the API.
# Used previously to save the data locally
#save(yelp_bikes, file = 'bike_rental_all.RData')
#To load the data set locally, I would run the code below at the beginning.
#load('bike_rental_all.RData')
I don’t like navigating the census website, so I load the variable table here to look up variable codes. That code is not needed for the analysis, so it is not shown.
Downloading the census tracts for Fulton and DeKalb counties. I requested variables that could be related to commuting, such as how far and how long people travel to work.
# Rate limit??
tract <- tidycensus::get_acs(geography = "tract",
                             state = "GA",
                             county = c("Fulton", "Dekalb"),
                             variables = c(proximity2work = "B08131_001",
                                           taveltimework = "B08303_001",
                                           modebytraveltime = "B08134_001",
                                           #transpo2work = "S0804_001",
                                           #traveltimebysex = "B08012_001",
                                           #traveltimebymode = "C08136_001"),
                                           #medincome12mo = B19013_001,
                                           workfromhome = "B99084_001"),
                             year = 2019,
                             survey = "acs5",
                             geometry = TRUE, # returns sf objects
                             output = "wide")
## Getting data from the 2015-2019 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
atlanta <- places('GA') %>%
filter(NAME == 'Atlanta') ##Dekalb county stretches into two cities
## Retrieving data for the year 2021
tract <- tract[atlanta,]
## View acs data
tract
## Simple feature collection with 161 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -84.63265 ymin: 33.61085 xmax: -84.25199 ymax: 33.91586
## Geodetic CRS: NAD83
## First 10 features:
## GEOID NAME proximity2workE
## 1 13121001100 Census Tract 11, Fulton County, Georgia 80000
## 2 13121009603 Census Tract 96.03, Fulton County, Georgia 61565
## 3 13121005800 Census Tract 58, Fulton County, Georgia 22750
## 5 13121009502 Census Tract 95.02, Fulton County, Georgia 48500
## 9 13121004900 Census Tract 49, Fulton County, Georgia 36590
## 10 13121004800 Census Tract 48, Fulton County, Georgia 8680
## 11 13121006300 Census Tract 63, Fulton County, Georgia 23780
## 13 13121005501 Census Tract 55.01, Fulton County, Georgia 33590
## 14 13121004200 Census Tract 42, Fulton County, Georgia NA
## 20 13121004400 Census Tract 44, Fulton County, Georgia 22565
## proximity2workM taveltimeworkE taveltimeworkM modebytraveltimeE
## 1 10671 3517 401 3517
## 2 8583 2990 377 2990
## 3 6089 724 157 724
## 5 14640 1926 373 1926
## 9 4967 1388 204 1388
## 10 2693 402 106 402
## 11 6106 671 119 671
## 13 6036 1151 205 1151
## 14 NA 900 187 900
## 20 8720 912 289 912
## modebytraveltimeM workfromhomeE geometry
## 1 401 3972 MULTIPOLYGON (((-84.38782 3...
## 2 377 3293 MULTIPOLYGON (((-84.38738 3...
## 3 157 736 MULTIPOLYGON (((-84.41692 3...
## 5 373 2087 MULTIPOLYGON (((-84.39472 3...
## 9 204 1665 MULTIPOLYGON (((-84.38779 3...
## 10 106 402 MULTIPOLYGON (((-84.38771 3...
## 11 119 691 MULTIPOLYGON (((-84.40797 3...
## 13 205 1257 MULTIPOLYGON (((-84.38795 3...
## 14 187 1002 MULTIPOLYGON (((-84.42334 3...
## 20 289 962 MULTIPOLYGON (((-84.40716 3...
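Before querying Yelp, compute a search radius for each tract polygon: the distance from the center of the tract’s bounding box to one of its corners, padded so the circle covers the whole tract.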
# Function: Get tract-wise radius
get_r <- function(poly, epsg_id){
  #---------------------
  # Takes: a single POLYGON or LINESTRING
  # Outputs: distance between the center of the bounding box and a corner of the bounding box
  #---------------------
  # Get bounding box of a given polygon
  bb <- st_bbox(poly)
  # Get lat & long coordinates of any one corner of the bounding box.
  bb_corner <- st_point(c(bb[1], bb[2])) %>% st_sfc(crs = epsg_id)
  # Get the center of the bounding box
  bb_center_x <- (bb[3]+bb[1])/2
  bb_center_y <- (bb[4]+bb[2])/2
  bb_center <- st_point(c(bb_center_x, bb_center_y)) %>% st_sfc(crs = epsg_id) %>% st_sf()
  # Get the distance between bb_corner and bb_center
  r <- st_distance(bb_corner, bb_center)
  # Multiply by 1.2 to make the circle a bit larger than the Census Tract.
  # See the Yelp explanation of their radius parameter to see why we do this.
  bb_center$radius <- r*1.2
  return(bb_center)
}
## Using a loop -----------------------------------------------------------------
# Creating an empty list.
# Results will fill this list
epsg_id <- 4326
r4all_loop <- vector("list", nrow(tract))
# Starting a for-loop
for (i in seq_len(nrow(tract))){
  r4all_loop[[i]] <- tract %>%
    st_transform(crs = epsg_id) %>%
    st_geometry() %>%
    .[[i]] %>%
    get_r(epsg_id = epsg_id)
}
r4all_loop <- bind_rows(r4all_loop)
# Using a functional -----------------------------------------------------------
# We use a functional (lapply) to apply this custom function to each Census Tract.
r4all_apply <- tract %>%
  st_geometry() %>%
  st_transform(crs = epsg_id) %>%
  lapply(., function(x) get_r(x, epsg_id = epsg_id))
r4all_apply <- bind_rows(r4all_apply)
# Are these two identical?
identical(r4all_apply, r4all_loop) ## checking because we used two functions that do the same thing
## [1] TRUE
# Appending X Y coordinates as separate columns
ready_4_yelp <- r4all_apply %>%
  mutate(x = st_coordinates(.)[,1],
         y = st_coordinates(.)[,2])
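To make the radius calculation in get_r concrete, here is a minimal base R sketch of the same arithmetic on a made-up bounding box. The helper name bbox_radius and the box values are hypothetical, and the sketch assumes planar coordinates in meters; get_r itself works on longitude/latitude and leaves the distance calculation to st_distance.

```r
# Hypothetical helper: half-diagonal of a bounding box, scaled by a padding factor.
# Assumes planar (projected) coordinates, unlike get_r, which uses lon/lat.
bbox_radius <- function(bb, padding = 1.2) {
  center <- c((bb[["xmin"]] + bb[["xmax"]]) / 2,
              (bb[["ymin"]] + bb[["ymax"]]) / 2)
  corner <- c(bb[["xmin"]], bb[["ymin"]])
  # Euclidean distance from the center to one corner
  half_diagonal <- sqrt(sum((center - corner)^2))
  half_diagonal * padding
}

# Made-up 3 km x 4 km box: the half-diagonal is 2500 m before padding
toy_bb <- c(xmin = 0, ymin = 0, xmax = 3000, ymax = 4000)
toy_radius <- bbox_radius(toy_bb)
print(toy_radius)
```

The padding factor trades coverage for overlap: a larger circle is less likely to miss a store near the tract edge, but neighboring searches return more of the same stores.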
Census tracts that are inside or intersecting the City of Atlanta boundary.
tmap_mode('view')
## tmap mode set to interactive viewing
# Select the first 36 rows
ready_4_yelp[1:36,] %>%
  # Draw a buffer centered at the center of each tract's bounding box.
  # Radius of the buffer is the radius we just calculated
  st_buffer(., dist = .$radius) %>%
  # Display this buffer in red
  tm_shape(.) + tm_polygons(alpha = 0.5, col = 'red') +
  # Display the original polygon in blue
  tm_shape(tract[1:36,]) + tm_borders(col= 'blue')
Download Yelp data on categories = bikerentals for the City of Atlanta.
The function below queries Yelp around one census tract at a time for the bikerentals business category.
# FUNCTION
get_yelp <- function(tract, category){
  # ----------------------------------
  # Gets one row of tract information (1,) and category name (str),
  # Outputs a data frame of businesses (or a list if the total is >= 1000)
  Sys.sleep(1)
  n <- 1
  # First request --------------------------------------------------------------
  resp <- business_search(api_key = Sys.getenv("yelp_api"),
                          categories = category,
                          latitude = tract$y,
                          longitude = tract$x,
                          offset = (n - 1) * 50, # = 0 when n = 1
                          radius = round(tract$radius),
                          limit = 50)
  # Calculate how many requests are needed in total
  required_n <- ceiling(resp$total/50)
  # out is where the results will be appended to.
  out <- vector("list", required_n)
  # Store the business information to nth slot in out
  out[[n]] <- resp$businesses
  # Change the name of the elements to the total required_n
  # This is so that, if there are more than 1000 businesses,
  # we know how many.
  names(out)[n] <- required_n
  # Print a message and return early if there are 1000 or more
  if (resp$total >= 1000)
  {
    # glue formats string by inserting {n} with what's currently stored in object n.
    print(glue::glue("{n}th row has >= 1000 businesses."))
    # Stop before going into the loop because we need to
    # break down Census Tract to something smaller.
    return(out)
  }
  else
  {
    # add 1 to n
    n <- n + 1
    # Now we know required_n -----------------------------------------------------
    # Starting a loop
    while(n <= required_n){
      resp <- business_search(api_key = Sys.getenv("yelp_api"),
                              categories = category,
                              latitude = tract$y,
                              longitude = tract$x,
                              offset = (n - 1) * 50,
                              radius = round(tract$radius),
                              limit = 50)
      out[[n]] <- resp$businesses
      n <- n + 1
    } #<< end of while loop
    # Merge all elements in the list into a single data frame
    out <- out %>% bind_rows()
    return(out)
  }
}
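The paging arithmetic in get_yelp can be checked on its own. The sketch below uses a hypothetical helper, page_offsets, that is not part of the original code; it only reproduces the `ceiling(total / 50)` and `(n - 1) * 50` calculations so the offsets for a given total can be inspected without calling the API.

```r
# Hypothetical helper mirroring the paging arithmetic in get_yelp.
page_offsets <- function(total, limit = 50) {
  required_n <- ceiling(total / limit)   # number of requests needed
  (seq_len(required_n) - 1) * limit      # offset for request n is (n - 1) * limit
}

offsets_120 <- page_offsets(120)  # 120 results need three requests
offsets_0 <- page_offsets(0)      # no results means no offsets
print(offsets_120)
```

Note that get_yelp always sends the first request, because the total is only known from that first response.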
Collect the Yelp results for each tract in a pre-allocated list.
# Prepare a collector
yelp_bike_list <- vector("list", nrow(ready_4_yelp))
# Looping through all Census Tracts
for (row in 1:nrow(ready_4_yelp)){
  yelp_bike_list[[row]] <- suppressMessages(get_yelp(ready_4_yelp[row,], "bikerentals"))
  if (row %% 36 == 0){
    print(paste0("Current row: ", row))
  }
}
## [1] "Current row: 36"
## [1] "Current row: 72"
## [1] "Current row: 108"
## [1] "Current row: 144"
# Collapsing the list into a data frame
yelp_bikes <- yelp_bike_list %>% bind_rows() %>% as_tibble() %>%
mutate(business_type = "bikerentals")
# print
yelp_bikes %>% print(width=1000)
## # A tibble: 91 × 17
## id alias
## <chr> <chr>
## 1 JkkHRgYj0mvdgbMXFm436w civil-bikes-atlanta
## 2 FK7-M9BGyCgpEmVifcPfoA aztec-cycles-stone-mountain
## 3 UmftRC3h0h_owHEm5ZLp7Q jump-atlanta-2
## 4 FK7-M9BGyCgpEmVifcPfoA aztec-cycles-stone-mountain
## 5 JkkHRgYj0mvdgbMXFm436w civil-bikes-atlanta
## 6 b3nacMG8PR77GNCaI4RBKA atlanta-bicycle-barn-atlanta
## 7 tMNV5bj4rqud0cRRQiPbWA outback-bikes-atlanta
## 8 8PfRbXo6qhKliGDCp5l79g atlanta-pro-bikes-atlanta
## 9 FK7-M9BGyCgpEmVifcPfoA aztec-cycles-stone-mountain
## 10 rbf8bVY0cuqyGZtbn691lg pedego-electric-bikes-atlanta-atlanta-2
## name
## <chr>
## 1 Civil Bikes
## 2 Aztec Cycles
## 3 JUMP
## 4 Aztec Cycles
## 5 Civil Bikes
## 6 Atlanta Bicycle Barn
## 7 Outback Bikes
## 8 Atlanta Pro Bikes
## 9 Aztec Cycles
## 10 Pedego Electric Bikes Atlanta
## image_url
## <chr>
## 1 https://s3-media4.fl.yelpcdn.com/bphoto/JqTLT-chrqtbyuoB-52gdw/o.jpg
## 2 https://s3-media3.fl.yelpcdn.com/bphoto/-UkKevNihxQhIgFi_zR2iw/o.jpg
## 3 https://s3-media2.fl.yelpcdn.com/bphoto/D87H00XdLWZJS-LvQkTalA/o.jpg
## 4 https://s3-media3.fl.yelpcdn.com/bphoto/-UkKevNihxQhIgFi_zR2iw/o.jpg
## 5 https://s3-media4.fl.yelpcdn.com/bphoto/JqTLT-chrqtbyuoB-52gdw/o.jpg
## 6 https://s3-media3.fl.yelpcdn.com/bphoto/Ik2pMce41_MRcg3svjTbSQ/o.jpg
## 7 https://s3-media4.fl.yelpcdn.com/bphoto/rnpOTcs3WwjTq1JsfV1b8w/o.jpg
## 8 https://s3-media1.fl.yelpcdn.com/bphoto/1s0fhhJvN3z-SUPizRy5sw/o.jpg
## 9 https://s3-media3.fl.yelpcdn.com/bphoto/-UkKevNihxQhIgFi_zR2iw/o.jpg
## 10 https://s3-media2.fl.yelpcdn.com/bphoto/Z7KlxC0vcyoxOCSxbH3Rrg/o.jpg
## is_closed
## <lgl>
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## 7 FALSE
## 8 FALSE
## 9 FALSE
## 10 FALSE
## url
## <chr>
## 1 https://www.yelp.com/biz/civil-bikes-atlanta?adjust_creative=VuthNv6lmGi4hXZ…
## 2 https://www.yelp.com/biz/aztec-cycles-stone-mountain?adjust_creative=VuthNv6…
## 3 https://www.yelp.com/biz/jump-atlanta-2?adjust_creative=VuthNv6lmGi4hXZmh35I…
## 4 https://www.yelp.com/biz/aztec-cycles-stone-mountain?adjust_creative=VuthNv6…
## 5 https://www.yelp.com/biz/civil-bikes-atlanta?adjust_creative=VuthNv6lmGi4hXZ…
## 6 https://www.yelp.com/biz/atlanta-bicycle-barn-atlanta?adjust_creative=VuthNv…
## 7 https://www.yelp.com/biz/outback-bikes-atlanta?adjust_creative=VuthNv6lmGi4h…
## 8 https://www.yelp.com/biz/atlanta-pro-bikes-atlanta?adjust_creative=VuthNv6lm…
## 9 https://www.yelp.com/biz/aztec-cycles-stone-mountain?adjust_creative=VuthNv6…
## 10 https://www.yelp.com/biz/pedego-electric-bikes-atlanta-atlanta-2?adjust_crea…
## review_count categories rating coordinates$latitude $longitude transactions
## <int> <list> <dbl> <dbl> <dbl> <list>
## 1 11 <df [2 × 2]> 4.5 33.7 -84.4 <list [0]>
## 2 62 <df [3 × 2]> 5 33.8 -84.2 <list [0]>
## 3 1 <df [2 × 2]> 1 33.7 -84.4 <list [0]>
## 4 62 <df [3 × 2]> 5 33.8 -84.2 <list [0]>
## 5 11 <df [2 × 2]> 4.5 33.7 -84.4 <list [0]>
## 6 128 <df [3 × 2]> 4.5 33.8 -84.4 <list [0]>
## 7 103 <df [3 × 2]> 4 33.8 -84.3 <list [0]>
## 8 42 <df [3 × 2]> 4.5 33.8 -84.4 <list [0]>
## 9 62 <df [3 × 2]> 5 33.8 -84.2 <list [0]>
## 10 15 <df [3 × 2]> 4.5 33.7 -84.4 <list [0]>
## location$address1 $address2 $address3 $city $zip_code
## <chr> <chr> <chr> <chr> <chr>
## 1 "" <NA> "" Atlanta 30312
## 2 "901 Main St" "" "" Stone Mountain 30083
## 3 "" "" <NA> Atlanta 30301
## 4 "901 Main St" "" "" Stone Mountain 30083
## 5 "" <NA> "" Atlanta 30312
## 6 "151 Sampson St NE" "" "" Atlanta 30312
## 7 "484 Moreland Ave NE" "Ste E" "" Atlanta 30307
## 8 "1039 N Highland Ave NE" <NA> "" Atlanta 30306
## 9 "901 Main St" "" "" Stone Mountain 30083
## 10 "414 Bill Kennedy Way" "Ste 101" <NA> Atlanta 30316
## $country $state $display_address phone display_phone distance price
## <chr> <chr> <list> <chr> <chr> <dbl> <chr>
## 1 US GA <chr [1]> +14043238754 (404) 323-8754 366. <NA>
## 2 US GA <chr [2]> +16786369043 (678) 636-9043 20719. $$
## 3 US GA <chr [1]> +18333006106 (833) 300-6106 1238. <NA>
## 4 US GA <chr [2]> +16786369043 (678) 636-9043 20881. $$
## 5 US GA <chr [1]> +14043238754 (404) 323-8754 527. <NA>
## 6 US GA <chr [2]> +17708732413 (770) 873-2413 1192. $$
## 7 US GA <chr [3]> +14046884878 (404) 688-4878 675. $$
## 8 US GA <chr [2]> +14042541230 (404) 254-1230 689. <NA>
## 9 US GA <chr [2]> +16786369043 (678) 636-9043 29677. $$
## 10 US GA <chr [3]> +14049753915 (404) 975-3915 1453. <NA>
## business_type
## <chr>
## 1 bikerentals
## 2 bikerentals
## 3 bikerentals
## 4 bikerentals
## 5 bikerentals
## 6 bikerentals
## 7 bikerentals
## 8 bikerentals
## 9 bikerentals
## 10 bikerentals
## # ℹ 81 more rows
The loop returned 91 rows for bikerentals. The same store can appear more than once because it falls within the search radius of several tracts.
# Extract coordinates
yelp_sf <- yelp_bikes %>%
mutate(x = .$coordinates$longitude,
y = .$coordinates$latitude) %>%
mutate(x = if_else(x > 0, 0 - x, x)) %>% ## To list longitude correctly
filter(!is.na(x) & !is.na(y)) %>%
st_as_sf(coords = c("x", "y"), crs = 4326)
# Map
tm_shape(yelp_sf) +
tm_dots(col = "name")
After pulling from Yelp the first time, I save the results locally so that I don’t hit my Yelp API limit and don’t need to call the API on every run.
The data is already saved, so the save call is commented out. Uncommenting the load call below reads the data back in.
# Used previously to save the data locally
#save(yelp_bikes, yelp_bike_list, yelp_sf, ready_4_yelp, r4all_loop, r4all_apply, atlanta, tract, variables, file = 'bike_rental_all.RData')
##To load the data set locally, I would run the code below at the beginning.
##load('bike_rental_all.RData')
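Tidying the data for the map.
Check for duplicated rows in yelp_bikes, then look at the data with duplicates removed. distinct() compares entire rows, and the distance column differs from one tract search to the next, so it still returns all 91 rows; comparing on id alone leaves 10 unique stores. Neither result is assigned, so yelp_bikes itself is unchanged.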
yelp_bikes %>% distinct(.) # check for duplicates
## # A tibble: 91 × 17
## id alias name image_url is_closed url review_count categories rating
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl>
## 1 JkkHRgY… civi… Civi… https://… FALSE http… 11 <df> 4.5
## 2 FK7-M9B… azte… Azte… https://… FALSE http… 62 <df> 5
## 3 UmftRC3… jump… JUMP https://… FALSE http… 1 <df> 1
## 4 FK7-M9B… azte… Azte… https://… FALSE http… 62 <df> 5
## 5 JkkHRgY… civi… Civi… https://… FALSE http… 11 <df> 4.5
## 6 b3nacMG… atla… Atla… https://… FALSE http… 128 <df> 4.5
## 7 tMNV5bj… outb… Outb… https://… FALSE http… 103 <df> 4
## 8 8PfRbXo… atla… Atla… https://… FALSE http… 42 <df> 4.5
## 9 FK7-M9B… azte… Azte… https://… FALSE http… 62 <df> 5
## 10 rbf8bVY… pede… Pede… https://… FALSE http… 15 <df> 4.5
## # ℹ 81 more rows
## # ℹ 8 more variables: coordinates <df[,2]>, transactions <list>,
## # location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>,
## # price <chr>, business_type <chr>
yelp_bikes[!duplicated(yelp_bikes$id),]
## # A tibble: 10 × 17
## id alias name image_url is_closed url review_count categories rating
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list> <dbl>
## 1 JkkHRgY… civi… Civi… https://… FALSE http… 11 <df> 4.5
## 2 FK7-M9B… azte… Azte… https://… FALSE http… 62 <df> 5
## 3 UmftRC3… jump… JUMP https://… FALSE http… 1 <df> 1
## 4 b3nacMG… atla… Atla… https://… FALSE http… 128 <df> 4.5
## 5 tMNV5bj… outb… Outb… https://… FALSE http… 103 <df> 4
## 6 8PfRbXo… atla… Atla… https://… FALSE http… 42 <df> 4.5
## 7 rbf8bVY… pede… Pede… https://… FALSE http… 15 <df> 4.5
## 8 BozJwfo… podi… Podi… https://… FALSE http… 20 <df> 4.5
## 9 gcl6O-Z… clou… Clou… https://… FALSE http… 1 <df> 1
## 10 vfp82FZ… rela… Rela… https://… FALSE http… 12 <df> 2
## # ℹ 8 more variables: coordinates <df[,2]>, transactions <list>,
## # location <df[,8]>, phone <chr>, display_phone <chr>, distance <dbl>,
## # price <chr>, business_type <chr>
#delete_duplicates <- df[!duplicated(df$location), ]
#dupl_df[!duplicated(dupl_df$location),] # remove
#print(df_no_duplicates)
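The difference between the two checks above can be reproduced on a small made-up table. This is a sketch under stated assumptions: the toy values are not from Yelp, and base R `unique()` stands in for `distinct()` as a whole-row comparison.

```r
# Toy data, not from Yelp: the same store returned by two different tract searches
toy <- data.frame(
  id = c("store_a", "store_b", "store_a"),
  distance = c(366, 20719, 527),
  stringsAsFactors = FALSE
)

# Whole-row comparison keeps all three rows because `distance` differs
n_whole_row <- nrow(unique(toy))

# Comparing on `id` alone keeps the first row for each store
toy_unique <- toy[!duplicated(toy$id), ]
n_by_id <- nrow(toy_unique)

print(c(whole_row = n_whole_row, by_id = n_by_id))
```

To keep the de-duplicated table for later steps, the result would need to be assigned to a name, as toy_unique is here.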
My hypothesis is that unnesting location will produce a lot of separate columns.
yelp_flat <- yelp_bikes %>% unnest_wider(categories, names_sep = "_") %>%
unnest_wider(coordinates, names_sep = "_") %>% # use _ separator to replace the $ in original data set
unnest_wider(location, names_sep = "_") #%>%
# unnest_wider(categories, names_sep = "_") # not important for this
New yelp_flat contains 26 variables, while
yelp_bikes only had 17 variables. Using yelp_flat from here
forward.
Deleting rows that have NAs in the columns I might want to examine.
yelp_flat %>%
filter(!is.na(coordinates_latitude)) %>%
filter(!is.na(coordinates_longitude)) %>%
filter(!is.na(price))
## # A tibble: 61 × 26
## id alias name image_url is_closed url review_count categories_alias
## <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list<chr>>
## 1 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 2 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 3 b3nacMG8… atla… Atla… https://… FALSE http… 128 [3]
## 4 tMNV5bj4… outb… Outb… https://… FALSE http… 103 [3]
## 5 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 6 b3nacMG8… atla… Atla… https://… FALSE http… 128 [3]
## 7 BozJwfoX… podi… Podi… https://… FALSE http… 20 [2]
## 8 tMNV5bj4… outb… Outb… https://… FALSE http… 103 [3]
## 9 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 10 tMNV5bj4… outb… Outb… https://… FALSE http… 103 [3]
## # ℹ 51 more rows
## # ℹ 18 more variables: categories_title <list<chr>>, rating <dbl>,
## # coordinates_latitude <dbl>, coordinates_longitude <dbl>,
## # transactions <list>, location_address1 <chr>, location_address2 <chr>,
## # location_address3 <chr>, location_city <chr>, location_zip_code <chr>,
## # location_country <chr>, location_state <chr>,
## # location_display_address <list<list>>, phone <chr>, display_phone <chr>, …
tract2 <- tract %>%
filter(!is.na(proximity2workE)) %>%
filter(!is.na(proximity2workM)) %>%
filter(!is.na(workfromhomeE)) %>%
filter(!is.na(modebytraveltimeE)) %>%
filter(!is.na(modebytraveltimeM))
# # need to get polygon data here, choose a different variable
# tract <- tidycensus::get_acs(geography = "tract",
# state = "GA",
# county = "Dekalb",
# variables = c(population = "B01003_001",
# medianincome = "B19013_001"),
# year = 2019,
# survey = "acs5",
# geometry = TRUE, # returns sf objects
# output = "wide")
#
# atlanta <- places('GA') %>%
# filter(NAME %in% c('Stone Mountain', 'Atlanta')) ##Dekalb county stretches into two cities
#
# tract <- tract[atlanta,]
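# Filter for specific cities
# Note: tract$NAME holds tract names such as "Census Tract 11, Fulton County, Georgia",
# so this filter is not expected to match any rows.
atlanta <- tract %>%
  filter(NAME %in% c('Stone Mountain', 'Atlanta'))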
# Make yelp_flat an sf object
yelp_flat_sf <- yelp_flat %>%
st_as_sf(coords = c("coordinates_longitude", "coordinates_latitude"), crs = st_crs(tract))
# Perform a spatial join between yelp_flat_sf and atlanta
filtered_yelp <- st_join(yelp_flat_sf, atlanta)
## View the joined data
filtered_yelp
## Simple feature collection with 91 features and 33 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -84.47153 ymin: 33.61131 xmax: -84.17012 ymax: 33.80578
## Geodetic CRS: NAD83
## # A tibble: 91 × 34
## id alias name image_url is_closed url review_count categories_alias
## * <chr> <chr> <chr> <chr> <lgl> <chr> <int> <list<chr>>
## 1 JkkHRgYj… civi… Civi… https://… FALSE http… 11 [2]
## 2 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 3 UmftRC3h… jump… JUMP https://… FALSE http… 1 [2]
## 4 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 5 JkkHRgYj… civi… Civi… https://… FALSE http… 11 [2]
## 6 b3nacMG8… atla… Atla… https://… FALSE http… 128 [3]
## 7 tMNV5bj4… outb… Outb… https://… FALSE http… 103 [3]
## 8 8PfRbXo6… atla… Atla… https://… FALSE http… 42 [3]
## 9 FK7-M9BG… azte… Azte… https://… FALSE http… 62 [3]
## 10 rbf8bVY0… pede… Pede… https://… FALSE http… 15 [3]
## # ℹ 81 more rows
## # ℹ 26 more variables: categories_title <list<chr>>, rating <dbl>,
## # transactions <list>, location_address1 <chr>, location_address2 <chr>,
## # location_address3 <chr>, location_city <chr>, location_zip_code <chr>,
## # location_country <chr>, location_state <chr>,
## # location_display_address <list<list>>, phone <chr>, display_phone <chr>,
## # distance <dbl>, price <chr>, business_type <chr>, geometry <POINT [°]>, …
Get everything into the same coordinate reference system (CRS).
I want to see summary statistics for the review counts, the Yelp distances, and the census commuting variable.
# reviews submitted to bike rental stores in the tract
summary(yelp_flat_sf$review_count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 62.00 58.18 103.00 128.00
summary(yelp_flat_sf$distance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 230.5 1327.3 5412.2 9750.4 14744.0 39724.2
# Summary statistics for proximity2workE (ACS variable B08131_001)
summary(tract2$proximity2workE)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 8680 26250 39980 51775 65030 259860 72
Get the updated Yelp data into the same CRS used earlier (EPSG:4326).
map2 <- yelp_flat_sf %>%
  st_transform(crs = 4326)
I want to see how the Yelp-reported distances are distributed.
# Create a dot plot of the distance between each search center and each returned store
ggplot(data = yelp_flat_sf, aes(x = distance)) +
  geom_dotplot() +
  labs(x = "Distance (meters)", y = "Number of results") +
  theme_minimal()
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Bike rental stores in Atlanta.
# # Map
# tm_shape(yelp_flat_sf) +
# tm_dots(col = "name")
# Compute the distance between each census tract and each bike rental store
distances <- st_distance(tract, yelp_flat_sf)
min_distances <- apply(distances, MARGIN = 1, FUN = min)
# Include minimum distances on the tract
tract$nearest_distance <- min_distances
# Map
tm_shape(tract) +
tm_borders() +
tm_fill(col = "nearest_distance") +
tm_legend(outside = TRUE, title = "Nearest Distance to a Bike Rental Store")
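The nearest-store step reduces a tract-by-store distance matrix to one value per tract. The sketch below shows that reduction on a made-up matrix; the tract and store names and the distances are hypothetical, and the extra `which.min` line, which identifies the closest store, is an addition that the original code does not compute.

```r
# Made-up distances in meters: rows are tracts, columns are stores
toy_distances <- matrix(
  c(500, 1200, 3000,
    2500, 800, 4100),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("tract_1", "tract_2"), c("store_a", "store_b", "store_c"))
)

# MARGIN = 1 applies min() across each row, giving one value per tract
toy_nearest <- apply(toy_distances, MARGIN = 1, FUN = min)

# which.min() gives the column index of the closest store for each tract
toy_nearest_store <- colnames(toy_distances)[apply(toy_distances, 1, which.min)]

print(toy_nearest)
print(toy_nearest_store)
```

Tracts with a large nearest distance are the candidates for a gap in coverage, which is what the map above shades.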