yelpr: An R library for the Yelp Fusion API. For more details about this library, see https://github.com/OmaymaS/yelpr
RSocrata: An R package for the Socrata Open Data API, which lets you programmatically access a wealth of open data resources from governments, non-profits, and NGOs around the world. For more details, see https://dev.socrata.com/
library(yelpr)
library(RSocrata)
library(tidyverse)
## Warning: package 'RSocrata' was built under R version 3.5.2
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.3.2
## ✔ tibble 2.1.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.0 ✔ forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.2
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
all_df <- read.socrata("https://data.cityofnewyork.us/resource/43nn-pn8j.json")
After glimpsing the open data, we decided to use the phone number as the input to the Yelp Fusion API to acquire business details.
glimpse(all_df)
## Observations: 384,150
## Variables: 18
## $ action <chr> "Violations were cited in the following ar…
## $ boro <chr> "QUEENS", "BROOKLYN", "QUEENS", "MANHATTAN…
## $ building <chr> "13110", "224", "205", "38", "204", "97597…
## $ camis <chr> "50084802", "50056581", "50017217", "41585…
## $ critical_flag <chr> "Critical", "Critical", "Critical", "Not C…
## $ cuisine_description <chr> "Indian", "Cajun", "American", "Thai", "Th…
## $ dba <chr> "NAMASTE", "THE GUMBO BROS", "SHERWOODS KE…
## $ grade <chr> "A", "A", NA, "A", "A", "A", NA, "A", "A",…
## $ grade_date <dttm> 2019-01-10, 2017-11-01, NA, 2018-02-15, 2…
## $ inspection_date <dttm> 2019-01-10, 2017-11-01, 2019-04-03, 2018-…
## $ inspection_type <chr> "Pre-permit (Operational) / Re-inspection"…
## $ phone <chr> "7186746780", "9179091471", "7183810400", …
## $ record_date <dttm> 2019-05-11 06:08:52, 2019-05-11 06:08:52,…
## $ score <chr> "13", "11", "14", "11", "11", "12", "42", …
## $ street <chr> "ROCKAWAY BLVD", "ATLANTIC AVE", "CYPRESS …
## $ violation_code <chr> "02G", "06D", "04C", "09C", "02G", "04H", …
## $ violation_description <chr> "Cold food item held above 41º F (smoked …
## $ zipcode <chr> "11420", "11201", "11385", "10004", "11201…
There are 384,150 rows of data, and many of them repeat the same restaurant and phone number. We need to identify the unique combinations of restaurant ID (camis) and phone number.
dim(all_df)
## [1] 384150 18
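# Keep only the restaurant ID (camis) and phone columns, as character vectors
rest_phone <- data.frame(camis = all_df$camis, phone = all_df$phone, stringsAsFactors = FALSE)
# duplicated() compares whole rows, so this keeps one row per (camis, phone) pair
unique_rest_phone <- rest_phone[!duplicated(rest_phone), ]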
dim(unique_rest_phone)
## [1] 27105 2
We now have 27,105 unique combinations of restaurant ID (camis) and phone number.
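Because `duplicated()` on a data frame compares whole rows, the count above is of unique (camis, phone) pairs, which can be larger than the number of distinct phone numbers. A minimal sketch on made-up data shows the difference; the `toy` data frame and all its values are invented for illustration and do not come from the inspection dataset.

```r
# Toy data (invented): two inspections of the same restaurant,
# plus two different restaurants that share one phone number.
toy <- data.frame(
  camis = c("1001", "1001", "1002", "1003"),
  phone = c("7185550101", "7185550101", "2125550102", "2125550102"),
  stringsAsFactors = FALSE
)

# Whole-row comparison: keeps one row per (camis, phone) pair.
unique_pairs <- toy[!duplicated(toy), ]

# Comparison on the phone column alone: keeps one row per phone number.
unique_phones <- toy[!duplicated(toy$phone), ]

nrow(unique_pairs)   # 3
nrow(unique_phones)  # 2
```

Deduplicating on the pair keeps the link between each restaurant ID and its phone number, at the cost of possibly looking up the same phone number more than once.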
In the Create New App form, enter information about your app, agree to the Yelp API Terms of Use and Display Requirements, and click the Submit button. You will then have an API key.
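One way to keep the key out of the script itself is to read it from an environment variable. The sketch below assumes a variable named `YELP_API_KEY`; that name is chosen here for illustration and is not something Yelp or yelpr requires.

```r
# Hypothetical variable name. In practice it would usually be set outside the
# script (for example in ~/.Renviron); it is set here with a placeholder value
# only so the example runs on its own.
Sys.setenv(YELP_API_KEY = "Your_Yelp_API_Key")

# Sys.getenv() returns "" when the variable is not set.
key <- Sys.getenv("YELP_API_KEY")
if (!nzchar(key)) {
  stop("YELP_API_KEY is not set")
}
```

With this in place, the script can be shared without exposing the key.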
Each API key can only be used for 5,000 calls a day, so acquiring all 27,105 business details in one day requires at least 6 API keys (27,105 / 5,000 ≈ 5.4, rounded up).
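The arithmetic can be checked in R, along with one possible way to split the rows into per-key batches. This is only a sketch: it assumes one API call per row, and the names `n_businesses`, `daily_limit`, and `batches` are introduced here for illustration.

```r
n_businesses <- 27105  # unique (camis, phone) pairs found earlier
daily_limit  <- 5000   # calls per API key per day, as stated above

# Number of keys needed to finish in one day, rounded up.
n_keys <- ceiling(n_businesses / daily_limit)
n_keys  # 6

# Row range each key would cover, in chunks of at most daily_limit rows.
lower <- seq(1, n_businesses, by = daily_limit)
upper <- pmin(lower + daily_limit - 1, n_businesses)
batches <- data.frame(key_index = seq_len(n_keys),
                      n_lower = lower,
                      n_upper = upper)
batches
```

Each row of `batches` gives a start and end index that could be used to slice the unique phone table for one key.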
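key <- 'Your_Yelp_API_Key'
# Rows n_lower..n_upper of unique_rest_phone are processed in this run (11 rows here)
n_lower <- 1
n_upper <- n_lower + 10
phone_df <- unique_rest_phone[n_lower:n_upper, ]
# Result vectors, filled in by the loop
id <- c()
rating <- c()
review_count <- c()
lat <- c()
lon <- c()
price <- c()
phone_ls <- c()
camis_ls <- c()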
Some lines below are commented out because the full run takes a long time.
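for (i in seq_len(nrow(phone_df))) {
  row <- phone_df[i, ]
  # Prefix the US country code so the number starts with '+1'
  phone <- paste0('+1', row$phone)
  camis <- as.character(row$camis)
  test <- business_search_phone(api_key = key, phone_number = phone)
  b <- test$businesses
  n <- NROW(b)
  # Skip numbers with no match so the result vectors stay aligned
  if (n == 0) next
  camis_ls <- c(camis_ls, rep(camis, n))
  phone_ls <- c(phone_ls, rep(phone, n))
  id <- c(id, b$id)
  rating <- c(rating, b$rating)
  review_count <- c(review_count, b$review_count)
  lat <- c(lat, b$coordinates$latitude)
  lon <- c(lon, b$coordinates$longitude)
  # price can be absent from the result; pad with NA to keep lengths equal
  price <- c(price, if (is.null(b$price)) rep(NA, n) else b$price)
}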
## No encoding supplied: defaulting to UTF-8.
# business_detail <- data.frame(camis_ls, phone_ls, id, rating, review_count, lat, lon, price, stringsAsFactors = FALSE)
# csvname <- paste0('./yelp_data/yelp', n_lower, '_', n_upper, '.csv')
# write.csv(business_detail, file = csvname)
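Growing several parallel vectors stays consistent only when every lookup contributes the same number of values to each of them. An alternative, sketched below, builds one small data frame per lookup and binds them at the end. `fake_search` is a hypothetical stand-in for `business_search_phone()` so the example runs without an API key; its return shape (a list with a `businesses` data frame, or an empty list when nothing matches) and all values are assumptions made for illustration.

```r
# Hypothetical stand-in for business_search_phone(); returns invented values.
fake_search <- function(phone_number) {
  if (phone_number == "+17185550101") {
    list(businesses = data.frame(id = "abc123", rating = 4.5,
                                 review_count = 10L,
                                 stringsAsFactors = FALSE))
  } else {
    list(businesses = list())  # no match for this phone number
  }
}

# Toy input (invented), shaped like the camis/phone table used above.
toy_phones <- data.frame(camis = c("1001", "1002"),
                         phone = c("7185550101", "2125550102"),
                         stringsAsFactors = FALSE)

# One list slot per input row; rows without a match stay NULL.
results <- vector("list", nrow(toy_phones))
for (i in seq_len(nrow(toy_phones))) {
  phone <- paste0("+1", toy_phones$phone[i])
  b <- fake_search(phone)$businesses
  if (NROW(b) == 0) next
  # Scalars (camis, phone) are recycled if a lookup returns several businesses.
  results[[i]] <- data.frame(camis = toy_phones$camis[i], phone = phone,
                             id = b$id, rating = b$rating,
                             review_count = b$review_count,
                             stringsAsFactors = FALSE)
}

# rbind() drops the NULL entries, so unmatched rows simply fall out.
business_detail <- do.call(rbind, results)
business_detail
```

Because each lookup produces a complete row (or nothing), the columns cannot drift out of alignment, and numeric columns such as `rating` keep their type.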