Data607_final

Loading libraries.

yelpr: An R library for the Yelp Fusion API. For more detail about this library, see https://github.com/OmaymaS/yelpr

RSocrata: An R interface to the Socrata Open Data API, which allows you to programmatically access a wealth of open data resources from governments, non-profits, and NGOs around the world. For more detail, see https://dev.socrata.com/

library(yelpr)
library(RSocrata)
library(tidyverse)

## Warning: package 'RSocrata' was built under R version 3.5.2

## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.1     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.0     ✔ forcats 0.3.0

## Warning: package 'tibble' was built under R version 3.5.2

## Warning: package 'purrr' was built under R version 3.5.2

## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

# Download the NYC restaurant inspection results from NYC Open Data
all_df <- read.socrata("https://data.cityofnewyork.us/resource/43nn-pn8j.json")

Understand your data.

After glimpsing the open data, we decided to use the phone column as the input to the Yelp Fusion API to acquire business details.

glimpse(all_df)

## Observations: 384,150
## Variables: 18
## $ action                <chr> "Violations were cited in the following ar…
## $ boro                  <chr> "QUEENS", "BROOKLYN", "QUEENS", "MANHATTAN…
## $ building              <chr> "13110", "224", "205", "38", "204", "97597…
## $ camis                 <chr> "50084802", "50056581", "50017217", "41585…
## $ critical_flag         <chr> "Critical", "Critical", "Critical", "Not C…
## $ cuisine_description   <chr> "Indian", "Cajun", "American", "Thai", "Th…
## $ dba                   <chr> "NAMASTE", "THE GUMBO BROS", "SHERWOODS KE…
## $ grade                 <chr> "A", "A", NA, "A", "A", "A", NA, "A", "A",…
## $ grade_date            <dttm> 2019-01-10, 2017-11-01, NA, 2018-02-15, 2…
## $ inspection_date       <dttm> 2019-01-10, 2017-11-01, 2019-04-03, 2018-…
## $ inspection_type       <chr> "Pre-permit (Operational) / Re-inspection"…
## $ phone                 <chr> "7186746780", "9179091471", "7183810400", …
## $ record_date           <dttm> 2019-05-11 06:08:52, 2019-05-11 06:08:52,…
## $ score                 <chr> "13", "11", "14", "11", "11", "12", "42", …
## $ street                <chr> "ROCKAWAY BLVD", "ATLANTIC AVE", "CYPRESS …
## $ violation_code        <chr> "02G", "06D", "04C", "09C", "02G", "04H", …
## $ violation_description <chr> "Cold food item held above 41º F (smoked …
## $ zipcode               <chr> "11420", "11201", "11385", "10004", "11201…

Use Phone as API input to acquire business detail.

There are 384,150 rows of data, with lots of duplicated phone numbers. We need to identify unique phone numbers.

dim(all_df)

## [1] 384150     18

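# Subset by column name so camis and phone stay character
# (data.frame() converts strings to factors by default in R < 4.0)
rest_phone = all_df[, c('camis','phone')]
# Drop repeated (camis, phone) pairs
unique_rest_phone= rest_phone[!duplicated(rest_phone),]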
dim(unique_rest_phone)

## [1] 27105     2

This leaves 27,105 unique (camis, phone) pairs to look up.
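
Note that `duplicated()` applied to a two-column data frame drops repeated (camis, phone) pairs rather than repeated phone numbers, so two restaurants that share one phone number are both kept. The following is a minimal sketch of that difference; the `toy` data frame and its values are made up for illustration and do not come from the inspection dataset.

```r
# Toy data (hypothetical values, not taken from the inspection dataset)
toy <- data.frame(
  camis = c("1", "1", "2", "3"),
  phone = c("7180000001", "7180000001", "7180000001", "7180000002"),
  stringsAsFactors = FALSE
)

# Distinct (camis, phone) pairs, the same approach as the code above
pairs <- toy[!duplicated(toy), ]

# Distinct phone numbers only
phones <- unique(toy$phone)

nrow(pairs)     # 3
length(phones)  # 2
```

Looking up each pair keeps the link back to `camis`, at the cost of a few repeated API calls for shared phone numbers.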

Acquire Yelp API Key

In the Create New App form, enter information about your app, agree to the Yelp API Terms of Use and Display Requirements, and click the Submit button. You will then have an API key.

Yelp Fusion API daily limit

Each Yelp Fusion API credential can be called at most 5,000 times a day. To acquire all 27,105 business details in a single day, you therefore need at least 6 API credentials (27,105 / 5,000 ≈ 5.4, rounded up).
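
The arithmetic behind that figure can be checked in one line. This is only a sketch: the variable names are illustrative, and the numbers are the ones quoted in this report.

```r
n_lookups   <- 27105  # business lookups required, from the count above
daily_limit <- 5000   # calls allowed per API credential per day, as stated above

# Round up, because a partly used credential still counts as one credential
keys_needed <- ceiling(n_lookups / daily_limit)
keys_needed  # 6
```

With a single credential, the same calculation gives the number of days needed instead.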

Acquire Business Detail

key = 'Your_Yelp_API_Key'   # placeholder: substitute your own key
n_lower = 1
n_upper = n_lower+10        # rows n_lower..n_upper inclusive, i.e. 11 rows
phone_df=unique_rest_phone[c(n_lower:n_upper),]
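
To cover the whole table, the `n_lower`/`n_upper` window has to be moved forward batch by batch. One possible way to precompute the windows is sketched below. `batch_bounds` is a hypothetical helper, not part of yelpr, and the batch size of 5,000 is an assumption chosen to match the daily limit of one credential.

```r
# Hypothetical helper: inclusive row ranges that cover 1..n_total
batch_bounds <- function(n_total, batch_size) {
  lower <- seq(1, n_total, by = batch_size)
  upper <- pmin(lower + batch_size - 1, n_total)  # clip the last batch
  data.frame(n_lower = lower, n_upper = upper)
}

bounds <- batch_bounds(27105, 5000)
bounds
```

Each row of `bounds` could then supply the `n_lower` and `n_upper` values for one run, with a different API key per run.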

Create empty vectors to store the data.

id = c()
rating = c()
review_count= c()
lat= c()
lon= c()
price= c()
phone_ls = c()
camis_ls = c()

Create a for loop to acquire the Yelp business data.

Some lines are commented out because they take a long time to run.

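for(i in seq_len(nrow(phone_df))) {
    row <- phone_df[i,]
    # Yelp phone search expects a leading + and the country code
    phone=paste0('+1',row$phone)
    camis=as.character(row$camis)   # guard against factor columns
    # Look up the business registered under this phone number
    test=business_search_phone(api_key = key, phone_number = phone)
    camis_ls = c(camis_ls,camis)
    phone_ls = c(phone_ls,phone)
    # Caution: a phone number may match zero or several businesses, and a
    # field such as price may be missing, so the vectors below can end up
    # with a different length than camis_ls and phone_ls
    id = c(id,test$businesses$id)
    rating=c(rating,test$businesses$rating)
    review_count = c(review_count,test$businesses$review_count)
    lat=c(lat,test$businesses$coordinates$latitude)
    lon=c(lon,test$businesses$coordinates$longitude)
    price = c(price,test$businesses$price)
}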

## No encoding supplied: defaulting to UTF-8.
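
Because the loop appends each field separately, a lookup that returns no match (or no `price`) leaves the result vectors with different lengths. One way to keep them aligned is to build exactly one row per lookup and fill gaps with `NA`. The sketch below assumes the parsed response has the shape the loop already relies on (`businesses$id`, `businesses$coordinates$latitude`, and so on). `first_match` is a hypothetical helper, and the `hit` and `miss` objects are hand-made stand-ins for real API responses.

```r
# Hypothetical helper: always returns exactly one row per lookup,
# using the first matching business and NA for anything missing
first_match <- function(camis, phone, businesses) {
  first_or_na <- function(x) if (length(x) == 0) NA else x[[1]]
  data.frame(
    camis        = camis,
    phone        = phone,
    id           = first_or_na(businesses$id),
    rating       = first_or_na(businesses$rating),
    review_count = first_or_na(businesses$review_count),
    lat          = first_or_na(businesses$coordinates$latitude),
    lon          = first_or_na(businesses$coordinates$longitude),
    price        = first_or_na(businesses$price),
    stringsAsFactors = FALSE
  )
}

# Stand-ins for parsed responses (made-up values): one match without
# a price field, and one lookup with no match at all
hit  <- list(id = "abc123", rating = 4.5, review_count = 10L,
             coordinates = list(latitude = 40.7, longitude = -73.9))
miss <- list()

rows <- rbind(first_match("1", "+17180000001", hit),
              first_match("2", "+17180000002", miss))
rows
```

Inside the loop, the equivalent call would be `first_match(camis, phone, test$businesses)`, with the resulting rows collected in a list and combined once at the end using `do.call(rbind, ...)`.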

Combine the vectors into a data frame and save it as a .csv file in the yelp_data folder.

# data.frame() is called directly (without cbind) so numeric columns keep their type
# business_detail = data.frame(camis_ls, phone_ls, id, rating, review_count, lat, lon, price)
# csvname = paste0('./yelp_data/yelp',n_lower,'_',n_upper,'.csv')
# write.csv(business_detail, file = csvname)

Data607_final_yelpapi

Wei Zhou, Mia Chen

5/12/2019