This is a step-by-step tutorial on how to use the Yelp Fusion API in R. This API allows a user to collect data from a search on Yelp’s website. For more information about what this API is capable of, see: https://www.yelp.com/developers/documentation/v3/get_started

Step 1:

Load the required packages.

Package      Description
tidyverse    Loads a number of packages, including readr, dplyr, and ggplot2
httr         Lets the user connect to Yelp’s website and call the Yelp Fusion API
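
A minimal sketch of this step, assuming both packages are already installed (if not, install.packages(c("tidyverse", "httr")) should take care of it):

```r
# Attach the packages used in the rest of the tutorial
library(tidyverse)  # readr, dplyr, ggplot2, tibble, and others
library(httr)       # GET(), POST(), modify_url(), content()
```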

Step 2:

If you don’t have one already, create a Yelp account at https://www.yelp.com/

Then, go to https://www.yelp.com/developers/v3/manage_app to create an app that will give you access to Yelp’s API.

On this page, you will need to create an App Name, give a contact email (the one that you used to create your Yelp account is fine), and provide a description (I wrote: “For educational purposes”). Once this is complete, you should have a Client ID and an API Key at the top of the page.

Step 3:

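Create a token using the Client ID and the API Key (the API Key is stored as client_secret below). Note that the search request in Step 4 sends the API Key itself in the Authorization header, so the token is not needed for the rest of this tutorial. Yelp has since moved to API Key authentication, so the token request may fail on current versions of the API.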

Please keep in mind that these are unique to you and your Yelp App. They should not be shared on the internet or where someone you do not trust could gain access to them. For that reason, I will not include mine in the code. To run this example, please use your own Client ID and API Key :)

It should look like this:

client_id <- "your_client_ID"
client_secret <- "your_API_key"

res <- POST("https://api.yelp.com/oauth2/token",
            body = list(grant_type = "client_credentials",
                        client_id = client_id,
                        client_secret = client_secret))

token <- content(res)$access_token
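
One way to keep the credentials out of the script altogether is to read them from environment variables. The sketch below is only a suggestion: the helper read_credential() and the variable names YELP_CLIENT_ID and YELP_API_KEY are my own inventions, not something Yelp or httr requires.

```r
# Read a credential from an environment variable and fail loudly if it is
# missing. The helper and the variable names are illustrative only.
read_credential <- function(var_name) {
  value <- Sys.getenv(var_name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", var_name, " is not set", call. = FALSE)
  }
  value
}

# Usage, after adding YELP_CLIENT_ID=... and YELP_API_KEY=... to ~/.Renviron
# and restarting R:
# client_id     <- read_credential("YELP_CLIENT_ID")
# client_secret <- read_credential("YELP_API_KEY")
```

With this in place, the script can be shared without exposing the keys.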

Step 4:

Create the search url and collect the data.

This is where the elements of the search are set (essentially, they determine which url the data will be collected from). In this case, I am searching for businesses that make and sell cookies within 8,800 meters (about 5.5 miles) of Cincinnati, OH. Yelp’s radius parameter is measured in meters, not yards, so a true 5-mile radius would be about 8,047. I have also limited the number of businesses that will be collected to 50, the most that Yelp returns per request. The Yelp category of the business can also be set to narrow the search further (categories is left as NULL here and is not passed in the query).

The information stored in the list called results (last line of the below code) is the output data of this url.

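yelp <- "https://api.yelp.com"
term <- "cookies"
location <- "Cincinnati, OH"
categories <- NULL  # optional category filter; not passed in the query below
limit <- 50         # maximum number of businesses to return
radius <- 8800      # search radius in meters
# Build the search url from the base url, the endpoint path, and the query
url <- modify_url(yelp, path = c("v3", "businesses", "search"),
                  query = list(term = term, location = location,
                               limit = limit,
                               radius = radius))
# The API Key itself is sent as the bearer credential
res <- GET(url, add_headers('Authorization' = paste("Bearer", client_secret)))

# Parse the JSON response into a nested list
results <- content(res)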
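
Since the radius has to be given in meters, a small helper can do the conversion from miles. The sketch below is a suggestion rather than part of the Yelp API: miles_to_meters() and build_search_query() are hypothetical names, and the caps of 50 results and 40,000 meters reflect Yelp’s documented maximums as I understand them.

```r
# Convert miles to whole meters (1 mile = 1609.344 m)
miles_to_meters <- function(miles) {
  as.integer(round(miles * 1609.344))
}

# Assemble the query list for the business search endpoint.
# Hypothetical helper; the caps are assumptions based on Yelp's documentation.
build_search_query <- function(term, location, radius_miles, limit = 50) {
  list(
    term = term,
    location = location,
    limit = min(limit, 50),
    radius = min(miles_to_meters(radius_miles), 40000L)
  )
}

query <- build_search_query("cookies", "Cincinnati, OH", radius_miles = 5)
query$radius  # 8047
```

The resulting list can be passed as the query argument of modify_url().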

Step 5:

Format the data.

Define the variables you would like to collect from Yelp. A full list of the potential variables can be found by clicking on ‘results’ in your environment, expanding the ‘businesses’ tab, and expanding any one of the observations found in it.

yelp_httr_parse <- function(x) {

  # Pull the fields of interest out of one business record
  parse_list <- list(id = x$id,
                     name = x$name,
                     rating = x$rating,
                     review_count = x$review_count,
                     latitude = x$coordinates$latitude,
                     longitude = x$coordinates$longitude,
                     address1 = x$location$address1,
                     city = x$location$city,
                     state = x$location$state,
                     distance = x$distance)

  # Replace missing (NULL) fields with NA so that every field has length 1
  # and numeric columns are not turned into text
  parse_list <- lapply(parse_list, FUN = function(x) if (is.null(x)) NA else x)

  # Build a one-row tibble (data_frame() is deprecated in favor of tibble())
  as_tibble(parse_list)
}

# Parse every business, then stack the one-row tibbles into a single table
results_list <- lapply(results$businesses, FUN = yelp_httr_parse)

business_data <- bind_rows(results_list)
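
To see how missing fields are handled without calling the API, the same idea can be tried on hand-made records. Everything below is invented for illustration: the two businesses mimic only a small part of the structure of results$businesses, and parse_mock() is a simplified, base R stand-in for yelp_httr_parse().

```r
# Two made-up records; the second has no rating and no street address
mock_businesses <- list(
  list(id = "abc", name = "Cookie Place", rating = 4.5,
       location = list(address1 = "1 Main St", city = "Cincinnati")),
  list(id = "def", name = "Crumb Shop",
       location = list(city = "Cincinnati"))
)

# Simplified stand-in for yelp_httr_parse(): NULL fields become NA
parse_mock <- function(x) {
  fields <- list(id = x$id, name = x$name, rating = x$rating,
                 address1 = x$location$address1, city = x$location$city)
  fields <- lapply(fields, function(v) if (is.null(v)) NA else v)
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Stack the one-row data frames and inspect the column types
mock_data <- do.call(rbind, lapply(mock_businesses, parse_mock))
str(mock_data)
```

The rating column stays numeric, with NA for the business that has no rating.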

Step 6:

That’s it!

You now have a data table that contains all of the above information and can be used for whatever analysis you deem appropriate. The table should look like this:

Yelp Cookie Business Data
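
As one example of a follow-up analysis, the sketch below keeps only businesses with a reasonable number of reviews and sorts them by rating. The data frame is an invented stand-in for business_data with a few of its columns, and the threshold of 10 reviews is arbitrary.

```r
# Invented stand-in for business_data with a subset of its columns
business_data_mock <- data.frame(
  name = c("Cookie Place", "Crumb Shop", "Sweet Dough"),
  rating = c(4.5, 5.0, 4.0),
  review_count = c(120, 3, 45),
  stringsAsFactors = FALSE
)

# Keep businesses with at least 10 reviews, best rated first
well_reviewed <- subset(business_data_mock, review_count >= 10)
well_reviewed <- well_reviewed[order(-well_reviewed$rating), ]
well_reviewed$name
```

The same result can be had with dplyr’s filter() and arrange(desc(rating)) if you prefer the tidyverse style.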


More information on the Yelp API

Something that would be extremely useful for a business analyst is collecting the review data of a single business. However, the Yelp API does not currently support collecting more than 3 reviews at once. There are some ways to work around that, but they are not simple ones.
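
For reference, here is a rough sketch of how the reviews of one business could be requested. It assumes the reviews endpoint lives at /v3/businesses/{id}/reviews and returns its data under a reviews element, which matches my reading of Yelp’s documentation. The function names and the business id are made up, and get_reviews() is only defined here, not called, because it needs a real API Key and network access.

```r
# Build the url of the reviews endpoint for one business id
reviews_url <- function(business_id) {
  paste0("https://api.yelp.com/v3/businesses/",
         utils::URLencode(business_id, reserved = TRUE),
         "/reviews")
}

# Request the reviews; api_key is the API Key from Step 2.
# Hypothetical helper, defined but not called in this sketch.
get_reviews <- function(business_id, api_key) {
  res <- httr::GET(reviews_url(business_id),
                   httr::add_headers(Authorization = paste("Bearer", api_key)))
  httr::stop_for_status(res)  # stop with an error on a failed request
  httr::content(res)$reviews
}

# Made-up business id, shown only to illustrate the url format
reviews_url("example-cookie-shop-cincinnati")
```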

Another limitation of this API is that it does not currently return businesses without reviews.

Lastly, there are other beta versions available that allow users to collect information on Yelp categories and events. These require a user to join the Yelp Developer Beta Program (it’s free) which gives early access to new and experimental features of the API.

Happy coding :)