Assignment 5 YouTube API
Step 1
Getting YouTube API Key
First, you will need to get a YouTube API key, which is free. YouTube shares its information through the YouTube Data API, which is part of the Google Cloud Platform (GCP). So, to extract data from YouTube legally, you need to sign up for a Google Cloud account. You can do so with a Google account that you may already have.
Once you are logged in, navigate to the left-hand side of the screen and select Credentials from the list of options. Click Create Credentials and select API key.
You will then see that your API key has been created. However, at this point it is unrestricted, and we want to restrict it to prevent any unauthorized use. So, click Edit API Key. Scroll down to API Restrictions and select Restrict Key. When the list of APIs appears, scroll to the bottom and select YouTube Data API v3. Then save.
Step 2
Packages in R
Install 3 additional packages: httr, jsonlite, and dplyr.
The httr package allows us to communicate with the API and get the raw data from YouTube in JSON format. Afterwards, the jsonlite package takes this raw data and transforms it into a readable format. The dplyr package is an all-around package for manipulating data in R.
Step 3
YouTube API Call
A YouTube Data API call has the following format:
https://www.googleapis.com/youtube/v3/{resource}?{parameters}
The {resource} tells us what kind of information we want to extract from YouTube. For example, you can extract data from three major resources: channels, playlist items and videos.
-channels to get channel information
-playlistItems to list all videos uploaded in a channel or by a user
-videos to get detailed video information
The {parameters} allow us to further customize the results. Multiple parameters are separated by &, and you usually start with the following parameters:
-key (required) for your YouTube API key
-id, forUsername, or playlistId for the unique identifier of each data point
-part for the specific data points to extract
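To make the format concrete, here is a minimal sketch in base R that assembles a call from a resource name and a set of parameters. The helper name build_api_call and the placeholder values YOUR_API_KEY and CHANNEL_ID are assumptions made for illustration; they are not part of the API.

```r
# Hypothetical helper (not part of any package): assemble a YouTube Data API
# URL from a resource name and a named list of parameters.
build_api_call <- function(resource, params,
                           base = "https://www.googleapis.com/youtube/v3/") {
  # Turn list(key = "abc", id = "xyz") into "key=abc&id=xyz"
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0(base, resource, "?", query)
}

example_call <- build_api_call(
  "channels",
  list(key  = "YOUR_API_KEY",   # placeholder, not a real key
       id   = "CHANNEL_ID",     # placeholder channel identifier
       part = "snippet,statistics")
)
print(example_call)
```

Swapping "channels" for "videos" or "playlistItems", and id for forUsername or playlistId, would produce the other kinds of calls described above.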
Some Basic YouTube Details to Know
For this we will be looking at a Programming and Computer Science channel, CS Dojo: https://www.youtube.com/channel/UCxX9wt5FWQUAAz4UrysqK9A
And a Mathematics channel, Numberphile: https://www.youtube.com/user/numberphile
NOTE: YouTube channels can be identified by either a Channel ID or a Username, and each uses a different URL format.
For example:
CS Dojo follows the channel ID format. Therefore, the Channel ID is UCxX9wt5FWQUAAz4UrysqK9A. Numberphile follows the username format. In this case, the Username is numberphile.
For YouTube videos, each of them is identified by their Video ID, and we can easily see this in the URL.
https://www.youtube.com/watch?v=bI5jpueiCWw
Here, the Video ID follows v= and is bI5jpueiCWw.
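Since the Video ID sits right after v= in a watch URL, it can be pulled out with a regular expression. This is a minimal sketch in base R; the variable names are illustrative, and it assumes the URL is in the watch?v= form shown above.

```r
# Extract the Video ID from a standard watch URL.
video_url <- "https://www.youtube.com/watch?v=bI5jpueiCWw"

# Capture everything after "v=" up to the next "&" (or the end of the URL).
video_id <- sub(".*[?&]v=([^&]+).*", "\\1", video_url)
print(video_id)
```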
Getting the Channel Information
Based on Channel ID:
https://www.googleapis.com/youtube/v3/channels?key=**********&id=UCxX9wt5FWQUAAz4UrysqK9A&part=snippet,contentDetails,statistics
Parameters:
-id=UCxX9wt5FWQUAAz4UrysqK9A (the Channel ID of CS Dojo)
-part=snippet,contentDetails,statistics
Based on Username:
https://www.googleapis.com/youtube/v3/channels?key=**********&forUsername=numberphile&part=snippet,contentDetails,statistics
Parameters:
-forUsername=numberphile (the Username of Numberphile)
-part=snippet,contentDetails,statistics
Step 4
Sample Code for Extracting YouTube Data in R
Here’s an example that we can use to extract data from a YouTube channel.
First, set the key variable with your YouTube API key.
key <- "… add your YouTube API key here …"
Next, as we did for Twitter, it helps to set up variables that you will use frequently throughout the script.
channel_id <- "UCxX9wt5FWQUAAz4UrysqK9A" # CS Dojo Channel ID
user_id <- "numberphile" # Numberphile Username
base <- "https://www.googleapis.com/youtube/v3/"
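Set your working directory if you want to save the output.
setwd("C:/Output")
Install (once) and load the packages mentioned above.
install.packages(c("httr", "jsonlite", "dplyr")) # only needed the first time
library(httr) # GET() and content()
library(jsonlite) # fromJSON()
library(dplyr) # data manipulation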
Construct the API call
I will only demonstrate a channel call for CS Dojo.
api_params <-
paste(paste0("key=", key),
paste0("id=", channel_id),
"part=snippet,contentDetails,statistics",
sep = "&")
api_call <- paste0(base, "channels", "?", api_params)
api_result <- GET(api_call)
json_result <- content(api_result, "text", encoding="UTF-8")
Process the raw data into a data frame
channel.json <- fromJSON(json_result, flatten = TRUE)
channel.df <- as.data.frame(channel.json)
Data frame explanation
This data frame contains 29 variables for this particular channel. They include the channel's description, when it was published, its view count, and its subscriber count. A single call like this doesn't let you compare channels, so I think it would be beneficial to add a loop or something of that nature to create a data frame that lists multiple channels.
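To show how a few of these variables could be picked out with dplyr, here is a minimal sketch that runs on a small mock response instead of a live call. The JSON below is made up for illustration (it is not real channel data), and the column names such as items.snippet.title are what I would expect as.data.frame() to produce from a flattened channels response; check names(channel.df) on your own result before relying on them.

```r
library(jsonlite)
library(dplyr)

# Mock response with made-up values, shaped like a channels result.
mock_json <- '{
  "kind": "youtube#channelListResponse",
  "pageInfo": {"totalResults": 1, "resultsPerPage": 5},
  "items": [{
    "id": "MOCK_CHANNEL_ID",
    "snippet": {"title": "Mock Channel", "publishedAt": "2020-01-01T00:00:00Z"},
    "statistics": {"viewCount": "1000", "subscriberCount": "50"}
  }]
}'

# Same two processing steps as in the script above.
channel.json <- fromJSON(mock_json, flatten = TRUE)
channel.df <- as.data.frame(channel.json)

# Keep a few columns and convert the counts, which arrive as text, to numbers.
channel_summary <- channel.df %>%
  select(title       = items.snippet.title,
         published   = items.snippet.publishedAt,
         views       = items.statistics.viewCount,
         subscribers = items.statistics.subscriberCount) %>%
  mutate(views = as.numeric(views),
         subscribers = as.numeric(subscribers))
print(channel_summary)
```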
Another example - Harry Styles Channel
You can easily search YouTube to find other channels; I specifically looked at Harry Styles. His channel ID is UCZFWPqqPkFlNwIxcpsLOwew, so all you have to do is substitute it for the channel_id value above, and you can then look at the different characteristics of his channel.
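To look at more than one channel at a time, one option is to pass several Channel IDs in a single request, since the id parameter of the channels resource accepts a comma-separated list. The sketch below is only an illustration under stated assumptions: build_channels_call and get_channels are hypothetical helper names, YOUR_API_KEY is a placeholder, and get_channels assumes httr and jsonlite are installed and that each channel exposes its subscriber count.

```r
# Hypothetical helper: build one channels call for several Channel IDs.
build_channels_call <- function(channel_ids, key,
                                base = "https://www.googleapis.com/youtube/v3/") {
  paste0(base, "channels",
         "?key=", key,
         "&id=", paste(channel_ids, collapse = ","),
         "&part=snippet,statistics")
}

# Hypothetical helper: run the call and return one row per channel.
# Not executed here because it needs a valid API key and network access.
get_channels <- function(channel_ids, key) {
  api_result <- httr::GET(build_channels_call(channel_ids, key))
  httr::stop_for_status(api_result)  # stop early on an HTTP error
  json_result <- httr::content(api_result, "text", encoding = "UTF-8")
  items <- jsonlite::fromJSON(json_result, flatten = TRUE)$items
  data.frame(
    channel     = items$snippet.title,
    views       = as.numeric(items$statistics.viewCount),
    subscribers = as.numeric(items$statistics.subscriberCount),  # assumes the count is not hidden
    stringsAsFactors = FALSE
  )
}

# CS Dojo and Harry Styles, the two Channel IDs used in this write-up.
channel_ids <- c("UCxX9wt5FWQUAAz4UrysqK9A", "UCZFWPqqPkFlNwIxcpsLOwew")
multi_call <- build_channels_call(channel_ids, "YOUR_API_KEY")
print(multi_call)
```

With a real key, get_channels(channel_ids, key) should return a small data frame with one row per channel, which makes side-by-side comparison possible.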
Disclaimer: Much of this information came from Yuichi Otsuka's blog tutorial, so credit for the substance goes to them.