This project uses the YouTube Data API to study the profile and engagement statistics of COVID-19-related videos on YouTube.
First, load the three packages: jsonlite, RCurl and plotly.
require(jsonlite)
## Loading required package: jsonlite
require(RCurl)
## Loading required package: RCurl
require(plotly)
## Loading required package: plotly
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Set up your API key.
API <- "PUT YOUR YOUTUBE TOKEN HERE"
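Read the documentation of YouTube's search endpoint (https://developers.google.com/youtube/v3/docs/search) and define a search function, getvideo_search(). It returns up to 50 results per call and accepts an optional page token (nextp) to request the following page.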
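getvideo_search <- function(sterm,API_key,nextp=NA){
# Append the page token only when one is supplied
nextpage <- ifelse(is.na(nextp),"",paste0("&pageToken=",nextp))
# search.list endpoint: snippet part, 50 results per page, URL-encoded query
url <- paste0("https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=50&q=",URLencode(sterm),"&key=",API_key,nextpage)
# Download the JSON and flatten nested fields into ordinary columns
result <- fromJSON(txt=url, flatten = TRUE)
return(result)
}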
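As written, getvideo_search() can also return channels and playlists. Those items have no id.videoId, and their extra id columns can make later rbind() calls fail. The sketch below is a hypothetical URL builder, build_search_url(). It adds the search endpoint's type=video parameter and encodes reserved characters such as & in the query with URLencode(reserved = TRUE). It only builds the URL and makes no request.

```r
# Hypothetical variant of the search URL builder: restricts results to videos
# (search.list 'type' parameter) and encodes reserved characters in the query.
build_search_url <- function(sterm, API_key, nextp = NA, max_results = 50) {
  base <- "https://www.googleapis.com/youtube/v3/search"
  params <- c(
    part = "snippet",
    type = "video",
    maxResults = max_results,                 # coerced to character by c()
    q = URLencode(sterm, reserved = TRUE),    # spaces -> %20, & -> %26
    key = API_key
  )
  # Add the page token only when a real token is supplied
  if (!is.null(nextp) && !is.na(nextp)) {
    params <- c(params, pageToken = nextp)
  }
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}
```

Inside getvideo_search(), the request could then become fromJSON(txt = build_search_url(sterm, API_key, nextp), flatten = TRUE).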
Let's search for "covid vaccine side effect". The main part of the returned list is items, a data.frame holding the 50 results. nextPageToken is the token used to request the next page.
covid_vaccine <- getvideo_search('covid vaccine side effect',API)
dim(covid_vaccine[["items"]])
## [1] 50 20
class(covid_vaccine[["items"]])
## [1] "data.frame"
covid_vaccine[["nextPageToken"]]
## [1] "CDIQAA"
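Next, we store the first 50 items in a data.frame called video_data. A for loop then fetches pages 2 to 5, passing the previous page's nextPageToken each time and appending the new items with rbind(). The id.videoId column of video_data is used in the next step to collect the engagement figures. (Remark: because fromJSON() is called with flatten = TRUE, the nested thumbnails field is flattened into ordinary columns. Without flattening it would be a data.frame inside a data.frame, and it would have to be removed before rbind() to avoid mismatched formats.)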
video_data <- covid_vaccine[["items"]]
covid_vaccine_prepage <- covid_vaccine
for (page in 2:5){
covid_vaccine_newpage <- getvideo_search('covid vaccine side effect',API,covid_vaccine_prepage[["nextPageToken"]])
new_data <- covid_vaccine_newpage[["items"]]
video_data <- rbind(video_data,new_data)
covid_vaccine_prepage <- covid_vaccine_newpage
Sys.sleep(5)
}
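The loop above always requests four more pages, even if the search runs out of results earlier. Below is a sketch of a more defensive pager, collect_pages(), which is a hypothetical helper. It follows nextPageToken and stops as soon as no token is returned. It takes the fetching function as an argument, for example function(tok) getvideo_search('covid vaccine side effect', API, tok), so it can be tried offline with a mocked fetcher.

```r
# Generic pager: calls fetch(token) repeatedly, following nextPageToken,
# until max_pages pages are collected or no further token is returned.
# 'fetch' must return a list with $items (a data.frame) and, optionally,
# $nextPageToken; the first call receives NA as its token.
collect_pages <- function(fetch, max_pages = 5, pause = 0) {
  pages <- list()
  token <- NA
  for (page in seq_len(max_pages)) {
    res <- fetch(token)
    pages[[page]] <- res[["items"]]
    token <- res[["nextPageToken"]]
    if (is.null(token)) break        # last page reached
    if (pause > 0) Sys.sleep(pause)  # be gentle with the API quota
  }
  do.call(rbind, pages)              # stack all pages into one data.frame
}
```

Note that rbind() still requires every page to have the same columns, which is one reason to restrict the search to videos.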
Now it's time to collect the engagement statistics. Let's define another function that returns the snippet and statistics of the video with a given videoId.
getstats_video <- function(video_id,API_key){
url <- paste0("https://www.googleapis.com/youtube/v3/videos?part=snippet,statistics&id=",video_id,"&key=",API_key)
result <- fromJSON(txt=url, flatten = TRUE)
return(result$items)
}
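Calling getstats_video() once per video means one request per video. The videos endpoint documents its id parameter as a comma-separated list of IDs, so several videos can be requested in one call. The sketch below uses a hypothetical helper, batch_ids(), and assumes a limit of 50 IDs per request. It also drops the NA IDs that non-video search results produce.

```r
# Split a vector of video IDs into comma-separated batches.
# Assumption: videos.list accepts up to 50 IDs per request in its 'id' parameter.
batch_ids <- function(ids, size = 50) {
  ids <- ids[!is.na(ids)]                              # drop non-video search results
  groups <- split(ids, ceiling(seq_along(ids) / size)) # consecutive chunks of 'size'
  unname(vapply(groups, paste, character(1), collapse = ","))
}
```

Each string could be passed as video_id to getstats_video(), which simply pastes it into the URL. The returned items would then have to be matched back to video_data through their id column rather than by position.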
Next, write a for loop that retrieves the engagement statistics (views, likes, dislikes, favorites and comments) of each of the up to 250 collected videos (5 pages × 50 results). The statistics are stored as new columns of video_data. A returned record may lack some fields, for example commentCount or favoriteCount, so each field is checked with is.null() and recorded as NA when it is missing. Note that YouTube stopped returning public dislike counts in late 2021, so dislikeCount will usually be NA for requests made after that.
# One videos.list call per video; each field becomes NA if absent from the response
for (i in seq_len(nrow(video_data))){
stats_result <- getstats_video(video_data$id.videoId[i],API)
video_data$viewCount[i] <- ifelse(is.null(stats_result$statistics.viewCount),NA,stats_result$statistics.viewCount)
video_data$likeCount[i] <- ifelse(is.null(stats_result$statistics.likeCount),NA,stats_result$statistics.likeCount)
video_data$dislikeCount[i] <- ifelse(is.null(stats_result$statistics.dislikeCount),NA,stats_result$statistics.dislikeCount)
video_data$favoriteCount[i] <- ifelse(is.null(stats_result$statistics.favoriteCount),NA,stats_result$statistics.favoriteCount)
video_data$commentCount[i] <- ifelse(is.null(stats_result$statistics.commentCount),NA,stats_result$statistics.commentCount)
Sys.sleep(5)
}
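The five ifelse() lines in the loop repeat the same pattern. Below is a minimal sketch of a helper, stat_or_na(), that factors the pattern out; the name is hypothetical. The example uses a mocked one-row result with made-up counts instead of a live API call, and commentCount is deliberately missing from it.

```r
# Hypothetical helper: return a statistics field from a videos.list result,
# or NA when the field is missing. Counts are kept as character strings,
# as the API returns them.
stat_or_na <- function(stats_result, field) {
  value <- stats_result[[paste0("statistics.", field)]]  # NULL if the column is absent
  if (is.null(value) || length(value) == 0) NA_character_ else as.character(value[1])
}

stat_fields <- c("viewCount", "likeCount", "dislikeCount", "favoriteCount", "commentCount")

# Mocked single-video result (made-up numbers) without a commentCount field
mock_result <- data.frame(statistics.viewCount = "1000",
                          statistics.likeCount = "50",
                          statistics.dislikeCount = "2",
                          statistics.favoriteCount = "0")

row_stats <- vapply(stat_fields, function(f) stat_or_na(mock_result, f), character(1))
```

Inside the loop, the five assignments could then shrink to for (f in stat_fields) video_data[[f]][i] <- stat_or_na(stats_result, f).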
Show the most-viewed YouTube videos (head() returns the top six rows by default).
video_data_sorted <- video_data[order(as.integer(video_data$viewCount),decreasing=TRUE),]
head(video_data_sorted[,c("snippet.title","viewCount")])
## snippet.title
## 45 What The COVID Vaccine Does To Your Body
## 126 Doctor Dies After Getting COVID 19 Vaccine? || Florida Doctor's Death
## 49 COVID-19 survivor has rare side effect of COVID-19 treatment -- a massively swollen tongue
## 41 What Did Bill Gates Say About COVID Vaccine Side Effects?
## 66 Vaccine Side Effect? Norway Sounds Alarm As 23 Elderly Patients Die After Receiving Pfizer Vaccine
## 181 COVID 19 Vaccine Deep Dive: Safety, Immunity, RNA Production, w Shane Crotty, PhD (Pfizer / Moderna)
## viewCount
## 45 3562398
## 126 3190677
## 49 3190313
## 41 2269531
## 66 1438944
## 181 1252357
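The API returns counts as character strings, which is why viewCount is converted before sorting. as.integer() works for the counts above, but R integers stop at .Machine$integer.max (2147483647). A video with more views than that would become NA and drop out of the ranking. as.numeric() avoids this. Here is a small illustration with made-up counts:

```r
# Made-up view counts, stored as strings as the API returns them
views <- c(a = "3562398", b = "3000000000", c = "1252357")

# as.integer() cannot represent values above .Machine$integer.max and yields NA
as_int <- suppressWarnings(as.integer(views))

# as.numeric() (double precision) keeps whole-number counts exact up to 2^53
as_num <- as.numeric(views)

# Rank the videos by views, largest first
ranked <- names(views)[order(as_num, decreasing = TRUE)]
```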
Show the channels with the highest total views (summed over their videos in the sample).
video_totalview <- aggregate(as.integer(viewCount) ~ snippet.channelTitle,data=video_data,sum,na.rm=TRUE)
video_totalview <- video_totalview[order(video_totalview$`as.integer(viewCount)`,decreasing=TRUE),]
head(video_totalview)
## snippet.channelTitle as.integer(viewCount)
## 46 Doctor Mike Hansen 3598909
## 18 AsapSCIENCE 3562398
## 85 KHOU 11 3203115
## 40 CRUX 2799363
## 123 PowerfulJRE 2269531
## 100 MedCram - Medical Lectures Explained CLEARLY 1252357
Show the channels with the highest average views per video.
video_averageview <- aggregate(as.integer(viewCount) ~ snippet.channelTitle,data=video_data,mean,na.rm=TRUE)
video_averageview <- video_averageview[order(video_averageview$`as.integer(viewCount)`,decreasing=TRUE),]
head(video_averageview)
## snippet.channelTitle as.integer(viewCount)
## 18 AsapSCIENCE 3562398
## 123 PowerfulJRE 2269531
## 100 MedCram - Medical Lectures Explained CLEARLY 1252357
## 46 Doctor Mike Hansen 1199636
## 40 CRUX 933121
## 112 NBCLA 821423
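The two rankings differ because a channel with several videos in the sample, such as Doctor Mike Hansen, accumulates a large total but a lower average. A sketch that computes the total, the average and the number of videos per channel in one aggregate() call, with readable column names, on a small made-up data set is shown below. The formula interface of aggregate() drops rows with NA views by default (na.action = na.omit).

```r
# Made-up sample: channel names and view counts (as strings, like the API)
toy <- data.frame(
  snippet.channelTitle = c("A", "A", "A", "B", "C"),
  viewCount = c("100", "200", NA, "500", "50"),
  stringsAsFactors = FALSE
)
toy$views <- as.numeric(toy$viewCount)

# FUN returns three values per channel; aggregate() stores them as a matrix
# column, and do.call(data.frame, ...) flattens it into views.total, etc.
by_channel <- do.call(data.frame, aggregate(
  views ~ snippet.channelTitle, data = toy,
  FUN = function(v) c(total = sum(v), average = mean(v), videos = length(v))
))

# Rank channels by average views per video
by_channel <- by_channel[order(by_channel$views.average, decreasing = TRUE), ]
```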
Finally, we create a horizontal bar chart of the channels with the highest average views and upload it to the plotly Chart Studio site (sign up for a free account at https://chart-studio.plotly.com/).
p1 <- plot_ly(head(video_averageview, 5), y = ~snippet.channelTitle, x = ~`as.integer(viewCount)`, type = 'bar', orientation = 'h', name = "Top 5 Most Viewed YouTube Channels related to COVID (average views per video)")
p1
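To make the bar order explicit rather than relying on plotly's default ordering of categories, the channel column can be turned into a factor whose levels follow the ranking. This sketch assumes that plotly respects factor level order on a categorical axis and draws the first level at the bottom of a horizontal bar chart. The data below are made up.

```r
# Made-up averages, already sorted in decreasing order
top <- data.frame(channel = c("B", "A", "C"), avg = c(500, 150, 50),
                  stringsAsFactors = FALSE)

# Reverse the ranking for the factor levels so that, on a horizontal bar chart
# whose first category sits at the bottom, the largest bar ends up on top.
top$channel <- factor(top$channel, levels = rev(top$channel))

# Then, for example: plot_ly(top, y = ~channel, x = ~avg, type = 'bar', orientation = 'h')
```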
Sys.setenv("plotly_username"="YOUR USERNAME")
Sys.setenv("plotly_api_key"="YOUR PASSWORD")
api_create(p1, filename = "lecture4_2021")
## Found a grid already named: 'lecture4_2021 Grid'. Since fileopt='overwrite', I'll try to update it
## Found a plot already named: 'lecture4_2021'. Since fileopt='overwrite', I'll try to update it
See the YouTube Data API reference (https://developers.google.com/youtube/v3/docs) for other endpoints that can support your work. For example, the function below reads the comments of a video (ID = 'R6reyiSpKuw') page by page, up to 100 comment threads per page.
getvideo_comments <- function(video_id,API_key,nextp=NA){
nextpage <- ifelse(is.na(nextp),"",paste0("&pageToken=",nextp))
url <- paste0("https://youtube.googleapis.com/youtube/v3/commentThreads?part=id,replies,snippet&maxResults=100&videoId=",video_id,"&key=",API_key,nextpage)
result <- fromJSON(txt=url, flatten = TRUE)
return(result)
}
First100comments <- getvideo_comments('R6reyiSpKuw',API)
Second100comments <- getvideo_comments('R6reyiSpKuw',API,First100comments$nextPageToken)
head(First100comments$items$snippet.topLevelComment.snippet.textDisplay)
## [1] "AOC is a horrible human being"
## [2] "More. MORE"
## [3] "JOHN 3:16 “For God so loved the world, that he gave his only begotten Son, that whosoever believeth in him should not perish, but have everlasting life.”"
## [4] "List of things she do all day:<br />#1. Be incredibly beautiful"
## [5] "AOC is so cool and amazing"
## [6] "Alexandria you really can talk crap and think you are safe. You need to keep using that black makeup to hide your green Lizard scales.<br />Some people really dont like you at all including homeless alies that might thing to roll you for a few bucks in your purse and whatever jewelry you wearing."