YouTube is one of the most visited sites right now. We watch, upload, and share videos on the platform daily. But what about using it to gather key information on a firm's competitors? Companies tend to promote their videos, so from a marketing standpoint, knowing what a competitor is doing can yield insights and support better decisions.
We'll need the jsonlite and curl libraries, as well as a YouTube Data API v3 key from Google. Follow this link to obtain a key:
Google Cloud Youtube API Dashboard
For this post, we'll use the YouTube accounts of the largest banks in Colombia: Bancolombia, Davivienda, and Banco de Bogotá. We want to see what they're doing and how people feel about their videos.
Before we move on, we'll need every channel ID. These can be obtained from external pages like this: just paste the channel link and it will return the channel ID right away. After finding all the channel IDs, we should have a data frame (called comp_data in the code below) like this:
| name | cha_id |
|---|---|
| Bancolombia | UCczkYQFOUOgsg958IfKMB_Q |
| Davivienda | UCoh2JDafVSw-0eBuZo4ecpQ |
| Banco de Bogota | UCHZS1Ghu4pU-rofAZUPoHFQ |
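As a minimal sketch, the table above could be built by hand as follows. The name comp_data and the cha_id column match how the data frame is used later in the post; the construction itself is an assumption, since the original does not show it.

```r
# Channel names and IDs copied from the table above.
# stringsAsFactors = FALSE keeps the IDs as plain character strings,
# so they can be pasted directly into request URLs.
comp_data <- data.frame(
  name = c("Bancolombia", "Davivienda", "Banco de Bogota"),
  cha_id = c("UCczkYQFOUOgsg958IfKMB_Q",
             "UCoh2JDafVSw-0eBuZo4ecpQ",
             "UCHZS1Ghu4pU-rofAZUPoHFQ"),
  stringsAsFactors = FALSE
)
```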
We wrote the following functions to extract the information from a channel's playlist. We'll come back to them later to explain them in a bit more depth.
library(jsonlite)    # fromJSON
library(dplyr)       # %>%, bind_rows, filter
library(ggplot2)     # plots
library(plotly)      # ggplotly, layout
library(reshape2)    # melt (assumed source; the post does not name the package)
library(knitr)       # kable
library(kableExtra)  # kable_styling
API_key="PUT YOUR API KEY HERE"
getstats_video<-function(video_id,API_key){  # statistics and metadata for a single video
  url=paste0("https://www.googleapis.com/youtube/v3/videos?part=snippet,statistics&id=",video_id,"&key=",API_key)
  result <- fromJSON(txt=url)
  return(data.frame(name=result$items$snippet$channelTitle, result$items$statistics,title=result$items$snippet$title,date=result$items$snippet$publishedAt,descrip=result$items$snippet$description))
}
get_playlist_canal<-function(id,API_key,topn=15){  # video ids of the first topn items in a playlist
  url=paste0('https://www.googleapis.com/youtube/v3/playlistItems?part=contentDetails&playlistId=',id,'&key=',API_key,'&maxResults=',topn)
  result=fromJSON(txt=url)
  return(data.frame(result$items$contentDetails))
}
getstats_canal<-function(id,API_key){  # channel-level statistics plus its related playlist ids
  url=paste0('https://www.googleapis.com/youtube/v3/channels?part=snippet%2CcontentDetails%2Cstatistics&id=',id,'&key=',API_key)
  result <- fromJSON(txt=url)
  return(data.frame(name=result$items$snippet$title,result$items$statistics,pl_list_id=result$items$contentDetails$relatedPlaylists))
}
getall_channels<-function(ids,API_key,topn=5){  # per-video statistics for several playlists
  videos=lapply(ids,FUN=get_playlist_canal,API_key=API_key,topn=topn) %>% bind_rows()
  stats=lapply(videos[,1],FUN=getstats_video,API_key=API_key)
  stats=bind_rows(stats)
  stats$vid_id=videos[,1]
  return(stats)
}

Using the getstats_canal function, we can obtain a channel's basic statistics, such as total views, comments (if available), number of videos published, and so on.
can_st=lapply(comp_data$cha_id,FUN = getstats_canal,API_key=API_key)
can_st=bind_rows(can_st)
can_st$viewCount=as.numeric(can_st$viewCount)
can_st[,1:6] %>% kable() %>% kable_styling()

| name | viewCount | commentCount | subscriberCount | hiddenSubscriberCount | videoCount |
|---|---|---|---|---|---|
| Grupo Bancolombia | 54778143 | 0 | 178000 | FALSE | 936 |
| Banco Davivienda Colombia | 24685938 | 0 | 95300 | FALSE | 333 |
| Banco de Bogotá | 39510038 | 0 | 18400 | FALSE | 188 |
can_st$viewCount=round(as.numeric(can_st$viewCount)/1000000,2)
p1=can_st %>% ggplot(aes(x=reorder(name,viewCount),y=viewCount,fill=name))+
  geom_col()+coord_flip()+scale_fill_manual(values = c("red", "darkblue", "yellow2"))+
  geom_text(aes(label=paste(viewCount,"M")),nudge_y =0,angle = 90)+
  labs(x="",y="Total views (millions)",fill="")+
  theme(legend.position = "top")+mytheme3  # mytheme3 is a custom theme not shown in this post
ggplotly(p1,tooltip=c("name","viewCount")) %>%
  layout(legend = list(orientation = "h",x = 0.01, y = -0.1,autosize=F))

Calling getall_channels with the channels' uploads playlist IDs retrieves up to topn videos from each channel and calls getstats_video to collect the information for every video ID. Here we plot dislikeCount, but you could try favoriteCount or commentCount. Another parameter worth changing is topn, the number of videos the function fetches from each channel. It is set to 5 by default.
var_to_see="dislikeCount" #favoriteCount or commentCount
info=getall_channels(ids = can_st$pl_list_id.uploads,API_key = API_key,topn =20)
datacond=melt(info[,c(1:6,8)],id.vars = c("name","date"))
datacond$date=as.Date(datacond$date)
datacond$value=as.numeric(datacond$value)
ggplot(filter(datacond,variable==var_to_see),aes(x=date,y=value,fill=name))+
  geom_col()+labs(x="",y=var_to_see,fill="")+scale_fill_manual(values = c("red", "darkblue", "yellow2"))+
  scale_x_date(limits =c(min(datacond$date),Sys.Date()),date_breaks ="month",date_labels="%b %y")+
  theme(legend.position = "top")+mytheme2  # mytheme2 is a custom theme not shown in this post

It seems Banco de Bogotá has stopped receiving dislikes, while Bancolombia now leads this blacklist.
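The topn argument used above is passed to the API as maxResults, which the playlistItems endpoint caps at 50 per request. To go beyond that, one option is to follow the nextPageToken field returned with each page. The sketch below is an assumption, not part of the original functions: get_playlist_paged and its fetch argument are hypothetical names, and fetch exists only so the HTTP call can be replaced with a stub.

```r
library(jsonlite)

# Hypothetical helper (not in the original post): collect up to max_videos
# video ids from a playlist by following nextPageToken across pages.
# `fetch` defaults to jsonlite::fromJSON; pass a stub to run without network.
get_playlist_paged <- function(id, API_key, max_videos = 100,
                               fetch = jsonlite::fromJSON) {
  base <- "https://www.googleapis.com/youtube/v3/playlistItems"
  video_ids <- character(0)
  page_token <- NULL
  repeat {
    # Ask only for what is still missing, never more than 50 per page.
    per_page <- min(50, max_videos - length(video_ids))
    url <- paste0(base, "?part=contentDetails&playlistId=", id,
                  "&key=", API_key, "&maxResults=", per_page)
    if (!is.null(page_token)) url <- paste0(url, "&pageToken=", page_token)
    result <- fetch(url)
    video_ids <- c(video_ids, result$items$contentDetails$videoId)
    page_token <- result$nextPageToken
    # Stop when the playlist is exhausted or enough ids were collected.
    if (is.null(page_token) || length(video_ids) >= max_videos) break
  }
  head(video_ids, max_videos)
}
```

With a real key, a call such as get_playlist_paged(can_st$pl_list_id.uploads[1], API_key, max_videos = 120) would return a character vector of video IDs that could then be passed to getstats_video.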
Since getstats_video returns the text description of each video, one could apply text analysis to every description, label each output, and track that account over time. Word clouds of these descriptions could be very informative as well.
The tuber package is another way of getting information from YouTube within R. However, if we want to add this analysis to a Shiny/Dash app or an automated script, querying the API directly is the best way to go (a judgment based on speed).
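As a starting point for that kind of text analysis, here is a rough base-R sketch that counts the most frequent words across descriptions. The function top_words and its arguments are hypothetical, and the tokenization is deliberately naive (lowercasing, splitting on non-letters, dropping very short words); a dedicated text-mining package would likely do better, especially for Spanish stopwords.

```r
# Hypothetical helper: most frequent words across a set of descriptions.
top_words <- function(descriptions, n = 10, stopwords = character(0)) {
  # Merge all descriptions into one lowercase string.
  text <- tolower(paste(descriptions, collapse = " "))
  # Split on any run of non-letter characters.
  words <- unlist(strsplit(text, "[^[:alpha:]]+"))
  # Drop empty or very short tokens and user-supplied stopwords.
  words <- words[nchar(words) > 2 & !(words %in% stopwords)]
  # Frequency table, most common first.
  freq <- sort(table(words), decreasing = TRUE)
  head(freq, n)
}
```

It could be applied per bank, for example top_words(info$descrip[info$name == "Grupo Bancolombia"], stopwords = c("para", "con", "los")), and the resulting counts fed to a word cloud.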