YouTube Text Analysis


YouTube is one of the most visited sites today: we watch, upload and share videos on this platform daily. But how about using it to get key information on a firm's competitors? Companies tend to promote their videos, so from a marketing standpoint, knowing what a competitor is doing can yield insights and lead to better decisions.


Setup


We'll need the jsonlite and curl packages, as well as a YouTube Data API v3 key from Google. Go to the following link to obtain one:

Google Cloud YouTube API Dashboard
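A minimal setup sketch in R, assuming the key is stored in an environment variable named YOUTUBE_API_KEY (a name chosen here for illustration, not something the API requires). Keeping the key out of the script avoids leaking it when the code is shared.

```r
# One-time installation, if the packages are missing:
# install.packages(c("jsonlite", "curl"))
library(jsonlite)  # JSON parsing; fromJSON() can also take a request URL
library(curl)      # HTTP client; jsonlite uses it when fromJSON() gets a URL

# Assumption: the key lives in an environment variable with the hypothetical
# name YOUTUBE_API_KEY (for example, set in ~/.Renviron)
api_key <- Sys.getenv("YOUTUBE_API_KEY")
if (!nzchar(api_key)) {
  warning("YOUTUBE_API_KEY is not set; API requests will fail.")
}
```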

Colombian Banks: An Applied Case


For this post, we'll use the YouTube channels of the largest banks in Colombia: Bancolombia, Davivienda and Banco de Bogotá. We want to see what they're doing and how people feel about their videos.

Getting YouTube Channel IDs

Before moving on, we need each channel's ID. It can be obtained from external pages like this one: just paste the channel link and it prints the channel ID right away. Once we have all the IDs, we should have a data frame like this:

name             cha_id
Bancolombia      UCczkYQFOUOgsg958IfKMB_Q
Davivienda       UCoh2JDafVSw-0eBuZo4ecpQ
Banco de Bogota  UCHZS1Ghu4pU-rofAZUPoHFQ
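The same table can be built directly in R. The sanity check assumes the usual channel ID format ("UC" followed by 22 letters, digits, "-" or "_"), which all three IDs above follow; it is only there to catch copy-paste mistakes.

```r
# Channel IDs for the three banks, as listed in the table
channels <- data.frame(
  name   = c("Bancolombia", "Davivienda", "Banco de Bogota"),
  cha_id = c("UCczkYQFOUOgsg958IfKMB_Q",
             "UCoh2JDafVSw-0eBuZo4ecpQ",
             "UCHZS1Ghu4pU-rofAZUPoHFQ"),
  stringsAsFactors = FALSE
)

# Sanity check against copy-paste errors (assumed format: "UC" + 22 characters)
stopifnot(all(grepl("^UC[A-Za-z0-9_-]{22}$", channels$cha_id)))
print(channels)
```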

Statistics per Channel

Using the getstats_canal function, we can obtain a channel's basic statistics, such as total views, comments (if available), subscriber count and number of videos published.

name                       viewCount  commentCount  subscriberCount  hiddenSubscriberCount  videoCount
Grupo Bancolombia           54778143             0           178000                  FALSE         936
Banco Davivienda Colombia   24685938             0            95300                  FALSE         333
Banco de Bogotá             39510038             0            18400                  FALSE         188
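For readers who want to see the raw query behind a table like this, here is a minimal sketch; it is not the author's getstats_canal. It calls the YouTube Data API v3 channels endpoint with part=statistics, and the helper names channel_stats_url(), as_count() and parse_channel_stats() are hypothetical. The API sends counts as strings, so they are converted to numbers, and any field missing from the response becomes NA.

```r
library(jsonlite)

# Hypothetical helper: build a channels.list request for the "statistics" part.
# Several channel IDs can be sent at once as a comma-separated list.
channel_stats_url <- function(channel_ids, api_key) {
  paste0("https://www.googleapis.com/youtube/v3/channels",
         "?part=statistics",
         "&id=", paste(channel_ids, collapse = ","),
         "&key=", api_key)
}

# Convert a count (sent as a string) to a number; a missing field becomes NA
as_count <- function(x) if (is.null(x)) NA_real_ else as.numeric(x)

# Hypothetical helper: `txt` is a JSON string or a request URL,
# since jsonlite::fromJSON() accepts either
parse_channel_stats <- function(txt) {
  resp <- fromJSON(txt, simplifyVector = FALSE)
  rows <- lapply(resp$items, function(item) {
    s <- item$statistics
    data.frame(
      cha_id                = item$id,  # kept so rows can be merged back
      viewCount             = as_count(s$viewCount),
      commentCount          = as_count(s$commentCount),
      subscriberCount       = as_count(s$subscriberCount),
      hiddenSubscriberCount = isTRUE(s$hiddenSubscriberCount),
      videoCount            = as_count(s$videoCount),
      stringsAsFactors      = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (needs network access and a valid key):
# stats <- parse_channel_stats(channel_stats_url(channels$cha_id, api_key))
```

Each row keeps cha_id because the API does not promise to return channels in the order they were requested, so merging on the ID is safer than relying on row order.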

Information on Individual Videos per Channel

Calling getall_channels on a channel ID retrieves that channel's videos and calls getstats_video on every video ID to collect its information. Here we plot dislikeCount, but you could try favoriteCount or commentCount instead. Another parameter worth changing is topn, the number of videos the function retrieves from each channel; it defaults to 5.

It seems Banco de Bogotá no longer gets dislikes, while Bancolombia is now leading this blacklist.
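Along the same lines, here is a hedged sketch of the two raw queries this step implies; it is not the author's getall_channels or getstats_video. search.list returns a channel's most recent video IDs (maxResults plays the role of topn, defaulting to 5 to match), and videos.list returns each video's statistics plus its snippet, which holds the title and description. All helper names are hypothetical. YouTube stopped exposing public dislike counts in late 2021, so current responses may omit dislikeCount; the parser returns NA for any missing count.

```r
library(jsonlite)

# Hypothetical helper: search.list request for a channel's most recent videos.
# `topn` is passed as maxResults (the API allows at most 50 per page).
latest_videos_url <- function(channel_id, api_key, topn = 5) {
  paste0("https://www.googleapis.com/youtube/v3/search",
         "?part=id&type=video&order=date",
         "&channelId=", channel_id,
         "&maxResults=", topn,
         "&key=", api_key)
}

# Extract the video IDs from a search.list response (JSON string or URL)
parse_video_ids <- function(txt) {
  resp <- fromJSON(txt, simplifyVector = FALSE)
  vapply(resp$items, function(item) item$id$videoId, character(1))
}

# Hypothetical helper: videos.list request for statistics and snippet
video_stats_url <- function(video_ids, api_key) {
  paste0("https://www.googleapis.com/youtube/v3/videos",
         "?part=statistics,snippet",
         "&id=", paste(video_ids, collapse = ","),
         "&key=", api_key)
}

# Missing fields become NA instead of breaking data.frame()
as_count <- function(x) if (is.null(x)) NA_real_ else as.numeric(x)
as_text  <- function(x) if (is.null(x)) NA_character_ else x

# One row per video, including the description used for text analysis
parse_video_stats <- function(txt) {
  resp <- fromJSON(txt, simplifyVector = FALSE)
  rows <- lapply(resp$items, function(item) {
    s <- item$statistics
    data.frame(
      video_id      = item$id,
      title         = as_text(item$snippet$title),
      description   = as_text(item$snippet$description),
      viewCount     = as_count(s$viewCount),
      likeCount     = as_count(s$likeCount),
      dislikeCount  = as_count(s$dislikeCount),
      favoriteCount = as_count(s$favoriteCount),
      commentCount  = as_count(s$commentCount),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Usage (needs network access and a valid key):
# ids   <- parse_video_ids(latest_videos_url(channels$cha_id[1], api_key))
# stats <- parse_video_stats(video_stats_url(ids, api_key))
```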

Possible Uses and Further Analysis

Since getstats_video returns each video's text description, one could apply text analysis to every description, label each output and track that account. Word clouds built from these descriptions could be very informative as well.
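As a starting point for that text analysis, here is a base-R sketch that counts the most frequent words across a set of video descriptions, which is the input a word cloud needs. The function name word_freq() is hypothetical, and its stopword list is a tiny illustrative Spanish/English sample (the banks publish mostly in Spanish), not a complete one.

```r
# Count word frequencies across a character vector of video descriptions.
# `stopwords` is a small illustrative list, not a complete one.
word_freq <- function(descriptions,
                      stopwords = c("los", "las", "que", "por", "para",
                                    "con", "una", "del", "the", "and")) {
  text <- tolower(paste(descriptions, collapse = " "))
  # Split on anything that is not a letter or digit; in a UTF-8 locale,
  # accented letters count as letters
  words <- unlist(strsplit(text, "[^[:alnum:]]+"))
  # Drop very short tokens (articles, "y", "a", ...) and listed stopwords
  words <- words[nchar(words) > 2 & !(words %in% stopwords)]
  sort(table(words), decreasing = TRUE)
}

# Usage, with `stats` from a videos.list query:
# head(word_freq(stats$description), 20)
```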

tuber or Raw Query?

The tuber package is another way to get YouTube data from within R. However, if we want to add this analysis to a Shiny/Dash app or an automated script, querying the API directly is the best way to go, judging by speed.




Written by: Jhon Parra GitHub