YouTube has become one of the leading social media and video-sharing websites, and a vital source of entertainment for millennial and Gen Z internet users. Given YouTube's importance as a trend-driving platform in modern popular culture, we decided to analyze the data for trending YouTube videos. The data was retrieved from Kaggle.
To start, we need to load the category tags from the JSON files. All the files share the same name and structure, apart from a country code at the beginning of the file name, so we can loop over the country codes to load the categories.
Datasets for countries whose languages are not Latin-based were removed from the list, since our system doesn’t support the character set encoding for these languages.
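# Library for working with JSON files
library(jsonlite)
# Collection of packages for cleaning and manipulating data (dplyr, stringr, ...)
library(tidyverse)
# Libraries for interactive plots and for saving them as HTML widgets
library(plotly)
library(htmlwidgets)
# Library for cleaning text, e.g. replacing non-ASCII characters
library(textclean)
# Library for static plots (also attached by tidyverse)
library(ggplot2)
# Library for running SQL queries against dataframes
library(sqldf)
# Library that provides kable() for rendering tables
library(knitr)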
# Define the list of countries
countries <- c('CA', 'DE', 'FR', 'GB', 'MX', 'US')
# Create a blank dataframe with the column names. This dataframe
# will have the union data at the end.
categories <- data.frame(
kind=character(),
etag=character(),
id=integer(),
channel_id=character(),
title=character(),
assignable=logical(),
country=character(),
stringsAsFactors=FALSE
)
for (country_code in countries) {
  file_path <- str_interp('YouTube_Trending/${country_code}_category_id.json')
  # load the json file
  jsonData <- fromJSON(file_path, flatten=TRUE)
  # convert the items element (third in the list) into a dataframe and rename the columns
  temp <- as.data.frame(jsonData[3]) %>% rename(
    kind = items.kind,
    etag = items.etag,
    id = items.id,
    channel_id = items.snippet.channelId,
    title = items.snippet.title,
    assignable = items.snippet.assignable
  )
  temp$id <- as.integer(temp$id)
  # add country column
  temp$country <- country_code
  # union the data into categories dataframe.
  categories <- union(categories, temp)
}
categories <- subset(categories, select=-c(kind, etag, channel_id, assignable))
kable(sample_n(categories, 10), caption = "Categories Samples")
| id | title | country |
|---|---|---|
| 24 | Entertainment | MX |
| 15 | Pets & Animals | GB |
| 31 | Anime/Animation | US |
| 42 | Shorts | DE |
| 42 | Shorts | GB |
| 35 | Documentary | CA |
| 27 | Education | CA |
| 24 | Entertainment | DE |
| 24 | Entertainment | US |
| 44 | Trailers | FR |
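Note that union() from dplyr treats its inputs as sets, so rows that are exact duplicates are dropped while the files are combined. A minimal sketch of the difference, using two hypothetical toy dataframes (a and b are made up for illustration and are not part of the dataset):

```r
library(dplyr)

# Hypothetical toy data: the row with id 2 appears in both dataframes
a <- data.frame(id = c(1L, 2L),
                title = c("Film & Animation", "Autos & Vehicles"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c(2L, 10L),
                title = c("Autos & Vehicles", "Music"),
                stringsAsFactors = FALSE)

# union() keeps each distinct row once
set_union <- union(a, b)

# union_all() and bind_rows() keep every row, including the repeated one
all_rows <- union_all(a, b)
stacked <- bind_rows(a, b)
```

Because the country column is added before the union, rows coming from different country files never collide; only rows that are identical in every column are merged.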
Next, we need to load the videos dataset from the CSV files. The following code prints a sample and a summary of the data for one country (CA).
test <- read.csv('YouTube_Trending/CAvideos.csv')
test$publish_time <- as.POSIXct(test$publish_time, format="%Y-%m-%dT%H:%M:%OS", tz='UTC')
kable(head(test), caption="Sample of videos data")
| video_id | trending_date | title | channel_title | category_id | publish_time | tags | views | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | video_error_or_removed | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n1WpP7iowLc | 17.14.11 | Eminem - Walk On Water (Audio) ft. Beyoncé | EminemVEVO | 10 | 2017-11-10 17:00:03 | Eminem|Walk|On|Water|Aftermath/Shady/Interscope|Rap | 17158579 | 787425 | 43420 | 125882 | https://i.ytimg.com/vi/n1WpP7iowLc/default.jpg | False | False | False | Eminem’s new track Walk on Water ft. Beyoncé is available everywhere: http://shady.sr/WOWEminem Best of Eminem: https://goo.gl/AquNpo\nSubscribe for more: https://goo.gl/DxCrDV\n\nFor more visit: ://eminem.com://facebook.com/eminem://twitter.com/eminem://instagram.com/eminem://eminem.tumblr.com://shadyrecords.com://facebook.com/shadyrecords://twitter.com/shadyrecords://instagram.com/shadyrecords://trustshady.tumblr.comvideo by Eminem performing Walk On Water. (C) 2017 Aftermath Records://vevo.ly/gA7xKt |
| 0dBIkQ4Mz1M | 17.14.11 | PLUSH - Bad Unboxing Fan Mail | iDubbbzTV | 23 | 2017-11-13 17:00:00 | plush|bad unboxing|unboxing|fan mail|idubbbztv|idubbbztv2|things|best|packages|plushies|chontent chop | 1014651 | 127794 | 1688 | 13030 | https://i.ytimg.com/vi/0dBIkQ4Mz1M/default.jpg | False | False | False | STill got a lot of packages. Probably will last for another year. On a side note, more 2nd channel vids soon. editing with premiere from now on, gon’ be a tedious transition, but i think it’s for the best. __â–º http://www.youtube.com/subscription_center?add_user=iDubbbztv\n\nMain Channel â–º https://www.youtube.com/user/iDubbbzTV\nSecond Channel â–º https://www.youtube.com/channel/UC-tsNNJ3yIW98MtPH6PWFAQ\nGaming Channel â–º https://www.youtube.com/channel/UCVhfFXNY0z3-mbrTh1OYRXA\n\nWebsite â–º http://www.idubbbz.com/\n\nInstagram â–º https://instagram.com/idubbbz/\nTwitter â–º https://twitter.com/Idubbbz\nFacebook â–º http://www.facebook.com/IDubbbz\nTwitch â–º http://www.twitch.tv/idubbbz\n_ |
| 5qpjK5DgCt4 | 17.14.11 | Racist Superman | Rudy Mancuso, King Bach & Lele Pons | Rudy Mancuso | 23 | 2017-11-12 19:05:24 | racist superman|rudy|mancuso|king|bach|racist|superman|love|rudy mancuso poo bear black white official music video|iphone x by pineapple|lelepons|hannahstocking|rudymancuso|inanna|anwar|sarkis|shots|shotsstudios|alesso|anitta|brazil|Getting My Driver’s License | Lele Pons | 3191434 | 146035 | 5339 | 8181 | https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg | False | False | False | WATCH MY PREVIOUS VIDEO â–¶ â–º https://www.youtube.com/channel/UC5jkXpfnBhlDjqh0ir5FsIQ?sub_confirmation=1\n\nTHANKS FOR WATCHING! LIKE & SUBSCRIBE FOR MORE VIDEOS!———————————————————–ME ON: | http://instagram.com/rudymancuso\nTwitter | http://twitter.com/rudymancuso\nFacebook | http://facebook.com/rudymancuso\n\nCAST: Mancuso | http://youtube.com/c/rudymancuso\nLele Pons | http://youtube.com/c/lelepons\nKing Bach | https://youtube.com/user/BachelorsPadTv\n\nVideo Effects: Natale | https://instagram.com/calebnatale\n\nPA:\nPaulina GregoryStudios Channels:| https://youtube.com/c/alesso\nAnitta | http://youtube.com/c/anitta\nAnwar Jibawi | http://youtube.com/c/anwar\nAwkward Puppets | http://youtube.com/c/awkwardpuppets\nHannah Stocking | http://youtube.com/c/hannahstocking\nInanna Sarkis | http://youtube.com/c/inanna\nLele Pons | http://youtube.com/c/lelepons\nMaejor | http://youtube.com/c/maejor\nMike Tyson | http://youtube.com/c/miketyson Mancuso | http://youtube.com/c/rudymancuso\nShots Studios | http://youtube.com/c/shots\n\n#Rudy\n#RudyMancuso |
| d380meD0W0M | 17.14.11 | I Dare You: GOING BALD!? | nigahiga | 24 | 2017-11-12 18:01:41 | ryan|higa|higatv|nigahiga|i dare you|idy|rhpc|dares|no truth|comments|comedy|funny|stupid|fail | 2095828 | 132239 | 1989 | 17518 | https://i.ytimg.com/vi/d380meD0W0M/default.jpg | False | False | False | I know it’s been a while since we did this show, but we’re back with what might be the best episode yet!your dares in the comment section! my book how to write good ://higatv.com/ryan-higas-how-to-write-good-pre-order-links/Launched New Official Store://www.gianthugs.com/collections/ryanChannel://www.youtube.com/higatv://www.twitter.com/therealryanhiga://www.facebook.com/higatv://www.higatv.com://www.instagram.com/notryanhigaus mail or whatever you want here!Box 232355Vegas, NV 89105 |
| 2Vv-BfVoq4g | 17.14.11 | Ed Sheeran - Perfect (Official Music Video) | Ed Sheeran | 10 | 2017-11-09 11:04:14 | edsheeran|ed sheeran|acoustic|live|cover|official|remix|official video|lyrics|session | 33523622 | 1634130 | 21082 | 85067 | https://i.ytimg.com/vi/2Vv-BfVoq4g/default.jpg | False | False | False | 🎧: https://ad.gt/yt-perfect\n💰: https://atlanti.cr/yt-album\nSubscribe to Ed’s channel: http://bit.ly/SubscribeToEdSheeran\n\nFollow Ed on…: http://www.facebook.com/EdSheeranMusic\nTwitter: http://twitter.com/edsheeran\nInstagram: http://instagram.com/teddysphotos\nOfficial Website: http://edsheeran.com\n\nDirector: Jason Koenig: Honna Kimmerer: Ed Sheeran & Zoey Deutch of Photography: Johnny ValenciaCompany: Anonymous ContentProducer: Nina SorianoManager: Doug Hoff: Dan CurwinDesigner: John LavinCasting: Amy Hubbard by: Jason Koenig, Ed Sheeran, Andrew Kolvet, Jenny Koenig, Murray Cummingsby: Jason Koenig & Johnny Valencia: Ian Hubert: Bo Valencia, Dennis Ranalta, Arthur PauliCinematography: Corey KoniniecCamera op: Ryan Haug1st AC: Ryan Brown1st Assistant Director: Ole ZapatkaDirector: Klaus Hartlfx: Lucien Stephenson: Thomas Berz: Claudia Lajda& Makeup: Christel ThoresenCasting: Ursula KiplingerVFX: ZoicThanks to: The Hintertux Glacier, Austria;Tenne, and Hotel Neuhintertux |
| 0yIWz1XEeyc | 17.14.11 | Jake Paul Says Alissa Violet CHEATED with LOGAN PAUL! #DramaAlert Team 10 vs Martinez Twins! | DramaAlert | 25 | 2017-11-13 07:37:51 | #DramaAlert|Drama|Alert|DramaAlert|keemstar|youtube news|jake paul|team 10|alissa violet|cheated|logan paul|logan paul alissa violet|jake paul alissa violet|Martinez Twins|left team 10|faze banks|erika costell | 1309699 | 103755 | 4613 | 12143 | https://i.ytimg.com/vi/0yIWz1XEeyc/default.jpg | False | False | False | â–º Follow for News! - https://twitter.com/KEEMSTAR\n\nâ–º Also follow #DramaAlert on:‹† Instagram: https://instagram.com/DramaAlert\n⋆ Twitter: https://twitter.com/DramaAlert\n⋆ Facebook: https://facebook.com/DramaAlert\n\nâ–º Follow for livestreams! - https://twitch.tv/KEEMSTAR\n\nâ–º KEEM Merch://keem.shirtz.cool–º USE CODE (KEEM)://gfuel.com/pages/keemstarin the Woods! (OUT NOW)–º iTunes://itunes.apple.com/us/album/dollar-in-the-woods-single/id1295414119https://itunes.apple.com/us/album/dollar-in-the-woods-single/id1295414119–º Spotify ://open.spotify.com/track/3uUHoKWqPbJ5qoREGbguC9?si=v4CgSBBR–º YouTube (Music Video)://youtu.be/n38Qxi7TVWo! (My New Game)–º Apple (iOS)://itunes.apple.com/us/app/the-adpocalypse/id1263621591–º Android://play.google.com/store/apps/details?id=com.projectorgames.howtogetahead |
kable(summary(test), caption="Summary of videos data")
| video_id | trending_date | title | channel_title | category_id | publish_time | tags | views | likes | dislikes | comment_count | thumbnail_link | comments_disabled | ratings_disabled | video_error_or_removed | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:40881 | Length:40881 | Length:40881 | Length:40881 | Min. : 1.0 | Min. :2008-01-13 01:32:16 | Length:40881 | Min. : 733 | Min. : 0 | Min. : 0 | Min. : 0 | Length:40881 | Length:40881 | Length:40881 | Length:40881 | Length:40881 |
| Class :character | Class :character | Class :character | Class :character | 1st Qu.:20.0 | 1st Qu.:2018-01-02 14:21:05 | Class :character | 1st Qu.: 143902 | 1st Qu.: 2191 | 1st Qu.: 99 | 1st Qu.: 417 | Class :character | Class :character | Class :character | Class :character | Class :character |
| Mode :character | Mode :character | Mode :character | Mode :character | Median :24.0 | Median :2018-02-24 23:00:01 | Mode :character | Median : 371204 | Median : 8780 | Median : 303 | Median : 1301 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character |
| NA | NA | NA | NA | Mean :20.8 | Mean :2018-02-24 08:05:59 | NA | Mean : 1147036 | Mean : 39583 | Mean : 2009 | Mean : 5043 | NA | NA | NA | NA | NA |
| NA | NA | NA | NA | 3rd Qu.:24.0 | 3rd Qu.:2018-04-23 01:48:54 | NA | 3rd Qu.: 963302 | 3rd Qu.: 28717 | 3rd Qu.: 950 | 3rd Qu.: 3713 | NA | NA | NA | NA | NA |
| NA | NA | NA | NA | Max. :43.0 | Max. :2018-06-14 02:25:38 | NA | Max. :137843120 | Max. :5053338 | Max. :1602383 | Max. :1114800 | NA | NA | NA | NA | NA |
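The sample shows that we need to do the following cleaning steps:
- comments_disabled, ratings_disabled and video_error_or_removed: these are read as text, so we will convert them to boolean (logical).
- trending_date: it is stored as text in yy.dd.mm format, so we will parse it into a date.
- publish_time: although it can be parsed as a POSIXct type to get the timestamp value, we will cast it to date since we will only use the date value.

countries <- c('CA', 'DE', 'FR', 'GB', 'MX', 'US')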
# countries <- c('CA')
videos <- data.frame(
ratings_disabled=logical(),
publish_time=as.Date(character()),
video_error_or_removed=logical(),
comment_count=integer(),
description=character(),
title=character(),
views=integer(),
trending_date=as.Date(character()),
thumbnail_link=character(),
category_id=integer(),
likes=integer(),
channel_title=character(),
comments_disabled=logical(),
video_id=character(),
dislikes=integer(),
tags=character(),
country=character(),
stringsAsFactors=FALSE
)
for (country_code in countries) {
  temp <- read.csv(str_interp('YouTube_Trending/${country_code}videos.csv'))
  # the flags are read as "True"/"False" text
  temp$video_error_or_removed <- as.logical(temp$video_error_or_removed)
  temp$ratings_disabled <- as.logical(temp$ratings_disabled)
  temp$comments_disabled <- as.logical(temp$comments_disabled)
  # keep only the date part of the publish timestamp
  temp$publish_time <- as.Date(temp$publish_time, "%Y-%m-%d")
  # trending_date is stored as yy.dd.mm
  temp$trending_date <- as.Date(temp$trending_date, '%y.%d.%m')
  temp$country <- country_code
  # union the data into the videos dataframe
  videos <- union(videos, temp)
}
kable(sample_n(videos, 14), caption='Sample clean videos data')
| ratings_disabled | publish_time | video_error_or_removed | comment_count | description | title | views | trending_date | thumbnail_link | category_id | likes | channel_title | comments_disabled | video_id | dislikes | tags | country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | 2018-03-03 | FALSE | 845 | Meet Sparta, the world’s smallest cat. “Aww, small animals are so cute!†you might say but while most tiny animals are cute this one is mean. They don’t call him “The Mean Kitty†for nothing! Subscribe to TheMeanKitty: http://bit.ly/SubTmk\n\nBUY THE MEAN KITTY BOOK:://goo.gl/2QZ2TVMOREPopular: https://www.youtube.com/playlist?list=PLBBAC287EFA11E6AB\nSparta: https://www.youtube.com/playlist?list=PLsYvFuldmOwcsums4NYT4ZFt4p4OTiz5S\nLoki: https://www.youtube.com/playlist?list=PLsYvFuldmOwdbU1ZadPxB_YBFEt_H_nHm\nSongs: https://www.youtube.com/playlist?list=PLsYvFuldmOwdhEOquQifX0_uZ853B1fql\nBroken Cats: https://www.youtube.com/playlist?list=PLsYvFuldmOwcKofAXnRtyHIrYLzu6vkV6\n\n\nFollow TheMeanKitty on Socials:: https://www.facebook.com/TheRealMeanKitty/\nTwitter: https://twitter.com/TheMeanKitty\n\n\nCATS WITH ATTITUDES!- from viral hit The Mean Kitty Song which now has over 80 million views to date. He’s a Bengal mix, born sometime in mid-2007; I celebrate his b-day on May 20th. He rescued me in July of 2007. Sparta love stalking, wrestling, crunchy toys, playing fetch and being held like a baby.- the long white kitty that looks like a cow but I believe he may be part monkey. Loki is about the same age as Sparta, so we celebrate their birthdays together. He found me at a rescue center in July of 2008. He loves toys, making noise, hanging upside down, poking things and more than anything, he loves the love! 
| World’s Smallest Cat - Cute, Tiny and Mean | 311963 | 2018-03-10 | https://i.ytimg.com/vi/7HIt7pA4WqY/default.jpg | 15 | 5003 | TheMeanKitty | FALSE | 7HIt7pA4WqY | 1337 | worlds smallest cat|small cat|tiny cat|rusty spotted cat|smallest cat in the world|world’s smallest cat video|world’s smallest cat ever|world’s smallest cat species|smallest cats ever|smallest cat breed|smallest cat species|smallest|smallest animals in the world|smallest animals ever|smallest animals|tiniest|smallest things ever|world’s smallest animals|smallest animal|small animals|smallest animal on earth|smallest animals on earth|themeankitty | GB |
| FALSE | 2017-11-11 | FALSE | 860 | ሰላሠáŠá‹š ቪድዮ ካብዚ ዳዉሎድ áˆáŒá‰£áˆ ን ኣáˆá‰²áˆµ ብዙሕ ጉዳት ሲለ ዘለዎ ብáŠá‰¥áˆ¨á‰µáŠ©áˆ áŠ«á‰¥á‹š ቻáŠáˆ áŠá‹š ቪዲዮ ዳዉሎድ áˆáŒá‰£áˆ áˆá‹áˆáŒ‹áˆ•ን ኩáˆáŠ©áˆ áŠ¥á‹©á¢„— © Copyright: Shalom Entertainment | New Eritrean film Dama (ዳማ) part 13 Shalom Entertainment 2017 | 260199 | 2017-11-14 | https://i.ytimg.com/vi/1YhtkrC2t0c/default.jpg | 22 | 4833 | Shalom Entertainment | FALSE | 1YhtkrC2t0c | 302 | [none] | CA |
| FALSE | 2018-01-27 | FALSE | 1775 | Get 25% off the Solid Gold Calendar with coupon code LIMCAL25 (through this Sunday only!) â–º http://solidgoldaquatics.com/product/autographed-2018-solid-gold-calendar-monthly-giveaways/\n\nWatch the Aquascape team as they expertly construct a beautiful goldfish pond in my backyard! Pond Guy’s channel â–º https://www.youtube.com/user/ThePondGuyAquascape\n\nLearn more about Aquascape â–º https://www.aquascapeinc.com/\n\nTropical Water Gardens (my local certified Aquascape contractor) â–º https://www.tropicalwatergardens.com/ __________to my blog post â–º http://solidgoldaquatics.com/2018/01/27/my-epic-new-fish-pond/\n__________\n\nNEW VIDEOS FRIDAYS (and sometimes Tuesdays)!â–º http://www.youtube.com/subscription_center?add_user=flashofpink\nWebsite â–º http://www.solidgoldaquatics.com\nFacebook â–º https://www.facebook.com/solidgoldaquatics\nInstagram â–º http://instagram.com/solidgoldaquatics\nSnapchat â–º solidgoldaquaâ–º https://twitter.com/solidgoldaqua\n__________\n\nBecome a Solid Gold Member! â–º http://solidgoldaquatics.com/membership-join/\n__________\n\nITEMS IN THIS VIDEO: Aquascape Pond Kit â–º http://amzn.to/2FoyAx1 (affiliate link)Patio Pond â–º http://amzn.to/2Eg9qRQ (affiliate link) __________: Funky World by Geoxor: https://soundcloud.com/geoxor_official/funky-world and Toy Houses by Joey Pecoraro: https://soundcloud.com/joeypecoraro/toy-houses\n__________\n\nThis video description contains Amazon affiliate links, which means that if you click on the link and purchase the item, Amazon will give me a small percentage of the sale. Using my Amazon links for useful pet care products that I recommend is an easy way you can help support what I’m doing here at Solid Gold. | MY EPIC NEW FISH POND! 
| 298322 | 2018-02-03 | https://i.ytimg.com/vi/nUDX6OCXEyc/default.jpg | 15 | 9817 | Solid Gold Aquatics | FALSE | nUDX6OCXEyc | 310 | backyard pond|how to build a pond|how to|pond|fish pond|koi pond|pond waterfall|waterfall|water feature|tutorial|step by step|aquascape pond|building a pond|aquascape inc|the pond guy|Greg Wittstock|red eared slider|red-eared slider|turtle|turtle pond|patio pond|pet turtle|cuff and link|Jennifer Lynx|Solid Gold Aquatics|DIY|do it yourself | US |
| FALSE | 2018-04-12 | FALSE | 9439 | â–º Listen LIVE: http://power1051fm.com/\nâ–º Facebook: https://www.facebook.com/Power1051NY/\nâ–º Twitter: https://twitter.com/power1051/\nâ–º Instagram: https://www.instagram.com/power1051/ | Bow Wow Talks #BowWowChallenge And Addresses Rumors In His Last Radio Interview | 1479454 | 2018-04-14 | https://i.ytimg.com/vi/aCAhbjLKkt4/default.jpg | 24 | 18380 | Breakfast Club Power 105.1 FM | FALSE | aCAhbjLKkt4 | 1764 | the breakfast club|power1051|celebrity news|radio|video|interview|angela yee|charlamagne tha god|dj envy|bow wow|lil bow wow|shad moss|bowwowchallenge|bow wow challenge | GB |
| FALSE | 2018-04-21 | FALSE | 1106 | ذا Ùويس العرض المباشر الثالث | شيماء من المغرب | اختارها ØÙ…اقى | 21-4- The Voice 2018°Ø§ Ùويس 2018 مرØÙ„Ø© المواجهة | The Voice °Ø§ Ùويس الصوت Ùˆ بس | The Voice 2018‚ناة تقدم مضمونا متنوعا Ùˆ متميزا من ØÙŠØ« الشكل Ùˆ المضمون لإرضاء جميع الأذواق. | ذا Ùويس العرض المباشر الثالث | شيماء من المغرب | اختارها ØÙ…اقى | 21-4- The Voice 2018 | 395924 | 2018-04-23 | https://i.ytimg.com/vi/TvYfboCx-cE/default.jpg | 24 | 4969 | New life | FALSE | TvYfboCx-cE | 289 | ذاÙويس“|”كيدز“|”الاهلى“|”الزمالك“|”Ù„ÙŠÙØ±Ø¨ÙˆÙ„“|”برشلونة“|”ريال“|”مدريد“|”ذا“|”Ùويس“|”ØÙ„قات“|”الصوت“|”Ùˆ“|”بس“|”The“|”Voice“|”2018“|”مصر“|”سوريا“|”الجزائر“|”المغرب“|”تونس“|”ليبيا“|”الكويت“|”اليمن“|”لبنان“|”العراق“|”اØÙ„ام“|”اليسا“|”عاصى“|”ØÙ…اقى“|”Ù…ØÙ…د“|”صلاؓ|”بورنموث“|”ذا Ùويس العرض المباشر الثالث | شيماء من المغرب | اختارها ØÙ…اقى | 21-4- The Voice 2018 | FR |
| FALSE | 2018-02-28 | FALSE | 3697 | Thách thức danh hà i táºp 15 (gala 2) mùa 4 là sá»± trở lại cá»§a hai thà sinh được xem là tiá»m năng nhất Mỹ Ngân và Kim Hoà ng. Ở gala 2 nà y, chá»§ nhân giải thưởng 150 triệu sẽ xuất hiện, cùng theo dõi ngay để biết ai sẽ nháºn được trà ng cưá»i từ BGK nhé!¡ch thức danh hà i mùa 4 phát sóng và o lúc 20h30 thứ 4 hà ng tuần trên HTV7.¡ch thức danh hà i mùa 4 đã trở lại vá»›i sá»± góp mặt cá»§a bá»™ đôi Giám khảo Trấn Thà nh, Trưá»ng Giang và MC-Danh Hà i Tiến Luáºt cÅ©ng những thà sinh được tuyển chá»n trong toà n quốc.¡ch thức danh hà i trân trá»ng cảm Æ¡n Táºp Ä‘oà n Ä‘iện tá» ASANZO nhà tà i trợ chÃnh đã đồng hà nh cùng chương trình‘‰ Fanpage Thách thức danh hà i: https://www.facebook.com/thachthucdan… | Thách thức danh hà i 4 |gala 2: Hết “bá» túi†100 triệu, cô sinh viên Nông Lâm lại “ẵm trá»n†150 triệu | 3337702 | 2018-03-01 | https://i.ytimg.com/vi/Fi_RxjAKurw/default.jpg | 24 | 21370 | DIEN QUAN Comedy / Hà i | FALSE | Fi_RxjAKurw | 2504 | thách thức danh hà i|thach thuc danh hai|trấn thà nh|trưá»ng giang|tran thanh|truong giang|trấn thà nh thách thức danh hà i|trưá»ng giang thách thức danh hà i|tran thanh thach thuc danh hai|truong giang thach thuc danh hai|thách thức danh hà i gala 2|thach thuc danh hai gala 2|thách thức danh hà i gala 2 full|thach thuc danh hai gala 2 full|dienquan|htv thach thuc danh hai|thach thuc danh hai tap 15|thách thức danh hà i táºp 15 full | CA |
| FALSE | 2018-04-04 | FALSE | 208 | Aujourd’hui, je mélange 25 sortes de céréales et je fais un gâteau avec ça! C’est le défi que Huby m’a lancé 😜-toi à la chaîne de Huby: http://bit.ly/2kOAMDS\nInstagram d’Isaac: http://bit.ly/2A4lCEX\n\n💫 Like la vidéo et abonne-toi ici: http://bit.ly/1nAJhmi\n\nâœ”ï¸ Active les notifications en cliquant sur la 🔔 pour voir toutes mes vidéos et partage avec tes amis! ’« Suis-moi sur mes réseaux sociaux:: http://bit.ly/2mK60S3\nINSTAGRAM PRO: http://bit.ly/2zkxyDr\nSNAPCHAT : carlarsenault: http://bit.ly/2gZQDOL\nTWITTER : http://bit.ly/2he9uVf\n\nMa nouvelle collection de t-shirts👕: http://fybr.fr/45-carl-is-cooking\n\nMon livre📕: http://amzn.to/2nf4YM9\n\nTu auras besoin: g de chapelure de céréales g de beurre fondu g de fromage à la crème g de yaourt nature g de purée de framboise g de sucre g d’agar agarpour la décoque j’utilise pour filmer:©ra fixe: http://amzn.to/2mFfAmO\nCaméra mobile: http://amzn.to/2Dj1vVA\nRing flash: http://amzn.to/2DecpwZ\n\nN'OUBLIEZ PAS, VOUS ÊTES LES MEILLEURS! ðŸªðŸ’™Â© Carl is cooking 2015-2018 | JE FAIS UN GÂTEAU AVEC 25 SORTES DE CÉRÉALES | 14640 | 2018-04-05 | https://i.ytimg.com/vi/Hk7WucXqqQY/default.jpg | 26 | 1208 | Carl is cooking | FALSE | Hk7WucXqqQY | 18 | gateau 25 sortes de céréales“|”gateau xxl“|”recette xxl“|”dessert xxl“|”degustation américaine“|”dégustation américain“|”dégustation américaine en français“|”degustation quebecoise“|”je mélange“|”carl is cooking“|”huby“|”degustation canadienne“|”recette cheesecake“|”recette cheesecake facile“|”cheesecake“|”cheesecake sans cuisson“|”cheesecake sans cuisson sans gelatine“|”cheesecake sans gélatine“|”cheesecake framboise“|”recette cheesecake philadelphia“|”recette cheesecake sans cuisson | FR |
| FALSE | 2018-04-05 | FALSE | 24053 | Here’s the original video of Haru the Shiba Inu eating popcorn: https://www.youtube.com/watch?v=Vf_4juIwad0\n\nHere's Julien’s tribute video to #ad: https://www.youtube.com/watch?v=4UO9E3Tb1Qc&t=1s\n\nPlease subscribe to my channel and my vlog channel! I make new videos here every Wednesday and make vlogs during my majestical daily life. ://www.youtube.com/JennaMarbles://www.youtube.com/JennaMarblesVlogour weekly podcast ://www.youtube.com/user/JennaJulienPodcast://www.twitch.tv/jennajulienpast gaming from Twitch to Jenna Julien Games://www.youtube.com/channel/UC_Z0x662N1VUN9J7FYwCwkg:: ://www.facebook.com/pages/Jenna-Mourey/311917224927:://twitter.com/#!/Jenna_Marbles_Marbles://twitter.com/charlesmarbles://twitter.com/kermit_thedog_thedog:://jennamarblesblog.com/shop :://www.jennamarblesblog.com/: ://jennamarbles.tumblr.com/://instagram.com/JennaMarbles | My Dogs Eating Popcorn ASMR | 2437584 | 2018-04-17 | https://i.ytimg.com/vi/cYxzLGk2NcY/default.jpg | 23 | 167781 | JennaMarbles | FALSE | cYxzLGk2NcY | 7006 | jenna|marbles|mourey|dog asmr|my dogs|eating|eat|popcorn|asmr|popcorn asmr|cute|funny|kermit|peach|italian greyhound|chihuahua|chewing|munching|whisper|ad|hamster|julien|solomita|boyfriend|vlog|adorable|puppy|best|awesome|funniest|cutest|relaxing|tantrum|microphone|soothing|cermet|paesh|annoy|puppies|shiba|inu|quiet|channel|dog vlog | US |
| FALSE | 2018-01-10 | FALSE | 13529 | Lots to talk about, you Beautiful Bastards. Let’s jump into it… save yourself some money w/ TING!: http://phil.ting.com\nI Quit. We’re Starting Something New… : https://youtu.be/OYLqrH6YjLw\nWOW! I Didn’t Even Think Of That… : https://youtu.be/9lAoLe7tr60\nNew TheDeFrancoFam!: https://youtu.be/Cp5G0JFqoX0\n————————————to support the show, AND get cool stuff?!€”———————————up to http://DeFrancoElite.com to get early vlogs, bonus videos, exclusive livestreams, exclusive posters and mugs, and private Discord access.up for Postmates (Awesome Food/Drink Delivery) use code PhillyD and get $100 Free Delivery Credit: http://PostDeFranco.com\nInterested in Bitcoin? Sign up for Coinbase (Awesome way to Buy/Sell/Store Bitcoin/Etherium/Litecoin) and get $10 worth of Bitcoin with your first $100 deposit: https://www.coinbase.com/join/593e99f483ace31d47c4ba5b\nWHERE DID YOU GET THAT DOPE SHIRT?!: http://ShopDeFranco.com\n————————————Two DeFranco Shows!€”———————————! Video Exposes Teacher Being Slammed By Police For Criticizing School Board and More…: https://youtu.be/UWUmHI3GyTU\nWOW! Google Sued For Discriminating Against White Men and Bella Thorne’s Abuse Story Makes Waves..: https://youtu.be/zViKDlOzK0s\n————————————IN AWESOME!€”———————————the Dog to their Owner: https://youtu.be/rYzLh2QuraQ\nBad Lip Reading Trump Anthem: https://youtu.be/Zo_mpwmashg\nDunkey’s Best of 2017: https://youtu.be/P6ODTQKhaXk\nHonest Game Trailer Sonic Forces: https://youtu.be/SkerOgIfpGU\nWanna Work With Us? 
Here’s How!: https://twitter.com/GrownWomanChild/status/951104165161312256\nSecret Link: https://youtu.be/obkLDeO58Wo\n————————————:€”———————————Paul Dead Body Video Controversy: Coverage: https://youtu.be/ZAyvEft9MIs\nhttps://youtu.be/4_FHvf9typs\nUPDATE: ://www.polygon.com/2018/1/10/16873340/youtube-logan-paul-statement-consequence-channel://www.rollingstone.com/culture/news/logan-paul-youtube-looking-into-further-consequences-w515291://abcnews.go.com/Entertainment/youtube-responds-controversial-logan-paul-video/story?id=52254465-Ian’s Video ‘How to Get Views Like Logan Paul’: https://youtu.be/Q-iacolSpi8\n\nNorth Carolina Gerrymandering: ://time.com/5096431/north-carolina-voting-districts-gerrymandering/://www.washingtonpost.com/news/morning-mix/wp/2018/01/10/federal-court-voids-north-carolinas-gop-drawn-congressional-map-for-partisan-gerrymandering/?utm_term=.cdc1967d262e://www.nytimes.com/2018/01/09/us/north-carolina-gerrymander.html?_r=1://www.foxnews.com/politics/2018/01/09/north-carolina-congressional-map-illegally-gerrymandered-judges-rule.html://www.wral.com/court-throws-out-nc-congressional-map-again/17245449/ ://www.documentcloud.org/documents/4345694-North-Carolina-partisan-gerrymandering-opinion.html#search/p17/electing%20republicans%20is%20betterLeaves Breitbart: ://www.breitbart.com/big-government/2018/01/09/stephen-k-bannon-steps-breitbart-news-network/://www.nytimes.com/2018/01/09/us/politics/steve-bannon-breitbart-trump.html://www.cnn.com/2018/01/07/politics/read-bannon-full-statement/index.html://www.politico.com/story/2018/01/09/bannon-steps-down-from-breitbart-news-329603://www.newsweek.com/top-20-revelations-trump-fire-and-fury-book-about-golden-showers-ivanka-bannon-769899://www.foxnews.com/politics/2018/01/09/steve-bannon-steps-down-as-executive-chairman-breitbart-news.html://thehill.com/homenews/media/368264-fox-spokesperson-fox-news-will-not-be-hiring-steve-bannon€”———————————listen on the go? 
-ITUNES: http://PDSPodcast.com\n-SOUNDCLOUD: https://soundcloud.com/thephilipdefrancoshow\n————————————: http://on.fb.me/mqpRW7\nTWITTER: http://Twitter.com/PhillyD\nINSTAGRAM: https://instagram.com/phillydefranco/\nSNAPCHAT: TheDeFrancoFam: https://www.reddit.com/r/DeFranco\n\n————————————by:Girardier: https://twitter.com/jamesgirardier\n\n\nProduced by:Morones - https://twitter.com/MandaOhDang\n\nMotion Graphics Artist:Borst - https://twitter.com/brianjborst\n\nP.O. BOX: Philip DeFranco Ventura BlvdD #542, CA 91436 | Youtube’s RIDICULOUS New Response To The Logan Paul Scandal Reveals a Huge Problem and More… | 2048788 | 2018-01-12 | https://i.ytimg.com/vi/C-ePy-2WLfY/default.jpg | 24 | 103017 | Philip DeFranco | FALSE | C-ePy-2WLfY | 2705 | logan paul youtube“|”Logan Paul“|”sxephil“|”philip defranco“|”DeFranco“|”philip defranco show“|”YouTube“|”the philip defranco show“|”demonetization“|”Adpocalypse“|”Logan Paul Suicide“|”logan paul apology“|”pewdiepie“|”pewdiepie nazi“|”Ian Kung“|”North Carolina“|”Gerrymandering“|”David Lewis“|”Republicans“|”Democrats“|”Alabama“|”Roy Moore“|”Doug Jones“|”Steve Bannon“|”Trump“|”Donald Trump“|”Paul Manafort“|”Fire and Fury“|”Michael Wolff“|”Breitbart“|”news“|”us news“|”logan paul vlog“|”logan“|”paul“|”suicide apology | FR |
| FALSE | 2018-05-15 | FALSE | 100 | | Reto 4 Elementos Episodio 34 Lunes 14 de Mayo 2018 Parte 1 | 72949 | 2018-05-15 | https://i.ytimg.com/vi/LZ5YHRqRda4/default.jpg | 22 | 252 | React Uni3z | FALSE | LZ5YHRqRda4 | 142 | [none] | MX |
| FALSE | 2017-12-24 | FALSE | 334 | HERKEZE MERHABALAR SİZLERE MAÇ ÖZETLERİNİ EN HIZLI BİR ŞEKİLDE SUNMAYA ÇALIŞACAĞIZ EMEĞE KARŞILIK OLARAK 2SANİYENİZİ AYIRARAK ABONE OLUR LİKE ATARSANIZ ÇOK MUTLU OLURUZ İYİ SEYİRLER !göztepe,Galatasaray Göztepe,galatasaray göztepe maç özeti,galatasaray göztepe mac ozeti,galatasaray 3 göztepe 1,gs 3 göztepe 1,galatasaray goztepe hd maç özeti,galatasaray göztepe maç özeti izle,Yasin Öztekin,Galatasaray,galatasaray goztepe izle,Galatasaray Göztepe Maç Özeti izle,galatasaray 3 goztepe 1,galatasaray goztepe mac ozeti izle,Göztepe,Galatasaray Göztepe beınsport,Galatasaray 3 göztepe 1,Göztepe galatasaray izle,galatasaray goztepe mac,HD | Galatasaray 3-1 Göztepe -HD Maç Özeti - 24/12/2017 | 226163 | 2017-12-25 | https://i.ytimg.com/vi/jekzs98AanI/default.jpg | 17 | 1034 | Yasin Kayrancı | FALSE | jekzs98AanI | 202 | galatasaray göztepe|Galatasaray Göztepe|galatasaray göztepe maç özeti|galatasaray göztepe mac ozeti|galatasaray 3 göztepe 1|gs 3 göztepe 1|galatasaray goztepe hd maç özeti|galatasaray göztepe maç özeti izle|Yasin Öztekin|Galatasaray|galatasaray goztepe izle|Galatasaray Göztepe Maç Özeti izle|galatasaray 3 goztepe 1|galatasaray goztepe mac ozeti izle|Göztepe|Galatasaray Göztepe beınsport|Galatasaray 3 göztepe 1|Göztepe galatasaray izle|galatasaray goztepe mac|HD | DE |
| FALSE | 2018-03-06 | FALSE | 1631 | беÑплатные тарталетки Ñ Ð¼Ð°Ð»Ð¸Ð½Ð¾Ð¹ к заказу от 1500 до 15.03 , Ñкидка 40% без ограничений по Ñроку на 3 меÑÑца.€Ð¾Ð¼Ð¾ÐºÐ¾Ð´ oblomoff8://cheese-cake.ru“руппа Ð’Ñ‹ чо мне привезли? - ://vk.com/foodfails“руппа ВК-://vk.com/atpiska§Ð¸Ñто рецепты -://vk.com/club103827516¾Ð¹ инÑтаграмчик-://www.instagram.com/oblomoffood/±Ð·Ð¾Ñ€Ñ‹ техники-://www.youtube.com/user/muhanesidela’идео-бложик-://www.youtube.com/user/oblomoffstuff | Ищем ДОСТОЙÐЫЕ Ñуши на БÐЛИ! #СлавноеБали | 598186 | 2018-03-07 | https://i.ytimg.com/vi/du0cM_dBRiQ/default.jpg | 24 | 23211 | oblomoff | FALSE | du0cM_dBRiQ | 1718 | Ñлавный друже|обзор Ñуши|Ñуши|обзор|еда|роллы|обзор доÑтавки|друже|доÑтавка еды|ÑпонÑÐºÐ°Ñ ÐºÑƒÑ…Ð½Ñ|обзоры доÑтавок|ÑпонÑÐºÐ°Ñ ÐµÐ´Ð°|bali|руÑÑкие на бали|sashimi|japanese food|путешеÑтвиÑ|ролы|реÑтораны бали|джимбаран|еда на бали|индонезиÑ|bali food|реÑторан|где поеÑть на бали|где покушать на бали|оÑтров бали|отдых на бали|индонезийÑÐºÐ°Ñ ÐºÑƒÑ…Ð½Ñ|азиÑ|туризм|bali restaurants|цены на бали|Ñказочное бали|Ð¸Ð½Ð´Ð¾Ð½ÐµÐ·Ð¸Ñ Ð±Ð°Ð»Ð¸|ÐºÑƒÑ…Ð½Ñ Ð½Ð° бали|jimbaran|отзывы о реÑторанах | DE |
| FALSE | 2018-04-01 | FALSE | 408 | LUP Mexico Andre Marin Alex Aguinaga Gustavo Mendoza Salim SombraJornada 13 Liga Mx Aguilas del America Derrota a la Maquina de Cruz Azul Conferencia Caixinha. si le vas a Cruz Azul tienes Gana de Cambiar de Equipo? Goles Resumen.Raul Jimenez Rabona | La Ultima Palabra - America le Gana a Cruz Azul, Toluca Lider, Tigres Golea a Leon | 182783 | 2018-04-02 | https://i.ytimg.com/vi/uYaoNKY9tr0/default.jpg | 17 | 949 | Los Amos del Periodismo Deportivo 2 | FALSE | uYaoNKY9tr0 | 111 | [none] | MX |
| FALSE | 2018-06-11 | FALSE | 1403 | DOWNLOAD ONEFOOTBALL APP FOR FREE NOW: https://tinyurl.com/Shpendi10CFC----------------------------------------Â--------------------------[LIVE NOW] Belgium vs Costa Rica Live Stream (LIVE NOW) Belgium vs Costa Rica Live Stream Belgium vs Costa Rica 4-1 All Goals and Highlights with English Commentary 2017-18 HD 720pBelgium vs Costa Rica 4 - 1 â— All Goals | 2016/17 [HD] Belgium vs Costa Rica 4-1 Goal De Bruyne 11/06/2018 |HD|Belgium vs Costa Rica 4-1 Goal Lukaku 11/06/2018 |HD| Belgium vs Costa Rica 4-1 All Goals & Highlights 11/06/2018Belgium vs Costa Rica 4:1 2018 - Match Preview 11/06/2018 HD Belgium vs Costa Rica 4-1 All Goals & Highlights 11/06/2018 HDStay with me !https://www.youtube.com/Shpendi10CFChttps://www.twitter.com/ShpendZhubihttps://www.instagram.com/ShpendZhubi | Belgium vs Costa Rica 4-1 - All Goals & Extended Highlights - Friendly 11/06/2018 HD | 2133596 | 2018-06-13 | https://i.ytimg.com/vi/g4a4Mez2M8o/default.jpg | 17 | 11397 | Shpendi10CFC | FALSE | g4a4Mez2M8o | 992 | Belgium vs Costa Rica|Belgium vs Costa Rica 2018|Belgium vs Costa Rica 4-1|Belgium vs Costa Rica highlights|Belgium vs Costa Rica all goals|Belgium vs Costa Rica goals highlights|Costa Rica|Belgium|Romelu Lukaku|Batshuayi|Mertens|Hazard | DE |
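The conversions used in the cleaning loop can be checked on single values. The sketch below uses only base R; the trending date string is copied from the first sample row, while the publish time string is an assumed raw value written in the layout implied by the format string used when loading the CA file.

```r
# trending_date is stored as yy.dd.mm, so the format must put the day before the month
trending <- as.Date("17.14.11", "%y.%d.%m")

# as.Date() ignores characters after the part matched by the format,
# so the time portion of the timestamp is dropped
published <- as.Date("2017-11-10T17:00:03", "%Y-%m-%d")

# "True" and "False" strings are accepted by as.logical()
flag <- as.logical("False")
```

Swapping %d and %m in the first format string would yield NA here, because 14 is not a valid month. That makes it a quick sanity check on the column layout.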
Since we won't be analyzing some of the columns, we can remove them from the dataframe.
videos <- subset(videos, select=-c(description, ratings_disabled, thumbnail_link,
video_error_or_removed, comments_disabled, tags))
Exploring the categories for each country
categories %>% arrange(title) %>% head() %>% kable(caption="Sample of categories data")
| id | title | country |
|---|---|---|
| 32 | Action/Adventure | CA |
| 32 | Action/Adventure | DE |
| 32 | Action/Adventure | FR |
| 32 | Action/Adventure | GB |
| 32 | Action/Adventure | MX |
| 32 | Action/Adventure | US |
Since the category ID and title are the same for all countries, we can drop the country column and keep only the unique rows.
categories <- categories %>%
select(id, title) %>%
unique() %>%
rename(category_id = id)
head(categories) %>% kable(caption = "Sample clean categories data")
| category_id | title |
|---|---|
| 1 | Film & Animation |
| 2 | Autos & Vehicles |
| 10 | Music |
| 15 | Pets & Animals |
| 17 | Sports |
| 18 | Short Movies |
To avoid issues related to character encoding, we will only keep videos whose titles and channel names consist of ASCII characters.
# Apply replace_non_ascii() to both text columns. The columns are referenced
# inside mutate() so the cleaned values stay aligned with their rows.
videos <- videos %>%
  mutate(
    title = replace_non_ascii(title, replacement = NA, remove.nonconverted = TRUE),
    channel_title = replace_non_ascii(channel_title, replacement = NA, remove.nonconverted = TRUE)
  ) %>%
  # Drop rows that contain NA in any column
  na.omit()
kable(head(videos, 10), caption="Sample ASCII compatible videos data")
| publish_time | comment_count | title | views | trending_date | category_id | likes | channel_title | video_id | dislikes | country |
|---|---|---|---|---|---|---|---|---|---|---|
| 2017-11-10 | 125882 | Eminem - Walk On Water (Audio) ft. BeyoncA(C) | 17158579 | 2017-11-14 | 10 | 787425 | EminemVEVO | n1WpP7iowLc | 43420 | CA |
| 2017-11-13 | 13030 | PLUSH - Bad Unboxing Fan Mail | 1014651 | 2017-11-14 | 23 | 127794 | iDubbbzTV | 0dBIkQ4Mz1M | 1688 | CA |
| 2017-11-12 | 8181 | Racist Superman | Rudy Mancuso, King Bach & Lele Pons | 3191434 | 2017-11-14 | 23 | 146035 | Rudy Mancuso | 5qpjK5DgCt4 | 5339 | CA |
| 2017-11-12 | 17518 | I Dare You: GOING BALD!? | 2095828 | 2017-11-14 | 24 | 132239 | nigahiga | d380meD0W0M | 1989 | CA |
| 2017-11-09 | 85067 | Ed Sheeran - Perfect (Official Music Video) | 33523622 | 2017-11-14 | 10 | 1634130 | Ed Sheeran | 2Vv-BfVoq4g | 21082 | CA |
| 2017-11-13 | 12143 | Jake Paul Says Alissa Violet CHEATED with LOGAN PAUL! #DramaAlert Team 10 vs Martinez Twins! | 1309699 | 2017-11-14 | 25 | 103755 | DramaAlert | 0yIWz1XEeyc | 4613 | CA |
| 2017-11-12 | 26629 | Vanoss Superhero School - New Students | 2987945 | 2017-11-14 | 23 | 187464 | VanossGaming | _uM5kFfkhB8 | 9850 | CA |
| 2017-11-13 | 15959 | WE WANT TO TALK ABOUT OUR MARRIAGE | 748374 | 2017-11-14 | 22 | 57534 | CaseyNeistat | 2kyS6SvSYSE | 2967 | CA |
| 2017-11-12 | 36391 | THE LOGANG MADE HISTORY. LOL. AGAIN. | 4477587 | 2017-11-14 | 24 | 292837 | Logan Paul Vlogs | JzCsM1vtn78 | 4123 | CA |
| 2017-11-10 | 1484 | Finally Sheldon is winning an argument about the existence of God | 505161 | 2017-11-14 | 22 | 4135 | Sheikh Musa | 43sm-QwLcx4 | 976 | CA |
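To see which rows a filter like this would affect, a base-R check can flag non-ASCII strings before anything is dropped. The sketch below is illustrative only: the `sample_titles` vector and the `is_ascii` helper are made up for this example and are not part of the analysis above.

```r
# Hypothetical helper: TRUE when every character of a string is 7-bit ASCII
is_ascii <- function(x) {
  !grepl("[^\\x01-\\x7F]", x, perl = TRUE)
}

# Toy titles (made up for illustration)
sample_titles <- c("Ed Sheeran - Perfect", "Beyonc\u00e9 live", "Sushi review")

# Logical vector marking which titles are pure ASCII
is_ascii(sample_titles)
```

Counting `sum(!is_ascii(videos$title))` per country before filtering would show how much each country's data shrinks, which matters when comparing countries later.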
To explore the data, we are going to look at the viewing trend of videos per country.
library(ggplot2)
ggplotly(
videos %>% group_by(country, trending_date) %>%
count() %>%
ggplot(aes(x=trending_date, y=n, group=country)) +
geom_line(aes(color=country)) +
ggtitle("Trending videos per country") +
scale_y_continuous(name="Videos", limits=c(70, 210), breaks=seq(70,210,10)) +
scale_x_continuous(name="Trending Date", breaks = seq(as.Date('2017-11-01'), as.Date('2018-06-26'), 30)) +
theme_minimal()
)
ggplotly(
videos %>% group_by(country, trending_date) %>% summarise(avg_views = mean(views)) %>%
ggplot(aes(x=trending_date, y=avg_views, group=country)) +
geom_line(aes(color=country)) +
ggtitle("Avg views per country") +
scale_y_continuous(name="Avg Views", limits=c(0, 14000000), breaks=seq(0,14000000,1000000), labels = scales::comma) +
scale_x_continuous(name="Trending Date", breaks = seq(as.Date('2017-11-01'), as.Date('2018-06-26'), 30)) +
theme_minimal()
)
ggplotly(
videos %>% group_by(country, trending_date) %>% summarise(pct_99 = quantile(views, probs=c(0.99))) %>%
ggplot(aes(x=trending_date, y=pct_99, group=country)) +
geom_line(aes(color=country)) +
ggtitle("99th Percentile of views per country") +
scale_y_continuous(name="PCT 99th Views", limits=c(0, 260000000), breaks=seq(0,260000000,20000000), labels = scales::comma) +
scale_x_continuous(name="Trending Date", breaks = seq(as.Date('2017-11-01'), as.Date('2018-06-26'), 30)) +
theme_minimal()
)
ggplotly(
videos %>% group_by(country, trending_date) %>% summarise(min_views = min(views)) %>%
ggplot(aes(x=trending_date, y=min_views, group=country)) +
geom_line(aes(color=country)) +
ggtitle("Min views per country") +
scale_y_continuous(name="Min Views", limits=c(0, 320000), breaks=seq(0,320000,20000), labels = scales::comma) +
scale_x_continuous(name="Trending Date", breaks = seq(as.Date('2017-11-01'), as.Date('2018-06-26'), 30)) +
theme_minimal()
)
Moreover, exploring the correlation between the numeric columns will help show what kind of analysis can be done on the data.
library(corrplot)
cor_matrix <- videos %>% select(views, likes, dislikes, comment_count) %>%
cor( method = "pearson", use = "complete.obs")
corrplot(cor_matrix, method="number")
videos %>% select(views, likes, dislikes, comment_count) %>%
pairs()
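For reference, the Pearson coefficient reported by `cor()` for a pair of columns is their covariance divided by the product of their standard deviations. The short check below uses made-up numbers (the `views` and `likes` vectors are not taken from the dataset) to show that the hand-computed value matches `cor()`.

```r
# Made-up numbers, for illustration only
views <- c(100, 250, 400, 800, 1500)
likes <- c(10, 20, 45, 70, 160)

# Pearson correlation by hand: covariance scaled by both standard deviations
manual_r <- cov(views, likes) / (sd(views) * sd(likes))

# Same quantity via cor(), as used for the correlation matrix
builtin_r <- cor(views, likes, method = "pearson")

all.equal(manual_r, builtin_r)
```

Since view and like counts tend to be heavily skewed, passing `method = "spearman"` to `cor()` could serve as a rank-based cross-check.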
In this analysis, we have a few questions that we would like to answer:
To answer these questions, we need the data for each video on its last trending date. This provides a snapshot of each video at the end of its trending run, which gives an even basis for comparing all the videos.
videos_last_trending_day <- videos %>%
group_by(video_id) %>%
arrange(video_id, desc(trending_date)) %>%
mutate(row_num = row_number()) %>%
filter(row_num == 1) %>%
subset(select=-c(row_num))
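The same "latest row per video" selection can be sketched in base R on a toy data frame, which makes it easy to confirm which rows survive. The `toy` data below is made up for illustration.

```r
# Toy data (made up): two videos, several trending days each
toy <- data.frame(
  video_id      = c("a", "a", "a", "b", "b"),
  trending_date = as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                            "2018-02-10", "2018-02-11")),
  views         = c(100, 300, 900, 50, 80)
)

# Sort newest-first within each video, then keep the first row per video_id
toy_sorted <- toy[order(toy$video_id, -as.numeric(toy$trending_date)), ]
last_day   <- toy_sorted[!duplicated(toy_sorted$video_id), ]

last_day
```

Note that grouping by `video_id` alone keeps one row per video across all countries; if a per-country snapshot were wanted, the country column would need to be part of the key as well.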
Likes & dislikes per country
sqldf("
with cte as (
select *,
row_number() over (partition by country order by views desc) as video_rank
from videos_last_trending_day
)
select country,
sum(likes) as likes,
sum(dislikes) as dislikes
from cte
where video_rank <= 10
group by country
limit 100") %>% kable()
| country | likes | dislikes |
|---|---|---|
| CA | 5870253 | 616783 |
| DE | 1033171 | 423228 |
| FR | 1513184 | 137625 |
| GB | 28881235 | 3231725 |
| MX | 2871750 | 169955 |
| US | 7367068 | 437387 |
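The query ranks videos by views within each country and then sums likes and dislikes over the top-ranked rows. A base-R sketch of the same idea, on made-up data and with a top 2 instead of a top 10, could look like this (the `toy` data frame and `n_top` are assumptions for the example):

```r
# Toy data (made up): three videos per country
toy <- data.frame(
  country  = c("CA", "CA", "CA", "US", "US", "US"),
  views    = c(500, 900, 100, 300, 700, 200),
  likes    = c(50, 90, 10, 30, 70, 20),
  dislikes = c(5, 9, 1, 3, 7, 2)
)

n_top <- 2  # the query uses 10

# Rank videos by views within each country (1 = most viewed), mirroring
# row_number() over (partition by country order by views desc)
toy$video_rank <- ave(-toy$views, toy$country,
                      FUN = function(v) rank(v, ties.method = "first"))

# Sum likes and dislikes over the top-ranked videos per country
top_totals <- aggregate(cbind(likes, dislikes) ~ country,
                        data = toy[toy$video_rank <= n_top, ],
                        FUN = sum)
top_totals
```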
Overall likes & dislikes
sqldf("
with cte as (
select *,
row_number() over (partition by country order by views desc) as video_rank
from videos_last_trending_day
)
select sum(likes) as likes,
sum(dislikes) as dislikes
from cte
where video_rank <= 10
limit 100") %>% kable()
| likes | dislikes |
|---|---|
| 47536661 | 5016703 |
Overall videos
sqldf("
select count(1) videos_count
from videos_last_trending_day
where views > 10000000
limit 100") %>% kable()
| videos_count |
|---|
| 416 |
Videos per country
sqldf("
select country,
count(1) videos_count
from videos_last_trending_day
where views > 10000000
group by 1
limit 100") %>% kable()
| country | videos_count |
|---|---|
| CA | 77 |
| DE | 3 |
| FR | 1 |
| GB | 245 |
| MX | 10 |
| US | 80 |
To find similarities in such a large dataset, we can use unsupervised machine learning algorithms to find clusters in the data. For this analysis, we use K-Means clustering. The table below shows the resulting clusters and their centers.
library(mltools)
library(data.table)
library(factoextra)
library(dummies)
# fastDummies provides dummy_cols(), used below for one-hot encoding
library(fastDummies)
dataset <- base::merge(videos_last_trending_day, categories, by="category_id") %>%
rename(category_name = title.y) %>%
subset(select=-c(category_id,
publish_time,
title.x,
trending_date,
video_id,
channel_title
))
dataset <- dummy_cols(dataset, select_columns = c('country', 'category_name'))
set.seed(1234)
dataset <- dataset %>% subset(select=-c(country))
dataset <- dataset %>% subset(select=-c(category_name))
km.res <- kmeans(dataset, 6, nstart = 100)
km.res$centers %>% kable(row.names = TRUE)
| | comment_count | views | likes | dislikes | country_CA | country_DE | country_FR | country_GB | country_MX | country_US | category_name_Autos & Vehicles | category_name_Comedy | category_name_Education | category_name_Entertainment | category_name_Film & Animation | category_name_Gaming | category_name_Howto & Style | category_name_Movies | category_name_Music | category_name_News & Politics | category_name_Nonprofits & Activism | category_name_People & Blogs | category_name_Pets & Animals | category_name_Science & Technology | category_name_Shows | category_name_Sports | category_name_Trailers | category_name_Travel & Events |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 35880.925 | 19231302.9 | 396207.034 | 25997.9887 | 0.2180451 | 0.0075188 | 0.0037594 | 0.5375940 | 0.0263158 | 0.2067669 | 0.0075188 | 0.0150376 | 0.0000000 | 0.1804511 | 0.0639098 | 0.0187970 | 0.0112782 | 0.000000 | 0.5827068 | 0.0037594 | 0.0037594 | 0.0300752 | 0.0000000 | 0.0112782 | 0.0000000 | 0.0676692 | 0.00e+00 | 0.0037594 |
| 2 | 211807.400 | 327910910.2 | 3257466.400 | 212472.6000 | 0.0000000 | 0.0000000 | 0.0000000 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.000000 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00e+00 | 0.0000000 |
| 3 | 149278.053 | 121396174.8 | 1675693.263 | 163983.3684 | 0.0000000 | 0.0000000 | 0.0000000 | 1.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.1052632 | 0.0000000 | 0.0000000 | 0.0000000 | 0.000000 | 0.8947368 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00e+00 | 0.0000000 |
| 4 | 90824.922 | 57377152.2 | 920458.875 | 81142.8750 | 0.0156250 | 0.0000000 | 0.0000000 | 0.8125000 | 0.0000000 | 0.1718750 | 0.0000000 | 0.0156250 | 0.0000000 | 0.1093750 | 0.0312500 | 0.0000000 | 0.0156250 | 0.000000 | 0.7968750 | 0.0000000 | 0.0000000 | 0.0156250 | 0.0000000 | 0.0156250 | 0.0000000 | 0.0000000 | 0.00e+00 | 0.0000000 |
| 5 | 13714.361 | 4274939.0 | 126603.557 | 6288.3621 | 0.3780488 | 0.0426829 | 0.0281426 | 0.2246717 | 0.0708255 | 0.2556285 | 0.0028143 | 0.0984991 | 0.0065666 | 0.2800188 | 0.0581614 | 0.0257974 | 0.0581614 | 0.000469 | 0.2631332 | 0.0201689 | 0.0009381 | 0.0769231 | 0.0065666 | 0.0201689 | 0.0000000 | 0.0792683 | 0.00e+00 | 0.0023452 |
| 6 | 1051.793 | 220959.1 | 7700.964 | 344.3836 | 0.2199418 | 0.1864425 | 0.2071976 | 0.0293918 | 0.2998477 | 0.0571786 | 0.0191910 | 0.0625782 | 0.0221152 | 0.3098580 | 0.0398917 | 0.0340705 | 0.0630406 | 0.000000 | 0.0647816 | 0.0907867 | 0.0046107 | 0.1536505 | 0.0076574 | 0.0229721 | 0.0018225 | 0.0981040 | 1.36e-05 | 0.0048556 |
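K-Means assigns points by Euclidean distance, so a column with a very large range (such as views) can outweigh 0/1 indicator columns. One common option, not used in the analysis above, is to standardize the columns before clustering. The sketch below compares both variants on made-up data; the `toy` data frame and its `is_music` column are assumptions for illustration only.

```r
set.seed(1234)

# Made-up data: one large-scale count column and one 0/1 indicator
toy <- data.frame(
  views    = c(rlnorm(50, meanlog = 10), rlnorm(50, meanlog = 14)),
  is_music = rep(c(0, 1), times = 50)
)

# Standardize each column to mean 0 and standard deviation 1 so that no
# single column dominates the distance purely because of its units
toy_scaled <- scale(toy)

km_raw    <- kmeans(toy, centers = 2, nstart = 25)
km_scaled <- kmeans(toy_scaled, centers = 2, nstart = 25)

# Cross-tabulate each clustering against the indicator column to compare
table(raw = km_raw$cluster, is_music = toy$is_music)
table(scaled = km_scaled$cluster, is_music = toy$is_music)
```

Whether scaling is appropriate depends on the question being asked; for heavily skewed counts, a `log1p()` transform before scaling is another option worth comparing.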
Below is a sample of the data joined to the clustering output.
categ_videos <- base::merge(videos_last_trending_day, categories, by="category_id") %>%
rename(category_name = title.y) %>%
subset(select=-c(category_id,
publish_time,
title.x,
video_id,
# split_tags,
channel_title
)) %>%
drop_na()
video_cluster <- cbind(categ_videos, cluster = km.res$cluster)
video_cluster %>% head() %>% kable()
| comment_count | views | trending_date | likes | dislikes | country | category_name | cluster |
|---|---|---|---|---|---|---|---|
| 791 | 198904 | 2018-06-09 | 4319 | 142 | CA | Film & Animation | 6 |
| 432 | 507121 | 2018-05-06 | 4129 | 185 | US | Film & Animation | 6 |
| 9 | 5797 | 2018-01-04 | 53 | 2 | MX | Film & Animation | 6 |
| 0 | 14745 | 2018-02-28 | 342 | 11 | MX | Film & Animation | 6 |
| 4 | 16786 | 2017-11-20 | 66 | 10 | FR | Film & Animation | 6 |
| 33 | 8224 | 2017-12-16 | 186 | 13 | DE | Film & Animation | 6 |
Below are charts showing how the video categories are clustered per average views, comment count, likes & dislikes.
ggplotly(
video_cluster %>% group_by(category_name, cluster = as.character(cluster)) %>%
summarise(avg_views = mean(views)) %>%
ggplot(aes(x=cluster, y=avg_views, group=category_name)) +
geom_bar(stat = 'identity', aes( fill=category_name), position=position_dodge()) +
ggtitle("Avg views per cluster and category") +
theme_classic() +
# theme() comes after theme_classic(); a complete theme added later would reset these settings
theme(axis.text.x = element_text(face = "bold",size = 12, angle = 45, hjust = 1))
)
ggplotly(
video_cluster %>% group_by(category_name, cluster = as.character(cluster)) %>%
summarise(avg_comments = mean(comment_count)) %>%
ggplot(aes(x=cluster, y=avg_comments, group=category_name)) +
geom_bar(stat = 'identity', aes( fill=category_name), position=position_dodge()) +
ggtitle("Avg comments per cluster and category") +
theme_classic() +
# theme() comes after theme_classic(); a complete theme added later would reset these settings
theme(axis.text.x = element_text(face = "bold",size = 12, angle = 45, hjust = 1))
)
ggplotly(
video_cluster %>% group_by(category_name, cluster = as.character(cluster)) %>%
summarise(avg_likes = mean(likes)) %>%
ggplot(aes(x=cluster, y=avg_likes, group=category_name)) +
geom_bar(stat = 'identity', aes( fill=category_name), position=position_dodge()) +
ggtitle("Avg likes per cluster and category") +
theme_classic() +
# theme() comes after theme_classic(); a complete theme added later would reset these settings
theme(axis.text.x = element_text(face = "bold",size = 12, angle = 45, hjust = 1))
)
ggplotly(
video_cluster %>% group_by(category_name, cluster = as.character(cluster)) %>%
summarise(avg_dislikes = mean(dislikes)) %>%
ggplot(aes(x=cluster, y=avg_dislikes, group=category_name)) +
geom_bar(stat = 'identity', aes( fill=category_name), position=position_dodge()) +
ggtitle("Avg dislikes per cluster and category") +
theme_classic() +
# theme() comes after theme_classic(); a complete theme added later would reset these settings
theme(axis.text.x = element_text(face = "bold",size = 12, angle = 45, hjust = 1))
)
Based on the charts above, we can make the following conclusions:
Based on the clusters created, we can conclude that the category of a video can affect the way users interact with it. The engagement of a video category reflects how much users interact with its videos through comments, likes, and dislikes.
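One way to make engagement concrete is to express interactions per view for each category. The source does not define a formula, so the definition below (likes plus dislikes plus comments, divided by views) is an assumption for illustration, and the `toy` data frame is made up.

```r
# Toy data (made up for illustration)
toy <- data.frame(
  category_name = c("Music", "Music", "Comedy", "Comedy"),
  views         = c(1000, 3000, 500, 1500),
  likes         = c(300, 500, 40, 100),
  dislikes      = c(10, 20, 5, 5),
  comment_count = c(40, 30, 5, 45)
)

# Total views and interactions per category
totals <- aggregate(cbind(views, likes, dislikes, comment_count) ~ category_name,
                    data = toy, FUN = sum)

# Assumed definition: interactions (likes + dislikes + comments) per view
totals$engagement_rate <- with(totals, (likes + dislikes + comment_count) / views)

totals[, c("category_name", "engagement_rate")]
```

Dividing by views separates categories that are simply watched more from categories whose viewers interact more, which the averages alone do not distinguish.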
ggplotly(
video_cluster %>% group_by(category_name) %>%
summarise(
avg_dislikes = mean(dislikes),
avg_likes = mean(likes),
avg_comments = mean(comment_count),
avg_views = mean(views)
) %>%
ggplot(aes(x=avg_views, y=avg_comments, color=category_name)) +
ggtitle("Avg views over avg comments per category") +
geom_point(aes(fill=category_name))
)
ggplotly(
video_cluster %>% group_by(category_name) %>%
summarise(
avg_dislikes = mean(dislikes),
avg_likes = mean(likes),
avg_comments = mean(comment_count),
avg_views = mean(views)
) %>%
ggplot(aes(x=avg_views, y=avg_likes, color=category_name)) +
ggtitle("Avg views over avg likes per category") +
geom_point(aes(fill=category_name))
)
ggplotly(
video_cluster %>% group_by(category_name) %>%
summarise(
avg_dislikes = mean(dislikes),
avg_likes = mean(likes),
avg_comments = mean(comment_count),
avg_views = mean(views)
) %>%
ggplot(aes(x=avg_views, y=avg_dislikes, color=category_name)) +
ggtitle("Avg views over avg dislikes per category") +
geom_point(aes(fill=category_name))
)
As displayed in the charts above, the clusters created by the K-Means algorithm do reflect the attributes of the data. The charts also confirm the assumptions made earlier that: