Visualizing US YouTube Trending Videos 2017

Data Explanation

Objective

This dataset covers YouTube videos that trended in the United States in 2017. It has 13,400 rows spanning 16 video categories. Each row records the trending date, video title, channel title, views, likes, dislikes, comment count, and several other fields.

Why?

The dataset tells us a lot about individual channels: their views, likes, comments, and so on. What the raw rows do not tell us is which kinds of videos attract the most views, likes, and comments. By aggregating and visualizing the data properly, we can understand it better. That is why visualization matters for delivering information and gaining insight from data.

Data Preparation

data <- read.csv("USvideos.csv")
rmarkdown::paged_table(data)
str(data)
## 'data.frame':    13400 obs. of  12 variables:
##  $ trending_date         : chr  "17.14.11" "17.14.11" "17.14.11" "17.14.11" ...
##  $ title                 : chr  "WE WANT TO TALK ABOUT OUR MARRIAGE" "The Trump Presidency: Last Week Tonight with John Oliver (HBO)" "Racist Superman | Rudy Mancuso, King Bach & Lele Pons" "Nickelback Lyrics: Real or Fake?" ...
##  $ channel_title         : chr  "CaseyNeistat" "LastWeekTonight" "Rudy Mancuso" "Good Mythical Morning" ...
##  $ category_id           : int  22 24 23 24 24 28 24 28 1 25 ...
##  $ publish_time          : chr  "2017-11-13T17:13:01.000Z" "2017-11-13T07:30:00.000Z" "2017-11-12T19:05:24.000Z" "2017-11-13T11:00:04.000Z" ...
##  $ views                 : int  748374 2418783 3191434 343168 2095731 119180 2103417 817732 826059 256426 ...
##  $ likes                 : int  57527 97185 146033 10172 132235 9763 15993 23663 3543 12654 ...
##  $ dislikes              : int  2966 6146 5339 666 1989 511 2445 778 119 1363 ...
##  $ comment_count         : int  15954 12703 8181 2146 17518 1434 1970 3432 340 2368 ...
##  $ comments_disabled     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ratings_disabled      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ video_error_or_removed: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...

Result: Four columns (channel_title, publish_time, trending_date, and category_id) have an incorrect data type, so we need to convert them to the correct types.

Exploratory Data Analysis

Changing Data Types

# change the column contents: map category IDs to category names
data$category_id <- sapply(as.character(data$category_id), switch, 
                           "1" = "Film and Animation",
                           "2" = "Autos and Vehicles", 
                           "10" = "Music", 
                           "15" = "Pets and Animals", 
                           "17" = "Sports",
                           "19" = "Travel and Events", 
                           "20" = "Gaming", 
                           "22" = "People and Blogs", 
                           "23" = "Comedy",
                           "24" = "Entertainment", 
                           "25" = "News and Politics",
                           "26" = "Howto and Style", 
                           "27" = "Education",
                           "28" = "Science and Technology", 
                           "29" = "Nonprofit and Activism",
                           "43" = "Shows")

data$category_id <- as.factor(data$category_id)
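
A named lookup vector is one alternative to sapply() with switch(). The sketch below uses hypothetical toy IDs (99 is deliberately missing from the lookup) and only three of the categories; it illustrates the idea and is not the code used for the results in this report. With switch(), an ID that has no matching label returns NULL, so sapply() can no longer simplify the result to a character vector. With a lookup vector, the unmatched ID simply becomes NA.

```r
# Hypothetical subset of the ID-to-name mapping, for illustration only
category_lookup <- c("10" = "Music", "23" = "Comedy", "24" = "Entertainment")

# Toy IDs; 99 has no entry in the lookup
ids <- c(24, 10, 99, 23)

# Index the lookup by the ID as a string; unmatched IDs become NA
labels <- unname(category_lookup[as.character(ids)])
labels

# Convert to a factor with a fixed set of levels
category_factor <- factor(labels, levels = unname(category_lookup))
category_factor
```

Checking for NA values after the recode, for example with anyNA(labels), is a quick way to spot category IDs that the mapping does not cover.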

library(lubridate)
data$trending_date <- ydm(data$trending_date)

data$publish_time <- ymd_hms(data$publish_time, tz = "America/New_York")
## Date in ISO8601 format; converting timezone from UTC to "America/New_York".
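
The trending_date strings are ordered year, day, month, which is why ydm() is used rather than ymd(). Here is a minimal base R sketch of the same idea, using the first value shown by str() and assuming the two-digit year means 20xx:

```r
# First trending_date value from the str() output above
raw <- "17.14.11"

# Year-day-month: the order the strings actually use
as_ydm <- as.Date(raw, format = "%y.%d.%m")

# Year-month-day: would treat 14 as a month, which is invalid
as_ymd <- as.Date(raw, format = "%y.%m.%d")

format(as_ydm)  # "2017-11-14"
is.na(as_ymd)   # TRUE
```

A parse that silently returns NA is easy to miss, so counting NA values after the conversion is a cheap sanity check.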

Adding Some New Columns (Likes Rate, Dislike Rate, Comment Rate)

data$likesp <- data$likes/data$views
data$dislikesp <- data$dislikes/data$views
data$commentp <- data$comment_count/data$views
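
These rates are plain ratios, so two edge cases are worth checking before ranking on them. The sketch below uses a hypothetical toy data frame (not rows from USvideos.csv). It assumes that a row with ratings_disabled set to TRUE reports zero likes, and that a row could have zero views; neither assumption is verified against the dataset here.

```r
# Hypothetical rows: a normal video, a zero-view row, a ratings-disabled row
toy <- data.frame(
  views = c(1000, 0, 500),
  likes = c(100, 0, 0),
  ratings_disabled = c(FALSE, FALSE, TRUE)
)

# A likes rate is only meaningful with some views and ratings enabled
valid <- toy$views > 0 & !toy$ratings_disabled

# Use NA instead of NaN (0/0) or a misleading 0 for the invalid rows
toy$likesp <- ifelse(valid, toy$likes / toy$views, NA_real_)
toy$likesp
```

Because aggregate() with a formula drops NA rows by default, marking such rows as NA keeps them out of the rankings instead of counting them as zero.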

Top 10 Videos Based on Likes Rate

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
top_10_video <- aggregate(likesp~title, data, mean)
top_10_video <- head(top_10_video[order(top_10_video$likesp, decreasing = T),],10)
top_10_video
##                                                                                               title
## 1277 Idol EXPOSES Dark Side of KPOP | STORYTIME experiences with SCANDALS, Slave Contracts and more
## 1045                                                           Harry Styles - Kiwi (live in studio)
## 982                                                                               GOALS GOALS GOALS
## 1110                                                                         How Do Machines Learn?
## 2598                                                                          THINGS THAT ARE LOWER
## 588                                                                   COMING OUT TO MY LESBIAN MOMS
## 441                             BTS (방탄소년단) 'MIC Drop (Steve Aoki Remix)' Official Teaser
## 1594                                                     Liam Payne - Bedroom Floor (Live Acoustic)
## 2609                                                             This Is What I'm Wearing Right Now
## 1319                                                   Interlude III - original song | Tessa Violet
##         likesp
## 1277 0.2281817
## 1045 0.1915195
## 982  0.1834375
## 1110 0.1759662
## 2598 0.1740575
## 588  0.1709072
## 441  0.1666532
## 1594 0.1654369
## 2609 0.1624418
## 1319 0.1586368

Visualizing the Top 10 Videos

library(ggplot2)
plot_top_10_video <- ggplot(top_10_video, aes(x=likesp, y=reorder(title, likesp)))+
  geom_col(aes(fill=likesp), show.legend = F)+
  scale_fill_gradient(low = "black", high = "aquamarine")+
  #geom_vline(xintercept = mean(data$likesp),
             #col="green",
             #linetype=2,
             #lwd=1)+
  geom_label(data = top_10_video, mapping = aes(label=round(likesp, 2)))+
  labs(title = "Top 10 Videos Based on Likes Rate",
       x = "Likes rate",
       y = "Title",
       caption = "Source: Youtube US Trending")
  
plot_top_10_video

Result: The video titled “Idol EXPOSES Dark Side of KPOP | STORYTIME ….” has the highest likes rate.

Top 10 YouTube Channels Based on Likes Rate

top_10_channel <- aggregate(likesp~channel_title, data, sum)
top_10_channel <- head(top_10_channel[order(top_10_channel$likesp, decreasing = T),],10)
top_10_channel
##         channel_title   likesp
## 1106           SMTOWN 4.230463
## 558    IISuperwomanII 3.893953
## 273      ConnorFranta 3.678392
## 1285        Tom Scott 3.102886
## 582        jacksfilms 3.043967
## 1348     vlogbrothers 2.979222
## 1383    William Osman 2.894096
## 766  Marques Brownlee 2.880029
## 771            Marzia 2.879472
## 580       Jackie Aina 2.856786

Visualizing the Top 10 Channels

plot_top_10_channel <- ggplot(top_10_channel, aes(x=likesp, y=reorder(channel_title, likesp)))+
  geom_col(aes(fill=likesp), show.legend = F)+
  scale_fill_gradient(low = "royal blue", high = "aquamarine")+
  #geom_vline(xintercept = mean(data$likesp),
             #col="green",
             #linetype=2,
             #lwd=1)+
  geom_label(data = top_10_channel, mapping = aes(label=round(likesp, 2)))+
  labs(title = "Top 10 Channel Based on Likes Rate",
       x = "Likes Rate",
       y = "Channel Title",
       caption = "Source: Youtube US Trending")
  
plot_top_10_channel

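Result: The channels with the highest summed likes rate were SMTOWN, IISuperwomanII, ConnorFranta, and so on. Because the rate is summed over every trending row, channels that trend more often rank higher here.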
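
The channel ranking aggregates likesp with sum, whereas the video ranking earlier uses mean. A small sketch with two hypothetical channels, A and B, shows how that choice can change the ranking. The numbers are made up for illustration.

```r
# Hypothetical data: A trends four times with a low rate, B once with a high rate
toy <- data.frame(
  channel_title = c("A", "A", "A", "A", "B"),
  likesp        = c(0.05, 0.05, 0.05, 0.05, 0.15)
)

by_sum  <- aggregate(likesp ~ channel_title, toy, sum)
by_mean <- aggregate(likesp ~ channel_title, toy, mean)

# Channel ranked first under each aggregation
top_by_sum  <- as.character(by_sum$channel_title[which.max(by_sum$likesp)])
top_by_mean <- as.character(by_mean$channel_title[which.max(by_mean$likesp)])

top_by_sum   # "A"
top_by_mean  # "B"
```

Summing rewards how often a channel appears in the trending list, while the mean reflects the typical likes rate of its videos. Which one is appropriate depends on the question being asked.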

Categories Ranked by Total Views

top_category <- aggregate(views ~ category_id, data, sum)
top_category <- top_category[order(top_category$views, decreasing = TRUE),]
top_category
##               category_id      views
## 8                   Music 5479488307
## 4           Entertainment 4781000701
## 2                  Comedy 1373174698
## 5      Film and Animation 1066571908
## 7         Howto and Style  918635366
## 11       People and Blogs  794253563
## 13 Science and Technology  598040232
## 15                 Sports  464583585
## 9       News and Politics  314142992
## 3               Education  260789748
## 12       Pets and Animals  179846802
## 6                  Gaming  158951143
## 1      Autos and Vehicles   89924854
## 16      Travel and Events   61874890
## 14                  Shows    1751446
## 10 Nonprofit and Activism     383367
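
To put these totals in proportion, each category's share of all trending views can be computed directly from the printed table. The sketch below copies the values from the output above; only the arithmetic is new.

```r
# Total views per category, copied from the top_category output
views_by_category <- c(
  "Music" = 5479488307, "Entertainment" = 4781000701,
  "Comedy" = 1373174698, "Film and Animation" = 1066571908,
  "Howto and Style" = 918635366, "People and Blogs" = 794253563,
  "Science and Technology" = 598040232, "Sports" = 464583585,
  "News and Politics" = 314142992, "Education" = 260789748,
  "Pets and Animals" = 179846802, "Gaming" = 158951143,
  "Autos and Vehicles" = 89924854, "Travel and Events" = 61874890,
  "Shows" = 1751446, "Nonprofit and Activism" = 383367
)

# Share of all views held by each category
share <- views_by_category / sum(views_by_category)

# Percentage share of the three largest categories
round(100 * head(share, 3), 1)
```

Music accounts for roughly a third of the summed views, and Music and Entertainment together account for a little over 60%.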

Visualizing Categories by Total Views

plot_top_category <- ggplot(top_category, aes(x=views, y=reorder(category_id, views)))+
  geom_col(aes(fill=views), show.legend = T)+
  scale_fill_gradient(low = "black", high = "aquamarine")+
  labs(title = "Categories Based on Total Views",
       x = "Total Views",
       y = "Category",
       caption = "Source: Youtube US Trending")
  
plot_top_category

Result: The plot above shows that Music is the most-viewed category among US YouTube trending videos in 2017. Next, we aggregate the data to find out which channels contribute the most to Music views.

# Keep only the rows in the Music category
music <- data %>%
  filter(category_id == "Music")

# Select the channel_title and views columns
music <- music %>% 
  select(channel_title, views)

# Sum views per channel, then keep the 10 channels with the most views
music <- aggregate(views ~ channel_title, music, sum)
top_10_music <- head(music[order(music$views, decreasing = TRUE), ], 10)
top_10_music
##        channel_title     views
## 149    LuisFonsiVEVO 534738794
## 91    GEazyMusicVEVO 415888915
## 74        Ed Sheeran 400504946
## 77        EminemVEVO 287435495
## 123 jypentertainment 276291969
## 125    KatyPerryVEVO 273333649
## 237  TaylorSwiftVEVO 244182424
## 108          ibighit 224009476
## 39        Bruno Mars 213126073
## 65    DemiLovatoVEVO 174572450

Visualizing the Top 10 Channels in the Music Category

plot_top_10_music <- ggplot(top_10_music, aes(x=views, y=reorder(channel_title, views)))+
  geom_col(aes(fill=views), show.legend = F)+
  scale_fill_gradient(low = "black", high = "aquamarine")+
  labs(title = "Top 10 Channel in Music Category Based on Total Views",
       x = "Total Views",
       y = "Channel Title",
       caption = "Source: Youtube US Trending")
  
plot_top_10_music

Result: LuisFonsiVEVO, GEazyMusicVEVO, and Ed Sheeran were the three most-viewed Music channels among US YouTube trending videos in 2017.

Conclusion

The first two plots show that a K-pop-related video and a Korean YouTube channel (SMTOWN) had the highest likes rates among US YouTube trending videos in 2017. Further analysis would likely show that K-pop has grown in popularity over the last decade, driven by artists such as BlackPink and BTS. In the same data, Music was the most-viewed category. We can therefore conclude that US viewers were especially drawn to music videos on YouTube in 2017.