Abstract

The data we’re working is USvideos.csv, a dataset scraped from Youtube’s US Trending Videos. By analizing this Trending Videos data, we will uncover how these videos were able to reach such massive audience. We’ll also take a look and see if there were any similarities between the videos that did well on the platform.

##  [1] "trending_date"          "title"                  "channel_title"         
##  [4] "category_id"            "publish_time"           "views"                 
##  [7] "likes"                  "dislikes"               "comment_count"         
## [10] "comments_disabled"      "ratings_disabled"       "video_error_or_removed"

I will use visualization techniques to gain such insights from the trending videos data: 1. Does the video publishing time affect its popularity? If so, when is the best time to publish a video? 2. Does user engagement relate to video’s popularity? If so, does it tend to has a positive or negative engagement? 3. Who were the most prolific producers of trending videos in recent weeks?

We will use two plotting systems for our task:
- Base plot for a quick and simple exploratory visualization - ggplot2 for uncovering more complex pattern & for producing explanatory visualization

## 'data.frame':    13400 obs. of  12 variables:
##  $ trending_date         : Factor w/ 67 levels "17.01.12","17.02.12",..: 14 14 14 14 14 14 14 14 14 14 ...
##  $ title                 : Factor w/ 2986 levels "'I have dad moves': Barack Obama discusses dancing on David Letterman's new Netflix show",..: 2802 2574 2081 1903 1231 89 2164 143 2482 2920 ...
##  $ channel_title         : Factor w/ 1408 levels "_¢_Á_\235","“÷\201\220µ_‘⬓_\220 Korean Englishman",..: 195 686 1046 472 902 559 1063 283 6 1358 ...
##  $ category_id           : int  22 24 23 24 24 28 24 28 1 25 ...
##  $ publish_time          : Factor w/ 2903 levels "2008-04-05T18:22:40.000Z",..: 302 271 255 275 253 307 240 258 281 279 ...
##  $ views                 : int  748374 2418783 3191434 343168 2095731 119180 2103417 817732 826059 256426 ...
##  $ likes                 : int  57527 97185 146033 10172 132235 9763 15993 23663 3543 12654 ...
##  $ dislikes              : int  2966 6146 5339 666 1989 511 2445 778 119 1363 ...
##  $ comment_count         : int  15954 12703 8181 2146 17518 1434 1970 3432 340 2368 ...
##  $ comments_disabled     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ratings_disabled      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ video_error_or_removed: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...

Data Pre-Processing

Working with date/time

Date-Time Conversion

In order to answer our first question (Does the video publishing time affect its popularity?), we need to explore our datetime data more. Looking back to the structure of our dataset, there are two variables in our dataset which related to datetime data; trending_date and publish_date.

YEAR
%Y = YYYY
%y = YY

MONTH
%B = month name e.g. March
%b = month name(abbreviation) e.g. Mar
%m = 2 digits mo e.g. 03
%M = 1 digit mo e.g. 3

DAY
%A = weekday e.g. Friday
%d = weekday digit.

But now, we’ll use an easier alternative to work with date-time data, and that is through the use of lubridate. lubridate provides tools that make it easier to parse and manipulate dates:

## [1] 18.21.01 18.21.01 18.21.01 18.21.01 18.21.01 18.21.01
## 67 Levels: 17.01.12 17.02.12 17.03.12 17.04.12 17.05.12 17.06.12 ... 18.21.01
## Date in ISO8601 format; converting timezone from UTC to "America/New_York".
##   trending_date                                                          title
## 1    2017-11-14                             WE WANT TO TALK ABOUT OUR MARRIAGE
## 2    2017-11-14 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
## 3    2017-11-14          Racist Superman | Rudy Mancuso, King Bach & Lele Pons
## 4    2017-11-14                               Nickelback Lyrics: Real or Fake?
## 5    2017-11-14                                       I Dare You: GOING BALD!?
## 6    2017-11-14                                          2 Weeks with iPhone X
##           channel_title category_id        publish_time   views  likes dislikes
## 1          CaseyNeistat          22 2017-11-13 12:13:01  748374  57527     2966
## 2       LastWeekTonight          24 2017-11-13 02:30:00 2418783  97185     6146
## 3          Rudy Mancuso          23 2017-11-12 14:05:24 3191434 146033     5339
## 4 Good Mythical Morning          24 2017-11-13 06:00:04  343168  10172      666
## 5              nigahiga          24 2017-11-12 13:01:41 2095731 132235     1989
## 6              iJustine          28 2017-11-13 14:07:23  119180   9763      511
##   comment_count comments_disabled ratings_disabled video_error_or_removed
## 1         15954             FALSE            FALSE                  FALSE
## 2         12703             FALSE            FALSE                  FALSE
## 3          8181             FALSE            FALSE                  FALSE
## 4          2146             FALSE            FALSE                  FALSE
## 5         17518             FALSE            FALSE                  FALSE
## 6          1434             FALSE            FALSE                  FALSE

Time-zone reference

Categorizing hour

##  [1] 12  2 14  6 13  0 16  9  8 21 22 15 11 10 17  7 20 18 19  1  4  3 23  5
## [1] 8AM to 3PM  12AM to 7AM 4PM to 11PM
## Levels: 12AM to 7AM 4PM to 11PM 8AM to 3PM
## [1] 8AM to 3PM  12AM to 7AM 8AM to 3PM  12AM to 7AM 8AM to 3PM  8AM to 3PM 
## Levels: 12AM to 7AM < 8AM to 3PM < 4PM to 11PM

Quick Exploration with R’s Base plot

##   trending_date                                                          title
## 1    2017-11-14                             WE WANT TO TALK ABOUT OUR MARRIAGE
## 2    2017-11-14 The Trump Presidency: Last Week Tonight with John Oliver (HBO)
## 3    2017-11-14          Racist Superman | Rudy Mancuso, King Bach & Lele Pons
## 4    2017-11-14                               Nickelback Lyrics: Real or Fake?
## 5    2017-11-14                                       I Dare You: GOING BALD!?
## 6    2017-11-14                                          2 Weeks with iPhone X
##           channel_title            category_id        publish_time   views
## 1          CaseyNeistat       People and Blogs 2017-11-13 12:13:01  748374
## 2       LastWeekTonight          Entertainment 2017-11-13 02:30:00 2418783
## 3          Rudy Mancuso                 Comedy 2017-11-12 14:05:24 3191434
## 4 Good Mythical Morning          Entertainment 2017-11-13 06:00:04  343168
## 5              nigahiga          Entertainment 2017-11-12 13:01:41 2095731
## 6              iJustine Science and Technology 2017-11-13 14:07:23  119180
##    likes dislikes comment_count comments_disabled ratings_disabled
## 1  57527     2966         15954             FALSE            FALSE
## 2  97185     6146         12703             FALSE            FALSE
## 3 146033     5339          8181             FALSE            FALSE
## 4  10172      666          2146             FALSE            FALSE
## 5 132235     1989         17518             FALSE            FALSE
## 6   9763      511          1434             FALSE            FALSE
##   video_error_or_removed trending_dow trending_month publish_wday publish_hour
## 1                  FALSE      Tuesday            Nov       Monday           12
## 2                  FALSE      Tuesday            Nov       Monday            2
## 3                  FALSE      Tuesday            Nov       Sunday           14
## 4                  FALSE      Tuesday            Nov       Monday            6
## 5                  FALSE      Tuesday            Nov       Sunday           13
## 6                  FALSE      Tuesday            Nov       Monday           14
##   publish_when
## 1   8AM to 3PM
## 2  12AM to 7AM
## 3   8AM to 3PM
## 4  12AM to 7AM
## 5   8AM to 3PM
## 6   8AM to 3PM

How High is the Engagement?

Say, one of the content we want to publish have people who interested in “Autos and Vehicles”, “Gaming”, and “Travel and Events” as our target audience. So let’s subset our data based on these particular categories:

  • The higher the number of views in a video, the higher the number of comments on the video
  • The “Gaming” category has the highest views andcomment_count
  • From the Gaming category, there are some videos with high views' but few comment_count`, and vice versa

Exploratory Viz

Pre-Campaign Analysis

  1. Amongst Automotive, Gaming & Travel, which category shows highest likeability (likesperviews)?
  2. Does the publish time affect likeability?
##     trending_date                                                        title
## 31     2017-11-14 I TOOK THE $3,000,000 LAMBO TO CARMAX! They offered me......
## 35     2017-11-14       New Emirates First Class Suite | Boeing 777 | Emirates
## 59     2017-11-14                                  Train Swipes Parked Vehicle
## 132    2017-11-14                         L.A. Noire - Nintendo Switch Trailer
## 164    2017-11-14                 Caterham Chris Hoy 60 Second Donut Challenge
## 198    2017-11-14          Inside Keanu Reeves' Custom Motorcycle Shop | WIRED
##     channel_title        category_id        publish_time  views likes dislikes
## 31    hp_overload Autos and Vehicles 2017-11-12 20:43:12  98378  4035      495
## 35       Emirates  Travel and Events 2017-11-12 00:55:42 141148  1661       70
## 59       ViralHog Autos and Vehicles 2017-11-12 19:46:11   7265    89        8
## 132      Nintendo             Gaming 2017-11-09 14:59:48 154872  7683      164
## 164 Caterham Cars Autos and Vehicles 2017-11-09 04:59:31   4850    22        1
## 198         WIRED Autos and Vehicles 2017-11-08 10:00:27 704363 16352      224
##     comment_count comments_disabled ratings_disabled video_error_or_removed
## 31            486             FALSE            FALSE                  FALSE
## 35            236             FALSE            FALSE                  FALSE
## 59             22             FALSE            FALSE                  FALSE
## 132          1734             FALSE            FALSE                  FALSE
## 164             1             FALSE            FALSE                  FALSE
## 198           841             FALSE            FALSE                  FALSE
##     trending_dow trending_month publish_wday publish_hour publish_when
## 31       Tuesday            Nov       Sunday           20  4PM to 11PM
## 35       Tuesday            Nov       Sunday            0  12AM to 7AM
## 59       Tuesday            Nov       Sunday           19  4PM to 11PM
## 132      Tuesday            Nov     Thursday           14   8AM to 3PM
## 164      Tuesday            Nov     Thursday            4  12AM to 7AM
## 198      Tuesday            Nov    Wednesday           10   8AM to 3PM
##      likesratio dislikesratio commentsratio
## 31  0.041015268  0.0050316128  0.0049401289
## 35  0.011767790  0.0004959333  0.0016720039
## 59  0.012250516  0.0011011700  0.0030282175
## 132 0.049608709  0.0010589390  0.0111963428
## 164 0.004536082  0.0002061856  0.0002061856
## 198 0.023215302  0.0003180178  0.0011939866

Visualizing comparison with barchart

Multivariate Barchart

How positive is the engagement?

Data transformation

##     trending_date
## 10     2017-11-14
## 11     2017-11-14
## 24     2017-11-14
## 53     2017-11-14
## 79     2017-11-14
## 103    2017-11-14
##                                                                                      title
## 10                                 Why the rise of the robots won‰Ûªt mean the end of work
## 11  Dion Lewis' 103-Yd Kick Return TD vs. Denver! | Can't-Miss Play | NFL Wk 10 Highlights
## 24                     What $4,800 Will Get You In NYC | Sweet Digs Home Tour | Refinery29
## 53                               Wildest Superstar distractions: WWE Top 10, Nov. 11, 2017
## 79                                           Jason Momoa & Lisa Bonet: Love at First Sight
## 103                                 Mark Wahlberg's Kids Use Him for His Celeb Connections
##                            channel_title       category_id        publish_time
## 10                                   Vox News and Politics 2017-11-13 08:45:16
## 11                                   NFL            Sports 2017-11-12 21:05:26
## 24                            Refinery29   Howto and Style 2017-11-12 11:00:01
## 53                                   WWE            Sports 2017-11-11 10:00:00
## 79  The Late Late Show with James Corden     Entertainment 2017-11-10 04:35:00
## 103          Late Night with Seth Meyers            Comedy 2017-11-10 06:00:04
##       views likes dislikes comment_count comments_disabled ratings_disabled
## 10   256426 12654     1363          2368             FALSE            FALSE
## 11    81377   655       25           177             FALSE            FALSE
## 24   145921  1707      578           673             FALSE            FALSE
## 53  1044813 15397      838          1100             FALSE            FALSE
## 79  1497519 15504      353          1084             FALSE            FALSE
## 103  225286  1731      193           206             FALSE            FALSE
##     video_error_or_removed trending_dow trending_month publish_wday
## 10                   FALSE      Tuesday            Nov       Monday
## 11                   FALSE      Tuesday            Nov       Sunday
## 24                   FALSE      Tuesday            Nov       Sunday
## 53                   FALSE      Tuesday            Nov     Saturday
## 79                   FALSE      Tuesday            Nov       Friday
## 103                  FALSE      Tuesday            Nov       Friday
##     publish_hour publish_when  likesratio dislikesratio commentsratio
## 10             8   8AM to 3PM 0.049347570  0.0053153736  0.0092346330
## 11            21  4PM to 11PM 0.008048957  0.0003072121  0.0021750617
## 24            11   8AM to 3PM 0.011698111  0.0039610474  0.0046120846
## 53            10   8AM to 3PM 0.014736608  0.0008020574  0.0010528200
## 79             4  12AM to 7AM 0.010353124  0.0002357232  0.0007238639
## 103            6  12AM to 7AM 0.007683567  0.0008566888  0.0009143933
Data Aggregation`
##                       channel likesratio dislikesratio
## 1                         CNN 0.01253842  0.0062069224
## 2                        ESPN 0.01016376  0.0007641081
## 3              First We Feast 0.04233054  0.0014632845
## 4           Jimmy Kimmel Live 0.02053525  0.0007489192
## 5 Late Night with Seth Meyers 0.01392779  0.0010047001
## 6                         NBA 0.01883015  0.0032860873
## # A tibble: 6 x 3
##   channel        name             value
##   <fct>          <chr>            <dbl>
## 1 CNN            likesratio    0.0125  
## 2 CNN            dislikesratio 0.00621 
## 3 ESPN           likesratio    0.0102  
## 4 ESPN           dislikesratio 0.000764
## 5 First We Feast likesratio    0.0423  
## 6 First We Feast dislikesratio 0.00146

Conclusion

From this analysis, we can derive several conclusion :

  1. “Music” has the highest likesratio value for variance.

  2. “Sports” and “Entertainment” have the most outliers.

  3. Overall, the `likes’ of the videos in the" Animals and Animals "category are spread out normally.

  4. From the boxplot analysis, we can conclude that publishing time highly affect the user engagement rates, and the best time to publish a video is between 8AM to 3PM.

  5. The higher the number of views in a video, the higher the number of comments on the video The “Gaming” category has the highest views andcomment_count.

  6. By comparing the likeability analysis and the popularity analysis chart, we can see that the more popular a video, the more it tends to have positive user engagement.

  7. The top 3 most prolific producers of trending videos in the week is “Refinery29”, “The Tonight Show Starring Jimmy Fallon”, and “Vox”.