This project is divided into 3 major parts which will be explained through the document.

Part B: Analysing Facebook Promotional Video Campaigns

Aim Of Part B

The aim of Part B is to identify, from the set of available campaigns, the best and worst campaigns based on the following metrics:

  1. Cost per reach
  2. Cost per engagement
  3. Content Quality
  4. Average percent of video watched per engagement

Loading Data

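#loading packages (readxl for read_xlsx, dplyr for mutate/select/%>%, ggplot2 for plots) and data
library(readxl)
library(dplyr)
library(ggplot2)
OML_TS<-read_xlsx("OML_TS.xlsx")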

Data Manipulation

In this section, I will create data frames and order them for future use. Respective data frames will be explained as required.

Note: To calculate the total engagement, I have added only the positive reactions (i.e. like, haha, wow, love) to the number of comments and shares. In certain thought-provoking and political videos, the “sad” react and “angry” react would also be meaningful, but we know that these are simply video promotions, and hence, it does not make sense to include negative reactions in total engagement.
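As a minimal sketch of the rule in the note above, the snippet below builds total engagement from raw reaction counts. The data frame `toy` and its per-reaction columns (`likes`, `hahas`, `wows`, `loves`, `sads`, `angrys`) are hypothetical and the numbers are made up; OML_TS already contains a combined `pos_reactions` column, so this only illustrates the idea.

```r
# Toy data: column names and counts are hypothetical, not taken from OML_TS
toy <- data.frame(
  campaign_name = c("X", "Y"),
  shares   = c(10, 4),
  comments = c(5, 2),
  likes = c(100, 40), hahas = c(3, 1), wows = c(2, 0), loves = c(20, 5),
  sads  = c(1, 7),    angrys = c(0, 9)
)

# Positive reactions only: "sad" and "angry" are deliberately left out
toy$pos_reactions <- toy$likes + toy$hahas + toy$wows + toy$loves

# Total engagement = shares + comments + positive reactions
toy$tot_eng <- toy$shares + toy$comments + toy$pos_reactions

toy[, c("campaign_name", "pos_reactions", "tot_eng")]
```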

#adding "cost per reach", "total engagement", "cost per engagement", and "cost per 3-second view" fields
OML_TS <- OML_TS %>% mutate(cost_per_reach = amount/Reach, tot_eng = shares + comments + pos_reactions, cost_per_eng = amount/tot_eng, cost_per_view = amount/three_sec_views)

#arranging the data frame by the "cost per reach" field
cheap_reach<-OML_TS[order(OML_TS$cost_per_reach),]
cheap_reach <- select(cheap_reach, campaign_name, cost_per_reach)

#arranging the data frame by "cost per engagement" field
cheap_eng<-OML_TS[order(OML_TS$cost_per_eng),] 

#calculating the ratio of thirty-second views to ten-second views (retention factor)
OML_TS<-OML_TS %>% mutate(thirty_ten = thirty_sec_views/ten_sec_views)

#calculating the number of shares per ten-second view
OML_TS <- mutate(OML_TS, shares_per_view = shares/ten_sec_views)

shares_p_view<-OML_TS[order(OML_TS$shares_per_view),] 
shares_p_view<-select(shares_p_view, campaign_name, shares_per_view)

thirty_ten_eng<-OML_TS[order(OML_TS$thirty_ten),] 
thirty_ten_eng<-select(thirty_ten_eng, campaign_name, thirty_ten)

video_watched<-OML_TS[order(OML_TS$video_percent_watched),] 
video_watched<-select(video_watched, campaign_name, video_percent_watched)

Cost per Reach

In this section, I will explore the costs per reach of all the campaigns to identify the cheapest and the most expensive ones.

ggplot(OML_TS, aes(campaign_name, cost_per_reach)) + geom_point(colour = "blue") + labs(x = "Campaign Name", y = "Cost/Reach")

cheap_reach
## # A tibble: 17 x 2
##    campaign_name cost_per_reach
##    <chr>                  <dbl>
##  1 I                    0.00283
##  2 H                    0.00457
##  3 D                    0.00738
##  4 J                    0.00823
##  5 F                    0.00903
##  6 E                    0.00957
##  7 A                    0.0117 
##  8 P                    0.0121 
##  9 B                    0.0128 
## 10 G                    0.0154 
## 11 Q                    0.0176 
## 12 L                    0.0200 
## 13 N                    0.0202 
## 14 K                    0.0206 
## 15 C                    0.0209 
## 16 O                    0.0276 
## 17 M                    0.0285

It appears that campaigns D, H and I have the cheapest reaches, while campaigns M, O, and C have considerably higher costs per reach.

Cost per Engagement

Here, I will attempt to address which campaign has the cheapest engagement, and which one, the costliest.

ggplot(OML_TS, aes(campaign_name, cost_per_eng)) + geom_point(colour = "blue") + labs(x ="Campaign Name", y = "Cost/Engagement")

cheap_eng %>% select(campaign_name, cost_per_eng)
## # A tibble: 17 x 2
##    campaign_name cost_per_eng
##    <chr>                <dbl>
##  1 I                    0.165
##  2 H                    0.489
##  3 J                    0.700
##  4 E                    0.921
##  5 D                    1.01 
##  6 F                    1.06 
##  7 P                    1.58 
##  8 B                    1.60 
##  9 K                    2.03 
## 10 A                    2.05 
## 11 C                    2.21 
## 12 N                    2.41 
## 13 L                    2.52 
## 14 G                    2.85 
## 15 M                    3.24 
## 16 O                    3.43 
## 17 Q                    4.17

I, H, and J are the cheapest campaigns with respect to engagement, and Q, O and M, the costliest.

It is interesting to note that O and M are also among the 3 highest costs per reach, while H and I are among the 3 lowest costs per reach.
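A rough way to check how closely the two cost rankings agree is a Spearman rank correlation, together with the overlap between the three cheapest and the three costliest campaigns on each metric. The sketch below assumes the rounded values printed in the two tables above are accurate enough; the vector names are chosen only for this illustration.

```r
# Values copied from the printed cheap_reach and cheap_eng tables (rounded)
campaigns <- LETTERS[1:17]  # "A" to "Q"
cost_per_reach <- setNames(
  c(0.0117, 0.0128, 0.0209, 0.00738, 0.00957, 0.00903, 0.0154, 0.00457,
    0.00283, 0.00823, 0.0206, 0.0200, 0.0285, 0.0202, 0.0276, 0.0121, 0.0176),
  campaigns)
cost_per_eng <- setNames(
  c(2.05, 1.60, 2.21, 1.01, 0.921, 1.06, 2.85, 0.489,
    0.165, 0.700, 2.03, 2.52, 3.24, 2.41, 3.43, 1.58, 4.17),
  campaigns)

# Spearman correlation compares the two rankings rather than the raw values
rho <- cor(cost_per_reach, cost_per_eng, method = "spearman")

# Campaigns that are in the 3 cheapest (or 3 costliest) on both metrics
cheap_both  <- intersect(names(sort(cost_per_reach))[1:3],
                         names(sort(cost_per_eng))[1:3])
costly_both <- intersect(names(sort(cost_per_reach, decreasing = TRUE))[1:3],
                         names(sort(cost_per_eng, decreasing = TRUE))[1:3])

rho
cheap_both
costly_both
```

A value of `rho` close to 1 would indicate that campaigns that are cheap on reach also tend to be cheap on engagement.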

Cost Per 3-Second-View

I want to investigate how expensive each view is in different campaigns.

ggplot(OML_TS, aes(campaign_name, cost_per_view)) + geom_point(colour="blue") + labs(x = "Campaign Name", y = "Cost/3-sec-view")

cost_p_view<-OML_TS[order(OML_TS$cost_per_view),]
cost_p_view 
## # A tibble: 17 x 17
##    campaign_name   Reach amount avg_watch_time video_percent_watched
##    <chr>           <dbl>  <dbl>          <dbl>                 <dbl>
##  1 I             1768376   5000             16                 10.9 
##  2 H             1093412   5000             12                  8.2 
##  3 D              677633   5000             18                 11.6 
##  4 F              553417   5000             27                  5.53
##  5 E              595320   5700             34                  3.72
##  6 G              324869   5000             26                  5.71
##  7 J              607497   5000             15                  4.34
##  8 Q              567135  10000             32                  1.06
##  9 P              268717   3250             12                  6.15
## 10 A              426501   5000             38                  3.04
## 11 B              390876   5000             13                  4.97
## 12 L              500542  10000             26                  4.37
## 13 N              494105  10000             18                  4.91
## 14 O              362602  10000             14                  3.86
## 15 K              242472   5000              7                  2.07
## 16 C              239803   5000             10                  4.88
## 17 M              351346  10000             15                  3.74
## # ... with 12 more variables: three_sec_views <dbl>, ten_sec_views <dbl>,
## #   thirty_sec_views <dbl>, shares <dbl>, comments <dbl>,
## #   pos_reactions <dbl>, cost_per_reach <dbl>, tot_eng <dbl>,
## #   cost_per_eng <dbl>, cost_per_view <dbl>, thirty_ten <dbl>,
## #   shares_per_view <dbl>

Campaigns I, H and D are the cheapest with respect to this metric.

Content Quality

For this analysis, I will measure the engagement factor of video campaigns by the following metrics:

  1. Average percentage of the video watched: Ideally, I would like more information on what percentage of the video each viewer watched, to determine whether the median or the mean is the better measure of central tendency. Since I do not have that data, I will make use of the mean.

  2. Retention factor: I will define the retention factor as 30-second-views/ten-second-views i.e., out of those viewers who watched the first ten seconds of the video, how many watched 30 seconds of it?

I will ignore the 3-second-views data because it includes auto-play views too, so it is not a good indicator of how many people consciously chose to watch the videos. The retention factor defined above gives a good idea of how engaging a particular video is, because it tells us the proportion of viewers who watched the first 10 seconds and were engaged enough to continue watching for the next 20 seconds.
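The retention factor can be written as a small helper. This is an illustrative sketch only: `retention_factor` is a hypothetical name, and the view counts below are made up rather than taken from OML_TS.

```r
# Hypothetical helper: share of ten-second viewers who reached thirty seconds
retention_factor <- function(thirty_sec_views, ten_sec_views) {
  # Return NA instead of dividing by zero when there are no ten-second views
  ifelse(ten_sec_views > 0, thirty_sec_views / ten_sec_views, NA_real_)
}

# Made-up counts for three campaigns
ten_sec    <- c(20000, 8000, 0)
thirty_sec <- c(13000, 4000, 0)

retention_factor(thirty_sec, ten_sec)
```

A value of 0.65, for example, would mean that 65% of the viewers who stayed for ten seconds also stayed for thirty.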

Plot of video percent watched in each campaign
ggplot(OML_TS, aes(campaign_name, video_percent_watched)) + geom_point(colour = "blue") + labs(x = "Campaign Name", y = "Average Percent of Video Watched")

video_watched
## # A tibble: 17 x 2
##    campaign_name video_percent_watched
##    <chr>                         <dbl>
##  1 Q                              1.06
##  2 K                              2.07
##  3 A                              3.04
##  4 E                              3.72
##  5 M                              3.74
##  6 O                              3.86
##  7 J                              4.34
##  8 L                              4.37
##  9 C                              4.88
## 10 N                              4.91
## 11 B                              4.97
## 12 F                              5.53
## 13 G                              5.71
## 14 P                              6.15
## 15 H                              8.2 
## 16 I                             10.9 
## 17 D                             11.6

I can derive 2 key insights from this graph:

  1. Campaign I has the second highest percentage of video watched in all the campaigns and it is among the cheapest campaigns.

  2. Campaign D has the highest percent of video watched.

Plot of retention factor of each campaign
ggplot(OML_TS, aes(campaign_name, thirty_ten)) + geom_point(colour = "blue") + labs(x = "Campaign Name", y = "Retention Factor")

thirty_ten_eng
## # A tibble: 17 x 2
##    campaign_name thirty_ten
##    <chr>              <dbl>
##  1 K                  0.507
##  2 Q                  0.512
##  3 E                  0.554
##  4 F                  0.576
##  5 O                  0.585
##  6 P                  0.595
##  7 M                  0.598
##  8 C                  0.622
##  9 G                  0.623
## 10 H                  0.623
## 11 B                  0.634
## 12 A                  0.657
## 13 J                  0.662
## 14 N                  0.680
## 15 L                  0.681
## 16 I                  0.701
## 17 D                  0.795

Campaigns D, I, N and L have the most retentive content.

Further Analysis

People only share on Facebook the content that they find incredibly appealing. Therefore, I’m interested in finding out the number of shares per view for each campaign. It would provide further insight into the quality of the content.

Again, I will use the ten-second-view metric to weed out auto-play views.

Plot of number of shares per view
ggplot(OML_TS, aes(campaign_name, shares_per_view)) + geom_point(colour = "blue") + labs(x = "Campaign Name", y = "Shares/View")

shares_p_view
## # A tibble: 17 x 2
##    campaign_name shares_per_view
##    <chr>                   <dbl>
##  1 C                     0.00194
##  2 A                     0.00463
##  3 M                     0.00502
##  4 G                     0.00553
##  5 L                     0.00687
##  6 Q                     0.00734
##  7 E                     0.00751
##  8 D                     0.00814
##  9 O                     0.00835
## 10 B                     0.0109 
## 11 N                     0.0131 
## 12 F                     0.0132 
## 13 H                     0.0191 
## 14 P                     0.0273 
## 15 K                     0.0313 
## 16 J                     0.0485 
## 17 I                     0.0636

Campaigns P, K, J, and I have the highest shares per 10-second-view, while C, A, and M, the least.

Conclusion of Part B

Based on the 5 analyses, I can draw the following insights:

  1. Campaign I is the best campaign: Campaign I ranks in the top 2 on all the metrics. In particular, it is the cheapest in terms of reach and engagement, and it has also garnered the most shares per 10-second view.

  2. Campaign D has also performed very well: It has the highest retention factor and average percent of video watched (which means it is very engaging), and it is quite cheap as well.

  3. Campaign H is the 3rd best campaign: It is the 2nd cheapest in terms of reach, engagement and views. Additionally, it has the 3rd highest video percent watched and the 5th highest shares per view.

  4. Campaign O has performed poorly: It ranks among the 3 costliest campaigns in terms of reach and engagement, and is the 4th costliest per 3-second view. It has the 6th lowest average video percent watched and the 5th lowest retention factor.

  5. Campaign K has poor engagement: Campaign K has high costs per reach and per view, and the lowest retention factor. It also has the 2nd lowest percent of video watched.

  6. Campaign M is another bad apple: Campaign M has turned out to be a very costly campaign (with the highest cost per reach and per view). It is also the 3rd worst in terms of shares per view and has the 5th lowest video percent watched.

Thus, we have our 3 best and 3 worst campaigns based on 6 different metrics.
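The rank statements above can be cross-checked by ranking every campaign on each metric in a single table. The sketch below rests on several assumptions: the values are re-typed from the tables printed in this report (so they are rounded, and G and H tie on retention), cost per 3-second view is left out because its values are not printed, and the unweighted `mean_rank` is only an illustrative summary that was not used to reach the conclusions above.

```r
# Values re-typed from the printed tables (rounded); rows are campaigns A to Q
metrics <- data.frame(
  campaign_name = LETTERS[1:17],
  cost_per_reach = c(0.0117, 0.0128, 0.0209, 0.00738, 0.00957, 0.00903,
                     0.0154, 0.00457, 0.00283, 0.00823, 0.0206, 0.0200,
                     0.0285, 0.0202, 0.0276, 0.0121, 0.0176),
  cost_per_eng = c(2.05, 1.60, 2.21, 1.01, 0.921, 1.06, 2.85, 0.489, 0.165,
                   0.700, 2.03, 2.52, 3.24, 2.41, 3.43, 1.58, 4.17),
  video_percent_watched = c(3.04, 4.97, 4.88, 11.6, 3.72, 5.53, 5.71, 8.2,
                            10.9, 4.34, 2.07, 4.37, 3.74, 4.91, 3.86, 6.15,
                            1.06),
  thirty_ten = c(0.657, 0.634, 0.622, 0.795, 0.554, 0.576, 0.623, 0.623,
                 0.701, 0.662, 0.507, 0.681, 0.598, 0.680, 0.585, 0.595,
                 0.512),
  shares_per_view = c(0.00463, 0.0109, 0.00194, 0.00814, 0.00751, 0.0132,
                      0.00553, 0.0191, 0.0636, 0.0485, 0.0313, 0.00687,
                      0.00502, 0.0131, 0.00835, 0.0273, 0.00734)
)

# Rank 1 = best: lowest value for costs, highest value for quality metrics
ranks <- data.frame(
  campaign_name  = metrics$campaign_name,
  reach_rank     = rank(metrics$cost_per_reach, ties.method = "min"),
  eng_rank       = rank(metrics$cost_per_eng, ties.method = "min"),
  watched_rank   = rank(-metrics$video_percent_watched, ties.method = "min"),
  retention_rank = rank(-metrics$thirty_ten, ties.method = "min"),
  shares_rank    = rank(-metrics$shares_per_view, ties.method = "min")
)

# Unweighted average of the five ranks (an assumption, see the text above)
ranks$mean_rank <- rowMeans(ranks[, -1])

# Best overall campaigns first
ranks[order(ranks$mean_rank), ]
```

Weighting all five metrics equally is a simplification; a different weighting could reorder the campaigns in the middle of the table.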

Part C: What More Information Is Required?

This project is formulated based on the data provided by OML. Even though I have been able to produce some interesting insights, my analysis would have been more robust had I received data on the following parameters.

Data For Part A

For YouTube videos, rankings are important. A few metrics indicate how well a video is likely to rank. I discuss these metrics below*:

  1. Length of each video: The average length of a video on the front page is 14 minutes 50 seconds. Because the YouTube audience visits the site with the intention of watching long videos, as a general rule of thumb, the longer the video, the better its ranking. After all, YouTube wants to increase the time viewers spend on the site. (https://youtube-creators.googleblog.com/2012/08/youtube-now-why-we-focus-on-watch-time.html)

  2. The number of shares per video: Again, the greater the number of shares, the better the video’s ranking.

  3. Number of channel subscribers per video: This information can allow us to measure the loyalty of the audience. It can also tell us whether a video increases or decreases the number of subscribers.

*These metrics are sourced from https://backlinko.com/youtube-ranking-factors, where 1.3 million YouTube videos were analysed.

Data For Part B

The process of identifying the best and the worst campaigns would be aided by the following metrics:

  1. Post Clicks
  2. Percent of auto-play vs. click-to-play videos
  3. Number of Impressions: to measure how widely the content is being circulated
  4. Negative Feedback: hide posts, hide all posts, report as spam, unlike page
  5. Video watched at 100%: This tells us how many people were engaged enough to watch the whole video, as well as those who didn’t watch the whole video but were interested in the video’s concluding part.