MATH2406: Assessment 3 Business case and presentation

WNW - An analysis

Heinzy Johan Bernt 3923824

Introduction

My Intro - Competition between streaming giants is high as they fight for user engagement. - It is paramount that content is optimally targeted to increase user engagement which means more hours of streaming and plenty of opportunities for advertising revenue and company profit. - Better recommendation systems improve user engagement - Key metric is to improve the average hours watched per user per day. - Executives want to know if the new recommendation engine algorithm is worth rolling out to all subscribers. - We analyse the results from a recent change they made in their algorithm and present the results to the executive team.

Problem Statement

My problem statement - Executives want to know if the new recommendation engine algorithm is worth rolling out to all subscribers. - We analyse the results from a recent change they made in their algorithm and present the results to the executive team.

My Data These are the important variables in the data set that I will investigate as part of the analysis - hours_watched: duration of viewing hours (daily). Our goal is to see these numbers increase - gender: Male (M) and Female (F). Could be useful in determining target audiences. - age: age of the subscriber in years. Also could be useful in determining target audience - social_metric: a combined metric based on viewing habits (factored from 1 to 4) - time_since_signup: number of months since the customer signed up. - demographic: An assigned demographic number which could be important in targeting demograpics (factored from 1 to 4) - Data has Group A the control group, Group B the treatment group and the pre-implementation date of the recommender engine can be used to determine population mean and standard deviation.

Import the data

WNW_data <- read.csv("streaming_data.csv")
head(WNW_data)
summary(WNW_data)
##      date              gender               age        social_metric   
##  Length:1000        Length:1000        Min.   :18.00   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.:28.00   1st Qu.: 2.000  
##  Mode  :character   Mode  :character   Median :36.00   Median : 5.000  
##                                        Mean   :36.49   Mean   : 4.911  
##                                        3rd Qu.:46.00   3rd Qu.: 8.000  
##                                        Max.   :55.00   Max.   :10.000  
##  time_since_signup  demographic       group           hours_watched  
##  Min.   : 0.00     Min.   :1.000   Length:1000        Min.   :0.500  
##  1st Qu.: 5.70     1st Qu.:2.000   Class :character   1st Qu.:3.530  
##  Median :11.80     Median :3.000   Mode  :character   Median :4.415  
##  Mean   :11.97     Mean   :2.603                      Mean   :4.393  
##  3rd Qu.:18.70     3rd Qu.:4.000                      3rd Qu.:5.322  
##  Max.   :24.00     Max.   :4.000                      Max.   :8.300
#View(WNW1_data)

Since the campaign ran from 18/07 onwards, the data may be bias if we include earlier entries For example, content may have been different resulting in higher or lower hours watched for the period before the campaign.

# convert date
WNW_data$date <- as.Date(WNW_data$date, '%d/%m')
WNW_data$date <- format(WNW_data$date, format='%d-%m')
# filter date from start of campaign 18/07
WNW1_data <- WNW_data %>% filter(date >= '18-07')

Get summary statistics

summary(WNW1_data)
##      date              gender               age        social_metric   
##  Length:452         Length:452         Min.   :18.00   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.:27.00   1st Qu.: 2.000  
##  Mode  :character   Mode  :character   Median :36.00   Median : 5.000  
##                                        Mean   :36.53   Mean   : 4.991  
##                                        3rd Qu.:46.00   3rd Qu.: 8.000  
##                                        Max.   :55.00   Max.   :10.000  
##  time_since_signup  demographic       group           hours_watched  
##  Min.   : 0.00     Min.   :1.000   Length:452         Min.   :0.500  
##  1st Qu.: 5.50     1st Qu.:2.000   Class :character   1st Qu.:3.664  
##  Median :11.10     Median :3.000   Mode  :character   Median :4.543  
##  Mean   :11.66     Mean   :2.624                      Mean   :4.510  
##  3rd Qu.:18.62     3rd Qu.:4.000                      3rd Qu.:5.410  
##  Max.   :24.00     Max.   :4.000                      Max.   :8.300

Check for outliers

boxplot1 <- boxplot(WNW1_data$hours_watched, ylab="Hours watched", main="Hours watched - with outliers")

# Remove outliers

WNW1_data <- WNW1_data %>% filter(hours_watched < 8 & hours_watched > 0.95)
boxplot2 <- boxplot(WNW1_data$hours_watched, ylab="Hours watched", main="Hours watched - without outliers")

# new summary statistics

summary(WNW1_data)
##      date              gender               age        social_metric   
##  Length:447         Length:447         Min.   :18.00   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.:27.00   1st Qu.: 2.000  
##  Mode  :character   Mode  :character   Median :36.00   Median : 5.000  
##                                        Mean   :36.43   Mean   : 5.011  
##                                        3rd Qu.:46.00   3rd Qu.: 8.000  
##                                        Max.   :55.00   Max.   :10.000  
##  time_since_signup  demographic       group           hours_watched  
##  Min.   : 0.00     Min.   :1.000   Length:447         Min.   :1.240  
##  1st Qu.: 5.50     1st Qu.:2.000   Class :character   1st Qu.:3.675  
##  Median :11.10     Median :3.000   Mode  :character   Median :4.550  
##  Mean   :11.67     Mean   :2.624                      Mean   :4.536  
##  3rd Qu.:18.70     3rd Qu.:4.000                      3rd Qu.:5.410  
##  Max.   :24.00     Max.   :4.000                      Max.   :7.930
#preparing group A
WNW_groups <- WNW_data %>%
  select(group)

WNW_groupA <- WNW_groups %>%
  filter(group=="A")

View correlations

Gender

qplot1 <- qplot(WNW1_data$hours_watched, fill=WNW1_data$gender, main = "Distribution of hours watched by gender", xlab='Hours', ylab='Count') 
qplot1

boxplot(WNW1_data$hours_watched~WNW1_data$gender,main="Hours watched by Gender", ylab="Hours watched", xlab="Gender")

Group

boxplot(WNW1_data$hours_watched~WNW1_data$group, main="Hours watched by group A and B", xlab="Group", ylab="Hours watched")

The boxplot shows the IQR and median of the viewing hours split by gender. However, it cannot be determined from this visualization alone whether gender significantly impacts hours of content watched.

Social metric

r_coeff1 <- cor(x=WNW1_data$hours_watched, y=WNW1_data$social_metric, method='pearson')
print(r_coeff1)
## [1] 0.1999463
title_text1 <- paste0('r_xy = ', r_coeff1)
plot(x=WNW1_data$hours_watched, y=WNW1_data$social_metric, main=title_text1)

# time_since_signup

#qplot2 <- qplot(WNW1_data$hours_watched, fill=WNW1_data$time_since_signup, main = "Distribution of hours watched by time since signup", xlab='Hours', ylab='time since signup') 
#qplot2
r_coeff2 <- cor(x=WNW1_data$hours_watched, y=WNW1_data$time_since_signup, method='pearson')
print(r_coeff2)
## [1] -0.02363155
title_text2 <- paste0('r_xy = ', r_coeff2)
plot(x=WNW1_data$hours_watched, y=WNW1_data$time_since_signup, main=title_text2)

# demographic

r_coeff3 <- cor(x=WNW1_data$hours_watched, y=WNW1_data$demographic, method='pearson')
print(r_coeff3)
## [1] -0.4491744
title_text3 <- paste0('r_xy = ', r_coeff3)
plot(x=WNW1_data$hours_watched, y=WNW1_data$demographic, main=title_text3)

# Age

r_coeff4 <- cor(x=WNW1_data$hours_watched, y=WNW1_data$age, method='pearson')
print(r_coeff4)
## [1] -0.5708165
title_text4 <- paste0('r_xy = ', r_coeff4)
plot(x=WNW1_data$hours_watched, y=WNW1_data$age, main=title_text4)

Pearson Correlation Coefficients The pearson correlation coefficients range in strength from 0 to 1. We have the pearson correlation coefficients for social metric and time since signup against hours watched.

We have the pearson correlation coefficients for demographic and age against hours watched. The pearson correlation coefficient is negative suggesting an inverse relationship between demographic and age. I.e. older subscribers watched fewer hours. The pearson correlation coefficient is stronger than those before but not necessarily significant.

It appears that there is not much correlation here but we will need to delve further into the analysis

Build multiple regression model

lm <- lm(formula = hours_watched ~ social_metric + gender + age + time_since_signup + demographic + group, data=WNW1_data)
summary(lm)
## 
## Call:
## lm(formula = hours_watched ~ social_metric + gender + age + time_since_signup + 
##     demographic + group, data = WNW1_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73382 -0.58671  0.01133  0.67067  2.74870 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.409348   0.219078  29.256  < 2e-16 ***
## social_metric      0.104013   0.015801   6.583 1.32e-10 ***
## genderM            0.146181   0.134801   1.084   0.2788    
## age               -0.060212   0.008744  -6.886 1.99e-11 ***
## time_since_signup  0.003927   0.006676   0.588   0.5567    
## demographic       -0.189541   0.095391  -1.987   0.0475 *  
## groupB             0.617118   0.111311   5.544 5.10e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.006 on 440 degrees of freedom
## Multiple R-squared:  0.4293, Adjusted R-squared:  0.4215 
## F-statistic: 55.15 on 6 and 440 DF,  p-value: < 2.2e-16

We can confirm that the correlations between the variables social_metric, age, and demographic and hours watched are statistically significant.

build LM with only correlated variables and show graph

lm2 <- lm(formula = hours_watched ~ social_metric + age + demographic + group, data=WNW1_data)
summary(lm2)
## 
## Call:
## lm(formula = hours_watched ~ social_metric + age + demographic + 
##     group, data = WNW1_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6989 -0.5898 -0.0124  0.6648  2.8522 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.554877   0.181299  36.155  < 2e-16 ***
## social_metric  0.104122   0.015788   6.595 1.22e-10 ***
## age           -0.066027   0.006966  -9.478  < 2e-16 ***
## demographic   -0.116212   0.069764  -1.666   0.0965 .  
## groupB         0.632766   0.109908   5.757 1.60e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.005 on 442 degrees of freedom
## Multiple R-squared:  0.4274, Adjusted R-squared:  0.4222 
## F-statistic: 82.48 on 4 and 442 DF,  p-value: < 2.2e-16
# the correlation between demographic and hours watched is inconclusive and requires further investigation

We built regression models to determine the statistical significance of each variable. We firstly included all variables available in the model. The model determined that social metric, age, demographic and group were statistically significant in impacting viewing hours. This means that gender and time since signup were not statistically significant in impacting viewing hours. We then removed gender and time since signup for the regression model on the right. This showed that demographic may or may not be statistically significant in improving viewing hours and needs further analysis.

lm3 <- lm(formula = hours_watched ~ social_metric, data=WNW1_data)
summary(lm3)
## 
## Call:
## lm(formula = hours_watched ~ social_metric, data = WNW1_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5315 -0.8130  0.0237  0.8391  3.3965 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.09905    0.11866  34.545  < 2e-16 ***
## social_metric  0.08724    0.02027   4.305 2.06e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.297 on 445 degrees of freedom
## Multiple R-squared:  0.03998,    Adjusted R-squared:  0.03782 
## F-statistic: 18.53 on 1 and 445 DF,  p-value: 2.057e-05
# the correlation between demographic and hours watched is inconclusive and requires further investigation
plot(hours_watched~social_metric, data=WNW1_data,xlab="social_metric",ylab="Hours watched")
abline(lm3, col="blue")

lm4 <- lm(formula = hours_watched ~ demographic, data=WNW1_data)
summary(lm4)
## 
## Call:
## lm(formula = hours_watched ~ demographic, data = WNW1_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.09346 -0.72298 -0.04346  0.76654  3.05702 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.95204    0.14475   41.12   <2e-16 ***
## demographic -0.53953    0.05087  -10.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.183 on 445 degrees of freedom
## Multiple R-squared:  0.2018, Adjusted R-squared:    0.2 
## F-statistic: 112.5 on 1 and 445 DF,  p-value: < 2.2e-16
# the correlation between demographic and hours watched is inconclusive and requires further investigation
plot(hours_watched~demographic, data=WNW1_data,xlab="Demographic",ylab="Hours watched")
abline(lm4, col="blue")

WNW_social_model <- lm(hours_watched~social_metric, data=WNW1_data)
summary(WNW_social_model)
## 
## Call:
## lm(formula = hours_watched ~ social_metric, data = WNW1_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5315 -0.8130  0.0237  0.8391  3.3965 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.09905    0.11866  34.545  < 2e-16 ***
## social_metric  0.08724    0.02027   4.305 2.06e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.297 on 445 degrees of freedom
## Multiple R-squared:  0.03998,    Adjusted R-squared:  0.03782 
## F-statistic: 18.53 on 1 and 445 DF,  p-value: 2.057e-05
plot(hours_watched~social_metric, data=WNW1_data,xlab="Social metric",ylab="Hours watched")
abline(WNW_social_model, col="blue")

The graphs above show the relationship between the significant variables determined earlier, social metric and demographic. Hours watched increases as the social metric number increases. Whereas hours watched decreases as demographic number increases.

WNW_signup_model <- lm(hours_watched~time_since_signup, data=WNW1_data)

plot(hours_watched~time_since_signup,data=WNW1_data,xlab="Time since signup",ylab="Hours watched")
abline(WNW_signup_model,col="blue")

The graphs above show the ‘not statistically significant’ variables eliminated from the regression model. Visually, we can see that the ab line is nearly horizontal suggesting not much of a relationship between hours watched and time since signup.

Split Groups by demographic for analysis

WNW1_df_A <- WNW1_data %>% subset(group=="A")
WNW1_df_Ad1 <- WNW1_df_A %>% subset(demographic=="1")
WNW1_df_Ad2 <- WNW1_df_A %>% subset(demographic=="2")
WNW1_df_Ad3 <- WNW1_df_A%>% subset(demographic=="3")
WNW1_df_Ad4 <- WNW1_df_A%>% subset(demographic=="4")

WNW1_df_B <- WNW1_data %>% subset(group=="B")
WNW1_df_Bd1 <- WNW1_df_B %>% subset(demographic=="1")
WNW1_df_Bd2 <- WNW1_df_B %>% subset(demographic=="2")
WNW1_df_Bd3 <- WNW1_df_B %>% subset(demographic=="3")
WNW1_df_Bd4 <- WNW1_df_B %>% subset(demographic=="4")

use ## Multi-dimensional visualizations

# Visualising demographic groupings by age
int_breaks_rounded <- function(x, n = 10) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]

gg <- ggplot() + geom_point(position=position_jitter(width=0.45, height=0.45),
                            aes(x=WNW1_data$age, y=WNW1_data$hours_watched, colour=(WNW1_data$demographic)))
gg <- gg + scale_y_continuous(breaks = int_breaks_rounded) + labs(x='Age', y='Hours watched', colour='Demographic')
gg <- gg + ggtitle("Full dataset")
gg

Here we have multi-dimensional charts between hours watched, age and demographic. As the blue dots become lighter as we move rightwards along the age bar it confirms to us that demographic is related to age.

# Visualising demographic groupings by age Group A
int_breaks_rounded <- function(x, n = 10) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]

gga <- ggplot() + geom_point(position=position_jitter(width=0.45, height=0.45),
                            aes(x=WNW1_df_A$age, y=WNW1_df_A$hours_watched, colour=(WNW1_df_A$demographic)))
gga <- gga + scale_y_continuous(breaks = int_breaks_rounded) + labs(x='Age', y='Hours watched', colour='Demographic')
gga <- gga + ggtitle("Control Group A")
gga

# Visualising demographic groupings by age Group B
int_breaks_rounded <- function(x, n = 10) pretty(x, n)[round(pretty(x, n),1) %% 1 == 0]

ggb <- ggplot() + geom_point(position=position_jitter(width=0.45, height=0.45),
                            aes(x=WNW1_df_B$age, y=WNW1_df_B$hours_watched, colour=(WNW1_df_B$demographic)))
ggb <- ggb + scale_y_continuous(breaks = int_breaks_rounded) + labs(x='Age', y='Hours watched', colour='Demographic')
ggb <- ggb + ggtitle("Treatment Group B")
ggb

Here we have multi-dimensional charts between hours watched, age and demographic for Group A and Group B The effect that hours watched declines for older users is seen in both the Control A group and the Treatment group B.

Another look at age

WNW_age_model <- lm(hours_watched ~age, data=WNW1_data)
summary(WNW_age_model)
## 
## Call:
## lm(formula = hours_watched ~ age, data = WNW1_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.71609 -0.75523  0.01758  0.74342  2.73830 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.06854    0.18017   39.23   <2e-16 ***
## age         -0.06951    0.00474  -14.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.087 on 445 degrees of freedom
## Multiple R-squared:  0.3258, Adjusted R-squared:  0.3243 
## F-statistic: 215.1 on 1 and 445 DF,  p-value: < 2.2e-16
plot(hours_watched~age, data=WNW1_data,xlab="Age",ylab="Hours watched")
abline(WNW_age_model, col="blue")

The scatter plot with a line confirms that there is a negative relationship here between age and hours watched. It can’t be seen from these visuals if gender has an impact though.

check group balances

# count the numbers in each demographic category based on the A/B group
check_a_df <- WNW1_data %>% filter(group=='A') %>% select(gender, demographic,age) %>% 
  group_by(gender, demographic,age) %>% mutate(n_a=n()) %>% distinct()

check_b_df<- WNW1_data %>% filter(group=='B') %>% select(gender, demographic,age) %>% 
  group_by(gender, demographic,age) %>% mutate(n_b=n()) %>% distinct()

# calculate the difference in proportions
n_total_a <- sum(WNW1_data$group=='A')
n_total_b <- sum(WNW1_data$group=='B')

check_a_df$p_a <- check_a_df$n_a / n_total_a
check_b_df$p_b <- check_b_df$n_b / n_total_b

check_df <- inner_join(check_a_df, check_b_df)

check_df$diff <- check_df$p_a - check_df$p_b

# Plot the normal q-q plot
qqnorm(y=check_df$diff)

# if there is no bias aside from sampling noise then the difference should be small and normally distributed

Based on the distribution of the QQplot it looks like the groups are drawn from the same population which is a requirement to continue with A/B hypothesis testing. As the CLT holds we can use the z test for later analysis.

Examine the effect size

print("Outcome breakdown")
## [1] "Outcome breakdown"
cond_A <- WNW1_data$group=='A'
print(paste('A:', sum(WNW1_data$hours_watched[cond_A])/sum(cond_A)))
## [1] "A: 4.43544342507645"
cond_B <- WNW1_data$group=='B'
print(paste('B:', sum(WNW1_data$hours_watched[cond_B])/sum(cond_A)))
## [1] "B: 1.76545871559633"

We split the data into group A and group B, and within each group split the data in demographics 1 to 4 (d1, d2, d3, d4). This means there are 8 sub-groups. We then calculated the mean and standard deviation of each sub-group.

# mean and sd
WNW_data_pop <- WNW_data %>% subset(date<="17/07/2020")
WNW_data_mean <- mean(WNW_data_pop$hours_watched)
WNW_data_sd <- sd(WNW_data_pop$hours_watched)

# Calculating group and demographic means
groupA_mean <- mean(WNW1_df_A$hours_watched)
groupAd1_mean <- mean(WNW1_df_Ad1$hours_watched)
groupAd2_mean <- mean(WNW1_df_Ad2$hours_watched)
groupAd3_mean <- mean(WNW1_df_Ad3$hours_watched)
groupAd4_mean <- mean(WNW1_df_Ad4$hours_watched)

groupB_mean <- mean(WNW1_df_B$hours_watched)
groupBd1_mean <- mean(WNW1_df_Bd1$hours_watched)
groupBd2_mean <- mean(WNW1_df_Bd2$hours_watched)
groupBd3_mean <- mean(WNW1_df_Bd3$hours_watched)
groupBd4_mean <- mean(WNW1_df_Bd4$hours_watched)

groupAB_mean_diff <- groupA_mean - groupB_mean
groupABd1_mean_diff <- groupAd1_mean - groupBd1_mean
groupABd2_mean_diff <- groupAd2_mean - groupBd2_mean
groupABd3_mean_diff <- groupAd3_mean - groupBd3_mean
groupABd4_mean_diff <- groupAd4_mean - groupBd4_mean

groupAB_mean_diff
## [1] -0.3754316
groupABd1_mean_diff
## [1] -0.5883983
groupABd2_mean_diff
## [1] -0.595
groupABd3_mean_diff
## [1] -0.5259191
groupABd4_mean_diff
## [1] -0.5917355

Calculate the minimum sample size (mss) for each demographic which will indicate whether a test is valid.

# calc standard deviation
groupB_sd <- sd(WNW1_df_B$hours_watched)
groupBd1_sd <- sd(WNW1_df_Bd1$hours_watched)
groupBd2_sd <- sd(WNW1_df_Bd2$hours_watched)
groupBd3_sd <- sd(WNW1_df_Bd3$hours_watched)
groupBd4_sd <- sd(WNW1_df_Bd4$hours_watched)

mss_gB <- (1.96*groupB_sd/groupAB_mean_diff)^2
mss_gBd1 <- (1.96*groupBd1_sd/groupABd1_mean_diff)^2
mss_gBd2 <- (1.96*groupBd2_sd/groupABd2_mean_diff)^2
mss_gBd3 <- (1.96*groupBd3_sd/groupABd3_mean_diff)^2
mss_gBd4 <- (1.96*groupBd4_sd/groupABd4_mean_diff)^2

size_gB <- length(WNW1_df_B$hours_watched)
size_gBd1 <- length(WNW1_df_Bd1$hours_watched)
size_gBd2 <- length(WNW1_df_Bd2$hours_watched)
size_gBd3 <- length(WNW1_df_Bd3$hours_watched)
size_gBd4 <- length(WNW1_df_Bd4$hours_watched)

size_gB
## [1] 120
mss_gB # valid
## [1] 48.15313
size_gBd1
## [1] 13
mss_gBd1 # valid
## [1] 5.193261
size_gBd2
## [1] 32
mss_gBd2 # valid
## [1] 13.21025
size_gBd3
## [1] 16
mss_gBd3 # not valid 
## [1] 25.55141
size_gBd4
## [1] 59
mss_gBd4 # valid
## [1] 15.14134
size_table <- data.frame(Demographic=(c("d1","d2","d3","d4")),
                         GroupB_Sample=(c(13,32,16,59)),
                         Min_Sample=(c(6,14,26,16)),
                         Validity=(c("Valid","Valid","Not Valid","Valid")))
size_table

Statistical significance tests

We then were able to conduct t tests to determine each demographics significant improvement between the control and treatment group. We also carried out sample size tests to validate whether each result was valid i.e. did we have a large enough sample size in order to be confident in our results.

We are looking for p-values under 0.05 so we can say that we are 95% confident that a particular demographic showed significant improvement in viewing hours. Demographics 1, 2, and 4 showed statistically improved viewing hours which we can be 95% confident in. Demographic 3 did not show statistically significant improvements.

By conducting minimum sample size tests we validated that the above results for d1, d2 and d4 are correct. However, the test showing that demographic 3 did not show improvement was not valid.

#t.test(WNW1_data$hours_watched, mu=WNW_data_mean, alternative=c("greater"), sd=WNW_data_sd, conf.level=0.95)
groupA_mean 
## [1] 4.435443
groupAd1_mean 
## [1] 5.156986
groupAd2_mean 
## [1] 5.11125
groupAd3_mean 
## [1] 3.694706
groupAd4_mean 
## [1] 3.687671
groupB_mean 
## [1] 4.810875
groupBd1_mean 
## [1] 5.745385
groupBd2_mean 
## [1] 5.70625
groupBd3_mean 
## [1] 4.220625
groupBd4_mean
## [1] 4.279407
#calc z score (sample mean - control mean)/ sd then use pnorm(zscore,lower.tail=FALSE)

# prepare sd stat for group A
WNW1_groupA_sd <- sd(WNW1_df_A$hours_watched)
WNW1_groupAd1_sd <- sd(WNW1_df_Ad1$hours_watched)
WNW1_groupAd2_sd <- sd(WNW1_df_Ad2$hours_watched)
WNW1_groupAd3_sd <- sd(WNW1_df_Ad3$hours_watched)
WNW1_groupAd4_sd <- sd(WNW1_df_Ad4$hours_watched)

t.test(WNW1_df_Bd1$hours_watched,mu=groupAd1_mean,alternative = c("greater"),sd=WNW1_groupAd1_sd,conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  WNW1_df_Bd1$hours_watched
## t = 3.101, df = 12, p-value = 0.004586
## alternative hypothesis: true mean is greater than 5.156986
## 95 percent confidence interval:
##  5.40721     Inf
## sample estimates:
## mean of x 
##  5.745385
t.test(WNW1_df_Bd2$hours_watched,mu=groupAd2_mean,alternative = c("greater"),sd=WNW1_groupAd2_sd,conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  WNW1_df_Bd2$hours_watched
## t = 3.0505, df = 31, p-value = 0.002325
## alternative hypothesis: true mean is greater than 5.11125
## 95 percent confidence interval:
##  5.375543      Inf
## sample estimates:
## mean of x 
##   5.70625
t.test(WNW1_df_Bd3$hours_watched,mu=groupAd3_mean,alternative = c("greater"),sd=WNW1_groupAd3_sd,conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  WNW1_df_Bd3$hours_watched
## t = 1.551, df = 15, p-value = 0.07087
## alternative hypothesis: true mean is greater than 3.694706
## 95 percent confidence interval:
##  3.62619     Inf
## sample estimates:
## mean of x 
##  4.220625
t.test(WNW1_df_Bd4$hours_watched,mu=groupAd4_mean,alternative = c("greater"),sd=WNW1_groupAd4_sd,conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  WNW1_df_Bd4$hours_watched
## t = 3.869, df = 58, p-value = 0.0001396
## alternative hypothesis: true mean is greater than 3.687671
## 95 percent confidence interval:
##  4.023756      Inf
## sample estimates:
## mean of x 
##  4.279407
t_table <- data.frame(Demographic=(c("d1","d2","d3","d4")),
                      p_value=c(0.004586,0.002325,0.07087,0.0001396))
t_table
mean_table <- data.frame(GroupA=(c("Group A", "Group A d1","Group A d2","Group A d3","Group A d4")),
                         GroupA_means=(c(groupA_mean,groupAd1_mean,groupAd2_mean,groupAd3_mean,groupAd4_mean)),
                         GroupB=(c("Group B", "Group B d1","Group B d2","Group B d3","Group B d4")),
                         GroupB_means=(c(groupB_mean,groupBd1_mean,groupBd2_mean,groupBd3_mean,groupBd4_mean)))
mean_table

Conclusions

We looked at the variables age, social metric, number of months since the customer signed up, demographic, to determine whether the new recommender algorithm engine improved the viewing hours of the treatment Group B over the control Group A and which variables.

Age – Our null and alternative hypothesis for age are: Ho: There is no correlation between age and hours watched. Ha: There is a correlation between age and hours watched. The p-value for Age in our multilinear regression model was very small at 1.99e-11. Therefore we reject the null hypothesis and can say that age and hours watched are statistically significantly correlated within a 95% confidence level. We’ve seen from the graphs that older people watch fewer hours on average.

Social metric – Our null and alternate hypothesis for social metric are: Ho: There is no correlation between social metric and hours watched. Ha: There is a correlation between social metric and hours watched. The p value for social metric and hours watched is very small at 1.32e-10. Therefore, we reject the null hypothesis and can conclude that social metric and hours watched are statistically significantly correlated within a 95% confidence level.

Group A/B - Our null and alternate hypothesis for Group A/ B are: Ho: The average hours watched between Group A and Group B, are the same (no significant difference in hours watched) Ha: The average hours watched between Group A and Group B are statistically different. The p value for Group B is very small at 5.10e-08. Therefore we would reject the null hypothesis and conclude that Group B showed statistical improvements in viewing hours with a 95% confidence level. Therefore, we could roll out the new algorithm to all social metric groups.

Gender - Our null and alternate hypothesis for gender are: Ho: There is no correlation between gender and hours watched. Ha: There is a correlation between gender and hours watched. The p value for gender is large at 0.2788. Therefore, we accept the null hypothesis and conclude there is no relationship between gender and hours watched.

Time since signup - Our null and alternate hypothesis for Time since signup are: Ho: There is no correlation between Time since signup and hours watched. Ha: There is a correlation between Time since signup and hours watched. The p value for time since signup is large enough at 0.5567, to accept the null hypothesis and conclude there is no correlation between time since signup and hours watched

Demographic is related to age and gender. Our regression model returned inconclusive results for this variable so we needed to take a deeper look into each demographic. The null and alternate hypothesis for demographic are: Ho: There is no relationship between demographic group and hours watched Ha: There is a relationship between demographic group and hours watched Demographic 1 (females aged 18-35): Group A returned an average hours watched of 5.16 and Group B return 5.75. With a small p-value of 0.004586 we reject the null hypothesis and conclude that Group B had a statistically significant improvement in hours watched with a 95% confidence level. Demographic 2 (males aged 18-35): Group A returned an average hours watched of 5.11 and Group B return 5.71. With a small p-value of 0.002325 we reject the null hypothesis and conclude that Group B had a statistically significant improvement in hours watched. Demographic 3 (females aged 36-55): Group A returned an average hours watched of 3.69 and Group B return 4.22. With a large p-value of 0.07087 we accept the null hypothesis and conclude that Group B was not statistically significant in improving hours watched. Demographic 4(males aged 36-55): Group A returned an average hours watched of 3.69 and Group B return 4.28. With a small p-value of 0.0001396 we reject the null hypothesis and conclude that Group B had a statistically significant improvement in hours watched.

Overall conclusion and further recommendations From our analysis we can reasonably conclude that it would be worthwhile for WNW to rollout the new recommendation engine to our subscribers. However, the data set was limited in sample size as we could not conclude that the new algorithm showed statistical improvements for our demographic of females aged 36 to 55. The experiment was also short in terms of time conducted. i.e. The treatment group only saw the new algorithm for approximately 2 weeks. To gain a better understanding of whether the new algorithm should be implemented for all subscribers, the experiment should run for at least 6 months and should be expanded to at least twice the sample size. Whilst most variables captured were useful in the testing we should try to understand the rotation of the content that is available for streaming. i.e. did new content improve viewing hours or for example could removing a beloved show have reduced viewing hours in some cases.