This project is based on data from an online movie streaming company who have just conducted an A/B test on their users to analyze whether the new recommendation system they trialed can result in an increase in the amount of hours watched.
Competition between the streaming titans is heating up and the competition for user engagement through increasingly well targeted content has never been higher. An increase in user engagement means more hours of streaming and plenty of opportunities for advertising revenue and therefore company profit.
The streaming company has built up a healthy customer base with good content offerings but is continually refining their recommendation engine to provide better recommendations to customers. Better recommendations improve user engagement and importantly increase the average hours watched per user per day, a key metric used to price ads for 3rd party marketing companies. The company executives want to know if the new recommendation engine algorithm is worth rolling out to all their subscribers. They want to analyse the results from a recent change they made in their recommendation engine and present the findings to the executive team.
They are also interested in any other insights you may learn from the sample data.
In particular, they are curious about the following:
• Is there any bias in the data collection?
• How could any bias be corrected?
• What improvements could be made to future A/B tests?
The sample data shows the effect of an A/B test conducted to measure the effectiveness of a change to the recommendation engine used on some subscribers, but not others. The change to the recommendation went live at 1-minute past midnight on the 18th of July. Those customers who were unknowingly using the new recommendation engine to suggest what to watch next are labelled as group B, while group A was used as a control group.
options(warn= -1)
library("PerformanceAnalytics")
library(ggplot2)
library(dplyr)
library(cowplot)
library(ggthemes)
library(reshape)
library(readr)
library(plotly)
library(corrplot)
library(RColorBrewer)
library(gplots)
df <- read_csv("streaming_data.csv")
head(df)
## # A tibble: 6 x 8
## date gender age social_metric time_since_sign~ demographic group
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 01/07 F 28 5 19.3 1 A
## 2 01/07 F 32 7 11.5 1 A
## 3 01/07 F 39 4 4.3 3 A
## 4 01/07 M 52 10 9.5 4 A
## 5 01/07 M 25 1 19.5 2 A
## 6 01/07 M 51 0 22.6 4 A
## # ... with 1 more variable: hours_watched <dbl>
dim(df)
## [1] 1000 8
The data contains 8 columns and 1000 rows of data. The data set has the following fields (columns):
• date in format dd/mm
• gender of the customer
• age of the customer
• social metric, which is a combined metric based on previous viewing habits
• number of months since the customer signed up
• demographic number
• group (A/B) where A is the control and B is the treated group
• number of hours watched in that day
#Lets create a frequency table to assess the number of people in each group (A or B)
table(df$group)
##
## A B
## 880 120
# Group A has 880 people and Group B has 120 people
880/1000 #Calculate what percentage of the total participants are in group A
## [1] 0.88
120/1000 #Calculate what percentage of the total participants are in group B
## [1] 0.12
#Making a new data frame vector for the count and proportion of Group A and Group B
count.data <- data.frame(
Group = c("A", "B"),
n = c(880,120),
prop = c(88, 12)
)
count.data
## Group n prop
## 1 A 880 88
## 2 B 120 12
# Add label position
count.data <- count.data %>%
arrange(desc(Group)) %>%
mutate(lab.ypos = cumsum(prop) - 0.5*prop)
count.data
## Group n prop lab.ypos
## 1 B 120 12 6
## 2 A 880 88 56
# Building a pie chart
# Picking colors
mycols <- c("#0073C2FF", "#CD534CFF")
#Bar plot of counts in each group
GroupsBarPlot = ggplot(data=df, aes(group)) +
geom_bar(fill=mycols)+
ylab("Number of participants")+
xlab("Group")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
ggtitle("Number of participants by group")+
geom_text(stat='count',aes(label=..count..), vjust=2.1, nudge_y=0.9)+
theme(plot.title = element_text(hjust=0.5))
#Making the pie chart
# Add in title and maybe a legend
GroupsPiePlot = ggplot(count.data, aes(x = "", y = prop, fill = Group)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0)+
geom_text(aes(y = lab.ypos, label = paste(prop,"%")), color = "white")+
scale_fill_manual(values = mycols) +
theme_void()+
ggtitle("Percentage breakdown of participants\nby group")+
theme(plot.title = element_text(hjust=0.5))
# Plot both charts on the same grid (Graph 1)
plot_grid(GroupsBarPlot, GroupsPiePlot,nrow = 1, ncol=2)
We can see from graph 1 that Group A (the control group) contains 88% of the participants and Group B (the treated group) contains 12% of the participants. Typically for an A/B test you want more people in your treatment group in order to try and maximize profits. Having a higher percentage of participants in the treated group than the control group is the first thing that should be changed for future A/B tests.
#Lets repeat this entire process for gender
table(df$gender)
##
## F M
## 429 571
# There are 429 Females and 571 Males in the dataset
429/1000
## [1] 0.429
571/1000
## [1] 0.571
#Making a new data frame vector for the count and proportion of Group A and Group B
Gender.data <- data.frame(
Gender = c("Females", "Males"),
n = c(429,571),
prop = c(42.9, 57.1)
)
Gender.data
## Gender n prop
## 1 Females 429 42.9
## 2 Males 571 57.1
# Add label position
Gender.data <- Gender.data %>%
arrange(desc(Gender)) %>%
mutate(lab.ypos = cumsum(prop) - 0.5*prop)
Gender.data
## Gender n prop lab.ypos
## 1 Males 571 57.1 28.55
## 2 Females 429 42.9 78.55
# Building a pie chart
# Picking colors
mycols <- c("#0073C2FF", "#CD534CFF")
#Bar plot of counts in each group
GenderBarPlot = ggplot(data=df, aes(gender, fill=gender)) +
geom_bar(fill=mycols) + #Make a bar chart
labs(x= "Gender", y= "Number of People", size=1) + #Add axis labels
geom_text(aes(label= ..count..), stat="count", position=position_dodge(width=0.9), vjust=-0.25) + #Label the bars
theme(axis.text.x = element_text(size = 8), panel.border = element_rect(linetype = "dashed", fill = NA),
panel.background = element_rect(fill='white'), plot.title = element_text(size= 12, hjust=.5)) +
#Customize the theme elements
ggtitle("Number of Males and Females") #Add a title
#Making the pie chart
GenderPiePlot = ggplot(Gender.data, aes(x = "", y = prop, fill = Gender)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0)+ #Change the shape to a pie
geom_text(aes(y = lab.ypos, label = prop), color = "white")+ #Add the label we created earlier
scale_fill_manual(values = mycols) +
theme_void() + #Void the theme
theme(plot.title = element_text(size= 12, hjust=-0.4), axis.text.y = element_text(size = 6)) + #Change title size and position
ggtitle("Percentage breakdown of Males and Females") #Add title
# Plot 2
# Plot both charts on the same grid
plot_grid(GenderBarPlot, GenderPiePlot,nrow = 1, ncol=2)
We can see from graph 2 that there are more males participants than females, however the percentages are fairly close to being 50/50.
#Make a bar chart for both genders by group
#Make a pie chart for both genders by group
table(df$gender, df$group)
##
## A B
## F 400 29
## M 480 91
#We can see there is a large gender imbalance in the treatment group (Group B), there are only 29 females yet there are 91 males
400/1000
## [1] 0.4
29/1000
## [1] 0.029
480/1000
## [1] 0.48
91/1000
## [1] 0.091
#Making a new data frame vector for the count and proportion of Group A and Group B
GenGroup.data <- data.frame(
GenderGroup = c("Male Group A", "Female Group A", "Male Group B", "Female Group B" ),
n = c(480, 400, 91, 29),
prop = c(48, 40, 9.1, 2.9)
)
GenGroup.data
## GenderGroup n prop
## 1 Male Group A 480 48.0
## 2 Female Group A 400 40.0
## 3 Male Group B 91 9.1
## 4 Female Group B 29 2.9
# Add label position
GenGroup.data <- GenGroup.data %>%
arrange(desc(GenderGroup)) %>%
mutate(lab.ypos = cumsum(prop) - 0.5*prop)
GenGroup.data
## GenderGroup n prop lab.ypos
## 1 Male Group B 91 9.1 4.55
## 2 Male Group A 480 48.0 33.10
## 3 Female Group B 29 2.9 58.55
## 4 Female Group A 400 40.0 80.00
#Vector of colours to change to
GenGroupBarCols <- c("#CD534CFF", "#868686FF", "#EFC000FF", "#0073C2FF")
#Bar chart of all four gender groups
GenGroupBar = ggplot(data=GenGroup.data, aes(x=GenderGroup, y=n)) +
geom_col(fill=GenGroupBarCols) + #Make a column chart
labs(x = "Gender Group", y= "Number of People", size=1) + #Label axis'
geom_text(size=3.5,aes(label=n), position=position_dodge(width=0.9), vjust=-0.25) + #Label the bars
theme(axis.text.x = element_text(size = 8, angle=45, vjust=1.2, hjust=1.1), panel.border = element_rect(linetype = "dashed", fill = NA),
panel.background = element_rect(fill='white'), plot.title = element_text(size= 12, hjust=.5)) + #customize theme
ggtitle("Number of People by Gender in each Group") #Add title
#Vector of colours to change to
GenGroupCols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF")
# Pie chart of all four gender groups
GenGroupPiePlot = ggplot(GenGroup.data, aes(x = "", y = prop, fill = GenderGroup)) +
geom_bar(width = 1, stat = "identity", color = "white") +
coord_polar("y", start = 0)+ #Create the pie shape
geom_text(aes(y = lab.ypos, label = prop), color = "black", size=4)+ #Use labels we created earlier
scale_fill_manual(values = GenGroupCols) + #Customize colours of the pie
theme_void() + #Void theme
theme(plot.title = element_text(size= 12, hjust=-0.4), axis.text.y = element_text(hjust=0.5,size = 4.5)) + #Customize theme elements
ggtitle("Percentage breakdown of Gender Groups") #Add title
# Plot both charts on the same grid
#This is the two plots of all four gender groups
# Plot 3
plot_grid(GenGroupBar, GenGroupPiePlot,nrow = 1, ncol=2)
We can see that the number of males in group A and females in group A is comparable, which is good. However more importantly we can see there are more than three times the number of males in group B (the treatment group) than there are females in group B. This means that group B is heavily biased towards observing how men respond to the new recommendation algorithm. Additionally, identifying any major gender differences in response to the new recommendation algorithm will be difficult to detect due to the small number of females in group B. This is the first bias in the data
GroupA <- filter(df, group=='A')
GroupB <- filter(df, group=='B')
summary(GroupA$hours_watched)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.500 3.487 4.355 4.336 5.250 8.300
summary(GroupB$hours_watched)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.525 3.939 4.860 4.811 5.723 7.930
We can see that on average Group B watches roughly an extra half hour more than group A.
# Plot 4.0
ggplot(data=df, aes(x=age,y=hours_watched, colour=group))+
geom_point(aes(x=age,y=hours_watched))+
geom_smooth(method="lm") + #Add linear regression line
labs(y="Number of hours watched", x="User Age (in years)") + #Label axis'
ggtitle("Number of Hours Watched by User Age per Group") + #Add title
scale_y_continuous(breaks=seq(0, 9, 0.5), sec.axis = dup_axis()) + #Set y axis tick marks from 0 to 9 every 0.5
scale_x_continuous(breaks=seq(0, 56, 2)) + # x axis Tick marks from 0-56, every 2
theme(panel.background = element_rect(fill="#f4edca",color="pink"), #Customize theme elements
plot.background = element_rect("#46dbdf"),
plot.title = element_text(hjust= 0.5),
panel.border = element_rect(linetype = "dashed", fill = NA))
## `geom_smooth()` using formula 'y ~ x'
Here we get our first look at the effects of the A/B test and we can see that on average, Group B (the treated group) watches 45 minutes more than Group A (the control group), which is the desired outcome. The graph also shows us that the number of hours watched is negatively correlated to user age. That is, on average, with an increase in user age comes a decrease in the amount of hours watched, irrespective of group. We can also notice that there are no 22 year olds in Group B, which is how we know the subject groups have not been stratified by age. Now we want to know, does this matter?
# QQ Plot of the population
PopQQPlot <- ggplot(df, aes(sample=age)) +
stat_qq() + #Add QQ stat
stat_qq_line(color = "red") + #Make QQPlot line red
ggtitle("Population age distribution")+ #Add title
theme(plot.title = element_text(size=11, hjust=.5), panel.border = element_rect(linetype = "dashed", fill=NA), plot.background = element_rect("cornflowerblue"))
##Centre title
AQQPlot <- ggplot(GroupA, aes(sample=age)) +
stat_qq() +
stat_qq_line(color = "red") + #Make the QQ Line red
ggtitle("Group A age distribution")+ #Add title
theme(plot.title = element_text(size=11, hjust=.5), panel.border = element_rect(linetype = "dashed", fill=NA), plot.background = element_rect("cornflowerblue"))
BQQPlot <- ggplot(GroupB, aes(sample=age)) +
stat_qq() +
stat_qq_line(color = "red") + #Make the QQ Line red
ggtitle("Group B age distribution")+ #Add title
theme(plot.title = element_text(size=11, hjust=.5), panel.border = element_rect(linetype = "dashed", fill=NA), plot.background = element_rect("cornflowerblue"))
#Centre title
#plot the QQ Plots on the same grid
plot_grid(PopQQPlot, AQQPlot, BQQPlot, nrow = 1, ncol=3)
The above Quantile-Quantile plot (graph 5) shows us that groups A and B follow the same distribution as the population, so the answer to our question, does it matter that there are no 22 year olds in Group B, is no, it does not matter as the age distributions of Group A and Group B closely represent the age distribution for the population. Meaning that the increase in the number of hours watched in Group B is not due to any differences in the ages of the two groups (for example if Group B were much younger than Group A, this would impact the relevance of the difference in the amount of hours watched).
AgeSummary = df %>% group_by(age, group) %>%
summarize(avg_hours = mean(hours_watched), #Mean avg hours for each age and group
SD = sd(hours_watched), #Standard deviation for each age and group
Min = min(hours_watched), #Minimum age for each age and group
Q1 = quantile(hours_watched,probs =.25), #Quartile 1 for each age and group
Median = median(hours_watched), #Median for each age and group
Q3 = quantile(hours_watched,probs = .75), #Quartile 3 for each age and group
Max = max(hours_watched), #Max age for each age and group
IQR = IQR(hours_watched), .groups = "keep") #IQR for each age and group.
#To suppress the warning we add groups = "keep"
AgeSummary #See what the variable looks like
## # A tibble: 75 x 10
## # Groups: age, group [75]
## age group avg_hours SD Min Q1 Median Q3 Max IQR
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 18 A 5.66 0.936 3.73 5.18 5.80 6.32 6.98 1.13
## 2 18 B 6.04 1.18 5.21 5.63 6.04 6.46 6.88 0.835
## 3 19 A 5.60 1.18 3.43 4.95 5.29 6.32 8.3 1.36
## 4 19 B 6.16 0.00707 6.15 6.15 6.16 6.16 6.16 0.00500
## 5 20 A 5.43 1.19 2.06 4.74 5.64 6.15 7.45 1.41
## 6 20 B 3.94 NA 3.94 3.94 3.94 3.94 3.94 0
## 7 21 A 4.91 1.31 1.93 4.28 5.12 5.46 7.4 1.18
## 8 21 B 6.40 0.969 5.72 6.06 6.40 6.75 7.09 0.685
## 9 22 A 5.16 0.944 3.44 4.64 4.96 5.76 7.67 1.12
## 10 23 A 5.46 0.997 3.52 4.74 5.64 5.98 7.52 1.23
## # ... with 65 more rows
# We've already determined that both groups follow a normal distribution for age
#Convert the dates to just a number
DateAsNumber <- df$date
DateNumber <- substr(DateAsNumber, start=1, stop=2)
df['DateNumber'] <- DateNumber
#Make the column numeric
df$DateNumber<- as.numeric(df$DateNumber)
#Plot 6.0
#This graph shows the difference over the month between group A and B
ggplot(data=df, aes(x=DateNumber, y=hours_watched, colour=group))+
geom_point(aes(x=DateNumber, y=hours_watched))+ #Make a scatterplot
geom_smooth(method="lm") + #Add linear regression line
scale_y_continuous(breaks=seq(0, 9, 0.5), sec.axis = dup_axis()) + #Set y axis tick marks from 0 to 9 every 0.5
scale_x_continuous(breaks=seq(0, 32, 2)) + #Set the x axis values from 0 to 32 every 2
labs(y="Number of hours watched", x="Date of the month for July") + #Label the x and y axis'
ggtitle("Number of Hours Watched by Date of the month (July)") + #Give a title
theme(panel.background = element_rect(fill="#f4edca",color="pink"), #Customize the colours
plot.background = element_rect("#FF6666"),
plot.title = element_text(hjust= 0.5), #Centre the title, give a dashed border
panel.border = element_rect(linetype = "dashed", fill = NA))
## `geom_smooth()` using formula 'y ~ x'
This graph (graph 6) highlights another important design flaw in the A/B test; as we can see from our regression lines the treatment group (group B) only contains data from participants on and after the 18th of July. Meaning any natural fluctuations throughout the course of a month cannot be fully observed in the B group. We can also see that users in both groups tend to watch more at the end of the month, the control group (group A) tends to watch about 15 minutes more per day at the end of the month compared to the start of the month. This is about a 5% increase. We can also see that the treatment group (group B) watches an extra half an hour per day more than the control group. This equates to more than a 10% difference.
After18th <- filter(df, DateNumber>=18)
table(After18th$gender, After18th$DateNumber)
##
## 18 19 20 21 22 23 24 25 26 27 28 29 30 31
## F 18 19 9 11 13 12 11 14 11 17 15 16 12 14
## M 14 13 24 21 19 20 22 18 21 15 18 16 20 19
There are 192 females from the 18th onwards and 260 males from the 18th onwards.We know there are only 29 females in group B, so we now know the participants have definitely not been stratified by gender, even after the 18th of the month.
ABAgeVec <-table(After18th$age, After18th$group)
ABAgeVec
##
## A B
## 18 6 2
## 19 14 2
## 20 17 1
## 21 9 2
## 22 6 0
## 23 12 2
## 24 7 2
## 25 9 2
## 26 7 5
## 27 8 3
## 28 11 3
## 29 6 1
## 30 8 2
## 31 15 4
## 32 8 2
## 33 8 3
## 34 13 5
## 35 6 4
## 36 8 5
## 37 7 4
## 38 10 4
## 39 6 2
## 40 8 1
## 41 9 4
## 42 7 2
## 43 8 2
## 44 7 8
## 45 11 7
## 46 7 5
## 47 7 4
## 48 10 2
## 49 9 3
## 50 9 4
## 51 5 3
## 52 12 4
## 53 7 6
## 54 11 2
## 55 4 3
We can notice that there are no 22 year olds in group B, this is an important point when figuring out how subjects were stratified into their groups. They were clearly not stratified by age as there are six 22 year old’s in group A after the 18th (when the study group began).
table(df$demographic)
##
## 1 2 3 4
## 216 268 213 303
table(df$gender, df$demographic)
##
## 1 2 3 4
## F 216 0 213 0
## M 0 268 0 303
We can see that there are a comparable number of participants in each demographic but we can see there are no females in demographic 2 or 4 and no males in demographic 1 or 3.
table(df$group, df$demographic)
##
## 1 2 3 4
## A 203 236 197 244
## B 13 32 16 59
We can also see that participants are not evenly split and thus have not been stratified by demographic.
Now lets run a significance test on the means from the graph above. Our null hypothesis (H0) is that the mean hours watched for Group B is greater than the mean hours watched of Group A.
t.test(SocialMetricSummary$avg_hours~SocialMetricSummary$group, alt="two.sided", conf=0.95, var.eq=F, paired=F)
##
## Welch Two Sample t-test
##
## data: SocialMetricSummary$avg_hours by SocialMetricSummary$group
## t = -1.6565, df = 14.146, p-value = 0.1196
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.8974471 0.1148458
## sample estimates:
## mean in group A mean in group B
## 4.337245 4.728546
Based on the p-value we cannot accept the null hypothesis, the difference between means was not statistically significant enough.
We will now conduct an Analysis of Variance (ANOVA) on the differences among means.
The null nypothesis (H0) is that the means of all groups are equal
aov(df$hours_watched~df$group)
## Call:
## aov(formula = df$hours_watched ~ df$group)
##
## Terms:
## df$group Residuals
## Sum of Squares 23.8009 1751.6242
## Deg. of Freedom 1 998
##
## Residual standard error: 1.324815
## Estimated effects may be unbalanced
summary(aov(df$hours_watched~df$group))
## Df Sum Sq Mean Sq F value Pr(>F)
## df$group 1 23.8 23.801 13.56 0.000243 ***
## Residuals 998 1751.6 1.755
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p value is <0.001, therefore we must accept the null hypothesis.
We will use the following equation to calculate the required minimum sample size: \[ n_{ss} = z_{alpha}^2 \left( \frac{ \sigma }{E} \right)^2 \]
In order to calculate a minimum sample size per group \[n_{ss}\]
We need to define a confidence interval \[z_{alpha}^2\]
We need to estimate what the standard deviation is for hours watched \[{\sigma}\]
And we need to estimate what the effect size between groups A and B will be \[{E}\]
It’s important to observe from the above equation that the larger the effect size, the lower sample size needed per group.
summary(GroupA$hours_watched)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.500 3.487 4.355 4.336 5.250 8.300
summary(GroupB$hours_watched)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.525 3.939 4.860 4.811 5.723 7.930
sd(df$hours_watched) #The standard deviation is 1.333118
## [1] 1.333118
#Mean Hours watched of group B divided by mean hours watched of control group - group A
4.811 / 4.336
## [1] 1.109548
# Group B watches 10.9% more hours than Group A
We can see the standard deviation is 1.333118 and that the estimated effect size (the difference in hours watched between group A and group B) is 10.9%. Taking a 95% confidence level we can calculate the minimum sample size as
# Minimum sample size calculation
z_alpha <- 1.96
effect_est <- 0.109
sd_est <- 1.33
n_ss <- ceiling((z_alpha * sd_est / effect_est)^2)
print(paste('Min sample size', n_ss))
## [1] "Min sample size 572"
The minimum sample size is 572 for EACH group, meaning we do not have enough people in group B to achieve statistically significant results.
# subset only numerical columns
M = cor( df[,c(3:5,8,9)], method = "pearson" )
# Make the correlation plot. Plot 8
corrplot(M, type="upper", # To make diagonal matrix
order="hclust", # to order based on similarity
col=brewer.pal(n=8, name="RdYlBu")) # For the colour scale
The above plot (graph 9) shows us the Pearson’s correlation co-efficient between numeric groups in our data set. Pearson’s correlation co-efficient shows us the relationship (or lack of) between two numeric variables, it shows us the strength and the direction of the relationship. For example a score of +1 would indicate a 100% positive relationship; as variable X increases, so does variable Y. A -1 score would indicate a 100% negative relationship; for example when variable X increases, variable Y decreases. We can see that there is a moderate to strong negative relationship between hours watched and age, which we have already observed in graph 4. We can see that there is a weak positive relationship between the date of the month and hours watched, which we also previously observed in graph 6. We can also see there is a moderate positive relationship between social metric and hours watched, social metric is a categorical variable rather than a discrete or continuous variable. Typically Pearons correlation is only used for discrete and continuous variables, not categorical variables, however as there are 11 categories of social metric we can still get a good distribution, enough to see that hours watched may be affected by the participants social metric. Lets perform a Two Way ANOVA to test if there is a statistically significant interaction between group and social metric.
anova2 <- aov(hours_watched ~ group + social_metric, data=df)
summary(anova2)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 1 23.8 23.80 14.31 0.000165 ***
## social_metric 1 92.9 92.87 55.82 1.73e-13 ***
## Residuals 997 1658.8 1.66
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results of the two way ANOVA (P-value: <0.001) indicate a non-statistically-significant difference between variances of groups. Meaning that there is no statistically significant interaction between group and social metric.
Now lets finish evaluating the numerical variables.
#Plot 10
#Make chart to view the distribution, regression and pearsons correlation between numerical categories
chart.Correlation(df[,c(3:5,8,9)], histogram=TRUE, pch=19)
We can see a number of important things from figure 10.
The pearsons correlation co-efficient of age and hours watched is -0.57, indicating a moderate to strong negative relationship, as we previously mentioned.
The pearsons correlation co-efficient of date and hours watched is only 0.085, indicating there is almost no relationship.
Hours watched is virtually a perfect normal distribution
We’ve already touched on hours watched by social metric
We’ve already touched on the distribution of participants ages
Date number (the date) is virtually perfectly evenly distributed, lets just quickly confirm that:
table(df$DateNumber)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 32 32 32 33 32 32 32 33 32 32 32 33 32 32 32 33 32 32 32 33 32 32 32 33 32 32
## 27 28 29 30 31
## 32 33 32 32 33
# Plot Heatmap
numCat.Vars = 5 # How many categorical variables in the dataset
idxs = c(1,2,4,6,7)
colnames(df)[idxs]
## [1] "date" "gender" "social_metric" "demographic"
## [5] "group"
Mtx = matrix(0, nrow = numCat.Vars, ncol= numCat.Vars)
colnames(Mtx) = colnames(df)[idxs]
rownames(Mtx) = colnames(df)[idxs]
# Create matrix of p values
# Dont worry about the warnings
for(i in 1:numCat.Vars){
x = df[,idxs[i]]
for(j in 1:numCat.Vars){
y = df[,idxs[j]]
# Perform Chi square stat and get the p-value
Mtx[i,j] = chisq.test( table( data.frame(x,y) ) )$p.value
}
}
#
#Heat map of Chi-square Test p-values, Figure 11
heatmap.2(Mtx, scale ="none", trace ="none",
col = brewer.pal(n=8, name="RdYlBu"), #Customize colours, add title, adjust cex for row and column
cexRow=0.75, cexCol = 0.75)
title(main = "Heatmap of Chi-square Test p-values",cex.main=0.95)
This heat map shows the results of the Chi-Square Test of Independence, which determines whether there is an association between categorical variables. The red squares in this heat map indicate there is a relationship between two categorical variables.
As alluded to earlier, there is a relationship between date and what group the participant was in (group A or group B)
Similarly there is a relationship between gender and date
There is a relationship between gender and demographic
There is no relationship between group and social metric
The size of the treatment group (group B) should be greater than the control group (group A)
Sample Bias 1: More than three quarters of the treatment group is male, meaning that group B is heavily biased towards observing how men respond to the new recommendation algorithm. And making identifying any major gender differences in response to the new recommendation algorithm difficult to detect due to the small number of females in group B
They did not stratify the groups participants for age (minor flaw)
Sample Bias 2: They did not start the intervention at the beginning of a month and consequently do not have a full months data for group B. They did not stratify the participants by date, despite having the even distribution for date to do this with.
There were not enough participants in Group B for the results of the A/B test to achieve statistical significance
Demographic is not split evenly and thus was not a block used to stratify people
Individual participants trends weren’t tracked
More data is needed before we can say the differences observed were more than chance
Social metric is fairly evenly split and was likely a block used to stratify subjects
Subject age was normally distributed in both Group A and Group B
Overall this A/B Test has a lot of issues both with design and with its results. Future A/B Tests should be completely re-designed following the recommendations that I have outlined in this project.