Bulk importing data from external sources into R is a convenient way to gather and analyze experimental data for A/B testing. In this post I'll show how I use SQL and R to analyze an experiment on the placement of a button on a webpage.
Assume you'd like to find the best location for a button on a webpage, and there are four possible locations: {‘top left’, ‘bottom left’, ‘top right’, ‘bottom right’}. You randomly assign each user to one of the four versions of the page and record how many times they click the button over a one-week period. Your goal is to find out whether the mean number of clicks differs meaningfully between any of the button locations.
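One way to do the random assignment is sketched below in base R. It is a hypothetical illustration, not the author's procedure: the 16 user IDs are made up, and it assumes you want the same number of users in every location (balanced groups).

```r
# Hypothetical sketch: balanced random assignment of users to button locations.
# The user IDs and the total of 16 users are assumptions for illustration.
set.seed(42)  # make the assignment reproducible

locations <- c("top left", "bottom left", "top right", "bottom right")
user_ids <- sprintf("user_%02d", 1:16)

# rep() creates an equal number of slots per location; sample() shuffles them
assignment <- data.frame(
  user_id = user_ids,
  condition = sample(rep(locations, length.out = length(user_ids)))
)

# Each location should receive 4 of the 16 users
table(assignment$condition)
```

Shuffling a balanced set of labels, rather than drawing each user's location independently, guarantees equal group sizes while still randomizing who sees which page.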
I created a new table (called Click) in the database to store the results of the experiment and populated it with the recorded values; from there we can begin.
```r
library(DBI)  # dbGetQuery(); `con` is an open connection to the database

# Pull all values from the experiment table
ClicksDF <- dbGetQuery(con, "SELECT * FROM Click")

# Inspect the structure of the data frame, then store condition as a factor
str(ClicksDF)
ClicksDF$condition <- as.factor(ClicksDF$condition)
```
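If you want to try the query without access to the original database, the sketch below assumes an in-memory SQLite database (through the DBI and RSQLite packages) standing in for the real server. The contents of the Click table here are hypothetical values invented for illustration, not the experiment's data.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical Click table: one row per user, 4 users per location
dbWriteTable(con, "Click", data.frame(
  condition = rep(c("bottom left", "bottom right", "top left", "top right"), each = 4),
  clicks = c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
))

# Same steps as above: pull everything, then convert condition to a factor
ClicksDF <- dbGetQuery(con, "SELECT * FROM Click")
ClicksDF$condition <- as.factor(ClicksDF$condition)
str(ClicksDF)

dbDisconnect(con)
```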
Next, let's plot the distribution of clicks for each button location to see whether the number of clicks differs at all by location.
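```r
library(ggplot2)

# Box plot of clicks by condition (boxes show medians and quartiles),
# with each condition's mean overlaid as an X
Clicksplot <- ggplot(ClicksDF, aes(x = condition, y = clicks)) +
  geom_boxplot(color = "red") +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 3) +
  labs(x = "Condition", y = "Clicks") +
  theme(panel.background = element_rect(fill = "darkgrey"))
Clicksplot
```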
From the plot there appears to be a visual difference in the number of clicks by button location; in particular, the ‘top right’ location seems to stand apart from all the others. This would be useful if we were choosing the location of a “buy” button, a “checkout” button, or any other button that ties directly to revenue. The mean number of clicks for each location is calculated below.
```r
# Calculate the mean number of clicks by condition
ConditionMean <- tapply(ClicksDF$clicks, ClicksDF$condition, mean)
ConditionMean
```
| | Top Left | Top Right | Bottom Left | Bottom Right |
|---|---|---|---|---|
| Mean clicks | 11.00 | 16.50 | 13.25 | 12.00 |
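Since the data already live in a database, the same per-condition means could also be computed in SQL before anything reaches R. A minimal sketch, again assuming an in-memory SQLite database with hypothetical click counts; on MySQL, where CONDITION is a reserved word, the column name would need backticks.

```r
library(DBI)

# Assumption: in-memory SQLite with a hypothetical Click table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "Click", data.frame(
  condition = rep(c("bottom left", "bottom right", "top left", "top right"), each = 4),
  clicks = c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
))

# Let the database do the aggregation: group size and mean clicks per location
condition_means <- dbGetQuery(con, "
  SELECT condition, COUNT(*) AS n, AVG(clicks) AS mean_clicks
  FROM Click
  GROUP BY condition
  ORDER BY condition
")
dbDisconnect(con)

condition_means
```

For a table this small it makes no difference, but pushing the aggregation into SQL is handy when the raw table is too large to pull into R comfortably.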
What we need to determine now is whether the higher mean for the top right location could plausibly be due to chance: if button location had no effect at all, would we see differences this large less than 5% of the time? This part of the analysis is where I particularly enjoy working in R.
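To make “due to chance” concrete, here is a sketch of a permutation test, which is a different technique from the ANOVA used in the steps below: shuffle the location labels many times, recompute the F statistic for each shuffle, and see how often a shuffle produces an F at least as large as the observed one. The click counts are hypothetical, and 1,000 shuffles is an arbitrary choice.

```r
# Hypothetical clicks for 16 users, 4 per location (not the article's data)
clicks <- c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
condition <- factor(rep(c("bottom left", "bottom right", "top left", "top right"), each = 4))

# One-way ANOVA F statistic for a given labelling of the observations
f_stat <- function(y, g) summary(aov(y ~ g))[[1]][["F value"]][1]

observed_f <- f_stat(clicks, condition)

# Each shuffle imitates a world where location has no effect,
# so any spread between the group means arises by chance alone
set.seed(1)
perm_f <- replicate(1000, f_stat(clicks, sample(condition)))

# Share of shuffles with an F at least as large as the observed one
perm_p <- mean(perm_f >= observed_f)
perm_p
```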
Step 1: A one-way ANOVA compares the variability between the condition means with the variability of clicks within each condition, and it assumes that the within-condition variance is roughly the same (homogeneous) for every condition. If that assumption holds, a large spread between the means can reasonably be attributed to the experiment (the placement of the button) rather than to some conditions simply being noisier than others. To check the assumption we run Levene’s Test for Homogeneity of Variance:
```r
library(car)  # provides leveneTest()

# Check for homogeneity of variance across conditions
leveneTest(clicks ~ condition, data = ClicksDF, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
##       Df F value Pr(>F)
## group  3  1.6538 0.2294
##       12
```
The null hypothesis for Levene’s test is that the variance is the same across all conditions. The p value here (0.2294) is well above .05, so there is not enough evidence to reject the null hypothesis, and it is reasonable to proceed under the assumption that the variances are homogeneous. With only four users per condition, the test has limited power to detect unequal variances, so treat this as a sanity check rather than a proof.
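Levene’s test with center = mean is itself a one-way ANOVA on each observation’s absolute distance from its group mean. The base R sketch below reproduces that calculation on hypothetical click counts; with the car package installed, leveneTest(clicks ~ condition, center = mean) on the same vectors should report the same F value.

```r
# Hypothetical clicks, 4 users per location (not the article's data)
clicks <- c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
condition <- factor(rep(c("bottom left", "bottom right", "top left", "top right"), each = 4))

# Step 1: absolute deviation of each observation from its own group's mean
abs_dev <- abs(clicks - ave(clicks, condition, FUN = mean))

# Step 2: an ordinary one-way ANOVA on those deviations.
# A large F would mean some groups are systematically more spread out.
levene_table <- anova(lm(abs_dev ~ condition))
levene_table
```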
Step 2: Now we test whether the differences between the condition means are significant. To do this we use R’s aov function to run a one-way ANOVA:
```r
# Run a one-way ANOVA to assess whether the mean differences are significant
ClickAnova <- aov(ClicksDF$clicks ~ ClicksDF$condition, ClicksDF)
summary(ClickAnova)
##                    Df Sum Sq Mean Sq F value Pr(>F)
## ClicksDF$condition  3  68.69  22.896   8.654 0.0025 **
## Residuals          12  31.75   2.646
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Two things here are good signs for our experiment: 1) the F statistic is relatively large (8.654), and 2) its p value (0.0025) is below .05. If button location had no effect, differences between means this large would be unlikely, so at least one condition’s mean appears to differ from the others. However, the ANOVA does not tell us which comparison(s) of conditions are significant.
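As a check on the output, the F value can be rebuilt from the Sum Sq and Df columns: each sum of squares is divided by its degrees of freedom to get a mean square, and F is the ratio of the two mean squares. With k = 4 conditions and N = 16 users, the degrees of freedom are 3 and 12.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
  = \frac{68.69 / 3}{31.75 / 12}
  \approx \frac{22.896}{2.646}
  \approx 8.65
```

The Pr(>F) column is the upper-tail probability of this F value under the F distribution with (3, 12) degrees of freedom, which in R is pf(8.654, 3, 12, lower.tail = FALSE).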
Step 3: Ideally, the difference in the mean number of clicks between the “top right” condition and every other condition would be significant. I think of the comparisons like matchups in sports: we pit every pair of conditions against each other (“top right” vs “top left”, for example) and compare their means. Where a difference is significant, we can say that for that “matchup” the experiment produced a real difference.
Because we are now running six comparisons instead of one, we also need to correct the p values for the familywise error rate: the more tests we run, the more likely it is that at least one of them comes out “significant” by chance alone. (I won’t go too far into this, but for those interested you can read more here: FamilyWise.) I am going to use a very conservative correction called Bonferroni’s method and a less stringent one called Tukey’s method (Tukey’s Honest Significant Difference) to compare the means of our “matchups”. The results of these tests are below.
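With four conditions there are six matchups. The sketch below lists them with combn() and shows that Bonferroni’s method multiplies each unadjusted p value by the number of comparisons (capping the result at 1), which is what pairwise.t.test does when p.adjust.method = "bonferroni". The click counts are hypothetical.

```r
# Hypothetical clicks, 4 users per location (not the article's data)
clicks <- c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
condition <- factor(rep(c("bottom left", "bottom right", "top left", "top right"), each = 4))

# Every "matchup": all unordered pairs of the four locations
matchups <- combn(levels(condition), 2)
ncol(matchups)  # 6 comparisons

# Unadjusted pairwise t tests using the pooled SD (the default)
raw <- pairwise.t.test(clicks, condition, p.adjust.method = "none")$p.value

# Bonferroni by hand: multiply by the number of comparisons, cap at 1.
# raw goes first in pmin() so the matrix layout is kept.
bonf_manual <- pmin(raw * ncol(matchups), 1)

bonf_builtin <- pairwise.t.test(clicks, condition, p.adjust.method = "bonferroni")$p.value
all.equal(bonf_manual, bonf_builtin)
```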
```r
# Pairwise t tests (pooled SD) with Bonferroni-adjusted p values
pairwise.t.test(ClicksDF$clicks, ClicksDF$condition, p.adjust.method = "bonferroni")
##
##  Pairwise comparisons using t tests with pooled SD
##
## data:  ClicksDF$clicks and ClicksDF$condition
##
##              bottom left bottom right top left
## bottom right 1.0000      -            -
## top left     0.4447      1.0000       -
## top right    0.0918      0.0124       0.0027
##
## P value adjustment method: bonferroni
```
```r
# Calculate TukeyHSD as a post hoc test to correct for the familywise error rate
TukeyClicks <- TukeyHSD(ClickAnova)
TukeyClicks
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = ClicksDF$clicks ~ ClicksDF$condition, data = ClicksDF)
##
## $`ClicksDF$condition`
##                           diff        lwr      upr     p adj
## bottom right-bottom left -1.25 -4.6647741 2.164774 0.7037830
## top left-bottom left     -2.25 -5.6647741 1.164774 0.2569818
## top right-bottom left     3.25 -0.1647741 6.664774 0.0639487
## top left-bottom right    -1.00 -4.4147741 2.414774 0.8203442
## top right-bottom right    4.50  1.0852259 7.914774 0.0096208
## top right-top left        5.50  2.0852259 8.914774 0.0021795
```
From the two tests above, the “top right” condition differs significantly from the “top left” condition (Bonferroni p = 0.0027, Tukey p = 0.0022) and from the “bottom right” condition (0.0124 and 0.0096), but not from the “bottom left” condition (0.0918 and 0.0639). Both outputs list corrected p values, and any matchup with a corrected p value below .05 is considered a significant difference.
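Rather than reading the significant matchups off the printout, they can be filtered from the TukeyHSD result, which stores one matrix per model term with diff, lwr, upr and p adj columns. A sketch on hypothetical click counts; it also checks that each significant matchup’s 95% family-wise interval excludes zero, which is the confidence-interval view of the same decision.

```r
# Hypothetical clicks, 4 users per location (not the article's data)
clicks <- c(12, 14, 13, 15, 11, 13, 12, 12, 10, 12, 11, 11, 15, 18, 16, 17)
condition <- factor(rep(c("bottom left", "bottom right", "top left", "top right"), each = 4))

tukey <- TukeyHSD(aov(clicks ~ condition))
comparisons <- tukey$condition  # matrix with columns diff, lwr, upr, p adj

# Keep the matchups whose family-wise adjusted p value is below .05
significant <- comparisons[comparisons[, "p adj"] < 0.05, , drop = FALSE]
significant

# Same decision seen through the intervals: none of them contains zero
all(significant[, "lwr"] > 0 | significant[, "upr"] < 0)
```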
Step 4: Let’s step back from the statistics for a moment to assess what we set out to do and what we have achieved.
We first began by creating an experiment to find the optimal location for a button on a webpage given four distinct possible locations.
We collected data from the experiment and stored it in a database so that we had a central location for it. We then pulled the data directly into R, visualized it, and analyzed it.
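The analysis leads to the following conclusion: of the options available, the “top right” location looks like the best place for the button. It had the highest mean number of clicks (16.5), and the data suggest it performs significantly better than the “top left” and “bottom right” locations. We can’t conclude that it is significantly better than the “bottom left” location: the observed difference of 3.25 clicks was not significant after correction, and Tukey’s 95% interval for it (-0.16 to 6.66) includes zero. A non-significant result doesn’t show that the two locations perform the same, though; with only four users per condition, more data would be needed to settle that matchup.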
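To get a rough idea of how much more data the “top right” vs “bottom left” matchup would need, base R’s power.t.test can solve for the sample size per group. This is a sketch under stated assumptions: it treats the ANOVA’s residual mean square (2.646) as the within-group variance, treats the observed 3.25-click gap as the true difference, targets 80% power, and divides .05 by 6 to mimic the Bonferroni correction. It plans a simple two-sample t test, so it only approximates the pooled-SD comparisons above.

```r
# Assumed inputs, taken from the outputs above:
#   delta = 3.25        observed top right minus bottom left difference in mean clicks
#   sd = sqrt(2.646)    square root of the ANOVA residual mean square
plan <- power.t.test(
  delta = 3.25,
  sd = sqrt(2.646),
  sig.level = 0.05 / 6,  # Bonferroni-style threshold for six comparisons
  power = 0.80
)

# power.t.test returns a fractional n; round up to whole users per location
users_per_location <- ceiling(plan$n)
users_per_location
```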