A/B Testing for ShoeFly.com
Our favorite online shoe store, ShoeFly.com, is performing an A/B
test. They have two different versions of an ad, which they have placed
in emails as well as in banner ads on Facebook, Twitter, and Google.
They want to know how the two ads are performing on each of the
different platforms on each day of the week. Help them analyze the data
using aggregate measures.
Analyzing Ad Sources
1. Inspect the first few rows of ad_clicks using head(). What
variables are stored in the columns of the data frame?
# load packages
library(readr)
library(dplyr)
# load ad clicks data
ad_clicks <- read_csv("ad_clicks.csv")
# inspect ad_clicks here:
head(ad_clicks)
ad_clicks
2. Your manager wants to know which ad platform is getting the most
views.
How many views (i.e., rows of the data frame) came from each
utm_source? Group ad_clicks by utm_source and count the number of rows
in each group. Save your result to views_by_utm, and view it.
# define views_by_utm here:
views_by_utm <- ad_clicks %>%
  group_by(utm_source) %>%
  summarize(count = n())
views_by_utm
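dplyr also provides count(), which is shorthand for group_by() followed by summarize(n()). The sketch below uses a small made-up data frame (toy_clicks is hypothetical, standing in for ad_clicks.csv) to check that both forms give the same counts; it assumes dplyr 1.0 or later, where count() accepts the name argument.

```r
library(dplyr)

# Hypothetical toy stand-in for ad_clicks: only utm_source matters here
toy_clicks <- data.frame(
  utm_source = c("email", "google", "google", "facebook", "google", "email"),
  stringsAsFactors = FALSE
)

# The group_by() + summarize(n()) pattern used in step 2
views_long <- toy_clicks %>%
  group_by(utm_source) %>%
  summarize(count = n())

# count() groups and counts rows in one call;
# name = "count" keeps the column name the same as in views_long
views_short <- toy_clicks %>%
  count(utm_source, name = "count")

views_long
views_short
```

Both results list email, facebook, and google with 2, 1, and 3 views respectively.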
3. We want to know the percentage of people who clicked on ads from
each utm_source. Let’s start by finding the number of clicks per
utm_source.
Group ad_clicks by utm_source and ad_clicked and count the number of
rows in each group, naming the resulting column count. Save your answer
to the variable clicks_by_utm, and view it.
Hint:
To group together the rows of ad_clicks with the same values for
utm_source and ad_clicked, use group_by():
clicks_by_utm <- ad_clicks %>%
  group_by(utm_source, ad_clicked)
To find the number of rows in each utm_source/ad_clicked group, pipe the
result of the grouping into summarize() and use n():
clicks_by_utm <- ad_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n())
# define clicks_by_utm here:
clicks_by_utm <- ad_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n())
clicks_by_utm
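Because summarize() removes only the last grouping variable, clicks_by_utm comes back still grouped by utm_source (dplyr prints a message saying so). A minimal sketch on made-up rows (toy_clicks is hypothetical), assuming dplyr 1.0 or later for the .groups argument, which is spelled out here to make that default visible:

```r
library(dplyr)

# Hypothetical toy stand-in for ad_clicks with the two grouping columns
toy_clicks <- data.frame(
  utm_source = c("email", "email", "google", "google", "google"),
  ad_clicked = c(TRUE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Same pipeline as step 3; "drop_last" is summarize()'s default here
toy_clicks_by_utm <- toy_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n(), .groups = "drop_last")

# Only ad_clicked was removed from the grouping
group_vars(toy_clicks_by_utm)
```

So the group_by(utm_source) call in step 4 does not change the grouping; it is still worth writing because it states the intent explicitly instead of relying on the default.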
4. To find the percentage of people who clicked on ads from each
utm_source, we need to add a new column to clicks_by_utm that stores the
count of clicks (or the count of not clicking) divided by the total
number of ad views for that utm_source.
Group clicks_by_utm by utm_source and pipe the result into mutate(),
creating a new column percentage that is defined as
count/sum(count).
Save your result to percentage_by_utm and view it. Open the hint for
a description of the percentage_by_utm data frame.
Hint:
To group together the rows of clicks_by_utm with the same value for
utm_source, use group_by():
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source)
To add a column that stores the percentage of users who did or did
not click on the ad, pipe the result of the grouping into mutate() and
create a new column percentage defined as count/sum(count):
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source) %>%
  mutate(percentage = count/sum(count))
1. count is the number of views from that utm_source that did (or did
not) end in a click.
2. sum(count) is computed within each utm_source group, so it is the
total number of views from that source.
percentage_by_utm will now contain the click and non-click rates for
each utm_source/ad_clicked combination.
# define percentage_by_utm here:
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source) %>%
  mutate(percentage = count/sum(count))
percentage_by_utm
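A worked example on made-up counts (toy_clicks_by_utm is hypothetical, shaped like clicks_by_utm) shows what the grouping does: sum(count) is evaluated separately inside each utm_source group.

```r
library(dplyr)

# Hypothetical toy counts in the same shape as clicks_by_utm
toy_clicks_by_utm <- data.frame(
  utm_source = c("email", "email", "google", "google"),
  ad_clicked = c(FALSE, TRUE, FALSE, TRUE),
  count      = c(3L, 1L, 3L, 3L),
  stringsAsFactors = FALSE
)

toy_percentage_by_utm <- toy_clicks_by_utm %>%
  group_by(utm_source) %>%
  # sum(count) is 4 inside the email group and 6 inside the google group
  mutate(percentage = count / sum(count))

toy_percentage_by_utm
```

The percentages come out as 0.75 and 0.25 for email and 0.5 and 0.5 for google. Without group_by(utm_source), sum(count) would be the grand total (10 here), and each percentage would be a share of all views rather than of that source's views.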
5. Filter percentage_by_utm to remove all rows that do not describe
clicked ads, and view it.
Did the click rate differ between sources?
Hint:
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)
percentage_by_utm
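Because ad_clicked is logical, the click rate can also be computed directly from the raw rows: mean() treats TRUE as 1 and FALSE as 0. This is an alternative to the count-then-filter approach above, not the exercise's method, sketched on made-up rows (toy_clicks is hypothetical):

```r
library(dplyr)

# Hypothetical raw rows in the same shape as ad_clicks
toy_clicks <- data.frame(
  utm_source = c("email", "email", "email", "email", "google", "google"),
  ad_clicked = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# The mean of a logical column is the fraction of TRUE values,
# i.e. the click rate for each source
click_rate_by_utm <- toy_clicks %>%
  group_by(utm_source) %>%
  summarize(click_rate = mean(ad_clicked))

click_rate_by_utm
```

One difference: a source with no clicks at all gets a click_rate of 0 here, whereas in the filtered table it has no ad_clicked == TRUE row and simply disappears.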
Analyzing an A/B Test
6. The column experimental_group tells us whether the user was shown
ad A or ad B.
Were approximately the same number of people shown both ads? Group
ad_clicks by experimental_group and count the rows, saving your result
to experiment_split.
View experiment_split to see how users were split across the
experiment groups!
# define experiment_split here:
experiment_split <- ad_clicks %>%
  group_by(experimental_group) %>%
  summarize(count = n())
experiment_split
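To judge whether the split is roughly even, it can help to turn the counts into shares. A sketch on made-up assignments (toy_clicks is hypothetical); after summarize() the only grouping variable is dropped, so sum(count) in the following mutate() is the total across both groups:

```r
library(dplyr)

# Hypothetical assignments: 5 users in group A, 3 in group B
toy_clicks <- data.frame(
  experimental_group = c("A", "B", "A", "B", "A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Count users per group, then express each count as a share of all users
toy_split <- toy_clicks %>%
  group_by(experimental_group) %>%
  summarize(count = n()) %>%
  mutate(share = count / sum(count))

toy_split
```

Here group A gets 0.625 of the users and group B 0.375; shares close to 0.5 each would indicate an even split.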
7. Using group_by() and the columns ad_clicked and experimental_group,
check whether more users clicked on ad A or on ad B. Save your result
to clicks_by_experiment, and view it.
# define clicks_by_experiment here:
clicks_by_experiment <- ad_clicks %>%
  group_by(ad_clicked, experimental_group) %>%
  summarize(count = n())
clicks_by_experiment
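Raw click counts favor whichever group saw the ad more often, so dividing clicks by views gives a fairer comparison. This sketch uses made-up rows (toy_clicks is hypothetical) in which group A gets more clicks than group B but the same click rate:

```r
library(dplyr)

# Hypothetical rows: A has 4 views with 2 clicks, B has 2 views with 1 click
toy_clicks <- data.frame(
  experimental_group = c("A", "A", "A", "A", "B", "B"),
  ad_clicked         = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

toy_rates <- toy_clicks %>%
  group_by(experimental_group) %>%
  summarize(
    clicks = sum(ad_clicked),   # number of TRUE values in the group
    views = n(),                # number of rows in the group
    click_rate = clicks / views # later arguments can use earlier ones
  )

toy_rates
```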
8. The Product Manager for the A/B test thinks that the click rate
might vary by day of the week.
Start by creating two data frames, a_clicks and b_clicks, which
contain only the results for group A and group B, respectively.
# define a_clicks here:
a_clicks <- ad_clicks %>%
  filter(experimental_group == 'A')
a_clicks
# define b_clicks here:
b_clicks <- ad_clicks %>%
  filter(experimental_group == 'B')
b_clicks
9. For each data frame (a_clicks and b_clicks), find the number of users
who clicked on the ad (and who did not click on the ad) on each day. Save
the results to a_clicks_by_day and b_clicks_by_day, and view them.
# define a_clicks_by_day here:
a_clicks_by_day <- a_clicks %>%
  group_by(day, ad_clicked) %>%
  summarize(count = n())
a_clicks_by_day
# define b_clicks_by_day here:
b_clicks_by_day <- b_clicks %>%
  group_by(day, ad_clicked) %>%
  summarize(count = n())
b_clicks_by_day
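Steps 8 and 9 build a separate table for each ad. An equivalent approach, shown here as an alternative rather than the exercise's method, adds experimental_group to the grouping so both ads land in one table. The rows and day labels below are made up (toy_clicks is hypothetical, and the real day column may be formatted differently):

```r
library(dplyr)

# Hypothetical rows with the columns used in steps 8 and 9
toy_clicks <- data.frame(
  experimental_group = c("A", "A", "B", "B", "A", "B"),
  day                = c("Monday", "Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  ad_clicked         = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# One pass over the data instead of filtering into a_clicks and b_clicks;
# .groups = "drop" returns an ungrouped result
toy_clicks_by_day <- toy_clicks %>%
  group_by(experimental_group, day, ad_clicked) %>%
  summarize(count = n(), .groups = "drop")

toy_clicks_by_day
```

The result has one row per ad/day/ad_clicked combination that actually occurs, so combinations with no rows (here, group A never failed to click on Tuesday) are simply absent.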
10. To find the percentage of people who clicked on the ads on each
day, we need to add a new column to a_clicks_by_day and b_clicks_by_day
that stores the count of clicks (or the count of not clicking) divided
by the total number of ad views on that day.
Group a_clicks_by_day and b_clicks_by_day by day and pipe the result
into mutate(), creating a new column percentage that is defined as
count/sum(count).
Save your results to a_percentage_by_day and b_percentage_by_day and
view them. Each data frame has up to two rows per day (one for clicked,
one for not clicked), and the percentage values within each day sum to 1.
# define a_percentage_by_day here:
a_percentage_by_day <- a_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count))
a_percentage_by_day
# define b_percentage_by_day here:
b_percentage_by_day <- b_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count))
b_percentage_by_day
11. Filter a_percentage_by_day and b_percentage_by_day to remove all
rows that do not describe clicked ads, and view them.
Was there a difference in click rates between the two ads on each day?
What happened over the course of the week?
Do you recommend that ShoeFly.com use ad A or ad B?
# define a_percentage_by_day here:
a_percentage_by_day <- a_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)
a_percentage_by_day
# define b_percentage_by_day here:
b_percentage_by_day <- b_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)
b_percentage_by_day
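To answer the comparison questions, it can help to put both ads' daily click rates side by side. A sketch with made-up rates (toy_a and toy_b are hypothetical stand-ins for the filtered a_percentage_by_day and b_percentage_by_day, and the numbers are not real results): inner_join() matches rows by day, and suffix distinguishes the two percentage columns.

```r
library(dplyr)

# Hypothetical per-day click rates, one row per day
toy_a <- data.frame(day = c("Monday", "Tuesday"), percentage = c(0.40, 0.30),
                    stringsAsFactors = FALSE)
toy_b <- data.frame(day = c("Monday", "Tuesday"), percentage = c(0.35, 0.36),
                    stringsAsFactors = FALSE)

# Join on day so each row holds both ads' click rates for that day,
# then compute how far ad A is ahead of (or behind) ad B
toy_comparison <- toy_a %>%
  inner_join(toy_b, by = "day", suffix = c("_a", "_b")) %>%
  mutate(difference = percentage_a - percentage_b)

toy_comparison
```

With the real tables, you would likely ungroup() and select(day, percentage) first, so the ad_clicked and count columns are not duplicated with suffixes.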
Hint:
To filter a_percentage_by_day to only include rows where the ad was
clicked, add filter() with the condition ad_clicked == TRUE to the end
of the pipeline from the previous task:
a_percentage_by_day <- a_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)
