---
title: "A/B Testing for ShoeFly.com"
author: "Annabel Kuo"
date: "`r format(Sys.time(), '%Y-%m-%d %H:%M')`"
output: html_notebook
---

# A/B Testing for ShoeFly.com
Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads perform on each platform and on each day of the week. Help them analyze the data using aggregate measures.

# Analyzing Ad Sources
1. Inspect the first few rows of ad_clicks using head(). What variables are stored in the columns of the data frame?


```{r message = FALSE, error=TRUE}
# load packages
library(readr)
library(dplyr)
```

```{r message = FALSE, error=TRUE}
# load ad clicks data
ad_clicks <- read_csv("ad_clicks.csv")
```

```{r error=TRUE}
# inspect ad_clicks here:
head(ad_clicks)
ad_clicks
```

2. Your manager wants to know which ad platform is getting the most views.

How many views (i.e., rows of the data frame) came from each utm_source? Group ad_clicks by utm_source and count the number of rows in each group. Save your result to views_by_utm, and view it.

```{r error=TRUE}
# define views_by_utm here:
views_by_utm <- ad_clicks %>%
  group_by(utm_source) %>%
  summarize(count = n())

views_by_utm
```
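dplyr also provides count(), which wraps the group_by() + summarize(n()) pattern in a single call. The sketch below runs both forms on a small hypothetical tibble (toy_views is made up for illustration and is not read from ad_clicks.csv). `name = "count"` keeps the column name used above, and `sort = TRUE` lists the platform with the most views first, which answers the manager's question directly.

```{r}
library(dplyr)

# Hypothetical toy data: five ad views, one row per view (not the real ad_clicks.csv)
toy_views <- tibble(
  utm_source = c("email", "google", "google", "facebook", "google")
)

# Long form, as in step 2
toy_long <- toy_views %>%
  group_by(utm_source) %>%
  summarize(count = n())

# Shorthand: count() groups, counts, and returns an ungrouped result;
# sort = TRUE puts the most-viewed platform on top
toy_short <- toy_views %>%
  count(utm_source, name = "count", sort = TRUE)

toy_short
```

Both forms produce the same counts; count() only differs in row order here because of `sort = TRUE`.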


3. We want to know the percentage of people who clicked on ads from each utm_source. Let’s start by finding the number of clicks per utm_source.

Group ad_clicks by utm_source and ad_clicked and count the number of rows in each group, naming the resulting column count. Save your answer to the variable clicks_by_utm, and view it.

*Hint:*

To group together the rows of ad_clicks with the same values for utm_source and ad_clicked, use group_by():

    clicks_by_utm <- ad_clicks %>%
      group_by(utm_source, ad_clicked)

To find the number of rows in each grouping by utm_source and ad_clicked, pipe the result from the grouping into summarize() and use n():

    clicks_by_utm <- ad_clicks %>%
      group_by(utm_source, ad_clicked) %>%
      summarize(count = n())



```{r error=TRUE}
# define clicks_by_utm here:
clicks_by_utm <- ad_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n())

clicks_by_utm
```
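Grouping by two columns has a side effect that matters for the next step: summarize() removes only the last grouping variable, so clicks_by_utm is still grouped by utm_source. dplyr 1.0 and later also prints a message saying so. The sketch below checks this with group_vars() on hypothetical toy data, and shows that passing `.groups = "drop"` to summarize() returns an ungrouped result instead.

```{r}
library(dplyr)

# Hypothetical toy data (not ad_clicks.csv): one row per ad view
toy_clicks <- tibble(
  utm_source = c("email", "email", "google", "google", "google"),
  ad_clicked = c(TRUE, FALSE, TRUE, FALSE, FALSE)
)

toy_clicks_by_utm <- toy_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n())

# Only the last grouping variable (ad_clicked) was dropped
group_vars(toy_clicks_by_utm)

# Explicitly request an ungrouped result
toy_ungrouped <- toy_clicks %>%
  group_by(utm_source, ad_clicked) %>%
  summarize(count = n(), .groups = "drop")

group_vars(toy_ungrouped)
```

Because of this, regrouping by utm_source in step 4 does not actually change the grouping. It simply makes the intent explicit.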

4. To find the percentage of people who clicked on ads from each utm_source, we need to add a new column to clicks_by_utm that stores the count of clicks (or of non-clicks) divided by the total number of ad views from that source.

Group clicks_by_utm by utm_source and pipe the result into mutate(), creating a new column percentage that is defined as count/sum(count).

Save your result to percentage_by_utm and view it. The hint below describes the percentage_by_utm data frame.

*Hint:*

To group together the rows of clicks_by_utm with the same value for utm_source, use group_by():

    percentage_by_utm <- clicks_by_utm %>%
      group_by(utm_source)

To add a column that stores the percentage of users who did or did not click on the ad, pipe the result from the grouping into mutate() and create a new column percentage defined as count/sum(count):

    percentage_by_utm <- clicks_by_utm %>%
      group_by(utm_source) %>%
      mutate(percentage = count/sum(count))

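1. count is the number of users from a given utm_source who did (or did not) click on the ad.

2. sum(count), computed within each utm_source group, is the total number of users from that source who saw the ad.

percentage_by_utm will now contain the click and non-click rates, as proportions between 0 and 1, for each utm_source/ad_clicked combination.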

```{r error=TRUE}
# define percentage_by_utm here:
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source) %>%
  mutate(percentage = count/sum(count))

percentage_by_utm
```
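The grouping is what makes sum(count) mean "views from this source" rather than "all views". The sketch below uses made-up counts in the same shape as clicks_by_utm (the numbers are hypothetical, not real results) and compares the grouped calculation with an ungrouped one.

```{r}
library(dplyr)

# Hypothetical counts shaped like clicks_by_utm (not real results)
toy_counts <- tibble(
  utm_source = c("email", "email", "google", "google"),
  ad_clicked = c(FALSE, TRUE, FALSE, TRUE),
  count      = c(3L, 1L, 6L, 4L)
)

# Grouped: sum(count) is 4 for email and 10 for google,
# so each source's two rows add up to 1
toy_grouped <- toy_counts %>%
  group_by(utm_source) %>%
  mutate(percentage = count / sum(count))

# Ungrouped: sum(count) is 14 for every row, giving shares of all views instead
toy_ungrouped_share <- toy_counts %>%
  mutate(percentage = count / sum(count))

toy_grouped
toy_ungrouped_share
```

In the grouped version, email's click rate is 1/4 = 0.25 and google's is 4/10 = 0.4. The ungrouped version would misleadingly report google's clicks as 4/14 of all views.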

5. Filter percentage_by_utm to remove all rows that do not describe clicked ads, and view it.

Was there a difference in click rates for each source?

*Hint:*

```{r}
percentage_by_utm <- clicks_by_utm %>%
  group_by(utm_source) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)

percentage_by_utm
```
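Filtering on ad_clicked == TRUE has one blind spot: a source with no clicks at all has no TRUE row, so it silently disappears from the table. A minimal alternative takes mean() of ad_clicked for each source, because the mean of a logical vector is the fraction of TRUE values. This assumes ad_clicked is a logical column, as the == TRUE comparisons above suggest. The tibble below is hypothetical toy data.

```{r}
library(dplyr)

# Hypothetical toy data: "twitter" has a view but no clicks.
# Assumes ad_clicked is logical (TRUE/FALSE).
toy_source_clicks <- tibble(
  utm_source = c("email", "email", "google", "google", "google", "twitter"),
  ad_clicked = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# mean() of a logical vector is the share of TRUE values, so a
# zero-click source shows up with a rate of 0 instead of vanishing
toy_click_rate <- toy_source_clicks %>%
  group_by(utm_source) %>%
  summarize(views = n(), click_rate = mean(ad_clicked)) %>%
  arrange(desc(click_rate))

toy_click_rate
```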

# Analyzing an A/B Test

6. The column experimental_group tells us whether the user was shown ad A or ad B.

Was each ad shown to approximately the same number of people? Group ad_clicks by experimental_group and count the rows, saving your result to experiment_split.

View experiment_split to see how users were split across the experiment groups!

```{r error=TRUE}
# define experiment_split here:
experiment_split <- ad_clicks %>%
  group_by(experimental_group) %>%
  summarize(count = n())

experiment_split
```
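To put a number on "approximately the same", the counts can be turned into each group's share of all users. This is a sketch on hypothetical toy data. With the real data, the same mutate() could be piped after the summarize() in experiment_split.

```{r}
library(dplyr)

# Hypothetical toy data: 5 users, 3 shown ad A and 2 shown ad B
toy_groups <- tibble(experimental_group = c("A", "A", "B", "A", "B"))

toy_split <- toy_groups %>%
  group_by(experimental_group) %>%
  summarize(count = n()) %>%
  # with a single grouping variable the summary is ungrouped,
  # so sum(count) here is the grand total
  mutate(share = count / sum(count))

toy_split
```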

7. Using group_by() and the columns ad_clicked and experimental_group, check whether more users clicked on ad A or on ad B. Save your result to clicks_by_experiment, and view it.


```{r error=TRUE}
# define clicks_by_experiment here:
clicks_by_experiment <- ad_clicks %>%
  group_by(ad_clicked, experimental_group) %>%
  summarize(count = n())

clicks_by_experiment
```
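Raw click counts compare the ads fairly only when both groups saw roughly the same number of views, so a click rate per group is a safer summary. Because clicks_by_experiment is still grouped by ad_clicked after summarize(), it has to be regrouped by experimental_group before dividing. Otherwise sum(count) would total the clicks (or non-clicks) across both ads. The sketch uses hypothetical toy data.

```{r}
library(dplyr)

# Hypothetical toy data: ad A gets 2 clicks in 4 views, ad B gets 1 click in 5 views
toy_ab <- tibble(
  experimental_group = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  ad_clicked = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Same shape as clicks_by_experiment; still grouped by ad_clicked
toy_clicks_by_experiment <- toy_ab %>%
  group_by(ad_clicked, experimental_group) %>%
  summarize(count = n())

# Regroup so that sum(count) is the number of views of each ad
toy_rate_by_experiment <- toy_clicks_by_experiment %>%
  group_by(experimental_group) %>%
  mutate(percentage = count / sum(count)) %>%
  filter(ad_clicked == TRUE)

toy_rate_by_experiment
```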

8. The Product Manager for the A/B test suspects that click behavior might vary by day of the week.

Start by creating two data frames, a_clicks and b_clicks, which contain only the results for group A and group B, respectively.

```{r error=TRUE}
# define a_clicks here:
a_clicks <- ad_clicks %>%
  filter(experimental_group == 'A')

a_clicks

# define b_clicks here:
b_clicks <- ad_clicks %>%
  filter(experimental_group == 'B')

b_clicks
```
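filter() keeps only rows where the condition is TRUE, so it also drops rows where the condition evaluates to NA. A quick sanity check confirms that the two subsets together account for every row. The sketch below runs it on hypothetical toy data that includes a missing group label, which makes the check fail.

```{r}
library(dplyr)

# Hypothetical toy data with one missing group label
toy_all <- tibble(experimental_group = c("A", "B", "A", NA))

toy_a <- toy_all %>% filter(experimental_group == "A")
toy_b <- toy_all %>% filter(experimental_group == "B")

# FALSE here: the NA row ends up in neither subset
nrow(toy_a) + nrow(toy_b) == nrow(toy_all)
```

With the real data, `nrow(a_clicks) + nrow(b_clicks) == nrow(ad_clicks)` should return TRUE if every user was assigned to either A or B.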

9. For each data frame (a_clicks and b_clicks), we want to find the number of users who clicked on the ad (and who did not) on each day. Save the results to a_clicks_by_day and b_clicks_by_day, and view them.

```{r error=TRUE}
# define a_clicks_by_day here:
a_clicks_by_day <- a_clicks %>%
  group_by(day, ad_clicked) %>%
  summarize(count = n())
a_clicks_by_day

# define b_clicks_by_day here:
b_clicks_by_day <- b_clicks %>%
  group_by(day, ad_clicked) %>%
  summarize(count = n())

b_clicks_by_day
```
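group_by() sorts character columns alphabetically. If day stores plain weekday names, the tables above will therefore run Friday, Monday, Saturday, and so on, which makes the weekly trend in step 11 harder to read. The exact format of the day column is an assumption here. If the values already carry a numeric prefix, alphabetical order is already chronological. If they are plain names like "Monday", converting day to a factor with levels in calendar order fixes the sort. The sketch uses hypothetical toy data.

```{r}
library(dplyr)

# Assumed weekday labels; adjust to match the values actually stored in day
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

# Hypothetical toy data
toy_days <- tibble(
  day        = c("Tuesday", "Monday", "Sunday", "Monday"),
  ad_clicked = c(TRUE, FALSE, TRUE, TRUE)
)

toy_days_by_day <- toy_days %>%
  # factor levels, not the alphabet, now control the sort order
  mutate(day = factor(day, levels = week_levels)) %>%
  group_by(day, ad_clicked) %>%
  summarize(count = n())

toy_days_by_day
```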


10. To find the percentage of people who clicked on the ads on each day, we need to add a new column to a_clicks_by_day and b_clicks_by_day that stores the count of clicks (or of non-clicks) divided by the total number of ad views on that day.

Group a_clicks_by_day and b_clicks_by_day by day and pipe the result into mutate(), creating a new column percentage that is defined as count/sum(count).

Save your results to a_percentage_by_day and b_percentage_by_day, and view them.

```{r error=TRUE}
# define a_percentage_by_day here:
a_percentage_by_day <- a_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count))

a_percentage_by_day

# define b_percentage_by_day here:
b_percentage_by_day <- b_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count))

b_percentage_by_day
```
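Steps 8 to 10 repeat the same pipeline once per ad. One alternative, sketched here, keeps experimental_group as an extra grouping column and computes the daily rates for both ads in a single table. The data below is hypothetical. With the real data, the pipeline would start from ad_clicks.

```{r}
library(dplyr)

# Hypothetical toy data: two days, two ads
toy_daily <- tibble(
  experimental_group = c("A", "A", "B", "B", "A", "B"),
  day                = c("Monday", "Monday", "Monday", "Monday", "Tuesday", "Tuesday"),
  ad_clicked         = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

toy_percentage_by_day <- toy_daily %>%
  group_by(experimental_group, day, ad_clicked) %>%
  summarize(count = n()) %>%
  # regroup so the denominator is the views of one ad on one day
  group_by(experimental_group, day) %>%
  mutate(percentage = count / sum(count))

toy_percentage_by_day
```

A day on which an ad received no clicks has no ad_clicked == TRUE row (ad B on Monday in this toy data). Filtering on clicked rows, as in step 11, would therefore drop that day entirely.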

11. Filter a_percentage_by_day and b_percentage_by_day to remove all rows that do not describe clicked ads, and view them.

Was there a difference in click rates for each day between the two ads? What happened over the course of the week?

Do you recommend that ShoeFly.com use ad A or ad B?


```{r}
# define a_percentage_by_day here:
a_percentage_by_day <- a_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)

a_percentage_by_day

# define b_percentage_by_day here:
b_percentage_by_day <- b_clicks_by_day %>%
  group_by(day) %>%
  mutate(percentage = count/sum(count)) %>%
  filter(ad_clicked == TRUE)

b_percentage_by_day
```
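To answer whether there was a difference on each day, it helps to line the two filtered tables up by day and subtract. The sketch below does this with inner_join(), using hypothetical rates in place of a_percentage_by_day and b_percentage_by_day.

```{r}
library(dplyr)

# Hypothetical daily click rates (not real results)
toy_a_rates <- tibble(day = c("Monday", "Tuesday"), percentage = c(0.30, 0.35))
toy_b_rates <- tibble(day = c("Monday", "Tuesday"), percentage = c(0.32, 0.25))

toy_comparison <- toy_a_rates %>%
  inner_join(toy_b_rates, by = "day", suffix = c("_a", "_b")) %>%
  # positive values mean ad A had the higher click rate that day
  mutate(difference = percentage_a - percentage_b)

toy_comparison
```

With the real tables, first select only day and percentage from each one. Otherwise the count and ad_clicked columns would also be duplicated with suffixes.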

*Hint:*

To filter a_percentage_by_day to only include rows where ads were clicked, pipe the result from the previous task into filter() with the condition ad_clicked == TRUE:

    a_percentage_by_day <- a_clicks_by_day %>%
      group_by(day) %>%
      mutate(percentage = count/sum(count)) %>%
      filter(ad_clicked == TRUE)