See the notes for this section here.
The data file contains session times for page A and page B:
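library(ggplot2)
library(dplyr)       # provides the %>% pipe
library(knitr)       # provides kable()
library(kableExtra)  # provides kable_styling()
session_times <- read.csv('./data_files/web_page_data.csv')
## Show the first six rows as a formatted table
session_times %>%
  head() %>%
  kable() %>%
  kable_styling()

| Page | Time |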
|---|---|
| Page A | 0.21 |
| Page B | 2.53 |
| Page A | 0.35 |
| Page B | 0.71 |
| Page A | 0.67 |
| Page B | 0.85 |
Visualize the distribution of session times for each page:
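One possible way to draw this comparison is a side-by-side boxplot with ggplot2. The sketch below is an assumption about the kind of plot intended, not the original figure code. To stay self-contained it rebuilds only the six rows shown in the table above as a stand-in for the full `web_page_data.csv` file.

```r
library(ggplot2)

# Stand-in data: only the six rows shown in the table above,
# not the full contents of web_page_data.csv.
session_times <- data.frame(
  Page = c("Page A", "Page B", "Page A", "Page B", "Page A", "Page B"),
  Time = c(0.21, 2.53, 0.35, 0.71, 0.67, 0.85)
)

# One box per page, session time on the y-axis
p <- ggplot(session_times, aes(x = Page, y = Time)) +
  geom_boxplot() +
  labs(x = "Page", y = "Session time")

print(p)
```

With the real data, replacing the stand-in data frame with the `read.csv()` result should be the only change needed.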
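Steps:

1. Pool the session times from both pages into a single vector.
2. Randomly draw one group of observations without replacement and assign the remaining observations to the other group.
3. Record the difference between the two group means.
4. Repeat the shuffle 1000 times to build a distribution of differences.
5. Compare the observed difference between page B and page A with that distribution.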
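perm_fun <- function(x, n_a, n_b) {
  # x: pooled session times; n_a, n_b: sizes of the page A and page B groups
  n <- n_a + n_b
  idx_a <- sample(1:n, n_a)     # randomly assign n_a observations to group A
  idx_b <- setdiff(1:n, idx_a)  # the remaining n_b observations form group B
  mean_diff <- mean(x[idx_b]) - mean(x[idx_a])
  return(mean_diff)
}

## Resample data 1000 times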
perm_diffs <- rep(0, 1000)
for (i in 1:1000) {
  perm_diffs[i] <- perm_fun(session_times[, 'Time'], 21, 15)
}
## Plot distribution of differences
hist(perm_diffs, xlab='Session time differences (in seconds)')
## Visualize where the observed difference between page A and page B falls within the distribution
mean_a <- mean(session_times[session_times$Page == 'Page A', 'Time'])
mean_b <- mean(session_times[session_times$Page == 'Page B', 'Time'])
abline(v = mean_b - mean_a)

The observed difference in mean session times between page A and page B falls within the range of chance variation, so it is not statistically significant.
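To put a number on "within the range of chance variation", one could compute a permutation p-value: the share of shuffled differences that are at least as large as the observed one. The sketch below is an illustration under assumptions, not part of the original analysis. It uses only the six rows shown in the table above as stand-in data, so its result says nothing about the full data set. The seed and the one-sided comparison are arbitrary choices.

```r
# Stand-in data: only the six rows shown in the table above,
# not the full contents of web_page_data.csv.
session_times <- data.frame(
  Page = c("Page A", "Page B", "Page A", "Page B", "Page A", "Page B"),
  Time = c(0.21, 2.53, 0.35, 0.71, 0.67, 0.85)
)

is_b <- session_times$Page == "Page B"
n_b <- sum(is_b)

# Observed difference in means (page B minus page A)
obs_diff <- mean(session_times$Time[is_b]) - mean(session_times$Time[!is_b])

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
perm_diffs <- replicate(1000, {
  idx_b <- sample(nrow(session_times), n_b)  # random relabelling of group B
  mean(session_times$Time[idx_b]) - mean(session_times$Time[-idx_b])
})

# One-sided permutation p-value: proportion of shuffled differences
# that are at least as large as the observed difference
p_value <- mean(perm_diffs >= obs_diff)
p_value
```

A large p-value means shuffled labels often produce a difference as big as the observed one, which is what "not statistically significant" expresses.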