Velo.com

Questions

Lightly comment your code and use pipes for readability.

Comment briefly on each of the questions, as directed. Only the final question requires a lengthier response.

Q1

Plot the distribution of spent by checkout_system. Below you will use a t-test to compare these distributions statistically. However, a t-test assumes normally distributed data. Is that assumption valid in this case? Why or why not?

Note:

You could compare the two distributions using histograms, but a density plot works better. (A boxplot is also an option.)
Make sure to include a plot title.

library(ggplot2)

# Overlay one density curve per checkout system to compare their shapes
ggplot(v, aes(x = spent, col = checkout_system)) +
  geom_density() +
  theme_minimal() +
  labs(title = "Distribution of spent by checkout system")

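Answer: The assumption is reasonable. According to the density plot, spending in both groups is roughly bell-shaped but has a slightly longer right tail (mild right skew), so the data are not exactly normal. However, with more than 1,600 customers in each group, the central limit theorem implies that the sampling distribution of the mean is approximately normal, and the t-test is robust to this mild skew. Hence a t-test is an appropriate method to compare the distributions statistically.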
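The central limit theorem argument can be checked with a small simulation. This is a minimal sketch under stated assumptions: it does not use the velo.com data, and the gamma distribution (shape 3, scale 750, so mean 2250 and sd about 1299) is a hypothetical stand-in chosen only to roughly mimic the right-skewed spending in the Q2 table. The helper `skewness()` is also hypothetical. A single sample stays clearly skewed, while the distribution of group means is nearly symmetric.

```r
set.seed(42)
n_per_group <- 1655  # size of the smaller group in the Q2 table

# Moment-based skewness: 0 for a symmetric distribution, > 0 for a right tail
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Hypothetical right-skewed "spending": gamma(shape = 3, scale = 750)
one_sample <- rgamma(n_per_group, shape = 3, scale = 750)

# Sampling distribution of the mean: 5,000 simulated group means
sample_means <- replicate(5000, mean(rgamma(n_per_group, shape = 3, scale = 750)))

skewness(one_sample)    # clearly positive: the raw data are skewed
skewness(sample_means)  # close to 0: the means are approximately normal
```

The t-test only relies on the approximate normality of the group means, which is why a mildly skewed variable is not a problem at these sample sizes.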

Q2

Create a summary table of spent by checkout_system with the following statistics:

n
mean
median
standard deviation
total
the lower and upper bound of a 95% z-confidence interval for the mean.

Your table should have 2 rows and 8 columns.

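library(dplyr)

# Summary statistics of spending for each checkout system
v %>%
  group_by(checkout_system) %>%
  summarize(n = n(),
            mean = mean(spent),
            sd = sd(spent),
            median = median(spent),
            se = sd / sqrt(n),                          # standard error of the mean
            lowerCI = (mean - 1.96 * se) %>% round(2),  # 95% z-interval bounds
            upperCI = (mean + 1.96 * se) %>% round(2))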

## # A tibble: 2 × 8
##   checkout_system     n  mean    sd median    se lowerCI upperCI
##   <chr>           <int> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>
## 1 new              1828 2280. 1316.  2100.  30.8   2220.   2340.
## 2 old              1655 2217. 1277.  2091.  31.4   2156.   2279.
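The z-interval in the table is mean ± z × sd / sqrt(n). The sketch below reproduces the bounds for the new system from the printed values. It assumes sd = 1316 (rounded in the table) and takes the unrounded mean 2279.890 from the Q3 t-test output. The helper `z_ci()` is hypothetical. It uses `qnorm(0.975)` (about 1.959964) instead of the hard-coded 1.96, which moves the bounds by only about 0.001.

```r
# Values for the "new" group from the Q2 table and the Q3 t-test output
n_new    <- 1828
mean_new <- 2279.890
sd_new   <- 1316  # rounded in the printed table

# Hypothetical helper: z-based confidence interval for a mean
z_ci <- function(m, s, n, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)  # critical value, about 1.96 for 95%
  se <- s / sqrt(n)                 # standard error of the mean
  c(lower = m - z * se, upper = m + z * se)
}

round(z_ci(mean_new, sd_new, n_new), 2)
```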

Q3

Is average spending significantly higher in the treatment group? (The treatment group consists of the customers using the new checkout system.) Answer this question using a two-sample, two-tailed t-test with alpha set at .05. (Note that these are the default settings for the t.test() function when vectors are supplied for the x and y arguments.)

# Welch two-sample t-test: x = old system (control), y = new system (treatment)
t.test(x = filter(v, checkout_system == 'old')$spent,
       y = filter(v, checkout_system == 'new')$spent)

## 
##  Welch Two Sample t-test
## 
## data:  filter(v, checkout_system == "old")$spent and filter(v, checkout_system == "new")$spent
## t = -1.4272, df = 3464.4, p-value = 0.1536
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -148.93475   23.45215
## sample estimates:
## mean of x mean of y 
##  2217.148  2279.890

Answer: The p-value for the t-test is 0.1536, which is greater than the alpha level of 0.05, so we fail to reject the null hypothesis of equal means. The density plot from Q1 likewise shows little difference between the two distributions. The average spending in the treatment group is therefore not significantly different from that in the control group; in other words, these data provide no evidence that the new checkout system increases average spending.
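The Welch statistic reported by `t.test()` can be recomputed from the summary table. The statistic is t = (m1 - m2) / sqrt(s1²/n1 + s2²/n2), and its degrees of freedom come from the Welch-Satterthwaite formula. This sketch uses the hypothetical helper `welch_t()` and the rounded standard deviations from Q2, so it matches the Q3 output only approximately.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics
welch_t <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1  # squared standard error, group 1
  v2 <- s2^2 / n2  # squared standard error, group 2
  t  <- (m1 - m2) / sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite
  p  <- 2 * pt(-abs(t), df)                                # two-tailed p-value
  c(t = t, df = df, p = p)
}

# Group 1 = old system (x), group 2 = new system (y), as in Q3
res <- welch_t(m1 = 2217.148, s1 = 1277, n1 = 1655,
               m2 = 2279.890, s2 = 1316, n2 = 1828)
round(res, 4)
```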

Q4

Create another summary table of spent by checkout_system and device. Include these same statistics:

n
mean
median
standard deviation
the lower and upper bound of a 95% confidence interval for the mean.

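# Summary statistics of spending for each checkout system and device combination
v %>%
  group_by(checkout_system, device) %>%
  summarize(n = n(),
            mean = mean(spent),
            sd = sd(spent),
            median = median(spent),
            se = sd / sqrt(n),                          # standard error of the mean
            lowerCI = (mean - 1.96 * se) %>% round(2),  # 95% z-interval bounds
            upperCI = (mean + 1.96 * se) %>% round(2))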

## # A tibble: 4 × 9
## # Groups:   checkout_system [2]
##   checkout_system device       n  mean    sd median    se lowerCI upperCI
##   <chr>           <chr>    <int> <dbl> <dbl>  <dbl> <dbl>   <dbl>   <dbl>
## 1 new             computer   829 2228. 1303.  2058.  45.2   2139.   2317.
## 2 new             mobile     999 2323. 1326.  2145.  42.0   2241.   2405.
## 3 old             computer   857 2256. 1274.  2147.  43.5   2171.   2342.
## 4 old             mobile     798 2175. 1279.  2027.  45.3   2086.   2264.

The table should have 4 rows and 8 columns.

Based on this information (as well as Sarah’s observation, noted in the case description, that the glitch in the checkout system seemed more prevalent for mobile users), an additional statistical comparison of new and old among just mobile users seems warranted. Make that comparison using a two-sample, two-tailed t-test with alpha set at .05. Report your results.

Note that a t-test can only compare two groups. Therefore, you will need to subset the data before making the comparison.

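# Control group: mobile users on the old checkout system
x <- v %>%
  filter(checkout_system == 'old', device == 'mobile')

# Treatment group: mobile users on the new checkout system
y <- v %>%
  filter(checkout_system == 'new', device == 'mobile')

# Welch two-sample, two-tailed t-test (the t.test() defaults)
t.test(x = x$spent,
       y = y$spent)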

## 
##  Welch Two Sample t-test
## 
## data:  x$spent and y$spent
## t = -2.399, df = 1733.1, p-value = 0.01655
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -269.13848  -27.01302
## sample estimates:
## mean of x mean of y 
##  2174.920  2322.996

Answer: The p-value is 0.01655, which is smaller than the alpha level of 0.05, so we reject the null hypothesis. Among mobile users, average spending in the treatment group is significantly different from that in the control group, and the sample means (2322.996 for the new system vs. 2174.920 for the old) show that it is higher with the new checkout system. This is consistent with Sarah's observation that the glitch in the old checkout system was more prevalent for mobile users.
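The note in Q4 points out that a t-test compares exactly two groups, so the data must be subset first. The sketch below shows this on hypothetical simulated toy data, not the velo.com data. It uses base R `subset()` in place of `dplyr::filter()`. It also compares the vector interface with the formula interface of `t.test()`. The formula interface orders groups alphabetically ("new" before "old"), so its t statistic has the opposite sign to the x = old, y = new call used above. The p-value is the same.

```r
# Hypothetical toy data: 200 simulated customers (not the velo.com data)
set.seed(1)
toy <- data.frame(
  checkout_system = rep(c("old", "new"), each = 100),
  device          = rep(c("computer", "mobile"), times = 100),
  spent           = round(rgamma(200, shape = 3, scale = 750), 2)
)

# Keep only mobile users so that exactly two groups remain
mobile <- subset(toy, device == "mobile")

# Vector interface, same order as Q4: x = old, y = new
t_xy <- t.test(x = mobile$spent[mobile$checkout_system == "old"],
               y = mobile$spent[mobile$checkout_system == "new"])

# Formula interface: groups sorted alphabetically, so the sign of t flips
t_formula <- t.test(spent ~ checkout_system, data = mobile)

c(vector_t = unname(t_xy$statistic), formula_t = unname(t_formula$statistic))
```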

Q5

What course of action should Sarah recommend to the management at velo.com? Please incorporate your analytic results from above in fashioning an answer.

Answer: Sarah should recommend that management at velo.com roll out the new checkout system and retire the old one. Although the overall difference in average spending was not significant (Q3, p = 0.1536), the comparison among mobile users in Q4 was. A p-value is the probability, assuming the null hypothesis that the two groups have the same mean spending, of observing a difference in sample means at least as large as the one actually observed. In Q4 the p-value is 0.01655, which is smaller than the alpha level of 0.05, so a difference this large would be unlikely if the two systems produced the same average spending among mobile users. The 95% confidence interval suggests that mobile customers using the new system spend between about 27 and 269 more on average than those using the old system. In other words, the glitch in the old checkout system appears to affect mobile users in particular, costing velo.com revenue. Therefore, to capture the profit from mobile customers, velo.com should implement the new checkout system.

Challenge (Optional)

In looking at the summary tables you created above you might wonder about differences not just in spending but also in the number of customers. After all, the case description indicated that customers may have been prevented from completing purchases using the old checkout system. Here are the counts:

table(v$checkout_system)

## 
##  new  old 
## 1828 1655

Obviously there are some notable differences in the number of customers. Are these differences statistically significant?

We could answer this question using simulation. For example, the binomial distribution could be used to represent the null distribution, the number of expected buyers under the null hypothesis of no difference between the checkout systems (that is, no difference in buying probability). The observed proportion of buyers under the new system is 1828 / (1828 + 1655) = .525. How often would this proportion occur under the null?

# We will use the rbinom() function to do this simulation. n refers to the number of simulations, 
# size refers to the number of trials, and prob is the probability of getting a 1 under the null. 

# Example:
rbinom(n = 1, size = 1, prob = .5)

## [1] 1

rbinom(10, 1, .5)

##  [1] 0 1 1 1 0 0 1 0 1 0

rbinom(10, 10, .5)

##  [1] 5 8 3 6 7 4 3 5 4 5

# Here is the simulation.  Note that we divide by the total number of trials to obtain the proportion of 1s.
set.seed(123)
sims <- rbinom(n = 100000, size = nrow(v), prob = .5) / nrow(v)

hist(sims)

The observed proportion would not happen very often under the null. Let’s calculate a formal p-value.

(sims >= (1828 / (1828 + 1655))) %>% mean

## [1] 0.00179

We would double this for a 2-sided test, of course, but the result is still easily statistically significant at the conventional threshold of p < .05.
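The simulation approximates a binomial tail probability that base R can also compute exactly. This is a minimal sketch using `pbinom()` for the one-sided tail P(X ≥ 1828) when p = 0.5 and `binom.test()` for the exact two-sided test. It assumes, as the simulation does, that the 3,483 customers in the table are all the rows of `v`.

```r
n_total <- 1828 + 1655  # all customers in the table above
n_new   <- 1828         # customers using the new system

# Exact one-sided tail P(X >= 1828) under p = 0.5,
# the quantity the simulation above estimates
upper_tail <- pbinom(n_new - 1, size = n_total, prob = 0.5, lower.tail = FALSE)

# Exact two-sided binomial test of H0: p = 0.5
exact <- binom.test(n_new, n_total, p = 0.5)

upper_tail
exact$p.value
```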

The chi-squared test is the statistical test typically used in this situation to do a formal hypothesis test of the counts in a 1 x 2 or 2 x 2 (or larger) contingency table. Here is a Khan Academy video on it:

https://www.khanacademy.org/math/ap-statistics/chi-square-tests/chi-square-goodness-fit/v/chi-square-statistic.

And here is the Wikipedia article:

https://en.wikipedia.org/wiki/Chi-squared_test.

Here is the R function:

?chisq.test

Note that this R function takes a table as its argument:

chisq.test(table(v$checkout_system))

## 
##  Chi-squared test for given probabilities
## 
## data:  table(v$checkout_system)
## X-squared = 8.5929, df = 1, p-value = 0.003375

Notice that the p-value is almost identical to the two-sided p-value from the simulation (2 * 0.00179 = 0.00358)!
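The chi-squared statistic compares observed counts O with the counts E expected under the null: X² = Σ (O - E)² / E. For the 1 x 2 table the null is an equal split, so each expected count is 3483 / 2 = 1741.5. The sketch below computes the statistic by hand and gets the p-value from the upper tail of a chi-squared distribution with 1 degree of freedom. It should agree with the `chisq.test()` output above.

```r
observed <- c(new = 1828, old = 1655)
expected <- rep(sum(observed) / 2, 2)  # equal split under H0: 1741.5 each

x_sq  <- sum((observed - expected)^2 / expected)     # chi-squared statistic
dof   <- length(observed) - 1                        # 1 degree of freedom
p_val <- pchisq(x_sq, df = dof, lower.tail = FALSE)  # upper-tail p-value

c(x_squared = x_sq, p_value = p_val)
```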

Explain the chi-squared test.
Run the chi-squared test also on the 2 x 2 contingency table comparing checkout system and device.
Interpret the statistical results of the chi-squared tests for both the 1 x 2 table and the 2 x 2 table.
What is the relevance of these results for the velo.com case?
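For the 2 x 2 table, the counts by checkout system and device are already in the Q4 summary output. This minimal sketch rebuilds that table as a matrix and runs a chi-squared test of independence. With the raw data, the equivalent call would presumably be `chisq.test(table(v$checkout_system, v$device))`. For 2 x 2 tables, `chisq.test()` applies Yates' continuity correction by default (`correct = TRUE`).

```r
# Customer counts by checkout system and device, from the Q4 table
counts <- matrix(c(829, 999,
                   857, 798),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(checkout_system = c("new", "old"),
                                 device = c("computer", "mobile")))

# Test of independence between checkout system and device
res_2x2 <- chisq.test(counts)

res_2x2$expected   # counts expected if system and device were independent
res_2x2$statistic  # X-squared (with continuity correction)
res_2x2$p.value
```

Comparing `counts` with `res_2x2$expected` shows which cells drive the statistic, for example whether the new system has more mobile customers than independence would predict.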

WingKi Yu

3/2/2023