Velo.com. Case

Q1

# Load necessary libraries
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)

# Load the dataset
velo_data <- read.csv("velo.csv")

# Plot the distribution of spent by checkout_system
ggplot(velo_data, aes(x = spent, fill = checkout_system)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Spent by Checkout System")

Q2

# Create a summary table of spent by checkout_system
summary_table <- velo_data %>%
  group_by(checkout_system) %>%
  summarize(
    n = n(),
    mean = mean(spent),
    median = median(spent),
    std_dev = sd(spent),
    total = sum(spent),
    lower_bound = mean - qt(0.975, n - 1) * (std_dev / sqrt(n)),
    upper_bound = mean + qt(0.975, n - 1) * (std_dev / sqrt(n))
  )

print(summary_table)

## # A tibble: 2 × 8
##   checkout_system     n  mean median std_dev    total lower_bound upper_bound
##   <chr>           <int> <dbl>  <dbl>   <dbl>    <dbl>       <dbl>       <dbl>
## 1 new              1828 2280.  2100.   1316. 4167638.       2220.       2340.
## 2 old              1655 2217.  2091.   1277. 3669381.       2156.       2279.
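
The confidence bounds in the table come from the usual t-based interval, mean ± t × (sd / sqrt(n)). As a rough check, the sketch below recomputes the interval for the "new" group using only the rounded values printed above. The variable names are illustrative, and because the inputs are rounded the result is approximate.

```r
# Summary statistics for the "new" group, copied from the printed table.
# These are rounded display values, so the result is approximate.
n       <- 1828
mean_x  <- 2280
std_dev <- 1316

# Standard error of the mean and the t critical value for a 95% interval
se     <- std_dev / sqrt(n)
t_crit <- qt(0.975, df = n - 1)

# Interval: mean minus/plus t_crit * se
ci_new <- c(lower = mean_x - t_crit * se, upper = mean_x + t_crit * se)
print(round(ci_new))
```

The rounded bounds should land close to the 2220 and 2340 reported for the "new" group.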

Q3

# Does average spending differ between the new (treatment) and old (control)
# checkout systems? Welch two-sample, two-tailed t-test.
t_test_result <- t.test(
  velo_data$spent[velo_data$checkout_system == "new"],
  velo_data$spent[velo_data$checkout_system == "old"],
  alternative = "two.sided"
)

# Print the results of the t-test
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  velo_data$spent[velo_data$checkout_system == "new"] and velo_data$spent[velo_data$checkout_system == "old"]
## t = 1.4272, df = 3464.4, p-value = 0.1536
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -23.45215 148.93475
## sample estimates:
## mean of x mean of y 
##  2279.890  2217.148
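
To show where the t statistic and the fractional degrees of freedom come from, here is a minimal sketch that rebuilds the Welch test from group summaries alone. It assumes the means printed above and the rounded standard deviations from the Q2 table, so it should only approximately reproduce the output of `t.test`. The variable names are illustrative.

```r
# Group summaries: means from the t-test output, n and std_dev from the Q2
# table (std_dev is rounded there, so the reconstruction is approximate).
n_new <- 1828; mean_new <- 2279.890; sd_new <- 1316
n_old <- 1655; mean_old <- 2217.148; sd_old <- 1277

# Variance of each sample mean
v_new <- sd_new^2 / n_new
v_old <- sd_old^2 / n_old

# Welch t statistic: difference in means over its standard error
t_stat <- (mean_new - mean_old) / sqrt(v_new + v_old)

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_new + v_old)^2 /
  (v_new^2 / (n_new - 1) + v_old^2 / (n_old - 1))

# Two-sided p-value
p_value <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The values should come out near t = 1.43, df = 3464, and p = 0.15, in line with the printed test.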

Q4

# Req 1: Create a summary table of spent by checkout_system and device
summary_table_device <- velo_data %>%
  group_by(checkout_system, device) %>%
  summarize(
    n = n(),
    mean = mean(spent),
    median = median(spent),
    std_dev = sd(spent),
    lower_bound = mean - qt(0.975, n - 1) * (std_dev / sqrt(n)),
    upper_bound = mean + qt(0.975, n - 1) * (std_dev / sqrt(n)),
    .groups = 'drop' # Override grouped output
  )

# Print the summary table
print(summary_table_device)

## # A tibble: 4 × 8
##   checkout_system device       n  mean median std_dev lower_bound upper_bound
##   <chr>           <chr>    <int> <dbl>  <dbl>   <dbl>       <dbl>       <dbl>
## 1 new             computer   829 2228.  2058.   1303.       2139.       2317.
## 2 new             mobile     999 2323.  2145.   1326.       2241.       2405.
## 3 old             computer   857 2256.  2147.   1274.       2171.       2342.
## 4 old             mobile     798 2175.  2027.   1279.       2086.       2264.

# Req 2: Perform a 2-sample, 2-tailed t-test for mobile users
t_test_mobile_result <- t.test(
  velo_data$spent[velo_data$checkout_system == "new" & velo_data$device == "mobile"],
  velo_data$spent[velo_data$checkout_system == "old" & velo_data$device == "mobile"],
  alternative = "two.sided"
)

# Print the results of the t-test for mobile users
print(t_test_mobile_result)

## 
##  Welch Two Sample t-test
## 
## data:  velo_data$spent[velo_data$checkout_system == "new" & velo_data$device == "mobile"] and velo_data$spent[velo_data$checkout_system == "old" & velo_data$device == "mobile"]
## t = 2.399, df = 1733.1, p-value = 0.01655
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   27.01302 269.13848
## sample estimates:
## mean of x mean of y 
##  2322.996  2174.920
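
The same subgroup test can also be written with the formula interface of `t.test`, which avoids repeating the long subsetting expressions. The sketch below uses simulated data, not velo.csv: `demo_data` is a hypothetical stand-in that only borrows the column names, so its numbers mean nothing beyond showing that the two call styles agree.

```r
set.seed(42)  # reproducible illustrative data; NOT the real velo.csv

# Hypothetical stand-in with the same column names as velo_data
demo_data <- data.frame(
  checkout_system = rep(c("new", "old"), each = 200),
  device          = rep(c("mobile", "computer"), times = 200),
  spent           = round(rnorm(400, mean = 2200, sd = 500))
)

# Keep mobile rows only
mobile <- subset(demo_data, device == "mobile")

# Formula interface: spent by checkout_system (Welch test by default)
fit_formula <- t.test(spent ~ checkout_system, data = mobile)

# Vector interface, in the style used in this report
fit_vectors <- t.test(
  mobile$spent[mobile$checkout_system == "new"],
  mobile$spent[mobile$checkout_system == "old"]
)

# Both calls describe the same test, so the statistics should agree
print(all.equal(unname(fit_formula$statistic), unname(fit_vectors$statistic)))
```

With the formula interface the groups are ordered alphabetically, so "new" is compared against "old" in the same direction as in the vector call.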

Q5

## The Welch two-sample t-test from Q3, comparing spending under the new and old checkout systems across all customers, yielded t = 1.4272 with a p-value of 0.1536. Because the p-value exceeds the 0.05 significance level, we fail to reject the null hypothesis of equal mean spending. The 95% confidence interval for the difference in means (-23.45 to 148.93) includes zero, so the data do not show a statistically significant difference in spending between the two systems.

## The Welch two-sample t-test from Q4, restricted to mobile users, yielded t = 2.399 with a p-value of 0.01655. Because the p-value is below 0.05, the difference is statistically significant: mobile users on the new checkout system spent about 148 more on average (2323.00 vs. 2174.92), with a 95% confidence interval for the difference of 27.01 to 269.14. This supports the conclusion that, for mobile users, the new checkout system led to higher spending than the old system.
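
The comparison of each p-value with the 0.05 threshold can be sketched from the reported t statistics and degrees of freedom alone. The one-sided column is an extra that was not part of the analysis above. It is included only because the underlying question is directional (is spending higher under the new system?), and it assumes that direction was fixed before looking at the data.

```r
alpha <- 0.05  # significance level used in the write-up

# t statistics and Welch degrees of freedom as reported in Q3 and Q4
reported <- data.frame(
  test   = c("all users", "mobile users"),
  t_stat = c(1.4272, 2.399),
  df     = c(3464.4, 1733.1)
)

# Two-sided p-value: probability of a |t| at least this large under the null
reported$p_two_sided <- 2 * pt(abs(reported$t_stat), reported$df,
                               lower.tail = FALSE)

# One-sided p-value for the directional alternative "new > old"
# (an assumption for illustration; the report used two-sided tests)
reported$p_one_sided <- pt(reported$t_stat, reported$df, lower.tail = FALSE)

# Decision at the 0.05 level using the two-sided p-values
reported$reject_two_sided <- reported$p_two_sided < alpha

print(reported)
```

The two-sided values should match the printed p-values to rounding, with only the mobile comparison falling below 0.05.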

## Based on this evidence, I recommend that Sarah advise Velo.com to implement the new checkout system for mobile users, since that is the group in which spending improved significantly, and higher mobile spending should translate into higher revenue for Velo.com. Although the data showed no significant difference across all users, the system's performance could vary over time, so further analysis may be needed to track its impact. A smoother checkout may also improve customer satisfaction, which could in turn raise revenue, although this data set does not measure satisfaction directly. For these reasons, implementing the new checkout system for mobile users while continuing to monitor results for all users would be the most impactful course for Velo.com.

TX Harris

10/15/2023
