Q1

# Load necessary libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load the dataset
velo <- read.csv("velo.csv")

# Manually change factor levels for checkout_system so "old" is the reference level
velo <- velo %>%
  mutate(checkout_system = factor(checkout_system, 
                                  levels = c("old", "new")))

# Fit the simple linear regression model
model <- lm(spent ~ checkout_system, data = velo)

# Display the model summary
summary(model)
## 
## Call:
## lm(formula = spent ~ checkout_system, data = velo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2256.0  -986.2  -156.5   791.8  6541.5 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2217.15      31.90  69.511   <2e-16 ***
## checkout_systemnew    62.74      44.03   1.425    0.154    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1298 on 3481 degrees of freedom
## Multiple R-squared:  0.000583,   Adjusted R-squared:  0.0002959 
## F-statistic: 2.031 on 1 and 3481 DF,  p-value: 0.1542

1.) Average spending for customers using the old system is $2217.15 (the intercept), and for customers using the new system it is $2279.89 (2217.15 + 62.74).

2.) The difference in spending is estimated to be $62.74 in favor of the new system. However, the p-value of 0.154 is above 0.05, so the difference is not statistically significant at the 0.05 level.

3.) These results match the previous module: the average spending values found there are exactly the same as the ones estimated here.
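
The link between the regression coefficients and the group averages in answer 1 can be checked on a small example. This is a minimal sketch that assumes a made-up data frame called `toy` (it is not the velo data): with "old" as the reference level, the intercept should equal the old-system mean, and the intercept plus the `checkout_systemnew` coefficient should equal the new-system mean.

```r
# Hypothetical toy data, for illustration only (not taken from velo.csv)
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = factor(rep(c("old", "new"), each = 3),
                           levels = c("old", "new"))
)

fit <- lm(spent ~ checkout_system, data = toy)
b <- coef(fit)

# Intercept = mean of the reference group ("old")
mean_old <- unname(b["(Intercept)"])
# Intercept + coefficient = mean of the "new" group
mean_new <- unname(b["(Intercept)"] + b["checkout_systemnew"])

# Compare with the group means computed directly from the data
group_means <- tapply(toy$spent, toy$checkout_system, mean)
print(c(old = mean_old, new = mean_new))
print(group_means)
```

Applied to the coefficients printed in the Q1 summary, the same arithmetic gives 2217.15 for the old system and 2217.15 + 62.74 = 2279.89 for the new system.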

Q2

# Load necessary libraries
library(dplyr)

# Load the dataset
velo <- read.csv("velo.csv")

# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")

# Manually change factor levels for checkout_system
mobile_data$checkout_system <- factor(mobile_data$checkout_system, 
                                      levels = c("old", "new"))
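# Filter data for mobile users again. Note: this overwrites the releveled
# factor above, so lm() falls back to the default alphabetical reference
# level ("new") and the summary reports checkout_systemold.
mobile_data <- filter(velo, device == "mobile")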

# Fit the simple linear regression model for mobile users
model_mobile <- lm(spent ~ checkout_system, data = mobile_data)

# Display the model summary
summary(model_mobile)
## 
## Call:
## lm(formula = spent ~ checkout_system, data = mobile_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2284.4  -976.2  -171.8   803.1  6498.4 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2323.00      41.30  56.247   <2e-16 ***
## checkout_systemold  -148.08      61.98  -2.389    0.017 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1305 on 1795 degrees of freedom
## Multiple R-squared:  0.00317,    Adjusted R-squared:  0.002615 
## F-statistic: 5.708 on 1 and 1795 DF,  p-value: 0.01699

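1.) The coefficient for checkout_systemold is -$148.08, with the new system as the reference level. Mobile customers on the old system therefore spent an estimated $148.08 less on average than those on the new system ($2174.92 versus $2323.00), so the new system has the higher average spend in this regression.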

2.) The p-value for this regression is 0.017, which is below 0.05, so the difference is statistically significant at the 0.05 level.

3.) These results also agree with the results found in the previous case; the values are exactly the same.
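
The sign of the coefficient in this summary depends on which level of checkout_system is the reference. The sketch below uses a made-up data frame `toy` (an assumption for illustration, not the velo data) to show that the default alphabetical ordering makes "new" the reference and reports `checkout_systemold`, while setting the levels explicitly reports `checkout_systemnew` with the opposite sign.

```r
# Hypothetical toy data, for illustration only (not taken from velo.csv)
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = rep(c("old", "new"), each = 3)
)

# Default: character values become a factor with alphabetical levels,
# so "new" is the reference and the coefficient is for "old"
fit_default <- lm(spent ~ checkout_system, data = toy)

# Explicit levels: "old" is the reference and the coefficient is for "new"
toy$checkout_system <- factor(toy$checkout_system, levels = c("old", "new"))
fit_releveled <- lm(spent ~ checkout_system, data = toy)

print(coef(fit_default))
print(coef(fit_releveled))
```

Both fits describe the same difference between the groups; only the direction in which it is reported changes.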

Q3

# Load necessary libraries
library(dplyr)

# Load the dataset
velo <- read.csv("velo.csv")

# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")

# Manually change factor levels for checkout_system
mobile_data$checkout_system <- factor(mobile_data$checkout_system, 
                                      levels = c("old", "new"))

# Fit the simple linear regression model
model_mobile <- lm(spent ~ checkout_system, data = mobile_data)

# Calculate the 95% confidence intervals for the entire dataset
# (uses `model`, the all-customers fit from Q1, not model_mobile)
ci_entire <- confint(model, level = 0.95)

# Create a data frame to store the confidence intervals with labels
confidence_intervals <- data.frame(
  Parameter = c("Intercept (old system)", "checkout_systemnew (new system)"),
  Lower_Bound = c(ci_entire[1, 1], ci_entire[2, 1]),
  Upper_Bound = c(ci_entire[1, 2], ci_entire[2, 2])
)

# Print the confidence intervals with explanations
confidence_intervals
##                         Parameter Lower_Bound Upper_Bound
## 1          Intercept (old system)  2154.61124   2279.6856
## 2 checkout_systemnew (new system)   -23.58175    149.0644

The 95% confidence interval for the intercept, $2154.61 to $2279.69, is a range of plausible values for average spending under the old system. The interval for checkout_systemnew, -$23.58 to $149.06, is a range of plausible values for the difference in average spending between the new and old systems. Because this interval includes both negative and positive values, and therefore zero, the difference is not statistically significant at the 0.05 level, which agrees with the p-value of 0.154. There is real uncertainty as to whether the new system changes average spend at all.
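
The intervals returned by confint() can be reproduced by hand as estimate plus or minus a t critical value times the standard error. A minimal sketch, assuming a made-up data frame `toy` (not the velo data):

```r
# Hypothetical toy data, for illustration only (not taken from velo.csv)
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = factor(rep(c("old", "new"), each = 3),
                           levels = c("old", "new"))
)

fit <- lm(spent ~ checkout_system, data = toy)

# Estimates and standard errors from the coefficient table
coef_table <- coef(summary(fit))
est <- coef_table[, "Estimate"]
se  <- coef_table[, "Std. Error"]

# Two-sided 95% critical value from the t distribution
t_crit <- qt(0.975, df = df.residual(fit))

# Manual interval: estimate +/- critical value * standard error
manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)

# Built-in interval for comparison
builtin_ci <- confint(fit, level = 0.95)

print(manual_ci)
print(builtin_ci)
```

With a residual degrees of freedom as large as the 3481 reported in the Q1 summary, the t critical value is close to 1.96, which is why a critical value of 2 is a common approximation.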

Q4

# Load necessary libraries
library(dplyr)

# Load the dataset
velo <- read.csv("velo.csv")

# Calculate 'num_customers' from the 'velo' dataset
num_customers <- nrow(velo)

# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")

# Filter data for mobile users using the new checkout system
mobile_data_new <- velo %>% filter(device == "mobile" & checkout_system == "new")

# Fit a simple linear regression model for the new checkout system
model_mobile_new <- lm(spent ~ 1, data = mobile_data_new)

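# Calculate the 95% confidence interval for mean spend on the new checkout system
# (confint() uses the t critical value, which is close to 2 for a sample this size)
ci_new <- confint(model_mobile_new, level = 0.95)

# Extract the intercept, which is the mean spend for the new checkout system
coefficient_new <- coef(model_mobile_new)[1]

# Calculate lower and upper bounds of revenue projections for the new system.
# Note: ci_new[1, 1] is the lower confidence limit, so the first line is the
# margin of error times num_customers and the second is (mean + lower limit)
# times num_customers; num_customers counts all customers, not only mobile.
lower_bound_new <- (coefficient_new - ci_new[1, 1]) * num_customers
upper_bound_new <- (coefficient_new + ci_new[1, 1]) * num_customers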

# Print the lower and upper bounds for the new system
cat("checkout_systemnew lower bound:", lower_bound_new, "\n")
## checkout_systemnew lower bound: 286780.7
cat("checkout_systemnew upper bound:", upper_bound_new, "\n")
## checkout_systemnew upper bound: 15895210
# Filter data for mobile users using the old checkout system
mobile_data_old <- velo %>% filter(device == "mobile" & checkout_system == "old")

# Fit a simple linear regression model for the old checkout system
model_mobile_old <- lm(spent ~ 1, data = mobile_data_old)

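# Calculate the 95% confidence interval for mean spend on the old checkout system
# (confint() uses the t critical value, which is close to 2 for a sample this size)
ci_old <- confint(model_mobile_old, level = 0.95)

# Extract the intercept, which is the mean spend for the old checkout system
coefficient_old <- coef(model_mobile_old)[1]

# Calculate lower and upper bounds of revenue projections for the old system.
# Note: ci_old[1, 1] is the lower confidence limit, so the first line is the
# margin of error times num_customers and the second is (mean + lower limit)
# times num_customers; num_customers counts all customers, not only mobile.
lower_bound_old <- (coefficient_old - ci_old[1, 1]) * num_customers
upper_bound_old <- (coefficient_old + ci_old[1, 1]) * num_customers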

# Print the lower and upper bounds for the old system
cat("checkout_systemold lower bound:", lower_bound_old, "\n")
## checkout_systemold lower bound: 309511.1
cat("checkout_systemold upper bound:", upper_bound_old, "\n")
## checkout_systemold upper bound: 14840984

What range of increased revenue might the company expect using the new checkout system (compared to the old system) and, based on this, does the coefficient estimate for checkout_system have practical significance in your view?

The bounds are listed above. Based on these figures, the coefficient for checkout_system does not show clear practical significance. The projected values for the new and old systems are close to each other, so it is not certain that the new system will provide the meaningful increase in revenue that the previous case work suggested.

The range of increased revenue the company could expect using the new checkout system is estimated to be $22,730.40 to $1,054,226 (the differences between the corresponding printed bounds for the two systems).
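
One common way to turn a per-customer difference into a revenue range is to multiply the confidence limits of the difference coefficient by the number of customers. The sketch below illustrates that idea under stated assumptions and is not necessarily the method the assignment intends: the data frame `toy` and the customer count `n_customers` are both hypothetical.

```r
# Hypothetical toy data, for illustration only (not taken from velo.csv)
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = factor(rep(c("old", "new"), each = 3),
                           levels = c("old", "new"))
)

fit <- lm(spent ~ checkout_system, data = toy)

# 95% confidence interval for the new-minus-old difference in mean spend
ci_diff <- confint(fit, parm = "checkout_systemnew", level = 0.95)

# Hypothetical number of customers expected to use the checkout
n_customers <- 1000

# Project the per-customer limits to total revenue
revenue_range <- ci_diff[1, ] * n_customers
print(revenue_range)
```

If the lower end of such a range is negative, the data are also compatible with the new system reducing revenue, which bears directly on the question of practical significance.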

Q5

It seems that Sarah should recommend a cautious approach to the management of Velo.com, based on the following data and information gathered:

The simple linear regression analysis showed that there is not strong statistical evidence to support a difference in average customer spending between the old and new checkout systems. The coefficient estimate for the "new" system was $62.74, but the p-value was 0.154, indicating that the difference was not statistically significant at the 0.05 significance level.

For mobile users, the analysis suggested lower average spending for the "old" system, based on a coefficient estimate of -$148.08 relative to the new system. This result was statistically significant, with a p-value of 0.017.

If we look at the results from Question 4, the analysis returned a range of increased revenue to the company. This range had a small number (given the previous revenue estimates) as the lower bound and a much larger positive number as the upper bound. The width of this range reflects the uncertainty in our confidence intervals: the gain could turn out to be close to the small lower bound rather than the large upper bound.

My specific recommendation to Sarah and Velo.com is that, given the mixed results and the wide range of revenue projections, the new system should be implemented cautiously. It may not produce the desired outcome. Further testing should be done to analyze the impact of the new system in a specific, data-driven manner. Sarah should also recommend monitoring the results over time to see whether longer-term trends become apparent.