Q1
# Load necessary libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the dataset
velo <- read.csv("velo.csv")
# Manually change factor levels for checkout_system so "old" is the reference level
velo <- velo %>%
  mutate(checkout_system = factor(checkout_system,
                                  levels = c("old", "new")))
# Fit the simple linear regression model
model <- lm(spent ~ checkout_system, data = velo)
# Display the model summary
summary(model)
##
## Call:
## lm(formula = spent ~ checkout_system, data = velo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2256.0 -986.2 -156.5 791.8 6541.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2217.15 31.90 69.511 <2e-16 ***
## checkout_systemnew 62.74 44.03 1.425 0.154
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1298 on 3481 degrees of freedom
## Multiple R-squared: 0.000583, Adjusted R-squared: 0.0002959
## F-statistic: 2.031 on 1 and 3481 DF, p-value: 0.1542
1.) Average spending for customers using the old system is $2217.15
(the intercept), and for customers using the new system it is $2279.89
(2217.15 + 62.74).
2.) The difference in spending is estimated to be $62.74. However,
the p-value of 0.154 is above 0.05, so the difference is not
statistically significant at the 0.05 level.
3.) These average spending values are exactly the same as the ones
we calculated in the previous module.
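The link between the regression coefficients and the two group averages can be checked on a small example. The sketch below uses a made-up data frame (`toy` is hypothetical, not the velo data) to show that, with "old" as the reference level, the intercept equals the old-system mean and the intercept plus the slope equals the new-system mean.

```r
# Hypothetical toy data, NOT the velo dataset; values chosen only for illustration
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = factor(rep(c("old", "new"), each = 3),
                           levels = c("old", "new"))
)

fit <- lm(spent ~ checkout_system, data = toy)

# Intercept = mean spend in the reference group ("old")
mean_old <- unname(coef(fit)["(Intercept)"])
# Intercept + slope = mean spend in the "new" group
mean_new <- unname(coef(fit)["(Intercept)"] + coef(fit)["checkout_systemnew"])

# Compare against the group means computed directly
group_means <- tapply(toy$spent, toy$checkout_system, mean)
print(c(old = mean_old, new = mean_new))
print(group_means)
```

Applied to the summary output above, the same arithmetic gives 2217.15 + 62.74 = 2279.89 for the new system.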
Q2
# Load necessary libraries
library(dplyr)
# Load the dataset
velo <- read.csv("velo.csv")
# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")
# Manually change factor levels for checkout_system
mobile_data$checkout_system <- factor(mobile_data$checkout_system,
levels = c("old", "new"))
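# Filter data for mobile users again. Note: re-filtering from velo discards the
# factor levels set above, so "new" (first alphabetically) becomes the reference level
mobile_data <- filter(velo, device == "mobile")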
# Fit the simple linear regression model for mobile users
model_mobile <- lm(spent ~ checkout_system, data = mobile_data)
# Display the model summary
summary(model_mobile)
##
## Call:
## lm(formula = spent ~ checkout_system, data = mobile_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2284.4 -976.2 -171.8 803.1 6498.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2323.00 41.30 56.247 <2e-16 ***
## checkout_systemold -148.08 61.98 -2.389 0.017 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1305 on 1795 degrees of freedom
## Multiple R-squared: 0.00317, Adjusted R-squared: 0.002615
## F-statistic: 5.708 on 1 and 1795 DF, p-value: 0.01699
1.) The coefficient checkout_systemold is -$148.08, with the new
system as the reference level. Mobile customers on the old system
therefore spent an estimated $148.08 less on average, meaning the new
system has the higher average spend in this regression.
2.) The p-value for this regression is 0.017, which is below the
0.05 threshold, so the difference is statistically significant at the
0.05 level.
3.) These results also agree with the results found in the
previous case; the values are exactly the same.
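The sign of a dummy-variable coefficient depends on which level is the reference. A minimal sketch on made-up data (`toy` is hypothetical, not the velo data) shows that the same comparison is reported as `checkout_systemold` or `checkout_systemnew`, with opposite signs, depending on the level order:

```r
# Hypothetical toy data, NOT the velo dataset
toy <- data.frame(
  spent = c(100, 120, 140, 150, 170, 190),
  checkout_system = rep(c("old", "new"), each = 3)
)

# Default: levels are ordered alphabetically, so "new" is the reference
# and the coefficient is named checkout_systemold
fit_default <- lm(spent ~ checkout_system, data = toy)

# Explicit levels: "old" is the reference, coefficient is checkout_systemnew
toy$checkout_system <- factor(toy$checkout_system, levels = c("old", "new"))
fit_releveled <- lm(spent ~ checkout_system, data = toy)

print(coef(fit_default))
print(coef(fit_releveled))
```

The two fits describe the same difference between groups; only the direction of the comparison changes.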
Q3
# Load necessary libraries
library(dplyr)
# Load the dataset
velo <- read.csv("velo.csv")
# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")
# Manually change factor levels for checkout_system
mobile_data$checkout_system <- factor(mobile_data$checkout_system,
levels = c("old", "new"))
# Fit the simple linear regression model
model_mobile <- lm(spent ~ checkout_system, data = mobile_data)
# Calculate the confidence intervals for the entire dataset
# (this uses `model` from Q1, not model_mobile)
ci_entire <- confint(model, level = 0.95)
# Create a data frame to store the confidence intervals with labels
confidence_intervals <- data.frame(
Parameter = c("Intercept (old system)", "checkout_systemnew (new system)"),
Lower_Bound = c(ci_entire[1, 1], ci_entire[2, 1]),
Upper_Bound = c(ci_entire[1, 2], ci_entire[2, 2])
)
# Print the confidence intervals with explanations
confidence_intervals
## Parameter Lower_Bound Upper_Bound
## 1 Intercept (old system) 2154.61124 2279.6856
## 2 checkout_systemnew (new system) -23.58175 149.0644
Each 95% confidence interval gives a range of plausible values for
the corresponding coefficient. The intercept, which is the average spend
under the old system, is estimated to lie between about $2,154.61 and
$2,279.69. The CI for the new checkout system coefficient runs from
about -$23.58 to $149.06, so it includes zero. This means we cannot
conclude at the 95% confidence level that there is a difference in
average spend between the systems, and it remains uncertain whether the
new system increases or decreases average spend.
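For reference, a confidence interval of this kind is the estimate plus or minus a t critical value times the standard error. The sketch below redoes that arithmetic from the rounded estimate, standard error, and degrees of freedom printed in the Q1 summary. Because the inputs are rounded, the result should agree with `confint()` only approximately.

```r
# Rounded values copied from the Q1 summary output
estimate  <- 62.74   # checkout_systemnew coefficient
std_error <- 44.03   # its standard error
df_resid  <- 3481    # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci_manual <- c(lower = estimate - t_crit * std_error,
               upper = estimate + t_crit * std_error)
print(round(ci_manual, 2))

# Check whether the interval contains zero
contains_zero <- ci_manual[["lower"]] < 0 && ci_manual[["upper"]] > 0
print(contains_zero)
```

An interval that contains zero is consistent with a p-value above 0.05 for the same coefficient.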
Q4
# Load necessary libraries
library(dplyr)
# Load the dataset
velo <- read.csv("velo.csv")
# Calculate 'num_customers' from the 'velo' dataset
num_customers <- nrow(velo)
# Filter data for mobile users
mobile_data <- velo %>% filter(device == "mobile")
# Filter data for mobile users using the new checkout system
mobile_data_new <- velo %>% filter(device == "mobile" & checkout_system == "new")
# Fit an intercept-only model; the intercept is the mean spend for mobile users on the new system
model_mobile_new <- lm(spent ~ 1, data = mobile_data_new)
# Calculate the 95% confidence interval for that mean
# (confint() uses the t critical value and has no `alternative` argument)
ci_new <- confint(model_mobile_new, level = 0.95)
# Calculate the coefficient for the new checkout system
coefficient_new <- coef(model_mobile_new)[1]
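# Scale to revenue by num_customers. Note: coefficient_new - ci_new[1, 1] is the CI
# half-width and coefficient_new + ci_new[1, 1] is roughly twice the mean, so these
# values are not the CI bounds of the mean themselves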
lower_bound_new <- (coefficient_new - ci_new[1, 1]) * num_customers
upper_bound_new <- (coefficient_new + ci_new[1, 1]) * num_customers
# Print the lower and upper bounds for the new system
cat("checkout_systemnew lower bound:", lower_bound_new, "\n")
## checkout_systemnew lower bound: 286780.7
cat("checkout_systemnew upper bound:", upper_bound_new, "\n")
## checkout_systemnew upper bound: 15895210
# Filter data for mobile users using the old checkout system
mobile_data_old <- velo %>% filter(device == "mobile" & checkout_system == "old")
# Fit an intercept-only model; the intercept is the mean spend for mobile users on the old system
model_mobile_old <- lm(spent ~ 1, data = mobile_data_old)
# Calculate the 95% confidence interval for that mean
# (confint() uses the t critical value and has no `alternative` argument)
ci_old <- confint(model_mobile_old, level = 0.95)
# Calculate the coefficient for the old checkout system
coefficient_old <- coef(model_mobile_old)[1]
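# Scale to revenue by num_customers. Note: coefficient_old - ci_old[1, 1] is the CI
# half-width and coefficient_old + ci_old[1, 1] is roughly twice the mean, so these
# values are not the CI bounds of the mean themselves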
lower_bound_old <- (coefficient_old - ci_old[1, 1]) * num_customers
upper_bound_old <- (coefficient_old + ci_old[1, 1]) * num_customers
# Print the lower and upper bounds for the old system
cat("checkout_systemold lower bound:", lower_bound_old, "\n")
## checkout_systemold lower bound: 309511.1
cat("checkout_systemold upper bound:", upper_bound_old, "\n")
## checkout_systemold upper bound: 14840984
What range of increased revenue might the company expect using the
new checkout system (compared to the old system) and, based on this,
does the coefficient estimate for checkout_system have practical
significance in your view?
The bounds are listed above. Based on these figures, the coefficient
for checkout_system does not show clear practical significance. The
values for the new and old systems are close to each other, which leaves
little room for a difference. It is not necessarily true that the new
system will provide a significant increase in revenue, as the previous
case work suggested.
The range of increased revenue the company could expect using the
new checkout system is estimated to be -$22,730.40 to $1,054,226 (new
minus old, for the lower bounds and the upper bounds printed above).
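One common way to turn a coefficient into a revenue range is to scale its confidence interval by a customer count. The sketch below is an illustration under stated assumptions, not the calculation used above. It takes the rounded mobile estimate and standard error from the Q2 summary (with the sign flipped so the difference reads new minus old) and a hypothetical `n_customers` that does not come from the data.

```r
# Rounded values from the Q2 summary; sign flipped so the difference is new - old
estimate  <- 148.08
std_error <- 61.98
df_resid  <- 1795

# ASSUMPTION: illustrative customer count, not taken from the velo data
n_customers <- 1000

# 95% CI for the per-customer difference in average spend
t_crit <- qt(0.975, df = df_resid)
ci_per_customer <- c(lower = estimate - t_crit * std_error,
                     upper = estimate + t_crit * std_error)

# Scale the per-customer interval to a projected revenue range
revenue_range <- ci_per_customer * n_customers
print(round(ci_per_customer, 2))
print(round(revenue_range))
```

Scaling the interval for the difference, rather than each group's mean separately, keeps the projection tied directly to the coefficient whose practical significance is in question.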
Q5
## Sarah should recommend a cautious approach to the management of Velo.com, based on the following results:
## The simple linear regression analysis showed that there is not strong statistical evidence to support a significant difference in average customer spending between the old and new checkout systems. The coefficient estimate for the "new" system was $62.74, but the p-value was 0.154, indicating that the difference was not statistically significant at the 0.05 significance level.
## For mobile users, the analysis estimated average spending to be $148.08 lower for the "old" system than for the "new" system (coefficient estimate of -148.08). This result was statistically significant, with a p-value of 0.017.
## If we look at the results from Question 4, the analysis returned a range of increased revenue for the company. This range had a small number (relative to the previous revenue estimates) as the lower bound and a large positive number as the upper bound. The width of this range reflects the uncertainty in our confidence intervals, so the size of any revenue gain is far from certain.
## My specific recommendation to Sarah and Velo.com is that, given the mixed results and the large range of revenue outcomes, the new system should be implemented cautiously. It may not produce the desired outcome. Further testing should be done to analyze the impact of the new system in a specific, data-driven manner. Sarah should also recommend monitoring the results over time to see whether longer-term trends become apparent.