Logistic Regression: Predicting Purchases
To investigate the relationship between product performance ratings (Wet and Noise) and the likelihood of purchase for all season tires.
##Question 1 & 2: Develop the model & assess predictor significance
library(readxl) #allows us to import excel files
library(pROC) #allows us to run the area under the curve (AUC) package to get the plot and AUC score
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
data <- read_excel("/Users/ericheredia/Downloads/Class Exercise 15_TireRatings.xlsx")
print(head(data))
## # A tibble: 6 × 5
## Tire Wet Noise Buy_Again Purchase
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 BFGoodrich g-Force Super Sport A/S 8 7.2 6.1 0
## 2 BFGoodrich g-Force Super Sport A/S H&V 8 7.2 6.6 1
## 3 BFGoodrich g-Force T/A KDWS 7.6 7.5 6.9 1
## 4 Bridgestone B381 6.6 5.4 6.6 0
## 5 Bridgestone Insignia SE200 5.8 6.3 4 0
## 6 Bridgestone Insignia SE200-02 6.3 5.7 4.5 0
Data Description: A description of some of the features are presented in the table below.
Variable | Definition
----------|------------
1. Wet | Tire's wet handling score
2. Noise | Tire's noise rating score
3. Purchase| Whether a cusomter purchases the tire
print(summary(data))
## Tire Wet Noise Buy_Again
## Length:68 Min. :4.300 Min. :3.600 Min. :1.400
## Class :character 1st Qu.:6.450 1st Qu.:6.000 1st Qu.:3.850
## Mode :character Median :7.750 Median :7.100 Median :6.150
## Mean :7.315 Mean :6.903 Mean :5.657
## 3rd Qu.:8.225 3rd Qu.:7.925 3rd Qu.:7.400
## Max. :9.200 Max. :8.900 Max. :8.900
## Purchase
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4412
## 3rd Qu.:1.0000
## Max. :1.0000
Interpretation: Higher wet handling and noise rating scores significantly increase the likelihood of tire purchase, with wet handling having the strongest impact.
logit_model <- glm(Purchase ~ Wet + Noise, data = data, family = binomial)
print(summary(logit_model))
##
## Call:
## glm(formula = Purchase ~ Wet + Noise, family = binomial, data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -39.4982 12.4779 -3.165 0.00155 **
## Wet 3.3745 1.2641 2.670 0.00760 **
## Noise 1.8163 0.8312 2.185 0.02887 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 93.325 on 67 degrees of freedom
## Residual deviance: 27.530 on 65 degrees of freedom
## AIC: 33.53
##
## Number of Fisher Scoring iterations: 8
Interpretation: All the predictors are significant with the target variable.
logit_model <- glm(Purchase ~ Wet + Noise, data = data, family = binomial)
print(summary(logit_model))
##
## Call:
## glm(formula = Purchase ~ Wet + Noise, family = binomial, data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -39.4982 12.4779 -3.165 0.00155 **
## Wet 3.3745 1.2641 2.670 0.00760 **
## Noise 1.8163 0.8312 2.185 0.02887 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 93.325 on 67 degrees of freedom
## Residual deviance: 27.530 on 65 degrees of freedom
## AIC: 33.53
##
## Number of Fisher Scoring iterations: 8
Interpretation: All the independent variables were significant (p value < 0.05)
# Fit a null model
null_model <- glm(Purchase ~ 1, data = data, family = binomial)
# Perform the likelihood ratio test
anova(null_model, logit_model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: Purchase ~ 1
## Model 2: Purchase ~ Wet + Noise
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 67 93.325
## 2 65 27.530 2 65.795 5.162e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The inclusion of Wet and Noise as predictors significantly improves the model's ability to predict Purchase compared to the null model, as indicated by the highly significant p-value (p < 0.001).
null_deviance <- logit_model$null.deviance
residual_deviance <- logit_model$deviance
mcfadden_r_squared <- 1 - (residual_deviance / null_deviance)
mcfadden_r_squared <- round(mcfadden_r_squared, 2)
print(paste("McFadden R-Squared:", mcfadden_r_squared))
## [1] "McFadden R-Squared: 0.71"
Interpretation: A McFadden R-squared of 0.71 means that our LR model explains approximately 71% of the variability in the likelihood of tire purchase, relative to a model with no predictors. This is considered a moderate to good fit, where values above 0.2 to 0.4 are often seen as indicative of a useful model.
### Area Under the Curve (AUC)
The Area Under the Curve (AUC) score represents the ability of the model to correctly classify customers who will purchase the tires and those who will not.
``` r
roc_curve <- roc(data$Purchase, fitted(logit_model)) # Create the ROC curve object
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve, main = "ROC Curve for Logistic Regression Model")
auc_value <- auc(roc_curve) # Calculate the AUC value
print(paste("AUC:", round(auc_value, 2)))
## [1] "AUC: 0.97"
Interpretation:An AUC score of 0.97 indicates that the logistic regression model has a high level of accuracy in predicting between customers who are likely to purchase the tire and those who are not.
new_data_1 <- data.frame(Wet = 8, Noise = 8)
prob_1 <- predict(logit_model, newdata = new_data_1, type = "response")
prob_1_percentage <- round(prob_1 * 100, 2)
print(paste("Probability (Wet=8, Noise=8):", prob_1_percentage))
## [1] "Probability (Wet=8, Noise=8): 88.37"
new_data_2 <- data.frame(Wet = 7, Noise = 7)
prob_2 <- predict(logit_model, newdata = new_data_2, type = "response")
prob_2_percentage <- round(prob_2 * 100, 2)
print(paste("Probability (Wet=7, Noise=7):", prob_2_percentage))
## [1] "Probability (Wet=7, Noise=7): 4.06"
Interpretation:
(1) There is an 88.37% chance that a customer will purchase the tire if it has a Wet performance rating of 8 and a Noise performance rating of 8.
(2) There is a 4.06% chance that a customer will purchase the tire if it has a Wet performance rating of 7 and a Noise performance rating of 7.