Project Objective

To investigate the relationship between tire rating and whether it will be purchased again or not.

Part 1: Develop the Model & Assess Predictor Significance

Step 1: Install and load required libraries

# Uncomment the lines below to install the required packages if needed.
# install.packages("readxl")
# install.packages("Hmisc")
# install.packages("pscl")
# if(!require(pROC)) install.packages("pROC")

library(readxl)
library(Hmisc)
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(pscl)
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

###Step 2: Import and clean the data

Ratings_df <- read_excel("/Users/jb/Documents/R Files/Class Exercise 15_TireRatings.xlsx")
rate_df <- subset(Ratings_df, select = -c(Tire))

###Step 3: Summarize the data

head(rate_df)
## # A tibble: 6 × 4
##     Wet Noise Buy_Again Purchase
##   <dbl> <dbl>     <dbl>    <dbl>
## 1   8     7.2       6.1        0
## 2   8     7.2       6.6        1
## 3   7.6   7.5       6.9        1
## 4   6.6   5.4       6.6        0
## 5   5.8   6.3       4          0
## 6   6.3   5.7       4.5        0
summary(rate_df)
##       Wet            Noise         Buy_Again        Purchase     
##  Min.   :4.300   Min.   :3.600   Min.   :1.400   Min.   :0.0000  
##  1st Qu.:6.450   1st Qu.:6.000   1st Qu.:3.850   1st Qu.:0.0000  
##  Median :7.750   Median :7.100   Median :6.150   Median :0.0000  
##  Mean   :7.315   Mean   :6.903   Mean   :5.657   Mean   :0.4412  
##  3rd Qu.:8.225   3rd Qu.:7.925   3rd Qu.:7.400   3rd Qu.:1.0000  
##  Max.   :9.200   Max.   :8.900   Max.   :8.900   Max.   :1.0000

Data Description: A description of some of the features is presented in the table below.

Variable Definition Wet Represents the average ratings for each tire’s wet traction performance Noise Represents the average ratings for the noise level generated by each tire Buy Again Represents the average buy-again responses from respondents Purchase Purchase = 1, the respondent would probably or definitely buy the tire again ###Step 4: Feature Selection (i.e., correlation analysis)

corr <- rcorr(as.matrix(rate_df))
corr
##            Wet Noise Buy_Again Purchase
## Wet       1.00  0.76      0.91     0.74
## Noise     0.76  1.00      0.83     0.72
## Buy_Again 0.91  0.83      1.00     0.83
## Purchase  0.74  0.72      0.83     1.00
## 
## n= 68 
## 
## 
## P
##           Wet Noise Buy_Again Purchase
## Wet            0     0         0      
## Noise      0         0         0      
## Buy_Again  0   0               0      
## Purchase   0   0     0

###Step 5: Build the logistic regression

model <- glm(Purchase ~ Wet + Noise, data = rate_df, family = binomial)
summary(model)
## 
## Call:
## glm(formula = Purchase ~ Wet + Noise, family = binomial, data = rate_df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -39.4982    12.4779  -3.165  0.00155 **
## Wet           3.3745     1.2641   2.670  0.00760 **
## Noise         1.8163     0.8312   2.185  0.02887 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 93.325  on 67  degrees of freedom
## Residual deviance: 27.530  on 65  degrees of freedom
## AIC: 33.53
## 
## Number of Fisher Scoring iterations: 8

#Part 2: Overall Model Significance ###Likelihood Ratio Test

null_model <- glm(Purchase ~ 1, data = rate_df, family = binomial)
anova(null_model, model, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: Purchase ~ 1
## Model 2: Purchase ~ Wet + Noise
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        67     93.325                          
## 2        65     27.530  2   65.795 5.162e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

###Pseudo-R-squared

pR2(model)
## fitting null model for pseudo-r2
##         llh     llhNull          G2    McFadden        r2ML        r2CU 
## -13.7649516 -46.6623284  65.7947536   0.7050093   0.6199946   0.8305269

###Area Under the Curve (AUC)

roc_curve <- roc(rate_df$Purchase, fitted(model))
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc_curve)

auc(roc_curve)
## Area under the curve: 0.9741

###Part 3: Predicting with new information

new_data1 <- data.frame(Wet = 7.0, Noise = 7.0)
new_data2 <- data.frame(Wet = 8.0, Noise = 8.0)

prob1 <- predict(model, newdata = new_data1, type = "response")
prob1 * 100
##        1 
## 4.058753
prob2 <- predict(model, newdata = new_data2, type = "response")
prob2 * 100
##        1 
## 88.36964