Tire-Rack.knit

Project Objective:

Analyze Tire Rack's consumer survey data to identify factors that influence customer satisfaction and the likelihood of repurchasing

Step 1: Install and load required libraries

library(readxl)

## Warning: package 'readxl' was built under R version 4.3.3

library(Hmisc)

## Warning: package 'Hmisc' was built under R version 4.3.3

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:base':
## 
##     format.pval, units

library(pscl)

## Warning: package 'pscl' was built under R version 4.3.3

## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.

library(pROC)

## Warning: package 'pROC' was built under R version 4.3.3

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

Step 2 & 3: Explore the dataset

tirerack = read_excel(file.choose())
head(tirerack)

## # A tibble: 6 × 5
##   Tire                                     Wet Noise Buy_Again Purchase
##   <chr>                                  <dbl> <dbl>     <dbl>    <dbl>
## 1 BFGoodrich g-Force Super Sport A/S       8     7.2       6.1        0
## 2 BFGoodrich g-Force Super Sport A/S H&V   8     7.2       6.6        1
## 3 BFGoodrich g-Force T/A KDWS              7.6   7.5       6.9        1
## 4 Bridgestone B381                         6.6   5.4       6.6        0
## 5 Bridgestone Insignia SE200               5.8   6.3       4          0
## 6 Bridgestone Insignia SE200-02            6.3   5.7       4.5        0

summary(tirerack)

##      Tire                Wet            Noise         Buy_Again    
##  Length:68          Min.   :4.300   Min.   :3.600   Min.   :1.400  
##  Class :character   1st Qu.:6.450   1st Qu.:6.000   1st Qu.:3.850  
##  Mode  :character   Median :7.750   Median :7.100   Median :6.150  
##                     Mean   :7.315   Mean   :6.903   Mean   :5.657  
##                     3rd Qu.:8.225   3rd Qu.:7.925   3rd Qu.:7.400  
##                     Max.   :9.200   Max.   :8.900   Max.   :8.900  
##     Purchase     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4412  
##  3rd Qu.:1.0000  
##  Max.   :1.0000

tirerackr = subset(tirerack, select = -c(Tire)) # removing the Tire names since it is not needed and naming the new data tirerackr

# purchase is our dependent variable
# Interpretation: The "tirerack" data contains information about 68 all-season tires with variables measuring "Wet" traction, "Noise" level, and "Buy Again" intention on a 10-point scale, along with a binary "Purchase" variable indicating whether customers would probably or definitely buy the tire again.In the first six rows, the "Wet" traction ranges from 5.8 to 8.0, "Noise" level from 5.4 to 7.5, and "Buy Again" from 4.0 to 6.9, with three out of six indicating a high likelihood of repurchase. Summary statistics show that the median for "Wet" is 7.75, while the mean for "Buy Again" is 5.657.

Step 4: Feature selection (i.e., Correlation Analysis)

corr = rcorr(as.matrix(tirerackr))
corr

##            Wet Noise Buy_Again Purchase
## Wet       1.00  0.76      0.91     0.74
## Noise     0.76  1.00      0.83     0.72
## Buy_Again 0.91  0.83      1.00     0.83
## Purchase  0.74  0.72      0.83     1.00
## 
## n= 68 
## 
## 
## P
##           Wet Noise Buy_Again Purchase
## Wet            0     0         0      
## Noise      0         0         0      
## Buy_Again  0   0               0      
## Purchase   0   0     0

# Interpretation: The correlation matrix shows that "Wet" and "Buy_Again" have the strongest relationship, with a correlation of 0.91, while the weakest correlation is between "Noise" and "Purchase," at 0.72. All predictors have some degree of correlation with "Purchase," indicating that they are useful for predicting the likelihood of repurchase.

Step 5: Build logistic regression model and assess predictor significance

model = glm(Purchase ~ Wet + Noise, data = tirerackr, family = binomial) # binomial because return varibale is binary
summary(model)

## 
## Call:
## glm(formula = Purchase ~ Wet + Noise, family = binomial, data = tirerackr)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -39.4982    12.4779  -3.165  0.00155 **
## Wet           3.3745     1.2641   2.670  0.00760 **
## Noise         1.8163     0.8312   2.185  0.02887 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 93.325  on 67  degrees of freedom
## Residual deviance: 27.530  on 65  degrees of freedom
## AIC: 33.53
## 
## Number of Fisher Scoring iterations: 8

#The output from the logistic regression model indicates that both "Wet" and "Noise" are statistically significant predictors of the "Purchase" outcome. The estimated coefficients for "Wet" and "Noise" are 3.37 and 1.82 respectively, suggesting that higher ratings in these variables increase the likelihood of purchase.The significant codes highlight the importance of each variable, with p-values below 0.05, indicating that both "Wet" and "Noise" have a significant impact on the dependent variable, "Purchase." The residual deviance of 27.530 and an AIC of 33.53 suggest that the model fits the data well.

Step 6: Likeli hood ratio test

null_model = glm(Purchase ~ 1, data=tirerackr, family = binomial)
null_model

## 
## Call:  glm(formula = Purchase ~ 1, family = binomial, data = tirerackr)
## 
## Coefficients:
## (Intercept)  
##     -0.2364  
## 
## Degrees of Freedom: 67 Total (i.e. Null);  67 Residual
## Null Deviance:       93.32 
## Residual Deviance: 93.32     AIC: 95.32

anova(null_model, model, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: Purchase ~ 1
## Model 2: Purchase ~ Wet + Noise
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1        67     93.325                          
## 2        65     27.530  2   65.795 5.162e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

pseudo r square

pR2(model)

## fitting null model for pseudo-r2

##         llh     llhNull          G2    McFadden        r2ML        r2CU 
## -13.7649516 -46.6623284  65.7947536   0.7050093   0.6199946   0.8305269

#The McFadden's pseudo-R² is 0.7050093, indicating that the model explains about 70.5% of the variability relative to the null model.

Area Under the Curve (AUC)

roc = roc(tirerackr$Purchase, fitted(model))

## Setting levels: control = 0, case = 1

## Setting direction: controls < cases

plot(roc)

auc(roc)

## Area under the curve: 0.9741

#An AUC (Area Under the Curve) score of 0.9741 indicates a high level of accuracy for the logistic regression (LR) model in predicting buying behavior based on "Wet" and "Noise" ratings.

Project Objective:

Step 1: Install and load required libraries

Step 2 & 3: Explore the dataset

Step 4: Feature selection (i.e., Correlation Analysis)

Step 5: Build logistic regression model and assess predictor significance

Step 6: Likeli hood ratio test

pseudo r square

Area Under the Curve (AUC)

Predicting with New Information