#**Predicting Tire Rack’s Customer Loyality
Analyze Tire Rack's consumer survey data to identify factors that influence customer satisfaction and the likelihood of repurchasing
library(readxl)
## Warning: package 'readxl' was built under R version 4.3.3
library(Hmisc)
## Warning: package 'Hmisc' was built under R version 4.3.3
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
library(pscl)
## Warning: package 'pscl' was built under R version 4.3.3
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
library(pROC)
## Warning: package 'pROC' was built under R version 4.3.3
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
tirerack = read_excel(file.choose())
head(tirerack)
## # A tibble: 6 × 5
## Tire Wet Noise Buy_Again Purchase
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 BFGoodrich g-Force Super Sport A/S 8 7.2 6.1 0
## 2 BFGoodrich g-Force Super Sport A/S H&V 8 7.2 6.6 1
## 3 BFGoodrich g-Force T/A KDWS 7.6 7.5 6.9 1
## 4 Bridgestone B381 6.6 5.4 6.6 0
## 5 Bridgestone Insignia SE200 5.8 6.3 4 0
## 6 Bridgestone Insignia SE200-02 6.3 5.7 4.5 0
summary(tirerack)
## Tire Wet Noise Buy_Again
## Length:68 Min. :4.300 Min. :3.600 Min. :1.400
## Class :character 1st Qu.:6.450 1st Qu.:6.000 1st Qu.:3.850
## Mode :character Median :7.750 Median :7.100 Median :6.150
## Mean :7.315 Mean :6.903 Mean :5.657
## 3rd Qu.:8.225 3rd Qu.:7.925 3rd Qu.:7.400
## Max. :9.200 Max. :8.900 Max. :8.900
## Purchase
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4412
## 3rd Qu.:1.0000
## Max. :1.0000
tirerackr = subset(tirerack, select = -c(Tire)) # removing the Tire names since it is not needed and naming the new data tirerackr
# purchase is our dependent variable
# Interpretation: The "tirerack" data contains information about 68 all-season tires with variables measuring "Wet" traction, "Noise" level, and "Buy Again" intention on a 10-point scale, along with a binary "Purchase" variable indicating whether customers would probably or definitely buy the tire again.In the first six rows, the "Wet" traction ranges from 5.8 to 8.0, "Noise" level from 5.4 to 7.5, and "Buy Again" from 4.0 to 6.9, with three out of six indicating a high likelihood of repurchase. Summary statistics show that the median for "Wet" is 7.75, while the mean for "Buy Again" is 5.657.
corr = rcorr(as.matrix(tirerackr))
corr
## Wet Noise Buy_Again Purchase
## Wet 1.00 0.76 0.91 0.74
## Noise 0.76 1.00 0.83 0.72
## Buy_Again 0.91 0.83 1.00 0.83
## Purchase 0.74 0.72 0.83 1.00
##
## n= 68
##
##
## P
## Wet Noise Buy_Again Purchase
## Wet 0 0 0
## Noise 0 0 0
## Buy_Again 0 0 0
## Purchase 0 0 0
# Interpretation: The correlation matrix shows that "Wet" and "Buy_Again" have the strongest relationship, with a correlation of 0.91, while the weakest correlation is between "Noise" and "Purchase," at 0.72. All predictors have some degree of correlation with "Purchase," indicating that they are useful for predicting the likelihood of repurchase.
model = glm(Purchase ~ Wet + Noise, data = tirerackr, family = binomial) # binomial because return varibale is binary
summary(model)
##
## Call:
## glm(formula = Purchase ~ Wet + Noise, family = binomial, data = tirerackr)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -39.4982 12.4779 -3.165 0.00155 **
## Wet 3.3745 1.2641 2.670 0.00760 **
## Noise 1.8163 0.8312 2.185 0.02887 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 93.325 on 67 degrees of freedom
## Residual deviance: 27.530 on 65 degrees of freedom
## AIC: 33.53
##
## Number of Fisher Scoring iterations: 8
#The output from the logistic regression model indicates that both "Wet" and "Noise" are statistically significant predictors of the "Purchase" outcome. The estimated coefficients for "Wet" and "Noise" are 3.37 and 1.82 respectively, suggesting that higher ratings in these variables increase the likelihood of purchase.The significant codes highlight the importance of each variable, with p-values below 0.05, indicating that both "Wet" and "Noise" have a significant impact on the dependent variable, "Purchase." The residual deviance of 27.530 and an AIC of 33.53 suggest that the model fits the data well.
null_model = glm(Purchase ~ 1, data=tirerackr, family = binomial)
null_model
##
## Call: glm(formula = Purchase ~ 1, family = binomial, data = tirerackr)
##
## Coefficients:
## (Intercept)
## -0.2364
##
## Degrees of Freedom: 67 Total (i.e. Null); 67 Residual
## Null Deviance: 93.32
## Residual Deviance: 93.32 AIC: 95.32
anova(null_model, model, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: Purchase ~ 1
## Model 2: Purchase ~ Wet + Noise
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 67 93.325
## 2 65 27.530 2 65.795 5.162e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pR2(model)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML r2CU
## -13.7649516 -46.6623284 65.7947536 0.7050093 0.6199946 0.8305269
#The McFadden's pseudo-R² is 0.7050093, indicating that the model explains about 70.5% of the variability relative to the null model.
roc = roc(tirerackr$Purchase, fitted(model))
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
plot(roc)
auc(roc)
## Area under the curve: 0.9741
#An AUC (Area Under the Curve) score of 0.9741 indicates a high level of accuracy for the logistic regression (LR) model in predicting buying behavior based on "Wet" and "Noise" ratings.
willbuy = data.frame(Wet = 7, Noise = 7, BuyAgain= 1) # predicting with new dataset
prob = predict (model, newdata = willbuy, type = "response")
willbuy #4.06%probability of buying again with the given scores
## Wet Noise BuyAgain
## 1 7 7 1
willbuy2 = data.frame(Wet = 8, Noise = 8, BuyAgain= 1) # did attended orientation
prob2 = predict (model, newdata = willbuy2, type = "response")
prob2 # 88.37%probability of buying again with the given scores
## 1
## 0.8836964