We will do a simple logistic regression problem.
The store manager wants to find out the loyalty of his customer. For this he collects data regarding customers’
* Brand preference = on a scale 1-7
* Shopping attitude = on a scale 1-7
* Product preference = on a scale 1-7
So we will load data into R.
head(loyalty)
## Loyalty brand product shopping
## 1 1 4 3 5
## 2 1 6 4 4
## 3 1 5 2 4
## 4 1 7 5 5
## 5 1 6 3 4
## 6 1 3 4 5
dim(loyalty)
## [1] 30 4
str(loyalty)
## 'data.frame': 30 obs. of 4 variables:
## $ Loyalty : int 1 1 1 1 1 1 1 1 1 1 ...
## $ brand : int 4 6 5 7 6 3 5 5 7 7 ...
## $ product : int 3 4 2 5 3 4 5 4 5 6 ...
## $ shopping: int 5 4 4 5 4 5 5 2 4 4 ...
# Since our response variable is numeric, we will go ahead and change that to factor
loyalty$Loyalty <- as.factor(loyalty$Loyalty)
str(loyalty)
## 'data.frame': 30 obs. of 4 variables:
## $ Loyalty : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## $ brand : int 4 6 5 7 6 3 5 5 7 7 ...
## $ product : int 3 4 2 5 3 4 5 4 5 6 ...
## $ shopping: int 5 4 4 5 4 5 5 2 4 4 ...
We build the logistic regression model on the data
loyalty.logreg <- glm(Loyalty ~ brand + product + shopping, family = "binomial", data = loyalty)
summary(loyalty.logreg)
##
## Call:
## glm(formula = Loyalty ~ brand + product + shopping, family = "binomial",
## data = loyalty)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5075 -0.5452 0.0459 0.5840 1.6774
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.642 3.346 -2.58 0.0098 **
## brand 1.274 0.479 2.66 0.0078 **
## product 0.186 0.322 0.58 0.5629
## shopping 0.590 0.491 1.20 0.2297
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41.589 on 29 degrees of freedom
## Residual deviance: 23.471 on 26 degrees of freedom
## AIC: 31.47
##
## Number of Fisher Scoring iterations: 5
To check how the model performed, we will do model validation.
Classification table
loyalty.probs <- predict(loyalty.logreg, type = "response")
loyalty.probs[1:5]
## 1 2 3 4 5
## 0.4905 0.8915 0.6129 0.9846 0.8721
loyalty.pred <- ifelse(loyalty.probs > .5, "1", "0")
table(loyalty.pred, loyalty$Loyalty)
##
## loyalty.pred 0 1
## 0 12 3
## 1 3 12
mean(loyalty.pred == loyalty$Loyalty)
## [1] 0.8
ROC Curve
require(ROCR)
loyalty.roc <- prediction(loyalty.probs, loyalty$Loyalty)
plot(performance(loyalty.roc, "tpr", "fpr"), col = "red", main = "ROC Curve")
abline(0,1, lty = 8, col = "grey")
Kolmogorov-Smirnov test
ks <- performance(loyalty.roc, "tpr", "fpr")
loyalty.ks <- max(attr(ks, "y.values")[[1]] - (attr(ks, "x.values")[[1]]))
loyalty.ks
## [1] 0.7333
Since we see only one variable is significant, we’ll run another model with this significant variable only.
loyalty.logreg2 <- glm(Loyalty ~ brand, family = "binomial", data = loyalty)
summary(loyalty.logreg2)
##
## Call:
## glm(formula = Loyalty ~ brand, family = "binomial", data = loyalty)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0147 -0.4665 0.0767 0.5308 2.1317
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.21 2.21 -2.81 0.005 **
## brand 1.35 0.47 2.87 0.004 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41.589 on 29 degrees of freedom
## Residual deviance: 25.100 on 28 degrees of freedom
## AIC: 29.1
##
## Number of Fisher Scoring iterations: 5
We will do model validation for this model too.
Classification table
loyalty.probs2 <- predict(loyalty.logreg2, type = "response")
loyalty.probs2[1:5]
## 1 2 3 4 5
## 0.3073 0.8686 0.6313 0.9623 0.8686
loyalty.pred2 <- ifelse(loyalty.probs > 0.5, "1", "0")
table(loyalty.pred2, loyalty$Loyalty)
##
## loyalty.pred2 0 1
## 0 12 3
## 1 3 12
mean(loyalty.pred2 == loyalty$Loyalty)
## [1] 0.8
ROC Curve
loyalty.roc2 <- prediction(loyalty.probs2, loyalty$Loyalty)
plot(performance(loyalty.roc2, "tpr", "fpr"), col = "red", main = "ROC Curve")
abline(0,1, lty = 2, col = "grey")
Kolmogorov-Smirnov test
ks2 <- performance(loyalty.roc2, "tpr", "fpr")
loyalty.ks2 <- max(attr(ks2, "y.values")[[1]] - (attr(ks2, "x.values")[[1]]))
loyalty.ks2
## [1] 0.6667
For every unit change in brand preference of the customer, the loyalty logit ratio increases by 1.35. Based on this model, if there is a new customer with a brand preference of 4, then what is the probability of him being loyal to the brand.
# Brand value from the equation is multiplied by 4
1.35*4
## [1] 5.4
# Result is added with the constant
(-6.21) + 5.4
## [1] -0.81
# We take the exponent of the outcome value
exp(0.81)
## [1] 2.248
# Probability
2.248/(1 + 2.248)
## [1] 0.6921
So a new customer with a brand preference of 4 rating can be said to have a loyalty probability of 0.69