Alternatives to Logistic Regression

Logistic regression models the probability of class membership directly from a set of independent variables. Another strategy is to model the distribution of the predictors separately within each class and then use Bayes' theorem to turn these into the probability of each class given the predictors. In some cases, models built this way provide better predictive performance. In this section we discuss three such classifiers: linear discriminant analysis, quadratic discriminant analysis, and naive Bayes. R code for each method is provided below.

1. Linear Discriminant Analysis

The linear discriminant analysis (LDA) classifier results from assuming that the observations within each class come from a normal distribution with a class-specific mean and a variance \(\sigma^2\) that is common to all classes. With several predictors, as here, this becomes a multivariate normal distribution with a class-specific mean vector \(\mu_k\) and a common covariance matrix \(\Sigma\). The word linear stems from the fact that the resulting discriminant functions are linear functions of \(x\) rather than more complex functions. We again consider the Default credit dataset and use LDA to classify customers based on the predictors balance and income.
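Written out, the posterior probability of class \(k\) comes from Bayes' theorem with prior probabilities \(\pi_k\) and class densities \(f_k(x)\). With \(p\) predictors and the LDA assumption \(X \mid Y = k \sim N(\mu_k, \Sigma)\), assigning \(x\) to the class with the largest posterior is equivalent to choosing the largest discriminant score \(\delta_k(x)\) below, which depends on \(x\) only through a linear term:

```latex
\begin{aligned}
\Pr(Y = k \mid X = x) &= \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)} \\
\delta_k(x) &= x^{\top} \Sigma^{-1} \mu_k - \tfrac{1}{2}\, \mu_k^{\top} \Sigma^{-1} \mu_k + \log \pi_k
\end{aligned}
```

In practice \(\pi_k\), \(\mu_k\) and \(\Sigma\) are replaced by sample estimates; the class proportions and class means are the "Prior probabilities of groups" and "Group means" that lda() prints.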

library(ISLR2)
library(MASS)

names(Default)
## [1] "default" "student" "balance" "income"
head(Default)
##   default student   balance    income
## 1      No      No  729.5265 44361.625
## 2      No     Yes  817.1804 12106.135
## 3      No      No 1073.5492 31767.139
## 4      No      No  529.2506 35704.494
## 5      No      No  785.6559 38463.496
## 6      No     Yes  919.5885  7491.559
# Copy the Default data into a data frame named Credit
Credit <- Default

# Fit the LDA model with lda() from the MASS package
# lda() uses the same formula syntax as lm()
lda.fit <- lda(default ~ balance + income, data=Credit)
lda.fit
## Call:
## lda(default ~ balance + income, data = Credit)
## 
## Prior probabilities of groups:
##     No    Yes 
## 0.9667 0.0333 
## 
## Group means:
##       balance   income
## No   803.9438 33566.17
## Yes 1747.8217 32089.15
## 
## Coefficients of linear discriminants:
##                  LD1
## balance 2.230835e-03
## income  7.793355e-06
# The plot() function produces plots of the linear discriminants
plot(lda.fit)

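# The predict() function returns a list containing the predicted classes ($class),
# the posterior probabilities ($posterior), and the linear discriminant scores ($x)
lda.pred <- predict(lda.fit, Credit)
# Second column of the posterior matrix: estimated Pr(default = "Yes" | balance, income)
lda.prob <- lda.pred$posterior[,2]

# Instead of assigning each customer to the class with the larger posterior
# (a 0.5 cutoff for two classes), classify as a defaulter when the posterior
# probability of default exceeds 0.3
# The predictions can then be assessed using a confusion matrix
lda.class <- ifelse(lda.prob > 0.3, "Yes", "No")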
table(lda.class, Credit$default)
##          
## lda.class   No  Yes
##       No  9575  182
##       Yes   92  151
# Correct classification rate (97.26%)
mean(lda.class == Credit$default)
## [1] 0.9726
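To see where lda.pred$posterior comes from, the sketch below recomputes LDA posteriors directly from Bayes' theorem. It estimates the class proportions \(\hat\pi_k\), class means \(\hat\mu_k\), and the pooled within-class covariance \(\hat\Sigma\) (divisor \(n - K\), which we assume matches the default "moment" estimator in lda()), forms the scores \(\delta_k(x) = x^\top\hat\Sigma^{-1}\hat\mu_k - \tfrac12\hat\mu_k^\top\hat\Sigma^{-1}\hat\mu_k + \log\hat\pi_k\), and normalises them with a softmax. To stay self-contained it uses simulated two-class data rather than Default; the simulation settings and the helper name lda_posterior() are hypothetical.

```r
library(MASS)  # provides lda()

set.seed(1)
# Simulated two-class data (hypothetical; not the Default data)
n  <- 200
y  <- factor(rep(c("No", "Yes"), times = c(150, 50)))
x1 <- rnorm(n, mean = ifelse(y == "Yes", 2, 0))
x2 <- rnorm(n, mean = ifelse(y == "Yes", 1, 0))
dat <- data.frame(y, x1, x2)

fit <- lda(y ~ x1 + x2, data = dat)

# Hypothetical helper: LDA posteriors computed from plug-in estimates
lda_posterior <- function(X, y) {
  classes <- levels(y)
  prior <- as.numeric(table(y)) / length(y)
  # K x p matrix of class means
  means <- t(sapply(classes, function(k) colMeans(X[y == k, , drop = FALSE])))
  # Pooled within-class covariance with divisor n - K (assumed to match lda())
  centered <- X - means[as.integer(y), ]
  Sigma <- crossprod(centered) / (nrow(X) - length(classes))
  Sinv <- solve(Sigma)
  # n x K matrix of discriminant scores delta_k(x)
  scores <- sapply(seq_along(classes), function(k) {
    mu <- means[k, ]
    drop(X %*% Sinv %*% mu) - 0.5 * drop(t(mu) %*% Sinv %*% mu) + log(prior[k])
  })
  # Softmax over classes (subtracting the row maximum for numerical stability)
  post <- exp(scores - apply(scores, 1, max))
  post <- post / rowSums(post)
  colnames(post) <- classes
  post
}

X <- as.matrix(dat[, c("x1", "x2")])
manual <- lda_posterior(X, dat$y)
all.equal(unname(manual), unname(predict(fit, dat)$posterior), tolerance = 1e-6)
```

The last line compares the hand-computed posteriors with those from predict(); if the pooled-covariance assumption holds, it should return TRUE. The same helper could be applied to as.matrix(Credit[, c("balance", "income")]) and Credit$default.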

2. Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) provides an alternative approach to classifying the outcome variable. Like LDA, the QDA classifier results from assuming that the observations from each class are drawn from a Gaussian distribution. However, unlike LDA, QDA assumes that each class has its own covariance matrix; that is, it assumes that an observation \(X\) from the kth class follows \(N(\mu_k, \Sigma_k)\), where \(\Sigma_k\) is the covariance matrix for the kth class. Using QDA, we can also model the probability of default in the Default data.
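Plugging the class-specific distribution \(N(\mu_k, \Sigma_k)\) into Bayes' theorem and dropping terms shared by all classes gives the QDA discriminant score. Because \(\Sigma_k\) differs across classes, the term in \(x\) no longer cancels and the score is quadratic in \(x\):

```latex
\delta_k(x) = -\tfrac{1}{2}\,(x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) - \tfrac{1}{2} \log \lvert \Sigma_k \rvert + \log \pi_k
```

The price of this flexibility is the number of covariance parameters: with \(p\) predictors, LDA estimates \(p(p+1)/2\) of them, whereas QDA estimates \(Kp(p+1)/2\). For balance and income (\(p = 2\)) with \(K = 2\) classes, that is 3 versus 6.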

# The code mirrors the LDA example, with qda() in place of lda()
qda.fit <- qda(default ~ balance + income, data=Credit)
qda.fit
## Call:
## qda(default ~ balance + income, data = Credit)
## 
## Prior probabilities of groups:
##     No    Yes 
## 0.9667 0.0333 
## 
## Group means:
##       balance   income
## No   803.9438 33566.17
## Yes 1747.8217 32089.15
qda.pred <- predict(qda.fit, Credit)
qda.prob <- qda.pred$posterior[,2]
qda.class <- ifelse(qda.prob > 0.3, "Yes", "No")
table(qda.class, Credit$default)
##          
## qda.class   No  Yes
##       No  9509  161
##       Yes  158  172
mean(qda.class == Credit$default)
## [1] 0.9681
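The two confusion matrices can be compared on more than overall accuracy. The sketch below copies the counts printed above into base-R matrices (rows are the predicted class, columns the true class, threshold 0.3) and computes accuracy, sensitivity (share of actual defaulters flagged) and specificity (share of non-defaulters correctly cleared). The helper name class_rates() is hypothetical.

```r
# Counts copied from the LDA and QDA confusion matrices above (filled column by column)
lda.tab <- matrix(c(9575, 92, 182, 151), nrow = 2,
                  dimnames = list(predicted = c("No", "Yes"), actual = c("No", "Yes")))
qda.tab <- matrix(c(9509, 158, 161, 172), nrow = 2,
                  dimnames = list(predicted = c("No", "Yes"), actual = c("No", "Yes")))

# Hypothetical helper: summary rates from a 2 x 2 table with predicted classes in rows
class_rates <- function(tab) {
  c(accuracy    = sum(diag(tab)) / sum(tab),
    sensitivity = tab["Yes", "Yes"] / sum(tab[, "Yes"]),  # defaulters flagged
    specificity = tab["No", "No"] / sum(tab[, "No"]))      # non-defaulters cleared
}

round(rbind(LDA = class_rates(lda.tab), QDA = class_rates(qda.tab)), 4)
```

On these counts QDA flags more of the 333 actual defaulters than LDA (172 versus 151) but also raises more false alarms (158 versus 92), which is why its overall accuracy is slightly lower. The helper also accepts table(lda.class, Credit$default) directly, since its rows are the predicted class.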

3. Naive Bayes

The naive Bayes classifier takes a different approach to estimating the class densities \(f_1(x),\dots,f_K(x)\), where \(f_k(x)\) is the density of the predictors \(X\) for an observation in the kth class. Instead of assuming that these functions belong to a particular family of distributions, we make a single assumption: within the kth class, the p predictors are independent. The naiveBayes() function is provided by the e1071 package.
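Written out, the independence assumption factors each class density into a product of one-dimensional densities \(f_{kj}\), one per predictor, which are then combined with the priors \(\pi_k\) through Bayes' theorem:

```latex
\begin{aligned}
f_k(x) &= f_{k1}(x_1) \times f_{k2}(x_2) \times \cdots \times f_{kp}(x_p) \\
\Pr(Y = k \mid X = x) &= \frac{\pi_k \, f_{k1}(x_1) \cdots f_{kp}(x_p)}{\sum_{l=1}^{K} \pi_l \, f_{l1}(x_1) \cdots f_{lp}(x_p)}
\end{aligned}
```

Each \(f_{kj}\) involves a single predictor, so it can be estimated from the observations in class \(k\) on predictor \(j\) alone; here \(p = 2\) (balance and income) and \(K = 2\).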

library(e1071)
nb.fit <- naiveBayes(default ~ balance + income, data=Credit)
nb.fit
## 
## Naive Bayes Classifier for Discrete Predictors
## 
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## 
## A-priori probabilities:
## Y
##     No    Yes 
## 0.9667 0.0333 
## 
## Conditional probabilities:
##      balance
## Y          [,1]     [,2]
##   No   803.9438 456.4762
##   Yes 1747.8217 341.2668
## 
##      income
## Y         [,1]     [,2]
##   No  33566.17 13318.25
##   Yes 32089.15 13804.22
# predict() needs type = "raw" to return class probabilities rather than class labels
nb.pred <- predict(nb.fit, Credit, type="raw")
nb.prob <- nb.pred[,2]
nb.class <- ifelse(nb.prob > 0.3, "Yes", "No")
table(nb.class, Credit$default)
##         
## nb.class   No  Yes
##      No  9493  162
##      Yes  174  171
mean(nb.class == Credit$default)
## [1] 0.9664
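The printed nb.fit output is enough to compute a naive Bayes posterior by hand. The sketch below assumes, as the [,1] and [,2] columns suggest, that e1071 models each numeric predictor within each class as a normal density with that class mean and standard deviation. It multiplies the prior by the two marginal densities (adding them on the log scale) and normalises. The customer values and the helper name nb_posterior() are hypothetical, and because the printed estimates are rounded, results would only approximately match predict(nb.fit, ..., type = "raw").

```r
# Class priors and per-class normal parameters copied from the printed nb.fit output
prior <- c(No = 0.9667, Yes = 0.0333)
balance.par <- rbind(No  = c(mean = 803.9438,  sd = 456.4762),
                     Yes = c(mean = 1747.8217, sd = 341.2668))
income.par  <- rbind(No  = c(mean = 33566.17, sd = 13318.25),
                     Yes = c(mean = 32089.15, sd = 13804.22))

# Hypothetical helper: Pr(default = k | balance, income) for ONE customer
nb_posterior <- function(balance, income) {
  # log prior + log density of each predictor (independence within class)
  log.score <- log(prior) +
    dnorm(balance, balance.par[, "mean"], balance.par[, "sd"], log = TRUE) +
    dnorm(income,  income.par[, "mean"],  income.par[, "sd"],  log = TRUE)
  # Normalise on the log scale so the two posteriors sum to 1
  p <- exp(log.score - max(log.score))
  p / sum(p)
}

# A hypothetical customer (not a row of the Default data)
nb_posterior(balance = 2000, income = 40000)
```

The helper handles one customer at a time: passing vectors of balances would recycle them against the two class parameters incorrectly.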