library(MASS)
library(stats)
library(dplyr)
library(caret)
LDA gives analysts a way to take high-dimensional data and project it onto a small number of axes that allow for efficient separation of the groups. Assume we have two groups measured on two variables, X1 and X2, plotted on a two-dimensional graph. LDA projects these points onto a one-dimensional line chosen to maximize the separation of the two groups. This is done by maximizing the distance between the two group means while minimizing the variation within each group; the projection that maximizes the ratio of these two quantities provides the best separation possible. \(^1\)
Ratio of the squared distance between the group means to the sum of the within-group variances:
\(\frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}\)
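As a rough numerical illustration of this ratio (my own sketch, not taken from the cited video), the code below simulates two groups and evaluates the criterion for one arbitrary projection direction `w`; LDA itself searches for the direction that makes this ratio as large as possible.
set.seed(42)
# two simulated groups, each measured on two variables
class_a <- cbind(rnorm(50, mean = 0), rnorm(50, mean = 0))
class_b <- cbind(rnorm(50, mean = 3), rnorm(50, mean = 1))
# project each group onto an arbitrary, illustrative direction w
w <- c(1, 0.5)
proj_a <- as.vector(class_a %*% w)
proj_b <- as.vector(class_b %*% w)
# squared distance between projected means over the summed within-group variances
(mean(proj_a) - mean(proj_b))^2 / (var(proj_a) + var(proj_b))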
Example using the iris dataset from R. Here we wish to predict the species of a flower from its sepal and petal measurements \(^7\):
set.seed(1)
data("iris")
head(iris,2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
# split the data: 80% of the flowers for training, 20% held out for testing
training_tree <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train_data <- iris[training_tree,]
test_data <- iris[-training_tree,]
# estimate centering and scaling parameters from the training set,
# then apply the same transformation to both sets
preprocess_param <- train_data %>%
  preProcess(method = c("center", "scale"))
train_transform <- preprocess_param %>% predict(train_data)
test_transform <- preprocess_param %>% predict(test_data)
# fit the LDA model and predict the species of the held-out flowers
model <- lda(Species~., data = train_transform)
predictions <- model %>% predict(test_transform)
model
## Call:
## lda(Species ~ ., data = train_transform)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa -1.0403185 0.8430286 -1.3160834 -1.257539
## versicolor 0.2031872 -0.5979621 0.3285433 0.183870
## virginica 0.8371313 -0.2450664 0.9875400 1.073669
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 0.6603021 -0.03112867
## Sepal.Width 0.6032864 0.77207797
## Petal.Length -4.0979045 -2.32179601
## Petal.Width -1.9769248 2.81185685
##
## Proportion of trace:
## LD1 LD2
## 0.9916 0.0084
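As a quick sanity check (not part of the original output), we can see how often the model's predicted class matches the true species in the held-out test set; the exact value will vary with the random partition.
# proportion of test-set flowers classified correctly
mean(predictions$class == test_transform$Species)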
Plotting the Results
library(ggplot2)
library(MASS)
library(mvtnorm)
# attach the discriminant scores (LD1, LD2) for the training data
lda_plot <- cbind(train_data, predict(model)$x)
ggplot(lda_plot, aes(LD1, LD2)) +
  geom_point(aes(color = Species))
Logistic regression is used for binary predictions, such as true or false, happy or sad, etc. Compared to linear regression, logistic regression uses an S-shaped curve to model the probability of an outcome given the variables. The curve is fitted using the “maximum likelihood” method: for a candidate curve we compute the likelihood it assigns to the observed data points, shift and reshape the curve, and keep the curve with the maximum likelihood. The fitted model can then be used to classify samples based on the probabilities it assigns to them.\(^{2,3}\)
\(P(X)=P(Y=1|X)= \frac{e^{\beta_0+\beta_1*X}}{1+e^{\beta_0+\beta_1*X}}\)
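As a brief sketch of this idea (my own example, not from the cited sources), a logistic regression can be fit in R with `glm()` and a binomial family; here the response is restricted to two iris species so that it is binary.
# keep two species so the outcome is binary (versicolor vs. virginica)
iris_binary <- subset(iris, Species != "setosa")
iris_binary$Species <- droplevels(iris_binary$Species)
# the S-shaped curve is fit by maximum likelihood
logit_model <- glm(Species ~ Petal.Length + Petal.Width,
                   data = iris_binary, family = binomial)
# predicted probability that each flower is virginica
head(predict(logit_model, type = "response"))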
While both methods are used for classification, the ultimate difference between the two is that logistic regression models the categorical response directly, without assuming anything about the distribution of the predictors, while linear discriminant analysis models the continuous predictors as approximately normal within each class and extends more naturally to more than two classes.
In addition to linear discriminant analysis, there is also quadratic discriminant analysis (QDA); the main difference is that QDA assumes each group has its own covariance matrix, whereas LDA assumes all groups share the same covariance matrix. Bartlett’s test can be used to check this assumption. \(^4\)
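For comparison, QDA is fit with the same interface as LDA in the MASS package; this minimal sketch simply reuses the transformed training data from the earlier example.
# QDA estimates a separate covariance matrix for each species
qda_model <- qda(Species ~ ., data = train_transform)
qda_model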
Bartlett’s test is used to examine whether \(k\) samples drawn from populations have equal variances, also known as “homogeneity of variances”. It is defined as:
\(H_0: \sigma_1^2 = \sigma_2^2 =...= \sigma_k^2\)
\(H_A: \sigma_i^2 \neq \sigma_j^2\) for at least one pair \(i \neq j\)
\(T = \frac{(N-k)\ln(s_p^2) - \sum^k_{i=1}(N_i - 1)\ln(s^2_i)}{1 + \frac{1}{3(k-1)}\left(\sum^k_{i=1}\frac{1}{N_i - 1} - \frac{1}{N-k}\right)}\)
where…
\(s_i^2\) is the variance of the \(i^{th}\) group
\(N\) is the total sample size
\(N_i\) is the sample size of the \(i^{th}\) group
\(k\) is the number of groups
and \(s_p^2\) is the pooled variance, which is defined as…
\(s_p^2 = \frac{\sum_{i=1}^k (N_i - 1)\,s_i^2}{N-k}\)
data <- Orange
head(data, 3)
## Grouped Data: circumference ~ age | Tree
##   Tree age circumference
## 1    1 118            30
## 2    1 484            58
## 3    1 664            87
# test whether the variance of circumference is the same for every tree
bartlett.test(circumference ~ Tree, data = data)
##
## Bartlett test of homogeneity of variances
##
## data: circumference by Tree
## Bartlett's K-squared = 2.4607, df = 4, p-value = 0.6517
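To connect this output back to the formula above, the following sketch (written for this example rather than taken from the references) computes the statistic by hand for the Orange data; it should reproduce the reported K-squared value up to rounding.
groups <- split(data$circumference, data$Tree)  # circumferences for each tree
Ni  <- sapply(groups, length)                   # group sample sizes
si2 <- sapply(groups, var)                      # group variances
N <- sum(Ni); k <- length(groups)
sp2 <- sum((Ni - 1) * si2) / (N - k)            # pooled variance
# Bartlett's test statistic T from the formula above
((N - k) * log(sp2) - sum((Ni - 1) * log(si2))) /
  (1 + (sum(1 / (Ni - 1)) - 1 / (N - k)) / (3 * (k - 1)))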
# difference between the p-value and the 0.05 significance level
margin = 0.6517 - 0.05
margin
## [1] 0.6017
Since the p-value (0.6517) is well above 0.05, we fail to reject the null hypothesis of equal variances; the common-covariance assumption behind LDA therefore appears reasonable for this data, and QDA is not needed.
\(^1\) https://www.youtube.com/watch?time_continue=480&v=azXCzI57Yfc&feature=emb_title
\(^2\) https://www.youtube.com/watch?v=yIYKR4sgzI8
\(^3\) Nwanganga, F. (2020). Chapter 5: Logistic Regression. In M. Chapple (Ed.), Practical Machine Learning in R (p. 172). essay, Wiley.
\(^4\) https://online.stat.psu.edu/stat505/book/export/html/705
\(^5\) https://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm
\(^6\) https://www.statology.org/bartletts-test-in-r/#:~:text=Bartlett%E2%80%99s%20test%20is%20a%20statistical%20test%20that%20is,test%20can%20be%20used%20to%20verify%20that%20assumption
\(^7\) https://www.geeksforgeeks.org/linear-discriminant-analysis-in-r-programming/#:~:text=LDA%20or%20Linear%20Discriminant%20Analysis%20can%20be%20computed,acquires%20the%20highest%20probability%20score%20in%20that%20group.