library(MASS)
library(stats)
library(dplyr)
library(caret)
LDA gives analysts a way to take high-dimensional data and project it onto a small number of axes that allow for efficient separation of the groups. Assume we have two groups measured on two variables, X1 and X2, plotted on a two-dimensional graph. LDA projects these points onto a one-dimensional line chosen to maximize the separation of the two groups. This is done by maximizing the distance between the two group means while minimizing the variation within each group; the projection that maximizes the ratio of these two quantities provides the best separation possible. \(^1\)
Ratio of the squared distance between the group means to the sum of the within-group variances:
\(\frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}\)
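As a rough numerical illustration of this ratio (my own sketch, not taken from the cited video), the code below simulates two groups and evaluates the criterion for one arbitrary projection direction `w`; LDA itself searches for the direction that makes this ratio as large as possible.
set.seed(42)
# two simulated groups, each measured on two variables
class_a <- cbind(rnorm(50, mean = 0), rnorm(50, mean = 0))
class_b <- cbind(rnorm(50, mean = 3), rnorm(50, mean = 1))
# project each group onto an arbitrary, illustrative direction w
w <- c(1, 0.5)
proj_a <- as.vector(class_a %*% w)
proj_b <- as.vector(class_b %*% w)
# squared distance between projected means over the summed within-group variances
(mean(proj_a) - mean(proj_b))^2 / (var(proj_a) + var(proj_b))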
Example using the iris dataset from R. Here we wish to predict the species of a flower from its sepal and petal measurements \(^7\):
set.seed(1)
data("iris")
head(iris,2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
# split the data: 80% of the flowers for training, 20% held out for testing
training_tree <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train_data <- iris[training_tree,]
test_data <- iris[-training_tree,]
# estimate centering and scaling parameters from the training set,
# then apply the same transformation to both sets
preprocess_param <- train_data %>%
  preProcess(method = c("center", "scale"))
train_transform <- preprocess_param %>% predict(train_data)
test_transform <- preprocess_param %>% predict(test_data)
# fit the LDA model and predict the species of the held-out flowers
model <- lda(Species~., data = train_transform)
predictions <- model %>% predict(test_transform)
model
## Call:
## lda(Species ~ ., data = train_transform)
##
## Prior probabilities of groups:
## setosa versicolor virginica
## 0.3333333 0.3333333 0.3333333
##
## Group means:
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa -1.0403185 0.8430286 -1.3160834 -1.257539
## versicolor 0.2031872 -0.5979621 0.3285433 0.183870
## virginica 0.8371313 -0.2450664 0.9875400 1.073669
##
## Coefficients of linear discriminants:
## LD1 LD2
## Sepal.Length 0.6603021 -0.03112867
## Sepal.Width 0.6032864 0.77207797
## Petal.Length -4.0979045 -2.32179601
## Petal.Width -1.9769248 2.81185685
##
## Proportion of trace:
## LD1 LD2
## 0.9916 0.0084
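As a quick sanity check (not part of the original output), we can see how often the model's predicted class matches the true species in the held-out test set; the exact value will vary with the random partition.
# proportion of test-set flowers classified correctly
mean(predictions$class == test_transform$Species)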
Plotting the Results
library(ggplot2)
library(MASS)
library(mvtnorm)
# attach the discriminant scores (LD1, LD2) for the training data
lda_plot <- cbind(train_data, predict(model)$x)
ggplot(lda_plot, aes(LD1, LD2)) +
  geom_point(aes(color = Species))
Logistic regression is used for binary predictions, such as true or false, happy or sad, etc. Compared to linear regression, logistic regression uses an S-shaped curve to model the probability of an outcome given the variables. The curve is fitted using the “maximum likelihood” method: for a candidate curve we compute the likelihood it assigns to the observed data points, shift and reshape the curve, and keep the curve with the maximum likelihood. The fitted model can then be used to classify samples based on the probabilities it assigns to them.\(^{2,3}\)
\(P(X)=P(Y=1|X)= \frac{e^{\beta_0+\beta_1*X}}{1+e^{\beta_0+\beta_1*X}}\)
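As a brief sketch of this idea (my own example, not from the cited sources), a logistic regression can be fit in R with `glm()` and a binomial family; here the response is restricted to two iris species so that it is binary.
# keep two species so the outcome is binary (versicolor vs. virginica)
iris_binary <- subset(iris, Species != "setosa")
iris_binary$Species <- droplevels(iris_binary$Species)
# the S-shaped curve is fit by maximum likelihood
logit_model <- glm(Species ~ Petal.Length + Petal.Width,
                   data = iris_binary, family = binomial)
# predicted probability that each flower is virginica
head(predict(logit_model, type = "response"))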
While both methods are used for classification, the ultimate difference between the two is that logistic regression models the categorical response directly, without assuming anything about the distribution of the predictors, while linear discriminant analysis models the continuous predictors as approximately normal within each class and extends more naturally to more than two classes.
In addition to linear discriminant analysis, there is also quadratic discriminant analysis (QDA); the main difference is that QDA assumes each group has its own covariance matrix, whereas LDA assumes all groups share the same covariance matrix. Bartlett’s test can be used to check this assumption. \(^4\)
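For comparison, QDA is fit with the same interface as LDA in the MASS package; this minimal sketch simply reuses the transformed training data from the earlier example.
# QDA estimates a separate covariance matrix for each species
qda_model <- qda(Species ~ ., data = train_transform)
qda_model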
Bartlett’s test is used to examine whether \(k\) samples drawn from populations have equal variances, also known as “homogeneity of variances”. It is defined as:
\(H_0: \sigma_1^2 = \sigma_2^2 =...= \sigma_k^2\)
\(H_A: \sigma_i^2 \neq \sigma_j^2\) for at least one pair \(i \neq j\)
\(T = \frac{(N-k)\ln(s_p^2) - \sum^k_{i=1}(N_i - 1)\ln(s^2_i)}{1 + \frac{1}{3(k-1)}\left(\sum^k_{i=1}\frac{1}{N_i - 1} - \frac{1}{N-k}\right)}\)
where…
\(s_i^2\) is the variance of the \(i^{th}\) group
\(N\) is the total sample size
\(N_i\) is the sample size of the \(i^{th}\) group
\(k\) is the number of groups
and \(s_p^2\) is the pooled variance, which is defined as…
\(s_p^2 = \frac{\sum_{i=1}^k (N_i - 1)\,s_i^2}{N-k}\)
data <- Orange
head(data, 3)
## Grouped Data: circumference ~ age | Tree
##   Tree age circumference
## 1    1 118            30
## 2    1 484            58
## 3    1 664            87
# test whether the variance of circumference is the same for every tree
bartlett.test(circumference ~ Tree, data = data)
##
## Bartlett test of homogeneity of variances
##
## data: circumference by Tree
## Bartlett's K-squared = 2.4607, df = 4, p-value = 0.6517
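To connect this output back to the formula above, the following sketch (written for this example rather than taken from the references) computes the statistic by hand for the Orange data; it should reproduce the reported K-squared value up to rounding.
groups <- split(data$circumference, data$Tree)  # circumferences for each tree
Ni  <- sapply(groups, length)                   # group sample sizes
si2 <- sapply(groups, var)                      # group variances
N <- sum(Ni); k <- length(groups)
sp2 <- sum((Ni - 1) * si2) / (N - k)            # pooled variance
# Bartlett's test statistic T from the formula above
((N - k) * log(sp2) - sum((Ni - 1) * log(si2))) /
  (1 + (sum(1 / (Ni - 1)) - 1 / (N - k)) / (3 * (k - 1)))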
# difference between the p-value and the 0.05 significance level
margin = 0.6517 - 0.05
margin
## [1] 0.6017
Since the p-value (0.6517) is well above 0.05, we fail to reject the null hypothesis of equal variances; the common-covariance assumption behind LDA therefore appears reasonable for this data, and QDA is not needed.
\(^1\) https://www.youtube.com/watch?time_continue=480&v=azXCzI57Yfc&feature=emb_title
\(^2\) https://www.youtube.com/watch?v=yIYKR4sgzI8
\(^3\) Nwanganga, F. (2020). Chapter 5: Logistic Regression. In M. Chapple (Ed.), Practical Machine Learning in R (p. 172). essay, Wiley.
\(^4\) https://online.stat.psu.edu/stat505/book/export/html/705
\(^5\) https://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm
\(^6\) https://www.statology.org/bartletts-test-in-r/#:~:text=Bartlett%E2%80%99s%20test%20is%20a%20statistical%20test%20that%20is,test%20can%20be%20used%20to%20verify%20that%20assumption
\(^7\) https://www.geeksforgeeks.org/linear-discriminant-analysis-in-r-programming/#:~:text=LDA%20or%20Linear%20Discriminant%20Analysis%20can%20be%20computed,acquires%20the%20highest%20probability%20score%20in%20that%20group.