Linear Discriminant Analysis (LDA) is a supervised classification method used to find linear combinations of predictor variables that best separate predefined groups. Unlike kNN or decision trees, LDA is a model-based approach that assumes the data in each class follow a multivariate normal distribution with a common covariance structure.
The main idea of LDA is to project the data onto a lower-dimensional space in such a way that the separation between classes is maximised. These new dimensions are called discriminant functions. The first discriminant function provides the best separation between groups, the second provides the next best (orthogonal to the first), and so on.
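Formally, each discriminant direction $w$ is chosen to maximise Fisher's criterion, the ratio of between-class to within-class scatter,

$$J(w) = \frac{w^\top B\,w}{w^\top W\,w},$$

where $B$ and $W$ denote the between-class and within-class scatter matrices; each subsequent direction maximises $J(w)$ subject to being uncorrelated with the earlier ones.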
In the following section, we will show how to perform LDA in R, how to interpret the model output, and how to evaluate its classification performance.
In R, LDA is commonly performed using the lda() function
from the MASS package.
The general syntax is:

lda(formula, data, prior = proportions)

Where:
- formula: Specifies the response variable and predictors (e.g. y ~ .)
- data: The dataset used to fit the model
- prior: Optionally specifies the prior class probabilities (the default is the class proportions in the training set); see the example below
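For instance, a minimal call on the built-in iris data, passing an explicit uniform prior (iris is used here purely for illustration):

library(MASS)

# Fit an LDA model predicting Species from all other columns,
# overriding the default prior with equal class probabilities
fit <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3)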
The lda() function returns an object that contains
several components describing the fitted discriminant model and how it
separates the classes.
The main elements of the output are:
- prior: the prior probabilities of the classes
- means: the group means for each predictor
- scaling: the coefficients of the linear discriminant functions
- svd: the singular values, which give the ratio of between- to within-group standard deviations on each discriminant variable

We will look at each of these after fitting a model to real data.
First we need to read our data into R. Throughout this example we
will use the wine data. These data are the results of a
chemical analysis of wines grown in the same region in Italy but derived
from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three types of wines.
The attributes are:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
The wine data is stored as a plain-text, comma-separated file, so to read in the
data we can use the read.table() function in R.
# The file has no header row and uses commas as field separators
wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")
colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")
dim(wine)
#> [1] 178 14
head(wine, 5)
#> Cultivar Alcohol Malic acid Ash Alcalinity of ash Magnesium Total phenols
#> 1 1 14.23 1.71 2.43 15.6 127 2.80
#> 2 1 13.20 1.78 2.14 11.2 100 2.65
#> 3 1 13.16 2.36 2.67 18.6 101 2.80
#> 4 1 14.37 1.95 2.50 16.8 113 3.85
#> 5 1 13.24 2.59 2.87 21.0 118 2.80
#> Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity Hue
#> 1 3.06 0.28 2.29 5.64 1.04
#> 2 2.76 0.26 1.28 4.38 1.05
#> 3 3.24 0.30 2.81 5.68 1.03
#> 4 3.49 0.24 2.18 7.80 0.86
#> 5 2.69 0.39 1.82 4.32 1.04
#> OD280/OD315 of diluted wines Proline
#> 1 3.92 1065
#> 2 3.40 1050
#> 3 3.17 1185
#> 4 3.45 1480
#> 5 2.93 735

The wine dataset contains 178 observations of 14
variables, including the 13 measured quantities of chemicals and the
variable Cultivar, which indicates the type of grape from which the wine
was produced.
The measured attributes have very different ranges, so the data should be standardized before performing LDA to ensure that all variables contribute equally to the analysis.
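One way to do this (a sketch, assuming the standardized data frame is named wine_stand, the name used in the code below) is to centre and scale the 13 chemical measurements with scale(), keeping Cultivar as a factor:

wine_stand <- wine
wine_stand[, -1] <- scale(wine[, -1])        # standardize the 13 predictors
wine_stand$Cultivar <- factor(wine$Cultivar) # class labels as a factor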
To evaluate the performance of an LDA classifier, we first need to split the data into a training set and a test set. The training set is used to build the model, while the test set is used to assess how well the model performs on unseen data.
In R, this can be done using the createDataPartition()
function from the caret package. This function ensures that
the split is done in a way that preserves the class distribution in both
sets (i.e., a stratified sample).
A common choice is an 80–20 split, where 80% of the data is used for training and 20% for testing.
library(caret)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.5.3
#> Loading required package: lattice
set.seed(255)
trainIndex <- createDataPartition(wine_stand$Cultivar,
times=1,
p = .8,
list = FALSE)
train <- wine_stand[trainIndex, ]
test <- wine_stand[-trainIndex, ]

library(MASS)
wine.lda <- lda(
Cultivar ~ .,
data = train
)
wine.lda
#> Call:
#> lda(Cultivar ~ ., data = train)
#>
#> Prior probabilities of groups:
#> 1 2 3
#> 0.3333333 0.3958333 0.2708333
#>
#> Group means:
#> Alcohol `Malic acid` Ash `Alcalinity of ash` Magnesium
#> 1 0.8512270 -0.2386050 0.3772021 -0.6859510 0.4629863
#> 2 -0.8686331 -0.4675425 -0.4937750 0.1701462 -0.2889930
#> 3 0.1492645 0.9716245 0.1884068 0.5120981 -0.1650244
#> `Total phenols` Flavanoids `Nonflavanoid phenols` Proanthocyanins
#> 1 0.81038341 0.909685241 -0.54555064 0.514932401
#> 2 -0.08777955 -0.005240556 0.00342915 0.008544634
#> 3 -1.07297514 -1.304861485 0.79067767 -0.852300864
#> `Color intensity` Hue `OD280/OD315 of diluted wines` Proline
#> 1 0.1229508 0.4714449 0.7452888 1.0796242
#> 2 -0.8463689 0.4629347 0.2163096 -0.6723733
#> 3 0.9567689 -1.1712124 -1.2793863 -0.3606136
#>
#> Coefficients of linear discriminants:
#> LD1 LD2
#> Alcohol -0.18481612 0.68954159
#> `Malic acid` 0.16376534 0.47828853
#> Ash -0.10351254 0.60212950
#> `Alcalinity of ash` 0.51683232 -0.45052898
#> Magnesium -0.12912872 -0.07257685
#> `Total phenols` 0.50280213 -0.01383282
#> Flavanoids -2.26219976 -0.47682344
#> `Nonflavanoid phenols` -0.21303042 -0.09984860
#> Proanthocyanins 0.25305609 -0.08993530
#> `Color intensity` 0.74044550 0.70342842
#> Hue -0.08259957 -0.32326103
#> `OD280/OD315 of diluted wines` -0.72339848 0.12889261
#> Proline -0.92005728 1.00329701
#>
#> Proportion of trace:
#> LD1 LD2
#> 0.6948 0.3052
plot(wine.lda)

Calling plot() on the fitted model displays the training observations on the first two linear discriminants, labelled by class. The wine.lda output contains the following
components:
wine.lda$scaling # The coefficients of the linear discriminant
#> LD1 LD2
#> Alcohol -0.18481612 0.68954159
#> `Malic acid` 0.16376534 0.47828853
#> Ash -0.10351254 0.60212950
#> `Alcalinity of ash` 0.51683232 -0.45052898
#> Magnesium -0.12912872 -0.07257685
#> `Total phenols` 0.50280213 -0.01383282
#> Flavanoids -2.26219976 -0.47682344
#> `Nonflavanoid phenols` -0.21303042 -0.09984860
#> Proanthocyanins 0.25305609 -0.08993530
#> `Color intensity` 0.74044550 0.70342842
#> Hue -0.08259957 -0.32326103
#> `OD280/OD315 of diluted wines` -0.72339848 0.12889261
#> Proline -0.92005728 1.00329701
wine.lda$means # The group means
#> Alcohol `Malic acid` Ash `Alcalinity of ash` Magnesium
#> 1 0.8512270 -0.2386050 0.3772021 -0.6859510 0.4629863
#> 2 -0.8686331 -0.4675425 -0.4937750 0.1701462 -0.2889930
#> 3 0.1492645 0.9716245 0.1884068 0.5120981 -0.1650244
#> `Total phenols` Flavanoids `Nonflavanoid phenols` Proanthocyanins
#> 1 0.81038341 0.909685241 -0.54555064 0.514932401
#> 2 -0.08777955 -0.005240556 0.00342915 0.008544634
#> 3 -1.07297514 -1.304861485 0.79067767 -0.852300864
#> `Color intensity` Hue `OD280/OD315 of diluted wines` Proline
#> 1 0.1229508 0.4714449 0.7452888 1.0796242
#> 2 -0.8463689 0.4629347 0.2163096 -0.6723733
#> 3 0.9567689 -1.1712124 -1.2793863 -0.3606136

By default the lda function uses the proportions of the
training dataset as the prior probabilities:
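These are stored in the prior component of the fitted model; the values match those shown in the model summary above:

wine.lda$prior # The prior probabilities of the groups
#>         1         2         3
#> 0.3333333 0.3958333 0.2708333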
The singular values are analogous to the eigenvalues of Principal Component Analysis, except that LDA does not maximize the variance of a component; instead it maximizes the separability, defined by the ratio of between-group to within-group standard deviation. Thus, the "proportion of trace" is the proportion of between-class variance that is explained by successive discriminant functions.
Hence, 69.48% of the between-class variance is explained by the first linear discriminant function (LD1), and the remaining 30.52% by the second (LD2).
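As a quick consistency check, the proportion of trace can be recomputed from the singular values stored in the svd component of the fitted model:

# Squared singular values, normalised to sum to one
round(wine.lda$svd^2 / sum(wine.lda$svd^2), 4)
#> [1] 0.6948 0.3052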
A nice way to visualise LDA is to plot the discriminant scores and add group contours (or ellipses) to show class separation clearly.
library(ggplot2)
lda_pred <- predict(wine.lda, test)
df <- data.frame(
LD1 = lda_pred$x[,1],
LD2 = lda_pred$x[,2],
Cultivar = test$Cultivar
)
ggplot(df, aes(x = LD1, y = LD2, color = Cultivar)) +
geom_point(size = 2, alpha = 0.8) +
stat_ellipse(level = 0.95, linewidth = 1) +
theme_minimal() +
labs(
title = "LDA: Discriminant analysis with class contours",
x = "Linear Discriminant 1",
y = "Linear Discriminant 2"
  )

Well-separated ellipses indicate strong discriminative power of the model, while overlapping contours suggest that some classes are not well separated in the discriminant space.
When using predict() on an LDA model, additional output
is produced:
- class: the predicted class for each observation
- posterior: the posterior probabilities of membership in each class
- x: the discriminant scores, i.e. the data projected onto the linear discriminants
Together, these outputs allow you to both classify new observations and understand how the model separates the groups.
To make predictions on new data:
lda_pred <- predict(wine.lda, test)
lda_pred
#> $class
#> [1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 3 3 3 3 3 3 3 3
#> Levels: 1 2 3
#>
#> $posterior
#> 1 2 3
#> 4 1.000000e+00 7.625093e-14 1.194116e-18
#> 7 1.000000e+00 7.621901e-12 1.228083e-17
#> 8 1.000000e+00 2.733462e-11 9.309561e-18
#> 15 1.000000e+00 4.467867e-15 1.910570e-22
#> 19 1.000000e+00 9.041692e-16 6.538205e-24
#> 25 9.997005e-01 2.995355e-04 1.361809e-13
#> 39 9.964906e-01 3.509374e-03 6.494207e-14
#> 48 9.999984e-01 1.593550e-06 1.311790e-15
#> 50 1.000000e+00 1.089628e-09 5.640580e-17
#> 57 9.999989e-01 1.104966e-06 4.385765e-13
#> 59 1.000000e+00 3.215679e-10 5.626867e-18
#> 65 3.181208e-09 1.000000e+00 2.884945e-09
#> 83 3.317497e-09 1.000000e+00 1.759653e-08
#> 84 5.167999e-07 7.148702e-01 2.851293e-01
#> 87 6.469626e-11 1.000000e+00 6.415609e-09
#> 93 9.030532e-09 9.999971e-01 2.877281e-06
#> 94 6.941453e-08 9.999999e-01 2.650275e-10
#> 95 1.618211e-09 1.000000e+00 2.340817e-09
#> 99 1.003244e-03 9.989968e-01 1.177024e-14
#> 109 1.123824e-10 1.000000e+00 1.415239e-10
#> 117 5.382296e-10 1.000000e+00 3.287325e-10
#> 120 1.269949e-07 9.999999e-01 1.367816e-08
#> 122 8.680420e-01 1.319580e-01 4.627266e-20
#> 124 1.330437e-04 9.998635e-01 3.451807e-06
#> 125 1.501618e-05 9.999850e-01 1.483586e-09
#> 131 6.357776e-07 8.557655e-01 1.442339e-01
#> 139 3.467741e-13 7.684678e-07 9.999992e-01
#> 151 2.352562e-13 6.870259e-08 9.999999e-01
#> 153 1.200220e-17 1.065000e-07 9.999999e-01
#> 159 1.865705e-20 2.926708e-15 1.000000e+00
#> 161 7.210379e-18 3.139046e-09 1.000000e+00
#> 170 6.318248e-17 7.197804e-13 1.000000e+00
#> 175 1.147842e-15 2.126589e-10 1.000000e+00
#> 177 5.756769e-14 5.187351e-10 1.000000e+00
#>
#> $x
#> LD1 LD2
#> 4 -4.67905134 4.3433036
#> 7 -4.39171547 3.4938948
#> 8 -4.42748428 3.1608100
#> 15 -5.76942375 4.0953836
#> 19 -6.19000367 4.1191197
#> 25 -3.24261204 0.3170844
#> 39 -3.33698672 -0.3470066
#> 48 -3.81851104 1.0726561
#> 50 -4.20549160 2.4742440
#> 57 -3.09143800 1.7734663
#> 59 -4.49262759 2.5211950
#> 65 0.42484214 -3.1502063
#> 83 0.64573025 -2.9537293
#> 84 2.09500393 -0.4530168
#> 87 1.00838974 -3.5812316
#> 93 1.15871851 -2.2830477
#> 94 -0.25642231 -2.9943826
#> 95 0.48262766 -3.2617133
#> 99 -2.69878327 -2.7849462
#> 109 0.46292840 -3.9108860
#> 117 0.37383932 -3.6146398
#> 120 0.16170501 -2.4980411
#> 122 -5.09294018 -2.7227242
#> 124 -0.01011004 -0.9938402
#> 125 -0.70859694 -2.1010743
#> 131 1.98390542 -0.5403599
#> 139 4.02886232 1.0682580
#> 151 4.07917577 1.5913706
#> 153 5.30582702 0.1793505
#> 159 6.12412086 3.4656026
#> 161 5.37221817 0.9503615
#> 170 5.11018061 3.2313251
#> 175 4.74513957 2.2616632
#> 177  4.25828218  2.5675517

A standard way to assess classification performance is through a
confusion matrix, which summarises how many observations are correctly
and incorrectly classified for each class. In R, this can be computed
using the confusionMatrix() function from the
caret package:
library(caret)
confusionMatrix(lda_pred$class, test$Cultivar)
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction 1 2 3
#> 1 11 1 0
#> 2 0 13 1
#> 3 0 0 8
#>
#> Overall Statistics
#>
#> Accuracy : 0.9412
#> 95% CI : (0.8032, 0.9928)
#> No Information Rate : 0.4118
#> P-Value [Acc > NIR] : 9.446e-11
#>
#> Kappa : 0.9101
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: 1 Class: 2 Class: 3
#> Sensitivity 1.0000 0.9286 0.8889
#> Specificity 0.9565 0.9500 1.0000
#> Pos Pred Value 0.9167 0.9286 1.0000
#> Neg Pred Value 1.0000 0.9500 0.9615
#> Prevalence 0.3235 0.4118 0.2647
#> Detection Rate 0.3235 0.3824 0.2353
#> Detection Prevalence 0.3529 0.4118 0.2353
#> Balanced Accuracy      0.9783   0.9393   0.9444

Another useful way to assess how well the LDA model separates the
classes is by examining the distribution of the discriminant scores.
This can be done using the ldahist() function from the
MASS package.
The function produces histograms of the LDA scores for each class, allowing us to visually compare how well the groups are separated along a given discriminant axis (typically LD1 or LD2).
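For example, to plot histograms of the test-set scores on the first discriminant axis, grouped by class (using the lda_pred object computed above):

# Histograms of LD1 scores for each cultivar
ldahist(data = lda_pred$x[, 1], g = test$Cultivar)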
These plots show the distribution of each class along the linear discriminant axes. If the LDA model performs well, the histograms for different classes will show clear separation with minimal overlap. In contrast, substantial overlap between classes indicates that the discriminant functions are less effective at distinguishing between groups.