Linear Discriminant Analysis (LDA)

dr. Annelies Agten

2026-04-18

Linear Discriminant Analysis (LDA) is a supervised classification method used to find linear combinations of predictor variables that best separate predefined groups. Unlike kNN or decision trees, LDA is a model-based approach that assumes the data in each class follow a multivariate normal distribution with a common covariance structure.

The main idea of LDA is to project the data onto a lower-dimensional space in such a way that the separation between classes is maximised. The new dimensions are called discriminant functions. The first discriminant function provides the best separation between groups, the second provides the next best separation while being uncorrelated with the first, and so on.
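One common way to formalise this idea is Fisher's criterion. The notation below is introduced here for illustration and is not part of the original text: a is a direction in predictor space, B is the between-class scatter matrix, W is the pooled within-class scatter matrix, K is the number of classes, p is the number of predictors, and n_k is the number of observations in class k.

```latex
\[
J(a) = \frac{a^\top B\, a}{a^\top W a},
\qquad
B = \sum_{k=1}^{K} n_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^\top,
\qquad
W = \sum_{k=1}^{K} \sum_{i \in k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^\top
\]
```

The directions that successively maximise J(a) are eigenvectors of W^{-1}B, and at most min(p, K - 1) of them have a nonzero eigenvalue. With three cultivars and 13 predictors, this gives two discriminant functions.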

In the following section, we will show how to perform LDA in R, how to interpret the model output, and how to evaluate its classification performance.

Linear Discriminant Analysis (LDA) in R

In R, LDA is commonly performed using the lda() function from the MASS package.

The general syntax is:

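lda(formula, data, prior = proportions)

Where:

formula: a model formula of the form class ~ predictors, with the grouping variable on the left-hand side (class ~ . uses all other columns as predictors).
data: the data frame that contains the variables named in the formula.
prior: an optional vector of prior class membership probabilities, given in the order of the factor levels. If it is omitted, the class proportions in the training data are used.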

The lda() function returns an object that contains several components describing the fitted discriminant model and how it separates the classes.

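The main elements of the output are:

prior: the prior probabilities of the groups.
means: the group means of each predictor.
scaling: the coefficients of the linear discriminants.
svd: the singular values, from which the proportion of trace is computed.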
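As a minimal sketch of such a call, the example below fits lda() to R's built-in iris data and sets equal priors explicitly. The iris data are a stand-in chosen here so that the snippet runs without downloading anything; they are not the wine data used in this tutorial, and the object name fit_iris is an illustrative assumption.

```r
library(MASS)

# Fit LDA with Species as the grouping variable and all other columns as predictors.
# Equal priors are supplied explicitly; omitting `prior` would use the class proportions.
fit_iris <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3)

# Components stored in the fitted object
names(fit_iris)

# Selected components
fit_iris$prior    # prior probabilities of the groups
fit_iris$means    # group means of each predictor
fit_iris$scaling  # coefficients of the linear discriminants
fit_iris$svd      # singular values
```

With three classes and four predictors, the scaling matrix has two columns (LD1 and LD2).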

Read data

First we need to read our data into R. Throughout this example we will use the wine data. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

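The attributes are: alcohol, malic acid, ash, alcalinity of ash, magnesium, total phenols, flavanoids, nonflavanoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline.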

The wine data are stored as a comma-separated text file without a header row, so we can read them with the read.table() function in R and assign the column names ourselves.

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")

colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")

dim(wine)
#> [1] 178  14

head(wine, 5)
#>   Cultivar Alcohol Malic acid  Ash Alcalinity of ash Magnesium Total phenols
#> 1        1   14.23       1.71 2.43              15.6       127          2.80
#> 2        1   13.20       1.78 2.14              11.2       100          2.65
#> 3        1   13.16       2.36 2.67              18.6       101          2.80
#> 4        1   14.37       1.95 2.50              16.8       113          3.85
#> 5        1   13.24       2.59 2.87              21.0       118          2.80
#>   Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity  Hue
#> 1       3.06                 0.28            2.29            5.64 1.04
#> 2       2.76                 0.26            1.28            4.38 1.05
#> 3       3.24                 0.30            2.81            5.68 1.03
#> 4       3.49                 0.24            2.18            7.80 0.86
#> 5       2.69                 0.39            1.82            4.32 1.04
#>   OD280/OD315 of diluted wines Proline
#> 1                         3.92    1065
#> 2                         3.40    1050
#> 3                         3.17    1185
#> 4                         3.45    1480
#> 5                         2.93     735

The wine dataset contains 178 observations of 14 variables, including the 13 measured quantities of chemicals and the variable Cultivar, which indicates the type of grape from which the wine was produced.

The measured attributes have very different ranges (compare Hue with Proline, for example), so we standardize the data before performing LDA. This puts the discriminant coefficients on a comparable scale, which makes them easier to interpret.

wine_stand <- as.data.frame(scale(wine[, 2:14])) # standardize: subtract the mean and divide by the sd

wine_stand$Cultivar <- factor(wine$Cultivar)
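To make the effect of standardization concrete, the sketch below uses R's built-in iris data as a stand-in for the wine data (an assumption made so the snippet runs offline). It first checks what scale() computes, and then fits LDA to the raw and to the standardized predictors. In theory, rescaling the predictors changes the size of the coefficients but not the predicted classes; the final table lets you verify this. All object names are illustrative.

```r
library(MASS)

# Predictors of the built-in iris data, raw and standardized
x_raw <- iris[, 1:4]
x_std <- as.data.frame(scale(x_raw))

# scale() subtracts the column mean and divides by the column standard deviation
manual <- (x_raw$Sepal.Length - mean(x_raw$Sepal.Length)) / sd(x_raw$Sepal.Length)
all.equal(manual, x_std$Sepal.Length)

# Fit LDA on both versions of the predictors
iris_raw <- data.frame(x_raw, Species = iris$Species)
iris_std <- data.frame(x_std, Species = iris$Species)
fit_raw <- lda(Species ~ ., data = iris_raw)
fit_std <- lda(Species ~ ., data = iris_std)

# Cross-tabulate the predicted classes from both fits
pred_raw <- predict(fit_raw, iris_raw)$class
pred_std <- predict(fit_std, iris_std)$class
table(raw = pred_raw, standardized = pred_std)
```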

Split data into training and test set

To evaluate the performance of an LDA classifier, we first need to split the data into a training set and a test set. The training set is used to build the model, while the test set is used to assess how well the model performs on unseen data.

In R, this can be done using the createDataPartition() function from the caret package. This function ensures that the split is done in a way that preserves the class distribution in both sets (i.e., a stratified sample).

A common choice is an 80–20 split, where 80% of the data is used for training and 20% for testing.

library(caret)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.5.3
#> Loading required package: lattice

set.seed(255)

trainIndex <- createDataPartition(wine_stand$Cultivar, 
                                  times=1, 
                                  p = .8, 
                                  list = FALSE)

train <- wine_stand[trainIndex, ]
test <- wine_stand[-trainIndex, ]
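The idea behind a stratified split can be sketched in base R by sampling 80% of the rows within each class. This is only an approximation of what createDataPartition() does for a factor outcome, not its actual implementation. The built-in iris data are used as a stand-in for the wine data, and all object names are illustrative.

```r
set.seed(255)

# Row indices grouped by class (iris is a stand-in for the wine data)
idx_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Draw 80% of the rows within each class so that class proportions are preserved.
# Note: sample(i, ...) assumes that every class has more than one row.
train_idx <- unlist(lapply(idx_by_class, function(i) {
  sample(i, size = round(0.8 * length(i)))
}))

train_iris <- iris[train_idx, ]
test_iris  <- iris[-train_idx, ]

# Class proportions in the training and test sets
prop.table(table(train_iris$Species))
prop.table(table(test_iris$Species))
```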

Perform LDA

library(MASS)

wine.lda <- lda(
  Cultivar ~ .,
  data = train
)

wine.lda
#> Call:
#> lda(Cultivar ~ ., data = train)
#> 
#> Prior probabilities of groups:
#>         1         2         3 
#> 0.3333333 0.3958333 0.2708333 
#> 
#> Group means:
#>      Alcohol `Malic acid`        Ash `Alcalinity of ash`  Magnesium
#> 1  0.8512270   -0.2386050  0.3772021          -0.6859510  0.4629863
#> 2 -0.8686331   -0.4675425 -0.4937750           0.1701462 -0.2889930
#> 3  0.1492645    0.9716245  0.1884068           0.5120981 -0.1650244
#>   `Total phenols`   Flavanoids `Nonflavanoid phenols` Proanthocyanins
#> 1      0.81038341  0.909685241            -0.54555064     0.514932401
#> 2     -0.08777955 -0.005240556             0.00342915     0.008544634
#> 3     -1.07297514 -1.304861485             0.79067767    -0.852300864
#>   `Color intensity`        Hue `OD280/OD315 of diluted wines`    Proline
#> 1         0.1229508  0.4714449                      0.7452888  1.0796242
#> 2        -0.8463689  0.4629347                      0.2163096 -0.6723733
#> 3         0.9567689 -1.1712124                     -1.2793863 -0.3606136
#> 
#> Coefficients of linear discriminants:
#>                                        LD1         LD2
#> Alcohol                        -0.18481612  0.68954159
#> `Malic acid`                    0.16376534  0.47828853
#> Ash                            -0.10351254  0.60212950
#> `Alcalinity of ash`             0.51683232 -0.45052898
#> Magnesium                      -0.12912872 -0.07257685
#> `Total phenols`                 0.50280213 -0.01383282
#> Flavanoids                     -2.26219976 -0.47682344
#> `Nonflavanoid phenols`         -0.21303042 -0.09984860
#> Proanthocyanins                 0.25305609 -0.08993530
#> `Color intensity`               0.74044550  0.70342842
#> Hue                            -0.08259957 -0.32326103
#> `OD280/OD315 of diluted wines` -0.72339848  0.12889261
#> Proline                        -0.92005728  1.00329701
#> 
#> Proportion of trace:
#>    LD1    LD2 
#> 0.6948 0.3052

plot(wine.lda)

The wine.lda output contains the following components:

wine.lda$scaling # The coefficients of the linear discriminants
#>                                        LD1         LD2
#> Alcohol                        -0.18481612  0.68954159
#> `Malic acid`                    0.16376534  0.47828853
#> Ash                            -0.10351254  0.60212950
#> `Alcalinity of ash`             0.51683232 -0.45052898
#> Magnesium                      -0.12912872 -0.07257685
#> `Total phenols`                 0.50280213 -0.01383282
#> Flavanoids                     -2.26219976 -0.47682344
#> `Nonflavanoid phenols`         -0.21303042 -0.09984860
#> Proanthocyanins                 0.25305609 -0.08993530
#> `Color intensity`               0.74044550  0.70342842
#> Hue                            -0.08259957 -0.32326103
#> `OD280/OD315 of diluted wines` -0.72339848  0.12889261
#> Proline                        -0.92005728  1.00329701

wine.lda$means # The group means
#>      Alcohol `Malic acid`        Ash `Alcalinity of ash`  Magnesium
#> 1  0.8512270   -0.2386050  0.3772021          -0.6859510  0.4629863
#> 2 -0.8686331   -0.4675425 -0.4937750           0.1701462 -0.2889930
#> 3  0.1492645    0.9716245  0.1884068           0.5120981 -0.1650244
#>   `Total phenols`   Flavanoids `Nonflavanoid phenols` Proanthocyanins
#> 1      0.81038341  0.909685241            -0.54555064     0.514932401
#> 2     -0.08777955 -0.005240556             0.00342915     0.008544634
#> 3     -1.07297514 -1.304861485             0.79067767    -0.852300864
#>   `Color intensity`        Hue `OD280/OD315 of diluted wines`    Proline
#> 1         0.1229508  0.4714449                      0.7452888  1.0796242
#> 2        -0.8463689  0.4629347                      0.2163096 -0.6723733
#> 3         0.9567689 -1.1712124                     -1.2793863 -0.3606136
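The group means and the coefficient matrix are enough to compute discriminant scores by hand. The sketch below reflects our understanding of how predict() obtains the scores for an lda fit: the predictors are centred at the prior-weighted average of the group means and then multiplied by the coefficients. It uses the built-in iris data as a stand-in for the wine data, and the object names are illustrative.

```r
library(MASS)

fit_iris <- lda(Species ~ ., data = iris)
x <- as.matrix(iris[, 1:4])

# Centre the predictors at the prior-weighted average of the group means ...
centre <- colSums(fit_iris$prior * fit_iris$means)

# ... and multiply by the coefficient matrix to obtain the LD1 and LD2 scores
scores_manual <- scale(x, center = centre, scale = FALSE) %*% fit_iris$scaling

# Compare with the scores returned by predict()
scores_predict <- predict(fit_iris, iris)$x
all.equal(scores_manual, scores_predict, check.attributes = FALSE)
```

Each score is therefore a weighted sum of the centred predictors, with the weights given by one column of the scaling matrix.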

By default, the lda() function uses the class proportions in the training set as the prior probabilities:

wine.lda$prior
#>         1         2         3 
#> 0.3333333 0.3958333 0.2708333

The singular values stored in wine.lda$svd give the ratio of the between-group to the within-group standard deviation on each linear discriminant. They play a role similar to the eigenvalues in Principal Component Analysis, except that LDA does not maximize the variance of a component; instead, it maximizes the separation between groups. The “proportion of trace” is the proportion of between-class variance explained by each successive discriminant function, and it is obtained from the squared singular values.

wine.lda$svd
#> [1] 26.31524 17.44167

wine.lda$svd^2 / sum(wine.lda$svd^2)
#> [1] 0.694782 0.305218

Hence, 69.48% of the between-class variance is explained by the first linear discriminant function (LD1), and the remaining 30.52% by the second (LD2).
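The interpretation of the singular values as a ratio of between-group to within-group variation can be checked numerically. The sketch below computes, for each discriminant, the between-group and within-group mean squares of the scores and compares their ratio with the squared singular values. It assumes the default settings of lda() (moment estimators, priors equal to the class proportions) and uses the built-in iris data as a stand-in for the wine data; the object names are illustrative.

```r
library(MASS)

fit_iris <- lda(Species ~ ., data = iris)
scores <- predict(fit_iris, iris)$x
g <- iris$Species
n <- nrow(iris)
K <- nlevels(g)

# Group means of the scores (K x 2 matrix) and group sizes
group_means <- apply(scores, 2, function(s) tapply(s, g, mean))
n_g <- as.vector(table(g))
grand_mean <- colMeans(scores)

# Between-group mean square: weighted squared deviations of the group means
between_ms <- colSums(n_g * sweep(group_means, 2, grand_mean)^2) / (K - 1)

# Within-group mean square: squared deviations from the own group mean
within_ms <- colSums((scores - group_means[as.integer(g), ])^2) / (n - K)

# The ratio should match the squared singular values
f_stat <- between_ms / within_ms
all.equal(unname(f_stat), fit_iris$svd^2)
```

Under these assumptions the within-group mean square of each discriminant equals 1, so the squared singular values directly measure the between-group spread.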

A nice way to visualise LDA is to plot the discriminant scores and add group contours (or ellipses) to show class separation clearly.

library(ggplot2)

lda_pred <- predict(wine.lda, test)

df <- data.frame(
  LD1 = lda_pred$x[,1],
  LD2 = lda_pred$x[,2],
  Cultivar = test$Cultivar
)

ggplot(df, aes(x = LD1, y = LD2, color = Cultivar)) +
  geom_point(size = 2, alpha = 0.8) +
  stat_ellipse(level = 0.95, linewidth = 1) +
  theme_minimal() +
  labs(
    title = "LDA: Discriminant analysis with class contours",
    x = "Linear Discriminant 1",
    y = "Linear Discriminant 2"
  )

Well-separated ellipses indicate strong discriminative power of the model, while overlapping contours suggest that some classes are not well separated in the discriminant space.

Making predictions

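The predict() function applied to an LDA model returns a list with three components:

class: the predicted class of each observation, i.e. the class with the highest posterior probability.
posterior: a matrix with the posterior probability of each class for each observation.
x: the discriminant scores (LD1 and LD2) of each observation.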

Together, these outputs allow you to both classify new observations and understand how the model separates the groups.

To make predictions on new data:

lda_pred <- predict(wine.lda, test)

lda_pred
#> $class
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 3 3 3 3 3 3 3 3
#> Levels: 1 2 3
#> 
#> $posterior
#>                1            2            3
#> 4   1.000000e+00 7.625093e-14 1.194116e-18
#> 7   1.000000e+00 7.621901e-12 1.228083e-17
#> 8   1.000000e+00 2.733462e-11 9.309561e-18
#> 15  1.000000e+00 4.467867e-15 1.910570e-22
#> 19  1.000000e+00 9.041692e-16 6.538205e-24
#> 25  9.997005e-01 2.995355e-04 1.361809e-13
#> 39  9.964906e-01 3.509374e-03 6.494207e-14
#> 48  9.999984e-01 1.593550e-06 1.311790e-15
#> 50  1.000000e+00 1.089628e-09 5.640580e-17
#> 57  9.999989e-01 1.104966e-06 4.385765e-13
#> 59  1.000000e+00 3.215679e-10 5.626867e-18
#> 65  3.181208e-09 1.000000e+00 2.884945e-09
#> 83  3.317497e-09 1.000000e+00 1.759653e-08
#> 84  5.167999e-07 7.148702e-01 2.851293e-01
#> 87  6.469626e-11 1.000000e+00 6.415609e-09
#> 93  9.030532e-09 9.999971e-01 2.877281e-06
#> 94  6.941453e-08 9.999999e-01 2.650275e-10
#> 95  1.618211e-09 1.000000e+00 2.340817e-09
#> 99  1.003244e-03 9.989968e-01 1.177024e-14
#> 109 1.123824e-10 1.000000e+00 1.415239e-10
#> 117 5.382296e-10 1.000000e+00 3.287325e-10
#> 120 1.269949e-07 9.999999e-01 1.367816e-08
#> 122 8.680420e-01 1.319580e-01 4.627266e-20
#> 124 1.330437e-04 9.998635e-01 3.451807e-06
#> 125 1.501618e-05 9.999850e-01 1.483586e-09
#> 131 6.357776e-07 8.557655e-01 1.442339e-01
#> 139 3.467741e-13 7.684678e-07 9.999992e-01
#> 151 2.352562e-13 6.870259e-08 9.999999e-01
#> 153 1.200220e-17 1.065000e-07 9.999999e-01
#> 159 1.865705e-20 2.926708e-15 1.000000e+00
#> 161 7.210379e-18 3.139046e-09 1.000000e+00
#> 170 6.318248e-17 7.197804e-13 1.000000e+00
#> 175 1.147842e-15 2.126589e-10 1.000000e+00
#> 177 5.756769e-14 5.187351e-10 1.000000e+00
#> 
#> $x
#>             LD1        LD2
#> 4   -4.67905134  4.3433036
#> 7   -4.39171547  3.4938948
#> 8   -4.42748428  3.1608100
#> 15  -5.76942375  4.0953836
#> 19  -6.19000367  4.1191197
#> 25  -3.24261204  0.3170844
#> 39  -3.33698672 -0.3470066
#> 48  -3.81851104  1.0726561
#> 50  -4.20549160  2.4742440
#> 57  -3.09143800  1.7734663
#> 59  -4.49262759  2.5211950
#> 65   0.42484214 -3.1502063
#> 83   0.64573025 -2.9537293
#> 84   2.09500393 -0.4530168
#> 87   1.00838974 -3.5812316
#> 93   1.15871851 -2.2830477
#> 94  -0.25642231 -2.9943826
#> 95   0.48262766 -3.2617133
#> 99  -2.69878327 -2.7849462
#> 109  0.46292840 -3.9108860
#> 117  0.37383932 -3.6146398
#> 120  0.16170501 -2.4980411
#> 122 -5.09294018 -2.7227242
#> 124 -0.01011004 -0.9938402
#> 125 -0.70859694 -2.1010743
#> 131  1.98390542 -0.5403599
#> 139  4.02886232  1.0682580
#> 151  4.07917577  1.5913706
#> 153  5.30582702  0.1793505
#> 159  6.12412086  3.4656026
#> 161  5.37221817  0.9503615
#> 170  5.11018061  3.2313251
#> 175  4.74513957  2.2616632
#> 177  4.25828218  2.5675517
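The three components are closely linked. The sketch below, which again uses the built-in iris data as a stand-in for the wine data, checks that the predicted class is the column with the largest posterior probability. It then rebuilds the posterior probabilities from the discriminant scores, assuming the default plug-in method of predict(): the prior times a Gaussian kernel around each class centroid in discriminant space, normalised over the classes. The object names are illustrative.

```r
library(MASS)

fit_iris <- lda(Species ~ ., data = iris)
pred <- predict(fit_iris, iris)

# Each row of the posterior matrix sums to 1
range(rowSums(pred$posterior))

# The predicted class is the column with the largest posterior probability
class_from_post <- colnames(pred$posterior)[apply(pred$posterior, 1, which.max)]
all(class_from_post == as.character(pred$class))

# Class centroids in discriminant space (K x 2 matrix)
scores <- pred$x
centroids <- apply(scores, 2, function(s) tapply(s, iris$Species, mean))

# Unnormalised log posterior: log prior minus half the squared distance to each centroid
log_post <- sapply(seq_len(nrow(centroids)), function(k) {
  d2 <- rowSums(sweep(scores, 2, centroids[k, ])^2)
  log(fit_iris$prior[k]) - 0.5 * d2
})

# Subtract the row maximum for numerical stability, then normalise each row
post_manual <- exp(log_post - apply(log_post, 1, max))
post_manual <- post_manual / rowSums(post_manual)
all.equal(post_manual, pred$posterior, check.attributes = FALSE)
```

This also explains why observations far from every other centroid, such as most wines in the output above, receive posterior probabilities very close to 0 or 1.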

A standard way to assess classification performance is through a confusion matrix, which summarises how many observations are correctly and incorrectly classified for each class. In R, this can be computed using the confusionMatrix() function from the caret package:

library(caret)

confusionMatrix(lda_pred$class, test$Cultivar)
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  1  2  3
#>          1 11  1  0
#>          2  0 13  1
#>          3  0  0  8
#> 
#> Overall Statistics
#>                                           
#>                Accuracy : 0.9412          
#>                  95% CI : (0.8032, 0.9928)
#>     No Information Rate : 0.4118          
#>     P-Value [Acc > NIR] : 9.446e-11       
#>                                           
#>                   Kappa : 0.9101          
#>                                           
#>  Mcnemar's Test P-Value : NA              
#> 
#> Statistics by Class:
#> 
#>                      Class: 1 Class: 2 Class: 3
#> Sensitivity            1.0000   0.9286   0.8889
#> Specificity            0.9565   0.9500   1.0000
#> Pos Pred Value         0.9167   0.9286   1.0000
#> Neg Pred Value         1.0000   0.9500   0.9615
#> Prevalence             0.3235   0.4118   0.2647
#> Detection Rate         0.3235   0.3824   0.2353
#> Detection Prevalence   0.3529   0.4118   0.2353
#> Balanced Accuracy      0.9783   0.9393   0.9444
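The main statistics in this output can be reproduced by hand from the confusion matrix. The sketch below re-enters the counts printed above (the object name cm is illustrative) and computes the accuracy and the per-class sensitivity, specificity and positive predictive value in base R.

```r
# Confusion matrix copied from the output above (rows: prediction, columns: reference)
cm <- matrix(c(11,  1, 0,
                0, 13, 1,
                0,  0, 8),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = as.character(1:3),
                             Reference  = as.character(1:3)))

# Accuracy: correctly classified observations (the diagonal) over all observations
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: correctly classified observations of a class over all true members
sensitivity <- diag(cm) / colSums(cm)

# Positive predictive value: correct predictions of a class over all such predictions
ppv <- diag(cm) / rowSums(cm)

# Specificity: true negatives over all observations that do not belong to the class
true_neg <- sum(cm) - rowSums(cm) - colSums(cm) + diag(cm)
specificity <- true_neg / (sum(cm) - colSums(cm))

round(accuracy, 4)
round(rbind(sensitivity, specificity, ppv), 4)
```

In total, 32 of the 34 test wines are classified correctly: one wine of cultivar 2 is assigned to cultivar 1, and one wine of cultivar 3 is assigned to cultivar 2.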

Another useful way to assess how well the LDA model separates the classes is by examining the distribution of the discriminant scores. This can be done using the ldahist() function from the MASS package.

The function produces histograms of the LDA scores for each class, allowing us to visually compare how well the groups are separated along a given discriminant axis (typically LD1 and LD2).

ldahist(lda_pred$x[,1], test$Cultivar)  # LD1

ldahist(lda_pred$x[,2], test$Cultivar)  # LD2

These plots show the distribution of each class along the linear discriminant axes. If the LDA model performs well, the histograms for different classes will show clear separation with minimal overlap. In contrast, substantial overlap between classes indicates that the discriminant functions are less effective at distinguishing between groups.