LDA_Classification_Iris

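Introduction :-

This report classifies Iris flowers into their three species using linear discriminant analysis (LDA). The structure of the dataset is as follows.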

## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Data Partitioning :-

The 150 records are split into a training set of 86 records and a testing set of 64 records.
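The report does not show how the 86/64 split was drawn. The Python sketch below assumes simple random sampling of row indices without replacement; the function name split_indices and the seed value are hypothetical.

```python
import random


def split_indices(n_rows, n_train, seed):
    """Randomly partition row indices 0..n_rows-1 into train and test lists."""
    rng = random.Random(seed)  # local generator so the split is reproducible
    train = sorted(rng.sample(range(n_rows), n_train))  # sampling without replacement
    train_set = set(train)
    test = [i for i in range(n_rows) if i not in train_set]
    return train, test


# seed=42 is a hypothetical value; the original seed is not shown.
train_idx, test_idx = split_indices(n_rows=150, n_train=86, seed=42)
print(len(train_idx), len(test_idx))  # 86 64
```

Because simple random sampling does not stratify by species, the class counts in the training set need not be equal, which is consistent with the 33/27/26 training counts reported below.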

Applying LDA :-

The summary of the LDA model fitted to the training data with lda() is as follows.

## Call:
## lda(Species ~ ., data = training_data)
## 
## Prior probabilities of groups:
##     setosa versicolor  virginica 
##  0.3837209  0.3139535  0.3023256 
## 
## Group means:
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa         4.975758    3.357576     1.472727   0.2454545
## versicolor     5.974074    2.751852     4.281481   1.3407407
## virginica      6.580769    2.946154     5.553846   1.9807692
## 
## Coefficients of linear discriminants:
##                    LD1        LD2
## Sepal.Length  1.252207 -0.1229923
## Sepal.Width   1.115823  2.2711963
## Petal.Length -2.616277 -0.7924520
## Petal.Width  -2.156489  2.6956343
## 
## Proportion of trace:
##    LD1    LD2 
## 0.9937 0.0063

Out of the 86 training records, the count for each species is as follows.

##     setosa versicolor  virginica 
##         33         27         26

That is, in the training data 38.37 % of the records belong to setosa, 31.39 % to versicolor and 30.23 % to virginica. These proportions are the prior probabilities reported in the lda() summary.
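When no prior argument is supplied, MASS::lda() uses the class proportions of the training set as the prior probabilities. The short Python sketch below (an illustration only; the report itself uses R, and priors_from_counts is a hypothetical helper) reproduces the printed priors from the class counts.

```python
# Class counts in the training data, copied from the table above.
counts = {"setosa": 33, "versicolor": 27, "virginica": 26}


def priors_from_counts(counts):
    """Prior of each class = number of its records / total records."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


priors = priors_from_counts(counts)
for label, p in priors.items():
    print(f"{label:<10} {p:.7f}")
```

Rounded to seven decimals, the printed values agree with the "Prior probabilities of groups" line of the summary.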

The mean of each measurement within each species (the group means) is as follows.

##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa         4.975758    3.357576     1.472727   0.2454545
## versicolor     5.974074    2.751852     4.281481   1.3407407
## virginica      6.580769    2.946154     5.553846   1.9807692

The two linear discriminant functions that separate the three species in the training data are:

\(LD1 = 1.252207 \cdot \text{Sepal.Length} + 1.115823 \cdot \text{Sepal.Width} - 2.616277 \cdot \text{Petal.Length} - 2.156489 \cdot \text{Petal.Width}\)

\(LD2 = -0.1229923 \cdot \text{Sepal.Length} + 2.2711963 \cdot \text{Sepal.Width} - 0.7924520 \cdot \text{Petal.Length} + 2.6956343 \cdot \text{Petal.Width}\)
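To our understanding of MASS's predict.lda(), discriminant scores are computed after centring each observation on the prior-weighted average of the group means, so the two functions give scores only up to an additive constant per function. The Python sketch below assumes that centring (the names CENTER and ld_scores are hypothetical) and evaluates both functions at each species' group mean, using the coefficients, priors and group means copied from the lda() summary.

```python
# Coefficients of the linear discriminants, copied from the lda() summary.
# Predictor order: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
LD1 = [1.252207, 1.115823, -2.616277, -2.156489]
LD2 = [-0.1229923, 2.2711963, -0.7924520, 2.6956343]

PRIORS = {"setosa": 0.3837209, "versicolor": 0.3139535, "virginica": 0.3023256}
MEANS = {
    "setosa": [4.975758, 3.357576, 1.472727, 0.2454545],
    "versicolor": [5.974074, 2.751852, 4.281481, 1.3407407],
    "virginica": [6.580769, 2.946154, 5.553846, 1.9807692],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumption: centre on the prior-weighted average of the group means.
CENTER = [sum(PRIORS[k] * MEANS[k][j] for k in PRIORS) for j in range(4)]


def ld_scores(x):
    """Return (LD1, LD2) for one observation x of the four measurements."""
    centred = [xi - ci for xi, ci in zip(x, CENTER)]
    return dot(LD1, centred), dot(LD2, centred)


for species, mean in MEANS.items():
    ld1, ld2 = ld_scores(mean)
    print(f"{species:<10} LD1={ld1:7.3f} LD2={ld2:7.3f}")
```

Setosa lands at the large positive end of LD1 and virginica at the negative end, with versicolor in between, which is why LD1 alone carries almost all of the separation.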

The first discriminant function accounts for 99.37 % of the between-group separation (the "Proportion of trace" in the summary), and the second accounts for the remaining 0.63 %.
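To our reading of MASS's print method, the "Proportion of trace" is derived from the singular values stored in the fitted object as svd^2 / sum(svd^2). The Python sketch below shows that calculation; the function name and the input values are hypothetical, since the summary does not print the singular values themselves.

```python
def proportion_of_trace(singular_values):
    """Share of between-group separation carried by each discriminant.

    Computes svd^2 / sum(svd^2) for a list of singular values.
    """
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [sq / total for sq in squares]


# Hypothetical singular values, for illustration only.
print(proportion_of_trace([3.0, 4.0]))  # [0.36, 0.64]
```

Working backwards from the rounded proportions, 0.9937 / 0.0063 is roughly 158, so the first singular value is about 12.6 times the second.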

Stacked Histograms of Discriminant values :-

The histogram of the first discriminant (LD1) values for each of the three species is as follows.

The histogram of the second discriminant (LD2) values for each of the three species is as follows.

Bi-Plot :-

The training data points plotted in the space of the two linear discriminants are as follows.

As described in the LDA summary and shown in the histograms,

      * 99.37 % of the separation between the species is explained by LD1 (x-axis)
      * 0.63 % of the separation between the species is explained by LD2 (y-axis)

Partition Plot :-

Evaluation :-

The first six predicted and observed classes for the training data (coded 1 = setosa, 2 = versicolor, 3 = virginica) are as follows.

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         1        1
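Each record is assigned to the species with the largest posterior probability. The Python sketch below reproduces what we understand to be the default "plug-in" rule of MASS's predict.lda(): project the record into discriminant space, then score each species by half the squared distance to its group mean there minus the log prior. The centring point and the helper names (project, posteriors, predict) are assumptions for illustration; because the printed coefficients are rounded, posteriors very close to a decision boundary may differ slightly from R's.

```python
import math

# Values copied from the lda() summary; predictor order is
# Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.
LD1 = [1.252207, 1.115823, -2.616277, -2.156489]
LD2 = [-0.1229923, 2.2711963, -0.7924520, 2.6956343]
PRIORS = {"setosa": 0.3837209, "versicolor": 0.3139535, "virginica": 0.3023256}
MEANS = {
    "setosa": [4.975758, 3.357576, 1.472727, 0.2454545],
    "versicolor": [5.974074, 2.751852, 4.281481, 1.3407407],
    "virginica": [6.580769, 2.946154, 5.553846, 1.9807692],
}
# Assumed centring point: prior-weighted average of the group means.
CENTER = [sum(PRIORS[k] * MEANS[k][j] for k in PRIORS) for j in range(4)]


def project(x):
    """Map one observation to its (LD1, LD2) coordinates."""
    centred = [xi - ci for xi, ci in zip(x, CENTER)]
    return [sum(c * v for c, v in zip(coef, centred)) for coef in (LD1, LD2)]


# Group means expressed in discriminant space.
GROUP_LD = {k: project(m) for k, m in MEANS.items()}


def posteriors(x):
    """Posterior class probabilities for observation x."""
    z = project(x)
    # Half the squared distance to each group mean in LD space, minus log prior.
    score = {
        k: 0.5 * sum((zi - mi) ** 2 for zi, mi in zip(z, GROUP_LD[k]))
        - math.log(PRIORS[k])
        for k in PRIORS
    }
    best = min(score.values())  # subtract the minimum for numerical stability
    weights = {k: math.exp(-(s - best)) for k, s in score.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def predict(x):
    """Class with the largest posterior probability."""
    post = posteriors(x)
    return max(post, key=post.get)


# First record of the dataset, as shown in the structure output above.
print(predict([5.1, 3.5, 1.4, 0.2]))  # setosa
```

Since the within-group spread is the same in every direction of the discriminant space, this amounts to choosing the nearest group mean there, with a small adjustment for the unequal priors.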

Confusion Matrix (training data) :-

##             Actual
## Predicted    setosa versicolor virginica
##   setosa         33          0         0
##   versicolor      0         26         1
##   virginica       0          1        25

Training data accuracy = 97.67 % (84 of 86 records correctly classified)

The first six predicted and observed classes for the testing data are as follows.

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         1        1

Confusion Matrix (testing data) :-

##             Actual
## Predicted    setosa versicolor virginica
##   setosa         17          0         0
##   versicolor      0         22         0
##   virginica       0          1        24

Testing data accuracy = 98.44 % (63 of 64 records correctly classified)
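Both accuracy figures are the share of records on the diagonal of the corresponding confusion matrix. The Python sketch below (accuracy is a hypothetical helper) recomputes them from the two matrices reported above.

```python
def accuracy(confusion):
    """Fraction of records on the diagonal (predicted == actual)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows = predicted, columns = actual (setosa, versicolor, virginica).
train_cm = [[33, 0, 0], [0, 26, 1], [0, 1, 25]]
test_cm = [[17, 0, 0], [0, 22, 0], [0, 1, 24]]
print(f"{accuracy(train_cm):.2%}")  # 97.67%
print(f"{accuracy(test_cm):.2%}")   # 98.44%
```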
