Introduction :-
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
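The listing above is the kind of output `str()` prints for R's built-in `iris` data set. A minimal sketch, assuming the analysis used that built-in copy:

```r
# Load the built-in iris data set and inspect its structure.
data(iris)
str(iris)

# Class balance: 50 flowers per species.
table(iris$Species)
```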
Data Partitioning :-
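The outputs in this report imply 86 training rows and 64 testing rows (86 + 64 = 150), but the splitting code itself is not shown. A minimal sketch, assuming a simple random split; the seed and the name `testing_data` are hypothetical, so the class counts it yields will not necessarily match those reported:

```r
data(iris)

set.seed(123)  # hypothetical seed, not taken from the original analysis
n_train <- 86  # training-set size implied by the reported output

# Draw row indices for the training set without replacement.
train_idx <- sample(seq_len(nrow(iris)), size = n_train)

training_data <- iris[train_idx, ]
testing_data  <- iris[-train_idx, ]  # hypothetical name for the held-out rows

nrow(training_data)  # 86
nrow(testing_data)   # 64
```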
Applying LDA :-
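The model call shown in the summary is `lda(Species ~ ., data = training_data)`, which matches the `lda()` function from the MASS package. A minimal sketch of the fit, assuming a random 86-row training set (the seed is hypothetical, so the numbers printed will differ slightly from those reported):

```r
library(MASS)  # provides lda()

data(iris)
set.seed(123)  # hypothetical seed
train_idx <- sample(seq_len(nrow(iris)), size = 86)
training_data <- iris[train_idx, ]

# Fit LDA with Species as the response and all other columns as predictors.
lda_fit <- lda(Species ~ ., data = training_data)
lda_fit  # prints priors, group means, coefficients and proportion of trace
```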
The summary of the fitted LDA model is as follows.
## Call:
## lda(Species ~ ., data = training_data)
##
## Prior probabilities of groups:
##     setosa versicolor  virginica
##  0.3837209  0.3139535  0.3023256
##
## Group means:
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa         4.975758    3.357576     1.472727   0.2454545
## versicolor     5.974074    2.751852     4.281481   1.3407407
## virginica      6.580769    2.946154     5.553846   1.9807692
##
## Coefficients of linear discriminants:
##                    LD1        LD2
## Sepal.Length  1.252207 -0.1229923
## Sepal.Width   1.115823  2.2711963
## Petal.Length -2.616277 -0.7924520
## Petal.Width  -2.156489  2.6956343
##
## Proportion of trace:
##    LD1    LD2
## 0.9937 0.0063
Out of the 86 training records, the count for each category is as follows.
##     setosa versicolor  virginica
##         33         27         26
That is, in the given training data, 38.37 % of records belong to setosa, 31.40 % to versicolor, and 30.23 % to virginica. These shares are the prior probabilities reported in the summary.
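The percentages follow directly from the class counts. A small sketch of the arithmetic, using only the counts reported for the training data:

```r
# Class counts reported for the 86 training records.
counts <- c(setosa = 33, versicolor = 27, virginica = 26)

# Prior probability of each class = class count / total count.
priors <- counts / sum(counts)
round(100 * priors, 2)  # percentages: 38.37, 31.40, 30.23
```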
The mean value of each predictor within each species is as follows.
##            Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa         4.975758    3.357576     1.472727   0.2454545
## versicolor     5.974074    2.751852     4.281481   1.3407407
## virginica      6.580769    2.946154     5.553846   1.9807692
The three classes in the given dataset are separated by the following two discriminant functions, whose coefficients come from the summary above.
\(LD1 = 1.252207 \times \text{Sepal.Length} + 1.115823 \times \text{Sepal.Width} - 2.616277 \times \text{Petal.Length} - 2.156489 \times \text{Petal.Width}\)
\(LD2 = -0.1229923 \times \text{Sepal.Length} + 2.2711963 \times \text{Sepal.Width} - 0.7924520 \times \text{Petal.Length} + 2.6956343 \times \text{Petal.Width}\)
The first discriminant function accounts for 99.37 % of the separation between the groups (the proportion of trace for LD1).
The second discriminant function accounts for the remaining 0.63 %.
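To make the discriminant functions concrete, the sketch below applies the reported coefficients to a single flower. The example measurements (5.1, 3.5, 1.4, 0.2) are the first row of `iris`; the names `coefs`, `flower` and `raw_scores` are illustrative only.

```r
# Coefficients of linear discriminants, copied from the LDA summary.
coefs <- matrix(
  c( 1.252207, -0.1229923,
     1.115823,  2.2711963,
    -2.616277, -0.7924520,
    -2.156489,  2.6956343),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
    c("LD1", "LD2")
  )
)

# Example flower: the first row of iris (a setosa).
flower <- c(Sepal.Length = 5.1, Sepal.Width = 3.5,
            Petal.Length = 1.4, Petal.Width = 0.2)

# Uncentred linear combinations: one value per discriminant.
raw_scores <- drop(flower %*% coefs)
raw_scores
```

Note that `predict()` on an `lda` fit centres the predictors before applying these coefficients, so its scores differ from these raw values by a constant offset per discriminant.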
Stacked Histograms of Discriminant values :-
The histogram of the first discriminant values for the three different groups is as follows.
The histogram of the second discriminant values for the three different groups is as follows.
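Stacked histograms of this kind can be drawn with `ldahist()` from MASS. A minimal sketch, assuming a random 86-row training set with a hypothetical seed, so the plots will resemble rather than reproduce the originals:

```r
library(MASS)  # lda(), ldahist()

data(iris)
set.seed(123)  # hypothetical seed
training_data <- iris[sample(seq_len(nrow(iris)), size = 86), ]

lda_fit <- lda(Species ~ ., data = training_data)
scores  <- predict(lda_fit, training_data)$x  # columns LD1 and LD2

# One histogram panel per species, stacked on a shared x-axis.
ldahist(data = scores[, "LD1"], g = training_data$Species)
ldahist(data = scores[, "LD2"], g = training_data$Species)
```

With LD1 carrying almost all of the separation, its panels should show little overlap between species, while the LD2 panels overlap heavily.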
Bi-Plot :-
The training data points, plotted in the space of the two linear discriminant functions, are as follows.
As described in the summary of the LDA model and shown in the histograms,
* 99.37 % of the separation of data points is explained by LD1 (x-axis)
* 0.63 % of the separation of data points is explained by LD2 (y-axis)
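The sketch below draws the training points in the LD1/LD2 space and labels each axis with its proportion of trace, computed from the singular values stored in the fit. It is a plain base-R scatter plot of the scores rather than a full biplot with variable arrows, and the split and seed are assumptions.

```r
library(MASS)

data(iris)
set.seed(123)  # hypothetical seed
training_data <- iris[sample(seq_len(nrow(iris)), size = 86), ]

lda_fit <- lda(Species ~ ., data = training_data)
scores  <- predict(lda_fit, training_data)$x

# Proportion of trace: share of between-group variance per discriminant.
prop_trace <- lda_fit$svd^2 / sum(lda_fit$svd^2)

plot(scores[, "LD1"], scores[, "LD2"],
     col = as.integer(training_data$Species), pch = 19,
     xlab = sprintf("LD1 (%.2f%%)", 100 * prop_trace[1]),
     ylab = sprintf("LD2 (%.2f%%)", 100 * prop_trace[2]))
legend("topright", legend = levels(training_data$Species),
       col = 1:3, pch = 19)
```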
Partition Plot :-
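The code behind the partition plot is not shown. One common way to draw such a plot is `partimat()` from the klaR package; whether the original used it is an assumption, as are the split and seed below.

```r
data(iris)
set.seed(123)  # hypothetical seed
training_data <- iris[sample(seq_len(nrow(iris)), size = 86), ]

# partimat() comes from the klaR package; skip quietly if it is not installed.
if (requireNamespace("klaR", quietly = TRUE)) {
  # One panel per pair of predictors, showing the LDA decision regions.
  klaR::partimat(Species ~ ., data = training_data, method = "lda")
}
```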
Evaluation :-
The predicted and observed classes for the training data are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 1 1
**Confusion Matrix :-**
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         33          0         0
##   versicolor      0         26         1
##   virginica       0          1        25
Training data accuracy = 97.67 %
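The accuracy figure is the share of records on the diagonal of the confusion matrix. A small sketch of that arithmetic, using the training counts reported above:

```r
# Training confusion matrix (rows = predicted, columns = actual).
classes <- c("setosa", "versicolor", "virginica")
conf_train <- matrix(
  c(33,  0,  0,
     0, 26,  1,
     0,  1, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(Predicted = classes, Actual = classes)
)

# Accuracy = correctly classified records (diagonal) / all records.
accuracy <- sum(diag(conf_train)) / sum(conf_train)
round(100 * accuracy, 2)  # 84 / 86 -> 97.67
```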
The predicted and observed classes for the testing data are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 1 1
**Confusion Matrix :-**
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         17          0         0
##   versicolor      0         22         0
##   virginica       0          1        24
Testing data accuracy = 98.44 %
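The reported 98.44 % corresponds to 63 of the 64 testing records being classified correctly. The sketch below runs the whole evaluation on held-out data; the split, the seed and the name `testing_data` are assumptions, so the exact counts may differ from those shown.

```r
library(MASS)

data(iris)
set.seed(123)  # hypothetical seed
train_idx <- sample(seq_len(nrow(iris)), size = 86)
training_data <- iris[train_idx, ]
testing_data  <- iris[-train_idx, ]  # hypothetical name for the held-out rows

lda_fit <- lda(Species ~ ., data = training_data)

# Classify the held-out flowers and cross-tabulate against the true species.
pred_test <- predict(lda_fit, testing_data)$class
conf_test <- table(Predicted = pred_test, Actual = testing_data$Species)
conf_test

# Share of held-out records on the diagonal.
test_accuracy <- sum(diag(conf_test)) / sum(conf_test)
round(100 * test_accuracy, 2)
```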