Introduction :-
## 'data.frame': 45 obs. of 5 variables:
## $ group : chr "A" "A" "A" "A" ...
## $ security : int 8 7 8 9 10 7 9 8 7 8 ...
## $ ecology : int 6 7 6 5 8 5 7 7 6 7 ...
## $ innovation: int 8 8 9 7 9 6 10 9 7 8 ...
## $ prestige : int 10 9 9 9 10 8 9 9 9 8 ...
## 'data.frame': 45 obs. of 5 variables:
## $ group : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
## $ security : int 8 7 8 9 10 7 9 8 7 8 ...
## $ ecology : int 6 7 6 5 8 5 7 7 6 7 ...
## $ innovation: int 8 8 9 7 9 6 10 9 7 8 ...
## $ prestige : int 10 9 9 9 10 8 9 9 9 8 ...
## group security ecology innovation prestige
## A:15 Min. : 3.000 Min. : 5.000 Min. : 3.0 Min. : 2.000
## B:15 1st Qu.: 5.000 1st Qu.: 7.000 1st Qu.: 4.0 1st Qu.: 3.000
## C:15 Median : 8.000 Median : 8.000 Median : 5.0 Median : 6.000
## Mean : 7.133 Mean : 7.533 Mean : 5.6 Mean : 5.711
## 3rd Qu.: 9.000 3rd Qu.: 8.000 3rd Qu.: 7.0 3rd Qu.: 8.000
## Max. :10.000 Max. :10.000 Max. :10.0 Max. :10.000
## 'data.frame': 30 obs. of 5 variables:
## $ group : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
## $ security : int 8 7 8 9 10 7 9 8 7 8 ...
## $ ecology : int 6 7 6 5 8 5 7 7 6 7 ...
## $ innovation: int 8 8 9 7 9 6 10 9 7 8 ...
## $ prestige : int 10 9 9 9 10 8 9 9 9 8 ...
## group security ecology innovation prestige
## A:15 Min. : 7.000 Min. : 5.000 Min. : 4.000 Min. : 5.000
## B:15 1st Qu.: 8.000 1st Qu.: 6.250 1st Qu.: 5.000 1st Qu.: 6.000
## Median : 8.000 Median : 7.500 Median : 6.000 Median : 6.500
## Mean : 8.433 Mean : 7.367 Mean : 6.633 Mean : 7.133
## 3rd Qu.: 9.000 3rd Qu.: 8.000 3rd Qu.: 8.000 3rd Qu.: 9.000
## Max. :10.000 Max. :10.000 Max. :10.000 Max. :10.000
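The outputs above are consistent with three preparation steps: converting `group` from character to factor, keeping only groups A and B, and dropping the unused level C. The code that produced them is not shown, so the following is only a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the original data).

```r
# Hypothetical stand-in for the original data; the values are made up.
toy <- data.frame(
  group    = rep(c("A", "B", "C"), each = 3),
  security = c(8L, 7L, 8L, 9L, 10L, 7L, 3L, 4L, 5L),
  prestige = c(10L, 9L, 9L, 5L, 6L, 5L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

# Step 1: convert the character column to a factor (three levels).
toy$group <- factor(toy$group)

# Step 2: keep only the records of groups A and B.
toy_ab <- toy[toy$group %in% c("A", "B"), ]

# Step 3: drop the now-empty level "C" so the factor has two levels.
toy_ab$group <- droplevels(toy_ab$group)

str(toy_ab)
summary(toy_ab)
```

Without the `droplevels()` step the factor would still carry level C with zero records, which `lda()` would report as an empty group.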
Data Partitioning :-
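The partitioning code is not shown. The record counts reported later (19 training and 11 testing records out of 30) suggest a simple random split, which could be sketched as follows. The seed, the data frame `data_ab`, and the name `testing_data` are assumptions for illustration; only `training_data` appears in the original output.

```r
# Hypothetical two-group data with 30 records (15 per group), ids only.
data_ab <- data.frame(
  id    = 1:30,
  group = factor(rep(c("A", "B"), each = 15))
)

set.seed(1)  # assumed seed; the original seed is unknown

# Draw 19 row indices at random for training; the rest form the test set.
train_idx     <- sample(seq_len(nrow(data_ab)), size = 19)
training_data <- data_ab[train_idx, ]
testing_data  <- data_ab[-train_idx, ]

# Group counts in each part; a non-stratified split need not be balanced.
table(training_data$group)
table(testing_data$group)
```

Because the split is not stratified, the group proportions in the training data can differ from the 50/50 proportions of the full data, as they do in the report (8 A versus 11 B).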
Applying LDA :-
The summary of the fitted LDA model is as follows.
## Call:
## lda(group ~ ., data = training_data)
##
## Prior probabilities of groups:
## A B
## 0.4210526 0.5789474
##
## Group means:
## security ecology innovation prestige
## A 8.500000 6.375000 7.125000 8.625000
## B 8.636364 8.090909 5.272727 5.454545
##
## Coefficients of linear discriminants:
## LD1
## security 0.06891706
## ecology 0.19335376
## innovation 0.30970542
## prestige -1.30619707
Out of the 19 training records, the count for each group is as follows.
## A B
## 8 11
That is, in the training data 42.11 % of the records belong to group A and 57.89 % belong to group B; these proportions are the prior probabilities reported in the model summary.
The mean of each variable within each group is as follows.
## security ecology innovation prestige
## A 8.500000 6.375000 7.125000 8.625000
## B 8.636364 8.090909 5.272727 5.454545
Because the data contain only two groups, a single discriminant function separates them:
\(LD1 = 0.06891706 \times \text{security} + 0.19335376 \times \text{ecology} + 0.30970542 \times \text{innovation} - 1.30619707 \times \text{prestige}\)
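As a worked check, the function can be evaluated at the two group means printed above. This is a sketch using only the printed coefficients and means. Note that `predict()` for a `MASS::lda` fit centres the predictors on the prior-weighted group means before applying the coefficients, so the scores it reports in `$x` differ from the raw formula by a constant offset.

```r
# Coefficients and group means copied from the printed LDA summary.
coefs <- c(security = 0.06891706, ecology = 0.19335376,
           innovation = 0.30970542, prestige = -1.30619707)
means <- rbind(
  A = c(8.500000, 6.375000, 7.125000, 8.625000),
  B = c(8.636364, 8.090909, 5.272727, 5.454545)
)
colnames(means) <- names(coefs)

# Raw LD1 value at each group mean (no centring).
raw <- drop(means %*% coefs)

# Centre on the prior-weighted mean, as predict() does for an lda fit.
prior    <- c(A = 8, B = 11) / 19
centered <- raw - sum(prior * raw)

round(raw, 3)
round(centered, 3)
```

After centring, the group A mean falls on the negative side of LD1 and the group B mean on the positive side, driven mainly by the large negative coefficient on prestige.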
Stacked Histograms of Discriminant values :-
The histogram of the first discriminant values in the two groups is as follows.
Bi-Plot :-
The transformed linear discriminant space, with all data points from the training dataset, is as follows.
As described in the LDA summary and the histograms, the bi-plot shows that LD1 separates the two classes very well in the transformed space.
Partition Plot :-
Evaluation :-
The predicted and observed classes for the training data (first six rows; 1 = A, 2 = B) are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 2 1
Confusion Matrix :-
## Actual
## Predicted A B
## A 7 0
## B 1 11
Training data accuracy = 94.74 %
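The accuracy follows directly from the confusion matrix: correct predictions sit on the diagonal. A short sketch that re-enters the printed matrix by hand (the object name `conf_train` is an assumption):

```r
# Training confusion matrix copied from the output above.
# matrix() fills column by column: column "A" is (7, 1), column "B" is (0, 11).
conf_train <- matrix(
  c(7, 1, 0, 11), nrow = 2,
  dimnames = list(Predicted = c("A", "B"), Actual = c("A", "B"))
)

# Overall accuracy: correctly classified records over all records.
accuracy <- sum(diag(conf_train)) / sum(conf_train)

# Per-group recall: correct predictions over the actual group size.
recall <- diag(conf_train) / colSums(conf_train)

round(100 * accuracy, 2)
round(100 * recall, 2)
```

Here 18 of 19 records are classified correctly; the single error is a group A record predicted as B, so recall is 87.5 % for group A and 100 % for group B.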
The predicted and observed classes for the testing data (first six rows; 1 = A, 2 = B) are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 1 1
Confusion Matrix :-
## Actual
## Predicted A B
## A 7 0
## B 0 4
Testing data accuracy = 100 %
Predicting real time data :-
## $class
## [1] A B
## Levels: A B
##
## $posterior
## A B
## 1 0.9764860254 0.02351397
## 2 0.0003922625 0.99960774
##
## $x
## LD1
## 1 -1.343396
## 2 1.616511
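With two groups, the posterior probabilities in `$posterior` can be recovered from the LD1 scores in `$x`, the priors, and the group means on LD1. The sketch below uses only values printed in this report and assumes the usual LDA model in which both groups have unit variance along LD1 (the scaling `MASS::lda` applies); it is a check on the output, not the package's internal code.

```r
# Printed coefficients, group means and priors from the first LDA fit.
coefs <- c(0.06891706, 0.19335376, 0.30970542, -1.30619707)
means <- rbind(
  A = c(8.500000, 6.375000, 7.125000, 8.625000),
  B = c(8.636364, 8.090909, 5.272727, 5.454545)
)
prior <- c(A = 8, B = 11) / 19

# Group means on LD1, centred on the prior-weighted mean like predict().
raw <- drop(means %*% coefs)
mu  <- raw - sum(prior * raw)

# LD1 scores of the two new records, copied from $x.
ld1 <- c(-1.343396, 1.616511)

# Log-odds of B versus A for two unit-variance normals on LD1:
# log prior ratio + (x - midpoint of the means) * (distance between means).
log_odds_B <- log(prior[["B"]] / prior[["A"]]) +
  (ld1 - mean(mu)) * (mu[["B"]] - mu[["A"]])

post_B <- 1 / (1 + exp(-log_odds_B))
post_A <- 1 - post_B

round(cbind(A = post_A, B = post_B), 6)
```

The result should closely match the printed `$posterior` values, which shows that a record is assigned to B once its LD1 score lies far enough above the midpoint between the two group means, with the threshold shifted slightly by the unequal priors.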
Applying greedy.wilks Algorithm :-
## Formula containing included variables:
##
## group ~ prestige + ecology
## <environment: 0x000000001d2026e8>
##
##
## Values calculated in each step of the selection procedure:
##
## vars Wilks.lambda F.statistics.overall p.value.overall F.statistics.diff
## 1 prestige 0.2116244 104.30986 6.028798e-11 104.309859
## 2 ecology 0.1755542 63.39934 6.303051e-11 5.547558
## p.value.diff
## 1 6.028798e-11
## 2 2.574493e-02
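According to this greedy.wilks report, prestige and ecology are the variables that contribute significantly to the discrimination: prestige alone reduces Wilks' lambda to 0.21, and adding ecology gives a further significant reduction (p = 0.026). We can therefore define a new LDA space using only these two variables.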
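The F statistics in the report can be reproduced from the Wilks' lambda values. The sketch below uses the two-group form of the exact F transformation and assumes n = 30 observations; that sample size is inferred from the printed numbers (they are not reproduced with n = 19), which would mean the selection was run on all 30 two-group records rather than on the 19 training records. The call itself is not shown, so treat this as an assumption.

```r
# Wilks' lambda after each selection step, copied from the report.
lambda <- c(prestige = 0.2116244, ecology = 0.1755542)

n <- 30  # assumed number of observations (inferred, see text)
g <- 2   # number of groups

# Overall F for p selected variables, two-group case:
# F = (1 - lambda) / lambda * (n - p - 1) / p, with df (p, n - p - 1).
p         <- seq_along(lambda)
F_overall <- (1 - lambda) / lambda * (n - p - 1) / p

# F for adding ecology when q = 1 variable is already in the model:
# partial lambda = lambda_new / lambda_old, with df (g - 1, n - g - q).
q              <- 1
lambda_partial <- lambda[["ecology"]] / lambda[["prestige"]]
F_diff <- (1 - lambda_partial) / lambda_partial * (n - g - q) / (g - 1)
p_diff <- pf(F_diff, g - 1, n - g - q, lower.tail = FALSE)

round(F_overall, 3)
round(c(F_diff = F_diff, p_diff = p_diff), 4)
```

The partial lambda of about 0.83 means ecology removes roughly 17 % of the variation left unexplained by prestige, which is why its contribution is significant but much weaker than that of prestige.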
The summary of the refitted LDA model is as follows.
## Call:
## lda(fit.gw$formula, data = training_data)
##
## Prior probabilities of groups:
## A B
## 0.4210526 0.5789474
##
## Group means:
## prestige ecology
## A 8.625000 6.375000
## B 5.454545 8.090909
##
## Coefficients of linear discriminants:
## LD1
## prestige -1.0635123
## ecology 0.2725672
The discriminant function that separates the two classes in the reduced model is:
\(LD1 = 0.2725672 \times \text{ecology} - 1.0635123 \times \text{prestige}\)
Stacked Histograms of Discriminant values :-
The histogram of the first discriminant values in the two groups is as follows.
Bi-Plot :-
The transformed linear discriminant space, with all data points from the training dataset, is as follows.
As described in the LDA summary and the histograms, the bi-plot shows that LD1 separates the two classes very well in the transformed space.
Partition Plot :-
Evaluation :-
The predicted and observed classes for the training data (first six rows; 1 = A, 2 = B) are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 2 1
Confusion Matrix :-
## Actual
## Predicted A B
## A 7 0
## B 1 11
Training data accuracy = 94.74 %
The predicted and observed classes for the testing data (first six rows; 1 = A, 2 = B) are as follows.
## Predicted Observed
## [1,] 1 1
## [2,] 1 1
## [3,] 1 1
## [4,] 1 1
## [5,] 1 1
## [6,] 1 1
Confusion Matrix :-
## Actual
## Predicted A B
## A 7 0
## B 0 4
Testing data accuracy = 100 %
Predicting real time data :-
## $class
## [1] A B
## Levels: A B
##
## $posterior
## A B
## 1 0.942663947 0.05733605
## 2 0.001813352 0.99818665
##
## $x
## LD1
## 1 -1.115262
## 2 1.257573