LDA_Classification_Automotive

Introduction :-

## 'data.frame':    45 obs. of  5 variables:
##  $ group     : chr  "A" "A" "A" "A" ...
##  $ security  : int  8 7 8 9 10 7 9 8 7 8 ...
##  $ ecology   : int  6 7 6 5 8 5 7 7 6 7 ...
##  $ innovation: int  8 8 9 7 9 6 10 9 7 8 ...
##  $ prestige  : int  10 9 9 9 10 8 9 9 9 8 ...

## 'data.frame':    45 obs. of  5 variables:
##  $ group     : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
##  $ security  : int  8 7 8 9 10 7 9 8 7 8 ...
##  $ ecology   : int  6 7 6 5 8 5 7 7 6 7 ...
##  $ innovation: int  8 8 9 7 9 6 10 9 7 8 ...
##  $ prestige  : int  10 9 9 9 10 8 9 9 9 8 ...

##  group     security         ecology         innovation      prestige     
##  A:15   Min.   : 3.000   Min.   : 5.000   Min.   : 3.0   Min.   : 2.000  
##  B:15   1st Qu.: 5.000   1st Qu.: 7.000   1st Qu.: 4.0   1st Qu.: 3.000  
##  C:15   Median : 8.000   Median : 8.000   Median : 5.0   Median : 6.000  
##         Mean   : 7.133   Mean   : 7.533   Mean   : 5.6   Mean   : 5.711  
##         3rd Qu.: 9.000   3rd Qu.: 8.000   3rd Qu.: 7.0   3rd Qu.: 8.000  
##         Max.   :10.000   Max.   :10.000   Max.   :10.0   Max.   :10.000

## 'data.frame':    30 obs. of  5 variables:
##  $ group     : Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ security  : int  8 7 8 9 10 7 9 8 7 8 ...
##  $ ecology   : int  6 7 6 5 8 5 7 7 6 7 ...
##  $ innovation: int  8 8 9 7 9 6 10 9 7 8 ...
##  $ prestige  : int  10 9 9 9 10 8 9 9 9 8 ...

##  group     security         ecology         innovation        prestige     
##  A:15   Min.   : 7.000   Min.   : 5.000   Min.   : 4.000   Min.   : 5.000  
##  B:15   1st Qu.: 8.000   1st Qu.: 6.250   1st Qu.: 5.000   1st Qu.: 6.000  
##         Median : 8.000   Median : 7.500   Median : 6.000   Median : 6.500  
##         Mean   : 8.433   Mean   : 7.367   Mean   : 6.633   Mean   : 7.133  
##         3rd Qu.: 9.000   3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 9.000  
##         Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :10.000

Data Partitioning :-

Applying LDA :-

The summary of the fitted LDA model is as follows.

## Call:
## lda(group ~ ., data = training_data)
## 
## Prior probabilities of groups:
##         A         B 
## 0.4210526 0.5789474 
## 
## Group means:
##   security  ecology innovation prestige
## A 8.500000 6.375000   7.125000 8.625000
## B 8.636364 8.090909   5.272727 5.454545
## 
## Coefficients of linear discriminants:
##                    LD1
## security    0.06891706
## ecology     0.19335376
## innovation  0.30970542
## prestige   -1.30619707

Out of the 19 training records, the count for each group is as follows.

##  A  B 
##  8 11

That is, in the training data 42.11 % of the records belong to group A and 57.89 % belong to group B.
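The prior probabilities in the summary are the class proportions of the training data, which is the default behaviour of `lda()` when no priors are supplied. A minimal sketch in base R, using only the counts shown above (the object names are illustrative):

```r
# Group counts in the training data, as reported above
counts <- c(A = 8, B = 11)

# Default priors are the class proportions
priors <- counts / sum(counts)
round(priors, 7)

# The same values expressed as percentages
round(100 * priors, 2)
```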

The mean of each variable within each group is as follows.

##   security  ecology innovation prestige
## A 8.500000 6.375000   7.125000 8.625000
## B 8.636364 8.090909   5.272727 5.454545

With two classes there is a single discriminant function, LD1, which separates them as follows.

\(LD1 = 0.06891706*security + 0.19335376*ecology + 0.30970542*innovation - 1.30619707*prestige\)
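As a rough check on the direction of LD1, the function can be evaluated at the two group means. This sketch assumes the convention used by `predict()` for `MASS::lda` fits, in which observations are centred on the prior-weighted mean of the group means before the coefficients are applied. All numbers are copied from the summary above; the object names are illustrative.

```r
# Coefficients of LD1, copied from the summary above
coefs <- c(security = 0.06891706, ecology = 0.19335376,
           innovation = 0.30970542, prestige = -1.30619707)

# Group means and priors, copied from the summary above
means <- rbind(A = c(8.500000, 6.375000, 7.125000, 8.625000),
               B = c(8.636364, 8.090909, 5.272727, 5.454545))
colnames(means) <- names(coefs)
priors <- c(A = 8, B = 11) / 19

# Centre on the prior-weighted mean of the group means (assumed convention)
centre <- colSums(priors * means)

# LD1 score of each group mean
scores <- drop(sweep(means, 2, centre) %*% coefs)
scores
```

Under these assumptions the mean of group A falls on the negative side of LD1 and the mean of group B on the positive side, driven mainly by the large negative coefficient on prestige.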

The first discriminant function accounts for 100 % of the between-group separation, as expected when there is only one discriminant function.

Stacked Histograms of Discriminant values :-

The histogram of the first discriminant values in the two groups is as follows.

Bi-Plot :- The transformed linear discriminant space, with all the data points from the training dataset, is as follows.

As the LDA summary and the histograms indicate, the bi-plot shows that LD1 separates the two classes very well in the transformed space.

Partition Plot :-

Evaluation :-

The predicted and observed classes for the first six training records are as follows (1 = A, 2 = B).

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         2        1

Confusion Matrix :-

##          Actual
## Predicted  A  B
##         A  7  0
##         B  1 11

Training data accuracy = 94.74 %
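The accuracy figure follows directly from the confusion matrix: correctly classified records on the diagonal, divided by all records. A minimal base R sketch using the training counts above (the object names are illustrative):

```r
# Training confusion matrix, copied from the output above
# (rows = predicted, columns = actual)
conf <- matrix(c(7, 1, 0, 11), nrow = 2,
               dimnames = list(Predicted = c("A", "B"),
                               Actual = c("A", "B")))

# Accuracy = correctly classified records / all records
accuracy <- sum(diag(conf)) / sum(conf)
round(100 * accuracy, 2)

# Per-group recall: share of each actual group that was predicted correctly
recall <- diag(conf) / colSums(conf)
recall
```

The single error is a group A record predicted as group B, so recall is lower for A than for B.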

The predicted and observed classes for the first six testing records are as follows (1 = A, 2 = B).

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         1        1

Confusion Matrix :-

##          Actual
## Predicted A B
##         A 7 0
##         B 0 4

Testing data accuracy = 100 %

Predicting real time data :-

## $class
## [1] A B
## Levels: A B
## 
## $posterior
##              A          B
## 1 0.9764860254 0.02351397
## 2 0.0003922625 0.99960774
## 
## $x
##         LD1
## 1 -1.343396
## 2  1.616511
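The posterior probabilities above can be related to the LD1 scores alone. This is a sketch under two assumptions that the report does not state: that the fit comes from `MASS::lda`, whose discriminant scores have unit within-group variance, and that scores are centred as in its `predict()` method. Each group is then treated as a unit-variance normal density along LD1, weighted by its prior. The numbers are copied from the outputs above; the helper name `posterior_A` is illustrative.

```r
# Coefficients, group means and priors, copied from the LDA summary
coefs <- c(0.06891706, 0.19335376, 0.30970542, -1.30619707)
means <- rbind(A = c(8.500000, 6.375000, 7.125000, 8.625000),
               B = c(8.636364, 8.090909, 5.272727, 5.454545))
priors <- c(A = 8, B = 11) / 19

# LD1 score of each group mean after centring on the prior-weighted mean
centre <- colSums(priors * means)
m <- drop(sweep(means, 2, centre) %*% coefs)

# Posterior probability of group A for a given LD1 score x
posterior_A <- function(x) {
  dens_A <- priors[["A"]] * dnorm(x, mean = m[["A"]], sd = 1)
  dens_B <- priors[["B"]] * dnorm(x, mean = m[["B"]], sd = 1)
  dens_A / (dens_A + dens_B)
}

# LD1 scores of the two new records, from the $x output above
post <- posterior_A(c(-1.343396, 1.616511))
post
```

If the assumptions hold, the result should agree with the reported posterior for group A up to rounding of the copied inputs.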

Applying greedy.wilks Algorithm :-

## Formula containing included variables: 
## 
## group ~ prestige + ecology
## <environment: 0x000000001d2026e8>
## 
## 
## Values calculated in each step of the selection procedure: 
## 
##       vars Wilks.lambda F.statistics.overall p.value.overall F.statistics.diff
## 1 prestige    0.2116244            104.30986    6.028798e-11        104.309859
## 2  ecology    0.1755542             63.39934    6.303051e-11          5.547558
##   p.value.diff
## 1 6.028798e-11
## 2 2.574493e-02

According to this greedy Wilks report, prestige and ecology are the variables that contribute significantly to the discrimination. We can therefore define a new LDA space with only these two variables.
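The F statistics in the table can be related to Wilks' lambda directly. For two groups, the overall statistic with \(p\) selected variables is \(F = \frac{1 - \Lambda}{\Lambda} \cdot \frac{n - p - 1}{p}\), and the partial statistic for adding one variable is \(F = (\Lambda_{p-1} / \Lambda_p - 1)(n - p - 1)\). The sketch below applies these to the reported lambdas. The sample size is not stated alongside the table; `n = 30` is an assumption, chosen because it reproduces the reported values, whereas the 19-row training set does not.

```r
# Wilks' lambda after each step, copied from the table above
lambda <- c(prestige = 0.2116244, ecology = 0.1755542)

n <- 30          # assumed number of observations (not stated in the output)
p <- 1:2         # number of variables in the model after each step

# Overall F statistic for two groups, with df (p, n - p - 1)
F_overall <- (1 - lambda) / lambda * (n - p - 1) / p
p_overall <- pf(F_overall, p, n - p - 1, lower.tail = FALSE)

# Partial F for adding ecology to prestige, with df (1, n - 3)
F_diff <- (lambda[["prestige"]] / lambda[["ecology"]] - 1) * (n - 3)
p_diff <- pf(F_diff, 1, n - 3, lower.tail = FALSE)

round(c(F_overall, ecology_diff = F_diff), 4)
```

The partial F for ecology is far smaller than the F for prestige alone, which is consistent with prestige carrying most of the separation.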

The summary of the refitted LDA model is as follows.

## Call:
## lda(fit.gw$formula, data = training_data)
## 
## Prior probabilities of groups:
##         A         B 
## 0.4210526 0.5789474 
## 
## Group means:
##   prestige  ecology
## A 8.625000 6.375000
## B 5.454545 8.090909
## 
## Coefficients of linear discriminants:
##                 LD1
## prestige -1.0635123
## ecology   0.2725672

The single discriminant function that separates the two classes in the reduced model is as follows.

\(LD1 = 0.2725672*ecology - 1.0635123*prestige\)

The first discriminant function accounts for 100 % of the between-group separation, as expected when there is only one discriminant function.

Stacked Histograms of Discriminant values :-

The histogram of the first discriminant values in the two groups is as follows.

Bi-Plot :- The transformed linear discriminant space, with all the data points from the training dataset, is as follows.

As the LDA summary and the histograms indicate, the bi-plot shows that LD1 separates the two classes very well in the transformed space.

Partition Plot :-

Evaluation :-

The predicted and observed classes for the first six training records are as follows (1 = A, 2 = B).

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         2        1

Confusion Matrix :-

##          Actual
## Predicted  A  B
##         A  7  0
##         B  1 11

Training data accuracy = 94.74 %

The predicted and observed classes for the first six testing records are as follows (1 = A, 2 = B).

##      Predicted Observed
## [1,]         1        1
## [2,]         1        1
## [3,]         1        1
## [4,]         1        1
## [5,]         1        1
## [6,]         1        1

Confusion Matrix :-

##          Actual
## Predicted A B
##         A 7 0
##         B 0 4

Testing data accuracy = 100 %

Predicting real time data :-

## $class
## [1] A B
## Levels: A B
## 
## $posterior
##             A          B
## 1 0.942663947 0.05733605
## 2 0.001813352 0.99818665
## 
## $x
##         LD1
## 1 -1.115262
## 2  1.257573
