Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that works best on linearly separable data. It increases the separation between the classes of a class variable that is described by n feature variables.

The main idea behind LDA is to transform the n-dimensional feature vector into a d-dimensional vector, where d is at most min(n, K - 1) for K classes, so that the classes are better separated.
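As a minimal sketch of this projection, the snippet below uses R's built-in iris data instead of the Pokemon data (an assumption made only to keep the example self-contained). With 4 features and 3 classes, LDA can produce at most 3 - 1 = 2 discriminant directions.

```r
library(MASS)

# iris: n = 4 numeric features, K = 3 classes (Species)
fit <- lda(Species ~ ., data = iris)

# The scaling matrix has one row per feature and one column per discriminant
dim(fit$scaling)

# Project the 4-dimensional observations onto the discriminant axes
projected <- predict(fit)$x
head(projected)
```

Each column of `projected` is one discriminant (LD1, LD2), so the 150 observations are now described by 2 values instead of 4.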

Intuition behind LDA

We are going to build an LDA discriminant function for the Pokemon stats dataset.

The Pokemon dataset contains the name and other characteristics of each Pokemon, such as hit points (HP), attack, and defense.

Using LDA, we are going to classify the primary type (Type.1) of each Pokemon.
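Before turning to the Pokemon data, the classification workflow can be sketched on the built-in iris data. This is an illustrative assumption, not the Pokemon analysis itself: the names `train_set`, `test_set`, and `conf_mat` are hypothetical, and the split uses plain `sample()` rather than a stratified split.

```r
library(MASS)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hold out 20% of the rows for testing
train_idx <- sample(nrow(iris), size = 0.8 * nrow(iris))
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Fit LDA on the training rows and predict the class of the held-out rows
fit  <- lda(Species ~ ., data = train_set)
pred <- predict(fit, newdata = test_set)$class

# Confusion matrix and overall accuracy on the held-out rows
conf_mat <- table(predicted = pred, actual = test_set$Species)
accuracy <- mean(pred == test_set$Species)
conf_mat
accuracy
```

The same two calls, `predict()` followed by `table()`, would apply to the `model` and `test.data` objects built later in this walkthrough.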

library(caret)
## Warning: package 'caret' was built under R version 3.6.3
library(ggplot2)
library(dplyr)
library(MASS)
raw.data <- read.csv("Pokemon.csv", header = TRUE)
names(raw.data)
##  [1] "X."         "Name"       "Type.1"     "Type.2"     "Total"     
##  [6] "HP"         "Attack"     "Defense"    "Sp..Atk"    "Sp..Def"   
## [11] "Speed"      "Generation" "Legendary"

Preprocessing of the Data

new.data <- raw.data %>%
  dplyr::select(-X., -Name, -Type.2) %>%
  mutate(Type.1 = as.factor(Type.1))

new.data <- na.omit(new.data)
str(new.data)
## 'data.frame':    800 obs. of  10 variables:
##  $ Type.1    : Factor w/ 18 levels "Bug","Dark","Dragon",..: 10 10 10 10 7 7 7 7 7 18 ...
##  $ Total     : int  318 405 525 625 309 405 534 634 634 314 ...
##  $ HP        : int  45 60 80 80 39 58 78 78 78 44 ...
##  $ Attack    : int  49 62 82 100 52 64 84 130 104 48 ...
##  $ Defense   : int  49 63 83 123 43 58 78 111 78 65 ...
##  $ Sp..Atk   : int  65 80 100 122 60 80 109 130 159 50 ...
##  $ Sp..Def   : int  65 80 100 120 50 65 85 85 115 64 ...
##  $ Speed     : int  45 60 80 80 65 80 100 100 100 43 ...
##  $ Generation: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Legendary : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
indexes <- createDataPartition(new.data$Type.1, p = 0.8, list = FALSE)

train.data <- new.data[indexes, ]
test.data <- new.data[-indexes, ]

The training data contains 649 observations.

Modeling Using MASS Package

model <- lda(Type.1~. ,data = train.data)
## Warning in lda.default(x, grouping, ...): variables are collinear
model$scaling
##                         LD1           LD2           LD3           LD4
## Total         -3.646229e-05 -0.0004959523  0.0007443766 -0.0006166644
## HP             7.039665e-04  0.0213389294  0.0019103014 -0.0011147122
## Attack        -2.009297e-02  0.0030417675  0.0233278475 -0.0231522683
## Defense       -2.118141e-02 -0.0206585461 -0.0176300487  0.0148585045
## Sp..Atk        2.668373e-02 -0.0208836401 -0.0093275958 -0.0173644867
## Sp..Def        7.504335e-03  0.0031052961  0.0037015655 -0.0086886002
## Speed          6.188478e-03  0.0117662098  0.0093598330  0.0297643340
## Generation     4.127702e-02 -0.1988935350  0.1110448660  0.2486774479
## LegendaryTrue  1.293321e-01 -1.3002278128  1.7499712484  1.2853764876
##                         LD5          LD6          LD7           LD8
## Total         -0.0002316904 -0.001113855 -0.001521317 -0.0014834015
## HP             0.0313717939 -0.015858635 -0.018850751  0.0054212723
## Attack        -0.0147988700  0.013574351  0.004658984  0.0018682181
## Defense        0.0021838305 -0.022813488 -0.010014217  0.0000748689
## Sp..Atk       -0.0093134806 -0.007650634 -0.007713171  0.0143157034
## Sp..Def        0.0065828770  0.028239742  0.011759812 -0.0363839044
## Speed         -0.0119503490 -0.016595856 -0.007631734 -0.0121223800
## Generation     0.0292526851  0.364083459 -0.355566048  0.0874242761
## LegendaryTrue  1.6986392496 -0.271033139  2.678740965  1.4124709966
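The "variables are collinear" warning from lda() can be investigated with a quick check. The sketch below copies the first five rows of the stat columns from the str() output shown earlier and tests whether Total equals the sum of the six base stats. Whether this holds for all 800 rows is an assumption that should be verified on the full data.

```r
# First five rows of the stat columns, copied from the str() output above
stats <- data.frame(
  Total   = c(318, 405, 525, 625, 309),
  HP      = c(45, 60, 80, 80, 39),
  Attack  = c(49, 62, 82, 100, 52),
  Defense = c(49, 63, 83, 123, 43),
  Sp..Atk = c(65, 80, 100, 122, 60),
  Sp..Def = c(65, 80, 100, 120, 50),
  Speed   = c(45, 60, 80, 80, 65)
)

base_stats <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")

# Total minus the sum of the six base stats; all zeros means an exact
# linear dependence among these columns
residual <- stats$Total - rowSums(stats[, base_stats])
residual
all(residual == 0)
```

If the dependence holds on the full data, one option would be to drop Total from the formula, for example `lda(Type.1 ~ . - Total, data = train.data)`.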

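The model reduces the dimensionality only from 9 predictors to 8 discriminants. This is consistent with the collinearity warning: Total appears to be the sum of the six base stats, so only 8 of the 9 predictors carry independent information. LDA therefore offers little dimensionality reduction here, which suggests the method is not well suited to the Pokemon dataset.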
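Counting discriminants is only one way to judge the reduction. Another is the proportion of between-class separation captured by each discriminant, which can be derived from the `svd` component of an lda fit. The sketch below uses the built-in iris data as a stand-in (an assumption made for self-containment).

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# svd holds the singular values: the ratio of between- to within-group
# standard deviations on each discriminant. Their squares, normalised to
# sum to 1, give the "proportion of trace" that print(fit) reports.
prop_trace <- fit$svd^2 / sum(fit$svd^2)
names(prop_trace) <- colnames(fit$scaling)
prop_trace
```

Applied to the Pokemon `model`, a proportion spread fairly evenly across LD1 to LD8 would support the conclusion that no small subset of discriminants separates the types well.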

A likely reason is that the relationships among the feature variables are nonlinear, in which case the LDA method is not effective at dimensionality reduction.