Linear Discriminant Analysis (LDA) is a dimensionality reduction technique that increases the separation between the classes of a class variable described by n feature variables. It works best when the classes can be separated linearly.
The main idea behind LDA is to transform the n-dimensional feature vector into a d-dimensional vector, with d < n, that separates the classes. For K classes, LDA yields at most min(n, K - 1) discriminant directions.
We are going to build an LDA discriminant function model for the Pokemon stats dataset.
The Pokemon dataset contains the name and other characteristics of each Pokemon, such as hit points and attack.
Using LDA, we are going to classify the type of each Pokemon.
library(caret)
## Warning: package 'caret' was built under R version 3.6.3
library(ggplot2)
library(dplyr)
library(MASS)
raw.data <- read.csv("Pokemon.csv", header = TRUE)
names(raw.data)
## [1] "X." "Name" "Type.1" "Type.2" "Total"
## [6] "HP" "Attack" "Defense" "Sp..Atk" "Sp..Def"
## [11] "Speed" "Generation" "Legendary"
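# Drop the row index, name, and secondary type; treat the primary type as the class label
new.data <- raw.data %>%
  dplyr::select(-X., -Name, -Type.2) %>%
  mutate(Type.1 = as.factor(Type.1))
# Remove rows with missing values
new.data <- na.omit(new.data)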
str(new.data)
## 'data.frame': 800 obs. of 10 variables:
## $ Type.1 : Factor w/ 18 levels "Bug","Dark","Dragon",..: 10 10 10 10 7 7 7 7 7 18 ...
## $ Total : int 318 405 525 625 309 405 534 634 634 314 ...
## $ HP : int 45 60 80 80 39 58 78 78 78 44 ...
## $ Attack : int 49 62 82 100 52 64 84 130 104 48 ...
## $ Defense : int 49 63 83 123 43 58 78 111 78 65 ...
## $ Sp..Atk : int 65 80 100 122 60 80 109 130 159 50 ...
## $ Sp..Def : int 65 80 100 120 50 65 85 85 115 64 ...
## $ Speed : int 45 60 80 80 65 80 100 100 100 43 ...
## $ Generation: int 1 1 1 1 1 1 1 1 1 1 ...
## $ Legendary : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
# Stratified 80/20 split on the class label
indexes <- createDataPartition(new.data$Type.1, p = 0.8, list = FALSE)
train.data <- new.data[indexes, ]
test.data <- new.data[-indexes, ]
The training data contains 649 observations.
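The split above is random, so the exact rows in each set change from run to run. A minimal sketch of a reproducible stratified split follows; it uses the built-in iris data as a stand-in for the Pokemon data, and the seed value and the variable names (`idx`, `train`, `test`) are illustrative assumptions, not part of the original analysis.

```r
library(caret)

# Fix the random seed so the partition is the same on every run
# (123 is an arbitrary, hypothetical choice)
set.seed(123)

# Stratified 80/20 split: sampling is done within each class of Species
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[idx, ]
test <- iris[-idx, ]

# Every row lands in exactly one of the two sets
nrow(train) + nrow(test) == nrow(iris)

# Class counts in the training set, to confirm the split is stratified
table(train$Species)
```

In the walkthrough, the same idea would amount to calling `set.seed()` once before `createDataPartition(new.data$Type.1, ...)`.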
# Fit LDA with Type.1 as the class and all remaining columns as predictors
model <- lda(Type.1 ~ ., data = train.data)
## Warning in lda.default(x, grouping, ...): variables are collinear
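The collinearity warning is probably explained by the Total column: in the rows shown by `str()` earlier, Total equals the sum of the six base stats. A minimal sketch of that check, using only the first four rows copied from that output (the `stats` data frame and the variable names are illustrative, not part of the original analysis):

```r
# First four rows of the stat columns, copied from the str() output
stats <- data.frame(
  Total   = c(318, 405, 525, 625),
  HP      = c(45, 60, 80, 80),
  Attack  = c(49, 62, 82, 100),
  Defense = c(49, 63, 83, 123),
  Sp..Atk = c(65, 80, 100, 122),
  Sp..Def = c(65, 80, 100, 120),
  Speed   = c(45, 60, 80, 80)
)

# Sum the six base stats for each row
base.cols <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")
stat.sum <- rowSums(stats[, base.cols])

# TRUE when Total is an exact linear combination of the other columns
total.is.sum <- all(stat.sum == stats$Total)
print(total.is.sum)
## [1] TRUE
```

If the same holds for the full data, dropping Total before fitting (for example by adding `-Total` to the `dplyr::select()` call) would be one way to remove the redundancy. This is a suggestion, not something the original analysis did.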
model$scaling
## LD1 LD2 LD3 LD4
## Total -3.646229e-05 -0.0004959523 0.0007443766 -0.0006166644
## HP 7.039665e-04 0.0213389294 0.0019103014 -0.0011147122
## Attack -2.009297e-02 0.0030417675 0.0233278475 -0.0231522683
## Defense -2.118141e-02 -0.0206585461 -0.0176300487 0.0148585045
## Sp..Atk 2.668373e-02 -0.0208836401 -0.0093275958 -0.0173644867
## Sp..Def 7.504335e-03 0.0031052961 0.0037015655 -0.0086886002
## Speed 6.188478e-03 0.0117662098 0.0093598330 0.0297643340
## Generation 4.127702e-02 -0.1988935350 0.1110448660 0.2486774479
## LegendaryTrue 1.293321e-01 -1.3002278128 1.7499712484 1.2853764876
## LD5 LD6 LD7 LD8
## Total -0.0002316904 -0.001113855 -0.001521317 -0.0014834015
## HP 0.0313717939 -0.015858635 -0.018850751 0.0054212723
## Attack -0.0147988700 0.013574351 0.004658984 0.0018682181
## Defense 0.0021838305 -0.022813488 -0.010014217 0.0000748689
## Sp..Atk -0.0093134806 -0.007650634 -0.007713171 0.0143157034
## Sp..Def 0.0065828770 0.028239742 0.011759812 -0.0363839044
## Speed -0.0119503490 -0.016595856 -0.007631734 -0.0121223800
## Generation 0.0292526851 0.364083459 -0.355566048 0.0874242761
## LegendaryTrue 1.6986392496 -0.271033139 2.678740965 1.4124709966
The model reduces the dimensionality only from 9 predictors to 8 linear discriminants, which leads to the conclusion that the LDA method is not well suited to the Pokemon dataset.
Because the relationships between the feature variables are non-linear, LDA is not effective for dimensionality reduction here.
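One way to back up a conclusion like this with numbers is to look at how much between-class separation each discriminant captures and how well the model classifies held-out data. The sketch below uses the built-in iris data as a stand-in so that it runs on its own; the seed and the names `fit`, `prop.trace`, `pred`, `accuracy`, and `conf.table` are assumptions for illustration. In the walkthrough, `model`, `test.data`, and `Type.1` would take the place of `fit`, the held-out iris rows, and `Species`.

```r
library(MASS)

# Simple random 80/20 split (hypothetical seed, for reproducibility only)
set.seed(42)
train.idx <- sample(nrow(iris), size = 0.8 * nrow(iris))

# Fit LDA on the training rows
fit <- lda(Species ~ ., data = iris[train.idx, ])

# Number of discriminants: 3 classes and 4 features give 2 directions
ncol(fit$scaling)

# Share of between-class separation captured by each discriminant
# (the "proportion of trace" that print(fit) reports)
prop.trace <- fit$svd^2 / sum(fit$svd^2)
print(round(prop.trace, 3))

# Classify the held-out rows and measure accuracy
pred <- predict(fit, newdata = iris[-train.idx, ])
accuracy <- mean(pred$class == iris$Species[-train.idx])
print(accuracy)

# Confusion table: predicted class against actual class
conf.table <- table(predicted = pred$class, actual = iris$Species[-train.idx])
print(conf.table)
```

If the first one or two discriminants carry most of the trace, a low-dimensional projection is informative. If the trace is spread across many discriminants and test accuracy is low, that would support the conclusion that LDA separates the classes poorly.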