ABALONE AGE CLASSIFICATION USING LDA

abalone <- read.csv("train1_abalone.csv")
# plot(abalone[,c(1,2,3,4,5,6,7)], col=abalone[,8])

set.seed(50) 
my_abalone<-abalone
my_abalone <- my_abalone[sample(1:4177),]
table(my_abalone$age)
## 
## adult   old young 
##  3498   490   189
# Take sample
trainset <- my_abalone[1:2088, ]
table(trainset$age)
## 
## adult   old young 
##  1752   253    83
testset <- my_abalone[2089:4177, ]
table(testset$age)
## 
## adult   old young 
##  1746   237   106
library("MASS")
dev_fit<-lda(age~.,data=trainset)
dev_fit
## Call:
## lda(age ~ ., data = trainset)
## 
## Prior probabilities of groups:
##      adult        old      young 
## 0.83908046 0.12116858 0.03975096 
## 
## Group means:
##          length  diameter     height     weight    shucked    viscera
## adult 0.5288470 0.4113071 0.13998288 0.81823545 0.36210245 0.18009247
## old   0.5842688 0.4617194 0.16731225 1.10796443 0.42112253 0.23135178
## young 0.2526506 0.1858434 0.06174699 0.09089759 0.03875904 0.01960241
##            shell
## adult 0.23162158
## old   0.35438933
## young 0.02708434
## 
## Coefficients of linear discriminants:
##                  LD1       LD2
## length   -10.3624026  9.772793
## diameter -12.2045160  5.571747
## height    -5.9341077 -1.248201
## weight    -0.9072148 -8.408879
## shucked    6.4836940 12.695092
## viscera    6.5930527  8.476790
## shell      0.5377810 -9.538841
## 
## Proportion of trace:
##    LD1    LD2 
## 0.6331 0.3669

LD1 is 63% while LD2 = 36%. Interpretations - first linear discriminant explains ~ 63% of the variance in the abalone dataset.

Calculation of Trace or Proportion for MATHS sake

dev_fit$svd
## [1] 24.20465 18.42446
# calculate proportion
prop_dev <- dev_fit$svd^2 / sum(dev_fit$svd^2 )
prop_dev
## [1] 0.6331448 0.3668552

Check on Train Data

# Check accuracy on train data
trainset$predicted_outcome <-predict(lda(age ~. , data=trainset))$class
table(trainset$age, trainset$predicted_outcome)
##        
##         adult  old young
##   adult  1643   70    39
##   old     154   99     0
##   young    15    0    68

Check on Test Data

# Apply it on test data and check for accuracy 
# note - newdata in syntax
testset$class<-predict(dev_fit,newdata=testset)$class

# check accuracy
table(testset$age, testset$class)
##        
##         adult  old young
##   adult  1631   68    47
##   old     136  101     0
##   young    22    0    84
# Visualize the LDA
plot(dev_fit)

# Exploratory Graph for LDA
library(klaR)
## Warning: package 'klaR' was built under R version 3.4.4
partimat(age~ length + diameter + height + weight, data=trainset,method="lda")

# Understand LD1 and LD2 together
train_new <-data.frame(predict(lda(age ~ .,data=trainset)))
train_final <- cbind(trainset, train_new)

head(train_final)
##      length diameter height weight shucked viscera  shell   age
## 2961  0.660    0.525  0.180 1.5965  0.7765  0.3970 0.3605 adult
## 1828  0.340    0.260  0.085 0.1885  0.0815  0.0335 0.0600 adult
## 836   0.450    0.350  0.130 0.5470  0.2450  0.1405 0.1405 adult
## 3202  0.385    0.300  0.100 0.2725  0.1115  0.0570 0.0800 adult
## 2142  0.320    0.240  0.075 0.1735  0.0760  0.0355 0.0500 adult
## 187   0.630    0.480  0.150 1.0525  0.3920  0.3360 0.2850 adult
##      predicted_outcome class posterior.adult posterior.old posterior.young
## 2961             adult adult       0.9968755    0.00312446    1.228839e-08
## 1828             adult adult       0.9863290    0.01367066    3.432673e-07
## 836              adult adult       0.9898529    0.01014708    4.600282e-08
## 3202             adult adult       0.9852054    0.01479449    8.888231e-08
## 2142             adult adult       0.9885215    0.01147766    8.074907e-07
## 187              adult adult       0.9790769    0.02092314    3.985215e-09
##            x.LD1     x.LD2
## 2961 -0.20554809 1.4211001
## 1828  0.31327850 0.4744390
## 836  -0.03769317 0.7251000
## 3202  0.05975571 0.4905251
## 2142  0.48163735 0.5323156
## 187  -0.53297226 0.4365346

Conclusions - from Graphs, difficult to classify the different AGE group clearly.