ABALONE AGE CLASSIFICATION USING LDA
abalone <- read.csv("train1_abalone.csv")
# plot(abalone[,c(1,2,3,4,5,6,7)], col=abalone[,8])
set.seed(50)
my_abalone<-abalone
my_abalone <- my_abalone[sample(1:4177),]
table(my_abalone$age)
##
## adult old young
## 3498 490 189
# Take sample
trainset <- my_abalone[1:2088, ]
table(trainset$age)
##
## adult old young
## 1752 253 83
testset <- my_abalone[2089:4177, ]
table(testset$age)
##
## adult old young
## 1746 237 106
library("MASS")
dev_fit<-lda(age~.,data=trainset)
dev_fit
## Call:
## lda(age ~ ., data = trainset)
##
## Prior probabilities of groups:
## adult old young
## 0.83908046 0.12116858 0.03975096
##
## Group means:
## length diameter height weight shucked viscera
## adult 0.5288470 0.4113071 0.13998288 0.81823545 0.36210245 0.18009247
## old 0.5842688 0.4617194 0.16731225 1.10796443 0.42112253 0.23135178
## young 0.2526506 0.1858434 0.06174699 0.09089759 0.03875904 0.01960241
## shell
## adult 0.23162158
## old 0.35438933
## young 0.02708434
##
## Coefficients of linear discriminants:
## LD1 LD2
## length -10.3624026 9.772793
## diameter -12.2045160 5.571747
## height -5.9341077 -1.248201
## weight -0.9072148 -8.408879
## shucked 6.4836940 12.695092
## viscera 6.5930527 8.476790
## shell 0.5377810 -9.538841
##
## Proportion of trace:
## LD1 LD2
## 0.6331 0.3669
LD1 is 63% while LD2 = 36%. Interpretations - first linear discriminant explains ~ 63% of the variance in the abalone dataset.
Calculation of Trace or Proportion for MATHS sake
dev_fit$svd
## [1] 24.20465 18.42446
# calculate proportion
prop_dev <- dev_fit$svd^2 / sum(dev_fit$svd^2 )
prop_dev
## [1] 0.6331448 0.3668552
Check on Train Data
# Check accuracy on train data
trainset$predicted_outcome <-predict(lda(age ~. , data=trainset))$class
table(trainset$age, trainset$predicted_outcome)
##
## adult old young
## adult 1643 70 39
## old 154 99 0
## young 15 0 68
Check on Test Data
# Apply it on test data and check for accuracy
# note - newdata in syntax
testset$class<-predict(dev_fit,newdata=testset)$class
# check accuracy
table(testset$age, testset$class)
##
## adult old young
## adult 1631 68 47
## old 136 101 0
## young 22 0 84
# Visualize the LDA
plot(dev_fit)

# Exploratory Graph for LDA
library(klaR)
## Warning: package 'klaR' was built under R version 3.4.4
partimat(age~ length + diameter + height + weight, data=trainset,method="lda")

# Understand LD1 and LD2 together
train_new <-data.frame(predict(lda(age ~ .,data=trainset)))
train_final <- cbind(trainset, train_new)
head(train_final)
## length diameter height weight shucked viscera shell age
## 2961 0.660 0.525 0.180 1.5965 0.7765 0.3970 0.3605 adult
## 1828 0.340 0.260 0.085 0.1885 0.0815 0.0335 0.0600 adult
## 836 0.450 0.350 0.130 0.5470 0.2450 0.1405 0.1405 adult
## 3202 0.385 0.300 0.100 0.2725 0.1115 0.0570 0.0800 adult
## 2142 0.320 0.240 0.075 0.1735 0.0760 0.0355 0.0500 adult
## 187 0.630 0.480 0.150 1.0525 0.3920 0.3360 0.2850 adult
## predicted_outcome class posterior.adult posterior.old posterior.young
## 2961 adult adult 0.9968755 0.00312446 1.228839e-08
## 1828 adult adult 0.9863290 0.01367066 3.432673e-07
## 836 adult adult 0.9898529 0.01014708 4.600282e-08
## 3202 adult adult 0.9852054 0.01479449 8.888231e-08
## 2142 adult adult 0.9885215 0.01147766 8.074907e-07
## 187 adult adult 0.9790769 0.02092314 3.985215e-09
## x.LD1 x.LD2
## 2961 -0.20554809 1.4211001
## 1828 0.31327850 0.4744390
## 836 -0.03769317 0.7251000
## 3202 0.05975571 0.4905251
## 2142 0.48163735 0.5323156
## 187 -0.53297226 0.4365346
Conclusions - from Graphs, difficult to classify the different AGE group clearly.