Linear discriminant analysis (LDA) is a method for pattern recognition (classification) problems.
The linear discriminant function is $B = a_1 x_1 + a_2 x_2 + \dots + a_p x_p$, where $x_1, \dots, x_p$ are the predictors and $a_1, \dots, a_p$ are their coefficients.
We are looking for a rule that classifies a new subject into population v if the value of B is less than a given cutoff value.
The idea is due to Fisher and was the first classification protocol developed. It predates other classification methods such as logistic regression and classification trees.
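The rule described above can be sketched in a few lines of R. This is only an illustration: the coefficients `a`, the `cutoff`, and the helper names `discriminant_score` and `classify` are hypothetical and are not taken from any fitted model.

```r
# Hypothetical coefficients and cutoff, chosen only for illustration
a <- c(0.5, -0.25)
cutoff <- 0

# B = a1*x1 + a2*x2 + ... + ap*xp
discriminant_score <- function(x, a) sum(a * x)

# Assign to population v when B falls below the cutoff
classify <- function(x, a, cutoff) {
  if (discriminant_score(x, a) < cutoff) "population v" else "other population"
}

classify(c(1, 4), a, cutoff)  # B = 0.5 - 1.0  = -0.5, below the cutoff
classify(c(4, 1), a, cutoff)  # B = 2.0 - 0.25 = 1.75, above the cutoff
```

In practice the coefficients and the cutoff are estimated from labelled data, which is what the rest of this section does.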
Load the salmon data from the rrcov package.
setwd("~/Google Drive/CR Rao Course/")
library(rrcov)
## Loading required package: robustbase
## Scalable Robust Estimators with High Breakdown Point (version 1.3-8)
data(salmon)
head(salmon)
## Gender Freshwater Marine Origin
## 1 2 108 368 Alaskan
## 2 1 131 355 Alaskan
## 3 1 105 469 Alaskan
## 4 2 86 506 Alaskan
## 5 1 99 402 Alaskan
## 6 2 87 423 Alaskan
salmon <- salmon[ , -1]
summary(salmon)
## Freshwater Marine Origin
## Min. : 53.0 Min. :301.0 Alaskan :50
## 1st Qu.: 99.0 1st Qu.:367.0 Canadian:50
## Median :117.5 Median :396.5
## Mean :117.9 Mean :398.1
## 3rd Qu.:140.0 3rd Qu.:428.2
## Max. :179.0 Max. :511.0
alaska <- subset(salmon, Origin == "Alaskan") # Create an Alaskan fish subset.
canada <- subset(salmon, Origin == "Canadian") # Create a Canadian fish subset.
# Generate a scatter plot with axis limits wide enough to accommodate both groups.
plot(alaska$Freshwater, alaska$Marine, pch = 20, col=2, xlim=c(50,200), ylim=c(300, 550), main="Plot of Scale size of Salmon", xlab="Freshwater scale diameter", ylab="Marine scale diameter")
# plot() always starts a new figure, so use points() (or curve(..., add = TRUE)) to draw on the existing plot.
points(canada$Freshwater, canada$Marine, col=3, pch=15)
legend("topright", legend =c("Alaskan Salmon", "Canadian Salmon"), pch=c(20,15), col=c(2:3))
The objective of linear discriminant analysis is to fit a line that separates the Alaskan and Canadian fish.
Load the package MASS.
library(MASS)
lda1 <- lda(salmon$Origin~salmon$Freshwater+salmon$Marine , na.action = "na.omit" )
lda1
## Call:
## lda(salmon$Origin ~ salmon$Freshwater + salmon$Marine, na.action = "na.omit")
##
## Prior probabilities of groups:
## Alaskan Canadian
## 0.5 0.5
##
## Group means:
## salmon$Freshwater salmon$Marine
## Alaskan 98.38 429.66
## Canadian 137.46 366.62
##
## Coefficients of linear discriminants:
## LD1
## salmon$Freshwater 0.04458572
## salmon$Marine -0.01803856
The output of the linear discriminant analysis starts with the prior (baseline) probabilities of a fish being Alaskan or Canadian. By default lda() takes these from the class proportions in the data, which is why both are 0.5 here.
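As a quick check of where the 0.5 and 0.5 come from, the sketch below recomputes the class proportions from the group sizes reported by summary(salmon) (50 Alaskan, 50 Canadian). The `origin` vector is a stand-in built for illustration, not the actual salmon$Origin column.

```r
# Stand-in for salmon$Origin, built from the counts in summary(salmon)
origin <- factor(rep(c("Alaskan", "Canadian"), each = 50))

# Class proportions: group count divided by total count
priors <- prop.table(table(origin))
priors
```

If the groups were unbalanced, these proportions would differ from 0.5, and different values can be supplied through the `prior` argument of lda().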
The coefficients of the linear discriminant give a1 and a2 (the weights on Freshwater and Marine), which is what we need to draw the separating line.
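One way to turn those coefficients into a line on the scatter plot is sketched below. It assumes equal priors (as in the output above), so that the boundary passes through the midpoint of the two group means. The coefficients and group means are copied from the printed lda() output; the variable names are my own.

```r
# Values copied from the lda() output above
a <- c(Freshwater = 0.04458572, Marine = -0.01803856)
mean_alaskan  <- c(Freshwater = 98.38,  Marine = 429.66)
mean_canadian <- c(Freshwater = 137.46, Marine = 366.62)

# Assumption: equal priors, so the cutoff is the score at the midpoint of the means
midpoint <- (mean_alaskan + mean_canadian) / 2
cutoff <- sum(a * midpoint)

# Solve a1 * Freshwater + a2 * Marine = cutoff for Marine
intercept <- cutoff / a[["Marine"]]
slope <- -a[["Freshwater"]] / a[["Marine"]]

# Scores of the two group means fall on opposite sides of the cutoff
sum(a * mean_alaskan) < cutoff
sum(a * mean_canadian) > cutoff
```

With the scatter plot from earlier still open, `abline(a = intercept, b = slope, lty = 2)` would overlay this line. Fish whose score is below the cutoff fall on the Alaskan side.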
salmon1 <- predict(lda1)
confus_m <- table(salmon$Origin, salmon1$class)
confus_m
##
## Alaskan Canadian
## Alaskan 44 6
## Canadian 1 49
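So the classification protocol correctly classified 44 Alaskan and 49 Canadian fish, and misclassified 6 Alaskan and 1 Canadian fish. Misclassification rate = 7/100 = 7%. Note that this rate is computed on the same data used to fit the model.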
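The arithmetic behind that rate can be reproduced directly from the confusion matrix. The sketch below rebuilds the table from the printed counts rather than from the fitted model, so it runs on its own. The dimension names `actual` and `predicted` are added for readability.

```r
# Confusion matrix rebuilt from the printed output (rows: actual, columns: predicted)
confus_m <- matrix(c(44, 1, 6, 49), nrow = 2,
                   dimnames = list(actual = c("Alaskan", "Canadian"),
                                   predicted = c("Alaskan", "Canadian")))

# Correct classifications sit on the diagonal
correct <- sum(diag(confus_m))
total <- sum(confus_m)

# Misclassification rate: off-diagonal count over total count
misclass_rate <- (total - correct) / total
misclass_rate
```

For a less optimistic estimate, lda() also accepts `CV = TRUE`, which returns leave-one-out cross-validated class assignments instead of a fitted model object.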
An extension of the LDA method transfers the problem to a higher dimension. We form powers and products of the predictor values: if x = (x1, x2) is an observation, we build a higher-dimensional vector like this:
$(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$.
In this higher-dimensional space we may be able to find a hyperplane that separates the classes.
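A minimal sketch of this idea, using simulated data rather than the salmon set: points are labelled by whether they lie inside the unit circle, which no straight line in (x1, x2) can separate, yet a hyperplane in the expanded space does. The function name `expand_quadratic`, the toy data, and the weight vector `w` are all assumptions made for illustration.

```r
# Map (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2)
expand_quadratic <- function(X) {
  cbind(x1 = X[, 1], x2 = X[, 2],
        x1_sq = X[, 1]^2, x2_sq = X[, 2]^2,
        x1_x2 = X[, 1] * X[, 2])
}

# Toy data: 200 random points, labelled by whether they fall inside the unit circle
set.seed(1)
X <- matrix(runif(400, -2, 2), ncol = 2)
inside <- X[, 1]^2 + X[, 2]^2 < 1

# In the expanded space the circle becomes the hyperplane x1_sq + x2_sq = 1
Z <- expand_quadratic(X)
w <- c(0, 0, 1, 1, 0)
score <- as.vector(Z %*% w)

# The linear rule in the expanded space reproduces the labels
all((score < 1) == inside)
```

Here the separating weights are known by construction. With real data they would be estimated, for example by fitting a linear classifier to the expanded columns.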