This project applies multinomial logistic regression to a classification problem with more than two classes. The built-in Iris data set will be used for this classification problem.

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

The last column, Species, gives the class of each flower; the first four columns are the measurements that characterise it. We will build a multinomial logistic regression model on those four variables to predict the species.
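
A quick look at the class distribution confirms the three species are perfectly balanced, with 50 observations each:

table(iris$Species)   # 50 setosa, 50 versicolor, 50 virginica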

Multinomial logistic regression needs a baseline (reference) category for the probability calculation, so we choose one of the three species as the baseline.

# Choose setosa as the baseline
iris$base <- relevel(iris$Species, ref = "setosa")
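
The first level of the re-levelled factor is the reference category, which we can verify:

levels(iris$base)   # "setosa" is listed first, so it is the baseline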

After choosing the baseline, we fit the model with multinom() from the nnet package.

library(nnet)
model <- multinom(iris$base ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
## # weights:  18 (10 variable)
## initial  value 164.791843 
## iter  10 value 16.177348
## iter  20 value 7.111438
## iter  30 value 6.182999
## iter  40 value 5.984028
## iter  50 value 5.961278
## iter  60 value 5.954900
## iter  70 value 5.951851
## iter  80 value 5.950343
## iter  90 value 5.949904
## iter 100 value 5.949867
## final  value 5.949867 
## stopped after 100 iterations
model
## Call:
## multinom(formula = iris$base ~ Sepal.Length + Sepal.Width + Petal.Length + 
##     Petal.Width, data = iris)
## 
## Coefficients:
##            (Intercept) Sepal.Length Sepal.Width Petal.Length Petal.Width
## versicolor    18.69037    -5.458424   -8.707401     14.24477   -3.097684
## virginica    -23.83628    -7.923634  -15.370769     23.65978   15.135301
## 
## Residual Deviance: 11.89973 
## AIC: 31.89973
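
As a side note, the fitting log above ends with "stopped after 100 iterations" because 100 is multinom's default maxit; passing, for example, maxit = 500 lets the optimiser run longer if full convergence is wanted. multinom() also does not report p-values directly, but they can be approximated with a Wald test from the summary output. A minimal sketch (two-sided z-test; with classes as cleanly separated as in iris the standard errors are large, so these p-values should be read with caution):

s <- summary(model)
z <- s$coefficients / s$standard.errors      # Wald z-statistics
p <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-sided p-values
round(p, 4)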

We can write out the fitted log-odds equations from the coefficients of our model.

\(y_1=\ln\left(\frac{P(\text{versicolor})}{P(\text{setosa})}\right)=18.690-5.458\cdot\text{Sepal.Length}-8.707\cdot\text{Sepal.Width}+14.245\cdot\text{Petal.Length}-3.098\cdot\text{Petal.Width}\)

and

\(y_2=\ln\left(\frac{P(\text{virginica})}{P(\text{setosa})}\right)=-23.836-7.924\cdot\text{Sepal.Length}-15.371\cdot\text{Sepal.Width}+23.660\cdot\text{Petal.Length}+15.135\cdot\text{Petal.Width}\)

then

\(\frac{P(\text{versicolor})}{P(\text{setosa})}=e^{y_1}\)

and

\(\frac{P(\text{virginica})}{P(\text{setosa})}=e^{y_2}\)

Since \(P(\text{setosa})+P(\text{versicolor})+P(\text{virginica})=1\), and the two ratios above give \(P(\text{versicolor})=e^{y_1}P(\text{setosa})\) and \(P(\text{virginica})=e^{y_2}P(\text{setosa})\), we obtain

\(P(\text{setosa})=\frac{1}{1+e^{y_1}+e^{y_2}}\)
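
As a sanity check, these formulas can be evaluated by hand and compared with the probabilities the model itself produces; \(P(\text{versicolor})\) and \(P(\text{virginica})\) follow by multiplying \(P(\text{setosa})\) by \(e^{y_1}\) and \(e^{y_2}\). A minimal sketch using the first row of the data (the helper objects b, x, y1 and y2 are just illustrative names):

b  <- coef(model)                     # 2 x 5 matrix: rows versicolor and virginica
x  <- c(1, unlist(iris[1, 1:4]))      # intercept term plus the four predictors of row 1
y1 <- sum(b["versicolor", ] * x)      # log-odds of versicolor vs setosa
y2 <- sum(b["virginica", ] * x)       # log-odds of virginica vs setosa
c(setosa = 1, versicolor = exp(y1), virginica = exp(y2)) / (1 + exp(y1) + exp(y2))
predict(model, iris[1, ], type = "probs")   # should agree with the manual values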

We now use the model to predict the species for every observation and compare the predictions with the actual classes in a confusion matrix.

predicted <- predict(model, iris)
tab <- table(Predicted = predicted, Actual = iris$base)
print(tab)
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         49         1
##   virginica       0          1        49
#Accuracy of the model
sum(diag(tab))/sum(tab)
## [1] 0.9866667

The model classifies 148 of the 150 observations correctly, an accuracy of about 98.7% on the training data.
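
Keep in mind that this accuracy is computed on the same data the model was fitted to, so it is likely optimistic. A sketch of how one could estimate out-of-sample accuracy with a simple train/test split (the 80/20 split and the seed are arbitrary choices, not part of the analysis above):

set.seed(123)                                          # arbitrary seed, for reproducibility only
idx   <- sample(nrow(iris), size = round(0.8 * nrow(iris)))  # 80% of the rows for training
train <- iris[idx, ]
test  <- iris[-idx, ]
m <- multinom(base ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
              data = train, trace = FALSE)
mean(predict(m, test) == test$base)                    # held-out accuracy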