library(readr)
tissue <- read.csv("D:/Online course/R code/BreastTissue.csv")
# Checking the structure of adult data
str(tissue)
## 'data.frame': 106 obs. of 11 variables:
## $ Case..: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class : chr "car" "car" "car" "car" ...
## $ I0 : num 525 330 552 380 363 ...
## $ PA500 : num 0.187 0.227 0.232 0.241 0.201 ...
## $ HFS : num 0.0321 0.2653 0.0635 0.2862 0.2443 ...
## $ DA : num 229 121 265 138 125 ...
## $ Area : num 6844 3163 11888 5402 3290 ...
## $ A.DA : num 29.9 26.1 44.9 39.2 26.3 ...
## $ Max.IP: num 60.2 69.7 77.8 88.8 69.4 ...
## $ DR : num 220.7 99.1 253.8 105.2 103.9 ...
## $ P : num 557 400 657 494 425 ...
tissue <- tissue[, -1]
tissue$Class <- as.factor(tissue$Class)
levels(tissue$Class)[levels(tissue$Class) %in% c("fad", "gla", "mas")] <- "other"
levels(tissue$Class)
## [1] "adi" "car" "con" "other"
#Splitting the data using a function from dplyr package
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
index <- createDataPartition(tissue$Class, p = .70, list = FALSE)
train <- tissue[index,]
test <- tissue[-index,]
# Setting the reference
train$Class <- relevel(train$Class, ref = "adi")
require(nnet)
## Loading required package: nnet
# Training the multinomial model
multinom_model <- multinom(Class ~ ., data = tissue)
## # weights: 44 (30 variable)
## initial value 146.947202
## iter 10 value 107.785951
## iter 20 value 71.796827
## iter 30 value 17.709626
## iter 40 value 11.374947
## iter 50 value 6.775081
## iter 60 value 5.795295
## iter 70 value 5.581878
## iter 80 value 4.866367
## iter 90 value 4.196949
## iter 100 value 4.148436
## final value 4.148436
## stopped after 100 iterations
# Checking the model
summary(multinom_model)
## Call:
## multinom(formula = Class ~ ., data = tissue)
##
## Coefficients:
## (Intercept) I0 PA500 HFS DA Area
## car 86.60913 -0.96305833 35.275690 -29.634394 -2.8849859 -0.010116698
## con 65.26559 -0.03206876 3.504293 5.184549 0.5988307 -0.008582223
## other 94.22098 -0.47922587 -10.419688 45.952288 -0.3353446 -0.011760021
## A.DA Max.IP DR P
## car -0.5642795 2.4929893 3.2066178 0.70487069
## con 1.7128402 0.2689507 -0.2868354 -0.08833327
## other 1.6745352 0.2754780 0.7321152 0.31286440
##
## Std. Errors:
## (Intercept) I0 PA500 HFS DA Area
## car 0.03664981 0.09942884 0.018333418 0.038104270 0.6245774 0.03834079
## con 0.02221388 0.55721681 0.003310544 0.001951953 0.6365584 0.02143104
## other 0.03682272 0.09274290 0.017958668 0.037604062 0.6502786 0.03820181
## A.DA Max.IP DR P
## car 0.5939056 0.8468039 0.6085151 0.1867447
## con 0.5278823 0.7128738 0.1856930 0.3497548
## other 0.5893114 0.8206051 0.6075567 0.1825095
##
## Residual Deviance: 8.296872
## AIC: 68.29687
Just like binary logistic regression, we need to convert the coefficients to odds by taking the exponential of the coefficients.
exp(coef(multinom_model))
## (Intercept) I0 PA500 HFS DA Area
## car 4.110228e+37 0.3817237 2.089476e+15 1.348795e-13 0.05585558 0.9899343
## con 2.210481e+28 0.9684400 3.325791e+01 1.784930e+02 1.81998945 0.9914545
## other 8.310992e+40 0.6192626 2.983919e-05 9.053680e+19 0.71509165 0.9883089
## A.DA Max.IP DR P
## car 0.5687698 12.097385 24.6954204 2.0235850
## con 5.5446873 1.308591 0.7506353 0.9154557
## other 5.3363143 1.317160 2.0794745 1.3673361
The predicted values are saved as fitted.values in the model object. Let’s see the top 6 observations.
head(round(fitted(multinom_model), 2))
## adi car con other
## 1 0 0.97 0 0.03
## 2 0 1.00 0 0.00
## 3 0 1.00 0 0.00
## 4 0 1.00 0 0.00
## 5 0 1.00 0 0.00
## 6 0 0.96 0 0.04
The multinomial regression predicts the probability of a particular observation to be part of the said level. This is what we are seeing in the above table. Columns represent the classification levels and rows represent the observations. This means that the first six observation are classified as car.
To validate the model, we will be looking at the accuracy of the model. This accuracy can be calculated from the classification table.
# Predicting the values for train dataset
train$ClassPredicted <- predict(multinom_model, newdata = train, "class")
# Building classification table
tab <- table(train$Class, train$ClassPredicted)
# Calculating accuracy - sum of diagonal elements divided by total obs
round((sum(diag(tab))/sum(tab))*100,2)
## [1] 98.68
[1] 98.68
Our model accuracy has turned out to be 98.68% in the training dataset.
# Predicting the class for test dataset
test$ClassPredicted <- predict(multinom_model, newdata = test, "class")
# Building classification table
tab <- table(test$Class, test$ClassPredicted)
tab
##
## adi car con other
## adi 6 0 0 0
## car 0 6 0 0
## con 0 0 4 0
## other 0 0 0 14
We were able to achieve 100% accuracy in the test dataset and this number is very close to train, and thus we conclude that the model is good and is also stable.
In this tutorial, we learned how to build the multinomial logistic regression model, how to validate, and make a prediction on the unseen dataset.
#source https://www.r-bloggers.com/multinomial-logistic-regression-with-r/