Multinomial Logistic Regression (MLR) is a form of linear regression analysis conducted when the dependent variable is nominal with more than two levels. It is used to describe data and to explain the relationship between one dependent nominal variable and one or more continuous-level (interval or ratio scale) independent variables. You can understand nominal variable as, a variable which has no intrinsic ordering.
For example: Types of Forests: ‘Evergreen Forest’, ‘Deciduous Forest’, ‘Rain Forest’.
As you see, there is no intrinsic order in them, but each forest represent a unique category. In other words, multinomial regression is an extension of logistic regression, which analyzes dichotomous (binary) dependents.
library(nnet)
library(ggplot2)
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.0.5
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
ml <- read.csv("https://raw.githubusercontent.com/RWorkshop/workshopdatasets/master/multilog.csv")
ml <- ml %>% mutate(prog=factor(prog))
with(ml, table(ses, prog))
## prog
## ses academic general vocation
## high 42 9 7
## low 19 16 12
## middle 44 20 31
prog
ses academic general vocation high 42 9 7 low 19 16 12 middle 44 20 31
with(ml, do.call(rbind, tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))
## M SD
## academic 56.25714 7.943343
## general 51.33333 9.397775
## vocation 46.76000 9.318754
M SD academic 56.25714 7.943343 general 51.33333 9.397775 vocation 46.76000 9.318754
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 179.983731
## final value 179.981726
## converged
summary(test)
## Call:
## multinom(formula = prog2 ~ ses + write, data = ml)
##
## Coefficients:
## (Intercept) seslow sesmiddle write
## general 1.689478 1.1628411 0.6295638 -0.05793086
## vocation 4.235574 0.9827182 1.2740985 -0.11360389
##
## Std. Errors:
## (Intercept) seslow sesmiddle write
## general 1.226939 0.5142211 0.4650289 0.02141101
## vocation 1.204690 0.5955688 0.5111119 0.02222000
##
## Residual Deviance: 359.9635
## AIC: 375.9635
Call: multinom(formula = prog2 ~ ses + write, data = ml)
Coefficients: (Intercept) seslow sesmiddle write general 1.689478 1.1628411 0.6295638 -0.05793086 vocation 4.235574 0.9827182 1.2740985 -0.11360389
Std. Errors: (Intercept) seslow sesmiddle write general 1.226939 0.5142211 0.4650289 0.02141101 vocation 1.204690 0.5955688 0.5111119 0.02222000
Residual Deviance: 359.9635 AIC: 375.9635 ### Test Statistics and P-values z <- summary(test)\(coefficients/summary(test)\)standard.errors z
p <- (1 - pnorm(abs(z), 0, 1)) * 2 p (Intercept) seslow sesmiddle write general 1.376987 2.261364 1.353816 -2.705658 vocation 3.515904 1.650050 2.492798 -5.112687 (Intercept) seslow sesmiddle write general 0.1685163893 0.02373673 0.1757949 6.816914e-03 vocation 0.0004382601 0.09893276 0.0126741 3.176088e-07
### Interpretation Model execution output shows some iteration history and includes the final negative log-likelihood 179.981726. This value is multiplied by two as shown in the model summary as the Residual Deviance. |
The summary output has a block of coefficients and another block of standard errors. Each blocks has one row of values corresponding to one model equation. In the block of coefficients, we see that the first row is being compared to prog = “general” to our baseline prog = “academic” and the second row to prog = “vocation” to our baseline prog = “academic”. |
A one-unit increase in write decreases the log odds of being in general program vs. academic program by 0.0579 |
A one-unit increase in write decreases the log odds of being in vocation program vs. academic program by 0.1136 |
The log odds of being in general program than in academic program will decrease by 1.163 if moving from ses=”low” to ses=”high”. |
On the other hand, Log odds of being in general program than in academic program will decrease by 0.5332 if moving from ses="low" to ses=”middle” |
The log odds of being in vocation program vs. in academic program will decrease by 0.983 if moving from ses="low" to ses=”high” |
The log odds of being in vocation program vs. in academic program will increase by 0.291 if moving from ses="low" to ses="middle" |
Now we’ll calculate Z score and p-Value for the variables in the model. |
## extract the coefficients from the model and exponentiate exp(coef(test)) |
head(pp <- fitted(test)) (Intercept) seslow sesmiddle write general 5.416653 3.199009 1.876792 0.9437152 vocation 69.101326 2.671709 3.575477 0.8926115 academic general vocation 1 0.1482721 0.3382509 0.5134769 2 0.1201988 0.1806335 0.6991678 3 0.4186768 0.2368137 0.3445095 4 0.1726839 0.3508433 0.4764728 5 0.1001206 0.1689428 0.7309367 6 0.3533583 0.2378047 0.4088370 |
```r dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write)) |
predict(test, newdata = dses, "probs") ``` |
## academic general vocation ## 1 0.4396813 0.3581915 0.2021272 ## 2 0.4777451 0.2283359 0.2939190 ## 3 0.7009046 0.1784928 0.1206026 |
academic general vocation 1 0.4396813 0.3581915 0.2021272 2 0.4777451 0.2283359 0.2939190 3 0.7009046 0.1784928 0.1206026 |
```r dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70), 3)) |
## store the predicted probabilities for each value of ses and write pp.write <- cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se = TRUE)) ``` |
r ## calculate the mean probabilities within each level of ses by(pp.write[, 3:5], pp.write$ses, colMeans) |
## pp.write$ses: high ## academic general vocation ## 0.6164348 0.1808049 0.2027603 ## ------------------------------------------------------------ ## pp.write$ses: low ## academic general vocation ## 0.3972955 0.3278180 0.2748864 ## ------------------------------------------------------------ ## pp.write$ses: middle ## academic general vocation ## 0.4256172 0.2010877 0.3732951 |
pp.write$ses: low academic general vocation 0.3972955 0.3278180 0.2748864 |
pp.write$ses: middle academic general vocation 0.4256172 0.2010877 0.3732951
## melt data set to long for ggplot2
lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
head(lpp) # view first few rows
## ses write variable probability
## 1 low 30 academic 0.09843258
## 2 low 31 academic 0.10716517
## 3 low 32 academic 0.11650018
## 4 low 33 academic 0.12645441
## 5 low 34 academic 0.13704163
## 6 low 35 academic 0.14827211
## plot predicted probabilities across write values for each level of ses
## facetted by program type
ggplot(lpp, aes(x = write, y = probability, colour = ses)) + geom_line() + facet_grid(variable ~
., scales = "free")
ses write variable probability low 30 academic 0.09843258 low 31 academic 0.10716517 low 32 academic 0.11650018 low 33 academic 0.12645441 low 34 academic 0.13704163 low 35 academic 0.14827211 png
## melt data set to long for ggplot2
lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
head(lpp) # view first few rows
## ses write variable probability
## 1 low 30 academic 0.09843258
## 2 low 31 academic 0.10716517
## 3 low 32 academic 0.11650018
## 4 low 33 academic 0.12645441
## 5 low 34 academic 0.13704163
## 6 low 35 academic 0.14827211
## ses write variable probability
## 1 low 30 academic 0.09844
## 2 low 31 academic 0.10717
## 3 low 32 academic 0.11650
## 4 low 33 academic 0.12646
## 5 low 34 academic 0.13705
## 6 low 35 academic 0.14828
ggplot(lpp, aes(x = write, y = probability, colour = ses)) + geom_line() + facet_grid(variable ~ ., scales = "free")