Multinomial Logistic Regression | R Data Analysis Examples

Multinomial logistic regression is used to model nominal outcome variables, in which the log odds of the outcomes are modeled as a linear combination of the predictor variables.

require(foreign)
## Loading required package: foreign
require(nnet)
## Loading required package: nnet
require(ggplot2)
## Loading required package: ggplot2
require(reshape2)
## Loading required package: reshape2

Examples of multinomial logistic regression

Example 1. People’s occupational choices might be influenced by their parents’ occupations and their own education level. We can study the relationship of one’s occupation choice with education level and father’s occupation. The occupational choices will be the outcome variable which consists of categories of occupations.

Example 2. A biologist may be interested in food choices that alligators make. Adult alligators might have different preferences from young ones. The outcome variable here will be the types of food, and the predictor variables might be size of the alligators and other environmental variables.

Example 3. Students entering high school make program choices among a general program, a vocational program, and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.

Description of the data

For our data analysis example, we will expand the third example using the hsbdemo data set. Let’s first read in the data.

ml <- read.csv(file = "/home/peopleanalytics/hsbdemo.csv")

The data set contains variables on 200 students. The outcome variable is prog, program type. The predictor variables are socioeconomic status, ses, a three-level categorical variable, and writing score, write, a continuous variable. Let’s start by getting some descriptive statistics of the variables of interest.
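The cross-tabulation of ses by prog and the group means and standard deviations of write shown below can be produced in several ways; one possible sketch (not necessarily the exact code used to format this output) is:

## cross-tabulate socioeconomic status against program type
with(ml, table(ses, prog))
## mean (M) and standard deviation (SD) of write within each program type
with(ml, do.call(rbind, tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))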

##         prog
## ses      academic general vocation
##   high         42       9        7
##   low          19      16       12
##   middle       44      20       31
##                 M       SD
## academic 56.25714 7.943343
## general  51.33333 9.397775
## vocation 46.76000 9.318754

Analysis methods you might consider

Multinomial logistic regression, the focus of this page.

Multinomial probit regression, similar to multinomial logistic regression with independent normal error terms.

Multiple-group discriminant function analysis, a multivariate method for multinomial outcome variables.

Multiple logistic regression analyses, one for each pair of outcomes: One problem with this approach is that each analysis is potentially run on a different sample. Another problem is that, without constraining the logistic models, the estimated probabilities of choosing the outcome categories can sum to more than 1.

Collapsing number of categories to two and then doing a logistic regression: This approach suffers from loss of information and changes the original research questions to very different ones.

Ordinal logistic regression: If the outcome variable is truly ordered and if it also satisfies the assumption of proportional odds, then switching to ordinal logistic regression will make the model more parsimonious.

Alternative-specific multinomial probit regression, which allows different error structures and therefore relaxes the independence of irrelevant alternatives (IIA) assumption. This requires that the data structure be choice-specific.

Nested logit model, another way to relax the IIA assumption; it also requires that the data structure be choice-specific.

Multinomial logistic regression

Below we use the multinom function from the nnet package to estimate a multinomial logistic regression model. Other R packages also provide functions for multinomial regression. We chose multinom because it does not require the data to be reshaped (as the mlogit package does) and because it mirrors the example code found in Hilbe’s Logistic Regression Models.

Before running our model, we choose the level of the outcome that we wish to use as the baseline and specify it with the relevel function. Then we run our model using multinom. The multinom function does not compute p-values for the regression coefficients, so we calculate them ourselves using Wald tests (here, z-tests).

## convert prog to a factor and set "academic" as the reference (baseline) level
ml$prog2 <- relevel(factor(ml$prog), ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.983731
## final  value 179.981726 
## converged
summary(test)
## Call:
## multinom(formula = prog2 ~ ses + write, data = ml)
## 
## Coefficients:
##          (Intercept)    seslow sesmiddle       write
## general     1.689478 1.1628411 0.6295638 -0.05793086
## vocation    4.235574 0.9827182 1.2740985 -0.11360389
## 
## Std. Errors:
##          (Intercept)    seslow sesmiddle      write
## general     1.226939 0.5142211 0.4650289 0.02141101
## vocation    1.204690 0.5955688 0.5111119 0.02222000
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635
z <- summary(test)$coefficients/summary(test)$standard.errors
z
##          (Intercept)   seslow sesmiddle     write
## general     1.376987 2.261364  1.353816 -2.705658
## vocation    3.515904 1.650050  2.492798 -5.112687
# 2-tailed z test
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
##           (Intercept)     seslow sesmiddle        write
## general  0.1685163893 0.02373673 0.1757949 6.816914e-03
## vocation 0.0004382601 0.09893276 0.0126741 3.176088e-07

We first see that some output is generated by running the model, even though we are assigning the model to a new R object. This model-running output includes some iteration history and the final negative log-likelihood, 179.981726. Multiplied by two, this value appears in the model summary as the Residual Deviance, and it can be used in comparisons of nested models; a brief illustrative sketch of such a comparison follows.
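For instance, the fit of a reduced model that drops ses could be compared to the full model with a likelihood-ratio test via anova. This is only a sketch; the reduced model test2 is an illustrative object that is not part of the analysis on this page.

## illustrative nested-model comparison: does ses improve fit beyond write alone?
test2 <- multinom(prog2 ~ write, data = ml)
anova(test2, test)  # likelihood-ratio test based on the residual deviances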

The model summary output has a block of coefficients and a block of standard errors. Each of these blocks has one row of values corresponding to a model equation. Focusing on the block of coefficients, we can look at the first row comparing prog = “general” to our baseline prog = “academic” and the second row comparing prog = “vocation” to our baseline prog = “academic”. If we consider our coefficients from the first row to be b1 and our coefficients from the second row to be b2, we can write our model equations:

$$\ln\left(\frac{P(\text{prog} = \text{general})}{P(\text{prog} = \text{academic})}\right) = b_{10} + b_{11}(\text{ses} = \text{low}) + b_{12}(\text{ses} = \text{middle}) + b_{13}\,\text{write}$$

$$\ln\left(\frac{P(\text{prog} = \text{vocation})}{P(\text{prog} = \text{academic})}\right) = b_{20} + b_{21}(\text{ses} = \text{low}) + b_{22}(\text{ses} = \text{middle}) + b_{23}\,\text{write}$$

where the b’s are the regression coefficients and (ses = low) and (ses = middle) are 0/1 indicator variables. A one-unit increase in the variable write is associated with a .058 decrease in the relative log odds of being in general program vs. academic program.

A one-unit increase in the variable write is associated with a .1136 decrease in the relative log odds of being in vocation program vs. academic program.

The relative log odds of being in general program vs. in academic program will decrease by 1.163 when moving from ses = "low" to ses = "high", the reference level of ses in this model.

The ratio of the probability of choosing one outcome category over the probability of choosing the baseline category is often referred to as relative risk (it is also sometimes referred to as odds, as we did when describing the regression parameters above). Relative risk can be obtained by exponentiating the linear equations above, yielding regression coefficients that are relative risk ratios for a unit change in the predictor variable. In R we can simply exponentiate the coefficients from the fitted model to display the results in terms of relative risk ratios.

## extract the coefficients from the model and exponentiate
exp(coef(test))
##          (Intercept)   seslow sesmiddle     write
## general     5.416653 3.199009  1.876792 0.9437152
## vocation   69.101326 2.671709  3.575477 0.8926115

The relative risk ratio for a one-unit increase in the variable write is .9437 for being in general program vs. academic program.

The relative risk ratio of being in general program vs. academic program is 3.199 when moving from ses = "high" to ses = "low"; equivalently, it is 1/3.199 ≈ .3126 when moving from ses = "low" to ses = "high".
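If confidence intervals for the relative risk ratios are wanted, one rough approach is to exponentiate Wald-type intervals built from the coefficient and standard-error matrices extracted above; a minimal sketch:

## approximate 95% Wald confidence limits for the relative risk ratios
se <- summary(test)$standard.errors
exp(coef(test) - 1.96 * se)  # lower limits
exp(coef(test) + 1.96 * se)  # upper limits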

You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities for each of our outcome levels using the fitted function. We can start by generating the predicted probabilities for the observations in our data set and viewing the first few rows.

head(pp <- fitted(test))
##    academic   general  vocation
## 1 0.1482721 0.3382509 0.5134769
## 2 0.1201988 0.1806335 0.6991678
## 3 0.4186768 0.2368137 0.3445095
## 4 0.1726839 0.3508433 0.4764728
## 5 0.1001206 0.1689428 0.7309367
## 6 0.3533583 0.2378047 0.4088370
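As a quick check on these fitted probabilities, each row should sum to one, and the column with the largest probability corresponds to the class that predict returns by default; a small illustration:

head(rowSums(pp))   # each row of probabilities should sum to (approximately) 1
head(predict(test)) # predicted program type for the first few students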

Next, if we want to examine the changes in predicted probability associated with one of our two variables, we can create small datasets varying one variable while holding the other constant. We will first do this holding write at its mean and examining the predicted probabilities for each level of ses.

dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write))
predict(test, newdata = dses, "probs")
##    academic   general  vocation
## 1 0.4396813 0.3581915 0.2021272
## 2 0.4777451 0.2283359 0.2939190
## 3 0.7009046 0.1784928 0.1206026

Another way to understand the model using the predicted probabilities is to look at the averaged predicted probabilities for different values of the continuous predictor variable write within each level of ses.

dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),
    3))

## store the predicted probabilities for each value of ses and write
pp.write <- cbind(dwrite, predict(test, newdata = dwrite, type = "probs", se = TRUE))

## calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)
## pp.write$ses: high
##  academic   general  vocation 
## 0.6164348 0.1808049 0.2027603 
## -------------------------------------------------------- 
## pp.write$ses: low
##  academic   general  vocation 
## 0.3972955 0.3278180 0.2748864 
## -------------------------------------------------------- 
## pp.write$ses: middle
##  academic   general  vocation 
## 0.4256172 0.2010877 0.3732951

Sometimes, a couple of plots can convey a good deal of information. Using the predictions we generated for the pp.write object above, we can plot the predicted probabilities against the writing score by the level of ses for different levels of the outcome variable.

## melt data set to long for ggplot2
lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
head(lpp)  # view first few rows
##   ses write variable probability
## 1 low    30 academic  0.09843258
## 2 low    31 academic  0.10716517
## 3 low    32 academic  0.11650018
## 4 low    33 academic  0.12645441
## 5 low    34 academic  0.13704163
## 6 low    35 academic  0.14827211
## plot predicted probabilities across write values for each level of ses
## faceted by program type
ggplot(lpp, aes(x = write, y = probability, colour = ses)) + geom_line() + facet_grid(variable ~
    ., scales = "free")

Things to consider

The Independence of Irrelevant Alternatives (IIA) assumption: Roughly, the IIA assumption means that adding or deleting alternative outcome categories does not affect the odds among the remaining outcomes. Alternative modeling methods, such as the alternative-specific multinomial probit model or the nested logit model, relax the IIA assumption.
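A formal test of IIA is not shown here, but a rough, informal check in the same spirit is to drop one outcome category, refit the model, and see whether the remaining coefficients change substantially; the sketch below uses illustrative object names.

## refit after removing the "vocation" category (informal IIA-style check)
ml_sub <- subset(ml, prog2 != "vocation")
ml_sub$prog2 <- droplevels(ml_sub$prog2)
test_sub <- multinom(prog2 ~ ses + write, data = ml_sub)
## with two outcome levels, coef() returns the single general-vs-academic equation;
## compare it with the "general" row of the full model
coef(test_sub)
coef(test)["general", ]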

Diagnostics and model fit: Unlike binary logistic regression, where there are many statistics for performing model diagnostics, it is not as straightforward to do diagnostics with multinomial logistic regression models. For the purpose of detecting outliers or influential data points, one can run separate binary logit models and use the diagnostic tools on each model.
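For example, one could subset the data to a pair of outcomes, fit an ordinary binary logistic regression with glm, and apply the usual diagnostics; the sketch below uses illustrative object names.

## binary logit for "general" vs. "academic" only (illustrative)
ga <- subset(ml, prog2 %in% c("academic", "general"))
ga$general <- as.numeric(ga$prog2 == "general")
fit_ga <- glm(general ~ ses + write, data = ga, family = binomial)
## standard binary-logistic diagnostics, e.g. the largest Cook's distances
head(sort(cooks.distance(fit_ga), decreasing = TRUE))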

Sample size: Multinomial regression uses maximum likelihood estimation, which requires a large sample size. It also estimates multiple equations, which implies that it requires an even larger sample size than ordinal or binary logistic regression.

Complete or quasi-complete separation: Complete separation means that the outcome variable separates a predictor variable completely, leading to perfect prediction of the outcome variable by the predictor variable.

Perfect prediction means that one value of a predictor variable is associated with only one value of the response variable. You can usually tell from the regression output that something is wrong, for example from implausibly large coefficients or standard errors. You can then do a two-way tabulation of the outcome variable with the problematic variable to confirm this and then rerun the model without the problematic variable.

Empty cells or small cells: You should check for empty or small cells by doing a cross-tabulation between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or it might not even run at all.
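In this data set, such a check can be as simple as cross-tabulating each categorical predictor with the outcome, for example:

## look for empty or very small cells in the predictor-by-outcome table
xtabs(~ ses + prog2, data = ml)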

See also

Applied Logistic Regression (Second Edition) by David Hosmer and Stanley Lemeshow

An Introduction to Categorical Data Analysis by Alan Agresti

Logistic Regression Models by Joseph M. Hilbe