The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. They are used when the dependent variable has more than two nominal categories.
Dummy coding of independent variables is quite common. In multinomial logistic regression the dependent variable is dummy coded into multiple 1/0 variables. There is a variable for all categories but one, so if there are M categories, there will be M-1 dummy variables. All but one category has its own dummy variable. Each category’s dummy variable has a value of 1 for its category and a 0 for all others. One category, the reference category, doesn’t need its own dummy variable as it is uniquely identified by all the other variables being 0.
The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables. The result is M-1 binary logistic regression models. Each one tells the effect of the predictors on the probability of success in that category in comparison to the reference category. Each model has its own intercept and regression coefficients—the predictors can affect each category differently.
Generalized of Logistic Regression has two different categories:
Multinomial- Unordered logistic regression
Multinomial- Ordered logistic regression
Data From Agresti 8.1.2. \(N = 219\) alligators were trapped from four Florida lakes (Hancock, Oklawaha, Trafford, George) and their stomach contents were examined. Primary food source for the \(i\)th alligator was classified as \(Y_i\in\{\mbox{fish,bird,invertebrate,reptile,other}\}\).
In addition, \(M_i = 1\) for male, 0 for female and \(S_i = 1\) for small (\(\le 2.3\)m) and 0 for large were recorded.
Objective: In this study, I compared multinomial logistic regression models from complex(saturated) the simplest(null) model to understand the relationship between the food source and other predictor variables.
Methods to use: - Multinomial Logistic Regression(Saturated to Null) - Drop-in-deviance to compare the models - Finding probabilities for the special cases …
library(nnet) # has the function multinom
library(vcdExtra)# contains the alligator data set
## Loading required package: vcd
## Loading required package: grid
## Loading required package: gnm
data(Alligator) # get the alligator data from the vcdExtra package
names(Alligator)
## [1] "lake" "sex" "size" "food" "count"
head(Alligator)
## lake sex size food count
## 1 Hancock male small fish 7
## 2 Hancock male small invert 1
## 3 Hancock male small reptile 0
## 4 Hancock male small bird 0
## 5 Hancock male small other 5
## 6 Hancock male large fish 4
dim(Alligator)
## [1] 80 5
Notice that instead of one record per alligator, the data are grouped, so that there are 7 small, male, Hancock alligators that prefer fish. We will account for this grouping in the modeling by using a weights statement.
Model the data as \[ Y_i\sim\mbox{independent Multinomial}\left(1;\left[\pi_F(x),\pi_B(x),\pi_I(x),\pi_R(x),\pi_O(x)\right]\right) \] for \(i=1,2,\ldots,219\).
Note that using the baseline-category approach, choose bird as baseline and fit several models. Baseline is chosen by R based on the alphabetical order.
# Saturated model: Lake, size and sex are fully intertacted
fitS <- multinom(food ~ lake * size * sex, data = Alligator, weights = count)
## # weights: 85 (64 variable)
## initial value 352.466903
## iter 10 value 263.396848
## iter 20 value 251.196084
## iter 30 value 244.981116
## iter 40 value 243.848348
## iter 50 value 243.802743
## iter 60 value 243.800913
## final value 243.800898
## converged
# Null model says that food preference does not depend on anything,Alligators has the same food preferences regardless of wherever they live, whatever their size and sex therefore 1 used for the null model. There are 5 categories(food) so 4 variables to estimate
fit0 <- multinom(food ~ 1, data = Alligator, weights = count)
## # weights: 10 (4 variable)
## initial value 352.466903
## iter 10 value 302.185213
## final value 302.181462
## converged
# Other models:
fit1 <- multinom(food ~ sex, data = Alligator, weights = count)
## # weights: 15 (8 variable)
## initial value 352.466903
## iter 10 value 301.233557
## final value 301.129428
## converged
fit2 <- multinom(food ~ size, data = Alligator, weights = count)
## # weights: 15 (8 variable)
## initial value 352.466903
## iter 10 value 294.669601
## final value 294.606678
## converged
fit3 <- multinom(food ~ lake, data = Alligator, weights = count)
## # weights: 25 (16 variable)
## initial value 352.466903
## iter 10 value 283.282977
## iter 20 value 280.584605
## final value 280.583843
## converged
fit4 <- multinom(food ~ size + lake, data = Alligator, weights = count)
## # weights: 30 (20 variable)
## initial value 352.466903
## iter 10 value 271.161880
## iter 20 value 270.060188
## final value 270.040139
## converged
fit5 <- multinom(food ~ size + lake + sex, data = Alligator, weights = count)
## # weights: 35 (24 variable)
## initial value 352.466903
## iter 10 value 270.397070
## iter 20 value 268.958046
## final value 268.932740
## converged
Look at a particular model fit:
summary(fit5)
## Call:
## multinom(formula = food ~ size + lake + sex, data = Alligator,
## weights = count)
##
## Coefficients:
## (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford sexmale
## fish 1.70178892 0.7303298 -0.5752403 0.5503569 -1.23679067 0.60639521
## invert 0.53452560 2.0665999 -2.3557451 1.4635491 -0.08096449 0.14342792
## other -0.01957203 1.0209285 0.1913919 0.5764707 0.32102428 0.35382356
## reptile -1.15700455 0.1733300 0.5539263 3.0803954 1.82403973 -0.02116283
##
## Std. Errors:
## (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford sexmale
## fish 0.7690506 0.6522820 0.7952018 1.209823 0.8661011 0.6888447
## invert 0.8555494 0.7083550 0.9463415 1.232433 0.8814409 0.7292036
## other 0.9167843 0.7250019 0.9072028 1.374193 0.9589615 0.7623291
## reptile 1.3221528 0.8554960 1.3799947 1.591429 1.3390274 0.9088120
##
## Residual Deviance: 537.8655
## AIC: 585.8655
Comparing Models with Drop-in-Deviance Test
fit0$deviance - fitS$deviance # null deviance
## [1] 116.7611
fit1$deviance - fitS$deviance # sex only
## [1] 114.6571
fit2$deviance - fitS$deviance # size only
## [1] 101.6116
fit3$deviance - fitS$deviance # lake only
## [1] 73.56589
fit4$deviance - fitS$deviance # lake + size
## [1] 52.47848
fit5$deviance - fitS$deviance # sex + lake + size
## [1] 50.26368
The small difference between model 4 (dropping sex) and model 5 suggests that the model can be simplified. The drop-in-deviance test is
fit4$deviance-fit5$deviance
## [1] 2.214799
to be compare to \(\chi^2_4\); dropping sex saves one parameter in each of the J-1 non-baseline categories, hence 4 degrees of freedom. Yes, we can simplify.
So, we can drop sex and reproduce the models:
# Saturated model:
fitS <- multinom(food ~ lake * size, data = Alligator, weights = count)
## # weights: 45 (32 variable)
## initial value 352.466903
## iter 10 value 266.342868
## iter 20 value 262.457272
## iter 30 value 261.584991
## iter 40 value 261.502078
## iter 50 value 261.500253
## final value 261.500226
## converged
# Null model:
fit0 <- multinom(food ~ 1, data = Alligator, weights = count)
## # weights: 10 (4 variable)
## initial value 352.466903
## iter 10 value 302.185213
## final value 302.181462
## converged
fit1 <- multinom(food ~ size, data = Alligator, weights = count)
## # weights: 15 (8 variable)
## initial value 352.466903
## iter 10 value 294.669601
## final value 294.606678
## converged
fit2 <- multinom(food ~ lake, data = Alligator, weights = count)
## # weights: 25 (16 variable)
## initial value 352.466903
## iter 10 value 283.282977
## iter 20 value 280.584605
## final value 280.583843
## converged
fit3 <- multinom(food ~ size + lake, data = Alligator, weights = count)
## # weights: 30 (20 variable)
## initial value 352.466903
## iter 10 value 271.161880
## iter 20 value 270.060188
## final value 270.040139
## converged
fit0$deviance - fitS$deviance # null
## [1] 81.36247
fit1$deviance - fitS$deviance # size only
## [1] 66.2129
fit2$deviance - fitS$deviance # lake only
## [1] 38.16723
fit3$deviance - fitS$deviance # size + lake
## [1] 17.07983
Think about degrees of freedom for these models. The null model says there are no differences among lakes or sizes, so only one probability distribution over the five classes (fish, invert, other, reptile and bird) is necessary. This requires four parameters, since the probabilities sum to one, and bird as baseline is the category without a separate parameter:
coef(fit0)
## (Intercept)
## fish 1.9783455
## invert 1.5459246
## other 0.9007867
## reptile 0.3794897
The saturated model has 32 parameters: 4 each for 4 lakes times 2 size classes. Hence the null model has \(32 - 4 = 28\) residual degrees of freedom.
The size only model has two distributions, one for small alligators and one for large alligators (default), hence 8 parameters and 24 residual df:
coef(fit1)
## (Intercept) sizesmall
## fish 1.7272335 0.5550764
## invert 0.6931755 1.5039716
## other 0.4855291 0.8494003
## reptile 0.4855171 -0.3033028
Similiarly, the lake only model has four distributions, one for each lake (with Lake George as default), hence uses 16 df and has 16 residual df:
coef(fit2)
## (Intercept) lakeHancock lakeOklawaha lakeTrafford
## fish 2.3977922 -0.6060193 0.4925008 -1.2191111
## invert 1.8970142 -2.1201047 1.0473497 -0.3929084
## other 0.6930138 0.2625353 0.4055545 0.2233165
## reptile -1.0983211 0.5875275 3.0441536 1.7915266
Finally, the size + lake model is additive: it uses 20 parameters (either 16 lake parameters to get 4 lake distributions, which are then shifted by 4 size parameters; or 8 size parameters to get 2 size distributions, which are then shifted by 12 lake parameters) and hence has 12 residual df:
coef(fit3)
## (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford
## fish 2.0929906 0.6308607 -0.6951466 0.6529211 -1.08764377
## invert 0.5439613 2.0890800 -2.3535280 1.5901541 0.03435528
## other 0.1887161 0.9624274 0.1310280 0.6585231 0.42873394
## reptile -1.2215892 0.2796424 0.5476069 3.1118287 1.84765700
Among these non-saturated models, fit3 is the largest model and has the smallest residual deviance (as it must). Is it better than the next best model, fit2? Or could we simplify the model? Drop-in-deviance is
fit2$deviance - fit3$deviance
## [1] 21.08741
on \(16 - 12 = 4\) df, with p-value
1 - pchisq(fit2$deviance - fit3$deviance, df = 4)
## [1] 0.0003042796
so reject the proposed simplification.
Probability calculations
Ex1. What is the probability for a large alligator in LakeGeorge prefers fish?
By hand
# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0
#P = exp(P(food_type)+P(size)+P(lake)) / (1 + (exp(fish)(food_type) ++ P(size)+P(lake))
# exp(invert)(P(food_type) + P(size)+P(lake)) +
# exp(other) (P(food_type) + P(size)+P(lake)) # exp(reptile)(P(food_type) + P(size)+P(lake)) )
P <- (exp(2.0929906 + 0 + 0 )) / (1 + exp( 2.0929906 + 0 + 0 )
+ exp( 0.5439613 + 0 + 0 )
+ exp( 0.1887161 + 0 + 0 )
+ exp(-1.2215892 + 0 + 0 ))
P
## [1] 0.6574398
By using Model
tmp <- c(0, coef(fit3)[, 1]) # 0 for bird and Lake George coefficients.
exp(tmp) / sum(exp(tmp))
## fish invert other reptile
## 0.08107402 0.65743977 0.13967578 0.09791293 0.02389749
By R
predict(fit3,
type = "probs",
newdata = data.frame(size = "large", lake = "George"))
## bird fish invert other reptile
## 0.08107402 0.65743977 0.13967578 0.09791293 0.02389749
Ex2. If you are a small alligator, what is your food preference relative to a large alligator on Lake Hancock?
1: Probability of Small allgitor for LakeHancock:
By hand for Reptile preference
# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0
# P = exp(P(food_type)+P(size)+P(lake)) / (1 + (exp(fish)(food_type) ++ P(size)+P(lake))
# exp(invert)(P(food_type) + P(size)+P(lake)) +
# exp(other) (P(food_type) + P(size)+P(lake)) # exp(reptile)(P(food_type) + P(size)+P(lake)) )
P <- (exp(-1.2215892 + 0.2796424 + 0.5476069)) / (1
+ exp(-1.2215892 + 0.2796424 + 0.5476069)
+ exp( 2.0929906 + 0.6308607 - 0.6951466)
+ exp( 0.5439613 + 2.0890800 - 2.3535280)
+ exp(0.1887161 + 0.9624274 + 0.1310280))
#Reptile preference
P
## [1] 0.04745587
By using Model
tmp <- c(0, coef(fit3)[, 1] + coef(fit3)[, 2] +coef(fit3)[, 3])
exp(tmp) / sum(exp(tmp))
## fish invert other reptile
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587
By R
predict(fit3,
type = "probs",
newdata = data.frame(size = "small", lake = "Hancock"))
## bird fish invert other reptile
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587
2: Probability of Large allgitor for LakeHancock
By hand for Reptile preference
# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0
# P = exp(P(food_type)+P(size)+P(lake)) / (1 + (exp(fish)(food_type) ++ P(size)+P(lake))
# exp(invert)(P(food_type) + P(size)+P(lake)) +
# exp(other) (P(food_type) + P(size)+P(lake)) # exp(reptile)(P(food_type) + P(size)+P(lake)) )
P <- (exp(-1.2215892 + 0.2796424 + 0.5476069)) / (1
+ exp(-1.2215892 + 0.2796424 + 0.5476069)
+ exp( 2.0929906 + 0.6308607 - 0.6951466)
+ exp( 0.5439613 + 2.0890800 - 2.3535280)
+ exp(0.1887161 + 0.9624274 + 0.1310280))
#Reptile preference
P
## [1] 0.04745587
By using Model
tmp <- c(0, coef(fit3)[, 1] + coef(fit3)[, 2] +coef(fit3)[, 3])
exp(tmp) / sum(exp(tmp))
## fish invert other reptile
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587
By R
predict(fit3,
type = "probs",
newdata = data.frame(size = "small", lake = "Hancock"))
## bird fish invert other reptile
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587
References