Generalized Linear Models 3

Introduction

The multinomial (a.k.a. polytomous) logistic regression model is a simple extension of the binomial logistic regression model. They are used when the dependent variable has more than two nominal categories.

Dummy coding of independent variables is quite common. In multinomial logistic regression the dependent variable is dummy coded into multiple 1/0 variables. There is a variable for all categories but one, so if there are M categories, there will be M-1 dummy variables. All but one category has its own dummy variable. Each category’s dummy variable has a value of 1 for its category and a 0 for all others. One category, the reference category, doesn’t need its own dummy variable as it is uniquely identified by all the other variables being 0.

The multinomial logistic regression then estimates a separate binary logistic regression model for each of those dummy variables. The result is M-1 binary logistic regression models. Each one tells the effect of the predictors on the probability of success in that category in comparison to the reference category. Each model has its own intercept and regression coefficients—the predictors can affect each category differently.

Generalized of Logistic Regression has two different categories:

Multinomial- Unordered logistic regression
Multinomial- Ordered logistic regression

Data From Agresti 8.1.2. \(N = 219\) alligators were trapped from four Florida lakes (Hancock, Oklawaha, Trafford, George) and their stomach contents were examined. Primary food source for the \(i\)th alligator was classified as \(Y_i\in\{\mbox{fish,bird,invertebrate,reptile,other}\}\).

In addition, \(M_i = 1\) for male, 0 for female and \(S_i = 1\) for small (\(\le 2.3\)m) and 0 for large were recorded.

Objective: In this study, I compared multinomial logistic regression models from complex(saturated) the simplest(null) model to understand the relationship between the food source and other predictor variables.

Methods to use:
- Multinomial Logistic Regression(Saturated to Null)
- Drop-in-deviance to compare the models
- Finding probabilities for the special cases
…

library(nnet)    # has the function multinom
library(vcdExtra)# contains the alligator data set

## Loading required package: vcd

## Loading required package: grid

## Loading required package: gnm

data(Alligator)  # get the alligator data from the vcdExtra package
names(Alligator)

## [1] "lake"  "sex"   "size"  "food"  "count"

head(Alligator)

##      lake  sex  size    food count
## 1 Hancock male small    fish     7
## 2 Hancock male small  invert     1
## 3 Hancock male small reptile     0
## 4 Hancock male small    bird     0
## 5 Hancock male small   other     5
## 6 Hancock male large    fish     4

dim(Alligator)

## [1] 80  5

Notice that instead of one record per alligator, the data are grouped, so that there are 7 small, male, Hancock alligators that prefer fish. We will account for this grouping in the modeling by using a weights statement.

Model the data as \[ Y_i\sim\mbox{independent Multinomial}\left(1;\left[\pi_F(x),\pi_B(x),\pi_I(x),\pi_R(x),\pi_O(x)\right]\right) \] for \(i=1,2,\ldots,219\).

Note that using the baseline-category approach, choose bird as baseline and fit several models. Baseline is chosen by R based on the alphabetical order.

# Saturated model: Lake, size and sex are fully intertacted
fitS <- multinom(food ~ lake * size * sex, data = Alligator, weights = count)

## # weights:  85 (64 variable)
## initial  value 352.466903 
## iter  10 value 263.396848
## iter  20 value 251.196084
## iter  30 value 244.981116
## iter  40 value 243.848348
## iter  50 value 243.802743
## iter  60 value 243.800913
## final  value 243.800898 
## converged

# Null model says that food preference does not depend on anything,Alligators has the same food preferences regardless of wherever they live, whatever their size and sex  therefore 1 used for the null model. There are 5 categories(food) so 4 variables to estimate
fit0 <- multinom(food ~ 1, data = Alligator, weights = count)

## # weights:  10 (4 variable)
## initial  value 352.466903 
## iter  10 value 302.185213
## final  value 302.181462 
## converged

# Other models:
fit1 <- multinom(food ~ sex, data = Alligator, weights = count)

## # weights:  15 (8 variable)
## initial  value 352.466903 
## iter  10 value 301.233557
## final  value 301.129428 
## converged

fit2 <- multinom(food ~ size, data = Alligator, weights = count)

## # weights:  15 (8 variable)
## initial  value 352.466903 
## iter  10 value 294.669601
## final  value 294.606678 
## converged

fit3 <- multinom(food ~ lake, data = Alligator, weights = count)

## # weights:  25 (16 variable)
## initial  value 352.466903 
## iter  10 value 283.282977
## iter  20 value 280.584605
## final  value 280.583843 
## converged

fit4 <- multinom(food ~ size + lake, data = Alligator, weights = count)

## # weights:  30 (20 variable)
## initial  value 352.466903 
## iter  10 value 271.161880
## iter  20 value 270.060188
## final  value 270.040139 
## converged

fit5 <- multinom(food ~ size + lake + sex, data = Alligator, weights = count)

## # weights:  35 (24 variable)
## initial  value 352.466903 
## iter  10 value 270.397070
## iter  20 value 268.958046
## final  value 268.932740 
## converged

Look at a particular model fit:

summary(fit5)

## Call:
## multinom(formula = food ~ size + lake + sex, data = Alligator, 
##     weights = count)
## 
## Coefficients:
##         (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford     sexmale
## fish     1.70178892 0.7303298  -0.5752403    0.5503569  -1.23679067  0.60639521
## invert   0.53452560 2.0665999  -2.3557451    1.4635491  -0.08096449  0.14342792
## other   -0.01957203 1.0209285   0.1913919    0.5764707   0.32102428  0.35382356
## reptile -1.15700455 0.1733300   0.5539263    3.0803954   1.82403973 -0.02116283
## 
## Std. Errors:
##         (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford   sexmale
## fish      0.7690506 0.6522820   0.7952018     1.209823    0.8661011 0.6888447
## invert    0.8555494 0.7083550   0.9463415     1.232433    0.8814409 0.7292036
## other     0.9167843 0.7250019   0.9072028     1.374193    0.9589615 0.7623291
## reptile   1.3221528 0.8554960   1.3799947     1.591429    1.3390274 0.9088120
## 
## Residual Deviance: 537.8655 
## AIC: 585.8655

Comparing Models with Drop-in-Deviance Test

fit0$deviance - fitS$deviance # null deviance

## [1] 116.7611

fit1$deviance - fitS$deviance # sex only

## [1] 114.6571

fit2$deviance - fitS$deviance # size only

## [1] 101.6116

fit3$deviance - fitS$deviance # lake only

## [1] 73.56589

fit4$deviance - fitS$deviance # lake + size

## [1] 52.47848

fit5$deviance - fitS$deviance # sex + lake + size

## [1] 50.26368

The small difference between model 4 (dropping sex) and model 5 suggests that the model can be simplified. The drop-in-deviance test is

fit4$deviance-fit5$deviance

## [1] 2.214799

to be compare to \(\chi^2_4\); dropping sex saves one parameter in each of the J-1 non-baseline categories, hence 4 degrees of freedom. Yes, we can simplify.

So, we can drop sex and reproduce the models:

# Saturated model:
fitS <- multinom(food ~ lake * size, data = Alligator, weights = count)

## # weights:  45 (32 variable)
## initial  value 352.466903 
## iter  10 value 266.342868
## iter  20 value 262.457272
## iter  30 value 261.584991
## iter  40 value 261.502078
## iter  50 value 261.500253
## final  value 261.500226 
## converged

# Null model:
fit0 <- multinom(food ~ 1, data = Alligator, weights = count)

## # weights:  10 (4 variable)
## initial  value 352.466903 
## iter  10 value 302.185213
## final  value 302.181462 
## converged

fit1 <- multinom(food ~ size, data = Alligator, weights = count)

## # weights:  15 (8 variable)
## initial  value 352.466903 
## iter  10 value 294.669601
## final  value 294.606678 
## converged

fit2 <- multinom(food ~ lake, data = Alligator, weights = count)

## # weights:  25 (16 variable)
## initial  value 352.466903 
## iter  10 value 283.282977
## iter  20 value 280.584605
## final  value 280.583843 
## converged

fit3 <- multinom(food ~ size + lake, data = Alligator, weights = count)

## # weights:  30 (20 variable)
## initial  value 352.466903 
## iter  10 value 271.161880
## iter  20 value 270.060188
## final  value 270.040139 
## converged

fit0$deviance - fitS$deviance # null

## [1] 81.36247

fit1$deviance - fitS$deviance # size only

## [1] 66.2129

fit2$deviance - fitS$deviance # lake only

## [1] 38.16723

fit3$deviance - fitS$deviance # size + lake

## [1] 17.07983

Think about degrees of freedom for these models. The null model says there are no differences among lakes or sizes, so only one probability distribution over the five classes (fish, invert, other, reptile and bird) is necessary. This requires four parameters, since the probabilities sum to one, and bird as baseline is the category without a separate parameter:

coef(fit0)

##         (Intercept)
## fish      1.9783455
## invert    1.5459246
## other     0.9007867
## reptile   0.3794897

The saturated model has 32 parameters: 4 each for 4 lakes times 2 size classes. Hence the null model has \(32 - 4 = 28\) residual degrees of freedom.

The size only model has two distributions, one for small alligators and one for large alligators (default), hence 8 parameters and 24 residual df:

coef(fit1)

##         (Intercept)  sizesmall
## fish      1.7272335  0.5550764
## invert    0.6931755  1.5039716
## other     0.4855291  0.8494003
## reptile   0.4855171 -0.3033028

Similiarly, the lake only model has four distributions, one for each lake (with Lake George as default), hence uses 16 df and has 16 residual df:

coef(fit2)

##         (Intercept) lakeHancock lakeOklawaha lakeTrafford
## fish      2.3977922  -0.6060193    0.4925008   -1.2191111
## invert    1.8970142  -2.1201047    1.0473497   -0.3929084
## other     0.6930138   0.2625353    0.4055545    0.2233165
## reptile  -1.0983211   0.5875275    3.0441536    1.7915266

Finally, the size + lake model is additive: it uses 20 parameters (either 16 lake parameters to get 4 lake distributions, which are then shifted by 4 size parameters; or 8 size parameters to get 2 size distributions, which are then shifted by 12 lake parameters) and hence has 12 residual df:

coef(fit3)

##         (Intercept) sizesmall lakeHancock lakeOklawaha lakeTrafford
## fish      2.0929906 0.6308607  -0.6951466    0.6529211  -1.08764377
## invert    0.5439613 2.0890800  -2.3535280    1.5901541   0.03435528
## other     0.1887161 0.9624274   0.1310280    0.6585231   0.42873394
## reptile  -1.2215892 0.2796424   0.5476069    3.1118287   1.84765700

Among these non-saturated models, fit3 is the largest model and has the smallest residual deviance (as it must). Is it better than the next best model, fit2? Or could we simplify the model? Drop-in-deviance is

fit2$deviance - fit3$deviance

## [1] 21.08741

on \(16 - 12 = 4\) df, with p-value

1 - pchisq(fit2$deviance - fit3$deviance, df = 4)

## [1] 0.0003042796

so reject the proposed simplification.

Probability calculations

Ex1. What is the probability for a large alligator in LakeGeorge prefers fish?

By hand

# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0


#P = exp(P(food_type)+P(size)+P(lake)) / (1 +   (exp(fish)(food_type) ++ P(size)+P(lake))
#                                                exp(invert)(P(food_type) + P(size)+P(lake)) +
#                                                exp(other) (P(food_type) + P(size)+P(lake))           #                                                exp(reptile)(P(food_type) + P(size)+P(lake))  )  
                                                                                      

P <- (exp(2.0929906 + 0 + 0 )) / (1 + exp( 2.0929906 + 0 + 0 )  
                                    + exp( 0.5439613 + 0 + 0 )
                                    + exp( 0.1887161 + 0 + 0 )
                                    + exp(-1.2215892 + 0 + 0 ))

P

## [1] 0.6574398

By using Model

tmp <- c(0, coef(fit3)[, 1]) # 0 for bird and Lake George coefficients.
exp(tmp) / sum(exp(tmp))

##                  fish     invert      other    reptile 
## 0.08107402 0.65743977 0.13967578 0.09791293 0.02389749

By R

predict(fit3,
        type = "probs",
        newdata = data.frame(size = "large", lake = "George"))

##       bird       fish     invert      other    reptile 
## 0.08107402 0.65743977 0.13967578 0.09791293 0.02389749

Ex2. If you are a small alligator, what is your food preference relative to a large alligator on Lake Hancock?

1: Probability of Small allgitor for LakeHancock:

By hand for Reptile preference

# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0


# P = exp(P(food_type)+P(size)+P(lake)) / (1 +   (exp(fish)(food_type) ++ P(size)+P(lake))
#                                                exp(invert)(P(food_type) + P(size)+P(lake)) +
#                                                exp(other) (P(food_type) + P(size)+P(lake))          #                                                exp(reptile)(P(food_type) + P(size)+P(lake))  )  
                                                                                      

P <- (exp(-1.2215892 + 0.2796424  + 0.5476069)) / (1
                                                   + exp(-1.2215892 + 0.2796424  + 0.5476069)  
                                                   + exp( 2.0929906 + 0.6308607  - 0.6951466)
                                                   + exp( 0.5439613 + 2.0890800  - 2.3535280)
                                                   + exp(0.1887161  + 0.9624274  + 0.1310280))
#Reptile preference
P

## [1] 0.04745587

By using Model

tmp <- c(0, coef(fit3)[, 1] + coef(fit3)[, 2] +coef(fit3)[, 3])
exp(tmp) / sum(exp(tmp))

##                  fish     invert      other    reptile 
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587

By R

predict(fit3,
        type = "probs",
        newdata = data.frame(size = "small", lake = "Hancock"))

##       bird       fish     invert      other    reptile 
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587

2: Probability of Large allgitor for LakeHancock

By hand for Reptile preference

# Note that bird:0 (baseline), fish: 2.0929, large:0, lakeGeorge:0


# P = exp(P(food_type)+P(size)+P(lake)) / (1 +   (exp(fish)(food_type) ++ P(size)+P(lake))
#                                                exp(invert)(P(food_type) + P(size)+P(lake)) +
#                                                exp(other) (P(food_type) + P(size)+P(lake))          #                                                exp(reptile)(P(food_type) + P(size)+P(lake))  )  
                                                                                      

P <- (exp(-1.2215892 + 0.2796424  + 0.5476069)) / (1
                                                   + exp(-1.2215892 + 0.2796424  + 0.5476069)  
                                                   + exp( 2.0929906 + 0.6308607  - 0.6951466)
                                                   + exp( 0.5439613 + 2.0890800  - 2.3535280)
                                                   + exp(0.1887161  + 0.9624274  + 0.1310280))
#Reptile preference
P

## [1] 0.04745587

By using Model

tmp <- c(0, coef(fit3)[, 1] + coef(fit3)[, 2] +coef(fit3)[, 3])
exp(tmp) / sum(exp(tmp))

##                  fish     invert      other    reptile 
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587

By R

predict(fit3,
        type = "probs",
        newdata = data.frame(size = "small", lake = "Hancock"))

##       bird       fish     invert      other    reptile 
## 0.07039626 0.53530940 0.09309807 0.25374039 0.04745587

References

CSU Course notes- Generalized Regression Model - STAA 552
https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/
https://www.theanalysisfactor.com/logistic-regression-models-for-multinomial-and-ordinal-variables/

************************

Generalized Linear Models 3

Mustafa Arslan

11/2/2021