Multinomial logistic regression in 2 Methods

Method 1 - mlogit

URL: https://www.youtube.com/watch?v=-Cp_KP9mq94&t=15s

Users picked one of the following as their preferred method of fishing and provided other stats such as income:

  • Beach
  • Pier
  • Charter Boat
  • Private Boat
multinomial_fishing1 <- read.csv("~/CSPS work/621 - Analytics and Data Mining/multinomial_fishing1.csv", header=TRUE)
#View(multinomial_fishing1)

attach(multinomial_fishing1)

Describe the data

table(mode)
## mode
##   beach charter    pier private 
##     134     452     178     418

Reshape the data from Long to Wide so each person has 4 lines verses only 1; True would be their choice while also generating three false entries for the choices they did not chose.

Using this, in the conditional logic model since price and catch as they vary across all the options.

mldata <- mlogit.data(multinomial_fishing1, varying = 4:15, choice = "mode", shape = "wide")
mldata[1:20,]
## ~~~~~~~
##  first 10 observations out of 20 
## ~~~~~~~
##     mode   price catchrate   income mode1     alt d  catch chid    idx
## 1  FALSE 157.930    0.5391 7.083332     4   beach 0 0.0678    1 1:each
## 2   TRUE 182.930    0.5391 7.083332     4 charter 1 0.5391    1 1:rter
## 3  FALSE 157.930    0.5391 7.083332     4    pier 0 0.0503    1 1:pier
## 4  FALSE 157.930    0.5391 7.083332     4 private 0 0.2601    1 1:vate
## 5  FALSE  15.114    0.4671 1.250000     4   beach 0 0.1049    2 2:each
## 6   TRUE  34.534    0.4671 1.250000     4 charter 1 0.4671    2 2:rter
## 7  FALSE  15.114    0.4671 1.250000     4    pier 0 0.0451    2 2:pier
## 8  FALSE  10.534    0.4671 1.250000     4 private 0 0.1574    2 2:vate
## 9  FALSE 161.874    0.2413 3.750000     3   beach 0 0.5333    3 3:each
## 10 FALSE  59.334    0.2413 3.750000     3 charter 0 1.0266    3 3:rter
## 
## ~~~ indexes ~~~~
##    chid     alt
## 1     1   beach
## 2     1 charter
## 3     1    pier
## 4     1 private
## 5     2   beach
## 6     2 charter
## 7     2    pier
## 8     2 private
## 9     3   beach
## 10    3 charter
## indexes:  1, 2

Multinomial logit model coefficients

Mode is the dependent variable and we will not have any variable that varies with the alternative so we put 1 in the constant. Income is the variable which does not vary with the alternative. Reference level is Charter creating the base variable.

mlogit.model1 <- mlogit(mode ~ 1 | income, data=mldata, reflevel="charter")
summary(mlogit.model1)
## 
## Call:
## mlogit(formula = mode ~ 1 | income, data = mldata, reflevel = "charter", 
##     method = "nr")
## 
## Frequencies of alternatives:choice
## charter   beach    pier private 
## 0.38240 0.11337 0.15059 0.35364 
## 
## nr method
## 4 iterations, 0h:0m:0s 
## g'(-H)^-1g = 8.32E-07 
## gradient close to zero 
## 
## Coefficients :
##                      Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):beach   -1.341291   0.194517 -6.8955 5.367e-12 ***
## (Intercept):pier    -0.527141   0.177784 -2.9651  0.003026 ** 
## (Intercept):private -0.602371   0.136096 -4.4261 9.597e-06 ***
## income:beach         0.031640   0.041846  0.7561  0.449591    
## income:pier         -0.111763   0.043979 -2.5413  0.011046 *  
## income:private       0.123546   0.027911  4.4265 9.577e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1477.2
## McFadden R^2:  0.013736 
## Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)

Since charter is the base variable, it is not displayed in the coefficients results (normalized to zero). The others are in comparison to charter.

Analysis: As income increases,

  • it does not have a significant effect on the choice of beach.
  • it has a negative effect on the choice of Pier.
  • it has a significant effect on the choice of private

Multinomial logit model coefficients (with different base outcome)

Changing the reference to Pier, how does that change the coefficients? It was the only value with a negative effect earlier.

mlogit.model2 <- mlogit(mode ~ 1 | income, data = mldata, reflevel="pier")
summary(mlogit.model2)
## 
## Call:
## mlogit(formula = mode ~ 1 | income, data = mldata, reflevel = "pier", 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##    pier   beach charter private 
## 0.15059 0.11337 0.38240 0.35364 
## 
## nr method
## 4 iterations, 0h:0m:0s 
## g'(-H)^-1g = 8.32E-07 
## gradient close to zero 
## 
## Coefficients :
##                      Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):beach   -0.814150   0.228632 -3.5610 0.0003695 ***
## (Intercept):charter  0.527141   0.177784  2.9651 0.0030262 ** 
## (Intercept):private -0.075229   0.183240 -0.4106 0.6814007    
## income:beach         0.143403   0.053288  2.6911 0.0071223 ** 
## income:charter       0.111763   0.043979  2.5413 0.0110455 *  
## income:private       0.235309   0.043668  5.3886 7.101e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1477.2
## McFadden R^2:  0.013736 
## Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)

Analysis: As income increases,

  • Beach becomes more likely.
  • Charter becomes more likely
  • Private becomes more likely

The Coefficients do not have any correspondence with the other model in magnitude or sign.

Multinomial logit model odds ratios

Taking the exponents of the Multinomial logit model to create the odds ratio.

exp(coef(mlogit.model1))
##   (Intercept):beach    (Intercept):pier (Intercept):private 
##           0.2615077           0.5902901           0.5475121 
##        income:beach         income:pier      income:private 
##           1.0321457           0.8942561           1.1315023

An odds ratio of over one make it more likely than the other outcomes and likewise less than one makes the outcome less likely.

As income increases,

  • Beach becomes more likely.
  • Charter becomes more likely
  • Pier becomes less likely

Conditional logit model

In the conditional logit model, we have 2 or more variables in as alternative specific variables.

clogit.model1 <- mlogit(mode ~ price+catch |income, data = mldata, reflevel="charter")
summary(clogit.model1)
## 
## Call:
## mlogit(formula = mode ~ price + catch | income, data = mldata, 
##     reflevel = "charter", method = "nr")
## 
## Frequencies of alternatives:choice
## charter   beach    pier private 
## 0.38240 0.11337 0.15059 0.35364 
## 
## nr method
## 7 iterations, 0h:0m:0s 
## g'(-H)^-1g = 1.37E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##                       Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept):beach   -1.6943657  0.2240506  -7.5624 3.952e-14 ***
## (Intercept):pier    -0.9164063  0.2072648  -4.4214 9.805e-06 ***
## (Intercept):private -1.1670869  0.1590475  -7.3380 2.169e-13 ***
## price               -0.0251166  0.0017317 -14.5042 < 2.2e-16 ***
## catch                0.3577820  0.1097733   3.2593  0.001117 ** 
## income:beach         0.0332917  0.0503409   0.6613  0.508403    
## income:pier         -0.0942854  0.0500600  -1.8834  0.059640 .  
## income:private       0.1227315  0.0286306   4.2867 1.813e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1215.1
## McFadden R^2:  0.18868 
## Likelihood ratio test : chisq = 565.17 (p.value = < 2.22e-16)

Notice we only have one set of coefficients for price or catch but we have multiple sets for income.

Analysis:

  • As the Price increases, the option becomes less likely to be chosen
  • As the Catch increases, the option becomes more likely to be chosen

In comparison to Charter, when incomes increases: * Beach becomes more likely * Pier becomes less likely * Private becomes more likely

Again, running the model with reference level of Pier base category

clogit.model2 <- mlogit(mode ~ price+catch | income, data = mldata, reflevel="pier")
summary(clogit.model2)
## 
## Call:
## mlogit(formula = mode ~ price + catch | income, data = mldata, 
##     reflevel = "pier", method = "nr")
## 
## Frequencies of alternatives:choice
##    pier   beach charter private 
## 0.15059 0.11337 0.38240 0.35364 
## 
## nr method
## 7 iterations, 0h:0m:0s 
## g'(-H)^-1g = 1.37E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##                       Estimate Std. Error  z-value  Pr(>|z|)    
## (Intercept):beach   -0.7779594  0.2204939  -3.5283 0.0004183 ***
## (Intercept):charter  0.9164063  0.2072648   4.4214 9.805e-06 ***
## (Intercept):private -0.2506806  0.2039395  -1.2292 0.2190004    
## price               -0.0251166  0.0017317 -14.5042 < 2.2e-16 ***
## catch                0.3577820  0.1097733   3.2593 0.0011170 ** 
## income:beach         0.1275771  0.0506395   2.5193 0.0117582 *  
## income:charter       0.0942854  0.0500600   1.8834 0.0596396 .  
## income:private       0.2170169  0.0500582   4.3353 1.456e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1215.1
## McFadden R^2:  0.18868 
## Likelihood ratio test : chisq = 565.17 (p.value = < 2.22e-16)

Analysis:

  • As the Price increases, the option becomes less likely to be chosen
  • As the Catch increases, the option becomes more likely to be chosen

In comparison to Pier, when incomes increases: * Beach becomes more likely * Pier becomes more likely * Private becomes more likely

Setting mean values for variables to use for marginal effects

Mean for the income, Price and catch across all variables, using the reference level of Charter.

m <- mlogit(mode ~ price+catch |income, data = mldata, reflevel="charter")
z <- with(mldata, data.frame(price = tapply(price, index(m)$alt, mean), 
                             catch = tapply(catch, index(m)$alt, mean), income = mean(income)))

We use the mean values to calculate the marginal effects.

Logit Model Marginal Effects

Multinomial logit model marginal effects

The data becomes the z data from earlier only presenting the means.

effects(mlogit.model1, covariate = "income", data = z)
##       charter         beach          pier       private 
## -1.201367e-02  7.495845e-05 -2.065980e-02  3.259851e-02

Conditional logit model marginal effects

effects(clogit.model1, covariate = "price", data = z)
##               charter         beach          pier       private
## charter -0.0062430047  6.091542e-04  7.642235e-04  0.0048696270
## beach    0.0006091541 -1.249124e-03  8.681094e-05  0.0005531588
## pier     0.0007642234  8.681094e-05 -1.545008e-03  0.0006939736
## private  0.0048696270  5.531588e-04  6.939737e-04 -0.0061167595
effects(clogit.model1, covariate = "catch", data = z)
##              charter        beach         pier      private
## charter  0.088930726 -0.008677316 -0.010886256 -0.069367154
## beach   -0.008677329  0.017793621 -0.001236612 -0.007879681
## pier    -0.010886272 -0.001236612  0.022008455 -0.009885571
## private -0.069367164 -0.007879671 -0.009885559  0.087132394

Notice across the marginal effects for Price, we have negative values. This suggests as the price increases, each variable becomes less likely with Pier becoming the most unlikely to be chosen.

If catch increases for a given option, the likelihood of of it being chosen increases.

Multinomial probit model coefficients

#mprobit.model1 <- mlogit(mode ~ 1 | income, data = mldata, reflevel="charter", probit=TRUE)
#summary(mprobit.model1)

Hauseman-McFadden test of independence of irrelevant alternatives

First, we estimate the multinominal logit model using a single reference level, in this case Beach and income being our independent variable. Second, we estimate the same model were we have the alternative subset of three of the four options, dropping Charter

m1<- mlogit(mode ~ 1 | income, data = mldata, reflevel="beach")
m2<- mlogit(mode ~ 1 | income, data = mldata, reflevel="beach", alt.subset=c("beach", "pier", "private"))
hmftest(m1, m2)
## 
##  Hausman-McFadden test
## 
## data:  mldata
## chisq = 14.701, df = 4, p-value = 0.005363
## alternative hypothesis: IIA is rejected

P values of less than 0.05 so the IIA is rejected.

Method 2 - NNET

source: https://stats.idre.ucla.edu/r/dae/multinomial-logistic-regression/

HSB Demo data set contains variables on 200 students where the output variable is prog (program type).

The Predictor variables are:

Understanding the data

hsbdemo <- read.csv("~/CSPS work/621 - Analytics and Data Mining/hsbdemo.csv", header = TRUE)

with(hsbdemo, table(ses,prog))
##         prog
## ses      academic general vocation
##   high         42       9        7
##   low          19      16       12
##   middle       44      20       31
with(hsbdemo, do.call(rbind, tapply(write, prog, function(x) c(M=mean(x), SD = sd(x)))))
##                 M       SD
## academic 56.25714 7.943343
## general  51.33333 9.397775
## vocation 46.76000 9.318754

Multinomial logit model coefficients

This method uses the multinom function from the nnet package. multinom does not require the data to be reshaped as with the mlogit package.

library (nnet)
## Warning: package 'nnet' was built under R version 3.6.3
hsbdemo$prog2 <- relevel(hsbdemo$prog, ref = "academic")

model1 <- multinom(prog2 ~ ses+write, data = hsbdemo)
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 179.983731
## final  value 179.981726 
## converged

Running the model produces a negative log-likelihood of 179.98. When this value is multiplied by 2, its seen in the residual Deviance and can be used when comparing nested models.

The model summary output has a block of coefficients and a block of standard errors. Each of these blocks has one row of values corresponding to a model equation. Focusing on the block of coefficients, we can look at the first row comparing prog = “general” to our baseline prog = “academic” and the second row comparing prog = “vocation” to our baseline prog = “academic”.

summary(model1)
## Call:
## multinom(formula = prog2 ~ ses + write, data = hsbdemo)
## 
## Coefficients:
##          (Intercept)    seslow sesmiddle       write
## general     1.689478 1.1628411 0.6295638 -0.05793086
## vocation    4.235574 0.9827182 1.2740985 -0.11360389
## 
## Std. Errors:
##          (Intercept)    seslow sesmiddle      write
## general     1.226939 0.5142211 0.4650289 0.02141101
## vocation    1.204690 0.5955688 0.5111119 0.02222000
## 
## Residual Deviance: 359.9635 
## AIC: 375.9635
z <- summary(model1)$coefficients/summary(model1)$standard.errors
z
##          (Intercept)   seslow sesmiddle     write
## general     1.376987 2.261364  1.353816 -2.705658
## vocation    3.515904 1.650050  2.492798 -5.112687
# creating the 2 tailed Z test
p <- (1-pnorm(abs(z),0,1))*2
p
##           (Intercept)     seslow sesmiddle        write
## general  0.1685163893 0.02373673 0.1757949 6.816914e-03
## vocation 0.0004382601 0.09893276 0.0126741 3.176088e-07

Analysis: * A one unit increase in the write variable is associated with a decrease in the likelihood of: * Being in the General vs Academic Program by .058 * Being in the Vocation vs Academic Program by .1136 * Moving from SES Low to SES High the log odds of being in the in the Vocation vs Academic Program increase by 1.162 * Moving from SES Middle to SES High the log odds of being in the in the Vocation vs Academic Program increase by .6295

Relative Risk

The Relative risk is the right-hand side linear equation exponentiated, leading to the fact that the exponentiated regression coefficients are relative risk ratios for a unit change in the predictor variable. We can exponentiate the coefficients from our model to see these risk ratios.

exp(coef(model1))
##          (Intercept)   seslow sesmiddle     write
## general     5.416653 3.199009  1.876792 0.9437152
## vocation   69.101326 2.671709  3.575477 0.8926115
  • The Relative risk ratio for a one unit increase being in the general vs academic program in the variable write is .9437
  • the Relative risk ratio switching from SES 2 to 3 is 3.575 for being in the vocation vs academic program.

Predicted Probabilities

Using the fitted function, we can calculate the predicted probabilities for each outcome level.

head(pp <- fitted(model1))
##    academic   general   vocation
## 1 0.2777676 0.3762093 0.34602312
## 2 0.2331247 0.2204013 0.54647393
## 3 0.6596815 0.2646928 0.07562571
## 4 0.3865745 0.3698511 0.24357436
## 5 0.2141360 0.3656570 0.42020707
## 6 0.2291956 0.3693443 0.40146013

Changes in Predicted Probabilities

We want to examine the changes in predicted probability associated with one of our two variables, we create a small dataset varying one variable while holding the other constant, in this case write. Holding Write at it’s mean, we can examine the predicted probabilities for each level of SES

dses <- data.frame(ses = c("low", "middle", "high"), write = mean(hsbdemo$write))
predict(model1, newdata = dses, "probs")
##    academic   general  vocation
## 1 0.4396813 0.3581915 0.2021272
## 2 0.4777451 0.2283359 0.2939190
## 3 0.7009046 0.1784928 0.1206026

We can also understand the model using the predicted probabilities to look at the at the averaged probabilities for different continuous predictor values, in this case different values of write within each level of SES.

dwrite <- data.frame(ses = rep(c("low", "middle", "high"), each = 41), write = rep(c(30:70),
    3))

## store the predicted probabilities for each value of ses and write
pp.write <- cbind(dwrite, predict(model1, newdata = dwrite, type = "probs", se = TRUE))

## calculate the mean probabilities within each level of ses
by(pp.write[, 3:5], pp.write$ses, colMeans)
## pp.write$ses: high
##  academic   general  vocation 
## 0.6164348 0.1808049 0.2027603 
## -------------------------------------------------------- 
## pp.write$ses: low
##  academic   general  vocation 
## 0.3972955 0.3278180 0.2748864 
## -------------------------------------------------------- 
## pp.write$ses: middle
##  academic   general  vocation 
## 0.4256172 0.2010877 0.3732951

Lastly, we are a visual minded people and plotting the data conveys the information cleanly. Using the predictions generated so far, we plot the predicted probabilities against the write score by the level of SES for different levels of the outcome variable.

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
## melt data set to long for ggplot2
lpp <- melt(pp.write, id.vars = c("ses", "write"), value.name = "probability")
head(lpp)  # view first few rows
##   ses write variable probability
## 1 low    30 academic  0.09843258
## 2 low    31 academic  0.10716517
## 3 low    32 academic  0.11650018
## 4 low    33 academic  0.12645441
## 5 low    34 academic  0.13704163
## 6 low    35 academic  0.14827211
## plot predicted probabilities across write values for each level of ses
## facetted by program type
ggplot(lpp, aes(x = write, y = probability, colour = ses)) + geom_line() + facet_grid(variable ~
    ., scales = "free")