R Example: Choice of Programs

Prepare and review the Data

# Start the example by importing the data into R
library(haven)
hsbdemo <- read_sav("hsbdemo.sav")
hsb <- hsbdemo
summary(hsb)
##        id             female           ses            schtyp    
##  Min.   :  1.00   Min.   :0.000   Min.   :1.000   Min.   :1.00  
##  1st Qu.: 50.75   1st Qu.:0.000   1st Qu.:2.000   1st Qu.:1.00  
##  Median :100.50   Median :1.000   Median :2.000   Median :1.00  
##  Mean   :100.50   Mean   :0.545   Mean   :2.055   Mean   :1.16  
##  3rd Qu.:150.25   3rd Qu.:1.000   3rd Qu.:3.000   3rd Qu.:1.00  
##  Max.   :200.00   Max.   :1.000   Max.   :3.000   Max.   :2.00  
##       prog            read           write            math      
##  Min.   :1.000   Min.   :28.00   Min.   :31.00   Min.   :33.00  
##  1st Qu.:2.000   1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00  
##  Median :2.000   Median :50.00   Median :54.00   Median :52.00  
##  Mean   :2.025   Mean   :52.23   Mean   :52.77   Mean   :52.65  
##  3rd Qu.:2.250   3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00  
##  Max.   :3.000   Max.   :76.00   Max.   :67.00   Max.   :75.00  
##     science          socst           honors          awards    
##  Min.   :26.00   Min.   :26.00   Min.   :0.000   Min.   :0.00  
##  1st Qu.:44.00   1st Qu.:46.00   1st Qu.:0.000   1st Qu.:0.00  
##  Median :53.00   Median :52.00   Median :0.000   Median :1.00  
##  Mean   :51.85   Mean   :52.41   Mean   :0.265   Mean   :1.67  
##  3rd Qu.:58.00   3rd Qu.:61.00   3rd Qu.:1.000   3rd Qu.:2.00  
##  Max.   :74.00   Max.   :71.00   Max.   :1.000   Max.   :7.00  
##       cid       
##  Min.   : 1.00  
##  1st Qu.: 5.00  
##  Median :10.50  
##  Mean   :10.43  
##  3rd Qu.:15.00  
##  Max.   :20.00
# Load the jmv package for frequency tables
library(jmv)
## 
## Attaching package: 'jmv'
## The following object is masked from 'package:stats':
## 
##     anova
# Use the descriptives function to get the descriptive statistics
descriptives(hsb, vars = vars(ses, prog, math, science), freq = TRUE)
## 
##  DESCRIPTIVES
## 
##  Descriptives                                   
##  ────────────────────────────────────────────── 
##               ses     prog    math    science   
##  ────────────────────────────────────────────── 
##    N           200     200     200        200   
##    Missing       0       0       0          0   
##    Mean       2.06    2.02    52.6       51.9   
##    Median     2.00    2.00    52.0       53.0   
##    Minimum    1.00    1.00    33.0       26.0   
##    Maximum    3.00    3.00    75.0       74.0   
##  ────────────────────────────────────────────── 
## 
## 
##  FREQUENCIES
## 
##  Frequencies of ses                                 
##  ────────────────────────────────────────────────── 
##    Levels    Counts    % of Total    Cumulative %   
##  ────────────────────────────────────────────────── 
##    1             47          23.5            23.5   
##    2             95          47.5            71.0   
##    3             58          29.0           100.0   
##  ────────────────────────────────────────────────── 
## 
## 
##  Frequencies of prog                                
##  ────────────────────────────────────────────────── 
##    Levels    Counts    % of Total    Cumulative %   
##  ────────────────────────────────────────────────── 
##    1             45          22.5            22.5   
##    2            105          52.5            75.0   
##    3             50          25.0           100.0   
##  ──────────────────────────────────────────────────
# To see the crosstable, we need the CrossTable function from the gmodels package
library(gmodels)
# Build a crosstable between ses and prog
CrossTable(hsb$ses, hsb$prog)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  200 
## 
##  
##              | hsb$prog 
##      hsb$ses |         1 |         2 |         3 | Row Total | 
## -------------|-----------|-----------|-----------|-----------|
##            1 |        16 |        19 |        12 |        47 | 
##              |     2.783 |     1.305 |     0.005 |           | 
##              |     0.340 |     0.404 |     0.255 |     0.235 | 
##              |     0.356 |     0.181 |     0.240 |           | 
##              |     0.080 |     0.095 |     0.060 |           | 
## -------------|-----------|-----------|-----------|-----------|
##            2 |        20 |        44 |        31 |        95 | 
##              |     0.088 |     0.692 |     2.213 |           | 
##              |     0.211 |     0.463 |     0.326 |     0.475 | 
##              |     0.444 |     0.419 |     0.620 |           | 
##              |     0.100 |     0.220 |     0.155 |           | 
## -------------|-----------|-----------|-----------|-----------|
##            3 |         9 |        42 |         7 |        58 | 
##              |     1.257 |     4.381 |     3.879 |           | 
##              |     0.155 |     0.724 |     0.121 |     0.290 | 
##              |     0.200 |     0.400 |     0.140 |           | 
##              |     0.045 |     0.210 |     0.035 |           | 
## -------------|-----------|-----------|-----------|-----------|
## Column Total |        45 |       105 |        50 |       200 | 
##              |     0.225 |     0.525 |     0.250 |           | 
## -------------|-----------|-----------|-----------|-----------|
## 
## 

Below we use the multinom function from the nnet package to estimate a multinomial logistic regression model. Other R packages also provide functions for multinomial regression. We chose multinom because it does not require the data to be reshaped (as the mlogit package does) and because it mirrors the example code found in Hilbe's Logistic Regression Models.

First, we need to choose the level of our outcome to use as the baseline and specify this in the relevel function. Then, we run our model using multinom. The multinom function does not report p-values for the regression coefficients, so we calculate them ourselves using Wald tests (here, z-tests).

Run the multinomial model using the nnet package

# Load the nnet package, which provides the multinom function
library(nnet)
# Since we are going to use academic (prog = 2) as the reference group, we need to relevel the outcome.
hsb$prog2 <- relevel(as.factor(hsb$prog), ref = 2)
hsb$ses <- as.factor(hsb$ses)
levels(hsb$prog2)
## [1] "2" "1" "3"
# Assign labels to the levels
levels(hsb$prog2) <- c("academic","general","vocational")
# Run a "only intercept" model
OIM <- multinom(prog2 ~ 1, data = hsb)
## # weights:  6 (2 variable)
## initial  value 219.722458 
## final  value 204.096674 
## converged
summary(OIM)
## Call:
## multinom(formula = prog2 ~ 1, data = hsb)
## 
## Coefficients:
##            (Intercept)
## general     -0.8472980
## vocational  -0.7419374
## 
## Std. Errors:
##            (Intercept)
## general      0.1781742
## vocational   0.1718249
## 
## Residual Deviance: 408.1933 
## AIC: 412.1933
# Fit the full multinomial model
test <- multinom(prog2 ~ ses + math + science + math*science, data = hsb)
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 173.831002
## iter  20 value 167.382760
## final  value 166.951813 
## converged
summary(test)
## Call:
## multinom(formula = prog2 ~ ses + math + science + math * science, 
##     data = hsb)
## 
## Coefficients:
##            (Intercept)       ses2       ses3       math     science
## general       5.897618 -0.4081497 -1.1254491 -0.1852220  0.01323626
## vocational   22.728283  0.8402168 -0.5605656 -0.5036705 -0.28297703
##            math:science
## general     0.001025283
## vocational  0.006185571
## 
## Std. Errors:
##            (Intercept)      ses2      ses3       math    science
## general    0.002304064 0.2613732 0.2134308 0.02694593 0.02953364
## vocational 0.003856861 0.2959741 0.1984775 0.02681947 0.03142872
##            math:science
## general    0.0004761369
## vocational 0.0004760567
## 
## Residual Deviance: 333.9036 
## AIC: 357.9036
# Check the Wald z statistics for the model (coefficient / standard error)
z <- summary(test)$coefficients/summary(test)$standard.errors
z
##            (Intercept)      ses2      ses3       math    science
## general       2559.659 -1.561559 -5.273132  -6.873837  0.4481759
## vocational    5892.948  2.838819 -2.824328 -18.780028 -9.0037711
##            math:science
## general        2.153336
## vocational    12.993348
# 2-tailed z test
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
##            (Intercept)        ses2         ses3         math   science
## general              0 0.118391942 1.341147e-07 6.249667e-12 0.6540263
## vocational           0 0.004528089 4.737981e-03 0.000000e+00 0.0000000
##            math:science
## general      0.03129229
## vocational   0.00000000

These are the logit coefficients relative to the reference category. For example, the -0.185 under 'math' suggests that for a one-unit increase in math score, the log odds of being in the general program relative to the academic program decrease by 0.185 (because the model includes a math × science interaction, this holds exactly when science = 0).
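As a quick check, here is a minimal sketch that reconstructs this general-vs-academic logit by hand and compares it with the model's fitted log odds; the student profile (ses = 1, math = 50, science = 50) is a hypothetical choice for illustration:

# Rebuild the general-vs-academic logit by hand for one hypothetical student
b <- summary(test)$coefficients["general", ]
eta <- b["(Intercept)"] + b["math"] * 50 + b["science"] * 50 +
  b["math:science"] * 50 * 50   # ses = 1 is the reference, so no ses term
newdat <- data.frame(ses = factor(1, levels = 1:3), math = 50, science = 50)
pr <- predict(test, newdata = newdat, type = "probs")
unname(eta)                          # hand-built logit
log(pr["general"] / pr["academic"])  # fitted log odds; should match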

Check the model fit information

# jmv masks stats::anova, so we detach jmv before calling the anova function.
detach("package:jmv", unload=TRUE)
# Compare our test model with the intercept-only model
anova(OIM,test)
## Likelihood ratio tests of Multinomial Models
## 
## Response: prog2
##                                   Model Resid. df Resid. Dev   Test    Df
## 1                                     1       398   408.1933             
## 2 ses + math + science + math * science       388   333.9036 1 vs 2    10
##   LR stat.      Pr(Chi)
## 1                      
## 2 74.28972 6.539991e-12

Interpretation of the Model Fit information

  • The log-likelihood is a measure of how much unexplained variability remains in the data; therefore, the change in log-likelihood between two models indicates how much additional variability the larger model explains.

  • The chi-square test assesses the drop in deviance from the baseline model (408.1933) to the final model (333.9036), a difference of 408.1933 - 333.9036 = 74.29 (reproduced by hand in the sketch after this list). This change is significant, which means that our final model explains a significant amount of the original variability.

  • The likelihood ratio chi-square of 74.29 with a p-value < 0.001 tells us that our model as a whole fits significantly better than an empty or null model (i.e., a model with no predictors).
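To make the arithmetic concrete, here is a minimal sketch that reproduces the likelihood ratio statistic and its p-value from the two models fitted above (df = 10 is the difference in the number of estimated parameters, 12 - 2):

# Reproduce the likelihood ratio test by hand from the two model deviances
lr <- OIM$deviance - test$deviance   # 408.1933 - 333.9036 = 74.29
lr
pchisq(lr, df = 10, lower.tail = FALSE)   # matches Pr(Chi) in the anova output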

Goodness of fit

# Check the predicted probabilities for each program (first 30 cases)
head(test$fitted.values,30)
##      academic    general vocational
## 1  0.18801940 0.17122451  0.6407561
## 2  0.12019189 0.10715542  0.7726527
## 3  0.52212681 0.08123771  0.3966355
## 4  0.23683979 0.23125435  0.5319059
## 5  0.10132130 0.12329032  0.7753884
## 6  0.38079544 0.10780793  0.5113966
## 7  0.32321815 0.16454057  0.5122413
## 8  0.09033932 0.08381233  0.8258484
## 9  0.02336687 0.09050704  0.8861261
## 10 0.32321815 0.16454057  0.5122413
## 11 0.16304678 0.29839918  0.5385540
## 12 0.22842326 0.27539161  0.4961851
## 13 0.32747927 0.28141483  0.3911059
## 14 0.05717483 0.12540921  0.8174160
## 15 0.15003741 0.52649953  0.3234631
## 16 0.12638004 0.20495962  0.6686603
## 17 0.25269654 0.39841475  0.3488887
## 18 0.05771613 0.08029142  0.8619924
## 19 0.27404420 0.16131436  0.5646414
## 20 0.25197679 0.20299587  0.5450273
## 21 0.15870561 0.17100945  0.6702849
## 22 0.27404420 0.16131436  0.5646414
## 23 0.16304678 0.29839918  0.5385540
## 24 0.16340250 0.31572103  0.5208765
## 25 0.26080538 0.36665885  0.3725358
## 26 0.74715288 0.12007209  0.1327750
## 27 0.33135572 0.26860688  0.4000374
## 28 0.32321815 0.16454057  0.5122413
## 29 0.40025162 0.32553566  0.2742127
## 30 0.19518342 0.30892507  0.4958915
# We can get the predicted class for each case with the predict function
head(predict(test),30)
##  [1] vocational vocational academic   vocational vocational vocational
##  [7] vocational vocational vocational vocational vocational vocational
## [13] vocational vocational general    vocational general    vocational
## [19] vocational vocational vocational vocational vocational vocational
## [25] vocational academic   vocational vocational academic   vocational
## Levels: academic general vocational
# A rough goodness-of-fit check: test the association between observed and predicted programs
chisq.test(hsb$prog2,predict(test))
## Warning in chisq.test(hsb$prog2, predict(test)): Chi-squared approximation
## may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  hsb$prog2 and predict(test)
## X-squared = 47.841, df = 4, p-value = 1.019e-09

Calculate the Pseudo R-Square

# Load the DescTools package to calculate pseudo R-squared values
library("DescTools")
## Registered S3 method overwritten by 'DescTools':
##   method         from 
##   reorder.factor gdata
# Calculate the pseudo R-squared values
PseudoR2(test, which = c("CoxSnell","Nagelkerke","McFadden"))
##   CoxSnell Nagelkerke   McFadden 
##  0.3102655  0.3565873  0.1819964

Interpretation of the R-Square

  • These are three pseudo R-squared values. Logistic regression does not have an equivalent to the R-squared found in OLS regression; however, many people have tried to come up with one. These statistics do not mean exactly what R-squared means in OLS regression (the proportion of variance of the response variable explained by the predictors), so we suggest interpreting them with great caution.

  • Cox and Snell's R-squared imitates the multiple R-squared based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. Here it suggests a relationship of about 31% between the dependent variable and the predictors; put differently, about 31% of the variation in the dependent variable is accounted for by the model.

  • The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke's R-squared will normally be higher than the Cox and Snell measure. In our case it is 0.357, indicating a relationship of 35.7% between the predictors and the prediction.

  • McFadden's R-squared = {LL(null) − LL(full)} / LL(null), i.e., 1 − LL(full)/LL(null). In our case it is 0.182, indicating a relationship of 18.2% between the predictors and the prediction. (The sketch below reproduces all three values by hand.)
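A minimal sketch reproducing all three values from the log-likelihoods of the intercept-only model (OIM) and the full model (test):

# Reproduce the pseudo R-squared values by hand
ll0 <- as.numeric(logLik(OIM))    # log-likelihood of the intercept-only model
ll1 <- as.numeric(logLik(test))   # log-likelihood of the full model
n <- nrow(hsb)
1 - exp(2 * (ll0 - ll1) / n)                              # Cox & Snell: 0.310
(1 - exp(2 * (ll0 - ll1) / n)) / (1 - exp(2 * ll0 / n))   # Nagelkerke: 0.357
1 - ll1 / ll0                                             # McFadden: 0.182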

Likelihood Ratio Tests

# Use the lmtest package to run Likelihood Ratio Tests
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
lrtest(test, "ses") #Chi-Square=12.922,p=0.01166*
## # weights:  15 (8 variable)
## initial  value 219.722458 
## iter  10 value 176.645309
## iter  20 value 173.670728
## iter  30 value 173.430200
## final  value 173.412997 
## converged
## Likelihood ratio test
## 
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ math + science + math:science
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  12 -166.95                       
## 2   8 -173.41 -4 12.922    0.01166 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(test, "math") # Chi-Square=10.613,p=0.004959*
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 172.387155
## final  value 172.258318 
## converged
## Likelihood ratio test
## 
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + science + math:science
##   #Df  LogLik Df  Chisq Pr(>Chisq)   
## 1  12 -166.95                        
## 2  10 -172.26 -2 10.613   0.004959 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(test, "science") # Chi-Squre=5.3874,p=0.06763  
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 169.972735
## final  value 169.645522 
## converged
## Likelihood ratio test
## 
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + math + math:science
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  12 -166.95                       
## 2  10 -169.65 -2 5.3874    0.06763 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(test, "math:science") # Chi-Square=5.249,p=0.072
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 170.088834
## final  value 169.576379 
## converged
## Likelihood ratio test
## 
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + math + science
##   #Df  LogLik Df  Chisq Pr(>Chisq)  
## 1  12 -166.95                       
## 2  10 -169.58 -2 5.2491    0.07247 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation of the Likelihood Ratio Tests

  • The results of the likelihood ratio tests can be used to ascertain the significance of predictors to the model. This table tells us that SES and math score had significant effects on program selection: χ²(4) = 12.922, p = .012 for SES and χ²(2) = 10.613, p = .005 for math. (The sketch after this list reproduces these term-by-term tests in a single call.)

  • These likelihood ratio statistics can be seen as overall tests that tell us which predictors significantly improve our ability to predict the outcome category, but they do not tell us specifically what the effect is. To see that, we have to look at the individual parameter estimates.
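As a cross-check (a sketch, assuming the car package is installed), car's Anova method for multinom objects produces likelihood-ratio tests for every term in a single call:

# Type II likelihood-ratio tests for each term in the model
library(car)
Anova(test)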

Parameter Estimates

# Let's check our model again
summary(test)
## Call:
## multinom(formula = prog2 ~ ses + math + science + math * science, 
##     data = hsb)
## 
## Coefficients:
##            (Intercept)       ses2       ses3       math     science
## general       5.897618 -0.4081497 -1.1254491 -0.1852220  0.01323626
## vocational   22.728283  0.8402168 -0.5605656 -0.5036705 -0.28297703
##            math:science
## general     0.001025283
## vocational  0.006185571
## 
## Std. Errors:
##            (Intercept)      ses2      ses3       math    science
## general    0.002304064 0.2613732 0.2134308 0.02694593 0.02953364
## vocational 0.003856861 0.2959741 0.1984775 0.02681947 0.03142872
##            math:science
## general    0.0004761369
## vocational 0.0004760567
## 
## Residual Deviance: 333.9036 
## AIC: 357.9036
# Check the Wald Z again
z <- summary(test)$coefficients/summary(test)$standard.errors
z
##            (Intercept)      ses2      ses3       math    science
## general       2559.659 -1.561559 -5.273132  -6.873837  0.4481759
## vocational    5892.948  2.838819 -2.824328 -18.780028 -9.0037711
##            math:science
## general        2.153336
## vocational    12.993348
# 2-tailed z test
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
##            (Intercept)        ses2         ses3         math   science
## general              0 0.118391942 1.341147e-07 6.249667e-12 0.6540263
## vocational           0 0.004528089 4.737981e-03 0.000000e+00 0.0000000
##            math:science
## general      0.03129229
## vocational   0.00000000
  • Note that the table is split into two rows. This is because these parameters compare pairs of outcome categories.

  • We specified the second category (2 = academic) as our reference category; therefore, the row labelled 'general' compares that category against the academic category, and the row labelled 'vocational' likewise compares the vocational category against the academic category.

  • Because each comparison involves just two categories, the interpretation is the same as for binary logistic regression:

# Extract the coefficients from the model and exponentiate to obtain odds ratios
exp(coef(test))
##             (Intercept)      ses2      ses3      math   science
## general    3.641690e+02 0.6648794 0.3245067 0.8309198 1.0133243
## vocational 7.426219e+09 2.3168692 0.5708861 0.6043085 0.7535371
##            math:science
## general        1.001026
## vocational     1.006205
  • The relative log odds of being in the general program versus the academic program decrease by 1.125 when moving from the lowest level of SES (SES = 1, the reference) to the highest level (SES = 3): b = -1.125, Wald z = -5.27, p < .001.

  • Exp(-1.1254491) = 0.3245067 means that for students at the highest level of SES (SES = 3), the odds of choosing the general program over the academic program are 0.325 times the odds for students at the lowest level (SES = 1); therefore, students with the lowest level of SES tend to choose the general program over the academic program more than students with the highest level of SES.

  • The relative log odds of being in the vocational program versus the academic program decrease by 0.56 when moving from the lowest level of SES (SES = 1) to the highest level (SES = 3): b = -0.56, Wald z = -2.82, p < .01.

  • Exp(-0.56) = 0.57 means that for students at the highest level of SES (SES = 3), the odds of choosing the vocational program over the academic program are 0.57 times the odds for students at the lowest level (SES = 1); therefore, students with the lowest level of SES tend to choose the vocational program over the academic program more than students with the highest level of SES. (Wald confidence intervals for these odds ratios are sketched below.)
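To attach uncertainty to these odds ratios, here is a minimal sketch that computes Wald 95% confidence intervals by hand from the coefficient and standard-error matrices shown above (exp(b ± 1.96 × SE)):

# Wald 95% confidence intervals for the odds ratios
co <- summary(test)$coefficients
se <- summary(test)$standard.errors
exp(co - 1.96 * se)   # lower bounds
exp(co + 1.96 * se)   # upper bounds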

Interpretation of the Predictive Equation

Please check your slides for detailed information. All of the values you need appear in the R output above.
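As an illustration (a minimal sketch; the profile ses = 2, math = 60, science = 55 is hypothetical), the fitted coefficients can be turned into predicted probabilities with the multinomial logit (softmax) formula, using the academic program as the baseline:

# Hand-computed predicted probabilities for one hypothetical student
co  <- coef(test)                   # rows: general and vocational (vs. academic)
x   <- c(1, 1, 0, 60, 55, 60 * 55)  # intercept, ses2, ses3, math, science, math:science
eta <- as.vector(co %*% x)          # the two logits relative to academic
p_acad <- 1 / (1 + sum(exp(eta)))   # softmax denominator gives P(academic)
c(academic = p_acad, general = p_acad * exp(eta[1]),
  vocational = p_acad * exp(eta[2]))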

Build a classification table

# Load the summarytools package (optional here; the table below uses base R's table function)
library(summarytools)
## Registered S3 method overwritten by 'pryr':
##   method      from
##   print.bytes Rcpp
# Build a classification table of observed versus predicted programs
ctable <- table(hsb$prog2,predict(test))
ctable
##             
##              academic general vocational
##   academic         88       4         13
##   general          24      12          9
##   vocational       21       4         25
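The diagonal of this table holds the correctly classified cases, so overall accuracy follows directly (a quick sketch):

# Overall classification accuracy from the table's diagonal
sum(diag(ctable)) / sum(ctable)   # (88 + 12 + 25) / 200 = 0.625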