Week 12

This hsb data was collected as a subset of the High School and Beyond study conducted by the National Education Longitudinal Studies program of the National Center for Education Statistics. The variables are gender, race, socioeconomic status, school type, chosen high school program type: scores on reading, writing, math, science, and social studies. We want to determine which factors are related to the choice of the type of program - academic, vocational, or general - that the students pursue in high school. The response is multinomial with three levels.

a). Fit a trinomial response model with the other relevant variables as predictors (untransformed).

The “prog” response variable is a nominal variable with no natural order to the categories. We see from plots that the level of enrollment in high school program across all three type differs by gender (Men tended to be less enrolled in general rather than vocational programs). Similarly, enrollment varies by race (Asians tended to have the highest enrollment in general high school programs while Hispanics had the lowest enrollment for both general and vocational programs and Whites had a high enrollment in both general and vocational programs). Students in the lowest socioeconomic status had the highest enrollment in vocational programs whereas students in middle socioeconomic status had the highest enrollment in general high school programs. Similarly, public school students had the highest enrollment in vocational programs while non-public school students had the highest enrollment in general high school programs. Finally, while reading and writing scores were low for students in vocational programs, math, science and social study scores were higher for these students.

b). Use backward elimination to reduce the model to one where all predictors are statistically significant. Give an interpretation of the resulting model.

I used the AIC criterion deploying a stepwise search method to select which variables need to be included in the model. At the first stage of the search, we see that omitting “race” would be the best option to reduce the AIC criterion. At the next step, “gender” is removed, in the third step “write” is removed and in the final step “read” is removed.

We can also use the standard likelihood methods to derive a test to compare nested models. We first remove “gender” and then compare the deviance of the resulting model.

c). For the student with id 99, compute the predicted probabilities of the three possible choices.

library(faraway)
library(nnet)
## Warning: package 'nnet' was built under R version 4.0.5
sprog <- hsb$prog
levels(sprog) <- c("academic", "vocational", "general")
summary(sprog)
##   academic vocational    general 
##        105         45         50
par(mfrow=c(3,3))

matplot(prop.table(table(hsb$gender, sprog)), type = "l", xlab = "Gender", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$race, sprog)), type = "l", xlab = "Race", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$ses, sprog)), type = "l", xlab = "Socioeconomic status", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$schtyp, sprog)), type = "l", xlab = "School type", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$read, sprog)), type = "l", xlab = "Reading scores", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$write, sprog)), type = "l", xlab = "Writing scores", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$math, sprog)), type = "l", xlab = "Math scores", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$science, sprog)), type = "l", xlab = "Science scores", ylab = "High school program type", lty = c(1, 2, 5))
matplot(prop.table(table(hsb$socst, sprog)), type = "l", xlab = "Social studies scores", ylab = "High school program type", lty = c(1, 2, 5))

mmod <- multinom(sprog ~ gender + race + ses + schtyp + read + write + math + science + socst, hsb)
## # weights:  42 (26 variable)
## initial  value 219.722458 
## iter  10 value 171.814970
## iter  20 value 153.793692
## iter  30 value 152.935260
## final  value 152.935256 
## converged
summary(mmod)
## Call:
## multinom(formula = sprog ~ gender + race + ses + schtyp + read + 
##     write + math + science + socst, data = hsb)
## 
## Coefficients:
##            (Intercept)  gendermale raceasian racehispanic racewhite     seslow
## vocational    3.631901 -0.09264717  1.352739   -0.6322019 0.2965156 1.09864111
## general       7.481381 -0.32104341 -0.700070   -0.1993556 0.3358881 0.04747323
##            sesmiddle schtyppublic        read       write       math    science
## vocational 0.7029621    0.5845405 -0.04418353 -0.03627381 -0.1092888 0.10193746
## general    1.1815808    2.0553336 -0.03481202 -0.03166001 -0.1139877 0.05229938
##                  socst
## vocational -0.01976995
## general    -0.08040129
## 
## Std. Errors:
##            (Intercept) gendermale raceasian racehispanic racewhite    seslow
## vocational    1.823452  0.4548778  1.058754    0.8935504 0.7354829 0.6066763
## general       2.104698  0.5021132  1.470176    0.8393676 0.7480573 0.7045772
##            sesmiddle schtyppublic       read      write       math    science
## vocational 0.5045938    0.5642925 0.03103707 0.03381324 0.03522441 0.03274038
## general    0.5700833    0.8348229 0.03422409 0.03585729 0.03885131 0.03424763
##                 socst
## vocational 0.02712589
## general    0.02938212
## 
## Residual Deviance: 305.8705 
## AIC: 357.8705
mmod1 <- step(mmod)
## Start:  AIC=357.87
## sprog ~ gender + race + ses + schtyp + read + write + math + 
##     science + socst
## 
## trying - gender 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 171.468391
## iter  20 value 153.592758
## final  value 153.142827 
## converged
## trying - race 
## # weights:  33 (20 variable)
## initial  value 219.722458 
## iter  10 value 172.925326
## iter  20 value 156.065379
## final  value 155.776076 
## converged
## trying - ses 
## # weights:  36 (22 variable)
## initial  value 219.722458 
## iter  10 value 173.415148
## iter  20 value 159.270550
## final  value 159.018173 
## converged
## trying - schtyp 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 179.272154
## iter  20 value 157.543382
## final  value 157.125206 
## converged
## trying - read 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 183.830532
## iter  20 value 154.286859
## final  value 154.065250 
## converged
## trying - write 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 176.101070
## iter  20 value 153.975940
## final  value 153.626207 
## converged
## trying - math 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 182.534864
## iter  20 value 160.202057
## final  value 159.929539 
## converged
## trying - science 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 184.326108
## iter  20 value 158.520506
## final  value 158.243167 
## converged
## trying - socst 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 179.967335
## iter  20 value 157.376702
## final  value 157.146736 
## converged
##           Df      AIC
## - race    20 351.5522
## - gender  24 354.2857
## - write   24 355.2524
## - read    24 356.1305
## <none>    26 357.8705
## - ses     22 362.0363
## - schtyp  24 362.2504
## - socst   24 362.2935
## - science 24 364.4863
## - math    24 367.8591
## # weights:  33 (20 variable)
## initial  value 219.722458 
## iter  10 value 172.925326
## iter  20 value 156.065379
## final  value 155.776076 
## converged
## 
## Step:  AIC=351.55
## sprog ~ gender + ses + schtyp + read + write + math + science + 
##     socst
## 
## trying - gender 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 172.662548
## iter  20 value 156.063823
## final  value 156.032828 
## converged
## trying - ses 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 174.614066
## iter  20 value 161.475590
## final  value 161.472216 
## converged
## trying - schtyp 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 180.749264
## iter  20 value 159.825179
## final  value 159.649518 
## converged
## trying - read 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 183.217967
## iter  20 value 156.956223
## final  value 156.905034 
## converged
## trying - write 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 176.860078
## iter  20 value 156.634024
## final  value 156.325078 
## converged
## trying - math 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 183.819884
## iter  20 value 162.568961
## final  value 162.533639 
## converged
## trying - science 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 185.688554
## iter  20 value 161.852139
## final  value 161.818793 
## converged
## trying - socst 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 180.870401
## iter  20 value 159.648982
## final  value 159.589251 
## converged
##           Df      AIC
## - gender  18 348.0657
## - write   18 348.6502
## - read    18 349.8101
## <none>    20 351.5522
## - ses     16 354.9444
## - socst   18 355.1785
## - schtyp  18 355.2990
## - science 18 359.6376
## - math    18 361.0673
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 172.662548
## iter  20 value 156.063823
## final  value 156.032828 
## converged
## 
## Step:  AIC=348.07
## sprog ~ ses + schtyp + read + write + math + science + socst
## 
## trying - ses 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 174.143433
## iter  20 value 161.751125
## iter  20 value 161.751124
## iter  20 value 161.751124
## final  value 161.751124 
## converged
## trying - schtyp 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 180.674669
## iter  20 value 159.902074
## final  value 159.901300 
## converged
## trying - read 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 182.139891
## iter  20 value 157.256553
## final  value 157.255365 
## converged
## trying - write 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 176.827677
## iter  20 value 156.410686
## final  value 156.406678 
## converged
## trying - math 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 183.645245
## iter  20 value 162.999887
## final  value 162.998232 
## converged
## trying - science 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 185.984215
## iter  20 value 162.121250
## final  value 162.117077 
## converged
## trying - socst 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 180.818124
## iter  20 value 159.843366
## final  value 159.841915 
## converged
##           Df      AIC
## - write   16 344.8134
## - read    16 346.5107
## <none>    18 348.0657
## - ses     14 351.5022
## - socst   16 351.6838
## - schtyp  16 351.8026
## - science 16 356.2342
## - math    16 357.9965
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 176.827677
## iter  20 value 156.410686
## final  value 156.406678 
## converged
## 
## Step:  AIC=344.81
## sprog ~ ses + schtyp + read + math + science + socst
## 
## trying - ses 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 175.697433
## final  value 162.312774 
## converged
## trying - schtyp 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 183.996922
## iter  20 value 160.624557
## final  value 160.624514 
## converged
## trying - read 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 171.169761
## iter  20 value 157.775586
## final  value 157.775540 
## converged
## trying - math 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 175.971820
## iter  20 value 164.774187
## final  value 164.774168 
## converged
## trying - science 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 175.048445
## iter  20 value 162.188199
## final  value 162.188190 
## converged
## trying - socst 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 170.620432
## iter  20 value 161.495973
## final  value 161.495961 
## converged
##           Df      AIC
## - read    14 343.5511
## <none>    16 344.8134
## - ses     12 348.6255
## - schtyp  14 349.2490
## - socst   14 350.9919
## - science 14 352.3764
## - math    14 357.5483
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 171.169761
## iter  20 value 157.775586
## final  value 157.775540 
## converged
## 
## Step:  AIC=343.55
## sprog ~ ses + schtyp + math + science + socst
## 
## trying - ses 
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 166.035430
## final  value 163.818866 
## converged
## trying - schtyp 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 171.145160
## iter  20 value 162.019070
## iter  20 value 162.019070
## iter  20 value 162.019070
## final  value 162.019070 
## converged
## trying - math 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 171.943351
## final  value 169.805302 
## converged
## trying - science 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 165.438528
## final  value 162.495650 
## converged
## trying - socst 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 167.071471
## final  value 165.099779 
## converged
##           Df      AIC
## <none>    14 343.5511
## - ses     10 347.6377
## - schtyp  12 348.0381
## - science 12 348.9913
## - socst   12 354.1996
## - math    12 363.6106
summary(mmod1)
## Call:
## multinom(formula = sprog ~ ses + schtyp + math + science + socst, 
##     data = hsb)
## 
## Coefficients:
##            (Intercept)      seslow sesmiddle schtyppublic       math    science
## vocational    2.587029  0.87607389 0.6978995    0.6468812 -0.1212242 0.08209791
## general       6.687272 -0.01569301 1.2065000    1.9955504 -0.1369641 0.03941237
##                  socst
## vocational -0.04441228
## general    -0.09363417
## 
## Std. Errors:
##            (Intercept)    seslow sesmiddle schtyppublic       math    science
## vocational    1.686492 0.5758781 0.4930330     0.545598 0.03213345 0.02787694
## general       1.945363 0.6690861 0.5571202     0.812881 0.03591701 0.02864929
##                 socst
## vocational 0.02344856
## general    0.02586717
## 
## Residual Deviance: 315.5511 
## AIC: 343.5511
mmodg <- multinom(sprog ~ race + ses + schtyp + read + write + math + science + socst, hsb)
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 171.468391
## iter  20 value 153.592758
## final  value 153.142827 
## converged
deviance(mmodg) - deviance(mmod)
## [1] 0.415142
pchisq(0.415142, mmod$edf - mmodg$edf, lower = F)
## [1] 0.8125555
#predict(mmod1, data.frame(id = 99), type = "probs")