The hsb data were collected as a subset of the High School and Beyond study conducted by the National Education Longitudinal Studies program of the National Center for Education Statistics. The variables are gender, race, socioeconomic status, school type, chosen high school program type, and scores on reading, writing, math, science, and social studies. We want to determine which factors are related to the type of program (academic, vocational, or general) that students pursue in high school. The response is multinomial with three levels.
library(nnet)
library(faraway)
library(jtools)
##    id gender  race    ses schtyp     prog read write math science socst
## 1  70   male white    low public  general   57    52   41      47    57
## 2 121 female white middle public vocation   68    59   53      63    61
## 3  86   male white   high public  general   44    33   54      58    31
## 4 141   male white   high public vocation   63    44   47      53    56
## 5 172   male white middle public academic   47    52   57      53    61
## 6 113   male white middle public academic   44    52   51      63    61
Summary for whole data set
## id gender race ses schtyp
## Min. : 1.00 female:109 african-amer: 20 high :58 private: 32
## 1st Qu.: 50.75 male : 91 asian : 11 low :47 public :168
## Median :100.50 hispanic : 24 middle:95
## Mean :100.50 white :145
## 3rd Qu.:150.25
## Max. :200.00
## prog read write math science
## academic:105 Min. :28.00 Min. :31.00 Min. :33.00 Min. :26.00
## general : 45 1st Qu.:44.00 1st Qu.:45.75 1st Qu.:45.00 1st Qu.:44.00
## vocation: 50 Median :50.00 Median :54.00 Median :52.00 Median :53.00
## Mean :52.23 Mean :52.77 Mean :52.65 Mean :51.85
## 3rd Qu.:60.00 3rd Qu.:60.00 3rd Qu.:59.00 3rd Qu.:58.00
## Max. :76.00 Max. :67.00 Max. :75.00 Max. :74.00
## socst
## Min. :26.00
## 1st Qu.:46.00
## Median :52.00
## Mean :52.41
## 3rd Qu.:61.00
## Max. :71.00
Summary for program types
## academic vocational general
## 105 45 50
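The table above pairs 45 with vocational and 50 with general, while the full summary lists general: 45 and vocation: 50. The code that built the relabelled response is not shown, so the following is only a sketch of how positional relabelling can swap labels and how an explicit lookup avoids it. The toy vector `prog` is an assumption that simply reproduces the counts from the summary, and `lookup` is a hypothetical helper.

```r
# Toy stand-in for hsb$prog, reproducing the counts in the data summary.
prog <- factor(rep(c("academic", "general", "vocation"),
                   times = c(105, 45, 50)))

# Positional relabelling: labels are applied to levels(prog) in order
# (academic, general, vocation), so "vocational" lands on the general group.
positional <- factor(prog, labels = c("academic", "vocational", "general"))
table(positional)

# Explicit lookup: each original level is mapped by name, and the level
# order is set separately so that academic stays the baseline.
lookup <- c(academic = "academic", general = "general", vocation = "vocational")
sprog <- factor(unname(lookup[as.character(prog)]),
                levels = c("academic", "vocational", "general"))
table(sprog)
```

If the labels in the report were assigned positionally, the rows labelled vocational and general in the model output would need to be read the other way round.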
Checking for missing values
## id gender race ses schtyp prog read write math science
## 0 0 0 0 0 0 0 0 0 0
## socst
## 0
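The code behind the listings above is not shown. A minimal sketch that would produce comparable output, assuming the `hsb` data frame shipped with the `faraway` package, might look like this:

```r
library(faraway)

# Assumption: the report uses the hsb data frame from the faraway package.
hsb <- faraway::hsb

head(hsb)            # first six rows
summary(hsb)         # per-variable summaries
table(hsb$prog)      # counts by program type
colSums(is.na(hsb))  # number of missing values in each column
```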
Model 1
## # weights: 42 (26 variable)
## initial value 219.722458
## iter 10 value 171.814970
## iter 20 value 153.793692
## iter 30 value 152.935260
## final value 152.935256
## converged
## Call:
## multinom(formula = sprog ~ gender + race + ses + schtyp + read +
## write + math + science + socst, data = hsb)
##
## Coefficients:
## (Intercept) gendermale raceasian racehispanic racewhite seslow
## vocational 3.631901 -0.09264717 1.352739 -0.6322019 0.2965156 1.09864111
## general 7.481381 -0.32104341 -0.700070 -0.1993556 0.3358881 0.04747323
## sesmiddle schtyppublic read write math science
## vocational 0.7029621 0.5845405 -0.04418353 -0.03627381 -0.1092888 0.10193746
## general 1.1815808 2.0553336 -0.03481202 -0.03166001 -0.1139877 0.05229938
## socst
## vocational -0.01976995
## general -0.08040129
##
## Std. Errors:
## (Intercept) gendermale raceasian racehispanic racewhite seslow
## vocational 1.823452 0.4548778 1.058754 0.8935504 0.7354829 0.6066763
## general 2.104698 0.5021132 1.470176 0.8393676 0.7480573 0.7045772
## sesmiddle schtyppublic read write math science
## vocational 0.5045938 0.5642925 0.03103707 0.03381324 0.03522441 0.03274038
## general 0.5700833 0.8348229 0.03422409 0.03585729 0.03885131 0.03424763
## socst
## vocational 0.02712589
## general 0.02938212
##
## Residual Deviance: 305.8705
## AIC: 357.8705
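For reference, the multinomial logit model behind this output can be written as follows, with academic as the baseline category. The symbols \(x_i\) (covariate vector of student \(i\)) and \(\beta_j\) (coefficient row for category \(j\)) are notation introduced here, not taken from the output.

\[
\log\frac{P(Y_i = j)}{P(Y_i = \text{academic})} = x_i^{\top}\beta_j,
\qquad j \in \{\text{vocational}, \text{general}\}
\]

\[
P(Y_i = j) = \frac{\exp(x_i^{\top}\beta_j)}{1 + \sum_{k}\exp(x_i^{\top}\beta_k)},
\qquad
P(Y_i = \text{academic}) = \frac{1}{1 + \sum_{k}\exp(x_i^{\top}\beta_k)}
\]

Each coefficient is therefore a change in log-odds relative to academic. For example, the math coefficient of about -0.109 in the vocational row means that one additional math point multiplies the odds of vocational versus academic by roughly \(\exp(-0.109) \approx 0.90\), holding the other variables fixed.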
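Following the textbook's example, I applied stepwise selection using the AIC criterion. At each step the procedure tries dropping each remaining variable and removes the one whose removal lowers the AIC the most; it stops when no removal lowers the AIC. Here race, gender, write, and read are dropped in turn, reducing the AIC from 357.87 to 343.55. A lower AIC is better, and the resulting model is reported as Model 2.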
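The fitting code is hidden in this report, so here is a hedged sketch of how the full model and the stepwise search could be run with `nnet::multinom()` and `step()`. It uses the original `prog` column as the response because the construction of `sprog` is not shown; the object names `mod1` and `mod2` are assumptions.

```r
library(nnet)     # multinom()
library(faraway)  # hsb data

hsb <- faraway::hsb

# Full model; the first level of the response (academic) is the baseline.
# trace = FALSE suppresses the iteration log.
mod1 <- multinom(prog ~ gender + race + ses + schtyp + read + write +
                   math + science + socst, data = hsb, trace = FALSE)

# Backward elimination by AIC; trace = 0 hides the per-step tables.
mod2 <- step(mod1, trace = 0)

formula(mod2)
c(full = AIC(mod1), reduced = AIC(mod2))
```

Leaving `trace` at its default in both calls prints the iteration log and the per-step AIC tables, as in the listings in this report.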
Model 2
## Start: AIC=357.87
## sprog ~ gender + race + ses + schtyp + read + write + math +
## science + socst
##
## trying - gender
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 171.468391
## iter 20 value 153.592758
## final value 153.142827
## converged
## trying - race
## # weights: 33 (20 variable)
## initial value 219.722458
## iter 10 value 172.925326
## iter 20 value 156.065379
## final value 155.776076
## converged
## trying - ses
## # weights: 36 (22 variable)
## initial value 219.722458
## iter 10 value 173.415148
## iter 20 value 159.270550
## final value 159.018173
## converged
## trying - schtyp
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 179.272154
## iter 20 value 157.543382
## final value 157.125206
## converged
## trying - read
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 183.830532
## iter 20 value 154.286859
## final value 154.065250
## converged
## trying - write
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 176.101070
## iter 20 value 153.975940
## final value 153.626207
## converged
## trying - math
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 182.534864
## iter 20 value 160.202057
## final value 159.929539
## converged
## trying - science
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 184.326108
## iter 20 value 158.520506
## final value 158.243167
## converged
## trying - socst
## # weights: 39 (24 variable)
## initial value 219.722458
## iter 10 value 179.967335
## iter 20 value 157.376702
## final value 157.146736
## converged
## Df AIC
## - race 20 351.5522
## - gender 24 354.2857
## - write 24 355.2524
## - read 24 356.1305
## <none> 26 357.8705
## - ses 22 362.0363
## - schtyp 24 362.2504
## - socst 24 362.2935
## - science 24 364.4863
## - math 24 367.8591
## # weights: 33 (20 variable)
## initial value 219.722458
## iter 10 value 172.925326
## iter 20 value 156.065379
## final value 155.776076
## converged
##
## Step: AIC=351.55
## sprog ~ gender + ses + schtyp + read + write + math + science +
## socst
##
## trying - gender
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 172.662548
## iter 20 value 156.063823
## final value 156.032828
## converged
## trying - ses
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 174.614066
## iter 20 value 161.475590
## final value 161.472216
## converged
## trying - schtyp
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 180.749264
## iter 20 value 159.825179
## final value 159.649518
## converged
## trying - read
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 183.217967
## iter 20 value 156.956223
## final value 156.905034
## converged
## trying - write
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 176.860078
## iter 20 value 156.634024
## final value 156.325078
## converged
## trying - math
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 183.819884
## iter 20 value 162.568961
## final value 162.533639
## converged
## trying - science
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 185.688554
## iter 20 value 161.852139
## final value 161.818793
## converged
## trying - socst
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 180.870401
## iter 20 value 159.648982
## final value 159.589251
## converged
## Df AIC
## - gender 18 348.0657
## - write 18 348.6502
## - read 18 349.8101
## <none> 20 351.5522
## - ses 16 354.9444
## - socst 18 355.1785
## - schtyp 18 355.2990
## - science 18 359.6376
## - math 18 361.0673
## # weights: 30 (18 variable)
## initial value 219.722458
## iter 10 value 172.662548
## iter 20 value 156.063823
## final value 156.032828
## converged
##
## Step: AIC=348.07
## sprog ~ ses + schtyp + read + write + math + science + socst
##
## trying - ses
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 174.143433
## iter 20 value 161.751125
## iter 20 value 161.751124
## iter 20 value 161.751124
## final value 161.751124
## converged
## trying - schtyp
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 180.674669
## iter 20 value 159.902074
## final value 159.901300
## converged
## trying - read
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 182.139891
## iter 20 value 157.256553
## final value 157.255365
## converged
## trying - write
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 176.827677
## iter 20 value 156.410686
## final value 156.406678
## converged
## trying - math
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 183.645245
## iter 20 value 162.999887
## final value 162.998232
## converged
## trying - science
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 185.984215
## iter 20 value 162.121250
## final value 162.117077
## converged
## trying - socst
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 180.818124
## iter 20 value 159.843366
## final value 159.841915
## converged
## Df AIC
## - write 16 344.8134
## - read 16 346.5107
## <none> 18 348.0657
## - ses 14 351.5022
## - socst 16 351.6838
## - schtyp 16 351.8026
## - science 16 356.2342
## - math 16 357.9965
## # weights: 27 (16 variable)
## initial value 219.722458
## iter 10 value 176.827677
## iter 20 value 156.410686
## final value 156.406678
## converged
##
## Step: AIC=344.81
## sprog ~ ses + schtyp + read + math + science + socst
##
## trying - ses
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 175.697433
## final value 162.312774
## converged
## trying - schtyp
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 183.996922
## iter 20 value 160.624557
## final value 160.624514
## converged
## trying - read
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 171.169761
## iter 20 value 157.775586
## final value 157.775540
## converged
## trying - math
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 175.971820
## iter 20 value 164.774187
## final value 164.774168
## converged
## trying - science
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 175.048445
## iter 20 value 162.188199
## final value 162.188190
## converged
## trying - socst
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 170.620432
## iter 20 value 161.495973
## final value 161.495961
## converged
## Df AIC
## - read 14 343.5511
## <none> 16 344.8134
## - ses 12 348.6255
## - schtyp 14 349.2490
## - socst 14 350.9919
## - science 14 352.3764
## - math 14 357.5483
## # weights: 24 (14 variable)
## initial value 219.722458
## iter 10 value 171.169761
## iter 20 value 157.775586
## final value 157.775540
## converged
##
## Step: AIC=343.55
## sprog ~ ses + schtyp + math + science + socst
##
## trying - ses
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 166.035430
## final value 163.818866
## converged
## trying - schtyp
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 171.145160
## iter 20 value 162.019070
## iter 20 value 162.019070
## iter 20 value 162.019070
## final value 162.019070
## converged
## trying - math
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 171.943351
## final value 169.805302
## converged
## trying - science
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 165.438528
## final value 162.495650
## converged
## trying - socst
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 167.071471
## final value 165.099779
## converged
## Df AIC
## <none> 14 343.5511
## - ses 10 347.6377
## - schtyp 12 348.0381
## - science 12 348.9913
## - socst 12 354.1996
## - math 12 363.6106
## Call:
## multinom(formula = sprog ~ ses + schtyp + math + science + socst,
## data = hsb)
##
## Coefficients:
## (Intercept) seslow sesmiddle schtyppublic math science
## vocational 2.587029 0.87607389 0.6978995 0.6468812 -0.1212242 0.08209791
## general 6.687272 -0.01569301 1.2065000 1.9955504 -0.1369641 0.03941237
## socst
## vocational -0.04441228
## general -0.09363417
##
## Std. Errors:
## (Intercept) seslow sesmiddle schtyppublic math science
## vocational 1.686492 0.5758781 0.4930330 0.545598 0.03213345 0.02787694
## general 1.945363 0.6690861 0.5571202 0.812881 0.03591701 0.02864929
## socst
## vocational 0.02344856
## general 0.02586717
##
## Residual Deviance: 315.5511
## AIC: 343.5511
## [1] "Deviance comparison is: 9.68056624203234"
## [1] "The distribution on the models is: 0.643961556133117"
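The two values above appear to be a likelihood-ratio comparison of Model 2 against Model 1: the difference in residual deviance and the corresponding chi-squared upper-tail probability. The sketch below recomputes them from the numbers printed in the model summaries; the degrees of freedom (26 - 14 = 12) are inferred from the parameter counts in the output, and the variable names are assumptions.

```r
# Values copied from the model summaries in this report.
dev_full    <- 305.8705  # residual deviance, Model 1
dev_reduced <- 315.5511  # residual deviance, Model 2
k_full      <- 26        # number of parameters, Model 1
k_reduced   <- 14        # number of parameters, Model 2

# AIC = deviance + 2 * (number of parameters)
aic_full    <- dev_full + 2 * k_full
aic_reduced <- dev_reduced + 2 * k_reduced

# Likelihood-ratio test of the reduced model against the full model.
dev_diff <- dev_reduced - dev_full
df_diff  <- k_full - k_reduced
p_value  <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

c(aic_full = aic_full, aic_reduced = aic_reduced,
  dev_diff = dev_diff, df_diff = df_diff, p_value = p_value)
```

A p-value of about 0.64 gives no evidence that the four dropped variables improve the fit, which agrees with the AIC-based choice of the smaller model.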
Below are the probabilities for each of the three possible choices:
## academic vocational general
## 0.1877186 0.3566929 0.4555884
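The covariate values behind the probabilities above are not stated, so they cannot be reproduced here. As an illustration of the calculation only, the sketch below applies the Model 2 coefficients to the first student in the data listing (low SES, public school, math 41, science 47, socst 57). The names `coefs`, `x`, `eta`, and `probs` are assumptions.

```r
# Model 2 coefficients copied from the output above (baseline: academic).
coefs <- rbind(
  vocational = c(`(Intercept)` = 2.587029, seslow = 0.87607389,
                 sesmiddle = 0.6978995, schtyppublic = 0.6468812,
                 math = -0.1212242, science = 0.08209791,
                 socst = -0.04441228),
  general    = c(6.687272, -0.01569301, 1.2065000, 1.9955504,
                 -0.1369641, 0.03941237, -0.09363417)
)

# Design vector for the first listed student:
# intercept, seslow, sesmiddle, schtyppublic, math, science, socst.
x <- c(1, 1, 0, 1, 41, 47, 57)

# Linear predictors, with the baseline category fixed at 0.
eta <- c(academic = 0, drop(coefs %*% x))

# Softmax turns the linear predictors into probabilities that sum to 1.
probs <- exp(eta) / sum(exp(eta))
round(probs, 3)
```

With a fitted model object available, `predict(model, newdata, type = "probs")` from `nnet` performs the same calculation.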
Plots for subjects as demonstrated on page 107: