\(~\)

Chapter 5 - Exercise 5.1

This hsb data was collected as a subset of the High School and Beyond study conducted by the National Education Longitudinal Studies program of the National Center for Education Statistics. The variables are gender; race; socioeconomic status; school type; chosen high school program type; scores on reading; writing, math, schience, and social studies. We want to determine which factors related to the choice of the type of program - academic, vocational, or general - that the students pursue in high school. The response is multinomial with three levels.

Load Libraries

library(nnet)
library(faraway)
library(jtools)

Load data

##    id gender  race    ses schtyp     prog read write math science socst
## 1  70   male white    low public  general   57    52   41      47    57
## 2 121 female white middle public vocation   68    59   53      63    61
## 3  86   male white   high public  general   44    33   54      58    31
## 4 141   male white   high public vocation   63    44   47      53    56
## 5 172   male white middle public academic   47    52   57      53    61
## 6 113   male white middle public academic   44    52   51      63    61

\(~\)

Data Evaluation

Summary for whole data set

##        id            gender              race         ses         schtyp   
##  Min.   :  1.00   female:109   african-amer: 20   high  :58   private: 32  
##  1st Qu.: 50.75   male  : 91   asian       : 11   low   :47   public :168  
##  Median :100.50                hispanic    : 24   middle:95                
##  Mean   :100.50                white       :145                            
##  3rd Qu.:150.25                                                            
##  Max.   :200.00                                                            
##        prog          read           write            math          science     
##  academic:105   Min.   :28.00   Min.   :31.00   Min.   :33.00   Min.   :26.00  
##  general : 45   1st Qu.:44.00   1st Qu.:45.75   1st Qu.:45.00   1st Qu.:44.00  
##  vocation: 50   Median :50.00   Median :54.00   Median :52.00   Median :53.00  
##                 Mean   :52.23   Mean   :52.77   Mean   :52.65   Mean   :51.85  
##                 3rd Qu.:60.00   3rd Qu.:60.00   3rd Qu.:59.00   3rd Qu.:58.00  
##                 Max.   :76.00   Max.   :67.00   Max.   :75.00   Max.   :74.00  
##      socst      
##  Min.   :26.00  
##  1st Qu.:46.00  
##  Median :52.00  
##  Mean   :52.41  
##  3rd Qu.:61.00  
##  Max.   :71.00

Summary for program types

##   academic vocational    general 
##        105         45         50

Checking for missing values

##      id  gender    race     ses  schtyp    prog    read   write    math science 
##       0       0       0       0       0       0       0       0       0       0 
##   socst 
##       0

\(~\)

Questions:

  1. Fit a trinomial response model with the other relevant variables as predictors (untransformed).

Model 1

## # weights:  42 (26 variable)
## initial  value 219.722458 
## iter  10 value 171.814970
## iter  20 value 153.793692
## iter  30 value 152.935260
## final  value 152.935256 
## converged
## Call:
## multinom(formula = sprog ~ gender + race + ses + schtyp + read + 
##     write + math + science + socst, data = hsb)
## 
## Coefficients:
##            (Intercept)  gendermale raceasian racehispanic racewhite     seslow
## vocational    3.631901 -0.09264717  1.352739   -0.6322019 0.2965156 1.09864111
## general       7.481381 -0.32104341 -0.700070   -0.1993556 0.3358881 0.04747323
##            sesmiddle schtyppublic        read       write       math    science
## vocational 0.7029621    0.5845405 -0.04418353 -0.03627381 -0.1092888 0.10193746
## general    1.1815808    2.0553336 -0.03481202 -0.03166001 -0.1139877 0.05229938
##                  socst
## vocational -0.01976995
## general    -0.08040129
## 
## Std. Errors:
##            (Intercept) gendermale raceasian racehispanic racewhite    seslow
## vocational    1.823452  0.4548778  1.058754    0.8935504 0.7354829 0.6066763
## general       2.104698  0.5021132  1.470176    0.8393676 0.7480573 0.7045772
##            sesmiddle schtyppublic       read      write       math    science
## vocational 0.5045938    0.5642925 0.03103707 0.03381324 0.03522441 0.03274038
## general    0.5700833    0.8348229 0.03422409 0.03585729 0.03885131 0.03424763
##                 socst
## vocational 0.02712589
## general    0.02938212
## 
## Residual Deviance: 305.8705 
## AIC: 357.8705

\(~\)

  1. Use backward elimination to reduce the model to one where all predictors are statistically significant. Give an interpretation of the resulting model.

I followed the textbook’s example and used the AIC criterion using the step wise method. As the model goes through each variables we notice that one variable is removed. The lower the AIC score the better and we used model2 to demonstrate just that.

Model 2

## Start:  AIC=357.87
## sprog ~ gender + race + ses + schtyp + read + write + math + 
##     science + socst
## 
## trying - gender 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 171.468391
## iter  20 value 153.592758
## final  value 153.142827 
## converged
## trying - race 
## # weights:  33 (20 variable)
## initial  value 219.722458 
## iter  10 value 172.925326
## iter  20 value 156.065379
## final  value 155.776076 
## converged
## trying - ses 
## # weights:  36 (22 variable)
## initial  value 219.722458 
## iter  10 value 173.415148
## iter  20 value 159.270550
## final  value 159.018173 
## converged
## trying - schtyp 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 179.272154
## iter  20 value 157.543382
## final  value 157.125206 
## converged
## trying - read 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 183.830532
## iter  20 value 154.286859
## final  value 154.065250 
## converged
## trying - write 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 176.101070
## iter  20 value 153.975940
## final  value 153.626207 
## converged
## trying - math 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 182.534864
## iter  20 value 160.202057
## final  value 159.929539 
## converged
## trying - science 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 184.326108
## iter  20 value 158.520506
## final  value 158.243167 
## converged
## trying - socst 
## # weights:  39 (24 variable)
## initial  value 219.722458 
## iter  10 value 179.967335
## iter  20 value 157.376702
## final  value 157.146736 
## converged
##           Df      AIC
## - race    20 351.5522
## - gender  24 354.2857
## - write   24 355.2524
## - read    24 356.1305
## <none>    26 357.8705
## - ses     22 362.0363
## - schtyp  24 362.2504
## - socst   24 362.2935
## - science 24 364.4863
## - math    24 367.8591
## # weights:  33 (20 variable)
## initial  value 219.722458 
## iter  10 value 172.925326
## iter  20 value 156.065379
## final  value 155.776076 
## converged
## 
## Step:  AIC=351.55
## sprog ~ gender + ses + schtyp + read + write + math + science + 
##     socst
## 
## trying - gender 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 172.662548
## iter  20 value 156.063823
## final  value 156.032828 
## converged
## trying - ses 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 174.614066
## iter  20 value 161.475590
## final  value 161.472216 
## converged
## trying - schtyp 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 180.749264
## iter  20 value 159.825179
## final  value 159.649518 
## converged
## trying - read 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 183.217967
## iter  20 value 156.956223
## final  value 156.905034 
## converged
## trying - write 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 176.860078
## iter  20 value 156.634024
## final  value 156.325078 
## converged
## trying - math 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 183.819884
## iter  20 value 162.568961
## final  value 162.533639 
## converged
## trying - science 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 185.688554
## iter  20 value 161.852139
## final  value 161.818793 
## converged
## trying - socst 
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 180.870401
## iter  20 value 159.648982
## final  value 159.589251 
## converged
##           Df      AIC
## - gender  18 348.0657
## - write   18 348.6502
## - read    18 349.8101
## <none>    20 351.5522
## - ses     16 354.9444
## - socst   18 355.1785
## - schtyp  18 355.2990
## - science 18 359.6376
## - math    18 361.0673
## # weights:  30 (18 variable)
## initial  value 219.722458 
## iter  10 value 172.662548
## iter  20 value 156.063823
## final  value 156.032828 
## converged
## 
## Step:  AIC=348.07
## sprog ~ ses + schtyp + read + write + math + science + socst
## 
## trying - ses 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 174.143433
## iter  20 value 161.751125
## iter  20 value 161.751124
## iter  20 value 161.751124
## final  value 161.751124 
## converged
## trying - schtyp 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 180.674669
## iter  20 value 159.902074
## final  value 159.901300 
## converged
## trying - read 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 182.139891
## iter  20 value 157.256553
## final  value 157.255365 
## converged
## trying - write 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 176.827677
## iter  20 value 156.410686
## final  value 156.406678 
## converged
## trying - math 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 183.645245
## iter  20 value 162.999887
## final  value 162.998232 
## converged
## trying - science 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 185.984215
## iter  20 value 162.121250
## final  value 162.117077 
## converged
## trying - socst 
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 180.818124
## iter  20 value 159.843366
## final  value 159.841915 
## converged
##           Df      AIC
## - write   16 344.8134
## - read    16 346.5107
## <none>    18 348.0657
## - ses     14 351.5022
## - socst   16 351.6838
## - schtyp  16 351.8026
## - science 16 356.2342
## - math    16 357.9965
## # weights:  27 (16 variable)
## initial  value 219.722458 
## iter  10 value 176.827677
## iter  20 value 156.410686
## final  value 156.406678 
## converged
## 
## Step:  AIC=344.81
## sprog ~ ses + schtyp + read + math + science + socst
## 
## trying - ses 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 175.697433
## final  value 162.312774 
## converged
## trying - schtyp 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 183.996922
## iter  20 value 160.624557
## final  value 160.624514 
## converged
## trying - read 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 171.169761
## iter  20 value 157.775586
## final  value 157.775540 
## converged
## trying - math 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 175.971820
## iter  20 value 164.774187
## final  value 164.774168 
## converged
## trying - science 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 175.048445
## iter  20 value 162.188199
## final  value 162.188190 
## converged
## trying - socst 
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 170.620432
## iter  20 value 161.495973
## final  value 161.495961 
## converged
##           Df      AIC
## - read    14 343.5511
## <none>    16 344.8134
## - ses     12 348.6255
## - schtyp  14 349.2490
## - socst   14 350.9919
## - science 14 352.3764
## - math    14 357.5483
## # weights:  24 (14 variable)
## initial  value 219.722458 
## iter  10 value 171.169761
## iter  20 value 157.775586
## final  value 157.775540 
## converged
## 
## Step:  AIC=343.55
## sprog ~ ses + schtyp + math + science + socst
## 
## trying - ses 
## # weights:  18 (10 variable)
## initial  value 219.722458 
## iter  10 value 166.035430
## final  value 163.818866 
## converged
## trying - schtyp 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 171.145160
## iter  20 value 162.019070
## iter  20 value 162.019070
## iter  20 value 162.019070
## final  value 162.019070 
## converged
## trying - math 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 171.943351
## final  value 169.805302 
## converged
## trying - science 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 165.438528
## final  value 162.495650 
## converged
## trying - socst 
## # weights:  21 (12 variable)
## initial  value 219.722458 
## iter  10 value 167.071471
## final  value 165.099779 
## converged
##           Df      AIC
## <none>    14 343.5511
## - ses     10 347.6377
## - schtyp  12 348.0381
## - science 12 348.9913
## - socst   12 354.1996
## - math    12 363.6106
## Call:
## multinom(formula = sprog ~ ses + schtyp + math + science + socst, 
##     data = hsb)
## 
## Coefficients:
##            (Intercept)      seslow sesmiddle schtyppublic       math    science
## vocational    2.587029  0.87607389 0.6978995    0.6468812 -0.1212242 0.08209791
## general       6.687272 -0.01569301 1.2065000    1.9955504 -0.1369641 0.03941237
##                  socst
## vocational -0.04441228
## general    -0.09363417
## 
## Std. Errors:
##            (Intercept)    seslow sesmiddle schtyppublic       math    science
## vocational    1.686492 0.5758781 0.4930330     0.545598 0.03213345 0.02787694
## general       1.945363 0.6690861 0.5571202     0.812881 0.03591701 0.02864929
##                 socst
## vocational 0.02344856
## general    0.02586717
## 
## Residual Deviance: 315.5511 
## AIC: 343.5511
## [1] "Deviance comparison is: 9.68056624203234"
## [1] "The distribution on the models is: 0.643961556133117"

\(~\)

  1. For the student with id 99, compute the predicted probabilities of the three possible choices.

Below are the probabilities for each of the three possible choices:

##   academic vocational    general 
##  0.1877186  0.3566929  0.4555884

\(~\)

Plots for subjects as demonstrated on page 107: