Homework 6
Imputation of missing data
For this assignment, I will be using the ELS:2002 dataset. My outcome variable will be college attainment in the 3rd follow-up (i.e., did the student attain a bachelor’s degree or higher as of the third follow-up). My predictors will be:
“immediate_ps_enrollment” – did they immediately enroll in a public higher education institution upon graduation from high school (1 for yes, 0 for no)
“repeated_grades” – did the student ever repeat any grades in school (1 for yes, 0 for no)
“parents_college_exp” – did the student’s parents attend college (1 for yes, 0 for no)
“family_low_ses” – does the student’s family have below-average SES (1 for yes, 0 for no)
“non_trad_family” – did the student live in a non-traditional household during the base year (1 for yes, 0 for no)
“minority” – does the student belong to a racial or ethnic minority (1 for yes, 0 for no)
“male” – is the student male (1 for yes, 0 for no)
Here is a summary of the selected ELS:2002 variables, including the number of missing values (NA's) for each:
summary(homework6)
## stu_id strat_id f3qwt edu_expectations_met
## Min. :101101 Min. :101.0 Min. : 0.00 Min. :0.000
## 1st Qu.:190106 1st Qu.:190.0 1st Qu.: 52.35 1st Qu.:0.000
## Median :281116 Median :281.0 Median :180.52 Median :0.000
## Mean :279543 Mean :279.4 Mean :202.94 Mean :0.233
## 3rd Qu.:369205 3rd Qu.:369.0 3rd Qu.:306.37 3rd Qu.:0.000
## Max. :461234 Max. :461.0 Max. :903.01 Max. :1.000
## NA's :3345
## ba_or_higher immediate_ps_enrollment repeated_grades
## Min. :0.0000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.0000 Median :1.000 Median :0.0000
## Mean :0.3849 Mean :0.678 Mean :0.1241
## 3rd Qu.:1.0000 3rd Qu.:1.000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000
## NA's :2947 NA's :3556 NA's :2944
## parents_college_exp family_low_ses non_trad_family minority
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.8759 Mean :0.2367 Mean :0.4062 Mean :0.4305
## 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :178 NA's :953 NA's :872 NA's :953
## male immigrant_proxy
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000
## Mean :0.4979 Mean :0.0693
## 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000
## NA's :827 NA's :1414
As we can see, the variables with the most missing cases are “immediate_ps_enrollment” (3,556) and “edu_expectations_met” (3,345), followed by “ba_or_higher” (2,947) and “repeated_grades” (2,944). The first three are not surprising, because they were all recorded after the initial survey.
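The NA counts that summary() reports can also be pulled out directly and sorted. The following is a minimal base-R sketch on a toy data frame; the data frame `toy` and its values are hypothetical and are not taken from ELS:2002.

```r
# Toy data frame (hypothetical values, not ELS:2002 data)
toy <- data.frame(
  a = c(1, NA, 0, 1),
  b = c(NA, NA, 1, 0),
  c = c(1, 0, 1, 1)
)

# Number of missing values per column, sorted from most to least missing
na_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)

# Share of rows that are missing for each column
na_share <- round(na_counts / nrow(toy), 3)

print(na_counts)
print(na_share)
```

Running the same two lines on homework6 should reproduce the NA's rows of the summary in a single sorted vector.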
We can see further patterns in the data’s missingness using the md.pattern function:
library(mice)  # provides md.pattern(), md.pairs(), mice() and complete()
md.pattern(homework6)
## stu_id strat_id f3qwt parents_college_exp male non_trad_family
## 9950 1 1 1 1 1 1
## 411 1 1 1 1 1 1
## 1687 1 1 1 1 1 1
## 62 1 1 1 1 1 1
## 222 1 1 1 1 1 1
## 131 1 1 1 1 1 1
## 4 1 1 1 1 1 1
## 259 1 1 1 1 1 1
## 1680 1 1 1 1 1 1
## 82 1 1 1 1 1 1
## 35 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 25 1 1 1 1 1 1
## 567 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 20 1 1 1 1 1 1
## 18 1 1 1 1 1 1
## 19 1 1 1 1 1 1
## 13 1 1 1 1 1 1
## 125 1 1 1 1 1 1
## 19 1 1 1 1 1 0
## 2 1 1 1 1 1 1
## 9 1 1 1 1 1 1
## 2 1 1 1 1 1 0
## 7 1 1 1 1 1 0
## 476 1 1 1 1 0 0
## 12 1 1 1 1 1 0
## 1 1 1 1 1 1 0
## 31 1 1 1 1 0 0
## 11 1 1 1 1 0 0
## 106 1 1 1 0 0 0
## 4 1 1 1 1 1 0
## 126 1 1 1 1 0 0
## 3 1 1 1 0 0 0
## 8 1 1 1 0 0 0
## 5 1 1 1 1 0 0
## 51 1 1 1 0 0 0
## 10 1 1 1 0 0 0
## 0 0 0 178 827 872
## family_low_ses minority immigrant_proxy repeated_grades ba_or_higher
## 9950 1 1 1 1 1
## 411 1 1 1 1 1
## 1687 1 1 1 0 1
## 62 1 1 0 1 1
## 222 1 1 1 1 0
## 131 1 1 1 0 1
## 4 1 1 0 1 1
## 259 1 1 0 0 1
## 1680 1 1 1 1 0
## 82 1 1 1 0 0
## 35 0 0 1 1 1
## 1 1 1 0 1 0
## 25 1 1 0 0 1
## 567 1 1 1 0 0
## 1 0 0 1 1 1
## 2 0 0 1 0 1
## 20 1 1 0 1 0
## 18 1 1 0 0 0
## 19 0 0 0 1 1
## 13 0 0 1 1 0
## 125 1 1 0 0 0
## 19 0 0 0 1 1
## 2 0 0 1 0 0
## 9 0 0 0 1 0
## 2 0 0 0 1 1
## 7 0 0 0 0 1
## 476 0 0 0 1 1
## 12 0 0 0 1 0
## 1 0 0 0 0 1
## 31 0 0 0 1 1
## 11 0 0 0 0 1
## 106 0 0 0 1 1
## 4 0 0 0 0 0
## 126 0 0 0 1 0
## 3 0 0 0 1 1
## 8 0 0 0 0 1
## 5 0 0 0 0 0
## 51 0 0 0 1 0
## 10 0 0 0 0 0
## 953 953 1414 2944 2947
## edu_expectations_met immediate_ps_enrollment
## 9950 1 1 0
## 411 1 0 1
## 1687 1 1 1
## 62 1 1 1
## 222 1 0 2
## 131 1 0 2
## 4 1 0 2
## 259 1 1 2
## 1680 0 0 3
## 82 1 0 3
## 35 0 1 3
## 1 1 0 3
## 25 1 0 3
## 567 0 0 4
## 1 0 0 4
## 2 0 1 4
## 20 0 0 4
## 18 1 0 4
## 19 0 1 4
## 13 0 0 5
## 125 0 0 5
## 19 0 1 5
## 2 0 0 6
## 9 0 0 6
## 2 0 0 6
## 7 0 1 6
## 476 0 1 6
## 12 0 0 7
## 1 0 0 7
## 31 0 0 7
## 11 0 1 7
## 106 0 1 7
## 4 0 0 8
## 126 0 0 8
## 3 0 0 8
## 8 0 1 8
## 5 0 0 9
## 51 0 0 9
## 10 0 0 10
## 3345 3556 17989
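The first row shows that 9,950 cases are complete on every variable. The most common incomplete pattern (1,687 cases) is missing only repeated_grades, while the second most common (1,680 cases) is missing all three follow-up variables (ba_or_higher, edu_expectations_met and immediate_ps_enrollment).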
We can also observe pairs of missingness as follows:
md.pairs(homework6)
## $rr
## stu_id strat_id f3qwt edu_expectations_met
## stu_id 16197 16197 16197 12852
## strat_id 16197 16197 16197 12852
## f3qwt 16197 16197 16197 12852
## edu_expectations_met 12852 12852 12852 12852
## ba_or_higher 13250 13250 13250 12529
## immediate_ps_enrollment 12641 12641 12641 11958
## repeated_grades 13253 13253 13253 10650
## parents_college_exp 16019 16019 16019 12852
## family_low_ses 15244 15244 15244 12852
## non_trad_family 15325 15325 15325 12852
## minority 15244 15244 15244 12852
## male 15370 15370 15370 12852
## immigrant_proxy 14783 14783 14783 12483
## ba_or_higher immediate_ps_enrollment
## stu_id 13250 12641
## strat_id 13250 12641
## f3qwt 13250 12641
## edu_expectations_met 12529 11958
## ba_or_higher 13250 12641
## immediate_ps_enrollment 12641 12641
## repeated_grades 11119 10667
## parents_college_exp 13133 12527
## family_low_ses 12529 11958
## non_trad_family 12586 12014
## minority 12529 11958
## male 12615 12040
## immigrant_proxy 12217 11674
## repeated_grades parents_college_exp family_low_ses
## stu_id 13253 16019 15244
## strat_id 13253 16019 15244
## f3qwt 13253 16019 15244
## edu_expectations_met 10650 12852 12852
## ba_or_higher 11119 13133 12529
## immediate_ps_enrollment 10667 12527 11958
## repeated_grades 13253 13093 12350
## parents_college_exp 13093 16019 15244
## family_low_ses 12350 15244 15244
## non_trad_family 12427 15325 15244
## minority 12350 15244 15244
## male 12460 15370 15244
## immigrant_proxy 12312 14783 14730
## non_trad_family minority male immigrant_proxy
## stu_id 15325 15244 15370 14783
## strat_id 15325 15244 15370 14783
## f3qwt 15325 15244 15370 14783
## edu_expectations_met 12852 12852 12852 12483
## ba_or_higher 12586 12529 12615 12217
## immediate_ps_enrollment 12014 11958 12040 11674
## repeated_grades 12427 12350 12460 12312
## parents_college_exp 15325 15244 15370 14783
## family_low_ses 15244 15244 15244 14730
## non_trad_family 15325 15244 15325 14783
## minority 15244 15244 15244 14730
## male 15325 15244 15370 14783
## immigrant_proxy 14783 14730 14783 14783
##
## $rm
## stu_id strat_id f3qwt edu_expectations_met
## stu_id 0 0 0 3345
## strat_id 0 0 0 3345
## f3qwt 0 0 0 3345
## edu_expectations_met 0 0 0 0
## ba_or_higher 0 0 0 721
## immediate_ps_enrollment 0 0 0 683
## repeated_grades 0 0 0 2603
## parents_college_exp 0 0 0 3167
## family_low_ses 0 0 0 2392
## non_trad_family 0 0 0 2473
## minority 0 0 0 2392
## male 0 0 0 2518
## immigrant_proxy 0 0 0 2300
## ba_or_higher immediate_ps_enrollment
## stu_id 2947 3556
## strat_id 2947 3556
## f3qwt 2947 3556
## edu_expectations_met 323 894
## ba_or_higher 0 609
## immediate_ps_enrollment 0 0
## repeated_grades 2134 2586
## parents_college_exp 2886 3492
## family_low_ses 2715 3286
## non_trad_family 2739 3311
## minority 2715 3286
## male 2755 3330
## immigrant_proxy 2566 3109
## repeated_grades parents_college_exp family_low_ses
## stu_id 2944 178 953
## strat_id 2944 178 953
## f3qwt 2944 178 953
## edu_expectations_met 2202 0 0
## ba_or_higher 2131 117 721
## immediate_ps_enrollment 1974 114 683
## repeated_grades 0 160 903
## parents_college_exp 2926 0 775
## family_low_ses 2894 0 0
## non_trad_family 2898 0 81
## minority 2894 0 0
## male 2910 0 126
## immigrant_proxy 2471 0 53
## non_trad_family minority male immigrant_proxy
## stu_id 872 953 827 1414
## strat_id 872 953 827 1414
## f3qwt 872 953 827 1414
## edu_expectations_met 0 0 0 369
## ba_or_higher 664 721 635 1033
## immediate_ps_enrollment 627 683 601 967
## repeated_grades 826 903 793 941
## parents_college_exp 694 775 649 1236
## family_low_ses 0 0 0 514
## non_trad_family 0 81 0 542
## minority 0 0 0 514
## male 45 126 0 587
## immigrant_proxy 0 53 0 0
##
## $mr
## stu_id strat_id f3qwt edu_expectations_met
## stu_id 0 0 0 0
## strat_id 0 0 0 0
## f3qwt 0 0 0 0
## edu_expectations_met 3345 3345 3345 0
## ba_or_higher 2947 2947 2947 323
## immediate_ps_enrollment 3556 3556 3556 894
## repeated_grades 2944 2944 2944 2202
## parents_college_exp 178 178 178 0
## family_low_ses 953 953 953 0
## non_trad_family 872 872 872 0
## minority 953 953 953 0
## male 827 827 827 0
## immigrant_proxy 1414 1414 1414 369
## ba_or_higher immediate_ps_enrollment
## stu_id 0 0
## strat_id 0 0
## f3qwt 0 0
## edu_expectations_met 721 683
## ba_or_higher 0 0
## immediate_ps_enrollment 609 0
## repeated_grades 2131 1974
## parents_college_exp 117 114
## family_low_ses 721 683
## non_trad_family 664 627
## minority 721 683
## male 635 601
## immigrant_proxy 1033 967
## repeated_grades parents_college_exp family_low_ses
## stu_id 0 0 0
## strat_id 0 0 0
## f3qwt 0 0 0
## edu_expectations_met 2603 3167 2392
## ba_or_higher 2134 2886 2715
## immediate_ps_enrollment 2586 3492 3286
## repeated_grades 0 2926 2894
## parents_college_exp 160 0 0
## family_low_ses 903 775 0
## non_trad_family 826 694 0
## minority 903 775 0
## male 793 649 0
## immigrant_proxy 941 1236 514
## non_trad_family minority male immigrant_proxy
## stu_id 0 0 0 0
## strat_id 0 0 0 0
## f3qwt 0 0 0 0
## edu_expectations_met 2473 2392 2518 2300
## ba_or_higher 2739 2715 2755 2566
## immediate_ps_enrollment 3311 3286 3330 3109
## repeated_grades 2898 2894 2910 2471
## parents_college_exp 0 0 0 0
## family_low_ses 81 0 126 53
## non_trad_family 0 0 45 0
## minority 81 0 126 53
## male 0 0 0 0
## immigrant_proxy 542 514 587 0
##
## $mm
## stu_id strat_id f3qwt edu_expectations_met
## stu_id 0 0 0 0
## strat_id 0 0 0 0
## f3qwt 0 0 0 0
## edu_expectations_met 0 0 0 3345
## ba_or_higher 0 0 0 2624
## immediate_ps_enrollment 0 0 0 2662
## repeated_grades 0 0 0 742
## parents_college_exp 0 0 0 178
## family_low_ses 0 0 0 953
## non_trad_family 0 0 0 872
## minority 0 0 0 953
## male 0 0 0 827
## immigrant_proxy 0 0 0 1045
## ba_or_higher immediate_ps_enrollment
## stu_id 0 0
## strat_id 0 0
## f3qwt 0 0
## edu_expectations_met 2624 2662
## ba_or_higher 2947 2947
## immediate_ps_enrollment 2947 3556
## repeated_grades 813 970
## parents_college_exp 61 64
## family_low_ses 232 270
## non_trad_family 208 245
## minority 232 270
## male 192 226
## immigrant_proxy 381 447
## repeated_grades parents_college_exp family_low_ses
## stu_id 0 0 0
## strat_id 0 0 0
## f3qwt 0 0 0
## edu_expectations_met 742 178 953
## ba_or_higher 813 61 232
## immediate_ps_enrollment 970 64 270
## repeated_grades 2944 18 50
## parents_college_exp 18 178 178
## family_low_ses 50 178 953
## non_trad_family 46 178 872
## minority 50 178 953
## male 34 178 827
## immigrant_proxy 473 178 900
## non_trad_family minority male immigrant_proxy
## stu_id 0 0 0 0
## strat_id 0 0 0 0
## f3qwt 0 0 0 0
## edu_expectations_met 872 953 827 1045
## ba_or_higher 208 232 192 381
## immediate_ps_enrollment 245 270 226 447
## repeated_grades 46 50 34 473
## parents_college_exp 178 178 178 178
## family_low_ses 872 953 827 900
## non_trad_family 872 872 827 872
## minority 872 953 827 900
## male 827 827 827 827
## immigrant_proxy 872 900 827 1414
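In the md.pairs output, "r" stands for observed and "m" for missing: $rr counts cases where both variables are observed, $rm cases where the row variable is observed and the column variable is missing, $mr the reverse, and $mm cases where both are missing. For example, the $mr entry for ba_or_higher (row) and immediate_ps_enrollment (column) is 0, so every case missing ba_or_higher is also missing immediate_ps_enrollment. The sketch below shows how such counts can be computed in base R; the data frame `toy` is hypothetical, and this is an illustration of the idea rather than the mice source code.

```r
# Toy data frame (hypothetical values, not ELS:2002 data)
toy <- data.frame(
  x = c(1, NA, 0, 1, NA),
  y = c(NA, NA, 1, 0, 1)
)

obs <- (!is.na(toy)) * 1  # 1 where a value is observed, 0 where missing
mis <- is.na(toy) * 1     # 1 where a value is missing, 0 where observed

# crossprod(a, b) is t(a) %*% b, so entry [i, j] counts the rows where
# condition a holds for variable i and condition b holds for variable j
pairs_sketch <- list(
  rr = crossprod(obs, obs),  # both observed
  rm = crossprod(obs, mis),  # row observed, column missing
  mr = crossprod(mis, obs),  # row missing, column observed
  mm = crossprod(mis, mis)   # both missing
)

print(pairs_sketch)
```

For any pair of variables, the four entries sum to the number of rows, which is a quick consistency check on the tables above.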
Here, we will run a simple imputation by replacing missing values with the mode of the variable, since virtually all of our values are categorical.
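Mode <- function(x) {
  # Drop NAs first so a missing value can never be returned as the mode
  x <- x[!is.na(x)]
  ux <- unique(x)
  # Count occurrences of each unique value and return the most frequent one
  ux[which.max(tabulate(match(x, ux)))]
}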
mcv.parents_college_exp<-Mode(homework6$parents_college_exp)
mcv.male<-Mode(homework6$male)
mcv.non_trad_family<-Mode(homework6$non_trad_family)
mcv.family_low_ses<-Mode(homework6$family_low_ses)
mcv.minority<-Mode(homework6$minority)
mcv.immediate_ps_enrollment<-Mode(homework6$immediate_ps_enrollment)
mcv.ba_or_higher<-Mode(homework6$ba_or_higher)
mcv.repeated_grades<-Mode(homework6$repeated_grades)
Now that we have computed the modes for these variables, we can impute them:
homework6$parents_college_exp.imp<-ifelse(is.na(homework6$parents_college_exp), mcv.parents_college_exp, homework6$parents_college_exp)
homework6$male.imp<-ifelse(is.na(homework6$male), mcv.male, homework6$male)
homework6$non_trad_family.imp<-ifelse(is.na(homework6$non_trad_family), mcv.non_trad_family, homework6$non_trad_family)
homework6$family_low_ses.imp<-ifelse(is.na(homework6$family_low_ses), mcv.family_low_ses, homework6$family_low_ses)
homework6$minority.imp<-ifelse(is.na(homework6$minority), mcv.minority, homework6$minority)
homework6$immediate_ps_enrollment.imp<-ifelse(is.na(homework6$immediate_ps_enrollment), mcv.immediate_ps_enrollment, homework6$immediate_ps_enrollment)
homework6$ba_or_higher.imp<-ifelse(is.na(homework6$ba_or_higher), mcv.ba_or_higher, homework6$ba_or_higher)
homework6$repeated_grades.imp<-ifelse(is.na(homework6$repeated_grades), mcv.repeated_grades, homework6$repeated_grades)
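The same mode imputation can be written as a loop over variable names instead of one line per variable. This is a sketch only: the helper `mode_observed` and the data frame `toy` are hypothetical names introduced for illustration, and the values are made up.

```r
# Mode of the observed (non-missing) values of a vector
mode_observed <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy data frame (hypothetical values, not ELS:2002 data)
toy <- data.frame(
  male     = c(1, 0, NA, 1, 1),
  minority = c(0, NA, 0, 1, 0)
)

# For each variable, create a ".imp" copy with NAs replaced by the mode
for (v in names(toy)) {
  toy[[paste0(v, ".imp")]] <- ifelse(is.na(toy[[v]]),
                                     mode_observed(toy[[v]]),
                                     toy[[v]])
}

print(toy)
```

Keeping the original columns alongside the ".imp" copies, as the assignment does, makes it easy to check how many values were filled in.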
Since mode imputation ignores the relationships among variables and pushes every imputed value toward the majority category, we can try multiple imputation instead:
homework6.basic.imp<-mice(data=els[,c("stu_id","strat_id","f3qwt","edu_expectations_met","ba_or_higher","immediate_ps_enrollment","repeated_grades","parents_college_exp","family_low_ses","non_trad_family","minority","male")],seed=22)
##
## iter imp variable
## 1 1 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 1 2 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 1 3 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 1 4 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 1 5 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 2 1 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 2 2 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 2 3 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 2 4 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 2 5 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 3 1 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 3 2 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 3 3 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 3 4 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 3 5 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 4 1 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 4 2 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 4 3 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 4 4 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 4 5 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 5 1 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 5 2 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 5 3 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 5 4 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
## 5 5 edu_expectations_met ba_or_higher immediate_ps_enrollment repeated_grades parents_college_exp family_low_ses non_trad_family minority male
homework6.imp<-complete(homework6.basic.imp)
print(homework6.basic.imp)
## Multiply imputed data set
## Call:
## mice(data = els[, c("stu_id", "strat_id", "f3qwt", "edu_expectations_met",
## "ba_or_higher", "immediate_ps_enrollment", "repeated_grades",
## "parents_college_exp", "family_low_ses", "non_trad_family",
## "minority", "male")], seed = 22)
## Number of multiple imputations: 5
## Missing cells per column:
## stu_id strat_id f3qwt
## 0 0 0
## edu_expectations_met ba_or_higher immediate_ps_enrollment
## 3345 2947 3556
## repeated_grades parents_college_exp family_low_ses
## 2944 178 953
## non_trad_family minority male
## 872 953 827
## Imputation methods:
## stu_id strat_id f3qwt
## "" "" ""
## edu_expectations_met ba_or_higher immediate_ps_enrollment
## "pmm" "pmm" "pmm"
## repeated_grades parents_college_exp family_low_ses
## "pmm" "pmm" "pmm"
## non_trad_family minority male
## "pmm" "pmm" "pmm"
## VisitSequence:
## edu_expectations_met ba_or_higher immediate_ps_enrollment
## 4 5 6
## repeated_grades parents_college_exp family_low_ses
## 7 8 9
## non_trad_family minority male
## 10 11 12
## PredictorMatrix:
## stu_id strat_id f3qwt edu_expectations_met
## stu_id 0 0 0 0
## strat_id 0 0 0 0
## f3qwt 0 0 0 0
## edu_expectations_met 1 0 1 0
## ba_or_higher 1 0 1 1
## immediate_ps_enrollment 1 0 1 1
## repeated_grades 1 0 1 1
## parents_college_exp 1 0 1 1
## family_low_ses 1 0 1 1
## non_trad_family 1 0 1 1
## minority 1 0 1 1
## male 1 0 1 1
## ba_or_higher immediate_ps_enrollment
## stu_id 0 0
## strat_id 0 0
## f3qwt 0 0
## edu_expectations_met 1 1
## ba_or_higher 0 1
## immediate_ps_enrollment 1 0
## repeated_grades 1 1
## parents_college_exp 1 1
## family_low_ses 1 1
## non_trad_family 1 1
## minority 1 1
## male 1 1
## repeated_grades parents_college_exp family_low_ses
## stu_id 0 0 0
## strat_id 0 0 0
## f3qwt 0 0 0
## edu_expectations_met 1 1 1
## ba_or_higher 1 1 1
## immediate_ps_enrollment 1 1 1
## repeated_grades 0 1 1
## parents_college_exp 1 0 1
## family_low_ses 1 1 0
## non_trad_family 1 1 1
## minority 1 1 1
## male 1 1 1
## non_trad_family minority male
## stu_id 0 0 0
## strat_id 0 0 0
## f3qwt 0 0 0
## edu_expectations_met 1 1 1
## ba_or_higher 1 1 1
## immediate_ps_enrollment 1 1 1
## repeated_grades 1 1 1
## parents_college_exp 1 1 1
## family_low_ses 1 1 1
## non_trad_family 0 1 1
## minority 1 0 1
## male 1 1 0
## Random generator seed value: 22
library(dplyr)   # provides %>% and filter()
library(survey)  # provides svydesign() and svyglm()
# original model, missing data removed
homework6.original<-els[,c("stu_id","strat_id","f3qwt","edu_expectations_met","ba_or_higher","immediate_ps_enrollment","repeated_grades","parents_college_exp","family_low_ses","non_trad_family","minority","male")] %>% filter(complete.cases(.))
fit1.des<-svydesign(ids=~stu_id,strata=~strat_id,weights=~f3qwt,data=homework6.original)
fit.1<-svyglm(ba_or_higher~immediate_ps_enrollment+repeated_grades+parents_college_exp+family_low_ses+non_trad_family+minority+male,fit1.des,family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
# simple mode imputation
fit2.des<-svydesign(ids=~stu_id,strata=~strat_id,weights=~f3qwt,data=homework6)
fit.2<-svyglm(ba_or_higher.imp~immediate_ps_enrollment.imp+repeated_grades.imp+parents_college_exp.imp+family_low_ses.imp+non_trad_family.imp+minority.imp+male.imp,fit2.des,family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
# multiple imputation
fit3.des<-svydesign(ids=~stu_id,strata=~strat_id,weights=~f3qwt,data=homework6.imp)
fit.3<-svyglm(ba_or_higher~immediate_ps_enrollment+repeated_grades+parents_college_exp+family_low_ses+non_trad_family+minority+male,fit3.des,family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
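One caveat about the multiple imputation model: complete() with its default action returns only the first of the five imputed datasets, so fit.3 is based on a single imputation rather than on all five. The usual way to use all of them is to fit the model to each imputed dataset and combine the results with Rubin's rules, which mice supports through with() and pool() for ordinary model fits. The sketch below applies Rubin's rules by hand to one coefficient; the estimates and squared standard errors are hypothetical numbers, not output from this analysis.

```r
# Hypothetical estimates of one coefficient from m = 5 imputed datasets
est <- c(-0.70, -0.74, -0.68, -0.72, -0.71)
# Hypothetical squared standard errors of those estimates
se2 <- c(0.013, 0.012, 0.013, 0.014, 0.013)

m <- length(est)

pooled_est  <- mean(est)  # pooled point estimate
within_var  <- mean(se2)  # average within-imputation variance
between_var <- var(est)   # variance of the estimates across imputations

# Rubin's rules: total variance = within + (1 + 1/m) * between
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)

print(c(estimate = pooled_est, se = pooled_se))
```

The between-imputation term is what a single completed dataset cannot capture, which is why standard errors from one imputed dataset tend to be too small.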
Now we can examine the results ourselves:
library(stargazer)  # provides stargazer()
stargazer(fit.1,fit.2,fit.3,type="html",align=T,ci=T)
| | Dependent variable: | | |
| | ba_or_higher | ba_or_higher.imp | ba_or_higher |
| | (1) | (2) | (3) |
| immediate_ps_enrollment | 2.693*** | | 2.763*** |
| | (2.490, 2.896) | | (2.587, 2.938) |
| repeated_grades | -0.773*** | | -0.701*** |
| | (-1.026, -0.519) | | (-0.922, -0.479) |
| parents_college_exp | 1.513*** | | 1.344*** |
| | (1.223, 1.803) | | (1.107, 1.581) |
| family_low_ses | -0.619*** | | -0.693*** |
| | (-0.790, -0.449) | | (-0.841, -0.544) |
| non_trad_family | -0.449*** | | -0.394*** |
| | (-0.573, -0.326) | | (-0.502, -0.285) |
| minority | -0.416*** | | -0.417*** |
| | (-0.541, -0.292) | | (-0.527, -0.307) |
| male | -0.207*** | | -0.182*** |
| | (-0.322, -0.091) | | (-0.285, -0.079) |
| immediate_ps_enrollment.imp | | 2.749*** | |
| | | (2.571, 2.927) | |
| repeated_grades.imp | | -0.936*** | |
| | | (-1.168, -0.704) | |
| parents_college_exp.imp | | 1.393*** | |
| | | (1.157, 1.629) | |
| family_low_ses.imp | | -0.675*** | |
| | | (-0.825, -0.526) | |
| non_trad_family.imp | | -0.399*** | |
| | | (-0.508, -0.290) | |
| minority.imp | | -0.424*** | |
| | | (-0.534, -0.314) | |
| male.imp | | -0.129** | |
| | | (-0.231, -0.027) | |
| Constant | -3.530*** | -3.624*** | -3.490*** |
| | (-3.869, -3.192) | (-3.903, -3.344) | (-3.769, -3.210) |
| Observations | 10,012 | 16,197 | 16,197 |
| Log Likelihood | -4,551.771 | -7,323.200 | -7,162.345 |
| Akaike Inf. Crit. | 9,119.541 | 14,662.400 | 14,340.690 |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
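We see from this example that all three models give similar results: every coefficient has the same sign in all three models, is significant at the 5% level or better, and most estimates are close in size. The largest differences are for repeated_grades (-0.773 with complete cases, -0.936 with mode imputation, -0.701 with multiple imputation) and male (-0.207, -0.129 and -0.182). The original model (with all missing cases removed) has a lower AIC, but this does not indicate a better fit: AIC depends on the number of observations, and the original model is fit to 10,012 cases while the two imputed models are fit to 16,197, so the values are not comparable across these samples.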
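Because these are logistic regression coefficients, they are on the log-odds scale, and exponentiating them gives odds ratios that are easier to interpret. The sketch below does this for three coefficients and their confidence bounds copied from column (1) of the table; the object names are hypothetical.

```r
# Log-odds coefficients and 95% CI bounds copied from column (1) of the table
coefs <- c(immediate_ps_enrollment = 2.693,
           repeated_grades         = -0.773,
           parents_college_exp     = 1.513)
lower <- c(2.490, -1.026, 1.223)
upper <- c(2.896, -0.519, 1.803)

# Exponentiate to move from the log-odds scale to odds ratios
odds_ratios <- data.frame(
  or    = exp(coefs),
  lower = exp(lower),
  upper = exp(upper)
)

print(round(odds_ratios, 2))
```

An odds ratio above 1 means higher odds of holding a bachelor's degree or higher, and one below 1 means lower odds, holding the other predictors constant.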