Summary

The pooled results from models run on the 25 imputed datasets are not very different from those of the unimputed models. I am not aware of a canonical way to evaluate the quality of the imputation, so I calculated the average correlation among the imputed values in the 25 imputed datasets, and used ICC1k as a measure of agreement between the imputed values when the imputed variable was binary. If these are good indicators of imputation quality, we should be worried about using imputed data for school type.

Imputation quality

The correlations among imputed values across datasets were modest for standardized testing and somewhat lower for graduation rates, and agreement was very low for school type. The average correlation for imputed GPA in the full sample is shown for reference, even though that column will not be used in analyses.

Variable                    Full dataset   GPA dataset
GPA                         .31            -
Standardized test           .38            .44
4-year graduation rates     .16            .14
6-year graduation rates     .23            .22
Non-Title I public school   .01            .04
Title I public school       .01            .03
Private school              .01            .00
Homeschool                  .01            .04
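
One way to read the agreement measure reported above is as the average pairwise correlation between imputations: take the values imputed for the same missing cells in each of the m datasets and correlate every pair of datasets. The Python sketch below illustrates that reading on made-up numbers. The function names and the toy data are hypothetical, and this is not necessarily the exact calculation behind the table.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(imputations):
    """Average Pearson r over all pairs of imputed datasets.

    `imputations` holds one list per imputed dataset, each containing the
    values imputed for the same originally missing cells, in the same order.
    """
    pairs = list(combinations(imputations, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy data (made up): 3 imputations of the same 5 missing cells.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.5, 1.8, 3.2, 3.9, 5.4],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
print(round(mean_pairwise_correlation(toy), 2))
```

A value near 1 means the imputations agree closely on each missing cell; a value near 0, as for school type, means the imputed values vary almost independently from one imputed dataset to the next.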

Imputed models

term                        GPA sample          Full sample
                            grad4     grad6     grad4     grad6
Not controlling for institutional graduation rates
(Intercept)                 1.04**    1.05***   1.09***   1.10***
Self.Concordant.Motivation  1.00      1.00      1.00      1.00†
Goal                        1.00      1.01      1.00      1.00*
Leadership                  1.02**    1.02***   1.02***   1.02***
Learning                    1.01*     1.01**    1.01***   1.02***
Perseverance                .99†      1.01†     1.00      1.01**
Self.Transcendence          1.01†     1.02***   1.02***   1.03***
Teamwork                    1.00      1.00      1.01***   1.01***
stdtest                     1.05***   1.09***   1.11***   1.17***
gpa                         1.11***   1.16***   -         -
OSAn                        1.04***   1.08***   1.05***   1.10***
OSAt                        1.01      1.03***   1.00*     1.04***
OSAsport                    1.01**    1.02***   1.01***   1.02***
sexM                        .91***    .88***    .89***    .87***
parentdegreeOne             1.07***   1.12***   1.05***   1.10***
parentdegreeTwo             1.08***   1.17***   1.04***   1.14***
parentmarriedOther          .94***    .88***    .93***    .89***
firstlanguageOther          .96**     .86***    .99       .89***
raceBlack                   .90***    .91***    .86***    .89***
raceLatino                  .92***    .97       .89***    .95***
raceAsian                   .93***    .90***    .90***    .89***
raceachieve                 .93***    .90***    .90***    .90***
raceMissing                 .98       .94***    1.00      .94***
hstypehs_pubT1              .97*      .95***    .94***    .94***
hstypehs_priv               1.02      .99       1.00      .92***
hstypehs_home               .89†      .87*      .79***    .72***
Controlling for institutional graduation rates
(Intercept)                 1.08***   1.12***   1.13***   1.15***
Self.Concordant.Motivation  1.00      1.00      1.00      1.00
Goal                        1.00      1.00      1.00†     1.00
Leadership                  1.02**    1.02***   1.01***   1.02***
Learning                    1.01*     1.01**    1.01***   1.02***
Perseverance                .99*      1.00      1.00      1.00
Self.Transcendence          1.01      1.02***   1.02***   1.02***
Teamwork                    1.00      1.00      1.01***   1.01***
stdtest                     .98*      1.01      1.02***   1.06***
gpa                         1.09***   1.13***   -         -
grad4rates                  1.22***   -         1.24***   -
OSAn                        1.01*     1.05***   1.02***   1.07***
OSAt                        1.00      1.02***   1.00*     1.03***
OSAsport                    1.01      1.02***   1.00      1.01***
sexM                        .91***    .88***    .89***    .87***
parentdegreeOne             1.07***   1.10***   1.05***   1.09***
parentdegreeTwo             1.06***   1.13***   1.01      1.09***
parentmarriedOther          .94***    .89***    .94***    .90***
firstlanguageOther          .94***    .84***    .97***    .86***
raceBlack                   .87***    .88***    .85***    .87***
raceLatino                  .90***    .94**     .89***    .93***
raceAsian                   .92***    .88***    .91***    .87***
raceachieve                 .93***    .88***    .90***    .89***
raceMissing                 .97*      .93***    .99†      .94***
hstypehs_pubT1              .97*      .95***    .94***    .94***
hstypehs_priv               .97**     .95***    .96***    .90***
hstypehs_home               .90†      .89*      .81***    .75***
grad6rates                  -         1.26***   -         1.28***

Important notes on procedure

Imputation was done with pmm (predictive mean matching) for continuous variables and polyreg (polytomous logistic regression) for unordered factors (i.e., school type).
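
As a rough illustration of what pmm does, the sketch below imputes one continuous variable from a single predictor: it fits a regression on the observed cases, then fills each missing value with the observed value of a randomly chosen donor whose predicted value is close. It is a simplified, hypothetical Python version (one predictor, no draw of the regression parameters), not the mice implementation. The names fit_line and pmm_impute and the toy data are assumptions made for the example.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def pmm_impute(x, y, donors=5, rng=None):
    """Fill the None entries of y by simplified predictive mean matching."""
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(y) if v is not None]
    intercept, slope = fit_line([x[i] for i in observed],
                                [y[i] for i in observed])
    predicted = [intercept + slope * xi for xi in x]
    result = list(y)
    for i, value in enumerate(y):
        if value is None:
            # Donors: observed cases whose predictions are closest to case i.
            nearest = sorted(observed,
                             key=lambda j: abs(predicted[j] - predicted[i]))
            # Copy the observed value of one randomly chosen donor.
            result[i] = y[rng.choice(nearest[:donors])]
    return result


# Toy data (made up): two missing outcomes.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, None, 2.9, 4.2, None, 6.1]
print(pmm_impute(x, y, donors=2, rng=random.Random(0)))
```

Because every imputed value is copied from an observed case, this kind of matching never produces values outside the observed range of the variable.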

The models above were fit independently in each of the m = 25 imputed datasets. The results were then pooled according to Rubin’s rules (Rubin, 1987, p. 76). Note: I learned that by calling complete(mice(incomplete_dataset)) I was using only the first imputation and discarding all the rest, because complete() returns a single imputed dataset by default.
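
For reference, Rubin's rules combine the m per-dataset estimates by averaging them, and combine their uncertainty by adding the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). A minimal Python sketch of that arithmetic for a single coefficient follows. The function name pool_rubin and the toy numbers are hypothetical, and the degrees-of-freedom adjustment is left out.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    u_bar = mean([se ** 2 for se in std_errors])  # within-imputation variance
    b = variance(estimates)                      # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b              # total variance
    return q_bar, sqrt(total)


# Toy numbers (made up): one coefficient estimated in 3 imputed datasets.
estimate, se = pool_rubin([0.10, 0.12, 0.08], [0.02, 0.02, 0.03])
print(estimate, se)
```

In mice this step is handled by pool() applied to the fits returned by with(), so the sketch is only meant to make the pooling arithmetic explicit.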

The imputation was run separately in the full sample (N = 307,254) and the GPA sample (N = 43,667). One remaining question is why we should run these separately. I would think that in doing so we disregard some of the information in the dataset that could inform the imputation.