This project is a secondary analysis of NHANES data to characterize and model demographic and dietary factors that predict undiagnosed type 2 diabetes mellitus among United States adults.
Type 2 diabetes mellitus (T2D) is an endocrine disorder characterized by the body’s inability or reduced ability to metabolize carbohydrates due to impaired insulin response. The prevalence of T2D has increased since the 1990s from around 7% to 12.3% in 2011-2014, mirroring the trend in rising obesity rates (1). Undiagnosed diabetes mellitus (UDM) occurs in the asymptomatic phase of type 2 diabetes. UDM can lead to serious ocular, renal, and cardiovascular complications that develop before treatment begins. In addition to health complications, each case of UDM was estimated to cost $4,030 in 2012, contributing to the total $322 billion cost of diagnosed, undiagnosed, gestational, and prediabetes in the United States at that time (2). In 2017, the CDC estimated there were 7.2 million cases of UDM (3). Effective management of the condition can help alleviate both direct and indirect costs and reduce associated comorbidity. The ability to diagnose diabetes among the undiagnosed is an important step towards reducing the national burden of diabetes.
NHANES is a nationally representative survey designed to assess the nutritional status of the United States civilian, noninstitutionalized population (4). It is comprehensive, including physical examinations, laboratory analyses, questionnaires, and demographic information, and representativeness of the population, including minority and underrepresented groups, is ensured through weighting and sampling methods (5). NHANES provides a broad database of factors that could affect health in general and diabetes specifically, including dietary recalls and HbA1c laboratory results. Because type 2 diabetes and UDM are affected by a multitude of lifestyle and medical factors, including diet, genetics, and demographics, NHANES is a useful tool for exploring the relationships among these factors. Kavakiotis et al. (6), in their systematic review of machine learning and data mining techniques in diabetes research, describe the hope that machine learning on NHANES and similar data sources could be linked to decision-support tools for the diagnosis and treatment of diabetes. Their 2017 review included one analysis of NHANES data, by Lee and Giraud-Carrier (2013), in which the researchers applied association rule mining and clustering algorithms to explore relationships between responses to health questionnaires and diabetes and hypertension. Given the burden of UDM in the United States, this project will explore factors within the NHANES dataset that predict UDM, as assessed by HbA1c level, to create a low-cost way to identify those at risk.
This project applies principles of biomedical informatics to address the population-level health issue of UDM. It uses machine learning techniques to explore the integration of medical and social determinants of health made possible by the comprehensive NHANES database. This topic has the potential to inform clinical practitioners and regional and national policy makers about where to focus resources. I have met with Dr. Christina Roberto, Dr. Laura Gibson, and Helen Yan. These advisors recommended narrowing my topic to a more specific issue, framing the analysis as exploratory because cross-sectional data are weak for inferring causality, and using imputation to fill in missing data.
Sample: restricted to nonpregnant adults aged 20-79 with no past diagnosis of diabetes by a health professional
NHANES modules: Demographics, Medical Screener Questionnaire, Examination, Laboratory, Dietary
nutrient intake day 2 (DR2TOT)
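library(haven) #read_xpt()
library(dplyr) #joins, select(), mutate(), filter(), %>%
library(forcats) #fct_explicit_na()
library(weights) #wpct()
nhanesDemo14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.XPT")) #demographic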
nhanesDIQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DIQ_H.XPT")) #diabetes
nhanesMSQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/MCQ_H.XPT")) #medical condition screener
nhanesALQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/ALQ_H.XPT")) #alcohol use
nhanesSMQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/SMQ_H.XPT")) #smoking
nhanesDBQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DBQ_H.XPT")) #dietary behavior
nhanesHIQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/HIQ_H.XPT")) #health insurance
nhanesFSQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/FSQ_H.XPT")) #food security
nhanesExam14Body <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/BMX_H.XPT")) #body measurements
nhanesExam14BP <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/BPX_H.XPT")) #blood pressure
nhanesLab14HBA1C <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/GHB_H.XPT")) #HbA1c
nhanesNutrient141 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DR1TOT_H.XPT")) #nutrient data day 1
nhanesNutrient142 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DR2TOT_H.XPT")) #nutrient data day 2
nhanesData14 <- inner_join(nhanesLab14HBA1C, nhanesDemo14, by= "SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesDIQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesMSQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesALQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesSMQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesDBQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesHIQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesFSQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesExam14Body, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesExam14BP, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesNutrient141, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesNutrient142, by="SEQN")
#nhanesData14 <- nhanesData14[!duplicated(nhanesData14$SEQN), ] #removing duplicate IDs created by joining
#rm(list = c("nhanesData14"))
nhanesDemo12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DEMO_G.XPT")) #demographic
nhanesDIQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DIQ_G.XPT")) #diabetes
nhanesMSQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/MCQ_G.XPT")) #medical condition screener
nhanesALQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/ALQ_G.XPT")) #alcohol use
nhanesSMQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/SMQ_G.XPT")) #smoking
nhanesDBQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DBQ_G.XPT")) #dietary behavior
nhanesHIQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/HIQ_G.XPT")) #health insurance
nhanesFSQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/FSQ_G.XPT")) #food security
nhanesExam12Body <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/BMX_G.XPT")) #body measurements
nhanesExam12BP <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/BPX_G.XPT")) #blood pressure
nhanesLab12HBA1C <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/GHB_G.XPT")) #HbA1c
nhanesNutrient121 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DR1TOT_G.XPT")) #nutrient data day 1
nhanesNutrient122 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DR2TOT_G.XPT")) #nutrient data day 2
nhanesData12 <- inner_join(nhanesLab12HBA1C, nhanesDemo12, by= "SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesDIQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesMSQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesALQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesSMQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesDBQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesHIQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesFSQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesExam12Body, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesExam12BP, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesNutrient121, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesNutrient122, by="SEQN")
#nhanesData12 <- nhanesData12[!duplicated(nhanesData12$SEQN), ] #removing duplicate IDs created by joining
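The repeated joins above can be written more compactly by folding a list of module tables over a join on `SEQN`. The sketch below is a minimal illustration on small made-up data frames (the objects `demo`, `lab`, and `diet` and their values are hypothetical, not NHANES data), and it uses base R `merge()` rather than `inner_join()`. It also shows why the choice of join matters: an inner join keeps only participants present in every module, whereas `all.x = TRUE` keeps everyone in the first table and fills the gaps with `NA`.

```r
# Hypothetical toy tables standing in for NHANES modules keyed by SEQN
demo <- data.frame(SEQN = 1:5, age = c(25, 40, 61, 33, 70))
lab  <- data.frame(SEQN = c(1, 2, 3, 5), hba1c = c(5.4, 6.7, 5.9, 5.1))
diet <- data.frame(SEQN = c(1, 2, 5), kcal = c(2100, 1800, 2500))

modules <- list(demo, lab, diet)

# Inner join across all modules: only SEQNs present in every table survive
innerAll <- Reduce(function(x, y) merge(x, y, by = "SEQN"), modules)

# Left join: keep every row of the first table, fill missing modules with NA
leftAll <- Reduce(function(x, y) merge(x, y, by = "SEQN", all.x = TRUE), modules)

nrow(innerAll)  # participants with complete module coverage
nrow(leftAll)   # all participants in the first table
```

Comparing the two row counts is a quick way to see how many participants an inner join drops before any explicit filtering.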
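#rm(list = c("nhanesData14Subset"))
#rm(list = c("nhanesDataFull"))
#Assumption: nhanesData is the two cycles stacked; its definition is not shown in this report
nhanesData <- bind_rows(nhanesData12, nhanesData14)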
nhanesDataFull <- nhanesData %>%
select(id=SEQN,
weightQ2Yr= WTINT2YR,
weightE2Yr= WTMEC2YR,
hba1cLevel=LBXGH,
age=RIDAGEYR,
gender=RIAGENDR,
race=RIDRETH3,
education=DMDEDUC2,
marital=DMDMARTL,
PIR=INDFMPIR,
pregnant=RIDEXPRG,
#diabetes
diagnosed=DIQ010,
famHistory=DIQ175A,
#med conditions
controlWeight=MCQ370A,
increaseExercise=MCQ370B,
#alcohol
alcohol=ALQ120Q,
#smoking
smoke=SMQ040,
#dietary behavior
diet=DBQ700,
#health insurance
insurance=HIQ011,
#food security
snapCurrent=FSD230,
foodSecure=FSDAD,
#body measurement
bmi=BMXBMI,
#blood pressure,
systolic=BPXSY1,
diastolic=BPXDI1) %>%
mutate(totKcal1=nhanesData$DR1TKCAL,
totKcal2= nhanesData$DR2TKCAL,
totProt1=nhanesData$DR1TPROT,
totProt2=nhanesData$DR2TPROT,
totCarb1=nhanesData$DR1TCARB,
totCarb2=nhanesData$DR2TCARB,
totSugr1=nhanesData$DR1TSUGR,
totSugr2=nhanesData$DR2TSUGR,
totFibe1=nhanesData$DR1TFIBE,
totFibe2=nhanesData$DR2TFIBE,
totFat1=nhanesData$DR1TTFAT,
totFat2=nhanesData$DR2TTFAT)
nhanesDataFull <- nhanesDataFull %>%
filter(age >19) %>%
filter(age <80)
nhanesDataFull <- nhanesDataFull %>%
filter(pregnant != 1 | is.na(pregnant)) #keep respondents who are not pregnant or have no pregnancy status recorded
nhanesDataFull <- nhanesDataFull %>%
filter(alcohol<700) #filtering less than 700 because 777 and 999 are refused and don't know; filter() also drops rows where alcohol is NA
nhanesDataFull <- nhanesDataFull %>%
mutate(ageCut=cut(age, c(20,30,40,50,60,70,81))) %>% #default right=TRUE gives (20,30], so age 20 is coded NA
mutate(gender=factor(gender, levels=c(1, 2), labels=c("male", "female"))) %>%
mutate(race=factor(race, levels=c(3, 4, 1, 6, 2, 7), labels=c("white", "black", "MexA", "Asian", "Hispanic", "other"))) %>%
mutate(marital=factor(marital, levels=c(1,2,3,4,5,6,77,99), labels=c("partnered", "notPartnered", "notPartnered", "notPartnered","notPartnered", "partnered", "notPartnered", "notPartnered"))) %>%
mutate(education=factor(education, levels=c(1,2,3,4,5), labels=c("lesshigh", "lesshigh", "high", "somecollege", "college"))) %>%
mutate(bmiCut=cut(bmi, c(0, 25, 30, 100), labels=c("0-25", "25-30", "30+"))) %>%
mutate(controlWeight=factor(controlWeight, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
mutate(increaseExercise=factor(increaseExercise, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
mutate(smoke=factor(smoke, levels=c(1,2,3,7,9), labels=c("yes", "yes", "no", "no", "no"))) %>%
mutate(diet=factor(diet, levels=c(1,2,3,4,5), labels=c("excellent", "verygood", "good", "fair", "poor"))) %>%
mutate(insurance=factor(insurance, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
mutate(foodSecure=factor(foodSecure, levels=c(1,2,3,4), labels=c("full", "marginal", "low", "verylow"))) %>%
mutate(snapCurrent=factor(snapCurrent, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
mutate(PIR=cut(PIR, c(0,1,1.33, 1.5, 1.85, 5.1), labels=c("0-1", "1-1.32", "1.33-1.49", "1.5-1.84", "1.85-5.0"))) %>%
mutate(diagnosedCode = factor(diagnosed, levels=c(1,2,3), labels=c("yes", "no", "borderline")))
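One detail of `cut()` is worth checking in the recoding above: by default the intervals are closed on the right, so a value equal to the lowest break falls outside the first interval and is returned as `NA`. The sketch below uses a few made-up ages to show the default behaviour next to `right = FALSE`. Whether the left-closed version is what the analysis intends is an assumption.

```r
ages   <- c(20, 29, 30, 79)            # hypothetical ages, not NHANES data
breaks <- c(20, 30, 40, 50, 60, 70, 81)

# Default (right = TRUE): intervals are (20,30], (30,40], ...; age 20 becomes NA
defaultCut <- cut(ages, breaks)

# right = FALSE: intervals are [20,30), [30,40), ...; age 20 is kept
leftClosedCut <- cut(ages, breaks, right = FALSE)

data.frame(ages, defaultCut, leftClosedCut)
```

The same boundary behaviour applies to the `PIR` breaks, where a ratio of exactly 0 would fall outside the first interval under the default.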
nhanesDataFull <- nhanesDataFull %>%
mutate(hba1cCutoff = ifelse(hba1cLevel<6.5, 1, 0)) #here 1 = HbA1c below 6.5 and 0 = 6.5 or higher; the coding is reversed after imputation
nhanesDataFull <- subset(nhanesDataFull, select=-c(pregnant, diagnosed)) #dropping pregnant and diagnosed because only used to filter
#nhanesDataFull <- subset(nhanesDataFull, select=-c(weightQ2Yr)) #dropping the weights until I can figure out how to apply them to the models
#nhanesDataFull <- subset(nhanesDataFull, select=-c(weightE2Yr))
wpct(nhanesDataFull$diagnosedCode, nhanesDataFull$hba1cCutoff)
## yes no borderline
## 0.04424911 0.93384982 0.02190107
## $x
## [1] "yes" "no" "borderline"
##
## $sum.of.weights
## [1] 32331984 308093564 8507146
## yes no borderline
## 0.09265966 0.88295986 0.02438048
12% of the sample (unweighted) had a diagnosis of diabetes. Of the 85% with no diagnosis, 2% had HbA1c levels of 6.5% or higher, which corresponds to about 6.5 million people.
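Weighted percentages like those above are each group's share of the summed survey weights. The sketch below computes that in base R on made-up values (the vectors `diagnosis` and `weight2yr` are hypothetical). It also halves the 2-year weights before summing, which follows the NCHS guidance for pooling two survey cycles; that step is an assumption about how population totals would be scaled and is not taken from the code in this report. Proportions are unchanged by the rescaling, but weighted totals are halved.

```r
# Hypothetical example values, not NHANES data
diagnosis <- factor(c("yes", "no", "no", "borderline", "no"),
                    levels = c("yes", "no", "borderline"))
weight2yr <- c(1000, 3000, 2500, 500, 3000)

# Assumption: two 2-year cycles are pooled, so each weight is divided by 2
weight4yr <- weight2yr / 2

# Weighted total per category, then each category's share of the total
weightedTotals <- tapply(weight4yr, diagnosis, sum)
weightedShare  <- weightedTotals / sum(weightedTotals)

round(weightedShare, 3)
```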
## # A tibble: 6 x 38
## id weightQ2Yr weightE2Yr hba1cLevel age gender race education marital
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <fct> <fct> <fct>
## 1 62169 14392. 14784. 5.4 21 male Asian high notPar~
## 2 62172 26961. 27123. 5.6 43 female black high notPar~
## 3 62176 53831. 54203. 5.4 34 female white college partne~
## 4 62179 16590. 17115. 5.7 55 male Asian college partne~
## 5 62180 20458. 22616. 5.3 35 male white college partne~
## 6 62184 15601. 15236. 4.5 26 male black high notPar~
## # ... with 29 more variables: PIR <fct>, famHistory <dbl>, controlWeight <fct>,
## # increaseExercise <fct>, alcohol <dbl>, smoke <fct>, diet <fct>,
## # insurance <fct>, snapCurrent <fct>, foodSecure <fct>, bmi <dbl>,
## # systolic <dbl>, diastolic <dbl>, totKcal1 <dbl>, totKcal2 <dbl>,
## # totProt1 <dbl>, totProt2 <dbl>, totCarb1 <dbl>, totCarb2 <dbl>,
## # totSugr1 <dbl>, totSugr2 <dbl>, totFibe1 <dbl>, totFibe2 <dbl>,
## # totFat1 <dbl>, totFat2 <dbl>, ageCut <fct>, bmiCut <fct>,
## # diagnosedCode <fct>, hba1cCutoff <dbl>
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 31.00 44.00 45.22 59.00 79.00
## Warning: Unknown or uninitialised column: 'agecut'.
## < table of extent 0 >
##
## male female
## 0 91 71
## 1 3323 2945
##
## white black MexA Asian Hispanic other
## 0 33 56 24 22 22 5
## 1 2748 1348 722 651 593 206
##
## lesshigh high somecollege college
## 0 47 43 46 26
## 1 1144 1332 2025 1767
##
## partnered notPartnered
## 0 95 67
## 1 3653 2615
##
## 0-1 1-1.32 1.33-1.49 1.5-1.84 1.85-5.0
## 0 34 28 10 10 67
## 1 1240 644 205 403 3311
##
## yes no
## 0 112 50
## 1 3746 2522
##
## yes no
## 0 94 68
## 1 3720 2548
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.000 2.000 4.039 4.000 365.000
##
## yes no
## 0 37 50
## 1 1527 1438
##
## excellent verygood good fair poor
## 0 7 22 77 45 10
## 1 538 1329 2675 1386 339
##
## yes no
## 0 116 46
## 1 4707 1561
##
## yes no
## 0 30 2
## 1 1207 176
##
## full marginal low verylow
## 0 100 21 19 19
## 1 4357 693 658 523
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 13.60 23.80 27.40 28.53 31.80 82.90 47
##
## 0-25 25-30 30+
## 0 15 43 102
## 1 2122 2058 2046
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 66.0 110.0 118.0 121.5 130.0 230.0 433
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 64.0 72.0 71.3 78.0 122.0 433
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 1538 2050 2240 2726 13687 270
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 54.70 77.86 85.43 106.06 557.87 270
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 178.9 246.0 268.8 330.5 1815.0 270
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 63.3 102.3 119.7 153.8 1048.5 270
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 9.60 14.80 17.27 22.40 107.00 270
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 50.98 74.85 84.05 106.72 553.79 270
##
## 0 1
## 162 6268
#percent missing for each variable
p <- function(x) {sum(is.na(x))/length(x)*100}
pmissing <- apply(nhanesDataFull,2,p)
pmissing
## id weightQ2Yr weightE2Yr hba1cLevel
## 0.00000000 0.00000000 0.00000000 3.06045530
## age gender race education
## 0.00000000 0.00000000 0.00000000 0.01507613
## marital PIR famHistory controlWeight
## 0.00000000 7.55314337 78.29036635 0.00000000
## increaseExercise alcohol smoke diet
## 0.00000000 0.00000000 52.25388210 0.03015227
## insurance snapCurrent foodSecure bmi
## 0.00000000 77.74762551 0.61812151 0.70857832
## systolic diastolic totKcal1 totKcal2
## 6.52796623 6.52796623 4.07055631 14.39770843
## totProt1 totProt2 totCarb1 totCarb2
## 4.07055631 14.39770843 4.07055631 14.39770843
## totSugr1 totSugr2 totFibe1 totFibe2
## 4.07055631 14.39770843 4.07055631 14.39770843
## totFat1 totFat2 ageCut bmiCut
## 4.07055631 14.39770843 2.27649631 0.70857832
## diagnosedCode hba1cCutoff
## 0.00000000 3.06045530
#About 78% missing from snapCurrent, so changing missing to "no" because missing values are respondents filtered out by the previous question asking if they received SNAP benefits in the past 12 months. 12 mo not used because vars are different between 11-12 and 13-14. Prior question was "ever received" and people who said no were excluded from "currently receive".
nhanesDataFull$snapCurrent <- fct_explicit_na(nhanesDataFull$snapCurrent, na_level = "no")
#About 52% of smoking missing, but coding missing to "no" because the prior question asks "smoked at least 100 cigarettes in life" and if the answer is no/ref/dk, the respondent skips to the end of the questionnaire. Missing values for this question, "do you now smoke cigarettes", are skipped questions due to no cigarette use.
nhanesDataFull$smoke <- fct_explicit_na(nhanesDataFull$smoke, na_level = "no")
pmissing <- apply(nhanesDataFull,2,p)
pmissing
## id weightQ2Yr weightE2Yr hba1cLevel
## 0.00000000 0.00000000 0.00000000 3.06045530
## age gender race education
## 0.00000000 0.00000000 0.00000000 0.01507613
## marital PIR famHistory controlWeight
## 0.00000000 7.55314337 78.29036635 0.00000000
## increaseExercise alcohol smoke diet
## 0.00000000 0.00000000 0.00000000 0.03015227
## insurance snapCurrent foodSecure bmi
## 0.00000000 0.00000000 0.61812151 0.70857832
## systolic diastolic totKcal1 totKcal2
## 6.52796623 6.52796623 4.07055631 14.39770843
## totProt1 totProt2 totCarb1 totCarb2
## 4.07055631 14.39770843 4.07055631 14.39770843
## totSugr1 totSugr2 totFibe1 totFibe2
## 4.07055631 14.39770843 4.07055631 14.39770843
## totFat1 totFat2 ageCut bmiCut
## 4.07055631 14.39770843 2.27649631 0.70857832
## diagnosedCode hba1cCutoff
## 0.00000000 3.06045530
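The missingness check and the skip-pattern recode above can be illustrated on a small made-up data frame. The sketch uses only base R: `colMeans(is.na(...))` for the per-column percentage, and direct assignment to turn `NA` into an existing factor level. The column names and values are hypothetical, and treating a skipped follow-up question as "no" rests on the questionnaire routing described in the comments above.

```r
# Hypothetical toy data: smoke is NA when the follow-up question was skipped
toy <- data.frame(
  bmi   = c(22.1, NA, 31.4, 27.0),
  smoke = factor(c("yes", NA, NA, "no"), levels = c("yes", "no"))
)

# Percent missing per column
pctMissing <- colMeans(is.na(toy)) * 100

# Recode skipped (NA) responses to the existing "no" level
toy$smoke[is.na(toy$smoke)] <- "no"

pctMissing
table(toy$smoke)
```

Recomputing the percentages after the recode should show the recoded column at zero while genuinely missing measurements such as `bmi` are left for imputation.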
## id weightQ2Yr weightE2Yr age gender race marital controlWeight
## 1057 1 1 1 1 1 1 1 1
## 3669 1 1 1 1 1 1 1 1
## 94 1 1 1 1 1 1 1 1
## 427 1 1 1 1 1 1 1 1
## 68 1 1 1 1 1 1 1 1
## 261 1 1 1 1 1 1 1 1
## 10 1 1 1 1 1 1 1 1
## 30 1 1 1 1 1 1 1 1
## 66 1 1 1 1 1 1 1 1
## 225 1 1 1 1 1 1 1 1
## 8 1 1 1 1 1 1 1 1
## 26 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 35 1 1 1 1 1 1 1 1
## 140 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 32 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 20 1 1 1 1 1 1 1 1
## 88 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 33 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 9 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 8 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 23 1 1 1 1 1 1 1 1
## 75 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 15 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 20 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 0 0 0 0 0 0 0 0
## increaseExercise alcohol smoke insurance snapCurrent diagnosedCode
## 1057 1 1 1 1 1 1
## 3669 1 1 1 1 1 1
## 94 1 1 1 1 1 1
## 427 1 1 1 1 1 1
## 68 1 1 1 1 1 1
## 261 1 1 1 1 1 1
## 10 1 1 1 1 1 1
## 30 1 1 1 1 1 1
## 66 1 1 1 1 1 1
## 225 1 1 1 1 1 1
## 8 1 1 1 1 1 1
## 26 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 19 1 1 1 1 1 1
## 4 1 1 1 1 1 1
## 35 1 1 1 1 1 1
## 140 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 19 1 1 1 1 1 1
## 5 1 1 1 1 1 1
## 32 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1
## 20 1 1 1 1 1 1
## 88 1 1 1 1 1 1
## 5 1 1 1 1 1 1
## 33 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 9 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 8 1 1 1 1 1 1
## 4 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 23 1 1 1 1 1 1
## 75 1 1 1 1 1 1
## 6 1 1 1 1 1 1
## 15 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 6 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1
## 20 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 5 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 19 1 1 1 1 1 1
## 3 1 1 1 1 1 1
## 6 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1
## 0 0 0 0 0 0
## education diet foodSecure bmi bmiCut ageCut hba1cLevel hba1cCutoff
## 1057 1 1 1 1 1 1 1 1
## 3669 1 1 1 1 1 1 1 1
## 94 1 1 1 1 1 1 1 1
## 427 1 1 1 1 1 1 1 1
## 68 1 1 1 1 1 1 1 1
## 261 1 1 1 1 1 1 1 1
## 10 1 1 1 1 1 1 1 1
## 30 1 1 1 1 1 1 1 1
## 66 1 1 1 1 1 1 1 1
## 225 1 1 1 1 1 1 1 1
## 8 1 1 1 1 1 1 1 1
## 26 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 35 1 1 1 1 1 1 1 1
## 140 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 32 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1
## 20 1 1 1 1 1 1 0 0
## 88 1 1 1 1 1 1 0 0
## 5 1 1 1 1 1 1 0 0
## 33 1 1 1 1 1 1 0 0
## 3 1 1 1 1 1 1 0 0
## 9 1 1 1 1 1 1 0 0
## 3 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 3 1 1 1 1 1 1 0 0
## 8 1 1 1 1 1 1 0 0
## 4 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 4 1 1 1 1 1 1 0 0
## 3 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 23 1 1 1 1 1 0 1 1
## 75 1 1 1 1 1 0 1 1
## 6 1 1 1 1 1 0 1 1
## 15 1 1 1 1 1 0 1 1
## 2 1 1 1 1 1 0 1 1
## 6 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 0 1 1
## 5 1 1 1 1 1 0 1 1
## 2 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 0 1 1
## 1 1 1 1 1 1 0 0 0
## 3 1 1 1 1 1 0 0 0
## 1 1 1 1 1 1 0 0 0
## 3 1 1 1 1 1 0 0 0
## 1 1 1 1 1 1 0 0 0
## 1 1 1 1 1 1 0 0 0
## 5 1 1 1 0 0 1 1 1
## 20 1 1 1 0 0 1 1 1
## 2 1 1 1 0 0 1 1 1
## 2 1 1 1 0 0 1 1 1
## 3 1 1 1 0 0 1 1 1
## 1 1 1 1 0 0 1 1 1
## 1 1 1 1 0 0 1 1 1
## 2 1 1 1 0 0 1 1 1
## 5 1 1 1 0 0 1 1 1
## 1 1 1 1 0 0 1 1 1
## 2 1 1 1 0 0 1 1 1
## 1 1 1 1 0 0 1 0 0
## 1 1 1 1 0 0 1 0 0
## 1 1 1 1 0 0 1 0 0
## 3 1 1 0 1 1 1 1 1
## 19 1 1 0 1 1 1 1 1
## 3 1 1 0 1 1 1 1 1
## 6 1 1 0 1 1 1 1 1
## 2 1 1 0 1 1 1 1 1
## 1 1 1 0 1 1 1 1 1
## 1 1 1 0 1 1 1 1 1
## 2 1 1 0 1 1 1 1 1
## 1 1 1 0 1 1 1 0 0
## 1 1 1 0 1 1 0 1 1
## 1 1 1 0 1 1 0 1 1
## 1 1 1 0 1 1 0 1 1
## 1 1 0 1 1 1 1 1 1
## 1 1 0 1 1 1 1 1 1
## 1 0 1 1 1 1 1 0 0
## 1 2 41 47 47 151 203 203
## totKcal1 totProt1 totCarb1 totSugr1 totFibe1 totFat1 systolic diastolic
## 1057 1 1 1 1 1 1 1 1
## 3669 1 1 1 1 1 1 1 1
## 94 1 1 1 1 1 1 1 1
## 427 1 1 1 1 1 1 1 1
## 68 1 1 1 1 1 1 1 1
## 261 1 1 1 1 1 1 1 1
## 10 1 1 1 1 1 1 1 1
## 30 1 1 1 1 1 1 1 1
## 66 1 1 1 1 1 1 0 0
## 225 1 1 1 1 1 1 0 0
## 8 1 1 1 1 1 1 0 0
## 26 1 1 1 1 1 1 0 0
## 3 1 1 1 1 1 1 0 0
## 19 1 1 1 1 1 1 0 0
## 4 1 1 1 1 1 1 0 0
## 35 0 0 0 0 0 0 1 1
## 140 0 0 0 0 0 0 1 1
## 3 0 0 0 0 0 0 1 1
## 19 0 0 0 0 0 0 1 1
## 5 0 0 0 0 0 0 0 0
## 32 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0
## 20 1 1 1 1 1 1 1 1
## 88 1 1 1 1 1 1 1 1
## 5 1 1 1 1 1 1 1 1
## 33 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 9 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 0 0
## 8 1 1 1 1 1 1 0 0
## 4 1 1 1 1 1 1 0 0
## 1 0 0 0 0 0 0 1 1
## 4 0 0 0 0 0 0 1 1
## 3 0 0 0 0 0 0 1 1
## 1 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0 0
## 1 0 0 0 0 0 0 0 0
## 23 1 1 1 1 1 1 1 1
## 75 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 15 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 0 0
## 5 1 1 1 1 1 1 0 0
## 2 1 1 1 1 1 1 0 0
## 1 0 0 0 0 0 0 1 1
## 1 0 0 0 0 0 0 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 0 0 0 0 0 0 1 1
## 5 1 1 1 1 1 1 1 1
## 20 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 0 0
## 2 0 0 0 0 0 0 1 1
## 5 0 0 0 0 0 0 1 1
## 1 0 0 0 0 0 0 1 1
## 2 0 0 0 0 0 0 0 0
## 1 1 1 1 1 1 1 1 1
## 1 0 0 0 0 0 0 1 1
## 1 0 0 0 0 0 0 0 0
## 3 1 1 1 1 1 1 1 1
## 19 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 0 0
## 1 0 0 0 0 0 0 1 1
## 1 0 0 0 0 0 0 1 1
## 2 0 0 0 0 0 0 0 0
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 0 0
## 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1
## 1 0 0 0 0 0 0 0 0
## 270 270 270 270 270 270 433 433
## PIR totKcal2 totProt2 totCarb2 totSugr2 totFibe2 totFat2 famHistory
## 1057 1 1 1 1 1 1 1 1 0
## 3669 1 1 1 1 1 1 1 0 1
## 94 1 0 0 0 0 0 0 1 6
## 427 1 0 0 0 0 0 0 0 7
## 68 0 1 1 1 1 1 1 1 1
## 261 0 1 1 1 1 1 1 0 2
## 10 0 0 0 0 0 0 0 1 7
## 30 0 0 0 0 0 0 0 0 8
## 66 1 1 1 1 1 1 1 1 2
## 225 1 1 1 1 1 1 1 0 3
## 8 1 0 0 0 0 0 0 1 8
## 26 1 0 0 0 0 0 0 0 9
## 3 0 1 1 1 1 1 1 1 3
## 19 0 1 1 1 1 1 1 0 4
## 4 0 0 0 0 0 0 0 0 10
## 35 1 0 0 0 0 0 0 1 12
## 140 1 0 0 0 0 0 0 0 13
## 3 0 0 0 0 0 0 0 1 13
## 19 0 0 0 0 0 0 0 0 14
## 5 1 0 0 0 0 0 0 1 14
## 32 1 0 0 0 0 0 0 0 15
## 1 0 0 0 0 0 0 0 1 15
## 4 0 0 0 0 0 0 0 0 16
## 20 1 1 1 1 1 1 1 1 2
## 88 1 1 1 1 1 1 1 0 3
## 5 1 0 0 0 0 0 0 1 8
## 33 1 0 0 0 0 0 0 0 9
## 3 0 1 1 1 1 1 1 1 3
## 9 0 1 1 1 1 1 1 0 4
## 3 0 0 0 0 0 0 0 1 9
## 1 0 0 0 0 0 0 0 0 10
## 3 1 1 1 1 1 1 1 1 4
## 8 1 1 1 1 1 1 1 0 5
## 4 1 0 0 0 0 0 0 0 11
## 1 1 0 0 0 0 0 0 1 14
## 4 1 0 0 0 0 0 0 0 15
## 3 0 0 0 0 0 0 0 0 16
## 1 1 0 0 0 0 0 0 1 16
## 1 1 0 0 0 0 0 0 0 17
## 1 0 0 0 0 0 0 0 0 18
## 23 1 1 1 1 1 1 1 1 1
## 75 1 1 1 1 1 1 1 0 2
## 6 1 0 0 0 0 0 0 1 7
## 15 1 0 0 0 0 0 0 0 8
## 2 0 1 1 1 1 1 1 1 2
## 6 0 1 1 1 1 1 1 0 3
## 1 0 0 0 0 0 0 0 0 9
## 1 1 1 1 1 1 1 1 1 3
## 5 1 1 1 1 1 1 1 0 4
## 2 1 0 0 0 0 0 0 0 10
## 1 1 0 0 0 0 0 0 0 14
## 1 0 0 0 0 0 0 0 0 15
## 1 1 1 1 1 1 1 1 1 3
## 3 1 1 1 1 1 1 1 0 4
## 1 1 0 0 0 0 0 0 1 9
## 3 1 0 0 0 0 0 0 0 10
## 1 0 1 1 1 1 1 1 0 5
## 1 1 0 0 0 0 0 0 1 15
## 5 1 1 1 1 1 1 1 1 2
## 20 1 1 1 1 1 1 1 0 3
## 2 0 1 1 1 1 1 1 1 3
## 2 0 1 1 1 1 1 1 0 4
## 3 1 1 1 1 1 1 1 0 5
## 1 1 0 0 0 0 0 0 1 10
## 1 0 1 1 1 1 1 1 0 6
## 2 1 0 0 0 0 0 0 1 14
## 5 1 0 0 0 0 0 0 0 15
## 1 0 0 0 0 0 0 0 0 16
## 2 1 0 0 0 0 0 0 0 17
## 1 1 1 1 1 1 1 1 1 4
## 1 1 0 0 0 0 0 0 0 17
## 1 1 0 0 0 0 0 0 0 19
## 3 0 1 1 1 1 1 1 1 2
## 19 0 1 1 1 1 1 1 0 3
## 3 0 0 0 0 0 0 0 1 8
## 6 0 0 0 0 0 0 0 0 9
## 2 0 1 1 1 1 1 1 0 5
## 1 0 0 0 0 0 0 0 1 14
## 1 0 0 0 0 0 0 0 0 15
## 2 0 0 0 0 0 0 0 0 17
## 1 0 1 1 1 1 1 1 0 5
## 1 0 1 1 1 1 1 1 0 4
## 1 0 0 0 0 0 0 0 0 10
## 1 0 1 1 1 1 1 1 0 6
## 1 1 1 1 1 1 1 1 1 1
## 1 1 0 0 0 0 0 0 0 8
## 1 0 0 0 0 0 0 0 0 19
## 501 955 955 955 955 955 955 5193 14605
##
## iter imp variable
## 1 1 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 1 2 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 1 3 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 1 4 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 1 5 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 2 1 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 2 2 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 2 3 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 2 4 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 2 5 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 3 1 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 3 2 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 3 3 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 3 4 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 3 5 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 4 1 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 4 2 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 4 3 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 4 4 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 4 5 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 5 1 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 5 2 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 5 3 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 5 4 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## 5 5 hba1cLevel education PIR diet foodSecure bmi systolic diastolic totKcal1 totKcal2 totProt1 totProt2 totCarb1 totCarb2 totSugr1 totSugr2 totFibe1 totFibe2 totFat1 totFat2
## Warning: Number of logged events: 1
library(mice)
impute1 <- mice::complete(impute, 1)
impute2 <- mice::complete(impute, 2)
impute3 <- mice::complete(impute, 3)
impute4 <- mice::complete(impute, 4)
impute5 <- mice::complete(impute, 5)
imputeLong <- mice::complete(impute, "long", inc=FALSE)
imputeLong <- subset(imputeLong, select=-c(.imp))
imputeLong <- subset(imputeLong, select=-c(.id))
# Derive the binary outcome (1 = HbA1c >= 6.5) and average the two dietary recall days
impute1 <- impute1 %>%
  mutate(hba1cCutoff = ifelse(hba1cLevel < 6.5, 0, 1)) %>%
  mutate(totKcal = (totKcal1 + totKcal2) / 2) %>%
  mutate(totProt = (totProt1 + totProt2) / 2) %>%
  mutate(totCarb = (totCarb1 + totCarb2) / 2) %>%
  mutate(totSugr = (totSugr1 + totSugr2) / 2) %>%
  mutate(totFibe = (totFibe1 + totFibe2) / 2) %>%
  mutate(totFat = (totFat1 + totFat2) / 2)
# Same derived variables for the stacked (long) imputed data
imputeLong <- imputeLong %>%
  mutate(hba1cCutoff = ifelse(hba1cLevel < 6.5, 0, 1)) %>%
  mutate(totKcal = (totKcal1 + totKcal2) / 2) %>%
  mutate(totProt = (totProt1 + totProt2) / 2) %>%
  mutate(totCarb = (totCarb1 + totCarb2) / 2) %>%
  mutate(totSugr = (totSugr1 + totSugr2) / 2) %>%
  mutate(totFibe = (totFibe1 + totFibe2) / 2) %>%
  mutate(totFat = (totFat1 + totFat2) / 2)
impute1 <- impute1 %>%
  mutate(hba1cCutoffFactor = factor(hba1cCutoff, levels = c("0", "1"), labels = c("no", "yes")))
#"no" = no undiagnosed diabetes (HbA1c < 6.5), "yes" = undiagnosed diabetes (HbA1c >= 6.5)
imputeLong <- imputeLong %>%
  mutate(hba1cCutoffFactor = factor(hba1cCutoff, levels = c("0", "1"), labels = c("no", "yes")))
#table(imputeLong$hba1cCutoff)
Demographic Factors
* Age (p<0.001)
* Race: 4% of undiagnosed adults who were non-Hispanic Black or Other Hispanic had high HbA1c levels (p<0.001)
* Education (p<0.001)
* Income: adults with incomes between 133% and 149% of the poverty line had the highest proportion of high HbA1c levels (5%), compared to 2% among those with incomes below the poverty line (p<0.001)
* Not significant: gender, marital status

Dietary Factors
* BMI: 5% of adults with no diabetes diagnosis and a BMI over 30 had HbA1c levels over 6.5, compared to 1% of adults with BMI <25 (p<0.001)
* Food security: 4% of undiagnosed adults with very low food security had high HbA1c levels (p<0.001)
* Diet quality: worse self-rated diet quality was associated with higher HbA1c (p=0.011)
* Alcohol consumption (p=0.004)
* Total calorie intake (p=0.014)
* Total carbohydrate intake (p=0.029)
* Not significant: protein, fiber, fat, smoking
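The code that produced the percentages in the lists above is not shown, so it is unclear whether they were weighted. As a minimal sketch only, a weighted share of high HbA1c within each category could be computed in base R as follows. The toy data frame and the helper `weightedShare` are hypothetical and invented for illustration; only the column names `hba1cCutoff` and `weightE2Yr` come from the analysis.

```r
# Toy data, invented for illustration only (not NHANES values)
toy <- data.frame(
  bmiCat      = c("<25", "<25", "<25", "30+", "30+", "30+"),
  hba1cCutoff = c(0, 0, 1, 0, 1, 1),
  weightE2Yr  = c(2, 1, 1, 1, 1, 2)
)

# Weighted share with HbA1c >= 6.5 within each group:
# sum of weights among flagged rows divided by the sum of all weights in the group
weightedShare <- function(flag, group, weight) {
  tapply(flag * weight, group, sum) / tapply(weight, group, sum)
}

shares <- weightedShare(toy$hba1cCutoff, toy$bmiCat, toy$weightE2Yr)
print(round(shares, 2))
```

In this toy example the weighted share is 0.25 for the "<25" group and 0.75 for the "30+" group.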
#chisqAgeCut <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$ageCut, weight=imputeLong$weightE2Yr)}
chisqAge <- summary(aov(age ~ hba1cCutoff, data=impute1))
chisqGender <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$gender, weight=impute1$weightE2Yr)}
chisqRace <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$race, weight=impute1$weightE2Yr)}
chisqEd <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$education, weight=impute1$weightE2Yr)}
chisqPir <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$PIR, weight=impute1$weightE2Yr)}
chisqBmi <- summary(aov(bmi ~ hba1cCutoff, data=imputeLong))
chisqFamHist <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$famHist, weight=impute1$weightE2Yr)}
chisqInsurance <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$insurance, weight=impute1$weightE2Yr)}
chisqMar <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$marital, weight=impute1$weightE2Yr)}
chisqFood <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$foodSecure, weight=impute1$weightE2Yr)}
chisqSmoke <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$smoke, weight=impute1$weightE2Yr)}
chisqAlcohol <- summary(aov(alcohol ~ hba1cCutoff, data=imputeLong))
chisqDiet <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$diet, weight=impute1$weightE2Yr)}
chisqSnap <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$snapCurrent, weight=impute1$weightE2Yr)}
#Nutrient factors
chisqKcal <- summary(aov(totKcal ~ hba1cCutoff, data=impute1))
chisqProt <- summary(aov(totProt ~ hba1cCutoff, data=impute1))
chisqCarb <- summary(aov(totCarb ~ hba1cCutoff, data=impute1))
chisqSugr <- summary(aov(totSugr ~ hba1cCutoff, data=impute1))
chisqFibe <- summary(aov(totFibe ~ hba1cCutoff, data=impute1))
chisqFat <- summary(aov(totFat ~ hba1cCutoff, data=impute1))
nhanesLm1 <- lm(data=impute1, hba1cLevel ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet)
summary(nhanesLm1)
##
## Call:
## lm(formula = hba1cLevel ~ age + race + education + PIR + bmi +
## insurance + foodSecure + totKcal + totProt + totCarb + totSugr +
## totFat + diet, data = impute1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0674 -0.2493 -0.0183 0.2003 8.4166
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.392e+00 5.069e-02 86.640 < 2e-16 ***
## age 1.128e-02 4.327e-04 26.080 < 2e-16 ***
## raceblack 1.638e-01 1.709e-02 9.585 < 2e-16 ***
## raceMexA 9.804e-02 2.284e-02 4.292 1.80e-05 ***
## raceAsian 2.271e-01 2.364e-02 9.607 < 2e-16 ***
## raceHispanic 1.124e-01 2.354e-02 4.775 1.83e-06 ***
## raceother 1.107e-01 3.673e-02 3.013 0.002598 **
## educationhigh -4.225e-02 2.098e-02 -2.013 0.044123 *
## educationsomecollege -8.837e-02 2.001e-02 -4.416 1.02e-05 ***
## educationcollege -1.004e-01 2.250e-02 -4.463 8.22e-06 ***
## PIR1-1.32 3.821e-02 2.383e-02 1.603 0.108888
## PIR1.33-1.49 -5.187e-03 3.651e-02 -0.142 0.887047
## PIR1.5-1.84 -2.226e-02 2.825e-02 -0.788 0.430709
## PIR1.85-5.0 -5.024e-03 1.874e-02 -0.268 0.788695
## bmi 1.538e-02 1.005e-03 15.301 < 2e-16 ***
## insuranceno 6.667e-02 1.625e-02 4.104 4.11e-05 ***
## foodSecuremarginal 1.325e-02 2.143e-02 0.618 0.536471
## foodSecurelow 4.662e-02 2.263e-02 2.060 0.039425 *
## foodSecureverylow 7.629e-02 2.497e-02 3.055 0.002262 **
## totKcal -1.632e-04 4.060e-05 -4.021 5.87e-05 ***
## totProt 5.770e-04 3.202e-04 1.802 0.071628 .
## totCarb 7.993e-04 2.309e-04 3.461 0.000541 ***
## totSugr -9.934e-05 2.054e-04 -0.484 0.628652
## totFat 1.770e-03 4.531e-04 3.908 9.41e-05 ***
## dietverygood 1.965e-02 2.624e-02 0.749 0.453829
## dietgood 5.139e-02 2.450e-02 2.098 0.035974 *
## dietfair 5.515e-02 2.699e-02 2.044 0.041020 *
## dietpoor 3.455e-02 3.667e-02 0.942 0.346148
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.522 on 6605 degrees of freedom
## Multiple R-squared: 0.1684, Adjusted R-squared: 0.165
## F-statistic: 49.55 on 27 and 6605 DF, p-value: < 2.2e-16
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.02533 0.00000 1.00000
## [1] "numeric"
options(scipen=999) #prevents ORs from being in scientific notation. 999 sets a high threshold for R to use scientific notation
#logistic regression
nhanesGlm <- glm(impute1$hba1cCutoffFactor ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet, data = impute1, family = binomial(logit))
summary(nhanesGlm)
##
## Call:
## glm(formula = impute1$hba1cCutoffFactor ~ age + race + education +
## PIR + bmi + insurance + foodSecure + totKcal + totProt +
## totCarb + totSugr + totFat + diet, family = binomial(logit),
## data = impute1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2093 -0.2323 -0.1466 -0.0911 3.2746
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -10.7523711 0.7743729 -13.885 < 0.0000000000000002 ***
## age 0.0561902 0.0063070 8.909 < 0.0000000000000002 ***
## raceblack 1.1808274 0.2286575 5.164 0.000000241501 ***
## raceMexA 0.9262727 0.2964812 3.124 0.00178 **
## raceAsian 2.0399434 0.3177570 6.420 0.000000000136 ***
## raceHispanic 1.2073283 0.2953914 4.087 0.000043658112 ***
## raceother 0.7189765 0.5368657 1.339 0.18050
## educationhigh -0.0026625 0.2281543 -0.012 0.99069
## educationsomecollege -0.1900053 0.2332140 -0.815 0.41523
## educationcollege -0.4578604 0.2902654 -1.577 0.11471
## PIR1-1.32 0.5654851 0.2609508 2.167 0.03023 *
## PIR1.33-1.49 0.4507932 0.3843435 1.173 0.24084
## PIR1.5-1.84 -0.0451222 0.3422479 -0.132 0.89511
## PIR1.85-5.0 -0.0288647 0.2403372 -0.120 0.90440
## bmi 0.0876000 0.0097157 9.016 < 0.0000000000000002 ***
## insuranceno 0.3264765 0.2002705 1.630 0.10306
## foodSecuremarginal 0.2935464 0.2465230 1.191 0.23375
## foodSecurelow 0.0131159 0.2696133 0.049 0.96120
## foodSecureverylow 0.4299186 0.2926731 1.469 0.14185
## totKcal -0.0004712 0.0006884 -0.685 0.49366
## totProt 0.0045315 0.0047827 0.947 0.34340
## totCarb 0.0007941 0.0035236 0.225 0.82169
## totSugr 0.0006522 0.0028989 0.225 0.82198
## totFat 0.0042506 0.0072296 0.588 0.55657
## dietverygood 0.2396848 0.4228829 0.567 0.57086
## dietgood 0.6791478 0.3838441 1.769 0.07684 .
## dietfair 0.6741648 0.4072600 1.655 0.09785 .
## dietpoor 0.4865069 0.5115790 0.951 0.34161
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1566.8 on 6632 degrees of freedom
## Residual deviance: 1307.3 on 6605 degrees of freedom
## AIC: 1363.3
##
## Number of Fisher Scoring iterations: 7
## OR 2.5 % 97.5 %
## (Intercept) 0.00002139462 0.000004492566 0.00009393096
## age 1.05779883641 1.045024833523 1.07121044972
## raceblack 3.25706787827 2.094355182183 5.14781371962
## raceMexA 2.52507987330 1.400299674821 4.49831882173
## raceAsian 7.69017427812 4.093711862749 14.29833315732
## raceHispanic 3.34453702824 1.855766339707 5.93913681895
## raceother 2.05233167442 0.626848851005 5.35065476897
## educationhigh 0.99734106033 0.636632040928 1.56050451274
## educationsomecollege 0.82695472438 0.522977828604 1.30734118897
## educationcollege 0.63263575718 0.354905355392 1.11085371732
## PIR1-1.32 1.76030151674 1.050593043135 2.93284832215
## PIR1.33-1.49 1.56955666247 0.703726673907 3.21975444689
## PIR1.5-1.84 0.95588063417 0.472250400414 1.82477684502
## PIR1.85-5.0 0.97154786350 0.609800751505 1.56718592599
## bmi 1.09155136702 1.070913583366 1.11253683058
## insuranceno 1.38607563541 0.930191038657 2.04248464114
## foodSecuremarginal 1.34117540262 0.812344174491 2.14263558417
## foodSecurelow 1.01320228823 0.583294993838 1.68644025922
## foodSecureverylow 1.53713234016 0.845884259884 2.67774570261
## totKcal 0.99952890486 0.998069860812 1.00076838290
## totProt 1.00454175126 0.995199119489 1.01405717623
## totCarb 1.00079446438 0.994113885186 1.00797631949
## totSugr 1.00065245966 0.994970193799 1.00634299413
## totFat 1.00425968261 0.990808824503 1.01935284887
## dietverygood 1.27084845545 0.576902924263 3.09577684952
## dietgood 1.97219633459 0.985767989638 4.52543247381
## dietfair 1.96239336509 0.928437957719 4.67336474941
## dietpoor 1.62662428103 0.595976695227 4.55373398918
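The call that produced the odds ratio table above is not shown. One common way to obtain such a table is to exponentiate the model coefficients together with their confidence limits. The sketch below uses simulated data (the data frame `sim` and its coefficients are invented, not NHANES values) and is an assumption about the approach, not a record of what was run.

```r
set.seed(1)
n <- 500
# Simulated stand-in data; values are invented for illustration
sim <- data.frame(age = runif(n, 20, 80), bmi = rnorm(n, 28, 5))
sim$highHba1c <- rbinom(n, 1, plogis(-6 + 0.05 * sim$age + 0.09 * sim$bmi))

fit <- glm(highHba1c ~ age + bmi, data = sim, family = binomial(link = "logit"))

# Exponentiate the log-odds coefficients and their confidence limits;
# confint() on a glm returns profile-likelihood intervals
orTable <- exp(cbind(OR = coef(fit), suppressMessages(confint(fit))))
print(round(orTable, 3))
```

Applied to the fitted model in this report, the same pattern would use `nhanesGlm` in place of `fit`. An odds ratio whose interval excludes 1 corresponds to a coefficient that differs from zero at roughly the 5% level.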
#box plot of age, above vs below the HbA1c cutoff
ggplot(data = impute1, aes(x = factor(hba1cCutoffFactor), y = age)) +
  geom_boxplot() +
  labs(title = "above/below Status by variable age") +
  labs(x = "hba1c")
#box plot of bmi, above vs below the HbA1c cutoff
ggplot(data = impute1, aes(x = factor(hba1cCutoffFactor), y = bmi)) +
  geom_boxplot() +
  labs(title = "above/below Status by variable bmi") +
  labs(x = "hba1c")
#factors significant in bivariate analysis
nhanesRF <- randomForest(hba1cCutoffFactor ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet, data = trainSet, importance = TRUE)
nhanesRF
##
## Call:
## randomForest(formula = hba1cCutoffFactor ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet, data = trainSet, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 0.47%
## Confusion matrix:
## no yes class.error
## no 22632 2 0.00008836264
## yes 106 475 0.18244406196
## no yes MeanDecreaseAccuracy MeanDecreaseGini
## age 0.014458932 0.40380362 0.024210438 116.14789
## race 0.010268439 0.30047804 0.017537435 54.90022
## education 0.006328508 0.19512387 0.011057085 39.58542
## PIR 0.005741501 0.18482153 0.010238447 44.26545
## bmi 0.013267018 0.41744008 0.023397602 152.56936
## insurance 0.001677071 0.04755306 0.002824153 14.11301
## foodSecure 0.003750330 0.10225539 0.006213488 36.35857
## totKcal 0.020700740 0.18865957 0.024902645 120.38012
## totProt 0.013124202 0.18611076 0.017451322 128.64973
## totCarb 0.016817834 0.17448572 0.020767873 119.54479
## totSugr 0.012074805 0.17334510 0.016115944 122.08952
## totFat 0.014714244 0.18621095 0.019015413 126.85139
## diet 0.005287332 0.15173024 0.008947169 40.02254
# Predicting on Validation set
nhanesRFPrediction <- predict(nhanesRF, testSet, type = "class")
table(nhanesRFPrediction)
## nhanesRFPrediction
## no yes
## 9726 224
## [1] 0.9963819
##
## nhanesRFPrediction no yes
## no 9691 35
## yes 1 223
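Because the outcome is rare, overall accuracy alone says little about how well cases are detected. As a sketch, the counts from the confusion matrix above can be turned into sensitivity, specificity, and positive predictive value. The rows are the predicted class; treating the columns as the observed class is an assumption, since the call that built the table is not shown.

```r
# Counts copied from the confusion matrix above
# (rows = predicted class, columns = observed class; the column meaning is assumed)
cm <- matrix(c(9691, 1, 35, 223), nrow = 2,
             dimnames = list(predicted = c("no", "yes"),
                             observed  = c("no", "yes")))

accuracy    <- sum(diag(cm)) / sum(cm)                # correct predictions / all predictions
sensitivity <- cm["yes", "yes"] / sum(cm[, "yes"])    # detected cases / observed cases
specificity <- cm["no", "no"] / sum(cm[, "no"])       # correct non-cases / observed non-cases
ppv         <- cm["yes", "yes"] / sum(cm["yes", ])    # true cases / predicted cases

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv), 4))
```

The accuracy reproduces the 0.9963819 reported above, while sensitivity is 223/258, or about 0.864, so the model misses roughly one in seven observed cases in the test set.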
NHANES measures HbA1c and glucose (the latter not included in this study) only once per participant. However, some studies indicate that confirmatory testing may be required to prevent overestimation of UDM in the population, since a single test result may be an anomaly (7). Furthermore, as is always the case when working with NHANES data, the data are cross-sectional, so no causal inference can be made without temporality. Self-reported data, including self-reported diabetes diagnosis and 24-hour dietary recalls, may be unreliable; with dietary recall data in particular, social desirability bias may play an important role. These data also lacked information about the geographical distribution of cases. Prior research indicates that geographical region plays an important role in the diabetes epidemic, yet location information is restricted in this database to protect participants and is only accessible by application to NHANES, which is beyond the scope of this project. Finally, the classes were imbalanced for the semi-rare event of UDM (about 2.5% of the sample without a diabetes diagnosis), which may make the models unreliable without adjustment. No adjustment for class imbalance was made in this project.
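One simple adjustment for class imbalance is to downsample the majority class in the training data before fitting. The sketch below is illustrative only: the data are simulated, the names `sim`, `udm`, and `balanced` are hypothetical, and this step was not part of the project.

```r
set.seed(42)
# Simulated stand-in data (not NHANES): roughly 2.5% positives
n <- 4000
sim <- data.frame(bmi = rnorm(n, 28, 5))
sim$udm <- factor(ifelse(runif(n) < 0.025, "yes", "no"), levels = c("no", "yes"))

# Keep every "yes" row and an equal-sized random sample of "no" rows
yesRows <- which(sim$udm == "yes")
noRows  <- sample(which(sim$udm == "no"), length(yesRows))
balanced <- sim[c(yesRows, noRows), ]

# Both classes now have the same number of rows
print(table(balanced$udm))
```

Downsampling would be applied to the training set only, so that the test set keeps the original class proportions. With random forests, the `strata` and `sampsize` arguments of `randomForest()` offer a related option by drawing a stratified sample for each tree.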
Ultimately, this project found that older age and higher body mass index were the strongest predictors of UDM in both logistic regression and random forest models. Diet did not directly predict uncontrolled glucose levels, although obesity (resulting from a combination of diet, genetics, and other lifestyle factors) did. Those with incomes that were low yet above the poverty line may be more at risk for UDM. This somewhat surprising finding could be due to differences in access to social welfare programs, such as Medicaid, between those below and above the poverty line. Although health insurance was included in the dataset, it was not separated into distinct types of insurance and thus may have been too blunt to show this nuance. Programs screening for undiagnosed diabetes may consider focusing on older, overweight or obese adults with low incomes above the poverty line. Further research should refine the model to include weighting for national representativeness, as well as a more nuanced factor for health insurance.
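As a rough sketch of the weighting suggested above, the examination weights could be passed to `glm()` so that the coefficient estimates reflect the weighted population. The data below are simulated and the weights invented; only the column name `weightE2Yr` comes from the analysis. This approach gives weighted point estimates only. Standard errors that respect the NHANES strata and sampling units would need a design-based tool such as `svydesign()` and `svyglm()` from the survey package.

```r
set.seed(7)
n <- 1000
# Simulated stand-in data; values and weights are invented for illustration
sim <- data.frame(
  age        = runif(n, 20, 80),
  bmi        = rnorm(n, 28, 5),
  weightE2Yr = runif(n, 5000, 60000)
)
sim$highHba1c <- rbinom(n, 1, plogis(-7 + 0.05 * sim$age + 0.09 * sim$bmi))

# Rescale the weights to mean 1 so that they sum to the sample size
sim$w <- sim$weightE2Yr / mean(sim$weightE2Yr)

# quasibinomial avoids the non-integer-successes warning that
# binomial() raises when the weights are fractional
weightedFit <- glm(highHba1c ~ age + bmi, data = sim,
                   family = quasibinomial(link = "logit"), weights = w)
print(round(coef(weightedFit), 3))
```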