Introduction

Overview

This project is a secondary analysis of NHANES data that characterizes and models demographic and dietary factors to predict undiagnosed type II diabetes mellitus among United States adults.

Background

Type II diabetes mellitus (T2D) is an endocrine disorder characterized by the body’s inability or reduced ability to metabolize carbohydrates due to impaired insulin response. The prevalence of T2D has increased since the 1990s from around 7% to 12.3% in 2011-2014, mirroring the trend in rising obesity rates (1). Undiagnosed diabetes mellitus (UDM) occurs in the asymptomatic phase of type 2 diabetes. UDM can lead to serious ocular, renal, and cardiovascular complications that develop before treatment begins. In addition to health complications, each case of UDM was estimated to cost $4,030 in 2012, contributing to the total $322 billion cost of diagnosed, undiagnosed, gestational, and prediabetes in the United States at that time (2). In 2017, the CDC estimated there were 7.2 million cases of UDM (3). Effective management of the condition can help alleviate both direct and indirect costs and reduce associated comorbidity. The ability to diagnose diabetes among the undiagnosed is an important step towards reducing the national burden of diabetes.

NHANES is a nationally representative survey designed to assess the nutritional status of the United States civilian, noninstitutionalized population (4). NHANES is a comprehensive survey, including physical examinations, laboratory analyses, questionnaires, and demographic information, and the representativeness of the sample, including minority and underrepresented groups, is ensured with weighting and sampling methods (5). NHANES provides a comprehensive database of factors that could affect health in general and diabetes specifically, including dietary recalls and HbA1c laboratory results. Because type 2 diabetes and UDM are affected by a multitude of lifestyle and medical factors, including diet, genetics, and demographics, NHANES provides a useful tool for exploring the relationship between these factors. Kavakiotis et al. (6), in their systematic review of machine learning and data mining techniques in diabetes research, describe the hope that machine learning of NHANES data and similar data sources could be linked to decision-making support tools in diagnosis and treatment of diabetes. Their 2017 review included one analysis of NHANES data by Lee and Giraud-Carrier (2013), in which the researchers applied association rule mining and clustering algorithms to explore relationships between responses to health questionnaires and diabetes and hypertension in NHANES. Given the burden of UDM in the United States, this project will explore factors within the NHANES dataset that predict UDM, as assessed by HbA1c level, to create a low-cost way to identify those at risk.

Relevance to the Field

This project applies principles of biomedical informatics to address the population-level health issue of UDM. It incorporates machine learning techniques to explore an integration of medical and social determinants of health that can be achieved through the comprehensive NHANES database. This topic has the potential to inform clinical practitioners and regional and national policy makers as to potential areas to focus resources. I have met with Dr. Christina Roberto, Dr. Laura Gibson, and Helen Yan. These advisors recommended narrowing my topic to a more specific issue, framing the analysis as exploratory because cross-sectional data is weak for inferring causality, and using imputation to fill in missing data.

Research Questions

Which demographic factors are most strongly associated with UDM?
Which dietary factors are most strongly associated with UDM?
Can we predict undiagnosed diabetes from non-clinical factors, with the goal of screening for those most at risk in community settings?

Methods/Results

Step 1: Import Data

Sample

Sample restricted to nonpregnant adults aged 20-79 with no past diagnosis of diabetes by a health professional.

Variables

NHANES modules: Demographics, Medical Screener Questionnaire, Examination, Laboratory, Dietary
diabetes (DIQ) - doctor told you have diabetes (DIQ010) (1=yes, 2=no, 3=borderline, 7=ref, 9=dk)
hba1c (GHB) - glycohemoglobin (%) (LBXGH) (range of values)
reformatted into a binary outcome of HbA1c of 6.5% or higher versus below 6.5% to denote UDM and not UDM among the undiagnosed population
demographic (DEMO) -
age (RIDAGEYR) (continuous),
sex (RIAGENDR) (male,female,missing),
race/eth (RIDRETH3) (mex american, other hispanic, nh white, nh black, nh asian, other/multi),
education (DMDEDUC2) (less than 9th [1], 9-11th [2], highschool/ged [3], somecollege/aa [4], college or above [5], ref [7], dk[9]),
marital status (DMDMARTL) (married [1], widowed [2], divorced [3], separated [4], never married [5], living wpartner [6], ref [77], dk [99]),
poverty-income ratio (PIR) (INDFMPIR) (range of values 0-4.99; 5 = 5 or more),
pregnancy status (RIDEXPRG) (1=yes, 2 or 3=no),
total number of people in household (DMDHHSIZ) (values 1-7 or more)
medical condition screener (MCQ) -
are you now controlling or losing weight (MCQ370A) (1=yes, 2=no, 7=ref, 9=dk),
are you now increasing exercise (MCQ370B) (1=yes, 2=no, 7=ref, 9=dk)
alcohol use (ALQ) - how often drank alcohol over past 12 months (ALQ120Q)
smoking (SMQ) - do you now smoke cigarettes (SMQ040) (1=everyday, 2=some days, 3=not at all, 7=ref, 9=dk)
dietary behavior (DBQ) - how healthy is the diet (DBQ700) (1=excellent, 2=very good, 3=good, 4=fair, 5=poor, 7=ref, 9=dk)
health insurance (HIQ) - covered by any health insurance (HIQ011) (1=yes, 2=no, 7=ref, 9=dk)
food security (FSQ) -
Currently receive snap benefits (FSD230) (1=yes, 2=no, 7=ref, 9=dk),
adult food security (FSDAD) (1=full security, 2=marginal, 3=low, 4=very low)
body measurement (BMX) - BMI (BMXBMI) (range of values)
blood pressure measurement (BPX) - systolic (BPXSY1) (range of values), diastolic (BPXDI1) (range of values)
nutrient intake day 1 (DR1TOT)
nutrient intake day 2 (DR2TOT)
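
Many of the questionnaire items listed above share the same refused/don't know codes (7 and 9, or 77 and 99). The preprocessing later in this report folds those codes into "no"; an alternative that could be considered is to treat them as missing so that they can be imputed with the other missing values. The following is a minimal sketch of that alternative on made-up responses. The helper name `recode_sentinels` and the toy vector are illustrative assumptions, not part of the analysis.

```r
# Toy responses using NHANES-style codes: 1 = yes, 2 = no, 7 = refused, 9 = don't know
insurance_raw <- c(1, 2, 7, 9, 1, NA)

# Hypothetical helper: replace sentinel codes with NA before building a factor
recode_sentinels <- function(x, sentinels = c(7, 9)) {
  x[x %in% sentinels] <- NA
  x
}

insurance <- factor(recode_sentinels(insurance_raw),
                    levels = c(1, 2), labels = c("yes", "no"))

# Refused and don't know now appear as NA instead of being counted as "no"
table(insurance, useNA = "ifany")
```

For items coded 77/99 (such as marital status), the same helper could be called with `sentinels = c(77, 99)`.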

2015-2016

2015-2016 excluded because food security data not available in these years.

2013-2014

nhanesDemo14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DEMO_H.XPT")) #demographic
nhanesDIQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DIQ_H.XPT")) #diabetes
nhanesMSQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/MCQ_H.XPT")) #medical condition screener
nhanesALQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/ALQ_H.XPT")) #alcohol use
nhanesSMQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/SMQ_H.XPT")) #smoking
nhanesDBQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DBQ_H.XPT")) #dietary behavior
nhanesHIQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/HIQ_H.XPT")) #health insurance
nhanesFSQ14 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/FSQ_H.XPT")) #food security
nhanesExam14Body <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/BMX_H.XPT")) #body measurements
nhanesExam14BP <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/BPX_H.XPT")) #blood pressure
nhanesLab14HBA1C <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/GHB_H.XPT")) #HbA1c
nhanesNutrient141 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DR1TOT_H.XPT")) #nutrient data day 1
nhanesNutrient142 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2013-2014/DR2TOT_H.XPT")) #nutrient data day 2

nhanesData14 <- inner_join(nhanesLab14HBA1C, nhanesDemo14, by= "SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesDIQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesMSQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesALQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesSMQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesDBQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesHIQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesFSQ14, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesExam14Body, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesExam14BP, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesNutrient141, by="SEQN")
nhanesData14 <- inner_join(nhanesData14, nhanesNutrient142, by="SEQN")

#nhanesData14 <- nhanesData14[!duplicated(nhanesData14$SEQN), ] #removing duplicate IDs created by joining
#rm(list = c("nhanesData14"))
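
The twelve sequential joins above all follow the same pattern: an inner join on the respondent ID `SEQN`. As a sketch of how that pattern could be written once, the toy example below folds a join over a list of data frames with base R's `Reduce()` and `merge()` (whose default is an inner join). The three small data frames are made-up stand-ins for NHANES component files; `dplyr::inner_join(x, y, by = "SEQN")` could be swapped in for `merge()`.

```r
# Toy stand-ins for three component files keyed by SEQN (values are invented)
ghb  <- data.frame(SEQN = c(1, 2, 4),    LBXGH    = c(5.4, 6.7, 5.1))
demo <- data.frame(SEQN = c(1, 2, 3, 4), RIDAGEYR = c(25, 40, 61, 33))
bmx  <- data.frame(SEQN = c(1, 2, 3, 4), BMXBMI   = c(22.1, 31.4, 27.0, 24.8))

components <- list(ghb, demo, bmx)

# Fold an inner join over the list: ((ghb join demo) join bmx)
joined <- Reduce(function(x, y) merge(x, y, by = "SEQN"), components)

joined
```

Because every step is an inner join, a respondent missing from any one component (here, SEQN 3 has no HbA1c record) is dropped from the result, which is worth keeping in mind when counting the final sample.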

2011-2012

nhanesDemo12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DEMO_G.XPT")) #demographic
nhanesDIQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DIQ_G.XPT")) #diabetes
nhanesMSQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/MCQ_G.XPT")) #medical condition screener
nhanesALQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/ALQ_G.XPT")) #alcohol use
nhanesSMQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/SMQ_G.XPT")) #smoking
nhanesDBQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DBQ_G.XPT")) #dietary behavior
nhanesHIQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/HIQ_G.XPT")) #health insurance
nhanesFSQ12 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/FSQ_G.XPT")) #food security
nhanesExam12Body <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/BMX_G.XPT")) #body measurements
nhanesExam12BP <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/BPX_G.XPT")) #blood pressure
nhanesLab12HBA1C <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/GHB_G.XPT")) #HbA1c
nhanesNutrient121 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DR1TOT_G.XPT")) #nutrient data day 1
nhanesNutrient122 <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2011-2012/DR2TOT_G.XPT")) #nutrient data day 2

nhanesData12 <- inner_join(nhanesLab12HBA1C, nhanesDemo12, by= "SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesDIQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesMSQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesALQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesSMQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesDBQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesHIQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesFSQ12, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesExam12Body, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesExam12BP, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesNutrient121, by="SEQN")
nhanesData12 <- inner_join(nhanesData12, nhanesNutrient122, by="SEQN")

#nhanesData12 <- nhanesData12[!duplicated(nhanesData12$SEQN), ] #removing duplicate IDs created by joining
#rm(list = c("nhanesData14Subset"))

Data Preprocessing

Combine 2011-12 and 2013-14

Using full_join to keep rows

nhanesData <- full_join(nhanesData12, nhanesData14)
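
When two 2-year NHANES cycles are pooled, the NHANES analytic guidance is to build a 4-year weight by halving each 2-year weight; otherwise weighted totals count the population roughly twice. The sketch below illustrates the idea on invented numbers. The column name `weightE4Yr` and the toy data frames are assumptions for illustration only.

```r
# Invented exam weights for two respondents per cycle
cycle12 <- data.frame(SEQN = c(1, 2), WTMEC2YR = c(20000, 30000), cycle = "2011-2012")
cycle14 <- data.frame(SEQN = c(3, 4), WTMEC2YR = c(25000, 15000), cycle = "2013-2014")

# Stack the cycles row-wise (respondent IDs do not repeat across cycles)
pooled <- rbind(cycle12, cycle14)

# Hypothetical 4-year weight: half of the 2-year weight when pooling two cycles
pooled$weightE4Yr <- pooled$WTMEC2YR / 2

sum(pooled$WTMEC2YR)   # total using unadjusted 2-year weights
sum(pooled$weightE4Yr) # total using the halved weights
```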

Creating a new dataframe with just the desired variables

#rm(list = c("nhanesDataFull"))

nhanesDataFull <- nhanesData %>%
    select(id=SEQN, 
           weightQ2Yr= WTINT2YR,
           weightE2Yr= WTMEC2YR,
           hba1cLevel=LBXGH,
           age=RIDAGEYR, 
           gender=RIAGENDR,
           race=RIDRETH3, 
           education=DMDEDUC2, 
           marital=DMDMARTL,
           PIR=INDFMPIR,
           pregnant=RIDEXPRG,
           #diabetes
           diagnosed=DIQ010,
           famHistory=DIQ175A,
           #med conditions
           controlWeight=MCQ370A,
           increaseExercise=MCQ370B,
           #alcohol
           alcohol=ALQ120Q,
           #smoking
           smoke=SMQ040,
           #dietary behavior
           diet=DBQ700,
           #health insurance
           insurance=HIQ011,
           #food security
           snapCurrent=FSD230,
           foodSecure=FSDAD,
           #body measurement
           bmi=BMXBMI,
           #blood pressure,
           systolic=BPXSY1,
           diastolic=BPXDI1,
           #nutrient intake, day 1 and day 2 recalls
           totKcal1=DR1TKCAL,
           totKcal2=DR2TKCAL,
           totProt1=DR1TPROT,
           totProt2=DR2TPROT,
           totCarb1=DR1TCARB,
           totCarb2=DR2TCARB,
           totSugr1=DR1TSUGR,
           totSugr2=DR2TSUGR,
           totFibe1=DR1TFIBE,
           totFibe2=DR2TFIBE,
           totFat1=DR1TTFAT,
           totFat2=DR2TTFAT)

nhanesDataFull <- nhanesDataFull %>%
  filter(age >19) %>%
  filter(age <80)

nhanesDataFull <- nhanesDataFull %>%
  filter(pregnant != 1 | is.na(pregnant)) #keeping not pregnant and those with no pregnancy status recorded
  
nhanesDataFull <- nhanesDataFull %>%
  filter(alcohol<700) #filtering less than 700 because 777 and 999 are refused and don't know; rows with missing alcohol are also dropped by this filter


nhanesDataFull <- nhanesDataFull %>%
  #note: cut() intervals are right-closed by default, so age 20 falls outside (20,30] and becomes NA in ageCut
  mutate(ageCut=cut(age, c(20,30,40,50,60,70,81))) %>%
  mutate(gender=factor(gender, levels=c(1, 2), labels=c("male", "female"))) %>%
  mutate(race=factor(race, levels=c(3, 4, 1, 6, 2, 7), labels=c("white", "black", "MexA", "Asian", "Hispanic", "other"))) %>%
  mutate(marital=factor(marital, levels=c(1,2,3,4,5,6,77,99), labels=c("partnered", "notPartnered", "notPartnered", "notPartnered","notPartnered", "partnered", "notPartnered", "notPartnered"))) %>%
  mutate(education=factor(education, levels=c(1,2,3,4,5), labels=c("lesshigh", "lesshigh", "high", "somecollege", "college"))) %>%
  
  mutate(bmiCut=cut(bmi, c(0, 25, 30, 100), labels=c("0-25", "25-30", "30+"))) %>%
  mutate(controlWeight=factor(controlWeight, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
  mutate(increaseExercise=factor(increaseExercise, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
  mutate(smoke=factor(smoke, levels=c(1,2,3,7,9), labels=c("yes", "yes", "no", "no", "no"))) %>%
  mutate(diet=factor(diet, levels=c(1,2,3,4,5), labels=c("excellent", "verygood", "good", "fair", "poor"))) %>%
  mutate(insurance=factor(insurance, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>%
  mutate(foodSecure=factor(foodSecure, levels=c(1,2,3,4), labels=c("full", "marginal", "low", "verylow"))) %>%
  mutate(snapCurrent=factor(snapCurrent, levels=c(1,2,7,9), labels=c("yes", "no", "no", "no"))) %>% 
  mutate(PIR=cut(PIR, c(0,1,1.33, 1.5, 1.85, 5.1), labels=c("0-1", "1-1.32", "1.33-1.49", "1.5-1.84", "1.85-5.0"))) %>%
  mutate(diagnosedCode = factor(diagnosed, levels=c(1,2,3), labels=c("yes", "no", "borderline"))) 
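
The binning above relies on `cut()`, whose intervals are right-closed by default. A small sketch on a few boundary ages shows what that means for the youngest respondents, and how `right = FALSE` changes the result. The `ages` vector is made up for illustration.

```r
ages   <- c(20, 29, 30, 79)
breaks <- c(20, 30, 40, 50, 60, 70, 81)

# Default: (20,30], (30,40], ... so the lowest break itself is excluded
right_closed <- cut(ages, breaks)

# Left-closed: [20,30), [30,40), ... so age 20 is kept and age 30 starts a new bin
left_closed <- cut(ages, breaks, right = FALSE)

data.frame(ages, right_closed, left_closed)
```

With the default, age 20 is returned as NA, which is one possible source of missing values in a binned age variable.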


nhanesDataFull <- nhanesDataFull %>%
  mutate(hba1cCutoff = ifelse(hba1cLevel<6.5, 1, 0)) #1 = HbA1c below 6.5% (not UDM), 0 = HbA1c of 6.5% or higher (UDM)
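
In the coding above, 0 marks the UDM group and 1 marks everyone else, which is easy to misread in later tables. As a sketch of one alternative, the outcome could be stored as a labeled factor so the tables are self-describing. The vector `hba1c` and the labels "notUDM"/"UDM" are made up for illustration.

```r
# Invented HbA1c values, including the 6.5% boundary and a missing lab result
hba1c <- c(5.4, 6.4, 6.5, 7.2, NA)

# Labeled outcome: 6.5% or higher is UDM; missing labs stay missing
udm <- factor(ifelse(hba1c >= 6.5, "UDM", "notUDM"),
              levels = c("notUDM", "UDM"))

table(udm, useNA = "ifany")
```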


nhanesDataFull <- subset(nhanesDataFull, select=-c(pregnant)) #dropping pregnant because it was only used to filter (diagnosed is dropped after the diagnosis filter)

#nhanesDataFull <- subset(nhanesDataFull, select=-c(weightQ2Yr)) #dropping the weights until I can figure out how to apply them to the models
#nhanesDataFull <- subset(nhanesDataFull, select=-c(weightE2Yr))

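#note: the second argument of wpct() is the weight, so this call weights each row by hba1cCutoff (1 or 0) rather than cross-tabulating the two variables
wpct(nhanesDataFull$diagnosedCode, nhanesDataFull$hba1cCutoff)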

##        yes         no borderline 
## 0.04424911 0.93384982 0.02190107

wtd.table(nhanesDataFull$diagnosedCode, weights=nhanesDataFull$weightE2Yr, na.rm=TRUE)

## $x
## [1] "yes"        "no"         "borderline"
## 
## $sum.of.weights
## [1]  32331984 308093564   8507146

wpct(nhanesDataFull$diagnosedCode, weight=nhanesDataFull$weightE2Yr)

##        yes         no borderline 
## 0.09265966 0.88295986 0.02438048
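
For reference, a weighted proportion is the sum of the weights in a category divided by the sum of all weights. The base R sketch below reproduces that arithmetic on invented data so it can be compared with the unweighted shares; the vectors `x` and `w` are made up.

```r
# Invented diagnosis responses and survey weights
x <- factor(c("yes", "no", "no", "borderline", "no"),
            levels = c("yes", "no", "borderline"))
w <- c(10, 30, 20, 5, 35)

# Unweighted share: each respondent counts once
unweighted_share <- prop.table(table(x))

# Weighted share: each respondent counts in proportion to their weight
weighted_share <- tapply(w, x, sum) / sum(w)

unweighted_share
weighted_share
```

Here the "yes" group is 20% of respondents but only 10% of the weighted population, which illustrates why weighted and unweighted summaries can differ.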

nhanesDataFull <- nhanesDataFull %>%
  filter(diagnosed==2) #filtering to only those who have NOT been diagnosed with diabetes

nhanesDataFull <- subset(nhanesDataFull, select=-c(diagnosed))

Sample Summary

12% of the sample (unweighted) had a diagnosis of diabetes. Of the 85% with no diagnosis, 2% had HbA1c levels of 6.5% or higher, which corresponds to about 6.5 million people.

Frequency tables

head(nhanesDataFull)

## # A tibble: 6 x 38
##      id weightQ2Yr weightE2Yr hba1cLevel   age gender race  education marital
##   <dbl>      <dbl>      <dbl>      <dbl> <dbl> <fct>  <fct> <fct>     <fct>  
## 1 62169     14392.     14784.        5.4    21 male   Asian high      notPar~
## 2 62172     26961.     27123.        5.6    43 female black high      notPar~
## 3 62176     53831.     54203.        5.4    34 female white college   partne~
## 4 62179     16590.     17115.        5.7    55 male   Asian college   partne~
## 5 62180     20458.     22616.        5.3    35 male   white college   partne~
## 6 62184     15601.     15236.        4.5    26 male   black high      notPar~
## # ... with 29 more variables: PIR <fct>, famHistory <dbl>, controlWeight <fct>,
## #   increaseExercise <fct>, alcohol <dbl>, smoke <fct>, diet <fct>,
## #   insurance <fct>, snapCurrent <fct>, foodSecure <fct>, bmi <dbl>,
## #   systolic <dbl>, diastolic <dbl>, totKcal1 <dbl>, totKcal2 <dbl>,
## #   totProt1 <dbl>, totProt2 <dbl>, totCarb1 <dbl>, totCarb2 <dbl>,
## #   totSugr1 <dbl>, totSugr2 <dbl>, totFibe1 <dbl>, totFibe2 <dbl>,
## #   totFat1 <dbl>, totFat2 <dbl>, ageCut <fct>, bmiCut <fct>,
## #   diagnosedCode <fct>, hba1cCutoff <dbl>

summary(nhanesDataFull$age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.00   31.00   44.00   45.22   59.00   79.00

table(nhanesDataFull$ageCut) #column names are case-sensitive: the variable is ageCut, not agecut

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$gender)

##    
##     male female
##   0   91     71
##   1 3323   2945

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$race)

##    
##     white black MexA Asian Hispanic other
##   0    33    56   24    22       22     5
##   1  2748  1348  722   651      593   206

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$education)

##    
##     lesshigh high somecollege college
##   0       47   43          46      26
##   1     1144 1332        2025    1767

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$marital)

##    
##     partnered notPartnered
##   0        95           67
##   1      3653         2615

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$PIR)

##    
##      0-1 1-1.32 1.33-1.49 1.5-1.84 1.85-5.0
##   0   34     28        10       10       67
##   1 1240    644       205      403     3311

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$controlWeight)

##    
##      yes   no
##   0  112   50
##   1 3746 2522

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$increaseExercise)

##    
##      yes   no
##   0   94   68
##   1 3720 2548

summary(nhanesDataFull$alcohol)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   2.000   4.039   4.000 365.000

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$smoke)

##    
##      yes   no
##   0   37   50
##   1 1527 1438

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$diet)

##    
##     excellent verygood good fair poor
##   0         7       22   77   45   10
##   1       538     1329 2675 1386  339

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$insurance)

##    
##      yes   no
##   0  116   46
##   1 4707 1561

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$snapCurrent)

##    
##      yes   no
##   0   30    2
##   1 1207  176

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$foodSecure)

##    
##     full marginal  low verylow
##   0  100       21   19      19
##   1 4357      693  658     523

summary(nhanesDataFull$bmi)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   13.60   23.80   27.40   28.53   31.80   82.90      47

table(nhanesDataFull$hba1cCutoff, nhanesDataFull$bmiCut)

##    
##     0-25 25-30  30+
##   0   15    43  102
##   1 2122  2058 2046

summary(nhanesDataFull$systolic)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    66.0   110.0   118.0   121.5   130.0   230.0     433

summary(nhanesDataFull$diastolic)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0    64.0    72.0    71.3    78.0   122.0     433

summary(nhanesDataFull$totKcal1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0    1538    2050    2240    2726   13687     270

summary(nhanesDataFull$totProt1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   54.70   77.86   85.43  106.06  557.87     270

summary(nhanesDataFull$totCarb1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   178.9   246.0   268.8   330.5  1815.0     270

summary(nhanesDataFull$totSugr1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0    63.3   102.3   119.7   153.8  1048.5     270

summary(nhanesDataFull$totFibe1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    9.60   14.80   17.27   22.40  107.00     270

summary(nhanesDataFull$totFat1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   50.98   74.85   84.05  106.72  553.79     270

table(nhanesDataFull$hba1cCutoff)

## 
##    0    1 
##  162 6268

Missing Values

Counting missing

Used logic and survey design to relabel some missing values that should be “no” (e.g. participants who answered “no” to a prior question were not asked the smoking question used for this variable, so they were labeled as “no” for this question).

#percent missing for each variable
p <- function(x) {sum(is.na(x))/length(x)*100}
pmissing <- apply(nhanesDataFull,2,p)
pmissing

##               id       weightQ2Yr       weightE2Yr       hba1cLevel 
##       0.00000000       0.00000000       0.00000000       3.06045530 
##              age           gender             race        education 
##       0.00000000       0.00000000       0.00000000       0.01507613 
##          marital              PIR       famHistory    controlWeight 
##       0.00000000       7.55314337      78.29036635       0.00000000 
## increaseExercise          alcohol            smoke             diet 
##       0.00000000       0.00000000      52.25388210       0.03015227 
##        insurance      snapCurrent       foodSecure              bmi 
##       0.00000000      77.74762551       0.61812151       0.70857832 
##         systolic        diastolic         totKcal1         totKcal2 
##       6.52796623       6.52796623       4.07055631      14.39770843 
##         totProt1         totProt2         totCarb1         totCarb2 
##       4.07055631      14.39770843       4.07055631      14.39770843 
##         totSugr1         totSugr2         totFibe1         totFibe2 
##       4.07055631      14.39770843       4.07055631      14.39770843 
##          totFat1          totFat2           ageCut           bmiCut 
##       4.07055631      14.39770843       2.27649631       0.70857832 
##    diagnosedCode      hba1cCutoff 
##       0.00000000       3.06045530
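
The same percent-missing summary can be sketched with `colMeans(is.na(...))`, which avoids converting the whole data frame to a character matrix the way `apply()` does. The toy data frame below is made up for illustration.

```r
# Invented data frame with a mix of numeric and character columns
toy <- data.frame(
  bmi   = c(22.1, NA, 30.5, 27.8),
  smoke = c("yes", NA, NA, "no"),
  age   = c(25, 40, 61, 33)
)

# is.na() returns a logical matrix; column means of TRUE/FALSE are proportions
pct_missing <- colMeans(is.na(toy)) * 100

# Most-missing variables first
sort(pct_missing, decreasing = TRUE)
```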

#about 78% missing from snapCurrent, so changing missing to "no" because missing values come from the skip pattern of a previous question.
#the prior question was "ever received" SNAP benefits, and people who said no were not asked whether they currently receive them.
#the past-12-months question was not used because the variables differ between 11-12 and 13-14.
nhanesDataFull$snapCurrent <- fct_explicit_na(nhanesDataFull$snapCurrent, na_level = "no")

#about 52% of smoking missing, but coding missing to "no" because the prior question asks "smoked at least 100 cigarettes in life" and if the answer is no/ref/dk, the respondent skips to the end of the questionnaire.
#missing values for this question, "do you now smoke cigarettes", are skipped questions due to no cigarette use.
nhanesDataFull$smoke <- fct_explicit_na(nhanesDataFull$smoke, na_level = "no")
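
Recoding every missing value to "no" assumes that all of the blanks come from the skip pattern. A more conservative sketch, assuming the gate question is also available in the data, fills in "no" only where the gate answer explains the blank and leaves the rest missing. The toy columns `ever100` (standing in for the "smoked at least 100 cigarettes in life" question) and `smokeNow` are hypothetical names with invented values.

```r
toy <- data.frame(
  ever100  = c(1, 2, 1, 2, 1),    # hypothetical gate: 1 = yes, 2 = no
  smokeNow = c(1, NA, 3, NA, NA)  # 1 = every day, 2 = some days, 3 = not at all
)

# Skipped because the respondent never smoked 100 cigarettes
skipped_by_gate <- is.na(toy$smokeNow) & toy$ever100 == 2

toy$smoke <- ifelse(toy$smokeNow %in% c(1, 2), "yes",
             ifelse(toy$smokeNow %in% 3 | skipped_by_gate, "no", NA))

toy
```

The last row stays missing because the respondent passed the gate question but has no answer to the follow-up, so the blank is not explained by the skip pattern.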

pmissing <- apply(nhanesDataFull,2,p)
pmissing

##               id       weightQ2Yr       weightE2Yr       hba1cLevel 
##       0.00000000       0.00000000       0.00000000       3.06045530 
##              age           gender             race        education 
##       0.00000000       0.00000000       0.00000000       0.01507613 
##          marital              PIR       famHistory    controlWeight 
##       0.00000000       7.55314337      78.29036635       0.00000000 
## increaseExercise          alcohol            smoke             diet 
##       0.00000000       0.00000000       0.00000000       0.03015227 
##        insurance      snapCurrent       foodSecure              bmi 
##       0.00000000       0.00000000       0.61812151       0.70857832 
##         systolic        diastolic         totKcal1         totKcal2 
##       6.52796623       6.52796623       4.07055631      14.39770843 
##         totProt1         totProt2         totCarb1         totCarb2 
##       4.07055631      14.39770843       4.07055631      14.39770843 
##         totSugr1         totSugr2         totFibe1         totFibe2 
##       4.07055631      14.39770843       4.07055631      14.39770843 
##          totFat1          totFat2           ageCut           bmiCut 
##       4.07055631      14.39770843       2.27649631       0.70857832 
##    diagnosedCode      hba1cCutoff 
##       0.00000000       3.06045530

md.pattern(nhanesDataFull)

##      id weightQ2Yr weightE2Yr age gender race marital controlWeight
## 1057  1          1          1   1      1    1       1             1
## 3669  1          1          1   1      1    1       1             1
## 94    1          1          1   1      1    1       1             1
## 427   1          1          1   1      1    1       1             1
## 68    1          1          1   1      1    1       1             1
## 261   1          1          1   1      1    1       1             1
## 10    1          1          1   1      1    1       1             1
## 30    1          1          1   1      1    1       1             1
## 66    1          1          1   1      1    1       1             1
## 225   1          1          1   1      1    1       1             1
## 8     1          1          1   1      1    1       1             1
## 26    1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 19    1          1          1   1      1    1       1             1
## 4     1          1          1   1      1    1       1             1
## 35    1          1          1   1      1    1       1             1
## 140   1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 19    1          1          1   1      1    1       1             1
## 5     1          1          1   1      1    1       1             1
## 32    1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 4     1          1          1   1      1    1       1             1
## 20    1          1          1   1      1    1       1             1
## 88    1          1          1   1      1    1       1             1
## 5     1          1          1   1      1    1       1             1
## 33    1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 9     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 8     1          1          1   1      1    1       1             1
## 4     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 4     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 23    1          1          1   1      1    1       1             1
## 75    1          1          1   1      1    1       1             1
## 6     1          1          1   1      1    1       1             1
## 15    1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 6     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 5     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 5     1          1          1   1      1    1       1             1
## 20    1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 5     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 19    1          1          1   1      1    1       1             1
## 3     1          1          1   1      1    1       1             1
## 6     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 2     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
## 1     1          1          1   1      1    1       1             1
##       0          0          0   0      0    0       0             0
##      increaseExercise alcohol smoke insurance snapCurrent diagnosedCode
## 1057                1       1     1         1           1             1
## 3669                1       1     1         1           1             1
## 94                  1       1     1         1           1             1
## 427                 1       1     1         1           1             1
## 68                  1       1     1         1           1             1
## 261                 1       1     1         1           1             1
## 10                  1       1     1         1           1             1
## 30                  1       1     1         1           1             1
## 66                  1       1     1         1           1             1
## 225                 1       1     1         1           1             1
## 8                   1       1     1         1           1             1
## 26                  1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 19                  1       1     1         1           1             1
## 4                   1       1     1         1           1             1
## 35                  1       1     1         1           1             1
## 140                 1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 19                  1       1     1         1           1             1
## 5                   1       1     1         1           1             1
## 32                  1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 4                   1       1     1         1           1             1
## 20                  1       1     1         1           1             1
## 88                  1       1     1         1           1             1
## 5                   1       1     1         1           1             1
## 33                  1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 9                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 8                   1       1     1         1           1             1
## 4                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 4                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 23                  1       1     1         1           1             1
## 75                  1       1     1         1           1             1
## 6                   1       1     1         1           1             1
## 15                  1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 6                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 5                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 5                   1       1     1         1           1             1
## 20                  1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 5                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 19                  1       1     1         1           1             1
## 3                   1       1     1         1           1             1
## 6                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 2                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
## 1                   1       1     1         1           1             1
##                     0       0     0         0           0             0
##      education diet foodSecure bmi bmiCut ageCut hba1cLevel hba1cCutoff
## 1057         1    1          1   1      1      1          1           1
## 3669         1    1          1   1      1      1          1           1
## 94           1    1          1   1      1      1          1           1
## 427          1    1          1   1      1      1          1           1
## 68           1    1          1   1      1      1          1           1
## 261          1    1          1   1      1      1          1           1
## 10           1    1          1   1      1      1          1           1
## 30           1    1          1   1      1      1          1           1
## 66           1    1          1   1      1      1          1           1
## 225          1    1          1   1      1      1          1           1
## 8            1    1          1   1      1      1          1           1
## 26           1    1          1   1      1      1          1           1
## 3            1    1          1   1      1      1          1           1
## 19           1    1          1   1      1      1          1           1
## 4            1    1          1   1      1      1          1           1
## 35           1    1          1   1      1      1          1           1
## 140          1    1          1   1      1      1          1           1
## 3            1    1          1   1      1      1          1           1
## 19           1    1          1   1      1      1          1           1
## 5            1    1          1   1      1      1          1           1
## 32           1    1          1   1      1      1          1           1
## 1            1    1          1   1      1      1          1           1
## 4            1    1          1   1      1      1          1           1
## 20           1    1          1   1      1      1          0           0
## 88           1    1          1   1      1      1          0           0
## 5            1    1          1   1      1      1          0           0
## 33           1    1          1   1      1      1          0           0
## 3            1    1          1   1      1      1          0           0
## 9            1    1          1   1      1      1          0           0
## 3            1    1          1   1      1      1          0           0
## 1            1    1          1   1      1      1          0           0
## 3            1    1          1   1      1      1          0           0
## 8            1    1          1   1      1      1          0           0
## 4            1    1          1   1      1      1          0           0
## 1            1    1          1   1      1      1          0           0
## 4            1    1          1   1      1      1          0           0
## 3            1    1          1   1      1      1          0           0
## 1            1    1          1   1      1      1          0           0
## 1            1    1          1   1      1      1          0           0
## 1            1    1          1   1      1      1          0           0
## 23           1    1          1   1      1      0          1           1
## 75           1    1          1   1      1      0          1           1
## 6            1    1          1   1      1      0          1           1
## 15           1    1          1   1      1      0          1           1
## 2            1    1          1   1      1      0          1           1
## 6            1    1          1   1      1      0          1           1
## 1            1    1          1   1      1      0          1           1
## 1            1    1          1   1      1      0          1           1
## 5            1    1          1   1      1      0          1           1
## 2            1    1          1   1      1      0          1           1
## 1            1    1          1   1      1      0          1           1
## 1            1    1          1   1      1      0          1           1
## 1            1    1          1   1      1      0          0           0
## 3            1    1          1   1      1      0          0           0
## 1            1    1          1   1      1      0          0           0
## 3            1    1          1   1      1      0          0           0
## 1            1    1          1   1      1      0          0           0
## 1            1    1          1   1      1      0          0           0
## 5            1    1          1   0      0      1          1           1
## 20           1    1          1   0      0      1          1           1
## 2            1    1          1   0      0      1          1           1
## 2            1    1          1   0      0      1          1           1
## 3            1    1          1   0      0      1          1           1
## 1            1    1          1   0      0      1          1           1
## 1            1    1          1   0      0      1          1           1
## 2            1    1          1   0      0      1          1           1
## 5            1    1          1   0      0      1          1           1
## 1            1    1          1   0      0      1          1           1
## 2            1    1          1   0      0      1          1           1
## 1            1    1          1   0      0      1          0           0
## 1            1    1          1   0      0      1          0           0
## 1            1    1          1   0      0      1          0           0
## 3            1    1          0   1      1      1          1           1
## 19           1    1          0   1      1      1          1           1
## 3            1    1          0   1      1      1          1           1
## 6            1    1          0   1      1      1          1           1
## 2            1    1          0   1      1      1          1           1
## 1            1    1          0   1      1      1          1           1
## 1            1    1          0   1      1      1          1           1
## 2            1    1          0   1      1      1          1           1
## 1            1    1          0   1      1      1          0           0
## 1            1    1          0   1      1      0          1           1
## 1            1    1          0   1      1      0          1           1
## 1            1    1          0   1      1      0          1           1
## 1            1    0          1   1      1      1          1           1
## 1            1    0          1   1      1      1          1           1
## 1            0    1          1   1      1      1          0           0
##              1    2         41  47     47    151        203         203
##      totKcal1 totProt1 totCarb1 totSugr1 totFibe1 totFat1 systolic diastolic
## 1057        1        1        1        1        1       1        1         1
## 3669        1        1        1        1        1       1        1         1
## 94          1        1        1        1        1       1        1         1
## 427         1        1        1        1        1       1        1         1
## 68          1        1        1        1        1       1        1         1
## 261         1        1        1        1        1       1        1         1
## 10          1        1        1        1        1       1        1         1
## 30          1        1        1        1        1       1        1         1
## 66          1        1        1        1        1       1        0         0
## 225         1        1        1        1        1       1        0         0
## 8           1        1        1        1        1       1        0         0
## 26          1        1        1        1        1       1        0         0
## 3           1        1        1        1        1       1        0         0
## 19          1        1        1        1        1       1        0         0
## 4           1        1        1        1        1       1        0         0
## 35          0        0        0        0        0       0        1         1
## 140         0        0        0        0        0       0        1         1
## 3           0        0        0        0        0       0        1         1
## 19          0        0        0        0        0       0        1         1
## 5           0        0        0        0        0       0        0         0
## 32          0        0        0        0        0       0        0         0
## 1           0        0        0        0        0       0        0         0
## 4           0        0        0        0        0       0        0         0
## 20          1        1        1        1        1       1        1         1
## 88          1        1        1        1        1       1        1         1
## 5           1        1        1        1        1       1        1         1
## 33          1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        1         1
## 9           1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        0         0
## 8           1        1        1        1        1       1        0         0
## 4           1        1        1        1        1       1        0         0
## 1           0        0        0        0        0       0        1         1
## 4           0        0        0        0        0       0        1         1
## 3           0        0        0        0        0       0        1         1
## 1           0        0        0        0        0       0        0         0
## 1           0        0        0        0        0       0        0         0
## 1           0        0        0        0        0       0        0         0
## 23          1        1        1        1        1       1        1         1
## 75          1        1        1        1        1       1        1         1
## 6           1        1        1        1        1       1        1         1
## 15          1        1        1        1        1       1        1         1
## 2           1        1        1        1        1       1        1         1
## 6           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        0         0
## 5           1        1        1        1        1       1        0         0
## 2           1        1        1        1        1       1        0         0
## 1           0        0        0        0        0       0        1         1
## 1           0        0        0        0        0       0        1         1
## 1           1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 1           0        0        0        0        0       0        1         1
## 5           1        1        1        1        1       1        1         1
## 20          1        1        1        1        1       1        1         1
## 2           1        1        1        1        1       1        1         1
## 2           1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        0         0
## 1           1        1        1        1        1       1        0         0
## 1           1        1        1        1        1       1        0         0
## 2           0        0        0        0        0       0        1         1
## 5           0        0        0        0        0       0        1         1
## 1           0        0        0        0        0       0        1         1
## 2           0        0        0        0        0       0        0         0
## 1           1        1        1        1        1       1        1         1
## 1           0        0        0        0        0       0        1         1
## 1           0        0        0        0        0       0        0         0
## 3           1        1        1        1        1       1        1         1
## 19          1        1        1        1        1       1        1         1
## 3           1        1        1        1        1       1        1         1
## 6           1        1        1        1        1       1        1         1
## 2           1        1        1        1        1       1        0         0
## 1           0        0        0        0        0       0        1         1
## 1           0        0        0        0        0       0        1         1
## 2           0        0        0        0        0       0        0         0
## 1           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        0         0
## 1           1        1        1        1        1       1        1         1
## 1           1        1        1        1        1       1        1         1
## 1           0        0        0        0        0       0        0         0
##           270      270      270      270      270     270      433       433
##      PIR totKcal2 totProt2 totCarb2 totSugr2 totFibe2 totFat2 famHistory      
## 1057   1        1        1        1        1        1       1          1     0
## 3669   1        1        1        1        1        1       1          0     1
## 94     1        0        0        0        0        0       0          1     6
## 427    1        0        0        0        0        0       0          0     7
## 68     0        1        1        1        1        1       1          1     1
## 261    0        1        1        1        1        1       1          0     2
## 10     0        0        0        0        0        0       0          1     7
## 30     0        0        0        0        0        0       0          0     8
## 66     1        1        1        1        1        1       1          1     2
## 225    1        1        1        1        1        1       1          0     3
## 8      1        0        0        0        0        0       0          1     8
## 26     1        0        0        0        0        0       0          0     9
## 3      0        1        1        1        1        1       1          1     3
## 19     0        1        1        1        1        1       1          0     4
## 4      0        0        0        0        0        0       0          0    10
## 35     1        0        0        0        0        0       0          1    12
## 140    1        0        0        0        0        0       0          0    13
## 3      0        0        0        0        0        0       0          1    13
## 19     0        0        0        0        0        0       0          0    14
## 5      1        0        0        0        0        0       0          1    14
## 32     1        0        0        0        0        0       0          0    15
## 1      0        0        0        0        0        0       0          1    15
## 4      0        0        0        0        0        0       0          0    16
## 20     1        1        1        1        1        1       1          1     2
## 88     1        1        1        1        1        1       1          0     3
## 5      1        0        0        0        0        0       0          1     8
## 33     1        0        0        0        0        0       0          0     9
## 3      0        1        1        1        1        1       1          1     3
## 9      0        1        1        1        1        1       1          0     4
## 3      0        0        0        0        0        0       0          1     9
## 1      0        0        0        0        0        0       0          0    10
## 3      1        1        1        1        1        1       1          1     4
## 8      1        1        1        1        1        1       1          0     5
## 4      1        0        0        0        0        0       0          0    11
## 1      1        0        0        0        0        0       0          1    14
## 4      1        0        0        0        0        0       0          0    15
## 3      0        0        0        0        0        0       0          0    16
## 1      1        0        0        0        0        0       0          1    16
## 1      1        0        0        0        0        0       0          0    17
## 1      0        0        0        0        0        0       0          0    18
## 23     1        1        1        1        1        1       1          1     1
## 75     1        1        1        1        1        1       1          0     2
## 6      1        0        0        0        0        0       0          1     7
## 15     1        0        0        0        0        0       0          0     8
## 2      0        1        1        1        1        1       1          1     2
## 6      0        1        1        1        1        1       1          0     3
## 1      0        0        0        0        0        0       0          0     9
## 1      1        1        1        1        1        1       1          1     3
## 5      1        1        1        1        1        1       1          0     4
## 2      1        0        0        0        0        0       0          0    10
## 1      1        0        0        0        0        0       0          0    14
## 1      0        0        0        0        0        0       0          0    15
## 1      1        1        1        1        1        1       1          1     3
## 3      1        1        1        1        1        1       1          0     4
## 1      1        0        0        0        0        0       0          1     9
## 3      1        0        0        0        0        0       0          0    10
## 1      0        1        1        1        1        1       1          0     5
## 1      1        0        0        0        0        0       0          1    15
## 5      1        1        1        1        1        1       1          1     2
## 20     1        1        1        1        1        1       1          0     3
## 2      0        1        1        1        1        1       1          1     3
## 2      0        1        1        1        1        1       1          0     4
## 3      1        1        1        1        1        1       1          0     5
## 1      1        0        0        0        0        0       0          1    10
## 1      0        1        1        1        1        1       1          0     6
## 2      1        0        0        0        0        0       0          1    14
## 5      1        0        0        0        0        0       0          0    15
## 1      0        0        0        0        0        0       0          0    16
## 2      1        0        0        0        0        0       0          0    17
## 1      1        1        1        1        1        1       1          1     4
## 1      1        0        0        0        0        0       0          0    17
## 1      1        0        0        0        0        0       0          0    19
## 3      0        1        1        1        1        1       1          1     2
## 19     0        1        1        1        1        1       1          0     3
## 3      0        0        0        0        0        0       0          1     8
## 6      0        0        0        0        0        0       0          0     9
## 2      0        1        1        1        1        1       1          0     5
## 1      0        0        0        0        0        0       0          1    14
## 1      0        0        0        0        0        0       0          0    15
## 2      0        0        0        0        0        0       0          0    17
## 1      0        1        1        1        1        1       1          0     5
## 1      0        1        1        1        1        1       1          0     4
## 1      0        0        0        0        0        0       0          0    10
## 1      0        1        1        1        1        1       1          0     6
## 1      1        1        1        1        1        1       1          1     1
## 1      1        0        0        0        0        0       0          0     8
## 1      0        0        0        0        0        0       0          0    19
##      501      955      955      955      955      955     955       5193 14605

#md.pairs(nhanesDataFull)

Multiple imputation

Missing values were imputed with the mice() function from the mice package, producing m = 5 imputed datasets.

impute <- mice(nhanesDataFull[,2:34], m=5, seed=619)

## 
##  iter imp variable
##   1   1  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   1   2  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   1   3  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   1   4  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   1   5  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   2   1  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   2   2  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   2   3  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   2   4  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   2   5  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   3   1  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   3   2  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   3   3  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   3   4  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   3   5  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   4   1  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   4   2  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   4   3  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   4   4  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   4   5  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   5   1  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   5   2  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   5   3  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   5   4  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2
##   5   5  hba1cLevel  education  PIR  diet  foodSecure  bmi  systolic  diastolic  totKcal1  totKcal2  totProt1  totProt2  totCarb1  totCarb2  totSugr1  totSugr2  totFibe1  totFibe2  totFat1  totFat2

## Warning: Number of logged events: 1

stripplot(impute, pch=20, cex=1.2)

Imputation produced 5 datasets.
Although models should ideally be fit to each of the 5 datasets separately and the estimates then pooled, for this project the bivariate analyses and regression models were run on the first dataset, and the random forest model was run on the 5 datasets stacked together.
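The pooling step described above can be sketched with Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and its variance combines the average within-imputation variance with the between-imputation variance. The following is a minimal base-R illustration on simulated data, not the analysis run in this project; `pool_rubin` and the toy datasets are hypothetical names introduced only for the example. In practice, `mice::pool()` applied to model fits from `with()` performs this pooling, with additional degrees-of-freedom adjustments.

```r
# Rubin's rules for one coefficient estimated on m imputed datasets
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  q_bar <- mean(estimates)            # pooled point estimate
  w <- mean(std_errors^2)             # within-imputation variance
  b <- var(estimates)                 # between-imputation variance
  total_var <- w + (1 + 1 / m) * b    # total variance
  c(estimate = q_bar, se = sqrt(total_var))
}

# Toy stand-ins for 5 imputed datasets (simulated, for illustration only)
set.seed(1)
fits <- lapply(1:5, function(i) {
  d <- data.frame(x = rnorm(200))
  d$y <- rbinom(200, 1, plogis(-1 + 0.5 * d$x))
  glm(y ~ x, data = d, family = binomial)
})

# Extract the log-odds coefficient for x and its standard error from each fit
est <- vapply(fits, function(f) coef(f)[["x"]], numeric(1))
se  <- vapply(fits, function(f) sqrt(diag(vcov(f)))[["x"]], numeric(1))

pooled <- pool_rubin(est, se)
exp(pooled[["estimate"]])  # pooled odds ratio for x
```

The pooled standard error is never smaller than the average within-imputation standard error, which is how the extra uncertainty from imputation enters the result.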

library(mice)

impute1 <- mice::complete(impute, 1)
impute2 <- mice::complete(impute, 2)
impute3 <- mice::complete(impute, 3)
impute4 <- mice::complete(impute, 4)
impute5 <- mice::complete(impute, 5)
imputeLong <- mice::complete(impute, "long", inc=FALSE)

imputeLong <- subset(imputeLong, select=-c(.imp))
imputeLong <- subset(imputeLong, select=-c(.id))


impute1 <- impute1 %>%
  mutate(hba1cCutoff = ifelse(hba1cLevel<6.5, 0, 1)) %>%
  mutate(totKcal=(totKcal1+totKcal2)/2) %>%
  mutate(totProt=(totProt1 + totProt2)/2) %>%
  mutate(totCarb=(totCarb1+totCarb2)/2) %>%
  mutate(totSugr=(totSugr1+totSugr2)/2) %>%
  mutate(totFibe=(totFibe1+totFibe2)/2) %>%
  mutate(totFat=(totFat1+totFat2)/2)

imputeLong <- imputeLong %>%
  mutate(hba1cCutoff = ifelse(hba1cLevel<6.5, 0, 1))  %>%
  mutate(totKcal=(totKcal1+totKcal2)/2) %>%
  mutate(totProt=(totProt1 + totProt2)/2) %>%
  mutate(totCarb=(totCarb1+totCarb2)/2) %>%
  mutate(totSugr=(totSugr1+totSugr2)/2) %>%
  mutate(totFibe=(totFibe1+totFibe2)/2) %>%
  mutate(totFat=(totFat1+totFat2)/2)

impute1 <- impute1 %>%
  mutate(hba1cCutoffFactor = factor(hba1cCutoff, levels=c("0","1"), labels=c("no", "yes")))
#"no" = no undiagnosed diabetes (HbA1c < 6.5), "yes" = undiagnosed diabetes (HbA1c >= 6.5)

imputeLong <- imputeLong %>%
  mutate(hba1cCutoffFactor = factor(hba1cCutoff, levels=c("0","1"), labels=c("no", "yes"))) 

#table(imputeLong$hba1cCutoff)

Bivariate Analyses

Chi-square tests and ANOVA were conducted on categorical and continuous variables, respectively.
Bivariate analyses were conducted to identify significant factors to then use in the regression and random forest models.
Significant in bivariate analyses: age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet

Factors significant in Bivariate Analysis:

Demographic Factors

* Age (p<0.001)
* Race: 4% of undiagnosed adults who were non-Hispanic Black or Other Hispanic had high HbA1c levels (p<0.001)
* Education (p<0.001)
* Income: adults between 133-149% of the poverty line had the highest proportion of high HbA1c levels (5%), compared to 2% among those with incomes below the poverty line (p<0.001)
* Not significant: gender, marital status

Dietary Factors

* BMI: 5% of adults with no diabetes diagnosis and a BMI over 30 had HbA1c levels over 6.5, compared to 1% of adults with BMI <25 (p<0.001)
* Food security: 4% of undiagnosed adults with very low food security had high HbA1c levels (p<0.001)
* Diet quality: worse self-rated diet quality was associated with greater HbA1c (p=0.011)
* Alcohol consumption (p=0.004)
* Total calorie intake (p=0.014)
* Total carbohydrate intake (p=0.029)
* Not significant: protein, fiber, fat, smoking
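Percentages like those listed above are weighted column proportions: the share of each category's survey weight that falls in the high-HbA1c group. The sketch below shows one way to compute them in base R on a small made-up table; the data frame `toy` and its values are hypothetical and do not come from NHANES.

```r
# Hypothetical toy data: outcome flag, BMI category, and survey weight
toy <- data.frame(
  highA1c = c(1, 0, 0, 0, 1, 0, 0, 0),
  bmiCat  = c("30+", "30+", "30+", "30+", "<25", "<25", "<25", "<25"),
  weight  = c(2, 1, 1, 1, 1, 3, 3, 3)
)

# Weighted cross-tabulation: each cell sums the weights of its rows
wtab <- xtabs(weight ~ highA1c + bmiCat, data = toy)

# Column proportions: share of each BMI category with high HbA1c
wprop <- prop.table(wtab, margin = 2)
wprop["1", "30+"]  # 2 / 5  = 0.4
wprop["1", "<25"]  # 1 / 10 = 0.1
```

A weighted test of independence, such as the `weights::wtd.chi.sq` calls used in this report, then asks whether these proportions differ across categories by more than chance would explain.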

#chisqAgeCut <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$ageCut, weight=imputeLong$weightE2Yr)}
chisqAge <- summary(aov(age ~ hba1cCutoff, data=impute1))
chisqGender <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$gender, weight=impute1$weightE2Yr)}
chisqRace <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$race, weight=impute1$weightE2Yr)}
chisqEd <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$education, weight=impute1$weightE2Yr)}
chisqPir <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$PIR, weight=impute1$weightE2Yr)}
chisqBmi <- summary(aov(bmi ~ hba1cCutoff, data=impute1)) #first imputed dataset, consistent with the other bivariate tests
chisqFamHist <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$famHist, weight=impute1$weightE2Yr)}
chisqInsurance <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$insurance, weight=impute1$weightE2Yr)}
chisqMar <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$marital, weight=impute1$weightE2Yr)}
chisqFood <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$foodSecure, weight=impute1$weightE2Yr)}
chisqSmoke <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$smoke, weight=impute1$weightE2Yr)}
chisqAlcohol <- summary(aov(alcohol ~ hba1cCutoff, data=impute1)) #first imputed dataset, consistent with the other bivariate tests
chisqDiet <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$diet, weight=impute1$weightE2Yr)}
chisqSnap <- {weights::wtd.chi.sq(impute1$hba1cCutoff, impute1$snapCurrent, weight=impute1$weightE2Yr)}

#Nutrient factors
chisqKcal <- summary(aov(totKcal ~ hba1cCutoff, data=impute1))
chisqProt <- summary(aov(totProt ~ hba1cCutoff, data=impute1))
chisqCarb <- summary(aov(totCarb ~ hba1cCutoff, data=impute1))
chisqSugr <- summary(aov(totSugr ~ hba1cCutoff, data=impute1))
chisqFibe <- summary(aov(totFibe ~ hba1cCutoff, data=impute1))
chisqFat <-  summary(aov(totFat ~ hba1cCutoff, data=impute1))

Linear Regression

Linear regression was tested but not included in the final analysis.

nhanesLm1 <- lm(data=impute1, hba1cLevel ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet)
summary(nhanesLm1)

## 
## Call:
## lm(formula = hba1cLevel ~ age + race + education + PIR + bmi + 
##     insurance + foodSecure + totKcal + totProt + totCarb + totSugr + 
##     totFat + diet, data = impute1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0674 -0.2493 -0.0183  0.2003  8.4166 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           4.392e+00  5.069e-02  86.640  < 2e-16 ***
## age                   1.128e-02  4.327e-04  26.080  < 2e-16 ***
## raceblack             1.638e-01  1.709e-02   9.585  < 2e-16 ***
## raceMexA              9.804e-02  2.284e-02   4.292 1.80e-05 ***
## raceAsian             2.271e-01  2.364e-02   9.607  < 2e-16 ***
## raceHispanic          1.124e-01  2.354e-02   4.775 1.83e-06 ***
## raceother             1.107e-01  3.673e-02   3.013 0.002598 ** 
## educationhigh        -4.225e-02  2.098e-02  -2.013 0.044123 *  
## educationsomecollege -8.837e-02  2.001e-02  -4.416 1.02e-05 ***
## educationcollege     -1.004e-01  2.250e-02  -4.463 8.22e-06 ***
## PIR1-1.32             3.821e-02  2.383e-02   1.603 0.108888    
## PIR1.33-1.49         -5.187e-03  3.651e-02  -0.142 0.887047    
## PIR1.5-1.84          -2.226e-02  2.825e-02  -0.788 0.430709    
## PIR1.85-5.0          -5.024e-03  1.874e-02  -0.268 0.788695    
## bmi                   1.538e-02  1.005e-03  15.301  < 2e-16 ***
## insuranceno           6.667e-02  1.625e-02   4.104 4.11e-05 ***
## foodSecuremarginal    1.325e-02  2.143e-02   0.618 0.536471    
## foodSecurelow         4.662e-02  2.263e-02   2.060 0.039425 *  
## foodSecureverylow     7.629e-02  2.497e-02   3.055 0.002262 ** 
## totKcal              -1.632e-04  4.060e-05  -4.021 5.87e-05 ***
## totProt               5.770e-04  3.202e-04   1.802 0.071628 .  
## totCarb               7.993e-04  2.309e-04   3.461 0.000541 ***
## totSugr              -9.934e-05  2.054e-04  -0.484 0.628652    
## totFat                1.770e-03  4.531e-04   3.908 9.41e-05 ***
## dietverygood          1.965e-02  2.624e-02   0.749 0.453829    
## dietgood              5.139e-02  2.450e-02   2.098 0.035974 *  
## dietfair              5.515e-02  2.699e-02   2.044 0.041020 *  
## dietpoor              3.455e-02  3.667e-02   0.942 0.346148    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.522 on 6605 degrees of freedom
## Multiple R-squared:  0.1684, Adjusted R-squared:  0.165 
## F-statistic: 49.55 on 27 and 6605 DF,  p-value: < 2.2e-16

Logistic Regression

Logistic regression conducted with HbA1c above or below 6.5 as the outcome and features in the model were: age, race, education, PIR, bmi, insurance, foodSecure, totKcal, totProt, totCarb, totSugr, totFat, diet.

Results of Logistic Regression

Older age was significantly associated with greater odds of UDM (OR=1.06, 95% confidence interval [CI]=1.05, 1.07)
Non-white race/ethnicity was significantly associated with greater odds of UDM for the Black, Mexican American, Asian, and Hispanic groups (ORs from 2.53 to 7.69); the "other" group was not significant (p=0.18).
Income of 100-132% of the poverty line was most strongly associated with UDM (OR=1.76, CI=1.05, 2.93)
Higher BMI was associated with greater likelihood of UDM (OR=1.09, CI=1.07, 1.11)
Compared to “Excellent” diet quality, diet quality rated as “Good” or “Fair” was associated with the highest odds of UDM (OR=1.97, CI=0.99, 4.53 and OR=1.96, CI=0.93, 4.67, respectively), although neither confidence interval excluded 1.

#checking the class of the hba1cCutoff variable (numeric 0/1)
summary(impute1$hba1cCutoff)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.00000 0.00000 0.02533 0.00000 1.00000

class(impute1$hba1cCutoff)

## [1] "numeric"

options(scipen=999) #prevents ORs from being in scientific notation. 999 sets a high threshold for R to use scientific notation

#logistic regression
nhanesGlm <- glm(impute1$hba1cCutoffFactor ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet, data = impute1, family = binomial(logit))
summary(nhanesGlm)

## 
## Call:
## glm(formula = impute1$hba1cCutoffFactor ~ age + race + education + 
##     PIR + bmi + insurance + foodSecure + totKcal + totProt + 
##     totCarb + totSugr + totFat + diet, family = binomial(logit), 
##     data = impute1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2093  -0.2323  -0.1466  -0.0911   3.2746  
## 
## Coefficients:
##                         Estimate  Std. Error z value             Pr(>|z|)    
## (Intercept)          -10.7523711   0.7743729 -13.885 < 0.0000000000000002 ***
## age                    0.0561902   0.0063070   8.909 < 0.0000000000000002 ***
## raceblack              1.1808274   0.2286575   5.164       0.000000241501 ***
## raceMexA               0.9262727   0.2964812   3.124              0.00178 ** 
## raceAsian              2.0399434   0.3177570   6.420       0.000000000136 ***
## raceHispanic           1.2073283   0.2953914   4.087       0.000043658112 ***
## raceother              0.7189765   0.5368657   1.339              0.18050    
## educationhigh         -0.0026625   0.2281543  -0.012              0.99069    
## educationsomecollege  -0.1900053   0.2332140  -0.815              0.41523    
## educationcollege      -0.4578604   0.2902654  -1.577              0.11471    
## PIR1-1.32              0.5654851   0.2609508   2.167              0.03023 *  
## PIR1.33-1.49           0.4507932   0.3843435   1.173              0.24084    
## PIR1.5-1.84           -0.0451222   0.3422479  -0.132              0.89511    
## PIR1.85-5.0           -0.0288647   0.2403372  -0.120              0.90440    
## bmi                    0.0876000   0.0097157   9.016 < 0.0000000000000002 ***
## insuranceno            0.3264765   0.2002705   1.630              0.10306    
## foodSecuremarginal     0.2935464   0.2465230   1.191              0.23375    
## foodSecurelow          0.0131159   0.2696133   0.049              0.96120    
## foodSecureverylow      0.4299186   0.2926731   1.469              0.14185    
## totKcal               -0.0004712   0.0006884  -0.685              0.49366    
## totProt                0.0045315   0.0047827   0.947              0.34340    
## totCarb                0.0007941   0.0035236   0.225              0.82169    
## totSugr                0.0006522   0.0028989   0.225              0.82198    
## totFat                 0.0042506   0.0072296   0.588              0.55657    
## dietverygood           0.2396848   0.4228829   0.567              0.57086    
## dietgood               0.6791478   0.3838441   1.769              0.07684 .  
## dietfair               0.6741648   0.4072600   1.655              0.09785 .  
## dietpoor               0.4865069   0.5115790   0.951              0.34161    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1566.8  on 6632  degrees of freedom
## Residual deviance: 1307.3  on 6605  degrees of freedom
## AIC: 1363.3
## 
## Number of Fisher Scoring iterations: 7

exp(cbind(OR = coef(nhanesGlm), confint(nhanesGlm)))

##                                 OR          2.5 %         97.5 %
## (Intercept)          0.00002139462 0.000004492566  0.00009393096
## age                  1.05779883641 1.045024833523  1.07121044972
## raceblack            3.25706787827 2.094355182183  5.14781371962
## raceMexA             2.52507987330 1.400299674821  4.49831882173
## raceAsian            7.69017427812 4.093711862749 14.29833315732
## raceHispanic         3.34453702824 1.855766339707  5.93913681895
## raceother            2.05233167442 0.626848851005  5.35065476897
## educationhigh        0.99734106033 0.636632040928  1.56050451274
## educationsomecollege 0.82695472438 0.522977828604  1.30734118897
## educationcollege     0.63263575718 0.354905355392  1.11085371732
## PIR1-1.32            1.76030151674 1.050593043135  2.93284832215
## PIR1.33-1.49         1.56955666247 0.703726673907  3.21975444689
## PIR1.5-1.84          0.95588063417 0.472250400414  1.82477684502
## PIR1.85-5.0          0.97154786350 0.609800751505  1.56718592599
## bmi                  1.09155136702 1.070913583366  1.11253683058
## insuranceno          1.38607563541 0.930191038657  2.04248464114
## foodSecuremarginal   1.34117540262 0.812344174491  2.14263558417
## foodSecurelow        1.01320228823 0.583294993838  1.68644025922
## foodSecureverylow    1.53713234016 0.845884259884  2.67774570261
## totKcal              0.99952890486 0.998069860812  1.00076838290
## totProt              1.00454175126 0.995199119489  1.01405717623
## totCarb              1.00079446438 0.994113885186  1.00797631949
## totSugr              1.00065245966 0.994970193799  1.00634299413
## totFat               1.00425968261 0.990808824503  1.01935284887
## dietverygood         1.27084845545 0.576902924263  3.09577684952
## dietgood             1.97219633459 0.985767989638  4.52543247381
## dietfair             1.96239336509 0.928437957719  4.67336474941
## dietpoor             1.62662428103 0.595976695227  4.55373398918
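The odds ratios in the table above are the exponentiated logistic regression coefficients. As a worked example, the sketch below reproduces the BMI odds ratio from its coefficient and standard error in the model summary, using a Wald interval. Note that `confint()` on a `glm` uses profile likelihood, so its limits can differ slightly from the Wald limits computed here.

```r
# Coefficient and standard error for bmi, copied from the glm summary
beta <- 0.0876000
se   <- 0.0097157

# Odds ratio per 1-unit increase in BMI
or <- exp(beta)

# Wald 95% confidence interval on the odds-ratio scale
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

round(c(OR = or, lower = ci[1], upper = ci[2]), 2)
# OR 1.09, lower 1.07, upper 1.11

# Odds ratio for a 5-unit difference in BMI (same coefficient, scaled)
or5 <- exp(5 * beta)
```

Because the odds ratio applies per unit of the predictor, small per-unit values such as 1.09 for BMI or 1.06 for age compound over larger differences.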

#box plot of age, above vs below HbA1c cutoff
ggplot(data = impute1, aes(x = factor(hba1cCutoffFactor), y = age)) +
  geom_boxplot() +
  labs(title = "above/below Status by variable age") +
  labs(x = "hba1c")

#box plot of bmi, above vs below HbA1c cutoff
ggplot(data = impute1, aes(x = factor(hba1cCutoffFactor), y = bmi)) +
  geom_boxplot() +
  labs(title = "above/below Status by variable bmi") +
  labs(x = "hba1c")

Random forest classifier

Train and Test Set

The random forest used the stacked dataset containing all 5 imputations (imputeLong) to increase n.

# Split into train and test sets
# Training set : Test set = 70 : 30 (random)
set.seed(619)
train <- sample(nrow(imputeLong), 0.7*nrow(imputeLong), replace = FALSE)
trainSet <- imputeLong[train,]
testSet <- imputeLong[-train,]
#summary(trainSet)
#summary(testSet)
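One caveat with splitting stacked imputations by row is that each participant appears 5 times, so copies of the same person can land in both the training and test sets. A possible alternative, sketched below on toy data, is to split on participant rather than on row. This assumes a participant identifier is retained in the stacked data (the `.id` column was dropped earlier in this report); `stacked`, `trainToy`, and `testToy` are hypothetical names used only for illustration.

```r
set.seed(619)

# Toy stacked data: 10 hypothetical participants x 5 imputations
stacked <- data.frame(
  id  = rep(1:10, times = 5),
  imp = rep(1:5, each = 10),
  x   = rnorm(50)
)

# Sample 70% of participants, then keep all of their imputed rows together
ids      <- unique(stacked$id)
trainIds <- sample(ids, size = floor(0.7 * length(ids)))

trainToy <- stacked[stacked$id %in% trainIds, ]
testToy  <- stacked[!stacked$id %in% trainIds, ]

# No participant appears in both sets
length(intersect(trainToy$id, testToy$id))
```

With this kind of split, test-set performance reflects predictions for people the model has not seen in any imputed form.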

Model

Features significant in bivariate analyses were included in the random forest model (same as the logistic regression model)
Model was trained on 70% of the data, then tested on the remaining 30%
0.47% OOB error estimate
Imbalanced classes

Results of Random Forest

Age and BMI stand out as the most important factors for predicting the "yes" (UDM) class, followed by race
Agreement with the logistic regression model supports the robustness of the analysis
0.47% error in OOB
Although the OOB error of 0.47% seems small, it should be interpreted with caution given the highly imbalanced classes (98%/2%), which make the model challenging to train; the OOB class error for "yes" was about 18%, compared with under 0.01% for "no".

#factors significant in bivariate analysis
nhanesRF <- randomForest(hba1cCutoffFactor ~ age + race + education + PIR + bmi + insurance + foodSecure + totKcal + totProt + totCarb + totSugr + totFat + diet, data = trainSet, importance = TRUE)
nhanesRF

## 
## Call:
##  randomForest(formula = hba1cCutoffFactor ~ age + race + education +      PIR + bmi + insurance + foodSecure + totKcal + totProt +      totCarb + totSugr + totFat + diet, data = trainSet, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 3
## 
##         OOB estimate of  error rate: 0.47%
## Confusion matrix:
##        no yes   class.error
## no  22632   2 0.00008836264
## yes   106 475 0.18244406196

nhanesRF$importance

##                     no        yes MeanDecreaseAccuracy MeanDecreaseGini
## age        0.014458932 0.40380362          0.024210438        116.14789
## race       0.010268439 0.30047804          0.017537435         54.90022
## education  0.006328508 0.19512387          0.011057085         39.58542
## PIR        0.005741501 0.18482153          0.010238447         44.26545
## bmi        0.013267018 0.41744008          0.023397602        152.56936
## insurance  0.001677071 0.04755306          0.002824153         14.11301
## foodSecure 0.003750330 0.10225539          0.006213488         36.35857
## totKcal    0.020700740 0.18865957          0.024902645        120.38012
## totProt    0.013124202 0.18611076          0.017451322        128.64973
## totCarb    0.016817834 0.17448572          0.020767873        119.54479
## totSugr    0.012074805 0.17334510          0.016115944        122.08952
## totFat     0.014714244 0.18621095          0.019015413        126.85139
## diet       0.005287332 0.15173024          0.008947169         40.02254

# Predicting on test set
nhanesRFPrediction <- predict(nhanesRF, testSet, type = "class")
table(nhanesRFPrediction)

## nhanesRFPrediction
##   no  yes 
## 9726  224

# Checking classification accuracy
mean(nhanesRFPrediction == testSet$hba1cCutoffFactor)

## [1] 0.9963819

table(nhanesRFPrediction, testSet$hba1cCutoffFactor)

##                   
## nhanesRFPrediction   no  yes
##                no  9691   35
##                yes    1  223
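Because the classes are highly imbalanced, overall accuracy alone can be misleading. The sketch below recomputes accuracy from the test-set confusion matrix above and adds sensitivity, specificity, precision, and the accuracy of a trivial rule that always predicts "no". The counts are copied from the table above; the object names are introduced for this example.

```r
# Counts from the test-set confusion matrix
# (rows = predicted class, columns = observed class)
cm <- matrix(c(9691, 1, 35, 223), nrow = 2,
             dimnames = list(predicted = c("no", "yes"),
                             observed  = c("no", "yes")))

tp <- cm["yes", "yes"]  # predicted yes, observed yes
fn <- cm["no",  "yes"]  # predicted no,  observed yes
fp <- cm["yes", "no"]   # predicted yes, observed no
tn <- cm["no",  "no"]   # predicted no,  observed no

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)      # share of observed "yes" that were caught
specificity <- tn / (tn + fp)      # share of observed "no" correctly cleared
precision   <- tp / (tp + fp)      # share of predicted "yes" that were correct
baseline    <- (tn + fp) / sum(cm) # accuracy of always predicting "no"

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision,
        baseline = baseline), 4)
```

Accuracy is about 0.9964, but always predicting "no" would already reach about 0.9741, and sensitivity is about 0.8643, so roughly 14% of test-set cases with HbA1c at or above 6.5 were missed.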

Limitations

NHANES measures HbA1c and glucose (glucose was not included in this study) only once. However, some studies indicate that confirmatory testing may be required to prevent overestimation of UDM in the population (i.e., a single test may be an anomaly) (7). Furthermore, as is always the case when working with NHANES data, the data are cross-sectional, so no causal inference can be made without temporality. Self-reported data may be unreliable, including self-reported diabetes diagnosis and 24-hour dietary recalls. With dietary recall data specifically, social desirability bias may play an important role. These data lacked information about the geographical distribution of the cases. Prior research indicates that geographical region plays an important role in the diabetes epidemic. Yet location information is restricted in this database to protect participants and is only accessible through an application to NHANES, which is beyond the scope of this project. Finally, the data had imbalanced classes for the semi-rare event of UDM (about 2% of the undiagnosed sample), which may make the models unreliable without adjusting for the imbalance. Adjustment for class imbalance was not done in this project.
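One simple adjustment for class imbalance is to down-sample the majority class in the training data so that both classes are equally represented. The sketch below illustrates the idea in base R on simulated data with a 98%/2% split; it was not applied in this project, and `toy` and `balanced` are hypothetical names. The `randomForest()` function also accepts `strata` and `sampsize` arguments, which can be used to draw balanced samples for each tree instead.

```r
set.seed(619)

# Simulated training data with a 98% / 2% class split
toy <- data.frame(
  y = factor(c(rep("no", 980), rep("yes", 20)), levels = c("no", "yes")),
  x = rnorm(1000)
)

# Keep every minority-class row and an equal-sized random sample of the majority
yesRows <- which(toy$y == "yes")
noRows  <- sample(which(toy$y == "no"), length(yesRows))

balanced <- toy[c(yesRows, noRows), ]
table(balanced$y)
```

Down-sampling discards majority-class rows, so it trades some information for balance; any model trained this way should still be evaluated on data with the original class proportions.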

Conclusions

Ultimately, this project found that older age and higher body mass index were the strongest predictors of UDM in both the logistic regression and random forest models. Diet did not directly predict uncontrolled glucose levels, although obesity (resulting from a combination of diet, genetics, and other lifestyle factors) did. Those with incomes that were low yet above the poverty line may be more at risk for UDM. This somewhat surprising finding could be due to differences in access to social welfare programs, such as Medicaid, between those below and above the poverty line. Although health insurance was included in the dataset, it was not separated into distinct types of insurance and thus may have been too blunt to show this nuance. Programs screening for undiagnosed diabetes may consider focusing on older, overweight/obese adults with low incomes just above the poverty line. Further research should refine the model to include weighting for national representativeness of the data, as well as a more nuanced factor for health insurance.

References

1. Stokes, A., & Preston, S. H. (2017). The contribution of rising adiposity to the increasing prevalence of diabetes in the United States. Preventive Medicine, 101, 91-95. doi:10.1016/j.ypmed.2017.05.031
2. Dall TM, Yang W, Halder P, Pang B, Massoudi M, Wintfeld N et al. The Economic Burden of Elevated Blood Glucose Levels in 2012: Diagnosed and Undiagnosed Diabetes, Gestational Diabetes Mellitus, and Prediabetes. Diabetes Care. 2014;37(12):3172-9. doi:10.2337/dc14-1036
3. Centers for Disease Control and Prevention. National Diabetes Statistics Report, 2017. Atlanta, GA: Centers for Disease Control and Prevention, US Department of Health and Human Services; 2017.
4. US Centers for Disease Control and Prevention, & National Center for Health Statistics. (2014). National Health and Nutrition Examination Survey (NHANES) data. U.S. Department of Health and Human Services. https://wwwn.cdc.gov/nchs/nhanes/Default.aspx
5. Johnson, C. L., Dohrmann, S. M., Burt, V. L., & Mohadjer, L. K. (2014). National Health and Nutrition Examination Survey: Sample design, 2011–2014. Vital Health Statistics, Series 2(162), 1-33.
6. Kavakiotis, I., Tsave, O., Salifoglou, A., Maglaveras, N., Vlahavas, I., & Chouvarda, I. (2017). Machine Learning and Data Mining Methods in Diabetes Research. Computational and Structural Biotechnology Journal, 15, 104-116. doi:10.1016/j.csbj.2016.12.005
7. Geiss LS, Bullard KM, Brinks R, Gregg EW. Considerations in Epidemiologic Definitions of Undiagnosed Diabetes. Diabetes Care. 2018;41(9):1835-8. doi:10.2337/dc17-1838

BMIN503 Final Project Methods

Marsha Trego

12/7/2019