Many diabetes surveys do not ask about the type of diabetes, so researchers rely on age at diagnosis to distinguish type 1 from type 2 diabetes (e.g. Koopman et al. 2005). Here we examine whether this proxy is likely to work well.
NHANES is a series of large health surveys. We download and load the latest NHANES data, from the cycle beginning in 2013. One item, DID040, records the self-reported age at which diabetes was diagnosed. Unfortunately, the survey does not ask which type of diabetes was diagnosed.
# http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx?Component=Questionnaire&CycleBeginYear=2013
d_nhanes13 = foreign::read.xport("~/Downloads/DIQ_H.XPT")
First we take a quick look at the descriptive statistics:
psych::describe(d_nhanes13$DID040)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 727 48.59 14.82 50 49.36 14.83 1 80 79 -0.56 0.42
## se
## X1 0.55
Notice that there is a lot of variation, with a standard deviation of about 15 years.
We also look at the proportion of cases diagnosed at or after specified ages:
# round() is called directly so the line does not depend on magrittr's %>% being loaded
round(kirkegaard::percent_cutoff(d_nhanes13$DID040, cutoffs = seq(20, 80, by = 5)), 2)
## 20 25 30 35 40 45 50 55 60 65 70 75 80
## 0.96 0.94 0.91 0.84 0.77 0.66 0.52 0.39 0.26 0.14 0.07 0.02 0.01
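The cutoff proportions can also be computed in base R. The sketch below assumes that `percent_cutoff` reports the share of values at or above each cutoff, which is what the decreasing output suggests. The `ages` vector is toy data for illustration, not NHANES values, and `prop_at_or_above` is a name introduced here.

```r
# Toy ages, purely illustrative (not NHANES values)
ages <- c(12, 25, 38, 45, 50, 52, 60, 67, 71, 80)
cutoffs <- seq(20, 80, by = 20)

# Share of observations at or above each cutoff (assumed definition)
prop_at_or_above <- sapply(cutoffs, function(k) mean(ages >= k, na.rm = TRUE))
names(prop_at_or_above) <- cutoffs

print(prop_at_or_above)
```

Replacing `ages` with `d_nhanes13$DID040` and `cutoffs` with `seq(20, 80, by = 5)` should give figures comparable to those above, if the assumed definition is right.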
Finally, we get a visual overview of the data using a density-histogram:
kirkegaard::GG_denhist(d_nhanes13, var = "DID040") + xlab("Age of diagnosis (self-reported). Red line = mean value.")
The distribution does not show two distinct groups, so age at diagnosis is likely to be at best an imperfect proxy for distinguishing type 1 from type 2 diabetes.
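The visual impression can be supplemented with a rough numerical check: count the local maxima of a kernel density estimate. This is only a minimal sketch. `count_peaks` is a hypothetical helper, the samples are toy data rather than NHANES values, and the result depends on the bandwidth that `density()` chooses, so it is a heuristic rather than a formal test of bimodality.

```r
# Count interior local maxima of a numeric vector (hypothetical helper).
# A peak is a point where the slope changes from positive to negative.
count_peaks <- function(y) {
  sum(diff(sign(diff(y))) == -2)
}

# Toy, deterministic samples (illustrative only, not NHANES values)
one_group  <- qnorm(ppoints(400), mean = 50, sd = 15)
two_groups <- c(qnorm(ppoints(200), mean = 12, sd = 3),
                qnorm(ppoints(200), mean = 55, sd = 5))

# Peaks in the default kernel density estimate of each sample
peaks_one <- count_peaks(density(one_group)$y)
peaks_two <- count_peaks(density(two_groups)$y)
print(c(one_group = peaks_one, two_groups = peaks_two))
```

The same helper could be applied to `density(d_nhanes13$DID040, na.rm = TRUE)$y`. A single dominant peak would be consistent with the histogram, although overlapping groups can still produce a unimodal density.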
Koopman, Richelle J, Arch G Mainous III, Vanessa A Diaz, and Mark E Geesey. 2005. “Changes in Age at Diagnosis of Type 2 Diabetes Mellitus in the United States, 1988 to 2000.” Annals of Family Medicine 3 (1). American Academy of Family Physicians: 60.