## Introduction

Many diabetes surveys do not ask about the type of diabetes, so researchers rely on age at diagnosis to distinguish type 1 from type 2 diabetes (e.g. Koopman et al. (2005)). Here we examine whether this proxy is likely to work well.

## Data

NHANES is a series of large health surveys. We download and load the latest NHANES data, from the cycle beginning in 2013. One item, DID040, records the self-reported age at which diabetes was diagnosed. Unfortunately, the survey does not ask which type of diabetes was diagnosed.

    # http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx?Component=Questionnaire&CycleBeginYear=2013
    d_nhanes13 = foreign::read.xport("~/Downloads/DIQ_H.XPT")

## Descriptive analysis

First we take a quick look at the descriptive statistics:

    psych::describe(d_nhanes13$DID040)
    ##    vars   n  mean    sd median trimmed   mad min max range  skew kurtosis
    ## X1    1 727 48.59 14.82     50   49.36 14.83   1  80    79 -0.56     0.42
    ##      se
    ## X1 0.55

Notice that there is a lot of variation, with a standard deviation of about 15 years. We also look at the proportion of cases diagnosed at or after specified ages:

    kirkegaard::percent_cutoff(d_nhanes13$DID040, cutoffs = seq(20, 80, by = 5)) %>% round(2)
    ##   20   25   30   35   40   45   50   55   60   65   70   75   80
    ## 0.96 0.94 0.91 0.84 0.77 0.66 0.52 0.39 0.26 0.14 0.07 0.02 0.01

For example, 91% of respondents were diagnosed at age 30 or later, so only about 9% were diagnosed before age 30.
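
The values decrease as the cutoff rises, so each entry reads as the share of respondents whose reported age is at or above that cutoff. The following is a minimal base R sketch of that computation. It assumes this is what `kirkegaard::percent_cutoff` does by default, which is inferred from the output above rather than from the package documentation. The helper `prop_at_or_above` and the `ages` vector are hypothetical and are not NHANES data.

```r
# Hypothetical helper: share of non-missing values at or above each cutoff.
# Assumption: this mirrors the default behaviour of kirkegaard::percent_cutoff.
prop_at_or_above <- function(x, cutoffs) {
  x <- x[!is.na(x)]                                   # drop missing ages
  out <- vapply(cutoffs, function(cut) mean(x >= cut), numeric(1))
  names(out) <- cutoffs                               # label each share by its cutoff
  out
}

# Made-up ages at diagnosis, for illustration only (not NHANES data).
ages <- c(8, 15, 27, 34, 41, 48, 52, 55, 63, 70, NA)

res <- prop_at_or_above(ages, cutoffs = seq(20, 80, by = 20))
round(res, 2)
```

Subtracting a value from 1 gives the share diagnosed before that age, which is the quantity an age-based classification rule depends on.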

## Plot the distribution

Finally, we get a visual overview of the data using a density-histogram:

    kirkegaard::GG_denhist(d_nhanes13, var = "DID040") + xlab("Age of diagnosis (self-reported). Red line = mean value.")

The plot does not show two distinct groups in the data, so age of diagnosis is likely to be at best a rough proxy for differentiating between type 1 and type 2 diabetes.
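
For readers without the kirkegaard package, a similar density-histogram can be approximated in base R. This is a sketch under stated assumptions, not a reproduction of `GG_denhist`, whose internals are not shown here. The `ages` vector is simulated purely for illustration and is not NHANES data. With the real data, `d_nhanes13$DID040` (with missing values removed) would take its place.

```r
# Simulated ages at diagnosis, for illustration only (not NHANES data).
set.seed(1)
ages <- round(pmin(pmax(rnorm(500, mean = 49, sd = 15), 1), 80))

# Histogram on the density scale, so the bars and the kernel density
# estimate share the same y-axis.
h <- hist(ages, breaks = seq(0, 80, by = 5), freq = FALSE,
          main = "", xlab = "Age of diagnosis (simulated)")
lines(density(ages), lwd = 2)         # kernel density estimate
abline(v = mean(ages), col = "red")   # red line marks the mean
```

Two well-separated diagnosis groups would appear as two peaks with a clear dip between them in this kind of plot.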

## References

Koopman, Richelle J, Arch G Mainous III, Vanessa A Diaz, and Mark E Geesey. 2005. “Changes in Age at Diagnosis of Type 2 Diabetes Mellitus in the United States, 1988 to 2000.” Annals of Family Medicine 3 (1). American Academy of Family Physicians: 60.