Application of the Central Limit Theorem

Introduction and Method

As part of the NHANES study, the triglyceride levels of 3,026 adult women were measured. Triglycerides, the main constituent of both vegetable oil and animal fat, have been linked to atherosclerosis, heart disease, and stroke. Let’s treat this whole group of 3,026 women as the population for the purposes of our simulation. We are going to study this population by taking a small sample of, say, 25 women from it. We will then compare the distribution of triglycerides in the population with that in the sample:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    19.0    68.0    98.0   116.9   147.0   399.0

Population Distribution: The distribution of triglyceride levels in the female population is unimodal and skewed right, with a median of 98 mg/dL. The middle 50% of women have levels between 68 mg/dL and 147 mg/dL. Normal triglyceride levels are below 150 mg/dL, so almost 25% of the population has high triglyceride levels.

Taking One Sample from the Population

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    30.0    77.0   115.0   123.1   171.0   262.0

Sample Distribution: The single sample of 25 women drawn from the population is unimodal and skewed right. The median is 115 mg/dL, and the values range from 30 to 262 mg/dL.

## [1] 116.9451
## [1] 123.08

The mean of the sample (123.08 mg/dL) is higher than that of the population (116.95 mg/dL).
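The comparison above can be sketched in a few lines. This is a minimal illustration, not the original analysis: the NHANES measurements are not reproduced here, so `population` is a hypothetical right-skewed stand-in (lognormal, with parameters chosen only to land in a similar range), and the printed numbers will not match the values reported in this write-up.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the 3,026 measured triglyceride levels (mg/dL):
# a right-skewed lognormal population, NOT the NHANES data.
population = [random.lognormvariate(4.6, 0.55) for _ in range(3026)]

# Draw one simple random sample of 25 women, without replacement.
sample = random.sample(population, 25)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"difference:      {sample_mean - population_mean:+.2f}")
```

Changing the seed changes which 25 values are drawn, so the sample mean lands above the population mean on some runs and below it on others.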

It is worth noting that (a) the distribution of triglycerides in the population is clearly right-skewed, (b) the sample looks representative of the population, as it should because it is random, and (c) the sample mean is close to the population mean, but it is clearly off a bit as an estimate of it.

This is just one sample; the means of other random samples might be much farther from or closer to the population mean. To see how sample means are distributed, we’ll have to repeat the sampling process many times and record the mean of each sample.

Describing the Distribution of 100 Sample Means

The distribution of mean triglyceride levels created from 100 samples of 25 randomly selected women in the NHANES study is approximately Normal with a mean of 116 mg/dL.
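The repeated-sampling step can be sketched as follows. As an assumption for illustration only, `population` is again a synthetic right-skewed stand-in rather than the NHANES data, and `sample_means` is a hypothetical helper name, not a function from the original analysis.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed stand-in population (mg/dL), NOT the NHANES data.
population = [random.lognormvariate(4.6, 0.55) for _ in range(3026)]


def sample_means(pop, n, reps):
    """Return the means of `reps` simple random samples of size `n` from `pop`."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(reps)]


# 100 samples of 25 women each, one mean per sample.
means = sample_means(population, n=25, reps=100)

print(f"population mean:        {statistics.mean(population):.1f}")
print(f"mean of sample means:   {statistics.mean(means):.1f}")
print(f"median of sample means: {statistics.median(means):.1f}")
```

In a sketch like this, the mean and median of the sample means tend to sit close together and close to the population mean, which is what a roughly symmetric sampling distribution centered on the population mean would look like.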

Applying the Normal Model to the Sampling Distribution of Sample Means

The distribution of the 100 sample means is unimodal and relatively symmetric. Although the population and the single sample are right-skewed, averaging 25 observations at a time smooths out much of that skew, so the sample means are far more symmetric than the individual values. This illustrates the Central Limit Theorem: for a sufficiently large sample size, the sampling distribution of the sample mean is approximately Normal, even when the population itself is not. Therefore, we can model the distribution of sample means with a Normal model centered at the population mean of 117 mg/dL, with a standard error equal to the population standard deviation (about 67 mg/dL) divided by the square root of the sample size (n = 25). The result of this computation is shown below.

## [1] 13.41499
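As a rough check on the standard error formula, the sketch below compares the theoretical value, the population standard deviation divided by the square root of n, with the observed spread of many simulated sample means. It assumes a synthetic right-skewed `population` in place of the NHANES data, and the choice of 2,000 repetitions is arbitrary, so its output will differ from the value printed above.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical right-skewed stand-in population (mg/dL), NOT the NHANES data.
population = [random.lognormvariate(4.6, 0.55) for _ in range(3026)]

n = 25  # sample size used throughout this write-up

# Theoretical standard error: population SD divided by sqrt(n).
sigma = statistics.pstdev(population)
standard_error = sigma / math.sqrt(n)

# Empirical check: the SD of many simulated sample means.
means = [statistics.mean(random.sample(population, n)) for _ in range(2000)]
empirical_sd = statistics.stdev(means)

print(f"population SD:            {sigma:.2f}")
print(f"theoretical SE (SD/sqrt): {standard_error:.2f}")
print(f"SD of simulated means:    {empirical_sd:.2f}")
```

The two values should agree closely. Sampling without replacement from a finite population makes the simulated spread very slightly smaller than the formula predicts, but with 25 draws out of 3,026 that effect is negligible.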