As part of the NHANES study, the triglyceride levels of 3,026 adult women were measured. Triglycerides, the main constituent of both vegetable oil and animal fat, have been linked to atherosclerosis, heart disease, and stroke. Consider the whole group of 3,026 women the population and take a small sample of 25 women from it.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.0 68.0 98.0 116.9 147.0 399.0
The population is skewed right and unimodal, with a maximum of 399 mg/dL and a minimum of 19 mg/dL.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 31.0 69.0 100.0 117.2 154.0 263.0
The single sample of women drawn from the population is bimodal and skewed right. The center is about 100 mg/dL (the median) and the range is from 31 to 263 mg/dL.
## [1] 116.9451
## [1] 117.2
The mean of the sample (117.2 mg/dL) is slightly higher than the mean of the population (116.95 mg/dL). It is worth noting that (a) the distribution of triglycerides in the population is clearly right-skewed, (b) the sample looks representative of the population, as a random sample should, and (c) the sample and population means are close, but the sample mean is still off a bit as an estimate of the population mean. This is just one sample; the means of other random samples might be much further from or closer to the population mean. To see that distribution, we’ll have to repeat the sampling process many times and obtain the sample means.
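The repeated-sampling step can be sketched as follows. The NHANES values are not reproduced here, so this sketch uses a synthetic right-skewed stand-in population (a gamma distribution tuned to a mean near 117 and a standard deviation near 68). The names `population` and `draw_sample_mean` and the seed are illustrative assumptions, so the numbers will not match the output shown in this document.

```r
# ASSUMPTION: synthetic stand-in for the 3,026 triglyceride values, NOT the NHANES data.
set.seed(1)
target_mean <- 117
target_sd <- 68
shape <- (target_mean / target_sd)^2   # gamma shape giving the target mean and sd
rate <- target_mean / target_sd^2      # gamma rate giving the target mean and sd
population <- rgamma(3026, shape = shape, rate = rate)

# Draw one random sample of n values (without replacement) and return its mean.
draw_sample_mean <- function(pop, n = 25) {
  mean(sample(pop, size = n))
}

# Repeat the sampling process 100 times and keep each sample mean.
sample_means <- replicate(100, draw_sample_mean(population))

summary(sample_means)
```

Increasing the number of replications gives a smoother picture of the sampling distribution, at the cost of more computation.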
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 84.64 105.52 114.84 116.53 126.41 163.96
The distribution of the 100 sample means is unimodal and approximately symmetric, with no outliers and a center of about 116.5 mg/dL. We can therefore model it with a Normal model whose mean is the population mean (about 117 mg/dL) and whose standard error is the population standard deviation (about 68 mg/dL) divided by the square root of the sample size (n = 25). The results of this computation are shown below.
## [1] 15.47744
## [1] 13.58864
## [1] 116.5336
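The standard error calculation described above can be sketched as follows. It uses the rounded population standard deviation of 68 quoted in the text, so it gives a slightly different value from the 13.58864 printed above, which presumably reflects the unrounded standard deviation.

```r
pop_sd <- 68   # population standard deviation, rounded as in the text
n <- 25        # sample size

# Standard error of the mean: population sd divided by sqrt(n).
se <- pop_sd / sqrt(n)
se
```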
The sampling distribution of the sample means of triglyceride levels is approximately Normal, with a mean of about 116.5 mg/dL and a standard error of 13.6 mg/dL.

### Modeling the Distribution with Z-scores

Defining the z-score formula to suit the sampling distribution of the means from above will give us the following code:
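A minimal sketch of such a z-score function, assuming the population mean (116.9451) and standard error (13.58864) printed in the output above. The name `z_score` and its arguments are illustrative, not the author's original code.

```r
pop_mean <- 116.9451  # population mean, taken from the printed output
se <- 13.58864        # standard error, taken from the printed output

# z-score of a sample mean relative to the sampling distribution of means.
# (hypothetical helper; defaults are the values assumed above)
z_score <- function(sample_mean, mu = pop_mean, sigma = se) {
  (sample_mean - mu) / sigma
}

# Example: how many standard errors below the center is a sample mean of 100?
z_score(100)
```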
### Using the relative distribution of means to calculate probabilities based on z scores

So, though not a perfect Normal model, the approximation seems pretty good. Given this, we can make some distributional predictions about sample means of triglyceride levels for samples of 25 women. Remember, this is not a prediction of an individual woman’s triglyceride level and its relation to the mean of the population. Instead, it is the probability of observing a given mean for a sample of 25 and how that mean relates to the mean of the sampling distribution. Note: an individual value is more likely to deviate from the population mean than a sample mean is to deviate from the mean of the sampling distribution. We can use this information for inference testing.
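As one example of such a probability, the chance that a sample mean falls below 100 mg/dL can be sketched with `pnorm`. This assumes the Normal model is centered at the printed population mean (116.9451) with the printed standard error (13.58864); the variable names are illustrative.

```r
pop_mean <- 116.9451  # population mean, taken from the printed output
se <- 13.58864        # standard error, taken from the printed output

# P(sample mean < 100) under the assumed Normal model (lower tail).
p_below_100 <- pnorm(100, mean = pop_mean, sd = se)
p_below_100
```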
## [1] 0.1061973
There is a 10.6% chance that a sample mean will be less than 100 mg/dL if the true mean is 117 mg/dL.

## The sample mean triglyceride level representing the 90th percentile (top 10%)
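A sketch of the 90th-percentile cut-off using `qnorm`, under the same assumed Normal model (mean 116.9451 and standard error 13.58864, both taken from the printed output). The variable names are illustrative.

```r
pop_mean <- 116.9451  # population mean, taken from the printed output
se <- 13.58864        # standard error, taken from the printed output

# Sample mean below which 90% of sample means are expected to fall.
cutoff_top_10 <- qnorm(0.90, mean = pop_mean, sd = se)
cutoff_top_10
```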
## [1] 134.3597
A sample mean triglyceride level of about 134.36 mg/dL represents the cut-off for the top 10% of sample means.
## [1] 18.3308
## [1] 91.65401
The middle 50% of sample means of triglyceride levels spans only 18.33 mg/dL, while the population’s middle 50% (by individual) spans 91.65 mg/dL. This is consistent with the Central Limit Theorem: sample means are closer to Normal and less variable than individual values, and more so as the sample size increases.
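The two widths above can be reproduced with a short sketch, assuming both come from a Normal model: one with the printed standard error (13.58864) and one with the population standard deviation back-derived from it (about 67.94). The 91.65 figure appears to be a Normal-model width rather than the quartile difference in the population summary (147 - 68 = 79). The helper name `middle_50_width` is illustrative.

```r
se <- 13.58864            # standard error, taken from the printed output
n <- 25                   # sample size
pop_sd <- se * sqrt(n)    # back-derived population sd (assumption), about 67.94

# Width of the middle 50% of a Normal distribution with standard deviation sigma.
middle_50_width <- function(sigma) {
  qnorm(0.75, sd = sigma) - qnorm(0.25, sd = sigma)
}

width_means <- middle_50_width(se)            # spread of sample means
width_individuals <- middle_50_width(pop_sd)  # spread of individual values
c(width_means, width_individuals)
```

Under this model the ratio of the two widths is exactly the square root of the sample size, here 5.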
## [1] 0.04488361
## [1] 0.3671823
It would be highly unusual to see a sample mean greater than 140 mg/dL; we would expect this only 4.5% of the time. However, seeing an individual above 140 mg/dL is much more likely: we would see this result 36.7% of the time.
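The contrast between a sample mean and an individual value can be sketched with upper-tail `pnorm` calls. This assumes a Normal model for both, using the printed population mean and standard error, and a population standard deviation back-derived from that standard error. The variable names are illustrative.

```r
pop_mean <- 116.9451     # population mean, taken from the printed output
se <- 13.58864           # standard error, taken from the printed output
pop_sd <- se * sqrt(25)  # back-derived population sd (assumption)

# P(sample mean of 25 women > 140) versus P(one individual > 140).
p_mean_above_140 <- pnorm(140, mean = pop_mean, sd = se, lower.tail = FALSE)
p_individual_above_140 <- pnorm(140, mean = pop_mean, sd = pop_sd, lower.tail = FALSE)
c(p_mean_above_140, p_individual_above_140)
```

Because the population is right-skewed, the Normal model is only a rough approximation for individuals; the empirical proportion of women above 140 mg/dL would be the more direct estimate.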
**Null Hypothesis:** mean = 117. The average triglyceride level is 117 mg/dL.

**Alternative Hypothesis:** mean < 117. The average triglyceride level is less than 117 mg/dL.
## [1] 0.07108196
Though the difference is considerable, we would expect to see a sample mean of 97 mg/dL or lower 7.1% of the time if the true mean is 117 mg/dL. This is higher than the standard significance level of 5%, so we RETAIN THE NULL. There is not enough evidence that the experimental drug lowers triglyceride levels in women.
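The decision above can be sketched as a one-sided z-test. This assumes the observed sample mean of 97 mg/dL from the text, a significance level of 0.05, and the printed population mean and standard error as the null model. The variable names are illustrative.

```r
pop_mean <- 116.9451   # mean under the null hypothesis, taken from the printed output
se <- 13.58864         # standard error, taken from the printed output
observed_mean <- 97    # sample mean quoted in the text
alpha <- 0.05          # standard significance level

# One-sided p-value: P(sample mean <= 97) if the null hypothesis is true.
p_value <- pnorm(observed_mean, mean = pop_mean, sd = se)

# Reject the null only if the p-value falls below alpha.
reject_null <- p_value < alpha
c(p_value = p_value, reject_null = reject_null)
```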