The irregularly shaped object under study is The Blob. The Blob is the title and lead character of an independently made movie released by Paramount Pictures in 1958. A remake was released in 1988. The 1958 movie starred a young Steve McQueen, who was 27 years old and played a rambunctious teenager. Ah, Hollywood.
Because The Blob is a three-dimensional object, when we discuss finding its area, we actually are finding the area of a cross-section of The Blob.
The activity uses a gridded scale drawing of The Blob. Students generate random numbers, determine lengths on a grid, multiply by a scaling factor, and find area estimates for squares. These area estimates are the data. Students summarize these data numerically and graphically with tools such as histograms, boxplots, the five-number summary (minimum, first quartile or 25th percentile, median, third quartile or 75th percentile, maximum), the mean, and the standard deviation. Each student creates her own confidence interval estimate of the area of a cross-section of The Blob. The students' different confidence interval estimates are then compared, which leads to a discussion of sampling variability and the confidence level.
We may not be able to stop The Blob (pretty much nobody can), but we can figure out how big it is. Attached on a separate sheet of paper is a Gridded Scale Drawing of The Blob. Each grid square on the drawing is equivalent to 2.2 yards. Our goal is to estimate the size of The Blob. Notice from the Gridded Scale Drawing that The Blob has a width and height that are roughly equal.
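The arithmetic behind a single area estimate can be sketched in R. This is only a minimal sketch under two assumptions that the activity does not spell out here: that 2.2 yards is the side length represented by one grid square, and that each estimate treats the cross-section as a square whose side is a length read off the grid. The function name `area_estimate` and the example length of 55 grid squares are hypothetical.

```r
# Scale factor from the drawing: assumed to be yards per grid-square side.
yards_per_grid <- 2.2

# Hypothetical helper: convert a grid length to yards, then square it
# to get the area (in square yards) of a square with that side.
area_estimate <- function(grid_length, scale = yards_per_grid) {
  side_yards <- grid_length * scale
  side_yards^2
}

# Example with a made-up length of 55 grid squares.
area_estimate(55)
```

Under these assumptions, a length of 55 grid squares corresponds to a side of 121 yards and an area of about 14,641 square yards, which is on the same scale as the estimates summarized below.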
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2788 12002 16565 15149 19066 20449
My sample of area estimates for The Blob has a distribution that is unimodal and skewed left. The center, as represented by the median, is 16,565 square yards. My sample’s estimates range from a minimum of 2,788 square yards to a maximum of 20,449 square yards.
H0: mu = 15000. The area of The Blob is equal to 15,000 square yards. HA: mu ≠ 15000. The area of The Blob is not equal to 15,000 square yards. The distribution of the area estimates of The Blob is skewed left, with a mean of 15,149 square yards. The area estimates range from 2,788 to 20,449 square yards. A one-sample t-test is run below.
##
## One Sample t-test
##
## data: blob
## t = 0.095002, df = 11, p-value = 0.926
## alternative hypothesis: true mean is not equal to 15000
## 95 percent confidence interval:
## 11702.75 18594.73
## sample estimates:
## mean of x
## 15148.74
I am 95% confident that the TRUE area of The Blob is between 11,702.75 square yards and 18,594.73 square yards. The hypothesized area (null = 15,000 square yards) is captured in my confidence interval, which agrees with the large p-value of 0.926: I fail to reject the null hypothesis. My point estimate of the mean is 15,148.74 square yards, with a margin of error of plus or minus 3,445.99 square yards.
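The margin of error and interval can be reconstructed from the numbers printed in the t-test output. The sketch below uses only those reported values; the variable names are my own, and because the printed mean and t statistic are rounded, the results agree with the output only to within rounding.

```r
# Values copied from the t-test output above.
x_bar <- 15148.74   # sample mean
mu_0  <- 15000      # hypothesized mean
t_obs <- 0.095002   # reported t statistic
df    <- 11         # degrees of freedom (n - 1, so n = 12)

# Back out the standard error from t = (x_bar - mu_0) / SE.
se <- (x_bar - mu_0) / t_obs

# Margin of error for a 95% interval: critical t value times SE.
t_crit <- qt(0.975, df)
margin <- t_crit * se

# Confidence interval and two-sided p-value.
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
p_value <- 2 * pt(-abs(t_obs), df)

round(margin, 2)
round(ci, 2)
round(p_value, 3)
```

With the original data vector available, `t.test(blob, mu = 15000)` would give all of this directly; the point here is only to show where the margin of error comes from.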
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 818 11389 16565 15095 19825 21727
The distribution of the population is unimodal and skewed left. The mean of the population is 15,095 square yards, and the median is 16,565 square yards. The estimates range from 818 square yards to 21,727 square yards. According to the Central Limit Theorem, even if a population’s distribution is skewed, the distribution of sample means from that population will be approximately Normal when the sample size is large enough. Therefore, we will take many samples and look at their means to get a better picture of the TRUE population mean.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10241 16706 17717 17514 19520 21083
The sample of 12 area estimates produces a distribution that is bimodal. The mean is 17,514 square yards. The median is slightly higher, at 17,717 square yards.
To see the Central Limit Theorem at work, we need to draw many samples. The next step draws 100 samples of size 12 and stores their means in a new vector called sample_means12.
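One way the repeated sampling could be written is sketched here. The population data are not reproduced in this report, so `blob_population` is a hypothetical, simulated stand-in (left-skewed values on roughly the same scale as the population summary); only the name `sample_means12` comes from the text. Results from this sketch will not match the output shown.

```r
set.seed(1958)  # arbitrary seed so the sketch is reproducible

# Hypothetical stand-in for the population of area estimates:
# 1000 left-skewed values on roughly the same scale as the summary above.
blob_population <- 21727 * rbeta(1000, shape1 = 4, shape2 = 1.5)

# Draw 100 samples of size 12 and keep the mean of each one.
sample_means12 <- replicate(100, mean(sample(blob_population, size = 12)))

summary(sample_means12)
```

`replicate()` repeats the expression 100 times and collects the results in a numeric vector, so `sample_means12` ends up holding one mean per sample.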
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11839 13828 14720 14879 15666 17845
My distribution of 100 sample means for the estimate of The Blob’s area is unimodal and skewed left. The center of the distribution is 14,720 square yards (median), while the IQR is 1,838 square yards. Both the mean and the median are close to the null hypothesis of 15,000 square yards for the area of The Blob. If I create a distribution of 1000 samples of size 12, I expect a smoother and more stable picture of the same sampling distribution than the 100 samples of size 12 give.
How will a distribution of 1000 samples of size 12 differ from a distribution of 100 samples of size 12?
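The Central Limit Theorem depends on the sample size, not on the number of samples, so this distribution should have about the same center and spread as before. With 1000 sample means, the histogram should look smoother and show the roughly Normal shape more clearly. A narrower (less variable) distribution would require a sample size larger than 12.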
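A quick simulation is one way to check how the number of samples and the sample size each affect the spread of the sample means. As before, `blob_population` is a hypothetical simulated stand-in for the real data, and the helper `draw_means` is my own name, so the numbers are illustrative only.

```r
set.seed(12)  # arbitrary seed

# Hypothetical left-skewed stand-in population (not the real Blob data).
blob_population <- 21727 * rbeta(1000, shape1 = 4, shape2 = 1.5)

# Helper: means of `reps` random samples of size `n`.
draw_means <- function(reps, n) {
  replicate(reps, mean(sample(blob_population, size = n)))
}

means_100  <- draw_means(100, 12)
means_1000 <- draw_means(1000, 12)

# Both spreads should sit near sigma / sqrt(12); adding more samples
# smooths the histogram but does not shrink the spread.
theory_se <- sd(blob_population) / sqrt(12)
c(sd_100 = sd(means_100), sd_1000 = sd(means_1000), theory = theory_se)

# A larger sample size is what narrows the distribution.
sd(draw_means(1000, 48))
```

Comparing `hist(means_100)` with `hist(means_1000)` should show the same center and width, with the second histogram looking less ragged.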
Introduction and Method
As part of the NHANES study, the triglyceride levels of 3026 adult women were measured. Triglycerides, the main constituent of both vegetable and animal fat, have been linked to atherosclerosis, heart disease, and stroke. Let’s consider this whole group of women the population for the purposes of the simulation. I am going to conduct a study of this population by taking a small sample, say of 25 women, from it. We will compare the distribution of triglycerides in the population and in the sample:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.0 68.0 98.0 116.9 147.0 399.0
The distribution of triglyceride levels of the 3026 women is unimodal and skewed right. The center of the distribution is 98 mg/dL (median). The middle 50% of women have levels between 68 mg/dL and 147 mg/dL. Normal triglyceride levels are below 150 mg/dL, so almost 25% of the population has high levels of triglycerides.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 41.0 54.0 77.0 108.9 133.0 386.0
The single sample of women drawn from the population is unimodal and skewed right. The center is at 77 mg/dL (median) and the range is from 41 to 386 mg/dL. According to the outlier rule, there is at least one high outlier. In addition, there is a gap in the data, making 4 measures in the sample seem extreme.
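The outlier rule can be checked from the sample's five-number summary alone. This sketch assumes the usual 1.5 × IQR fences and uses only the quartiles and maximum printed in the summary output for the sample; the variable names are my own.

```r
# Quartiles and maximum copied from the sample summary above.
q1 <- 54
q3 <- 133
sample_max <- 386

# 1.5 * IQR rule: values beyond the fences are flagged as outliers.
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

c(iqr = iqr, lower_fence = lower_fence, upper_fence = upper_fence)

# TRUE means the largest observation counts as a high outlier.
sample_max > upper_fence
```

The upper fence works out to 251.5 mg/dL, so the maximum of 386 mg/dL is a high outlier. The summary alone cannot show how many other values lie above the fence; that requires the raw sample.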
## [1] 116.9451
## [1] 108.88
## [1] 67.94322
The mean of the sample (108.88 mg/dL) is lower than the mean of the population (116.9451 mg/dL).
It is worth noting that (a) the distribution of triglycerides in the population is clearly right-skewed, (b) the sample looks representative of the population, as it should because it was drawn at random, and (c) the sample mean is close to the population mean but is clearly off a bit as an estimate of it.
This is just one sample; the means of other random samples might be much further from or closer to the population mean. To see that distribution, I will have to repeat the sampling process many times and obtain the sample means.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 80.28 107.18 119.56 117.33 126.27 165.40
The distribution of mean triglyceride levels created from 100 samples of 25 randomly selected women in the NHANES study is approximately Normal with a mean of about 117 mg/dL.
The distribution of 100 sample means is unimodal and approximately symmetric without any outliers. Therefore, I can model it with a Normal distribution centered at the population mean of about 117 mg/dL, with a standard error equal to the population standard deviation (about 68 mg/dL) divided by the square root of the sample size (n = 25). The results of this computation are shown below.
## [1] 15.36401
## [1] 13.58864
## [1] 117.3304
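The standard error in the output above can be reproduced from the reported population standard deviation. This is a minimal sketch using the printed values rather than the original data; the variable names are my own.

```r
# Values copied from the output above.
pop_mean <- 116.9451  # population mean (mg/dL)
pop_sd   <- 67.94322  # population standard deviation (mg/dL)
n        <- 25        # women per sample

# Standard error of the sample mean: sigma / sqrt(n).
se_mean <- pop_sd / sqrt(n)
se_mean
```

The result, about 13.59 mg/dL, is the theoretical spread of sample means for samples of 25 women.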
The sampling distribution of the sample means of triglyceride levels is approximately Normal, with a mean of about 117 mg/dL and a standard error of about 13.59 mg/dL.
Defining the z-score formula to suit the sampling distribution of the means from above will give me the following code.
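A z-score for a sample mean divides by the standard error instead of the population standard deviation. The helper below is a hypothetical sketch of that idea (the name `z_sample_mean` is mine, not the original code), using the population mean and standard deviation reported above.

```r
# Hypothetical helper: z-score of a sample mean, using the standard
# error sigma / sqrt(n) in place of the population standard deviation.
z_sample_mean <- function(x_bar, mu, sigma, n) {
  (x_bar - mu) / (sigma / sqrt(n))
}

# Example: a sample mean of 100 mg/dL from a sample of 25 women,
# with the population mean and SD reported above.
z_sample_mean(100, mu = 116.9451, sigma = 67.94322, n = 25)
```

Applying the same function to every simulated sample mean would give standardized values that can be compared against the standard Normal model.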
So, though not a perfect Normal model, the approximation seems pretty good. Given this, I can make some distributional predictions about sample means of triglyceride levels for samples of 25 women. Remember, this is not a prediction about an individual woman’s triglyceride level and its relation to the mean of the population. Instead, it is the probability of observing a given mean for a sample of 25 and how that mean relates to the mean of the sampling distribution. Note: an individual value is more likely to deviate from the population mean than a sample mean is to deviate from the mean of the sampling distribution. We can use this information for inference testing.
## [1] 0.1061973
There is a 10.6% chance that the sample mean will be less than 100 mg/dL if the true mean is 117 mg/dL.
## [1] 134.3597
A sample mean triglyceride level of 134.3597 mg/dL represents the cut-off for the top 10% of sample means.
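Both of the preceding results follow from the Normal model for sample means. The sketch below assumes that model is centered at the reported population mean of 116.9451 mg/dL (rounded to 117 in the text), with standard error 67.94322 / sqrt(25); the variable names are my own.

```r
mu <- 116.9451             # population mean (mg/dL)
se <- 67.94322 / sqrt(25)  # standard error for samples of 25

# Probability that a sample mean falls below 100 mg/dL.
p_below_100 <- pnorm(100, mean = mu, sd = se)

# Sample mean that marks the start of the top 10% (90th percentile).
top10_cutoff <- qnorm(0.90, mean = mu, sd = se)

round(c(p_below_100 = p_below_100, top10_cutoff = top10_cutoff), 4)
```

`pnorm()` turns a value into a lower-tail probability, and `qnorm()` goes the other way, from a probability to the value that cuts it off.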
## [1] 18.3308
## [1] 91.65401
The middle 50% of sample means of triglyceride levels vary by only 18.33 mg/dL, while the population’s middle 50% (by individual) varies by 91.65 mg/dL. This is consistent with the Central Limit Theorem: sample means become more Normal and less variable as sample size increases.
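The two widths can be reproduced as the distance between the 25th and 75th percentiles of a Normal model. This sketch assumes a Normal model in both cases, which is only a rough approximation for the right-skewed individual values; the variable names are my own.

```r
mu     <- 116.9451   # population mean (mg/dL)
pop_sd <- 67.94322   # population standard deviation (mg/dL)
se     <- pop_sd / sqrt(25)

# Width of the middle 50% under a Normal model: Q3 minus Q1.
iqr_means       <- qnorm(0.75, mu, se) - qnorm(0.25, mu, se)
iqr_individuals <- qnorm(0.75, mu, pop_sd) - qnorm(0.25, mu, pop_sd)

round(c(means = iqr_means, individuals = iqr_individuals), 2)
```

The ratio of the two widths is exactly sqrt(25) = 5, because the only difference between the two models is the standard deviation.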
## [1] 0.04488361
## [1] 0.3671823
It would be highly unusual to see a sample mean greater than 140 mg/dL. I would expect to see such a mean only 4.5% of the time. However, seeing an individual above 140 mg/dL is much more likely. I would see this result 36.7% of the time.
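The contrast between a sample mean and an individual value comes down to which standard deviation goes into the Normal model. This sketch assumes the reported population mean and standard deviation, and treats individual values as Normal for the comparison, as the output above appears to do.

```r
mu     <- 116.9451   # population mean (mg/dL)
pop_sd <- 67.94322   # population standard deviation (mg/dL)
se     <- pop_sd / sqrt(25)

# Upper-tail probabilities for the same cut-off of 140 mg/dL.
p_mean_above_140       <- pnorm(140, mu, se, lower.tail = FALSE)
p_individual_above_140 <- pnorm(140, mu, pop_sd, lower.tail = FALSE)

round(c(sample_mean = p_mean_above_140,
        individual  = p_individual_above_140), 3)
```

Using `lower.tail = FALSE` returns the area above the cut-off directly, which avoids computing `1 - pnorm(...)` by hand.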
H0: mu = 117. The mean triglyceride level is 117 mg/dL. HA: mu < 117. The mean triglyceride level is lower than 117 mg/dL.
## [1] 0.07108196
Though the difference is considerable, I would expect to see a sample mean of 97 mg/dL or lower 7.1% of the time if the null hypothesis were true. This is higher than the standard significance level of 5%, so we RETAIN THE NULL. There is not enough evidence that the experimental drug lowers triglyceride levels in women.
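The decision can be sketched as a one-sided Normal-model test. It assumes the null mean is the reported population mean of 116.9451 mg/dL (rounded to 117 in the text), a sample of 25 women, and a significance level of 0.05; the variable names are my own.

```r
mu_0  <- 116.9451            # null-hypothesis mean (mg/dL)
se    <- 67.94322 / sqrt(25) # standard error for samples of 25
x_bar <- 97                  # observed sample mean (mg/dL)
alpha <- 0.05                # significance level

# One-sided p-value: chance of a sample mean this low or lower under H0.
p_value <- pnorm(x_bar, mean = mu_0, sd = se)
round(p_value, 4)

# Reject H0 only if the p-value falls below alpha.
reject_null <- p_value < alpha
reject_null
```

Because the p-value of about 0.071 is above 0.05, `reject_null` is `FALSE`, matching the decision to retain the null.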