A random sample of 21 energy bars of a particular brand is selected from a number of different stores. The labels on the bars claim that, on average, such energy bars contain 20 grams of protein. Laboratory tests show that the actual grams of protein for these 21 bars are: 20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54, 21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26.
a) Construct a quantile-quantile plot for the data to check whether the protein content of such energy bars can be modeled by a normal distribution.
library(tinytex)
library(lattice)
library(psych)
library(pastecs)
library(MASS)
library(qualityTools)
##
## Attaching package: 'qualityTools'
## The following object is masked from 'package:stats':
##
## sigma
Lab5X <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)
qqnorm(Lab5X)
qqline(Lab5X)
qqPlot(Lab5X) # alternative Q-Q plot from qualityTools; per the instructor, this version is more ambiguous to read
Ans: Based on the Q-Q plot above, it is reasonable to model the data with a normal distribution. Almost all of the points lie within the blue curves and are relatively close to the red fit line, and there do not appear to be many outliers. We test this more formally below.
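To make the reading of the plot more concrete, the points drawn by qqnorm() can be reproduced by hand. The sketch below is only an illustration, not part of the assigned solution; the variable names (x, theo, samp, r) are chosen here for the example. It pairs the sorted observations with standard normal quantiles at the plotting positions given by ppoints(), and uses the correlation between the two as a rough numerical summary of how straight the plot is.

```r
# Protein measurements (same values as Lab5X in the lab)
x <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
       21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)
n <- length(x)

# Theoretical standard normal quantiles at the plotting positions used by qqnorm()
theo <- qnorm(ppoints(n))

# Sample quantiles are the sorted observations
samp <- sort(x)

# qqnorm() returns the same theoretical quantiles (plot suppressed here)
qq <- qqnorm(x, plot.it = FALSE)
print(all.equal(sort(qq$x), theo))

# A correlation close to 1 means the points lie near a straight line
r <- cor(theo, samp)
print(r)
```

A correlation near 1 agrees with the visual impression that the points follow the fit line, although it is not a formal test.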
b) Check the normality assumption about the population distribution using a formal hypothesis testing method. Use a significance level of \(\alpha = 0.05\). You can use the Shapiro-Wilk test for normality. Clearly state the null hypothesis and the alternative hypothesis. Do you reach the same conclusion as in a)?
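Step 1: Parameter of interest
The quantity of interest is the population distribution of protein content (in grams) per bar, not a single parameter such as the mean \(\mu\).
Step 2: Set up hypothesis
Null hypothesis (H_0): The protein content of the bars follows a normal distribution.
Alternative hypothesis (H_1): The protein content of the bars does NOT follow a normal distribution.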
Step 3: Test Statistic and Critical Region:
In this situation we use the Shapiro-Wilk test of normality:
- \(H_0\): the data come from a normal population
- \(H_1\): the data do not come from a normal population
- The test statistic W is based on the order statistics of the sample
- A small p-value (below \(\alpha = 0.05\)) means rejecting \(H_0\), i.e., concluding that the data do not come from a normal population.
Step 4: Computation
shapiro.test(Lab5X)
##
## Shapiro-Wilk normality test
##
## data: Lab5X
## W = 0.96887, p-value = 0.7079
Step 5: Analysis
Ans: The Shapiro-Wilk test gives W = 0.96887 with a p-value of 0.7079, which is higher than the significance level \(\alpha = 0.05\), so we fail to reject the null hypothesis that the protein content follows a normal distribution. This is the same conclusion as in part (a): both the Q-Q plot and the formal test are consistent with a normal model. Failing to reject \(H_0\) does not prove normality; it only means that the sample shows no significant evidence against it.
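The decision rule for the normality test can also be written out explicitly. The following is a minimal sketch, assuming the same data vector and \(\alpha = 0.05\); the names x, alpha, sw and reject_normality are illustrative and not part of the lab.

```r
# Protein measurements (same values as Lab5X in the lab)
x <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
       21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)
alpha <- 0.05

# Shapiro-Wilk test; H0 is that the data come from a normal population
sw <- shapiro.test(x)

# Reject H0 only when the p-value falls below alpha
reject_normality <- sw$p.value < alpha

cat(sprintf("W = %.5f, p-value = %.4f, reject H0: %s\n",
            sw$statistic, sw$p.value, reject_normality))
```

Storing the test result in an object keeps the statistic and p-value available for the written conclusion instead of copying them from the console.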
It is important to note that a random sample of 21 bars is small; if the brand is found in many stores, we can assume that they have a large inventory. With so few observations the test has limited power to detect departures from normality, so a sample of 21 bars may not provide enough data to draw firm conclusions about the entire population.
c) Test the claim that the average protein content of such energy bars is equal to 20 grams. Use a significance level of \(\alpha = 0.05\). Clearly state the null hypothesis, the alternative hypothesis, and your conclusion in context. Also report the p-value.
Step 1: Parameter of interest
The parameter of interest is the mean protein content per bar, \(\mu\) (in grams).
Step 2: Set up hypothesis
Null hypothesis (H_0): The mean protein content (\(\mu\)) per bar is exactly 20 grams.
Alternative hypothesis (H_1): The mean protein content (\(\mu\)) per bar is NOT 20 grams.
\(H_0: \mu = 20\) vs. \(H_1: \mu \neq 20\). The alternative is two-sided: the mean may be any nonnegative value other than 20 g, whether above or below it.
Step 3: Test Statistic and Critical Region:
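In this situation we use a one-sample t-test because the population variance is unknown and the sample is small (n = 21); parts (a) and (b) support the normality assumption that the test relies on. With 20 degrees of freedom, we reject \(H_0\) if \(|t| > t_{0.025,20} \approx 2.086\), or equivalently if the p-value is below 0.05.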
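As a cross-check on Step 3, the test statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\) and the critical value can be computed directly. This is a sketch under the stated hypotheses (\(\mu_0 = 20\), two-sided, \(\alpha = 0.05\)); the variable names are illustrative.

```r
# Protein measurements (same values as Lab5X in the lab)
x <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
       21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)
mu0   <- 20      # hypothesized mean from the label claim
alpha <- 0.05
n     <- length(x)

# Standard error of the mean and the t statistic
se     <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Two-sided critical value and p-value with n - 1 degrees of freedom
t_crit  <- qt(1 - alpha / 2, df = n - 1)
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

cat(sprintf("t = %.3f, critical value = %.3f, p-value = %.4f\n",
            t_stat, t_crit, p_value))
```

The statistic is compared with the critical value in absolute terms because the alternative hypothesis is two-sided.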
Step 4: Computation
t.test(Lab5X, mu = 20, alternative = "two.sided", conf.level = 0.95) # mu sets the hypothesized mean
##
## One Sample t-test
##
## data: Lab5X
## t = 2.299, df = 20, p-value = 0.0324
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
## 20.12056 22.48135
## sample estimates:
## mean of x
## 21.30095
Step 5: Analysis
Ans: In this case, t = 2.299 with 20 degrees of freedom and the p-value is about 0.032, which is less than the significance level \(\alpha = 0.05\), so we reject the null hypothesis that the mean protein content per bar is exactly 20 grams. The 95% confidence interval (20.12, 22.48) lies entirely above 20, so the data suggest that the bars contain, on average, somewhat more protein than the label claims. Rejecting \(H_0\) does not prove that the alternative is true; it means that the sample provides statistically significant evidence against the claim at the 5% level.
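A two-sided t-test at level \(\alpha\) and the \(100(1-\alpha)\%\) confidence interval always agree: \(H_0: \mu = 20\) is rejected exactly when 20 falls outside the interval. The sketch below illustrates this with the lab data; the names res, ci, outside_ci and reject are illustrative.

```r
# Protein measurements (same values as Lab5X in the lab)
x <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
       21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)

# One-sample, two-sided t-test of H0: mu = 20
res <- t.test(x, mu = 20, alternative = "two.sided", conf.level = 0.95)

# 95% confidence interval for the mean
ci <- res$conf.int

# The two decisions should coincide
outside_ci <- 20 < ci[1] || 20 > ci[2]
reject     <- res$p.value < 0.05

cat(sprintf("CI = (%.5f, %.5f), 20 outside CI: %s, reject H0: %s\n",
            ci[1], ci[2], outside_ci, reject))
```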
End of lab.
________________________________________________________________________________________
orderedLab5XData <- Lab5X[order(Lab5X)]
print(orderedLab5XData)
## [1] 16.26 18.04 18.28 19.56 19.72 19.85 19.95 20.33 20.70 20.75 21.08 21.10
## [13] 21.29 21.54 22.14 22.15 22.91 24.12 24.75 25.34 27.46
summary(orderedLab5XData)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 16.26 19.85 21.08 21.30 22.15 27.46
stem(orderedLab5XData, scale = 3)
##
## The decimal point is at the |
##
## 16 | 3
## 17 |
## 18 | 03
## 19 | 679
## 20 | 0378
## 21 | 1135
## 22 | 129
## 23 |
## 24 | 18
## 25 | 3
## 26 |
## 27 | 5
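Mean (21.30) > Median (21.08). All 21 measurements are distinct, so the raw data have no well-defined mode; the value 21.1 appears twice only after rounding to one decimal place in the stem-and-leaf plot.
Judged against the rules of thumb below, the data do not fit any of the three patterns exactly. The mean slightly exceeds the median, and the largest value (27.46) lies farther from the center than the smallest (16.26), which hints at a mild right skew, but with only 21 data points there is not enough evidence to draw a strong conclusion about the entire population.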
“Negatively” Skewed to the Left (hump on the RHS): mean < median < mode
Symmetric: mean = median = mode
“Positively” Skewed to the Right (hump on the LHS): mean > median > mode
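Because the mode is unreliable for continuous measurements, a moment-based skewness coefficient is a common alternative to the mean/median/mode rules. The sketch below computes \(g_1 = m_3 / m_2^{3/2}\) in base R; this is one of several skewness definitions, chosen here as an assumption, and the variable names are illustrative.

```r
# Protein measurements (same values as Lab5X in the lab)
x <- c(20.70, 27.46, 22.15, 19.85, 21.29, 24.75, 20.75, 22.91, 25.34, 20.33, 21.54,
       21.08, 22.14, 19.56, 21.10, 18.04, 24.12, 19.95, 19.72, 18.28, 16.26)

m  <- mean(x)
m2 <- mean((x - m)^2)   # second central moment
m3 <- mean((x - m)^3)   # third central moment

# Moment coefficient of skewness: > 0 right-skewed, < 0 left-skewed, near 0 symmetric
g1 <- m3 / m2^1.5

cat(sprintf("mean = %.2f, median = %.2f, skewness g1 = %.3f\n",
            m, median(x), g1))
```

A positive but modest value is consistent with the mean slightly exceeding the median, and it does not contradict the normality check in part (b).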