Topic 5: Hypothesis Testing

These are the solutions for Computer Lab 6. The data analysed are from the sm R package (Bowman and Azzalini 2021).

1 Preparations

1.1

install.packages("sm") # Install package
library(sm) # Load package
data(wonions) # Load onions data

2 One-sample \(t\)-tests

2.1 Initial Exploratory Analysis

summary(wonions$Yield)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   28.96   78.46  108.90  119.70  153.47  272.15

2.1.1

We can produce the plots as follows:

par(mfrow=c(1,2))
hist(wonions$Yield, main = "Histogram of Onion Yield \n (grams per plant)", 
     xlab = "grams", col = "chartreuse3")
boxplot(wonions$Yield, main = "Box Plot of Onion Yield \n (grams per plant)", 
        col = "chartreuse3")

par(mfrow = c(1, 1)) # Reset to a single plotting panel
qqnorm(wonions$Yield, main = "Normal Q-Q plot for onion yield data", pch = 19)
qqline(wonions$Yield)

2.2 Defining Hypotheses

Let \(\mu\) denote the average yield (in grams) of White Imperial Spanish onions. Hence for this hypothesis test we have \[H_0: \mu = 115, \text{ versus } H_1: \mu \neq 115.\]

2.3 Test Assumption Checks

Based on the histogram from 2.1.1, we observe that the yield data is skewed to the right.

The Normal Q-Q plot appears acceptable for theoretical quantile values between roughly \(-1\) and \(1\), but for theoretical quantile values of greater magnitude the data diverges from the diagonal line (so the tails of the distribution do not match the tails of a Normal distribution). This could mean that our assumption of normality is suspect.

2.3.1 Normality Assumption Check

shapiro.test(wonions$Yield)

## 
##  Shapiro-Wilk normality test
## 
## data:  wonions$Yield
## W = 0.95889, p-value = 0.008958

The \(p\)-value computed by the test is \(0.008958\). As this is much smaller than our chosen significance level of \(\alpha = 0.05\), we reject the null hypothesis of the Shapiro-Wilk test (that the data follow a Normal distribution), and conclude, based on this test, that the yield data is non-normal.

However, our sample size is \(n=84\), which is reasonably large. So even though the data appear to come from a non-normal distribution, the Central Limit Theorem tells us that we can still treat the distribution of the sample mean as (approximately) normal.
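A quick simulation can illustrate this Central Limit Theorem argument. The sketch below assumes an exponential distribution (strongly right-skewed) as a stand-in for skewed data; it is not fitted to the onion yields, and the number of replications is an arbitrary choice. It compares the skewness of raw draws with the skewness of sample means computed from samples of size 84.

```r
# CLT sketch: the exponential distribution and 'reps' are illustrative assumptions
set.seed(1)
n <- 84        # same sample size as the onion data
reps <- 5000   # number of simulated samples

# Sample skewness: mean cubed deviation divided by the cubed standard deviation
sample_skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw <- rexp(10000, rate = 1)                              # heavily right-skewed draws
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))  # one mean per simulated sample

skew_raw <- sample_skewness(raw)            # theoretical value for the exponential is 2
skew_means <- sample_skewness(sample_means) # much closer to 0 (theory: 2 / sqrt(84))
c(raw = skew_raw, means = skew_means)

hist(sample_means, main = "Simulated sample means (n = 84)", xlab = "sample mean")
```

The histogram of the simulated means should look far more symmetric than the raw exponential draws, which is the behaviour the argument above relies on.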

2.4 Computing the test statistic by hand

The R code below computes the sample standard deviation of the yield values.

sd(wonions$Yield)

## [1] 53.05174

Hence, as we already have \(n=84\), \(\overline{x} = 119.7\), \(s = 53.05174\) and \(\mu_0 = 115\), we have \[t = \dfrac{\overline{x}-\mu_0}{s/\sqrt{n}} = \dfrac{119.7-115}{53.05174/\sqrt{84}} \approx 0.812.\]
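The same hand calculation can be reproduced in R from the summary statistics already reported in this lab (no access to the raw data is needed). The variable names are chosen for illustration.

```r
# Summary statistics for the onion yield data (values reported in this lab)
n    <- 84         # sample size
xbar <- 119.7002   # sample mean, as printed by t.test()
s    <- 53.05174   # sample standard deviation, from sd(wonions$Yield)
mu0  <- 115        # hypothesised mean under H0

se <- s / sqrt(n)             # standard error of the sample mean
t_stat <- (xbar - mu0) / se   # one-sample t statistic
round(t_stat, 3)
```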

2.5 Conducting a one-sample t-test in R

No answer required.

2.5.1

onion.yield.ttest <- t.test(wonions$Yield, alternative = "two.sided", mu = 115)
onion.yield.ttest

## 
##  One Sample t-test
## 
## data:  wonions$Yield
## t = 0.81201, df = 83, p-value = 0.4191
## alternative hypothesis: true mean is not equal to 115
## 95 percent confidence interval:
##  108.1873 131.2132
## sample estimates:
## mean of x 
##  119.7002

2.5.2

The test statistic is \(0.81201\), which is approximately equal to the value calculated in 2.4 above.

2.5.3

The degrees of freedom are \(83\). For the one-sample \(t\)-test, this is found by computing \(n-1\).

2.5.4

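The \(p\)-value is \(0.4191\). This is the probability of obtaining a sample mean at least as far from 115 as the one we observed (\(\overline{x} = 119.7\)), in either direction, assuming the null hypothesis is true; that is, assuming the true mean is equal to 115. Equivalently, it is \(P(|T| \geq 0.812)\) for \(T \sim t_{83}\).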
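As a check, the two-sided \(p\)-value can be computed directly from the \(t\) distribution with base R's `pt()`, using the test statistic and degrees of freedom printed by `t.test()`. This is a sketch of the calculation, not a replacement for `t.test()`.

```r
t_stat <- 0.81201   # test statistic reported by t.test()
df <- 83            # degrees of freedom, n - 1

# Two-sided p-value: P(|T| >= |t|) = 2 * P(T <= -|t|) for T ~ t(83)
p_value <- 2 * pt(-abs(t_stat), df = df)
round(p_value, 4)
```

The result should agree with the `p-value = 0.4191` line in the `t.test()` output.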

2.5.5

The \(95\%\) confidence interval is \((108.1873, 131.2132)\). Since a \((1-\alpha)\times 100\%\) confidence interval corresponds to a test at significance level \(\alpha\), a \(95\%\) interval tells us that \(\alpha = 0.05\).
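The interval is \(\overline{x} \pm t_{1-\alpha/2,\,n-1} \times s/\sqrt{n}\). A minimal sketch reproducing it from the reported summary statistics, with the critical value taken from base R's `qt()`:

```r
n <- 84; xbar <- 119.7002; s <- 53.05174   # summary statistics reported above
alpha <- 0.05

t_crit <- qt(1 - alpha / 2, df = n - 1)    # two-sided critical value, about 1.989
margin <- t_crit * s / sqrt(n)             # half-width of the interval
ci <- c(lower = xbar - margin, upper = xbar + margin)
ci
```

Small differences in the last decimal place relative to `t.test()` come from using the rounded mean and standard deviation.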

2.5.6

Since our \(p\)-value \(> \alpha\) (i.e. \(0.4191 > 0.05\)), we fail to reject the null hypothesis.

2.5.7

Our \(95\%\) confidence interval is \((108.1873, 131.2132)\). Since this interval contains \(\mu_0 = 115\), we cannot reject the null hypothesis. This decision matches our decision based on the \(p\)-value assessment.
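This agreement is not a coincidence: for a two-sided test, \(p > \alpha\) exactly when \(\mu_0\) lies inside the \((1-\alpha)\times 100\%\) confidence interval. The sketch below checks this over a grid of hypothesised means; the grid itself is an arbitrary choice for illustration.

```r
n <- 84; xbar <- 119.7002; s <- 53.05174; alpha <- 0.05
se <- s / sqrt(n)
ci <- xbar + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se   # 95% interval

mu0_grid <- seq(100, 140, by = 0.5)          # hypothetical mu0 values to test
t_vals <- (xbar - mu0_grid) / se
p_vals <- 2 * pt(-abs(t_vals), df = n - 1)   # two-sided p-value for each mu0

reject_by_p  <- p_vals < alpha                         # decision from the p-value
reject_by_ci <- mu0_grid < ci[1] | mu0_grid > ci[2]    # decision from the interval
all(reject_by_p == reject_by_ci)                       # the two rules agree
```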

2.5.8

The \(95\%\) confidence interval of \((108.1873, 131.2132)\) tells us that we are \(95\%\) confident that the true (i.e. population) average yield of White Imperial Spanish onions is between \(108.1873\) and \(131.2132\) grams per plant.

2.5.9

We have carried out a statistical analysis of the yield characteristics of White Imperial Spanish onions, to determine if the true average yield of these onions is different from \(115\) grams per plant. Our results suggest, with \(95\%\) confidence, that the true (population) average yield is between approximately 108 and 131 grams per plant. As the \(p\)-value (\(0.4191\)) is greater than \(\alpha = 0.05\), we do not have sufficient evidence to support the alternative hypothesis that the true population mean yield is different from 115 grams per plant. Therefore, we conclude that we do not have enough evidence to reject the original claim that the true (population) average yield of these onions is 115 grams per plant.

3 Assessing Normal Q-Q plots

Note that only plots A and E show data generated from a Normal distribution.

Plot B, which plots data generated from a Poisson distribution, shows a clear violation of the normality assumption.

Plots C and F (both with data generated from a Student’s t distribution) look reasonably close to Normal. The underlying distribution is symmetric, but the points often pull away from the line at the extremities of the Q-Q plots, because the Student’s t distribution has fatter tails than the Normal distribution.

Plot D is difficult, because it certainly looks as if it satisfies the normality assumption, despite some fluctuations at the extremities. However, the data for this plot are actually generated from a Weibull distribution!

Note that for both plots A and E, there are some minor deviations from the diagonal line at the theoretical quantiles of higher magnitude. In practice this is common, and to be expected to a degree. As always, you should support your analysis with multiple checks (for example a histogram, a Normal Q-Q plot and a Shapiro-Wilk test), to ensure you have a robust understanding of the data.
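To explore these patterns yourself, the sketch below simulates one sample from each type of distribution discussed above and draws their Normal Q-Q plots. The sample size, seed and parameters (Poisson mean 3, \(t\) with 3 degrees of freedom, Weibull shape 3.6) are illustrative assumptions, not the settings used to generate plots A to F. A Weibull shape near 3.6 gives a nearly symmetric distribution, which is why such data can look deceptively Normal.

```r
set.seed(2024)
n <- 100   # assumed sample size
samples <- list(
  "Normal(0, 1)"          = rnorm(n),
  "Poisson(lambda = 3)"   = rpois(n, lambda = 3),     # discrete: step-like Q-Q pattern
  "Student's t (df = 3)"  = rt(n, df = 3),            # fat tails: points bend away at the ends
  "Weibull (shape = 3.6)" = rweibull(n, shape = 3.6)  # nearly symmetric, not Normal
)

par(mfrow = c(2, 2))   # 2 x 2 grid of panels
for (name in names(samples)) {
  qqnorm(samples[[name]], main = name, pch = 19)
  qqline(samples[[name]])
}
par(mfrow = c(1, 1))   # reset the layout
```

Re-running with different seeds is a useful way to see how much a genuinely Normal sample can wobble at the extremities.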

4 Extension: One-sample \(t\)-test Practice

4.1

summary(wonions$Density)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   18.78   39.54   61.78   73.33   97.86  184.75

4.1.1

We can produce the plots as follows:

par(mfrow = c(1, 2)) # Two plotting panels side by side
hist(wonions$Density, main = "Histogram of Onion Density \n (plants per m^2)", 
     xlab = "plants per m^2", col = "orange")
boxplot(wonions$Density, main = "Box Plot of Onion Density \n (plants per m^2)", 
        col = "orange")

par(mfrow = c(1, 1)) # Reset to a single plotting panel
qqnorm(wonions$Density, main = "Normal Q-Q plot for onion density data", pch = 19)
qqline(wonions$Density)

4.2

Let \(\mu\) denote the average planting density (in plants per m\(^2\)) of White Imperial Spanish onions. Hence for this hypothesis test we have \[H_0: \mu = 80, \text{ versus } H_1: \mu < 80.\]

4.3

Based on the histogram from 4.1.1, we observe that the density data is skewed to the right.

The Normal Q-Q plot for the density data looks even worse than the one obtained for the yield data, and shows clear signs of non-normal behaviour.

4.3.1

shapiro.test(wonions$Density)

## 
##  Shapiro-Wilk normality test
## 
## data:  wonions$Density
## W = 0.91068, p-value = 2.328e-05

The \(p\)-value computed by the test is \(2.328 \times 10^{-5}\) (so \(p < 0.01\)). As this is much smaller than our chosen significance level of \(\alpha = 0.05\), we reject the null hypothesis of the Shapiro-Wilk test (that the data follow a Normal distribution), and conclude, based on this test, that the density data is non-normal.

However, just as for the yield data, our sample size is \(n=84\), so by the Central Limit Theorem we can still treat the distribution of the sample mean as (approximately) normal.

4.4

The R code below computes the sample standard deviation of the density values.

sd(wonions$Density)

## [1] 41.53086

Hence, as we already have \(n=84\), \(\overline{x} = 73.33\), \(s = 41.53086\) and \(\mu_0 = 80\), we have \[t = \dfrac{\overline{x}-\mu_0}{s/\sqrt{n}} = \dfrac{73.33 - 80}{41.53086/\sqrt{84}} \approx -1.472.\]
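The same calculation in R, using the density summary statistics reported in this lab (the mean is taken from the `t.test()` output):

```r
n    <- 84         # sample size
xbar <- 73.3325    # sample mean of wonions$Density
s    <- 41.53086   # sample standard deviation, from sd(wonions$Density)
mu0  <- 80         # hypothesised mean under H0

t_stat <- (xbar - mu0) / (s / sqrt(n))   # negative because xbar < mu0
round(t_stat, 4)
```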

4.5

No answer required.

4.5.1

onion.density.ttest <- t.test(wonions$Density, alternative = "less", mu = 80)
onion.density.ttest

## 
##  One Sample t-test
## 
## data:  wonions$Density
## t = -1.4714, df = 83, p-value = 0.07248
## alternative hypothesis: true mean is less than 80
## 95 percent confidence interval:
##     -Inf 80.8701
## sample estimates:
## mean of x 
##   73.3325

4.5.2

The test statistic is \(-1.4714\), which is approximately equal to the value calculated in 4.4 above.

4.5.3

The degrees of freedom are \(83\). For the one-sample \(t\)-test, this is found by computing \(n-1\).

4.5.4

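The \(p\)-value is \(0.07248\). This is the probability of obtaining a sample mean as small as, or smaller than, the one we observed (\(\overline{x} = 73.33\)), assuming the null hypothesis is true; that is, assuming the true mean is equal to 80. Equivalently, it is \(P(T \leq -1.4714)\) for \(T \sim t_{83}\).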
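Because this is a lower-tailed test (\(H_1: \mu < 80\)), only the left tail of the \(t\) distribution is used. A short sketch with base R's `pt()`:

```r
t_stat <- -1.4714   # test statistic reported by t.test()
df <- 83            # n - 1

p_lower <- pt(t_stat, df = df)               # one-sided: P(T <= t)
p_two_sided <- 2 * pt(-abs(t_stat), df = df) # for comparison only: twice as large
round(c(lower = p_lower, two_sided = p_two_sided), 5)
```

By the symmetry of the \(t\) distribution, the two-sided \(p\)-value is exactly double the one-sided value here, which is why the choice of alternative hypothesis must be made before looking at the data.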

4.5.5

The \(95\%\) confidence interval is \((-\infty, 80.8701)\). Since a \((1-\alpha)\times 100\%\) confidence interval corresponds to a test at significance level \(\alpha\), a \(95\%\) interval tells us that \(\alpha = 0.05\).
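For a lower-tailed test the interval is one-sided, \(\left(-\infty,\ \overline{x} + t_{1-\alpha,\,n-1} \times s/\sqrt{n}\right)\), so the critical value uses \(1-\alpha\) rather than \(1-\alpha/2\). A minimal sketch from the reported summary statistics:

```r
n <- 84; xbar <- 73.3325; s <- 41.53086   # density summary statistics
alpha <- 0.05

t_crit <- qt(1 - alpha, df = n - 1)       # one-sided critical value, about 1.663
upper <- xbar + t_crit * s / sqrt(n)      # upper confidence bound
c(lower = -Inf, upper = upper)
```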

4.5.6

Since our \(p\)-value \(> \alpha\) (i.e. \(0.07248 > 0.05\)), we fail to reject the null hypothesis. Note, however, that this is a “close-to-significant” result.

4.5.7

Our \(95\%\) confidence interval is \((-\infty, 80.8701)\). Since this interval contains \(\mu_0 = 80\), we cannot reject the null hypothesis. This decision matches our decision based on the \(p\)-value assessment.

4.5.8

The \(95\%\) confidence interval of \((-\infty, 80.8701)\) tells us that we are \(95\%\) confident that the true (i.e. population) average planting density of White Imperial Spanish onions is at most \(80.8701\) plants per m\(^2\).

4.5.9

We have carried out a statistical analysis of the planting density characteristics of White Imperial Spanish onions, to determine if the population average planting density of these onions is less than \(80\) plants per m\(^2\). Our results suggest, with \(95\%\) confidence, that the true (population) average planting density is approximately 81 plants per m\(^2\) or less. As the \(p\)-value (\(0.07248\)) is greater than \(\alpha = 0.05\), we do not have sufficient evidence to support the alternative hypothesis that the true mean planting density is less than 80 plants per m\(^2\). Therefore, we conclude that we do not have enough evidence to reject the original claim that the true (population) average planting density of these onions is \(80\) plants per m\(^2\).

If there were any parts you were unsure about, take a look back over the relevant sections of the Topic 5 material.

References

Bowman, A. W., and A. Azzalini. 2021. R Package sm: Nonparametric Smoothing Methods (Version 2.2-5.7). University of Glasgow, UK; Università di Padova, Italia. http://www.stats.gla.ac.uk/~adrian/sm/.

These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

STM1001: Computer Lab 6 Solutions