Topic 5: Hypothesis Testing


In Topic 5 we introduced an important statistical technique: hypothesis testing. In this computer lab, we will practice different aspects of hypothesis testing.

If you have time during today’s lab, you may like to work on Quiz 6. For the final question of the quiz, refer back to Computer Lab 4 if needed.

1 Preparations

🏡 In this computer lab we will analyse the wonions data set on White Imperial Spanish onion plants, from the sm R package (Bowman and Azzalini 2021). This data set consists of 84 observations and contains two variables of interest:

  • Yield (in grams per plant), and
  • Density of planting (in plants per square metre).

1.1

🏡 To begin, open up RStudio and create a new script file. Before we can start analysing the data, we first need to install and load the sm package. Run the code below to get started:

install.packages("sm") # Install the package (only needed once per computer)
library(sm)            # Load the package (needed in each new R session)
data(wonions)          # Load the onions data

2 One-sample \(t\)-tests

In this question, we will conduct an exploratory analysis and a one-sample \(t\)-test on the Yield variable from the wonions data set.

2.1 Initial Exploratory Analysis

💻 It is always a good idea to carry out some exploratory analysis on a new data set, in order to gain a better understanding of the data. Often, a quick inspection of the data can help us identify key characteristics of the data, and provide us with ideas for subsequent analyses.

Run the R code below to obtain a numeric summary for the Yield variable. Comment on any details you find noteworthy.

summary(wonions$Yield)

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

2.1.1

💻 Create each of the following plots for the Yield variable:

  • Histogram
  • Box plot
  • Normal Q-Q plot

Hint: To create a Normal Q-Q plot, run the R code below:

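qqnorm(wonions$Yield, main = "Normal Q-Q plot for onion data", pch = 19) # Plot sample quantiles against theoretical normal quantiles
qqline(wonions$Yield) # Add a reference line through the first and third quartiles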
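If you would like to see all three plots together, a minimal sketch is below. The helper name plot_diagnostics is hypothetical (it is not part of the lab or the sm package), and the demonstration uses simulated data rather than the onion measurements; it only combines the standard base R functions hist, boxplot, qqnorm and qqline in one window.

```r
# Hypothetical helper (not part of the lab): draws a histogram, a box plot and
# a Normal Q-Q plot of a numeric vector side by side, and invisibly returns the
# summaries computed by hist() and boxplot().
plot_diagnostics <- function(x, label = "x") {
  old_par <- par(mfrow = c(1, 3))  # three panels in one row
  on.exit(par(old_par))            # restore the previous layout on exit
  h <- hist(x, main = paste("Histogram of", label), xlab = label)
  b <- boxplot(x, main = paste("Box plot of", label), ylab = label)
  qqnorm(x, main = paste("Normal Q-Q plot of", label), pch = 19)
  qqline(x)
  invisible(list(hist = h, box = b))
}

# Demonstration on simulated (not real) data, drawn on a null device so that
# nothing is written to disk
pdf(NULL)
set.seed(1)
demo <- plot_diagnostics(rnorm(84, mean = 120, sd = 20), "simulated yield")
invisible(dev.off())
```

After loading the sm package as in 1.1, the same helper could be called as plot_diagnostics(wonions$Yield, "Yield") in RStudio, where the plots appear in the Plots pane.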

2.2 Defining Hypotheses

🏡 Suppose that previous data have suggested that the average Yield in White Imperial Spanish onions is \(115\) grams per plant. For our sample data, we know from 2.1 that the Yield sample mean is \(119.7\) grams per plant. We would therefore like to test whether the average Yield in White Imperial Spanish onions is actually different from \(115\) grams per plant.

To do this, we can perform a one-sample \(t\)-test in R.

To begin, clearly define \(\mu\) and the null and alternative hypotheses for this \(t\)-test, adhering to the statistical notation introduced in Topic 5.

Hint: The hypotheses definitions should have the following format:

\(H_0: \mu = \ldots \text{ versus } H_1: \mu \ldots \ldots,\)

where

  • \(\mu\) denotes the population \(\ldots\)

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

2.3 Test Assumption Checks

🏡 Before we proceed further, we should check that the assumptions of the one-sample \(t\)-test are satisfied. There are three assumptions we should check:

  1. The data are numeric,
  2. The observations are independent, and
  3. The sample mean \(\bar{X}\) is normally distributed

We know that our data are numeric (1), and we can assume that the observations are independent (2). Therefore we only need to check the \(t\)-test normality assumption (3).

Referring to your histogram and Q-Q plot results from 2.1.1, do you have any concerns about the \(t\)-test normality assumption?

Note: Remember that it is the distribution of the sample mean which is assumed to be normal, rather than the data themselves. If you have any questions about this, discuss with your classmates and/or computer lab demonstrator.
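To see why the assumption concerns the sample mean rather than the individual observations, the illustrative simulation below may help. It is a sketch using simulated exponential data (an assumption chosen only because it is strongly right-skewed), not the onion data: individual draws are highly skewed, but means of \(n = 84\) draws are close to symmetric, with standard deviation close to \(1/\sqrt{84}\).

```r
# Illustrative simulation (not part of the lab): even when individual
# observations come from a strongly right-skewed distribution, the sample
# mean of n = 84 observations is approximately normally distributed.
set.seed(2024)
n <- 84
n_sims <- 5000

# Individual observations: exponential with rate 1 (population mean 1, sd 1)
raw <- rexp(n_sims)

# Sampling distribution of the mean: 5000 sample means, each based on n draws
sample_means <- replicate(n_sims, mean(rexp(n)))

# Simple moment-based skewness (0 for a symmetric distribution)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew(raw)           # roughly 2, as for the exponential distribution
skew(sample_means)  # much closer to 0
sd(sample_means)    # close to 1 / sqrt(84), about 0.109

hist(sample_means, main = "Simulated sample means (n = 84)", xlab = "Sample mean")
```

How close the sampling distribution gets to normal depends on both the sample size and how non-normal the data are, so this is a guide rather than a guarantee.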

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

2.3.1 Normality Assumption Check

🏡 To carry out a formal statistical test for normality, we can use the Shapiro-Wilk test. Run the R code below to carry out this test for our data:

shapiro.test(wonions$Yield)

Based on the R output, what can we conclude?

Hint: Keep in mind that our sample size is 84.
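The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution, so a small \(p\)-value is evidence against normality. shapiro.test returns an "htest" object, so its statistic and \(p\)-value can be extracted directly. The sketch below wraps this in a hypothetical helper, sw_check (not part of the lab), with an assumed default significance level of \(0.05\), and demonstrates it on simulated data only.

```r
# Hypothetical helper (not part of the lab): runs the Shapiro-Wilk test and
# reports whether H0 ("the data come from a normal distribution") is rejected
# at significance level alpha.
sw_check <- function(x, alpha = 0.05) {
  sw <- shapiro.test(x)  # shapiro.test() needs between 3 and 5000 non-missing values
  list(W = unname(sw$statistic),
       p.value = sw$p.value,
       reject_normality = sw$p.value < alpha)
}

# Demonstration on simulated (not real) data
set.seed(3)
sw_check(rnorm(84))  # normal data: rejection is usually not expected
sw_check(rexp(84))   # strongly skewed data: rejection is usually expected
```

For the lab data, the equivalent call would be sw_check(wonions$Yield). Whatever the test says, remember that the \(t\)-test assumption concerns the distribution of the sample mean.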

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

2.4 Computing the test statistic by hand

🏡 Having checked the \(t\)-test assumptions, our next step is to calculate the test statistic for the \(t\)-test. Follow these steps:

  1. Compute the sample standard deviation of the onions’ Yield measurements in R using the sd function.
  2. Then, using the sample standard deviation, the summary information from 2.1 and the hypothesis details from 2.2, calculate the test statistic for this one-sample \(t\)-test.

Hint: Take a look at Section 3.1 of the Topic 5 readings for details on how to calculate this test statistic.
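The standard one-sample \(t\) statistic is \(t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}\), where \(\bar{x}\) is the sample mean, \(s\) the sample standard deviation, \(n\) the sample size and \(\mu_0\) the value of \(\mu\) under \(H_0\); we assume this matches the form given in the Topic 5 readings. The sketch below computes it step by step with a hypothetical helper, t_by_hand (not part of the lab), and checks it against t.test on simulated data.

```r
# Hypothetical helper (not part of the lab): the one-sample t statistic
#   t = (xbar - mu0) / (s / sqrt(n))
# computed step by step from the raw data.
t_by_hand <- function(x, mu0) {
  n    <- length(x)    # sample size
  xbar <- mean(x)      # sample mean
  s    <- sd(x)        # sample standard deviation (divisor n - 1)
  se   <- s / sqrt(n)  # estimated standard error of the sample mean
  (xbar - mu0) / se
}

# Check against t.test() on simulated (not real) data
set.seed(4)
x <- rnorm(84, mean = 118, sd = 20)
t_by_hand(x, mu0 = 115)
unname(t.test(x, mu = 115)$statistic)  # should agree with the line above
```

Applied to the lab data, this would be t_by_hand(wonions$Yield, 115). A value calculated from the rounded numbers printed by summary may differ slightly in the later decimal places.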

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.

2.5 Conducting a one-sample \(t\)-test in R

💻 We can conduct a one-sample \(t\)-test in R using the t.test function. Let’s take a look at the code chunk below, and go over the different arguments in the t.test function.

onion.yield.ttest <- t.test(wonions$Yield, alternative = "two.sided", mu = 115) # Run the test and store the result
onion.yield.ttest # Print the test output

Note that:

  • The first argument in the t.test function is the data we are assessing, wonions$Yield.
  • The second argument, alternative = ..., is used to specify the type of \(t\)-test we are conducting, i.e. is it a two-sided test, or a one-sided test (with \(H_1: \mu > ...\) or \(H_1: \mu < ...\))? Here, since we are testing whether the yield is different from 115, we have selected the "two.sided" option. This is the default setting, meaning that if this argument is omitted, a two-sided test will be carried out. The other options are "less" and "greater".
  • The last argument, mu = ..., allows us to specify the value of \(\mu\) under the null hypothesis \(H_0\).
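Because t.test returns an object of class "htest", each part of the printed output can also be extracted by name. The sketch below shows the components relevant to the questions that follow; it uses simulated data (an assumption for illustration), and the object name res is arbitrary.

```r
# Sketch (not part of the lab): t.test() returns an object of class "htest",
# whose components can be pulled out individually instead of read off the
# printed output.
set.seed(5)
x <- rnorm(84, mean = 118, sd = 20)  # simulated (not real) data
res <- t.test(x, alternative = "two.sided", mu = 115)

unname(res$statistic)  # test statistic t
unname(res$parameter)  # degrees of freedom
res$p.value            # p-value
res$conf.int           # confidence interval (95% by default)
unname(res$estimate)   # sample mean
res$null.value         # value of mu under H0
```

With the lab object, the same idea gives, for example, onion.yield.ttest$p.value or onion.yield.ttest$conf.int.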

2.5.1

💻 Run the R code in 2.5 above now.

2.5.2

🏡 What is the test statistic value? Does this value match the value you calculated by hand in 2.4?

🎧 Online students 💬 Enter your answer next to the question in the shared Google Doc.

2.5.3

🏡 What are the degrees of freedom for this test? How are they calculated?

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer next to the question in the shared Google Doc.

2.5.4

🏡 What is the \(p\)-value, and what does this number represent?
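For a two-sided one-sample \(t\)-test, the \(p\)-value is the probability, assuming \(H_0\) is true, of obtaining a test statistic at least as extreme as the one observed in either direction, i.e. \(p = 2\,P(T_{n-1} \ge |t|)\), where \(T_{n-1}\) follows a \(t\) distribution with \(n - 1\) degrees of freedom. The sketch below reproduces the t.test \(p\)-value with pt on simulated data (an assumption for illustration).

```r
# Sketch (not part of the lab): two-sided p-value of a one-sample t-test,
#   p = 2 * P(T_{n-1} >= |t|)
set.seed(6)
x   <- rnorm(84, mean = 118, sd = 20)  # simulated (not real) data
mu0 <- 115
n   <- length(x)
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(n))
dof   <- n - 1                          # degrees of freedom for a one-sample test

# By symmetry of the t distribution, P(T >= |t|) = P(T <= -|t|)
p_by_hand <- 2 * pt(-abs(t_obs), df = dof)
p_by_hand
t.test(x, mu = mu0)$p.value             # should agree with p_by_hand
```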

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer next to the question in the shared Google Doc.

2.5.5

🏡 What is the \(95\%\) confidence interval? What does the \(95\%\) value tell us about the level of significance \(\alpha\) used in this \(t\)-test?
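A two-sided \(100(1-\alpha)\%\) confidence interval for \(\mu\) is \(\bar{x} \pm t_{1-\alpha/2,\,n-1}\, s/\sqrt{n}\), so a \(95\%\) interval corresponds to \(\alpha = 0.05\). The sketch below builds the interval by hand with qt and compares it with the conf.level argument of t.test, on simulated data (an assumption for illustration).

```r
# Sketch (not part of the lab): the two-sided 100(1 - alpha)% confidence
# interval for mu is
#   xbar +/- t_{1 - alpha/2, n - 1} * s / sqrt(n)
set.seed(7)
x <- rnorm(84, mean = 118, sd = 20)       # simulated (not real) data
alpha <- 0.05                              # 95% confidence <-> alpha = 0.05
n  <- length(x)
se <- sd(x) / sqrt(n)
t_crit <- qt(1 - alpha / 2, df = n - 1)   # critical value from the t distribution

ci_by_hand <- mean(x) + c(-1, 1) * t_crit * se
ci_by_hand
t.test(x, mu = 115, conf.level = 1 - alpha)$conf.int  # should agree
```

Changing alpha (for example to 0.01) changes both the interval width and the conf.level passed to t.test.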

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer next to the question in the shared Google Doc.

2.5.6

🏡 Based on the \(p\)-value you have obtained, what decision should we make for our hypothesis test? Make sure to explain your reasoning clearly.

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer next to the question in the shared Google Doc.

2.5.7

🏡 Based on the \(95\%\) confidence interval you have obtained, what decision should we make for our hypothesis test? Does this decision align with your conclusion in 2.5.6?
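For a two-sided test at significance level \(\alpha\), rejecting \(H_0: \mu = \mu_0\) because \(p < \alpha\) is equivalent to rejecting because \(\mu_0\) lies outside the \(100(1-\alpha)\%\) confidence interval. The sketch below checks this agreement over a grid of hypothesised values using a hypothetical helper, decide (not part of the lab), on simulated data.

```r
# Sketch (not part of the lab): at significance level alpha, a two-sided
# one-sample t-test rejects H0: mu = mu0 exactly when mu0 lies outside the
# 100(1 - alpha)% confidence interval, so the two decision rules agree.
decide <- function(x, mu0, alpha = 0.05) {
  res <- t.test(x, mu = mu0, conf.level = 1 - alpha)
  ci  <- res$conf.int
  c(reject_by_p  = res$p.value < alpha,
    reject_by_ci = mu0 < ci[1] || mu0 > ci[2])
}

# Check the agreement over a range of hypothesised values (simulated data)
set.seed(8)
x <- rnorm(84, mean = 118, sd = 20)
decisions <- t(sapply(seq(105, 130, by = 0.5), function(m) decide(x, m)))
all(decisions[, "reject_by_p"] == decisions[, "reject_by_ci"])
```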

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer next to the question in the shared Google Doc.

2.5.8

🏡 Interpret the \(95\%\) confidence interval you have obtained, and explain in layman’s terms what the interval tells us.

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

2.5.9

🏡 Write a short conclusion summarising the test and findings.

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

3 Assessing Normal Q-Q plots

🏡 We will encounter examples of both “good” and “bad” Normal Q-Q plots in our data analyses. Whilst some plots are relatively easy to assess, sometimes it can be difficult to distinguish between an acceptable Normal Q-Q plot, and one that shows a violation of the assumption of normality.

To practice, assess the Normal Q-Q plots below and try to identify which (if any) correspond to data sampled from a normal distribution. Give reasons for your decision for each plot.
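For extra practice, you could generate your own Q-Q plots from samples whose distribution you already know. The sketch below simulates samples of size 84 from a normal distribution and from three non-normal distributions (right-skewed, heavy-tailed and light-tailed); the choice of distributions is an assumption for illustration only.

```r
# Sketch (not part of the lab): simulate samples of size 84 from a normal and
# from several non-normal distributions, and draw their Normal Q-Q plots side
# by side for comparison.
set.seed(9)
n <- 84
samples <- list(
  "Normal"                 = rnorm(n),
  "Right-skewed (exp)"     = rexp(n),
  "Heavy-tailed (t, 3 df)" = rt(n, df = 3),
  "Uniform (light tails)"  = runif(n)
)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm, pch = 19)
  qqline(samples[[nm]])
}
par(old_par)                     # restore the previous layout
```

Re-running with a different seed is a useful check: even genuinely normal samples of this size often show some wiggle in the tails.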

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.

4 Extension: One-sample \(t\)-test Practice

💻 In this question, we will switch our focus to the Density variable from the wonions data set.

Suppose that previous data have suggested that the average Density in White Imperial Spanish onions is 80 plants per m\(^2\). Further, suppose that we would like to test whether the average Density is actually less than 80 plants per m\(^2\).

Repeat Question 2, but this time with respect to the Density variable, using the details provided here.

Hint: This \(t\)-test will be one-sided rather than two-sided.
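With alternative = "less", the alternative hypothesis is \(H_1: \mu < \mu_0\), and the \(p\)-value becomes the lower-tail probability \(P(T_{n-1} \le t)\) rather than a two-sided probability. The sketch below illustrates this on simulated data (an assumption for illustration); applied to the lab data, the call would presumably take the form t.test(wonions$Density, alternative = "less", mu = 80).

```r
# Sketch (not part of the lab): with alternative = "less", H1 is mu < mu0 and
# the p-value is the lower-tail probability P(T_{n-1} <= t).
set.seed(10)
x   <- rnorm(84, mean = 78, sd = 10)  # simulated (not real) data
mu0 <- 80

res_less <- t.test(x, alternative = "less", mu = mu0)
t_obs <- unname(res_less$statistic)
dof   <- unname(res_less$parameter)

res_less$p.value
pt(t_obs, df = dof)   # lower-tail probability: matches res_less$p.value
res_less$conf.int     # one-sided interval: (-Inf, upper bound)
```

Note that the confidence interval reported for a one-sided test is also one-sided, so the decision rule based on it changes accordingly.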


Well done, that’s everything for today!

Before you finish up, remember to save your work somewhere safe (e.g. OneDrive) so that you can access it at a later time.


References

Bowman, A. W., and A. Azzalini. 2021. R Package sm: Nonparametric Smoothing Methods (Version 2.2-5.7). University of Glasgow, UK; Università di Padova, Italia. http://www.stats.gla.ac.uk/~adrian/sm/.


These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.