In Topic 5 we introduced an important statistical technique: hypothesis testing. In this computer lab, we will practise different aspects of hypothesis testing.
If you have time during today’s lab, you may like to work on Quiz 6. For the final question of the quiz, refer back to Computer Lab 4 if needed.
🏡 In this computer lab we will analyse the wonions data set on White Imperial Spanish onion plants, from the sm R package (Bowman and Azzalini 2021). This data set consists of 84 observations, and contains two variables of interest: Yield (in grams per plant), and Density of planting (in plants per square metre).
🏡 To begin, open up RStudio and create a new script file. Before we can start analysing the data, we first need to install and load the sm package.
Run the code below to get started:
install.packages("sm") # Install package
library(sm) # Load package
data(wonions) # Load onions data
In this question, we will conduct an exploratory analysis and one-sample \(t\)-test on the Yield variable from the wonions data set.
💻 It is always a good idea to carry out some exploratory analysis on a new data set, in order to gain a better understanding of the data. Often, a quick inspection of the data can help us identify key characteristics of the data, and provide us with ideas for subsequent analyses.
Run the R code below to obtain a numeric summary for the Yield variable. Comment on any details you find noteworthy.
summary(wonions$Yield)
💻 Create each of the following plots for the Yield variable:
Hint: To create a Normal Q-Q plot, run the R code below:
qqnorm(wonions$Yield, main = "Normal Q-Q plot for onion data", pch = 19)
qqline(wonions$Yield)
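A histogram can be drawn in a similar way with the base R hist function. The sketch below uses simulated values rather than the onion data (the vector y and its gamma distribution are assumptions made purely for illustration); to apply it to the lab data, replace y with wonions$Yield.

```r
# Hypothetical data for illustration only (NOT the wonions data):
# 84 positive, right-skewed values.
set.seed(8)
y <- rgamma(84, shape = 4, scale = 30)

# hist() draws the plot and also returns the bin counts invisibly.
h <- hist(y, main = "Histogram (hypothetical data)", xlab = "Value")

# Number of observations falling in each bin.
h$counts
```

Comparing the shape of the histogram with the pattern in the Normal Q-Q plot is a useful habit: skewness in one should show up as curvature in the other.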
🏡 Suppose that previous data has suggested that the average Yield in White Imperial Spanish onions is \(115\) grams per plant. For our sample data, we know from 2.1 that the Yield sample mean is \(119.7\) grams per plant. Therefore, we would like to test if the average Yield in White Imperial Spanish onions is actually different from \(115\) grams per plant.
To do this, we can perform a one-sample \(t\)-test in R.
To begin, clearly define \(\mu\) and the null and alternative hypotheses for this \(t\)-test, adhering to the statistical notation introduced in Topic 5.
Hint: The hypothesis definitions should have the following format:
\(H_0: \mu = \ldots \text{ versus } H_1: \mu \ldots \ldots,\)
where
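🏡 Before we proceed further we should check that the assumptions of the one-sample \(t\)-test are satisfied. There are three assumptions we should check:
1. The data are numeric.
2. The observations are independent.
3. The sample mean is normally distributed.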
We know that our data are numeric (1), and we can assume that the observations are independent (2). Therefore we only need to check the \(t\)-test normality assumption (3).
Referring to your histogram and Q-Q plot results from 2.1.1, do you have any concerns about the \(t\)-test normality assumption?
Note: Remember that it is the distribution of the sample mean which is assumed to be normal, rather than the data itself. If you have any questions about this, discuss with your classmates and/or computer lab demonstrator.
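The point in the note above can be explored with a small simulation. The sketch below is an illustration only: it uses simulated exponential data (an assumption, chosen because it is clearly skewed) rather than the onion data, and the number of simulated samples is arbitrary.

```r
# Hypothetical illustration: simulated right-skewed data, NOT the wonions data.
set.seed(1)
n <- 84          # same sample size as the lab data
n_sims <- 2000   # number of simulated samples (arbitrary choice)

# Draw n_sims samples of size n from an exponential distribution
# and record the mean of each sample.
sample_means <- replicate(n_sims, mean(rexp(n, rate = 1)))

# Left: one skewed sample. Right: Q-Q plot of the simulated sample means.
par(mfrow = c(1, 2))
hist(rexp(n, rate = 1), main = "One skewed sample", xlab = "Value")
qqnorm(sample_means, main = "Q-Q plot of sample means", pch = 19)
qqline(sample_means)
par(mfrow = c(1, 1))
```

Even though each individual sample is skewed, the points in the Q-Q plot of the sample means should lie much closer to the reference line.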
🏡 To carry out a rigorous statistical test for normality, we can use the Shapiro-Wilk test. Run the R code below to carry out this test for our data:
shapiro.test(wonions$Yield)
Based on the R output, what can we conclude?
Hint: Keep in mind that our sample size is 84.
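The object returned by shapiro.test stores its results as named components, which can be convenient when reporting them. The sketch below uses simulated normal data (the vector x_norm is hypothetical, not the onion data). The null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution.

```r
# Hypothetical data for illustration only (NOT the wonions data).
set.seed(2)
x_norm <- rnorm(84, mean = 120, sd = 50)

sw <- shapiro.test(x_norm)

sw$statistic  # the Shapiro-Wilk W statistic
sw$p.value    # the p-value of the test
```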
🏡 Having checked the \(t\)-test assumptions, our next step is to calculate the \(t\)-test statistic. Follow these steps:
1. Calculate the standard deviation of the Yield measurements in R using the sd function.
2. Use this standard deviation to calculate the test statistic by hand.
Hint: Take a look at Section 3.1 of the Topic 5 readings for details on how to calculate this test statistic.
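As a rough guide to the arithmetic, the sketch below computes the usual one-sample test statistic \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\) on a small made-up vector. The values in x and the hypothesised mean mu0 are hypothetical and are not taken from the onion data; the same steps apply to wonions$Yield with the value of \(\mu\) under \(H_0\).

```r
# Hypothetical values for illustration only (NOT the wonions data).
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5)
mu0 <- 10                      # hypothesised mean under H0 (made up)

n <- length(x)                 # sample size
x_bar <- mean(x)               # sample mean
s <- sd(x)                     # sample standard deviation
se <- s / sqrt(n)              # standard error of the sample mean

t_stat <- (x_bar - mu0) / se   # one-sample t statistic
t_stat
```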
💻 We can conduct a one-sample \(t\)-test in R using the t.test function.
Let’s take a look at the code chunk below, and go over the different arguments in the t.test function.
onion.yield.ttest <- t.test(wonions$Yield, alternative = "two.sided", mu = 115)
onion.yield.ttest
Note that:
- The first argument of the t.test function is the data we are assessing, wonions$Yield.
- The second argument, alternative = ..., is used to specify the type of \(t\)-test we are conducting, i.e. is it a two-sided test, or a one-sided test (with \(H_1: \mu > \ldots\) or \(H_1: \mu < \ldots\))? Here, since we are testing if the yield is different from 115, we have selected the "two.sided" option. This is the default setting, meaning that if this argument is omitted, a two-sided test will be carried out. Other settings include "less" and "greater".
- The argument mu = ... allows us to specify the value of \(\mu\) under the null hypothesis \(H_0\).
💻 Run the R code in 2.5 above now.
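The result of t.test is a list, so individual pieces of the output can be pulled out by name instead of being read off the printed summary. The sketch below shows this on simulated data (x and the null value 50 are hypothetical); the same component names apply to onion.yield.ttest.

```r
# Hypothetical data for illustration only (NOT the wonions data).
set.seed(3)
x <- rnorm(30, mean = 52, sd = 8)

res <- t.test(x, alternative = "two.sided", mu = 50)

names(res)       # all components stored in the result
res$statistic    # test statistic
res$parameter    # degrees of freedom
res$p.value      # p-value
res$conf.int     # confidence interval (with its confidence level as an attribute)
res$estimate     # sample mean
```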
🏡 What is the test statistic value? Does this value match the value you calculated by hand in 2.4?
🏡 What are the degrees of freedom for this test? How are they calculated?
🏡 What is the \(p\)-value, and what does this number represent?
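To see where a two-sided \(p\)-value comes from, it can be reproduced from the \(t\) distribution with the pt function. This is a sketch on simulated data (x and mu0 are hypothetical), not the calculation for the onion data.

```r
# Hypothetical data for illustration only (NOT the wonions data).
set.seed(4)
x <- rnorm(25, mean = 10.5, sd = 2)
mu0 <- 10                                  # hypothesised mean under H0 (made up)

n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
df <- n - 1

# Two-sided p-value: probability, under H0, of a t statistic at least
# as far from zero as the one observed, in either direction.
p_two_sided <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_two_sided
```

The value should agree with t.test(x, mu = mu0)$p.value.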
🏡 What is the \(95\%\) confidence interval? What does the \(95\%\) value tell us about the level of significance \(\alpha\) used in this \(t\)-test?
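The confidence interval reported by t.test can also be rebuilt by hand using the qt function, which makes the link between the confidence level and \(\alpha\) explicit. The sketch below uses simulated data (x is hypothetical) and assumes \(\alpha = 0.05\).

```r
# Hypothetical data for illustration only (NOT the wonions data).
set.seed(5)
x <- rnorm(40, mean = 20, sd = 4)

n <- length(x)
alpha <- 0.05                               # gives a 100 * (1 - alpha)% interval
t_crit <- qt(1 - alpha / 2, df = n - 1)     # critical value of the t distribution

# Sample mean plus or minus critical value times standard error.
ci <- mean(x) + c(-1, 1) * t_crit * sd(x) / sqrt(n)
ci
```

The two limits should match the interval stored in t.test(x)$conf.int.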
🏡 Based on the \(p\)-value you have obtained, what decision should we make for our hypothesis test? Make sure to explain your reasoning clearly.
🏡 Based on the \(95\%\) confidence interval you have obtained, what decision should we make for our hypothesis test? Does this decision align with your conclusion in 2.5.6?
🏡 Interpret the \(95\%\) confidence interval you have obtained, and explain in layman’s terms what the interval tells us.
🏡 Write a short conclusion summarising the test and findings.
🏡 We will encounter examples of both “good” and “bad” Normal Q-Q plots in our data analyses. Whilst some plots are relatively easy to assess, sometimes it can be difficult to distinguish between an acceptable Normal Q-Q plot, and one that shows a violation of the assumption of normality.
To practise, assess the Normal Q-Q plots below and try to identify which (if any) correspond to data sampled from a normal distribution. Give reasons for your decision for each plot.
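For extra practice, Q-Q plots can be generated from distributions whose shape is known in advance. The sketch below is one possible way to do this; the four distributions, the sample size of 84 and the seed are all arbitrary choices, not part of the original lab.

```r
# Simulated samples from known distributions (illustrative choices only).
set.seed(7)
n <- 84
samples <- list(
  "Normal" = rnorm(n),
  "Exponential (right-skewed)" = rexp(n),
  "t with 3 df (heavy tails)" = rt(n, df = 3),
  "Uniform (light tails)" = runif(n)
)

# Draw one Normal Q-Q plot per sample in a 2 x 2 grid.
par(mfrow = c(2, 2))
for (label in names(samples)) {
  qqnorm(samples[[label]], main = label, pch = 19)
  qqline(samples[[label]])
}
par(mfrow = c(1, 1))
```

Changing the seed and re-running the code shows how much Q-Q plots can vary between samples, even when the data really are normal.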
💻 In this question, we will switch our focus to the Density variable from the wonions data set.
Suppose that previous data has suggested that the average Density in White Imperial Spanish onions is 80 plants per m\(^2\). Further suppose we would like to test if the average Density is actually less than 80 plants per m\(^2\).
Repeat Question 2, but this time with respect to the Density variable, using the details provided here.
Hint: Note that this \(t\)-test will be one-sided rather than two-sided.
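A one-sided test only requires a change to the alternative argument of t.test. The sketch below runs a lower-tailed test on simulated data (x and the null value 50 are hypothetical, not the Density data) and shows two ways in which the output differs from the two-sided case.

```r
# Hypothetical data for illustration only (NOT the wonions data).
set.seed(6)
x <- rnorm(30, mean = 48, sd = 6)

# Lower-tailed test: H1 states that the mean is less than 50.
res_less <- t.test(x, alternative = "less", mu = 50)

res_less$p.value    # area to the LEFT of the observed t statistic
res_less$conf.int   # one-sided interval: the lower limit is -Inf
```

Because only one tail is used, the \(p\)-value is no longer doubled and the reported confidence interval is unbounded on one side.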
Bowman, A. W., and A. Azzalini. 2021. R Package sm: Nonparametric Smoothing Methods (Version 2.2-5.7). University of Glasgow, UK; Università di Padova, Italia. http://www.stats.gla.ac.uk/~adrian/sm/.
These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.