Topic 6: \(t\)-tests for two-sample hypothesis testing

These are the solutions for Computer Lab 7. The data analysed come from the sm R package (Bowman and Azzalini 2021) and from DASL (2021).

1 Independent samples \(t\)-tests

1.1

# install.packages("sm") # uncomment and run this line if the sm package is not installed
library(sm) # Load package
data(wonions) # Load onions data

1.2 Defining Hypotheses

\(\mu_1\) denotes the true (population) average yield of White Imperial Spanish (WIS) onions planted in Purnong Landing.
\(\mu_2\) denotes the true (population) average yield of WIS onions planted in Virginia.

The null hypothesis is \(H_0: \mu_1 = \mu_2\) while the alternative hypothesis is \(H_1: \mu_1 \neq \mu_2\).

Note that we could also write this as \(H_0: \mu_1 - \mu_2 = 0\) and \(H_1: \mu_1 - \mu_2 \neq 0\).

1.2.1

The dependent variable here is the Yield variable. The independent variable is the Locality variable.

1.3 Initial Exploratory Analysis

No answer required.

1.3.1

Example R code for this question is shown below:

table(wonions$Locality)

## 
##  1  2 
## 42 42

Hence we have \(n_1 = n_2 = 42\).

1.3.2

Example R code for this question is shown below:

tapply(wonions$Yield, wonions$Locality, mean)

##        1        2 
## 129.7636 109.6369

tapply(wonions$Yield, wonions$Locality, sd)

##        1        2 
## 52.80989 51.97360

Hence we have \(\overline{x}_1 = 129.7636\), \(\overline{x}_2 = 109.6369\), \(s_1 = 52.80989\) and \(s_2 = 51.9736\).

1.3.3

Example R code for this question is shown below:

Note that the histograms and box plots have been plotted side-by-side to make comparisons easier.

par(mfrow = c(1, 2)) # two plots side by side
# histograms
wonions1 <- wonions[wonions$Locality == 1, ] # Purnong Landing
wonions2 <- wonions[wonions$Locality == 2, ] # Virginia
hist(wonions1$Yield, main = "Histogram of Purnong Landing \n Onion Yield \n (grams per plant)",
     xlab = "grams", col = "orange")
hist(wonions2$Yield, main = "Histogram of Virginian \n Onion Yield \n (grams per plant)", 
     xlab = "grams", col = "chartreuse3")

# boxplots
par(mfrow = c(1, 1)) # back to a single plot
boxplot(wonions$Yield ~ wonions$Locality, main = "Box plots of Onion Yield \n (grams per plant)", 
        col = c("orange", "chartreuse3"), names = c("Purnong Landing", "Virginia"), xlab = "",
        ylab = "grams per plant")

1.3.4

Both subsets of data appear to be non-normal. The box plots also show that the median yields at the two locations are quite different. However, the spread of the data in both subsets appears similar (although the Virginian yield data include one large outlier).

1.4 Test Assumption Checks

No answer required.

1.4.1 Normality Assumption Check

Example R code is shown below.

We need to run shapiro.test() separately for each group.

shapiro.test(wonions$Yield[wonions$Locality == 1])

## 
##  Shapiro-Wilk normality test
## 
## data:  wonions$Yield[wonions$Locality == 1]
## W = 0.94226, p-value = 0.03428

shapiro.test(wonions$Yield[wonions$Locality == 2])

## 
##  Shapiro-Wilk normality test
## 
## data:  wonions$Yield[wonions$Locality == 2]
## W = 0.93808, p-value = 0.0245

The \(p\)-values computed by the Shapiro-Wilk tests are \(0.03428\) and \(0.0245\) for Purnong Landing (Locality == 1) and Virginia (Locality == 2) respectively. As both values are smaller than \(\alpha = 0.05\), we reject the null hypotheses that the data follow normal distributions, and conclude, based on these tests, that the yield data are non-normal for both localities. However, as noted in Computer Lab 6, the Central Limit Theorem means we can still treat the sampling distributions of the sample means as (approximately) normal, since each Locality has a sample size of \(42\), which is greater than \(30\).
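The Central Limit Theorem argument above can be illustrated with a short simulation. This sketch is an illustration under stated assumptions rather than part of the lab: it draws from a strongly right-skewed exponential population (not the onion data) and compares the skewness of the individual values with the skewness of 5000 sample means, each based on \(n = 42\) observations. The helper skewness_of() is a hypothetical function defined here, not a base R function.

```r
# Illustrative CLT simulation (assumption: exponential population with rate = 1;
# this stands in for a skewed population and is NOT the onion data)
set.seed(2021)

# Hypothetical helper: moment-based sample skewness
skewness_of <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
}

n <- 42       # matches each Locality sample size
reps <- 5000  # number of simulated samples

raw_values <- rexp(reps * n, rate = 1)
# Each row of the matrix holds n independent draws, i.e. one simulated sample
sample_means <- rowMeans(matrix(raw_values, nrow = reps, ncol = n))

skew_raw <- skewness_of(raw_values)     # close to 2 for an exponential population
skew_means <- skewness_of(sample_means) # much closer to 0

c(raw = skew_raw, means = skew_means)

# Visual check: the simulated sample means look roughly bell-shaped
hist(sample_means, main = "Simulated sample means (n = 42)",
     xlab = "Sample mean", col = "grey")
```

For an exponential population, theory puts the skewness of the individual values at \(2\) and that of means of \(42\) observations at roughly \(2/\sqrt{42} \approx 0.31\), so the simulated values should land near those figures.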

1.4.2 Equal Variances Assumption Check

Note that for Levene’s test to work here, the independent variable must be of class factor, so we convert Locality with as.factor().

library(car) # Load required package
leveneTest(wonions$Yield ~ as.factor(wonions$Locality))

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.3759 0.5415
##       82

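Since the Levene’s test \(p\)-value \(= 0.5415 > 0.05\), we do not reject the null hypothesis of equal variances, so it is reasonable to assume that the variances of the two Yield data subsets are equal (which is why we set var.equal = TRUE in the \(t\)-test).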
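For intuition, Levene’s test with center = median (the version shown in the output above) amounts to a one-way ANOVA on the absolute deviations of each observation from its own group median. The sketch below reproduces that calculation in base R on simulated data (the group sizes, means and standard deviations are made-up values for illustration only), so the car package is not needed to run it.

```r
# Median-centred Levene's test by hand, on simulated data (illustrative values only)
set.seed(1)
y <- c(rnorm(42, mean = 130, sd = 50), rnorm(42, mean = 110, sd = 50))
g <- factor(rep(c("1", "2"), each = 42)) # grouping variable stored as a factor

# Absolute deviation of each observation from its own group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the absolute deviations gives the Levene F statistic and p-value
levene_table <- anova(lm(abs_dev ~ g))
levene_F <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
levene_table
```

Applied to wonions$Yield and as.factor(wonions$Locality) instead of the simulated y and g, this calculation should match the F value of \(0.3759\) reported by leveneTest() above.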

1.5 Conducting an independent samples \(t\)-test in R

Example R code for this question is shown below:

t.test(wonions$Yield ~ wonions$Locality, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  wonions$Yield by wonions$Locality
## t = 1.7604, df = 82, p-value = 0.08207
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
##  -2.617571 42.870904
## sample estimates:
## mean in group 1 mean in group 2 
##        129.7636        109.6369

1.5.1

We note that the test statistic equals \(1.7604\), the \(p\)-value equals \(0.08207\), the degrees of freedom equal \(n_1 + n_2 - 2 = 42 + 42 - 2 = 82\), the sample means match those we calculated in 1.3.2, and the \(95\%\) confidence interval is \((-2.617571, 42.870904)\).
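As a check on this output, the pooled two-sample \(t\) statistic, \(p\)-value and confidence interval can be computed directly from the summary statistics in 1.3.1 and 1.3.2, using \(t = (\overline{x}_1 - \overline{x}_2)/(s_p\sqrt{1/n_1 + 1/n_2})\) with pooled variance \(s_p^2 = ((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2)/(n_1 + n_2 - 2)\). The sketch below uses only base R; small differences in the last decimal places come from rounding of the inputs.

```r
# Summary statistics from 1.3.1 and 1.3.2 (Locality 1 = Purnong Landing, 2 = Virginia)
n1 <- 42; n2 <- 42
xbar1 <- 129.7636; xbar2 <- 109.6369
s1 <- 52.80989; s2 <- 51.97360

# Pooled standard deviation under the equal-variance assumption
dof <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / dof)

# Standard error of the difference in means, test statistic and two-sided p-value
se <- sp * sqrt(1 / n1 + 1 / n2)
t_stat <- (xbar1 - xbar2) / se
p_value <- 2 * pt(-abs(t_stat), dof)

# 95% confidence interval: estimate +/- critical t value * standard error
ci <- (xbar1 - xbar2) + c(-1, 1) * qt(0.975, dof) * se

round(c(t = t_stat, df = dof, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The interval contains \(0\) exactly when the two-sided \(p\)-value exceeds \(0.05\), so the two results tell the same story.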

1.5.2

The \(95\%\) confidence interval is \((-2.617571, 42.870904)\). This is an interval for the difference between the true (population) average yield of White Imperial Spanish onions grown in Purnong Landing and that of those grown in Virginia (Purnong Landing minus Virginia). We are therefore \(95\%\) confident that the true difference in mean yield is between approximately \(-2.618\) and \(42.871\) grams per plant. Because this interval contains \(0\), it is consistent with not rejecting \(H_0\) at the \(5\%\) significance level.

1.5.3

We have carried out a thorough analysis to assess whether there is a statistically significant difference in the true (population) average yield of White Imperial Spanish onions grown in Purnong Landing and in Virginia. Following an independent samples \(t\)-test (\(t = 1.7604\), \(df = 82\), \(p = 0.08207 > 0.05\)), we do not reject \(H_0\): we do not have sufficient evidence, at the \(5\%\) significance level, of a difference in the average yields of onions grown in these two locations.

2 Paired \(t\)-tests

2.1

freshman <- read.table(file = "freshman-15.txt", header = TRUE) # file assumed to be in the working directory

2.2 Defining Hypotheses

\(\mu_D\) denotes the true (population) mean of the paired differences in weight (in pounds), where each difference is the terminal weight minus the initial weight.

The null hypothesis is \(H_0: \mu_D = 0\) while the alternative hypothesis is \(H_1: \mu_D \neq 0\).

2.2.1

The dependent variable here is weight, which is measured at the start and end of the semester. The independent variable is time, since weight is measured at two time points (initial and terminal).

2.3 Initial Exploratory Analysis

Example R code for this question is shown below.

# box plots
boxplot(freshman$Initial.Weight, freshman$Terminal.Weight,
        main = "Boxplots of Freshman Initial and Terminal Weights (lbs)", 
        col = c("skyblue", "red"), ylab = "Weight (lbs)", names = c("Initial", "Terminal"))

# histograms (weight is on the x-axis, so it is labelled with xlab)
par(mfrow = c(1, 2))
hist(freshman$Initial.Weight, 
     main = "Histogram of Freshman Initial Weight (lbs)", col = "skyblue", xlab = "Weight (lbs)")
hist(freshman$Terminal.Weight, 
     main = "Histogram of Freshman Terminal Weight (lbs)", col = "red", xlab = "Weight (lbs)")
par(mfrow = c(1, 1)) # reset to a single plot

2.3.1

From the histograms, the distributions of weights appear relatively similar at the start and end of the semester.

2.3.2

Example R code for this question is shown below:

summary(freshman)

##     Subject      Initial.Weight  Terminal.Weight
##  Min.   : 1.00   Min.   : 94.0   Min.   : 96.0  
##  1st Qu.:17.75   1st Qu.:118.8   1st Qu.:119.8  
##  Median :34.50   Median :134.0   Median :134.0  
##  Mean   :34.50   Mean   :136.1   Mean   :138.0  
##  3rd Qu.:51.25   3rd Qu.:150.0   3rd Qu.:151.0  
##  Max.   :68.00   Max.   :220.0   Max.   :224.0

The initial weight sample mean is \(136.1\) pounds, the terminal weight sample mean is \(138\) pounds, and the sample mean of the paired differences (terminal minus initial) is \(1.912\) pounds.

sd(freshman$Initial.Weight)

## [1] 24.37108

The sample standard deviation for the initial weights is \(24.37108\).

sd(freshman$Terminal.Weight)

## [1] 24.61009

The sample standard deviation for the terminal weights is \(24.61009\).

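# Paired.Difference is not created until 2.4, so compute the differences directly here
sd(freshman$Terminal.Weight - freshman$Initial.Weight)

## [1] 2.128241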

The sample standard deviation for the paired difference is \(2.128241\).

We note that the initial sample mean and the terminal sample mean are quite similar, as are the initial sample standard deviation and the terminal sample standard deviation.

At the terminal time point, both the sample mean and the sample standard deviation are slightly larger than at the initial time point.

2.4 Test Assumption Checks

Example R code for this question is shown below:

freshman$Paired.Difference <- freshman$Terminal.Weight - freshman$Initial.Weight # terminal minus initial (lbs)

2.4.1

Example R code for this question is shown below:

hist(freshman$Paired.Difference, 
     main = "Histogram of Paired Differences in Weights \n before and after semester \n (with normal density curve overlaid)", 
     xlab = "Weight difference (lbs)", col = "skyblue", freq = FALSE) # density scale so the curve fits
# overlay a normal density with the sample mean and SD of the differences
curve(dnorm(x, mean = mean(freshman$Paired.Difference), 
            sd = sd(freshman$Paired.Difference)), 
      add = TRUE, lwd = 2, col = "red")

The histogram looks approximately normal, although there are fewer large values than we would expect if the data came from a normal distribution.

qqnorm(freshman$Paired.Difference, 
       main =  "Normal Q-Q plot \n for Freshman weight paired differences data", 
       pch = 19)
qqline(freshman$Paired.Difference)

This Normal Q-Q plot could be acceptable, although it does exhibit some peculiar ‘grouping’ of points into horizontal bands, possibly because the differences take only a limited set of values (the weights appear to be recorded in whole pounds).

2.4.2 Normality Assumption Check

Example R code for this question is shown below:

shapiro.test(freshman$Paired.Difference)

## 
##  Shapiro-Wilk normality test
## 
## data:  freshman$Paired.Difference
## W = 0.95794, p-value = 0.02199

The \(p\)-value computed by the test is \(0.02199\). As this is smaller than our significance level of \(\alpha = 0.05\), we reject the null hypothesis that the data follow a normal distribution, and conclude, based on this test, that the paired difference data are non-normal.

However, because our sample size of \(n = 68\) is greater than \(30\), the Central Limit Theorem means we can still treat the distribution of the sample mean as (approximately) normal.

2.5 Conducting a paired \(t\)-test in R

Example R code for this question is shown below:

t.test(freshman$Initial.Weight, freshman$Terminal.Weight, paired = TRUE)

## 
##  Paired t-test
## 
## data:  freshman$Initial.Weight and freshman$Terminal.Weight
## t = -7.4074, df = 67, p-value = 2.813e-10
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.426909 -1.396621
## sample estimates:
## mean difference 
##       -1.911765

2.5.1

We note that the test statistic equals \(-7.4074\), the degrees of freedom equal \(n - 1 = 67\), \(p < 0.001\), the mean of the differences equals \(-1.911765\), and the \(95\%\) confidence interval is \((-2.426909, -1.396621)\). Because t.test() computes the differences as the first argument minus the second (here Initial.Weight minus Terminal.Weight), the signs of the test statistic, mean difference and confidence limits are reversed if you supply Terminal.Weight first.
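The paired \(t\)-test is built from the mean \(\overline{d}\) and standard deviation \(s_d\) of the differences: \(t = \overline{d}/(s_d/\sqrt{n})\) with \(n - 1\) degrees of freedom. As a check, the base R sketch below plugs in the values from 2.3.2 and 2.5, taking the differences as Initial.Weight minus Terminal.Weight (so \(\overline{d}\) is negative); the results should agree with the output above up to rounding.

```r
# Summary of the paired differences (Initial - Terminal, as in the t.test() call above)
n <- 68
dbar <- -1.911765 # mean of the differences
s_d <- 2.128241   # standard deviation of the differences (from 2.3.2)

# Standard error, test statistic and two-sided p-value with n - 1 degrees of freedom
dof <- n - 1
se <- s_d / sqrt(n)
t_stat <- dbar / se
p_value <- 2 * pt(-abs(t_stat), dof)

# 95% confidence interval for the true mean difference
ci <- dbar + c(-1, 1) * qt(0.975, dof) * se

c(t = t_stat, df = dof, p = p_value, lower = ci[1], upper = ci[2])
```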

2.5.2

The \(95\%\) confidence interval of \((-2.426909, -1.396621)\) is an interval for the true (population) mean difference in weight (in pounds), initial minus terminal. We are therefore \(95\%\) confident that, on average, a student's weight at the start of the semester is between about \(1.40\) and \(2.43\) pounds lower than at the end of the semester.

2.5.3

We have carried out a thorough analysis to assess whether there is a statistically significant difference in the true average weight of freshman students before and after their first semester of college. Following a paired \(t\)-test (\(t = -7.4074\), \(df = 67\), \(p < 0.001\)), we reject \(H_0\) and conclude that such a difference does exist. Specifically, we estimate that the mean weight gain over the semester is between roughly \(1.40\) and \(2.43\) pounds (\(95\%\) confidence interval).

2.6

Example R code for this question is shown below:

t.test(freshman$Paired.Difference)

## 
##  One Sample t-test
## 
## data:  freshman$Paired.Difference
## t = 7.4074, df = 67, p-value = 2.813e-10
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.396621 2.426909
## sample estimates:
## mean of x 
##  1.911765

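Note that these results are the same as those found for the paired \(t\)-test in 2.5, apart from the sign: here the differences are Terminal.Weight minus Initial.Weight, whereas in 2.5 they were Initial.Weight minus Terminal.Weight. In other words, a paired \(t\)-test is equivalent to a one-sample \(t\)-test of \(H_0: \mu_D = 0\) carried out on the paired differences.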
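To see this equivalence directly, the sketch below runs both versions of the test on simulated before-and-after weights. The data are made up (an assumed average gain of about 2 pounds), not the DASL data; the point is only that the two calls return identical results.

```r
# Simulated before/after weights in pounds (illustrative values only, not the DASL data)
set.seed(15)
before <- round(rnorm(68, mean = 136, sd = 24))
after <- before + round(rnorm(68, mean = 2, sd = 2))

# Paired t-test (differences taken as after - before) vs one-sample t-test on the differences
paired_test <- t.test(after, before, paired = TRUE)
one_sample_test <- t.test(after - before)

# The test statistic, p-value and confidence interval coincide
rbind(paired = c(t = unname(paired_test$statistic), p = paired_test$p.value),
      one_sample = c(t = unname(one_sample_test$statistic), p = one_sample_test$p.value))
```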

3 Computing Effect Sizes

Note that for these questions, we have used the effsize R package.

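# install.packages("effsize") # uncomment and run this line if the effsize package is not installed
library(effsize) # Load package
?cohen.d # open the help page for cohen.d()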

3.1

No answer required.

3.2

cohen.d(wonions$Yield, NA, mu = 110)

## 
## Cohen's d (single sample)
## 
## d estimate: 0.1828448 (negligible)
## Reference mu: 110
## 95 percent confidence interval:
##      lower      upper 
## -0.2520877  0.6177774

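Note that the effect size is roughly \(0.1828\). Under the thresholds used by effsize (\(|d| < 0.2\) negligible, \(|d| < 0.5\) small, \(|d| < 0.8\) medium, otherwise large), this is considered negligible, or very small.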
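The single-sample Cohen’s \(d\) is \(d = (\overline{x} - \mu_0)/s\), where \(\overline{x}\) and \(s\) are the mean and standard deviation of all \(84\) yields combined and \(\mu_0 = 110\). As a rough check, the sketch below rebuilds the combined mean and standard deviation from the group summaries in 1.3.2, splitting the total sum of squares into within-group and between-group parts, and should reproduce \(d \approx 0.183\) up to rounding.

```r
# Group summaries from 1.3.1 and 1.3.2
n <- c(42, 42)
xbar <- c(129.7636, 109.6369)
s <- c(52.80989, 51.97360)
mu0 <- 110 # reference value used in cohen.d(..., mu = 110)

# Combined mean and SD of all observations:
# total SS = within-group SS + between-group SS
N <- sum(n)
grand_mean <- sum(n * xbar) / N
total_ss <- sum((n - 1) * s^2) + sum(n * (xbar - grand_mean)^2)
s_all <- sqrt(total_ss / (N - 1))

# Single-sample Cohen's d
d_single <- (grand_mean - mu0) / s_all
d_single
```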

3.3

Example R code for this question is shown below:

cohen.d(wonions$Yield, as.factor(wonions$Locality), mu = 0)

## 
## Cohen's d
## 
## d estimate: 0.384145 (small)
## 95 percent confidence interval:
##       lower       upper 
## -0.05394536  0.82223531

The effect size for the independent sample \(t\)-test conducted in 1.5 is \(0.384145\), which is considered to be a small effect.
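Because the independent samples statistic is \(t = (\overline{x}_1 - \overline{x}_2)/(s_p\sqrt{1/n_1 + 1/n_2})\) and Cohen’s \(d\) divides the same mean difference by the pooled standard deviation \(s_p\), the two are linked by \(d = t\sqrt{1/n_1 + 1/n_2}\). The short sketch below applies this to the \(t\) value from 1.5; it assumes the pooled-SD version of \(d\), which the cohen.d() output above is consistent with.

```r
# Link between the pooled t statistic and Cohen's d (equal variances assumed)
t_stat <- 1.7604 # from the independent samples t-test output in 1.5
n1 <- 42; n2 <- 42

d_from_t <- t_stat * sqrt(1 / n1 + 1 / n2)
d_from_t
```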

3.4

Example R code for this question is shown below:

cohen.d(freshman$Initial.Weight, freshman$Terminal.Weight, paired = TRUE, within = FALSE)

## 
## Cohen's d
## 
## d estimate: -0.8982837 (large)
## 95 percent confidence interval:
##      lower      upper 
## -1.1824242 -0.6141433

The effect size for the paired \(t\)-test conducted in 2.5 is \(-0.8982837\), which is considered to be a large effect. We are interested in the magnitude of the effect size; the negative sign simply reflects that the differences were taken as initial minus terminal weight.
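Numerically, the value above matches (up to rounding) the mean of the paired differences divided by their standard deviation, often written \(d_z\), which also equals \(t/\sqrt{n}\). The sketch below checks both forms using the outputs from 2.3.2 and 2.5. This is a numerical observation about these results, not a statement of how effsize implements the within = FALSE option.

```r
# Paired-design effect size from the summary of the differences (Initial - Terminal)
n <- 68
dbar <- -1.911765 # mean of the differences (from 2.5)
s_d <- 2.128241   # SD of the differences (from 2.3.2)
t_stat <- -7.4074 # paired t statistic (from 2.5)

d_z <- dbar / s_d            # mean difference divided by SD of the differences
d_from_t <- t_stat / sqrt(n) # equivalent form: t / sqrt(n)
c(d_z = d_z, d_from_t = d_from_t)
```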

That’s everything! If there were any parts you were unsure about, take a look back over the relevant sections of the Topic 6 material.

References

Bowman, A. W., and A. Azzalini. 2021. R Package sm: Nonparametric Smoothing Methods (Version 2.2-5.7). University of Glasgow, UK; Università di Padova, Italia. http://www.stats.gla.ac.uk/~adrian/sm/.

DASL. 2021. “Freshman 15 [.txt File].” 2021. https://dasl.datadescription.com/datafile/freshman-15/.

These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.

STM1001: Computer Lab 7 Solutions
