Topic 6: \(t\)-tests for two-sample hypothesis testing


In Topic 6 we extended our understanding of \(t\)-tests. We will now practice carrying out these new tests, using real data.


1 The independent samples \(t\)-test in jamovi

🏡 For this question, we will once again be assessing the wonions data set on White Imperial Spanish onion plants, from the sm R package.

In Computer Lab 6, we assessed this data set of size \(n=84\) by focusing on two variables:

  • the Yield (in grams per plant), and
  • the Density of planting (in plants per square metre).

However, the wonions data set also contains a third, integer variable, Locality, which denotes whether the onion planting occurred in Purnong Landing (1) or in Virginia (2). While we ignored the planting location in our previous analysis, it may be beneficial to reassess the data to see whether the yields of onions planted in the two locations differ significantly. Therefore, in this computer lab, we will assess the variables:

  • Yield and
  • Locality

To do this, we can carry out an independent samples \(t\)-test.
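
For reference, one common way of writing the Student’s version of the test statistic is given below. Here \(n_1\) and \(n_2\) are the two sample sizes, and the symbols \(\overline{x}_i\), \(s_i\) and \(s_p\) (sample means, sample standard deviations and pooled standard deviation) are notation assumed for this sketch; the Topic 6 readings may use different symbols.

\[t = \frac{\overline{x}_1 - \overline{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},\]

with \(n_1 + n_2 - 2\) degrees of freedom under the null hypothesis.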

1.1

🏡 Download the file wonions.csv from the LMS, and save it in a relevant location on your PC.

Once you have done so, import the wonions.csv file in jamovi. For revision on how to do this, see Computer Lab 1.

1.2

🏡 Suppose that we want to check if there is a significant difference in the average yield between White Imperial Spanish (WIS) onions planted in Purnong Landing and Virginia. Let \(\mu_1\) denote the average yield of WIS onions planted in Purnong Landing, and let \(\mu_2\) denote the average yield of WIS onions planted in Virginia.

Using this notation, define the null and alternative hypotheses for our test. 💬

Hint: There are actually two ways in which we can write the hypotheses. Can you think of both? Check the Topic 6 readings if you are unsure.

1.3

🏡 Define the dependent and independent variables for our test. 💬

1.4

💻 Create a table of descriptive statistics, as well as descriptive plots, for the Yield variable, split by Locality. In your exploratory analysis, include the following:

  • Default descriptive statistics, as well as skewness and the Shapiro-Wilk test
  • Histograms with densities overlaid
  • Boxplots
  • Q-Q plots
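
For anyone curious about what the "split by" option is doing, the grouped summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list holds made-up (locality, yield) pairs rather than values from wonions.csv, and `describe_by_group` is a hypothetical helper name, not part of jamovi.

```python
from statistics import mean, median, stdev

# Hypothetical (locality, yield) records, NOT values from wonions.csv.
records = [
    (1, 120.0), (1, 95.5), (1, 150.2), (1, 110.3),
    (2, 80.1), (2, 101.7), (2, 90.4), (2, 70.8),
]

def describe_by_group(rows):
    """Return n, mean, median and sample SD of the values in each group."""
    groups = {}
    for group, value in rows:
        groups.setdefault(group, []).append(value)
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "sd": stdev(values),  # sample SD (divides by n - 1)
        }
        for group, values in groups.items()
    }

summary = describe_by_group(records)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Each row of the printed summary corresponds to one column of the jamovi descriptives table when Yield is split by Locality.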

In the questions that follow, use the subscripts 1 (for Purnong Landing) and 2 (for Virginia) in your notation to differentiate your answers (e.g. use \(n_1\) for the Purnong Landing sample size and \(n_2\) for the Virginia sample size).

1.5

🏡 What is the sample size for each group? 💬

1.6

🏡 What are the mean and standard deviation for each group? 💬

1.7

🏡 Comment on the differences and similarities in the histograms and boxplots you have produced. 💬

1.8

🏡 Just as for the one-sample \(t\)-test, we need to check that the data are numeric, that observations are independent, and that the sample mean \(\overline{X}\) is normally distributed.

We know that the data are numeric, and that the observations are independent. Therefore, all we need to do is check for normality, which we can do using the Shapiro-Wilk test. Remember, though, that we need to check for normality in each group separately.

Based on the Shapiro-Wilk tests, do you have any concerns about the normality assumption? 💬

1.9

💻 Carry out the independent samples \(t\)-test in jamovi, and be sure to include the following options:

  • Student’s and Welch’s tests
  • Mean difference and 95% confidence interval for the mean difference
  • Effect size
  • Descriptives
  • Descriptive plots
  • Homogeneity test

1.10

🏡 The independent samples \(t\)-test also has a fourth assumption, namely that the variances of the two groups are equal (or homogeneous), which we should check now.

Use Levene’s test for homogeneity of variances to help determine if the equal variances assumption has been met.

Based on this test, what is your conclusion? 💬

Based on your conclusion, which row of the independent samples \(t\)-test output should we read from (Student’s \(t\) or Welch’s \(t\))? 💬
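
To see why the two rows of output can differ, the sketch below computes both test statistics from summary statistics alone. It is a minimal illustration, not jamovi’s own code: the function names `student_t` and `welch_t` are hypothetical, and the group summaries are made-up numbers rather than the wonions results. Student’s version pools the two variances, while Welch’s version keeps them separate and uses the Welch-Satterthwaite approximation for the degrees of freedom.

```python
import math

def student_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t: pools the two sample variances (assumes equal variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t: keeps the variances separate; df from Welch-Satterthwaite."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, df

# Hypothetical summary statistics, NOT taken from the wonions data.
group1 = (12, 50.0, 4.0)   # (n, mean, sd)
group2 = (30, 46.0, 10.0)

print("Student:", student_t(*group1, *group2))
print("Welch:  ", welch_t(*group1, *group2))
```

When the two groups have the same size and the same standard deviation, both functions return the same statistic and degrees of freedom; the rows diverge as the sample sizes and variances become more unequal.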

1.11

🏡 Interpret the output of the independent samples \(t\)-test, and note down the test statistic value, the \(p\)-value, the degrees of freedom, the sample means, and the \(95\%\) confidence interval for the difference.

1.12

🏡 Explain, in your own words, what the \(95\%\) confidence interval tells us. 💬
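
As a reminder of the structure of this interval, a two-sided \(95\%\) confidence interval for \(\mu_1 - \mu_2\) generally takes the form below. Here \(\overline{x}_1\) and \(\overline{x}_2\) are the two sample means, \(\mathrm{SE}\) denotes the standard error of their difference, and \(t^{*}\) is the \(97.5\)th percentile of the relevant \(t\)-distribution; these symbols are notation assumed for this sketch rather than taken from the readings.

\[(\overline{x}_1 - \overline{x}_2) \pm t^{*} \times \mathrm{SE}.\]

The standard error and the degrees of freedom used for \(t^{*}\) depend on whether the Student’s or Welch’s row of the output is being read.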

1.13

🏡 Write a short conclusion summarising this test and your findings. 💬

2 The paired \(t\)-test in jamovi

🏡 For this question, we will consider data collected by Cornell Professor of Nutrition David Levitsky, on students’ weight gains over their first year of college (DASL 2021). A random sample of \(68\) students from varying backgrounds was selected, and their weights (in pounds) were measured at the start of semester, and 12 weeks later, at the end of semester. These data are available in the freshman-15.csv file on the LMS.

Download the file freshman-15.csv from the LMS, and save it in a relevant location on your PC.

Once you have done so, import the freshman-15.csv file in jamovi. For revision on how to do this, see Computer Lab 1.

2.1

🏡 Suppose that we would like to know whether the average difference in the weights of students before and after a semester of college is statistically significant. To determine this, we can carry out a paired \(t\)-test. Let \(\mu_D\) denote the true mean difference in before and after weights (in pounds).

Using this notation, define the null and alternative hypotheses for this paired \(t\)-test. 💬

Hint: Check the Topic 6 readings if you are unsure.

2.2

🏡 What are the dependent and independent variables for this test? 💬

2.3

💻 As a first step to our analysis, create a table of descriptive statistics, as well as descriptive plots, for the Initial.Weight and Terminal.Weight variables. In your exploratory analysis, include the following:

  • Default descriptive statistics
  • Histograms

Remember, with a paired \(t\)-test, our variable of interest is not the before and after weights themselves, but rather the paired differences. Therefore, to check the normality assumption, we will shortly use the Shapiro-Wilk test and Q-Q plot provided by the Paired \(t\)-test analysis in jamovi.
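
The idea that the paired \(t\)-test works on the differences can be sketched as follows. This is an illustration under stated assumptions: the weights are made-up numbers rather than values from freshman-15.csv, `paired_t` is a hypothetical helper name, and the differences are taken as after minus before (reversing the order only flips the sign of the mean difference and of \(t\)).

```python
import math
from statistics import mean, stdev

# Hypothetical weights in pounds, NOT values from freshman-15.csv.
initial = [150.0, 160.0, 170.0, 180.0]
terminal = [152.0, 163.0, 171.0, 184.0]

def paired_t(before, after):
    """Paired t statistic computed on the differences after - before."""
    if len(before) != len(after):
        raise ValueError("each 'before' value needs a matching 'after' value")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean_diff / se, n - 1, mean_diff

t_stat, df, mean_diff = paired_t(initial, terminal)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
```

Only the list of differences enters the calculation, which is why the normality check is applied to the differences and not to the two weight variables separately.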

2.4

🏡 By comparing the histograms for the initial and final weights, what do you observe? 💬

2.5

🏡 What are the sample means and standard deviations of the initial and final weights? Comment on your findings. 💬

2.6

💻 Carry out the paired \(t\)-test in jamovi, and include the following options:

  • Mean difference and 95% confidence interval
  • Effect size
  • Descriptives
  • Descriptive plots
  • Normality test
  • Q-Q Plot

2.7

🏡 We can see that the data are numeric, and we are told that the observations are independent. Therefore, all that remains to confirm for our test assumptions is that the sample mean of the paired differences, \(\overline{D}\), is normally distributed.

Based on the Shapiro-Wilk test result, your Q-Q plot, and sample size, what do you conclude regarding the normality assumption? 💬

2.8

🏡 For the remainder of this question, regardless of your result in 2.7, we will proceed under the assumption that all the assumptions for the paired \(t\)-test have been met.

Interpret the output of the test, and note down the test statistic, the degrees of freedom, the \(p\)-value, the mean of the differences, and the \(95\%\) confidence interval.

2.9

🏡 Explain, in your own words, what the \(95\%\) confidence interval tells us. Based on this confidence interval, would you reject your null hypothesis? 💬

2.10

🏡 Write a short conclusion summarising this test and your findings.

2.11

🏡 Extension: Consider the R output below, which is the result of a one-sample \(t\)-test for the freshman data, testing whether the mean of the paired differences is different from 0. What are your findings? Do you notice anything interesting about the \(t\)-test output?

## 
##  One Sample t-test
## 
## data:  freshman$Paired.Difference
## t = 7.4074, df = 67, p-value = 2.813e-10
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.396621 2.426909
## sample estimates:
## mean of x 
##  1.911765
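
The numbers in the R output above are linked to one another, and a short Python sketch can make those links visible. It uses only the values printed in the output; the variable names are chosen here for illustration. Since the null value is 0, the test statistic is the sample mean divided by its standard error, and the confidence interval is the sample mean plus or minus a critical value times that standard error.

```python
# Values copied from the R output above.
t_stat = 7.4074
df = 67
mean_diff = 1.911765
ci_lower, ci_upper = 1.396621, 2.426909

midpoint = (ci_lower + ci_upper) / 2       # centre of the interval
margin = (ci_upper - ci_lower) / 2         # half-width of the interval
std_error = mean_diff / t_stat             # since t = mean / SE when mu_0 = 0
t_critical = margin / std_error            # multiplier used to build the CI

print(f"midpoint   = {midpoint:.6f}")
print(f"margin     = {margin:.6f}")
print(f"std. error = {std_error:.4f}")
print(f"t critical = {t_critical:.3f}")
```

The midpoint of the interval coincides with the sample mean, and the recovered multiplier is the critical value of a \(t\)-distribution with 67 degrees of freedom.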

3 Effect size for a one-sample \(t\)-test

🏡 In this computer lab, we have learnt how to produce effect sizes in jamovi, an additional step that we did not learn about in Computer Lab 6. Effect sizes are covered in Section 3 of the Topic 6 readings.

Suppose that we are just assessing the Yield variable of the wonions data set, and are not considering Locality. Carry out the following one-sample \(t\)-test in jamovi:

\[H_0: \mu = 110 \text{ versus } H_1: \mu \neq 110,\]

where \(\mu\) denotes the population average yield (in grams) of White Imperial Spanish onions.

What is the estimated effect size for this test, and what is its magnitude? 💬
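
As a rough guide to what the effect size measures, the sketch below computes Cohen’s \(d\) for a one-sample test as the distance between the sample mean and the null value, in units of the sample standard deviation. The function names are hypothetical, the inputs are made-up numbers rather than the wonions results, and the cut-offs of 0.2, 0.5 and 0.8 are Cohen’s conventional values, which are assumed here; use the thresholds given in the Topic 6 readings if they differ.

```python
def cohens_d_one_sample(sample_mean, sample_sd, mu_0):
    """Cohen's d for a one-sample test: distance from mu_0 in SD units."""
    return (sample_mean - mu_0) / sample_sd

def magnitude(d):
    """Label |d| using Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical summary values, NOT the wonions results.
d = cohens_d_one_sample(sample_mean=115.0, sample_sd=10.0, mu_0=110.0)
print(d, magnitude(d))
```

The sign of \(d\) shows the direction of the difference from the null value, while the magnitude label depends only on its absolute value.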


Well done, that’s everything for today! If you still have time, you may like to have a go at Quiz 7, which is based on the Topic 7 readings.

Before you finish up, remember to save your work (e.g. your jamovi and Word files) somewhere safe (e.g. OneDrive) so that you can access it at a later time.


References

DASL. 2021. “Freshman 15 [.txt File].” https://dasl.datadescription.com/datafile/freshman-15/.


These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.