Comparing Two Means with t-tests

Author

Dan Isbell

Getting Started

First, load the tidyverse package and read in the data.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
d <- read_csv("babbel_pre_post.csv")
Rows: 54 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): StudyID, Sex
dbl (6): OPIc_pre, OPIc_post, Vocab_score_Pre, Vocab_score_Post, Grammar_sco...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Compare Two Groups

Descriptive Statistics

We will compare the Vocabulary test scores for Male and Female test takers on the pretest. We will use some handy tidyverse functions to do this efficiently:

d %>% group_by(Sex) %>%
  summarise(n = n(),
            mean = mean(Vocab_score_Pre),
            sd = sd(Vocab_score_Pre),
            min = min(Vocab_score_Pre),
            max = max(Vocab_score_Pre))
# A tibble: 2 × 6
  Sex       n  mean    sd   min   max
  <chr> <int> <dbl> <dbl> <dbl> <dbl>
1 F        37  11.7  9.87     1    35
2 M        17  15.5 10.2      1    34

The groups aren’t the same size (37 F vs. 17 M), but they show similar variation in scores (SD ≈ 10). The M group has the higher mean (15.5 vs. 11.7).

Two-Sample t-test

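We will use a two-sample t-test to determine whether the observed difference is statistically significant, in other words, whether we can rule out the idea that there is actually no difference between groups in the population at large. Note that t.test() runs Welch’s version of the test by default, which does not assume equal variances in the two groups.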

t.test(Vocab_score_Pre ~ Sex, data = d)

    Welch Two Sample t-test

data:  Vocab_score_Pre by Sex
t = -1.2711, df = 30.158, p-value = 0.2134
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
 -9.820513  2.284742
sample estimates:
mean in group F mean in group M 
       11.70270        15.47059 

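In this case, even though we see a difference of almost 4 points, the test is not statistically significant (p = .213), and the 95% confidence interval for the difference (-9.82 to 2.28) includes 0. The large variability (standard deviations of about 10) and the small sample size (especially for the M group) mean we don’t have enough information to confidently rule out a null difference in the population at large.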
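To see where the t statistic and degrees of freedom come from, here is a minimal sketch that recomputes them by hand from the group summaries, using the standard Welch formulas. It is an illustration, not a replacement for t.test(): the SDs are the rounded values from the descriptive table, so the results should land close to, but not exactly on, the output above. All variable names are made up for this example.

```r
# Summary statistics copied from the output above.
# Note: the SDs are rounded, so results will differ slightly from t.test().
n_f <- 37; mean_f <- 11.70270; sd_f <- 9.87
n_m <- 17; mean_m <- 15.47059; sd_m <- 10.2

# Each group's contribution to the squared standard error of the difference
se2_f <- sd_f^2 / n_f
se2_m <- sd_m^2 / n_m

# Welch t statistic: difference in means divided by its standard error
t_welch <- (mean_f - mean_m) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (se2_f + se2_m)^2 /
  (se2_f^2 / (n_f - 1) + se2_m^2 / (n_m - 1))

# Two-sided p-value from the t distribution
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 3)
```

Notice that the smaller M group contributes far more to the standard error than the F group does, which is why the small sample size matters so much here.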

Compare Means Within a Group

Now we’ll look at the whole group of test takers and see whether their Vocabulary scores increased from pretest to posttest.

Descriptive Statistics

Calculate descriptive statistics as follows:

d %>%
  summarise(n = n(),
            mean_Pre = mean(Vocab_score_Pre),
            sd_Pre = sd(Vocab_score_Pre),
            min_Pre = min(Vocab_score_Pre),
            max_Pre = max(Vocab_score_Pre),
            mean_Post = mean(Vocab_score_Post),
            sd_Post = sd(Vocab_score_Post),
            min_Post = min(Vocab_score_Post),
            max_Post = max(Vocab_score_Post))
# A tibble: 1 × 9
      n mean_Pre sd_Pre min_Pre max_Pre mean_Post sd_Post min_Post max_Post
  <int>    <dbl>  <dbl>   <dbl>   <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1    54     12.9   10.0       1      35      19.6    11.8        1       43

It looks like the mean score increased by almost 7 points (from 12.9 to 19.6).

Paired t-test

Because each posttest score is linked to a specific pretest score (the same person), we will use a paired t-test to investigate the difference in means.

t.test(d$Vocab_score_Post, d$Vocab_score_Pre, paired = TRUE)

    Paired t-test

data:  d$Vocab_score_Post and d$Vocab_score_Pre
t = 7.4383, df = 53, p-value = 8.892e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 4.923095 8.558387
sample estimates:
mean difference 
       6.740741 

This difference is statistically significant (t(53) = 7.44, p < .001), so we can reject the null hypothesis of no change from pretest to posttest. The 95% confidence interval suggests an average gain of roughly 4.9 to 8.6 points.
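It can help to know that a paired t-test is the same thing as a one-sample t-test on the difference scores (post minus pre). The sketch below shows this with a small set of made-up scores (hypothetical values, not the Babbel data), so it runs without the CSV file.

```r
# Hypothetical toy scores for 8 learners (NOT the Babbel data)
pre  <- c(5, 12, 9, 20, 14, 7, 16, 11)
post <- c(9, 15, 14, 22, 19, 8, 21, 18)

# Paired t-test, as in the tutorial
paired_result <- t.test(post, pre, paired = TRUE)

# One-sample t-test on the gain scores against a mean of 0
gain <- post - pre
gain_result <- t.test(gain, mu = 0)

# The same t statistic computed by hand: mean gain / standard error of the gain
t_manual <- mean(gain) / (sd(gain) / sqrt(length(gain)))

# All three t values should agree
c(paired = unname(paired_result$statistic),
  one_sample = unname(gain_result$statistic),
  manual = t_manual)
```

Because the test only looks at the gain scores, what matters is how consistent the gains are across people, not how spread out the raw pretest and posttest scores are.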

Extra Practice

Try running between-groups (M and F) and within-group (Post-Pre) analyses for the Grammar scores.

Using an Online Calculator

There are many online calculators for running t-tests. Check out one of these, but make sure you choose the right kind of t-test!

https://langtest.jp/shiny/two/

https://langtest.jp/shiny/paired/

https://www.graphpad.com/quickcalcs/ttest1/
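Why does the choice of t-test matter? As a rough illustration, the sketch below analyzes the same made-up pre/post scores (hypothetical values, not the Babbel data) two ways: correctly as paired data, and incorrectly as if the two columns came from independent groups.

```r
# Hypothetical toy scores for the same 8 learners at two time points
toy_pre  <- c(5, 12, 9, 20, 14, 7, 16, 11)
toy_post <- c(9, 15, 14, 22, 19, 8, 21, 18)

# Correct choice: the scores are linked person by person
right_test <- t.test(toy_post, toy_pre, paired = TRUE)

# Wrong choice: treats the two columns as unrelated groups (Welch two-sample test)
wrong_test <- t.test(toy_post, toy_pre)

# Compare the p-values from the two analyses
c(paired = right_test$p.value, two_sample = wrong_test$p.value)
```

With these toy numbers the paired test gives a much smaller p-value, because it removes the person-to-person differences in overall score level that the two-sample test treats as noise.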