First, load the tidyverse package and read in the data.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
d <- read_csv("babbel_pre_post.csv")
Rows: 54 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): StudyID, Sex
dbl (6): OPIc_pre, OPIc_post, Vocab_score_Pre, Vocab_score_Post, Grammar_sco...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Compare Two Groups
Descriptive Statistics
We will compare the Vocabulary test scores for Male and Female test takers on the pretest. We will use some handy tidyverse functions to do this efficiently:
d %>%
  group_by(Sex) %>%
  summarise(n = n(), mean = mean(Vocab_score_Pre), sd = sd(Vocab_score_Pre),
            min = min(Vocab_score_Pre), max = max(Vocab_score_Pre))
# A tibble: 2 × 6
  Sex       n  mean    sd   min   max
  <chr> <int> <dbl> <dbl> <dbl> <dbl>
1 F        37  11.7  9.87     1    35
2 M        17  15.5 10.2      1    34
The groups aren’t the same size, but they show similar variation in scores (SD ≈ 10 in both). The M group has the higher mean score (15.5 vs. 11.7).
Two-Sample t-test
We will use a two-sample t-test to determine whether the observed difference is statistically significant; in other words, whether we can rule out the possibility that there is no difference between the groups in the population at large. By default, R’s t.test() runs Welch’s version of the test, which does not assume equal variances in the two groups.
t.test(Vocab_score_Pre ~ Sex, data = d)
Welch Two Sample t-test
data: Vocab_score_Pre by Sex
t = -1.2711, df = 30.158, p-value = 0.2134
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
-9.820513 2.284742
sample estimates:
mean in group F mean in group M
11.70270 15.47059
In this case, even though we see a difference of almost 4 points, the large variability (SD of about 10 in both groups) and the small sample size (especially for the M group) mean that we don’t have enough evidence (p = 0.21) to confidently rule out a difference of zero in the population at large.
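To see how the group sizes and standard deviations feed into this result, the Welch statistic can be rebuilt by hand from the descriptive statistics. The sketch below is an illustration rather than part of the original analysis: it uses the rounded SDs from the summary table, so its results should only approximately match the t.test() output, and the variable names are made up for this example.

```r
# Summary statistics copied from the output above (the SDs are rounded).
n_f <- 37; mean_f <- 11.70270; sd_f <- 9.87
n_m <- 17; mean_m <- 15.47059; sd_m <- 10.2

# Squared standard error contributed by each group: s^2 / n.
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

# Welch t statistic: difference in means divided by the combined standard error.
se_diff <- sqrt(v_f + v_m)
t_welch <- (mean_f - mean_m) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Two-sided p-value from the t distribution.
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 3)
```

Note that the M group, with only 17 people, contributes more than twice as much to the squared standard error as the F group, which is why its small size matters so much here.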
Compare Means Within a Group
Now we’ll look at the whole group of test takers and see whether their Vocabulary scores increased from pretest to posttest.
The mean score increased by almost 7 points from pretest to posttest.
Paired t-test
Because each posttest score is linked to a specific pretest score (from the same person), we will use a paired t-test to investigate the difference in means.
t.test(d$Vocab_score_Post, d$Vocab_score_Pre, paired = TRUE)
Paired t-test
data: d$Vocab_score_Post and d$Vocab_score_Pre
t = 7.4383, df = 53, p-value = 8.892e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
4.923095 8.558387
sample estimates:
mean difference
6.740741
This difference is clearly significant (p < .001) and allows us to reject the idea that there was no improvement from pretest to posttest.
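A paired t-test is a one-sample t-test on the post-minus-pre differences, so the reported confidence interval can be rebuilt from the mean difference and the t statistic. This sketch is illustrative only: it works from the rounded values printed in the output above, so small discrepancies are expected, and the variable names are invented for this example.

```r
# Values copied from the paired t-test output above.
mean_diff <- 6.740741
t_stat    <- 7.4383
df_paired <- 53          # n - 1, so there are 54 pre/post pairs

# Standard error of the mean difference, recovered from t = mean_diff / se.
se_diff <- mean_diff / t_stat

# 95% confidence interval: mean difference +/- critical t value * SE.
t_crit <- qt(0.975, df = df_paired)
ci <- mean_diff + c(-1, 1) * t_crit * se_diff

# Two-sided p-value for the observed t statistic.
p_value <- 2 * pt(-abs(t_stat), df = df_paired)

round(ci, 2)
p_value < 0.001
```

Because the whole interval lies well above zero, it tells the same story as the p-value: a mean improvement of zero is not plausible given these data.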
Extra Practice
Try running the between-group (M vs. F) and within-group (posttest vs. pretest) analyses for the Grammar scores.
Doing things by online calculator...
There are many online calculators for running t-tests. Check out one of these, but make sure you choose the right kind of t-test!