T-Tests in R

Independent and Paired Tests with Palmer Penguins & Sleep Data

Author: Abdullah Al Shamim

Published: May 25, 2026

1 What is a t-test?

A t-test is a statistical test that helps us determine if there’s a significant difference between the means of two groups. Think of it as answering questions like “Do these two groups really differ, or could the difference just be due to random chance?”

There are two main types:

  • Independent t-test: Compares means from two separate, unrelated groups (e.g., comparing heights of cats vs. dogs)
  • Paired t-test: Compares means from the same group measured twice, or matched pairs (e.g., measuring blood pressure before and after medication in the same patients)
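
Note

Key assumptions: Both tests assume independent observations (or independent pairs) and approximately normally distributed data. For the paired test, it is the within-pair differences that should be roughly normal. T-tests are fairly robust to violations of normality, especially with larger sample sizes.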
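
A quick, informal way to check the normality assumption is a Shapiro-Wilk test together with a Q-Q plot. The sketch below is only an illustration on R's built-in sleep data using base functions; the variable names are made up for this example. With small samples the test has little power, so treat it as a rough guide rather than a gatekeeper.

```r
# Illustrative sketch on the built-in `sleep` data (10 patients, 2 drugs).
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

# For a paired design, check the within-patient differences, not the raw scores.
differences <- drug_2 - drug_1

# Shapiro-Wilk test: a small p-value suggests a departure from normality.
normality_check <- shapiro.test(differences)
print(normality_check)

# Visual check: points close to the reference line suggest approximate normality.
qqnorm(differences)
qqline(differences)
```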

1.1 Understanding t.test() Arguments

The t.test() function in R has several important arguments:

  • x: A numeric vector of data values (or a formula for the formula method)
  • y: An optional numeric vector of data values (for two-sample tests)
  • alternative: Specifies the alternative hypothesis - “two.sided” (default), “less”, or “greater”
  • mu: The true value of the mean (or difference in means) under the null hypothesis. Default is 0
  • paired: Logical value indicating whether to perform a paired t-test. Default is FALSE
  • var.equal: Logical value indicating whether to assume equal variances. Default is FALSE (uses Welch’s t-test)
  • conf.level: Confidence level for the confidence interval. Default is 0.95 (95%)
  • formula: For formula method, use the format outcome ~ group
  • data: The data frame containing the variables (when using formula method)

For most basic comparisons, you’ll primarily use the formula method for independent groups and the two-vector form with paired = TRUE for paired data, as in the examples below.
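
To make the arguments above concrete, here is a small sketch on R's built-in sleep data. It treats the two groups as independent purely to illustrate the arguments (the data are actually paired, as discussed later), and the object names are hypothetical.

```r
# Default two-sample test: Welch's t-test (var.equal = FALSE).
welch <- t.test(extra ~ group, data = sleep)

# Pooled-variance (Student's) t-test.
student <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# One-sided alternative: is the mean of group 1 less than the mean of group 2?
one_sided <- t.test(extra ~ group, data = sleep, alternative = "less")

# One-sample test against a hypothesised mean of 1 hour, with a 90% interval.
one_sample <- t.test(sleep$extra, mu = 1, conf.level = 0.90)

welch$parameter                          # Welch degrees of freedom (fractional)
student$parameter                        # pooled degrees of freedom: n1 + n2 - 2
one_sided$alternative                    # the alternative that was tested
attr(one_sample$conf.int, "conf.level")  # confidence level of the interval
```

Because the observed difference is negative, the one-sided p-value here is half the two-sided Welch p-value.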


2 Independent T-Test

When to use it: Use an independent t-test when you have two separate groups and want to know if their means differ significantly.

Example scenario: Let’s compare the bill length of Adelie penguins vs. Chinstrap penguins from the Palmer Archipelago. We’ll use the palmerpenguins package, which contains real data collected by Dr. Kristen Gorman.

2.1 Visualizing the Data

library(tidyverse)
library(palmerpenguins)

penguins %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%
  drop_na(bill_length_mm) %>%
  group_by(species) %>%
  mutate(mean_bill = mean(bill_length_mm)) %>%
  ungroup() %>%
  ggplot(aes(x = bill_length_mm, fill = species)) +
  geom_density(alpha = 0.6) +
  geom_vline(aes(xintercept = mean_bill, color = species),
             linewidth = 1, linetype = "dashed") +
  annotate("text", x = 39, y = 0.09,
           label = "Mean", size = 4, fontface = "bold") +
  annotate("text", x = 49, y = 0.09,
           label = "Mean", size = 4, fontface = "bold") +
  labs(title = "Distribution of Penguin Bill Length by Species",
       x = "Bill Length (mm)",
       y = "Density",
       fill = "Species",
       color = "Species") +
  theme_minimal() +
  theme(legend.position = "bottom")

  1. Load the tidyverse, which provides the ggplot2, dplyr, and tidyr functions used throughout.
  2. Load the Palmer Penguins dataset: real measurements from penguins in the Palmer Archipelago, Antarctica.
  3. Keep only the two species we want to compare; drop all others.
  4. Remove rows with missing bill length so the density curves and the t-test use the same complete data.
  5. Compute the per-species mean with group_by() and mutate() so geom_vline() can reference it later.
  6. Map bill_length_mm to the x-axis and species to fill colour: one density curve per species.
  7. geom_density() draws a smooth kernel density estimate; alpha = 0.6 lets both curves show through where they overlap.
  8. geom_vline() draws a dashed vertical line at each species mean; mapping color = species gives each line the same default hue as its density fill.
  9. Hard-coded “Mean” labels placed near each dashed line, so the means are annotated without a legend entry.

2.2 Performing the Test

penguins %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%
  drop_na(bill_length_mm) %>%
  t.test(bill_length_mm ~ species, data = .)

  1. Same filter as the plot: keeps only the two species of interest.
  2. Drop missing values explicitly. t.test() would omit them on its own, but doing it here keeps the sample identical to the one plotted.
  3. Formula method: outcome ~ group splits bill_length_mm by species; data = . passes the filtered tibble in from the pipe. The default is Welch’s t-test (var.equal = FALSE).

    Welch Two Sample t-test

data:  bill_length_mm by species
t = -21.865, df = 106.97, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
95 percent confidence interval:
 -10.952948  -9.131917
sample estimates:
   mean in group Adelie mean in group Chinstrap 
               38.79139                48.83382 

2.3 Interpreting the Results

The output shows several key pieces of information:

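  • t-statistic: The difference between the sample means, measured in standard errors. Larger absolute values indicate stronger evidence against the null hypothesis of equal means.
  • p-value: The probability of seeing a difference at least this large if the true means were equal. If it is below 0.05, we reject the null hypothesis and call the difference statistically significant.
  • 95% confidence interval: A range of plausible values for the true difference in means, built so that 95% of such intervals contain the true difference.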
  • Sample means: The average bill length for each species.

For our penguins: The p-value is below 2.2e-16, far under 0.05, so we conclude that Adelie and Chinstrap penguins have significantly different bill lengths. Adelie bills average 38.8 mm against 48.8 mm for Chinstraps, and the 95% confidence interval puts the true difference (Adelie minus Chinstrap) between about -10.95 mm and -9.13 mm. The density plot shows the same thing: the two distributions are well separated, with the means in distinctly different locations.
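
To show where the numbers in the output come from, the sketch below recomputes Welch’s t-statistic, degrees of freedom, p-value, and confidence interval by hand. It assumes the palmerpenguins package is installed; the variable names are illustrative, and base R subsetting is used in place of the dplyr pipeline. Welch’s statistic is the difference in sample means divided by sqrt(s1^2/n1 + s2^2/n2), and the degrees of freedom use the Welch-Satterthwaite approximation.

```r
library(palmerpenguins)

# Bill lengths for each species, with missing values removed.
adelie    <- penguins$bill_length_mm[penguins$species == "Adelie"]
chinstrap <- penguins$bill_length_mm[penguins$species == "Chinstrap"]
adelie    <- adelie[!is.na(adelie)]
chinstrap <- chinstrap[!is.na(chinstrap)]

# Squared standard error contributed by each group: s^2 / n.
se2_adelie    <- var(adelie) / length(adelie)
se2_chinstrap <- var(chinstrap) / length(chinstrap)

# t-statistic: difference in means divided by its standard error.
mean_diff <- mean(adelie) - mean(chinstrap)
std_error <- sqrt(se2_adelie + se2_chinstrap)
t_stat    <- mean_diff / std_error

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_adelie + se2_chinstrap)^2 /
  (se2_adelie^2 / (length(adelie) - 1) +
     se2_chinstrap^2 / (length(chinstrap) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means.
p_value  <- 2 * pt(-abs(t_stat), df_welch)
conf_int <- mean_diff + c(-1, 1) * qt(0.975, df_welch) * std_error

c(t = t_stat, df = df_welch, p = p_value)
conf_int
```

The hand-computed values should agree with what t.test(adelie, chinstrap) reports.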


3 Paired T-Test

When to use it: Use a paired t-test when your data points are naturally paired - either the same subjects measured twice, or matched subjects measured once each.

Example scenario: Let’s examine the classic sleep dataset (built into R), which shows the effect of two different sleep drugs on 10 patients. Each patient received both drugs at different times, making this a perfect paired scenario.

3.1 Step 1: Prepare the Data

sleep_wide <- sleep %>%
  pivot_wider(names_from = group,
              values_from = extra,
              names_prefix = "drug_")

sleep_wide

  1. names_from = group uses the group column values (1, 2) as the new column names.
  2. values_from = extra fills those new columns with the extra sleep hours.
  3. names_prefix = "drug_" prepends “drug_” to each new column name, giving drug_1 and drug_2.
  4. Print the reshaped tibble to confirm there is one row per patient with both drug measurements side by side, ready for t.test(x, y, paired = TRUE).

# A tibble: 10 × 3
   ID    drug_1 drug_2
   <fct>  <dbl>  <dbl>
 1 1        0.7    1.9
 2 2       -1.6    0.8
 3 3       -0.2    1.1
 4 4       -1.2    0.1
 5 5       -0.1   -0.1
 6 6        3.4    4.4
 7 7        3.7    5.5
 8 8        0.8    1.6
 9 9        0      4.6
10 10       2      3.4

3.2 Visualizing the Paired Data

sleep %>%
  mutate(group = paste("Drug", group)) %>%
  group_by(group) %>%
  mutate(mean_extra = mean(extra)) %>%
  ungroup() %>%
  ggplot(aes(x = extra, fill = group)) +
  geom_density(alpha = 0.6) +
  geom_vline(aes(xintercept = mean_extra, color = group),
             linewidth = 1, linetype = "dashed") +
  annotate("text", x = 0.8, y = 0.32,
           label = "Mean", size = 4, fontface = "bold") +
  annotate("text", x = 2.3, y = 0.32,
           label = "Mean", size = 4, fontface = "bold") +
  labs(title = "Distribution of Extra Sleep by Drug",
       x = "Extra Sleep (hours)",
       y = "Density",
       fill = "Treatment",
       color = "Treatment") +
  theme_minimal() +
  theme(legend.position = "bottom")

  1. Relabel group values from “1”/“2” to “Drug 1”/“Drug 2” for clean legend display.
  2. Compute the per-drug mean within each group so geom_vline() can reference it.
  3. Map extra sleep hours to x and group to fill, which produces two overlapping density curves.
  4. alpha = 0.6 keeps both distributions visible where they overlap, which is especially useful with only 10 data points per group.
  5. Dashed mean lines coloured by group: the horizontal shift between the lines shows Drug 2 produced more sleep on average.
  6. Manually positioned “Mean” annotations near each dashed line at their approximate x positions.

3.3 Step 2: Perform the Paired T-Test

t.test(sleep_wide$drug_1, sleep_wide$drug_2, paired = TRUE)

  1. Paired t-test using the vector method: x and y must be in the same row order (same patient). paired = TRUE computes within-person differences before testing, which makes the test more powerful by removing between-person variability.

    Paired t-test

data:  sleep_wide$drug_1 and sleep_wide$drug_2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.4598858 -0.7001142
sample estimates:
mean difference 
          -1.58 
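
The printed summary is an object of class htest, so its pieces can be pulled out by name for reporting or further computation. The sketch below rebuilds the two vectors from the built-in sleep data with base R so that it runs on its own; paired_result is a name chosen for this example.

```r
# Rebuild the two measurement vectors (rows are ordered by patient ID).
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

paired_result <- t.test(drug_1, drug_2, paired = TRUE)

paired_result$statistic  # t value
paired_result$parameter  # degrees of freedom
paired_result$p.value    # p-value as a plain number
paired_result$conf.int   # confidence interval, with its level as an attribute
paired_result$estimate   # mean of the within-patient differences
```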

3.4 Interpreting the Results

The paired t-test output is similar to the independent t-test but with important differences:

  • t-statistic: Tests if the mean difference between paired observations is zero.
  • p-value: If less than 0.05, there’s a significant difference between the two conditions.
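  • 95% confidence interval: Range for the true mean of the within-pair differences. The point estimate equals the difference between the two condition means, but the interval is built from the variability of the within-pair differences, not from the spread of each condition.
  • Mean difference: The average of x minus y across pairs (here drug_1 minus drug_2), so a negative value means the second condition scored higher.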

For our sleep data: We’re testing whether the two drugs produce different amounts of extra sleep in the same individuals. With p = 0.0028, the difference is statistically significant: on average, patients gained 1.58 fewer hours of extra sleep on drug 1 than on drug 2, with a 95% confidence interval of -2.46 to -0.70 hours for drug 1 minus drug 2. The paired design is more powerful because it accounts for person-to-person variability: some people naturally sleep more than others, but we’re interested in whether each person sleeps differently on the two drugs. The density plot helps us visualize the overall distributions, but the paired test looks at the differences within each person, not at the overall distributions.
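
The point about within-person differences can be checked directly: a paired t-test gives the same result as a one-sample t-test on the differences. The sketch below uses the built-in sleep data with base R; the variable names are illustrative.

```r
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

# Within-patient differences, in the same direction t.test() uses (x - y).
differences <- drug_1 - drug_2

paired_test     <- t.test(drug_1, drug_2, paired = TRUE)
one_sample_test <- t.test(differences, mu = 0)

# The same statistic by hand: mean difference divided by its standard error.
t_by_hand <- mean(differences) / (sd(differences) / sqrt(length(differences)))

# All three routes give the same t-statistic and p-value.
all.equal(unname(paired_test$statistic), unname(one_sample_test$statistic))
all.equal(unname(paired_test$statistic), t_by_hand)
all.equal(paired_test$p.value, one_sample_test$p.value)
```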


4 Key Takeaway

Independent vs. Paired - Choose wisely!

  • Different subjects in each group → Independent t-test
  • Same subjects measured twice OR matched pairs → Paired t-test

Important

Using the wrong test can lead to incorrect conclusions or reduced statistical power!
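
To see what reduced power looks like in practice, the sketch below runs both tests on the built-in sleep data, where the paired test is the correct choice. It uses base R only, and the variable names are hypothetical.

```r
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

# Correct analysis: the same 10 patients received both drugs.
paired_p <- t.test(drug_1, drug_2, paired = TRUE)$p.value

# Wrong analysis: treats the measurements as two unrelated groups (Welch's test).
independent_p <- t.test(drug_1, drug_2)$p.value

c(paired = paired_p, independent = independent_p)
```

On these data the paired test gives a p-value below 0.05, while the independent test gives one above 0.05, because ignoring the pairing leaves person-to-person variability in the error term.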