T-Tests in R

Independent and Paired Tests with Palmer Penguins & Sleep Data

Author

Abdullah Al Shamim

Published

May 25, 2026

1 What is a t-test?

A t-test is a statistical test that helps us determine if there’s a significant difference between the means of two groups. Think of it as answering questions like “Do these two groups really differ, or could the difference just be due to random chance?”

There are two main types:

  • Independent t-test: Compares means from two separate, unrelated groups (e.g., comparing heights of cats vs. dogs)
  • Paired t-test: Compares means from the same group measured twice, or matched pairs (e.g., measuring blood pressure before and after medication in the same patients)
Note

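Key assumptions: Both tests assume independent observations and approximately normally distributed data. For the paired test, it is the within-pair differences that should be roughly normal, not each set of measurements. t-tests are fairly robust to moderate departures from normality, especially with larger sample sizes.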
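Here is a minimal base-R sketch of one way to check the normality assumption. It uses the built-in sleep data that appears later in this tutorial. For a paired test the assumption applies to the within-pair differences, so the sketch examines those. With only 10 values the Shapiro-Wilk test has little power, so treat the test and the Q-Q plot as rough guides rather than strict gatekeepers.

```r
# Built-in sleep data: rows for drug "1" and drug "2" are both ordered by patient ID
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
diffs <- drug1 - drug2            # within-patient differences

# Shapiro-Wilk test: a small p-value suggests the differences are not normal
sw <- shapiro.test(diffs)
sw

# Q-Q plot: points close to the reference line are consistent with normality
qqnorm(diffs)
qqline(diffs)
```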


1.1 Understanding t.test() Arguments

The t.test() function in R has several important arguments:

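| Argument | Description | Default |
|---|---|---|
| x | A numeric vector of data values (or a formula, see formula) | |
| y | An optional second numeric vector | |
| alternative | Alternative hypothesis: "two.sided", "less", or "greater" | "two.sided" |
| mu | Hypothesised value of the mean (one sample) or of the mean difference (two samples or paired) under the null hypothesis | 0 |
| paired | Logical: perform a paired t-test? | FALSE |
| var.equal | Logical: assume equal variances? FALSE gives Welch's test, TRUE the classic Student's test | FALSE |
| conf.level | Confidence level for the interval | 0.95 |
| formula | Formula method: outcome ~ group, where group has exactly two levels | |
| data | Data frame containing the variables in the formula | |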

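For independent groups, the formula method (outcome ~ group) is usually the most convenient. For paired data, pass the two measurements as separate vectors with paired = TRUE, as in the paired example later in this tutorial. Recent versions of R no longer accept paired = TRUE in the formula method.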
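A quick way to confirm that the two calling styles agree is to run both on the same data. This sketch uses R's built-in sleep data. It treats the two groups as independent only to demonstrate the interfaces; the variable names x, y and fit_* are illustrative.

```r
# Built-in sleep data, treated here as two independent groups purely
# to illustrate the two calling styles
fit_formula <- t.test(extra ~ group, data = sleep)

# Equivalent two-vector interface: first group as x, second as y
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]
fit_vectors <- t.test(x, y)

# Both calls give the same Welch statistic and degrees of freedom
all.equal(unname(fit_formula$statistic), unname(fit_vectors$statistic))
all.equal(unname(fit_formula$parameter), unname(fit_vectors$parameter))

# For paired data, pass the two aligned vectors with paired = TRUE
fit_paired <- t.test(x, y, paired = TRUE)
fit_paired$parameter   # degrees of freedom = number of pairs - 1
```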


2 Independent T-Test

When to use it: Use an independent t-test when you have two separate groups and want to know if their means differ significantly.

Example scenario: Let’s compare the bill length of Adelie penguins vs. Chinstrap penguins from the Palmer Archipelago. We’ll use the palmerpenguins package, which contains real data collected by Dr. Kristen Gorman.


2.1 Setup

library(tidyverse)
library(palmerpenguins)
library(gt)
library(gtExtras)
1. Load the tidyverse for data wrangling and ggplot2 visualisation
2. Load the Palmer Penguins dataset (real Antarctic penguin measurements)
3. Load gt for publication-quality tables
4. Load gtExtras for extra gt helpers

2.2 Prepare the Data

penguins_filtered <- penguins |>
  filter(species %in% c("Adelie",
                         "Chinstrap")) |>
  drop_na(bill_length_mm)

penguins_filtered |>
  count(species)
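1. Start with the penguins dataset from the palmerpenguins package
2. Keep only the two species we want to compare
3. Remove rows where bill length is missing. t.test() would drop them on its own, but removing them up front keeps the group counts accurate and lets mean() work without na.rm = TRUE
4. Confirm how many observations we have per species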
# A tibble: 2 × 2
  species       n
  <fct>     <int>
1 Adelie      151
2 Chinstrap    68
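Here is a small sketch of how t.test() itself handles missing values. It uses two made-up vectors; a and b are hypothetical. In the two-sample case, t.test() drops NAs from each vector separately, and with paired = TRUE it drops any pair in which either value is missing. mean(), by contrast, returns NA unless na.rm = TRUE, which is one reason to remove incomplete rows before summarising.

```r
# Hypothetical toy vectors with one missing value each
a <- c(5.1, 4.8, NA, 5.6, 5.0)
b <- c(6.2, NA, 5.9, 6.4, 6.1)

# t.test() removes the NAs from each vector on its own...
with_na <- t.test(a, b)
# ...so it matches removing them explicitly beforehand
without_na <- t.test(na.omit(a), na.omit(b))
all.equal(with_na$statistic, without_na$statistic)

# mean() keeps NAs unless told otherwise
mean(a)                 # NA
mean(a, na.rm = TRUE)
```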

2.3 Visualising the Data

species_colors <- c(
  "Adelie"    = "#f4a261",
  "Chinstrap" = "#2a9d8f"
)

species_means <- penguins_filtered |>
  group_by(species) |>
  summarise(mean_bill = mean(bill_length_mm), .groups = "drop")

ggplot(penguins_filtered,
       aes(x      = bill_length_mm,
           fill   = species,
           colour = species)) +

  geom_density(
    alpha     = 0.45,
    linewidth = 0.7,
    adjust    = 1.1
  ) +

  geom_vline(
    data      = species_means,
    aes(xintercept = mean_bill,
        colour     = species),
    linetype  = "dashed",
    linewidth = 0.8
  ) +

  geom_text(
    data  = species_means,
    aes(x      = mean_bill,
        y      = Inf,
        label  = "Mean",
        colour = species),
    vjust       = 1.6,
    hjust       = 0.5,
    size        = 3.8,
    fontface    = "bold",
    inherit.aes = FALSE
  ) +

  scale_fill_manual(
    values = species_colors,
    name   = "Species"
  ) +
  scale_colour_manual(
    values = species_colors,
    name   = "Species"
  ) +

  scale_x_continuous(
    breaks = seq(32, 58, by = 4)
  ) +

  labs(
    title    = "Bill Length: Adelie vs. Chinstrap Penguins",
    subtitle = "Overlapping densities with mean markers",
    x        = "Bill Length (mm)",
    y        = "Density"
  ) +

  theme_minimal(base_size = 14) +
  theme(
    plot.title          = element_text(face = "bold", size = 16),
    plot.subtitle       = element_text(colour = "grey40", size = 11),
    plot.title.position = "plot",
    legend.position     = "bottom",
    legend.title        = element_text(face = "bold"),
    panel.grid.minor    = element_blank(),
    panel.grid.major.x  = element_blank()
  )
1. Named colour palette (orange for Adelie, teal for Chinstrap), applied consistently to fill, outline, vlines, and labels
2. Pre-compute per-species means in a separate tibble for use in geom_vline() and geom_text()
3. Map both fill and colour to species so density curves, outlines, vlines, and text all inherit the same colour automatically
4. geom_density() draws smooth overlapping curves; alpha = 0.45 keeps both visible where they overlap; adjust = 1.1 widens the kernel bandwidth by 10% for slightly smoother curves
5. Dashed vertical lines mark each species mean, the clearest way to show how far apart the two distributions are centred
6. “Mean” labels pinned to the top of each vline using y = Inf plus vjust, which avoids cluttering the density area
7. scale_fill_manual() and scale_colour_manual() share the same name = "Species", so ggplot2 merges them into a single legend
8. Explicit x-axis breaks every 4 mm from 32 to 56 mm, spanning the observed range of both species (about 32 to 58 mm)
9. theme_minimal() with the vertical grid lines removed keeps the eye on the shape of the distributions


2.4 Performing the Test

t.test(
  bill_length_mm ~ species,
  data        = penguins_filtered,
  alternative = "two.sided",
  var.equal   = FALSE
)
1. Formula method: outcome ~ grouping variable; R splits bill_length_mm by species
2. Point to our filtered dataset
3. Two-sided test: we’re asking “are they different?”, not “is one bigger?”
4. var.equal = FALSE gives Welch’s t-test, which does not assume equal variances (the safer default)

    Welch Two Sample t-test

data:  bill_length_mm by species
t = -21.865, df = 106.97, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
95 percent confidence interval:
 -10.952948  -9.131917
sample estimates:
   mean in group Adelie mean in group Chinstrap 
               38.79139                48.83382 
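To see what var.equal changes, this sketch runs both versions on R's built-in sleep data. It uses sleep only because it needs no extra packages. With equal group sizes, as here, the Welch and Student t statistics coincide and only the degrees of freedom differ. With unequal groups, such as 151 Adelie vs. 68 Chinstrap penguins, the statistic changes too. On the tutorial data, the Student version would be t.test(bill_length_mm ~ species, data = penguins_filtered, var.equal = TRUE).

```r
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]

welch   <- t.test(x, y)                    # var.equal = FALSE (default)
student <- t.test(x, y, var.equal = TRUE)  # classic pooled-variance test

# Student's test uses n1 + n2 - 2 degrees of freedom; Welch's adjusts them downwards
c(welch = unname(welch$parameter), student = unname(student$parameter))

# With equal group sizes the two t statistics are identical
c(welch = unname(welch$statistic), student = unname(student$statistic))
```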

2.5 Interpreting the Results

The output shows several key pieces of information:

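  • t-statistic: The difference between the two sample means divided by its standard error, i.e. how many standard errors apart the means are. Larger absolute values indicate stronger evidence of a real difference.
  • p-value: The probability of seeing a difference at least this large if the true means were equal. If it is below the chosen significance level (conventionally 0.05), we reject the null hypothesis that the means are equal and call the difference statistically significant.
  • 95% confidence interval: A range of plausible values for the true difference in means, computed as first group minus second (here Adelie minus Chinstrap). If the study were repeated many times, 95% of intervals built this way would contain the true difference.
  • Sample means: The average bill length for each species.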
Tip

For our penguins, p < 2.2e-16, far below 0.05, so we conclude that Adelie and Chinstrap penguins have significantly different bill lengths. The confidence interval (-10.95 to -9.13 mm) is negative because R subtracts Chinstrap from Adelie. In other words, Chinstrap bills are about 10 mm longer on average (48.8 vs. 38.8 mm), plausibly by 9.1 to 11.0 mm. The density plot tells the same story: the two distributions are largely separated, and their means fall in distinctly different locations.
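Each quantity in the output can be reproduced from its textbook formula. The helper welch_by_hand() below is a hypothetical illustration, not part of any package. It computes the Welch t statistic, the Welch-Satterthwaite degrees of freedom, the two-sided p-value, and the confidence interval, then compares them with t.test(). It runs on R's built-in sleep groups so that no extra packages are needed. Passing the two species' bill lengths from penguins_filtered should reproduce the Welch output shown earlier.

```r
# Hypothetical helper: Welch two-sample t-test computed from the formulas
welch_by_hand <- function(x, y, conf.level = 0.95) {
  nx <- length(x); ny <- length(y)
  vx <- var(x) / nx                  # squared standard error of mean(x)
  vy <- var(y) / ny                  # squared standard error of mean(y)
  se <- sqrt(vx + vy)                # standard error of the difference
  diff <- mean(x) - mean(y)          # R reports first group minus second
  t_stat <- diff / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  p <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  crit <- qt(1 - (1 - conf.level) / 2, df)
  list(t = t_stat, df = df, p = p, ci = diff + c(-1, 1) * crit * se)
}

x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]
mine <- welch_by_hand(x, y)
ref  <- t.test(x, y)                 # Welch test for comparison
str(mine)
```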


3 Paired T-Test

When to use it: Use a paired t-test when your data points are naturally paired — either the same subjects measured twice, or matched subjects measured once each.

Example scenario: Let’s examine the classic sleep dataset (built into R). It records the increase in hours of sleep, relative to a control, produced by two sleep-inducing drugs in 10 patients. Each patient received both drugs at different times, making this a natural paired design.


3.1 Step 1: Prepare the Data

sleep_wide <- sleep |>
  as_tibble() |>
  pivot_wider(
    id_cols     = ID,
    names_from  = group,
    values_from = extra,
    names_prefix = "drug_"
  )

sleep_wide
1. Start with R’s built-in sleep dataset: 20 rows in long format (10 patients × 2 drugs)
2. Convert to a tibble for cleaner printing
3. pivot_wider() reshapes from long to wide: one row per patient, one column per drug
4. names_prefix = "drug_" gives clear column names: drug_1 and drug_2
5. Print the result to inspect the paired structure before testing
# A tibble: 10 × 3
   ID    drug_1 drug_2
   <fct>  <dbl>  <dbl>
 1 1        0.7    1.9
 2 2       -1.6    0.8
 3 3       -0.2    1.1
 4 4       -1.2    0.1
 5 5       -0.1   -0.1
 6 6        3.4    4.4
 7 7        3.7    5.5
 8 8        0.8    1.6
 9 9        0      4.6
10 10       2      3.4
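paired = TRUE matches values by position, not by ID, so it is worth confirming that the pairs line up. This base-R sketch does two things. First, it checks the ID order in the long data. Second, it shows reshape() as a package-free alternative to pivot_wider(); sleep_wide_base is an illustrative name. In the wide result each row is one patient, so the two columns are aligned by construction.

```r
# Pairing is by position: both drug groups should list patients in the same ID order
ids_drug1 <- sleep$ID[sleep$group == "1"]
ids_drug2 <- sleep$ID[sleep$group == "2"]
identical(ids_drug1, ids_drug2)

# Base R reshape to one row per patient (columns extra.1 and extra.2)
sleep_wide_base <- reshape(sleep, direction = "wide",
                           idvar = "ID", timevar = "group")
sleep_wide_base

# Each row is one patient, so the columns can be passed straight to a paired test
t.test(sleep_wide_base$extra.1, sleep_wide_base$extra.2, paired = TRUE)
```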

3.2 Visualising the Paired Data

sleep_long <- sleep_wide |>
  pivot_longer(
    cols         = c(drug_1, drug_2),
    names_to     = "drug",
    values_to    = "extra_sleep"
  ) |>
  mutate(
    Treatment = factor(drug,
                       levels = c("drug_1", "drug_2"),
                       labels = c("Drug 1", "Drug 2"))
  )

drug_means <- sleep_long |>
  group_by(Treatment) |>
  summarise(mean_sleep = mean(extra_sleep), .groups = "drop")

drug_colors <- c(
  "Drug 1" = "#e07070",
  "Drug 2" = "#3ab5a0"
)

ggplot(sleep_long,
       aes(x      = extra_sleep,
           fill   = Treatment,
           colour = Treatment)) +

  geom_density(
    alpha     = 0.45,
    linewidth = 0.7,
    adjust    = 1.2
  ) +

  geom_vline(
    data      = drug_means,
    aes(xintercept = mean_sleep,
        colour     = Treatment),
    linetype  = "dashed",
    linewidth = 0.8
  ) +

  geom_text(
    data  = drug_means,
    aes(x      = mean_sleep,
        y      = Inf,
        label  = "Mean",
        colour = Treatment),
    vjust       = 1.6,
    hjust       = 0.5,
    size        = 3.8,
    fontface    = "bold",
    inherit.aes = FALSE
  ) +

  scale_fill_manual(
    values = drug_colors,
    name   = "Treatment"
  ) +
  scale_colour_manual(
    values = drug_colors,
    name   = "Treatment"
  ) +

  scale_x_continuous(
    breaks = seq(-2, 6, by = 2)
  ) +

  labs(
    title    = "Distribution of Extra Sleep by Drug",
    subtitle = "Overlapping densities with mean markers",
    x        = "Extra Sleep (hours)",
    y        = "Density"
  ) +

  theme_minimal(base_size = 14) +
  theme(
    plot.title          = element_text(face = "bold", size = 16),
    plot.subtitle       = element_text(colour = "grey40", size = 11),
    plot.title.position = "plot",
    legend.position     = "bottom",
    legend.title        = element_text(face = "bold"),
    panel.grid.minor    = element_blank(),
    panel.grid.major.x  = element_blank()
  )
1. Pivot to long format (one row per patient per drug) so ggplot2 can map Treatment to fill and colour
2. factor() with explicit labels gives clean display names (“Drug 1” / “Drug 2”) instead of raw column names
3. Pre-compute per-group means in a separate tibble so they can be passed to geom_vline() and geom_text() independently
4. Salmon-pink for Drug 1 and teal for Drug 2: a perceptually distinct pair (check it with a colour-blindness simulator if accessibility matters)
5. geom_density() draws smooth overlapping curves; alpha = 0.45 keeps both distributions visible where they overlap; adjust = 1.2 widens the bandwidth by 20%, smoothing the curves a little more because each group has only n = 10
6. geom_vline() draws a dashed vertical line at each group mean: the key visual anchor showing that Drug 2 shifts the distribution to the right
7. geom_text() places the “Mean” label at y = Inf (top of panel), nudged just inside with vjust, which is cleaner than a separate legend entry
8. Consistent colour scale for both fill (density area) and colour (outline, vline and text), so all elements share one colour per drug
9. Explicit x-axis breaks every 2 hours from -2 to 6, covering the observed range of extra sleep (-1.6 to 5.5 hours)
10. theme_minimal() with horizontal grid lines only (panel.grid.major.x = element_blank()) keeps the focus on the curves; the legend sits at the bottom


3.3 Step 2: Perform the Paired T-Test

t.test(
  x      = sleep_wide$drug_1,
  y      = sleep_wide$drug_2,
  paired = TRUE,
  alternative = "two.sided",
  conf.level  = 0.95
)
1. x is the first measurement vector: extra sleep hours for Drug 1, one value per patient
2. y is the second measurement vector: extra sleep hours for Drug 2, same patients in the same order
3. Crucial: paired = TRUE tells R to compute the within-patient differences x - y (here drug_1 - drug_2) before testing. Values are matched by position, so both vectors must list the patients in the same order
4. Two-sided: we’re asking whether Drug 2 produces different sleep (more or less) than Drug 1
5. 95% confidence interval for the mean within-patient difference

    Paired t-test

data:  sleep_wide$drug_1 and sleep_wide$drug_2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.4598858 -0.7001142
sample estimates:
mean difference 
          -1.58 
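A paired t-test is simply a one-sample t-test on the within-patient differences, tested against zero. This sketch uses only base R and the built-in sleep data to check that equivalence. It then computes the statistic by hand as t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom.

```r
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
diffs <- drug1 - drug2                 # within-patient differences, drug 1 minus drug 2

paired     <- t.test(drug1, drug2, paired = TRUE)
one_sample <- t.test(diffs, mu = 0)    # one-sample test of the differences

# Both calls give the same t statistic
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))

# And by hand, from the mean and standard deviation of the differences
n <- length(diffs)
t_manual <- mean(diffs) / (sd(diffs) / sqrt(n))
round(t_manual, 4)   # -4.0621, as in the paired t-test output
```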

3.4 Interpreting the Results

The paired t-test output is similar to the independent t-test but with important differences:

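  • t-statistic: The mean of the paired differences divided by its standard error. It tests whether the mean difference between paired observations is zero.
  • p-value: If less than 0.05, there’s a statistically significant difference between the two conditions.
  • 95% confidence interval: A range for the true mean within-patient difference. Numerically, the mean of the differences equals the difference between the two means (-1.58 either way). What changes is the standard error, which is based on the variability of the differences rather than the variability within each group.
  • Mean difference: The average of drug_1 - drug_2 across patients. Here -1.58 means that, on average, each patient got 1.58 more hours of extra sleep with Drug 2 than with Drug 1.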
Tip

For our sleep data, p = 0.0028, so the two drugs produce significantly different amounts of extra sleep in the same individuals, with Drug 2 giving more. The paired design gains power by removing person-to-person variability. Some people naturally respond more strongly than others, but each patient serves as their own control, so only the within-patient change matters. The advantage is largest when a patient’s responses to the two drugs are positively correlated.
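The power gain can be made concrete with the identity Var(x - y) = Var(x) + Var(y) - 2 Cov(x, y). The paired test's standard error is built from the variance of the differences. When patients' responses to the two drugs are positively correlated, that variance, and with it the standard error, shrinks. Here is a minimal base-R check on the sleep data:

```r
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]
d <- drug1 - drug2

# Correlation between each patient's responses to the two drugs
cor(drug1, drug2)

# Variance of the differences vs. the variances of each drug
c(var_drug1 = var(drug1), var_drug2 = var(drug2), var_diff = var(d))

# The identity behind the power gain: Var(d) = Var(x) + Var(y) - 2 Cov(x, y)
all.equal(var(d), var(drug1) + var(drug2) - 2 * cov(drug1, drug2))
```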


4 Key Takeaway

Independent vs. Paired — Choose Wisely!
Using the wrong test leads to incorrect conclusions or reduced statistical power
| Scenario | Correct Test | R Code |
|---|---|---|
| Two independent groups | Independent t-test | t.test(y ~ group, data = df) |
| Same group measured twice | Paired t-test | t.test(x = pre, y = post, paired = TRUE) |
| Matched pairs (e.g., twins) | Paired t-test | t.test(x = group1, y = group2, paired = TRUE) |
Important

Rule of Thumb

  • Different subjects in each group → Independent t-test
  • Same subjects measured twice OR matched pairs → Paired t-test

Using the wrong test can lead to incorrect conclusions or reduced statistical power!
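To see how much the choice matters, this sketch analyses the same sleep data both ways in base R. Ignoring the pairing throws away the within-patient information, and the result is no longer significant at the 0.05 level.

```r
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

wrong <- t.test(drug1, drug2)                 # ignores the pairing (Welch)
right <- t.test(drug1, drug2, paired = TRUE)  # respects the pairing

round(c(independent = wrong$p.value, paired = right$p.value), 4)
# independent 0.0794, paired 0.0028
```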


5 Publication-Quality Figures (Nature / Q1 Journal Style)

Journals such as Nature, Cell, and Science publish detailed figure guidelines, and many other Q1 journals follow similar conventions. Always check your target journal’s instructions, but common expectations include:

  • Show individual data points, especially for small samples; bar charts of means alone are widely discouraged because they hide the distribution
  • Overlay mean ± s.d. (for example as a point-and-range summary) on top of the dots, never replacing them
  • Significance brackets using the standard star notation (* / ** / *** / ****), with the test and thresholds stated in the legend
  • Connecting lines for paired designs, linking the same subject across conditions
  • Muted, desaturated colours rather than neon or saturated primaries
  • Minimal axes: no top/right borders (theme_classic()), hairline tick marks, rotated y-label

5.1 Additional Packages

library(ggpubr)
library(patchwork)
library(ggbeeswarm)
1. ggpubr provides stat_compare_means() for automatic significance labels and brackets, and geom_pwc() for pairwise comparisons: a quick route to journal-style p-value annotations
2. patchwork combines multiple ggplot objects into a single multi-panel figure with shared styling, which is handy for journal submissions that require Figure 1a, 1b layouts
3. ggbeeswarm adds geom_beeswarm(), which spreads overlapping points sideways in a systematic pattern rather than with random jitter. As long as a non-random priority is used, the layout is reproducible, which makes it preferable to geom_jitter() for published figures
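Before running the figures, it can help to check what is installed. This sketch rests on one assumption: ggplot2's mean_sdl(), used in the figures for the mean ± s.d. overlay, wraps a function from the Hmisc package, so Hmisc must be installed as well. The sketch only reports missing packages; the install line is left commented out.

```r
# Packages used for the journal-style figures, plus Hmisc for mean_sdl()
needed <- c("ggpubr", "patchwork", "ggbeeswarm", "Hmisc")

# requireNamespace() returns FALSE (without an error) when a package is absent
have <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!have]
missing_pkgs

# Uncomment to install whatever is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```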

5.2 Figure 1 — Independent T-Test (Nature Style)

nature_colors <- c(
  "Adelie"    = "#C0392B",
  "Chinstrap" = "#2980B9"
)

p1 <- ggplot(
  penguins_filtered,
  aes(x = species,
      y = bill_length_mm,
      colour = species)
) +

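  # a deterministic priority ("ascending" is the default) gives the same layout on every run
  geom_beeswarm(
    cex       = 2.2,
    size      = 1.8,
    alpha     = 0.55,
    priority  = "ascending"
  ) +

  # mean_sdl() (mean ± 1 s.d.) requires the Hmisc package to be installed
  stat_summary(
    fun.data  = mean_sdl,
    fun.args  = list(mult = 1),
    geom      = "pointrange",
    size      = 0.6,
    linewidth = 0.9,
    fatten    = 3
  ) +

  # comparisons is what makes stat_compare_means() draw a bracket between the groups
  stat_compare_means(
    comparisons  = list(c("Adelie", "Chinstrap")),
    method       = "t.test",
    label        = "p.signif",
    label.y      = 60,
    size         = 5,
    bracket.size = 0.4
  ) +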

  scale_colour_manual(
    values = nature_colors
  ) +

  scale_y_continuous(
    limits = c(32, 63),
    breaks = seq(35, 60, by = 5),
    expand = expansion(mult = c(0.02, 0.08))
  ) +

  labs(
    x        = NULL,
    y        = "Bill length (mm)",
    title    = "Bill length by species",
    subtitle = "Welch two-sample t-test"
  ) +

  theme_classic(base_size = 12) +
  theme(
    legend.position     = "none",
    axis.line           = element_line(linewidth = 0.4, colour = "grey30"),
    axis.ticks          = element_line(linewidth = 0.3, colour = "grey30"),
    axis.text           = element_text(colour = "grey20", size = 11),
    axis.title.y        = element_text(colour = "grey20", size = 11,
                                       margin = margin(r = 8)),
    plot.title          = element_text(face = "bold", size = 12),
    plot.subtitle       = element_text(colour = "grey50", size = 10),
    plot.title.position = "plot"
  )

p1
1. Relatively muted red and blue; avoid bright, fully saturated primaries
2. Map species to both x-axis and colour so all downstream layers inherit the grouping
3. geom_beeswarm() spreads overlapping points sideways. With a deterministic priority such as "ascending" (the default), every run produces the same figure, which helps reproducibility. alpha = 0.55 keeps dense regions readable
4. stat_summary() with mean_sdl (mean ± 1 s.d.; requires the Hmisc package) draws a point-and-range overlay. fatten = 3 controls the size of the mean dot; linewidth = 0.9 keeps the error bar visible but not dominant
5. stat_compare_means() runs the t-test (Welch by default for method = "t.test"), and label = "p.signif" maps the p-value to ns / * / ** / *** / **** notation. Its comparisons argument tells it which pair of groups to join with a bracket
6. Apply the custom colour palette to override the ggplot defaults
7. Explicit y-axis limits with expansion() give headroom above the bracket without excess whitespace below
8. No x-axis label (the tick labels are self-explanatory); the title and subtitle name the comparison and the test used
9. theme_classic() removes the grid and the top/right axis borders, a common base for journal figures. The remaining theme tweaks tighten colours and margins
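The star labels come from a simple lookup from p-value to symbol. The helper p_to_stars() below is hypothetical and assumes the cut points commonly used by ggpubr (0.0001, 0.001, 0.01, 0.05). ggpubr's handling of values that fall exactly on a cut point may differ.

```r
# Hypothetical helper: map p-values to the star notation used in the figures
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE))   # intervals are closed on the right
}

p_to_stars(c(2.2e-16, 0.002833, 0.2))
# "****" "**"   "ns"
```

The three example values correspond to the penguin result (reported as p < 2.2e-16), the paired sleep result (p = 0.002833) and a non-significant case. These match the star legend in the combined figure's caption.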


5.3 Figure 2 — Paired T-Test (Nature Style)

nature_drug_colors <- c(
  "Drug 1" = "#27AE60",
  "Drug 2" = "#8E44AD"
)

sleep_long_plot <- sleep_wide |>
  pivot_longer(
    cols      = c(drug_1, drug_2),
    names_to  = "drug",
    values_to = "extra_sleep"
  ) |>
  mutate(
    drug = factor(drug,
                  levels = c("drug_1", "drug_2"),
                  labels = c("Drug 1", "Drug 2"))
  )

p2 <- ggplot(
  sleep_long_plot,
  aes(x      = drug,
      y      = extra_sleep,
      colour = drug)
) +

  # one grey line per patient, joining their Drug 1 and Drug 2 values
  geom_line(
    aes(group = ID),
    colour    = "grey70",
    linewidth = 0.45,
    alpha     = 0.8
  ) +

  # plain points (no sideways offset) so each line ends exactly on its dots
  geom_point(
    size  = 2.5,
    alpha = 0.85
  ) +

  stat_summary(
    fun.data  = mean_sdl,
    fun.args  = list(mult = 1),
    geom      = "pointrange",
    size      = 0.65,
    linewidth = 1,
    fatten    = 3.5
  ) +

  geom_hline(
    yintercept = 0,
    linetype   = "dashed",
    linewidth  = 0.3,
    colour     = "grey60"
  ) +

  # paired = TRUE makes the bracket report the paired t-test
  stat_compare_means(
    comparisons  = list(c("Drug 1", "Drug 2")),
    method       = "t.test",
    paired       = TRUE,
    label        = "p.signif",
    label.y      = 6.8,
    size         = 5,
    bracket.size = 0.4
  ) +

  scale_colour_manual(
    values = nature_drug_colors
  ) +

  scale_y_continuous(
    limits = c(-3, 7.5),
    breaks = seq(-2, 6, by = 2),
    expand = expansion(mult = c(0.02, 0.08))
  ) +

  labs(
    x        = NULL,
    y        = "Extra sleep (hours vs. control)",
    title    = "Sleep duration by drug",
    subtitle = "Paired t-test, n = 10 patients"
  ) +

  theme_classic(base_size = 12) +
  theme(
    legend.position     = "none",
    axis.line           = element_line(linewidth = 0.4, colour = "grey30"),
    axis.ticks          = element_line(linewidth = 0.3, colour = "grey30"),
    axis.text           = element_text(colour = "grey20", size = 11),
    axis.title.y        = element_text(colour = "grey20", size = 11,
                                       margin = margin(r = 8)),
    plot.title          = element_text(face = "bold", size = 12),
    plot.subtitle       = element_text(colour = "grey50", size = 10),
    plot.title.position = "plot"
  )

p2
1. Green and purple: a muted pair that generally stays distinguishable for readers with red-green colour blindness. The names match the relabelled factor levels, and the new object name avoids overwriting the drug_colors palette used in the density plot
2. Pivot to long format for plotting and relabel the factor levels to clean display names (“Drug 1” / “Drug 2”)
3. Connecting lines are widely expected for paired data. geom_line(aes(group = ID)) draws one line per patient from the same long-format data. Grey at 0.8 alpha keeps the lines subtle: they show the direction of individual change without competing with the summary statistics
4. geom_point() rather than geom_beeswarm(): a beeswarm would shift overlapping dots sideways, away from the ends of the connecting lines, whereas plain points sit exactly at each line’s endpoints. With n = 10 there is little overplotting to resolve
5. Larger fatten = 3.5 and linewidth = 1 make the mean ± s.d. overlay clearly visible over the small dot cloud
6. A dashed zero line provides a visual reference: points above zero mean the drug increased sleep relative to control
7. comparisons names the pair of groups to join with a bracket, and paired = TRUE makes that bracket report the paired t-test. Without it, the groups would be treated as independent, giving a different (larger) p-value. The pairing relies on row order within each drug, which pivot_longer() keeps in patient ID order
8. The colour scale uses the palette whose names match the relabelled levels “Drug 1” / “Drug 2”, not the raw column names “drug_1” / “drug_2”


5.4 Combined Multi-Panel Figure

combined <- p1 + p2 +
  plot_annotation(
    title   = "Figure 1",
    caption = "Data: palmerpenguins & base R sleep datasets. Error bars = mean ± s.d.\nSignificance: **** p < 0.0001, ** p < 0.01. Welch t-test (a); paired t-test (b).",
    tag_levels = "a"
  ) &
  theme(
    plot.tag         = element_text(face = "bold", size = 13),
    plot.caption     = element_text(colour = "grey50", size = 8,
                                    hjust = 0, margin = margin(t = 8)),
    plot.title       = element_text(face = "bold", size = 13)
  )

combined
1
1. In patchwork, + adds plots to a grid (side by side for two plots), | places plots side by side, and / stacks them vertically. You can mix them for complex layouts: for example, (p1 | p2) / p3 puts two panels above a third
2. plot_annotation() adds a shared figure title and a caption. The caption is a natural place for the statistical details journals ask for (test used, n, error bar definition)
3. tag_levels = "a" automatically labels panels a, b, c, matching the standard journal multi-panel convention. Use "A" for uppercase or "1" for numbers
4. The & operator (not +) applies theme() changes to all panels at once, keeping font sizes and margins consistent across the combined figure