Independent and Paired Tests with Palmer Penguins & Sleep Data
Author
Abdullah Al Shamim
Published
May 25, 2026
1 What is a t-test?
A t-test is a statistical test that helps us determine if there’s a significant difference between the means of two groups. Think of it as answering questions like “Do these two groups really differ, or could the difference just be due to random chance?”
There are two main types:
Independent t-test: Compares means from two separate, unrelated groups (e.g., comparing heights of cats vs. dogs)
Paired t-test: Compares means from the same group measured twice, or matched pairs (e.g., measuring blood pressure before and after medication in the same patients)
Note
Key assumptions: Both tests assume your data is approximately normally distributed, though t-tests are fairly robust to violations of this assumption, especially with larger sample sizes.
1.1 Understanding t.test() Arguments
The t.test() function in R has several important arguments:
Argument | Description | Default
x | A numeric vector of data values (or a formula) | (none)
y | An optional second numeric vector | (none)
alternative | Alternative hypothesis: "two.sided", "less", or "greater" | "two.sided"
mu | True value of the mean (or difference in means) under the null hypothesis | 0
paired | Logical: perform a paired t-test? | FALSE
var.equal | Logical: assume equal variances? | FALSE
conf.level | Confidence level for the interval | 0.95
formula | Formula method: outcome ~ group | (none)
data | Data frame containing the variables | (none)
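For most basic comparisons, you’ll use the formula method (outcome ~ group) for independent groups, and pass the two measurement vectors as x and y with paired = TRUE for paired data. Recent versions of R (4.4.0 onward) reject the paired argument in the formula method, so the x/y form is the safer habit for paired tests.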
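As a minimal sketch of how these arguments behave, the calls below run t.test() on two simulated vectors. The names group_a and group_b, the sample sizes, and the seed are illustrative assumptions, not data from this tutorial.

```r
# Simulated data: hypothetical values, used only to exercise the arguments
set.seed(42)
group_a <- rnorm(30, mean = 50, sd = 5)
group_b <- rnorm(30, mean = 53, sd = 5)

# Defaults: two-sided Welch test (var.equal = FALSE) with a 95% interval
welch <- t.test(x = group_a, y = group_b)

# Student's t-test: pooled variance, df = n1 + n2 - 2
student <- t.test(group_a, group_b, var.equal = TRUE)

# One-sided alternative (mean of group_a is less than mean of group_b),
# reported with a 99% interval
one_sided <- t.test(group_a, group_b, alternative = "less", conf.level = 0.99)

# One-sample test of group_a against a hypothesised mean of 50
one_sample <- t.test(group_a, mu = 50)

welch$method        # which test was run
student$parameter   # degrees of freedom
one_sided$conf.int  # one-sided intervals are unbounded on one side
```

Each call returns an object of class "htest", so the statistic, p-value, interval, and estimates can be pulled out by name instead of being read off the printed output.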
2 Independent T-Test
When to use it: Use an independent t-test when you have two separate groups and want to know if their means differ significantly.
Example scenario: Let’s compare the bill length of Adelie penguins vs. Chinstrap penguins from the Palmer Archipelago. We’ll use the palmerpenguins package, which contains real data collected by Dr. Kristen Gorman.
1. Named colour palette (orange for Adelie, teal for Chinstrap) applied consistently to fill, outline, vlines, and labels
2. Pre-compute per-species means in a separate tibble for use in geom_vline() and geom_text()
3. Map both fill and colour to species so density curves, outlines, vlines, and text all inherit the same colour automatically
4. geom_density() draws smooth overlapping curves; alpha = 0.45 keeps both visible where they overlap; adjust = 1.1 gives a slightly smoothed kernel for n > 100
5. Dashed vertical lines mark each species mean, the clearest way to show how far apart the two distributions are centred
6. “Mean” labels pinned to the top of each vline using y = Inf + vjust, which avoids cluttering the density area
7. Unified scale_fill_manual() and scale_colour_manual() with the same name = "Species" merge into a single legend entry per species
8. Explicit x-axis breaks from 32 to 58 mm, covering the full range of both species
9. theme_minimal() with no vertical grid lines keeps the eye on the horizontal distribution shape
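A plot matching the annotations above could be sketched roughly as follows. This is not the original code: the object names (penguins_filtered, species_means, species_cols), the exact colour values, the break spacing, and the vjust value are all assumptions.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# 1. Named palette (assumed colour values for "orange" and "teal")
species_cols <- c(Adelie = "darkorange", Chinstrap = "cyan4")

# Keep the two species of interest and drop missing bill lengths
penguins_filtered <- penguins |>
  filter(species %in% c("Adelie", "Chinstrap"), !is.na(bill_length_mm)) |>
  droplevels()

# 2. Per-species means for the vertical lines and labels
species_means <- penguins_filtered |>
  group_by(species) |>
  summarise(mean_bill = mean(bill_length_mm), .groups = "drop")

# 3. Fill and colour both mapped to species
p_density <- ggplot(penguins_filtered,
                    aes(x = bill_length_mm, fill = species, colour = species)) +
  # 4. Overlapping density curves
  geom_density(alpha = 0.45, adjust = 1.1) +
  # 5. Dashed line at each species mean
  geom_vline(data = species_means,
             aes(xintercept = mean_bill, colour = species),
             linetype = "dashed") +
  # 6. "Mean" label pinned to the top of the panel
  geom_text(data = species_means,
            aes(x = mean_bill, y = Inf, label = "Mean", colour = species),
            vjust = 1.5, show.legend = FALSE, inherit.aes = FALSE) +
  # 7. Same legend name so fill and colour merge into one legend
  scale_fill_manual(name = "Species", values = species_cols) +
  scale_colour_manual(name = "Species", values = species_cols) +
  # 8. Breaks from 32 to 58 mm (spacing of 2 is an assumption)
  scale_x_continuous(breaks = seq(32, 58, by = 2)) +
  labs(x = "Bill length (mm)", y = "Density") +
  # 9. Minimal theme without vertical grid lines
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

p_density
```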
1. Formula method: outcome ~ grouping variable; R will split bill_length_mm by species
2. Point to our filtered dataset
3. Two-sided test: we’re asking “are they different?” not “is one bigger?”
4. FALSE = Welch’s t-test, which does not assume equal variances and is the safer default
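Based on those four annotations, the call was presumably close to the sketch below. The name penguins_filtered and the filtering step are assumptions; only the arguments themselves are described in the text.

```r
library(palmerpenguins)

# Assumed preparation: keep Adelie and Chinstrap rows with a recorded bill length
penguins_filtered <- droplevels(subset(
  penguins,
  species %in% c("Adelie", "Chinstrap") & !is.na(bill_length_mm)
))

independent_result <- t.test(
  bill_length_mm ~ species,    # 1. outcome ~ grouping variable
  data = penguins_filtered,    # 2. the filtered dataset
  alternative = "two.sided",   # 3. test for any difference
  var.equal = FALSE            # 4. Welch's t-test
)

independent_result
```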
Welch Two Sample t-test
data: bill_length_mm by species
t = -21.865, df = 106.97, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
95 percent confidence interval:
-10.952948 -9.131917
sample estimates:
mean in group Adelie mean in group Chinstrap
38.79139 48.83382
2.5 Interpreting the Results
The output shows several key pieces of information:
t-statistic: The difference between the two sample means divided by its standard error, i.e. how many standard errors apart the means are. Larger absolute values indicate stronger evidence of a difference.
p-value: If less than 0.05, we reject the null hypothesis that the means are equal. This suggests a statistically significant difference.
95% confidence interval: Range where we’re 95% confident the true difference in means lies.
Sample means: The average bill length for each species.
Tip
For our penguins: the p-value (< 2.2e-16) is far below 0.05, so we conclude that Adelie and Chinstrap penguins have significantly different bill lengths. The confidence interval is computed as Adelie minus Chinstrap, so it tells us that Adelie bills are on average roughly 9.1 to 11.0 mm shorter. The density plot shows the same picture: the two distributions are well separated, with the means falling in distinctly different locations.
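To make the pieces of the output less of a black box, here is a small sketch that rebuilds the Welch t-statistic, degrees of freedom, and confidence interval by hand and compares them with t.test(). The vectors a and b are simulated stand-ins (hypothetical values, not the penguin measurements).

```r
# Simulated stand-in data (hypothetical)
set.seed(1)
a <- rnorm(40, mean = 39, sd = 2.7)
b <- rnorm(25, mean = 49, sd = 3.3)

# Per-group variance of the mean
va <- var(a) / length(a)
vb <- var(b) / length(b)

# Standard error of the difference in means
se <- sqrt(va + vb)

# t-statistic: difference in means measured in standard errors
t_manual <- (mean(a) - mean(b)) / se

# Welch-Satterthwaite degrees of freedom
df_manual <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# 95% confidence interval for the difference in means
ci_manual <- (mean(a) - mean(b)) + c(-1, 1) * qt(0.975, df_manual) * se

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df_manual)

res <- t.test(a, b)  # Welch test by default
all.equal(t_manual, unname(res$statistic))
all.equal(df_manual, unname(res$parameter))
all.equal(ci_manual, as.numeric(res$conf.int))
```

This also shows why the degrees of freedom in Welch output are usually not whole numbers: they come from the approximation above, not from a simple n1 + n2 - 2.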
3 Paired T-Test
When to use it: Use a paired t-test when your data points are naturally paired — either the same subjects measured twice, or matched subjects measured once each.
Example scenario: Let’s examine the classic sleep dataset (built into R), which shows the effect of two different sleep drugs on 10 patients. Each patient received both drugs at different times, making this a perfect paired scenario.
1. Pivot to long format (one row per patient per drug) so ggplot2 can map Treatment to fill and colour
2. factor() with explicit labels gives clean display names (“Drug 1” / “Drug 2”) instead of raw column names
3. Pre-compute per-group means in a separate tibble so they can be passed to geom_vline() and geom_text() independently
4. Salmon-pink for Drug 1, teal for Drug 2: a perceptually distinct, colour-blind-safe pair that matches the reference RPubs style
5. geom_density() draws smooth overlapping curves; alpha = 0.45 keeps both distributions visible where they overlap; adjust = 1.2 slightly smooths the kernel for n = 10
6. geom_vline() draws a dashed vertical line at each group mean, the key visual anchor showing Drug 2 shifts the distribution rightward
7. geom_text() places the “Mean” label at y = Inf (top of panel), nudged just inside with vjust; cleaner than a separate legend entry
8. Consistent colour scale for both fill (density area) and colour (outline + vline + text) so all elements share one colour per drug
9. Explicit x-axis breaks from −2 to 6, matching the reference plot’s axis labels
10. theme_minimal() with horizontal grid only (panel.grid.major.x = element_blank()) keeps the focus on the curves; legend at bottom matches the reference layout
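The steps above might look something like the following sketch. The way sleep_wide is built, the names sleep_long, treatment_means, and drug_cols, the hex codes, and the break spacing are assumptions; the built-in sleep data already comes in long form, so the wide table is reconstructed first.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Wide table: one row per patient, one column per drug (assumed construction)
sleep_wide <- sleep |>
  pivot_wider(id_cols = ID, names_from = group,
              values_from = extra, names_prefix = "drug_")

# 1-2. Long format with clean display labels
sleep_long <- sleep_wide |>
  pivot_longer(cols = c(drug_1, drug_2),
               names_to = "Treatment", values_to = "extra_sleep") |>
  mutate(Treatment = factor(Treatment,
                            levels = c("drug_1", "drug_2"),
                            labels = c("Drug 1", "Drug 2")))

# 3. Per-treatment means
treatment_means <- sleep_long |>
  group_by(Treatment) |>
  summarise(mean_extra = mean(extra_sleep), .groups = "drop")

# 4. Salmon-pink and teal (assumed hex codes)
drug_cols <- c("Drug 1" = "#F8766D", "Drug 2" = "#00BFC4")

p_sleep <- ggplot(sleep_long,
                  aes(x = extra_sleep, fill = Treatment, colour = Treatment)) +
  geom_density(alpha = 0.45, adjust = 1.2) +                      # 5
  geom_vline(data = treatment_means,                              # 6
             aes(xintercept = mean_extra, colour = Treatment),
             linetype = "dashed") +
  geom_text(data = treatment_means,                               # 7
            aes(x = mean_extra, y = Inf, label = "Mean", colour = Treatment),
            vjust = 1.5, show.legend = FALSE, inherit.aes = FALSE) +
  scale_fill_manual(values = drug_cols) +                         # 8
  scale_colour_manual(values = drug_cols) +
  scale_x_continuous(breaks = seq(-2, 6, by = 2)) +               # 9
  labs(x = "Extra sleep (hours)", y = "Density") +
  theme_minimal() +                                               # 10
  theme(panel.grid.major.x = element_blank(),
        legend.position = "bottom")

p_sleep
```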
1. x is the first measurement vector: extra sleep hours for Drug 1, one value per patient
2. y is the second measurement vector: extra sleep hours for Drug 2, same patients in the same order
3. Crucial: paired = TRUE tells R to compute differences within each row before testing
4. Two-sided: we’re asking whether Drug 2 produces different sleep (more or less) than Drug 1
5. 95% confidence interval for the mean within-patient difference
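A call consistent with these annotations is sketched below. The names sleep_wide, drug_1, and drug_2 appear in the printed output; how sleep_wide was built is an assumption (here it is reconstructed in base R from the built-in sleep data, which is ordered by group and then by patient ID).

```r
# Assumed construction of the wide table: one row per patient
sleep_wide <- data.frame(
  ID     = sleep$ID[sleep$group == 1],
  drug_1 = sleep$extra[sleep$group == 1],
  drug_2 = sleep$extra[sleep$group == 2]
)

paired_result <- t.test(
  x = sleep_wide$drug_1,       # 1. first measurement per patient
  y = sleep_wide$drug_2,       # 2. second measurement, same row order
  paired = TRUE,               # 3. test the within-patient differences
  alternative = "two.sided",   # 4. any difference, in either direction
  conf.level = 0.95            # 5. 95% interval for the mean difference
)

paired_result
```

Because the pairing relies on row order, it is worth confirming that each row of sleep_wide really holds both measurements for the same patient before running the test.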
Paired t-test
data: sleep_wide$drug_1 and sleep_wide$drug_2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean difference
-1.58
3.4 Interpreting the Results
The paired t-test output is similar to the independent t-test but with important differences:
t-statistic: Tests if the mean difference between paired observations is zero.
p-value: If less than 0.05, there’s a significant difference between the two conditions.
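95% confidence interval: Range for the true mean of the within-pair differences. The point estimate is the same number as the difference between the two group means, but the interval is built from the variability of the differences, not from the variability of each group.
Mean of differences: The average of x − y across pairs. Here that is Drug 1 minus Drug 2, so −1.58 means patients gained on average 1.58 more hours of sleep on Drug 2.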
Tip
For our sleep data: we’re testing whether the two drugs produce different amounts of extra sleep in the same individuals. With p = 0.0028, the difference is significant: patients slept on average 1.58 hours more on Drug 2 than on Drug 1 (95% CI: 0.70 to 2.46 hours). The paired design is more powerful because it removes person-to-person variability: some people naturally sleep more than others, but we’re interested in whether each person sleeps differently on the two drugs.
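The power argument can be checked directly. The sketch below (variable names are illustrative) shows that a paired t-test is the same as a one-sample t-test on the within-patient differences, and that treating the same data as two independent groups gives a much weaker result.

```r
# Extra sleep per patient under each drug, from the built-in sleep data
drug_1 <- sleep$extra[sleep$group == 1]
drug_2 <- sleep$extra[sleep$group == 2]

# Paired test
paired_res <- t.test(drug_1, drug_2, paired = TRUE)

# Equivalent one-sample test on the differences
diff_res <- t.test(drug_1 - drug_2, mu = 0)

# Wrong choice for this design: Welch test ignoring the pairing
welch_res <- t.test(drug_1, drug_2)

c(paired = paired_res$p.value,
  one_sample_on_differences = diff_res$p.value,
  independent_welch = welch_res$p.value)
```

The first two p-values are identical, while the independent test on the same numbers does not reach the 0.05 threshold, which is the "reduced statistical power" the takeaway below warns about.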
4 Key Takeaway
Independent vs. Paired — Choose Wisely!
Using the wrong test leads to incorrect conclusions or reduced statistical power
Scenario | Correct Test | R Code
Two independent groups | Independent t-test | t.test(y ~ group, data = df)
Same group measured twice | Paired t-test | t.test(x = pre, y = post, paired = TRUE)
Matched pairs (e.g., twins) | Paired t-test | t.test(x = group1, y = group2, paired = TRUE)
Important
Rule of Thumb
Different subjects in each group → Independent t-test
Same subjects measured twice OR matched pairs → Paired t-test
1. ggpubr provides stat_compare_means() for automatic significance brackets and geom_pwc() for pairwise comparisons, the fastest route to Nature-style p-value annotations
2. patchwork combines multiple ggplot objects into a single multi-panel figure with shared styling, which is essential for journal submissions that require Figure 1a, 1b layouts
3. ggbeeswarm adds geom_beeswarm(), which spreads overlapping points in a deterministic pattern (no random jitter) and is preferred over geom_jitter() for reproducible figures
2. Map species to both x-axis and colour so all downstream layers inherit the grouping
3. geom_beeswarm() spreads points without randomness, so every run produces the same figure, which is required for reproducibility. alpha = 0.55 keeps dense regions readable
4. stat_summary() with mean_sdl draws the mean ± s.d. overlay. Note that mean_sdl defaults to ± 2 s.d.; it gives mean ± 1 s.d. only when mult = 1 is passed through fun.args. fatten = 3 controls the size of the mean dot; linewidth = 0.9 keeps the error bar visible but not dominant
5. stat_compare_means() runs the test, computes the p-value, and draws the significance bracket with the correct star label. Its default method is the Wilcoxon test, so method = "t.test" must be set to get a t-test. "p.signif" maps to ns / * / ** / *** / **** notation
6. Apply the custom Nature-style colour palette to override ggplot defaults
7. Explicit y-axis limits with expansion() give headroom above the bracket without excess whitespace below
8. No x-axis label (the tick labels are self-explanatory), informative title and subtitle in the figure caption style
9. theme_classic() removes the grid and top/right axis borders, a common base for Nature-style figures. All remaining theme tweaks tighten colours and margins
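Putting these annotations together, the independent-groups figure could be sketched as below. This is an approximation under several assumptions: the palette values, the y-axis limits, the title text, the geom used for the summary overlay (pointrange here), and the object names are all illustrative. mean_sdl needs the Hmisc package installed, and recent ggplot2 releases may warn that fatten is deprecated.

```r
library(ggplot2)
library(ggpubr)
library(ggbeeswarm)
library(palmerpenguins)

# Assumed palette (not the original colours)
nature_cols <- c(Adelie = "#E64B35", Chinstrap = "#4DBBD5")

penguins_filtered <- droplevels(subset(
  penguins,
  species %in% c("Adelie", "Chinstrap") & !is.na(bill_length_mm)
))

p_indep <- ggplot(penguins_filtered,
                  aes(x = species, y = bill_length_mm, colour = species)) +
  # Deterministic point spreading
  geom_beeswarm(alpha = 0.55) +
  # Mean +/- 1 s.d. overlay (mult = 1 overrides the default of 2)
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "pointrange", colour = "black",
               fatten = 3, linewidth = 0.9) +
  # Welch t-test bracket with star notation
  stat_compare_means(method = "t.test",
                     comparisons = list(c("Adelie", "Chinstrap")),
                     label = "p.signif") +
  scale_colour_manual(values = nature_cols) +
  # Assumed limits, leaving headroom for the bracket
  scale_y_continuous(limits = c(30, 65),
                     expand = expansion(mult = c(0.02, 0.05))) +
  labs(x = NULL, y = "Bill length (mm)",
       title = "Bill length by species",
       subtitle = "Welch two-sample t-test; points are individual penguins") +
  theme_classic() +
  theme(legend.position = "none")

p_indep
```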
1. Green and purple: a muted, Nature-style pair that is also colour-blind friendly (distinguishable under deuteranopia)
2. Pivot to long format for plotting; relabel factor levels to clean display names (“Drug 1” / “Drug 2”)
3. Keep a separate wide-format tibble for the connecting lines; geom_line() needs the two conditions in separate rows but grouped by patient ID
4. Connecting lines are the expected convention for paired data in journal figures. Grey at 0.8 alpha keeps them subtle: they show the direction of individual change without competing with the summary statistics
5. cex = 3 spaces beeswarm dots further apart since n = 10 is small, which prevents stacking
6. Larger fatten = 3.5 and linewidth = 1 make the mean ± s.d. overlay clearly visible over the small dot cloud
7. A dashed zero line provides a visual reference: points above zero mean the drug increased sleep relative to control
8. paired = TRUE inside stat_compare_means() correctly runs the paired t-test for the bracket; without this it would default to an independent test, giving a wrong p-value
9. Match factor label names exactly: after mutate() relabelled the levels, the colour scale must use “Drug 1” / “Drug 2”, not “drug_1” / “drug_2”
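A paired figure along these lines might be sketched as follows. The hex codes, line colour, axis label, and object names are assumptions, and the connecting lines here are drawn from the long table grouped by patient ID. As with the previous figure, mean_sdl relies on Hmisc, and fatten may trigger a deprecation warning in recent ggplot2 versions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggpubr)
library(ggbeeswarm)

# 1. Green and purple (assumed hex codes)
drug_cols <- c("Drug 1" = "#1B9E77", "Drug 2" = "#7570B3")

# Wide table, one row per patient (assumed construction)
sleep_wide <- sleep |>
  pivot_wider(id_cols = ID, names_from = group,
              values_from = extra, names_prefix = "drug_")

# 2. Long format with clean labels; rows stay in patient order within each drug
sleep_long <- sleep_wide |>
  pivot_longer(cols = c(drug_1, drug_2),
               names_to = "Treatment", values_to = "extra_sleep") |>
  mutate(Treatment = factor(Treatment,
                            levels = c("drug_1", "drug_2"),
                            labels = c("Drug 1", "Drug 2")))

p_paired <- ggplot(sleep_long,
                   aes(x = Treatment, y = extra_sleep, colour = Treatment)) +
  # 4. One grey line per patient
  geom_line(aes(group = ID), colour = "grey60", alpha = 0.8) +
  # 5. Wider spacing for the small sample
  geom_beeswarm(cex = 3, size = 2) +
  # 6. Mean +/- 1 s.d. overlay
  stat_summary(fun.data = mean_sdl, fun.args = list(mult = 1),
               geom = "pointrange", colour = "black",
               fatten = 3.5, linewidth = 1) +
  # 7. Zero reference line
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  # 8. Paired t-test for the bracket
  stat_compare_means(method = "t.test", paired = TRUE,
                     comparisons = list(c("Drug 1", "Drug 2")),
                     label = "p.signif") +
  # 9. Names must match the relabelled factor levels
  scale_colour_manual(values = drug_cols) +
  labs(x = NULL, y = "Extra sleep (hours)") +
  theme_classic() +
  theme(legend.position = "none")

p_paired
```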
1. patchwork uses + to combine plots into one layout. | places plots side by side, / stacks them vertically, and you can mix them for complex layouts like (p1 | p2) / p3
2. plot_annotation() adds a shared figure title and a caption; the caption is a natural place for the statistical details (test used, n, error bar definition)
3. tag_levels = "a" automatically labels panels a, b, c, matching the standard journal multi-panel convention. Use "A" for uppercase or "1" for numbers
4. The & operator (not +) applies theme() changes to all panels simultaneously, which is essential for keeping font sizes and margins consistent across the combined figure
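The layout operators can be tried out with two placeholder panels. In the sketch below, p_a and p_b are hypothetical stand-ins built from the sleep data (not the figures from this tutorial), and the title and caption text are illustrative.

```r
library(ggplot2)
library(patchwork)

# Placeholder panels (hypothetical stand-ins for the real figures)
p_a <- ggplot(sleep, aes(x = group, y = extra)) +
  geom_boxplot() +
  labs(x = "Drug", y = "Extra sleep (hours)")

p_b <- ggplot(sleep, aes(x = extra)) +
  geom_histogram(bins = 8) +
  labs(x = "Extra sleep (hours)", y = "Count")

# Side by side with |, shared annotation, automatic a/b tags,
# and one theme tweak applied to every panel with &
combined <- (p_a | p_b) +
  plot_annotation(
    title = "Extra sleep under two drugs",
    caption = "n = 10 patients; paired t-test",
    tag_levels = "a"
  ) &
  theme(plot.tag = element_text(face = "bold"))

# The same panels stacked vertically with /
stacked <- p_a / p_b

combined
```

The parentheses around p_a | p_b matter: in R, + and / bind more tightly than & and |, so grouping the layout explicitly avoids surprises when operators are mixed.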