Independent and Paired Tests with Palmer Penguins & Sleep Data
Author
Abdullah Al Shamim
Published
May 25, 2026
1 What is a t-test?
A t-test is a statistical test that helps us determine if there’s a significant difference between the means of two groups. Think of it as answering questions like “Do these two groups really differ, or could the difference just be due to random chance?”
There are two main types:
Independent t-test: Compares means from two separate, unrelated groups (e.g., comparing heights of cats vs. dogs)
Paired t-test: Compares means from the same group measured twice, or matched pairs (e.g., measuring blood pressure before and after medication in the same patients)
Note
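Key assumptions: Both tests assume that observations are independent and that the data are approximately normally distributed. For the paired test, it is the within-pair differences that should be approximately normal. t-tests are fairly robust to violations of normality, especially with larger sample sizes.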
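One rough way to screen the normality assumption is a Shapiro-Wilk test together with a Q-Q plot. The sketch below uses simulated before/after measurements (the values and the names `before`, `after`, and `diffs` are hypothetical, chosen only for illustration) and checks the within-pair differences, which is what matters for a paired test.

```r
set.seed(42)  # make the simulated data reproducible

# Hypothetical paired measurements, e.g. blood pressure for 25 patients
before <- rnorm(25, mean = 120, sd = 10)
after  <- before - rnorm(25, mean = 5, sd = 4)

# For a paired design, the normality assumption applies to the differences
diffs <- after - before

# Shapiro-Wilk test: a small p-value suggests a departure from normality
sw <- shapiro.test(diffs)
print(sw)

# Q-Q plot: points close to the reference line suggest approximate normality
qqnorm(diffs)
qqline(diffs)
```

With small samples the Shapiro-Wilk test has little power, so it is best read alongside the plot rather than treated as a verdict.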
1.1 Understanding t.test() Arguments
The t.test() function in R has several important arguments:
x: A numeric vector of data values (or a formula for the formula method)
y: An optional numeric vector of data values (for two-sample tests)
alternative: Specifies the alternative hypothesis - “two.sided” (default), “less”, or “greater”
mu: The hypothesized value of the mean (or difference in means) under the null hypothesis. Default is 0
paired: Logical value indicating whether to perform a paired t-test. Default is FALSE
var.equal: Logical value indicating whether to assume equal variances. Default is FALSE (uses Welch’s t-test)
conf.level: Confidence level for the confidence interval. Default is 0.95 (95%)
formula: For formula method, use the format outcome ~ group
data: The data frame containing the variables (when using formula method)
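For most independent comparisons, you’ll use the formula method. For paired data, the vector method with paired = TRUE is the safer choice: recent versions of R (4.4.0 and later) raise an error when paired is passed to the formula method.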
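To make the arguments concrete, here is a minimal sketch on simulated data. The vectors `group_a` and `group_b` and the data frame `scores` are hypothetical names introduced only for this illustration.

```r
set.seed(123)  # reproducible simulated data (hypothetical values)
group_a <- rnorm(30, mean = 10, sd = 2)
group_b <- rnorm(30, mean = 12, sd = 2)

# Vector method with defaults: two-sided Welch test, 95% confidence interval
res_default <- t.test(group_a, group_b)

# Same data with non-default arguments: one-sided alternative,
# pooled-variance (Student) test, 99% confidence interval
res_custom <- t.test(group_a, group_b,
                     alternative = "less",
                     var.equal = TRUE,
                     conf.level = 0.99)

# Formula method: outcome ~ group, with the variables looked up in `data`
scores <- data.frame(
  value = c(group_a, group_b),
  group = rep(c("a", "b"), each = 30)
)
res_formula <- t.test(value ~ group, data = scores)

# Which test each call actually ran
res_default$method
res_custom$method
```

The formula call and the default vector call run the same Welch test, so they return the same statistic and p-value; only the way the data are supplied differs.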
2 Independent T-Test
When to use it: Use an independent t-test when you have two separate groups and want to know if their means differ significantly.
Example scenario: Let’s compare the bill length of Adelie penguins vs. Chinstrap penguins from the Palmer Archipelago. We’ll use the palmerpenguins package, which contains real data collected by Dr. Kristen Gorman.
2.1 Visualizing the Data
library(tidyverse)                                          # (1)
library(palmerpenguins)                                     # (2)

penguins %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%         # (3)
  drop_na(bill_length_mm) %>%                               # (4)
  group_by(species) %>%
  mutate(mean_bill = mean(bill_length_mm)) %>%              # (5)
  ungroup() %>%
  ggplot(aes(x = bill_length_mm, fill = species)) +         # (6)
  geom_density(alpha = 0.6) +                               # (7)
  geom_vline(aes(xintercept = mean_bill, color = species),  # (8)
             linewidth = 1, linetype = "dashed") +
  annotate("text", x = 39, y = 0.09,                        # (9)
           label = "Mean", size = 4, fontface = "bold") +
  annotate("text", x = 49, y = 0.09,
           label = "Mean", size = 4, fontface = "bold") +
  labs(title = "Distribution of Penguin Bill Length by Species",
       x = "Bill Length (mm)",
       y = "Density",
       fill = "Species",
       color = "Species") +
  theme_minimal() +
  theme(legend.position = "bottom")
1. Load the tidyverse, which provides ggplot2, dplyr, tidyr, and forcats.
2. Load the Palmer Penguins dataset: real measurements from penguins in the Palmer Archipelago, Antarctica.
3. Keep only the two species we want to compare; drop all others.
4. Remove rows with missing bill length so the density curve and the group means are computed on complete data.
5. Compute the per-species mean inside a mutate() so it can be referenced by geom_vline() later.
6. Map bill_length_mm to the x-axis and species to fill colour, giving one density curve per species.
7. geom_density() draws a smooth kernel density estimate; alpha = 0.6 lets both curves show through where they overlap.
8. geom_vline() draws a dashed vertical line at each species mean; color = species uses the same default palette as fill, so each line matches its curve.
9. Hard-coded “Mean” labels placed near the dashed lines annotate each mean without adding a legend entry.
1. Same filter as the plot: keeps only the two species of interest.
2. Drop missing values before passing the data to t.test(). The function would omit NA values on its own, but removing them explicitly makes the sample being tested unambiguous.
3. Formula method: outcome ~ group splits bill_length_mm by species; data = . pipes the filtered tibble in; the call defaults to Welch’s t-test (var.equal = FALSE).
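The call these notes describe is not shown here. A reconstruction that follows the three steps, and that should reproduce the output below, might look like this; the object name `bill_test` is an assumption added for illustration.

```r
library(tidyverse)
library(palmerpenguins)

# Reconstructed from the step descriptions; `bill_test` is a hypothetical name
bill_test <- penguins %>%
  filter(species %in% c("Adelie", "Chinstrap")) %>%  # keep the two species
  drop_na(bill_length_mm) %>%                        # remove missing bill lengths
  t.test(bill_length_mm ~ species, data = .)         # Welch test by default

bill_test
```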
Welch Two Sample t-test
data: bill_length_mm by species
t = -21.865, df = 106.97, p-value < 2.2e-16
alternative hypothesis: true difference in means between group Adelie and group Chinstrap is not equal to 0
95 percent confidence interval:
-10.952948 -9.131917
sample estimates:
mean in group Adelie mean in group Chinstrap
38.79139 48.83382
2.3 Interpreting the Results
The output shows several key pieces of information:
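t-statistic: The difference between the two sample means divided by its standard error. Larger absolute values indicate stronger evidence that the group means differ.
p-value: The probability of observing a difference at least this large if the true means were equal. If it is below the chosen significance level (conventionally 0.05), we reject the null hypothesis that the means are equal.
95% confidence interval: Range where we’re 95% confident the true difference in means lies. An interval that excludes 0 agrees with a two-sided p-value below 0.05.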
Sample means: The average bill length for each species.
For our penguins: The p-value (< 2.2e-16) is far below 0.05, so we conclude that Adelie and Chinstrap penguins have significantly different mean bill lengths. The confidence interval puts Adelie bills roughly 9.1 to 11.0 mm shorter than Chinstrap bills on average (sample means 38.8 mm vs. 48.8 mm). The density plot shows the same picture: the two distributions are well separated, with the means falling in distinctly different locations.
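To see where these numbers come from, the Welch statistic, degrees of freedom, p-value, and confidence interval can be computed by hand and compared with t.test(). This is a sketch on simulated data; the vectors `a` and `b` are hypothetical stand-ins, not the penguin measurements.

```r
set.seed(2024)  # reproducible simulated data (hypothetical values)
a <- rnorm(40, mean = 39, sd = 2.7)
b <- rnorm(35, mean = 49, sd = 3.3)

# Per-group variance of the mean, and the standard error of the difference
va <- var(a) / length(a)
vb <- var(b) / length(b)
se <- sqrt(va + vb)

# Welch t-statistic: difference in means in units of its standard error
t_manual <- (mean(a) - mean(b)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_manual <- (va + vb)^2 /
  (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_manual  <- 2 * pt(-abs(t_manual), df = df_manual)
ci_manual <- (mean(a) - mean(b)) + c(-1, 1) * qt(0.975, df_manual) * se

# Compare with the built-in function
res <- t.test(a, b)
all.equal(t_manual, unname(res$statistic))
all.equal(df_manual, unname(res$parameter))
all.equal(p_manual, res$p.value)
all.equal(ci_manual, as.numeric(res$conf.int))
```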
3 Paired T-Test
When to use it: Use a paired t-test when your data points are naturally paired - either the same subjects measured twice, or matched subjects measured once each.
Example scenario: Let’s examine the classic sleep dataset (built into R), which shows the effect of two different sleep drugs on 10 patients. Each patient received both drugs at different times, making this a perfect paired scenario.
Paired t-test using the vector method: x and y must be in the same row order (one row per patient). Setting paired = TRUE computes within-person differences before testing, which makes the test more powerful by removing between-person variability.
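The code for this step is not shown here. A base-R sketch that builds a wide table with the column names appearing in the output (`drug_1`, `drug_2`) and runs the test could look like the following; how `sleep_wide` was originally constructed is an assumption.

```r
# sleep has one row per patient per drug: extra (hours), group (1 or 2), ID
is_drug_1 <- sleep$group == "1"
is_drug_2 <- sleep$group == "2"

# Check that both halves list the patients in the same order before pairing
stopifnot(identical(as.character(sleep$ID[is_drug_1]),
                    as.character(sleep$ID[is_drug_2])))

# One row per patient, one column per drug (assumed reconstruction)
sleep_wide <- data.frame(
  ID     = sleep$ID[is_drug_1],
  drug_1 = sleep$extra[is_drug_1],
  drug_2 = sleep$extra[is_drug_2]
)

# Vector method with paired = TRUE
paired_test <- t.test(sleep_wide$drug_1, sleep_wide$drug_2, paired = TRUE)
paired_test
```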
Paired t-test
data: sleep_wide$drug_1 and sleep_wide$drug_2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean difference
-1.58
3.4 Interpreting the Results
The paired t-test output is similar to the independent t-test but with important differences:
t-statistic: Tests if the mean difference between paired observations is zero.
p-value: If less than 0.05, there’s a significant difference between the two conditions.
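95% confidence interval: Range for the true mean of the within-pair differences. Numerically this mean equals the difference between the two condition means, but its standard error is computed from the differences themselves, which is what distinguishes the paired test.
Mean of differences: Average of the first condition minus the second (here drug_1 minus drug_2), so a negative value means the second condition had the higher values on average.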
For our sleep data: We’re testing whether the two drugs produce different amounts of extra sleep in the same individuals. With p = 0.0028 we reject the null hypothesis: on average, patients gained 1.58 fewer hours of extra sleep on drug 1 than on drug 2 (95% CI: 0.70 to 2.46 hours fewer). The paired design is more powerful because it accounts for person-to-person variability: some people naturally sleep more than others, but we’re interested in whether each person sleeps differently on the two drugs. The density plot helps us visualize the overall distributions, though remember that the paired test is actually looking at the differences within each person, not just the overall distributions.
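The point that the paired test works on within-person differences can be checked directly: a paired t-test gives the same result as a one-sample t-test on the differences. A minimal sketch with the built-in sleep data (the variable names are chosen for illustration):

```r
drug_1 <- sleep$extra[sleep$group == "1"]
drug_2 <- sleep$extra[sleep$group == "2"]

# Within-person differences, in the same direction the paired test uses
diffs <- drug_1 - drug_2

# Paired test on the two vectors vs. one-sample test on the differences
paired_res     <- t.test(drug_1, drug_2, paired = TRUE)
one_sample_res <- t.test(diffs, mu = 0)

# The same statistic computed by hand: mean difference over its standard error
t_manual <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

all.equal(unname(paired_res$statistic), unname(one_sample_res$statistic))
all.equal(unname(paired_res$statistic), t_manual)
all.equal(paired_res$p.value, one_sample_res$p.value)
```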
4 Key Takeaway
Independent vs. Paired - Choose wisely!
Different subjects in each group → Independent t-test
Same subjects measured twice OR matched pairs → Paired t-test
Important
Using the wrong test can lead to incorrect conclusions or reduced statistical power!
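The cost of choosing the wrong test shows up when the sleep data are analyzed both ways. The sketch below runs an independent (Welch) test that ignores the pairing and then the paired test on the same two vectors; the variable names are chosen for illustration.

```r
drug_1 <- sleep$extra[sleep$group == "1"]
drug_2 <- sleep$extra[sleep$group == "2"]

# Wrong for this design: treats the two columns as unrelated groups
welch_res <- t.test(drug_1, drug_2)

# Right for this design: uses the within-patient differences
paired_res <- t.test(drug_1, drug_2, paired = TRUE)

# Same estimated difference, very different strength of evidence
c(welch = welch_res$p.value, paired = paired_res$p.value)
```

Both tests estimate the same mean difference, but the unpaired p-value lands above 0.05 while the paired one is about 0.003, because between-patient variability inflates the standard error when the pairing is ignored.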