This power analysis determines the sample size needed for our experiment examining how background music affects reading comprehension performance.
Research Question: Does listening to background music impair reading comprehension compared to silence?
Hypothesis: We hypothesize that background music will reduce reading comprehension scores compared to silence, based on the irrelevant sound effect documented in cognitive psychology literature.
We use simulation-based methods to estimate statistical power under three different effect size scenarios, informed by published research. This allows us to determine an appropriate sample size before data collection begins.
Before conducting simulations, we establish our key statistical assumptions. We target 80% statistical power at the conventional significance level (α = 0.05), which caps the false-positive (Type I error) rate at 5% and the false-negative (Type II error) rate at 20% when the assumed effect is true. Based on existing literature, we anticipate a moderate effect size (Cohen’s d ≈ 0.50), which initially suggests a sample size of approximately 34 participants. To account for uncertainty in this estimate, our power analysis also evaluates smaller (d = 0.30) and larger (d = 0.80) effect sizes across simulated sample sizes from 20 to 80 participants, both to verify the estimate and to show how power depends on sample size and effect size.
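The initial figure of about 34 participants can be reproduced analytically. The following is a minimal sketch, assuming that d = 0.50 is standardized by the SD of the paired differences and using base R's `power.t.test`; it is a cross-check, not necessarily how the figure above was originally obtained.

```r
# Analytic sample size for a two-sided paired t-test (stats package, base R).
# Assumption: the effect size d = 0.50 is expressed in units of the
# standard deviation of the paired differences.
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
                    power = 0.80, type = "paired",
                    alternative = "two.sided")

# Round up to a whole number of participants
n_required <- ceiling(res$n)
n_required
```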
Our within-subjects design provides important statistical advantages:
We will conduct a within-subjects counterbalanced experiment with two conditions:
Key Design Features:
Reading comprehension will be assessed using multiple-choice questions about passage content.
Primary analysis: Paired t-test comparing silence vs. music scores across all participants
Secondary analyses (exploratory):
To determine realistic effect sizes, we reviewed recent empirical studies and meta-analyses on background music and reading comprehension.
de la Mora Velasco et al. (2023) conducted a systematic review and meta-analysis of background music effects on learning, examining 148 studies total. For reading tasks specifically, they analyzed 14 studies and found mixed results:
Anderson & Fuller (2010) - “Effect of Music on Reading Comprehension of Junior High School Students”:
Perham & Currie (2014) - “Does listening to preferred music improve reading comprehension performance?”
Sun et al. (2024) - Effects of music with lyrics on reading
Thompson et al. (2012) - Background music and reading
Souza & Barbosa (2023) - Music with lyrics and cognitive tasks
Based on this literature, we establish three effect size scenarios:
Within-subject SD = 8%: Trial-to-trial variability for the same person, based on test-retest reliability r ≈ 0.70-0.75 for reading assessments.
Between-subject SD = 12%: Individual differences in baseline reading ability, typical for college student samples on standardized reading tests.
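These two variance components determine both the correlation between a participant's two scores and the standardized effect that a paired t-test responds to. A minimal sketch, assuming the additive model used in the simulation (score = condition mean + person effect + independent noise) and raw differences of 3, 5, and 8 points for the three scenarios:

```r
# Variance components from the text
sd_within  <- 8   # trial-to-trial SD for the same person
sd_between <- 12  # SD of individual baseline ability

# Correlation between a person's silence and music scores under the
# additive model: shared variance divided by total variance
rho <- sd_between^2 / (sd_between^2 + sd_within^2)

# SD of the paired differences: the person effect cancels, leaving
# two independent noise terms
sd_diff <- sqrt(2) * sd_within

# Standardized effect of the paired differences (d_z) per scenario
raw_diff <- c(Conservative = 3, Moderate = 5, Optimistic = 8)
d_z <- raw_diff / sd_diff

round(rho, 3)
round(d_z, 2)
```

Under these assumptions the implied correlation is about 0.69, in line with the stated reliability, while the paired-design effect sizes d_z come out somewhat below the nominal d labels of 0.30, 0.50, and 0.80. This is worth keeping in mind when comparing simulated power with textbook values for those labels.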
For each sample size (N = 20, 25, 30, …, 80) and scenario, we simulate 1,000 paired t-tests and record power as the proportion where p < 0.05.
# Function to simulate power for paired t-test (within-subjects design)
simulate_power_paired <- function(n_subjects,
                                  mean_silence,
                                  mean_music,
                                  sd_within = 8,
                                  sd_between = 12,
                                  n_sims = 1000,
                                  alpha = 0.05) {
  # Store p-values from each simulation
  p_values <- numeric(n_sims)

  # Run simulations
  for (i in seq_len(n_sims)) {
    # Generate subject-specific baseline reading ability
    # This captures individual differences (some people are better readers)
    baseline_ability <- rnorm(n_subjects, mean = 0, sd = sd_between)

    # Generate silence condition scores
    # baseline_ability creates correlation between conditions for same person
    silence_scores <- mean_silence + baseline_ability +
      rnorm(n_subjects, mean = 0, sd = sd_within)

    # Generate music condition scores
    # Same baseline_ability ensures data from same person is correlated
    music_scores <- mean_music + baseline_ability +
      rnorm(n_subjects, mean = 0, sd = sd_within)

    # Conduct paired t-test (appropriate for within-subjects design)
    test_result <- t.test(silence_scores, music_scores, paired = TRUE)

    # Store p-value
    p_values[i] <- test_result$p.value
  }

  # Calculate power: proportion of simulations where p < alpha
  power <- mean(p_values < alpha)
  return(power)
}
# Packages used below: dplyr for %>% and the data verbs, knitr for kable()
library(dplyr)
library(knitr)

# Define range of sample sizes to test
sample_sizes <- seq(20, 80, by = 5)

# Create data frame to store results
power_results <- expand.grid(
  n = sample_sizes,
  scenario = c("Conservative", "Moderate", "Optimistic"),
  stringsAsFactors = FALSE
) %>%
  mutate(power = NA_real_)
# Run simulations for each combination
cat("Running power simulations across",
    nrow(power_results), "combinations...\n")
## Running power simulations across 39 combinations...
cat("Each combination runs 1,000 simulated experiments.\n")
## Each combination runs 1,000 simulated experiments.
cat("This may take 1-2 minutes.\n\n")
## This may take 1-2 minutes.
for (i in seq_len(nrow(power_results))) {
  n <- power_results$n[i]
  scenario <- power_results$scenario[i]

  # Set parameters based on scenario
  if (scenario == "Conservative") {
    # Small effect: 3% decrease (d ≈ 0.30)
    power_results$power[i] <- simulate_power_paired(
      n_subjects = n,
      mean_silence = 75,
      mean_music = 72,
      n_sims = 1000
    )
  } else if (scenario == "Moderate") {
    # Moderate effect: 5% decrease (d ≈ 0.50)
    power_results$power[i] <- simulate_power_paired(
      n_subjects = n,
      mean_silence = 75,
      mean_music = 70,
      n_sims = 1000
    )
  } else { # Optimistic
    # Large effect: 8% decrease (d ≈ 0.80)
    power_results$power[i] <- simulate_power_paired(
      n_subjects = n,
      mean_silence = 75,
      mean_music = 67,
      n_sims = 1000
    )
  }
}
cat("Simulations complete!\n")
## Simulations complete!
N | Conservative (d≈0.3) | Moderate (d≈0.5) | Optimistic (d≈0.8) |
---|---|---|---|
25 | 0.238 | 0.561 | 0.934 |
30 | 0.303 | 0.646 | 0.961 |
35 | 0.339 | 0.701 | 0.979 |
40 | 0.397 | 0.776 | 0.989 |
45 | 0.412 | 0.828 | 0.997 |
50 | 0.433 | 0.874 | 0.999 |
60 | 0.518 | 0.918 | 0.999 |
70 | 0.590 | 0.951 | 1.000 |
Interpretation: Each cell shows the probability of detecting the effect (p < 0.05) if the true effect size matches that scenario. For example, with N=40 under the moderate scenario, we have approximately a 78% chance of finding a significant result, just short of the 80% target.
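The simulated values for the moderate scenario can be cross-checked against an analytic calculation. A minimal sketch, assuming the simulation's parameters (a 5-point mean difference and within-subject SD of 8, so the paired differences have SD √2 × 8) and using base R's `power.t.test`; the Monte Carlo standard error shown is for a true power near 0.80 with 1,000 simulations:

```r
# SD of the paired differences implied by the simulation model
sd_diff <- sqrt(2) * 8

# Analytic power of a two-sided paired t-test at N = 40 and N = 45
n_check <- c(40, 45)
analytic_power <- sapply(n_check, function(n) {
  power.t.test(n = n, delta = 5, sd = sd_diff, sig.level = 0.05,
               type = "paired")$power
})
names(analytic_power) <- n_check

# Monte Carlo standard error of a simulated power estimate:
# a binomial proportion based on 1,000 simulated experiments
mc_se <- sqrt(0.80 * 0.20 / 1000)

round(analytic_power, 3)
round(mc_se, 3)
```

With a standard error of roughly 0.013 per cell, differences of one or two percentage points between neighboring cells in the table are within simulation noise; increasing `n_sims` would tighten the estimates.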
# Find minimum N needed for 80% power in each scenario
recommendations <- power_results %>%
  group_by(scenario) %>%
  filter(power >= 0.80) %>%
  slice_min(n, n = 1) %>%
  select(scenario, n, power) %>%
  ungroup()

kable(recommendations,
      digits = 3,
      caption = "Minimum sample size to achieve 80% power (with actual achieved power)",
      col.names = c("Scenario", "Minimum N", "Achieved Power"))
Scenario | Minimum N | Achieved Power |
---|---|---|
Moderate | 45 | 0.828 |
Optimistic | 20 | 0.834 |
# Extract recommended N for moderate scenario
recommended_n <- recommendations %>%
  filter(scenario == "Moderate") %>%
  pull(n)

# Get power values for recommended N across scenarios
power_at_recommended <- power_results %>%
  filter(n == recommended_n) %>%
  select(scenario, power)
We recommend recruiting N = 45 participants for our study.
This recommendation is based on achieving 80% power under our moderate effect size scenario (d ≈ 0.50, representing a 5 percentage point decrease in reading comprehension), which we believe best represents the literature on background music and reading comprehension.
1. Adequate power under realistic assumptions
With 45 participants, we achieve approximately 83% power under our moderate effect size scenario. This meets the conventional 80% power threshold, meaning we have an 83% probability of detecting the effect if it exists.
2. Protection against smaller effects
Even if the true effect is smaller than we expect (conservative scenario, d ≈ 0.30), we retain about 41% power. This is well below our 80% target, so a nonsignificant result would not rule out a small effect; we accept this limitation given resource constraints.
3. High confidence with larger effects
If the true effect is large (optimistic scenario, d ≈ 0.80), we achieve nearly 100% power (99.7% in our simulations), virtually guaranteeing we will detect it.
4. Efficient within-subjects design
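Our within-subjects design is highly efficient. An equivalent between-subjects design (different participants in silence vs. music conditions) cannot cancel out individual differences in reading ability, so each score carries the full variability (SD = √(12² + 8²) ≈ 14.4 points) rather than only the within-subject noise. Under the variance components assumed in our simulation, such a design would need roughly 130 to 150 participants per group for comparable power. By using each participant as their own control, we reduce the total number of participants required several-fold.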
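The efficiency comparison can be checked analytically. A minimal sketch, assuming the simulation's variance components (within-subject SD = 8, between-subject SD = 12), the moderate scenario's 5-point difference, and 80% power, using base R's `power.t.test`:

```r
sd_within  <- 8
sd_between <- 12
delta      <- 5   # moderate scenario: 5-point mean difference

# Within-subjects design: the person effect cancels in the difference,
# so the relevant SD is that of the paired differences
paired <- power.t.test(delta = delta, sd = sqrt(2) * sd_within,
                       power = 0.80, type = "paired")

# Between-subjects design: each score carries both variance components
between <- power.t.test(delta = delta,
                        sd = sqrt(sd_between^2 + sd_within^2),
                        power = 0.80, type = "two.sample")

n_paired            <- ceiling(paired$n)   # total participants
n_between_per_group <- ceiling(between$n)  # participants per group

c(paired_total = n_paired, between_total = 2 * n_between_per_group)
```

The size of the saving depends directly on the assumed between-subject SD: the larger the individual differences relative to within-subject noise, the greater the advantage of the paired design.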
5. Counterbalancing feasibility
With 45 participants, the two order conditions cannot be exactly equal in size; we will have 22 or 23 participants in each order condition:
This is sufficient to check whether order effects are present and to ensure they don’t confound our main effect.
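One simple way to assign presentation orders is to build a near-balanced list and shuffle it. A minimal sketch, in which the order labels `silence_first` and `music_first` and the seed are hypothetical choices for illustration only:

```r
set.seed(2024)  # arbitrary seed, used only to make the illustration reproducible

n_participants <- 45
orders <- c("silence_first", "music_first")  # hypothetical labels

# Repeat the two orders until there is one per participant, then shuffle.
# With an odd N, one order receives one extra participant.
assignment <- sample(rep(orders, length.out = n_participants))

table(assignment)
```

Recruiting an even number of participants (for example 46) would allow exactly equal order groups, if that is preferred.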
6. Subgroup analysis potential
This sample size provides reasonable power to explore heterogeneous treatment effects. If we have approximately equal numbers of habitual music listeners and non-listeners, we would have ~22 participants per group. While underpowered for formal interaction tests, we can descriptively examine whether effects differ between groups (as suggested by Sun et al., 2024).
7. Practical feasibility
45 participants is:
With N = 45 participants, our power across scenarios is:
power_summary <- power_at_recommended %>%
  mutate(
    power_pct = sprintf("%.1f%%", power * 100),
    interpretation = case_when(
      power >= 0.90 ~ "Excellent - Very high probability of detection",
      power >= 0.80 ~ "Good - Adequate for reliable inference",
      power >= 0.70 ~ "Acceptable - Reasonable but not ideal",
      power >= 0.60 ~ "Marginal - Higher risk of false negative",
      TRUE ~ "Underpowered - Substantial risk of false negative"
    )
  ) %>%
  select(scenario, power_pct, interpretation)

kable(power_summary,
      col.names = c("Scenario", "Power", "Assessment"),
      caption = paste("Power analysis summary for N =", recommended_n))
Scenario | Power | Assessment |
---|---|---|
Conservative | 41.2% | Underpowered - Substantial risk of false negative |
Moderate | 82.8% | Good - Adequate for reliable inference |
Optimistic | 99.7% | Excellent - Very high probability of detection |
Bottom line: With 45 participants, we are well-powered to detect moderate-to-large effects but underpowered (about 41%) for small effects.
# 8. Conclusion and Next Steps
Based on simulation-based power analysis informed by published research on background music and reading comprehension:
Primary Recommendation: N = 45 participants
This sample size provides:
This sample size is feasible, well-powered for realistic effects, and consistent with similar within-subjects studies in the literature.
Anderson, S. A., & Fuller, G. B. (2010). Effect of music on reading comprehension of junior high school students. School Psychology Quarterly, 25(3), 178–187. https://doi.org/10.1037/a0021213
Kämpfe, J., Sedlmeier, P., & Renkewitz, F. (2011). The impact of background music on adult listeners: A meta-analysis. Psychology of Music, 39(4), 424–448. https://doi.org/10.1177/0305735610376261
Perham, N., & Currie, S. (2014). Does listening to preferred music improve reading comprehension performance? Applied Cognitive Psychology, 28(6), 924–930. https://doi.org/10.1002/acp.2994
Souza, B. C., & Barbosa, M. T. (2023). Should we turn off the music? Music with lyrics interferes with cognitive tasks. Journal of Cognition, 6(1), 52. https://doi.org/10.5334/joc.273
Sun, Y., Sun, C., Li, C., Shao, X., Liu, Q., & Liu, H. (2024). Impact of background music on reading comprehension: Influence of lyrics language and study habits. Frontiers in Psychology, 15, 1363562. https://doi.org/10.3389/fpsyg.2024.1363562
Thompson, W. F., Schellenberg, E. G., & Letnic, A. K. (2012). Fast and loud background music disrupts reading comprehension. Psychology of Music, 40(6), 700–708. https://doi.org/10.1177/0305735611400173