1. Introduction and Research Question

This power analysis determines the sample size needed for our experiment examining how background music affects reading comprehension performance.

Research Question: Does listening to background music impair reading comprehension compared to silence?

Hypothesis: We hypothesize that background music will reduce reading comprehension scores compared to silence, based on the irrelevant sound effect documented in cognitive psychology literature.

We use simulation-based methods to estimate statistical power under three different effect size scenarios, informed by published research. This allows us to determine an appropriate sample size before data collection begins.

2. Power Analysis Assumptions

Before conducting simulations, we establish our key statistical assumptions. We target 80% statistical power at the conventional significance level (α = 0.05), which caps the Type I error (false positive) rate at 5% and the Type II error (false negative) rate at 20%. Based on existing literature, we anticipate a moderate effect size (Cohen’s d ≈ 0.50); for a two-tailed paired t-test, this initially suggests a sample size of approximately 34 participants.

To account for uncertainty in that estimate, our power analysis also evaluates smaller (d = 0.30) and larger (d = 0.80) effect sizes across simulated sample sizes from 20 to 80 participants. This lets us verify the initial estimate and see how sample size, effect size, and statistical power relate.

2.1 Design Efficiency

Our within-subjects design provides important statistical advantages:

Each participant serves as their own control
This reduces error variance from individual differences
Consequently, we need approximately 50% fewer participants than an equivalent between-subjects design
For comparison: A between-subjects design with d = 0.50 would require N ≈ 128 total participants (64 per group)
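
As a quick analytic cross-check on the sample sizes quoted in this section, base R's `power.t.test()` can solve for N directly. The sketch below assumes the effect size is expressed in standardized units (so `sd = 1`) and that, in the within-subjects case, d = 0.50 is defined relative to the SD of the paired differences. It is separate from the simulation used later in this report.

```r
# Analytic sample sizes for 80% power at alpha = 0.05 (two-sided).
# Assumption: effect sizes are standardized, so sd = 1 and delta = d.
d <- 0.50

# Within-subjects (paired) design: d is relative to the SD of the differences
n_paired <- power.t.test(delta = d, sd = 1, power = 0.80,
                         sig.level = 0.05, type = "paired")$n

# Between-subjects (two-sample) design: n is the size of EACH group
n_between <- power.t.test(delta = d, sd = 1, power = 0.80,
                          sig.level = 0.05, type = "two.sample")$n

cat("Paired design, N:", ceiling(n_paired), "\n")
cat("Between-subjects design, n per group:", ceiling(n_between),
    "(total", 2 * ceiling(n_between), ")\n")
```

Rounded up, these should match the figures of about 34 participants for the paired design and 64 per group (128 total) for the between-subjects design.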

3. Experimental Design

3.1 Design Overview

We will conduct a within-subjects counterbalanced experiment with two conditions:

Silence condition: Participants read a passage in complete silence (control)
Music condition: Participants read a passage while background music plays (treatment)

Key Design Features:

Within-subjects: Each participant experiences both conditions, controlling for individual differences in reading ability
Counterbalanced order:
- Group A: Silence → Music
- Group B: Music → Silence
- This controls for practice effects (improvement) and fatigue effects (decline)
- Groups will be randomly assigned
Different passages: Each condition uses a different (but matched) reading passage to avoid memory effects
Moderator variable: Before the experiment, we will ask participants: “Do you regularly listen to music while studying or working?” (Yes/No)
- This allows exploratory analysis of whether effects differ by listening habits
- Based on Sun et al. (2024), habitual listeners may show smaller impairment
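
The random assignment to order groups could be implemented along the following lines. This is a minimal sketch: `assign_order()` is a hypothetical helper, the sample size of 30 is only illustrative, and the seed is arbitrary.

```r
# Hypothetical helper: balanced random assignment to counterbalancing order.
assign_order <- function(n_participants) {
  # Alternate the two labels so group sizes differ by at most one, then shuffle
  labels <- rep(c("A: Silence -> Music", "B: Music -> Silence"),
                length.out = n_participants)
  data.frame(
    participant = seq_len(n_participants),
    order_group = sample(labels)  # random permutation of the labels
  )
}

set.seed(2024)  # arbitrary seed, only to make the illustration reproducible
assignment <- assign_order(30)  # 30 is an illustrative sample size
table(assignment$order_group)
```

Shuffling a pre-balanced label vector keeps the two order groups equal in size (or within one participant of each other when N is odd), which simple coin-flip assignment does not guarantee.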

3.2 Outcome Measure

Reading comprehension will be assessed using multiple-choice questions about passage content.

Scores calculated as percentage correct (0-100% scale)
Approximately 8-10 questions per passage
Passages matched for difficulty, length, and topic familiarity

3.3 Statistical Analysis Plan

Primary analysis: Paired t-test comparing silence vs. music scores across all participants

Null hypothesis: μ_silence - μ_music = 0
Alternative hypothesis: μ_silence - μ_music ≠ 0 (two-tailed)
Significance threshold: α = 0.05

Secondary analyses (exploratory):

Check for order effects (Group A vs. Group B)
Explore heterogeneous treatment effects by music listening habits
Calculate effect size (Cohen’s d) with 95% confidence interval
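
The analysis plan above might look roughly like this in R. The data are simulated placeholders, not collected results. The effect size shown is d_z (mean difference divided by the SD of the differences), and the percentile bootstrap is only one of several ways to obtain a confidence interval for it. Both are assumptions of this sketch rather than choices stated in the plan.

```r
# Illustrative data only: scores are simulated, not collected results.
set.seed(1)
n <- 30
ability <- rnorm(n, mean = 0, sd = 12)        # shared per-person baseline
silence <- 75 + ability + rnorm(n, sd = 8)
music   <- 70 + ability + rnorm(n, sd = 8)

# Primary analysis: two-tailed paired t-test
primary <- t.test(silence, music, paired = TRUE)

# Effect size for paired data: mean difference / SD of the differences (d_z)
diffs <- silence - music
d_z <- mean(diffs) / sd(diffs)

# Percentile bootstrap 95% CI for d_z (resampling participants)
boot_d <- replicate(2000, {
  resampled <- sample(diffs, replace = TRUE)
  mean(resampled) / sd(resampled)
})
ci_d <- quantile(boot_d, probs = c(0.025, 0.975))

# Exploratory order-effect check: compare difference scores between groups
order_group <- rep(c("A", "B"), length.out = n)  # hypothetical assignment
order_check <- t.test(diffs ~ order_group)

cat("d_z =", round(d_z, 2), " 95% CI:", round(ci_d, 2), "\n")
```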

4. Literature Review and Effect Size Justification

To determine realistic effect sizes, we reviewed recent empirical studies and meta-analyses on background music and reading comprehension.

4.1 Meta-Analysis Evidence

de la Mora Velasco et al. (2023) conducted a systematic review and meta-analysis of background music effects on learning, examining 148 studies total. For reading tasks specifically, they analyzed 14 studies and found mixed results:

Five studies showed negative effects of music
Nine studies showed positive or null effects
Overall effect was small and variable (d ≈ 0.2-0.4)
Key finding: Effects depend heavily on music characteristics (tempo, lyrics, volume)

4.2 Empirical Studies with Within-Subjects Designs

Anderson & Fuller (2010) - “Effect of Music on Reading Comprehension of Junior High School Students”:

Design: Within-subjects, N=334 junior high students
Finding: Reading comprehension significantly declined with background pop music (Billboard top hits)
Effect size: Cohen’s d ≈ 0.40 (small-to-moderate effect)
Key detail: 74.5% of students performed worse with music; a preference for music did not protect against distraction

Perham & Currie (2014) - “Does listening to preferred music improve reading comprehension performance?”

Design: Within-subjects, multiple conditions
Finding: Music with lyrics significantly impaired reading comprehension
Effect size: Cohen’s d ≈ 0.4-0.5 (moderate effect)

Sun et al. (2024) - Effects of music with lyrics on reading

Design: Multiple studies, various languages
Finding: 4-7 percentage point decrease in comprehension with lyrical music
Effect size: d ≈ 0.4-0.6
Key insight: Habitual music listeners showed smaller impairment

Thompson et al. (2012) - Background music and reading

Design: Within-subjects, N=25, fast/loud music
Finding: Performance dropped from ~58% to ~38% (20 percentage points)
Effect size: d > 0.80 (large effect)

Souza & Barbosa (2023) - Music with lyrics and cognitive tasks

Design: Within-subjects, N=100
Finding: Lyrics interfered with verbal processing
Effect size: Hedges’s g = -0.36 to -0.86 depending on task complexity

4.3 Summary and Scenario Justification

Based on this literature, we establish three effect size scenarios:

Scenario 1: Conservative (Small Effect)

Effect: 3 percentage points decrease (75% → 72%)
Cohen’s d: ≈ 0.25-0.30
Justification: Lower bound from meta-analysis; may occur if our music is relatively non-intrusive or if many participants are habitual music listeners
Probability: ~30% chance true effect is this small or smaller

Scenario 2: Moderate (Expected Effect)

Effect: 5 percentage points decrease (75% → 70%)
Cohen’s d: ≈ 0.40-0.50
Justification: Median effect from Sun et al. (2024); most consistent with literature
Probability: ~50% chance true effect is approximately this size
This is our primary planning scenario

Scenario 3: Optimistic (Large Effect)

Effect: 8 percentage points decrease (75% → 67%)
Cohen’s d: ≈ 0.65-0.80
Justification: Upper range from Souza & Barbosa (2023); may occur if our music choice is particularly disruptive (e.g., fast tempo, high volume, prominent lyrics)
Probability: ~20% chance true effect is this large or larger

4.4 Variability Parameters

Within-subject SD = 8%: Trial-to-trial variability for the same person, based on test-retest reliability r ≈ 0.70-0.75 for reading assessments.

Between-subject SD = 12%: Individual differences in baseline reading ability, typical for college student samples on standardized reading tests.
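
To see what these two SDs imply, the short calculation below derives the correlation between a participant's two scores and the standardized effect on the paired differences (d_z) for each scenario. It assumes the additive model used in the simulation code (score = condition mean + person baseline + independent noise). The variable names are illustrative.

```r
sd_within  <- 8
sd_between <- 12
effects    <- c(Conservative = 3, Moderate = 5, Optimistic = 8)  # points

# SD of a single score and of a paired difference under the additive model
sd_total <- sqrt(sd_between^2 + sd_within^2)
sd_diff  <- sqrt(2) * sd_within   # the person baseline cancels in differences

# Implied correlation between one participant's silence and music scores
r_implied <- sd_between^2 / sd_total^2

# Standardized effect on the paired differences
d_z <- effects / sd_diff

round(c(sd_total = sd_total, sd_diff = sd_diff, r = r_implied), 3)
round(d_z, 2)
```

Under these assumptions the implied correlation is about 0.69, close to the test-retest reliability quoted above, and the d_z values (roughly 0.27, 0.44, and 0.71) fall inside the ranges given for the three scenarios.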

5. Simulation-Based Power Analysis

5.1 Simulation Method

For each sample size (N = 20, 25, 30, …, 80) and scenario, we simulate 1,000 paired t-tests and record power as the proportion where p < 0.05.
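
Because each power value is a proportion estimated from 1,000 simulated experiments, it carries Monte Carlo error. The sketch below uses the usual binomial standard error to gauge its size. The helper name `mc_se()` is hypothetical.

```r
# Monte Carlo standard error of a simulated power estimate (binomial proportion)
mc_se <- function(power, n_sims) sqrt(power * (1 - power) / n_sims)

se_1000 <- mc_se(power = 0.80, n_sims = 1000)
round(se_1000, 4)

# Approximate 95% Monte Carlo interval around an estimate of 0.80
round(0.80 + c(-1.96, 1.96) * se_1000, 3)
```

With 1,000 simulations, an estimate near 0.80 is uncertain by roughly ±0.025, so small differences between neighbouring sample sizes should be read with that margin in mind.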

5.2 Simulation Code

# Function to simulate power for paired t-test (within-subjects design)
simulate_power_paired <- function(n_subjects,
                                  mean_silence,
                                  mean_music,
                                  sd_within = 8,
                                  sd_between = 12,
                                  n_sims = 1000,
                                  alpha = 0.05) {

  # Store p-values from each simulation
  p_values <- numeric(n_sims)

  # Run simulations
  for(i in 1:n_sims) {
    # Generate subject-specific baseline reading ability
    # This captures individual differences (some people are better readers)
    baseline_ability <- rnorm(n_subjects, mean = 0, sd = sd_between)

    # Generate silence condition scores
    # baseline_ability creates correlation between conditions for same person
    silence_scores <- mean_silence + baseline_ability +
                      rnorm(n_subjects, mean = 0, sd = sd_within)

    # Generate music condition scores
    # Same baseline_ability ensures data from same person is correlated
    music_scores <- mean_music + baseline_ability +
                    rnorm(n_subjects, mean = 0, sd = sd_within)

    # Conduct paired t-test (appropriate for within-subjects design)
    test_result <- t.test(silence_scores, music_scores, paired = TRUE)

    # Store p-value
    p_values[i] <- test_result$p.value
  }

  # Calculate power: proportion of simulations where p < alpha
  power <- mean(p_values < alpha)

  return(power)
}
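
One property of this data-generating model is worth making explicit: `baseline_ability` is added to both conditions, so it cancels in each participant's difference score and `sd_between` does not affect the power of the paired test. The sketch below, with the hypothetical name `simulate_power_diff()`, simulates the differences directly and should give power estimates close to those of `simulate_power_paired()`. It is an illustration of that point, not a replacement for the function above.

```r
# Sketch: simulate the paired differences directly (baseline ability cancels)
simulate_power_diff <- function(n_subjects, mean_diff, sd_within = 8,
                                n_sims = 1000, alpha = 0.05) {
  sd_diff <- sqrt(2) * sd_within   # SD of (silence - music) for one person
  p_values <- replicate(n_sims, {
    diffs <- rnorm(n_subjects, mean = mean_diff, sd = sd_diff)
    t.test(diffs, mu = 0)$p.value  # one-sample t-test on the differences
  })
  mean(p_values < alpha)           # proportion of significant results
}

set.seed(123)  # arbitrary seed for a reproducible illustration
power_45 <- simulate_power_diff(n_subjects = 45, mean_diff = 5, n_sims = 2000)
power_45
```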

5.3 Running Simulations

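library(dplyr)  # provides %>%, mutate(), and the other data verbs used below
library(knitr)  # provides kable() for the summary tables

# Define range of sample sizes to test
sample_sizes <- seq(20, 80, by = 5)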

# Create data frame to store results
power_results <- expand.grid(
  n = sample_sizes,
  scenario = c("Conservative", "Moderate", "Optimistic"),
  stringsAsFactors = FALSE
) %>%
  mutate(power = NA_real_)

# Run simulations for each combination
cat("Running power simulations across",
    nrow(power_results), "combinations...\n")

## Running power simulations across 39 combinations...

cat("Each combination runs 1,000 simulated experiments.\n")

## Each combination runs 1,000 simulated experiments.

cat("This may take 1-2 minutes.\n\n")

## This may take 1-2 minutes.

# Mean music-condition score for each scenario (silence mean is 75 throughout)
music_means <- c(Conservative = 72,  # small effect: 3-point decrease (d ≈ 0.30)
                 Moderate     = 70,  # moderate effect: 5-point decrease (d ≈ 0.50)
                 Optimistic   = 67)  # large effect: 8-point decrease (d ≈ 0.80)

# Run the simulation for every sample size / scenario combination
for (i in seq_len(nrow(power_results))) {
  power_results$power[i] <- simulate_power_paired(
    n_subjects   = power_results$n[i],
    mean_silence = 75,
    mean_music   = music_means[[power_results$scenario[i]]],
    n_sims       = 1000
  )
}

cat("Simulations complete!\n")

## Simulations complete!

6. Results

6.1 Power by Sample Size

Statistical Power by Sample Size and Effect Size Scenario
N	Conservative (d≈0.3)	Moderate (d≈0.5)	Optimistic (d≈0.8)
25	0.238	0.561	0.934
30	0.303	0.646	0.961
35	0.339	0.701	0.979
40	0.397	0.776	0.989
45	0.412	0.828	0.997
50	0.433	0.874	0.999
60	0.518	0.918	0.999
70	0.590	0.951	1.000

Interpretation: Each cell shows the probability of detecting the effect (p < 0.05) if the true effect size matches that scenario. For example, with N=40 under the moderate scenario, we have approximately a 78% chance of finding a significant result, just short of the 80% target; N=45 is the first sample size in the table to exceed it.
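
The simulated values can be sanity-checked against the analytic power of a paired t-test. This sketch assumes the simulation's parameters (within-subject SD of 8, so the differences have SD √2 × 8) and uses base R's `power.t.test()`, where for `type = "paired"` the `sd` argument is the SD of the within-pair differences. The helper `analytic_power()` is hypothetical.

```r
# Analytic power for the paired t-test under the simulation's parameters.
# For type = "paired", sd is the SD of the within-pair differences.
sd_diff <- sqrt(2) * 8

analytic_power <- function(n, mean_diff) {
  power.t.test(n = n, delta = mean_diff, sd = sd_diff,
               sig.level = 0.05, type = "paired")$power
}

# Moderate scenario (5-point decrease) at the two sample sizes near 80% power
check <- data.frame(
  n = c(40, 45),
  moderate = sapply(c(40, 45), analytic_power, mean_diff = 5)
)
print(check, digits = 3)
```

The analytic values should sit close to the simulated 0.776 and 0.828, within Monte Carlo error.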

6.2 Power Curves
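
A minimal base-R sketch for drawing power curves of this kind is given below. `plot_power_curves()` is a hypothetical helper that accepts any data frame with columns `n`, `scenario`, and `power` (the same shape as `power_results`). The `demo` data frame is filled with analytic power values as a stand-in so the block runs on its own; it does not contain the simulated results.

```r
# Hypothetical helper: draw power curves from a data frame with columns
# n, scenario, power (the same shape as power_results).
plot_power_curves <- function(results, target = 0.80) {
  scenarios <- unique(results$scenario)
  plot(NA, xlim = range(results$n), ylim = c(0, 1),
       xlab = "Sample size (N)", ylab = "Power",
       main = "Power by sample size and scenario")
  for (k in seq_along(scenarios)) {
    rows <- results[results$scenario == scenarios[k], ]
    lines(rows$n, rows$power, type = "b", lty = k, pch = k)
  }
  abline(h = target, col = "grey40", lty = 3)  # target power reference line
  legend("bottomright", legend = scenarios,
         lty = seq_along(scenarios), pch = seq_along(scenarios))
  invisible(results)
}

# Stand-in data: analytic power values, NOT the simulated results
demo <- expand.grid(n = seq(20, 80, by = 5),
                    scenario = c("Conservative", "Moderate", "Optimistic"),
                    stringsAsFactors = FALSE)
mean_diff <- c(Conservative = 3, Moderate = 5, Optimistic = 8)
demo$power <- mapply(function(n, s) {
  power.t.test(n = n, delta = mean_diff[[s]], sd = sqrt(2) * 8,
               sig.level = 0.05, type = "paired")$power
}, demo$n, demo$scenario)

# To draw the figure: plot_power_curves(demo) or plot_power_curves(power_results)
```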

7. Sample Size Recommendation

7.1 Minimum Sample Sizes for 80% Power

# Find minimum N needed for 80% power in each scenario
recommendations <- power_results %>%
  group_by(scenario) %>%
  filter(power >= 0.80) %>%
  slice_min(n, n = 1) %>%
  select(scenario, n, power) %>%
  ungroup()

kable(recommendations,
      digits = 3,
      caption = "Minimum sample size to achieve 80% power (with actual achieved power)",
      col.names = c("Scenario", "Minimum N", "Achieved Power"))

Minimum sample size to achieve 80% power (with actual achieved power)
Scenario	Minimum N	Achieved Power
Moderate	45	0.828
Optimistic	20	0.834
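
The conservative scenario does not appear in this table because none of the simulated sample sizes (20 to 80) reached 80% power for it. As a rough analytic extension, the sketch below solves for the N that each scenario would need, assuming the same parameters as the simulation (difference SD of √2 × 8). These analytic values are not restricted to the grid of sample sizes in steps of 5, so they can be slightly smaller than the simulated minimums.

```r
# Analytic N for 80% power in each scenario (paired t-test, two-sided)
n_needed <- sapply(
  c(Conservative = 3, Moderate = 5, Optimistic = 8),  # mean decrease in points
  function(mean_diff) {
    power.t.test(delta = mean_diff, sd = sqrt(2) * 8, power = 0.80,
                 sig.level = 0.05, type = "paired")$n
  }
)
ceiling(n_needed)
```

Under these assumptions the conservative scenario needs well over 100 participants, which is why it falls outside the simulated range.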

# Extract recommended N for moderate scenario
recommended_n <- recommendations %>%
  filter(scenario == "Moderate") %>%
  pull(n)

# Get power values for recommended N across scenarios
power_at_recommended <- power_results %>%
  filter(n == recommended_n) %>%
  select(scenario, power)

7.2 Our Recommendation: N = 45 Participants

We recommend recruiting N = 45 participants for our study.

This recommendation is based on achieving 80% power under our moderate effect size scenario (d ≈ 0.50, representing a 5 percentage point decrease in reading comprehension), which we believe best represents the literature on background music and reading comprehension.

Detailed Justification:

1. Adequate power under realistic assumptions

With 45 participants, we achieve approximately 83% power under our moderate effect size scenario. This meets the conventional 80% power threshold, meaning we have an 83% probability of detecting the effect if it exists.

2. Protection against smaller effects

If the true effect is smaller than we expect (conservative scenario, d ≈ 0.30), we retain only 41% power. This is well below our 80% target, so a non-significant result would not rule out a small effect. We accept this limitation given resource constraints.

3. High confidence with larger effects

If the true effect is large (optimistic scenario, d ≈ 0.80), we achieve approximately 99.7% power, making detection almost certain.

4. Efficient within-subjects design

Our within-subjects design is highly efficient. An equivalent between-subjects design (different participants in the silence and music conditions) would require approximately 128 total participants (64 per group) to detect d = 0.50 with 80% power. By using each participant as their own control, we reduce sample size requirements by more than 50%.

5. Counterbalancing feasibility

Because 45 is an odd number, the two order conditions cannot be exactly equal; one will have 23 participants and the other 22:

22-23 participants: Silence → Music
22-23 participants: Music → Silence

This is sufficient to check whether order effects are present and to ensure they don’t confound our main effect.

6. Subgroup analysis potential

This sample size supports only exploratory analysis of heterogeneous treatment effects. If we have approximately equal numbers of habitual music listeners and non-listeners, we would have ~22 participants per group. This is underpowered for formal interaction tests, but we can descriptively examine whether effects differ between groups (as suggested by Sun et al., 2024).

7. Practical feasibility

45 participants is:

Achievable within typical undergraduate participant pools
Manageable within semester timelines (assuming ~20-30 minutes per participant)
Consistent with sample sizes in published within-subjects studies on this topic

7.3 Power Summary for Recommended Sample Size

With N = 45 participants, our power across scenarios is:

power_summary <- power_at_recommended %>%
  mutate(
    power_pct = sprintf("%.1f%%", power * 100),
    interpretation = case_when(
      power >= 0.90 ~ "Excellent - Very high probability of detection",
      power >= 0.80 ~ "Good - Adequate for reliable inference",
      power >= 0.70 ~ "Acceptable - Reasonable but not ideal",
      power >= 0.60 ~ "Marginal - Higher risk of false negative",
      TRUE ~ "Underpowered - Substantial risk of false negative"
    )
  ) %>%
  select(scenario, power_pct, interpretation)

kable(power_summary,
      col.names = c("Scenario", "Power", "Assessment"),
      caption = paste("Power analysis summary for N =", recommended_n))

Power analysis summary for N = 45
Scenario	Power	Assessment
Conservative	41.2%	Underpowered - Substantial risk of false negative
Moderate	82.8%	Good - Adequate for reliable inference
Optimistic	99.7%	Excellent - Very high probability of detection
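
As a rough summary across scenarios, the power values above can be weighted by the scenario probabilities stated in Section 4.3 (about 30%, 50%, and 20%). Treating those three probabilities as weights on three point scenarios is a simplifying assumption of this sketch, so the result is only a back-of-the-envelope figure.

```r
# Power at N = 45 from the summary table, weighted by scenario probabilities
power_n45 <- c(Conservative = 0.412, Moderate = 0.828, Optimistic = 0.997)
weights   <- c(Conservative = 0.30,  Moderate = 0.50,  Optimistic = 0.20)

expected_power <- sum(power_n45 * weights)
round(expected_power, 3)
```

Under this weighting the expected power is about 74%, somewhat below the 83% of the moderate scenario alone because of the weight placed on the conservative scenario.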

Bottom line: With 45 participants, we are well-powered to detect moderate-to-large effects but underpowered for small effects.

8. Conclusion and Next Steps

8.1 Summary

Based on simulation-based power analysis informed by published research on background music and reading comprehension:

Primary Recommendation: N = 45 participants

This sample size provides:

~83% power under moderate assumptions (d ≈ 0.50), our primary scenario
~41% power under conservative assumptions (d ≈ 0.30)
~99.7% power under optimistic assumptions (d ≈ 0.80)

This sample size is feasible, well-powered for realistic effects, and consistent with similar within-subjects studies in the literature.

References

Anderson, S. A., & Fuller, G. B. (2010). Effect of music on reading comprehension of junior high school students. School Psychology Quarterly, 25(3), 178–187. https://doi.org/10.1037/a0021213

Kämpfe, J., Sedlmeier, P., & Renkewitz, F. (2011). The impact of background music on adult listeners: A meta-analysis. Psychology of Music, 39(4), 424–448. https://doi.org/10.1177/0305735610376261

Perham, N., & Currie, S. (2014). Does listening to preferred music improve reading comprehension performance? Applied Cognitive Psychology, 28(6), 924–930. https://doi.org/10.1002/acp.2994

Souza, B. C., & Barbosa, M. T. (2023). Should we turn off the music? Music with lyrics interferes with cognitive tasks. Journal of Cognition, 6(1), 52. https://doi.org/10.5334/joc.273

Sun, Y., Sun, C., Li, C., Shao, X., Liu, Q., & Liu, H. (2024). Impact of background music on reading comprehension: Influence of lyrics language and study habits. Frontiers in Psychology, 15, 1363562. https://doi.org/10.3389/fpsyg.2024.1363562

Thompson, W. F., Schellenberg, E. G., & Letnic, A. K. (2012). Fast and loud background music disrupts reading comprehension. Psychology of Music, 40(6), 700–708. https://doi.org/10.1177/0305735611400173
