Introduction

What if there was a drug that could help people focus better, raise energy levels, reduce stress, stay at a healthy weight, lower chances of illness and chronic conditions, increase athletic performance, and more? Fortunately, it exists: sleep.

However, there are many anecdotal accounts of people who function better—either athletically or energy-wise—with less sleep.

This study investigates whether there is evidence that sleep really has all those benefits, and whether anecdotal accounts of improved performance on less sleep are substantiated. Specifically, is there a significant difference in 100-meter sprint performance between participants who maintain adequate sleep (\(\ge7\) hours) and those who don’t (\(<7\) hours)? The population parameter of interest is the difference in mean sprint times between these two groups of Providence Island residents aged 18-30 years.

Previous research has established strong connections between sleep and athletic performance. Mah et al. (2011) found that collegiate basketball players who extended their sleep duration showed significant improvement in sprint times. Similarly, Skein et al. (2011) found that sleep deprivation negatively impacted sprint performance, while Mougin et al. (1991) observed that sleep disturbances affected later athletic performance. The American Academy of Sleep Medicine recommends at least 7 hours of sleep for optimal health and performance in adults (2015), which is where the threshold number of 7 hours comes from.

Based on previous literature, I suspected that participants with adequate sleep (\(\ge7\) hours) would have better sprint performance compared to those with inadequate sleep (\(<7\) hours).

Data Collection Methods

The observational units are the individual Islanders of Providence Island who participated in the study.

Sixty participants were randomly sampled. A list of all eligible individuals (living residents of Providence aged 18-30) was created by going through the birth records of every town hall in every city of the Islands. Anyone who no longer lived in Providence, was outside the age range, or was deceased was removed from the list. Then, an R script randomly selected 100 potential participants, who were asked sequentially to take part in the study. If an individual didn’t consent (very common) or withdrew from the study (only one individual did, after day 2), the next Islander on the randomly generated list was asked, until 60 participants were enrolled.
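
The sampling procedure described above could be sketched roughly as follows. This is a minimal illustration, not the script actually used: the eligible data frame, its columns, the frame size of 500, and the consented vector standing in for people's responses are all hypothetical.

```r
# Hypothetical sampling frame: one row per eligible Islander
# (size and ages are made up for illustration).
set.seed(1)
eligible <- data.frame(
  id  = seq_len(500),
  age = sample(18:30, 500, replace = TRUE)
)

# Draw 100 candidates in random order; they are approached sequentially.
candidates <- eligible[sample(nrow(eligible), 100), ]

# Placeholder for consent outcomes (assumed, not real data).
consented <- runif(nrow(candidates)) < 0.7

# Enroll the first 60 candidates who consent, keeping list order.
enrolled <- head(candidates[consented, ], 60)
nrow(enrolled)
```

Drawing the candidate list once and working through it in order keeps the selection random even when many people decline.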

During the study period of 7 days, each participant filled out a questionnaire and completed a timed 100m sprint at around 9-10 am. On the first day, participants reported their gender and age along with how much they had slept the night before; on the remaining days, only sleep duration was asked.

Outside of the single individual who removed themselves from the study, the response rate was 100%, and nothing went wrong during the course of the study.

The primary potential source of error is the self-reported sleep duration, which may have been affected by recall bias and rounding. To reduce the effects of one-time events other than sleep that may affect sprint time, a 7-day measurement period was used.

Descriptive Statistics

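library(tidyverse)  # read_csv, dplyr, tidyr, ggplot2
library(mosaic)     # favstats
library(ggpubr)     # stat_cor

sleep_data <- read_csv("Matthew Yao MATH247 Data.csv", show_col_types = FALSE)
sleep_data$enoughSleep <- as.logical(sleep_data$enoughSleep)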

favstats(~ avgSprint, data = sleep_data)
favstats(~ avgSleep, data = sleep_data)
favstats(avgSprint ~ enoughSleep, data = sleep_data)
favstats(avgSleep ~ enoughSleep, data = sleep_data)

The data consists of 60 participants from Providence Island, with 56 individuals (93.33%) reporting adequate sleep (\(\ge7\) hours) and 4 individuals (6.67%) reporting inadequate sleep (\(<7\) hours). The average sleep duration across all participants was 7.71 hours (SD = 0.45), while the average sprint time was 18.8 seconds (SD = 4.12).

Examining the relationship between sleep adequacy and sprint performance, participants with adequate sleep averaged 18.73 seconds (SD = 0.387) in their 100m sprints, while those with inadequate sleep averaged 19.90 seconds (SD = 0.085). This initial comparison suggests that participants with inadequate sleep were slower by approximately 1.17 seconds on average, though with only 4 participants in the inadequate-sleep group this estimate is imprecise.

ggplot(sleep_data, aes(x = enoughSleep, y = avgSprint, fill = enoughSleep)) +
  geom_boxplot() +
  labs(title = "Sprint Performance by Sleep Adequacy",
       x = "Adequate Sleep (>=7 hours)",
       y = "Average Sprint Time (seconds)",
       fill = "Adequate Sleep") +
  theme_minimal() +
  scale_fill_manual(values = c("red", "blue"), 
                    labels = c("No (<7 hours)", "Yes (>=7 hours)"))

This side-by-side boxplot shows the distribution of sprint times for both groups. The median sprint time is higher for the inadequate sleep group, and their distribution is narrower. The adequate sleep group shows more variability in sprint times (which is to be expected, since many factors beyond just sleep play into sprint times), with several outliers at both the faster and slower ends of the data. This supports the numerical summary, suggesting a potential association between sleep adequacy and sprint performance.

Because the inadequate-sleep group is so small, I also investigated whether day-to-day variations in sleep duration correlate with changes in sprint performance, to gain insight into the dynamic relationship between sleep and athletic performance.

sleep_long <- sleep_data %>%
  select(participant, contains("day") & contains("sleep")) %>%
  pivot_longer(
    cols = -participant,
    names_to = "day",
    values_to = "sleep"
  ) %>%
  mutate(day = as.integer(gsub("day(\\d)_sleep", "\\1", day)))

sprint_long <- sleep_data %>%
  select(participant, contains("day") & contains("sprint")) %>%
  pivot_longer(
    cols = -participant,
    names_to = "day",
    values_to = "sprint"
  ) %>%
  mutate(day = as.integer(gsub("day(\\d)_sprint", "\\1", day)))

daily_data <- inner_join(sleep_long, sprint_long, by = c("participant", "day")) %>%
  arrange(participant, day)

daily_changes <- daily_data %>%
  group_by(participant) %>%
  mutate(
    sleep_change = sleep - lag(sleep),
    sprint_change = sprint - lag(sprint)
  ) %>%
  filter(!is.na(sleep_change))

ggplot(daily_changes, aes(x = sleep_change, y = sprint_change)) +
  geom_point(alpha = 0.5) +
  geom_smooth(formula = y ~ x, method = "lm", color = "blue") +
  stat_cor(method = "pearson", label.x = 1, label.y = 2) +
  labs(title = "Relationship Between Day-to-Day Changes in Sleep and Sprint Performance",
       x = "Change in Sleep Duration (hours)",
       y = "Change in Sprint Time (seconds)") +
  theme_minimal()

The day-to-day analysis reveals an unexpected pattern. The overall correlation between daily changes in sleep duration and daily changes in sprint time is r = 0.029 (p = 0.59), which is not only very weak but also positive: an increase in sleep was associated with a marginally slower sprint. The sign runs opposite to previous research, but a correlation this close to zero is not evidence of a real effect and may simply reflect random variation.
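
The r and p-value printed on the plot come from a Pearson correlation test. A minimal sketch of computing the same quantities directly with cor.test() follows; the toy_changes data frame is a made-up stand-in for daily_changes, so its numbers are illustrative only.

```r
# Made-up stand-in for daily_changes (values are illustrative only).
toy_changes <- data.frame(
  sleep_change  = c(0.5, -1.0,  0.2, 0.8, -0.3,  0.1, -0.6, 0.4),
  sprint_change = c(0.3,  0.1, -0.4, 0.2,  0.5, -0.2, -0.1, 0.0)
)

# Pearson correlation test: the null hypothesis is a true correlation of 0.
res <- cor.test(toy_changes$sleep_change, toy_changes$sprint_change,
                method = "pearson")

res$estimate  # sample correlation r
res$p.value   # two-sided p-value
```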

To examine individual patterns, four participants were selected with different sleep and performance characteristics:

selected_participants <- c(
  "Haroon Chatterjee", # inadequate sleep
  "Grace McCarthy",    # longest sleeper
  "Zachary Price",     # fastest sprinter
  "Manan Tiwari"       # slowest sprinter
)

daily_data %>%
  filter(participant %in% selected_participants) %>%
  ggplot(aes(x = as.numeric(day), y = sleep, color = "Sleep Duration")) +
  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_line(aes(y = sprint/5, color = "Sprint Time (scaled)"), linewidth = 1) +
  geom_point(aes(y = sprint/5, color = "Sprint Time (scaled)"), size = 2) +
  facet_wrap(~participant, scales = "free_y") +
  scale_x_continuous(breaks = 1:7, labels = 1:7) +
  scale_y_continuous(
    name = "Sleep Duration (hours)",
    sec.axis = sec_axis(~.*5, name = "Sprint Time (seconds)")
  ) +
  scale_color_manual(values = c("Sleep Duration" = "blue", "Sprint Time (scaled)" = "red")) +
  labs(title = "Sleep Duration and Sprint Performance Over 7 Days",
       x = "Day",
       color = "Measure") +
  theme_minimal() +
  theme(legend.position = "bottom")

The time series plots show diverse patterns for different participants. For Haroon Chatterjee, who consistently had inadequate sleep, we observe relatively stable sleep patterns but increasing sprint times, potentially suggesting cumulative fatigue. In contrast, Grace McCarthy, who averaged over 9 hours of sleep, maintained consistent sprint performance.

To further examine whether sleep duration on one night predicts sprint performance the following day, a lag analysis was conducted:

lag_data <- daily_data %>%
  group_by(participant) %>%
  mutate(next_day_sprint = lead(sprint)) %>%
  filter(!is.na(next_day_sprint))

overall_lag_correlation <- cor.test(lag_data$sleep, lag_data$next_day_sprint)

ggplot(lag_data, aes(x = sleep, y = next_day_sprint)) +
  geom_point(alpha = 0.5) +
  geom_smooth(formula = y ~ x, method = "lm", color = "purple") +
  stat_cor(method = "pearson", label.x = min(lag_data$sleep) + 0.5,
           label.y = max(lag_data$next_day_sprint) - 2) +
  labs(title = "Relationship Between Sleep Duration and Next Day's Sprint Performance",
       x = "Sleep Duration (hours)",
       y = "Next Day's Sprint Time (seconds)") +
  theme_minimal()

With r = 0.0069 and p = 0.9, there’s even less evidence for a relationship between sleep duration and next day sprint performance.

Analysis of Results

The population of interest is Providence Island residents aged 18-30 years. The parameter of interest is the difference in mean sprint times (\(\mu_{inadequate} - \mu_{adequate}\)) between individuals who get inadequate sleep (\(<7\) hours) and those who get adequate sleep (\(\ge7\) hours).

The null hypothesis states that there is no difference in mean sprint times between the two sleep groups (\(H_0:\mu_{inadequate} - \mu_{adequate} = 0\)). The alternative hypothesis states that there is a difference in mean sprint times between the two sleep groups (\(H_a:\mu_{inadequate} - \mu_{adequate} \neq 0\)). Based on prior research, we’d expect \(\mu_{inadequate} > \mu_{adequate}\), meaning those with inadequate sleep have slower (higher) sprint times.

In this context, a type I error would occur if we rejected the null hypothesis when there is actually no difference in sprint performance between the two sleep groups. This would lead to an incorrect conclusion that sleep adequacy affects sprint performance. A type II error would occur if we failed to reject the null hypothesis when there is actually a difference in sprint performance between the two sleep groups. This would lead to missing a genuine effect of sleep on athletic performance.
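
To make these two error types concrete, here is a small simulation sketch. It is not part of the original analysis: the normal distributions, the common SD of 4 seconds, the baseline mean of 18.7 seconds, and the 1-second true difference are all assumptions chosen for illustration. Only the group sizes (56 and 4) come from the study.

```r
# Assumed values for illustration only; group sizes match the study.
set.seed(247)
n_adequate   <- 56
n_inadequate <- 4
n_sims       <- 2000

# Fraction of simulated studies in which Welch's t-test rejects at 0.05.
reject_rate <- function(true_diff) {
  mean(replicate(n_sims, {
    adequate   <- rnorm(n_adequate,   mean = 18.7,             sd = 4)
    inadequate <- rnorm(n_inadequate, mean = 18.7 + true_diff, sd = 4)
    t.test(inadequate, adequate, var.equal = FALSE)$p.value < 0.05
  }))
}

type1_rate <- reject_rate(0)  # no true difference: every rejection is a type I error
power      <- reject_rate(1)  # true 1-second difference: 1 - power is the type II rate
c(type1 = type1_rate, power = power)
```

Under these assumptions, a group of only 4 leaves the test with little power, so a type II error is a real possibility in a design like this one.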

Our sample can reasonably be considered representative of the population of interest because participants were randomly selected from the eligible population of Providence Island residents aged 18-30. The 7-day measurement period helps account for day-to-day variability in both sleep and sprint performance.

t.test(avgSprint ~ enoughSleep, data = sleep_data,
       alternative = "two.sided", var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  avgSprint by enoughSleep
## t = 0.45237, df = 3.3066, p-value = 0.6791
## alternative hypothesis: true difference in means between group FALSE and group TRUE is not equal to 0
## 95 percent confidence interval:
##  -6.641514  8.980086
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            19.89500            18.72571

To analyze the difference in sprint times between the two sleep groups, a Welch two-sample t-test was conducted. The test statistic is t = 0.45 with 3.3 degrees of freedom, giving a p-value of 0.6791. This p-value is the probability of observing a difference in mean sprint times at least as extreme as the one observed (1.17 seconds), assuming that there is no true difference between the groups.
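
As a rough check on the output above, the Welch statistic is the observed difference in group means divided by its standard error. The standard error is not printed by t.test; the value of about 2.58 below is backed out from the reported t and sample means, so it is an approximation, not a figure from the original analysis. The subscripts \(i\) and \(a\) (inadequate, adequate) are shorthand introduced here.

\[
t = \frac{\bar{x}_{i} - \bar{x}_{a}}{SE}, \qquad SE = \sqrt{\frac{s_{i}^{2}}{n_{i}} + \frac{s_{a}^{2}}{n_{a}}}
\]

\[
\bar{x}_{i} - \bar{x}_{a} = 19.89500 - 18.72571 = 1.16929, \qquad SE \approx \frac{1.16929}{0.45237} \approx 2.58
\]

A standard error more than twice the size of the observed difference explains why the test statistic is small and the confidence interval is wide.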

set.seed(247)

adequate_sleep <- sleep_data$avgSprint[sleep_data$enoughSleep == TRUE]
inadequate_sleep <- sleep_data$avgSprint[sleep_data$enoughSleep == FALSE]

observed_diff <- mean(inadequate_sleep) - mean(adequate_sleep)
n_sims <- 10000
sim_diffs <- numeric(n_sims)
all_sprints <- c(adequate_sleep, inadequate_sleep)

for (i in 1:n_sims) {
  shuffled_data <- sample(all_sprints, size = length(all_sprints), replace = FALSE)
  sim_group1 <- shuffled_data[1:length(inadequate_sleep)]
  sim_group2 <- shuffled_data[(length(inadequate_sleep) + 1):length(all_sprints)]
  sim_diffs[i] <- mean(sim_group1) - mean(sim_group2)
}

p_value_sim <- mean(abs(sim_diffs) >= abs(observed_diff))
cat("Simulation-based p-value:", p_value_sim, "\n")
## Simulation-based p-value: 0.6013

The validity conditions for the t-test are only partly met. Independence is satisfied because the participants were randomly selected, and the sprint times are approximately normal, but the inadequate-sleep group is very small (n = 4). For that reason, a simulation-based approach was performed as well, yielding a p-value of 0.6013.

Given the p-values of 0.6013 and 0.6791, we fail to reject the null hypothesis: we do not have sufficient evidence to conclude that there is a difference in mean sprint performance between individuals with adequate and inadequate sleep.

alpha <- 0.05
ci_lower <- observed_diff - quantile(sim_diffs, 1 - alpha/2)
ci_upper <- observed_diff - quantile(sim_diffs, alpha/2)
cat("95% Simulation-based CI: [", round(ci_lower, 3), ",", round(ci_upper, 3), "]\n")
## 95% Simulation-based CI: [ -3.233 , 5.172 ]

The 95% confidence interval for the difference in mean sprint times (inadequate minus adequate) is [-6.64, 8.98] with the theory-based approach and [-3.233, 5.172] with the simulation-based approach. The plausible values for the true difference in mean sprint times therefore range from the inadequate-sleep group being 6.64 seconds faster to the adequate-sleep group being 8.98 seconds faster (3.233 and 5.172 seconds, respectively, using the simulation-based interval). Both intervals include zero, so the possibility that there is no difference between the groups cannot be ruled out, which aligns with our failure to reject the null hypothesis.

Conclusion

This study investigated whether there is a significant difference in 100-meter sprint performance between Providence Island residents aged 18-30 who maintain adequate sleep (\(\ge7\) hours) and those who don’t (\(<7\) hours). The results did not provide sufficient evidence to conclude that a difference exists (p = 0.68), even though participants with inadequate sleep were 1.17 seconds slower on average. The simulation-based analysis yielded a similar p-value (p = 0.60), and the day-to-day and lag analyses likewise found no significant relationship.

The data did not behave as expected and, in the day-to-day analysis, even went slightly in the opposite direction from previous research. Because participants were randomly sampled, we can generalize these results to the larger population of Providence Island residents aged 18-30.

If conducting this study again, I would implement several improvements: increasing the overall sample size to include more participants with inadequate sleep, using a stratified sampling approach to ensure balanced representation of both sleep groups, and extending the observation period beyond 7 days.
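
A stratified draw of the kind proposed here might look like the following sketch. It assumes a screening step has already classified each person in the sampling frame by usual sleep adequacy; the frame data frame, its columns, the 10% rate of inadequate sleep, and the target of 30 per group are all hypothetical.

```r
# Hypothetical screened sampling frame (made-up data): about 10% of
# people are assumed to report inadequate sleep.
set.seed(42)
frame <- data.frame(
  id          = seq_len(1000),
  enoughSleep = runif(1000) < 0.9
)

per_group <- 30  # assumed target size for each sleep group

# Split the frame by stratum, then draw up to per_group people from each.
strata  <- split(frame, frame$enoughSleep)
sampled <- do.call(rbind, lapply(strata, function(s) {
  s[sample(nrow(s), min(per_group, nrow(s))), ]
}))

table(sampled$enoughSleep)  # group sizes in the stratified sample
```

Sampling within each stratum fixes the group sizes in advance, which avoids ending up with only a handful of participants in the rarer group.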

Bibliography

American Academy of Sleep Medicine and Sleep Research Society. (2015). Recommended amount of sleep for a healthy adult: a joint consensus statement. Sleep, 38(6), 843-844. https://doi.org/10.5665/sleep.4716

Mah, C. D., Mah, K. E., Kezirian, E. J., & Dement, W. C. (2011). The effects of sleep extension on the athletic performance of collegiate basketball players. Sleep, 34(7), 943-950. https://doi.org/10.5665/SLEEP.1132

Mougin, F., Simon-Rigaud, M. L., Davenne, D., Renaud, A., Garnier, A., Kantelip, J. P., & Magnin, P. (1991). Effects of sleep disturbances on subsequent physical performance. European Journal of Applied Physiology and Occupational Physiology, 63(2), 77-82. https://doi.org/10.1007/BF00235173

Skein, M., Duffield, R., Edge, J., Short, M. J., & Mündel, T. (2011). Intermittent-sprint performance and muscle glycogen after 30 h of sleep deprivation. Medicine & Science in Sports & Exercise, 43(7), 1301-1311. https://doi.org/10.1249/MSS.0b013e31820abc5a