Introduction

The population parameter of interest is the difference in mean 1500-meter swim time between coffee drinkers and non-coffee drinkers, in a population of recreational swimmers living on the Islands. This parameter captures the average performance impact of caffeine ingestion in a real-world exercise scenario.

Caffeine is widely known for its stimulant effects and is often used by athletes to boost endurance and reduce fatigue. Guest et al. (2021) demonstrated that caffeine intake can significantly improve endurance performance by enhancing energy metabolism and reducing perceived exertion. Additionally, Wells (2006) highlighted caffeine’s ability to increase fat oxidation and prolong time to exhaustion. These findings support the rationale behind examining coffee, a natural caffeine source, as a performance aid. Gawarecki (2021) further emphasized the value of student-designed experiments in engaging learners with real-world scientific methodology, which inspired the structure of this project.

Before collecting data, I hypothesized that participants who consumed coffee before their swim would, on average, complete the 1500-meter swim faster than those who did not. This hypothesis was based on established physiological research and anecdotal practices among athletes who use caffeine to enhance physical performance.

Data Collection Methods

The observational units in this study were individual residents of the Islands who participated in a 1500-meter swim. I collected data from a total of 62 participants (29 in the coffee group and 33 in the no-coffee group). Each participant was randomly assigned to one of two groups: a treatment group, in which individuals drank coffee shortly before swimming, and a control group, in which individuals refrained from consuming coffee. Random assignment was carried out manually using a simple random process (e.g., coin flips), so that the division of participants between the two groups did not depend on any characteristic of the swimmers.
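The random assignment described above could also be carried out in software. The following is a minimal sketch in R, not the procedure actually used in this study: the seed and the participant IDs are hypothetical, and the group sizes of 29 and 33 are taken from the summary statistics reported in the confidence interval section.

```r
# Hypothetical sketch of random assignment; not the procedure used in the study.
set.seed(2024)                      # arbitrary seed, for reproducibility only

participant_id <- 1:62              # placeholder IDs for the participants
group_labels <- rep(c("Yes", "No"), times = c(29, 33))

# sample() with no size argument returns a random permutation of the labels,
# so each participant is equally likely to receive any of the 62 slots.
assignment <- data.frame(
  id = participant_id,
  Coffee = sample(group_labels)
)

table(assignment$Coffee)
```

Shuffling a fixed set of labels guarantees the group sizes in advance, whereas independent coin flips would let the group sizes vary from one run to the next.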

The explanatory variable in this study is coffee consumption, a binary categorical variable recorded as either “Yes” (drank coffee before the swim) or “No” (did not drink coffee). Participants were asked whether they had consumed coffee approximately 15–30 minutes before the swim, aligning with caffeine’s expected onset window. This information was self-reported at the time of data collection.

The response variable is swim time, a continuous quantitative variable representing the total number of minutes it took participants to complete the 1500-meter swim. Swim time was self-reported by participants, and they were encouraged to use a timer or stopwatch during the swim for greater accuracy. I recorded swim time to one decimal place for consistency and precision.

While the design of the study followed a clear and replicable protocol, there were several challenges during implementation. One of the main issues was uncertainty about the honesty and accuracy of self-reported data. It was difficult to confirm whether participants actually drank coffee when they said they did, or whether they swam the full distance as instructed. Additionally, some individuals may have estimated their swim times without proper timing tools, which could introduce measurement error.

Another limitation was participant retention. Many individuals declined to participate in the study when approached, and a few who initially consented later withdrew before completing the swim. As a result, the sample consists only of those who were willing and able to complete the full process, which introduces potential non-response bias. Despite these challenges, I kept the sample sizes of the coffee and no-coffee groups approximately balanced and adhered to the random assignment protocol.

This study could be replicated by future researchers by following the same procedures: recruit a voluntary sample of participants, randomly assign coffee consumption prior to a standardized 1500-meter swim, and record swim time using a consistent method. Future improvements could include supervising the swim, using digital stopwatches, or incorporating wearable fitness trackers to reduce measurement error and increase the reliability of the results.

library(readr)
Final_Project <-
read_csv("/cloud/project/Miniproject Data - Sheet1 (3).csv")
head(Final_Project, n=2)

Descriptive Statistics

This study involved one binary categorical explanatory variable, coffee consumption, and one quantitative response variable, swim time. I also collected participant names (categorical) to organize the data and used those to verify completeness. In this section, I summarize both the categorical and quantitative variables using appropriate numerical and visual methods to explore potential relationships.

The coffee variable indicates whether a participant consumed coffee before swimming and has two categories: “Yes” and “No.” The name variable is a categorical identifier unique to each participant. Although it is not central to the analysis, I used it to organize observations and ensure no duplicates existed. The side-by-side boxplots and summary statistics that follow compare swim times across the two coffee consumption groups, and the summary statistics also report the number of participants in each group.

# Convert Coffee to a factor
Final_Project$Coffee <- as.factor(Final_Project$Coffee)
boxplot(`Swim Time` ~ Coffee,
        data = Final_Project,
        horizontal = FALSE,
        col = c("lightgreen", "lightblue"),
        xlab = "Coffee Consumption",
        ylab = "Swim Time (minutes)",
        main = "Swim Times by Coffee Consumption",
        names = c("No", "Yes"))  # Customize x-axis labels

# mosaic provides bwplot() (through lattice) and favstats()
library(mosaic)
bwplot(Coffee ~ `Swim Time`,
       horizontal = TRUE,
       main = "Side-by-side boxplots",
       xlab = "Swim Time (minutes)",
       data = Final_Project)

favstats(`Swim Time` ~ `Coffee`, data = Final_Project)

Analysis of Results

The two populations are all recreational islanders who drink coffee before swimming and all recreational islanders who do not drink coffee before swimming.

The parameter of interest is the difference in the average (mean) 1500-meter swim times between these two populations. In symbols, this is \(\mu_{\text{coffee}} - \mu_{\text{no coffee}}\). It represents how much faster or slower coffee drinkers swim compared to non-coffee drinkers, on average.

#Null and alternative hypothesis

In symbols:

\(H_0: \mu_{\text{coffee}} - \mu_{\text{no coffee}} = 0\)

\(H_a: \mu_{\text{coffee}} - \mu_{\text{no coffee}} < 0\)

In words:

Null hypothesis (H0): The average 1500-meter swim time is the same for coffee drinkers and non-coffee drinkers. In other words, drinking coffee has no effect on swim performance.

Alternative hypothesis (Ha): The average 1500-meter swim time for coffee drinkers is less than that of non-coffee drinkers - suggesting that drinking coffee improves swim performance.

#Type I and Type II error

Type I Error: This would mean concluding that drinking coffee improves swim performance (that is, it leads to faster 1500-meter swim times) when in reality it does not. In other words, we falsely reject the null hypothesis.

Type II Error: This would mean concluding that drinking coffee does not affect swim performance, when in reality, it actually does help swimmers perform better. In other words, we fail to reject the null hypothesis when the alternative is actually true.

#Justification of the sample size

My sample can reasonably be considered representative of the population of islander swimmers. I used random sampling to select participants, which reduces selection bias and increases the chance that the sample reflects the larger population. Although some people declined to participate or withdrew, which leaves room for non-response bias, the overall response rate was still good, and both the coffee and no-coffee groups were well represented in the sample.

Because of the random sampling method, it is reasonable to generalize the results to the broader population of recreational swimmers on the Islands, although some caution is still necessary due to self-reported data and natural variability.

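library(mosaic)
set.seed(62)
# Simulate 1000 samples of 62 participants, each with a 50% chance of
# being in the coffee group, to see how unusual the observed 29/62 split is
Social.null <- do(1000) * rflip(n = 62, prob = 0.5)
dotPlot(~prop, data = Social.null, width = 0.01,
        pch = 1,
        groups = (prop >= 29/62),
        xlab = "Proportion of islanders who drank coffee before swim",
        main = "Distribution of proportions")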

##theory-based approach

Standardized statistic and validity conditions

The appropriate standardized statistic for this theory-based analysis is the t-statistic, which compares the difference in sample means to the variability expected due to sampling. In this case, the t-statistic measures how many standard errors the observed difference in mean swim times between coffee drinkers and non-drinkers is from zero. For the results to be valid using a theory-based approach, several conditions must be satisfied. First, the observations in each group must be independent, which is met because each swimmer was measured only once and individuals were randomly selected. Second, the response variable, swim time, should be approximately normally distributed within each group. While the sample size in each group is moderate, the Central Limit Theorem supports the use of a t-test as long as there are no extreme outliers or severe skewness. Lastly, because the data were obtained through random sampling, we can reasonably generalize the findings to the broader population of islander swimmers. Based on these considerations, the validity conditions for a theory-based test appear to be reasonably satisfied.
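To make the definition of the t-statistic concrete, the sketch below recomputes it by hand from the summary statistics reported in the confidence interval section of this report. It assumes the unpooled (Welch) standard error, which is the default in R's `t.test()`, and takes the difference as no coffee minus coffee; the variable names are illustrative.

```r
# Summary statistics as reported in the confidence interval section
n_coffee      <- 29
mean_coffee   <- 31.53793
sd_coffee     <- 6.928637
n_nocoffee    <- 33
mean_nocoffee <- 38.10000
sd_nocoffee   <- 8.051320

# Observed difference in sample means (no coffee minus coffee)
diff_means <- mean_nocoffee - mean_coffee

# Unpooled standard error of the difference in means
se_diff <- sqrt(sd_nocoffee^2 / n_nocoffee + sd_coffee^2 / n_coffee)

# t-statistic: number of standard errors the difference lies from zero
t_stat <- diff_means / se_diff
round(t_stat, 3)
```

Up to rounding of the inputs, this should agree with the value of about 3.449 returned by `t.test()`.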

histogram(~`Swim Time` | `Coffee`, data = Final_Project, width = 1, layout = c(1, 2))

stat(t.test(`Swim Time` ~ `Coffee`, data = Final_Project))
##        t 
## 3.449073

# p-value

two.sided.p.value<-pval(t.test(`Swim Time` ~ `Coffee`, data = Final_Project))
one.sided.p.value<-two.sided.p.value/2
cat("the one-sided p-value is", one.sided.p.value)
## the one-sided p-value is 0.0005175788

The p-value is the probability of observing a difference in mean swim times at least as large as the one found in the sample, in the direction favoring coffee drinkers, assuming that coffee has no true effect on swim performance.
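The one-sided p-value can also be obtained directly from the t distribution rather than by halving the two-sided value. The sketch below assumes the Welch approximation for the degrees of freedom (the default in `t.test()`) and reuses the reported t-statistic and summary statistics; the variable names are illustrative.

```r
# Reported values from this analysis
t_stat      <- 3.449073
n_coffee    <- 29
sd_coffee   <- 6.928637
n_nocoffee  <- 33
sd_nocoffee <- 8.051320

# Per-group variance of the sample mean
v_coffee   <- sd_coffee^2 / n_coffee
v_nocoffee <- sd_nocoffee^2 / n_nocoffee

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_coffee + v_nocoffee)^2 /
  (v_coffee^2 / (n_coffee - 1) + v_nocoffee^2 / (n_nocoffee - 1))

# Upper-tail probability: chance of a t value at least this large under H0
p_one_sided <- pt(t_stat, df = df_welch, lower.tail = FALSE)
p_one_sided
```

Halving the two-sided p-value gives the same answer only because the observed difference falls in the direction stated in the alternative hypothesis.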

#Conclusion

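Because the one-sided p-value (approximately 0.0005) is very small, there is strong evidence against the null hypothesis. The data therefore suggest that drinking coffee before swimming is associated with faster 1500-meter swim times among islander swimmers.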

#Validity conditions

Since the validity conditions for a theory-based test appear to be reasonably satisfied (independent observations, approximate normality, and random sampling), a simulation-based approach is not strictly necessary. However, to verify the robustness of the findings, a randomization test can be performed as a comparison. The simulation-based p-value was approximately 0.0005, which closely matched the theory-based p-value. Both approaches led to the same statistical conclusion about the null hypothesis, increasing confidence in the result. The similarity between the two p-values suggests that the theory-based test is appropriate and reliable for this dataset.
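A randomization test of the kind mentioned above could be sketched in base R as follows. This is an illustration of the technique only: the function name `randomization_p_value` is hypothetical, and the toy values used to exercise it are made up and are not the study data.

```r
# Randomization test for a difference in means (no coffee minus coffee).
# Shuffling the group labels simulates a world where coffee has no effect.
randomization_p_value <- function(times, group, reps = 1000) {
  observed <- mean(times[group == "No"]) - mean(times[group == "Yes"])
  shuffled <- replicate(reps, {
    g <- sample(group)  # random re-labelling of the swimmers
    mean(times[g == "No"]) - mean(times[g == "Yes"])
  })
  # One-sided p-value: share of shuffles at least as extreme as observed
  mean(shuffled >= observed)
}

# Made-up toy values, NOT the study data; used only to show the call
set.seed(1)
toy_times <- c(30.1, 32.4, 29.8, 35.0, 38.2, 36.5, 40.1, 37.3)
toy_group <- rep(c("Yes", "No"), each = 4)
p_toy <- randomization_p_value(toy_times, toy_group)
p_toy
```

With the real data, the same function could be called on the swim time and coffee columns of `Final_Project`. With 1000 shuffles the resulting p-value can only be a multiple of 0.001, so more repetitions would be needed to resolve a p-value as small as the theory-based one.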

#Confidence interval

# Sample sizes and summary stats
n.coffee <- 29
n.nocoffee <- 33
x.bar.coffee <- 31.53793
x.bar.nocoffee <- 38.10000
SD.coffee <- 6.928637
SD.nocoffee <- 8.051320
x.bar.diff <- x.bar.nocoffee - x.bar.coffee
SE.x.bar.diff <- sqrt((SD.nocoffee^2 / n.nocoffee) + (SD.coffee^2 / n.coffee))
MoE <- 2 * SE.x.bar.diff
LB <- x.bar.diff - MoE
UB <- x.bar.diff + MoE
round(cbind(LB, UB), 3)
##         LB     UB
## [1,] 2.757 10.367

The 95% confidence interval for the difference in mean 1500-meter swim times between non-coffee drinkers and coffee drinkers was calculated to be approximately 2.757 to 10.367 minutes. This means I am 95% confident that, on average, non-coffee drinkers take between 2.757 and 10.367 minutes longer to complete the swim than coffee drinkers. Since the confidence interval does not include zero, this provides evidence that there is a real difference in swim performance between the two groups. Specifically, it suggests that drinking coffee is associated with faster swim times. This conclusion is consistent with the one drawn from the hypothesis test above, where I also found significant evidence to suggest that coffee drinkers swim faster on average.
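The interval above uses a multiplier of 2 as an approximation. As a check, the sketch below recomputes it with the t critical value, assuming the Welch degrees of freedom used by `t.test()`; the variable names are illustrative and the inputs are the reported summary statistics.

```r
# Summary statistics as reported above
n_coffee      <- 29
mean_coffee   <- 31.53793
sd_coffee     <- 6.928637
n_nocoffee    <- 33
mean_nocoffee <- 38.10000
sd_nocoffee   <- 8.051320

v_coffee   <- sd_coffee^2 / n_coffee
v_nocoffee <- sd_nocoffee^2 / n_nocoffee
se_diff    <- sqrt(v_coffee + v_nocoffee)

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_coffee + v_nocoffee)^2 /
  (v_coffee^2 / (n_coffee - 1) + v_nocoffee^2 / (n_nocoffee - 1))

# Critical value for a 95% interval, in place of the multiplier 2
t_star <- qt(0.975, df = df_welch)

ci <- (mean_nocoffee - mean_coffee) + c(-1, 1) * t_star * se_diff
round(ci, 3)
```

Because the critical value is very close to 2 at roughly 60 degrees of freedom, the resulting interval should be nearly identical to the one reported, so the approximation does not affect the conclusion.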

Conclusion

One surprising element was how consistent the difference in swim times was, despite relying on self-reported data. If I were to repeat this study, I would supervise the swim sessions directly or use fitness trackers, both to reduce measurement error and to confirm that participants actually completed the swim. I would also increase the sample size. Future research could explore whether the time of day, caffeine dose, or swimmer experience level influences the performance effect of coffee.

Bibliography

Guest, N., VanDusseldorp, T., Nelson, M. T., Grgic, J., Schoenfeld, B. J., Jenkins, N. D., & Trexler, E. T. (2021). International society of sports nutrition position stand: caffeine and exercise performance. Journal of the International Society of Sports Nutrition, 18(1), 1–37. https://jissn.biomedcentral.com/articles/10.1186/s12970-020-00383-4

Wells, J. (2006). Caffeine and sports performance. Journal of Sports Science and Medicine, 5(1), 71–76. https://www.healthline.com/nutrition/caffeine-and-exercise

Gawarecki, A. P. (2021). The Caffeine Lab: A Course-Based Undergraduate Research Experience (CURE) in Introductory Statistics. Journal of Statistics and Data Science Education, 29(1), 1–5. https://pmc.ncbi.nlm.nih.gov/articles/PMC9508920/

The Islands. https://islands.smp.uq.edu.au/project.php

Data spreadsheet. https://docs.google.com/spreadsheets/d/1bRmedWGjjvJVwMtHYE2vI3oVuKU5LKzbyxJT9iI3ABw/edit?gid=0#gid=0