Final-Project-Report-2.knit

Title: The Relationship Between High-Intensity Physical Activity and Cognitive Performance in Islanders

Author: Burhan Zafar

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(infer)
library(readr)

# Read the dataset (CSV export of the project's Google Sheet)
df <- read_csv("https://docs.google.com/spreadsheets/d/1aWD7TH2YeXMpp7uqr7gigRdkJA3QMUhs63S0AHAXtP0/export?format=csv")

## Rows: 60 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Village, House number, Name, Sports_Participant
## dbl (2): Age, Stroop_Score / ms
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

colnames(df)

## [1] "Village"            "House number"       "Name"              
## [4] "Age"                "Sports_Participant" "Stroop_Score / ms"

head(df)

## # A tibble: 6 × 6
##   Village `House number` Name         Age Sports_Participant `Stroop_Score / ms`
##   <chr>   <chr>          <chr>      <dbl> <chr>                            <dbl>
## 1 Arcadia 20             Noel Clau…    49 Yes                                820
## 2 Arcadia 35             Arnaud Be…    39 Yes                                743
## 3 Arcadia 57             Georgina …    34 Yes                                795
## 4 Arcadia University     Avni Shar…    37 Yes                                793
## 5 Arcadia University     Raine Ana…    48 Yes                                834
## 6 Akkeshi Clinic         Dr Daichi…    36 Yes                                823

Introduction

Do adult Islanders who participate in high-intensity physical activity (a 200 m swim, used as a proxy) perform better on the Stroop Interference Test (used as a proxy for cognitive ability) than those who do not? This study investigates that question by comparing Stroop scores between two groups: those who completed an intense physical activity and those who did not. The population parameter of interest is the difference in mean Stroop scores between adult Islanders who do and do not participate in high-intensity physical activity. Because lower Stroop scores reflect better cognitive performance, the question is whether sports participation is associated with lower scores. Before data collection, I predicted that the mean Stroop score would be lower (better) for participants in the high-intensity activity, based on findings in the literature. For example, Zhang, Zhou, and Chen (2024) found that aerobic exercise improves brain function, particularly white matter integrity and cerebral blood flow. Their meta-analysis of exercise effects in older adults inspired my hypothesis that physical activity might improve cognitive control, as measured by tasks like the Stroop test.

Data Collection Methods

The observational units in my study were individual adult Islanders aged 30 to 50. Using a random number generator and multi-stage random sampling, I selected 60 Islanders from different villages and households across the Islands. All participants included in the study provided consent. Participants in the activity group were instructed to perform a 200-meter freestyle swim, used as a proxy for high-intensity physical activity; the control group performed no activity. Immediately after the swim (or no activity), each participant completed the Stroop Interference Test, which served as the proxy for cognitive performance. I recorded each individual's village, house number, name, age, group (Yes/No for activity), and Stroop score in a Google Sheet.
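The multi-stage random sampling step can be sketched in R as below. This is a minimal illustration under stated assumptions, not the procedure actually used: the sampling frame (10 hypothetical villages, 20 households each, 3 eligible adults per household), the stage sizes (5 villages, 12 households per village, 1 adult per household), and the seed are all made up so that the stages multiply out to 60 people.

```r
library(dplyr)

set.seed(2024)  # hypothetical seed, only so the sketch is reproducible

# Hypothetical sampling frame: 10 villages x 20 households x 3 eligible adults
frame <- expand.grid(
  village   = paste0("Village_", 1:10),
  household = 1:20,
  person    = 1:3,
  stringsAsFactors = FALSE
)

# Stage 1: randomly choose 5 villages
villages <- sample(unique(frame$village), 5)

# Stage 2: within each chosen village, randomly choose 12 households
households <- frame %>%
  filter(village %in% villages) %>%
  distinct(village, household) %>%
  group_by(village) %>%
  slice_sample(n = 12) %>%
  ungroup()

# Stage 3: within each chosen household, randomly choose 1 adult
sampled <- frame %>%
  semi_join(households, by = c("village", "household")) %>%
  group_by(village, household) %>%
  slice_sample(n = 1) %>%
  ungroup()

nrow(sampled)  # 5 villages x 12 households x 1 adult
```

Sampling whole clusters first keeps fieldwork manageable, usually at the cost of some precision compared with a simple random sample of the same size.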

The key variables were:

Sports_Participant (Binary: Yes/No): Whether the individual completed the swim task.

Stroop_Score (Quantitative, ms): A numerical score in milliseconds (recorded in the column "Stroop_Score / ms"), where lower values indicate better cognitive performance.

Although the methods were easy to carry out and reasonably accurate, there were some challenges. Some selected Islanders declined to consent, which made data collection more time-consuming, and administering the physical task one-on-one slowed it further. In addition, a single swim session cannot capture the cumulative cognitive effects of regular physical training, so this study is limited to short-term outcomes. Finally, because the final sample depended on who was available and willing to take part, it may not be fully random and is open to selection bias; generalizing to the full Island population therefore requires caution.

Descriptive Statistics

My dataset included 60 individuals evenly split between the sports participation (Yes) and control (No) groups. The summary statistics for Stroop scores (in ms) were as follows:

Participants (Yes): Mean = 807.83, SD = 25.56, Range = 743 to 854

Non-Participants (No): Mean = 802.87, SD = 25.70, Range = 756 to 851

I created a side-by-side boxplot to visualize group differences. The median Stroop score appeared slightly higher (worse performance) in the activity group. Both groups had similar variability, and the participant group exhibited a few low-score outliers.

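# Give the score column a syntactic name so it can be used without backticks
df <- df %>%
  rename(
    Stroop_Score = `Stroop_Score / ms`
  )
# Summary statistics of Stroop scores (ms) by sports-participation group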
df %>%
  group_by(Sports_Participant) %>%
  summarise(
    Count = n(),
    Mean = round(mean(Stroop_Score, na.rm = TRUE), 2),
    SD = round(sd(Stroop_Score, na.rm = TRUE), 2),
    Min = min(Stroop_Score, na.rm = TRUE),
    Max = max(Stroop_Score, na.rm = TRUE)
  )

## # A tibble: 2 × 6
##   Sports_Participant Count  Mean    SD   Min   Max
##   <chr>              <int> <dbl> <dbl> <dbl> <dbl>
## 1 No                    30  803.  25.7   756   851
## 2 Yes                   30  808.  25.6   743   854

# Side-by-side boxplot of Stroop scores by group
ggplot(df, aes(x = Sports_Participant, y = Stroop_Score, fill = Sports_Participant)) +
  geom_boxplot() +
  labs(
    title = "Stroop Scores by Sports Participation",
    x = "Sports Participation",
    y = "Stroop Score (lower = better)"
  ) +
  theme_minimal()
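The visual impressions from the boxplot (a slightly higher median in the activity group and a few low outliers) can be checked numerically. The sketch below is an assumption-labeled helper, not part of the original analysis: `boxplot_numbers()` is a hypothetical function that computes each group's median, quartiles, and the number of points lying more than 1.5 × IQR beyond the quartiles, which is the default outlier rule in ggplot2's `geom_boxplot()`. The toy data are invented; on the project data it would be called as `boxplot_numbers(df)` after the rename step.

```r
library(dplyr)

# Hypothetical helper: the numbers behind a side-by-side boxplot.
# Expects columns Sports_Participant and Stroop_Score, as in df after the rename.
boxplot_numbers <- function(data) {
  data %>%
    group_by(Sports_Participant) %>%
    summarise(
      Median = median(Stroop_Score),
      Q1 = quantile(Stroop_Score, 0.25, names = FALSE),
      Q3 = quantile(Stroop_Score, 0.75, names = FALSE),
      # Points more than 1.5 x IQR beyond the quartiles are drawn as outliers
      Low_Outliers = sum(Stroop_Score < Q1 - 1.5 * (Q3 - Q1)),
      High_Outliers = sum(Stroop_Score > Q3 + 1.5 * (Q3 - Q1)),
      .groups = "drop"
    )
}

# Invented toy data (NOT the project data), five scores per group
toy <- tibble(
  Sports_Participant = rep(c("Yes", "No"), each = 5),
  Stroop_Score = c(800, 810, 790, 820, 700, 805, 795, 815, 812, 808)
)
boxplot_numbers(toy)
```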

To test the association between activity status and cognitive performance, I used a two-sample t-test with Stroop score as the response variable and Sports_Participant status as the explanatory variable, followed by a permutation (simulation-based) test as a check.

Analysis of Results

The target population is adults aged 30–50 on the Island. The parameter of interest is the difference in mean Stroop scores between those who did and did not complete a high-intensity activity (200 m swim).

Hypotheses:

H0 (Null): µYes = µNo (No difference in means between participants and non-participants)

Ha (Alternative): µYes ≠ µNo (There is a difference in means)

A Type I error would involve concluding that physical activity impacts Stroop scores when it actually does not. A Type II error would occur if we fail to detect a real effect of physical activity on cognitive performance.

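Although the final sample may not be fully random, it reasonably represents a segment of the Island's adult population, and each participant was measured once, so observations can be treated as independent. Both groups had equal sample sizes (n = 30), and their standard deviations were nearly identical (25.56 and 25.70), which supports the pooled-variance t-test (var.equal = TRUE). With 30 observations per group, the Central Limit Theorem suggests the sampling distribution of the difference in means is approximately normal unless the scores are heavily skewed, so t-based inference is reasonable.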
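The conditions above can be checked quickly in R. In this sketch, `check_t_conditions()` is a hypothetical helper that reports each group's size, standard deviation, and Shapiro-Wilk normality p-value, plus the ratio of the larger to the smaller SD (a common rule of thumb treats a ratio below 2 as acceptable for pooling). The simulated data only mimic the reported group means and SDs; they are not the project data.

```r
library(dplyr)

# Hypothetical helper: quick checks for a pooled two-sample t-test
check_t_conditions <- function(data) {
  data %>%
    group_by(Sports_Participant) %>%
    summarise(
      n = n(),
      SD = sd(Stroop_Score),
      # Small p-values suggest departure from normality within the group
      Shapiro_p = shapiro.test(Stroop_Score)$p.value,
      .groups = "drop"
    ) %>%
    mutate(SD_ratio = max(SD) / min(SD))  # close to 1 supports var.equal = TRUE
}

# Simulated stand-in data with roughly the reported means and SDs
set.seed(1)
sim <- tibble(
  Sports_Participant = rep(c("Yes", "No"), each = 30),
  Stroop_Score = c(rnorm(30, mean = 808, sd = 25.6), rnorm(30, mean = 803, sd = 25.7))
)
check_t_conditions(sim)
```

On the project data this would be `check_t_conditions(df)`. Welch's test, `t.test(Stroop_Score ~ Sports_Participant, data = df)` (R's default, `var.equal = FALSE`), is a further check that does not assume equal variances; with equal group sizes and nearly equal SDs it should give almost the same result.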

t.test(Stroop_Score ~ Sports_Participant, data = df, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  Stroop_Score by Sports_Participant
## t = -0.75046, df = 58, p-value = 0.456
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  -18.21428   8.28095
## sample estimates:
##  mean in group No mean in group Yes 
##          802.8667          807.8333

Two-sample t-test results:

t-statistic: -0.750

Degrees of Freedom: 58

p-value: 0.456

95% Confidence Interval for µNo − µYes: (-18.21, 8.28)
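As a check on the R output, the pooled two-sample t statistic can be reproduced by hand from the group summaries (n = 30 per group; means 802.867 and 807.833; SDs 25.70 and 25.56). R orders the groups alphabetically, so the difference is taken as No − Yes, and t* ≈ 2.00 is the 97.5th percentile of the t distribution with 58 degrees of freedom.

```latex
\begin{aligned}
s_p^2 &= \frac{(n_{\text{No}}-1)s_{\text{No}}^2 + (n_{\text{Yes}}-1)s_{\text{Yes}}^2}{n_{\text{No}}+n_{\text{Yes}}-2}
       = \frac{29(25.70)^2 + 29(25.56)^2}{58} \approx 656.9,
       \qquad s_p \approx 25.63 \\
SE &= s_p\sqrt{\tfrac{1}{n_{\text{No}}} + \tfrac{1}{n_{\text{Yes}}}} = 25.63\sqrt{\tfrac{2}{30}} \approx 6.62 \\
t &= \frac{\bar{x}_{\text{No}} - \bar{x}_{\text{Yes}}}{SE} = \frac{802.867 - 807.833}{6.62} \approx -0.75,
     \qquad df = 30 + 30 - 2 = 58 \\
\text{95\% CI} &= (\bar{x}_{\text{No}} - \bar{x}_{\text{Yes}}) \pm t^{*}\,SE \approx -4.97 \pm 2.00(6.62) \approx (-18.2,\ 8.3)
\end{aligned}
```

These agree with R's t = -0.750, df = 58, and interval (-18.21, 8.28) up to rounding.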

Simulation-Based Test (permutation test):

null_dist <- df %>%
  specify(Stroop_Score ~ Sports_Participant) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("Yes", "No"))

obs_stat <- df %>%
  specify(Stroop_Score ~ Sports_Participant) %>%
  calculate(stat = "diff in means", order = c("Yes", "No"))

null_dist %>%
  visualize() +
  shade_p_value(obs_stat = obs_stat, direction = "two-sided")

# Two-sided p-value from the permutation distribution
get_p_value(null_dist, obs_stat = obs_stat, direction = "two-sided")

## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1   0.472

The t-test p-value is 0.456: if there were no true difference in mean Stroop scores, a difference at least as extreme as the one observed would occur about 45.6% of the time. The permutation test gives a similar p-value (0.472). Both are far above the 0.05 significance level, so we fail to reject the null hypothesis.
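What the infer pipeline does can be sketched in a few lines of base R. This is an illustrative re-implementation under assumptions, not infer's code: `perm_diff_test()` is a hypothetical helper, the two-sided p-value is computed as the share of shuffled differences whose absolute value is at least the observed one (infer's `get_p_value()` applies its own two-sided rule, so values can differ slightly), and a fixed seed is added because permutation p-values such as 0.472 change a little from run to run. The toy data are invented.

```r
# Hypothetical helper: permutation test for mean(Yes) - mean(No)
perm_diff_test <- function(y, group, reps = 1000, seed = 1) {
  set.seed(seed)  # makes the random shuffles, and so the p-value, reproducible
  diff_means <- function(g) mean(y[g == "Yes"]) - mean(y[g == "No"])
  observed <- diff_means(group)
  # Under the null of independence the labels are exchangeable, so shuffle them
  null_stats <- replicate(reps, diff_means(sample(group)))
  # Two-sided p-value: proportion of shuffled differences at least as extreme
  p_value <- mean(abs(null_stats) >= abs(observed))
  list(observed = observed, p_value = p_value)
}

# Invented toy data (NOT the project data)
y <- c(800, 810, 790, 820, 805, 795, 815, 812)
group <- rep(c("Yes", "No"), each = 4)
res <- perm_diff_test(y, group)
res$observed  # 805 - 806.75 = -1.75
res$p_value
```

On the project data, `perm_diff_test(df$Stroop_Score, df$Sports_Participant)` should report an observed difference of about +4.97 ms (Yes − No) and a p-value close to the t-test's.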

t.test(Stroop_Score ~ Sports_Participant, data = df, var.equal = TRUE)$conf.int

## [1] -18.21428   8.28095
## attr(,"conf.level")
## [1] 0.95

The 95% confidence interval for µNo − µYes, (-18.21, 8.28), contains zero, which is consistent with failing to reject H0: the observed difference of about 5 ms could plausibly be due to chance. Thus, no statistically significant difference in Stroop scores was found between the two groups.
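Because R orders the groups alphabetically, the t-test's interval is for µNo − µYes, while the hypotheses and the permutation test are phrased as Yes − No. One way to report everything in the same direction, sketched below on invented toy data, is to set the factor levels explicitly so that `t.test()` puts "Yes" first. For the project data this simply flips the signs: t ≈ +0.750 and an interval of about (-8.28, 18.21) for µYes − µNo.

```r
library(dplyr)

# Invented toy data (NOT the project data)
toy <- tibble(
  Sports_Participant = rep(c("Yes", "No"), each = 4),
  Stroop_Score = c(800, 810, 790, 820, 805, 795, 815, 812)
)

# Make "Yes" the first level so t.test() reports mean(Yes) - mean(No)
toy <- toy %>%
  mutate(Sports_Participant = factor(Sports_Participant, levels = c("Yes", "No")))

res <- t.test(Stroop_Score ~ Sports_Participant, data = toy, var.equal = TRUE)
res$estimate  # group means, "Yes" listed first
res$conf.int  # 95% CI for mu_Yes - mu_No
```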

Conclusion and Discussion

My analysis provides no statistically significant evidence that a single session of high-intensity physical activity improves performance on the Stroop Interference Test. The data also did not match my original prediction: the activity group's mean score was slightly higher (worse) than the control group's, although this difference was not significant. While the literature suggests long-term benefits of exercise, the brief physical exertion in this study may have been too short to produce observable cognitive gains. In future iterations, I would recruit a more representative random sample, track participants over a longer period, and include more comprehensive cognitive measures; this design would be more sensitive to subtle or long-term effects. Future work could also examine repeated exposure to physical training or compare different exercise intensities. Finally, a larger sample would increase statistical power and help clarify whether any relationship exists.
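The point about a larger sample can be made concrete with a power calculation. The sketch below uses base R's `power.t.test()`; the target difference of 5 ms (close to the difference observed here) and SD of 25.6 ms (close to both groups' SDs) are assumptions chosen for illustration, not values from a formal study design. The normal-approximation formula 2(z_0.975 + z_0.80)^2 σ^2 / δ^2 already gives roughly 410 people per group for 80% power, far more than the 30 per group used here.

```r
# Per-group n for a two-sided pooled two-sample t-test at alpha = 0.05.
# delta and sd are illustrative assumptions, not design values from the study.
plan <- power.t.test(delta = 5, sd = 25.6, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
ceiling(plan$n)  # participants needed in EACH group
```

Smaller true effects would require even larger samples, since the required n grows with (σ/δ)^2.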

Bibliography

Zhang, Wenwen, Chunxiao Zhou, and Aoxiang Chen. 2024. “A Systematic Review and Meta-Analysis of the Effects of Physical Exercise on White Matter Integrity and Cognitive Function in Older Adults.” Geroscience 46: 2641–2651. https://doi.org/10.1007/s11357-023-01033-8.