This project investigates whether adult Islanders with siblings are more physically active than those without, as measured by self-reported exercise over the past week. The population parameter of interest is the true difference in mean total weekly exercise time (in minutes) between Islanders who have siblings and those who do not.

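# Load packages used below (dplyr for %>% and mutate, ggplot2 for plots)
library(dplyr)
library(ggplot2)

# Load the dataset
data <- read.csv("~/Ontiveros - Project Data - Sheet1.csv")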

# Clean up column names for easier referencing
colnames(data) <- c("Participant", "Gender", "Age", "Siblings", "How_many_siblings",
                    "Low_Intensity", "Moderate_Intensity", "Vigorous_Intensity", "Location")

# Remove the first row which is the header information
data <- data[-1, ]

# Convert relevant columns to appropriate types
data$Siblings <- as.factor(data$Siblings)
data$Low_Intensity <- as.numeric(data$Low_Intensity)
data$Moderate_Intensity <- as.numeric(data$Moderate_Intensity)
data$Vigorous_Intensity <- as.numeric(data$Vigorous_Intensity)
data <- data %>%
  mutate(Total_Exercise_Time = Low_Intensity + Moderate_Intensity + Vigorous_Intensity)
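
Because as.numeric() converts any entry it cannot parse into NA, and a single NA in one intensity column makes that respondent's total NA, a quick check after the conversion can be useful. The sketch below uses a small hypothetical data frame (`toy`, with made-up values, not the survey responses) to count missing values and group sizes.

```r
# Hypothetical toy data standing in for the survey sheet (not the real responses)
toy <- data.frame(
  Siblings = c("yes", "no", "yes", "no", "yes"),
  Low_Intensity = c("120", "60", "n/a", "0", "300"),
  stringsAsFactors = FALSE
)

# as.numeric() turns unparseable entries such as "n/a" into NA (with a warning)
toy$Low_Intensity <- suppressWarnings(as.numeric(toy$Low_Intensity))

# Count missing values per column and respondents per sibling group
na_counts <- colSums(is.na(toy))
group_sizes <- table(toy$Siblings)

print(na_counts)
print(group_sizes)
```

Running the same two checks on the real data would show how many respondents fall in each group and whether any totals were lost to missing values.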

Prior literature suggests that siblings play a substantial role in influencing childhood physical activity. According to a meta-analysis, children with siblings are significantly more active than only children, often due to increased opportunities for spontaneous and social play (Kracht and Sisson, 2018). The American Heart Association similarly reports that children with siblings develop healthier routines, including better physical activity levels and improved dietary patterns (Merschel, 2024). These early patterns may extend into adulthood, since routines established in childhood, especially those involving siblings, can shape long-term attitudes toward exercise. Adults who grew up with active siblings may internalize exercise as a routine or social behavior, and ongoing sibling relationships can serve as sources of accountability or motivation for maintaining physical activity. Based on this research and personal experience, I expected the average total exercise time to be higher among Islanders with siblings.

The observational units for this study were 39 adult Islanders (age 18 and older) who completed a voluntary survey. The survey first collected sibling status (yes/no and how many), followed by self-reported minutes spent exercising at low, moderate, and vigorous intensity during the previous 7 days. These values were summed to produce a single “Total Exercise Time” variable per respondent. During data collection, I encountered several limitations. First, the response rate was low (approximately 25%), and many individuals declined to participate. Second, because more responses came from participants with siblings, I began excluding additional “yes” responses in an effort to balance the groups, which introduced selection bias (a non-sampling error). Third, I did not follow up with non-respondents; doing so could have increased the sample size and reduced nonresponse bias.

Descriptive statistics were used to examine the relationship between sibling status and total exercise time. A boxplot comparing the two groups showed no major differences in medians or interquartile ranges (IQRs), though the whiskers did differ slightly.

# Calculate IQR statistics by Sibling Status
iqr_stats <- data %>%
  group_by(Siblings) %>%
  summarise(
    Q1 = quantile(Total_Exercise_Time, 0.25, na.rm = TRUE),
    Q3 = quantile(Total_Exercise_Time, 0.75, na.rm = TRUE),
    IQR = IQR(Total_Exercise_Time, na.rm = TRUE),
    Median = median(Total_Exercise_Time, na.rm = TRUE),
    Min = min(Total_Exercise_Time, na.rm = TRUE),
    Max = max(Total_Exercise_Time, na.rm = TRUE)
  )
print(iqr_stats)
## # A tibble: 2 × 7
##   Siblings    Q1    Q3   IQR Median   Min   Max
##   <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1 no         110 1282. 1172.    490     0  3630
## 2 yes          0 1230  1230     440     0  3500
# Horizontal boxplot with custom colors
ggplot(data, aes(y = Siblings, x = Total_Exercise_Time, fill = Siblings)) +
  geom_boxplot() +
  scale_fill_manual(values = c("no" = "lightblue", "yes" = "lightgreen")) +
  labs(title = "Total Time Spent Exercising by Sibling Status",
       x = "Total Time (minutes)",
       y = "Sibling Status") +
  theme_minimal()

Among participants without siblings, the middle 50% of totals ran from 110 to about 1282 minutes (IQR of about 1172), while for those with siblings it ran from 0 to 1230 minutes (IQR of 1230). The medians were also similar: 490 minutes for those without siblings and 440 minutes for those with siblings. These similarities suggest no strong association based on the visual and numerical summaries alone.

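To assess the relationship formally, I conducted a theory-based two-sample t-test. By default, R's t.test function runs Welch's version, which does not assume equal variances, and a two-sided alternative.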

t_test_total_time <- t.test(Total_Exercise_Time ~ Siblings, data = data)
# Output results
data.frame(standardized_statistic = t_test_total_time$statistic)

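The standardized test statistic (t-value) was 0.466. Because R subtracts the groups in factor-level order (no minus yes), the positive sign means the sample mean was somewhat higher for participants without siblings, the opposite of what I expected. The small magnitude indicates that the observed difference in means is small relative to the variability in the data. While the assumption of independence was reasonable, the validity of the theory-based test is weakened by the small and unequal group sizes and the skewed distribution of total exercise times, particularly among participants without siblings.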
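
To make "small relative to the variability" concrete, the statistic can be rebuilt by hand as the difference in sample means divided by its standard error. This is a minimal sketch using hypothetical exercise totals (`no_sib` and `yes_sib` are made-up values, not the survey data), and it assumes the unpooled (Welch) standard error that t.test uses by default.

```r
# Hypothetical exercise totals in minutes (illustrative values, not the survey data)
no_sib  <- c(0, 110, 490, 900, 1300)
yes_sib <- c(0, 0, 440, 800, 1230, 2000)

# Standard error of the difference in means, without pooling the variances
se_diff <- sqrt(var(no_sib) / length(no_sib) + var(yes_sib) / length(yes_sib))

# Standardized statistic: observed difference divided by its standard error
t_manual <- (mean(no_sib) - mean(yes_sib)) / se_diff

# t.test() orders groups by factor level, so it also computes "no" minus "yes"
toy <- data.frame(
  time = c(no_sib, yes_sib),
  sib  = factor(rep(c("no", "yes"), c(length(no_sib), length(yes_sib))))
)
t_builtin <- unname(t.test(time ~ sib, data = toy)$statistic)

print(c(manual = t_manual, builtin = t_builtin))
```

The two values agree, which also confirms the subtraction order behind the sign of the statistic.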

# Output p-value
data.frame(p_value = t_test_total_time$p.value)

The p-value was 0.65, meaning that if sibling status truly had no effect, there would be about a 65% chance of observing a difference in sample means at least as extreme as the one found, in either direction. Since this p-value is well above any conventional significance level (e.g., 0.05), we fail to reject the null hypothesis. This result suggests that we do not have sufficient evidence to conclude that Islanders with siblings are more physically active than those without.
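
Since the research question is directional (siblings associated with more exercise), a one-sided test is a natural companion to the two-sided default. The sketch below uses hypothetical data (`toy`, made-up values) and assumes the factor levels are ordered "no" then "yes", so the difference is no minus yes and the alternative "siblings exercise more" corresponds to alternative = "less".

```r
# Hypothetical exercise totals in minutes (illustrative values, not the survey data)
toy <- data.frame(
  time = c(0, 110, 490, 900, 1300, 0, 0, 440, 800, 1230, 2000),
  sib  = factor(rep(c("no", "yes"), c(5, 6)))
)

# Default: two-sided alternative (the means differ in either direction)
two_sided <- t.test(time ~ sib, data = toy)$p.value

# One-sided: mean("no") - mean("yes") < 0, i.e. the "yes" group exercises more
one_sided <- t.test(time ~ sib, data = toy, alternative = "less")$p.value

print(c(two_sided = two_sided, one_sided = one_sided))
```

In this toy example the "yes" group has the higher sample mean, so the one-sided p-value is half the two-sided one. When the sample difference points the other way, as the positive statistic in this report indicates, the one-sided p-value is instead one minus half the two-sided value.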

Given the limitations of the theory-based test, I conducted a simulation-based randomization test to verify the result.

set.seed(123)
n_sim <- 10000
simulated_diffs <- replicate(n_sim, {
  shuffled_siblings <- sample(data$Siblings)
  mean(data$Total_Exercise_Time[shuffled_siblings == "yes"]) -
    mean(data$Total_Exercise_Time[shuffled_siblings == "no"])
})
obs_diff <- mean(data$Total_Exercise_Time[data$Siblings == "yes"]) -
            mean(data$Total_Exercise_Time[data$Siblings == "no"])

# Plot
sim_data <- data.frame(diff = simulated_diffs)
ggplot(sim_data, aes(x = diff)) +
  geom_histogram(bins = 50, fill = "lightblue", color = "black") +
  geom_vline(xintercept = obs_diff, color = "lightgreen", linetype = "dashed", linewidth = 1.0) +
  labs(title = "Randomization Distribution of Mean Differences",
       x = "Simulated Differences in Means (Yes - No)",
       y = "Frequency")

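# One-sided (upper-tail) p-value: proportion of shuffles whose (Yes - No)
# difference is at least as large as the observed difference
p_value_sim <- mean(simulated_diffs >= obs_diff)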
p_value_sim
## [1] 0.7109

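The simulation-based p-value was 0.71. This is a one-sided (upper-tail) p-value for the alternative that Islanders with siblings exercise more, so it is not directly comparable to the two-sided theory-based value of 0.65; it exceeds 0.5 because the observed difference (Yes - No) was negative. Both approaches nonetheless lead to the same conclusion: the observed difference could easily arise from random chance alone.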
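
For a randomization p-value that can be compared with a two-sided t-test, one common choice is to count shuffles that are at least as far from zero as the observed difference, in either direction. The sketch below illustrates both versions on hypothetical data (`time` and `sib` are made-up values; the seed, the helper `diff_means`, and the 2000 repetitions are arbitrary choices for illustration).

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical exercise totals in minutes (illustrative values, not the survey data)
time <- c(0, 110, 490, 900, 1300, 0, 0, 440, 800, 1230, 2000)
sib  <- rep(c("no", "yes"), c(5, 6))

# Difference in means (Yes - No) for a given assignment of group labels
diff_means <- function(labels) {
  mean(time[labels == "yes"]) - mean(time[labels == "no"])
}

obs  <- diff_means(sib)
sims <- replicate(2000, diff_means(sample(sib)))

# One-sided: shuffles with a difference at least as large as the observed one
p_upper <- mean(sims >= obs)

# Two-sided: shuffles at least as far from zero as observed, in either direction
p_two_sided <- mean(abs(sims) >= abs(obs))

print(c(upper = p_upper, two_sided = p_two_sided))
```

Applied to the report's own `simulated_diffs` and `obs_diff`, the two-sided line would give the value to set beside the t-test's p-value.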

# Confidence interval
data.frame(
  ci_lower = t_test_total_time$conf.int[1],
  ci_upper = t_test_total_time$conf.int[2]
)

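A 95% confidence interval for the difference in means (no siblings minus siblings) ranged from -683.4 to 1059.6 minutes. Because this interval includes zero, it is plausible that there is no difference at all in average total exercise time between the two groups. The interval is also very wide, spanning more than 1,700 minutes, so the data are consistent with sizable differences in either direction. This aligns with the earlier conclusion drawn from hypothesis testing.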
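
The interval has the form estimate ± t* × SE. As a rough sketch of where those pieces come from, the code below rebuilds a Welch interval by hand on hypothetical data (`no_sib` and `yes_sib` are made-up values, not the survey data) and compares it with the interval from t.test.

```r
# Hypothetical exercise totals in minutes (illustrative values, not the survey data)
no_sib  <- c(0, 110, 490, 900, 1300)
yes_sib <- c(0, 0, 440, 800, 1230, 2000)

# Per-group variance of the sample mean, and the standard error of the difference
v_no  <- var(no_sib) / length(no_sib)
v_yes <- var(yes_sib) / length(yes_sib)
se    <- sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_no + v_yes)^2 /
  (v_no^2 / (length(no_sib) - 1) + v_yes^2 / (length(yes_sib) - 1))

# 95% interval: point estimate plus or minus the critical value times the SE
estimate  <- mean(no_sib) - mean(yes_sib)
ci_manual <- estimate + c(-1, 1) * qt(0.975, df) * se

ci_builtin <- as.numeric(t.test(no_sib, yes_sib)$conf.int)
print(rbind(manual = ci_manual, builtin = ci_builtin))
```

With small groups and large variances, the standard error dominates, which is why intervals of this kind end up so wide.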

Overall, the data did not provide evidence that sibling status affects physical activity levels in adulthood. Both the theory-based and simulation-based p-values were large, and the confidence interval included zero. However, several limitations, including selection bias, a low response rate, and small subgroup sizes, mean these results should be interpreted cautiously. Future studies could improve on this design by implementing automated, randomized sampling methods and examining other aspects of sibling relationships, such as closeness or frequency of interaction. Further studies could also assess whether sibling status is associated with performance or persistence in structured physical activity environments, such as HIIT classes. Despite the null result, this study offers a starting point for examining the long-term impact of early family dynamics on adult health behaviors.

References

Kracht, Chelsea L, and Susan B Sisson. “Sibling Influence on Children’s Objectively Measured Physical Activity: A Meta-Analysis and Systematic Review.” BMJ Open Sport & Exercise Medicine, 26 July 2018, bmjopensem.bmj.com/content/4/1/e000405.

Merschel, Michael. “The Surprising Ways Your Siblings and Your Health May Be Linked.” American Heart Association, 9 Apr. 2024, www.heart.org/en/news/2024/04/09/the-surprising-ways-your-siblings-and-your-health-may-be-linked.
