Math 247 Final Project Report

Introduction

Existing literature has established a strong link between long-term endurance exercise (like running) and improvements in cardiovascular health, including lower rates of heart disease and a lower resting heart rate (RHR) (Bachoń et al., 2024). RHR can be used as a simple quantitative metric for assessing an individual’s cardiovascular health (Saxena et al., 2013). Unfortunately, exercising consistently every day for several months is a great deal of effort for benefits that are relatively difficult to visualize. If some of these same benefits could also be observed without the long-term input, it could motivate more people to get healthier by giving them a more concrete, visible product of their efforts on the way to the larger benefit of long-term change.

To test for this connection, we will investigate whether a single run can produce a statistically significant decrease in an individual’s RHR (research question). To do this we will use the virtual human population on The Islands, a virtual world designed by the University of Queensland. We will measure each virtual subject’s RHR before and after a short run to see whether that level of stimulus is enough to help them get healthier; the population parameter of interest is the mean change in RHR from before the run to after it. If this level of input is enough to make a difference, we expect to see a slight decrease in RHR.

Data Collection Methods

We collected data on the Islanders (the observational units) by arbitrarily selecting a town on The Islands, then randomly selecting houses using a random number generator, then asking the people in those houses to participate in the study. We received positive responses from over 70% of those asked, including some from the parents of minors and from dead people. Unfortunately, infants cannot be forced to run, and dead people don’t have a heart rate, so we had to cut them from our study.
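The house selection step could be sketched in R as follows. The number of houses in the town, the number of houses approached, and the seed are hypothetical values chosen for illustration only; the report does not state them.

```r
# Hypothetical values: the report does not say how many houses the town
# has or how many were approached.
set.seed(247)        # fixed seed so the draw can be reproduced
n_houses   <- 300    # assumed number of houses in the chosen town
n_to_visit <- 40     # assumed number of houses to approach

# Draw house numbers without replacement so no house is visited twice
houses_to_visit <- sort(sample(seq_len(n_houses), size = n_to_visit))
houses_to_visit
```

Sampling without replacement gives every house in the town the same chance of being chosen, which is what makes the later inference to the town's willing residents reasonable.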

After collecting a large pool of subjects, we measured each subject’s RHR and entered it into our database. We then sent them out on a short run (30 minutes). Immediately after the run their heart rate was very high due to the stimulus of exercise, so they were given 5 hours to recover before we took their heart rate again. We tried to ask them how the run felt, whether they enjoyed it, and so on, as it could have been interesting to compare positive or negative perceptions of exercise with empirically measured cardiovascular health, but the simulated people were unable to understand our questions. Once we had collected an initial and a final RHR (the measured variables) for each subject, we thanked our participants and moved on to data analysis.

Descriptive Statistics

Code for Figures

Figure 1:

{r}

library(ggplot2)

# Resting heart rate (bpm) for each subject before and after the run
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

df <- data.frame(Initial = initial, Final = final)
df$Change <- df$Final - df$Initial

# Scatterplot of final vs initial RHR; the dashed line marks Final = Initial
ggplot(df, aes(x = Initial, y = Final, color = Change)) +
  geom_point(size = 3) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "black") +
  scale_color_gradient2(low = "#1f78b4", mid = "gray90", high = "#e31a1c",
                        midpoint = 0) +
  labs(
    title = "Final vs Initial Resting Heart Rate",
    subtitle = "Blue = Decrease, Red = Increase",
    x = "Initial RHR (bpm)",
    y = "Final RHR (bpm)",
    color = "Change (Final - Initial)"
  ) +
  coord_fixed() +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5)
  )

Figure 2:

{r}

library(ggplot2)
library(patchwork)

# Summary statistics of the initial and final RHR (bpm)
mean_initial <- 72.375
mean_final <- 72.25
median_initial <- 69.5
median_final <- 68

df_mean <- data.frame(
  Condition = factor(c("Initial", "Final"), levels = c("Initial", "Final")),
  Value = c(mean_initial, mean_final)
)

df_median <- data.frame(
  Condition = factor(c("Initial", "Final"), levels = c("Initial", "Final")),
  Value = c(median_initial, median_final)
)

# Shared, zoomed-in y-axis so the two panels are directly comparable
y_limits <- c(67, 73)

custom_colors <- c("Initial" = "skyblue", "Final" = "orange")

plot_mean <- ggplot(df_mean, aes(x = Condition, y = Value, fill = Condition)) +
  geom_bar(stat = "identity", width = 0.4, color = "black") +
  geom_text(aes(label = round(Value, 3)), vjust = -0.5, size = 5) +
  scale_fill_manual(values = custom_colors) +
  coord_cartesian(ylim = y_limits) +
  labs(title = "Mean Resting Heart Rate", y = "Beats per Minute", x = NULL) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "none"
  )

plot_median <- ggplot(df_median, aes(x = Condition, y = Value, fill = Condition)) +
  geom_bar(stat = "identity", width = 0.4, color = "black") +
  geom_text(aes(label = round(Value, 1)), vjust = -0.5, size = 5) +
  scale_fill_manual(values = custom_colors) +
  coord_cartesian(ylim = y_limits) +
  labs(title = "Median Resting Heart Rate", y = "Beats per Minute", x = NULL) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    legend.position = "none"
  )

plot_mean + plot_median
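The means and medians in the Figure 2 code are typed in by hand. As a cross-check, the same values can be computed directly from the RHR vectors used for Figure 1. This is a sketch only; the `summary_stats` name is introduced here for illustration.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

# Compute the summary statistics instead of hard-coding them
summary_stats <- data.frame(
  Condition = c("Initial", "Final"),
  Mean = c(mean(initial), mean(final)),
  Median = c(median(initial), median(final))
)
summary_stats
```

Computing the values this way keeps the figure in sync with the data if any measurement is later corrected.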

Figure 3:

{r}

# Change in RHR as a proportion of each subject's heart rate
raw_data <- c(
  "0", "0.01503759398", "-0.04444444444", "-0.01526717557",
  "0.02298850575", "0", "-0.01481481481", "-0.009302325581",
  "0", "-0.02564102564", "-0.01438848921", "0.01503759398",
  "-0.01257861635", "0.01081081081", "-0.01739130435", "0.03174603175",
  "0.02702702703", "-0.01183431953", "0.0157480315", "-0.01652892562",
  "0.02777777778", "0", "-0.03389830508", "0"
)

# Convert the text values to numbers and drop anything that failed to parse
clean_data <- as.numeric(raw_data)
clean_data <- clean_data[!is.na(clean_data)]

boxplot(clean_data,
        main = "Change as Proportion of Heart Rate",
        xlab = "Change",
        horizontal = TRUE,
        col = "lightblue",
        border = "darkblue")

# Overlay the individual points; stripchart() draws horizontally by default
# (vertical = FALSE), so no orientation argument is needed
stripchart(clean_data, method = "stack", jitter = 0.1, pch = 19,
           col = "darkred", add = TRUE)

# Reference line at zero change
abline(v = 0, col = "red", lty = 2)
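The proportions plotted in Figure 3 are entered as literal values. They appear to match each subject's change divided by the average of that subject's two readings, so a sketch like the following should reproduce them from the raw vectors. The choice of denominator is inferred from the numbers, not stated in the report.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

# Assumption: each change is scaled by the mean of the subject's two readings
prop_change <- (final - initial) / ((initial + final) / 2)
round(prop_change, 4)
```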

Figure 4:

{r}

# Uses the initial and final vectors defined in the Figure 1 code
heart_rate <- data.frame(
  Rate = c(initial, final),
  Group = factor(rep(c("Initial", "Final"), each = length(initial)),
                 levels = c("Initial", "Final"))
)

boxplot(Rate ~ Group, data = heart_rate,
        main = "Side-by-Side Boxplot of Initial and Final Resting Heart Rates",
        xlab = "Group",
        ylab = "Resting Heart Rate",
        col = c("lightblue", "lightgreen"))

Figure 5:

{r}

library(ggplot2)
library(dplyr)
library(tidyr)

data <- data.frame(
  initial_rhr = c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
                  80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53),
  final_rhr = c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
                79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)
)

# One row per subject per measurement, with an id linking each pair
long_data <- data %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = c(final_rhr, initial_rhr),
               names_to = "Type", values_to = "Value")

# Paired dot plot: a dashed line joins each subject's two readings
ggplot(long_data, aes(x = Type, y = Value, group = id, color = Type)) +
  geom_point(size = 4, alpha = 0.8) +
  geom_line(aes(group = id), color = "darkgrey", linetype = "dashed") +
  scale_color_manual(values = c("lightgreen", "pink")) +
  theme_minimal() +
  labs(title = "Resting Heart Rate Change: Initial vs Final",
       x = "",
       y = "Resting Heart Rate (RHR)") +
  theme(
    legend.position = "none",
    text = element_text(size = 14, family = "Arial"),
    plot.title = element_text(face = "bold", size = 18),
    axis.text = element_text(color = "black")
  )

Analysis of Results

In our data, derived from a random sample of our population (Islanders), we noticed little difference between the mean initial and mean final RHR. The null hypothesis was that there is no difference between the population mean initial RHR ($\mu_1$) and the population mean final RHR ($\mu_2$) ($H_0$: $\mu_1=\mu_2$), and the alternative hypothesis was that RHR decreases from the initial to the final measurement ($H_a$: $\mu_1>\mu_2$). In this study, a Type I error would mean that we attribute a short-term cardiovascular benefit to a short run when no real difference is present in the population. A Type II error would mean that we fail to detect the effect of running on short-term cardiovascular health, finding no evidence of an effect when in reality there are benefits to be had. The random sampling method employed in this study was designed to minimize sampling bias, but some bias is unavoidable because cardiovascular health may be correlated with willingness to participate in a study. This means that any conclusions from our results can only be extended to people living on The Islands who would be willing to consent to a research study.
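Because each Islander contributes both measurements, the hypotheses can equivalently be written in terms of the mean paired difference. Taking $\mu_1$ and $\mu_2$ to be the population mean initial and final RHR, the symbol $\mu_d$ and the sign convention (final minus initial) are introduced here for illustration; that convention is assumed because it matches the negative test statistic reported for this study.

```latex
\[
\mu_d = \mu_2 - \mu_1, \qquad
H_0:\ \mu_d = 0, \qquad
H_a:\ \mu_d < 0
\]
\[
t = \frac{\bar{d}}{s_d / \sqrt{n}}
\quad \text{with } n - 1 \text{ degrees of freedom}
\]
```

Here $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the $n$ paired differences.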

The standardized statistic in this paired t-test is -0.43963, which is far too close to zero to indicate a statistically significant change in the data. The observed mean difference is only about 0.44 standard errors below zero (the null value). The validity conditions for this paired t-test include having paired data (yep!), normally distributed differences between paired values (yep!) and/or n>20 (also yep, n = 24!), and random sampling (yep!). These conditions are met (for more on the verification of the normal distribution see Appendix A: Verifying Validity Conditions), so we can move on with this paired t-test.
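As a rough check, the standardized statistic can be recomputed by hand from the RHR vectors listed under Code for Figures. This sketch assumes the differences are taken as final minus initial, which matches the sign of the reported statistic; the variable names are illustrative.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

d      <- final - initial      # paired differences (assumed sign convention)
n      <- length(d)            # number of pairs
mean_d <- mean(d)              # observed mean difference
se_d   <- sd(d) / sqrt(n)      # standard error of the mean difference
t_stat <- mean_d / se_d        # standardized statistic, df = n - 1

t_stat                         # compare with the reported -0.43963
```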

The theory-based approach gave us a p-value of 0.6643. This means that, assuming the run has no effect on final RHR, about 66% of random samples would show a difference in means as extreme as or more extreme than the one we observed; in other words, the observed value is closer to zero than about 2/3 of the results expected under the null hypothesis. This p-value is two-sided; the one-sided value matching $H_a$ is half of it (about 0.33), which is still far from any conventional significance level. We do not have sufficient evidence from the theory-based p-value to reject the null hypothesis (fail to reject $H_0$). This means that we cannot conclude from this data that a single run has a short-term impact on an individual’s RHR.
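A minimal sketch of how the theory-based p-value might be obtained with base R's `t.test()`. The argument order (final first) is an assumption chosen to match the sign of the reported statistic, and the one-sided call is included only to show the p-value that corresponds to the directional alternative.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

# Two-sided paired t-test (the default alternative)
res_two <- t.test(final, initial, paired = TRUE)

# One-sided version for the alternative "final RHR is lower than initial"
res_one <- t.test(final, initial, paired = TRUE, alternative = "less")

res_two$p.value   # compare with the reported 0.6643
res_one$p.value   # one-sided p-value for the directional alternative
```

Because the t distribution is symmetric and the observed difference lies in the direction of the alternative, the one-sided p-value is half the two-sided one.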

The 95% confidence interval from the paired t-test ranges from -0.7131772 to 0.4631772. This means that we are 95% confident that the true population mean difference (final minus initial RHR) is somewhere between -0.713 and 0.463 bpm. This confidence interval contains 0, so no change in RHR is a plausible value for the true mean difference. This brings us to the same conclusion as our p-value: we lack evidence of a significant difference between the initial and final RHR. We still do not have evidence of an effect of a short run on short-term cardiovascular health.
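The interval can also be rebuilt by hand as the mean difference plus or minus a t critical value times the standard error. This is a sketch under the same assumed sign convention (final minus initial); the variable names are illustrative.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

d      <- final - initial            # paired differences
n      <- length(d)
se_d   <- sd(d) / sqrt(n)            # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)      # critical value for 95% confidence

ci <- mean(d) + c(-1, 1) * t_crit * se_d
ci                                   # compare with -0.7131772 and 0.4631772
```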

Conclusion

This study was unable to provide sufficient evidence to support the existence of a short-term benefit to cardiovascular health from a single run. The subjects’ average resting heart rate did not change by any notable amount after the stimulus of a run. We saw some behavior in the Islanders in the study that is characteristic of real human heart rates, like consistency over time and elevation after physical exertion. We did not, however, detect a decrease in RHR after the stimulus of a single run. The random sampling in this study minimized bias, but some bias is unavoidable because cardiovascular health may be correlated with willingness to participate. As such, the conclusions only apply to people living on The Islands who would be willing to consent. In the future, a study could investigate how much the heart rate of the Islanders changes without the stimulus of activity and compare it to the results found here to further back up these conclusions. Future work could also expand this study to a larger sample size, where an average change of -0.125 might be enough to reach statistical significance. Finally, future studies could attempt to observe a long-term association between exercise and cardiovascular health by asking people to run every day for several months and tracking heart rate throughout that time. Such a study would risk a high dropout rate due to its long duration, but it could be worth pursuing if enough resources are available, in order to determine whether the effects seen in the real world (Bachoń et al., 2024) are also seen in the Islanders.
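To give a rough sense of how large a larger sample would need to be, base R's `power.t.test()` can estimate the number of pairs required. The 5% significance level and 80% power are conventional choices assumed here, not values from the study, and the calculation treats the observed mean change and the observed standard deviation of the differences as if they were the true population values.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

d <- final - initial

# Solve for the number of pairs (n is left unspecified on purpose)
pw <- power.t.test(
  delta = abs(mean(d)),          # effect size: observed mean change (0.125 bpm)
  sd = sd(d),                    # SD of the paired differences
  sig.level = 0.05,              # assumed significance level
  power = 0.80,                  # assumed target power
  type = "paired",
  alternative = "one.sided"
)

n_needed <- ceiling(pw$n)        # round up to whole pairs
n_needed
```

Even if such a sample did reach significance, a mean change of about an eighth of a beat per minute would still need to be judged for practical importance.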

Appendix A: Verifying Validity Conditions

Check for normality:

Visualizing shape of distribution:

Comparing to a perfect normal distribution:
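The figures for this appendix are not reproduced here. A minimal sketch of one way such a check could be run on the paired differences is shown below; it is not necessarily the code used for the original figures, and the Shapiro-Wilk test is an extra step added for illustration.

```r
# RHR vectors copied from the Figure 1 code
initial <- c(63, 66, 69, 66, 86, 76, 68, 108, 72, 79, 70, 66,
             80, 92, 58, 62, 73, 85, 63, 61, 71, 90, 60, 53)
final <- c(63, 67, 66, 65, 88, 76, 67, 107, 72, 77, 69, 67,
           79, 93, 57, 64, 75, 84, 64, 60, 73, 90, 58, 53)

d <- final - initial   # the paired t-test's normality condition is about these

# Shape of the distribution: one bar per whole-bpm difference
hist(d, breaks = seq(-3.5, 2.5, by = 1),
     main = "Differences in RHR (Final - Initial)",
     xlab = "Difference (bpm)", col = "lightblue")

# Comparison with a normal distribution: points near the line suggest normality
qqnorm(d, main = "Normal Q-Q Plot of Differences")
qqline(d, col = "red", lty = 2)

# Optional formal test; the differences are whole numbers with many ties,
# so the plots are likely the more informative check here
sw <- shapiro.test(d)
sw
```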

Bibliography

Bachoń, E., Doligalska, M., Stremel, A., Wesołowska, W., Leszyńska, A., Iwańska, M., & Bałoniak, Z. The Impact of Running on Cardiovascular Health: A Comprehensive Review of Benefits and Risks, Quality in Sport, Volume 35, 2024, Pages 56445-56445, (https://www.researchgate.net/publication/387304723_The_Impact_of_Running_on_Cardiovascular_Health_A_Comprehensive_Review_of_Benefits_and_Risks).

Saxena, A., Minton, D., Lee, D.C., Sui, X., Fayad, R., Lavie, C.J., & Blair, S.N. Protective role of resting heart rate on all-cause and cardiovascular disease mortality, Mayo Clinic Proceedings, Volume 88, Issue 12, 2013, Pages 1420-1426, (https://doi.org/10.1016/j.mayocp.2013.09.011).

McManus, J. Math 247 Final Project Report, RPubs, 13 May 2025. https://rpubs.com/jackmcmanus/1309926

Letter of Learning

This project helped me become more confident than ever in my ability to apply statistical reasoning to analyze data from a research study. The most meaningful part of the study for me was learning to convert the data that I had gathered into slick-looking R graphs. I took this class because I need to be able to use R for my research position this coming summer, and this time I actually felt like I was turning raw data into clear and concise visuals, compared to most of the semester, when I was just clicking through and using code and data that had already been worked out for me. I struggled with the data collection in The Islands, and found that the program worked best when I kept everything as basic as possible. By avoiding the newspaper and completing as much of the work outside of The Islands as possible, I was able to work around the areas where the program didn’t work for me. I employed many strategies in this project, the most beneficial of which made it into the final report here. I got to learn how to use the paired t-test, which was perfect for the study that I designed. My favorite output from this project is the graph with variable colors to indicate positive/negative changes. It was fun to think through how I could make that data clearer, and I am very happy with how it worked out.