Math 247 Final Project Report

Introduction

As a college athlete who aspires to earn a high salary in the field of biostatistics, I am interested in exploring whether there is an association between a person’s net worth and their physical fitness. The research question for my project is: is a person’s net worth associated with their cardiovascular fitness? Net worth will be a binary categorical variable (whether or not the subject has a net worth above $5,000), and cardiovascular fitness will be a quantitative variable (score on a multi-stage shuttle run). I believe the shuttle run is a good measure of physical fitness because it can be used across a diverse set of people more fairly than a sprinting test, a weightlifting test, or a similar alternative. It seems to me to be the fairest test for a subject group consisting of males and females of many different ages and health levels.

The population parameter of interest for this study was the difference in mean shuttle run score between Islanders with a net worth above $5,000 and Islanders with a net worth below $5,000, denoted \[\mu_{High} - \mu_{Low}\] where High represents Islanders with a net worth greater than 5,000 dollars and Low represents Islanders with a net worth below 5,000 dollars.

In reviewing previous literature on my topic, I found that nearly every article I read concluded that there is an association between net worth and cardiovascular fitness, namely that a higher net worth tends to be associated with higher cardiovascular fitness. The first research article I read, “Income and Physical Activity among Adults: Evidence from Self-Reported and Pedometer-Based Physical Activity Measurements,” is available through the National Library of Medicine. It reports that “higher income was associated with higher self-reported physical activity for both genders” (Kari et al., 2015, p.1). Although this study by Jaana T. Kari and associates found an association very close to the one I am examining, it used data that was self-reported by the subjects, and it looked at physical activity level rather than cardiovascular fitness. Those two variables are similar, but not quite the same. My study aims to assess the association more accurately by directly testing the cardiovascular fitness of the subjects.

Another article, titled “Are Fitness Levels and Wealth Related?” and published on Medium, discusses the known correlation between the two but questions the direction of causality. Questions like “Do common characteristics like drive and intelligence of higher income people lead them to want to become more physically fit, or does their physical fitness improve their brain’s ability to make sound beneficial decisions about their money?” are what I think of when I ponder this topic. The article looks specifically at a potential causal connection, saying that “Wealthier American states are the fittest, but it is also an inequity issue. Obesity rates are linked to economic status. . . the poorer you are, the more likely you are to be overweight. This is partly due to the fact that healthy, low-calorie food is more expensive than calorie-dense, nutrient-poor food, and partly due to the fact that poor neighborhoods have more poverty. Because poor communities have fewer grocery stores, the poor are forced to shop for groceries at corner stores, petrol stations, and fast food restaurants, especially if they do not own a car” (Jess, 2021). In other words, it proposes that a higher net worth leads to higher physical fitness, since with more money you can buy higher quality foods made with better ingredients. My research question is inspired by this work because it takes a further look at how wealth and cardiovascular fitness may be connected, using direct measurements of the subjects’ cardiovascular fitness.

Before seeing any data, I suspected that the actual parameter value was greater than the null value of 0, meaning that the high net worth group has the higher mean shuttle run score, because the previous literature concluded that a higher net worth tends to be associated with a higher cardiovascular fitness level. I think this may be the case because increased net worth or income allows individuals to access more expensive and higher quality medical services. They are also able to access healthier food options, which are becoming more and more expensive relative to processed fast foods.

Data Collection Methods

In this study, the observational units are the 47 Islanders who were selected to participate. To remove as much bias as possible, I went through each of the 27 towns on the islands, one by one. In each town, I used random.org to generate a random number between 1 and the total number of houses in the town. I then generated another random number between 1 and the number of people living in that house, and that was the person I asked to participate in the study. I only wanted adult Islanders, as children generally do not have an income. If the Islander who was randomly chosen was under the age of 18 or did not consent to the study, I simply generated another house number and person number from within the same town. The response rate was 94%, with 3 Islanders declining consent. In each case, the next person from the town that I asked to participate gave their consent, so I do not think this will have a significant impact on the data and results. Once I had gone through every town on the islands, I restarted from the beginning of my town list in order to obtain enough participants to satisfy the validity conditions for a theory-based t-test. I ended up with more than 20 units in both the high and low categories of the binary net worth variable.

After a subject was selected and gave consent to participate, I recorded the subject’s name, residence, and net worth. They were then asked to complete a multi-stage shuttle run test, after which their score was recorded. A higher score signifies a higher level of cardiovascular fitness.

A potential limitation is the modest sample size: I collected 47 observational units, which some may consider small. Another potential issue is that each town has a different number of Islanders residing within it, meaning that Islanders living in smaller towns are more likely to be selected under my selection method.
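
The two-stage selection described above, and the reason town size matters, can be sketched in R. This is only an illustration: the helper names (select_person, selection_prob) and the house and household counts are hypothetical, and the probability calculation assumes a single draw per town with no redraws for minors or declined consent.

```r
# Hypothetical sketch of the two-stage selection within a single town.
# `residents_per_house` is an illustrative vector, not data from the study.
select_person <- function(residents_per_house) {
  house  <- sample.int(length(residents_per_house), 1)  # stage 1: random house
  person <- sample.int(residents_per_house[house], 1)   # stage 2: random resident
  c(house = house, person = person)
}

# Chance that one specific Islander is picked in a single draw from their town
# (assumption: one draw, every house and every resident equally likely).
selection_prob <- function(n_houses, household_size) {
  1 / (n_houses * household_size)
}

set.seed(247)
residents_example <- c(2, 4, 1, 3, 5)
pick <- select_person(residents_example)

# Same household size, but a 100-house town versus a 400-house town.
p_small_town <- selection_prob(100, 2)
p_large_town <- selection_prob(400, 2)
p_small_town / p_large_town
```

Under these assumptions, an Islander in the smaller town is four times as likely to be chosen, which is the unequal-probability issue noted above. The same formula suggests that people in smaller households are also favored.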

Descriptive Statistics

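library(readr)
library(lattice)  # bwplot()
library(mosaic)   # favstats(), stat(), pval(), and confint() for t-test results

# Read the collected data
Final_Project <- read_csv("~/Downloads/Stats 247 Mini-Project 3 Data Collection - Data.csv")

# Side-by-side boxplots of shuttle run score by net worth group
bwplot(Score ~ Worth, 
       horizontal = FALSE, 
       main = "Side-by-side boxplots",
       data = Final_Project)

# Summary statistics of shuttle run score for each net worth group
favstats(Score ~ Worth, data = Final_Project)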

Using the data I collected, I created side-by-side boxplots comparing the multi-stage shuttle run scores of the high net worth group (depicted on the left) and the low net worth group (depicted on the right). Both distributions are roughly symmetric and overlap substantially. One thing to note is that the median shuttle run score of the low net worth group is slightly higher than that of the high net worth group, which runs against my initial conjecture. Generally speaking, though, the means, medians, interquartile ranges, and standard deviations of the two groups are very similar. The median shuttle run score of the high net worth group is 8.25, while the median of the low net worth group is 8.60. The means of the high and low net worth groups are 8.55 and 8.30, respectively, and the standard deviations are 1.43 and 1.51, respectively. Given these similar values, there appears to be little association between a person’s net worth and their cardiovascular fitness.

Analysis of Results

Our data values are independent of each other, and the observational units were chosen by a random selection procedure. Additionally, we have at least 20 observational units in each category of the binary net worth variable, with 25 low net worth islanders and 25 high net worth islanders. The data is also distributed roughly symmetrically in both groups, with similar standard deviations. For these reasons, our data meets the validity conditions required for a theory-based two-sample t-test, which is what will be used as the significance test.

The population of interest for this analysis is all adult Islanders, from which the 47 participants in the study were sampled, and the parameter is the difference in mean shuttle run score between Islanders with a net worth above $5,000 and Islanders with a net worth below $5,000. The null hypothesis is that the difference in mean shuttle run score between high net worth Islanders and low net worth Islanders equals 0, while the alternative hypothesis is that this difference is not equal to 0:

\[H_0: \mu_{High} - \mu_{Low} = 0\]
\[H_a: \mu_{High} - \mu_{Low} \ne 0\]

In the context of this study, a type I error would be a false positive: we would reject the null hypothesis even though it is actually true. If a type I error occurs, my statistical analysis would conclude that there is an association between an individual’s net worth and their cardiovascular fitness, even though in reality there is not. A type II error would be a false negative, where my study would fail to detect an association between my variables, even though in reality there is one.
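
To make the idea of a type I error concrete, here is a small simulation sketch in base R. The group sizes, mean, and standard deviation are illustrative assumptions chosen to be on the same scale as the shuttle run scores. They are not the study data. Both simulated groups are drawn from the same distribution, so every rejection at the 0.05 level is a false positive.

```r
# Simulate the type I error rate of a two-sample t-test when the null is true.
# All numbers below are illustrative assumptions, not data from the study.
set.seed(247)
n_sims <- 2000

p_values <- replicate(n_sims, {
  high <- rnorm(25, mean = 8.4, sd = 1.5)  # both groups share the same mean,
  low  <- rnorm(25, mean = 8.4, sd = 1.5)  # so the null hypothesis holds
  t.test(high, low)$p.value
})

# Proportion of simulated studies that (wrongly) reject at the 0.05 level
type1_rate <- mean(p_values < 0.05)
type1_rate
```

The resulting proportion should sit near 0.05, which is what the significance level is meant to control.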

This sample can reasonably be considered representative of adult Islanders as a whole, as every town is represented in the study and two stages of randomness were used to decide the residence and the individual selected. However, because the towns likely vary in size, an adult Islander may be more or less likely to be chosen for the study than other adult Islanders, depending on which town they are from.

I used a theory-based two sample t-test to determine the statistical significance of our data:

stat(t.test(Score ~ Worth, data = Final_Project))

##         t 
## 0.5916511
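
As a sketch of what this statistic measures, the unpooled (Welch) two-sample t-statistic, which is the default in R’s t.test, has the form below. The symbols \(s\) and \(n\) denote each group’s sample standard deviation and size.

\[t = \frac{(\bar{x}_{High} - \bar{x}_{Low}) - 0}{SE}, \qquad SE = \sqrt{\frac{s_{High}^2}{n_{High}} + \frac{s_{Low}^2}{n_{Low}}}\]

Working backward from the reported group means (8.55 and 8.30), the implied standard error is roughly

\[SE \approx \frac{8.55 - 8.30}{0.59} = \frac{0.25}{0.59} \approx 0.42\]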

The t-statistic for our data set was found to be 0.59, signifying that the observed difference in means (0.25) falls about 0.59 standard errors away from the null hypothesis difference of 0. This is not a large t-value, which leads me to believe that we do not have much evidence against the null hypothesis. I also used R to determine the two-sided p-value for our data set:

pval(t.test(Score ~ Worth, data = Final_Project))

##   p.value 
## 0.5570638
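
For illustration, a two-sided p-value like this can be approximated directly from the t-statistic in base R. The degrees of freedom used here are an assumption (47 participants minus 2). The Welch test computes its own, possibly non-integer, degrees of freedom, so the result should land close to, but not necessarily exactly on, the value above.

```r
# Approximate the two-sided p-value from the reported t-statistic.
t_stat     <- 0.5916511
df_assumed <- 45  # assumption: 47 - 2; the Welch df from t.test() may differ slightly

# Area in both tails of the t-distribution beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df_assumed)
p_two_sided
```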

The p-value for this data set was found to be 0.56. This means that, assuming the null hypothesis is true, the probability of observing a difference in mean shuttle run score between the high and low net worth groups at least as extreme as 0.25, in either direction, is about 56%. This is much greater than the 0.05 significance level that is typically used, so we do not have strong evidence against the null hypothesis. This is also supported by the 95% confidence interval for our data:

confint(t.test(Score ~ Worth, data = Final_Project))

The 95% confidence interval for this data is (-0.61, 1.12), so we are 95% confident that the difference in mean shuttle run score between high net worth and low net worth Islanders is between -0.61 and 1.12. Notice that 0 is contained within this interval, so 0 is a plausible value of the parameter. This agrees with the conclusion reached by the hypothesis test: we fail to reject our null hypothesis.
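
As a quick consistency check on this interval, a theory-based confidence interval for a difference in means has the general form

\[(\bar{x}_{High} - \bar{x}_{Low}) \pm t^* \times SE\]

so the interval should be centered at the observed difference. Using the reported endpoints:

\[\text{midpoint} = \frac{-0.61 + 1.12}{2} = 0.255, \qquad \text{margin of error} = \frac{1.12 - (-0.61)}{2} = 0.865\]

The midpoint matches the observed difference of about 0.25 up to rounding, and the margin of error is larger than that difference, which is why 0 falls inside the interval.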

Conclusion

In summary, my study found no statistically significant evidence of an association between the net worth of an individual and their cardiovascular fitness. The t-statistic of 0.59, two-sided p-value of 0.56, and 95% confidence interval of (-0.61, 1.12) all support the conclusion that we fail to reject our null hypothesis that the difference in mean shuttle run score between high and low net worth individuals is equal to 0. This is not what I expected based on previous literature related to the topic, as most previous research found a positive association between a person’s income and their physical fitness. However, the conclusions of this study can only be generalized to the population of adult Islanders within this simulation, and no causal relationship can be inferred because net worth was not randomly assigned in the study (you cannot randomly assign someone’s net worth). Additionally, I chose $5,000 as the threshold for determining the net worth status of the Islanders arbitrarily, so as to conveniently create two groups containing enough observational units to meet the criteria for a theory-based t-test. Although my study did not find a statistically significant result, it is possible that a larger-scale study with an increased emphasis on random sampling and a more robust criterion for determining the net worth status of the participants would reach conclusions that mirror previous research on the association between these two variables, namely that higher income individuals tend to have higher cardiovascular fitness.

In terms of future research, multiple cardiovascular fitness tests could be used to build a more complete picture of each participant’s cardiovascular fitness. We could also test the Islanders on other types of fitness, like muscular strength or muscular endurance. This could tell us whether net worth has differing associations with different types of fitness.

Bibliography

Kari, J. T., et al. (2015). Income and Physical Activity among Adults: Evidence from Self-Reported and Pedometer-Based Physical Activity Measurements. PLoS ONE, 10(8). https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552741/

Jess the Avocado (2021). Are Fitness Levels and Wealth Related? DataDrivenInvestor. https://medium.datadriveninvestor.com/are-fitness-levels-and-wealth-related-61c6f783e785