Introduction

Keeping your health in check can help prevent chronic conditions such as heart problems. Good health depends on balance: eating well and exercising consistently. Neglecting one’s health can lead to numerous health problems, including heart disease, which is the leading cause of death in the United States. There are several personal key indicators of heart disease and other health problems, one being body mass index (BMI). BMI is a weight-for-height measure used to estimate body fat and an indicator of one’s risk of illnesses associated with excess body fat. A healthy BMI for both males and females is between 18.5 and 24.9. A BMI that is above, or otherwise outside, the healthy range can lead to severe health problems such as heart disease, high blood pressure, type 2 diabetes, gallstones, respiratory issues, and some malignancies.

Methodology

For each participant, data were collected on variables including BMI, smoking habits, alcohol consumption per week, known stroke(s), mental health, difficulty walking, sex, age, race, known diabetes, physical health, general health, amount of sleep in 24 hours, known kidney disease, and known skin cancer. Although 18 variables were collected for each participant, we examine only the average BMI of males and females to determine which sex has the higher BMI. To do this, we state our null and alternative hypotheses, conduct a two-sample t-test, and construct a 95% confidence interval for the difference in means.

The data were collected by the CDC as part of the Behavioral Risk Factor Surveillance System (BRFSS), which conducts annual telephone surveys to gather data on the health status of U.S. residents. This data set comes from the 2020 survey. A total of 400,000 adults took part in the annual survey. Of these, 167,805 females and 151,990 males are included in this study, each having given information related to their health status.

The participants represented all ages over 18 and a range of races. Few participants were aged 55 or younger, as this data set’s main target columns were heart disease and stroke, which usually do not occur until age 65 or older. The majority of participants were Caucasian, and some were African American; very few identified as Asian American or Latino.

Results

The female and male histograms show that the BMI data do not appear to be normally distributed. The quantile-quantile plots agree: both the male and the female BMI data depart from normality and are heavy-tailed on the right.
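The plots indicate right-skewed data, yet the analysis still relies on t-tests. One common justification, which the original analysis does not state, is that with samples this large the sampling distribution of the mean is close to normal. The sketch below illustrates the idea with simulated lognormal values. These are an assumption standing in for right-skewed BMI data and are not the BRFSS data; the helper `skewness` is also hypothetical.

```r
# Illustration only: simulated right-skewed data, not the BRFSS data.
set.seed(1)

# Hypothetical helper: sample skewness (near 0 for a symmetric distribution)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Right-skewed values on a BMI-like scale (assumed lognormal shape)
raw_values <- rlnorm(5000, meanlog = log(28), sdlog = 0.2)

# Means of 2000 repeated samples of size 1000 from the same distribution
sample_means <- replicate(2000, mean(rlnorm(1000, meanlog = log(28), sdlog = 0.2)))

raw_skew  <- skewness(raw_values)    # skewness of the individual values
mean_skew <- skewness(sample_means)  # skewness of the sample means

print(c(raw = raw_skew, means = mean_skew))
```

If the skewness of the sample means is much closer to zero than that of the raw values, the t-based intervals for the mean are reasonable even though individual BMI values are not normally distributed.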

The mean BMI of male participants was 28.505 while the mean BMI of female participants was 28.162. We are 95% confident that the mean BMI of males is between 28.476 and 28.534 while the mean BMI of females is between 28.129 and 28.195. The standard deviation of BMI was 5.767 for participants who were male and 6.842 for participants who were female. The box plot below shows the higher standard deviation among female participants.

Our null hypothesis is that the mean BMI of males and females is equal; our alternative hypothesis is that, on average, the BMI of males and females is not equal. The test statistic is -15.368, which indicates how many standard errors apart the sample means are (female minus male). We are 95% confident that the difference in mean BMI (female minus male) is between -0.387 and -0.299. The p-value is less than 2.2x10^-16, which is smaller than 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
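As a check on the two-sample result, the test statistic can be recomputed by hand from the summary statistics printed in the appendix. This is a minimal sketch of the Welch (unequal variance) calculation; the variable names are illustrative, and the group sizes are taken as the one-sample degrees of freedom plus one.

```r
# Summary statistics as reported in the appendix output
mean_f <- 28.16244; sd_f <- 6.841990; n_f <- 167805  # n = df + 1
mean_m <- 28.50532; sd_m <- 5.767018; n_m <- 151990

# Standard error of the difference without assuming equal variances (Welch)
se_diff <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Test statistic: difference in means (Female - Male) in standard-error units
t_stat <- (mean_f - mean_m) / se_diff

# Welch-Satterthwaite degrees of freedom
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# 95% confidence interval for the difference in means
ci_diff <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df_welch) * se_diff

print(round(c(t = t_stat, df = df_welch, lower = ci_diff[1], upper = ci_diff[2]), 3))
```

The values should agree closely with the `t.test` output in the appendix, up to rounding of the inputs.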

The proportion of participants with a BMI over the healthy range of 18.5 to 24.9 was 0.638 for female participants and 0.740 for male participants. Among participants with a BMI over the healthy range, we are 95% confident that the mean BMI is between 30.586 and 30.646 for males and between 31.633 and 31.706 for females. Our null hypothesis is that the proportion of male participants with BMI over the healthy range is equal to that of female participants; our alternative hypothesis is that the two proportions are not equal. The test statistic is -63.274, which indicates how many standard errors apart the sample proportions are (female minus male), and we are 95% confident that the difference in proportions (female minus male) is between -0.106 and -0.100. The p-value is less than 2.2x10^-16, which is smaller than 0.05, so we reject the null hypothesis in favor of the alternative hypothesis.
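The comparison of participants over the healthy range can also be framed as a two-proportion test. The sketch below uses `prop.test` as an alternative to the t-test on a TRUE/FALSE indicator used in the appendix; it is not the method the appendix used. The counts are an assumption derived from the appendix output (degrees of freedom plus one for each group).

```r
# Counts derived from the appendix output (assumption: n = df + 1)
over_f <- 107001; n_f <- 167805   # females with BMI > 24.9, all females
over_m <- 112546; n_m <- 151990   # males with BMI > 24.9, all males

# Two-sided two-proportion test, Female listed first to match the t-test order
res <- prop.test(x = c(over_f, over_m), n = c(n_f, n_m),
                 alternative = "two.sided", conf.level = 0.95)

print(res$estimate)  # sample proportions for females and males
print(res$conf.int)  # 95% CI for the difference (Female - Male)
print(res$p.value)
```

With samples this large, the interval from `prop.test` should be very close to the one reported by the t-test on the indicator.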

Discussion

The difference between the mean BMIs for males and females is statistically significant. From our results we can conclude that, based on BMI alone, males are at a higher risk for heart disease than females. However, although the difference in BMIs for males and females is statistically significant, this does not mean that it is clinically significant. An individual’s BMI fluctuating by 0.299 to 0.387 would not raise any health concerns or even be noticed by the average person.

The proportion of participants with BMI over the healthy range is higher for males than it is for females. As with the overall difference in mean BMI, this difference is statistically significant. Although statistically significant, the difference is not clinically significant. This raises a problem with our data set.
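One way to put a number on the gap between statistical and practical significance is a standardized effect size. The sketch below computes Cohen's d for the difference in mean BMI from the summary statistics in the appendix. Cohen's d is not part of the original analysis and is offered here only as an illustration; by the commonly cited rule of thumb, values below about 0.2 are considered small.

```r
# Summary statistics as reported in the appendix output
mean_f <- 28.16244; sd_f <- 6.841990; n_f <- 167805
mean_m <- 28.50532; sd_m <- 5.767018; n_m <- 151990

# Pooled standard deviation across both groups
sd_pooled <- sqrt(((n_f - 1) * sd_f^2 + (n_m - 1) * sd_m^2) / (n_f + n_m - 2))

# Cohen's d: difference in means (Male - Female) in pooled standard deviations
cohens_d <- (mean_m - mean_f) / sd_pooled

print(cohens_d)
```

With these inputs d is roughly 0.05, which is consistent with a difference that a very large sample can detect but that is small relative to the spread of BMI within each group.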

The data set used for these tests considers only 18 variables, such as age, sex, race, known health conditions, and alcohol consumption per week, whereas the original data set contains 279 variables. Since the data set considers only 18 factors that may contribute to heart disease, we cannot say with certainty that a higher BMI alone puts males at a greater risk for heart disease.

One last thing worth noting is that while BMI can be useful for determining health, this measurement does have some flaws. BMI does not consider muscle mass, so it has no way of differentiating where weight comes from in terms of fat vs. muscle.
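This limitation follows directly from how BMI is calculated: weight in kilograms divided by the square of height in metres. A minimal sketch, in which the function names and the two example people are hypothetical, shows that body composition never enters the formula.

```r
# BMI = weight in kilograms divided by height in metres squared
bmi <- function(weight_kg, height_m) weight_kg / height_m^2

# Classify against the healthy range used in this report (18.5 to 24.9)
bmi_range <- function(x) {
  ifelse(x < 18.5, "below healthy range",
         ifelse(x > 24.9, "over healthy range", "healthy range"))
}

# Two hypothetical people with the same height and weight but different builds
athlete   <- bmi(weight_kg = 90, height_m = 1.80)  # assumed muscular build
sedentary <- bmi(weight_kg = 90, height_m = 1.80)  # assumed higher body fat

print(c(athlete = athlete, sedentary = sedentary))
print(bmi_range(c(athlete, sedentary)))
```

Both people receive the same BMI and the same classification, because only weight and height are inputs.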

In the future, it could be useful to look at additional variables to get a better idea of whether a participant actually has heart disease and which variables are most common among participants who do. Other key risk factors for heart disease that we could consider include high blood pressure, high cholesterol, smoking, and diabetic status.

Appendix

# Read in data using .csv
Heart_data <- read.csv("heart_2020_cleaned.csv")
#Heart_data

# Convert Sex from character to a factor with levels "Female" and "Male".
Heart_data$Sex <- as.factor(Heart_data$Sex)

# Split the data by sex
Male <- subset(Heart_data, Sex == "Male")
Female <- subset(Heart_data, Sex == "Female")

# Puts plots in a group of four
par(mfrow = c(2,2))

# Histogram of females BMI
hist(Female$BMI, main = "Histogram of Female BMI", xlab = "BMI", ylab = "Number of Females")

# Histogram of males BMI
hist(Male$BMI, main = "Histogram of Male BMI ", xlab = "BMI", ylab = "Number of Males")

# Quantile-Quantile Plot for Female BMI 
qqnorm(Female$BMI, xlab = "Female Theoretical Quantiles")
qqline(Female$BMI)

# Quantile-Quantile Plot for Male BMI 
qqnorm(Male$BMI, xlab = "Male Theoretical Quantiles")
qqline(Male$BMI)

par(mfrow = c(2,2))
# Histogram of all participants BMI
hist(Heart_data$BMI)
# Histogram of females BMI
hist(Female$BMI, xlab = "Female BMI", ylab = "Number of Females")
# Histogram of males BMI
hist(Male$BMI, xlab = "Male BMI", ylab = "Number of Males")
# Boxplots of both male and female BMI next to each other
boxplot(Heart_data$BMI ~ Heart_data$Sex, xlab = "Genders", ylab = "BMI")

# Calculate the mean BMI of both Male and Female participants
# Able to use tapply to double check the means with the t.test.
tapply(Heart_data$BMI, Heart_data$Sex, mean, na.rm = TRUE)
##   Female     Male 
## 28.16244 28.50532
# Calculate the standard deviation of BMI of both Male and Female participants.
# Able to use tapply to see what the standard deviation of BMI of both male and females
tapply(Heart_data$BMI, Heart_data$Sex, sd, na.rm = TRUE)
##   Female     Male 
## 6.841990 5.767018
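# Welch two-sample t-test of BMI by Sex (unequal variances).
# paired is left at its default (FALSE); some recent R versions reject it in the formula interface.
TEST <- (Heart_data$BMI ~ Heart_data$Sex)

t.test(TEST, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)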
## 
##  Welch Two Sample t-test
## 
## data:  Heart_data$BMI by Heart_data$Sex
## t = -15.368, df = 318168, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -0.3866106 -0.2991518
## sample estimates:
## mean in group Female   mean in group Male 
##             28.16244             28.50532
# t.test orders the groups alphabetically by factor level, which is why the difference is Female - Male. To check the other way, rename Female so that it sorts after Male; in this case we could use ZFemale.
# One-sample t.test of just male BMI, used to obtain the 95% confidence interval for the mean
MaleGroup <- (Male$BMI)

t.test(MaleGroup, mu = 0, alt = "two.sided", conf = 0.95, var.equal = F, paired = F)
## 
##  One Sample t-test
## 
## data:  MaleGroup
## t = 1927, df = 151989, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  28.47632 28.53431
## sample estimates:
## mean of x 
##  28.50532
# One-sample t.test of just female BMI, used to obtain the 95% confidence interval for the mean
FemaleGroup <- (Female$BMI)

t.test(FemaleGroup, mu = 0, alt = "two.sided", conf = 0.95, var.equal = F, paired = F)
## 
##  One Sample t-test
## 
## data:  FemaleGroup
## t = 1686.1, df = 167804, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  28.12970 28.19517
## sample estimates:
## mean of x 
##  28.16244
# A healthy BMI range is 18.5 to 24.9.

# All participants BMI that is greater than 24.9
BMIOver <- subset(Heart_data$BMI, Heart_data$BMI > 24.9)

# Subset the male BMI data that is greater than 24.9
MaleOver <- subset(Male$BMI, Male$BMI > 24.9)

# Subset the female BMI data that is greater than 24.9
FemaleOver <- subset(Female$BMI, Female$BMI > 24.9)
# Relook at Boxplot for Female and Male participants
par(mfrow = c(1,2))

# Boxplot for male participants with BMI greater than 24.9
boxplot(MaleOver, xlab = "Male BMI > 24.9", ylab = "BMI")
min(MaleOver)
## [1] 24.91
max(MaleOver)
## [1] 94.85
# Boxplot for female participants with BMI greater than 24.9
boxplot(FemaleOver, xlab = "Female BMI > 24.9", ylab = "BMI")

min(FemaleOver)
## [1] 24.91
max(FemaleOver)
## [1] 94.66
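# Welch t-test on the indicator BMI > 24.9 (TRUE/FALSE), which compares the
# proportion of participants over the healthy range between the sexes.
AllParticipantsOver <- (Heart_data$BMI > 24.9 ~ Heart_data$Sex)

t.test(AllParticipantsOver, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)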
## 
##  Welch Two Sample t-test
## 
## data:  Heart_data$BMI > 24.9 by Heart_data$Sex
## t = -63.274, df = 319778, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -0.10601743 -0.09964674
## sample estimates:
## mean in group Female   mean in group Male 
##            0.6376508            0.7404829
MaleGroupOver <- (MaleOver)

t.test(MaleGroupOver, mu = 0, alt = "two.sided", conf = 0.95, var.equal = F, paired = F)
## 
##  One Sample t-test
## 
## data:  MaleGroupOver
## t = 1995.1, df = 112545, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  30.58612 30.64628
## sample estimates:
## mean of x 
##   30.6162
FemaleGroupOver <- (FemaleOver)

t.test(FemaleGroupOver, mu = 0, alt = "two.sided", conf = 0.95, var.equal = F, paired = F)
## 
##  One Sample t-test
## 
## data:  FemaleGroupOver
## t = 1699.8, df = 107000, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  31.63325 31.70628
## sample estimates:
## mean of x 
##  31.66976
# Relook at Quantile-Quantile Plots
par(mfrow = c(1,2))

# Quantile-Quantile Plot for Male BMI greater than 24.9
qqnorm(MaleOver, xlab = "Male > 24.9 Theoretical Quantiles")
qqline(MaleOver)

# Quantile-Quantile Plot for Female BMI greater than 24.9
qqnorm(FemaleOver, xlab = "Female > 24.9 Theoretical Quantiles")
qqline(FemaleOver)