Introduction

In data given by John Wiley & Sons Inc., certain health factors, such as cholesterol level, systolic blood pressure, sedentary minutes per week, and obesity, are recorded for each patient in a sample of about 5000 subjects. Along with these measurements, the data include demographic factors for each patient: age, height, marital status, and gender. This large dataset could support many conclusions; the question I chose to focus on is whether average systolic blood pressure differs by gender. The answer could indicate which group has a higher chance of problems associated with blood pressure. It could also inform the normal or healthy range of systolic blood pressure, and whether that range should differ between the genders. A significant difference could help the medical community recognize and assess health problems related to systolic blood pressure. To reach a conclusion from the data, I used a hypothesis test to determine whether mean systolic blood pressure differs between men and women.

Summary of Data

Gender    Female      Male        Total
Mean      122.0426    125.9736    123.9829
Size      2,489       2,426       4,915

As shown in the table above, the sample means of the two groups differ slightly. The female sample average systolic blood pressure was 122.0426 mm Hg, while the male sample average was 125.9736 mm Hg. These values can also be compared in the box plot below. The overall sample mean for systolic blood pressure is 123.9829 mm Hg. Overall systolic blood pressure also has a median of 122 mm Hg and a standard deviation of 18.4508 mm Hg.
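As a rough illustration of how a table of this shape can be built, the sketch below uses a small synthetic data frame. The name `demo` and all of its values are hypothetical stand-ins, not the patients data; only the column names `sysBP` and `gender` follow the appendix.

```r
# Synthetic stand-in for the patients data (hypothetical values, illustration only)
set.seed(42)
demo <- data.frame(
  gender = rep(c("Female", "Male"), times = c(60, 50)),
  sysBP  = c(rnorm(60, mean = 122, sd = 18), rnorm(50, mean = 126, sd = 18))
)

# Per-group mean and sample size
group_mean <- tapply(demo$sysBP, demo$gender, mean)
group_size <- tapply(demo$sysBP, demo$gender, length)

# Assemble the table, adding an overall "Total" column
summary_table <- rbind(
  Mean = c(group_mean, Total = mean(demo$sysBP)),
  Size = c(group_size, Total = nrow(demo))
)
print(round(summary_table, 4))
```

The overall mean is the size-weighted average of the two group means, which is why the Total value in the table above lies between the female and male means.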

Analysis

My null hypothesis was that mean systolic blood pressure is the same for women and men, \(H_0: \mu_1 = \mu_2\), where \(\mu_1\) is the female mean and \(\mu_2\) is the male mean; my alternative hypothesis was that the means are not equal, \(H_A: \mu_1 \neq \mu_2\). The two-sided test gave a test statistic of -7.5214625, with an associated p-value of \(6.4130704 \times 10^{-14}\). The one-sided alternatives share the same test statistic but have their own p-values, as stated below.

The test statistic for the alternative \(H_A: \mu_1 > \mu_2\) is -7.5214625, with corresponding p-value 1.
The test statistic for the alternative \(H_A: \mu_1 < \mu_2\) is -7.5214625, with corresponding p-value \(3.2065352 \times 10^{-14}\).
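The way one test statistic yields three different p-values can be sketched as follows. The data are synthetic (the vectors `grp1` and `grp2` are hypothetical, not the patients data), and the sketch assumes the Welch form of the test, which is what `t.test` uses when `var.equal` is left at its default.

```r
# Synthetic two-group sample (hypothetical values; NOT the patients data)
set.seed(2024)
grp1 <- rnorm(150, mean = 122, sd = 18)
grp2 <- rnorm(140, mean = 126, sd = 18)

# Welch t statistic and Welch-Satterthwaite degrees of freedom, by hand
v1 <- var(grp1) / length(grp1)
v2 <- var(grp2) / length(grp2)
t_stat <- (mean(grp1) - mean(grp2)) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (length(grp1) - 1) + v2^2 / (length(grp2) - 1))

# p-values under the three alternatives
p_less    <- pt(t_stat, df)                      # H_A: mu_1 < mu_2
p_greater <- pt(t_stat, df, lower.tail = FALSE)  # H_A: mu_1 > mu_2
p_two     <- 2 * pt(-abs(t_stat), df)            # H_A: mu_1 != mu_2

# Cross-check the one-sided "less" result against t.test
fit <- t.test(grp1, grp2, alternative = "less")
print(c(by_hand = p_less, t.test = fit$p.value))
print(c(less = p_less, greater = p_greater, two_sided = p_two))
```

The two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them. The reported values follow the same pattern: \(2 \times 3.2065352 \times 10^{-14} = 6.4130704 \times 10^{-14}\).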

I also found the 99% confidence interval for the difference in mean systolic blood pressure (female minus male) to be (-5.2777963, -2.5842672) mm Hg.
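For reference, an interval of this kind has the standard Welch form, assuming the default unequal-variance setting of `t.test`. The symbols \(\bar{x}_i\), \(s_i\), \(n_i\), and \(\nu\) (sample mean, sample standard deviation, sample size, and approximate degrees of freedom) are notation introduced here, not taken from the original output:

\[
(\bar{x}_1 - \bar{x}_2) \pm t^{*}_{0.995,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad
\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
\]

As a consistency check, the midpoint of the reported interval, \((-5.2777963 - 2.5842672)/2 \approx -3.9310\), matches the difference of the sample means, \(122.0426 - 125.9736 = -3.9310\).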

Interpretation

Even from the summary statistics alone, a difference in the sample means is visible: the female average systolic blood pressure is lower than the male average, and the two fall on opposite sides of the overall mean. The hypothesis test gave a p-value far smaller than any common significance level (0.10, 0.05, or 0.01), so I reject the null hypothesis at the 1% level. This is strong evidence that mean systolic blood pressure differs between females and males. The confidence interval for the female-minus-male difference agrees. Because both bounds are negative, the interval excludes zero and indicates that the female mean is lower than the male mean. With 99% confidence, the true mean systolic blood pressure of females is between 2.5842672 and 5.2777963 mm Hg lower than that of males. The difference is also visible in the box plot, where the female distribution sits slightly lower than the male distribution, with a similar spread.

Conclusion

I can confidently conclude that patients' average systolic blood pressure differs by gender. The summary statistics, hypothesis test, and confidence interval all support the conclusion that mean systolic blood pressure is lower for females than for males.

APPENDIX OF CODE

# Load the data once (path as used in the original analysis)
patients = read.csv("~/Downloads/patients.csv")

# Two-sample t-test (Welch by default) with a 99% confidence interval
the.stuff = t.test(sysBP ~ gender, data = patients, conf.level = 0.99)
the.stuff$estimate  # sample mean of sysBP for each gender

# Overall summary statistics (named so they do not mask mean() and sd())
overall.mean = mean(patients$sysBP)
overall.median = median(patients$sysBP)
overall.sd = sd(patients$sysBP)

# Box plot of systolic blood pressure by the recorded gender
boxplot(sysBP ~ gender, data = patients, main = "Distribution of Systolic Blood Pressure of Subjects by Gender", horizontal = TRUE, xlab = "mm Hg")

# The same test under each alternative hypothesis
two.side = t.test(sysBP ~ gender, data = patients, conf.level = 0.99, alternative = "two.sided")
greater = t.test(sysBP ~ gender, data = patients, conf.level = 0.99, alternative = "greater")
less = t.test(sysBP ~ gender, data = patients, conf.level = 0.99, alternative = "less")

# 99% confidence interval for the difference in means (female minus male)
the.CI = round(the.stuff$conf.int, digits = 4)