**Loading the relevant R library packages.**
The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS), which is part of the Centers for Disease Control and Prevention (CDC) and is responsible for producing vital and health statistics for the entire country. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. Because of the coronavirus disease 2019 (COVID-19) pandemic, NHANES suspended fieldwork in March 2020. As a result, data collection for the NHANES 2019–2020 cycle was not completed, and the data that were collected were not nationally representative. For this reason, data from the NHANES 2017–2018 cycle were combined with data from the 2019–March 2020 cycle to provide a nationally representative sample of pre-pandemic NHANES 2017–2020 data [1].
Data was collected between 2017 and March 2020.
Participants aged 16 to 79 years (16 years and older but younger than 80) were included in this analysis.
The data contains 2865 observations and 5 variables.
The aim of this research is to examine how cholesterol, age, ethnicity, and gender are associated with the early development of hypertension.
Observations: 10195. Variables: 39. The variables of interest are:
1) Blood pressure: a continuous variable measured in mmHg. 120 mmHg is the systolic BP cutoff, and 80 mmHg is the diastolic cutoff.
2) Cholesterol: a categorical variable recorded as Yes or No.
3) Age: a continuous numeric variable. Participants were asked for their age in years. This variable is treated as continuous rather than discrete because of the wide range of ages among study participants.
4) Gender: a categorical variable recorded as male or female.
5) Ethnicity: a categorical variable recorded by race/ethnicity group.
1. Blood pressure
2. High cholesterol level: a categorical variable stored as a factor with 2 levels, "Yes" and "No."
3. Age: a continuous numeric variable. Participants were asked for their age in years. It is treated as continuous rather than discrete because of the wide range of ages among study participants.
4. Gender: a categorical variable stored as a factor with 2 levels, "Male" and "Female."
5. Ethnicity: a five-level categorical variable stored as a factor: "Mexican American," "Other Hispanic," "Non-Hispanic White," "Non-Hispanic Black," and "Other Race – Including Multi-Racial."
Data file access links
Data is imported from the NHANES 2017–2020 survey, which is publicly available. Two different datasets are imported and merged together.
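A minimal sketch of the merge step in R, assuming both files have already been read into data frames (for example with `haven::read_xpt()`) and share the NHANES respondent sequence number `SEQN`. The toy rows are invented for illustration, and the column names `RIAGENDR`, `RIDAGEYR`, and `BPQ080` are assumed NHANES names that should be checked against the codebook of the files actually used.

```r
# Toy stand-ins for the two NHANES files (invented rows, not survey data).
# SEQN is the respondent ID that links records across NHANES files.
demo <- data.frame(
  SEQN     = c(101, 102, 103, 104),
  RIAGENDR = c(1, 2, 2, 1),      # assumed gender code: 1 = male, 2 = female
  RIDAGEYR = c(45, 62, 30, 71)   # assumed age in years
)
bpq <- data.frame(
  SEQN   = c(101, 103, 104, 105),
  BPQ080 = c(1, 2, 9, 1)         # assumed "told high cholesterol" code
)

# Inner join on SEQN: only respondents present in both files are kept.
merged <- merge(demo, bpq, by = "SEQN")
print(merged)
```

`merge()` performs an inner join by default; passing `all.x = TRUE` would instead keep every respondent from the first file, with `NA` for the missing columns.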
Cleaning of Data
To prepare the data for statistical testing, the relevant variables are selected and cleaned by removing coded non-responses (such as "refused" or "don't know") and NA values.
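A hedged sketch of this cleaning step, assuming NHANES-style coding in which 7 and 9 mean "refused" and "don't know" on the cholesterol question. The data frame and its values are invented, and the column names are assumed NHANES names rather than the author's actual code.

```r
# Invented example rows with assumed NHANES-style codes:
#   RIAGENDR: 1 = Male, 2 = Female
#   BPQ080:   1 = Yes, 2 = No, 7 = Refused, 9 = Don't know
raw <- data.frame(
  RIAGENDR = c(1, 2, 2, 1, 2),
  RIDAGEYR = c(45, 62, 30, 71, NA),
  BPQ080   = c(1, 2, 9, 1, 7)
)

# Recode the "refused" / "don't know" codes to NA.
raw$BPQ080[raw$BPQ080 %in% c(7, 9)] <- NA

# Keep the relevant variables and label the categorical ones as factors.
clean <- data.frame(
  gender      = factor(raw$RIAGENDR, levels = c(1, 2), labels = c("Male", "Female")),
  age         = raw$RIDAGEYR,
  cholesterol = factor(raw$BPQ080, levels = c(1, 2), labels = c("Yes", "No"))
)

# Drop every row that still contains a missing value.
clean <- clean[complete.cases(clean), ]
str(clean)
```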
## bp cholesterol gender age
## Min. :12.00 Yes:1684 Male :1525 Min. :16.00
## 1st Qu.:35.00 No :1379 Female:1538 1st Qu.:49.00
## Median :45.00 Median :60.00
## Mean :44.49 Mean :57.39
## 3rd Qu.:55.00 3rd Qu.:68.00
## Max. :78.00 Max. :79.00
## race
## Mexican American : 255
## Other Hispanic : 296
## Non-Hispanic White : 995
## Non-Hispanic Black :1070
## Other Race - Including Multi-Racial: 447
##
Because blood pressure is a continuous variable, the assumption of normality is tested.
Figure-1: Histogram of blood pressure. The graph appears to follow an approximately normal distribution curve.
## skew (g1) se z p
## -2.532490e-01 4.425905e-02 -5.721970e+00 1.052961e-08
## Excess Kur (g2) se z p
## -6.083429e-01 8.851811e-02 -6.872525e+00 6.307621e-12
The skewness test of the blood pressure variable gives a skew of -0.253 (z = -5.72, p < 0.001). Because the skew is negative and statistically significant, blood pressure is negatively (left) skewed.
The excess kurtosis of the variable is -0.608 (z = -6.87, p < 0.001); a normal distribution has an excess kurtosis of 0.
Negative excess kurtosis means the data are platykurtic, with a flatter distribution than the normal curve.
Blood pressure therefore does not meet the normal distribution assumption.
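The z values in this kind of output are each statistic divided by its standard error. A base-R sketch, assuming the moment-based skewness g1 and excess kurtosis g2 with the large-sample standard errors sqrt(6/N) and sqrt(24/N). These standard errors reproduce the se values printed above for N = 3063, but the helper `moment_tests()` is hypothetical, and its g1 and g2 may differ slightly from the package that produced the output.

```r
# Hypothetical helper: moment-based skewness (g1) and excess kurtosis (g2)
# with large-sample z-tests using SE = sqrt(6/N) and sqrt(24/N).
moment_tests <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  g1 <- mean(d^3) / m2^1.5      # skewness
  g2 <- mean(d^4) / m2^2 - 3    # excess kurtosis: 0 for a normal curve
  se <- c(sqrt(6 / n), sqrt(24 / n))
  z  <- c(g1, g2) / se
  data.frame(stat  = c("skew", "excess_kurtosis"),
             value = c(g1, g2), se = se, z = z,
             p     = 2 * pnorm(-abs(z)))   # two-sided p-value
}

# Standard errors for N = 3063, matching the se column printed above.
round(c(sqrt(6 / 3063), sqrt(24 / 3063)), 5)

# Usage on a small symmetric vector: skew is 0, excess kurtosis is -1.3.
moment_tests(c(1, 2, 3, 4, 5))
```

Note that with several thousand observations even small departures from normality give large |z| values, so the histogram should be read alongside these tests.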
Because age is also a continuous variable, its normality is tested as well. The normality result for the outcome variable, blood pressure, which does not meet the normality assumption, is carried forward to the association tests below.
Figure-2 (age): the graph does not follow a normal distribution curve.
## skew (g1) se z p
## -0.72305954 0.04425905 -16.33698584 0.00000000
## Excess Kur (g2) se z p
## -0.03151828 0.08851811 -0.35606595 0.72179118
The skewness test of the age variable gives a skew of -0.723 (z = -16.34, p < 0.001). Because the skew is negative and statistically significant, age is negatively (left) skewed.
The excess kurtosis of the variable is -0.0315 (z = -0.36, p = 0.72), which does not differ significantly from 0.
The kurtosis of age is therefore close to that of a normal distribution; the departure from normality comes from the skew.
Age does not meet the normal distribution assumption.
| | Overall (N=3063) |
|---|---|
| Age at which diagnosed with high blood pressure, median (IQR) | 45.0 (20.0) |
| Diagnosed with high cholesterol, n (%) | |
| Yes | 1684 (55.0%) |
| No | 1379 (45.0%) |
| Gender, n (%) | |
| Male | 1525 (49.8%) |
| Female | 1538 (50.2%) |
| Age in years at the time of screening, median (IQR) | 60.0 (19.0) |
| Ethnicity, n (%) | |
| Mexican American | 255 (8.3%) |
| Other Hispanic | 296 (9.7%) |
| Non-Hispanic White | 995 (32.5%) |
| Non-Hispanic Black | 1070 (34.9%) |
| Other Race - Including Multi-Racial | 447 (14.6%) |

**Table-1:** The study sample includes 3063 participants who have high blood pressure. Of these, 49.8% are male (n = 1525) and 50.2% are female (n = 1538), an even split. The median age of the participants was 60 years, with an interquartile range of 19. The median age at which a participant was diagnosed with high blood pressure was 45 years, with an interquartile range of 20. Of the 3063 participants, 55.0% (n = 1684) had both high blood pressure and a high cholesterol level, while 45.0% (n = 1379) had normal cholesterol levels. Participants came from different races: 8.3% were Mexican American (n = 255), 9.7% were Other Hispanic (n = 296), 32.5% were Non-Hispanic White (n = 995), 34.9% were Non-Hispanic Black (n = 1070), and 14.6% were Other Race, including multi-racial (n = 447). The sample appears to be gender balanced, with good racial diversity.
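Each Table-1 cell is either a median (IQR) or a count with its percentage. A base-R sketch of both formats; the helper names are hypothetical, and the table in the report may have been produced with a dedicated package that is not shown. The cholesterol check rebuilds a factor from the counts reported in Table-1 and confirms the printed percentages.

```r
# Hypothetical helpers for the two cell formats used in Table-1.
median_iqr <- function(x) {
  sprintf("%.1f (%.1f)", median(x, na.rm = TRUE), IQR(x, na.rm = TRUE))
}

n_pct <- function(f) {
  counts <- table(f)
  sprintf("%d (%.1f%%)", as.integer(counts), 100 * as.numeric(counts) / sum(counts))
}

# Rebuild the cholesterol rows from the counts reported in Table-1.
chol <- factor(rep(c("Yes", "No"), times = c(1684, 1379)), levels = c("Yes", "No"))
n_pct(chol)

# Toy continuous vector (invented): median 45, IQR 10.
median_iqr(c(30, 40, 45, 50, 70))
```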
Predictor variables
Age: This is a continuous numeric variable. Participants were asked for their age in years. It is treated as continuous rather than discrete because of the wide range of ages among study participants. The dataset has 2865 observations and 5 variables.
Cholesterol: This is a categorical nominal variable with 2 levels: Yes, meaning the participant has been diagnosed with a high cholesterol level (1728 observations, 55.3%), and No, meaning the participant does not have a high cholesterol level (1395 observations, 44.7%).
Gender: This is a categorical nominal variable with 2 levels: male (1547 observations, 49.5%) and female (1576 observations, 50.5%).
Ethnicity: This is a categorical nominal variable with 5 levels: Mexican American (261 observations, 8.4%), Other Hispanic (303 observations, 9.7%), Non-Hispanic White (1016 observations, 32.5%), Non-Hispanic Black (1088 observations, 34.8%), and Other Race, including multi-racial (455 observations, 14.6%).
Outcome variable
Blood pressure: This is a continuous variable; the dataset has 2865 observations and 5 variables. It is measured in mmHg; 120 mmHg is the systolic BP cutoff, and 80 mmHg is the diastolic cutoff. Participants rested in a seated position for about 5 minutes, after which BP examiners took three consecutive BP readings; if a BP measurement was interrupted, a fourth reading was taken.
This is the first association test, and I am going to look for associations between the outcome, blood pressure, and gender.
Figure-3: Distribution of blood pressure by gender. Visually, blood pressure appears approximately normally distributed within each gender.
Homogeneity of variance test. H0: Variances are equal. HA: Variances are not equal.
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 0.4842 0.4866
## 3061
The p-value (0.487) is greater than 0.05, so we fail to reject the null hypothesis: there is no evidence that the variances differ, and the equal-variance assumption is met. However, the normality tests above showed that blood pressure is not normally distributed, so a non-parametric alternative to the independent t-test is used. To test the association, I will use the Mann-Whitney U test.
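Levene's test with `center = mean` (the output above appears to come from `car::leveneTest()`) is a one-way ANOVA on the absolute deviations of each observation from its own group mean. A base-R sketch on simulated data; the helper `levene_mean()` is hypothetical.

```r
# Hypothetical helper: Levene's test (center = mean) as a one-way ANOVA
# on absolute deviations from each group's mean.
levene_mean <- function(y, g) {
  g   <- factor(g)
  dev <- abs(y - ave(y, g, FUN = mean))
  fit <- anova(lm(dev ~ g))
  c(F = fit[["F value"]][1], p = fit[["Pr(>F)"]][1])
}

set.seed(1)
# Simulated values, not NHANES data: two groups with the same spread.
y <- c(rnorm(50, mean = 45, sd = 14), rnorm(50, mean = 45, sd = 14))
g <- rep(c("Male", "Female"), each = 50)
levene_mean(y, g)
```

Using the median instead of the mean as the center (the Brown-Forsythe variant) is more robust to skewed data, which may be worth considering given the skewness found above.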
A box plot was drawn to visualize the association between the continuous variable blood pressure and the categorical variable gender.
Figure-4: Box plot of blood pressure by gender. The medians for the two genders are similar.
Ho: There is no association between blood pressure and gender. HA: There is an association between blood pressure and gender.
##
## Wilcoxon rank sum test with continuity correction
##
## data: bp by gender
## W = 1157134, p-value = 0.5238
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## -1.000046e+00 5.203865e-05
## sample estimates:
## difference in location
## -1.097796e-05
As the p-value (0.524) is greater than 0.05, we fail to reject the null hypothesis. There is no statistically significant association between blood pressure and gender.
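A sketch of the same Wilcoxon rank-sum (Mann-Whitney U) test in base R on simulated data, showing how the reported W is obtained from ranks. The data frame `d` and its values are invented stand-ins for the cleaned dataset.

```r
set.seed(42)
# Simulated stand-in for the cleaned data (not NHANES values).
d <- data.frame(
  bp     = round(rnorm(200, mean = 45, sd = 14)),
  gender = factor(rep(c("Male", "Female"), each = 100), levels = c("Male", "Female"))
)

# Two-sided Wilcoxon rank-sum (Mann-Whitney U) test. With samples this large,
# R uses the normal approximation with a continuity correction.
res <- wilcox.test(bp ~ gender, data = d)
print(res)

# W is the rank sum of the first group (Male) minus n1 * (n1 + 1) / 2.
r  <- rank(d$bp)
n1 <- sum(d$gender == "Male")
W_manual <- sum(r[d$gender == "Male"]) - n1 * (n1 + 1) / 2
```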
This is the second association test. It looks for an association between the outcome, blood pressure, and the age of the person.
As tested above, neither of the continuous variables, BP and age, is normally distributed.
Therefore, a non-parametric alternative to Pearson's r is used: Spearman's rank correlation coefficient.
Figure-5: Blood pressure versus age. The relationship is monotonic, and the fitted line slopes upward, so the monotonicity assumption for Spearman's rho is met.
Ho: There is no relationship between blood pressure and age (rho is equal to 0). HA: There is a significant relationship between blood pressure and age (rho is not equal to 0).
# Spearman's correlation coefficient
cor.test(bp_demo.clean$age, bp_demo.clean$bp, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: bp_demo.clean$age and bp_demo.clean$bp
## S = 1744213308, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.6358252
The rho value is 0.64 (p < 2.2e-16). It is positive and lies between 0.5 and 0.8, which indicates a statistically significant, moderate positive correlation between blood pressure and age: as a person's age increases, blood pressure tends to increase as well.
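Spearman's rho is Pearson's r computed on the ranks of the two variables, which is why it only requires a monotonic relationship. A base-R check on simulated data; `age` and `bp` here are invented values, not the study data.

```r
set.seed(7)
age <- sample(16:79, 300, replace = TRUE)       # simulated ages
bp  <- round(0.5 * age + rnorm(300, sd = 10))   # simulated values rising with age

# Spearman's rho is Pearson's r computed on the ranks (ties get average ranks).
rho_builtin <- cor(age, bp, method = "spearman")
rho_manual  <- cor(rank(age), rank(bp))

# exact = FALSE uses the asymptotic p-value, which avoids the exact-test
# warning that ties would otherwise trigger.
ct <- cor.test(age, bp, method = "spearman", exact = FALSE)
c(builtin = rho_builtin, manual = rho_manual, cor.test = unname(ct$estimate))
```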
The third association test examines the association between blood pressure and race.
ANOVA is the standard statistical method for comparing the means of a continuous outcome variable, in this case blood pressure, across three or more groups. Its assumptions were checked for blood pressure and race: the outcome is continuous, there are more than two independent groups, and the observations are independent, but blood pressure is not normally distributed within each group.
Figure-6: Density plots of blood pressure by race. Although the curves look roughly bell-shaped, the formal tests above show that blood pressure is not normally distributed.
Homogeneity of variance test. H0: Variances are equal. HA: Variances are not equal.
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 4 3.3014 0.01042 *
## 3058
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (0.010) is less than 0.05, so we reject the null hypothesis: the variances are not equal, and the equal-variance assumption is not met. Therefore, the non-parametric alternative to one-way ANOVA, the Kruskal-Wallis rank sum test, is used.
Figure-7: Blood pressure (mmHg) by race. The graph suggests that Non-Hispanic Black participants tend to have high blood pressure at a younger age than Mexican American, Non-Hispanic White, Other Hispanic, and Other Race participants.
Ho: There is no association between blood pressure and race. HA: There is an association between blood pressure and race.
##
## Kruskal-Wallis rank sum test
##
## data: bp by race
## Kruskal-Wallis chi-squared = 28.283, df = 4, p-value = 1.093e-05
As the p-value (1.093e-05) is less than 0.05, we reject the null hypothesis and accept the alternative hypothesis. There is a significant association between BP and race.
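The Kruskal-Wallis statistic H compares the rank sums of the groups after ranking all observations together. A base-R sketch on simulated data that recomputes H by hand, including the correction for tied ranks, and checks it against `kruskal.test()`; the `race` and `bp` values are invented.

```r
set.seed(3)
groups <- c("Mexican American", "Other Hispanic", "Non-Hispanic White",
            "Non-Hispanic Black", "Other Race")
race <- factor(sample(groups, 400, replace = TRUE))
bp   <- round(rnorm(400, mean = 45, sd = 14))   # simulated, not NHANES

kt <- kruskal.test(bp ~ race)

# The same H statistic by hand: rank all values together, compare group
# rank sums, then divide by the correction for tied ranks.
r  <- rank(bp)
N  <- length(bp)
Ri <- tapply(r, race, sum)
ni <- tapply(r, race, length)
H  <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
ties <- table(bp)
H  <- H / (1 - sum(ties^3 - ties) / (N^3 - N))
c(kruskal.test = unname(kt$statistic), manual = H)
```

Under the null hypothesis, H is approximately chi-squared with k - 1 degrees of freedom, which gives df = 4 for the five race groups.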
## # A tibble: 5 × 3
## race m.bp sd.bp
## <fct> <dbl> <dbl>
## 1 Mexican American 45.7 13.9
## 2 Other Hispanic 46.1 13.1
## 3 Non-Hispanic White 45.2 14.5
## 4 Non-Hispanic Black 42.8 13.6
## 5 Other Race - Including Multi-Racial 45.2 13.4
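The table above reports the mean (`m.bp`) and standard deviation (`sd.bp`) of blood pressure within each race group; the `# A tibble` header suggests it was produced with a tidyverse pipeline. A base-R sketch of the same summary with `tapply()`, using a small invented data frame with hypothetical group names.

```r
# Invented toy data (not study values).
d <- data.frame(
  race = factor(c("Group A", "Group A", "Group B", "Group B", "Group B")),
  bp   = c(40, 50, 30, 45, 60)
)

# Mean and standard deviation of bp within each race group.
summ <- data.frame(
  race  = levels(d$race),
  m.bp  = as.numeric(tapply(d$bp, d$race, mean)),
  sd.bp = as.numeric(tapply(d$bp, d$race, sd))
)
print(summ)
```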
A post hoc Dunn's test with Bonferroni adjustment is used to see which race groups differ in blood pressure.
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 28.2827, df = 4, p-value = 0
##
##
## Comparison of x by group
## (Bonferroni)
## Col Mean-|
## Row Mean | Mexican Non-Hisp Non-Hisp Other Hi
## ---------+--------------------------------------------
## Non-Hisp | 3.158340
## | 0.0079*
## |
## Non-Hisp | 0.460639 -4.263310
## | 1.0000 0.0001*
## |
## Other Hi | -0.317301 -3.764135 -0.897821
## | 1.0000 0.0008* 1.0000
## |
## Other Ra | 0.624451 -3.037875 0.292819 1.015729
## | 1.0000 0.0119* 1.0000 1.0000
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
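Dunn's test is a rank-based pairwise comparison that follows a significant Kruskal-Wallis test; here the p-values are adjusted with the Bonferroni method. The mean blood pressure was 45.69 for Mexican American, 46.09 for Other Hispanic, 45.21 for Non-Hispanic White, 42.76 for Non-Hispanic Black, and 45.24 for Other Race participants. Non-Hispanic Black participants differ significantly from each of the other four groups: Mexican American (adjusted p = 0.0079), Non-Hispanic White (p = 0.0001), Other Hispanic (p = 0.0008), and Other Race (p = 0.0119). None of the other pairwise comparisons is significant (adjusted p = 1.0000). As stated in the output, H0 is rejected when p <= alpha/2 = 0.025.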
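Dunn's test compares the mean ranks of each pair of groups, using ranks from the pooled sample and a tie-corrected standard error. A base-R sketch with Bonferroni adjustment via `p.adjust()`; the helper `dunn_bonferroni()` is hypothetical, and the data are simulated. It reports two-sided p-values to be compared against alpha, which leads to the same decisions as the output above, where one-sided p-values are compared against alpha/2.

```r
# Hypothetical helper: Dunn's pairwise test with Bonferroni adjustment.
dunn_bonferroni <- function(y, g) {
  g <- factor(g)
  r <- rank(y)                          # pooled ranks, ties get average ranks
  N <- length(y)
  ties <- table(y)
  tie_term <- sum(ties^3 - ties) / (12 * (N - 1))
  mean_rank <- tapply(r, g, mean)
  n <- tapply(r, g, length)
  pairs <- combn(levels(g), 2)          # every pair of groups
  out <- data.frame(a = pairs[1, ], b = pairs[2, ], stringsAsFactors = FALSE)
  se <- sqrt((N * (N + 1) / 12 - tie_term) * (1 / n[out$a] + 1 / n[out$b]))
  out$z <- as.numeric((mean_rank[out$a] - mean_rank[out$b]) / se)
  out$p_two_sided  <- 2 * pnorm(-abs(out$z))
  out$p_bonferroni <- p.adjust(out$p_two_sided, method = "bonferroni")
  out
}

set.seed(5)
race <- sample(c("Mexican American", "Other Hispanic", "Non-Hispanic White",
                 "Non-Hispanic Black", "Other Race"), 500, replace = TRUE)
bp <- round(rnorm(500, mean = 45, sd = 14))   # simulated, not NHANES
dunn_bonferroni(bp, race)
```

With five groups there are 10 pairwise comparisons, so Bonferroni multiplies each p-value by 10 (capped at 1).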
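The fourth association test examines the association between cholesterol level and blood pressure.
Figure-8: Density plots of blood pressure by cholesterol category. The densities look approximately normal in both categories.
Homogeneity of variance test. H0: Variances are equal. HA: Variances are not equal.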
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 23.306 1.45e-06 ***
## 3061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As the p-value (1.45e-06) is less than 0.05, we reject the null hypothesis that the variances are equal, so the equal-variance assumption is not met. Therefore, the non-parametric alternative to the independent t-test, the Mann-Whitney U test, is used.
Ho: There is no relationship between blood pressure level and cholesterol level of the person. HA: There is a relationship between blood pressure level and cholesterol level of the person.
##
## Wilcoxon rank sum test with continuity correction
##
## data: bp by cholesterol
## W = 1361694, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## 3.000005 5.000048
## sample estimates:
## difference in location
## 4.999923
The Mann-Whitney U test with continuity correction gives p < 2.2e-16, which is effectively zero. We therefore reject the null hypothesis and accept the alternative hypothesis that there is a highly significant association between cholesterol and blood pressure. Participants with high cholesterol had higher blood pressure values than those without (estimated difference in location about 5, 95% CI 3 to 5), so high cholesterol is associated with a greater risk of high blood pressure.
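The "difference in location" reported by `wilcox.test(..., conf.int = TRUE)` is the Hodges-Lehmann estimator: the median of all pairwise differences between the two groups, not the difference between the two group medians. A small base-R check on invented values with no ties, where R computes the estimate exactly; for large samples with ties, as in this study, R finds it numerically, which is likely why the output shows 4.999923 rather than exactly 5.

```r
# Invented toy values (no ties), e.g. "Yes" vs "No" cholesterol groups.
x <- c(5.1, 6.3, 7.7, 8.2, 9.6, 10.4)
y <- c(2.2, 3.9, 4.8, 6.9, 7.1)

# Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j.
hl <- median(outer(x, y, "-"))

# wilcox.test() reports the same quantity as "difference in location".
wt <- wilcox.test(x, y, conf.int = TRUE)
c(manual = hl, wilcox = unname(wt$estimate))
```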
Systemic arterial hypertension is the most significant modifiable risk factor for global all-cause morbidity and mortality, and it is also linked to an elevated risk of cardiovascular disease (CVD). Fewer than half of people with hypertension are aware of their condition, and many more are aware but untreated or inadequately managed, even though effective hypertension treatment lowers the worldwide burden of disease and mortality.
My study examined the association of blood pressure with age, race, gender, and cholesterol level. These results can make the general population aware of who may be at greater risk for hypertension and encourage them to take precautions to live a healthy life. They should also make the public and healthcare providers aware of the earlier age at which hypertension develops in some racial groups. Government health departments could use these results to design screening programs for the early detection of hypertension, which would allow earlier management. The study also found a significant association between blood pressure and cholesterol level, which may encourage people to adopt healthy eating habits, such as avoiding fast food and fatty foods. With these results, people may be more motivated to get screened for hypertension at an early age.
For age in years, the monotonicity assumption for Spearman's correlation was met. In the Spearman correlation test, rho was 0.64, which is positive and lies between 0.5 and 0.8, indicating a statistically significant, moderate positive correlation between age and blood pressure. Previous studies have shown an association between age and blood pressure [2,3]. The most likely explanation is that aging has a significant effect on the large arteries, causing arterial stiffness, which raises arterial pressure and leads to hypertension [4].
For gender (male and female), the equal-variance assumption was met. The Mann-Whitney U test showed no significant difference in blood pressure between males and females. However, according to Reckelhoff, men are at greater risk for cardiovascular and renal disease than age-matched, premenopausal women [4]. Twenty-four-hour ambulatory blood pressure monitoring has shown that blood pressure is higher in men than in women of similar ages. After menopause, however, blood pressure in women increases to levels even higher than in men [5].
For race, the equal-variance assumption was not met, so the Kruskal-Wallis test was used. Its p-value was less than 0.05, so we reject the null hypothesis and accept the alternative hypothesis: there is a significant association between blood pressure and race. In my study, Non-Hispanic Black participants tended to have high blood pressure at a younger age than Mexican American, Non-Hispanic White, Other Hispanic, and Other Race participants. Previous studies found a similar association between blood pressure and race [6].
This study also found a significant association between blood pressure and cholesterol level. The Mann-Whitney U test gave p < 2.2e-16, indicating a highly significant association between cholesterol and blood pressure. High cholesterol may raise the risk of high blood pressure: an abnormal lipid profile causes atherosclerosis of the arteries, which narrows the arterial lumen and increases blood pressure [7].
Overall, the study is consistent with the literature on the association between these factors and blood pressure. It also raised questions about a couple of variables, which I will explore in the future. For instance, there was no association between gender and blood pressure, in contrast with previous studies. For future studies, I would use a larger sample with the predictors age, race, education level, and gender. One limitation is that this study has a small sample size, which can reduce its statistical power. I would like to collect a larger population sample and then investigate the association between gender and blood pressure.
2. McEniery, C. M., Wilkinson, I. B., & Avolio, A. P. (2007). Age, hypertension and arterial function. Clinical and Experimental Pharmacology and Physiology, 34(7), 665-671.
3. Cohen, L., Curhan, G. C., & Forman, J. P. (2012). Influence of age on the association between lifestyle factors and risk of hypertension. Journal of the American Society of Hypertension, 6(4), 284-290. https://doi.org/10.1016/j.jash.2012.06.002
4. Reckelhoff, J. F. (2001). Gender differences in the regulation of blood pressure. Hypertension, 37(5), 1199-1208. https://doi.org/10.1161/01.hyp.37.5.1199
Guzman, N. J. (2012). Epidemiology and management of hypertension in the Hispanic population. American Journal of Cardiovascular Drugs, 12, 165-178. https://doi.org/10.2165/11631520-000000000-00000
Doumas, M., Papademetriou, V., Faselis, C., et al. (2013). Gender differences in hypertension: Myths and reality. Current Hypertension Reports, 15, 321-330. https://doi.org/10.1007/s11906-013-0359-y
6. Cangiano, J. L. (1994). Hypertension in Hispanic Americans. Cleveland Clinic Journal of Medicine, 61(5), 345-350. https://doi.org/10.3949/ccjm.61.5.345 (PMID: 7955306)