**Opening relevant R library packages.**

Introduction

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and is in charge of producing vital and health statistics for the entire country. The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements as well as laboratory tests administered by highly trained medical personnel. Due to the coronavirus disease 2019 (COVID-19) pandemic, the NHANES program suspended fieldwork in March 2020. As a result, data collection for the NHANES 2019–2020 cycle was not completed, and the data that were collected did not represent the entire country. Because of this, data from the NHANES 2017–2018 cycle were combined with data from the 2019–March 2020 cycle to provide a nationally representative sample of pre-pandemic NHANES 2017–2020 data [1].

When was the data collected?

Data was collected between 2017 and March 2020.

Target Population

All participants, aged 16 years and older but less than 80 years, were included in the survey.

How many observations and variables are there?

The cleaned data set contains 3063 observations and 5 variables.

Research question

The aim of this research is to determine the impact of cholesterol, age, ethnicity, and gender on the early development of hypertension.

Data and variable description

Observations: 10195 Variables: 39

The following variables are used:

  1. Blood pressure: A continuous variable measured in mmHg. The systolic BP cutoff is 120 mmHg, and the diastolic cutoff is 80 mmHg.
  2. Cholesterol: A categorical variable recorded as yes or no.
  3. Age: A continuous, numeric variable. Participants were asked for their age in years. This variable will be treated as continuous rather than discrete due to the wide range in ages among study participants.
  4. Gender: A categorical variable recorded as male or female.
  5. Ethnicity: A categorical variable recorded as the participant’s race.

The Outcome Variable and the Predictors

The outcome variable

  1. Blood pressure

The predictors

  1. Cholesterol level
  2. Age
  3. Gender
  4. Ethnicity

How are the predictor variables measured (data types, categories, values)?

  1. High cholesterol level: This is a categorical variable considered a factor having 2 levels, “yes” and “no.”

  2. Age: This is a continuous variable. This variable is numeric. Participants were asked for their age in years. This variable will be treated as continuous rather than discrete due to the wide range in ages among study participants.

  3. Gender: This is a categorical variable considered a factor having 2 levels, “male” and “female.”

  4. Ethnicity: This is a five-level categorical variable considered a factor: “Mexican American,” “Other Hispanic,” “Non-Hispanic White,” “Non-Hispanic Black,” and “Other Race – Including Multi-Racial.”

Data file access links

Data is imported from the NHANES 2017–2020 survey, which is publicly available. Two different sets of data are imported and merged.

Cleaning of Data

To prepare the data for statistical testing, the relevant variables are selected and cleaned by removing encoded or NA values.

##        bp        cholesterol    gender          age       
##  Min.   :12.00   Yes:1684    Male  :1525   Min.   :16.00  
##  1st Qu.:35.00   No :1379    Female:1538   1st Qu.:49.00  
##  Median :45.00                             Median :60.00  
##  Mean   :44.49                             Mean   :57.39  
##  3rd Qu.:55.00                             3rd Qu.:68.00  
##  Max.   :78.00                             Max.   :79.00  
##                                   race     
##  Mexican American                   : 255  
##  Other Hispanic                     : 296  
##  Non-Hispanic White                 : 995  
##  Non-Hispanic Black                 :1070  
##  Other Race - Including Multi-Racial: 447  
## 
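The import, merge, and cleaning steps described above could be sketched as follows. This is a minimal illustration on toy data, not the code used for this report: the data frames `demo_toy` and `bpq_toy`, their column names and values, and the codes 777 and 999 standing in for "refused" and "don't know" are all assumptions made for the example. The merge key is the NHANES respondent sequence number, `SEQN`.

```r
# Toy stand-ins for two NHANES files (hypothetical values and column names).
demo_toy <- data.frame(
  SEQN   = 1:6,
  age    = c(34, 61, 47, 72, 58, 66),
  gender = c(1, 2, 2, 1, 1, 2)
)
bpq_toy <- data.frame(
  SEQN        = c(1, 2, 3, 4, 5, 7),
  bp          = c(30, 45, 999, 60, NA, 52),  # 999 = assumed "don't know" code
  cholesterol = c(1, 2, 1, 1, 2, 1)
)

# Keep only respondents present in both files.
merged_toy <- merge(demo_toy, bpq_toy, by = "SEQN")

# Recode the assumed special codes to NA, then label the categories.
merged_toy$bp[merged_toy$bp %in% c(777, 999)] <- NA
merged_toy$gender <- factor(merged_toy$gender, levels = c(1, 2),
                            labels = c("Male", "Female"))
merged_toy$cholesterol <- factor(merged_toy$cholesterol, levels = c(1, 2),
                                 labels = c("Yes", "No"))

# Drop every row that still has a missing value.
clean_toy <- na.omit(merged_toy)
summary(clean_toy)
```

Recoding special codes to NA before calling `na.omit()` matters: left in place, a value such as 999 would be treated as a real measurement and distort the summary statistics.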

Normality assumption for blood pressure

Assumption of normality is tested as blood pressure is a continuous variable.

Histogram for blood pressure

Figure-1 The graph has a normal distribution curve.

Skewness and Kurtosis for blood pressure

##     skew (g1)            se             z             p 
## -2.532490e-01  4.425905e-02 -5.721970e+00  1.052961e-08
## Excess Kur (g2)              se               z               p 
##   -6.083429e-01    8.851811e-02   -6.872525e+00    6.307621e-12

Skewness test of the blood pressure variable: the skew value is -0.253 (z = -5.72, p < 0.001). The z value lies between -7 and 0 and the skew is negative, so the variable is mildly negatively skewed, and the test rejects zero skew at the 0.05 level.

The excess kurtosis of the variable is -0.608 (z = -6.87, p < 0.001), which is less than 0, the value for a normal distribution.

The data is platykurtic, with a flatter distribution.

It does not meet the normal distribution assumption.
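The z values in the output above are the estimate divided by its standard error. The sketch below shows that calculation with simple moment-based estimators and the large-sample standard errors sqrt(6/n) and sqrt(24/n), which match the `se` values printed above for n = 3063. The function name `skew_kurt` is hypothetical, and the estimator used by the package that produced the output may differ slightly.

```r
# Moment-based skewness and excess kurtosis with large-sample standard errors.
skew_kurt <- function(x) {
  x   <- x[!is.na(x)]
  n   <- length(x)
  d   <- x - mean(x)
  m2  <- mean(d^2)
  est <- c(skew     = mean(d^3) / m2^1.5,   # 0 for a symmetric distribution
           kurtosis = mean(d^4) / m2^2 - 3) # excess kurtosis, 0 for a normal curve
  se  <- c(skew = sqrt(6 / n), kurtosis = sqrt(24 / n))
  z   <- est / se
  p   <- 2 * pnorm(-abs(z))                 # two-sided p-value
  rbind(estimate = est, se = se, z = z, p = p)
}

# Symmetric toy data: the skewness estimate is zero.
res <- skew_kurt(c(1, 2, 3, 4, 5))
res

# Recomputing the reported z for blood pressure from its skew, assuming n = 3063.
z_bp <- -0.2532490 / sqrt(6 / 3063)
z_bp
```

A negative skew estimate means the longer tail is on the left, and a negative excess kurtosis means a flatter-than-normal shape.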

Normality assumption for Age

The assumption of normality is tested because age is a continuous variable. As shown in the first normality check, the outcome variable, blood pressure, does not meet the normality assumption.

Figure-2 The graph does not have a normal distribution curve.

##    skew (g1)           se            z            p 
##  -0.72305954   0.04425905 -16.33698584   0.00000000
## Excess Kur (g2)              se               z               p 
##     -0.03151828      0.08851811     -0.35606595      0.72179118

Skewness test of the age variable: the skew value is -0.723 (z = -16.34, p < 0.001). Since the z value is less than -7 and the skew is negative, the variable age is negatively skewed.

The excess kurtosis of the variable is -0.0315 (z = -0.36, p = 0.72), which is not significantly different from 0.

The tails are therefore close to those of a normal distribution, so the departure from normality comes from the skew rather than from the kurtosis.

It does not meet the normal distribution assumption.

Descriptive statistics

Table-1: Descriptive statistics table, NHANES 2017–March 2020 survey (3063 observations)

Characteristic                                                   Overall (N=3063)
Age at which diagnosed with high blood pressure, median (IQR)    45.0 (20.0)
No. of people diagnosed with high cholesterol, n (%)
  Yes                                                            1684 (55.0%)
  No                                                             1379 (45.0%)
Gender, n (%)
  Male                                                           1525 (49.8%)
  Female                                                         1538 (50.2%)
Age in years at the time of screening, median (IQR)              60.0 (19.0)
Ethnicity, n (%)
  Mexican American                                               255 (8.3%)
  Other Hispanic                                                 296 (9.7%)
  Non-Hispanic White                                             995 (32.5%)
  Non-Hispanic Black                                             1070 (34.9%)
  Other Race - Including Multi-Racial                            447 (14.6%)

Table-1: The study sample includes 3063 participants who have high blood pressure. Of these, 49.8% are male (n = 1525) and 50.2% are female (n = 1538), which is an even split. The median age of participants was 60 years, with an interquartile range of 19. The median age at which a participant was diagnosed with high blood pressure is 45 years, with an interquartile range of 20. Of the 3063 participants, 55% (n = 1684) had both high blood pressure and high cholesterol levels, and 45% (n = 1379) had not been diagnosed with high cholesterol. Participants were from different races: 8.3% were Mexican American (n = 255), 9.7% were Other Hispanic (n = 296), 32.5% were Non-Hispanic White (n = 995), 34.9% were Non-Hispanic Black (n = 1070), and 14.6% were Other Race, including Multi-Racial (n = 447). The sample appears to be gender balanced, with good diversity in race.
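The kinds of summaries reported in Table-1 (median with interquartile range for skewed continuous variables, counts with percentages for categorical ones) can be computed in base R as in the sketch below. The data frame `toy` and its values are made up for illustration and are not the study data.

```r
# Toy data with the same kinds of variables as Table-1 (values are made up).
toy <- data.frame(
  bp     = c(40, 45, 50, 55, 60),
  gender = factor(c("Male", "Female", "Female", "Male", "Female"))
)

# Median and interquartile range for a continuous variable that is not normal.
bp_median <- median(toy$bp)
bp_iqr    <- IQR(toy$bp)

# Counts and percentages for a categorical variable.
gender_n   <- table(toy$gender)
gender_pct <- round(100 * prop.table(gender_n), 1)

bp_median
bp_iqr
gender_n
gender_pct
```

The median and IQR are used instead of the mean and standard deviation because both continuous variables failed the normality checks.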

Predictor variables

Age: This is a continuous numeric variable. Participants were asked for their age in years. This variable will be treated as continuous rather than discrete due to the wide range in ages among study participants. It has 3063 observations.

Cholesterol: This is a categorical nominal variable with 2 levels: Yes, meaning people who have been diagnosed with a high cholesterol level (1684 observations, 55.0%), and No, meaning those who have not (1379 observations, 45.0%).

Gender: This is a categorical nominal variable with 2 levels: male (1525 observations, 49.8%) and female (1538 observations, 50.2%).

Ethnicity: This is a categorical nominal variable with 5 levels: Mexican American (255 observations, 8.3%), Other Hispanic (296 observations, 9.7%), Non-Hispanic White (995 observations, 32.5%), Non-Hispanic Black (1070 observations, 34.9%), and Other Race, including Multi-Racial (447 observations, 14.6%).

Outcome variable

Blood pressure: This is a continuous variable with 3063 observations. It is measured in mmHg. The systolic BP cutoff is 120 mmHg, and the diastolic cutoff is 80 mmHg. Participants rested in a seated position for about 5 minutes, after which BP examiners took three consecutive BP readings; if a BP measurement was interrupted, a fourth reading was taken.

Exploratory Analysis, Data Visualization, and Bivariate Tests

Association between Blood pressure and Gender

This is the first association test, and I am going to look for associations between the outcome, blood pressure, and gender.

Normality assumption of blood pressure and gender.

Figure-3 The normality assumption for blood pressure with respect to gender is met.

Homogeneity of variance test for blood pressure and gender.

H0: Variances are equal. HA: Variances are not equal.

## Levene's Test for Homogeneity of Variance (center = mean)
##         Df F value Pr(>F)
## group    1  0.4842 0.4866
##       3061

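The p-value (0.4866) is greater than 0.05, so we fail to reject the null hypothesis: there is no evidence that the variances differ, and the equal variance assumption is met. However, because blood pressure overall did not meet the normality assumption in the skewness and kurtosis tests above, I will still use the non-parametric alternative to the independent t-test. To test the association, I will use the Mann-Whitney U test.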
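The decision rule for a Levene-type test can be sketched in base R as follows. With `center = mean`, Levene's test is equivalent to a one-way ANOVA on the absolute deviations of each value from its group mean, which is what this sketch computes; the data frame `toy` is made up so that both groups have exactly the same spread.

```r
# Levene's test (center = mean) as a one-way ANOVA on absolute deviations
# from each group's mean. Toy data: both groups have identical spread.
toy <- data.frame(
  bp     = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
  gender = factor(rep(c("Male", "Female"), each = 5))
)

abs_dev  <- abs(toy$bp - ave(toy$bp, toy$gender))  # |value - group mean|
levene   <- anova(lm(abs_dev ~ toy$gender))
levene_p <- levene[["Pr(>F)"]][1]

# Compare the p-value with alpha to decide about H0 (equal variances).
alpha <- 0.05
if (levene_p < alpha) {
  decision <- "reject H0: variances differ"
} else {
  decision <- "fail to reject H0: no evidence that variances differ"
}
levene_p
decision
```

A p-value at or above 0.05 means the equal variance assumption is retained; only a p-value below 0.05 counts as evidence against it.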

Box-Plot Graph for Gender

A box plot is used to visualize the association between the continuous variable blood pressure and the categorical variable gender.

Figure-4 demonstrates that the median for both genders is about the same.

Mann-Whitney U Test

H0: There is no association between blood pressure and gender. HA: There is an association between blood pressure and gender.

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  bp by gender
## W = 1157134, p-value = 0.5238
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -1.000046e+00  5.203865e-05
## sample estimates:
## difference in location 
##          -1.097796e-05

Interpretation of Mann-Whitney U Test

As the p-value (0.5238) is greater than 0.05, we fail to reject the null hypothesis. There is no statistically significant association between blood pressure and gender.
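For reference, a Mann-Whitney U test of this form can be run in base R with `wilcox.test()`, as in the sketch below. The data frame `toy` is made up so that the two groups are completely separated; it is not the study data, so the result differs from the output above.

```r
# Toy data: every Female value is lower than every Male value.
toy <- data.frame(
  bp     = c(31:40, 41:50),
  gender = factor(rep(c("Female", "Male"), each = 10))
)

# Mann-Whitney U (Wilcoxon rank sum) test with a confidence interval
# for the shift in location between the two groups.
mw <- wilcox.test(bp ~ gender, data = toy, conf.int = TRUE)
mw$statistic
mw$p.value

# Group medians help describe the direction of any difference.
medians <- tapply(toy$bp, toy$gender, median)
medians
```

Here the p-value is far below 0.05, so the null hypothesis would be rejected for the toy data. In the report, the p-value of 0.5238 leads to the opposite decision.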

Association between Blood pressure and Age.

This is the second association test. I am going to look for association between the outcome, blood pressure, and age of the person.

Neither continuous variable, BP nor age, is normally distributed, as tested above.

This suggests using the non-parametric alternative to Pearson’s r correlation test. To test the correlation, I will use Spearman’s correlation coefficient test.

Checking the monotonic relationship assumption

H0: There is no relationship between blood pressure and age. HA: There is a relationship between blood pressure and age.

Figure-5 The plot of blood pressure against age shows a regression line that goes upward, indicating a monotonic relationship. It meets the assumption for rho. I will use a non-parametric test called Spearman’s correlation test.

Spearman’s correlation test

H0: There is no relationship between blood pressure and age (rho is equal to 0). HA: There is a significant relationship between blood pressure and age (rho is not equal to 0).

# Spearman's correlation coefficient
cor.test(bp_demo.clean$age, bp_demo.clean$bp, method= "spearman" )
## 
##  Spearman's rank correlation rho
## 
## data:  bp_demo.clean$age and bp_demo.clean$bp
## S = 1744213308, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.6358252

The rho value is 0.64, which is positive and lies between 0.5 and 0.8, and the p-value is below 2.2e-16. This signifies a statistically significant, moderate positive correlation between blood pressure and age (rho = 0.64). As a person’s age increases, so does the risk of developing high blood pressure.
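Spearman's rho is Pearson's correlation computed on the ranks of the two variables, which is why it only requires a monotonic relationship, not a linear one. The sketch below illustrates this on made-up vectors `age` and `bp` (not the study data).

```r
# Toy data without ties (values are made up).
age <- c(25, 32, 41, 48, 55, 63, 70)
bp  <- c(30, 28, 39, 45, 44, 58, 61)

# Spearman's rho two ways: the built-in method and Pearson's r on the ranks.
rho_builtin <- cor(age, bp, method = "spearman")
rho_ranks   <- cor(rank(age), rank(bp))

# The same coefficient with a test of H0: rho = 0.
rho_test <- cor.test(age, bp, method = "spearman")

rho_builtin
rho_ranks
rho_test$p.value
```

Both calculations give the same coefficient, and a p-value below 0.05 leads to rejecting the null hypothesis that rho equals 0.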

Association between blood pressure and race

This will show us the association between blood pressure and race.

Assumptions to check for ANOVA

ANOVA is a statistical method for comparing the means of a continuous outcome variable, in this case blood pressure, across three or more groups. The ANOVA assumptions were checked for blood pressure and race. The assumptions of a continuous variable with 2+ independent groups and of independent observations were met; however, the assumption of a normal distribution within each group was not.

  1. Continuous variable and 2+ independent groups (continuous variable and a categorical variable with 3+ groups)
  2. Independent observations
  3. Normal distribution with each group
  4. Equal variances across groups (Homogeneity of variances)

The blood pressure variable’s test of normal distribution was already checked in the first association, where it did not meet the normal distribution assumption.

The normal distribution of blood pressure across different races.

Density plot of blood pressure and different races.

Figure-6 The density plot shows a normal distribution. Thus, it meets the assumption that blood pressure is normally distributed across categories of race.

Equal variance of blood pressure and race

H0: Variances are equal HA: Variances are not equal

## Levene's Test for Homogeneity of Variance (center = mean)
##         Df F value  Pr(>F)  
## group    4  3.3014 0.01042 *
##       3058                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value (0.01042) is less than 0.05, so we reject the null. Variances are not equal. Thus, it does not meet the equal variance assumption. This suggests I will use the non-parametric alternative to one-way ANOVA, the Kruskal-Wallis rank sum test.

Box plot blood pressure vs Race

Figure-7 This graph shows the relationship between race and blood pressure in mmHg. It shows that Non-Hispanic Black and Other Race participants are more prone to have high blood pressure at a younger age than Mexican American, Non-Hispanic White, and Other Hispanic participants.

Kruskal-Wallis rank sum test

H0: There is no association between blood pressure and race. HA: There is an association between blood pressure and race.

## 
##  Kruskal-Wallis rank sum test
## 
## data:  bp by race
## Kruskal-Wallis chi-squared = 28.283, df = 4, p-value = 1.093e-05

Interpretation of Kruskal Wallis

As the p-value is less than 0.05, we can reject the null hypothesis and accept the alternate hypothesis. There is a significant association between BP and race.
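A Kruskal-Wallis test of this form can be run in base R with `kruskal.test()`, as sketched below. The data frame `toy` and its group labels are made up: three groups of five values that do not overlap, so the groups are as different as their ranks allow.

```r
# Toy data: three non-overlapping groups of five values each.
toy <- data.frame(
  bp   = 31:45,
  race = factor(rep(c("Group A", "Group B", "Group C"), each = 5))
)

# Kruskal-Wallis rank sum test: do the groups share the same distribution?
kw <- kruskal.test(bp ~ race, data = toy)
kw$statistic   # chi-squared statistic
kw$parameter   # degrees of freedom = number of groups - 1
kw$p.value
```

The test only says that at least one group differs. Identifying which groups differ requires a post hoc test with an adjustment for multiple comparisons.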

Summary of the mean for each category

## # A tibble: 5 × 3
##   race                                 m.bp sd.bp
##   <fct>                               <dbl> <dbl>
## 1 Mexican American                     45.7  13.9
## 2 Other Hispanic                       46.1  13.1
## 3 Non-Hispanic White                   45.2  14.5
## 4 Non-Hispanic Black                   42.8  13.6
## 5 Other Race - Including Multi-Racial  45.2  13.4

Post hoc test

Dunn’s post hoc test with a Bonferroni adjustment, to see which race groups differ in blood pressure.

##   Kruskal-Wallis rank sum test
## 
## data: x and group
## Kruskal-Wallis chi-squared = 28.2827, df = 4, p-value = 0
## 
## 
##                            Comparison of x by group                            
##                                  (Bonferroni)                                  
## Col Mean-|
## Row Mean |   Mexican    Non-Hisp   Non-Hisp   Other Hi
## ---------+--------------------------------------------
## Non-Hisp |   3.158340
##          |    0.0079*
##          |
## Non-Hisp |   0.460639  -4.263310
##          |     1.0000    0.0001*
##          |
## Other Hi |  -0.317301  -3.764135  -0.897821
##          |     1.0000    0.0008*     1.0000
##          |
## Other Ra |   0.624451  -3.037875   0.292819   1.015729
##          |     1.0000    0.0119*     1.0000     1.0000
## 
## alpha = 0.05
## Reject Ho if p <= alpha/2

The Dunn test is a rank-sum test, and the p-values are adjusted using a Bonferroni adjustment. The group means are: Mexican American (mean = 45.69), Other Hispanic (mean = 46.09), Non-Hispanic White (mean = 45.21), Non-Hispanic Black (mean = 42.76), and Other Race (mean = 45.24). Reading the groups in the alphabetical order used in the output, every starred comparison involves the Non-Hispanic Black group: it differs significantly from Mexican American (p = 0.0079), Non-Hispanic White (p = 0.0001), Other Hispanic (p = 0.0008), and Other Race (p = 0.0119). None of the remaining pairwise comparisons is significant (p = 1.0000).
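The Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below shows this with `p.adjust()`, then applies the same adjustment to pairwise comparisons. Note the substitution: it uses pairwise Wilcoxon rank sum tests from base R as a stand-in, which is a different procedure from the Dunn test reported above, and the data frame `toy` is made up.

```r
# Bonferroni adjustment of three hypothetical raw p-values.
raw_p <- c(0.001, 0.02, 0.3)
adj_p <- p.adjust(raw_p, method = "bonferroni")  # min(1, p * number of tests)
adj_p

# Toy data: three non-overlapping groups of five values each.
toy <- data.frame(
  bp   = 31:45,
  race = factor(rep(c("Group A", "Group B", "Group C"), each = 5))
)

# Pairwise rank sum tests with Bonferroni-adjusted p-values
# (a base R stand-in for illustration, not the Dunn test).
pw <- pairwise.wilcox.test(toy$bp, toy$race, p.adjust.method = "bonferroni")
pw$p.value
```

With five race groups there are 10 pairwise comparisons, so the adjustment is considerably stricter than in this three-group example.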

Association between cholesterol and blood pressure

This will show us the association between cholesterol levels and blood pressure.

Normality assumption of Blood pressure with the cholesterol.

Figure-8 The density plot looks normally distributed, and BP is normally distributed across the cholesterol categories.

Homogeneity of variance test for high blood pressure and cholesterol.

## Levene's Test for Homogeneity of Variance (center = mean)
##         Df F value   Pr(>F)    
## group    1  23.306 1.45e-06 ***
##       3061                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As the p-value is less than 0.05, we reject the null hypothesis and accept the alternate hypothesis that the variances are not equal. The assumption of equal variance is not met. I will use the non-parametric alternative to the independent t-test, the Mann-Whitney U test.

Mann-Whitney U Test

H0: There is no relationship between blood pressure level and cholesterol level of the person. HA: There is a relationship between blood pressure level and cholesterol level of the person.

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  bp by cholesterol
## W = 1361694, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  3.000005 5.000048
## sample estimates:
## difference in location 
##               4.999923

Interpretation of Mann-Whitney U Test

The Mann-Whitney U test with continuity correction gives a p-value below 2.2e-16, which is very close to zero. This indicates that we should reject the null hypothesis and accept the alternate hypothesis that there is a highly significant association between cholesterol and blood pressure. A high cholesterol level is associated with a significantly higher risk of high blood pressure.

Conclusion

The most significant modifiable risk factor for global all-cause morbidity and death is systemic arterial hypertension, which is also linked to an elevated risk of cardiovascular disease (CVD). Fewer than half of participants with hypertension are aware of their illness, and many more are aware but untreated or improperly managed, even though good hypertension therapy lowers the worldwide burden of disease and mortality.

My study examined the association of blood pressure with different variables, such as age, race, gender, and cholesterol. With this, I was able to identify the relationship between systolic blood pressure and the above variables. This data can make the general population aware of who may be more at risk for hypertension than others and encourage them to take precautionary steps to live a healthy life. This information should also make the general population and healthcare providers aware of the early age at which a person develops hypertension among different races. This will help the health department of the government formulate screening tests based on the results of this study for early detection of hypertension, which will lead to its early management. There is a significant association between blood pressure and cholesterol levels. This study will also encourage the general population to adopt healthy food habits by avoiding fast food, fatty foods, etc. With this study’s result, people will be more motivated to go for hypertension screening at an early age.

The results for age in years revealed that the assumption was met. In the Spearman correlation test, the rho value was 0.64, which is positive and lies between 0.5 and 0.8, which signifies that there is a statistically significant correlation between age and blood pressure. Previous studies have shown the association between age and blood pressure [2,3]. The most likely explanation is that aging has a significant effect on large arteries, causing arterial stiffness, which leads to an increase in arterial pressure and hypertension [4].

With the results for gender (males and females), the assumption was met. It revealed that there is no significant difference in the blood pressure pattern between males and females. But according to Jane Reckelhoff, men are at greater risk for cardiovascular and renal disease than are age-matched, premenopausal women [4]. 24-hour ambulatory blood pressure monitoring has shown that blood pressure is higher in men than in women at similar ages. After menopause, however, blood pressure increases in women to levels even higher than in men [5].

As for race, the results conclude that the assumption was met. The Kruskal-Wallis test gives a p-value less than 0.05, so we can reject the null hypothesis and accept the alternate hypothesis. There is a significant association between blood pressure and race. My study shows that Non-Hispanic Black and Other Race participants are more prone to have high blood pressure at a younger age than Mexican American, Non-Hispanic White, and Other Hispanic participants. Previous studies found a similar association of blood pressure with race [6].

This study also found a significant association between blood pressure and cholesterol level. The Mann-Whitney U test gives a p-value below 2.2e-16, which shows a highly significant association between cholesterol and blood pressure. An increase in cholesterol level can significantly increase the risk of high blood pressure. An abnormal lipid profile or cholesterol level causes atherosclerosis of the arteries, which can significantly decrease the lumen size and cause an increase in blood pressure [7].

Overall, the study validated the literature in examining the association between the mentioned factors and blood pressure. It also cast doubt on a couple of variables, which I will explore in the future. For instance, there was no association between gender and blood pressure, which is in contrast with the previous studies. In addition, for future study initiatives, I would use a larger sample size with the same predictors: age, race, education level, and gender. One limitation I’d like to mention is that this study has a small sample size, which can undermine the study’s power. I’d want to collect a larger population sample for research and then investigate the association between gender and blood pressure.

References

  1. https://www.cdc.gov/nchs/nhanes/index.htm?CDC_AA_refVal=https%3A%2F%2Fwww.cdc.gov%2Fnchs%2Fnhanes.htm

  2. McEniery, C. M., Wilkinson, I. B., & Avolio, A. P. (2007). Age, hypertension and arterial function. Clinical and Experimental Pharmacology and Physiology, 34(7), 665-671.

  3. Cohen, L., Curhan, G. C., & Forman, J. P. (2012). Influence of age on the association between lifestyle factors and risk of hypertension. Journal of the American Society of Hypertension, 6(4), 284-290. https://doi.org/10.1016/j.jash.2012.06.002

  4. Reckelhoff JF. Gender differences in the regulation of blood pressure. Hypertension. 2001;37(5):1199-1208. doi:10.1161/01.hyp.37.5.1199. Guzman, N.J. Epidemiology and Management of Hypertension in the Hispanic Population. Am J Cardiovasc Drugs 12, 165–178 (2012). https://doi.org/10.2165/11631520-000000000-00000

  5. Doumas, M., Papademetriou, V., Faselis, C. et al. Gender Differences in Hypertension: Myths and Reality. Curr Hypertens Rep 15, 321–330 (2013). https://doi.org/10.1007/s11906-013-0359-y

  6. Cangiano JL. Hypertension in Hispanic Americans. Cleveland Clinic Journal of Medicine. 1994 Sep-Oct;61(5):345-350. DOI: 10.3949/ccjm.61.5.345. PMID: 7955306.

  7. Castelli, William P., and Keaven Anderson. “A population at risk: prevalence of high cholesterol levels in hypertensive patients in the Framingham Study.” The American journal of medicine 80.2 (1986): 23-32.