library("stats")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(testit)
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(stringr)
library(httr)
library(here)
## here() starts at C:/Users/Mahmuda Sultana/Desktop/Interview/JCPH
library(ggplot2)
library(dplyr)
library(readxl)
# Build the path with here() (rooted at the project directory reported above) instead of hard-coding it
Interview_Data <- read_excel(here("Interview Data.xlsx"))
missing_data <- colSums(is.na(Interview_Data))
print(missing_data)
## SEX TOTCHOL AGE SYSBP DIABP CURSMOKE BMI DIABETES
## 0 52 0 0 0 0 19 0
## educ PREVCHD PREVHYP DEATH ANYCHD TIMECHD TIMEDTH Gender
## 113 0 0 0 0 0 0 0
## PY
## 0
missing_proportion <- colSums(is.na(Interview_Data)) / nrow(Interview_Data)
missing_percentage <- missing_proportion * 100
print(missing_percentage)
## SEX TOTCHOL AGE SYSBP DIABP CURSMOKE BMI DIABETES
## 0.000000 1.172756 0.000000 0.000000 0.000000 0.000000 0.428507 0.000000
## educ PREVCHD PREVHYP DEATH ANYCHD TIMECHD TIMEDTH Gender
## 2.548489 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## PY
## 0.000000
filtered_data <- Interview_Data[complete.cases(Interview_Data), ]
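The complete-case filter above drops every row that has a missing value in any column, not only in the columns used by a given model. A minimal sketch of that behaviour, using a small hypothetical data frame (`toy` and its values are invented for illustration and are not taken from the interview data):

```r
# Hypothetical toy data standing in for Interview_Data
toy <- data.frame(
  AGE     = c(45, 52, 61, 38),
  TOTCHOL = c(210, NA, 250, 195),
  BMI     = c(24.1, 27.3, NA, 22.8)
)

# complete.cases() returns TRUE only for rows with no NA in any column
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]
n_dropped <- sum(!keep)

cat("Rows kept:", nrow(toy_complete), "of", nrow(toy), "\n")
cat("Rows dropped:", n_dropped, "\n")
```

Comparing `nrow()` before and after the filter in the same way is a quick check on how much data the complete-case restriction removes.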
# Fit logistic regression model for systolic blood pressure
model_sysbp <- glm(PREVCHD ~ SYSBP, data = filtered_data, family = binomial(link = "logit"))
# Calculate the odds ratio for a 5 mmHg increase in systolic blood pressure
odds_ratio_sysbp <- exp(5 * coef(model_sysbp)[2])
# Calculate the 95% confidence interval for the odds ratio
ci_sysbp <- exp(5 * confint(model_sysbp)[2,])
## Waiting for profiling to be done...
# Fit logistic regression model for diastolic blood pressure
model_diabp <- glm(PREVCHD ~ DIABP, data = filtered_data, family = binomial(link = "logit"))
# Calculate the odds ratio for a 5 mmHg increase in diastolic blood pressure
odds_ratio_diabp <- exp(5 * coef(model_diabp)[2])
# Calculate the 95% confidence interval for the odds ratio
ci_diabp <- exp(5 * confint(model_diabp)[2,])
## Waiting for profiling to be done...
cat("Systolic Blood Pressure:\n")
## Systolic Blood Pressure:
cat("Odds Ratio:", odds_ratio_sysbp, "\n")
## Odds Ratio: 1.107796
cat("95% Confidence Interval:", ci_sysbp, "\n\n")
## 95% Confidence Interval: 1.078202 1.137611
cat("Diastolic Blood Pressure:\n")
## Diastolic Blood Pressure:
cat("Odds Ratio:", odds_ratio_diabp, "\n")
## Odds Ratio: 1.137933
cat("95% Confidence Interval:", ci_diabp, "\n")
## 95% Confidence Interval: 1.076276 1.201248
The associations between blood pressure and prevalent Coronary Heart Disease (CHD) are statistically significant. Systolic Blood Pressure: For every 5 mmHg increase in Systolic Blood Pressure, the odds of prevalent Coronary Heart Disease are approximately 10.8% higher (odds ratio 1.108; 95% confidence interval 1.078 to 1.138, i.e. 7.8% to 13.8% higher odds). This suggests that higher Systolic Blood Pressure is associated with higher odds of prevalent Coronary Heart Disease.
Diastolic Blood Pressure: For every 5 mmHg increase in Diastolic Blood Pressure, the odds of prevalent Coronary Heart Disease are approximately 13.8% higher (odds ratio 1.138; 95% confidence interval 1.076 to 1.201, i.e. 7.6% to 20.1% higher odds). This indicates that higher Diastolic Blood Pressure is also associated with higher odds of prevalent Coronary Heart Disease.
In both cases, the odds ratios are statistically significant because the 95% confidence intervals do not include 1. This implies a significant positive association between both Systolic and Diastolic Blood Pressure and the odds of prevalent Coronary Heart Disease. However, these are unadjusted estimates, and potential confounders need to be considered to better understand the underlying relationship between blood pressure and Coronary Heart Disease.
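The odds ratios above are computed as `exp(5 * coef)`, so they describe a 5 mmHg difference rather than a 1 mmHg difference. The following sketch shows the arithmetic for converting between the two scales and for expressing an odds ratio as a percentage change in odds. The numbers are copied from the printed systolic output; the variable names are illustrative only.

```r
# Values copied from the printed systolic blood pressure output
or_5 <- 1.107796                 # odds ratio per 5 mmHg
ci_5 <- c(1.078202, 1.137611)    # 95% confidence interval per 5 mmHg

# Implied log-odds coefficient and odds ratio per 1 mmHg
beta_1 <- log(or_5) / 5
or_1   <- exp(beta_1)

# Percentage change in odds per 5 mmHg, with its interval
pct_5    <- (or_5 - 1) * 100
pct_ci_5 <- (ci_5 - 1) * 100

cat("Odds ratio per 1 mmHg:", or_1, "\n")
cat("Percent change in odds per 5 mmHg:", pct_5, "\n")
cat("95% CI for percent change:", pct_ci_5, "\n")
```

The per-1 mmHg odds ratio works out to roughly a 2% increase in odds, which compounds over five units to the 10.8% figure.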
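# Create a data frame for odds ratios and confidence intervals,
# reusing the values computed above so the plot matches the printed results
results <- data.frame(
  Variable = c("Systolic Blood Pressure", "Diastolic Blood Pressure"),
  OddsRatio = unname(c(odds_ratio_sysbp, odds_ratio_diabp)),
  CI_Lower = unname(c(ci_sysbp[1], ci_diabp[1])),
  CI_Upper = unname(c(ci_sysbp[2], ci_diabp[2]))
)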
# Create a forest plot using ggplot2
ggplot(results, aes(x = OddsRatio, y = Variable)) +
geom_point() +
geom_errorbarh(aes(xmin = CI_Lower, xmax = CI_Upper), height = 0.2) +
geom_vline(xintercept = 1, linetype = "dashed") +
labs(title = "Forest Plot of Odds Ratios",
x = "Odds Ratio per 5 mmHg increase (95% CI)",
y = "Variable") +
theme_minimal()
model_hyp <- glm(PREVCHD ~ PREVHYP, data = filtered_data, family = binomial(link = "logit"))
# Calculate the odds ratio for hypertension (PREVHYP) variable
odds_ratio_hyp <- exp(coef(model_hyp)[2])
# Calculate the 95% confidence interval for the odds ratio
ci_hyp <- exp(confint(model_hyp)[2,])
## Waiting for profiling to be done...
# Print the results
cat("Hypertension (PREVHYP):\n")
## Hypertension (PREVHYP):
cat("Odds Ratio:", odds_ratio_hyp, "\n")
## Odds Ratio: 3.22146
cat("95% Confidence Interval:", ci_hyp, "\n")
## 95% Confidence Interval: 2.387975 4.368145
Yes, the association between hypertension (PREVHYP) and prevalent Coronary Heart Disease (CHD) is statistically significant.
The odds ratio of 3.22 means that the odds of prevalent CHD among individuals with hypertension (PREVHYP = 1) are approximately 3.22 times the odds among those without hypertension (PREVHYP = 0).
The 95% confidence interval (2.39 to 4.37) around the odds ratio does not include 1, so the association between hypertension and prevalent CHD is statistically significant at the 5% level. In other words, the higher odds of CHD among individuals with hypertension are unlikely to have occurred by random chance.
In summary, individuals with hypertension have significantly higher odds of prevalent Coronary Heart Disease than those without hypertension. This estimate is unadjusted for other variables.
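For a single binary predictor, the odds ratio from logistic regression equals the cross-product ratio of the 2x2 table, which can make the estimate easier to interpret. A small sketch of that equivalence follows. The counts are hypothetical, chosen only so that the arithmetic is easy to follow, and are not the counts in the interview data.

```r
# Hypothetical 2x2 counts (NOT the study data)
counts <- data.frame(
  PREVHYP = c(0, 0, 1, 1),
  PREVCHD = c(0, 1, 0, 1),
  n       = c(900, 30, 400, 40)
)

# Cross-product odds ratio: (exposed cases * unexposed non-cases) /
#                           (unexposed cases * exposed non-cases)
or_table <- (40 * 900) / (30 * 400)

# Expand the counts to one row per person and fit the same kind of model
toy <- counts[rep(seq_len(nrow(counts)), counts$n), c("PREVHYP", "PREVCHD")]
fit <- glm(PREVCHD ~ PREVHYP, data = toy, family = binomial(link = "logit"))
or_glm <- unname(exp(coef(fit)["PREVHYP"]))

cat("Odds ratio from table:", or_table, "\n")
cat("Odds ratio from glm:  ", or_glm, "\n")
```

Both approaches give the same estimate; the regression form becomes necessary once further covariates such as age are added.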
model_hyp_adjusted <- glm(PREVCHD ~ PREVHYP + AGE, data = filtered_data, family = binomial(link = "logit"))
# Calculate the odds ratio for hypertension (PREVHYP) variable adjusted for age
odds_ratio_hyp_adjusted <- exp(coef(model_hyp_adjusted)[2])
# Calculate the 95% confidence interval for the odds ratio adjusted for age
ci_hyp_adjusted <- exp(confint(model_hyp_adjusted)[2,])
## Waiting for profiling to be done...
# Print the results
cat("Hypertension (PREVHYP) Adjusted for Age:\n")
## Hypertension (PREVHYP) Adjusted for Age:
cat("Odds Ratio:", odds_ratio_hyp_adjusted, "\n")
## Odds Ratio: 1.893829
cat("95% Confidence Interval:", ci_hyp_adjusted, "\n")
## 95% Confidence Interval: 1.381835 2.60648
The odds ratio of 1.89 indicates that, after adjusting for age, individuals with hypertension (PREVHYP = 1) have approximately 1.89 times the odds of prevalent Coronary Heart Disease (CHD) compared to those without hypertension (PREVHYP = 0).
The 95% confidence interval (1.38 to 2.61) around the odds ratio does not include 1. The association between hypertension and CHD therefore remains statistically significant after accounting for age, and the higher odds of CHD among individuals with hypertension are not solely due to age.
Statistical significance after adjustment does not, however, rule out confounding. The odds ratio fell from 3.22 in the unadjusted model to 1.89 after adjusting for age, a reduction of roughly 41%. A change of this size indicates that age does confound the relationship between hypertension and CHD: a substantial part of the crude association is explained by age.
In summary, age is a confounder of the hypertension-CHD relationship, but hypertension remains significantly associated with prevalent CHD after adjusting for age.
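The crude and age-adjusted odds ratios reported above can be compared directly. One common rule of thumb (an assumption introduced here, not something the analysis itself states) treats a change of more than about 10% in the estimate after adjustment as a sign of confounding. A minimal sketch of that comparison, using the two printed odds ratios:

```r
# Odds ratios copied from the printed output above
or_crude    <- 3.22146    # PREVCHD ~ PREVHYP
or_adjusted <- 1.893829   # PREVCHD ~ PREVHYP + AGE

# Relative change on the odds ratio scale
pct_change_or <- (or_crude - or_adjusted) / or_crude * 100

# Relative change on the log-odds (coefficient) scale
pct_change_log <- (log(or_crude) - log(or_adjusted)) / log(or_crude) * 100

cat("Change in odds ratio after adjustment (%):", round(pct_change_or, 1), "\n")
cat("Change in log odds ratio after adjustment (%):", round(pct_change_log, 1), "\n")
```

On either scale the change is far larger than 10%, so under this rule of thumb age would be retained as an adjustment variable.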
library("survival")
# Fit Cox proportional hazards model for hypertension
model_cox <- coxph(Surv(TIMECHD, ANYCHD) ~ PREVHYP, data = filtered_data)
# Calculate the hazard ratio for hypertension (PREVHYP) variable
hazard_ratio_hyp <- exp(coef(model_cox))
# Calculate the 95% confidence interval for the hazard ratio
ci_hyp <- exp(confint(model_cox))
# Print the results
cat("Hypertension (PREVHYP) and Incident CHD:\n")
## Hypertension (PREVHYP) and Incident CHD:
cat("Hazard Ratio:", hazard_ratio_hyp, "\n")
## Hazard Ratio: 2.280327
cat("95% Confidence Interval:", ci_hyp, "\n")
## 95% Confidence Interval: 2.033383 2.55726
Yes, the association between hypertension (PREVHYP) and incident Coronary Heart Disease (CHD) is statistically significant.
The hazard ratio of 2.28 indicates that, at any point during follow-up, individuals with hypertension (PREVHYP = 1) have a hazard of incident Coronary Heart Disease (CHD) approximately 2.28 times that of individuals without hypertension (PREVHYP = 0). This means that individuals with hypertension are at a higher risk of developing CHD during the follow-up period.
The 95% confidence interval (2.033383 to 2.55726) around the hazard ratio does not include 1. This suggests that the association between hypertension and incident CHD is statistically significant. In other words, the increased hazard of CHD among individuals with hypertension is unlikely to have occurred by random chance.
In summary, the hazard ratio and its statistically significant confidence interval suggest that individuals with hypertension are at a significantly higher risk of experiencing incident Coronary Heart Disease compared to those without hypertension during the follow-up period.
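A hazard ratio from a Cox model is interpreted as constant over follow-up, which rests on the proportional hazards assumption. The report does not show a check of that assumption; the sketch below illustrates one way to do it with `cox.zph()` from the survival package. It uses the `lung` dataset that ships with survival as a stand-in, so the variables (`time`, `status`, `sex`) are not those of the interview data.

```r
library(survival)

# Stand-in example: lung ships with the survival package
# (status is coded 1 = censored, 2 = dead)
fit <- coxph(Surv(time, status) ~ sex, data = lung)

# Test of proportional hazards based on scaled Schoenfeld residuals
zph <- cox.zph(fit)
print(zph)
```

A small p-value for a term suggests its hazard ratio changes over time; the same call applied to `model_cox` would check the hypertension model.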
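We perform a multivariable analysis to check for potential confounding variables in the relationship between blood pressure and Coronary Heart Disease (CHD). Because the model below is fitted with lm() on the binary outcome PREVCHD, it is a linear probability model: each coefficient is an absolute difference in the probability of prevalent CHD, not an odds ratio. CURSMOKE appears twice in the formula, but R keeps a single CURSMOKE term, as the coefficient table shows.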
model <- lm(PREVCHD ~ AGE + BMI + CURSMOKE + DIABETES + educ + CURSMOKE + PREVHYP+Gender+ TOTCHOL, data = filtered_data)
summary(model)
##
## Call:
## lm(formula = PREVCHD ~ AGE + BMI + CURSMOKE + DIABETES + educ +
## CURSMOKE + PREVHYP + Gender + TOTCHOL, data = filtered_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.19207 -0.07152 -0.03481 -0.00135 1.01694
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.740e-01 3.190e-02 -5.455 5.17e-08 ***
## AGE 4.143e-03 3.933e-04 10.533 < 2e-16 ***
## BMI -4.672e-05 8.014e-04 -0.058 0.9535
## CURSMOKE 4.785e-03 6.460e-03 0.741 0.4589
## DIABETES 3.242e-02 1.895e-02 1.711 0.0872 .
## educ 3.559e-03 3.034e-03 1.173 0.2409
## PREVHYP 3.058e-02 7.162e-03 4.270 2.00e-05 ***
## Gender 3.591e-02 6.325e-03 5.677 1.46e-08 ***
## TOTCHOL -1.018e-04 7.196e-05 -1.415 0.1572
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1985 on 4244 degrees of freedom
## Multiple R-squared: 0.05001, Adjusted R-squared: 0.04822
## F-statistic: 27.93 on 8 and 4244 DF, p-value: < 2.2e-16
AGE coefficient (4.143e-03) is positive and statistically significant (p < 0.001), suggesting that each one-year increase in age is associated with an increase of about 0.41 percentage points in the probability of prevalent CHD.
BMI coefficient (-4.672e-05) is close to zero and not statistically significant (p = 0.9535), indicating that there is no significant association between BMI and prevalent CHD in this model.
CURSMOKE coefficient (4.785e-03) is positive and not statistically significant (p = 0.4589), suggesting that there is no significant association between current smoking and prevalent CHD in this model.
DIABETES coefficient (3.242e-02) is positive but not significant at the conventional 0.05 level (p = 0.0872). The estimate corresponds to a probability of prevalent CHD about 3.2 percentage points higher among individuals with diabetes, but further investigation is warranted before treating this as an established association.
educ coefficient (3.559e-03) is positive and not statistically significant (p = 0.2409), suggesting that education level does not have a significant association with prevalent CHD in this model.
PREVHYP coefficient (3.058e-02) is positive and highly significant (p < 0.001), indicating that individuals with hypertension have a probability of prevalent CHD about 3.1 percentage points higher than those without hypertension.
Gender coefficient (3.591e-02) is positive and highly significant (p < 0.001), indicating that the group coded with the higher Gender value (taken here to be female) has a probability of prevalent CHD about 3.6 percentage points higher. This reading depends on how Gender is coded in the data.
TOTCHOL coefficient (-1.018e-04) is negative and not statistically significant (p = 0.1572), suggesting that there is no significant association between total cholesterol and prevalent CHD in this model.
In summary, after mutual adjustment for the variables in the model, age, hypertension, and gender are significantly associated with prevalent CHD. The other variables (BMI, CURSMOKE, DIABETES, educ, and TOTCHOL) do not contribute significantly to predicting prevalent CHD in this model. The model explains only about 5% of the variance in PREVCHD (R-squared = 0.05), and its coefficients are probability differences; a logistic model would be needed to express them as odds ratios.
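Because `lm()` on a 0/1 outcome estimates probability differences, the same formula has to be refitted with `glm(..., family = binomial)` to obtain odds ratios. The sketch below contrasts the two fits on simulated data. The data frame `toy`, its sample size, and the coefficients used to generate it are all hypothetical and are not estimates from the interview data.

```r
set.seed(1)

# Simulated stand-in data (hypothetical values, not the study data)
n <- 2000
toy <- data.frame(
  AGE     = round(runif(n, 35, 70)),
  PREVHYP = rbinom(n, 1, 0.3)
)
log_odds <- -6 + 0.06 * toy$AGE + 0.6 * toy$PREVHYP
toy$PREVCHD <- rbinom(n, 1, plogis(log_odds))

# Linear probability model: coefficients are differences in probability
fit_lpm <- lm(PREVCHD ~ AGE + PREVHYP, data = toy)

# Logistic model: exponentiated coefficients are odds ratios
fit_logit <- glm(PREVCHD ~ AGE + PREVHYP, data = toy,
                 family = binomial(link = "logit"))
odds_ratios <- exp(coef(fit_logit))

print(coef(fit_lpm))
print(odds_ratios)
```

Applied to `filtered_data`, the second form would put the multivariable results on the same odds ratio scale as the earlier single-predictor models.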
# Fit Cox proportional hazards model adjusted for relevant confounders
model_cox_adjusted <- coxph(Surv(TIMECHD, ANYCHD) ~ PREVHYP + Gender + DIABETES, data = filtered_data)
# Calculate the hazard ratio for hypertension (PREVHYP) variable adjusted for relevant confounders
hazard_ratio_hyp_adjusted <- exp(coef(model_cox_adjusted)["PREVHYP"])
# Calculate the 95% confidence interval for the hazard ratio adjusted for relevant confounders
ci_hyp_adjusted <- exp(confint(model_cox_adjusted)["PREVHYP", ])
# Print the results
cat("Hypertension (PREVHYP) Adjusted for Relevant Confounders:\n")
## Hypertension (PREVHYP) Adjusted for Relevant Confounders:
cat("Hazard Ratio:", hazard_ratio_hyp_adjusted, "\n")
## Hazard Ratio: 2.225303
cat("95% Confidence Interval:", ci_hyp_adjusted, "\n")
## 95% Confidence Interval: 1.983678 2.496359
The Hazard Ratio for the association between hypertension (PREVHYP) and incident CHD, after adjusting for gender and diabetes, is 2.23. The 95% Confidence Interval for this Hazard Ratio ranges from 1.98 to 2.50.
The Hazard Ratio of 2.23 suggests that individuals with hypertension have approximately 2.23 times the hazard of incident CHD compared to individuals without hypertension, after accounting for gender and diabetes. This indicates a significant and substantial association between hypertension and the risk of incident CHD, and the narrow confidence interval shows that the hazard ratio is estimated precisely. Age, which was strongly associated with CHD in the earlier models, is not included in this model, so the estimate is not age-adjusted.
These results underscore the importance of hypertension as a significant risk factor for developing CHD and highlight the need for effective management and prevention strategies for individuals with hypertension to mitigate their risk of incident CHD.
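Adding a covariate to a Cox model and comparing the hazard ratio before and after follows the same pattern as the logistic models earlier in the report. The sketch below shows that pattern, pulling the hazard ratio and its 95% confidence interval from `summary()$conf.int`. It again uses the `lung` dataset from the survival package as a stand-in, so `sex` and `age` here are not the study variables.

```r
library(survival)

# Stand-in example on the lung dataset (not the interview data)
fit_crude <- coxph(Surv(time, status) ~ sex, data = lung)
fit_adj   <- coxph(Surv(time, status) ~ sex + age, data = lung)

# summary()$conf.int holds exp(coef) and the 95% limits for each term
cols <- c("exp(coef)", "lower .95", "upper .95")
hr_table <- rbind(
  crude    = summary(fit_crude)$conf.int["sex", cols],
  adjusted = summary(fit_adj)$conf.int["sex", cols]
)

print(round(hr_table, 3))
```

The same two-row comparison, with `PREVHYP` as the exposure and `AGE` added to the adjusted model, would show how much of the hypertension hazard ratio is attributable to age.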