Using the Heart Disease dataset found on Kaggle, I want to know if variables such as sex, age, and maximum heart rate achieved has a relationship with the amuont of chestpains someone experiences. This dataset contains 303 cases with 14 different variables from chest pain with exercise to fasting blood sugar. The variables consists of : age= age in years sex=(1 = male; 0 = female) cp=chest pain type trestbps=resting blood pressure (in mm Hg on admission to the hospital) chol= serum cholestoral in mg/dl fbs=(fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) restecg=resting electrocardiographic results thalach=maximum heart rate achieved exang=exercise induced angina (1 = yes; 0 = no) oldpeak=ST depression induced by exercise relative to rest slope=the slope of the peak exercise ST segment ca=number of major vessels (0-3) colored by flourosopy thal=3 = normal; 6 = fixed defect; 7 = reversable defect target=1 or 0
I will be using the four variables listed earlier, age, sex, chestpain and maximum heart rate focusing on their relationships.
Here I read in my dataset, installed and called all the packages I will be using. The data that was given also had some unclear variable names so in order to maintain clarity, I changed the ones that were not as obvious.
The dependent variable that I wanted to focus on was chestpains. Because it was given as a 0,1,2,3 scale, I had to create a new variable that was binary so that 0 which was no chest pain would equal to 0 and 1,2,3 which were different levels of chestpain indications would become 1.
hd_csv <- read.csv("C:/Users/Jessica/Desktop/712/heart_disease.csv")
library(readr)
library(dplyr)
library(tibble)
library(texreg)
library(visreg)
health <- rename(hd_csv,
resting_blood_pressure = trestbps,
fasting_blood_sugar=fbs,
maximum_heart_rate_achieved = thalach,
chest_pain_level = cp ,
chest_pain_exercise = exang,
number_of_major_vessels_colored_by_flouroscopy = ca)
health2 <- health%>%
mutate(chestpain = ifelse(chest_pain_level < 1, 0, 1))
Here the binary dependent variable is chestpain and is ran with the independent variable sex.
m0 <- glm(chestpain ~ sex, family = binomial, data = health2)
summary(m0)
##
## Call:
## glm(formula = chestpain ~ sex, family = binomial, data = health2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.342 -1.173 1.021 1.182 1.182
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3795 0.2078 1.826 0.0678 .
## sex -0.3892 0.2500 -1.556 0.1196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 419.09 on 302 degrees of freedom
## Residual deviance: 416.65 on 301 degrees of freedom
## AIC: 420.65
##
## Number of Fisher Scoring iterations: 4
The sex coefficient here shows that males have a decreased odds of having chestpains by a factor of 0.3892. In this regression model we find that there is no significance when using chestpains and sex because the p-value is 0.1196. But we will continue on with adding a second independent variable to see if there is a change.
Now I added maximum heart rate achieved with age to see if there is a significant variable.
m1 <- glm(chestpain ~ sex +age+maximum_heart_rate_achieved, family = binomial, data = health2)
summary(m1)
##
## Call:
## glm(formula = chestpain ~ sex + age + maximum_heart_rate_achieved,
## family = binomial, data = health2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8693 -1.0641 0.6665 0.9789 1.9599
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.3569537 1.5423011 -3.473 0.000514 ***
## sex -0.3548546 0.2696758 -1.316 0.188222
## age 0.0008892 0.0150985 0.059 0.953035
## maximum_heart_rate_achieved 0.0378101 0.0066466 5.689 1.28e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 419.09 on 302 degrees of freedom
## Residual deviance: 371.42 on 299 degrees of freedom
## AIC: 379.42
##
## Number of Fisher Scoring iterations: 4
Here we see that there is a significant relationship when maximum heart rate achieved is introduced. The sex coefficient still has a decreased odds of having chestpains but it is still not significant. The age coefficient here shows that for every one year incease in age there is an increased odds of having chestpains by a facotr of 0.0008892 and the p-value is 0.953035 which is not significant.The maximum heart rate achieved coefficient is showing that for every uit increase in maximum heart rate achieved, there is a increased odds of having chestpains by a factor of 0.0378101. The p-value is significant at 1.28e-08.
I multiplied age and maximum heart rate achieved to create the interaction term and applied it to the third and last regression model.
m2 <- glm(chestpain ~ age*maximum_heart_rate_achieved + sex, family = binomial, data = health2)
summary(m2)
##
## Call:
## glm(formula = chestpain ~ age * maximum_heart_rate_achieved +
## sex, family = binomial, data = health2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2690 -1.0368 0.5446 1.0113 1.7766
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.961e+01 6.349e+00 -3.089 0.00201 **
## age 2.563e-01 1.094e-01 2.342 0.01919 *
## maximum_heart_rate_achieved 1.311e-01 4.095e-02 3.202 0.00137 **
## sex -3.602e-01 2.700e-01 -1.334 0.18217
## age:maximum_heart_rate_achieved -1.684e-03 7.161e-04 -2.351 0.01871 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 419.09 on 302 degrees of freedom
## Residual deviance: 365.58 on 298 degrees of freedom
## AIC: 375.58
##
## Number of Fisher Scoring iterations: 4
The interaction coefficient here shows that when age and maximum heart rate achieved are interacted together, there is decrease odds of having chestpains by a factor of 1.684e-03. This is significant because the p-value is 0.01871.
htmlreg(list(m0, m1, m2), doctype = FALSE)
| Model 1 | Model 2 | Model 3 | ||
|---|---|---|---|---|
| (Intercept) | 0.38 | -5.36*** | -19.61** | |
| (0.21) | (1.54) | (6.35) | ||
| sex | -0.39 | -0.35 | -0.36 | |
| (0.25) | (0.27) | (0.27) | ||
| age | 0.00 | 0.26* | ||
| (0.02) | (0.11) | |||
| maximum_heart_rate_achieved | 0.04*** | 0.13** | ||
| (0.01) | (0.04) | |||
| age:maximum_heart_rate_achieved | -0.00* | |||
| (0.00) | ||||
| AIC | 420.65 | 379.42 | 375.58 | |
| BIC | 428.07 | 394.27 | 394.14 | |
| Log Likelihood | -208.32 | -185.71 | -182.79 | |
| Deviance | 416.65 | 371.42 | 365.58 | |
| Num. obs. | 303 | 303 | 303 | |
| p < 0.001, p < 0.01, p < 0.05 | ||||
In the best model table, model 3 had the lowest BIC and AIC value indicating that it is the best fit model for this analysis.
anova(m0, m1, m2, test = "Chisq")
In the ANOVA, we see that model 2 and model 3 is significant but because each model is compared to the previous one. This means that there is a significant difference between model 3 and model 2.
visreg(m2, "age", scale = "response")
## Warning: Note that you are attempting to plot a 'main effect' in a model that contains an
## interaction. This is potentially misleading; you may wish to consider using the 'by'
## argument.
## Conditions used in construction of plot
## maximum_heart_rate_achieved: 153
## sex: 1
Here we see that there is a slight decrease in chestpain as age increases.
visreg(m2, "chestpain", by = "sex")
Here we see lower odds of having chestpains for males compared to women.
visreg(m2, "sex", by = "chestpain", scale = "response")