Variable Index:

Binary dependent variable:
1-exang (exercise induced angina)
(1 = yes; 0 = no)

Independent variables:
2-age,
3-sex (1=male; 0=female),
4-chol-Serum Cholesterol in mg/dl,
5-fbs-fasting blood sugar >120 mg/dl, (1=true; 0=false)
6-thalach-heart rate,
7-trestbps-resting blood pressure,
8-restecg-resting electrocardiographic results-ECG; (0=normal, 1=having ST-T wave abnormality, 2=showing probable or definite left ventricular hypertrophy)
9-cp-chest pain type; (0=No pain, 1=typical angina, 2=atypical angina, 3=non-anginal pain)

Looking at Mean Values of various Tests contributing to Angina:

Mean Cholesterol level:

To start, I want to compare the mean cholesterol level of patients who have exercise induced angina with that of patients who do not.

HeartDisease %>%
  group_by(exang) %>%
  summarize(mean_cholesterol = mean(chol)) %>%
  kable(col.names = c("Exercise Induced Angina", "Mean Cholesterol")) %>%
  kable_styling("striped", full_width = F) %>%
  row_spec(0)
Exercise Induced Angina Mean Cholesterol
0 243.8480
1 251.2424

INTERPRETATION:
From the results above, I can see that mean cholesterol is higher among patients who have angina (251.24) than among those who do not (243.85).

Considering Gender Effect: Cholesterol among Males and Females.

HeartDisease %>%
  group_by(exang, sex) %>%
  summarize(mean_cholesterol = mean(chol)) %>%
  kable(col.names = c("Exercise Induced Angina", "By Gender", "Mean Cholesterol")) %>%
  kable_styling("striped", full_width = F) %>%
  row_spec(0)
Exercise Induced Angina By Gender Mean Cholesterol
0 0 257.9730
0 1 235.8077
1 0 272.5000
1 1 245.1688

INTERPRETATION:
The results are interesting: whether or not they have angina, females have a higher mean cholesterol level than males. My previous assignment showed that females have a lower probability of angina, yet when I look at this parameter separately for males and females, female cholesterol levels come out higher than male levels.

For a Better Look, an Easier to Read Table (Gender Effect):

HeartDisease %>%
  group_by(exang, sex) %>%
  summarize(mean_cholesterol = mean(chol)) %>%
  spread(sex, mean_cholesterol) %>%
  kable(col.names = c("Exercise Induced Angina", "Female", "Male")) %>%
  kable_styling("striped", full_width = F) %>%
  row_spec(0)
Exercise Induced Angina Female Male
0 257.973 235.8077
1 272.500 245.1688

INTERPRETATION:
Now it is much easier to read: whether or not they have angina, females have a higher mean cholesterol level than males. To explore further, I would like to look at the other variables by gender as well.

Fasting Blood Sugar Level among Males and Females:

HeartDisease %>%
  group_by (exang, sex) %>%
  summarize(mean_fbs=mean(fbs)) %>%
  spread(sex, mean_fbs) %>%
   kable(col.names = c("Angina", "Female", "Male"))%>%
  kable_styling("striped",full_width=F)%>%
  row_spec(0)
Angina Female Male
0 0.0945946 0.1692308
1 0.2272727 0.1428571

INTERPRETATION:
Because fbs is coded 0/1, each mean is the proportion of patients whose fasting blood sugar is above 120 mg/dl. Among patients without angina, a smaller share of females (9.5%) than males (16.9%) have high fasting blood sugar, BUT among patients with angina the share is higher for females (22.7%) than for males (14.3%).

Blood Pressure Level among Males and Females:
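
Since fbs takes only the values 0 and 1, its group mean can be read as a share of patients. A minimal sketch in base R, using a hypothetical 0/1 vector (7 ones among 74 values; these counts are an assumption chosen because 7/74 reproduces the first reported mean, and the real group sizes are not shown in the table):

```r
# Hypothetical 0/1 indicator: 7 patients with fbs > 120 mg/dl out of 74.
# These counts are an assumption for illustration, not read from the data.
fbs_example <- c(rep(1, 7), rep(0, 67))

# The mean of a 0/1 variable equals the proportion of ones.
share_high_fbs <- mean(fbs_example)
share_high_fbs

# The same quantity expressed as a percentage.
round(100 * share_high_fbs, 1)
```

Reading the table this way makes the comparison one of percentages rather than of blood sugar levels.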

HeartDisease %>%
  group_by (exang, sex) %>%
  summarize(mean_trestbps=mean(trestbps)) %>%
  spread(sex, mean_trestbps) %>%
   kable(col.names = c("Angina", "Female", "Male"))%>%
  kable_styling("striped",full_width=F)%>%
  row_spec(0)
Angina Female Male
0 129.7432 131.4000
1 144.3182 130.1818

INTERPRETATION:
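The results show that females who have angina have a noticeably higher mean resting blood pressure than males who have angina (144.3 vs 130.2), while among patients without angina the two means are close (129.7 vs 131.4).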

Heart Rate among Males and Females:

HeartDisease %>%
  group_by (exang, sex) %>%
  summarize(mean_thalach=mean(thalach)) %>%
  spread(sex, mean_thalach) %>%
   kable(col.names = c("Angina", "Female", "Male"))%>%
  kable_styling("striped",full_width=F)%>%
  row_spec(0)
Angina Female Male
0 152.5 157.4923
1 146.5 134.5584

INTERPRETATION:
The results show that females who have angina have a higher mean heart rate than males who have angina (146.5 vs 134.6). Although the probability of angina is lower in females than in males, females who do have angina show higher mean values than males on every variable examined here. This is a very interesting outcome.

Three Regression Models:

Now moving on to my regression models: I use linear models with resting blood pressure as the outcome and cholesterol and sex as independent variables.

model1 <- lm(trestbps~chol, data=HeartDisease)
model2 <- lm(trestbps~chol + sex , data=HeartDisease)
model3 <- lm(trestbps~chol*sex, data=HeartDisease)

Results:

htmlreg(list(model1, model2, model3), caption="", digits=3)
              Model 1      Model 2      Model 3
(Intercept)   121.360***   122.782***   121.269***
              (4.871)      (5.464)      (7.417)
chol          0.042*       0.039*       0.045
              (0.019)      (0.020)      (0.028)
sex1                       -1.269       1.723
                           (2.199)      (10.143)
chol:sex1                               -0.012
                                        (0.040)
R2            0.015        0.016        0.017
Adj. R2       0.012        0.010        0.007
Num. obs.     303          303          303
RMSE          17.433       17.453       17.479
***p < 0.001, **p < 0.01, *p < 0.05

INTERPRETATION:
For model 1, the intercept says that at a cholesterol value of 0 the average blood pressure is 121.360, and the result is significant. For each 1 unit increase in cholesterol, blood pressure increases on average by 0.042.

For model 2, at a cholesterol value of 0 the average blood pressure for females is 122.782, and the result is significant. For each 1 unit increase in cholesterol, blood pressure increases on average by 0.039. Holding cholesterol fixed, male blood pressure is lower on average by 1.269, but this difference is not significant.

For model 3, at a cholesterol value of 0 the average blood pressure for females is 121.269, and the result is significant. For females, a 1 unit increase in cholesterol increases blood pressure on average by 0.045. The effect of cholesterol on blood pressure is smaller for males than for females by 0.012 on average, but neither the slope nor the interaction is significant in this model.
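
To make the interaction in model 3 concrete, the sketch below rebuilds the fitted line for each sex from the Model 3 coefficients in the table above. The cholesterol value of 250 mg/dl is an illustrative assumption, not a value taken from the analysis.

```r
# Coefficients copied from the Model 3 column of the table above.
b0   <- 121.269   # intercept: females at chol = 0
b_ch <- 0.045     # slope of chol for females
b_sx <- 1.723     # shift in the intercept for males
b_ix <- -0.012    # change in the chol slope for males

# Implied regression line for each sex.
female_line <- c(intercept = b0,        slope = b_ch)
male_line   <- c(intercept = b0 + b_sx, slope = b_ch + b_ix)

# Predicted resting blood pressure at a hypothetical cholesterol of 250 mg/dl.
chol_value  <- 250
pred_female <- unname(female_line["intercept"] + female_line["slope"] * chol_value)
pred_male   <- unname(male_line["intercept"]   + male_line["slope"]   * chol_value)
c(female = pred_female, male = pred_male)
```

The male slope comes out as 0.045 - 0.012 = 0.033, which is the quantity the interaction coefficient adjusts.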

Subgrouping Analysis:

Now I break the analysis down into two separate models, one for males and one for females, and estimate the same model on each subgroup. The trick is that instead of one model covering both males and females, I have one model per gender, and the parameters are easier to understand.

HeartDiseaseF <- HeartDisease %>%
  filter(sex==0)
HeartDiseaseM <- HeartDisease %>%
  filter(sex==1)
modelF <- lm(trestbps~chol, data=HeartDiseaseF)
modelM <- lm(trestbps~chol, data=HeartDiseaseM)
htmlreg(list(modelF, modelM, model3), caption="", custom.model.names=c("Females", "Males", "Both"), digits=3)
              Females      Males        Both
(Intercept)   121.269***   122.993***   121.269***
              (8.142)      (6.586)      (7.417)
chol          0.045        0.033        0.045
              (0.030)      (0.027)      (0.028)
sex1                                    1.723
                                        (10.143)
chol:sex1                               -0.012
                                        (0.040)
R2            0.023        0.007        0.017
Adj. R2       0.013        0.002        0.007
Num. obs.     96           207          303
RMSE          19.187       16.638       17.479
***p < 0.001, **p < 0.01, *p < 0.05

INTERPRETATION:
Now I have a clear table for females and males separately, and I can interpret it easily. The first two models are gender specific, so the cholesterol slopes can be compared directly: for a 1 unit increase in cholesterol, blood pressure rises by 0.033 for males and by 0.045 for females. The third model is the combined model. Its main effect of cholesterol (0.045) is the effect for the omitted gender group, which is female (female = 0, male = 1), and its intercept (121.269) equals the intercept of the females-only model for the same reason. The sex1 coefficient (1.723) is the difference between the male and female intercepts (122.993 - 121.269 = 1.724, matching 1.723 up to rounding). The interaction coefficient (-0.012) is the difference between the male and female slopes (0.033 - 0.045 = -0.012): the effect of cholesterol on blood pressure is smaller for males than for females by 0.012 per unit of cholesterol. None of the slope estimates in these three models is statistically significant.
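
The link between the subgroup models and the combined model can be checked on any data set. The sketch below uses simulated data (not the heart disease data; all variable names and parameter values are assumptions for illustration) to show that a model with a full interaction reproduces the two separate fits.

```r
set.seed(1)

# Simulated data: outcome y, predictor x, and a two-level group g.
n <- 200
g <- factor(rep(c(0, 1), each = n / 2))
x <- rnorm(n, mean = 250, sd = 50)
y <- ifelse(g == "0", 120 + 0.05 * x, 123 + 0.03 * x) + rnorm(n, sd = 15)
d <- data.frame(y, x, g)

# One fit per group, and one combined fit with an interaction.
fit0    <- lm(y ~ x, data = d[d$g == "0", ])
fit1    <- lm(y ~ x, data = d[d$g == "1", ])
fit_int <- lm(y ~ x * g, data = d)
b <- coef(fit_int)

# Reference group: the main effects equal the group-0 fit.
same_ref <- all.equal(unname(b[c("(Intercept)", "x")]), unname(coef(fit0)))

# Other group: adding the g1 and x:g1 terms recovers the group-1 fit.
same_other <- all.equal(
  unname(c(b["(Intercept)"] + b["g1"], b["x"] + b["x:g1"])),
  unname(coef(fit1))
)
c(reference_group = isTRUE(same_ref), other_group = isTRUE(same_other))
```

The point estimates agree, but the standard errors differ because the combined model pools the residual variance across both groups.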

Interactions in Logit models:

Now I would like to handle interaction terms in the context of logit models. I use all the variables contributing to angina, since I used only a few in my last assignment.

Model1:

# Now let's focus on angina.
model_angina1 <- glm(exang~age, family="binomial", data=HeartDisease)
summary(model_angina1)
## 
## Call:
## glm(formula = exang ~ age, family = "binomial", data = HeartDisease)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0812  -0.9178  -0.8176   1.4303   1.7010  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.98760    0.76812  -2.588  0.00966 **
## age          0.02312    0.01378   1.678  0.09334 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 382.90  on 302  degrees of freedom
## Residual deviance: 380.03  on 301  degrees of freedom
## AIC: 384.03
## 
## Number of Fisher Scoring iterations: 4

INTERPRETATION:
For each additional year of age, the log odds of exercise induced angina increase on average by 0.023; the effect is significant only at the 10% level (p = 0.093).
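
Log odds are hard to read directly, so it can help to convert them. The sketch below turns the model 1 coefficients into an odds ratio and into predicted probabilities; the ages 40, 50 and 60 are illustrative choices, not values from the analysis.

```r
# Coefficients copied from summary(model_angina1).
b0    <- -1.98760
b_age <- 0.02312

# Odds ratio for one additional year of age.
or_age <- exp(b_age)
or_age

# Predicted probability of exercise induced angina at a few ages.
# plogis() is the inverse logit: 1 / (1 + exp(-x)).
ages <- c(40, 50, 60)
prob <- plogis(b0 + b_age * ages)
data.frame(age = ages, probability = round(prob, 3))
```

An odds ratio slightly above 1 means each extra year multiplies the odds of angina by roughly 1.02.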

Model2:

model_angina2 <- glm(exang~restecg+age, family="binomial", data=HeartDisease)
summary(model_angina2)
## 
## Call:
## glm(formula = exang ~ restecg + age, family = "binomial", data = HeartDisease)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0755  -0.9232  -0.8105   1.4067   1.7411  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.77456    0.79543  -2.231   0.0257 *
## restecg     -0.24747    0.23671  -1.045   0.2958  
## age          0.02155    0.01389   1.551   0.1208  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 382.90  on 302  degrees of freedom
## Residual deviance: 378.93  on 300  degrees of freedom
## AIC: 384.93
## 
## Number of Fisher Scoring iterations: 4

INTERPRETATION:
After adding restecg (resting electrocardiographic results), a 1 unit increase in restecg decreases the log odds of exercise induced angina on average by 0.247. A 1 year increase in age increases the log odds of exercise induced angina on average by 0.022. Neither coefficient is statistically significant.

Model3:

model_angina3 <- glm(exang~trestbps*chol+ fbs, family="binomial", data=HeartDisease)
summary(model_angina3)
## 
## Call:
## glm(formula = exang ~ trestbps * chol + fbs, family = "binomial", 
##     data = HeartDisease)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2528  -0.8779  -0.8403   1.4138   1.6434  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)
## (Intercept)    2.5649531  4.8676216   0.527    0.598
## trestbps      -0.0299943  0.0371449  -0.807    0.419
## chol          -0.0165461  0.0188936  -0.876    0.381
## fbs            0.1063919  0.3469776   0.307    0.759
## trestbps:chol  0.0001449  0.0001430   1.013    0.311
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 382.90  on 302  degrees of freedom
## Residual deviance: 379.37  on 298  degrees of freedom
## AIC: 389.37
## 
## Number of Fisher Scoring iterations: 4

INTERPRETATION:
Because of the interaction, the main effects of trestbps and chol apply when the other variable equals 0. As blood pressure increases by 1 unit (with cholesterol at 0), the log odds of angina decrease on average by 0.030. With a 1 unit increase in cholesterol (with blood pressure at 0), the log odds of angina decrease on average by 0.0165. For patients with fasting blood sugar above 120 mg/dl (fbs = 1), the log odds of angina are higher on average by 0.106. For the interaction term, the effect of blood pressure on the log odds of angina depends on cholesterol: each 1 unit increase in cholesterol raises the blood pressure slope by about 0.00014. None of these coefficients is statistically significant.

Model4:

model_angina4 <- glm(exang~restecg*age+ fbs +trestbps+ cp+ thalach + chol+ sex, family="binomial", data=HeartDisease)
summary(model_angina4)
## 
## Call:
## glm(formula = exang ~ restecg * age + fbs + trestbps + cp + thalach + 
##     chol + sex, family = "binomial", data = HeartDisease)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1708  -0.6686  -0.3977   0.7638   2.3315  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.424978   2.130523   1.608 0.107928    
## restecg     -1.534771   1.807735  -0.849 0.395880    
## age         -0.033650   0.025119  -1.340 0.180366    
## fbs          0.302819   0.422145   0.717 0.473168    
## trestbps     0.009490   0.008791   1.080 0.280347    
## cp1         -2.167841   0.569881  -3.804 0.000142 ***
## cp2         -1.821509   0.384431  -4.738 2.16e-06 ***
## cp3         -1.627227   0.609671  -2.669 0.007607 ** 
## thalach     -0.029608   0.007457  -3.971 7.17e-05 ***
## chol         0.004314   0.003011   1.433 0.151857    
## sex1         0.732795   0.348194   2.105 0.035330 *  
## restecg:age  0.027298   0.031939   0.855 0.392732    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 382.90  on 302  degrees of freedom
## Residual deviance: 285.82  on 291  degrees of freedom
## AIC: 309.82
## 
## Number of Fisher Scoring iterations: 5

INTERPRETATION:
Adding a mix of variables: a 1 unit increase in restecg decreases the log odds of exercise induced angina on average by 1.535 (this is the effect at age 0, because of the interaction).
A 1 year increase in age decreases the log odds of exercise induced angina on average by 0.034 (when restecg = 0).
For fbs = 1 (fasting blood sugar above 120 mg/dl) compared with fbs = 0, the log odds of exercise induced angina are higher on average by 0.303.
A 1 unit increase in trestbps (blood pressure) increases the log odds of exercise induced angina on average by 0.009.
Compared with no chest pain (cp = 0), typical angina (cp1) lowers the log odds of exercise induced angina on average by 2.168, and the result is significant.
Compared with no chest pain, atypical angina (cp2) lowers the log odds of exercise induced angina on average by 1.822, and the result is significant.
Compared with no chest pain, non-anginal pain (cp3) lowers the log odds of exercise induced angina on average by 1.627, and the result is significant (p < 0.01).
A 1 unit increase in thalach (heart rate) decreases the log odds of exercise induced angina on average by 0.030, and the result is significant.
A 1 unit increase in cholesterol increases the log odds of exercise induced angina on average by 0.004.
For males compared with females, the log odds of exercise induced angina are higher on average by 0.733, and the result is significant at the 5% level.
For the interaction term, the effect of restecg on angina depends on age: each additional year of age raises the restecg slope on the log odds scale by 0.027, though this term is not significant.
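
The significant terms of model 4 are often easier to communicate as odds ratios. A minimal sketch, assuming approximate 95% Wald intervals (estimate plus or minus 1.96 standard errors) are adequate here; the numbers are copied from summary(model_angina4):

```r
# Selected estimates and standard errors copied from summary(model_angina4).
est <- c(cp1 = -2.167841, cp2 = -1.821509, cp3 = -1.627227,
         thalach = -0.029608, sex1 = 0.732795)
se  <- c(cp1 = 0.569881, cp2 = 0.384431, cp3 = 0.609671,
         thalach = 0.007457, sex1 = 0.348194)

# Odds ratios with approximate 95% Wald confidence intervals.
z <- qnorm(0.975)
odds_ratios <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z * se),
  upper      = exp(est + z * se)
)
round(odds_ratios, 3)
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 5% level, which matches the stars in the summary for these five terms.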

Let’s Table and Compare Models:

htmlreg(list(model_angina1, model_angina2, model_angina3, model_angina4), caption="", digits=3)
                 Model 1     Model 2     Model 3     Model 4
(Intercept)      -1.988**    -1.775*     2.565       3.425
                 (0.768)     (0.795)     (4.868)     (2.131)
age              0.023       0.022                   -0.034
                 (0.014)     (0.014)                 (0.025)
restecg                      -0.247                  -1.535
                             (0.237)                 (1.808)
trestbps                                 -0.030      0.009
                                         (0.037)     (0.009)
chol                                     -0.017      0.004
                                         (0.019)     (0.003)
fbs                                      0.106       0.303
                                         (0.347)     (0.422)
trestbps:chol                            0.000
                                         (0.000)
cp1                                                  -2.168***
                                                     (0.570)
cp2                                                  -1.822***
                                                     (0.384)
cp3                                                  -1.627**
                                                     (0.610)
thalach                                              -0.030***
                                                     (0.007)
sex1                                                 0.733*
                                                     (0.348)
restecg:age                                          0.027
                                                     (0.032)
AIC              384.031     384.932     389.367     309.819
BIC              391.458     396.073     407.936     354.383
Log Likelihood   -190.015    -189.466    -189.684    -142.909
Deviance         380.031     378.932     379.367     285.819
Num. obs.        303         303         303         303
***p < 0.001, **p < 0.01, *p < 0.05

INTERPRETATION:
The lowest values of AIC (309.819) and BIC (354.383) show that model 4 is the best fit; it also includes the most variables.

Now I measure the fit of my models based on the log-likelihood function, using the anova command:
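
The AIC and BIC values in the table can be reproduced from the deviances. The sketch below assumes the usual definitions for a logistic model fitted to 0/1 outcomes, where the deviance equals -2 times the log-likelihood; k is the number of estimated coefficients in each model, counted from the summaries.

```r
# Deviances copied from the comparison table; k = number of coefficients.
deviance_vals <- c(m1 = 380.031, m2 = 378.932, m3 = 379.367, m4 = 285.819)
k <- c(m1 = 2, m2 = 3, m3 = 5, m4 = 12)
n <- 303  # number of observations

# AIC penalises each coefficient by 2, BIC by log(n).
aic <- deviance_vals + 2 * k
bic <- deviance_vals + k * log(n)
round(rbind(AIC = aic, BIC = bic), 3)

# Model with the smallest AIC.
names(which.min(aic))
```

BIC penalises the 12 coefficients of model 4 more heavily than AIC does, yet model 4 still has the lowest value on both criteria.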

anova(model_angina1, model_angina2, model_angina3, model_angina4, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: exang ~ age
## Model 2: exang ~ restecg + age
## Model 3: exang ~ trestbps * chol + fbs
## Model 4: exang ~ restecg * age + fbs + trestbps + cp + thalach + chol + 
##     sex
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)    
## 1       301     380.03                         
## 2       300     378.93  1    1.099   0.2945    
## 3       298     379.37  2   -0.436             
## 4       291     285.82  7   93.549   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

INTERPRETATION:
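The last row of the table shows a significant reduction in deviance for model 4 compared with model 3 (93.549 on 7 degrees of freedom, p < 2e-16). This means model 4, which includes an interaction term and all the other parameters, fits the data significantly better. Deviance is a measure of error, so lower deviance means a better fit to the data. In my analysis of deviance, model 4 has the lowest residual deviance (285.82) of the four models, and the result is statistically significant well below the 0.001 level. One caution: the chi-squared test in anova assumes that each model is nested in the next, and model 3 is nested in neither model 2 nor model 4 (it contains trestbps:chol, which model 4 lacks), so the row-by-row p-values should be read as indicative only.

Using the interactions package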
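
A likelihood ratio test is strictly valid only for nested models. As a rough check that respects this requirement, the sketch below compares model 2 with model 4, since every term of model 2 (restecg and age) also appears in model 4. The deviances and degrees of freedom are copied from the summaries above.

```r
# Residual deviances and degrees of freedom from the model summaries.
dev_m2 <- 378.93; df_m2 <- 300   # exang ~ restecg + age
dev_m4 <- 285.82; df_m4 <- 291   # model 4, which contains every term of model 2

# Likelihood ratio statistic and its chi-squared p-value.
lr_stat <- dev_m2 - dev_m4
lr_df   <- df_m2 - df_m4
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p_value = p_value)
```

With the fitted objects available, the same comparison would be anova(model_angina2, model_angina4, test = "Chisq").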

interact_plot(model_angina3, pred=trestbps, modx=chol, modx.values = c(126, 303, 564))

INTERPRETATION:
At a cholesterol level of 564 there is a positive relationship between angina and blood pressure: as blood pressure increases, the probability of angina increases. At a cholesterol level of 303 the relationship is also positive. But at a cholesterol level of 126 the relationship is negative: as blood pressure increases, the probability of angina decreases. It seems that cholesterol plays an important role in how blood pressure relates to angina, although the interaction term itself is not statistically significant.
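
The direction of each line in the plot follows from the model 3 coefficients: the slope of trestbps on the log odds scale is the trestbps coefficient plus the interaction coefficient times cholesterol. A minimal sketch of that calculation (the helper name slope_trestbps is introduced here for illustration):

```r
# Coefficients copied from summary(model_angina3).
b_trestbps <- -0.0299943
b_inter    <-  0.0001449

# Slope of trestbps on the log-odds scale at a given cholesterol level.
slope_trestbps <- function(chol) b_trestbps + b_inter * chol

# Slopes at the three cholesterol levels used in the plot.
chol_levels <- c(126, 303, 564)
slopes <- slope_trestbps(chol_levels)
setNames(round(slopes, 4), chol_levels)

# Cholesterol level at which the slope changes sign.
chol_crossover <- -b_trestbps / b_inter
chol_crossover
```

Below the crossover level the fitted relationship is negative and above it positive, which is why the lines for 303 and 564 rise while the line for 126 falls.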

interact_plot(model_angina4, pred=restecg, modx=age, modx.values = c(30, 50, 70))

INTERPRETATION:
The key analytical objective of the graph above is to see how the relationship between angina and restecg changes with age. It shows a positive relationship at age 70: as restecg moves from 0 (normal) to 1 (ST-T wave abnormality) and 2 (probable or definite left ventricular hypertrophy), the predicted probability of angina increases. At age 30 the slope is steep and negative: as restecg increases, angina decreases. At age 50 the relationship is also negative, although the model coefficients imply a smaller slope on the log odds scale than at age 30.

Visualizations:

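visreg(model3, "chol", by = "sex", scale = "response", line=list(col="skyblue"),
                             fill=list(col="navyblue"), xlab="Cholesterol")

INTERPRETATION:
Linear model 3 has blood pressure as its outcome, so the plot shows the fitted relationship between cholesterol and blood pressure separately for females and males. Consistent with the coefficients, the fitted line is slightly steeper for females than for males. Model 3 does not include angina, so this plot cannot by itself compare patients with and without angina.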

ggplot(HeartDisease) +
  aes(x = age) +
  aes(y = restecg) +
  facet_wrap(~ cp, scales = "free_y", nrow = 2) +
  geom_smooth(col = "blue")+
  ylab("ECG")+
  xlab( "Age") +
  labs(title = " ECG Patterns for Chest Pain Type Levels at Different Ages")+
  theme_bw(base_size = 13)

INTERPRETATION:
The graph above shows that the fitted ECG curve is higher for typical angina than for the other three categories of chest pain.
(chest pain type; 0=No pain, 1=typical angina, 2=atypical angina, 3=non-anginal pain)

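ggplot(data=HeartDisease, aes(x=exang, y=chol, fill=sex)) +
  geom_bar(stat="identity") +
  labs(title="Frequency Of Cholesterol", subtitle="By Angina and Gender", x="Angina", y="Total Cholesterol") +
  theme_ridges()

INTERPRETATION:
The chart stacks total cholesterol by angina status and gender, so the bar heights reflect group sizes as well as cholesterol levels. Read together with the table of means, it supports the earlier finding: although fewer females than males have angina, the females who do have angina have a higher mean cholesterol level than males.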

CONCLUSION:

These analyses answer my research questions. The mean analysis showed that mean cholesterol is lower when there is no angina. Looking at the gender effect, among patients with angina females have higher mean values than males on the variables contributing to angina. From the regression models I have seen that the effect of cholesterol on blood pressure is smaller for males than for females. Among the logit models, my best model is model 4 because its AIC and BIC values are the lowest. The interaction plot showed that at age 70, as restecg (resting electrocardiographic results) increases, angina increases.