About the Project

Overview

The UCI (University of California, Irvine) Heart Disease dataset consolidates four sources (Cleveland, Hungarian, Switzerland, and VA) and comprises 920 observations and 14 attributes, including the target variable. Each observation represents an individual patient, with demographic, clinical, and diagnostic data.

The dataset includes key variables such as age, sex, chest pain type, resting blood pressure, cholesterol levels, fasting blood sugar, resting ECG results, maximum heart rate achieved, exercise-induced angina, ST depression, and the slope of the ST segment. These variables are critical for understanding the factors contributing to heart disease and their interrelationships. The target variable indicates the presence or absence of heart disease, making the dataset valuable for both descriptive and predictive modeling.

This expanded dataset offers a richer foundation for analyzing the interactions between age, gender, cholesterol, blood sugar, and other health metrics. It provides an opportunity to uncover patterns that can enhance heart disease prevention, diagnosis, and management strategies across diverse patient populations.

Literature Review

Heart disease remains a leading cause of morbidity and mortality worldwide, with age and health factors playing critical roles in its onset and progression. Age is a non-modifiable risk factor strongly correlated with increased cardiovascular event susceptibility (D’Agostino et al., 2008; Lloyd-Jones et al., 2006). Modifiable health factors such as elevated cholesterol levels, hypertension, diabetes, obesity, sedentary lifestyles, and smoking are also well-established contributors to heart disease pathogenesis (Yusuf et al., 2004). Gender disparities in heart disease prevalence, clinical presentation, and outcomes have been consistently documented, underlining the need for gender-sensitive diagnostic and treatment frameworks (Hemingway & Marmot, 1999). Furthermore, physiological markers like maximum heart rate achieved during exercise have been shown to influence heart disease risk, with notable gender-specific differences (Gupta et al., 2020). Despite these insights, the complex interplay between age, gender, and other risk factors in predicting heart disease outcomes remains underexplored. Additionally, limited research has investigated how combinations of these factors (e.g., age and maximum heart rate or cholesterol and exercise-induced angina) interact to influence heart disease diagnosis and prognosis.

Research Gaps

  1. Age and Maximum Heart Rate Interaction: While maximum heart rate achieved during exercise is linked to heart disease risk, its interaction with age in predicting heart disease has not been extensively studied.
  2. Combinatory Effects of Risk Factors: The combined influence of multiple health factors, such as cholesterol levels and exercise-induced angina, on heart disease diagnosis is not well understood.
  3. Data-Driven Insights from Diverse Populations: Existing studies often focus on specific populations, potentially limiting the generalizability of findings. Further research leveraging diverse datasets, like the UCI Heart Disease dataset, can uncover more nuanced patterns.

Attributes:

  • Age: The age of the patient in years.
  • Sex: The gender of the patient (1 = male; 0 = female).
  • Chest Pain Type (CP): The type of chest pain experienced by the patient
    • Value 1: typical angina
    • Value 2: atypical angina
    • Value 3: non-anginal pain
    • Value 4: asymptomatic
  • Resting Blood Pressure (Trestbps): The resting blood pressure of the patient in mm Hg.
  • Cholesterol (Chol): The serum cholesterol level of the patient in mg/dl.
  • Fasting Blood Sugar (Fbs): The fasting blood sugar level of the patient (>120 mg/dl signifies high blood sugar). 1: True, 0: False
  • Maximum Heart Rate Achieved (Thalach): The maximum heart rate achieved by the patient during exercise.
  • Exercise-Induced Angina (Exang): Whether the patient experienced exercise-induced angina (1 = yes; 0 = no).
  • Num (Heart Disease): diagnosis of heart disease (angiographic disease status)
    • Value 0: < 50% diameter narrowing
    • Value 1: > 50% diameter narrowing

Research Questions & Hypotheses

Research Questions

  1. How do the interactions of age with maximum heart rate and of age with chest pain type influence the risk of developing heart disease?

  2. How do key health factors, including cholesterol levels, blood pressure, blood sugar, and exercise-induced angina, collectively impact the likelihood of heart disease, and how does this relationship vary among different age groups?

Hypotheses

  1. Age and Maximum Heart Rate:

    The interaction between age and maximum heart rate significantly predicts the risk of heart disease, with high maximum heart rates posing a higher risk in older individuals than in younger ones.

  2. Age and Chest Pain Type:

    Within the subsets of patients with atypical angina and with asymptomatic chest pain, the interaction between age and maximum heart rate significantly predicts the risk of heart disease, with a higher risk in older individuals than in younger ones.

  3. Combinatory Health Factors:

    The combined effect of elevated cholesterol levels, high blood pressure, elevated blood sugar, and exercise-induced angina significantly increases the likelihood of heart disease, and this effect is more pronounced in individuals aged 60 and above compared to younger age groups.

Descriptive Statistics

Variable                  Type      Stats / Values             Freqs (% of Valid)    Missing
age_group                 factor    1. Adults(18-39)            64 ( 9.7%)           0 (0.0%)
                                    2. Mid-aged(40-59)         445 (67.1%)
                                    3. Older(60+)              154 (23.2%)
sex                       factor    1. Female                  171 (25.8%)           0 (0.0%)
                                    2. Male                    492 (74.2%)
cholestrol                integer   Mean (sd): 246.5 (57.6)    207 distinct values   0 (0.0%)
fasting_blood_sugar       factor    1. <120mg/dL               563 (84.9%)           0 (0.0%)
                                    2. >120mg/dL               100 (15.1%)
resting_BP                integer   Mean (sd): 132.7 (17.8)    56 distinct values    0 (0.0%)
exercise_induced_angina   factor    1. No                      415 (62.6%)           0 (0.0%)
                                    2. Yes                     248 (37.4%)
heart_disease             factor    1. No                      349 (52.6%)           0 (0.0%)
                                    2. Yes                     314 (47.4%)

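The distribution of variables is as follows:

  • Age Group: The majority of individuals (67.1%) are mid-aged (40-59), followed by older adults (60+) at 23.2%. Young adults (18-39) form the smallest group at 9.7%.
  • Sex: The dataset is predominantly male (74.2%), with females accounting for 25.8%.
  • Cholesterol: The mean is 246.5 mg/dL with a standard deviation of 57.6 mg/dL across 207 distinct values.
  • Fasting Blood Sugar: 84.9% of individuals do not exceed 120 mg/dL, while 15.1% have elevated levels (>120 mg/dL).
  • Resting Blood Pressure: The mean is 132.7 mm Hg with a standard deviation of 17.8 mm Hg across 56 distinct values.
  • Exercise-Induced Angina: 62.6% do not experience exercise-induced angina, while 37.4% report it.
  • Heart Disease: The outcome is fairly balanced, with 52.6% free from heart disease and 47.4% diagnosed with it.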

Maximum Heart Rate and Chest Pain Type


Distribution of Maximum Heart Rate and Chest Pain Type


Maximum Heart Rate by Age Group


Interpretation of the Plot

  1. Decline in Maximum Heart Rate with Age:
    • The plot shows a clear trend where the maximum heart rate decreases across age groups. Adults (18-39) have higher maximum heart rates compared to older groups.
    • The median maximum heart rate for the older group (60+) is lower, highlighting physiological changes with aging.
  2. Variation within Age Groups:
    • Each age group shows variability in maximum heart rates, with the widest range seen in adults (18-39).
    • The jittered data points show individual variation within each age group.
  3. Clinical Implications:
    • Understanding the decline in maximum heart rate with age is essential for designing age-appropriate cardiovascular fitness assessments.
    • The results suggest the need for age-adjusted benchmarks when evaluating heart health and exercise capacity.

Chest Pain Type and Heart Disease


Interpretation of the Plot

  1. Chest Pain Type and Heart Disease Relationship:
    • The bars in the plot show a clear relationship between chest pain types and heart disease occurrence.
    • “Asymptomatic” chest pain has a high association with heart disease presence, as indicated by the taller blue bar.
  2. Variation Across Chest Pain Types:
    • Patients with “typical angina” are predominantly not diagnosed with heart disease.
    • “Atypical angina” and “non-anginal pain” show a mix of outcomes, with many not having heart disease.
  3. Implication on Diagnostics:
    • This data suggests that the type of chest pain can be a significant indicator in diagnosing heart disease, particularly for asymptomatic patients.

Logistic Regression Models, Marginal Effects:

[1] "Age Group and Maximum Heart Rate"
                                        effect error t.value p.value
(Intercept)                              1.740 1.198   1.452   0.148
age_groupMid-aged(40-59)                -0.922 0.288  -3.198   0.002
age_groupOlder(60+)                     -0.801 0.237  -3.376   0.001
max_heart_rate                          -0.012 0.007  -1.680   0.094
age_groupMid-aged(40-59):max_heart_rate  0.007 0.008   0.858   0.392
age_groupOlder(60+):max_heart_rate       0.020 0.011   1.868   0.063
[1] "Age Group and Maximum Heart Rate, Asymptomatic"
                                        effect error      t.value p.value
(Intercept)                              5.270 3.900        1.351   0.182
age_groupMid-aged(40-59)                -0.996 0.027      -36.770   0.000
age_groupOlder(60+)                     -1.000 0.000 -1253696.347   0.000
max_heart_rate                          -0.032 0.023       -1.346   0.183
age_groupMid-aged(40-59):max_heart_rate  0.022 0.025        0.883   0.381
age_groupOlder(60+):max_heart_rate       0.092 0.043        2.132   0.037
[1] "Age Group and Maximum Heart Rate, Atypical Angina"
                                        effect error   t.value p.value
(Intercept)                              1.738 1.451     1.198   0.235
age_groupMid-aged(40-59)                -1.000 0.000 -4080.018   0.000
age_groupOlder(60+)                     -0.118 0.187    -0.629   0.531
max_heart_rate                          -0.012 0.009    -1.316   0.192
age_groupMid-aged(40-59):max_heart_rate  0.013 0.009     1.536   0.129
age_groupOlder(60+):max_heart_rate       0.005 0.016     0.331   0.742

Data Subset:
A subset of the heart data was used, restricted to individuals with a maximum heart rate above 150. This cutoff was chosen because the expected maximum heart rate is commonly estimated as 220 minus age, and we are particularly interested in older individuals (e.g., at age 60 the expected maximum heart rate is approximately 160).
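
The cutoff reasoning can be sketched in R. This is a minimal illustration, not the project's actual preprocessing code: `expected_max_hr` is a hypothetical helper and the toy data frame is invented; only the column name `max_heart_rate` and the object name `max_heart_data` are taken from the model output in this report.

```r
# Age-predicted maximum heart rate (the common "220 minus age" rule of thumb).
# expected_max_hr is a hypothetical helper name used only for illustration.
expected_max_hr <- function(age) 220 - age

# Toy data (invented values); the real analysis uses heart_data_final.
toy <- data.frame(
  age            = c(35, 52, 61, 67),
  max_heart_rate = c(182, 149, 158, 151)
)

# Keep only rows above the 150 bpm cutoff, mirroring the subset described here.
max_heart_data <- subset(toy, max_heart_rate > 150)

expected_max_hr(60)   # 160
nrow(max_heart_data)  # 3
```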

Age Group and Maximum Heart Rate:
  • The mid-aged and older age groups have negative, statistically significant marginal effects (p = 0.002 and p = 0.001), indicating that within this subset the probability of heart disease is lower in these groups than in adults aged 18-39.
  • The interaction between older age and max heart rate is only marginally significant (p = 0.063), suggesting a slight increase in heart disease probability when both age and max heart rate are high.

Age Group and Maximum Heart Rate, Asymptomatic:
  • For individuals with asymptomatic chest pain, the interaction between older age and max heart rate is significant (p = 0.037): in the older group, a higher max heart rate is associated with a higher probability of heart disease.
  • The main effects of the mid-aged and older groups are negative and statistically significant (p < 0.001), although their near-zero standard errors (0.027 and 0.000) indicate that these estimates should be read with caution.

Age Group and Maximum Heart Rate, Atypical Angina:
  • None of the interaction terms is statistically significant.
  • The mid-aged group has a negative, statistically significant effect on heart disease probability (p < 0.001).


Hypothesis 1: Age Group and Max Heart Rate

Hypothesis 1: Older individuals over 60 with a maximum heart rate above 150 have a higher risk of heart disease

  • We focused on heart data where the maximum heart rate is greater than 150. This is because the expected maximum heart rate is typically calculated as 220 minus the person’s age. We are interested in comparing different age groups in all our hypotheses.

Interpretation:

  1. Age Group 40-59:
    • Individuals aged 40-59 show a lower risk of heart disease compared to the baseline group (under 40 years old), but this difference is not statistically significant (p-value = 0.4014).
  2. Age Group 60+:
    • Older individuals (60+) do not show a significantly different risk of heart disease compared to the baseline group (under 40 years old), though the result is close to significance (p-value = 0.0854).
  3. Maximum Heart Rate:
    • Maximum heart rate has a weak negative association with heart disease risk, but this effect is not statistically significant (p-value = 0.1007).
  4. Interaction Between Age Group and Maximum Heart Rate:
    • For individuals aged 40-59, the interaction with maximum heart rate shows a slight increase in heart disease risk, but this is not statistically significant (p-value = 0.3950).
    • For individuals aged 60+, the interaction with maximum heart rate increases the risk of heart disease, but the effect is significant only at the 10% level (p-value = 0.0647).

Conclusion: The marginally significant interaction between age group (60+) and maximum heart rate suggests that older individuals with higher maximum heart rates are at greater risk of heart disease. This provides evidence to reject the null hypothesis at the 10% level, though not at the conventional 5% level, and supports the idea that age and maximum heart rate together influence heart disease risk in older individuals.


Call:
glm(formula = heart_disease ~ age_group * max_heart_rate, family = binomial(), 
    data = max_heart_data, x = TRUE)

Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)  
(Intercept)                               9.73780    6.82268   1.427   0.1535  
age_groupMid-aged(40-59)                 -6.41545    7.64514  -0.839   0.4014  
age_groupOlder(60+)                     -17.31446   10.06516  -1.720   0.0854 .
max_heart_rate                           -0.06647    0.04049  -1.642   0.1007  
age_groupMid-aged(40-59):max_heart_rate   0.03875    0.04556   0.851   0.3950  
age_groupOlder(60+):max_heart_rate        0.11343    0.06141   1.847   0.0647 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 278.92  on 247  degrees of freedom
Residual deviance: 260.80  on 242  degrees of freedom
AIC: 272.8

Number of Fisher Scoring iterations: 5
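
To make the interaction easier to read, the sketch below works only from the coefficients printed in the summary above; it does not refit the model. It adds the main effect of `max_heart_rate` to the 60+ interaction term to obtain the slope for the older group, converts that slope to an odds ratio per 1 bpm, and recomputes the Wald z statistic and two-sided p-value for the interaction. The variable names are illustrative.

```r
# Coefficients and standard error copied from the glm summary above.
b_hr        <- -0.06647  # max_heart_rate (slope in the baseline group, 18-39)
b_hr_older  <-  0.11343  # age_groupOlder(60+):max_heart_rate
se_hr_older <-  0.06141  # standard error of the interaction term

# Slope of max heart rate on the log-odds scale for the 60+ group.
slope_older <- b_hr + b_hr_older

# Odds ratio for a 1 bpm increase in max heart rate in the 60+ group.
or_older <- exp(slope_older)

# Wald test for the interaction term.
z_older <- b_hr_older / se_hr_older
p_older <- 2 * pnorm(-abs(z_older))

round(c(slope = slope_older, odds_ratio = or_older, z = z_older, p = p_older), 4)
```

The slope for the 60+ group is positive (0.04696 on the log-odds scale) while the baseline slope is negative, which is the pattern the hypothesis describes. The recomputed p-value matches the 0.0647 reported in the table.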

Hypothesis 2.1: Asymptomatic Chest Pain Type and Heart Disease

Hypothesis 2.1: Older individuals over 60 with a maximum heart rate above 150 and asymptomatic chest pain have a higher risk of heart disease


Interpretation of the Plot:


Hypothesis 2.2: Atypical Angina Chest Pain Type and Heart Disease

Hypothesis 2.2: Older individuals over 60 with a maximum heart rate above 150 and atypical angina chest pain have a higher risk of heart disease


Call:
glm(formula = heart_disease ~ age_group * max_heart_rate, family = binomial(), 
    data = chestpain2_data, x = TRUE)

Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                              27.44878   33.28623   0.825    0.410
age_groupMid-aged(40-59)                -34.14913   34.11029  -1.001    0.317
age_groupOlder(60+)                     -12.59926   43.50442  -0.290    0.772
max_heart_rate                           -0.18183    0.20717  -0.878    0.380
age_groupMid-aged(40-59):max_heart_rate   0.20827    0.21182   0.983    0.326
age_groupOlder(60+):max_heart_rate        0.08217    0.27262   0.301    0.763

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 52.633  on 82  degrees of freedom
Residual deviance: 49.024  on 77  degrees of freedom
AIC: 61.024

Number of Fisher Scoring iterations: 7

Interpretation:

  1. Age Group 40-59:
    • The coefficient for individuals aged 40-59 is -34.14913, indicating a lower risk of heart disease compared to the baseline group (under 40 years old), but this difference is not statistically significant (p-value = 0.317).
  2. Age Group 60+:
    • For individuals aged 60 and above, the coefficient is -12.59926, suggesting no significant difference in heart disease risk compared to the baseline group, with a p-value of 0.772.
  3. Maximum Heart Rate:
    • The coefficient for maximum heart rate is -0.18183, indicating a weak negative association with heart disease risk, but this effect is not statistically significant (p-value = 0.380).
  4. Interaction Effects:
    • Age Group 40-59 and Max Heart Rate: The interaction term has a coefficient of 0.20827, suggesting a slight increase in heart disease risk with higher maximum heart rates for those aged 40-59, but this is not statistically significant (p-value = 0.326).
    • Age Group 60+ and Max Heart Rate: The interaction term has a coefficient of 0.08217, indicating a minimal and non-significant increase in risk for older individuals (p-value = 0.763).

The analysis does not provide statistically significant evidence to support Hypothesis 2.2 that older individuals over 60 with a maximum heart rate above 150 and atypical angina chest pain have a higher risk of heart disease. None of the coefficients, including interaction terms, are statistically significant, indicating that neither age group nor maximum heart rate significantly predicts heart disease risk in this subsetted dataset focused on atypical angina cases with high maximum heart rates. Further investigation with additional variables or data might be necessary to explore this hypothesis more effectively.
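
As a complementary check, the overall fit of the atypical angina model can be compared with the intercept-only model using the deviances reported in its summary. The sketch below is a likelihood-ratio (deviance) test computed by hand; `lr_test` is a hypothetical helper, not a function from the project.

```r
# Likelihood-ratio test of a fitted glm against the intercept-only model,
# computed from the deviances that summary() prints.
# lr_test is a hypothetical helper name used only for illustration.
lr_test <- function(null_dev, resid_dev, null_df, resid_df) {
  stat <- null_dev - resid_dev
  df   <- null_df - resid_df
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p_value = p)
}

# Values copied from the atypical angina model output.
h22 <- lr_test(null_dev = 52.633, resid_dev = 49.024, null_df = 82, resid_df = 77)
round(h22, 3)
```

The statistic is 3.609 on 5 degrees of freedom, and the resulting p-value is well above 0.05, which is consistent with the conclusion that this model does not improve on the null model.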


Hypothesis 3: Combinatory Health Factors

App to check different user inputs and predict probability of heart disease: https://bhavanipriya.shinyapps.io/DACSS_604/


Call:
glm(formula = heart_disease ~ sex + resting_BP + age_group + 
    cholestrol + fasting_blood_sugar + exercise_induced_angina, 
    family = binomial(), data = heart_data_final, x = TRUE)

Coefficients:
                              Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -5.166603   0.909498  -5.681 1.34e-08 ***
sexMale                       1.543237   0.246234   6.267 3.67e-10 ***
resting_BP                    0.010391   0.005551   1.872 0.061249 .  
age_groupMid-aged(40-59)      0.383764   0.345674   1.110 0.266918    
age_groupOlder(60+)           1.358127   0.394616   3.442 0.000578 ***
cholestrol                    0.004459   0.001676   2.661 0.007801 ** 
fasting_blood_sugar>120mg/dL  0.374795   0.273472   1.371 0.170528    
exercise_induced_anginaYes    2.142898   0.206369  10.384  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 917.26  on 662  degrees of freedom
Residual deviance: 664.29  on 655  degrees of freedom
AIC: 680.29

Number of Fisher Scoring iterations: 4

Interpretation

  1. Sex: The positive coefficient for sexMale (1.54) is statistically significant (p < 0.001). This suggests that males have a higher likelihood of heart disease compared to females.

  2. Resting Blood Pressure (resting_BP): The coefficient (0.0104) is positive and marginally significant (p = 0.061). While not very strong, it indicates that higher resting blood pressure is associated with a slight increase in heart disease probability.

  3. Age Group:

    • age_groupMid-aged(40-59) has a positive coefficient (0.38) but is not statistically significant (p = 0.27).
    • age_groupOlder(60+) has a positive coefficient (1.36) and is statistically significant (p < 0.001), indicating that older adults (60+) have a significantly higher likelihood of heart disease compared to younger adults (18-39).
  4. Cholesterol (cholestrol): The coefficient (0.0045) is positive and statistically significant (p = 0.0078). This indicates that elevated cholesterol levels are associated with an increased probability of heart disease.

  5. Fasting Blood Sugar (fasting_blood_sugar > 120mg/dL): The coefficient (0.375) is positive but not statistically significant (p = 0.17), suggesting it may not have a strong effect on predicting heart disease in this model.

  6. Exercise-Induced Angina (exercise_induced_anginaYes): The coefficient (2.14) is highly significant (p < 2e-16), indicating that individuals experiencing exercise-induced angina have a much higher likelihood of heart disease.

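Given the statistically significant effects of age_groupOlder(60+), cholestrol, and exercise_induced_anginaYes, we reject the null hypothesis that these health factors are unrelated to heart disease. Elevated cholesterol and exercise-induced angina significantly increase the likelihood of heart disease, resting blood pressure is only marginally significant (p = 0.061), and fasting blood sugar is not significant (p = 0.17). Individuals aged 60 and above have significantly higher odds of heart disease than adults aged 18-39; because the model contains no age interaction terms, it does not test whether the other factors have a stronger effect in this age group.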
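
The coefficients in this model are on the log-odds scale. The sketch below uses only the printed estimates and does not refit the model. It converts them to odds ratios and computes the predicted probability for one hypothetical patient profile; the profile values and the shortened coefficient names are invented for illustration. A prediction app would perform a calculation of this kind, although the linked app's implementation is not shown here.

```r
# Estimates copied from the glm summary (log-odds scale).
coefs <- c(
  intercept   = -5.166603,
  sex_male    =  1.543237,
  resting_bp  =  0.010391,
  age_mid     =  0.383764,
  age_older   =  1.358127,
  cholesterol =  0.004459,
  fbs_high    =  0.374795,
  exang_yes   =  2.142898
)

# Odds ratios for each predictor (intercept dropped).
odds_ratios <- exp(coefs[-1])
round(odds_ratios, 2)

# Hypothetical profile: male, 60+, resting BP 140 mm Hg, cholesterol 250 mg/dl,
# fasting blood sugar not above 120 mg/dl, exercise-induced angina present.
profile <- c(
  intercept = 1, sex_male = 1, resting_bp = 140, age_mid = 0,
  age_older = 1, cholesterol = 250, fbs_high = 0, exang_yes = 1
)

# Linear predictor and predicted probability via the inverse logit.
eta  <- sum(coefs * profile[names(coefs)])
prob <- plogis(eta)
round(c(eta = eta, probability = prob), 3)
```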

Key Findings

1. Age and Maximum Heart Rate Interaction

  • The interaction between age and maximum heart rate is marginally significant (p ≈ 0.06). Among patients with a maximum heart rate above 150 bpm, a higher maximum heart rate is associated with an increased probability of heart disease in individuals aged 60+.
  • Younger and mid-aged individuals (18-39, 40-59) do not show significant associations between their maximum heart rate and heart disease risk, indicating that this interaction is particularly relevant for older age groups.

2. Chest Pain Type as a Predictor

  • Asymptomatic chest pain is strongly associated with the presence of heart disease, making it a key marker during diagnosis.
  • In contrast, patients reporting typical angina are predominantly free from heart disease, while atypical angina and non-anginal pain show a mix of outcomes, with less predictive power.

3. Combinatory Health Factors Increase Risk for Older Populations

  • Elevated cholesterol and exercise-induced angina significantly increase the risk of heart disease; resting blood pressure is marginally significant (p = 0.061), and elevated fasting blood sugar is not significant (p = 0.17).
  • Older adults (60+) have significantly higher odds of heart disease than adults aged 18-39 after adjusting for these factors. This finding highlights the need for targeted intervention strategies in older populations.

4. Key Predictors of Heart Disease

  • Sex: Males have a significantly higher likelihood of heart disease compared to females.
  • Exercise-Induced Angina: Individuals who report experiencing angina during exercise are at a much higher risk, with this factor showing the strongest effect among predictors.
  • Cholesterol: Elevated cholesterol levels are a significant risk factor for heart disease, underscoring the importance of monitoring and managing cholesterol.

5. Decline in Maximum Heart Rate with Age

  • A clear trend shows that maximum heart rate decreases with age. This physiological change is important for evaluating cardiovascular fitness and designing age-appropriate fitness guidelines.

6. Demographic Disparities in the Dataset

  • The dataset is predominantly male (74.2%), with younger adults (18-39) forming only a small proportion (9.7%) of the sample. This skew in demographics highlights the need to account for potential biases and ensure broader representativeness in future studies.

Clinical Implications

These findings emphasize the critical need for age- and gender-sensitive diagnostic frameworks. Additionally, focused preventive strategies targeting modifiable risk factors, such as cholesterol, blood pressure, and exercise-induced angina, could substantially reduce heart disease risk, especially among older adults.

References:


---
title: "Exploring Vital Health Factors in Heart Disease Risk Assessment"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
# Load required libraries (each listed once)
library(flexdashboard)
library(shiny)
library(dplyr)
library(ggplot2)
library(summarytools)
library(pROC)
library(effects)
library(naniar)
library(psych)
library(reshape2)
library(stargazer)
library(gridExtra)
library(car)
library(broom)
library(tidyr)
library(erer)
library(plotly)
library(RColorBrewer)

# Load the cleaned dataset
heart_data_final <- readRDS("heart_data_final.rds")
```

### About the Project

#### **Overview**

The UCI (University of California, Irvine) Heart Disease dataset consolidates four sources (Cleveland, Hungarian, Switzerland, and VA) and comprises 920 observations and 14 attributes, including the target variable. Each observation represents an individual patient, with demographic, clinical, and diagnostic data.

The dataset includes key variables such as age, sex, chest pain type, resting blood pressure, cholesterol levels, fasting blood sugar, resting ECG results, maximum heart rate achieved, exercise-induced angina, ST depression, and the slope of the ST segment. These variables are critical for understanding the factors contributing to heart disease and their interrelationships. The target variable indicates the presence or absence of heart disease, making the dataset valuable for both descriptive and predictive modeling.

This expanded dataset offers a richer foundation for analyzing the interactions between age, gender, cholesterol, blood sugar, and other health metrics. It provides an opportunity to uncover patterns that can enhance heart disease prevention, diagnosis, and management strategies across diverse patient populations.

#### **Literature Review**

Heart disease remains a leading cause of morbidity and mortality worldwide, with age and health factors playing critical roles in its onset and progression. Age is a non-modifiable risk factor strongly correlated with increased cardiovascular event susceptibility (D'Agostino et al., 2008; Lloyd-Jones et al., 2006). Modifiable health factors such as elevated cholesterol levels, hypertension, diabetes, obesity, sedentary lifestyles, and smoking are also well-established contributors to heart disease pathogenesis (Yusuf et al., 2004). Gender disparities in heart disease prevalence, clinical presentation, and outcomes have been consistently documented, underlining the need for gender-sensitive diagnostic and treatment frameworks (Hemingway & Marmot, 1999). Furthermore, physiological markers like maximum heart rate achieved during exercise have been shown to influence heart disease risk, with notable gender-specific differences (Gupta et al., 2020). Despite these insights, the complex interplay between age, gender, and other risk factors in predicting heart disease outcomes remains underexplored. Additionally, limited research has investigated how combinations of these factors (e.g., age and maximum heart rate or cholesterol and exercise-induced angina) interact to influence heart disease diagnosis and prognosis.

#### **Research Gaps**

1. **Age and Maximum Heart Rate Interaction**: While maximum heart rate achieved during exercise is linked to heart disease risk, its interaction with age in predicting heart disease has not been extensively studied.
2. **Combinatory Effects of Risk Factors**: The combined influence of multiple health factors, such as cholesterol levels and exercise-induced angina, on heart disease diagnosis is not well understood.
3. **Data-Driven Insights from Diverse Populations**: Existing studies often focus on specific populations, potentially limiting the generalizability of findings. Further research leveraging diverse datasets, like the UCI Heart Disease dataset, can uncover more nuanced patterns.

**Attributes**:

-   Age: The age of the patient in years.
-   Sex: The gender of the patient (1 = male; 0 = female).
-   Chest Pain Type (CP): The type of chest pain experienced by the patient\
    - Value 1: typical angina 
    - Value 2: atypical angina 
    - Value 3: non-anginal pain 
    - Value 4: asymptomatic
-   Resting Blood Pressure (Trestbps): The resting blood pressure of the patient in mm Hg.
-   Cholesterol (Chol): The serum cholesterol level of the patient in mg/dl.
-   Fasting Blood Sugar (Fbs): The fasting blood sugar level of the patient (\>120 mg/dl signifies high blood sugar). 1: True, 0: False
-   Maximum Heart Rate Achieved (Thalach): The maximum heart rate achieved by the patient during exercise.
-   Exercise-Induced Angina (Exang): Whether the patient experienced exercise-induced angina (1 = yes; 0 = no).
-   Num (Heart Disease): diagnosis of heart disease (angiographic disease status) 
    - Value 0: \< 50% diameter narrowing 
    - Value 1: > 50% diameter narrowing

---

### **Research Questions & Hypotheses**

**Research Questions**

1. How do the interactions of age with maximum heart rate and of age with chest pain type influence the risk of developing heart disease?  

2. How do key health factors, including cholesterol levels, blood pressure, blood sugar, and exercise-induced angina, collectively impact the likelihood of heart disease, and how does this relationship vary among different age groups?  

**Hypotheses**

1. **Age and Maximum Heart Rate**:  
   
   The interaction between age and maximum heart rate significantly predicts the risk of heart disease, with high maximum heart rates posing a higher risk in older individuals than in younger ones.

2. **Age and Chest Pain Type**:

   Within the subsets of patients with atypical angina and with asymptomatic chest pain, the interaction between age and maximum heart rate significantly predicts the risk of heart disease, with a higher risk in older individuals than in younger ones.

3. **Combinatory Health Factors**:  
   
   The combined effect of elevated cholesterol levels, high blood pressure, elevated blood sugar, and exercise-induced angina significantly increases the likelihood of heart disease, and this effect is more pronounced in individuals aged 60 and above compared to younger age groups.

### **Descriptive Statistics**

```{r, results='asis', echo=FALSE}
library(knitr)
library(kableExtra)
# Select specific variables
variables <- c("age_group", "sex", "cholestrol", "fasting_blood_sugar", "resting_BP", "exercise_induced_angina", "heart_disease")

# Summary of these variables
summary_df <- dfSummary(heart_data_final[, variables], 
                        plain.ascii = FALSE,
                        style = "multiline", # Use a multiline style for clarity
                        graph.magnif = 0.75, # Adjust magnification
                        varnumbers = FALSE, # Do not display variable numbers
                        valid.col = FALSE,
                        tmp.img.dir = './tmp',
                        caption = "Summary of Key Variables", # Optional caption
                        theme = "default", # Apply a default theme for better appearance
                        show_valid_col = FALSE) # Do not show the valid column

# Convert to data frame (if not already)
summary_df <- as.data.frame(summary_df)

# Generate HTML table with kable
html_table <- knitr::kable(summary_df, format = "html", escape = FALSE) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

# Print the table
print(html_table)
```

***

The distribution of variables is as follows:  

- **Age Group:** The majority of individuals (67.1%) are mid-aged (40–59), followed by older adults (60+) at 23.2%. Young adults (18–39) constitute the smallest group, making up 9.7% of the dataset.  
- **Sex:** The dataset is predominantly male (74.2%), with females accounting for only 25.8%.  
- **Cholesterol:** Cholesterol levels vary widely, with a mean of 246.5 mg/dL, a standard deviation of 57.6 mg/dL, and 207 distinct values.  
- **Fasting Blood Sugar:** Most individuals (84.9%) have fasting blood sugar that does not exceed 120 mg/dL, while 15.1% have elevated levels (>120 mg/dL).  
- **Resting Blood Pressure:** Resting blood pressure shows notable variation, with a mean of 132.7 mmHg, a standard deviation of 17.8 mmHg, and 56 distinct values.  
- **Exercise-Induced Angina:** A majority (62.6%) do not experience exercise-induced angina, while 37.4% report it.  
- **Heart Disease:** The dataset shows a fairly balanced distribution, with 52.6% of individuals free from heart disease and 47.4% diagnosed with it.  
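
The percentages listed here can be reproduced from the counts in the summary table. The short sketch below does this for the age groups using base R; the counts are copied from the table and the vector name is illustrative. The counts sum to 663, the number of rows summarized in the table.

```r
# Counts per age group, copied from the summary table.
age_counts <- c("Adults(18-39)" = 64, "Mid-aged(40-59)" = 445, "Older(60+)" = 154)

# Total number of observations summarized.
n_total <- sum(age_counts)

# Percentage of valid observations in each group, rounded to one decimal.
age_pct <- round(100 * prop.table(age_counts), 1)

n_total
age_pct
```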



### **Maximum Heart Rate and Chest Pain Type**

```{r figures-side, fig.show="hold", out.width="50%"}
library(dygraphs)
#distribution of max_heart_rate
ggplot(heart_data_final, aes(x=max_heart_rate)) + 
  geom_histogram(binwidth=5, fill="skyblue", color="black") +
  labs(title="Maximum Heart Rate Distribution")


ggplot(heart_data_final, aes(x = chest_paintype)) +
  geom_bar(position = "dodge", fill = "skyblue", color = "black") +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    vjust = -0.5,  # Adjusts the vertical position of the text
    color = "black",
    size = 3.5     # Adjusts the size of the text
  ) +
  scale_x_discrete(labels = c(
    "1" = "Typical Angina",
    "2" = "Atypical Angina",
    "3" = "Non-Anginal Pain",
    "4" = "Asymptomatic"
  )) +
  labs(
    title = "Chest Pain Type",
    x = "Chest Pain Type",
    y = "Count"
  ) +
  theme_minimal()

```

***

Distribution of Maximum Heart Rate and Chest Pain Type

- **Maximum Heart Rate Distribution:** The histogram looks roughly bell-shaped with no obvious outliers. This is a visual impression rather than the result of a formal normality test.
- **Chest Pain Type:** Asymptomatic is the most common category (333 cases), followed by atypical angina (149).
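
The visual reading of the histogram can be backed up with a quick numerical check. The sketch below is illustrative only: it uses simulated values (the vector `hr` is a stand-in, not the project's `max_heart_rate` column) and applies a Shapiro-Wilk test plus Tukey's 1.5 × IQR rule for outliers.

```r
# Simulated stand-in for max_heart_rate (assumption: NOT the project data)
set.seed(42)
hr <- round(rnorm(300, mean = 140, sd = 25))

# Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(hr)

# Tukey's rule: flag values more than 1.5 * IQR beyond the quartiles
q <- quantile(hr, c(0.25, 0.75))
fence <- 1.5 * IQR(hr)
outliers <- hr[hr < q[1] - fence | hr > q[2] + fence]

sw$p.value        # p-value of the normality test
length(outliers)  # number of flagged values
```

Replacing `hr` with the real column would show whether the "approximately normal, no outliers" description holds up beyond the plot.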

--- 

### **Maximum Heart Rate by Age Group**

```{r}

# Maximum Heart Rate by Age Group
p1 <- ggplot(heart_data_final, aes(x = age_group, y = max_heart_rate, fill=as.factor(age_group))) +
  geom_boxplot(fill = "lightblue", color = "darkblue", outlier.color = "red", outlier.size = 2) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "darkgray") +
  labs(
    title = "Maximum Heart Rate by Age Group",
    x = "Age Group",
    y = "Maximum Heart Rate",
    fill = "Age Group"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p1)

```

***

#### Interpretation of the Plot

1. **Decline in Maximum Heart Rate with Age**:
   - The plot shows a clear trend where the maximum heart rate decreases across age groups. Adults (18-39) have higher maximum heart rates compared to older groups.
   - The median maximum heart rate for the older group (60+) is lower, highlighting physiological changes with aging.

2. **Variation within Age Groups**:
   - Each age group shows variability in maximum heart rates, with the widest range seen in adults (18-39).
   - The data points (jittered) indicate individual variances within each age group.

3. **Clinical Implications**:
   - Understanding the decline in maximum heart rate with age is essential for designing age-appropriate cardiovascular fitness assessments.
   - The results suggest the need for age-adjusted benchmarks when evaluating heart health and exercise capacity.
  
---

### **Chest Pain Type and Heart Disease**

```{r}
# Grouped Bar Plot for Chest Pain Type and Heart Disease
p2 <- ggplot(heart_data_final, aes(x = chest_paintype, fill = as.factor(heart_disease))) +
  geom_bar(position = "dodge", color = "black") +
  scale_fill_brewer(palette = "Pastel1",labels = c("No Heart Disease", "Heart Disease")) +
  scale_x_discrete(labels = c(
    "1" = "Typical Angina",
    "2" = "Atypical Angina",
    "3" = "Non-Anginal Pain",
    "4" = "Asymptomatic"
  )) +
  labs(
    title = "Chest Pain Type and Heart Disease",
    x = "Chest Pain Type",
    y = "Count",
    fill = "Heart Disease"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Convert to interactive plot
ggplotly(p2)

```

***

#### Interpretation of the Plot

1. **Chest Pain Type and Heart Disease Relationship**:
   - The bars in the plot show a clear relationship between chest pain types and heart disease occurrence.
   - "Asymptomatic" chest pain has a high association with heart disease presence, as indicated by the taller blue bar.

2. **Variation Across Chest Pain Types**:
   - Patients with "typical angina" are predominantly not diagnosed with heart disease.
   - "Atypical angina" and "non-anginal pain" show a mix of outcomes, with many not having heart disease.

3. **Implication on Diagnostics**:
   - This data suggests that the type of chest pain can be a significant indicator in diagnosing heart disease, particularly for asymptomatic patients.
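
Because the chest pain categories differ in size, comparing bar heights alone can mislead. A minimal sketch of the within-category share of heart disease is shown below; the counts are made up for illustration and are not taken from `heart_data_final`.

```r
# Made-up counts for illustration only (assumption: NOT the project data)
toy <- data.frame(
  chest_paintype = rep(c("asymptomatic", "typical angina"), times = c(10, 10)),
  heart_disease  = c(rep(1, 8), rep(0, 2), rep(1, 3), rep(0, 7))
)

counts <- table(toy$chest_paintype, toy$heart_disease)

# margin = 1 rescales each row (chest pain type) so that it sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Reading across a row gives the proportion with and without heart disease for that chest pain type, which is the quantity the interpretation above is really about.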
   
---
   
### **Logistic Regression Models, Marginal Effects:**

```{r models-side, out.width="50%", fig.show = "hold"}
# Keep only observations with a maximum heart rate above 150.
# Rationale: the rule-of-thumb maximum heart rate at age 60 is 220 - 60 = 160,
# so the subset is restricted to values above 150.
max_heart_data <- heart_data_final %>% filter(max_heart_rate > 150)

# Logistic regression: heart disease on age group, max heart rate, and their interaction
model_max_heartrate <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = max_heart_data, 
              family = binomial(), 
              x = TRUE)

# Filter the data to include only "asymptomatic" chest pain type and max heart rate is more than 150
chestpain4_data <- heart_data_final %>% filter(chest_paintype == "asymptomatic" & max_heart_rate > 150)

# Logistic regression model for chest_paintype "asymptomatic" and age_group
model_chestpain4 <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = chestpain4_data, 
              family = binomial(), 
              x = TRUE)

#chest pain type: atypical angina
chestpain2_data <- heart_data_final %>% filter(chest_paintype == "atypical angina" &
                                              max_heart_rate>150)

# Logistic regression model for chest_paintype "atypical angina" and age_group
model_chestpain2 <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = chestpain2_data, 
              family = binomial(), 
              x = TRUE)


#print(summary(model_max_heartrate))


#marginal Effects

print("Age Group and Maximum Heart Rate")
maBina(model_max_heartrate)

print("Age Group and Maximum Heart Rate, Asymptomatic")
maBina(model_chestpain4)

print("Age Group and Maximum Heart Rate, Atypical Angina")
maBina(model_chestpain2)





```

***

**Data Subset**:  
The models use a subset of the heart data restricted to individuals with a maximum heart rate above 150. The cutoff is motivated by the common rule of thumb that expected maximum heart rate is about 220 - age: at age 60 this gives roughly 160, so a threshold of 150 sits close to the expected maximum for the older individuals of primary interest.
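
The arithmetic behind the cutoff can be written out directly. The helper name `predicted_max_hr` is hypothetical and simply encodes the 220 - age rule of thumb quoted above.

```r
# Rule-of-thumb estimate quoted in the text: 220 minus age
# (predicted_max_hr is a hypothetical helper name used only for this sketch)
predicted_max_hr <- function(age) 220 - age

ages <- c(40, 60, 70)
data.frame(age = ages, predicted_max_hr = predicted_max_hr(ages))
```

Under this rule the estimate falls to 150 at age 70, so a recorded maximum above 150 is at or beyond the expected value only for the oldest patients.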

**Age Group and Maximum Heart Rate:**

- The mid-aged and older age-group terms are negative and statistically significant. Because the model includes an age group × maximum heart rate interaction, these main-effect terms cannot be interpreted in isolation from the interaction terms, so they should not be read on their own as evidence that heart disease becomes less likely with age.
- The interaction between the older age group and max heart rate is marginally significant (p = 0.063; significant at the 10% level but not at 5%). Its positive sign suggests that heart disease probability rises more steeply with max heart rate in the older group than in the reference group.

**Age Group and Maximum Heart Rate, Asymptomatic:**

- For individuals with asymptomatic chest pain, the interaction between the older age group and max heart rate is significant (p = 0.037), suggesting that a higher max heart rate is associated with a larger increase in heart disease probability in the older group than in the reference group.
- The main-effect terms for the older and mid-aged groups are negative and statistically significant (p < 0.01).

**Age Group and Maximum Heart Rate, Atypical Angina:**

- None of the interaction terms are statistically significant.
- The main-effect term for the mid-aged group is negative and statistically significant (p < 0.01).
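
For readers unfamiliar with marginal effects, the sketch below shows the basic idea for a single continuous predictor in a logistic model: the marginal effect at the mean is the logistic density at the mean linear predictor multiplied by the coefficient. It uses simulated data and base R only; it is not a reimplementation of `maBina`, whose handling of dummy and interaction columns may differ.

```r
# Simulated data (assumption: NOT the project data)
set.seed(1)
n <- 500
x <- rnorm(n, mean = 160, sd = 10)
y <- rbinom(n, 1, plogis(-8 + 0.05 * x))

fit <- glm(y ~ x, family = binomial())
b <- coef(fit)
xbar <- mean(x)

# Marginal effect at the mean: dP/dx = f(xbar' beta) * beta_x,
# where f is the logistic density (dlogis)
me_at_mean <- dlogis(b[["(Intercept)"]] + b[["x"]] * xbar) * b[["x"]]
me_at_mean
```

The result is the approximate change in predicted probability for a one-unit increase in `x`, evaluated at the average value of `x`.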

---

### **Hypothesis 1: Age Group and Max Heart Rate**

#### **Hypothesis 1: Older individuals over 60 with a maximum heart rate above 150 have a higher risk of heart disease**

- We focus on observations where the maximum heart rate is greater than 150, since the expected maximum heart rate is commonly estimated as 220 minus the person's age. Each hypothesis compares outcomes across age groups.

#### Interpretation:

1. **Age Group 40-59**:
   - The coefficient for individuals aged 40-59 is negative relative to the baseline group (under 40 years old), but the difference is not statistically significant (p-value = `0.4014`).

2. **Age Group 60+**:
   - Older individuals (60+) do not differ significantly from the baseline group (under 40 years old) at the 5% level, though the result is marginal (p-value = `0.0854`).

3. **Maximum Heart Rate**:
   - Maximum heart rate has a weak negative association with heart disease risk, but this effect is not statistically significant (p-value = `0.1007`).

4. **Interaction Between Age Group and Maximum Heart Rate**:
   - For individuals aged 40-59, the interaction with maximum heart rate is positive but not statistically significant (p-value = `0.3950`).
   - For individuals aged 60+, the interaction with maximum heart rate is positive and marginally significant (p-value = `0.0647`): significant at the 10% level but not at the conventional 5% level.

Because the model includes interaction terms, the age-group coefficients in items 1 and 2 describe differences at a maximum heart rate of 0, so the interaction terms are the more informative quantities here.

**Conclusion**:

   - The positive, marginally significant interaction between the 60+ age group and maximum heart rate suggests that, among older individuals, higher maximum heart rates are associated with greater heart disease risk. At the 10% significance level this provides evidence to **reject the null hypothesis**; at the 5% level the evidence falls just short, so the finding should be treated as suggestive.
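
The conclusion above rests on the p-value of a single interaction coefficient. A complementary check, sketched here on simulated data and not part of the original analysis, is a likelihood-ratio test that compares the model with and without the full age group × maximum heart rate interaction. All object names in the sketch are illustrative.

```r
# Simulated data (assumption: NOT the project data)
set.seed(7)
n <- 400
age_group <- factor(sample(c("18-39", "40-59", "60+"), n, replace = TRUE))
max_hr <- runif(n, 150, 200)

# Outcome simulated with a positive heart-rate slope only in the 60+ group
eta <- -0.5 + 0.04 * (max_hr - 175) * (age_group == "60+")
y <- rbinom(n, 1, plogis(eta))

additive         <- glm(y ~ age_group + max_hr, family = binomial())
with_interaction <- glm(y ~ age_group * max_hr, family = binomial())

# Likelihood-ratio test: does adding the interaction improve the fit?
lrt <- anova(additive, with_interaction, test = "Chisq")
p_value <- lrt[["Pr(>Chi)"]][2]

# Compare against the two thresholds discussed in the text
p_value
p_value < 0.10
p_value < 0.05
```

A joint test of this kind evaluates both interaction terms at once, which avoids relying on one coefficient that sits between the 5% and 10% thresholds.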

```{r max-heart-rate}
# Interaction effects (effects package)
heartrate_effects <- allEffects(model_max_heartrate) 

# Prediction grid: 100 maximum heart rate values for each age group
heartrate_data <- expand.grid(
  max_heart_rate = seq(min(max_heart_data$max_heart_rate),
                       max(max_heart_data$max_heart_rate),
                       length.out = 100),
  age_group = unique(max_heart_data$age_group)
)

# Term-wise predictions on the link (log-odds) scale, with standard errors
predictions1 <- predict(model_max_heartrate, newdata = heartrate_data, type = "terms", se.fit = TRUE)
interaction_effect1 <- predictions1$fit[, "age_group:max_heart_rate"]  # Interaction term
interaction_se1 <- predictions1$se.fit[, "age_group:max_heart_rate"]

heartrate_data <- heartrate_data |> 
  mutate(
    interaction_coefficient = interaction_effect1,  # Interaction term coefficient
    se = interaction_se1,                           # Standard error
    lower_90 = interaction_coefficient - qnorm(0.95) * se,
    upper_90 = interaction_coefficient + qnorm(0.95) * se,
    lower_95 = interaction_coefficient - qnorm(0.975) * se,
    upper_95 = interaction_coefficient + qnorm(0.975) * se,
    significant = ifelse(lower_95 > 0 | upper_95 < 0, "Significant", "Not Significant")
  )

#dataframe of interaction effects
#heartrate_df <- as.data.frame(heartrate_effects$`age_group:max_heart_rate`)

# Plot the interaction term contribution (log-odds scale) with 95% confidence bands
p3 <- ggplot(heartrate_data, aes(x = max_heart_rate, y = interaction_coefficient)) +
  geom_point(alpha = 0.6) +  # Semi-transparent points at each grid value
  geom_line(aes(color = significant), linewidth = 1) +  # Estimated interaction contribution
  geom_ribbon(aes(ymin = lower_95, ymax = upper_95, fill = significant), alpha = 0.2) + # 95% confidence band
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Maximum Heart Rate and Age Group on Heart Disease",
       x = "Maximum Heart Rate",
       y = "Interaction Term Contribution (log-odds)",
       color = "Significance",
       fill = "Significance") +  # Legend titles for line color and ribbon fill
  theme_minimal() +
  theme(legend.position = "top") +
  facet_wrap(~ age_group, scales = "free_y")

#ggplotly(p3)

summary(model_max_heartrate)
```

---

### **Hypothesis 2.1: Asymptomatic Chest Pain Type and Heart Disease**

**Hypothesis 2.1: Older individuals over 60 with a maximum heart rate above 150 and asymptomatic chest pain have a higher risk of heart disease**

- The dataset used for this hypothesis is restricted to observations with the asymptomatic chest pain type and a maximum heart rate above 150. 

```{r fig.show="hold", out.width="50%"}
# Filter the data to include only "asymptomatic" chest pain type and max heart rate is more than 150
chestpain4_data <- heart_data_final %>% filter(chest_paintype == "asymptomatic" & max_heart_rate > 150)

# Logistic regression model for chest_paintype "asymptomatic" and age_group
model_chestpain4 <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = chestpain4_data, 
              family = binomial(), 
              x = TRUE)

chest4_data <- expand.grid(
  max_heart_rate = seq(min(chestpain4_data$max_heart_rate),
                       max(chestpain4_data$max_heart_rate),
                       length.out = 100),
  chest_paintype = unique(chestpain4_data$chest_paintype),
  age_group = unique(chestpain4_data$age_group)
)

# Term-wise predictions on the link (log-odds) scale, with standard errors
predictions2 <- predict(model_chestpain4, newdata = chest4_data, type = "terms", se.fit = TRUE)
interaction_effect2 <- predictions2$fit[, "age_group:max_heart_rate"]  # Interaction term
interaction_se2 <- predictions2$se.fit[, "age_group:max_heart_rate"]

chest4_data <- chest4_data |> 
  mutate(
    interaction_coefficient = interaction_effect2,  # Interaction term coefficient
    se = interaction_se2,                           # Standard error
    lower_90 = interaction_coefficient - qnorm(0.95) * se,
    upper_90 = interaction_coefficient + qnorm(0.95) * se,
    lower_95 = interaction_coefficient - qnorm(0.975) * se,
    upper_95 = interaction_coefficient + qnorm(0.975) * se,
    significant = ifelse(lower_95 > 0 | upper_95 < 0, "Significant", "Not Significant")
  )

#interaction effects
#chestpain4_effects <- allEffects(model_chestpain4) 

#dataframe for interaction effects
#chestpain4_df <- as.data.frame(chestpain4_effects$`age_group:max_heart_rate`)

# Plot the interaction term contribution (log-odds scale) with 95% confidence bands
p4 <- ggplot(chest4_data, aes(x = max_heart_rate, y = interaction_coefficient)) +
  geom_line(aes(color = significant), linewidth = 1) + # Estimated interaction contribution
  geom_ribbon(aes(ymin = lower_95, ymax = upper_95, fill = significant), alpha = 0.2) +
  # 95% confidence band
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Heart Disease with Asymptomatic Chest Pain",
       x = "Maximum Heart Rate",
       y = "Interaction Term Contribution (log-odds)",
       color = "Significance",
       fill = "Significance") +  # Legend titles for line color and ribbon fill
  theme_minimal() +
  theme(legend.position = "top") +
  facet_wrap(~ age_group, scales = "free_y") # One panel per age group

ggplotly(p4)

```

***

**Interpretation of the Plot:**

- **Age Group Impact**: The plot shows separate panels for the age groups (Adults 18-39, Mid-aged 40-59, Older 60+). The plotted values come from the model's age group × maximum heart rate interaction term, so they are contributions on the log-odds scale rather than probabilities. For asymptomatic chest pain this contribution rises clearly in the Older group, while it stays flat or rises only mildly for the Adults and Mid-aged groups.

- **Maximum Heart Rate Influence**: Higher maximum heart rates (above 170 bpm) are associated with a larger contribution to heart disease risk, particularly in the Older group, where the relationship becomes significant (blue shading).

- **Significance Designation**: The shaded ribbon is the 95% confidence interval, colored by whether it excludes zero. The Older group is significant (blue) at higher maximum heart rates, whereas the other groups remain non-significant (red) across the range of heart rates.
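
To obtain actual predicted probabilities with confidence bands, one option is to predict on the link scale and back-transform. The sketch below uses simulated data and illustrative object names (`d`, `fit`, `grid`); it is an alternative presentation, not the method used for the plot above.

```r
# Simulated data (assumption: NOT the project data)
set.seed(11)
n <- 400
d <- data.frame(
  age_group = factor(sample(c("18-39", "40-59", "60+"), n, replace = TRUE)),
  max_heart_rate = runif(n, 150, 200)
)
d$heart_disease <- rbinom(n, 1, plogis(-0.3 + 0.03 * (d$max_heart_rate - 175)))

fit <- glm(heart_disease ~ age_group * max_heart_rate, data = d, family = binomial())

# Grid of heart rate values for every age group
grid <- expand.grid(
  max_heart_rate = seq(150, 200, length.out = 50),
  age_group = levels(d$age_group)
)

# Predict log-odds with standard errors, then map to probabilities with plogis()
pr <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
grid$prob  <- plogis(pr$fit)
grid$lower <- plogis(pr$fit - qnorm(0.975) * pr$se.fit)
grid$upper <- plogis(pr$fit + qnorm(0.975) * pr$se.fit)

head(grid)
```

Building the interval on the log-odds scale and transforming afterwards keeps the bounds inside 0 and 1, which a symmetric interval computed directly on probabilities does not guarantee.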

---

### **Hypothesis 2.2: Atypical Angina Chest Pain Type and Heart Disease**

**Hypothesis 2.2: Older individuals over 60 with a maximum heart rate above 150 and atypical angina chest pain have a higher risk of heart disease**


```{r}
#interaction effects
chestpain2_effects <- allEffects(model_chestpain2) 

#dataframe for interaction effects
chestpain2_df <- as.data.frame(chestpain2_effects$`age_group:max_heart_rate`)

chest2_data <- expand.grid(
  max_heart_rate = seq(min(chestpain2_data$max_heart_rate),
                       max(chestpain2_data$max_heart_rate),
                       length.out = 100),
  chest_paintype = unique(chestpain2_data$chest_paintype),
  age_group = unique(chestpain2_data$age_group)
)

# Term-wise predictions on the link (log-odds) scale, with standard errors
predictions3 <- predict(model_chestpain2, newdata = chest2_data, type = "terms", se.fit = TRUE)
interaction_effect3 <- predictions3$fit[, "age_group:max_heart_rate"]  # Interaction term
interaction_se3 <- predictions3$se.fit[, "age_group:max_heart_rate"]

chest2_data <- chest2_data |> 
  mutate(
    interaction_coefficient = interaction_effect3,  # Interaction term coefficient
    se = interaction_se3,                           # Standard error
    lower_90 = interaction_coefficient - qnorm(0.95) * se,
    upper_90 = interaction_coefficient + qnorm(0.95) * se,
    lower_95 = interaction_coefficient - qnorm(0.975) * se,
    upper_95 = interaction_coefficient + qnorm(0.975) * se,
    significant = ifelse(lower_95 > 0 | upper_95 < 0, "Significant", "Not Significant")
  )

#interaction effects
#chestpain4_effects <- allEffects(model_chestpain4) 

#dataframe for interaction effects
#chestpain4_df <- as.data.frame(chestpain4_effects$`age_group:max_heart_rate`)

# Plot the interaction term contribution (log-odds scale) with 95% confidence bands
p5 <- ggplot(chest2_data, aes(x = max_heart_rate, y = interaction_coefficient)) +
  geom_line(aes(color = significant), linewidth = 1) + # Estimated interaction contribution
  geom_ribbon(aes(ymin = lower_95, ymax = upper_95, fill = significant), alpha = 0.2) +
  # 95% confidence band
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  labs(title = "Heart Disease with Atypical Angina Chest Pain",
       x = "Maximum Heart Rate",
       y = "Interaction Term Contribution (log-odds)",
       color = "Significance",
       fill = "Significance") +  # Legend titles for line color and ribbon fill
  theme_minimal() +
  theme(legend.position = "top") +
  facet_wrap(~ age_group, scales = "free_y") # One panel per age group

#ggplotly(p5)

summary(model_chestpain2)

```

**Interpretation:**

1. **Age Group 40-59**:
   - The coefficient for individuals aged 40-59 is `-34.14913`. Because the model includes an interaction with maximum heart rate, this is the estimated log-odds difference from the baseline group (under 40 years old) at a maximum heart rate of 0, far outside the observed range. It is not statistically significant (p-value = `0.317`).

2. **Age Group 60+**:
   - For individuals aged 60 and above, the coefficient is `-12.59926`, subject to the same caveat. It indicates no significant difference in heart disease risk compared to the baseline group, with a p-value of `0.772`.

3. **Maximum Heart Rate**:
   - The coefficient for maximum heart rate is `-0.18183`. This is the slope for the baseline group and indicates a negative association with heart disease risk, but the effect is not statistically significant (p-value = `0.380`).

4. **Interaction Effects**:
   - **Age Group 40-59 and Max Heart Rate**: The interaction term has a coefficient of `0.20827`, which is the difference in slope relative to the baseline group. Added to the baseline slope it gives a net slope of about 0.026, a slight increase in risk with higher maximum heart rates, but the term is not statistically significant (p-value = `0.326`).
   - **Age Group 60+ and Max Heart Rate**: The interaction term has a coefficient of `0.08217`, so the slope for older individuals is less negative than the baseline slope (net slope of about -0.100). The difference is minimal and not significant (p-value = `0.763`).

The analysis does not provide statistically significant evidence to support Hypothesis 2.2 that individuals over 60 with a maximum heart rate above 150 and atypical angina chest pain have a higher risk of heart disease. None of the coefficients, including the interaction terms, are statistically significant, indicating that **neither age group nor maximum heart rate** significantly predicts heart disease risk in this subset of atypical angina cases with high maximum heart rates. Further investigation with additional variables or more data may be needed to evaluate this hypothesis.
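
The very large age-group coefficients reported above are a consequence of evaluating the age-group difference at a maximum heart rate of 0. A common remedy is to center the heart rate variable before fitting. The sketch below demonstrates this on simulated data; the centering value of 170 bpm and the column name `hr_c` are illustrative assumptions.

```r
# Simulated data (assumption: NOT the project data)
set.seed(3)
n <- 300
d <- data.frame(
  age_group = factor(sample(c("18-39", "40-59", "60+"), n, replace = TRUE)),
  max_heart_rate = runif(n, 150, 200)
)
d$heart_disease <- rbinom(n, 1, plogis(-0.5 + 0.02 * (d$max_heart_rate - 170)))

raw <- glm(heart_disease ~ age_group * max_heart_rate, data = d, family = binomial())

# Center at 170 bpm (an arbitrary illustrative value inside the simulated range)
d$hr_c <- d$max_heart_rate - 170
centered <- glm(heart_disease ~ age_group * hr_c, data = d, family = binomial())

# Age-group main effects now describe the difference at 170 bpm instead of 0 bpm;
# the heart-rate slope and interaction coefficients are unchanged by centering
round(cbind(raw = coef(raw), centered = coef(centered)), 3)
```

Centering changes only the interpretation of the intercept and the age-group main effects, so the fitted probabilities and the interaction tests stay the same.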

---

### **Hypothesis 3: Combinatory Health Factors**

**App to check different user inputs and predict probability of heart disease:**
https://bhavanipriya.shinyapps.io/DACSS_604/

```{r}
# Logistic regression with individual predictors
model_health <- glm(heart_disease ~ sex + resting_BP + age_group + cholestrol 
                        + fasting_blood_sugar + exercise_induced_angina,
                        data = heart_data_final, 
                        family = binomial(),
                        x=TRUE)

summary(model_health)
```

**Interpretation**

1. **Sex**: The positive coefficient for `sexMale` (1.54) is statistically significant (`p < 0.001`). This suggests that males have a higher likelihood of heart disease compared to females.

2. **Resting Blood Pressure (`resting_BP`)**: The coefficient (0.0104) is positive and marginally significant (`p = 0.061`). While not very strong, it indicates that higher resting blood pressure is associated with a slight increase in heart disease probability.

3. **Age Group**: 
   - `age_groupMid-aged(40-59)` has a positive coefficient (0.38) but is not statistically significant (`p = 0.27`).
   - `age_groupOlder(60+)` has a positive coefficient (1.36) and is statistically significant (`p < 0.001`), indicating that older adults (60+) have a significantly higher likelihood of heart disease compared to younger adults (18-39).

4. **Cholesterol (`cholestrol`)**: The coefficient (0.0045) is positive and statistically significant (`p = 0.0078`). This indicates that elevated cholesterol levels are associated with an increased probability of heart disease.

5. **Fasting Blood Sugar (`fasting_blood_sugar > 120mg/dL`)**: The coefficient (0.375) is positive but not statistically significant (`p = 0.17`), suggesting it may not have a strong effect on predicting heart disease in this model.

6. **Exercise-Induced Angina (`exercise_induced_anginaYes`)**: The coefficient (2.14) is highly significant (`p < 2e-16`), indicating that individuals experiencing exercise-induced angina have a much higher likelihood of heart disease.

Given the statistically significant effects of `sexMale`, `age_groupOlder(60+)`, `cholestrol`, and `exercise_induced_anginaYes`, we **reject the null hypothesis**. The evidence is strongest for exercise-induced angina, sex, older age, and cholesterol; resting blood pressure is only marginally significant (`p = 0.061`) and fasting blood sugar is not significant (`p = 0.17`). Because the model contains no interaction terms, it shows that adults aged 60 and above have higher odds of heart disease after adjusting for these factors, but it does not test whether the factors act more strongly in that age group.
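
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to communicate. The sketch below applies `exp()` to the coefficients quoted in the interpretation above; the vector `coefs` is typed in by hand from those reported values, not extracted from `model_health`.

```r
# Coefficients as reported in the interpretation above (log-odds scale)
coefs <- c(
  sexMale = 1.54,
  resting_BP = 0.0104,
  `age_groupOlder(60+)` = 1.36,
  cholestrol = 0.0045,
  exercise_induced_anginaYes = 2.14
)

# exp() converts a log-odds coefficient into an odds ratio
odds_ratios <- round(exp(coefs), 2)
odds_ratios
```

For example, the odds ratio for `exercise_induced_anginaYes` is about 8.5, meaning the odds of heart disease are roughly 8.5 times higher for patients with exercise-induced angina, holding the other predictors fixed.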


### **Key Findings**

#### 1. **Age and Maximum Heart Rate Interaction**  
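- A **marginally significant relationship** (p ≈ 0.06) exists between age, maximum heart rate, and heart disease risk. Among individuals with a **maximum heart rate above 150 bpm**, those aged 60+ show an increasing probability of heart disease as maximum heart rate rises; the interaction is significant at the 5% level in the asymptomatic chest pain subset (p = 0.037).  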
- Younger and mid-aged individuals (18-39, 40-59) do not show significant associations between their maximum heart rate and heart disease risk, indicating that this interaction is particularly relevant for older age groups.

#### 2. **Chest Pain Type as a Predictor**  
- **Asymptomatic chest pain** is strongly associated with the presence of heart disease, making it a key marker during diagnosis.  
- In contrast, patients reporting **typical angina** are predominantly free from heart disease, while **atypical angina** and **non-anginal pain** show a mix of outcomes, with less predictive power.  

#### 3. **Combinatory Health Factors Increase Risk for Older Populations**  
- In the combined model, **elevated cholesterol levels** and **exercise-induced angina** significantly increase the odds of heart disease; **resting blood pressure** is only marginally significant and **elevated fasting blood sugar** is not significant.  
- Older adults (60+) have significantly higher odds of heart disease than younger adults (18-39) after adjusting for these health factors. This finding highlights the need for targeted intervention strategies in older populations.  

#### 4. **Key Predictors of Heart Disease**  
- **Sex**: Males have a significantly higher likelihood of heart disease compared to females.  
- **Exercise-Induced Angina**: Individuals who report experiencing angina during exercise are at a much higher risk, with this factor showing the strongest effect among predictors.
- **Cholesterol**: Elevated cholesterol levels are a significant risk factor for heart disease, underscoring the importance of monitoring and managing cholesterol.  

#### 5. **Decline in Maximum Heart Rate with Age**  
- A clear trend shows that maximum heart rate decreases with age. This physiological change is important for evaluating cardiovascular fitness and designing age-appropriate fitness guidelines.  

#### 6. **Demographic Disparities in the Dataset**  
- The dataset is predominantly **male (74.2%)**, with younger adults (18-39) forming only a small proportion (9.7%) of the sample. This skew in demographics highlights the need to account for potential biases and ensure broader representativeness in future studies.  

#### **Clinical Implications**
These findings emphasize the need for **age- and gender-sensitive diagnostic frameworks**. Additionally, focused preventive strategies targeting modifiable risk factors such as cholesterol and blood pressure, together with attention to exercise-induced angina as a warning sign, could help reduce heart disease risk, especially among older adults.

### **References:**

-   D'Agostino, R. B., Vasan, R. S., Pencina, M. J., et al. (2008). General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation, 117(6), 743-753.
-   Lloyd-Jones, D. M., Leip, E. P., Larson, M. G., et al. (2006). Prediction of lifetime risk for cardiovascular disease by risk factor burden at 50 years of age. Circulation, 113(6), 791-798.
-   Yusuf, S., Hawken, S., Ounpuu, S., et al. (2004). Effect of potentially modifiable risk factors associated with myocardial infarction in 52 countries (the INTERHEART study): case-control study. The Lancet, 364(9438), 937-952.
-   Hemingway, H., & Marmot, M. (1999). Evidence-based cardiology: psychosocial factors in the aetiology and prognosis of coronary heart disease: systematic review of prospective cohort studies. BMJ, 318(7196), 1460-1467.
-   Gupta, D. K., Wang, T. J., Claggett, B., et al. (2020). Association of plasma biomarkers of cardiovascular risk with maximum heart rate achieved: a population-based study. JAMA Cardiology, 5(9), 1008-1017.
- ChatGPT usage: Assistance with understanding how to use R Markdown (RMD) and deploy a Shiny app to ShinyApp.io. Help with making code changes and debugging errors.
- https://rstudio.github.io/flexdashboard/articles/examples.html

---