Heart Disease Prediction App

Overview

The Heart Disease UCI dataset, consolidated from four sources—Cleveland, Hungarian, Switzerland, and VA—comprises 920 observations and 14 attributes, including the target variable. Each observation represents an individual patient, providing a comprehensive collection of demographic, clinical, and diagnostic data.

The dataset includes key variables such as age, sex, chest pain type, resting blood pressure, cholesterol levels, fasting blood sugar, resting ECG results, maximum heart rate achieved, exercise-induced angina, ST depression, and the slope of the ST segment. These variables are critical for understanding the factors contributing to heart disease and their interrelationships. The target variable indicates the presence or absence of heart disease, making the dataset valuable for both descriptive and predictive modeling.

This expanded dataset offers a richer foundation for analyzing the interactions between age, gender, cholesterol, blood sugar, and other health metrics. It provides an opportunity to uncover patterns that can enhance heart disease prevention, diagnosis, and management strategies across diverse patient populations.

Literature Review

Heart disease remains a leading cause of morbidity and mortality worldwide, with age and health factors playing critical roles in its onset and progression. Age is a non-modifiable risk factor strongly correlated with increased cardiovascular event susceptibility (D’Agostino et al., 2008; Lloyd-Jones et al., 2006). Modifiable health factors such as elevated cholesterol levels, hypertension, diabetes, obesity, sedentary lifestyles, and smoking are also well-established contributors to heart disease pathogenesis (Yusuf et al., 2004). Gender disparities in heart disease prevalence, clinical presentation, and outcomes have been consistently documented, underlining the need for gender-sensitive diagnostic and treatment frameworks (Hemingway & Marmot, 1999). Furthermore, physiological markers like maximum heart rate achieved during exercise have been shown to influence heart disease risk, with notable gender-specific differences (Gupta et al., 2020). Despite these insights, the complex interplay between age, gender, and other risk factors in predicting heart disease outcomes remains underexplored. Additionally, limited research has investigated how combinations of these factors (e.g., age and maximum heart rate or cholesterol and exercise-induced angina) interact to influence heart disease diagnosis and prognosis.

Research Gaps

  1. Age and Maximum Heart Rate Interaction: While maximum heart rate achieved during exercise is linked to heart disease risk, its interaction with age in predicting heart disease has not been extensively studied.
  2. Combinatory Effects of Risk Factors: The combined influence of multiple health factors, such as cholesterol levels and exercise-induced angina, on heart disease diagnosis is not well understood.
  3. Data-Driven Insights from Diverse Populations: Existing studies often focus on specific populations, potentially limiting the generalizability of findings. Further research leveraging diverse datasets, like the UCI Heart Disease dataset, can uncover more nuanced patterns.

Attributes:

  • Age: The age of the patient in years.
  • Sex: The gender of the patient (1 = male; 0 = female).
  • Chest Pain Type (CP): The type of chest pain experienced by the patient
    – Value 1: typical angina – Value 2: atypical angina – Value 3: non-anginal pain – Value 4: asymptomatic
  • Resting Blood Pressure (Trestbps): The resting blood pressure of the patient in mm Hg.
  • Cholesterol (Chol): The serum cholesterol level of the patient in mg/dl.
  • Fasting Blood Sugar (Fbs): The fasting blood sugar level of the patient (>120 mg/dl signifies high blood sugar). 1: True, 0: False
  • Maximum Heart Rate Achieved (Thalach): The maximum heart rate achieved by the patient during exercise.
  • Exercise-Induced Angina (Exang):Whether the patient experienced exercise-induced angina (1 = yes; 0 = no).
  • Num (Heart Disease): diagnosis of heart disease (angiographic disease status) – Value 0: < 50% diameter narrowing – Value 1: > 50% diameter narrowing —

Research Questions & Hypotheses

Research Questions

  1. How do age interacted with maximum heart rate, and age interacted with chest pain type influence the risk of developing heart disease?

  2. How do key health factors, including cholesterol levels, blood pressure, blood sugar, and exercise-induced angina, collectively impact the likelihood of heart disease, and how does this relationship vary among different age groups?

Hypotheses

  1. Age and Maximum Heart Rate:
    The interaction between age and maximum heart rate significantly predicts the risk of heart disease, with maximum heart rates posing a higher risk in older individuals compared to younger ones.

  2. Age and Chest Pain Type: The interaction between age and chest pain type significantly predicts the risk of heart disease, with atypical and asymptomatic chest pain posing a higher risk in older individuals compared to younger ones.

  3. Combinatory Health Factors:
    The combined effect of elevated cholesterol levels, high blood pressure, elevated blood sugar, and exercise-induced angina significantly increases the likelihood of heart disease, and this effect is more pronounced in individuals aged 60 and above compared to younger age groups.


Maximum Heart Rate by Age Group

# Maximum Heart Rate by Age Group
p1 <- ggplot(heart_data_final, aes(x = age_group, y = max_heart_rate, fill=as.factor(age_group))) +
  geom_boxplot(fill = "lightblue", color = "darkblue", outlier.color = "red", outlier.size = 2) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "darkgray") +
  labs(
    title = "Maximum Heart Rate by Age Group",
    x = "Age Group",
    y = "Maximum Heart Rate",
    fill = "Age Group"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p1)

Interpretation of the Plot

  1. Decline in Maximum Heart Rate with Age:
    • The plot shows a clear trend where the maximum heart rate decreases across age groups. Adults (18-39) have higher maximum heart rates compared to older groups.
    • The median maximum heart rate for the older group (60+) is lower, highlighting physiological changes with aging.
  2. Variation within Age Groups:
    • Each age group shows variability in maximum heart rates, with the widest range seen in adults (18-39).
    • The data points (jittered) indicate individual variances within each age group.
  3. Clinical Implications:
    • Understanding the decline in maximum heart rate with age is essential for designing age-appropriate cardiovascular fitness assessments.
    • The results suggest the need for age-adjusted benchmarks when evaluating heart health and exercise capacity.

Chest Pain Type and Heart Disease

# Grouped Bar Plot for Chest Pain Type and Heart Disease
p2 <- ggplot(heart_data_final, aes(x = chest_paintype, fill = as.factor(heart_disease))) +
  geom_bar(position = "dodge", color = "black") +
  scale_fill_brewer(palette = "Pastel1",labels = c("No Heart Disease", "Heart Disease")) +
  scale_x_discrete(labels = c(
    "1" = "Typical Angina",
    "2" = "Atypical Angina",
    "3" = "Non-Anginal Pain",
    "4" = "Asymptomatic"
  )) +
  labs(
    title = "Chest Pain Type and Heart Disease",
    x = "Chest Pain Type",
    y = "Count",
    fill = "Heart Disease"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# Convert to interactive plot
ggplotly(p2)

Interpretation of the Plot

  1. Chest Pain Type and Heart Disease Relationship:
    • The bars in the plot show a clear relationship between chest pain types and heart disease occurrence.
    • “Asymptomatic” chest pain has a high association with heart disease presence, as indicated by the taller blue bar.
  2. Variation Across Chest Pain Types:
    • Patients with “typical angina” are predominantly not diagnosed with heart disease.
    • “Atypical angina” and “non-anginal pain” show a mix of outcomes, with many not having heart disease.
  3. Implication on Diagnostics:
    • This data suggests that the type of chest pain can be a significant indicator in diagnosing heart disease, particularly for asymptomatic patients.

Age Group and Max Heart Rate

# Filter the data to have maximum heart rate of 150+
#maximum heart rate for older individuals is 220-60 which is 160, so we only included data that has more than 150 max heart rate
max_heart_data <- heart_data_final %>% filter(max_heart_rate > 150)

# Logistic regression model for Adults with chest_paintype and max_heart_rate
model_max_heartrate <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = max_heart_data, 
              family = binomial(), 
              x = TRUE)

#interaction effects
heartrate_effects <- allEffects(model_max_heartrate) 

#dataframe of interaction effects
heartrate_df <- as.data.frame(heartrate_effects$`age_group:max_heart_rate`)

# Plot using ggplot2, including confidence intervals
p3 <- ggplot(heartrate_df, aes(x = max_heart_rate, y = fit, color = age_group)) +
  geom_point(alpha = 0.6) +  # Scatter plot with semi-transparent points
  geom_line(aes(group = age_group)) +  # Line for predicted probabilities
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = age_group), alpha = 0.2, color = NA) +  # Confidence intervals
  labs(title = "Effect of Maximum Heart Rate and Age Group on Heart Disease",
       x = "Maximum Heart Rate",
       y = "Predicted Probability of Heart Disease",
       color = "Age Group",
       fill = "Age Group") +  # Legend for confidence intervals
  theme_minimal() +
  theme(legend.position = "top")

ggplotly(p3)

Interpretation of the Plot

  1. Increasing Risk for Older Age Group:
    • The blue line shows that for the older age group (60+), the predicted probability of heart disease increases as the maximum heart rate rises. This suggests a strong relationship between high heart rates and heart disease risk in older adults.
  2. Decreasing Trend in Younger Age Groups:
    • The lines for the younger (18-39) and mid-aged (40-59) groups show a decreasing trend, indicating that higher maximum heart rates are associated with a lower predicted probability of heart disease in these groups.
  3. Confidence Intervals:
    • The shaded ribbons represent confidence intervals. For older adults, the interval widens with higher heart rates, indicating more variability and uncertainty in predictions at these levels.

Asympomatic Chest Pain and Heart Disease

# Filter the data to include only "asymptomatic" chest pain type and max heart rate is more than 150
chestpain4_data <- heart_data_final %>% filter(chest_paintype == "asymptomatic" & max_heart_rate > 150)

# Logistic regression model for chest_paintype "asymptomatic" and age_group
model_chestpain4 <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = chestpain4_data, 
              family = binomial(), 
              x = TRUE)

#interaction effects
chestpain4_effects <- allEffects(model_chestpain4) 

#dataframe for interaction effects
chestpain4_df <- as.data.frame(chestpain4_effects$`age_group:max_heart_rate`)

# Plot using ggplot2, including confidence intervals
p4 <- ggplot(chestpain4_df, aes(x = max_heart_rate, y = fit, color = age_group)) +
  geom_point(alpha = 0.6) +  # Scatter plot with semi-transparent points
  geom_line(aes(group = age_group)) +  # Line for predicted probabilities
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = age_group), alpha = 0.2, color = NA) +  # Confidence intervals
  labs(title = "Heart Disease with Asymptomatic Chest Pain",
       x = "Maximum Heart Rate",
       y = "Predicted Probability of Heart Disease",
       color = "Age Group",
       fill = "Age Group") +  # Legend for confidence intervals
  theme_minimal() +
  theme(legend.position = "top") # Hide redundant legend if not needed

ggplotly(p4)

Interpretation of the Plot:

  1. High Probability for Older Adults with Low Heart Rates:
    • In the older age group (60+), at lower maximum heart rates, the predicted probability of heart disease starts high but stabilizes as heart rate increases.
  2. Distinct Trends by Age Group:
    • The younger and mid-aged groups show a decreasing trend, especially in asymptomatic cases, suggesting that the absence of symptoms combined with higher heart rates might indicate a lower risk of heart disease.
  3. Broader Implications:
    • These plots highlight the importance of assessing heart disease risk differently across age groups, considering maximum heart rate as a significant factor, especially for asymptomatic patients.

Atypical Angina Chest Pain and Heart Disease

#chest pain type: atypical angina
chestpain2_data <- heart_data_final %>% filter(chest_paintype == "atypical angina" &
                                              max_heart_rate>150)

# Logistic regression model for chest_paintype "atypical angina" and age_group
model_chestpain2 <- glm(heart_disease ~ age_group*max_heart_rate, 
              data = chestpain2_data, 
              family = binomial(), 
              x = TRUE)

#interaction effects
chestpain2_effects <- allEffects(model_chestpain2) 

#dataframe for interaction effects
chestpain2_df <- as.data.frame(chestpain2_effects$`age_group:max_heart_rate`)

# Plot using ggplot2, including confidence intervals
p5<-ggplot(chestpain2_df, aes(x = max_heart_rate, y = fit, color = age_group)) +
  geom_point(alpha = 0.6) +  # Scatter plot with semi-transparent points
  geom_line(aes(group = age_group)) +  # Line for predicted probabilities
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = age_group), alpha = 0.2, color = NA) +  # Confidence intervals
  labs(title = "Heart Disease with Atypical Angina Chest Pain",
       x = "Maximum Heart Rate",
       y = "Predicted Probability of Heart Disease",
       color = "Age Group",
       fill = "Age Group") +  # Legend for confidence intervals
  theme_minimal() +
  theme(legend.position = "top")  # Hide redundant legend if not needed

ggplotly(p5)

Input Panel

# Logistic regression with individual predictors
model_health <- glm(heart_disease ~ sex + resting_BP + age_group + cholestrol 
                        + fasting_blood_sugar + exercise_induced_angina,
                        data = heart_data_final, 
                        family = binomial(),
                        x=TRUE)

shinyApp(
# Define UI for the app
ui <- fluidPage(
  titlePanel("Heart Disease Prediction App"),
  
  sidebarLayout(
    sidebarPanel(
      h2("Enter Your Health Data"),
      br(),
      selectInput("sex", "Sex:", choices = c("Female" = "Female", "Male" = "Male")),
      numericInput("resting_BP", "Resting Blood Pressure:", value = 120, min = 50, max = 200, step = 1),
      selectInput("age_group", "Age Group:",  
                  choices = c("Adults(18-39)" = "Adults(18-39)", 
                              "Mid-aged(40-59)" = "Mid-aged(40-59)", 
                              "Older(60+)" = "Older(60+)")),
      numericInput("cholestrol", "Cholesterol Level:", value = 200, min = 85, max = 605, step = 1),
      selectInput("fasting_blood_sugar", "Fasting Blood Sugar:", 
                  choices = c("<=120mg/dL" = "<=120mg/dL", ">120mg/dL" = ">120mg/dL")),
      selectInput("exercise_induced_angina", "Exercise Induced Angina:", 
                  choices = c("No" = "No", "Yes" = "Yes")),
      br(),
      actionButton("predict", "Predict", class = "btn-primary")
    ),
    
    mainPanel(
      h3("Prediction Result:"),
      textOutput("prediction"),
      br(),
      uiOutput("additional_info")  # Placeholder for additional information or suggestions
    )
  )
),

# Define server logic
server <- function(input, output) {
  observeEvent(input$predict, {
    # Create a new data frame with user inputs
    new_data <- data.frame(
      sex = input$sex,
      resting_BP = as.numeric(input$resting_BP),
      age_group = input$age_group,
      cholestrol = as.numeric(input$cholestrol),
      fasting_blood_sugar = input$fasting_blood_sugar,
      exercise_induced_angina = input$exercise_induced_angina,
      stringsAsFactors = FALSE # Prevent automatic conversion
    )
    
    # Match factor levels with training dataset
    new_data$sex <- factor(new_data$sex, levels = levels(heart_data_final$sex))
    new_data$age_group <- factor(new_data$age_group, levels = levels(heart_data_final$age_group))
    new_data$fasting_blood_sugar <- factor(new_data$fasting_blood_sugar, levels = levels(heart_data_final$fasting_blood_sugar))
    new_data$exercise_induced_angina <- factor(new_data$exercise_induced_angina, levels = levels(heart_data_final$exercise_induced_angina))
    
    # Predict probability using the logistic regression model
    prediction <- predict(model_health, newdata = new_data, type = "response")
    
    # Check if prediction is NA
    if (is.na(prediction)) {
      output$prediction <- renderText("Error: Unable to calculate probability. Please check your inputs.")
    } else {
      # Render prediction output
      output$prediction <- renderText({
        paste("The predicted probability of heart disease is:", round(prediction * 100, 2), "%")
      })
    }
    
    # Provide additional information or suggestions
    output$additional_info <- renderUI({
      if (prediction > 0.5) {
        p("The prediction indicates a higher risk of heart disease. It is recommended to consult a healthcare professional for further advice.")
      } else {
        p("The prediction indicates a lower risk of heart disease. Maintain a healthy lifestyle and monitor your health regularly.")
      }
    })
  })
},
options = list(vwidth = NULL, height = 500))

Key Findings

  1. Interaction analyses reveal significant relationships between age, chest pain type, and max heart rate.
  2. Cholesterol and blood pressure remain strong predictors of heart disease.

Next Steps

  1. Incorporate additional datasets to test generalizability.
  2. Explore non-linear modeling techniques to improve prediction accuracy.
  3. Develop actionable guidelines based on the results.