Dataset Overview and Source

Heart Disease Dataset

This analysis examines records from 918 patients (746 after removing entries with missing cholesterol) to identify key indicators of heart disease.

Data Source: UCI Machine Learning Repository

Key Variables:

  • Age: Patient age in years
  • Sex: Male or Female
  • ChestPainType: TA (typical angina), ATA (atypical angina), NAP (non-anginal pain), or ASY (asymptomatic)
  • RestingBP: Resting blood pressure in mm Hg
  • Cholesterol: Serum cholesterol in mg/dl
  • MaxHR: Maximum heart rate achieved, in bpm
  • HeartDisease: Diagnosis, where 0 = Normal and 1 = Heart Disease
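The chest pain codes are compact in the raw file. Below is a minimal sketch for decoding them into readable labels. It assumes the codes appear exactly as TA, ATA, NAP and ASY. The helper name `decode_chest_pain` is hypothetical and is not part of the original analysis:

```r
# Hypothetical helper (not the original code): map the raw ChestPainType
# codes to readable factor labels, keeping a clinically ordered level set
decode_chest_pain <- function(codes) {
  factor(codes,
         levels = c("TA", "ATA", "NAP", "ASY"),
         labels = c("Typical Angina", "Atypical Angina",
                    "Non-Anginal Pain", "Asymptomatic"))
}

# Usage (after loading the data):
# heart$ChestPainType <- decode_chest_pain(heart$ChestPainType)
```

Any code outside the four listed levels becomes NA. A quick `table(..., useNA = "ifany")` will reveal unexpected spellings.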

R Code for Data Preparation

Here’s how I load and prepare the data:

# Load required libraries
library(ggplot2)
library(plotly)
library(dplyr)

# Load the data
heart <- read.csv("heart.csv")

# Data cleaning: a cholesterol value of 0 marks a missing measurement, so drop those rows
heart <- heart[heart$Cholesterol > 0, ]

# Convert the outcome to a labelled factor for analysis
heart$HeartDisease <- factor(heart$HeartDisease, 
                              levels = c(0, 1), 
                              labels = c("Normal", 
                                        "Heart Disease"))
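The group counts in the summary table sum to 390 + 356 = 746 rows, so the zero-cholesterol filter removes 172 of the 918 records. Below is a minimal sketch of a slightly more defensive version of that step. It assumes 0 is the only missing-value code in `Cholesterol`. The function name is hypothetical:

```r
# Hypothetical helper (not the original code): drop rows whose cholesterol
# is recorded as 0 (a placeholder for a missing value) and report the count.
drop_missing_cholesterol <- function(df) {
  # which() skips NA comparisons, so genuine NA values are dropped instead of
  # producing all-NA rows, as plain logical indexing would
  keep <- which(df$Cholesterol > 0)
  message(nrow(df) - length(keep), " of ", nrow(df),
          " rows removed (Cholesterol == 0 or NA)")
  df[keep, , drop = FALSE]
}

# Usage: heart <- drop_missing_cholesterol(read.csv("heart.csv"))
```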

3D Plotly: Age, Cholesterol & Max Heart Rate
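Below is a minimal sketch of how this 3D view can be built with plotly. It assumes the cleaned `heart` data frame from the preparation step. The helper name `plot_3d_risk`, the marker size and the axis titles are illustrative choices, not the original plotting code:

```r
library(plotly)

# Hypothetical helper (not the original code): interactive 3D scatter of
# Age, Cholesterol and MaxHR, coloured by diagnosis
plot_3d_risk <- function(df) {
  plot_ly(df,
          x = ~Age, y = ~Cholesterol, z = ~MaxHR,
          color = ~HeartDisease,
          type = "scatter3d", mode = "markers",
          marker = list(size = 3)) %>%
    layout(scene = list(
      xaxis = list(title = "Age (years)"),
      yaxis = list(title = "Cholesterol (mg/dl)"),
      zaxis = list(title = "Max heart rate (bpm)")
    ))
}

# Usage: plot_3d_risk(heart)
```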

3D Plot Analysis

Key Observations:

  • Age Pattern: Heart disease cases concentrate in the 50 to 70 year age range, which is consistent with age being a major risk factor for cardiac disease.

  • Cholesterol Distribution: Both groups span a wide range, roughly 150 to 400 mg/dl, which suggests that cholesterol alone is not sufficient to predict heart disease.

  • Maximum Heart Rate: The groups separate most clearly on this axis. Heart disease patients tend to reach lower maximum heart rates, mostly below 150 bpm.

  • Combined Effect: The 3D view shows heart disease cases clustering among older patients with lower MaxHR.

Plotly Scatter: Age vs Cholesterol
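Below is a minimal sketch of a comparable 2D scatter with plotly. It assumes the cleaned `heart` data frame. The helper name and the hover text showing MaxHR are illustrative assumptions, not the original code:

```r
library(plotly)

# Hypothetical helper (not the original code): Age against Cholesterol,
# coloured by diagnosis, with MaxHR shown in the hover text
plot_age_chol <- function(df) {
  plot_ly(df,
          x = ~Age, y = ~Cholesterol,
          color = ~HeartDisease,
          type = "scatter", mode = "markers",
          text = ~paste("MaxHR:", MaxHR),
          hoverinfo = "x+y+text") %>%
    layout(xaxis = list(title = "Age (years)"),
           yaxis = list(title = "Cholesterol (mg/dl)"))
}

# Usage: plot_age_chol(heart)
```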

ggplot Boxplot: Max Heart Rate by Chest Pain Type
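Below is a minimal sketch of this boxplot with ggplot2. It assumes the cleaned `heart` data frame. The helper name, the fill by diagnosis and the theme are illustrative choices, not the original code:

```r
library(ggplot2)

# Hypothetical helper (not the original code): MaxHR distribution for each
# chest pain type, with separate boxes per diagnosis
plot_maxhr_by_pain <- function(df) {
  ggplot(df, aes(x = ChestPainType, y = MaxHR, fill = HeartDisease)) +
    geom_boxplot() +
    labs(x = "Chest pain type", y = "Maximum heart rate (bpm)",
         fill = "Diagnosis") +
    theme_minimal()
}

# Usage: plot_maxhr_by_pain(heart)
```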

ggplot Bar Chart: Heart Disease by Sex and Chest Pain
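Below is a minimal sketch of this bar chart with ggplot2. It assumes the cleaned `heart` data frame with `Sex` and `ChestPainType` columns. The helper name and the one-panel-per-sex layout are illustrative assumptions, not the original code:

```r
library(ggplot2)

# Hypothetical helper (not the original code): patient counts by chest pain
# type and diagnosis, with one panel per sex
plot_disease_by_sex_pain <- function(df) {
  ggplot(df, aes(x = ChestPainType, fill = HeartDisease)) +
    geom_bar(position = "dodge") +
    facet_wrap(~ Sex) +
    labs(x = "Chest pain type", y = "Number of patients",
         fill = "Diagnosis") +
    theme_minimal()
}

# Usage: plot_disease_by_sex_pain(heart)
```

`position = "dodge"` places the two diagnoses side by side. `position = "fill"` would instead show the proportion with heart disease within each chest pain type, which is easier to compare when group sizes differ.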

Statistical Analysis: Summary Statistics

# Per-group counts, means, median and SD for key variables
heart %>%
  group_by(HeartDisease) %>%
  summarise(
    Count = n(),
    Mean_Age = round(mean(Age), 1),
    Mean_Cholesterol = round(mean(Cholesterol), 1),
    Mean_MaxHR = round(mean(MaxHR), 1),
    Median_MaxHR = median(MaxHR),
    SD_MaxHR = round(sd(MaxHR), 1)
  )
## # A tibble: 2 × 7
##   HeartDisease  Count Mean_Age Mean_Cholesterol Mean_MaxHR Median_MaxHR SD_MaxHR
##   <fct>         <int>    <dbl>            <dbl>      <dbl>        <dbl>    <dbl>
## 1 Normal          390     50.2             239.       149.         150.     23.1
## 2 Heart Disease   356     55.9             251.       131.         130      22.3

Summary Statistics: Interpretation

Detailed Findings:

  • Balanced Dataset: With 390 normal and 356 heart disease patients, the two groups are of similar size, which supports fair statistical comparisons between them.

  • Age Factor: Heart disease patients average 55.9 years compared with 50.2 years for normal patients, a difference of about 5.7 years.

  • Cholesterol Paradox: Mean cholesterol is similar in the two groups (about 239 vs 251 mg/dl), which suggests that cholesterol has limited standalone predictive value for heart disease.

  • MaxHR as Key Differentiator: The gap of about 18.5 bpm between groups (mean 149.1 for normal vs 130.6 for heart disease) is the clearest difference in the table and indicates that exercise capacity is an important factor.
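The differences quoted above can be computed directly from the data instead of being read off the rounded table. Below is a minimal sketch, assuming `HeartDisease` is the labelled factor created during preparation. The helper name is hypothetical:

```r
# Hypothetical helper (not the original code): difference in group means
# (Heart Disease minus Normal) for each numeric variable
mean_differences <- function(df, vars = c("Age", "Cholesterol", "MaxHR")) {
  sapply(vars, function(v) {
    means <- tapply(df[[v]], df$HeartDisease, mean)
    unname(means["Heart Disease"] - means["Normal"])
  })
}

# Usage: round(mean_differences(heart), 1)
```

Going by the summary table, this should give roughly +5.7 years for Age, about +12 mg/dl for Cholesterol, and about -18.5 bpm for MaxHR.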

Statistical Analysis: T-Test

# Two-sample t-test comparing MaxHR between groups (Welch's test, the t.test default)
normal_hr <- heart$MaxHR[heart$HeartDisease == "Normal"]
disease_hr <- heart$MaxHR[heart$HeartDisease == "Heart Disease"]

t_test_result <- t.test(normal_hr, disease_hr)
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  normal_hr and disease_hr
## t = 11.128, df = 741.71, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  15.24317 21.77366
## sample estimates:
## mean of x mean of y 
##  149.0590  130.5506
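Reading the key numbers directly from the test object avoids transcription errors. For the output above, the difference is 149.0590 - 130.5506 ≈ 18.51 bpm, which is the midpoint of the confidence interval. Below is a minimal sketch using the standard fields of an R `htest` object (`estimate`, `conf.int`, `p.value`). The helper name is hypothetical:

```r
# Hypothetical helper (not the original code): extract the mean difference,
# confidence interval and p-value from a two-sample t.test() result
summarise_ttest <- function(tt) {
  c(difference = unname(tt$estimate[1] - tt$estimate[2]),
    ci_lower   = tt$conf.int[1],
    ci_upper   = tt$conf.int[2],
    p_value    = tt$p.value)
}

# Usage: round(summarise_ttest(t_test_result), 2)
```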

T-Test: Interpretation

Comprehensive Analysis:

  • Statistical Significance: The p-value is below 2.2e-16, the smallest value R reports by default (effectively zero), so the difference is very unlikely to be due to random chance alone.

  • Effect Size: The difference of about 18.5 bpm is both statistically significant and large enough to be clinically meaningful, which makes it relevant for cardiac assessment in practice.

  • Confidence Interval: We can be 95% confident that the true difference in population means lies between 15.2 and 21.8 bpm.

  • Clinical Implication: In this sample, patients who do not reach high heart rates during exercise testing are more likely to have heart disease.
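A p-value does not measure how large a difference is. A common standardized effect-size measure, which the original analysis does not report, is Cohen's d: the mean difference divided by the pooled standard deviation. The rounded group SDs from the summary table (23.1 and 22.3 bpm) give a pooled SD of about 22.7 bpm, so d is roughly 0.8. Below is a minimal sketch of the calculation. The function name is my own:

```r
# Cohen's d with a pooled standard deviation: the mean difference expressed
# in SD units (Cohen's conventional benchmarks: 0.2 small, 0.5 medium,
# 0.8 large)
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Usage: cohens_d(normal_hr, disease_hr)
```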

Key Insights and Conclusions

Major Findings:

First: Exercise capacity is critical. Maximum heart rate emerged as the strongest differentiator, with a difference of about 18.5 bpm between groups and high statistical significance (p < 0.001).

Second: Multi-factor assessment is needed. Age, cholesterol, and symptoms provide important context, but no single factor other than MaxHR shows strong standalone predictive value.

Third: Cholesterol has limitations. Despite its conventional medical emphasis, cholesterol showed similar means in both groups, which highlights the need for comprehensive evaluation.
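One common way to assess several factors at once is a logistic regression, which was not performed in the original analysis. Below is a minimal sketch, assuming the cleaned `heart` data frame with `HeartDisease` as a factor whose second level is "Heart Disease". The helper name and the choice of predictors are illustrative assumptions:

```r
# Hypothetical helper (not the original code): logistic regression with
# several predictors. For a factor response, glm() models the probability
# of the second level, here "Heart Disease".
fit_multifactor <- function(df) {
  glm(HeartDisease ~ Age + Sex + ChestPainType + Cholesterol + MaxHR,
      data = df, family = binomial)
}

# Usage:
# fit <- fit_multifactor(heart)
# summary(fit)       # coefficients on the log-odds scale
# exp(coef(fit))     # odds ratios
```

Each coefficient is adjusted for the other predictors in the model. This is what distinguishes it from the one-variable-at-a-time comparisons above.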

Clinical Implications and Future Directions

Practical Applications

These findings suggest that exercise stress testing could be a standard screening tool for at-risk populations, particularly individuals over 50 years of age. Patients who show poor exercise capacity, specifically a low maximum heart rate, may warrant closer monitoring and more aggressive intervention.

Study Limitations:

This analysis uses a cross-sectional design: it examines a single time point in one specific population. It can therefore establish associations but not causation, and the findings may not generalize to other populations.

Future Research Directions:

  • Longitudinal studies that track disease progression over time in the same patients
  • Validation studies across diverse populations and clinical settings to test whether these patterns hold

Thank You