Probability of Cardiovascular Disease

Peter Phung, Ahmed Elsaeyed, Coffy Andrews, Alec McCabe, Krutika Patel

2022-12-06

Probability of Cardiovascular Disease

This project looks at a data set created by the Framingham Heart Study, which began in 1948.

The data have been used to develop the Framingham Risk Score, an algorithm that estimates the likelihood of a person developing cardiovascular disease within a specified period of time.

We use the data set to train predictive models and to test how well they estimate the risk of cardiovascular disease.

Keywords

Introduction

Heart disease is the leading cause of death in the United States. It accounts for over one fifth of all deaths per year, with over 600,000 reported deaths in 2020 alone (https://www.cdc.gov/nchs/products/databriefs/db427.htm). To put this further into perspective, roughly one person dies every 34 seconds from cardiovascular disease in the US. For these reasons, considerable effort has been devoted to the study of treatments, medicines, preventive measures, and monitoring practices related to heart disease. Applied data science techniques and real-world data have proven to be highly effective tools in combating this pressing threat.

Our project objective is to develop a predictive model that classifies patients by their risk of future coronary heart disease, based on selected personal attributes and lifestyle factors. Such a model could help researchers and doctors identify at-risk patients and prevent future disease by addressing current risk factors.

Our data is sourced from the Framingham Heart Study, which was initiated by the United States Public Health Service in 1948. The original study enrolled 5,209 participants between the ages of 30 and 59. Participants completed questionnaires and examinations every two years, and the scope of these examinations expanded over time. The study has followed this large cohort for decades and has since been extended to three generations: the original participants, their children, and their grandchildren.

Literature Review

Literature Review Cont.

Methodology

Variables

| Variable | Description |
|---|---|
| Sex | Participant sex (male or female) |
| Age | Age at exam (years) |
| Education | Attained education level |
| Current Smoker | Whether or not the participant is a current smoker |
| Cigs Per Day | Average number of cigarettes the participant smoked per day |
| BP Meds | Whether or not the participant was on blood pressure medication |
| Prevalent Stroke | Whether or not the participant had previously had a stroke |
| Prevalent Hyp | Whether or not the participant was hypertensive |
| Diabetes | Whether or not the participant had diabetes |
| Tot Chol | Total cholesterol (mg/dL) |
| Sys BP | Systolic blood pressure (mmHg) |
| Dia BP | Diastolic blood pressure (mmHg) |
| BMI | Body mass index (kg/m²) |
| Heart Rate | Heart rate (beats per minute) |
| Glucose | Glucose level (mg/dL) |
| Ten Year CHD | 10-year risk of coronary heart disease (TARGET: 1 = Yes, 0 = No) |
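The two model families compared in this project, binary logistic regression and stepwise AIC selection, can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the team's code: the data frame `chd`, its simulated values, and the predictors `age` and `sysBP` are hypothetical stand-ins for the cleaned Framingham data, and base R's `step()` is used where the project may have used `MASS::stepAIC()`. `glm()` with `family = binomial` expects the outcome coded 0/1 (or as a two-level factor). The sketch also checks the definition AIC = 2k - 2 log L against R's `AIC()`.

```r
# Binary logistic regression + stepwise AIC selection (illustrative sketch).
# ASSUMPTION: `chd` is SIMULATED, hypothetical data with Framingham-like column
# names; the coefficients used to generate it are arbitrary.
set.seed(42)
n <- 500
chd <- data.frame(
  age   = round(runif(n, 32, 70)),
  sysBP = round(rnorm(n, mean = 132, sd = 22))
)
# Simulate a 0/1 outcome from an arbitrary logistic relationship
lin <- -9 + 0.08 * chd$age + 0.02 * chd$sysBP
chd$TenYearCHD <- rbinom(n, size = 1, prob = plogis(lin))

# Binary logistic regression: models the log-odds that TenYearCHD == 1
fit <- glm(TenYearCHD ~ age + sysBP, data = chd, family = binomial)

# Predicted 10-year probabilities on the response scale
p_hat <- predict(fit, type = "response")

# AIC = 2k - 2 log L, where k is the number of estimated coefficients
k <- length(coef(fit))
aic_manual <- 2 * k - 2 * as.numeric(logLik(fit))

# Backward stepwise selection: drop terms only while AIC decreases
fit_step <- step(fit, direction = "backward", trace = 0)
summary(fit_step)
```

Because `step()` accepts a change only when it lowers AIC, the selected model's AIC is never higher than the starting model's, which is why the "Step AIC" rows can differ only slightly from the full models.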

Histograms

Box Plots

Correlation

Data Cleaning

Data Cleaning Cont.

Transformations and Data Split
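A train/test split like the one named on this slide can be sketched in base R as follows. It is an assumption-laden illustration: the data frame `df`, the 80/20 ratio, and the seed are hypothetical, and the split is stratified by `TenYearCHD` (an assumed choice) so that the minority positive class keeps roughly the same share in the training and test sets.

```r
# Reproducible, stratified 80/20 train/test split in base R.
# ASSUMPTION: `df` is a hypothetical stand-in for the cleaned Framingham data;
# the ratio, seed, and simulated values are illustrative only.
set.seed(123)
df <- data.frame(
  age        = sample(32:70, 200, replace = TRUE),
  TenYearCHD = rbinom(200, size = 1, prob = 0.15)
)

train_frac <- 0.8

# Split row indices by outcome class, then sample 80% within each class
# so the share of positive cases is about the same in both sets.
# idx[sample.int(...)] avoids sample()'s special case for length-one input.
by_class  <- split(seq_len(nrow(df)), df$TenYearCHD)
train_idx <- unlist(lapply(by_class, function(idx) {
  idx[sample.int(length(idx), floor(train_frac * length(idx)))]
}), use.names = FALSE)

train <- df[train_idx, ]   # used to fit the models
test  <- df[-train_idx, ]  # held out to compute precision, recall, AUC, etc.
```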

Building Models

Models

Model Selection

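| Model | Precision | Recall | AIC | AUC | F-score | Accuracy | Error |
|---|---|---|---|---|---|---|---|
| Binary logistic regression, original data | 0.68 | 0.11 | 1939.25 | 0.71 | 0.195 | 0.86 | 0.14 |
| Binary logistic regression, modified data | 0.67 | 0.02 | 1507.5 | 0.7 | 0.033 | 0.87 | 0.13 |
| Stepwise-AIC binary logistic regression, original data | 0.7 | 0.11 | 1928.55 | 0.71 | 0.196 | 0.86 | 0.14 |
| Stepwise-AIC binary logistic regression, modified data | 0.67 | 0.02 | 1496.61 | 0.69 | 0.033 | 0.87 | 0.13 |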
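Precision, recall, F-score, accuracy, and error in the model table all follow from a 2x2 confusion matrix with "1 = develops CHD" as the positive class. The sketch below computes them in base R; `classification_metrics()`, `actual`, and `predicted` are hypothetical names, and the 10 toy labels are invented for illustration rather than taken from the project's test set.

```r
# Classification metrics from a 2x2 confusion matrix (positive class = 1).
# ASSUMPTION: `actual` and `predicted` are hypothetical 0/1 vectors; in the
# project they would be test-set labels and thresholded model predictions.
classification_metrics <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)  # true positives
  fp <- sum(predicted == 1 & actual == 0)  # false positives
  fn <- sum(predicted == 0 & actual == 1)  # false negatives
  tn <- sum(predicted == 0 & actual == 0)  # true negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  accuracy  <- (tp + tn) / length(actual)
  c(precision = precision,
    recall    = recall,
    f_score   = 2 * precision * recall / (precision + recall),  # harmonic mean
    accuracy  = accuracy,
    error     = 1 - accuracy)
}

actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
classification_metrics(actual, predicted)
# precision = 1, recall = 0.25, f_score = 0.4, accuracy = 0.7, error = 0.3
```

As a consistency check on the first table row, 2(0.68)(0.11) / (0.68 + 0.11) ≈ 0.19, in line with the reported 0.195 once rounding of precision and recall is allowed for. A model that rarely predicts the positive class can keep accuracy high while recall and F-score collapse, which is the pattern in the modified-data rows.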

Model Scores

ROC Curve
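The AUC values in the model table summarize the ROC curve as a single number: the probability that a randomly chosen patient who developed CHD receives a higher predicted score than a randomly chosen patient who did not (ties count one half). The sketch below computes it with the rank-sum (Mann-Whitney) identity in base R; `auc_rank()`, `scores`, and `labels` are hypothetical names and values, not the project's predictions.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
# ASSUMPTION: `scores` and `labels` are hypothetical; in the project they would
# be predicted probabilities and observed 0/1 outcomes on the test set.
auc_rank <- function(scores, labels) {
  r <- rank(scores)        # average ranks, so tied scores count one half
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # Sum of positive ranks minus its minimum possible value, scaled by #pairs
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1)
labels <- c(1,   1,   1,    0,   0,   0,   0)
auc_rank(scores, labels)
# 0.9166667 (= 11/12: 11 of the 12 positive/negative pairs are ordered correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of about 0.7 indicate moderate discrimination even where recall at the chosen threshold is low.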

Discussion and Conclusion

References

Center for Drug Evaluation and Research. (2021, January 21). High blood pressure: Understanding the silent killer. U.S. Food and Drug Administration. https://www.fda.gov/drugs/special-features/high-blood-pressure-understanding-silent-killer

Framingham Study | Boston Medical Center. (n.d.). https://www.bmc.org/stroke-and-cerebrovascular-center/research/framingham-study

High cholesterol - Symptoms and causes. (2021, July 20). Mayo Clinic. https://www.mayoclinic.org/diseases-conditions/high-blood-cholesterol/symptoms-causes/syc-20350800

Liu, J. (2022, July 19). What’s a dangerous heart rate? Ohio State Health & Discovery. Retrieved December 5, 2022, from https://health.osu.edu/health/heart-and-vascular/what-is-dangerous-heart-rate

PubMed Central article PMC4159698. (n.d.). National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4159698/

NHS website. (2022, July 4). Low blood pressure (hypotension). nhs.uk. https://www.nhs.uk/conditions/low-blood-pressure-hypotension/

Tachycardia: Symptoms, Causes & Treatment. (n.d.). Cleveland Clinic. https://my.clevelandclinic.org/health/diseases/22108-tachycardia
