Predictive modelling for cardiovascular health

Cardiovascular diseases (CVD) are among the leading causes of mortality worldwide, emphasizing the importance of accurate prediction and early intervention. In this project, we aim to develop predictive models to assess the risk of cardiovascular health issues based on various physiological indicators.

Leveraging a dataset containing information such as age, gender, blood pressure, cholesterol levels, and other relevant metrics, we seek to construct models that can effectively identify individuals at higher risk of CVD.

Objectives

Develop predictive models to assess cardiovascular health risks.
Explore the relationships between physiological indicators and cardiovascular health outcomes.
Provide insights into factors contributing to cardiovascular health issues.

Dataset

The dataset used for this analysis contains anonymized data on individuals’ physiological measurements, including age, gender, pulse rate, blood pressure (systolic and diastolic), cholesterol levels (HDL and LDL), body composition metrics (weight, height, BMI), and other relevant features. Each observation in the dataset represents an individual, and the target variable indicates their cardiovascular health status.

head(body_data)
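
The loading step is not shown in the report. As a minimal sketch, assuming the data ships as a CSV file (the file path and the toy rows below are hypothetical, made up only so the example runs on its own), `read.csv()` with `check.names = FALSE` keeps headers such as `GENDER (1=M)` and `ARM CIRC` exactly as they appear in the summary output later in the report:

```r
# Hypothetical stand-in for the real file: two made-up rows written to a temp CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "AGE,GENDER (1=M),PULSE,SYSTOLIC,DIASTOLIC,ARM CIRC,BMI",
  "43,0,80,100,70,32.5,28.0",
  "57,1,84,112,70,38.0,30.9"
), csv_path)

# check.names = FALSE stops R from rewriting headers to syntactic names
# (e.g. "GENDER (1=M)" would otherwise become "GENDER..1.M.").
body_data <- read.csv(csv_path, check.names = FALSE)

head(body_data)   # first rows (at most six)
str(body_data)    # column types at a glance
```

Keeping the original headers means columns with spaces or parentheses must be accessed with backticks or `[[ ]]`, for example `body_data[["ARM CIRC"]]`.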

Significance

Accurate prediction of cardiovascular health risks is crucial for preventive healthcare strategies and personalized interventions. By developing robust predictive models, we can identify individuals at higher risk of CVD and facilitate early interventions, lifestyle modifications, and targeted healthcare services, ultimately improving health outcomes and reducing the burden of cardiovascular diseases.

The following sections cover data preparation, exploratory data analysis, model building, evaluation, and interpretation.

Data Preparation

Before proceeding with the analysis, it’s essential to prepare the dataset for modeling. This involves tasks such as exploring its structure, handling missing values, and performing any necessary data transformations.

Data Exploration

We’ll explore the dataset to gain insights into its contents and identify any issues that need to be addressed:

##       AGE         GENDER (1=M)      PULSE           SYSTOLIC     DIASTOLIC     
##  Min.   :18.00   Min.   :0.00   Min.   : 36.00   Min.   : 88   Min.   : 40.00  
##  1st Qu.:31.00   1st Qu.:0.00   1st Qu.: 64.00   1st Qu.:112   1st Qu.: 64.00  
##  Median :46.00   Median :1.00   Median : 72.00   Median :121   Median : 70.00  
##  Mean   :47.04   Mean   :0.51   Mean   : 71.77   Mean   :123   Mean   : 70.75  
##  3rd Qu.:62.00   3rd Qu.:1.00   3rd Qu.: 80.00   3rd Qu.:132   3rd Qu.: 78.00  
##  Max.   :80.00   Max.   :1.00   Max.   :104.00   Max.   :186   Max.   :102.00  
##       HDL              LDL            WHITE             RED       
##  Min.   : 26.00   Min.   : 39.0   Min.   : 2.700   Min.   :3.390  
##  1st Qu.: 43.00   1st Qu.: 85.0   1st Qu.: 5.200   1st Qu.:4.197  
##  Median : 52.00   Median :113.0   Median : 6.200   Median :4.490  
##  Mean   : 53.66   Mean   :113.7   Mean   : 6.542   Mean   :4.538  
##  3rd Qu.: 62.00   3rd Qu.:137.2   3rd Qu.: 7.825   3rd Qu.:4.883  
##  Max.   :138.00   Max.   :251.0   Max.   :14.300   Max.   :6.340  
##      PLATE           WEIGHT           HEIGHT          WAIST       
##  Min.   : 75.0   Min.   : 39.00   Min.   :134.5   Min.   : 64.40  
##  1st Qu.:198.0   1st Qu.: 67.08   1st Qu.:161.6   1st Qu.: 87.88  
##  Median :232.0   Median : 80.50   Median :168.3   Median : 96.95  
##  Mean   :239.4   Mean   : 81.66   Mean   :168.0   Mean   : 99.18  
##  3rd Qu.:263.5   3rd Qu.: 92.80   3rd Qu.:174.6   3rd Qu.:109.10  
##  Max.   :646.0   Max.   :150.40   Max.   :193.3   Max.   :170.50  
##     ARM CIRC          BMI       
##  Min.   :20.50   Min.   :15.90  
##  1st Qu.:29.48   1st Qu.:24.50  
##  Median :33.05   Median :28.00  
##  Mean   :33.08   Mean   :28.91  
##  3rd Qu.:36.33   3rd Qu.:31.98  
##  Max.   :46.60   Max.   :59.00

## named numeric(0)
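
The code behind the two outputs above is hidden in the rendered report. One way to produce output of this shape, offered as a sketch rather than the author’s actual code, is `summary()` followed by a per-column count of missing values that keeps only the non-zero counts. An empty result (`named numeric(0)`) then means that no column has missing values. The toy data frame is made up for illustration.

```r
# Made-up rows standing in for body_data (hypothetical values).
body_data <- data.frame(
  AGE   = c(43, 57, 38, 80),
  PULSE = c(80, 84, 94, 74),
  BMI   = c(28.0, 30.9, 25.4, 22.1)
)

# Min, quartiles, median, mean, and max for every numeric column.
summary(body_data)

# Count missing values per column, then keep only columns that have any.
na_counts <- colSums(is.na(body_data))
na_counts[na_counts > 0]   # empty named numeric vector when nothing is missing

# Same check on a copy with one value removed, to show a non-empty result.
with_gap <- body_data
with_gap$PULSE[2] <- NA
na_gap <- colSums(is.na(with_gap))
na_gap[na_gap > 0]
```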

Data Transformation

In this section, we’ll perform necessary transformations on the dataset to ensure it’s appropriately formatted for our analysis. Specifically, we’ll address the following tasks:

Gender Encoding: We’ll convert the numeric values in the “GENDER” column to more interpretable labels. Since the dataset uses 1 to represent male and 0 to represent female, we’ll replace these values with “Male” and “Female,” respectively.
BMI Categorization: We’ll add a new column that groups BMI (Body Mass Index) values into the standard categories (underweight, normal weight, overweight, and obese), so that we can explore how cardiovascular health risk varies across BMI categories.
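
The two transformations above could be sketched as follows. This is an illustration under stated assumptions, not the report’s own code: the gender column is assumed to be named `GENDER (1=M)` as in the summary output, the new column name `BMI_CATEGORY` is hypothetical, and the cut points (18.5, 25, 30) are the commonly used WHO adult thresholds, which the report does not state explicitly. The toy rows are made up.

```r
# Made-up rows; column names follow the summary output above.
body_data <- data.frame(
  `GENDER (1=M)` = c(0, 1, 1, 0),
  BMI            = c(17.9, 22.4, 27.3, 31.0),
  check.names    = FALSE
)

# Gender encoding: 0 -> "Female", 1 -> "Male".
body_data$GENDER <- factor(
  body_data[["GENDER (1=M)"]],
  levels = c(0, 1),
  labels = c("Female", "Male")
)
body_data[["GENDER (1=M)"]] <- NULL   # drop the numeric original

# BMI categorization with left-closed intervals: [0, 18.5), [18.5, 25), ...
# (assumed WHO adult cut points; BMI_CATEGORY is a hypothetical column name)
body_data$BMI_CATEGORY <- cut(
  body_data$BMI,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal weight", "Overweight", "Obese"),
  right  = FALSE
)

# Cross-tabulate the two new categorical columns.
table(body_data$GENDER, body_data$BMI_CATEGORY)
```

Storing both columns as factors keeps the category order fixed, which matters for plots and for how modeling functions build dummy variables.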

With these transformations applied, the dataset is ready for the exploratory analysis and modeling that follow.
