Cardiovascular diseases (CVD) are among the leading causes of mortality worldwide, emphasizing the importance of accurate prediction and early intervention. In this project, we aim to develop predictive models to assess the risk of cardiovascular health issues based on various physiological indicators.
Leveraging a dataset containing information such as age, gender, blood pressure, cholesterol levels, and other relevant metrics, we seek to construct models that can effectively identify individuals at higher risk of CVD.
Develop predictive models to assess cardiovascular health risks.
Explore the relationships between physiological indicators and cardiovascular health outcomes.
Provide insights into factors contributing to cardiovascular health issues.
The dataset used for this analysis contains anonymized data on individuals' physiological measurements, including age, gender, pulse rate, blood pressure (systolic and diastolic), cholesterol levels (HDL and LDL), body composition metrics (weight, height, BMI), and other relevant features. Each observation in the dataset represents an individual, and the target variable indicates their cardiovascular health status.
head(body_data)
Accurate prediction of cardiovascular health risks is crucial for preventive healthcare strategies and personalized interventions. By developing robust predictive models, we can identify individuals at higher risk of CVD and facilitate early interventions, lifestyle modifications, and targeted healthcare services, ultimately improving health outcomes and reducing the burden of cardiovascular diseases.
In the following sections, we will delve into the data preparation, exploratory data analysis, model building, evaluation, and interpretation stages, aiming to construct predictive models that offer valuable insights into cardiovascular health assessment.
Before proceeding with the analysis, it’s essential to prepare the dataset for modeling. This involves tasks such as exploring its structure, handling missing values, and performing any necessary data transformations.
We’ll explore the dataset to gain insights into its contents and identify any issues that need to be addressed:
##       AGE         GENDER (1=M)       PULSE          SYSTOLIC      DIASTOLIC
##  Min.   :18.00   Min.   :0.00   Min.   : 36.00   Min.   : 88   Min.   : 40.00
##  1st Qu.:31.00   1st Qu.:0.00   1st Qu.: 64.00   1st Qu.:112   1st Qu.: 64.00
##  Median :46.00   Median :1.00   Median : 72.00   Median :121   Median : 70.00
##  Mean   :47.04   Mean   :0.51   Mean   : 71.77   Mean   :123   Mean   : 70.75
##  3rd Qu.:62.00   3rd Qu.:1.00   3rd Qu.: 80.00   3rd Qu.:132   3rd Qu.: 78.00
##  Max.   :80.00   Max.   :1.00   Max.   :104.00   Max.   :186   Max.   :102.00
##       HDL              LDL             WHITE             RED
##  Min.   : 26.00   Min.   : 39.0   Min.   : 2.700   Min.   :3.390
##  1st Qu.: 43.00   1st Qu.: 85.0   1st Qu.: 5.200   1st Qu.:4.197
##  Median : 52.00   Median :113.0   Median : 6.200   Median :4.490
##  Mean   : 53.66   Mean   :113.7   Mean   : 6.542   Mean   :4.538
##  3rd Qu.: 62.00   3rd Qu.:137.2   3rd Qu.: 7.825   3rd Qu.:4.883
##  Max.   :138.00   Max.   :251.0   Max.   :14.300   Max.   :6.340
##      PLATE           WEIGHT           HEIGHT           WAIST
##  Min.   : 75.0   Min.   : 39.00   Min.   :134.5   Min.   : 64.40
##  1st Qu.:198.0   1st Qu.: 67.08   1st Qu.:161.6   1st Qu.: 87.88
##  Median :232.0   Median : 80.50   Median :168.3   Median : 96.95
##  Mean   :239.4   Mean   : 81.66   Mean   :168.0   Mean   : 99.18
##  3rd Qu.:263.5   3rd Qu.: 92.80   3rd Qu.:174.6   3rd Qu.:109.10
##  Max.   :646.0   Max.   :150.40   Max.   :193.3   Max.   :170.50
##     ARM CIRC          BMI
##  Min.   :20.50   Min.   :15.90
##  1st Qu.:29.48   1st Qu.:24.50
##  Median :33.05   Median :28.00
##  Mean   :33.08   Mean   :28.91
##  3rd Qu.:36.33   3rd Qu.:31.98
##  Max.   :46.60   Max.   :59.00
## named numeric(0)
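The code that produced the output above is not shown, so the following is only a minimal sketch of how such an exploration might look in base R. The data frame `toy_data` and its values are hypothetical stand-ins for `body_data`, made up for illustration. An empty result from the last line, printed as `named numeric(0)`, would suggest that no column contains missing values; whether the report used this exact check is an assumption.

```r
# Hypothetical stand-in for body_data; the values are invented for illustration.
toy_data <- data.frame(
  AGE   = c(34, 51, 67, 45),
  PULSE = c(72, NA, 80, 64),
  BMI   = c(22.4, 31.0, 27.5, NA)
)

# Inspect column types and a preview of the values.
str(toy_data)

# Per-column minimum, quartiles, median, mean, and maximum.
summary(toy_data)

# Count missing values per column, then keep only the columns that have any.
missing_counts <- colSums(is.na(toy_data))
missing_counts[missing_counts > 0]
```

Filtering the counts with `missing_counts > 0` keeps the output short on wide datasets, since only the problem columns are listed.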
In this section, we’ll perform necessary transformations on the dataset to ensure it’s appropriately formatted for our analysis. Specifically, we’ll address the following tasks:
Gender Encoding: We’ll convert the numeric values in the “GENDER” column to more interpretable labels. Since the dataset uses 1 to represent male and 0 to represent female, we’ll replace these values with “Male” and “Female,” respectively.
BMI Categorization: We’ll add a new column that groups BMI (Body Mass Index) values into categories based on standard thresholds (e.g., underweight, normal weight, overweight, and obese), allowing us to explore potential relationships between BMI categories and cardiovascular health outcomes.
By performing these transformations, we ensure that the dataset is appropriately prepared for subsequent analysis, allowing us to gain meaningful insights into the factors influencing cardiovascular health outcomes.
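The two transformations described above could be sketched in base R as follows. This is an illustration under assumptions rather than the report's actual code: `toy_data` is a hypothetical data frame, the column names `GENDER` and `BMI_CATEGORY` are chosen for the example (the summary output labels the gender column `GENDER (1=M)`), and the cut points 18.5, 25, and 30 are the commonly used WHO thresholds, which the text does not state explicitly.

```r
# Hypothetical stand-in for body_data; the values are invented for illustration.
toy_data <- data.frame(
  GENDER = c(1, 0, 1, 0, 0),
  BMI    = c(17.2, 22.4, 27.5, 31.0, 25.0)
)

# Gender encoding: recode 0/1 as a factor with readable labels.
toy_data$GENDER <- factor(
  toy_data$GENDER,
  levels = c(0, 1),
  labels = c("Female", "Male")
)

# BMI categorization: bin BMI with assumed WHO-style cut points.
# right = FALSE makes each interval [lower, upper), so a BMI of exactly 25
# falls in "Overweight" rather than "Normal weight".
toy_data$BMI_CATEGORY <- cut(
  toy_data$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal weight", "Overweight", "Obese"),
  right  = FALSE
)

# Cross-tabulate the two new categorical columns.
table(toy_data$GENDER, toy_data$BMI_CATEGORY)
```

Storing both columns as factors keeps the category order fixed, which helps when they are later used in plots or as model predictors.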