Cardiovascular health statistics

Introduction

For this analysis, I selected the Heart Disease Health Indicators Dataset from the CDC’s 2020 annual survey, which includes responses from 319,795 adults in the United States. The dataset contains both categorical and quantitative variables covering many aspects of cardiovascular health. I chose it to examine how unhealthy habits (smoking and alcohol drinking) and chronic conditions (diabetes) relate to the prevalence of heart disease. By analyzing these factors, I aim to better understand how lifestyle choices are associated with heart disease and to identify risk factors that can help inform preventive health strategies.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor) # for cleaning column names

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

# Set the working directory to the folder that contains the data file
setwd("/Users/ayomidealagbada/AYOMIDE'S DATAVISUALITIOM")

Reading the data and cleaning the column names

# Read and clean the data
heart2020 <- read_csv("heart_2020_cleaned.csv") %>%
  clean_names() %>%
  # Convert the categorical variables used in this analysis to factors
  mutate(
    heart_disease = as.factor(heart_disease),
    smoking = as.factor(smoking),
    alcohol_drinking = as.factor(alcohol_drinking),
    stroke = as.factor(stroke),
    race = as.factor(race),
    sex = as.factor(sex),
    age_category = as.factor(age_category),
    diabetic = as.factor(diabetic)
  )

## Rows: 319795 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): HeartDisease, Smoking, AlcoholDrinking, Stroke, DiffWalking, Sex, ...
## dbl  (4): BMI, PhysicalHealth, MentalHealth, SleepTime
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Creating an interactive plot of body mass index (BMI) by heart disease status

# Load necessary library
library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

# Create the interactive plot for BMI and heart disease
bmi_plot <- plot_ly(
  data = heart2020, 
  x = ~bmi, 
  color = ~heart_disease, 
  type = "violin",
  box = list(visible = TRUE), 
  meanline = list(visible = TRUE)
) %>%
  layout(
    title = list(text = "BMI Distribution by Heart Disease Status"),
    xaxis = list(title = "BMI"),
    yaxis = list(title = "Heart Disease")
  )

# Display the plot
bmi_plot

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
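The violin plot compares the BMI distributions visually. A numeric summary of the same comparison can be sketched in base R with `tapply()`. The small data frame below is invented for illustration and only stands in for `heart2020`; on the real data the equivalent call would be `tapply(heart2020$bmi, heart2020$heart_disease, median)`.

```r
# Hypothetical toy data standing in for heart2020; the values are invented
toy <- data.frame(
  bmi           = c(22.1, 27.5, 31.0, 24.3, 35.2, 29.8),
  heart_disease = factor(c("No", "No", "Yes", "No", "Yes", "Yes"))
)

# Median BMI within each heart disease group
med <- tapply(toy$bmi, toy$heart_disease, median)
print(med)
```

The median is used here because BMI distributions tend to be right-skewed; swapping in `mean` gives the group averages instead.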

Examining the relationship between smoking and heart disease

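# Count respondents in each smoking group, split by heart disease status
# (geom_density() needs a continuous x, so geom_bar() is used for this categorical variable)
ggplot(heart2020, aes(x = smoking, fill = heart_disease)) +
  geom_bar(position = "dodge") +
  theme_minimal() +
  labs(title = "Heart Disease by Smoking Status",
       x = "Smoking Status",
       y = "Count")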
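Because the smoking groups differ in size, raw counts can hide how common heart disease is within each group. A minimal base-R sketch of the within-group share uses `prop.table()` on a cross-tabulation; the toy data below are invented and stand in for `heart2020`.

```r
# Hypothetical toy data standing in for heart2020; the values are invented
toy <- data.frame(
  smoking       = factor(c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")),
  heart_disease = factor(c("No", "No", "No", "Yes", "No", "No", "Yes", "Yes"))
)

# Cross-tabulate, then divide each row by its total so every smoking group sums to 1
tab  <- table(smoking = toy$smoking, heart_disease = toy$heart_disease)
prev <- prop.table(tab, margin = 1)
print(prev)

# Share of each smoking group reporting heart disease
prev[, "Yes"]
```

In ggplot2, `geom_bar(position = "fill")` draws the same row proportions as stacked bars of equal height.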

Visualizing the distribution of heart disease across age categories

ggplot(heart2020, aes(x = age_category, fill = heart_disease)) +
  geom_bar(position = "dodge") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Heart Disease by Age Category",
       x = "Age Category",
       y = "Count")

# Create a numeric version of heart disease (0/1)
heart2020$heart_disease_num <- as.numeric(heart2020$heart_disease) - 1
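The `as.numeric(...) - 1` conversion works only because `"No"` is the first (alphabetical) level of `heart_disease`. The short sketch below, on an invented vector, shows what happens if the levels are ordered differently, a comparison-based conversion that does not depend on level order, and why the mean of the resulting 0/1 indicator is the proportion of respondents with heart disease.

```r
hd <- factor(c("No", "Yes", "No", "No", "Yes"))   # invented example values

# The conversion used above: relies on "No" being the first level
hd_num <- as.numeric(hd) - 1                      # 0 1 0 0 1

# With the levels reordered, the same formula flips the coding
hd_reordered <- factor(hd, levels = c("Yes", "No"))
hd_flipped <- as.numeric(hd_reordered) - 1        # 1 0 1 1 0 (wrong coding)

# A conversion that does not depend on level order
hd_num_safe <- as.numeric(hd == "Yes")            # 0 1 0 0 1

# The mean of a 0/1 indicator is the proportion of "Yes" answers
mean(hd_num_safe)                                 # 0.4
```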

Linear regression of the numeric heart disease indicator on unhealthy habits and diabetes

# Fit a linear regression (linear probability) model with the main risk factors
model <- lm(heart_disease_num ~ smoking + alcohol_drinking + diabetic,
            data = heart2020)

# Display model summary
summary(model)

## 
## Call:
## lm(formula = heart_disease_num ~ smoking + alcohol_drinking + 
##     diabetic, data = heart2020)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.2500 -0.1021 -0.0447 -0.0447  1.0142 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      0.0447017  0.0006651  67.207  < 2e-16 ***
## smokingYes                       0.0573516  0.0009906  57.895  < 2e-16 ***
## alcohol_drinkingYes             -0.0365537  0.0019359 -18.882  < 2e-16 ***
## diabeticNo, borderline diabetes  0.0486049  0.0033625  14.455  < 2e-16 ***
## diabeticYes                      0.1479393  0.0014582 101.453  < 2e-16 ***
## diabeticYes (during pregnancy)  -0.0223900  0.0054311  -4.123 3.75e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2734 on 319789 degrees of freedom
## Multiple R-squared:  0.04473,    Adjusted R-squared:  0.04472 
## F-statistic:  2995 on 5 and 319789 DF,  p-value: < 2.2e-16

Explanation

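Because the outcome is coded 0/1, this is a linear probability model: each coefficient is the estimated change in the probability of reporting heart disease relative to the reference group (non-smokers who do not drink and do not have diabetes), whose estimated probability is about 4.5% (the intercept). Smoking is associated with a 5.7 percentage-point higher probability of heart disease, diabetes with a 14.8-point higher probability, and borderline diabetes with a 4.9-point higher probability. Not every factor points the same way: alcohol drinking is associated with a 3.7-point lower probability, and diabetes during pregnancy with a 2.2-point lower probability. All coefficients are statistically significant (p < 0.001), but the R-squared of 0.045 means these three variables explain only about 4.5% of the variation in heart disease, and because the survey is cross-sectional, the results show association rather than cause and effect.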
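In a linear probability model, a predicted probability is simply the intercept plus the coefficients for each non-reference category a person falls into. The sketch below does that arithmetic with the estimates printed in the `summary(model)` output; the three example profiles are chosen for illustration.

```r
# Estimates copied from the summary(model) output above
b_intercept    <- 0.0447017   # reference group: non-smoker, no alcohol, not diabetic
b_smoking_yes  <- 0.0573516
b_diabetic_yes <- 0.1479393

# Predicted probability = intercept + coefficients of the person's categories
p_reference       <- b_intercept
p_smoker          <- b_intercept + b_smoking_yes
p_smoker_diabetic <- b_intercept + b_smoking_yes + b_diabetic_yes

# Convert to percentages
round(100 * c(reference       = p_reference,
              smoker          = p_smoker,
              smoker_diabetic = p_smoker_diabetic), 1)
# reference: 4.5, smoker: 10.2, smoker_diabetic: 25.0
```

With the fitted object itself, `predict(model, newdata = data.frame(smoking = "Yes", alcohol_drinking = "No", diabetic = "Yes"))` should return the same value, since `predict()` rebuilds the factor columns from the levels stored in the model.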

Brief essay

This study underscores the importance of public health initiatives aimed at reducing smoking and managing chronic conditions to mitigate the risks associated with heart disease. Early intervention and lifestyle changes are essential strategies in preventing heart disease, particularly for those with these risk factors. Understanding these relationships, as seen in the data, can help guide more effective preventive measures and health policies.


Challenges and What Could Not Be Shown

Geographic Data:

Incorporating geographic data (e.g., the regional prevalence of cardiovascular disease risk factors) could add another layer of insight, but the dataset does not include location information.

Longitudinal Trends:

A limitation of the dataset is its cross-sectional nature, which prevents examining trends over time. Adding a time dimension could show how BMI or sleep patterns evolve and impact physical health.

Conclusion:

My analysis of the Heart Disease Health Indicators Dataset from the CDC’s 2020 annual survey, which includes responses from over 300,000 U.S. adults, shows that individuals who smoke or have diabetes are more likely to report having heart disease. In the linear probability model, smoking was associated with a 5.7 percentage-point higher probability of heart disease and diabetes with a 14.8-point higher probability, compared with a baseline of about 4.5%. Not every factor behaved as expected: alcohol drinking was associated with a slightly lower probability. Because the survey is cross-sectional and these variables explain only about 4.5% of the variation in heart disease, the results describe associations rather than causes. They are nonetheless consistent with smoking and diabetes playing an important role in heart disease risk, and they support prevention efforts that target these factors.