My topic, “Heart Health in America: Investigating Elderly Diseases Among Different Demographic Locations,” takes a longitudinal look at data from 2016 to 2021. The data originates from the National Cardiovascular Disease Surveillance System, a crucial component of the Centers for Disease Control and Prevention (CDC), which leverages data from the Centers for Medicare & Medicaid Services (CMS). The dataset was compiled from Medicare and Medicaid claims data, including inpatient and outpatient claims and master beneficiary summary files. The CDC’s Division for Heart Disease and Stroke Prevention (DHDSP) then computed indicators from this data source to ensure comprehensive and accurate data (Centers for Disease Control and Prevention, 2022).
My filtered dataset includes quantitative variables such as the year and data value and categorical variables such as disease type (Heart Disease, Heart Attack, Stroke), categories (High Healthcare Expenditures, Growth Rate of Elderly Population, High Cardiovascular Disease), locations (multiple states), and gender (male and female).
To conduct my analysis, I modified the data by removing unwanted columns with missing or irrelevant information, such as RowID, LocationAbbr, and various PriorityArea columns that had no value for my analysis. I renamed the YearStart column for better readability, shortened the names of major disease categories, and focused on Heart Disease, Heart Attack, and Stroke. I also filtered the data to include only male and female gender categorization and separated the GeoLocation column into latitude and longitude. Additionally, I categorized states based on their characteristics related to elderly populations and cardiovascular health. The dataset also includes information for individuals ages 75 and older.
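The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the code used for the project: the toy values, the `raw` and `cleaned` names, and the assumption that `GeoLocation` is stored as `POINT (longitude latitude)` text are all hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy rows standing in for the raw CDC export (all values are made up)
raw <- tibble(
  RowId        = c(NA, NA, NA),
  YearStart    = c(2016, 2017, 2018),
  LocationAbbr = c("ME", "WV", "FL"),
  LocationDesc = c("Maine", "West Virginia", "Florida"),
  Topic        = c("Stroke", "Diseases of the Heart", "Stroke"),
  Gender       = c("Male", "Female", "Overall"),
  Data_Value   = c(803.1, 4889.6, 950.2),
  # Assumed text format: "POINT (longitude latitude)"
  GeoLocation  = c("POINT (-69.0 45.3)", "POINT (-80.7 38.7)", "POINT (-81.9 28.9)")
)

cleaned <- raw |>
  select(-RowId, -LocationAbbr) |>                # drop columns not needed
  rename(Year = YearStart) |>                     # friendlier column name
  filter(Gender %in% c("Male", "Female")) |>      # keep male/female rows only
  mutate(
    # Shorten the long disease labels
    Disease = case_when(
      startsWith(Topic, "Diseases of the Heart") ~ "Heart Disease",
      startsWith(Topic, "Acute Myocardial")      ~ "Heart Attack",
      TRUE                                       ~ Topic
    ),
    # Strip the "POINT (" prefix and ")" suffix before splitting
    GeoLocation = gsub("POINT \\(|\\)", "", GeoLocation)
  ) |>
  separate(GeoLocation, into = c("Longitude", "Latitude"),
           sep = " ", convert = TRUE)

cleaned
```

With `convert = TRUE`, `separate()` turns the two new columns into numbers, which is what a mapping package such as leaflet expects.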
This topic is significant to me due to a personal experience with a severe health episode in my family, which highlighted the importance of understanding and addressing cardiovascular diseases in the elderly. Witnessing an elderly person having a significant health episode in my living room on a sunny Sunday morning in 2024 changed my life forever and led me to this topic. Also, in my line of work as an investigative analyst, I am drawn to the vulnerability of the elderly, many of whom fall victim to fraud and scams every year. Thankfully, federal and local governments have instituted measures and provided resources to help thwart these threats and punish offenders. My analysis and recommendations will hopefully contribute to ongoing efforts to safeguard many in our community.
Background Research
The United States population aged 65 and older has grown significantly, increasing five times faster than the total population over the past century (Census, 2020). During the COVID-19 pandemic, this demographic reached 55.8 million, or 16.8% of the total population (Census, 2020). Although older Americans were negatively affected by the COVID-19 Pandemic, they are resilient and strong.
As the American population ages, the focus on health insurance and related care for older adults intensifies. Heart disease, heart attack, and stroke are leading causes of death among the elderly (CDC, 2022; NIH, 2022; AHA, 2021). According to the CDC, heart disease is the leading cause of death for both men and women, accounting for 25% of deaths (CDC, 2022). Understanding the rate at which older adults are affected is key. The National Institutes of Health reports that approximately 805,000 Americans have a heart attack each year, with a significant portion occurring in individuals aged 65 and older (NIH, 2022). In the most staggering statistic in this essay, the American Heart Association (AHA) states that nearly 75% of all strokes occur in people over the age of 65 (AHA, 2021). Research also shows that a person in the United States experiences a stroke every 40 seconds (AHA, 2021). Given this information, I would like to explore the factors that contribute to these numbers in my analysis.
Load libraries
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.1
Warning: package 'dplyr' was built under R version 4.4.1
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(ggalluvial)
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.1
library(RColorBrewer)
library(ggdark)
Warning: package 'ggdark' was built under R version 4.4.1
Set Working Directory & Upload Data
setwd("C:/Users/naomi/OneDrive/Desktop/Desktop of 11-08-2022/Community College Classes/DATA 110/Submitted Assignments/Project # 2")
elderly <- read_csv("Elderly_Heart_Disease.csv")
Rows: 33454 Columns: 30
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (19): LocationAbbr, LocationDesc, DataSource, PriorityArea1, PriorityAre...
dbl (6): YearStart, Data_Value, Data_Value_Alt, Low_Confidence_Limit, High_...
lgl (5): RowId, PriorityArea2, PriorityArea4, Data_Value_Footnote_Symbol, D...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 14
Year LocationDesc Topic Question Data_Value_Unit Data_Value
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Maine Stroke Cerebro… Rate per 100,0… 803.
2 2020 Maine Diseases of the Heart… Disease… Rate per 100,0… 3275.
3 2017 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 4890.
4 2017 Florida Diseases of the Heart… Disease… Rate per 100,0… 3409.
5 2020 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 3565.
6 2019 Florida Diseases of the Heart… Prevale… Percent (%) 6.08
# ℹ 8 more variables: Break_Out_Category <chr>, Gender <chr>,
# Data_Value_TypeID <chr>, LocationId <dbl>, Longitude <dbl>, Latitude <dbl>,
# Disease <chr>, Category <chr>
The overall p-value (< 2.2e-16) means that the model is highly significant. It suggests that at least one of the predictors is related to the Data Value.
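The linear regression equation for my model is

$$\text{Data\_Value} = \beta_0 + \beta_1 \times \text{DiseaseHeartDisease} + \beta_2 \times \text{DiseaseStroke} + \beta_3 \times \text{GenderMale}$$

$$\text{Data\_Value} = 228.16 + 1361.50 \times \text{DiseaseHeartDisease} + 167.13 \times \text{DiseaseStroke} + 200.88 \times \text{GenderMale}$$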
All three coefficients are positive, so each predictor raises the expected Data Value relative to the baseline group (Heart Attack, Female). Heart Disease is the strongest predictor because it has the largest coefficient (1361.50) and the smallest p-value (< 2e-16). Conversely, the weakest predictor is Stroke: it has the smallest coefficient (167.13) and, more importantly, the largest p-value (0.015980), although it is still significant at the 0.05 level. Male falls between the two, with a coefficient of 200.88 and a p-value of 0.000396, and it is statistically significant.
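To make the fitted equation concrete, the sketch below plugs the reported coefficients into it for a few groups. The coefficients are copied from the `summary(model)` output; the `predict_value()` helper is a hypothetical name introduced only for illustration. Because Disease and Gender are dummy-coded, the baseline group (Heart Attack, Female) receives the intercept alone.

```r
# Coefficients copied from the summary(model) output
b0       <- 228.16   # intercept: Heart Attack, Female
b_hd     <- 1361.50  # DiseaseHeart Disease
b_stroke <- 167.13   # DiseaseStroke
b_male   <- 200.88   # GenderMale

# Hypothetical helper: each dummy variable is 1 when the condition holds, else 0
predict_value <- function(disease, gender) {
  b0 +
    b_hd     * (disease == "Heart Disease") +
    b_stroke * (disease == "Stroke") +
    b_male   * (gender == "Male")
}

baseline   <- predict_value("Heart Attack", "Female")
hd_male    <- predict_value("Heart Disease", "Male")
stroke_fem <- predict_value("Stroke", "Female")

c(baseline = baseline, hd_male = hd_male, stroke_fem = stroke_fem)
```

For example, the expected value for men with heart disease is 228.16 + 1361.50 + 200.88 = 1790.54.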
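$$R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}$$

where $SS_{\text{residual}}$ is the residual sum of squares and $SS_{\text{total}}$ is the total sum of squares.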
The multiple R-squared value of 0.2474 means that approximately 24.74% of the variability in the Data Value can be explained by the model. The adjusted R-squared of 0.2458 corrects that figure for the number of predictors in the model. In essence, heart disease, stroke, and gender are all significant predictors of the data value. Heart disease has the largest positive effect on the data value, followed by gender (specifically male) and then stroke.
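As a rough check on these figures, the sketch below recomputes the adjusted R-squared from the reported values using the standard adjustment formula, then verifies the R-squared definition on R's built-in `mtcars` data. The sample size of 1,440 is inferred from the 1,436 residual degrees of freedom plus four estimated coefficients; the `mtcars` model is only a stand-in, not part of this analysis.

```r
# Reported values from summary(model)
r2 <- 0.2474
n  <- 1440  # 1436 residual df + 4 estimated coefficients
p  <- 3     # predictors, excluding the intercept

# Standard adjusted R-squared formula
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)

# The same R-squared definition, checked on a built-in dataset
fit       <- lm(mpg ~ wt, data = mtcars)
ss_res    <- sum(residuals(fit)^2)                     # residual sum of squares
ss_tot    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)    # total sum of squares
r2_manual <- 1 - ss_res / ss_tot

all.equal(r2_manual, summary(fit)$r.squared)
```

The recomputed adjusted value rounds to 0.2458, which agrees with the summary output.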
summary(model)
Call:
lm(formula = Data_Value ~ Disease + Gender, data = cleaned_data)
Residuals:
Min 1Q Median 3Q Max
-1784.5 -464.7 -106.3 524.1 3099.1
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 228.16 56.57 4.033 5.79e-05 ***
DiseaseHeart Disease 1361.50 69.29 19.650 < 2e-16 ***
DiseaseStroke 167.13 69.29 2.412 0.015980 *
GenderMale 200.88 56.57 3.551 0.000396 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1073 on 1436 degrees of freedom
Multiple R-squared: 0.2474, Adjusted R-squared: 0.2458
F-statistic: 157.3 on 3 and 1436 DF, p-value: < 2.2e-16
plot(model, which =1, main ="Residuals vs Fitted")
I am not versed in this area and may have to take a class to further my understanding. Here is my attempt based on my research. The Residuals vs. Fitted graph helps validate the assumptions of a linear regression model. The points are clustered at fitted values of about 500 and below. The red line dips below zero as it moves away from the lowest fitted values. However, on the right-hand side of the graph, past the 1500 mark, the points stay centered on the zero line.
plot(model, which =2, main ="Normal Q-Q")
Similarly, a Normal Q-Q plot is used to determine whether the residuals of a regression model (or any dataset) follow a normal distribution. In this graph, points that lie close to the reference line indicate normality. The further the points stray from the line, the more the residuals depart from a normal distribution.
plot(model, which =3, main ="Scale-Location")
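In my view, the Scale-Location plot is similar to the Residuals vs. Fitted graph. It is used to check whether the spread of the residuals stays constant across the fitted values.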
plot(model, which =5, main ="Residuals vs Leverage")
In the Residuals vs. Leverage plot, Heart Disease remains higher than the other two disease categories.
custom_theme <- theme(
  panel.background = element_rect(fill = "lightblue"),
  panel.grid.major = element_line(color = "white", linewidth = 0.5),
  panel.grid.minor = element_line(color = "white", linewidth = 0.2),
  axis.text = element_text(color = "darkblue"),
  axis.title = element_text(color = "darkblue", face = "bold"),
  plot.title = element_text(hjust = 0.5, color = "darkblue", face = "bold"),
  legend.position = "right",
  legend.title = element_text(face = "bold", color = "darkblue"),
  legend.background = element_rect(fill = "lightblue")
)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
print(line_graph)
The time series measures three categories: heart attack, heart disease, and stroke among men and women from 2016 through 2021. The line graph shows that men consistently have higher rates of heart disease compared to women. Notably, there are periods of reductions for both men and women.
scatter_plot_1 <- ggplot(cleaned_data, aes(x = Year, y = Data_Value, color = Category)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("darkorange", "cyan", "magenta", "limegreen")) +
  labs(
    title = "Scatter Plot: Year vs. Data Value (High Elderly Populations)",
    x = "Year",
    y = "Prevalence",
    color = "Category"
  ) +
  facet_wrap(~Category) +
  custom_theme
print(scatter_plot_1)
scatter_plot_2 <- ggplot(cleaned_data, aes(x = Year, y = Data_Value, color = Category)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("darkorange", "cyan", "magenta", "limegreen")) +
  labs(
    title = "Scatter Plot: Year vs. Data Value (High Health Care Exp)",
    x = "Year",
    y = "Prevalence",
    color = "Category"
  ) +
  facet_wrap(~Category) +
  custom_theme
print(scatter_plot_2)
scatter_plot_3 <- ggplot(cleaned_data, aes(x = Year, y = Data_Value, color = Category)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("darkorange", "cyan", "magenta", "limegreen")) +
  labs(
    title = "Scatter Plot: Year vs. Data Value (Growth Rate of Elderly Pop)",
    x = "Year",
    y = "Prevalence",
    color = "Category"
  ) +
  facet_wrap(~Category) +
  custom_theme
print(scatter_plot_3)
The scatter plot compares states categorized by “Growth Rate of Elderly Population,” “High Cardiovascular Disease,” “High Elderly Populations,” and “High Health Care Expenditure.” States categorized as High Cardiovascular account for the highest prevalence, followed by High Elderly. Based on these trends, states with the fastest-growing elderly populations can learn from states that already have high elderly populations. There are noticeable declines in 2020 and 2021 for all categories. One reason may be the COVID-19 pandemic. It will be interesting to see what the 2022 and 2023 data show.
`summarise()` has grouped output by 'Year', 'Disease'. You can override using
the `.groups` argument.
heatmap_plot <- ggplot(heatmap_data, aes(x = Year, y = Disease, fill = mean_value)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  facet_wrap(~ Gender) +
  labs(
    title = "Heatmap Analysis of Heart Disease & Gender Prevalence",
    x = "Year",
    y = "Disease Type",
    fill = "Mean Data Value"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right",
    legend.title = element_text(face = "bold")
  )
print(heatmap_plot)
The heatmap visualizes the mean data value by year, gender, and disease type, showing that heart disease consistently dominates across all years, peaking between 1,500 and 2,000. The chart also consistently shows darker colors for males, indicating higher rates of heart disease among men.
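The summarizing step behind a heatmap like this one is not shown in the report. A minimal sketch of how such a table of means could be built is given below, assuming a `group_by()` and `summarise()` pipeline over Year, Disease, and Gender. The toy values and the `heatmap_sketch` name are hypothetical.

```r
library(dplyr)

# Toy rows (made-up values) with the same column names as the cleaned data
toy <- tibble(
  Year       = c(2016, 2016, 2016, 2016, 2017, 2017),
  Disease    = c("Stroke", "Stroke", "Heart Disease", "Heart Disease", "Stroke", "Stroke"),
  Gender     = c("Male", "Male", "Female", "Female", "Male", "Female"),
  Data_Value = c(800, 1000, 3000, 3400, 900, 700)
)

# One mean per Year x Disease x Gender cell, which is what geom_tile() fills
heatmap_sketch <- toy |>
  group_by(Year, Disease, Gender) |>
  summarise(mean_value = mean(Data_Value), .groups = "drop")

heatmap_sketch
```

Setting `.groups = "drop"` returns an ungrouped table and silences the "has grouped output by" message that `summarise()` otherwise prints.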
The box plot compares gender and disease type, reaffirming that men have higher values in all categories. For heart disease, men have significantly higher values, almost 1,000 more than women. This is an opportunity for health campaigns to target their material toward men, perhaps by encouraging them to seek healthcare treatment early as a preventative measure. Again, it is important to reach everyone in an equitable manner.
`summarise()` has grouped output by 'Category', 'Disease'. You can override
using the `.groups` argument.
print(max_data)
# A tibble: 12 × 4
Category Disease LocationDesc max_value
<chr> <chr> <chr> <dbl>
1 Growth Rate of Elderly Pop Heart Attack Texas 870.
2 Growth Rate of Elderly Pop Heart Disease Georgia 4205.
3 Growth Rate of Elderly Pop Stroke Texas 1168.
4 High Cardiovascular Disease Heart Attack Arkansas 1049.
5 High Cardiovascular Disease Heart Disease Arkansas 4733.
6 High Cardiovascular Disease Stroke Louisiana 1426
7 High Elderly Populations Heart Attack West Virginia 1341.
8 High Elderly Populations Heart Disease West Virginia 4890.
9 High Elderly Populations Stroke West Virginia 1236.
10 High Health Care Exp Heart Attack Massachusetts 892.
11 High Health Care Exp Heart Disease Massachusetts 4469.
12 High Health Care Exp Stroke Delaware 1367
This allows me to review the maximum value for each category and see which states are most impacted. The summary table highlights the top values for each category by state. Texas leads in both heart attacks and strokes among states with a high growth rate of the elderly population, while Georgia leads that category in heart disease. Arkansas leads in heart attack and heart disease among states with high cardiovascular disease rates, and Louisiana leads that category in stroke. West Virginia leads in all three disease categories among states with high elderly populations. Massachusetts leads in heart attack and heart disease among states with high health care expenditures, while Delaware leads in stroke for this category.
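The code that produced the summary table is not shown. One way such a "top state per category and disease" table could be built is sketched below: take the maximum per state, then keep the highest state within each Category and Disease pair. The toy values, the placeholder state names, and the `max_sketch` name are hypothetical.

```r
library(dplyr)

# Toy rows (made-up values and placeholder state names)
toy <- tibble(
  Category     = c("A", "A", "A", "B", "B"),
  Disease      = c("Stroke", "Stroke", "Stroke", "Stroke", "Stroke"),
  LocationDesc = c("State 1", "State 1", "State 2", "State 3", "State 4"),
  Data_Value   = c(900, 1100, 1000, 700, 1200)
)

max_sketch <- toy |>
  # Highest value recorded for each state within a Category/Disease pair
  group_by(Category, Disease, LocationDesc) |>
  summarise(max_value = max(Data_Value), .groups = "drop") |>
  # Keep only the top state in each Category/Disease pair
  group_by(Category, Disease) |>
  slice_max(max_value, n = 1, with_ties = FALSE) |>
  ungroup()

max_sketch
```

Here `with_ties = FALSE` guarantees exactly one row per pair even when two states share the same maximum.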
cleaned_data <- cleaned_data %>%
  mutate(Category = case_when(
    Category == "High Elderly Populations" ~ "High Elderly",
    Category == "High Health Care Exp" ~ "High Health Care",
    Category == "Growth Rate of Elderly Pop" ~ "Growth Rate Elderly",
    Category == "High Cardiovascular Disease" ~ "High Cardiovascular",
    TRUE ~ Category
  ))
# A tibble: 12 × 4
Category Disease LocationDesc max_value
<chr> <chr> <chr> <dbl>
1 Growth Rate Elderly Heart Attack Texas 870
2 Growth Rate Elderly Heart Disease Georgia 4205
3 Growth Rate Elderly Stroke Texas 1168
4 High Cardiovascular Heart Attack Arkansas 1049
5 High Cardiovascular Heart Disease Arkansas 4733
6 High Cardiovascular Stroke Louisiana 1426
7 High Elderly Heart Attack West Virginia 1341
8 High Elderly Heart Disease West Virginia 4890
9 High Elderly Stroke West Virginia 1236
10 High Health Care Heart Attack Massachusetts 892
11 High Health Care Heart Disease Massachusetts 4469
12 High Health Care Stroke Delaware 1367
Create the theme for upcoming visualization
custom_theme <- theme(
  panel.background = element_blank(),
  # linewidth replaces the size argument deprecated in ggplot2 3.4.0
  panel.grid.major = element_line(color = "grey", linewidth = 0.2),
  panel.grid.minor = element_blank(),
  axis.text = element_text(color = "black"),
  axis.title = element_text(color = "black", face = "bold"),
  plot.title = element_text(hjust = 0.5, color = "black", face = "bold"),
  legend.position = "right",
  legend.title = element_text(face = "bold", color = "black"),
  legend.background = element_blank()
)
New visualization. Lollipop plot.
lollipop_plot <- ggplot(max_data, aes(x = Category, y = max_value, color = Disease)) +
  # Stems of the lollipops; linewidth replaces the deprecated size argument for lines
  geom_segment(aes(x = Category, xend = Category, y = 0, yend = max_value), linewidth = 1) +
  geom_point(size = 4) +
  scale_color_manual(values = c("darkorange", "cyan", "magenta")) +
  labs(
    title = "Leading States in 2 Demographic Categories",
    x = "Category",
    y = "Prevalence",
    color = "Disease"
  ) +
  geom_text(aes(label = paste(LocationDesc, "\n", max_value)),
            position = position_stack(vjust = 1.1), color = "black", size = 2.5) +
  custom_theme
print(lollipop_plot)
The lollipop plot visualizes the leading states for each disease within the High Cardiovascular and High Elderly categories.
`summarise()` has grouped output by 'LocationDesc'. You can override using the
`.groups` argument.
Create an alluvial plot
alluvial_plot <- ggplot(alluvial_data,
                        aes(axis1 = Category, axis2 = Disease, axis3 = LocationDesc, y = max_value)) +
  geom_alluvium(aes(fill = Disease), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey", color = "black") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), size = 3) +
  scale_x_discrete(limits = c("Category", "Disease", "LocationDesc"), expand = c(0.15, 0.05)) +
  scale_fill_manual(values = c("#FF5733", "#33FF57", "#3357FF")) +
  labs(
    title = "Flow of Heart Diseases Across Categories and Locations",
    x = "Attributes",
    y = "Prevalence",
    fill = "Disease"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "right",
    legend.title = element_text(face = "bold"),
    axis.title.x = element_blank()
  )
print(alluvial_plot)
The alluvial plot, entitled “Flow of Heart Diseases Across Categories and Locations,” provides a glimpse of pathways for heart attack, heart disease, and stroke. In keeping with earlier visualizations, heart diseases dominate each category, as depicted by the light green colors. Arkansas, Georgia, Massachusetts, and West Virginia have high heart disease rates. Red signifies the heart attack category. The red color weaves throughout the graph in a scarce pattern, though it focuses on West Virginia, Texas, and Massachusetts. It simultaneously touches on each category: Growth Rate Elderly, High Cardiovascular, High Elderly, and High Health Care to the left. However, the numbers for heart attacks are lower when compared to strokes. Delaware, Louisiana, Texas, and West Virginia’s elderly populations have been affected by strokes.
cleaned_data <- cleaned_data %>%
  mutate(Category = case_when(
    Category == "High Elderly Populations" ~ "High Elderly",
    Category == "High Health Care Exp" ~ "High Health Care",
    Category == "Growth Rate of Elderly Pop" ~ "Growth Rate Elderly",
    Category == "High Cardiovascular Disease" ~ "High Cardiovascular",
    TRUE ~ Category
  ))
# A tibble: 1,440 × 14
Year LocationDesc Topic Question Data_Value_Unit Data_Value
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Maine Stroke Cerebro… Rate per 100,0… 803.
2 2020 Maine Diseases of the Hear… Disease… Rate per 100,0… 3275.
3 2017 West Virginia Diseases of the Hear… Disease… Rate per 100,0… 4890.
4 2017 Florida Diseases of the Hear… Disease… Rate per 100,0… 3409.
5 2020 West Virginia Diseases of the Hear… Disease… Rate per 100,0… 3565.
6 2019 Florida Diseases of the Hear… Prevale… Percent (%) 6.08
7 2018 Florida Diseases of the Hear… Prevale… Percent (%) 6.08
8 2018 Montana Diseases of the Hear… Disease… Rate per 100,0… 3276.
9 2016 Vermont Diseases of the Hear… Disease… Rate per 100,0… 3839.
10 2016 Maine Acute Myocardial Inf… Acute m… Rate per 100,0… 1086.
# ℹ 1,430 more rows
# ℹ 8 more variables: Break_Out_Category <chr>, Gender <chr>,
# Data_Value_TypeID <chr>, LocationId <dbl>, Longitude <dbl>, Latitude <dbl>,
# Disease <chr>, Category <chr>
# A tibble: 6 × 14
Year LocationDesc Topic Question Data_Value_Unit Data_Value
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Maine Stroke Cerebro… Rate per 100,0… 803.
2 2020 Maine Diseases of the Heart… Disease… Rate per 100,0… 3275.
3 2017 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 4890.
4 2017 Florida Diseases of the Heart… Disease… Rate per 100,0… 3409.
5 2020 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 3565.
6 2019 Florida Diseases of the Heart… Prevale… Percent (%) 6.08
# ℹ 8 more variables: Break_Out_Category <chr>, Gender <chr>,
# Data_Value_TypeID <chr>, LocationId <dbl>, Longitude <dbl>, Latitude <dbl>,
# Disease <chr>, Category <chr>
summary(cleaned_data$Data_Value)
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.083 6.083 137.491 838.145 1013.975 4889.600
# A tibble: 6 × 15
Year LocationDesc Topic Question Data_Value_Unit Data_Value
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Maine Stroke Cerebro… Rate per 100,0… 803.
2 2020 Maine Diseases of the Heart… Disease… Rate per 100,0… 3275.
3 2017 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 4890.
4 2017 Florida Diseases of the Heart… Disease… Rate per 100,0… 3409.
5 2020 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 3565.
6 2019 Florida Diseases of the Heart… Prevale… Percent (%) 6.08
# ℹ 9 more variables: Break_Out_Category <chr>, Gender <chr>,
# Data_Value_TypeID <chr>, LocationId <dbl>, Longitude <dbl>, Latitude <dbl>,
# Disease <chr>, Category <chr>, Normalized_Value <dbl>
Create Map Based on Normalized Data
leaflet_map <- leaflet(normalized_data) |>
  addTiles() |>
  addCircleMarkers(
    ~Longitude, ~Latitude,
    color = ~case_when(
      Disease == "Heart Disease" ~ "#33FF57", # Bright green for Heart Disease
      Disease == "Heart Attack" ~ "#FF5733"   # Bright red for Heart Attack
    ),
    popup = ~paste(
      "<strong>Category:</strong>", Category, "<br>",
      "<strong>Disease:</strong>", Disease, "<br>",
      "<strong>Data Value (%):</strong>", round(Data_Value, 2), "<br>",
      "<strong>Normalized Value:</strong>", round(Normalized_Value, 2), "<br>",
      "<strong>Latitude:</strong>", Latitude, "<br>",
      "<strong>Longitude:</strong>", Longitude
    ),
    radius = ~Normalized_Value * 10, # Scale the radius for better visualization
    stroke = FALSE,
    fillOpacity = 0.8
  ) |>
  addLegend(
    "bottomright",
    colors = c("#33FF57", "#FF5733"),
    labels = c("Heart Disease", "Heart Attack"),
    title = "Disease"
  )
leaflet_map
Cardiovascular disease is a widespread and serious health concern. The dataset used in this analysis includes a wide range of values, from 100 to over 4000, which initially limited the data captured in the interactive map to stroke cases. The interactive map, upon clicking, categorizes the information based on the filtered categories provided: states with High Elderly Populations, High Health Care Expenses, Growth Rate of Elderly Population, and High Cardiovascular Disease rates. The ten states represented in this analysis are Alabama, Arkansas, Florida, Louisiana, Maine, Mississippi, Montana, Oklahoma, Vermont, and West Virginia.
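The report does not show how the `Normalized_Value` column was computed. One common way to bring a wide range of values onto a comparable scale is min-max scaling, sketched below as an assumption rather than as the method actually used. The `min_max()` helper is a hypothetical name; the input values are the minimum, median, third quartile, and maximum reported by `summary(cleaned_data$Data_Value)`.

```r
# Hypothetical helper: rescale a numeric vector to the 0-1 range
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(0, length(x)))  # all values equal: avoid dividing by zero
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Min, median, 3rd quartile, and max from summary(cleaned_data$Data_Value)
values     <- c(6.083, 137.491, 1013.975, 4889.6)
normalized <- min_max(values)

round(normalized, 3)
```

After scaling, the smallest value maps to 0 and the largest to 1, so a multiplier such as the `* 10` used for the marker radius gives circles of bounded size.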
# A tibble: 6 × 15
Year LocationDesc Topic Question Data_Value_Unit Data_Value
<dbl> <chr> <chr> <chr> <chr> <dbl>
1 2021 Maine Stroke Cerebro… Rate per 100,0… 803.
2 2020 Maine Diseases of the Heart… Disease… Rate per 100,0… 3275.
3 2017 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 4890.
4 2017 Florida Diseases of the Heart… Disease… Rate per 100,0… 3409.
5 2020 West Virginia Diseases of the Heart… Disease… Rate per 100,0… 3565.
6 2019 Florida Diseases of the Heart… Prevale… Percent (%) 6.08
# ℹ 9 more variables: Break_Out_Category <chr>, Gender <chr>,
# Data_Value_TypeID <chr>, LocationId <dbl>, Longitude <dbl>, Latitude <dbl>,
# Disease <chr>, Category <chr>, Normalized_Value <dbl>
Utilize New Palette for Variation
color_palette <-brewer.pal(n =3, name ="Set1")
p <- ggplot(filtered_data, aes(x = LocationDesc, y = Data_Value, fill = Disease)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = color_palette) +
  labs(
    title = "A Deeper Look at Cardiovascular Diseases in Lower Income States/Rural Communities",
    x = "Lower Income States/Rural Communities",
    y = "Prevalence",
    fill = "Disease"
  ) +
  theme_minimal() +
  ggdark::dark_theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "bottom",
    legend.title = element_text(face = "bold", size = 10),
    legend.text = element_text(size = 10)
  )
Inverted geom defaults of fill and color/colour.
To change them back, use invert_geom_defaults().
This visualization examines states characterized by lower income levels and large rural communities. The disease breakdown reveals consistent trends and patterns throughout the analysis. Heart disease overwhelmingly leads as the primary cause of illness across all categories and states analyzed. This underscores the need for awareness and educational campaigns that promote healthy behaviors, such as better food choices and regular exercise.
Healthy foods need to be normalized. Unfortunately, companies often prioritize cheaper, mass-produced food to maximize profits, compromising quality. For individuals with limited resources and budgets, the increasing prices of healthy food options create a significant barrier due to competing financial demands. Despite this challenge, heart disease remains prevalent even in states with higher health care expenditures or greater earning potential.
It is crucial to consider both the financial costs and the psychological and emotional impacts of heart disease. These ramifications deprive society of healthy aging adults who, due to intentional or culturally influenced behavioral choices, often face a wide array of illnesses. Addressing these issues requires a multifaceted approach that includes improving access to healthy foods and encouraging lifestyle changes across all demographics.
Interesting Findings and Future Analysis
One surprising finding in the analysis is the pervasive impact of heart disease across all categories and states analyzed. Even in states with higher health care expenditures and potentially greater access to medical resources, heart disease remains the leading cause of illness. This suggests that factors beyond just access to healthcare, such as lifestyle choices and socioeconomic conditions, play a significant role in the prevalence of heart disease.
Another notable surprise is the significant gender disparity, with men consistently exhibiting higher rates of cardiovascular diseases compared to women. I would like to explore if this is attributed to biological differences or behavioral and lifestyle factors.
For future research, I would like to investigate access to healthy foods in different regions, including lower income communities (inner city vs. rural). I am also interested in understanding the mental health burden on patients and their families, as well as the benefits of mental health support (including diet and exercise) in averting or reducing heart disease.
Sources
Centers for Disease Control and Prevention. (2022). National Center for Health Statistics. Retrieved from https://www.cdc.gov/nchs/index.htm
National Institutes of Health. (2022). NIH Fact Sheets - Heart Attack. Retrieved from https://report.nih.gov/nihfactsheets/viewfactsheet.aspx?csid=116
American Heart Association. (2021). Heart Disease and Stroke Statistics. Retrieved from https://www.heart.org/en/about-us/heart-and-stroke-association-statistics
United States Census Bureau. (2020). The Aging Population in the United States. Retrieved from https://www.census.gov/topics/population/age-and-sex.html
Health Resources and Services Administration. (2020). Rural Health. Retrieved from https://www.hrsa.gov/rural-health