A. Introduction

For this project, I will explore the question: What are the top three leading causes of death in the United States? Answering it will help me understand overall patterns in causes of death in the United States. The dataset used for this analysis is titled “NCHS – Leading Causes of Death, United States” and is available through Data.gov (https://catalog.data.gov/dataset/nchs-leading-causes-of-death-united-states). It covers 1999 through 2017 and records the number of deaths and age-adjusted death rates for the ten leading causes of death, both nationally and for each U.S. state.

B. Data Analysis

I will perform a descriptive analysis of the dataset “NCHS - Leading Causes of Death, United States.” The goal is to identify the top three causes of death in the United States. First, I will import the dataset and explore it with str() and summary() to see which columns are relevant to my question. Next, I will filter the data to the national (United States) rows and use select() to keep only the relevant columns: Year, Cause Name, Deaths, and Age-adjusted Death Rate. This keeps the focus on the information needed to answer my question. After that, I will use filtering and sorting to narrow the data to a single year, 2017, and identify the three causes with the highest number of deaths. Finally, I will create a bar plot showing the number of deaths for each of these causes, illustrating which were most prevalent in 2017.

Exploratory data analysis (EDA)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("~/Downloads")
leading_causeof_death <- read_csv("NCHS_-_Leading_Causes_of_Death__United_States.csv")
## Rows: 10868 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): 113 Cause Name, Cause Name, State
## dbl (3): Year, Deaths, Age-adjusted Death Rate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(leading_causeof_death)
## spc_tbl_ [10,868 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Year                   : num [1:10868] 2017 2017 2017 2017 2017 ...
##  $ 113 Cause Name         : chr [1:10868] "Accidents (unintentional injuries) (V01-X59,Y85-Y86)" "Accidents (unintentional injuries) (V01-X59,Y85-Y86)" "Accidents (unintentional injuries) (V01-X59,Y85-Y86)" "Accidents (unintentional injuries) (V01-X59,Y85-Y86)" ...
##  $ Cause Name             : chr [1:10868] "Unintentional injuries" "Unintentional injuries" "Unintentional injuries" "Unintentional injuries" ...
##  $ State                  : chr [1:10868] "United States" "Alabama" "Alaska" "Arizona" ...
##  $ Deaths                 : num [1:10868] 169936 2703 436 4184 1625 ...
##  $ Age-adjusted Death Rate: num [1:10868] 49.4 53.8 63.7 56.2 51.8 33.2 53.6 53.2 61.9 61 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Year = col_double(),
##   ..   `113 Cause Name` = col_character(),
##   ..   `Cause Name` = col_character(),
##   ..   State = col_character(),
##   ..   Deaths = col_double(),
##   ..   `Age-adjusted Death Rate` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(leading_causeof_death)
##       Year      113 Cause Name      Cause Name           State          
##  Min.   :1999   Length:10868       Length:10868       Length:10868      
##  1st Qu.:2003   Class :character   Class :character   Class :character  
##  Median :2008   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2008                                                           
##  3rd Qu.:2013                                                           
##  Max.   :2017                                                           
##      Deaths        Age-adjusted Death Rate
##  Min.   :     21   Min.   :   2.6         
##  1st Qu.:    612   1st Qu.:  19.2         
##  Median :   1718   Median :  35.9         
##  Mean   :  15460   Mean   : 127.6         
##  3rd Qu.:   5756   3rd Qu.: 151.7         
##  Max.   :2813503   Max.   :1087.3

Filter to the national (United States) rows

leading_causeof_death <- leading_causeof_death %>%
  filter(State == "United States")

Finding the relevant columns

leading_causeof_death |>
  select(Year,
         `Cause Name`,
         Deaths,
         `Age-adjusted Death Rate`)
## # A tibble: 209 × 4
##     Year `Cause Name`             Deaths `Age-adjusted Death Rate`
##    <dbl> <chr>                     <dbl>                     <dbl>
##  1  2017 Unintentional injuries   169936                      49.4
##  2  2017 All causes              2813503                     732. 
##  3  2017 Alzheimer's disease      121404                      31  
##  4  2017 Stroke                   146383                      37.6
##  5  2017 CLRD                     160201                      40.9
##  6  2017 Diabetes                  83564                      21.5
##  7  2017 Heart disease            647457                     165  
##  8  2017 Influenza and pneumonia   55672                      14.3
##  9  2017 Suicide                   47173                      14  
## 10  2017 Cancer                   599108                     152. 
## # ℹ 199 more rows
head(leading_causeof_death)
## # A tibble: 6 × 6
##    Year `113 Cause Name`        `Cause Name` State Deaths Age-adjusted Death R…¹
##   <dbl> <chr>                   <chr>        <chr>  <dbl>                  <dbl>
## 1  2017 Accidents (unintention… Unintention… Unit… 1.70e5                   49.4
## 2  2017 All Causes              All causes   Unit… 2.81e6                  732. 
## 3  2017 Alzheimer's disease (G… Alzheimer's… Unit… 1.21e5                   31  
## 4  2017 Cerebrovascular diseas… Stroke       Unit… 1.46e5                   37.6
## 5  2017 Chronic lower respirat… CLRD         Unit… 1.60e5                   40.9
## 6  2017 Diabetes mellitus (E10… Diabetes     Unit… 8.36e4                   21.5
## # ℹ abbreviated name: ¹​`Age-adjusted Death Rate`
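
A quick sanity check on the row counts reported above can confirm that the filter behaved as expected. The sketch below uses only numbers printed in the output (10,868 total rows, 209 national rows, years 1999 to 2017) and assumes the table is balanced, meaning every area reports every cause in every year. The variable names are illustrative and not part of the original analysis.

```r
# Counts taken from the output above
total_rows    <- 10868              # rows reported by read_csv()
national_rows <- 209                # rows left after State == "United States"
n_years       <- length(1999:2017)  # 19 years, per summary(): Min. 1999, Max. 2017

# Cause categories per year in the national slice
# (ten leading causes plus the "All causes" total)
causes_per_year <- national_rows / n_years

# Number of geographic areas implied by the full table, assuming it is balanced
n_areas <- total_rows / national_rows

causes_per_year  # 11
n_areas          # 52
```

Under that assumption, 52 areas would be consistent with 50 states, the District of Columbia, and the national total, and 11 categories per year is a reminder that the "All causes" row must be excluded before ranking causes.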

Find the top three leading causes

death_in_2017 <- leading_causeof_death[leading_causeof_death$Year == 2017, ]
death_in_2017 <- death_in_2017[death_in_2017$`Cause Name` != "All causes", ]
death_in_2017 <- death_in_2017[order(death_in_2017$Deaths, decreasing = TRUE), ]
top3_2017 <- death_in_2017[1:3, ]
top3_2017
## # A tibble: 3 × 6
##    Year `113 Cause Name`        `Cause Name` State Deaths Age-adjusted Death R…¹
##   <dbl> <chr>                   <chr>        <chr>  <dbl>                  <dbl>
## 1  2017 Diseases of heart (I00… Heart disea… Unit… 647457                  165  
## 2  2017 Malignant neoplasms (C… Cancer       Unit… 599108                  152. 
## 3  2017 Accidents (unintention… Unintention… Unit… 169936                   49.4
## # ℹ abbreviated name: ¹​`Age-adjusted Death Rate`
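
As an alternative to the base R subsetting above, the same ranking could be written with dplyr verbs. This is a minimal sketch, not the method used in this report: the `us_2017` tibble re-enters the ten 2017 national rows printed in the select() output earlier (the eleventh row is not shown there and does not affect the top three), and the column names `cause`, `deaths`, and `share` are illustrative.

```r
library(dplyr)

# 2017 national rows, copied from the select() output shown earlier
us_2017 <- tibble(
  cause  = c("Unintentional injuries", "All causes", "Alzheimer's disease",
             "Stroke", "CLRD", "Diabetes", "Heart disease",
             "Influenza and pneumonia", "Suicide", "Cancer"),
  deaths = c(169936, 2813503, 121404, 146383, 160201,
             83564, 647457, 55672, 47173, 599108)
)

# Total deaths from all causes, used as the denominator for shares
all_cause_deaths <- us_2017 |>
  filter(cause == "All causes") |>
  pull(deaths)

# Drop the total row, keep the three largest counts, and add each cause's share
top3 <- us_2017 |>
  filter(cause != "All causes") |>
  slice_max(deaths, n = 3) |>
  mutate(share = deaths / all_cause_deaths)

top3
sum(top3$share)  # combined share of all 2017 deaths, roughly one half
```

The added `share` column puts the counts in context: together, the three causes account for about half of all deaths recorded for 2017.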

Visualizing the top 3 leading causes of death (2017)

x <- top3_2017$`Cause Name`
y <- top3_2017$Deaths
barplot(y,
        names.arg = x,
        main = "Top 3 Leading Causes of Death in the U.S. (2017)",
        xlab = "Cause of Death",
        ylab = "Number of Deaths")
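
Since tidyverse is already loaded, the same chart could also be drawn with ggplot2. The sketch below is one possible version, not the plot used in this report: it re-enters the three 2017 counts from the output above, orders the bars from largest to smallest, and formats the y-axis with thousands separators. The object names `top3` and `p` are illustrative.

```r
library(ggplot2)

# Top three causes in 2017, copied from the output above
top3 <- data.frame(
  cause  = c("Heart disease", "Cancer", "Unintentional injuries"),
  deaths = c(647457, 599108, 169936)
)

# Order the bars by death count (largest first) instead of alphabetically
top3$cause <- reorder(top3$cause, -top3$deaths)

p <- ggplot(top3, aes(x = cause, y = deaths)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(title = "Top 3 Leading Causes of Death in the U.S. (2017)",
       x = "Cause of Death",
       y = "Number of Deaths")

print(p)
```

Converting `cause` to an ordered factor is what controls the bar order; without it, ggplot2 would sort the categories alphabetically.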

C. Conclusion and Future Directions

The analysis showed that in 2017, heart disease and cancer were the two leading causes of death in the United States, with unintentional injuries third. These results are consistent with national health patterns and show that chronic illnesses are the main contributors to death in the United States. This highlights the importance of health education efforts focused on cardiovascular disease and cancer. For future work, I could compare the top causes across multiple years to see whether any major shifts occur over time.
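
The multi-year comparison suggested as future work could start from a small helper like the one sketched here. `top_causes_by_year()` is a hypothetical function, not part of this analysis; it assumes the column names `Year`, `Cause Name`, and `Deaths` from the dataset. Because only the 2017 national rows are printed in this report, the demonstration uses those rows alone. Applied to the full national table, it would be expected to return three rows for each year.

```r
library(dplyr)

# Hypothetical helper: the n causes with the most deaths in each year,
# excluding the "All causes" total row
top_causes_by_year <- function(data, n = 3) {
  data |>
    filter(`Cause Name` != "All causes") |>
    group_by(Year) |>
    slice_max(Deaths, n = n) |>
    ungroup() |>
    arrange(Year, desc(Deaths))
}

# Demonstration on the 2017 national rows printed earlier in this report
us_2017 <- tibble(
  Year = 2017,
  `Cause Name` = c("Unintentional injuries", "All causes", "Alzheimer's disease",
                   "Stroke", "CLRD", "Diabetes", "Heart disease",
                   "Influenza and pneumonia", "Suicide", "Cancer"),
  Deaths = c(169936, 2813503, 121404, 146383, 160201,
             83564, 647457, 55672, 47173, 599108)
)

top_by_year <- top_causes_by_year(us_2017)
top_by_year
```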

References:

National Center for Health Statistics. (2025). NCHS – Leading Causes of Death: United States [Dataset]. Data.gov. https://catalog.data.gov/dataset/nchs-leading-causes-of-death-united-states