May 7, 2019

Introduction

The National Center for Health Statistics (NCHS) publishes a dataset of the 10 leading causes of death in the United States. The data have been collected since 1999 from resident death certificates filed in all 50 states and the District of Columbia (DC). The dataset also reports the age-adjusted death rate per 100,000 people. The age adjustments were initially based on the 2000 US population, but the values were re-adjusted after the 2010 census.
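
As a rough illustration of what an age-adjusted rate per 100,000 means, the sketch below applies direct standardization to three hypothetical age groups. All numbers and variable names (`deaths`, `population`, `std_weight`) are made up for the example and are not NCHS figures; the actual NCHS procedure uses many more age groups and its own standard population.

```r
# Hypothetical age groups with made-up counts (not NCHS figures)
deaths     <- c(50, 200, 900)            # deaths in each age group
population <- c(100000, 100000, 50000)   # people in each age group
std_weight <- c(0.5, 0.3, 0.2)           # standard population shares (sum to 1)

# Age-specific death rates per 100,000 people
age_rates <- deaths / population * 100000

# Direct standardization: weight each age-specific rate by its standard share
adjusted_rate <- sum(age_rates * std_weight)

# Crude rate for comparison: all deaths over the whole population
crude_rate <- sum(deaths) / sum(population) * 100000

print(c(adjusted = adjusted_rate, crude = crude_rate))
```

The adjusted and crude rates differ because the toy population has a different age mix than the assumed standard, which is the effect age adjustment is meant to remove when comparing states.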

Current Questions

  • Problems:
    • How do the causes of death differ throughout the country (e.g., do certain states have a higher number of deaths due to cancer)?
    • How does the quantity of deaths due to these leading causes vary by state (e.g., are more people dying from a leading cause in one region than in another)?

Data Usage

  • The dataset was subset to the leading causes of death in 2012 for each state reported that year.
    • Causes include: Cancer, Heart Disease, Unintentional Injuries, Stroke, and Chronic Lower Respiratory Diseases (CLRD)
  • To create the choropleth map used to visualize the data, geographical coordinates had to be added to the dataset. A new file was created using the states' latitudes and longitudes, which were obtained from here.
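library(readr)
# Load the full NCHS dataset and keep only the 2012 records
causeofdeathus <- read_csv("NCHS_Leading_Causes_of_Death__United_States.csv")
causeofdeathus2012 <- subset(causeofdeathus, Year == 2012)
# Replace the subset with the new 2012 file that also holds the state coordinates
causeofdeathus2012 <- read_csv("Leading Cause of Death 2012.csv")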
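
The coordinate file itself is not shown here, so the following is only a sketch of one way such a file could be built: joining state coordinates onto the 2012 rows with `merge()`. It assumes a `State` column and uses the approximate state centers that ship with base R (`state.name`, `state.center`) rather than the source linked above. The `deaths2012` data frame and its `Deaths` values are placeholders, not NCHS figures.

```r
# Toy stand-in for the 2012 subset; Deaths values are placeholders
deaths2012 <- data.frame(
  State  = c("Wyoming", "Oklahoma", "West Virginia"),
  Cause  = c("CLRD", "CLRD", "CLRD"),
  Deaths = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Approximate state centers from base R's datasets package.
# state.center$x is longitude and state.center$y is latitude; DC is not included.
coords <- data.frame(
  State     = state.name,
  Longitude = state.center$x,
  Latitude  = state.center$y,
  stringsAsFactors = FALSE
)

# Left join: keep every death record and attach coordinates where the state matches
deaths2012_geo <- merge(deaths2012, coords, by = "State", all.x = TRUE)
print(deaths2012_geo)
```

Because `all.x = TRUE` keeps unmatched rows, a record for the District of Columbia would come through with missing coordinates and would need them filled in separately.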

Visualization of the Data

Conclusions

  • This dataset illustrates that during 2012, heart disease and cancer accounted for the greatest number of deaths in the United States.

  • Deaths from stroke were more concentrated in the Southern region of the United States.

  • CLRD had the smallest number of deaths overall, and was reported in large numbers only in Wyoming, West Virginia, and Oklahoma.
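
Comparisons like the ones above could be checked with a couple of base R summaries. The sketch below is not the analysis used for this post. It assumes columns named `State`, `Cause`, and `Deaths`, and runs on placeholder rows (states "A" and "B" are hypothetical) rather than the real 2012 data.

```r
# Toy rows standing in for the 2012 subset; Deaths values are placeholders
deaths2012 <- data.frame(
  State  = c("A", "A", "B", "B"),
  Cause  = c("Heart disease", "Stroke", "Heart disease", "Stroke"),
  Deaths = c(40, 10, 30, 20),
  stringsAsFactors = FALSE
)

# Total deaths per cause across all states, largest first
totals <- aggregate(Deaths ~ Cause, data = deaths2012, FUN = sum)
totals <- totals[order(-totals$Deaths), ]

# For each cause, the state with the highest number of deaths
top_state <- do.call(
  rbind,
  lapply(split(deaths2012, deaths2012$Cause), function(d) d[which.max(d$Deaths), ])
)

print(totals)
print(top_state)
```

On the real data, `totals` would rank the causes nationally, while `top_state` would point to where each cause is most concentrated.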