Picture
Introduction
In the past 100 years, Americans have gained approximately 25 more years of expectancy. These gains were due to significant declines in mortality that occurred as acute infectious diseases were replaced by chronic diseases as the predominant cause of morbidity and mortality. However, 2017 was the third year in a row where U.S. life expectancy fell, a rare event that has only occurred three times in the last century. One other three-year decline occurred in 1916, 1917 and 1918, when the worst flu pandemic in modern history eliminated nearly 7% of the US population. My initial intention was to analyze mortality only in New York state, then I decided to explore death causes in the entire United States. Since I work in health care environment I took advantage to analyse Leading Causes of Death in United States data in order to expand my knowledge and skills.
This dataset is taken from Data.gov presents the age-adjusted death rates for the 10 leading causes of death in the United States beginning in 1999. Data are based on information from all resident death certificates filed in the 50 states and the District of Columbia using demographic and medical characteristics. Age-adjusted death rates (per 100,000 population) are based on the 2000 U.S. standard population.Age-adjusted death rates are useful when comparing different populations because they remove the potential bias that can occur when the populations being compared have different age structures.
Data Acquisition
#Load necessary libraries
suppressWarnings({library(mongolite)})
suppressWarnings({library(ggthemes)})
suppressWarnings({library(gganimate)})
suppressWarnings({library(magick)})
suppressWarnings({library(gapminder)})
suppressWarnings({library(ggplot2)})
suppressWarnings({library(dplyr)})
suppressWarnings({library(lubridate)})
suppressWarnings({library(stringr)})
suppressWarnings({library(plotrix)})
suppressWarnings({library(reshape)})
suppressWarnings({library(knitr)})
suppressWarnings({library(tidyverse)})
suppressWarnings({library(kableExtra)})
suppressWarnings({library(usmap)})url <- "https://raw.githubusercontent.com/uplotnik/FinalProject607/master/USA%20leading%20Causes%20of%20death.csv"
datasetx <- read.csv(url)
head(datasetx)## Year X113.Cause.Name
## 1 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 2 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 3 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 4 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 5 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 6 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## Cause.Name State Deaths Age.adjusted.Death.Rate
## 1 Unintentional injuries United States 169936 49.4
## 2 Unintentional injuries Alabama 2703 53.8
## 3 Unintentional injuries Alaska 436 63.7
## 4 Unintentional injuries Arizona 4184 56.2
## 5 Unintentional injuries Arkansas 1625 51.8
## 6 Unintentional injuries California 13840 33.2
Read Data from MongoDB
mongo_db <- mongo(collection = "Causes of Deaths in USA")
mongo_db$insert(datasetx)## List of 5
## $ nInserted : num 10868
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
head(mongo_db$find())## Year X113_Cause_Name
## 1 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 2 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 3 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 4 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 5 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## 6 2017 Accidents (unintentional injuries) (V01-X59,Y85-Y86)
## Cause_Name State Deaths Age_adjusted_Death_Rate Cause
## 1 Unintentional injuries United States 169936 49.4 <NA>
## 2 Unintentional injuries Alabama 2703 53.8 <NA>
## 3 Unintentional injuries Alaska 436 63.7 <NA>
## 4 Unintentional injuries Arizona 4184 56.2 <NA>
## 5 Unintentional injuries Arkansas 1625 51.8 <NA>
## 6 Unintentional injuries California 13840 33.2 <NA>
Data Preparation
datasetx <- read.csv(url,header = TRUE, na.strings=".")
datasetx<-distinct(datasetx)
datasetx<-datasetx[,-2]
kable(head(datasetx))| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 2017 | Unintentional injuries | United States | 169936 | 49.4 |
| 2017 | Unintentional injuries | Alabama | 2703 | 53.8 |
| 2017 | Unintentional injuries | Alaska | 436 | 63.7 |
| 2017 | Unintentional injuries | Arizona | 4184 | 56.2 |
| 2017 | Unintentional injuries | Arkansas | 1625 | 51.8 |
| 2017 | Unintentional injuries | California | 13840 | 33.2 |
Data Analysis
Exploratory Analysis
dim(datasetx)## [1] 10868 5
str(datasetx)## 'data.frame': 10868 obs. of 5 variables:
## $ Year : int 2017 2017 2017 2017 2017 2017 2017 2017 2017 2017 ...
## $ Cause.Name : Factor w/ 11 levels "All causes","Alzheimer's disease",..: 11 11 11 11 11 11 11 11 11 11 ...
## $ State : Factor w/ 52 levels "Alabama","Alaska",..: 45 1 2 3 4 5 6 7 8 9 ...
## $ Deaths : int 169936 2703 436 4184 1625 13840 3037 2078 608 427 ...
## $ Age.adjusted.Death.Rate: num 49.4 53.8 63.7 56.2 51.8 33.2 53.6 53.2 61.9 61 ...
summary(datasetx)## Year Cause.Name State Deaths
## Min. :1999 All causes : 988 Alabama : 209 Min. : 21
## 1st Qu.:2003 Alzheimer's disease: 988 Alaska : 209 1st Qu.: 612
## Median :2008 Cancer : 988 Arizona : 209 Median : 1718
## Mean :2008 CLRD : 988 Arkansas : 209 Mean : 15460
## 3rd Qu.:2013 Diabetes : 988 California: 209 3rd Qu.: 5756
## Max. :2017 Heart disease : 988 Colorado : 209 Max. :2813503
## (Other) :4940 (Other) :9614
## Age.adjusted.Death.Rate
## Min. : 2.6
## 1st Qu.: 19.2
## Median : 35.9
## Mean : 127.6
## 3rd Qu.: 151.7
## Max. :1087.3
##
Total Number of Death by Year
Filter data by All Causes in entire United States
USA<-filter(datasetx, datasetx$State == "United States")
all_causes_usa<-filter(USA, USA$Cause.Name == "All causes")
kable(head((all_causes_usa),10))| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 2017 | All causes | United States | 2813503 | 731.9 |
| 2016 | All causes | United States | 2744248 | 728.8 |
| 2015 | All causes | United States | 2712630 | 733.1 |
| 2014 | All causes | United States | 2626418 | 724.6 |
| 2013 | All causes | United States | 2596993 | 731.9 |
| 2012 | All causes | United States | 2543279 | 732.8 |
| 2011 | All causes | United States | 2515458 | 741.3 |
| 2010 | All causes | United States | 2468435 | 747.0 |
| 2009 | All causes | United States | 2437163 | 749.6 |
| 2008 | All causes | United States | 2471984 | 774.9 |
Add Death Proportion of each year
yearlydeath<- USA %>%
group_by(Year) %>%
summarise(avg_Deaths = mean(Deaths),
min_Deaths = min(Deaths),
max_Deaths = max(Deaths),
total_deaths= sum(Deaths))%>%
mutate(total_deaths_prop = prop.table(total_deaths))%>%
arrange(desc(Year))
kable(head(yearlydeath,5))| Year | avg_Deaths | min_Deaths | max_Deaths | total_deaths | total_deaths_prop |
|---|---|---|---|---|---|
| 2017 | 445003.1 | 47173 | 2813503 | 4895034 | 0.0582679 |
| 2016 | 434397.0 | 44965 | 2744248 | 4778367 | 0.0568791 |
| 2015 | 429604.3 | 44193 | 2712630 | 4725647 | 0.0562516 |
| 2014 | 414984.2 | 42826 | 2626418 | 4564826 | 0.0543372 |
| 2013 | 409754.9 | 41149 | 2596993 | 4507304 | 0.0536525 |
Find Ave, Min, Max and Total Death 1999-2017
total<-yearlydeath %>%
summarise(avg_deaths_1999_2017 = mean(avg_Deaths),
min_deaths_1999_2017 = min(min_Deaths),
max_deaths_1999_2017 = max(max_Deaths),
total_deaths_1999_2017= sum(total_deaths))
kable(total)| avg_deaths_1999_2017 | min_deaths_1999_2017 | max_deaths_1999_2017 | total_deaths_1999_2017 |
|---|---|---|---|
| 401957.7 | 29199 | 2813503 | 84009152 |
When was the Max and Min number of Deaths?
max<-all_causes_usa %>% slice(which.max(Deaths))
kable(max)| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 2017 | All causes | United States | 2813503 | 731.9 |
min<-all_causes_usa%>% slice(which.min(Deaths))
kable(min)| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 1999 | All causes | United States | 2391399 | 875.6 |
The maximum Number of deaths: 2,813,503 is registered in 2017 and the Minimum Number: 2,391,399 is registered in 1999.
Visualization
mortality<-ggplot(data=yearlydeath, aes(x=Year, y=total_deaths_prop, group=10)) +
geom_line(arrow = arrow())+
geom_point()+
geom_text(aes(label = round(total_deaths_prop,3)),
vjust = "inward", hjust = "inward",
show.legend = FALSE)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs( x="Year", y="Proportion")+ggtitle("Proportion of total Deaths 1999-2017 - All causes ")
mortalityDeath Rate change between years
pct_change <- all_causes_usa %>%
group_by(Cause.Name) %>%
mutate(pct_change = (Deaths/lead(Deaths)-1) * 100)
data_wide1<- pct_change %>%
select(1:4,6)
data_wide2<-spread(data_wide1, State, Deaths)
kable(head(data_wide2))| Year | Cause.Name | pct_change | United States |
|---|---|---|---|
| 1999 | All causes | NA | 2391399 |
| 2000 | All causes | 0.4997911 | 2403351 |
| 2001 | All causes | 0.5439905 | 2416425 |
| 2002 | All causes | 1.1157805 | 2443387 |
| 2003 | All causes | 0.2005822 | 2448288 |
| 2004 | All causes | -2.0697320 | 2397615 |
perc_change<-ggplot(data=data_wide2, aes(x=Year, y=pct_change, group=10)) +
geom_line(arrow = arrow())+
geom_point()+
geom_text(aes(label = round(pct_change, 1)),
vjust = "inward", hjust = "inward",
show.legend = FALSE)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs( x="Year", y="Percentage")+ggtitle("Death Percentage change between Years ")
perc_change## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_text).
The number of death are going up! But the population of US has been growing.In order to see more accurate data, we need to check age-adjusted death rate for the total population.
Age Adjusted Death Rate
AGE-ADJUSTED DEATH RATE is a death rate that controls for the effects of differences in population age distributions. When comparing across geographic areas, some method of age- adjusting is typically used to control for the influence that different population age distributions might have on health event rates.
g3<-ggplot(data=all_causes_usa, aes(x=Year, y=Age.adjusted.Death.Rate, group=10)) +
geom_line(arrow = arrow())+
geom_point()+
geom_text(aes(label = round(Age.adjusted.Death.Rate, 1)),
vjust = "inward", hjust = "inward",
show.legend = FALSE)+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
labs( x="Year", y="Deaths")+ggtitle("Age Adjusted Death Rate 1999-2017")
g3 Age Adjusted Death rate decreased on 16% from 1999 to 2017.
Mortality vs Age Adjusted Death Rate
states<-datasetx[!grepl('United States',datasetx$State),]
all_causes_states<-filter(states, states$Cause.Name == "All causes")p <- ggplot(
all_causes_states,
aes(x = Age.adjusted.Death.Rate, y=Deaths, colour = State)
) +
geom_point(show.legend = FALSE, alpha = 0.7) +
scale_color_viridis_d() +
scale_size(range = c(2, 12)) +
scale_x_log10() +
labs(x = "Age.adjusted.Death.Rate", y = "Deaths")p + transition_time(Year)+
labs(title = "Year: {frame_time}")+
shadow_mark(alpha = 0.3, size = 0.5)The decline in the age-adjusted death rate to a record low value for the United States is consistent with long-term trends in mortality.
Leading Causes of Death in USA 1999-2017
dataset1<-USA[!grepl('All causes',USA$Cause.Name),]dataset_wide<-dataset1%>%select(-State, -Age.adjusted.Death.Rate)
dataset_wide1<-spread(dataset_wide, Year, Deaths)
kable(head(dataset_wide1))%>%
kable_styling(bootstrap_options = "striped", full_width = F)%>%
row_spec(5, bold = T, color = "white", background = "orange")%>%
row_spec(2, bold = T, color = "white", background = "orange")| Cause.Name | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alzheimer’s disease | 44536 | 49558 | 53852 | 58866 | 63457 | 65965 | 71599 | 72432 | 74632 | 82435 | 79003 | 83494 | 84974 | 83637 | 84767 | 93541 | 110561 | 116103 | 121404 |
| Cancer | 549838 | 553091 | 553768 | 557271 | 556902 | 553888 | 559312 | 559888 | 562875 | 565469 | 567628 | 574743 | 576691 | 582623 | 584881 | 591700 | 595930 | 598038 | 599108 |
| CLRD | 124181 | 122009 | 123013 | 124816 | 126382 | 121987 | 130933 | 124583 | 127924 | 141090 | 137353 | 138080 | 142943 | 143489 | 149205 | 147101 | 155041 | 154596 | 160201 |
| Diabetes | 68399 | 69301 | 71372 | 73249 | 74219 | 73138 | 75119 | 72449 | 71382 | 70553 | 68705 | 69071 | 73831 | 73932 | 75578 | 76488 | 79535 | 80058 | 83564 |
| Heart disease | 725192 | 710760 | 700142 | 696947 | 685089 | 652486 | 652091 | 631636 | 616067 | 616828 | 599413 | 597689 | 596577 | 599711 | 611105 | 614348 | 633842 | 635260 | 647457 |
| Influenza and pneumonia | 63730 | 65313 | 62034 | 65681 | 65163 | 59664 | 63001 | 56326 | 52717 | 56284 | 53692 | 50097 | 53826 | 50636 | 56979 | 55227 | 57062 | 51537 | 55672 |
ggplot(aes(x = reorder(Cause.Name,-Deaths), y = Deaths), data = dataset1) +
geom_bar(stat = 'identity') +
ylab('Total Deaths') +
xlab('') +
ggtitle('Causes of Deaths 1999-2017 in USA') +
theme_tufte() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) ggplot(dataset1, aes(x= Year,y=Deaths)) +
geom_bar(aes(fill = reorder(Cause.Name,Deaths)), position ="dodge",
stat = "identity") +
ggtitle("Top Ten Causes of Death by Year") +
ylab(label = "Deaths" )+
scale_fill_discrete(name = "Cause of Death")pie<-dataset1
pie1<-aggregate(pie$Deaths, by=list(Cause=pie$Cause), FUN=sum)
pie2<-pie1%>% mutate(Perc=x/sum(x)) slices <- c(1494816, 10843644, 2594927, 1399943, 12222640,1094641, 858613,2726523,697016,2347820)
lbls <- c("Alzheimer's disease ", "Cancer", "CLRD", "Diabetes", "Heart disease"," Influenza and pneumonia ", "Kidney disease", "Stroke", "Suicide", "Unintentional injuries")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct) # add percents to labels
lbls <- paste(lbls,"%",sep="") # ad % to labels
pie3D(slices,labels = lbls,explode = 0.3, col=rainbow(length(lbls)),
main="Causes of Mortality 1999-2017")p <- ggplot(
dataset1,
aes(Year, Deaths, group = Cause.Name, color = factor(Cause.Name))
) +
geom_line() +
scale_color_viridis_d() +
labs(x = "Year", y = "Deaths") +
theme(legend.position = "top")p + geom_point(aes(group = seq_along(Year))) +
transition_reveal(Year)Mortality by State
Filter by all states and all causes
states<-datasetx[!grepl('United States',datasetx$State),]all_causes_states<-filter(states, states$Cause.Name == "All causes")
head(all_causes_states)%>%arrange(desc(Deaths))## Year Cause.Name State Deaths Age.adjusted.Death.Rate
## 1 2017 All causes California 268189 618.7
## 2 2017 All causes Arizona 57758 678.5
## 3 2017 All causes Alabama 53238 917.7
## 4 2017 All causes Colorado 38063 663.4
## 5 2017 All causes Arkansas 32588 900.1
## 6 2017 All causes Alaska 4411 708.8
Age Adjusted Death Rate for each state 1999-2017
dataset_wide_state<-all_causes_states%>%select(-Cause.Name, -Deaths)
dataset_wide_state1<-spread(dataset_wide_state, Year, Age.adjusted.Death.Rate)
kable(head(dataset_wide_state1))%>%
kable_styling(bootstrap_options = "striped", full_width = F)| State | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | 1009.3 | 1004.8 | 1002.1 | 1013.4 | 1020.2 | 998.3 | 1001.3 | 979.5 | 957.2 | 959.7 | 940.2 | 939.7 | 933.6 | 926.7 | 925.2 | 909.1 | 924.5 | 920.4 | 917.7 |
| Alaska | 838.9 | 869.1 | 831.3 | 802.4 | 843.5 | 756.7 | 762.3 | 787.6 | 780.6 | 761.3 | 763.8 | 771.5 | 747.8 | 731.4 | 724.4 | 736.8 | 747.4 | 745.6 | 708.8 |
| Arizona | 818.4 | 810.4 | 795.9 | 811.2 | 798.4 | 771.1 | 786.7 | 767.1 | 731.0 | 714.5 | 693.8 | 693.1 | 688.9 | 682.9 | 674.2 | 661.7 | 671.8 | 675.8 | 678.5 |
| Arkansas | 975.3 | 977.1 | 955.4 | 975.2 | 948.2 | 927.4 | 931.5 | 909.5 | 906.2 | 927.3 | 896.4 | 892.7 | 895.3 | 897.5 | 893.8 | 883.7 | 901.8 | 893.2 | 900.1 |
| California | 802.3 | 787.9 | 783.2 | 770.0 | 768.4 | 734.3 | 730.7 | 718.2 | 691.7 | 677.9 | 655.5 | 646.7 | 641.3 | 630.4 | 630.1 | 605.7 | 621.6 | 616.9 | 618.7 |
| Colorado | 801.8 | 792.1 | 796.1 | 804.4 | 792.0 | 738.5 | 752.1 | 720.9 | 708.0 | 715.8 | 689.7 | 682.7 | 677.8 | 665.6 | 655.4 | 664.4 | 665.0 | 669.5 | 663.4 |
Avg, Min, Max Age Adjusted Death Rate for each State
statedeath<- all_causes_states %>%
group_by(State) %>%
summarise(Avg_Rate = mean(Age.adjusted.Death.Rate),
Min_Rate = min(Age.adjusted.Death.Rate),
Max_Rate = max(Age.adjusted.Death.Rate)) %>%
arrange(State)
kable(head(statedeath,5))| State | Avg_Rate | Min_Rate | Max_Rate |
|---|---|---|---|
| Alabama | 962.2579 | 909.1 | 1020.2 |
| Alaska | 774.2737 | 708.8 | 869.1 |
| Arizona | 732.9158 | 661.7 | 818.4 |
| Arkansas | 920.4000 | 883.7 | 977.1 |
| California | 691.1316 | 605.7 | 802.3 |
When was the higest/lowest death rate?
maxrate<-all_causes_states %>% slice(which.max(Age.adjusted.Death.Rate))
kable(maxrate)| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 1999 | All causes | District of Columbia | 6076 | 1087.3 |
minrate<-all_causes_states %>% slice(which.min(Age.adjusted.Death.Rate))
kable(minrate)| Year | Cause.Name | State | Deaths | Age.adjusted.Death.Rate |
|---|---|---|---|---|
| 2016 | All causes | Hawaii | 10913 | 572 |
ggplot(aes(x = reorder(State,-Age.adjusted.Death.Rate), y = Age.adjusted.Death.Rate), data = all_causes_states) +
geom_bar(stat = 'identity') +
ylab('Total Deaths') +
xlab('') +
ggtitle('Mortality by State 1999-2017') +
theme_tufte() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))The five states with the highest age-adjusted death rates were Missisipi, West Virginia, Alabama and Louisiana.
Statistical Analysis
It is important to remember that age adjusted rates are not the actual rates of death or disease in the population - those are called “crude rates.” Sometimes, health statistics are used to compare how healthy two different groups of people are, or how healthy a certain group is during two different time periods. Since older people are more likely to get ill, and younger people are more likely to injure themselves, age adjustment can make studies more accurate.
d <-all_causes_usa
fit <- lm(Age.adjusted.Death.Rate ~ Deaths, data = d) # fit the model
d$predicted <- predict(fit) # Save the predicted values
d$residuals <- residuals(fit) # Save the residual values
ggplot(d, aes(x = Deaths, y = Age.adjusted.Death.Rate)) +
geom_smooth(method = "lm", se = FALSE, color = "lightgrey") + # regression line
geom_segment(aes(xend = Deaths, yend = predicted), alpha = .2) + # draw line from point to line
geom_point(aes(color = abs(residuals), size = abs(residuals))) + # size of the points
scale_color_continuous(low = "green", high = "red") + # colour of the points mapped to residual size - green smaller, red larger
guides(color = FALSE, size = FALSE) + # Size legend removed
geom_point(aes(y = predicted), shape = 1) +
theme_bw()summary(fit)##
## Call:
## lm(formula = Age.adjusted.Death.Rate ~ Deaths, data = d)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.032 -30.830 -4.929 39.837 55.086
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.546e+03 1.844e+02 8.382 1.92e-07 ***
## Deaths -3.033e-04 7.333e-05 -4.137 0.000689 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39.63 on 17 degrees of freedom
## Multiple R-squared: 0.5017, Adjusted R-squared: 0.4724
## F-statistic: 17.11 on 1 and 17 DF, p-value: 0.0006895
Looking at the summary, it has p-value of 0.0006895, which indicates that there is statistically significant relationship between the two variables.
ggplot(data=fit, aes(y=fit$residuals, x=Deaths)) +
geom_point(alpha=.25, color='blue') +
geom_abline(slope=0,
intercept=0, color='red', size=1, alpha=.25) +
ylab("Residuals") +
xlab("Deaths") +
ggtitle("Residual Plot")Points in a residual plot are randomy dispersed around the horizontal axis, meaning that linear regression model is appropriate for the data.
ggplot(data=fit, aes(x=fit$residuals)) +
geom_histogram(bins = 12, alpha=.5) +
ggtitle("Histogram of Residuals")qqnorm(fit$residuals)
qqline(fit$residuals)Conclusion
Based on the analysis of ten leading death causes in USA we can conclude following: There are two leading causes: Heart Disease 34% and Cancer 30%. The highest death rate was registered in 2017 and the lowest in 1999. Mississippi is the state with the higest mortality rate and Hawaii with the lowest rate. The highest percentage of deaths number change was between 2004 and 2005 years Age Adjusted Death Rate fall down from 879.6 in 1999 to 731.9 in 2017 (16%) proving the statement that life expectancy in general increased.
Data Sources and References
1.https://catalog.data.gov/dataset/age-adjusted-death-rates-for-the-top-10-leading-causes-of-death-united-states-2013 2.https://healthmetrics.heart.org/us-life-expectancy-drops-for-third-year-in-a-row-what-we-know-and-dont-know-from-the-2017-cdc-mortality-report 3.https://www.cdc.gov/nchs/nvss/deaths.htm