The purpose of this exploratory data analysis is to show how mortality rates have evolved historically in different US states. Five states were selected for this analysis based on their latest population rank. Analyzing their mortality data helps identify historical events that may have affected death counts in highly populated states.
The Human Mortality Database (HMD) is the main data source for this analysis. The complete HMD data series includes the following:

- Live birth counts
- Death counts
- Population size on January 1st
- Population exposed to risk of death (period & cohort)
- Death rates (period & cohort)
- Life tables (period & cohort)

For this analysis, however, only the cohort USA life tables are taken from the HMD.
Cohort Life Tables are presented for a population if there is at least one cohort observed from birth until extinction (i.e., the date by which all cohort members are assumed to have died). In that case, life tables are provided for all extinct cohorts and for some almost-extinct cohorts as well.
- Data files are tab-delimited text (ASCII) files.
- Files are organized by sex, age, and time.
- Population size is given for one-year and five-year age groups.
- Deaths, exposure-to-risk, death rates, and life tables are given in similar formats of age and time:
  - 1x1 (by age and year)
  - 1x5 (by age and 5-year time interval)
  - 1x10 (by age and 10-year time interval)
  - 5x1 (by 5-year age group and year)
  - 5x5 (by 5-year age group and 5-year time interval)
  - 5x10 (by 5-year age group and 10-year time interval)
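As a rough illustration of the file format described above, the sketch below reads a small tab-delimited life table into R. The sample rows are made up. The layout (a title line and a blank line before the column header), the use of `.` for missing values, and the `110+` label for the open age interval are assumptions about the files, and `read_life_table` is a hypothetical helper name.

```r
# Made-up sample in the assumed layout: title line, blank line, header, rows.
sample_lines <- c(
  "Hypothetical State, Life tables (1x1), Males",
  "",
  paste("Year", "Age", "mx", "qx", "ax", "lx", "dx", "Lx", "Tx", "ex", sep = "\t"),
  paste("1933", "0", "0.06", "0.057", "0.3", "100000", "5700", "96010", "6000000", "60", sep = "\t"),
  paste("1933", "1", "0.01", "0.00995", "0.5", "94300", "938", "93831", "5903990", "62.61", sep = "\t"),
  paste("1933", "110+", "0.7", "1", "1.43", "10", "10", "14", "14", "1.43", sep = "\t")
)

# Hypothetical helper: read one tab-delimited life table file or connection.
read_life_table <- function(con, skip = 2) {
  lt <- read.table(con, skip = skip, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE, na.strings = ".")
  # Assumption: the open age interval is written like "110+"; keep its lower bound.
  lt$Age <- as.integer(sub("\\+$", "", lt$Age))
  lt
}

lt <- read_life_table(textConnection(sample_lines))
str(lt)
```

For a file on disk, the same helper could be called with a path instead of a text connection; the `skip` value should be adjusted if the header layout differs.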
The life table files contain the following columns:

- Year: Year or range of years (for both period & cohort data)
- Age: Age group for n-year interval from exact age x to just before exact age x+n, where n=1, 4, 5, or ∞ (open age interval)
- m(x): Central death rate between ages x and x+n
- q(x): Probability of death between ages x and x+n
- a(x): Average length of survival between ages x and x+n for persons dying in the interval
- l(x): Number of survivors at exact age x, assuming l(0) = 100,000
- d(x): Number of deaths between ages x and x+n
- L(x): Number of person-years lived between ages x and x+n
- T(x): Number of person-years remaining after exact age x
- e(x): Life expectancy at exact age x (in years)
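The columns above are linked by standard life table identities: d(x) = l(x) q(x), l(x+n) = l(x) - d(x), L(x) = n l(x+n) + a(x) d(x), T(x) is the sum of L(y) over all ages y at or above x, e(x) = T(x) / l(x), and m(x) = d(x) / L(x). The sketch below applies them to a toy three-age table. The q(x) and a(x) inputs are invented for illustration, and `build_life_table` is a hypothetical helper, not part of the HMD tooling.

```r
# Hypothetical helper: derive the remaining columns from q(x) and a(x),
# assuming equal interval width n and q(x) = 1 in the last age group.
build_life_table <- function(qx, ax, n = 1, radix = 100000) {
  k <- length(qx)
  lx <- numeric(k)
  lx[1] <- radix                              # l(0) = 100,000
  for (i in seq_len(k - 1)) {
    lx[i + 1] <- lx[i] * (1 - qx[i])          # survivors to the next age
  }
  dx <- lx * qx                               # deaths in each interval
  Lx <- n * (lx - dx) + ax * dx               # person-years lived
  Tx <- rev(cumsum(rev(Lx)))                  # person-years remaining
  data.frame(qx, ax, lx, dx, Lx, mx = dx / Lx, Tx, ex = Tx / lx)
}

# Invented inputs: three one-year age groups, everyone dead by the last one.
toy <- build_life_table(qx = c(0.1, 0.5, 1), ax = c(0.5, 0.5, 0.5))
print(toy)
```

Because q(x) = 1 in the last interval, the deaths column sums to the radix, which is a quick sanity check on any table built this way.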
As part of data preparation, the life tables from the different states were merged into a small number of master R data frames. The data frames are organized by sex and by age x year format, for example:

- AZ_male_1x1 (Arizona, male, by single year of age and single calendar year)
- AZ_female_1x1 (Arizona, female, by single year of age and single calendar year)
- AZ_both_1x1 (Arizona, both sexes, by single year of age and single calendar year)
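One way such a merge could look is sketched below: each per-state table gets a `State` column and the tables are stacked with `rbind`. The input values, the `combine_states` helper, and the `male_1x1` name are hypothetical and only illustrate the idea; the actual preparation steps may differ.

```r
# Made-up per-state tables standing in for the real life tables.
state_tables <- list(
  AZ = data.frame(Year = c(1933, 1933), Age = c(0, 1), qx = c(0.06, 0.01)),
  NY = data.frame(Year = c(1933, 1933), Age = c(0, 1), qx = c(0.05, 0.008))
)

# Hypothetical helper: tag each table with its state code, then stack them.
combine_states <- function(tables) {
  tagged <- Map(function(df, code) cbind(State = code, df),
                tables, names(tables))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

male_1x1 <- combine_states(state_tables)
print(male_1x1)
```

Keeping the state as a column rather than in the data frame name makes it easier to filter or group by state later.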
Correlation analysis gives an overview of how the life table columns relate to one another. In the correlation plot, positive correlations are displayed in blue and negative correlations in red. Color intensity and circle size are proportional to the correlation coefficients.
The correlation coefficients ($r), their p-values ($p), and a symbolic summary of the coefficients ($sym) are listed below:
## $r
## Tx ex lx Lx mx qx ax dx
## Tx 1.000 1.000 0.82 0.66 -0.60 -0.740 -0.061 -0.540
## ex 1.000 1.000 0.83 0.67 -0.63 -0.760 -0.042 -0.520
## lx 0.820 0.830 1.00 0.91 -0.87 -0.980 0.310 -0.230
## Lx 0.660 0.670 0.91 1.00 -0.80 -0.900 0.620 -0.230
## mx -0.600 -0.630 -0.87 -0.80 1.00 0.940 -0.460 -0.140
## qx -0.740 -0.760 -0.98 -0.90 0.94 1.000 -0.390 0.074
## ax -0.061 -0.042 0.31 0.62 -0.46 -0.390 1.000 0.250
## dx -0.540 -0.520 -0.23 -0.23 -0.14 0.074 0.250 1.000
##
## $p
## Tx ex lx Lx mx qx ax dx
## Tx 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 4.8e-58 0.000000e+00
## ex 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 1.5e-28 0.000000e+00
## lx 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 0.0e+00 0.000000e+00
## Lx 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 0.0e+00 0.000000e+00
## mx 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 0.0e+00 2.964394e-323
## qx 0.0e+00 0.0e+00 0 0 0.000000e+00 0e+00 0.0e+00 2.000000e-85
## ax 4.8e-58 1.5e-28 0 0 0.000000e+00 0e+00 0.0e+00 0.000000e+00
## dx 0.0e+00 0.0e+00 0 0 2.964394e-323 2e-85 0.0e+00 0.000000e+00
##
## $sym
##    Tx ex lx Lx mx qx ax dx
## Tx 1
## ex 1  1
## lx +  +  1
## Lx ,  ,  *  1
## mx .  ,  +  ,  1
## qx ,  ,  B  +  *  1
## ax       .  ,  .  .  1
## dx .  .                 1
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1
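Matrices of this kind can be approximated with base R alone. The sketch below uses `cor()` for the coefficients, `cor.test()` for pairwise p-values, and `symnum()` for the symbolic summary. It runs on randomly generated stand-in columns, and it is not necessarily how the output above was produced.

```r
set.seed(1)

# Random stand-in data; the column names only mimic life table columns.
x <- rnorm(50)
lt <- data.frame(lx = x, qx = -x + rnorm(50, sd = 0.2), dx = rnorm(50))

# Pearson correlation coefficients between all column pairs.
r <- cor(lt)

# Pairwise p-values; the diagonal is left as NA in this sketch.
p <- matrix(NA_real_, ncol(lt), ncol(lt), dimnames = dimnames(r))
for (i in seq_len(ncol(lt))) {
  for (j in seq_len(ncol(lt))) {
    if (i != j) p[i, j] <- cor.test(lt[[i]], lt[[j]])$p.value
  }
}

# Symbolic summary using symnum()'s default correlation cutpoints.
sym <- symnum(r, abbr.colnames = FALSE)

print(round(r, 2))
print(signif(p, 2))
print(sym)
```

The legend printed with `sym` maps each symbol to a range of absolute correlation, so `B` marks values above 0.95 and a blank marks values of 0.3 or below.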
The following line plots show the mortality rate trend of various age groups from 1933 to 2016. We focus on the five most populated states to examine how the mortality rate (qx = dx/lx) for different age groups has increased or decreased over the last 80+ years. To reduce visual clutter in the line plots, the age field has been broken down into four categories.
Life tables from the following states were used in this analysis:
New York, California, Texas, Pennsylvania, Florida
Note: Hover over a line to see the qx value for each age in a given year.
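The preparation behind such plots can be sketched as follows: compute qx = dx/lx and assign each age to one of four bands with `cut()`. The rows are invented and the band boundaries in `age_breaks` are purely hypothetical placeholders, not the categories used in this analysis.

```r
# Invented rows standing in for one state's life table.
lt <- data.frame(
  State = "NY",
  Year = rep(c(1933, 2016), each = 4),
  Age = rep(c(0, 30, 60, 90), times = 2),
  lx = c(100000, 88000, 70000, 8000, 100000, 98500, 90000, 25000),
  dx = c(5700, 350, 1600, 2200, 600, 120, 800, 4500)
)

# Mortality rate as defined in the text: qx = dx / lx.
lt$qx <- lt$dx / lt$lx

# Hypothetical age bands (assumption): [0,25), [25,50), [50,75), [75,Inf).
age_breaks <- c(0, 25, 50, 75, Inf)
lt$AgeCategory <- cut(lt$Age, breaks = age_breaks, right = FALSE,
                      labels = paste("Category", 1:4))

# One data frame per age category, ready to be passed to a plotting function.
by_category <- split(lt, lt$AgeCategory)
print(by_category[["Category 1"]])
```

Each element of `by_category` holds the rows for one band, so a separate line plot of qx against Year can be drawn per category.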
### Mortality trend for Age Category 1 (Male)
### Mortality trend for Age Category 2 (Male)
### Mortality trend for Age Category 3 (Male)
### Mortality trend for Age Category 4 (Male)
### Mortality trend for Age Category 1 (Female)
### Mortality trend for Age Category 2 (Female)
### Mortality trend for Age Category 3 (Female)
### Mortality trend for Age Category 4 (Female)