This is an initial exploration of the Human Mortality Database, which is at https://www.mortality.org/.
Download the entire database and place it in your current working directory.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
Load the data for USA males. Add a variable country and set it to “USA”.
Select country, Year, Age and qx.
Make Age numeric.
Eliminate any missing data.
USAM <- read_table("hmd_statistics_20241105/lt_male/mltper_1x1/USA.mltper_1x1.txt", skip = 2) %>%
mutate(country = "USA") %>%
select(country, Year, Age, qx) %>%
mutate(Age = as.numeric(Age)) %>%
filter(Age < 85) %>%
rename(male_prob_death = qx) %>%
drop_na()
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
summary(USAM)
## country Year Age male_prob_death
## Length:7650 Min. :1933 Min. : 0 Min. :0.000100
## Class :character 1st Qu.:1955 1st Qu.:21 1st Qu.:0.001662
## Mode :character Median :1978 Median :42 Median :0.004540
## Mean :1978 Mean :42 Mean :0.019836
## 3rd Qu.:2000 3rd Qu.:63 3rd Qu.:0.024307
## Max. :2022 Max. :84 Max. :0.172840
Do the same for Canada.
CANM <- read_table("hmd_statistics_20241105/lt_male/mltper_1x1/CAN.mltper_1x1.txt", skip = 2) %>%
mutate(country = "Canada") %>%
select(country, Year, Age, qx) %>%
mutate(Age = as.numeric(Age)) %>%
filter(Age < 85) %>%
rename(male_prob_death = qx) %>%
drop_na()
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
Combine the two dataframes into USA_CANM using rbind().
USA_CANM = rbind(USAM, CANM)
Produce a graph showing the probability of male death at age 0 for the USA and Canada. Use color to see two time-series plots. Create this graph beginning in 1940.
USA_CANM %>%
filter(Age == 0 & Year > 1940) %>%
ggplot(aes(x = Year, y = male_prob_death, color = country)) +
geom_point() +
ggtitle("Male Infant Mortality - USA and Canada")
Create a graph comparing USA and Canadian male mortality at age 79.
Copy and modify the code above to produce USAF, CANF and USA_CANF. Do summaries to verify your work.
# Place your code here.
USAF <- read_table("hmd_statistics_20241105/lt_female/fltper_1x1/USA.fltper_1x1.txt", skip = 2) %>%
mutate(country = "USA") %>%
select(country, Year, Age, qx) %>%
mutate(Age = as.numeric(Age)) %>%
filter(Age < 85) %>%
rename(female_prob_death = qx) %>%
drop_na()
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
summary(USAM)
## country Year Age male_prob_death
## Length:7650 Min. :1933 Min. : 0 Min. :0.000100
## Class :character 1st Qu.:1955 1st Qu.:21 1st Qu.:0.001662
## Mode :character Median :1978 Median :42 Median :0.004540
## Mean :1978 Mean :42 Mean :0.019836
## 3rd Qu.:2000 3rd Qu.:63 3rd Qu.:0.024307
## Max. :2022 Max. :84 Max. :0.172840
CANF <- read_table("hmd_statistics_20241105/lt_female/fltper_1x1/CAN.fltper_1x1.txt", skip = 2) %>%
mutate(country = "Canada") %>%
select(country, Year, Age, qx) %>%
mutate(Age = as.numeric(Age)) %>%
filter(Age < 85) %>%
rename(female_prob_death = qx) %>%
drop_na()
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Age = col_character(),
## mx = col_double(),
## qx = col_double(),
## ax = col_double(),
## lx = col_double(),
## dx = col_double(),
## Lx = col_double(),
## Tx = col_double(),
## ex = col_double()
## )
## Warning in mask$eval_all_mutate(quo): NAs introduced by coercion
USA_CANF = rbind(USAF, CANF)
summary(USA_CANF)
## country Year Age female_prob_death
## Length:16320 Min. :1921 Min. : 0 Min. :0.00003
## Class :character 1st Qu.:1951 1st Qu.:21 1st Qu.:0.00069
## Mode :character Median :1974 Median :42 Median :0.00306
## Mean :1974 Mean :42 Mean :0.01348
## 3rd Qu.:1998 3rd Qu.:63 3rd Qu.:0.01371
## Max. :2022 Max. :84 Max. :0.15952
Redo the graphs you produced above for females in the USA and Canada. Do you see the same patterns?
# Place your code here.
USA_CANF %>%
filter(Age == 0 & Year > 1940) %>%
ggplot(aes(x = Year, y = female_prob_death, color = country)) +
geom_point() +
ggtitle("Female Infant Mortality - USA and Canada")
USA_CANF %>%
filter(Age == 79 & Year > 1940) %>%
ggplot(aes(x = Year, y = female_prob_death, color = country)) +
geom_point() +
ggtitle("Age 79 Female Mortality - USA and Canada")
##same patterns, modern era has reduced mortality similar in both countries with usa having more
Combine USAM and USAF into USA. This new dataframe will have both male and female probabilities of death. Run a summary to verify your work.
# Place your code here
USA = cbind(USAF, USAM)
summary(USA)
## country Year Age female_prob_death
## Length:7650 Min. :1933 Min. : 0 Min. :0.00008
## Class :character 1st Qu.:1955 1st Qu.:21 1st Qu.:0.00073
## Mode :character Median :1978 Median :42 Median :0.00297
## Mean :1978 Mean :42 Mean :0.01343
## 3rd Qu.:2000 3rd Qu.:63 3rd Qu.:0.01431
## Max. :2022 Max. :84 Max. :0.15084
## country Year Age male_prob_death
## Length:7650 Min. :1933 Min. : 0 Min. :0.000100
## Class :character 1st Qu.:1955 1st Qu.:21 1st Qu.:0.001662
## Mode :character Median :1978 Median :42 Median :0.004540
## Mean :1978 Mean :42 Mean :0.019836
## 3rd Qu.:2000 3rd Qu.:63 3rd Qu.:0.024307
## Max. :2022 Max. :84 Max. :0.172840
Compute a new variable ratio. It is the ratio of the male probability of death to the female probability. For the year 2019, plot this ratio with Age on the horizontal axis. Use geom_point().
# Combine USAM and USAF
USA <- cbind(USAF, male_prob_death = USAM$male_prob_death) %>%
mutate(ratio = male_prob_death / female_prob_death)
# Verify with summary
summary(USA)
## country Year Age female_prob_death
## Length:7650 Min. :1933 Min. : 0 Min. :0.00008
## Class :character 1st Qu.:1955 1st Qu.:21 1st Qu.:0.00073
## Mode :character Median :1978 Median :42 Median :0.00297
## Mean :1978 Mean :42 Mean :0.01343
## 3rd Qu.:2000 3rd Qu.:63 3rd Qu.:0.01431
## Max. :2022 Max. :84 Max. :0.15084
## male_prob_death ratio
## Min. :0.000100 Min. :1.000
## 1st Qu.:0.001662 1st Qu.:1.407
## Median :0.004540 Median :1.652
## Mean :0.019836 Mean :1.740
## 3rd Qu.:0.024307 3rd Qu.:1.907
## Max. :0.172840 Max. :3.523
# Plot the ratio for 2019
USA %>%
filter(Year == 2019) %>%
ggplot(aes(x = Age, y = ratio)) +
geom_point() +
ggtitle("Ratio of Male to Female Probability of Death in 2019 (USA)")
Describe what you saw in Task 4. How would you explain this?
There’s a noticeable peak in the ratio for young adults (approximately ages 20-30). This peak is particularly pronounced, showing that young adult males have a significantly higher mortality risk compared to females of the same age.Societal and behavioral factors that affect mortality risk differently for males and females. For example, the peak in young adulthood could be attributed to higher risk-taking behavior among young men, such as dangerous driving, substance abuse, or occupational hazards.