Assignment 5

Human Mortality Database

This is an initial exploration of the Human Mortality Database, which is at https://www.mortality.org/.

Download the entire database and place it in your current working directory.

Setup

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.2.0
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

USAM Data

Load the data for USA males. Add a variable country and set it to “USA”.

Select country, Year, Age and qx.

Make Age numeric.

Eliminate any missing data.

Solution

USAM <- read_table("lt_male/mltper_1x1/USA.mltper_1x1.txt", skip = 2) %>% 
mutate(country = "USA") %>% 
select(country, Year, Age, qx) %>% 
mutate(Age = as.numeric(Age)) %>% 
filter(Age < 85) %>% 
rename(male_prob_death = qx) %>% 
drop_na()

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Year = col_double(),
##   Age = col_character(),
##   mx = col_double(),
##   qx = col_double(),
##   ax = col_double(),
##   lx = col_double(),
##   dx = col_double(),
##   Lx = col_double(),
##   Tx = col_double(),
##   ex = col_double()
## )

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

summary(USAM)

##    country               Year           Age     male_prob_death   
##  Length:7650        Min.   :1933   Min.   : 0   Min.   :0.000100  
##  Class :character   1st Qu.:1955   1st Qu.:21   1st Qu.:0.001662  
##  Mode  :character   Median :1978   Median :42   Median :0.004540  
##                     Mean   :1978   Mean   :42   Mean   :0.019836  
##                     3rd Qu.:2000   3rd Qu.:63   3rd Qu.:0.024307  
##                     Max.   :2022   Max.   :84   Max.   :0.172840

Canada

Do the same for Canada.

Solution

CANM <- read_table("lt_male/mltper_1x1/CAN.mltper_1x1.txt", skip = 2) %>% 
mutate(country = "Canada") %>% 
select(country, Year, Age, qx) %>% 
mutate(Age = as.numeric(Age)) %>% 
filter(Age < 85) %>% 
rename(male_prob_death = qx) %>%
drop_na()

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Year = col_double(),
##   Age = col_character(),
##   mx = col_double(),
##   qx = col_double(),
##   ax = col_double(),
##   lx = col_double(),
##   dx = col_double(),
##   Lx = col_double(),
##   Tx = col_double(),
##   ex = col_double()
## )

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

summary(CANM)

##    country               Year           Age     male_prob_death  
##  Length:8670        Min.   :1921   Min.   : 0   Min.   :0.00005  
##  Class :character   1st Qu.:1946   1st Qu.:21   1st Qu.:0.00141  
##  Mode  :character   Median :1972   Median :42   Median :0.00386  
##                     Mean   :1972   Mean   :42   Mean   :0.01870  
##                     3rd Qu.:1997   3rd Qu.:63   3rd Qu.:0.02082  
##                     Max.   :2022   Max.   :84   Max.   :0.18114

Combine

Combine the two dataframes into USA_CANM using rbind().

Solution

USA_CANM = rbind(USAM, CANM)
summary(USA_CANM)

##    country               Year           Age     male_prob_death  
##  Length:16320       Min.   :1921   Min.   : 0   Min.   :0.00005  
##  Class :character   1st Qu.:1951   1st Qu.:21   1st Qu.:0.00152  
##  Mode  :character   Median :1974   Median :42   Median :0.00411  
##                     Mean   :1974   Mean   :42   Mean   :0.01923  
##                     3rd Qu.:1998   3rd Qu.:63   3rd Qu.:0.02257  
##                     Max.   :2022   Max.   :84   Max.   :0.18114

Male Infant Mortality USA and Canada

Produce a graph showing the probability of male death at age 0 for the USA and Canada. Use color to see two time-series plots. Create this graph beginning in 1940.

Solutiom

USA_CANM %>% 
  filter(Age == 0 & Year > 1940) %>% 
  ggplot(aes(x = Year, y = male_prob_death, color = country)) +
  geom_point() +
  ggtitle("Male Infant Mortality - USA and Canada")

USA/Canada 2

Create a graph comparing USA and Canadian male mortality at age 79.

Solution

USA_CANM %>% 
  filter(Age == 79 & Year > 1940) %>% 
  ggplot(aes(x = Year, y = male_prob_death, color = country)) +
  geom_point() +
  ggtitle("Age 79 Male Mortality - USA and Canada")

Task 1

Copy and modify the code above to produce USAF, CANF and USA_CANF. Do summaries to verify your work.

USAF <- read_table("lt_female/fltper_1x1/USA.fltper_1x1.txt", skip = 2) %>% 
mutate(country = "USA") %>% 
select(country, Year, Age, qx) %>% 
mutate(Age = as.numeric(Age)) %>% 
filter(Age < 85) %>% 
rename(female_prob_death = qx) %>% 
drop_na()

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Year = col_double(),
##   Age = col_character(),
##   mx = col_double(),
##   qx = col_double(),
##   ax = col_double(),
##   lx = col_double(),
##   dx = col_double(),
##   Lx = col_double(),
##   Tx = col_double(),
##   ex = col_double()
## )

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

summary(USAF)

##    country               Year           Age     female_prob_death
##  Length:7650        Min.   :1933   Min.   : 0   Min.   :0.00008  
##  Class :character   1st Qu.:1955   1st Qu.:21   1st Qu.:0.00073  
##  Mode  :character   Median :1978   Median :42   Median :0.00297  
##                     Mean   :1978   Mean   :42   Mean   :0.01343  
##                     3rd Qu.:2000   3rd Qu.:63   3rd Qu.:0.01431  
##                     Max.   :2022   Max.   :84   Max.   :0.15084

CANF <- read_table("lt_female/fltper_1x1/CAN.fltper_1x1.txt", skip = 2) %>% 
mutate(country = "Canada") %>% 
select(country, Year, Age, qx) %>% 
mutate(Age = as.numeric(Age)) %>% 
filter(Age < 85) %>% 
rename(female_prob_death = qx) %>%
drop_na()

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Year = col_double(),
##   Age = col_character(),
##   mx = col_double(),
##   qx = col_double(),
##   ax = col_double(),
##   lx = col_double(),
##   dx = col_double(),
##   Lx = col_double(),
##   Tx = col_double(),
##   ex = col_double()
## )

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

summary(CANF)

##    country               Year           Age     female_prob_death 
##  Length:8670        Min.   :1921   Min.   : 0   Min.   :0.000030  
##  Class :character   1st Qu.:1946   1st Qu.:21   1st Qu.:0.000650  
##  Mode  :character   Median :1972   Median :42   Median :0.003125  
##                     Mean   :1972   Mean   :42   Mean   :0.013531  
##                     3rd Qu.:1997   3rd Qu.:63   3rd Qu.:0.013325  
##                     Max.   :2022   Max.   :84   Max.   :0.159520

USA_CANF = rbind(USAF, CANF)
summary(USA_CANF)

##    country               Year           Age     female_prob_death
##  Length:16320       Min.   :1921   Min.   : 0   Min.   :0.00003  
##  Class :character   1st Qu.:1951   1st Qu.:21   1st Qu.:0.00069  
##  Mode  :character   Median :1974   Median :42   Median :0.00306  
##                     Mean   :1974   Mean   :42   Mean   :0.01348  
##                     3rd Qu.:1998   3rd Qu.:63   3rd Qu.:0.01371  
##                     Max.   :2022   Max.   :84   Max.   :0.15952

Task 2

Redo the graphs you produced above for females in the USA and Canada. Do you see the same patterns?

Same general patterns are observed, though at a lower rate compared to male data. Interestingly, male probability of death at age 79 in the US is far greater than that of the Canadian data. A discrepancy that is not present in the female data set.

USA_CANF %>% 
  filter(Age == 0 & Year > 1940) %>% 
  ggplot(aes(x = Year, y = female_prob_death, color = country)) +
  geom_point() +
  ggtitle("Female Infant Mortality - USA and Canada")

USA_CANF %>% 
  filter(Age == 79 & Year > 1940) %>% 
  ggplot(aes(x = Year, y = female_prob_death, color = country)) +
  geom_point() +
  ggtitle("Age 79 Female Mortality - USA and Canada")

Task 3: Male + Female

Combine USAM and USAF into USA. This new dataframe will have both male and female probabilities of death. Run a summary to verify your work.

USA = cbind(USAF, USAM)
summary(USA)

##    country               Year           Age     female_prob_death
##  Length:7650        Min.   :1933   Min.   : 0   Min.   :0.00008  
##  Class :character   1st Qu.:1955   1st Qu.:21   1st Qu.:0.00073  
##  Mode  :character   Median :1978   Median :42   Median :0.00297  
##                     Mean   :1978   Mean   :42   Mean   :0.01343  
##                     3rd Qu.:2000   3rd Qu.:63   3rd Qu.:0.01431  
##                     Max.   :2022   Max.   :84   Max.   :0.15084  
##    country               Year           Age     male_prob_death   
##  Length:7650        Min.   :1933   Min.   : 0   Min.   :0.000100  
##  Class :character   1st Qu.:1955   1st Qu.:21   1st Qu.:0.001662  
##  Mode  :character   Median :1978   Median :42   Median :0.004540  
##                     Mean   :1978   Mean   :42   Mean   :0.019836  
##                     3rd Qu.:2000   3rd Qu.:63   3rd Qu.:0.024307  
##                     Max.   :2022   Max.   :84   Max.   :0.172840

Task 4: The Ratio

Compute a new variable ratio. It is the ratio of the male probability of death to the female probability. For the year 2019, plot this ratio with Age on the horizontal axis. Use geom_point().

USA_19_mf_ratio <- USA %>% 
  select(country:female_prob_death, male_prob_death) %>%
  filter(Year == 2019) %>%
  mutate(m_f_ratio = male_prob_death/female_prob_death)

## Warning in x:y: numerical expression has 2 elements: only the first used

summary(USA_19_mf_ratio)

##    country               Year           Age     female_prob_death 
##  Length:85          Min.   :2019   Min.   : 0   Min.   :0.000090  
##  Class :character   1st Qu.:2019   1st Qu.:21   1st Qu.:0.000460  
##  Mode  :character   Median :2019   Median :42   Median :0.001710  
##                     Mean   :2019   Mean   :42   Mean   :0.007859  
##                     3rd Qu.:2019   3rd Qu.:63   3rd Qu.:0.008460  
##                     Max.   :2019   Max.   :84   Max.   :0.061050  
##  male_prob_death     m_f_ratio    
##  Min.   :0.00012   Min.   :1.154  
##  1st Qu.:0.00132   1st Qu.:1.442  
##  Median :0.00286   Median :1.653  
##  Mean   :0.01155   Mean   :1.741  
##  3rd Qu.:0.01416   3rd Qu.:1.893  
##  Max.   :0.08104   Max.   :2.911

ggplot(USA_19_mf_ratio, aes(x = Age, y = m_f_ratio)) +
  geom_point() +
  ggtitle("Ratio of Male:Female Mortality by Age - USA") +
  xlab("Age") +
  ylab("Ratio")

Task 5: Comments

Describe what you saw in Task 4. How would you explain this?

Given the ratio was described as male_prob/female_prob, a ratio > 1 implies that the male rate of death is greater than that of females’. The greatest discrepancy appears at approximately 21 years of age, though is significant at all ages for which there is data. Given that there is no point at which the ratio dips below 1, that is to say that female mortality probability is greater than the male probability, then this probability metric must account for the change in population by sex. Otherwise there would be a point at which there simply aren’t enough males left to keep the death rate up.

Without any data on cause of death, it is difficult to discern the reason for such result. Subjective experience would seem to indicate that 18-30 year old men are engaged in more dangerous activities, given that health problems do not typically account for a great deal of death by this point. However, this does not explain the reason for young male children to have such higher rates. Preliminary extra-curricular research seems to indicate that cause of death in this age group has higher rates for every major category - accidental, abuse, and health (cancer, usually). While some amount of cultural factors may explain the first two, I cannot imagine what would influence the third.

Assignment 5

Hayden Jones

11/05/2024

Human Mortality Database

Setup

USAM Data

Solution

Canada

Solution

Combine

Solution

Male Infant Mortality USA and Canada

Solutiom

USA/Canada 2

Solution

Task 1

Task 2

Task 3: Male + Female

Task 4: The Ratio

Task 5: Comments