HMD 1

Harold Nelson

2025-11-18

Setup

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Register

Go to https://www.mortality.org/Home/Index. Register and make the agreement which will allow you to download data.

Get Deaths Data

Go to https://www.mortality.org/Data/ZippedDataFiles and download the zip file of death counts.

Unzip the file and obtain the file Deaths_1x1.txt. Put this file in your project directory and import it as Deaths.

Rename as follows:

PopName becomes Country Female becomes Female_Deaths Male Becomes Male_Deaths

Make Age numeric Select Country, Year, Age, Female_Deaths, Male_Deaths.

Drop any rows with missing data.

Filter to keep only rows with Age <= 89.

Solution

Deaths = read_table("Deaths_1x1.txt", skip = 2)

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   PopName = col_character(),
##   Year = col_double(),
##   Age = col_character(),
##   Female = col_double(),
##   Male = col_double(),
##   Total = col_double()
## )

## Warning: 1665 parsing failures.
##   row    col               expected actual             file
## 27862 Female no trailing characters      . 'Deaths_1x1.txt'
## 27862 Male   no trailing characters      . 'Deaths_1x1.txt'
## 27862 Total  no trailing characters      . 'Deaths_1x1.txt'
## 27863 Female no trailing characters      . 'Deaths_1x1.txt'
## 27863 Male   no trailing characters      . 'Deaths_1x1.txt'
## ..... ...... ...................... ...... ................
## See problems(...) for more details.

Deaths = Deaths %>% 
  rename(Male_Deaths = Male,
         Female_Deaths = Female,
         Country = PopName) %>% 
  mutate(Age = as.numeric(Age)) %>% 
  drop_na() %>% 
  select(Country,Year,Age,Female_Deaths,Male_Deaths) %>% 
  filter(Age <= 89)

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

head(Deaths)

## # A tibble: 6 × 5
##   Country  Year   Age Female_Deaths Male_Deaths
##   <chr>   <dbl> <dbl>         <dbl>       <dbl>
## 1 AUS      1921     0         3842.       5125.
## 2 AUS      1921     1          719.        890.
## 3 AUS      1921     2          330.        359.
## 4 AUS      1921     3          166.        250.
## 5 AUS      1921     4          190.        197.
## 6 AUS      1921     5          149.        153.

tail(Deaths)

## # A tibble: 6 × 5
##   Country  Year   Age Female_Deaths Male_Deaths
##   <chr>   <dbl> <dbl>         <dbl>       <dbl>
## 1 USA      2023    84        41709.      38558.
## 2 USA      2023    85        42984.      38407.
## 3 USA      2023    86        42306.      36779.
## 4 USA      2023    87        42813.      35743.
## 5 USA      2023    88        43591.      34513.
## 6 USA      2023    89        41396.      31403.

Get Population Data

Go to https://www.mortality.org/Data/ZippedDataFiles and get the Population Estimates. Move the file Population.txt to your project and import it as Population.

Population <- read_table("Population.txt", 
    skip = 2)

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   PopName = col_character(),
##   Year = col_double(),
##   Age = col_character(),
##   Female = col_double(),
##   Male = col_double(),
##   Total = col_double()
## )

## Warning: 10656 parsing failures.
##   row    col               expected actual             file
## 28195 Female no trailing characters      . 'Population.txt'
## 28195 Male   no trailing characters      . 'Population.txt'
## 28195 Total  no trailing characters      . 'Population.txt'
## 28196 Female no trailing characters      . 'Population.txt'
## 28196 Male   no trailing characters      . 'Population.txt'
## ..... ...... ...................... ...... ................
## See problems(...) for more details.

Fix Population Data

Rename as follows:

PopName becomes Country Female becomes Female_Pop Male Becomes Male_Pop Make Age numeric. Select Country, Year, Age, Female_Pop, Male_Pop.

Drop any rows with missing data.

Filter to keep only rows with Age <= 89.

Solution

Population = Population %>% 
  rename(Country = PopName,
         Female_Pop = Female,
         Male_Pop = Male) %>% 
  mutate(Age = as.numeric(Age)) %>% 
  select(-Total) %>% 
  drop_na() %>% 
  filter(Age <= 89)

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion

Join Population and Deaths

Create Analysis. Left join Deaths to Population.

Solution

Analysis = Deaths %>% 
  inner_join(Population)

## Joining with `by = join_by(Country, Year, Age)`

Compute Probabilities

Add the variables Male_Prob and Female_Prob to Aanalysis. Also add MFRatio, the ratio of these probabilities.

Solution

Analysis = Analysis %>% 
  mutate(Male_Prob = Male_Deaths/Male_Pop,
         Female_Prob = Female_Deaths/Female_Pop,
         MFRatio = Male_Prob/Female_Prob)

Plot the Ratio for the USA.

Use 2023. Put Age on the x-axis and the ratio on the y_axis. Use geom_point() and ggplotly.

Solution

g = Analysis %>% 
  filter(Year == 2023, Country == "USA") %>% 
  ggplot(aes(Age,MFRatio)) +
  geom_point() +
  ggtitle("Ratio of Male to Female Probability of Death by Age")

ggplotly(g)