HMD 2

Harold Nelson

2025-11-25

Setup

Get Packages and Data

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

load("Analysis.Rdata")

Infant Mortality USA

Plot the probability of death for males over time for the USA. Use ggplotly.

Solution

g = Analysis %>% 
  filter(Country == "USA",Age == 0) %>% 
  ggplot(aes(Year,Male_Prob)) +
  geom_point() +
  ggtitle("Male Infant Mortality")

ggplotly(g)

Repeat for Age 80

Solution

g = Analysis %>% 
  filter(Country == "USA",Age == 80) %>% 
  ggplot(aes(Year,Male_Prob)) +
  geom_point() +
  ggtitle("Male Mortality at Age 80")

ggplotly(g)

Compare

Infant Mortality for the USA and Canada

Solution

g = Analysis %>% 
  filter(Country %in% c("USA","CAN"),
  Age == 0) %>% 
  ggplot(aes(Year,Male_Prob,color = Country)) +
  geom_point() +
  ggtitle("Male Infant Mortality")

ggplotly(g)

Repeat for Age 80

Solution

g = Analysis %>% 
  filter(Country %in% c("USA","CAN"),
  Age == 80) %>% 
  ggplot(aes(Year,Male_Prob,color = Country)) +
  geom_point() +
  ggtitle("Male Infant Mortality")

ggplotly(g)

All Ages in 2019: Compare

USA and Canada

Solution

g = Analysis %>% 
  filter(Year == 2019, 
         Country %in% c("USA","CAN")) %>% 
  ggplot(aes(Age,Male_Prob,color = Country)) +
  geom_point() +
  ggtitle("All Ages USA and Canada")

ggplotly(g)

Compare Male Mortality

All Countries, Age 80, Year = 2019. Do a Cleveland plot.

Solution

g = Analysis %>% 
  filter(Age == 80, Year == 2019) %>% 
  ggplot(aes(Male_Prob,reorder(Country,Male_Prob))) + 
  geom_point() +
  ggtitle("All Conntries 2019 Male Age 80")

ggplotly(g)

Covid Analysis

The pandemic began in 2020. Take the years 2015 - 2019 as a base period for comparison.

Create the dataframe Base. There is a row for every Country and Age. Base_Male_Prob and Base_Female_Prob are the mean values of Male_Prob and Female_Prob for the years 2015 - 2019.

Solution

Base = Analysis %>% 
  filter(Year > 2014, Year < 2020) %>% 
  group_by(Country, Age) %>% 
  summarize(Base_Male_Prob = mean(Male_Prob),
            Base_Female_Prob = mean(Female_Prob)) %>% 
  ungroup()

## `summarise()` has grouped output by 'Country'. You can override using the
## `.groups` argument.

str(Base)

## tibble [4,140 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Country         : chr [1:4140] "AUS" "AUS" "AUS" "AUS" ...
##  $ Age             : num [1:4140] 0 1 2 3 4 5 6 7 8 9 ...
##  $ Base_Male_Prob  : num [1:4140] 0.003488 0.000257 0.00016 0.000126 0.000105 ...
##  $ Base_Female_Prob: num [1:4140] 2.86e-03 2.16e-04 1.43e-04 9.66e-05 7.20e-05 ...

Join Base with Analysis

Create Covid_Analysis. Keep years > 2020

Solution

Covid_Analysis = Analysis %>% 
  filter(Year > 2020) %>% 
  inner_join(Base)

## Joining with `by = join_by(Country, Age)`

str(Covid_Analysis)

## tibble [8,730 × 13] (S3: tbl_df/tbl/data.frame)
##  $ Country         : chr [1:8730] "AUS" "AUS" "AUS" "AUS" ...
##  $ Year            : num [1:8730] 2021 2021 2021 2021 2021 ...
##  $ Age             : num [1:8730] 0 1 2 3 4 5 6 7 8 9 ...
##  $ Female_Deaths   : num [1:8730] 455 24 16 12 11 14 12 10 12 7 ...
##  $ Male_Deaths     : num [1:8730] 537 29 22 22 15 16 11 9 16 15 ...
##  $ Female_Pop      : num [1:8730] 144842 144860 146749 149118 153649 ...
##  $ Male_Pop        : num [1:8730] 152819 153464 155759 157955 162682 ...
##  $ Male_Prob       : num [1:8730] 3.51e-03 1.89e-04 1.41e-04 1.39e-04 9.22e-05 ...
##  $ Female_Prob     : num [1:8730] 3.14e-03 1.66e-04 1.09e-04 8.05e-05 7.16e-05 ...
##  $ MFRatio         : num [1:8730] 1.12 1.14 1.3 1.73 1.29 ...
##  $ MFPopRatio      : num [1:8730] 1.06 1.06 1.06 1.06 1.06 ...
##  $ Base_Male_Prob  : num [1:8730] 0.003488 0.000257 0.00016 0.000126 0.000105 ...
##  $ Base_Female_Prob: num [1:8730] 2.86e-03 2.16e-04 1.43e-04 9.66e-05 7.20e-05 ...

New Variables

Create Expected Deaths for Males and Female using Base_Male_Prob and Base_Female_Prob.

Then create Ratio and Difference of Actual and Expected Deaths for Males and Females.

Solution

Covid_Analysis = Covid_Analysis %>% 
  mutate(Expected_Deaths_Male = Base_Male_Prob * Male_Pop,
         Expected_Deaths_Female = Base_Female_Prob * Female_Pop,
         Excess_Male_Ratio = Male_Deaths/Expected_Deaths_Male,
         Excess_Male_Diff = Male_Deaths - Expected_Deaths_Male,
         Excess_Female_Ratio = Female_Deaths/Expected_Deaths_Female,
         Excess_Female_Diff = Female_Deaths - Expected_Deaths_Female)

Plot the Excess Ratio

for the USA males in 2021.

Solution

g = Covid_Analysis %>% 
  filter(Year == 2021,Country == "USA") %>% 
  ggplot(aes(Age, Excess_Male_Ratio)) + 
  geom_point() +
  ggtitle("Excess Male Ratio for 2021")

ggplotly(g)

Now the Difference

Solution

g = Covid_Analysis %>% 
  filter(Year == 2021,Country == "USA") %>% 
  ggplot(aes(Age, Excess_Male_Diff)) + 
  geom_point() +
  ggtitle("Excess Male Deaths for 2021")

ggplotly(g)