Harold Nelson
2025-11-18
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
Go to https://www.mortality.org/Home/Index. Register and accept the user agreement, which allows you to download data.
Go to https://www.mortality.org/Data/ZippedDataFiles and download the zip file of death counts.
Unzip the file and obtain the file Deaths_1x1.txt. Put this file in your project directory and import it as Deaths.
Rename as follows:
PopName becomes Country
Female becomes Female_Deaths
Male becomes Male_Deaths
Make Age numeric.
Select Country, Year, Age, Female_Deaths, Male_Deaths.
Drop any rows with missing data.
Filter to keep only rows with Age <= 89.
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## PopName = col_character(),
## Year = col_double(),
## Age = col_character(),
## Female = col_double(),
## Male = col_double(),
## Total = col_double()
## )
## Warning: 1665 parsing failures.
## row col expected actual file
## 27862 Female no trailing characters . 'Deaths_1x1.txt'
## 27862 Male no trailing characters . 'Deaths_1x1.txt'
## 27862 Total no trailing characters . 'Deaths_1x1.txt'
## 27863 Female no trailing characters . 'Deaths_1x1.txt'
## 27863 Male no trailing characters . 'Deaths_1x1.txt'
## ..... ...... ...................... ...... ................
## See problems(...) for more details.
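The parsing failures reported above all show `.` where a number was expected, which suggests the file marks missing values with a period. The import call itself is not shown here, so the following is only a minimal sketch, assuming readr's read_table() was used: passing na = "." lets those entries be read as NA instead of raising parsing failures. The small file written below is a toy stand-in for Deaths_1x1.txt, and the XYZ row is hypothetical. The real file may also need a skip argument if it begins with descriptive header lines; check the file before relying on this.

```r
library(readr)

# Toy stand-in for Deaths_1x1.txt (hypothetical rows; the real file is much larger).
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "PopName Year Age Female Male Total",
  "AUS 1921 0 3842.00 5125.00 8967.00",
  "AUS 1921 110+ 0.00 0.00 0.00",
  "XYZ 1900 0 . . ."
), toy_path)

# Column types mirror the column specification printed above.
# na = "." tells readr to treat a lone period as a missing value.
Deaths_toy <- read_table(
  toy_path,
  na = ".",
  col_types = cols(
    PopName = col_character(),
    Year    = col_double(),
    Age     = col_character(),
    Female  = col_double(),
    Male    = col_double(),
    Total   = col_double()
  )
)

Deaths_toy
```

With the missing values read as NA, the later drop_na() step removes those rows in the same way.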
Deaths = Deaths %>%
  rename(Male_Deaths = Male,
         Female_Deaths = Female,
         Country = PopName) %>%
  mutate(Age = as.numeric(Age)) %>%
  drop_na() %>%
  select(Country, Year, Age, Female_Deaths, Male_Deaths) %>%
  filter(Age <= 89)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion
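The coercion warning is expected whenever Age holds a label that is not a plain number. A minimal sketch, assuming the offending value is an open-ended age group such as "110+" (an assumption; inspect the Age column to confirm): as.numeric() turns such labels into NA, and those rows then fall out at drop_na() or at the Age <= 89 filter.

```r
# Hypothetical Age values; "110+" stands in for an open-ended age group.
age_chr <- c("0", "89", "90", "110+")

# as.numeric() returns NA (with a warning) for anything that is not a number.
age_num <- suppressWarnings(as.numeric(age_chr))
age_num

# Which original labels failed to convert?
age_chr[is.na(age_num)]

# Values that survive dropping NA and keeping Age <= 89.
age_num[!is.na(age_num) & age_num <= 89]
```

Because the analysis keeps only ages up to 89, losing the open-ended group does not affect the result.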
## # A tibble: 6 × 5
## Country Year Age Female_Deaths Male_Deaths
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AUS 1921 0 3842. 5125.
## 2 AUS 1921 1 719. 890.
## 3 AUS 1921 2 330. 359.
## 4 AUS 1921 3 166. 250.
## 5 AUS 1921 4 190. 197.
## 6 AUS 1921 5 149. 153.
## # A tibble: 6 × 5
## Country Year Age Female_Deaths Male_Deaths
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 USA 2023 84 41709. 38558.
## 2 USA 2023 85 42984. 38407.
## 3 USA 2023 86 42306. 36779.
## 4 USA 2023 87 42813. 35743.
## 5 USA 2023 88 43591. 34513.
## 6 USA 2023 89 41396. 31403.
Go to https://www.mortality.org/Data/ZippedDataFiles and download the Population Estimates. Move the file Population.txt to your project directory and import it as Population.
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## PopName = col_character(),
## Year = col_double(),
## Age = col_character(),
## Female = col_double(),
## Male = col_double(),
## Total = col_double()
## )
## Warning: 10656 parsing failures.
## row col expected actual file
## 28195 Female no trailing characters . 'Population.txt'
## 28195 Male no trailing characters . 'Population.txt'
## 28195 Total no trailing characters . 'Population.txt'
## 28196 Female no trailing characters . 'Population.txt'
## 28196 Male no trailing characters . 'Population.txt'
## ..... ...... ...................... ...... ................
## See problems(...) for more details.
Rename as follows:
PopName becomes Country
Female becomes Female_Pop
Male becomes Male_Pop
Make Age numeric.
Select Country, Year, Age, Female_Pop, Male_Pop.
Drop any rows with missing data.
Filter to keep only rows with Age <= 89.
Population = Population %>%
  rename(Country = PopName,
         Female_Pop = Female,
         Male_Pop = Male) %>%
  mutate(Age = as.numeric(Age)) %>%
  select(-Total) %>%
  drop_na() %>%
  filter(Age <= 89)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Age = as.numeric(Age)`.
## Caused by warning:
## ! NAs introduced by coercion
Create Analysis. Left join Deaths to Population.
## Joining with `by = join_by(Country, Year, Age)`
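The join code is not shown, so here is a minimal sketch of one way to do it, assuming Deaths is the left table (the wording "Left join Deaths to Population" could be read either way). The toy tables and the country code AAA are hypothetical. Naming the keys with join_by() makes the join explicit and suppresses the "Joining with" message.

```r
library(dplyr)

# Toy tables with hypothetical values, shaped like the cleaned Deaths and Population.
Deaths_toy <- tibble(
  Country = c("AAA", "AAA"),
  Year = c(2023, 2023),
  Age = c(0, 1),
  Female_Deaths = c(10, 2),
  Male_Deaths = c(12, 3)
)
Population_toy <- tibble(
  Country = c("AAA", "AAA", "AAA"),
  Year = c(2023, 2023, 2023),
  Age = c(0, 1, 2),
  Female_Pop = c(1000, 1000, 1000),
  Male_Pop = c(1000, 1000, 1000)
)

# Deaths is the left table, so every Deaths row is kept and
# Population rows without a matching Deaths row (Age 2 here) are dropped.
Analysis_toy <- Deaths_toy %>%
  left_join(Population_toy, by = join_by(Country, Year, Age))

Analysis_toy
```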
Add the variables Male_Prob and Female_Prob to Analysis. Also add MFRatio, the ratio of these probabilities.
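The text does not define the probabilities, so the sketch below assumes the simplest reading: the probability of death is approximated by deaths divided by population for each sex, and MFRatio is the male value divided by the female value. The toy rows are hypothetical.

```r
library(dplyr)

# Hypothetical joined data, shaped like Analysis.
Analysis_toy <- tibble(
  Country = "AAA",
  Year = 2023,
  Age = c(0, 1),
  Female_Deaths = c(10, 2),
  Male_Deaths = c(12, 3),
  Female_Pop = c(1000, 1000),
  Male_Pop = c(1000, 1000)
)

# Assumed definitions: deaths / population per sex, then male over female.
Analysis_toy <- Analysis_toy %>%
  mutate(
    Male_Prob = Male_Deaths / Male_Pop,
    Female_Prob = Female_Deaths / Female_Pop,
    MFRatio = Male_Prob / Female_Prob
  )

Analysis_toy
```

A ratio above 1 means the male death probability exceeds the female one at that age.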
Use the data for 2023. Put Age on the x-axis and the ratio on the y-axis. Use geom_point() and ggplotly().
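One possible shape for this plot, as a minimal sketch on hypothetical data: filter to 2023, build the scatterplot with ggplot2, then pass the plot object to ggplotly() to make it interactive. The tibble and its values are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(plotly)

# Hypothetical data with only the columns the plot needs.
Analysis_toy <- tibble(
  Country = "AAA",
  Year = c(2022, 2023, 2023, 2023),
  Age = c(0, 0, 1, 2),
  MFRatio = c(1.1, 1.2, 1.5, 1.3)
)

# Keep 2023 only, then plot the ratio against age.
p <- Analysis_toy %>%
  filter(Year == 2023) %>%
  ggplot(aes(x = Age, y = MFRatio)) +
  geom_point()

# Convert the static ggplot into an interactive plotly widget.
ggplotly(p)
```

With the real data every country appears in the same plot, so mapping Country to colour inside aes() may make the hover labels easier to read.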