wpp2024 is distributed through GitHub, not
CRAN, because the package data exceed CRAN size limits. Install
it once with remotes (or pak).
```r
install.packages(c("remotes", "data.table", "dplyr", "tidyr", "ggplot2", "scales"))
remotes::install_github("PPgp/wpp2024") # required
remotes::install_github("PPgp/wpp2024extra") # optional companion indicators
```
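If you prefer pak, the alternative mentioned above, a guarded one-off install might look like the sketch below. The `requireNamespace()` check makes it a no-op once wpp2024 is already installed, so it is safe to leave in a setup chunk.

```r
# Sketch: one-off install of wpp2024 with pak instead of remotes.
# requireNamespace() returns FALSE (quietly) when a package is not installed yet.
if (!requireNamespace("wpp2024", quietly = TRUE)) {
  if (!requireNamespace("pak", quietly = TRUE)) install.packages("pak")
  pak::pak("PPgp/wpp2024")  # "owner/repo" is pak's shorthand for a GitHub package
}
```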
```r
library(wpp2024) # UN World Population Prospects 2024 data, loaded as R objects
library(data.table) # fast tables; we coerce UNlocations to this below
library(dplyr) # data verbs: filter(), mutate(), select(), the join functions
library(tidyr) # reshaping: pivot_longer() turns wide columns into long format
library(ggplot2) # plotting
library(scales) # axis-label helpers, e.g. label_comma() for thousands separators
```
Goal. Load the annual wpp2024 indicator
tables, use UNlocations to resolve country names to M49
codes (the same M49 standard introduced in Lesson 1.1), and plot total
fertility, life expectancy at birth, and total population for three
countries at different stages of the demographic transition.
wpp2024 exposes its annual estimates as
long-format data.tables whose names end in
1dt (the 1 means single-year). Their column layouts, as
verified against the package, are:
| Object | Columns (after lazy-load) | Years |
|---|---|---|
| tfr1dt | country_code, name, year, tfr | 1950–2023 |
| e01dt | country_code, name, year, e0M, e0F, e0B | 1950–2023 |
| pop1dt | country_code, name, year, popM, popF, pop | 1950–2023 |
| UNlocations | name, country_code, reg_code, area_name, ... | n/a |
The combined-sex life expectancy is e0B (not
e0), and the annual estimate series ends in
2023; 2024 is the first projection year (tables
tfrproj1dt, e0proj1dt,
popproj1dt).
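Because a new package release can rename or drop a column, a quick check along these lines is cheap insurance. It is a sketch that re-derives the column table above and the 2023/2024 estimate/projection split from whatever version of wpp2024 is installed; it assumes the projection table has a `year` column like the estimate tables.

```r
library(wpp2024)

data(tfr1dt, e01dt, pop1dt, tfrproj1dt)

# The column layouts listed in the table above
stopifnot(
  all(c("country_code", "name", "year", "tfr") %in% names(tfr1dt)),
  all(c("country_code", "name", "year", "e0M", "e0F", "e0B") %in% names(e01dt)),
  all(c("country_code", "name", "year", "popM", "popF", "pop") %in% names(pop1dt))
)

range(tfr1dt$year)    # annual estimates: expected to end in 2023
min(tfrproj1dt$year)  # projections: expected to start in 2024
```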
```r
data(tfr1dt) # load the Total Fertility Rate table (lazy-loaded from the package)
data(e01dt) # load life expectancy at birth: e0M (male), e0F (female), e0B (both sexes)
data(pop1dt) # load total population: popM, popF, and pop (both sexes), in thousands
data(UNlocations) # load the table that maps country names to M49 codes
```
UNlocations ships as a data.frame; we
coerce it to a data.table to use the filter-and-select
idiom, then look up the three comparison countries.
```r
# YOUR TURN (optional): change these three countries to ones you work with or want to compare.
comparison <- c("Niger", "India", "Japan") # a vector of three country names (early-, mid-, post-transition)
loc <- as.data.table(UNlocations) # copy UNlocations as a data.table so we can use its [filter, select] syntax
codes <- loc[name %in% comparison, # keep only rows whose name is one of our three countries ...
             .(name, country_code)] # ... and return just two columns: the name and its M49 code
codes # print it, so you can see the codes (Niger 562, India 356, Japan 392)
keep <- codes$country_code # pull the M49 codes out as a plain vector (row order follows UNlocations)
```
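One caveat with this lookup: `%in%` silently drops a name that does not match UNlocations exactly (a typo or a variant spelling), and `codes` then has fewer than three rows with no warning. A small helper like the sketch below makes the mismatch fail loudly instead; `lookup_m49()` is a hypothetical name, not a wpp2024 function.

```r
library(data.table)

# Hypothetical helper: look up M49 codes and stop on any unmatched name,
# instead of letting %in% silently drop a misspelled country.
lookup_m49 <- function(country_names, locations) {
  loc <- as.data.table(locations)
  found <- loc[name %in% country_names, .(name, country_code)]
  unmatched <- setdiff(country_names, found$name)
  if (length(unmatched) > 0) {
    stop("No UNlocations match for: ", paste(unmatched, collapse = ", "))
  }
  found
}

# Usage with the real table: lookup_m49(comparison, UNlocations)
```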
We pull one value column from each table, label it, stack the three
indicators into one long frame, and facet. Keeping everything long means
a single ggplot call draws all three panels for all three
countries.
```r
tfr <- as_tibble(tfr1dt) |> # start from the TFR table (as a data frame)
  filter(country_code %in% keep) |> # keep only our three countries
  transmute(name, year, # keep name and year, and build two new columns:
            indicator = "Total fertility rate", # a constant label naming this indicator
            value = tfr) # the TFR number, renamed to a generic "value"
e0 <- as_tibble(e01dt) |> # same pattern for life expectancy
  filter(country_code %in% keep) |>
  transmute(name, year,
            indicator = "Life expectancy at birth (e0, both sexes)",
            value = e0B) # use the both-sexes column e0B (not e0)
pop <- as_tibble(pop1dt) |> # same pattern for total population
  filter(country_code %in% keep) |>
  transmute(name, year,
            indicator = "Total population (thousands)",
            value = pop) # pop = total population in thousands
trends <- bind_rows(tfr, e0, pop) |> # stack the three labelled tables into one long table
  filter(year >= 1950, year <= 2023) |> # keep the estimate years only (1950-2023)
  mutate(indicator = factor(indicator, levels = c( # set panel order by making indicator an ordered factor
    "Total fertility rate",
    "Life expectancy at birth (e0, both sexes)",
    "Total population (thousands)"
  )))
```
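The three pipelines above differ only in the source table, the value column, and the label. If you add more indicators, a helper along these lines keeps the pattern in one place; `pull_indicator()` is a hypothetical name, and the sketch uses dplyr's `.data` pronoun to pick the value column by its name.

```r
library(dplyr)

# Hypothetical helper: filter one wpp2024 table to the chosen countries and
# return the long columns name, year, indicator, value.
pull_indicator <- function(dt, value_col, label, codes) {
  as_tibble(dt) |>
    filter(country_code %in% codes) |>
    transmute(name, year,
              indicator = label,           # constant label, recycled to every row
              value = .data[[value_col]])  # value column chosen by its name (a string)
}

# Usage, equivalent to the tfr / e0 / pop pipelines above:
# trends <- bind_rows(
#   pull_indicator(tfr1dt, "tfr", "Total fertility rate", keep),
#   pull_indicator(e01dt,  "e0B", "Life expectancy at birth (e0, both sexes)", keep),
#   pull_indicator(pop1dt, "pop", "Total population (thousands)", keep)
# )
```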
```r
ggplot(trends, aes(year, value, colour = name)) + # base plot: x = year, y = value, one colour per country
  geom_line(linewidth = 0.8) + # draw a line for each country in each panel
  facet_wrap(~ indicator, ncol = 1, scales = "free_y") + # one stacked panel per indicator, each with its own y-scale
  scale_y_continuous(labels = label_comma()) + # format y-axis numbers with thousands separators
  scale_x_continuous(breaks = seq(1950, 2020, 10)) + # an x-axis tick every 10 years
  labs( # titles and labels:
    title = "Three countries at different transition stages, 1950-2023",
    subtitle = "Annual WPP 2024 estimates",
    x = NULL, y = NULL, colour = "Country", # drop axis titles; title the legend "Country"
    caption = "Source: United Nations, WPP 2024 (wpp2024 R package)."
  ) +
  theme_minimal(base_size = 12) + # clean theme, base font size 12
  theme(legend.position = "top", # legend on top
        strip.text = element_text(face = "bold")) # bold the panel (facet) titles
```
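The three panels should show the textbook sequence: Niger still early in its fertility decline, with the fastest population growth; India mid-transition; and Japan post-transition, with sub-replacement fertility, the highest life expectancy, and a population that has peaked and begun to shrink. Because India's population is tens of times larger than Niger's, the linear population axis flattens the two smaller countries' lines, so judge growth speed relative to each country's own size rather than from the raw levels.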
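Since the population panel shows levels, the speed of growth is easier to compare as a rate. One common summary is the average annual exponential growth rate, r = ln(P_end / P_start) / (t_end - t_start). The sketch below computes it per country from `trends`; `annual_growth()` is a hypothetical helper, not part of wpp2024.

```r
library(dplyr)

# Hypothetical helper: average annual exponential growth rate per country,
# r = ln(P_end / P_start) / (t_end - t_start), using the first and last year present.
annual_growth <- function(pop_long) {
  pop_long |>
    group_by(name) |>
    arrange(year, .by_group = TRUE) |>  # sort years within each country
    summarise(
      r = log(last(value) / first(value)) / (last(year) - first(year)),
      .groups = "drop"
    ) |>
    mutate(r_pct = 100 * r)             # the same rate, in percent per year
}

# Usage: annual_growth(filter(trends, indicator == "Total population (thousands)"))
```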
Goal. For the reference country (Uganda, M49 800),
compile a tidy annual profile — TFR, CBR, e0, CDR — for
1970–2023, draw a multi-panel summary, and annotate national events that
plausibly left marks in the series. All four indicators are distributed
in wpp2024, so this exercise needs no external data
files.
| Indicator | Source object | Column | Package |
|---|---|---|---|
| TFR | tfr1dt | tfr | wpp2024 |
| e0 | e01dt | e0B | wpp2024 |
| CBR | misc1dt | cbr | wpp2024 |
| CDR | misc1dt | cdr | wpp2024 |
misc1dt also carries cnmr (crude
net-migration rate), growthrate, and
births/deaths counts, so a fifth panel
(e.g. net migration) can be added by pulling one more column.
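A sketch of that fifth panel follows. It falls back to Uganda (800) and 1970–2023 when `params` does not exist, which is an assumption chosen to match the defaults described below; the `cnmr` column name is the one given above.

```r
library(wpp2024)
library(dplyr)
data(misc1dt)

# Use the knit parameters when available, otherwise the lab defaults (assumed)
cc  <- if (exists("params")) params$cc else 800
yrs <- if (exists("params")) params$y_min:params$y_max else 1970:2023

# Crude net-migration rate for the reference country, one row per year
netmig <- as_tibble(misc1dt) |>
  filter(country_code == cc, year %in% yrs) |>
  select(year, CNMR = cnmr)

# To plot it, join it onto the profile and extend the panel order, e.g.
# profile <- left_join(profile, netmig, by = "year")
# ... factor(indicator, levels = c("TFR", "CBR", "CDR", "e0", "CNMR"))
```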
```r
data(misc1dt) # load the "misc" table: crude birth rate (cbr), crude death rate (cdr), and more
# YOUR TURN (optional): to profile your own country, edit cc/country in the YAML header above
# (used when you Knit) or in the setup-chunk fallback (used when you run chunks interactively).
cc <- params$cc # the reference country's M49 code (from params; 800 = Uganda by default)
yrs <- params$y_min:params$y_max # a sequence of years: 1970, 1971, ..., 2023
profile <- as_tibble(tfr1dt) |> # start from TFR, for our single country
  filter(country_code == cc, year %in% yrs) |> # keep that country and the chosen years
  select(year, TFR = tfr) |> # keep year and TFR (renamed)
  left_join(as_tibble(e01dt) |> filter(country_code == cc) |> select(year, e0 = e0B), by = "year") |> # add e0, matched on year
  left_join(as_tibble(misc1dt) |> filter(country_code == cc) |> select(year, CBR = cbr, CDR = cdr), by = "year") # add CBR & CDR, matched on year
profile # print the assembled table: one row per year, columns TFR, e0, CBR, CDR
```
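The interpretation below turns on the gap between the crude birth and death rates, the rate of natural increase (RNI). A sketch that makes it an explicit column follows; `add_rni()` is a hypothetical helper, and it assumes, as is standard for WPP crude rates, that CBR and CDR are per 1,000 population, so dividing by 10 gives percent per year.

```r
library(dplyr)

# Hypothetical helper: rate of natural increase = CBR - CDR.
# Assumes both crude rates are per 1,000 population.
add_rni <- function(profile) {
  profile |>
    mutate(RNI     = CBR - CDR,         # per 1,000 population
           RNI_pct = (CBR - CDR) / 10)  # the same gap, in percent per year
}

# Usage: add_rni(profile); adding "RNI" to the factor levels gives it its own panel.
```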
National events are illustrative placeholders for Uganda; reviewers and learners should confirm or replace them. The plot reshapes the profile to long form so each indicator gets its own free-scaled panel, with one vertical reference line per event.
```r
# YOUR TURN: replace these placeholder events with real events for your country (year + short label).
events <- tibble::tribble( # build a small 2-column table the easy, row-by-row way
  ~year, ~label, # column names: the event year, and a short text label
  1986, "1986 - political stabilization",
  1992, "1992 - peak HIV/AIDS mortality",
  2004, "2004 - national ART scale-up"
)
profile_long <- profile |> # reshape the wide profile (one column per indicator) ...
  pivot_longer(-year, names_to = "indicator", values_to = "value") |> # ... into long form: columns year, indicator, value
  filter(!is.na(value)) |> # drop missing values
  mutate(indicator = factor(indicator, levels = c("TFR", "CBR", "CDR", "e0"))) # set panel order
```
```r
ggplot(profile_long, aes(year, value)) + # base plot: x = year, y = value
  geom_vline(data = events, aes(xintercept = year), # one dashed vertical line per event year
             linetype = "dashed", colour = "grey60") +
  geom_line(linewidth = 0.8, colour = "#1f4e79") + # the indicator line, in dark blue
  facet_wrap(~ indicator, scales = "free_y") + # one panel per indicator, each with its own y-scale
  labs( # titles/labels (the title is built from params):
    title = paste0("Demographic profile: ", params$country,
                   ", ", params$y_min, "-", params$y_max),
    x = NULL, y = NULL,
    caption = "Source: WPP 2024 (wpp2024 R package): tfr1dt, e01dt, misc1dt."
  ) +
  theme_minimal(base_size = 12) +
  theme(strip.text = element_text(face = "bold")) # bold the panel titles
```
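The dashed lines mark the event years but do not say which event is which. One way to label them is a text layer drawn from the same `events` table, sketched below; `event_labels()` is a hypothetical helper. Mapping `y = Inf` pins each label to the top edge of every panel regardless of its free y-scale, and long labels may crowd small panels, so `size` is worth adjusting.

```r
library(ggplot2)

# Hypothetical helper: rotated event labels at the top of every facet.
# inherit.aes = FALSE stops the layer from looking for the plot's `value` column.
event_labels <- function(events) {
  geom_text(data = events, aes(x = year, y = Inf, label = label),
            inherit.aes = FALSE, angle = 90, hjust = 1, vjust = -0.4,
            size = 3, colour = "grey40")
}

# Usage: last_plot() + event_labels(events)
```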
| Year | Annotated event |
|---|---|
| 1986 | 1986 - political stabilization |
| 1992 | 1992 - peak HIV/AIDS mortality |
| 2004 | 2004 - national ART scale-up |
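Because this table repeats the `events` tibble by hand, the two can drift apart when you substitute your own events. A sketch that generates it from `events` instead uses `knitr::kable()`; `events_table()` is a hypothetical wrapper.

```r
# Hypothetical wrapper: render the events tibble as the "Year | Annotated event" table,
# so the table and the dashed lines on the plot always come from the same data.
events_table <- function(events) {
  knitr::kable(events, col.names = c("Year", "Annotated event"))
}

# Usage (in a chunk of the R Markdown document): events_table(events)
```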
Learner deliverable: 5–10 sentences. Reference text below.
For Uganda, the profile shows a classic but incomplete demographic transition. The crude death rate falls over the period as a whole, apart from an interruption around the HIV/AIDS peak, while the crude birth rate stays high into the 2000s, so the gap between them (natural increase) widens before it narrows, which is exactly the engine of rapid population growth. Total fertility begins its decline only later and from a high level, lagging the mortality improvement by a generation. Life expectancy at birth rises overall but carries a visible setback around the early 1990s that coincides with peak HIV/AIDS mortality, the clearest case in the profile of a national event leaving a mark in a demographic series. Reading the series against the annotated events is the habit the lab is meant to build: demographic indicators are not free-floating curves but the population's record of its own history.