1 Setup and installation

wpp2024 is distributed through GitHub, not CRAN, because the package data exceed CRAN size limits. Install it once with remotes (or pak).

install.packages(c("remotes", "data.table", "dplyr", "tidyr", "ggplot2", "scales"))
remotes::install_github("PPgp/wpp2024")        # required
remotes::install_github("PPgp/wpp2024extra")   # optional companion indicators
library(wpp2024)     # UN World Population Prospects 2024 data, loaded as R objects
library(data.table)  # fast tables; we coerce UNlocations to this below
library(dplyr)       # data verbs: filter(), mutate(), select(), the join functions
library(tidyr)       # reshaping: pivot_longer() turns wide columns into long format
library(ggplot2)     # plotting
library(scales)      # axis-label helpers, e.g. label_comma() for thousands separators
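
Before starting the exercises, it can help to confirm that every package is actually available. The following is a minimal sketch in base R; the names `needed`, `installed`, and `missing_pkgs` are illustrative and not part of the lab.

```r
# Packages the lab loads (taken from the library() calls above).
needed <- c("wpp2024", "data.table", "dplyr", "tidyr", "ggplot2", "scales")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it.
installed    <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All lab packages are available.")
}
```

If anything is reported as not installed, rerun the matching install line above before continuing.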

2 Exercise B1 — Load, look up M49 codes, and compare three countries

Goal. Load the annual wpp2024 indicator tables, use UNlocations to resolve country names to M49 codes (the same M49 standard introduced in Lesson 1.1), and plot total fertility, life expectancy at birth, and total population for three countries at different stages of the demographic transition.

2.1 Load the data

wpp2024 exposes its annual estimates as long-format data.tables whose names end in 1dt (the 1 means single-year). Column maps, as verified against the package:

Object       Columns (after lazy-load)                    Years
tfr1dt       country_code, name, year, tfr                1950–2023
e01dt        country_code, name, year, e0M, e0F, e0B      1950–2023
pop1dt       country_code, name, year, popM, popF, pop    1950–2023
UNlocations  name, country_code, reg_code, area_name, ...

The combined-sex life expectancy is e0B (not e0), and the annual estimate series ends in 2023; 2024 is the first projection year (tables tfrproj1dt, e0proj1dt, popproj1dt).

data(tfr1dt)        # load the Total Fertility Rate table (lazy-loaded from the package)
data(e01dt)         # load life expectancy at birth: e0M (male), e0F (female), e0B (both sexes)
data(pop1dt)        # load total population: popM, popF, and pop (both sexes), in thousands
data(UNlocations)   # load the table that maps country names to M49 codes
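
A quick structural check after loading can catch a mismatch between the column map above and the installed package version. This is a sketch only: `missing_columns()` is a hypothetical helper, the toy frame is hand-built (not WPP data), and the wpp2024 part runs only if the package is installed.

```r
# Hypothetical helper: which of the expected columns are absent from a table?
missing_columns <- function(x, expected) setdiff(expected, names(x))

# Toy demonstration on a hand-built frame that lacks a tfr column.
toy <- data.frame(country_code = 392L, name = "Japan", year = 2000L)
missing_columns(toy, c("country_code", "name", "year", "tfr"))   # returns "tfr"

# Against the real tables, assuming the column names listed in the table above.
if (requireNamespace("wpp2024", quietly = TRUE)) {
  data("tfr1dt", "e01dt", "pop1dt", package = "wpp2024")
  # Each call prints character(0) when every listed column is present.
  print(missing_columns(tfr1dt, c("country_code", "name", "year", "tfr")))
  print(missing_columns(e01dt,  c("country_code", "name", "year", "e0M", "e0F", "e0B")))
  print(missing_columns(pop1dt, c("country_code", "name", "year", "popM", "popF", "pop")))
  print(range(tfr1dt$year))   # compare with the 1950 to 2023 span stated above
}
```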

2.2 Look up M49 codes

UNlocations ships as a data.frame; we coerce it to a data.table to use the filter-and-select idiom, then look up the three comparison countries.

# YOUR TURN (optional): change these three countries to ones you work with or want to compare.
comparison <- c("Niger", "India", "Japan")   # a vector of three country names (early-, mid-, post-transition)

loc   <- as.data.table(UNlocations)          # copy UNlocations as a data.table so we can use its [filter, select] syntax
codes <- loc[name %in% comparison,           # keep only rows whose name is one of our three countries ...
             .(name, country_code)]          # ... and return just two columns: the name and its M49 code
codes                                        # print it, so you can see the codes (Niger 562, India 356, Japan 392)
keep <- codes$country_code   # pull the M49 codes out as a plain vector: 562, 356, 392, in UNlocations row order (not the order of `comparison`)
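
A filter with `%in%` returns rows in the lookup table's own order and silently drops any name that has no exact match. Where the order or completeness of the codes matters, a stricter lookup may be useful. The sketch below uses a hypothetical helper, `lookup_m49()`, and a three-row stand-in for UNlocations that contains only the codes quoted above.

```r
# Hypothetical helper: resolve names to M49 codes in the order requested,
# and stop with an error if any name has no exact match.
lookup_m49 <- function(countries, locations) {
  idx <- match(countries, locations$name)        # row position of each name (NA if absent)
  if (anyNA(idx)) {
    stop("No exact match for: ", paste(countries[is.na(idx)], collapse = ", "))
  }
  stats::setNames(locations$country_code[idx], countries)
}

# Stand-in for UNlocations (only the two columns the helper needs).
toy_loc <- data.frame(
  name         = c("India", "Japan", "Niger"),
  country_code = c(356L, 392L, 562L)
)

lookup_m49(c("Niger", "India", "Japan"), toy_loc)   # codes come back as 562, 356, 392
```

With the real table, the equivalent call would be `lookup_m49(comparison, UNlocations)`; the result is a named vector and works with `%in%` exactly as `keep` does.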

2.3 Assemble a tidy long frame and plot

We pull one value column from each table, label it, stack the three indicators into one long frame, and facet. Keeping everything long means a single ggplot call draws all three panels for all three countries.

tfr <- as_tibble(tfr1dt) |>                                       # start from the TFR table (as a data frame)
  filter(country_code %in% keep) |>                              # keep only our three countries
  transmute(name, year,                                          # keep name and year, and build two new columns:
            indicator = "Total fertility rate",                  #   a constant label naming this indicator
            value     = tfr)                                     #   the TFR number, renamed to a generic "value"

e0 <- as_tibble(e01dt) |>                                        # same pattern for life expectancy
  filter(country_code %in% keep) |>
  transmute(name, year,
            indicator = "Life expectancy at birth (e0, both sexes)",
            value     = e0B)                                     # use the both-sexes column e0B (not e0)

pop <- as_tibble(pop1dt) |>                                      # same pattern for total population
  filter(country_code %in% keep) |>
  transmute(name, year,
            indicator = "Total population (thousands)",
            value     = pop)                                     # pop = total population in thousands

trends <- bind_rows(tfr, e0, pop) |>                             # stack the three labelled tables into one long table
  filter(year >= 1950, year <= 2023) |>                          # keep the estimate years only (1950-2023)
  mutate(indicator = factor(indicator, levels = c(               # fix the panel order by giving the factor explicit levels
    "Total fertility rate",
    "Life expectancy at birth (e0, both sexes)",
    "Total population (thousands)"
  )))
ggplot(trends, aes(year, value, colour = name)) +                # base plot: x = year, y = value, one colour per country
  geom_line(linewidth = 0.8) +                                   # draw a line for each country in each panel
  facet_wrap(~ indicator, ncol = 1, scales = "free_y") +         # one stacked panel per indicator, each with its own y-scale
  scale_y_continuous(labels = label_comma()) +                   # format y-axis numbers with thousands separators
  scale_x_continuous(breaks = seq(1950, 2020, 10)) +             # an x-axis tick every 10 years
  labs(                                                          # titles and labels:
    title    = "Three countries at different transition stages, 1950-2023",
    subtitle = "Annual WPP 2024 estimates",
    x = NULL, y = NULL, colour = "Country",                      # drop axis titles; title the legend "Country"
    caption  = "Source: United Nations, WPP 2024 (wpp2024 R package)."
  ) +
  theme_minimal(base_size = 12) +                                # clean theme, base font size 12
  theme(legend.position = "top",                                 # legend on top
        strip.text = element_text(face = "bold"))                # bold the panel (facet) titles

The three panels should show the textbook sequence: Niger still early in fertility decline with the fastest population growth, India mid-transition, and Japan post-transition with sub-replacement fertility, the highest life expectancy, and a population that peaked around 2010 and has been shrinking since.
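
The three blocks that build `tfr`, `e0`, and `pop` in 2.3 follow one pattern: filter to the chosen codes, keep `name` and `year`, attach a label, and rename the value column. That pattern could be factored into a helper along these lines. `as_indicator()` is a hypothetical name, and the toy table holds made-up values (not WPP data) shaped like tfr1dt.

```r
library(dplyr)

# Hypothetical helper: reduce one indicator table to the generic columns
# name, year, indicator, value for the requested country codes.
as_indicator <- function(tbl, codes, value_col, label) {
  as_tibble(tbl) |>
    filter(country_code %in% codes) |>
    transmute(name, year, indicator = label, value = {{ value_col }})
}

# Toy table with made-up values, for demonstration only.
toy_tfr <- tibble(
  country_code = c(392, 392, 562, 999),
  name         = c("Japan", "Japan", "Niger", "Elsewhere"),
  year         = c(2000, 2001, 2000, 2000),
  tfr          = c(1.0, 1.1, 7.0, 3.0)
)

as_indicator(toy_tfr, c(392, 562), tfr, "Total fertility rate")
```

With the real tables, `bind_rows(as_indicator(tfr1dt, keep, tfr, "Total fertility rate"), as_indicator(e01dt, keep, e0B, "Life expectancy at birth (e0, both sexes)"), as_indicator(pop1dt, keep, pop, "Total population (thousands)"))` would give the same stacked frame as the three explicit blocks.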


3 Exercise B2 — Single-country demographic profile as a reproducible report

Goal. For the reference country (Uganda, M49 800), compile a tidy annual profile — TFR, CBR, e0, CDR — for 1970–2023, draw a multi-panel summary, and annotate national events that plausibly left marks in the series. All four indicators are distributed in wpp2024, so this exercise needs no external data files.

3.1 Indicator sources

Indicator  Source object  Column  Package
TFR        tfr1dt         tfr     wpp2024
e0         e01dt          e0B     wpp2024
CBR        misc1dt        cbr     wpp2024
CDR        misc1dt        cdr     wpp2024

misc1dt also carries cnmr (crude net-migration rate), growthrate, and births/deaths counts, so a fifth panel (e.g. net migration) can be added by pulling one more column.

3.2 Assemble the profile

data(misc1dt)   # load the "misc" table: crude birth rate (cbr), crude death rate (cdr), and more

# Country and period settings for this profile.
# When you KNIT, the YAML `params:` header at the top is used. When you run chunks
# interactively, `params` may not exist - so each line falls back to a default with exists().
# This is what prevents the "object 'params' not found" error if the setup chunk has not run.
# YOUR TURN (optional): change these to profile your own country (or edit the YAML header).
cc      <- if (exists("params")) params$cc      else 800      # M49 code (800 = Uganda; 562 = Niger; 392 = Japan)
country <- if (exists("params")) params$country else "Uganda" # display name, used in the plot title
y_min   <- if (exists("params")) params$y_min   else 1970     # first year of the profile
y_max   <- if (exists("params")) params$y_max   else 2023     # last estimate year (2024 is the first projection year)
yrs     <- y_min:y_max                                        # the sequence of years: 1970, 1971, ..., 2023
# Make sure the country/period settings exist, in case this chunk is run on its own
# (when knitting, they come from the YAML `params` header; otherwise from these defaults):
if (!exists("cc"))    cc    <- if (exists("params")) params$cc    else 800
if (!exists("y_min")) y_min <- if (exists("params")) params$y_min else 1970
if (!exists("y_max")) y_max <- if (exists("params")) params$y_max else 2023
if (!exists("yrs"))   yrs   <- y_min:y_max

profile <- as_tibble(tfr1dt) |>                                  # start from TFR, for our single country
  filter(country_code == cc, year %in% yrs) |>                   # keep that country and the chosen years
  select(year, TFR = tfr) |>                                     # keep year and TFR (renamed)
  left_join(as_tibble(e01dt)   |> filter(country_code == cc) |> select(year, e0  = e0B),            by = "year") |>  # add e0,  matched on year
  left_join(as_tibble(misc1dt) |> filter(country_code == cc) |> select(year, CBR = cbr, CDR = cdr), by = "year")     # add CBR & CDR, matched on year
profile   # print the assembled table: one row per year, columns TFR, e0, CBR, CDR
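
The repeated `if (exists("params")) ... else ...` fallbacks can be folded into one small function. This is a sketch, not part of the lab: `param_or()` is a hypothetical helper, and it additionally falls back to the default when `params` exists but lacks the requested entry.

```r
# Hypothetical helper: return params[[name]] when a knit-time `params` list
# exists and contains that entry; otherwise return the supplied default.
param_or <- function(name, default) {
  if (exists("params") && !is.null(params[[name]])) params[[name]] else default
}

# Same settings as above, written with the helper.
cc      <- param_or("cc", 800)            # M49 code (800 = Uganda)
country <- param_or("country", "Uganda")  # display name for the plot title
y_min   <- param_or("y_min", 1970)        # first year of the profile
y_max   <- param_or("y_max", 2023)        # last estimate year
yrs     <- y_min:y_max
```

Run interactively without a `params` object, each setting takes its default; when knitting, the values come from the YAML header as before.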

3.3 Multi-panel profile with event annotations

National events are illustrative placeholders for Uganda; reviewers and learners should confirm or replace them. The plot reshapes the profile to long form so each indicator gets its own free-scaled panel, with one vertical reference line per event.

# YOUR TURN: replace these placeholder events with real events for your country (year + short label).
events <- tibble::tribble(            # build a small 2-column table the easy, row-by-row way
  ~year, ~label,                      # column names: the event year, and a short text label
  1986,  "1986 - political stabilization",
  1992,  "1992 - peak HIV/AIDS mortality",
  2004,  "2004 - national ART scale-up"
)
# Make sure the title settings exist, in case this chunk is run on its own
# (when knitting, they come from the YAML `params` header; otherwise from these defaults):
if (!exists("country")) country <- if (exists("params")) params$country else "Uganda"
if (!exists("y_min"))   y_min   <- if (exists("params")) params$y_min   else 1970
if (!exists("y_max"))   y_max   <- if (exists("params")) params$y_max   else 2023

profile_long <- profile |>                                       # reshape the wide profile (one column per indicator) ...
  pivot_longer(-year, names_to = "indicator", values_to = "value") |>  # ... into long form: columns year, indicator, value
  filter(!is.na(value)) |>                                       # drop missing values
  mutate(indicator = factor(indicator, levels = c("TFR", "CBR", "CDR", "e0")))  # set panel order

ggplot(profile_long, aes(year, value)) +                         # base plot: x = year, y = value
  geom_vline(data = events, aes(xintercept = year),              # one dashed vertical line per event year
             linetype = "dashed", colour = "grey60") +
  geom_line(linewidth = 0.8, colour = "#1f4e79") +               # the indicator line, in dark blue
  facet_wrap(~ indicator, scales = "free_y") +                   # one panel per indicator, each with its own y-scale
  labs(                                                          # titles/labels (title uses the settings above):
    title = paste0("Demographic profile: ", country,
                   ", ", y_min, "-", y_max),
    x = NULL, y = NULL,
    caption = "Source: WPP 2024 (wpp2024 R package): tfr1dt, e01dt, misc1dt."
  ) +
  theme_minimal(base_size = 12) +
  theme(strip.text = element_text(face = "bold"))                # bold the panel titles

Illustrative national events (confirm/replace per country).
Year  Annotated event
1986  1986 - political stabilization
1992  1992 - peak HIV/AIDS mortality
2004  2004 - national ART scale-up
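
If the profile window is changed (for example a later `y_min`), some events can end up outside the plotted years. A small filter keeps the annotation table consistent with the window. The sketch below is base R; `events_in_range()` is a hypothetical helper, and the events are the same placeholders listed above.

```r
# Hypothetical helper: keep only events whose year lies inside [y_min, y_max].
events_in_range <- function(events, y_min, y_max) {
  events[events$year >= y_min & events$year <= y_max, , drop = FALSE]
}

# The placeholder events from the table above.
events <- data.frame(
  year  = c(1986, 1992, 2004),
  label = c("1986 - political stabilization",
            "1992 - peak HIV/AIDS mortality",
            "2004 - national ART scale-up")
)

events_in_range(events, 1990, 2023)   # the 1986 row is dropped; 1992 and 2004 remain
```

Passing the filtered table to `geom_vline(data = ...)` instead of the full one draws reference lines only for events inside the window.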

3.4 Narrative interpretation

Learner deliverable: 5–10 sentences. Reference text below.

For Uganda, the profile shows a classic but incomplete demographic transition. The crude death rate falls over the period as a whole, though not smoothly, while the crude birth rate stays high into the 2000s, so the gap between them (natural increase) widens before it narrows; that gap is the engine of rapid population growth. Total fertility begins its decline only later and from a high level, lagging the mortality improvement by roughly a generation. Life expectancy at birth rises overall but carries a visible setback around the early 1990s that coincides with peak HIV/AIDS mortality, the clearest case in the profile of a national event leaving a mark in a demographic series. Reading the series against the annotated events is the habit the lab is meant to build: demographic indicators are not free-floating curves but the population’s record of its own history.
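
The gap between the crude birth rate and the crude death rate can be computed directly, since both are expressed per 1,000 population. The sketch below assumes a profile with `CBR` and `CDR` columns, as assembled in 3.2; `add_natural_increase()` is a hypothetical helper, and the toy profile uses made-up round numbers, not WPP values for any country.

```r
library(dplyr)

# Hypothetical helper: add the rate of natural increase per 1,000 (RNI = CBR - CDR)
# and the same quantity as a percentage (RNI / 10).
add_natural_increase <- function(profile) {
  profile |>
    mutate(RNI = CBR - CDR, RNI_pct = RNI / 10)
}

# Made-up values chosen so the gap first widens, then narrows.
toy_profile <- tibble(
  year = c(1970, 1995, 2020),
  CBR  = c(50, 48, 38),
  CDR  = c(20, 14, 6)
)

add_natural_increase(toy_profile)   # RNI is 30, 34, 32 per 1,000
```

Applied to the real `profile`, the new `RNI` column can be added as a fifth panel by including it in the factor levels used for `profile_long`.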