Introduction

This document will present life expectancy and age specific mortality observed in Scotland since 2012 in comparison with projections based on earlier trends.

Two periods will be considered when informing the pre-2012 trends:

Background literature: why ARIMA(0,1,0) is OK

Lee (2016?), who co-created the Lee-Carter model for mortality projection in the early 1990s, accepted in the mid/late 2000s, as a result of analyses by White (2002) and Oeppen & Vaupel (2002), that a much simpler modelling approach is often adequate for life expectancy forecasting. This approach involves modelling life expectancy at birth (\(e_0\)) directly, rather than individual age specific mortality rates (\(m_{x}\)) and quantities derived from them (a ‘drift parameter’ \(k\)) as in Lee-Carter’s original model specification.

Although Lee (2002) presents a number of refinements of a basic linear life expectancy forecase model specification, including models which allow for negative autocorrelation between change in years, the article states that the basic strategy for forecasting should be “to use the appropriate or preferred equation for \(de_{t}/dt\) [change in life expectancy] to estimate \(e0\) one year later, and then continue recursively.” This is the approach this projection exercise will follow.

Modelling strategy

We have \(e_0 (s, t)\), the life expectancy at birth for sex \(s\) and time \(t\), extracted from the Human Mortality Database (HMD) from 1855 to 2016, and from the ONS for 2017 and 2018 (by males and females only, not total population). We define \(\Delta e_0 (s, t)\) as change in life expectancy for sex \(s\) from year \(t\) to year \(t + 1\).

We first plot both \(e_0\) against \(t\) by \(s\), and \(\Delta e_0\) against \(t\) by \(s\), to identify appropriate pre-slowdown time periods, i.e. the range of years prior to 2012 (our assumed breakpoint in mortality trends) to use to calibrate projections of life expectancy trends from 2012 to 2018 onwards. Based on these plots and background knowledge, we selected two pre-slowdown periods:

  • \(P_1\): 1950-2011 inclusive
  • \(P_2\): 1990-2011 inclusive

For both of these pre-slowdown periods, we calculated the mean annual change observed by sex (i.e. \(E(\Delta e_0 (t \in P, s))\)) and the variance in mean annual changes over the same period (i.e. \(Var(\Delta e_0 (t \in P, s)\)).

To project forward estimates of life expectancy trends based on pre-slowdown observations in a way that takes into account observed variation in annual changes, we sample and accumulate \(k\) draws from Normal distributions calibrated on the sample means and variances from the pre-slowdown period (\(P_1\) or \(P_2\)), where \(k\) refers to the number of annual periods we want to project forward from 2011. As we want to project forward to 2018 we therefore select \(k = 7\). This exercise is repeated \(m = 10,000\) times to allow credible intervals in forecasts to be calculated using a Monte-Carlo approach. Life expectancy projections for each year 2012-2018 inclusive are produced by summing up the draws from the first projected period onwards; the life expectancy from 2011 (the last observation in the pre-slowdown period) is added to these summed values to produce predicted life expectancies rather than cumulative gains or losses in life expectancy since 2011. Algebraically this can be expressed as follows:

\[e_{0}^{*} (t = \tau + K) = e_{0}(t=\tau) + \sum_{k=1}^{K} N_{k}\] \[N \sim Normal(\mu = E(\Delta e_0(P), \sigma^{2} = Var(\Delta e_0({P}))))\]

Where \(\tau\) refers to 2011, \(K\) the discrete number of periods to forecast after \(\tau\), and \(\Delta e_{0} (P)\) the series of annual changes over the pre-slowdown period \(P\).

As for each sex and number of discrete projection periods \(K\) (from 1 to 7 periods), \(m\) replicates have been produced, the observed life expectancy \(e_{0}(t = \tau + K)\) can be compared against the Monte-Carlo distribution of \(m\) corresponding projections for the same period \(e_{0}^{*}(t = \tau + K)\). This comparison is shown both graphically, using shaded regions to indicate the 90% and 95% credible intervals from the projected distributions, and by counting up the proportion of the \(m\) projected values which exceed the observed values for each post 2011 period \(K\). For both pre-trend periods \(P_1\) (1950-2011) and \(P_2\) (1990-2011) the same random number seed was used. All data preparation and analysis were performed using the R programming language.

Load packages required

pacman::p_load(HMDHFDplus, tidyverse, broom, readxl, here, svglite)

Bespoke function

save_the_table <- function(x, path){
  write_csv(x, path)
  x
}

Data

mx and ex for all nation from ONS

We’ll now use the tables from this location to get \(e_0\), \(m_x\) and related lifetable quantities from a single source, for each UK nation.

First, I’ve downloaded each of the single year files to the data directory.

ons_lifetables <- tribble(
  ~population, ~file_location,
  "England",          "singleyearlifetablesengland1.xls",
  "Northern Ireland", "singleyearlifetablesnorthernireland2.xls",
  "Scotland",         "singleyearlifetablesscotland1.xls",
  "Wales",            "singleyearlifetableswales1.xls",
  "England & Wales",  "singleyearlifetablesenglandandwales1.xls",
  "Great Britain"   , "singleyearlifetablesgreatbritain1.xls",
  "United Kingdom",   "singleyearlifetablesunitedkingdom1.xls"
)
clean_data <- function(X){
  out <- X %>% 
  select(-junk) %>% 
  gather(-x, key = "sex_quant", value = "value") %>% 
  separate(sex_quant, into = c("sex", "variable")) %>% 
  mutate(sex = case_when(sex == 'm' ~ "male", sex == 'f' ~ "female", TRUE ~ NA_character_))
  return(out)    
}
get_mx_data <- function(path){
  tibble(
    year = as.character(1980:2018)
  ) %>% 
    mutate(dirty_data = map(
      year, 
      ~read_excel(
        path = path,
        range = "A8:L108",
        col_names = c("x", "m_mx", "m_qx", "m_lx", "m_dx", "m_ex", "junk", "f_mx", "f_qx", "f_lx", "f_dx", "f_ex"),
        sheet = .x
        ))
    ) %>% 
    mutate(
      tidy_data = map(dirty_data, clean_data)
    ) %>% 
    select(-dirty_data) %>% 
    unnest(cols = c(tidy_data)) %>% 
  spread(variable, value) %>% 
  mutate(year = as.numeric(year))
}
ons_lifetables <- 
  ons_lifetables %>% 
    mutate(mx_data = map(file_location, ~.x %>% here("data",.) %>% get_mx_data())) %>% 
    select(-file_location) %>% 
    unnest(cols = c(mx_data))

ons_lifetables

From this it’s very easy to extract \(e_0\) alone.

ons_e0 <- 
  ons_lifetables %>% 
    filter(x == 0) %>% 
    select(population, year, sex, e0 = ex)

ons_e0
write_csv(ons_e0, here("data", "e0_from_ons_allnations.csv"))

Now to get summaries for each population and sex

get_summary_changes <- function(data, period = c(1990, 2011)){
  data %>% 
    arrange(year) %>% 
    mutate(delta_e0 = e0 - lag(e0)) %>% 
    filter(between(year, period[1], period[2])) %>% # NOTE THE CHANGE HERE
    summarise(
      mean_de0 = mean(delta_e0),
      sd_de0   = sd(delta_e0)
    ) 
}
ons_summary_changes <- 
  ons_e0 %>% 
    group_by(population, sex) %>% 
    nest() %>% 
    mutate(summary_stats = map(data, get_summary_changes)) %>% 
    select(-data) %>% 
    unnest()

ons_summary_changes
ons_summary_changes %>% 
  ggplot(aes(x = mean_de0, y = sd_de0, colour = sex, shape = sex)) + 
  geom_point() +
  scale_x_continuous(limits = c(0, 0.3)) +  
  scale_y_continuous(limits = c(0, 0.35)) + 
  ggrepel::geom_text_repel(aes(label = population))

Now to extract the 2011 value, run the simulations, and extract summary stats from the simulations

run_simulations <- function(mean, sd, k=7, m=10000, seed = 20){
  set.seed(seed)
  tibble(
    draws = map2(
      .x = mean, .y = sd, 
      .f = ~replicate(m, rnorm(n = k, mean = .x, sd = .y))
    )
  ) %>% 
  mutate(
    draw_df = map(
      draws,
      ~.x %>% 
        data.frame() %>% 
        mutate(k = 1:n()) %>% 
        gather(-k, key = "rep", value = "change") %>% 
        mutate(rep = str_remove(rep, "X") %>% 
                 as.numeric()
        )               
    )
  ) %>% 
  select(draw_df) %>% 
  unnest(cols = c(draw_df))
}
set.seed(12)
allnations_preds <- 
  ons_summary_changes %>% 
    left_join(
      ons_e0 %>% 
        filter(year == 2011) %>% 
        select(-year) %>% 
        rename(e0_2011 = e0)
    ) %>% 
    mutate(
      simblock = map2(.x = mean_de0, .y = sd_de0, run_simulations, m = 10000)
    ) %>% 
    select(population, sex, e0_2011, simblock) %>% 
    unnest(cols = c(simblock) ) %>% 
    group_by(population, sex, rep) %>% 
    arrange(k) %>% 
    mutate(
      cumulative_change = cumsum(change),
      prediction = e0_2011 + cumulative_change,
      year = k + 2011
    ) %>% 
    left_join(
      ons_e0 %>% rename(e0_obs = e0), 
      by = c("population", "year", "sex")
    )
## Joining, by = c("population", "sex")
allnations_preds  
allnations_preds %>% 
  group_by(population, sex, year) %>% 
  summarise(
    obs = e0_obs[1],
    mean_pred = mean(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  save_the_table(here("data", "preds_using_ons_from_1990.csv")) %>% 
  ggplot(aes(x = year)) + 
  facet_grid(sex ~ population) + 
  geom_ribbon(aes(ymin = low_90, ymax = high_90),
              alpha = 0.2
  ) + 
  geom_ribbon(aes(ymin = low_95, ymax = high_95),
              alpha = 0.2
  ) + 
  geom_line(
    aes(y = med_pred),
    colour = "red", size = 1.2
  ) + 
  geom_line(
    aes(y = mean_pred),
    colour = "darkgreen", size = 1.2, linetype = "dashed"
  ) + 
  geom_point(aes(y = obs), size = 2) +
  labs(
    x = "Year", y = "Life expectancy at birth in years",
    title = "Life expectancy against 1990-2011 trend",
    subtitle = "Red: Median of predictions; Green: Mean of predictions\nShaded regions: 90% and 95% credible intervals",
    caption = "Source: ONS single year lifetables"
  ) +
  theme(
    axis.text.x = element_text(angle = 90)
  )

ggsave(here("figures","allnations_ons_projections.png"), height = 15, width = 25, units = "cm", dpi = 300)
ggsave(here("figures","allnations_ons_projections.svg"), height = 15, width = 25, units = "cm", dpi = 300)
ggsave(here("figures","allnations_ons_projections.eps"))
## Saving 7 x 5 in image
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page
ggsave(here("figures","allnations_ons_projections.wmf"))
## Saving 7 x 5 in image
## Warning in grid.Call.graphics(C_polygon, x$x, x$y, index): semi-
## transparency is not supported on this device: reported only once per page

Now how about 1981-2011 as calibration period?

ons_summary_changes_81 <- 
  ons_e0 %>% 
    group_by(population, sex) %>% 
    nest() %>% 
    mutate(summary_stats = map(data, get_summary_changes, period = c(1981, 2011))) %>% 
    select(-data) %>% 
    unnest(cols = c(summary_stats))

ons_summary_changes_81
set.seed(12)
allnations_preds_81 <- 
  ons_summary_changes_81 %>% 
    left_join(
      ons_e0 %>% 
        filter(year == 2011) %>% 
        select(-year) %>% 
        rename(e0_2011 = e0)
    ) %>% 
    mutate(
      simblock = map2(.x = mean_de0, .y = sd_de0, run_simulations, m = 10000)
    ) %>% 
    select(population, sex, e0_2011, simblock) %>% 
    unnest(cols = c(simblock)) %>% 
    group_by(population, sex, rep) %>% 
    arrange(k) %>% 
    mutate(
      cumulative_change = cumsum(change),
      prediction = e0_2011 + cumulative_change,
      year = k + 2011
    ) %>% 
    left_join(
      ons_e0 %>% rename(e0_obs = e0), 
      by = c("population", "year", "sex")
    )
## Joining, by = c("population", "sex")
allnations_preds_81  

And visualise

allnations_preds_81 %>% 
  group_by(population, sex, year) %>% 
  summarise(
    obs = e0_obs[1],
    mean_pred = mean(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  save_the_table(here("data", "preds_from_1981.csv")) %>% 
  ggplot(aes(x = year)) + 
  facet_grid(sex ~ population) + 
  geom_ribbon(aes(ymin = low_90, ymax = high_90),
              alpha = 0.2
  ) + 
  geom_ribbon(aes(ymin = low_95, ymax = high_95),
              alpha = 0.2
  ) + 
  geom_line(
    aes(y = med_pred),
    colour = "red", size = 1.2
  ) + 
  geom_line(
    aes(y = mean_pred),
    colour = "darkgreen", size = 1.2, linetype = "dashed"
  ) + 
  geom_point(aes(y = obs), size = 2) +
  labs(
    x = "Year", y = "Life expectancy at birth in years",
    title = "Life expectancy against 1981-2011 trend",
    subtitle = "Red: Median of predictions; Green: Mean of predictions\nShaded regions: 90% and 95% credible intervals",
    caption = "Source: ONS single year lifetables"
  ) +
  theme(
    axis.text.x = element_text(angle = 90)
  )

ggsave(here("figures", "all_nations_ons_projections_longer.png"), height = 15, width = 25, units = "cm", dpi = 300)

Project forward another decade

set.seed(12)
allnations_next_decade_targets <- 
  ons_summary_changes %>% 
    left_join(
      ons_e0 %>% 
        filter(year == 2011) %>% 
        select(-year) %>% 
        rename(e0_2011 = e0)
    ) %>% 
    mutate(
      simblock = map2(.x = mean_de0, .y = sd_de0, run_simulations, m = 10000, k = 17) #instead of 7 for main analysis
    ) %>% 
    select(population, sex, e0_2011, simblock) %>% 
    unnest(cols = c(simblock)) %>% 
    group_by(population, sex, rep) %>% 
    arrange(k) %>% 
    mutate(
      cumulative_change = cumsum(change),
      prediction = e0_2011 + cumulative_change,
      year = k + 2011
    ) 
## Joining, by = c("population", "sex")
allnations_next_decade_targets
allnations_next_decade_targets %>% 
  group_by(population, sex, year) %>% 
  summarise(
    mean_pred = mean(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  ggplot(aes(x = year)) + 
  facet_grid(sex ~ population) + 
  geom_ribbon(aes(ymin = low_90, ymax = high_90),
              alpha = 0.2
  ) + 
  geom_ribbon(aes(ymin = low_95, ymax = high_95),
              alpha = 0.2
  ) + 
  geom_line(
    aes(y = med_pred),
    colour = "red", size = 1.2
  ) + 
  geom_line(
    aes(y = mean_pred),
    colour = "darkgreen", size = 1.2, linetype = "dashed"
  ) + 
  labs(
    x = "Year", y = "Life expectancy at birth in years",
    title = "Life expectancy 10 year target against 1990-2011 trend",
    subtitle = "Red: Median of predictions; Green: Mean of predictions\nShaded regions: 90% and 95% credible intervals",
    caption = "Source: ONS single year lifetables"
  ) +
  geom_vline(xintercept = 2018, linetype = "dashed") + 
  theme(
    axis.text.x = element_text(angle = 90)
  )

ggsave(here("figures", "allnations_ons_10yeartargets.png"), height = 15, width = 25, units = "cm", dpi = 300)

And as a table

allnations_next_decade_targets %>% 
  group_by(population, sex, year) %>% 
  summarise(
    mean_pred = mean(prediction),
    sd_pred   = sd(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  write_csv(here("data", "predictions_to_2028.csv"))

Now for predicted/observed comparisons

allnations_preds %>% 
  group_by(population, sex, year) %>% 
  summarise(empirical_p = mean(prediction < e0_obs)) %>% 
  ungroup() %>%
  save_the_table("data/empirical_p_allnations1990_2011.csv") %>% 
  ggplot(aes(x = year, y = empirical_p, colour = sex, group = sex, shape = sex)) + 
  geom_line() + geom_point() +
  facet_wrap(~population) +
  labs(
    x = "Year", y = "Share of projected values worse than observed",
    title = "Share of projected values worse than observed, based on trends from 1990-2011",
    caption = "Source: ONS single year lifetables"
  )

ggsave(here("figures", "allnations_projected_observed.png"),height = 15, width = 20, units = "cm", dpi = 300)

ggsave(here("figures", "allnations_projected_observed.svg"), height = 15, width = 20, units = "cm", dpi = 300)

Longer project forward period, based on longer historic period

set.seed(12)
allnations_next_decade_targets_from81 <- 
  ons_summary_changes_81 %>% 
    left_join(
      ons_e0 %>% 
        filter(year == 2011) %>% 
        select(-year) %>% 
        rename(e0_2011 = e0)
    ) %>% 
    mutate(
      simblock = map2(.x = mean_de0, .y = sd_de0, run_simulations, m = 10000, k = 17) #instead of 7 for main analysis
    ) %>% 
    select(population, sex, e0_2011, simblock) %>% 
    unnest(cols = c(simblock)) %>% 
    group_by(population, sex, rep) %>% 
    arrange(k) %>% 
    mutate(
      cumulative_change = cumsum(change),
      prediction = e0_2011 + cumulative_change,
      year = k + 2011
    ) 
## Joining, by = c("population", "sex")
allnations_next_decade_targets_from81
allnations_next_decade_targets_from81 %>% 
  group_by(population, sex, year) %>% 
  summarise(
    mean_pred = mean(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  ggplot(aes(x = year)) + 
  facet_grid(sex ~ population) + 
  geom_ribbon(aes(ymin = low_90, ymax = high_90),
              alpha = 0.2
  ) + 
  geom_ribbon(aes(ymin = low_95, ymax = high_95),
              alpha = 0.2
  ) + 
  geom_line(
    aes(y = med_pred),
    colour = "red", size = 1.2
  ) + 
  geom_line(
    aes(y = mean_pred),
    colour = "darkgreen", size = 1.2, linetype = "dashed"
  ) + 
  labs(
    x = "Year", y = "Life expectancy at birth in years",
    title = "Life expectancy 10 year target against 1981-2011 trend",
    subtitle = "Red: Median of predictions; Green: Mean of predictions\nShaded regions: 90% and 95% credible intervals",
    caption = "Source: ONS single year lifetables"
  ) +
  geom_vline(xintercept = 2018, linetype = "dashed") + 
  theme(
    axis.text.x = element_text(angle = 90)
  )

ggsave(here("figures", "allnations_ons_10yeartargets_from1981.png"), height = 15, width = 25, units = "cm", dpi = 300)

And as a table

allnations_next_decade_targets_from81 %>% 
  group_by(population, sex, year) %>% 
  summarise(
    mean_pred = mean(prediction),
    sd_pred   = sd(prediction),
    med_pred  = median(prediction),
    low_95   = quantile(prediction, 0.025),
    high_95  = quantile(prediction, 0.975),
    low_90   = quantile(prediction, 0.050),
    high_90  = quantile(prediction, 0.950)
  ) %>% 
  ungroup() %>% 
  write_csv(here("data", "predictions_to_2028_from1981.csv"))