Constructing a life table from an age at death series

Producing the life table

1. The number of deaths

Our starting point for this exercise is a count of the number of dead individuals at each age.

Consider this dataset of Himalayan Thar, Hemitragus jemlahicus (a kind of mountain goat). These data come from Caughley (1966) and were obtained from a population of the Thar that were released around Mount Cook in New Zealand in 1904. The data were processed to smooth away some sampling error, but that is not important for this exercise.

In this data set, \(x\) (x) represents the start of an age interval, and \(D_x\) (Dx) denotes the number of individuals that died within that interval. The width of the interval is 1-year, so when x is 1, the interval is from 1-2 and so on. In Jones (2021) the width of the interval is indicated with \(n\) as part of the standard life table nomenclature including \(_nD_x\), \(_nm_x\), etc. To simplify the nomenclature, the \(n\) is omitted here so the nomenclature becomes \(D_x\), \(m_x\), etc.

tharData

##     x  Dx
## 1   0 109
## 2   1   2
## 3   2   5
## 4   3  10
## 5   4  11
## 6   5  13
## 7   6  12
## 8   7  11
## 9   8  10
## 10  9   7
## 11 10   5
## 12 11   4
## 13 12   6

The first step is to count the total number of animals that we have data for. In this case there are a total of 205 individuals that have died:

sum(tharData$Dx)

## [1] 205

We can think of the life table in terms of intervals (time periods). For example, 0-1 year, 1-2 year, 2-3 year and so on. The numbers in the columns of the life table either refer to the start of the interval, or something happening during the interval.

For example, we already know the number of deaths that occur within each interval (i.e. within the interval 0-1, 1-2, 2-3 and so on.). To build the life table We also need to know the number of individuals alive at the start of each interval (i.e. at age 0, 1, 2 … n).

The number of individuals at the start of the interval 0-1 is easy to deduce: It is simply the total number of deaths that we have recorded, which is the sum of the deaths column (205).

The next step is to add a column representing the count of individuals at the start of each age interval, \(D_x\) (Dx). For example at the start of the 0-1 age interval, there are 205 individuals (i.e. ALL the individuals), then, at the start of the 1-2 age interval there are 96 individuals (i.e. 205 - 109 = ). For the start of the remaining age intervals, we need to add up all the deaths **so far*. In other words, we need the cumulative sum of deaths.

We can do this in R like this:

(tharData %>%
  mutate(cumul_deaths = cumsum(Dx)))

##     x  Dx cumul_deaths
## 1   0 109          109
## 2   1   2          111
## 3   2   5          116
## 4   3  10          126
## 5   4  11          137
## 6   5  13          150
## 7   6  12          162
## 8   7  11          173
## 9   8  10          183
## 10  9   7          190
## 11 10   5          195
## 12 11   4          199
## 13 12   6          205

2. The number of deaths

From the cumulative deaths, we can work out the number alive at the start of each interval, \(N_x\) (Nx). We do this by subtracting the cumulative deaths from the total deaths.

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])))

##     x  Dx cumul_deaths  Nx
## 1   0 109          109 205
## 2   1   2          111  96
## 3   2   5          116  94
## 4   3  10          126  89
## 5   4  11          137  79
## 6   5  13          150  68
## 7   6  12          162  55
## 8   7  11          173  43
## 9   8  10          183  32
## 10  9   7          190  22
## 11 10   5          195  15
## 12 11   4          199  10
## 13 12   6          205   6

Now we have the most important columns of a life table: age (\(x\), x), number alive at the start of the interval (\(N_x\), Nx), deaths within the interval (\(D_x\), Dx). We can get rid of the cumul_deaths column at this point, because it is no longer needed in our calculations.

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths)
)

##     x  Dx  Nx
## 1   0 109 205
## 2   1   2  96
## 3   2   5  94
## 4   3  10  89
## 5   4  11  79
## 6   5  13  68
## 7   6  12  55
## 8   7  11  43
## 9   8  10  32
## 10  9   7  22
## 11 10   5  15
## 12 11   4  10
## 13 12   6   6

4. The age-specific death rate (hazard)

Next we can calculate \(m_x\) (mx), which is the age-specific death rate (hazard) as \(D_x/N_x\).

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx)
)

##     x  Dx  Nx         mx
## 1   0 109 205 0.53170732
## 2   1   2  96 0.02083333
## 3   2   5  94 0.05319149
## 4   3  10  89 0.11235955
## 5   4  11  79 0.13924051
## 6   5  13  68 0.19117647
## 7   6  12  55 0.21818182
## 8   7  11  43 0.25581395
## 9   8  10  32 0.31250000
## 10  9   7  22 0.31818182
## 11 10   5  15 0.33333333
## 12 11   4  10 0.40000000
## 13 12   6   6 1.00000000

5. The age-specific probability of death

We will assume that the individuals survive, on average, until half way through the age interval, so the term \(a_x\) (ax) is a constant 0.5. We can use this ax value to help us work out what the age-specicic mortality probability, \(q_x\) (qx), is using the formula \[q_x = m_x / (1+a_x \times m_x)\].

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx))
)

##     x  Dx  Nx         mx  ax         qx
## 1   0 109 205 0.53170732 0.5 0.42003854
## 2   1   2  96 0.02083333 0.5 0.02061856
## 3   2   5  94 0.05319149 0.5 0.05181347
## 4   3  10  89 0.11235955 0.5 0.10638298
## 5   4  11  79 0.13924051 0.5 0.13017751
## 6   5  13  68 0.19117647 0.5 0.17449664
## 7   6  12  55 0.21818182 0.5 0.19672131
## 8   7  11  43 0.25581395 0.5 0.22680412
## 9   8  10  32 0.31250000 0.5 0.27027027
## 10  9   7  22 0.31818182 0.5 0.27450980
## 11 10   5  15 0.33333333 0.5 0.28571429
## 12 11   4  10 0.40000000 0.5 0.33333333
## 13 12   6   6 1.00000000 0.5 0.66666667

6. The age-specific probability of survival

The age-specific probability of survival, \(p_x\) (px), is \(1 - q_x\).

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx)
)

##     x  Dx  Nx         mx  ax         qx        px
## 1   0 109 205 0.53170732 0.5 0.42003854 0.5799615
## 2   1   2  96 0.02083333 0.5 0.02061856 0.9793814
## 3   2   5  94 0.05319149 0.5 0.05181347 0.9481865
## 4   3  10  89 0.11235955 0.5 0.10638298 0.8936170
## 5   4  11  79 0.13924051 0.5 0.13017751 0.8698225
## 6   5  13  68 0.19117647 0.5 0.17449664 0.8255034
## 7   6  12  55 0.21818182 0.5 0.19672131 0.8032787
## 8   7  11  43 0.25581395 0.5 0.22680412 0.7731959
## 9   8  10  32 0.31250000 0.5 0.27027027 0.7297297
## 10  9   7  22 0.31818182 0.5 0.27450980 0.7254902
## 11 10   5  15 0.33333333 0.5 0.28571429 0.7142857
## 12 11   4  10 0.40000000 0.5 0.33333333 0.6666667
## 13 12   6   6 1.00000000 0.5 0.66666667 0.3333333

7. Survivorship (\(l_x\))

At this point, we can choose a population size to start our synthetic cohort (\(l_0\)), the radix, and calculate \(l_{x+n}\) as lx * px. Typically in (non-human) animals, we use 1.0 so that the lx curve represents a proportion of individuals born, which is known as survivorship.

This means that the \(l_x\) vector can be calculated as the radix multiplied by the cumulative product of \(p_x\), as follows. Note that the [1:length(px)] part simply ensures that the length of the vector matches the length of the data frame.

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx) %>%
  mutate(lx = c(1.0, 1.0 * cumprod(px))[1:length(px)])
)

##     x  Dx  Nx         mx  ax         qx        px         lx
## 1   0 109 205 0.53170732 0.5 0.42003854 0.5799615 1.00000000
## 2   1   2  96 0.02083333 0.5 0.02061856 0.9793814 0.57996146
## 3   2   5  94 0.05319149 0.5 0.05181347 0.9481865 0.56800350
## 4   3  10  89 0.11235955 0.5 0.10638298 0.8936170 0.53857326
## 5   4  11  79 0.13924051 0.5 0.13017751 0.8698225 0.48127824
## 6   5  13  68 0.19117647 0.5 0.17449664 0.8255034 0.41862663
## 7   6  12  55 0.21818182 0.5 0.19672131 0.8032787 0.34557769
## 8   7  11  43 0.25581395 0.5 0.22680412 0.7731959 0.27759519
## 9   8  10  32 0.31250000 0.5 0.27027027 0.7297297 0.21463546
## 10  9   7  22 0.31818182 0.5 0.27450980 0.7254902 0.15662587
## 11 10   5  15 0.33333333 0.5 0.28571429 0.7142857 0.11363054
## 12 11   4  10 0.40000000 0.5 0.33333333 0.6666667 0.08116467
## 13 12   6   6 1.00000000 0.5 0.66666667 0.3333333 0.05410978

8. Number dying, person-years lived etc.

After this, we can calculate three remaining important quantities, which all vary in proportion to the number selected as the radix. The three remaining quantities are the number of individuals dying in the interval (\(d_x\), dx), the number of person-years lived in the interval (\(L_x\), Lx), and the number of person years lived beyond the start of the interval (\(T_x\), Tx). The value for dx is simply \(l_x - l_{x+1}\). Therefore it is useful to first calculate \(l_{x+1}\) in its own column (lx_plus_1) that can be used in the calculations.

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx) %>%
  mutate(lx = c(1.0, 1.0 * cumprod(px))[1:length(px)]) %>%
  mutate(lx_plus_1 = c(lx[2:length(lx)], NA)) %>%
  mutate(dx = lx - lx_plus_1)
)

##     x  Dx  Nx         mx  ax         qx        px         lx  lx_plus_1
## 1   0 109 205 0.53170732 0.5 0.42003854 0.5799615 1.00000000 0.57996146
## 2   1   2  96 0.02083333 0.5 0.02061856 0.9793814 0.57996146 0.56800350
## 3   2   5  94 0.05319149 0.5 0.05181347 0.9481865 0.56800350 0.53857326
## 4   3  10  89 0.11235955 0.5 0.10638298 0.8936170 0.53857326 0.48127824
## 5   4  11  79 0.13924051 0.5 0.13017751 0.8698225 0.48127824 0.41862663
## 6   5  13  68 0.19117647 0.5 0.17449664 0.8255034 0.41862663 0.34557769
## 7   6  12  55 0.21818182 0.5 0.19672131 0.8032787 0.34557769 0.27759519
## 8   7  11  43 0.25581395 0.5 0.22680412 0.7731959 0.27759519 0.21463546
## 9   8  10  32 0.31250000 0.5 0.27027027 0.7297297 0.21463546 0.15662587
## 10  9   7  22 0.31818182 0.5 0.27450980 0.7254902 0.15662587 0.11363054
## 11 10   5  15 0.33333333 0.5 0.28571429 0.7142857 0.11363054 0.08116467
## 12 11   4  10 0.40000000 0.5 0.33333333 0.6666667 0.08116467 0.05410978
## 13 12   6   6 1.00000000 0.5 0.66666667 0.3333333 0.05410978         NA
##            dx
## 1  0.42003854
## 2  0.01195797
## 3  0.02943023
## 4  0.05729503
## 5  0.06265160
## 6  0.07304894
## 7  0.06798250
## 8  0.06295973
## 9  0.05800958
## 10 0.04299534
## 11 0.03246587
## 12 0.02705489
## 13         NA

Then, \(L_x\) is calculated as \(l_{x+1} + a_x + d_x\), except for the last, open, interval which is \(l_x/m_x\).

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx) %>%
  mutate(lx = c(1.0, 1.0 * cumprod(px))[1:length(px)]) %>%
  mutate(lx_plus_1 = c(lx[2:length(lx)], NA)) %>%
  mutate(dx = lx - lx_plus_1) %>%
  mutate(Lx = ifelse(x == max(x), lx / mx[length(mx)], lx_plus_1 + ax * dx))
)

##     x  Dx  Nx         mx  ax         qx        px         lx  lx_plus_1
## 1   0 109 205 0.53170732 0.5 0.42003854 0.5799615 1.00000000 0.57996146
## 2   1   2  96 0.02083333 0.5 0.02061856 0.9793814 0.57996146 0.56800350
## 3   2   5  94 0.05319149 0.5 0.05181347 0.9481865 0.56800350 0.53857326
## 4   3  10  89 0.11235955 0.5 0.10638298 0.8936170 0.53857326 0.48127824
## 5   4  11  79 0.13924051 0.5 0.13017751 0.8698225 0.48127824 0.41862663
## 6   5  13  68 0.19117647 0.5 0.17449664 0.8255034 0.41862663 0.34557769
## 7   6  12  55 0.21818182 0.5 0.19672131 0.8032787 0.34557769 0.27759519
## 8   7  11  43 0.25581395 0.5 0.22680412 0.7731959 0.27759519 0.21463546
## 9   8  10  32 0.31250000 0.5 0.27027027 0.7297297 0.21463546 0.15662587
## 10  9   7  22 0.31818182 0.5 0.27450980 0.7254902 0.15662587 0.11363054
## 11 10   5  15 0.33333333 0.5 0.28571429 0.7142857 0.11363054 0.08116467
## 12 11   4  10 0.40000000 0.5 0.33333333 0.6666667 0.08116467 0.05410978
## 13 12   6   6 1.00000000 0.5 0.66666667 0.3333333 0.05410978         NA
##            dx         Lx
## 1  0.42003854 0.78998073
## 2  0.01195797 0.57398248
## 3  0.02943023 0.55328838
## 4  0.05729503 0.50992575
## 5  0.06265160 0.44995243
## 6  0.07304894 0.38210216
## 7  0.06798250 0.31158644
## 8  0.06295973 0.24611533
## 9  0.05800958 0.18563067
## 10 0.04299534 0.13512821
## 11 0.03246587 0.09739760
## 12 0.02705489 0.06763722
## 13         NA 0.05410978

\(T_x\) is the sum of \(L_x\) from \(x\) onwards.

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx) %>%
  mutate(lx = c(1.0, 1.0 * cumprod(px))[1:length(px)]) %>%
  mutate(lx_plus_1 = c(lx[2:length(lx)], NA)) %>%
  mutate(dx = lx - lx_plus_1) %>%
  mutate(Lx = ifelse(x == max(x), lx / mx[length(mx)], lx_plus_1 + ax * dx)) %>%
  mutate(Tx = c(sum(Lx, na.rm = TRUE), sum(Lx, na.rm = TRUE) - cumsum(Lx))[1:length(Lx)])
)

##     x  Dx  Nx         mx  ax         qx        px         lx  lx_plus_1
## 1   0 109 205 0.53170732 0.5 0.42003854 0.5799615 1.00000000 0.57996146
## 2   1   2  96 0.02083333 0.5 0.02061856 0.9793814 0.57996146 0.56800350
## 3   2   5  94 0.05319149 0.5 0.05181347 0.9481865 0.56800350 0.53857326
## 4   3  10  89 0.11235955 0.5 0.10638298 0.8936170 0.53857326 0.48127824
## 5   4  11  79 0.13924051 0.5 0.13017751 0.8698225 0.48127824 0.41862663
## 6   5  13  68 0.19117647 0.5 0.17449664 0.8255034 0.41862663 0.34557769
## 7   6  12  55 0.21818182 0.5 0.19672131 0.8032787 0.34557769 0.27759519
## 8   7  11  43 0.25581395 0.5 0.22680412 0.7731959 0.27759519 0.21463546
## 9   8  10  32 0.31250000 0.5 0.27027027 0.7297297 0.21463546 0.15662587
## 10  9   7  22 0.31818182 0.5 0.27450980 0.7254902 0.15662587 0.11363054
## 11 10   5  15 0.33333333 0.5 0.28571429 0.7142857 0.11363054 0.08116467
## 12 11   4  10 0.40000000 0.5 0.33333333 0.6666667 0.08116467 0.05410978
## 13 12   6   6 1.00000000 0.5 0.66666667 0.3333333 0.05410978         NA
##            dx         Lx         Tx
## 1  0.42003854 0.78998073 4.35683718
## 2  0.01195797 0.57398248 3.56685644
## 3  0.02943023 0.55328838 2.99287396
## 4  0.05729503 0.50992575 2.43958558
## 5  0.06265160 0.44995243 1.92965984
## 6  0.07304894 0.38210216 1.47970740
## 7  0.06798250 0.31158644 1.09760524
## 8  0.06295973 0.24611533 0.78601880
## 9  0.05800958 0.18563067 0.53990348
## 10 0.04299534 0.13512821 0.35427281
## 11 0.03246587 0.09739760 0.21914461
## 12 0.02705489 0.06763722 0.12174700
## 13         NA 0.05410978 0.05410978

9. Life expectancy

Finally, life expectancy (\(e_x\), ex) from age \(x\) is calculated as \(T_x / l_x\).

We can also tidy up by selecting ONLY the main life table columns and omiting those columns that we used to help with calculations on the way

(tharData <- tharData %>%
  mutate(cumul_deaths = cumsum(Dx)) %>%
  mutate(Nx = sum(Dx) - c(0, cumul_deaths[1:(length(Dx) - 1)])) %>%
  select(-cumul_deaths) %>%
  mutate(mx = Dx / Nx) %>%
  mutate(ax = 0.5) %>%
  mutate(qx = mx / (1 + 0.5 * mx)) %>%
  mutate(px = 1 - qx) %>%
  mutate(lx = c(1.0, 1.0 * cumprod(px))[1:length(px)]) %>%
  mutate(lx_plus_1 = c(lx[2:length(lx)], NA)) %>%
  mutate(dx = lx - lx_plus_1) %>%
  mutate(Lx = ifelse(x == max(x), lx / mx[length(mx)], lx_plus_1 + ax * dx)) %>%
  mutate(Tx = c(sum(Lx, na.rm = TRUE), sum(Lx, na.rm = TRUE) - cumsum(Lx))[1:length(Lx)]) %>%
  mutate(ex = Tx / lx) %>%
  select(x, Nx, Dx, mx, qx, px, lx, dx, Lx, Tx, ex)
)

##     x  Nx  Dx         mx         qx        px         lx         dx         Lx
## 1   0 205 109 0.53170732 0.42003854 0.5799615 1.00000000 0.42003854 0.78998073
## 2   1  96   2 0.02083333 0.02061856 0.9793814 0.57996146 0.01195797 0.57398248
## 3   2  94   5 0.05319149 0.05181347 0.9481865 0.56800350 0.02943023 0.55328838
## 4   3  89  10 0.11235955 0.10638298 0.8936170 0.53857326 0.05729503 0.50992575
## 5   4  79  11 0.13924051 0.13017751 0.8698225 0.48127824 0.06265160 0.44995243
## 6   5  68  13 0.19117647 0.17449664 0.8255034 0.41862663 0.07304894 0.38210216
## 7   6  55  12 0.21818182 0.19672131 0.8032787 0.34557769 0.06798250 0.31158644
## 8   7  43  11 0.25581395 0.22680412 0.7731959 0.27759519 0.06295973 0.24611533
## 9   8  32  10 0.31250000 0.27027027 0.7297297 0.21463546 0.05800958 0.18563067
## 10  9  22   7 0.31818182 0.27450980 0.7254902 0.15662587 0.04299534 0.13512821
## 11 10  15   5 0.33333333 0.28571429 0.7142857 0.11363054 0.03246587 0.09739760
## 12 11  10   4 0.40000000 0.33333333 0.6666667 0.08116467 0.02705489 0.06763722
## 13 12   6   6 1.00000000 0.66666667 0.3333333 0.05410978         NA 0.05410978
##            Tx       ex
## 1  4.35683718 4.356837
## 2  3.56685644 6.150161
## 3  2.99287396 5.269112
## 4  2.43958558 4.529719
## 5  1.92965984 4.009448
## 6  1.47970740 3.534671
## 7  1.09760524 3.176146
## 8  0.78601880 2.831529
## 9  0.53990348 2.515444
## 10 0.35427281 2.261905
## 11 0.21914461 1.928571
## 12 0.12174700 1.500000
## 13 0.05410978 1.000000

Constructing a life table from an age at death series

Owen Jones

2023-04-22

Preamble