Modern mortality projections in actuarial institutions rely on intricate systems of mathematical models that take a variety of factors into consideration for accurate forecasting. The purpose of this paper is to analyse the well-known Lee-Carter model, first from first principles and then through a more thorough application of the model to the Australian population. A section will also be dedicated to the Hierarchical Model for Mortality Projection and its relation to Generalised Linear Models and their various benefits, and finally to the kinds of analyses the field of actuarial science may see in the near future.
With the improvement of medical technologies, life expectancies the world over have improved remarkably across age groups. The resulting exposure, the risk that policyholders live longer than anticipated, is called longevity risk: with life insurance and pensions often lasting for decades, it leads to increased payouts and survival benefits to retirees and annuitants. In parallel, improvements in computational processing have made it feasible to run intensive statistical algorithms, by virtue of which more sophisticated modes of analysis can be carried out with greater accuracy and efficiency.
In the opening sections a brief introduction is given to the various methods that are currently used, or have been used in the past, for mortality modelling. The crux of this paper, however, is its focus on Generalised Linear Models for mortality prediction. Much can be written about this topic, but our focus will be limited to the Lee-Carter model.
Generalised Linear Models (GLMs) are, as the name states, generalisations of the classical linear models, and have been studied since the 1970s, as evidenced by the volume of papers published in that period. The use of GLMs in insurance can be traced back to the 1980s, with motor insurance claims data first modelled in this framework by Baxter et al. (1980). Later in this paper, extensions of the GLM into credibility theory and Hierarchical Generalised Linear Models (HGLMs) will be explored.
GLMs are a family of models all built around the exponential family of distributions. Consider a random variable \(Y\) whose probability distribution depends on a single parameter \(\theta\). A general expression for the exponential family of distributions for \(Y\) and \(\theta\) is given by\(^1\): \[f(y;\theta) = s(y)t(\theta)e^{a(y)b(\theta)} \]
where \(a\), \(b\), \(s\) and \(t\) are known functions.
While the above distribution serves as an introduction to the form of the exponential family of distributions, the form we are most interested in is the one formulated by Haberman and Renshaw (1996)\(^2\). The log-likelihood of this exponential family form is given by: \[ l = \frac{y\theta-b(\theta)}{\phi} + c(y,\phi) \]
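To make this form concrete, the following is a minimal sketch (not taken from the paper) verifying numerically that the Poisson log-likelihood can be written as \(l = (y\theta - b(\theta))/\phi + c(y,\phi)\) with \(\theta = \ln\mu\), \(b(\theta) = e^\theta\), \(\phi = 1\) and \(c(y,\phi) = -\ln(y!)\):

# A minimal check (toy values): the Poisson log-likelihood in exponential-family form
y  <- 3                 # an observed count
mu <- 2.5               # the Poisson mean
theta <- log(mu)        # canonical parameter
l_expfam <- (y * theta - exp(theta)) / 1 - lgamma(y + 1)
l_direct <- dpois(y, lambda = mu, log = TRUE)
all.equal(l_expfam, l_direct)   # TRUE: the two expressions agree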
The most widely cited model for projecting mortality rates in this category is the Lee-Carter (1992) model, so only this model will be explained in brief in this section. The model was developed by its namesakes to forecast mortality for the US population. The explicit form of the model is given by\(^3\): \[\ln(m(x,t)) = a_x + b_xk_t + \varepsilon_{x,t} \] where \(a_x\) and \(b_x\) are appropriately chosen sets of age-specific constants, \(k_t\) is a time-varying index and \(\varepsilon_{x,t}\) is the error term with mean \(0\) and variance \(\sigma_\varepsilon^2\). These parameters together project the mortality rate \(m(x,t)\), which should be read as the central death rate at age \(x\) in year \(t\).
In fitting the model, the following constraints are placed on it: \(\Sigma b_x = 1\) and \(\Sigma k_t = 0\), which implies that, with the model now normalised, the age-specific constant \(a_x\) is simply the average of \(\ln(m(x,t))\) over time.
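As a minimal illustration of this last point (toy data only, not the Australian data used later), the \(a_x\) can be obtained as the row means of the log central death rate matrix:

# Toy deaths and exposures: a_x is the average of log(m(x,t)) over time
set.seed(1)
D <- matrix(rpois(5 * 10, lambda = 50), nrow = 5)   # deaths, 5 ages x 10 years
E <- matrix(1000, nrow = 5, ncol = 10)              # central exposures
log_m <- log(D / E)                                 # log central death rates
a_x <- rowMeans(log_m)                              # age-specific constants a_x
a_x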
Though our application of state space models is in relation to mortality modelling, it is fascinating to learn that they originated in the field of control engineering\(^4\) with the paper by Kalman (1960). It was later realised that state space models could be applied to many more areas of study. One of the first forays was the fitting of the autoregressive integrated moving average (ARIMA) models of Box and Jenkins (1976). A more general approach for the analysis of time-series data, built on linear Gaussian state space models, was put forward by Durbin and Koopman (2001).
Credibility theory is usually referred to by actuaries as a set of techniques by which premiums for short-term insurance contracts can be estimated. The estimate can be defined as a linear function of past data from the risk itself in combination with some form of collateral data. In this section a brief description of the credibility formula of Buhlmann (1967) will be laid out, along with its usefulness in regard to GLMs.
Buhlmann approached credibility through the classical ratemaking problem, which deals with the estimation of a distribution given some information and its associated risks. In particular, he was interested in grouping these risks without assuming homogeneity. Given some data \(y_{ij}\) for \(i=1,2,\ldots,t\) and \(j=1,2,\ldots,n\), where \(i\) indexes the risks in the data and each risk has its own risk parameter \(\xi_i\), the assumptions of the model are\(^{11}\):
Also, the distributions are considered to be members of the exponential family of distributions. The canonical parameter can be written as: \[\theta' = \theta(m(\xi_i))=\theta(u_i)\] where \(m(\xi_i)=E[y_{ij}|\xi_i]\), \(\theta\) is the canonical link function and \(u_i\) is a stochastic component in the framework.
Now, as to the actual nature of the distribution, it can be defined by a Bayesian prior distribution with a similar form for the stochastic components, in the form of a maximised hierarchical model. The Hierarchical Generalised Linear Model (HGLM) can be defined through the kernel of the log-likelihood of \(\theta(u_i)\), given by: \[a_1\theta'_i-a_2b(\theta'_i)\] Thus a hierarchical log-likelihood can be defined as: \[h = \Sigma_{i,j}l(\theta'_i;y_{i,j}|v_i) + \Sigma_il(v_i)\] \[= \Sigma_{i,j}\left(\frac{y_{i,j}\theta'_i-b(\theta'_i)}{\phi} + c(y_{i,j},\phi)\right) + \Sigma_i\left(a_1\theta'_i-a_2b(\theta'_i)\right) \] As can be observed, the linear predictor in \(y_{i,j}\) is modelled in the second stage of the likelihood and consists of the stochastic component. Further, maximising the hierarchical likelihood yields estimates of the stochastic component, i.e. \(m(\xi_{i})=u_i\). This can be shown in the following proof:
We know, \[\mu = E(Y) = \frac{db(\theta)}{d\theta}\] From this expression we get, \[\frac{\partial b(\theta(u_i))}{\partial v_i}=u_i\] Therefore, \[\frac{\partial h}{\partial v_i} = \Sigma_{j = 1}^k\left(\frac{y_{i,j}-u_i}{\phi}\right)+a_1-a_2u_i\] Equating the above expression to \(0\) yields, \[y_{i+}-k\hat{u}_i+\phi a_1-\phi a_2\hat{u}_i=0\] where, \[y_{i+}=\Sigma_{j=1}^ky_{i,j} \] Hence, \[\hat{u}_i=\frac{y_{i+}+\phi a_1}{k + \phi a_2}\] \[=Z\bar{y}_i + (1-Z)m\] where \(\bar{y}_i=\frac{1}{k}y_{i+}\), \(Z = \frac{k}{k + \phi a_2}\) and \(m = \frac{a_1}{a_2}\).
And thus we have our final result.
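As a small numerical sketch of this result (toy values, not from the paper), the credibility estimate \(\hat{u}_i = Z\bar{y}_i + (1-Z)m\) can be computed directly:

# Credibility estimate u_i = Z * ybar_i + (1 - Z) * m, as derived above (toy values)
y_i <- c(12, 9, 15, 11, 13)    # k = 5 past observations for risk i
k   <- length(y_i)
phi <- 1                       # dispersion parameter
a1  <- 20; a2 <- 2             # parameters of the prior kernel a1*theta - a2*b(theta)
m   <- a1 / a2                 # collateral (prior) mean
Z   <- k / (k + phi * a2)      # credibility factor
u_hat <- Z * mean(y_i) + (1 - Z) * m
u_hat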
The primary aim of this paper is to use Generalised Linear Models for the purpose of mortality modelling. Specifically, the model under consideration is the Lee-Carter model. Its namesakes, Ronald D. Lee and Lawrence R. Carter, proposed the model in 1992 for forecasting US mortality. The model is given by: \[\ln[m(x,t)] = a_x + b_xk_t + \varepsilon_{x,t}\] where \(a_x\) and \(b_x\) are sets of age-specific constants and \(k_t\) is an index that varies with time. We will later see that \(k_t\) is modelled as a random walk with drift\(^4\). As to the specifics of the model, \(m(x,t)\) is the central death rate at age \(x\) in year \(t\), defined as the ratio of the number of deaths \(D_{x,t}\) to the central exposures \(E_{x,t}\). This information was obtained from the Human Mortality Database; the specifics of how the data was extracted and collated are detailed in the appendix. Some information on the subscripts allows us to better interpret the indices: subscript \(x\) refers to the age, while \(t\) refers to the year.
In the Lee-Carter model, \(a_x\) and \(b_xk_t\) capture age and period effects respectively. This was the innovation of this particular model: incorporating these specific effects resulted in an effective prediction process. It goes without saying that later extensions of the model expanded on the prediction process, making it all the more effective.
Now that a brief introduction to the Lee-Carter model has been given, this section deals with the mathematical assumptions behind it. As mentioned above, the Lee-Carter model assumes age and period effects and projects mortality on their basis, with an element of added noise.
The inputs to the Lee-Carter model are a matrix of observed deaths in a population and a matrix of exposures of that population. The exposures measure the population at risk, which, together with the deaths, determines the rate at which members of the population die moving from some year \(t\) to \(t+1\). In both matrices, ages are arranged along the rows and years along the columns.
The original model used Singular Value Decomposition (SVD) to reduce the mortality matrix to a lower-dimensional representation for a more efficient process. The original data can be assumed to be highly correlated, so performing any kind of analysis on it directly is quite difficult. The SVD process reduces the dimensionality so that further forecasting is made relatively easy.
The process of estimation of the parameters is well documented in Lee and Carter(1992) and an explanation will be given below\(^5\):
Some important things should be mentioned about the parameters listed above. The parameter \(b_x\) shows how sensitive the mortality rate at age \(x\) is to changes in \(k_t\), i.e. \(d\ln(m_{x,t})/dt = b_x\,dk_t/dt\), as elucidated in Lee and Carter (1992). When \(k_t\) varies linearly, mortality at each age group changes exponentially, and negative mortality rates are not permitted in this model; each age-specific rate goes to zero as \(k_t\) goes to negative infinity.
Referring to the Human Mortality Database (www.mortality.org), values for exposures and deaths can be extracted. Part of the parameter estimation relies on re-estimating the time-specific index \(k_t\). Lee and Carter suggest that this be done so that, given the fitted \(a_x\) and \(b_x\), the implied number of deaths matches the observed number of deaths in each year. This is made possible using the following relation: \[D(t) = \Sigma_x[E(x,t)e^{a_x+b_xk_t}]\]
They also make specific reference to the fact that this cannot be done analytically; rather, the required values of \(k_t\) have to be sought out numerically. Once these values have been collated, they serve as the foundation from which the mortality rates can be forecast. This will be dealt with in detail at a later point.
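As a minimal sketch of this numerical search (toy values, not the paper's data), \(k_t\) for a single year can be found with a one-dimensional root finder such as uniroot():

# Solve D(t) = sum_x E(x,t) * exp(a_x + b_x * k_t) for k_t (toy values)
a_x  <- c(-6.5, -5.8, -5.0, -4.1, -3.0)     # age-specific constants
b_x  <- c(0.30, 0.25, 0.20, 0.15, 0.10)     # age-specific sensitivities (sum to 1)
E_xt <- c(12000, 11000, 9500, 8000, 6000)   # central exposures in year t
D_t  <- 260                                 # total observed deaths in year t
f <- function(k) sum(E_xt * exp(a_x + b_x * k)) - D_t
k_t <- uniroot(f, interval = c(-50, 50))$root
k_t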
The original paper used a least-squares solution to the model equation given above: \[\ln[m(x,t)] = a_x + b_xk_t + \varepsilon_{x,t}\] The authors raise the issue of identifiability with the model: certain constraints need to be imposed for a unique solution. The constraints put forward were as follows: \[\Sigma b_x=1\] \[\Sigma k_t = 0\] As a side note, this is not unique to this model; many extensions of it also suffer from identifiability issues, and similar constraints have to be set.
The least-squares solution to the above equation can be accomplished with Singular Value Decomposition (SVD) applied to the mortality matrix. A brief primer on the SVD and the fitting process is given below.
With reference to the constraints described above, the SVD is applied to the matrix given by \(Z_{x,t}=\ln(m_{x,t})-\hat{a}_x\). This produces the factorisation \(UDV'\), where \(U\) contains the age-specific coefficients, \(D\) is a diagonal matrix containing the singular values and \(V'\) is the transpose of the matrix of time-specific coefficients\(^7\). The quantity \(\hat{a}_x\) represents the average of the log mortality rates over time, i.e. \[\hat{a}_x=\frac{1}{T}\Sigma_t\ln(m_{x,t})\] The parameter \(\hat{b}_x\) is derived from the left singular matrix \(U\) and corresponds to its first column, i.e. \(\hat{b}_x = U_{x1}\), while \(\hat{k}_t\) is estimated from the right singular matrix \(V\), specifically as the product of the first singular value and the first column of \(V\), \(\hat{k}_t= D_1V_{t1}\). A composite matrix \(\hat{Z}_{x,t}=\hat{b}_x\hat{k}_t\) can then be produced as the product of these parameters. The estimates of the log mortality rates are then computed as \[\ln(\hat{m}_{x,t})=\hat{a}_x+\hat{Z}_{x,t}=\hat{a}_x+\hat{b}_x\hat{k}_t\] The \(\hat{}\) over the parameters represents estimates.
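The following is a minimal sketch of this estimation on a toy matrix with a known Lee-Carter structure (not the paper's data), with \(b_x\) rescaled so that \(\Sigma b_x = 1\):

# Toy log-rate matrix with a known Lee-Carter structure, then SVD estimation
set.seed(2)
a_true <- c(-6, -5.5, -5, -4.5, -4)                 # true a_x (5 ages)
b_true <- c(0.3, 0.25, 0.2, 0.15, 0.1)              # true b_x (sums to 1)
k_true <- seq(5, -5, length.out = 10)               # true k_t (10 years)
log_m  <- a_true + outer(b_true, k_true) + matrix(rnorm(50, sd = 0.02), 5, 10)
a_x <- rowMeans(log_m)                              # a_x: average log rate over time
Z   <- sweep(log_m, 1, a_x)                         # centred matrix Z_{x,t} = ln(m_{x,t}) - a_x
s   <- svd(Z)
b_x <- s$u[, 1] / sum(s$u[, 1])                     # first left singular vector, scaled so sum(b_x) = 1
k_t <- s$d[1] * s$v[, 1] * sum(s$u[, 1])            # first right singular vector, rescaled to compensate
fitted_log_m <- a_x + outer(b_x, k_t)               # ln(m_hat(x,t)) = a_x + b_x * k_t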
The Lee-Carter model approaches the forecasting of mortality rates through the time-specific index \(\hat{k}_t\). More specifically, it is approached as a problem in univariate time-series analysis, with the forecasts produced by a random walk with drift. This can be encapsulated in the equations below\(^7\): \[\hat{k}_t=\hat{k}_{t-1} + \theta + \xi_t\] \[\xi_t \sim \mathcal{N}(0,\sigma^2)\] where \(\theta\) is known as the drift parameter and \(\xi_t\) is the error term, which introduces uncertainty into the forecast; without it the forecast follows a straight line. The maximum likelihood estimate of the drift parameter depends only on the first and last fitted values of the index and is given by: \[\hat{\theta} = \frac{\hat{k}_T-\hat{k}_1}{T-1}\] This estimate can then be plugged back into the equation for the mortality index, substituting \(\hat{\theta}\) for \(\theta\), to produce forecast point estimates; done iteratively, this produces the series of values that drives the forecasting process. More generally, the forecast \(\Delta t\) periods beyond the last observed time \(T\) can be written as: \[\hat{k}_{T+\Delta t}=\hat{k}_T+(\Delta t)\hat{\theta}\] if we ignore the error term for the moment. This is the aforementioned straight line with slope \(\hat{\theta}\), which extrapolates the trend running through the first and last fitted values of the index.
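The following is a minimal sketch of this procedure on a toy index (not the fitted Australian \(\hat{k}_t\)), estimating the drift and producing central forecasts with approximate 90% intervals:

# Random walk with drift: drift estimate and point forecasts for k_t (toy index)
set.seed(3)
k_hat <- cumsum(c(0, rnorm(49, mean = -0.8, sd = 0.7)))  # toy fitted index, T = 50
T_len <- length(k_hat)
drift <- (k_hat[T_len] - k_hat[1]) / (T_len - 1)          # MLE of the drift theta
sigma <- sd(diff(k_hat) - drift)                          # innovation standard deviation
h     <- 1:20                                             # forecast horizons
k_fore <- k_hat[T_len] + h * drift                        # central forecast k_{T + h}
lower  <- k_fore - 1.645 * sigma * sqrt(h)                # approximate 5% bound
upper  <- k_fore + 1.645 * sigma * sqrt(h)                # approximate 95% bound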
In this section parameter estimation for the model is carried out from first principles. The SVD process is illustrated, and the corresponding parameter estimates and their plots are shown to give a clearer picture of what happens under the hood of the formulation. The focus is limited to the mortality of males in the age bracket 40 to 90 for the Australian population.
The graphs below show a reasonable fit from the SVD, though it goes without saying that the fit is less than stellar: for some age groups, the fitted values swerve wildly away from the observed ones.
A closer look at the plots below, over the age range of interest, reveals that the fit is worse than first anticipated. The figure on the left shows the observed values and fit for the year 1921, the first year in the data, and the figure on the right is for the year 2014. The fit does appear salvageable on the right. Even with the SVD exposed to the same amount of data, the deviations are quite hard to explain; it stands to reason that with further fine-tuning of the SVD, the fit could be better.
Now that the overall fit of the SVD to the data has been established, along with its shortcomings, the plots below illustrate how the parameters \(a_x\), \(b_x\) and \(k_t\) fared in the SVD.
Again, the parameter plots shown below seem reasonable and follow the patterns observed by Lee and Carter, though the variation of \(b_x\) is quite interesting. After an immediate descent, it remains more or less level, with a slight increase. In practical terms this would suggest that mortality at these ages has very little sensitivity to movements in \(k_t\), which perhaps explains why the fit observed in the previous section was less than ideal.
As per the literature, the time-specific parameter \(k_t\) was projected as a random walk with drift. The following graph illustrates this, with the blue lines signifying the 5% and 95% bounds and the red line signifying the median (50th percentile). This is as expected, at least as far as the analysis has carried us thus far.
While the above exercise was important in understanding the framework of the Lee-Carter model, there are far more efficient ways and techniques to carry it out. R is a robust software package that is well suited to deep statistical analysis. For the rest of the paper, the functions of the StMoMo package will be utilised.
But first, a deeper explanation of the mode of analysis needs to be sketched out. Specifically, the StMoMo package in R enables us to use Generalised Non-linear Models (GNMs) to fit and forecast the Lee-Carter model, so a more rigorous look at GNMs lays some important groundwork. This is well laid out in McCullagh and Nelder (1987), which serves as the primary reference for the remainder of this section unless specified otherwise. A brief introduction to Generalised Linear Models was given in the first section; this will now be extended and linked to GNMs.
To reiterate, a GLM is specified with the help of two components, namely a response function and a link function. The response distribution should be a member of the exponential family of distributions, which includes the Normal, Poisson, Binomial, etc. The general form of an exponential family distribution is given by\(^8\): \[f(y|\theta,\phi)=exp\left[ \frac{y\theta-b(\theta)}{a(\phi)}+c(y,\phi)\right] \]
The choice of the parameters \(\theta\) and \(\phi\) defines what form the distribution takes. For this reason \(\theta\) is called the canonical parameter and specifies the location, while \(\phi\) is called the dispersion parameter and represents the scale. As an example, the Normal distribution and the corresponding choice of parameters are illustrated below:
\[f(y|\theta,\phi)=\frac{1}{\sqrt{2\pi \sigma^2}} exp\left[-\frac{(y-\mu)^2} {2 \sigma^2}\right]\] With the selection of parameters as \(\theta=\mu\), \(\phi=\sigma^2\), \(a(\phi)=\phi\), \(b(\theta)=\theta^2/2\) and \(c(y,\phi)=-(y^2/\phi + log(2\pi \phi))/2\), the above equation can be rewritten as: \[f(y|\theta,\phi)=exp\left[\frac{y\mu-\mu^2/2}{\sigma^2}-\frac{1}{2}\left(\frac{y^2}{\sigma^2}+log(2\pi \sigma^2)\right)\right]\] In general, the exponential family of distributions has the following expressions for the mean and variance: \[E[Y] = \mu = b'(\theta)\] \[var[Y] = b''(\theta)a(\phi)\]
As can be observed, the mean is a function of \(\theta\) only, while the variance is a function of both \(\theta\) and \(\phi\), that is, of both the location and the scale. In the Gaussian case above, \(b''(\theta)=1\), making the variance independent of the mean. This is an exception worth noting, as it is unique to the Gaussian distribution.
The second component, namely the linear predictor, can be expressed in the form: \[\eta = \beta_0 + \beta_1 x_1 + ...+ \beta_p x_p = x^T\beta\] The link function \(g\) specifies how the covariates are linked with the mean response \(E[Y]=\mu\) through the linear predictor: \[\eta = g(\mu)\] Continuing with the Gaussian member of the exponential family, the link function is given by: \[\eta = \mu\] Having laid some groundwork for GLMs, we can now move on to GNMs. The non-linear nature of these models is typically introduced through the predictor, which may contain terms that are non-linear in the parameters (such as the multiplicative term \(b_xk_t\) in the Lee-Carter model), while the exponential family response and link structure are maintained. It is intuitive to grasp that the introduction of non-linearity complicates things considerably by introducing another level of covariates or further iterative steps, but careful use can have a wide range of benefits.
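As a minimal, self-contained sketch of the GLM building blocks just described (toy data, not the paper's analysis), the following fits a Poisson GLM with the canonical log link, using log exposure as an offset so that the linear predictor models the log death rate:

# Poisson GLM with canonical log link and an exposure offset (toy data)
set.seed(4)
age      <- rep(40:60, times = 3)
exposure <- runif(length(age), 5000, 20000)
rate     <- exp(-9 + 0.09 * age)            # toy underlying death rates
deaths   <- rpois(length(age), exposure * rate)
fit <- glm(deaths ~ age + offset(log(exposure)), family = poisson(link = "log"))
summary(fit)$coefficients                    # estimates should be close to -9 and 0.09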
The following assumptions are made regarding the data and the analysis under consideration, in line with the requirements for utilising the StMoMo package in this section\(^9\); a short sketch of how these components map onto an StMoMo model definition follows the list.
Deaths \(D_{x,t}\) form the random component and follow either a Poisson or a Binomial distribution: \[D_{x,t} \sim Poisson(E^c_{x,t}\, \mu_{x,t})\] \[D_{x,t} \sim Binomial(E^i_{x,t},\, q_{x,t})\] with \(E[D_{x,t}/E^c_{x,t}] = \mu_{x,t}\) and \(E[D_{x,t}/E^i_{x,t}] = q_{x,t}\), where \(E^c_{x,t}\) is the central exposure of the population and \(E^i_{x,t}\) is the initial exposure of the population at the beginning of the year.
A systematic component that corresponds to the model under consideration. This would be the Lee-Carter model: \[ln[m(x,t)] = a_x + b_xk_t + \varepsilon_{x,t}\]
A link function that forms the relationship between the random and systematic components such that \[g\left(E\left[\frac{D_{x,t}}{E_{x,t}}\right]\right)=\eta_{x,t}\] This is achieved in the package through the canonical link, pairing the log link with the Poisson distribution and the logit link with the Binomial distribution, respectively.
A set of parameter constraints to deal with identifiability issues of the model.
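As referenced above, the following sketch (illustrative only; the full fitting code is given in the appendix) shows how these components are declared in StMoMo, pairing the log link with the Poisson distribution and the logit link with the Binomial distribution:

# Defining the Lee-Carter model in StMoMo under the two canonical-link settings
library(StMoMo)
LC_poisson  <- lc(link = "log")    # D_{x,t} ~ Poisson(E^c * mu), log link
LC_binomial <- lc(link = "logit")  # D_{x,t} ~ Binomial(E^i, q), logit link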
The following set of graphs shows the age-specific parameters \(a_x\) and \(b_x\) and the time-specific parameter \(k_t\) for the male and female Australian populations, for ages 40 to 90. It is clear that mortality rates among males are higher. This is particularly noticeable in plot (a) in Fig. 4, the plot of the \(a_x\)'s: this parameter is to be interpreted as the average log mortality at each age over time, and the levels for the male population are higher.
Looking at plot (c) in Fig. 4, we can also see that the time-specific parameter, which captures the trend of mortality, shows that while female mortality decreased more or less constantly, the male population shows a degree of stationarity before beginning to drop off around the year 1970, after which it matches the trend of the female population.
The following plots show heat maps of the residuals from the Lee-Carter fits described above. Ideally we should expect a truly random plot, but, as is obvious, certain trend lines can be observed. This is typically because of the inability of the Lee-Carter model to account for cohort effects. A cohort effect can be defined as a feature of a population's mortality specific to a group born in the same period, arising from the interaction of age and period, which may or may not affect the analysis of that population's mortality. Cohort effects are an important extension to the Lee-Carter model and have been analysed in models like the CBD model (2009) by Cairns et al. and the Plat model (2009), to name a few.
Looking at the heat map for the female population (Fig. 6), it seems to be just as distinct as the heat map for the male population (Fig. 5), though the latter seems to show a bit more randomness. Both plots show a trend line around the years 1940 to 1980, leading one to conclude that rates of mortality showed considerable variation in that period.
The following plots show the deviance residuals of the Lee-Carter model in the form of scatter plots. Our earlier observation is brought into clarity here: the male data are grouped more tightly together, showing a degree of correlation within the data, while the female data are more dispersed, confirming our conclusion of more randomness. The shortcomings discussed earlier, however, still stand.
The next series of plots shows forecasts carried out on the time-specific index. The three graphs are described below:
Comparing these plots, one can see that the ARIMA approximation is quite similar to the forecast produced by the random walk. Plots (a) and (c) in particular show a far better approximation of the forecasts than the forecast carried out with the first-principles SVD fit in Sec 3.4.4.
The flexibility of the StMoMo package enables us to simulate mortality data as well. The literature in [11] informs us of how these simulations are carried out: the period indices (in the case of the Lee-Carter model, \(k_t\)) are assumed to follow a multivariate random walk with drift and are simulated as such by: \[k_t=\delta + k_{t-1} + \xi_t^k\] where \[k_t = (k_t^{(1)} \cdots k_t^{(N)})^T\] and \[\xi_t^k \sim N(0,\Sigma)\]
An alternative provided in the StMoMo package is to treat the time-specific parameter as a univariate ARIMA model. It should be noted that in the earlier section the time-specific parameter is assumed to follow a multivariate random walk with drift. Under the alternative approach, the \(i\)-th parameter is assumed to be an ARIMA\((p_i,d_i,q_i)\) with drift, the expression for which is: \[\Delta^{d_i}k_t^i = \delta_0^i + \phi_1^i\Delta^{d_i}k_{t-1}^i+ \cdots + \phi_{p_i}^i\Delta^{d_i}k_{t-p_i}^i + \xi_t^i + \delta_1^i \xi_{t-1}^i + \cdots + \delta_{q_i}^i \xi_{t-q_i}^i \] where \(\Delta\) is the difference operator, \(\delta_0^i\) is the drift parameter, \(\phi_1^i,\ldots,\phi^i_{p_i}\) are the autoregressive coefficients with \(\phi^i_{p_i}\neq0\), \(\delta_1^i,\ldots,\delta^i_{q_i}\) are the moving-average coefficients and \(\xi_t^i\) is a white-noise error process.
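The following is a minimal sketch (toy index, not the fitted Australian \(k_t\)) of fitting an ARIMA(1,1,2) with drift using the forecast package and projecting it forward, analogous to the kt.method = "iarima" option used in the code later:

# ARIMA(1,1,2) with drift for a period index, using the forecast package (toy data)
library(forecast)
set.seed(5)
kt <- cumsum(c(0, rnorm(93, mean = -0.8, sd = 0.9)))   # toy k_t series
fit_arima <- Arima(kt, order = c(1, 1, 2), include.drift = TRUE)
fc <- forecast(fit_arima, h = 50)                      # 50-year-ahead forecast
fc$mean[1:5]                                           # first few central forecasts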
R, of course, can handle complicated formulations like the ones above in a few lines of code. The results of simulating 500 instances of a random walk with drift for the male and female populations are given in (a) and (b).
Figures (c) and (d) now show the forecasted mortality figures.
With the advent of Artificial Intelligence (AI) and Machine Learning (ML), it would be conducive to apply such techniques to mortality modelling. In the field of ML especially, great leaps have been made in the last few decades and its feasibility has been documented rigorously. In the scenario of mortality modelling, it could possibly be used to forecast, for example, the number of deaths. Machine learning algorithms are quite effective at finding patterns in large data spaces; whether these patterns are of any practical use will of course require a great deal of effort to scope out.
If we look at mortality data, with the ages across the rows and periods across the columns, we can observe that there is certainly a pattern that emerges. Mortality rates are improving considerably and so the whole graph seems to be propagating forward. If this propagation can be quantified reasonably well, we could in turn use the predicted number of deaths to predict the expected mortality rates.
The above graphs show patterns of linearity as well as a downward drift. This is particularly clear in (b), which is a plot of the mortality of 40 to 80 year olds. This was the fundamental finding on which the Lee-Carter and CBD models were built. Machine learning techniques such as Neural Networks (NN) or Convolutional Neural Networks (CNN) are efficient algorithms that can spot trends in data that a human observer may not even be aware of. The problem of prediction still presents itself, though: these algorithms pick up patterns and apply them to a new set of data, and as far as the author is aware, ML is still inefficient at truly predicting future values, as in our example, future values of death rates.
#Method 1-----
library(ggplot2)
library(demography)
library(tidyverse)
library(StMoMo)
library(fds)
AUSdata <- hmd.mx(country = "AUS", username = "rjmorpheus@gmail.com",
password = "Password", label = "Australia")
AUSStMoMO <- StMoMoData(AUSdata,series = "male")
summary(AUSStMoMO)
Years <- 1921:2014
Age <- 1:111
d <- AUSStMoMO$Dxt
e <- AUSStMoMO$Ext
m <- d/e
m <- log(m)
n.x = nrow(m) # 111
n.y = ncol(m) # 94
m_mat <- matrix(m, nrow = n.x, ncol = n.y) # 111 X 94
plot(m_mat[,1], type = "l") #log mortality across all ages in the first year of data (1921)
# Mean log mortality for all age groups under consideration (Age)
a_age <- rowMeans(m_mat)
lines(x = Age, y = a_age, col = 2)#average mortality for all age groups
# plotting mortality for Age = 65 as a trial run to see if code is working
# as expected
plot(x = Years, y = m_mat[65,], pch = 18, xlab = "Years", ylab = "log m", col = 2) #working!!!
# Average mortality across each year
a_year <- colMeans(m_mat)
#We need to estimate parameters ax, bx and kt of the Lee-Carter model
# ax is the avg of log mortality rates over time. ie sum(log m)/n.y
# avg of log rates -> a_hat <- rowSums(m)/94
# Subtracting a from all years in m_mat
# Transpose so rows are years and columns are ages, then subtract a_x from each age column
m_mat <- t(m_mat)                                     # 94 x 111 (years x ages)
m_hat <- m_mat
for(i in 1:n.x) m_hat[,i] <- m_mat[,i] - a_age[i]     # centred matrix: ln(m) - a_x for every age
m_hat <- m_hat[,1:101]                                # keep the first 101 age columns
age_mod <- 1:101
#SVD
svd_ <- svd(m_hat,1,1)
v <- svd_$v
u <- svd_$u
d <- svd_$d
b <- v/sum(v)
k <- u * sum(v) * d[1]
fit_1 = a_age[age_mod] + (b * k[1])
plot(x = age_mod, y = m_mat[1,age_mod], pch = 19, xlab = "Age", ylab = "m")
lines(x = age_mod, y = fit_1, col = 2, lwd = 2, xlab = "Age", ylab = "m")
fit_end <- a_age[age_mod] + (b * k[84])
plot(x = age_mod, y = m_mat[84,age_mod], pch = 18)
lines(x = age_mod, y = fit_end, col = 2, lwd = 2)
#Plotting parameter
par(mfrow = c(2,2))
plot(x = Years, y = k, type = "l")
plot(x = age_mod, y = a_age[age_mod], type = "l", xlab = "Ages", ylab = "ax")
plot(x = age_mod, y = b[age_mod], type = "l", xlab = "Ages", ylab = "bx" )
#plotting m over "Years"
plot(x = Years, y = m_mat[,1], pch = 19, ylab = "m", xlab = "Years", type = "l")
par(mfrow = c(1,1))
########################################################################################
##Performing SVD over ages 40:90
Age_new <- 40:90
m_mat_new <- m_mat[,Age_new]
m_hat_new <- m_hat[,Age_new]
n_x <- nrow(m_hat_new)
n_y <- ncol(m_hat_new)
for(i in 1:n_y) m_hat_new[,i] <- m_mat_new[,i] - a_age[Age_new[i]] # centre using the a_x for ages 40:90
#SVD
svd_ <- svd(m_hat_new,1,1)
v_1 <- svd_$v
u_1 <- svd_$u
d_1 <- svd_$d
b_1 <- v_1/sum(v_1)
k_1 <- u_1 * sum(v_1) * d_1[1]
fit_1_new = a_age[Age_new] + (b_1 * k_1[1])
plot(x = Age_new, y = m_mat_new[1,], pch = 19, xlab = "Age", ylab = "m")
lines(x = Age_new, y = fit_1_new, col = 2, lwd = 2)
fit_end_new <- a_age[Age_new] + (b_1 * k_1[84])
plot(x = Age_new, y = m_mat_new[84,], pch = 18, xlab = "Age", ylab = "m")
lines(x = Age_new, y = fit_end_new, col = 2, lwd = 2)
#Plotting parameter
par(mfrow = c(2,2))
plot(x = Years, y = k_1, type = "l", xlab = "Years", ylab = "k")
plot(x = Age_new, y = a_age[Age_new], type = "l", xlab = "Ages", ylab = "ax")
plot(x = Age_new, y = b_1, type = "l", xlab = "Ages", ylab = "bx")
par(mfrow = c(1,1))
############################################################################################
#Forecasting
#Initialising a dummy matrix to hold our simulation of random walk
#for kt
library(ggplot2)
S <- matrix(0, 61, 100)   # 100 simulated random-walk paths over a 61-year horizon
for(j in 1:100) S[,j] <- -54.3332 + cumsum(-0.75 + rnorm(61, 0, 0.75)) # hard-coded start, drift and sd
quan <- function(x) quantile(x, c(0.05,0.5, 0.95))
sim <- apply(S, 1, quan)
##Forecast of kt showing 5%,50% and 95% confidence intervals of the random walk.
#plot(x = Years, y = k_1, xlab = "Years",type = "l", ylab = "k",
# xlim = c(1920,2064), ylim = c(-90,50))
#lines(x = 2014:2074, y = sim[1,], col = 4, lty = "dashed")
#lines(x = 2014:2074, y = sim[2,], col = 2, lty = "dotted")
#lines(x = 2014:2074, y = sim[3,], col = 4, lty = "dashed")
ggplot() +
xlim(1920,2064) +
ylim(-90,50) +
geom_line(data = NULL, aes(x = Years, y = k_1)) +
geom_line(data = NULL, aes(x = 2014:2074, y = sim[1,]), col = 2) +
geom_line(data = NULL, aes(x = 2014:2074, y = sim[2,]), col = 4) +
geom_line(data = NULL, aes(x = 2014:2074, y = sim[3,]), col = 2)
# LC with StMoMo-----
# Extracting male and female data from HMD, after creating a StMoMo data object
# "Male" and "Female" data are assigned to different variables for easier
# data wrangling.
library(colorspace)
library(gridExtra)
library(cowplot)
library(RColorBrewer)
AUSdata <- hmd.mx(country = "AUS", username = "rjmorpheus@gmail.com",
password = "Password", label = "Australia")
AUSStMoMO_m <- StMoMoData(AUSdata,series = "male")
AUSStMoMO_f <- StMoMoData(AUSdata,series = "female")
#Under a Binomial setting
#Because I'm opting to use the data with the Lee-Carter model under a Binomial setting
#the exposures have to be converted to the initial exposures.
LC1 <- lc(link = "logit")
data_m <- central2initial(AUSStMoMO_m)
data_f <- central2initial(AUSStMoMO_f)
ages_fit <- 40:90
#This can be used to generate a weight matrix over the ages and years in the data.
# clip = 1 assigns 0 weights to the first and last cohorts.
wxt_m <- genWeightMat(ages = ages_fit, years = data_m$years,
clip = 1)
wxt_f <- genWeightMat(ages = ages_fit, years = data_f$years,
clip = 1)
#For males
LCfit_m <- fit(LC1, data = data_m, ages.fit = ages_fit, wxt = wxt_m)
#For females
LCfit_f <- fit(LC1, data = data_f, ages.fit = ages_fit, wxt = wxt_f)
#plotting parameters
par(mfrow = c(1,3))
plot(x = ages_fit, y = LCfit_m$ax, col = 2, type = "l") #males
lines(x = ages_fit, y = LCfit_f$ax, col = 4) #females
plot(x = ages_fit, y = LCfit_m$bx, col = 2, type = "l")
lines(x = ages_fit, y = LCfit_f$bx, col = 4)
plot(x = Years, y = LCfit_m$kt[1,], col = 2, type = "l")
lines(x = Years, y = LCfit_f$kt[1,], col = 4)
par(mfrow = c(1,1))
#Goodness-of-fit analysis-----
# For males
res_m <- residuals(LCfit_m)
aic_ <- AIC(LCfit_m)
bic_ <- BIC(LCfit_m)
#For females
res_f <- residuals(LCfit_f)
#Plotting colour maps of males and females
p1 <- plot(res_m, type = "colourmap", main = "Residuals of Male data")
p2 <- plot(res_f, type = "colourmap", main = "Residuals of Female data")
#Plotting residuals of males and females
pal <- brewer.pal(4,"Greys")
plot(res_m, type = "scatter", col = pal)
plot(res_f, type = "scatter", col = pal)
#Forecasting----
LCfore_m <- forecast(LCfit_m, h = 50)
LCfore_f <- forecast(LCfit_f, h = 50)
## Comparison of forecasting done in three instances:
# a.) Forecasting kt with a random walk using the forecast function.
# b.) Forecast of kt done with SVD and first principles.
# c.) Forecast of kt done with forecast and iarima.
par(mfrow=c(1,3))
plot(x = Years, y = LCfit_m$kt[1,], type = "l", xlim = c(1920,2065),
ylim = c(-70,20), sub = "a)", ylab = "kt")
lines(x = 2014:2063, y = LCfore_m$kt.f$mean, col = 2)
lines(x = 2014:2063, y = LCfore_m$kt.f$upper[1,,1], col = 4)
lines(x = 2014:2063, y = LCfore_m$kt.f$lower[1,,1], col = 4)
plot(x = Years, y = k_1, xlab = "Years",type = "l", ylab = "kt",
xlim = c(1920,2064), ylim = c(-90,50), sub = "b)")
lines(x = 2014:2074, y = sim[1,], col = 4, lty = "dashed")
lines(x = 2014:2074, y = sim[2,], col = 2, lty = "dotted")
lines(x = 2014:2074, y = sim[3,], col = 4, lty = "dashed")
LCforeArima <- forecast(LCfit_m, h = 50, kt.method = "iarima",
kt.order = c(1, 1, 2))
plot(LCforeArima, only.kt = TRUE, sub = "c)")
par(mfrow=c(1,1))
## Simulation for LC
LCsim_m <- simulate(LCfit_m, nsim = 500, h = 50)
LCsim_f <- simulate(LCfit_f, nsim = 500, h = 50)
#Simulation of kt for males and females
par(mfrow=c(1,2))
plot(x = Years, y = LCfit_m$kt[1,], ylim = range(LCfit_m$kt, LCsim_m$kt.s$sim[1, , 1:20]),
xlim = range(LCfit_m$years, LCsim_m$kt.s$years), type = "l")
matlines(LCsim_m$kt.s$years, LCsim_m$kt.s$sim[1, , 1:20], type = "l", lty = 1)
plot(x = Years, y = LCfit_f$kt[1,], ylim = range(LCfit_f$kt, LCsim_f$kt.s$sim[1, , 1:20]),
xlim = range(LCfit_f$years, LCsim_f$kt.s$years), type = "l")
matlines(LCsim_f$kt.s$years, LCsim_f$kt.s$sim[1, , 1:20], type = "l", lty = 1)
par(mfrow=c(1,1))
#Simulation of mortality rates at age 75 for males and females
par(mfrow=c(1,2))
f <- data_f$Dxt/data_f$Ext                              # female death rates (defined before first use)
a_ <- range(exp(m[75, ]), LCsim_m$rates[35, , 1:20])    # plotting range for the male panel
b_ <- range(f[75, ], LCsim_f$rates[35, , 1:20])         # plotting range for the female panel
plot(LCfit_m$years, exp(m[75, ]), xlim = range(LCfit_m$years, LCsim_m$years),
ylim = c(0.001,0.09) ,type = "l",
xlab = "year", ylab = "rate", main = "Male mortality rates at age 75")
matlines(LCsim_m$years, LCsim_m$rates[35, , 1:20], type = "l", lty = 1)
plot(LCfit_f$years, f[75, ], xlim = range(LCfit_f$years, LCsim_f$years),
ylim = b_ ,type = "l",
xlab = "year", ylab = "rate", main = "Female mortality rates at age 75")
matlines(LCsim_f$years, LCsim_f$rates[35, , 1:20], type = "l", lty = 1)
par(mfrow=c(1,1))
par(mfrow=c(1,1))
#Research-----
y_ <- 40:80
ag_ <- 1:94
lin_mod <- lm(m[y_,]~y_)      # one linear trend in age per calendar year (multivariate lm)
# plot(lin_mod)               # diagnostic plots are not available for multivariate ("mlm") fits
pr.lm <- predict(lin_mod)     # fitted values: rows are ages 40-80, columns are years
plot(pr.lm[1,], type = "l", xlab = "Year index", ylab = "Fitted log m") # fitted log mortality at age 40 over the years
plot(m[40:80,1], col = 2, pch = 19, ylim = c(-7,-2))
points(m[40:80,94], pch = 19, col = 4)
matplot(m[40:80,], col = pal, type = "l")
matlines(m[40:80,94], col = pal, pch = 19)