Patterns of DRD and Suicide

## `mutate_each()` is deprecated.
## Use `mutate_all()`, `mutate_at()` or `mutate_if()` instead.
## To map `funs` over all variables, use `mutate_all()`

## Parsed with column specification:
## cols(
##   sex = col_integer(),
##   age = col_integer(),
##   year = col_integer(),
##   death_count = col_integer(),
##   population_count = col_number(),
##   dths_5yr = col_double(),
##   pop_5yr = col_number(),
##   cruderate = col_double()
## )

## Parsed with column specification:
## cols(
##   sex = col_integer(),
##   age = col_integer(),
##   year = col_integer(),
##   carstairs = col_integer(),
##   death_count = col_double(),
##   population_count = col_number(),
##   x1 = col_double(),
##   x2 = col_number(),
##   x3 = col_double()
## )

Introduction

Galton’s Ellipse

In 1886, Francis Galton published an article titled “Regression towards Mediocrity in Hereditary Stature”. Galton’s interests were in heredity, and as an exploration of possibly heritable traits he compared the heights of many adults over two generations: the parents’ generation, and the child’s generation. This data comprised the heights, measured to one tenth of an inch, of 930 adult children and, given the large family sizes of the time, over 200 parents. The heights of females were multipied by a factor of 1.08 - Galton stated this factor “suits my data better than 1.07 and 1.09”, and also that “the computer to whom I first entrusted the figures used a somewhat different factor, yet the results came out closely the same.” [p. 247] - and the average height of each parent compared against the height of each child by first aggregegating the heights of child and parents to the nearest inch, then drawing a table where each row was a parental height, and each column a child height. Within each cell of this table Galton - or possibly his ‘computer’ - wrote the count of observations where the parental height in the row corresponded with the adult child height in the column.

With both rows and columns indicating heights in inches, Galton had produced more than just a table of numbers. He had also produced a visualisation of a data landscape. Much as a topographic map can be seen as a data landscape comprising latitude running left to right, longitude running bottom to top, and height above sea level for each latitutde-longitude combination, so Galton’s data landscape also comparised three variables, two of which used the same unit of measurement (inches) and were shown to the same scale. And just as heights are really continuous variables regardless of how finely or coarsely they’re aggregated and measured, Galton imagined the distribution of frequencies as existing on a continuous height-height surface, and from this borrowed one of mapmakers’ most important visual methods for representing height over a topographic surface - the contour line - to make sense of this landscape of values. Galton wrote:

I found it hard at first to catch the full significance of the entries in the table, which had curious relations that were very interesting to investigate. They came out distinctly when I “smoothed” the entries by writing at each intersection of a horizontal column with a vertical one, the sum of the entries in the four adjacent squares, and using these to work upon. I then noticed (see Plate X) that lines drawn through entries of the same value formed a series of concentric and similar ellipses. […] The points where each ellipse in succession was touched by a horizontal tangent, lay in a straight line inclined to the vertical in the ratio of 2/3; those where they were touched by a vertical tangent lay in a straight line inclined to the horizontal in the ration of 1/3. [pp. 254-5]

Though the Table X referred to in the above includes only a single elliptical contour, it is clear from the text above that Galton had a series of contour lines comprising additional concentric ellipses, the innermost of which contained the ‘peak’ of the landscape of values, which would now be described as the joint mean of parental and child height. The method of smoothing described - which would now be described as von Neumann neighbourhood with a Manhattan distance of 1 - not only foreshadows Tobler’s First Law of Geography by nearly a century, but also makes apparent the extent to which Galton reasoned about the dataset, and the particular forms of patterns and regularities it exhibited, by thinking about it visually, as a three dimensional surface.

The major axis line which Galton produced, linking together the vertices of the concentric series of ellipses which he had imagined from this table of numbers, was the origin of the concepts of correlation and regression, which in later decades found more precise algebraic expression through Fisher. However, it was from the somewhat less formal, more intuitive, visual approach Galton describes that such ideas were first developed.

As-if-spatial reasoning about population data using Lexis surfaces

This paper argues that, though we should not follow Galton’s particular preoccupations with heredity, nor agree with the particular sociological implications he drew from his data, we should recognise from the above example the importance and potential of visual methods, and in particular as-if-spatial reasoning about data, for helping to develop and refine models of population processes. This paper argues: for the more widespread use of Lexis surfaces - arrangements of values as a function of relative time (age) and absolute time (year) - for the visual exploration of population data; for adopting a case-based approach, in which thousands of data points are thought about as comprising a single, complex data landscape, which use Lexis surface visualisations to develop and refine intuitions - informal models - about the population processes under investigation; for the formalisation of these informal models through the development of de novo statistical model structures and specifications; and for the assessment and refinement of statistical models by exploring Lexis surface visualisations of both model predicted surfaces, and surfaces of residuals. In short, this paper argues for the development of a more integrated workflow between modelling and Lexis surface data visualisation, comprising the main sequences illustrated in the figure below:

FIGURE ABOUT HERE

The proposed uses of Lexis surface visualisation within the workflow are illustrated using data on three types of cause-specific mortality in males living in the most deprived fifth of areas in Scotland.

TO DO:

STATEMENT ABOUT SOURCE OF DATA
STATEMENT ABOUT HOW AREAS WERE DEFINED
STATEMENT ABOUT CAUSES OF DEATH USED

This paper argues that careful visualisation and intution of a similar type of data - those with can be represented on a Lexis surface - can lead to new insights about population processes. As with Galton’s intuitions and Fisher’s formalisations, these insights can be, and should be, formalised through the development of appropriate statistical model specifications. The implications of the model can be, and should be, explored further through Lexis surfaces both of predictions and of deviations from the base data. Lexis surfaces of both predicted surfaces and deviations allow for a more nuanced understanding of the extent to which a model captures some of the often complex patterns present in the data, and are strongly complementary to standard measures of penalised model fit such as AIC. The model structures which may be arrived at as a result of first visualising the data on Lexis surfaces are often unlikely to be those in the standard model ‘toolkit’ used for modelling population processes, which often emphasise the identification of distinct age, period and cohort parameters, even though doing so is logically impossible as such models are under-identified without applying additional arbitrary assumptions. Instead, Lexis surface representations of population data tend to suggest that independent and linear contributions of age effect, period effect and cohort effect are much more often the exception rather than the norm. Instead, complex interactions, nonlinearities, and conditionality in the presence, absense and magnitude of risks are much more common.

This paper will ….

This document will illustrate how a careful visual exploration of three types of death - alcohol-related deaths, other drug-related deaths, and suicide - using Lexis surfaces and other graphs, can allow for the development of new structural statistical models. Lexis surfaces will be used again to explore model fit and complement standard measures of fit such as AIC.

The focus within this paper will be on cause specific death rates due to one of three causes - alcohol-related deaths, suicides, and illegal drug-related deaths - and will focus on a particular subgroup of the Scottish population: males, aged fifteen years and above, living in the most deprived fifth of areas in Scotland as defined by the Carstairs index.

This subpopulation is focused on for a number of reasons:

They are a relatively small population in absolute numbers, illustrating the ability of the approach to explore data even when n is fairly small
There are fairly strong inequalities in health within Scotland, with these ‘despair-related’ causes of deaths particuarly acute issues within the most deprived areas.
Males tend to have greater vulnerability or risk of these causes of death than females.

Data were provided by NHS Scotland - [PROVIDE DETAILS]

The principles illustrated here can be generalised to other population groups as well.

Initial data exploration

The figure below shows the total number of deaths amongst 15 to 90 year olds recorded as due to each of these three causes in the dataset, both in the total male population, and amongst the most deprived quintile. Of course if area deprivation did not affect the risk then on average the number of deaths through these causes in the most deprived fifth of areas would be around one fifth of the total deaths.

We can see from this that a disproportionate share of all cause-specific deaths in males occurred in those living in the most deprived fifth of areas, in particular for drug-related deaths. We can also see both some similarities and some differences in both the magnitudes and trends in deaths through these causes. For example, deaths from all three causes tended to increase over this period, but most sharply for deaths from alcohol. Deaths from illegal drugs appear a particularly acute problem in the most deprived areas. [Write one or two more summaries about this]

Lexis surfaces

The figure below shows death rates from each of these three causes in males in the most deprived areas, arranged as levelplots by age along the vertical axis and year across the horizontal. A qualitative colour scheme from the RColorBrewer package is used, "Paired", in which first light and then dark variants of qualitatively dissimilar colours are used to represent different values of mortality risk. This colour scheme makes it relatively easy to use the legend on the right to lookup values associated with different colours and shades. Because many different types of colour are used it is also expected to be relatively robust to issues of colour blindness, as most people who are colour blind are unlikely to be blind to most of the colours used.

Because each cell represents an age-year specific mortality risk, and the size of the numberators and denominators are relatively small with this level of chronological and temporal disaggregation, there can be fairly high variability between consecutive, or ‘neighbouring’, years and ages. This produces a ‘speckled’ or ‘noisy’ appearance to the level plots, which can be make detecting broader patterns within the Lexis surfaces more challenging; in particular, in the case of the "Paired" colour scheme used, when two neighbouring cells are of qualitatively different colours.

To make it easier to identify some of the broad patterns within the data, it can also be useful to visualise Lexis surfaces which have been smoothed by averaging or ‘blurring’ over neighbouring cells. In the figure below the blur function witin the spatstat package was used to blur over the logarithm of each mortality rates, using a smoothing parameter of 1.3, before being exponentiated again. Because smoothing involves averaging over neighbours, a downside of this approach is that values near the ‘edges’ of the Lexis surface - such as the first and last years, and youngest and oldest ages, tend to be biased downwards. To reduce the visual distortion associated with such edge effects the two youngest and two oldest ages, and two earliest and two most recent years, have been clipped from the display. A second downside is that the mortality values depend on the level of smoothing, so the comparisions using smoothed surface should be primarily impressionistic and qualitative.

Looking at both the unsmoothed and smoothed versions of the figures, a number of important patterns are clear.

Firstly, it appears that the patterns of alcohol-related mortality are qualititatively different to those of drug-related deaths and suicide; secondly, to appears that trends in drug-related deaths and suicide may be somewhat more similar to each other, in particular in the most deprived areas, than the time series of death counts may have suggested.

For alcohol-related deaths, it appears that untile the mid 1990s, there was little change in death rates over time, with a clear and consistent relationship with age. Risk of death through this cause tended to increase from around the age of 40 years, peaking in late working age, then falling a few years after retirement age. After the mid 1990s, alcohol-related mortality in males increased sharply, but these increases over time were not the same at all ages. Instead, alcohol-related mortality did not appear to increase much in those in their twenties, with the section of the surfaces between the ages of 20 and 30 remaining a pale blue in all years. Instead, the increases over time appeared mainly to affect those aged between around 40 and 70 years, peaking around the age of 60 years. As the time-series figure indicated, alcohol death risks also fell after peaking in the mid 2000s. Within the level plots, this produces the appearance of concentric rings, concentrated around the ages of 55-60 years, and the early to mid 2000s on the horizontal scale.

The Lexis surfaces for drug-related deaths are different to those for alcohol-related deaths, suggesting the aetiologies of the two causes are dissimilar. Whereas for alcohol-related deaths, the ages at highest risk of this cause tend to be around the ages of 40 to 70 years, for drug-related deaths the ages at highest risk appear between around age 17 and 40 years. Further, whereas there were historically relatively high and consistent death risks at the ages of highest risk for alcohol-related deaths, for drug-related deaths the annual risk of dying of this cause within the age range 20 and 40 years, and at all other ages, tended to be relatively low until around 1990, after which age-specific risks increased. Unlike the increase in deaths through alcohol - which rose during the 1990s, peaked around 2005, and fell thereafter - there does not appear to be a simple peak to the increasing pattern of risk. Instead, increases in age-specific drug-related death risk tend to occur in later years for older ages within the 20 to 50 year age range, producing a diagonal ‘crest like’ pattern in the bottom right quadrant of the DRD Lexis surfaces, producing a triangular pattern in the bottom right quarter of the Lexis surfaces. As the horizontal axis is period, and vertical axis age, diagonal patterns which ‘move’ from bottom left to top right can be indicative of cohort effects, as it implies cohort membership is a particular important predictor of risk.

We can explore the importance of cohort effects for drug-related deaths, and for suicides, by producing Lexis surfaces in which birth year, rather than year, is on the horizontal axis. Within these age-cohort Lexis surfaces, the rectangular canvas of the Lexis surface becomes rearranged as a trapezoid banking diagonally and to the left, and so the upwards-right diagonal cohort patterns which previously were diagonals banking upwards and to the right instead become vertical lines from bottom to top; period changes, which previously were vertical lines, instead become diagonal features banking upwards and to the left.

With the age-cohort rather than age-year Lexis surface, focused on ages up to 50 years, and only on these two causes of death, indications of cohort effects become clearer. In particular, if we look at the age range 30 to 45 years, and in particular at the smoothed drug-related deaths surface, we can see the colour of tiles change from blue to green when looking left to right, from cohorts born in the 1950s to cohorts born in the 1960s. In particular, it seems the sharpest increases in DRD risk by age, within this age range, occurred within cohorts born in the early 1960s.

However, the pattern of change in DRD risk is not entirely a cohort effect, and instead also appears to contain age-related and period-related factors. The age-related factor is evident by noting how the mortality risk changes between around the ages of 17 and 25 years, in particular for cohorts born after around 1970, with sharp increases with age over this age range. This suggests that, for affected cohorts, the risk is conditional on having reached early working age, and that exposure to labour market conditions may perhaps be a factor in the rise of DRD risk.

By looking at the blue diagonal strip running north west along the DRD and suicide surfaces in the age-cohort Lexis surfaces, and the equivalent blue vertical strips in the age-period surfaces, we can see that, as well as the DRD risk being conditional on cohort membership and age (the ‘ageing in’ effect), it is also conditional on period, with the rise in risk occurring almost entirely after around 1990. This pattern cannot therefore be described exclusively in either cohort terms, nor age terms, not period terms, but is attributable to a particular form of interaction between all three components.

Looking at the Lexis surfaces on the right column of the above figure, showing suicide risk by age and cohort membership, it is apparent that the conditional interaction between age, period and cohort identified above for drug-related death risk is also apparent for risk of suicide, though also that suicide risks for non-exposed cohorts and periods (i.e. for periods before 1990 and cohorts born before around 1955-60) also tended to be somewhat higher than for DRD risk, as evidenced by more cells that are dark blue or light green. We can therefore assume there may be a common factor in both forms of cause-specific mortality risk. Finally, it appears the cohort-component of elevated suicide risk may be slightly more focused on a particular band of cohorts, those born between around 1960 and 1980, than for DRD risk, where there is less evidence that cohorts born after around 1980 then experienced reduced risk over the life course.

Modelling

Having explored the Lexis surfaces of these three causes of death, we can see distinct differences in the mortality pattern, with alcohol-related mortality risk tending to be concentrated at older working ages, and to have exhibited a period change during the 1990s; and both suicides and illegal drug-related deaths tending to affect younger adult ages, to be somewhat similar to each other, and to exhibit more of a cohort-based pattern of change than alcohol related mortality.

This section will progress as follows: in the next part, the comparative value of assuming either age, period or cohort effects separately will be considered by constructing very simple models in which only one of these effects is included, and comparing them using AIC; two much more complex models will be fit to the data to assess whether period or cohort effects matter more for each of the three causes of death, and again their balance between model parsimony and fit assessed using two measures of penalised model fit, AIC, and BIC. Of these two measures BIC tends to place a higher penality on additional parameters, so will tend to select more parsimonious models.

In the second part, a bespoke and stylised model will be introduced which aims to represent the main features of the alcohol mortality surface, and compared against the above models. In the third part a different model structure will be described and fit to the DRD and suicide surfaces.

In the third part, the fit of selected models will be compared not just using AIC, but also by exploring Lexis surfaces of residuals between the modelled surfaces and the data surface.

For all model types, negative binomial models will be used using an offset of log exposure (population at risk).

Simple non-bespoke models

## Joining, by = "death_type"

## Joining, by = "model_name"

We can start to understand the implications of the above model comparisons by working from the least complex models at the bottom of the figure to the most complex models at the top of the figure, and within each of the model groupings compare between those models which emphasise age, period and cohort components.

Firstly, if we compare the bottom three models we can see linear estimates of the age, period and cohort effect for each of the datasets. From this we can see that, for the alcohol deaths data, the linear age model age_lin outperfroms period_lin and cohort_lin. By contrast, for drug related deaths the cohort_lin strongly outperforms period_lin and age_lin, which both have very similar levels of fit. Finally, for the suicide data cohort_lin slightly outperforms period_lin, and age_lin is the least effective model.

Both the triad of models with the _nlin and _fact prefixes explore nonlinear effects in either age, period or cohort effects. The _nlin models allow for both a linear and quadratic term, but not a higher order polynomial; by contrast the _fact models use separate dummy variable terms for each age, year or cohort in single years, allowing for more complex nonlinear shapes in the relationships, though it involves many more parameters. Because it involves far more parameters than many other model specifications it is within the fact family that the differences between AIC and BIC are greatest, with BIC scores tending to be much higher than AIC. If there were both nonlinearity, and this nonlinearity can largely be described through the first two polynomials, then the _fact and _nlin terms can be expected to show a similar order of relative preference amongst the age, period and cohort variants.

For the alcohol data, the age variant of the models remain preferable to period and cohort versions for both the _fact and _nlin model types, and the gap between the age, and period and cohort, specifications has grown. For suicide, the age-based models are similarly supierior to cohort and period models within both the _fact and _nlin groups, again suggesting there is a nonlinear relation between age and risk .By contrast, for drug-related deaths the cohort-based version of the _nlin and _fact models have the best fit within these classes. Overall, the models considered so far suggest that, if only either age, period or cohort effects could be modelled, then age effects are most important for predicting alcohol-related deaths, and to a lesser extent suicides; and cohort membership matter most in predicting drug-related deaths.

Modelled surfaces and residual surfaces for Alcohol-related mortality

In the figure below, prediction surfaces (Lexis surfaces of predicted values) are shown for six of the best performing models in the case of the alcohol-related mortality data. As discussed above, these models tend to include age terms in preference to cohort terms, with the best performing models tending to assume that the relationship between age and mortality risk tends to be nonlinear.

To complement the prediction surfaces, Lexis surfaces of residuals - differences between modelled and observed death rates for each age-year combination - are shown in figure X. Within these residual surfaces, blue colours indicate over-estimation of death rates, red colours under-estimation of death rates, with stronger colours (bluer blues or redder reds) indicating larger differences between predicted and observed values.

When looking at the prediction surfaces we should keep in mind the unsmoothed and smoothed data surfaces (figure XX and figure XX), and are interested in the extent to which the modelled surfaces are qualitatively similar or dissimilar to the data surfaces, in terms of whether the most salient patterns and features identified in the data surfaces are also present in the modelled surfaces. The residual surfaces help to identify systemic deviations between the data surfaces and modelled surfaces, and so for more nuanced identification of model misspecification than summary statistics alone - such as RMS error - would allow.

Looking first at the age_lin model in the top left, we see the model predicts a steadily increasing alcohol-related mortality risk with age. Looking at the corresponding residuals surface, we see this model is relatively effective at representing the data surface up to around the age of 65 years, and for the period up to around 1995; at higher ages, the residual surface becomes increasingly blue, showing that the model over-predicts mortality from this cause at these ages; and after around 1995, between the ages of around 40 and 65, an oval cloud of red cells is evident, suggesting that models which do not allow for changes in age-specific risk after this period under-estimate mortality risk at these ages in this latter period.

By looking now at the two models which allow for nonlinearity in the relationship between age and mortality risk, age_nlin and age_fact, we see that both of these models appear qualitatively similar, predicting that mortality risk peaks around the age of 60, before decreasing again. The similarity between these two models suggests that a quadratic representation of the age-risk relationship is sufficient, and that additional polynomial terms of more flexible nonlinear smoothing functions are not required. By looking at the residual surfaces over the 40 to 60 year age range we can see that both the age_nlin and age_fact models slightly over-estimate (blue cells) the age-specific mortality risk up to around 1995, and somewhat more strongyl under-estimate (red cells) the age-specific mortality risks at the same ages after around 1995. This again suggests that there was a relatively abrupt change in age-specific risk during the mid 1990s, and that in not allowing for this period-based variation the model is mis-specified.

The model aap allows for both a nonlinear quadratic variation in age-specific risk, as well as a linear effect of period, as well as interactions between \(age\), \(age^2\), and \(period\). The effect of this is to reduce the magnitude of the residuals between the ages of around 40 and 60 years, up to around 1995. However, a vertical patch of red cells is evident at these ages in the late 1990s and early 2000s, followed by a blue patch at these ages after around 2010. This suggests that the period-based change in age-specific mortality risks are not linear, and instead the change may have been more abrupt.

The two final models aapp and ccpp include both linear and quadratic terms for age or cohort effects, and period effects, as well as interaction terms. The number of parameters in these models mean they are highly flexible and fit the data quite effectively, even if structurally misspecified, not based on a substantive knowledge of the population processes, and likely to have poor out-of-sample fit. The main substantive question they might be used to address is whether age effects or cohort effects are more important for the population process, which can to a limited extent be addressed by comparing the AIC and BIC of the two model specifications, as was shown in figure XX. From this we saw a slightly preference for the cohort-based ccpp model over the age-based aapp model, hinting that cohort effects matter slightly more than period effects for this example. However, this indication of cohort effects may be misleading given the less flexible model structures all indicated the importance of age effects over cohort effects. Instead, we should compare the pattern of the aapp and the ccpp prediction surface against the data surface in figure XXX and XX. A key feature in the data surface is a maxima or ‘epicentre’ of risk, located around age 60 years and in the early 2000s, with risks decreasing locally over the Lexis surface around neighbouring values; it appears from qualitative comparison between the predicted aapp and ccpp surfaces, in comparison with the data surface, that the ccpp model specification is better able to reproduce this concentric, epicentral pattern in the Lexis surface, even though the position of the epicentre and shape of the concentric patterns are qualitatively dissimilar.

Modelled surfaces and residual surfaces for drug-related deaths

The figures below repeat the same exercise, of producing prediction surfaces and residuals surfaces for the six best performing models, for the drug-related deaths data. Whereas with the alcohol-related deaths data the best performing models within each category tended to preference age over cohort effects, the best peforming models within the DRD data tend to include cohort effects instead.

The simplest model, cohort_lin, represents the assumption of linearly increasing drug-related death risk with cohort, producing a series of diagonal bands, with lowest values in the top left of the Lexis surface to highest values in the bottom right of the Lexis surface. Qualitatively this is dissimilar to the ‘triangular’ shape observed in the DRD data surface (figure XXX and XXX), suggestive of a particular complex interaction between cohort effects and age effects. By looking at the corresponding residuals surface for this model, we see that is is primarily misspecified in two regions of the Lexis surface, with both a diagonal cloud of red cells emenating upwards and to the right for those cohorts aged slightly over 20 years after around 1990; and an increasing gradation of blue cells in the last observed years, and between around the ages of 17 and 25 years. This indicates that this model both under-estimates the additional drug-related mortality for cohorts who were in their twenties in the early 1990s, and over-estimates mortality risk for more recent cohorts at younger ages. Both forms of misspecification highlight the importance of appropriately representing the complex interactions between cohort and age effects in the model specification. The next simplest model, cohort_nlin, appears somewhat better at modelling risk at younger ages amongst the latter cohorts, but little better at modelling the elevated risk amongst the affected cohort. Similarly, the more flexible cohort_fact model appears slightly better at modelling the elevated cohort-attributable risk after around 1990, but also to over-estimate risk for the same and slightly earlier cohorts before around 1990, as indicated by a darker patche of blue cells between around the ages of 17 and 22 up to around 1990.

The three more complex models on the bottom row, ccp, aapp, and ccpp, are qualitatively similar to each other both in terms of the patterns they generate on the prediction surface, and the patterns of residuals evident on the residuals surface. Each of these models is more effective than the three simpler models in not over-estimating mortality risks at younger ages, but in the case of the ccp model, in particular, there is an indication of a slight over-estimation of risk amongst people in their fifties in the late 2000s and early 2010s. For all three models the diagonal red cloud, associated with those in their twenties after around 1990, is still evident in the residuals surfaces, though is slightly fainter in the more complex models. For each of these models the overall appearance of the modelled surfaces tend to be parabolic, which is qualitatively dissimilar to the more triangular appearance in elevated risk observed in the data surfaces.

Intuitive and informal modelling

In the previous sections, the unsmoothed and smoothed data surfaces for alcohol-related deaths and drug-related deaths were first shown and discussed; and then modelled surfaces based on a range of model specifications were presented along with residuals surfaces. Whereas AIC and BIC were useful for broad and coarse comparisons between many models - and to some extent understanding the relative importance of age, period and cohort effects in the population processes being modelled - Lexis surface visualisations allowed for a more nuanced, qualitative comparison between model specifications and data, and identification of specific ways that different models either do, or do not, adequately represent some of the key features identified through informal inspection of the Lexis surfaces of the data itself. Constructing and comparing between these models can help refine, and potentially reform, some of the intuitions developed through visual exploration of the data using the Lexis surfaces. However the model specifications themselves are somewhat generic, and not informed by the intuitions which have been developed while exploring the Lexis surfaces of the data itself. If the process of modelling were to stop here, therefore, there would be something of a disconnect in the process of combining Lexis surface visualisation and modelling steps, with the ‘informal model’ developed from looking at the data and identifying particular patterns within it not then being used to inform any of the formal statistical model specifications employed. In order to make best use of Lexis surface visualisations, therefore, the informal model needs to be formalised: the particular patterns and features within the Lexis surface of the data need to be represented algebraically and parameterised appropriately. By formalising these Lexis surface-informed intuitions, model specifications may be arrived at that otherwise may not have been considered, or even have existed; instead of having to select between a series of a priori model structures which do not incorporate the knowledge gained through data exploration, de novo structures can be developed. Unlike the more highly parameterised model specifications presented above, in particular aapp and ccpp, which are likely over-fit to the data and whose parameters are generally lacking substantive interpretability, de novo models may incorporate far fewer parameters, whose substantive interpretation is often much clearer.

In this section, a different de novo model specification is defined for drug-related deaths and alcohol-related deaths. As the earlier visualisations and explorations indicated, the Lexis surfaces of these two causes of death are qualitatively very dissimilar, and so using a single model structure for both surfaces would be conceptually inappropriate. As before, the predicted surfaces produced by the de novo models will be shown, alongside the surfaces of residuals. Unlike the earlier models, some of which involved many dozens of parameters including interaction terms which would be difficult to interpret, some of the model parameters will also be presented where they allow for a meaningful interpretation. To further illustrate the bespoke nature of the model structures developed, the model specification developed for the alcohol-related deaths data will be applied to the drug-related deaths data and vice versa, and the patterns of residuals on the corresponding Lexis surfaces of these misspecified models discussed. The model developed for characterising the DRD patterns will also be applied to the suicides dataset, because the Lexis surfaces of both datasets suggest a similarity in the interaction between age and cohort in predicting risk.

De novo model specifications

DRD-model results

The figures below show the alcohol model described above as applied to each of the three causes of death. Figure XXX shows the modelled surface for each of these three causes, and figure XXX shows the surfaces of residuals; to allow comparability between causes of death these residuals have been scaled by cause by dividing each residual value by the maximum absolute residual value observed for that particular cause. Because of this rescaling the legends have been omitted.

## iter   10 value 7129.969895
## iter   20 value 7125.632969
## iter   30 value 7123.554796
## iter   40 value 7121.964676
## final  value 7121.933246 
## converged
## iter   10 value 9219.219592
## iter   20 value 9216.209978
## iter   30 value 9211.253992
## iter   40 value 9210.300091
## final  value 9210.271859 
## converged
## iter   10 value 9861.119140
## iter   20 value 9332.349146
## iter   30 value 8751.056653
## iter   40 value 8749.697193
## final  value 8749.697180 
## converged

aics <- rep(NA, 3)
pars <- vector("list", 3)
for (i in 1:3){
  aics[i] <- tbl_mod[["opt_output"]][[i]][["value"]]
  pars[[i]] <- tbl_mod[["opt_output"]][[i]][["par"]]
}

# opt_mix_fun <- function(pars, DATA, output = "AIC")
dta_nested %>% 
  mutate(model = map2(pars, data, opt_mix_fun, output = "model")) %>% 
  mutate(data_augmented = map2(model, data, broom::augment)) %>% 
  select(death_type, data = data_augmented) %>% 
  unnest() %>% 
  mutate(log_risk_modelled = .fitted) %>% 
  mutate(risk_actual = death_count / population_count) %>% 
  mutate(risk_modelled = exp(log_risk_modelled) / population_count) %>% 
  select(death_type, age, year, death_count, population_count, risk_actual, risk_modelled) %>% 
  mutate(residual = risk_actual - risk_modelled) -> mod_mix_predictions

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

mod_mix_predictions %>% 
  ggplot(., aes(x = year, y = age, fill = risk_modelled)) + 
  geom_tile() +
  coord_fixed() +
  facet_wrap(~ death_type  ) +
  scale_fill_gradientn(
    "Modelled\nDeath Rate", 
    colours = brewer_pal(palette = "Paired")(12)
    ) +
  theme(axis.text.x = element_text(angle = 90)) + 
  labs(x = "Year", y = "Age in years", title = "Prediction surface")

Looking at the figure of modelled surfaces it is evident that, in the case of the alcohol data, the model has been able to identify a value of \(\tau\), i.e. the year from which the additional bivariate normal term applies, at around 1995, which is close to that which might be expected given the corresponding data surface. For the first Normal distribution \(N_1(x)\) it has estimated a mean value of …., and a standard deviation of …, suggesting that before 1995 95% of alcohol-related deaths occurred between the ages of XXXX and YYYY. For the second Normal distribution \(N^{*}_2(x, t, \tau)\) the centre of the distribution is estimated to be at x = XXX and t = XXX, with a standard deviation along the age axis of XXX and along the year axis of XXX. Overall, the pattern of predicted alcohol-related deaths over the Lexis surface is similar to that shown in the data itself.

When looking at the modelled surfaces using this specification, when applied to the DRD and suicides data the pattern is very different. For both of these other causes of death it appears the terms associated with \(N^{*}_2(x, t, \tau)\) have not converged on meaningful values, and instead the models produced are little different to if only the first term \(N_1(x)\) were used. The means and standard deviations are XXX and yyy for DRD, and xxx and yyy for suicides.

The impression that the alcohol model fits the alcohol data relatively well, and the other datasets relatively poorly, is supported by looking at the AIC and BIC values, and comparing them against the best performing ‘generic’ model presented previously, ccpp. For the alcohol data the AIC of the alcohol model is 8750, compared with XXXX for the ccpp model; for BIC, which penalises additional parameters more heavily, the improvement over the ccpp model is even greater, with values of 8773 for the alcohol model and XXXX for the ccpp model. For DRD and suicide, by contrast, the Alcohol model does not outperform the ccpp models: For DRD the Alcohol model has an AIC of 7122, and BIC of 7145, compared with XXXX and YYYY respectively for the ccpp model. For suicide, the Alcohol Model has an AIC of 9210 and BIC of 9234, compared with XXX and YYYY respectively for the ccpp model.

# Residual surfaces
mod_mix_predictions %>% 
  group_by(death_type) %>% 
  mutate(abs_max_res = abs(max(residual))) %>% 
  mutate(grp_adj_res = residual / abs_max_res) -> tmp

sm_tmp <- smooth_var(dta = tmp, 
                     group_vars = c("death_type"), 
                     smooth_var = "grp_adj_res", 
                     smooth_par = 1.3) %>% 
  select(death_type, year, age, sm_res = grp_adj_res) %>% 
  group_by(death_type) %>% 
  filter(age > min(age) + 2, age < max(age) - 2) %>% 
  filter(year > min(year) + 2, year < max(year) - 2)

tmp2 <- tmp %>% left_join(sm_tmp) %>% 
  select(death_type, age, year, unsmoothed = grp_adj_res, smoothed = sm_res) %>% 
  gather(key = "smoothing", value = "residual", unsmoothed:smoothed) %>% 
  mutate(residual = ifelse(is.na(residual), 0, residual))

## Joining, by = c("death_type", "age", "year")

tmp2 %>% 
  ggplot(., aes(x = year, y = age, fill = residual)) + 
  geom_tile(na.rm = T) +
  coord_fixed() +
  facet_grid(smoothing ~ death_type) +
  scale_fill_gradientn(
    "Residual",
    limits = c(-1, 1),
    colours = brewer_pal(palette = "RdBu")(5),
    guide = FALSE
    ) +
  theme(axis.text.x = element_text(angle = 90)) + 
  labs(x = "Year", y = "Age in years")

Looking at the residuals surfaces the evidence of misspecification when using the Alcohol model is strongest in the case of the drug-related deaths data, in which the model tends to systematically over-estimate risks for the older, less affected cohorts, and systematically under-estimate risks for those in latter cohorts. A qualitatively similar, but much slighter, pattern is also evident for the suicide data. Exploring the residual surface for alcohol also indicates some systematic misspecification, a slight under-estimation of risk in men aged around 60 and above up to around 1995, followed by over-estimation in this age group after this period. A faint red ‘cloud’ is also evident between around the ages of 50 and 60 years after around 2000, suggesting the model in its current form slightly over-estimates the risk over this period and age group. Both allowing for a more skewed rather than symmetric distribution for \(N_1(x)\), and allowing for covariance in the bivariate normal distribution - i.e. for the off-diagonal terms in \(\mathbf{\Sigma}\) to be non-zero - could both therefore improve the fit of the model still further. However it already outperforms other models for this data.

DRD-model results

## iter   10 value 6035.037697
## iter   20 value 6033.001465
## final  value 6033.001465 
## converged
## iter   10 value 9066.529124
## iter   20 value 9065.247933
## final  value 9065.247865 
## converged
## final  value 11844.396690 
## converged

aics <- rep(NA, 3)
pars <- vector("list", 3)
for (i in 1:3){
  aics[i] <- tbl_mod_tri[["opt_output"]][[i]][["value"]]
  pars[[i]] <- tbl_mod_tri[["opt_output"]][[i]][["par"]]
}

# opt_mix_fun <- function(pars, DATA, output = "AIC")
dta_nested %>% 
  mutate(model = map2(pars, data, opt_tri_fun, output = "model")) %>% 
  mutate(data_augmented = map2(model, data, broom::augment)) %>% 
  select(death_type, data = data_augmented) %>% 
  unnest() %>% 
  mutate(log_risk_modelled = .fitted) %>% 
  mutate(risk_actual = death_count / population_count) %>% 
  mutate(risk_modelled = exp(log_risk_modelled) / population_count) %>% 
  select(death_type, age, year, death_count, population_count, risk_actual, risk_modelled) %>% 
  mutate(residual = risk_actual - risk_modelled) -> mod_tri_predictions

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

## Warning: Deprecated: please use `purrr::possibly()` instead

mod_tri_predictions %>% 
  ggplot(., aes(x = year, y = age, fill = risk_modelled)) + 
  geom_tile() +
  coord_fixed() +
  facet_wrap(~ death_type  ) +
  scale_fill_gradientn(
    "Modelled\nDeath Rate", 
    colours = brewer_pal(palette = "Paired")(12)
    ) +
  theme(axis.text.x = element_text(angle = 90)) + 
  labs(x = "Year", y = "Age in years", title = "Prediction surface")

The AIC for this model when applied to the DRD data is 6033, which is lower than the best performing generic ccpp model with an AIC of 6180. For suicide the DRD model’s AIC is 9065, which is close to but slightly worse than the ccpp model’s AIC of 9061. The AIC when applied to the alcohol data is by far the worst performing, with an AIC of 11844, as compared with 8869 for the ccpp model.

# Residual surfaces
mod_tri_predictions %>% 
  group_by(death_type) %>% 
  mutate(abs_max_res = abs(max(residual))) %>% 
  mutate(grp_adj_res = residual / abs_max_res) -> tmp

sm_tmp <- smooth_var(dta = tmp, 
                     group_vars = c("death_type"), 
                     smooth_var = "grp_adj_res", 
                     smooth_par = 1.3) %>% 
  select(death_type, year, age, sm_res = grp_adj_res) %>% 
  group_by(death_type) %>% 
  filter(age > min(age) + 2, age < max(age) - 2) %>% 
  filter(year > min(year) + 2, year < max(year) - 2)

tmp2 <- tmp %>% left_join(sm_tmp) %>% 
  select(death_type, age, year, unsmoothed = grp_adj_res, smoothed = sm_res) %>% 
  gather(key = "smoothing", value = "residual", unsmoothed:smoothed) %>% 
  mutate(residual = ifelse(is.na(residual), 0, residual))

## Joining, by = c("death_type", "age", "year")

tmp2 %>% 
  ggplot(., aes(x = year, y = age, fill = residual)) + 
  geom_tile(na.rm = T) +
  coord_fixed() +
  facet_grid(smoothing ~ death_type) +
  scale_fill_gradientn(
    "Residual",
    limits = c(-1, 1),
    colours = brewer_pal(palette = "RdBu")(5),
    guide = FALSE
    ) +
  theme(axis.text.x = element_text(angle = 90)) + 
  labs(x = "Year", y = "Age in years")

Patterns of DRD and Suicide

Jon Minton

11 July 2017

Introduction

Galton’s Ellipse

As-if-spatial reasoning about population data using Lexis surfaces

Initial data exploration

Lexis surfaces

Modelling

Simple non-bespoke models

Intuitive and informal modelling

De novo model specifications

DRD-model results

DRD-model results

Patterns of DRD and Suicide

Jon Minton

11 July 2017

Introduction

Galton’s Ellipse

As-if-spatial reasoning about population data using Lexis surfaces

Initial data exploration

Lexis surfaces

Modelling

Simple non-bespoke models

Modelled surfaces and residual surfaces for Alcohol-related mortality

Modelled surfaces and residual surfaces for drug-related deaths

Intuitive and informal modelling

De novo model specifications

Alcohol-related deaths model

Drug-related deaths model

DRD-model results

DRD-model results