(v1.21-2019-01-18 10:50:26 UTC)
Population demographics are a common area of study across ecological research, essential for understanding populations and their change over time. While useful for human populations, these characteristics are of particular importance when examining either species of conservation concern, or the management of invasive species. While life tables may be relatively easily constructed in excel, other aspects of the demography package are a bit more specialized. In addition, the ability to query data and easily manipulate life table data in r adds some additional functionality to user analysis. In order to fully explore this package, and to avoid field specific jargon, human data will be utilized in this intro. These methods may be expanded upon to explore other populations of interest.
Package description: “The R package demography provides functions for demographic analysis including: life table calculations; Lee-Carter modelling; functional data analysis of mortality rates, fertility rates, net migration numbers; and stochastic population forecasting.”
Life Table Calculations: A life table describes the population demographics in terms survivorship, essentially the number of individuals which are likely to progress to the next age or life stage.
Lee-Carter Modelling: An approach to forecasting mortality, not frequently seen in statistical packages.
Functional Data Analysis Allows for analysis of population stats such as: mortality rates, fertility rates, net migration numbers and stochastic population forecasting
In order to properly use the following functions, managing our data will save a lot of time and effort.
In order to import our dataset into r, we will want to utilize the read.demogdata() function found within the demography package
>read.demogdata(file, popfile, type, label, max.mx = 10, skip = 2, popskip = skip, lambda, scale=1)
For this tutorial, data will be obtained from the human mortality database. “https://www.mortality.org/” Accounts are free under the new user registration section. Once registered, data can be accessed for a variety of countries over varying time periods. For the purposes of this guide we will download U.S.A. data. Two datasets of interest are the death rates file, and the Exposure-to-risk file from the period data section. If you have a different dataset, formatting your information after this data will ensure it interacts with the package smoothly.
Once we have our data, we can convert them into a demogdata object which will be utilized by many functions in the package.
>dat=read.demogdata(file="USAdeathrates",popfile="USApopfile", type="mortality",label="USA",skip=0,scale=1 )
We can visualize our data with the plot command as displayed below
plot(dat,series=ifelse(!is.null(dat$rate),names(dat$rate)[3],names(dat$pop)[3]), datatype=ifelse(!is.null(dat$rate),"rate"))
Figure One: The combined Male and Female mortality rates for the United States between 1933 and 2016.Each line represents the cohort from a given year, with recent years are plotted in violet, the older values are in red. Death rates are commonly plotted on the log scale in order to better visualize small changes. A review of how logs work, the smaller the death rate, the more negative a value we see on the plot. So overtime we see a decrease in mortality rates for every age group besides the elderly.
Here we can see a plot depicting for our combined male and female mortality rates. In the case of multiple years being plotted, the default plot command utilizes a rainbow pallet to differentiate data years. The earliest data years are red, and a standard rainbow transitionary scheme is utilized until the most recent violet values.In our plot we can see death rates follow a similar trend with age over time, but decrease overall as we progress to modern times. This pattern does diverge at the later age groups where we see the elderly living longer in modern times.
If we change our plot type to time, we can change how we visualize the death rates for each particular age. Now each age is plotted as its own individual time series.
plot(dat,series=ifelse(!is.null(dat$rate),names(dat$rate)[3],names(dat$pop)[3]), datatype=ifelse(!is.null(dat$rate),"rate"),plot.type=c("time"),xlab="Year")
Figure Two: Combined male and female mortality rates with each line representing a single age group across time. Younger age groups plotted in violet.
Once you have an account with the HMD you can utilize a command in r to access data automatically. This command will save the data as a demogdata object which can be utilized just as dat above
hmd.mx(country,username,password, label=country)
combine.demogdata can combine two demogdata files into one continuous file. This is meant to refer to datasets which describe the same age structure but differing years. For example combining a forecasted future with the current dataset.
The lifetable command in the demography package is as follows
lifetable(data, series = names(data$rate)[1], years = data$year,
ages = data$age, max.age = min(100, max(data$age)),
type = c("period", "cohort"))
ex= Expectation of life at age x, very similar to Tx for this dataset If we want to examine the female death rate in 2016:
>lifetable=lifetable(dat,years=c(2016),age=dat$ages,type=c("period")) Period lifetable for U.S.A. female death rate in 2016
Year: 2016
mx qx lx dx Lx Tx ex
0 0.0053 0.0053 1.0000 0.0053 0.9951 81.3843 81.3843
1 0.0004 0.0004 0.9947 0.0004 0.9945 80.3892 80.8169
2 0.0002 0.0002 0.9944 0.0002 0.9942 79.3947 79.8455
3 0.0002 0.0002 0.9941 0.0002 0.9940 78.4005 78.8648
4 0.0002 0.0002 0.9939 0.0002 0.9939 77.4064 77.8778
5 0.0001 0.0001 0.9938 0.0001 0.9937 76.4126 76.8896
6 0.0001 0.0001 0.9937 0.0001 0.9936 75.4188 75.8995
7 0.0001 0.0001 0.9936 0.0001 0.9935 74.4252 74.9075
8 0.0001 0.0001 0.9934 0.0001 0.9934 73.4317 73.9165
9 0.0001 0.0001 0.9933 0.0001 0.9933 72.4383 72.9235
10 0.0001 0.0001 0.9932 0.0001 0.9932 71.4450 71.9312
11 0.0001 0.0001 0.9931 0.0001 0.9931 70.4518 70.9378
12 0.0001 0.0001 0.9930 0.0001 0.9930 69.4587 69.9457
13 0.0001 0.0001 0.9929 0.0001 0.9929 68.4657 68.9533
14 0.0002 0.0002 0.9928 0.0002 0.9927 67.4729 67.9621
15 0.0002 0.0002 0.9926 0.0002 0.9925 66.4802 66.9739
16 0.0002 0.0002 0.9924 0.0002 0.9923 65.4876 65.9868
17 0.0003 0.0003 0.9922 0.0003 0.9920 64.4953 65.0028
18 0.0004 0.0004 0.9919 0.0004 0.9917 63.5033 64.0220
19 0.0004 0.0004 0.9915 0.0004 0.9913 62.5116 63.0459
20 0.0004 0.0004 0.9911 0.0004 0.9909 61.5202 62.0701
21 0.0005 0.0005 0.9907 0.0005 0.9905 60.5293 61.0966
22 0.0005 0.0005 0.9902 0.0005 0.9900 59.5388 60.1259
23 0.0005 0.0005 0.9897 0.0005 0.9895 58.5488 59.1564
24 0.0005 0.0005 0.9892 0.0005 0.9889 57.5594 58.1880
25 0.0006 0.0006 0.9887 0.0006 0.9884 56.5704 57.2190
26 0.0006 0.0006 0.9881 0.0006 0.9878 55.5821 56.2528
27 0.0007 0.0007 0.9874 0.0007 0.9871 54.5943 55.2883
28 0.0007 0.0007 0.9868 0.0007 0.9864 53.6072 54.3256
29 0.0007 0.0007 0.9861 0.0007 0.9857 52.6208 53.3640
30 0.0008 0.0008 0.9854 0.0008 0.9850 51.6351 52.4027
**Table One:** Lifetable of the 2016 female cohort
Additional commands relating to lifetables plot.lifetable allows for data visualization. Lets visualize the total death rates for the USA population in 2016.
lifetable=lifetable(dat,series=names(dat$rate)[3],years=c(2016),ages=dat$age,type=c("period"))
plot(lifetable, lifetable, xlab="Age", ylab="Expected number of years left")
Figure Three: Combined male and female life expectancy for individuals in 2016
the life.expectancy function displays an estimate from life expectancy for any particular group in the given year.
lefemale=life.expectancy(dat,series=names(dat$rate)[1],years=c(2016),type=c("period"))
lemale=life.expectancy(dat,series=names(dat$rate)[2],years=c(2016),type=c("period"))
letotal=life.expectancy(dat,series=names(dat$rate)[3],years=c(2016),type=c("period"))
lefemale
## Time Series:
## Start = 2016
## End = 2016
## Frequency = 1
## [1] 81.38429
lemale
## Time Series:
## Start = 2016
## End = 2016
## Frequency = 1
## [1] 76.38104
letotal
## Time Series:
## Start = 2016
## End = 2016
## Frequency = 1
## [1] 78.89771
The sexratio function can be used to depict mortality rates at differing age classes
sexratio=sex.ratio(dat)
plot(sexratio, ylab="Sex ratios (M/F)")
Figure Four: Sex ratios of our mortality data for differing age groups, with each line representing a cohort in time
Lee Carter Models allow us to extend our models to the future and predict mortality or life expectancy. While I won’t go too in depth here, it is of note in this package as not many statistical programs have a built in means to perform this task. In order to forecast ahead, our data must first be converted into a Lee-Carter model.
Lets examine the projected total death rates and then project this into the future The lca command requires a demogdata object as its primary input. With many of the other components mirroring what we have seen above. The forecast.lca command utilizes the lca output, and allows you to forecast ahead by the number of years indicated in h. The se and jumpchoice arguments refer to methods of computation which can be examined in more depth on the demography package page. The confidence level for prediction intervals is given by the level argument.
lca2=lca(dat,series=names(dat$rate)[3],years=dat$year,ages=dat$age,max.age=100)
forecast.lca(lca2, h=50, se=c("innovdrift"), jumpchoice=c("fit"), level=90)
If we plot our forecasted data we see a similar plotting mechanism as to our initial data plot.
Figure Five: Projected combined male and female death rates for the years 2017 to 2066
summary(lca2)
## Lee-Carter analysis
##
## Call: lca(data = dat, series = names(dat$rate)[3], years = dat$year,
##
## Call: ages = dat$age, max.age = 100)
##
## Adjustment method: dt
## Region: USA
## Years in fit: 1933 - 2016
## Ages in fit: 0 - 100
##
## Percentage variation explained: 96.2%
##
## ERROR MEASURES BASED ON MORTALITY RATES
##
## Averages across ages:
## ME MSE MPE MAPE
## -0.00001 0.00005 0.00755 0.06977
##
## Averages across years:
## IE ISE IPE IAPE
## 0.00005 0.00386 0.75703 6.93471
##
##
## ERROR MEASURES BASED ON LOG MORTALITY RATES
##
## Averages across ages:
## ME MSE MPE MAPE
## 0.00281 0.00949 -0.00013 0.01660
##
## Averages across years:
## IE ISE IPE IAPE
## 0.28073 0.94279 -0.01769 1.60972