1 Import library
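
The loading step itself is not shown here. Judging from the functions called in the sections below (`fread` and `rbindlist`, the `%>%` pipeline with `select`, `mutate`, `filter` and `glimpse`, `ggplot`, and `ggplotly`), a setup along the following lines is presumably needed. The package list is an assumption inferred from those calls, not taken from the source.

```r
# Assumed setup: packages inferred from the functions used later in this document
library(data.table)  # fread(), rbindlist()
library(dplyr)       # %>%, select(), mutate(), filter(), summarize(), glimpse()
library(ggplot2)     # ggplot(), aes(), geom_line()
library(plotly)      # ggplotly()
```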

2 Data processing

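# Directory holding one life-table file per country (file names end in ".mltper_1x1.txt")
data_dir <- "C:\\Users\\Datarist\\OneDrive - VietNam National University - HCM INTERNATIONAL UNIVERSITY\\Study-IU\\FMath1\\male\\mltper_1x1"

# List the files in the target directory, without and with their full paths
file_list <- list.files(path = data_dir)
file_list_full <- list.files(path = data_dir, full.names = TRUE)

# Country code = file name without the ".mltper_1x1.txt" suffix (matched literally, not as a regex)
code <- gsub(".mltper_1x1.txt", "", file_list, fixed = TRUE)

# Read every file and label each table with its country code
l <- lapply(file_list_full, fread)
names(l) <- code

# Stack all tables; the list names become the "Country" column
f <- rbindlist(l, use.names = TRUE, fill = TRUE, idcol = "Country")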
#country_code=readxl::read_excel("country_code.xlsx")
#country_code
final=f 
#%>% left_join(country_code,by=c("file"="code"))

3 Data description

data=final %>% select(Country,Year, Age, qx,lx,dx) %>% mutate(px=1-as.numeric(qx)) %>% filter(Year==2002,!is.na(Country))
  • Year: Year of observation

  • Age: Age in single years (0 to 110+)

  • q(x): Probability of death between ages x and x+1

  • p(x): Probability of surviving from age x to age x+1, p(x) = 1 - q(x)

  • l(x): Number of survivors at exact age x, assuming l(0) = 100,000

  • d(x): Number of deaths between ages x and x+1
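
The columns listed above are tied together by the usual life-table identities q(x) = d(x) / l(x), p(x) = 1 - q(x) and l(x+1) = l(x) - d(x). As a rough check, the sketch below applies them to the first few AUS values printed in the `glimpse()` output further down. The variable names are illustrative only, and small differences of one unit are expected because the published columns are rounded.

```r
# First ages of the AUS 2002 male table, copied from the glimpse() output
lx <- c(100000, 99471, 99424, 99391)  # survivors at exact ages 0..3
dx <- c(529, 47, 34)                  # deaths between ages x and x+1, x = 0..2
qx <- c(0.00529, 0.00047, 0.00034)    # published death probabilities, x = 0..2

# q(x) = d(x) / l(x)
qx_from_counts <- dx / lx[1:3]

# p(x) = 1 - q(x)
px <- 1 - qx

# l(x+1) = l(x) - d(x)
lx_next <- lx[1:3] - dx

print(round(qx_from_counts, 5))  # compare with the published qx
print(lx_next)                   # compare with lx[2:4]
```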

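# Recode the open-ended age group "110+" as "110" so that Age can be converted to a number
data[Age=="110+",]$Age="110"

# Convert every column except the first one (Country) to numeric
data =data %>%  mutate_at(names(data)[-1], as.numeric)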
glimpse(data)
## Rows: 5,439
## Columns: 7
## $ Country <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",…
## $ Year    <dbl> 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 20…
## $ Age     <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ qx      <dbl> 0.00529, 0.00047, 0.00034, 0.00031, 0.00015, 0.00014, 0.00011,…
## $ lx      <dbl> 100000, 99471, 99424, 99391, 99360, 99345, 99332, 99320, 99303…
## $ dx      <dbl> 529, 47, 34, 30, 15, 14, 11, 17, 10, 16, 19, 7, 11, 19, 25, 26…
## $ px      <dbl> 0.99471, 0.99953, 0.99966, 0.99969, 0.99985, 0.99986, 0.99989,…

4 One-year mortality rate

4.1 Individual probability

data %>% filter(Country=="USA",Age==40) %>% select(qx)
##         qx
## 1: 0.00268

4.2 One-year log mortality rate over age

# One line per country (group=Country); group=1 would join all countries into a single line
t=data %>% ggplot(aes(x=Age, y=log(qx),col=Country,group=Country))+geom_line()
ggplotly(t, height = 500, width = 800)

5 Survival probabilities

5.1 Individual probabilities

Given a baby born in 2002, the probability that the child survives to a given age is the product of the one-year survival probabilities p(x) over every age from birth up to, but not including, that age.

For example, compute the probability that a male baby born in the USA in 2002 survives his first 5 years.
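
As a small worked illustration of this product, the sketch below uses the first five AUS values of p(x) and the matching l(x) values printed in the `glimpse()` output above (AUS is used only because those are the numbers visible in this document). The variable names are illustrative. The product of p(0) to p(4) should agree with l(5) / l(0) up to rounding.

```r
# One-year survival probabilities p(0)..p(4) for AUS, 2002 (from the glimpse() output)
px <- c(0.99471, 0.99953, 0.99966, 0.99969, 0.99985)

# Survivors at exact ages 0 and 5 from the same table
l0 <- 100000
l5 <- 99345

# Probability of surviving the first five years: p(0) * p(1) * ... * p(4)
sp_5 <- prod(px)

# The same quantity read directly from the l(x) column: l(5) / l(0)
sp_5_from_lx <- l5 / l0

print(round(sp_5, 5))          # 0.99345
print(round(sp_5_from_lx, 5))  # 0.99345
```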

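# Ages 0 to 4 cover the first five years of life; the country code is "USA", as in section 4.1.
# A filter that matches no rows (for example Country=="U.S.A.") silently returns sp = 1,
# because prod() of an empty vector is 1.
data %>% filter(Country=="USA",Age %in% c(0:4)) %>% summarize(sp=prod(px))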

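Compute the probability that a male born in the USA in 2002 who has reached age 20 survives the next 50 years.

# Ages 20 to 69 cover the 50 years after the 20th birthday; the country code must be "USA",
# since a code that matches no rows makes prod() return 1.
data %>% filter(Country=="USA",Age %in% c(20:69)) %>% summarize(sp=prod(px))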

5.2 Survival probabilities over age

# cpx at age x is the probability of surviving from birth to age x+1; one line per country
t=data %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=Country))+geom_line()
ggplotly(t, height = 500, width = 800)

Given a person born in 2002 and still alive in 2022, the probability that this person survives to a given later age is the cumulative product of the one-year survival probabilities p(x) from age 20 up to, but not including, that age.
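
The conditional version can be sketched the same way: restart the cumulative product at the conditioning age and compare it with l(x+1) / l(start age). Because the printed output in this document only shows the first few ages, the example below conditions on age 2 instead of age 20, using the AUS values from the `glimpse()` output. The variable names and the choice of age 2 are assumptions made for illustration.

```r
# AUS 2002 values from the glimpse() output: p(x) for ages 0..4, l(x) for ages 0..5
age <- 0:4
px  <- c(0.99471, 0.99953, 0.99966, 0.99969, 0.99985)
lx  <- c(100000, 99471, 99424, 99391, 99360, 99345)

# Conditioning age; stands in for age 20, which is not visible in the printed output
start_age <- 2

# Cumulative product of p(x) from start_age onward:
# element k is the probability of surviving from start_age to age start_age + k
keep <- age >= start_age
cond_surv <- cumprod(px[keep])

# Same quantity from the l(x) column: l(x + 1) / l(start_age)
# (lx is indexed from 1, so age x sits at position x + 1)
cond_surv_lx <- lx[age[keep] + 2] / lx[start_age + 1]

print(round(cond_surv, 5))
print(round(cond_surv_lx, 5))
```

Note that the selection keeps the conditioning age itself (`age >= start_age`), since p(start_age) is the first factor of the product.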

# Keep age 20 itself (Age>=20) so that the product starts at p(20); one line per country
t=data %>% filter(Age>=20) %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=Country))+geom_line()
ggplotly(t, height = 500, width = 800)