1. Data Pre-Processing

Data Pre-processing

file_list <- list.files(path="C:\\Users\\MyNgo\\Downloads\\lt_male-20230225T043725Z-001\\lt_male\\mltper_1x1")

code=gsub(".mltper_1x1.txt","",file_list)

file_list_full <- list.files(path="C:\\Users\\MyNgo\\Downloads\\lt_male-20230225T043725Z-001\\lt_male\\mltper_1x1",full.names = TRUE)

l=lapply(file_list_full, fread)

names(l)=code

f=rbindlist(l,use.names = TRUE,fill=TRUE,idcol="Country")

As I don’t know the country_code.xlsx so that I do with the dataframe f

data=f %>% select(Country,Year, Age, qx,lx,dx) %>% mutate(px=1-as.numeric(qx)) %>% filter(Year==2002,!is.na(Country))

data[Age=="110+",]$Age="110"

data =data %>%  mutate_at(names(data)[-1], as.numeric)
glimpse(data)
## Rows: 5,439
## Columns: 7
## $ Country <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",…
## $ Year    <dbl> 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 20…
## $ Age     <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ qx      <dbl> 0.00529, 0.00047, 0.00034, 0.00031, 0.00015, 0.00014, 0.00011,…
## $ lx      <dbl> 100000, 99471, 99424, 99391, 99360, 99345, 99332, 99320, 99303…
## $ dx      <dbl> 529, 47, 34, 30, 15, 14, 11, 17, 10, 16, 19, 7, 11, 19, 25, 26…
## $ px      <dbl> 0.99471, 0.99953, 0.99966, 0.99969, 0.99985, 0.99986, 0.99989,…

2. Data Description

  • Country: name of the country (in this dataframe, we have 50 countries)
##  [1] "AUS"     "AUT"     "BEL"     "BGR"     "BLR"     "CAN"     "CHE"    
##  [8] "CHL"     "CZE"     "DEUTE"   "DEUTNP"  "DEUTW"   "DNK"     "ESP"    
## [15] "EST"     "FIN"     "FRACNP"  "FRATNP"  "GBR_NIR" "GBR_NP"  "GBR_SCO"
## [22] "GBRCENW" "GBRTENW" "GRC"     "HKG"     "HRV"     "HUN"     "IRL"    
## [29] "ISL"     "ISR"     "ITA"     "JPN"     "KOR"     "LTU"     "LUX"    
## [36] "LVA"     "NLD"     "NOR"     "NZL_MA"  "NZL_NM"  "NZL_NP"  "POL"    
## [43] "PRT"     "RUS"     "SVK"     "SVN"     "SWE"     "TWN"     "UKR"    
## [50] "USA"
  • Year: Year of observing data

  • Age: Age group

  • q(x): Probability of death between ages x and x+1

  • p(x): Probability of alive between ages x and x+1

  • l(x): Number of survivors at exact age x, assuming l(0) = 100,000

  • d(x): Number of deaths between ages x and x+1

3.One Year Mortality rate

Individual Probability

Given a male born in USA in 2002, what is the estimate probability of dying in year of age 40?

data %>% filter(Country=="USA",Age==40) %>% select(qx)
##         qx
## 1: 0.00268

One-year log mortality rate over ages

t=data %>% ggplot(aes(x=Age, y=log(qx),col=Country,group=1))+geom_line() + labs(x = "Age", y = "Log mortality Rate", title = "One-year log mortality Rates, over ages") 
ggplotly(t)

4. Survival probabilities

Individual probabilities Given a baby born in 2002, the probability he/her can live over a certain age is the cumulative product of probability of alive from the current age to that age.

Compute the survival probability over the first 5 years of a male baby born in 2002 in USA.

data %>% filter(Country=="USA",Age %in% c(0:5)) %>% summarize(sp=prod(px))
##          sp
## 1 0.9907133

Compute the survival probability over the next 50 years of a 20-year male born in 2002 in USA.

data %>% filter(Country=="USA",Age %in% c(20:70)) %>% summarize(sp=prod(px))
##          sp
## 1 0.6919628

Survival probabilities over ages

Estimate survival probabilities of a new born in 2002 over ages

t=data %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=1))+geom_line() + labs(x = "Age", y = "Cummulative probabilities", title = "Probabilities of a new born in 2002 still survise, over ages") 
ggplotly(t)

Given a person born in 2002 and still alive in 2022, the probability he/her can live over a certain age is the cumulative product of probability of alive from age 20 to that age.

t=data %>% filter(Age>20) %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=1))+geom_line()+labs(x = "Age", y = "Cumulative probability", title = "The probabilities of person born in 2002 & still alive in 2022") 
ggplotly(t)