Data Pre-processing
file_list <- list.files(path="C:\\Users\\MyNgo\\Downloads\\lt_male-20230225T043725Z-001\\lt_male\\mltper_1x1")
code=gsub(".mltper_1x1.txt","",file_list)
file_list_full <- list.files(path="C:\\Users\\MyNgo\\Downloads\\lt_male-20230225T043725Z-001\\lt_male\\mltper_1x1",full.names = TRUE)
l=lapply(file_list_full, fread)
names(l)=code
f=rbindlist(l,use.names = TRUE,fill=TRUE,idcol="Country")
As I don’t know the country_code.xlsx so that I do with the dataframe f
data=f %>% select(Country,Year, Age, qx,lx,dx) %>% mutate(px=1-as.numeric(qx)) %>% filter(Year==2002,!is.na(Country))
data[Age=="110+",]$Age="110"
data =data %>% mutate_at(names(data)[-1], as.numeric)
glimpse(data)
## Rows: 5,439
## Columns: 7
## $ Country <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",…
## $ Year <dbl> 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 20…
## $ Age <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ qx <dbl> 0.00529, 0.00047, 0.00034, 0.00031, 0.00015, 0.00014, 0.00011,…
## $ lx <dbl> 100000, 99471, 99424, 99391, 99360, 99345, 99332, 99320, 99303…
## $ dx <dbl> 529, 47, 34, 30, 15, 14, 11, 17, 10, 16, 19, 7, 11, 19, 25, 26…
## $ px <dbl> 0.99471, 0.99953, 0.99966, 0.99969, 0.99985, 0.99986, 0.99989,…
## [1] "AUS" "AUT" "BEL" "BGR" "BLR" "CAN" "CHE"
## [8] "CHL" "CZE" "DEUTE" "DEUTNP" "DEUTW" "DNK" "ESP"
## [15] "EST" "FIN" "FRACNP" "FRATNP" "GBR_NIR" "GBR_NP" "GBR_SCO"
## [22] "GBRCENW" "GBRTENW" "GRC" "HKG" "HRV" "HUN" "IRL"
## [29] "ISL" "ISR" "ITA" "JPN" "KOR" "LTU" "LUX"
## [36] "LVA" "NLD" "NOR" "NZL_MA" "NZL_NM" "NZL_NP" "POL"
## [43] "PRT" "RUS" "SVK" "SVN" "SWE" "TWN" "UKR"
## [50] "USA"
Year: Year of observing data
Age: Age group
q(x): Probability of death between ages x and x+1
p(x): Probability of alive between ages x and x+1
l(x): Number of survivors at exact age x, assuming l(0) = 100,000
d(x): Number of deaths between ages x and x+1
Individual Probability
Given a male born in USA in 2002, what is the estimate probability of dying in year of age 40?
data %>% filter(Country=="USA",Age==40) %>% select(qx)
## qx
## 1: 0.00268
One-year log mortality rate over ages
t=data %>% ggplot(aes(x=Age, y=log(qx),col=Country,group=1))+geom_line() + labs(x = "Age", y = "Log mortality Rate", title = "One-year log mortality Rates, over ages")
ggplotly(t)
Individual probabilities Given a baby born in 2002, the probability he/her can live over a certain age is the cumulative product of probability of alive from the current age to that age.
Compute the survival probability over the first 5 years of a male baby born in 2002 in USA.
data %>% filter(Country=="USA",Age %in% c(0:5)) %>% summarize(sp=prod(px))
## sp
## 1 0.9907133
Compute the survival probability over the next 50 years of a 20-year male born in 2002 in USA.
data %>% filter(Country=="USA",Age %in% c(20:70)) %>% summarize(sp=prod(px))
## sp
## 1 0.6919628
Survival probabilities over ages
Estimate survival probabilities of a new born in 2002 over ages
t=data %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=1))+geom_line() + labs(x = "Age", y = "Cummulative probabilities", title = "Probabilities of a new born in 2002 still survise, over ages")
ggplotly(t)
Given a person born in 2002 and still alive in 2022, the probability he/her can live over a certain age is the cumulative product of probability of alive from age 20 to that age.
t=data %>% filter(Age>20) %>%group_by(Country)%>% mutate(cpx=cumprod(px))%>% ggplot(aes(x=Age, y=cpx,col=Country,group=1))+geom_line()+labs(x = "Age", y = "Cumulative probability", title = "The probabilities of person born in 2002 & still alive in 2022")
ggplotly(t)