Demobel collects demographic information on the population residing in Belgium from 2001 to 2018. The data cover 71,251 individuals.
### DEMOBEL data
##################
library(haven)  # read_sas()
library(dplyr)  # left_join()
demobel.1 <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Demo/c2020_119_dataset2_part1_c.sas7bdat")
demobel.2 <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Demo/c2020_119_dataset2_part2_c.sas7bdat")
demobel.3 <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Demo/c2020_119_dataset2_part3_c.sas7bdat")
# All the databases match perfectly (check: every ID in demobel.2 also appears in demobel.1)
table(demobel.2$ID_DEMO_C %in% demobel.1$ID_DEMO_C)
#Merging
demobel <- left_join(demobel.2, demobel.3, by=c("ID_DEMO_C", "STOCK_YR"="YR_STOCK"))
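Before relying on the merged object, it may help to check that the join neither multiplied nor silently lost rows. The sketch below uses small invented data frames (`part2`, `part3`) standing in for `demobel.2` and `demobel.3`; only the key names `ID_DEMO_C`, `STOCK_YR` and `YR_STOCK` come from the code above, while the columns `a` and `b` and all values are hypothetical.

```r
library(dplyr)

# Toy stand-ins for demobel.2 and demobel.3 (all values are invented)
part2 <- data.frame(ID_DEMO_C = c(1, 1, 2),
                    STOCK_YR  = c(2001, 2002, 2001),
                    a         = c(10, 11, 20))
part3 <- data.frame(ID_DEMO_C = c(1, 2),
                    YR_STOCK  = c(2001, 2001),
                    b         = c("x", "y"),
                    stringsAsFactors = FALSE)

keys <- c("ID_DEMO_C", "STOCK_YR" = "YR_STOCK")
merged <- left_join(part2, part3, by = keys)

# With unique keys on the right, a left join keeps exactly the left rows
same_rows <- nrow(merged) == nrow(part2)

# Rows of the left table that found no partner on the right
unmatched <- anti_join(part2, part3, by = keys)

# Duplicated keys on the right side would multiply rows in the join
dup_keys <- part3 %>% count(ID_DEMO_C, YR_STOCK) %>% filter(n > 1)
```

If `dup_keys` is empty and `same_rows` is `TRUE`, the only thing left to inspect is `unmatched`, which lists the person-years that received `NA` for every variable coming from the right-hand table.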
To do so, we grouped the country codes into macro-areas.
Here we considered larger macro-areas.
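One possible way to group country codes into macro-areas is a named-vector lookup. This is only a sketch: the country codes, the area labels and the object names (`macro_lookup`, `country`) are invented for illustration and do not reflect the actual coding used in Demobel.

```r
# Hypothetical lookup table: codes and area labels are invented
macro_lookup <- c(BE = "Belgium", FR = "EU", DE = "EU", MA = "Africa", CN = "Asia")

country <- c("BE", "FR", "MA", "XX", NA)

# Named-vector lookup; codes missing from the table (and NA) become NA
macro_area <- unname(macro_lookup[country])

# Codes that were present in the data but could not be mapped
unmapped <- unique(country[!is.na(country) & is.na(macro_area)])
```

Listing `unmapped` after each recoding makes it easy to see which codes still need to be assigned to an area, and a coarser grouping only requires a second lookup vector.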
Here we investigate whether individuals change nationality over the observation period (as a preliminary check on the quality of our data).
| | Excluding people with Belgian nationality | Including people with Belgian nationality |
|---|---|---|
| Changed nationality | 26 | 3,695 |
| Did not change nationality | 9,925 | 67,419 |
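A check of this kind can be sketched by counting the distinct values observed for each individual across years. The column name `NATIONALITY` and the toy values are hypothetical; only `ID_DEMO_C` and `STOCK_YR` come from the data described here. Missing values are ignored, so a gap in the series is not counted as a change, which is itself an assumption.

```r
# Toy person-year panel; NATIONALITY is a hypothetical column name
panel <- data.frame(
  ID_DEMO_C   = c(1, 1, 1, 2, 2, 3, 3),
  STOCK_YR    = c(2001, 2002, 2003, 2001, 2002, 2001, 2002),
  NATIONALITY = c("IT", "IT", "BE", "BE", "BE", "FR", NA),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values observed per individual
n_values <- tapply(panel$NATIONALITY, panel$ID_DEMO_C,
                   function(x) length(unique(x[!is.na(x)])))

# TRUE for individuals with more than one nationality over time
changed <- n_values > 1
table(changed)
```

The same approach applies to any attribute that is expected to be stable over time, such as sex.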
Similarly we investigate the gender composition and the changes in gender as a preliminary check.
| | Count over the years | Change in sex |
|---|---|---|
| Female | 584,939 | 0 |
| Male | 560,312 | 0 |
| NA | 137,267 | |
#Census
census_2001_par <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Census/tu_reinvest_c2001par_2020_119_c.sas7bdat")
census_2011_par <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Census/tu_reinvest_c2011par_2020_119_c.sas7bdat")
census_2001_resp <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Census/tu_reinvest_c2001resp_2020_119_c.sas7bdat")
census_2011_resp <- read_sas("C:/Users/u0125884/OneDrive - KU Leuven/Documents/HIVA/Reinvest/Data/Census/tu_reinvest_c2011resp_2020_119_c.sas7bdat")
#Merging Census
census_2001 <- left_join(census_2001_par, census_2001_resp, by = c("ID_DEMO_C"))
census_2001$year <- 2001
census_2011 <- left_join(census_2011_par, census_2011_resp, by = c("ID_DEMO_C"))
census_2011$year <- 2011
#merging Census with Demobel
demobel.census2001 <- left_join(demobel, census_2001, by = c("ID_DEMO_C", "STOCK_YR"="year"))
demobel.census <- left_join(demobel.census2001, census_2011, by = c("ID_DEMO_C", "STOCK_YR"="year"))
dim(demobel.census)
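The two successive joins above attach the 2001 and 2011 census variables as separate sets of columns; wherever the two census files share a column name, `left_join` adds `.x`/`.y` suffixes. As an alternative, and only for variables that are comparable across the two waves, the census files could be stacked first and joined once. The sketch below uses invented data; `EDU` is a hypothetical variable and the objects `demo`, `c2001`, `c2011` are toy stand-ins for `demobel`, `census_2001` and `census_2011`.

```r
library(dplyr)

# Toy stand-ins (all values are invented)
demo  <- data.frame(ID_DEMO_C = c(1, 1, 2, 2),
                    STOCK_YR  = c(2001, 2011, 2001, 2011))
c2001 <- data.frame(ID_DEMO_C = c(1, 2), EDU = c("low", "high"),
                    year = 2001, stringsAsFactors = FALSE)
c2011 <- data.frame(ID_DEMO_C = 1, EDU = "mid",
                    year = 2011, stringsAsFactors = FALSE)

# Stack the waves so that shared variables end up in the same column
census_long <- bind_rows(c2001, c2011)

# A single join on person and year; no .x/.y suffixes are created
demo_census <- left_join(demo, census_long,
                         by = c("ID_DEMO_C", "STOCK_YR" = "year"))
```

Variables whose definitions differ between 2001 and 2011 would still need to be recoded to a common scheme before stacking, otherwise one column would mix two codings.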
The census provides information on the employment status of the father, the mother, and the respondent in 2001 and 2011. However, the definitions of the 2001 and 2011 variables do not match perfectly.
| Census 2001 | Census 2011 |
|---|---|
| Q1 | CAS = Situation de l’emploi (employment status) |
| 2001 | 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
The census also provides the educational level of the respondent, the respondent’s mother, and the respondent’s father. However, in this case too the information is not harmonized between 2001 and 2011.
| Census 2001 | Census 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
This information is available only for the year 2001
| Mother | Father |
|---|---|
This information is also not fully harmonized between 2001 and 2011.
| 2001 | 2011 |
|---|---|
| 2001 | 2011 |
|---|---|
IPCAL data are obtained from two sources:
(1) eu_self_ipcal, which contains only MS_TNJPI_INDEP and FL_IOE_A; note that eu_self_ipcal is identical to tf_reinvest_ipcal;
(2) tf_silc_pit_rounded, which contains more detailed income information.
IMPORTANT REMARK: tf_silc_pit_rounded can be merged with census and demobel, whereas eu_self_ipcal and tf_reinvest_ipcal cannot. We therefore lose these variables, which in any case were NA in more than 90% of cases.
Data from eu_self_ipcal
| Average MS_TNJPI_INDEP | Percentage NA |
|---|---|
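The two summary figures in this table, the average and the percentage of missing values, can be computed as in the sketch below. The vector `x` holds invented values standing in for `MS_TNJPI_INDEP`; it is not real data.

```r
# Invented values standing in for MS_TNJPI_INDEP
x <- c(1200, NA, NA, 800, NA, NA, NA, NA, NA, NA)

avg_x  <- mean(x, na.rm = TRUE)   # average over observed values only
pct_na <- 100 * mean(is.na(x))    # share of missing values, in percent
```

When the share of missing values is very high, the average describes only the few observed cases and should be read with that in mind.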
From this dataset we consider the following variables:
| Variable | Rounded by | Maximum | Description |
|---|---|---|---|
| MS_NET_JOINTLY_WAGE = Net wage | 1000 | 83000 | Net wages of the declaration |
| MS_NET_JOINTLY_UNEMPL = Net unemployment | 500 | 14500 | Net unemployment benefits of the declaration |
| MS_NET_JOINTLY_SICK = Net sick | 500 | 18500 | Net sickness and disability benefits of the declaration |
| MS_NET_JOINTLY_PENSION = Net pension | 1000 | 44000 | Net pensions of the declaration |
… and also these:
| Variable | Rounded by | Maximum | Description |
|---|---|---|---|
| MS_TOT_NET_PROF_INC = Prof income | 1000 | 83000 | Total net professional income of the declaration |
| MS_TOT_NET_TAXABLE_INC = Taxable income | 1000 | 83000 | Net taxable income of the declaration |
| MS_TOT_TAXES_DECL | ?? | ?? | ?? |
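To make the "Rounded by" and "Maximum" columns concrete, the sketch below mimics the transformation on invented values. It assumes that "rounded by" means rounding to the nearest multiple of the step and that "maximum" means values above the cap are set to the cap (top-coding); the source does not state either rule explicitly, and `round_and_cap` is a hypothetical helper.

```r
# Assumption: round to the nearest multiple of `step`,
# then set anything above `cap` to `cap` (top-coding).
round_and_cap <- function(x, step, cap) {
  pmin(round(x / step) * step, cap)
}

raw_wage <- c(12340, 45600, 83499, 120000, NA)   # invented values
net_wage <- round_and_cap(raw_wage, step = 1000, cap = 83000)

# Share of observed values sitting exactly at the cap
share_at_cap <- mean(net_wage == 83000, na.rm = TRUE)
```

Checking the share of observations at the cap is useful because averages of a top-coded variable understate the true mean whenever that share is not negligible.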
Data coming from tf_silc_pit_rounded
| Averages | Sums |
|---|---|
Data coming from tf_silc_pit_rounded
| Net sources | Tot net sources |
|---|---|
The variable MS_BEDRAG_OCMW is rounded to multiples of 500 and capped at 14000. It represents the annual living wage.
| Average Bedrag | Percentage NA |
|---|---|
EU-SILC data consist of four components:
(1) Household register data = dfile
(2) Personal register data = rfile
(3) Household data = hfile
(4) Personal data = pfile
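The four files can be combined in two steps: household files with each other, personal files with each other, and then persons with their household. The sketch below uses invented toy data. The key names (`DB010`/`DB030`, `HB010`/`HB030`, `RB010`/`RB030`, `PB010`/`PB030`) follow the usual EU-SILC naming convention but are an assumption here, since the actual files may be organised differently, and `HID` is a hypothetical column linking each person to a household.

```r
library(dplyr)

# Toy files; key names follow the usual EU-SILC convention (assumption)
dfile <- data.frame(DB010 = 2012, DB030 = c(10, 20),
                    DB040 = c("BE1", "BE2"), stringsAsFactors = FALSE)
hfile <- data.frame(HB010 = 2012, HB030 = c(10, 20), HY020 = c(30000, 18000))
rfile <- data.frame(RB010 = 2012, RB030 = c(1001, 1002, 2001),
                    HID = c(10, 10, 20))   # HID: hypothetical household link
pfile <- data.frame(PB010 = 2012, PB030 = c(1001, 2001), PL031 = c(1, 5))

# Household level: register file + household questionnaire
household <- left_join(dfile, hfile,
                       by = c("DB010" = "HB010", "DB030" = "HB030"))

# Person level: keep everyone in the register, add interview data where present
person <- left_join(rfile, pfile,
                    by = c("RB010" = "PB010", "RB030" = "PB030"))

# Attach household information to each person
silc <- left_join(person, household,
                  by = c("RB010" = "DB010", "HID" = "DB030"))
```

Starting from the personal register keeps household members without a personal questionnaire in the data, with `NA` in the pfile variables.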
This needs more investigation: are there differences between PL030 and PL031? Does the definition change in 2013?
What should I do with the people receiving 0?