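# load the packages used below: tidyverse (read_csv, dplyr verbs, ggplot2),
# here (project-relative paths) and ggrepel (geom_text_repel)
library(tidyverse)
library(here)
library(ggrepel)
# import the infra dataset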
infra <- read_csv(here("Data", "Clean", "country payments and id infra.csv"))
# import the findex full dataset
fdx <- read_csv(here("Data", "Clean", "findex full.csv"))
# select key variables.
infra <- infra %>%
  select(
    m21_bens_actual, total_fiscal, gdpg_2020e, gdp_per_capita, tax_to_gdp,
    has_id_id4d, received_wages, soc_reg_coverage, i_branches_A1_pop,
    i_ATMs_pop, has_account, deaths, gov_effect, tsa, unpan_egov, region,
    code, m21_status, inc_level, pop_2018, govt_transfer, april_ld_severity,
    pension
  ) %>%
  mutate(
    region = as.factor(region),
    log_gdp = log(gdp_per_capita),
    # TRUE when m21_status is anything other than "Planned"
    progress = (m21_status != "Planned")
  )
Prior to COVID-19, a large share of the population of most low- and middle-income countries neither received government assistance nor participated in the formal economy. We call this group the “invisible middle” because governments hold no data on it from either income tax receipts or participation in social assistance programs. The degree to which this group is effectively “invisible” to the state varies greatly, though. In some countries,
The size of the invisible middle is highly correlated with income, though there are some notable outliers such as Iran and Timor-Leste, both of which had high pre-existing cash transfer coverage. The scatterplots below plot log income per capita against three quantities: the share of the population receiving government transfers prior to COVID-19, the share of the population receiving wages (which proxies for participation in the formal economy), and the share of the population that received neither government transfers nor wages.
** Robert – It might be a good idea to do these graphs for all of the countries in the Findex dataset to see if the linear relationship holds for wealthier countries.
# create variables for size of formal sector and missing middle
infra <- infra %>% mutate(formal = received_wages + pension, mm = 1- govt_transfer - formal)
fdx <- fdx %>% mutate(formal = received_wages + pension, mm = 1- govt_transfer - formal, log_gdp = log(gdp_per_capita))
ggplot(infra, aes(log_gdp, govt_transfer, label = code)) +
geom_point(aes(size = pop_2018, colour = region), show.legend = FALSE) +
geom_text_repel(aes(colour = region)) +
geom_smooth(method = "lm")
ggplot(infra, aes(log_gdp, formal, label = code)) +
geom_point(aes(size = pop_2018, colour = region), show.legend = FALSE) +
geom_text_repel(aes(colour = region)) +
geom_smooth(method = "lm") +
labs(y = "% adults in formal sector")
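# NOTE: this overrides the mm defined above, dropping the pension term so that the
# middle is "neither govt transfers nor wages"; the fdx plots further down keep pension in formal
df <- infra %>% mutate(mm = 1 - received_wages - govt_transfer)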
ggplot(df, aes(log_gdp, mm, label = code)) +
geom_point(aes(size = pop_2018, colour = region), show.legend = FALSE) +
geom_text_repel(aes(colour = region)) +
geom_smooth(method = "lm") +
labs(y = "% population in the missing middle")
ggplot(fdx, aes(log_gdp, govt_transfer, label = code)) +
geom_point() +
geom_text_repel() +
geom_smooth(method = "lm")
ggplot(fdx, aes(log_gdp, formal, label = code)) +
geom_point() +
geom_text_repel() +
geom_smooth(method = "lm") +
labs(y = "% adults in formal sector")
ggplot(fdx, aes(log_gdp, mm, label = code)) +
geom_point() +
geom_text_repel() +
geom_smooth(method = "lm") +
labs(y = "% population in the missing middle")
While we term this portion of the population the “invisible middle”, whether it is truly in the “middle” depends greatly on tax compliance and the accuracy of social assistance targeting. The first figure below plots the share of government transfer recipients whose wealth is below the 40th percentile against log GDP per capita. The figure shows that, in many countries, targeting of government transfers is quite poor and, according to this simple measure, even regressive. (Any country with a y value below 0.4 has a regressive government transfer policy, since the poorest 40% then make up less than 40% of recipients.) We also see that targeting accuracy is fairly strongly correlated with country income.
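The targeting measure described above can be illustrated with a short sketch. It assumes the share of recipients from the poorest 40% is obtained by weighting coverage among the poorest 40% and the richest 60% by their population shares. The function name `share_from_poorest` and the coverage rates are hypothetical and chosen only for illustration; they are not taken from the Findex data.

```r
# Share of transfer recipients who come from the poorest 40% of adults.
# cov_poor: share of the poorest 40% receiving transfers (hypothetical input)
# cov_rich: share of the richest 60% receiving transfers (hypothetical input)
share_from_poorest <- function(cov_poor, cov_rich) {
  (cov_poor * 0.4) / (cov_poor * 0.4 + cov_rich * 0.6)
}

# Equal coverage in both groups: recipients mirror the population (share of 0.4)
neutral <- share_from_poorest(0.2, 0.2)

# Higher coverage among the poor: share above 0.4 (progressive)
progressive <- share_from_poorest(0.3, 0.1)

# Higher coverage among the rich: share below 0.4 (regressive)
regressive <- share_from_poorest(0.1, 0.3)
```

Under these assumptions, 0.4 is the natural benchmark: it is the value the measure takes when transfers are spread evenly across the wealth distribution.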
As a robustness check, the second figure plots the share of adults receiving government transfers against targeting accuracy. We might expect a negative correlation between the two since, theoretically, it should be easier to target a smaller program. (And, according to my crude targeting measure, it is impossible to perfectly target a transfer which reaches more than 40% of the population.) If so, this could be confounding the relationship between income and transfer targeting accuracy. Instead, we see a positive correlation between the share of adults receiving transfers and targeting accuracy.
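The parenthetical bound can be made concrete with a small sketch. If a program reaches a share `coverage` of all adults, poor recipients can account for at most 40% of adults, so the measure cannot exceed `0.4 / coverage`. The helper `max_share_from_poorest` is hypothetical and not part of the analysis code.

```r
# Upper bound on the share of recipients drawn from the poorest 40%,
# given the share of all adults the program reaches (hypothetical helper).
max_share_from_poorest <- function(coverage) {
  pmin(1, 0.4 / coverage)
}

# Programs covering up to 40% of adults can in principle be perfectly targeted;
# larger programs necessarily include some recipients from the richest 60%.
bounds <- max_share_from_poorest(c(0.2, 0.4, 0.5, 0.8))
```

For the four illustrative coverage levels, the bound stays at 1 up to 40% coverage and then falls as coverage grows, which is the mechanical effect the robustness check is meant to rule out.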
Lastly, I also attempted to calculate the share of the invisible middle (and of the formal sector) that comes from the poorest 40%, but I don’t really trust the results: the calculation requires a number of assumptions and, for many countries, the size of the “middle” is negative (since the groups overlap). The third figure below plots the share of govt transfer recipients who are poor versus the share of the missing middle who are poor, excluding all countries where the size of the middle is negative.
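A negative “middle” arises because the simple subtraction counts adults who belong to both groups twice. A minimal sketch with made-up numbers follows; the overlap `both` is a hypothetical quantity assumed here purely for illustration.

```r
# Hypothetical shares of adults, for illustration only
received_wages <- 0.70  # share receiving wages
govt_transfer  <- 0.50  # share receiving government transfers
both           <- 0.30  # assumed share receiving both (hypothetical)

# Naive residual: subtracts the overlapping adults twice, so it can go negative
naive_middle <- 1 - received_wages - govt_transfer

# Residual after adding the double-counted overlap back in
adjusted_middle <- 1 - (received_wages + govt_transfer - both)
```

With these numbers the naive residual is negative while the overlap-adjusted residual is positive, which is why countries with a negative residual are dropped from the third figure rather than interpreted.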
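# formal sector = wage earners + pension recipients, computed separately for the
# poorest 40% and the richest 60% (formal_rich assumes a pension_rich column exists,
# mirroring pension_poor)
fdx <- fdx %>% mutate(formal_poor = received_wages_poor + pension_poor,
                      formal_rich = received_wages_rich + pension_rich,
                      middle_poor = 1 - formal_poor - govt_transfer_poor,
                      middle_rich = 1 - formal_rich - govt_transfer_rich)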
# Share receiving govt transfer, in the formal sector, and in the "middle" who are poor
fdx <- fdx %>% mutate(govt_transfer_share = govt_transfer_poor * .4 / (govt_transfer_poor * .4 + govt_transfer_rich * .6),
middle_share = middle_poor * .4 / (middle_poor * .4 + middle_rich * .6),
formal_share = formal_poor * .4 / (formal_poor * .4 + formal_rich * .6))
# look at accuracy of targeting by income
ggplot(fdx, aes(log_gdp, govt_transfer_share, label = code)) +
geom_point()+
geom_text_repel()+
geom_smooth()+
labs(y = "% gov transfer recipients from poorest 40%")
# Look at accuracy by size of transfer
df <- fdx %>% mutate(inc_bin = ntile(log_gdp, 5))
ggplot(df, aes(govt_transfer, govt_transfer_share, label = code))+
geom_point(aes(color = inc_bin)) +
geom_text_repel()+
geom_smooth()+
labs(colour = "Country income quantile", y = "% gov transfer recipients from poorest 40%", x = "% adults receiving gov transfer")
# filter for countries with positive middle_poor and middle_rich
df <- fdx %>% filter(middle_poor > 0 & middle_rich > 0)
# Plot the share of the middle who are poor versus the share of govt transfer recipients who are poor
ggplot(df, aes(govt_transfer_poor, middle_share, label = code)) +
geom_point() +
geom_text_repel() +
xlim(0,.4) + ylim(0,1) +
labs(y = "% missing middle from poorest 40%", x = "% of bottom 40% who receive govt transfers")
We have hypothesized that countries were quick to deliver additional cash to existing cash transfer beneficiaries and to formal sector workers but were a bit slow in delivering cash to the invisible middle. I have added this as a placeholder for that result.
If we are able to collect data on how different countries identified new beneficiaries, it would be cool to add that analysis to the paper.