Chapter 1 - Propensity scores

Background

The relationship between forest cover and human socioeconomic development remains an open question, as the evidence is mixed. In general, high forest cover is associated with poverty, but in many places low forest cover is also associated with high poverty rates. The economic development of a forested region frequently proceeds through deforestation, which generates economic growth; however, if the gains are not equitably distributed (e.g., through access to education, land and markets), the resulting loss of natural capital often does not translate into lower poverty and inequality over the long run. Furthermore, excessive loss of natural capital may worsen socioeconomic conditions, because the ecosystem services (ES) needed to sustain well-being are lost. When deforestation is limited to intermediate levels, there is likely room to build the basic infrastructure necessary for socioeconomic development (e.g., food production, water and energy infrastructure, roads, schools, hospitals) without disrupting ecological systems. In this paper we tested the hypothesis that municipalities with intermediate levels of native vegetation (natural capital) have better socioeconomic conditions than municipalities with extremely high or extremely low levels. We therefore expected a quadratic relationship between native vegetation cover and the socioeconomic indices.

Methods

We gathered data on native vegetation cover (NVC) and land cover change from MapBiomas 4.0. Data on the HDI, the Gini index, extreme poverty and under-five mortality were gathered from the Atlas Brasil project for 1,210 municipalities within the Caatinga biome. We selected 10 covariates (listed below) and scaled them (mean = 0, sd = 1) to estimate covariate balancing propensity scores (CBPS) with both the parametric and the non-parametric methods. The covariate data come from IBGE (decennial census and agricultural census) and from Dyngeland et al. (2020). We then evaluated the covariate balance achieved by the parametric and non-parametric CBPS weights against the unweighted sample. Finally, using the CBPS weights, we fitted weighted GLMs with a quadratic term to capture the expected quadratic relationship between NVC and the socioeconomic indices.
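Written out, the weighted model described above is weighted least squares (a Gaussian glm with prior weights), where w_i are the CBPS weights and Y_i is a socioeconomic index. Under this specification the quadratic hypothesis can be read off the sign of the quadratic coefficient and the location of the turning point NVC*. This is a sketch of the estimating equation, not an additional analysis:

```latex
\begin{aligned}
\mathrm{E}[Y \mid \mathrm{NVC}] &= \beta_0 + \beta_1 \mathrm{NVC} + \beta_2 \mathrm{NVC}^2 \\
\hat{\boldsymbol\beta} &= \arg\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} w_i \left( Y_i - \beta_0 - \beta_1 \mathrm{NVC}_i - \beta_2 \mathrm{NVC}_i^2 \right)^2 \\
\frac{d\,\mathrm{E}[Y \mid \mathrm{NVC}]}{d\,\mathrm{NVC}} &= \beta_1 + 2\beta_2 \mathrm{NVC} = 0
\quad\Longrightarrow\quad \mathrm{NVC}^{*} = -\frac{\beta_1}{2\beta_2}
\end{aligned}
```

A negative β2 gives an inverted U with a maximum at NVC* (the shape expected for the HDI components), and a positive β2 gives a U with a minimum (the shape expected for the Gini index, extreme poverty and under-five mortality). The hypothesis is supported only if NVC* falls inside the observed range of NVC.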

Covariates

Unemployment rate (2010)
Percentage of properties with financing (2017)
Mean number of cattle per property (2017)
Mean number of goats per property (2017)
Mean bean production per property (t, 2017)
Mean corn production per property (t, 2017)
Population density (2010)
Mean slope (2004)
Mean elevation (2004)
Total area of protected areas (km², 2013)
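The scaling step can be sketched with base R's `scale()`. The toy data frame and the raw column names (`txDesemp_10`, `popDens_10`) are hypothetical stand-ins; only the `_scaled` suffix follows the names used in the CBPS model.

```r
# Minimal sketch of the standardization step on a toy data frame.
# Raw column names and values are hypothetical stand-ins for the real covariates.
toy <- data.frame(
  txDesemp_10 = c(8.1, 12.4, 5.3, 9.9, 15.0),
  popDens_10  = c(10.2, 55.7, 3.1, 22.8, 140.5)
)

# scale() centers each column on its mean and divides by its standard deviation;
# as.numeric() drops the "scaled:center" / "scaled:scale" attributes
for (v in names(toy)) {
  toy[[paste0(v, "_scaled")]] <- as.numeric(scale(toy[[v]]))
}

scaled_cols <- c("txDesemp_10_scaled", "popDens_10_scaled")
round(colMeans(toy[, scaled_cols]), 10)  # means are 0 (up to rounding)
apply(toy[, scaled_cols], 2, sd)         # standard deviations are 1
```

Scaling does not change the correlation between a covariate and the treatment, so it affects the numerical behaviour of the CBPS optimization rather than the balance statistics themselves.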

Covariate Balancing Propensity Scores (inverse probability weights)

library(CBPS)

modelnvc <- CBPS(nvcPerc_10 ~                      # continuous treatment: native vegetation cover in 2010 (%)
                   txDesemp_10_scaled +            # standardized covariates (mean = 0; sd = 1)
                   finanPerc_17_scaled +
                   bovMed_17_scaled +
                   capMed_17_scaled +
                   beanMed_17_scaled +
                   cornMed_17_scaled +
                   popDens_10_scaled +
                   MeanSlopefor2004analysis_scaled +
                   MeanElevationfor2004analysis_scaled +
                   Sumkm2_ProtectedArea2013_scaled,
                 data = dbcap1_PS,
                 method = "exact",      # just-identified CBPS: covariate balancing conditions only
                 standardize = TRUE,    # return normalized weights
                 ATT = 0)               # estimate the ATE (the only option for non-binary treatments)

head(modelnvc$weights, 10)

##  [1] 0.0010134430 0.0004996059 0.0006879515 0.0005845246 0.0016059954
##  [6] 0.0011664143 0.0006721420 0.0008720986 0.0004456217 0.0005071440
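The printed weights are of the order of 1/n (about 0.0008 for the 1,196 municipalities in the extreme poverty model below), consistent with `standardize = TRUE` returning weights normalized to sum to 1. To make the form of these weights concrete, the sketch below computes stabilized inverse probability weights for a continuous treatment, w_i = f(T_i) / f(T_i | X_i), assuming normal densities and using simulated data. Note that this is the conventional maximum-likelihood generalized propensity score, not CBPS itself: CBPS uses weights of this form but estimates the treatment model so that the weighted covariates are uncorrelated with the treatment.

```r
# Sketch: stabilized inverse probability weights for a continuous treatment,
# w_i = f(T_i) / f(T_i | X_i), with both densities assumed normal.
# Maximum-likelihood version on simulated data; NOT the CBPS estimator.
set.seed(1)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))                          # simulated scaled covariates
treat <- 50 + 8 * X[, "x1"] - 5 * X[, "x2"] + rnorm(n, sd = 10)   # simulated NVC (%)

# Denominator: conditional density of the treatment given the covariates
fit <- lm(treat ~ X)
dens_cond <- dnorm(treat, mean = fitted(fit), sd = summary(fit)$sigma)

# Numerator: marginal density of the treatment (stabilizes the weights)
dens_marg <- dnorm(treat, mean = mean(treat), sd = sd(treat))

w <- dens_marg / dens_cond
w <- w / sum(w)   # normalize so the weights sum to 1, as with standardize = TRUE

head(w)
sum(w)
```

Units whose treatment level is unlikely given their covariates receive large weights, which is why extreme weights are worth inspecting before fitting the outcome models.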

Covariate balance evaluation

##    Unweighted            CBPS               npCBPS         
##  Min.   :-0.24531   Min.   :-0.284682   Min.   :-0.063552  
##  1st Qu.:-0.01547   1st Qu.:-0.034132   1st Qu.:-0.007268  
##  Median : 0.04952   Median : 0.003656   Median : 0.012664  
##  Mean   : 0.05503   Mean   :-0.005356   Mean   : 0.014683  
##  3rd Qu.: 0.16953   3rd Qu.: 0.039110   3rd Qu.: 0.041006  
##  Max.   : 0.26788   Max.   : 0.190603   Max.   : 0.070207

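Both the parametric (CBPS) and non-parametric (npCBPS) weights improved covariate balance relative to the unweighted sample: the interquartile range of the balance statistics is [-0.015, 0.170] unweighted, versus [-0.034, 0.039] with CBPS and [-0.007, 0.041] with npCBPS. However, balance can still be improved, especially for the parametric CBPS, whose most extreme value (-0.285) is larger in magnitude than the most extreme unweighted value (0.268).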
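For a continuous treatment, balance is commonly measured as the Pearson correlation between each covariate and the treatment, before and after weighting; the sketch below assumes this is what the table summarizes. It runs on simulated data, uses stabilized normal-density weights as a stand-in for CBPS weights, and the helper `weighted_cor()` is hypothetical.

```r
# Sketch: balance diagnostic for a continuous treatment, computed as the
# (weighted) Pearson correlation between each covariate and the treatment.
weighted_cor <- function(X, treat, w) {
  sapply(colnames(X), function(v) {
    # cov.wt() rescales w to sum to 1 internally
    cov.wt(cbind(X[, v], treat), wt = w, cor = TRUE)$cor[1, 2]
  })
}

set.seed(2)
n <- 500
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))                          # simulated covariates
treat <- 50 + 8 * X[, "x1"] - 5 * X[, "x2"] + rnorm(n, sd = 10)   # simulated treatment

# Stabilized normal-density weights (stand-in for CBPS weights)
fit <- lm(treat ~ X)
w <- dnorm(treat, mean(treat), sd(treat)) /
     dnorm(treat, fitted(fit), summary(fit)$sigma)

unweighted <- weighted_cor(X, treat, rep(1, n))  # equal weights = ordinary cor()
weighted   <- weighted_cor(X, treat, w)
summary(data.frame(Unweighted = unweighted, Weighted = weighted))
```

Correlations closer to 0 in the weighted column indicate better balance; applying `summary()` to one vector per weighting scheme reproduces the layout of the table above.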

Example: how I estimate the average treatment effect (ATE) for extreme poverty

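# Weighted Gaussian GLM (weighted least squares) of extreme poverty on NVC,
# with a quadratic term; the weights are the CBPS inverse probability weights
model.expov10 <- glm(expov_2010 ~ nvcPerc_10 + I(nvcPerc_10^2),
                     data = dbcap1_PS,
                     weights = modelnvc$weights)
summary(model.expov10)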

## 
## Call:
## glm(formula = expov_2010 ~ nvcPerc_10 + I(nvcPerc_10^2), data = dbcap1_PS, 
##     weights = modelnvc$weights)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.0383  -0.0485   0.0857   0.2428   0.9361  
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     29.1109160  1.4788330   19.68   <2e-16 ***
## nvcPerc_10      -0.8111136  0.0532963  -15.22   <2e-16 ***
## I(nvcPerc_10^2)  0.0091039  0.0004724   19.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.07205396)
## 
##     Null deviance: 125.09  on 1195  degrees of freedom
## Residual deviance:  85.96  on 1193  degrees of freedom
## AIC: 9464.3
## 
## Number of Fisher Scoring iterations: 2

coef(model.expov10)

##     (Intercept)      nvcPerc_10 I(nvcPerc_10^2) 
##    29.110915986    -0.811113580     0.009103857
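Because the model is quadratic, its coefficients can be turned into the NVC level at which predicted extreme poverty is lowest, NVC* = -b1 / (2 * b2), and the predicted value at that point. A sketch using the coefficients printed above (`turning_point()` is a hypothetical helper):

```r
# Sketch: turning point of the fitted quadratic for extreme poverty.
# Coefficients copied from the coef(model.expov10) output above;
# with the fitted model in memory, b <- coef(model.expov10) gives the same values.
b <- c(29.110915986, -0.811113580, 0.009103857)   # intercept, linear, quadratic

turning_point <- function(b1, b2) -b1 / (2 * b2)

nvc_star  <- turning_point(b[2], b[3])                    # NVC (%) at the minimum (b2 > 0)
expov_min <- b[1] + b[2] * nvc_star + b[3] * nvc_star^2   # predicted extreme poverty there

c(nvc_star = nvc_star, expov_min = expov_min)
```

Since the quadratic coefficient is positive, this point is a minimum, which is the U shape expected under the hypothesis.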

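ATE for each outcome (parametric CBPS)

Linear and quadratic coefficients of the weighted models. IDHM = municipal HDI; IDHM_E, IDHM_R and IDHM_L = its education, income and longevity components; Gini = Gini index; Expov = extreme poverty; u5mort = under-five mortality.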

Term	IDHM	IDHM_E	IDHM_R	IDHM_L	Gini	Expov	u5mort
nvcPerc_10	0.0045227	0.0082023	0.0026718	0.0016086	-0.0037472	-0.8111136	-0.3562237
I(nvcPerc_10^2)	-0.0000457	-0.0000816	-0.0000290	-0.0000158	0.0000423	0.0091039	0.0034620

Non-parametric ATE for each outcome (npCBPS)

Term	IDHM	IDHM_E	IDHM_R	IDHM_L	Gini	Expov	u5mort
nvcPerc_10	0.0009437	0.0011763	0.0008298	0.0006527	-0.0000493	-0.1494985	-0.1507184
I(nvcPerc_10^2)	-0.0000068	-0.0000055	-0.0000088	-0.0000053	0.0000024	0.0019252	0.0011979
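One way to check whether the parametric and non-parametric weights lead to the same interpretation, beyond the signs of the coefficients, is to compare the turning points each table implies. The sketch below copies the coefficients from the two tables; since they are rounded (some quadratic terms keep only two significant digits), the turning points are approximate.

```r
# Sketch: turning points -b1 / (2 * b2) implied by the two coefficient tables.
outcomes <- c("IDHM", "IDHM_E", "IDHM_R", "IDHM_L", "Gini", "Expov", "u5mort")

cbps_b1 <- c(0.0045227, 0.0082023, 0.0026718, 0.0016086, -0.0037472, -0.8111136, -0.3562237)
cbps_b2 <- c(-0.0000457, -0.0000816, -0.0000290, -0.0000158, 0.0000423, 0.0091039, 0.0034620)
np_b1   <- c(0.0009437, 0.0011763, 0.0008298, 0.0006527, -0.0000493, -0.1494985, -0.1507184)
np_b2   <- c(-0.0000068, -0.0000055, -0.0000088, -0.0000053, 0.0000024, 0.0019252, 0.0011979)

tp <- data.frame(
  outcome       = outcomes,
  shape         = ifelse(cbps_b2 < 0, "maximum", "minimum"),   # sign of b2 sets the curve shape
  same_signs    = sign(cbps_b1) == sign(np_b1) & sign(cbps_b2) == sign(np_b2),
  nvc_star_cbps = round(-cbps_b1 / (2 * cbps_b2), 1),
  nvc_star_np   = round(-np_b1 / (2 * np_b2), 1)
)
tp
```

The `same_signs` column checks the agreement in direction; comparing `nvc_star_cbps` with `nvc_star_np` shows whether the location of the optimum is also robust to the weighting scheme.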

We chose the parametric CBPS, following Dyngeland et al. (2020), even though the non-parametric CBPS achieved better balance, because both yield the same interpretation: the linear and quadratic coefficients have the same signs for every outcome.

We found a significant (p < 0.0001) quadratic relationship between native vegetation cover and all outcomes. The effect of native vegetation appears strongest for extreme poverty and under-five mortality (note that, unlike the HDI components and the Gini index, these outcomes are not bounded between 0 and 1, so raw coefficients are not directly comparable across outcomes). One possible explanation is that native vegetation is better recognized for preventing people from falling deeper into poverty than for lifting them out of poverty.

Municipalities with extreme levels of native vegetation tend to have slightly worse socioeconomic conditions than municipalities with intermediate levels. Municipalities with very high native vegetation cover could be in a “green poverty” situation, with few livelihood options and too little infrastructure to improve socioeconomic conditions. At the other extreme, people in municipalities with very low native vegetation cover could be in a “grey poverty” situation, unable to rely on native vegetation in times of scarcity. Note that most of the municipalities analyzed are rural and highly dependent on rural activities.

Next steps

Add a new set of covariates to improve balance
Redo the analysis with other sets of covariates to check sensitivity
Estimate the CBPS assuming other treatment distributions (e.g., gamma)
Try other CBPS methods, such as multi-dose or time-varying treatments
Evaluate the ATE with quantile regression