The use of propensity scores to support causal inference in studies that cannot use random assignment has increased dramatically since the method was first published in 1983 (Beal & Kupzyk, 2014, An Introduction to Propensity Scores: What, When, and How). There are four main applications of propensity scores in practice: matching, stratification, regression adjustment, and weighting (Rosenbaum & Rubin, 1983).
At its simplest, propensity score matching pairs each individual in the treatment group with an individual in the control group who has a similar propensity score, that is, a similar estimated probability of receiving the treatment given the observed covariates.
After matching, the treatment group and the control group should have very similar covariate distributions. A simple regression model can then be used to estimate the treatment effect on the outcome. Because matched observations are not independent, cluster-robust standard errors (clustered on the matched pair) should be used for inference.
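To make the matching idea concrete, here is a minimal base R sketch of 1:1 nearest-neighbour matching on a propensity score. It uses simulated data, and every name in it (`sim`, `age`, `female`, `treat`, `pairs`) is a hypothetical stand-in rather than a variable from the dataset analysed in this post. It is a simplified greedy procedure for illustration only, not the algorithm that MatchIt runs internally.

```r
# Simulated data: all variables and coefficients are illustrative assumptions
set.seed(42)
n <- 500
age <- rnorm(n, mean = 40, sd = 10)
female <- rbinom(n, 1, 0.5)
# Treatment probability depends on the covariates, so the groups differ by design
treat <- rbinom(n, 1, plogis(-2 + 0.03 * age - 0.4 * female))
sim <- data.frame(age, female, treat)

# Step 1: estimate the propensity score with logistic regression
ps_model <- glm(treat ~ age + female, data = sim, family = binomial())
sim$ps <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbour matching without replacement
treated_idx <- which(sim$treat == 1)
control_pool <- which(sim$treat == 0)
pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated_idx) {
  if (length(control_pool) == 0) break
  # Pick the remaining control whose propensity score is closest to unit i
  j <- control_pool[which.min(abs(sim$ps[control_pool] - sim$ps[i]))]
  pairs <- rbind(pairs, data.frame(treated = i, control = j))
  control_pool <- setdiff(control_pool, j)  # each control is used at most once
}

# Step 3: build the matched sample and confirm it is balanced in size
matched <- sim[c(pairs$treated, pairs$control), ]
table(matched$treat)
```

Processing treated units in data order, as this sketch does, is only one possible choice. The order in which units are matched can change which pairs are formed when matching without replacement.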
Side note for the humanitarian sector:
The approach in this exercise may be applicable to other sectors, such as the humanitarian sector. In this sector, impact monitoring is very important because it sheds light on the efficiency of assistance. Sample designs are usually simple and contain two groups: those eligible for assistance (the treatment group) and those who are not (the control group). Yet comparing the two groups directly can be misleading when explaining the impact of assistance, especially for unconditional cash assistance.
One of the barriers is “targeting”. There are several targeting methods in the humanitarian sector, such as blanket targeting, self-targeting, demographic targeting, and proxy means tests (PMTs). Some of these methods, such as demographic targeting, make the profiles of the control and treatment groups differ systematically, for example in the number of children, elderly individuals, or persons with disabilities in the household.
When this is the case, PSM could be used to make the two groups more comparable by reducing the effect of targeting on the selected covariates, giving a clearer view of the impact of assistance on household well-being.
In this setup, the treatment variable is smoking and the outcome variable is psychological distress. The research question, whether smoking affects psychological distress, will be addressed by propensity score matching.
For a better understanding of “robust standard errors”, see: https://data.library.virginia.edu/understanding-robust-standard-errors/
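As a rough illustration of what a cluster-robust standard error computes, the sketch below builds the basic "sandwich" variance by hand in base R for simulated matched pairs. The data and all names (`pair_id`, `pair_effect`, `y`) are assumptions made for this example. It omits the finite-sample corrections that packaged implementations typically apply, so its numbers should not be expected to match package output exactly.

```r
# Simulated matched pairs: each pair shares an unobserved pair-level effect
set.seed(1)
n_pairs <- 200
pair_id <- rep(seq_len(n_pairs), each = 2)
treat <- rep(c(1, 0), times = n_pairs)
pair_effect <- rep(rnorm(n_pairs), each = 2)  # induces within-pair correlation
y <- 15 + 1.5 * treat + pair_effect + rnorm(2 * n_pairs)

fit <- lm(y ~ treat)
X <- model.matrix(fit)
e <- residuals(fit)

# "Bread": inverse of X'X
bread <- solve(crossprod(X))
# "Meat": sum over clusters of (X_g' e_g)(X_g' e_g)'
scores <- rowsum(X * e, group = pair_id)  # one row of summed scores per pair
meat <- crossprod(scores)

# Uncorrected cluster-robust variance and standard errors
vcov_cluster <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(vcov_cluster))
se_classic <- sqrt(diag(vcov(fit)))
rbind(classic = se_classic, cluster = se_cluster)
```

The classical standard errors assume independent observations. The cluster version only assumes independence between pairs, which is the more plausible assumption for matched data.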
#Count non-smokers (0) and smokers (1)
table(df1$smoker)
##
## 0 1
## 7026 974
The data are unbalanced with respect to the treatment variable: there are 7026 non-smokers and only 974 smokers. We will check the distribution of the treatment variable again after the matching procedure.
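library(MatchIt)
#Treat empty strings and NA in remoteness as missing rather than as factor levels
df1$remoteness <- factor(df1$remoteness, exclude = c("", NA))
#1:1 nearest-neighbour matching without replacement on a logistic-regression propensity score
match_obj <- matchit(smoker ~ sex + indigeneity + high_school + partnered + remoteness +
                       language + risky_alcohol + age,
                     data = df1, method = "nearest", distance = "glm",
                     ratio = 1,
                     replace = FALSE)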
Plotting the propensity score distributions.
#Jitter plot of propensity scores for matched and unmatched smokers and non-smokers
plot(match_obj, type = "jitter", interactive = FALSE)
Propensity score matching will not be appropriate if there is not a satisfactory overlap in the propensity score distribution between the matched treated group and the matched untreated group.
#Plot the standardized mean differences of each covariate before and after matching
plot(summary(match_obj), abs = FALSE)
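The balance plot above is built from standardized mean differences. As a minimal sketch of that quantity, the hypothetical helper `smd()` below divides the difference in group means by a pooled standard deviation. The pooled denominator is an assumption chosen for simplicity; MatchIt may standardize differently, so treat this as an illustration of the idea rather than a reproduction of its output.

```r
# Hypothetical helper: standardized mean difference with a pooled SD
smd <- function(x, treat) {
  x1 <- x[treat == 1]
  x0 <- x[treat == 0]
  pooled_sd <- sqrt((var(x1) + var(x0)) / 2)
  (mean(x1) - mean(x0)) / pooled_sd
}

# Toy data (illustrative only): the treated group is older on average
set.seed(7)
treat <- rep(c(1, 0), times = c(100, 400))
age <- c(rnorm(100, mean = 45, sd = 10), rnorm(400, mean = 40, sd = 10))
smd(age, treat)
```

A common rule of thumb treats absolute values below about 0.1 as adequate balance, although the threshold is a convention and not a formal test.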
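library(lmtest)    #provides coeftest() and coefci()
library(sandwich)  #provides vcovCL()
#Extract the matched data and save it in matched_data
matched_data <- match.data(match_obj)
#Estimate the treatment effect on the matched sample, using the matching weights
res <- lm(psyc_distress ~ smoker, data = matched_data, weights = weights)
#Test the coefficient using cluster-robust standard errors, clustered on matched pair (subclass)
coeftest(res, vcov. = vcovCL, cluster = ~subclass)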
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.65298 0.20093 77.904 < 2.2e-16 ***
## smoker 1.66016 0.29657 5.598 2.477e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Calculate the confidence intervals based on cluster robust standard error
coefci(res, vcov. = vcovCL, cluster = ~subclass, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 15.258922 16.047032
## smoker 1.078545 2.241783
#Re-fit the model on the full, unmatched data with conventional standard errors, for comparison
res_v2 <- lm(psyc_distress ~ smoker, data = df1)
summary(res_v2)
##
## Call:
## lm(formula = psyc_distress ~ smoker, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.313 -3.759 -1.759 2.241 35.241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.75932 0.06768 218.06 <2e-16 ***
## smoker 2.55382 0.19398 13.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.673 on 7998 degrees of freedom
## Multiple R-squared: 0.02121, Adjusted R-squared: 0.02109
## F-statistic: 173.3 on 1 and 7998 DF, p-value: < 2.2e-16