Definition

A propensity score is the probability that a unit receives the treatment, given its observed characteristics. The use of propensity scores to support causal inference in studies that cannot use random assignment has increased dramatically since the method was first published in 1983 (Beal & Kupzyk, 2014, An Introduction to Propensity Scores: What, When, and How). There are four main applications of propensity scores in practice: matching, stratification, regression adjustment, and weighting (Rosenbaum & Rubin, 1983).

At its simplest, propensity score matching matches each individual in the treatment group to an individual in the control group based on their propensity score.

After matching, the treatment group and the control group should have very similar observed characteristics. A simple regression model can then be used to estimate the treatment effect on the outcome. Because matched pairs are not independent observations, cluster-robust standard errors, clustered on the matched pair, are used for inference.
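
As a minimal sketch of the matching idea, assuming simulated data and hypothetical variable names (`age`, `treated`, `toy`) that are not part of the dataset analysed later, a propensity score can be estimated with logistic regression and then used for a greedy 1:1 nearest-neighbour match without replacement in base R. This is an illustration only; it processes treated units in data order and is not meant to reproduce the exact algorithm of any matching package.

```r
set.seed(1)
n <- 200
age <- rnorm(n, mean = 40, sd = 10)
# Assumption for illustration: older units are more likely to be treated
treated <- rbinom(n, 1, plogis(-3 + 0.05 * age))
toy <- data.frame(age, treated)

# Propensity score: estimated probability of treatment given the covariate
ps_model <- glm(treated ~ age, data = toy, family = binomial)
toy$ps <- fitted(ps_model)

# Greedy 1:1 nearest-neighbour matching without replacement
treated_idx <- which(toy$treated == 1)
control_pool <- which(toy$treated == 0)
matched_pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated_idx) {
  if (length(control_pool) == 0) break
  # Control unit whose propensity score is closest to treated unit i
  nearest <- control_pool[which.min(abs(toy$ps[control_pool] - toy$ps[i]))]
  matched_pairs <- rbind(matched_pairs, data.frame(treated = i, control = nearest))
  # Remove the chosen control so it cannot be matched again
  control_pool <- setdiff(control_pool, nearest)
}
head(matched_pairs)
```

Each row of `matched_pairs` holds the row index of one treated unit and of the control unit matched to it.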

Side note for the humanitarian sector:

The results of this exercise may be applicable to other sectors, such as the humanitarian sector. In this sector, impact monitoring is very important because it sheds light on the efficiency of the assistance. Sample designs are simple and contain two groups: households eligible for assistance form the treatment group, and those that are not eligible form the control group. Yet comparing the two groups directly can be misleading when explaining the impact of the assistance, especially for unconditional cash assistance.

One of the barriers is “targeting”. There are several targeting methods in the humanitarian sector, such as blanket targeting, self-targeting, demographic targeting, proxy means tests (PMTs), and so on. Some of these methods, such as demographic targeting, make the profiles of the control and treatment groups differ from each other, for example in the number of children, elderly individuals, or persons with disabilities in the household.

Since this can distort the comparison of the two groups and the understanding of the impact of the assistance, PSM could be used for a better comparison between the two groups, by trying to remove the effect of targeting on the selected covariates. This gives a better perspective on the impact of assistance on household well-being.

Conclusion

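The coefficient of smoking without PSM and robust standard errors (RSE) is 2.55, with a standard error of 0.19.
The coefficient of smoking with PSM and RSE is 1.66, with a standard error of 0.29.
The 95% confidence interval of the coefficient with PSM and RSE (1.08 to 2.24) does not cover the coefficient without PSM and RSE (2.55), so the unmatched estimate appears to overstate the association.
Matching seems to make sense for understanding “impact”, because it tries to eliminate the interference of other observed variables.
For analyses such as those in the humanitarian sector, PSM is a challenging procedure: one must understand mediation, because behind the scenes there is always both correlation and causation.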

Story Behind and Dataset

In this setup, the treatment variable is smoking; the outcome variable is psychological distress. This research question will be addressed by propensity score matching.

Roadmap and Analysis

Select a research question, such as: Is smoking associated with psychological distress?
Select the independent (treatment) and dependent (outcome) variables: smoking and psychological distress.
Conduct propensity score matching (Outcome Analysis Part1):
Decide which variables will be used for PSM.
Conduct PSM.
After matching cases with PSM, run a regression on the matched cases: psyc_distress ~ smoker (Outcome Analysis Part2).

For a better understanding of “robust standard errors”, see: https://data.library.virginia.edu/understanding-robust-standard-errors/

(table(df1$smoker))

## 
##    0    1 
## 7026  974

The data are unbalanced in terms of the treatment variable: there are 7026 non-smokers and only 974 smokers. We will check the distribution of the treatment variable again after the matching procedure.

Outcome Analysis Part1

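library(MatchIt)   #matchit(), match.data()
library(lmtest)    #coeftest(), coefci()
library(sandwich)  #vcovCL()

#Treat empty strings and NA in remoteness as missing rather than as a factor level
df1$remoteness <- factor(df1$remoteness, exclude = c("", NA))

#1:1 nearest-neighbour matching without replacement on a logistic-regression propensity score
match_obj <- matchit(smoker ~ sex + indigeneity + high_school + partnered + remoteness
                     + language + risky_alcohol + age,
                     data = df1, method = "nearest", distance = "glm",
                     ratio = 1,
                     replace = FALSE)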

Plotting the PSM results.

#Plot the distribution of propensity scores for matched and unmatched smokers and non-smokers
plot(match_obj, type = "jitter", interactive = FALSE)

Propensity score matching is not appropriate unless there is satisfactory overlap in the propensity score distributions of the matched treated group and the matched untreated group.
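
One rough way to put a number on overlap, alongside the visual check, is to compute the region of common support. The sketch below uses simulated propensity scores (`ps_treated` and `ps_control` are hypothetical values, not taken from `df1`) and reports the share of treated units whose score falls inside the range observed in both groups.

```r
set.seed(2)
# Simulated propensity scores (hypothetical, for illustration only)
ps_treated <- rbeta(100, 3, 4)
ps_control <- rbeta(400, 2, 5)

# Region of common support: range of scores observed in both groups
common_low  <- max(min(ps_treated), min(ps_control))
common_high <- min(max(ps_treated), max(ps_control))

# Share of treated units whose score lies inside the common-support region
inside <- ps_treated >= common_low & ps_treated <= common_high
mean(inside)
```

A share close to 1 suggests that nearly every treated unit has comparable controls. A low share would be a warning sign, although what counts as satisfactory remains a judgment call.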

#Plot the standardized mean differences of the covariates before and after matching
plot(summary(match_obj), abs = FALSE)
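
The balance plot is built on standardized mean differences. As a hedged sketch of what that quantity measures, the hypothetical helper `smd()` below uses one common definition: the difference in group means divided by the standard deviation in the treated group. The data are simulated and the function is not part of any package.

```r
# Standardized mean difference, scaled by the treated group's standard deviation
# (hypothetical helper for illustration)
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

set.seed(3)
# Simulated covariate: the treated group is older on average
treat <- rep(c(1, 0), times = c(50, 150))
age <- c(rnorm(50, mean = 45, sd = 10), rnorm(150, mean = 40, sd = 10))
smd(age, treat)
```

Values near zero after matching indicate that the covariate is balanced between the two groups.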

Outcome Analysis Part2, Version1 with PSM

#Extract the matched data and save the data into the variable matched_data
matched_data <- match.data(match_obj)


#Regress psychological distress on smoking in the matched sample, using the matching weights
res <- lm(psyc_distress ~ smoker, data = matched_data, weights = weights)

#Test the coefficient using cluster robust standard error
coeftest(res, vcov. = vcovCL, cluster = ~subclass)

## 
## t test of coefficients:
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 15.65298    0.20093  77.904 < 2.2e-16 ***
## smoker       1.66016    0.29657   5.598 2.477e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Calculate the confidence intervals based on cluster robust standard error
coefci(res, vcov. = vcovCL, cluster = ~subclass, level = 0.95)

##                 2.5 %    97.5 %
## (Intercept) 15.258922 16.047032
## smoker       1.078545  2.241783

Outcome Analysis Part2, Version2 without PSM

#Re-test without PSM and RSE.
res_v2 <- lm(psyc_distress ~ smoker, data = df1)
summary(res_v2)

## 
## Call:
## lm(formula = psyc_distress ~ smoker, data = df1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.313 -3.759 -1.759  2.241 35.241 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.75932    0.06768  218.06   <2e-16 ***
## smoker       2.55382    0.19398   13.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.673 on 7998 degrees of freedom
## Multiple R-squared:  0.02121,    Adjusted R-squared:  0.02109 
## F-statistic: 173.3 on 1 and 7998 DF,  p-value: < 2.2e-16
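
The two estimates can be compared with a little arithmetic on the printed output. The sketch below rebuilds the 95% confidence interval of the matched estimate from its coefficient and cluster-robust standard error, then checks whether the unmatched coefficient falls inside it. The residual degrees of freedom are an assumption: they presume that all 974 smokers were matched 1:1, giving 1948 matched rows.

```r
est_matched <- 1.66016  # smoker coefficient after matching (from the coeftest output)
se_matched  <- 0.29657  # its cluster-robust standard error
est_raw     <- 2.55382  # smoker coefficient without matching (from summary(res_v2))

# Assumption: 974 matched pairs, so 2 * 974 rows minus 2 estimated coefficients
df_resid <- 2 * 974 - 2

# 95% confidence interval: estimate +/- t quantile * standard error
ci <- est_matched + c(-1, 1) * qt(0.975, df_resid) * se_matched
ci

# Is the unmatched estimate inside the matched confidence interval?
covers_raw <- est_raw >= ci[1] && est_raw <= ci[2]
covers_raw
```

Under this assumption the interval agrees with the coefci output to about three decimal places, and the unmatched estimate lies above its upper bound.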

Propensity Score Matching

Çağrı Çebişli