Import required packages and load data:

library(flexdashboard)
library(MatchIt)
library(ggplot2)
load("~/MWR_course/exercises/exercise_data.RData")

Ex 3. Begin by exact matching on all the dummy variables.

vars <- paste(c("black","hisp","married","nodegr","u74","u75"),collapse = " + ")
f <- as.formula(paste("treat ~",vars))

ematch_out <- matchit(formula = f, data=d, method="exact")

How many treated cases cannot be matched?

ematch_out

Call: 
matchit(formula = f, data = d, method = "exact")

Exact Subclasses: 26

Sample sizes:
          Control Treated
All          2213     297
Matched      2092     287
Unmatched     121      10

What is the (FS)ATT estimate?

ematch_data <- match.data(ematch_out)
lm(re78~treat,data=ematch_data,weights=weights)

Call:
lm(formula = re78 ~ treat, data = ematch_data, weights = weights)

Coefficients:
(Intercept)        treat  
      8.244       -2.386  

Ex 4. Use the observational data to estimate each case’s propensity to receive treatment using glm(). Use a logistic regression with quadratic terms for age, education, 1974 income, and 1975 income.

vars2 <- paste(c("age","educ","re74","re75",vars,
                 "I(age^2)","I(educ^2)","I(re74^2)","I(re75^2)"),collapse = " + ")
f <- as.formula(paste("treat ~",vars2))
psmod <- glm(formula=f,data=d,family=binomial)

If you are familiar with plotting in R, look at the density plots of the p-score for treated and untreated groups. (If not, you can move on. We’ll do the same thing using bal.plot() in a bit.)

d$ps1 = predict(psmod, type = "response")
p <- ggplot(d, aes( x = ps1 , 
                    fill = factor(treat), 
                    color = factor(treat),
                    after_stat(scaled))) + 
  geom_density(alpha = .3) +
  labs(x = "Estimated Propensity Score" ,
       y = "scaled") +
  theme(legend.position = "top") +
  theme(legend.title = element_blank()) 
d$log_odds = predict(psmod, type = "link")
p1 <- ggplot(d, aes( x = log_odds , 
                    fill = factor(treat), 
                    color = factor(treat),
                    after_stat(scaled))) + 
  geom_density(alpha = .3) +
  labs(x = "log_odds" ,
       y = "scaled") +
  theme(legend.position = "top") +
  theme(legend.title = element_blank()) 

Chart 1

Chart 2