A few important notes

Option 1 for submitting your assignment: This method is actually preferred. This is an RMarkdown document. Did you know you can open this document in RStudio, edit it by adding your answers and code, and then knit it to a pdf? To submit your answers to this assignment, simply knit this file as a pdf and submit it as a pdf on Forum. All of your code must be included in the resulting pdf file, i.e., don’t set echo = FALSE in any of your code chunks. To learn more about RMarkdown, watch the videos from session 1 and session 2 of the CS112B optional class. There is also a cheat sheet for using RMarkdown. If you have questions about RMarkdown, please post them on Perusall. Try knitting this document in your RStudio. You should be able to get a pdf file. At any step, you can try knitting the document to recreate the pdf. If you get an error, you might have incomplete code.

Option 2 for submitting your assignment: If you are not comfortable with RMarkdown, you can also choose the Google Doc version of this assignment, make a copy of it and edit the Google doc (include your code, figures, results, and explanations) and at the end download your Google Doc as a pdf and submit the pdf file.

Note: Either way (if you use Rmd and knit as pdf OR if you use Google Doc and download as pdf) you should make sure you put your name on top of the document.

Note: The first time you run this document you may get an error that some packages don’t exist. If you don’t have the packages listed at the top of this document, install them first and you won’t get those errors.

Note: If you work with others or get help from others on the assignment, please provide the names of your partners at the top of this document. Working together is fine, but all must do and understand their own work and have to mention the collaboration on top of this document.

Note: Don’t change the seed in the document. The function set.seed() has already been called at the beginning of this document with the value 928. Changing the seed to a different number will make your results not replicable.
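To see why the seed matters, here is a minimal sketch in base R. Resetting the same seed reproduces the same random draws, while a different seed does not. The value 928 is the one stated in the note above; 123 is an arbitrary alternative chosen only for this illustration.

```r
# Same seed, same draws
set.seed(928)
first_run <- rnorm(3)
set.seed(928)
second_run <- rnorm(3)
identical(first_run, second_run)  # TRUE

# A different seed (arbitrary choice for this sketch) gives different draws
set.seed(123)
other_run <- rnorm(3)
identical(first_run, other_run)
```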

QUESTION 1: Data Generating Example

The following code creates a toy dataset with a treatment variable, \(D\), an outcome variable, \(Y\), and other variables \(V_1\) to \(V_4\).

n = 1000
## Generating a random data set here
#Syntax for the normal distribution here is rnorm(sample size, mean, SD)
V1 = rnorm(n, 45, 10)
#getting a binary variable
V2 = sample(c(1,0), 
             replace = TRUE, 
             size = n,
             prob = c(.4,.6))
V3 = rnorm(n, V1/10, 1)
V4 = rnorm(n, 0, 1)
D  = as.numeric(pnorm(rnorm(n, .01*V1 + .8*V2 + 0.3*V3 + V4, 1), .45 + .32 + .3*4.5, 1) > .5)
Y  = rnorm(n, .8*D - 0.45*V2 - .4*V3 + 2, .2)
# combining everything in a data frame
df = data.frame(V1, V2, V3, V4, D, Y)

STEP 1

From the variables \(V_1\), \(V_2\), \(V_3\), and \(V_4\), which one(s) are not confounding variable(s) (covariates that cause confounding)? Remember, a rule of thumb (although not a perfect rule) is that the variable is correlated with both the treatment variable and the outcome variable. Explain!

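V2 and V3 appear in the definitions of both the treatment D and the outcome Y, so they are confounders. V1 and V4 are not: V4 enters only the definition of D, and V1 enters D directly but affects Y only through V3, so once V3 is accounted for it causes no further confounding.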
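One way to check this reasoning is to model Y and D on all the variables at once. The sketch below regenerates the toy data with the same code as above, calling set.seed(928) immediately beforehand. That placement is an assumption, so the exact draws may differ from the ones used in this document. The names outcome_fit and treatment_fit are made up for illustration.

```r
set.seed(928)  # assumption: seed set right before the data generation
n <- 1000
V1 <- rnorm(n, 45, 10)
V2 <- sample(c(1, 0), replace = TRUE, size = n, prob = c(.4, .6))
V3 <- rnorm(n, V1 / 10, 1)
V4 <- rnorm(n, 0, 1)
D  <- as.numeric(pnorm(rnorm(n, .01 * V1 + .8 * V2 + 0.3 * V3 + V4, 1),
                       .45 + .32 + .3 * 4.5, 1) > .5)
Y  <- rnorm(n, .8 * D - 0.45 * V2 - .4 * V3 + 2, .2)

# Outcome model: which variables have a direct effect on Y?
outcome_fit <- lm(Y ~ D + V1 + V2 + V3 + V4)
round(coef(outcome_fit), 2)

# Treatment model: which variables predict D?
treatment_fit <- glm(D ~ V1 + V2 + V3 + V4, family = binomial)
round(coef(treatment_fit), 2)
```

In the outcome model the coefficients on V1 and V4 should be close to zero, while those on V2 and V3 should be close to -0.45 and -0.4, which is what separates the confounders from the other two variables.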

STEP 2

Can you figure out the true treatment effect by looking at the data generating process above?

Yes. Y is drawn with mean 0.8*D - 0.45*V2 - 0.4*V3 + 2, so for fixed V2 and V3, moving a unit from control (D = 0) to treatment (D = 1) raises its expected outcome by 0.8*1 - 0.8*0 = 0.8. The other terms are identical under both conditions and cancel out. The true treatment effect is therefore 0.8; individual outcomes deviate from it only through the random noise in Y (standard deviation 0.2).

For example, if the outcome were defined only as Y = 0.8*D + 2, the control outcome would be 0.8*0 + 2 = 2 and the treated outcome 0.8*1 + 2 = 2.8. Subtracting them leaves 0.8. Note that D itself is not randomly assigned: it depends on V1 to V4, so the treated and control groups in the sample are not guaranteed to be balanced on V2 and V3, and a raw comparison of group means need not equal 0.8.
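Written out, the argument is a difference of two conditional expectations read off from the line of code that generates Y. The notation \(E[Y \mid \cdot]\) is introduced here for illustration and does not appear in the assignment itself.

```latex
\begin{aligned}
E[Y \mid D = 1, V_2, V_3] &= 0.8 \cdot 1 - 0.45\,V_2 - 0.4\,V_3 + 2 \\
E[Y \mid D = 0, V_2, V_3] &= 0.8 \cdot 0 - 0.45\,V_2 - 0.4\,V_3 + 2 \\
E[Y \mid D = 1, V_2, V_3] - E[Y \mid D = 0, V_2, V_3] &= 0.8
\end{aligned}
```

The terms in \(V_2\) and \(V_3\) cancel only because they are held at the same values in both lines.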

STEP 3

Plot the outcome variable against the treatment variable. Make sure you label your axes. Do you see a trend?

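# Your code here
#plot(D,Y, type = "p")
# Scatter plot of the outcome against the binary treatment, with a fitted line
graph = ggplot(df, aes(x=D, y=Y)) +
  geom_point() +
  geom_smooth(method="lm", col="blue", se = FALSE) +
  labs(x = "Treatment (D)", y = "Outcome (Y)")
graph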

The fitted line has a slightly positive slope, so on average treated units have a higher value of the outcome variable than control units.

STEP 4

Are the variables \(V_1\), \(V_2\), \(V_3\), and \(V_4\) balanced across the treatment and control groups? You can use any R function from any package to check this (for instance, you can check the cobalt package). Make sure you check all variables.

Note: This is optional but you can use the gridExtra package and its grid.arrange() function to put all the 4 graphs in one 2 x 2 graph. Read more about the package and how to use it here: https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html. Set nrow = 2.

# Your code here

dens_plot1 <- bal.plot(x = df, var.name="V1", type = "density", which = "unadjusted", mirror = FALSE, treat = df$D)
dens_plot2 <- bal.plot(x = df, var.name="V2", type = "density", which = "unadjusted", mirror = FALSE, treat = df$D)
dens_plot3 <- bal.plot(x = df, var.name="V3", type = "density", which = "unadjusted", mirror = FALSE, treat = df$D)
dens_plot4 <- bal.plot(x = df, var.name="V4", type = "density", which = "unadjusted", mirror = FALSE, treat = df$D)

#I understand that a histogram is more appropriate in this case, but I find the density graphs more intuitive for examining balance, so I added them as well
grid.arrange(dens_plot1, dens_plot2, dens_plot3, dens_plot4, nrow = 2)

plot1 <- bal.plot(x = df, treat = df$D, var.name = "V1", which = "unadjusted", type = "histogram", mirror = TRUE)
plot2 <- bal.plot(x = df, treat = df$D, var.name = "V2", which = "unadjusted", type = "histogram", mirror = TRUE)
plot3 <- bal.plot(x = df, treat = df$D, var.name = "V3", which = "unadjusted", type = "histogram", mirror = TRUE)
plot4 <- bal.plot(x = df, treat = df$D, var.name = "V4", which = "unadjusted", type = "histogram", mirror = TRUE)

grid.arrange(plot1, plot2, plot3, plot4, nrow = 2)

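At first glance the distributions overlap, but none of the variables is well balanced (the before-matching standardized mean differences reported in Step 7 range from about 46 for V1 to about 127 for V4). Among the continuous variables, V4 is the least balanced. V2, which is a binary variable (I assume it represents something like male/female), is also clearly unbalanced between the treatment and control groups.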
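A numerical complement to the plots is the standardized mean difference of each variable between the two groups. The sketch below regenerates the toy data (again assuming set.seed(928) is called right before, so the draws may not match this document exactly). The helper std_mean_diff() is a hypothetical name written for this sketch; it scales by the treated-group standard deviation, which is one common convention and an assumption here.

```r
set.seed(928)  # assumption: seed set right before the data generation
n <- 1000
V1 <- rnorm(n, 45, 10)
V2 <- sample(c(1, 0), replace = TRUE, size = n, prob = c(.4, .6))
V3 <- rnorm(n, V1 / 10, 1)
V4 <- rnorm(n, 0, 1)
D  <- as.numeric(pnorm(rnorm(n, .01 * V1 + .8 * V2 + 0.3 * V3 + V4, 1),
                       .45 + .32 + .3 * 4.5, 1) > .5)
toy <- data.frame(V1, V2, V3, V4, D)

# Hypothetical helper: 100 * (treated mean - control mean) / SD of the treated
std_mean_diff <- function(x, treat) {
  100 * (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}

# One standardized difference per variable
smd <- sapply(toy[c("V1", "V2", "V3", "V4")], std_mean_diff, treat = toy$D)
round(smd, 1)
```

Values far from zero indicate imbalance, which gives a scale-free way to compare the four variables with each other.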

STEP 5

Write code that would simply calculate the Prima Facie treatment effect in the data above. What’s the Prima Facie treatment effect? Note that the treatment variable is binary.

# Your code here
# mean observed outcome of the treated minus mean observed outcome of the controls
#swirl session 14
prima_treat <- mean(df$Y[df$D == 1]) - mean(df$Y[df$D == 0])
prima_treat
## [1] 0.3423726

STEP 6

Explain why the Prima Facie effect is not the true average causal effect from Step 2.

The Prima Facie effect is the average outcome of the treated minus the average outcome of the controls. This equals the causal effect only if the control group is a valid stand-in for what would have happened to the treated units without treatment. Here it is not: treatment depends on V1 to V4, and V2 and V3 also lower the outcome, so the two groups differ in more than their treatment status. The true treatment effect compares the same units under the same conditions with and without treatment, as if in a parallel universe. Because we cannot observe both outcomes for the same unit in real life, we try to approximate this comparison by matching treated units to similar controls.
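The size of the gap can be roughly accounted for with numbers already in this document. Because Y is linear in D, V2, and V3, the difference in group means should be about 0.8 plus the coefficients on V2 and V3 times the treated-minus-control differences in their means. The sketch below plugs in the before-matching means that MatchBalance reports in Step 7; the variable names are made up for illustration.

```r
# Before-matching group means copied from the MatchBalance output in Step 7
diff_V2 <- 0.52749 - 0.28684   # treated mean minus control mean of V2
diff_V3 <- 4.919   - 4.0853    # treated mean minus control mean of V3

# Coefficients taken from the line of code that generates Y
true_effect <- 0.8
implied_prima_facie <- true_effect - 0.45 * diff_V2 - 0.4 * diff_V3
implied_prima_facie

# Observed Prima Facie effect from Step 5, for comparison
observed_prima_facie <- 0.3423726
implied_prima_facie - observed_prima_facie
```

The implied value is about 0.36, close to the observed 0.34. The small remainder is consistent with the random noise in Y.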

STEP 7

We can use matching to create a better balance of the covariates. Use a propensity score model that includes all the variables \(V_1\), \(V_2\), \(V_3\), and \(V_4\).

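# Your code here
# Propensity score model: probability of treatment given all four covariates
glm_df = glm( D ~ V1 +V2+ V3+ V4, data = df, family = binomial)

# Match each treated unit to the control(s) with the closest estimated propensity score
matchout.prop <- Match(Y = df$Y, Tr = df$D, X = glm_df$fitted.values)
 
# Covariate balance before and after matching
mb1 <- MatchBalance(D ~ V1 +V2+ V3+ V4, data = df, match.out = matchout.prop, nboots = 1000)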
## 
## ***** (V1) V1 *****
##                        Before Matching        After Matching
## mean treatment........     47.145             47.145 
## mean control..........     42.457             46.073 
## std mean diff.........     46.267             10.572 
## 
## mean raw eQQ diff.....     4.7386             2.4364 
## med  raw eQQ diff.....     4.6719             1.7626 
## max  raw eQQ diff.....     8.2566             9.4025 
## 
## mean eCDF diff........    0.13197           0.065311 
## med  eCDF diff........    0.14894           0.054487 
## max  eCDF diff........    0.22444            0.17147 
## 
## var ratio (Tr/Co).....     1.1115             1.6438 
## T-test p-value........ 1.4033e-13           0.060663 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 2.3252e-11         2.1513e-08 
## KS Statistic..........    0.22444            0.17147 
## 
## 
## ***** (V2) V2 *****
##                        Before Matching        After Matching
## mean treatment........    0.52749            0.52749 
## mean control..........    0.28684            0.58847 
## std mean diff.........     48.155            -12.201 
## 
## mean raw eQQ diff.....    0.24236           0.049679 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.12033            0.02484 
## med  eCDF diff........    0.12033            0.02484 
## max  eCDF diff........    0.24066           0.049679 
## 
## var ratio (Tr/Co).....     1.2185             1.0292 
## T-test p-value........ 4.4409e-15           0.036802 
## 
## 
## ***** (V3) V3 *****
##                        Before Matching        After Matching
## mean treatment........      4.919              4.919 
## mean control..........     4.0853             4.8504 
## std mean diff.........     61.744             5.0783 
## 
## mean raw eQQ diff.....    0.84184            0.29996 
## med  raw eQQ diff.....    0.82372            0.27618 
## max  raw eQQ diff.....     1.7613               1.27 
## 
## mean eCDF diff........    0.16484           0.058721 
## med  eCDF diff........    0.17683           0.044872 
## max  eCDF diff........    0.26201            0.14103 
## 
## var ratio (Tr/Co).....      0.954             1.0788 
## T-test p-value........ < 2.22e-16            0.36665 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... 2.5535e-15         8.1531e-06 
## KS Statistic..........    0.26201            0.14103 
## 
## 
## ***** (V4) V4 *****
##                        Before Matching        After Matching
## mean treatment........    0.56565            0.56565 
## mean control..........   -0.49923            0.38581 
## std mean diff.........     127.27             21.493 
## 
## mean raw eQQ diff.....     1.0694            0.15808 
## med  raw eQQ diff.....     1.0629            0.06365 
## max  raw eQQ diff.....     1.8961             1.8961 
## 
## mean eCDF diff........    0.31869            0.04643 
## med  eCDF diff........    0.35388           0.019231 
## max  eCDF diff........    0.50036            0.21795 
## 
## var ratio (Tr/Co).....     1.0242             1.4111 
## T-test p-value........ < 2.22e-16          6.192e-06 
## KS Bootstrap p-value.. < 2.22e-16         < 2.22e-16 
## KS Naive p-value...... < 2.22e-16         2.6801e-13 
## KS Statistic..........    0.50036            0.21795 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): V1 V3 V4  Number(s): 1 3 4 
## 
## After Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): V1 V3 V4  Number(s): 1 3 4

STEP 8

Check the balance of the covariates. Do you see any improvements after matching?

What I find interesting is that the balance plots look less balanced than before matching. The t-test p-values, however, have improved. Before matching, all of them were extremely small. After matching, most are still small but considerably larger than before. For example, the V3 t-test p-value was below 2.22e-16 before matching and is 0.37 after. Because the p-values for the other variables remain low (0.06 for V1, 0.04 for V2, and 6.2e-06 for V4), the propensity score (PS) matching did not do a very good job overall.

An important point is that the KS test gives a fuller picture here than the t-test. The t-test only compares group means, whereas the KS test compares entire distributions. Matching on the PS brought the covariate means closer together, so the t-test p-values improved, but the KS bootstrap p-values stayed below 2.22e-16, which indicates that the distributions still differ.

T-test and KS bootstrap p-values from the MatchBalance output in Step 7:

Variable   Test                   Before matching   After matching
V1         T-test p-value         1.4033e-13        0.060663
V1         KS Bootstrap p-value   < 2.22e-16        < 2.22e-16
V2         T-test p-value         4.4409e-15        0.036802
V2         KS Bootstrap p-value   not reported (binary variable)
V3         T-test p-value         < 2.22e-16        0.36665
V3         KS Bootstrap p-value   < 2.22e-16        < 2.22e-16
V4         T-test p-value         < 2.22e-16        6.192e-06
V4         KS Bootstrap p-value   < 2.22e-16        < 2.22e-16
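The difference between the two tests can be illustrated with a small base R sketch that is unrelated to the assignment data. It builds two samples with the same mean but different spreads from fixed quantile grids, so no random numbers are involved; the sample size of 500 and the factor of 3 are arbitrary choices.

```r
# Two samples with identical means but different spreads (deterministic quantile grids)
x <- qnorm(ppoints(500))        # standard normal quantiles
y <- 3 * qnorm(ppoints(500))    # same centre, three times the spread

t_p  <- t.test(x, y)$p.value    # compares means only
ks_p <- ks.test(x, y)$p.value   # compares whole distributions

c(t_test = t_p, ks_test = ks_p)
```

The t-test sees no difference because the means coincide, while the KS test picks up the difference in shape. A matched sample can behave the same way: similar means, yet different distributions.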

Q1plot1_AM_den  <- bal.plot( matchout.prop , "V1", treat = D,  type = "density", covs = cbind(V1, V2, V3, V4), which = "both")
Q1plot2_AM_den  <- bal.plot( matchout.prop , "V2", treat = D,  type = "density", covs = cbind(V1, V2, V3, V4), which = "both")
Q1plot3_AM_den  <- bal.plot( matchout.prop , "V3", treat = D,  type = "density", covs = cbind(V1, V2, V3, V4), which = "both")
Q1plot4_AM_den  <- bal.plot( matchout.prop , "V4", treat = D,  type = "density", covs = cbind(V1, V2, V3, V4), which = "both")

grid.arrange(Q1plot1_AM_den , Q1plot2_AM_den , Q1plot3_AM_den , Q1plot4_AM_den , nrow = 2)

Q1plot1_AM <- bal.plot( matchout.prop , "V1", treat = D,  type = "histogram", covs = cbind(V1, V2, V3, V4), mirror = TRUE, which = "both")
Q1plot2_AM <- bal.plot( matchout.prop , "V2", treat = D,  type = "histogram", covs = cbind(V1, V2, V3, V4), mirror = TRUE, which = "both")
Q1plot3_AM <- bal.plot( matchout.prop , "V3", treat = D,  type = "histogram", covs = cbind(V1, V2, V3, V4), mirror = TRUE, which = "both")
Q1plot4_AM <- bal.plot( matchout.prop , "V4", treat = D,  type = "histogram", covs = cbind(V1, V2, V3, V4), mirror = TRUE, which = "both")

grid.arrange(Q1plot1_AM, Q1plot2_AM, Q1plot3_AM, Q1plot4_AM, nrow = 2)

STEP 9

What is the treatment effect after matching? Is this surprising given your answer to Step 2? Is the treatment effect found in this Step closer to the treatment effect in Step 2 than the treatment effect before matching?

# Your code here
matchout.prop$est
##           [,1]
## [1,] 0.8132563

The treatment effect after matching is 0.81, very close to the true effect of 0.8 identified in Step 2, so it is not surprising. It is also much closer to 0.8 than the Prima Facie effect (0.34), which confirms that the Prima Facie calculation does not recover the true treatment effect.
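To make the mechanics concrete, here is a hand-rolled sketch of one-to-one nearest-neighbour matching on the propensity score, with replacement, in base R. It is not the algorithm used by Match() (which, among other things, handles ties and computes standard errors); it only illustrates the idea. The data are regenerated assuming set.seed(928) is called right before, so the numbers will not necessarily equal those above, and all object names are made up for illustration.

```r
set.seed(928)  # assumption: seed set right before the data generation
n <- 1000
V1 <- rnorm(n, 45, 10)
V2 <- sample(c(1, 0), replace = TRUE, size = n, prob = c(.4, .6))
V3 <- rnorm(n, V1 / 10, 1)
V4 <- rnorm(n, 0, 1)
D  <- as.numeric(pnorm(rnorm(n, .01 * V1 + .8 * V2 + 0.3 * V3 + V4, 1),
                       .45 + .32 + .3 * 4.5, 1) > .5)
Y  <- rnorm(n, .8 * D - 0.45 * V2 - .4 * V3 + 2, .2)

# Estimated propensity scores from a logistic regression
ps <- fitted(glm(D ~ V1 + V2 + V3 + V4, family = binomial))

treated <- which(D == 1)
control <- which(D == 0)

# For each treated unit, index of the control with the closest propensity score
nearest <- sapply(ps[treated], function(p) control[which.min(abs(ps[control] - p))])

# Matched estimate of the effect on the treated, and the naive comparison
att_matched <- mean(Y[treated] - Y[nearest])
prima_facie <- mean(Y[D == 1]) - mean(Y[D == 0])
c(matched = att_matched, prima_facie = prima_facie)
```

The matched estimate should land much nearer to 0.8 than the naive difference in means, in line with the Match() result above.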

QUESTION 2: Daughters

Read Section 5 (which is only about 1 page long!) of Iacus, King, Porro (2011), Multivariate Matching Methods That Are Monotonic Imbalance Bounding, JASA, V 106, N. 493, available here: https://gking.harvard.edu/files/gking/files/cem_jasa.pdf. Don’t worry about the “CEM” elements. Focus on the “daughters” case study.

Data for this case study is available in “daughters” below.

daughters = read.csv(url("http://bit.ly/daughters_data")) %>% 
  clean_names()

STEP 1

Before doing any matching, run a regression, with the outcome variable being nowtot, the treatment variable being hasgirls, and the independent vars mentioned below:
- dems
- repubs
- christian
- age
- srvlng
- demvote

Show the regression specification. Use the regression to estimate a treatment effect and confidence interval. Check the balance of this not-matched data set using any method of your choice (balance tables, balance plots, love plots, etc).

library("cobalt")

Q2lm <- lm(nowtot ~ dems + repubs+ christian + age + srvlng + demvote + hasgirls, data = daughters)

summary(Q2lm)
## 
## Call:
## lm(formula = nowtot ~ dems + repubs + christian + age + srvlng + 
##     demvote + hasgirls, data = daughters)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.028 -10.322  -1.517  11.208  69.642 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  38.6991    18.6306   2.077 0.038390 *  
## dems         -8.1022    17.5861  -0.461 0.645238    
## repubs      -55.1069    17.6340  -3.125 0.001901 ** 
## christian   -13.3961     3.7218  -3.599 0.000357 ***
## age           0.1260     0.1117   1.128 0.259938    
## srvlng       -0.2251     0.1355  -1.662 0.097349 .  
## demvote      87.5501     8.4847  10.319  < 2e-16 ***
## hasgirls     -0.4523     1.9036  -0.238 0.812322    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.19 on 422 degrees of freedom
## Multiple R-squared:  0.7821, Adjusted R-squared:  0.7784 
## F-statistic: 216.3 on 7 and 422 DF,  p-value: < 2.2e-16
#estimated treatment effect would be

coef(summary(Q2lm))["hasgirls","Estimate"]
## [1] -0.4522678
#confidence interval
confint(Q2lm, "hasgirls")
##             2.5 %   97.5 %
## hasgirls -4.19406 3.289525
Q2mout <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, nboots = 500) 
## 
## ***** (V1) dems *****
## before matching:
## mean treatment........ 0.45833 
## mean control.......... 0.50847 
## std mean diff......... -10.047 
## 
## mean raw eQQ diff..... 0.050847 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.025071 
## med  eCDF diff........ 0.025071 
## max  eCDF diff........ 0.050141 
## 
## var ratio (Tr/Co)..... 0.98809 
## T-test p-value........ 0.35571 
## 
## 
## ***** (V2) repubs *****
## before matching:
## mean treatment........ 0.53846 
## mean control.......... 0.49153 
## std mean diff......... 9.4 
## 
## mean raw eQQ diff..... 0.042373 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.023468 
## med  eCDF diff........ 0.023468 
## max  eCDF diff........ 0.046936 
## 
## var ratio (Tr/Co)..... 0.98911 
## T-test p-value........ 0.3873 
## 
## 
## ***** (V3) christian *****
## before matching:
## mean treatment........ 0.9391 
## mean control.......... 0.94915 
## std mean diff......... -4.1958 
## 
## mean raw eQQ diff..... 0.016949 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.005025 
## med  eCDF diff........ 0.005025 
## max  eCDF diff........ 0.01005 
## 
## var ratio (Tr/Co)..... 1.1787 
## T-test p-value........ 0.68107 
## 
## 
## ***** (V4) age *****
## before matching:
## mean treatment........ 52.628 
## mean control.......... 49.178 
## std mean diff......... 38.385 
## 
## mean raw eQQ diff..... 3.661 
## med  raw eQQ diff..... 4 
## max  raw eQQ diff..... 7 
## 
## mean eCDF diff........ 0.075348 
## med  eCDF diff........ 0.075538 
## max  eCDF diff........ 0.17807 
## 
## var ratio (Tr/Co)..... 0.71552 
## T-test p-value........ 0.0020402 
## KS Bootstrap p-value.. 0.002 
## KS Naive p-value...... 0.0087659 
## KS Statistic.......... 0.17807 
## 
## 
## ***** (V5) srvlng *****
## before matching:
## mean treatment........ 8.5865 
## mean control.......... 8.7458 
## std mean diff......... -2.1085 
## 
## mean raw eQQ diff..... 0.66949 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 5 
## 
## mean eCDF diff........ 0.017181 
## med  eCDF diff........ 0.01445 
## max  eCDF diff........ 0.051608 
## 
## var ratio (Tr/Co)..... 0.77347 
## T-test p-value........ 0.85956 
## KS Bootstrap p-value.. 0.75 
## KS Naive p-value...... 0.97653 
## KS Statistic.......... 0.051608 
## 
## 
## ***** (V6) demvote *****
## before matching:
## mean treatment........ 0.49929 
## mean control.......... 0.50602 
## std mean diff......... -5.2747 
## 
## mean raw eQQ diff..... 0.011441 
## med  raw eQQ diff..... 0.01 
## max  raw eQQ diff..... 0.08 
## 
## mean eCDF diff........ 0.015928 
## med  eCDF diff........ 0.010811 
## max  eCDF diff........ 0.048512 
## 
## var ratio (Tr/Co)..... 1.1269 
## T-test p-value........ 0.61103 
## KS Bootstrap p-value.. 0.932 
## KS Naive p-value...... 0.98776 
## KS Statistic.......... 0.048512 
## 
## 
## Before Matching Minimum p.value: 0.002 
## Variable Name(s): age  Number(s): 4
Q2mout$BMsmallest.p.value
## [1] 0.002
#Checking balance

Q2covs <- subset(daughters, select = c(dems, repubs, christian, age, srvlng, demvote))

Q2plot4_den <- bal.plot(daughters$hasgirls ~ daughters$age,  type = "density", treat = daughters$hasgirls)
Q2plot5_den <- bal.plot(daughters$hasgirls ~ daughters$srvlng,  type = "density", treat = daughters$hasgirls)
Q2plot6_den <- bal.plot(daughters$hasgirls ~ daughters$demvote,  type = "density", treat = daughters$hasgirls)

grid.arrange(Q2plot4_den, Q2plot5_den , Q2plot6_den,  nrow = 2)

Q2plot1 <- bal.plot(daughters$hasgirls ~ daughters$dems,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)
Q2plot2 <- bal.plot(daughters$hasgirls ~ daughters$repubs,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)
Q2plot3 <- bal.plot(daughters$hasgirls ~ daughters$christian,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)
Q2plot4 <- bal.plot(daughters$hasgirls ~ daughters$age,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)
Q2plot5 <- bal.plot(daughters$hasgirls ~ daughters$srvlng,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)
Q2plot6 <- bal.plot(daughters$hasgirls ~ daughters$demvote,  type = "histogram", treat = daughters$hasgirls, mirror = TRUE)

grid.arrange(Q2plot1, Q2plot2, Q2plot3, Q2plot4, Q2plot5, Q2plot6,  nrow = 2)

The estimated treatment effect is -0.45, with a 95% confidence interval of (-4.19, 3.29). The interval contains zero, so the estimate is not statistically distinguishable from zero at the 5% level.
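The interval reported by confint() can be reconstructed by hand from the regression summary, which may help in seeing where it comes from. The sketch below uses the estimate, standard error, and residual degrees of freedom printed above; the object names are made up for illustration.

```r
estimate  <- -0.4523   # coefficient on hasgirls from the summary above
std_error <- 1.9036    # its standard error
df_resid  <- 422       # residual degrees of freedom

# Critical value of the t distribution for a two-sided 95% interval
t_crit <- qt(0.975, df_resid)

ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
round(ci, 2)
```

Up to rounding of the inputs, this reproduces the interval from confint(Q2lm, "hasgirls").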

STEP 2

Then, do genetic matching. Use the same variables as in the regression above. Make sure to set estimand = "ATT". What’s your treatment effect?

Note: For replicability purposes, we need to choose a seed for the GenMatch() function. However, setting the seed for GenMatch() is different. The usual practice of typing set.seed(some number) before the GenMatch line doesn’t ensure stochastic stability. To set seeds for GenMatch, you have to run GenMatch() including instructions for genoud within GenMatch(), e.g.: GenMatch(Tr, X, unif.seed = 123, int.seed = 92485...). You can find info on these .seed elements in the documentation for genoud(). The special .seed elements should only be present in the GenMatch() line, not elsewhere (because nothing besides GenMatch() runs genoud).

Note: When you use the GenMatch() function, wrap everything inside the following function invisible(capture.output()). This will reduce the unnecessary output produced from the GenMatch() function. For instance, you can say: invisible(capture.output(genout_daughters <- GenMatch(...)))

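# Your code here
# Outcome, treatment indicator, and covariate matrix
Q2Y= daughters$nowtot

Q2Tr = daughters$hasgirls

Q2_2_X <- cbind(daughters$dems, daughters$repubs, daughters$christian, daughters$age , daughters$srvlng , daughters$demvote)

# Genetic search for covariate weights; the .seed arguments make the search reproducible
invisible(capture.output(Q2genout <- GenMatch(Tr = Q2Tr, X = Q2_2_X, estimand ='ATT', pop.size = 10 , BalanceMatrix = Q2_2_X , max.generations = 15 , unif.seed = 123, int.seed = 92485)))


# Match with the GenMatch weights (no outcome supplied here, used for the balance check only)
matchout.gen <- Match(Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', Weight.matrix = Q2genout, M=1)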

mbout_Q2_2 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = matchout.gen, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032051 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016026 
## med  eCDF diff........   0.025071          0.0016026 
## max  eCDF diff........   0.050141          0.0032051 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915             0.9391 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178              52.49 
## std mean diff.........     38.385             1.5333 
## 
## mean raw eQQ diff.....      3.661            0.57372 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.012464 
## med  eCDF diff........   0.075538          0.0096154 
## max  eCDF diff........    0.17807           0.041667 
## 
## var ratio (Tr/Co).....    0.71552            0.98948 
## T-test p-value........  0.0020402            0.37525 
## KS Bootstrap p-value..      0.007              0.845 
## KS Naive p-value......  0.0087659            0.94937 
## KS Statistic..........    0.17807           0.041667 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.6955 
## std mean diff.........    -2.1085             -1.443 
## 
## mean raw eQQ diff.....    0.66949                0.5 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181           0.013067 
## med  eCDF diff........    0.01445           0.011218 
## max  eCDF diff........   0.051608           0.051282 
## 
## var ratio (Tr/Co).....    0.77347            0.94995 
## T-test p-value........    0.85956            0.55124 
## KS Bootstrap p-value..      0.788              0.557 
## KS Naive p-value......    0.97653            0.80655 
## KS Statistic..........   0.051608           0.051282 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.49949 
## std mean diff.........    -5.2747            -0.1509 
## 
## mean raw eQQ diff.....   0.011441          0.0094872 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.014398 
## med  eCDF diff........   0.010811          0.0096154 
## max  eCDF diff........   0.048512           0.044872 
## 
## var ratio (Tr/Co).....     1.1269             1.1179 
## T-test p-value........    0.61103            0.93281 
## KS Bootstrap p-value..      0.921               0.79 
## KS Naive p-value......    0.98776            0.91194 
## KS Statistic..........   0.048512           0.044872 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.31731 
## Variable Name(s): dems  Number(s): 1
# Same match as above, now with the outcome supplied so that the ATT is estimated
mout_Q2_2_TE <- Match( Weight.matrix = Q2genout, Y= daughters$nowtot, Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', M=1)

mbout_Q2_2_TE <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = mout_Q2_2_TE, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032051 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016026 
## med  eCDF diff........   0.025071          0.0016026 
## max  eCDF diff........   0.050141          0.0032051 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915             0.9391 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178              52.49 
## std mean diff.........     38.385             1.5333 
## 
## mean raw eQQ diff.....      3.661            0.57372 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.012464 
## med  eCDF diff........   0.075538          0.0096154 
## max  eCDF diff........    0.17807           0.041667 
## 
## var ratio (Tr/Co).....    0.71552            0.98948 
## T-test p-value........  0.0020402            0.37525 
## KS Bootstrap p-value..      0.003              0.861 
## KS Naive p-value......  0.0087659            0.94937 
## KS Statistic..........    0.17807           0.041667 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.6955 
## std mean diff.........    -2.1085             -1.443 
## 
## mean raw eQQ diff.....    0.66949                0.5 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181           0.013067 
## med  eCDF diff........    0.01445           0.011218 
## max  eCDF diff........   0.051608           0.051282 
## 
## var ratio (Tr/Co).....    0.77347            0.94995 
## T-test p-value........    0.85956            0.55124 
## KS Bootstrap p-value..      0.768              0.568 
## KS Naive p-value......    0.97653            0.80655 
## KS Statistic..........   0.051608           0.051282 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.49949 
## std mean diff.........    -5.2747            -0.1509 
## 
## mean raw eQQ diff.....   0.011441          0.0094872 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.014398 
## med  eCDF diff........   0.010811          0.0096154 
## max  eCDF diff........   0.048512           0.044872 
## 
## var ratio (Tr/Co).....     1.1269             1.1179 
## T-test p-value........    0.61103            0.93281 
## KS Bootstrap p-value..      0.914              0.799 
## KS Naive p-value......    0.98776            0.91194 
## KS Statistic..........   0.048512           0.044872 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.31731 
## Variable Name(s): dems  Number(s): 1
summary(mout_Q2_2_TE)
## 
## Estimate...  0.64103 
## AI SE......  2.1916 
## T-stat.....  0.2925 
## p.val......  0.76991 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  312 
## Matched number of observations  (unweighted).  312
mout_Q2_2_TE$est
##           [,1]
## [1,] 0.6410256

The estimated treatment effect after genetic matching is 0.78

STEP 3

Summarize (in 5-15 sentences) the genetic matching procedure and results, including what you matched on, what you balanced on, and what your balance results were. Provide output for MatchBalance() in the body of your submission.

For genetic matching, I first define the outcome variable, the treatment variable, and the covariates I want to match on. Then I run the GenMatch function to calculate the weights. Next, I match using the weight matrix without supplying the outcome, so that only the covariates’ balance is visible. The minimum p-value before matching was 0.002, for the covariate age. After matching, the p-value for age is 0.54, a substantial improvement, which leaves “dems” with the smallest p-value, 0.31. The estimand was ATT with M = 1, so the algorithm matched each treated unit to one control unit. The balance results are:

V1 dems:
                          Before Matching    After Matching
mean treatment........    0.45833            0.45833
mean control..........    0.50847            0.46154
T-test p-value........    0.35571            0.31731

V2 repubs:
                          Before Matching    After Matching
mean treatment........    0.53846            0.53846
mean control..........    0.49153            0.53846
T-test p-value........    0.3873             1

V3 christian:
                          Before Matching    After Matching
mean treatment........    0.9391             0.9391
mean control..........    0.94915            0.9391
T-test p-value........    0.68107            1

V4 age:
                          Before Matching    After Matching
mean treatment........    52.628             52.628
mean control..........    49.178             52.535
T-test p-value........    0.0020402          0.54498
KS Bootstrap p-value..    0.006              0.898
KS Naive p-value......    0.0087659          0.97513
KS Statistic..........    0.17807            0.038462

V5 srvlng:
                          Before Matching    After Matching
mean treatment........    8.5865             8.5865
mean control..........    8.7458             8.6987
T-test p-value........    0.85956            0.52239
KS Bootstrap p-value..    0.772              0.601
KS Naive p-value......    0.97653            0.86365
KS Statistic..........    0.051608           0.048077

V6 demvote:
                          Before Matching    After Matching
mean treatment........    0.49929            0.49929
mean control..........    0.50602            0.49846
T-test p-value........    0.61103            0.72718
KS Bootstrap p-value..    0.926              0.803
KS Naive p-value......    0.98776            0.91194
KS Statistic..........    0.048512           0.044872

STEP 4

Is your treatment effect different from the one reported before matching? By how much? If your numbers are different, provide some explanation as to why the two numbers are different. If they’re not, provide an explanation why they’re not different.

Before matching, the estimated treatment effect was -0.4522678; after genetic matching, it is 0.78526, a difference of about 1.2375. Genetic matching greatly improved the balance between the treated and control units. Because matching accounts for the observed confounding covariates, the matched estimate should be a more accurate estimate of the treatment effect.

STEP 5

Change the parameters in your genetic matching algorithm to improve the balance of the covariates. Consider rerunning with M = 2 or 3 or more. Consider increasing other parameters in the GenMatch() function such as the number of generations and population size, caliper, etc. Try 10 different ways but don’t report the results of the genetic matching weights or the balance in this document. Only report the treatment effect of your top 3 matches. For instance, run the Match() function three times for your top 3 genout objects. Make sure the summary reports the treatment effect estimate, the standard error, and the confidence interval. Do you see a large variation in your treatment effect between your top 3 models?
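As a minimal sketch of the confidence-interval step: summary() on a Match object prints the estimate and the Abadie-Imbens standard error (AI SE) but no interval, so one common approach is a normal-approximation 95% interval, estimate ± 1.96 × SE. The numbers below are copied from the two summary() outputs shown in this document (the M = 1 and M = 4 runs). The data frame and its column names are illustrative and not part of the Matching package.

```r
# Estimates and AI standard errors copied from the summary() outputs
# of the M = 1 and M = 4 matches reported in this document.
results <- data.frame(
  model    = c("M = 1", "M = 4"),
  estimate = c(0.64103, 1.3093),
  ai_se    = c(2.1916, 2.0143)
)

# Normal-approximation 95% interval (an assumption; summary() itself
# reports only the estimate, SE, t-stat, and p-value).
z <- qnorm(0.975)
results$lower <- results$estimate - z * results$ai_se
results$upper <- results$estimate + z * results$ai_se

print(results)
```

Both intervals include zero, which is consistent with the large p-values in the corresponding summaries.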

Note: For replicability purposes, we need to choose a seed for the GenMatch() function. However, setting a seed for GenMatch() works differently. The usual practice of typing set.seed(123) before the GenMatch line doesn’t ensure stochastic stability. To set seeds for GenMatch, you have to run GenMatch() including instructions for genoud within GenMatch(), e.g.: GenMatch(Tr, X, unif.seed = 123, int.seed = 92485...). You can find info on these .seed elements in the documentation for genoud(). The special .seed elements should only be present in the GenMatch() line, not elsewhere (because nothing besides GenMatch() runs genoud).

Note: When you use the GenMatch() function, wrap everything inside the following function invisible(capture.output()). This will reduce the unnecessary output produced with the GenMatch() function. For instance, you can say: invisible(capture.output(genout_daughters <- GenMatch(...)))
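The wrapping pattern can be tried without running GenMatch() at all. In this small sketch, `noisy_optimizer` is a hypothetical stand-in for any function that prints progress while it works; only `invisible()` and `capture.output()` are real base R functions.

```r
# Hypothetical stand-in for a chatty function such as GenMatch():
# it prints progress lines and then returns a value.
noisy_optimizer <- function(x) {
  for (generation in 1:3) {
    cat("Generation", generation, "\n")
  }
  mean(x)
}

# capture.output() collects the printed lines as a character vector,
# and invisible() keeps that vector from being echoed. The assignment
# inside the call still creates `fit` in the calling environment.
invisible(capture.output(fit <- noisy_optimizer(c(2, 4, 6))))

fit
```

Only the value of `fit` is displayed; the progress lines are swallowed, which keeps a knitted pdf free of the generation-by-generation log.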

Note: In the matching assignment, you may find that the Genetic Matching step takes a while, e.g., hours. If you have to reduce pop.size to e.g., 10 or 16 to ensure it stops after only an hour or two, that’s fine. Running your computer for an hour or two is a good thing. Running it for a full day or more is unnecessary overkill (and if this is your situation, change hyperparameters like pop.size to reduce run-time). For example, we suggest you modify the pop.size (e.g., you can set it to 5!), max.generations (set it to 2!), and wait.generations (set it to 1!) and that should expedite things.

Note: Can you set a caliper for one confounding variable, and not others (e.g., only set a caliper for “age”)? No and yes. No, strictly speaking you can’t. But yes, practically speaking you can, if you set other calipers (for the other confounders) that are so wide as to not induce any constraints. E.g., in GenMatch, and Match, you could set caliper = c(1e16, 1e16, 0.5, 1e16) and this would induce a certain meaningful caliper for the third confounder in X, without constraining the other confounders (because 1e16 implies a caliper that is so enormously wide that it does not, in practical terms, serve as a caliper at all).
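One way to avoid miscounting positions is to build the caliper vector from the covariate names. This is a sketch that assumes X holds the six covariates from the MatchBalance() formula used in this document, in that order; check colnames() of your own X matrix before relying on it. As documented for Match(), calipers are measured in standard deviations, and a vector caliper needs one entry per column of X.

```r
# Assumed column order of X (taken from the MatchBalance() formula;
# verify against colnames(X) in your own session).
covariates <- c("dems", "repubs", "christian", "age", "srvlng", "demvote")

# Start with calipers so wide that they never bind...
caliper_vec <- setNames(rep(1e16, length(covariates)), covariates)

# ...then tighten only the covariate of interest (0.5 standard deviations).
caliper_vec["age"] <- 0.5

print(caliper_vec)
```

The resulting vector can then be passed as `caliper =` to both GenMatch() and Match(), so that the two calls apply the same constraint.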

#pop.size = 10 and max.generations = 15, with a caliper of 0.5 on the third covariate only
invisible(capture.output(Q2genout_1 <- GenMatch( caliper = c(1e16, 1e16, 0.5, 1e16), Tr = Q2Tr, X = Q2_2_X, estimand ='ATT', pop.size = 10 , BalanceMatrix = Q2_2_X , max.generations = 15 , unif.seed = 123, int.seed = 92485)))


matchout.gen_1 <- Match(Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', Weight.matrix = Q2genout_1)


mbout_Q2_1 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = matchout.gen_1, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032051 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016026 
## med  eCDF diff........   0.025071          0.0016026 
## max  eCDF diff........   0.050141          0.0032051 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915             0.9391 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178              52.49 
## std mean diff.........     38.385             1.5333 
## 
## mean raw eQQ diff.....      3.661            0.57372 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.012464 
## med  eCDF diff........   0.075538          0.0096154 
## max  eCDF diff........    0.17807           0.041667 
## 
## var ratio (Tr/Co).....    0.71552            0.98948 
## T-test p-value........  0.0020402            0.37525 
## KS Bootstrap p-value..      0.003              0.861 
## KS Naive p-value......  0.0087659            0.94937 
## KS Statistic..........    0.17807           0.041667 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.6955 
## std mean diff.........    -2.1085             -1.443 
## 
## mean raw eQQ diff.....    0.66949                0.5 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181           0.013067 
## med  eCDF diff........    0.01445           0.011218 
## max  eCDF diff........   0.051608           0.051282 
## 
## var ratio (Tr/Co).....    0.77347            0.94995 
## T-test p-value........    0.85956            0.55124 
## KS Bootstrap p-value..      0.771              0.539 
## KS Naive p-value......    0.97653            0.80655 
## KS Statistic..........   0.051608           0.051282 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.49949 
## std mean diff.........    -5.2747            -0.1509 
## 
## mean raw eQQ diff.....   0.011441          0.0094872 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.014398 
## med  eCDF diff........   0.010811          0.0096154 
## max  eCDF diff........   0.048512           0.044872 
## 
## var ratio (Tr/Co).....     1.1269             1.1179 
## T-test p-value........    0.61103            0.93281 
## KS Bootstrap p-value..      0.915              0.792 
## KS Naive p-value......    0.98776            0.91194 
## KS Statistic..........   0.048512           0.044872 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.31731 
## Variable Name(s): dems  Number(s): 1
mout_Q2_2_TE_1 <- Match( Weight.matrix = Q2genout_1, Y= daughters$nowtot, Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', M=1)

mbout_Q2_2_TE_1 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = mout_Q2_2_TE_1, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032051 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016026 
## med  eCDF diff........   0.025071          0.0016026 
## max  eCDF diff........   0.050141          0.0032051 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915             0.9391 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178              52.49 
## std mean diff.........     38.385             1.5333 
## 
## mean raw eQQ diff.....      3.661            0.57372 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.012464 
## med  eCDF diff........   0.075538          0.0096154 
## max  eCDF diff........    0.17807           0.041667 
## 
## var ratio (Tr/Co).....    0.71552            0.98948 
## T-test p-value........  0.0020402            0.37525 
## KS Bootstrap p-value.. < 2.22e-16              0.855 
## KS Naive p-value......  0.0087659            0.94937 
## KS Statistic..........    0.17807           0.041667 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.6955 
## std mean diff.........    -2.1085             -1.443 
## 
## mean raw eQQ diff.....    0.66949                0.5 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181           0.013067 
## med  eCDF diff........    0.01445           0.011218 
## max  eCDF diff........   0.051608           0.051282 
## 
## var ratio (Tr/Co).....    0.77347            0.94995 
## T-test p-value........    0.85956            0.55124 
## KS Bootstrap p-value..       0.76              0.558 
## KS Naive p-value......    0.97653            0.80655 
## KS Statistic..........   0.051608           0.051282 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.49949 
## std mean diff.........    -5.2747            -0.1509 
## 
## mean raw eQQ diff.....   0.011441          0.0094872 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.014398 
## med  eCDF diff........   0.010811          0.0096154 
## max  eCDF diff........   0.048512           0.044872 
## 
## var ratio (Tr/Co).....     1.1269             1.1179 
## T-test p-value........    0.61103            0.93281 
## KS Bootstrap p-value..      0.928              0.799 
## KS Naive p-value......    0.98776            0.91194 
## KS Statistic..........   0.048512           0.044872 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.31731 
## Variable Name(s): dems  Number(s): 1
summary(mout_Q2_2_TE_1)
## 
## Estimate...  0.64103 
## AI SE......  2.1916 
## T-stat.....  0.2925 
## p.val......  0.76991 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  312 
## Matched number of observations  (unweighted).  312
mbout_Q2_2_TE_1$AMsmallest.p.value
## [1] 0.3173124
mout_Q2_2_TE_1$est
##           [,1]
## [1,] 0.6410256
######################
#change M=4
invisible(capture.output(Q2genout_2 <- GenMatch(caliper = c(1e16, 1e16, 0.5, 1e16), M=4,Tr = Q2Tr, X = Q2_2_X, estimand ='ATT', pop.size = 10 , BalanceMatrix = Q2_2_X , max.generations = 15 , unif.seed = 123, int.seed = 92485)))

matchout.gen_2 <- Match(Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', Weight.matrix = Q2genout_2, M=4)

mbout_Q2_2 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = matchout.gen_2, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032026 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016013 
## med  eCDF diff........   0.025071          0.0016013 
## max  eCDF diff........   0.050141          0.0032026 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915            0.95112 
## std mean diff.........    -4.1958            -5.0179 
## 
## mean raw eQQ diff.....   0.016949            0.01201 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.005025          0.0060048 
## med  eCDF diff........   0.005025          0.0060048 
## max  eCDF diff........    0.01005            0.01201 
## 
## var ratio (Tr/Co).....     1.1787             1.2302 
## T-test p-value........    0.68107           0.052286 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178             52.136 
## std mean diff.........     38.385             5.4806 
## 
## mean raw eQQ diff.....      3.661            0.72058 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.015799 
## med  eCDF diff........   0.075538           0.011209 
## max  eCDF diff........    0.17807           0.052042 
## 
## var ratio (Tr/Co).....    0.71552             1.0619 
## T-test p-value........  0.0020402          0.0091671 
## KS Bootstrap p-value..      0.006              0.026 
## KS Naive p-value......  0.0087659           0.067908 
## KS Statistic..........    0.17807           0.052042 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.4564 
## std mean diff.........    -2.1085             1.7232 
## 
## mean raw eQQ diff.....    0.66949            0.36509 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181          0.0097617 
## med  eCDF diff........    0.01445          0.0072058 
## max  eCDF diff........   0.051608           0.036829 
## 
## var ratio (Tr/Co).....    0.77347             1.0803 
## T-test p-value........    0.85956             0.6619 
## KS Bootstrap p-value..      0.788              0.184 
## KS Naive p-value......    0.97653            0.36523 
## KS Statistic..........   0.051608           0.036829 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.50013 
## std mean diff.........    -5.2747           -0.65894 
## 
## mean raw eQQ diff.....   0.011441            0.01257 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.018987 
## med  eCDF diff........   0.010811           0.013611 
## max  eCDF diff........   0.048512           0.066453 
## 
## var ratio (Tr/Co).....     1.1269             1.2921 
## T-test p-value........    0.61103            0.83859 
## KS Bootstrap p-value..      0.926              0.004 
## KS Naive p-value......    0.98776          0.0080469 
## KS Statistic..........   0.048512           0.066453 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.004 
## Variable Name(s): demvote  Number(s): 6
mout_Q2_2_TE_2 <- Match( Weight.matrix = Q2genout_2, Y= daughters$nowtot, Tr = Q2Tr, X= Q2_2_X, estimand ='ATT', M=4)

mbout_Q2_2_TE_2 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = mout_Q2_2_TE_2, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.45833 
## mean control..........    0.50847            0.46154 
## std mean diff.........    -10.047           -0.64223 
## 
## mean raw eQQ diff.....   0.050847          0.0032026 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0016013 
## med  eCDF diff........   0.025071          0.0016013 
## max  eCDF diff........   0.050141          0.0032026 
## 
## var ratio (Tr/Co).....    0.98809            0.99897 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.53846 
## mean control..........    0.49153            0.53846 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391             0.9391 
## mean control..........    0.94915            0.95112 
## std mean diff.........    -4.1958            -5.0179 
## 
## mean raw eQQ diff.....   0.016949            0.01201 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.005025          0.0060048 
## med  eCDF diff........   0.005025          0.0060048 
## max  eCDF diff........    0.01005            0.01201 
## 
## var ratio (Tr/Co).....     1.1787             1.2302 
## T-test p-value........    0.68107           0.052286 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             52.628 
## mean control..........     49.178             52.136 
## std mean diff.........     38.385             5.4806 
## 
## mean raw eQQ diff.....      3.661            0.72058 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  4 
## 
## mean eCDF diff........   0.075348           0.015799 
## med  eCDF diff........   0.075538           0.011209 
## max  eCDF diff........    0.17807           0.052042 
## 
## var ratio (Tr/Co).....    0.71552             1.0619 
## T-test p-value........  0.0020402          0.0091671 
## KS Bootstrap p-value..      0.003              0.038 
## KS Naive p-value......  0.0087659           0.067908 
## KS Statistic..........    0.17807           0.052042 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.5865 
## mean control..........     8.7458             8.4564 
## std mean diff.........    -2.1085             1.7232 
## 
## mean raw eQQ diff.....    0.66949            0.36509 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  9 
## 
## mean eCDF diff........   0.017181          0.0097617 
## med  eCDF diff........    0.01445          0.0072058 
## max  eCDF diff........   0.051608           0.036829 
## 
## var ratio (Tr/Co).....    0.77347             1.0803 
## T-test p-value........    0.85956             0.6619 
## KS Bootstrap p-value..      0.774              0.189 
## KS Naive p-value......    0.97653            0.36523 
## KS Statistic..........   0.051608           0.036829 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.49929 
## mean control..........    0.50602            0.50013 
## std mean diff.........    -5.2747           -0.65894 
## 
## mean raw eQQ diff.....   0.011441            0.01257 
## med  raw eQQ diff.....       0.01               0.01 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.018987 
## med  eCDF diff........   0.010811           0.013611 
## max  eCDF diff........   0.048512           0.066453 
## 
## var ratio (Tr/Co).....     1.1269             1.2921 
## T-test p-value........    0.61103            0.83859 
## KS Bootstrap p-value..      0.912              0.005 
## KS Naive p-value......    0.98776          0.0080469 
## KS Statistic..........   0.048512           0.066453 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.005 
## Variable Name(s): demvote  Number(s): 6
summary(mout_Q2_2_TE_2)
## 
## Estimate...  1.3093 
## AI SE......  2.0143 
## T-stat.....  0.64999 
## p.val......  0.5157 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  312 
## Matched number of observations  (unweighted).  1249
mbout_Q2_2_TE_2$AMsmallest.p.value
## [1] 0.005
mout_Q2_2_TE_2$est
##          [,1]
## [1,] 1.309295
###########
#changed ATT to ATE 

invisible(capture.output(Q2genout_3 <- GenMatch( caliper = c(1e16, 1e16, 0.5, 1e16), Tr = Q2Tr, X = Q2_2_X, estimand ='ATE', pop.size = 10 , BalanceMatrix = Q2_2_X , max.generations = 15 , unif.seed = 123, int.seed = 92485)))

matchout.gen_3 <- Match(Tr = Q2Tr, X= Q2_2_X, estimand ='ATE', Weight.matrix = Q2genout_3)

mbout_Q2_3 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = matchout.gen_3, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.47209 
## mean control..........    0.50847            0.47442 
## std mean diff.........    -10.047            -0.4653 
## 
## mean raw eQQ diff.....   0.050847          0.0023095 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0011547 
## med  eCDF diff........   0.025071          0.0011547 
## max  eCDF diff........   0.050141          0.0023095 
## 
## var ratio (Tr/Co).....    0.98809             0.9995 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.52558 
## mean control..........    0.49153            0.52558 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391            0.94186 
## mean control..........    0.94915            0.94186 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             51.709 
## mean control..........     49.178             51.556 
## std mean diff.........     38.385             1.6727 
## 
## mean raw eQQ diff.....      3.661            0.64665 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  7 
## 
## mean eCDF diff........   0.075348           0.012462 
## med  eCDF diff........   0.075538           0.010393 
## max  eCDF diff........    0.17807           0.034642 
## 
## var ratio (Tr/Co).....    0.71552            0.91419 
## T-test p-value........  0.0020402            0.23929 
## KS Bootstrap p-value..      0.007              0.887 
## KS Naive p-value......  0.0087659            0.95738 
## KS Statistic..........    0.17807           0.034642 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.4349 
## mean control..........     8.7458             8.6558 
## std mean diff.........    -2.1085            -2.9711 
## 
## mean raw eQQ diff.....    0.66949            0.51039 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  5 
## 
## mean eCDF diff........   0.017181           0.013001 
## med  eCDF diff........    0.01445          0.0069284 
## max  eCDF diff........   0.051608           0.050808 
## 
## var ratio (Tr/Co).....    0.77347            0.89169 
## T-test p-value........    0.85956            0.21901 
## KS Bootstrap p-value..      0.811              0.382 
## KS Naive p-value......    0.97653            0.63122 
## KS Statistic..........   0.051608           0.050808 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.50228 
## mean control..........    0.50602            0.50123 
## std mean diff.........    -5.2747            0.83037 
## 
## mean raw eQQ diff.....   0.011441          0.0071824 
## med  raw eQQ diff.....       0.01                  0 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.010778 
## med  eCDF diff........   0.010811          0.0069284 
## max  eCDF diff........   0.048512           0.034642 
## 
## var ratio (Tr/Co).....     1.1269             1.0996 
## T-test p-value........    0.61103            0.54972 
## KS Bootstrap p-value..       0.92              0.882 
## KS Naive p-value......    0.98776            0.95738 
## KS Statistic..........   0.048512           0.034642 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.21901 
## Variable Name(s): srvlng  Number(s): 5
mout_Q2_2_TE_3 <- Match( Weight.matrix = Q2genout_3, Y= daughters$nowtot, Tr = Q2Tr, X= Q2_2_X, estimand ='ATE')

mbout_Q2_2_TE_3 <- MatchBalance(hasgirls ~ dems + repubs+ christian + age + srvlng + demvote, data = daughters, match.out = mout_Q2_2_TE_3, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.47209 
## mean control..........    0.50847            0.47442 
## std mean diff.........    -10.047            -0.4653 
## 
## mean raw eQQ diff.....   0.050847          0.0023095 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.025071          0.0011547 
## med  eCDF diff........   0.025071          0.0011547 
## max  eCDF diff........   0.050141          0.0023095 
## 
## var ratio (Tr/Co).....    0.98809             0.9995 
## T-test p-value........    0.35571            0.31731 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.52558 
## mean control..........    0.49153            0.52558 
## std mean diff.........        9.4                  0 
## 
## mean raw eQQ diff.....   0.042373                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.023468                  0 
## med  eCDF diff........   0.023468                  0 
## max  eCDF diff........   0.046936                  0 
## 
## var ratio (Tr/Co).....    0.98911                  1 
## T-test p-value........     0.3873                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391            0.94186 
## mean control..........    0.94915            0.94186 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             51.709 
## mean control..........     49.178             51.556 
## std mean diff.........     38.385             1.6727 
## 
## mean raw eQQ diff.....      3.661            0.64665 
## med  raw eQQ diff.....          4                  1 
## max  raw eQQ diff.....          7                  7 
## 
## mean eCDF diff........   0.075348           0.012462 
## med  eCDF diff........   0.075538           0.010393 
## max  eCDF diff........    0.17807           0.034642 
## 
## var ratio (Tr/Co).....    0.71552            0.91419 
## T-test p-value........  0.0020402            0.23929 
## KS Bootstrap p-value..      0.002               0.85 
## KS Naive p-value......  0.0087659            0.95738 
## KS Statistic..........    0.17807           0.034642 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.4349 
## mean control..........     8.7458             8.6558 
## std mean diff.........    -2.1085            -2.9711 
## 
## mean raw eQQ diff.....    0.66949            0.51039 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  5 
## 
## mean eCDF diff........   0.017181           0.013001 
## med  eCDF diff........    0.01445          0.0069284 
## max  eCDF diff........   0.051608           0.050808 
## 
## var ratio (Tr/Co).....    0.77347            0.89169 
## T-test p-value........    0.85956            0.21901 
## KS Bootstrap p-value..      0.789              0.372 
## KS Naive p-value......    0.97653            0.63122 
## KS Statistic..........   0.051608           0.050808 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.50228 
## mean control..........    0.50602            0.50123 
## std mean diff.........    -5.2747            0.83037 
## 
## mean raw eQQ diff.....   0.011441          0.0071824 
## med  raw eQQ diff.....       0.01                  0 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.010778 
## med  eCDF diff........   0.010811          0.0069284 
## max  eCDF diff........   0.048512           0.034642 
## 
## var ratio (Tr/Co).....     1.1269             1.0996 
## T-test p-value........    0.61103            0.54972 
## KS Bootstrap p-value..      0.918              0.887 
## KS Naive p-value......    0.98776            0.95738 
## KS Statistic..........   0.048512           0.034642 
## 
## 
## Before Matching Minimum p.value: 0.002 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.21901 
## Variable Name(s): srvlng  Number(s): 5
summary(mout_Q2_2_TE_3)
## 
## Estimate...  0.50581 
## AI SE......  2.0678 
## T-stat.....  0.24462 
## p.val......  0.80675 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  430 
## Matched number of observations  (unweighted).  433
mbout_Q2_2_TE_3$AMsmallest.p.value
## [1] 0.2190103
mout_Q2_2_TE_3$est
##          [,1]
## [1,] 0.505814

1: Original number of observations: 430. Original number of treated obs: 312. Matched number of observations: 312. Matched number of observations (unweighted): 312. mbout_Q2_2_TE_1$AMsmallest.p.value = 0.3173124; mout_Q2_2_TE_1$est = 0.6410256.

2: Original number of observations: 430. Original number of treated obs: 312. Matched number of observations: 312. Matched number of observations (unweighted): 1249. mbout_Q2_2_TE_2$AMsmallest.p.value = 0.005; mout_Q2_2_TE_2$est = 1.309295.

3: Original number of observations: 430. Original number of treated obs: 312. Matched number of observations: 430. Matched number of observations (unweighted): 433. mbout_Q2_2_TE_3$AMsmallest.p.value = 0.2190103; mout_Q2_2_TE_3$est = 0.505814.
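
The Estimate, AI SE, T-stat, and p.val rows printed by summary() for the second and third specifications are tied together by simple arithmetic. The sketch below reproduces them from the printed values. It assumes the p-value is a two-sided p-value from the standard normal distribution; that assumption is inferred from the printed numbers, not taken from the package documentation.

```r
# Estimates and AI (Abadie-Imbens) standard errors copied from the summary() output
est <- c(spec2 = 1.3093, spec3 = 0.50581)
se  <- c(spec2 = 2.0143, spec3 = 2.0678)

# The T-stat is the estimate divided by its standard error
t_stat <- est / se

# Assumption: two-sided p-value under a standard normal reference distribution
p_val <- 2 * pnorm(-abs(t_stat))

print(round(t_stat, 4))
print(round(p_val, 4))
```

Under that assumption, both p-values are far above conventional significance thresholds, so neither specification gives an estimate that can be distinguished from zero.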

STEP 6

Repeat everything you’ve done for Steps 1-2, including the regression, genetic algorithm, code and estimating the treatment effect EXCEPT this time change the definition of treatment to cover 2 girls or more, and change the definition of control to cover 2 boys or more. Exclude all observations that don’t meet these requirements. Be sure to explain (in a sentence or two) what you’re doing with your new treatment and control definitions. Do your new definitions change anything?

Note: The new treatment variable is defined as follows: individuals in the treatment group should have 2 or more girls and no boys, and individuals in the control group should have 2 or more boys and no girls. What I had in mind was that this design increases the “dosage” of treatment vs. the “dosage” of control (Rosenbaum alluded to this kind of observational design logic in one of the recently assigned articles). Note that you can’t have the same units in the treatment group AND the control group; we should all know by now that such a situation would be wrong.

# Your code here

#creating  a new dataframe
step6_df <- daughters
step6_df$newtreat <- NA

#applying the conditions for control and treatment
step6_df$newtreat[step6_df$ngirls >= 2 & step6_df$nboys == 0] <- 1
step6_df$newtreat[step6_df$nboys >= 2 & step6_df$ngirls == 0] <- 0

#keeping in the data only the ones that apply to the condition
step6_df<- step6_df[ which(step6_df$newtreat==0 | step6_df$newtreat == 1), ]


######step 1 again:
Q2lm6 <- lm(nowtot ~ newtreat + dems + repubs+ christian + age + srvlng + demvote , data = step6_df)

coef(summary(Q2lm6))["newtreat","Estimate"]
## [1] 12.29254
Q2mout6 <- MatchBalance(newtreat ~ dems + repubs+ christian + age + srvlng + demvote, data = step6_df, nboots = 500) 
## 
## ***** (V1) dems *****
## before matching:
## mean treatment........ 0.61702 
## mean control.......... 0.40909 
## std mean diff......... 42.317 
## 
## mean raw eQQ diff..... 0.20455 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.10397 
## med  eCDF diff........ 0.10397 
## max  eCDF diff........ 0.20793 
## 
## var ratio (Tr/Co)..... 0.97609 
## T-test p-value........ 0.04806 
## 
## 
## ***** (V2) repubs *****
## before matching:
## mean treatment........ 0.38298 
## mean control.......... 0.59091 
## std mean diff......... -42.317 
## 
## mean raw eQQ diff..... 0.22727 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.10397 
## med  eCDF diff........ 0.10397 
## max  eCDF diff........ 0.20793 
## 
## var ratio (Tr/Co)..... 0.97609 
## T-test p-value........ 0.04806 
## 
## 
## ***** (V3) christian *****
## before matching:
## mean treatment........ 0.91489 
## mean control.......... 0.97727 
## std mean diff......... -22.116 
## 
## mean raw eQQ diff..... 0.068182 
## med  raw eQQ diff..... 0 
## max  raw eQQ diff..... 1 
## 
## mean eCDF diff........ 0.03119 
## med  eCDF diff........ 0.03119 
## max  eCDF diff........ 0.062379 
## 
## var ratio (Tr/Co)..... 3.5005 
## T-test p-value........ 0.1887 
## 
## 
## ***** (V4) age *****
## before matching:
## mean treatment........ 51.213 
## mean control.......... 51.977 
## std mean diff......... -8.2171 
## 
## mean raw eQQ diff..... 1.5682 
## med  raw eQQ diff..... 1 
## max  raw eQQ diff..... 6 
## 
## mean eCDF diff........ 0.0288 
## med  eCDF diff........ 0.025629 
## max  eCDF diff........ 0.07882 
## 
## var ratio (Tr/Co)..... 0.79756 
## T-test p-value........ 0.71354 
## KS Bootstrap p-value.. 0.968 
## KS Naive p-value...... 0.99893 
## KS Statistic.......... 0.07882 
## 
## 
## ***** (V5) srvlng *****
## before matching:
## mean treatment........ 7.9574 
## mean control.......... 10.568 
## std mean diff......... -38.919 
## 
## mean raw eQQ diff..... 2.8636 
## med  raw eQQ diff..... 2 
## max  raw eQQ diff..... 14 
## 
## mean eCDF diff........ 0.079008 
## med  eCDF diff........ 0.091876 
## max  eCDF diff........ 0.13926 
## 
## var ratio (Tr/Co)..... 0.48803 
## T-test p-value........ 0.13925 
## KS Bootstrap p-value.. 0.506 
## KS Naive p-value...... 0.77019 
## KS Statistic.......... 0.13926 
## 
## 
## ***** (V6) demvote *****
## before matching:
## mean treatment........ 0.51745 
## mean control.......... 0.48727 
## std mean diff......... 24.982 
## 
## mean raw eQQ diff..... 0.045909 
## med  raw eQQ diff..... 0.045 
## max  raw eQQ diff..... 0.11 
## 
## mean eCDF diff........ 0.10233 
## med  eCDF diff........ 0.091393 
## max  eCDF diff........ 0.25193 
## 
## var ratio (Tr/Co)..... 1.0419 
## T-test p-value........ 0.23199 
## KS Bootstrap p-value.. 0.054 
## KS Naive p-value...... 0.11171 
## KS Statistic.......... 0.25193 
## 
## 
## Before Matching Minimum p.value: 0.04806 
## Variable Name(s): dems repubs  Number(s): 1 2
#Checking balance

# treat must come from the subsetted data (step6_df), not from the full daughters data
Q2plot4_den <- bal.plot(step6_df$newtreat ~ step6_df$age,  type = "density", treat = step6_df$newtreat)
Q2plot5_den <- bal.plot(step6_df$newtreat ~ step6_df$srvlng,  type = "density", treat = step6_df$newtreat)
Q2plot6_den <- bal.plot(step6_df$newtreat ~ step6_df$demvote,  type = "density", treat = step6_df$newtreat)

grid.arrange(Q2plot4_den, Q2plot5_den , Q2plot6_den,  nrow = 2)

Q2plot1 <- bal.plot(step6_df$newtreat ~ step6_df$dems,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)
Q2plot2 <- bal.plot(step6_df$newtreat ~ step6_df$repubs,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)
Q2plot3 <- bal.plot(step6_df$newtreat ~ step6_df$christian,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)
Q2plot4 <- bal.plot(step6_df$newtreat ~ step6_df$age,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)
Q2plot5 <- bal.plot(step6_df$newtreat ~ step6_df$srvlng,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)
Q2plot6 <- bal.plot(step6_df$newtreat ~ step6_df$demvote,  type = "histogram", treat = step6_df$newtreat, mirror = TRUE)

grid.arrange(Q2plot1, Q2plot2, Q2plot3, Q2plot4, Q2plot5, Q2plot6,  nrow = 2)

###Step 2 revised:

Q2Y6= step6_df$nowtot

Q2Tr6 = step6_df$newtreat

Q2_2_X6 <- cbind(step6_df$dems, step6_df$repubs, step6_df$christian, step6_df$age , step6_df$srvlng , step6_df$demvote)


###
invisible(capture.output(Q2genout6 <- GenMatch(Tr = Q2Tr6, X = Q2_2_X6, estimand ='ATT', pop.size = 10 , BalanceMatrix = Q2_2_X6 , max.generations = 15 , unif.seed = 123, int.seed = 92485)))


matchout.gen6 <- Match(Tr = Q2Tr6, X= Q2_2_X6, estimand ='ATT', Weight.matrix = Q2genout6, M=1)

mbout_Q2_26 <- MatchBalance(newtreat ~ dems + repubs+ christian + age + srvlng + demvote, data = step6_df, match.out = matchout.gen6, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.61702            0.61702 
## mean control..........    0.40909            0.61702 
## std mean diff.........     42.317                  0 
## 
## mean raw eQQ diff.....    0.20455                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10397                  0 
## med  eCDF diff........    0.10397                  0 
## max  eCDF diff........    0.20793                  0 
## 
## var ratio (Tr/Co).....    0.97609                  1 
## T-test p-value........    0.04806                  1 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.38298            0.38298 
## mean control..........    0.59091            0.38298 
## std mean diff.........    -42.317                  0 
## 
## mean raw eQQ diff.....    0.22727                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10397                  0 
## med  eCDF diff........    0.10397                  0 
## max  eCDF diff........    0.20793                  0 
## 
## var ratio (Tr/Co).....    0.97609                  1 
## T-test p-value........    0.04806                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........    0.91489            0.91489 
## mean control..........    0.97727            0.91489 
## std mean diff.........    -22.116                  0 
## 
## mean raw eQQ diff.....   0.068182                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.03119                  0 
## med  eCDF diff........    0.03119                  0 
## max  eCDF diff........   0.062379                  0 
## 
## var ratio (Tr/Co).....     3.5005                  1 
## T-test p-value........     0.1887                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     51.213             51.213 
## mean control..........     51.977             51.745 
## std mean diff.........    -8.2171            -5.7171 
## 
## mean raw eQQ diff.....     1.5682            0.95745 
## med  raw eQQ diff.....          1                  1 
## max  raw eQQ diff.....          6                  3 
## 
## mean eCDF diff........     0.0288            0.02695 
## med  eCDF diff........   0.025629           0.021277 
## max  eCDF diff........    0.07882            0.06383 
## 
## var ratio (Tr/Co).....    0.79756             1.0089 
## T-test p-value........    0.71354            0.60035 
## KS Bootstrap p-value..      0.966              0.999 
## KS Naive p-value......    0.99893            0.99998 
## KS Statistic..........    0.07882            0.06383 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     7.9574             7.9574 
## mean control..........     10.568             8.1489 
## std mean diff.........    -38.919            -2.8546 
## 
## mean raw eQQ diff.....     2.8636            0.61702 
## med  raw eQQ diff.....          2                  0 
## max  raw eQQ diff.....         14                  2 
## 
## mean eCDF diff........   0.079008           0.022695 
## med  eCDF diff........   0.091876           0.021277 
## max  eCDF diff........    0.13926            0.06383 
## 
## var ratio (Tr/Co).....    0.48803            0.92491 
## T-test p-value........    0.13925            0.66963 
## KS Bootstrap p-value..      0.492              0.986 
## KS Naive p-value......    0.77019            0.99998 
## KS Statistic..........    0.13926            0.06383 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.51745            0.51745 
## mean control..........    0.48727               0.51 
## std mean diff.........     24.982             6.1654 
## 
## mean raw eQQ diff.....   0.045909           0.025745 
## med  raw eQQ diff.....      0.045               0.02 
## max  raw eQQ diff.....       0.11               0.07 
## 
## mean eCDF diff........    0.10233           0.053495 
## med  eCDF diff........   0.091393           0.042553 
## max  eCDF diff........    0.25193            0.12766 
## 
## var ratio (Tr/Co).....     1.0419             1.3854 
## T-test p-value........    0.23199            0.44246 
## KS Bootstrap p-value..      0.058               0.73 
## KS Naive p-value......    0.11171            0.83838 
## KS Statistic..........    0.25193            0.12766 
## 
## 
## Before Matching Minimum p.value: 0.04806 
## Variable Name(s): dems repubs  Number(s): 1 2 
## 
## After Matching Minimum p.value: 0.44246 
## Variable Name(s): demvote  Number(s): 6
mout_Q2_2_TE6 <- Match( Weight.matrix = Q2genout6, Y= Q2Y6, Tr = Q2Tr6, X= Q2_2_X6, estimand ='ATT', M=1)

mbout_Q2_2_TE <- MatchBalance(newtreat ~ dems + repubs+ christian + age + srvlng + demvote, data = step6_df, match.out = mout_Q2_2_TE6, nboots=1000)
## 
## ***** (V1) dems *****
##                        Before Matching        After Matching
## mean treatment........    0.61702            0.61702 
## mean control..........    0.40909            0.61702 
## std mean diff.........     42.317                  0 
## 
## mean raw eQQ diff.....    0.20455                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10397                  0 
## med  eCDF diff........    0.10397                  0 
## max  eCDF diff........    0.20793                  0 
## 
## var ratio (Tr/Co).....    0.97609                  1 
## T-test p-value........    0.04806                  1 
## 
## 
## ***** (V2) repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.38298            0.38298 
## mean control..........    0.59091            0.38298 
## std mean diff.........    -42.317                  0 
## 
## mean raw eQQ diff.....    0.22727                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10397                  0 
## med  eCDF diff........    0.10397                  0 
## max  eCDF diff........    0.20793                  0 
## 
## var ratio (Tr/Co).....    0.97609                  1 
## T-test p-value........    0.04806                  1 
## 
## 
## ***** (V3) christian *****
##                        Before Matching        After Matching
## mean treatment........    0.91489            0.91489 
## mean control..........    0.97727            0.91489 
## std mean diff.........    -22.116                  0 
## 
## mean raw eQQ diff.....   0.068182                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.03119                  0 
## med  eCDF diff........    0.03119                  0 
## max  eCDF diff........   0.062379                  0 
## 
## var ratio (Tr/Co).....     3.5005                  1 
## T-test p-value........     0.1887                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     51.213             51.213 
## mean control..........     51.977             51.745 
## std mean diff.........    -8.2171            -5.7171 
## 
## mean raw eQQ diff.....     1.5682            0.95745 
## med  raw eQQ diff.....          1                  1 
## max  raw eQQ diff.....          6                  3 
## 
## mean eCDF diff........     0.0288            0.02695 
## med  eCDF diff........   0.025629           0.021277 
## max  eCDF diff........    0.07882            0.06383 
## 
## var ratio (Tr/Co).....    0.79756             1.0089 
## T-test p-value........    0.71354            0.60035 
## KS Bootstrap p-value..      0.968                  1 
## KS Naive p-value......    0.99893            0.99998 
## KS Statistic..........    0.07882            0.06383 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     7.9574             7.9574 
## mean control..........     10.568             8.1489 
## std mean diff.........    -38.919            -2.8546 
## 
## mean raw eQQ diff.....     2.8636            0.61702 
## med  raw eQQ diff.....          2                  0 
## max  raw eQQ diff.....         14                  2 
## 
## mean eCDF diff........   0.079008           0.022695 
## med  eCDF diff........   0.091876           0.021277 
## max  eCDF diff........    0.13926            0.06383 
## 
## var ratio (Tr/Co).....    0.48803            0.92491 
## T-test p-value........    0.13925            0.66963 
## KS Bootstrap p-value..      0.472              0.984 
## KS Naive p-value......    0.77019            0.99998 
## KS Statistic..........    0.13926            0.06383 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.51745            0.51745 
## mean control..........    0.48727               0.51 
## std mean diff.........     24.982             6.1654 
## 
## mean raw eQQ diff.....   0.045909           0.025745 
## med  raw eQQ diff.....      0.045               0.02 
## max  raw eQQ diff.....       0.11               0.07 
## 
## mean eCDF diff........    0.10233           0.053495 
## med  eCDF diff........   0.091393           0.042553 
## max  eCDF diff........    0.25193            0.12766 
## 
## var ratio (Tr/Co).....     1.0419             1.3854 
## T-test p-value........    0.23199            0.44246 
## KS Bootstrap p-value..      0.049              0.698 
## KS Naive p-value......    0.11171            0.83838 
## KS Statistic..........    0.25193            0.12766 
## 
## 
## Before Matching Minimum p.value: 0.04806 
## Variable Name(s): dems repubs  Number(s): 1 2 
## 
## After Matching Minimum p.value: 0.44246 
## Variable Name(s): demvote  Number(s): 6
summary(mout_Q2_2_TE6)
## 
## Estimate...  11.383 
## AI SE......  4.0247 
## T-stat.....  2.8283 
## p.val......  0.0046797 
## 
## Original number of observations..............  91 
## Original number of treated obs...............  47 
## Matched number of observations...............  47 
## Matched number of observations  (unweighted).  47
mout_Q2_2_TE6$est
##          [,1]
## [1,] 11.38298
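
The "std mean diff" and "var ratio (Tr/Co)" rows in the Step 6 balance tables can be checked by hand. The sketch below does this for dems before matching. The 0/1 vectors are reconstructed from the reported group means and sample sizes (47 treated and 91 - 47 = 44 controls, so 29 treated and 18 control Democrats); they are not the original rows of the data. The formula, 100 times the difference in means divided by the treated-group standard deviation, is an assumption that happens to reproduce the printed value.

```r
# Reconstructed indicator vectors (assumption: counts implied by the reported means)
dems_treated <- c(rep(1, 29), rep(0, 18))  # 29/47 = 0.61702
dems_control <- c(rep(1, 18), rep(0, 26))  # 18/44 = 0.40909

# Standardized mean difference, scaled by 100 and by the treated-group SD
std_mean_diff <- 100 * (mean(dems_treated) - mean(dems_control)) / sd(dems_treated)

# Ratio of sample variances, treated over control
var_ratio <- var(dems_treated) / var(dems_control)

print(round(std_mean_diff, 3))
print(round(var_ratio, 5))
```

These values can be compared with the dems rows in the "Before Matching" column (42.317 and 0.97609).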

STEP 7

It is NOT wise to match or balance on “totchi”. What is the reason? Hint: You will have to look at what variables mean in the data set to be able to answer this question.

mean(daughters$totchi)
## [1] 2.497674
mean(daughters$nboys)
## [1] 1.223256
mean(daughters$ngirls)
## [1] 1.274419

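The treatment we are testing is the effect of having girls on how liberal a legislator's decision making is, so the distinction between daughters and sons is exactly what matters. totchi counts all children regardless of sex: the means above are consistent with totchi = nboys + ngirls (1.223256 + 1.274419 ≈ 2.497674). Matching on totchi therefore pairs subjects on a quantity that is partly determined by the number of girls, while ignoring the split between girls and boys. For example, a subject who has three girls could be matched with another who has two boys and one girl. Although their total number of children is identical, their exposure to the treatment is not, so the match would distort the results.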
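
The example in the answer above can be made concrete with a toy data set. The four rows below are hypothetical, and the definition hasgirls = 1 when ngirls > 0 is an assumption about how the indicator is coded; the sketch only illustrates that units with the same totchi can differ completely in how many girls they have.

```r
# Hypothetical legislators (not rows from the daughters data)
toy <- data.frame(
  id     = c("A", "B", "C", "D"),
  ngirls = c(3, 1, 0, 2),
  nboys  = c(0, 2, 3, 1)
)

# totchi is the sum of girls and boys, so it is built from the treatment itself
toy$totchi   <- toy$ngirls + toy$nboys
toy$hasgirls <- as.integer(toy$ngirls > 0)  # assumed coding of the indicator

# Exact matching on totchi would treat every unit in a group as interchangeable
same_totchi <- split(toy[, c("id", "ngirls", "hasgirls")], toy$totchi)
print(same_totchi)
```

All four toy units fall in the same totchi group even though their number of girls ranges from 0 to 3.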

QUESTION 3: COPD

Most causal studies on the health effects of smoking are observational studies (well, for very obvious reasons). In this exercise, we are specifically trying to answer the following question: Does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? To learn more about the disease, read here: https://www.cdc.gov/copd/index.html

We’ll use a sub-sample of the 2015 BRFSS survey (pronounced bur-fiss), which stands for Behavioral Risk Factor Surveillance System. The data are collected through a phone survey of American citizens regarding their health-related risk behaviors and chronic health conditions. Although the entire survey has over 400,000 records and over 300 variables, we sample only 5,000 observations and 7 variables.

Let’s load the data first and take a look at the first few rows:

brfss = read.csv("http://bit.ly/BRFSS_data") %>% 
  clean_names()
head(brfss)

A summary of the variables is as follows:

  • copd: Ever told you have chronic obstructive pulmonary disease (COPD)?
  • smoke: Adults who are current smokers (0 = no, 1 = yes)
  • race: Race group
  • age: age group
  • sex: gender
  • wtlbs: weight in pounds (lbs)
  • avedrnk2: During the past 30 days, when you drank, how many drinks did you drink on average?

STEP 1

Check the balance of covariates before matching using any method of your choice. You can look at balance tables, balance plots, or love plots from any package of your choice. Do you see a balance across the covariates?

Note: This is optional but you can use the gridExtra package and its grid.arrange() function to put all the 4 graphs in one 2 x 2 graph. Read more about the package and how to use it here: https://cran.r-project.org/web/packages/egg/vignettes/Ecosystem.html. Set nrow = 2.

# Your code here
#na.omit(brfss)

brfss1 = splitfactor(brfss, "sex", replace = FALSE, drop.first = TRUE)

Q3plot1 <- bal.plot(x = brfss, var.name="race", type = "histogram", which = "unadjusted", mirror = TRUE, treat = brfss$smoke)
Q3plot2 <- bal.plot(x = brfss, var.name="age", type = "histogram", which = "unadjusted", mirror = TRUE, treat = brfss$smoke)
Q3plot3 <- bal.plot(x = brfss1, var.name="sex_Male", type = "histogram", which = "unadjusted", mirror = TRUE, treat = brfss$smoke)
Q3plot4 <- bal.plot(x = brfss, var.name="wtlbs", type = "histogram", which = "unadjusted", mirror = TRUE, treat = brfss$smoke)
Q3plot5 <- bal.plot(x = brfss, var.name="avedrnk2", type = "histogram", which = "unadjusted", mirror = TRUE, treat = brfss$smoke)

grid.arrange(Q3plot1, Q3plot2, Q3plot3, Q3plot4, Q3plot5, nrow = 2)

STEP 2

Now, let’s do Mahalanobis distance matching. Note that you can use the same old Match() function. Use all covariates in the data set to match on; however, make sure you convert all categorical variables into factor variables (Google to see how). We are going to specify estimand = "ATT" in the Match() function. What’s the treatment effect after matching?
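
To make the idea of Mahalanobis distance matching concrete, here is a minimal base-R sketch on made-up data. It only illustrates the distance and a one-nearest-control rule; it is not how Match() is implemented, which also handles ties, replacement, and weights. The helper nearest_control and all values are hypothetical.

```r
# Toy covariates (made up): units 1-2 are treated, units 3-6 are controls
X  <- cbind(wt     = c(150, 200, 155, 195, 150, 200),
            drinks = c(1,   5,   1,   5,   5,   1))
tr <- c(1, 1, 0, 0, 0, 0)

# Sample covariance of the covariates, used to scale the distance
S <- cov(X)

# Hypothetical helper: index of the control closest to treated unit i
nearest_control <- function(i) {
  ctrl <- which(tr == 0)
  # mahalanobis() returns squared distances of each row from `center`
  d2 <- mahalanobis(X[ctrl, , drop = FALSE], center = X[i, ], cov = S)
  ctrl[which.min(d2)]
}

matches <- sapply(which(tr == 1), nearest_control)
print(matches)
```

Because the distance is scaled by the covariance matrix, a 5 lb difference in weight and a 1-drink difference are put on comparable footing, which a plain Euclidean distance would not do.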

# Your code here
# Convert every character column to a factor
brfss[sapply(brfss, is.character)] <- lapply(brfss[sapply(brfss, is.character)], as.factor)
str(brfss)
## 'data.frame':    5000 obs. of  7 variables:
##  $ copd    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ smoke   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ race    : Factor w/ 4 levels "Black","Hispanic",..: 4 4 4 2 4 4 4 4 4 2 ...
##  $ age     : Factor w/ 6 levels "18-24","25-34",..: 1 5 6 1 6 6 2 4 6 2 ...
##  $ sex     : Factor w/ 2 levels "Female","Male": 1 1 2 2 1 2 1 2 1 2 ...
##  $ wtlbs   : num  180 170 170 185 150 180 155 172 130 160 ...
##  $ avedrnk2: int  4 1 1 1 1 2 2 2 1 2 ...
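# Covariate matrix for matching. Note: cbind() stores each factor as its integer level codes.
X_Q3 <- cbind(brfss$race, brfss$age, brfss$sex, brfss$wtlbs, brfss$avedrnk2)

# One-to-one matching on Mahalanobis distance (Weight = 2), targeting the ATT
rr_Q3 <- Match(Tr = brfss$smoke, X = X_Q3, estimand = "ATT", Weight = 2, M = 1)

# Covariate balance before and after matching
mb_Q3 <- MatchBalance(brfss$smoke ~ brfss$race + brfss$age + brfss$sex + brfss$wtlbs + brfss$avedrnk2, match.out = rr_Q3, nboots = 1000)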
## 
## ***** (V1) brfss$raceHispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.070866           0.070866 
## mean control..........   0.067249           0.073491 
## std mean diff.........     1.4088            -1.0222 
## 
## mean raw eQQ diff.....  0.0026247         0.00089286 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0018087         0.00044643 
## med  eCDF diff........  0.0018087         0.00044643 
## max  eCDF diff........  0.0036174         0.00089286 
## 
## var ratio (Tr/Co).....     1.0508            0.96702 
## T-test p-value........    0.71939            0.15716 
## 
## 
## ***** (V2) brfss$raceOther *****
##                        Before Matching        After Matching
## mean treatment........   0.068241           0.068241 
## mean control..........   0.047664           0.062992 
## std mean diff.........     8.1551             2.0804 
## 
## mean raw eQQ diff.....   0.019685          0.0017857 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.010289         0.00089286 
## med  eCDF diff........   0.010289         0.00089286 
## max  eCDF diff........   0.020577          0.0017857 
## 
## var ratio (Tr/Co).....     1.4023             1.0773 
## T-test p-value........   0.034312           0.045288 
## 
## 
## ***** (V3) brfss$raceWhite *****
##                        Before Matching        After Matching
## mean treatment........    0.74147            0.74147 
## mean control..........    0.83388            0.74541 
## std mean diff.........    -21.094           -0.89863 
## 
## mean raw eQQ diff.....   0.091864          0.0013393 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.046207         0.00066964 
## med  eCDF diff........   0.046207         0.00066964 
## max  eCDF diff........   0.092414          0.0013393 
## 
## var ratio (Tr/Co).....     1.3853             1.0101 
## T-test p-value........ 5.4811e-08           0.083062 
## 
## 
## ***** (V4) brfss$age25-34 *****
##                        Before Matching        After Matching
## mean treatment........    0.18241            0.18241 
## mean control..........    0.10854             0.1916 
## std mean diff.........     19.116            -2.3772 
## 
## mean raw eQQ diff.....   0.073491          0.0035714 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.036936          0.0017857 
## med  eCDF diff........   0.036936          0.0017857 
## max  eCDF diff........   0.073873          0.0035714 
## 
## var ratio (Tr/Co).....      1.543            0.96287 
## T-test p-value........ 7.0411e-07            0.16138 
## 
## 
## ***** (V5) brfss$age35-44 *****
##                        Before Matching        After Matching
## mean treatment........     0.1811             0.1811 
## mean control..........    0.12482            0.16929 
## std mean diff.........     14.605              3.065 
## 
## mean raw eQQ diff.....    0.05643          0.0044643 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.02814          0.0022321 
## med  eCDF diff........    0.02814          0.0022321 
## max  eCDF diff........   0.056279          0.0044643 
## 
## var ratio (Tr/Co).....      1.359             1.0546 
## T-test p-value........ 0.00016078           0.049321 
## 
## 
## ***** (V6) brfss$age45-54 *****
##                        Before Matching        After Matching
## mean treatment........    0.18373            0.18373 
## mean control..........    0.18263            0.18766 
## std mean diff.........    0.28224             -1.016 
## 
## mean raw eQQ diff.....  0.0013123          0.0013393 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........ 0.00054686         0.00066964 
## med  eCDF diff........ 0.00054686         0.00066964 
## max  eCDF diff........  0.0010937          0.0013393 
## 
## var ratio (Tr/Co).....     1.0057            0.98377 
## T-test p-value........    0.94281            0.46692 
## 
## 
## ***** (V7) brfss$age55-64 *****
##                        Before Matching        After Matching
## mean treatment........    0.21916            0.21916 
## mean control..........    0.22534             0.2336 
## std mean diff.........    -1.4934            -3.4873 
## 
## mean raw eQQ diff.....  0.0065617          0.0049107 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.003091          0.0024554 
## med  eCDF diff........   0.003091          0.0024554 
## max  eCDF diff........   0.006182          0.0049107 
## 
## var ratio (Tr/Co).....    0.98138            0.95587 
## T-test p-value........    0.70477          0.0044161 
## 
## 
## ***** (V8) brfss$age65+ *****
##                        Before Matching        After Matching
## mean treatment........    0.16404            0.16404 
## mean control..........    0.31029            0.15879 
## std mean diff.........    -39.467             1.4166 
## 
## mean raw eQQ diff.....    0.14698          0.0017857 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.073123         0.00089286 
## med  eCDF diff........   0.073123         0.00089286 
## max  eCDF diff........    0.14625          0.0017857 
## 
## var ratio (Tr/Co).....    0.64147             1.0266 
## T-test p-value........ < 2.22e-16            0.15716 
## 
## 
## ***** (V9) brfss$sexMale *****
##                        Before Matching        After Matching
## mean treatment........    0.54593            0.54593 
## mean control..........    0.48679            0.54593 
## std mean diff.........     11.872                  0 
## 
## mean raw eQQ diff.....   0.059055                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.029573                  0 
## med  eCDF diff........   0.029573                  0 
## max  eCDF diff........   0.059146                  0 
## 
## var ratio (Tr/Co).....    0.99332                  1 
## T-test p-value........   0.002627                  1 
## 
## 
## ***** (V10) brfss$wtlbs *****
##                        Before Matching        After Matching
## mean treatment........     177.76             177.76 
## mean control..........     177.99             177.87 
## std mean diff.........   -0.49852           -0.24766 
## 
## mean raw eQQ diff.....     2.1432            0.63842 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....         50                 30 
## 
## mean eCDF diff........  0.0077522          0.0029607 
## med  eCDF diff........  0.0056792          0.0026786 
## max  eCDF diff........   0.025879           0.012054 
## 
## var ratio (Tr/Co).....     1.1223             1.0965 
## T-test p-value........    0.89836            0.76523 
## KS Bootstrap p-value..      0.664              0.961 
## KS Naive p-value......    0.78001            0.99683 
## KS Statistic..........   0.025879           0.012054 
## 
## 
## ***** (V11) brfss$avedrnk2 *****
##                        Before Matching        After Matching
## mean treatment........     2.9843             2.9843 
## mean control..........     1.9672             2.8937 
## std mean diff.........     40.633             3.6177 
## 
## mean raw eQQ diff.....          1           0.036607 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....          4                  3 
## 
## mean eCDF diff........   0.052876           0.002381 
## med  eCDF diff........    0.01035         0.00089286 
## max  eCDF diff........    0.26493           0.011607 
## 
## var ratio (Tr/Co).....     1.9119              1.018 
## T-test p-value........ < 2.22e-16         1.2562e-08 
## KS Bootstrap p-value.. < 2.22e-16              0.667 
## KS Naive p-value...... < 2.22e-16            0.99818 
## KS Statistic..........    0.26493           0.011607 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): brfss$age65+ brfss$avedrnk2  Number(s): 8 11 
## 
## After Matching Minimum p.value: 1.2562e-08 
## Variable Name(s): brfss$avedrnk2  Number(s): 11
# Same match as above, now with the outcome supplied so that Match() returns an ATT estimate
rr_Q3_est <- Match(Tr = brfss$smoke, X = X_Q3, estimand = "ATT", M = 1, Y = brfss$copd, Weight = 2)

summary(rr_Q3_est)
## 
## Estimate...  0.087135 
## AI SE......  0.015649 
## T-stat.....  5.5682 
## p.val......  2.5736e-08 
## 
## Original number of observations..............  5000 
## Original number of treated obs...............  762 
## Matched number of observations...............  762 
## Matched number of observations  (unweighted).  2240
rr_Q3_est$est
##           [,1]
## [1,] 0.0871346
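
One detail of the Step 2 code worth illustrating: building the covariate matrix with cbind() on factor columns keeps only the integer level codes, so a multi-level factor such as race enters the distance as if it were a number. The base-R sketch below, on a made-up data frame whose column names mirror the assignment’s variables, contrasts that with dummy columns from model.matrix() and with an explicit 0/1 outcome. Whether dummy coding is preferable here is a modeling choice, and switching would change the matched sample and the results printed above.

```r
# Toy data frame (made up); column names mirror the assignment's variables
toy <- data.frame(
  race  = factor(c("White", "Black", "Hispanic", "White")),
  sex   = factor(c("Female", "Male", "Male", "Female")),
  wtlbs = c(180, 170, 185, 150),
  copd  = factor(c("No", "Yes", "No", "No"))
)

# cbind() keeps only the integer level codes of each factor
X_codes <- cbind(toy$race, toy$sex, toy$wtlbs)

# model.matrix() expands factors into 0/1 dummy columns; [, -1] drops the intercept
X_dummy <- model.matrix(~ race + sex + wtlbs, data = toy)[, -1]

# Explicit 0/1 outcome instead of a two-level factor
Y_num <- as.integer(toy$copd == "Yes")

print(X_codes)
print(colnames(X_dummy))
print(Y_num)
```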

STEP 3

Provide a few sentences talking about the number of treated units dropped, and a few more sentences talking about the balance obtained.

The original number of observations is 5000, of which 762 are treated (smokers). Because M is set to 1 and the estimand is ATT, the algorithm found a control match for every treated unit, so none of the treated units were dropped (the matched number of observations is 762). However, the Mahalanobis matching did not achieve satisfactory balance. Standardized mean differences shrank for most covariates (for example, avedrnk2 went from 40.6 to 3.6), but after matching the t-test p-values for raceOther, age35-44, age55-64, and avedrnk2 are still below 0.05, and the minimum p-value rose only from below 2.22e-16 to 1.2562e-08.

STEP 4

Now, let’s do another Mahalanobis distance matching. Use all covariates in the data set to match on. However, this time make sure you specify estimand = "ATE" in the Match() function. What’s the treatment effect after matching?

# Your code here
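# Same Mahalanobis matching as in Step 2, but targeting the ATE
rr_Q3_4 <- Match(Tr = brfss$smoke, X = X_Q3, estimand = "ATE", M = 1, Y = brfss$copd, Weight = 2)

# Covariate balance before and after ATE matching
mb_Q3_4 <- MatchBalance(brfss$smoke ~ brfss$race + brfss$age + brfss$sex + brfss$wtlbs + brfss$avedrnk2, match.out = rr_Q3_4, nboots = 1000)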
## 
## ***** (V1) brfss$raceHispanic *****
##                        Before Matching        After Matching
## mean treatment........   0.070866             0.0642 
## mean control..........   0.067249             0.0682 
## std mean diff.........     1.4088            -1.6318 
## 
## mean raw eQQ diff.....  0.0026247          0.0025887 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........  0.0018087          0.0012943 
## med  eCDF diff........  0.0018087          0.0012943 
## max  eCDF diff........  0.0036174          0.0025887 
## 
## var ratio (Tr/Co).....     1.0508            0.94539 
## T-test p-value........    0.71939         0.00025845 
## 
## 
## ***** (V2) brfss$raceOther *****
##                        Before Matching        After Matching
## mean treatment........   0.068241               0.05 
## mean control..........   0.047664               0.05 
## std mean diff.........     8.1551                  0 
## 
## mean raw eQQ diff.....   0.019685         0.00012943 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.010289         6.4717e-05 
## med  eCDF diff........   0.010289         6.4717e-05 
## max  eCDF diff........   0.020577         0.00012943 
## 
## var ratio (Tr/Co).....     1.4023                  1 
## T-test p-value........   0.034312                  1 
## 
## 
## ***** (V3) brfss$raceWhite *****
##                        Before Matching        After Matching
## mean treatment........    0.74147             0.8232 
## mean control..........    0.83388             0.8204 
## std mean diff.........    -21.094            0.73387 
## 
## mean raw eQQ diff.....   0.091864          0.0019415 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.046207         0.00097075 
## med  eCDF diff........   0.046207         0.00097075 
## max  eCDF diff........   0.092414          0.0019415 
## 
## var ratio (Tr/Co).....     1.3853            0.98777 
## T-test p-value........ 5.4811e-08           0.006023 
## 
## 
## ***** (V4) brfss$age25-34 *****
##                        Before Matching        After Matching
## mean treatment........    0.18241             0.1246 
## mean control..........    0.10854             0.1212 
## std mean diff.........     19.116             1.0294 
## 
## mean raw eQQ diff.....   0.073491          0.0024592 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.036936          0.0012296 
## med  eCDF diff........   0.036936          0.0012296 
## max  eCDF diff........   0.073873          0.0024592 
## 
## var ratio (Tr/Co).....      1.543             1.0241 
## T-test p-value........ 7.0411e-07              0.158 
## 
## 
## ***** (V5) brfss$age35-44 *****
##                        Before Matching        After Matching
## mean treatment........     0.1811             0.1479 
## mean control..........    0.12482             0.1316 
## std mean diff.........     14.605             4.5911 
## 
## mean raw eQQ diff.....    0.05643           0.010743 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........    0.02814          0.0053715 
## med  eCDF diff........    0.02814          0.0053715 
## max  eCDF diff........   0.056279           0.010743 
## 
## var ratio (Tr/Co).....      1.359             1.1028 
## T-test p-value........ 0.00016078         1.0036e-10 
## 
## 
## ***** (V6) brfss$age45-54 *****
##                        Before Matching        After Matching
## mean treatment........    0.18373             0.1732 
## mean control..........    0.18263             0.1834 
## std mean diff.........    0.28224            -2.6951 
## 
## mean raw eQQ diff.....  0.0013123          0.0067305 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........ 0.00054686          0.0033653 
## med  eCDF diff........ 0.00054686          0.0033653 
## max  eCDF diff........  0.0010937          0.0067305 
## 
## var ratio (Tr/Co).....     1.0057            0.95618 
## T-test p-value........    0.94281         2.3588e-06 
## 
## 
## ***** (V7) brfss$age55-64 *****
##                        Before Matching        After Matching
## mean treatment........    0.21916             0.2297 
## mean control..........    0.22534             0.2266 
## std mean diff.........    -1.4934             0.7369 
## 
## mean raw eQQ diff.....  0.0065617          0.0020709 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.003091          0.0010355 
## med  eCDF diff........   0.003091          0.0010355 
## max  eCDF diff........   0.006182          0.0020709 
## 
## var ratio (Tr/Co).....    0.98138             1.0096 
## T-test p-value........    0.70477            0.15272 
## 
## 
## ***** (V8) brfss$age65+ *****
##                        Before Matching        After Matching
## mean treatment........    0.16404               0.28 
## mean control..........    0.31029             0.2872 
## std mean diff.........    -39.467            -1.6034 
## 
## mean raw eQQ diff.....    0.14698          0.0046596 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.073123          0.0023298 
## med  eCDF diff........   0.073123          0.0023298 
## max  eCDF diff........    0.14625          0.0046596 
## 
## var ratio (Tr/Co).....    0.64147            0.98478 
## T-test p-value........ < 2.22e-16         4.5155e-05 
## 
## 
## ***** (V9) brfss$sexMale *****
##                        Before Matching        After Matching
## mean treatment........    0.54593             0.4954 
## mean control..........    0.48679             0.4958 
## std mean diff.........     11.872          -0.079995 
## 
## mean raw eQQ diff.....   0.059055         0.00025887 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.029573         0.00012943 
## med  eCDF diff........   0.029573         0.00012943 
## max  eCDF diff........   0.059146         0.00025887 
## 
## var ratio (Tr/Co).....    0.99332            0.99999 
## T-test p-value........   0.002627            0.31731 
## 
## 
## ***** (V10) brfss$wtlbs *****
##                        Before Matching        After Matching
## mean treatment........     177.76             177.36 
## mean control..........     177.99             177.97 
## std mean diff.........   -0.49852            -1.4132 
## 
## mean raw eQQ diff.....     2.1432             1.3044 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....         50                142 
## 
## mean eCDF diff........  0.0077522          0.0054519 
## med  eCDF diff........  0.0056792          0.0040771 
## max  eCDF diff........   0.025879           0.035206 
## 
## var ratio (Tr/Co).....     1.1223            0.95132 
## T-test p-value........    0.89836         3.2715e-05 
## KS Bootstrap p-value..      0.669              0.001 
## KS Naive p-value......    0.78001         0.00013875 
## KS Statistic..........   0.025879           0.035206 
## 
## 
## ***** (V11) brfss$avedrnk2 *****
##                        Before Matching        After Matching
## mean treatment........     2.9843             2.1334 
## mean control..........     1.9672             2.1084 
## std mean diff.........     40.633             1.2995 
## 
## mean raw eQQ diff.....          1             0.0343 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....          4                  6 
## 
## mean eCDF diff........   0.052876          0.0017303 
## med  eCDF diff........    0.01035         0.00051773 
## max  eCDF diff........    0.26493           0.020709 
## 
## var ratio (Tr/Co).....     1.9119            0.96755 
## T-test p-value........ < 2.22e-16         4.3741e-05 
## KS Bootstrap p-value.. < 2.22e-16              0.024 
## KS Naive p-value...... < 2.22e-16           0.072775 
## KS Statistic..........    0.26493           0.020709 
## 
## 
## Before Matching Minimum p.value: < 2.22e-16 
## Variable Name(s): brfss$age65+ brfss$avedrnk2  Number(s): 8 11 
## 
## After Matching Minimum p.value: 1.0036e-10 
## Variable Name(s): brfss$age35-44  Number(s): 5
summary(rr_Q3_4)
## 
## Estimate...  0.1316 
## AI SE......  0.017808 
## T-stat.....  7.3896 
## p.val......  1.4722e-13 
## 
## Original number of observations..............  5000 
## Original number of treated obs...............  762 
## Matched number of observations...............  5000 
## Matched number of observations  (unweighted).  7726

STEP 5

Are your answers in Steps 2 and 4 different? Why? What does the matching process do differently in each step? Which answer do you trust more?

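Yes, the results are different: the ATT estimate from Step 2 is 0.087, while the ATE estimate from Step 4 is 0.132. With estimand = "ATT", the algorithm finds control matches for each of the 762 treated units and discards the controls that are never used. With estimand = "ATE", it also looks for a treated match for every control unit, so all 5000 observations are matched. I trust the ATT estimate more, because the focus should be on the subjects who received treatment and on finding someone similar to them to match on. Because there are far fewer treated units than controls, the ATE procedure has to find treated matches for many control units, which can skew the results.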

summary(rr_Q3_est)
## 
## Estimate...  0.087135 
## AI SE......  0.015649 
## T-stat.....  5.5682 
## p.val......  2.5736e-08 
## 
## Original number of observations..............  5000 
## Original number of treated obs...............  762 
## Matched number of observations...............  762 
## Matched number of observations  (unweighted).  2240
summary(rr_Q3_4)
## 
## Estimate...  0.1316 
## AI SE......  0.017808 
## T-stat.....  7.3896 
## p.val......  1.4722e-13 
## 
## Original number of observations..............  5000 
## Original number of treated obs...............  762 
## Matched number of observations...............  5000 
## Matched number of observations  (unweighted).  7726
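
The difference between the two estimands can be sketched with a hand-rolled one-covariate nearest-neighbor matcher. This is a base-R toy on made-up numbers, not the procedure Match() runs; the function nn_effect and the data are hypothetical. It shows how ATE matching adds a treated match for every control, including controls that have no close treated counterpart.

```r
# Hypothetical helper: 1-nearest-neighbor matching on a single covariate x
nn_effect <- function(y, tr, x, estimand = c("ATT", "ATE")) {
  estimand <- match.arg(estimand)
  treated <- which(tr == 1)
  control <- which(tr == 0)
  nearest <- function(i, pool) pool[which.min(abs(x[pool] - x[i]))]

  # Each treated unit minus its nearest control
  eff_t <- sapply(treated, function(i) y[i] - y[nearest(i, control)])
  if (estimand == "ATT") return(mean(eff_t))

  # ATE also imputes a treated outcome for every control unit
  eff_c <- sapply(control, function(j) y[nearest(j, treated)] - y[j])
  mean(c(eff_t, eff_c))
}

# Toy data (made up): units 5 and 6 are controls far from any treated unit
x  <- c(1.0, 2.0, 1.2, 2.3, 5.0, 6.0)
tr <- c(1,   1,   0,   0,   0,   0)
y  <- c(3,   4,   1,   2,   5,   6)

att_toy <- nn_effect(y, tr, x, "ATT")
ate_toy <- nn_effect(y, tr, x, "ATE")
print(c(ATT = att_toy, ATE = ate_toy))
```

In this toy, the ATT uses only the two close treated-control pairs, while the ATE also averages in the poorly matched controls at x = 5 and x = 6, which pulls the estimate away from the ATT.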

BONUS QUESTION: Sensitivity Analysis

STEP 1

Use the BRFSS data set from the previous question. Now, identify the critical value of Gamma as we discussed in the class. Do it using rbounds: https://nbviewer.jupyter.org/gist/viniciusmss/a156c3f22081fb5c690cdd58658f61fa

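# Your code here
library(rbounds)
# Default Match() call (estimand defaults to "ATT"); this object is not used below
rr_Q3_step5 <- Match(Y = brfss$copd, Tr = brfss$smoke, X = X_Q3)
# Rosenbaum bounds on the Step 2 ATT match; rows 150 to 160 cover Gamma = 2.49 to 2.59
psens(rr_Q3_est, Gamma = 3.5, GammaInc = .01)$bounds[150:160, ]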

STEP 2

Then, write a paragraph explaining what you found. Your paragraph should include numbers obtained from your analysis and explain what those numbers mean as simply as you can.

The idea of sensitivity analysis is to check how strongly an unobserved covariate would have to affect treatment assignment in order to change the conclusions of the study. In this case, a Gamma above about 2.55 makes the results insignificant. Gamma bounds how much the odds of receiving treatment can differ between two matched subjects because of an unobserved covariate: Gamma = 1 means matched subjects are equally likely to be treated, and Gamma = 2 means one subject could have up to twice the odds of receiving treatment as its match. Because a Gamma of more than 2.5 is needed before the p-value becomes too large to reject the null hypothesis, the results are somewhat robust, although this critical Gamma is not very high.
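
To show where a critical Gamma comes from, the sketch below computes a Rosenbaum-style upper bound on a p-value in base R. It is an illustration under stated assumptions: the counts are hypothetical and not taken from the BRFSS analysis, and it uses a simple sign-test (McNemar-type) bound for a binary outcome, whereas psens() bases its bounds on the Wilcoxon signed rank test. The names upper_p and critical_gamma are made up for this example.

```r
# Hypothetical counts, NOT taken from the BRFSS analysis
n_discordant  <- 100  # matched pairs whose two members differ on the outcome
n_treated_yes <- 70   # discordant pairs in which the treated unit has the outcome

# Upper bound on the one-sided sign-test p-value for a given Gamma:
# under hidden bias Gamma, the chance that the treated unit is the "yes"
# member of a discordant pair is at most Gamma / (1 + Gamma)
upper_p <- function(gamma) {
  p_plus <- gamma / (1 + gamma)
  pbinom(n_treated_yes - 1, n_discordant, p_plus, lower.tail = FALSE)
}

gammas  <- seq(1, 3, by = 0.01)
p_upper <- sapply(gammas, upper_p)

# Smallest Gamma on the grid at which the upper bound exceeds 0.05
critical_gamma <- min(gammas[p_upper > 0.05])
print(critical_gamma)
```

At Gamma = 1 the bound equals the ordinary p-value; as Gamma grows the bound rises, and the Gamma at which it crosses 0.05 plays the role of the critical value discussed in the paragraph above.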

End of Assignment

Final Steps

Before finalizing your project, you’ll want to be sure there are comments in your code chunks and text outside of your code chunks to explain what you’re doing in each code chunk. These explanations are incredibly helpful for someone who doesn’t code or someone unfamiliar with your project.

You have two options for submission:

  1. You can complete this .rmd file, knit it to pdf and submit the resulting .pdf file on Forum.
  2. You can complete the Google Doc version of this assignment, include your code, graphs, results, and your explanations wherever necessary, download the Google Doc as a pdf file, and submit the pdf file on Forum. If you choose this method, make sure you provide a link to an .R script file where your code can be found (you can host your code on Github or Google Drive). Note that links to Google Docs are not accepted as your final submission.

Knitting your R Markdown Document

Last but not least, you’ll want to Knit your .Rmd document into a pdf document. If you get an error, take a look at what the error says and edit your .Rmd document. Then, try to Knit again! Troubleshooting these error messages will teach you a lot about coding in R. If you get any error that doesn’t make sense to you, post it on Perusall.

Good Luck! The CS112 Teaching Team