foo <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSY9jLlufY1GjeMh7D2_g1m6olveHLNCerT2C36MTkcjwQCOlZYf8evLMzGOnc252OgXEEasHqcNIcZ/pub?gid=1976818127&single=true&output=csv")
reg1 <- lm(nowtot ~ hasgirls +Dems +Repubs + Christian + age + srvlng + demvote, foo)
summary(reg1)
##
## Call:
## lm(formula = nowtot ~ hasgirls + Dems + Repubs + Christian +
## age + srvlng + demvote, data = foo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.028 -10.322 -1.517 11.208 69.642
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.6991 18.6306 2.077 0.038390 *
## hasgirls -0.4523 1.9036 -0.238 0.812322
## Dems -8.1022 17.5861 -0.461 0.645238
## Repubs -55.1069 17.6340 -3.125 0.001901 **
## Christian -13.3961 3.7218 -3.599 0.000357 ***
## age 0.1260 0.1117 1.128 0.259938
## srvlng -0.2251 0.1355 -1.662 0.097349 .
## demvote 87.5501 8.4847 10.319 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.19 on 422 degrees of freedom
## Multiple R-squared: 0.7821, Adjusted R-squared: 0.7784
## F-statistic: 216.3 on 7 and 422 DF, p-value: < 2.2e-16
The estimated coefficient on hasgirls is -0.4523, with a standard error of 1.9036. The 95% confidence interval is therefore approximately -4.18 to 3.28, which includes zero, so the effect is not statistically significant (p = 0.81). Note that this estimate adjusts for confounders only through the linear regression specification and does not ensure that the covariate distributions are balanced between the two groups. To address this, we introduce matching in the section below.
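The interval can be reproduced by hand from the hasgirls row of the coefficient table above (estimate -0.4523, standard error 1.9036, 422 residual degrees of freedom). The following is a minimal sketch; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the hasgirls row of the regression output above
est <- -0.4523   # coefficient estimate
se  <- 1.9036    # standard error
df  <- 422       # residual degrees of freedom

# Normal approximation (critical value of about 1.96)
ci_normal <- est + c(-1, 1) * qnorm(0.975) * se

# t-based interval, which is what confint() reports for an lm fit
ci_t <- est + c(-1, 1) * qt(0.975, df) * se

print(round(ci_normal, 3))
print(round(ci_t, 3))
```

With the fitted model in memory, `confint(reg1, "hasgirls")` should return the t-based interval directly.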
library(ggplot2)
library(gridExtra)

# Density of one covariate, split by treatment status (hasgirls)
plot_density <- function(var, label = var) {
  ggplot(foo, aes(x = .data[[var]], fill = factor(hasgirls))) +
    geom_density(alpha = 0.5) +
    ggtitle(paste("Distribution of", label, "by hasgirls")) +
    theme_minimal() +
    scale_fill_manual(values = c("blue", "red"))
}

# plot the distribution
plot1 <- plot_density("Dems")
plot2 <- plot_density("Repubs")
plot3 <- plot_density("Christian")
plot4 <- plot_density("age", "Age")
plot5 <- plot_density("srvlng")
plot6 <- plot_density("demvote", "Demvote")

# layout
grid.arrange(plot1, plot2, plot3, plot4, plot5, plot6, ncol = 2)
library(Matching)
## Warning: package 'Matching' was built under R version 4.3.3
## Loading required package: MASS
## ##
## ## Matching (Version 4.10-15, Build Date: 2024-10-14)
## ## See https://www.jsekhon.com for additional documentation.
## ## Please cite software as:
## ## Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ## Software with Automated Balance Optimization: The Matching package for R.''
## ## Journal of Statistical Software, 42(7): 1-52.
## ##
## Match on the confounders below...
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr <- foo$hasgirls
Y <- foo$nowtot
genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=3, pop.size=16, max.generations=10, wait.generations=1)
## Loading required namespace: rgenoud
##
##
## Mon Dec 16 19:59:57 2024
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 5.240188e-03 4.909477e-02 8.960232e-02 1.964681e-01 1.964681e-01 3.173124e-01 3.173124e-01 3.310551e-01 3.617439e-01 5.639124e-01 5.639124e-01 5.939293e-01
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 4.089335e+02
## mean............ 4.771542e+02
## variance........ 9.306529e+04
## var 2:
## best............ 6.233589e+02
## mean............ 3.744562e+02
## variance........ 7.007394e+04
## var 3:
## best............ 3.035372e+02
## mean............ 3.446113e+02
## variance........ 5.371188e+04
## var 4:
## best............ 9.650124e+02
## mean............ 5.201913e+02
## variance........ 1.315297e+05
## var 5:
## best............ 5.548271e+02
## mean............ 5.053687e+02
## variance........ 6.839926e+04
## var 6:
## best............ 2.126889e+02
## mean............ 5.184861e+02
## variance........ 1.012202e+05
##
## GENERATION: 1
## Lexical Fit..... 2.030087e-02 2.212342e-02 1.545841e-01 1.964681e-01 1.964681e-01 3.395078e-01 4.143426e-01 4.143426e-01 5.241463e-01 5.310622e-01 5.639124e-01 5.639124e-01
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 4.089335e+02
## mean............ 4.417691e+02
## variance........ 8.281992e+04
## var 2:
## best............ 6.233589e+02
## mean............ 4.714460e+02
## variance........ 3.541329e+04
## var 3:
## best............ 3.035372e+02
## mean............ 3.289821e+02
## variance........ 2.192679e+04
## var 4:
## best............ 9.650124e+02
## mean............ 9.083194e+02
## variance........ 1.290739e+04
## var 5:
## best............ 2.552607e+02
## mean............ 4.524836e+02
## variance........ 1.316494e+04
## var 6:
## best............ 2.126889e+02
## mean............ 2.783973e+02
## variance........ 2.228998e+04
##
## GENERATION: 2
## Lexical Fit..... 2.030087e-02 2.212342e-02 1.545841e-01 1.964681e-01 1.964681e-01 3.395078e-01 4.143426e-01 4.143426e-01 5.241463e-01 5.310622e-01 5.639124e-01 5.639124e-01
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 4.089335e+02
## mean............ 2.851074e+02
## variance........ 1.371594e+04
## var 2:
## best............ 6.233589e+02
## mean............ 4.156116e+02
## variance........ 3.590132e+04
## var 3:
## best............ 3.035372e+02
## mean............ 4.107470e+02
## variance........ 1.294899e+04
## var 4:
## best............ 9.650124e+02
## mean............ 9.451768e+02
## variance........ 4.937924e+03
## var 5:
## best............ 2.552607e+02
## mean............ 3.290438e+02
## variance........ 6.110964e+03
## var 6:
## best............ 2.126889e+02
## mean............ 2.718720e+02
## variance........ 1.026979e+04
##
## GENERATION: 3
## Lexical Fit..... 2.212342e-02 2.376619e-02 1.249280e-01 1.262261e-01 1.262261e-01 3.179272e-01 3.458234e-01 3.458234e-01 5.684948e-01 5.687506e-01 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 54
## var 1:
## best............ 3.721656e+02
## mean............ 3.867065e+02
## variance........ 8.492127e+03
## var 2:
## best............ 5.762529e+02
## mean............ 5.612998e+02
## variance........ 7.507412e+03
## var 3:
## best............ 8.139170e+02
## mean............ 3.597086e+02
## variance........ 2.103786e+04
## var 4:
## best............ 9.197144e+02
## mean............ 8.330766e+02
## variance........ 6.674020e+04
## var 5:
## best............ 2.413626e+02
## mean............ 2.785779e+02
## variance........ 7.565773e+03
## var 6:
## best............ 2.084056e+02
## mean............ 2.228478e+02
## variance........ 5.515775e+02
##
## GENERATION: 4
## Lexical Fit..... 2.748641e-02 3.770192e-02 1.541630e-01 1.964681e-01 1.964681e-01 4.143426e-01 4.143426e-01 4.405624e-01 5.639124e-01 5.639124e-01 6.604038e-01 8.314274e-01
## #unique......... 12, #Total UniqueCount: 66
## var 1:
## best............ 3.811564e+02
## mean............ 4.024830e+02
## variance........ 2.164124e+03
## var 2:
## best............ 6.136957e+02
## mean............ 6.017425e+02
## variance........ 3.291897e+02
## var 3:
## best............ 3.034552e+02
## mean............ 4.319642e+02
## variance........ 4.462998e+04
## var 4:
## best............ 9.608901e+02
## mean............ 9.181486e+02
## variance........ 1.471925e+04
## var 5:
## best............ 2.110031e+02
## mean............ 2.340237e+02
## variance........ 1.798344e+02
## var 6:
## best............ 2.321491e+02
## mean............ 2.340856e+02
## variance........ 4.435305e+03
##
## GENERATION: 5
## Lexical Fit..... 2.748641e-02 3.770192e-02 1.541630e-01 1.964681e-01 1.964681e-01 4.143426e-01 4.143426e-01 4.405624e-01 5.639124e-01 5.639124e-01 6.604038e-01 8.314274e-01
## #unique......... 13, #Total UniqueCount: 79
## var 1:
## best............ 3.811564e+02
## mean............ 3.822435e+02
## variance........ 2.813281e+01
## var 2:
## best............ 6.136957e+02
## mean............ 6.105143e+02
## variance........ 1.158756e+03
## var 3:
## best............ 3.034552e+02
## mean............ 4.985424e+02
## variance........ 5.999101e+04
## var 4:
## best............ 9.608901e+02
## mean............ 9.379574e+02
## variance........ 2.944820e+03
## var 5:
## best............ 2.110031e+02
## mean............ 2.219688e+02
## variance........ 1.606180e+02
## var 6:
## best............ 2.321491e+02
## mean............ 2.247934e+02
## variance........ 1.024363e+02
##
## GENERATION: 6
## Lexical Fit..... 4.780984e-02 1.119439e-01 1.896228e-01 1.964681e-01 1.964681e-01 2.533796e-01 2.902986e-01 3.146097e-01 4.143426e-01 4.143426e-01 5.639124e-01 5.639124e-01
## #unique......... 12, #Total UniqueCount: 91
## var 1:
## best............ 3.811564e+02
## mean............ 3.807309e+02
## variance........ 7.403831e-01
## var 2:
## best............ 6.136957e+02
## mean............ 6.145495e+02
## variance........ 6.960878e+00
## var 3:
## best............ 3.034552e+02
## mean............ 3.181222e+02
## variance........ 3.573759e+03
## var 4:
## best............ 9.608901e+02
## mean............ 9.005358e+02
## variance........ 2.660876e+04
## var 5:
## best............ 2.110031e+02
## mean............ 2.108401e+02
## variance........ 3.938060e+00
## var 6:
## best............ 7.198967e+00
## mean............ 2.182949e+02
## variance........ 2.973523e+03
##
## GENERATION: 7
## Lexical Fit..... 4.780984e-02 1.119439e-01 1.896228e-01 1.964681e-01 1.964681e-01 2.533796e-01 2.902986e-01 3.146097e-01 4.143426e-01 4.143426e-01 5.639124e-01 5.639124e-01
## #unique......... 12, #Total UniqueCount: 103
## var 1:
## best............ 3.811564e+02
## mean............ 3.760019e+02
## variance........ 2.896642e+02
## var 2:
## best............ 6.136957e+02
## mean............ 5.984221e+02
## variance........ 5.161668e+03
## var 3:
## best............ 3.034552e+02
## mean............ 3.800547e+02
## variance........ 7.889893e+03
## var 4:
## best............ 9.608901e+02
## mean............ 9.550376e+02
## variance........ 3.290507e+02
## var 5:
## best............ 2.110031e+02
## mean............ 2.110902e+02
## variance........ 6.586819e-02
## var 6:
## best............ 7.198967e+00
## mean............ 1.369670e+02
## variance........ 1.096155e+04
##
## GENERATION: 8
## Lexical Fit..... 6.102335e-02 1.119439e-01 1.262261e-01 1.262261e-01 1.896228e-01 2.533796e-01 2.622157e-01 3.458234e-01 3.458234e-01 3.484983e-01 1.000000e+00 1.000000e+00
## #unique......... 10, #Total UniqueCount: 113
## var 1:
## best............ 3.809839e+02
## mean............ 3.773696e+02
## variance........ 2.073871e+02
## var 2:
## best............ 6.136533e+02
## mean............ 6.136811e+02
## variance........ 4.067119e-04
## var 3:
## best............ 7.640723e+02
## mean............ 3.270495e+02
## variance........ 1.313160e+04
## var 4:
## best............ 9.608939e+02
## mean............ 9.388585e+02
## variance........ 7.261030e+03
## var 5:
## best............ 2.111863e+02
## mean............ 2.200837e+02
## variance........ 1.223702e+03
## var 6:
## best............ 7.208276e+00
## mean............ 6.270837e+01
## variance........ 4.621339e+04
##
## GENERATION: 9
## Lexical Fit..... 6.102335e-02 1.119439e-01 1.262261e-01 1.262261e-01 1.896228e-01 2.533796e-01 2.622157e-01 3.458234e-01 3.458234e-01 3.484983e-01 1.000000e+00 1.000000e+00
## #unique......... 10, #Total UniqueCount: 123
## var 1:
## best............ 3.809839e+02
## mean............ 3.876317e+02
## variance........ 6.532461e+02
## var 2:
## best............ 6.136533e+02
## mean............ 6.136642e+02
## variance........ 4.141544e-04
## var 3:
## best............ 7.640723e+02
## mean............ 5.872352e+02
## variance........ 5.522774e+04
## var 4:
## best............ 9.608939e+02
## mean............ 9.522263e+02
## variance........ 1.126474e+03
## var 5:
## best............ 2.111863e+02
## mean............ 2.111255e+02
## variance........ 1.195736e-02
## var 6:
## best............ 7.208276e+00
## mean............ 3.666885e+01
## variance........ 6.121125e+03
##
## GENERATION: 10
## Lexical Fit..... 6.102335e-02 1.119439e-01 1.262261e-01 1.262261e-01 1.896228e-01 2.533796e-01 2.622157e-01 3.458234e-01 3.458234e-01 3.484983e-01 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 136
## var 1:
## best............ 3.809839e+02
## mean............ 3.809951e+02
## variance........ 1.471041e-03
## var 2:
## best............ 6.136533e+02
## mean............ 5.903179e+02
## variance........ 9.839882e+03
## var 3:
## best............ 7.640723e+02
## mean............ 7.256049e+02
## variance........ 1.513418e+04
## var 4:
## best............ 9.608939e+02
## mean............ 9.375856e+02
## variance........ 8.144992e+03
## var 5:
## best............ 2.111863e+02
## mean............ 2.220140e+02
## variance........ 1.759337e+03
## var 6:
## best............ 7.208276e+00
## mean............ 7.208878e+00
## variance........ 1.357114e-05
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 6.102335e-02 1.119439e-01 1.262261e-01 1.262261e-01 1.896228e-01 2.533796e-01 2.622157e-01 3.458234e-01 3.458234e-01 3.484983e-01 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 3.809839e+02
## X[ 2] : 6.136533e+02
## X[ 3] : 7.640723e+02
## X[ 4] : 9.608939e+02
## X[ 5] : 2.111863e+02
## X[ 6] : 7.208276e+00
##
## Solution Found Generation 8
## Number of Generations Run 10
##
## Mon Dec 16 19:59:58 2024
## Total run time : 0 hours 0 minutes and 1 seconds
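# NOTE: genout was optimized for estimand = "ATT" with M = 3, but this call uses
# estimand = "ATE" with the default M = 1. No outcome Y is supplied, so only the
# matching is performed and the summary below reports an estimate of 0.
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)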
summary(mout)
##
## Estimate... 0
## SE......... 0
## T-stat..... NaN
## p.val...... NA
##
## Original number of observations.............. 430
## Original number of treated obs............... 312
## Matched number of observations............... 430
## Matched number of observations (unweighted). 434
mb <- MatchBalance(
hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
match.out = mout, nboots=500, data = foo)
##
## ***** (V1) Dems *****
## Before Matching After Matching
## mean treatment........ 0.45833 0.47209
## mean control.......... 0.50847 0.47674
## std mean diff......... -10.047 -0.9306
##
## mean raw eQQ diff..... 0.050847 0.0046083
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.025071 0.0023041
## med eCDF diff........ 0.025071 0.0023041
## max eCDF diff........ 0.050141 0.0046083
##
## var ratio (Tr/Co)..... 0.98809 0.99905
## T-test p-value........ 0.35571 0.15706
##
##
## ***** (V2) Repubs *****
## Before Matching After Matching
## mean treatment........ 0.53846 0.52558
## mean control.......... 0.49153 0.52326
## std mean diff......... 9.4 0.46518
##
## mean raw eQQ diff..... 0.042373 0.0023041
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.023468 0.0011521
## med eCDF diff........ 0.023468 0.0011521
## max eCDF diff........ 0.046936 0.0023041
##
## var ratio (Tr/Co)..... 0.98911 0.99954
## T-test p-value........ 0.3873 0.31731
##
##
## ***** (V3) Christian *****
## Before Matching After Matching
## mean treatment........ 0.9391 0.94186
## mean control.......... 0.94915 0.94186
## std mean diff......... -4.1958 0
##
## mean raw eQQ diff..... 0.016949 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.005025 0
## med eCDF diff........ 0.005025 0
## max eCDF diff........ 0.01005 0
##
## var ratio (Tr/Co)..... 1.1787 1
## T-test p-value........ 0.68107 1
##
##
## ***** (V4) age *****
## Before Matching After Matching
## mean treatment........ 52.628 51.733
## mean control.......... 49.178 51.621
## std mean diff......... 38.385 1.2089
##
## mean raw eQQ diff..... 3.661 0.47926
## med raw eQQ diff..... 4 0
## max raw eQQ diff..... 7 7
##
## mean eCDF diff........ 0.075348 0.0090726
## med eCDF diff........ 0.075538 0.0069124
## max eCDF diff........ 0.17807 0.032258
##
## var ratio (Tr/Co)..... 0.71552 0.94853
## T-test p-value........ 0.0020402 0.22856
## KS Bootstrap p-value.. < 2.22e-16 0.91
## KS Naive p-value...... 0.0087659 0.97764
## KS Statistic.......... 0.17807 0.032258
##
##
## ***** (V5) srvlng *****
## Before Matching After Matching
## mean treatment........ 8.5865 8.5326
## mean control.......... 8.7458 8.6326
## std mean diff......... -2.1085 -1.3244
##
## mean raw eQQ diff..... 0.66949 0.40783
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 5 5
##
## mean eCDF diff........ 0.017181 0.010923
## med eCDF diff........ 0.01445 0.0069124
## max eCDF diff........ 0.051608 0.032258
##
## var ratio (Tr/Co)..... 0.77347 0.92055
## T-test p-value........ 0.85956 0.46051
## KS Bootstrap p-value.. 0.768 0.842
## KS Naive p-value...... 0.97653 0.97764
## KS Statistic.......... 0.051608 0.032258
##
##
## ***** (V6) demvote *****
## Before Matching After Matching
## mean treatment........ 0.49929 0.49953
## mean control.......... 0.50602 0.50884
## std mean diff......... -5.2747 -7.5845
##
## mean raw eQQ diff..... 0.011441 0.012488
## med raw eQQ diff..... 0.01 0.01
## max raw eQQ diff..... 0.08 0.08
##
## mean eCDF diff........ 0.015928 0.019092
## med eCDF diff........ 0.010811 0.013825
## max eCDF diff........ 0.048512 0.052995
##
## var ratio (Tr/Co)..... 1.1269 1.0179
## T-test p-value........ 0.61103 0.074488
## KS Bootstrap p-value.. 0.928 0.468
## KS Naive p-value...... 0.98776 0.57589
## KS Statistic.......... 0.048512 0.052995
##
##
## Before Matching Minimum p.value: < 2.22e-16
## Variable Name(s): age Number(s): 4
##
## After Matching Minimum p.value: 0.074488
## Variable Name(s): demvote Number(s): 6
After applying the matching method, we observed an improvement in balance for most covariates. For example, the means of Dems in the treatment and control groups were 0.45833 and 0.50847 before matching; after matching they became 0.47209 and 0.47674, and the standardized mean difference shrank from -10.047 to -0.9306. For Christian, the group means became identical (0.94186) and the T-test p-value increased from 0.68107 to 1. The largest remaining imbalance is in demvote, whose after-matching T-test p-value of 0.074488 is the minimum across covariates. Overall, the covariate distributions of the two groups became more similar, making it more reasonable to draw potential causal inferences from this analysis.
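To make the "std mean diff" rows in the balance output easier to interpret, here is a minimal sketch of a standardized mean difference on toy data. The definition used (difference in means divided by the treated-group standard deviation, times 100) is an assumption that happens to reproduce the before-matching value for Dems; the function name and the data are hypothetical, and the sketch ignores the matching weights that apply after matching.

```r
# Standardized mean difference in percent, scaled by the treated-group SD
# (assumed definition; unweighted, so it only mirrors the "Before Matching" column)
std_mean_diff <- function(x, tr) {
  x_tr <- x[tr == 1]
  x_co <- x[tr == 0]
  100 * (mean(x_tr) - mean(x_co)) / sd(x_tr)
}

# Toy data (hypothetical): four treated and four control units
x  <- c(1, 2, 3, 6, 1, 2, 2, 3)
tr <- c(1, 1, 1, 1, 0, 0, 0, 0)

smd <- std_mean_diff(x, tr)
print(smd)
```

On the real data, a call such as `std_mean_diff(foo$Dems, foo$hasgirls)` would be the analogous check for a single covariate.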
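# NOTE: this call does not pass Weight.matrix = genout, so the estimate below uses
# Match()'s default covariate weighting (and default ATT estimand) rather than
# the weights found by GenMatch.
After_genmatch <- Match(Y = Y, Tr=Tr, X=X, M=3)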
summary(After_genmatch)
##
## Estimate... -0.0013355
## AI SE...... 1.9563
## T-stat..... -0.00068264
## p.val...... 0.99946
##
## Original number of observations.............. 430
## Original number of treated obs............... 312
## Matched number of observations............... 312
## Matched number of observations (unweighted). 938
Based on the analysis, the estimated treatment effect is -0.0013355, with a standard error of 1.9563. The 95% confidence interval is therefore approximately -3.835684 to 3.833012. Because the interval includes zero (p = 0.99946), the data provide no evidence that having girls has a significant effect on voting behavior as measured by nowtot.
It is also worth noting that the matching standard error (1.9563) is close to that of the hasgirls coefficient in the regression model (1.9036), so matching did not materially change the uncertainty of the estimate.
#filter data
treatment_group <- subset(foo, ngirls == 2 & nboys == 0)
control_group <- subset(foo, nboys == 2 & ngirls == 0)
filtered_data <- rbind(treatment_group, control_group)
#head(filtered_data)
## Match on the confounders below...
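foo <- filtered_data
# Regression on the two-girls vs. two-boys subset
reg2 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)
# NOTE: the call below prints reg1 (the full-sample model fitted earlier), so the
# output that follows repeats the first regression; summary(reg2) shows the subset fit.
summary(reg1)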
##
## Call:
## lm(formula = nowtot ~ hasgirls + Dems + Repubs + Christian +
## age + srvlng + demvote, data = foo)
##
## Residuals:
## Min 1Q Median 3Q Max
## -56.028 -10.322 -1.517 11.208 69.642
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.6991 18.6306 2.077 0.038390 *
## hasgirls -0.4523 1.9036 -0.238 0.812322
## Dems -8.1022 17.5861 -0.461 0.645238
## Repubs -55.1069 17.6340 -3.125 0.001901 **
## Christian -13.3961 3.7218 -3.599 0.000357 ***
## age 0.1260 0.1117 1.128 0.259938
## srvlng -0.2251 0.1355 -1.662 0.097349 .
## demvote 87.5501 8.4847 10.319 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.19 on 422 degrees of freedom
## Multiple R-squared: 0.7821, Adjusted R-squared: 0.7784
## F-statistic: 216.3 on 7 and 422 DF, p-value: < 2.2e-16
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr <- foo$hasgirls
Y <- foo$nowtot
genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=2, pop.size=16, max.generations=10, wait.generations=1)
##
##
## Mon Dec 16 19:59:58 2024
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.463540e-01 2.753149e-01 2.944328e-01 3.295378e-01 3.861067e-01 6.226498e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 8.154814e+02
## mean............ 5.546160e+02
## variance........ 7.597385e+04
## var 2:
## best............ 6.177120e+02
## mean............ 4.747410e+02
## variance........ 4.177031e+04
## var 3:
## best............ 9.321378e+02
## mean............ 4.587515e+02
## variance........ 7.731167e+04
## var 4:
## best............ 7.666736e+00
## mean............ 3.968721e+02
## variance........ 9.574306e+04
## var 5:
## best............ 3.048386e+02
## mean............ 4.648337e+02
## variance........ 1.317992e+05
## var 6:
## best............ 7.590850e+02
## mean............ 3.857127e+02
## variance........ 5.057271e+04
##
## GENERATION: 1
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.463540e-01 2.753149e-01 2.944328e-01 3.295378e-01 3.861067e-01 6.226498e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 8.154814e+02
## mean............ 7.759390e+02
## variance........ 4.928724e+03
## var 2:
## best............ 6.177120e+02
## mean............ 4.977584e+02
## variance........ 1.058235e+04
## var 3:
## best............ 9.321378e+02
## mean............ 6.283226e+02
## variance........ 9.227468e+04
## var 4:
## best............ 7.666736e+00
## mean............ 1.168220e+02
## variance........ 1.205825e+04
## var 5:
## best............ 3.048386e+02
## mean............ 3.712163e+02
## variance........ 1.130965e+05
## var 6:
## best............ 7.590850e+02
## mean............ 5.619880e+02
## variance........ 6.564955e+04
##
## GENERATION: 2
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.463540e-01 2.753149e-01 2.944328e-01 3.295378e-01 3.861067e-01 6.226498e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 11, #Total UniqueCount: 40
## var 1:
## best............ 8.154814e+02
## mean............ 7.984996e+02
## variance........ 7.521928e+03
## var 2:
## best............ 6.177120e+02
## mean............ 6.237730e+02
## variance........ 7.316444e+02
## var 3:
## best............ 9.321378e+02
## mean............ 9.207833e+02
## variance........ 1.919584e+03
## var 4:
## best............ 7.666736e+00
## mean............ 1.048392e+01
## variance........ 1.270550e+02
## var 5:
## best............ 3.048386e+02
## mean............ 3.218381e+02
## variance........ 5.475228e+03
## var 6:
## best............ 7.590850e+02
## mean............ 6.892854e+02
## variance........ 3.218285e+04
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 7.836441e-02 7.836441e-02 2.463540e-01 2.753149e-01 2.944328e-01 3.295378e-01 3.861067e-01 6.226498e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 8.154814e+02
## X[ 2] : 6.177120e+02
## X[ 3] : 9.321378e+02
## X[ 4] : 7.666736e+00
## X[ 5] : 3.048386e+02
## X[ 6] : 7.590850e+02
##
## Solution Found Generation 1
## Number of Generations Run 2
##
## Mon Dec 16 19:59:58 2024
## Total run time : 0 hours 0 minutes and 0 seconds
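# NOTE: genout was fitted with estimand = "ATT" and M = 2, while this
# balance-check match uses estimand = "ATE" and the default M = 1.
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)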
summary(mout)
##
## Estimate... 0
## SE......... 0
## T-stat..... NaN
## p.val...... NA
##
## Original number of observations.............. 59
## Original number of treated obs............... 31
## Matched number of observations............... 59
## Matched number of observations (unweighted). 59
mb <- MatchBalance(
hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
match.out = mout, nboots=500, data = foo)
##
## ***** (V1) Dems *****
## Before Matching After Matching
## mean treatment........ 0.64516 0.54237
## mean control.......... 0.42857 0.54237
## std mean diff......... 44.532 0
##
## mean raw eQQ diff..... 0.21429 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.10829 0
## med eCDF diff........ 0.10829 0
## max eCDF diff........ 0.21659 0
##
## var ratio (Tr/Co)..... 0.93145 1
## T-test p-value........ 0.099329 1
##
##
## ***** (V2) Repubs *****
## Before Matching After Matching
## mean treatment........ 0.35484 0.45763
## mean control.......... 0.57143 0.45763
## std mean diff......... -44.532 0
##
## mean raw eQQ diff..... 0.21429 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.10829 0
## med eCDF diff........ 0.10829 0
## max eCDF diff........ 0.21659 0
##
## var ratio (Tr/Co)..... 0.93145 1
## T-test p-value........ 0.099329 1
##
##
## ***** (V3) Christian *****
## Before Matching After Matching
## mean treatment........ 0.90323 0.94915
## mean control.......... 1 1
## std mean diff......... -32.2 -22.949
##
## mean raw eQQ diff..... 0.10714 0.050847
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.048387 0.025424
## med eCDF diff........ 0.048387 0.025424
## max eCDF diff........ 0.096774 0.050847
##
## var ratio (Tr/Co)..... Inf Inf
## T-test p-value........ 0.083087 0.080673
##
##
## ***** (V4) age *****
## Before Matching After Matching
## mean treatment........ 48.226 48.424
## mean control.......... 49.857 47.593
## std mean diff......... -19.026 10.29
##
## mean raw eQQ diff..... 2.4643 1.0678
## med raw eQQ diff..... 2.5 0
## max raw eQQ diff..... 5 8
##
## mean eCDF diff........ 0.061382 0.027797
## med eCDF diff........ 0.051843 0.016949
## max eCDF diff........ 0.13479 0.084746
##
## var ratio (Tr/Co)..... 1.0028 1.1312
## T-test p-value........ 0.46822 0.4593
## KS Bootstrap p-value.. 0.79 0.928
## KS Naive p-value...... 0.80669 0.90803
## KS Statistic.......... 0.13479 0.084746
##
##
## ***** (V5) srvlng *****
## Before Matching After Matching
## mean treatment........ 7.5484 7.8475
## mean control.......... 9.6071 8.4407
## std mean diff......... -28.926 -8.1437
##
## mean raw eQQ diff..... 2.4286 1
## med raw eQQ diff..... 1 0
## max raw eQQ diff..... 10 8
##
## mean eCDF diff........ 0.066172 0.03072
## med eCDF diff........ 0.05818 0.033898
## max eCDF diff........ 0.17051 0.084746
##
## var ratio (Tr/Co)..... 0.60661 0.74509
## T-test p-value........ 0.34249 0.17321
## KS Bootstrap p-value.. 0.476 0.83
## KS Naive p-value...... 0.49336 0.80634
## KS Statistic.......... 0.17051 0.084746
##
##
## ***** (V6) demvote *****
## Before Matching After Matching
## mean treatment........ 0.52677 0.51119
## mean control.......... 0.50714 0.51576
## std mean diff......... 15.554 -3.6245
##
## mean raw eQQ diff..... 0.05 0.022542
## med raw eQQ diff..... 0.05 0.02
## max raw eQQ diff..... 0.12 0.08
##
## mean eCDF diff........ 0.10108 0.041874
## med eCDF diff........ 0.066244 0.033898
## max eCDF diff........ 0.29493 0.10169
##
## var ratio (Tr/Co)..... 0.88501 1.0516
## T-test p-value........ 0.56612 0.45914
## KS Bootstrap p-value.. 0.13 0.798
## KS Naive p-value...... 0.099395 0.8412
## KS Statistic.......... 0.29493 0.10169
##
##
## Before Matching Minimum p.value: 0.083087
## Variable Name(s): Christian Number(s): 3
##
## After Matching Minimum p.value: 0.080673
## Variable Name(s): Christian Number(s): 3
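# NOTE: Weight.matrix = genout is not passed here, so the estimate below
# uses Match()'s default covariate weighting.
After_genmatch <- Match(Y = Y, Tr=Tr, X=X, M=2)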
summary(After_genmatch)
##
## Estimate... 15.484
## AI SE...... 5.1617
## T-stat..... 2.9998
## p.val...... 0.0027017
##
## Original number of observations.............. 59
## Original number of treated obs............... 31
## Matched number of observations............... 31
## Matched number of observations (unweighted). 62
Based on the analysis, the estimated treatment effect is 15.484, with a standard error of 5.1617. The 95% confidence interval is therefore approximately 5.367068 to 25.60093. We can conclude from the data that in the “high dose” scenario, having two girls rather than two boys has a statistically significant positive effect on the outcome variable (p = 0.0027).
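The T-stat, p-value and confidence interval reported for the two matching estimates can be checked from the estimate and standard error alone. The sketch below assumes a normal (Wald) approximation, which appears consistent with the printed summaries; `wald_summary` is a hypothetical helper, not a function from the Matching package.

```r
# Wald z-statistic, two-sided p-value and confidence interval
# (hypothetical helper; assumes a normal approximation)
wald_summary <- function(est, se, level = 0.95) {
  z    <- est / se
  crit <- qnorm(1 - (1 - level) / 2)
  c(z = z,
    p = 2 * pnorm(-abs(z)),
    lower = est - crit * se,
    upper = est + crit * se)
}

# Estimates and AI standard errors copied from the two Match() summaries
full_sample <- wald_summary(-0.0013355, 1.9563)
high_dose   <- wald_summary(15.484, 5.1617)

print(round(rbind(full_sample, high_dose), 4))
```

The small differences from the interval quoted in the text come from using `qnorm(0.975)` rather than the rounded critical value 1.96.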
hasgirls and totchi are part of the treatment group definition and therefore directly reflect which group a sample belongs to (having daughters vs. not having daughters). It is therefore not appropriate to match or balance on these variables: because they are inherently tied to treatment assignment, doing so would weaken the effectiveness of the analysis.
From our CS class we learnt that randomised controlled trials (RCTs) are the gold standard for making causal inferences. However, policy is typically not conducted in an experimental setting. Convenience sampling is the common scenario. Hence, how we can address the confounding to make an apple-to-apple comparison is critical in observational research.
This article not only combines different matching methods (e.g., exact, caliper, and genetic matching) but also provides actionable results that tell the survey team which targets to follow up with after the treatment. In previous classes we dealt with static data; this article shows how matching can be used in a dynamic way.
My only concern is whether it is possible to optimize the number of treated units while maintaining good balance. The article mentions that a more efficient algorithm can be achieved by adjusting the settings through repeated trial and error. But could machine learning be applied to this optimization, given that the objective function is clearly defined as a combination of the number of treated units and covariate balance?
Because the “high dose” scenario supports a potential causal inference, I will use that subset to demonstrate the sensitivity analysis.
foo <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSY9jLlufY1GjeMh7D2_g1m6olveHLNCerT2C36MTkcjwQCOlZYf8evLMzGOnc252OgXEEasHqcNIcZ/pub?gid=1976818127&single=true&output=csv")
# loads package
library(sensemakr)
## Warning: package 'sensemakr' was built under R version 4.3.3
## See details in:
## Carlos Cinelli and Chad Hazlett (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. Journal of the Royal Statistical Society, Series B (Statistical Methodology).
#filter data
treatment_group <- subset(foo, ngirls == 2 & nboys == 0)
control_group <- subset(foo, nboys == 2 & ngirls == 0)
filtered_data <- rbind(treatment_group, control_group)
Because the data contain several covariates, I needed to decide which one to use as the benchmark for the sensitivity analysis. I therefore ran a PCA to see which covariates account for most of the variation.
covariates <- foo[, c("Dems","Repubs","Christian","age","srvlng","demvote")]
pca <- prcomp(covariates, scale. = TRUE)
summary(pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 1.6219 1.2517 0.9600 0.69705 0.6250 0.06680
## Proportion of Variance 0.4385 0.2611 0.1536 0.08098 0.0651 0.00074
## Cumulative Proportion 0.4385 0.6996 0.8532 0.93416 0.9993 1.00000
pca$rotation
## PC1 PC2 PC3 PC4 PC5
## Dems -0.5805602 0.14127956 -0.14803652 0.350220423 0.00896927
## Repubs 0.5822635 -0.14002613 0.12847497 -0.350920742 -0.01644348
## Christian 0.2121854 0.09021739 -0.96910461 -0.064435959 -0.05765627
## age -0.1487362 -0.68381376 -0.13451859 -0.050724696 0.69971515
## srvlng -0.1705498 -0.67912390 -0.05768713 -0.007351973 -0.71155291
## demvote -0.4771653 0.15324183 -0.03150505 -0.864535279 -0.02039802
## PC6
## Dems 0.7059322509
## Repubs 0.7081245085
## Christian -0.0140784699
## age 0.0025977429
## srvlng -0.0037552662
## demvote -0.0005328732
biplot(pca, main = "PCA Biplot", cex = 0.8)
When picking a benchmark covariate from the plot, I exclude the party variables (Dems and Repubs) because they directly indicate political stance. I therefore choose “Christian” as the benchmark covariate for the following sensitivity analysis.
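As a cross-check on the PCA summary above, the “Proportion of Variance” and “Cumulative Proportion” rows follow directly from the component standard deviations. The sketch below hard-codes the printed standard deviations, so small rounding differences from the table are expected; the variable names are illustrative.

```r
# Standard deviations copied from the "Importance of components" table above
sdev <- c(PC1 = 1.6219, PC2 = 1.2517, PC3 = 0.9600,
          PC4 = 0.69705, PC5 = 0.6250, PC6 = 0.06680)

# Each component's variance is its squared standard deviation
variance <- sdev^2

# Share of total variance per component, and the running total
prop_var <- variance / sum(variance)
cum_prop <- cumsum(prop_var)

print(round(rbind(prop_var, cum_prop), 4))
```

With a fitted object, the same quantities are available as pca$sdev^2 / sum(pca$sdev^2).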
foo <- filtered_data
reg2 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)
# summary(reg2)
# benchmark the hypothetical confounder at 1x, 2x, and 3x the strength of "Christian"
daughters.sensitivity <- sensemakr(model = reg2,
                                   treatment = "hasgirls",
                                   benchmark_covariates = "Christian",
                                   kd = 1:3)
daughters.sensitivity
## Sensitivity Analysis to Unobserved Confounding
##
## Model Formula: nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng +
## demvote
##
## Null hypothesis: q = 1 and reduce = TRUE
##
## Unadjusted Estimates of ' hasgirls ':
## Coef. estimate: 13.20268
## Standard Error: 3.76197
## t-value: 3.50951
##
## Sensitivity Statistics:
## Partial R2 of treatment with outcome: 0.1915
## Robustness Value, q = 1 : 0.38245
## Robustness Value, q = 1 alpha = 0.05 : 0.18552
##
## For more information, check summary.
#ovb_minimal_reporting(daughters.sensitivity, format = "latex")
From the output above, the robustness value is the key statistic: unobserved confounders would have to explain at least 38.2% of the residual variance of both the treatment and the outcome to bring the point estimate down to zero, and at least 18.6% of both to make the estimate statistically insignificant at the 5% level.
The benchmark covariate “Christian” has a partial R-squared of 6.4%, which is well below 38.2%. Thus, an unobserved confounder as strong as “Christian” would not be enough to explain away the estimated effect.
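The sensitivity statistics reported above can be recomputed by hand from the t-value and the residual degrees of freedom, following the formulas in Cinelli and Hazlett (2020). This is a hedged sketch: the degrees of freedom are not printed in the output, so dof = 52 is an assumption inferred from the reported partial R2, and the helper names rv_point and rv_signif are illustrative. The significance-level variant uses a form that reproduces the reported value; other package versions may compute it differently.

```r
# Values copied from the sensemakr output above
t_value <- 3.50951
dof     <- 52  # assumption: inferred from the reported partial R2, not printed

# Partial R2 of the treatment with the outcome
partial_r2 <- t_value^2 / (t_value^2 + dof)

# Robustness value for reducing the point estimate by 100*q percent
rv_point <- function(t, dof, q = 1) {
  f <- q * abs(t) / sqrt(dof)  # partial Cohen's f, scaled by q
  0.5 * (sqrt(f^4 + 4 * f^2) - f^2)
}

# Robustness value for losing significance at level alpha
rv_signif <- function(t, dof, q = 1, alpha = 0.05) {
  f_crit <- abs(qt(alpha / 2, df = dof - 1)) / sqrt(dof - 1)
  f <- q * abs(t) / sqrt(dof) - f_crit
  if (f <= 0) return(0)  # already insignificant without any confounding
  0.5 * (sqrt(f^4 + 4 * f^2) - f^2)
}

print(round(c(partial_r2 = partial_r2,
              rv_q1      = rv_point(t_value, dof),
              rv_q1_a05  = rv_signif(t_value, dof)), 4))
```

The three values should come out close to the reported 0.1915, 0.38245, and 0.18552.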
par(mfrow = c(1, 2))
plot(daughters.sensitivity)
plot(daughters.sensitivity, sensitivity.of = "t-value")
par(mfrow = c(1, 1))
From the figure above, we can see that even an unobserved confounder three times as strong as “Christian” cannot bring the effect size down to 0.
As for the uncertainty that unobserved confounders could introduce, the t-value plot shows that even an unobserved confounder three times as strong as “Christian” is not sufficient to make the estimate statistically insignificant.
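The contour plots above are built from the omitted-variable-bias formula, which maps a hypothetical confounder's strength to an adjusted estimate. The sketch below applies that formula to the numbers in the sensemakr output. It assumes dof = 52 (inferred, not printed) and that the confounder pulls the estimate toward zero; adjust_for_confounder is an illustrative helper, not a function from the package.

```r
# Values copied from the sensemakr output above
estimate <- 13.20268
se       <- 3.76197
dof      <- 52       # assumption: inferred from the reported partial R2
rv       <- 0.38245  # robustness value, q = 1

# r2dz.x : partial R2 of the confounder with the treatment
# r2yz.dx: partial R2 of the confounder with the outcome
adjust_for_confounder <- function(estimate, se, dof, r2dz.x, r2yz.dx) {
  bias <- se * sqrt(dof) * sqrt(r2yz.dx * r2dz.x / (1 - r2dz.x))
  sign(estimate) * (abs(estimate) - bias)  # shrink toward zero
}

# A confounder exactly as strong as the robustness value removes the effect
at_rv <- adjust_for_confounder(estimate, se, dof, rv, rv)

# A confounder half that strong leaves a clearly positive estimate
at_half_rv <- adjust_for_confounder(estimate, se, dof, rv / 2, rv / 2)

print(round(c(at_rv = at_rv, at_half_rv = at_half_rv), 3))
```

Each benchmark point on the contour plot corresponds to one pair of partial R2 values fed through this kind of calculation.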
plot(daughters.sensitivity, type = "extreme")
## Warning in rug(x = r2dz.x, col = "red", lwd = 2): some values will be clipped
In the final visualization, we examine extreme scenarios in which the unobserved confounders explain a large share of the residual variance of the outcome. Even confounders once or twice as strong as the “Christian” covariate cannot bring the adjusted effect down to zero in these extreme scenarios.