foo <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSY9jLlufY1GjeMh7D2_g1m6olveHLNCerT2C36MTkcjwQCOlZYf8evLMzGOnc252OgXEEasHqcNIcZ/pub?gid=1976818127&single=true&output=csv")
# Regress the outcome (nowtot) on the treatment indicator (hasgirls) and the covariates
reg1 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)
summary(reg1)
##
## Call:
## lm(formula = nowtot ~ hasgirls + Dems + Repubs + Christian +
## age + srvlng + demvote, data = foo)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -56.028 -10.322  -1.517  11.208  69.642
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  38.6991    18.6306   2.077 0.038390 *
## hasgirls     -0.4523     1.9036  -0.238 0.812322
## Dems         -8.1022    17.5861  -0.461 0.645238
## Repubs      -55.1069    17.6340  -3.125 0.001901 **
## Christian   -13.3961     3.7218  -3.599 0.000357 ***
## age           0.1260     0.1117   1.128 0.259938
## srvlng       -0.2251     0.1355  -1.662 0.097349 .
## demvote      87.5501     8.4847  10.319  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.19 on 422 degrees of freedom
## Multiple R-squared: 0.7821, Adjusted R-squared: 0.7784
## F-statistic: 216.3 on 7 and 422 DF, p-value: < 2.2e-16
library(ggplot2)
library(gridExtra)
# Plot the distribution of each covariate, split by treatment status (hasgirls)
plot1 <- ggplot(foo, aes(x = Dems, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Dems by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
plot2 <- ggplot(foo, aes(x = Repubs, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Repubs by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
plot3 <- ggplot(foo, aes(x = Christian, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Christian by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
plot4 <- ggplot(foo, aes(x = age, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Age by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
plot5 <- ggplot(foo, aes(x = srvlng, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of srvlng by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
plot6 <- ggplot(foo, aes(x = demvote, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Demvote by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))
# Arrange the six panels in a two-column grid
grid.arrange(plot1, plot2, plot3, plot4, plot5, plot6, ncol = 2)
The estimated coefficient on hasgirls is -0.4523, with a standard error of 1.9036. The 95% confidence interval is therefore approximately -4.2 to 3.3, which includes zero. It is important to note that this estimate adjusts for the confounders only through the linear model specification; it does not guarantee that the treated and control groups are actually comparable on those covariates. To address this, we introduce “Matching” in the section below.
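The interval for the hasgirls coefficient can be reproduced from the regression table. The sketch below hard-codes the estimate, standard error, and residual degrees of freedom from the summary(reg1) output above; the variable names are illustrative. With the fitted model in memory, confint(reg1, "hasgirls") should give the same interval.

```r
# Values copied from the hasgirls row of the summary(reg1) output
est <- -0.4523   # coefficient estimate
se  <- 1.9036    # standard error
df  <- 422       # residual degrees of freedom

# Two-sided 95% interval using the t distribution
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
print(round(ci, 2))
```

Using the t quantile rather than 1.96 makes little difference at 422 degrees of freedom, but it matches how lm reports its tests.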
library(Matching)
## Warning: package 'Matching' was built under R version 4.3.3
## Loading required package: MASS
## ##
## ## Matching (Version 4.10-15, Build Date: 2024-10-14)
## ## See https://www.jsekhon.com for additional documentation.
## ## Please cite software as:
## ## Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ## Software with Automated Balance Optimization: The Matching package for R.''
## ## Journal of Statistical Software, 42(7): 1-52.
## ##
## Match on the confounders below...
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr <- foo$hasgirls
Y <- foo$nowtot
genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=3, pop.size=16, max.generations=10, wait.generations=1)
## Loading required namespace: rgenoud
##
##
## Mon Dec 16 12:55:48 2024
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 2.880402e-02 6.050129e-02 1.569680e-01 1.569680e-01 2.522714e-01 4.387264e-01 4.387264e-01 6.165314e-01 8.309462e-01 8.607924e-01 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.658339e+02
## mean............ 5.264171e+02
## variance........ 9.051955e+04
## var 2:
## best............ 4.545248e+02
## mean............ 4.122412e+02
## variance........ 6.291642e+04
## var 3:
## best............ 7.433343e+02
## mean............ 4.396681e+02
## variance........ 6.255117e+04
## var 4:
## best............ 9.957218e+02
## mean............ 5.524709e+02
## variance........ 1.109960e+05
## var 5:
## best............ 2.877353e+01
## mean............ 3.707807e+02
## variance........ 8.472810e+04
## var 6:
## best............ 7.293103e+01
## mean............ 4.915480e+02
## variance........ 1.266730e+05
##
## GENERATION: 1
## Lexical Fit..... 3.770192e-02 4.137054e-02 1.569680e-01 1.569680e-01 2.300324e-01 4.315719e-01 4.387264e-01 4.387264e-01 5.303673e-01 9.798002e-01 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 7.823778e+02
## mean............ 6.087362e+02
## variance........ 9.230380e+04
## var 2:
## best............ 4.369617e+02
## mean............ 3.439781e+02
## variance........ 8.366097e+03
## var 3:
## best............ 6.872927e+02
## mean............ 7.093641e+02
## variance........ 2.082159e+04
## var 4:
## best............ 7.425135e+02
## mean............ 7.157633e+02
## variance........ 6.047397e+04
## var 5:
## best............ 4.314048e+01
## mean............ 3.255320e+02
## variance........ 7.072203e+04
## var 6:
## best............ 1.152656e+02
## mean............ 1.183648e+02
## variance........ 1.513796e+04
##
## GENERATION: 2
## Lexical Fit..... 4.890113e-02 5.751797e-02 1.545841e-01 1.569680e-01 1.569680e-01 4.387264e-01 4.387264e-01 5.342514e-01 6.071759e-01 7.260032e-01 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 6.627449e+02
## mean............ 8.461707e+02
## variance........ 8.792016e+03
## var 2:
## best............ 4.250508e+02
## mean............ 4.303543e+02
## variance........ 3.333956e+03
## var 3:
## best............ 6.507477e+02
## mean............ 6.652219e+02
## variance........ 1.288317e+04
## var 4:
## best............ 5.773947e+02
## mean............ 8.448381e+02
## variance........ 1.554261e+04
## var 5:
## best............ 5.720602e+01
## mean............ 4.137889e+01
## variance........ 2.784064e+02
## var 6:
## best............ 1.428723e+02
## mean............ 1.092834e+02
## variance........ 1.641250e+03
##
## GENERATION: 3
## Lexical Fit..... 4.890113e-02 5.751797e-02 1.545841e-01 1.569680e-01 1.569680e-01 4.387264e-01 4.387264e-01 5.342514e-01 6.071759e-01 7.260032e-01 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 53
## var 1:
## best............ 6.627449e+02
## mean............ 7.346339e+02
## variance........ 1.340572e+04
## var 2:
## best............ 4.250508e+02
## mean............ 4.403844e+02
## variance........ 2.550566e+03
## var 3:
## best............ 6.507477e+02
## mean............ 6.943771e+02
## variance........ 4.122676e+03
## var 4:
## best............ 5.773947e+02
## mean............ 6.768492e+02
## variance........ 2.903826e+04
## var 5:
## best............ 5.720602e+01
## mean............ 4.955038e+01
## variance........ 2.190047e+02
## var 6:
## best............ 1.428723e+02
## mean............ 1.557070e+02
## variance........ 1.622773e+03
##
## GENERATION: 4
## Lexical Fit..... 4.890113e-02 6.516602e-02 1.262261e-01 1.262261e-01 1.713969e-01 2.480885e-01 2.480885e-01 4.876947e-01 6.071759e-01 7.543029e-01 1.000000e+00 1.000000e+00
## #unique......... 10, #Total UniqueCount: 63
## var 1:
## best............ 5.530065e+02
## mean............ 6.730817e+02
## variance........ 2.888852e+03
## var 2:
## best............ 7.895890e+02
## mean............ 4.660923e+02
## variance........ 7.410925e+03
## var 3:
## best............ 7.887057e+02
## mean............ 7.062528e+02
## variance........ 1.072215e+04
## var 4:
## best............ 5.773947e+02
## mean............ 5.484422e+02
## variance........ 2.414415e+04
## var 5:
## best............ 5.720602e+01
## mean............ 5.533075e+01
## variance........ 2.771106e+01
## var 6:
## best............ 1.479595e+02
## mean............ 1.503942e+02
## variance........ 2.346288e+02
##
## GENERATION: 5
## Lexical Fit..... 4.890113e-02 7.145467e-02 1.569680e-01 1.569680e-01 1.713969e-01 4.387264e-01 4.387264e-01 5.120429e-01 6.460242e-01 8.407361e-01 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 75
## var 1:
## best............ 6.608709e+02
## mean............ 5.824787e+02
## variance........ 3.104587e+03
## var 2:
## best............ 4.726607e+02
## mean............ 6.633427e+02
## variance........ 3.285063e+04
## var 3:
## best............ 7.882436e+02
## mean............ 7.766112e+02
## variance........ 3.958048e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.682457e+02
## variance........ 1.334906e+03
## var 5:
## best............ 5.720602e+01
## mean............ 9.804236e+01
## variance........ 2.497203e+04
## var 6:
## best............ 1.513320e+02
## mean............ 1.474768e+02
## variance........ 8.652128e+00
##
## GENERATION: 6
## Lexical Fit..... 4.890113e-02 7.145467e-02 1.569680e-01 1.569680e-01 1.713969e-01 4.387264e-01 4.387264e-01 5.178081e-01 6.071759e-01 8.045222e-01 1.000000e+00 1.000000e+00
## #unique......... 11, #Total UniqueCount: 86
## var 1:
## best............ 6.270355e+02
## mean............ 5.571419e+02
## variance........ 2.426231e+04
## var 2:
## best............ 5.739976e+02
## mean............ 6.080216e+02
## variance........ 3.411085e+04
## var 3:
## best............ 7.837100e+02
## mean............ 7.805032e+02
## variance........ 1.799056e+02
## var 4:
## best............ 5.773947e+02
## mean............ 5.755886e+02
## variance........ 4.016482e+01
## var 5:
## best............ 5.720602e+01
## mean............ 5.732375e+01
## variance........ 2.126732e-01
## var 6:
## best............ 1.506628e+02
## mean............ 1.454469e+02
## variance........ 3.644913e+02
##
## GENERATION: 7
## Lexical Fit..... 4.890113e-02 7.145467e-02 1.569680e-01 1.569680e-01 1.713969e-01 4.387264e-01 4.387264e-01 5.178081e-01 6.071759e-01 8.045222e-01 1.000000e+00 1.000000e+00
## #unique......... 10, #Total UniqueCount: 96
## var 1:
## best............ 6.270355e+02
## mean............ 6.626791e+02
## variance........ 4.104900e+03
## var 2:
## best............ 5.739976e+02
## mean............ 5.357816e+02
## variance........ 3.483028e+03
## var 3:
## best............ 7.837100e+02
## mean............ 7.738758e+02
## variance........ 1.944608e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.653012e+02
## variance........ 2.196797e+03
## var 5:
## best............ 5.720602e+01
## mean............ 5.723066e+01
## variance........ 8.757528e-03
## var 6:
## best............ 1.506628e+02
## mean............ 1.673098e+02
## variance........ 2.105230e+03
##
## GENERATION: 8
## Lexical Fit..... 4.890113e-02 7.145467e-02 1.569680e-01 1.569680e-01 1.713969e-01 4.387264e-01 4.387264e-01 5.178081e-01 6.071759e-01 8.045222e-01 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 109
## var 1:
## best............ 6.270355e+02
## mean............ 6.444993e+02
## variance........ 2.540632e+03
## var 2:
## best............ 5.739976e+02
## mean............ 5.440684e+02
## variance........ 1.353217e+04
## var 3:
## best............ 7.837100e+02
## mean............ 7.596487e+02
## variance........ 9.191376e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.773935e+02
## variance........ 2.642594e-05
## var 5:
## best............ 5.720602e+01
## mean............ 1.303279e+02
## variance........ 4.725372e+04
## var 6:
## best............ 1.506628e+02
## mean............ 1.757487e+02
## variance........ 9.477804e+03
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 4.890113e-02 7.145467e-02 1.569680e-01 1.569680e-01 1.713969e-01 4.387264e-01 4.387264e-01 5.178081e-01 6.071759e-01 8.045222e-01 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 6.270355e+02
## X[ 2] : 5.739976e+02
## X[ 3] : 7.837100e+02
## X[ 4] : 5.773947e+02
## X[ 5] : 5.720602e+01
## X[ 6] : 1.506628e+02
##
## Solution Found Generation 6
## Number of Generations Run 8
##
## Mon Dec 16 12:55:48 2024
## Total run time : 0 hours 0 minutes and 0 seconds
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
##
## Estimate... 0
## SE......... 0
## T-stat..... NaN
## p.val...... NA
##
## Original number of observations.............. 430
## Original number of treated obs............... 312
## Matched number of observations............... 430
## Matched number of observations (unweighted). 434
mb <- MatchBalance(
hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
match.out = mout, nboots=500, data = foo)
##
## ***** (V1) Dems *****
## Before Matching After Matching
## mean treatment........ 0.45833 0.47209
## mean control.......... 0.50847 0.47209
## std mean diff......... -10.047 0
##
## mean raw eQQ diff..... 0.050847 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.025071 0
## med eCDF diff........ 0.025071 0
## max eCDF diff........ 0.050141 0
##
## var ratio (Tr/Co)..... 0.98809 1
## T-test p-value........ 0.35571 1
##
##
## ***** (V2) Repubs *****
## Before Matching After Matching
## mean treatment........ 0.53846 0.52558
## mean control.......... 0.49153 0.52791
## std mean diff......... 9.4 -0.46518
##
## mean raw eQQ diff..... 0.042373 0.0023041
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.023468 0.0011521
## med eCDF diff........ 0.023468 0.0011521
## max eCDF diff........ 0.046936 0.0023041
##
## var ratio (Tr/Co)..... 0.98911 1.0005
## T-test p-value........ 0.3873 0.31731
##
##
## ***** (V3) Christian *****
## Before Matching After Matching
## mean treatment........ 0.9391 0.94186
## mean control.......... 0.94915 0.94186
## std mean diff......... -4.1958 0
##
## mean raw eQQ diff..... 0.016949 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.005025 0
## med eCDF diff........ 0.005025 0
## max eCDF diff........ 0.01005 0
##
## var ratio (Tr/Co)..... 1.1787 1
## T-test p-value........ 0.68107 1
##
##
## ***** (V4) age *****
## Before Matching After Matching
## mean treatment........ 52.628 51.686
## mean control.......... 49.178 51.479
## std mean diff......... 38.385 2.2396
##
## mean raw eQQ diff..... 3.661 0.56682
## med raw eQQ diff..... 4 0
## max raw eQQ diff..... 7 7
##
## mean eCDF diff........ 0.075348 0.010849
## med eCDF diff........ 0.075538 0.0092166
## max eCDF diff........ 0.17807 0.036866
##
## var ratio (Tr/Co)..... 0.71552 0.94531
## T-test p-value........ 0.0020402 0.093283
## KS Bootstrap p-value.. 0.008 0.784
## KS Naive p-value...... 0.0087659 0.9296
## KS Statistic.......... 0.17807 0.036866
##
##
## ***** (V5) srvlng *****
## Before Matching After Matching
## mean treatment........ 8.5865 8.3884
## mean control.......... 8.7458 8.5837
## std mean diff......... -2.1085 -2.6375
##
## mean raw eQQ diff..... 0.66949 0.4424
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 5 5
##
## mean eCDF diff........ 0.017181 0.011265
## med eCDF diff........ 0.01445 0.011521
## max eCDF diff........ 0.051608 0.029954
##
## var ratio (Tr/Co)..... 0.77347 0.87885
## T-test p-value........ 0.85956 0.32982
## KS Bootstrap p-value.. 0.754 0.864
## KS Naive p-value...... 0.97653 0.98994
## KS Statistic.......... 0.051608 0.029954
##
##
## ***** (V6) demvote *****
## Before Matching After Matching
## mean treatment........ 0.49929 0.50181
## mean control.......... 0.50602 0.50249
## std mean diff......... -5.2747 -0.53816
##
## mean raw eQQ diff..... 0.011441 0.0072581
## med raw eQQ diff..... 0.01 0
## max raw eQQ diff..... 0.08 0.08
##
## mean eCDF diff........ 0.015928 0.010862
## med eCDF diff........ 0.010811 0.0069124
## max eCDF diff........ 0.048512 0.032258
##
## var ratio (Tr/Co)..... 1.1269 1.0992
## T-test p-value........ 0.61103 0.75436
## KS Bootstrap p-value.. 0.918 0.908
## KS Naive p-value...... 0.98776 0.97764
## KS Statistic.......... 0.048512 0.032258
##
##
## Before Matching Minimum p.value: 0.0020402
## Variable Name(s): age Number(s): 4
##
## After Matching Minimum p.value: 0.093283
## Variable Name(s): age Number(s): 4
After matching, covariate balance improved. For example, before matching the mean of Dems was 0.45833 in the treatment group and 0.50847 in the control group; after matching, both means are 0.47209, and the t-test p-value rose from 0.35571 to 1, indicating that the two groups are now very similar on this covariate. The other covariates are also closely aligned: the minimum p-value across all balance tests rose from 0.0020402 to 0.093283 (age in both cases). With this level of balance, it is reasonable to draw tentative causal inferences from the matched sample.
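The "std mean diff" rows in the MatchBalance output can be checked by hand. The sketch below assumes the statistic is 100 times the difference in means divided by the treated-group standard deviation, and rebuilds the pre-matching Dems indicator from the counts implied by the reported means (143 of 312 treated and 60 of 118 controls). Both the formula and the counts are inferences from the output shown above, and the helper name is made up for illustration.

```r
# Standardised mean difference, in percent of the treated-group SD
# (assumed definition; std_mean_diff is an illustrative helper)
std_mean_diff <- function(x, treat) {
  x_t <- x[treat == 1]
  x_c <- x[treat == 0]
  100 * (mean(x_t) - mean(x_c)) / sd(x_t)
}

# Rebuild the pre-matching Dems indicator from the implied counts
treat <- rep(c(1, 0), times = c(312, 118))
dems  <- c(rep(c(1, 0), times = c(143, 169)),  # treated: mean 0.45833
           rep(c(1, 0), times = c(60, 58)))    # control: mean 0.50847

smd <- std_mean_diff(dems, treat)
print(round(smd, 3))
```

If the assumed definition holds, the result should be close to the -10.047 reported for Dems before matching.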
After_genmatch <- Match(Y = Y, Tr=Tr, X=X, M=3)
summary(After_genmatch)
##
## Estimate... -0.0013355
## AI SE...... 1.9563
## T-stat..... -0.00068264
## p.val...... 0.99946
##
## Original number of observations.............. 430
## Original number of treated obs............... 312
## Matched number of observations............... 312
## Matched number of observations (unweighted). 938
Based on this analysis, the estimated treatment effect is -0.0013355, with a standard error of 1.9563. The 95% confidence interval therefore runs from -3.835684 to 3.833012, which comfortably includes zero. We can conclude from the data that having girls does not have a statistically significant effect on a legislator's political stance in voting, as measured by nowtot.
It is also worth noting that matching did not reduce the uncertainty relative to the regression model: the matching standard error (1.9563) is about the same as the standard error of the hasgirls coefficient in the regression (1.9036).
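One detail worth checking when reproducing this estimate: the Match call that produces it does not receive Weight.matrix = genout, so it relies on Match's default covariate weighting rather than the weights found by GenMatch. A minimal sketch of keeping the two steps consistent is shown below on simulated data; the variable names and the data-generating process are made up for illustration and are not the legislator data.

```r
library(Matching)

# Simulated data (illustrative only): one continuous and one binary covariate
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
tr <- rbinom(n, 1, plogis(0.5 * x1))   # treatment depends on x1
y  <- 1 + 2 * tr + x1 + rnorm(n)       # outcome depends on treatment and x1
X  <- cbind(x1, x2)

# Step 1: search for covariate weights that optimise balance
gen <- GenMatch(Tr = tr, X = X, estimand = "ATT", M = 1,
                pop.size = 16, max.generations = 5,
                wait.generations = 1, print.level = 0)

# Step 2: estimate the effect with the same estimand, the same M,
# and the weights found in step 1
m <- Match(Y = y, Tr = tr, X = X, estimand = "ATT", M = 1,
           Weight.matrix = gen)
summary(m)
```

The point of the sketch is only that estimand, M, and Weight.matrix agree between the two calls; the small pop.size and max.generations values mirror the quick settings used in this report and would normally be larger.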
# Keep only the "high-dose" comparison: two girls and no boys (treated)
# versus two boys and no girls (control)
treatment_group <- subset(foo, ngirls == 2 & nboys == 0)
control_group <- subset(foo, nboys == 2 & ngirls == 0)
filtered_data <- rbind(treatment_group, control_group)
# head(filtered_data)
## Match on the confounders below...
# From here on, foo refers to the filtered sample
foo <- filtered_data
reg2 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)
summary(reg2)
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr <- foo$hasgirls
Y <- foo$nowtot
genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=2, pop.size=16, max.generations=10, wait.generations=1)
##
##
## Mon Dec 16 12:55:48 2024
## Domains:
## 0.000000e+00 <= X1 <= 1.000000e+03
## 0.000000e+00 <= X2 <= 1.000000e+03
## 0.000000e+00 <= X3 <= 1.000000e+03
## 0.000000e+00 <= X4 <= 1.000000e+03
## 0.000000e+00 <= X5 <= 1.000000e+03
## 0.000000e+00 <= X6 <= 1.000000e+03
##
## Data Type: Floating Point
## Operators (code number, name, population)
## (1) Cloning........................... 1
## (2) Uniform Mutation.................. 2
## (3) Boundary Mutation................. 2
## (4) Non-Uniform Mutation.............. 2
## (5) Polytope Crossover................ 2
## (6) Simple Crossover.................. 2
## (7) Whole Non-Uniform Mutation........ 2
## (8) Heuristic Crossover............... 2
## (9) Local-Minimum Crossover........... 0
##
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size : 16
## Convergence Tolerance: 1.000000e-03
##
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
##
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.490379e-01 2.766236e-01 3.709038e-01 4.644735e-01 5.795017e-01 6.790204e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.221780e+02
## mean............ 5.175257e+02
## variance........ 8.749483e+04
## var 2:
## best............ 6.278279e+01
## mean............ 4.007037e+02
## variance........ 7.226336e+04
## var 3:
## best............ 2.177170e+01
## mean............ 3.893849e+02
## variance........ 8.354676e+04
## var 4:
## best............ 4.386396e+02
## mean............ 4.736510e+02
## variance........ 5.383847e+04
## var 5:
## best............ 2.672448e+02
## mean............ 4.868009e+02
## variance........ 1.204813e+05
## var 6:
## best............ 9.109921e+02
## mean............ 5.445747e+02
## variance........ 1.266014e+05
##
## GENERATION: 1
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.490379e-01 2.766236e-01 3.709038e-01 4.644735e-01 5.795017e-01 6.790204e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 9.221780e+02
## mean............ 6.898203e+02
## variance........ 7.777860e+04
## var 2:
## best............ 6.278279e+01
## mean............ 2.391045e+02
## variance........ 6.898410e+04
## var 3:
## best............ 2.177170e+01
## mean............ 2.922640e+02
## variance........ 1.094456e+05
## var 4:
## best............ 4.386396e+02
## mean............ 4.071890e+02
## variance........ 1.046794e+04
## var 5:
## best............ 2.672448e+02
## mean............ 2.486523e+02
## variance........ 3.616966e+04
## var 6:
## best............ 9.109921e+02
## mean............ 8.091624e+02
## variance........ 4.848595e+04
##
## GENERATION: 2
## Lexical Fit..... 7.836441e-02 7.836441e-02 2.490379e-01 2.766236e-01 3.709038e-01 4.644735e-01 5.795017e-01 6.790204e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 9.221780e+02
## mean............ 8.241288e+02
## variance........ 5.241358e+04
## var 2:
## best............ 6.278279e+01
## mean............ 1.260676e+02
## variance........ 3.202499e+04
## var 3:
## best............ 2.177170e+01
## mean............ 8.511650e+01
## variance........ 9.340414e+03
## var 4:
## best............ 4.386396e+02
## mean............ 4.264452e+02
## variance........ 2.660469e+03
## var 5:
## best............ 2.672448e+02
## mean............ 2.960514e+02
## variance........ 9.119360e+03
## var 6:
## best............ 9.109921e+02
## mean............ 8.956361e+02
## variance........ 2.022662e+03
##
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
##
## Solution Lexical Fitness Value:
## 7.836441e-02 7.836441e-02 2.490379e-01 2.766236e-01 3.709038e-01 4.644735e-01 5.795017e-01 6.790204e-01 1.000000e+00 1.000000e+00 1.000000e+00 1.000000e+00
##
## Parameters at the Solution:
##
## X[ 1] : 9.221780e+02
## X[ 2] : 6.278279e+01
## X[ 3] : 2.177170e+01
## X[ 4] : 4.386396e+02
## X[ 5] : 2.672448e+02
## X[ 6] : 9.109921e+02
##
## Solution Found Generation 1
## Number of Generations Run 2
##
## Mon Dec 16 12:55:48 2024
## Total run time : 0 hours 0 minutes and 0 seconds
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
##
## Estimate... 0
## SE......... 0
## T-stat..... NaN
## p.val...... NA
##
## Original number of observations.............. 59
## Original number of treated obs............... 31
## Matched number of observations............... 59
## Matched number of observations (unweighted). 59
mb <- MatchBalance(
hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
match.out = mout, nboots=500, data = foo)
##
## ***** (V1) Dems *****
## Before Matching After Matching
## mean treatment........ 0.64516 0.54237
## mean control.......... 0.42857 0.54237
## std mean diff......... 44.532 0
##
## mean raw eQQ diff..... 0.21429 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.10829 0
## med eCDF diff........ 0.10829 0
## max eCDF diff........ 0.21659 0
##
## var ratio (Tr/Co)..... 0.93145 1
## T-test p-value........ 0.099329 1
##
##
## ***** (V2) Repubs *****
## Before Matching After Matching
## mean treatment........ 0.35484 0.45763
## mean control.......... 0.57143 0.45763
## std mean diff......... -44.532 0
##
## mean raw eQQ diff..... 0.21429 0
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 0
##
## mean eCDF diff........ 0.10829 0
## med eCDF diff........ 0.10829 0
## max eCDF diff........ 0.21659 0
##
## var ratio (Tr/Co)..... 0.93145 1
## T-test p-value........ 0.099329 1
##
##
## ***** (V3) Christian *****
## Before Matching After Matching
## mean treatment........ 0.90323 0.9322
## mean control.......... 1 1
## std mean diff......... -32.2 -26.738
##
## mean raw eQQ diff..... 0.10714 0.067797
## med raw eQQ diff..... 0 0
## max raw eQQ diff..... 1 1
##
## mean eCDF diff........ 0.048387 0.033898
## med eCDF diff........ 0.048387 0.033898
## max eCDF diff........ 0.096774 0.067797
##
## var ratio (Tr/Co)..... Inf Inf
## T-test p-value........ 0.083087 0.042775
##
##
## ***** (V4) age *****
## Before Matching After Matching
## mean treatment........ 48.226 48.915
## mean control.......... 49.857 48.153
## std mean diff......... -19.026 9.2958
##
## mean raw eQQ diff..... 2.4643 1.0678
## med raw eQQ diff..... 2.5 1
## max raw eQQ diff..... 5 8
##
## mean eCDF diff........ 0.061382 0.026441
## med eCDF diff........ 0.051843 0.016949
## max eCDF diff........ 0.13479 0.067797
##
## var ratio (Tr/Co)..... 1.0028 1.1254
## T-test p-value........ 0.46822 0.23178
## KS Bootstrap p-value.. 0.8 0.982
## KS Naive p-value...... 0.80669 0.97503
## KS Statistic.......... 0.13479 0.067797
##
##
## ***** (V5) srvlng *****
## Before Matching After Matching
## mean treatment........ 7.5484 7.8475
## mean control.......... 9.6071 9.0508
## std mean diff......... -28.926 -16.898
##
## mean raw eQQ diff..... 2.4286 1.4746
## med raw eQQ diff..... 1 0
## max raw eQQ diff..... 10 8
##
## mean eCDF diff........ 0.066172 0.04661
## med eCDF diff........ 0.05818 0.042373
## max eCDF diff........ 0.17051 0.11864
##
## var ratio (Tr/Co)..... 0.60661 0.65825
## T-test p-value........ 0.34249 0.043865
## KS Bootstrap p-value.. 0.49 0.558
## KS Naive p-value...... 0.49336 0.55479
## KS Statistic.......... 0.17051 0.11864
##
##
## ***** (V6) demvote *****
## Before Matching After Matching
## mean treatment........ 0.52677 0.51068
## mean control.......... 0.50714 0.51356
## std mean diff......... 15.554 -2.2113
##
## mean raw eQQ diff..... 0.05 0.028644
## med raw eQQ diff..... 0.05 0.02
## max raw eQQ diff..... 0.12 0.08
##
## mean eCDF diff........ 0.10108 0.053838
## med eCDF diff........ 0.066244 0.033898
## max eCDF diff........ 0.29493 0.15254
##
## var ratio (Tr/Co)..... 0.88501 1.0986
## T-test p-value........ 0.56612 0.65957
## KS Bootstrap p-value.. 0.074 0.378
## KS Naive p-value...... 0.099395 0.41575
## KS Statistic.......... 0.29493 0.15254
##
##
## Before Matching Minimum p.value: 0.074
## Variable Name(s): demvote Number(s): 6
##
## After Matching Minimum p.value: 0.042775
## Variable Name(s): Christian Number(s): 3
After_genmatch <- Match(Y = Y, Tr=Tr, X=X, M=2)
summary(After_genmatch)
##
## Estimate... 15.484
## AI SE...... 5.1617
## T-stat..... 2.9998
## p.val...... 0.0027017
##
## Original number of observations.............. 59
## Original number of treated obs............... 31
## Matched number of observations............... 31
## Matched number of observations (unweighted). 62
Based on this analysis, the estimated treatment effect is 15.484, with a standard error of 5.1617. The 95% confidence interval therefore runs from 5.367068 to 25.60093 and excludes zero. We can conclude from the data that in the “high-dose” scenario, having two girls rather than two boys has a statistically significant effect on the outcome. This result should be read with some caution: the filtered sample has only 59 observations, and the balance check above still shows a minimum post-matching p-value of 0.042775 (Christian).
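The T-stat and p.val lines in the summary can be recovered from the estimate and standard error. The sketch below assumes the p-value comes from a two-sided normal (Wald) test, which is an inference from the reported numbers rather than something stated in the output; the variable names are illustrative.

```r
# Values copied from the summary(After_genmatch) output for the filtered sample
est <- 15.484   # Estimate
se  <- 5.1617   # AI SE

# Wald statistic, two-sided normal p-value, and 95% interval
z  <- est / se
p  <- 2 * pnorm(-abs(z))
ci <- est + c(-1, 1) * qnorm(0.975) * se

print(c(z = z, p = p))
print(ci)
```

Under that assumption, z and p should agree with the reported 2.9998 and 0.0027017 up to rounding of the inputs.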
We all know that randomised controlled trials (RCTs) are the gold standard for making causal inferences. However, policy is typically not conducted in an experimental setting; convenience sampling is the more common scenario. How we address confounding to make an apples-to-apples comparison is therefore critical. This article not only combines different matching methods (e.g. exact, calliper, and genetic matching) but also provides actionable results that tell the survey team which units to follow up with after the treatment. In previous classes, we dealt with static data; this article shows how matching can be used in a dynamic way.
My only concern is whether it is possible to maximise the number of treated units retained while maintaining good balance. The article mentions that a more efficient algorithm can be achieved by adjusting the settings through repeated trial and error. But could machine-learning optimisation be applied here, given that the objective function is clearly defined as a combination of the number of treated units and balance?