QUESTION 1: “Daughters”

PART (A)

foo <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSY9jLlufY1GjeMh7D2_g1m6olveHLNCerT2C36MTkcjwQCOlZYf8evLMzGOnc252OgXEEasHqcNIcZ/pub?gid=1976818127&single=true&output=csv")

reg1 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)

summary(reg1)
## 
## Call:
## lm(formula = nowtot ~ hasgirls + Dems + Repubs + Christian + 
##     age + srvlng + demvote, data = foo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -56.028 -10.322  -1.517  11.208  69.642 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  38.6991    18.6306   2.077 0.038390 *  
## hasgirls     -0.4523     1.9036  -0.238 0.812322    
## Dems         -8.1022    17.5861  -0.461 0.645238    
## Repubs      -55.1069    17.6340  -3.125 0.001901 ** 
## Christian   -13.3961     3.7218  -3.599 0.000357 ***
## age           0.1260     0.1117   1.128 0.259938    
## srvlng       -0.2251     0.1355  -1.662 0.097349 .  
## demvote      87.5501     8.4847  10.319  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.19 on 422 degrees of freedom
## Multiple R-squared:  0.7821, Adjusted R-squared:  0.7784 
## F-statistic: 216.3 on 7 and 422 DF,  p-value: < 2.2e-16
library(ggplot2)
library(gridExtra)

# plot the distribution
plot1 <- ggplot(foo, aes(x = Dems, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Dems by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

plot2 <- ggplot(foo, aes(x = Repubs, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Repubs by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

plot3 <- ggplot(foo, aes(x = Christian, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Christian by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

plot4 <- ggplot(foo, aes(x = age, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Age by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

plot5 <- ggplot(foo, aes(x = srvlng, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of srvlng by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

plot6 <- ggplot(foo, aes(x = demvote, fill = factor(hasgirls))) +
  geom_density(alpha = 0.5) +
  ggtitle("Distribution of Demvote by hasgirls") +
  theme_minimal() +
  scale_fill_manual(values = c("blue", "red"))

# layout
grid.arrange(plot1, plot2, plot3, plot4, plot5, plot6, ncol = 2)

The estimated treatment effect of having girls is -0.4523, with a standard error of 1.9036. The corresponding 95% confidence interval is approximately -4.18 to 3.28, which includes zero (p = 0.81). Note that this regression adjusts for the covariates only through a linear specification, so it may not fully account for confounding. To address this, we use matching in the next section.
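
As a quick check, the interval can be recomputed from the hasgirls row of the regression table. This is a minimal sketch that assumes a normal approximation; the variable names are illustrative, and `confint(reg1, "hasgirls")` would give the slightly wider t-based interval from the fitted model.

```r
# Estimate and standard error copied from the hasgirls row of summary(reg1)
est <- -0.4523
se  <- 1.9036

# 95% confidence interval under a normal approximation
z  <- qnorm(0.975)
ci <- c(lower = est - z * se, upper = est + z * se)
print(round(ci, 3))
```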

library(Matching)
## Warning: package 'Matching' was built under R version 4.3.3
## Loading required package: MASS
## ## 
## ##  Matching (Version 4.10-15, Build Date: 2024-10-14)
## ##  See https://www.jsekhon.com for additional documentation.
## ##  Please cite software as:
## ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
## ##   Software with Automated Balance Optimization: The Matching package for R.''
## ##   Journal of Statistical Software, 42(7): 1-52. 
## ##
## Match on the confounders below...
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr  <- foo$hasgirls
Y   <- foo$nowtot


genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=3, pop.size=16, max.generations=10, wait.generations=1)
## Loading required namespace: rgenoud
## 
## 
## Mon Dec 16 12:55:48 2024
## Domains:
##  0.000000e+00   <=  X1   <=    1.000000e+03 
##  0.000000e+00   <=  X2   <=    1.000000e+03 
##  0.000000e+00   <=  X3   <=    1.000000e+03 
##  0.000000e+00   <=  X4   <=    1.000000e+03 
##  0.000000e+00   <=  X5   <=    1.000000e+03 
##  0.000000e+00   <=  X6   <=    1.000000e+03 
## 
## Data Type: Floating Point
## Operators (code number, name, population) 
##  (1) Cloning...........................  1
##  (2) Uniform Mutation..................  2
##  (3) Boundary Mutation.................  2
##  (4) Non-Uniform Mutation..............  2
##  (5) Polytope Crossover................  2
##  (6) Simple Crossover..................  2
##  (7) Whole Non-Uniform Mutation........  2
##  (8) Heuristic Crossover...............  2
##  (9) Local-Minimum Crossover...........  0
## 
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size       : 16
## Convergence Tolerance: 1.000000e-03
## 
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
## 
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 2.880402e-02  6.050129e-02  1.569680e-01  1.569680e-01  2.522714e-01  4.387264e-01  4.387264e-01  6.165314e-01  8.309462e-01  8.607924e-01  1.000000e+00  1.000000e+00  
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.658339e+02
## mean............ 5.264171e+02
## variance........ 9.051955e+04
## var 2:
## best............ 4.545248e+02
## mean............ 4.122412e+02
## variance........ 6.291642e+04
## var 3:
## best............ 7.433343e+02
## mean............ 4.396681e+02
## variance........ 6.255117e+04
## var 4:
## best............ 9.957218e+02
## mean............ 5.524709e+02
## variance........ 1.109960e+05
## var 5:
## best............ 2.877353e+01
## mean............ 3.707807e+02
## variance........ 8.472810e+04
## var 6:
## best............ 7.293103e+01
## mean............ 4.915480e+02
## variance........ 1.266730e+05
## 
## GENERATION: 1
## Lexical Fit..... 3.770192e-02  4.137054e-02  1.569680e-01  1.569680e-01  2.300324e-01  4.315719e-01  4.387264e-01  4.387264e-01  5.303673e-01  9.798002e-01  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 7.823778e+02
## mean............ 6.087362e+02
## variance........ 9.230380e+04
## var 2:
## best............ 4.369617e+02
## mean............ 3.439781e+02
## variance........ 8.366097e+03
## var 3:
## best............ 6.872927e+02
## mean............ 7.093641e+02
## variance........ 2.082159e+04
## var 4:
## best............ 7.425135e+02
## mean............ 7.157633e+02
## variance........ 6.047397e+04
## var 5:
## best............ 4.314048e+01
## mean............ 3.255320e+02
## variance........ 7.072203e+04
## var 6:
## best............ 1.152656e+02
## mean............ 1.183648e+02
## variance........ 1.513796e+04
## 
## GENERATION: 2
## Lexical Fit..... 4.890113e-02  5.751797e-02  1.545841e-01  1.569680e-01  1.569680e-01  4.387264e-01  4.387264e-01  5.342514e-01  6.071759e-01  7.260032e-01  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 6.627449e+02
## mean............ 8.461707e+02
## variance........ 8.792016e+03
## var 2:
## best............ 4.250508e+02
## mean............ 4.303543e+02
## variance........ 3.333956e+03
## var 3:
## best............ 6.507477e+02
## mean............ 6.652219e+02
## variance........ 1.288317e+04
## var 4:
## best............ 5.773947e+02
## mean............ 8.448381e+02
## variance........ 1.554261e+04
## var 5:
## best............ 5.720602e+01
## mean............ 4.137889e+01
## variance........ 2.784064e+02
## var 6:
## best............ 1.428723e+02
## mean............ 1.092834e+02
## variance........ 1.641250e+03
## 
## GENERATION: 3
## Lexical Fit..... 4.890113e-02  5.751797e-02  1.545841e-01  1.569680e-01  1.569680e-01  4.387264e-01  4.387264e-01  5.342514e-01  6.071759e-01  7.260032e-01  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 53
## var 1:
## best............ 6.627449e+02
## mean............ 7.346339e+02
## variance........ 1.340572e+04
## var 2:
## best............ 4.250508e+02
## mean............ 4.403844e+02
## variance........ 2.550566e+03
## var 3:
## best............ 6.507477e+02
## mean............ 6.943771e+02
## variance........ 4.122676e+03
## var 4:
## best............ 5.773947e+02
## mean............ 6.768492e+02
## variance........ 2.903826e+04
## var 5:
## best............ 5.720602e+01
## mean............ 4.955038e+01
## variance........ 2.190047e+02
## var 6:
## best............ 1.428723e+02
## mean............ 1.557070e+02
## variance........ 1.622773e+03
## 
## GENERATION: 4
## Lexical Fit..... 4.890113e-02  6.516602e-02  1.262261e-01  1.262261e-01  1.713969e-01  2.480885e-01  2.480885e-01  4.876947e-01  6.071759e-01  7.543029e-01  1.000000e+00  1.000000e+00  
## #unique......... 10, #Total UniqueCount: 63
## var 1:
## best............ 5.530065e+02
## mean............ 6.730817e+02
## variance........ 2.888852e+03
## var 2:
## best............ 7.895890e+02
## mean............ 4.660923e+02
## variance........ 7.410925e+03
## var 3:
## best............ 7.887057e+02
## mean............ 7.062528e+02
## variance........ 1.072215e+04
## var 4:
## best............ 5.773947e+02
## mean............ 5.484422e+02
## variance........ 2.414415e+04
## var 5:
## best............ 5.720602e+01
## mean............ 5.533075e+01
## variance........ 2.771106e+01
## var 6:
## best............ 1.479595e+02
## mean............ 1.503942e+02
## variance........ 2.346288e+02
## 
## GENERATION: 5
## Lexical Fit..... 4.890113e-02  7.145467e-02  1.569680e-01  1.569680e-01  1.713969e-01  4.387264e-01  4.387264e-01  5.120429e-01  6.460242e-01  8.407361e-01  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 75
## var 1:
## best............ 6.608709e+02
## mean............ 5.824787e+02
## variance........ 3.104587e+03
## var 2:
## best............ 4.726607e+02
## mean............ 6.633427e+02
## variance........ 3.285063e+04
## var 3:
## best............ 7.882436e+02
## mean............ 7.766112e+02
## variance........ 3.958048e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.682457e+02
## variance........ 1.334906e+03
## var 5:
## best............ 5.720602e+01
## mean............ 9.804236e+01
## variance........ 2.497203e+04
## var 6:
## best............ 1.513320e+02
## mean............ 1.474768e+02
## variance........ 8.652128e+00
## 
## GENERATION: 6
## Lexical Fit..... 4.890113e-02  7.145467e-02  1.569680e-01  1.569680e-01  1.713969e-01  4.387264e-01  4.387264e-01  5.178081e-01  6.071759e-01  8.045222e-01  1.000000e+00  1.000000e+00  
## #unique......... 11, #Total UniqueCount: 86
## var 1:
## best............ 6.270355e+02
## mean............ 5.571419e+02
## variance........ 2.426231e+04
## var 2:
## best............ 5.739976e+02
## mean............ 6.080216e+02
## variance........ 3.411085e+04
## var 3:
## best............ 7.837100e+02
## mean............ 7.805032e+02
## variance........ 1.799056e+02
## var 4:
## best............ 5.773947e+02
## mean............ 5.755886e+02
## variance........ 4.016482e+01
## var 5:
## best............ 5.720602e+01
## mean............ 5.732375e+01
## variance........ 2.126732e-01
## var 6:
## best............ 1.506628e+02
## mean............ 1.454469e+02
## variance........ 3.644913e+02
## 
## GENERATION: 7
## Lexical Fit..... 4.890113e-02  7.145467e-02  1.569680e-01  1.569680e-01  1.713969e-01  4.387264e-01  4.387264e-01  5.178081e-01  6.071759e-01  8.045222e-01  1.000000e+00  1.000000e+00  
## #unique......... 10, #Total UniqueCount: 96
## var 1:
## best............ 6.270355e+02
## mean............ 6.626791e+02
## variance........ 4.104900e+03
## var 2:
## best............ 5.739976e+02
## mean............ 5.357816e+02
## variance........ 3.483028e+03
## var 3:
## best............ 7.837100e+02
## mean............ 7.738758e+02
## variance........ 1.944608e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.653012e+02
## variance........ 2.196797e+03
## var 5:
## best............ 5.720602e+01
## mean............ 5.723066e+01
## variance........ 8.757528e-03
## var 6:
## best............ 1.506628e+02
## mean............ 1.673098e+02
## variance........ 2.105230e+03
## 
## GENERATION: 8
## Lexical Fit..... 4.890113e-02  7.145467e-02  1.569680e-01  1.569680e-01  1.713969e-01  4.387264e-01  4.387264e-01  5.178081e-01  6.071759e-01  8.045222e-01  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 109
## var 1:
## best............ 6.270355e+02
## mean............ 6.444993e+02
## variance........ 2.540632e+03
## var 2:
## best............ 5.739976e+02
## mean............ 5.440684e+02
## variance........ 1.353217e+04
## var 3:
## best............ 7.837100e+02
## mean............ 7.596487e+02
## variance........ 9.191376e+03
## var 4:
## best............ 5.773947e+02
## mean............ 5.773935e+02
## variance........ 2.642594e-05
## var 5:
## best............ 5.720602e+01
## mean............ 1.303279e+02
## variance........ 4.725372e+04
## var 6:
## best............ 1.506628e+02
## mean............ 1.757487e+02
## variance........ 9.477804e+03
## 
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
## 
## Solution Lexical Fitness Value:
## 4.890113e-02  7.145467e-02  1.569680e-01  1.569680e-01  1.713969e-01  4.387264e-01  4.387264e-01  5.178081e-01  6.071759e-01  8.045222e-01  1.000000e+00  1.000000e+00  
## 
## Parameters at the Solution:
## 
##  X[ 1] : 6.270355e+02
##  X[ 2] : 5.739976e+02
##  X[ 3] : 7.837100e+02
##  X[ 4] : 5.773947e+02
##  X[ 5] : 5.720602e+01
##  X[ 6] : 1.506628e+02
## 
## Solution Found Generation 6
## Number of Generations Run 8
## 
## Mon Dec 16 12:55:48 2024
## Total run time : 0 hours 0 minutes and 0 seconds
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
## 
## Estimate...  0 
## SE.........  0 
## T-stat.....  NaN 
## p.val......  NA 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  430 
## Matched number of observations  (unweighted).  434
mb <- MatchBalance(
    hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
    match.out = mout, nboots=500, data = foo)
## 
## ***** (V1) Dems *****
##                        Before Matching        After Matching
## mean treatment........    0.45833            0.47209 
## mean control..........    0.50847            0.47209 
## std mean diff.........    -10.047                  0 
## 
## mean raw eQQ diff.....   0.050847                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.025071                  0 
## med  eCDF diff........   0.025071                  0 
## max  eCDF diff........   0.050141                  0 
## 
## var ratio (Tr/Co).....    0.98809                  1 
## T-test p-value........    0.35571                  1 
## 
## 
## ***** (V2) Repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.53846            0.52558 
## mean control..........    0.49153            0.52791 
## std mean diff.........        9.4           -0.46518 
## 
## mean raw eQQ diff.....   0.042373          0.0023041 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.023468          0.0011521 
## med  eCDF diff........   0.023468          0.0011521 
## max  eCDF diff........   0.046936          0.0023041 
## 
## var ratio (Tr/Co).....    0.98911             1.0005 
## T-test p-value........     0.3873            0.31731 
## 
## 
## ***** (V3) Christian *****
##                        Before Matching        After Matching
## mean treatment........     0.9391            0.94186 
## mean control..........    0.94915            0.94186 
## std mean diff.........    -4.1958                  0 
## 
## mean raw eQQ diff.....   0.016949                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........   0.005025                  0 
## med  eCDF diff........   0.005025                  0 
## max  eCDF diff........    0.01005                  0 
## 
## var ratio (Tr/Co).....     1.1787                  1 
## T-test p-value........    0.68107                  1 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     52.628             51.686 
## mean control..........     49.178             51.479 
## std mean diff.........     38.385             2.2396 
## 
## mean raw eQQ diff.....      3.661            0.56682 
## med  raw eQQ diff.....          4                  0 
## max  raw eQQ diff.....          7                  7 
## 
## mean eCDF diff........   0.075348           0.010849 
## med  eCDF diff........   0.075538          0.0092166 
## max  eCDF diff........    0.17807           0.036866 
## 
## var ratio (Tr/Co).....    0.71552            0.94531 
## T-test p-value........  0.0020402           0.093283 
## KS Bootstrap p-value..      0.008              0.784 
## KS Naive p-value......  0.0087659             0.9296 
## KS Statistic..........    0.17807           0.036866 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     8.5865             8.3884 
## mean control..........     8.7458             8.5837 
## std mean diff.........    -2.1085            -2.6375 
## 
## mean raw eQQ diff.....    0.66949             0.4424 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          5                  5 
## 
## mean eCDF diff........   0.017181           0.011265 
## med  eCDF diff........    0.01445           0.011521 
## max  eCDF diff........   0.051608           0.029954 
## 
## var ratio (Tr/Co).....    0.77347            0.87885 
## T-test p-value........    0.85956            0.32982 
## KS Bootstrap p-value..      0.754              0.864 
## KS Naive p-value......    0.97653            0.98994 
## KS Statistic..........   0.051608           0.029954 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.49929            0.50181 
## mean control..........    0.50602            0.50249 
## std mean diff.........    -5.2747           -0.53816 
## 
## mean raw eQQ diff.....   0.011441          0.0072581 
## med  raw eQQ diff.....       0.01                  0 
## max  raw eQQ diff.....       0.08               0.08 
## 
## mean eCDF diff........   0.015928           0.010862 
## med  eCDF diff........   0.010811          0.0069124 
## max  eCDF diff........   0.048512           0.032258 
## 
## var ratio (Tr/Co).....     1.1269             1.0992 
## T-test p-value........    0.61103            0.75436 
## KS Bootstrap p-value..      0.918              0.908 
## KS Naive p-value......    0.98776            0.97764 
## KS Statistic..........   0.048512           0.032258 
## 
## 
## Before Matching Minimum p.value: 0.0020402 
## Variable Name(s): age  Number(s): 4 
## 
## After Matching Minimum p.value: 0.093283 
## Variable Name(s): age  Number(s): 4

After matching, covariate balance improved. For example, before matching the mean of Dems was 0.45833 in the treatment group and 0.50847 in the control group; after matching, both means are 0.47209, and the t-test p-value rose from 0.35571 to 1. Most other covariates also moved closer together: the minimum p-value across all balance tests rose from 0.0020402 (age) before matching to 0.093283 (age) after matching. With the groups this similar on the observed covariates, it is more reasonable to give the comparison a causal interpretation, provided there are no unobserved confounders.
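
The "std mean diff" rows in the balance output can be reproduced by hand. The sketch below assumes the statistic is 100 times the difference in means divided by the standard deviation of the treated group. The function name is illustrative, and the 0/1 vectors are reconstructed from the reported before-matching means and group sizes (143 of 312 treated and 60 of 118 control units with Dems = 1), not read from the data file.

```r
# Standardized mean difference, scaled by 100 and by the treated-group SD
# (assumed definition; checked here against the reported Dems value)
std_mean_diff <- function(x, treat) {
  x_tr <- x[treat == 1]
  x_co <- x[treat == 0]
  100 * (mean(x_tr) - mean(x_co)) / sd(x_tr)
}

# Dems indicator rebuilt from the reported means: 143/312 and 60/118
treat <- c(rep(1, 312), rep(0, 118))
dems  <- c(rep(1, 143), rep(0, 312 - 143),   # treated units
           rep(1, 60),  rep(0, 118 - 60))    # control units

smd_dems <- std_mean_diff(dems, treat)
print(round(smd_dems, 3))
```

Under these assumptions the result agrees with the before-matching value reported for Dems (-10.047).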

After_genmatch  <- Match(Y = Y, Tr=Tr, X=X, M=3)
summary(After_genmatch)
## 
## Estimate...  -0.0013355 
## AI SE......  1.9563 
## T-stat.....  -0.00068264 
## p.val......  0.99946 
## 
## Original number of observations..............  430 
## Original number of treated obs...............  312 
## Matched number of observations...............  312 
## Matched number of observations  (unweighted).  938

Based on this analysis, the estimated treatment effect is -0.0013355, with a standard error of 1.9563. The 95% confidence interval is therefore approximately -3.835684 to 3.833012. Because this interval comfortably includes zero (p = 0.99946), we conclude that having girls has no statistically significant effect on the outcome nowtot in these data.
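
One caveat about the calls above: GenMatch is run with estimand = "ATT", the balance check uses a Match call with estimand = "ATE", and the final Match call that produces the estimate does not pass Weight.matrix = genout, so it falls back on Match's default weighting. A minimal sketch of a consistent pipeline follows. It runs on simulated data (the covariates, sample size, and outcome model are hypothetical stand-ins) and is skipped when the Matching and rgenoud packages are not installed.

```r
# Hypothetical, simulated data standing in for the real covariates
set.seed(1)
n  <- 200
X  <- cbind(age = rnorm(n, 50, 10), srvlng = rpois(n, 8))
Tr <- rbinom(n, 1, plogis((X[, "age"] - 50) / 10))
Y  <- 50 + 0.2 * X[, "age"] + rnorm(n, sd = 5)

fit <- NULL
if (requireNamespace("Matching", quietly = TRUE) &&
    requireNamespace("rgenoud", quietly = TRUE)) {
  # Search for covariate weights, targeting the ATT with 3 matches per unit
  genout <- Matching::GenMatch(Tr = Tr, X = X, estimand = "ATT", M = 3,
                               pop.size = 16, max.generations = 10,
                               wait.generations = 1, print.level = 0)
  # Reuse the same estimand, M, and the GenMatch weights for the estimate
  fit <- Matching::Match(Y = Y, Tr = Tr, X = X, estimand = "ATT", M = 3,
                         Weight.matrix = genout)
  summary(fit)
}
```

Passing the same match object to MatchBalance would then check balance on exactly the matched sample used for the estimate.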

It is also worth noting that the matching standard error (1.9563) is close to the regression standard error for hasgirls (1.9036), so matching did not materially change the uncertainty relative to the regression model.

PART (B)

#filter data
treatment_group <- subset(foo, ngirls == 2 & nboys == 0)
control_group <- subset(foo, nboys == 2 & ngirls == 0)

filtered_data <- rbind(treatment_group, control_group)

#head(filtered_data)
## Match on the confounders below...
foo <- filtered_data
reg2 <- lm(nowtot ~ hasgirls + Dems + Repubs + Christian + age + srvlng + demvote, data = foo)
# summarize the Part B model fitted on the filtered data
summary(reg2)
X <- cbind(foo$Dems, foo$Repubs, foo$Christian, foo$age, foo$srvlng, foo$demvote)
Tr  <- foo$hasgirls
Y   <- foo$nowtot

genout <- GenMatch(Tr = Tr, estimand="ATT", X = X, M=2, pop.size=16, max.generations=10, wait.generations=1)
## 
## 
## Mon Dec 16 12:55:48 2024
## Domains:
##  0.000000e+00   <=  X1   <=    1.000000e+03 
##  0.000000e+00   <=  X2   <=    1.000000e+03 
##  0.000000e+00   <=  X3   <=    1.000000e+03 
##  0.000000e+00   <=  X4   <=    1.000000e+03 
##  0.000000e+00   <=  X5   <=    1.000000e+03 
##  0.000000e+00   <=  X6   <=    1.000000e+03 
## 
## Data Type: Floating Point
## Operators (code number, name, population) 
##  (1) Cloning...........................  1
##  (2) Uniform Mutation..................  2
##  (3) Boundary Mutation.................  2
##  (4) Non-Uniform Mutation..............  2
##  (5) Polytope Crossover................  2
##  (6) Simple Crossover..................  2
##  (7) Whole Non-Uniform Mutation........  2
##  (8) Heuristic Crossover...............  2
##  (9) Local-Minimum Crossover...........  0
## 
## SOFT Maximum Number of Generations: 10
## Maximum Nonchanging Generations: 1
## Population size       : 16
## Convergence Tolerance: 1.000000e-03
## 
## Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
## Not Checking Gradients before Stopping.
## Using Out of Bounds Individuals.
## 
## Maximization Problem.
## GENERATION: 0 (initializing the population)
## Lexical Fit..... 7.836441e-02  7.836441e-02  2.490379e-01  2.766236e-01  3.709038e-01  4.644735e-01  5.795017e-01  6.790204e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 16, #Total UniqueCount: 16
## var 1:
## best............ 9.221780e+02
## mean............ 5.175257e+02
## variance........ 8.749483e+04
## var 2:
## best............ 6.278279e+01
## mean............ 4.007037e+02
## variance........ 7.226336e+04
## var 3:
## best............ 2.177170e+01
## mean............ 3.893849e+02
## variance........ 8.354676e+04
## var 4:
## best............ 4.386396e+02
## mean............ 4.736510e+02
## variance........ 5.383847e+04
## var 5:
## best............ 2.672448e+02
## mean............ 4.868009e+02
## variance........ 1.204813e+05
## var 6:
## best............ 9.109921e+02
## mean............ 5.445747e+02
## variance........ 1.266014e+05
## 
## GENERATION: 1
## Lexical Fit..... 7.836441e-02  7.836441e-02  2.490379e-01  2.766236e-01  3.709038e-01  4.644735e-01  5.795017e-01  6.790204e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 13, #Total UniqueCount: 29
## var 1:
## best............ 9.221780e+02
## mean............ 6.898203e+02
## variance........ 7.777860e+04
## var 2:
## best............ 6.278279e+01
## mean............ 2.391045e+02
## variance........ 6.898410e+04
## var 3:
## best............ 2.177170e+01
## mean............ 2.922640e+02
## variance........ 1.094456e+05
## var 4:
## best............ 4.386396e+02
## mean............ 4.071890e+02
## variance........ 1.046794e+04
## var 5:
## best............ 2.672448e+02
## mean............ 2.486523e+02
## variance........ 3.616966e+04
## var 6:
## best............ 9.109921e+02
## mean............ 8.091624e+02
## variance........ 4.848595e+04
## 
## GENERATION: 2
## Lexical Fit..... 7.836441e-02  7.836441e-02  2.490379e-01  2.766236e-01  3.709038e-01  4.644735e-01  5.795017e-01  6.790204e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## #unique......... 12, #Total UniqueCount: 41
## var 1:
## best............ 9.221780e+02
## mean............ 8.241288e+02
## variance........ 5.241358e+04
## var 2:
## best............ 6.278279e+01
## mean............ 1.260676e+02
## variance........ 3.202499e+04
## var 3:
## best............ 2.177170e+01
## mean............ 8.511650e+01
## variance........ 9.340414e+03
## var 4:
## best............ 4.386396e+02
## mean............ 4.264452e+02
## variance........ 2.660469e+03
## var 5:
## best............ 2.672448e+02
## mean............ 2.960514e+02
## variance........ 9.119360e+03
## var 6:
## best............ 9.109921e+02
## mean............ 8.956361e+02
## variance........ 2.022662e+03
## 
## 'wait.generations' limit reached.
## No significant improvement in 1 generations.
## 
## Solution Lexical Fitness Value:
## 7.836441e-02  7.836441e-02  2.490379e-01  2.766236e-01  3.709038e-01  4.644735e-01  5.795017e-01  6.790204e-01  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  
## 
## Parameters at the Solution:
## 
##  X[ 1] : 9.221780e+02
##  X[ 2] : 6.278279e+01
##  X[ 3] : 2.177170e+01
##  X[ 4] : 4.386396e+02
##  X[ 5] : 2.672448e+02
##  X[ 6] : 9.109921e+02
## 
## Solution Found Generation 1
## Number of Generations Run 2
## 
## Mon Dec 16 12:55:48 2024
## Total run time : 0 hours 0 minutes and 0 seconds
mout <- Match(Tr=foo$hasgirls, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
## 
## Estimate...  0 
## SE.........  0 
## T-stat.....  NaN 
## p.val......  NA 
## 
## Original number of observations..............  59 
## Original number of treated obs...............  31 
## Matched number of observations...............  59 
## Matched number of observations  (unweighted).  59
mb <- MatchBalance(
    hasgirls ~ Dems +Repubs + Christian + age + srvlng + demvote,
    match.out = mout, nboots=500, data = foo)
## 
## ***** (V1) Dems *****
##                        Before Matching        After Matching
## mean treatment........    0.64516            0.54237 
## mean control..........    0.42857            0.54237 
## std mean diff.........     44.532                  0 
## 
## mean raw eQQ diff.....    0.21429                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10829                  0 
## med  eCDF diff........    0.10829                  0 
## max  eCDF diff........    0.21659                  0 
## 
## var ratio (Tr/Co).....    0.93145                  1 
## T-test p-value........   0.099329                  1 
## 
## 
## ***** (V2) Repubs *****
##                        Before Matching        After Matching
## mean treatment........    0.35484            0.45763 
## mean control..........    0.57143            0.45763 
## std mean diff.........    -44.532                  0 
## 
## mean raw eQQ diff.....    0.21429                  0 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  0 
## 
## mean eCDF diff........    0.10829                  0 
## med  eCDF diff........    0.10829                  0 
## max  eCDF diff........    0.21659                  0 
## 
## var ratio (Tr/Co).....    0.93145                  1 
## T-test p-value........   0.099329                  1 
## 
## 
## ***** (V3) Christian *****
##                        Before Matching        After Matching
## mean treatment........    0.90323             0.9322 
## mean control..........          1                  1 
## std mean diff.........      -32.2            -26.738 
## 
## mean raw eQQ diff.....    0.10714           0.067797 
## med  raw eQQ diff.....          0                  0 
## max  raw eQQ diff.....          1                  1 
## 
## mean eCDF diff........   0.048387           0.033898 
## med  eCDF diff........   0.048387           0.033898 
## max  eCDF diff........   0.096774           0.067797 
## 
## var ratio (Tr/Co).....        Inf                Inf 
## T-test p-value........   0.083087           0.042775 
## 
## 
## ***** (V4) age *****
##                        Before Matching        After Matching
## mean treatment........     48.226             48.915 
## mean control..........     49.857             48.153 
## std mean diff.........    -19.026             9.2958 
## 
## mean raw eQQ diff.....     2.4643             1.0678 
## med  raw eQQ diff.....        2.5                  1 
## max  raw eQQ diff.....          5                  8 
## 
## mean eCDF diff........   0.061382           0.026441 
## med  eCDF diff........   0.051843           0.016949 
## max  eCDF diff........    0.13479           0.067797 
## 
## var ratio (Tr/Co).....     1.0028             1.1254 
## T-test p-value........    0.46822            0.23178 
## KS Bootstrap p-value..        0.8              0.982 
## KS Naive p-value......    0.80669            0.97503 
## KS Statistic..........    0.13479           0.067797 
## 
## 
## ***** (V5) srvlng *****
##                        Before Matching        After Matching
## mean treatment........     7.5484             7.8475 
## mean control..........     9.6071             9.0508 
## std mean diff.........    -28.926            -16.898 
## 
## mean raw eQQ diff.....     2.4286             1.4746 
## med  raw eQQ diff.....          1                  0 
## max  raw eQQ diff.....         10                  8 
## 
## mean eCDF diff........   0.066172            0.04661 
## med  eCDF diff........    0.05818           0.042373 
## max  eCDF diff........    0.17051            0.11864 
## 
## var ratio (Tr/Co).....    0.60661            0.65825 
## T-test p-value........    0.34249           0.043865 
## KS Bootstrap p-value..       0.49              0.558 
## KS Naive p-value......    0.49336            0.55479 
## KS Statistic..........    0.17051            0.11864 
## 
## 
## ***** (V6) demvote *****
##                        Before Matching        After Matching
## mean treatment........    0.52677            0.51068 
## mean control..........    0.50714            0.51356 
## std mean diff.........     15.554            -2.2113 
## 
## mean raw eQQ diff.....       0.05           0.028644 
## med  raw eQQ diff.....       0.05               0.02 
## max  raw eQQ diff.....       0.12               0.08 
## 
## mean eCDF diff........    0.10108           0.053838 
## med  eCDF diff........   0.066244           0.033898 
## max  eCDF diff........    0.29493            0.15254 
## 
## var ratio (Tr/Co).....    0.88501             1.0986 
## T-test p-value........    0.56612            0.65957 
## KS Bootstrap p-value..      0.074              0.378 
## KS Naive p-value......   0.099395            0.41575 
## KS Statistic..........    0.29493            0.15254 
## 
## 
## Before Matching Minimum p.value: 0.074 
## Variable Name(s): demvote  Number(s): 6 
## 
## After Matching Minimum p.value: 0.042775 
## Variable Name(s): Christian  Number(s): 3
After_genmatch  <- Match(Y = Y, Tr=Tr, X=X, M=2)
summary(After_genmatch)
## 
## Estimate...  15.484 
## AI SE......  5.1617 
## T-stat.....  2.9998 
## p.val......  0.0027017 
## 
## Original number of observations..............  59 
## Original number of treated obs...............  31 
## Matched number of observations...............  31 
## Matched number of observations  (unweighted).  62

Based on this analysis, the estimated treatment effect is 15.484, with a standard error of 5.1617. The 95% confidence interval is approximately 5.367068 to 25.60093, which excludes zero (p = 0.0027017). In this “high-dose” comparison, having two girls and no boys is associated with a significantly higher nowtot than having two boys and no girls. This conclusion should be read with caution: only 59 observations remain after filtering, and the balance output above still shows after-matching p-values below 0.05 for Christian (0.042775) and srvlng (0.043865).
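
The test statistic, p-value, and interval reported for Part (B) can be recomputed from the estimate and standard error alone. This is a sketch that assumes a two-sided normal test, which agrees with the printed p-value; the variable names are illustrative.

```r
# Estimate and AI standard error copied from summary(After_genmatch)
est <- 15.484
se  <- 5.1617

t_stat  <- est / se                        # test statistic
p_value <- 2 * pnorm(-abs(t_stat))         # two-sided normal p-value
ci      <- est + c(-1, 1) * qnorm(0.975) * se

print(round(c(t_stat = t_stat, p_value = p_value), 5))
print(round(ci, 3))
```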

QUESTION 2: “Business Lending in Indonesia”

Randomised controlled trials (RCTs) are widely regarded as the gold standard for causal inference. However, policy is rarely implemented in an experimental setting, and convenience sampling is the more common scenario. How we address confounding to make an apples-to-apples comparison is therefore critical. This article not only combines several matching methods (e.g., exact, caliper, and genetic matching) but also provides actionable results that tell the survey team which targets to follow up with after the treatment. In previous classes we mostly dealt with static data; this article shows how matching can also be used in a dynamic way.

My only concern is whether the number of treated units can be maximized while still maintaining good balance. The article mentions that a more efficient algorithm can be obtained by adjusting the settings through repeated trial and error. Could machine learning or another optimization method be applied here instead, given that the objective function is clearly defined as a combination of the number of treated units and balance?
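
To make the question concrete, here is a minimal sketch of such an objective function, searched over a small grid of caliper widths. Everything in it is a hypothetical illustration rather than the article's method: the simulated data, the single covariate, the penalty weight lambda, and the simple nearest-neighbour matching are all assumptions. A grid search stands in for whatever optimizer might be used.

```r
set.seed(42)

# Hypothetical data: one covariate x, with treatment more likely when x is large
n     <- 300
x     <- rnorm(n)
treat <- rbinom(n, 1, plogis(x))

# Nearest-neighbour matching with replacement on x; treated units whose
# nearest control is farther away than `caliper` are dropped.
match_with_caliper <- function(x, treat, caliper) {
  x_tr <- x[treat == 1]
  x_co <- x[treat == 0]
  nearest <- vapply(x_tr, function(v) which.min(abs(x_co - v)), integer(1))
  keep <- abs(x_tr - x_co[nearest]) <= caliper
  smd <- if (sum(keep) > 1) {
    abs(mean(x_tr[keep]) - mean(x_co[nearest[keep]])) / sd(x_tr[keep])
  } else {
    NA_real_
  }
  list(n_treated = sum(keep), smd = smd)
}

# Hypothetical objective: share of treated units kept, minus an imbalance penalty
objective <- function(res, n_total, lambda = 5) {
  res$n_treated / n_total - lambda * res$smd
}

calipers <- c(0.01, 0.05, 0.1, 0.5, 2)
results  <- lapply(calipers, function(cal) match_with_caliper(x, treat, cal))
n_kept   <- vapply(results, function(r) r$n_treated, integer(1))
scores   <- vapply(results, objective, numeric(1), n_total = sum(treat))

print(data.frame(caliper = calipers, n_treated = n_kept, score = scores))
best_caliper <- calipers[which.max(scores)]
```

The trade-off is visible in the table: a wider caliper keeps more treated units but generally allows worse balance, and lambda controls how the two are weighed against each other.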

QUESTION 3: Sensemakr package in R