Pset01

This PSet is practical in nature: the goal is for you to produce matching estimates and interpret the results. You may use the code snippets provided in Matching.R to complete it.

Setup and Package Loading

# PACKAGE INSTALLATION (commented out because the packages are already installed)
# install.packages(c("haven", "dplyr", "xtable", "stargazer", 
#                   "MatchIt", "margins", "estimatr", "cobalt",
#                   "ggplot2", "modelsummary"))

# LOAD PACKAGES
library(haven)
library(dplyr)
library(xtable)
library(stargazer)
library(MatchIt)
library(margins)
library(estimatr)
library(cobalt)
library(ggplot2)
library(modelsummary)

# locale
Sys.setlocale("LC_ALL", "en_US.UTF-8")
## [1] "LC_COLLATE=en_US.UTF-8;LC_CTYPE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8"

Part 1

Using the controls in the cps_controls.rda dataset, estimate the effect of the treatment on 1978 earnings by OLS. Recall that the treated units are in the nsw_dw.rda dataset. Show empirical evidence that lets you argue whether this estimated effect is reliable or not.

load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")

data_mco <- psid_controls

# store treat as a factor so R handles it as a dummy
data_mco$treat_factor <- as.factor(data_mco$treat)

# Simple OLS (no controls)
mco_simple <- lm_robust(re78 ~ treat,
                        data = data_mco, se_type = "HC2")
summary(mco_simple)
## 
## Call:
## lm_robust(formula = re78 ~ treat, data = data_mco, se_type = "HC2")
## 
## Standard error type:  HC2 
## 
## Coefficients: (1 not defined because the design matrix is rank deficient)
##             Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper   DF
## (Intercept)    21554      311.7   69.14        0    20943    22165 2489
## treat             NA         NA      NA       NA       NA       NA   NA
## 
## Multiple R-squared:  -1.332e-15 ,    Adjusted R-squared:  -1.332e-15
# OLS with controls
mco_controles_1 <- lm_robust(re78 ~ treat + age + education + black + hispanic + married + nodegree + re74 + re75, 
                    data = data_mco, se_type = "HC2")
summary(mco_controles_1)
## 
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black + 
##     hispanic + married + nodegree + re74 + re75, data = data_mco, 
##     se_type = "HC2")
## 
## Standard error type:  HC2 
## 
## Coefficients: (1 not defined because the design matrix is rank deficient)
##              Estimate Std. Error  t value  Pr(>|t|)   CI Lower  CI Upper   DF
## (Intercept) -120.2203  1.959e+03 -0.06137 9.511e-01 -3961.4153 3720.9748 2481
## treat              NA         NA       NA        NA         NA        NA   NA
## age          -93.5690  2.155e+01 -4.34290 1.463e-05  -135.8176  -51.3204 2481
## education    594.8656  1.276e+02  4.66112 3.311e-06   344.6074  845.1238 2481
## black       -570.6953  4.537e+02 -1.25778 2.086e-01 -1460.4315  319.0409 2481
## hispanic    2502.6857  1.323e+03  1.89157 5.867e-02   -91.7623 5097.1337 2481
## married     1380.7505  5.269e+02  2.62036 8.837e-03   347.4779 2414.0231 2481
## nodegree     768.5418  6.770e+02  1.13520 2.564e-01  -559.0188 2096.1023 2481
## re74           0.2852  6.347e-02  4.49243 7.364e-06     0.1607    0.4096 2481
## re75           0.5675  6.813e-02  8.32886 1.332e-16     0.4339    0.7011 2481
## 
## Multiple R-squared:  0.5717 ,    Adjusted R-squared:  0.5703 
## F-statistic:    NA on 8 and 2481 DF,  p-value: NA
# Note: data_mco holds only the psid_controls rows, so treat is constant and its
# coefficient is NA; the treated units in nsw_dw are not part of this regression.

modelsummary(list("Modelo Simple" = mco_simple,
                  "Modelo con Controles" = mco_controles_1),
             title = "Comparación de Modelos MCO",
             fmt = 2
)
Comparación de Modelos MCO
              Modelo Simple   Modelo con Controles
(Intercept)   21553.92        -120.22
              (311.73)        (1958.87)
age                           -93.57
                              (21.55)
education                     594.87
                              (127.62)
black                         -570.70
                              (453.73)
hispanic                      2502.69
                              (1323.08)
married                       1380.75
                              (526.93)
nodegree                      768.54
                              (677.01)
re74                          0.29
                              (0.06)
re75                          0.57
                              (0.07)
Num.Obs.      2490            2490
R2            -0.000          0.572
R2 Adj.       -0.000          0.570
AIC           55137.1         53042.0
BIC           55148.7         53100.2
RMSE          15552.22        10178.65

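The OLS estimate of the treatment effect is not reliable. As run above, data_mco contains only the 2,490 rows of psid_controls, so treat is constant and its coefficient is reported as NA: the regression does not identify any effect. Even with the treated units from nsw_dw stacked in, the comparison would remain problematic, because the NSW participants and the non-experimental controls differ sharply in their observed characteristics.

Selection bias also arises, which prevents the causal effect of the program on 1978 earnings from being cleanly identified.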
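As a minimal sketch of the point above, the simulated example below shows that lm() cannot estimate a treatment coefficient when the sample contains only controls, and that stacking treated and control rows with rbind() makes it estimable. The data frames controls_sim and treated_sim and all the numbers used to generate them are assumptions for illustration, not the PSet data.

```r
set.seed(1)

# Hypothetical stand-ins for the control and treated samples (simulated data)
controls_sim <- data.frame(treat = 0, re75 = rnorm(200, 17000, 5000))
treated_sim  <- data.frame(treat = 1, re75 = rnorm(50, 1500, 1000))
controls_sim$re78 <- 2000 + 0.9 * controls_sim$re75 + rnorm(200, 0, 3000)
treated_sim$re78  <- 2000 + 0.9 * treated_sim$re75 + 1500 + rnorm(50, 0, 3000)

# Controls only: treat is constant, so it is collinear with the intercept
fit_controls <- lm(re78 ~ treat + re75, data = controls_sim)
coef(fit_controls)["treat"]   # NA

# Stacked sample: treat varies, so its coefficient can be estimated
stacked_sim <- rbind(controls_sim, treated_sim)
fit_stacked <- lm(re78 ~ treat + re75, data = stacked_sim)
coef(fit_stacked)["treat"]
```

Even in the stacked regression, the estimate leans on the linear model to extrapolate between groups whose pre-treatment earnings barely overlap, which is the concern raised in the answer.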

Part 2

Perform propensity score matching (with whatever settings you consider appropriate, duly justified). Compare with the results from Part 1. Argue whether the effect estimated through matching is reliable or not (show evidence to support your arguments).

load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")

# PROPENSITY SCORE MATCHING 

dataset <- rbind(psid_controls, nsw_dw)
dataset_final <- dataset %>% select("treat","age","education","black",
                                    "hispanic","married","nodegree",
                                    "re74","re75","re78")

# estimate the propensity score with a probit
ps_model <- glm(treat ~ age + education + black + hispanic + 
                  married + nodegree + re74 + re75,
                data = dataset_final, 
                family = binomial(link = "probit"))

summary(ps_model)
## 
## Call:
## glm(formula = treat ~ age + education + black + hispanic + married + 
##     nodegree + re74 + re75, family = binomial(link = "probit"), 
##     data = dataset_final)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.844e-01  4.908e-01  -0.987   0.3236    
## age         -1.915e-02  6.034e-03  -3.174   0.0015 ** 
## education   -4.147e-03  2.983e-02  -0.139   0.8894    
## black        8.138e-01  1.426e-01   5.706 1.16e-08 ***
## hispanic     5.016e-01  2.343e-01   2.140   0.0323 *  
## married     -5.082e-01  1.209e-01  -4.204 2.62e-05 ***
## nodegree    -6.370e-02  1.512e-01  -0.421   0.6735    
## re74        -2.917e-05  1.164e-05  -2.506   0.0122 *  
## re75        -6.796e-05  1.434e-05  -4.740 2.13e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1380.81  on 2934  degrees of freedom
## Residual deviance:  831.56  on 2926  degrees of freedom
## AIC: 849.56
## 
## Number of Fisher Scoring iterations: 9
# run the matching
set.seed(123)
match_psm <- matchit(
  treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
  data = dataset_final,
  method = "nearest",
  distance = "glm",
  link = "probit",
  ratio = 1,
  caliper = 0.1,  # tight caliper: 0.1 standard deviations of the propensity score
  replace = FALSE,
  estimand = "ATT"
)

# matching summary
cat("=== RESUMEN DEL MATCHING ===\n")
## === RESUMEN DEL MATCHING ===
summary(match_psm, standardize = TRUE)
## 
## Call:
## matchit(formula = treat ~ age + education + black + hispanic + 
##     married + nodegree + re74 + re75, data = dataset_final, method = "nearest", 
##     distance = "glm", link = "probit", estimand = "ATT", replace = FALSE, 
##     caliper = 0.1, ratio = 1)
## 
## Summary of Balance for All Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2971        0.0470          1.8947     1.5698    0.4394
## age             25.8162       33.9244         -1.1332     0.4587    0.2079
## education       10.3459       11.9251         -0.7854     0.4394    0.0973
## black            0.8432        0.3051          1.4802          .    0.5382
## hispanic         0.0595        0.0396          0.0838          .    0.0198
## married          0.1892        0.7989         -1.5568          .    0.6097
## nodegree         0.7081        0.3553          0.7761          .    0.3528
## re74          2095.5737    17791.0560         -3.2119     0.1247    0.4356
## re75          1532.0553    17380.7662         -4.9231     0.0530    0.4403
##           eCDF Max
## distance    0.7917
## age         0.3405
## education   0.3528
## black       0.5382
## hispanic    0.0198
## married     0.6097
## nodegree    0.3528
## re74        0.6603
## re75        0.7018
## 
## Summary of Balance for Matched Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2971        0.2964          0.0056     1.0046    0.0005
## age             25.8162       26.1838         -0.0514     0.7603    0.0175
## education       10.3459       10.3027          0.0215     1.0065    0.0134
## black            0.8432        0.9135         -0.1933          .    0.0703
## hispanic         0.0595        0.0486          0.0457          .    0.0108
## married          0.1892        0.2324         -0.1104          .    0.0432
## nodegree         0.7081        0.7297         -0.0476          .    0.0216
## re74          2095.5737     2232.4086         -0.0280     1.3697    0.0163
## re75          1532.0553     1821.2594         -0.0898     0.9051    0.0140
##           eCDF Max Std. Pair Dist.
## distance    0.0270          0.0083
## age         0.0486          0.8144
## education   0.0757          0.8442
## black       0.0703          0.5501
## hispanic    0.0108          0.4114
## married     0.0432          0.4416
## nodegree    0.0216          0.6658
## re74        0.0757          0.5030
## re75        0.0703          0.5615
## 
## Sample Sizes:
##           Control Treated
## All          2750     185
## Matched       185     185
## Unmatched    2565       0
## Discarded       0       0
# balance plot
plot(match_psm, type = "qq", interactive = FALSE)

# extract the matched data
matched_data <- match.data(match_psm)

# ATT on the matched data
att_psm <- lm_robust(re78 ~ treat, data = matched_data, weights = weights, se_type = "HC2")

summary(att_psm)
## 
## Call:
## lm_robust(formula = re78 ~ treat, data = matched_data, weights = weights, 
##     se_type = "HC2")
## 
## Weighted, Standard error type:  HC2 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
## (Intercept)   5383.4      444.2  12.118 1.135e-28   4509.8     6257 368
## treat          965.7      729.3   1.324 1.863e-01   -468.5     2400 368
## 
## Multiple R-squared:  0.004742 ,  Adjusted R-squared:  0.002037 
## F-statistic: 1.753 on 1 and 368 DF,  p-value: 0.1863
att_psm_robust <- lm_robust(re78 ~ treat + age + education + black + hispanic + 
                              married + nodegree + re74 + re75, 
                            data = matched_data, weights = weights, se_type = "HC2")
summary(att_psm_robust)
## 
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black + 
##     hispanic + married + nodegree + re74 + re75, data = matched_data, 
##     weights = weights, se_type = "HC2")
## 
## Weighted, Standard error type:  HC2 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)   CI Lower  CI Upper  DF
## (Intercept)  -115.3341  3695.3879 -0.03121  0.97512 -7.383e+03 7151.9252 360
## treat         937.5167   722.8381  1.29699  0.19546 -4.840e+02 2359.0323 360
## age            32.5081    42.9152  0.75750  0.44925 -5.189e+01  116.9042 360
## education     532.1903   205.9845  2.58364  0.01017  1.271e+02  937.2743 360
## black       -2220.1366  1338.2821 -1.65895  0.09800 -4.852e+03  411.6962 360
## hispanic    -1642.8126  2003.1524 -0.82011  0.41269 -5.582e+03 2296.5377 360
## married       742.2488   922.2915  0.80479  0.42147 -1.072e+03 2556.0047 360
## nodegree      303.2858  1090.4696  0.27812  0.78108 -1.841e+03 2447.7765 360
## re74            0.2623     0.1709  1.53507  0.12565 -7.373e-02    0.5983 360
## re75            0.1610     0.1850  0.87056  0.38458 -2.027e-01    0.5248 360
## 
## Multiple R-squared:  0.09826 ,   Adjusted R-squared:  0.07572 
## F-statistic:  3.65 on 9 and 360 DF,  p-value: 0.0002211
#balance
balance_stats <- summary(match_psm, standardize = TRUE)
print(balance_stats)
#love plot
love_plot <- love.plot(match_psm, 
                       stats = "mean.diffs",
                       stars = "raw",        
                       thresholds = 0.1,
                       drop.distance = TRUE,
                       title = "Balance de Covariables - PSM con dataset_final")
print(love_plot)

# RELIABILITY

# Post-matching sample size
cat("=== ANÁLISIS DE FIABILIDAD ===\n")
## === ANÁLISIS DE FIABILIDAD ===
cat("Tamaño de muestra original (dataset_final):", nrow(dataset_final), "\n")
## Tamaño de muestra original (dataset_final): 2935
cat("Tamaño de muestra matched:", nrow(matched_data), "\n")
## Tamaño de muestra matched: 370
cat("Pérdida de observaciones:", nrow(dataset_final) - nrow(matched_data), "\n")
## Pérdida de observaciones: 2565
cat("Porcentaje retenido:", round(nrow(matched_data)/nrow(dataset_final)*100, 1), "%\n")
## Porcentaje retenido: 12.6 %
# Common support: propensity scores used on the x-axis of the density plot
dataset_final$ps <- predict(ps_model, newdata = dataset_final, type = "response")
matched_data$ps <- matched_data$distance

ps_treat_pre <- dataset_final$ps[dataset_final$treat == 1]
ps_control_pre <- dataset_final$ps[dataset_final$treat == 0]
ps_treat_post <- matched_data$ps[matched_data$treat == 1]
ps_control_post <- matched_data$ps[matched_data$treat == 0]

cat("\n=== COMMON SUPPORT ===\n")
## 
## === COMMON SUPPORT ===
cat("Pre-matching - Tratados: [", round(min(ps_treat_pre), 6), ",", round(max(ps_treat_pre), 6), "]\n")
## Pre-matching - Tratados: [ 0.001119 , 0.474009 ]
cat("Pre-matching - Controles: [", round(min(ps_control_pre), 6), ",", round(max(ps_control_pre), 6), "]\n")
## Pre-matching - Controles: [ 0 , 0.466389 ]
cat("Post-matching - Tratados: [", round(min(ps_treat_post), 6), ",", round(max(ps_treat_post), 6), "]\n")
## Post-matching - Tratados: [ 0.001119 , 0.474009 ]
cat("Post-matching - Controles: [", round(min(ps_control_post), 6), ",", round(max(ps_control_post), 6), "]\n")
## Post-matching - Controles: [ 0.001116 , 0.466389 ]
# density plot
densidad1 <- ggplot(matched_data, aes(x = ps, fill = factor(treat))) +
  geom_density(alpha = 0.5) +
  labs(title = "Densidad de Propensity Score Post-Matching - dataset_final",
       subtitle = paste("Muestra:", nrow(matched_data), "observaciones"),
       x = "Propensity Score", y = "Densidad") +
  scale_fill_manual(values = c("blue", "red"), 
                    labels = c("Control", "Tratado"),
                    name = "Grupo")
print(densidad1)

# FINAL COMPARISON: OLS vs PSM


modelsummary(list("MCO Simple" = mco_simple,
                  "MCO con Controles" = mco_controles_1,
                  "MCO Propensity score simple" = att_psm,
                  "MCO Propensity score con controles" = att_psm_robust),
             title = "Comparación de Modelos MCO",
             fmt = 2
)
Comparación de Modelos MCO
(1) MCO Simple; (2) MCO con Controles; (3) MCO Propensity score simple; (4) MCO Propensity score con controles
              (1)         (2)         (3)         (4)
(Intercept)   21553.92    -120.22     5383.42     -115.33
              (311.73)    (1958.87)   (444.24)    (3695.39)
age                       -93.57                  32.51
                          (21.55)                 (42.92)
education                 594.87                  532.19
                          (127.62)                (205.98)
black                     -570.70                 -2220.14
                          (453.73)                (1338.28)
hispanic                  2502.69                 -1642.81
                          (1323.08)               (2003.15)
married                   1380.75                 742.25
                          (526.93)                (922.29)
nodegree                  768.54                  303.29
                          (677.01)                (1090.47)
re74                      0.29                    0.26
                          (0.06)                  (0.17)
re75                      0.57                    0.16
                          (0.07)                  (0.18)
treat                                 965.72      937.52
                                      (729.33)    (722.84)
Num.Obs.      2490        2490        370         370
R2            -0.000      0.572       0.005       0.098
R2 Adj.       -0.000      0.570       0.002       0.076
AIC           55137.1     53042.0     7607.3      7586.7
BIC           55148.7     53100.2     7619.0      7629.8
RMSE          15552.22    10178.65    6995.51     6658.74

The pre-matching balance summary shows very large standardized mean differences between the groups, reaching about 4.9 in absolute value (re75). After matching, these differences fall considerably: most are below 0.1 in absolute value, although black (-0.19) and married (-0.11) still exceed that threshold. Matching therefore makes treated and control units substantially more similar on observed covariates, which makes the comparison more credible than the OLS one, though balance is not perfect.
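To make the balance statistic concrete, the sketch below computes a standardized mean difference by hand. It assumes the ATT convention of dividing the difference in means by the standard deviation of the treated group; the helper smd_att and the toy data are hypothetical and only illustrate the calculation.

```r
# Standardized mean difference for the ATT: difference in means divided by
# the standard deviation of the treated group (the convention assumed here)
smd_att <- function(x, treat) {
  x_t <- x[treat == 1]
  x_c <- x[treat == 0]
  (mean(x_t) - mean(x_c)) / sd(x_t)
}

# Toy data (hypothetical values, not the PSet data)
toy <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  age   = c(22, 24, 26, 28, 30, 34, 36, 40)
)

smd_age <- smd_att(toy$age, toy$treat)
smd_age
abs(smd_age) > 0.1   # TRUE: flagged as imbalanced under a 0.1 threshold
```

The same 0.1 threshold is the one passed to love.plot() through thresholds = 0.1.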

In the regressions run on the matched sample there is no statistically significant evidence that the treatment affected 1978 earnings: the estimated coefficient on treat is 965.72 without covariates (p = 0.186) and 937.52 with covariates (p = 0.195), and the R² values are low. The only significant covariate is education, whose coefficient (β = 532.19; standard error = 205.98) indicates that each additional unit of education is associated with roughly 532 dollars more in annual earnings.

Compared with the OLS model of the previous part, propensity score matching yields a more credible estimate, since it reduces selection bias on observed covariates and strengthens the causal interpretation of the results.
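The mechanics behind the matching step can be sketched in base R. The example below is a simplified, hand-rolled version of 1:1 nearest-neighbor matching without replacement on a probit propensity score, with a caliper of 0.1 standard deviations of the score. It uses simulated data and processes treated units in data order, so it is an illustration under those assumptions and not a reproduction of what matchit() does internally.

```r
set.seed(42)

# Simulated data (hypothetical; not the PSet data)
n <- 400
x <- rnorm(n)
treat <- rbinom(n, 1, pnorm(-1 + 0.8 * x))
y <- 1000 * x + 500 * treat + rnorm(n, 0, 200)

# Step 1: probit propensity score
ps <- fitted(glm(treat ~ x, family = binomial(link = "probit")))

# Step 2: greedy 1:1 nearest-neighbor matching without replacement,
# with a caliper of 0.1 standard deviations of the propensity score
caliper <- 0.1 * sd(ps)
treated_idx <- which(treat == 1)
available <- which(treat == 0)
matched_pairs <- matrix(integer(0), ncol = 2)

for (i in treated_idx) {
  if (length(available) == 0) break
  d <- abs(ps[available] - ps[i])
  j <- which.min(d)
  if (d[j] <= caliper) {
    matched_pairs <- rbind(matched_pairs, c(i, available[j]))
    available <- available[-j]  # each control is used at most once
  }
}

# Step 3: ATT as the mean within-pair difference in outcomes
att_hat <- mean(y[matched_pairs[, 1]] - y[matched_pairs[, 2]])
nrow(matched_pairs)
att_hat
```

Treated units with no control inside the caliper are dropped, which is why a tight caliper trades sample size for closer matches.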

Part 3

Implement matching on characteristics (with whatever settings you consider appropriate, duly justified). Compare with the results from Parts 1 and 2. Argue whether the effect estimated with this method is reliable or not.

load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")

dataset_final_1 <- rbind(psid_controls, nsw_dw)

cov_vars <- c("age","education","black","hispanic","married","nodegree","re74","re75")

X <- as.matrix(dataset_final_1[, cov_vars, drop = FALSE])

dataset_final_1$maha_dist <- mahalanobis(X, center = colMeans(X), cov = cov(X))

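# caliper of 0.1 standard deviations on every covariate
# (for the binary covariates this amounts to exact matching)
cal_vec <- rep(0.1, length(cov_vars))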
names(cal_vec) <- cov_vars

set.seed(1234)
match_psm_ratio1 <- matchit(
  treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
  data = dataset_final_1,
  method = "nearest",
  distance = "mahalanobis", 
  ratio = 1,  
  caliper = cal_vec,
  replace = FALSE,
  estimand = "ATT"
)

matched_data_ratio1 <- match.data(match_psm_ratio1)

# matching on characteristics: simple regression
att_psm_ratio_simple <- lm_robust(re78 ~ treat, data = matched_data_ratio1, weights = weights, se_type = "HC2")
summary(att_psm_ratio_simple)
## 
## Call:
## lm_robust(formula = re78 ~ treat, data = matched_data_ratio1, 
##     weights = weights, se_type = "HC2")
## 
## Weighted, Standard error type:  HC2 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|) CI Lower CI Upper  DF
## (Intercept)     3573      502.6   7.109 2.730e-11   2581.0     4565 178
## treat           1754      770.9   2.276 2.405e-02    233.1     3275 178
## 
## Multiple R-squared:  0.02827 ,   Adjusted R-squared:  0.02281 
## F-statistic: 5.179 on 1 and 178 DF,  p-value: 0.02405
# matching on characteristics: regression with covariate adjustment
att_psm_ratio1 <- lm_robust(re78 ~ treat + age + education + black +
                              hispanic + married + nodegree + re74 + 
                              re75,
                            data = matched_data_ratio1, 
                            weights = weights, se_type = "HC2")
summary(att_psm_ratio1)
## 
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black + 
##     hispanic + married + nodegree + re74 + re75, data = matched_data_ratio1, 
##     weights = weights, se_type = "HC2")
## 
## Weighted, Standard error type:  HC2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   CI Lower   CI Upper  DF
## (Intercept)  1496.0489  4455.5309  0.3358  0.73746 -7299.2435 10291.3413 170
## treat        1750.1225   749.5885  2.3348  0.02072   270.4223  3229.8227 170
## age            85.6350    91.0352  0.9407  0.34820   -94.0700   265.3400 170
## education     306.4229   287.6438  1.0653  0.28826  -261.3908   874.2365 170
## black       -3723.3511  1666.7602 -2.2339  0.02679 -7013.5635  -433.1387 170
## hispanic     -491.9267  2398.5270 -0.2051  0.83774 -5226.6592  4242.8057 170
## married     -1018.3573  1397.1693 -0.7289  0.46708 -3776.3928  1739.6782 170
## nodegree      155.7743  1339.1010  0.1163  0.90753 -2487.6335  2799.1821 170
## re74           -0.6583     0.4283 -1.5369  0.12617    -1.5038     0.1872 170
## re75            1.3099     0.5326  2.4594  0.01492     0.2585     2.3613 170
## 
## Multiple R-squared:  0.1186 ,    Adjusted R-squared:  0.07193 
## F-statistic: 3.332 on 9 and 170 DF,  p-value: 0.0008904
cat("=== RESUMEN DEL MATCHING ===\n")
## === RESUMEN DEL MATCHING ===
summary(match_psm_ratio1, standardize = TRUE)
## 
## Call:
## matchit(formula = treat ~ age + education + black + hispanic + 
##     married + nodegree + re74 + re75, data = dataset_final_1, 
##     method = "nearest", distance = "mahalanobis", estimand = "ATT", 
##     replace = FALSE, caliper = cal_vec, ratio = 1)
## 
## Summary of Balance for All Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## age             25.8162       33.9244         -1.1332     0.4587    0.2079
## education       10.3459       11.9251         -0.7854     0.4394    0.0973
## black            0.8432        0.3051          1.4802          .    0.5382
## hispanic         0.0595        0.0396          0.0838          .    0.0198
## married          0.1892        0.7989         -1.5568          .    0.6097
## nodegree         0.7081        0.3553          0.7761          .    0.3528
## re74          2095.5737    17791.0560         -3.2119     0.1247    0.4356
## re75          1532.0553    17380.7662         -4.9231     0.0530    0.4403
##           eCDF Max
## age         0.3405
## education   0.3528
## black       0.5382
## hispanic    0.0198
## married     0.6097
## nodegree    0.3528
## re74        0.6603
## re75        0.7018
## 
## Summary of Balance for Matched Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## age             23.3444       23.2556          0.0124     1.0120    0.0046
## education       10.3667       10.3667          0.0000     1.0000    0.0000
## black            0.9111        0.9111          0.0000          .    0.0000
## hispanic         0.0222        0.0222          0.0000          .    0.0000
## married          0.0778        0.0778          0.0000          .    0.0000
## nodegree         0.7333        0.7333          0.0000          .    0.0000
## re74           538.7169      600.4506         -0.0126     0.8733    0.0029
## re75           479.5671      513.2268         -0.0105     1.0180    0.0035
##           eCDF Max Std. Pair Dist.
## age         0.0222          0.0466
## education   0.0000          0.0000
## black       0.0000          0.0000
## hispanic    0.0000          0.0000
## married     0.0000          0.0000
## nodegree    0.0000          0.0000
## re74        0.0333          0.0255
## re75        0.0333          0.0488
## 
## Sample Sizes:
##           Control Treated
## All          2750     185
## Matched        90      90
## Unmatched    2660      95
## Discarded       0       0
# balance plot
plot(match_psm_ratio1, type = "qq", interactive = FALSE)

#love plot
love_plot_1 <- love.plot(match_psm_ratio1,
                         stats = "mean.diffs",
                         stars = "raw",        
                         thresholds = 0.1,
                         drop.distance = TRUE,
                       title = "Balance de Covariables - PSM con dataset_final_1\n match en caracteriticas")
print(love_plot_1)

# matching reliability


X_matched <- as.matrix(matched_data_ratio1[, cov_vars, drop = FALSE])
center_init <- colMeans(X)
cov_init <- cov(X)
matched_data_ratio1$maha_dist <- mahalanobis(X_matched, center = center_init, cov = cov_init)


# Pre-matching: use the maha_dist column created above
ps_treat_pre_1   <- dataset_final_1$maha_dist[dataset_final_1$treat == 1]
ps_control_pre_1 <- dataset_final_1$maha_dist[dataset_final_1$treat == 0]

# Post-matching: use the maha_dist recomputed for the matched rows
# (relative to the full-sample center and covariance)
ps_treat_post_1  <- matched_data_ratio1$maha_dist[matched_data_ratio1$treat == 1]
ps_control_post_1<- matched_data_ratio1$maha_dist[matched_data_ratio1$treat == 0]

cat("\n=== COMMON SUPPORT (Mahalanobis) ===\n")
## 
## === COMMON SUPPORT (Mahalanobis) ===
cat("Pre-matching - Tratados: [", round(min(ps_treat_pre_1), 6), ",", round(max(ps_treat_pre_1), 6), "]\n")
## Pre-matching - Tratados: [ 4.2152 , 30.04799 ]
cat("Pre-matching - Controles: [", round(min(ps_control_pre_1), 6), ",", round(max(ps_control_pre_1), 6), "]\n")
## Pre-matching - Controles: [ 0.988205 , 133.555 ]
cat("Post-matching - Tratados: [", round(min(ps_treat_post_1), 6), ",", round(max(ps_treat_post_1), 6), "]\n")
## Post-matching - Tratados: [ 4.481577 , 28.2091 ]
cat("Post-matching - Controles: [", round(min(ps_control_post_1), 6), ",", round(max(ps_control_post_1), 6), "]\n")
## Post-matching - Controles: [ 4.481577 , 28.22515 ]
# density plot (maha_dist for the matched rows was already computed above)

ggplot(matched_data_ratio1, aes(x = maha_dist, fill = factor(treat))) +
  geom_density(alpha = 0.5) +
  labs(title = "Densidad de Mahalanobis Post-Matching - dataset_final_1",
       subtitle = paste("Muestra:", nrow(matched_data_ratio1), "observaciones"),
       x = "Mahalanobis distance", y = "Densidad") +
  scale_fill_manual(labels = c("Control", "Tratado"), 
                    values = c("blue", "red"), 
                    name = "Grupo")

# FINAL COMPARISON: OLS vs PSM vs matching on characteristics

modelsummary(list("MCO Simple" = mco_simple,
                  "MCO con Controles" = mco_controles_1,
                  "MCO Propensity score simple" = att_psm,
                  "MCO Propensity score con controles" = att_psm_robust,
                  "MCO Propensity carteristicas simple" = att_psm_ratio_simple,
                  "MCO Propensity caracteristicas con controles" = att_psm_ratio1
                  ),
             title = "Comparación de Modelos MCO",
             fmt = 2
)
Comparación de Modelos MCO
(1) MCO Simple; (2) MCO con Controles; (3) MCO Propensity score simple; (4) MCO Propensity score con controles; (5) MCO Propensity carteristicas simple; (6) MCO Propensity caracteristicas con controles
              (1)         (2)         (3)         (4)         (5)         (6)
(Intercept)   21553.92    -120.22     5383.42     -115.33     3572.84     1496.05
              (311.73)    (1958.87)   (444.24)    (3695.39)   (502.59)    (4455.53)
age                       -93.57                  32.51                   85.64
                          (21.55)                 (42.92)                 (91.04)
education                 594.87                  532.19                  306.42
                          (127.62)                (205.98)                (287.64)
black                     -570.70                 -2220.14                -3723.35
                          (453.73)                (1338.28)               (1666.76)
hispanic                  2502.69                 -1642.81                -491.93
                          (1323.08)               (2003.15)               (2398.53)
married                   1380.75                 742.25                  -1018.36
                          (526.93)                (922.29)                (1397.17)
nodegree                  768.54                  303.29                  155.77
                          (677.01)                (1090.47)               (1339.10)
re74                      0.29                    0.26                    -0.66
                          (0.06)                  (0.17)                  (0.43)
re75                      0.57                    0.16                    1.31
                          (0.07)                  (0.18)                  (0.53)
treat                                 965.72      937.52      1754.28     1750.12
                                      (729.33)    (722.84)    (770.85)    (749.59)
Num.Obs.      2490        2490        370         370         180         180
R2            -0.000      0.572       0.005       0.098       0.028       0.119
R2 Adj.       -0.000      0.570       0.002       0.076       0.023       0.072
AIC           55137.1     53042.0     7607.3      7586.7      3593.1      3591.5
BIC           55148.7     53100.2     7619.0      7629.8      3602.7      3626.7
RMSE          15552.22    10178.65    6995.51     6658.74     5142.24     4897.45

With matching on characteristics, the treated sample shrinks sharply: of the 185 treated units retained under PSM, only 90 are matched here. On this matched sample the estimated Average Treatment Effect on the Treated (ATT) is about $1,754 without covariate adjustment and $1,750 with adjustment, and both are significant at the 5% level (p ≈ 0.02). These estimates are considerably larger than the PSM ones (about $966 and $938, neither of them significant). The Mahalanobis estimate barely changes when covariates are added, which is consistent with the near-exact post-matching balance and supports its internal reliability. However, because 95 treated units are discarded, the estimate applies only to the matched subset of the treated. This raises questions about external validity and generalization, making the Mahalanobis estimator well balanced but sensitive to the sample restriction.
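As a small illustration of the distance behind this method, the sketch below computes squared Mahalanobis distances from one treated unit to several candidate controls and picks the closest one. Note that this is a pairwise distance between units, whereas the maha_dist column built above measures each unit's distance to the full-sample centroid. The matrix X_toy and its values are hypothetical.

```r
# Toy covariates (hypothetical values, not the PSet data)
X_toy <- rbind(
  t1 = c(age = 25, re75 = 1500),
  c1 = c(age = 40, re75 = 18000),
  c2 = c(age = 26, re75 = 1800),
  c3 = c(age = 24, re75 = 9000)
)
S <- cov(X_toy)  # covariance matrix used to scale the distance

# Squared Mahalanobis distance from each control to the treated unit t1:
# mahalanobis(x, center, cov) returns (x - center)' S^{-1} (x - center)
d_to_t1 <- mahalanobis(X_toy[c("c1", "c2", "c3"), ],
                       center = X_toy["t1", ], cov = S)

d_to_t1
names(which.min(d_to_t1))  # "c2": the closest control to t1
```

Scaling by the covariance matrix is what lets variables measured in very different units, such as age and earnings, enter the same distance.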

Part 4

Implement matching using as potential controls all of those available to you. Compare with the results from Parts 1, 2, and 3. Argue whether the estimated effect is reliable or not.

load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")

dataset11 <- rbind(psid_controls, nsw_dw, psid_controls2)


# estimate the propensity score with a probit
ps_model_11 <- glm(treat ~ age + education + black + hispanic + 
                  married + nodegree + re74 + re75,
                data = dataset11, 
                family = binomial(link = "probit"))

summary(ps_model_11)
## 
## Call:
## glm(formula = treat ~ age + education + black + hispanic + married + 
##     nodegree + re74 + re75, family = binomial(link = "probit"), 
##     data = dataset11)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -5.898e-01  4.801e-01  -1.228 0.219326    
## age         -2.177e-02  5.822e-03  -3.739 0.000185 ***
## education    3.404e-03  2.897e-02   0.117 0.906469    
## black        8.027e-01  1.396e-01   5.751 8.90e-09 ***
## hispanic     5.006e-01  2.279e-01   2.196 0.028070 *  
## married     -5.146e-01  1.170e-01  -4.397 1.10e-05 ***
## nodegree    -1.119e-02  1.459e-01  -0.077 0.938876    
## re74        -3.143e-05  1.137e-05  -2.765 0.005689 ** 
## re75        -6.386e-05  1.413e-05  -4.521 6.15e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1412.36  on 3187  degrees of freedom
## Residual deviance:  868.21  on 3179  degrees of freedom
## AIC: 886.21
## 
## Number of Fisher Scoring iterations: 9
# run the matching
set.seed(2312)
match_psm_11 <- matchit(
  treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
  data = dataset11,
  method = "nearest",
  distance = "glm",
  link = "probit",
  ratio = 1,
  caliper = 0.1,  # tight caliper: 0.1 standard deviations of the propensity score
  replace = FALSE,
  estimand = "ATT"
)

# matching summary
cat("=== MATCHING SUMMARY ===\n")
## === MATCHING SUMMARY ===
summary(match_psm_11, standardize = TRUE)
## 
## Call:
## matchit(formula = treat ~ age + education + black + hispanic + 
##     married + nodegree + re74 + re75, data = dataset11, method = "nearest", 
##     distance = "glm", link = "probit", estimand = "ATT", replace = FALSE, 
##     caliper = 0.1, ratio = 1)
## 
## Summary of Balance for All Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2768        0.0444          1.8154     1.6880    0.4249
## age             25.8162       34.1072         -1.1588     0.4459    0.2126
## education       10.3459       11.8275         -0.7369     0.4312    0.0932
## black            0.8432        0.3124          1.4602          .    0.5309
## hispanic         0.0595        0.0420          0.0740          .    0.0175
## married          0.1892        0.7935         -1.5431          .    0.6044
## nodegree         0.7081        0.3663          0.7518          .    0.3418
## re74          2095.5737    17221.2160         -3.0953     0.1265    0.4215
## re75          1532.0553    16554.1526         -4.6663     0.0536    0.4196
##           eCDF Max
## distance    0.7801
## age         0.3434
## education   0.3418
## black       0.5309
## hispanic    0.0175
## married     0.6044
## nodegree    0.3418
## re74        0.6461
## re75        0.6724
## 
## Summary of Balance for Matched Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2768        0.2763          0.0038     1.0042    0.0005
## age             25.8162       25.4000          0.0582     0.8315    0.0209
## education       10.3459       10.3514         -0.0027     1.0387    0.0156
## black            0.8432        0.8595         -0.0446          .    0.0162
## hispanic         0.0595        0.0541          0.0229          .    0.0054
## married          0.1892        0.1730          0.0414          .    0.0162
## nodegree         0.7081        0.7568         -0.1070          .    0.0486
## re74          2095.5737     2243.0648         -0.0302     1.4801    0.0198
## re75          1532.0553     1906.1239         -0.1162     0.7098    0.0180
##           eCDF Max Std. Pair Dist.
## distance    0.0270          0.0076
## age         0.0811          0.7049
## education   0.0595          0.9006
## black       0.0162          0.5204
## hispanic    0.0054          0.4800
## married     0.0162          0.4278
## nodegree    0.0486          0.6777
## re74        0.0865          0.4620
## re75        0.0649          0.5172
## 
## Sample Sizes:
##           Control Treated
## All          3003     185
## Matched       185     185
## Unmatched    2818       0
## Discarded       0       0
# balance plot
plot(match_psm_11, type = "qq", interactive = FALSE)

# extract the matched data
matched_data_11 <- match.data(match_psm_11)

# ATT on the matched data: the data argument must be matched_data_11
# (this match), not matched_data from the earlier PSM
att_psm_11 <- lm_robust(re78 ~ treat, data = matched_data_11, weights = weights, se_type = "HC2")

summary(att_psm_11)
# treat = 1091.58 (SE 702.11), as reported for att_psm_11 in the comparison table
att_psm_robust_11 <- lm_robust(re78 ~ treat + age + education + black + hispanic + 
                              married + nodegree + re74 + re75, 
                            data = matched_data_11, weights = weights, se_type = "HC2")

summary(att_psm_robust_11)
## 
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black + 
##     hispanic + married + nodegree + re74 + re75, data = matched_data_11, 
##     weights = weights, se_type = "HC2")
## 
## Weighted, Standard error type:  HC2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   CI Lower  CI Upper  DF
## (Intercept) -1.715e+03   3573.145 -0.4801  0.63147 -8742.2278 5311.4915 360
## treat        1.197e+03    696.118  1.7189  0.08649  -172.3849 2565.5517 360
## age          6.753e+01     41.768  1.6168  0.10680   -14.6103  149.6702 360
## education    5.077e+02    210.427  2.4125  0.01634    93.8425  921.4829 360
## black       -1.471e+03   1097.848 -1.3396  0.18123 -3629.6391  688.3613 360
## hispanic     2.985e+02   1688.571  0.1768  0.85976 -3022.1611 3619.2429 360
## married      4.998e+02    962.370  0.5193  0.60384 -1392.7689 2392.3771 360
## nodegree     4.738e+02   1072.897  0.4416  0.65903 -1636.1179 2583.7480 360
## re74         5.873e-02      0.183  0.3209  0.74848    -0.3012    0.4187 360
## re75         3.535e-01      0.158  2.2368  0.02591     0.0427    0.6642 360
## 
## Multiple R-squared:  0.08947 ,   Adjusted R-squared:  0.06671 
## F-statistic: 3.791 on 9 and 360 DF,  p-value: 0.0001385
#love plot
love_plot_11 <- love.plot(match_psm_11, 
                       stats = "mean.diffs",
                       stars = "raw",        
                       thresholds = 0.1,
                       drop.distance = TRUE,
                       title = "Covariate Balance - PSM with dataset11")
print(love_plot_11)

# RELIABILITY


# Post-matching sample size
cat("=== RELIABILITY ANALYSIS ===\n")
## === RELIABILITY ANALYSIS ===
cat("Original sample size (dataset11):", nrow(dataset11), "\n")
## Original sample size (dataset11): 3188
cat("Matched sample size:", nrow(matched_data_11), "\n")
## Matched sample size: 370
cat("Observations dropped:", nrow(dataset11) - nrow(matched_data_11), "\n")
## Observations dropped: 2818
cat("Percent retained:", round(nrow(matched_data_11)/nrow(dataset11)*100, 1), "%\n")
## Percent retained: 11.6 %
# Common support: propensity scores used for the x-axis of the density plot
dataset11$ps <- predict(ps_model_11, newdata = dataset11, type = "response")
matched_data_11$ps <- matched_data_11$distance

ps_treat_pre_11 <- dataset11$ps[dataset11$treat == 1]
ps_control_pre_11 <- dataset11$ps[dataset11$treat == 0]
ps_treat_post_11 <- matched_data_11$ps[matched_data_11$treat == 1]
ps_control_post_11 <- matched_data_11$ps[matched_data_11$treat == 0]

cat("\n=== COMMON SUPPORT ===\n")
## 
## === COMMON SUPPORT ===
cat("Pre-matching - Treated: [", round(min(ps_treat_pre_11), 6), ",", round(max(ps_treat_pre_11), 6), "]\n")
## Pre-matching - Treated: [ 0.001106 , 0.446575 ]
cat("Pre-matching - Controls: [", round(min(ps_control_pre_11), 6), ",", round(max(ps_control_pre_11), 6), "]\n")
## Pre-matching - Controls: [ 0 , 0.447921 ]
cat("Post-matching - Treated: [", round(min(ps_treat_post_11), 6), ",", round(max(ps_treat_post_11), 6), "]\n")
## Post-matching - Treated: [ 0.001106 , 0.446575 ]
cat("Post-matching - Controls: [", round(min(ps_control_post_11), 6), ",", round(max(ps_control_post_11), 6), "]\n")
## Post-matching - Controls: [ 0.001106 , 0.447921 ]
# density plot
densidad2 <- ggplot(matched_data_11, aes(x = ps, fill = factor(treat))) +
  geom_density(alpha = 0.5) +
  labs(title = "Post-Matching Propensity Score Density - dataset11",
       subtitle = paste("Sample:", nrow(matched_data_11), "observations"),
       x = "Propensity Score", y = "Density") +
  scale_fill_manual(values = c("blue", "red"), 
                    labels = c("Control", "Treated"),
                    name = "Group")
print(densidad2)

# FINAL COMPARISON: OLS vs PSM vs matching on characteristics


modelsummary(list("OLS simple" = mco_simple,
                  "OLS with controls" = mco_controles_1,
                  "OLS propensity score simple" = att_psm,
                  "OLS propensity score with controls" = att_psm_robust,
                  "OLS characteristics matching simple" = att_psm_ratio_simple,
                  "OLS characteristics matching with controls" = att_psm_ratio1,
                  "OLS propensity score simple 11" = att_psm_11,
                  "OLS propensity score with controls 11" = att_psm_robust_11
),
title = "Comparison of OLS Models",
fmt = 2
)
Comparison of OLS Models
Columns: (1) OLS simple; (2) OLS with controls; (3) OLS propensity score simple; (4) OLS propensity score with controls; (5) OLS characteristics matching simple; (6) OLS characteristics matching with controls; (7) OLS propensity score simple 11; (8) OLS propensity score with controls 11. Standard errors in parentheses.

                    (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)
(Intercept)    21553.92    -120.22    5383.42    -115.33    3572.84    1496.05    5257.56   -1715.37
               (311.73)  (1958.87)   (444.24)  (3695.39)   (502.59)  (4455.53)   (397.97)  (3573.15)
age                         -93.57                 32.51                 85.64                 67.53
                           (21.55)               (42.92)               (91.04)               (41.77)
education                   594.87                532.19                306.42                507.66
                          (127.62)              (205.98)              (287.64)              (210.43)
black                      -570.70              -2220.14              -3723.35              -1470.64
                          (453.73)             (1338.28)             (1666.76)             (1097.85)
hispanic                   2502.69              -1642.81               -491.93                298.54
                         (1323.08)             (2003.15)             (2398.53)             (1688.57)
married                    1380.75                742.25              -1018.36                499.80
                          (526.93)              (922.29)             (1397.17)              (962.37)
nodegree                    768.54                303.29                155.77                473.82
                          (677.01)             (1090.47)             (1339.10)             (1072.90)
re74                          0.29                  0.26                 -0.66                  0.06
                            (0.06)                (0.17)                (0.43)                (0.18)
re75                          0.57                  0.16                  1.31                  0.35
                            (0.07)                (0.18)                (0.53)                (0.16)
treat                                  965.72     937.52    1754.28    1750.12    1091.58    1196.58
                                     (729.33)   (722.84)   (770.85)   (749.59)   (702.11)   (696.12)
Num.Obs.           2490       2490        370        370        180        180        370        370
R2               -0.000      0.572      0.005      0.098      0.028      0.119      0.007      0.089
R2 Adj.          -0.000      0.570      0.002      0.076      0.023      0.072      0.004      0.067
AIC             55137.1    53042.0     7607.3     7586.7     3593.1     3591.5     7579.1     7562.8
BIC             55148.7    53100.2     7619.0     7629.8     3602.7     3626.7     7590.8     7605.9
RMSE           15552.22   10178.65    6995.51    6658.74    5142.24    4897.45    6734.37    6447.10
summary(match_psm)
## 
## Call:
## matchit(formula = treat ~ age + education + black + hispanic + 
##     married + nodegree + re74 + re75, data = dataset_final, method = "nearest", 
##     distance = "glm", link = "probit", estimand = "ATT", replace = FALSE, 
##     caliper = 0.1, ratio = 1)
## 
## Summary of Balance for All Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2971        0.0470          1.8947     1.5698    0.4394
## age             25.8162       33.9244         -1.1332     0.4587    0.2079
## education       10.3459       11.9251         -0.7854     0.4394    0.0973
## black            0.8432        0.3051          1.4802          .    0.5382
## hispanic         0.0595        0.0396          0.0838          .    0.0198
## married          0.1892        0.7989         -1.5568          .    0.6097
## nodegree         0.7081        0.3553          0.7761          .    0.3528
## re74          2095.5737    17791.0560         -3.2119     0.1247    0.4356
## re75          1532.0553    17380.7662         -4.9231     0.0530    0.4403
##           eCDF Max
## distance    0.7917
## age         0.3405
## education   0.3528
## black       0.5382
## hispanic    0.0198
## married     0.6097
## nodegree    0.3528
## re74        0.6603
## re75        0.7018
## 
## Summary of Balance for Matched Data:
##           Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance         0.2971        0.2964          0.0056     1.0046    0.0005
## age             25.8162       26.1838         -0.0514     0.7603    0.0175
## education       10.3459       10.3027          0.0215     1.0065    0.0134
## black            0.8432        0.9135         -0.1933          .    0.0703
## hispanic         0.0595        0.0486          0.0457          .    0.0108
## married          0.1892        0.2324         -0.1104          .    0.0432
## nodegree         0.7081        0.7297         -0.0476          .    0.0216
## re74          2095.5737     2232.4086         -0.0280     1.3697    0.0163
## re75          1532.0553     1821.2594         -0.0898     0.9051    0.0140
##           eCDF Max Std. Pair Dist.
## distance    0.0270          0.0083
## age         0.0486          0.8144
## education   0.0757          0.8442
## black       0.0703          0.5501
## hispanic    0.0108          0.4114
## married     0.0432          0.4416
## nodegree    0.0216          0.6658
## re74        0.0757          0.5030
## re75        0.0703          0.5615
## 
## Sample Sizes:
##           Control Treated
## All          2750     185
## Matched       185     185
## Unmatched    2565       0
## Discarded       0       0

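The matching worked as intended: all 185 treated units found a match, and balance improved sharply, with only nodegree (-0.107) and re75 (-0.116) left marginally above the 0.1 threshold. However, the treatment coefficient is not significant at the 5% level in either specification: 1,091.58 (SE 702.11) in the simple regression and 1,196.58 (SE 696.12, p = 0.086) with controls. In the regression with controls, only education and re75 are significant at the 5% level.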
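One way to check how well the matching balanced the covariates is to flag any covariate whose standardized mean difference still exceeds 0.1 in absolute value. The sketch below is only illustrative: the values are copied by hand from the "Summary of Balance for Matched Data" output of summary(match_psm_11), and the names smd_matched, threshold and flagged are hypothetical.

```r
# Standardized mean differences after matching, copied from the
# "Summary of Balance for Matched Data" output (not recomputed here).
smd_matched <- c(
  age = 0.0582, education = -0.0027, black = -0.0446, hispanic = 0.0229,
  married = 0.0414, nodegree = -0.1070, re74 = -0.0302, re75 = -0.1162
)

# 0.1 is the same threshold passed to love.plot(); it is a rule of thumb,
# not a formal test.
threshold <- 0.1

# Covariates whose absolute SMD is still above the threshold
flagged <- names(smd_matched)[abs(smd_matched) > threshold]
print(flagged)
```

Under these values, nodegree and re75 are the only covariates flagged, and both exceed the threshold by a small margin.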

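Compared with the other models, I do not recommend relying on this estimate. It is useful as a preliminary result and for spotting a pattern, since every matched estimate of the treatment effect is positive, but the coefficient is not significant at the 5% level with or without controls (it reaches only the 10% level once the covariates are included). For that reason the propensity score estimate is not recommended as the main result.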
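The significance comparison across models can be reproduced from the comparison table alone. The following is a minimal sketch, assuming the treat estimates and standard errors printed in that table and a normal approximation for the p-values (lm_robust itself uses a t distribution, so its p-values differ slightly). The vector names are hypothetical labels, not objects from the analysis.

```r
# treat estimates and HC2 standard errors copied from the comparison table
treat_est <- c(psm_simple = 965.72, psm_controls = 937.52,
               char_simple = 1754.28, char_controls = 1750.12,
               psm11_simple = 1091.58, psm11_controls = 1196.58)
treat_se <- c(729.33, 722.84, 770.85, 749.59, 702.11, 696.12)

# t statistic and two-sided p-value (normal approximation, an assumption)
t_stat <- treat_est / treat_se
p_approx <- 2 * pnorm(-abs(t_stat))

# TRUE where the effect is significant at the 5% level
significant_5pct <- p_approx < 0.05

print(round(cbind(estimate = treat_est, se = treat_se,
                  t = t_stat, p = p_approx), 3))
```

With these inputs, only the two characteristics-matching models clear the 5% threshold; the propensity score models do not.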

Part 5

Provide a brief discussion of the results of your estimates.

As noted in Part 4, it is best not to rely on the propensity score estimates, because their treatment coefficients are not significant at the 5% level.

By contrast, matching on characteristics is more robust: the estimates are significant, and the simple and controlled treatment coefficients are closer to each other, although the treated sample falls from 185 to 90. That match used only two of the data sets, so I strongly recommend adding the third one to improve the results.

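We can therefore conclude that the treatment increased earnings by roughly $270 to $3,230 (approximately the 95% confidence interval of the characteristics-matching estimate with controls), with the most reliable point estimate around $1,750 for the subpopulation that found a close match.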
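The $270 to $3,230 range appears to correspond to a 95% confidence interval around the characteristics-matching estimate with controls. A minimal sketch of that arithmetic, assuming the estimate and standard error from the comparison table (1750.12 and 749.59) and 180 - 10 = 170 residual degrees of freedom; the degrees of freedom are an assumption, since that model's summary is not printed in this section.

```r
# Estimate and HC2 standard error of treat, copied from the comparison table
est <- 1750.12
se <- 749.59

# Assumed residual degrees of freedom: 180 observations minus 10 coefficients
n_obs <- 180
n_coef <- 10
df <- n_obs - n_coef

# Two-sided 95% confidence interval based on the t distribution
ci <- est + c(-1, 1) * qt(0.975, df) * se
print(round(ci))
```

Both bounds land within a few dollars of the range quoted in the text, which suggests the range is this interval rather than a spread across models.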