This PSet is practical in nature: the goal is for you to produce Matching-related estimates and interpret the results. You may use the code snippets provided in Matching.R to complete it.
# PACKAGE INSTALLATION (commented out because the packages are already installed)
# install.packages(c("haven", "dplyr", "xtable", "stargazer",
# "MatchIt", "margins", "estimatr", "cobalt",
# "ggplot2", "modelsummary"))
# LOAD PACKAGES
library(haven)
library(dplyr)
library(xtable)
library(stargazer)
library(MatchIt)
library(margins)
library(estimatr)
library(cobalt)
library(ggplot2)
library(modelsummary)
# Locale
Sys.setlocale("LC_ALL", "en_US.UTF-8")
## [1] "LC_COLLATE=en_US.UTF-8;LC_CTYPE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8"
load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")
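# NOTE: data_mco contains only the PSID controls (treat == 0 in every row), so the
# coefficient on treat in the regressions below is not identified and is reported as NA.
# Stacking the NSW sample, e.g. rbind(psid_controls, nsw_dw), is needed to estimate it.
data_mco <- psid_controls
# Tell R that treat is a dummy variable
data_mco$treat_factor <- as.factor(data_mco$treat)
# Simple OLS estimate (no controls)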
mco_simple <- lm_robust(re78 ~ treat,
data = data_mco, se_type = "HC2")
summary(mco_simple)
##
## Call:
## lm_robust(formula = re78 ~ treat, data = data_mco, se_type = "HC2")
##
## Standard error type: HC2
##
## Coefficients: (1 not defined because the design matrix is rank deficient)
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) 21554 311.7 69.14 0 20943 22165 2489
## treat NA NA NA NA NA NA NA
##
## Multiple R-squared: -1.332e-15 , Adjusted R-squared: -1.332e-15
# OLS estimate with controls
mco_controles_1 <- lm_robust(re78 ~ treat + age + education + black + hispanic + married + nodegree + re74 + re75,
data = data_mco, se_type = "HC2")
summary(mco_controles_1)
##
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black +
## hispanic + married + nodegree + re74 + re75, data = data_mco,
## se_type = "HC2")
##
## Standard error type: HC2
##
## Coefficients: (1 not defined because the design matrix is rank deficient)
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) -120.2203 1.959e+03 -0.06137 9.511e-01 -3961.4153 3720.9748 2481
## treat NA NA NA NA NA NA NA
## age -93.5690 2.155e+01 -4.34290 1.463e-05 -135.8176 -51.3204 2481
## education 594.8656 1.276e+02 4.66112 3.311e-06 344.6074 845.1238 2481
## black -570.6953 4.537e+02 -1.25778 2.086e-01 -1460.4315 319.0409 2481
## hispanic 2502.6857 1.323e+03 1.89157 5.867e-02 -91.7623 5097.1337 2481
## married 1380.7505 5.269e+02 2.62036 8.837e-03 347.4779 2414.0231 2481
## nodegree 768.5418 6.770e+02 1.13520 2.564e-01 -559.0188 2096.1023 2481
## re74 0.2852 6.347e-02 4.49243 7.364e-06 0.1607 0.4096 2481
## re75 0.5675 6.813e-02 8.32886 1.332e-16 0.4339 0.7011 2481
##
## Multiple R-squared: 0.5717 , Adjusted R-squared: 0.5703
## F-statistic: NA on 8 and 2481 DF, p-value: NA
# What was done here: estimate OLS on the sample.
# Since the program ran in 1975, the idea is to check whether it has a causal effect on later earnings.
modelsummary(list("Simple model" = mco_simple,
                  "Model with controls" = mco_controles_1),
             title = "Comparison of OLS Models",
             fmt = 2
)
| | Simple model | Model with controls |
|---|---|---|
| (Intercept) | 21553.92 | -120.22 |
| | (311.73) | (1958.87) |
| age | | -93.57 |
| | | (21.55) |
| education | | 594.87 |
| | | (127.62) |
| black | | -570.70 |
| | | (453.73) |
| hispanic | | 2502.69 |
| | | (1323.08) |
| married | | 1380.75 |
| | | (526.93) |
| nodegree | | 768.54 |
| | | (677.01) |
| re74 | | 0.29 |
| | | (0.06) |
| re75 | | 0.57 |
| | | (0.07) |
| Num.Obs. | 2490 | 2490 |
| R2 | -0.000 | 0.572 |
| R2 Adj. | -0.000 | 0.570 |
| AIC | 55137.1 | 53042.0 |
| BIC | 55148.7 | 53100.2 |
| RMSE | 15552.22 | 10178.65 |
The OLS estimate of the treatment effect is not reliable. In the regressions above the coefficient on treat is NA, because the estimation sample contains only PSID controls and treat does not vary. Even once the NSW treated are stacked with the PSID controls, the two groups differ markedly in their characteristics.
This also introduces selection bias, which prevents a precise identification of the program's causal effect on 1978 earnings.
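The NA on treat in the two regressions above comes from the sample rather than from the estimator: a regressor that takes a single value is collinear with the intercept. The following base-R sketch uses simulated data (all names and numbers are hypothetical, not the PSID or NSW files) to reproduce the symptom and to show that stacking treated and control rows makes the coefficient estimable.

```r
# Simulated illustration; none of these numbers come from the PSID or NSW data.
set.seed(1)
controls_only <- data.frame(treat = rep(0, 50), y = rnorm(50, mean = 20))
treated_only  <- data.frame(treat = rep(1, 20), y = rnorm(20, mean = 6))

# With controls only, treat is constant and collinear with the intercept.
fit_controls <- lm(y ~ treat, data = controls_only)
coef(fit_controls)   # the treat coefficient is NA

# Stacking both groups makes the coefficient estimable.
stacked     <- rbind(controls_only, treated_only)
fit_stacked <- lm(y ~ treat, data = stacked)
coef(fit_stacked)    # treat equals the raw difference in group means
```

Even with both groups stacked, the coefficient is only a raw gap between two different populations, which is the problem the matching steps try to address.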
load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")
# PROPENSITY SCORE MATCHING
dataset <- rbind(psid_controls, nsw_dw)
dataset_final <- dataset %>% select("treat","age","education","black",
"hispanic","married","nodegree",
"re74","re75","re78")
# Estimate the propensity score with a probit
ps_model <- glm(treat ~ age + education + black + hispanic +
married + nodegree + re74 + re75,
data = dataset_final,
family = binomial(link = "probit"))
summary(ps_model)
##
## Call:
## glm(formula = treat ~ age + education + black + hispanic + married +
## nodegree + re74 + re75, family = binomial(link = "probit"),
## data = dataset_final)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.844e-01 4.908e-01 -0.987 0.3236
## age -1.915e-02 6.034e-03 -3.174 0.0015 **
## education -4.147e-03 2.983e-02 -0.139 0.8894
## black 8.138e-01 1.426e-01 5.706 1.16e-08 ***
## hispanic 5.016e-01 2.343e-01 2.140 0.0323 *
## married -5.082e-01 1.209e-01 -4.204 2.62e-05 ***
## nodegree -6.370e-02 1.512e-01 -0.421 0.6735
## re74 -2.917e-05 1.164e-05 -2.506 0.0122 *
## re75 -6.796e-05 1.434e-05 -4.740 2.13e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1380.81 on 2934 degrees of freedom
## Residual deviance: 831.56 on 2926 degrees of freedom
## AIC: 849.56
##
## Number of Fisher Scoring iterations: 9
# Run the matching
set.seed(123)
match_psm <- matchit(
  treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
  data = dataset_final,
  method = "nearest",
  distance = "glm",
  link = "probit",
  ratio = 1,
  caliper = 0.1, # quite restrictive: 0.1 standard deviations of the propensity score
  replace = FALSE,
  estimand = "ATT"
)
# Matching summary
cat("=== MATCHING SUMMARY ===\n")
## === MATCHING SUMMARY ===
summary(match_psm, standardize = TRUE)
##
## Call:
## matchit(formula = treat ~ age + education + black + hispanic +
## married + nodegree + re74 + re75, data = dataset_final, method = "nearest",
## distance = "glm", link = "probit", estimand = "ATT", replace = FALSE,
## caliper = 0.1, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2971 0.0470 1.8947 1.5698 0.4394
## age 25.8162 33.9244 -1.1332 0.4587 0.2079
## education 10.3459 11.9251 -0.7854 0.4394 0.0973
## black 0.8432 0.3051 1.4802 . 0.5382
## hispanic 0.0595 0.0396 0.0838 . 0.0198
## married 0.1892 0.7989 -1.5568 . 0.6097
## nodegree 0.7081 0.3553 0.7761 . 0.3528
## re74 2095.5737 17791.0560 -3.2119 0.1247 0.4356
## re75 1532.0553 17380.7662 -4.9231 0.0530 0.4403
## eCDF Max
## distance 0.7917
## age 0.3405
## education 0.3528
## black 0.5382
## hispanic 0.0198
## married 0.6097
## nodegree 0.3528
## re74 0.6603
## re75 0.7018
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2971 0.2964 0.0056 1.0046 0.0005
## age 25.8162 26.1838 -0.0514 0.7603 0.0175
## education 10.3459 10.3027 0.0215 1.0065 0.0134
## black 0.8432 0.9135 -0.1933 . 0.0703
## hispanic 0.0595 0.0486 0.0457 . 0.0108
## married 0.1892 0.2324 -0.1104 . 0.0432
## nodegree 0.7081 0.7297 -0.0476 . 0.0216
## re74 2095.5737 2232.4086 -0.0280 1.3697 0.0163
## re75 1532.0553 1821.2594 -0.0898 0.9051 0.0140
## eCDF Max Std. Pair Dist.
## distance 0.0270 0.0083
## age 0.0486 0.8144
## education 0.0757 0.8442
## black 0.0703 0.5501
## hispanic 0.0108 0.4114
## married 0.0432 0.4416
## nodegree 0.0216 0.6658
## re74 0.0757 0.5030
## re75 0.0703 0.5615
##
## Sample Sizes:
## Control Treated
## All 2750 185
## Matched 185 185
## Unmatched 2565 0
## Discarded 0 0
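As a rough illustration of what 1:1 nearest-neighbor matching without replacement and with a caliper does, the base-R sketch below matches simulated scores greedily. It rests on stated assumptions: `greedy_match` is a hypothetical helper, treated units are processed from the largest score down, and the caliper is a multiple of the score's standard deviation over the whole simulated sample. It is a simplification, not MatchIt's implementation.

```r
# Greedy 1:1 nearest-neighbor matching on a score, without replacement.
# greedy_match() is a hypothetical helper written for this illustration.
greedy_match <- function(score, treat, caliper_sd = 0.1) {
  width     <- caliper_sd * sd(score)       # caliper expressed in score units
  treated   <- which(treat == 1)
  available <- which(treat == 0)
  treated   <- treated[order(score[treated], decreasing = TRUE)]
  pairs <- data.frame(treated = integer(0), control = integer(0))
  for (i in treated) {
    if (length(available) == 0) break
    gaps <- abs(score[available] - score[i])
    best <- which.min(gaps)
    if (gaps[best] <= width) {              # keep the pair only inside the caliper
      pairs <- rbind(pairs, data.frame(treated = i, control = available[best]))
      available <- available[-best]         # each control is used at most once
    }
  }
  pairs
}

# Simulated scores; not the estimated propensity scores from the data.
set.seed(42)
treat <- c(rep(1, 10), rep(0, 200))
score <- c(runif(10, 0.2, 0.5), runif(200, 0, 0.5))
pairs <- greedy_match(score, treat)
nrow(pairs)                                 # number of matched pairs
```

Treated units with no control inside the caliper are left unmatched, which is how a caliper can shrink the treated sample.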
# Balance plot
plot(match_psm, type = "qq", interactive = FALSE)
# Extract the matched data
matched_data <- match.data(match_psm)
# ATT on the matched data
att_psm <- lm_robust(re78 ~ treat, data = matched_data, weights = weights, se_type = "HC2")
summary(att_psm)
##
## Call:
## lm_robust(formula = re78 ~ treat, data = matched_data, weights = weights,
## se_type = "HC2")
##
## Weighted, Standard error type: HC2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) 5383.4 444.2 12.118 1.135e-28 4509.8 6257 368
## treat 965.7 729.3 1.324 1.863e-01 -468.5 2400 368
##
## Multiple R-squared: 0.004742 , Adjusted R-squared: 0.002037
## F-statistic: 1.753 on 1 and 368 DF, p-value: 0.1863
att_psm_robust <- lm_robust(re78 ~ treat + age + education + black + hispanic +
married + nodegree + re74 + re75,
data = matched_data, weights = weights, se_type = "HC2")
summary(att_psm_robust)
##
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black +
## hispanic + married + nodegree + re74 + re75, data = matched_data,
## weights = weights, se_type = "HC2")
##
## Weighted, Standard error type: HC2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) -115.3341 3695.3879 -0.03121 0.97512 -7.383e+03 7151.9252 360
## treat 937.5167 722.8381 1.29699 0.19546 -4.840e+02 2359.0323 360
## age 32.5081 42.9152 0.75750 0.44925 -5.189e+01 116.9042 360
## education 532.1903 205.9845 2.58364 0.01017 1.271e+02 937.2743 360
## black -2220.1366 1338.2821 -1.65895 0.09800 -4.852e+03 411.6962 360
## hispanic -1642.8126 2003.1524 -0.82011 0.41269 -5.582e+03 2296.5377 360
## married 742.2488 922.2915 0.80479 0.42147 -1.072e+03 2556.0047 360
## nodegree 303.2858 1090.4696 0.27812 0.78108 -1.841e+03 2447.7765 360
## re74 0.2623 0.1709 1.53507 0.12565 -7.373e-02 0.5983 360
## re75 0.1610 0.1850 0.87056 0.38458 -2.027e-01 0.5248 360
##
## Multiple R-squared: 0.09826 , Adjusted R-squared: 0.07572
## F-statistic: 3.65 on 9 and 360 DF, p-value: 0.0002211
# Balance
balance_stats <- summary(match_psm, standardize = TRUE)
print(balance_stats)
# Love plot
love_plot <- love.plot(match_psm,
                       stats = "mean.diffs",
                       stars = "raw",
                       thresholds = 0.1,
                       drop.distance = TRUE,
                       title = "Covariate Balance - PSM with dataset_final")
print(love_plot)
# RELIABILITY
# Post-matching sample size
cat("=== RELIABILITY ANALYSIS ===\n")
## === RELIABILITY ANALYSIS ===
cat("Original sample size (dataset_final):", nrow(dataset_final), "\n")
## Original sample size (dataset_final): 2935
cat("Matched sample size:", nrow(matched_data), "\n")
## Matched sample size: 370
cat("Observations dropped:", nrow(dataset_final) - nrow(matched_data), "\n")
## Observations dropped: 2565
cat("Share retained:", round(nrow(matched_data)/nrow(dataset_final)*100, 1), "%\n")
## Share retained: 12.6 %
# Common support: propensity scores used on the x-axis of the density plot
dataset_final$ps <- predict(ps_model, newdata = dataset_final, type = "response")
matched_data$ps <- matched_data$distance
ps_treat_pre <- dataset_final$ps[dataset_final$treat == 1]
ps_control_pre <- dataset_final$ps[dataset_final$treat == 0]
ps_treat_post <- matched_data$ps[matched_data$treat == 1]
ps_control_post <- matched_data$ps[matched_data$treat == 0]
cat("\n=== COMMON SUPPORT ===\n")
##
## === COMMON SUPPORT ===
cat("Pre-matching - Treated: [", round(min(ps_treat_pre), 6), ",", round(max(ps_treat_pre), 6), "]\n")
## Pre-matching - Treated: [ 0.001119 , 0.474009 ]
cat("Pre-matching - Controls: [", round(min(ps_control_pre), 6), ",", round(max(ps_control_pre), 6), "]\n")
## Pre-matching - Controls: [ 0 , 0.466389 ]
cat("Post-matching - Treated: [", round(min(ps_treat_post), 6), ",", round(max(ps_treat_post), 6), "]\n")
## Post-matching - Treated: [ 0.001119 , 0.474009 ]
cat("Post-matching - Controls: [", round(min(ps_control_post), 6), ",", round(max(ps_control_post), 6), "]\n")
## Post-matching - Controls: [ 0.001116 , 0.466389 ]
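The ranges printed above can be turned into an explicit region of common support by intersecting them. This is a minimal sketch that copies the pre-matching ranges from the output; the variable names are hypothetical, and comparing minima and maxima is only a crude check next to the density plot.

```r
# Ranges copied from the pre-matching output above.
treated_range <- c(0.001119, 0.474009)
control_range <- c(0,        0.466389)

# The region of common support is the intersection of the two ranges.
overlap <- c(max(treated_range[1], control_range[1]),
             min(treated_range[2], control_range[2]))
overlap
```

Treated units with a score above the upper bound have no control with a higher score, so any match for them has to come from below and within the caliper.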
# Density plot
densidad1 <- ggplot(matched_data, aes(x = ps, fill = factor(treat))) +
  geom_density(alpha = 0.5) +
  labs(title = "Post-Matching Propensity Score Density - dataset_final",
       subtitle = paste("Sample:", nrow(matched_data), "observations"),
       x = "Propensity Score", y = "Density") +
  scale_fill_manual(values = c("blue", "red"),
                    labels = c("Control", "Treated"),
                    name = "Group")
print(densidad1)
# FINAL COMPARISON: OLS vs PSM
modelsummary(list("OLS simple" = mco_simple,
                  "OLS with controls" = mco_controles_1,
                  "PSM simple" = att_psm,
                  "PSM with controls" = att_psm_robust),
             title = "Comparison of OLS Models",
             fmt = 2
)
| | OLS simple | OLS with controls | PSM simple | PSM with controls |
|---|---|---|---|---|
| (Intercept) | 21553.92 | -120.22 | 5383.42 | -115.33 |
| | (311.73) | (1958.87) | (444.24) | (3695.39) |
| age | | -93.57 | | 32.51 |
| | | (21.55) | | (42.92) |
| education | | 594.87 | | 532.19 |
| | | (127.62) | | (205.98) |
| black | | -570.70 | | -2220.14 |
| | | (453.73) | | (1338.28) |
| hispanic | | 2502.69 | | -1642.81 |
| | | (1323.08) | | (2003.15) |
| married | | 1380.75 | | 742.25 |
| | | (526.93) | | (922.29) |
| nodegree | | 768.54 | | 303.29 |
| | | (677.01) | | (1090.47) |
| re74 | | 0.29 | | 0.26 |
| | | (0.06) | | (0.17) |
| re75 | | 0.57 | | 0.16 |
| | | (0.07) | | (0.18) |
| treat | | | 965.72 | 937.52 |
| | | | (729.33) | (722.84) |
| Num.Obs. | 2490 | 2490 | 370 | 370 |
| R2 | -0.000 | 0.572 | 0.005 | 0.098 |
| R2 Adj. | -0.000 | 0.570 | 0.002 | 0.076 |
| AIC | 55137.1 | 53042.0 | 7607.3 | 7586.7 |
| BIC | 55148.7 | 53100.2 | 7619.0 | 7629.8 |
| RMSE | 15552.22 | 10178.65 | 6995.51 | 6658.74 |
The pre-matching balance summary shows very large standardized mean differences between the groups, reaching about 4.9 in absolute value (re75). After matching, these differences shrink considerably: most fall below 0.1 in absolute value, the exceptions being black (-0.19), re75 (-0.12 in the three-sample exercise) and married (-0.11). The matching therefore made treated and controls substantially more similar, so the procedure can be considered reasonably reliable, with some residual imbalance in black.
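For the binary covariates, the standardized mean differences in the balance tables can be checked by hand. The sketch below assumes the ATT convention of dividing the difference in proportions by the standard deviation in the treated group, sqrt(p(1 - p)); `smd_binary` is a hypothetical helper, and the inputs are the rounded means for black printed in the first PSM summary, so the results agree with the table only up to rounding.

```r
# Standardized mean difference for a binary covariate, standardizing by the
# treated-group standard deviation (an assumption that matches the ATT setting).
smd_binary <- function(p_treated, p_control) {
  (p_treated - p_control) / sqrt(p_treated * (1 - p_treated))
}

# Means for black copied from the balance tables.
smd_pre  <- smd_binary(0.8432, 0.3051)   # all data
smd_post <- smd_binary(0.8432, 0.9135)   # matched data

round(c(pre = smd_pre, post = smd_post), 3)
```

The post-matching value for black stays above the 0.1 threshold in absolute value, which is why it shows up as an exception in the balance discussion.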
In the regressions run after matching there is no statistically significant evidence that the treatment affected 1978 earnings: the treat coefficient is 965.72 (p = 0.186) without controls and 937.52 (p = 0.195) with controls, and the R² is low in both cases. The only covariate significant at the 5% level is education (β = 532.19; standard error = 205.98), which indicates that one additional year of education is associated with roughly 532 dollars more in annual earnings.
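The significance statement can be verified from the printed estimate and standard error alone. This sketch copies the treat row of the att_psm output and recomputes the t statistic, the two-sided p-value and the 95% confidence interval; the variable names are hypothetical, and small discrepancies with the printed output come from rounding of the inputs.

```r
# Values copied from the treat row of the att_psm output.
estimate <- 965.7
std_err  <- 729.3
df       <- 368

t_value <- estimate / std_err
p_value <- 2 * pt(-abs(t_value), df)                      # two-sided p-value
ci      <- estimate + c(-1, 1) * qt(0.975, df) * std_err  # 95% confidence interval

round(t_value, 3)
round(p_value, 4)
round(ci, 1)
```

Because the interval includes zero, the data are compatible with no effect as well as with a gain of roughly 2,400 dollars.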
Compared with the OLS model from the previous item, propensity score matching gives a more credible estimate, because it compares the treated with observably similar controls and so attenuates selection bias on observed characteristics, which improves the causal validity of the results.
load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")
dataset_final_1 <- rbind(psid_controls, nsw_dw)
cov_vars <- c("age","education","black","hispanic","married","nodegree","re74","re75")
X <- as.matrix(dataset_final_1[, cov_vars, drop = FALSE])
dataset_final_1$maha_dist <- mahalanobis(X, center = colMeans(X), cov = cov(X))
cal_vec <- rep(0.1, length(cov_vars))
names(cal_vec) <- cov_vars
set.seed(1234)
match_psm_ratio1 <- matchit(
treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
data = dataset_final_1,
method = "nearest",
distance = "mahalanobis",
ratio = 1,
caliper = cal_vec,
replace = FALSE,
estimand = "ATT"
)
matched_data_ratio1 <- match.data(match_psm_ratio1)
# Simple regression on the sample matched on covariates
att_psm_ratio_simple <- lm_robust(re78 ~ treat, data = matched_data_ratio1, weights = weights, se_type = "HC2")
summary(att_psm_ratio_simple)
##
## Call:
## lm_robust(formula = re78 ~ treat, data = matched_data_ratio1,
## weights = weights, se_type = "HC2")
##
## Weighted, Standard error type: HC2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) 3573 502.6 7.109 2.730e-11 2581.0 4565 178
## treat 1754 770.9 2.276 2.405e-02 233.1 3275 178
##
## Multiple R-squared: 0.02827 , Adjusted R-squared: 0.02281
## F-statistic: 5.179 on 1 and 178 DF, p-value: 0.02405
# Regression with controls on the sample matched on covariates
att_psm_ratio1 <- lm_robust(re78 ~ treat + age + education + black +
hispanic + married + nodegree + re74 +
re75,
data = matched_data_ratio1,
weights = weights, se_type = "HC2")
summary(att_psm_ratio1)
##
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black +
## hispanic + married + nodegree + re74 + re75, data = matched_data_ratio1,
## weights = weights, se_type = "HC2")
##
## Weighted, Standard error type: HC2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) 1496.0489 4455.5309 0.3358 0.73746 -7299.2435 10291.3413 170
## treat 1750.1225 749.5885 2.3348 0.02072 270.4223 3229.8227 170
## age 85.6350 91.0352 0.9407 0.34820 -94.0700 265.3400 170
## education 306.4229 287.6438 1.0653 0.28826 -261.3908 874.2365 170
## black -3723.3511 1666.7602 -2.2339 0.02679 -7013.5635 -433.1387 170
## hispanic -491.9267 2398.5270 -0.2051 0.83774 -5226.6592 4242.8057 170
## married -1018.3573 1397.1693 -0.7289 0.46708 -3776.3928 1739.6782 170
## nodegree 155.7743 1339.1010 0.1163 0.90753 -2487.6335 2799.1821 170
## re74 -0.6583 0.4283 -1.5369 0.12617 -1.5038 0.1872 170
## re75 1.3099 0.5326 2.4594 0.01492 0.2585 2.3613 170
##
## Multiple R-squared: 0.1186 , Adjusted R-squared: 0.07193
## F-statistic: 3.332 on 9 and 170 DF, p-value: 0.0008904
cat("=== MATCHING SUMMARY ===\n")
## === MATCHING SUMMARY ===
summary(match_psm_ratio1, standardize = TRUE)
##
## Call:
## matchit(formula = treat ~ age + education + black + hispanic +
## married + nodegree + re74 + re75, data = dataset_final_1,
## method = "nearest", distance = "mahalanobis", estimand = "ATT",
## replace = FALSE, caliper = cal_vec, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## age 25.8162 33.9244 -1.1332 0.4587 0.2079
## education 10.3459 11.9251 -0.7854 0.4394 0.0973
## black 0.8432 0.3051 1.4802 . 0.5382
## hispanic 0.0595 0.0396 0.0838 . 0.0198
## married 0.1892 0.7989 -1.5568 . 0.6097
## nodegree 0.7081 0.3553 0.7761 . 0.3528
## re74 2095.5737 17791.0560 -3.2119 0.1247 0.4356
## re75 1532.0553 17380.7662 -4.9231 0.0530 0.4403
## eCDF Max
## age 0.3405
## education 0.3528
## black 0.5382
## hispanic 0.0198
## married 0.6097
## nodegree 0.3528
## re74 0.6603
## re75 0.7018
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## age 23.3444 23.2556 0.0124 1.0120 0.0046
## education 10.3667 10.3667 0.0000 1.0000 0.0000
## black 0.9111 0.9111 0.0000 . 0.0000
## hispanic 0.0222 0.0222 0.0000 . 0.0000
## married 0.0778 0.0778 0.0000 . 0.0000
## nodegree 0.7333 0.7333 0.0000 . 0.0000
## re74 538.7169 600.4506 -0.0126 0.8733 0.0029
## re75 479.5671 513.2268 -0.0105 1.0180 0.0035
## eCDF Max Std. Pair Dist.
## age 0.0222 0.0466
## education 0.0000 0.0000
## black 0.0000 0.0000
## hispanic 0.0000 0.0000
## married 0.0000 0.0000
## nodegree 0.0000 0.0000
## re74 0.0333 0.0255
## re75 0.0333 0.0488
##
## Sample Sizes:
## Control Treated
## All 2750 185
## Matched 90 90
## Unmatched 2660 95
## Discarded 0 0
# Balance plot
plot(match_psm_ratio1, type = "qq", interactive = FALSE)
# Love plot
love_plot_1 <- love.plot(match_psm_ratio1,
                         stats = "mean.diffs",
                         stars = "raw",
                         thresholds = 0.1,
                         drop.distance = TRUE,
                         title = "Covariate Balance - dataset_final_1\n matching on covariates (Mahalanobis)")
print(love_plot_1)
# Matching reliability
X_matched <- as.matrix(matched_data_ratio1[, cov_vars, drop = FALSE])
center_init <- colMeans(X)
cov_init <- cov(X)
matched_data_ratio1$maha_dist <- mahalanobis(X_matched, center = center_init, cov = cov_init)
# Pre-matching: use the maha_dist computed earlier (squared distance to the pooled covariate means)
ps_treat_pre_1 <- dataset_final_1$maha_dist[dataset_final_1$treat == 1]
ps_control_pre_1 <- dataset_final_1$maha_dist[dataset_final_1$treat == 0]
# Post-matching: use maha_dist recomputed on the matched sample with the pre-matching center and covariance
ps_treat_post_1 <- matched_data_ratio1$maha_dist[matched_data_ratio1$treat == 1]
ps_control_post_1 <- matched_data_ratio1$maha_dist[matched_data_ratio1$treat == 0]
cat("\n=== COMMON SUPPORT (Mahalanobis) ===\n")
##
## === COMMON SUPPORT (Mahalanobis) ===
cat("Pre-matching - Treated: [", round(min(ps_treat_pre_1), 6), ",", round(max(ps_treat_pre_1), 6), "]\n")
## Pre-matching - Treated: [ 4.2152 , 30.04799 ]
cat("Pre-matching - Controls: [", round(min(ps_control_pre_1), 6), ",", round(max(ps_control_pre_1), 6), "]\n")
## Pre-matching - Controls: [ 0.988205 , 133.555 ]
cat("Post-matching - Treated: [", round(min(ps_treat_post_1), 6), ",", round(max(ps_treat_post_1), 6), "]\n")
## Post-matching - Treated: [ 4.481577 , 28.2091 ]
cat("Post-matching - Controls: [", round(min(ps_control_post_1), 6), ",", round(max(ps_control_post_1), 6), "]\n")
## Post-matching - Controls: [ 4.481577 , 28.22515 ]
# Density plot
matched_data_ratio1$maha_dist <- mahalanobis(X_matched, center = center_init, cov = cov_init)
ggplot(matched_data_ratio1, aes(x = maha_dist, fill = factor(treat))) +
  geom_density(alpha = 0.5) +
  labs(title = "Post-Matching Mahalanobis Density - dataset_final_1",
       subtitle = paste("Sample:", nrow(matched_data_ratio1), "observations"),
       x = "Mahalanobis distance", y = "Density") +
  scale_fill_manual(labels = c("Control", "Treated"),
                    values = c("blue", "red"),
                    name = "Group")
# FINAL COMPARISON: OLS vs PSM vs matching on covariates
modelsummary(list("OLS simple" = mco_simple,
                  "OLS with controls" = mco_controles_1,
                  "PSM simple" = att_psm,
                  "PSM with controls" = att_psm_robust,
                  "Covariate matching simple" = att_psm_ratio_simple,
                  "Covariate matching with controls" = att_psm_ratio1
),
             title = "Comparison of OLS Models",
             fmt = 2
)
| | OLS simple | OLS with controls | PSM simple | PSM with controls | Covariate matching simple | Covariate matching with controls |
|---|---|---|---|---|---|---|
| (Intercept) | 21553.92 | -120.22 | 5383.42 | -115.33 | 3572.84 | 1496.05 |
| | (311.73) | (1958.87) | (444.24) | (3695.39) | (502.59) | (4455.53) |
| age | | -93.57 | | 32.51 | | 85.64 |
| | | (21.55) | | (42.92) | | (91.04) |
| education | | 594.87 | | 532.19 | | 306.42 |
| | | (127.62) | | (205.98) | | (287.64) |
| black | | -570.70 | | -2220.14 | | -3723.35 |
| | | (453.73) | | (1338.28) | | (1666.76) |
| hispanic | | 2502.69 | | -1642.81 | | -491.93 |
| | | (1323.08) | | (2003.15) | | (2398.53) |
| married | | 1380.75 | | 742.25 | | -1018.36 |
| | | (526.93) | | (922.29) | | (1397.17) |
| nodegree | | 768.54 | | 303.29 | | 155.77 |
| | | (677.01) | | (1090.47) | | (1339.10) |
| re74 | | 0.29 | | 0.26 | | -0.66 |
| | | (0.06) | | (0.17) | | (0.43) |
| re75 | | 0.57 | | 0.16 | | 1.31 |
| | | (0.07) | | (0.18) | | (0.53) |
| treat | | | 965.72 | 937.52 | 1754.28 | 1750.12 |
| | | | (729.33) | (722.84) | (770.85) | (749.59) |
| Num.Obs. | 2490 | 2490 | 370 | 370 | 180 | 180 |
| R2 | -0.000 | 0.572 | 0.005 | 0.098 | 0.028 | 0.119 |
| R2 Adj. | -0.000 | 0.570 | 0.002 | 0.076 | 0.023 | 0.072 |
| AIC | 55137.1 | 53042.0 | 7607.3 | 7586.7 | 3593.1 | 3591.5 |
| BIC | 55148.7 | 53100.2 | 7619.0 | 7629.8 | 3602.7 | 3626.7 |
| RMSE | 15552.22 | 10178.65 | 6995.51 | 6658.74 | 5142.24 | 4897.45 |
Matching on covariates sharply reduced the number of matched treated units, from the 185 retained under PSM to only 90. Despite this loss, the estimated Average Treatment effect on the Treated (ATT) is stable across specifications within the matched sample: the treatment coefficient (\(\beta\)) is $1754 in the simple regression and $1750 once the covariates are added as controls, and both are significant at the 5% level (p ≈ 0.02). This agreement, together with the near-exact covariate balance, supports the internal reliability of the estimate. However, losing 95 of the 185 treated units means the estimate applies only to the treated who found close matches, which makes it sensitive to the caliper and raises questions about external validity. Mahalanobis matching therefore gives a well-balanced but fragile estimator here.
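Matching on covariates relies on the distance between pairs of units, whereas the maha_dist column built in the code measures each unit's distance to the pooled covariate means. The base-R sketch below, on simulated data with hypothetical names, shows the pairwise version and checks it against the formula sqrt((a - b)' S^{-1} (a - b)). Note that `mahalanobis()` returns the squared distance.

```r
# Simulated covariates; not the PSID/NSW data.
set.seed(7)
X_sim <- cbind(age = rnorm(100, 30, 8), re75 = rnorm(100, 5000, 3000))
S     <- cov(X_sim)

unit_a <- X_sim[1, ]
unit_b <- X_sim[2, ]

# mahalanobis() returns the squared distance, so take the square root.
d_builtin <- sqrt(mahalanobis(unit_a, center = unit_b, cov = S))

# Same quantity written out: sqrt((a - b)' S^{-1} (a - b)).
diff_ab  <- unit_a - unit_b
d_manual <- sqrt(drop(t(diff_ab) %*% solve(S) %*% diff_ab))

c(builtin = d_builtin, manual = d_manual)
```

Because the covariance matrix rescales each covariate, a gap of a few thousand dollars in re75 and a gap of a few years in age contribute on comparable scales.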
load("psid_controls.rda")
load("nsw_dw.rda")
load("psid_controls2.rda")
dataset11 <- rbind(psid_controls, nsw_dw, psid_controls2)
# Estimate the propensity score with a probit
ps_model_11 <- glm(treat ~ age + education + black + hispanic +
married + nodegree + re74 + re75,
data = dataset11,
family = binomial(link = "probit"))
summary(ps_model_11)
##
## Call:
## glm(formula = treat ~ age + education + black + hispanic + married +
## nodegree + re74 + re75, family = binomial(link = "probit"),
## data = dataset11)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.898e-01 4.801e-01 -1.228 0.219326
## age -2.177e-02 5.822e-03 -3.739 0.000185 ***
## education 3.404e-03 2.897e-02 0.117 0.906469
## black 8.027e-01 1.396e-01 5.751 8.90e-09 ***
## hispanic 5.006e-01 2.279e-01 2.196 0.028070 *
## married -5.146e-01 1.170e-01 -4.397 1.10e-05 ***
## nodegree -1.119e-02 1.459e-01 -0.077 0.938876
## re74 -3.143e-05 1.137e-05 -2.765 0.005689 **
## re75 -6.386e-05 1.413e-05 -4.521 6.15e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1412.36 on 3187 degrees of freedom
## Residual deviance: 868.21 on 3179 degrees of freedom
## AIC: 886.21
##
## Number of Fisher Scoring iterations: 9
# Run the matching
set.seed(2312)
match_psm_11 <- matchit(
  treat ~ age + education + black + hispanic + married + nodegree + re74 + re75,
  data = dataset11,
  method = "nearest",
  distance = "glm",
  link = "probit",
  ratio = 1,
  caliper = 0.1, # quite restrictive: 0.1 standard deviations of the propensity score
  replace = FALSE,
  estimand = "ATT"
)
# Matching summary
cat("=== MATCHING SUMMARY ===\n")
## === MATCHING SUMMARY ===
summary(match_psm_11, standardize = TRUE)
##
## Call:
## matchit(formula = treat ~ age + education + black + hispanic +
## married + nodegree + re74 + re75, data = dataset11, method = "nearest",
## distance = "glm", link = "probit", estimand = "ATT", replace = FALSE,
## caliper = 0.1, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2768 0.0444 1.8154 1.6880 0.4249
## age 25.8162 34.1072 -1.1588 0.4459 0.2126
## education 10.3459 11.8275 -0.7369 0.4312 0.0932
## black 0.8432 0.3124 1.4602 . 0.5309
## hispanic 0.0595 0.0420 0.0740 . 0.0175
## married 0.1892 0.7935 -1.5431 . 0.6044
## nodegree 0.7081 0.3663 0.7518 . 0.3418
## re74 2095.5737 17221.2160 -3.0953 0.1265 0.4215
## re75 1532.0553 16554.1526 -4.6663 0.0536 0.4196
## eCDF Max
## distance 0.7801
## age 0.3434
## education 0.3418
## black 0.5309
## hispanic 0.0175
## married 0.6044
## nodegree 0.3418
## re74 0.6461
## re75 0.6724
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2768 0.2763 0.0038 1.0042 0.0005
## age 25.8162 25.4000 0.0582 0.8315 0.0209
## education 10.3459 10.3514 -0.0027 1.0387 0.0156
## black 0.8432 0.8595 -0.0446 . 0.0162
## hispanic 0.0595 0.0541 0.0229 . 0.0054
## married 0.1892 0.1730 0.0414 . 0.0162
## nodegree 0.7081 0.7568 -0.1070 . 0.0486
## re74 2095.5737 2243.0648 -0.0302 1.4801 0.0198
## re75 1532.0553 1906.1239 -0.1162 0.7098 0.0180
## eCDF Max Std. Pair Dist.
## distance 0.0270 0.0076
## age 0.0811 0.7049
## education 0.0595 0.9006
## black 0.0162 0.5204
## hispanic 0.0054 0.4800
## married 0.0162 0.4278
## nodegree 0.0486 0.6777
## re74 0.0865 0.4620
## re75 0.0649 0.5172
##
## Sample Sizes:
## Control Treated
## All 3003 185
## Matched 185 185
## Unmatched 2818 0
## Discarded 0 0
# Balance plot
plot(match_psm_11, type = "qq", interactive = FALSE)
# Extract the matched data
matched_data_11 <- match.data(match_psm_11)
# ATT on the matched data
# Note: this call was first run with data = matched_data (the matched sample from the
# earlier PSM exercise), so its summary only reproduced att_psm (treat = 965.7, SE = 729.3).
# The model is fit here on matched_data_11, the sample matched from dataset11.
att_psm_11 <- lm_robust(re78 ~ treat, data = matched_data_11, weights = weights, se_type = "HC2")
att_psm_robust_11 <- lm_robust(re78 ~ treat + age + education + black + hispanic +
married + nodegree + re74 + re75,
data = matched_data_11, weights = weights, se_type = "HC2")
summary(att_psm_robust_11)
##
## Call:
## lm_robust(formula = re78 ~ treat + age + education + black +
## hispanic + married + nodegree + re74 + re75, data = matched_data_11,
## weights = weights, se_type = "HC2")
##
## Weighted, Standard error type: HC2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) -1.715e+03 3573.145 -0.4801 0.63147 -8742.2278 5311.4915 360
## treat 1.197e+03 696.118 1.7189 0.08649 -172.3849 2565.5517 360
## age 6.753e+01 41.768 1.6168 0.10680 -14.6103 149.6702 360
## education 5.077e+02 210.427 2.4125 0.01634 93.8425 921.4829 360
## black -1.471e+03 1097.848 -1.3396 0.18123 -3629.6391 688.3613 360
## hispanic 2.985e+02 1688.571 0.1768 0.85976 -3022.1611 3619.2429 360
## married 4.998e+02 962.370 0.5193 0.60384 -1392.7689 2392.3771 360
## nodegree 4.738e+02 1072.897 0.4416 0.65903 -1636.1179 2583.7480 360
## re74 5.873e-02 0.183 0.3209 0.74848 -0.3012 0.4187 360
## re75 3.535e-01 0.158 2.2368 0.02591 0.0427 0.6642 360
##
## Multiple R-squared: 0.08947 , Adjusted R-squared: 0.06671
## F-statistic: 3.791 on 9 and 360 DF, p-value: 0.0001385
#balance
balance_stats_11 <- summary(match_psm_11, standardize = TRUE)
print(balance_stats_11)
##
## Call:
## matchit(formula = treat ~ age + education + black + hispanic +
## married + nodegree + re74 + re75, data = dataset11, method = "nearest",
## distance = "glm", link = "probit", estimand = "ATT", replace = FALSE,
## caliper = 0.1, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2768 0.0444 1.8154 1.6880 0.4249
## age 25.8162 34.1072 -1.1588 0.4459 0.2126
## education 10.3459 11.8275 -0.7369 0.4312 0.0932
## black 0.8432 0.3124 1.4602 . 0.5309
## hispanic 0.0595 0.0420 0.0740 . 0.0175
## married 0.1892 0.7935 -1.5431 . 0.6044
## nodegree 0.7081 0.3663 0.7518 . 0.3418
## re74 2095.5737 17221.2160 -3.0953 0.1265 0.4215
## re75 1532.0553 16554.1526 -4.6663 0.0536 0.4196
## eCDF Max
## distance 0.7801
## age 0.3434
## education 0.3418
## black 0.5309
## hispanic 0.0175
## married 0.6044
## nodegree 0.3418
## re74 0.6461
## re75 0.6724
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2768 0.2763 0.0038 1.0042 0.0005
## age 25.8162 25.4000 0.0582 0.8315 0.0209
## education 10.3459 10.3514 -0.0027 1.0387 0.0156
## black 0.8432 0.8595 -0.0446 . 0.0162
## hispanic 0.0595 0.0541 0.0229 . 0.0054
## married 0.1892 0.1730 0.0414 . 0.0162
## nodegree 0.7081 0.7568 -0.1070 . 0.0486
## re74 2095.5737 2243.0648 -0.0302 1.4801 0.0198
## re75 1532.0553 1906.1239 -0.1162 0.7098 0.0180
## eCDF Max Std. Pair Dist.
## distance 0.0270 0.0076
## age 0.0811 0.7049
## education 0.0595 0.9006
## black 0.0162 0.5204
## hispanic 0.0054 0.4800
## married 0.0162 0.4278
## nodegree 0.0486 0.6777
## re74 0.0865 0.4620
## re75 0.0649 0.5172
##
## Sample Sizes:
## Control Treated
## All 3003 185
## Matched 185 185
## Unmatched 2818 0
## Discarded 0 0
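A common rule of thumb (the same 0.1 threshold passed to `love.plot` in the next step) flags covariates whose absolute standardized mean difference after matching exceeds 0.1. The sketch below copies the matched-sample values from the summary above into a plain vector; `smd_matched`, `threshold`, and `flagged` are illustrative names, not objects from the original script.

```r
# Standardized mean differences after matching, copied from the summary above
smd_matched <- c(age = 0.0582, education = -0.0027, black = -0.0446,
                 hispanic = 0.0229, married = 0.0414, nodegree = -0.1070,
                 re74 = -0.0302, re75 = -0.1162)

# Assumed rule-of-thumb cutoff for acceptable balance
threshold <- 0.1

# Covariates whose absolute SMD is still above the cutoff
flagged <- names(smd_matched)[abs(smd_matched) > threshold]
print(flagged)
```

With these values the check flags nodegree and re75, both only slightly above the cutoff.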
#love plot
love_plot_11 <- love.plot(match_psm_11,
stats = "mean.diffs",
stars = "raw",
thresholds = 0.1,
drop.distance = TRUE,
title = "Covariate balance - PSM with dataset11")
print(love_plot_11)
# RELIABILITY
# Post-matching sample size
cat("=== RELIABILITY ANALYSIS ===\n")
## === RELIABILITY ANALYSIS ===
cat("Original sample size (dataset11):", nrow(dataset11), "\n")
## Original sample size (dataset11): 3188
cat("Matched sample size:", nrow(matched_data_11), "\n")
## Matched sample size: 370
cat("Observations dropped:", nrow(dataset11) - nrow(matched_data_11), "\n")
## Observations dropped: 2818
cat("Share retained:", round(nrow(matched_data_11)/nrow(dataset11)*100, 1), "%\n")
## Share retained: 11.6 %
# Common support: propensity-score ranges, used for the x-axis of the density plot
dataset11$ps <- predict(ps_model_11, newdata = dataset11, type = "response")
matched_data_11$ps <- matched_data_11$distance
ps_treat_pre_11 <- dataset11$ps[dataset11$treat == 1]
ps_control_pre_11 <- dataset11$ps[dataset11$treat == 0]
ps_treat_post_11 <- matched_data_11$ps[matched_data_11$treat == 1]
ps_control_post_11 <- matched_data_11$ps[matched_data_11$treat == 0]
cat("\n=== COMMON SUPPORT ===\n")
##
## === COMMON SUPPORT ===
cat("Pre-matching - Treated: [", round(min(ps_treat_pre_11), 6), ",", round(max(ps_treat_pre_11), 6), "]\n")
## Pre-matching - Treated: [ 0.001106 , 0.446575 ]
cat("Pre-matching - Controls: [", round(min(ps_control_pre_11), 6), ",", round(max(ps_control_pre_11), 6), "]\n")
## Pre-matching - Controls: [ 0 , 0.447921 ]
cat("Post-matching - Treated: [", round(min(ps_treat_post_11), 6), ",", round(max(ps_treat_post_11), 6), "]\n")
## Post-matching - Treated: [ 0.001106 , 0.446575 ]
cat("Post-matching - Controls: [", round(min(ps_control_post_11), 6), ",", round(max(ps_control_post_11), 6), "]\n")
## Post-matching - Controls: [ 0.001106 , 0.447921 ]
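The printed ranges can be turned into an explicit overlap check. This is a minimal sketch that copies the pre-matching ranges from the output above (the control minimum is shown rounded to 0) and takes their intersection; `treated_range`, `control_range`, and `common_support` are illustrative names, not objects from the original script.

```r
# Propensity-score ranges before matching, copied from the output above
treated_range <- c(min = 0.001106, max = 0.446575)
control_range <- c(min = 0, max = 0.447921)  # minimum is rounded in the output

# Common support: the interval where both groups have propensity scores
common_support <- c(
  lower = max(treated_range[["min"]], control_range[["min"]]),
  upper = min(treated_range[["max"]], control_range[["max"]])
)
print(common_support)
```

Here the overlap coincides with the treated range, so no treated unit lies outside the range of the controls. This is only a min/max check and does not replace inspecting the densities.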
# Density plot
densidad2 <- ggplot(matched_data_11, aes(x = ps, fill = factor(treat))) +
geom_density(alpha = 0.5) +
labs(title = "Propensity score density after matching - dataset11",
subtitle = paste("Sample:", nrow(matched_data_11), "observations"),
x = "Propensity Score", y = "Density") +
scale_fill_manual(values = c("blue", "red"),
labels = c("Control", "Treated"),
name = "Group")
print(densidad2)
# FINAL COMPARISON: OLS vs. PSM vs. matching on characteristics
modelsummary(list("Simple OLS" = mco_simple,
"OLS with controls" = mco_controles_1,
"OLS, propensity score, simple" = att_psm,
"OLS, propensity score, with controls" = att_psm_robust,
"OLS, matching on characteristics, simple" = att_psm_ratio_simple,
"OLS, matching on characteristics, with controls" = att_psm_ratio1,
"OLS, propensity score, simple 11" = att_psm_11,
"OLS, propensity score, with controls 11" = att_psm_robust_11
),
title = "Comparison of OLS models",
fmt = 2
)
| | Simple OLS | OLS with controls | OLS, propensity score, simple | OLS, propensity score, with controls | OLS, matching on characteristics, simple | OLS, matching on characteristics, with controls | OLS, propensity score, simple 11 | OLS, propensity score, with controls 11 |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 21553.92 | -120.22 | 5383.42 | -115.33 | 3572.84 | 1496.05 | 5257.56 | -1715.37 |
| | (311.73) | (1958.87) | (444.24) | (3695.39) | (502.59) | (4455.53) | (397.97) | (3573.15) |
| age | | -93.57 | | 32.51 | | 85.64 | | 67.53 |
| | | (21.55) | | (42.92) | | (91.04) | | (41.77) |
| education | | 594.87 | | 532.19 | | 306.42 | | 507.66 |
| | | (127.62) | | (205.98) | | (287.64) | | (210.43) |
| black | | -570.70 | | -2220.14 | | -3723.35 | | -1470.64 |
| | | (453.73) | | (1338.28) | | (1666.76) | | (1097.85) |
| hispanic | | 2502.69 | | -1642.81 | | -491.93 | | 298.54 |
| | | (1323.08) | | (2003.15) | | (2398.53) | | (1688.57) |
| married | | 1380.75 | | 742.25 | | -1018.36 | | 499.80 |
| | | (526.93) | | (922.29) | | (1397.17) | | (962.37) |
| nodegree | | 768.54 | | 303.29 | | 155.77 | | 473.82 |
| | | (677.01) | | (1090.47) | | (1339.10) | | (1072.90) |
| re74 | | 0.29 | | 0.26 | | -0.66 | | 0.06 |
| | | (0.06) | | (0.17) | | (0.43) | | (0.18) |
| re75 | | 0.57 | | 0.16 | | 1.31 | | 0.35 |
| | | (0.07) | | (0.18) | | (0.53) | | (0.16) |
| treat | | | 965.72 | 937.52 | 1754.28 | 1750.12 | 1091.58 | 1196.58 |
| | | | (729.33) | (722.84) | (770.85) | (749.59) | (702.11) | (696.12) |
| Num.Obs. | 2490 | 2490 | 370 | 370 | 180 | 180 | 370 | 370 |
| R2 | -0.000 | 0.572 | 0.005 | 0.098 | 0.028 | 0.119 | 0.007 | 0.089 |
| R2 Adj. | -0.000 | 0.570 | 0.002 | 0.076 | 0.023 | 0.072 | 0.004 | 0.067 |
| AIC | 55137.1 | 53042.0 | 7607.3 | 7586.7 | 3593.1 | 3591.5 | 7579.1 | 7562.8 |
| BIC | 55148.7 | 53100.2 | 7619.0 | 7629.8 | 3602.7 | 3626.7 | 7590.8 | 7605.9 |
| RMSE | 15552.22 | 10178.65 | 6995.51 | 6658.74 | 5142.24 | 4897.45 | 6734.37 | 6447.10 |
summary(match_psm)
##
## Call:
## matchit(formula = treat ~ age + education + black + hispanic +
## married + nodegree + re74 + re75, data = dataset_final, method = "nearest",
## distance = "glm", link = "probit", estimand = "ATT", replace = FALSE,
## caliper = 0.1, ratio = 1)
##
## Summary of Balance for All Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2971 0.0470 1.8947 1.5698 0.4394
## age 25.8162 33.9244 -1.1332 0.4587 0.2079
## education 10.3459 11.9251 -0.7854 0.4394 0.0973
## black 0.8432 0.3051 1.4802 . 0.5382
## hispanic 0.0595 0.0396 0.0838 . 0.0198
## married 0.1892 0.7989 -1.5568 . 0.6097
## nodegree 0.7081 0.3553 0.7761 . 0.3528
## re74 2095.5737 17791.0560 -3.2119 0.1247 0.4356
## re75 1532.0553 17380.7662 -4.9231 0.0530 0.4403
## eCDF Max
## distance 0.7917
## age 0.3405
## education 0.3528
## black 0.5382
## hispanic 0.0198
## married 0.6097
## nodegree 0.3528
## re74 0.6603
## re75 0.7018
##
## Summary of Balance for Matched Data:
## Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
## distance 0.2971 0.2964 0.0056 1.0046 0.0005
## age 25.8162 26.1838 -0.0514 0.7603 0.0175
## education 10.3459 10.3027 0.0215 1.0065 0.0134
## black 0.8432 0.9135 -0.1933 . 0.0703
## hispanic 0.0595 0.0486 0.0457 . 0.0108
## married 0.1892 0.2324 -0.1104 . 0.0432
## nodegree 0.7081 0.7297 -0.0476 . 0.0216
## re74 2095.5737 2232.4086 -0.0280 1.3697 0.0163
## re75 1532.0553 1821.2594 -0.0898 0.9051 0.0140
## eCDF Max Std. Pair Dist.
## distance 0.0270 0.0083
## age 0.0486 0.8144
## education 0.0757 0.8442
## black 0.0703 0.5501
## hispanic 0.0108 0.4114
## married 0.0432 0.4416
## nodegree 0.0216 0.6658
## re74 0.0757 0.5030
## re75 0.0703 0.5615
##
## Sample Sizes:
## Control Treated
## All 2750 185
## Matched 185 185
## Unmatched 2565 0
## Discarded 0 0
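Although the matching was carried out correctly and a match was found for every treated unit, the coefficients are not significant, except for education in the OLS that includes all covariates; in the simple OLS, by contrast, the variable is highly significant. (In the dataset11 output above, re75 is also significant at the 5% level, p = 0.026, and treat is significant only at the 10% level, p = 0.086.)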
Yes, it is reliable compared with the other models, but I do not recommend it: it is useful as a preliminary check and for spotting a pattern. The simple OLS is significant, but not once all the variables are included, so using the propensity score is not advisable.
As mentioned in item 4, it is best not to work with the propensity score, since the resulting estimates are not significant.
Matching on characteristics, by contrast, is more robust: it gave better, significant results and a shorter distance between treated units and their matches, although only 90 of the 185 treated units are retained. Only two datasets were used; I strongly recommend adding the last dataset to improve the results.
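The significance claims above can be checked directly from the treat estimates and HC2 standard errors in the comparison table. This is a minimal sketch, assuming residual degrees of freedom equal to the number of observations minus the number of estimated coefficients (368 and 360 appear in the output above; 178 and 170 are assumed); `treat_tab` and its column names are illustrative.

```r
# treat estimates and HC2 standard errors copied from the comparison table
treat_tab <- data.frame(
  model     = c("att_psm", "att_psm_robust", "att_psm_ratio_simple",
                "att_psm_ratio1", "att_psm_11", "att_psm_robust_11"),
  estimate  = c(965.72, 937.52, 1754.28, 1750.12, 1091.58, 1196.58),
  std_error = c(729.33, 722.84, 770.85, 749.59, 702.11, 696.12),
  # Residual df: observations minus coefficients (178 and 170 are assumptions)
  df        = c(368, 360, 178, 170, 368, 360)
)

# t statistic and two-sided p-value for each treat coefficient
treat_tab$t_value <- treat_tab$estimate / treat_tab$std_error
treat_tab$p_value <- 2 * pt(abs(treat_tab$t_value), df = treat_tab$df,
                            lower.tail = FALSE)
treat_tab$signif_5pct <- treat_tab$p_value < 0.05

print(treat_tab, digits = 3)
```

With these inputs, only the two characteristics-matching estimates clear the 5% level; the propensity-score estimates have p-values between roughly 0.09 and 0.20.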
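We can therefore conclude that the treatment increased earnings by between $270 and $3,230 (this range matches the 95% confidence interval of the characteristics-matching estimate with controls), with the most reliable effect at around $1,750 for the subpopulation that found an optimal match.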
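The $270 to $3,230 range can be reproduced from the comparison table. This is a minimal sketch, assuming the range is a 95% confidence interval built from the att_psm_ratio1 treat estimate and its standard error, with 170 residual degrees of freedom (180 observations minus 10 coefficients, an assumption since that model's summary is not shown here); the variable names are illustrative.

```r
# treat estimate and HC2 standard error for att_psm_ratio1, from the table
estimate  <- 1750.12
std_error <- 749.59

# Assumed residual degrees of freedom: 180 observations - 10 coefficients
df_resid <- 180 - 10

# Two-sided 95% confidence interval based on the t distribution
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci))
```

Under these assumptions the interval lands at roughly $270 to $3,230, so the stated range describes the uncertainty around a single estimate rather than the spread across models.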