Este es un ejemplo de como estimar los modelos que vimos en la clase usando R
Recuerden que lo primero es fijar el directorio de trabajo usando a funcion setwd “set working directory”
Dado que mi base de datos esta en formato de Stata debo usar el paquete foreign, que me permitira importar las bases de otros formatos a R
library(foreign)
Si no la tienen instalada deberan usar install.packages(“foreign”)
data<-read.dta("BaseMMP2.dta")
names(data)
## [1] "commun" "parity" "agep" "chdie" "chdie2"
## [6] "femalehh" "usaexphh" "heduc" "lnborn" "tpopbr"
## [11] "lffborn" "totlpf" "tmigprev" "pmigf" "pmratio"
## [16] "pfratio" "totagr" "migfam1" "migfam2" "carencias"
## [21] "remittances" "idp" "idhh" "wtfin" "mighh"
## [26] "yrmig1" "yrmig2" "yrmig10" "yrmig10f" "periodo"
## [31] "ageg" "ageg1" "ageg2" "ageg3" "ageg4"
## [36] "commun1" "periodo1" "periodo2" "periodo3" "periodo4"
## [41] "mighh1" "mighh2" "mighh3"
Primero veamos la variable dependiente chdie2, esta variable indica que si un ninio naciodo en el hogar murio en el mismo anio de su nacimiento o un anio despues
table(data$chdie2)
##
## 0 1
## 16780 930
para poder obtener cuadros ponderados instalamos el paquete install.packages(“questionr”)
library(questionr)
table(data$chdie2)
##
## 0 1
## 16780 930
Tenemos aproximadamente 5% de eventos Supongamos que queremos ver si la jefatura femenina del hogar y la experiencia migratoria afectan la propension a morir Lo primero es correr algunas pruebas
table(data$chdie2,data$femalehh)
##
## 0 1
## 0 14784 1996
## 1 815 115
prop.test(table(data$chdie2,data$femalehh))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: table(data$chdie2, data$femalehh)
## X-squared = 0.14367, df = 1, p-value = 0.7047
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.01757913 0.02698869
## sample estimates:
## prop 1 prop 2
## 0.8810489 0.8763441
Esta primera prueba nos dice que no hay diferencias significativas
Ahora intentemos con experiencia migratoria en el hogar
table(data$chdie2,data$migfam1)
##
## 0 1
## 0 8053 8727
## 1 164 766
prop.test(table(data$chdie2,data$migfam1))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: table(data$chdie2, data$migfam1)
## X-squared = 325.29, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.2773711 0.3297738
## sample estimates:
## prop 1 prop 2
## 0.4799166 0.1763441
Y en la familia extendida
table(data$chdie2,data$migfam2)
##
## 0 1
## 0 4074 12706
## 1 73 857
prop.test(table(data$chdie2,data$migfam2))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: table(data$chdie2, data$migfam2)
## X-squared = 131.72, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.1452644 0.1833244
## sample estimates:
## prop 1 prop 2
## 0.24278903 0.07849462
Ahora vamos a correr un simple modelo logistico
model1<-glm(chdie2~femalehh+migfam1+migfam2+usaexphh,family=binomial,data=data)
summary(model1)
##
## Call:
## glm(formula = chdie2 ~ femalehh + migfam1 + migfam2 + usaexphh,
## family = binomial, data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.4481 -0.4243 -0.2370 -0.1837 3.0163
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.1881720 0.1215655 -34.452 < 2e-16 ***
## femalehh 0.1145444 0.1041953 1.099 0.271627
## migfam1 1.3103507 0.0995049 13.169 < 2e-16 ***
## migfam2 0.5154985 0.1401857 3.677 0.000236 ***
## usaexphh -0.0024330 0.0006295 -3.865 0.000111 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 7291.1 on 17709 degrees of freedom
## Residual deviance: 6900.0 on 17705 degrees of freedom
## AIC: 6910
##
## Number of Fisher Scoring iterations: 6
Ahora solo para tener una variable de otra naturaleza vamos a crear una variable multinomial, en este caso usemos la experiencia migratoria para hacer 3 grupos: 1)sin exp 2) menos de 60 mese 3) 60 y mas
table(data$usaexphh)
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 12135 71 101 129 96 83 225 72 104 76 69 43
## 12 13 14 15 16 17 18 19 20 21 22 23
## 411 34 49 70 44 26 109 26 44 35 32 20
## 24 25 26 27 28 29 30 31 32 33 34 35
## 373 23 22 35 30 26 50 11 23 20 18 13
## 36 37 38 39 40 41 42 43 44 45 46 47
## 261 6 21 17 17 14 56 14 16 12 8 3
## 48 49 50 51 52 53 54 55 56 57 58 59
## 171 12 11 16 9 15 28 3 13 17 16 5
## 60 61 62 63 64 65 66 67 68 69 70 71
## 157 7 8 5 10 14 32 4 11 9 11 5
## 72 73 74 75 76 77 78 79 80 81 82 83
## 127 6 10 10 10 12 26 7 8 7 3 7
## 84 85 86 87 88 89 90 91 92 93 94 96
## 89 2 4 4 8 6 11 5 4 10 6 87
## 97 98 99 100 101 102 103 104 105 106 107 108
## 8 3 11 10 3 16 9 9 6 2 4 56
## 110 111 112 113 114 115 116 117 118 119 120 121
## 6 6 3 1 23 5 9 5 6 4 81 9
## 122 123 124 126 127 128 129 130 131 132 133 134
## 6 9 5 23 4 3 2 9 5 46 4 5
## 135 136 137 138 139 140 141 142 143 144 145 146
## 4 7 4 24 2 3 3 5 1 56 1 3
## 147 148 149 150 151 152 153 155 156 157 158 159
## 2 4 2 22 3 5 3 2 37 1 4 5
## 160 161 162 163 164 165 166 167 168 170 171 172
## 5 5 20 3 4 4 5 2 31 5 5 1
## 173 174 175 176 177 178 179 180 181 182 183 184
## 4 20 2 3 7 5 2 36 2 4 3 3
## 185 186 187 188 189 190 191 192 193 194 195 197
## 3 13 4 2 2 4 5 32 3 2 1 1
## 198 199 200 201 202 203 204 205 206 207 208 209
## 9 3 4 2 3 3 27 2 2 1 4 4
## 210 211 212 213 214 215 216 217 218 219 222 223
## 13 3 2 2 1 1 22 2 1 2 7 2
## 224 225 226 227 228 230 231 232 234 235 236 237
## 2 2 1 1 20 2 2 1 11 1 1 2
## 238 239 240 241 242 243 244 246 248 249 250 252
## 1 3 29 4 3 1 2 13 1 4 2 22
## 255 257 258 260 261 262 264 265 268 269 270 271
## 2 1 9 4 1 2 19 1 1 1 13 1
## 272 273 274 275 276 278 279 280 281 282 285 286
## 2 2 1 1 14 1 1 1 5 7 2 1
## 288 291 292 294 297 298 299 300 303 305 306 307
## 14 2 4 8 1 1 1 7 2 1 5 1
## 309 310 311 312 314 315 316 317 318 320 321 323
## 2 1 1 14 1 2 3 1 1 1 2 1
## 324 327 330 335 336 338 340 342 344 348 351 354
## 15 1 4 1 8 2 3 4 1 5 2 2
## 356 357 359 360 361 363 366 367 369 370 371 372
## 1 1 1 8 1 1 1 1 1 2 1 6
## 375 378 380 384 388 389 390 393 396 397 400 408
## 1 1 1 8 1 1 4 1 3 1 1 6
## 411 414 415 416 418 420 423 426 432 438 444 450
## 1 3 1 1 1 3 1 1 3 2 2 1
## 456 466 468 474 480 492 504 516 528 531 540 552
## 5 1 2 1 4 3 4 5 3 1 2 1
## 558 564 576 612 672 690 780
## 2 1 2 1 2 1 1
data$expg<-ifelse(data$usaexphh==0,1,ifelse(data$usaexphh<=60,2,3))
table(data$expg)
##
## 1 2 3
## 12135 3531 2044
Para correr un modelo multinomial necesitamos usar otro paquete, en este caso nnet
library(nnet)
model2<-multinom(expg~femalehh+migfam1+migfam2+remittances,data=data)
## # weights: 18 (10 variable)
## initial value 19456.423632
## iter 10 value 13859.426698
## final value 13558.319626
## converged
summary(model2)
## Call:
## multinom(formula = expg ~ femalehh + migfam1 + migfam2 + remittances,
## data = data)
##
## Coefficients:
## (Intercept) femalehh migfam1 migfam2 remittances
## 2 -2.121039 -1.504917 0.8530599 0.5602262 0.8523268
## 3 -3.258217 -1.522472 1.4360964 0.6326950 1.5720417
##
## Std. Errors:
## (Intercept) femalehh migfam1 migfam2 remittances
## 2 0.05309465 0.0901648 0.04782173 0.06303008 0.06924580
## 3 0.08270208 0.1144699 0.06630168 0.09156115 0.07696469
##
## Residual Deviance: 27116.64
## AIC: 27136.64
Ahora para una ordinal
library(MASS)
data$expg2<-as.factor(data$expg)
model3<-polr(expg2~femalehh+migfam1+migfam2+remittances,data=data,Hess=T)
summary(model3)
## Call:
## polr(formula = expg2 ~ femalehh + migfam1 + migfam2 + remittances,
## data = data, Hess = T)
##
## Coefficients:
## Value Std. Error t value
## femalehh -1.4979 0.07372 -20.32
## migfam1 1.0685 0.04095 26.09
## migfam2 0.5691 0.05425 10.49
## remittances 1.1690 0.05532 21.13
##
## Intercepts:
## Value Std. Error t value
## 1|2 1.8585 0.0463 40.1474
## 2|3 3.2303 0.0508 63.5266
##
## Residual Deviance: 27136.88
## AIC: 27148.88
Ahora vamos a ajustar binomial negativo y poisson primero hay que hacer la variable… para eso vamos a seguir con los meses de experiencia migratoria del hogar, pero vamos a convertirlos en anios y redondear para no tener promblemas
data$aniosexp<-round(data$usaexphh/12,0)
table(data$aniosexp)
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 12840 998 825 421 371 255 273 145 174 102 185 93
## 12 13 14 15 16 17 18 19 20 21 22 23
## 128 70 104 70 78 55 56 33 71 32 51 29
## 24 25 26 27 28 29 30 31 32 33 34 35
## 38 13 32 20 22 8 16 12 16 6 10 7
## 36 37 38 39 40 41 42 43 44 45 46 47
## 6 2 6 3 5 3 4 5 4 2 3 1
## 48 51 56 58 65
## 2 1 2 1 1
hist(data$aniosexp,breaks=10)
hist(data$aniosexp,breaks=50)
Ahora ajustamos el modelo poisson
model4<-glm(aniosexp~femalehh+migfam1+migfam2+remittances,family="poisson",data=data)
summary(model4)
##
## Call:
## glm(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances,
## family = "poisson", data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.959 -2.361 -1.446 -0.711 20.161
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.39514 0.01809 -21.84 <2e-16 ***
## femalehh -0.98028 0.02510 -39.06 <2e-16 ***
## migfam1 0.96084 0.01425 67.45 <2e-16 ***
## migfam2 0.45941 0.02030 22.64 <2e-16 ***
## remittances 1.03373 0.01464 70.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 123801 on 17709 degrees of freedom
## Residual deviance: 109755 on 17705 degrees of freedom
## AIC: 125997
##
## Number of Fisher Scoring iterations: 7
Ahora el binomial negativo
model5<-glm.nb(aniosexp~femalehh+migfam1+migfam2+remittances,data=data)
summary(model5)
##
## Call:
## glm.nb(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances,
## data = data, init.theta = 0.1295846364, link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0454 -0.9016 -0.7550 -0.3111 4.5409
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.51604 0.04959 -10.406 <2e-16 ***
## femalehh -1.00760 0.07180 -14.034 <2e-16 ***
## migfam1 1.00985 0.05210 19.382 <2e-16 ***
## migfam2 0.55475 0.06258 8.865 <2e-16 ***
## remittances 1.11005 0.07612 14.583 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.1296) family taken to be 1)
##
## Null deviance: 11556 on 17709 degrees of freedom
## Residual deviance: 10514 on 17705 degrees of freedom
## AIC: 47871
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.12958
## Std. Err.: 0.00241
##
## 2 x log-likelihood: -47859.13400
Los modelos inflados por exeso de ceros
library(pscl)
## Loading required package: lattice
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
model6<-zeroinfl(aniosexp~femalehh+migfam1+migfam2+remittances | femalehh,data=data)
summary(model6)
##
## Call:
## zeroinfl(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances |
## femalehh, data = data)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.6140 -0.5995 -0.5788 -0.2956 22.6522
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.62536 0.01906 85.276 < 2e-16 ***
## femalehh 0.15743 0.02512 6.267 3.69e-10 ***
## migfam1 0.30330 0.01430 21.217 < 2e-16 ***
## migfam2 0.06649 0.02032 3.273 0.00106 **
## remittances 0.33808 0.01488 22.713 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.84559 0.01750 48.31 <2e-16 ***
## femalehh 1.42147 0.07669 18.54 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 16
## Log-likelihood: -3.462e+04 on 7 Df
model7<-zeroinfl(aniosexp~femalehh+migfam1+migfam2+remittances | femalehh,dist="negbin",data=data)
summary(model7)
##
## Call:
## zeroinfl(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances |
## femalehh, data = data, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.3837 -0.3771 -0.3492 -0.1859 29.6154
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.15760 0.10824 -1.456 0.145
## femalehh 0.18321 0.12406 1.477 0.140
## migfam1 0.95285 0.05371 17.742 < 2e-16 ***
## migfam2 0.48485 0.06099 7.950 1.87e-15 ***
## remittances 1.07505 0.07903 13.603 < 2e-16 ***
## Log(theta) -1.58078 0.11598 -13.630 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1748 0.3340 -3.518 0.000435 ***
## femalehh 2.3923 0.2559 9.348 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.2058
## Number of iterations in BFGS optimization: 30
## Log-likelihood: -2.377e+04 on 8 Df
Para los modelos multinivel, primero con una variable continua
library(lme4)
## Loading required package: Matrix
model8<-lmer(usaexphh~femalehh+migfam1+migfam2+remittances +(1 | commun),data=data)
summary(model8)
## Linear mixed model fit by REML ['lmerMod']
## Formula: usaexphh ~ femalehh + migfam1 + migfam2 + remittances + (1 |
## commun)
## Data: data
##
## REML criterion at convergence: 194884.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0714 -0.4526 -0.1693 -0.0036 13.0055
##
## Random effects:
## Groups Name Variance Std.Dev.
## commun (Intercept) 320 17.89
## Residual 3463 58.85
## Number of obs: 17710, groups: commun, 116
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.457 2.065 3.612
## femalehh -15.350 1.385 -11.080
## migfam1 22.896 1.460 15.686
## migfam2 5.475 1.329 4.120
## remittances 22.931 1.653 13.876
##
## Correlation of Fixed Effects:
## (Intr) femlhh migfm1 migfm2
## femalehh -0.096
## migfam1 -0.255 0.006
## migfam2 -0.397 0.040 -0.250
## remittances -0.032 -0.065 -0.006 -0.079
Metiende una variable a nivel superior
model9<-lmer(usaexphh~femalehh+migfam1+migfam2+remittances +(femalehh | commun),data=data)
summary(model9)
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## usaexphh ~ femalehh + migfam1 + migfam2 + remittances + (femalehh |
## commun)
## Data: data
##
## REML criterion at convergence: 194813.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1718 -0.4365 -0.1696 0.0043 13.0211
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## commun (Intercept) 374.3 19.35
## femalehh 159.4 12.62 -1.00
## Residual 3447.9 58.72
## Number of obs: 17710, groups: commun, 116
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 8.497 2.157 3.938
## femalehh -15.695 1.806 -8.692
## migfam1 21.572 1.419 15.201
## migfam2 5.076 1.318 3.851
## remittances 23.441 1.638 14.309
##
## Correlation of Fixed Effects:
## (Intr) femlhh migfm1 migfm2
## femalehh -0.625
## migfam1 -0.222 0.023
## migfam2 -0.368 0.041 -0.281
## remittances -0.040 -0.048 0.015 -0.076
Ahora para una logistica
model10<-glmer(chdie2~femalehh+migfam1+migfam2+remittances +(1 | commun),data=data, family="binomial")
summary(model10)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## chdie2 ~ femalehh + migfam1 + migfam2 + remittances + (1 | commun)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 6694.1 6740.8 -3341.0 6682.1 17704
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.5560 -0.2952 -0.1695 -0.0973 11.6260
##
## Random effects:
## Groups Name Variance Std.Dev.
## commun (Intercept) 1.311 1.145
## Number of obs: 17710, groups: commun, 116
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.8672 0.1772 -21.827 < 2e-16 ***
## femalehh 0.2699 0.1049 2.572 0.01011 *
## migfam1 0.4289 0.1556 2.756 0.00586 **
## migfam2 0.1582 0.1532 1.032 0.30189
## remittances 0.3533 0.1590 2.222 0.02628 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) femlhh migfm1 migfm2
## femalehh -0.100
## migfam1 -0.311 -0.003
## migfam2 -0.560 0.025 -0.199
## remittances -0.052 -0.044 -0.047 -0.089
# y con efectos a nivel superio
model11<-glmer(chdie2~femalehh+migfam1+migfam2+remittances +(tmigprev | commun),data=data, family="binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.00118723 (tol =
## 0.001, component 1)
summary(model11)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: binomial ( logit )
## Formula:
## chdie2 ~ femalehh + migfam1 + migfam2 + remittances + (tmigprev |
## commun)
## Data: data
##
## AIC BIC logLik deviance df.resid
## 6678.5 6740.8 -3331.3 6662.5 17702
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.6129 -0.2879 -0.1731 -0.0929 12.4795
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## commun (Intercept) 2.3274282 1.52559
## tmigprev 0.0009824 0.03134 -0.80
## Number of obs: 17710, groups: commun, 116
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.8617 0.1862 -20.743 < 2e-16 ***
## femalehh 0.2212 0.1056 2.094 0.03625 *
## migfam1 0.4117 0.1584 2.598 0.00937 **
## migfam2 0.1321 0.1531 0.863 0.38805
## remittances 0.3366 0.1601 2.102 0.03556 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) femlhh migfm1 migfm2
## femalehh -0.086
## migfam1 -0.321 0.001
## migfam2 -0.542 0.028 -0.192
## remittances -0.044 -0.039 -0.062 -0.089
## convergence code: 0
## Model failed to converge with max|grad| = 0.00118723 (tol = 0.001, component 1)