Este es un ejemplo de como estimar los modelos que vimos en la clase usando R

Recuerden que lo primero es fijar el directorio de trabajo usando a funcion setwd “set working directory”

Dado que mi base de datos esta en formato de Stata debo usar el paquete foreign, que me permitira importar las bases de otros formatos a R

library(foreign)

Si no la tienen instalada deberan usar install.packages(“foreign”)

data<-read.dta("BaseMMP2.dta")
names(data)
##  [1] "commun"      "parity"      "agep"        "chdie"       "chdie2"     
##  [6] "femalehh"    "usaexphh"    "heduc"       "lnborn"      "tpopbr"     
## [11] "lffborn"     "totlpf"      "tmigprev"    "pmigf"       "pmratio"    
## [16] "pfratio"     "totagr"      "migfam1"     "migfam2"     "carencias"  
## [21] "remittances" "idp"         "idhh"        "wtfin"       "mighh"      
## [26] "yrmig1"      "yrmig2"      "yrmig10"     "yrmig10f"    "periodo"    
## [31] "ageg"        "ageg1"       "ageg2"       "ageg3"       "ageg4"      
## [36] "commun1"     "periodo1"    "periodo2"    "periodo3"    "periodo4"   
## [41] "mighh1"      "mighh2"      "mighh3"

Primero veamos la variable dependiente chdie2, esta variable indica que si un ninio naciodo en el hogar murio en el mismo anio de su nacimiento o un anio despues

table(data$chdie2)
## 
##     0     1 
## 16780   930

para poder obtener cuadros ponderados instalamos el paquete install.packages(“questionr”)

library(questionr)
table(data$chdie2)
## 
##     0     1 
## 16780   930

Tenemos aproximadamente 5% de eventos Supongamos que queremos ver si la jefatura femenina del hogar y la experiencia migratoria afectan la propension a morir Lo primero es correr algunas pruebas

table(data$chdie2,data$femalehh)
##    
##         0     1
##   0 14784  1996
##   1   815   115
prop.test(table(data$chdie2,data$femalehh))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  table(data$chdie2, data$femalehh)
## X-squared = 0.14367, df = 1, p-value = 0.7047
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.01757913  0.02698869
## sample estimates:
##    prop 1    prop 2 
## 0.8810489 0.8763441

Esta primera prueba nos dice que no hay diferencias significativas

Ahora intentemos con experiencia migratoria en el hogar

table(data$chdie2,data$migfam1)
##    
##        0    1
##   0 8053 8727
##   1  164  766
prop.test(table(data$chdie2,data$migfam1))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  table(data$chdie2, data$migfam1)
## X-squared = 325.29, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.2773711 0.3297738
## sample estimates:
##    prop 1    prop 2 
## 0.4799166 0.1763441

Y en la familia extendida

table(data$chdie2,data$migfam2)
##    
##         0     1
##   0  4074 12706
##   1    73   857
prop.test(table(data$chdie2,data$migfam2))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  table(data$chdie2, data$migfam2)
## X-squared = 131.72, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1452644 0.1833244
## sample estimates:
##     prop 1     prop 2 
## 0.24278903 0.07849462

Ahora vamos a correr un simple modelo logistico

model1<-glm(chdie2~femalehh+migfam1+migfam2+usaexphh,family=binomial,data=data)
summary(model1)
## 
## Call:
## glm(formula = chdie2 ~ femalehh + migfam1 + migfam2 + usaexphh, 
##     family = binomial, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.4481  -0.4243  -0.2370  -0.1837   3.0163  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.1881720  0.1215655 -34.452  < 2e-16 ***
## femalehh     0.1145444  0.1041953   1.099 0.271627    
## migfam1      1.3103507  0.0995049  13.169  < 2e-16 ***
## migfam2      0.5154985  0.1401857   3.677 0.000236 ***
## usaexphh    -0.0024330  0.0006295  -3.865 0.000111 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7291.1  on 17709  degrees of freedom
## Residual deviance: 6900.0  on 17705  degrees of freedom
## AIC: 6910
## 
## Number of Fisher Scoring iterations: 6

Ahora solo para tener una variable de otra naturaleza vamos a crear una variable multinomial, en este caso usemos la experiencia migratoria para hacer 3 grupos: 1)sin exp 2) menos de 60 mese 3) 60 y mas

table(data$usaexphh)
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 12135    71   101   129    96    83   225    72   104    76    69    43 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   411    34    49    70    44    26   109    26    44    35    32    20 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##   373    23    22    35    30    26    50    11    23    20    18    13 
##    36    37    38    39    40    41    42    43    44    45    46    47 
##   261     6    21    17    17    14    56    14    16    12     8     3 
##    48    49    50    51    52    53    54    55    56    57    58    59 
##   171    12    11    16     9    15    28     3    13    17    16     5 
##    60    61    62    63    64    65    66    67    68    69    70    71 
##   157     7     8     5    10    14    32     4    11     9    11     5 
##    72    73    74    75    76    77    78    79    80    81    82    83 
##   127     6    10    10    10    12    26     7     8     7     3     7 
##    84    85    86    87    88    89    90    91    92    93    94    96 
##    89     2     4     4     8     6    11     5     4    10     6    87 
##    97    98    99   100   101   102   103   104   105   106   107   108 
##     8     3    11    10     3    16     9     9     6     2     4    56 
##   110   111   112   113   114   115   116   117   118   119   120   121 
##     6     6     3     1    23     5     9     5     6     4    81     9 
##   122   123   124   126   127   128   129   130   131   132   133   134 
##     6     9     5    23     4     3     2     9     5    46     4     5 
##   135   136   137   138   139   140   141   142   143   144   145   146 
##     4     7     4    24     2     3     3     5     1    56     1     3 
##   147   148   149   150   151   152   153   155   156   157   158   159 
##     2     4     2    22     3     5     3     2    37     1     4     5 
##   160   161   162   163   164   165   166   167   168   170   171   172 
##     5     5    20     3     4     4     5     2    31     5     5     1 
##   173   174   175   176   177   178   179   180   181   182   183   184 
##     4    20     2     3     7     5     2    36     2     4     3     3 
##   185   186   187   188   189   190   191   192   193   194   195   197 
##     3    13     4     2     2     4     5    32     3     2     1     1 
##   198   199   200   201   202   203   204   205   206   207   208   209 
##     9     3     4     2     3     3    27     2     2     1     4     4 
##   210   211   212   213   214   215   216   217   218   219   222   223 
##    13     3     2     2     1     1    22     2     1     2     7     2 
##   224   225   226   227   228   230   231   232   234   235   236   237 
##     2     2     1     1    20     2     2     1    11     1     1     2 
##   238   239   240   241   242   243   244   246   248   249   250   252 
##     1     3    29     4     3     1     2    13     1     4     2    22 
##   255   257   258   260   261   262   264   265   268   269   270   271 
##     2     1     9     4     1     2    19     1     1     1    13     1 
##   272   273   274   275   276   278   279   280   281   282   285   286 
##     2     2     1     1    14     1     1     1     5     7     2     1 
##   288   291   292   294   297   298   299   300   303   305   306   307 
##    14     2     4     8     1     1     1     7     2     1     5     1 
##   309   310   311   312   314   315   316   317   318   320   321   323 
##     2     1     1    14     1     2     3     1     1     1     2     1 
##   324   327   330   335   336   338   340   342   344   348   351   354 
##    15     1     4     1     8     2     3     4     1     5     2     2 
##   356   357   359   360   361   363   366   367   369   370   371   372 
##     1     1     1     8     1     1     1     1     1     2     1     6 
##   375   378   380   384   388   389   390   393   396   397   400   408 
##     1     1     1     8     1     1     4     1     3     1     1     6 
##   411   414   415   416   418   420   423   426   432   438   444   450 
##     1     3     1     1     1     3     1     1     3     2     2     1 
##   456   466   468   474   480   492   504   516   528   531   540   552 
##     5     1     2     1     4     3     4     5     3     1     2     1 
##   558   564   576   612   672   690   780 
##     2     1     2     1     2     1     1
data$expg<-ifelse(data$usaexphh==0,1,ifelse(data$usaexphh<=60,2,3))
table(data$expg)
## 
##     1     2     3 
## 12135  3531  2044

Para correr un modelo multinomial necesitamos usar otro paquete, en este caso nnet

library(nnet)
model2<-multinom(expg~femalehh+migfam1+migfam2+remittances,data=data)
## # weights:  18 (10 variable)
## initial  value 19456.423632 
## iter  10 value 13859.426698
## final  value 13558.319626 
## converged
summary(model2)
## Call:
## multinom(formula = expg ~ femalehh + migfam1 + migfam2 + remittances, 
##     data = data)
## 
## Coefficients:
##   (Intercept)  femalehh   migfam1   migfam2 remittances
## 2   -2.121039 -1.504917 0.8530599 0.5602262   0.8523268
## 3   -3.258217 -1.522472 1.4360964 0.6326950   1.5720417
## 
## Std. Errors:
##   (Intercept)  femalehh    migfam1    migfam2 remittances
## 2  0.05309465 0.0901648 0.04782173 0.06303008  0.06924580
## 3  0.08270208 0.1144699 0.06630168 0.09156115  0.07696469
## 
## Residual Deviance: 27116.64 
## AIC: 27136.64

Ahora para una ordinal

library(MASS)
data$expg2<-as.factor(data$expg)
model3<-polr(expg2~femalehh+migfam1+migfam2+remittances,data=data,Hess=T)
summary(model3)
## Call:
## polr(formula = expg2 ~ femalehh + migfam1 + migfam2 + remittances, 
##     data = data, Hess = T)
## 
## Coefficients:
##               Value Std. Error t value
## femalehh    -1.4979    0.07372  -20.32
## migfam1      1.0685    0.04095   26.09
## migfam2      0.5691    0.05425   10.49
## remittances  1.1690    0.05532   21.13
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2   1.8585   0.0463    40.1474
## 2|3   3.2303   0.0508    63.5266
## 
## Residual Deviance: 27136.88 
## AIC: 27148.88

Ahora vamos a ajustar binomial negativo y poisson primero hay que hacer la variable… para eso vamos a seguir con los meses de experiencia migratoria del hogar, pero vamos a convertirlos en anios y redondear para no tener promblemas

data$aniosexp<-round(data$usaexphh/12,0)
table(data$aniosexp)
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 12840   998   825   421   371   255   273   145   174   102   185    93 
##    12    13    14    15    16    17    18    19    20    21    22    23 
##   128    70   104    70    78    55    56    33    71    32    51    29 
##    24    25    26    27    28    29    30    31    32    33    34    35 
##    38    13    32    20    22     8    16    12    16     6    10     7 
##    36    37    38    39    40    41    42    43    44    45    46    47 
##     6     2     6     3     5     3     4     5     4     2     3     1 
##    48    51    56    58    65 
##     2     1     2     1     1
hist(data$aniosexp,breaks=10)

hist(data$aniosexp,breaks=50)

Ahora ajustamos el modelo poisson

model4<-glm(aniosexp~femalehh+migfam1+migfam2+remittances,family="poisson",data=data)
summary(model4)
## 
## Call:
## glm(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances, 
##     family = "poisson", data = data)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -3.959  -2.361  -1.446  -0.711  20.161  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.39514    0.01809  -21.84   <2e-16 ***
## femalehh    -0.98028    0.02510  -39.06   <2e-16 ***
## migfam1      0.96084    0.01425   67.45   <2e-16 ***
## migfam2      0.45941    0.02030   22.64   <2e-16 ***
## remittances  1.03373    0.01464   70.62   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 123801  on 17709  degrees of freedom
## Residual deviance: 109755  on 17705  degrees of freedom
## AIC: 125997
## 
## Number of Fisher Scoring iterations: 7

Ahora el binomial negativo

model5<-glm.nb(aniosexp~femalehh+migfam1+migfam2+remittances,data=data)
summary(model5)
## 
## Call:
## glm.nb(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances, 
##     data = data, init.theta = 0.1295846364, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0454  -0.9016  -0.7550  -0.3111   4.5409  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.51604    0.04959 -10.406   <2e-16 ***
## femalehh    -1.00760    0.07180 -14.034   <2e-16 ***
## migfam1      1.00985    0.05210  19.382   <2e-16 ***
## migfam2      0.55475    0.06258   8.865   <2e-16 ***
## remittances  1.11005    0.07612  14.583   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.1296) family taken to be 1)
## 
##     Null deviance: 11556  on 17709  degrees of freedom
## Residual deviance: 10514  on 17705  degrees of freedom
## AIC: 47871
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.12958 
##           Std. Err.:  0.00241 
## 
##  2 x log-likelihood:  -47859.13400

Los modelos inflados por exeso de ceros

library(pscl)
## Loading required package: lattice
## Classes and Methods for R developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University
## Simon Jackman
## hurdle and zeroinfl functions by Achim Zeileis
model6<-zeroinfl(aniosexp~femalehh+migfam1+migfam2+remittances | femalehh,data=data)
summary(model6)
## 
## Call:
## zeroinfl(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances | 
##     femalehh, data = data)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6140 -0.5995 -0.5788 -0.2956 22.6522 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.62536    0.01906  85.276  < 2e-16 ***
## femalehh     0.15743    0.02512   6.267 3.69e-10 ***
## migfam1      0.30330    0.01430  21.217  < 2e-16 ***
## migfam2      0.06649    0.02032   3.273  0.00106 ** 
## remittances  0.33808    0.01488  22.713  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.84559    0.01750   48.31   <2e-16 ***
## femalehh     1.42147    0.07669   18.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 16 
## Log-likelihood: -3.462e+04 on 7 Df
model7<-zeroinfl(aniosexp~femalehh+migfam1+migfam2+remittances | femalehh,dist="negbin",data=data)
summary(model7)
## 
## Call:
## zeroinfl(formula = aniosexp ~ femalehh + migfam1 + migfam2 + remittances | 
##     femalehh, data = data, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3837 -0.3771 -0.3492 -0.1859 29.6154 
## 
## Count model coefficients (negbin with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.15760    0.10824  -1.456    0.145    
## femalehh     0.18321    0.12406   1.477    0.140    
## migfam1      0.95285    0.05371  17.742  < 2e-16 ***
## migfam2      0.48485    0.06099   7.950 1.87e-15 ***
## remittances  1.07505    0.07903  13.603  < 2e-16 ***
## Log(theta)  -1.58078    0.11598 -13.630  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.1748     0.3340  -3.518 0.000435 ***
## femalehh      2.3923     0.2559   9.348  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.2058 
## Number of iterations in BFGS optimization: 30 
## Log-likelihood: -2.377e+04 on 8 Df

Para los modelos multinivel, primero con una variable continua

library(lme4)
## Loading required package: Matrix
model8<-lmer(usaexphh~femalehh+migfam1+migfam2+remittances +(1 | commun),data=data)
summary(model8)
## Linear mixed model fit by REML ['lmerMod']
## Formula: usaexphh ~ femalehh + migfam1 + migfam2 + remittances + (1 |  
##     commun)
##    Data: data
## 
## REML criterion at convergence: 194884.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0714 -0.4526 -0.1693 -0.0036 13.0055 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  commun   (Intercept)  320     17.89   
##  Residual             3463     58.85   
## Number of obs: 17710, groups:  commun, 116
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    7.457      2.065   3.612
## femalehh     -15.350      1.385 -11.080
## migfam1       22.896      1.460  15.686
## migfam2        5.475      1.329   4.120
## remittances   22.931      1.653  13.876
## 
## Correlation of Fixed Effects:
##             (Intr) femlhh migfm1 migfm2
## femalehh    -0.096                     
## migfam1     -0.255  0.006              
## migfam2     -0.397  0.040 -0.250       
## remittances -0.032 -0.065 -0.006 -0.079

Metiende una variable a nivel superior

model9<-lmer(usaexphh~femalehh+migfam1+migfam2+remittances +(femalehh | commun),data=data)
summary(model9)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## usaexphh ~ femalehh + migfam1 + migfam2 + remittances + (femalehh |  
##     commun)
##    Data: data
## 
## REML criterion at convergence: 194813.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1718 -0.4365 -0.1696  0.0043 13.0211 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  commun   (Intercept)  374.3   19.35         
##           femalehh     159.4   12.62    -1.00
##  Residual             3447.9   58.72         
## Number of obs: 17710, groups:  commun, 116
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    8.497      2.157   3.938
## femalehh     -15.695      1.806  -8.692
## migfam1       21.572      1.419  15.201
## migfam2        5.076      1.318   3.851
## remittances   23.441      1.638  14.309
## 
## Correlation of Fixed Effects:
##             (Intr) femlhh migfm1 migfm2
## femalehh    -0.625                     
## migfam1     -0.222  0.023              
## migfam2     -0.368  0.041 -0.281       
## remittances -0.040 -0.048  0.015 -0.076

Ahora para una logistica

model10<-glmer(chdie2~femalehh+migfam1+migfam2+remittances +(1 | commun),data=data, family="binomial")
summary(model10)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## chdie2 ~ femalehh + migfam1 + migfam2 + remittances + (1 | commun)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   6694.1   6740.8  -3341.0   6682.1    17704 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.5560 -0.2952 -0.1695 -0.0973 11.6260 
## 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  commun (Intercept) 1.311    1.145   
## Number of obs: 17710, groups:  commun, 116
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.8672     0.1772 -21.827  < 2e-16 ***
## femalehh      0.2699     0.1049   2.572  0.01011 *  
## migfam1       0.4289     0.1556   2.756  0.00586 ** 
## migfam2       0.1582     0.1532   1.032  0.30189    
## remittances   0.3533     0.1590   2.222  0.02628 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) femlhh migfm1 migfm2
## femalehh    -0.100                     
## migfam1     -0.311 -0.003              
## migfam2     -0.560  0.025 -0.199       
## remittances -0.052 -0.044 -0.047 -0.089
# y con efectos a nivel superio
model11<-glmer(chdie2~femalehh+migfam1+migfam2+remittances +(tmigprev | commun),data=data, family="binomial")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge with max|grad| = 0.00118723 (tol =
## 0.001, component 1)
summary(model11)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## chdie2 ~ femalehh + migfam1 + migfam2 + remittances + (tmigprev |  
##     commun)
##    Data: data
## 
##      AIC      BIC   logLik deviance df.resid 
##   6678.5   6740.8  -3331.3   6662.5    17702 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.6129 -0.2879 -0.1731 -0.0929 12.4795 
## 
## Random effects:
##  Groups Name        Variance  Std.Dev. Corr 
##  commun (Intercept) 2.3274282 1.52559       
##         tmigprev    0.0009824 0.03134  -0.80
## Number of obs: 17710, groups:  commun, 116
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.8617     0.1862 -20.743  < 2e-16 ***
## femalehh      0.2212     0.1056   2.094  0.03625 *  
## migfam1       0.4117     0.1584   2.598  0.00937 ** 
## migfam2       0.1321     0.1531   0.863  0.38805    
## remittances   0.3366     0.1601   2.102  0.03556 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) femlhh migfm1 migfm2
## femalehh    -0.086                     
## migfam1     -0.321  0.001              
## migfam2     -0.542  0.028 -0.192       
## remittances -0.044 -0.039 -0.062 -0.089
## convergence code: 0
## Model failed to converge with max|grad| = 0.00118723 (tol = 0.001, component 1)