Factores Asociados a la Fecundidad: Modelos múltiples

Primer paso: carga de paquetes necesarios:

library(ggplot2)
library(stargazer)

Para este ejercicio utilizaremos la base de datos de indicadores sociodemográficos del mundo por quinquenios. La fuente original de los datos es el proyecto Gapminder y puede descargarse del siguiente enlace: dataWorld_q.rda

Consideraremos las siguientes variables

Primero calcularemos dos modelos de regresión simple, que luego compararemos con un modelo de regresión múltiple. Trabajaremos primero con datos del quinquenio 1995-1999:

# Carga de los datos

load("dataWorld_q.rda")

# Modelo 1 simple

mod.reg1 <- lm(tfr ~ yearSchF, 
               data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])

summary(mod.reg1)
## 
## Call:
## lm(formula = tfr ~ yearSchF, data = dataWorld_q[dataWorld_q$quinq == 
##     "1995-1999", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.49027 -0.56797 -0.03326  0.56028  2.96957 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.80769    0.16843   40.42   <2e-16 ***
## yearSchF    -0.43497    0.02037  -21.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9627 on 171 degrees of freedom
##   (21 observations deleted due to missingness)
## Multiple R-squared:  0.7272, Adjusted R-squared:  0.7256 
## F-statistic: 455.8 on 1 and 171 DF,  p-value: < 2.2e-16
# Modelo 2 simple

mod.reg2 <- lm(tfr ~ contracep, 
               data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])

summary(mod.reg2)
## 
## Call:
## lm(formula = tfr ~ contracep, data = dataWorld_q[dataWorld_q$quinq == 
##     "1995-1999", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.05066 -0.49731  0.02912  0.61047  2.15326 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.795721   0.162956   41.70   <2e-16 ***
## contracep   -0.067323   0.003072  -21.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8302 on 128 degrees of freedom
##   (64 observations deleted due to missingness)
## Multiple R-squared:  0.7895, Adjusted R-squared:  0.7879 
## F-statistic: 480.2 on 1 and 128 DF,  p-value: < 2.2e-16
# Modelo 3 múltiple

mod.reg3 <- lm(tfr ~ yearSchF +  contracep, 
               data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])

summary(mod.reg3)
## 
## Call:
## lm(formula = tfr ~ yearSchF + contracep, data = dataWorld_q[dataWorld_q$quinq == 
##     "1995-1999", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71926 -0.58924 -0.00554  0.52794  2.47162 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.102160   0.153349  46.314  < 2e-16 ***
## yearSchF    -0.176278   0.029194  -6.038 1.76e-08 ***
## contracep   -0.046436   0.004366 -10.637  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7355 on 121 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.8405, Adjusted R-squared:  0.8379 
## F-statistic: 318.9 on 2 and 121 DF,  p-value: < 2.2e-16

Una forma más “elegante” de presentar los datos de los tres modelos es a través del paquete stargazer. El paquete stargazer nos permite también etiquetar las variables con nombre significativos:

stargazer(mod.reg1, mod.reg2, mod.reg3, type = "text",
          omit.stat=c("ser","f"), 
          title = "OLS para Tasa Global de Fecundidad 1995-1999",
          covariate.labels = c("Años de escolaridad mujeres", 
                               "% Uso de anticonceptivos"),
          dep.var.caption = "Variable dependiente:",
          dep.var.labels   = "Fecundidad",
          star.cutoffs = c(0.05, 0.01, 0.001))
## 
## OLS para Tasa Global de Fecundidad 1995-1999
## ===========================================================
##                                  Variable dependiente:     
##                             -------------------------------
##                                       Fecundidad           
##                                (1)        (2)        (3)   
## -----------------------------------------------------------
## Años de escolaridad mujeres -0.435***             -0.176***
##                              (0.020)               (0.029) 
##                                                            
## % Uso de anticonceptivos               -0.067***  -0.046***
##                                         (0.003)    (0.004) 
##                                                            
## Constant                     6.808***   6.796***  7.102*** 
##                              (0.168)    (0.163)    (0.153) 
##                                                            
## -----------------------------------------------------------
## Observations                   173        130        124   
## R2                            0.727      0.790      0.841  
## Adjusted R2                   0.726      0.788      0.838  
## ===========================================================
## Note:                         *p<0.05; **p<0.01; ***p<0.001

Estimando Valores de Y

new.data1 <- data.frame(yearSchF = c(11), contracep = 50)
predict(mod.reg3, newdata = new.data1)
##        1 
## 2.841304
new.data2 <- data.frame(yearSchF = c(11,12,21), contracep = 50)
predict(mod.reg3, newdata = new.data2)
##        1        2        3 
## 2.841304 2.665027 1.078528
new.data3 <- data.frame(yearSchF = c(11), contracep = c(50, 60, 70))
predict(mod.reg3, newdata = new.data3)
##        1        2        3 
## 2.841304 2.376944 1.912584

Modelo con 3 variables independientes

Se incluye el ingreso per cápita, transformado en escala logarítmica de base 10:

mod.reg4 <- lm(tfr ~ yearSchF +  contracep + log10(incomePp), 
               data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])

summary(mod.reg4)
## 
## Call:
## lm(formula = tfr ~ yearSchF + contracep + log10(incomePp), data = dataWorld_q[dataWorld_q$quinq == 
##     "1995-1999", ])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96829 -0.55149  0.00337  0.54480  2.20363 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      8.577224   0.530364  16.172  < 2e-16 ***
## yearSchF        -0.139705   0.031024  -4.503 1.56e-05 ***
## contracep       -0.042635   0.004436  -9.610  < 2e-16 ***
## log10(incomePp) -0.509500   0.175828  -2.898  0.00447 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.714 on 120 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.851,  Adjusted R-squared:  0.8472 
## F-statistic: 228.4 on 3 and 120 DF,  p-value: < 2.2e-16
stargazer(mod.reg1, mod.reg2, mod.reg3, mod.reg4, type = "text",
          omit.stat=c("ser","f"), 
          title = "OLS para Tasa Global de Fecundidad 1995-1999",
          covariate.labels = c("Años de escolaridad mujeres", 
                               "% Uso de anticonceptivos",
                               "Ingreso per cápita (log10)"),
          dep.var.caption = "Variable dependiente:",
          dep.var.labels   = "Fecundidad",
          star.cutoffs = c(0.05, 0.01, 0.001))
## 
## OLS para Tasa Global de Fecundidad 1995-1999
## ===================================================================
##                                      Variable dependiente:         
##                             ---------------------------------------
##                                           Fecundidad               
##                                (1)       (2)       (3)       (4)   
## -------------------------------------------------------------------
## Años de escolaridad mujeres -0.435***           -0.176*** -0.140***
##                              (0.020)             (0.029)   (0.031) 
##                                                                    
## % Uso de anticonceptivos              -0.067*** -0.046*** -0.043***
##                                        (0.003)   (0.004)   (0.004) 
##                                                                    
## Ingreso per cápita (log10)                                -0.509** 
##                                                            (0.176) 
##                                                                    
## Constant                    6.808***  6.796***  7.102***  8.577*** 
##                              (0.168)   (0.163)   (0.153)   (0.530) 
##                                                                    
## -------------------------------------------------------------------
## Observations                   173       130       124       124   
## R2                            0.727     0.790     0.841     0.851  
## Adjusted R2                   0.726     0.788     0.838     0.847  
## ===================================================================
## Note:                                 *p<0.05; **p<0.01; ***p<0.001

Cálculo de coeficientes estandarizados ó coeficientes beta (\(\beta\))

library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
data95 <- subset(dataWorld_q, dataWorld_q$quinq == "1995-1999")

names(data95)
##  [1] "country"    "quinq"      "tfr"        "yearSchF"   "contracep" 
##  [6] "age1mar"    "sanitat"    "water"      "birthSkill" "childMort" 
## [11] "deathRate"  "extPov"     "famWorkFem" "femWork"    "incomePp"  
## [16] "income10p"  "gini"       "lifExpFem"  "lifExpTot"  "maleWork"  
## [21] "materMort"  "vaccMeas"   "schGenEq"   "doctor"     "teenFert"
describe(data95[, c(3,4,5)], fast=T)
##           vars   n  mean    sd  min   max range   se
## tfr          1 184  3.54  1.82 1.20  7.71  6.51 0.13
## yearSchF     2 174  7.45  3.59 0.44 13.80 13.36 0.27
## contracep    3 131 47.47 23.70 3.30 87.08 83.78 2.07

Para calcular los coeficientes estandarizados de un modelo de regresión podemos utilizar el comando lm.beta del paquete con el mismo nombre. Por ejemplo, para el modelo 3 tenemos:

library(lm.beta)

lm.beta(mod.reg3)
## 
## Call:
## lm(formula = tfr ~ yearSchF + contracep, data = dataWorld_q[dataWorld_q$quinq == 
##     "1995-1999", ])
## 
## Standardized Coefficients::
## (Intercept)    yearSchF   contracep 
##   0.0000000  -0.3502702  -0.6170146

Ejercicios