Primer paso: carga de paquetes necesarios:
library(ggplot2)
library(stargazer)
Para este ejercicio utilizaremos la base de datos de indicadores sociodemográficos del mundo por quinquenios. La fuente original de los datos es el proyecto Gapminder y puede descargarse del siguiente enlace: dataWorld_q.rda
Consideraremos las siguientes variables
Primero calcularemos dos modelos de regresión simple, que luego compararemos con un modelo de regresión múltiple. Trabajaremos primero con datos del quinquenio 1995-1999:
# Carga de los datos
load("dataWorld_q.rda")
# Modelo 1 simple
mod.reg1 <- lm(tfr ~ yearSchF,
data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])
summary(mod.reg1)
##
## Call:
## lm(formula = tfr ~ yearSchF, data = dataWorld_q[dataWorld_q$quinq ==
## "1995-1999", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.49027 -0.56797 -0.03326 0.56028 2.96957
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.80769 0.16843 40.42 <2e-16 ***
## yearSchF -0.43497 0.02037 -21.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9627 on 171 degrees of freedom
## (21 observations deleted due to missingness)
## Multiple R-squared: 0.7272, Adjusted R-squared: 0.7256
## F-statistic: 455.8 on 1 and 171 DF, p-value: < 2.2e-16
# Modelo 2 simple
mod.reg2 <- lm(tfr ~ contracep,
data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])
summary(mod.reg2)
##
## Call:
## lm(formula = tfr ~ contracep, data = dataWorld_q[dataWorld_q$quinq ==
## "1995-1999", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05066 -0.49731 0.02912 0.61047 2.15326
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.795721 0.162956 41.70 <2e-16 ***
## contracep -0.067323 0.003072 -21.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8302 on 128 degrees of freedom
## (64 observations deleted due to missingness)
## Multiple R-squared: 0.7895, Adjusted R-squared: 0.7879
## F-statistic: 480.2 on 1 and 128 DF, p-value: < 2.2e-16
# Modelo 3 múltiple
mod.reg3 <- lm(tfr ~ yearSchF + contracep,
data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])
summary(mod.reg3)
##
## Call:
## lm(formula = tfr ~ yearSchF + contracep, data = dataWorld_q[dataWorld_q$quinq ==
## "1995-1999", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.71926 -0.58924 -0.00554 0.52794 2.47162
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.102160 0.153349 46.314 < 2e-16 ***
## yearSchF -0.176278 0.029194 -6.038 1.76e-08 ***
## contracep -0.046436 0.004366 -10.637 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7355 on 121 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.8405, Adjusted R-squared: 0.8379
## F-statistic: 318.9 on 2 and 121 DF, p-value: < 2.2e-16
Una forma más “elegante” de presentar los datos de los tres modelos es a través del paquete stargazer. El paquete stargazer nos permite también etiquetar las variables con nombre significativos:
stargazer(mod.reg1, mod.reg2, mod.reg3, type = "text",
omit.stat=c("ser","f"),
title = "OLS para Tasa Global de Fecundidad 1995-1999",
covariate.labels = c("Años de escolaridad mujeres",
"% Uso de anticonceptivos"),
dep.var.caption = "Variable dependiente:",
dep.var.labels = "Fecundidad",
star.cutoffs = c(0.05, 0.01, 0.001))
##
## OLS para Tasa Global de Fecundidad 1995-1999
## ===========================================================
## Variable dependiente:
## -------------------------------
## Fecundidad
## (1) (2) (3)
## -----------------------------------------------------------
## Años de escolaridad mujeres -0.435*** -0.176***
## (0.020) (0.029)
##
## % Uso de anticonceptivos -0.067*** -0.046***
## (0.003) (0.004)
##
## Constant 6.808*** 6.796*** 7.102***
## (0.168) (0.163) (0.153)
##
## -----------------------------------------------------------
## Observations 173 130 124
## R2 0.727 0.790 0.841
## Adjusted R2 0.726 0.788 0.838
## ===========================================================
## Note: *p<0.05; **p<0.01; ***p<0.001
new.data1 <- data.frame(yearSchF = c(11), contracep = 50)
predict(mod.reg3, newdata = new.data1)
## 1
## 2.841304
new.data2 <- data.frame(yearSchF = c(11,12,21), contracep = 50)
predict(mod.reg3, newdata = new.data2)
## 1 2 3
## 2.841304 2.665027 1.078528
new.data3 <- data.frame(yearSchF = c(11), contracep = c(50, 60, 70))
predict(mod.reg3, newdata = new.data3)
## 1 2 3
## 2.841304 2.376944 1.912584
Se incluye el ingreso per cápita, transformado en escala logarítmica de base 10:
mod.reg4 <- lm(tfr ~ yearSchF + contracep + log10(incomePp),
data = dataWorld_q[dataWorld_q$quinq == "1995-1999", ])
summary(mod.reg4)
##
## Call:
## lm(formula = tfr ~ yearSchF + contracep + log10(incomePp), data = dataWorld_q[dataWorld_q$quinq ==
## "1995-1999", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96829 -0.55149 0.00337 0.54480 2.20363
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.577224 0.530364 16.172 < 2e-16 ***
## yearSchF -0.139705 0.031024 -4.503 1.56e-05 ***
## contracep -0.042635 0.004436 -9.610 < 2e-16 ***
## log10(incomePp) -0.509500 0.175828 -2.898 0.00447 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.714 on 120 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.851, Adjusted R-squared: 0.8472
## F-statistic: 228.4 on 3 and 120 DF, p-value: < 2.2e-16
stargazer(mod.reg1, mod.reg2, mod.reg3, mod.reg4, type = "text",
omit.stat=c("ser","f"),
title = "OLS para Tasa Global de Fecundidad 1995-1999",
covariate.labels = c("Años de escolaridad mujeres",
"% Uso de anticonceptivos",
"Ingreso per cápita (log10)"),
dep.var.caption = "Variable dependiente:",
dep.var.labels = "Fecundidad",
star.cutoffs = c(0.05, 0.01, 0.001))
##
## OLS para Tasa Global de Fecundidad 1995-1999
## ===================================================================
## Variable dependiente:
## ---------------------------------------
## Fecundidad
## (1) (2) (3) (4)
## -------------------------------------------------------------------
## Años de escolaridad mujeres -0.435*** -0.176*** -0.140***
## (0.020) (0.029) (0.031)
##
## % Uso de anticonceptivos -0.067*** -0.046*** -0.043***
## (0.003) (0.004) (0.004)
##
## Ingreso per cápita (log10) -0.509**
## (0.176)
##
## Constant 6.808*** 6.796*** 7.102*** 8.577***
## (0.168) (0.163) (0.153) (0.530)
##
## -------------------------------------------------------------------
## Observations 173 130 124 124
## R2 0.727 0.790 0.841 0.851
## Adjusted R2 0.726 0.788 0.838 0.847
## ===================================================================
## Note: *p<0.05; **p<0.01; ***p<0.001
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
data95 <- subset(dataWorld_q, dataWorld_q$quinq == "1995-1999")
names(data95)
## [1] "country" "quinq" "tfr" "yearSchF" "contracep"
## [6] "age1mar" "sanitat" "water" "birthSkill" "childMort"
## [11] "deathRate" "extPov" "famWorkFem" "femWork" "incomePp"
## [16] "income10p" "gini" "lifExpFem" "lifExpTot" "maleWork"
## [21] "materMort" "vaccMeas" "schGenEq" "doctor" "teenFert"
describe(data95[, c(3,4,5)], fast=T)
## vars n mean sd min max range se
## tfr 1 184 3.54 1.82 1.20 7.71 6.51 0.13
## yearSchF 2 174 7.45 3.59 0.44 13.80 13.36 0.27
## contracep 3 131 47.47 23.70 3.30 87.08 83.78 2.07
Para calcular los coeficientes estandarizados de un modelo de regresión podemos utilizar el comando lm.beta del paquete con el mismo nombre. Por ejemplo, para el modelo 3 tenemos:
library(lm.beta)
lm.beta(mod.reg3)
##
## Call:
## lm(formula = tfr ~ yearSchF + contracep, data = dataWorld_q[dataWorld_q$quinq ==
## "1995-1999", ])
##
## Standardized Coefficients::
## (Intercept) yearSchF contracep
## 0.0000000 -0.3502702 -0.6170146