04/10, 2018

¿Como decidimos que modelo elegimos para explicar mejor un fenomeno?

Que es lo que no nos permite la inferencia multimodelo

  • Arreglar un estudio mal diseñado
  • Predecir en base a variables no relacionadas
  • Darnos una receta

Para que nos sirve la inferencia multimodelo

  • Seleccionar el o los modelos más parsimoniosos entre varios en competencia
    • No el que predice más
  • Nos entrega estrategia para seleccionar modelos

Que necesitamos para realizar inferencia multimodelo

  • Buen diseño de muestreo/experimental
  • Generación cuidadosa de hipótesis
  • Selección adecuada de variables para distinguir entre hipótesis

Paquete MuMIn

Multi Moldel Inference

data("mtcars")

fit1 <- glm(mpg ~ carb + cyl, data = mtcars)
fit2 <- glm(mpg ~ cyl + wt, data = mtcars)
fit3 <- glm(mpg ~ am + qsec + wt, data = mtcars)
fit4 <- glm(mpg ~ carb + cyl + wt, data = mtcars)
fit5 <- glm(mpg ~ am + carb + cyl + qsec + wt, data = mtcars)
fit6 <- glm(mpg ~ am + carb + cyl + hp + qsec, data = mtcars)

models <- list(fit1, fit2, fit3, fit4, fit5, fit6)

Paquete MuMIn (model.sel)

Select <- model.sel(models)
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.46
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.27
39.60 -0.49 -1.29 -3.16 NA NA NA 5 -72.81 157.93 1.50 0.22
20.04 -0.52 -0.46 -3.05 2.94 0.73 NA 7 -71.03 160.73 4.30 0.05
33.97 -0.80 -1.27 NA 4.19 -0.13 -0.02 7 -74.85 168.36 11.93 0.00
37.81 -0.53 -2.63 NA NA NA NA 4 -80.79 171.06 14.64 0.00
  • ¿Que modelos seleccionamos?

¿Pesos de Akaike?

  • Suman uno
  • A mayor peso, mejor modelo
  • Dependen del número de modelos
Selected <- subset(Select, delta <= 2)
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.49
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.29
39.60 -0.49 -1.29 -3.16 NA NA NA 5 -72.81 157.93 1.50 0.23
(Intercept) carb cyl wt am qsec hp df logLik AICc delta weight
9.62 NA NA -3.92 2.94 1.23 NA 5 -72.06 156.43 0.00 0.63
39.69 NA -1.51 -3.19 NA NA NA 4 -74.01 157.49 1.06 0.37

¿Hemos seleccionado 3 modelos ahora que?

  • Seleccionar el mejor modelo
BestModel <- get.models(Select, 1)[[1]]
broom::glance(BestModel)
null.deviance df.null logLik AIC BIC deviance df.residual
1126.05 31 -72.06 154.12 161.45 169.29 28

Mejor modelo

broom::tidy(BestModel)
term estimate std.error statistic p.value
(Intercept) 9.62 6.96 1.38 0.18
am 2.94 1.41 2.08 0.05
qsec 1.23 0.29 4.25 0.00
wt -3.92 0.71 -5.51 0.00
  • ¿p no es significativo?
    • ¡¡¡¡No importa!!!!

Promediar modelos

Promediar modelos usando MuMIn

  • Trabajemos con el numero de cilindros
S <- as.data.frame(Selected)
S <- as.data.frame(Selected) %>% select(cyl, weight)
cyl weight
3 NA 0.4854123
2 -1.51 0.2850763
4 -1.29 0.2295115
  • Dos métodos full y subset

Metodo full

\[\hat{\theta} = \sum_{i=1}^R w_i \times \theta_i\]

S_full <- S
S_full$cyl <- ifelse(is.na(S_full$cyl), 0, S_full$cyl)
S_full <- S_full %>% mutate(Theta_i = cyl * weight)
cyl weight Theta_i
0.00 0.4854123 0.0000000
-1.51 0.2850763 -0.4298366
-1.29 0.2295115 -0.2960210
Cyl_hat <- sum(S_full$Theta_i)
## [1] -0.7258576

Método subset

\[\hat{\theta} = \frac{\sum_{i=1}^Rw_i \times \theta_i}{\sum_{i=1}^Rw_i}\]

S_sub <- S %>% filter(!is.na(cyl))

S_sub <- S_sub %>% mutate(Theta_i = cyl * weight)
cyl weight Theta_i
-1.51 0.2850763 -0.4298366
-1.29 0.2295115 -0.2960210
Cyl_hat <- sum(S_sub$Theta_i)/sum(S_sub$weight)
## [1] -1.410561

MuMIn

AverageModel <- model.avg(Select, subset = delta < 2, fit = T)
AverageModel$coefficients
(Intercept) am qsec wt cyl carb
full 25.07 1.43 0.60 -3.54 -0.73 -0.11
subset 25.07 2.94 1.23 -3.54 -1.41 -0.49
AverageModel$importance
Importance
wt 1.0000000
cyl 0.5145877
am 0.4854123
qsec 0.4854123
carb 0.2295115

Comparemos modelos

Comparemos modelos (cont.)

Discusión artículo

GPA y multiolinearidad

GPA MathSAT VerbalSAT HSmath HSEnglish
GPA 1.00 0.85 0.65 0.69 0.61
MathSAT 0.85 1.00 0.46 0.56 0.66
VerbalSAT 0.65 0.46 1.00 0.43 0.42
HSmath 0.69 0.56 0.43 1.00 0.27
HSEnglish 0.61 0.66 0.42 0.27 1.00

GPA y multiolinearidad (cont)

(Intercept) HSEnglish HSmath MathSAT VerbalSAT df logLik AICc delta weight
0.3342 NA 0.1799 0.0022 0.0013 5 0.6323 13.0211 0.0000 0.4554
0.5071 NA NA 0.0026 0.0016 4 -1.7029 14.0724 1.0513 0.2692
0.6438 NA 0.2331 0.0025 NA 4 -2.6622 15.9911 2.9700 0.1031
0.1615 0.0876 0.1894 0.0020 0.0013 6 0.7951 16.8714 3.8503 0.0664
0.4863 0.0111 NA 0.0026 0.0016 5 -1.7007 17.6871 4.6660 0.0442
0.2690 0.1756 0.2474 0.0021 NA 5 -2.1589 18.6035 5.5824 0.0279
0.9670 NA NA 0.0032 NA 3 -5.5719 18.6437 5.6227 0.0274
0.7670 0.0989 NA 0.0030 NA 4 -5.4517 21.5701 8.5490 0.0063
(Intercept) HSmath MathSAT VerbalSAT HSEnglish
full 0.4241 0.1262 0.0024 0.0012 0.0132
subset 0.4241 0.1930 0.0024 0.0014 0.0895

Todos vs Delta 2

Todos los modelos
(Intercept) HSmath MathSAT VerbalSAT HSEnglish
full 0.4241 0.1262 0.0024 0.0012 0.0132
subset 0.4241 0.1930 0.0024 0.0014 0.0895
Delta 2
(Intercept) HSmath MathSAT VerbalSAT
full 0.3985 0.1130 0.0023 0.0014
subset 0.3985 0.1799 0.0023 0.0014