Processing math: 80%
  • 1 Introducción
  • 2 Problema
  • 3 Ajustando el modelo
  • 4 Resultados
  • 5 Resultados gráficos

1 Introducción

The brms package fits Bayesian generalized (non-)linear multivariate multilevel models using ‘Stan’ for full Bayesian inference. A wide range of distributions and link functions are supported. All parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their beliefs.

For more information please consult the webpage.

2 Problema

Vamos a simular ni=50 observaciones para G=10 grupos, es decir, en total 500 observaciones que sigan el siguiente modelo.

yij∼N(μij,σ2y)μij=4−6xij+b0iσ2y=16b0∼N(0,σ2b0=625)xij∼U(min=−5,max=6)

ni <- 50
G <- 10
nobs <- ni * G
grupo <- factor(rep(x=1:G, each=ni))
obs <- rep(x=1:ni, times=G)
set.seed(1234567)                      # Fijando la semilla
x <- runif(n=nobs, min=-5, max=6)
b0 <- rnorm(n=G, mean=0, sd=sqrt(625)) # Intercepto aleatorio
b0 <- rep(x=b0, each=ni)               # El mismo intercepto aleatorio pero repetido
media <- 4 - 6 * x + b0
set.seed(123456)                       # Fijando la semilla
y <- rnorm(n=nobs, mean=media, sd=sqrt(16))
datos <- data.frame(grupo, obs, b0, x, media, y)

Vamos ahora a mostra las primeras observaciones de la base de datos simulada.

head(datos)
##   grupo obs        b0          x      media          y
## 1     1   1 -27.21634  1.1848692 -30.325555 -26.990622
## 2     1   2 -27.21634  2.9914835 -41.165241 -42.269432
## 3     1   3 -27.21634  5.0677484 -53.622830 -55.042837
## 4     1   4 -27.21634 -4.6733619   4.823832   5.173782
## 5     1   5 -27.21634  3.4426067 -43.871980 -34.862957
## 6     1   6 -27.21634  0.2805938 -24.899902 -21.562061

A continuación se dibujan los datos simulados.

library(ggplot2)
ggplot(datos, aes(x, y, color=grupo) ) + 
  geom_point() + 
  labs(colour="Grupo/Cluster")

3 Ajustando el modelo

library(brms)

bprior1 <- prior(student_t(5,0,10), class = b) +
  prior(cauchy(0,2), class = sd)

mod <- brm(y ~ x + (1|grupo), data=datos, family=gaussian,
           prior=bprior1)
## 
## SAMPLING FOR MODEL '6b29a435592c543b59fe125b6d292592' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 1: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 4.171 seconds (Warm-up)
## Chain 1:                4.022 seconds (Sampling)
## Chain 1:                8.193 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '6b29a435592c543b59fe125b6d292592' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 2: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 5.255 seconds (Warm-up)
## Chain 2:                6.973 seconds (Sampling)
## Chain 2:                12.228 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '6b29a435592c543b59fe125b6d292592' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 3: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 4.677 seconds (Warm-up)
## Chain 3:                3.941 seconds (Sampling)
## Chain 3:                8.618 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL '6b29a435592c543b59fe125b6d292592' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  600 / 2000 [ 30%]  (Warmup)
## Chain 4: Iteration:  800 / 2000 [ 40%]  (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%]  (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 4.696 seconds (Warm-up)
## Chain 4:                4.454 seconds (Sampling)
## Chain 4:                9.15 seconds (Total)
## Chain 4:

4 Resultados

Una tabla de resumen se puede obtener así:

summary(mod)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: y ~ x + (1 | grupo) 
##    Data: datos (Number of observations: 500) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~grupo (Number of levels: 10) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)    22.02      5.60    14.07    36.49 1.01      584      986
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.62      6.37    -9.18    15.81 1.00      665      755
## x            -5.98      0.06    -6.09    -5.86 1.01     1675     1511
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     4.02      0.13     3.78     4.29 1.00     1609     1203
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

De esta tabla se observa que los valores medios de las distribuciones a posterior para cada parámetro son

ˆΘ=(ˆβ0=3.6189297,ˆβ1=−5.9787835,ˆσy=4.0231953,ˆσbo=22.0196408)⊤ y están cerca de los verdaderos valores \boldsymbol{\Theta} = (\beta_0=4, \beta_1=-6, \sigma_y=4, \sigma_{b0}=25)^\top.

5 Resultados gráficos

Para obtener las distribuciones a posteior de los parámetros se usa el siguiente código.

plot(mod)

plot(conditional_effects(mod))