Introducción a Stan: Diagnósticos y Jerarquías.

Este caso nos servirá para introducir el ambiente de Stan (Carpenter et al. 2017) con el cual simularemos realizaciones de parámetros para su uso en inferencia bayesiana. Para este propósito utilizaremos los datos de un estudio de desempeño de 8 escuelas (Rubin 1981; Gelman et al. 2013). Los datos consisten en el puntaje promedio de cada escuela y y los errores estándar reportados sigma.

## # A tibble: 8 x 3
##   id        y sigma
##   <fct> <dbl> <dbl>
## 1 1        28    15
## 2 2         8    10
## 3 3        -3    16
## 4 4         7    11
## 5 5        -1     9
## 6 6         1    11
## 7 7        18    10
## 8 8        12    18

En este caso se utiliza un modelo normal para los resultados de cada escuela

\[y_j \sim \mathsf{N}(\theta_j, \sigma_j) \qquad j = 1, \ldots, J\,,\] donde \(J = 8,\) y \(\theta_j\) representa el promedio de los alumnos de escuela que no observamos pero del cual tenemos un estimador \(y_j.\)

Nota que tenemos \(J\) valores distintos para \(\theta_j\) y \(\sigma_j.\) Dado que esperamos que las escuelas provengan de la misma población de escuelas asumimos que

\[ \theta_j \sim \mathsf{N}(\mu, \tau) \qquad j = 1, \ldots, J\,,\]

donde \(\mu\) representa la media poblacional (el promedio en el sistema escolar) y \(\tau\) la desviación estándar alrededor de este valor. Representamos nuestra incertidumbre en estos dos valores por medio de

\[ \mu \sim \mathsf{N}(0, 5) \qquad \tau \sim \textsf{Half-Cauchy}(0,5)\,, \]

lo cual representa información poco precisa de estos valores poblacionales.

Primer modelo en `Stan`

La forma en que escribimos el modelo en Stan es de manera generativa (bottom up): \[ \begin{align*} \mu &\sim \mathsf{N}(0, 5) \,,\\ \tau &\sim \textsf{Half-Cauchy}(0,5) \,,\\ \theta_j &\sim \mathsf{N}(\mu, \tau) \qquad j = 1, \ldots, J \,,\\ y_j &\sim \mathsf{N}(\theta_j, \sigma_j) \qquad j = 1, \ldots, J\,. \end{align*} \]

Un modelo de Stan se escribe en un archivo de texto y es una secuencia de bloques con nombre. En general el esqueleto es como sigue:

## functions {
##     // ... function declarations and definitions ...
## }
## data {
##     // ... declarations ...
## }
## transformed data {
##     // ... declarations ... statements ...
## }
## parameters {
##     // ... declarations ...
## }
## transformed parameters {
##     // ... declarations ... statements ...
## }
## model {
##     // ... declarations ... statements ...
## }
## generated quantities {
##     // ... declarations ... statements ...
## }

En general todos los bloques son opcionales, y no es necesario tener todos para compilar un modelo. Para mas información puedes consultar la guía.

Por ejemplo, el codigo de nuestro modelo para las escuelas es:

## data {
##   int<lower=0> J;
##   real y[J];
##   real<lower=0> sigma[J];
## }
## 
## parameters {
##   real mu;
##   real<lower=0> tau;
##   real theta[J];
## }
## 
## model {
##   mu ~ normal(0, 5);
##   tau ~ cauchy(0, 5);
##   theta ~ normal(mu, tau);
##   y ~ normal(theta, sigma);
## }

Nota que sigma está definida como parte del conjunto de datos que el usuario debe de proveer. Aunque es un parámetro en nuestro modelo (verosimilitud) no está sujeto al proceso de inferencia. Por otro lado, nota que la declaración no se hace de manera puntual componente por componente, sino de forma vectorizada.

Una vez escrito nuestro modelo, lo podemos compilar utilizando la librería de cmdstanr, que es la interface con Stan desde R.

modelos_files <- "/"
ruta <- file.path("modelos/modelo-escuelas.stan")
modelo <- cmdstan_model(ruta, dir = modelos_files)
# str(modelo)

Los datos que necesita el bloque data se pasan como una lista con nombres.

data_list <- c(data, J = 8)
data_list

## $id
## [1] 1 2 3 4 5 6 7 8
## Levels: 1 2 3 4 5 6 7 8
## 
## $y
## [1] 28  8 -3  7 -1  1 18 12
## 
## $sigma
## [1] 15 10 16 11  9 11 10 18
## 
## $J
## [1] 8

Primera cadena de Markov

Contra todas las recomendaciones usuales, corramos sólo una cadena corta.

muestras <- modelo$sample(data = data_list, 
                          chains = 1, 
                          iter=700, 
                          iter_warmup=500, 
                          seed=483892929, 
                          refresh=1200)

## Running MCMC with 1 chain...
## 
## Chain 1 Iteration:    1 / 1200 [  0%]  (Warmup) 
## Chain 1 Iteration:  501 / 1200 [ 41%]  (Sampling) 
## Chain 1 Iteration: 1200 / 1200 [100%]  (Sampling) 
## Chain 1 finished in 0.1 seconds.

## 
## Warning: 16 of 700 (2.0%) transitions ended with a divergence.
## This may indicate insufficient exploration of the posterior distribution.
## Possible remedies include: 
##   * Increasing adapt_delta closer to 1 (default is 0.8) 
##   * Reparameterizing the model (e.g. using a non-centered parameterization)
##   * Using informative or weakly informative prior distributions

El muestreador en automático nos regresa ciertas alertas las cuales podemos inspeccionar más a fondo con el siguiente comando:

muestras$cmdstan_diagnose()

## Processing csv files: /tmp/RtmpzXvgVf/modelo-escuelas-202102172330-1-5235e2.csv
## 
## Checking sampler transitions treedepth.
## Treedepth satisfactory for all transitions.
## 
## Checking sampler transitions for divergences.
## 16 of 700 (2.3%) transitions ended with a divergence.
## These divergent transitions indicate that HMC is not fully able to explore the posterior distribution.
## Try increasing adapt delta closer to 1.
## If this doesn't remove all divergences, try to reparameterize the model.
## 
## Checking E-BFMI - sampler transitions HMC potential energy.
## The E-BFMI, 0.25, is below the nominal threshold of 0.3 which suggests that HMC may have trouble exploring the target distribution.
## If possible, try to reparameterize the model.
## 
## Effective sample size satisfactory.
## 
## Split R-hat values satisfactory all parameters.
## 
## Processing complete.

Notamos que parece ser que tenemos varias transiciones divergentes, algunos parámetros tienen una \(\hat R\) tienen un valor que excede la referencia de 1.1, y parece ser que los estadisticos de energía también presentan problemas.

Podemos inspeccionar el resultado de las simulaciones utilizando

muestras$cmdstan_summary()

## Inference for Stan model: modelo_escuelas_model
## 1 chains: each with iter=(700); warmup=(0); thin=(1); 700 iterations saved.
## 
## Warmup took 0.025 seconds
## Sampling took 0.032 seconds
## 
##                  Mean     MCSE   StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
## 
## lp__              -14      1.0      5.9    -24   -14  -5.0       33     1017      1.0
## accept_stat__    0.81  4.7e-02  2.9e-01  0.035  0.96  1.00  3.9e+01  1.2e+03  1.0e+00
## stepsize__       0.13      nan  1.7e-16   0.13  0.13  0.13      nan      nan      nan
## treedepth__       3.7  1.4e-01  9.3e-01    2.0   4.0   5.0  4.5e+01  1.4e+03  1.0e+00
## n_leapfrog__       18  1.4e+00  1.1e+01    3.0    15    31  6.4e+01  2.0e+03  1.0e+00
## divergent__     0.023  1.4e-02  1.5e-01   0.00  0.00  0.00  1.2e+02  3.8e+03  1.0e+00
## energy__           19  1.1e+00  6.4e+00    9.3    19    30  3.4e+01  1.1e+03  1.0e+00
## 
## mu                4.1     0.45      3.1   -1.1   4.2   9.1       46     1443      1.0
## tau               3.4     0.47      2.9   0.74   2.5   9.4       40     1244      1.0
## theta[1]          5.6     0.54      5.5   -1.3   4.9    15      102     3194     1.00
## theta[2]          4.5     0.45      4.8   -2.9   4.3    12      114     3576      1.0
## theta[3]          3.7     0.36      5.3   -4.3   3.9    12      212     6631     1.00
## theta[4]          4.5     0.50      4.4   -2.3   4.2    12       78     2436      1.0
## theta[5]          3.5     0.41      4.1   -2.9   3.7    10       98     3074     1.00
## theta[6]          3.9     0.43      4.4   -3.5   3.8    11      106     3314     1.00
## theta[7]          5.9     0.60      4.9   -1.0   5.5    15       68     2111      1.0
## theta[8]          4.3     0.46      5.1   -3.2   3.9    13      123     3854     1.00
## 
## Samples were drawn using hmc with nuts.
## For each parameter, N_Eff is a crude measure of effective sample size,
## and R_hat is the potential scale reduction factor on split chains (at 
## convergence, R_hat=1).

Donde además de los resúmenes usuales para nuestros parámetros de interes encontramos resúmenes internos del simulador.

Podemos utilizar las funciones que RStan (otra interfase con Stan desde R) para visualizar los resúmenes de manera alternativa.

stanfit <- rstan::read_stan_csv(muestras$output_files())
stanfit

## Inference for Stan model: modelo-escuelas-202102172330-1-5235e2.
## 1 chains, each with iter=1200; warmup=500; thin=1; 
## post-warmup draws per chain=700, total post-warmup draws=700.
## 
##            mean se_mean   sd   2.5%    25%    50%   75% 97.5% n_eff Rhat
## mu         4.11    0.46 3.08  -2.20   2.22   4.14  6.21  9.73    45 1.00
## tau        3.44    0.47 2.93   0.56   1.38   2.50  4.47 11.29    39 1.02
## theta[1]   5.63    0.54 5.48  -3.59   2.05   4.92  8.29 18.66   102 1.00
## theta[2]   4.46    0.45 4.78  -3.77   1.40   4.28  7.26 15.23   114 1.00
## theta[3]   3.67    0.36 5.28  -9.59   0.93   3.88  6.62 13.48   211 1.00
## theta[4]   4.50    0.51 4.43  -2.98   1.64   4.23  7.19 14.26    77 1.00
## theta[5]   3.48    0.42 4.09  -5.28   0.80   3.73  6.02 11.05    97 1.00
## theta[6]   3.88    0.43 4.41  -4.71   1.33   3.77  6.52 12.65   105 1.00
## theta[7]   5.95    0.60 4.90  -2.99   2.45   5.50  8.44 17.37    67 1.00
## theta[8]   4.32    0.47 5.14  -4.20   1.08   3.91  7.00 14.76   122 1.00
## lp__     -14.17    1.04 5.89 -25.55 -18.11 -13.86 -9.58 -4.67    32 1.03
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 23:30:53 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

De manera gráfica podemos explorar el factor de reducción utilizando la librería bayesplot.

rhats <- rhat(stanfit)
mcmc_rhat(rhats) + yaxis_text(hjust = 1) + sin_lineas

También podríamos explorar el estimador de tamaño efectivo de muestra de manera gráfica.

neff <- neff_ratio(stanfit, pars = c("theta", "mu", "tau"))
mcmc_neff(neff) +  yaxis_text(hjust = 1) + sin_lineas

En caso de necesitarlo podemos extraer las muestras en una tabla para poder procesarlas y generar visualizaciones. Por ejemplo, un gráfico de dispersión con \(\tau\) que es el parámetro donde más problemas parecemos tener.

muestras_dt <- tibble(posterior::as_draws_df(muestras$draws(c("tau", "theta"))))

g_tau <- muestras_dt %>% 
   ggplot(aes(x = .iteration, y = log(tau))) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(-4, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

g_theta <- muestras_dt %>% 
   ggplot(aes(x = .iteration, y =`theta[1]`)) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    geom_hline(yintercept = 0.7657852, lty = 2)


g_tau /g_theta

Claramente no podemos afirmar que el muestreador está explorando bien la posterior. Hay correlaciones muy altas. Si usáramos la media acumulada no seríamos capaces de diagnosticar estos problemas.

muestras_dt %>% 
   mutate(media = cummean(log(tau))) %>% 
   ggplot(aes(x = .iteration, y = media)) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(-4, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

Utilizar gráficos de dispersión bivariados nos ayuda a identificar mejor el problema En color salmón apuntamos las muestras con transiciones divergentes.

g1_dispersion <- muestras_dt %>% 
  mutate(log_tau = log(tau)) %>% 
  mcmc_scatter(
  pars = c("theta[1]", "log_tau"),
  np = nuts_params(stanfit),
  np_style = scatter_style_np(div_color = "salmon", div_alpha = 0.8)
) + sin_lineas+ ylim(-1, 4)

g1_dispersion

Otra visualización muy conocida es la de coordenadas paralelas. En este tipo de gråficos podemos observar de manera simultánea ciertos patrones en todos los componentes.

posterior_cp <- as.array(stanfit)

mcmc_parcoord(posterior_cp, 
              transform = list(tau = "log"),
              np = nuts_params(stanfit), 
              np_style = scatter_style_np(div_color = "salmon", 
                                          div_alpha = 0.5, 
                                          div_size = .5)) + 
  sin_lineas

acf_theta <- mcmc_acf(posterior_cp, pars = "theta[1]", lags = 10) + sin_lineas
acf_tau   <- mcmc_acf(posterior_cp, pars = "tau", lags = 10) + sin_lineas

acf_tau / acf_theta

Generando mas simulaciones

Hasta ahora los resultados parecen no ser buenos. Tenemos muestras con transiciones divergentes y una correlacion muy alta entre las muestras. Podríamos aumentar el número de simulaciones con la esperanza que esto permita una mejor exploracion de la posterior.

muestras <- modelo$sample(data        = data_list, 
                          chains      = 1, 
                          iter        = 5000, 
                          iter_warmup = 5000, 
                          seed        = 483892929, 
                          refresh     = 10000)

## Running MCMC with 1 chain...
## 
## Chain 1 Iteration:    1 / 10000 [  0%]  (Warmup) 
## Chain 1 Iteration: 5001 / 10000 [ 50%]  (Sampling) 
## Chain 1 Iteration: 10000 / 10000 [100%]  (Sampling) 
## Chain 1 finished in 0.4 seconds.

stanfit <- rstan::read_stan_csv(muestras$output_files())
stanfit

## Inference for Stan model: modelo-escuelas-202102172330-1-3ecc11.
## 1 chains, each with iter=10000; warmup=5000; thin=1; 
## post-warmup draws per chain=5000, total post-warmup draws=5000.
## 
##            mean se_mean   sd   2.5%    25%    50%    75% 97.5% n_eff Rhat
## mu         4.34    0.11 3.19  -2.18   2.26   4.44   6.43 10.64   801    1
## tau        4.25    0.13 3.08   0.87   2.04   3.47   5.64 12.23   560    1
## theta[1]   6.52    0.15 5.86  -3.32   2.83   5.97   9.35 20.88  1433    1
## theta[2]   4.99    0.12 4.83  -4.51   1.93   4.86   7.86 15.06  1539    1
## theta[3]   3.78    0.14 5.59  -8.40   0.82   4.08   7.17 14.18  1665    1
## theta[4]   4.77    0.12 5.10  -5.34   1.69   4.74   7.70 15.10  1739    1
## theta[5]   3.33    0.13 4.77  -7.03   0.62   3.61   6.40 12.14  1388    1
## theta[6]   3.97    0.13 4.97  -6.69   1.16   4.13   7.05 13.59  1573    1
## theta[7]   6.70    0.14 5.33  -2.30   3.20   6.12   9.35 19.16  1373    1
## theta[8]   4.86    0.12 5.29  -5.67   1.73   4.85   7.95 15.86  1849    1
## lp__     -16.27    0.29 5.53 -26.79 -20.07 -16.44 -12.53 -5.36   364    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 23:30:57 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

rhats <- rhat(stanfit)
mcmc_rhat(rhats) + yaxis_text(hjust = 1) + sin_lineas

muestras_dt <- tibble(posterior::as_draws_df(muestras$draws(c("tau", "theta[1]"))))

muestras_dt %>% 
   ggplot(aes(x = .iteration, y = log(tau))) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(-4, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

Como vemos, seguimos teniendo problemas con la exploración del espacio y tenemos dificultades en explorar esa zona con \(\tau\) pequeña Lo confirmamos en la siguiente gráfica.

g2_dispersion <- muestras_dt %>% 
  mutate(log_tau = log(tau)) %>% 
  mcmc_scatter(
  pars = c("theta[1]", "log_tau"),
  np = nuts_params(stanfit),
  np_style = scatter_style_np(div_color = "salmon", div_alpha = 0.8)) + 
  sin_lineas+ ylim(-6, 3)

g2_dispersion

Confirmamos que seguimos con dificultades en el embudo de la distribución. Visualizaciones gráficas como la siguiente no nos permiten identificar dichos problemas.

muestras_dt %>% 
   mutate(media = cummean(log(tau))) %>% 
   ggplot(aes(x = .iteration, y = media)) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(0, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

Podriamos correr una cadena con algunas opciones que permitan la exploracion mas segura de la distribución.

muestras <- modelo$sample(data        = data_list, 
                          chains      = 1, 
                          iter        = 5000, 
                          iter_warmup = 5000, 
                          seed        = 483892929, 
                          refresh     = 10000, 
                          adapt_delta = .90)

## Running MCMC with 1 chain...
## 
## Chain 1 Iteration:    1 / 10000 [  0%]  (Warmup) 
## Chain 1 Iteration: 5001 / 10000 [ 50%]  (Sampling) 
## Chain 1 Iteration: 10000 / 10000 [100%]  (Sampling) 
## Chain 1 finished in 0.5 seconds.

## 
## Warning: 115 of 5000 (2.0%) transitions ended with a divergence.
## This may indicate insufficient exploration of the posterior distribution.
## Possible remedies include: 
##   * Increasing adapt_delta closer to 1 (default is 0.8) 
##   * Reparameterizing the model (e.g. using a non-centered parameterization)
##   * Using informative or weakly informative prior distributions

muestras_dt <- tibble(posterior::as_draws_df(muestras$draws(c("tau", "theta[1]"))))
stanfit <- rstan::read_stan_csv(muestras$output_files())

muestras_dt %>% 
   ggplot(aes(x = .iteration, y = log(tau))) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(-4, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

g2_dispersion_90 <- muestras_dt %>% 
  mutate(log_tau = log(tau)) %>% 
  mcmc_scatter(
  pars = c("theta[1]", "log_tau"),
  np = nuts_params(stanfit),
  np_style = scatter_style_np(div_color = "salmon", div_alpha = 0.8)) + 
  sin_lineas + ylim(-6, 3)

g2_dispersion + g2_dispersion_90

Reparametrizando el modelo

Tener cuidado en la simulación del sistema Hamiltoniano nos ayuda hasta cierto punto. Seguimos teniendo problemas y no hay garantías que nuestra simulación y nuestrso estimadores Monte Carlo no estén sesgados.

Esta situación es muy común en modelos jerárquicos. El problema es la geometría de la distribución posterior. La ventaja es que existe una solución sencilla para hacer el problema de muestreo mas sencillo. Esto es al escribir el modelo en términos de una variable auxiliar: \[ \begin{align*} \mu &\sim \mathsf{N}(0, 5) \,,\\ \tau &\sim \textsf{Half-Cauchy}(0,5) \,,\\ \tilde{\theta}_j &\sim \mathsf{N}(0, 1), \qquad \quad j = 1, \ldots, J \,,\\ \theta_j &= \mu + \tau \cdot \tilde{\theta}_j\qquad j = 1, \ldots, J \,,\\ y_j &\sim \mathsf{N}(\theta_j, \sigma_j) \qquad j = 1, \ldots, J\,. \end{align*} \]

El modelo en Stan es muy parecido. La nomenclatura que se utiliza es: modelo centrado para el primero, y para la reparametrización presentada en la ecuacion de arriba nos refereimos a un modelo no centrado. Nota que la definición de nuevos parametros se hace desde el bloque transformed parameters en donde la asignación se ejecuta componente por componente mientras que la definición del modelo de probabilidad conjunto se puede hacer de manera vectorizada.

## data {
##   int<lower=0> J;
##   real y[J];
##   real<lower=0> sigma[J];
## }
## 
## parameters {
##   real mu;
##   real<lower=0> tau;
##   real theta_tilde[J];
## }
## 
## transformed parameters {
##   real theta[J];
##   for (j in 1:J)
##     theta[j] = mu + tau * theta_tilde[j];
## }
## 
## model {
##   mu ~ normal(0, 5);
##   tau ~ cauchy(0, 5);
##   theta_tilde ~ normal(0, 1);
##   y ~ normal(theta, sigma);
## }

Igual que antes lo necesitamos compilar para hacerlo un objeto ejecutable desde R.

ruta_ncp <- file.path("modelos/modelo-escuelas-ncp.stan")
modelo_ncp <- cmdstan_model(ruta_ncp, dir = modelos_files)

Muestreamos de la posterior

muestras_ncp <- modelo_ncp$sample(data = data_list, 
                          chains = 1, 
                          iter=5000, 
                          iter_warmup=5000, 
                          seed=483892929, 
                          refresh=10000)

## Running MCMC with 1 chain...
## 
## Chain 1 Iteration:    1 / 10000 [  0%]  (Warmup) 
## Chain 1 Iteration: 5001 / 10000 [ 50%]  (Sampling) 
## Chain 1 Iteration: 10000 / 10000 [100%]  (Sampling) 
## Chain 1 finished in 0.3 seconds.

stanfit_ncp <- rstan::read_stan_csv(muestras_ncp$output_files())
stanfit_ncp

## Inference for Stan model: modelo-escuelas-ncp-202102172331-1-2b19d6.
## 1 chains, each with iter=10000; warmup=5000; thin=1; 
## post-warmup draws per chain=5000, total post-warmup draws=5000.
## 
##                 mean se_mean   sd   2.5%   25%   50%   75% 97.5% n_eff Rhat
## mu              4.47    0.04 3.28  -1.96  2.17  4.49  6.71 10.81  5327    1
## tau             3.51    0.05 3.12   0.13  1.22  2.65  4.90 11.69  3589    1
## theta_tilde[1]  0.30    0.01 0.98  -1.69 -0.36  0.31  0.97  2.17  5375    1
## theta_tilde[2]  0.09    0.01 0.94  -1.76 -0.54  0.11  0.73  1.89  5526    1
## theta_tilde[3] -0.09    0.01 0.98  -1.95 -0.76 -0.09  0.59  1.87  5129    1
## theta_tilde[4]  0.07    0.01 0.96  -1.86 -0.58  0.08  0.73  1.95  5295    1
## theta_tilde[5] -0.14    0.01 0.94  -1.90 -0.78 -0.16  0.50  1.70  4853    1
## theta_tilde[6] -0.08    0.01 0.95  -2.00 -0.69 -0.09  0.55  1.80  5928    1
## theta_tilde[7]  0.34    0.01 0.96  -1.56 -0.29  0.36  0.99  2.17  4929    1
## theta_tilde[8]  0.08    0.01 0.98  -1.83 -0.57  0.09  0.74  2.02  5580    1
## theta[1]        6.18    0.08 5.53  -3.25  2.77  5.62  8.92 19.80  4738    1
## theta[2]        4.97    0.06 4.64  -3.67  2.02  4.82  7.78 14.71  5729    1
## theta[3]        3.92    0.08 5.23  -7.30  1.08  4.05  7.13 13.78  4856    1
## theta[4]        4.84    0.07 4.87  -4.47  1.85  4.74  7.70 14.63  5496    1
## theta[5]        3.73    0.06 4.69  -6.68  1.10  3.98  6.62 12.26  5740    1
## theta[6]        4.04    0.06 4.77  -6.18  1.23  4.21  7.07 12.85  5713    1
## theta[7]        6.32    0.07 5.08  -1.96  2.94  5.71  8.95 18.81  5037    1
## theta[8]        4.93    0.07 5.24  -5.41  1.82  4.79  7.81 15.85  5486    1
## lp__           -7.00    0.05 2.36 -12.72 -8.32 -6.68 -5.29 -3.39  2230    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Feb 17 23:31:01 2021.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Si graficamos la dispersión de \(\tau\) (\(\log \tau\)), vemos un mejor comportamiento (del cual ya teniamos indicios por los diagnósticos del modelo).

muestras_dt <- tibble(posterior::as_draws_df(muestras_ncp$draws(c("tau", "theta[1]", "theta_tilde[1]"))))

muestras_dt %>% 
   ggplot(aes(x = .iteration, y = log(tau))) + 
    geom_point() + sin_lineas + 
    xlab("Iteraciones") + 
    ylim(-4, 4) + 
    geom_hline(yintercept = 0.7657852, lty = 2)

Si regresamos a los gráficos de dispersión para verificar que se hayan resuelto los problemas observamos lo siguiente:

muestras_dt %>% 
  mutate(log_tau = log(tau)) %>% 
  mcmc_scatter(
  pars = c("theta_tilde[1]", "log_tau"),
  np = nuts_params(stanfit_ncp),
  np_style = scatter_style_np(div_color = "salmon", div_alpha = 0.8)) + 
  sin_lineas + ylim(-6, 3)

g3_dispersion <- muestras_dt %>% 
  mutate(log_tau = log(tau)) %>% 
  mcmc_scatter(
  pars = c("theta[1]", "log_tau"),
  np = nuts_params(stanfit_ncp),
  np_style = scatter_style_np(div_color = "salmon", div_alpha = 0.8)) + 
  sin_lineas + ylim(-6, 3)

g3_dispersion

Que podemos comparar con lo que teniamos antes:

g2_dispersion + g2_dispersion_90 + g3_dispersion

Y muestra la limitante que tenía el modelo inicial para muestrear de la posterior por cuestiones de la geometría del problema original.

Por último, podemos observar las diferencias en el número efectivo de simulaciones las cuales mejorar considerablemente al cambiar la forma de escribir el modelo.

neff_cp <- neff_ratio(stanfit_cp, pars = c("theta", "mu", "tau"))
neff_ncp <- neff_ratio(stanfit_ncp, pars = c("theta", "mu", "tau"))
g_cp <- mcmc_neff(neff_cp) + ggtitle("Modelo Centrado") + 
  sin_lineas + yaxis_text(hjust = 1) 
g_ncp <- mcmc_neff(neff_ncp) + ggtitle("Modelo No Centrado") + 
  sin_lineas + yaxis_text(hjust = 1) 

g_cp / g_ncp

Referencias

Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).

Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian Data Analysis. CRC Press.

Rubin, Donald B. 1981. “Estimation in Parallel Randomized Experiments.” Journal of Educational Statistics 6 (4): 377–401.