Estimación de parámetros de una población normal con LaplacesDemon

Introducción
Problema
Los datos
Distribuciones a priori
Definiendo el modelo
Corriendo el modelo
Medias de las distribuciones a posteriori
Algunos gráficos

Introducción

LaplacesDemon is an open-source statistical package that is intended to provide a complete environment for Bayesian inference. LaplacesDemon has been used in numerous fields. The user writes their own model specification function and selects a numerical approximation algorithm to update their Bayesian model. Some numerical approximation families of algorithms include Laplace’s method (Laplace approximation), numerical integration (iterative quadrature), Markov chain Monte Carlo (MCMC), and Variational Bayes.

For more information please consult the webpage.

Problema

El objetivo es estimar la media y la desviación de una población normal usando una muestra aleatoria e información a priori de los parámetros.

Los datos

Vamos a simular unos datos de una población con parámetros conocidos. El parámetro \(\tau\) se llama parámetro precisión y se define como \(\tau=1/\sigma\).

true_mean <- 15
true_sd <- 7
true_tau <- 1 / true_sd

set.seed(666)  # Semilla del diablo
y <- rnorm(n=100, mean=true_mean, sd=true_sd)
hist(y, freq=FALSE)

Distribuciones a priori

Vamos a usar las siguientes dos a priori para \(\mu\) y \(\tau\) respectivamente.

# Hyperparameters for mu
mu0 <- 0
sd0 <- 6

curve(dnorm(x, mean=mu0, sd=sd0), from=-20, to=20, las=1, lwd=3,
      main='Prior to mu', ylab='Density', xla='mu')

# Hyperparameters for tau
df0 <- 1
scale0 <- 1

library(LaplacesDemon)
curve(dinvchisq(x, df=df0, scale=scale0), from=0.01, to=15, las=1, lwd=3,
      xlim=c(-3, 15),
      main='Prior to tau', ylab='Density', xlab='tau')

Definiendo el modelo

# Model function
Model <- function(parm, Data) {
  mu <- parm[1]
  tau <- exp(parm[2])
  loglik <- sum(dnorm(Data$y, mean=mu, sd=1/tau, log=T))
  postlik <- loglik + dnorm(mu, mean=mu0, sd=sd0, log = T)
  postlik <- postlik + dinvchisq(tau, df=df0, scale=scale0, log=T)
  Modelout<-list(LP=postlik,
                 Dev= -2 * loglik,
                 Monitor=c(postlik, loglik), 
                 yhat=rnorm(n=length(Data$y), mean=mu, sd=1/tau),
                 parm=parm)
  return(Modelout)
}

# Data object
MyData <- list(N = length(y), 
               mon.names = c("postlik", "loglik"),
               parm.names = c("mu", "tau"), 
               y=y)

# Initial values for the chain
Initial.Values <- c(0, 1)

Corriendo el modelo

library(LaplacesDemon)
N <- 10000

fit <- LaplacesDemon(Model=Model, Data = MyData,
                     Iterations = N, Status=1000, 
                     Algorithm = "HARM", Thinning = 1,
                     Initial.Values=Initial.Values)

## 
## Laplace's Demon was called on Thu Mar 05 11:22:09 2020
## 
## Performing initial checks...
## Algorithm: Hit-And-Run Metropolis 
## 
## Laplace's Demon is beginning to update...
## Iteration: 1000Iteration: 1000,   Proposal: Multivariate,   LP: -347
## Iteration: 2000Iteration: 2000,   Proposal: Multivariate,   LP: -347.2
## Iteration: 3000Iteration: 3000,   Proposal: Multivariate,   LP: -346.1
## Iteration: 4000Iteration: 4000,   Proposal: Multivariate,   LP: -346.6
## Iteration: 5000Iteration: 5000,   Proposal: Multivariate,   LP: -348.1
## Iteration: 6000Iteration: 6000,   Proposal: Multivariate,   LP: -346.6
## Iteration: 7000Iteration: 7000,   Proposal: Multivariate,   LP: -347.5
## Iteration: 8000Iteration: 8000,   Proposal: Multivariate,   LP: -346.1
## Iteration: 9000Iteration: 9000,   Proposal: Multivariate,   LP: -346.1
## Iteration: 10000Iteration: 10000,   Proposal: Multivariate,   LP: -348.3
## 
## Assessing Stationarity
## Assessing Thinning and ESS
## Creating Summaries
## Estimating Log of the Marginal Likelihood
## Creating Output
## 
## Laplace's Demon has finished.

Para ver un resumen del modelo usamos la siguiente instrucción. En esa salida aparecerá información valiosa.

Consort(fit)

## 
## #############################################################
## # Consort with Laplace's Demon                              #
## #############################################################
## Call:
## LaplacesDemon(Model = Model, Data = MyData, Initial.Values = Initial.Values, 
##     Iterations = N, Status = 1000, Thinning = 1, Algorithm = "HARM")
## 
## Acceptance Rate: 0.2399
## Algorithm: Hit-And-Run Metropolis
## Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
##         mu        tau 
## 3.37584682 0.02120749 
## 
## Covariance (Diagonal) History: (NOT SHOWN HERE)
## Deviance Information Criterion (DIC):
##             All Stationary
## Dbar      824.4      824.4
## pD   10976408.5 10976408.5
## DIC  10977232.9 10977232.9
## Initial Values:
## [1] 0 1
## 
## Iterations: 10000
## Log(Marginal Likelihood): NA
## Minutes of run-time: 0.01
## Model: (NOT SHOWN HERE)
## Monitor: (NOT SHOWN HERE)
## Parameters (Number of): 2
## Posterior1: (NOT SHOWN HERE)
## Posterior2: (NOT SHOWN HERE)
## Recommended Burn-In of Thinned Samples: 0
## Recommended Burn-In of Un-thinned Samples: 0
## Recommended Thinning: 40
## Specs: (NOT SHOWN HERE)
## Status is displayed every 1000 iterations
## Summary1: (SHOWN BELOW)
## Summary2: (SHOWN BELOW)
## Thinned Samples: 10000
## Thinning: 1
## 
## 
## Summary of All Samples
##                 Mean           SD        MCSE        ESS          LB
## mu         14.020045    1.8320830   0.2147966   35.77964   11.122464
## tau        -1.975999    0.1425621   0.0097089  406.18349   -2.155096
## Deviance  824.400336 4685.3833422 195.9012293 1126.65157  677.808876
## postlik  -419.280386 2342.6295956  97.9511875 1126.84331 -355.977930
## loglik   -412.200168 2342.6916711  97.9506147 1126.65157 -349.919435
##               Median          UB
## mu         14.249027   15.746808
## tau        -1.967926   -1.834977
## Deviance  679.262259  699.838869
## postlik  -346.738514 -346.045123
## loglik   -339.631129 -338.904438
## 
## 
## Summary of Stationary Samples
##                 Mean           SD        MCSE        ESS          LB
## mu         14.020045    1.8320830   0.2147966   35.77964   11.122464
## tau        -1.975999    0.1425621   0.0097089  406.18349   -2.155096
## Deviance  824.400336 4685.3833422 195.9012293 1126.65157  677.808876
## postlik  -419.280386 2342.6295956  97.9511875 1126.84331 -355.977930
## loglik   -412.200168 2342.6916711  97.9506147 1126.65157 -349.919435
##               Median          UB
## mu         14.249027   15.746808
## tau        -1.967926   -1.834977
## Deviance  679.262259  699.838869
## postlik  -346.738514 -346.045123
## loglik   -339.631129 -338.904438
## 
## Demonic Suggestion
## 
## Due to the combination of the following conditions,
## 
## 1. Hit-And-Run Metropolis
## 2. The acceptance rate (0.2399) is within the interval [0.15,0.5].
## 3. At least one target MCSE is >= 6.27% of its marginal posterior
##    standard deviation.
## 4. At least one target distribution has an effective sample size
##    (ESS) less than 100. The worst mixing chain is: mu (ESS=35.77964).
## 5. Each target distribution became stationary by
##    1 iteration.
## 
## Laplace's Demon has not been appeased, and suggests
## copy/pasting the following R code into the R console,
## and running it.
## 
## Initial.Values <- as.initial.values(fit)
## fit <- LaplacesDemon(Model, Data=MyData, Initial.Values,
##      Covar=NULL, Iterations=4e+05, Status=632, Thinning=40,
##      Algorithm="HARM", Specs=list(alpha.star=NA, B=NULL)
## 
## Laplace's Demon is finished consorting.

Como se obtuvo el mensaje Laplace’s Demon has not been appeased, vamos a usar la __Demonic Suggestion_ anterior y vamos a correr el modelo nuevamente con las condiciones sugeridas. El valor de Status se cambió a 100000 para no llenar la pantalla de tantos resultados. En este nuevo ajuste se usó Thinning=40 para hacer un adelgazamiento de la cadena, como Iterations=4e+05, la longitud de la cadena resultante será 4e+05/40=10000.

Initial.Values <- as.initial.values(fit)
fit <- LaplacesDemon(Model, Data=MyData, Initial.Values,
     Covar=NULL, Iterations=4e+05, Status=100000, Thinning=40,
     Algorithm="HARM")

## 
## Laplace's Demon was called on Thu Mar 05 11:22:10 2020
## 
## Performing initial checks...
## Algorithm: Hit-And-Run Metropolis 
## 
## Laplace's Demon is beginning to update...
## Iteration: 100000Iteration: 100000,   Proposal: Multivariate,   LP: -346.4
## Iteration: 200000Iteration: 200000,   Proposal: Multivariate,   LP: -346.2
## Iteration: 300000Iteration: 300000,   Proposal: Multivariate,   LP: -346.3
## Iteration: 400000Iteration: 400000,   Proposal: Multivariate,   LP: -348.7
## 
## Assessing Stationarity
## Assessing Thinning and ESS
## Creating Summaries
## Estimating Log of the Marginal Likelihood
## Creating Output
## 
## Laplace's Demon has finished.

Para ver un resumen del modelo usamos la siguiente instrucción. En esa salida aparecerá información valiosa.

Consort(fit)

## 
## #############################################################
## # Consort with Laplace's Demon                              #
## #############################################################
## Call:
## LaplacesDemon(Model = Model, Data = MyData, Initial.Values = Initial.Values, 
##     Covar = NULL, Iterations = 4e+05, Status = 1e+05, Thinning = 40, 
##     Algorithm = "HARM")
## 
## Acceptance Rate: 0.23782
## Algorithm: Hit-And-Run Metropolis
## Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
##          mu         tau 
## 0.510509490 0.004836395 
## 
## Covariance (Diagonal) History: (NOT SHOWN HERE)
## Deviance Information Criterion (DIC):
##          All Stationary
## Dbar 679.792    679.792
## pD     2.029      2.029
## DIC  681.820    681.820
## Initial Values:
## [1] 12.929896 -1.925251
## 
## Iterations: 4e+05
## Log(Marginal Likelihood): -340.0374
## Minutes of run-time: 0.32
## Model: (NOT SHOWN HERE)
## Monitor: (NOT SHOWN HERE)
## Parameters (Number of): 2
## Posterior1: (NOT SHOWN HERE)
## Posterior2: (NOT SHOWN HERE)
## Recommended Burn-In of Thinned Samples: 0
## Recommended Burn-In of Un-thinned Samples: 0
## Recommended Thinning: 120
## Specs: (NOT SHOWN HERE)
## Status is displayed every 1e+05 iterations
## Summary1: (SHOWN BELOW)
## Summary2: (SHOWN BELOW)
## Thinned Samples: 10000
## Thinning: 40
## 
## 
## Summary of All Samples
##                 Mean         SD        MCSE       ESS          LB      Median
## mu         14.327347 0.71439853 0.013566785  4659.500   12.957601   14.326115
## tau        -1.968282 0.06954634 0.000685856 10000.000   -2.109866   -1.966135
## Deviance  679.791613 2.01430963 0.026078445  7547.505  677.806805  679.174719
## postlik  -347.019001 0.99640045 0.012435297  7812.426 -349.624913 -346.707303
## loglik   -339.895806 1.00715481 0.013039223  7547.505 -342.580604 -339.587360
##                   UB
## mu         15.751521
## tau        -1.838438
## Deviance  685.161208
## postlik  -346.046106
## loglik   -338.903402
## 
## 
## Summary of Stationary Samples
##                 Mean         SD        MCSE       ESS          LB      Median
## mu         14.327347 0.71439853 0.013566785  4659.500   12.957601   14.326115
## tau        -1.968282 0.06954634 0.000685856 10000.000   -2.109866   -1.966135
## Deviance  679.791613 2.01430963 0.026078445  7547.505  677.806805  679.174719
## postlik  -347.019001 0.99640045 0.012435297  7812.426 -349.624913 -346.707303
## loglik   -339.895806 1.00715481 0.013039223  7547.505 -342.580604 -339.587360
##                   UB
## mu         15.751521
## tau        -1.838438
## Deviance  685.161208
## postlik  -346.046106
## loglik   -338.903402
## 
## Demonic Suggestion
## 
## Due to the combination of the following conditions,
## 
## 1. Hit-And-Run Metropolis
## 2. The acceptance rate (0.237825) is within the interval [0.15,0.5].
## 3. Each target MCSE is < 6.27% of its marginal posterior
##    standard deviation.
## 4. Each target distribution has an effective sample size (ESS)
##    of at least 100.
## 5. Each target distribution became stationary by
##    1 iteration.
## 
## Laplace's Demon has been appeased, and suggests
## the marginal posterior samples should be plotted
## and subjected to any other MCMC diagnostic deemed
## fit before using these samples for inference.
## 
## Laplace's Demon is finished consorting.

Como se obtuvo el mensaje Laplace’s Demon has been appeased podemos continuar.

Medias de las distribuciones a posteriori

Aquí se comparan los resultados de las medias de las distribuciones a posteriori con los valores reales.

res <- fit$Summary1
post_mean_mu <- res[1, 1]
post_mean_tau <- res[2, 1]
post_mean_sigma <- 1/exp(post_mean_tau)

w <- cbind(c(true_mean, true_sd, true_tau),
           c(post_mean_mu, post_mean_sigma, post_mean_tau))
colnames(w) <- c('True', 'Posterior mean')
rownames(w) <- c('media', 'desviacion', 'tau')
w

##                  True Posterior mean
## media      15.0000000      14.327347
## desviacion  7.0000000       7.158369
## tau         0.1428571      -1.968282

Algunos gráficos

Vamos a dibujar la evolución de los parámetros.

plot(fit, BurnIn=5000, MyData, PDF=FALSE, Parms=NULL)

De la figura anterior no se observa ninguna tendencia, los kernel se ven normales y los valores de ACF son bajo, todo esto significa que tenemos distribuciones a posterior estables.

Ahora los gráficos de oruga para los cuantiles (2.5%, 50%, and 97.5%) de los parámetros.

caterpillar.plot(fit, Parms=c("mu", "tau"))

Para obtener la distancia de Hellinger entre las cadenas se usa el código de abajo. Distancias mayores a 0.5 indican que la cadena es no estacionaria y que el proceso aún no converge. En la figura de abajo predominan los colores negros lo que indica que hay poca distancia entre las cadenas.

plot(BMK.Diagnostic(fit$Posterior1[5001:10000,]))