Play with the inputs of the first, manual, non-stan model for HMC, and re-run with the same data. What changes do you observe in the trace plots OR posterior distributions?

hmc_sampler <- function(y, mu0, sigma0_sq, n_iter, epsilon, L, mu_start, logsigmasq_start) {
  n <- length(y)
  mu_samples <- numeric(n_iter)
  theta_samples <- numeric(n_iter)
  accept <- logical(n_iter)

  # Initialize
  mu <- mu_start
  theta <- logsigmasq_start

  # Potential energy function
  U <- function(mu, theta) {
    -sum(dnorm(y, mu, sqrt(exp(theta)), log = TRUE)) -
      dnorm(mu, mu0, sqrt(sigma0_sq), log = TRUE) -
      log(dnorm(theta, 0, 1))
  }

  # Gradient of the potential energy function
  grad_U <- function(mu, theta) {
    dU_dmu <- -sum((y - mu) / exp(theta)) + (mu - mu0) / sigma0_sq
    dU_dtheta <- 0.5 * n - 0.5 * sum((y - mu)^2) / exp(theta) + theta

    c(dU_dmu, dU_dtheta)
  }

  for (t in 1:n_iter) {
    # Sample momentum
    p_mu <- rnorm(1)
    p_theta <- rnorm(1)

    # Save current values
    mu_old <- mu
    theta_old <- theta
    p_mu_old <- p_mu
    p_theta_old <- p_theta

    # Perform L leapfrog steps
    for (i in 1:L) {
      p_mu <- p_mu - 0.5 * epsilon * grad_U(mu, theta)[1]
      mu <- mu + epsilon * p_mu

      p_theta <- p_theta - 0.5 * epsilon * grad_U(mu, theta)[2]
      theta <- theta + epsilon * p_theta

      p_mu <- p_mu - 0.5 * epsilon * grad_U(mu, theta)[1]
      p_theta <- p_theta - 0.5 * epsilon * grad_U(mu, theta)[2]
    }

    # Compute the Hamiltonian
    H_old <- U(mu_old, theta_old) + 0.5 * (p_mu_old^2 + p_theta_old^2)
    H_new <- U(mu, theta) + 0.5 * (p_mu^2 + p_theta^2)

    # Check for NaN values in Hamiltonian
    if (is.nan(H_old) || is.nan(H_new)) {
      stop("NaN detected in Hamiltonian calculation")
    }

    # Accept or reject the proposal
    if (runif(1) < exp(H_old - H_new)) {
      accept[t] <- TRUE
    } else {
      mu <- mu_old
      theta <- theta_old
      accept[t] <- FALSE
    }

    # Store samples
    mu_samples[t] <- mu
    theta_samples[t] <- theta
  }

  list(mu_samples = mu_samples, theta_samples = theta_samples, accept = accept)
}

set.seed(123)

# True parameters
true_mu <- 5
true_sigma <- 2

# Simulate data
n <- 100
y <- rnorm(n, mean = true_mu, sd = true_sigma)

# Prior parameters
mu0 <- 0
sigma0_sq <- 10^2 # A vague prior

# HMC parameters
n_iter <- 5000
epsilon <- 0.01
L <- 20
mu_start <- 0
logsigmasq_start <- log(1)

# Run the HMC sampler (This is the original sampler we did during the tutorial)
# so the epsilon is 0.01 in this one
set.seed(123)
hmc_results_orig <- hmc_sampler(y, mu0, sigma0_sq, n_iter, epsilon, L, mu_start, logsigmasq_start)

# let's change the epsilon values
set.seed(123)
hmc_results_2 <- hmc_sampler(y, mu0, sigma0_sq, n_iter, epsilon = 0.001, L, mu_start, logsigmasq_start)

set.seed(123)
hmc_results_3 <- hmc_sampler(y, mu0, sigma0_sq, n_iter, epsilon = 0.00001, L, mu_start, logsigmasq_start)

# Define the burn-in period
burn_in <- 1000

# Trace plots
par(mfrow = c(3, 2))

# Original epsilon = 0.01
plot(hmc_results_orig$mu_samples[(burn_in + 1):n_iter],
  type = "l",
  main = "Traceplot of Mu (epsilon = 0.01)"
)
plot(sqrt(exp(hmc_results_orig$theta_samples[(burn_in + 1):n_iter])),
  type = "l",
  main = "Traceplot of Sigma (epsilon = 0.01)"
)

# Epsilon = 0.001
plot(hmc_results_2$mu_samples[(burn_in + 1):n_iter],
  type = "l",
  main = "Traceplot of Mu (epsilon = 0.001)"
)
plot(sqrt(exp(hmc_results_2$theta_samples[(burn_in + 1):n_iter])),
  type = "l",
  main = "Traceplot of Sigma (epsilon = 0.001)"
)

# Epsilon = 0.00001
plot(hmc_results_3$mu_samples[(burn_in + 1):n_iter],
  type = "l",
  main = "Traceplot of Mu (epsilon = 0.00001)"
)
plot(sqrt(exp(hmc_results_3$theta_samples[(burn_in + 1):n_iter])),
  type = "l",
  main = "Traceplot of Sigma (epsilon = 0.00001)"
)

These trace plots illustrate how HMC sampler’s performance varies with different step sizes, denoted by epsilon. With epsilon = 0.01 (top row), we see well-mixed trace plots for both Mu and Sigma. This indicates that the sampler is efficiently exploring the parameter space, as it fluctuates around the target values without obvious trends or drift. This suggests that epsilon = 0.01 strikes a good balance between exploration and acceptance rate.

In contrast, with a smaller epsilon of 0.001 (middle row), the trace plots start showing more autocorrelation, particularly for Mu, resulting in slower mixing, although the sampler still captures the overall distribution.

Finally, with a very small epsilon = 0.00001 (bottom row), the trace plots exhibit poor mixing, with a distinct upward drift rather than stabilizing around a central value, indicating ineffective exploration and lack of convergence.

2. Comparing Acceptance Rates

What was the acceptance rate for the custom HMC sampler? How does it compare the acceptance rates we got in the previous assignments using different samplers?

(acceptance_rate <- mean(hmc_results_orig$accept))
#;-) [1] 0.9992

The acceptance rate from Metropolis Hastings algorithm was 0.36. This lower rate is common in MH samplers, which are more prone to rejection as they rely on a proposal distribution that might not align well with the target distribution

And in the Gibbs sampling, we didn’t even have an acceptance rate because typically, Gibbs sampling has an acceptance rate of 1.0 - meaning each update is drawn directly from the conditional distributions of the target distribution.

3. BRMS

Install and load the library brms, and implement the equivalent model in brms. How similar are your results compared to the results from rstan that we did today? tip: you can use brm function, with an intercept-only model (y~1)

library(brms)
# Create a data frame
data_df <- data.frame(y = y)

# Define priors
priors <- c(
  prior(normal(0, 10), class = "Intercept"), # Prior for mu
  prior(cauchy(0, 2.5), class = "sigma") # Prior for sigma
)

# Fit the model
brms_fit <- brm(
  formula = y ~ 1,
  data = data_df,
  family = gaussian(),
  prior = priors,
  iter = 5000,
  chains = 4,
  seed = 123
)
#;-) Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
#;-) using C compiler: ‘Apple clang version 15.0.0 (clang-1500.0.40.1)’
#;-) using SDK: ‘MacOSX14.0.sdk’
#;-) clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/usr/local/opt/gettext/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
#;-) In file included from <built-in>:1:
#;-) In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
#;-) In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Dense:1:
#;-) In file included from /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/Core:19:
#;-) /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
#;-) #include <cmath>
#;-)          ^~~~~~~
#;-) 1 error generated.
#;-) make: *** [foo.o] Error 1
#;-) 
#;-) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
#;-) Chain 1: 
#;-) Chain 1: Gradient evaluation took 1.9e-05 seconds
#;-) Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
#;-) Chain 1: Adjust your expectations accordingly!
#;-) Chain 1: 
#;-) Chain 1: 
#;-) Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#;-) Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#;-) Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#;-) Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#;-) Chain 1: Iteration: 2000 / 5000 [ 40%]  (Warmup)
#;-) Chain 1: Iteration: 2500 / 5000 [ 50%]  (Warmup)
#;-) Chain 1: Iteration: 2501 / 5000 [ 50%]  (Sampling)
#;-) Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#;-) Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#;-) Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#;-) Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#;-) Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#;-) Chain 1: 
#;-) Chain 1:  Elapsed Time: 0.026 seconds (Warm-up)
#;-) Chain 1:                0.023 seconds (Sampling)
#;-) Chain 1:                0.049 seconds (Total)
#;-) Chain 1: 
#;-) 
#;-) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
#;-) Chain 2: 
#;-) Chain 2: Gradient evaluation took 5e-06 seconds
#;-) Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
#;-) Chain 2: Adjust your expectations accordingly!
#;-) Chain 2: 
#;-) Chain 2: 
#;-) Chain 2: Iteration:    1 / 5000 [  0%]  (Warmup)
#;-) Chain 2: Iteration:  500 / 5000 [ 10%]  (Warmup)
#;-) Chain 2: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#;-) Chain 2: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#;-) Chain 2: Iteration: 2000 / 5000 [ 40%]  (Warmup)
#;-) Chain 2: Iteration: 2500 / 5000 [ 50%]  (Warmup)
#;-) Chain 2: Iteration: 2501 / 5000 [ 50%]  (Sampling)
#;-) Chain 2: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#;-) Chain 2: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#;-) Chain 2: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#;-) Chain 2: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#;-) Chain 2: Iteration: 5000 / 5000 [100%]  (Sampling)
#;-) Chain 2: 
#;-) Chain 2:  Elapsed Time: 0.026 seconds (Warm-up)
#;-) Chain 2:                0.032 seconds (Sampling)
#;-) Chain 2:                0.058 seconds (Total)
#;-) Chain 2: 
#;-) 
#;-) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
#;-) Chain 3: 
#;-) Chain 3: Gradient evaluation took 4e-06 seconds
#;-) Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.04 seconds.
#;-) Chain 3: Adjust your expectations accordingly!
#;-) Chain 3: 
#;-) Chain 3: 
#;-) Chain 3: Iteration:    1 / 5000 [  0%]  (Warmup)
#;-) Chain 3: Iteration:  500 / 5000 [ 10%]  (Warmup)
#;-) Chain 3: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#;-) Chain 3: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#;-) Chain 3: Iteration: 2000 / 5000 [ 40%]  (Warmup)
#;-) Chain 3: Iteration: 2500 / 5000 [ 50%]  (Warmup)
#;-) Chain 3: Iteration: 2501 / 5000 [ 50%]  (Sampling)
#;-) Chain 3: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#;-) Chain 3: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#;-) Chain 3: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#;-) Chain 3: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#;-) Chain 3: Iteration: 5000 / 5000 [100%]  (Sampling)
#;-) Chain 3: 
#;-) Chain 3:  Elapsed Time: 0.026 seconds (Warm-up)
#;-) Chain 3:                0.021 seconds (Sampling)
#;-) Chain 3:                0.047 seconds (Total)
#;-) Chain 3: 
#;-) 
#;-) SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
#;-) Chain 4: 
#;-) Chain 4: Gradient evaluation took 3e-06 seconds
#;-) Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
#;-) Chain 4: Adjust your expectations accordingly!
#;-) Chain 4: 
#;-) Chain 4: 
#;-) Chain 4: Iteration:    1 / 5000 [  0%]  (Warmup)
#;-) Chain 4: Iteration:  500 / 5000 [ 10%]  (Warmup)
#;-) Chain 4: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#;-) Chain 4: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#;-) Chain 4: Iteration: 2000 / 5000 [ 40%]  (Warmup)
#;-) Chain 4: Iteration: 2500 / 5000 [ 50%]  (Warmup)
#;-) Chain 4: Iteration: 2501 / 5000 [ 50%]  (Sampling)
#;-) Chain 4: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#;-) Chain 4: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#;-) Chain 4: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#;-) Chain 4: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#;-) Chain 4: Iteration: 5000 / 5000 [100%]  (Sampling)
#;-) Chain 4: 
#;-) Chain 4:  Elapsed Time: 0.024 seconds (Warm-up)
#;-) Chain 4:                0.027 seconds (Sampling)
#;-) Chain 4:                0.051 seconds (Total)
#;-) Chain 4:

# Summary of the fit
print(summary(brms_fit))
#;-)  Family: gaussian 
#;-)   Links: mu = identity; sigma = identity 
#;-) Formula: y ~ 1 
#;-)    Data: data_df (Number of observations: 100) 
#;-)   Draws: 4 chains, each with iter = 5000; warmup = 2500; thin = 1;
#;-)          total post-warmup draws = 10000
#;-) 
#;-) Regression Coefficients:
#;-)           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#;-) Intercept     5.18      0.18     4.82     5.54 1.00     8909     6520
#;-) 
#;-) Further Distributional Parameters:
#;-)       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#;-) sigma     1.84      0.13     1.61     2.12 1.00     8496     6778
#;-) 
#;-) Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#;-) and Tail_ESS are effective sample size measures, and Rhat is the potential
#;-) scale reduction factor on split chains (at convergence, Rhat = 1).

The estimated mean from brms package is also close to 5 (5.18). The 95% credible interval is between 4.82 and 5.54, and the model has converged.

Remember: The estimated mean from rstan was also 5.18, close to the true value of 5. The 95% credible interval is [4.82, 5.54], showing good estimation.

So basically we achieved the same results with both packages. Under the hood, they used the same HMC algorithm.

HMC Tutorial Assignment

1. Different Values

2. Comparing Acceptance Rates

3. BRMS