In this post we will work with a changepoint regression model from a Bayesian approach and estimate its parameters with the Metropolis sampler in R. So let’s get started…
The Metropolis sampler generates a Markov Chain \(\{X_t|t=0,1,2,...\}\) as follows:
Choose a proposal distribution \(g(.|X_t)\). Before choosing it, check the book referenced below to learn more about the desirable features of this function.
Generate \(X_0\) from the proposal distribution \(g\).
Repeat until the chain has converged to a stationary distribution:
Generate Y from \(g(.|X_t)\).
Generate U from Uniform(0, 1).
If \(U \leq \frac{f(Y)}{f(X_t)}\), accept \(Y\) and set \(X_{t+1}=Y\), otherwise set \(X_{t+1}=X_t\). This means that the candidate point Y is accepted with probability \(\alpha(X_t, Y)=\min\left(1, \frac{f(Y)}{f(X_t)}\right)\).
Increment \(t\).
To learn more about the Metropolis sampler, check this book: Statistical Computing with R.
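To make these steps concrete, here is a minimal, generic one-dimensional Metropolis sketch in R. It is only illustrative: the target f, the proposal standard deviation and the helper name metropolis1d are arbitrary choices for this example, not part of the changepoint model we fit below.
metropolis1d <- function(f, x0, iterations, proposal.sd = 1){
  chain <- numeric(iterations + 1)
  chain[1] <- x0 # X_0
  for (t in 1:iterations){
    y <- rnorm(1, mean = chain[t], sd = proposal.sd) # candidate Y from g(.|X_t)
    # Accept Y with probability min(1, f(Y)/f(X_t)); otherwise keep the current state
    if (runif(1) <= f(y) / f(chain[t])){
      chain[t + 1] <- y
    } else {
      chain[t + 1] <- chain[t]
    }
  }
  return(chain)
}
# Example: sampling from a standard normal target
out <- metropolis1d(dnorm, x0 = 0, iterations = 1000)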
In this case, we are going to sample from the following changepoint model:
\[ Y_i = \alpha + \beta_1 x_i + \beta_2 I_i(\theta) x_i + \varepsilon_i, \\ I_i(\theta)=\left\{ \begin{array}{ll} 0 & i < \theta \\ 1 & i \geq \theta \end{array} \right., \]
where \(\alpha\), \(\beta_1\) and \(\beta_2\) are unknown parameters, \(\theta\) is the changepoint, \(\varepsilon_i \sim \mathcal{N}(0, \sigma^{2}_\varepsilon)\) and \(i=1,2,...,n\). To estimate the parameters in a Bayesian analysis we need the likelihood function, which for this example is given by:
\[ L(X,Y|\alpha, \beta_1, \beta_2, \sigma, \theta )=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-(Y_i-(\alpha + \beta_1 X_i + \beta_2 I_i(\theta) X_i))^{2}}{2\sigma^2}\right) \]
Now we have to specify a prior distribution for each parameter \(\alpha\), \(\beta_1\), \(\beta_2\), \(\sigma\) and \(\theta\). In this case, we will use vague normal distributions (parameterized here by mean and variance) and uniform distributions for the parameters:
\[\begin{align*} \alpha &\sim \mathcal{N}(1, 100),\\ \beta_1 &\sim \mathcal{N}(1, 100),\\ \beta_2 &\sim \mathcal{N}(4, 100),\\ \sigma &\sim Uniform(4, 15),\\ \theta &\sim Uniform\{10, 11,12,...,50\}. \end{align*}\]
Finally, we define the posterior distribution as proportional to the product of the likelihood and the prior distributions:
\[ p(\alpha, \beta_1, \beta_2, \sigma, \theta | X, Y) \propto L(X,Y|\alpha, \beta_1, \beta_2, \sigma, \theta )\times \pi(\alpha)\times \pi(\beta_1)\times \pi(\beta_2)\times \pi(\sigma)\times \pi(\theta). \]
The posterior distribution will be the target distribution when using the Metropolis sampler.
First, we generate the data from the changepoint regression model presented above:
set.seed(5674)
# True parameter values used to simulate the data
alpha.init <- 0.5
beta1.init <- 0.5
beta2.init <- 2
sigma.init <- 7
theta.init <- 30 # true changepoint
x <- 1:60
# Mean of the response: the beta2 term only applies from the changepoint onwards
mean.init <- alpha.init + (x * beta1.init) + ifelse(x >= theta.init, x * beta2.init, 0)
y <- rnorm(60, mean = mean.init, sd = sigma.init)
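Optionally, a quick base-R plot of the simulated data (using the objects x, y and theta.init created above) helps to see the changepoint; this is just a visual check, not part of the estimation:
plot(x, y, pch = 19, main = "Simulated changepoint data")
abline(v = theta.init, col = "red", lty = 2) # true changepoint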
Now we define the likelihood function. In this case we will work with its logarithm, which is highly recommended to avoid numerical problems (underflow) while running the algorithm:
LikelihoodFunction <- function(param){
  alpha <- param[1]
  beta1 <- param[2]
  beta2 <- param[3]
  sigma <- param[4]
  theta <- param[5]
  # Mean of the model; as in the data generation, the beta2 term applies when x >= theta
  mu <- alpha + (x * beta1) + ifelse(x >= theta, x * beta2, 0)
  # Sum of the log-densities (log = TRUE must be passed to dnorm, not to sum)
  loglikelihoods <- sum(dnorm(y, mu, sigma, log = TRUE))
  return(loglikelihoods)
}
Next we define the prior distributions for the parameters \(\alpha\), \(\beta_1\), \(\beta_2\), \(\sigma\) and \(\theta\). As with the likelihood function, we are going to use the logarithm of the prior distributions:
LogPriorFunction <- function(param){
  alpha <- param[1]
  beta1 <- param[2]
  beta2 <- param[3]
  sigma <- param[4]
  theta <- param[5]
  alphaprior <- dnorm(alpha, 1, 10, log = TRUE)
  beta1prior <- dnorm(beta1, 1, 10, log = TRUE)
  beta2prior <- dnorm(beta2, 4, 10, log = TRUE)
  sigmaprior <- dunif(sigma, 4, 15, log = TRUE)
  # Discrete uniform prior on {10, 11, ..., 50}: 41 equally likely values
  thetaprior <- ifelse(theta %in% 10:50, log(1/41), -Inf)
  return(alphaprior + beta1prior + beta2prior + sigmaprior + thetaprior)
}
Since we are working with logarithms, we define the log-posterior as the sum of the log-likelihood and the log-priors. Remember that this function is our target function \(f(.)\) from which we want to sample.
PosteriorFunction <- function(param){
  return(LikelihoodFunction(param) + LogPriorFunction(param))
}
Next, we define the proposal distribution \(g(.|X_t)\). Since we are using the Metropolis sampler, the proposal distribution must be symmetric and depend on the current state of the chain. For \(\alpha\), \(\beta_1\), \(\beta_2\) and \(\sigma\) we use a normal distribution centered at the current value of each parameter, and for \(\theta\) we add a symmetric discrete uniform step (an integer between -3 and 3) to the current value.
ProposalFunction <- function(param){
  alpha <- rnorm(1, param[1], sd = 1)
  beta1 <- rnorm(1, param[2], sd = 1)
  beta2 <- rnorm(1, param[3], sd = 1)
  sigma <- rnorm(1, param[4], sd = 0.5)
  # Symmetric discrete step of at most 3 around the current changepoint
  theta <- sample(-3:3, 1) + param[5]
  return(c(alpha, beta1, beta2, sigma, theta))
}
Finally, we write the code that runs the Metropolis sampler. Since we are working with logarithms, the probability that the candidate point \(Y\) is accepted becomes:
\[ \alpha(X_t, Y)=\min\left(1, \exp\left(\log f(Y)-\log f(X_t)\right)\right). \]
RunMetropolisMCMC <- function(startvalue, iterations){
  # Create an array to save the chain values
  chain <- array(dim = c(iterations + 1, 5))
  chain[1, ] <- startvalue # Define the start value of the chain
  for (i in 1:iterations){
    # Generate Y from the proposal function
    Y <- ProposalFunction(chain[i, ])
    # Probability that the candidate point is accepted
    probability <- exp(PosteriorFunction(Y) - PosteriorFunction(chain[i, ]))
    # Decision criteria to accept or reject Y
    if (runif(1) < probability) {
      chain[i + 1, ] <- Y
    } else {
      chain[i + 1, ] <- chain[i, ]
    }
  }
  return(chain)
}
Because the MCMC chain is strongly autocorrelated, we will thin it, keeping only every nth value of the chain to reduce autocorrelation. In this case, we keep one value for our final chain every 50 iterations of the algorithm. To learn more about this method check this publication: Thinning to reduce autocorrelation: Rarely useful.
startvalue <- c(0, 0, 0, 5, 40) # Define the start value of the chain
iterations <- 50 * 10000
chain <- RunMetropolisMCMC(startvalue, iterations)
# Select values every 50 iterations for the final (thinned) chain
cfinal <- matrix(NA, ncol = 5, nrow = 10000)
for (i in 1:10000){
  cfinal[i, ] <- chain[i * 50, ]
}
# The first 8000 values of the thinned chain will be discarded as burn-in
burnIn <- 8000
Here, you can observe the ACF plot, which shows the autocorrelation of a series with its lagged values. In this case, we present the ACF plot of the parameter \(\theta\) for the initial MCMC chain and for the final chain after thinning. From the graphs we can conclude that the procedure used actually manages to decrease the autocorrelation significantly. You can build these graphs for the other parameters and observe the same behavior.
To learn more about ACF check this publication: Significance of ACF and PACF Plots In Time Series Analysis.
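If you want to reproduce these graphs, a rough sketch using base R's acf function on the fifth column of the chain (the samples of \(\theta\)) could look like this; the layout and titles are arbitrary choices:
par(mfrow = c(1, 2))
acf(chain[, 5], main = "ACF of theta (raw chain)")
acf(cfinal[, 5], main = "ACF of theta (thinned chain)")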
In this section, we present the chain generated by the Metropolis sampler and its distribution for the parameters \(\alpha\), \(\beta_1\), \(\beta_2\), \(\sigma\) and \(\theta\). The true value of each parameter is represented by a red line.
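These trace plots and posterior histograms can be drawn along the following lines; the labels, layout and the vectors param.names and true.values are choices made here for illustration, using the cfinal and burnIn objects defined above:
param.names <- c("alpha", "beta1", "beta2", "sigma", "theta")
true.values <- c(alpha.init, beta1.init, beta2.init, sigma.init, theta.init)
par(mfrow = c(5, 2), mar = c(4, 4, 2, 1))
for (j in 1:5){
  # Posterior histogram after burn-in, with the true value marked in red
  hist(cfinal[-(1:burnIn), j], main = paste("Posterior of", param.names[j]), xlab = "")
  abline(v = true.values[j], col = "red", lwd = 2)
  # Trace plot of the thinned chain after burn-in
  plot(cfinal[-(1:burnIn), j], type = "l", xlab = "Iteration", ylab = "",
       main = paste("Chain of", param.names[j]))
  abline(h = true.values[j], col = "red", lwd = 2)
}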
The following table presents the true value of the parameters and the posterior means obtained using the Metropolis sampler:
res <- colMeans(cfinal[-(1:burnIn),])
real <- c(alpha.init, beta1.init, beta2.init, sigma.init, theta.init)
resfinal <- cbind(real, res)
colnames(resfinal) <- c('True value', 'Mean MCMC')
rownames(resfinal) <- c('alpha', 'beta1', 'beta2', 'sigma', 'theta')
resfinal
## True value Mean MCMC
## alpha 0.5 0.2793430
## beta1 0.5 0.4340764
## beta2 2.0 5.0349229
## sigma 7.0 9.5907309
## theta 30.0 29.3975000
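Beyond the posterior means, you could also report 95% credible intervals from the same thinned, burned-in chain; this is an optional summary, not part of the original table:
# 2.5% and 97.5% posterior quantiles for each parameter
cred <- apply(cfinal[-(1:burnIn), ], 2, quantile, probs = c(0.025, 0.975))
colnames(cred) <- c('alpha', 'beta1', 'beta2', 'sigma', 'theta')
cred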
From the results, we can see that the estimates of \(\alpha\), \(\beta_2\) and \(\sigma\) obtained with the Metropolis sampler are noticeably different from the true values, while the estimates of \(\beta_1\) and \(\theta\) are closer to the true values of these two parameters. It is important to realize that the results are strongly influenced by the choice of the prior distribution for each parameter and of the initial values of the chain: when these are close to the true parameter values, the estimates improve, which is a disadvantage of this sampling method.