The exponential distribution is the probability distribution of the time between events in a Poisson process, so it is used to predict the waiting time until the next event, for example, the amount of time you need to wait at the bus stop until the next bus arrives. If you want to learn more about this distribution, check this post: Exponential Distribution — Intuition, Derivation, and Applications.
In this post we are going to work with an exponential distribution, and we will suppose that its rate parameter \(\lambda\) has changed at some point in time \(k\), that is:
\[\begin{align*} Y_i &\sim Exp(\lambda) \hspace{1cm} i=1,...,k \\ Y_i &\sim Exp(\alpha) \hspace{1cm} i=k+1,...,n \end{align*}\]
Our main objective is to estimate the parameters \(\lambda\), \(\alpha\) and \(k\), given a sample of \(n\) observations from this distribution, using the Gibbs sampler.
The Gibbs sampler is a special case of the Metropolis-Hastings sampler and is often used when the target is a multivariate distribution. With this method, the chain is generated by sampling from the complete (full) conditional distributions of the target distribution, and therefore every candidate point is accepted.
The Gibbs sampler generates the Markov Chain as follows:
Let \(X=(X_1,...,X_d)\) be a random vector in \(\mathbb{R}^d\), initialize \(X(0)\) at time \(t=0\).
For each iteration \(t=1,2,3,...\) repeat:
Set \(x=X(t-1)\).
For each \(j=1,...,d\):
Generate \(X_j^*(t)\) from \(f(X_j|x_{(-j)})\), where \(f(X_j|X_{(-j)})\) is the univariate conditional density of \(X_j\) given \(X_{(-j)}\).
Update \(x_j=X_j^*(t)\).
As every candidate point is accepted, set \(X(t)=(X_1^*(t),...,X_d^*(t))\).
Increment \(t\).
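To make these steps concrete, here is a minimal sketch of a Gibbs sampler for a standard bivariate normal with correlation \(\rho\), a classic example where both complete conditionals are univariate normal (this toy target and all names in it are illustrative and separate from the changepoint model):
# Gibbs sampler for a standard bivariate normal with correlation rho.
# The complete conditionals are X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2)
# and X2 | X1 = x1 ~ N(rho * x1, 1 - rho^2).
rho <- 0.75
m <- 5000
X <- matrix(0, nrow = m, ncol = 2) # X(0) = (0, 0)
s <- sqrt(1 - rho^2) # conditional standard deviation
for (t in 2:m) {
  X[t, 1] <- rnorm(1, mean = rho * X[t - 1, 2], sd = s) # sample X1 | X2
  X[t, 2] <- rnorm(1, mean = rho * X[t, 1], sd = s)     # sample X2 | X1
}
colMeans(X) # both means should be close to 0
cor(X[, 1], X[, 2]) # should be close to rho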
To learn more about the Gibbs sampler, check this book: Statistical Computing with R.
A simple formulation of the changepoint problem assumes that \(f\) and \(g\) are known densities with:
\[\begin{align*} Y_i &\sim f(Y|\lambda) \hspace{1cm} i=1,...,k, \\ Y_i &\sim g(Y|\alpha) \hspace{1cm} i=k+1,...,n \end{align*}\]
where \(k\) is unknown, with \(k=1,2,...,n\). Let \(Y_i\) be the time elapsed in minutes between bus arrivals at a bus stop. Suppose the changepoint occurs at observation \(k\), that is:
\[\begin{align*} Y_i &\sim Exp(\lambda) \hspace{1cm} i=1,...,k, \\ Y_i &\sim Exp(\alpha) \hspace{1cm} i=k+1,...,n \end{align*}\]
With \(\textbf{Y}=(Y_1,Y_2,...,Y_n)\), the likelihood \(L(\textbf{Y};k,\lambda,\alpha)\) is given by:
\[ L(Y;k,\lambda,\alpha)=\prod_{i=1}^{k} f(Y_i|\lambda) \prod_{i=k+1}^{n} g(Y_i|\alpha)=\prod_{i=1}^{k} \lambda \exp(-\lambda Y_i) \prod_{i=k+1}^{n} \alpha \exp(-\alpha Y_i) \]
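For reference, this likelihood is easy to evaluate on the log scale with dexp(); the following sketch is my own illustration (the function name loglik and its arguments are not part of the original code):
# Log-likelihood of the changepoint model for given k, lambda and alpha.
# dexp(y, rate, log = TRUE) returns log(rate) - rate * y.
loglik <- function(k, lambda, alpha, y) {
  n <- length(y)
  ll <- sum(dexp(y[1:k], rate = lambda, log = TRUE))
  if (k < n) ll <- ll + sum(dexp(y[(k + 1):n], rate = alpha, log = TRUE))
  ll
}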
Assume the Bayesian model with independent priors given by:
\[\begin{align*} \lambda &\sim Gamma(a, b), \\ \alpha &\sim Gamma(c, d), \\ k &\sim Uniform\{1,2,3,...,n\}. \end{align*}\]
The joint distribution of the data and the parameters is:
\[ L(Y;k,\lambda,\alpha) \times \pi(k) \times \pi(\lambda) \times \pi(\alpha), \]
where,
\[ \pi(k)=\frac{1}{n}, \]
\[ \pi(\lambda)=\frac{b^{a}}{\Gamma(a)} \lambda^{a-1}\exp(-\lambda b), \]
\[ \pi(\alpha)=\frac{d^{c}}{\Gamma(c)} \alpha^{c-1}\exp(-\alpha d). \]
As I mentioned before, the implementation of the Gibbs sampler requires sampling from the complete conditional distributions of the target distribution, so we need to find the complete conditionals for \(\lambda\), \(\alpha\) and \(k\). How can you do it? In simple terms, you have to select from the joint distribution presented above only the terms that depend on the parameter of interest and ignore the rest.
The complete conditional distribution for \(\lambda\) is given by:
\[\begin{align*} f(\lambda | Y, k, \alpha) &\propto \lambda^{(k+a)-1} \exp\left(-\lambda \left(b +\sum_{i=1}^{k}Y_i\right)\right),\\ \lambda | Y, k, \alpha &\sim Gamma\left(k+a, b+\sum_{i=1}^{k}Y_i\right). \end{align*}\]
The complete conditional distribution for \(\alpha\) is given by:
\[\begin{align*} f(\alpha|Y, k, \lambda) &\propto \alpha^{(n-k+c)-1} \exp\left(-\alpha \left(d +\sum_{i=k+1}^{n}Y_i\right)\right),\\ \alpha|Y, k, \lambda &\sim Gamma\left(n-k+c, d+\sum_{i=k+1}^{n}Y_i\right). \end{align*}\]
The complete conditional distribution for \(k\) is given by:
\[\begin{align*} f(k|Y, \lambda, \alpha) = \frac{L(\textbf{Y};k,\lambda,\alpha)}{\sum_{j=1}^{n}L(\textbf{Y};j,\lambda,\alpha)}=\frac{(\lambda/\alpha)^{k}\exp\left(-(\lambda-\alpha) \sum_{i=1}^{k}Y_i\right)}{\sum_{j=1}^{n} (\lambda/\alpha)^{j}\exp\left(-(\lambda-\alpha) \sum_{i=1}^{j}Y_i\right)}, \end{align*}\]
where the factor \(\alpha^{n}\exp\left(-\alpha\sum_{i=1}^{n}Y_i\right)\), which does not depend on \(k\), has been cancelled from both numerator and denominator.
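A practical note: for large \(n\), the products in this ratio can underflow in floating point. One possible safeguard (a sketch of my own, not part of the implementation below) is to compute the unnormalized conditional of \(k\) on the log scale and subtract the maximum before exponentiating:
# Complete conditional of k computed on the log scale for numerical stability.
# Assumes the data y and current values of lambda and alpha are given.
k_conditional <- function(y, lambda, alpha) {
  n <- length(y)
  S <- cumsum(y) # S[j] = sum(y[1:j])
  logL <- (1:n) * log(lambda / alpha) - (lambda - alpha) * S
  w <- exp(logL - max(logL)) # subtract the max to avoid underflow
  w / sum(w)
}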
To learn more about this Bayesian formulation, see the following article: Hierarchical Bayesian Analysis of Changepoint Problems.
In this section you are going to learn how to use the Gibbs sampler in R to estimate the parameters \(\lambda\), \(\alpha\) and \(k\).
First, we generate the data from the following exponential distribution with a changepoint at \(k=25\):
\[\begin{align*} Y_i &\sim Exp(\lambda=2) \hspace{1cm} i=1,...,25, \\ Y_i &\sim Exp(\alpha=10) \hspace{1cm} i=26,...,60 \end{align*}\]
set.seed(98712)
# 25 observations with rate 2 followed by 35 observations with rate 10
y <- c(rexp(25, rate = 2), rexp(35, rate = 10))
In terms of the bus stop situation: in the beginning, buses were arriving at a rate of \(\lambda=2\) per minute (one bus every half minute on average, since the mean of an exponential is \(1/\lambda\)), but from observation \(i=26\), they started arriving at a rate of \(\alpha=10\) per minute (one bus every tenth of a minute on average).
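As a quick sanity check on the simulated data (the exact values depend on the seed), the sample means before and after the changepoint should be close to \(1/\lambda=0.5\) and \(1/\alpha=0.1\) respectively:
mean(y[1:25])  # should be close to 1/lambda = 0.5
mean(y[26:60]) # should be close to 1/alpha = 0.1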
Next, we need to initialize \(k\), \(\lambda\) and \(\alpha\) and set the parameters of the prior distributions.
n <- length(y) # Number of observations of the sample
lchain <- 10000 # Chain size
lambda <- alpha <- k <- numeric(lchain)
L <- numeric(n) # Vector for the complete conditional distribution of k
k[1] <- sample(1:n, 1) # Start value for parameter k
lambda[1] <- 1 # Start value for parameter lambda
alpha[1] <- 1 # Start value for parameter alpha
# Parameters of the prior distribution of lambda (prior mean a/b = 2)
a <- 1
b <- 0.5
# Parameters of the prior distribution of alpha (prior mean c/d = 5)
c <- 5
d <- 1
Now, for each iteration of the algorithm, we need to generate \(\lambda(t)\), \(\alpha(t)\) and \(k(t)\) as follows (remember that if \(k+1>n\), there is no changepoint and no observations come after it):
\[\begin{align*} \lambda(t) &\sim Gamma\left(k+a, b+\sum_{i=1}^{k}Y_i\right), \\ \alpha(t) &\sim Gamma\left(n-k+c, d+\sum_{i=k+1}^{n}Y_i\right), \\ P(k(t)=k) &= \frac{L(\textbf{Y};k,\lambda,\alpha)}{\sum_{j=1}^{n}L(\textbf{Y};j,\lambda,\alpha)}. \end{align*}\]
for (i in 2:lchain) {
  kt <- k[i - 1]
  # Generate lambda from Gamma(k + a, b + sum(y[1:k]))
  lambda[i] <- rgamma(1, shape = kt + a, rate = sum(y[1:kt]) + b)
  # Generate alpha from Gamma(n - k + c, d + sum(y[(k+1):n]))
  if (kt + 1 > n) {
    r <- d # no observations after the changepoint, so only the prior remains
  } else {
    r <- sum(y[(kt + 1):n]) + d
  }
  alpha[i] <- rgamma(1, shape = n - kt + c, rate = r)
  # Compute the complete conditional distribution of k
  for (j in 1:n) {
    L[j] <- (lambda[i] / alpha[i])^j * exp(-(lambda[i] - alpha[i]) * sum(y[1:j]))
  }
  L <- L / sum(L)
  # Generate k from the discrete distribution L on 1:n
  k[i] <- sample(1:n, size = 1, prob = L)
}
# Burn-in: the first 9000 values of the chain are discarded below
burnIn <- 9000
In this section, we present the chains generated by the Gibbs sampler and their distributions for the parameters \(\lambda\), \(\alpha\) and \(k\). The true value of each parameter is represented by a red line.
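The plots are not reproduced here, but the sketch below shows one way they could be generated with base R graphics (the layout and labels are my choice, not the original figures):
# Trace plot and histogram for each parameter, with the true value in red
true <- c(k = 25, lambda = 2, alpha = 10)
chains <- list(k = k[-(1:burnIn)], lambda = lambda[-(1:burnIn)], alpha = alpha[-(1:burnIn)])
par(mfrow = c(3, 2))
for (p in names(chains)) {
  plot(chains[[p]], type = "l", xlab = "Iteration", ylab = p, main = paste("Chain of", p))
  abline(h = true[p], col = "red")
  hist(chains[[p]], breaks = 30, xlab = p, main = paste("Distribution of", p))
  abline(v = true[p], col = "red")
}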
The next table presents the true value of the parameters and the average of the estimates obtained using the Gibbs sampler:
res <- c(mean(k[-(1:burnIn)]), mean(lambda[-(1:burnIn)]), mean(alpha[-(1:burnIn)]))
real <- c(25, 2, 10)
resfinal <- cbind(real, res)
colnames(resfinal) <- c('True value', 'Mean MCMC')
rownames(resfinal) <- c('k', 'lambda', 'alpha')
resfinal
## True value Mean MCMC
## k 25 24.138000
## lambda 2 2.472162
## alpha 10 8.418692
From the results, we can conclude that the averages of the estimates of \(k\), \(\lambda\) and \(\alpha\) obtained using the Gibbs sampler in R are close to the true values of the parameters; however, we would expect better estimates. This may be due to the choice of the initial values of the chain, or to the choice of the prior distributions of \(\lambda\) and \(\alpha\).
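Since \(k\) is discrete, the posterior mode may also be a more natural point estimate than the posterior mean; a quick way to check it (a suggestion of mine, not part of the original analysis):
# Posterior mode of k after burn-in
as.numeric(names(which.max(table(k[-(1:burnIn)]))))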