Here we will perform Bayesian MCMC inference of the probability of heads based on coin tosses. We will investigate convergence of the MCMC chains in various conditions.
We have some data providing the results of 300 coin tosses. We would like to estimate how fair the coin is, i.e. what is the probability of getting heads (1).
data = c(1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1)
sum(data)
[1] 190
From our data, we see that we obtained 190 heads out of 300 tosses.
The ML estimate of the probability of heads is simply the ratio of heads over all tosses:
sum(data)/length(data)
[1] 0.6333333
We build a probabilistic model of coin tossing.
All coin tosses are supposed to be independent tosses of the same coin, which always have the same probability of returning a head. We want to perform Bayesian inference, therefore we need a prior. For inference, we will be using Metropolis MCMC.
Defining a prior
We need to put some prior probability on the fairness of the coin. For this, a beta distribution seems appropriate, as it is a continuous distribution between 0 and 1. We choose to use a=4, b=4. This is a somewhat informative prior that the coin should be fair, given our past experience with coins.
x <- seq(-0, 1, length=100)
colors <- c("red", "purple", "blue", "darkgreen", "gold")
shape1s = c(1,2,4,1,0.2)
shape2s = c(1,2,4,4,0.2)
a=4
b=4
Building of the model
We build the functions necessary to compute the likelihood and the prior probability for a given value of the parameter.
# Function to compute the likelihood P(D|M)
# We use the binomial formula.
likelihood <-function (numHeads, numTosses, parameter){
p = choose(numTosses, numHeads)*parameter^numHeads * (1-parameter)^(numTosses-numHeads)
return (p)
}
# Function to compute the prior P(M)
prior <- function (parameter, shape1, shape2){
return (dbeta(parameter, shape1=shape1, shape2=shape2) )
}
# Function to compute the un-normalized posterior P(D|M) * P(M)
unnormalized_posterior <-function (numHeads, numTosses, parameter, shape1, shape2) {
return (likelihood(numHeads, numTosses, parameter) * prior(parameter, shape1, shape2))
}
Inference of the fairness of the coin using MCMC
We build a MCMC chain to estimate the probability of heads for this coin. First we define the model, with the prior, the likelihood and the posterior probability, then we implement a Metropolis MCMC inference mechanism.
Implementing the MCMC algorithm
# Number of iterations for the MCMC algorithm
num_iterations = 50000
# Function to propose a new parameter value, randomly drawn between 0 and 1
propose_new_parameter_value <-function() {
return (runif(1))
}
# Function to run Metropolis MCMC inference
MetropolisMCMC <- function (numHeads, numTosses, shape1, shape2, number_iterations) {
current_parameter_value <- propose_new_parameter_value()
record_parameter <- c()
record_parameter <- c(record_parameter,current_parameter_value)
#print(paste("Initial parameter value for the MCMC: ", current_parameter_value, sep=""))
current_posterior <- unnormalized_posterior(numHeads, numTosses, current_parameter_value, shape1, shape2)
#print(paste("Initial probability of the model: ", current_posterior, sep=""))
record_posterior <- c()
record_posterior <- c(record_posterior, current_posterior)
# We also record the likelihood:
current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
record_likelihood <- c()
record_likelihood <- c(record_likelihood, current_likelihood)
for (i in 1:number_iterations) {
acceptance_threshold <- runif(1)
proposed_parameter_value <- propose_new_parameter_value()
proposed_posterior = unnormalized_posterior(numHeads , numTosses, proposed_parameter_value, shape1, shape2)
if (proposed_posterior / current_posterior > acceptance_threshold) {
current_parameter_value <- proposed_parameter_value
current_posterior <- proposed_posterior
current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
}
record_parameter <- c(record_parameter, current_parameter_value)
record_posterior <-c(record_posterior, current_posterior)
record_likelihood <- c(record_likelihood, current_likelihood)
}
return (list(record_parameter, record_posterior, record_likelihood))
}
Running the MCMC
result = MetropolisMCMC(sum(data), length(data), a, b, num_iterations)
Comparison of the posterior distributions with respect to the number of iterations
We plot in different colors the posterior distributions.
par(mfrow=c(6,1), mar=c(0,4,0,2)+0.1)
plot(density(result[[1]][1:10]), col=colors[1], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="10 iterations")
plot(density(result[[1]][1:50]), col=colors[2], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="50 iterations")
plot(density(result[[1]][1:100]), col=colors[3], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="100 iterations")
plot(density(result[[1]][1:1000]), col=colors[4], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="1000 iterations")
plot(density(result[[1]][1:20000]), col=colors[5], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="20 000 iterations")
plot(density(result[[1]]), col="black", lwd=2, main="", ylab="Density", xlab="Probability of Heads", ylim=c(0,24), xlim=c(0,1))
legend("topleft", legend="50 000 iterations")

Looking at summary statistics provides the same information:
summary(result[[1]][1:10])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.7084 0.7439 0.7439 0.7404 0.7440 0.7440
summary(result[[1]][1:50])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.6361 0.6361 0.6404 0.6655 0.6719 0.7440
summary(result[[1]][1:100])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.6016 0.6115 0.6361 0.6456 0.6493 0.7440
summary(result[[1]][1:1000])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5735 0.6182 0.6361 0.6379 0.6554 0.7440
summary(result[[1]][1:20000])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5363 0.6115 0.6308 0.6304 0.6490 0.7440
summary(result[[1]])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5318 0.6118 0.6303 0.6300 0.6485 0.7440
As the number of iterations increases, the estimates converge.
Better moves for better convergence
In the Metropolis MCMC above, a new state is proposed according to a Uniform distribution. This is probably suboptimal: after all, if at iteration \(n\) in the chain we have a parameter value with a high posterior probability, it’s probably a good idea to propose a new value for iteration \(n+1\) in the vicinity of the parameter value instead of completely randomly over \([0:1]\). Therefore, we implement a new move, the Scale move.
Scale move
# Function to propose a new parameter value, according to a scaling move centered on the current value
scale_move_propose_new_parameter_value <-function(current_parameter_value, lambda_scaling_parameter) {
# Propose a new value of p
scaling_factor <- exp( lambda_scaling_parameter * ( runif(1) - 0.5 ) )
p_prime <- current_parameter_value * scaling_factor
Hastings_ratio <- scaling_factor
return(list(p_prime, Hastings_ratio))
}
The scale move is non-symmetric: it is more likely to propose smaller values than larger values. Therefore a Hastings ratio needs to be computed. In the case of the scale move, the Hastings ratio turns out to be simple to compute, and is the scaling factor used on the current parameter value to propose the new value.
MCMC with the scale move
We implement a new MCMC function that will use this Scale move instead of the previous uniform move.
# Function to run Metropolis MCMC inference
MHMCMC_ScaleMove <- function (numHeads, numTosses, shape1, shape2, number_iterations, lambda_scaling_parameter) {
current_parameter_value <- propose_new_parameter_value()
record_parameter <- c()
record_parameter <- c(record_parameter,current_parameter_value)
#print(paste("Initial parameter value for the MCMC: ", current_parameter_value, sep=""))
current_posterior <- unnormalized_posterior(numHeads, numTosses, current_parameter_value, shape1, shape2)
#print(paste("Initial probability of the model: ", current_posterior, sep=""))
record_posterior <- c()
record_posterior <- c(record_posterior, current_posterior)
# We also record the likelihood:
current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
record_likelihood <- c()
record_likelihood <- c(record_likelihood, current_likelihood)
for (i in 1:number_iterations) {
acceptance_threshold <- runif(1)
value_and_HastingsRatio <- scale_move_propose_new_parameter_value(current_parameter_value, lambda_scaling_parameter = lambda_scaling_parameter)
proposed_parameter_value <- value_and_HastingsRatio[[1]]
Hastings_ratio <- value_and_HastingsRatio[[2]]
if (proposed_parameter_value > 0 & proposed_parameter_value < 1 ) {
proposed_posterior = unnormalized_posterior(numHeads , numTosses, proposed_parameter_value, shape1, shape2) * Hastings_ratio
if (proposed_posterior / current_posterior > acceptance_threshold) {
current_parameter_value <- proposed_parameter_value
current_posterior <- proposed_posterior
current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
}
}
record_parameter <- c(record_parameter, current_parameter_value)
record_posterior <-c(record_posterior, current_posterior)
record_likelihood <- c(record_likelihood, current_likelihood)
}
return (list(record_parameter, record_posterior, record_likelihood))
}
Running the MCMC with the scale move
We use different lambda scaling parameters.
results_ScaleMove01 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 0.1)
results_ScaleMove1 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 1)
results_ScaleMove10 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 10)
Let’s look at the sampled parameters
summary(results_ScaleMove01[[1]])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5196 0.6107 0.6295 0.6291 0.6478 0.8698
summary(results_ScaleMove1[[1]])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5201 0.6118 0.6305 0.6301 0.6489 0.7434
summary(results_ScaleMove10[[1]])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5491 0.6116 0.6300 0.6303 0.6471 0.7097
par(mfrow=c(3,1), mar=c(0,4,0,2)+0.1)
plot(density(results_ScaleMove01[[1]]), col=colors[1], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=0.1")
plot(density(results_ScaleMove1[[1]]), col=colors[2], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=1")
plot(density(results_ScaleMove10[[1]]), col=colors[3], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=10")

Could we have sampling problems? Investigating the number of sampled values
Let’s count the number of different parameter values sampled during the MCMC. If the MCMC does not move well through parameter space, it will either: - rarely accept changes because the proposal moves too far, therefore the MCMC always samples the same values - always accept changes because the proposal moves very close, therefore the MCMC moves very slowly through parameter space.
# Number of different values sampled
length(unique(results_ScaleMove01[[1]]))
[1] 38739
length(unique(results_ScaleMove1[[1]]))
[1] 7019
length(unique(results_ScaleMove10[[1]]))
[1] 712
# Ratio of the number to the total number of iterations
length(unique(results_ScaleMove01[[1]]))/num_iterations
[1] 0.77478
length(unique(results_ScaleMove1[[1]]))/num_iterations
[1] 0.14038
length(unique(results_ScaleMove10[[1]]))/num_iterations
[1] 0.01424
The first scaling factor produces moves that are very often accepted, probably too often; the second scaling factor seems quite good, and the last one produces moves that are too rarely accepted. In the latter case, it is likely that in lots of cases, values outside of \([0,1]\) have been proposed and rejected.
We could look for even better sampling characteristics: numerical studies have shown that the optimal asymptotic acceptance rate is 0.234 for multi-dimensional models, and 0.44 for unidimensional models. However, as long as a move has an acceptance probability between \(0.15\) and \(0.5\), one can consider that the chain mixes well, because it is at least \(80%\) efficient compared to optimality (Yang and Rosenthal, 2016, http://probability.ca/jeff/ftpdir/jinyoung1.pdf).
A better scaling parameter for the scaling move
So let’s run another chain, with a scaling parameter that should give better acceptance rates.
results_ScaleMove04 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 0.4)
length(unique(results_ScaleMove04[[1]]))
[1] 17094
length(unique(results_ScaleMove04[[1]]))/num_iterations
[1] 0.34188
Let’s compare to our Metropolis MCMC which was proposing values uniformly over \([0,1]\):
length(unique(result[[1]]))
[1] 4443
length(unique(result[[1]]))/num_iterations
[1] 0.08886
So the MCMC with a scaling move can have a much better sampling rate than our Metropolis MCMC.
Investigating autocorrelation
However, it could be navigating the parameter space not very efficiently: subsequent samples could be very correlated. To look into this, we can compute the autocorrelation in our samples using the \(acf\) function, which computes the autocorrelation after 1, 2, 3, … n iterations.
par(mfrow=c(5,1), mar=c(0,4,0,2)+0.1)
acf(results_ScaleMove01[[1]], xaxt="n")
legend("topright", "Scaling 0.1")
acf(results_ScaleMove04[[1]], xaxt="n")
legend("topright", "Scaling 0.4")
acf(results_ScaleMove1[[1]], xaxt="n")
legend("topright", "Scaling 1")
acf(results_ScaleMove10[[1]], xaxt="n")
legend("topright", "Scaling 10")
acf(result[[1]])
legend("topright", "Uniform")

The scaling move with scaling parameter 0.4 seems to have the best behavior. The autocorrelation with the scaling parameter 0.1 was a bit worse than that with the parameter 1.0, which confirms that the moves were too narrow.
The effective sample size
It is possible to summarize both the number of different values and the autocorrelation with one single number, the Effective Sample Size (ESS). This is obtained with the following formula:
\[
ESS = \frac{number~of~samples}{1+2\sum_{k=1}^{\infty}\rho(k)}
\]
where \(\rho(k)\) is the autocorrelation plotted in the \(acf\) plots above.
Let’s build an ESS function:
ess <- function (values) {
x <- acf(values, plot=F)
list_of_rhos <- x$acf[2:length(x$acf)]
return (length(values) / (1 + 2*sum(list_of_rhos)) )
}
And we use it on all our chains:
ess(results_ScaleMove01[[1]])
[1] 3764.796
ess(results_ScaleMove04[[1]])
[1] 11204
ess(results_ScaleMove1[[1]])
[1] 4631.386
ess(results_ScaleMove10[[1]])
[1] 812.992
ess(result[[1]])
[1] 3219.739
This confirms that the scaling move with scaling parameter 0.4 is much better than the other MCMCs.
Therefore our best estimate of the distribution should be:
par(mfrow=c(1,1), mar=c(4,4,2,2)+0.1)
plot(density(results_ScaleMove04[[1]]), col="orange", lwd=2, main="", ylab="Density", xlab="Probability of heads", ylim=c(0,24), xlim=c(0,1))
legend("topleft", legend="Distribution obtained with the scaling move, lambda=0.4", lwd=2, col="orange")

---
title: "Coin flip analysis by MCMC: convergence and moves"
output: html_notebook
---

Here we will perform Bayesian MCMC inference of the probability of heads based on coin tosses.
We will investigate convergence of the MCMC chains in various conditions.

We have some data providing the results of 300 coin tosses. We would like to estimate how fair the coin is, i.e. what is the probability of getting heads (1).
```{r}
data = c(1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1,1,0,1,0,0,1,1,1,0,0,1,1,1,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0,1,1)
sum(data)
```
From our data, we see that we obtained 190 heads out of 300 tosses.

The ML estimate of the probability of heads is simply the ratio of heads over all tosses: 

```{r}
sum(data)/length(data)
```

# We build a probabilistic model of coin tossing.

All coin tosses are supposed to be independent tosses of the same coin, which always have the same probability of returning a head. We want to perform Bayesian inference, therefore we need a prior. For inference, we will be using Metropolis MCMC.


## Defining a prior

We need to put some prior probability on the fairness of the coin. For this, a beta distribution seems appropriate, as it is a continuous distribution between 0 and 1. We choose to use a=4, b=4. This is a somewhat informative prior that the coin should be fair, given our past experience with coins.


```{r}
x <- seq(-0, 1, length=100)
colors <- c("red", "purple", "blue", "darkgreen", "gold")
shape1s = c(1,2,4,1,0.2)
shape2s = c(1,2,4,4,0.2)
a=4
b=4
```



## Building of the model
We build the functions necessary to compute the likelihood and the prior probability for a given value of the parameter.

```{r}
# Function to compute the likelihood P(D|M)
# We use the binomial formula.
likelihood <-function (numHeads, numTosses, parameter){
    p = choose(numTosses, numHeads)*parameter^numHeads * (1-parameter)^(numTosses-numHeads)
    return (p)
}

# Function to compute the prior P(M)
prior <- function (parameter, shape1, shape2){
    return (dbeta(parameter, shape1=shape1, shape2=shape2) )
}
# Function to compute the un-normalized posterior P(D|M) * P(M)
unnormalized_posterior <-function (numHeads, numTosses, parameter, shape1, shape2) {
    return (likelihood(numHeads, numTosses, parameter) * prior(parameter, shape1, shape2))
}
```


## Inference of the fairness of the coin using MCMC

We build a MCMC chain to estimate the probability of heads for this coin. First we define the model, with the prior, the likelihood and the posterior probability, then we implement a Metropolis MCMC inference mechanism.


### Implementing the MCMC algorithm

```{r}
# Number of iterations for the MCMC algorithm
num_iterations = 50000

# Function to propose a new parameter value, randomly drawn between 0 and 1
propose_new_parameter_value <-function() {
    return (runif(1))
}

# Function to run Metropolis MCMC inference
MetropolisMCMC <- function (numHeads, numTosses, shape1, shape2, number_iterations) {
    current_parameter_value <- propose_new_parameter_value()
    record_parameter <- c()
    record_parameter <- c(record_parameter,current_parameter_value)
    #print(paste("Initial parameter value for the MCMC: ", current_parameter_value, sep=""))
    current_posterior <- unnormalized_posterior(numHeads, numTosses, current_parameter_value, shape1, shape2)
    #print(paste("Initial probability of the model: ", current_posterior, sep=""))
    record_posterior <- c()
    record_posterior <- c(record_posterior, current_posterior)
    # We also record the likelihood:
    current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
    record_likelihood <- c()
    record_likelihood <- c(record_likelihood, current_likelihood)
    for (i in 1:number_iterations) {
        acceptance_threshold <- runif(1)
        proposed_parameter_value <- propose_new_parameter_value()
        proposed_posterior = unnormalized_posterior(numHeads , numTosses, proposed_parameter_value, shape1, shape2)
        if (proposed_posterior / current_posterior > acceptance_threshold) {
            current_parameter_value <- proposed_parameter_value
            current_posterior <- proposed_posterior
            current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
        }
        record_parameter <- c(record_parameter, current_parameter_value)
        record_posterior <-c(record_posterior, current_posterior)
        record_likelihood <- c(record_likelihood, current_likelihood)
    }
    return (list(record_parameter, record_posterior, record_likelihood))
}
```

### Running the MCMC

```{r}
result = MetropolisMCMC(sum(data), length(data), a, b, num_iterations)

```

## Comparison of the posterior distributions with respect to the number of iterations

We plot in different colors the posterior distributions.

```{r}
par(mfrow=c(6,1), mar=c(0,4,0,2)+0.1)
plot(density(result[[1]][1:10]), col=colors[1], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="10 iterations")
plot(density(result[[1]][1:50]), col=colors[2], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="50 iterations")
plot(density(result[[1]][1:100]), col=colors[3], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="100 iterations")
plot(density(result[[1]][1:1000]), col=colors[4], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="1000 iterations")
plot(density(result[[1]][1:20000]), col=colors[5], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="20 000 iterations")
plot(density(result[[1]]), col="black", lwd=2, main="", ylab="Density", xlab="Probability of Heads", ylim=c(0,24), xlim=c(0,1))
legend("topleft", legend="50 000 iterations")
```


Looking at summary statistics provides the same information:
```{r}
summary(result[[1]][1:10])
summary(result[[1]][1:50])
summary(result[[1]][1:100])
summary(result[[1]][1:1000])
summary(result[[1]][1:20000])
summary(result[[1]])
```

As the number of iterations increases, the estimates converge.

## Better moves for better convergence
In the Metropolis MCMC above, a new state is proposed according to a Uniform distribution. This is probably suboptimal: after all, if at iteration $n$ in the chain we have a parameter value with a high posterior probability, it's probably a good idea to propose a new value for iteration $n+1$ in the vicinity of the parameter value instead of completely randomly over $[0:1]$.
Therefore, we implement a new move, the Scale move.

### Scale move
```{r}
# Function to propose a new parameter value, according to a scaling move centered on the current value
scale_move_propose_new_parameter_value <-function(current_parameter_value, lambda_scaling_parameter) {
  # Propose a new value of p
  scaling_factor <- exp( lambda_scaling_parameter * ( runif(1) - 0.5 ) )
  p_prime <- current_parameter_value * scaling_factor
  Hastings_ratio <- scaling_factor
  return(list(p_prime, Hastings_ratio))
}

```
The scale move is non-symmetric: it is more likely to propose smaller values than larger values. Therefore a Hastings ratio needs to be computed. In the case of the scale move, the Hastings ratio turns out to be simple to compute, and is the scaling factor used on the current parameter value to propose the new value.

### MCMC with the scale move
We implement a new MCMC function that will use this Scale move instead of the previous uniform move.

```{r}
# Function to run Metropolis MCMC inference
MHMCMC_ScaleMove <- function (numHeads, numTosses, shape1, shape2, number_iterations, lambda_scaling_parameter) {
    current_parameter_value <- propose_new_parameter_value()
    record_parameter <- c()
    record_parameter <- c(record_parameter,current_parameter_value)
    #print(paste("Initial parameter value for the MCMC: ", current_parameter_value, sep=""))
    current_posterior <- unnormalized_posterior(numHeads, numTosses, current_parameter_value, shape1, shape2)
    #print(paste("Initial probability of the model: ", current_posterior, sep=""))
    record_posterior <- c()
    record_posterior <- c(record_posterior, current_posterior)
    # We also record the likelihood:
    current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
    record_likelihood <- c()
    record_likelihood <- c(record_likelihood, current_likelihood)
    for (i in 1:number_iterations) {
        acceptance_threshold <- runif(1)
         
        value_and_HastingsRatio <- scale_move_propose_new_parameter_value(current_parameter_value, lambda_scaling_parameter = lambda_scaling_parameter)
        proposed_parameter_value <- value_and_HastingsRatio[[1]]
        Hastings_ratio <- value_and_HastingsRatio[[2]]
        if (proposed_parameter_value > 0 & proposed_parameter_value < 1 ) {
          proposed_posterior = unnormalized_posterior(numHeads , numTosses, proposed_parameter_value, shape1, shape2) * Hastings_ratio
          if (proposed_posterior / current_posterior > acceptance_threshold) {
              current_parameter_value <- proposed_parameter_value
              current_posterior <- proposed_posterior
              current_likelihood <- likelihood(numHeads, numTosses, current_parameter_value)
          }
        }
        record_parameter <- c(record_parameter, current_parameter_value)
        record_posterior <-c(record_posterior, current_posterior)
        record_likelihood <- c(record_likelihood, current_likelihood)
    }
    return (list(record_parameter, record_posterior, record_likelihood))
}
```


### Running the MCMC with the scale move

We use different lambda scaling parameters.
```{r}
results_ScaleMove01 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 0.1)
results_ScaleMove1 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 1)
results_ScaleMove10 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 10)

```

### Let's look at the sampled parameters
```{r}
summary(results_ScaleMove01[[1]])
summary(results_ScaleMove1[[1]]) 
summary(results_ScaleMove10[[1]]) 

```
```{r}
par(mfrow=c(3,1), mar=c(0,4,0,2)+0.1)
plot(density(results_ScaleMove01[[1]]), col=colors[1], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=0.1")
plot(density(results_ScaleMove1[[1]]), col=colors[2], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=1")
plot(density(results_ScaleMove10[[1]]), col=colors[3], lwd=2, main="", ylab="Density", xlab="", ylim=c(0,24), xlim=c(0,1), xaxt="n")
legend("topleft", legend="scaling parameter=10")

```

### Could we have sampling problems? Investigating the number of sampled values

Let's count the number of different parameter values sampled during the MCMC. If the MCMC does not move well through parameter space, it will either:
- rarely accept changes because the proposal moves too far, therefore the MCMC always samples the same values
- always accept changes because the proposal moves very close, therefore the MCMC moves very slowly through parameter space.

```{r}
# Number of different values sampled
length(unique(results_ScaleMove01[[1]]))
length(unique(results_ScaleMove1[[1]]))
length(unique(results_ScaleMove10[[1]]))

# Ratio of the number to the total number of iterations
length(unique(results_ScaleMove01[[1]]))/num_iterations
length(unique(results_ScaleMove1[[1]]))/num_iterations
length(unique(results_ScaleMove10[[1]]))/num_iterations
```

The first scaling factor produces moves that are very often accepted, probably too often; the second scaling factor seems quite good, and the last one produces moves that are too rarely accepted. In the latter case, it is likely that in lots of cases, values outside of $[0,1]$ have been proposed and rejected.

We could look for even better sampling characteristics: numerical studies have shown that the optimal asymptotic acceptance rate is 0.234 for multi-dimensional models, and 0.44 for unidimensional models. However, as long as a move has an acceptance probability between $0.15$ and $0.5$, one can consider that the chain mixes well, because it is at least $80%$ efficient compared to optimality (Yang and Rosenthal, 2016, http://probability.ca/jeff/ftpdir/jinyoung1.pdf).

### A better scaling parameter for the scaling move
So let's run another chain, with a scaling parameter that should give better acceptance rates.

```{r}
results_ScaleMove04 <- MHMCMC_ScaleMove(sum(data), length(data), a, b, num_iterations, 0.4)
```


```{r}
length(unique(results_ScaleMove04[[1]]))
length(unique(results_ScaleMove04[[1]]))/num_iterations
```

Let's compare to our Metropolis MCMC which was proposing values uniformly over $[0,1]$:

```{r}
length(unique(result[[1]]))
length(unique(result[[1]]))/num_iterations
```

So the MCMC with a scaling move can have a much better sampling rate than our Metropolis MCMC.


### Investigating autocorrelation
However, it could be navigating the parameter space not very efficiently: subsequent samples could be very correlated. To look into this, we can compute the autocorrelation in our samples using the $acf$ function, which computes the autocorrelation after 1, 2, 3, ... n iterations.


```{r}
par(mfrow=c(5,1), mar=c(0,4,0,2)+0.1)
acf(results_ScaleMove01[[1]], xaxt="n")
legend("topright", "Scaling 0.1")
acf(results_ScaleMove04[[1]], xaxt="n")
legend("topright", "Scaling 0.4")
acf(results_ScaleMove1[[1]], xaxt="n")
legend("topright", "Scaling 1")
acf(results_ScaleMove10[[1]], xaxt="n")
legend("topright", "Scaling 10")
acf(result[[1]])
legend("topright", "Uniform")

```


The scaling move with scaling parameter 0.4 seems to have the best behavior. The autocorrelation with the scaling parameter 0.1 was a bit worse than that with the parameter 1.0, which confirms that the moves were too narrow.


### The effective sample size
It is possible to summarize both the number of different values and the autocorrelation with one single number, the Effective Sample Size (ESS). This is obtained with the following formula:

$$
ESS = \frac{number~of~samples}{1+2\sum_{k=1}^{\infty}\rho(k)}
$$

where $\rho(k)$ is the autocorrelation plotted in the $acf$ plots above.

Let's build an ESS function:
```{r}
ess <- function (values) {
  x <- acf(values, plot=F)
  list_of_rhos <- x$acf[2:length(x$acf)]
  return (length(values) / (1 + 2*sum(list_of_rhos)) )
}
```

And we use it on all our chains:

```{r}
ess(results_ScaleMove01[[1]])
ess(results_ScaleMove04[[1]])
ess(results_ScaleMove1[[1]])
ess(results_ScaleMove10[[1]])
ess(result[[1]])
```

This confirms that the scaling move with scaling parameter 0.4 is much better than the other MCMCs.

Therefore our best estimate of the distribution should be:
```{r}
par(mfrow=c(1,1), mar=c(4,4,2,2)+0.1)
plot(density(results_ScaleMove04[[1]]), col="orange", lwd=2, main="", ylab="Density", xlab="Probability of heads", ylim=c(0,24), xlim=c(0,1))
legend("topleft", legend="Distribution obtained with the scaling move, lambda=0.4", lwd=2, col="orange")
```



