Assessment 2

Monte Carlo algorithms project: DNA methylation of cytosines

Zane Brackley z5541280

knitr::opts_chunk$set(echo = TRUE)

Question 1

Plot density f(x)

f <- function(x) {
     (1/3) * dbeta(x, 1, 5) +
     (1/3) * dbeta(x, 3, 5) +
     (1/3) * dbeta(x, 10, 5)
}

x <- seq(0,1, length.out = 100000)
y <- f(x)

plot(x, y, type = "l", col="blue",
     lwd = 2, main = "Mixture of Beta Distributions",
     xlab = expression(theta), ylab = "Density")

I plotted the density function f(x) where:
\[ f(x) = \frac{1}{3}\operatorname{Be}(\alpha_1 = 1,\beta_1 = 5) + \frac{1}{3}\operatorname{Be}(\alpha_2 = 3,\beta_2 = 5) + \frac{1}{3}\operatorname{Be}(\alpha_3 = 10, \beta_3 = 5) \]

Question 2

Implement accept/reject algorithm

set.seed(1234)

K <- max(y)
x_K <- x[which.max(y)]

K
## [1] 1.666667
x_K
## [1] 0
N <- 10000
x_proposal <- runif(N, 0, 1)

u <- runif(N, 0, 1)

accept_prob <- f(x_proposal) / K

accepted_values <- x_proposal[u <= accept_prob]

length(accepted_values)
## [1] 5978

Using K <- max(y), we find the largest value of the density values stored in y. This value becomes the upper bound for the rejection sampling. The maximum density occurs at x = 0, which is found using x[which.max(y)]. Using the given function, at x = 0, the first beta density has value 5, while the other two beta densities have a value of 0, giving us,

\(f(0) = \frac{1}{3}(5) + \frac{1}{3}(0) + \frac{1}{3}(0) = \frac{5}{3} \approx 1.6667\).

Therefore, K = 1.6667 and the corresponding value is x = 0.

We then simulate 10,000 candidate values, stored in x_proposal, from a Uniform(0,1) distribution.

A corresponding value is then generated in u which is used to decide whether to accept or reject the proposed values.

accept_prob calculates the acceptance probability for every proposed value. Since the proposal distribution is Uniform(0,1), its density is \(q(x) = 1\), therefore giving us a rejection sampling acceptance probability of \(\frac{f(x)}{K}\). We then compare the values in u with accept_prob and store the resulting accepted values in accepted_values.

Question 3

Compute the observed acceptance rate and compare it with the theoretical one

observed_acceptance_rate <- length(accepted_values) / N
observed_acceptance_rate
## [1] 0.5978
theoretical_acceptance_rate <- 1 / K
theoretical_acceptance_rate
## [1] 0.6

The observed acceptance rate is calculated as the number of accepted values divided by the total number of proposed values:

\[ \text{Observed acceptance rate} = \frac{\text{number of accepted values}}{N} \]

The theoretical acceptance rate for rejection sampling is:

\[ \text{Theoretical acceptance rate} = \frac{1}{K} \]

This is because the proposal distribution is Uniform(0,1), so q(x) = 1 on the interval [0,1], and the target density integrates to 1. Since \(K \approx 1.6667\),

\[ \frac{1}{K} = \frac{1}{1.6667} \approx 0.6 \]

Therefore, the theoretical acceptance rate is approximately 0.6. The observed acceptance rate should be close to this value, although it may not be exactly equal due to random simulation.

Question 4

Implement an importance sampling algorithm.

theta_imp <- x_proposal

q_theta <- dunif(theta_imp, 0, 1)

weights <- f(theta_imp) / q_theta
norm_weights <- weights / sum(weights)

mean_imp <- sum(theta_imp * norm_weights)
var_imp <- sum((theta_imp - mean_imp)^2 * norm_weights)

mean_imp
## [1] 0.4048396
var_imp
## [1] 0.06148562
head(weights)
## [1] 1.3076239 1.2601404 1.2414027 1.2615142 0.3347256 1.2785976

For the importance sampling algorithm, I used the same 10,000 values generated from the Uniform(0,1) proposal distribution in the accept/reject algorithm. These values are stored in theta_imp.

The importance weights are calculated as

\[ w_i = \frac{f(\theta_i)}{q(\theta_i)} \]

where \(f(\theta_i)\) is the target density and \(q(\theta_i)\) is the proposal density. Since the proposal distribution is Uniform(0,1), we have \(q(\theta_i) = 1\) for \(\theta_i \in [0,1]\). Therefore, the weights simplify to

\[ w_i = f(\theta_i). \]

The weights are then normalised by dividing each weight by the sum of all weights.

Question 5

Compare the densities

density_accept_reject <- density(
  accepted_values,
  from = 0,
  to = 1,
  cut = 0
)

bw_imp <- bw.nrd0(theta_imp)

density_importance <- density(
  theta_imp,
  weights = norm_weights,
  bw = bw_imp,
  from = 0,
  to = 1,
  cut = 0
)

plot(x, y, type = "l", col = "blue", lwd = 2,
     ylim = c(0, 2), main = "Comparison of Target and Estimated Densities",
     xlab = expression(theta), ylab = "Density")

lines(
  density_accept_reject, col = "red",
  lwd = 2, lty = 2
)

lines(
  density_importance, col = "green",
  lwd = 2, lty = 3
)

legend(
  "topright", 
  legend = c("Target Density", "Accept/Reject Density", "Importance Sampling Weighted Density"), 
  col = c("blue", "red", "green"), 
  lwd = 2, lty = c(1,2,3)
)

The blue curve shows the true target density. The red dashed curve shows the density estimate from the values accepted by the accept/reject algorithm. The green dotted curve shows the weighted density estimate from the importance sampling algorithm, where the normalised importance weights are included using the weights argument in the density() function.

Both simulated density estimates are close to the target density, although they are not exactly identical because they are based on random simulation.

Question 6

Which algorithm best approximates the target? Which would you use? Why?

From the plot, the accept/reject algorithm appears to approximate the target density slightly better in this simulation. The density of the accepted values closely follows the shape of the target density, and the observed acceptance rate is approximately 0.5978, which is very close to the theoretical acceptance rate of 0.6. This shows that the accept/reject algorithm is working efficiently for this target distribution.

The importance sampling algorithm also provides a reasonable approximation, and it has the advantage of using all 10,000 simulated values. However, it relies on the importance weights to correct for the difference between the proposal distribution and the target distribution. This can make the approximation more sensitive to the behaviour of the weights.

I would use the accept/reject algorithm for this problem because the acceptance rate is reasonably high, the method is simple, and the accepted values are direct samples from the target distribution. Importance sampling is still useful, especially for estimating expectations, but for simulating values from this target density, accept/reject sampling is more direct.