class: center, middle, inverse, title-slide .title[ # Topic 2, Assignment 1: Rejection Sampling ] .author[ ### Jacob, Kasper & Frigg ] .institute[ ### University of Copenhagen ] .date[ ### 2024/26/09 ] --- ### Setup Our goal is to sample from a probability distribution `\(f\)` on `\([0,\infty)\)`, where `\begin{equation*} f(y) \propto \prod_{i = 1}^{100}\exp(yz_ix_i - \exp(yx_i)) =: h(y), \qquad y \geq 0. \end{equation*}` We will sample from `\(f\)` using rejection sampling, i.e. we want to determine envelopes `\(g/\alpha'\)` such that `\(\alpha'f \leq g\)` (note that it does not matter if we plot one or the other, it only affects the scaling on the second axis). However, we use the unnormalized version of `\(f\)` and instead consider `\(\alpha' h \leq g\)`, where we can sample from `\(g\)`. We load the data: ``` r pois_data <- read_csv("Assignment 2/data/poisson.csv", show_col_types = FALSE) x <- pois_data$x z <- pois_data$z ``` --- ### Implementing `\(h\)` First, we implement two versions of `\(h\)` ``` r h_product = function(y, x = pois_data$x, z = pois_data$z){ sapply(y, function(y) prod(exp(y * z * x - exp(y *x))) * (y >= 0)) } #maybe consider doing vapply/mapply or something like that instead of sapply h_sum = function(y, x = pois_data$x, z = pois_data$z){ sapply(y, function(y) exp(sum(y * z * x - exp(y *x))) * (y >= 0)) } ``` --- ### Implementing `\(h\)` We test that they compute the same thing ``` r test_that("The two versions of h match", { expect_equal( h_product(seq(0, 1, length.out = 1e4)), h_sum(seq(0, 1, length.out = 1e4))) }) ``` ``` ## Test passed 😸 ``` --- ### Implementing `\(h\)` And we also plot the function ``` r ggplot() + geom_function(fun = h_product) + xlab("y") + ylab("density") ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-4-1.png" width="720" /> --- ### Choosing implementation of `\(h\)` As we need to calculate `\(h\)` a lot of times we perform a benchmark already and choose an implementation based on the result. We do not test how the the implementations scale with size since this is not relevant (most of the time) for our purposes: ``` r bench::mark( h_product(runif(1000)), h_sum(runif(1000)) ) %>% autoplot() ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-5-1.png" width="720" /> ``` r h <- h_sum ``` --- ### Method for finding `\(\alpha'\)` We will use the formula in the notes, that is `\begin{equation*} \alpha' = \inf_{x: q(x)>0} \frac{p(x)}{q(x)}. \end{equation*}` Where q is our unnormalized target distribution (h) and p is our proposal. If `\(\alpha' > 0\)` we can use that, otherwise `\(p\)` cannot be used to construct an envelope. We minimize the fraction, but only over the interval `\((0,1)\)`, as all of the support of `\(h\)` is there. ``` r alpha_opt <- function(p, q = h, interval = c(0,1), ...){ val <- optimize(f = function(y) p(y, ...) / q(y), interval = interval, maximum = F)$objective #we take objective as we are not taking argmin if(val > 0){ return(val) } else{ stop("alpha_prime is not strictly greater than 0, \n choose a different proposal p") } } ``` --- ### Creating envelopes First we consider the case where `\(g\)` is the standard guassian density. Using the method described on last slide, we find an alpha and plot the corresponding envelope together with the difference between the target and the proposed envelope: <img src="presentation_compstat_files/figure-html/unnamed-chunk-7-1.png" width="720" /> --- ### Finding optimal alpha' This envelope is not very tight, and we suspect that we have a low acceptance rate. To estimate this, we note that if `\(\exists\alpha>0\)` such that `\(\alpha f \leq g\)`, where `\(f\)` and `\(g\)` are densities, then `\(\alpha\)` is the acceptance probability. If either `\(f\)` or `\(g\)` are not densities, we can simply scale by the integral. Obviously, we want the acceptance probability to be large, that is we do not throw away too many samples. We implement an estimator of the accept probability. We first make sure we have an envelope by using `\(\texttt{alpha_opt}\)` ``` r #right now it assumes that the proposal is a true density, fix with integral est_accept_prob <- function(target = h, proposal, interval = c(0,1), ...){ alpha_prime <- alpha_opt(p = function(x) proposal(x), q = function(x) target(x)) alpha_prime * integrate(function(x) target(x), -Inf, Inf)$value / integrate(function(x) proposal(x), -Inf, Inf)$value #maybe we should divide by integral of proposal! } ``` We estimate the accept probability using the standard gaussian envelope and find that this is very low. ``` r est_accept_prob(proposal = function(x) dnorm(x)) ``` ``` ## [1] 0.05506426 ``` --- ### The most optimal envelope We want to maximize the acceptance rate. We optimize over all means and standard errors and plot the corresponding envelope and the differences ``` r objective_function <- function(par, target, interval = c(0,1)){ mu <- par[1] sd <- par[2] -est_accept_prob(target, proposal = function(y) dnorm(y, mean = mu, sd = sd)) } (opt_par <- optim(par = c(0.2, 0.2), fn = objective_function, target = h, interval = c(0,1))$par) ``` ``` ## [1] 0.23982832 0.06151772 ``` --- ### The most optimal envelope We view the result of using a gaussian envelope with the estimated mean and sd: <img src="presentation_compstat_files/figure-html/unnamed-chunk-19-1.png" width="720" /> --- ### Implementing rejection sampling We create an S3 object ``` r gaussian_envelope2 <- function(target, proposal = "tight", interval = c(0,1), ...){ #proposal can either be tight, target_based or standard #if tight, then fit the tightest envelope #if target_based, use mean and sd of target #if standard, use standard gaussian envelope if(proposal == "tight"){ params <- opt_par <- optim(par = c(0.2, 0.2), fn = objective_function, target = h, interval = c(0,1))$par proposal <- function(x) dnorm(x, opt_par[1], opt_par[2]) } else if(proposal == "target_based"){ params <- unlist(mean_sd_fun(h)[c("mean_target", "sd_target")]) proposal <- function(x) dnorm(x, params[1], params[2]) } else if(proposal == "standard"){ params <- c(0, 1) #mean = 0, sd = 1 proposal <- function(x) dnorm(x) } alpha_prime <- alpha_opt(p = proposal, q = target) envelope <- function(x) proposal(x) / alpha_prime est_accept_prob <- est_accept_prob(target = target, proposal = proposal) target_dens <- function(x) target(x) / mean_sd_fun(target)[[1]] #normalizes the target structure( list( target = target, target_dens = target_dens, proposal = proposal, params = params, envelope = envelope, alpha_prime = alpha_prime, est_accept_prob = est_accept_prob ), class = "gaussian_envelope2" ) } envelopes2 <- list("gaussian" = gaussian_envelope2) ``` --- ### Implementing rejection sampling Now we implement a generic function that performs rejection sampling from the gaussian_envelope2 ``` r sample_from_envelope <- function(envelope, ...){ UseMethod("sample_from_envelope") } sample_from_envelope.gaussian_envelope2 <- function(envelope, n, ...){ samples <- c() #n is sample size we want counter <- 0 #a counter to see how many times we have to sample so we can calculate the accept rate index_counter <- 1 #counter for the index params <- envelope$params while(length(samples) < n){ Y <- rnorm(1, mean = params[1], sd = params[2]) #sample from envelope U <- runif(1) #sample uniform to check if reject or not if(U <= envelope$target(Y) / envelope$envelope(Y)){ #we use the equation about unnormalized densities from CSwR samples[index_counter] <- Y index_counter <- index_counter + 1 } counter <- counter + 1 } calc_accept_prob <- n / counter structure( list( samples = samples, number_of_discarded_samples = counter - n, est_accept_prob= envelope$est_accept_prob, calc_accept_prob = calc_accept_prob, target = envelope$target, target_dens = envelope$target_dens, proposal = envelope$proposal, envelope = envelope$envelope, alpha_prime = envelope$alpha_prime, class_type = envelope #this is a bit odd right now... ), class = "rejection_sampling2" ) } ``` --- ### Implementing rejection sampling Finally we implement a single function that performs rejection sampling ``` r rejection_sampler2 <- function(target, envelope, n, ...) { envelope_obj <- envelopes2[[envelope]](target, ...) sample_result <- sample_from_envelope(envelope_obj, n, ...) sample_result$class_type <- envelope_obj #only so we can plot in the right way return(sample_result) } ``` --- ### We test the implementation ``` r samples2 <- rejection_sampler2(h, "gaussian", 1e4) plot(samples2) # we have implemted a plot method for our object ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-26-1.png" width="720" /> --- ### Profiling implementation ``` r source("sourcecode_pres2.R") profvis::profvis(rejection_sampler2(h, "gaussian", 1e4)) ```
--- ### The adaptive envelope We now want to implement an adaptive envelopes. We consider the approach in CSwR. For a given target `\(h\)`, we also need to compute its derivative. We can either do this analytically or numerically. We will consider the first approach, but it can be done numerically using for instance the `\(\texttt{numDeriv}\)`-package in `\(\texttt{R}\)`. Recall that `\begin{equation*} h(y) = \exp\left(\sum_{i = 1}^{100}x_i y z_i - \exp(y x_i)\right), \qquad y\geq 0, \end{equation*}` and thus `\begin{align*} h'(y) &= \exp\left(\sum_{i = 1}^{100}x_i y z_i - \exp(y x_i)\right) \sum_{i = 1}^{100} \left(x_i z_i - x_i \exp(y x_i)\right), \qquad y\geq 0 \\ &= h(y) \sum_{i = 1}^{100} \left(x_i z_i - x_i \exp(y x_i)\right), \qquad y\geq 0. \end{align*}` ``` r h_prime <- function(y, x = pois_data$x, z = pois_data$z){ sapply(y, function(y) h(y) * sum(x * z - x * exp(y * x))) } ``` --- ### Implemeting the adaptive envelope We now implement an adaptive envelope class ``` r adaptive_envelope2 <- function(target, target_deriv, breaks){ #consider implementing a stop-criterion if the target-density at the min and #max of breaks is too close to 0 #the envelope should be where the support of the target is m <- length(breaks) #number of breaks we make a <- target_deriv(breaks) / target(breaks) b <- log(target(breaks)) - a * breaks z <- numeric(m + 1) #we index starting at 0 z[1] <- -Inf #this could be something else, maybe 0 z[m + 1] <- Inf #same as above - maybe 1 z[2:m] <- (b[2:m] - b[1:(m - 1)]) / (a[1:(m - 1)] - a[2:m]) #fill the rest of z-vector R <- 1/a * exp(b) * (exp(a * z[2:(m + 1)]) - exp(a * z[1:m])) #proper index R with the z's Q <- cumsum(c(0, R)) #sequentially add more R's to Q d <- Q[m + 1] #take the maximum Q / sum(R) log_envelope <- function(y){ # creates the linear envelope on the interval i = which.max(y < z) - 1 a[i] * y + b[i] #this is the V function in the notes } sample_func <- function(q){ # creates the sample for the proposal on the given interval i = which.max(d * q < Q) - 1 log(a[i] * exp(-b[i]) * (d * q - Q[i]) + exp(a[i] * z[i])) / a[i] #solution to the equation we seek to solve #maybe these should be i-1 for all but Q and z? } envelope <- function(y){ sapply(y, FUN = function(y) exp(log_envelope(y))) #the envelope is actually exp(V) } sample_function <- function(q){ sapply(q, FUN = function(q) sample_func(q)) * (abs(q) < 1) #maybe multiply by (abs(q) < 1) so that it only takes inputs in (0,1) or make user input } est_accept_prob <- est_accept_prob(target = target, proposal = envelope) target_dens <- function(x) target(x) / mean_sd_fun(target)[[1]] structure( list( target = target, target_dens = target_dens, breaks = breaks, est_accept_prob = est_accept_prob, envelope = envelope, #envelope sample_function = sample_function #proposal ), class = "adaptive_envelope2" ) } ``` --- ### Implemeting the adaptive envelope We implement a way of sampling from this class: ``` r sample_from_envelope.adaptive_envelope2 <- function(envelope, n, ...){ samples <- numeric(n) counter <- 0 index_counter <- 1 while(index_counter < n){ U1 <- runif(1) U2 <- runif(1) Y <- envelope$sample_function(U2) #we sample from this new one if(U1 <= envelope$target(Y) / envelope$envelope(Y)){ samples[index_counter] <- Y index_counter <- index_counter + 1 } counter <- counter + 1 } structure( list( samples = samples, calc_accept_prob = n / counter, number_of_discard_samples = counter - n, #also print estimated accept_rate? target = envelope$target, target_dens = envelope$target_dens, envelope = envelope, class_type = envelope ), class = "rejection_sampling2" ) } ``` --- ### Test of the adaptive envelope 2 breaks ``` r samples2_adap <- rejection_sampler2(target = h, target_deriv = h_prime, envelope = "adaptive", n = 1e4, breaks = c(0.2, 0.3)) plot(samples2_adap) ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-33-1.png" width="720" /> --- ### Test of the adaptive envelope 6 breaks ``` r samples2_adap2 <- rejection_sampler2(target = h, target_deriv = h_prime, envelope = "adaptive", n = 1e4, breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.5)) plot(samples2_adap2) ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-34-1.png" width="720" /> --- ### Benchmarking (in sample size) First we see how the runtime and memory allocation is affected by the number of samples we want from `\(f\)`. ``` r benchmark_samplesize <- bench::press( n_test = 2^(10:15), { bench::mark( gaussian = rejection_sampler2(target = h, envelope = "gaussian", n = n_test), adaptive = rejection_sampler2(target = h, target_deriv = h_prime, envelope = "adaptive", breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.5), n = n_test), check = F ) } ) %>% dplyr::select("expression", "n_test", "min", "median", "mem_alloc") ``` --- ### Benchmarking <img src="presentation_compstat_files/figure-html/unnamed-chunk-36-1.png" width="720" /> --- ### Benchmarking (in number of breakpoints) We could be interested in seeing how the run time is affected when increasing the number of breakpoints for a fixed sample (with the adaptive envelope) ``` r benchmark_breakpoints <- bench::press( length = c(5, 20, 50), { bench::mark( gaussian = rejection_sampler2(target = h, envelope = "gaussian", n = 1e4), adaptive = rejection_sampler2(target = h, target_deriv = h_prime, envelope = "adaptive", breaks = seq(0, 0.5, length.out = length), n = 1e4), check = F ) } ) %>% dplyr::select("expression", "length", "min", "median", "mem_alloc") ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-38-1.png" width="720" /> --- ``` r benchmark_results <- bench::mark( gaussian = rejection_sampler2(h, "gaussian", 1e4), adaptive_5 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 5)), adaptive_10 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 10)), adaptive_20 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 20)), adaptive_50 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 50)), iterations = 50, check = FALSE ) bench_plot <- benchmark_results %>% mutate(expression = as.character(expression)) bench_plot ``` ``` ## # A tibble: 5 × 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <chr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 gaussian 179ms 188ms 4.96 45.3MB 14.8 ## 2 adaptive_5 283ms 290ms 3.38 20.4MB 17.3 ## 3 adaptive_10 237ms 242ms 4.04 17.1MB 17.4 ## 4 adaptive_20 232ms 237ms 4.04 16.7MB 16.8 ## 5 adaptive_50 235ms 243ms 4.00 21.7MB 16.6 ``` --- ### Benchmarking (in number of breakpoints) ``` r microbenchmark::microbenchmark( gaussian = rejection_sampler2(h, "gaussian", 1e4), adaptive_5 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 5)), adaptive_10 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 10)), adaptive_20 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 20)), adaptive_50 = rejection_sampler2(h, "adaptive", 1e4, target_deriv = h_prime, breaks = seq(0, 0.5, length.out = 50)), times = 50 ) %>% autoplot() ``` <img src="presentation_compstat_files/figure-html/unnamed-chunk-40-1.png" width="720" /> --- ### Profiling profiling on the sampling from the adaptive?