A Brief Overview of Computational Statistics with an Example using Monte Carlo Methods

2024-10-31

What is computational statistics?

It is the application of computational techniques and computers in general to solve statistical problems that are difficult or impossible to solve analytically or by hand.
It typically involves numerical methods or algorithms that can estimate or simulate statistical scenarios.

How is it typically performed?

Common techniques used in computational statistics are:

Monte Carlo methods
Resampling methods
Bayesian methods
Maximum likelihood estimation
Bootstrapping

History of the Monte Carlo technique

The technique was developed by American mathematicians Stanislaw Ulam and John von Neumann while working on the Manhattan Project.
It saw application in quantum mechanics and more generally in complex system dynamics, as well as in the computation of analytic results in path integration.
Recent applications utilize Bayesian inference, and have been able to reduce system noise by orders of magnitude.

How do Monte Carlo methods work?

Unlike traditional simulation approaches, which estimate uncertainty in a known (deterministic) scenario, Monte Carlo methods use probabilistic techniques to solve deterministic problems.
Random samples are repeatedly generated from a probability distribution, and the numerical results are analyzed to make inferences.
The process involves defining a sample space/input domain, generating random inputs over that domain from some probability distribution, and dividing the results into categories to make inferences as to the frequency of those categories.

Example

Consider the following analytic question:

Compute the value of the integral \(\int_{0}^{1}\int_{0}^{1}\int_{0}^{1}e^{-(x + y + z)}dxdydz\).

We will use the Monte Carlo method to estimate the value of this integral by generating random points in the unit cube, and computing the average value of the integrand on those points. For reference, the true value of the integral is \((1 - \frac{1}{e})^{3}\), or roughly 0.25.

Example (cont.)

After generating 1000 random points, I computed the average value of \(e^{-(x_{1} + x_{2} + x_{3})}\) for each 3D point \(x = (x_{1}, x_{2}, x_{3})\) and plotted them.

integrand <- function(x) {
  exp(-sum(x))
}

monte_carlo_integration <- function(num_samples) {
  estimates <- numeric(num_samples)
  errors <- numeric(num_samples)
  for (i in 1:num_samples) {
    points <- matrix(runif(i * 3), ncol = 3) 
    values <- apply(points, 1, integrand)    
    estimates[i] <- mean(values)              
    errors[i] <- abs(estimates[i] - 0.25258)
  }
  return(list(estimates = estimates, errors = errors))
}

Example (cont.)

Here, I calculated the absolute error between the previously computed values and the true value of the integral.