library(tidyverse)

A common example to illustrate how Monte Carlo simulation can be used is by estimating pi. Here I will show how this can easily be done with a few lines of code using R.

The idea behind this simulation is to generate uniform random (x, y) points within the domain. When estimating pi, the domain is the two-dimensional Cartesian coordinate system of the bounding square of a circle. For each generated point we calculate the distance to the origin to determine if it falls within the circle. Then we aggregate the results and approximate the value of \(\pi\) by taking the ratio of points that fall within the circle and the total number of points and multiply by 4.

The area of a circle is calculated by \(\pi*r^2\), and the area of the bounding square is \((2r)^2 = 4r^2\). Dividing the area of the circle by the area of the square we get the ratio of the two areas:

\[ \frac{\text{area of circle}}{\text{area of square}}=\frac{\pi*r^2}{4*r^2}=\frac{\pi}{4} \] This means we can estimate \(\pi\) using the formula:\[\pi\approx4*\frac{\text{simulated points within circle}}{\text{total number of simulated points}}\]

By simulating random numbers within this square, we get approximated values for the area of the circle and the area of the square. Hence, we can estimate \(\pi\) by taking the ratio of simulated points that fall within the circle and the total number of simulated points, and multiply by 4.

To identify which points fall within the circle we use the equation of the circle: \[(x-a)^2+(y-b)^2 = r^2,\text{ where } (a, b) \text{ is the circle center}\] In our example the circle center is \((0, 0)\) and the radius is 1, so simulated points that satisfy \(\sqrt{x^2+y^2}\le1\) are within the circle.

Writing a Function for the Simulation

Our function will take two arguments, seed for reproducibility, and iterations so that we can choose how many simulated points we want to generate.

The function goes through the following steps:

estimate_pi <- function(seed = 28, iterations = 1000){
    # set seed for reproducibility
    set.seed(seed)
    
    # generate the (x, y) points
    x <- runif(n = iterations, min = 0, max = 1)
    y <- runif(n = iterations, min = 0, max = 1)
    
    # calculate 
    sum_sq_xy <- sqrt(x^2 + y^2) 
    
    # see how many points are within circle
    index_within_circle <- which(sum_sq_xy <= 1)
    points_within_circle = length(index_within_circle)
    
    # estimate pi
    pi_est <- 4 * points_within_circle / iterations
    return(pi_est)
}

Calling the simulate_pi function will return the approximated value of \(\pi\) for the seed and number of points passed to its arguments. Please note that by not having a seed in the function, the simulation will yield different values of the estimation each run due to different random numbers being generated.

# get estimated value of pi for 10,000 points
estimate_pi(seed = 28, iterations = 10000)
## [1] 3.1584

Increased Number of Generated Points Will Improve Estimation

We can estimate pi for different number of generated points in one call using sapply.

Here we get the estimated values for \(\pi\) from 10, 100, 1,000, 10,000, 100,000 and 1,000,000 points.

no_of_iterations <- c(10, 100, 1000, 10000, 100000, 1000000)
res <- sapply(no_of_iterations, function(n)estimate_pi(iterations = n))
names(res) <- no_of_iterations
res
##       10      100     1000    10000    1e+05    1e+06 
## 2.400000 3.160000 3.184000 3.158400 3.138760 3.143816

As we can see, the estimated value of \(\pi\) gets closer to \(\pi\) with larger number of points. This illustrates one important aspect of Monte Carlo simulations, using only a small number of random points will generally give a poor approximation. The approximation will, on average, improve as the number of points increases.

Plot the Estimated Pi Values

To illustrate this we can plot the estimated value of \(\pi\) to see how it changes as the number of points increases. In order to make this plot we need to store each estimated value in the data frame. We can do this by modifying our function as below.

# function estimating pi and storing points and each estimation of pi in a dataframe
est_pi_data <- function(seed=28, iterations = 1000) {
    set.seed(seed)
    
    # store generated points in a data frame
    df <- data.frame(x <- runif(n = iterations, min = 0, max = 1),
                     y <- runif(n = iterations, min = 0, max = 1))
    
    # add a column to index the number of iterations
    df$iteration <- 1:iterations
    
    # add column to identify if point falls within circle
    df$points_within_circle <- ifelse(sqrt(x^2 + y^2) <= 1, 1, 0) 
    
    # estimate pi
    df$pi_est <- 4 * cumsum(df$points_within_circle) / df$iteration
    
    return(df)
}

Now we have a function that generates the data we need. Let’s generate the data and plot the approximated values for \(\pi\) from the first point up to 100,000 points.

# generate estimation data for 100,000 points
pi_data <- est_pi_data(seed = 28, iterations = 100000)

# plot showing estimated pi as number of points increases
ggplot(pi_data, aes(x = iteration, y = pi_est)) +
            geom_line(col = "blue") +
            geom_hline(yintercept = pi) +
            ylim(c(3, 3.3)) +
            labs(title = expression(paste("Approximation of ", pi)),
                 x = "number of points",
                 y = expression(paste("estimated value of ",pi)))