This article was inspired by Bruno Rodrigues’ blog post that you can find here. His entire blog is a fantastic resource if you are into econometrics and R. Here, I attempt to provide a bit more explanations, examples and visual aids on the topic.
Today is March 12th and it’s been already a week or so since I started getting emails about events regarding \(\pi\)-day (Pi day), so I decided to join the hype bandwagon. Although these events are mostly about baking pies and other fun activities, I figured that my contribution would be slightly different. This article describes in simple terms a process for estimating the value of \(\pi\). The process is known as Monte Carlo simulations. We break down the process in three steps:
To kick things off, let’s briefly talk about the two geometric shapes, which are the basis for the entire process: the unit square and the unit circle. The unit square is simply a square whose side length is equal to \(1\)! Actually, the unit square can be conveniently thought of as being the square with the following vertices: \((0,0)\), \((0,1)\), \((1,1)\) and \((1,0)\).
(#fig:unit_square)The unit square
In the same way, the unit circle is a circle of diameter equal to \(1\). Given these definitions, it follows that the unit circle can be inscribed in the unit square.
After establishing the special relationship between the two shapes, it is now time to investigate the following: what is the ratio of the area of the unit circle to the area of the unit square?. In other words, what percentage of the unit square is covered by the unit circle? This question can be answered by using the corresponding areas formulas. If \(r\) represents the radius of the circle, we can compute the circle’s area as :
\[ A_1 = \pi r^2 \]
Also, if \(c\) is the length of one side of the square, then the area of the square is computed as:
\[ A_2 = c^{2} \]
Before we move on to compute the ratio of \(A_1\) to \(A_2\), it is important to identify the relationship between \(c\) and \(r\). We know that \(c = 2r = 1\), therefore:
\[ r = \frac{c}{2} \]
Let’s now compute the areas ratio and answer the question posed in this section:
\[ \frac{A_1}{A_2} = \frac{\pi \frac{c^{2}}{4}}{c^{2}} = \frac{\pi c^2}{4c^2} = \frac{\pi}{4} \\ A_1 = \frac{\pi}{4}A_2 \]
The answer to the question is: the unit circle is \(\frac{\pi}{4}\) times the unit square. This scaling factor is approximately equal to 0.785, which is to say that the unit circle covers about 78.5% of the unit square. At this point, I must add that this result is not restricted to the unit circle and the unit square. It is also true for any circle inscribed in a square.
Let’s now assume that you are an inexperienced archer and that the arrows you shoot hit your target at random points. Let’s say you shoot \(A\) arrows at the square and you want to find out how many arrows also landed in the circle (let the number of arrows that hit the circle be \(B\)). Based on the result in the previous section, we can confidently say that we expect:
\[ B = \frac{\pi}{4}A \]
In other words, we expect that the number of arrows that hit the circle is roughly about 78.5% (i.e. \(\frac{\pi}{4}\)) of the arrows that hit the square. We can rewrite this relationship as:
\[\begin{equation} \pi = \frac{4B}{A} \tag{3.1} \end{equation}\]
Finally, it is time to estimate the value of \(\pi\) using Monte Carlo simulations. The idea is based on what we just described. We will shoot (i.e. simulate) 10000 random arrows at the square and use the number of arrows that hit the square (i.e. \(A\)) and the number of arrows that also hit the circle (i.e. \(B\)) to compute \(\pi\). Remember that our unit square is at the origin of a coordinate system. This means that the x and y coordinates of all shot arrows are between 0 and 1 (inclusive).
This can easily be achieved by simulating 2 uniformly distributed random vectors of 10000 elements each, one for x coordinates and the other for y coordinates.
library(dplyr)
library(magrittr)
set.seed(12345)
inside_square <- tibble(
x = runif(10000),
y = runif(10000)
)
inside_square
# A tibble: 10,000 x 2
x y
<dbl> <dbl>
1 0.720904 0.244320
2 0.875773 0.689401
3 0.760982 0.869641
4 0.886125 0.981234
5 0.456481 0.569278
6 0.166372 0.164329
7 0.325095 0.0245451
8 0.509224 0.0124972
9 0.727705 0.0500600
10 0.989737 0.423860
# ... with 9,990 more rows
Each row of the data set represents a shot arrow. Now, we need to figure out a way to identify the points, which are also inside the unit circle. It turns out that if a point is inside the unit circle, then its coordinates satisfy the following condition:
\[ x^2 + y^2 < 1 \]
We now use this rule to label each point of the data set as 1
if it is in the unit circle and as 0
otherwise.
inside_square %<>%
mutate(unit_circle = if_else(x^2 + y^2 < 1, 1, 0))
inside_square
# A tibble: 10,000 x 3
x y unit_circle
<dbl> <dbl> <dbl>
1 0.720904 0.244320 1
2 0.875773 0.689401 0
3 0.760982 0.869641 0
4 0.886125 0.981234 0
5 0.456481 0.569278 1
6 0.166372 0.164329 1
7 0.325095 0.0245451 1
8 0.509224 0.0124972 1
9 0.727705 0.0500600 1
10 0.989737 0.423860 0
# ... with 9,990 more rows
Now, we have everything we need to estimate \(\pi\): \(A = 10000\) and \(B\) is equal to the number of \(1\)’s in the unit_circle
column. Using Equation (3.1), we estimate \(\pi\) in the following way:
inside_square %>%
summarize(pi_estimate = 4 * sum(unit_circle == 1) / n())
# A tibble: 1 x 1
pi_estimate
<dbl>
1 3.16320
Our result is pretty good, huh? It’s quite close to 3.14! Conducting the experiment with more data yields better results and I’ll end this article with a visualization that illustrates this fact:
inside_square %<>%
mutate(n_points = row_number(),
n_inside = cumsum(unit_circle),
pi_estimate = 4 * n_inside / n_points)
library(ggplot2)
ggplot(data = inside_square, aes(x = n_points, y = pi_estimate)) +
geom_line() +
geom_hline(yintercept = pi, color = "red") +
labs(x = "Number of simulated points", y = "Estimate of pi") +
expand_limits(y = 0) +
theme_classic()