1 Introduction

Researchers often run numerous trials to test the performance of an experiment; repeated trials help reveal whether an experiment's results are driven by randomness or by a genuine effect. An experiment that generates the desired result in a large proportion of trials is less likely to be driven by randomness than one that generates the desired result only a few times. This is the premise on which the Monte Carlo method was built. The Monte Carlo method has become the bedrock of evaluating the outcomes of different events in numerous fields. However, the simulations generated by the Monte Carlo method are prone to error, which can give researchers the wrong idea of how an experiment performs. This paper explores the errors that arise when the Monte Carlo method is used to run experiments. The error rates will be compared against the number of trials to delineate the relationship between error rate and number of trials.

1.1 Background

There are two types of errors: absolute error and relative error. Absolute error is the difference between the actual value and the estimated value, while relative error is the absolute error divided by the actual value; relative error therefore measures how far off the estimated value is in proportion to the actual value. For example, suppose LeBron James drops 70 points on Monday and 50 points on Tuesday. The absolute error of Tuesday's points with respect to Monday is \[|50-70| = 20,\] and the relative error with respect to Monday is \[|50-70|/70 \approx 0.286.\] The code below shows how the absolute and relative errors were generated for a wide range of trial values. Each trial value was combined with five different probability values (0.01, 0.05, 0.1, 0.25, and 0.5) to estimate the number of successes. Dividing the estimated number of successes by the number of trials yields the estimated probability, which can then be compared with the actual probability to generate both the absolute and relative errors.
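The two error measures can be expressed directly in R. The snippet below is a minimal sketch of the example above; the variable names (actual, estimate) are chosen purely for illustration and do not appear in the analysis code.

actual <- 70                              # Monday's points (the reference value)
estimate <- 50                            # Tuesday's points (the estimate)
absolute_error <- abs(estimate - actual)  # |50 - 70| = 20
relative_error <- absolute_error / actual # 20 / 70, approximately 0.286
absolute_error
## [1] 20
relative_error
## [1] 0.2857143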

2 Methods

# Loading the tidyverse packages (dplyr, tidyr, stringr, ggplot2) used below
library(tidyverse)

# Creating empty vectors that will be used to record important variables later
n <- rep(0, 14)
two_to_power <- rep(0, 14)
p_0_01 <- rep(0, 14)
p_0_05 <- rep(0, 14)
p_0_10 <- rep(0, 14)
p_0_25 <- rep(0, 14)
p_0_50 <- rep(0, 14)

# Using a for loop to run numerous simulations for different combinations
for (i in 1:14) {
  m <- i + 1
  n[i] <- m
  two_to_power[i] <- 2^m
  # 1000 simulated binomial observations were run for each combination. Then,
  # the mean of the outputs was obtained and recorded.
  p_0_01[i] <- mean(rbinom(1000, 2^m, 0.01))
  p_0_05[i] <- mean(rbinom(1000, 2^m, 0.05))
  p_0_10[i] <- mean(rbinom(1000, 2^m, 0.10))
  p_0_25[i] <- mean(rbinom(1000, 2^m, 0.25))
  p_0_50[i] <- mean(rbinom(1000, 2^m, 0.50))
}

# Creating a data frame that stores the output of the for loop iteration into columns
df <- data.frame(number = n, power_2 = two_to_power, p_0_01 = p_0_01,
                 p_0_05 = p_0_05, p_0_10 = p_0_10, p_0_25 = p_0_25,
                 p_0_50 = p_0_50)

# Wrangling the data set into a more readable long format: the five probability
# columns are gathered into a single "actual" column, with the simulated means in "pred"
df <- df %>% 
  gather(p_0_01:p_0_50, key = "actual", value = "pred")
df$actual <- str_replace_all(df$actual, "p_", "")
df$actual <- str_replace_all(df$actual, "_", ".")
df$actual <- as.double(df$actual)
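As a side note, gather() has been superseded in newer releases of tidyr; assuming tidyr 1.0 or later is available, the same reshaping could be written with pivot_longer() in place of the gather() call above (run one or the other, not both):

df <- df %>% 
  pivot_longer(p_0_01:p_0_50, names_to = "actual", values_to = "pred")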

# Creating two new columns for absolute error and relative error
df <- df %>% 
  mutate(pred_prob = pred/power_2) %>% 
  mutate(absolute = abs(pred_prob - actual), relative = abs(pred_prob-actual)/actual)
df <- na.omit(df) # Removing NA values in the data set
head(df)
##   number power_2 actual  pred   pred_prob    absolute  relative
## 1      2       4   0.01 0.036 0.009000000 0.001000000 0.1000000
## 2      3       8   0.01 0.073 0.009125000 0.000875000 0.0875000
## 3      4      16   0.01 0.162 0.010125000 0.000125000 0.0125000
## 4      5      32   0.01 0.325 0.010156250 0.000156250 0.0156250
## 5      6      64   0.01 0.619 0.009671875 0.000328125 0.0328125
## 6      7     128   0.01 1.270 0.009921875 0.000078125 0.0078125

3 Results

df$actual <- as.character(df$actual) # converting "actual" to character so ggplot2 treats the probabilities as discrete color groups

# Plotting the absolute error chart
ggplot(df, aes(x = number, y = absolute, color = actual)) +
  geom_point(size = 2) + 
  geom_line() + 
  labs(x = parse(text = "N~(log[2]~scale)"), y = "absolute error", title = "Absolute error chart") +
  scale_x_continuous(breaks = seq(2, 15, 1), labels = unique(as.character(df$power_2))) + 
  theme_bw()

The chart above displays the absolute error on the y-axis and the number of trials, on a log2 scale, on the x-axis. Regardless of the probability value, the absolute error decreases toward approximately zero as the number of trials increases. In the leftmost region of the chart (N < 64), the higher the probability value, the higher the absolute error tends to be.
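A brief theoretical aside, added here rather than taken from the original analysis, supports this pattern: the estimated probability is the mean of N independent Bernoulli trials, so its standard deviation is \[\sqrt{\frac{p(1-p)}{N}},\] which shrinks toward zero as N grows and, for a fixed N, grows with p over the probabilities considered here (p at most 0.5), matching both observations above.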

# Plotting the relative error chart
ggplot(df, aes(x = number, y = relative, color = actual)) +
  geom_point(size = 2) + 
  geom_line() +
  labs(x = parse(text = "N~(log[2]~scale)"), y = "relative error", title = "Relative error chart") +
  scale_x_continuous(breaks = seq(2, 15, 1), labels = unique(as.character(df$power_2))) +
  theme_bw()

The chart above displays the relative error on the y-axis and the number of trials, again on a log2 scale, on the x-axis. Regardless of the probability value, the relative error decreases toward approximately zero as the number of trials increases. In the leftmost region of the chart (N < 64), however, the ordering is reversed: the higher the probability value, the lower the relative error. The reversal relative to the absolute error chart occurs because each absolute error is divided by its probability value to obtain the relative error, so the errors of the larger probabilities are scaled down by larger divisors.
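Extending the theoretical aside above (again an addition, not part of the original analysis), dividing the typical absolute error by p gives a typical relative error of \[\frac{1}{p}\sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{1-p}{pN}},\] which decreases as p increases. Small probabilities such as 0.01 therefore require far more trials to reach the same relative accuracy.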

4 Conclusion

Nothing is perfect in this world, and mistakes are as natural as breathing itself, so we should not be surprised that computer simulations also make errors. However, we should not be complacent about the errors that simulations make; we should strive to reduce their magnitude. In the case of the Monte Carlo method, the most effective way to reduce the error appears to be increasing the number of trials.
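A quick sanity check of this conclusion, written as a minimal sketch rather than as part of the original analysis (the seed and trial counts are arbitrary choices):

# Estimating p = 0.25 from a single simulated experiment at increasing trial
# counts; the absolute error shrinks as the number of trials grows.
set.seed(42) # arbitrary seed, set only so the check is reproducible
for (N in c(10^2, 10^4, 10^6)) {
  p_hat <- rbinom(1, N, 0.25) / N
  cat("N =", N, "estimate =", p_hat, "absolute error =", abs(p_hat - 0.25), "\n")
}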