Section 5.1, Problem 3

Applying the Monte Carlo method with \(n = 5000\) points, the following simulation uses the uniform distribution on \([0, 1]\) to generate the \(x\) and \(y\) values:

n <- 5000  # number of simulations
Q <- 0     # counter for number in circle

for (i in 1:n) {
  # generate random numbers
  x <- runif(1, 0, 1)
  y <- runif(1, 0, 1)
  # if inside circle, increment counter
  if (x^2 + y^2 <= 1) {
    Q <- Q + 1
  }
}

# calculate ratio of areas
ratio <- Q / n
# estimate pi
pi_est <- ratio * 4

The estimated value produced by the simulation is \(\pi \approx 3.1616\).
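For comparison, the same estimator can be written without an explicit loop; a vectorized sketch in base R (not part of the original solution, but equivalent in logic):

# vectorized version of the same estimator
x <- runif(n)
y <- runif(n)
pi_est_vec <- 4 * mean(x^2 + y^2 <= 1)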

Section 5.2, Problem 1

A function is written to generate the numbers using the middle-square method:

mid_sq <- function(seed, n) {
  # start sequence with seed
  gen <- seed
  # get length of seed
  len <- nchar(as.character(seed))
  # determine which digits to grab from squared numbers
  dig_start <- (2 * len) / 4 + 1
  dig_end   <- 3 * (2 * len) / 4
  for (i in 1:n) {
    # square last number generated
    new_gen <- gen[length(gen)]^2
    # convert new number to string to subset
    new_gen <- as.character(new_gen)
    # ensure number is of proper length
    lead_zero <- len * 2 - nchar(new_gen)
    new_gen <- paste0(paste(rep('0', lead_zero), collapse = ''), new_gen)
    # get middle digits
    new_gen <- substr(new_gen, dig_start, dig_end)
    # add new number to array of generated numbers
    gen <- c(gen, as.numeric(new_gen))
  }
  gen
}
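The sequences in the table below can be reproduced with calls of the following form (seeds and lengths inferred from the table; row 0 is the seed itself):

mid_sq(1009, 10)    # sequence a
mid_sq(653217, 20)  # sequence b
mid_sq(3043, 15)    # sequence c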
         a        b      c
 0    1009   653217   3043
 1     180   692449   2598
 2     324   485617   7496
 3    1049   823870   1900
 4    1004   761776   6100
 5      80   302674   2100
 6      64   611550   4100
 7      40   993402   8100
 8      16   847533   6100
 9       2   312186   2100
10       0   460098   4100
11           690169   8100
12           333248   6100
13            54229   2100
14           940784   4100
15            74534   8100
16           555317
17           376970
18           106380
19           316704
20           301423

Sequence a rapidly degenerates to zero, reaching 0 by the tenth generated number. Sequence b also degenerates, but far less rapidly than sequence a. Sequence c begins cycling with the fourth number generated, repeating the block 6100, 2100, 4100, 8100.

Section 5.3, Problem 4

The total of the two weighted dice is simulated with a Monte Carlo simulation, using the cumulative distribution of each die to convert random numbers into rolls:

# load probabilities
die_1 <- c(0.1, 0.1, 0.2, 0.3, 0.2, 0.1)
die_2 <- c(0.3, 0.1, 0.2, 0.1, 0.05, 0.25)
# convert to cumulative probabilities
die_1 <- cumsum(die_1)
die_2 <- cumsum(die_2)

# set up container for sums
tot <- numeric()

for (i in 1:300) {
  # generate two random numbers
  d1 <- runif(1)
  d2 <- runif(1)
  # get rolls associated with values
  d1 <- min(which(d1 < die_1))
  d2 <- min(which(d2 < die_2))
  # add sum to set of sums
  tot <- c(tot, d1 + d2)
}
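The simulated sums can be tabulated with base R's table(); a brief sketch:

# tabulate counts and proportions by sum
counts <- table(tot)
props  <- counts / length(tot)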

The count of observations by sum is shown below:

Sum  Count  Proportion
  2     13      0.0433
  3      9      0.0300
  4     30      0.1000
  5     43      0.1433
  6     48      0.1600
  7     51      0.1700
  8     43      0.1433
  9     23      0.0767
 10     20      0.0667
 11     10      0.0333
 12     10      0.0333

Section 5.3, Project 4

To generate the probabilities for the Monte Carlo simulation, the provided odds must be converted to probabilities: \[n:1 \ \ \textrm{odds} \implies \ \frac{1}{n+1} \ \ \textrm{probability}\] For example, the 7:1 odds on Euler’s Folly correspond to a probability of \(1/8 = 0.125\).

The individual probabilities do not sum to exactly 1, so each probability is divided by the total to give an adjusted probability.

odds <- c(7, 5, 9, 12, 4, 35, 15, 4)
prob <- 1 / (odds + 1)
prob.adj <- prob / (sum(prob))
Horse               Probability   Adjusted Probability
Euler’s Folly            0.1250                 0.1304
Leapin’ Leibniz          0.1667                 0.1738
Newton Lobell            0.1000                 0.1043
Count Cauchy             0.0769                 0.0802
Pumped up Poisson        0.2000                 0.2086
Loping L’Hopital         0.0278                 0.0290
Steamin’ Stokes          0.0625                 0.0652
Dancin’ Dantzig          0.2000                 0.2086

These adjusted probabilities are used to run the simulation:

# set up counter for wins
wins <- rep(0, length(prob.adj))
# get cumulative distribution
cum_prob <- cumsum(prob.adj)
# run simulation
for (i in 1:1000) {
  # generate random number
  w <- runif(1)
  # determine winning horse
  w <- min(which(w < cum_prob))
  # increment horse's winnings
  wins[w] <- wins[w] + 1
}
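For readability, the win counts can be labeled with the horse names (a small illustrative step, not part of the simulation itself):

# attach horse names to the win counts
horses <- c("Euler's Folly", "Leapin' Leibniz", "Newton Lobell", "Count Cauchy",
            "Pumped up Poisson", "Loping L'Hopital", "Steamin' Stokes", "Dancin' Dantzig")
setNames(wins, horses)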
Horse               Wins
Euler’s Folly        147
Leapin’ Leibniz      172
Newton Lobell        102
Count Cauchy          91
Pumped up Poisson    199
Loping L’Hopital      20
Steamin’ Stokes       59
Dancin’ Dantzig      210

The horse with the most wins was Dancin’ Dantzig, which is unsurprising, as it was tied with Pumped up Poisson for the best odds (4:1). Loping L’Hopital had the fewest wins, also unsurprising given that it had the worst odds (35:1).

Section 5.4, Problem 3

Assuming that the lag can take only the discrete values given (i.e., a minimum of 2 days, a maximum of 7, and no fractional days), the simulation can be set up as follows:

# observed frequencies (out of 100) for lags of 2 through 7 days
lag_count <- c(10, 25, 30, 20, 13, 2)
# calculate cumulative probabilities
lag_prob <- cumsum(lag_count / 100)
# set up counter for lags
lag_days <- rep(0, length(lag_count))
# run simulation
for (i in 1:1000) {
  # generate random number
  l <- runif(1)
  # determine lag duration based on number
  l <- min(which(l < lag_prob))
  # increment lag count
  lag_days[l] <- lag_days[l] + 1
}
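Since index 1 of lag_days corresponds to a 2-day lag, the counts can be labeled with the actual durations:

# label the counts with lag durations of 2 through 7 days
setNames(lag_days, 2:7)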
Lag time (days)  Occurrences
              2          113
              3          247
              4          302
              5          182
              6          136
              7           20

Based on the observed occurrences, the simulation results align with what would be expected: for example, a 4-day lag occurred 302 times in 1000 trials, close to the given probability of 0.30.

Section 5.5, Problem 2

Table 5.18 contains two subtables – one for arrivals, and one for service times. Using the midpoints of each interval:

# interval midpoints and probabilities for times between arrivals
arrival <- data.frame(
  times = c(19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 140),
  probs = c(0.009, 0.029, 0.035, 0.051, 0.09, 0.161, 0.2, 0.172, 0.125, 0.071, 0.037, 0.017, 0.003)
)

# interval midpoints and probabilities for service times
service <- data.frame(
  times = c(47, 52, 57, 62, 67, 72, 77, 82, 87.5),
  probs = c(0.017, 0.045, 0.095, 0.086, 0.13, 0.185, 0.208, 0.143, 0.091)
)

# convert to cumulative probabilities
arrival$probs <- cumsum(arrival$probs)
service$probs <- cumsum(service$probs)

To establish a smooth function for each distribution, times are plotted against the cumulative probabilities for each set of observations:

Arrival Times

The curve for arrival times resembles a cubic polynomial in the cumulative probability, so a third-degree polynomial is fit to the data:

mod_arr <- lm(times ~ poly(probs, 3, raw = TRUE), data = arrival)
arrival$preds <- predict(mod_arr, arrival)
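The fitted coefficients can be inspected directly:

coef(mod_arr)  # intercept and cubic coefficients reported below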

The best-fit third-degree polynomial is \[b = 18.4377 + 302.2950 x - 561.4642 x^2 + 371.2266 x^3\] where \(x\) is the cumulative probability and \(b\) is the arrival time. This appears reasonable when plotted against the original data.

Service Times

To find a one-term model, a number of transformations are examined. Plotting service time against the square root of the cumulative probability yields a near-linear relationship.

A linear equation is fit for this one-term model:

mod_serv <- lm(times ~ I(sqrt(probs)), data = service)

The equation for this line is \[u = 40.4257 + 43.9362 \sqrt{x}\] where \(x\) is the cumulative probability and \(u\) is the service time.

When this equation is plotted against the original data, the fit also appears reasonable.
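With both fits in hand, random arrival and service times can be generated by plugging uniform random numbers into the fitted equations. A minimal sketch using the coefficients above (function names are illustrative):

# draw a random time between arrivals from the fitted cubic
rand_arrival <- function() {
  x <- runif(1)
  18.4377 + 302.2950 * x - 561.4642 * x^2 + 371.2266 * x^3
}

# draw a random service time from the fitted square-root model
rand_service <- function() {
  40.4257 + 43.9362 * sqrt(runif(1))
}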