DATA 609: Homework 4

Page 191: # 3

Usint Monte Carlo simulation, write an algorith to calculate an approximation to \(\pi\) by considering the number of random points selected inside the quarter circle

\[Q: x^2 + y^2 = 1, x \geq 0, y\geq 0\]

where the quarter circle is taken to be inside the square

\[S: 0 \leq x \leq 1 \text{ and } 0 \leq y \leq 1\]

sim <- function(iterations) {

   # interations = total number of N random points

   # generate random numbers for xi and yi such that x is between 0 and 1 and
   # y is between 0 and 1
   
   x <- runif(iterations, 0, 1)
   y <- runif(iterations, 0, 1)

   # Q: increment Q if x^2 + y^2 is less than 1
   Q <- x^2 + y^2 < 1
   
   # calculate estimate of area
   est <- (sum(Q)/iterations) * 4
   
   return (est)

}

Number of Points	Monte Carlo Function	Area Estimate
100	sim(100)	3.2
500	sim(500)	3.12
1000	sim(1000)	3.188
5000	sim(5000)	3.1424
10000	sim(10000)	3.1328
50000	sim(50000)	3.1352

Page 194: #1

Use the middle-square method to generate

middle_square <- function(x, len) {

  sq <- x^2
  len_sq <- len * 2
  
  sq <- as.character(stringr::str_pad(as.character(sq), len_sq, pad = "0"))
   
  start <- len_sq/2 - len/2 + 1 
  end <- len_sq/2 + len/2  

  return (as.numeric(substring(sq, start, end)))
}

10 random numbers using \(x_0\) = 1009

result  <- 1009

for (i in 1:10) {
   result[i + 1] <- middle_square(result[i], 4)
}

print(result)

##  [1] 1009  180  324 1049 1004   80   64   40   16    2    0

20 random numbers using \(x_0\) = 653217

result  <- 653217

for (i in 1:20) {
   result[i + 1] <- middle_square(result[i], 6)
}

print(result)

##  [1] 653217 692449 485617 823870 761776 302674 611550 993402 847533 312186
## [11] 460098 690169 333248  54229 940784  74534 555317 376970 106380 316704
## [21] 301423

20 random numbers using \(x_0\) = 653217

result  <- 3043

for (i in 1:15) {
   result[i + 1] <- middle_square(result[i], 4)
}

print(result)

##  [1] 3043 2598 7496 1900 6100 2100 4100 8100 6100 2100 4100 8100 6100 2100
## [15] 4100 8100

Comment on the results of each sequence. Was there cycling? Did each sequence degenerate rapidly?

The sequence in (a) degenerated to 0 fairly quickly within 10 iterations. The sequence in (b) did not degenerate or appear to cycle after 20 iterations. Sequence (c) cycles after hitting 6100 on the 4th iteration and then repeats the cycle of 6100, 2100, 4100, 8100.

Page 199: #4

Given loaded dice according to the following distribution, use Monte Carlo simulation to simulate the sum of 300 rolls of two unfair dice.

Roll	Die 1	Die 2
1	0.1	0.3
2	0.1	0.1
3	0.2	0.2
4	0.3	0.1
5	0.2	0.05
6	0.1	0.25

# probabilities for die 1
prob1 <- c(0.1, 0.1, 0.2, 0.3, 0.2, 0.1)

# probabilities for die 2
prob2 <- c(0.3, 0.1, 0.2, 0.1, 0.05, 0.25)

# simulate the outcomes for die1
sim_die1 <- sample(1:6, size = 300, replace = TRUE, prob = prob1)

# simulate the outcomes for die2
sim_die2 <- sample(1:6, size = 300, replace = TRUE, prob = prob2)

results <- sim_die1 + sim_die2

hist(results, xlab="Sum", ylab="Frequency")

results <- setNames(data.frame(table(results)), c("Sum", "Frequency"))

Sum	Frequency
2	6
3	12
4	28
5	41
6	40
7	57
8	36
9	31
10	29
11	13
12	7

Page 201: #4

Horse Race - Construct and perform a Monte Carlo simulation of a horse race. You can be creative and use odds from thenewspaper, or simulate the Mathematical Derby with the entries and odds shown in the following table.

Entry’s name	Odds
Euler’s Folly	7-1
Leapin’ Leibniz	5-1
Newton Lobell	9-1
Count Cauchy	12-1
Pumped up Poisson	4-1
Loping L’Hopital	35-1
Steamin’ Stokes	15-1
Dancin’ Dantzig	4-1

Construct and perform a Monte Carlo simulation of 1000 horse races. Which horse won the most races? Which horse won the fewest races? Do these results surprise you? Provide the tallies of how many races each horse won with your output.

odds <- c(1/7, 1/5, 1/9, 1/12, 1/4, 1/35, 1/15, 1/4)

Note the odds do not sum to 1. This seems to be related to the house taking an additional percentage of each dollar bet.

sum(odds)

## [1] 1.13254

set.seed(113)

entries <- as.factor(c("Euler's Folly",   "Leapin' Leibniz",   "Newton Lobell", 
                      "Count Cauchy",    "Pumped up Poisson",  "Loping L'Hopital", 
                      "Steamin' Stokes", "Dancin' Dantzig"))

num_races <- 1000

sim <- sample(x = entries, size = num_races, prob = odds, replace = T)

results <-  data.frame(outcome = summary(sim))
results$entry <- rownames(results)

entry	outcome
Loping L’Hopital	26
Steamin’ Stokes	60
Count Cauchy	83
Newton Lobell	101
Euler’s Folly	141
Leapin’ Leibniz	171
Pumped up Poisson	205
Dancin’ Dantzig	213

In this particular simulation, Dancin’ Dantzig won the most races. However, with equal odds, it’s possible that Pumped up Poisson could win as a result of this simulation.

Loping L’Hopital has the fewest wins. The results are not suprising…these are consistent with the odds.

Page 211: #3

In many situations, the time \(T\) between deliveries and the order quantity \(Q\) is not fixed. Instead, an order is placed for a specific amount of gasoline. Depending on how many orders are placed in a give time interval, the time to fill an order varies. You have no reason to believe that the performance of the delivery operation will change. Therefore, you have examined records for the past 100 deliveries and found the following lag times, or extra days, required to fill your order:

Lag time (in days)	Number of occurrences
2	10
3	25
4	30
5	20
6	13
7	2
Total:	100

Construct a Monte Carlo simulation for the lag time submodel. If you have a handheld calculator or computer available, test your submodel by running 1000 trials and comparing the number of occurrences of the various lag times with the historical data.

df <- data.frame(lagtime =  as.factor(2:7), occurrences = c(10, 25, 30, 20, 13, 2))

df$prob  <- with(df, occurrences / sum(occurrences))

df$cumprob <- cumsum(df$prob)

sim <- function(iterations) {
  
  n <- iterations

  counter <- rep(0, 6)

  for (i in 1:n) { 
    #generate a random number between 0 and 1
    
    random <- runif(1)
    
    counter[1] <- counter[1] + (random >= 0    & random <= 0.1)
    counter[2] <- counter[2] + (random >  0.1  & random <= 0.35)
    counter[3] <- counter[3] + (random >  0.35 & random <= 0.65)
    counter[4] <- counter[4] + (random >  0.65 & random <= 0.85)
    counter[5] <- counter[5] + (random >  0.85 & random <= 0.98)
    counter[6] <- counter[6] + (random >  0.98 & random <= 1.0)
    
  }
  
  return (counter)
}


# run the simulation
n <- 10000
counter <- sim(10000)

# update the dataframe with the simulated results

df$simulated_number[1] <- counter[1] 
df$simulated_number[2] <- counter[2]
df$simulated_number[3] <- counter[3]
df$simulated_number[4] <- counter[4]
df$simulated_number[5] <- counter[5]
df$simulated_number[6] <- counter[6]


df$simulated_prob[1] <- counter[1]/n
df$simulated_prob[2] <- counter[2]/n
df$simulated_prob[3] <- counter[3]/n
df$simulated_prob[4] <- counter[4]/n
df$simulated_prob[5] <- counter[5]/n
df$simulated_prob[6] <- counter[6]/n

lagtime	occurrences	prob	cumprob	simulated_number	simulated_prob
2	10	0.10	0.10	901	0.0901
3	25	0.25	0.35	2546	0.2546
4	30	0.30	0.65	2944	0.2944
5	20	0.20	0.85	2029	0.2029
6	13	0.13	0.98	1395	0.1395
7	2	0.02	1.00	185	0.0185

Alternately, let’s construct the Monte Carlo simulation using the sample fuction.

simulation <- sample(x = df$lagtime, size = n, prob = df$prob, replace = T)

summary( simulation)

##    2    3    4    5    6    7 
## 1049 2485 2960 1953 1365  188

results <-  data.frame(simulated_number = summary( simulation))
results$lagtime <- rownames(results)
rownames(results) <-  NULL

results <- results[c(2, 1)]

lagtime	simulated_number
2	1049
3	2485
4	2960
5	1953
6	1365
7	188