Problem 1

Considering the probability distribution associated with rolling 3 fair dice labelled d1, d2 and d3, calculate the probability of the following:

Compute the probability that the sum of the dice is greater than 12 and less than 18.
Compute the probability that the sum is even.
Compute the probability that the mean is exactly 4.

The problem is from the following stackoverflow post: http://stackoverflow.com/questions/40253035/how-to-compute-the-probability-with-rolling-3-fair-dice-for-some-specific-condit/40254805#40254805

Theroetical Solution with Classical Definition of Probability

Total number of points in the Sample Space that are mututally exclusive, exhaustive and equally likely = \(6^3=216\).
Number of cases favorable to the events in (1) is \(55\). Here are all the \(55\) cases.

#[1] "1 6 6"
#[1] "2 5 6"
#[1] "2 6 5"
#[1] "2 6 6"
#[1] "3 4 6"
#[1] "3 5 5"
#[1] "3 5 6"
#[1] "3 6 4"
#[1] "3 6 5"
#[1] "3 6 6"
#[1] "4 3 6"
#[1] "4 4 5"
#[1] "4 4 6"
#[1] "4 5 4"
#[1] "4 5 5"
#[1] "4 5 6"
#[1] "4 6 3"
#[1] "4 6 4"
#[1] "4 6 5"
#[1] "4 6 6"
#[1] "5 2 6"
#[1] "5 3 5"
#[1] "5 3 6"
#[1] "5 4 4"
#[1] "5 4 5"
#[1] "5 4 6"
#[1] "5 5 3"
#[1] "5 5 4"
#[1] "5 5 5"
#[1] "5 5 6"
#[1] "5 6 2"
#[1] "5 6 3"
#[1] "5 6 4"
#[1] "5 6 5"
#[1] "5 6 6"
#[1] "6 1 6"
#[1] "6 2 5"
#[1] "6 2 6"
#[1] "6 3 4"
#[1] "6 3 5"
#[1] "6 3 6"
#[1] "6 4 3"
#[1] "6 4 4"
#[1] "6 4 5"
#[1] "6 4 6"
#[1] "6 5 2"
#[1] "6 5 3"
#[1] "6 5 4"
#[1] "6 5 5"
#[1] "6 5 6"
#[1] "6 6 1"
#[1] "6 6 2"
#[1] "6 6 3"
#[1] "6 6 4"
#[1] "6 6 5"

Number of cases favorable to the events in (2) is \(108\).
Number of cases favorable to the events in (3) is \(25\).
Hence, by the classical definition of probability, the corresponding probabilities are \(55/216=0.2546296\), \(108/216=0.5\) and \(25/216=0.1157407\) respectively.

# compute theroetical probs
actual.probs <- rep(0,3)
# all points in the sample space
for (d1 in 1:6) 
  for (d2 in 1:6)
    for (d3 in 1:6) {
      s <- d1 + d2 + d3
      actual.probs[1] <- actual.probs[1] + ((s > 12) && (s < 18))
      actual.probs[2] <- actual.probs[2] + ((s %% 2) == 0)
      actual.probs[3] <- actual.probs[3] + ((s / 3) == 4)
    }
actual.probs <- actual.probs / 6^3 # theoretical probs

Solution with Simulation

Now let’s try to obtain the solution using simulation.
Let’s simulate an experiment of throwing 3 dice together for a number of trials.
Then let’s repeat the experiment for a number of times

# compute simulated probs
num.repeat <- 100
num.trials <- 10^3
sim.probs <- vector("list", 3)
for (j in 1:num.repeat) {
  res <- .rowMeans(replicate(num.trials, {
    #dice <- as.integer(runif(3,1,6)) # does not work since continuous distribution truncated
    dice <- sample(1:6, 3, replace=TRUE)
    s <- sum(dice)
    p1 <- ((s > 12) && (s < 18))
    p2 <- (s %% 2 == 0)
    p3 <- ((s/3) == 4)
    c(p1, p2, p3)
  }), 3, num.trials)
  #print(res)
  for (i in 1:3) {
    sim.probs[[i]] <- c(sim.probs[[i]], res[i])
  }
}

Here is a visual comparison between the simulated and the actual (theoretical) probabilities for each of the 3 cases, the theoretical probabilities are represented as dotted lines. As can be seen, in general, as the number of trials increase, the simulated probability tends to more accurately estimate the theoretical probabilities.

Now let’s find the impact of the number of trials on the mean and absolute difference from the theoretical probabities w.r.t. the computed probabilities with simulation.

Problem 2

Let \(\Theta\) be an unknown constant. Let \(W_1,W_2,\ldots,W_n\) be independent exponential random variables each with parameter \(1\). Let \(X_i=\Theta+W_i\). It can be shown that \(\hat{\Theta}_{ML}=min_i\{X_i\}\). Now, You have been asked to construct a confidence interval of the particular form \([\hat{\Theta}-c,\hat{\Theta}]\), where \(\hat{\Theta}=min_i\{X_i\}\) and \(c\) is a constant that we need to choose. For \(n=10\), how should the constant be chosen so that we have a \(95\%\) confidence interval? (Give the smallest possible value of \(c\).) Your answer should be accurate to 3 decimal places.

This problem appeared in the final exam of the edX course MITx: 6.041x Introduction to Probability - The Science of Uncertainty.

n <- 10
th <- 100
exp.samp <- function() {
  w <- rexp(n, 1)
  x <- th + w
  th.hat <- min(x)  
  th.hat
}
ntrial <- 10^6
th.hat <- replicate(ntrial, exp.samp())
#c <- qexp(0.95) / n
#print(c)
c <- quantile(th.hat - th, probs=0.95) 
print(c)

##      95% 
## 0.300221

print(mean(th.hat - c <= th)) # & (th <= th.hat))

## [1] 0.95

hist(th.hat - th, xlab=expression(hat(theta)-theta), main=expression(hat(theta)-theta))

Problem 3

Find the area of the following shaeded regions with Monte-Carlo Simulation.

Let’s use Monte-Carlo-Integration to compute the area of the above green shaded region and then compare with the actural area enclosed by the curves.

n <- 10^6
df <- data.frame(x = runif(n), y = runif(n))
indices <- which(((df$y)^2<=df$x)&((df$x)^2<=df$y))
print(paste('Area with Monte-Carlo Integration = ', length(indices) / n))

## [1] "Area with Monte-Carlo Integration =  0.333468"

res <- integrate(function(x) sqrt(x) - x^2,0,1)
print(paste('Actual Area = ', res$value, ' with absolute error', res$abs.err))

## [1] "Actual Area =  0.333333408167678  with absolute error 7.80933322530327e-05"

Again let’s use Monte-Carlo-Integration to compute the area under the below sinc function in between the vertical lines (-3,3) and x axis then compare with the actural area. The sinc function is defined as follows, by removint the discontinuity at \(0\).

\(f(x)=\left\{ \begin{array}{ll} 1 & x = 0 \\ \frac{sin(x)}{x} & otherwise \end{array} \right.\)

n <- 10^6
xmax <- 3
df <- data.frame(x = runif(n,-xmax,xmax), y = runif(n,-0.25,1))
indices <- which((df$y>0)&(abs(df$x)<=xmax)&(df$y<eq(df$x)))
print(paste('Area with Monte-Carlo Integration = ', 6*1.25*length(indices) / n))

## [1] "Area with Monte-Carlo Integration =  3.698565"

res <- integrate(eq,-3,3)
print(paste('Actual Area = ', res$value, ' with absolute error', res$abs.err))

## [1] "Actual Area =  3.69730505599894  with absolute error 4.10483320223302e-14"

Again let’s use Monte-Carlo-Integration to compute the area under the Standard Normal Curve between the vertical lines and then compare the values with those obtained from the actual cdf \(\Phi\) values availble from the standard normal table.

\(f(z)= \frac{1}{\sqrt{2\pi}}e^{\frac{-z^2}{2}}\)

Again let’s use Monte-Carlo-Integration to compute the area under the pdf of the Student’s t-distribution with degrees of freedom \(\nu=5\) between the vertical lines and then compare the values with those obtained from the actual cdf values availble from the t-table.

\(f(t)= \frac{\Gamma \left(\frac{\nu+1}{2} \right) }{\sqrt{\nu \pi}\Gamma \left(\frac{\nu}{2} \right)}\left( 1+\frac{t^2}{\nu} \right) ^ {\frac{\nu+1}{2}} \)

Solving Probability Problems with Simulation

Sandipan Dey

26 October 2016

Problem 1

Theroetical Solution with Classical Definition of Probability

Solution with Simulation

Problem 2

Problem 3