HW5_605

jbrnbrg

September 25, 2017

library(tidyverse)
library(kableExtra)

HW5:

Choose independently two numbers \(B\) and \(C\) at random from the interval \([0, 1]\) with uniform density. Prove that \(B\) and \(C\) are proper probability distributions. Note that the point \((X_B,Y_C)\) is then chosen at random in the unit square. Find the probability that…

JB It’s a proper probability distributions because it’s it’s non-negative for all real values \([0, 1]\) and the area of the unit square sums to one.

a. \(P[B+C < \frac{1}{2}]\):

JB If I consider the the shape created by the distribution \(B+C\), it’s a triangle with whose hypotenuse is 2, bound by the function:

\(f_(x)= \begin{cases} x, & 0 \leq x < 1 \\ 2-x, & 1 \leq x < 2 \end{cases}\)

Since I’m looking for the area less than \(\frac{1}{2}\), I integrate: \(\int_0^{\frac{1}{2}} f(x)dx=\int_0^{\frac{1}{2}}xdx=\frac{1}{8}\)

b. \(P[BC < \frac{1}{2}]\):

JB If I consider the function that bounds the area covered by \(BC\) from \([0,1]\), I get:

\(f_(x)= \begin{cases} -ln(x), & 0 \leq x < 1 \\ 0, & otherwise \end{cases}\)

Since I’m looking for the area less than \(\frac{1}{2}\), I integrate: \(\int_0^{\frac{1}{2}} f(x)dx=\int_0^{\frac{1}{2}}-ln(x)dx \approx 0.8465\)

c. \(P[|B-C| < \frac{1}{2}]\):

JB Since all values of \(B\) and \(C\) will fall on the unit square \([0,1] \times [0,1]\). Now for a fixed value of \(B\), I need to find region of the square that has values \(C\) such that \(|B-C|< \frac{1}{2}\). This can be described as:

\(\int_0^{\frac{1}{2}} f(x)dx=2\int_0^{\frac{1}{2}}(1-x)dx =\frac{3}{4}\)

d. \(P[max(B,C) < \frac{1}{2}]\):

JB Note I completed part e before this problem so I it’s based, in part, on that solution since \(min(X_1,...,X_n) = 1-max(X_1,...,X_n)\), I get: \(\int_0^{\frac{1}{2}} f(x)dx=2\int_0^{\frac{1}{2}}xdx =\frac{1}{4}\)

e. \(P[min(B,C) < \frac{1}{2}]\):

JB Let \(X = (B,C)\). Then \(P[min(X)<x]=1-P[min(X)>x]=f(x)=2(1-x)\) so the density function is given by: \(f(x)=2(1-x)\).

\(\int_0^{\frac{1}{2}} f(x)dx=2\int_0^{\frac{1}{2}}(1-x)dx =\frac{3}{4}\)

Notes:

Simulated and Graphical Representations:

Monte Carlo Sim:

As mentioned, \(B\) and \(C\) are each uniform from \([0,1]\). Let the \(X_B\) and \(Y_C\) be random variables describing the selections that lead to a point chosen at random on the unit square. Since the total area of the square is 1, the probability of the point falling in a specific subset \(E\) of the unit square should be equal to its area. Thus we can estimate the probability that point chosen at random from the square falls in specific subset by estimating the area.
To complete this, I’ll run Monte Carlo Simulations for each question, a through e, and return the results in a table. These amounts won’t be exact, but they should be close if I use sufficiently high \(n\); in this case \(20,000\) should do it:

unit_square <- function(n){
  # function creates a dataset of size n of to answer each 
  # question in this homework assigment via Monte Carlo Simulation
  rand_B <- runif(1)
  rand_C <- runif(1)
  
  rt <- data.frame(val_B = c(rand_B), val_C = c(rand_C), stringsAsFactors = F)
  
  if(n == 1){
    return(rt)
  }else{
    for(i in 2:n){
      rt <- rbind(rt, data.frame(val_B = c(runif(1)),
                                val_C = c(runif(1))))
    }
    
  }
  clean_results <- rt
  # get raw results to plot
  st <- clean_results %>% 
    mutate(`B+C` = val_B + val_C,                               # a
           BC = val_B * val_C,                                  # b
           `AbsB-C` = abs(val_B - val_C),                       # c
           `maxB.C` = pmax(val_B, val_C),                       # d
           `minB.C` = pmin(val_B, val_C))                       # e
  
  # get binary counts of each to est probability 
  pt <- clean_results %>% 
    mutate(`B+C` = ifelse(val_B + val_C < 1/2, 1, 0),           # a
           BC = ifelse(val_B * val_C < 1/2, 1, 0),              # b
           `AbsB-C` = ifelse(abs(val_B - val_C) < 1/2, 1, 0),   # c
           `maxB.C` = ifelse(pmax(val_B, val_C) < 1/2, 1, 0),   # d
           `minB.C` = ifelse(pmin(val_B, val_C) < 1/2, 1, 0))   # e
  
  return(list("pt" = pt, "st" = st))
}

n <- 10000
results <- unit_square(n)

# Produce a monte carlo sim to get theoretical values:  
zz <- results$pt %>% 
  select(`B+C`, BC, `AbsB-C`, `maxB.C`, `minB.C`) %>% 
  summarise_each(funs(sum))/n

# Create plot data for graphical solution
my_order <- c("B+C", "BC", "AbsB-C", "maxB.C", "minB.C")

my_plots <- results$st %>% 
  select(`B+C`, BC, `AbsB-C`, `maxB.C`, `minB.C`) %>% 
  gather()

my_plots$key <-  ordered(my_plots$key, my_order)

#my_plots <- my_plots %>% mutate(value = value/n)

As shown in the results below, the values provided make intuitive sense. E.g. for \(P[B+C < \frac{1}{2}]\), it makes sense that the sum of two random numbers between zero and one would have a lower probability of being less than \(\frac{1}{2}\) than the product of two random numbers between zero and one: \(P[BC < \frac{1}{2}]\).

simulated probabilities each is less than \(1/2\)
\(B+C\) \(BC\) \(|B-C|\) \(max(B,C)\) \(min(B,C)\)
0.118 0.846 0.744 0.239 0.751

Graphical Representation:

To drive this concept further, let the \(X_B\) and \(Y_C\) be random variables describing the selections that lead to a point chosen at random on the unit square. Then if I make histograms each of the problems, I should be able to confirm these results graphically:

fancy_n <- c("$B+C$", "$BC$",
             "$|B-C|$", "$max(B,C)$","$min(B,C)$")
hw_plots <- my_plots %>% 
  ggplot(aes(value), alpha = .2) +
  facet_grid(~key, scales = "free") +
  geom_histogram(color = "#DDDDDD", alpha = .5, binwidth = .1) +
  geom_vline(aes(xintercept=.5), 
             color="#7987AC", 
             linetype="dashed", 
             size=1.5, alpha=.9) +
  theme(axis.title.x=element_blank(),
        axis.title.y=element_blank(),
        legend.position="none")
#
hw_plots

The histograms above show a blue, dashed line at the \(\frac{1}{2}=.5\) point for each. Because they’re histograms, the area beneath each equals 1.

If we look to the left of that line in each of the plots, I can, by inspection, confirm that the result from the Monte Carlo experiment provided good estimations. E.g in the plot for \(P[B+C < \frac{1}{2}]\) it looks like a very small portion of the unit triangle is below the \(\frac{1}{2}\) mark while for \(P[BC < \frac{1}{2}]\) it’s clear at least 85% is below the \(\frac{1}{2}\) mark.

Notes, References:

If \(X_B\) and \(Y_C\) take values in \([0,1]\), then the the pair \((X_B,Y_C)\) takes values in the product \([0,1] \times [0,1]\). The joint probability density function (joint pdf) of \(X_B\) and \(Y_C\) is a function \(f(x,y)\) giving the probability density at \((x,y)\). That is, the probability that \((X_B,Y_C)\) is in a small square of width \(dx\) and height \(dy\) around \((x,y)\) is \(f(x,y)dxdy\). This function must satisfy \(0\leq f(x,y)\) and the total probability must equal 1 as expressed:

\(\int_0^1 \int_0^1 f(x,y)dxdy=1\)

joint probability density

joint probability density

A good explanation of the calculus-based explanation can be found here - MIT’s openCourseWare on Joint Distributions, Independence.