Grando 5 Homework

Choose independently two numbers B and C at random from the interval [0, 1] with uniform density. Show that the distributions of B and C are proper probability distributions. Note that the point (B, C) is then chosen at random in the unit square. Find the probability that:

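First, let’s show that the distribution of B (or C) is a proper probability distribution. For a continuous distribution, this means the density is never negative and the total probability (the integral of the density over its range) is one. For the uniform density on [0, 1], f(x) = 1 for 0 ≤ x ≤ 1 and 0 elsewhere, so both conditions hold. In the simulation, we check the corresponding facts: no draw is less than 0, no draw is greater than 1, and the probability over [0, 1] adds up to one.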
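As a quick check of the two density conditions outside the simulation, the sketch below evaluates base R’s uniform density `dunif` on a grid and integrates it numerically with `integrate`. The grid range and step are arbitrary choices for illustration.

```r
# Uniform density on [0, 1]: dunif(x) is 1 inside the interval and 0 outside
grid <- seq(-0.5, 1.5, by = 0.01)
dens <- dunif(grid, min = 0, max = 1)

# Condition 1: the density is never negative
nonnegative <- all(dens >= 0)

# Condition 2: the density integrates to one; it is zero outside [0, 1],
# so integrating over [0, 1] covers all of the probability
total <- integrate(dunif, lower = 0, upper = 1)$value

nonnegative
total
```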

So, let’s draw sets with an increasing number of trials and check each of them. Note that, by the definition of the problem, the values are restricted to [0, 1], so we already know the first two conditions hold. We will still print the outputs, though.

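# Build a list of x sets of uniform draws on [0, 1]; set i holds i^2 draws
make_sets <- function(x) {
    set <- list()
    for (i in 1:x) {
        set[[i]] <- runif(i^2, 0, 1)
    }
    return(set)
}
# Use a separate seed for each variable so the draws are reproducible
set.seed(994564)
B <- make_sets(100)
set.seed(8834)
C <- make_sets(100)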
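To see how many draws this produces, the sketch below repeats `make_sets` (copied from above so the block runs on its own) and checks the set sizes. Set i holds i^2 draws, so the 100 sets pool 1 + 4 + ... + 10,000 = 100 * 101 * 201 / 6 = 338,350 draws for each of B and C.

```r
# Copy of make_sets from above: set i holds i^2 uniform draws on [0, 1]
make_sets <- function(x) {
    set <- list()
    for (i in 1:x) {
        set[[i]] <- runif(i^2, 0, 1)
    }
    return(set)
}

set.seed(994564)
sets <- make_sets(100)

# Number of draws in each set: 1, 4, 9, ..., 10000
sizes <- lengths(sets)
head(sizes)

# Total number of pooled draws, compared with the sum-of-squares formula
total_draws <- length(unlist(sets))
total_draws == 100 * 101 * 201 / 6
```

Because B and C use the same set sizes, `unlist(B)` and `unlist(C)` have the same length and line up element by element, which is what the pairwise comparisons in the parts below rely on.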

First the minimum of all numbers generated:

min(unlist(B))
## [1] 3.441935e-06

Now the maximum:

max(unlist(B))
## [1] 0.9999946

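And now the fraction of distinct values. Note that this is not the sum of the probabilities: it only shows that repeated values are very rare, as we expect for a continuous distribution. (The few repeats occur because R’s uniform generators produce at most 2^32 distinct values.)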

length(unique(unlist(B)))/length(unlist(B))
## [1] 0.9999586
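A more direct simulation check that the probability is distributed as claimed is to compare the empirical distribution of the draws with the uniform CDF, P(B ≤ x) = x. The sketch below uses base R’s `ecdf` on a fresh set of draws (the sample size and seed are arbitrary): the fraction of draws up to each x should be close to x, and the fraction up to 1 should be exactly 1.

```r
set.seed(1)
x <- runif(1e5, 0, 1)

# Empirical CDF: Fn(t) is the fraction of draws less than or equal to t
Fn <- ecdf(x)

# Compare with the uniform CDF F(t) = t on a grid of points
t_grid <- seq(0.1, 1, by = 0.1)
max_gap <- max(abs(Fn(t_grid) - t_grid))

max_gap   # small when the draws are uniform
Fn(1)     # all of the probability lies in [0, 1]
```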

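Okay, so why the series of sets? So that I could make the histograms below, which show how the data behave as we take more samples: set k holds k^2 draws, so B[[5]] has 25 draws and B[[100]] has 10,000. As the sample grows, the histogram flattens toward the uniform density, a visual example of the “law of large numbers” discussed in our reading.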

hist(B[[5]])

hist(B[[10]])

hist(B[[15]])

hist(B[[20]])

hist(B[[50]])

hist(B[[100]])
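One way to put a number on this trend, sketched below with fresh draws and an arbitrary seed, is to measure how far the histogram’s bin densities stray from the flat uniform density of 1 as the sample grows from 25 draws (like B[[5]]) to 10,000 draws (like B[[100]]). The helper `max_bin_gap` is hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: largest distance between a 10-bin histogram density
# of n uniform draws and the true density, which is 1 everywhere on [0, 1]
max_bin_gap <- function(n) {
    x <- runif(n, 0, 1)
    h <- hist(x, breaks = seq(0, 1, by = 0.1), plot = FALSE)
    max(abs(h$density - 1))
}

set.seed(2)
sizes <- c(5, 10, 15, 20, 50, 100)^2   # same sizes as the histograms above
gaps <- sapply(sizes, max_bin_gap)
data.frame(draws = sizes, max_gap = round(gaps, 3))
```

With 25 draws spread over 10 bins, each bin count is an integer near 2.5, so the gap can never drop below 0.2; with 10,000 draws the bins sit much closer to the flat density.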

  1. B + C < 1/2.

As above, we can approximate the probability by brute force: pair each draw of B with the corresponding draw of C and compute the fraction of pairs that satisfy the condition.

sum((unlist(B) + unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.1254175

We can also graph the simulated sums and estimate the probability from the area of the histogram bins below 0.5.

library(ggplot2)
binw <- 0.1
a_vals <- as.data.frame((unlist(B) + unlist(C)))
colnames(a_vals) <- c("sums")
(g <- ggplot(a_vals) + geom_histogram(aes(x = sums, y = after_stat(density)), binwidth = binw)) + 
    scale_x_continuous(limits = c(0, 2), breaks = seq(0, 2, binw))

gb <- ggplot_build(g)
a_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(a_dens_vars * binw) - a_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.1266736

Summing the areas of the bins below 0.5, counting half of the bin centered on 0.5, gives an estimate of 0.1267. However, the half-bin step assumes the density is flat within that bin, while here it is rising, so the direct count above is probably more accurate.
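The half-bin step is needed only because one bin straddles 0.5 (when no `boundary` or `center` is given, ggplot2 centers its bins on multiples of the bin width). As a sketch of an alternative, using base R’s `hist` on fresh draws rather than the `B` and `C` above, choosing breaks so that 0.5 falls on a bin edge makes the histogram sum match the direct count with no correction.

```r
set.seed(3)
n <- 1e5
sums <- runif(n) + runif(n)

# Breaks at 0, 0.1, ..., 2 put 0.5 exactly on a bin edge
h <- hist(sums, breaks = seq(0, 2, by = 0.1), plot = FALSE)

# Area of the first five bins: the histogram estimate of P(B + C <= 0.5)
hist_est <- sum(h$density[1:5] * diff(h$breaks)[1:5])

# Direct count of the same event
direct_est <- mean(sums <= 0.5)

c(histogram = hist_est, direct = direct_est)
```

The two estimates agree because, with aligned breaks, adding bin areas is just counting draws; the histogram adds no information beyond the direct count.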

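We can also find the probability geometrically. The histogram’s density rises linearly to a height of 0.5 at 0.5, so the area to the left of 0.5 is a triangle with base 0.5 and height 0.5: 0.5 * 0.5 * 1/2 = 0.125. Equivalently, the region B + C < 1/2 is a right triangle in the unit square with legs of length 1/2, whose area is 1/8.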
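The geometric argument can be checked numerically. The sketch below assumes the density of B + C is f(s) = s for 0 ≤ s ≤ 1 (the rising side of the triangular histogram) and integrates it up to 0.5 with base R’s `integrate`, alongside the triangle-area formula.

```r
# Area of the right triangle with legs 1/2 in the unit square
triangle_area <- 1/2 * 0.5 * 0.5

# Same probability from the density of B + C, assumed to be f(s) = s on [0, 1]
density_area <- integrate(function(s) s, lower = 0, upper = 0.5)$value

c(triangle = triangle_area, integral = density_area)
```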

Note that the brute-force method is consistent with this week’s reading. I’m not sure whether we are expected to set up integrals for these questions yet, since the text has not introduced that method; therefore, I’m finding the probabilities with the brute-force method, which should be the more accurate of the two, and double-checking the results with the graphical method.
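To put a number on "more accurate", the standard error of a simulated proportion p_hat from n independent draws is roughly sqrt(p_hat * (1 - p_hat) / n). The sketch below computes it for part 1 on fresh draws of the same total size as the pooled sets (338,350); the seed is arbitrary, and the exact value 1/8 comes from the triangle area.

```r
set.seed(4)
n <- 338350                       # same number of draws as unlist(B)
b_draws <- runif(n)
c_draws <- runif(n)

# Brute-force estimate of P(B + C < 1/2) and its standard error
p_hat <- mean(b_draws + c_draws < 0.5)
se <- sqrt(p_hat * (1 - p_hat) / n)

# How many standard errors the estimate sits from the exact value 1/8
z <- (p_hat - 1/8) / se
c(estimate = p_hat, std_error = se, z = z)
```

With about 338,000 draws the standard error is close to 0.0006, so the brute-force estimate of 0.1254 is within one standard error of 1/8, while the histogram estimate of 0.1267 (computed from the same draws) is about three standard errors away, which points to bias from the half-bin step rather than sampling noise.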

  2. BC < 1/2.

Brute force:

sum((unlist(B) * unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.8468686

Graphically:

b_vals <- as.data.frame((unlist(B) * unlist(C)))
colnames(b_vals) <- c("product")
(g <- ggplot(b_vals) + geom_histogram(aes(x = product, y = after_stat(density)), binwidth = binw)) + 
    scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))

gb <- ggplot_build(g)
b_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(b_dens_vars * binw) - b_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.8443875
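As a check on both estimates, the region BC < 1/2 in the unit square can be measured directly: for b ≤ 1/2 every c works, and for b > 1/2 we need c < 1/(2b). That gives P(BC < 1/2) = 1/2 + (integral of 1/(2b) from 1/2 to 1) = 1/2 + (ln 2)/2 ≈ 0.8466. The sketch below evaluates this with base R’s `integrate` and compares it with the closed form.

```r
# For a given b, the fraction of c values in [0, 1] with b * c < 1/2
frac_c <- function(b) pmin(1, 1 / (2 * b))

# Integrate over b in [0, 1], split at the kink b = 1/2
numeric_p <- integrate(frac_c, lower = 0, upper = 0.5)$value +
    integrate(frac_c, lower = 0.5, upper = 1)$value

closed_form <- 1/2 + log(2) / 2
c(numeric = numeric_p, closed_form = closed_form)
```

Both simulation estimates above (0.8469 and 0.8444) are close to this value.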

  3. |B − C| < 1/2.

Brute force:

sum(abs((unlist(B) - unlist(C))) < 0.5)/length(unlist(B))
## [1] 0.7489789

Graphically:

c_vals <- as.data.frame(abs((unlist(B) - unlist(C))))
colnames(c_vals) <- c("abs_diff")
(g <- ggplot(c_vals) + geom_histogram(aes(x = abs_diff, y = after_stat(density)), binwidth = binw)) + 
    scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))

gb <- ggplot_build(g)
c_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(c_dens_vars * binw) - c_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.7464829
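Geometrically, |B − C| ≥ 1/2 holds in two corner triangles of the unit square, each with legs 1/2 and area 1/8, so P(|B − C| < 1/2) = 1 − 2/8 = 3/4. The sketch below confirms this with a deterministic midpoint grid over the unit square instead of random draws; the grid size `m` is an arbitrary choice.

```r
# Midpoints of an m-by-m grid of small squares covering [0, 1] x [0, 1]
m <- 1000
g <- (seq_len(m) - 0.5) / m
pts <- expand.grid(b = g, c = g)

# Fraction of grid cells whose midpoint satisfies |b - c| < 1/2
grid_p <- mean(abs(pts$b - pts$c) < 0.5)

# Exact value: the square minus two corner triangles with legs 1/2
exact_p <- 1 - 2 * (1/2 * 0.5 * 0.5)
c(grid = grid_p, exact = exact_p)
```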

  4. max{B,C} < 1/2.

Brute force:

sum(pmax(unlist(B), unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.2498478

Graphically:

d_vals <- as.data.frame(pmax(unlist(B), unlist(C)))
colnames(d_vals) <- c("max_val")
(g <- ggplot(d_vals) + geom_histogram(aes(x = max_val, y = after_stat(density)), binwidth = binw)) + 
    scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))

gb <- ggplot_build(g)
d_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(d_dens_vars * binw) - d_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.2521708
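Since max{B, C} < 1/2 exactly when both B < 1/2 and C < 1/2, independence gives P = P(B < 1/2) * P(C < 1/2) = (1/2)^2 = 1/4, the lower-left quarter of the unit square. The sketch below computes it with base R’s uniform CDF `punif`.

```r
# P(B < 1/2) for a uniform draw on [0, 1]
p_half <- punif(0.5, min = 0, max = 1)

# Both B and C must fall below 1/2; they are independent, so multiply
p_max <- p_half * p_half
p_max
```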

  5. min{B,C} < 1/2.

Brute force:

sum(pmin(unlist(B), unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.7505808

Graphically:

e_vals <- as.data.frame(pmin(unlist(B), unlist(C)))
colnames(e_vals) <- c("min_val")
(g <- ggplot(e_vals) + geom_histogram(aes(x = min_val, y = after_stat(density)), binwidth = binw)) + 
    scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))

gb <- ggplot_build(g)
e_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(e_dens_vars * binw) - e_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.7481129
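Similarly, min{B, C} < 1/2 fails only when both B and C are at least 1/2, which happens with probability (1/2)^2 = 1/4 (the upper-right quarter of the unit square). Taking the complement, as in the sketch below, gives 3/4.

```r
# P(B >= 1/2) for a uniform draw on [0, 1]
p_upper <- 1 - punif(0.5, min = 0, max = 1)

# min{B, C} < 1/2 is the complement of "both B and C are at least 1/2"
p_min <- 1 - p_upper^2
p_min
```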