First, let's show that B (or C) follows a proper probability distribution. For this to hold, we must verify that no value is less than zero, no value is greater than 1, and the probabilities over the range sum to one (an integral, in calculus terms).
So, let’s take an increasing number of trials for each set and check each condition. Note that, by definition of the problem, the limits are set between 0 and 1, so we already know the first two rules hold. We will still print the outputs, though.
# Build a list of x sets; set i holds i^2 draws from Uniform(0, 1).
make_sets <- function(x) {
  set <- list()
  for (i in 1:x) {
    set[[i]] <- runif(i^2, 0, 1)
  }
  return(set)
}
set.seed(994564)
B <- make_sets(100)
set.seed(8834)
C <- make_sets(100)
First the minimum of all numbers generated:
min(unlist(B))
## [1] 3.441935e-06
Now the maximum:
max(unlist(B))
## [1] 0.9999946
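And now the sum of the probabilities, estimated here as the share of distinct values among all draws (each draw carries weight 1/n, so a result just under one reflects a handful of repeated values):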
length(unique(unlist(B)))/length(unlist(B))
## [1] 0.9999586
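The same three conditions can also be checked against the theoretical Uniform(0, 1) density rather than the simulated draws. This is a minimal sketch, assuming the base R functions `dunif` and `integrate`; the grid spacing and the variable names are illustrative choices, not part of the original analysis.

```r
# Check the Uniform(0, 1) density directly, independent of the simulated sets.
grid <- seq(0, 1, by = 0.01)            # evaluation points (illustrative grid)
dens <- dunif(grid, min = 0, max = 1)   # density values on that grid

all(dens >= 0)   # the density is never negative
all(dens <= 1)   # for this particular distribution it never exceeds 1

# Numerically integrate the density over its support; this should be 1.
total_area <- integrate(dunif, lower = 0, upper = 1)$value
total_area
```

The integral is the continuous counterpart of "the probabilities sum to one", which is why it is a more direct check than counting distinct draws.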
Okay, so why the series of sets? So that I could make these histograms, which show how the data trend as we take more samples: a visual example of the “law of large numbers” discussed in our reading.
hist(B[[5]])
hist(B[[10]])
hist(B[[15]])
hist(B[[20]])
hist(B[[50]])
hist(B[[100]])
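The histograms show the shape flattening out as the sets grow. Another common way to see the law of large numbers is to track the running mean of the draws, which should settle near the theoretical mean of 0.5. The sketch below assumes a fresh sample with an arbitrary seed and sample size; it does not reuse B or C.

```r
# Running mean of Uniform(0, 1) draws as the sample grows.
set.seed(1)                              # arbitrary seed for this sketch only
draws <- runif(10000, 0, 1)              # sample size is an illustrative choice
running_mean <- cumsum(draws) / seq_along(draws)

# Running mean after 10, 100, 1000 and 10000 draws.
running_mean[c(10, 100, 1000, 10000)]

# Plot on a log scale so the early fluctuations stay visible.
plot(running_mean, type = "l", log = "x",
     xlab = "number of draws", ylab = "running mean")
abline(h = 0.5, lty = 2)                 # theoretical mean of Uniform(0, 1)
```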
As shown above, we can approximate the probability that B + C is less than 0.5 by throwing a lot of trials at the question:
sum((unlist(B) + unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.1254175
Also, we can graph the simulations and read the result off the histogram:
library(ggplot2)
binw <- 0.1
a_vals <- as.data.frame((unlist(B) + unlist(C)))
colnames(a_vals) <- c("sums")
(g <- ggplot(a_vals) + geom_histogram(aes(x = sums, ..density..), binwidth = binw)) +
scale_x_continuous(limits = c(0, 2), breaks = seq(0, 2, binw))
gb <- ggplot_build(g)
a_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(a_dens_vars * binw) - a_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.1266736
Summing the bins below 0.5 gives a probability of 0.1267. However, because the histogram counts values in bins and the last bin straddles 0.5, the first option is probably more accurate.
Also, we see that the outcomes with a sum below 0.5 form a triangle with corners at (0, 0), (0.5, 0), and (0, 0.5). By computing that area, we find the probability geometrically: 0.5 x 0.5 x 1/2 = 0.125.
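To put a number on how accurate the brute force estimate is, one option is the standard error of a simulated proportion, sqrt(p(1 - p)/n). The sketch below assumes a fresh set of independent Uniform(0, 1) pairs with an arbitrary seed and sample size, and compares the estimate with the triangle area; the variable names are illustrative.

```r
# Monte Carlo estimate of P(B + C < 0.5) with its standard error.
set.seed(42)                   # arbitrary seed for this sketch only
n <- 200000                    # number of simulated pairs (illustrative)
b_draws <- runif(n)
c_draws <- runif(n)

p_hat <- mean(b_draws + c_draws < 0.5)     # same idea as sum(...)/length(...)
std_error <- sqrt(p_hat * (1 - p_hat) / n) # standard error of a proportion
exact <- 0.5 * 0.5 / 2                     # area of the triangle

c(estimate = p_hat, std_error = std_error, exact = exact)
```

If the estimate sits within a few standard errors of the geometric value, the simulation and the geometry agree.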
Note that the brute force method is consistent with this week's reading, but I’m not sure we should be setting up integrals for these questions yet, since the text has not presented that as a method. Therefore, I’m finding the probabilities with the more accurate brute force method and double-checking the results with the graphical method.
Brute force, for the product B * C being less than 0.5:
sum((unlist(B) * unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.8468686
Graphically:
b_vals <- as.data.frame((unlist(B) * unlist(C)))
colnames(b_vals) <- c("product")
(g <- ggplot(b_vals) + geom_histogram(aes(x = product, ..density..), binwidth = binw)) +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))
gb <- ggplot_build(g)
b_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(b_dens_vars * binw) - b_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.8443875
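As a cross-check on the two estimates above, the product case has a closed form if we assume B and C are independent Uniform(0, 1) variables, which is the setup implied by `runif`. Conditioning on B = b, the event bc < 1/2 is certain when b is at most 1/2 and has probability 1/(2b) otherwise. This is a sketch of the standard derivation, not something taken from the reading:

```latex
\begin{aligned}
P\left(BC < \tfrac{1}{2}\right)
  &= \int_{0}^{1/2} 1 \, db + \int_{1/2}^{1} \frac{1}{2b} \, db \\
  &= \frac{1}{2} + \frac{1}{2}\ln 2 \approx 0.8466
\end{aligned}
```

Both simulated values are close to this figure.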
Brute force, for the absolute difference |B - C| being less than 0.5:
sum(abs((unlist(B) - unlist(C))) < 0.5)/length(unlist(B))
## [1] 0.7489789
Graphically:
c_vals <- as.data.frame(abs((unlist(B) - unlist(C))))
colnames(c_vals) <- c("abs_diff")
(g <- ggplot(c_vals) + geom_histogram(aes(x = abs_diff, ..density..), binwidth = binw)) +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))
gb <- ggplot_build(g)
c_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(c_dens_vars * binw) - c_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.7464829
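This case can also be checked with areas alone, again assuming B and C are independent Uniform(0, 1) variables. Inside the unit square, the outcomes where |b - c| is at least 1/2 form two right triangles in opposite corners, each with legs of length 1/2. A sketch of that computation:

```latex
\begin{aligned}
P\left(|B - C| < \tfrac{1}{2}\right)
  &= 1 - 2 \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} \\
  &= 1 - \frac{1}{4} = \frac{3}{4}
\end{aligned}
```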
Brute force, for the maximum of B and C being less than 0.5:
sum(pmax(unlist(B), unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.2498478
Graphically:
d_vals <- as.data.frame(pmax(unlist(B), unlist(C)))
colnames(d_vals) <- c("max_val")
(g <- ggplot(d_vals) + geom_histogram(aes(x = max_val, ..density..), binwidth = binw)) +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))
gb <- ggplot_build(g)
d_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(d_dens_vars * binw) - d_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.2521708
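The maximum is below 1/2 exactly when both values are below 1/2, so under the assumption that B and C are independent Uniform(0, 1) variables the probability is the area of a square with side 1/2. As a short worked equation:

```latex
P\left(\max(B, C) < \tfrac{1}{2}\right)
  = P\left(B < \tfrac{1}{2}\right) P\left(C < \tfrac{1}{2}\right)
  = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}
```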
Brute force, for the minimum of B and C being less than 0.5:
sum(pmin(unlist(B), unlist(C)) < 0.5)/length(unlist(B))
## [1] 0.7505808
Graphically:
e_vals <- as.data.frame(pmin(unlist(B), unlist(C)))
colnames(e_vals) <- c("min_val")
(g <- ggplot(e_vals) + geom_histogram(aes(x = min_val, ..density..), binwidth = binw)) +
scale_x_continuous(limits = c(0, 1), breaks = seq(0, 1, binw))
gb <- ggplot_build(g)
e_dens_vars <- gb$data[[1]]$density[c(1:(0.5/binw + 1))]
# take half of last bin since it spans over our value of interest.
(prob_val <- sum(e_dens_vars * binw) - e_dens_vars[(0.5/binw + 1)]/2 * binw)
## [1] 0.7481129
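The five brute force estimates can be lined up against closed-form values in one table. The sketch below assumes B and C are independent Uniform(0, 1) variables and uses a fresh sample with an arbitrary seed rather than the sets above. The closed forms are the standard results for this setup (for the minimum, the complement is the event that both values are at least 0.5, giving 1 - 0.5^2); all names are illustrative.

```r
# Compare fresh simulated probabilities with closed-form values at cutoff 0.5.
set.seed(2024)                 # arbitrary seed for this sketch only
n <- 200000                    # number of simulated pairs (illustrative)
b_draws <- runif(n)
c_draws <- runif(n)
cutoff <- 0.5

simulated <- c(
  sum      = mean(b_draws + c_draws < cutoff),
  product  = mean(b_draws * c_draws < cutoff),
  abs_diff = mean(abs(b_draws - c_draws) < cutoff),
  max      = mean(pmax(b_draws, c_draws) < cutoff),
  min      = mean(pmin(b_draws, c_draws) < cutoff)
)

# Closed forms, valid for a cutoff between 0 and 1 under the stated assumptions.
exact <- c(
  sum      = cutoff^2 / 2,                  # triangle area
  product  = cutoff - cutoff * log(cutoff), # 0.5 + 0.5 * log(2) at 0.5
  abs_diff = 1 - (1 - cutoff)^2,            # unit square minus two triangles
  max      = cutoff^2,                      # both values below the cutoff
  min      = 1 - (1 - cutoff)^2             # complement of both at or above it
)

comparison <- round(cbind(simulated, exact), 4)
comparison
```

Each simulated value should land within about 0.01 of its exact counterpart at this sample size.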