1. Let X1, X2, . . . , Xn be n mutually independent random variables, each of which is uniformly distributed on the integers from 1 to k. Let Y denote the minimum of the Xi’s. Find the distribution of Y .
Answer:
The denominator of the distribution is pretty straightforward since it is all the choices for each evnt \({k}^{n}\).
For the numerator, let’s start with an example. Suppose we have \(k=3\) and \(n=3\), then we can easily compute the number of occurances with \(Y = 1\), which is a total of seven.
\[S = \left\{ 111, 112, 121, 122, 211, 212, 221 \right\}\]
How do I get these numbers? I’m just cycling through the options like so:
\[\sum{S} = 1\times k \times k + k \times 1 \times k + k \times + k \times k \times 1\]
But that’s not actually right, I’m removing repeats as I go; however I can express that by just removing one item from each set I iterated over:
\[\sum{S} = 1\times k \times k + \left(k-1\right) \times 1 \times k + \left(k-1\right) \times \left(k-1\right) \times 1 \]
If we look at the pattern, then we can see that I can express this operation more concisely:
\[\sum{S} = \sum _{ i = 0 }^{ n-1 }{ {k}^{n-1-i} \times {\left( k-1 \right)}^{i}}\]
But that is just for the value of 1 where we have \(k\) options. If the minimum were 2 then that would mean we would only have \(\left( k-1 \right)\) options to start with since none of the choices could contain 1. We can add that in by a simple substitution of \(\left(k-j \right)\) for \(k\):
\[\sum{S\left(Y\right)} = \sum _{ i = 0 }^{ n-1 }{ {\left(k-j \right)}^{n-1-i} \times {\left( k-j-1 \right)}^{i}} \\ where \quad j = \left( Y-1 \right)\]
\[\sum{S\left(Y\right)} = \sum _{ i = 0 }^{ n-1 }{ {\left(k-Y+1 \right)}^{n-1-i} \times {\left( k-Y \right)}^{i}}\]
So, let’s write a function that does this:
minimum_sum_set <- function(k, n, y) {
x <- 0
for (i in 0:(n - 1)) {
x <- x + (k - y + 1)^(n - 1 - i) * (k - y)^i
}
return(x)
}
So, if we run the formula we just made for all \(Y\) in a set determined by \(k, n\), we should get the exact number of combinations \({k}^{n}\). Let’s check:
k <- sample(2:20, 1)
n <- sample(2:10, 1)
sum_all_minimum_sets <- function(k, n) {
x <- 0
for (y in 1:k) {
x <- x + minimum_sum_set(k, n, y)
}
return(x)
}
all.equal(k^n, sum_all_minimum_sets(k, n))
## [1] TRUE
And now we can write a function that gets the probability distribution of \(Y\):
minimum_probability <- function(k, n) {
y_v <- c(rep(0, k))
p_v <- c(rep(0, k))
for (i in 1:k) {
y_v[i] <- i
p_v[i] <- minimum_sum_set(k, n, i)/k^n
}
return(data.frame(y_v, p_v))
}
Now plot:
library(ggplot2)
library(knitr)
library(kableExtra)
prob_df <- minimum_probability(k, n)
colnames(prob_df) <- c("Y values", "Probabilities")
ggplot(prob_df, aes(x = `Y values`, y = Probabilities)) + geom_bar(stat = "identity") +
ggtitle(paste("Probability Function for K =", k, "n =", n, sep = " ")) + theme(plot.title = element_text(hjust = 0.5))
kable(prob_df, "html") %>% kable_styling(c("striped", "bordered")) %>% add_header_above(c(`Probabilty Density Function Results` = 2))
| Y values | Probabilities |
|---|---|
| 1 | 0.2739750 |
| 2 | 0.2134029 |
| 3 | 0.1624943 |
| 4 | 0.1204089 |
| 5 | 0.0863065 |
| 6 | 0.0593467 |
| 7 | 0.0386891 |
| 8 | 0.0234936 |
| 9 | 0.0129197 |
| 10 | 0.0061272 |
| 11 | 0.0022758 |
| 12 | 0.0005252 |
| 13 | 0.0000350 |
# Sum of all K probabilities
sum(prob_df$Probabilities)
## [1] 1
To summarize, our final formula is as follows:
\[P\left(Y\right) = \frac{\sum _{ i = 0 }^{ n-1 }{ {\left(k-Y+1 \right)}^{n-1-i} \times {\left( k-Y \right)}^{i}} }{{k}^{n}}\]
2. Your organization owns a copier (future lawyers, etc.) or MRI (future doctors). This machine has a manufacturer’s expected lifetime of 10 years. This means that we expect one failure every ten years. (Include the probability statements and R Code for each part.).
a. What is the probability that the machine will fail after 8 years?. Provide also the expected value and standard deviation. Model as a geometric. (Hint: the probability is equivalent to not failing during the first 8 years..)
Answer:
A geometric distribution is described as the following:
\[P\left( X = x \right) = {q}^{x-1}p\]
\[E\left( X \right) = \frac{1}{p}\]
\[Var\left(X \right) = \frac{q}{{p}^{2}}\]
So we can use the expected value of ten years to find p:
\[E\left( X \right) = 10 = \frac{1}{p}\]
\[p = 0.1\]
To compute \(P\left( X > 8 \right)\) we can either add up all the probabilities of the discrete events up to 8 and subtract from 1:
\[P\left( X>8 \right) =1-\sum _{ x = 1 }^{ 9 }{ { q }^{ x-1 }p } \]
(round(1 - (0.9^(1 - 1) * 0.1 + 0.9^(2 - 1) * 0.1 + 0.9^(3 - 1) * 0.1 + 0.9^(4 -
1) * 0.1 + 0.9^(5 - 1) * 0.1 + 0.9^(6 - 1) * 0.1 + 0.9^(7 - 1) * 0.1 + +0.9^(8 -
1) * 0.1 + +0.9^(9 - 1) * 0.1), 3))
## [1] 0.387
Or we can just use the builtin function:
(round(pgeom(8, prob = 0.1, lower.tail = FALSE), 3))
## [1] 0.387
Or, we could have just used the cumulative distribution function:
(round((1 - (1 - (1 - 0.1)^9)), 3))
## [1] 0.387
As noted above, we determined the expected value to be ten years
The standard deviation can be computed as such:
\[sd(X)=\sqrt { Var\left( X \right) } =\sqrt { \frac { q }{ { p }^{ 2 } } } = \sqrt { \frac { 0.9 }{ { 0.1 }^{ 2 } } } \]
(round(sqrt(0.9/0.1^2), 3))
## [1] 9.487
b. What is the probability that the machine will fail after 8 years?. Provide also the expected value and standard deviation. Model as an exponential.
Answer:
A exponential cumulative distribution is described as the following:
\[P\left( X \le x \right) =1-{ e }^{ -\lambda x }\]
\[E\left( X \right) = \frac{1}{\lambda}\]
\[Var\left(X \right) = \frac{1}{{\lambda}^{2}}\]
As previously noted, the expected value is 10 years, which we use to compute \(\lambda\):
\[E\left( X \right) = 10 = \frac{1}{\lambda}\]
\[\lambda = 0.1\]
To compute \(P\left(X>8\right)\), we can use the cumulative distributin function:
(round(1 - (1 - exp(-0.1 * 8)), 3))
## [1] 0.449
builtin check:
pexp(8, 0.1, lower.tail = FALSE)
## [1] 0.449329
standard deviation:
(round(sqrt(1/0.1^2), 3))
## [1] 10
c. What is the probability that the machine will fail after 8 years?. Provide also the expected value and standard deviation. Model as a binomial. (Hint: 0 success in 8 years)
Answer:
Note, I realy struggled with this one, and i’m unsure if it is correct since I didn’t incorporate the hint into my answer, but here’s my best shot….
A binomial cumulative distribution is described as the following:
\[P\left( X\le x \right) =\sum _{ i=0 }^{ x }{ \left( \begin{matrix} n \\ i \end{matrix} \right) { p }^{ i }{ \left( 1-p \right) }^{ n-i } } \]
\[E\left( X \right) = np\]
\[Var\left(X \right) = np\left(1-p\right)\]
As previously noted, the expected value is 10 years, which we use to compute p:
\[E\left( X \right) = 10 = np\]
However, we have some trouble here, because n (the number of years we observe the machine status), is used to determine the expected value. So, let’s just pick 20 years and see what we get (\(p = 0.5\)).
To compute \(P\left(X>8\right)\), we can use the cumulative distributin function:
n = 20
p = 10/n
round(1 - (choose(n, 0) * p^0 * (1 - p)^(n - 0) + choose(n, 1) * p^1 * (1 - p)^(n -
1) + choose(n, 2) * p^2 * (1 - p)^(n - 2) + choose(n, 3) * p^3 * (1 - p)^(n -
3) + choose(n, 4) * p^4 * (1 - p)^(n - 4) + choose(n, 5) * p^5 * (1 - p)^(n -
5) + choose(n, 6) * p^6 * (1 - p)^(n - 6) + choose(n, 7) * p^7 * (1 - p)^(n -
7) + choose(n, 8) * p^8 * (1 - p)^(n - 8)), 3)
## [1] 0.748
Well, that’s okay but since there isnt’ a definite maximum number of years, why don’t we just pick something really big to encompass all options.
n = 1e+07
binom_func <- function(n) {
p = 10/n
round(1 - (choose(n, 0) * p^0 * (1 - p)^(n - 0) + choose(n, 1) * p^1 * (1 - p)^(n -
1) + choose(n, 2) * p^2 * (1 - p)^(n - 2) + choose(n, 3) * p^3 * (1 - p)^(n -
3) + choose(n, 4) * p^4 * (1 - p)^(n - 4) + choose(n, 5) * p^5 * (1 - p)^(n -
5) + choose(n, 6) * p^6 * (1 - p)^(n - 6) + choose(n, 7) * p^7 * (1 - p)^(n -
7) + choose(n, 8) * p^8 * (1 - p)^(n - 8)), 3)
}
binom_func(n)
## [1] 0.667
builtin check:
round(pbinom(8, prob = 10/n, size = n, lower.tail = FALSE), 3)
## [1] 0.667
standard deviation:
n * (10/n) * (1 - (10/n))
## [1] 9.99999
well, that is an odd way of solving this problem so I thought I would also plot \(P\left(X>8\right)\) for given choices of n:
binom_df <- data.frame(val = seq(10, 10000))
binom_df$prob <- binom_func(binom_df$val)
ggplot(binom_df, aes(x = val, y = prob)) + geom_line() + coord_cartesian(xlim = c(10,
100)) + labs(x = "Values of n", y = "Probability")
d. What is the probability that the machine will fail after 8 years?. Provide also the expected value and standard deviation. Model as a Poisson.
Answer:
The cumulative poisson distribution function is:
\[P\left( X\le x \right) ={e}^{-\lambda}\sum _{ i=0 }^{ x }{ \frac{{\lambda}^{i}}{i!} } \]
Where the expected value and variance is:
\[\lambda = E\left(X\right) = Var\left(X\right) = 10\]
\[sd\left(X\right)=\sqrt{10}\] So, the probability using this distribution is:
round(1 - exp(-10) * (10^0/factorial(0) + 10^1/factorial(1) + 10^2/factorial(2) +
10^3/factorial(3) + 10^4/factorial(4) + 10^5/factorial(5) + 10^6/factorial(6) +
10^7/factorial(7) + 10^8/factorial(8)), 3)
## [1] 0.667
builtin check:
(round(ppois(8, lambda = 10, lower.tail = FALSE), 3))
## [1] 0.667