Probability Density 1: \(X\)~\(\Gamma\)
Using R, generate a random variable \(X\) that has 10,000 random \(\Gamma\) PDF values. A \(\Gamma\) PDF is completely described by \(n\) (a size parameter) and \(\lambda\) (a shape parameter). Choose any \(n\) greater than 3 and an expected value \(\lambda\) between 2 and 10 (you choose).
Probability Density 2: \(Y\)~Sum of Exponentials
Then generate 10,000 observations from the sum of \(n\) exponential PDFs with rate/shape parameter \(\lambda\). The \(n\) and \(\lambda\) must be the same as in the previous case (e.g., mysum=rexp(10000,lambda)+rexp(10000,lambda)+..).
Probability Density 3: \(Z\)~Exponential
Then generate 10,000 observations from a single exponential PDF with rate/shape parameter \(\lambda\).
NOTE: The Gamma distribution is quite common in data science. For example, it is used to model failures for multiple processes when each of those processes has the same failure rate. The exponential is used for constant failure rates, service times, etc.
The three distributions, \(X, Y\), and \(Z\), are created below empirically using parameters \(n=6\) and \(\lambda=2\):
# seed_num is assumed to be a fixed integer defined earlier (used for reproducibility)
set.seed(seed_num)
samples <- 10000   # number of observations per distribution
n <- 6             # size/shape parameter
l <- 2             # rate parameter lambda
# X: 10,000 draws from a Gamma distribution with parameters n and lambda
X <- rgamma(samples, shape=n, rate=l)
# Y: the sum of n independent exponential(lambda) draws
Y <- rexp(samples, rate=l)
for(i in 1:(n-1)){
  Y <- Y + rexp(samples, rate=l)
}
# Z: 10,000 draws from a single exponential(lambda) distribution
Z <- rexp(samples, rate=l)
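As an aside, the loop that builds \(Y\) can also be written without a loop. The sketch below is an addition (not part of the original code); the name Y_alt is illustrative. It draws a matrix of exponential values with \(n\) columns and sums each row, which should yield a variable with the same distribution as \(Y\) (though not the same individual values, since different random draws are used):
# Hypothetical loop-free alternative for building Y (illustrative only):
# draw a (samples x n) matrix of exponential draws and sum across each row.
Y_alt <- rowSums(matrix(rexp(samples * n, rate = l), nrow = samples, ncol = n))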
Calculate the empirical expected value (means) and variances of all three PDFs.
The code below calculates and displays the means and variances of the three probability distributions:
pdfs <- cbind(X, Y, Z)
means <- apply(pdfs, 2, mean)
vars <- apply(pdfs, 2, var)
mean_var <- cbind(means, vars)
rownames(mean_var) <- c('X', 'Y', 'Z')
mean_var
## means vars
## X 2.9977346 1.5339061
## Y 2.9790812 1.4789938
## Z 0.5044382 0.2523357
Using calculus, calculate the expected value and variance of the Gamma pdf (\(X\)).
Using the moment generating function for exponentials, calculate the expected value of the single exponential (\(Z\)) and the sum of exponentials (\(Y\)).
The Gamma probability distribution \(g_X\) is given as the following:
\[ \begin{equation} g_X(x, n, \lambda) = \begin{cases} \frac{\lambda^n}{\Gamma(n)}x^{n-1}e^{-\lambda x} & \text{if } 0 \leq x < \infty\\ 0 & \text{otherwise } \end{cases} \end{equation} \]
where \(n\) and \(\lambda\) are the parameters described above, and \(\Gamma(n) = (n-1)!\) is the Gamma function. The expectation value \(E[X]\) of a probability distribution function \(f(x)\) is defined as \(E[X] = \int x \cdot f(x) \ dx\). Thus, to determine the expectation value of \(g_X\), we can solve the following:
\[ \begin{equation} E[X] = \int_0^{\infty} \frac{\lambda^n}{\Gamma(n)}x^{n-1}e^{-\lambda x} \cdot x \ dx \end{equation} \]
Because this integral is difficult to solve by brute force, we can use some algebraic manipulation to make it much simpler. The first step is to pull out a factor of \(\frac{1}{\lambda}\):
\[ \begin{equation} E[X] = \int_0^{\infty} \frac{1}{\lambda}\frac{\lambda^{n+1}}{\Gamma(n)}x^{n-1}e^{-\lambda x} \cdot x \ dx \end{equation} \]
Next, we use the fact that \(\Gamma(n) = \frac{\Gamma(n+1)}{n}\) to pull out an \(n\):
\[ \begin{equation} E[X] = \int_0^{\infty} \frac{n}{\lambda}\frac{\lambda^{n+1}}{\Gamma(n+1)}x^{n-1}e^{-\lambda x} \cdot x \ dx \end{equation} \]
Next, we combine the \(x\) terms, and pull out the constant \(\frac{n}{\lambda}\):
\[ \begin{equation} E[X] = \frac{n}{\lambda} \int_0^{\infty} \frac{\lambda^{n+1}}{\Gamma(n+1)}x^{(n+1)-1}e^{-\lambda x} \ dx \end{equation} \]
While combining the exponents of the \(x\) terms might seem strange, the reason for it is that the integrand is now itself a Gamma probability density function with parameters \(n+1\) and \(\lambda\). Thus, we can rewrite the integral as follows:
\[ \begin{equation} E[X] =\frac{n}{\lambda} \int_0^{\infty} g_X(x, n+1, \lambda) \ dx \end{equation} \]
Since the integral of any probability distribution over its range of values is equal to 1, we can conclude for the Gamma probability distribution that:
\[ \begin{equation} E[X] = \frac{n}{\lambda} \end{equation} \]
Next, we can find the variance \(\sigma^2\) of the Gamma probability distribution using the fact that \(\sigma^2 = E[X^2] - E[X]^2\). However, we must first find \(E[X^2]\), which, given a probability density function \(f(x)\), is defined as \(E[X^2] = \int x^2 \cdot f(x) \ dx\):
\[ \begin{equation} E[X^2] = \int_0^{\infty} \frac{\lambda^n}{\Gamma(n)}x^{n-1}e^{-\lambda x} \cdot x^2 \ dx \end{equation} \]
The steps of solving this integral are quite similar to the ones shown above:
\[ \begin{align} E[X^2] &= \int_0^{\infty} \frac{1}{\lambda^2}\frac{\lambda^{n+2}}{\Gamma(n)}x^{n-1}e^{-\lambda x} \cdot x^2 \ dx \\ E[X^2] &= \int_0^{\infty} \frac{n(n+1)}{\lambda^2}\frac{\lambda^{n+2}}{\Gamma(n+2)}x^{n-1}e^{-\lambda x} \cdot x^2 \ dx \\ E[X^2] &= \frac{n(n+1)}{\lambda^2}\int_0^{\infty} \frac{\lambda^{n+2}}{\Gamma(n+2)}x^{(n+2)-1}e^{-\lambda x} \ dx \\ E[X^2] &= \frac{n(n+1)}{\lambda^2}\int_0^{\infty} g_X(x, n+2, \lambda) \ dx \\ E[X^2] &= \frac{n(n+1)}{\lambda^2} \end{align} \] We can now compute the variance \(\sigma^2\) as follows:
\[ \begin{align} \sigma^2 &= E[X^2] - E[X]^2 \\ \sigma^2 &= \frac{n^2+n}{\lambda^2} - \frac{n^2}{\lambda^2} \\ \sigma^2 &= \frac{n}{\lambda^2} \end{align} \]
Using the values of \(n\) and \(\lambda\) chosen in part A, we get that \(E[X] = \frac{n}{\lambda} = 6/2 = 3\) and \(\sigma^2 = \frac{n}{\lambda^2} = \frac{6}{4}=1.5\). These are very close to the empirically calculated values for the \(X\) probability distribution in part A.
print(mean_var[1,])
## means vars
## 2.997735 1.533906
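As an added sanity check (not part of the original output; the names theoretical_mean and theoretical_var are illustrative), we can compute the analytic values directly in R and compare them with the empirical row above:
# Theoretical mean and variance of the Gamma(n, lambda) distribution
c(theoretical_mean = n / l, theoretical_var = n / l^2)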
The exponential probability distribution is defined as follows:
\[ \begin{equation} f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } 0 \leq x < \infty\\ 0 & \text{otherwise } \end{cases} \end{equation} \]
In homework 9, we defined the moment generating function \(M_X(t)\) of \(f_X(x)\) to be:
\[ \begin{align} M_X(t) &= \frac{\lambda}{\lambda - t} \end{align} \]
Given that \(E[X] = M'_X(0)\), we can use this moment generating function to determine the expected value of the exponential distribution. To start, we take the first derivative of \(M_X(t)\):
\[ \begin{align} M'_X(t) &= \frac{d}{dt}\left( \frac{\lambda}{\lambda - t} \right) \\ M'_X(t) &= \frac{\lambda}{(\lambda-t)^2} \\ \end{align} \]
We can now plug 0 into the above to find \(E[X]\):
\[ \begin{align} E[X] &= M'_X(0) \\ E[X] &= \frac{\lambda}{(\lambda-0)^2} \\ E[X] &= \frac{\lambda}{\lambda^2} \\ E[X] &= \frac{1}{\lambda} \\ \end{align} \]
We see indeed that the mean of the \(Z\) distribution from part A is very close to \(\frac{1}{\lambda} = \frac{1}{2} = 0.5\):
print(mean_var[3,1])
## [1] 0.5044382
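As a further cross-check of the MGF argument, the sketch below (an addition; the helper M and step size h are illustrative names, not from the original code) approximates \(M'_X(0)\) with a central finite difference and should return a value close to \(\frac{1}{\lambda} = 0.5\):
# Numerical check of E[Z] = M'_X(0): approximate the derivative of the
# exponential MGF at t = 0 with a central finite difference.
M <- function(t, lambda) lambda / (lambda - t)   # MGF of Exp(lambda), valid for t < lambda
h <- 1e-6
(M(h, l) - M(-h, l)) / (2 * h)   # should be approximately 1/l = 0.5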
The distribution \(Y\) from part A is a sum of \(n\) exponential random variables: \(Y = \sum_{i=1}^n Z_i\), where each \(Z_i\) has the same distribution as \(Z\). Because the expected value of a sum of random variables is equal to the sum of their expectations (i.e., \(E[U+W] = E[U] + E[W]\)), the expected value of \(Y\) can be written as:
\[ \begin{align} E[Y] &= \sum_{i=1}^n E[Z_i] \\ E[Y] &= \sum_{i=1}^n \frac{1}{\lambda} \\ E[Y] &= \frac{n}{\lambda} \end{align} \]
Note that this is exactly the same expression that represents the expected value of a Gamma probability distribution. Thus, a Gamma probability distribution with size parameter \(n\) has exactly the same expected value as a probability distribution that represents the sum of \(n\) exponential random variables, so long as all the variables have the same rate parameter \(\lambda\). Looking at the mean of \(Y\) from part A, we see that it is indeed very close to \(\frac{n}{\lambda} = \frac{6}{2}=3\):
print(mean_var[2,1])
## [1] 2.979081
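Beyond the means, \(X\) and \(Y\) should in fact follow the same distribution (a Gamma with parameters \(n\) and \(\lambda\)). As an informal extra check, added here and not required by the assignment, a two-sample Kolmogorov-Smirnov test can be run on the two samples; a large \(p\)-value would be consistent with them sharing a common distribution:
# Informal check that X (Gamma draws) and Y (sum of n exponentials)
# are consistent with coming from the same distribution
ks.test(X, Y)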
For PDF \(Z\) (the exponential), calculate empirically the probability \(P(Z > \lambda \ | \ Z > \lambda/2)\). Then evaluate through calculus whether the memoryless property holds.
Using Bayes' rule, the conditional probability \(P(A | B)\) can be written as:
\[ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} \]
In this case, \(P(B|A) = P(Z > \lambda/2 \ |\ Z > \lambda) = 1\), since if \(Z\) is greater than \(\lambda\) it must also be greater than \(\lambda / 2\). Thus, we can easily compute \(P(A | B)\) by dividing \(P(A)\) by \(P(B)\). This is done in the cell below:
# empirical P(A) = P(Z > lambda)
P_A <- length(which(Z > l)) / samples
# empirical P(B) = P(Z > lambda/2)
P_B <- length(which(Z > l/2)) / samples
P <- P_A / P_B
# alternative method using subsets to check above value of P
check <- length(which(Z[Z>l/2] > l)) / length(Z[Z>l/2])
if(P == check){
  print(P)
}
## [1] 0.1341552
Thus, \(P(Z > \lambda \ | \ Z > \lambda/2) \approx 0.134\).
Given a continuous random variable \(X\), its associated probability distribution is considered “memoryless” if \(P(X > r + s\ |\ X > r) = P(X >s).\) For this example, we can confirm that the exponential distribution adheres to this memoryless property if \(P(Z > \frac{\lambda}{2} + \frac{\lambda}{2}\ |\ Z > \frac{\lambda}{2}) = P(Z > \frac{\lambda}{2})\). In other words, we must prove that:
\[ \begin{equation} \frac{P(Z > \lambda)}{P(Z > \lambda/2)} = P(Z > \lambda/2) \end{equation} \]
To determine \(P(Z > \lambda)\) we can solve the following integral:
\[ \begin{align} P(Z > \lambda) &= 1 - \int_0^{\lambda} \lambda e^{-\lambda x} \ dx \\ &= 1 - (-e^{-\lambda x}|_0^\lambda) \\ &= 1 - (1 - e^{-\lambda^2}) \\ &= e^{-\lambda^2} \end{align} \]
We can invoke the same methodology to determine \(P(Z > \lambda/2)\):
\[ \begin{align} P(Z > \lambda/2) &= 1 - \int_0^{\lambda / 2} \lambda e^{-\lambda x} \ dx \\ &= 1 - (-e^{-\lambda x}|_0^{\lambda/2}) \\ &= 1 - (1 - e^{\frac{-\lambda^2}{2}}) \\ &= e^{\frac{-\lambda^2}{2}} \end{align} \]
Dividing \(\frac{P(Z > \lambda)}{P(Z > \lambda/2)}\) gives:
\[ \begin{align} \frac{P(Z > \lambda)}{P(Z > \lambda/2)} = \frac{e^{-\lambda^2}} {e^\frac{-\lambda^2}{2}}=e^{-\lambda^2+\frac{\lambda^2}{2}} = e^{\frac{-\lambda^2}{2}} = P(Z > \lambda/2) \end{align} \]
Thus, we have confirmed the exponential distribution’s memoryless trait for this conditional probability.
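As an added empirical check of the same property (using the variables P and P_B already computed above), the conditional probability should be close to the unconditional probability \(P(Z > \lambda/2)\):
# Memoryless check: P(Z > lambda | Z > lambda/2) should be close to P(Z > lambda/2)
c(conditional = P, unconditional = P_B)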
For PDF \(Z\) (the exponential), calculate empirically the probability \(P(Z > 2\lambda \ | \ Z > \lambda)\). Then evaluate through calculus whether the memoryless property holds.
We can determine the probability by mimicking the steps used in part C. Once again, we can use Bayes' rule with the helpful observation that \(P(B|A) = P(Z > \lambda \ |\ Z > 2\lambda) = 1\), since if \(Z\) is greater than \(2\lambda\) it must also be greater than \(\lambda\). The probability is computed in the cell below:
P_A <- length(which(Z > 2*l)) / samples
P_B <- length(which(Z > l)) / samples
P <- P_A / P_B
# alternative method using subsets to check above value of P
check <- length(which(Z[Z>l] > 2*l)) / length(Z[Z>l])
if(P == check){
print(P)
}
## [1] 0.02162162
Thus, \(P(Z > 2\lambda \ | \ Z > \lambda) \approx 0.022\).
Following the same steps as in part C, we can show that the memoryless property holds for this probability by proving that:
\[ \begin{equation} \frac{P(Z > 2\lambda)}{P(Z > \lambda)} = P(Z > \lambda) \end{equation} \]
To determine \(P(Z > 2\lambda)\) we can solve the following integral:
\[ \begin{align} P(Z > 2\lambda) &= 1 - \int_0^{2\lambda} \lambda e^{-\lambda x} \ dx \\ &= 1 - (-e^{-\lambda x}|_0^{2\lambda}) \\ &= 1 - (1 - e^{-2\lambda^2}) \\ &= e^{-2\lambda^2} \end{align} \]
We have already determined \(P(Z > \lambda)\) in part C, and so can immediately compute \(\frac{P(Z > 2\lambda)}{P(Z > \lambda)}\):
\[ \begin{align} \frac{P(Z > 2\lambda)}{P(Z > \lambda)} = \frac{e^{-2\lambda^2}} {e^{-\lambda^2}}=e^{-2\lambda^2+\lambda^2} = e^{-\lambda^2} = P(Z > \lambda) \end{align} \]
Thus, we have confirmed the exponential distribution’s memoryless trait for this conditional probability.
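The same identity can also be checked numerically with R's built-in exponential tail probabilities (an added check, analogous to the pexp() calculation used in the next part); both sides should evaluate to \(e^{-\lambda^2}\):
# Both sides of P(Z > 2*lambda | Z > lambda) = P(Z > lambda), computed with pexp()
lhs <- pexp(2*l, rate=l, lower.tail=FALSE) / pexp(l, rate=l, lower.tail=FALSE)
rhs <- pexp(l, rate=l, lower.tail=FALSE)
c(lhs = lhs, rhs = rhs)   # both should equal exp(-l^2)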
For PDF \(Z\) (the exponential), calculate empirically the probability \(P(Z > 3\lambda \ | \ Z > \lambda)\). Then evaluate through calculus whether the memoryless property holds.
We can determine the probability by mimicking the steps used in parts C and D. Once again, we can use Bayes' rule with the helpful observation that \(P(B|A) = P(Z > \lambda \ |\ Z > 3\lambda) = 1\), since if \(Z\) is greater than \(3\lambda\) it must also be greater than \(\lambda\):
P_A <- length(which(Z > 3*l)) / samples
P_B <- length(which(Z > l)) / samples
P <- P_A / P_B
# alternative method using subsets to check above value of P
check <- length(which(Z[Z>l] > 3*l)) / length(Z[Z>l])
if(P == check){
print(P)
}
## [1] 0
Thus, \(P(Z > 3\lambda \ | \ Z > \lambda) \approx 0\). Given the choices of \(n\) and \(\lambda\), it appears no samples exceeded \(3\lambda\), so the empirical conditional probability is exactly 0. Using R we can compute the theoretical value of this probability, which confirms that it is indeed very close to 0:
pexp(3*l, rate=l, lower.tail=FALSE) / pexp(l, rate=l, lower.tail=FALSE)
## [1] 0.0003354626
Following the same steps as in parts 1C and 1D, we can show that the memoryless property holds for this probability by proving that:
\[ \begin{equation} \frac{P(Z > 3\lambda)}{P(Z > \lambda)} = P(Z > 2\lambda) \end{equation} \] To determine \(P(Z > 3\lambda)\) we can solve the following integral:
\[ \begin{align} P(Z > 3\lambda) &= 1 - \int_0^{3\lambda} \lambda e^{-\lambda x} \ dx \\ &= 1 - (-e^{-\lambda x}|_0^{3\lambda}) \\ &= 1 - (1 - e^{-3\lambda^2}) \\ &= e^{-3\lambda^2} \end{align} \]
We have already determined \(P(Z > \lambda)\) in part 1C, and so can immediately compute \(\frac{P(Z > 3\lambda)}{P(Z > \lambda)}\):
\[ \begin{align} \frac{P(Z > 3\lambda)}{P(Z > \lambda)} = \frac{e^{-3\lambda^2}} {e^{-\lambda^2}}=e^{-3\lambda^2+\lambda^2} = e^{-2\lambda^2} = P(Z > 2\lambda) \end{align} \]
Thus, we have confirmed the exponential distribution’s memoryless trait for this conditional probability.
Loosely investigate whether \(P(YZ) = P(Y)P(Z)\) by building a table with quartiles and evaluating the marginal and joint probabilities.
The cell below goes through the process of creating the joint probability table by taking the cross product of the quartile values for both \(Y\) and \(Z\). We can divide each entry in the table by the sum of all table entries to generate the joint probabilities:
# quartile values (25%, 50%, 75%, 100%) of Y and Z
Y_quants <- quantile(Y, probs=c(0.25, 0.5, 0.75, 1))
Z_quants <- quantile(Z, probs=c(0.25, 0.5, 0.75, 1))
# outer product of the quartile values, transposed so rows correspond to Z and columns to Y
YZ_quants <- t(tcrossprod(Y_quants, Z_quants))
# normalize so all entries sum to 1, giving the joint probability table
count_sum <- sum(YZ_quants)
joint_table <- YZ_quants / count_sum
joint_table
## [,1] [,2] [,3] [,4]
## [1,] 0.002813712 0.003807419 0.004970523 0.01198910
## [2,] 0.007039678 0.009525853 0.012435844 0.02999574
## [3,] 0.013736705 0.018588042 0.024266382 0.05853147
## [4,] 0.095732305 0.129541700 0.169114557 0.40791097
The cell below then appends the marginal probabilities, obtained as the row and column sums of the joint probabilities:
row_sums <- apply(joint_table, 1, sum)
col_sums <- apply(joint_table, 2, sum)
joint_marginal_table <- cbind(rbind(joint_table, col_sums), c(row_sums, 1))
colnames(joint_marginal_table) <- c('1st Quar. Y', '2nd Quar. Y',
'3rd Quar. Y', '4th Quar. Y', 'Sum')
rownames(joint_marginal_table) <- c('1st Quar. Z', '2nd Quar. Z',
'3rd Quar.Z', '4th Quar. Z', 'Sum')
joint_marginal_table
## 1st Quar. Y 2nd Quar. Y 3rd Quar. Y 4th Quar. Y Sum
## 1st Quar. Z 0.002813712 0.003807419 0.004970523 0.01198910 0.02358075
## 2nd Quar. Z 0.007039678 0.009525853 0.012435844 0.02999574 0.05899712
## 3rd Quar.Z 0.013736705 0.018588042 0.024266382 0.05853147 0.11512260
## 4th Quar. Z 0.095732305 0.129541700 0.169114557 0.40791097 0.80229953
## Sum 0.119322399 0.161463013 0.210787307 0.50842728 1.00000000
We can now use this joint/marginal probability table to investigate whether \(P(YZ) = P(Y)P(Z)\) by creating a matrix that compares the two values for every cell:
# compare P(YZ) with P(Y)P(Z) for every cell of the joint table
check_mat <- matrix(nrow = 0, ncol=5)
for(i in 1:4){        # i indexes the Z quartile (rows of joint_table)
  for(j in 1:4){      # j indexes the Y quartile (columns of joint_table)
    PYZ <- joint_table[i, j]
    PYPZ <- row_sums[i] * col_sums[j]
    diff <- abs(PYZ - PYPZ)
    diff_row <- c(i, j, PYZ, PYPZ, diff)
    check_mat <- rbind(check_mat, diff_row)
  }
}
check_mat <- unname(check_mat)
colnames(check_mat) <- c('Z Quar', 'Y Quar', 'P(YZ)', 'P(Y)P(Z)', 'Diff')
check_mat
## Z Quar Y Quar P(YZ) P(Y)P(Z) Diff
## [1,] 1 1 0.002813712 0.002813712 4.336809e-19
## [2,] 1 2 0.003807419 0.003807419 0.000000e+00
## [3,] 1 3 0.004970523 0.004970523 0.000000e+00
## [4,] 1 4 0.011989098 0.011989098 1.734723e-18
## [5,] 2 1 0.007039678 0.007039678 8.673617e-19
## [6,] 2 2 0.009525853 0.009525853 0.000000e+00
## [7,] 2 3 0.012435844 0.012435844 1.734723e-18
## [8,] 2 4 0.029995745 0.029995745 3.469447e-18
## [9,] 3 1 0.013736705 0.013736705 0.000000e+00
## [10,] 3 2 0.018588042 0.018588042 3.469447e-18
## [11,] 3 3 0.024266382 0.024266382 0.000000e+00
## [12,] 3 4 0.058531470 0.058531470 6.938894e-18
## [13,] 4 1 0.095732305 0.095732305 0.000000e+00
## [14,] 4 2 0.129541700 0.129541700 0.000000e+00
## [15,] 4 3 0.169114557 0.169114557 0.000000e+00
## [16,] 4 4 0.407910969 0.407910969 0.000000e+00
As we can see above, \(P(YZ) = P(Y)P(Z)\) holds for every cell up to floating-point rounding error. Thus, we can say with a high degree of confidence that \(Y\) and \(Z\) are independent.
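As a one-line summary of the table above (added here for convenience), the largest absolute discrepancy can be pulled directly from check_mat and is at the level of floating-point rounding:
# Largest absolute difference between P(YZ) and P(Y)P(Z) across all 16 cells
max(check_mat[, 'Diff'])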
Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
We can also use statistical tests to confirm the independence of \(Y\) and \(Z\). First, we can turn our joint probability table into a frequency table by multiplying by a large enough constant (10,000 in this case), and rounding to the nearest integer:
freq_table <- round(joint_table * 10000)
freq_table
## [,1] [,2] [,3] [,4]
## [1,] 28 38 50 120
## [2,] 70 95 124 300
## [3,] 137 186 243 585
## [4,] 957 1295 1691 4079
The cell below runs a Fisher’s exact test to test for independence between \(Y\) and \(Z\). Given two variables \(Y\) and \(Z\), the null hypothesis \(H_0\) is that \(Y\) and \(Z\) are independent (no association between the row and column classifications), and the alternative hypothesis \(H_A\) is that they are not independent:
fisher.test(freq_table, simulate.p.value = TRUE)
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: freq_table
## p-value = 1
## alternative hypothesis: two.sided
Thus, given the extremely high \(p\)-value, we fail to reject the null hypothesis, which is consistent with \(Y\) and \(Z\) being independent.
We can also perform a Chi-Squared test of association (\(H_0\) and \(H_A\) are the same as they were for Fisher’s exact test):
chisq.test(freq_table)
##
## Pearson's Chi-squared test
##
## data: freq_table
## X-squared = 0.0055442, df = 9, p-value = 1
Once again, we see a \(p\)-value of 1, indicating that we cannot reject the null hypothesis; this is consistent with \(Y\) and \(Z\) being independent.
In this case, though both tests yielded the same result, the Chi-Squared test is more appropriate given the sample size. Typically, most researchers adhere to the following guideline: “if ≤ 20% of expected cell counts are less than 5, then use the chi-square test; if > 20% of expected cell counts are less than 5, then use Fisher’s exact test.” (Source)
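To apply that guideline here (an added check, using the expected component returned by chisq.test()), we can inspect the expected cell counts under independence; in this table all of them are well above 5, which supports using the chi-square test:
# Expected cell counts under the null hypothesis of independence
chisq.test(freq_table)$expected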