Probability Density 1: X~Gamma.
Using R, generate a random variable X that has 10,000 random Gamma
pdf values. A Gamma pdf is completely describe by \(n\) (a size parameter) and lambda (λ , a
shape parameter). Choose any \(n\)
greater 3 and an expected value (λ) between 2 and 10 (you
choose).
# Set the seed for reproducibility
set.seed(123)
# Define the size and shape parameters
n <- 5
lambda <- 3
# Generate 10,000 random values from a Gamma distribution
X <- rgamma(10000, shape = n, rate = lambda)
Calculate the empirical expected value (means) and variances
\[E(X) = \frac{n}{\lambda}\]
\[Var(X) = \frac{n}{\lambda^2}\]
# Calculate the expected value and variance
e_v_gamma <- n/lambda # expected value of a gamma distribution with shape=5 and rate=3
v_gamma <- n/lambda^2 # variance of a gamma distribution with shape=5 and rate=3
# Print the results
cat("Expected value:", e_v_gamma, "\n")
## Expected value: 1.666667
cat("Variance:", v_gamma, "\n")
## Variance: 0.5555556
To calculate the expected value with calculus for the Gamma pdf we integrate \(X\) times the pdf function over the range \([0,\infty]\)
\[E(X)=\int_0^{\infty} x \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x} d x\]
If we substitute the values for \(a = 5\) (which is equal to \(n\)) and \(\lambda = 3\) than we get:
\[E(X)=\int_0^{\infty} x \frac{3^5}{\Gamma(5)} x^{5-1} e^{-3 x} d x\]
Combine the two exponents of \(x\) into one term:
\[E(X)=\int_0^{\infty} \frac{3^5}{\Gamma(5)} x^{(5+1)-1} e^{-3 x} d x\]
Multiply by \(\frac{1}{\lambda}\)
\[E(X)=\int_0^{\infty} \frac{1}{3}{\cdot}\frac{3^{5+1}}{\Gamma(5)} x^{(5+1)-1} e^{-3 x} d x\]
Use the following relationship \(\Gamma(x+1) = \Gamma(x) \cdot x\) we get:
\[E(X)=\int_0^{\infty} \frac{5}{3}{\cdot}\frac{3^{5+1}}{\Gamma(5+1)} x^{(5+1)-1} e^{-3 x} d x\]
Again using the density of the gamma distribution we now get:
\[E(X)=\frac{5}{3}\int_0^{\infty} Gam(x;5+1,3) d x\]
Which once we pull out the term \(\frac{5}{3}\) equates to 1, so using calculus the expected value is the same:
\[E(X)=\frac{5}{3}\] Which is:
n/lambda
## [1] 1.666667
The variance can be expressed in expected value terms as:
\[\operatorname{Var}(X)=\mathrm{E}\left(X^2\right)-\mathrm{E}(X)^2\]
Therefore with the pdf of the gamma distribution, the expected value of a squared gamma random variable is:
\[E(X^2)=\int_0^{\infty} x^2 \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x} d x\]
Using similar substitution and reduction techniques:
\[=\int_0^{\infty} x^2 \frac{3^5}{\Gamma(5)} x^{5-1} e^{-3 x} d x\] \[=\int_0^{\infty} \frac{3^5}{\Gamma(5)} x^{(5+2)-1} e^{-3 x} d x\] \[=\int_0^{\infty} \frac{1}{3^2}\cdot\frac{3^{5+2}}{\Gamma(5)} x^{(5+2)-1} e^{-3 x} d x\]
Then twice applying the \(\Gamma(x+1) = \Gamma(x) \cdot x\) relationship:
\[\mathrm{E}\left(X^2\right)=\int_0^{\infty} \frac{5(5+1)}{3^2} \cdot \frac{3^{5+2}}{\Gamma(5+2)} x^{(5+2)-1} e^{-3 x} \mathrm{d} x\]
Again using the density of the gamma distribution we get:
\[\begin{aligned} \mathrm{E}\left(X^2\right) & =\frac{5(5+1)}{3^2} \int_0^{\infty} \operatorname{Gam}(x ; 5+2, 3) \mathrm{d} x \\ & =\frac{5^2+5}{3^2} .\end{aligned}\] Plugging the this into the initial formula of:
\[\operatorname{Var}(X)=\mathrm{E}\left(X^2\right)-\mathrm{E}(X)^2\]
We get:
\[\operatorname{Var}(X)=\frac{5^2+5}{3^2}-(\frac{5}{3})^2\] \[=\frac{5}{3^2}\]
Which equates too:
n/lambda^2
## [1] 0.5555556
Probability Density 2: Y~Sum of Exponentials.
Then generate 10,000 observations from the sum of \(n\) exponential pdfs with rate/shape
parameter (λ). The \(n\) and λ must be
the same as in the previous case. (e.g.,
mysum=rexp(10000,λ)+rexp(10000,λ)+..)
# Set the seed for reproducibility
set.seed(123)
# Define the size and shape parameters
n <- 5
lambda <- 3
# Generate 10,000 observations from the sum of n exponential PDFs
Y <- replicate(10000, sum(rexp(n, rate = lambda)))
Calculate the empirical expected value (means) and variances
\[E(Y) = E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n) = \frac{1}{\lambda} + \frac{1}{\lambda} + \ldots + \frac{1}{\lambda} = \frac{n}{\lambda}\]
\[Var(Y) = Var(X_1 + X_2 + \ldots + X_n) = Var(X_1) + Var(X_2) + \ldots + Var(X_n) = \frac{1}{\lambda^2} + \frac{1}{\lambda^2} + \ldots + \frac{1}{\lambda^2} = \frac{n}{\lambda^2}\]
# Calculate the expected value and variance
e_v_y_sum_exp <- n/lambda # expected value of the y-sum of 5 exponentials with rate=3
v_y_sum_exp <- n/lambda^2 # variance of the y-sum of 5 exponentials with rate=3
# Print the results
cat("Expected value:", e_v_y_sum_exp, "\n")
## Expected value: 1.666667
cat("Variance:", v_y_sum_exp, "\n")
## Variance: 0.5555556
Using the moment generating function (MGF) for exponentials, calculate the sum of exponentials (Y).
The MGF of an exponential distribution with rate parameter \(\lambda\) is given by:
\[M_X(t)=\frac{\lambda}{\lambda-t}\]
To find the MGF of the sum of n independent exponential random variables, each with a rate parameter of lambda, we raise the MGF of a single exponential rand variable to the power of n. Substituting the values \(\alpha=5\) and \(\lambda=3\) into the MGF equation we have:
\[M_Y(t) =\frac{3}{3-t}^{5}\]
When evaluating MGF at \(t=0\) we have:
\[M_Y(0) =\frac{3}{3-0}^{5}\]
Which simplifies too:
\[M_Y(0)=1^5=1\]
Probability Density 3: Z~ Exponential.
Then generate 10,000 observations from a single exponential pdf with
rate/shape parameter (λ).
# Set the seed for reproducibility
set.seed(123)
# Define the rate/shape parameter
lambda <- 3
# Generate 10,000 observations from an exponential PDF
Z <- rexp(10000, rate = lambda)
Calculate the empirical expected value (means) and variances
\(E(Z) = \frac{1}{\lambda}\)
\(Var(Z) = \frac{1}{\lambda^2}\)
# Calculate the expected value and variance
e_v_z_exp <- 1/lambda # expected value of an exponential distribution with rate=3
v_z_exp <- 1/lambda^2 # variance of an exponential distribution with rate=3
# Print the results
cat("Expected value:", e_v_z_exp, "\n")
## Expected value: 0.3333333
cat("Variance:", v_z_exp, "\n")
## Variance: 0.1111111
Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z)
Here we can use the same formula for MGF of an exponential distribution with parameter \(\lambda\) as follows:
\[M_X(t)=\frac{\lambda}{\lambda-t}\]
However, to find the expected value we differentiate the MGF with respect to \(t\):
\[M_X(t)=\frac{\lambda}{(\lambda-t)^2}\]
Then, we evaluate the derivative at \(t=0\):
\[M(0)=\frac{\lambda}{(\lambda-0)}=\frac{\lambda}{\lambda^2}=\frac{1}{\lambda}\]
Substituting \(\lambda=3\) into the equation, we have:
\[M(0)=\frac{1}{3}\]
Therefore, the expected value of a single exponential random variable with the rate parameter \(lambda=3\) is”
\[E(Z)=\frac{1}{\lambda}=\frac{1}{3}\approx0.333\]
For pdf Z (the exponential), calculate empirically probabilities a through c. Then evaluate through calculus whether the memoryless property holds.
We are dealing with the exponential pdf and want the probability so
we can use the pexp()
function to generate the cumulative
distribution function.
We can subtract by 1 to get anything greater than.
(p_a <- 1-pexp(e_v_z_exp, lambda))
## [1] 0.3678794
(p_b <- 1-pexp(e_v_z_exp, 2*lambda))
## [1] 0.1353353
(p_c <- 1-pexp(e_v_z_exp, 3*lambda))
## [1] 0.04978707
Loosely investigate whether P(YZ) = P(Y)P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities.
# first we need to get the quantiles of each
y_quant <- quantile(Y, c(0.25,0.50,0.75,1))
z_quant <- quantile(Z, c(0.25,0.50,0.75,1))
# convert to df (creates row/col names)
y_quant <- as.data.frame(y_quant)
z_quant <- as.data.frame(z_quant)
# convert to matrix
y_quant <- as.matrix(t(y_quant))
z_quant <- as.matrix(z_quant)
# matrix multiplication
prod_y_z <- z_quant %*% y_quant
# sum
prod_sum_y_z <- sum(prod_y_z)
Calculate the the probability
# create blank matrix and set col and row names
prob_y_z <- matrix(nrow=4,ncol=4, 0,
dimnames = dimnames(prod_y_z))
# loop through blank matrix and calc prob
for (i in 1:dim(prob_y_z)[1]) {
for (j in 1:dim(prob_y_z)[2]) {
prob_y_z[i,j] = prod_y_z[i,j]/prod_sum_y_z
}
}
# view results
prob_y_z
## 25% 50% 75% 100%
## 25% 0.002806858 0.003894488 0.005279363 0.01328900
## 50% 0.006774183 0.009399112 0.012741427 0.03207220
## 75% 0.013422715 0.018623887 0.025246523 0.06354951
## 100% 0.088072232 0.122199366 0.165653338 0.41697580
Next we need to generate sums and append
# sums of cols and append as row
col_sums <- colSums(prob_y_z)
prob_y_z <- rbind(prob_y_z, col_sums)
# sums of rows and append as col
row_sums <- rowSums(prob_y_z)
prob_y_z <- cbind(prob_y_z, row_sums)
View results.
prob_y_z
## 25% 50% 75% 100% row_sums
## 25% 0.002806858 0.003894488 0.005279363 0.01328900 0.02526971
## 50% 0.006774183 0.009399112 0.012741427 0.03207220 0.06098692
## 75% 0.013422715 0.018623887 0.025246523 0.06354951 0.12084264
## 100% 0.088072232 0.122199366 0.165653338 0.41697580 0.79290074
## col_sums 0.111075988 0.154116853 0.208920651 0.52588651 1.00000000
Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
Both tests are statistical tests used to analyze categorical data and determine if there is a significant association between two categorical variables. Typically Fisher Exact Test is used for a smaller sample size, whereas the Chi Square test is used when the sample size is larger. So the null hypothesis here would be that there is NO association between the Y and Z.
So if we are testing categorical variables, let’s generate counts for these tests.
counts <- prob_y_z*10000
Run Fisher-Test
fisher.test(counts, simulate.p.value = T)
##
## Fisher's Exact Test for Count Data with simulated p-value (based on
## 2000 replicates)
##
## data: counts
## p-value = 1
## alternative hypothesis: two.sided
Run Chi-square Test
chisq.test(counts, simulate.p.value = T)
##
## Pearson's Chi-squared test with simulated p-value (based on 2000
## replicates)
##
## data: counts
## X-squared = 8.2278e-28, df = NA, p-value = 1
Due to low p-value in Chi Square Test we can fail to reject the null hypothesis that there is association between Y and Z.