Final Exam

Problem 1

Gamma

PDF

Probability Density 1: X~Gamma.
Using R, generate a random variable X that has 10,000 random Gamma pdf values. A Gamma pdf is completely describe by \(n\) (a size parameter) and lambda (λ , a shape parameter). Choose any \(n\) greater 3 and an expected value (λ) between 2 and 10 (you choose).

Helpful Info

# Set the seed for reproducibility
set.seed(123)

# Define the size and shape parameters
n <- 5
lambda <- 3

# Generate 10,000 random values from a Gamma distribution
X <- rgamma(10000, shape = n, rate = lambda)

Expected value and variance

Calculate the empirical expected value (means) and variances

\[E(X) = \frac{n}{\lambda}\]

\[Var(X) = \frac{n}{\lambda^2}\]

# Calculate the expected value and variance
e_v_gamma <- n/lambda # expected value of a gamma distribution with shape=5 and rate=3
v_gamma <- n/lambda^2   # variance of a gamma distribution with shape=5 and rate=3

# Print the results
cat("Expected value:", e_v_gamma, "\n")

## Expected value: 1.666667

cat("Variance:", v_gamma, "\n")

## Variance: 0.5555556

Calculus calculation of expected value and variance

Expected Value using calculus

Helpful Info

To calculate the expected value with calculus for the Gamma pdf we integrate \(X\) times the pdf function over the range \([0,\infty]\)

\[E(X)=\int_0^{\infty} x \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x} d x\]

If we substitute the values for \(a = 5\) (which is equal to \(n\)) and \(\lambda = 3\) than we get:

\[E(X)=\int_0^{\infty} x \frac{3^5}{\Gamma(5)} x^{5-1} e^{-3 x} d x\]

Combine the two exponents of \(x\) into one term:

\[E(X)=\int_0^{\infty} \frac{3^5}{\Gamma(5)} x^{(5+1)-1} e^{-3 x} d x\]

Multiply by \(\frac{1}{\lambda}\)

\[E(X)=\int_0^{\infty} \frac{1}{3}{\cdot}\frac{3^{5+1}}{\Gamma(5)} x^{(5+1)-1} e^{-3 x} d x\]

Use the following relationship \(\Gamma(x+1) = \Gamma(x) \cdot x\) we get:

\[E(X)=\int_0^{\infty} \frac{5}{3}{\cdot}\frac{3^{5+1}}{\Gamma(5+1)} x^{(5+1)-1} e^{-3 x} d x\]

Again using the density of the gamma distribution we now get:

\[E(X)=\frac{5}{3}\int_0^{\infty} Gam(x;5+1,3) d x\]

Which once we pull out the term \(\frac{5}{3}\) equates to 1, so using calculus the expected value is the same:

\[E(X)=\frac{5}{3}\] Which is:

n/lambda

## [1] 1.666667

Variance using calculus

Helpful Info

The variance can be expressed in expected value terms as:

\[\operatorname{Var}(X)=\mathrm{E}\left(X^2\right)-\mathrm{E}(X)^2\]

Therefore with the pdf of the gamma distribution, the expected value of a squared gamma random variable is:

\[E(X^2)=\int_0^{\infty} x^2 \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x} d x\]

Using similar substitution and reduction techniques:

\[=\int_0^{\infty} x^2 \frac{3^5}{\Gamma(5)} x^{5-1} e^{-3 x} d x\] \[=\int_0^{\infty} \frac{3^5}{\Gamma(5)} x^{(5+2)-1} e^{-3 x} d x\] \[=\int_0^{\infty} \frac{1}{3^2}\cdot\frac{3^{5+2}}{\Gamma(5)} x^{(5+2)-1} e^{-3 x} d x\]

Then twice applying the \(\Gamma(x+1) = \Gamma(x) \cdot x\) relationship:

\[\mathrm{E}\left(X^2\right)=\int_0^{\infty} \frac{5(5+1)}{3^2} \cdot \frac{3^{5+2}}{\Gamma(5+2)} x^{(5+2)-1} e^{-3 x} \mathrm{d} x\]

Again using the density of the gamma distribution we get:

\[\begin{aligned} \mathrm{E}\left(X^2\right) & =\frac{5(5+1)}{3^2} \int_0^{\infty} \operatorname{Gam}(x ; 5+2, 3) \mathrm{d} x \\ & =\frac{5^2+5}{3^2} .\end{aligned}\] Plugging the this into the initial formula of:

\[\operatorname{Var}(X)=\mathrm{E}\left(X^2\right)-\mathrm{E}(X)^2\]

We get:

\[\operatorname{Var}(X)=\frac{5^2+5}{3^2}-(\frac{5}{3})^2\] \[=\frac{5}{3^2}\]

Which equates too:

n/lambda^2

## [1] 0.5555556

Y~Sum of Exponentials

Probability Density 2: Y~Sum of Exponentials.
Then generate 10,000 observations from the sum of \(n\) exponential pdfs with rate/shape parameter (λ). The \(n\) and λ must be the same as in the previous case. (e.g., mysum=rexp(10000,λ)+rexp(10000,λ)+..)

# Set the seed for reproducibility
set.seed(123)

# Define the size and shape parameters
n <- 5
lambda <- 3

# Generate 10,000 observations from the sum of n exponential PDFs
Y <- replicate(10000, sum(rexp(n, rate = lambda)))

Calculate the empirical expected value (means) and variances

\[E(Y) = E(X_1 + X_2 + \ldots + X_n) = E(X_1) + E(X_2) + \ldots + E(X_n) = \frac{1}{\lambda} + \frac{1}{\lambda} + \ldots + \frac{1}{\lambda} = \frac{n}{\lambda}\]

\[Var(Y) = Var(X_1 + X_2 + \ldots + X_n) = Var(X_1) + Var(X_2) + \ldots + Var(X_n) = \frac{1}{\lambda^2} + \frac{1}{\lambda^2} + \ldots + \frac{1}{\lambda^2} = \frac{n}{\lambda^2}\]

# Calculate the expected value and variance
e_v_y_sum_exp <- n/lambda  # expected value of the y-sum of 5 exponentials with rate=3
v_y_sum_exp <- n/lambda^2  # variance of the y-sum of 5 exponentials with rate=3

# Print the results
cat("Expected value:", e_v_y_sum_exp, "\n")

## Expected value: 1.666667

cat("Variance:", v_y_sum_exp, "\n")

## Variance: 0.5555556

Sum of Exponential Mean of Y

Using the moment generating function (MGF) for exponentials, calculate the sum of exponentials (Y).

Helpful Info

The MGF of an exponential distribution with rate parameter \(\lambda\) is given by:

\[M_X(t)=\frac{\lambda}{\lambda-t}\]

To find the MGF of the sum of n independent exponential random variables, each with a rate parameter of lambda, we raise the MGF of a single exponential rand variable to the power of n. Substituting the values \(\alpha=5\) and \(\lambda=3\) into the MGF equation we have:

\[M_Y(t) =\frac{3}{3-t}^{5}\]

When evaluating MGF at \(t=0\) we have:

\[M_Y(0) =\frac{3}{3-0}^{5}\]

Which simplifies too:

\[M_Y(0)=1^5=1\]

Z~ Exponential

Probability Density 3: Z~ Exponential.
Then generate 10,000 observations from a single exponential pdf with rate/shape parameter (λ).

# Set the seed for reproducibility
set.seed(123)

# Define the rate/shape parameter
lambda <- 3

# Generate 10,000 observations from an exponential PDF
Z <- rexp(10000, rate = lambda)

Calculate the empirical expected value (means) and variances

\(E(Z) = \frac{1}{\lambda}\)

\(Var(Z) = \frac{1}{\lambda^2}\)

# Calculate the expected value and variance
e_v_z_exp <- 1/lambda   # expected value of an exponential distribution with rate=3
v_z_exp <- 1/lambda^2   # variance of an exponential distribution with rate=3

# Print the results
cat("Expected value:", e_v_z_exp, "\n")

## Expected value: 0.3333333

cat("Variance:", v_z_exp, "\n")

## Variance: 0.1111111

Exponential Mean

Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z)

Here we can use the same formula for MGF of an exponential distribution with parameter \(\lambda\) as follows:

\[M_X(t)=\frac{\lambda}{\lambda-t}\]

However, to find the expected value we differentiate the MGF with respect to \(t\):

\[M_X(t)=\frac{\lambda}{(\lambda-t)^2}\]

Then, we evaluate the derivative at \(t=0\):

\[M(0)=\frac{\lambda}{(\lambda-0)}=\frac{\lambda}{\lambda^2}=\frac{1}{\lambda}\]

Substituting \(\lambda=3\) into the equation, we have:

\[M(0)=\frac{1}{3}\]

Therefore, the expected value of a single exponential random variable with the rate parameter \(lambda=3\) is”

\[E(Z)=\frac{1}{\lambda}=\frac{1}{3}\approx0.333\]

Probability.

For pdf Z (the exponential), calculate empirically probabilities a through c. Then evaluate through calculus whether the memoryless property holds.

\(P(Z> λ| Z> λ/2)\)

We are dealing with the exponential pdf and want the probability so we can use the pexp() function to generate the cumulative distribution function.

We can subtract by 1 to get anything greater than.

(p_a <- 1-pexp(e_v_z_exp, lambda))

## [1] 0.3678794

\(P(Z> 2λ | Z> λ)\)

(p_b <- 1-pexp(e_v_z_exp, 2*lambda))

## [1] 0.1353353

\(P(Z>3λ | Z> λ)\)

(p_c <- 1-pexp(e_v_z_exp, 3*lambda))

## [1] 0.04978707

Probability Table

Loosely investigate whether P(YZ) = P(Y)P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities.

# first we need to get the quantiles of each
y_quant <- quantile(Y, c(0.25,0.50,0.75,1))
z_quant <- quantile(Z, c(0.25,0.50,0.75,1))

# convert to df (creates row/col names)
y_quant <- as.data.frame(y_quant)
z_quant <- as.data.frame(z_quant)

# convert to matrix
y_quant <- as.matrix(t(y_quant))
z_quant <- as.matrix(z_quant)

# matrix multiplication
prod_y_z <- z_quant %*% y_quant

# sum
prod_sum_y_z <- sum(prod_y_z)

Calculate the the probability

# create blank matrix and set col and row names
prob_y_z <- matrix(nrow=4,ncol=4, 0,
                    dimnames = dimnames(prod_y_z))

# loop through blank matrix and calc prob
for (i in 1:dim(prob_y_z)[1]) {
  for (j in 1:dim(prob_y_z)[2]) {
    prob_y_z[i,j] = prod_y_z[i,j]/prod_sum_y_z
  }
}

# view results
prob_y_z

##              25%         50%         75%       100%
## 25%  0.002806858 0.003894488 0.005279363 0.01328900
## 50%  0.006774183 0.009399112 0.012741427 0.03207220
## 75%  0.013422715 0.018623887 0.025246523 0.06354951
## 100% 0.088072232 0.122199366 0.165653338 0.41697580

Next we need to generate sums and append

# sums of cols and append as row
col_sums <- colSums(prob_y_z)
prob_y_z <- rbind(prob_y_z, col_sums)

# sums of rows and append as col
row_sums <- rowSums(prob_y_z)
prob_y_z <- cbind(prob_y_z, row_sums)

View results.

prob_y_z

##                  25%         50%         75%       100%   row_sums
## 25%      0.002806858 0.003894488 0.005279363 0.01328900 0.02526971
## 50%      0.006774183 0.009399112 0.012741427 0.03207220 0.06098692
## 75%      0.013422715 0.018623887 0.025246523 0.06354951 0.12084264
## 100%     0.088072232 0.122199366 0.165653338 0.41697580 0.79290074
## col_sums 0.111075988 0.154116853 0.208920651 0.52588651 1.00000000

Test for independence

Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

Both tests are statistical tests used to analyze categorical data and determine if there is a significant association between two categorical variables. Typically Fisher Exact Test is used for a smaller sample size, whereas the Chi Square test is used when the sample size is larger. So the null hypothesis here would be that there is NO association between the Y and Z.

So if we are testing categorical variables, let’s generate counts for these tests.

counts <- prob_y_z*10000

Run Fisher-Test

fisher.test(counts, simulate.p.value = T)

## 
##  Fisher's Exact Test for Count Data with simulated p-value (based on
##  2000 replicates)
## 
## data:  counts
## p-value = 1
## alternative hypothesis: two.sided

Run Chi-square Test

chisq.test(counts, simulate.p.value = T)

## 
##  Pearson's Chi-squared test with simulated p-value (based on 2000
##  replicates)
## 
## data:  counts
## X-squared = 8.2278e-28, df = NA, p-value = 1

Due to low p-value in Chi Square Test we can fail to reject the null hypothesis that there is association between Y and Z.

Final Exam

Justin Williams

2023-04-22

Problem 1

Gamma

PDF

Expected value and variance

Calculus calculation of expected value and variance

Expected Value using calculus

Variance using calculus

Y~Sum of Exponentials

Sum of Exponential Mean of Y

Z~ Exponential

Exponential Mean

Probability.

Probability Table

Test for independence