set.seed(123)
X~Gamma. Using R, generate a random variable X that has 10,000 random Gamma pdf values. A Gamma pdf is completely described by n (a size parameter) and lambda (l, a shape parameter). Choose any n greater than 3 and an expected value (l) between 2 and 10 (you choose).
# Set the size parameter (n) and shape parameter (lambda)
n <- 5
lambda <- 6
# Generate the random variable
X <- rgamma(10000, shape = n, rate = lambda)
# Display the first few values
head(X)
## [1] 0.5649308 1.2298262 0.2715935 0.7964066 1.4789274 0.9218103
Y~Sum of Exponentials. Then generate 10,000 observations from the sum of n exponential pdfs with rate/shape parameter (l). The n and l must be the same as in the previous case. (e.g., mysum=rexp(10000,l)+rexp(10000,l)+..)
# Start with one vector of 10,000 exponential draws, then add n - 1 more
Y_sum <- rexp(10000, rate = lambda)
for (i in 1:(n - 1)) {
  Y_sum <- Y_sum + rexp(10000, rate = lambda)
}
head(Y_sum)
## [1] 0.7525961 1.5479806 0.5014050 0.6219232 0.8973353 0.3369982
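As an aside (not part of the original solution), the same n-fold sum can be generated without a loop by drawing all n x 10,000 exponentials at once and summing across rows; Y_sum_vec is a hypothetical name used only for this sketch:
# Vectorized alternative: 10,000 rows of n exponential draws, summed row-wise
Y_sum_vec <- rowSums(matrix(rexp(10000 * n, rate = lambda), ncol = n))
head(Y_sum_vec)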
Z~Exponential. Then generate 10,000 observations from a single exponential pdf with rate/shape parameter (l).
# Generate 10,000 observations from a single exponential pdf
Z <- rexp(10000, rate = lambda)
# Display the first few values
head(Z)
## [1] 0.03877801 0.21359056 0.05457032 0.04820432 0.24274215 0.57111148
Calculate the empirical expected value (means) and variances of all three pdfs.
pdfs <- cbind(X, Y_sum, Z)
mean <- apply(pdfs, 2, mean)
variance <- apply(pdfs, 2, var)
cbind(mean, variance)
## mean variance
## X 0.8271590 0.13550853
## Y_sum 0.8306691 0.13619071
## Z 0.1664165 0.02896334
Using calculus, calculate the expected value and variance of the Gamma pdf (X). Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z) and the sum of exponentials (Y).
The probability density function of the Gamma distribution with size parameter n and shape parameter \(\lambda\) is:
\[ f(x) = \frac{\lambda^n}{\Gamma(n)} x^{n-1} e^{-\lambda x}, \quad x > 0 \]
The expected value of \(X\) is:
\[\begin{align*} E(X) &= \int_{0}^{\infty} x f(x)\, dx \\\\ &= \int_{0}^{\infty} x \frac{\lambda^n}{\Gamma(n)} x^{n-1} e^{-\lambda x}\, dx \\\\ &= \frac{\lambda^n}{\Gamma(n)} \int_{0}^{\infty} x^n e^{-\lambda x}\, dx \\\\ &= \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n+1)}{\lambda^{n+1}} \\\\ &= \frac{n}{\lambda} \end{align*}\]
where the last step uses \(\int_{0}^{\infty} x^n e^{-\lambda x}\, dx = \Gamma(n+1)/\lambda^{n+1}\) and \(\Gamma(n+1) = n\,\Gamma(n)\).
The variance of \(X\) is:
\[\begin{align*} Var(X) &= E(X^2) - (E(X))^2 \\\\ &= \int_{0}^{\infty} x^2 \frac{\lambda^n}{\Gamma(n)} x^{n-1} e^{-\lambda x}\, dx - \left(\frac{n}{\lambda}\right)^2 \\\\ &= \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n+2)}{\lambda^{n+2}} - \frac{n^2}{\lambda^2} \\\\ &= \frac{n(n+1)}{\lambda^2} - \frac{n^2}{\lambda^2} \\\\ &= \frac{n}{\lambda^2} \end{align*}\]
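As a quick sanity check (not required by the prompt), these closed-form moments can be compared with the empirical values above: with n = 5 and \(\lambda = 6\), \(E(X) = 5/6 \approx 0.8333\) and \(Var(X) = 5/36 \approx 0.1389\), close to the simulated 0.8272 and 0.1355.
# Theoretical vs. empirical moments of X ~ Gamma(shape = n, rate = lambda)
c(theoretical_mean = n / lambda, empirical_mean = mean(X))
c(theoretical_var = n / lambda^2, empirical_var = var(X))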
Now, to calculate the expected value of the single exponential (Z) and the sum of exponentials (Y), we can use the moment generating function for exponentials. The moment generating function of the exponential distribution with rate parameter \(\lambda\) is:
\[ M_Z (t) = \frac{\lambda}{\lambda-t} \] Using the moment generating function, we can calculate the expected value of \(Z\) as follows:
\[\begin{align*} E(Z) &= \frac{d}{dt} M_Z(t) \bigg\rvert_{t=0} \\\\ &= \frac{d}{dt} \frac{\lambda}{\lambda-t} \bigg\rvert_{t=0} \\\\ &= \frac{\lambda}{(\lambda-t)^2} \bigg\rvert_{t=0} \\\\ &= \frac{\lambda}{\lambda^2} \\\\ &= \frac{1}{\lambda} \end{align*}\]
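With \(\lambda = 6\), this gives \(E(Z) = 1/6 \approx 0.1667\), matching the empirical mean of Z reported earlier (0.1664). A one-line check:
c(theoretical = 1 / lambda, empirical = mean(Z))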
To calculate the expected value of the sum of exponentials (Y), we can use the fact that the moment generating function of a sum of independent random variables is the product of the individual moment generating functions. For independent exponentials with rates \(\lambda_1, \lambda_2, \ldots, \lambda_n\), this gives:
\[ M_Y(t) = \left(\frac{\lambda_1}{\lambda_1 - t}\right) \left(\frac{\lambda_2}{\lambda_2 - t}\right) \cdots \left(\frac{\lambda_n}{\lambda_n - t}\right) \]
When all the rates are equal (\(\lambda_i = \lambda\), as in our simulation), this product collapses to \(\left(\frac{\lambda}{\lambda - t}\right)^n\), which is the moment generating function of a Gamma random variable with size parameter \(n\) and rate \(\lambda\).
Using the moment generating function, we can calculate the expected value of \(Y\) as follows:
\[\begin{align*} E(Y) &= \frac{d}{dt} M_Y(t) \bigg\rvert_{t=0} \\\\ &= \sum_{i=1}^{n} \frac{\lambda_i}{(\lambda_i-t)^2} \prod_{j \neq i} \frac{\lambda_j}{\lambda_j-t} \Bigg\rvert_{t=0} \\\\ &= \sum_{i=1}^{n} \frac{\lambda_i}{\lambda_i^2} \\\\ &= \frac{1}{\lambda_1} + \frac{1}{\lambda_2} + \cdots + \frac{1}{\lambda_n} \end{align*}\]
by the product rule, since each undifferentiated factor equals 1 at \(t = 0\). With \(\lambda_i = \lambda\) for all \(i\), this reduces to \(E(Y) = n/\lambda\), the same as the Gamma expectation for \(X\).
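With \(n = 5\) and \(\lambda = 6\), \(E(Y) = 5/6 \approx 0.8333\), consistent with the empirical mean of Y_sum (0.8307). A quick check:
c(theoretical = n / lambda, empirical = mean(Y_sum))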
Probability. For the exponential pdf, calculate the probabilities P(Z > l | Z > l/2), P(Z > 2l | Z > l), and P(Z > 3l | Z > l), and evaluate whether the memoryless property holds.
l <- 5 # mean of the exponential (so the rate is 1/l)
n <- 10000 # number of observations
z <- rexp(n, 1/l) # generate exponential random variables with mean l
# Theoretical conditional probabilities from the exponential CDF:
# P(Z > a | Z > b) = P(Z > a) / P(Z > b) for a > b
a <- (1 - pexp(l, 1/l)) / (1 - pexp(l/2, 1/l))
b <- (1 - pexp(2*l, 1/l)) / (1 - pexp(l, 1/l))
c <- (1 - pexp(3*l, 1/l)) / (1 - pexp(l, 1/l))
# Print results
cat("a. P(Z > l | Z > l/2) =", a, "\n")
## a. P(Z > l | Z > l/2) = 0.6065307
cat("b. P(Z > 2l | Z > l) =", b, "\n")
## b. P(Z > 2l | Z > l) = 0.3678794
cat("c. P(Z > 3l | Z > l) =", c, "\n")
## c. P(Z > 3l | Z > l) = 0.1353353
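The same probabilities can be estimated empirically from the simulated draws in z (the estimates will wobble around the theoretical values because of sampling noise):
# Empirical conditional probabilities from the simulated sample
mean(z > l) / mean(z > l/2)   # estimate of P(Z > l | Z > l/2)
mean(z > 2*l) / mean(z > l)   # estimate of P(Z > 2l | Z > l)
mean(z > 3*l) / mean(z > l)   # estimate of P(Z > 3l | Z > l)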
To evaluate whether the memoryless property holds for the exponential distribution, we can use the following approach:
Let \(Z\) be an exponential random variable with rate parameter \(\lambda\). Then the memoryless property states that for any \(s, t > 0\),
\[ P(Z > s + t \mid Z > t) = P(Z > s) \]
To check whether this property holds, we can evaluate the left-hand side and right-hand side of the equation above. Using conditional probability, we have:
\[\begin{align*} P(Z > s + t | Z > t) &= \frac{P(Z > s + t \text{ and } Z > t)}{P(Z > t)} \\\\ &= \frac{P(Z > s + t)}{P(Z > t)} \\\\ &= \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} \\\\ &= e^{-\lambda s} \\\\ &= P(Z > s) \end{align*}\]
Therefore, we see that the memoryless property holds for the exponential distribution.
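This identity can also be spot-checked numerically with pexp; the values of s and t below are arbitrary illustrations:
# Compare P(Z > s + t | Z > t) with P(Z > s) for Z ~ Exp(rate = 1/l)
s <- 2; t <- 3
lhs <- (1 - pexp(s + t, 1/l)) / (1 - pexp(t, 1/l))
rhs <- 1 - pexp(s, 1/l)
c(lhs = lhs, rhs = rhs) # both equal exp(-s/l)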
Loosely investigate whether P(YZ) = P(Y) P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities
# Generate 10000 observations of Y and Z
set.seed(123)
Y <- rexp(10000, lambda)
Z <- rexp(10000, lambda)
# Compute the quartiles for Y and Z
quartiles_Y <- quantile(Y, probs = c(0.25, 0.5, 0.75))
quartiles_Z <- quantile(Z, probs = c(0.25, 0.5, 0.75))
# Build the table with quartiles and initialize the sum
table_data <- matrix(0, nrow = 5, ncol = 5)
colnames(table_data) <- c("1st Quartile Y", "2nd Quartile Y", "3rd Quartile Y", "4th Quartile Y", "Sum")
rownames(table_data) <- c("1st Quartile Z", "2nd Quartile Z", "3rd Quartile Z", "4th Quartile Z", "Sum")
# Fill the upper 3x3 block with joint exceedance probabilities
# P(Y >= q_i and Z >= q_j) at the three quartile cutoffs
for (i in 1:3) {
  for (j in 1:3) {
    prob_joint <- mean(Y >= quartiles_Y[i] & Z >= quartiles_Z[j])
    table_data[i, j] <- prob_joint
  }
}
# Column and row sums of the joint 3x3 block (used as the table margins)
marginal_Y <- colSums(table_data[1:3, 1:3])
marginal_Z <- rowSums(table_data[1:3, 1:3])
# Fill in the margin row/column and the grand total
table_data[5, 1:4] <- c(marginal_Y, sum(marginal_Y))
table_data[1:4, 5] <- c(marginal_Z, sum(marginal_Z))
table_data[5, 5] <- sum(table_data[5, 1:4])
# Print the table
print(table_data)
## 1st Quartile Y 2nd Quartile Y 3rd Quartile Y 4th Quartile Y
## 1st Quartile Z 0.5620 0.3729 0.1858 0.0000
## 2nd Quartile Z 0.3750 0.2466 0.1226 0.0000
## 3rd Quartile Z 0.1864 0.1225 0.0599 0.0000
## 4th Quartile Z 0.0000 0.0000 0.0000 0.0000
## Sum 1.1234 0.7420 0.3683 2.2337
## Sum
## 1st Quartile Z 1.1207
## 2nd Quartile Z 0.7442
## 3rd Quartile Z 0.3688
## 4th Quartile Z 2.2337
## Sum 4.4674
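Since cell \((i, j)\) holds \(P(Y \ge q_i \text{ and } Z \ge q_j)\), independence would make each cell approximately the product of the corresponding marginal exceedance probabilities \(P(Y \ge q_i)\,P(Z \ge q_j)\). A minimal sketch of that comparison (the name expected_indep is ours, not part of the assignment):
# Marginal exceedance probabilities at each quartile cutoff
p_Y <- sapply(quartiles_Y, function(q) mean(Y >= q))
p_Z <- sapply(quartiles_Z, function(q) mean(Z >= q))
# Products expected under independence vs. the observed joint block
expected_indep <- outer(p_Z, p_Y) # rows: Z cutoffs, columns: Y cutoffs
round(expected_indep, 4)
round(table_data[1:3, 1:3], 4)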
# Perform Fisher's Exact Test
fisher_result <- fisher.test(table_data[1:3, 1:3])
## Warning in fisher.test(table_data[1:3, 1:3]): 'x' has been rounded to integer:
## Mean relative difference: 0.9444867
# Perform Chi-Square Test
chi2_result <- chisq.test(table_data[1:3, 1:3])
## Warning in chisq.test(table_data[1:3, 1:3]): Chi-squared approximation may be
## incorrect
fisher_result
##
## Fisher's Exact Test for Count Data
##
## data: table_data[1:3, 1:3]
## p-value = 1
## alternative hypothesis: two.sided
chi2_result
##
## Pearson's Chi-squared test
##
## data: table_data[1:3, 1:3]
## X-squared = 3.2482e-05, df = 4, p-value = 1
Both tests expect a table of integer counts rather than proportions, which is why R warns that the input "has been rounded to integer." With genuine counts, Fisher's exact test is often recommended for small samples or expected cell counts below 5, while Pearson's chi-squared test is commonly used and provides a good approximation for larger samples with expected counts above 5. Here both tests return a p-value of 1, so the table gives no evidence against independence, consistent with P(YZ) = P(Y) P(Z).
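A cleaner way to run the chi-squared test (a sketch, not part of the original solution) is to build a true contingency table of counts by cutting Y and Z at their quartiles, which avoids the rounding warning:
# Cross-tabulate counts of observations in each quartile of Y and Z
quartile_Y <- cut(Y, breaks = quantile(Y, probs = seq(0, 1, 0.25)),
                  include.lowest = TRUE, labels = paste0("Q", 1:4))
quartile_Z <- cut(Z, breaks = quantile(Z, probs = seq(0, 1, 0.25)),
                  include.lowest = TRUE, labels = paste0("Q", 1:4))
counts <- table(quartile_Z, quartile_Y)
chisq.test(counts) # independence test on actual counts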