Problem 1

Probability Density 1: X~Gamma. Using R, generate a random variable X that has 10,000 random Gamma pdf values. A Gamma pdf is completely describe by n (a size parameter) and lambda ( \(\lambda\) , a shape parameter). Choose any n greater 3 and an expected value (\(\lambda\)) between 2 and 10 (you choose).

Christian’s Response:

# set up parameters
n<-10
lambda<-7
random<-10000

x <-rgamma(random,n,lambda ) 

Probability Density 2: Y~Sum of Exponentials. Then generate 10,000 observations from the sum of n exponential pdfs with rate/shape parameter (\(\lambda\)). The n and \(\lambda\) must be the same as in the previous case. (e.g., mysum=rexp(10000,\(\lambda\))+rexp(\(\lambda\))+..)

Christian’s Response:

y <- rowSums(matrix(rexp(random * n, rate = lambda), nrow = 10000))

Probability Density 3: Z~ Exponential. Then generate 10,000 observations from a single exponential pdf with rate/shape parameter (\(\lambda\)).

Christian’s Response:

z <- rexp(10000, rate = lambda)

NOTE: The Gamma distribution is quite common in data science. For example, it is used to model failures for multiple processes when each of those processes has the same failure rate. The exponential is used for constant failure rates, service times, etc

1a. Calculate the empirical expected value (means) and variances of all three pdfs.

Christian’s Response:

mean_x <- mean(x)
var_x <- var(x)

mean_y <- mean(y)
var_y <- var(y)

mean_z <- mean(z)
var_z <- var(z)

# Print the results
cat("The mean of x is:", mean_x, "\n")
## The mean of x is: 1.429566
cat("The variance of x is:", var_x, "\n")
## The variance of x is: 0.2042737
cat("The mean of y is:", mean_y, "\n")
## The mean of y is: 1.430987
cat("The variance of y is:", var_y, "\n")
## The variance of y is: 0.2053878
cat("The mean of z is:", mean_z, "\n")
## The mean of z is: 0.1421183
cat("The variance of z is:", var_z, "\n")
## The variance of z is: 0.020126

1b. Using calculus, calculate the expected value and variance of the Gamma pdf (x). Using the moment generating function for exponentials, calculate the expected value of the single exponential (z) and the sum of exponentials (y)

Christian’s Response:

  1. Expected Value: We cam get the Gamma probability density function (PDF) by the following:

\[ f(x)=\frac{1}{\Gamma \lambda n^{\lambda}}x^{\lambda-1}e^{-x/n}\] To calculate the expect value we can use this formula:

\[=\frac{1}{\Gamma \lambda n^{\lambda}}\int_{0}^{\infty}x^{\lambda}f(x) dx\]

When substituting the given PDF into the formula, we get:

\[=\frac{1}{\Gamma \lambda n^{\lambda}}\int_{0}^{\infty}x^{\lambda}e^{-x/n} dx\]

Next, we make a substitution: \(t = \frac{x}{n}\), which gives us \(x = nt\) and \(dx = ndt\). The integral becomes:

\[=\frac{1}{\Gamma (\lambda) n^{\lambda}}\int_{0}^{\infty}(nt)^{\lambda}e^{-t}n dt\]

When simplified, we get:

\[\Gamma ({\lambda})=\frac{1}{\Gamma(\lambda)}*n^{\lambda-1}*n\int_{0}^{\infty} t^{\lambda}e^{-t}dt\]

Using the property \(\Gamma(\lambda + 1) = \lambda \cdot \Gamma(\lambda)\), we can rewrite the expression as:

\[\therefore E(X)=\frac{n \Gamma(\lambda +1)}{\Gamma(\lambda)}\]

  1. Variance:

The variance is given by:

\[Variance:E(x-Ex)^{2}=Ex^{2}-(Ex)^{2}\]

To find \(E(X^2)\), we calculate the integral:

\[E(X^2) = \frac{1}{\Gamma(\lambda) n^\lambda} \int_{0}^{\infty} x^2 t^{\lambda-1} e^{-t}n \, dt = \frac{n^2}{\Gamma(\lambda)} \int_{0}^{\infty} t^{\lambda+1} e^{-t} \, dt\] For the moment generating function (MGF) of the exponential distribution, we have:

\[M(t) = E[e^{tX}] = \int_{0}^{\infty} e^{tx} \cdot \lambda e^{-\lambda x} \, dx\]

Simplifying, we get:

\[M(t) = \lambda \int_{0}^{\infty} e^{tx} \cdot e^{(t-\lambda)x} \, dx\]

Evaluating the integral, we find:

\[M(t) = \lambda \left(\frac{1}{t-\lambda}e^{(t-\lambda)x}\right) \Bigg|_{0}^{\infty}\] Next, we find the first derivative of the MGF:

\[E(X) = \frac{dM(t)}{dt} = \frac{\lambda}{(t-\lambda)^2}\] Taking the second derivative, we obtain:

\[E(X^2) = \frac{d^2M(t)}{dt^2} = \frac{2\lambda}{(t-\lambda)^3}\]

Therefore, the variance is given by:

\[\text{Var}(X) = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}\]

1c-e. Probability. For pdf Z (the exponential), calculate empirically probabilities a through c. Then evaluate through calculus whether the memoryless property holds.

  1. P(Z>\(\lambda\)|Z>\(\lambda\)/2) b. P(Z>2\(\lambda\)|Z>\(\lambda\)) c. P(Z>3\(\lambda\)|Z>\(\lambda\))

Christian’s Response:

To calculate the probability P(Z > λ | Z > λ/2) empirically, you need to determine the proportion of values from the exponential distribution Z that are greater than λ, given that they are already greater than λ/2. We can do this by dividing the sum of values from Z that are greater than λ by the sum of values that are greater than λ/2.

We can use the cumulative distribution function (CDF) of the exponential distribution to calculate this probability. The memoryless property of the exponential distribution states that the probability of an event occurring in the next λ units of time is independent of how much time has already passed. Therefore, P(Z > λ | Z > λ/2) should be equal to P(Z > λ/2).

# Empirical calculation
prob_a_empirical <- sum(z > lambda) / sum(z > lambda/2)
# Calculus calculation
prob_a_calculus <- 1 - pexp(lambda/2, rate = lambda)

Empirically, we can calculate the proportion of values from Z that are greater than 2λ, given that they are already greater than λ. We can use the CDF of the exponential distribution to calculate this probability.

# Empirical calculation
prob_b_empirical <- sum(z > 2*lambda) / sum(z > lambda)
# Calculus calculation
prob_b_calculus <- 1 - pexp(lambda, rate = lambda)

To calculate the proportion of values from Z that are greater than 3λ, given that they are already greater than λ, we can use the CDF of the exponential distribution to calculate this probability.

In the solution, the empirical calculations are performed by comparing the values from Z to the given thresholds (λ, λ/2, 2λ, 3λ) and summing the counts of values that satisfy the conditions. The proportions are then obtained by dividing these counts.

The calculus calculations are performed using the pexp function, which calculates the CDF of the exponential distribution. The rate parameter of pexp is set to λ, as the exponential distribution is characterized by a rate parameter rather than a scale parameter. The probabilities are obtained by subtracting the CDF values at the corresponding thresholds from 1.

# Empirical calculation
prob_c_empirical <- sum(z > 3*lambda) / sum(z > lambda)
# Calculus calculation
prob_c_calculus <- 1 - pexp(2*lambda, rate = lambda)

Loosely investigate whether P(YZ) = P(Y) P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities.

Christian’s Response:

This table displays the joint probabilities for different quartiles of Y and Z, as well as the marginal probabilities for Y and Z. The sum of each row and column represents the marginal probabilities, and the sum of all probabilities is shown

set.seed(123)
Y <- rexp(10000, lambda) 
Z <- rexp(10000, lambda)
# Compute the quartiles for Y and Z
quartiles_Y <- quantile(Y, probs = c(0.25, 0.5, 0.75))
quartiles_Z <- quantile(Z, probs = c(0.25, 0.5, 0.75))

# Build the table with quartiles and initialize the sum
table_data <- matrix(0, nrow = 5, ncol = 5)
colnames(table_data) <- c("1st Quartile Y", "2nd Quartile Y", "3rd Quartile Y", "4th Quartile Y", "Sum")
rownames(table_data) <- c("1st Quartile Z", "2nd Quartile Z", "3rd Quartile Z", "4th Quartile Z", "Sum")

# Compute the joint probabilities and fill in the table
for (i in 1:3) {
  for (j in 1:3) {
    prob_joint <- mean(Y >= quartiles_Y[i] & Z >= quartiles_Z[j])
    table_data[i, j] <- prob_joint
  }
}

# Compute the marginal probabilities for Y and Z
marginal_Y <- colSums(table_data[1:3, 1:3])
marginal_Z <- rowSums(table_data[1:3, 1:3])

# Compute the sum
table_data[5, 1:4] <- c(marginal_Y, sum(marginal_Y))
table_data[1:4, 5] <- c(marginal_Z, sum(marginal_Z))
table_data[5, 5] <- sum(table_data[5, 1:4])

# Print the table
print(table_data)
##                1st Quartile Y 2nd Quartile Y 3rd Quartile Y 4th Quartile Y
## 1st Quartile Z         0.5620         0.3729         0.1858         0.0000
## 2nd Quartile Z         0.3750         0.2466         0.1226         0.0000
## 3rd Quartile Z         0.1864         0.1225         0.0599         0.0000
## 4th Quartile Z         0.0000         0.0000         0.0000         0.0000
## Sum                    1.1234         0.7420         0.3683         2.2337
##                   Sum
## 1st Quartile Z 1.1207
## 2nd Quartile Z 0.7442
## 3rd Quartile Z 0.3688
## 4th Quartile Z 2.2337
## Sum            4.4674

Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

Christian’s Response:

# Perform Fisher's Exact Test
fisher_result <- fisher.test(table_data[1:3, 1:3])
## Warning in fisher.test(table_data[1:3, 1:3]): 'x' has been rounded to integer:
## Mean relative difference: 0.9444867
# Perform Chi-Square Test
chi2_result <- chisq.test(table_data[1:3, 1:3])
## Warning in chisq.test(table_data[1:3, 1:3]): Chi-squared approximation may be
## incorrect
fisher_result
## 
##  Fisher's Exact Test for Count Data
## 
## data:  table_data[1:3, 1:3]
## p-value = 1
## alternative hypothesis: two.sided
chi2_result
## 
##  Pearson's Chi-squared test
## 
## data:  table_data[1:3, 1:3]
## X-squared = 3.2482e-05, df = 4, p-value = 1

Based on the provided information and the results of the Chi-Square Test, the data suggests that independence holds between the variables Y and Z.

The Chi-Square Test is used to determine whether two categorical variables are independent or if there is an association between them. In this case, the test was performed on the joint probabilities in the table, and the obtained p-value of 1 indicates that there is no significant evidence to reject the null hypothesis of independence.