Problem: 1 Probability Density 1: X~Gamma. Using R, generate a random variable X that has 10,000 random Gamma pdf values. A Gamma pdf is completely describe by n (a size parameter) and lambda (l , a shape parameter). Choose any n greater 3 and an expected value (l) between 2 and 10 (you choose).
Probability Density 2: Y~Sum of Exponentials. Then generate 10,000 observations from the sum of n exponential pdfs with rate/shape parameter (l). The n and l must be the same as in the previous case. (e.g., mysum=rexp(10000,l)+rexp(10000,l)+..)
Probability Density 3: Z~ Exponential. Then generate 10,000 observations from a single exponential pdf with rate/shape parameter (l).
NOTE: The Gamma distribution is quite common in data science. For example, it is used to model failures for multiple processes when each of those processes has the same failure rate. The exponential is used for constant failure rates, service times, etc.
#Set values for n and lambda
n <- 5
lambda <- 7
#Generate 10,000 random variables from Gamma distribution
X <- rgamma(10000, n, lambda)
#Generate 10,000 random variables from sum of exponential distributions
Y <- sum(rexp(10000, lambda), n)
#Generate 10,000 random variables from exponential distribution
Z <- rexp(10000, lambda)
1a. Calculate the empirical expected value (means) and variances of all three pdfs.
#Calculate empirical means and variances
mean_X <- mean(X)
mean_X
## [1] 0.7127722
var_X <- var(X)
var_X
## [1] 0.09995861
mean_Y <- mean(Y)
mean_Y
## [1] 1438.283
var_Y <- var(Y)
var_Y
## [1] NA
mean_Z <- mean(Z)
mean_Z
## [1] 0.1387415
var_Z <- var(Z)
var_Z
## [1] 0.01958095
1b. Using calculus, calculate the expected value and variance of the Gamma pdf (X). Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z) and the sum of exponentials (Y)
#Calculate expected values and variances analytically
exp_X <- n / lambda
exp_X
## [1] 0.7142857
var_X <- n / lambda^2
var_X
## [1] 0.1020408
exp_Y <- n / lambda
exp_Y
## [1] 0.7142857
var_Y <- n / lambda^2
var_Y
## [1] 0.1020408
exp_Z <- 1 / lambda
exp_Z
## [1] 0.1428571
var_Z <- 1 / lambda^2
var_Z
## [1] 0.02040816
1c-e. Probability. For pdf Z (the exponential), calculate empirically probabilities a through c. Then evaluate through calculus whether the memoryless property holds.
#Calculate probabilities and check memorylessness
prob_a <- mean(Z > lambda) / mean(Z > lambda/2)
prob_b <- mean(Z > 2*lambda) / mean(Z > lambda)
prob_c <- mean(Z > 3*lambda) / mean(Z > lambda)
mem_property <- ifelse(prob_a == prob_b & prob_b == prob_c, "Memoryless property holds", "Memoryless property does not hold")
Loosely investigate whether P(YZ) = P(Y) P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities.
#Create table for quartiles
Y_quartiles <- quantile(Y, c(0.25, 0.5, 0.75))
Z_quartiles <- quantile(Z, c(0.25, 0.5, 0.75))
table <- matrix(c(mean(Y[Z < Z_quartiles[1]]), mean(Y[Z >= Z_quartiles[1] & Z < Z_quartiles[2]]), mean(Y[Z >= Z_quartiles[2] & Z < Z_quartiles[3]]), mean(Y[Z >= Z_quartiles[3]]),
mean(Z[Y < Y_quartiles[1]]), NA, NA, NA,
mean(Z[Y >= Y_quartiles[1] & Y < Y_quartiles[2]]), NA, NA, NA,
mean(Z[Y >= Y_quartiles[2] & Y < Y_quartiles[3]]), NA, NA, NA,
mean(Z[Y >= Y_quartiles[3]]), NA, NA, NA), nrow = 4, ncol = 4, dimnames = list(c("1st Quartile Y", "2nd Quartile Y", "3rd Quartile Y", "4th Quartile Y"), c("1st Quartile Z", "2nd Quartile Z", "3rd Quartile Z", "4th Quartile Z")))
## Warning in matrix(c(mean(Y[Z < Z_quartiles[1]]), mean(Y[Z >= Z_quartiles[1] & :
## data length differs from size of matrix: [20 != 4 x 4]
colSums(table, na.rm = TRUE)
## 1st Quartile Z 2nd Quartile Z 3rd Quartile Z 4th Quartile Z
## 0 0 0 0