# Problem 1.
Probability Density 1: X~Gamma. Using R, generate a random variable X that consists of 10,000 random values from a Gamma pdf. A Gamma pdf is completely described by n (a size/shape parameter) and lambda (λ, a rate parameter). Choose any n greater than 3 and a rate (λ) between 2 and 10 (you choose).
#We will use the following function in R: rgamma(n, shape, rate), where the first argument n is the number of draws
#set the seed
set.seed(0)
#I will choose n = 5 as my shape (size) parameter and λ = 7 as my rate parameter. Thus, utilizing the rgamma function we generate 10,000 values for the random variable X
X <- rgamma(10000, shape = 5, rate = 7)
#answer: display the first few of the 10,000 generated values
head(X)
Probability Density 2: Y~Sum of Exponentials. Then generate 10,000 observations from the sum of n exponential pdfs with rate parameter (λ). The n and λ must be the same as in the previous case (e.g., mysum = rexp(10000, λ) + rexp(10000, λ) + …).
#We will use the following function in R: rexp(n, rate = λ)
# n = 5 summands and λ = 7
#We take the element-wise sum of five independent rexp() draws of 10,000 observations each.
#set the seed
set.seed(0)
#answer
Y <- rexp(10000,7) + rexp(10000,7) + rexp(10000,7) + rexp(10000,7) + rexp(10000,7)
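As a side note, an equivalent and slightly more compact formulation (my alternative, not from the original) draws all of the summands in one call:

# Draw all 5 x 10,000 exponentials at once, then sum across each row
# to obtain the 10,000 Erlang observations; same distribution as Y above
Y_alt <- rowSums(matrix(rexp(5 * 10000, rate = 7), ncol = 5))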
Probability Density 3: Z~Exponential. Then generate 10,000 observations from a single exponential pdf with rate parameter (λ).
#We will use the following function in R: rexp(n, rate = λ)
# n = 10000 observations and rate λ = 7
#We only need to execute rexp() once.
#set the seed
set.seed(0)
#answer
Z <- rexp(10000,7)
1a. Calculate the empirical expected values (means) and variances of all three pdfs.
The values computed below are the theoretical moments implied by n = 5 and λ = 7; an empirical check on the generated samples is sketched at the end of this part.
#1.Expected Value of Gamma pdf
#E(X) = n/λ; n=5, λ=7
Expected_Value_Gamma <- 5/7
#answer
print(Expected_Value_Gamma)
## [1] 0.7142857
#1.Variance Value of Gamma pdf
#V(X) = n/λ^2; n=5, λ=7
Variance_Value_Gamma <- 5/49
#answer
print(Variance_Value_Gamma)
## [1] 0.1020408
#2.Expected Value of Sum of Exponentials pdf (Erlang distribution)
#E(Y) = k/λ; k=5, λ=7
Expected_Value_Erlang <- 5/7
#answer
print(Expected_Value_Erlang)
## [1] 0.7142857
#2.Variance of Sum of Exponentials pdf (Erlang distribution)
#V(Y) = k/λ^2; k=5, λ=7
Variance_Erlang <- 5/49
#answer
print(Variance_Erlang)
## [1] 0.1020408
#3.Expected Value of Exponential distribution
#E(Z) = 1/λ; λ=7
Expected_Value_Exponential<- 1/7
#answer
print(Expected_Value_Exponential)
## [1] 0.1428571
#3.Variance of Exponential distribution
#V(Z) = 1/λ^2; λ=7
Variance_Exponential<- 1/49
#answer
print(Variance_Exponential)
## [1] 0.02040816
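For the empirical values the prompt asks for, a minimal sketch using the samples generated above (output omitted, since the exact numbers depend on the draws):

#Empirical (sample) means and variances; each should sit close to the theoretical value above
c(mean(X), var(X)) # theory: 5/7 ≈ 0.714 and 5/49 ≈ 0.102
c(mean(Y), var(Y)) # theory: 5/7 ≈ 0.714 and 5/49 ≈ 0.102
c(mean(Z), var(Z)) # theory: 1/7 ≈ 0.143 and 1/49 ≈ 0.020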
1b. Using calculus, calculate the expected value and variance of the Gamma pdf (X). Using the moment generating function for exponentials, calculate the expected value of the single exponential (Z) and the sum of exponentials (Y).
Let λ = 7 and n = 5, as above.
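A sketch of the standard derivations (my working, under the parameterization above):

$$E(X)=\int_0^\infty x\,\frac{\lambda^n x^{n-1}e^{-\lambda x}}{\Gamma(n)}\,dx=\frac{\Gamma(n+1)}{\lambda\,\Gamma(n)}=\frac{n}{\lambda}=\frac{5}{7},\qquad E(X^2)=\frac{n(n+1)}{\lambda^2}\;\Rightarrow\; V(X)=E(X^2)-E(X)^2=\frac{n}{\lambda^2}=\frac{5}{49}.$$

For the exponential, the moment generating function is $M_Z(t)=\lambda/(\lambda-t)$ for $t<\lambda$, so $E(Z)=M_Z'(0)=1/\lambda=1/7$. Since Y is the sum of n independent exponentials, $M_Y(t)=(\lambda/(\lambda-t))^n$ and $E(Y)=M_Y'(0)=n/\lambda=5/7$, matching the values in 1a.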
1c-e. Probability. For pdf Z (the exponential), calculate probabilities a through c empirically. Then evaluate through calculus whether the memoryless property holds.
We let λ = 7, as chosen in the examples above.
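For the memoryless part, the standard calculus argument uses the exponential survival function $P(Z>t)=e^{-\lambda t}$:

$$P(Z>s+t\mid Z>s)=\frac{P(Z>s+t)}{P(Z>s)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=P(Z>t),$$

so the memoryless property holds for Z, for any choice of λ.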
Loosely investigate whether P(Y ∩ Z) = P(Y) P(Z) by building a table with quartiles and evaluating the marginal and joint probabilities.
#Random "Sum of Exponentials" number generation
Y <- rexp(10000,7) + rexp(10000,7) + rexp(10000,7) + rexp(10000,7) + rexp(10000,7)
#Random "Exponential" number generation
Z <- rexp(10000,7)
#binding the output into a data frame and displaying the first 5 rows
df <- data.frame(Y, Z)
df[1:5, 1:2]
## Y Z
## 1 1.2255592 0.45010634
## 2 0.9530598 0.10309628
## 3 0.3968399 0.33846478
## 4 0.6621716 0.05029390
## 5 0.3780897 0.01787496
# finding Quartile values for Y
quantile(Y)
## 0% 25% 50% 75% 100%
## 0.07358754 0.47470113 0.67158664 0.89838938 2.30837518
# finding Quartile values for Z
quantile(Z)
## 0% 25% 50% 75% 100%
## 2.079637e-05 4.099367e-02 9.863839e-02 1.970642e-01 1.361793e+00
I develop a joint probability table that cross-tabulates the quartile bins of the “Y” and “Z” variables. The counts in the table are converted into joint probabilities P(Y ∩ Z), and the marginal probabilities are given by the “Sum” row and “Sum” column. The total of the marginal probabilities in the “Sum” row equals the total in the “Sum” column, and both equal 1.
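A minimal sketch of how such a quartile table can be built in base R (my assumed reconstruction, not the original code):

#Bin each sample at its own quartile boundaries, cross-tabulate, and divide
#by the sample size to obtain the joint probabilities P(Y ∩ Z)
y_bins <- cut(Y, breaks = quantile(Y), include.lowest = TRUE)
z_bins <- cut(Z, breaks = quantile(Z), include.lowest = TRUE)
joint_probs <- table(z_bins, y_bins) / length(Y)
#addmargins() appends the "Sum" row and column, i.e., the marginal probabilities
addmargins(joint_probs)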
Looking at the joint probability table, we can see that, to a close approximation, P(Y ∩ Z) = P(Y) P(Z). For example, P(Table[4,4]) * P(Table[5,1]) = 0.8422 * 0.1066 ≈ 0.0898, and the joint entry P(Table[4,1]) does in fact equal 0.0898.
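The same comparison can be run over the whole table at once (a sketch, reusing joint_probs from the reconstruction above):

#Under independence, each joint probability should approximately equal the
#product of its row and column marginals; differences near 0 support independence
expected_probs <- outer(rowSums(joint_probs), colSums(joint_probs))
round(joint_probs - expected_probs, 3)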
Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
To see whether independence holds using Fisher’s Exact Test and the Chi Square Test, I create a matrix that copies the joint probability table values for the “Y” and “Z” variables.
H0 (null hypothesis): Y and Z are independent events. H1 (alternative hypothesis): Y and Z are dependent events.
# Create a matrix from the values of the "Y" and "Z" variable values
data <- matrix(c(0.019,0.027,0.036,0.100,0.047,0.065,0.087,0.243,0.095,0.131,0.176,0.490,0.863,1.191,1.594,4.444), ncol=4, byrow=TRUE)
colnames(data) <- c('1st Quartile Y','2nd Quartile Y','3rd Quartile Y','4th Quartile Y')
rownames(data) <- c('1st Quartile Z','2nd Quartile Z','3rd Quartile Z','4th Quartile Z')
#convert to table
table <- as.table(data)
#conduct the fisher test
fisher.test(table)
## Warning in fisher.test(table): 'x' has been rounded to integer: Mean relative
## difference: 0.2803913
##
## Fisher's Exact Test for Count Data
##
## data: table
## p-value = 1
## alternative hypothesis: two.sided
#conduct the Chi Square Test
chisq.test(table)
## Warning in chisq.test(table): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 1.253e-05, df = 9, p-value = 1
Fisher’s Exact Test analysis: since the p-value = 1, we fail to reject the null hypothesis. There is no evidence of association between the “Y” and “Z” variables; the outcomes for the two samples are consistent with independence. (Note the warning above: fisher.test() expects a table of integer counts, so the proportions were rounded before testing.)
Chi Square Test analysis: since the p-value = 1, we again fail to reject the null hypothesis, so there is no evidence of association and the “Y” and “Z” samples appear independent.
In conclusion, Fisher’s Exact Test and the Chi Square Test yield the same p-value of 1, both consistent with the “Y” and “Z” samples being independent. The key difference is that Fisher’s Exact Test computes an exact p-value and is designed for small counts, whereas the Chi Square Test relies on a large-sample approximation. I believe the Chi Square Test is more appropriate here because we have a large sample of 10,000 observations, where its approximation is reliable.