library(kableExtra)

Problem 2:

Background:

A Type I error occurs if the null hypothesis is rejected when in fact the null hypothesis is true. An empirical Type I error rate can be computed by a Monte Carlo experiment: the test procedure is replicated a large number of times under the conditions of the null hypothesis, and the empirical Type I error rate is the sample proportion of significant test statistics among the replicates. The probability of Type I error is the conditional probability that the null hypothesis is rejected given that \(H_0\) is true. Thus, if the test procedure is replicated a large number of times under the conditions of the null hypothesis \(H_0: \mu = \mu_0\), the observed Type I error rate should be approximately \(\alpha\). The simulation investigates empirically whether the rejection rate of the t-test, whose p-value is computed under a normality assumption, remains consistent with the nominal value \(\alpha\) = 0.05 when that assumption is violated.
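
In symbols, if \(p_1, \dots, p_m\) are the p-values from the \(m\) replicates, the quantities computed below as p.hat and se.hat are

\[\hat{p} = \frac{1}{m}\sum_{j=1}^{m} I\left(p_j < \alpha\right), \qquad \widehat{se}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{m}}.\]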

Question:

The t-test is robust to mild departures from normality. Use Monte Carlo simulation to investigate whether the empirical Type I error rate of the t-test is approximately equal to the nominal significance level \(\alpha\) = 0.05, when the sampled population is non-normal. Discuss the simulation results for the cases where the sampled population is:

  1. \(\chi^2(1)\)

n = 100:

set.seed(123456) #keeps the data the same every run
n <- 100 #sample size
alpha <- 0.05 #nominal significance level
mu0 <- 1 #mean of Chi-Squared(1), the value tested under H0

m <- 1000 #number of replicates
p <- numeric(m) #storage for p-values
for (j in 1:m) { #Monte Carlo loop: generate the jth random sample under the null hypothesis
  x <- rchisq(n, df = mu0) #sample of size n from Chi-Squared(1); the degrees of freedom equal mu0 = 1, so the population mean is mu0
  ttest <- t.test(x, alternative = "two.sided", mu = mu0) #two-sided one-sample t-test of H0: mu = mu0
  p[j] <- ttest$p.value #store the p-value of replicate j = 1,...,m = 1000
}

Simulation results for n = 100:

p.hat <- mean(p < alpha) #empirical Type I error rate: proportion of p-values below alpha
se.hat <- sqrt(p.hat * (1-p.hat) / m) #estimated standard error of the empirical Type I error rate
dt <- round(rbind(p.hat, se.hat),4) #rounding values to 4 decimal places and combining rows of p.hat and se.hat


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate") #labeling the rows
dimnames(dt) <- list(dtnames) #having the data include the names called

dt %>% #creating a neat table using the kableExtra package and giving it a caption
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 100") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 100
Empirical Type-I Error 0.0720
Standard Error of Estimate 0.0082

Estimates of the Type I error probability will vary, but should be close to the nominal rate \(\alpha\) = 0.05 because every sample was generated with the null hypothesis \(H_0: \mu = \mu_0\) true, even though the population is not normal. Here, the observed Type I error rate is 0.0720 with an estimated standard error of about 0.0082. The empirical Type I error rate therefore differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.022 when the sampled population is the non-normal \(\chi^2(1)\) distribution and n = 100. In other words, the estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0720.
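
A rough way to judge whether 0.0720 is meaningfully above 0.05 (a sketch using the normal approximation to the binomial, not part of the original output) is an approximate 95% interval for the true Type I error rate:

#Rough check (sketch, normal approximation to the binomial):
#approximate 95% interval for the true Type I error rate at n = 100
p.hat + c(-1, 1) * 1.96 * se.hat
#with the values above this is about 0.072 +/- 0.016, roughly (0.056, 0.088), which excludes 0.05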

n = 1,000:

set.seed(123456)
n <- 1000 #increasing sample size to get closer to alpha
alpha <- 0.05
mu0 <- 1

m <- 1000
p <- numeric(m)
for (j in 1:m) {
  x <- rchisq(n, df = mu0)
  ttest <- t.test(x, alternative = "two.sided", mu = mu0)
  p[j] <- ttest$p.value
}

Simulation results for n = 1,000:

p.hat1 <- mean(p < alpha)
se.hat1 <- sqrt(p.hat1 * (1-p.hat1) / m)

dt1 <- round(rbind(p.hat1, se.hat1),4)


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate")
dimnames(dt1) <- list(dtnames)

dt1 %>%
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 1,000") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 1,000
Empirical Type-I Error 0.0640
Standard Error of Estimate 0.0077

Here, the observed Type I error rate is 0.0640 with an estimated standard error of about 0.0077. The empirical Type I error rate differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.014 when the sampled population is the non-normal \(\chi^2(1)\) distribution and n = 1,000. The estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0640, which is 0.008 closer to \(\alpha\) than with n = 100.

n = 10,000:

set.seed(123456)
n <- 10000 #increasing sample size to get closer to alpha
alpha <- 0.05
mu0 <- 1

m <- 1000
p <- numeric(m)
for (j in 1:m) {
  x <- rchisq(n, df = mu0)
  ttest <- t.test(x, alternative = "two.sided", mu = mu0)
  p[j] <- ttest$p.value
}

Simulation results for n = 10,000:

p.hat2 <- mean(p < alpha)
se.hat2 <- sqrt(p.hat2 * (1-p.hat2) / m)
dt2 <- round(rbind(p.hat2, se.hat2),4)


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate")
dimnames(dt2) <- list(dtnames)

dt2 %>%
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 10,000") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from Chi-Squared(1) with n = 10,000
Empirical Type-I Error 0.0460
Standard Error of Estimate 0.0066

Here, the observed Type I error rate is 0.0460 with an estimated standard error of about 0.0066. The empirical Type I error rate differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.004 when the sampled population is the non-normal \(\chi^2(1)\) distribution and n = 10,000. The estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0460, which is 0.010 closer to \(\alpha\) than with n = 1,000.

  2. \(U(0,2)\)

n = 100:

set.seed(123456) #keeps the data the same every run
n <- 100 #sample size
alpha <- 0.05 #nominal significance level
mu0 <- 1 #mean of U(0,2), the value tested under H0

m <- 1000 #number of replicates
p <- numeric(m) #storage for p-values
for (j in 1:m) { #Monte Carlo loop: generate the jth random sample under the null hypothesis
  x <- runif(n, 0, 2) #sample of size n from the Uniform(0,2) distribution, whose mean is mu0 = 1
  ttest <- t.test(x, alternative = "two.sided", mu = mu0) #two-sided one-sample t-test of H0: mu = mu0
  p[j] <- ttest$p.value #store the p-value of replicate j = 1,...,m = 1000
}

Simulation results for n = 100:

p.hat3 <- mean(p < alpha) #empirical Type I error rate: proportion of p-values below alpha
se.hat3 <- sqrt(p.hat3 * (1-p.hat3) / m) #estimated standard error of the empirical Type I error rate

dt3 <- round(rbind(p.hat3, se.hat3),4) #rounding values to 4 decimal places and combining rows of p.hat and se.hat


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate") #labeling the rows
dimnames(dt3) <- list(dtnames) #having the data include the names called

dt3 %>% #creating a neat table using the kableExtra package and giving it a caption
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 100") %>% 
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 100
Empirical Type-I Error 0.0440
Standard Error of Estimate 0.0065

Here, the observed Type I error rate is 0.0440 with an estimated standard error of about 0.0065. The empirical Type I error rate differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.006 when the sampled population is the non-normal \(U(0,2)\) distribution and n = 100. The estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0440.

n = 1,000:

set.seed(123456)
n <- 1000 #increasing sample size to get closer to alpha
alpha <- 0.05
mu0 <- 1

m <- 1000
p <- numeric(m)
for (j in 1:m) {
  x <- runif(n, 0, 2)
  ttest <- t.test(x, alternative = "two.sided", mu = mu0)
  p[j] <- ttest$p.value
}

Simulation results for n = 1,000:

p.hat4 <- mean(p < alpha)
se.hat4 <- sqrt(p.hat4 * (1-p.hat4) / m)
dt4 <- round(rbind(p.hat4, se.hat4),4)


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate")
dimnames(dt4) <- list(dtnames)

dt4 %>%
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 1,000") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 1,000
Empirical Type-I Error 0.038
Standard Error of Estimate 0.006

Here, the observed Type I error rate is 0.0380 with an estimated standard error of about 0.0060. The empirical Type I error rate differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.012 when the sampled population is the non-normal \(U(0,2)\) distribution and n = 1,000. The estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0380, which is actually 0.006 further from \(\alpha\) than with n = 100.

n = 10,000:

set.seed(123456)
n <- 10000 #increasing sample size to get closer to alpha
alpha <- 0.05
mu0 <- 1

m <- 1000
p <- numeric(m)
for (j in 1:m) {
  x <- runif(n, 0, 2)
  ttest <- t.test(x, alternative = "two.sided", mu = mu0)
  p[j] <- ttest$p.value
}

Simulation results for n = 10,000:

p.hat5 <- mean(p < alpha)
se.hat5 <- sqrt(p.hat5 * (1-p.hat5) / m)

dt5 <- round(rbind(p.hat5, se.hat5),4)


dtnames <- c("Empirical Type-I Error", "Standard Error of Estimate")
dimnames(dt5) <- list(dtnames)

dt5 %>%
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 10,000") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Population from U(0,2) with n = 10,000
Empirical Type-I Error 0.051
Standard Error of Estimate 0.007

Here, the observed Type I error rate is 0.0510 with an estimated standard error of about 0.0070. The empirical Type I error rate differs from the nominal significance level \(\alpha\) = 0.05 by approximately 0.001 when the sampled population is the non-normal \(U(0,2)\) distribution and n = 10,000. The estimated probability of rejecting the true null hypothesis \(H_0: \mu = \mu_0\) is 0.0510, which is 0.011 closer to \(\alpha\) than with n = 1,000.

Final Discussion with table of values:

a <- c(p.hat, p.hat1, p.hat2, p.hat3, p.hat4, p.hat5, se.hat, se.hat1, se.hat2, se.hat3, se.hat4, se.hat5) #vector of all the empirical Type I error rates followed by their standard errors
tab <- matrix(a, ncol = 2, nrow = 6) #matrix filled column-wise: error rates in column 1, standard errors in column 2
colnames(tab) <- c("Empirical Type-I Error", "Standard Error of Estimate") #labeling the columns
rownames(tab) <- c("Chi-Squared(1): n = 100", "n = 1,000", "n = 10,000", "U(0,2): n = 100", "n = 1,000", "n = 10,000") #labeling the rows
tab <- as.table(round(tab, digits = 4)) #creating the data table of the values combined and rounding to 4 decimal places

tab %>% 
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Populations") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Populations
Empirical Type-I Error Standard Error of Estimate
Chi-Squared(1): n = 100 0.072 0.0082
n = 1,000 0.064 0.0077
n = 10,000 0.046 0.0066
U(0,2): n = 100 0.044 0.0065
n = 1,000 0.038 0.0060
n = 10,000 0.051 0.0070

As the table shows, the t-test is fairly robust to these departures from normality. For the heavily skewed \(\chi^2(1)\) population, the empirical Type I error rate starts above the nominal level (0.072 at n = 100) and moves toward \(\alpha\) = 0.05 as the sample size increases. For the symmetric \(U(0,2)\) population, the empirical rate is already close to \(\alpha\) = 0.05 at every sample size considered. Thus, when the test procedure was replicated a large number of times under the null hypothesis \(H_0: \mu = \mu_0\), the observed Type I error rate was approximately equal to the nominal level, with the approximation improving with sample size for the more skewed population.
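
To make "close to \(\alpha\)" concrete, one informal check (a sketch using the tab object built above; the two-standard-error cutoff is a rule of thumb, not part of the original analysis) is:

#Which empirical rates lie within two standard errors of the nominal 0.05?
abs(tab[, "Empirical Type-I Error"] - 0.05) <= 2 * tab[, "Standard Error of Estimate"]

Based on the reported values, only the \(\chi^2(1)\), n = 100 case (0.072 with standard error 0.0082) lies clearly more than two standard errors above 0.05.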

Problem 3:

Background:

The empirical confidence level is an estimate of the confidence level obtained by simulation. If the sampling and estimation are repeated a large number of times, approximately 90% of the intervals should contain \(\mu\).

Question:

Suppose that \(Y_1, Y_2, \dots, Y_n\) are a random sample from a lognormal distribution with unknown parameters. Construct a 90% confidence interval for the parameter \(\mu\). Use the Monte Carlo method to obtain an empirical estimate of the confidence level.
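
Since \(\log Y_i \sim N(\mu, \sigma^2)\) when \(Y_i\) is lognormal, the standard construction is a t-interval for \(\mu\) based on the log-transformed sample \(X_i = \log Y_i\):

\[\bar{X} \pm t_{1-\alpha/2,\,n-1}\,\frac{S_X}{\sqrt{n}},\]

where \(\bar{X}\) and \(S_X\) are the sample mean and sample standard deviation of the \(X_i\), and \(\alpha\) = 0.10 for a 90% interval.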

Solution:

set.seed(1234) #setting the seed
m1 <- 10000 #number of Monte Carlo replicates
n1 <- 10 #sample size for each replicate
alpha1 <- 0.10 #significance level (90% confidence)

#If Y follows a lognormal(mu, sigma) distribution then log(Y) follows a Normal(mu, sigma) distribution; with meanlog = 0 and sdlog = 1, log(Y) is standard normal and the true mu is 0

x <- rlnorm(n1, meanlog = 0, sdlog = 1) #one lognormal sample of size n1 = 10 with mu = 0, sigma = 1
LCI <- mean(log(x)) - qt((1 - (alpha1/2)), df = (n1 - 1)) * (sqrt((var(x))/n1)) #lower confidence limit (note: uses the sample variance of x, not of log(x))
UCI <- mean(log(x)) + qt((1 - (alpha1/2)), df = (n1 - 1)) * (sqrt((var(x))/n1)) #upper confidence limit

CI <- c(LCI, UCI) #90% confidence interval for mu from this single sample

LCL1 <- replicate(m1, expr = { #Monte Carlo replication: the expression in braces is evaluated m1 times
    x <- rlnorm(n1, meanlog = 0, sdlog = 1)
    mean(log(x)) - (qt(1-(alpha1/2), df = (n1-1)) * (sd(x)/sqrt(n1))) #lower confidence limit
    } )
UCL1 <- replicate(m1, expr = { #second, independent set of replicates for the upper limits
    x <- rlnorm(n1, meanlog = 0, sdlog = 1)
    mean(log(x)) + (qt(1-(alpha1/2), df = (n1-1)) * (sd(x)/sqrt(n1))) #upper confidence limit
    } )

CL <- mean((UCL1 > 0) & (LCL1 < 0)) #empirical confidence level: proportion of limit pairs with LCL1 < 0 < UCL1
count <- sum((UCL1 > 0) & (LCL1 < 0)) #number of the m1 limit pairs that bracket the true mu = 0
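
Note that, as written, LCL1 and UCL1 come from two separate replicate() calls, so each lower/upper pair is computed from different random samples, and the half-width uses sd(x) on the original scale rather than sd(log(x)). A minimal alternative sketch (not the code used for the results below) that builds each interval from a single sample of log-values would be:

cover <- replicate(m1, expr = { #one interval per replicate, both limits from the same sample
    x <- rlnorm(n1, meanlog = 0, sdlog = 1)
    half <- qt(1 - (alpha1/2), df = n1 - 1) * sd(log(x)) / sqrt(n1) #half-width from the sd of log(x)
    (mean(log(x)) - half < 0) & (mean(log(x)) + half > 0) #does the interval cover the true mu = 0?
    } )
mean(cover) #empirical confidence level; should be close to 0.90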

Discussion:

dt6 <- round(rbind(CI, CL, count), 4) #combining the values into one object and rounding to 4 decimal places
names <- c("90% Confidence Interval", "Empirical Estimate of Confidence Level", "Sum of Confidence Intervals that contain mu") #labeling the rows
dimnames(dt6) <- list(names)
 

dt6 %>% 
  kbl(caption = "Monte Carlo Simulation with Non-Normal Sampled Populations to Estimate a Confidence Level") %>%
  kable_classic(full_width = F, html_font = "Cambria")
Monte Carlo Simulation with Non-Normal Sampled Populations to Estimate a Confidence Level
90% Confidence Interval -0.8895 0.1232
Empirical Estimate of Confidence Level 0.9265 0.9265
Sum of Confidence Intervals that contain mu 9265.0000 9265.0000

Here, 9,265 of the 10,000 replicates satisfied (UCL1 > 0) & (LCL1 < 0), so the empirical confidence level is 92.65% in this experiment; in other words, about 92.65% of the simulated intervals contained the true value \(\mu = 0\). The result will vary from run to run but should be close to the nominal level of 90%. The estimate being somewhat above 90% is consistent with the interval half-widths here using sd(x) on the original scale, which is typically larger than sd(log(x)), making the intervals conservative.