Assignment Objectives

  • Reinforce the understanding of Bootstrap sampling .

  • Understand the bootstrap estimation: confidence interval and sampling distribution.

Policies of Using AI Tools

Policy on AI Tool Use: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

Code Inclusion Requirement: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.

Log-normal Distribution Revisited

If \(Y = \ln(X) \sim N(\mu, \sigma^2)\), then \(X\) follows a lognormal distribution \(X \sim \text{Lognormal}(\mu, \sigma^2)\). The probability density is given by

\[ f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0 \]

After some algebra, we can express the mean and variance of the above lognormal distribution in the following

\[\begin{align} \mathbb{E}[X] &= \exp\left(\mu + \frac{\sigma^2}{2}\right) \\ \text{Var}(X) &= [\exp(\sigma^2) - 1] \exp(2\mu + \sigma^2) \end{align}\]

Using the relationship between normal and log-normal distribution and a sample \(\{x_1, x_2, \dots, x_n\}\), the MLE estimators of \(\mu\) and \(\sigma^2\) are given by

\[\begin{align} \hat{\mu} &= \frac{1}{n}\sum_{i=1}^n \ln(x_i) \\ \hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^n (\ln(x_i) - \hat{\mu})^2 \end{align}\]

Using the plug-in principle of MLE, we have the MLE of \(\mathbb{E}[X]\) and \(\text{Var}(X)\) in the following

\[ \boxed{\widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)} \]

This assignment focuses on constructing various bootstrap confidence intervals of the lognormal population mean \(\mathbb{E}[X]\)


Question: Trace Metal Concentrations in Soil

Soil lead (Pb) concentrations (mg/kg) from 55 urban garden sites. Trace metals in environmental media typically follow lognormal distributions due to:

  • Multiplicative processes controlling accumulation

  • Positive constraints (concentrations cannot be negative)

  • Right-skewed nature of contamination patterns

0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78

\[\boxed{\text{Instructions:}}\]

Assume the data follow a log-normal distribution. For each question in parts (b)–(d), begin by clearly explaining the reasoning behind your analytical approach. Then, develop your own R functions to implement three types of confidence intervals: the asymptotic interval, the percentile bootstrap interval, and the bias-corrected and accelerated (BCa) bootstrap interval. Use these functions to construct the required confidence intervals, and verify your results using the appropriate functions from the boot package.

You are encouraged to design a wrapper function that integrates all confidence interval methods, allowing users to select the desired method through an input argument.

\[\boxed{\text{Individual Questions:}}\]

a). Perform 5000 bootstrap samples to estimate the bootstrap sampling distribution of \(\boxed{\widehat{\mathbb{E}[X]}}\). Display the distribution using either a histogram or a kernel density plot. Comment on the shape, variability, and any notable patterns observed in the bootstrap sampling distribution.

To estimate the sampling distribution of the lognormal population mean, I use bootstrap resampling. For each bootstrap sample, I resample the observed data with replacement and compute the plug-in estimator of \(E[X]\).

## Soil lead concentration data (mg/kg)
soil <- c(
  0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91,
  2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56,
  1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89,
  11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78,
  22.34, 3.78, 2.78
)

## Plug-in estimator for the lognormal population mean E[X]
lognormal_mean_hat <- function(x) {
  logx <- log(x)
  mu_hat <- mean(logx)
  sigma2_hat <- mean((logx - mu_hat)^2)  # MLE of variance on log-scale
  exp(mu_hat + sigma2_hat / 2)
}

## Compute estimate from original sample
theta_hat <- lognormal_mean_hat(soil)
theta_hat
[1] 4.218099
## Bootstrap sampling distribution of E[X]
set.seed(1)

B <- 5000
n <- length(soil)
boot_theta <- rep(0, B)

for (b in 1:B) {
  x_star <- sample(soil, size = n, replace = TRUE)
  boot_theta[b] <- lognormal_mean_hat(x_star)
}
## Histogram of bootstrap sampling distribution
hist(boot_theta,
     breaks = 30,
     main = "Bootstrap Sampling Distribution of E[X]",
     xlab = expression(hat(E[X])))

The bootstrap sampling distribution of the estimated value of E[X] appears roughly unimodal with a slight right skew. It is centered around the original estimate and shows a moderate amount of variability. The right skew is expected given the lognormal nature of the data, and a few larger values indicate the effect of higher observations in the sample.

b). Construct a 95% bootstrap percentile confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

To construct the 95% bootstrap percentile confidence interval, I use the empirical distribution of the bootstrap estimates obtained in part (a). Since the percentile method uses the quantiles of the bootstrap sampling distribution directly, I take the 2.5th and 97.5th percentiles of the bootstrap estimates of the lognormal population mean. This gives a 95% percentile bootstrap confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

## Percentile bootstrap confidence interval for E[X]
percentile_ci <- function(boot_stat, alpha = 0.05) {
  quantile(boot_stat, probs = c(alpha / 2, 1 - alpha / 2))
}

ci_percentile <- percentile_ci(boot_theta)
ci_percentile
    2.5%    97.5% 
3.125359 5.589956 
## Verify percentile interval using boot package
library(boot)

boot_mean_stat <- function(data, indices) {
  sample_data <- data[indices]
  lognormal_mean_hat(sample_data)
}

boot_result <- boot(data = soil, statistic = boot_mean_stat, R = 5000)
boot.ci(boot_result, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_result, type = "perc")

Intervals : 
Level     Percentile     
95%   ( 3.136,  5.549 )  
Calculations and Intervals on Original Scale

Using my percentile bootstrap function, the 95% bootstrap percentile confidence interval for \(\mu_{LN} = \mathbb{E}[X]\) is (3.1254, 5.5900). I verified this result using the boot package, which gave a very similar interval of approximately (3.136,5.549). Since the intervals are close, this indicates that the percentile method was applied correctly.

c). Construct a 95% bootstrap BCa confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

To construct the 95% bootstrap BCa confidence interval, I use the bootstrap estimates from part (a) and adjust them for bias and skewness. The BCa method improves on the percentile approach by accounting for asymmetry in the bootstrap sampling distribution. Since the estimated lognormal mean is slightly right-skewed, this method is appropriate for constructing a more accurate confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

## BCa bootstrap confidence interval for E[X]
bca_ci <- function(data, boot_stat, stat_function, alpha = 0.05) {
  theta_hat <- stat_function(data)
  
  ## Measures how the bootstrap distribution is shifted from the original estimate
  z0 <- qnorm(mean(boot_stat < theta_hat))
  
  n <- length(data)
  jack_stat <- rep(0, n)
  
  ## Uses jackknife estimates to account for skewness in the distribution
  for (i in 1:n) {
    jack_sample <- data[-i]
    jack_stat[i] <- stat_function(jack_sample)
  }
  
  jack_mean <- mean(jack_stat)
  num <- sum((jack_mean - jack_stat)^3)
  den <- sum((jack_mean - jack_stat)^2)
  a_hat <- num / (6 * den^(3/2))
  
  z_alpha1 <- qnorm(alpha / 2)
  z_alpha2 <- qnorm(1 - alpha / 2)
  
  ## Adjusts the percentile levels using bias and skewness
  p1 <- pnorm(z0 + (z0 + z_alpha1) / (1 - a_hat * (z0 + z_alpha1)))
  p2 <- pnorm(z0 + (z0 + z_alpha2) / (1 - a_hat * (z0 + z_alpha2)))
  
  quantile(boot_stat, probs = c(p1, p2))
}

ci_bca <- bca_ci(soil, boot_theta, lognormal_mean_hat)
ci_bca
4.163632% 98.69315% 
 3.219438  5.811369 
## Verify BCa interval using boot package
boot.ci(boot_result, type = "bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_result, type = "bca")

Intervals : 
Level       BCa          
95%   ( 3.233,  5.774 )  
Calculations and Intervals on Original Scale

Using my BCa bootstrap function, the 95% BCa confidence interval for \(\mu_{LN} = \mathbb{E}[X]\) is (3.2194, 5.8114). I verified this result using the boot package, which gave a very similar BCa interval of approximately (3.233,5.774). Since the intervals are close, this indicates that the BCa method was applied correctly.

d). Use the Central Limit Theorem to construct a 95% asymptotic confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

To construct the 95% asymptotic confidence interval, I use the Central Limit Theorem together with the delta method. Since the estimator of the lognormal population mean is a smooth function of \(\hat{\mu}\) and \(\hat{\sigma}^2\), its sampling distribution is approximately normal for a sufficiently large sample size. I therefore estimate its standard error using the plug-in approach and then use the normal critical value \(1.96\) to form the confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

## Asymptotic confidence interval for E[X]
asymptotic_ci <- function(data, alpha = 0.05) {
  logx <- log(data)
  n <- length(data)
  
  mu_hat <- mean(logx)
  sigma2_hat <- mean((logx - mu_hat)^2)
  theta_hat <- exp(mu_hat + sigma2_hat / 2)
  
  ## The standard error is estimated using the delta method with plug-in estimates
  se_hat <- theta_hat * sqrt((sigma2_hat + sigma2_hat^2 / 2) / n)
  
  ## The normal critical value is used because the estimator is asymptotically normal
  z_crit <- qnorm(1 - alpha / 2)
  
  c(theta_hat - z_crit * se_hat, theta_hat + z_crit * se_hat)
}

ci_asym <- asymptotic_ci(soil)
ci_asym
[1] 3.134984 5.301214
## Verify asymptotic interval using boot package
boot.ci(boot_result, type = "norm")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_result, type = "norm")

Intervals : 
Level      Normal        
95%   ( 2.986,  5.427 )  
Calculations and Intervals on Original Scale

Using my asymptotic confidence interval function, the 95% asymptotic confidence interval for \(\mu_{LN} = \mathbb{E}[X]\) is (3.1350, 5.3012). I verified this result using the boot package, which gave a similar normal interval of approximately (2.986, 5.427). Since the intervals are reasonably close, this indicates that the asymptotic method was applied correctly.

## Wrapper function for confidence interval methods
ci_wrapper <- function(data, boot_stat, method = "percentile", alpha = 0.05) {
  if (method == "percentile") {
    percentile_ci(boot_stat, alpha)
  } else if (method == "bca") {
    bca_ci(data, boot_stat, lognormal_mean_hat, alpha)
  } else if (method == "asymptotic") {
    asymptotic_ci(data, alpha)
  } else {
    stop("Method must be 'percentile', 'bca', or 'asymptotic'")
  }
}
ci_wrapper(soil, boot_theta, method = "percentile")
    2.5%    97.5% 
3.125359 5.589956 
ci_wrapper(soil, boot_theta, method = "bca")
4.163632% 98.69315% 
 3.219438  5.811369 
ci_wrapper(soil, boot_theta, method = "asymptotic")
[1] 3.134984 5.301214

f). Assuming the confidence intervals constructed in the previous parts are valid, evaluate their performance by comparing their widths, symmetry, stability, and sensitivity to distributional skewness. Then, provide a well‑reasoned recommendation regarding which method is most suitable for this analysis.

The asymptotic interval is the narrowest, the BCa interval is the widest, and the percentile interval lies in between. The asymptotic interval is also the most symmetric around the original estimate because it is based on a normal approximation. However, the bootstrap distribution from part (a) showed slight right skewness, so a symmetric interval may not fully reflect the shape of the data. The percentile interval reflects the bootstrap distribution more directly, but it does not adjust for bias. The BCa interval adjusts for both bias and skewness, making it more sensitive to the asymmetry in the data. All three intervals are relatively similar, but the BCa interval is more reliable in this case. Therefore, I would recommend the BCa interval as the most suitable method for estimating \(\mu_{LN} = \mathbb{E}[X]\). This method takes into account two important things: bias and skewness. Bias is like a systematic error, and skewness is when the data isn’t perfectly symmetrical. By considering both of these, the BCa interval gives a more accurate estimate. Compared to other methods, it’s more suitable for this kind of analysis.

---
title: "Assignment 7: Bootstrap Methods and Applications"
author: "Kayla Dyer"
date: " Due: March 31, 2026"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";

body {background-color: #ffffff;
      color: #000000;
      font-family: Arial, sans-serif;
      font-size: 1rem;
      line-height: 1.6;
      }

.highlightme { background-color:yellow; }

p { background-color:white; }

}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}

if (!require("VGAM")) {
  install.packages("VGAM")
  library(VGAM)
}
#### VGAM
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, you code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```
 
 \
 
## **Assignment Objectives** 

<p>
* Reinforce the understanding of Bootstrap sampling .

* Understand the bootstrap estimation: confidence interval and sampling distribution.
</p>


## **Policies of Using AI Tools**

<p>
**Policy on AI Tool Use**: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.
</p>

<p>
**Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.
</p>


<p>**Log-normal Distribution Revisited**</p>

<p>
If $Y = \ln(X) \sim N(\mu, \sigma^2)$, then $X$ follows a lognormal distribution $X \sim \text{Lognormal}(\mu, \sigma^2)$. The probability density is given by

$$
f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0
$$

After some algebra, we can express the mean and variance of the above lognormal distribution in the following


\begin{align}
\mathbb{E}[X] &= \exp\left(\mu + \frac{\sigma^2}{2}\right) \\
\text{Var}(X) &= [\exp(\sigma^2) - 1] \exp(2\mu + \sigma^2)
\end{align}


Using the relationship between normal and log-normal distribution and a sample $\{x_1, x_2, \dots, x_n\}$, the MLE estimators of $\mu$ and $\sigma^2$ are given by


\begin{align}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^n \ln(x_i) \\
\hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^n (\ln(x_i) - \hat{\mu})^2
\end{align}


Using the plug-in principle of MLE, we have the MLE of $\mathbb{E}[X]$ and $\text{Var}(X)$ in the following

$$
\boxed{\widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)}
$$

</P>



<p><font color = "blue">**This assignment focuses on constructing various bootstrap confidence intervals of the lognormal population mean $\mathbb{E}[X]$**</font></p>


\

## **Question: Trace Metal Concentrations in Soil**

<p>
Soil lead (Pb) concentrations (mg/kg) from 55 urban garden sites. Trace metals in environmental media typically follow lognormal distributions due to:

* Multiplicative processes controlling accumulation

* Positive constraints (concentrations cannot be negative)

* Right-skewed nature of contamination patterns
</p>



<p>
```
0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78
```
</p>

<p>

$$\boxed{\text{Instructions:}}$$

Assume the data follow a log-normal distribution. For each question in parts (b)–(d), begin by clearly explaining the reasoning behind your analytical approach. Then, develop your own R functions to implement three types of confidence intervals: the asymptotic interval, the percentile bootstrap interval, and the bias-corrected and accelerated (BCa) bootstrap interval. Use these functions to construct the required confidence intervals, and verify your results using the appropriate functions from the `boot` package.

You are encouraged to design a wrapper function that integrates all confidence interval methods, allowing users to select the desired method through an input argument.
</P>

<p>

$$\boxed{\text{Individual Questions:}}$$

a). Perform 5000 bootstrap samples to estimate the bootstrap sampling distribution of $\boxed{\widehat{\mathbb{E}[X]}}$. Display the distribution using either a histogram or a kernel density plot. Comment on the shape, variability, and any notable patterns observed in the bootstrap sampling distribution.

To estimate the sampling distribution of the lognormal population mean, I use bootstrap resampling. For each bootstrap sample, I resample the observed data with replacement and compute the plug-in estimator of \(E[X]\).

```{r}
## Soil lead concentration data (mg/kg)
soil <- c(
  0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91,
  2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56,
  1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89,
  11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78,
  22.34, 3.78, 2.78
)

## Plug-in estimator for the lognormal population mean E[X]
lognormal_mean_hat <- function(x) {
  logx <- log(x)
  mu_hat <- mean(logx)
  sigma2_hat <- mean((logx - mu_hat)^2)  # MLE of variance on log-scale
  exp(mu_hat + sigma2_hat / 2)
}

## Compute estimate from original sample
theta_hat <- lognormal_mean_hat(soil)
theta_hat
```

```{r}
## Bootstrap sampling distribution of E[X]
set.seed(1)

B <- 5000
n <- length(soil)
boot_theta <- rep(0, B)

for (b in 1:B) {
  x_star <- sample(soil, size = n, replace = TRUE)
  boot_theta[b] <- lognormal_mean_hat(x_star)
}
```

```{r}
## Histogram of bootstrap sampling distribution
hist(boot_theta,
     breaks = 30,
     main = "Bootstrap Sampling Distribution of E[X]",
     xlab = expression(hat(E[X])))
```

The bootstrap sampling distribution of the estimated value of E[X] appears roughly unimodal with a slight right skew. It is centered around the original estimate and shows a moderate amount of variability. The right skew is expected given the lognormal nature of the data, and a few larger values indicate the effect of higher observations in the sample.

b). Construct a 95% bootstrap percentile confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

To construct the 95% bootstrap percentile confidence interval, I use the empirical distribution of the bootstrap estimates obtained in part (a). Since the percentile method uses the quantiles of the bootstrap sampling distribution directly, I take the 2.5th and 97.5th percentiles of the bootstrap estimates of the lognormal population mean. This gives a 95% percentile bootstrap confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

```{r}
## Percentile bootstrap confidence interval for E[X]
percentile_ci <- function(boot_stat, alpha = 0.05) {
  quantile(boot_stat, probs = c(alpha / 2, 1 - alpha / 2))
}

ci_percentile <- percentile_ci(boot_theta)
ci_percentile
```

```{r}
## Verify percentile interval using boot package
library(boot)

boot_mean_stat <- function(data, indices) {
  sample_data <- data[indices]
  lognormal_mean_hat(sample_data)
}

boot_result <- boot(data = soil, statistic = boot_mean_stat, R = 5000)
boot.ci(boot_result, type = "perc")
```

Using my percentile bootstrap function, the 95% bootstrap percentile confidence interval for $\mu_{LN} = \mathbb{E}[X]$ is (3.1254, 5.5900). I verified this result using the boot package, which gave a very similar interval of approximately (3.136,5.549). Since the intervals are close, this indicates that the percentile method was applied correctly.

c). Construct a 95% bootstrap BCa confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

To construct the 95% bootstrap BCa confidence interval, I use the bootstrap estimates from part (a) and adjust them for bias and skewness. The BCa method improves on the percentile approach by accounting for asymmetry in the bootstrap sampling distribution. Since the estimated lognormal mean is slightly right-skewed, this method is appropriate for constructing a more accurate confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

```{r}
## BCa bootstrap confidence interval for E[X]
bca_ci <- function(data, boot_stat, stat_function, alpha = 0.05) {
  theta_hat <- stat_function(data)
  
  ## Measures how the bootstrap distribution is shifted from the original estimate
  z0 <- qnorm(mean(boot_stat < theta_hat))
  
  n <- length(data)
  jack_stat <- rep(0, n)
  
  ## Uses jackknife estimates to account for skewness in the distribution
  for (i in 1:n) {
    jack_sample <- data[-i]
    jack_stat[i] <- stat_function(jack_sample)
  }
  
  jack_mean <- mean(jack_stat)
  num <- sum((jack_mean - jack_stat)^3)
  den <- sum((jack_mean - jack_stat)^2)
  a_hat <- num / (6 * den^(3/2))
  
  z_alpha1 <- qnorm(alpha / 2)
  z_alpha2 <- qnorm(1 - alpha / 2)
  
  ## Adjusts the percentile levels using bias and skewness
  p1 <- pnorm(z0 + (z0 + z_alpha1) / (1 - a_hat * (z0 + z_alpha1)))
  p2 <- pnorm(z0 + (z0 + z_alpha2) / (1 - a_hat * (z0 + z_alpha2)))
  
  quantile(boot_stat, probs = c(p1, p2))
}

ci_bca <- bca_ci(soil, boot_theta, lognormal_mean_hat)
ci_bca
```

```{r}
## Verify BCa interval using boot package
boot.ci(boot_result, type = "bca")
```

Using my BCa bootstrap function, the 95% BCa confidence interval for $\mu_{LN} = \mathbb{E}[X]$  is (3.2194, 5.8114). I verified this result using the boot package, which gave a very similar BCa interval of approximately (3.233,5.774). Since the intervals are close, this indicates that the BCa method was applied correctly.

d). Use the Central Limit Theorem to construct a 95% asymptotic confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

To construct the 95% asymptotic confidence interval, I use the Central Limit Theorem together with the delta method. Since the estimator of the lognormal population mean is a smooth function of $\hat{\mu}$ and $\hat{\sigma}^2$, its sampling distribution is approximately normal for a sufficiently large sample size. I therefore estimate its standard error using the plug-in approach and then use the normal critical value $1.96$ to form the confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

```{r}
## Asymptotic confidence interval for E[X]
asymptotic_ci <- function(data, alpha = 0.05) {
  logx <- log(data)
  n <- length(data)
  
  mu_hat <- mean(logx)
  sigma2_hat <- mean((logx - mu_hat)^2)
  theta_hat <- exp(mu_hat + sigma2_hat / 2)
  
  ## The standard error is estimated using the delta method with plug-in estimates
  se_hat <- theta_hat * sqrt((sigma2_hat + sigma2_hat^2 / 2) / n)
  
  ## The normal critical value is used because the estimator is asymptotically normal
  z_crit <- qnorm(1 - alpha / 2)
  
  c(theta_hat - z_crit * se_hat, theta_hat + z_crit * se_hat)
}

ci_asym <- asymptotic_ci(soil)
ci_asym
```

```{r}
## Verify asymptotic interval using boot package
boot.ci(boot_result, type = "norm")
```

Using my asymptotic confidence interval function, the 95% asymptotic confidence interval for $\mu_{LN} = \mathbb{E}[X]$ is (3.1350, 5.3012). I verified this result using the boot package, which gave a similar normal interval of approximately (2.986, 5.427). Since the intervals are reasonably close, this indicates that the asymptotic method was applied correctly.

```{r}
## Wrapper function for confidence interval methods
ci_wrapper <- function(data, boot_stat, method = "percentile", alpha = 0.05) {
  if (method == "percentile") {
    percentile_ci(boot_stat, alpha)
  } else if (method == "bca") {
    bca_ci(data, boot_stat, lognormal_mean_hat, alpha)
  } else if (method == "asymptotic") {
    asymptotic_ci(data, alpha)
  } else {
    stop("Method must be 'percentile', 'bca', or 'asymptotic'")
  }
}
```

```{r}
ci_wrapper(soil, boot_theta, method = "percentile")
ci_wrapper(soil, boot_theta, method = "bca")
ci_wrapper(soil, boot_theta, method = "asymptotic")
```

f). Assuming the confidence intervals constructed in the previous parts are valid, evaluate their performance by comparing their widths, symmetry, stability, and sensitivity to distributional skewness. Then, provide a well‑reasoned recommendation regarding which method is most suitable for this analysis.

The asymptotic interval is the narrowest, the BCa interval is the widest, and the percentile interval lies in between. The asymptotic interval is also the most symmetric around the original estimate because it is based on a normal approximation. However, the bootstrap distribution from part (a) showed slight right skewness, so a symmetric interval may not fully reflect the shape of the data. The percentile interval reflects the bootstrap distribution more directly, but it does not adjust for bias. The BCa interval adjusts for both bias and skewness, making it more sensitive to the asymmetry in the data. All three intervals are relatively similar, but the BCa interval is more reliable in this case. Therefore, I would recommend the BCa interval as the most suitable method for estimating $\mu_{LN} = \mathbb{E}[X]$. This method takes into account two important things: bias and skewness. Bias is like a systematic error, and skewness is when the data isn't perfectly symmetrical. By considering both of these, the BCa interval gives a more accurate estimate. Compared to other methods, it's more suitable for this kind of analysis.

