Assignment Objectives

  • Reinforce understanding of bootstrap sampling.

  • Understand bootstrap estimation of sampling distributions and confidence intervals.

Policies of Using AI Tools

Policy on AI Tool Use: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

Code Inclusion Requirement: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.

Log-normal Distribution Revisited

If \(Y = \ln(X) \sim N(\mu, \sigma^2)\), then \(X\) follows a lognormal distribution \(X \sim \text{Lognormal}(\mu, \sigma^2)\). The probability density is given by

\[ f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0 \]

After some algebra, the mean and variance of this lognormal distribution can be expressed as follows:

\[ \begin{align} \mathbb{E}[X] &= \exp\left(\mu + \frac{\sigma^2}{2}\right) \\ \text{Var}(X) &= [\exp(\sigma^2) - 1] \exp(2\mu + \sigma^2) \end{align} \]

Using the relationship between the normal and lognormal distributions, the MLEs of \(\mu\) and \(\sigma^2\) based on a sample \(\{x_1, x_2, \dots, x_n\}\) are

\[ \begin{align} \hat{\mu} &= \frac{1}{n}\sum_{i=1}^n \ln(x_i) \\ \hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^n (\ln(x_i) - \hat{\mu})^2 \end{align} \]

By the plug-in (invariance) principle of MLE, the MLE of \(\mathbb{E}[X]\) is

\[ \boxed{\widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)} \]

This assignment focuses on constructing various bootstrap confidence intervals for the lognormal population mean \(\mathbb{E}[X]\).


Question: Trace Metal Concentrations in Soil

The data are soil lead (Pb) concentrations (mg/kg) from 55 urban garden sites. Trace metals in environmental media typically follow lognormal distributions because of:

  • Multiplicative processes controlling accumulation

  • Positive constraints (concentrations cannot be negative)

  • Right-skewed nature of contamination patterns

0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78

\[\boxed{\text{Instructions:}}\]

Assume the data follow a log-normal distribution. For each question in parts (b)–(d), begin by clearly explaining the reasoning behind your analytical approach. Then, develop your own R functions to implement three types of confidence intervals: the asymptotic interval, the percentile bootstrap interval, and the bias-corrected and accelerated (BCa) bootstrap interval. Use these functions to construct the required confidence intervals, and verify your results using the appropriate functions from the boot package.

You are encouraged to design a wrapper function that integrates all confidence interval methods, allowing users to select the desired method through an input argument.

\[\boxed{\text{Individual Questions:}}\]

a). Perform 5000 bootstrap samples to estimate the bootstrap sampling distribution of \(\boxed{\widehat{\mathbb{E}[X]}}\). Display the distribution using either a histogram or a kernel density plot. Comment on the shape, variability, and any notable patterns observed in the bootstrap sampling distribution.

Question 1 Part A

set.seed(123) #Sets seed
soil <- c(0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78)

B <- 5000 #Number of bootstrap samples
n <- length(soil) #Sample size (55), used for each resample
boot_ex <- numeric(B) #Empty vector to store the B bootstrap estimates of E[X]

for(b in 1:B){
  boot_sample <- sample(soil, n, replace = TRUE) #Resamples n values with replacement
  boot_mu <- mean(log(boot_sample)) #Estimate of mu from the resample
  boot_sigma2 <- var(log(boot_sample)) #Estimate of sigma^2 (var() divides by n-1, not by n as the MLE does)
  boot_ex[b] <- exp(boot_mu + boot_sigma2/2) #Plug-in estimate of E[X] for this resample
}

kde_soil <- density(boot_ex) #Kernel density estimate of the bootstrap E[X] estimates

plot(kde_soil, main = "KDE of Bootstrap Estimates of E[X]", xlab = "Bootstrap estimate of E[X] (mg/kg)", col = "blue") #Plots the bootstrap sampling distribution

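The bootstrap sampling distribution is unimodal and approximately bell-shaped, centered between 4 and 5 mg/kg, and the vast majority of the bootstrap estimates lie between 3 and 6. It is close to symmetric, although the right tail appears slightly longer than the left: the 97.5th percentile found in Part B (5.68) lies about 1.42 above the mean of the bootstrap estimates (about 4.26, the center of the Part D interval), while the 2.5th percentile (3.11) lies only about 1.15 below it. The spread of the distribution, measured by its standard deviation (the bootstrap standard error used in Part D), is about 0.65.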

b). Construct a 95% bootstrap percentile confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

Question 1 Part B

For the bootstrap percentile confidence interval, we first sort the 5000 bootstrap \(\widehat{\mathbb{E}[X]}\) values (the quantile function in the code handles the sorting, interpolating between neighboring order statistics when needed) and then take the sorted values at positions \(B\cdot\alpha/2\) and \(B\cdot(1-\alpha/2)\):

\[ \left[\hat{\theta}_{B\cdot\alpha/2}^*, \hat{\theta}_{B\cdot(1-\alpha/2)}^* \right] \Rightarrow \left[\widehat{\mathbb{E}[X]}_{(5000\cdot 0.05/2)}^*, \widehat{\mathbb{E}[X]}_{(5000\cdot (1-0.05/2))}^* \right] \]

The function below applies this formula to the bootstrap estimates:

ci_percentile <- function(boot_vals, alpha = 0.05){ #Percentile CI from a vector of bootstrap estimates
  output_ci <- quantile(boot_vals, c(alpha/2, 1-alpha/2)) #alpha/2 and 1-alpha/2 quantiles of the bootstrap estimates
  return(output_ci)
}
lowuppercentile <- ci_percentile(boot_ex) #95% percentile interval for E[X]
lowuppercentile #Prints the interval
    2.5%    97.5% 
3.110817 5.683647 

From this output we know that the lower bound of the bootstrap percentile confidence interval is 3.110817 and the upper bound is 5.683647.

library(boot)

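boot_fn <- function(data, i) { #Statistic for boot(): E[X] estimate from the resampled indices i
  x <- data[i] #Resampled data
  mu_hat <- mean(log(x)) #Estimate of mu
  sigma2_hat <- var(log(x)) #Estimate of sigma^2 (n-1 divisor, as in Part A)
  exp(mu_hat + sigma2_hat / 2) #Estimate of E[X]
}

boot_obj <- boot(soil, boot_fn, R = 5000) #Draws 5000 bootstrap resamples with the boot package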

boot.ci(boot_obj, type = "perc") #Uses boot.ci to get confidence interval
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_obj, type = "perc")

Intervals : 
Level     Percentile     
95%   ( 3.140,  5.649 )  
Calculations and Intervals on Original Scale

The interval provided by the ‘boot.ci’ function, (3.140, 5.649), is close to our function’s output of (3.110817, 5.683647), which suggests that our function is correct. Small differences are expected because boot() draws its own set of 5000 resamples.

c). Construct a 95% bootstrap BCa confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

Question 1 Part C

To find the bootstrap BCa confidence interval, we first compute the bias-correction constant \(\hat{z}_0\) from the proportion of bootstrap estimates that fall below the original-sample estimate, using the formula shown in class. We then use the jackknife (leave-one-out estimates) to compute the acceleration constant \(\hat{a}\). Finally, we compute the adjusted percentiles \(\alpha_1\) and \(\alpha_2\), which replace \(\alpha/2\) and \(1-\alpha/2\) in the percentile formula, and use them to find the confidence interval:

\[ \left[\hat{\theta}_{B\cdot\alpha_1}^*, \hat{\theta}_{B\cdot\alpha_2}^* \right] \Rightarrow \left[\widehat{\mathbb{E}[X]}_{(5000\cdot \alpha_1)}^*, \widehat{\mathbb{E}[X]}_{(5000\cdot\alpha_2)}^* \right] \]

We can then plug these formulas into the code below to get the bootstrap BCa confidence interval:

ex_hat_fn <- function(x) { #Plug-in estimate of E[X] for a sample x (used for the original sample and the jackknife)
  mu_hat <- mean(log(x)) #Estimate of mu
  sigma2_hat <- var(log(x)) #Estimate of sigma^2 (n-1 divisor, as in Part A)
  exp(mu_hat + sigma2_hat / 2) #Estimate of E[X]
}

ci_bca <- function(data, boot_vals, ex_hat_fn, alpha = 0.05) {
  n <- length(data) #Sample size
  ex_hat <- ex_hat_fn(data) #E[X] estimate from the original sample
  
  z0 <- qnorm(mean(boot_vals < ex_hat)) #Bias-correction constant z0
  
  jack <- sapply(1:n, function(i) ex_hat_fn(data[-i])) #Jackknife (leave-one-out) estimates
  jack_mean <- mean(jack) #Mean of the jackknife estimates
  
  num <- sum((jack_mean - jack)^3) #Numerator of the acceleration constant a
  den <- 6 * (sum((jack_mean - jack)^2))^(3/2) #Denominator of a
  a <- num / den #Acceleration constant
  
  z_alpha <- qnorm(c(alpha/2, 1 - alpha/2)) #Standard normal quantiles at alpha/2 and 1-alpha/2
  
  adj <- pnorm(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))) #Adjusted percentiles alpha_1 and alpha_2
  
  output_ci2 <- quantile(boot_vals, adj) #Bootstrap quantiles at the adjusted percentiles give the BCa interval
  
  return(output_ci2)
}

lowupbca <- ci_bca(soil, boot_ex, ex_hat_fn) #95% BCa interval for E[X]
lowupbca #Prints the interval
4.196505%  98.7084% 
 3.244080  5.915138 

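From this output we know that the lower bound of the bootstrap BCa confidence interval is 3.244080 and the upper bound is 5.915138. The adjusted percentiles are about 4.20% and 98.71% rather than 2.5% and 97.5%, so the BCa correction shifts the interval toward larger values than the percentile interval.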

boot.ci(boot_obj, type = "bca") #Uses boot.ci to get confidence interval
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_obj, type = "bca")

Intervals : 
Level       BCa          
95%   ( 3.241,  5.848 )  
Calculations and Intervals on Original Scale

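The interval provided by the ‘boot.ci’ function, (3.241, 5.848), is close to our function’s output of (3.244080, 5.915138), which suggests that our function is correct. The upper limits differ more than the lower limits, which is expected because the BCa upper limit is a far right-tail quantile of the bootstrap distribution, where different sets of resamples vary the most.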

d). Use the Central Limit Theorem to construct a 95% asymptotic confidence interval for \(\mu_{LN} = \mathbb{E}[X]\).

Question 1 Part D

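Since we already have the 5000 bootstrap \(\widehat{\mathbb{E}[X]}\) values from Part A, all that remains is to choose a center for the interval and to estimate the standard error by the standard deviation of the bootstrap estimates. By the Central Limit Theorem (applied to \(\hat{\mu}\) and \(\hat{\sigma}^2\) and carried through the smooth function \(\exp(\mu + \sigma^2/2)\)), the estimator

\[ \widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right) \]

is approximately normally distributed for large \(n\), so a \(100(1-\alpha)\%\) interval takes the form of a center plus or minus \(z_{1-\alpha/2}\) standard errors. Our function uses the mean of the bootstrap estimates, \(\overline{\mathbb{E}[X]}^*\), as the center:

\[ \overline{\mathbb{E}[X]}^* \pm z_{1-\alpha/2} \cdot \hat{se}_{boot} \]

Where:

\[ \overline{\mathbb{E}[X]}^* = \frac{1}{5000}\sum_{b=1}^{5000}\widehat{\mathbb{E}[X]}_b^*, \qquad \hat{se}_{boot}=\sqrt{\frac{1}{5000-1}\sum_{b=1}^{5000}\left(\widehat{\mathbb{E}[X]}_b^*-\overline{\mathbb{E}[X]}^*\right)^2} \]

and \(z_{1-\alpha/2} \approx 1.96\) for a 95% interval. Centering at the original-sample estimate \(\widehat{\mathbb{E}[X]}\) is the more conventional choice; the two centers differ by the bootstrap estimate of bias.

We can plug these formulas into our code to find the confidence interval: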

ci_asymptotic <- function(data, boot_vals, alpha = 0.05) { #data is not used in this version
  ex_hat <- mean(boot_vals) #Center: mean of the bootstrap E[X] estimates
  se <- sd(boot_vals) #Bootstrap standard error
  z <- qnorm(1 - alpha/2) #z_{1-alpha/2}, about 1.96 for alpha = 0.05
  
  output3 <- c(lower = ex_hat - z * se, upper = ex_hat + z * se)  #Center +/- z * SE
  return(output3)
}

lowupasymptotic <- ci_asymptotic(soil, boot_ex) #95% asymptotic interval for E[X]
lowupasymptotic #Prints the interval
   lower    upper 
2.985746 5.532843 

From this output we know that the lower bound of the asymptotic confidence interval is 2.985746 and the upper bound is 5.532843.

boot.ci(boot_obj, type = "norm") #Uses boot.ci to get confidence interval
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 5000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_obj, type = "norm")

Intervals : 
Level      Normal        
95%   ( 2.980,  5.487 )  
Calculations and Intervals on Original Scale

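The interval provided by the ‘boot.ci’ function, (2.980, 5.487), is close to our function’s output of (2.985746, 5.532843), which suggests that our function is correct. Small differences are expected because boot.ci() centers its normal interval at the bias-corrected estimate rather than at the mean of the bootstrap estimates, and because boot() uses its own resamples.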

f). Assuming the confidence intervals constructed in the previous parts are valid, evaluate their performance by comparing their widths, symmetry, stability, and sensitivity to distributional skewness. Then, provide a well‑reasoned recommendation regarding which method is most suitable for this analysis.

Question 1 Part F

The asymptotic confidence interval is the narrowest, with a width of 2.547097, compared with 2.57283 for the bootstrap percentile interval and 2.671058 for the bootstrap BCa interval. The asymptotic interval is symmetric by construction, whereas the percentile interval extends further above the bootstrap mean (about 4.26) than below it, and the BCa interval shifts both limits upward relative to the percentile interval (3.244 and 5.915 versus 3.111 and 5.684), reflecting its adjustment for bias and skewness. In general, the percentile method is simple and range-preserving, the BCa method corrects for bias and skewness, and the asymptotic normal approximation is fast but assumes that the sampling distribution is approximately normal. Because the KDE of the bootstrap estimates is bell-shaped and close to symmetric, the normal approximation seems reasonable here. So, provided that all of the confidence intervals are valid (in the sense that each method captures the true mean in about 95% of repeated samples), I would recommend the asymptotic confidence interval, given that it is the narrowest and seems appropriate in this setting.

---
title: "Assignment 8: Bootstrap Methods and Applications"
author: "Grace Lippert"
date: " Due: 3/31/2026 "
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of contents */
    list-style:upper-roman;
    background-image:none;
    background-repeat:no-repeat;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* author line below the title */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* date line below the title */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* level 1 headers */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* level 2 headers */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* level 3 headers */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* level 4 headers */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}

body {background-color: #ffffff;
      color: #000000;
      font-family: Arial, sans-serif;
      font-size: 1rem;
      line-height: 1.6;
      }

.highlightme { background-color:yellow; }

p { background-color:white; }
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}

if (!require("VGAM")) {
  install.packages("VGAM")
  library(VGAM)
}
knitr::opts_chunk$set(echo = TRUE,       # include code chunks in the output file
                      warning = FALSE,   # your code may produce warning messages;
                                         # set to TRUE to include them in the
                                         # output file
                      results = "markup", # include printed output in the output
                                          # file (knitr's default setting)
                      message = FALSE,
                      comment = NA
                      )
```
 
 \
 
## **Assignment Objectives** 

<p>
* Reinforce understanding of bootstrap sampling.

* Understand bootstrap estimation of sampling distributions and confidence intervals.
</p>


## **Policies of Using AI Tools**

<p>
**Policy on AI Tool Use**: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.
</p>

<p>
**Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.
</p>


<p>**Log-normal Distribution Revisited**</p>

<p>
If $Y = \ln(X) \sim N(\mu, \sigma^2)$, then $X$ follows a lognormal distribution $X \sim \text{Lognormal}(\mu, \sigma^2)$. The probability density is given by

$$
f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0
$$

After some algebra, the mean and variance of this lognormal distribution can be expressed as follows:

$$
\begin{align}
\mathbb{E}[X] &= \exp\left(\mu + \frac{\sigma^2}{2}\right) \\
\text{Var}(X) &= [\exp(\sigma^2) - 1] \exp(2\mu + \sigma^2)
\end{align}
$$

Using the relationship between the normal and lognormal distributions, the MLEs of $\mu$ and $\sigma^2$ based on a sample $\{x_1, x_2, \dots, x_n\}$ are

$$
\begin{align}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^n \ln(x_i) \\
\hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^n (\ln(x_i) - \hat{\mu})^2
\end{align}
$$

By the plug-in (invariance) principle of MLE, the MLE of $\mathbb{E}[X]$ is

$$
\boxed{\widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)}
$$

</P>
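As a quick illustration of the plug-in formula above, the sketch below uses a hypothetical helper, `lognormal_mean_mle()`, that follows the MLE formulas exactly (dividing by $n$). It is not part of the assignment code: the later chunks use R's `var()`, which divides by $n-1$, so their estimates are slightly larger for the same sample. The simulation check at the end uses arbitrary illustrative parameters ($\mu = 1$, $\sigma = 0.5$) that are assumptions, not values from the soil data.

```{r}
# Hypothetical helper (not part of the assignment): plug-in MLE of E[X]
# for a lognormal sample, using the 1/n divisor from the MLE formulas.
lognormal_mean_mle <- function(x) {
  stopifnot(all(x > 0))                    # lognormal data must be positive
  y <- log(x)                              # work on the log scale
  n <- length(y)
  mu_hat <- mean(y)                        # MLE of mu
  sigma2_hat <- sum((y - mu_hat)^2) / n    # MLE of sigma^2 (1/n, not n-1)
  exp(mu_hat + sigma2_hat / 2)             # invariance principle: MLE of E[X]
}

# Sanity check by simulation (illustrative parameters mu = 1, sigma = 0.5):
# for a large lognormal sample, the estimate should be close to exp(mu + sigma^2/2).
set.seed(1)
x_sim <- rlnorm(1e5, meanlog = 1, sdlog = 0.5)
c(estimate = lognormal_mean_mle(x_sim), truth = exp(1 + 0.5^2 / 2))
```

Part A sets its own seed with `set.seed(123)`, so this chunk does not change any of the later bootstrap results.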



<p><font color = "blue">**This assignment focuses on constructing various bootstrap confidence intervals for the lognormal population mean $\mathbb{E}[X]$.**</font></p>


\

## **Question: Trace Metal Concentrations in Soil**

<p>
The data are soil lead (Pb) concentrations (mg/kg) from 55 urban garden sites. Trace metals in environmental media typically follow lognormal distributions because of:

* Multiplicative processes controlling accumulation

* Positive constraints (concentrations cannot be negative)

* Right-skewed nature of contamination patterns
</p>



<p>
```
0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78
```
</p>

<p>

$$\boxed{\text{Instructions:}}$$

Assume the data follow a log-normal distribution. For each question in parts (b)–(d), begin by clearly explaining the reasoning behind your analytical approach. Then, develop your own R functions to implement three types of confidence intervals: the asymptotic interval, the percentile bootstrap interval, and the bias-corrected and accelerated (BCa) bootstrap interval. Use these functions to construct the required confidence intervals, and verify your results using the appropriate functions from the `boot` package.

You are encouraged to design a wrapper function that integrates all confidence interval methods, allowing users to select the desired method through an input argument.
</P>

<p>

$$\boxed{\text{Individual Questions:}}$$

a). Perform 5000 bootstrap samples to estimate the bootstrap sampling distribution of $\boxed{\widehat{\mathbb{E}[X]}}$. Display the distribution using either a histogram or a kernel density plot. Comment on the shape, variability, and any notable patterns observed in the bootstrap sampling distribution.

# Question 1 Part A

```{r}
set.seed(123) #Sets seed
soil <- c(0.85, 1.23, 0.92, 3.45, 2.11, 1.56, 4.89, 2.34, 1.78, 6.72, 0.95, 1.34, 8.91, 
2.67, 1.89, 5.43, 1.12, 3.78, 2.45, 7.65, 1.05, 1.45, 12.34, 2.89, 2.01, 4.56, 
1.23, 4.32, 2.67, 9.87, 0.99, 1.56, 15.23, 3.12, 2.34, 3.89, 1.34, 5.67, 2.89, 
11.45, 1.12, 1.67, 18.90, 3.45, 2.56, 3.45, 1.45, 6.78, 3.12, 14.56, 1.23, 1.78, 
22.34, 3.78, 2.78)

B <- 5000 #Number of bootstrap samples
n <- length(soil) #Sample size (55), used for each resample
boot_ex <- numeric(B) #Empty vector to store the B bootstrap estimates of E[X]

for(b in 1:B){
  boot_sample <- sample(soil, n, replace = TRUE) #Resamples n values with replacement
  boot_mu <- mean(log(boot_sample)) #Estimate of mu from the resample
  boot_sigma2 <- var(log(boot_sample)) #Estimate of sigma^2 (var() divides by n-1, not by n as the MLE does)
  boot_ex[b] <- exp(boot_mu + boot_sigma2/2) #Plug-in estimate of E[X] for this resample
}

kde_soil <- density(boot_ex) #Kernel density estimate of the bootstrap E[X] estimates

plot(kde_soil, main = "KDE of Bootstrap Estimates of E[X]", xlab = "Bootstrap estimate of E[X] (mg/kg)", col = "blue") #Plots the bootstrap sampling distribution
```

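The bootstrap sampling distribution is unimodal and approximately bell-shaped, centered between 4 and 5 mg/kg, and the vast majority of the bootstrap estimates lie between 3 and 6. It is close to symmetric, although the right tail appears slightly longer than the left: the 97.5th percentile found in Part B (5.68) lies about 1.42 above the mean of the bootstrap estimates (about 4.26, the center of the Part D interval), while the 2.5th percentile (3.11) lies only about 1.15 below it. The spread of the distribution, measured by its standard deviation (the bootstrap standard error used in Part D), is about 0.65.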
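To back up the visual description with numbers, a hypothetical helper such as `boot_shape_summary()` (a sketch, not one of the required functions) could report the center, the spread, a moment-based skewness, and the distances from the center to the 2.5th and 97.5th percentiles. A positive skewness and a larger upper gap would both point to a longer right tail.

```{r}
# Hypothetical helper: numerical summary of the shape of a bootstrap distribution.
boot_shape_summary <- function(boot_vals, alpha = 0.05) {
  m <- mean(boot_vals)                                   # center (bootstrap mean)
  s <- sd(boot_vals)                                     # spread (bootstrap standard error)
  q <- quantile(boot_vals, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  skew <- mean((boot_vals - m)^3) / s^3                  # moment-based skewness; > 0 means a longer right tail
  c(mean = m,
    median = median(boot_vals),
    sd = s,
    skewness = skew,
    lower_gap = m - q[1],                                # distance from the center down to the lower percentile
    upper_gap = q[2] - m)                                # distance from the center up to the upper percentile
}
```

After the Part A chunk has run, `round(boot_shape_summary(boot_ex), 3)` would summarize the 5000 bootstrap estimates of $\mathbb{E}[X]$.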


b). Construct a 95% bootstrap percentile confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

# Question 1 Part B

For the bootstrap percentile confidence interval, we first sort the 5000 bootstrap $\widehat{\mathbb{E}[X]}$ values (the `quantile()` function in the code handles the sorting, interpolating between neighboring order statistics when needed) and then take the sorted values at positions $B\cdot\alpha/2$ and $B\cdot(1-\alpha/2)$:

$$
\left[\hat{\theta}_{B\cdot\alpha/2}^*, \hat{\theta}_{B\cdot(1-\alpha/2)}^* \right] \Rightarrow 
\left[\widehat{\mathbb{E}[X]}_{(5000\cdot 0.05/2)}^*, \widehat{\mathbb{E}[X]}_{(5000\cdot (1-0.05/2))}^* \right]
$$

The function below applies this formula to the bootstrap estimates:

```{r}
ci_percentile <- function(boot_vals, alpha = 0.05){ #Percentile CI from a vector of bootstrap estimates
  output_ci <- quantile(boot_vals, c(alpha/2, 1-alpha/2)) #alpha/2 and 1-alpha/2 quantiles of the bootstrap estimates
  return(output_ci)
}
lowuppercentile <- ci_percentile(boot_ex) #95% percentile interval for E[X]
lowuppercentile #Prints the interval
```

From this output we know that the lower bound of the bootstrap percentile confidence interval is 3.110817 and the upper bound is 5.683647.
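Because `quantile()` interpolates between order statistics (its default is `type = 7`), its limits can differ slightly from the order-statistic formula above. A hypothetical `ci_percentile_order()` that picks the sorted values at positions $B\cdot\alpha/2$ and $B\cdot(1-\alpha/2)$ directly, i.e. the 125th and 4875th values when $B = 5000$, might look like this:

```{r}
# Hypothetical helper: percentile interval read directly from the order
# statistics, as in the formula above (positions B*alpha/2 and B*(1-alpha/2)).
ci_percentile_order <- function(boot_vals, alpha = 0.05) {
  B <- length(boot_vals)
  sorted <- sort(boot_vals)                     # order statistics
  lo <- max(1, round(B * alpha / 2))            # 125 when B = 5000, alpha = 0.05
  hi <- min(B, round(B * (1 - alpha / 2)))      # 4875 when B = 5000, alpha = 0.05
  c(lower = sorted[lo], upper = sorted[hi])
}
```

Running `ci_percentile_order(boot_ex)` should give limits very close to `lowuppercentile`, since the two approaches differ only in whether they interpolate between neighboring sorted values.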

```{r}
library(boot)

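boot_fn <- function(data, i) { #Statistic for boot(): E[X] estimate from the resampled indices i
  x <- data[i] #Resampled data
  mu_hat <- mean(log(x)) #Estimate of mu
  sigma2_hat <- var(log(x)) #Estimate of sigma^2 (n-1 divisor, as in Part A)
  exp(mu_hat + sigma2_hat / 2) #Estimate of E[X]
}

boot_obj <- boot(soil, boot_fn, R = 5000) #Draws 5000 bootstrap resamples with the boot package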

boot.ci(boot_obj, type = "perc") #Uses boot.ci to get confidence interval
```

The interval provided by the `boot.ci()` function, (3.140, 5.649), is close to our function's output of (3.110817, 5.683647), which suggests that our function is correct. Small differences are expected because `boot()` draws its own set of 5000 resamples.

c). Construct a 95% bootstrap BCa confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

# Question 1 Part C

To find the bootstrap BCa confidence interval, we first compute the bias-correction constant $\hat{z}_0$ from the proportion of bootstrap estimates that fall below the original-sample estimate, using the formula shown in class. We then use the jackknife (leave-one-out estimates) to compute the acceleration constant $\hat{a}$. Finally, we compute the adjusted percentiles $\alpha_1$ and $\alpha_2$, which replace $\alpha/2$ and $1-\alpha/2$ in the percentile formula, and use them to find the confidence interval:

$$
\left[\hat{\theta}_{B\cdot\alpha_1}^*, \hat{\theta}_{B\cdot\alpha_2}^* \right] \Rightarrow 
\left[\widehat{\mathbb{E}[X]}_{(5000\cdot \alpha_1)}^*, \widehat{\mathbb{E}[X]}_{(5000\cdot\alpha_2)}^* \right]
$$
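For reference, write $\hat{\theta} = \widehat{\mathbb{E}[X]}$, $\hat{\theta}_{(i)}$ for the estimate computed with observation $i$ left out (the jackknife estimate), $\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat{\theta}_{(i)}$, and $z_p = \Phi^{-1}(p)$. With this notation, the standard BCa quantities that `ci_bca()` computes are shown below. When $\hat{z}_0 = \hat{a} = 0$ they reduce to $\alpha_1 = \alpha/2$ and $\alpha_2 = 1-\alpha/2$, which gives the percentile interval.

$$
\begin{align}
\hat{z}_0 &= \Phi^{-1}\left(\frac{\#\{b : \hat{\theta}_b^* < \hat{\theta}\}}{B}\right) \\
\hat{a} &= \frac{\sum_{i=1}^n \left(\bar{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^3}{6\left[\sum_{i=1}^n \left(\bar{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^2\right]^{3/2}} \\
\alpha_1 &= \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{\alpha/2})}\right) \\
\alpha_2 &= \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{1-\alpha/2})}\right)
\end{align}
$$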

We can then plug these formulas into the code below to get the bootstrap BCa confidence interval:

```{r}
ex_hat_fn <- function(x) { #Plug-in estimate of E[X] for a sample x (used for the original sample and the jackknife)
  mu_hat <- mean(log(x)) #Estimate of mu
  sigma2_hat <- var(log(x)) #Estimate of sigma^2 (n-1 divisor, as in Part A)
  exp(mu_hat + sigma2_hat / 2) #Estimate of E[X]
}

ci_bca <- function(data, boot_vals, ex_hat_fn, alpha = 0.05) {
  n <- length(data) #Sample size
  ex_hat <- ex_hat_fn(data) #E[X] estimate from the original sample
  
  z0 <- qnorm(mean(boot_vals < ex_hat)) #Bias-correction constant z0
  
  jack <- sapply(1:n, function(i) ex_hat_fn(data[-i])) #Jackknife (leave-one-out) estimates
  jack_mean <- mean(jack) #Mean of the jackknife estimates
  
  num <- sum((jack_mean - jack)^3) #Numerator of the acceleration constant a
  den <- 6 * (sum((jack_mean - jack)^2))^(3/2) #Denominator of a
  a <- num / den #Acceleration constant
  
  z_alpha <- qnorm(c(alpha/2, 1 - alpha/2)) #Standard normal quantiles at alpha/2 and 1-alpha/2
  
  adj <- pnorm(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha))) #Adjusted percentiles alpha_1 and alpha_2
  
  output_ci2 <- quantile(boot_vals, adj) #Bootstrap quantiles at the adjusted percentiles give the BCa interval
  
  return(output_ci2)
}

lowupbca <- ci_bca(soil, boot_ex, ex_hat_fn) #95% BCa interval for E[X]
lowupbca #Prints the interval
```

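From this output we know that the lower bound of the bootstrap BCa confidence interval is 3.244080 and the upper bound is 5.915138. The adjusted percentiles are about 4.20% and 98.71% rather than 2.5% and 97.5%, so the BCa correction shifts the interval toward larger values than the percentile interval.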

```{r}
boot.ci(boot_obj, type = "bca") #Uses boot.ci to get confidence interval
```

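The interval provided by the `boot.ci()` function, (3.241, 5.848), is close to our function's output of (3.244080, 5.915138), which suggests that our function is correct. The upper limits differ more than the lower limits, which is expected because the BCa upper limit is a far right-tail quantile of the bootstrap distribution, where different sets of resamples vary the most.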
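To see how large the two BCa corrections are, a hypothetical helper like `bca_constants()` could return $\hat{z}_0$ and $\hat{a}$ on their own. It reuses the same formulas as `ci_bca()` and is only a sketch for inspection, not a required function. When both constants are close to zero, the BCa interval essentially reduces to the percentile interval.

```{r}
# Hypothetical helper: returns the BCa constants z0 (bias correction) and
# a (acceleration) so their size can be inspected directly.
bca_constants <- function(data, boot_vals, est_fn) {
  theta_hat <- est_fn(data)                                   # original-sample estimate
  z0 <- qnorm(mean(boot_vals < theta_hat))                    # bias correction
  jack <- vapply(seq_along(data),
                 function(i) est_fn(data[-i]), numeric(1))    # leave-one-out estimates
  d <- mean(jack) - jack
  a <- sum(d^3) / (6 * sum(d^2)^(3/2))                        # jackknife acceleration
  c(z0 = z0, a = a)
}
```

For example, `bca_constants(soil, boot_ex, ex_hat_fn)` would report the constants behind the adjusted percentiles printed above.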

d). Use the Central Limit Theorem to construct a 95% asymptotic confidence interval for $\mu_{LN} = \mathbb{E}[X]$.

# Question 1 Part D

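Since we already have the 5000 bootstrap $\widehat{\mathbb{E}[X]}$ values from Part A, all that remains is to choose a center for the interval and to estimate the standard error by the standard deviation of the bootstrap estimates. By the Central Limit Theorem (applied to $\hat{\mu}$ and $\hat{\sigma}^2$ and carried through the smooth function $\exp(\mu + \sigma^2/2)$), the estimator

$$
\widehat{\mathbb{E}[X]} = \exp\left(\hat{\mu} + \frac{\hat{\sigma}^2}{2}\right)
$$

is approximately normally distributed for large $n$, so a $100(1-\alpha)\%$ interval takes the form of a center plus or minus $z_{1-\alpha/2}$ standard errors. Our function uses the mean of the bootstrap estimates, $\overline{\mathbb{E}[X]}^*$, as the center:

$$
\overline{\mathbb{E}[X]}^* \pm z_{1-\alpha/2} \cdot \hat{se}_{boot}
$$

Where:

$$
\overline{\mathbb{E}[X]}^* = \frac{1}{5000}\sum_{b=1}^{5000}\widehat{\mathbb{E}[X]}_b^*, \qquad
\hat{se}_{boot}=\sqrt{\frac{1}{5000-1}\sum_{b=1}^{5000}\left(\widehat{\mathbb{E}[X]}_b^*-\overline{\mathbb{E}[X]}^*\right)^2}
$$

and $z_{1-\alpha/2} \approx 1.96$ for a 95% interval. Centering at the original-sample estimate $\widehat{\mathbb{E}[X]}$ is the more conventional choice; the two centers differ by the bootstrap estimate of bias.

We can plug these formulas into our code to find the confidence interval: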

```{r}
ci_asymptotic <- function(data, boot_vals, alpha = 0.05) { #data is not used in this version
  ex_hat <- mean(boot_vals) #Center: mean of the bootstrap E[X] estimates
  se <- sd(boot_vals) #Bootstrap standard error
  z <- qnorm(1 - alpha/2) #z_{1-alpha/2}, about 1.96 for alpha = 0.05
  
  output3 <- c(lower = ex_hat - z * se, upper = ex_hat + z * se)  #Center +/- z * SE
  return(output3)
}

lowupasymptotic <- ci_asymptotic(soil, boot_ex) #95% asymptotic interval for E[X]
lowupasymptotic #Prints the interval

```

From this output we know that the lower bound of the asymptotic confidence interval is 2.985746 and the upper bound is 5.532843.
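The interval above relies on the bootstrap for its standard error. For comparison, a purely analytical CLT interval can be built with the delta method. This is a swapped-in technique, not the one used in this assignment, and the function name `ci_delta_lognormal()` is hypothetical. It assumes the data are lognormal, so that $\mathrm{Var}(\hat{\mu}) = \sigma^2/n$, $\mathrm{Var}(\hat{\sigma}^2) \approx 2\sigma^4/n$, and the two estimates are independent. Under those assumptions $\eta = \mu + \sigma^2/2$ has standard error $\sqrt{\sigma^2/n + \sigma^4/(2n)}$, and $\widehat{\mathbb{E}[X]} = e^{\hat{\eta}}$ has standard error $e^{\hat{\eta}}$ times that.

```{r}
# Hypothetical helper: CLT/delta-method interval for E[X] = exp(mu + sigma^2/2)
# that needs no bootstrap. Assumes lognormal data, so Var(mu_hat) = sigma^2/n
# and, asymptotically, Var(sigma2_hat) = 2*sigma^4/n (independent of mu_hat).
ci_delta_lognormal <- function(x, alpha = 0.05, log_scale = FALSE) {
  y <- log(x)
  n <- length(y)
  mu_hat <- mean(y)
  sigma2_hat <- sum((y - mu_hat)^2) / n                     # MLE of sigma^2
  eta_hat <- mu_hat + sigma2_hat / 2                        # log of E[X]
  se_eta <- sqrt(sigma2_hat / n + sigma2_hat^2 / (2 * n))   # delta-method SE of eta_hat
  z <- qnorm(1 - alpha / 2)
  if (log_scale) {
    # Interval for eta, then exponentiated (asymmetric on the original scale)
    return(c(lower = exp(eta_hat - z * se_eta), upper = exp(eta_hat + z * se_eta)))
  }
  theta_hat <- exp(eta_hat)                                 # plug-in MLE of E[X]
  se_theta <- theta_hat * se_eta                            # delta method: derivative of exp(eta) is exp(eta)
  c(lower = theta_hat - z * se_theta, upper = theta_hat + z * se_theta)
}
```

Calling `ci_delta_lognormal(soil)` would give a symmetric interval comparable to the one above, and `log_scale = TRUE` gives an asymmetric alternative that always stays positive.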

```{r}
boot.ci(boot_obj, type = "norm") #Uses boot.ci to get confidence interval
```

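The interval provided by the `boot.ci()` function, (2.980, 5.487), is close to our function's output of (2.985746, 5.532843), which suggests that our function is correct. Small differences are expected because `boot.ci()` centers its normal interval at the bias-corrected estimate rather than at the mean of the bootstrap estimates, and because `boot()` uses its own resamples.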
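To make the centering differences concrete, the hypothetical helper below computes three normal-approximation intervals from one set of bootstrap estimates. They differ only in their center: the original-sample estimate, the bootstrap mean (what `ci_asymptotic()` uses), and the bias-corrected estimate $\hat{\theta} - (\overline{\theta}^* - \hat{\theta})$. My understanding is that the third matches what `boot.ci(type = "norm")` reports when it uses the bootstrap variance, but treat that as an assumption to check against the `boot` documentation.

```{r}
# Hypothetical helper: normal-approximation intervals for one set of bootstrap
# estimates, differing only in where the interval is centered.
ci_normal_centers <- function(theta_hat, boot_vals, alpha = 0.05) {
  se <- sd(boot_vals)                        # bootstrap standard error
  z <- qnorm(1 - alpha / 2)
  bias <- mean(boot_vals) - theta_hat        # bootstrap estimate of bias
  centers <- c(original = theta_hat,               # original-sample estimate
               boot_mean = mean(boot_vals),         # center used by ci_asymptotic()
               bias_corrected = theta_hat - bias)   # assumed boot.ci(type = "norm") center
  cbind(center = centers, lower = centers - z * se, upper = centers + z * se)
}
```

For example, `ci_normal_centers(ex_hat_fn(soil), boot_ex)` would show all three intervals side by side. They share the same width and differ only by shifts of the estimated bias.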

f). Assuming the confidence intervals constructed in the previous parts are valid, evaluate their performance by comparing their widths, symmetry, stability, and sensitivity to distributional skewness. Then, provide a well‑reasoned recommendation regarding which method is most suitable for this analysis.

# Question 1 Part F

The asymptotic confidence interval is the narrowest, with a width of 2.547097, compared with 2.57283 for the bootstrap percentile interval and 2.671058 for the bootstrap BCa interval. The asymptotic interval is symmetric by construction, whereas the percentile interval extends further above the bootstrap mean (about 4.26) than below it, and the BCa interval shifts both limits upward relative to the percentile interval (3.244 and 5.915 versus 3.111 and 5.684), reflecting its adjustment for bias and skewness. In general, the percentile method is simple and range-preserving, the BCa method corrects for bias and skewness, and the asymptotic normal approximation is fast but assumes that the sampling distribution is approximately normal. Because the KDE of the bootstrap estimates is bell-shaped and close to symmetric, the normal approximation seems reasonable here. So, provided that all of the confidence intervals are valid (in the sense that each method captures the true mean in about 95% of repeated samples), I would recommend the asymptotic confidence interval, given that it is the narrowest and seems appropriate in this setting.
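The instructions also encourage a wrapper function that lets the user choose a method through an argument. One possible sketch is below. The name `ci_lognormal_mean()` and its arguments are hypothetical, and it draws its own resamples, so its limits will not exactly match Parts B–D. Unlike `ci_asymptotic()`, its asymptotic option centers at the original-sample estimate.

```{r}
# Hypothetical wrapper: runs one bootstrap and returns the requested
# interval ("percentile", "bca", or "asymptotic") for the lognormal mean E[X].
ci_lognormal_mean <- function(x, method = c("percentile", "bca", "asymptotic"),
                              B = 5000, alpha = 0.05) {
  method <- match.arg(method)                                   # validates the choice
  est <- function(v) exp(mean(log(v)) + var(log(v)) / 2)        # same estimator as ex_hat_fn()
  n <- length(x)
  theta_hat <- est(x)                                           # original-sample estimate
  boot_vals <- replicate(B, est(sample(x, n, replace = TRUE)))  # B bootstrap estimates

  if (method == "asymptotic") {
    # Normal interval centered at the original-sample estimate
    z <- qnorm(1 - alpha / 2)
    se <- sd(boot_vals)
    return(c(lower = theta_hat - z * se, upper = theta_hat + z * se))
  }

  probs <- c(alpha / 2, 1 - alpha / 2)                          # percentile levels
  if (method == "bca") {
    z0 <- qnorm(mean(boot_vals < theta_hat))                    # bias correction
    jack <- vapply(seq_len(n), function(i) est(x[-i]), numeric(1))  # jackknife estimates
    d <- mean(jack) - jack
    a <- sum(d^3) / (6 * sum(d^2)^(3/2))                        # acceleration
    zq <- qnorm(probs)
    probs <- pnorm(z0 + (z0 + zq) / (1 - a * (z0 + zq)))        # adjusted levels
  }
  q <- quantile(boot_vals, probs, names = FALSE)
  c(lower = q[1], upper = q[2])
}
```

For example, `ci_lognormal_mean(soil, method = "bca")` returns a BCa interval from a fresh set of 5000 resamples.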


</p>




