For this exam, we will be estimating the parameters of a lognormal distribution using a data set that consists of the sizes of 50 recently drilled oil reserves.

oil <- c(17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 
47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 
20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 
19.3, 56.7, 11.2, 39.5, 27.1)

1 Question 1

For this question, we will use the method of moments to estimate \(\mu\) and \(\sigma^2\). Then we will use those parameter estimates to estimate the mean of the lognormal distribution.

1.1 Part A

For Part A, we will derive and set up both the system of equations for the moments. The Derivation of the system of equations is as follows: and then subsequently obtain the parameter estimates to use to find the mean of the lognormal distribution. The derivations are as follows:

Distribution moments: \(E(X^k) = \exp(k\mu + \frac{1}{2} k^2 \sigma^2)\)

Distribution moment 1: \(E(x) = \exp(\mu + \frac{1}{2} \sigma^2)\)

Distribution moment 2: \(E(x^2) = \exp(2\mu + 2\sigma^2)\)

Sample moments: \(E(X^k) = \frac{1}{n} \sum X_i^k = m_k\)

Sample moment 1: \(E(x) = \frac{1}{n} \sum X_i = \bar{x} = m_1\)

Sample moment 2: \(E(x^2) = \frac{1}{n} \sum X_i^2 = m_2\)

\(E(x) = E(x) \Rightarrow \exp(\mu + \frac{1}{2} \sigma^2) = \frac{1}{n} \sum_{i=1}^n X_i = \bar{x} = m_1\)

\(E(x^2) = E(x^2) = \exp(2\mu + 2\sigma^2) = \frac{1}{n} \sum X_i^2 = m_2\)

1.2 Part B

For this part, we will now solve for both \(\mu\) and \(\sigma^2\) and obtain the estimates.

\(\exp(\mu + \frac{1}{2} \sigma^2) = m_1 \Rightarrow \mu + \frac{1}{2} \sigma^2 = \ln m_1\)

\(\exp(2\mu + 2\sigma^2) = m_2 \Rightarrow 2\mu + 2\sigma^2 = \ln m_2\)

\[\begin{align*} 2\mu + \sigma^2 &= 2 \ln m_1 \\ 2\mu + 2\sigma^2 &= \ln m_2 \end{align*}\]

\(\Rightarrow \sigma^2 = \ln m_2 - 2 \ln m_1 = \ln \frac{m_2}{m_1^2}\)

\[\begin{align*} \mu + \frac{1}{2} (\ln m_2 - 2 \ln m_1) &= \ln m_1 \\ \mu + \frac{1}{2} \ln m_2 - \ln m_1 &= \ln m_1 \\ \mu &= 2 \ln m_1 - \frac{1}{2} \ln m_2 \end{align*}\]

1.3 Part C

For part c, we must write a function that will return the parameter estimates and the mean of the lognormal distribution.

n <- length(oil) # sample size of oil

m1.samp <- mean(oil) # first sample moment 
m2.samp <- mean(oil^2) # second sample moment

mm.param.est <- function(data) {

mm.mu.est = 2*log(m1.samp) - log(m2.samp)/2 
# method of moments estimator for mu
mm.var.est = log(m2.samp) - 2*log(m1.samp)
# method of moments estimator for varianc

mm.logmean.est <- exp(mm.mu.est + mm.var.est/2)
# mm estimator for lognormal mean

return(c(as.numeric(mm.mu.est), as.numeric(mm.var.est), as.numeric(mm.logmean.est)))
# converts outputs into a numerical vector and outputs list

}

mm.param.est(oil) # oil is the data set we're using
[1]  3.3012685  0.2339652 30.5160000

According to the output, \(\widetilde{\mu}\) is 3.3013, \(\widetilde\sigma^2\) is 0.234, and the population mean, \(\widetilde{\mu_{LN}}\), is 30.516.

2 Question 2

For this question, we will derive the maxmimum likelihood estimators for \(\mu\) and \(\sigma^2\). We will then use those estimates to generate both the maximum likelihood estimate for the mean of the lognormal distribution. We will also use the moments from the previous question to derive the variance estimate of the lognormal.

2.1 Part A

For Part A, we will derive the score functions, also known as the gradient(s) of the log-likelihood function. The derivations are as follows:

\(L(\mu, \sigma^2 | X) = \prod_{i=1}^n f(x_i | \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{x_i \sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\ln(x_i) - \mu)^2}{2\sigma^2} \right)\)

\(= \left( \prod_{i=1}^n \frac{1}{x_i} \right) \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n \exp\left( \sum_{i=1}^n -\frac{(\ln(x_i) - \mu)^2}{2\sigma^2} \right)\)

\(= \left( \prod_{i=1}^n x_i \right)^{-1} (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right)\)

\(\ln(L) = -\ln\left( \prod_{i=1}^n x_i \right) - \frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2\)

\(= -\sum_{i=1}^n \ln(x_i) - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2\)

2.2 Part B

For Part B, we will now solve for both \(\mu\) and \(\sigma^2\) and obtain the maximum likelihood estimates.

\(\frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right) = -\frac{1}{2\sigma^2} \sum_{i=1}^n \frac{\partial}{\partial \mu} (\ln(x_i) - \mu)^2\)

\(= -\frac{1}{2\sigma^2} (2)(-1) \sum_{i=1}^n (\ln(x_i) - \mu) = \frac{1}{\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu) = 0\)

\(\frac{1}{\sigma^2} \sum_{i=1}^n \ln(x_i) - \frac{1}{\sigma^2} \sum_{i=1}^n \mu = 0 \Rightarrow \sum \ln(x_i) = n\mu \Rightarrow \mu = \frac{1}{n} \sum \ln(x_i) \text{ or } \ln(\bar{x})\)

\(\frac{\partial \ln L}{\partial \sigma^2} = \frac{\partial}{\partial \sigma^2} \left( -\frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right) = 0\)

\(= -\frac{n}{2} \left( \frac{1}{\sigma^2} \right) (2\sigma) - \frac{1}{2} (-2)(\sigma^{-3}) \sum_{i=1}^n (\ln(x_i) - \mu)^2 = 0\)

\(-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (\ln(x_i) - \mu)^2 = 0 \Rightarrow \frac{1}{\sigma^3} \sum_{i=1}^n (\ln(x_i) - \mu)^2 = \frac{n}{\sigma}\)

\(\sum (\ln(x_i) - \mu)^2 = n\sigma^2 \Rightarrow \sigma^2 = \frac{1}{n} \sum (\ln(x_i) - \mu)^2\)

2.3 Part C

For this section, we will use the relationship between the variance and the moments of a distribution to derive the formula for the variance of the lognormal distribution.

\(Var(X) = E(X^2) - (E(X))^2\)

\(= \exp(2\mu + 2\sigma^2) - (\exp(\mu + \frac{1}{2}\sigma^2))^2\)

\(= \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2) = \exp(2\mu) \exp(2\sigma^2) - \exp(2\mu) \exp(\sigma^2)\)

\(= \exp(2\mu + \sigma^2) [\exp(\sigma^2) - 1]\)

n <- length(oil) # sample size 

mu.mle <- sum(log(oil))/n # MLE for mu
sigmasquare.mle <- sum((log(oil) - (mu.mle))^2)/n # mle for sigma^2
mu.lognorm.mle <- exp(mu.mle + sigmasquare.mle/2) 

c(mu.mle, sigmasquare.mle, mu.lognorm.mle) # estimate for mean of lognormal
[1]  3.2685359  0.3271989 30.9426444

3 Question 3

For this question, we will estimate the mean of the lognormal distribution nonparametrically. Specifically, we will use bootstrapping to generate a sampling distribution of the mean and compare it to an asymptotic distribution of the sampling mean.

3.1 Part A

For Part A, we will write a function to estimate the maximum likelihood estimators for \(\mu\) and \(\sigma^2\)

lognorm.loglik <- function(params, data) {

mu.mle <- params[1]  # passing the parameters
ss.mle <- params[2]

n <- length(oil)   # sample size
  
  # Log-likelihood for lognormal distribution
  loglik.lognorm <- -sum(log(oil)) - n*log(2*pi)/2 - n*log(ss.mle/2) - sum((log(oil) - mean(ss.mle))^2)/(2*ss.mle)
  
  return(loglik.lognorm)    # Return negative for minimization

}


## Score (gradient) equation

lognorm.score <- function(params, data) {

mu.mle <- params[1]
ss.mle <- params[2]
 
n <- length(oil)
  
# Gradient for mean is mean of ln(x)
grad_mu <- mean(log(oil))
  
# Gradient for variance is variance of the ln(x)
grad_ss <- (n-1)*var(log(oil))/n 
# multiplying by (n-1) to cancel (n-1) in denominator then dividing by n to get gradient of variance

# Estimate of mean of lognormal
logmean.mle <- exp(grad_mu + grad_ss/2)

return(c(grad_mu, grad_ss, logmean.mle))


}


lognorm.score(oil)
[1]  3.2685359  0.3271989 30.9426444

According to the output, \(\widehat{\mu}\) is 3.2685, \(\widehat{\sigma^2}\) is 0.3272, and \(\widehat{\mu_{LN}}\) is 30.9426.

3.2 Part B

For Part B, we will take 1000 bootstrap samples and find the bootstrap MLE for each resample.

set.seed(1) # generate same numbers every time for sample

original.sample = sample(oil,       # resampling from oil sample 
                         50,       # sample size = 50
                       replace = FALSE) # sample without replacement
                                              
# Bootstrap sampling 
bt.sample.logmean.vec = NULL      # define an empty vector to hold sample means of repeated samples
for(i in 1:1000){              # starting for-loop to take bootstrap samples with n = 50
ith.bt.sample = sample(x = original.sample, # Original sample with 50 oil reserves
size = 50, #  size must be equal to the sample size
replace = TRUE) # MUST use WITH REPLACEMENT!!

bt.sample.logmean.vec[i] = mean(ith.bt.sample) # calculates the mean of 
# i-th bootstrap sample and saves it in the empty vector: bt.sample.logmean.vec 
}

mean(bt.sample.logmean.vec)
[1] 30.45754

According to the output, \(\widehat{\mu_{LN}}\) is 30.4575. Since we do have enough data points (\(n \gt 30\)), we could potentially apply the central limit theorem toward this sampling distribution.

3.3 Part C

For Part C, we will use a kernel density to estimate the density of the bootstrap sampling distribution.

hist(bt.sample.logmean.vec,   # data used for histogram
     breaks = 14,   # specify number of vertical bars
     probability = TRUE, # y values are density values
       xlab = "Sample Mean of the Lognormal Distribution", # change the label of x-axis
      # add a title to the histogram
        main = "Histogram of Bootstrap Sampling Distribution of Lognormal Sample Means",
          cex.main = 0.9,
       col.main = "navy", col = "lightblue")   
lines(density(bt.sample.logmean.vec, kernel = "gaussian"), col = "red", lwd = 2) # adds kernel density estimate curve to histogram

bt.density.logmean <- density(bt.sample.logmean.vec, kernel = "gaussian")
bt.density.logmean # calls gaussian kernel density estimates

Call:
    density.default(x = bt.sample.logmean.vec, kernel = "gaussian")

Data: bt.sample.logmean.vec (1000 obs.);    Bandwidth 'bw' = 0.5116

       x               y            
 Min.   :21.89   Min.   :8.770e-06  
 1st Qu.:26.36   1st Qu.:2.490e-03  
 Median :30.83   Median :2.655e-02  
 Mean   :30.83   Mean   :5.579e-02  
 3rd Qu.:35.31   3rd Qu.:1.121e-01  
 Max.   :39.78   Max.   :1.728e-01  

According to the output of the density function, it looks like \(\widehat{\mu_{LN}}\) is 30.83. As for the histogram, it somewhat resembles a normal distribution. The density curve (red) resembles a very rough looking bell shape.

---
title: "Maximum Likelihood Estimation (MLE)"
author: "Cheng Peng"
date: "West Chester University"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: show
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
if (!require("fitdistrplus")) {
  install.packages("fitdistrplus")
  library(fitdistrplus)
}
## library(fitdistrplus)
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, you code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```

\

For this exam, we will be estimating the parameters of a lognormal distribution using a data set that consists of the sizes of 50 recently drilled oil reserves.
```{r}
oil <- c(17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 
47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 
20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 
19.3, 56.7, 11.2, 39.5, 27.1)
```

# Question 1
For this question, we will use the method of moments to estimate $\mu$ and $\sigma^2$. Then we will use those parameter estimates to estimate the mean of the lognormal distribution.

## Part A
For Part A, we will derive and set up both the system of equations for the moments. The Derivation of the system of equations is as follows: and then subsequently obtain the parameter estimates to use to find the mean of the lognormal distribution. The derivations are as follows:


Distribution moments: $E(X^k) = \exp(k\mu + \frac{1}{2} k^2 \sigma^2)$

Distribution moment 1: $E(x) = \exp(\mu + \frac{1}{2} \sigma^2)$

Distribution moment 2: $E(x^2) = \exp(2\mu + 2\sigma^2)$ 

Sample moments: $E(X^k) = \frac{1}{n} \sum X_i^k = m_k$

Sample moment 1: $E(x) = \frac{1}{n} \sum X_i = \bar{x} = m_1$

Sample moment 2: $E(x^2) = \frac{1}{n} \sum X_i^2 = m_2$ 

$E(x) = E(x) \Rightarrow \exp(\mu + \frac{1}{2} \sigma^2) = \frac{1}{n} \sum_{i=1}^n X_i = \bar{x} = m_1$

$E(x^2) = E(x^2) = \exp(2\mu + 2\sigma^2) = \frac{1}{n} \sum X_i^2 = m_2$ 


## Part B
For this part, we will now solve for both $\mu$ and $\sigma^2$ and obtain the estimates.


$\exp(\mu + \frac{1}{2} \sigma^2) = m_1 \Rightarrow \mu + \frac{1}{2} \sigma^2 = \ln m_1$ 

$\exp(2\mu + 2\sigma^2) = m_2 \Rightarrow 2\mu + 2\sigma^2 = \ln m_2$ 

\begin{align*}
2\mu + \sigma^2 &= 2 \ln m_1  \\
2\mu + 2\sigma^2 &= \ln m_2 
\end{align*}

$\Rightarrow \sigma^2 = \ln m_2 - 2 \ln m_1 = \ln \frac{m_2}{m_1^2}$

\begin{align*}
\mu + \frac{1}{2} (\ln m_2 - 2 \ln m_1) &= \ln m_1 \\
\mu + \frac{1}{2} \ln m_2 - \ln m_1 &= \ln m_1 \\
\mu &= 2 \ln m_1 - \frac{1}{2} \ln m_2 
\end{align*}


## Part C
For part c, we must write a function that will return the parameter estimates and the mean of the lognormal distribution.
```{r}
n <- length(oil) # sample size of oil

m1.samp <- mean(oil) # first sample moment 
m2.samp <- mean(oil^2) # second sample moment

mm.param.est <- function(data) {

mm.mu.est = 2*log(m1.samp) - log(m2.samp)/2 
# method of moments estimator for mu
mm.var.est = log(m2.samp) - 2*log(m1.samp)
# method of moments estimator for varianc

mm.logmean.est <- exp(mm.mu.est + mm.var.est/2)
# mm estimator for lognormal mean

return(c(as.numeric(mm.mu.est), as.numeric(mm.var.est), as.numeric(mm.logmean.est)))
# converts outputs into a numerical vector and outputs list

}

mm.param.est(oil) # oil is the data set we're using



```
According to the output, $\widetilde{\mu}$ is 3.3013, $\widetilde\sigma^2$ is 0.234, and the population mean, $\widetilde{\mu_{LN}}$, is 30.516.

# Question 2
For this question, we will derive the maxmimum likelihood estimators for $\mu$ and $\sigma^2$. We will then use those estimates to generate both the maximum likelihood estimate for the mean of the lognormal distribution. We will also use the moments from the previous question to derive the variance estimate of the lognormal.

## Part A
For Part A, we will derive the score functions, also known as the gradient(s) of the log-likelihood function. The derivations are as follows:

$L(\mu, \sigma^2 | X) = \prod_{i=1}^n f(x_i | \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{x_i \sqrt{2\pi\sigma^2}} \exp\left( -\frac{(\ln(x_i) - \mu)^2}{2\sigma^2} \right)$


$= \left( \prod_{i=1}^n \frac{1}{x_i} \right) \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n \exp\left( \sum_{i=1}^n -\frac{(\ln(x_i) - \mu)^2}{2\sigma^2} \right)$

$= \left( \prod_{i=1}^n x_i \right)^{-1} (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right)$ 

$\ln(L) = -\ln\left( \prod_{i=1}^n x_i \right) - \frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2$ 

$= -\sum_{i=1}^n \ln(x_i) - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2$

## Part B
For Part B, we will now solve for both $\mu$ and $\sigma^2$ and obtain the maximum likelihood estimates.

$\frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right) = -\frac{1}{2\sigma^2} \sum_{i=1}^n \frac{\partial}{\partial \mu} (\ln(x_i) - \mu)^2$ 

$= -\frac{1}{2\sigma^2} (2)(-1) \sum_{i=1}^n (\ln(x_i) - \mu) = \frac{1}{\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu) = 0$ 

$\frac{1}{\sigma^2} \sum_{i=1}^n \ln(x_i) - \frac{1}{\sigma^2} \sum_{i=1}^n \mu = 0 \Rightarrow \sum \ln(x_i) = n\mu \Rightarrow \mu = \frac{1}{n} \sum \ln(x_i) \text{ or } \ln(\bar{x})$ 

$\frac{\partial \ln L}{\partial \sigma^2} = \frac{\partial}{\partial \sigma^2} \left( -\frac{n}{2} \ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (\ln(x_i) - \mu)^2 \right) = 0$ 

$= -\frac{n}{2} \left( \frac{1}{\sigma^2} \right) (2\sigma) - \frac{1}{2} (-2)(\sigma^{-3}) \sum_{i=1}^n (\ln(x_i) - \mu)^2 = 0$ 

$-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^n (\ln(x_i) - \mu)^2 = 0 \Rightarrow \frac{1}{\sigma^3} \sum_{i=1}^n (\ln(x_i) - \mu)^2 = \frac{n}{\sigma}$ 

$\sum (\ln(x_i) - \mu)^2 = n\sigma^2 \Rightarrow \sigma^2 = \frac{1}{n} \sum (\ln(x_i) - \mu)^2$ 

## Part C
For this section, we will use the relationship between the variance and the moments of a distribution to derive the formula for the variance of the lognormal distribution.

$Var(X) = E(X^2) - (E(X))^2$ 

$= \exp(2\mu + 2\sigma^2) - (\exp(\mu + \frac{1}{2}\sigma^2))^2$ 

$= \exp(2\mu + 2\sigma^2) - \exp(2\mu + \sigma^2) = \exp(2\mu) \exp(2\sigma^2) - \exp(2\mu) \exp(\sigma^2)$ 

$= \exp(2\mu + \sigma^2) [\exp(\sigma^2) - 1]$ 


```{r}
n <- length(oil) # sample size 

mu.mle <- sum(log(oil))/n # MLE for mu
sigmasquare.mle <- sum((log(oil) - (mu.mle))^2)/n # mle for sigma^2
mu.lognorm.mle <- exp(mu.mle + sigmasquare.mle/2) 

c(mu.mle, sigmasquare.mle, mu.lognorm.mle) # estimate for mean of lognormal
```

# Question 3
For this question, we will estimate the mean of the lognormal distribution nonparametrically. Specifically, we will use bootstrapping to generate a sampling distribution of the mean and compare it to an asymptotic distribution of the sampling mean.

## Part A
For Part A, we will write a function to estimate the maximum likelihood estimators for $\mu$ and $\sigma^2$
```{r}
lognorm.loglik <- function(params, data) {

mu.mle <- params[1]  # passing the parameters
ss.mle <- params[2]

n <- length(oil)   # sample size
  
  # Log-likelihood for lognormal distribution
  loglik.lognorm <- -sum(log(oil)) - n*log(2*pi)/2 - n*log(ss.mle/2) - sum((log(oil) - mean(ss.mle))^2)/(2*ss.mle)
  
  return(loglik.lognorm)    # Return negative for minimization

}


## Score (gradient) equation

lognorm.score <- function(params, data) {

mu.mle <- params[1]
ss.mle <- params[2]
 
n <- length(oil)
  
# Gradient for mean is mean of ln(x)
grad_mu <- mean(log(oil))
  
# Gradient for variance is variance of the ln(x)
grad_ss <- (n-1)*var(log(oil))/n 
# multiplying by (n-1) to cancel (n-1) in denominator then dividing by n to get gradient of variance

# Estimate of mean of lognormal
logmean.mle <- exp(grad_mu + grad_ss/2)

return(c(grad_mu, grad_ss, logmean.mle))


}


lognorm.score(oil)
```
According to the output, $\widehat{\mu}$ is 3.2685, $\widehat{\sigma^2}$ is 0.3272, and $\widehat{\mu_{LN}}$ is 30.9426.

## Part B
For Part B, we will take 1000 bootstrap samples and find the bootstrap MLE for each resample.
```{r}
set.seed(1) # generate same numbers every time for sample

original.sample = sample(oil,       # resampling from oil sample 
                         50,       # sample size = 50
                       replace = FALSE) # sample without replacement
                                              
# Bootstrap sampling 
bt.sample.logmean.vec = NULL      # define an empty vector to hold sample means of repeated samples
for(i in 1:1000){              # starting for-loop to take bootstrap samples with n = 50
ith.bt.sample = sample(x = original.sample, # Original sample with 50 oil reserves
size = 50, #  size must be equal to the sample size
replace = TRUE) # MUST use WITH REPLACEMENT!!

bt.sample.logmean.vec[i] = mean(ith.bt.sample) # calculates the mean of 
# i-th bootstrap sample and saves it in the empty vector: bt.sample.logmean.vec 
}

mean(bt.sample.logmean.vec)
```
According to the output, $\widehat{\mu_{LN}}$ is 30.4575. Since we do have enough data points ($n \gt 30$), we could potentially apply the central limit theorem toward this sampling distribution.

## Part C
For Part C, we will use a kernel density to estimate the density of the bootstrap sampling distribution.
```{r}
hist(bt.sample.logmean.vec,   # data used for histogram
     breaks = 14,   # specify number of vertical bars
     probability = TRUE, # y values are density values
       xlab = "Sample Mean of the Lognormal Distribution", # change the label of x-axis
      # add a title to the histogram
        main = "Histogram of Bootstrap Sampling Distribution of Lognormal Sample Means",
          cex.main = 0.9,
       col.main = "navy", col = "lightblue")   
lines(density(bt.sample.logmean.vec, kernel = "gaussian"), col = "red", lwd = 2) # adds kernel density estimate curve to histogram

bt.density.logmean <- density(bt.sample.logmean.vec, kernel = "gaussian")
bt.density.logmean # calls gaussian kernel density estimates
```
According to the output of the density function, it looks like $\widehat{\mu_{LN}}$ is 30.83. As for the histogram, it somewhat resembles a normal distribution. The density curve (red) resembles a very rough looking bell shape.

