Midterm Exam Objectives

  • Understand the definition and relationship between PDFs and CDFs, including their non-parametric estimators: the empirical distribution function and kernel density estimation (KDE).

  • Estimate sampling distributions using simulation-based methods, specifically the bootstrap.

  • Derive point estimates of parameters using the method of moments and maximum likelihood estimation (MLE).

  • Describe the asymptotic (normal) and bootstrap sampling distributions of maximum likelihood estimators.

  • Apply all the above inferential procedures in a programming environment (such as Python) to perform numerical data analysis.

Lognormal Distrubution

5.1 Log-normal Distribution The log-normal distributions have been widely used in many different fields such as finance and economics, engineering and reliability, environmental science, medical and biology etc. If X follows a lognormal distribution: f(x;µ,σ2) = √ 1 x 2πσ2 exp −(lnx−µ)2 2σ2 where µ ∈ R is the mean of lnX and σ2 > 0 is the variance of lnX.

Working Dataset: A reservoir engineer at an oil exploration company has recently drilled 50 new wells in a shale formation. According to geological theory, the size of oil reserves in this type of formation is often modeled using a log-normal distribution. The engineer is responsible for estimating the typical reserve size and quantifying the uncertainty of these estimates for a presentation to senior management. The estimated ultimate recovery (EUR) for the 50 wells, measured in thousands of barrels (Mbbl), is as follows:

17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 19.3, 56.7, 11.2, 39.5, 27.1 

This assignment focuses on finding the asymptotic sampling distributions of the MLE ˆµ and ˆ to µ and σ2 based on an environmental study data.

Question 1: Method of Moments Estimation (MME) of µ and σ2

Estimate the parameters µ and σ2 (caution: not σ!) from the raw oil reserve data using the given k-th moment function. Organize your work into three parts:

  1. Derive the system of two equations for the method of moments estimators (MMEs) of µ and σ2.
#1a. Deriving the system of two equations for the methods of moment estimators (MME) of mu and sigma_sq

wellsize= sort(c(17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 19.3, 56.7, 11.2, 39.5, 27.1 ))


#m moments for parameters
m1 = mean(wellsize) #xbar
m2= mean(wellsize^2) #x squared bar
#algebric application for closed form population parameters sigma^2 and mu

mom_sigma_sq = log(m2) - 2*log(m1)
mom_mu = log(m1) -0.5 * mom_sigma_sq

cat("MME for mu:", mom_mu, "\nMME for sigma_sq:", mom_sigma_sq)
MME for mu: 3.301269 
MME for sigma_sq: 0.2339652
cat("Sample mean:", m1, "sample standard deviation:", m2)
Sample mean: 30.516 sample standard deviation: 1176.698
  1. Derive the closed-form expressions for the MMEs of µ, σ2, and µLN, denoted by µ~, σ2~, and µLN~, respectively.

System of equations: \[ \ln(\bar{x}) = \mu + \frac{1}{2}\sigma^2 \ln(\bar{x^2}) = 2\mu + 2\sigma^2 \]

Solve Equation 1 for \(\mu\) in terms of \(\sigma^2\): \[ [ mu = \ln(\bar{x}) - \frac{1}{2}\sigma^2 ] \]

Substitute this expression for \(\mu\):

\[ [ln({x^2}) = 2[ \ln(\bar{x}) - \frac{1}{2}\sigma^2] + 2\sigma^2] \]

Distribute and simplify to find \(\tilde{\sigma}^2\):}

\[ [ln({x^2}) = 2ln(\bar{x}) - \sigma^2 + 2\sigma^2] \] \[ [ln({x^2}) = 2ln(\bar{x}) + \sigma^2] \]

\[ [ \tilde{\sigma}^2 = ln({x^2}) - 2ln(\bar{x})] \]

Substitute \(\tilde{\sigma}^2\) back into the expression for \(\mu\):

\[ [\tilde{\mu} = ln(\bar{x}) - \frac{1}{2} [ln({x^2}) - 2ln(\bar{x})]] \] \[ [\tilde{\mu} = ln(\bar{x}) - \frac{1}{2} ln({x^2}) + ln(\bar{x})] \] \[ [\tilde{\mu} = 2ln(\bar{x}) - \frac{1}{2}ln({x^2})] \]

Observing both equations

\[ [\tilde{\mu} = 2ln(\bar{x}) - \frac{1}{2}\ln(x^2)] \]

\[ [\tilde{\sigma}^2 = ln({x^2}) - 2ln(\bar{x})] \]

  1. Write an R function that outputs the MMEs µ, σ2, and µLN based on the given data set.
sigma_squ_tilda = log(m2)-2* log(m1)
mu_tilda = 2*log(m1) - ((1/2)*log(m2))


cat("Mu tilda:", mu_tilda, "Population standard deviation:", sigma_squ_tilda)
Mu tilda: 3.301269 Population standard deviation: 0.2339652
#Writing a function for Methods of Moment Estimation

MMElognorm= function(wellsize)
{
  
#m moments for parameters
m1 = mean(wellsize) #xbar
m2= mean(wellsize^2) #x squared bar
#algebric application for closed form population parameters sigma^2 and mu

#mom_sigma_sq = log(m2) - 2*log(m1)  #remove its the same below
#mom_mu = log(m1) -0.5 * mom_sigma_sq
  
sigma_squ_tilda = log(m2)-2* log(m1) #calculate the MOM for standard deivation
mu_tilda = 2*log(m1) - ((1/2)*log(m2)) # calculate the MOM for mean

return(c(mu_tilda, sigma_squ_tilda, m1) )

}
results = MMElognorm(wellsize)

cat("Mu tilda:", mu_tilda, "Population standard deviation:", sigma_squ_tilda, "mu_LN", m1)
Mu tilda: 3.301269 Population standard deviation: 0.2339652 mu_LN 30.516

The outputs for the method of moment estimators are \(\tilde\mu\) = 3.3012685, \(\tilde\sigma^2\) = 0.2339652, and \(\tilde\mu_{LN}\) = 30.516.

Question 2: Finding the MLE of \(\mu\) and \(\sigma^2\)

Question 2.1

(1). Derive the score functions (i.e., the gradient of the log-likelihood) for the parameters of the log-normal distribution, and

set them to zero to obtain the system of score equations.

Begin with the full log-likelihood and compute its partial derivatives with respect to µ and σ2. [Hint: letting β = σ2 may simplify differentiation.]

The log-likelihood for the log-normal distribution is given by:

\[ \ell(\mu, \beta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \beta - \sum_{i=1}^n \ln x_i - \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \] Partial derivative with respect to \(\mu\):

\[ \frac{\partial \ell}{\partial \mu} = \frac{\partial}{\partial \mu} \left[ -\frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \right] \\ = -\frac{1}{2\beta} \sum_{i=1}^n 2(\ln x_i - \mu) \cdot \frac{\partial}{\partial \mu}(\ln x_i - \mu) \\ = -\frac{1}{2\beta} \sum_{i=1}^n 2(\ln x_i - \mu) \cdot (-1) \\ U(\mu) = \frac{1}{\beta} \sum_{i=1}^n (\ln x_i - \mu) \]

Reach the score function of the equations Partial derivative with respect to \(\beta\), we treat \(\beta\) in substitution of \(\sigma^2\) for simplicity: \[ \frac{\partial \ell}{\partial \beta} = \frac{\partial}{\partial \beta} \left[ -\frac{n}{2} \ln \beta \right] - \frac{\partial}{\partial \beta} \left[ \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \right] \\ = -\frac{n}{2\beta} - \frac{1}{2} \sum_{i=1}^n (\ln x_i - \mu)^2 \cdot \frac{d}{d\beta}(\beta^{-1}) \\ = -\frac{n}{2\beta} - \frac{1}{2} \sum_{i=1}^n (\ln x_i - \mu)^2 \cdot (-\beta^{-2}) \\ U(\beta) = -\frac{n}{2\beta} + \frac{1}{2\beta^2} \sum_{i=1}^n (\ln x_i - \mu)^2 \]

Replacing \(\beta\) with \(\sigma^2\)

\[ -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 = 0 \]

Question 2.2

(2). Perform some algebra to find the closed form of the MLE of µ and σ2, denoted by µ and σ2

Solving for \(\hat\mu\)

\[ \sum_{i=1}^n (\ln x_i - \mu) = 0 \\ \sum_{i=1}^n \ln x_i - n\mu = 0 \\ \hat{\mu} = \frac{1}{n} \sum_{i=1}^n \ln x_i \]

Solving for \(\hat\sigma^2\)

\[ \frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2 \\ n\sigma^2 = \sum_{i=1}^n (\ln x_i - \hat{\mu})^2 \\ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2 \]

Question 2.3

(3). Recall that lognormal population mean (i.e., the first moment) is μLN = E[X] = exp(μ + 1/2σ2). Using the relationship between variance and the first and second moments,

Var(X) = E[X2] − (E[X])2,

we derive the population variance of the lognormal distribution, denoted as σ2 LN. This result will be used in applying the Central Limit Theorem in subsequent questions

\[ {Var}(X) = E[X^2] -(E[X])^2 \\ \]

\[ \sigma^2_{LN} = E[X^2] - (E[X])^2 \]

$$ E[X] = exp(+ ^2) \

E[X^k] = exp(k+ k22), \

E[X^2] = exp(2+ (22)2) \ E[X^2] = exp(2+ 2^2)

$$

\[ \sigma^2_{LN} = E[X^2] - (E[X])^2 \]

\[ \sigma^2_{LN} = exp(2\mu + 2\sigma^2) - [ exp(\mu + \frac{1}{2}\sigma^2)]^2 \] \[ \sigma^2_{LN} = exp(2\mu + 2\sigma^2) - exp(2\mu + \sigma^2) \]

Question 3: Sampling Distribution of Lognormal Sample Mean \(\hat\mu_{LN}\)

We have derived two different estimators of lognormal population mean μLN. This Question focuses on the sampling distribution of lognormal sample sample means based on MLE. The following are detailed instructions.

Question 3.1

(1). Write an R function to estimate the MLE of μ and σ2, denoted by bμ and cσ2 and return the MLE of μLN, denoted by \(\hat\mu_{LN}\).

The log-likelihood for the log-normal distribution is given by:

\[ \ell(\mu, \beta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \beta - \sum_{i=1}^n \ln x_i - \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \]

#Write the R function that estimations the MLE of the mu and sigma^2 denoted by mu_hat and sigma square hat to then return
#The Maximum Limitation Estimator of the mu_LN, denoted mu_LN_hat


data = wellsize #inserting the data of interest to be wellsize



normlog_mle= function(data)
  {
    #LOG LIKLEIHOOD--> its a normallog so fine
    log_data = log(data)
    #Find the length of the data for gradient equations
    
    n = length(data)
    
    #closed form for mu_hat and sigma_sq_hat
    mu_hat = mean(log_data)
    sigma_sq_hat = mean((log_data-mu_hat)^2) 
    
    #Calculating the population paraters for MLE: mu_ln_hat
    
    mu_ln_hat = exp(mu_hat + (0.5* sigma_sq_hat))
    
cat("Mu hat:", mu_hat, "Sigma Squared hat - Predicted standard deviation:", sigma_sq_hat, "MLE mu_ln", mu_ln_hat)   


return(c(mu_hat = mu_hat, sigma_sq_hat = sigma_sq_hat, mu_ln_hat = mu_ln_hat))
}


mle_results = normlog_mle(wellsize)
Mu hat: 3.268536 Sigma Squared hat - Predicted standard deviation: 0.3271989 MLE mu_ln 30.94264
#no gradient function (closed form no optim) --> partial differentiation and set to 0




# Solve system for MLE (mu_LN_hat)

Question 3.2

(2). Take 1000 Bootstrap samples from the raw oil data set. For each bootstrap sample, call the above R function to find the bootstrap MLE of μLN, denoted by dμ∗ LN (1) , dμ∗ LN (2) , · · · , dμ∗ LN (1000).

Note: \[\mu_{LN} = \exp\left(\mu + \frac{1}{2}\sigma^2\right)\]

set.seed(143)

length(wellsize) #n =50
[1] 50
boot_mu_ln_mle = NULL
log_well= wellsize


for (i in 1:1000)
{

#bootstrap sample
# make sure sample is in its log form
  
 # take raw not log of mean #log(wellsize) --> log the bootstrapped sample
  
bootstrap_well= sample(x=log_well, 50, replace= TRUE) #bootstrapping from the well sample

log_boot_well=log(bootstrap_well) #take the log of the bootstrapped sample to run the log likelihood


#take the mean of the log likelihood bootstrap to input in MLE mu equation
mu_hat = mean(log_boot_well)  #MLE estimator for mu
sigma_sq_hat = mean((log_boot_well-mu_hat)^2) #MLE estimator for simga^2


#Now calculate the mu ln

boot_mu_ln_mle[i] = exp(mu_hat +(0.5* sigma_sq_hat)) #gives distrubution of the bootstrap for the mu ln mle

final_mu_ln_MLE = mean(boot_mu_ln_mle) #gives single value for bootstrapped mle mu ln

#return(c(final_mu_ln_MLE)) #only keeping last thing ran --> VERRY VERYY BAD FOR bootstrapping --> not keeping the distribution for the model
}

cat("Bootstrap MLE of μLN:", final_mu_ln_MLE)
Bootstrap MLE of μLN: 30.93064

The bootstrap MLE of \(\mu_{ln}\) is 30.9306417.

Question 3.3

(3). Using kernel density estimation approach to estimate the bootstrap density curves based on the bootstrap MLEs: dμ∗ LN (1) , dμ∗ LN (2) , · · · , dμ∗ LN (1000).

#sd_ml_MLE=sd(boot_mu_ln_mle) #you would need to calculate standard error of the distribution to make a bandwidth for the kernel 

kde = density(boot_mu_ln_mle, kernel = "gaussian")  #density(boot_mu_ln_mle, bw = sd_ml_MLE)

hist(boot_mu_ln_mle,                     # data used for histogram
     breaks = 14,                            # specify number of vertical bars
     probability = TRUE,
       xlab = "Bootstrap sample means",      # change the label of x-axis
      # add a title to the histogram
        main="Bootstrap Sampling Distribution and Kernel Comparison \n of Sample Means",
          cex.main = 0.9,
       col.main = "navy")  
lines(density(boot_mu_ln_mle, kernel = "gaussian"), col= "skyblue", lwd=2)

x_values <- seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 100)
y_values <- dnorm(x_values, mean = mean(boot_mu_ln_mle), sd = sd(boot_mu_ln_mle))
lines(x_values, y_values, col = "red", lty = 2, lwd = 2) # Theoretical Normal
legend("topright", legend = c("KDE (Actual)", "Bootstrapped Normal (Theory)"), 
       col = c("skyblue", "red"), lty = c(1, 2), lwd = 2)

##Add asymptotic density curve

Question 3.4

(4). In order to find the asymptotic sampling distribution using the central limit theorem (CLT). That is, when sample size is large, ¯X → N  μLN, σLN √ n  where μLN and σLN can be estimated by using the plug-in principle of MLE. Based on the asymptotic sampling distribution of lognormal sample mean ¯X , add the asymptotic density curve to the above Bootstrap kernel density curve.

n = length(wellsize) #

#sigma / (sqrt(n))

########################Background info for log specific estimators
#a normal distribution  function dnorm() requires a mean and standard deviation 
mu_hat=  mean(log(wellsize)) # mean of natural log of sample


sigma_sq_hat = mean((log(wellsize)-mu_hat)^2) #needed to calculate standard error #no use ln value

#dnorm(wellsize, mean = mu_hat, sd= )
##############################################



#######LN values needed to calculate dnorm() asymptotic distribution 

mu_ln_hat = exp(mu_hat + (0.5* sigma_sq_hat))
sigma_sq_hat_ln= exp(2*mu_hat + sigma_sq_hat) * (exp(sigma_sq_hat) - 1) #sigma^2 ln hat value --> predicted MLE value for variation 


asympt_dis_se=sqrt(sigma_sq_hat_ln)/sqrt(n) #standard error for asymptotic sampling distribution 


x_axis = seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 200) #defining range for distribution most valuable when being graphed

#Make theoretical density of asymptotic distribution 

asymptotic_dis_theory = dnorm(x_axis, mean = mu_ln_hat, sd = asympt_dis_se)

#asymptotic_dis_theory = dnorm(wellsize, mean= mu_ln_hat, sd=asympt_dis_se ) #sd --> enter standard error asympt_dis_se
kde = density(boot_mu_ln_mle, kernel = "gaussian")  #density(boot_mu_ln_mle, bw = sd_ml_MLE)

hist(boot_mu_ln_mle,                     # data used for histogram
     breaks = 14,                            # specify number of vertical bars
     probability = TRUE,
       xlab = "Bootstrap sample means",      # change the label of x-axis
      # add a title to the histogram
        main="Bootstrap Sampling Distribution, Kernel and Asymptotic Density \n of Sample Means",
          cex.main = 0.9,
       col.main = "navy")  
lines(density(boot_mu_ln_mle, kernel = "gaussian"), col= "skyblue", lwd=2)

lines(x_axis, asymptotic_dis_theory, col = "blue", lwd = 2, lty = 4)

x_values <- seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 100) #defining graph range bootstrap
y_values <- dnorm(x_values, mean = mean(boot_mu_ln_mle), sd = sd(boot_mu_ln_mle))
lines(x_values, y_values, col = "red", lty = 2, lwd = 2) # Theoretical Normal
legend("topright", legend = c("Bootstrap Normal KDE (Actual)", "Bootstrap Normal (Theory)", "Asymptotic CLT"), 
       col = c("skyblue", "red", "blue"), lty = c(1, 2), lwd = 2)  #col = c("skyblue", "blue"), lty = c(1, 2), lwd = 2)

Question 3.5

(5). Write a short summary to compare the two sampling distributions of the sample mean of the lognormal distribution.

The description below refers to the behavior of the kernel density estimator (KDE) bootstrap exclusively when referring to the bootstrap distrbution.

The two major sampling distributions used above for the sample mean included bootstrap sampling distribution for \(\mu_{ln}\) and asymptotic distribution. Both distributions are used to estimate the behavior of a distribution and estimations for population parameters.

The bootstrapping distribution is an empirical estimation of the \(\mu_{LN}\) parameter by taking repeated random samples from the original well sample and repeatedly calculating a mean from each sample. All of the bootstrapped samples are then meaned to reach an \(\mu\) estimator. This is a simulated approach to the MLE \(\hat\mu_{LN}\). As the bootstrap resampling method is an indirect way to calculate and estimator, as in the collected distribution is not assessed soley within the single collected distribution but the estimator is calculated over the estimator over of collection of resamples from the originally collected distribution. Repeatably resampling and calculating random bootstrap variables to ultimately calculate the random variable estimator of interest, \(\hat\mu_{LN}\).

We see the relationship in the Figure above (last figure in question 3.4) the bootstrap gaussian kernel density has a better fit to the maxima and minimums of the distribution, while the asymptotic distribution under performs in assuming the shape for the most occurring density at \(\hat\mu_{LN}\) and has fatter tails than the bootstrap KDE, suggesting a weaker fit than the KDE bootstrap.

The more the bootstrap is sampled, it can be assumed the bootstrap estimator reaches closer to the true estimator. In examining the differences between a kernel and asymptotic distribution, the bootstrap kernel density best suits the empirical data rather than assuming normality with a large sample size to met an approximation for the asymptotic distribution. As the best suited normal distribution is assigned to the asymptotic distribution in its parametric approach in approximation rather than fitting for the data empirically.

In reflection for this lognormal distribution specifically, Both distributions are similar in shape showing a natural order pattern in the data set, in which the distributions meets their respective conditions. Additionally, both distributions show a similar value towards the \(\hat\mu_{LN}\), the bootstrap being 30.9306417 and the asymptotic distribution being 30.9426444.

The bootstrapped value for \(\hat\mu_{LN}\) is 30.9306417 which is very close to the asymptotic mean at30.9426444. Our two distributions capture the predicted \(\hat\mu_{LN}\) similarly suggesting some rigor in both tests. If a decision needs to be made about which distribution best suits the model, the bootstrap KDE resamping offers a stronger fit to the histogram’s raw or natural distribution suggesting the MLE parameters derived from this distribution may have more rigor than their asymptotic distribution counterparts, although again, both distributions follow very similar patterns and information. Any selection made between the two distributions, should not vary between each other very much by eye and graph analysis.

4 Bonus Question

Derive the Hessian matrix by computing the second-order partial derivatives of the log-likelihood function (with respect to μ and σ2). Then, translate the derived Hessian into an R function that takes the MLEs of the parameters and the data as inputs and returns the Hessian matrix. Finally, compare your analytically derived Hessian with the numerical Hessian returned by optim().

data= wellsize #log(wellsize)
log_like_wells_hess <- function(params, data) 
  {
  mu = params[1]       # shape parameter
  sigma_sq = params[2]  # scale parameter
  
  
  
n <- length(data)
  log_lik = (-n/2 * log(2*pi) - n/2 * log(sigma_sq) - sum(log(data)) - (1/(2*sigma_sq)) * sum((log(data) - mu)^2)) #we flip signs in the matrix process
  
return(log_lik)
}





log_likelihood_gradient = function(params, data) 
  {
  mu = params[1]
  sigma_sq = params[2]
  n = length(data)
#Gradient functions

  
# Taking the partial derivative with respect to mu
 d_mu <- (1/sigma_sq) * sum(log(data) - mu)
  
  # Partial derivative with respect to sigma_sq
  d_sigma_sq <- -n/(2 * sigma_sq) + (1/(2 * sigma_sq^2)) * sum((log(data) - mu)^2)
  
return(c(d_mu, d_sigma_sq))
}



analytical_hessian_func = function(params, data) 
  {
  mu = params[1]
  sigma_sq = params[2]
  n = length(data)
  
#Computing the second derivatives for mu and sigma^2
d2_mu <- -n / sigma_sq
  d2_sigma_sq <- n/(2 * sigma_sq^2) - (1/sigma_sq^3) * sum((log(data) - mu)^2)
  
  d_mu_d_sigma_sq <- -(1/sigma_sq^2) * sum(log(data) - mu)
  
# This analytically derived matrix   
hess_mat <- matrix(c(d2_mu, d_mu_d_sigma_sq, d_mu_d_sigma_sq, d2_sigma_sq),nrow = 2, byrow = TRUE)  

#return(c(d_mu, d_sigma_sq, d2_mu, d2_sigma_sq, d_mu_d_sigma_sq))  # Return log-likelihood  NOT the negative log-likelihood and gradients


  return(hess_mat)
}



# Finding Initial Values
## It is critical to select appropriate initial values to
## ensure fast convergence. In general, we can use whatever
## methods (such as MME) that are available to get appropriate
## initial values


# Using the MLE as the initial parameter estimates

initial_mu = mu_hat

initial_sigma_sq = sigma_sq_hat



#mle_optimization <- optim(par = c(initial_mu,initial_sigma_sq), fn = log_like_wells_hess, x = wellsize, hessian = TRUE)


# MLE using optim() with gradient



mle_result <- optim(
  par = c(mu_hat, sigma_sq_hat), #starting point of the predicted mu and sigma for the MLE
  fn = log_like_wells_hess,  
  gr = log_likelihood_gradient,
  data = wellsize,
  method = "L-BFGS-B",
  lower = c(-Inf, 1e-6),
  hessian = TRUE,
  control = list(fnscale = -1)
)


##
mle_result
$par
[1] 3.2685359 0.3271989

$value
[1] -206.444

$counts
function gradient 
       8        8 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

$hessian
              [,1]          [,2]
[1,] -1.528123e+02  5.185142e-14
[2,]  5.185142e-14 -2.335224e+02
#Comparing the hessian matrices derived analytically and optim()

numerical_hessian  <- mle_result$hessian
analytical_hessian <- analytical_hessian_func(mle_result$par, wellsize)

# Print for your midterm report
cat("--- Numerical Hessian from optim() ---\n")
--- Numerical Hessian from optim() ---
print(numerical_hessian)
              [,1]          [,2]
[1,] -1.528123e+02  5.185142e-14
[2,]  5.185142e-14 -2.335224e+02
cat("\n--- Analytically Derived Hessian ---\n")

--- Analytically Derived Hessian ---
print(analytical_hessian)
              [,1]          [,2]
[1,] -1.528123e+02  1.037019e-13
[2,]  1.037019e-13 -2.335158e+02

Both show negative correlation among \(\mu\) and \(\sigma^2\) and present similar values. Both show similarity in their sampling distribution.

---
title: "07- Midterm Exam: STA 506"
author: "Ezana Rivers"
date: " Due: 03-08-2026"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
####
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, you code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```
 
\

# Midterm Exam Objectives 


* Understand the definition and relationship between PDFs and CDFs, including their non-parametric estimators: the empirical distribution function and kernel density estimation (KDE). \

* Estimate sampling distributions using simulation-based methods, specifically the bootstrap. \

* Derive point estimates of parameters using the method of moments and maximum likelihood estimation (MLE). \
* Describe the asymptotic (normal) and bootstrap sampling distributions of maximum likelihood estimators. \
* Apply all the above inferential procedures in a programming environment (such as Python) to perform numerical data analysis. \



## Lognormal Distrubution

5.1 Log-normal Distribution The log-normal distributions have been widely used in many different fields such as finance and economics, engineering and reliability, environmental science, medical and biology etc. If X follows a lognormal distribution: f(x;µ,σ2) = √ 1 x 2πσ2 exp −(lnx−µ)2 2σ2 where µ ∈ R is the mean of lnX and σ2 > 0 is the variance of lnX.



Working Dataset: A reservoir engineer at an oil exploration company has recently drilled 50 new wells in a shale formation. According to geological theory, the size of oil reserves in this type of formation is often modeled using a log-normal distribution. The engineer is responsible for estimating the typical reserve size and quantifying the uncertainty of these estimates for a presentation to senior management. The estimated ultimate recovery (EUR) for the 50 wells, measured in thousands of barrels (Mbbl), is as follows: 

```
17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 19.3, 56.7, 11.2, 39.5, 27.1 
```

This assignment focuses on finding the asymptotic sampling distributions of the MLE ˆµ and ˆ to µ and σ2 based on an environmental study data.


# Question 1: Method of Moments Estimation (MME) of µ and σ2

Estimate the parameters µ and σ2 (caution: not σ!)  from the raw oil reserve data using the given k-th moment function. 
Organize your work into three parts:

(a) Derive the system of two equations for the method of moments estimators (MMEs) of µ and σ2.


```{r}
#1a. Deriving the system of two equations for the methods of moment estimators (MME) of mu and sigma_sq

wellsize= sort(c(17.3, 24.8, 8.2, 31.5, 14.1, 42.7, 11.9, 55.3, 21.4, 9.7, 36.2, 18.6, 63.1, 13.2, 28.9, 47.5, 19.8, 7.5, 33.4, 52.1, 16.0, 25.9, 38.7, 10.3, 44.2, 22.5, 58.6, 12.8, 30.3, 48.9, 20.1, 35.4, 15.7, 60.2, 26.7, 41.3, 18.1, 53.8, 23.9, 46.2, 29.4, 37.6, 14.9, 50.5, 32.8, 19.3, 56.7, 11.2, 39.5, 27.1 ))


#m moments for parameters
m1 = mean(wellsize) #xbar
m2= mean(wellsize^2) #x squared bar
#algebric application for closed form population parameters sigma^2 and mu

mom_sigma_sq = log(m2) - 2*log(m1)
mom_mu = log(m1) -0.5 * mom_sigma_sq

cat("MME for mu:", mom_mu, "\nMME for sigma_sq:", mom_sigma_sq)

cat("Sample mean:", m1, "sample standard deviation:", m2)
```


(b) Derive the closed-form expressions for the MMEs of µ, σ2, and µLN, denoted by µ~, σ2~, and µLN~, respectively.


System of equations:
$$
    \ln(\bar{x}) = \mu + \frac{1}{2}\sigma^2
     \ln(\bar{x^2}) = 2\mu + 2\sigma^2
$$




Solve Equation 1 for $\mu$ in terms of $\sigma^2$:
$$
[
mu = \ln(\bar{x}) - \frac{1}{2}\sigma^2
]
$$

Substitute this expression for $\mu$:
 
$$ 
[ln({x^2}) = 2[ \ln(\bar{x}) - \frac{1}{2}\sigma^2] + 2\sigma^2]
$$

Distribute and simplify to find $\tilde{\sigma}^2$:}

$$
[ln({x^2}) = 2ln(\bar{x}) - \sigma^2 + 2\sigma^2]
$$
$$
[ln({x^2}) = 2ln(\bar{x}) + \sigma^2]
$$

$$
[ \tilde{\sigma}^2 = ln({x^2}) - 2ln(\bar{x})]
$$
 


 Substitute $\tilde{\sigma}^2$ back into the expression for $\mu$:

$$
[\tilde{\mu} = ln(\bar{x}) - \frac{1}{2} [ln({x^2}) - 2ln(\bar{x})]]
$$
$$
[\tilde{\mu} = ln(\bar{x}) - \frac{1}{2} ln({x^2}) + ln(\bar{x})]
$$
$$
[\tilde{\mu} = 2ln(\bar{x}) - \frac{1}{2}ln({x^2})]
$$


Observing both equations

$$
[\tilde{\mu} = 2ln(\bar{x}) - \frac{1}{2}\ln(x^2)]
$$

$$
[\tilde{\sigma}^2 = ln({x^2}) - 2ln(\bar{x})]
$$



(c) Write an R function that outputs the MMEs µ, σ2, and µLN based on the given data set.


```{r}


sigma_squ_tilda = log(m2)-2* log(m1)
mu_tilda = 2*log(m1) - ((1/2)*log(m2))


cat("Mu tilda:", mu_tilda, "Population standard deviation:", sigma_squ_tilda)


```
```{r}
#Writing a function for Methods of Moment Estimation

MMElognorm= function(wellsize)
{
  
#m moments for parameters
m1 = mean(wellsize) #xbar
m2= mean(wellsize^2) #x squared bar
#algebric application for closed form population parameters sigma^2 and mu

#mom_sigma_sq = log(m2) - 2*log(m1)  #remove its the same below
#mom_mu = log(m1) -0.5 * mom_sigma_sq
  
sigma_squ_tilda = log(m2)-2* log(m1) #calculate the MOM for standard deivation
mu_tilda = 2*log(m1) - ((1/2)*log(m2)) # calculate the MOM for mean

return(c(mu_tilda, sigma_squ_tilda, m1) )

}
results = MMElognorm(wellsize)

cat("Mu tilda:", mu_tilda, "Population standard deviation:", sigma_squ_tilda, "mu_LN", m1)

```

The outputs for the method of moment estimators are $\tilde\mu$ = `r mu_tilda`, $\tilde\sigma^2$ = `r sigma_squ_tilda`, and $\tilde\mu_{LN}$ = `r m1`.


# Question 2: Finding the MLE of $\mu$ and $\sigma^2$ 

## Question 2.1 

(1). Derive the score functions (i.e., the gradient of the log-likelihood) for the parameters of the log-normal distribution, and 

set them to zero to obtain the system of score equations. 

Begin with the full log-likelihood and compute its partial derivatives with respect to µ and σ2. [Hint: letting β = σ2 may simplify differentiation.]



The log-likelihood for the log-normal distribution is given by:

$$
\ell(\mu, \beta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \beta - \sum_{i=1}^n \ln x_i - \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2
$$
 Partial derivative with respect to $\mu$:
 
 
$$
\frac{\partial \ell}{\partial \mu} = \frac{\partial}{\partial \mu} \left[ -\frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \right] \\
    = -\frac{1}{2\beta} \sum_{i=1}^n 2(\ln x_i - \mu) \cdot \frac{\partial}{\partial \mu}(\ln x_i - \mu) \\
    = -\frac{1}{2\beta} \sum_{i=1}^n 2(\ln x_i - \mu) \cdot (-1) \\
    U(\mu) = \frac{1}{\beta} \sum_{i=1}^n (\ln x_i - \mu)
$$



Reach the score function of the equations
Partial derivative with respect to $\beta$, we treat $\beta$ in substitution of $\sigma^2$ for simplicity:
$$
\frac{\partial \ell}{\partial \beta} = \frac{\partial}{\partial \beta} \left[ -\frac{n}{2} \ln \beta \right] - \frac{\partial}{\partial \beta} \left[ \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2 \right] \\
    = -\frac{n}{2\beta} - \frac{1}{2} \sum_{i=1}^n (\ln x_i - \mu)^2 \cdot \frac{d}{d\beta}(\beta^{-1}) \\
    = -\frac{n}{2\beta} - \frac{1}{2} \sum_{i=1}^n (\ln x_i - \mu)^2 \cdot (-\beta^{-2}) \\
    U(\beta) = -\frac{n}{2\beta} + \frac{1}{2\beta^2} \sum_{i=1}^n (\ln x_i - \mu)^2
$$

Replacing $\beta$ with $\sigma^2$  

$$
-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \mu)^2 = 0 
$$


## Question 2.2

(2). Perform some algebra to find the closed form of the MLE of µ and σ2, denoted by µ and σ2


Solving for $\hat\mu$

$$
\sum_{i=1}^n (\ln x_i - \mu) = 0 \\
\sum_{i=1}^n \ln x_i - n\mu = 0 \\
\hat{\mu} = \frac{1}{n} \sum_{i=1}^n \ln x_i
$$


Solving for $\hat\sigma^2$

$$
\frac{n}{2\sigma^2} = \frac{1}{2\sigma^4} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2 \\
n\sigma^2 = \sum_{i=1}^n (\ln x_i - \hat{\mu})^2 \\
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (\ln x_i - \hat{\mu})^2
$$



## Question 2.3 

(3). Recall that lognormal population mean (i.e., the first moment) is
μLN = E[X] = exp(μ + 1/2σ2).
Using the relationship between variance and the first and second moments, 

Var(X) = E[X2] − (E[X])2, 

we derive the population variance of the lognormal distribution, denoted as σ2
LN. This result will be used in
applying the Central Limit Theorem in subsequent questions


$$
{Var}(X) = E[X^2] -(E[X])^2 \\
$$

$$
\sigma^2_{LN} = E[X^2] - (E[X])^2 
$$

$$
E[X] = exp(\mu + \frac{1}{2}\sigma^2) \\

E[X^k] = exp(k\mu + \frac{1}{k}k^2\sigma^2), \\

E[X^2] = exp(2\mu + \frac{1}{2}(2^2)\sigma^2) \\
E[X^2] = exp(2\mu + 2\sigma^2)

$$

$$
\sigma^2_{LN} = E[X^2] - (E[X])^2 
$$


$$
\sigma^2_{LN} = exp(2\mu + 2\sigma^2) - [ exp(\mu + \frac{1}{2}\sigma^2)]^2 
$$
$$
\sigma^2_{LN} = exp(2\mu + 2\sigma^2) - exp(2\mu + \sigma^2)
$$






# Question 3: Sampling Distribution of Lognormal Sample Mean $\hat\mu_{LN}$

We have derived two different estimators of lognormal population mean μLN. This Question focuses on
the sampling distribution of lognormal sample sample means based on MLE. The following are detailed
instructions.

# Question 3.1 

(1). Write an R function to estimate the MLE of μ and σ2, denoted by bμ and cσ2 and return the MLE of
μLN, denoted by  $\hat\mu_{LN}$.

The log-likelihood for the log-normal distribution is given by:

$$
\ell(\mu, \beta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \beta - \sum_{i=1}^n \ln x_i - \frac{1}{2\beta} \sum_{i=1}^n (\ln x_i - \mu)^2
$$


```{r}

#Write the R function that estimations the MLE of the mu and sigma^2 denoted by mu_hat and sigma square hat to then return
#The Maximum Limitation Estimator of the mu_LN, denoted mu_LN_hat


data = wellsize #inserting the data of interest to be wellsize



normlog_mle= function(data)
  {
    #LOG LIKLEIHOOD--> its a normallog so fine
    log_data = log(data)
    #Find the length of the data for gradient equations
    
    n = length(data)
    
    #closed form for mu_hat and sigma_sq_hat
    mu_hat = mean(log_data)
    sigma_sq_hat = mean((log_data-mu_hat)^2) 
    
    #Calculating the population paraters for MLE: mu_ln_hat
    
    mu_ln_hat = exp(mu_hat + (0.5* sigma_sq_hat))
    
cat("Mu hat:", mu_hat, "Sigma Squared hat - Predicted standard deviation:", sigma_sq_hat, "MLE mu_ln", mu_ln_hat)   


return(c(mu_hat = mu_hat, sigma_sq_hat = sigma_sq_hat, mu_ln_hat = mu_ln_hat))
}


mle_results = normlog_mle(wellsize)


#no gradient function (closed form no optim) --> partial differentiation and set to 0




# Solve system for MLE (mu_LN_hat)


```
## Question 3.2 

(2). Take 1000 Bootstrap samples from the raw oil data set. For each bootstrap sample, call the above R
function to find the bootstrap MLE of μLN, denoted by dμ∗
LN
(1)
, dμ∗
LN
(2)
, · · · , dμ∗
LN
(1000).

Note: 
$$\mu_{LN} = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$$

```{r}

set.seed(143)

length(wellsize) #n =50

boot_mu_ln_mle = NULL
log_well= wellsize


for (i in 1:1000)
{

#bootstrap sample
# make sure sample is in its log form
  
 # take raw not log of mean #log(wellsize) --> log the bootstrapped sample
  
bootstrap_well= sample(x=log_well, 50, replace= TRUE) #bootstrapping from the well sample

log_boot_well=log(bootstrap_well) #take the log of the bootstrapped sample to run the log likelihood


#take the mean of the log likelihood bootstrap to input in MLE mu equation
mu_hat = mean(log_boot_well)  #MLE estimator for mu
sigma_sq_hat = mean((log_boot_well-mu_hat)^2) #MLE estimator for simga^2


#Now calculate the mu ln

boot_mu_ln_mle[i] = exp(mu_hat +(0.5* sigma_sq_hat)) #gives distrubution of the bootstrap for the mu ln mle

final_mu_ln_MLE = mean(boot_mu_ln_mle) #gives single value for bootstrapped mle mu ln

#return(c(final_mu_ln_MLE)) #only keeping last thing ran --> VERRY VERYY BAD FOR bootstrapping --> not keeping the distribution for the model
}

cat("Bootstrap MLE of μLN:", final_mu_ln_MLE)

```
The bootstrap MLE of $\mu_{ln}$ is `r final_mu_ln_MLE`. 

```{r include=FALSE} 
#Can remove bc question 3.3 creates graph
hist(boot_mu_ln_mle,                     # data used for histogram
     breaks = 14,                            # specify number of vertical bars
     probability = TRUE,
       xlab = "Bootstrap sample means",      # change the label of x-axis
      # add a title to the histogram
        main="Bootstrap Sampling Distribution \n of Sample Means",
          cex.main = 0.9,
       col.main = "navy")   
lines(density(boot_mu_ln_mle), col = "skyblue", lwd = 2)


```
# Question 3.3 

(3). Using kernel density estimation approach to estimate the bootstrap density curves based on the bootstrap MLEs: 
dμ∗
LN
(1)
, dμ∗
LN
(2)
, · · · , dμ∗
LN
(1000).


```{r}

#sd_ml_MLE=sd(boot_mu_ln_mle) #you would need to calculate standard error of the distribution to make a bandwidth for the kernel 

kde = density(boot_mu_ln_mle, kernel = "gaussian")  #density(boot_mu_ln_mle, bw = sd_ml_MLE)

hist(boot_mu_ln_mle,                     # data used for histogram
     breaks = 14,                            # specify number of vertical bars
     probability = TRUE,
       xlab = "Bootstrap sample means",      # change the label of x-axis
      # add a title to the histogram
        main="Bootstrap Sampling Distribution and Kernel Comparison \n of Sample Means",
          cex.main = 0.9,
       col.main = "navy")  
lines(density(boot_mu_ln_mle, kernel = "gaussian"), col= "skyblue", lwd=2)

x_values <- seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 100)
y_values <- dnorm(x_values, mean = mean(boot_mu_ln_mle), sd = sd(boot_mu_ln_mle))
lines(x_values, y_values, col = "red", lty = 2, lwd = 2) # Theoretical Normal
legend("topright", legend = c("KDE (Actual)", "Bootstrapped Normal (Theory)"), 
       col = c("skyblue", "red"), lty = c(1, 2), lwd = 2)


##Add asymptotic density curve

```


## Question 3.4

(4). In order to find the asymptotic sampling distribution using the central limit theorem (CLT). That is,
when sample size is large,
¯X
→ N

μLN,
σLN √
n

where μLN and σLN can be estimated by using the plug-in principle of MLE. Based on the asymptotic
sampling distribution of lognormal sample mean ¯X , add the asymptotic density curve to the above
Bootstrap kernel density curve.



```{r}

n = length(wellsize) #

#sigma / (sqrt(n))

########################Background info for log specific estimators
#a normal distribution  function dnorm() requires a mean and standard deviation 
mu_hat=  mean(log(wellsize)) # mean of natural log of sample


sigma_sq_hat = mean((log(wellsize)-mu_hat)^2) #needed to calculate standard error #no use ln value

#dnorm(wellsize, mean = mu_hat, sd= )
##############################################



#######LN values needed to calculate dnorm() asymptotic distribution 

mu_ln_hat = exp(mu_hat + (0.5* sigma_sq_hat))
sigma_sq_hat_ln= exp(2*mu_hat + sigma_sq_hat) * (exp(sigma_sq_hat) - 1) #sigma^2 ln hat value --> predicted MLE value for variation 


asympt_dis_se=sqrt(sigma_sq_hat_ln)/sqrt(n) #standard error for asymptotic sampling distribution 


x_axis = seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 200) #defining range for distribution most valuable when being graphed

#Make theoretical density of asymptotic distribution 

asymptotic_dis_theory = dnorm(x_axis, mean = mu_ln_hat, sd = asympt_dis_se)

#asymptotic_dis_theory = dnorm(wellsize, mean= mu_ln_hat, sd=asympt_dis_se ) #sd --> enter standard error asympt_dis_se

```



```{r}


kde = density(boot_mu_ln_mle, kernel = "gaussian")  #density(boot_mu_ln_mle, bw = sd_ml_MLE)

hist(boot_mu_ln_mle,                     # data used for histogram
     breaks = 14,                            # specify number of vertical bars
     probability = TRUE,
       xlab = "Bootstrap sample means",      # change the label of x-axis
      # add a title to the histogram
        main="Bootstrap Sampling Distribution, Kernel and Asymptotic Density \n of Sample Means",
          cex.main = 0.9,
       col.main = "navy")  
lines(density(boot_mu_ln_mle, kernel = "gaussian"), col= "skyblue", lwd=2)

lines(x_axis, asymptotic_dis_theory, col = "blue", lwd = 2, lty = 4)

x_values <- seq(min(boot_mu_ln_mle), max(boot_mu_ln_mle), length = 100) #defining graph range bootstrap
y_values <- dnorm(x_values, mean = mean(boot_mu_ln_mle), sd = sd(boot_mu_ln_mle))
lines(x_values, y_values, col = "red", lty = 2, lwd = 2) # Theoretical Normal
legend("topright", legend = c("Bootstrap Normal KDE (Actual)", "Bootstrap Normal (Theory)", "Asymptotic CLT"), 
       col = c("skyblue", "red", "blue"), lty = c(1, 2), lwd = 2)  #col = c("skyblue", "blue"), lty = c(1, 2), lwd = 2)



```


## Question 3.5 

(5). Write a short summary to compare the two sampling distributions of the sample mean of the lognormal distribution.

The description below refers to the behavior of the kernel density estimator (KDE) bootstrap exclusively when referring to the bootstrap distrbution.

The two major sampling distributions used above for the sample mean included bootstrap sampling distribution for $\mu_{ln}$ and asymptotic distribution. Both distributions are used to estimate the behavior of a distribution and estimations for population parameters.


The bootstrapping distribution is an empirical estimation of the $\mu_{LN}$ parameter by taking repeated random samples from the original well sample and repeatedly calculating a mean from each sample. All of the bootstrapped samples are then meaned to reach an $\mu$ estimator. This is a simulated approach to the MLE $\hat\mu_{LN}$. As the bootstrap resampling method is an indirect way to calculate and estimator, as in the collected distribution is not assessed soley within the single collected distribution but the estimator is calculated over the estimator over of collection of resamples from the originally collected distribution. Repeatably resampling and calculating random bootstrap variables to ultimately calculate the random variable estimator of interest, $\hat\mu_{LN}$.

We see the relationship in the Figure above (last figure in question 3.4) the bootstrap gaussian kernel density has a better fit to the maxima and minimums of the distribution, while the asymptotic distribution under performs in assuming the shape for the most occurring density at $\hat\mu_{LN}$ and has fatter tails than the bootstrap KDE, suggesting a weaker fit than the KDE bootstrap.

The more the bootstrap is sampled, it can be assumed the bootstrap estimator reaches closer to the true estimator. In examining the differences between a kernel and asymptotic distribution, the bootstrap kernel density best suits the empirical data rather than assuming normality with a large sample size to met an approximation for the asymptotic distribution. As the best suited normal distribution is assigned to the asymptotic distribution in its parametric approach in approximation rather than fitting for the data empirically. 

In reflection for this lognormal distribution specifically, Both distributions are similar in shape showing a natural order pattern in the data set, in which the distributions meets their respective conditions. Additionally, both distributions show a similar value towards the $\hat\mu_{LN}$, the bootstrap being `r final_mu_ln_MLE` and the asymptotic distribution being `r mu_ln_hat`.

The bootstrapped value for $\hat\mu_{LN}$ is `r final_mu_ln_MLE` which is very close to the asymptotic mean at`r mu_ln_hat`. Our two distributions capture the predicted $\hat\mu_{LN}$ similarly suggesting some rigor in both tests. If a decision needs to be made about which distribution best suits the model, the bootstrap KDE resamping offers a stronger fit to the histogram's raw or natural distribution suggesting the MLE parameters derived from this distribution may have more rigor than their asymptotic distribution counterparts, although again, both distributions follow very similar patterns and information. Any selection made between the two distributions, should not vary between each other very much by eye and graph analysis.

# 4 Bonus Question

Derive the Hessian matrix by computing the second-order partial derivatives of the log-likelihood function
(with respect to μ and σ2). Then, translate the derived Hessian into an R function that takes the MLEs of the parameters and the data as inputs and returns the Hessian matrix. Finally, compare your analytically derived Hessian with the numerical Hessian returned by optim().


```{r}

data= wellsize #log(wellsize)
log_like_wells_hess <- function(params, data) 
  {
  mu = params[1]       # shape parameter
  sigma_sq = params[2]  # scale parameter
  
  
  
n <- length(data)
  log_lik = (-n/2 * log(2*pi) - n/2 * log(sigma_sq) - sum(log(data)) - (1/(2*sigma_sq)) * sum((log(data) - mu)^2)) #we flip signs in the matrix process
  
return(log_lik)
}





log_likelihood_gradient = function(params, data) 
  {
  mu = params[1]
  sigma_sq = params[2]
  n = length(data)
#Gradient functions

  
# Taking the partial derivative with respect to mu
 d_mu <- (1/sigma_sq) * sum(log(data) - mu)
  
  # Partial derivative with respect to sigma_sq
  d_sigma_sq <- -n/(2 * sigma_sq) + (1/(2 * sigma_sq^2)) * sum((log(data) - mu)^2)
  
return(c(d_mu, d_sigma_sq))
}



analytical_hessian_func = function(params, data) 
  {
  mu = params[1]
  sigma_sq = params[2]
  n = length(data)
  
#Computing the second derivatives for mu and sigma^2
d2_mu <- -n / sigma_sq
  d2_sigma_sq <- n/(2 * sigma_sq^2) - (1/sigma_sq^3) * sum((log(data) - mu)^2)
  
  d_mu_d_sigma_sq <- -(1/sigma_sq^2) * sum(log(data) - mu)
  
# This analytically derived matrix   
hess_mat <- matrix(c(d2_mu, d_mu_d_sigma_sq, d_mu_d_sigma_sq, d2_sigma_sq),nrow = 2, byrow = TRUE)  

#return(c(d_mu, d_sigma_sq, d2_mu, d2_sigma_sq, d_mu_d_sigma_sq))  # Return log-likelihood  NOT the negative log-likelihood and gradients


  return(hess_mat)
}



# Finding Initial Values
## It is critical to select appropriate initial values to
## ensure fast convergence. In general, we can use whatever
## methods (such as MME) that are available to get appropriate
## initial values


# Using the MLE as the initial parameter estimates

initial_mu = mu_hat

initial_sigma_sq = sigma_sq_hat



#mle_optimization <- optim(par = c(initial_mu,initial_sigma_sq), fn = log_like_wells_hess, x = wellsize, hessian = TRUE)


# MLE using optim() with gradient



mle_result <- optim(
  par = c(mu_hat, sigma_sq_hat), #starting point of the predicted mu and sigma for the MLE
  fn = log_like_wells_hess,  
  gr = log_likelihood_gradient,
  data = wellsize,
  method = "L-BFGS-B",
  lower = c(-Inf, 1e-6),
  hessian = TRUE,
  control = list(fnscale = -1)
)


##
mle_result

#Comparing the hessian matrices derived analytically and optim()

numerical_hessian  <- mle_result$hessian
analytical_hessian <- analytical_hessian_func(mle_result$par, wellsize)

# Print for your midterm report
cat("--- Numerical Hessian from optim() ---\n")
print(numerical_hessian)

cat("\n--- Analytically Derived Hessian ---\n")
print(analytical_hessian)

```
Both show negative correlation among $\mu$ and $\sigma^2$ and present similar values. Both show similarity in their sampling distribution. 
 