# read in data
data = rbind.data.frame(5.2, 7.8, 9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8, 18.4, 18.9,
19.3, 19.7, 20.2, 20.6, 21.0, 21.5, 21.9, 22.3, 22.7, 23.1, 23.5, 23.9, 24.3, 24.7,
25.1, 25.5, 25.9, 26.3, 26.7, 27.1, 27.5, 27.9, 28.3, 28.7, 29.1, 29.5, 29.9, 30.3,
30.7, 31.1, 31.5, 31.9, 32.3, 32.7, 33.1, 33.5, 33.9, 34.3, 34.7, 35.1, 35.5, 35.9,
36.3, 36.7, 37.1, 37.5, 37.9, 38.3, 38.7, 39.1, 39.5, 39.9, 40.3, 40.7, 41.1, 41.5,
41.9, 42.3, 42.7, 43.1, 43.5)
colnames(data) = "time"
# time is in months
Intro and Setup
The dataset at our disposal for this analyis is made up of 75
recorded failure times (measured in months) for wind turbine gearbox
components. Summary statistics as well as a histogram depiction of our
sample data can be seen below.
sum_stats_table = function(variable){
table = cbind(
round(min(variable),2),
round(quantile(variable, 0.25),2),
round(median(variable),2),
round(mean(variable),2),
round(quantile(variable, 0.75),2),
round(max(variable),2)
)
colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
rownames(table) = NULL
return(table)
}
Summary_Table = sum_stats_table(variable = data$time)
kable(Summary_Table,align = "c",
caption = "<span style='color:##000000;'>
Summary Stats of Survival Times (Months)</span>") %>%
kable_styling(
bootstrap_options = c("striped", "bordered"),
full_width = FALSE,
position = "center")
Summary Stats of Survival Times
(Months)
|
Min
|
Q1
|
Med
|
Mean
|
Q3
|
Max
|
|
5.2
|
21.25
|
28.7
|
28.19
|
36.1
|
43.5
|
ggplot(data = data) +
geom_histogram(mapping = aes(x = time, y = after_stat(density)),
color = "black",
# argument color in geom_histogram is for color of bin lines
fill = "lightblue",
bins = 10,) +
labs(title = "Distribution of Survival Times",
x = "Time (Months)",
y = "Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
) # this is the equivalent of box(lwd = 3) in base R

The engineering team at a wind energy company is unsure whether to
label and treat the broader population distribution of fail times as
exponentially or Weibull distributed. The probability density of an
exponential distribution is simply a more simplified version of the
Weibull’s. It is the value of our shape parameter, \(\beta\), that determines which of these
distribution’s we will label as the framework for predictive analysis on
gearbox fail times moving forward.
For the purposes of today’s analysis, we will treat the null
hypothesis that said population is exponentially distributed (AKA \(\beta = 1\)), and the alternative
hypothesis that a Weibull distribution more accurately represents the
broader population of gearbox fail times (AKA \(\beta \neq 1\)).
\[\begin{align}
H_0&: \beta = 1 \quad \text{Takeaway: Exponential} \\
H_a&: \beta \neq 1 \quad \text{Takeaway: Weibull}
\end{align}\]
In order to assess the validity of these opposing hypotheses, we must
first quantify an estimate for our parameter \(\beta\) under the assumption of each
scenario (Weibull distribution vs exponential distribution). In the
following sections, I will use maximum likelihood estimation (MLE) to do
so.
a.) MLE of Weibull
Operating under the assumption that the population of gearbox fail
times is Weibull distributed, we can use MLE to
simultaneously generate estimates for the shape parameter, \(\beta\), and the the scale parameter, \(\lambda\). We’ll denote these estimates as
\(\hat{\beta}\) and \(\hat{\lambda}\) respectively.
Parameter calculations were performed with R’s built-in optim() function. For mathematical reference,
the log-likelihood function for the Weibull distribution as well as the
partial derivatives of that very function with respect to both \(\beta\) and \(\lambda\) are below.
\[\Large \ell(\beta,\lambda|t) =
[n \times \ln(\beta)] -
[n \times \beta \times \ln(\lambda)]
+ [(\beta - 1) \times \sum\ln(t_i)]
- \sum\left(\frac{(t_i)}{\lambda}\right)^\beta \]
\[\Large \frac{\partial f}{\partial \beta}
=
\frac{n}{\beta} -
n \times \ln(\lambda)
+ \sum\ln(t_i) -
\sum\left(\ln\frac{t_i}{\lambda}\right) \times
\left(\frac{t_i}{\lambda}\right)^\beta \]
\[\Large \frac{\partial f}{\partial
\lambda} = \frac{\beta}{\lambda} \times
\sum\left(\frac{t_i}{\lambda}\right)^\beta - n \]
## MLE Weibull Estimation
# code is largely copied from HW 9 assignment, which took formulas and procedure from following:
# Week 11, Section 5.2
# Week 7, Section 5
#### define likelihood function
## *negative* log likelihood function for Weibull distribution
# by negative log likelihood, this function....
log_likelihood = function(params, data) {
beta = params[1] # shape
lambda = params[2] # scale
# Ensure parameters are positive
if (beta <= 0 || lambda <= 0) {
return(1e10) # Return large value for invalid parameters
}
n = length(data)
log_lik = n * log(beta) - n * beta * log(lambda) +
(beta - 1) * sum(log(data)) - sum((data / lambda)^beta)
return(log_lik) # Return log-likelihood! NOT the negative log-likelihood
}
#### define score equations (partial derivatives)
log_likelihood_gradient = function(params, data) {
beta = params[1] # shape parameter
lambda = params[2] # scale parameter
n = length(data)
partial_deriv_beta = n/beta - n * log(lambda) + sum(log(data)) -
sum(log(data/lambda) * (data/lambda)^beta)
partial_deriv_lambda = (beta/lambda)*(sum((data/lambda)^beta) -n)
return(c(partial_deriv_beta,
partial_deriv_lambda)) # Gradient of log-likelihood, NOT the negative scores
}
#### use log likelihood and score equations in optim function
# need to have good initial guesses for beta and lambda for the optim function
beta_guess = (sd(data$time)/mean(data$time))^(-1.086)
lambda_guess = mean(data$time)/gamma(1 + 1/beta_guess)
weib_mle_estimates = optim(
par = c(beta_guess, lambda_guess), # par argument is initial "guesses"
data = data$time,
fn = log_likelihood,
gr = log_likelihood_gradient,
method = "L-BFGS-B",
hessian = TRUE,
# will use the Hessian matrix later on
control = list(maxit = 1000,
fnscale = -1,
trace = FALSE,
abstol = 1e-8)
)
#weib_mle_estimates
weib_mle_beta = weib_mle_estimates$par[1] # shape
weib_mle_lambda = weib_mle_estimates$par[2] # scale
Post-MLE, we have estimates of about 3.371 and 31.418 for \(\beta\) and \(\lambda\) respectively. To further ensure
these estimates seem reasonable, below I overlaid a Weibull distribution
with said parameter values atop our available sample data.
ggplot(data = data) +
geom_histogram(mapping = aes(x = time, y = after_stat(density)),
color = "black",
# argument color in geom_histogram is for color of bin lines
fill = "lightblue",
bins = 10,) +
# use stat_function in ggplot to show density curves of other distributions
stat_function(fun = dweibull,
args = list(shape = weib_mle_beta, scale = weib_mle_lambda),
color = "black",
linewidth = 1.5,
xlim = range(0:50))+
# xlim argument so curve doesn't awkardly cut out at edge points of data
labs(title = "Distribution of Gearbox Survival Times (Weibull)",
x = "Time (Months)",
y = "Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3),
legend.position = c(0.85, 0.85)
)

As we can see in our histogram above, there is certainly a
respectable amount of overlap between the distributions of our sample
data and a theoretical Weibull density curve with parameters of \(\beta \approx 3.371\) and \(\lambda \approx 31.418\). This leads us to
believe that if we are to label the broader population of
gearbox survival times as Weibull distributed, then the \(\beta\) and \(\lambda\) estimates prescribed via MLE are
quality starting population estimators.
Now for comparison’s sake, I will repeat the MLE and visualization
procedure, but this time operating under the assumption that the gearbox
survival times are generally exponentially
distributed.
b.) MLE of Exponential
As alluded to before, the exponential distribution can be
simplistically described as a less complicated “version” of the Weibull
distribution. Instead of employing two active parameters, the
exponential distribution’s only parameter is \(\lambda\), which we will now estimate via
MLE. Log-likelihood functions and partial derivatives are written
below.
\[\Large \ell(\lambda|t) = -[n \times
\ln(\lambda)] - \frac{\sum(t_i)}{\lambda} \]
\[\Large \frac{\partial f}{\partial
\lambda} = \frac{n}{\lambda} +
\frac{\sum(t_i)}{\lambda^2} \]
## Exponential MLE Distribution
# once again code largely brought over from HW 9 assignment
# exponential log likelihood follows same pattern as Weibull log likelihood but a MUCH simpler distribution
exp_log_likelihood = function(lambda, data) {
if (lambda <= 0) return (1e10)
n = length(data)
log_lik = -n * log(lambda) - sum(data)/lambda
return(log_lik)
}
### now define partial derivative w/ respect to lambda (the score function or gradient)
# performed via quick hand calculation
exp_gradient = function(lambda, data) {
if (lambda <= 0) return (NA)
n = length(data)
partial_deriv_lambda = (n/lambda) + (sum(data)/(lambda^2))
return(partial_deriv_lambda)
}
exp_mle_estimates = optim(
par = 1/mean(data$time),
# the initial guess can simply be 1/(mean of the data) since we are performing this analysis under the theoretical that it is an EXPONENTIAL distribution. The mean of an exponential distribution is 1 divided by the the rate parameter lambda
data = data$time,
fn = exp_log_likelihood,
gr = exp_gradient,
method = "L-BFGS-B",
hessian = TRUE,
# will use the Hessian matrix later on
control = list(maxit = 1000,
fnscale = -1,
trace = FALSE,
abstol = 1e-8)
)
#exp_mle_estimates$par
exp_mle_lambda = exp_mle_estimates$par
Per MLE calculations, we can assert that the \(\lambda\) parameter of the distribution of
gearbox survival times, if said distribution was truly exponential, is
estimated to be roughly 28.377. An exponential
distribution with \(\lambda =
28.377\) is overlaid atop the sample data and available
below.
ggplot(data = data) +
geom_histogram(mapping = aes(x = time, y = after_stat(density)),
color = "black",
# argument color in geom_histogram is for color of bin lines
fill = "lightblue",
bins = 10,) +
# use stat_function in ggplot to show density curves of other distributions
stat_function(fun = dexp,
args = list(rate = 1/exp_mle_lambda),
color = "red",
linewidth = 1.5,
xlim = range(0:50)
)+
# xlim argument so curve doesn't awkardly cut out at edge points of data
labs(title = "Distribution of Survival Times (Exp)",
x = "Time (Months)",
y = "Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3),
legend.position = c(0.85, 0.85)
)

Comparing this histogram to the one from part a, we see a drastic
decline in distribution resemblance between the sample data and the
theoretical density curve, when working under the assumption that the
population of gearbox survival times is exponentially distributed. We
will keep this in mind moving forward, as we run hypothesis tests to
quantify the differences between the two distribution assumptions.
c.) Likelihood Ratio Test
To further investigate the difference in each distribution’s fit with
regards to our population of interest (gearbox survival times), we will
perform a likelihood ratio test (LRT). The LRT takes into account the
log-likelihood function evaluated with parameter values estimated from a
simpler model (the exponential in this case) and the log-likelihood
function evaluated with parameter values estimated from a more
complicated model (the Weibull in this scenario). The null hypothesis of
such a test is that, per our sample data, there is no
significant difference between the two model fits.
# weib_mle_beta and weib_mle_lambda are UNRESTRICTED MLE VALUES, but now we must find
# null hypothesis is that beta = 1
# if null hypothesis is true, then distribution is really just exponential
## the likelihood ratio test is just 2 *
# [log likelihood evaluated in unrestricted way] -
# [log likelihood evaluated in restricted or simple way]
# need to evaluate each distribution's likelihood function at the previously estimated parameters of beta and/or lambda
log_evaluation_weib = log_likelihood(params = c(weib_mle_beta, weib_mle_lambda),
data = data)
log_evaluation_exp = exp_log_likelihood(lambda = exp_mle_lambda,
data = data)
# test stat for likelihood ratio test is uppercase Lambda
Lambda = 2 * abs(log_evaluation_weib - log_evaluation_exp)
p_value_likelihoodratio = 1 - pchisq(Lambda, df = 1)
# test stat distribution for likelihood ratio test is chi-square distributed
# df = number of parameters we are estimating in function
Our LRT returned an extremely high test value (\(\Lambda\)) of about 1145.444. Considering
\(\Lambda\) follows a chi-square
distribution, this test stat coincides with a p value of approximately
zero. With these findings, we must reject the null
hypothesis as there is very convincing evidence that the Weibull
distribution more accurately represents the fit of our population of
interest than the exponential distribution does. After examining the
histograms in parts a and b, this finding is not surprising.
d.) Bootstrap Likelihood Ratio Test
Lastly, I repeated this same type of procedure, likelihood ratio
testing, and applied it to a bootstrapped scenario. Under the guise of
the bootstrap LRT, we take repeated samples of a theoretical
distribution, in this case an exponential distribution with \(\lambda \approx\) 28.377. Then, for each
sample, a test statistic of \(\Lambda\)
is found using that particular bootstrap sample’s log-likelihood
evaluation along with the log-likelihood evaluation of our original and
more complex/unrestricted model (the Weibull in this scenario).
### Bootstrap Likelihood Ratio Test (BLRT) for short
# reference code section 4.3 from Week 12 notes, need to change formulas since we are applying to exponential distribution, not normal
bootstrap_lrt_test = function(data, b, rate_null, obs_lr_stat){
# obs_lr_stat will = Lambda = likelihood ratio test stat taken from observed data calculations
n = length(data)
boot_lr_stat = numeric(b)
# Log-likelihood functions are found in earlier code chunks and will be called to within the optim functions of the for loop process
# START FOR LOOP
for (i in 1:b){
# Generating artificial data meant to replicate the characteristics of our NULL hypothesis
# In this case, our null hypothesis is that the data fits an exponential distribution with rate parameter lambda
boot_exp_sample = rexp(n = n, rate = rate_null)
####### Weibull likelihood per boot sample
boot_beta_guess = (sd(boot_exp_sample)/mean(boot_exp_sample))^(-1.086)
boot_lambda_guess = mean(boot_exp_sample)/gamma(1 + 1/boot_beta_guess)
boot_log_weib = optim(
par = c(boot_beta_guess, boot_lambda_guess), # par argument is initial "guesses"
data = boot_exp_sample,
fn = log_likelihood,
gr = log_likelihood_gradient,
method = "L-BFGS-B",
lower = c(1e-8, 1e-8), ## this limiting lower argument is NECESSARY, before I added it I was getting an error line for "non-finite value supplied by optim"
hessian = TRUE,
control = list(maxit = 1000,
fnscale = -1,
trace = FALSE,
abstol = 1e-8)
)
boot_log_weib = boot_log_weib$value # store log likelihood evaluation
####### End Weibull likelihood per boot sample
####### Exponential likelihood per boot sample
boot_log_exp = optim(
par = 1/mean(boot_exp_sample),
data = boot_exp_sample,
fn = exp_log_likelihood,
gr = exp_gradient,
method = "L-BFGS-B",
hessian = TRUE,
control = list(maxit = 1000,
fnscale = -1,
trace = FALSE,
abstol = 1e-8)
)
boot_log_exp = boot_log_exp$value # store log likelihood evaluation
####### End Exponential likelihood per boot sample
# the likelihood ratio for EACH instance of the bootstrap is 2 times the difference between the likelihood ratio evaluated under unrestricted (Weibull) circumstances and evaluated under simpler (exponential) circumstances
boot_lr_stat[i] = 2 * abs(boot_log_weib - boot_log_exp)
} # END FOR LOOP
# calculate and output final p value
p = (sum(boot_lr_stat >= obs_lr_stat) + 1)/ (b + 1)
### FINAL p value of bootstrap exercise is equal to..
# number of bootstrap sample LR test stats greater than or equal to the original LR test stat found in section c
return(list(p_value_bootstrap = p,
bootstrap_lr_stats = boot_lr_stat))
} #### END BOOTSTRAP FUNCTION
## now running the function
set.seed(123)
BLRT_test = bootstrap_lrt_test(data = data$time,
b = 1000,
rate_null = exp_mle_lambda,
obs_lr_stat = Lambda)
# BLRT_test$p_value_bootstrap = 0.000999001
Using a repeated sample size of 1,000, there once again appears to be
significant evidence that an assumption of exponential distribution
would result in a much less accurate resemblance of our population of
gearbox survival times than an assumption of Weibull distribution would.
Our bootstrap LRT yielded a p-value of only 0.001.
e.) Summary and Conclusions
To summarize, all of the analysis up to this point supports the
notion of moving forward under the assumption that the broader
population of gearbox survival times is Weibull
distributed with a shape parameter (\(\beta\)) of about 3.371 and scale parameter
(\(\lambda\)) of roughly 31.418. Both
the standard and bootstrap iterations of the LRT show a significant
decline in distribution resemblance when assuming an exponential
distribution. Furthermore, we could see in preliminary visual analysis
via the histograms in sections a and b, that a Weibull distribution
density curve with said paremeters was much more aligned with
our sample distribution than that of the exponential curve with \(\beta\) value of approximately 28.377.
---
title: "Bootstrap Likelihood Ratio Test"
author: "Chris Bahm"
date: "April 18, 2026"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: false
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

if (!require(tidyverse)) {library(tidyvserse)} 

if (!require(GGally)) {library(GGally)} 

if (!require(kableExtra)) {library(kableExtra)} 

if (!require(ggplot2)) {library(ggplot2)} 

if (!require(car)) {library(car)} 

if (!require(dplyr)) {library(dplyr)} 

if (!require(pander)) {library(pander)} 

if (!require(forecast)) {library(forecast)} 

if (!require(lubridate)) {library(lubridate)} 

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE,
	fig.align = "center"
)
```

```{r}
# read in data
data = rbind.data.frame(5.2,  7.8,  9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8, 18.4, 18.9, 
  19.3, 19.7, 20.2, 20.6, 21.0, 21.5, 21.9, 22.3, 22.7, 23.1, 23.5, 23.9, 24.3, 24.7, 
  25.1, 25.5, 25.9, 26.3, 26.7, 27.1, 27.5, 27.9, 28.3, 28.7, 29.1, 29.5, 29.9, 30.3, 
  30.7, 31.1, 31.5, 31.9, 32.3, 32.7, 33.1, 33.5, 33.9, 34.3, 34.7, 35.1, 35.5, 35.9, 
  36.3, 36.7, 37.1, 37.5, 37.9, 38.3, 38.7, 39.1, 39.5, 39.9, 40.3, 40.7, 41.1, 41.5,
  41.9, 42.3, 42.7, 43.1, 43.5)
colnames(data) = "time"
  # time is in months
```

# Intro and Setup

The dataset at our disposal for this analyis is made up of `r nrow(data)` recorded failure times (measured in months) for wind turbine gearbox components. Summary statistics as well as a histogram depiction of our sample data can be seen below.

```{r}
sum_stats_table = function(variable){
  table = cbind(
    round(min(variable),2),
    round(quantile(variable, 0.25),2),
    round(median(variable),2),
    round(mean(variable),2),
    round(quantile(variable, 0.75),2),
    round(max(variable),2)
  )
  colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
  rownames(table) = NULL
  return(table)
}

Summary_Table = sum_stats_table(variable = data$time)
kable(Summary_Table,align = "c",
      caption = "<span style='color:##000000;'>
      Summary Stats of Survival Times (Months)</span>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

ggplot(data = data) +
  geom_histogram(mapping = aes(x = time, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  labs(title = "Distribution of Survival Times",
       x = "Time (Months)",
       y = "Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) # this is the equivalent of box(lwd = 3) in base R
```


The engineering team at a wind energy company is unsure whether to label and treat the broader population distribution of fail times as exponentially or Weibull distributed. The probability density of an exponential distribution is simply a more simplified version of the Weibull's. It is the value of our shape parameter, $\beta$, that determines which of these distribution's we will label as the framework for predictive analysis on gearbox fail times moving forward.

For the purposes of today's analysis, we will treat the null hypothesis that said population is exponentially distributed (AKA $\beta = 1$), and the alternative hypothesis that a Weibull distribution more accurately represents the broader population of gearbox fail times (AKA $\beta \neq 1$).


\begin{align}
H_0&: \beta = 1 \quad \text{Takeaway: Exponential} \\
H_a&: \beta \neq 1 \quad \text{Takeaway: Weibull}
\end{align}

In order to assess the validity of these opposing hypotheses, we must first quantify an estimate for our parameter $\beta$ under the assumption of each scenario (Weibull distribution vs exponential distribution). In the following sections, I will use maximum likelihood estimation (MLE) to do so.


# a.) MLE of Weibull
Operating under the assumption that the population of gearbox fail times is **Weibull distributed**, we can use MLE to simultaneously generate estimates for the shape parameter, $\beta$, and the the scale parameter, $\lambda$. We'll denote these estimates as $\hat{\beta}$ and $\hat{\lambda}$ respectively.

Parameter calculations were performed with R's built-in <span style="color:blue;">optim()</span> function. For mathematical reference, the log-likelihood function for the Weibull distribution as well as the partial derivatives of that very function with respect to both $\beta$ and $\lambda$ are below.

$$\Large \ell(\beta,\lambda|t) =
[n \times \ln(\beta)] - 
[n \times \beta \times \ln(\lambda)]
+ [(\beta - 1) \times \sum\ln(t_i)]
- \sum\left(\frac{(t_i)}{\lambda}\right)^\beta $$

<br>

$$\Large \frac{\partial f}{\partial \beta} =
\frac{n}{\beta} -
n \times \ln(\lambda)
+ \sum\ln(t_i) -
\sum\left(\ln\frac{t_i}{\lambda}\right) \times \left(\frac{t_i}{\lambda}\right)^\beta    $$

<br>

$$\Large \frac{\partial f}{\partial \lambda} =  \frac{\beta}{\lambda} \times \sum\left(\frac{t_i}{\lambda}\right)^\beta - n    $$
<br>

```{r}
## MLE Weibull Estimation
  # code is largely copied from HW 9 assignment, which took formulas and procedure from following:
    # Week 11, Section 5.2 
    # Week 7, Section 5

#### define likelihood function
## *negative* log likelihood function for Weibull distribution
  # by negative log likelihood, this function....
log_likelihood = function(params, data) {
  beta = params[1]  # shape
  lambda = params[2]  # scale 
  
  # Ensure parameters are positive
  if (beta <= 0 || lambda <= 0) {
    return(1e10)  # Return large value for invalid parameters
  }
  
  n = length(data)
  log_lik = n * log(beta) - n * beta * log(lambda) + 
    (beta - 1) * sum(log(data)) - sum((data / lambda)^beta)
  
  return(log_lik)  # Return log-likelihood! NOT the negative log-likelihood
}

#### define score equations (partial derivatives)
log_likelihood_gradient = function(params, data) {
  beta = params[1]       # shape parameter
  lambda = params[2]  # scale parameter
  
  n = length(data)
  
  partial_deriv_beta = n/beta - n * log(lambda) + sum(log(data)) - 
        sum(log(data/lambda) * (data/lambda)^beta)
  
  partial_deriv_lambda =  (beta/lambda)*(sum((data/lambda)^beta) -n)
  
  return(c(partial_deriv_beta,
           partial_deriv_lambda))  # Gradient of log-likelihood, NOT the negative scores
}

#### use log likelihood and score equations in optim function

# need to have good initial guesses for beta and lambda for the optim function
beta_guess = (sd(data$time)/mean(data$time))^(-1.086)

lambda_guess = mean(data$time)/gamma(1 + 1/beta_guess)


weib_mle_estimates = optim(
  par = c(beta_guess, lambda_guess), # par argument is initial "guesses"
  data = data$time,
  fn = log_likelihood,
  gr = log_likelihood_gradient,
  method = "L-BFGS-B",
  hessian = TRUE,
    # will use the Hessian matrix later on
  control = list(maxit = 1000, 
                 fnscale  = -1,
                 trace = FALSE,
                 abstol = 1e-8)
)
#weib_mle_estimates
  
weib_mle_beta = weib_mle_estimates$par[1] # shape
weib_mle_lambda = weib_mle_estimates$par[2] # scale
```

Post-MLE, we have estimates of about `r round(weib_mle_beta,3)` and `r round(weib_mle_lambda,3)` for $\beta$ and $\lambda$ respectively. To further ensure these estimates seem reasonable, below I overlaid a Weibull distribution with said parameter values atop our available sample data.

```{r}
ggplot(data = data) +
  geom_histogram(mapping = aes(x = time, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  # use stat_function in ggplot to show density curves of other distributions
    stat_function(fun = dweibull, 
                args = list(shape = weib_mle_beta, scale = weib_mle_lambda),
                color = "black",
                linewidth = 1.5,
                xlim = range(0:50))+
            # xlim argument so curve doesn't awkardly cut out at edge points of data
  labs(title = "Distribution of Gearbox Survival Times (Weibull)",
       x = "Time (Months)",
       y = "Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3),
    legend.position = c(0.85, 0.85)
  )
```

As we can see in our histogram above, there is certainly a respectable amount of overlap between the distributions of our sample data and a theoretical Weibull density curve with parameters of $\beta \approx `r round(weib_mle_beta,3)`$ and $\lambda \approx `r round(weib_mle_lambda,3)`$. This leads us to believe that *if* we are to label the broader population of gearbox survival times as Weibull distributed, then the $\beta$ and $\lambda$ estimates prescribed via MLE are quality starting population estimators.

Now for comparison's sake, I will repeat the MLE and visualization procedure, but this time operating under the assumption that the gearbox survival times are generally **exponentially** distributed.

# b.) MLE of Exponential
As alluded to before, the exponential distribution can be simplistically described as a less complicated "version" of the Weibull distribution. Instead of employing two active parameters, the exponential distribution's only parameter is $\lambda$, which we will now estimate via MLE. Log-likelihood functions and partial derivatives are written below.

$$\Large \ell(\lambda|t) = -[n \times \ln(\lambda)] - \frac{\sum(t_i)}{\lambda}  $$

<br>

$$\Large \frac{\partial f}{\partial \lambda} = \frac{n}{\lambda} + \frac{\sum(t_i)}{\lambda^2}   $$

<br>
```{r}
## Exponential MLE Distribution
  # once again code largely brought over from HW 9 assignment

# exponential log likelihood follows same pattern as Weibull log likelihood but a MUCH simpler distribution
exp_log_likelihood = function(lambda, data) {
 if (lambda <= 0) return (1e10)
 n = length(data)
 log_lik = -n * log(lambda) - sum(data)/lambda
 return(log_lik)
}

### now define partial derivative w/ respect to lambda (the score function or gradient)
  # performed via quick hand calculation

exp_gradient = function(lambda, data) {
  if (lambda <= 0) return (NA)
  n = length(data)
  partial_deriv_lambda = (n/lambda) + (sum(data)/(lambda^2))
  return(partial_deriv_lambda)
}

exp_mle_estimates = optim(
  par = 1/mean(data$time),
    # the initial guess can simply be 1/(mean of the data) since we are performing this analysis under the theoretical that it is an EXPONENTIAL distribution. The mean of an exponential distribution is 1 divided by the the rate parameter lambda
  data = data$time,
  fn = exp_log_likelihood,
  gr = exp_gradient,
  method = "L-BFGS-B",
  hessian = TRUE,
    # will use the Hessian matrix later on
  control = list(maxit = 1000, 
                 fnscale  = -1,
                 trace = FALSE,
                 abstol = 1e-8)
)
  
#exp_mle_estimates$par

exp_mle_lambda = exp_mle_estimates$par
```

Per MLE calculations, we can assert that the $\lambda$ parameter of the distribution of gearbox survival times, if said distribution was truly exponential, is estimated to be roughly `r round(exp_mle_lambda,3)`. An <span style="color:red;">exponential distribution</span> with $\lambda = `r round(exp_mle_lambda,3)`$ is overlaid atop the sample data and available below.

```{r}
ggplot(data = data) +
  geom_histogram(mapping = aes(x = time, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  # use stat_function in ggplot to show density curves of other distributions
    stat_function(fun = dexp, 
                args = list(rate = 1/exp_mle_lambda),
                color = "red",
                linewidth = 1.5,
                xlim = range(0:50)
                )+
            # xlim argument so curve doesn't awkardly cut out at edge points of data
  labs(title = "Distribution of Survival Times (Exp)",
       x = "Time (Months)",
       y = "Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3),
    legend.position = c(0.85, 0.85)
  )

```

Comparing this histogram to the one from part a, we see a drastic decline in distribution resemblance between the sample data and the theoretical density curve, when working under the assumption that the population of gearbox survival times is exponentially distributed. We will keep this in mind moving forward, as we run hypothesis tests to quantify the differences between the two distribution assumptions.

# c.) Likelihood Ratio Test

To further investigate the difference in each distribution's fit with regards to our population of interest (gearbox survival times), we will perform a likelihood ratio test (LRT). The LRT takes into account the log-likelihood function evaluated with parameter values estimated from a simpler model (the exponential in this case) and the log-likelihood function evaluated with parameter values estimated from a more complicated model (the Weibull in this scenario). The null hypothesis of such a test is that, per our sample data, there is **no** significant difference between the two model fits.

```{r}
# weib_mle_beta and weib_mle_lambda are UNRESTRICTED MLE VALUES, but now we must find

# null hypothesis is that beta = 1
# if null hypothesis is true, then distribution is really just exponential

## the likelihood ratio test is just 2 * 
  # [log likelihood evaluated in unrestricted way] -
  # [log likelihood evaluated in restricted or simple way]

# need to evaluate each distribution's likelihood function at the previously estimated parameters of beta and/or lambda
log_evaluation_weib = log_likelihood(params = c(weib_mle_beta, weib_mle_lambda),
                                     data = data)
log_evaluation_exp = exp_log_likelihood(lambda = exp_mle_lambda,
                                        data = data)

# test stat for likelihood ratio test is uppercase Lambda

Lambda = 2 * abs(log_evaluation_weib - log_evaluation_exp)

p_value_likelihoodratio = 1 - pchisq(Lambda, df = 1)
  # test stat distribution for likelihood ratio test is chi-square distributed
  # df = number of parameters we are estimating in function
```

Our LRT returned an extremely high test value ($\Lambda$) of about `r round(Lambda,3)`. Considering $\Lambda$ follows a chi-square distribution, this test stat coincides with a p value of approximately zero. With these findings, we must **reject** the null hypothesis as there is very convincing evidence that the Weibull distribution more accurately represents the fit of our population of interest than the exponential distribution does. After examining the histograms in parts a and b, this finding is not surprising.

# d.) Bootstrap Likelihood Ratio Test 
Lastly, I repeated this same type of procedure, likelihood ratio testing, and applied it to a bootstrapped scenario. Under the guise of the bootstrap LRT, we take repeated samples of a theoretical distribution, in this case an exponential distribution with $\lambda \approx$ `r round(exp_mle_lambda,3)`. Then, for each sample, a test statistic of $\Lambda$ is found using that particular bootstrap sample's log-likelihood evaluation along with the log-likelihood evaluation of our original and more complex/unrestricted model (the Weibull in this scenario).

```{r, warning=FALSE}
### Bootstrap Likelihood Ratio Test (BLRT) for short
# reference code section 4.3 from Week 12 notes, need to change formulas since we are applying to exponential distribution, not normal

bootstrap_lrt_test = function(data, b, rate_null, obs_lr_stat){
    # obs_lr_stat will = Lambda = likelihood ratio test stat taken from observed data calculations
  n = length(data)
  boot_lr_stat = numeric(b)
    
  # Log-likelihood functions are found in earlier code chunks and will be called to within the optim functions of the for loop process
    
  # START FOR LOOP
    for (i in 1:b){
      
      # Generating artificial data meant to replicate the characteristics of our NULL hypothesis
      # In this case, our null hypothesis is that the data fits an exponential distribution with rate parameter lambda
      boot_exp_sample = rexp(n = n, rate = rate_null) 
  
      ####### Weibull likelihood per boot sample    
      boot_beta_guess = (sd(boot_exp_sample)/mean(boot_exp_sample))^(-1.086)

      boot_lambda_guess = mean(boot_exp_sample)/gamma(1 + 1/boot_beta_guess)
      
      boot_log_weib = optim(
          par = c(boot_beta_guess, boot_lambda_guess), # par argument is initial "guesses"
          data = boot_exp_sample,
          fn = log_likelihood,
          gr = log_likelihood_gradient,
          method = "L-BFGS-B",
          lower = c(1e-8, 1e-8), ## this limiting lower argument is NECESSARY, before I added it I was getting an error line for "non-finite value supplied by optim"
          hessian = TRUE,
          control = list(maxit = 1000, 
                         fnscale  = -1,
                         trace = FALSE,
                         abstol = 1e-8)
          )
      boot_log_weib = boot_log_weib$value # store log likelihood evaluation
      ####### End Weibull likelihood per boot sample 
  

      ####### Exponential likelihood per boot sample    
      boot_log_exp = optim(
          par = 1/mean(boot_exp_sample),
          data = boot_exp_sample,
          fn = exp_log_likelihood,
          gr = exp_gradient,
          method = "L-BFGS-B",
          hessian = TRUE,
          control = list(maxit = 1000, 
                         fnscale  = -1,
                         trace = FALSE,
                         abstol = 1e-8)
          )
      boot_log_exp = boot_log_exp$value  # store log likelihood evaluation
      ####### End Exponential likelihood per boot sample  
      
      
      # the likelihood ratio for EACH instance of the bootstrap is 2 times the difference between the likelihood ratio evaluated under unrestricted (Weibull) circumstances and evaluated under simpler (exponential) circumstances
      boot_lr_stat[i] = 2 * abs(boot_log_weib - boot_log_exp)
      

    } # END FOR LOOP
    
  # calculate and output final p value
  p = (sum(boot_lr_stat >= obs_lr_stat) + 1)/ (b + 1)
  
  ### FINAL p value of bootstrap exercise is equal to.. 
  # number of bootstrap sample LR test stats greater than or equal to the original LR test stat found in section c
  
  return(list(p_value_bootstrap = p,
              bootstrap_lr_stats = boot_lr_stat))
  
} #### END BOOTSTRAP FUNCTION


## now running the function
set.seed(123)
BLRT_test = bootstrap_lrt_test(data = data$time,
                               b = 1000, 
                               rate_null = exp_mle_lambda,
                               obs_lr_stat = Lambda)

# BLRT_test$p_value_bootstrap = 0.000999001
```

Using a repeated sample size of 1,000, there once again appears to be significant evidence that an assumption of exponential distribution would result in a much less accurate resemblance of our population of gearbox survival times than an assumption of Weibull distribution would. Our bootstrap LRT yielded a p-value of only `r round(BLRT_test$p_value_bootstrap, 4)`.

# e.) Summary and Conclusions
To summarize, all of the analysis up to this point supports the notion of moving forward under the assumption that the broader population of gearbox survival times is **Weibull distributed** with a shape parameter ($\beta$) of about `r round(weib_mle_beta,3)` and scale parameter ($\lambda$) of roughly `r round(weib_mle_lambda,3)`. Both the standard and bootstrap iterations of the LRT show a significant decline in distribution resemblance when assuming an exponential distribution. Furthermore, we could see in preliminary visual analysis via the histograms in sections a and b, that a Weibull distribution density curve with said paremeters was *much* more aligned with our sample distribution than that of the exponential curve with $\beta$ value of approximately `r round(exp_mle_lambda,3)`.


