---
title: "Assignment 12: Bootstrap Likelihood Ratio Test (BLRT)"
author: "Ezana Rivers"
date: "4-21"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:no-repeat;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}

body {background-color: #ffffff;
      color: #000000;
      font-family: Arial, sans-serif;
      font-size: 1rem;
      line-height: 1.6;
      }

.highlightme { background-color:yellow; }

p { background-color:white; }
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}

if (!require("VGAM")) {
  install.packages("VGAM")
  library(VGAM)
}
knitr::opts_chunk$set(echo = TRUE,        # include code chunks in the output file
                      warning = FALSE,    # your code may produce warning messages;
                                          # set to TRUE to include them in the output file
                      results = "markup", # include printed output in the output file
                                          # ("hide" would suppress it)
                      message = FALSE,
                      comment = NA
                      )  
```
 
 \
 
## **Assignment Objectives** 

<p>
* Enhance understanding of the bootstrap hypothesis testing procedure.

* Implement procedures for detecting overfitting/underfitting in practical applications using the bootstrap likelihood ratio test (BLRT).
</p>


## **Policies of Using AI Tools**

<p>
**Policy on AI Tool Use**: Please adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.
</p>

<p>
**Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.
</p>




## Testing Overfitting/Underfitting

In machine learning and statistics, overfitting occurs when a model is too complex and learns noise, leading to poor performance on new data. Underfitting occurs when a model is too simple to capture important patterns, resulting in high error overall. Both issues are described by the bias–variance tradeoff and can produce unreliable predictions in real-world applications.


The probability density function (PDF) of the Weibull distribution is:

$$
f(t; \lambda, \beta) = \frac{\beta}{\lambda} \left( \frac{t}{\lambda} \right)^{\beta-1} \exp\left[ -\left( \frac{t}{\lambda} \right)^\beta \right], \quad t \ge 0
$$
where $\lambda > 0$ is the scale parameter (characteristic life) and $\beta > 0$ is the shape parameter.

When $\beta = 1$, the Weibull PDF simplifies to the exponential PDF:

$$
f(t; \lambda) = \frac{1}{\lambda} \exp\left( -\frac{t}{\lambda} \right)
$$
with constant hazard rate $h(t) = 1/\lambda$.
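
As a quick numerical check of this special case, the chunk below evaluates R's `dweibull()` with `shape = 1` and `dexp()` with `rate = 1/lambda` on a grid of times. The scale value and the grid are arbitrary choices made only for the illustration.

```{r}
# Check that the Weibull density with beta = 1 equals the exponential density with rate 1/lambda
lambda_demo = 28                     # arbitrary scale value for the check
t_grid = seq(0, 100, by = 5)         # arbitrary grid of failure times (months)
f_weibull = dweibull(t_grid, shape = 1, scale = lambda_demo)
f_exp = dexp(t_grid, rate = 1 / lambda_demo)
all.equal(f_weibull, f_exp)          # TRUE: the two densities agree on the grid
```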


<p><font color = "darkred">**This assignment focuses on performing a hypothesis test for the shape parameter ($\beta$) of the Weibull distribution within a reliability model.**</font></p>


\begin{align}
H_0&: \beta = 1 \quad \text{(Exponential model, simpler)} \\
H_1&: \beta \neq 1 \quad \text{(Weibull model, more complex)}
\end{align}



## Steps of the BLRT


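* Fit models under $H_0$ and $H_1$ to the original data and compute $\Lambda_{\text{obs}} = -2[\ell_0 - \ell_1]$, where $\ell_0$ and $\ell_1$ are the maximized log-likelihoods under $H_0$ and $H_1$.

* Generate bootstrap samples under $H_0$: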
  + Estimate parameters under $H_0$ from the original data.
  + Generate $B$ datasets by sampling from the model under $H_0$ (parametric bootstrap) or by resampling residuals/cases (nonparametric bootstrap; parametric is common for BLRT).

* For each bootstrap sample $b = 1,\dots,B$:
  + Fit $H_0$ and $H_1$ models.
  + Compute $\Lambda_b = -2[\ell_{0,b} - \ell_{1,b}]$.

* Approximate p-value:

$$
  p = \frac{1}{B} \sum_{b=1}^B I(\Lambda_b \ge \Lambda_{\text{obs}})
$$
(Often a small adjustment is made for stability: $(1 + \#\{\Lambda_b \ge \Lambda_{\text{obs}}\})/(B+1)$).
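
The two p-value formulas above can be written as a small R helper. The function name `blrt_pvalue` and the toy statistics are hypothetical, chosen only to show the arithmetic; part d) computes the adjusted version inline.

```{r}
# Hypothetical helper: bootstrap p-values from B bootstrap LR statistics and the observed statistic
blrt_pvalue = function(lr_star, lr_obs) {
  exceed = sum(lr_star >= lr_obs)       # number of bootstrap statistics at least as large as observed
  B = length(lr_star)                   # number of bootstrap replicates
  c(plain = exceed / B,                 # (1/B) * sum of indicators
    adjusted = (1 + exceed) / (B + 1))  # adjusted version, never exactly 0
}

# Toy example: 2 of the 4 bootstrap statistics are >= 3
blrt_pvalue(lr_star = c(0.5, 1.2, 3.1, 4.0), lr_obs = 3)   # plain = 0.5, adjusted = 0.6
```

The adjusted form counts the observed statistic as one more draw from the null distribution, which is why its smallest possible value is $1/(B+1)$ rather than 0.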



\

## **Question: Reliability Application**

<p>
A wind energy company monitors the reliability of gearboxes in 75 identical wind turbines located in a coastal wind farm. The gearbox is a critical component; its failure often leads to costly downtime and repairs. Previous studies suggest that the hazard rate (failure risk) may increase over time due to mechanical wear (fatigue, pitting, bearing degradation). Engineers want to test whether the failure time distribution follows an exponential model (constant hazard, random failures) or a Weibull model with shape parameter $\beta>1$ (increasing hazard, indicative of aging/degradation). The failure times (in months) are:

```
   5.2,  7.8,  9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8, 18.4, 18.9, 
  19.3, 19.7, 20.2, 20.6, 21.0, 21.5, 21.9, 22.3, 22.7, 23.1, 23.5, 23.9, 24.3, 24.7, 
  25.1, 25.5, 25.9, 26.3, 26.7, 27.1, 27.5, 27.9, 28.3, 28.7, 29.1, 29.5, 29.9, 30.3, 
  30.7, 31.1, 31.5, 31.9, 32.3, 32.7, 33.1, 33.5, 33.9, 34.3, 34.7, 35.1, 35.5, 35.9, 
  36.3, 36.7, 37.1, 37.5, 37.9, 38.3, 38.7, 39.1, 39.5, 39.9, 40.3, 40.7, 41.1, 41.5,
  41.9, 42.3, 42.7, 43.1, 43.5
```
</p>

This assignment tests the hypothesis $H_0: \beta = 1$ (exponential) against $H_1: \beta \neq 1$ (Weibull). This framework can detect overfitting (adopting a Weibull when the exponential is true) and underfitting (adopting the exponential when a Weibull with $\beta \neq 1$ is true). 


<p>
a). Find the MLE of the Weibull parameters $\lambda$ (scale) and $\beta$ (shape), denoted by $\hat{\lambda}$ and $\hat{\beta}$, respectively, using the `optim()` procedure. [*Hint: You should provide explicit expressions for the log-likelihood and gradient functions of the Weibull distribution parameters.*]
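
For reference, taking logs of the Weibull PDF above and summing over the failure times $t_1,\dots,t_n$ gives the log-likelihood and gradient (score) below; these follow by direct differentiation.

$$
\ell(\beta,\lambda) = n\log\beta - n\beta\log\lambda + (\beta-1)\sum_{i=1}^{n}\log t_i - \sum_{i=1}^{n}\left(\frac{t_i}{\lambda}\right)^{\beta}
$$

$$
\frac{\partial \ell}{\partial \beta} = \frac{n}{\beta} - n\log\lambda + \sum_{i=1}^{n}\log t_i - \sum_{i=1}^{n}\left(\frac{t_i}{\lambda}\right)^{\beta}\log\left(\frac{t_i}{\lambda}\right),
\qquad
\frac{\partial \ell}{\partial \lambda} = -\frac{n\beta}{\lambda} + \frac{\beta}{\lambda}\sum_{i=1}^{n}\left(\frac{t_i}{\lambda}\right)^{\beta}
$$

Setting $\partial\ell/\partial\lambda = 0$ gives $\hat{\lambda} = \left(\frac{1}{n}\sum_{i=1}^{n} t_i^{\hat{\beta}}\right)^{1/\hat{\beta}}$, but $\partial\ell/\partial\beta = 0$ has no closed-form solution, which is why a numerical optimizer such as `optim()` is needed.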

```{r}

windfarm= sort(c(5.2,  7.8,  9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8, 18.4, 18.9, 
  19.3, 19.7, 20.2, 20.6, 21.0, 21.5, 21.9, 22.3, 22.7, 23.1, 23.5, 23.9, 24.3, 24.7, 
  25.1, 25.5, 25.9, 26.3, 26.7, 27.1, 27.5, 27.9, 28.3, 28.7, 29.1, 29.5, 29.9, 30.3, 
  30.7, 31.1, 31.5, 31.9, 32.3, 32.7, 33.1, 33.5, 33.9, 34.3, 34.7, 35.1, 35.5, 35.9, 
  36.3, 36.7, 37.1, 37.5, 37.9, 38.3, 38.7, 39.1, 39.5, 39.9, 40.3, 40.7, 41.1, 41.5,
  41.9, 42.3, 42.7, 43.1, 43.5))


n = length(windfarm)


```


```{r}
# Weibull log-likelihood l(beta, lambda); params[1] = beta (shape), params[2] = lambda (scale).
# dweibull(..., log = TRUE) returns log f(t_i; lambda, beta) for each failure time,
# so the sum is the log-likelihood of the whole sample.
weibull_loglik = function(params, data) {
  sum(dweibull(data,
               shape = params[1],  # beta, the shape parameter
               scale = params[2],  # lambda, the scale parameter
               log = TRUE))        # log-density instead of density
}

# Find the MLEs with optim() (default Nelder-Mead method).
# Start at beta = 1 (the exponential case) and lambda = sample mean.
mle_weibull = optim(par = c(1, mean(windfarm)),
                    fn = weibull_loglik,
                    data = windfarm,               # passed on to weibull_loglik()
                    control = list(fnscale = -1))  # optim() minimizes; fnscale = -1 makes it maximize

hat_beta = mle_weibull$par[1]    # MLE of the shape, beta-hat
hat_lambda = mle_weibull$par[2]  # MLE of the scale, lambda-hat
logL1_obs = mle_weibull$value    # maximized log-likelihood under H1 (Weibull)

cat("Weibull MLEs Shape (Beta):", hat_beta, "Scale (Lambda):", hat_lambda)

```
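
The chunk below sketches the same fit using the explicit log-likelihood and analytic gradient from the Weibull PDF, passed to `optim()` through its `gr` argument. The helper names `weibull_ll_explicit` and `weibull_grad` are hypothetical, and switching to the bounded `"L-BFGS-B"` method (so that $\beta$ and $\lambda$ stay positive) is an assumption of this sketch rather than part of the original solution. The estimates should agree with the Nelder-Mead results above up to optimizer tolerance.

```{r}
# Reuse `windfarm` from above; if this chunk is run on its own, rebuild the same 75 failure times
# (19 listed values followed by the evenly spaced run 21.5, 21.9, ..., 43.5).
if (!exists("windfarm")) {
  windfarm = c(5.2, 7.8, 9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8,
               18.4, 18.9, 19.3, 19.7, 20.2, 20.6, 21.0, seq(21.5, 43.5, by = 0.4))
}

# Explicit Weibull log-likelihood; params = c(beta, lambda)
weibull_ll_explicit = function(params, data) {
  beta = params[1]; lambda = params[2]; n = length(data)
  n * log(beta) - n * beta * log(lambda) +
    (beta - 1) * sum(log(data)) - sum((data / lambda)^beta)
}

# Analytic gradient (score vector) of the log-likelihood above
weibull_grad = function(params, data) {
  beta = params[1]; lambda = params[2]; n = length(data)
  z = (data / lambda)^beta   # (t_i / lambda)^beta
  d_beta = n / beta - n * log(lambda) + sum(log(data)) - sum(z * log(data / lambda))
  d_lambda = -n * beta / lambda + (beta / lambda) * sum(z)
  c(d_beta, d_lambda)
}

fit_grad = optim(par = c(1, mean(windfarm)),     # start at the exponential case
                 fn = weibull_ll_explicit,
                 gr = weibull_grad,              # analytic gradient instead of finite differences
                 data = windfarm,                # passed to both fn and gr
                 method = "L-BFGS-B",
                 lower = c(1e-3, 1e-3),          # keep beta and lambda positive
                 control = list(fnscale = -1))   # maximize

fit_grad$par     # c(beta-hat, lambda-hat)
fit_grad$value   # maximized Weibull log-likelihood
```

Because `control$fnscale` rescales both `fn` and `gr`, the gradient function returns the gradient of the log-likelihood itself, with no sign change.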




b). Find the MLE of the exponential parameter $\lambda$ (scale), denoted by $\hat{\lambda}$, using any procedure. [*Hint: You should provide explicit expressions for the log-likelihood and gradient functions of the exponential distribution parameters.*]
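
Summing the log of the exponential PDF above over the failure times $t_1,\dots,t_n$ gives the log-likelihood and its derivative below. Setting the derivative to zero yields a closed-form MLE, the sample mean, which any numerical fit in b) should reproduce.

$$
\ell(\lambda) = -n\log\lambda - \frac{1}{\lambda}\sum_{i=1}^{n} t_i,
\qquad
\frac{d\ell}{d\lambda} = -\frac{n}{\lambda} + \frac{1}{\lambda^{2}}\sum_{i=1}^{n} t_i
$$

$$
\frac{d\ell}{d\lambda} = 0 \;\Longrightarrow\; \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} t_i = \bar{t},
\qquad
\ell(\hat{\lambda}) = -n\left(\log\bar{t} + 1\right)
$$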

```{r}

# Exponential log-likelihood with scale lambda (rate = 1/lambda):
# l(lambda) = -n*log(lambda) - sum(t)/lambda
exp_loglik = function(lambda, data) {
  sum(dexp(data, rate = 1 / lambda, log = TRUE))  # sum of log-densities
}

# Find the MLE with optim(). For a single parameter the default Nelder-Mead method
# is unreliable (optim() warns), so use "Brent", which requires finite bounds.
mle_exp = optim(par = mean(windfarm),                     # starting value for lambda
                fn = exp_loglik,
                data = windfarm,
                method = "Brent",
                lower = 1e-6, upper = 10 * max(windfarm), # search interval for lambda
                control = list(fnscale = -1))             # maximize instead of minimize

hat_lambda_exp = mle_exp$par       # MLE of lambda; should match the closed form mean(windfarm)
loglike_exp_null = mle_exp$value   # maximized log-likelihood under H0 (exponential)

cat("Exponential MLE Scale (Lambda):", hat_lambda_exp, 
    "Max Log-Likelihood (Null):", loglike_exp_null)
```



c). Use a) and b) to perform the regular likelihood ratio $\chi^2$ test for $\beta = 1$ and report the p-value.


```{r}

# Observed likelihood ratio statistic: Lambda_obs = -2 [l0 - l1], where l0 is the maximized
# exponential (H0) log-likelihood and l1 the maximized Weibull (H1) log-likelihood.
LR_obs = -2 * (loglike_exp_null - logL1_obs)

# Under H0, Lambda_obs is approximately chi-square with 1 df (H1 adds one parameter, beta).
# lower.tail = FALSE computes the upper tail directly; 1 - pchisq() rounds to 0 for large statistics.
p_val_chi2 = pchisq(LR_obs, df = 1, lower.tail = FALSE)

cat("Likelihood Ratio Statistic (Observed):", LR_obs, 
    "\nChi-Square P-value:", p_val_chi2)

```
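
Why `lower.tail = FALSE` matters here: for a very large statistic, the chi-square upper-tail probability is far smaller than the double-precision spacing near 1, so `1 - pchisq()` collapses to exactly 0, while the direct upper-tail computation keeps a tiny but positive value. The statistic value 100 below is an arbitrary illustration.

```{r}
stat_demo = 100                                          # arbitrary large LR statistic
p_subtract = 1 - pchisq(stat_demo, df = 1)               # lower tail rounds to 1, so this is 0
p_upper = pchisq(stat_demo, df = 1, lower.tail = FALSE)  # upper tail computed directly, > 0
c(p_subtract = p_subtract, p_upper = p_upper)
```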




d). Use the BLRT algorithm to perform a bootstrap likelihood ratio test and report the bootstrap p-value. Note that you are expected to translate the BLRT algorithm into R code to perform the BLRT. [*Hint: The chi-square distribution should not be used in this part of the analysis.*]
**Algorithm**


```{r}


B = 9999   # number of bootstrap replicates

# Observed LR statistic from parts a) and b): Lambda_obs = -2 [l0 - l1].
# No chi-square distribution is used in this part; the bootstrap supplies the null distribution.
LR_obs = -2 * (loglike_exp_null - logL1_obs)

# Parametric bootstrap of the LR statistic under H0.
# data: observed failure times; hat_lambda_exp: exponential MLE from part b); B: number of replicates.
# weibull_loglik() is the log-likelihood function defined in part a).
boot_lrt = function(data, hat_lambda_exp, B = 9999) {
  n = length(data)
  LR_star = numeric(B)   # storage for the B bootstrap LR statistics

  for (b in 1:B) {
    # Draw a bootstrap sample of size n from the fitted null (exponential) model
    z_star = rexp(n, rate = 1 / hat_lambda_exp)

    # Fit H0: the exponential MLE of the scale is the sample mean, so no optimizer is needed
    loglike_null_boot = sum(dexp(z_star, rate = 1 / mean(z_star), log = TRUE))

    # Fit H1: maximize the Weibull log-likelihood, starting from the exponential case
    fit_star = optim(par = c(1, mean(z_star)), fn = weibull_loglik,
                     data = z_star, control = list(fnscale = -1))
    logL1_star = fit_star$value

    # Bootstrap replicate of the LR statistic
    LR_star[b] = -2 * (loglike_null_boot - logL1_star)
  }

  LR_star
}

set.seed(2024)   # arbitrary seed so the bootstrap results are reproducible
boot_results = boot_lrt(windfarm, hat_lambda_exp, B = B)

# Bootstrap p-value with the (1 + #{Lambda_b >= Lambda_obs}) / (B + 1) adjustment
p_val_boot = (sum(boot_results >= LR_obs) + 1) / (B + 1)

cat("Bootstrap P-Value:", p_val_boot)

```
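
One way to see what the bootstrap adds is to compare quantiles of the simulated null distribution of $\Lambda$ with those of $\chi^2_1$; close agreement would suggest the asymptotic approximation used in c) is adequate at $n = 75$. The helper `compare_null_quantiles` is hypothetical, and the chunk only uses `boot_results` if the bootstrap chunk above has already been run.

```{r}
# Hypothetical helper: bootstrap null quantiles of the LR statistic vs chi-square(1) quantiles
compare_null_quantiles = function(lr_star, probs = c(0.90, 0.95, 0.99)) {
  data.frame(prob = probs,
             bootstrap = as.numeric(quantile(lr_star, probs)),  # empirical quantiles of bootstrap LRs
             chisq_1 = qchisq(probs, df = 1))                   # asymptotic reference quantiles
}

# Only runs when the bootstrap statistics from part d) are available
if (exists("boot_results")) {
  compare_null_quantiles(boot_results)
}
```

If the bootstrap quantiles sit noticeably above the chi-square ones, the $\chi^2_1$ p-value in c) would be too small (anti-conservative), and vice versa; the BLRT p-value is not affected by this approximation.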


e). Write a summary of the above analyses to address the following:

* Whether the two tests generated the same results.

* Which model is recommended for the data.

</p>
 
 
Both tests reject $H_0: \beta = 1$. The classical likelihood ratio test, which refers $\Lambda_{\text{obs}}$ to a $\chi^2_1$ distribution, gives a p-value of `r p_val_chi2`, and the bootstrap likelihood ratio test gives a p-value of `r p_val_boot`. Because the adjusted bootstrap p-value can never fall below $1/(B+1)$, a value at that floor only tells us that none of the `r B` bootstrap statistics reached $\Lambda_{\text{obs}}$; it should be read as $p \le 1/(B+1)$ rather than as an exact probability. The two tests therefore lead to the same conclusion: the extra shape parameter of the Weibull model captures real structure in the failure times, so choosing the Weibull is not overfitting, while the exponential model underfits these data.


The main advantage of the BLRT is that it does not rely on the large-sample $\chi^2_1$ approximation to the null distribution of $\Lambda$; instead, it estimates that distribution by simulating from the fitted exponential model. This makes it the safer choice for small or moderate samples such as $n = 75$, where the asymptotic approximation may be inaccurate. Note that the BLRT used here is a parametric bootstrap, so it still assumes the exponential model under $H_0$; it replaces the asymptotic reference distribution, not the distributional assumptions themselves. With evidence this strong, both methods agree.
 Recommendation: use the Weibull distribution to model gearbox reliability. Its estimated shape parameter, $\hat{\beta} =$ `r round(hat_beta, 2)`, is greater than 1, which corresponds to an increasing hazard rate and is consistent with wear-out from fatigue, pitting, and bearing degradation.


