---
title: "STA 512 Assignment 12: Bootstrap Testing Hypothesis"
author: "Ian VanWright"
date: "04/20/2026"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: yes
    toc_collapsed: yes
    code_folding: show
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }
```

```{html, echo=FALSE}
<button id="toggleTOCBtn" class="btn btn-default" style="position: fixed; bottom: 20px; right: 20px; z-index: 1000;">
  Toggle TOC
</button>

<script>
document.getElementById("toggleTOCBtn").onclick = function() {
  var toc = document.querySelector(".list-group, #TOC, .tocify");
  if (toc) {
    if (toc.style.display === "none") {
      toc.style.display = "";
    } else {
      toc.style.display = "none";
    }
  }
};
</script>
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
if (!require("fitdistrplus")) {
  install.packages("fitdistrplus")
  library(fitdistrplus)
}
if (!require("MASS")) {
  install.packages("MASS")
  library(MASS)
}

## library(fitdistrplus)
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, your code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```

\

# Introduction
For this assignment, we will compare an asymptotic $\chi^2$ hypothesis test with a bootstrapped hypothesis test for deciding which of two distributions a sample came from. Specifically, we will carry out a likelihood ratio $\chi^2$ test and a bootstrapped likelihood ratio test to determine whether a sample came from a Weibull distribution or an exponential distribution.

The sample we will use for our tests consists of 75 failure times for gearboxes used in wind turbines at a distribution company's facility.
```{r}
times <- c(5.2, 7.8, 9.1, 11.3, 12.5, 13.0, 14.2, 15.1, 15.9, 16.7, 17.2, 17.8, 18.4, 18.9, 19.3, 19.7, 20.2, 20.6, 21.0, 21.5, 21.9, 22.3, 22.7, 23.1, 23.5, 23.9, 24.3, 24.7, 25.1, 25.5, 25.9, 26.3, 26.7, 27.1, 27.5, 27.9, 28.3, 28.7, 29.1, 29.5, 29.9, 30.3, 30.7, 31.1, 31.5, 31.9, 32.3, 32.7, 33.1, 33.5, 33.9, 34.3, 34.7, 35.1, 35.5, 35.9, 36.3, 36.7, 37.1, 37.5, 37.9, 38.3, 38.7, 39.1, 39.5, 39.9, 40.3, 40.7, 41.1, 41.5,
41.9, 42.3, 42.7, 43.1, 43.5)
```

# Part A
For this part, we will find the maximum likelihood estimates of the two Weibull parameters, treating this sample as if it comes from a Weibull distribution.

First, we'll start with the log-likelihood function for the Weibull distribution:

$$
\begin{aligned}
    \ell(\beta, \lambda; \mathbf{t}) &= \ln L(\beta, \lambda; \mathbf{t}) \\
    &= \sum_{i=1}^{n} \ln \left[ \frac{\beta}{\lambda} \left( \frac{t_i}{\lambda} \right)^{\beta-1} \exp\left( -\left( \frac{t_i}{\lambda} \right)^{\beta} \right) \right] \\
    &= \sum_{i=1}^{n} \left[ \ln \beta - \ln \lambda + (\beta-1)(\ln t_i - \ln \lambda) - \left( \frac{t_i}{\lambda} \right)^{\beta} \right] \\
    &= \sum_{i=1}^{n} \left[ \ln \beta - \beta \ln \lambda + (\beta-1) \ln t_i - \left( \frac{t_i}{\lambda} \right)^{\beta} \right]
\end{aligned}
$$

Next, we take the partial derivative of the log-likelihood with respect to each parameter and set both derivatives equal to 0. Solving the resulting system shows that the maximum likelihood estimates must satisfy:
$$
\begin{aligned}
    \hat{\lambda} &= \left( \frac{1}{n} \sum_{i=1}^{n} t_i^{\hat{\beta}} \right)^{1/\hat{\beta}} \\
    \frac{1}{\hat{\beta}} &= \frac{\sum_{i=1}^{n} t_i^{\hat{\beta}} \ln t_i}{\sum_{i=1}^{n} t_i^{\hat{\beta}}} - \frac{1}{n} \sum_{i=1}^{n} \ln t_i
\end{aligned}
$$

where $\lambda$ is the scale parameter and $\beta$ is the shape parameter. The second equation has $\hat{\beta}$ on both sides and no closed-form solution, so the estimates have to be found numerically.
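As a worked intermediate step, the two partial derivatives (the score functions) can be written out explicitly. The notation below assumes shape $\beta$, scale $\lambda$, and observations $t_i$; these are the same expressions that `weibull.score` evaluates in the code for this part.

$$
\begin{aligned}
    \frac{\partial \ell}{\partial \beta} &= \frac{n}{\beta} - n \ln \lambda + \sum_{i=1}^{n} \ln t_i - \sum_{i=1}^{n} \left( \frac{t_i}{\lambda} \right)^{\beta} \ln\left( \frac{t_i}{\lambda} \right) \\
    \frac{\partial \ell}{\partial \lambda} &= -\frac{n\beta}{\lambda} + \frac{\beta}{\lambda} \sum_{i=1}^{n} \left( \frac{t_i}{\lambda} \right)^{\beta}
\end{aligned}
$$

Setting the second derivative to 0 gives $\sum_{i=1}^{n} (t_i/\lambda)^{\beta} = n$, that is, $\lambda^{\beta} = \frac{1}{n}\sum_{i=1}^{n} t_i^{\beta}$. Substituting this into the first derivative cancels the $n \ln \lambda$ terms and leaves the equation for $1/\hat{\beta}$.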

To obtain these estimates, we will use R's built-in `optim()` function, which maximizes the log-likelihood numerically. We also supply the score (gradient) function to help the optimizer. The following code will be used to obtain the parameter estimates:
```{r}
## Manual MLE implementation for the Weibull distribution

# Weibull log-likelihood function
weibull.loglik <- function(params, data) {
  shape <- params[1]
  scale <- params[2]
  n <- length(data)   # sample size

  n * log(shape) - n * shape * log(scale) +
    (shape - 1) * sum(log(data)) -
    sum((data / scale)^shape)
}

# Score function (gradient of the log-likelihood)
weibull.score <- function(params, data) {
  shape <- params[1]
  scale <- params[2]
  n <- length(data)

  # Partial derivative with respect to the shape parameter
  grad_shape <- n / shape - n * log(scale) +
    sum(log(data)) -
    sum((data / scale)^shape * log(data / scale))

  # Partial derivative with respect to the scale parameter
  grad_scale <- -(n * shape) / scale +
    (shape / scale) * sum((data / scale)^shape)

  c(grad_shape, grad_scale)
}

# Starting values for the optimizer
initial.params <- c(shape = 1, scale = 5)

# Maximize the log-likelihood with optim(), using the L-BFGS-B method
mle.result.weibull <- optim(
  par = initial.params,
  fn = weibull.loglik,
  gr = weibull.score,
  data = times,
  method = "L-BFGS-B",
  hessian = TRUE,
  control = list(trace = FALSE,
                 fnscale = -1,   # a negative fnscale makes optim() maximize
                 maxit = 500)
)

mle.result.weibull$par
```
The estimate for the shape parameter is 3.370783, while the estimate for the scale parameter is 31.418204.
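Because the optimizer relies on the hand-coded score function, it can be worth checking the analytic gradient against a numerical one. The following is a minimal sketch of such a check, not part of the original analysis: the simulated data, the evaluation point, and all names (`demo_times`, `demo_loglik`, `demo_score`, `numeric_score`) are assumptions made for illustration.

```{r}
# Hypothetical check: compare an analytic Weibull score with central
# finite differences of the log-likelihood, using simulated data.
set.seed(512)
demo_times <- rweibull(50, shape = 2, scale = 10)  # illustrative sample

demo_loglik <- function(params, data) {
  sum(dweibull(data, shape = params[1], scale = params[2], log = TRUE))
}

demo_score <- function(params, data) {
  shape <- params[1]
  scale <- params[2]
  n <- length(data)
  z <- data / scale
  c(n / shape - n * log(scale) + sum(log(data)) - sum(z^shape * log(z)),
    -(n * shape) / scale + (shape / scale) * sum(z^shape))
}

# Central finite-difference approximation to the gradient
numeric_score <- function(params, data, h = 1e-5) {
  sapply(seq_along(params), function(j) {
    step <- replace(rep(0, length(params)), j, h)
    (demo_loglik(params + step, data) - demo_loglik(params - step, data)) / (2 * h)
  })
}

check_point <- c(1.5, 8)  # arbitrary (shape, scale) point
analytic_grad <- demo_score(check_point, demo_times)
fd_grad <- numeric_score(check_point, demo_times)
max(abs(analytic_grad - fd_grad))  # should be very close to 0
```

If the two gradients disagree by more than numerical noise, the score function most likely contains a sign or algebra error.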

To verify, we can use the fitdistr() function from the `MASS` package:

```{r}
mle.weibull.fit <- fitdistr(times, "weibull")
mle.weibull.fit
```
The two sets of estimates agree to at least three decimal places, so the manual optimization is confirmed.
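The values that `fitdistr()` prints in parentheses are standard errors. A rough sketch of how comparable standard errors can be obtained from an `optim()` Hessian is shown below. It is an illustration under stated assumptions: the data are simulated, the names (`se_demo_times`, `neg_loglik`, `se_fit`, `se_est`) are hypothetical, and the negative log-likelihood is minimized so that the Hessian is the observed information matrix.

```{r}
# Hypothetical sketch: approximate standard errors from the observed information.
set.seed(512)
se_demo_times <- rweibull(75, shape = 3, scale = 30)  # simulated, illustrative data

# Negative log-likelihood, so optim() can minimize it directly
neg_loglik <- function(params, data) {
  -sum(dweibull(data, shape = params[1], scale = params[2], log = TRUE))
}

se_fit <- optim(
  par = c(shape = 1, scale = mean(se_demo_times)),
  fn = neg_loglik,
  data = se_demo_times,
  method = "L-BFGS-B",
  lower = c(1e-3, 1e-3),  # keep both parameters positive
  hessian = TRUE
)

# At the minimum of the negative log-likelihood the Hessian is the observed
# information, so its inverse approximates the covariance of the estimates.
se_est <- sqrt(diag(solve(se_fit$hessian)))
rbind(estimate = se_fit$par, std.error = se_est)
```

Minimizing the negative log-likelihood avoids having to track the sign convention that comes with `fnscale = -1`.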


# Part B
For this part, we will find the maximum likelihood estimator for the exponential distribution, treating this sample as if it came from an exponential distribution.

In the case of the exponential distribution, we are only estimating the scale parameter because an exponential random variable is just a Weibull random variable with the shape parameter, $\beta$, set to 1.

Writing the exponential density in terms of the scale parameter, $f(t; \lambda) = \frac{1}{\lambda} e^{-t/\lambda}$, the maximum likelihood estimate of the scale is the sample mean. Equivalently, the estimate of the rate parameter, $1/\lambda$, is the reciprocal of the sample mean:
$$
\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} t_i = \bar{t} \implies \frac{1}{\hat{\lambda}} = \frac{n}{\sum_{i=1}^{n} t_i} = \frac{1}{\bar{t}}
$$

Whichever parametrization is used, the estimation procedure is the same. Because there is only one parameter and its estimate has a closed form, numerical optimization is not strictly needed: the estimate of $\lambda$ is simply the sample mean.
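For completeness, here is the short derivation behind that estimate, written in the scale parametrization with observations $t_i$:

$$
\begin{aligned}
    \ell(\lambda; \mathbf{t}) &= \sum_{i=1}^{n} \ln \left[ \frac{1}{\lambda} \exp\left( -\frac{t_i}{\lambda} \right) \right] = -n \ln \lambda - \frac{1}{\lambda} \sum_{i=1}^{n} t_i \\
    \frac{d\ell}{d\lambda} &= -\frac{n}{\lambda} + \frac{1}{\lambda^2} \sum_{i=1}^{n} t_i = 0 \implies \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} t_i \\
    \left. \frac{d^2\ell}{d\lambda^2} \right|_{\lambda = \hat{\lambda}} &= \frac{n}{\hat{\lambda}^2} - \frac{2 n \hat{\lambda}}{\hat{\lambda}^3} = -\frac{n}{\hat{\lambda}^2} < 0
\end{aligned}
$$

The negative second derivative confirms that the sample mean is a maximum of the log-likelihood.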

For consistency with Part A, we will first maximize the log-likelihood numerically and then compare the result with the sample mean.

```{r}
## Manual MLE implementation for the exponential distribution

# Exponential log-likelihood function (scale parametrization)
exponential.loglik <- function(params, data) {
  scale <- params[1]
  n <- length(data)   # sample size

  -n * log(scale) - sum(data) / scale
}

# Score function (derivative of the log-likelihood)
exponential.score <- function(params, data) {
  scale <- params[1]
  n <- length(data)

  -(n / scale) + sum(data) / scale^2
}

# Maximize the log-likelihood with optim(), using the L-BFGS-B method
mle.result.exponential <- optim(
  par = 5,   # starting value for the scale
  fn = exponential.loglik,
  gr = exponential.score,
  data = times,
  method = "L-BFGS-B",
  hessian = TRUE,
  control = list(trace = FALSE,
                 fnscale = -1,   # a negative fnscale makes optim() maximize
                 maxit = 500)
)

mle.result.exponential$par
```

Now let's compute the sample mean, which is the closed-form MLE:
```{r}
## Using Sample Mean as MLE For Exponential 

mle.exponential.scale <- mean(times)
mle.exponential.scale
```

The mean of our sample is 28.18533, so the estimate for our scale parameter is 28.18533.

To verify, we can use fitdistr() from the `MASS` package:
```{r}
## Verifying MLE For Exponential via fitdistr()

exponential.fit <- fitdistr(times, "exponential")
exponential.fit
```
As we can see, `fitdistr()` reports the estimate of the rate parameter rather than the scale parameter. The scale is the reciprocal of the rate (scale = 1/rate), so we invert the estimate:

```{r}
# unname() drops the "rate" label that would otherwise be carried over to the scale
mle.fit <- unname(1 / exponential.fit$estimate)
mle.fit
```
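The same reciprocal relationship can be used to carry a standard error from the rate to the scale through the delta method. The sketch below is illustrative only: it uses simulated data, assumes the usual asymptotic standard error $\hat{r}/\sqrt{n}$ for the rate estimate $\hat{r}$, and all names (`conv_times`, `rate_hat`, `se_rate`, `scale_hat`, `se_scale`) are hypothetical.

```{r}
# Hypothetical sketch: convert an exponential rate estimate and its
# standard error to the scale parametrization (simulated data).
set.seed(512)
conv_times <- rexp(75, rate = 1 / 28)  # illustrative sample, true scale 28

n_conv <- length(conv_times)
rate_hat <- 1 / mean(conv_times)       # MLE of the rate
se_rate <- rate_hat / sqrt(n_conv)     # asymptotic standard error of the rate

scale_hat <- 1 / rate_hat              # invariance: MLE of the scale
se_scale <- se_rate / rate_hat^2       # delta method: |d(1/r)/dr| = 1/r^2

c(scale = scale_hat, std.error = se_scale)
```

The resulting standard error simplifies to the sample mean divided by $\sqrt{n}$, which is the familiar standard error of the exponential scale estimate.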

# Part C
For this part, we will conduct the first of two hypothesis tests to determine whether this data comes from a Weibull distribution or an exponential distribution. Specifically, we will conduct a likelihood ratio $\chi^2$ test. Our null hypothesis, $H_0$, is that the data come from an exponential distribution, that is, $\beta = 1$. Our alternative hypothesis, $H_a$, is that the data come from a Weibull distribution with $\beta \neq 1$. In symbols:
$$
\begin{aligned}
H_0 &: \beta = 1 \quad \text{(Exponential)} \\
H_a &: \beta \neq 1 \quad \text{(Weibull)}
\end{aligned}
$$

This test will be done at significance level $\alpha = 0.05$. 
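Written out, the test statistic compares the two maximized log-likelihoods. The symbols $\Lambda$ and $\hat{\lambda}_0$ are notation assumed here for the statistic and for the exponential scale estimate:

$$
\Lambda = 2 \left[ \ell_{\text{Weibull}}(\hat{\beta}, \hat{\lambda}) - \ell_{\text{Exp}}(\hat{\lambda}_0) \right], \qquad \hat{\lambda}_0 = \bar{t}
$$

Under $H_0$, $\Lambda$ is approximately $\chi^2$ distributed with 1 degree of freedom, because the null hypothesis fixes exactly one parameter ($\beta = 1$). At $\alpha = 0.05$ we reject $H_0$ when $\Lambda$ exceeds the critical value $\chi^2_{1,\,0.95} \approx 3.84$.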
```{r}
## Likelihood ratio test using both log-likelihood functions

# Weibull log-likelihood
loglike.weibull <- function(data, lambda, beta) {
  n <- length(data)
  n * log(beta) - n * beta * log(lambda) +
    (beta - 1) * sum(log(data)) - sum((data / lambda)^beta)
}

# Exponential log-likelihood
loglike.exponential <- function(data, lambda) {
  n <- length(data)
  -n * log(lambda) - sum(data) / lambda
}

## Evaluate each log-likelihood at its MLE
scale.est.weibull <- mle.result.weibull$par[[2]]
shape.est.weibull <- mle.result.weibull$par[[1]]
scale.est.exponential <- unname(mle.fit)  # drop any name inherited from fitdistr()

LogLike.Alt <- loglike.weibull(times, scale.est.weibull, shape.est.weibull)
LogLike.Null <- loglike.exponential(times, scale.est.exponential)

# Likelihood ratio statistic
ratio <- 2 * (LogLike.Alt - LogLike.Null)

# p-value from the upper tail of the chi-square(1) distribution;
# lower.tail = FALSE avoids rounding a tiny p-value to exactly 0
pvalue.ratio <- pchisq(ratio, df = 1, lower.tail = FALSE)

# Statistic and p-value together
test1 <- c(statistic = ratio, p.value = pvalue.ratio)
test1
```
Our $\chi^2$ statistic is 100.6144, far above the 5% critical value of about 3.84 for 1 degree of freedom, and the p-value is effectively 0. We therefore reject the null hypothesis and conclude that there is sufficient evidence that this sample came from a Weibull distribution rather than an exponential distribution.
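The decision rule can be sketched in a few lines. This is only an illustration using the statistic reported above; the variable names (`lrt_demo_stat`, `crit_value`, `p_upper`, `p_naive`) are hypothetical.

```{r}
# Hypothetical sketch: decision rule for a chi-square(1) likelihood ratio test.
lrt_demo_stat <- 100.6144                # statistic reported above
crit_value <- qchisq(0.95, df = 1)       # 5% critical value, about 3.84

# Upper-tail probability computed directly
p_upper <- pchisq(lrt_demo_stat, df = 1, lower.tail = FALSE)

# Subtracting from 1 loses the tiny tail probability to rounding
p_naive <- 1 - pchisq(lrt_demo_stat, df = 1)

c(critical = crit_value, reject = lrt_demo_stat > crit_value)
c(p_upper = p_upper, p_naive = p_naive)
```

Asking for the upper tail directly keeps a very small p-value from being rounded to exactly 0, which is what happens when the lower-tail probability is subtracted from 1.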

# Part D
For this part, we will conduct the second of our hypothesis tests to determine whether or not this sample came from a Weibull distribution or an exponential distribution. Specifically, we will use a bootstrapped likelihood ratio hypothesis test. The null hypothesis, the alternative hypothesis, and the level of significance will be the same as the likelihood ratio test from Part C.


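```{r}
# Parametric bootstrap likelihood ratio test: exponential (H0) vs. Weibull (Ha)
set.seed(1)

# Weibull MLE: solve the profile score equation for the shape, then plug in the scale
weibull_mle <- function(x) {
  s <- x / max(x)  # rescaling leaves the shape equation unchanged and avoids overflow
  profile_score <- function(shape) {
    1 / shape + mean(log(s)) - sum(s^shape * log(s)) / sum(s^shape)
  }
  shape <- uniroot(profile_score, interval = c(0.05, 50), tol = 1e-8)$root
  c(shape = shape, scale = max(x) * mean(s^shape)^(1 / shape))
}

# Likelihood ratio statistic: 2 * (maximized Weibull loglik - maximized exponential loglik)
lrt_stat <- function(x) {
  fit <- weibull_mle(x)
  logL_weibull <- sum(dweibull(x, shape = fit[["shape"]], scale = fit[["scale"]], log = TRUE))
  logL_exp <- sum(dexp(x, rate = 1 / mean(x), log = TRUE))
  2 * (logL_weibull - logL_exp)
}

bootstrap_lrt_test <- function(data, B = 10000) {
  n <- length(data)
  scale_0 <- mean(data)     # exponential MLE, used to simulate under H0
  LR_obs <- lrt_stat(data)  # observed test statistic

  # Bootstrap distribution of the statistic under H0
  LR_star <- numeric(B)
  for (b in seq_len(B)) {
    bs_sample <- rexp(n, rate = 1 / scale_0)  # generate data under H0: Exp(scale_0)
    scale_1 <- mean(bs_sample)                # exponential MLE of this bootstrap sample
    LR_star[b] <- lrt_stat(bs_sample)         # refit both models to the bootstrap sample
  }

  p_value <- (sum(LR_star >= LR_obs) + 1) / (B + 1)

  list(
    LR_obs = LR_obs,
    p_value = p_value,
    scale_1 = scale_1,  # mean of the final bootstrap sample only
    scale_0 = scale_0
  )
}

bootstrap_lrt_test(times)
```
Here the observed statistic is the same likelihood ratio statistic as in Part C (about 100.6), and each bootstrap statistic is computed by refitting both models to a sample simulated from the fitted exponential distribution. Under $H_0$ these bootstrap statistics behave roughly like $\chi^2_1$ draws, so a value near 100 is practically unreachable. We therefore expect no replicate to reach `LR_obs`, which puts the p-value at its smallest attainable value, $1/(B+1) \approx 0.0001$. We reject the null hypothesis and conclude that there is sufficient evidence that this sample came from a Weibull distribution rather than an exponential distribution.

# Part E
Both tests lead to the same decision: the exponential model is rejected in favor of the Weibull model.

What looks odd at first is how much `scale_1` changes with the number of bootstrap replicates $B$. In our runs it was 26.5558 for $B = 1000$, 30.10166 for $B = 5000$, and 22.82617 for $B = 10{,}000$. This is expected, because `scale_1` is the mean of the final bootstrap sample only. Each bootstrap sample is 75 draws from an exponential distribution with mean 28.18533, so its mean has a standard error of about $28.19/\sqrt{75} \approx 3.25$ and does not settle down as $B$ grows.

Based on both tests, I conclude that the Weibull distribution is a better fit for this data.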
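One way to compare the two reference distributions directly is to simulate the likelihood ratio statistic under the null hypothesis and set its 95th percentile against the $\chi^2_1$ critical value. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses a hypothetical exponential scale of 28, samples of size 75, 1000 replicates, and helper names (`fit_weibull_profile`, `lr_weibull_vs_exp`, `null_lr`) chosen here. The Weibull fit solves the profile score equation with `uniroot()` rather than calling `optim()`.

```{r}
# Hypothetical sketch: how close is the chi-square(1) approximation at n = 75?
set.seed(2026)

# Weibull shape from the profile score equation, scale in closed form
fit_weibull_profile <- function(x) {
  s <- x / max(x)  # rescale to avoid overflow in s^b
  g <- function(b) 1 / b + mean(log(s)) - sum(s^b * log(s)) / sum(s^b)
  shape <- uniroot(g, interval = c(0.05, 50), tol = 1e-8)$root
  c(shape = shape, scale = max(x) * mean(s^shape)^(1 / shape))
}

# Likelihood ratio statistic for Weibull (alternative) vs. exponential (null)
lr_weibull_vs_exp <- function(x) {
  fit <- fit_weibull_profile(x)
  2 * (sum(dweibull(x, shape = fit[["shape"]], scale = fit[["scale"]], log = TRUE)) -
       sum(dexp(x, rate = 1 / mean(x), log = TRUE)))
}

# Simulate the statistic under the exponential null
null_lr <- replicate(1000, lr_weibull_vs_exp(rexp(75, rate = 1 / 28)))

c(simulated_95 = unname(quantile(null_lr, 0.95)),
  chisq_95 = qchisq(0.95, df = 1))
```

If the simulated percentile is close to the $\chi^2_1$ value, the asymptotic and bootstrap tests should give similar answers for samples of this size.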

