Final Exam Guidelines

  • Coverage: The major concepts and inference procedures—such as sampling distributions, confidence intervals, and hypothesis testing—are covered and implemented using both classical parametric likelihood-based methods and modern non-parametric approaches, including the bootstrap and kernel density estimation.

  • Part A requires derivation of selected likelihood-based functions for performing various types of inference, with sufficient detail to enable translation of these derivations into code for numerical analysis.

  • Your code for the problems in Part B must align with your derivations in Part A and be well commented where necessary.

  • In Part B, all numerical results must be interpreted from a practical perspective.


Policies of Using AI Tools

  • Policy on AI Tool Use: Students must adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

  • Code Inclusion Requirement: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.


Working Model for the Final Exam

Caution: Please follow the suggested expressions and guided steps to complete the exam. Other approaches such as transformation for trivialize the problems that will not meet the exam objectives.

The Kumaraswamy distribution is a two-parameter continuous probability distribution defined on the interval (0, 1). It is often used as an alternative to the Beta distribution due to its simple closed-form expressions for the cumulative distribution function (CDF) and quantile function. It is commonly used in

  • Hydrology: Modeling rainfall, streamflow, or other bounded natural phenomena

  • Economics: Income shares, proportions, or bounded indices

  • Monte Carlo simulation: Efficient random variate generation (via inverse transform)

  • Machine learning: Output layer for bounded targets, prior distributions in Bayesian models

  • Reliability engineering: Modeling failure rates of systems with bounded lifetimes


Let \(X\) be the Kumaraswamy random variable with Cumulative Distribution Function (CDF)

\[ F(x; a, b) = 1 - (1 - x^a)^b \]

where \(a > 0\) and \(b > 0\) unknown parameters and \(0 < x < 1\).

The following are two special case of the Kumaraswamy distribution:

  1. Uniform Distribution: When \(a = 1\) and \(b = 1\), the Kumaraswamy distribution becomes a uniform distribution over \([0, 1]\) with CDF \(F(x) = x\).

  2. Power Distribution: when \(b = 1\) and \(a > 0\), the Kumaraswamy distribution becomes a power distribution over \([0, 1]\) with CDF \(F(x) = x^a\).

This final exam focuses on inferences of Kumaraswamy distribution and related data analysis.

Part A: Methodological Derivations


Problem A1:

Show that the density function of the Kumaraswamy distribution is

\[ f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}. \]

In order to get the probability density function (pdf) for the Kumaraswamy distribution we must take the derivative of the cdf: \[ f(x; a, b) = \frac{d}{dx} F(x; a, b) = \frac{d}{dx} [1 - (1 - x^a)^b] \]

Using the chain rule, we obtain:

\[ f(x; a, b) = -b(1 - x^a)^{b-1} \cdot -ax^{a-1} \]

Once simplified, the pdf is: \[ f(x; a, b) = abx^{a-1}(1 - x^a)^{b-1} \]


Problem A2:

Let \(\{x_1, x_2, \cdots, x_n \}\) be an i.i.d. random sample taken from a population that follows the aove 2-parameter Kumaraswamy distribution. Write out the loglikelihood function of \(a\) and \(b\), denoted by \(\ell(a,b)\), based on the above random sample and derive the gradient vector \([\ell_a^\prime(a,b), \ell_b^\prime(a,b)]\), the first order partial derivative of the log-likelihood with respect to parameters \(a\) and \(b\).

From Problem A1, the PDF is: \[ f(x) = abx^{a-1}(1 - x^a)^{b-1}, \quad 0 < x < 1, a > 0, b > 0 \]

The likelihood function \(L(a, b)\) is: \[ L(a, b) = \prod_{i=1}^n f(x_i; a, b) = \prod_{i=1}^n abx_i^{a-1}(1 - x_i^a)^{b-1} \]

Rewriting the equation, we get: \[ \prod_{i=1}^n ab \cdot \prod_{i=1}^n x_i^{a-1} \cdot \prod_{i=1}^n (1 - x_i^a)^{b-1} \] \[ = a^n b^n \cdot \prod_{i=1}^n x_i^{a-1} \prod_{i=1}^n (1 - x_i^a)^{b-1} \]

To get the log-likelihood function, \(\ell(a, b)\), we take the natural log of the likelihood function. The log-likelihood function is: \[ \ell(a, b) = \ln L(a, b) = n \ln a + n \ln b + (a-1) \sum_{i=1}^n \ln x_i + (b-1) \sum_{i=1}^n \ln(1 - x_i^a) \]

To build our gradient vector, we need the first-order partial derivatives: \[ \ell'_a(a, b) = \frac{\partial \ell}{\partial a} = \frac{n}{a} + \sum \ln x + (b-1) \sum \frac{x_i^a \ln x_i}{1 - x_i^a} \] \[ \ell'_b = \frac{\partial \ell}{\partial b} = \frac{n}{b} + \sum \ln(1 - x_i^a) \]

The gradient vector \(\nabla \ell\) is: \[ \nabla \ell = \left[ \frac{n}{a} + \sum \ln x + \sum \frac{(b-1) x_i^a \ln x_i}{1 - x_i^a}, \frac{n}{b} + \sum \ln(1 - x_i^a) \right] \]


Problem A3:

Based on the gradients functions obtained in the above problem A2, derive the observed Fisher Information matrix (i.e, the negative Hessian Matrix).

In order to obtain the Fisher matrix, we need to get our second order partial derivatives.

The second-order partial derivatives are: \[ \frac{\partial^2 \ell}{\partial a^2} = -\frac{n}{a^2} + \sum \frac{(b-1) x_i^a (\ln x_i)^2}{1 - x_i^a} \] \[ \frac{\partial^2 \ell}{\partial a \partial b} = \sum \frac{x_i^a \ln x_i}{1 - x_i^a} \] \[ \frac{\partial^2 \ell}{\partial b \partial a} = \sum \frac{x_i^a \ln x_i}{1 - x_i^a} \] \[ \frac{\partial^2 \ell}{\partial b^2} = -\frac{n}{b^2} \]

If we were to build the matrix as is, we would get the hessian matrix. But there’s good news: the Fisher information matrix is just the hessian matrix multiplied by a scalar of -1. So, we can multiply each partial derivative by -1 and then build the matrix.

The Fisher Matrix ends up as: \[ \begin{bmatrix} \frac{n}{a^2} - \sum \frac{(b-1)x^a(\ln x)^2}{1-x^a} & -\sum \frac{x^a \ln x}{1-x^a} \\ -\sum \frac{x^a \ln x}{1-x^a} & \frac{n}{b^2} \end{bmatrix} \]

 

Problem A4:

Consider power distribution \(F(x) = x^a, (a >0 \quad \text{ and }\quad x \in (0,1))\), a special case of the Kumaraswamy distribution with \(b = 1\), and a random sample from this distribution \(\{ x_1, x_2, \cdots, x_n\}\). Derive the MLE and MME of \(a\) respectively. [Hint: To find the MME, you need to compute the moment of the power distribution; that is, \(E[X^k] = \int_0^1 x^k F'(x) dx\). Note that both the MLE and the MME have closed-form expressions.]

In order to derive either estimate for a, we need the pdf of the distribution. To get the pdf from the cdf, we need to take the derivative of the cdf: \[ f(x; a) \frac{d}{dx} F(x; a) = \frac{d}{dx} = x^a \] \[ f(x; a) = ax^{a-1}, \quad 0 < x < 1, \quad a > 0 \]

From here, let’s first derive the MLE for a. Once the pdf is obtained, we can get the likelihood function: \[ L(a) = \prod_{i=1}^n f(x_i; a) = \prod_{i=1}^n ax_i^{a-1} = a^n \prod_{i=1}^n x_i^{a-1} \]

And then the Log-likelihood function: \[ \ell(a) = \ln L = n \ln a + (a-1) \sum_{i=1}^n \ln x_i \]

Next, we can take the derivative with respect to a and obtain the score function: \[ \frac{d\ell}{da} = \frac{n}{a} + \sum_{i=1}^n \ln x_i \] Unlike before, we didn’t need to solve for any parameters, as we were just building a vector of the score functions. But, now we do need to solve for a parameter.

To do so, we need to set the derivative equal to 0 and solve for a. \[ \frac{n}{a} + \sum_{i=1}^n \ln x_i = 0 \]

Solving for a, we get our maximum likelihood estimator for a: \[ \hat{a} = -\frac{n}{\sum_{i=1}^n \ln x_i} \]

Next, we can get the method of moments estimator for a. To do so, we must set the population moment equal to the sample moment. Since we are only estimating one parameter, we only need the first population and sample moments.

The first moment for either a population or sample is the mean. To get the population mean, we need to compute the expected value of x. The expected value of x is: \[ \mu_x = \int_0^1 x \cdot ax^{a-1} \, dx = \int_0^1 ax^a \, dx = \left[ \frac{ax^{a+1}}{a+1} \right]_0^1 = \frac{a}{a+1} \]

Once we get the population mean, we can set it equal to the sample mean: \[ \mu_x = \bar{x} \implies \bar{x} = \frac{a}{a+1} \]

Solving for a, we get the method of moments estimator for a: \[ \hat{a} = \frac{\bar{x}}{1 - \bar{x}} \]


Problem A5:

Using the same setting as in Problem A4, find the asymptotic (Wald) confidence interval for \(a\). [Hint: Compute the Fisher information for \(a\), then take its reciprocal to obtain the variance.]

A Wald confidence interval looks like: \[ \text{Wald CI} = \hat{a} \pm z_{1-\alpha/2}*se, \]

where \(\hat{a}\) is the maximum likelihood estimate for a, \(z_{1-\alpha/2}\) is the critical value of the \((1 - \alpha)*100%\) confidence level, and se is the standard error.

Assuming we’re building a 95% Wald Confidence Interval, which would mean that \(\alpha = 0.05\):

\[ 95\% \text{ CI} = \left( \hat{a} - z_{0.975}*se, \hat{a} + 1.96*se \right) \]

We already have the maximum likelihood estimate from problem A4. But what is the critical z value and the standard error? Getting the critical z value is easy: that can just be looked up. The critical z value for a 95% confidence interval is 1.96. Getting the standard error is a little more difficult because that requires some math.

To get the standard error, we need to go back to the log likelihood function of a. We will first obtain the variance of a by taking the second derivative of the log-likelihood function (or the derivative of the score function we already obtained in problem A4). Regardless, we should get: \[ \frac{d^2\ell}{da^2} = -\frac{n}{a^2} \]

From here, we can compute the Fisher information, which is just the negative expected value of the second derivative. Since the second derivative is just a constant, the expected value will just be that derivative. Multiply it by -1, we get the fisher information.

\[ I(a) = -E\left(\frac{d^2 \ell}{da^2}\right) = \frac{n}{a^2}$ \]

To get the variance, we just take the reciprocal of the fisher information. \[ Var(a) = \frac{a^2}{n} \]

To get the standard error, we just take the square root of the variance: \[ {se}(a) = \frac{a}{\sqrt{n}} \]

And now we can build the 95% Wald Confidence Interval: \[ 95\% \text{Wald CI} = \left( \hat{a} - 1.96 \frac{\hat{a}}{\sqrt{n}}, \hat{a} + 1.96 \frac{\hat{a}}{\sqrt{n}} \right) \]


Problem A6:

Using the same setting as in Problem A4, perform a likelihood ratio test for the hypothesis \(H_0 :a=1\) (i.e., the power distribution reduces to a uniform distribution). [Hint: Evaluate the log-likelihood function at the maximum likelihood estimate \(\hat{a}\) and at \(a=1\), then use these values to construct the LRT test statistic.]

When doing a likelihood ratio test, we are comparing the fitness of two models: a simpler restricted model, or a more complex unrestricted model. In this case, we are going to test whether or not a power distribution or another Kumaraswamy based distribution will fit the data better. In other words, is this sample more likely to come from a population that is modeled with a power distribution or another Kumaraswamy based distribution?

\[ H_0: a = 1 (Power Distribution) H_a: a \neq 1 (Kumaraswamy Distribution) \]

To conduct our test, we need to evalute the log likelihood of the power distribution at two points: the first is when a is set to 1, which gives us our restricted model and the second is when a is set to the maximum likelihood estimator, which gives us our unrestricted model.

The null hypothesis, \(H_0\): \[ a = 1 \implies \ell(1) = 0 \]

The alternative hypothesis, \(H_a\): \[ a=\hat{a} \implies \ell(\hat{a}) = n \ln \hat{a} + (\hat{a}-1) \sum_{i=1}^n \ln x_i \]

The test statistic will be: \[ LR = 2[\ell(\hat{a}) - \ell(1)] = 2 \left[ n \ln \hat{a} + (\hat{a}-1) \sum_{i=1}^n \ln x_i \right] \]

There are several important things to note here: The first is that this test will work better if the sample size is large enough. The second is that if the sample size is large enough, then the test statistic will asymptotically follow a \(\chi^2\) distribution with 1 degree of freedom. The the third is that this is a two tailed test. Which means that there are two rejection regions: if the \(\chi^2\) statistic you obtain is less than the critical value associated with \(\frac{\alpha}{2}\) or greater than the critical value associated with \(\{frac{1-\alpha}{2}\). If \(n\) is large enough (\(n > 30\)), then \(LR \sim \chi_1^2\). For example, if the level of significance is 0.05, then you’d reject the null hypothesis if your test statistic was less than 0.000982 (and so on) or greater than 5.023886. But at least the p-value doesn’t change for rejection; you reject the null if the p-value is less than the level of significance.


Part B: Numerical Analysis

All code must be well commented and adhere to best coding practices

Working Dataset: A small reservoir supplies water to a town. During the dry season (50 days), engineers record the fraction of usable storage filled each morning. Values near 0 mean the reservoir is nearly empty; values near 1 mean it’s full. The distribution tends to be right‑skewed (mostly low levels due to drought) but with occasional replenishment.

The following 50 data points (ordered for clarity) represent the daily proportion of usable storage:

0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78
# Putting Data Into An Object

water <- c(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)


Problem B1:

Fit the Kumaraswamy distribution to the above data. Use the derivations in Problem A2 to find the MLE of \(a\) and \(b\). Please copy the key formulas before coding.

# Manual MLE implementation for Kumaraswamy distribution

# Kumaraswamy log-likelihood function
kuma.loglik <- function(params, data) {
  shape1 <- params[1]  # passing the parameters
  shape2 <- params[2]

n <- length(data)   # sample size
  
# Log-likelihood for Kumaraswamy distribution
loglik <- n*log(shape1) + n*log(shape2) + (shape1 - 1)*sum(log(data)) + 
            (shape2 - 1)*sum(log(1 - data^shape1)) 
  
  return(loglik)    # Return negative for minimization
}


## Score (gradient) equation
kuma.score <- function(params, data) {
  shape1 <- params[1]
  shape2 <- params[2]
  n <- length(data)
  
# Gradient for Parameter a
  grad_shape1 <- n/shape1 + sum(log(data)) -
                (shape2 - 1)*sum((log(data)*data^shape1)/(1-data^shape1))
  
# Gradient for Parameter b
  grad_shape2 <- (n/shape2) + sum(log(1-data^shape1))
                
  
  return(c(grad_shape1, grad_shape2))
}


# Need to provide initial values for parameters
initial.params <- c(shape1 = 2.2, shape2 = 1.5)  # Reasonable starting values

# Using optim with Nelder-Mead method
mle.result.kuma <- optim(
  par = initial.params,
  fn = kuma.loglik,
  gr = kuma.score,
  data = water,
  method = "L-BFGS-B",
  hessian = TRUE,
  control = list(trace = FALSE,
                 fnscale = -1,
                 maxit = 500,
                 abstol = 1e-8)
)
##
mle.result.kuma$par
  shape1   shape2 
2.529602 7.883386 

According to the output, the first shape parameter, a, is equal to 2.529602, and the second shape parameter, b, is equal to 7.883386.

For future use, the estimates will be extracted and put into objects.

# Extracting the Parameter Estimates for Future Use

shape1.est <- mle.result.kuma$par[1]
shape2.est <- mle.result.kuma$par[2]
kuma.est <- c(shape1.est, shape2.est)
kuma.est
  shape1   shape2 
2.529602 7.883386 


Problem B2:

Fit the power distribution to the above data using the derived of \(a\) obtained in Problem A4 to test the following hypothesis using likelihood ratio procedure ar significance level \(\alpha = 0.05\):

\[ H_0: b = 1 \quad \text{ versus } \quad H_a: b \ne 1. \]

State the statistical decision clearly. What is the practical implication of the testing result?

The hypotheses of our test are as follows: \[ H_0: b = 1 (Kumaraswamy becomes a Power Distribution) / H_a: b \neq 1 (Data follows another Kumaraswamy Distribution) \]

The following code chunk is for the likelihood ratio test:

# Likelihood Ratio Test For Power vs Kumaraswamy distribution

# MLE Of Power Distribution Parameter from Part A4
n <- length(water)
shape3.mle <- -n / sum(log(water))

# Kumaraswamy Log-Likelihood Function
loglike.kuma <- function(data, a, b) {
  n <- length(data)
 n*log(a) + n*log(b) + (a - 1)*sum(log(data)) + 
            (b - 1)*sum(log(1 - data^a)) 
}

# Power Log-Likelihood Function
loglike.power <- function(data, a) { 
  n <- length(data)
  n*log(a) + (a - 1)*sum(log(data))
}

# Unrestricted and Restricted MLEs for LR Test
loglike.alt1 <- loglike.kuma(water, shape1.est, shape2.est)
loglike.null1 <- loglike.power(water, shape3.mle)

# Likelihood ratio test statistic
LR1 <- 2*(loglike.alt1 - loglike.null1)

# p-value from chi-square(1)
pvalue.LR1 <- 1 - pchisq(LR1, df = 1)

# Test Statistic and P-value
LR1
  shape1 
48.92533 
pvalue.LR1
      shape1 
2.658984e-12 

According to the output, the test statistic is \(\chi^2 = 48.92533\), and the corresponding p-value is \(2.65x10^-12\). Since the p-value is less than the level of significance, we can reject the null hypothesis and conclude that there is sufficient evidence to conclude that this data comes from another Kumaraswamy based distribution. To put it in practical terms, the power distribution is not a good fit for this data, meaning this sample is unlikely to come from a population modeled by a power distribution. This sample more than likely comes from a different Kumaraswamy based distribution.


Problem B3:

Use the procedure and code from Problem B1 to estimate the MLEs of \(a\) and \(b\), and then complete the following analyses:

(1). Obtain the bootstrap sampling distributions of \(\hat{a}\) and \(\hat{b}\) and plot each distribution using Gaussian kernel density curves.

First, let’s get our bootstrap samples:

# Bootstrap Sampling Distributions of Kumaraswamy MLEs

# Set Seed to Keep Same Sample
set.seed(1)

# Choose Number of Bootstrap Samples 
B <- 5000

# Store bootstrap estimates
boot.a <- NULL
boot.b <- NULL

for(i in 1:B) {
  
  # Bootstrap sample
  bootstrap.sample <- sample(water, size = n, replace = TRUE)
  
  # Refit Kumaraswamy model to bootstrap sample
  boot.fit <- optim(
    par = c(shape1.est, shape2.est),
    fn = kuma.loglik,
    gr = kuma.score,
    data = bootstrap.sample,
    method = "L-BFGS-B",
    lower = c(1e-8, 1e-8),
    control = list(
      maxit = 1000,
      fnscale = -1,
      trace = FALSE,
      abstol = 1e-8
    )
  )
  
  # Save bootstrap estimates
  boot.a[i] <- boot.fit$par[1]
  boot.b[i] <- boot.fit$par[2]
}

Let’s plot both of the sampling distributions for the estimates of a

# Plot bootstrap distribution of a_hat
hist(boot.a,
     probability = TRUE,
     breaks = 20, 
     col = "lightblue",
     main = "Bootstrap Sampling Distribution for Parameter a",
     xlab = expression(paste("Estimates for ", hat(a))))
lines(density(boot.a), col = "red", lwd = 2)

# Plot bootstrap distribution of b_hat
hist(boot.b,
     probability = TRUE,
     breaks = 20, 
     col = "lightblue",
     main = "Bootstrap Sampling Distribution for Parameter b",
     xlab = expression(paste("Estimates for ", (hat(b)))))
     lines(density(boot.b), col = "red", lwd = 2)

The plot for \(\hat{a}\) appears to roughly resemble a normal curve. The kernel density curve also does fit relatively well. As for the plot for \(\hat{b}\), it does not resemble a normal curve. There is some very obvious skew.

If we take a look at a summary of the distributions, we can clearly see that there is not too much spread for the distribution of \(\hat{a}\). And the mean is relatively close to the median. But there is much more spread in the distribution for \(\hat{b}\), and mean is farther from the median.

summary(boot.a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.767   2.387   2.586   2.622   2.820   4.208 
summary(boot.b)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.756   6.956   8.412   9.022  10.452  32.424 

(2). Construct both the \(95\%\) bootstrap confidence interval and the Wald confidence interval for \(b\). Do these intervals agree with the results obtained in Problem B2? [Compute the standard error of \(\hat{b}\) using the observed Fisher information matrix, i.e., the inverse of the negative Hessian obtained from optim()]

# Find Bootstrap and Wald Confidence Intervals for b

# 95% Percentile Bootstrap Confidence Interval 
bootci.perc.b <- quantile(boot.b, probs = c(0.025, 0.975))

# Pieces to Build 95% Wald Confidence Interval
hessian <- mle.result.kuma$hessian
fisher <- solve(-hessian)
se.b.est <- sqrt(fisher[2, 2])
z <- qnorm(0.975)
error <- z*se.b.est

# Build Wald Confidence Interval
waldci.b <- c(shape2.est - error, shape2.est + error)


bootci.perc.b
     2.5%     97.5% 
 5.006674 16.188918 
waldci.b
   shape2    shape2 
 3.484033 12.282739 

The 95% Bootstrap Confidence Interval is (5.0066, 16.1889), and the 95% Wald confidence interval is (3,.484, 12.2827). Neither confidence interval contains 1, which suggests that the results agree with what we got in Part B2.

(3). Based on the bootstrap sampling distributions from part (1) of this problem, assess whether the validity of the Wald confidence interval is supported.

Since the wald confidence interval is based on asymptotic normality, the distribution should resemble a normal distribution. However, it did not. There was considerable skew, and the summary pointed out that there is considerable spread in the distribution. Based on this criteria, we can conclude that the Wald confidence interval is not a good confidence interval to use in this case.


Problem B4:

In the introduction to the working model for this exam, the Kumaraswamy distribution reduces to the uniform distribution on (0,1). In this problem, we perform a likelihood ratio test for the following hypothesis to assess whether the data come from the uniform distribution on (0,1):

\[ H_0: a = 1\quad \& \quad b = 1\quad \text{ versus } \quad H_a: a \ne 1 \quad \text{or} \quad b \ne 1 \quad \text{or}\quad (a \ne 1 \quad \& \quad b \ne 1). \]

Provide a practical interpretation of the above test result. [Hint: \(H_a\) basically says that there is no constraints for \(a\) and \(b\). Please review the lecture note for module 11 on the likelihood ratio test before coding.]

# Likelihood Ratio test for Uniform vs Kumaraswamy Distribution

# Using Kumaraswamy Likelihood From Part B2 
loglike.alt2 <- loglike.kuma(water, shape1.est, shape2.est)
loglike.null2 <- loglike.kuma(water, 1, 1)

# Likelihood ratio test statistic
LR2 <- 2*(loglike.alt2 - loglike.null2)

# p-value from Chi-Square(1)
pvalue.LR2 <- 1 - pchisq(LR2, df = 1)

# Test Statistic and P-value
LR2
  shape1 
49.12542 
pvalue.LR2
      shape1 
2.401079e-12 

The test statistic is \(\chi^2 = 49.12542\), and the corresponding p-value is \(2.4x10^{-12}\). Since the p-value is smaller than 0.05, then we can reject the null hypothesis, and can conclude that there is sufficient evidence to conclude that this data comes from another Kumaraswamy based distribution. In other words, this sample is unlikely to have originated from a population modeled by a uniform distribution. To put into practical terms, the measurements are all not equally likely to occur.

---
title: "STA 506 Final Examination"
author: "Ian VanWright"
date: " Due: 05/05/2026"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: no
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
####
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, you code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```
 
\
 
## **Final Exam Guidelines** 

* **Coverage**: The major concepts and inference procedures—such as sampling distributions, confidence intervals, and hypothesis testing—are covered and implemented using both classical parametric likelihood-based methods and modern non-parametric approaches, including the bootstrap and kernel density estimation.

* **Part A** requires derivation of selected likelihood-based functions for performing various types of inference, with sufficient detail to enable translation of these derivations into code for numerical analysis.

* Your code for the problems in **Part B** must align with your derivations in **Part A** and be well commented where necessary.

* In **Part B**, all numerical results must be interpreted from a practical perspective.


\

## **Policies of Using AI Tools**

* **Policy on AI Tool Use**: Students must adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

* **Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.

\

## **Working Model for the Final Exam**

<font color = "orange">**Caution**: *Please follow the suggested expressions and guided steps to complete the exam. Other approaches such as transformation for trivialize the problems that will not meet the exam objectives.*</font>


The **Kumaraswamy distribution** is a two-parameter continuous probability distribution defined on the interval (0, 1). It is often used as an alternative to the Beta distribution due to its simple closed-form expressions for the cumulative distribution function (CDF) and quantile function. It is commonly used in 

* **Hydrology**: Modeling rainfall, streamflow, or other bounded natural phenomena

* **Economics**: Income shares, proportions, or bounded indices

* **Monte Carlo simulation**: Efficient random variate generation (via inverse transform)

* **Machine learning**: Output layer for bounded targets, prior distributions in Bayesian models

* **Reliability engineering**: Modeling failure rates of systems with bounded lifetimes

\

Let $X$ be the Kumaraswamy random variable with Cumulative Distribution Function (CDF)  

$$
F(x; a, b) = 1 - (1 - x^a)^b
$$

where $a > 0$ and $b > 0$ unknown parameters and $0 < x < 1$. 

The following are two special case of the Kumaraswamy distribution:

1. **Uniform Distribution**: When $a = 1$ and $b = 1$, the Kumaraswamy distribution becomes a uniform distribution over $[0, 1]$ with CDF $F(x) = x$.


2. **Power Distribution**: when $b = 1$ and $a > 0$, the Kumaraswamy distribution becomes a power distribution over $[0, 1]$ with CDF $F(x) = x^a$. 

This final exam focuses on inferences of Kumaraswamy distribution and related data analysis.


## Part A: Methodological Derivations

\

### **Problem A1**: 
Show that the density function of the Kumaraswamy distribution is

$$
f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}.
$$

In order to get the probability density function (pdf) for the Kumaraswamy distribution we must take the derivative of the cdf:
$$
f(x; a, b) = \frac{d}{dx} F(x; a, b) = \frac{d}{dx} [1 - (1 - x^a)^b] 
$$

Using the chain rule, we obtain:

$$
f(x; a, b) = -b(1 - x^a)^{b-1} \cdot -ax^{a-1} 
$$

Once simplified, the pdf is:
$$
f(x; a, b) = abx^{a-1}(1 - x^a)^{b-1} 
$$

\

### **Problem A2**: 
Let $\{x_1, x_2, \cdots, x_n \}$ be an i.i.d. random sample taken from a population that follows the aove 2-parameter Kumaraswamy distribution. Write out the loglikelihood function of $a$ and $b$, denoted by $\ell(a,b)$, based on the above random sample and **derive** the gradient vector $[\ell_a^\prime(a,b), \ell_b^\prime(a,b)]$, the first order partial derivative of the log-likelihood with respect to parameters $a$ and $b$.

From Problem A1, the PDF is:
$$
f(x) = abx^{a-1}(1 - x^a)^{b-1}, \quad 0 < x < 1, a > 0, b > 0
$$

The likelihood function $L(a, b)$ is:
$$
L(a, b) = \prod_{i=1}^n f(x_i; a, b) = \prod_{i=1}^n abx_i^{a-1}(1 - x_i^a)^{b-1} 
$$

Rewriting the equation, we get:
$$
\prod_{i=1}^n ab \cdot \prod_{i=1}^n x_i^{a-1} \cdot \prod_{i=1}^n (1 - x_i^a)^{b-1} 
$$
$$
= a^n b^n \cdot \prod_{i=1}^n x_i^{a-1} \prod_{i=1}^n (1 - x_i^a)^{b-1} 
$$

To get the log-likelihood function, $\ell(a, b)$, we take the natural log of the likelihood function. The log-likelihood function is:
$$
\ell(a, b) = \ln L(a, b) = n \ln a + n \ln b + (a-1) \sum_{i=1}^n \ln x_i + (b-1) \sum_{i=1}^n \ln(1 - x_i^a)
$$

To build our gradient vector, we need the first-order partial derivatives:
$$
\ell'_a(a, b) = \frac{\partial \ell}{\partial a} = \frac{n}{a} + \sum \ln x + (b-1) \sum \frac{x_i^a \ln x_i}{1 - x_i^a}
$$
$$
\ell'_b = \frac{\partial \ell}{\partial b} = \frac{n}{b} + \sum \ln(1 - x_i^a)
$$

The gradient vector $\nabla \ell$ is:
$$
\nabla \ell = \left[ \frac{n}{a} + \sum \ln x + \sum \frac{(b-1) x_i^a \ln x_i}{1 - x_i^a}, \frac{n}{b} + \sum \ln(1 - x_i^a) \right] 
$$

\

### **Problem A3**: 
Based on the gradients functions obtained in the above problem A2, **derive** the observed Fisher Information matrix (i.e, the negative Hessian Matrix).

In order to obtain the Fisher matrix, we need to get our second order partial derivatives.

The second-order partial derivatives are:
$$
\frac{\partial^2 \ell}{\partial a^2} = -\frac{n}{a^2} + \sum \frac{(b-1) x_i^a (\ln x_i)^2}{1 - x_i^a} 
$$
$$
\frac{\partial^2 \ell}{\partial a \partial b} = \sum \frac{x_i^a \ln x_i}{1 - x_i^a} 
$$
$$
\frac{\partial^2 \ell}{\partial b \partial a} = \sum \frac{x_i^a \ln x_i}{1 - x_i^a}
$$
$$
\frac{\partial^2 \ell}{\partial b^2} = -\frac{n}{b^2}
$$

If we were to build the matrix as is, we would get the hessian matrix. But there's good news: the Fisher information matrix is just the hessian matrix multiplied by a scalar of -1. So, we can multiply each partial derivative by -1 and then build the matrix.

The Fisher Matrix ends up as:
$$
\begin{bmatrix} \frac{n}{a^2} - \sum \frac{(b-1)x^a(\ln x)^2}{1-x^a} & -\sum \frac{x^a \ln x}{1-x^a} \\ -\sum \frac{x^a \ln x}{1-x^a} & \frac{n}{b^2} \end{bmatrix}
$$

\ 

### **Problem A4**: 

Consider power distribution $F(x) = x^a, (a >0 \quad \text{ and }\quad x \in (0,1))$, a special case of the Kumaraswamy distribution with $b = 1$, and a random sample from this distribution $\{ x_1, x_2, \cdots, x_n\}$. **Derive** the MLE and MME of $a$ respectively. [*Hint: To find the MME, you need to compute the moment of the power distribution; that is, $E[X^k] = \int_0^1 x^k F'(x) dx$. Note that both the MLE and the MME have closed-form expressions.*]

In order to derive either estimate for a, we need the pdf of the distribution. To get the pdf from the cdf, we need to take the derivative of the cdf:
$$
f(x; a) \frac{d}{dx} F(x; a) = \frac{d}{dx} = x^a
$$
$$
f(x; a) = ax^{a-1}, \quad 0 < x < 1, \quad a > 0
$$

From here, let's first derive the MLE for a. Once the pdf is obtained, we can get the likelihood function:
$$
L(a) = \prod_{i=1}^n f(x_i; a) = \prod_{i=1}^n ax_i^{a-1} = a^n \prod_{i=1}^n x_i^{a-1}
$$

And then the Log-likelihood function:
$$
\ell(a) = \ln L = n \ln a + (a-1) \sum_{i=1}^n \ln x_i
$$

Next, we can take the derivative with respect to a and obtain the score function:
$$
\frac{d\ell}{da} = \frac{n}{a} + \sum_{i=1}^n \ln x_i
$$
Unlike before, we didn't need to solve for any parameters, as we were just building a vector of the score functions. But, now we do need to solve for a parameter.

To do so, we need to set the derivative equal to 0 and solve for a.
$$
\frac{n}{a} + \sum_{i=1}^n \ln x_i = 0
$$

Solving for a, we get our maximum likelihood estimator for a:
$$
\hat{a} = -\frac{n}{\sum_{i=1}^n \ln x_i}
$$

Next, we can get the method of moments estimator for a. To do so, we must set the population moment equal to the sample moment. Since we are only estimating one parameter, we only need the first population and sample moments. 

The first moment for either a population or sample is the mean. To get the population mean, we need to compute the expected value of x. The expected value of x is:
$$
\mu_x = \int_0^1 x \cdot ax^{a-1} \, dx = \int_0^1 ax^a \, dx = \left[ \frac{ax^{a+1}}{a+1} \right]_0^1 = \frac{a}{a+1} 
$$

Once we get the population mean, we can set it equal to the sample mean:
$$
\mu_x = \bar{x} \implies \bar{x} = \frac{a}{a+1}
$$

Solving for a, we get the method of moments estimator for a:
$$
\hat{a} = \frac{\bar{x}}{1 - \bar{x}}
$$

\

### **Problem A5**:

Using the same setting as in **Problem A4**, find the asymptotic (Wald) confidence interval for $a$. [*Hint: Compute the Fisher information for $a$, then take its reciprocal to obtain the variance*.]

A Wald confidence interval looks like:
$$
\text{Wald CI} = \hat{a} \pm z_{1-\alpha/2}*se, 
$$

where $\hat{a}$ is the maximum likelihood estimate for a, $z_{1-\alpha/2}$ is the critical value of the $(1 - \alpha)*100%$ confidence level, and se is the standard error.

Assuming we're building a 95% Wald Confidence Interval, which would mean that $\alpha = 0.05$:

$$
95\% \text{ CI} = \left( \hat{a} - z_{0.975}*se, \hat{a} + 1.96*se \right)
$$

We already have the maximum likelihood estimate from problem A4. But what is the critical z value and the standard error? Getting the critical z value is easy: that can just be looked up. The critical z value for a 95% confidence interval is 1.96. Getting the standard error is a little more difficult because that requires some math. 

To get the standard error, we need to go back to the log likelihood function of a. We will first obtain the variance of a by taking the second derivative of the log-likelihood function (or the derivative of the score function we already obtained in problem A4). Regardless, we should get:
$$
\frac{d^2\ell}{da^2} = -\frac{n}{a^2} 
$$

From here, we can compute the Fisher information, which is just the negative expected value of the second derivative. Since the second derivative is just a constant, the expected value will just be that derivative. Multiply it by -1, we get the fisher information.

$$
I(a) = -E\left(\frac{d^2 \ell}{da^2}\right) = \frac{n}{a^2}$
$$

To get the variance, we just take the reciprocal of the fisher information. 
$$
Var(a) = \frac{a^2}{n}
$$

To get the standard error, we just take the square root of the variance:
$$
{se}(a) = \frac{a}{\sqrt{n}}
$$

And now we can build the 95% Wald Confidence Interval:
$$
95\% \text{Wald CI} = \left( \hat{a} - 1.96 \frac{\hat{a}}{\sqrt{n}}, \hat{a} + 1.96 \frac{\hat{a}}{\sqrt{n}} \right)
$$

\

### **Problem A6**:

Using the same setting as in **Problem A4**, perform a likelihood ratio test for the hypothesis $H_0 :a=1$ (i.e., the power distribution reduces to a uniform distribution). [*Hint: Evaluate the log-likelihood function at the maximum likelihood estimate $\hat{a}$ and at $a=1$, then use these values to construct the LRT test statistic.*]

When doing a likelihood ratio test, we are comparing the fitness of two models: a simpler restricted model, or a more complex unrestricted model. In this case, we are going to test whether or not a power distribution or another Kumaraswamy based distribution will fit the data better. In other words, is this sample more likely to come from a population that is modeled with a power distribution or another Kumaraswamy based distribution? 

$$
H_0: a = 1 (Power Distribution) 
H_a: a \neq 1 (Kumaraswamy Distribution)
$$


To conduct our test, we need to evalute the log likelihood of the power distribution at two points: the first is when a is set to 1, which gives us our restricted model and the second is when a is set to the maximum likelihood estimator, which gives us our unrestricted model. 

The null hypothesis, $H_0$:
$$
a = 1 \implies \ell(1) = 0
$$

The alternative hypothesis, $H_a$: 
$$
a=\hat{a} \implies \ell(\hat{a}) = n \ln \hat{a} + (\hat{a}-1) \sum_{i=1}^n \ln x_i
$$

The test statistic will be:
$$
LR = 2[\ell(\hat{a}) - \ell(1)] = 2 \left[ n \ln \hat{a} + (\hat{a}-1) \sum_{i=1}^n \ln x_i \right]
$$

There are several important things to note here: The first is that this test will work better if the sample size is large enough. The second is that if the sample size is large enough, then the test statistic will asymptotically follow a $\chi^2$ distribution with 1 degree of freedom. The the third is that this is a two tailed test. Which means that there are two rejection regions: if the $\chi^2$ statistic you obtain is less than the critical value associated with $\frac{\alpha}{2}$ or greater than the critical value associated with $\{frac{1-\alpha}{2}$.
If $n$ is large enough ($n > 30$), then $LR \sim \chi_1^2$. For example, if the level of significance is 0.05, then you'd reject the null hypothesis if your test statistic was less than 0.000982 (and so on) or greater than 5.023886. But at least the p-value doesn't change for rejection; you reject the null if the p-value is less than the level of significance.

\

## Part B: Numerical Analysis

**All code must be well commented and adhere to best coding practices**

**Working Dataset**: A small reservoir supplies water to a town. During the dry season (50 days), engineers record the fraction of usable storage filled each morning. Values near 0 mean the reservoir is nearly empty; values near 1 mean it's full. The distribution tends to be right‑skewed (mostly low levels due to drought) but with occasional replenishment.

The following 50 data points (ordered for clarity) represent the daily proportion of usable storage:

```
0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78
```

```{r}
# Putting Data Into An Object

water <- c(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
```

\

### **Problem B1**:

Fit the Kumaraswamy distribution to the above data. Use the derivations in **Problem A2** to find the MLE of $a$ and $b$. Please copy the key formulas before coding.

```{r}
# Manual MLE implementation for Kumaraswamy distribution

# Kumaraswamy log-likelihood function
kuma.loglik <- function(params, data) {
  shape1 <- params[1]  # passing the parameters
  shape2 <- params[2]

n <- length(data)   # sample size
  
# Log-likelihood for Kumaraswamy distribution
loglik <- n*log(shape1) + n*log(shape2) + (shape1 - 1)*sum(log(data)) + 
            (shape2 - 1)*sum(log(1 - data^shape1)) 
  
  return(loglik)    # Return negative for minimization
}


## Score (gradient) equation
kuma.score <- function(params, data) {
  shape1 <- params[1]
  shape2 <- params[2]
  n <- length(data)
  
# Gradient for Parameter a
  grad_shape1 <- n/shape1 + sum(log(data)) -
                (shape2 - 1)*sum((log(data)*data^shape1)/(1-data^shape1))
  
# Gradient for Parameter b
  grad_shape2 <- (n/shape2) + sum(log(1-data^shape1))
                
  
  return(c(grad_shape1, grad_shape2))
}


# Need to provide initial values for parameters
initial.params <- c(shape1 = 2.2, shape2 = 1.5)  # Reasonable starting values

# Using optim with Nelder-Mead method
mle.result.kuma <- optim(
  par = initial.params,
  fn = kuma.loglik,
  gr = kuma.score,
  data = water,
  method = "L-BFGS-B",
  hessian = TRUE,
  control = list(trace = FALSE,
                 fnscale = -1,
                 maxit = 500,
                 abstol = 1e-8)
)
##
mle.result.kuma$par
```

According to the output, the first shape parameter, a, is equal to
2.529602, and the second shape parameter, b, is equal to 7.883386.

For future use, the estimates will be extracted and put into objects.

```{r}
# Extracting the Parameter Estimates for Future Use

shape1.est <- mle.result.kuma$par[1]
shape2.est <- mle.result.kuma$par[2]
kuma.est <- c(shape1.est, shape2.est)
kuma.est
```

\

### **Problem B2**:

Fit the **power distribution** to the above data using the derived  of $a$ obtained in **Problem A4** to test the following hypothesis using likelihood ratio procedure ar significance level $\alpha = 0.05$:

$$
H_0: b = 1 \quad \text{ versus } \quad H_a: b \ne 1.
$$

State the statistical decision clearly. What is the practical implication of the testing result?

The hypotheses of our test are as follows:
$$ 
H_0: b = 1 (Kumaraswamy becomes a Power Distribution) /
H_a: b \neq 1 (Data follows another Kumaraswamy Distribution)
$$

The following code chunk is for the likelihood ratio test:

```{r}
# Likelihood Ratio Test For Power vs Kumaraswamy distribution

# MLE Of Power Distribution Parameter from Part A4
n <- length(water)
shape3.mle <- -n / sum(log(water))

# Kumaraswamy Log-Likelihood Function
loglike.kuma <- function(data, a, b) {
  n <- length(data)
 n*log(a) + n*log(b) + (a - 1)*sum(log(data)) + 
            (b - 1)*sum(log(1 - data^a)) 
}

# Power Log-Likelihood Function
loglike.power <- function(data, a) { 
  n <- length(data)
  n*log(a) + (a - 1)*sum(log(data))
}

# Unrestricted and Restricted MLEs for LR Test
loglike.alt1 <- loglike.kuma(water, shape1.est, shape2.est)
loglike.null1 <- loglike.power(water, shape3.mle)

# Likelihood ratio test statistic
LR1 <- 2*(loglike.alt1 - loglike.null1)

# p-value from chi-square(1)
pvalue.LR1 <- 1 - pchisq(LR1, df = 1)

# Test Statistic and P-value
LR1
pvalue.LR1
```
According to the output, the test statistic is $\chi^2 = 48.92533$, and the corresponding p-value is $2.65x10^-12$. Since the p-value is less than the level of significance, we can reject the null hypothesis and conclude that there is sufficient evidence to conclude that this data comes from another Kumaraswamy based distribution. To put it in practical terms, the power distribution is not a good fit for this data, meaning this sample is unlikely to come from a population modeled by a power distribution. This sample more than likely comes from a different Kumaraswamy based distribution.

\

### **Problem B3**:

Use the procedure and code from **Problem B1** to estimate the MLEs of $a$ and $b$, and then complete the following analyses:

(1). Obtain the bootstrap sampling distributions of $\hat{a}$ and $\hat{b}$ and plot each distribution using **Gaussian kernel density curves**.

First, let's get our bootstrap samples:
```{r}
# Bootstrap Sampling Distributions of Kumaraswamy MLEs

# Set Seed to Keep Same Sample
set.seed(1)

# Choose Number of Bootstrap Samples 
B <- 5000

# Store bootstrap estimates
boot.a <- NULL
boot.b <- NULL

for(i in 1:B) {
  
  # Bootstrap sample
  bootstrap.sample <- sample(water, size = n, replace = TRUE)
  
  # Refit Kumaraswamy model to bootstrap sample
  boot.fit <- optim(
    par = c(shape1.est, shape2.est),
    fn = kuma.loglik,
    gr = kuma.score,
    data = bootstrap.sample,
    method = "L-BFGS-B",
    lower = c(1e-8, 1e-8),
    control = list(
      maxit = 1000,
      fnscale = -1,
      trace = FALSE,
      abstol = 1e-8
    )
  )
  
  # Save bootstrap estimates
  boot.a[i] <- boot.fit$par[1]
  boot.b[i] <- boot.fit$par[2]
}
```

Let's plot both of the sampling distributions for the estimates of a
```{r}
# Plot bootstrap distribution of a_hat
hist(boot.a,
     probability = TRUE,
     breaks = 20, 
     col = "lightblue",
     main = "Bootstrap Sampling Distribution for Parameter a",
     xlab = expression(paste("Estimates for ", hat(a))))
lines(density(boot.a), col = "red", lwd = 2)

# Plot bootstrap distribution of b_hat
hist(boot.b,
     probability = TRUE,
     breaks = 20, 
     col = "lightblue",
     main = "Bootstrap Sampling Distribution for Parameter b",
     xlab = expression(paste("Estimates for ", (hat(b)))))
     lines(density(boot.b), col = "red", lwd = 2)
```
The plot for $\hat{a}$ appears to roughly resemble a normal curve. The kernel density curve also does fit relatively well. As for the plot for $\hat{b}$, it does not resemble a normal curve. There is some very obvious skew.

If we take a look at a summary of the distributions, we can clearly see that there is not too much spread for the distribution of $\hat{a}$. And the mean is relatively close to the median. But there is much more spread in the distribution for $\hat{b}$, and mean is farther from the median.
```{r}
summary(boot.a)
summary(boot.b)
```


(2).  Construct both the $95\%$ **bootstrap confidence interval** and the **Wald confidence interval** for $b$. Do these intervals agree with the results obtained in **Problem B2**? [*Compute the standard error of $\hat{b}$ using the observed Fisher information matrix, i.e., the inverse of the negative Hessian obtained from optim()*]

```{r}
# Find Bootstrap and Wald Confidence Intervals for b

# 95% Percentile Bootstrap Confidence Interval 
bootci.perc.b <- quantile(boot.b, probs = c(0.025, 0.975))

# Pieces to Build 95% Wald Confidence Interval
hessian <- mle.result.kuma$hessian
fisher <- solve(-hessian)
se.b.est <- sqrt(fisher[2, 2])
z <- qnorm(0.975)
error <- z*se.b.est

# Build Wald Confidence Interval
waldci.b <- c(shape2.est - error, shape2.est + error)


bootci.perc.b
waldci.b
```
The 95% Bootstrap Confidence Interval is (5.0066, 16.1889), and the 95% Wald confidence interval is (3,.484, 12.2827). Neither confidence interval contains 1, which suggests that the results agree with what we got in Part B2. 


(3). Based on the bootstrap sampling distributions from part (1) of this problem, assess whether the validity of the Wald confidence interval is supported.

Since the wald confidence interval is based on asymptotic normality, the distribution should resemble a normal distribution. However, it did not. There was considerable skew, and the summary pointed out that there is considerable spread in the distribution. Based on this criteria, we can conclude that the Wald confidence interval is not a good confidence interval to use in this case.

\

### **Problem B4**:

In the introduction to the working model for this exam, the Kumaraswamy distribution reduces to the uniform distribution on (0,1). In this problem, we perform a **likelihood ratio test** for the following hypothesis to assess whether the data come from the uniform distribution on (0,1):

$$
H_0: a = 1\quad \& \quad b = 1\quad \text{ versus } \quad H_a: a \ne 1 \quad \text{or} \quad b \ne 1 \quad \text{or}\quad (a \ne 1 \quad \& \quad b \ne 1).
$$

Provide a practical interpretation of the above test result. [*Hint: $H_a$ basically says that there is no constraints for $a$ and $b$. Please review the lecture note for module 11  on the likelihood ratio test before coding.*]

```{r}
# Likelihood Ratio test for Uniform vs Kumaraswamy Distribution

# Using Kumaraswamy Likelihood From Part B2 
loglike.alt2 <- loglike.kuma(water, shape1.est, shape2.est)
loglike.null2 <- loglike.kuma(water, 1, 1)

# Likelihood ratio test statistic
LR2 <- 2*(loglike.alt2 - loglike.null2)

# p-value from Chi-Square(1)
pvalue.LR2 <- 1 - pchisq(LR2, df = 1)

# Test Statistic and P-value
LR2
pvalue.LR2
```
The test statistic is $\chi^2 = 49.12542$, and the corresponding p-value is $2.4x10^{-12}$. Since the p-value is smaller than 0.05, then we can reject the null hypothesis, and can conclude that there is sufficient evidence to conclude that this data comes from another Kumaraswamy based distribution. In other words, this sample is unlikely to have originated from a population modeled by a uniform distribution. To put into practical terms, the measurements are all not equally likely to occur.



