Final Exam Guidelines

  • Coverage: The major concepts and inference procedures—such as sampling distributions, confidence intervals, and hypothesis testing—are covered and implemented using both classical parametric likelihood-based methods and modern non-parametric approaches, including the bootstrap and kernel density estimation.

  • Part A requires derivation of selected likelihood-based functions for performing various types of inference, with sufficient detail to enable translation of these derivations into code for numerical analysis.

  • Your code for the problems in Part B must align with your derivations in Part A and be well commented where necessary.

  • In Part B, all numerical results must be interpreted from a practical perspective.


Policies of Using AI Tools

  • Policy on AI Tool Use: Students must adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

  • Code Inclusion Requirement: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.


Working Model for the Final Exam

Caution: Please follow the suggested expressions and guided steps to complete the exam. Other approaches such as transformation for trivialize the problems that will not meet the exam objectives.

The Kumaraswamy distribution is a two-parameter continuous probability distribution defined on the interval (0, 1). It is often used as an alternative to the Beta distribution due to its simple closed-form expressions for the cumulative distribution function (CDF) and quantile function. It is commonly used in

  • Hydrology: Modeling rainfall, streamflow, or other bounded natural phenomena

  • Economics: Income shares, proportions, or bounded indices

  • Monte Carlo simulation: Efficient random variate generation (via inverse transform)

  • Machine learning: Output layer for bounded targets, prior distributions in Bayesian models

  • Reliability engineering: Modeling failure rates of systems with bounded lifetimes


Let \(X\) be the Kumaraswamy random variable with Cumulative Distribution Function (CDF)

\[ F(x; a, b) = 1 - (1 - x^a)^b \]

where \(a > 0\) and \(b > 0\) unknown parameters and \(0 < x < 1\).

The following are two special case of the Kumaraswamy distribution:

  1. Uniform Distribution: When \(a = 1\) and \(b = 1\), the Kumaraswamy distribution becomes a uniform distribution over \([0, 1]\) with CDF \(F(x) = x\).

  2. Power Distribution: when \(b = 1\) and \(a > 0\), the Kumaraswamy distribution becomes a power distribution over \([0, 1]\) with CDF \(F(x) = x^a\).

This final exam focuses on inferences of Kumaraswamy distribution and related data analysis.

Part A: Methodological Derivations


Show that the density function of the Kumaraswamy distribution is

\[ f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}. \]

Given the cumulative distribution function (CDF): \[ F(x; a,b) = 1 - (1 - x^a)^b \]

Differentiate with respect to (x): \[ f(x; a,b) = \frac{d}{dx}F(x) \] Applying the chain rule:

\[ f(x) = b(1 - x^a)^{b-1} \cdot ax^{a-1} \] \[ f(x) = ab , x^{a-1}(1 - x^a)^{b-1} \]

\[ f(x; a,b) = ab \cdot x^{a-1}(1 - x^a)^{b-1} \] ————————————————————————

Problem A2:

Let \(\{x_1, x_2, \cdots, x_n \}\) be an i.i.d. random sample taken from a population that follows the aove 2-parameter Kumaraswamy distribution. Write out the loglikelihood function of \(a\) and \(b\), denoted by \(\ell(a,b)\), based on the above random sample and derive the gradient vector \([\ell_a^\prime(a,b), \ell_b^\prime(a,b)]\), the first order partial derivative of the log-likelihood with respect to parameters \(a\) and \(b\).

Given a sample (x_1, …, x_n), the likelihood is:

\[ L(a,b) = \prod_{i=1}^n ab , x_i^{a-1}(1 - x_i^a)^{b-1} \]

Taking logs:

\[ \ell(a,b) = n\log a + n\log b + (a-1)\sum \log x_i + (b-1)\sum \log(1 - x_i^a) \]

Gradient with respect to (a):

\[ \frac{\partial \ell}{\partial a} = \frac{n}{a} + \sum \log x_i - (b-1)\sum \frac{x_i^a \log x_i}{1 - x_i^a} \]

Gradient with respect to (b):

\[ \frac{\partial \ell}{\partial b} = \frac{n}{b} + \sum \log(1 - x_i^a) \] ————————————————————————


Problem A3:

Based on the gradients functions obtained in the above problem A2, derive the observed Fisher Information matrix (i.e, the negative Hessian Matrix).

The observed Fisher information is defined as:

\[ \mathcal{I}(a,b) = - \nabla^2 \ell(a,b) \]

The second derivatives are:

\[ \frac{\partial^2 \ell}{\partial a^2} = -\frac{n}{a^2} \]

\[ \frac{\partial^2 \ell}{\partial b^2} = -\frac{n}{b^2} \]

\[ \frac{\partial^2 \ell}{\partial a \partial b} = \sum \frac{-x_i^a \log x_i}{1-x_i^a} \]



Problem A4:

Consider power distribution \(F(x) = x^a, (a >0 \quad \text{ and }\quad x \in (0,1))\), a special case of the Kumaraswamy distribution with \(b = 1\), and a random sample from this distribution \(\{ x_1, x_2, \cdots, x_n\}\). Derive the MLE and MME of \(a\) respectively. [Hint: To find the MME, you need to compute the moment of the power distribution; that is, \(E[X^k] = \int_0^1 x^k F'(x) dx\). Note that both the MLE and the MME have closed-form expressions.]

CDF: \[ f(x)=x^a \]

PDF: \[ f(x; a,1) = a x^{a-1} \]

MLE of : \[ \hat{a} = -\frac{n}{\sum \log x_i} \]

Method of Moments of : \[ \boxed{\hat{a}_{MM} = \frac{\bar{x}}{1-\bar{x}}} \] ————————————————————————

Problem A5:

Using the same setting as in Problem A4, find the asymptotic (Wald) confidence interval for \(a\). [Hint: Compute the Fisher information for \(a\), then take its reciprocal to obtain the variance.]

For the power distribution, the log-likelihood is

\[ \ell(a)=n\log(a)+(a-1)\sum_{i=1}^n\log(x_i). \]

The second derivative is

\[ \ell''(a)=-\frac{n}{a^2}. \]

Therefore, the Fisher information is

\[ I(a)=-\ell''(a)=\frac{n}{a^2}. \]

The estimated variance of \(\hat a\) is the reciprocal of the observed Fisher information evaluated at \(\hat a\):

\[ {Var}(\hat a)=\frac{ a^2}{n}. \]

Thus, the estimated standard error is

\[ SE(\hat a)=\frac{\hat a}{\sqrt n}. \]

The 95% Wald confidence interval for \(a\) is

\[ \boxed{\hat a \pm 1.96\frac{\hat a}{\sqrt n}}. \] ————————————————————————


Problem A6:

Using the same setting as in Problem A4, perform a likelihood ratio test for the hypothesis \(H_0 :a=1\) (i.e., the power distribution reduces to a uniform distribution). [Hint: Evaluate the log-likelihood function at the maximum likelihood estimate \(\hat{a}\) and at \(a=1\), then use these values to construct the LRT test statistic.]

The parameter (a) controls behavior near 0, while (b) controls behavior near 1. Together, they determine the skewness and shape of the distribution over ([0,1]).

We test

\[ H_0:a=1 \qquad \text{versus} \qquad H_a:a\ne 1. \]

For the power distribution, the likelihood ratio test statistic is

\[ G^2=2\left[\ell(\hat a)-\ell(a_0)\right], \]

where \(a_0=1\). Under \(H_0\), the asymptotic reference distribution is

\[ G^2 \sim \chi^2_1. \]

The decision rule at \(\alpha=0.05\) is to reject \(H_0\) if

\[ \boxed{G^2>\chi^2_{0.95,1}}. \]


Part B: Numerical Analysis

All code must be well commented and adhere to best coding practices

Working Dataset: A small reservoir supplies water to a town. During the dry season (50 days), engineers record the fraction of usable storage filled each morning. Values near 0 mean the reservoir is nearly empty; values near 1 mean it’s full. The distribution tends to be right‑skewed (mostly low levels due to drought) but with occasional replenishment.

The following 50 data points (ordered for clarity) represent the daily proportion of usable storage:

0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78


Problem B1:

Fit the Kumaraswamy distribution to the above data. Use the derivations in Problem A2 to find the MLE of \(a\) and \(b\). Please copy the key formulas before coding.

# Dataset
x <- c(0.12,0.14,0.15,0.16,0.17,0.18,0.19,0.20,0.21,0.22,
       0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.30,0.31,0.32,
       0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,
       0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.50,0.51,0.52,
       0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.60,0.61,0.78)

# Sample size
n <- length(x)

# Negative log-likelihood for Kumaraswamy distribution
loglik_kumar <- function(par) {
  
  # Extract parameters numerically
  a <- as.numeric(par[1])
  b <- as.numeric(par[2])
  
  # Parameter restrictions
  if (a <= 0 || b <= 0) {
    return(Inf)
  }
  
  # Log-likelihood
  ll <- n * log(a) +
        n * log(b) +
        (a - 1) * sum(log(x)) +
        (b - 1) * sum(log(1 - x^a))
  
  # Return negative log-likelihood for minimization
  return(-ll)
}

# Optimize
fit <- optim(
  par = c(1, 1),
  fn = loglik_kumar,
  hessian = TRUE
)

# MLEs
a_hat <- fit$par[1]
b_hat <- fit$par[2]

# Output
a_hat
[1] 2.529599
b_hat
[1] 7.883367


Problem B2:

Fit the power distribution to the above data using the derived of \(a\) obtained in Problem A4 to test the following hypothesis using likelihood ratio procedure ar significance level \(\alpha = 0.05\):

\[ H_0: b = 1 \quad \text{ versus } \quad H_a: b \ne 1. \]

State the statistical decision clearly. What is the practical implication of the testing result?

# Step 1: Fit power distribution (restricted model)
a_hat_power <- -n / sum(log(x))

# Step 2: Log-likelihoods
ll_full <- -fit$value

ll_restricted <- n*log(a_hat_power) +
                 (a_hat_power - 1)*sum(log(x))

# Step 3: LRT statistic
LRT <- 2 * (ll_full - ll_restricted)
p_value <- 1 - pchisq(LRT, df = 1)

cat("B2 Results:\n")
B2 Results:
cat("LRT statistic =", LRT, "\n")
LRT statistic = 48.92533 
cat("p-value =", p_value, "\n")
p-value = 2.658984e-12 
if (p_value < 0.05) {
  cat("Decision: Reject H0 (b ≠ 1)\n\n")
} else {
  cat("Decision: Fail to reject H0 (b = 1 is adequate)\n\n")
}
Decision: Reject H0 (b ≠ 1)


Problem B3:

Use the procedure and code from Problem B1 to estimate the MLEs of \(a\) and \(b\), and then complete the following analyses:

(1). Obtain the bootstrap sampling distributions of \(\hat{a}\) and \(\hat{b}\) and plot each distribution using Gaussian kernel density curves.

set.seed(123)
B <- 1000

a_boot <- numeric(B)
b_boot <- numeric(B)

for (i in 1:B) {
  
  # Resample with replacement
  xb <- sample(x, size = n, replace = TRUE)
  
  # Define log-likelihood for bootstrap sample
  loglik_boot <- function(par) {
    a <- par[1]
    b <- par[2]
    
    if (a <= 0 || b <= 0) return(Inf)
    
    ll <- length(xb)*log(a) + length(xb)*log(b) +
          (a - 1)*sum(log(xb)) +
          (b - 1)*sum(log(1 - xb^a))
    
    return(-ll)
  }
  
  fit_b <- optim(c(a_hat, b_hat), loglik_boot)
  
  a_boot[i] <- fit_b$par[1]
  b_boot[i] <- fit_b$par[2]
}

# Plot densities
plot(density(a_boot), main = "Bootstrap Density of a_hat")

plot(density(b_boot), main = "Bootstrap Density of b_hat")

(2). Construct both the \(95\%\) bootstrap confidence interval and the Wald confidence interval for \(b\). Do these intervals agree with the results obtained in Problem B2? [Compute the standard error of \(\hat{b}\) using the observed Fisher information matrix, i.e., the inverse of the negative Hessian obtained from optim()]

# Bootstrap percentile CI
boot_ci_b <- quantile(b_boot, c(0.025, 0.975))

# Wald CI using Hessian
H <- fit$hessian
vcov_matrix <- solve(H)

se_b <- sqrt(vcov_matrix[2,2])

wald_ci_b <- c(b_hat - 1.96*se_b,
               b_hat + 1.96*se_b)

cat("B3 Results:\n")
B3 Results:
cat("Bootstrap CI for b:\n")
Bootstrap CI for b:
print(boot_ci_b)
     2.5%     97.5% 
 5.127115 16.304358 
cat("\nWald CI for b:\n")

Wald CI for b:
print(wald_ci_b)
[1]  3.483944 12.282791

(3). Based on the bootstrap sampling distributions from part (1) of this problem, assess whether the validity of the Wald confidence interval is supported.

plot(density(b_boot),
     main = "Bootstrap Distribution of b_hat")
abline(v = wald_ci_b, col = "red", lty = 2)
abline(v = 1, col = "blue", lty = 2)

legend("topright",
       legend = c("Wald CI", "b = 1"),
       col = c("red", "blue"),
       lty = 2)


Problem B4:

In the introduction to the working model for this exam, the Kumaraswamy distribution reduces to the uniform distribution on (0,1). In this problem, we perform a likelihood ratio test for the following hypothesis to assess whether the data come from the uniform distribution on (0,1):

\[ H_0: a = 1\quad \& \quad b = 1\quad \text{ versus } \quad H_a: a \ne 1 \quad \text{or} \quad b \ne 1 \quad \text{or}\quad (a \ne 1 \quad \& \quad b \ne 1). \]

Provide a practical interpretation of the above test result. [Hint: \(H_a\) basically says that there is no constraints for \(a\) and \(b\). Please review the lecture note for module 11 on the likelihood ratio test before coding.]


# Likelihood under the null hypothesis: degenerate at 1
likelihood_null <- function(data) {
  if(all(data == 1)) {
    return(1)
  } else {
    return(0)
  }
}

# Likelihood under the alternative hypothesis: uniform on observed range
likelihood_alt <- function(data) {
  a_hat <- min(data)
  b_hat <- max(data)
  n <- length(data)
  
  # Check if data are within [a_hat, b_hat]
  if(any(data < a_hat) || any(data > b_hat)) {
    return(0)
  }
  
  # Likelihood for uniform distribution
  likelihood <- (1 / (b_hat - a_hat))^n
  return(likelihood)
}

# Calculate likelihoods
L_null <- likelihood_null(x)
L_alt <- likelihood_alt(x)

# Calculate likelihood ratio
if(L_null == 0) {
  LRT <- Inf  # Data incompatible with null
} else {
  LRT <- L_alt / L_null
}

# Output the results
cat("Likelihood under null (degenerate at 1):", L_null, "\n")
Likelihood under null (degenerate at 1): 0 
cat("Likelihood under alternative:", L_alt, "\n")
Likelihood under alternative: 1053909266 
cat("Likelihood Ratio (LRT):", LRT, "\n")
Likelihood Ratio (LRT): Inf 

Since the data clearly are not all 1, the likelihood under the null will be zero, making the likelihood ratio tend to infinity or undefined. This indicates strong evidence against the null hypothesis that the reservoir is always full.**

Note: Please download the template and insert your work into it to complete the exam.

---
title: "STA 506 Final Examination"
author: "Kieran Hefferan"
date: 'May 5, 2026 '
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    highlight: monochrome
    theme: spacelab
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: no
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
  markdown: 
    wrap: 72
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }

}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
####
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, you code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```

\

## **Final Exam Guidelines**

-   **Coverage**: The major concepts and inference procedures—such as
    sampling distributions, confidence intervals, and hypothesis
    testing—are covered and implemented using both classical parametric
    likelihood-based methods and modern non-parametric approaches,
    including the bootstrap and kernel density estimation.

-   **Part A** requires derivation of selected likelihood-based
    functions for performing various types of inference, with sufficient
    detail to enable translation of these derivations into code for
    numerical analysis.

-   Your code for the problems in **Part B** must align with your
    derivations in **Part A** and be well commented where necessary.

-   In **Part B**, all numerical results must be interpreted from a
    practical perspective.

\

## **Policies of Using AI Tools**

-   **Policy on AI Tool Use**: Students must adhere to the AI tool
    policy specified in the course syllabus. The direct copying of
    AI-generated content is strictly prohibited. All submitted work must
    reflect your own understanding; where external tools are consulted,
    content must be thoroughly rephrased and synthesized in your own
    words.

-   **Code Inclusion Requirement**: Any code included in your essay must
    be properly commented to explain the purpose and/or expected output
    of key code lines. Submitting AI-generated code without meaningful,
    student-added comments will not be accepted.

\

## **Working Model for the Final Exam**

<font color = "orange">**Caution**: *Please follow the suggested
expressions and guided steps to complete the exam. Other approaches such
as transformation for trivialize the problems that will not meet the
exam objectives.*</font>

The **Kumaraswamy distribution** is a two-parameter continuous
probability distribution defined on the interval (0, 1). It is often
used as an alternative to the Beta distribution due to its simple
closed-form expressions for the cumulative distribution function (CDF)
and quantile function. It is commonly used in

-   **Hydrology**: Modeling rainfall, streamflow, or other bounded
    natural phenomena

-   **Economics**: Income shares, proportions, or bounded indices

-   **Monte Carlo simulation**: Efficient random variate generation (via
    inverse transform)

-   **Machine learning**: Output layer for bounded targets, prior
    distributions in Bayesian models

-   **Reliability engineering**: Modeling failure rates of systems with
    bounded lifetimes

\

Let $X$ be the Kumaraswamy random variable with Cumulative Distribution
Function (CDF)

$$
F(x; a, b) = 1 - (1 - x^a)^b
$$

where $a > 0$ and $b > 0$ unknown parameters and $0 < x < 1$.

The following are two special case of the Kumaraswamy distribution:

1.  **Uniform Distribution**: When $a = 1$ and $b = 1$, the Kumaraswamy
    distribution becomes a uniform distribution over $[0, 1]$ with CDF
    $F(x) = x$.

2.  **Power Distribution**: when $b = 1$ and $a > 0$, the Kumaraswamy
    distribution becomes a power distribution over $[0, 1]$ with CDF
    $F(x) = x^a$.

This final exam focuses on inferences of Kumaraswamy distribution and
related data analysis.

## Part A: Methodological Derivations

\

Show that the density function of the Kumaraswamy distribution is

$$
f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}.
$$

Given the cumulative distribution function (CDF): 
$$
F(x; a,b) = 1 - (1 - x^a)^b
$$

Differentiate with respect to (x): 
$$
f(x; a,b) = \frac{d}{dx}F(x)
$$ 
Applying the chain rule:

$$
f(x) = b(1 - x^a)^{b-1} \cdot ax^{a-1}
$$
$$
f(x) = ab , x^{a-1}(1 - x^a)^{b-1} 
$$ 

$$
f(x; a,b) = ab \cdot x^{a-1}(1 - x^a)^{b-1}
$$
------------------------------------------------------------------------
\

## **Problem A2**: 
Let $\{x_1, x_2, \cdots, x_n \}$ be an i.i.d. random sample taken from a population that follows the aove 2-parameter Kumaraswamy distribution. Write out the loglikelihood function of $a$ and $b$, denoted by $\ell(a,b)$, based on the above random sample and **derive** the gradient vector $[\ell_a^\prime(a,b), \ell_b^\prime(a,b)]$, the first order partial derivative of the log-likelihood with respect to parameters $a$ and $b$.

Given a sample (x_1, ..., x_n), the likelihood is:

$$
L(a,b) = \prod_{i=1}^n ab , x_i^{a-1}(1 - x_i^a)^{b-1}
$$

Taking logs:

$$
\ell(a,b) = n\log a + n\log b + (a-1)\sum \log x_i + (b-1)\sum \log(1 - x_i^a)
$$

### Gradient with respect to (a):

$$
\frac{\partial \ell}{\partial a} = \frac{n}{a} + \sum \log x_i - (b-1)\sum \frac{x_i^a \log x_i}{1 - x_i^a}
$$

### Gradient with respect to (b):

$$
\frac{\partial \ell}{\partial b} = \frac{n}{b} + \sum \log(1 - x_i^a)
$$
------------------------------------------------------------------------

\

## **Problem A3**:

Based on the gradients functions obtained in the above problem A2,
**derive** the observed Fisher Information matrix (i.e, the negative
Hessian Matrix).

The observed Fisher information is defined as:

$$
\mathcal{I}(a,b) = - \nabla^2 \ell(a,b)
$$

The second derivatives are:

$$
\frac{\partial^2 \ell}{\partial a^2} = -\frac{n}{a^2}
$$

$$
\frac{\partial^2 \ell}{\partial b^2} = -\frac{n}{b^2}
$$

$$
\frac{\partial^2 \ell}{\partial a \partial b}
= \sum \frac{-x_i^a \log x_i}{1-x_i^a}
$$

------------------------------------------------------------------------

\

## **Problem A4**:

Consider power distribution
$F(x) = x^a, (a >0 \quad \text{ and }\quad x \in (0,1))$, a special case
of the Kumaraswamy distribution with $b = 1$, and a random sample from
this distribution $\{ x_1, x_2, \cdots, x_n\}$. **Derive** the MLE and
MME of $a$ respectively. [*Hint: To find the MME, you need to compute
the moment of the power distribution; that is,*
$E[X^k] = \int_0^1 x^k F'(x) dx$. Note that both the MLE and the MME
have closed-form expressions.]


CDF:
$$
f(x)=x^a
$$

PDF:
$$
f(x; a,1) = a x^{a-1}
$$

MLE of \hat{a}: 
$$
\hat{a} = -\frac{n}{\sum \log x_i}
$$

Method of Moments of \hat{a}:
$$
\boxed{\hat{a}_{MM} = \frac{\bar{x}}{1-\bar{x}}}
$$
------------------------------------------------------------------------
\

## **Problem A5**:

Using the same setting as in **Problem A4**, find the asymptotic (Wald)
confidence interval for $a$. [*Hint: Compute the Fisher information for*
$a$, then take its reciprocal to obtain the variance.]

For the power distribution, the log-likelihood is

$$
\ell(a)=n\log(a)+(a-1)\sum_{i=1}^n\log(x_i).
$$

The second derivative is

$$
\ell''(a)=-\frac{n}{a^2}.
$$

Therefore, the Fisher information is

$$
I(a)=-\ell''(a)=\frac{n}{a^2}.
$$

The estimated variance of \(\hat a\) is the reciprocal of the observed Fisher information evaluated at \(\hat a\):

$$
{Var}(\hat a)=\frac{ a^2}{n}.
$$

Thus, the estimated standard error is

$$
SE(\hat a)=\frac{\hat a}{\sqrt n}.
$$

The 95\% Wald confidence interval for \(a\) is

$$
\boxed{\hat a \pm 1.96\frac{\hat a}{\sqrt n}}.
$$
------------------------------------------------------------------------

\

## **Problem A6**:

Using the same setting as in **Problem A4**, perform a likelihood ratio
test for the hypothesis $H_0 :a=1$ (i.e., the power distribution reduces
to a uniform distribution). [*Hint: Evaluate the log-likelihood function
at the maximum likelihood estimate* $\hat{a}$ and at $a=1$, then use
these values to construct the LRT test statistic.]

The parameter (a) controls behavior near 0, while (b) controls behavior
near 1. Together, they determine the skewness and shape of the
distribution over ([0,1]).

We test

$$
H_0:a=1 \qquad \text{versus} \qquad H_a:a\ne 1.
$$

For the power distribution, the likelihood ratio test statistic is

$$
G^2=2\left[\ell(\hat a)-\ell(a_0)\right],
$$

where \(a_0=1\). Under \(H_0\), the asymptotic reference distribution is

$$
G^2 \sim \chi^2_1.
$$

The decision rule at \(\alpha=0.05\) is to reject \(H_0\) if

$$
\boxed{G^2>\chi^2_{0.95,1}}.
$$

------------------------------------------------------------------------

## Part B: Numerical Analysis

**All code must be well commented and adhere to best coding practices**

**Working Dataset**: A small reservoir supplies water to a town. During
the dry season (50 days), engineers record the fraction of usable
storage filled each morning. Values near 0 mean the reservoir is nearly
empty; values near 1 mean it's full. The distribution tends to be
right‑skewed (mostly low levels due to drought) but with occasional
replenishment.

The following 50 data points (ordered for clarity) represent the daily
proportion of usable storage:

```         
0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78
```

\

### **Problem B1**:

Fit the Kumaraswamy distribution to the above data. Use the derivations
in **Problem A2** to find the MLE of $a$ and $b$. Please copy the key
formulas before coding.

```{r}
# Dataset
x <- c(0.12,0.14,0.15,0.16,0.17,0.18,0.19,0.20,0.21,0.22,
       0.23,0.24,0.25,0.26,0.27,0.28,0.29,0.30,0.31,0.32,
       0.33,0.34,0.35,0.36,0.37,0.38,0.39,0.40,0.41,0.42,
       0.43,0.44,0.45,0.46,0.47,0.48,0.49,0.50,0.51,0.52,
       0.53,0.54,0.55,0.56,0.57,0.58,0.59,0.60,0.61,0.78)

# Sample size
n <- length(x)

# Negative log-likelihood for Kumaraswamy distribution
loglik_kumar <- function(par) {
  
  # Extract parameters numerically
  a <- as.numeric(par[1])
  b <- as.numeric(par[2])
  
  # Parameter restrictions
  if (a <= 0 || b <= 0) {
    return(Inf)
  }
  
  # Log-likelihood
  ll <- n * log(a) +
        n * log(b) +
        (a - 1) * sum(log(x)) +
        (b - 1) * sum(log(1 - x^a))
  
  # Return negative log-likelihood for minimization
  return(-ll)
}

# Optimize
fit <- optim(
  par = c(1, 1),
  fn = loglik_kumar,
  hessian = TRUE
)

# MLEs
a_hat <- fit$par[1]
b_hat <- fit$par[2]

# Output
a_hat
b_hat
```

\

### **Problem B2**:

Fit the **power distribution** to the above data using the derived of
$a$ obtained in **Problem A4** to test the following hypothesis using
likelihood ratio procedure ar significance level $\alpha = 0.05$:

$$
H_0: b = 1 \quad \text{ versus } \quad H_a: b \ne 1.
$$

State the statistical decision clearly. What is the practical
implication of the testing result?

```{r}
# Step 1: Fit power distribution (restricted model)
a_hat_power <- -n / sum(log(x))

# Step 2: Log-likelihoods
ll_full <- -fit$value

ll_restricted <- n*log(a_hat_power) +
                 (a_hat_power - 1)*sum(log(x))

# Step 3: LRT statistic
LRT <- 2 * (ll_full - ll_restricted)
p_value <- 1 - pchisq(LRT, df = 1)

cat("B2 Results:\n")
cat("LRT statistic =", LRT, "\n")
cat("p-value =", p_value, "\n")

if (p_value < 0.05) {
  cat("Decision: Reject H0 (b ≠ 1)\n\n")
} else {
  cat("Decision: Fail to reject H0 (b = 1 is adequate)\n\n")
}
```

\

### **Problem B3**:

Use the procedure and code from **Problem B1** to estimate the MLEs of
$a$ and $b$, and then complete the following analyses:

(1). Obtain the bootstrap sampling distributions of $\hat{a}$ and
$\hat{b}$ and plot each distribution using **Gaussian kernel density
curves**.

```{r}
set.seed(123)
B <- 1000

a_boot <- numeric(B)
b_boot <- numeric(B)

for (i in 1:B) {
  
  # Resample with replacement
  xb <- sample(x, size = n, replace = TRUE)
  
  # Define log-likelihood for bootstrap sample
  loglik_boot <- function(par) {
    a <- par[1]
    b <- par[2]
    
    if (a <= 0 || b <= 0) return(Inf)
    
    ll <- length(xb)*log(a) + length(xb)*log(b) +
          (a - 1)*sum(log(xb)) +
          (b - 1)*sum(log(1 - xb^a))
    
    return(-ll)
  }
  
  fit_b <- optim(c(a_hat, b_hat), loglik_boot)
  
  a_boot[i] <- fit_b$par[1]
  b_boot[i] <- fit_b$par[2]
}

# Plot densities
plot(density(a_boot), main = "Bootstrap Density of a_hat")
plot(density(b_boot), main = "Bootstrap Density of b_hat")
```

(2). Construct both the $95\%$ **bootstrap confidence interval** and the
**Wald confidence interval** for $b$. Do these intervals agree with the
results obtained in **Problem B2**? [*Compute the standard error of*
$\hat{b}$ using the observed Fisher information matrix, i.e., the
inverse of the negative Hessian obtained from optim()]

```{r}
# Bootstrap percentile CI
boot_ci_b <- quantile(b_boot, c(0.025, 0.975))

# Wald CI using Hessian
H <- fit$hessian
vcov_matrix <- solve(H)

se_b <- sqrt(vcov_matrix[2,2])

wald_ci_b <- c(b_hat - 1.96*se_b,
               b_hat + 1.96*se_b)

cat("B3 Results:\n")
cat("Bootstrap CI for b:\n")
print(boot_ci_b)

cat("\nWald CI for b:\n")
print(wald_ci_b)

```

(3). Based on the bootstrap sampling distributions from part (1) of this
problem, assess whether the validity of the Wald confidence interval is
supported.

```{r}
plot(density(b_boot),
     main = "Bootstrap Distribution of b_hat")
abline(v = wald_ci_b, col = "red", lty = 2)
abline(v = 1, col = "blue", lty = 2)

legend("topright",
       legend = c("Wald CI", "b = 1"),
       col = c("red", "blue"),
       lty = 2)
```

\

### **Problem B4**:

In the introduction to the working model for this exam, the Kumaraswamy
distribution reduces to the uniform distribution on (0,1). In this
problem, we perform a **likelihood ratio test** for the following
hypothesis to assess whether the data come from the uniform distribution
on (0,1):

$$
H_0: a = 1\quad \& \quad b = 1\quad \text{ versus } \quad H_a: a \ne 1 \quad \text{or} \quad b \ne 1 \quad \text{or}\quad (a \ne 1 \quad \& \quad b \ne 1).
$$

Provide a practical interpretation of the above test result. [*Hint:*
$H_a$ basically says that there is no constraints for $a$ and $b$.
Please review the lecture note for module 11 on the likelihood ratio
test before coding.]

\

```{r}
# Likelihood under the null hypothesis: degenerate at 1
likelihood_null <- function(data) {
  if(all(data == 1)) {
    return(1)
  } else {
    return(0)
  }
}

# Likelihood under the alternative hypothesis: uniform on observed range
likelihood_alt <- function(data) {
  a_hat <- min(data)
  b_hat <- max(data)
  n <- length(data)
  
  # Check if data are within [a_hat, b_hat]
  if(any(data < a_hat) || any(data > b_hat)) {
    return(0)
  }
  
  # Likelihood for uniform distribution
  likelihood <- (1 / (b_hat - a_hat))^n
  return(likelihood)
}

# Calculate likelihoods
L_null <- likelihood_null(x)
L_alt <- likelihood_alt(x)

# Calculate likelihood ratio
if(L_null == 0) {
  LRT <- Inf  # Data incompatible with null
} else {
  LRT <- L_alt / L_null
}

# Output the results
cat("Likelihood under null (degenerate at 1):", L_null, "\n")
cat("Likelihood under alternative:", L_alt, "\n")
cat("Likelihood Ratio (LRT):", LRT, "\n")
```

Since the data clearly are not all 1, the likelihood under the null will
be zero, making the likelihood ratio tend to infinity or undefined. This
indicates strong evidence against the null hypothesis that the reservoir
is always full.\*\*

<font color = "red">**Note**: Please download the template and insert
your work into it to complete the exam. </font>
