Intro and Setup
For this exam, we will be examining and utilizing the
Kumaraswamy distribution. This distribution is
continuous, bounded between 0 and 1, and defined by two parameters
(\(a\) and \(b\)).
With \(X\) denoting a Kumaraswamy
random variable, the CDF of our distribution is the following.
\[\Large F(x; a,b) = 1 - (1-x^a)^b
\]
Two special cases of the Kumaraswamy distribution are the uniform and
power distributions. A distribution is uniform if \(a, b\) = 1, while such is power if \(a > 0, b = 1\).
Part A: Methodological Derivations
A1.)
Given the distribution’s cumulative function, we can use
calculus-based methods to derive the probability density function (PDF).
Said calculations were completed by hand and can be seen below.
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A1.jpg")

Moving forward, we define our density function as the following.
\[\Large f(x; a, b) = ab \, x^{a-1} (1 -
x^a)^{b-1}\]
A2.)
Now having our distribution’s density function, we can denote our
log-likelihood, \(\ell(a,b)\), function
and subsequent gradient vector. The gradient vector is made up of the
partial derivatives of the log-likelihood function, first with respect
to \(a\) then with respect to \(b\). Below are calculations and ultimate
results.
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_3.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_4.jpg")

\[\Large \ell(a,b) = n\ln(a) + n\ln(b) +
\left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a)
\right]\]
\[\Large \ell_a^\prime(a,b) = \frac{n}{a}
+ \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times
\ln(x_i)}{1 - x_i^a} \right]\]
\[\Large \ell_b^\prime(a,b) = \frac{n}{b}
+ \sum\ln(1-x_i^a)\]
A3.)
Then taking the gradient vector components, I was able to calculate
the Fisher information matrix (\(I\)),
which is simply the negative of the Hessian matrix (composed of all
second-derivative extensions of that distribution’s log-likelihood). The
Fisher Information matrix for the Kumaraswamy distribution is below.
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_3.jpg")

A4.)
We now consider the power distribution, a special
and more simplistic case of the Kumaraswamy distribution, which occurs
when the parameter \(b\) = 1. Given the
Kumaraswamy formula, a value of \(b\) =
1 means that the relative density function of the power distribution is
\(ax^{a-1}\). Knowing this, below are
formulas which can be used to estimate the parameter \(a\) via moment or maximum likelihood
estimation (MOM or MLE).
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_3.jpg")

A5.)
Given an estimate of our parameter \(a\), (\(\hat{a}\)), we can setup a general
confidence interval (CI) formula for the value of \(a\). Assuming 95% confidence, therefore a
critical value of 1.96 predicated on the normal distribution, the makeup
of the CI would look like the following.
The variance and standard deviation of the interval were found from
the power distribution’s information matrix. The inverse of such matrix
serves as the variance when applied in single-parameter scenarios.
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_2.jpg")

A6.)
The likelihood ratio test involves the subtraction of the log
likelihood estimate of a simpler model (the uniform distribution in this
case) from the log likelihood estimate of a more complex and more
parameterized model (the power distribution in this scenario). This test
is is meant to measure the improvement in fit that the more complex and
unrestricted model provides.
Below shows the formula for what our test statistic, \(\Lambda\) would equal in this instance.
Since \(\ell(1) = 0\), it is relatively
straightforward. When performing a likelihood ratio test, the
distribution of \(\Lambda\) follows a
chi-squared distribution. Again assuming a 95% confidence level, we
would consider any value of \(\Lambda\)
greater than about 3.841 sufficient evidence to suggest there is value
in using the more complex power distribution.
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_2.jpg")

Part B: Numerical Analysis
# read in data
data = rbind.data.frame(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
colnames(data) = "storage"
The dataset we will analyze consists of 50 recorded usable storage
proportions of a reservoir in a small town, with a value of 0
representing a basically empty reservoir and a value of 1 representing a
full body of water. Summary statistics and a histogram of the data are
below.
sum_stats_table = function(variable){
table = cbind(
round(min(variable),2),
round(quantile(variable, 0.25),2),
round(median(variable),2),
round(mean(variable),2),
round(quantile(variable, 0.75),2),
round(max(variable),2)
)
colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
rownames(table) = NULL
return(table)
}
Summary_Table = sum_stats_table(variable = data$storage)
kable(Summary_Table,align = "c",
caption = "<span style='color:##000000;'>
Summary Stats of Storage Volumes </span>") %>%
kable_styling(
bootstrap_options = c("striped", "bordered"),
full_width = FALSE,
position = "center")
Summary Stats of Storage Volumes
|
Min
|
Q1
|
Med
|
Mean
|
Q3
|
Max
|
|
0.12
|
0.25
|
0.38
|
0.38
|
0.5
|
0.78
|
ggplot(data = data) +
geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
color = "black",
# argument color in geom_histogram is for color of bin lines
fill = "lightblue",
bins = 10,) +
labs(title = "Distribution of Storage Volumes",
x = "Storage Volume",
y = "Probabiity Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
) # this is the equivalent of box(lwd = 3) in base R

B1.)
Now using the available sample data, and assuming a
Kumaraswamy distribution is applicable in this case, I
will perform MLE to estimate parameters \(a\) and \(b\). Key formulas for such calculation are
the log likelihood and gradient functions seen below.
\[\Large \ell(a,b) = n\ln(a) + n\ln(b) +
\left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a)
\right]\]
\[\Large \ell_a^\prime(a,b) = \frac{n}{a}
+ \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times
\ln(x_i)}{1 - x_i^a} \right]\]
\[\Large \ell_b^\prime(a,b) = \frac{n}{b}
+ \sum\ln(1-x_i^a)\]
## first define log likelihood of Kumaraswamy distribution
log_like_kum = function(parameters, data){
a = parameters[1]
b = parameters[2]
n = length(data)
log_likelihood = n * log(a) +
n * log(b) +
(a-1) * sum(log(data)) +
(b-1) * sum(log(1-data^a))
return(log_likelihood)
}
## gradient functions
gradient_kum = function(parameters, data){
a = parameters[1]
b = parameters[2]
n = length(data)
## partial derivative with respect to a
partial_a = n/a +
sum(log(data)) -
(b-1)*sum((data^a*log(data))/(1-data^a))
## partial derivative with respect to b
partial_b = n/b + sum(log(1-data^a))
return(c(partial_a,
partial_b))
}
## now get an MLE estimate
mle_kum = optim(
par = c(a = 1.5, b = 5),
# tried MANY different starting estimates, continued to adjust based on histogram visual below.
fn = log_like_kum,
gr = gradient_kum,
data = data$storage,
method = "L-BFGS-B",
lower = c(1e-8, 1e-8), # using this method with lower restraints since a and b cannot be negative in a Kumaraswamy distribution
hessian = TRUE,
control = list(trace = FALSE,
fnscale = -1,
maxit = 500,
abstol = 1e-8)
)
mle_kum_a_estimate = mle_kum$par[1] # a
## controls behavior near 0
mle_kum_b_estimate = mle_kum$par[2] # b
## controls behavior approaching 1 / think right tail
Using R’s optim function, and
performing numerous iterations with different starting parameter
estimates of both \(a\) and \(b\), a reasonable estimate for the two
shape parameters are 2.53 and 7.883 respectively. Below is a visual of
our sample data with a Kumaraswamy distribution curve with said
parameter values additionally overlaid.
library(extraDistr) # needed to plot Kumaraswamy distribution curve
## visual confirmation
ggplot(data = data) +
geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
color = "black",
fill = "lightblue",
bins = 10,) +
stat_function(fun = dkumar,
args = list(a = mle_kum_a_estimate, b = mle_kum_b_estimate),
color = "black",
linewidth = 1.5,
xlim = range(0:1))+
labs(title = "Distribution of Storage Volumes",
x = "Storage Volume",
y = "Probabiity Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
)

B2.)
Going back to the power distribution, I once again
performed MLE to approximate the value of the power distribution’s sole
parameter, \(a\).
## same procedure as B1, now use power distribution characteristics
log_like_power = function(a, data){
n = length(data)
log_likelihood = n * log(a) +
(a-1)*sum(log(data))
return(log_likelihood)
}
gradient_power = function(a, data){
n = length(data)
partial_a = n/a + sum(log(data))
return(partial_a)
}
mle_power = optim(
par = 2.6, ## maybe change this later
fn = log_like_power,
gr = gradient_power,
data = data$storage,
method = "L-BFGS-B",
lower = c(1e-8, 1e-8),
hessian = TRUE,
control = list(trace = FALSE,
fnscale = -1,
maxit = 500,
abstol = 1e-8)
)
mle_power_a_estimate = mle_power$par
# mean(data$storage) = 0.378 = x bar
# x bar = a / (a+1) [from question A4 MME formula]
# 0.94/1.94 = 0.4845361
After getting an estimate of roughly \(a
\approx 0.939\), I went ahead and once again overlaid what such a
power distribution would look like relative to our available sample
data.
## visual confirmation
# Since R does not have a built-in dpower function, need to manually create one
manual_d_power = function(x, a){
a * x^(a-1) # = pdf of power distribution
}
ggplot(data = data) +
geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
color = "black",
fill = "lightblue",
bins = 10,) +
stat_function(fun = manual_d_power,
args = list(a = mle_power_a_estimate),
color = "black",
linewidth = 1.5,
xlim = range(0:1))+
labs(title = "Distribution of Storage Volumes",
x = "Storage Volume",
y = "Probabiity Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
)

Per these visuals, it appears that the Kumaraswamy distribution much
more accurately captures the reality of our broader distribution than
the power distribution does. To quantify this difference, we perform a
likelihood ratio test (LRT).
## if b = 1, then power distribution is applicable
## if b =/ 1, then Kumaraswamy distribution is applicable
# test stat for likelihood ratio test is uppercase Lambda
## evaluate at unrestricted (Kumaraswamy)
log_evaluation_kum = log_like_kum(parameters = c(mle_kum_a_estimate, mle_kum_b_estimate),
data = data$storage)
## evaluate at restricted/simpler (power)
log_evaluation_power = log_like_power(a = mle_power_a_estimate,
data = data$storage)
## the likelihood ratio test is just 2 *
# [log likelihood evaluated in unrestricted way] -
# [log likelihood evaluated in restricted or simple way]
Lambda = 2 * (log_evaluation_kum - log_evaluation_power)
p_value_likelihoodratio = 1 - pchisq(Lambda, df = 1)
# test stat distribution for likelihood ratio test is chi-square distributed
# df = number of parameters we are estimating in function
The results of the LRT were overwhelmingly in support of the notion
that the paramater \(b\) is
not equal to 1, hence we should approach the broader
population of reservoir storage from the lens of a Kumaraswamy
distribution, rather than a power one. Our statistical results were a
\(\Lambda\) value of about 48.925 and a
p-value approximating 0.
Given the large visual discrepancies in the histograms overlaying the
Kumaraswamy and power distributions, these results are not
surprising.
B3.)
Now continuing with the assumption of a Kumaraswamy
distribution, i.e. both \(a,b\)
equaling any positive value other than exactly one, we return to the
likelihood estimates from section B1; \(a\) = 2.53 and \(b\) = 7.883 respectively.
1.)
To reduce the impact of sample variation, I performed bootstrapping
to obtain a sampling distribution - composed of 500 samples - of both a
and b.
# use the mle_kum function from earlier to obtain an estimate of a and b for each sample
bootstrap_kum_estimates = function(data, b){
n = length(data)
boots_a = numeric(b) # create a vector with b number of "slots" to fill throughout bootstrap process
boots_b = numeric(b)
# START FOR LOOP
for (i in 1:b){
boot_sample = sample(data, size = n, replace = TRUE) # replacement is necessary
# taken from mle_kum function from earlier
# we are getting estimates for the KUMARASWAMY distribution
boot_mle_kum = optim(
par = c(a = 1.5, b = 5),
fn = log_like_kum,
gr = gradient_kum,
data = boot_sample, ## critical change for bootstrap scenario
method = "L-BFGS-B",
lower = c(1e-8, 1e-8),
hessian = TRUE,
control = list(trace = FALSE,
fnscale = -1,
maxit = 500,
abstol = 1e-8)
)
## now store results for each parameter
boots_a[i] = boot_mle_kum$par[1]
boots_b[i] = boot_mle_kum$par[2]
} # END FOR LOOP
## now output our parameter estimates; will sort and put them as a dataframe for easier interpretation
boot_a_estimates = sort(boots_a)
boot_b_estimates = sort(boots_b)
boot_a_estimates = cbind.data.frame(boot_a_estimates)
boot_b_estimates = cbind.data.frame(boot_b_estimates)
return(list(boot_a_estimates,
boot_b_estimates))
} # END FUNCTION
set.seed(100)
# now run function, using 500 bootstrap samples
boot_sampling_dist = bootstrap_kum_estimates(data = data$storage, b = 500)
boot_a_estimates = boot_sampling_dist[[1]] # a hat distribution
colnames(boot_a_estimates) = "a"
boot_b_estimates = boot_sampling_dist[[2]] # b hat distribution
colnames(boot_b_estimates) = "b"
The results of this bootstrapping sequence resulted in average
estimates of parameters a and b of about 2.63 and 9.057 (not far from
our original estimate of 2.53 and 7.883).
Below are visuals of each parameter’s sampling distribution from the
bootstrapping sequence, with a Gaussian density
curve layered atop. Bandwidths for each density curve were chosen
in accordance with Silverman’s standard formula.
## use custom function to find Gaussian bandwidth values for each sampling distribution
gaussian_bw = function(data){
min_part = min(sd(data),(IQR(data)/1.34))
bw = 0.9 * min_part * (length(data)^(-0.2))
return(bw)
}
#gaussian_bw(boot_a_estimates$a) = 0.08314841
#gaussian_bw(boot_b_estimates$b) = 0.6875727
## plot a hat estimates first
ggplot(data = boot_a_estimates) +
geom_histogram(mapping = aes(x = a, y = after_stat(density)),
color = "black",
fill = "lightblue",
bins = 10,) +
## the geom_density function in ggplot by default uses Gaussian estimation
geom_density(
aes(x = a),
color = "red",
linewidth = 2,
bw = gaussian_bw(boot_a_estimates$a)
) +
labs(title = "Distribution of A Hat Estimates",
x = "A Hat",
y = "Probabiity Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
)

## now plot b hat estimates
ggplot(data = boot_b_estimates) +
geom_histogram(mapping = aes(x = b, y = after_stat(density)),
color = "black",
fill = "lightblue",
bins = 10,) +
geom_density(
aes(x = b),
color = "red",
linewidth = 2,
bw = gaussian_bw(boot_b_estimates$b)
) +
labs(title = "Distribution of B Hat Estimates",
x = "B Hat",
y = "Probabiity Density") +
theme(
panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
)

2.)
Looking deeper into the \(b\)
parameter for the Kumaraswamy distribution, I constructed a bootstrap
confidence interval for its value.
## bootstrap CI
ci_percentile = function(data, b, alpha){
n = length(data)
boots_a = numeric(b) # create a vector with b number of "slots" to fill throughout bootstrap process
boots_b = numeric(b)
### code taken from B3 Question 1
# START FOR LOOP
for (i in 1:b){
boot_sample = sample(data, size = n, replace = TRUE) # replacement is necessary
# taken from mle_kum function from earlier
# we are getting estimates for the KUMARASWAMY distribution
boot_mle_kum = optim(
par = c(a = 1.5, b = 5),
fn = log_like_kum,
gr = gradient_kum,
data = boot_sample, ## critical change for bootstrap scenario
method = "L-BFGS-B",
lower = c(1e-8, 1e-8),
hessian = TRUE,
control = list(trace = FALSE,
fnscale = -1,
maxit = 500,
abstol = 1e-8)
)
## now store results for each parameter
boots_a[i] = boot_mle_kum$par[1] # not necessary for output in this scenario but will keep for now
boots_b[i] = boot_mle_kum$par[2]
} # END FOR LOOP
## now do CI calculation
boots_b = sort(boots_b)
lower_quant = (alpha/2)
higher_quant = (1-(alpha/2))
lower_estimate = quantile(boots_b, lower_quant)
higher_estimate = quantile(boots_b, higher_quant)
# now put together in a table for output
results = rbind(lower_estimate,
higher_estimate)
colnames(results) = "b_hat"
rownames(results) = c("Lower Estimate", "Higher Estimate")
return(results)
}
set.seed(100)
boot_ci = ci_percentile(data = data$storage, b = 500, alpha = 0.05)
With a 500-iteration bootstrap sequence and confidence level of about
95%, we can assert that the true value of the parameter \(b\) is approximately somewhere between
5.191 and 16.676. Knowing our initial, MLE-based, estimate for the
parameter was roughly 7.883, and the Kumaraswamy distribution is
right-skewed, these results seem plausible.
After performing the bootstrap sequence, I then constructed a Wald
confidence interval, which is predicated on a parameter’s point
estimate, critical value and standard error.
## Wald CI
# Wald formula = estimate +- z score * standard error
# estimate = mle_kum_b_estimate ~ 7.88
wald_z = 1.96
# let I denote Fisher matrix
I = -mle_kum$hessian
wald_se = sqrt(diag(solve(I))) # b ~ 2.245
# if we invert and solve the Fisher matrix, then square root the diagonals, we will get estimates of each parameter's standard error
wald_se_b = wald_se[2] # what we need right now
wald_lower_est = mle_kum_b_estimate - wald_z * wald_se_b
wald_higher_est = mle_kum_b_estimate + wald_z * wald_se_b
Our standard error for the estimate of \(b\) was derived from the Fisher matrix of
\(a\) and \(b\) which was created in R’s optim function previously.
As per Wald’s interval, we can say with about 95% certainty that the
true value of b resides between 3.484 and 12.283.
3.)
Although the Wald interval provides a more conservative estimate for
\(b\) than the bootstrap interval does,
it is still reasonably close enough to not rule out and keep in
consideration for population estimation purposes. With an original point
estimate for \(b\) of 7.883, that
estimate falls roughly in the middle of our Wald interval.
Alternatively, our bootstrap confidence interval (as we can see in our
histogram in section B3 Q1) more accurately recognizes the rightward
skew that the Kumaraswamy distribution typically carries.
B4.)
Finally, I complete another LRT to determine if there is significant
loss in accuracy when examining our broader population via a uniform
distribution lens rather than a Kumaraswamy one, using the available
sample data as a “benchmark” of such.
# if (a,b) = 1, then uniform distribution is applicable
# if at least one of a or b =/ 1, then distribution is not uniform
log_evaluation_uniform = log_like_kum(parameters = c(1,1), # a and b = 1 of a Kum distribution, that is same as uniform distribution
data = data$storage)
Lambda_B4 = 2 * (log_evaluation_kum - log_evaluation_uniform)
p_value_B4 = 1 - pchisq(Lambda_B4, df = 2)
# now estimating 2 parameters so df = 2
Unsurprisingly, the LRT results found that there is sufficient reason
to believe that the broader population of reservoir storage is
not uniformly distributed, sporting a test statistic of
about 49.125 and a p-value approaching zero. Given the estimates for
\(a\) and \(b\) throughout this analysis as well as
visualizations of the data, these findings align with what we have
observed so far.
---
title: "STA 506 Final Exam: Estimation w/ Kumaraswamy Distribution"
author: "Chris Bahm"
date: "May 5, 2026"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: false
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
    self_contained: false
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

if (!require(tidyverse)) {library(tidyvserse)} 

if (!require(GGally)) {library(GGally)} 

if (!require(kableExtra)) {library(kableExtra)} 

if (!require(ggplot2)) {library(ggplot2)} 

if (!require(car)) {library(car)} 

if (!require(dplyr)) {library(dplyr)} 

if (!require(pander)) {library(pander)} 

if (!require(forecast)) {library(forecast)} 

if (!require(lubridate)) {library(lubridate)} 

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE,
	fig.align = "center"
)
```

# Intro and Setup
For this exam, we will be examining and utilizing the **Kumaraswamy distribution**. This distribution is continuous, bounded between 0 and 1, and defined by two parameters ($a$ and $b$).

With $X$ denoting a Kumaraswamy random variable, the CDF of our distribution is the following.

$$\Large F(x; a,b) = 1 - (1-x^a)^b $$

Two special cases of the Kumaraswamy distribution are the uniform and power distributions. A distribution is uniform if $a, b$ = 1, while such is power if $a > 0, b = 1$.

<br>

# Part A: Methodological Derivations

## A1.)
Given the distribution's cumulative function, we can use calculus-based methods to derive the probability density function (PDF). Said calculations were completed by hand and can be seen below.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A1.jpg")
```

Moving forward, we define our density function as the following.
$$\Large  f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}$$

## A2.) 
Now having our distribution's density function, we can denote our log-likelihood, $\ell(a,b)$, function and subsequent gradient vector. The gradient vector is made up of the partial derivatives of the log-likelihood function, first with respect to $a$ then with respect to $b$. Below are calculations and ultimate results.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_3.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_4.jpg")
```


$$\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]$$

$$\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]$$

$$\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)$$


## A3.) 
Then taking the gradient vector components, I was able to calculate the Fisher information matrix ($I$), which is simply the negative of the Hessian matrix (composed of all second-derivative extensions of that distribution's log-likelihood). The Fisher Information matrix for the Kumaraswamy distribution is below.

```{r, out.width="50%", out.height="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_3.jpg")
```

## A4.)
We now consider the **power** distribution, a special and more simplistic case of the Kumaraswamy distribution, which occurs when the parameter $b$ = 1. Given the Kumaraswamy formula, a value of $b$ = 1 means that the relative density function of the power distribution is $ax^{a-1}$. Knowing this, below are formulas which can be used to estimate the parameter $a$ via moment or maximum likelihood estimation (MOM or MLE).

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_3.jpg")
```


## A5.)
Given an estimate of our parameter $a$, ($\hat{a}$), we can setup a general confidence interval (CI) formula for the value of $a$. Assuming 95% confidence, therefore a critical value of 1.96 predicated on the normal distribution, the makeup of the CI would look like the following. 

The variance and standard deviation of the interval were found from the power distribution's information matrix. The inverse of such matrix serves as the variance when applied in single-parameter scenarios.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_2.jpg")
```

## A6.)
The likelihood ratio test involves the subtraction of the log likelihood estimate of a simpler model (the uniform distribution in this case) from the log likelihood estimate of a more complex and more parameterized model (the power distribution in this scenario). This test is is meant to measure the improvement in fit that the more complex and unrestricted model provides. 

Below shows the formula for what our test statistic, $\Lambda$ would equal in this instance. Since $\ell(1) = 0$, it is relatively straightforward. When performing a likelihood ratio test, the distribution of $\Lambda$ follows a chi-squared distribution. Again assuming a 95% confidence level, we would consider any value of $\Lambda$ greater than about 3.841 sufficient evidence to suggest there is value in using the more complex power distribution.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_2.jpg")
```

<br>

# Part B: Numerical Analysis

```{r}
# read in data
data = rbind.data.frame(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
colnames(data) = "storage"
```
The dataset we will analyze consists of `r nrow(data)` recorded usable storage proportions of a reservoir in a small town, with a value of 0 representing a basically empty reservoir and a value of 1 representing a full body of water. Summary statistics and a histogram of the data are below.

```{r}
sum_stats_table = function(variable){
  table = cbind(
    round(min(variable),2),
    round(quantile(variable, 0.25),2),
    round(median(variable),2),
    round(mean(variable),2),
    round(quantile(variable, 0.75),2),
    round(max(variable),2)
  )
  colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
  rownames(table) = NULL
  return(table)
}

Summary_Table = sum_stats_table(variable = data$storage)
kable(Summary_Table,align = "c",
      caption = "<span style='color:##000000;'>
      Summary Stats of Storage Volumes </span>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) # this is the equivalent of box(lwd = 3) in base R
```

## B1.)
Now using the available sample data, and assuming a **Kumaraswamy distribution** is applicable in this case, I will perform MLE to estimate parameters $a$ and $b$. Key formulas for such calculation are the log likelihood and gradient functions seen below.

$$\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]$$

$$\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]$$

$$\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)$$

```{r}
## first define log likelihood of Kumaraswamy distribution
log_like_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    log_likelihood = n * log(a) + 
        n * log(b) + 
        (a-1) * sum(log(data)) + 
        (b-1) * sum(log(1-data^a))
    return(log_likelihood)
    }

## gradient functions
gradient_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    ## partial derivative with respect to a
    partial_a = n/a +
      sum(log(data)) -
      (b-1)*sum((data^a*log(data))/(1-data^a))
    
    ## partial derivative with respect to b 
    partial_b = n/b + sum(log(1-data^a))
    
    return(c(partial_a,
             partial_b))
  
    }

## now get an MLE estimate
mle_kum = optim(
    par = c(a = 1.5, b = 5),
      # tried MANY different starting estimates, continued to adjust based on histogram visual below.
    fn = log_like_kum,
    gr = gradient_kum,
    data = data$storage,
    method = "L-BFGS-B",
      lower = c(1e-8, 1e-8), # using this method with lower restraints since a and b cannot be negative in a Kumaraswamy distribution
    hessian = TRUE,
    control = list(trace = FALSE,
                   fnscale = -1,
                   maxit = 500,
                   abstol = 1e-8)
    )

mle_kum_a_estimate = mle_kum$par[1] # a
  ## controls behavior near 0
mle_kum_b_estimate = mle_kum$par[2] # b
  ## controls behavior approaching 1 / think right tail
```

Using R's <span style="color:blue;">optim</span> function, and performing numerous iterations with different starting parameter estimates of both $a$ and $b$, a reasonable estimate for the two shape parameters are `r round(mle_kum_a_estimate,3)` and `r round(mle_kum_b_estimate,3)` respectively. Below is a visual of our sample data with a Kumaraswamy distribution curve with said parameter values additionally overlaid.

```{r}
library(extraDistr) # needed to plot Kumaraswamy distribution curve

## visual confirmation
ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
      stat_function(fun = dkumar, 
                args = list(a = mle_kum_a_estimate, b = mle_kum_b_estimate),
                color = "black",
                linewidth = 1.5,
                xlim = range(0:1))+
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 
```

## B2.) 
Going back to the **power** distribution, I once again performed MLE to approximate the value of the power distribution's sole parameter, $a$.

```{r}
## same procedure as B1, now use power distribution characteristics
log_like_power = function(a, data){
      n = length(data)
      log_likelihood = n * log(a) +
        (a-1)*sum(log(data))
    return(log_likelihood)
    }

gradient_power = function(a, data){
    n = length(data)
    partial_a = n/a + sum(log(data))
    return(partial_a)
    }

mle_power = optim(
    par = 2.6, ## maybe change this later
    fn = log_like_power,
    gr = gradient_power,
    data = data$storage,
    method = "L-BFGS-B",
      lower = c(1e-8, 1e-8),
    hessian = TRUE,
    control = list(trace = FALSE,
                   fnscale = -1,
                   maxit = 500,
                   abstol = 1e-8)
    )

mle_power_a_estimate = mle_power$par 

# mean(data$storage) = 0.378 = x bar

# x bar = a / (a+1) [from question A4 MME formula]
  # 0.94/1.94 = 0.4845361
```
After getting an estimate of roughly $a \approx `r round(mle_power_a_estimate,3)`$, I went ahead and once again overlaid what such a power distribution would look like relative to our available sample data.

```{r}
## visual confirmation

# Since R does not have a built-in dpower function, need to manually create one
manual_d_power = function(x, a){
  a * x^(a-1) # = pdf of power distribution
}

ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
      stat_function(fun = manual_d_power, 
                args = list(a = mle_power_a_estimate),
                color = "black",
                linewidth = 1.5,
                xlim = range(0:1))+
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 
```

Per these visuals, it appears that the Kumaraswamy distribution much more accurately captures the reality of our broader distribution than the power distribution does. To quantify this difference, we perform a likelihood ratio test (LRT).

```{r}
## if b = 1, then power distribution is applicable
## if b =/ 1, then Kumaraswamy distribution is applicable

# test stat for likelihood ratio test is uppercase Lambda

## evaluate at unrestricted (Kumaraswamy)
log_evaluation_kum = log_like_kum(parameters = c(mle_kum_a_estimate, mle_kum_b_estimate),
                                  data = data$storage)

## evaluate at restricted/simpler (power)
log_evaluation_power = log_like_power(a = mle_power_a_estimate,
                                      data = data$storage)


## the likelihood ratio test is just 2 * 
  # [log likelihood evaluated in unrestricted way] -
  # [log likelihood evaluated in restricted or simple way]

Lambda = 2 * (log_evaluation_kum - log_evaluation_power)

p_value_likelihoodratio = 1 - pchisq(Lambda, df = 1)
  # test stat distribution for likelihood ratio test is chi-square distributed
  # df = number of parameters we are estimating in function

```
The results of the LRT were overwhelmingly in support of the notion that the paramater $b$ is **not** equal to 1, hence we should approach the broader population of reservoir storage from the lens of a Kumaraswamy distribution, rather than a power one. Our statistical results were a $\Lambda$ value of about `r round(Lambda,3)` and a p-value approximating `r round(p_value_likelihoodratio,6)`.

Given the large visual discrepancies in the histograms overlaying the Kumaraswamy and power distributions, these results are not surprising.

## B3.)
Now continuing with the assumption of a **Kumaraswamy** distribution, i.e. both $a,b$ equaling any positive value other than exactly one, we return to the likelihood estimates from section B1; $a$ = `r round(mle_kum_a_estimate,3)` and $b$ = `r round(mle_kum_b_estimate,3)` respectively.

### 1.)
To reduce the impact of sample variation, I performed bootstrapping to obtain a sampling distribution - composed of 500 samples - of both a and b.

```{r}
# use the mle_kum function from earlier to obtain an estimate of a and b for each sample

bootstrap_kum_estimates = function(data, b){
  n = length(data)
  boots_a = numeric(b) # create a vector with b number of "slots" to fill throughout bootstrap process
  boots_b = numeric(b)
  
  # START FOR LOOP
    for (i in 1:b){
        boot_sample = sample(data, size = n, replace = TRUE) # replacement is necessary 
        
        # taken from mle_kum function from earlier 
        # we are getting estimates for the KUMARASWAMY distribution
        boot_mle_kum = optim(
        par = c(a = 1.5, b = 5),
        fn = log_like_kum,
        gr = gradient_kum,
        data = boot_sample, ## critical change for bootstrap scenario
        method = "L-BFGS-B",
          lower = c(1e-8, 1e-8),
        hessian = TRUE,
        control = list(trace = FALSE,
                       fnscale = -1,
                       maxit = 500,
                       abstol = 1e-8)
      )
       
       ## now store results for each parameter
        boots_a[i] = boot_mle_kum$par[1]
        boots_b[i] = boot_mle_kum$par[2]
       
    } # END FOR LOOP
  
  ## now output our parameter estimates; will sort and put them as a dataframe for easier interpretation
  boot_a_estimates = sort(boots_a)
  boot_b_estimates = sort(boots_b)
  
  boot_a_estimates = cbind.data.frame(boot_a_estimates)
  boot_b_estimates = cbind.data.frame(boot_b_estimates)
  
  
  return(list(boot_a_estimates,
              boot_b_estimates))
  
} # END FUNCTION

set.seed(100)
# now run function, using 500 bootstrap samples
boot_sampling_dist = bootstrap_kum_estimates(data = data$storage, b = 500)

boot_a_estimates = boot_sampling_dist[[1]] # a hat distribution
  colnames(boot_a_estimates) = "a"
boot_b_estimates = boot_sampling_dist[[2]] # b hat distribution
  colnames(boot_b_estimates) = "b"
```
The results of this bootstrapping sequence resulted in average estimates of parameters a and b of about `r round(mean(boot_a_estimates$a),3)` and `r round(mean(boot_b_estimates$b),3)` (not far from our original estimate of `r round(mle_kum_a_estimate,3)` and `r round(mle_kum_b_estimate,3)`).

Below are visuals of each parameter's sampling distribution from the bootstrapping sequence, with a <span style="color:red;">Gaussian density curve</span> layered atop. Bandwidths for each density curve were chosen in accordance with Silverman’s standard formula.
```{r}
## use custom function to find Gaussian bandwidth values for each sampling distribution
gaussian_bw = function(data){
  min_part = min(sd(data),(IQR(data)/1.34))
  bw = 0.9 * min_part * (length(data)^(-0.2))
  return(bw)
}

#gaussian_bw(boot_a_estimates$a)  = 0.08314841
#gaussian_bw(boot_b_estimates$b)  = 0.6875727
```

```{r}
## plot a hat estimates first
ggplot(data = boot_a_estimates) +
  geom_histogram(mapping = aes(x = a, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
  ## the geom_density function in ggplot by default uses Gaussian estimation
  geom_density(
    aes(x = a),
    color = "red",
    linewidth = 2,
    bw = gaussian_bw(boot_a_estimates$a)
  ) + 
  labs(title = "Distribution of A Hat Estimates",
       x = "A Hat",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 
```

```{r}
## now plot b hat estimates 
ggplot(data = boot_b_estimates) +
  geom_histogram(mapping = aes(x = b, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
  geom_density(
    aes(x = b),
    color = "red",
    linewidth = 2,
    bw = gaussian_bw(boot_b_estimates$b)
  ) + 
  labs(title = "Distribution of B Hat Estimates",
       x = "B Hat",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 
```


### 2.)
Looking deeper into the $b$ parameter for the Kumaraswamy distribution, I constructed a bootstrap confidence interval for its value.

```{r}
## bootstrap CI
ci_percentile = function(data, b, alpha){
  n = length(data)
  boots_a = numeric(b) # create a vector with b number of "slots" to fill throughout bootstrap process
  boots_b = numeric(b)
  
  
  ### code taken from B3 Question 1
  # START FOR LOOP
    for (i in 1:b){
        boot_sample = sample(data, size = n, replace = TRUE) # replacement is necessary 
        
        # taken from mle_kum function from earlier 
        # we are getting estimates for the KUMARASWAMY distribution
        boot_mle_kum = optim(
        par = c(a = 1.5, b = 5),
        fn = log_like_kum,
        gr = gradient_kum,
        data = boot_sample, ## critical change for bootstrap scenario
        method = "L-BFGS-B",
          lower = c(1e-8, 1e-8),
        hessian = TRUE,
        control = list(trace = FALSE,
                       fnscale = -1,
                       maxit = 500,
                       abstol = 1e-8)
      )
       
       ## now store results for each parameter
        boots_a[i] = boot_mle_kum$par[1] # not necessary for output in this scenario but will keep for now
        boots_b[i] = boot_mle_kum$par[2]
       
    } # END FOR LOOP
  
  ## now do CI calculation
  boots_b = sort(boots_b)
  lower_quant = (alpha/2)
  higher_quant = (1-(alpha/2))
  lower_estimate = quantile(boots_b, lower_quant)
  higher_estimate = quantile(boots_b, higher_quant)
  
  # now put together in a table for output
  results = rbind(lower_estimate,
                  higher_estimate)
  colnames(results) = "b_hat"
  rownames(results) = c("Lower Estimate", "Higher Estimate")
  
  return(results)
}

set.seed(100)
boot_ci = ci_percentile(data = data$storage, b = 500, alpha = 0.05)
```
With a 500-iteration bootstrap sequence and confidence level of about 95%, we can assert that the true value of the parameter $b$ is approximately somewhere between `r round(boot_ci[1],3)` and `r round(boot_ci[2],3)`. Knowing our initial, MLE-based, estimate for the parameter was roughly `r round(mle_kum_b_estimate,3)`, and the Kumaraswamy distribution is right-skewed, these results seem plausible.

After performing the bootstrap sequence, I then constructed a Wald confidence interval, which is predicated on a parameter's point estimate, critical value and standard error. 

```{r}
## Wald CI

# Wald formula = estimate +- z score * standard error
  # estimate = mle_kum_b_estimate ~ 7.88

wald_z = 1.96
# let I denote Fisher matrix
I = -mle_kum$hessian

wald_se = sqrt(diag(solve(I))) # b ~ 2.245
  # if we invert and solve the Fisher matrix, then square root the diagonals, we will get estimates of each parameter's standard error

wald_se_b = wald_se[2] # what we need right now

wald_lower_est = mle_kum_b_estimate - wald_z * wald_se_b
wald_higher_est = mle_kum_b_estimate + wald_z * wald_se_b
```

Our standard error for the estimate of $b$ was derived from the Fisher matrix of $a$ and $b$ which was created in R's <span style="color:blue;">optim</span> function previously.

As per Wald's interval, we can say with about 95% certainty that the true value of b resides between `r round(wald_lower_est,3)` and `r round(wald_higher_est,3)`.

### 3.)
Although the Wald interval provides a more conservative estimate for $b$ than the bootstrap interval does, it is still reasonably close enough to not rule out and keep in consideration for population estimation purposes. With an original point estimate for $b$ of `r round(mle_kum_b_estimate,3)`, that estimate falls roughly in the middle of our Wald interval. Alternatively, our bootstrap confidence interval (as we can see in our histogram in section B3 Q1) more accurately recognizes the rightward skew that the Kumaraswamy distribution typically carries.

## B4.)
Finally, I complete another LRT to determine if there is significant loss in accuracy when examining our broader population via a uniform distribution lens rather than a Kumaraswamy one, using the available sample data as a "benchmark" of such.
```{r}
# if (a,b) = 1, then uniform distribution is applicable
# if at least one of a or b =/ 1, then distribution is not uniform

log_evaluation_uniform = log_like_kum(parameters = c(1,1), # a and b = 1 of a Kum distribution, that is same as uniform distribution
                                  data = data$storage)

Lambda_B4 = 2 * (log_evaluation_kum - log_evaluation_uniform) 

p_value_B4 = 1 - pchisq(Lambda_B4, df = 2)
  # now estimating 2 parameters so df = 2
```

Unsurprisingly, the LRT results found that there is sufficient reason to believe that the broader population of reservoir storage is **not** uniformly distributed, sporting a test statistic of about `r round(Lambda_B4,3)` and a p-value approaching zero. Given the estimates for $a$ and $b$ throughout this analysis as well as visualizations of the data, these findings align with what we have observed so far.
