Intro and Setup

For this exam, we will be examining and utilizing the Kumaraswamy distribution. This distribution is continuous, bounded between 0 and 1, and defined by two parameters (\(a\) and \(b\)).

With \(X\) denoting a Kumaraswamy random variable, the CDF of our distribution is the following.

\[\Large F(x; a,b) = 1 - (1-x^a)^b \]

Two special cases of the Kumaraswamy distribution are the uniform and power distributions. A distribution is uniform if \(a, b\) = 1, while such is power if \(a > 0, b = 1\).


Part A: Methodological Derivations

A1.)

Given the distribution’s cumulative function, we can use calculus-based methods to derive the probability density function (PDF). Said calculations were completed by hand and can be seen below.

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A1.jpg")

Moving forward, we define our density function as the following. \[\Large f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}\]

A2.)

Now having our distribution’s density function, we can denote our log-likelihood, \(\ell(a,b)\), function and subsequent gradient vector. The gradient vector is made up of the partial derivatives of the log-likelihood function, first with respect to \(a\) then with respect to \(b\). Below are calculations and ultimate results.

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_3.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_4.jpg")

\[\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]\]

\[\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]\]

\[\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)\]

A3.)

Then taking the gradient vector components, I was able to calculate the Fisher information matrix (\(I\)), which is simply the negative of the Hessian matrix (composed of all second-derivative extensions of that distribution’s log-likelihood). The Fisher Information matrix for the Kumaraswamy distribution is below.

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_3.jpg")

A4.)

We now consider the power distribution, a special and more simplistic case of the Kumaraswamy distribution, which occurs when the parameter \(b\) = 1. Given the Kumaraswamy formula, a value of \(b\) = 1 means that the relative density function of the power distribution is \(ax^{a-1}\). Knowing this, below are formulas which can be used to estimate the parameter \(a\) via moment or maximum likelihood estimation (MOM or MLE).

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_3.jpg")

A5.)

Given an estimate of our parameter \(a\), (\(\hat{a}\)), we can setup a general confidence interval (CI) formula for the value of \(a\). Assuming 95% confidence, therefore a critical value of 1.96 predicated on the normal distribution, the makeup of the CI would look like the following.

The variance and standard deviation of the interval were found from the power distribution’s information matrix. The inverse of such matrix serves as the variance when applied in single-parameter scenarios.

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_2.jpg")

A6.)

The likelihood ratio test involves the subtraction of the log likelihood estimate of a simpler model (the uniform distribution in this case) from the log likelihood estimate of a more complex and more parameterized model (the power distribution in this scenario). This test is is meant to measure the improvement in fit that the more complex and unrestricted model provides.

Below shows the formula for what our test statistic, \(\Lambda\) would equal in this instance. Since \(\ell(1) = 0\), it is relatively straightforward. When performing a likelihood ratio test, the distribution of \(\Lambda\) follows a chi-squared distribution. Again assuming a 95% confidence level, we would consider any value of \(\Lambda\) greater than about 3.841 sufficient evidence to suggest there is value in using the more complex power distribution.

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_2.jpg")


Part B: Numerical Analysis

# read in data
data = rbind.data.frame(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
colnames(data) = "storage"

The dataset we will analyze consists of 50 recorded usable storage proportions of a reservoir in a small town, with a value of 0 representing a basically empty reservoir and a value of 1 representing a full body of water. Summary statistics and a histogram of the data are below.

sum_stats_table = function(variable){
  table = cbind(
    round(min(variable),2),
    round(quantile(variable, 0.25),2),
    round(median(variable),2),
    round(mean(variable),2),
    round(quantile(variable, 0.75),2),
    round(max(variable),2)
  )
  colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
  rownames(table) = NULL
  return(table)
}

Summary_Table = sum_stats_table(variable = data$storage)
kable(Summary_Table,align = "c",
      caption = "<span style='color:##000000;'>
      Summary Stats of Storage Volumes </span>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")
Summary Stats of Storage Volumes
Min Q1 Med Mean Q3 Max
0.12 0.25 0.38 0.38 0.5 0.78
ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) # this is the equivalent of box(lwd = 3) in base R

B1.)

Now using the available sample data, and assuming a Kumaraswamy distribution is applicable in this case, I will perform MLE to estimate parameters \(a\) and \(b\). Key formulas for such calculation are the log likelihood and gradient functions seen below.

\[\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]\]

\[\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]\]

\[\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)\]

## first define log likelihood of Kumaraswamy distribution
log_like_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    log_likelihood = n * log(a) + 
      n * log(b) + 
      (a-1) * sum(log(data)) + 
      (b-1) * sum(log(1-data^a))
    }

## gradient functions
gradient_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    ## partial derivative with respect to a
    partial_a = n/a +
      sum(log(data)) -
      (b-1)*sum((data^a*log(data))/(1-data^a))
    
    ## partial derivative with respect to b 
    partial_b = n/b + sum(log(1-data^a))
    
    return(c(partial_a,
             partial_b))
  
    }

## now get an MLE estimate
mle_kum = optim(
    par = c(a = 1.5, b = 5),
      # tried MANY different starting estimates, continued to adjust based on histogram visual below.
    fn = log_like_kum,
    gr = gradient_kum,
    data = data$storage,
   # method = "L-BFGS-B",
    hessian = TRUE,
    control = list(trace = FALSE,
                   fnscale = -1,
                   maxit = 500,
                   abstol = 1e-8)
    )

mle_kum_a_estimate = mle_kum$par[1] # a
  ## controls behavior near 0
mle_kum_b_estimate = mle_kum$par[2] # b
  ## controls behavior approaching 1 / think right tail

Using R’s optim function, and performing numerous iterations with different starting parameter estimates of both \(a\) and \(b\), a reasonable estimate for the two shape parameters are 2 and 5 respectively. Below is a visual of our sample data with a Kumaraswamy distribution curve with said parameter values additionally overlaid.

library(extraDistr) # needed to plot Kumaraswamy distribution curve

## visual confirmation
ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
      stat_function(fun = dkumar, 
                args = list(a = mle_kum_a_estimate, b = mle_kum_b_estimate),
                color = "black",
                linewidth = 1.5,
                xlim = range(0:1))+
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 

B2.) PICK UP HERE

## same procedure as B1, now use power distribution characteristics
log_like_power = function(a, data){
    n = length(data)
    log_likelihood = n * log(a) +
      (a-1)*sum(log(data))
    }

gradient_power = function(a, data){
    partial_a = n/a + sum(log(data))
    return(partial_a)
    }

B3.)

B4.)

---
title: "STA 506 Final Exam: Estimation w/ Kumaraswamy Distribution"
author: "Chris Bahm"
date: "May 5, 2026"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: false
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
    self_contained: false
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

if (!require(tidyverse)) {library(tidyvserse)} 

if (!require(GGally)) {library(GGally)} 

if (!require(kableExtra)) {library(kableExtra)} 

if (!require(ggplot2)) {library(ggplot2)} 

if (!require(car)) {library(car)} 

if (!require(dplyr)) {library(dplyr)} 

if (!require(pander)) {library(pander)} 

if (!require(forecast)) {library(forecast)} 

if (!require(lubridate)) {library(lubridate)} 

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE,
	fig.align = "center"
)
```

# Intro and Setup
For this exam, we will be examining and utilizing the **Kumaraswamy distribution**. This distribution is continuous, bounded between 0 and 1, and defined by two parameters ($a$ and $b$).

With $X$ denoting a Kumaraswamy random variable, the CDF of our distribution is the following.

$$\Large F(x; a,b) = 1 - (1-x^a)^b $$

Two special cases of the Kumaraswamy distribution are the uniform and power distributions. A distribution is uniform if $a, b$ = 1, while such is power if $a > 0, b = 1$.

<br>

# Part A: Methodological Derivations

## A1.)
Given the distribution's cumulative function, we can use calculus-based methods to derive the probability density function (PDF). Said calculations were completed by hand and can be seen below.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A1.jpg")
```

Moving forward, we define our density function as the following.
$$\Large  f(x; a, b) = ab \, x^{a-1} (1 - x^a)^{b-1}$$

## A2.) 
Now having our distribution's density function, we can denote our log-likelihood, $\ell(a,b)$, function and subsequent gradient vector. The gradient vector is made up of the partial derivatives of the log-likelihood function, first with respect to $a$ then with respect to $b$. Below are calculations and ultimate results.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_3.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A2_4.jpg")
```


$$\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]$$

$$\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]$$

$$\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)$$


## A3.) 
Then taking the gradient vector components, I was able to calculate the Fisher information matrix ($I$), which is simply the negative of the Hessian matrix (composed of all second-derivative extensions of that distribution's log-likelihood). The Fisher Information matrix for the Kumaraswamy distribution is below.

```{r, out.width="50%", out.height="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A3_3.jpg")
```

## A4.)
We now consider the **power** distribution, a special and more simplistic case of the Kumaraswamy distribution, which occurs when the parameter $b$ = 1. Given the Kumaraswamy formula, a value of $b$ = 1 means that the relative density function of the power distribution is $ax^{a-1}$. Knowing this, below are formulas which can be used to estimate the parameter $a$ via moment or maximum likelihood estimation (MOM or MLE).

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_2.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A4_3.jpg")
```


## A5.)
Given an estimate of our parameter $a$, ($\hat{a}$), we can setup a general confidence interval (CI) formula for the value of $a$. Assuming 95% confidence, therefore a critical value of 1.96 predicated on the normal distribution, the makeup of the CI would look like the following. 

The variance and standard deviation of the interval were found from the power distribution's information matrix. The inverse of such matrix serves as the variance when applied in single-parameter scenarios.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A5_2.jpg")
```

## A6.)
The likelihood ratio test involves the subtraction of the log likelihood estimate of a simpler model (the uniform distribution in this case) from the log likelihood estimate of a more complex and more parameterized model (the power distribution in this scenario). This test is is meant to measure the improvement in fit that the more complex and unrestricted model provides. 

Below shows the formula for what our test statistic, $\Lambda$ would equal in this instance. Since $\ell(1) = 0$, it is relatively straightforward. When performing a likelihood ratio test, the distribution of $\Lambda$ follows a chi-squared distribution. Again assuming a 95% confidence level, we would consider any value of $\Lambda$ greater than about 3.841 sufficient evidence to suggest there is value in using the more complex power distribution.

```{r, out.width="50%"}
knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_1.jpg")

knitr::include_graphics("/Users/chrisbahm/Library/Mobile Documents/com~apple~CloudDocs/West Chester/Spring 2026/STA 506 - Grad Math Stat 2 /STA 506 R Folder/Final/A6_2.jpg")
```

<br>

# Part B: Numerical Analysis

```{r}
# read in data
data = rbind.data.frame(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
colnames(data) = "storage"
```
The dataset we will analyze consists of `r nrow(data)` recorded usable storage proportions of a reservoir in a small town, with a value of 0 representing a basically empty reservoir and a value of 1 representing a full body of water. Summary statistics and a histogram of the data are below.

```{r}
sum_stats_table = function(variable){
  table = cbind(
    round(min(variable),2),
    round(quantile(variable, 0.25),2),
    round(median(variable),2),
    round(mean(variable),2),
    round(quantile(variable, 0.75),2),
    round(max(variable),2)
  )
  colnames(table) = c("Min", "Q1", "Med", "Mean", "Q3", "Max")
  rownames(table) = NULL
  return(table)
}

Summary_Table = sum_stats_table(variable = data$storage)
kable(Summary_Table,align = "c",
      caption = "<span style='color:##000000;'>
      Summary Stats of Storage Volumes </span>") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                  # argument color in geom_histogram is for color of bin lines
                 fill = "lightblue",
                 bins = 10,) +
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) # this is the equivalent of box(lwd = 3) in base R
```

## B1.)
Now using the available sample data, and assuming a **Kumaraswamy distribution** is applicable in this case, I will perform MLE to estimate parameters $a$ and $b$. Key formulas for such calculation are the log likelihood and gradient functions seen below.

$$\Large \ell(a,b) = n\ln(a) + n\ln(b) + \left[ (a-1)\sum \ln(x_i) \right] + \left[ (b-1)\sum\ln(1-x_i^a) \right]$$

$$\Large \ell_a^\prime(a,b) = \frac{n}{a} + \left[ \sum\ln(x_i) \right] - \left[(b-1)\sum\frac{x_i^a \times \ln(x_i)}{1 - x_i^a} \right]$$

$$\Large \ell_b^\prime(a,b) = \frac{n}{b} + \sum\ln(1-x_i^a)$$

```{r}
## first define log likelihood of Kumaraswamy distribution
log_like_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    log_likelihood = n * log(a) + 
      n * log(b) + 
      (a-1) * sum(log(data)) + 
      (b-1) * sum(log(1-data^a))
    }

## gradient functions
gradient_kum = function(parameters, data){
    a = parameters[1]
    b = parameters[2]
    n = length(data)
    
    ## partial derivative with respect to a
    partial_a = n/a +
      sum(log(data)) -
      (b-1)*sum((data^a*log(data))/(1-data^a))
    
    ## partial derivative with respect to b 
    partial_b = n/b + sum(log(1-data^a))
    
    return(c(partial_a,
             partial_b))
  
    }

## now get an MLE estimate
mle_kum = optim(
    par = c(a = 1.5, b = 5),
      # tried MANY different starting estimates, continued to adjust based on histogram visual below.
    fn = log_like_kum,
    gr = gradient_kum,
    data = data$storage,
   # method = "L-BFGS-B",
    hessian = TRUE,
    control = list(trace = FALSE,
                   fnscale = -1,
                   maxit = 500,
                   abstol = 1e-8)
    )

mle_kum_a_estimate = mle_kum$par[1] # a
  ## controls behavior near 0
mle_kum_b_estimate = mle_kum$par[2] # b
  ## controls behavior approaching 1 / think right tail
```

Using R's <span style="color:blue;">optim</span> function, and performing numerous iterations with different starting parameter estimates of both $a$ and $b$, a reasonable estimate for the two shape parameters are `r mle_kum_a_estimate` and `r mle_kum_b_estimate` respectively. Below is a visual of our sample data with a Kumaraswamy distribution curve with said parameter values additionally overlaid.

```{r}
library(extraDistr) # needed to plot Kumaraswamy distribution curve

## visual confirmation
ggplot(data = data) +
  geom_histogram(mapping = aes(x = storage, y = after_stat(density)),
                 color = "black",
                 fill = "lightblue",
                 bins = 10,) +
      stat_function(fun = dkumar, 
                args = list(a = mle_kum_a_estimate, b = mle_kum_b_estimate),
                color = "black",
                linewidth = 1.5,
                xlim = range(0:1))+
  labs(title = "Distribution of Storage Volumes",
       x = "Storage Volume",
       y = "Probabiity Density") + 
  theme(
    panel.border = element_rect(color = "black", fill = NA, linewidth = 3)
  ) 
```

## B2.) PICK UP HERE

```{r}
## same procedure as B1, now use power distribution characteristics
log_like_power = function(a, data){
    n = length(data)
    log_likelihood = n * log(a) +
      (a-1)*sum(log(data))
    }

gradient_power = function(a, data){
    partial_a = n/a + sum(log(data))
    return(partial_a)
    }



```

## B3.)

```{r}


```


## B4.)

```{r}


```

