The probability density function (PDF) is obtained by differentiating the cumulative distribution function (CDF) with respect to \(x\).
Given the Kumaraswamy CDF:
\[F(x; a, b) = 1 - (1 - x^a)^b\]
we apply the chain rule to find \(f(x; a, b) = \frac{d}{dx}F(x; a, b)\).
Step 1: Differentiate the outer function, treating \(u = 1 - x^a\):
\[\frac{d}{dx}\left[1 - (1 - x^a)^b\right] = -b(1 - x^a)^{b-1} \cdot \frac{d}{dx}(1 - x^a)\]
Step 2: Differentiate the inner function \(u = 1 - x^a\):
\[\frac{d}{dx}(1 - x^a) = -ax^{a-1}\]
Step 3: Substitute back and simplify:
\[f(x; a, b) = -b(1 - x^a)^{b-1} \cdot (-ax^{a-1})\]
\[\boxed{f(x; a, b) = ab\, x^{a-1}(1 - x^a)^{b-1}}\]
This result is valid for \(0 < x < 1\), with shape parameters \(a > 0\) and \(b > 0\). The two negatives cancel, confirming that \(f(x; a, b) \geq 0\) over its support, as required for a valid PDF.
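As a quick sanity check, this density can be verified numerically in R. The sketch below uses our own helper dkuma (not a base R function) with illustrative shape values \(a = 2\), \(b = 3\), and confirms that the density integrates to 1 over \((0, 1)\):
# Kumaraswamy PDF; dkuma is an illustrative helper, not a base R function
dkuma <- function(x, a, b) a * b * x^(a - 1) * (1 - x^a)^(b - 1)
# Numerical check that the density integrates to 1 (should return ~1)
integrate(dkuma, lower = 0, upper = 1, a = 2, b = 3)$value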
Given an i.i.d. random sample \(\{x_1, x_2, \dots, x_n\}\) from the Kumaraswamy distribution with PDF \(f(x; a, b) = ab\,x^{a-1}(1-x^a)^{b-1}\), the joint likelihood function is the product of the individual densities:
\[L(a, b) = \prod_{i=1}^{n} ab\, x_i^{a-1}(1 - x_i^a)^{b-1}\]
Taking the natural logarithm and applying log rules (\(\ln(ABC) = \ln A + \ln B + \ln C\)), the log-likelihood function is:
\[\ell(a, b) = \sum_{i=1}^{n} \left[\ln a + \ln b + (a-1)\ln x_i + (b-1)\ln(1 - x_i^a)\right]\]
This simplifies to:
\[\ell(a, b) = n\ln a + n\ln b + (a-1)\sum_{i=1}^{n}\ln x_i + (b-1)\sum_{i=1}^{n}\ln(1 - x_i^a)\]
We derive the first-order partial derivatives with respect to each parameter.
Partial derivative with respect to \(a\):
Differentiating term by term: the first term contributes \(\frac{n}{a}\) (the \(n\ln b\) term does not depend on \(a\)), the third contributes \(\sum_{i=1}^{n}\ln x_i\), and the fourth requires the chain rule since \(x_i^a\) depends on \(a\), noting that \(\frac{\partial}{\partial a}(1 - x_i^a) = -x_i^a \ln x_i\):
\[\ell'_a(a, b) = \frac{n}{a} + \sum_{i=1}^{n}\ln x_i + (b-1)\sum_{i=1}^{n}\frac{-x_i^a \ln x_i}{1 - x_i^a}\]
Partial derivative with respect to \(b\):
Only the second and fourth terms depend on \(b\):
\[\ell'_b(a, b) = \frac{n}{b} + \sum_{i=1}^{n}\ln(1 - x_i^a)\]
The gradient vector is therefore:
\[\nabla \ell(a,b) = \left[\ell'_a(a,b),\ \ell'_b(a,b)\right] = \left[\frac{n}{a} + \sum_{i=1}^{n}\ln x_i - (b-1)\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1 - x_i^a},\ \frac{n}{b} + \sum_{i=1}^{n}\ln(1 - x_i^a)\right]\]
The observed Fisher Information Matrix is the negative of the Hessian matrix of the log-likelihood function, i.e.:
\[\mathcal{I}(a,b) = -\mathbf{H} = -\begin{bmatrix} \frac{\partial^2 \ell}{\partial a^2} & \frac{\partial^2 \ell}{\partial a \partial b} \\ \frac{\partial^2 \ell}{\partial b \partial a} & \frac{\partial^2 \ell}{\partial b^2} \end{bmatrix}\]
We differentiate each component of the gradient vector derived in Problem A2.
Second derivative with respect to \(a\):
Differentiating \(\ell'_a(a,b) = \frac{n}{a} + \sum_{i=1}^{n}\ln x_i - (b-1)\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1 - x_i^a}\) with respect to \(a\), and applying the quotient rule to the last term with \(\frac{\partial}{\partial a}(x_i^a \ln x_i) = x_i^a(\ln x_i)^2\):
\[\frac{\partial^2 \ell}{\partial a^2} = -\frac{n}{a^2} - (b-1)\sum_{i=1}^{n}\frac{x_i^a(\ln x_i)^2}{(1 - x_i^a)^2}\]
Second derivative with respect to \(b\):
Differentiating \(\ell'_b(a,b) = \frac{n}{b} + \sum_{i=1}^{n}\ln(1 - x_i^a)\) with respect to \(b\):
\[\frac{\partial^2 \ell}{\partial b^2} = -\frac{n}{b^2}\]
Mixed partial derivative:
Differentiating \(\ell'_a(a,b)\) with respect to \(b\), only the last term depends on \(b\):
\[\frac{\partial^2 \ell}{\partial a \partial b} = -\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1 - x_i^a}\]
Negating all second derivatives to form \(\mathcal{I}(a,b) = -\mathbf{H}\):
\[\mathcal{I}(a,b) = \begin{bmatrix} \frac{n}{a^2} + (b-1)\displaystyle\sum_{i=1}^{n}\frac{x_i^a(\ln x_i)^2}{(1-x_i^a)^2} & \displaystyle\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1-x_i^a} \\[10pt] \displaystyle\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1-x_i^a} & \frac{n}{b^2} \end{bmatrix}\]
Note that the matrix is symmetric, as expected, since \(\frac{\partial^2 \ell}{\partial a \partial b} = \frac{\partial^2 \ell}{\partial b \partial a}\). The diagonal entries represent the curvature of the log-likelihood with respect to each parameter individually, while the off-diagonal entries capture the cross-parameter curvature.
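For reference, the matrix above translates directly into R. The sketch below uses our own helper kuma_obs_info (an assumption, not part of any package) to evaluate the observed information at given parameter values and data; it can be compared against the numeric Hessian that optim() produces later in Problem B3:
# Observed Fisher Information for the Kumaraswamy model (illustrative helper)
kuma_obs_info <- function(a, b, x) {
  n  <- length(x)
  lx <- log(x)
  xa <- x^a
  I_aa <- n / a^2 + (b - 1) * sum(xa * lx^2 / (1 - xa)^2)
  I_ab <- sum(xa * lx / (1 - xa))
  I_bb <- n / b^2
  matrix(c(I_aa, I_ab, I_ab, I_bb), nrow = 2)
}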
When \(b = 1\), the Kumaraswamy distribution reduces to the power distribution with CDF \(F(x) = x^a\), \(a > 0\), \(x \in (0,1)\).
Substituting \(b = 1\) into the log-likelihood derived in Problem A2, the \(n\ln b\) term vanishes since \(\ln 1 = 0\) and the final term vanishes since \((b-1) = 0\), giving:
\[\ell(a) = n\ln a + (a-1)\sum_{i=1}^{n}\ln x_i\]
Taking the first derivative and setting it equal to zero:
\[\ell'_a(a) = \frac{n}{a} + \sum_{i=1}^{n}\ln x_i = 0\]
Solving for \(a\):
\[\hat{a}_{MLE} = \frac{-n}{\sum_{i=1}^{n}\ln x_i}\]
Since \(0 < x_i < 1\), each \(\ln x_i < 0\), ensuring \(\hat{a}_{MLE} > 0\) as required.
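In R, this closed-form estimator is a one-liner (a sketch, assuming x holds a sample from \((0,1)\)):
# Closed-form MLE of a under the power distribution (b = 1)
a_hat_mle <- -length(x) / sum(log(x))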
The \(k\)-th theoretical moment of the power distribution is:
\[E[X^k] = \int_0^1 x^k F'(x)\,dx = \int_0^1 x^k \cdot ax^{a-1}\,dx = a\int_0^1 x^{k+a-1}\,dx = \frac{a}{k+a}\]
Setting \(k = 1\) to obtain the first moment:
\[E[X] = \frac{a}{1+a}\]
Matching this to the sample mean \(\bar{x}\) and solving for \(a\):
\[\frac{a}{1+a} = \bar{x} \implies a = \bar{x} + \bar{x}a \implies a(1-\bar{x}) = \bar{x}\]
\[\hat{a}_{MME} = \frac{\bar{x}}{1-\bar{x}}\]
Both estimators have closed-form expressions, making them computationally straightforward to evaluate.
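A minimal simulation sketch comparing the two estimators, assuming data drawn from a power distribution by inverse-transform sampling (if \(U \sim \text{Uniform}(0,1)\), then \(U^{1/a}\) has CDF \(x^a\)); both estimates should land near the true value:
set.seed(1)
a_true <- 3
x_sim  <- runif(500)^(1 / a_true)  # inverse-transform draws from F(x) = x^a
c(MLE = -length(x_sim) / sum(log(x_sim)),  # -n / sum(log x)
  MME = mean(x_sim) / (1 - mean(x_sim)))   # xbar / (1 - xbar)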
Under the power distribution setting (\(b = 1\)), we have a single parameter \(a\). From the second derivative derived in Problem A3, setting \(b = 1\) causes the second term to vanish:
\[\frac{\partial^2 \ell}{\partial a^2}\Bigg|_{b=1} = -\frac{n}{a^2}\]
The observed Fisher Information is the negative of this second derivative:
\[\mathcal{I}(a) = -\frac{\partial^2 \ell}{\partial a^2} = \frac{n}{a^2}\]
By the asymptotic theory of MLEs, the asymptotic variance of \(\hat{a}_{MLE}\) is the reciprocal of the Fisher Information:
\[Var(\hat{a}_{MLE}) = \frac{1}{\mathcal{I}(a)} = \frac{a^2}{n}\]
In practice, we substitute \(\hat{a}_{MLE}\) for the unknown \(a\):
\[\widehat{Var}(\hat{a}_{MLE}) = \frac{\hat{a}_{MLE}^2}{n}\]
By the asymptotic normality of the MLE, the \(95\%\) Wald confidence interval for \(a\) is:
\[\hat{a}_{MLE} \pm z_{\alpha/2} \cdot \sqrt{\widehat{Var}(\hat{a}_{MLE})}\]
Substituting the estimated variance:
\[\hat{a}_{MLE} \pm z_{\alpha/2} \cdot \frac{\hat{a}_{MLE}}{\sqrt{n}}\]
where \(z_{\alpha/2} = 1.96\) for a \(95\%\) confidence level. This interval is valid asymptotically, relying on the MLE being approximately normally distributed for large \(n\).
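Given a sample x, the interval takes two lines of R (a sketch using the plug-in variance above; a_hat_pow is our own name for the Problem A4 estimate):
a_hat_pow <- -length(x) / sum(log(x))  # MLE from Problem A4
a_hat_pow + c(-1, 1) * qnorm(0.975) * a_hat_pow / sqrt(length(x))  # 95% Wald CI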
The likelihood ratio test statistic is defined as:
\[\Lambda = 2\left[\ell(\hat{a}) - \ell(a_0)\right]\]
where \(\hat{a} = \hat{a}_{MLE}\) is the unrestricted MLE derived in Problem A4, and \(a_0 = 1\) is the restricted value under \(H_0\!: a = 1\) (tested against \(H_a\!: a \neq 1\)).
Let \(S = \sum_{i=1}^{n}\ln x_i\) for notational convenience. Substituting \(\hat{a}_{MLE} = \frac{-n}{S}\) into the log-likelihood:
\[\ell(\hat{a}) = n\ln\left(\frac{-n}{S}\right) + \left(\frac{-n}{S} - 1\right)S = n\ln\left(\frac{-n}{S}\right) - n - S\]
\[\ell(1) = n\ln(1) + (1-1)\sum_{i=1}^{n}\ln x_i = 0\]
Substituting both values:
\[\Lambda = 2\left[n\ln\left(\frac{-n}{S}\right) - n - S\right]\]
where \(S = \sum_{i=1}^{n}\ln x_i\). At a significance level of \(\alpha = 0.05\), the critical value from the \(\chi^2_1\) distribution is \(\chi^2_{1,\, 0.05} = 3.841\), so we compare \(\Lambda\) against this value.
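In R, the test reduces to a few lines (a sketch, with x as before):
S <- sum(log(x))
n <- length(x)
Lambda <- 2 * (n * log(-n / S) - n - S)  # LRT statistic from above
Lambda > qchisq(0.95, df = 1)            # TRUE => reject H0: a = 1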
From Problem A2, the log-likelihood function for the Kumaraswamy distribution is:
\[\ell(a, b) = n\ln a + n\ln b + (a-1)\sum_{i=1}^{n}\ln x_i + (b-1)\sum_{i=1}^{n}\ln(1 - x_i^a)\]
with gradient vector:
\[\ell'_a(a, b) = \frac{n}{a} + \sum_{i=1}^{n}\ln x_i - (b-1)\sum_{i=1}^{n}\frac{x_i^a \ln x_i}{1 - x_i^a}\]
\[\ell'_b(a, b) = \frac{n}{b} + \sum_{i=1}^{n}\ln(1 - x_i^a)\]
We use optim() in R to numerically maximize the log-likelihood, supplying the gradient vector to improve optimization accuracy.
# Data: daily proportion of usable reservoir storage over 50 days
x <- c(0.12, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22,
       0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32,
       0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41, 0.42,
       0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.50, 0.51, 0.52,
       0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.60, 0.61, 0.78)
n <- length(x)
# Log-likelihood function for the Kumaraswamy distribution (from Problem A2)
kuma_loglik <- function(params) {
  a <- params[1]
  b <- params[2]
  # Return a large penalty if parameters are non-positive
  if (a <= 0 || b <= 0) return(1e10)
  # Log-likelihood components
  ll <- n * log(a) + n * log(b) +
    (a - 1) * sum(log(x)) +
    (b - 1) * sum(log(1 - x^a))
  return(-ll)  # Negative because optim() minimizes
}
# Gradient vector (from Problem A2) — supplied to optim() for accuracy
kuma_grad <- function(params) {
  a <- params[1]
  b <- params[2]
  # Partial derivative with respect to a
  dl_da <- n/a + sum(log(x)) - (b - 1) * sum((x^a * log(x)) / (1 - x^a))
  # Partial derivative with respect to b
  dl_db <- n/b + sum(log(1 - x^a))
  # Return negative gradient since we are minimizing
  return(c(-dl_da, -dl_db))
}
# Optimize using BFGS method with gradient supplied
# Initial values set to 1 (uniform distribution starting point)
fit <- optim(par = c(1, 1),
             fn = kuma_loglik,
             gr = kuma_grad,
             method = "BFGS")
# Extract MLEs
a_hat <- fit$par[1]
b_hat <- fit$par[2]
cat("MLE of a:", round(a_hat, 4), "\n")MLE of a: 2.5296
MLE of b: 7.8838
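Since the analytic gradient was supplied, a quick sanity check is that it is numerically zero at the returned optimum (recall that kuma_grad returns the negative gradient):
# Negative gradient at the MLEs; both components should be ~0
round(kuma_grad(fit$par), 6)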
We test whether the power distribution (\(b = 1\)) is sufficient to describe the data against the full Kumaraswamy distribution using the LRT at significance level \(\alpha = 0.05\):
\[H_0: b = 1 \quad \text{versus} \quad H_a: b \neq 1\]
The LRT statistic compares the log-likelihood of the unrestricted Kumaraswamy model (from Problem B1) against the restricted power distribution model (using \(\hat{a}_{MLE}\) from Problem A4):
\[\Lambda = 2\left[\ell(\hat{a}, \hat{b}) - \ell(\hat{a}_{power}, 1)\right]\]
Under \(H_0\), \(\Lambda \overset{d}{\to} \chi^2_1\). We reject \(H_0\) at \(\alpha = 0.05\) if \(\Lambda > \chi^2_{1,\, 0.05} = 3.841\).
# Data already defined in Problem B1
# ------- Unrestricted Model (Full Kumaraswamy from B1) -------
# Log-likelihood at MLEs from B1
ll_unrestricted <- -fit$value # optim() returns the minimized negative log-likelihood
# ------- Restricted Model (Power Distribution, b = 1) -------
# MLE of a for power distribution from Problem A4
a_hat_power <- -n / sum(log(x))
# Log-likelihood for power distribution at a_hat_power
ll_restricted <- n * log(a_hat_power) + (a_hat_power - 1) * sum(log(x))
# ------- LRT Statistic -------
Lambda <- 2 * (ll_unrestricted - ll_restricted)
# ------- Critical Value and Decision -------
critical_value <- qchisq(0.95, df = 1) # chi-squared critical value at alpha = 0.05
cat("Log-likelihood (Unrestricted):", round(ll_unrestricted, 4), "\n")Log-likelihood (Unrestricted): 24.5627
Log-likelihood (Restricted) : 0.1
LRT Statistic (Lambda) : 48.9253
Critical Value (chi^2_1,0.05): 3.8415
cat("Decision: ", ifelse(Lambda > critical_value,
"Reject H0 - data favours full Kumaraswamy",
"Fail to reject H0 - power distribution is sufficient"), "\n")Decision: Reject H0 - data favours full Kumaraswamy
At \(\alpha = 0.05\), the observed statistic \(\Lambda = 48.9253\) far exceeds the critical value \(\chi^2_{1,\, 0.05} = 3.841\), so we reject \(H_0\) and conclude that \(b \neq 1\): the full Kumaraswamy distribution provides a significantly better fit than the power distribution. Practically, this means the reservoir storage data has a more complex shape than the simpler power distribution can capture, requiring both shape parameters \(a\) and \(b\) to adequately model the daily storage proportions during the dry season.
We obtain the bootstrap sampling distributions of \(\hat{a}\) and \(\hat{b}\) by resampling the data with replacement \(B = 1000\) times, refitting the Kumaraswamy distribution each time using the same optim() procedure from Problem B1, and plotting the resulting distributions using Gaussian kernel density curves.
set.seed(123) # For reproducibility
B <- 1000 # Number of bootstrap resamples
# Store bootstrap estimates
boot_a <- numeric(B)
boot_b <- numeric(B)
for (i in 1:B) {
  # Resample data with replacement
  x_boot <- sample(x, size = n, replace = TRUE)
  # Negative log-likelihood for bootstrap sample
  kuma_loglik_boot <- function(params) {
    a <- params[1]
    b <- params[2]
    if (a <= 0 || b <= 0) return(1e10)
    ll <- n * log(a) + n * log(b) +
      (a - 1) * sum(log(x_boot)) +
      (b - 1) * sum(log(1 - x_boot^a))
    return(-ll)
  }
  # Gradient for bootstrap sample
  kuma_grad_boot <- function(params) {
    a <- params[1]
    b <- params[2]
    dl_da <- n/a + sum(log(x_boot)) -
      (b - 1) * sum((x_boot^a * log(x_boot)) / (1 - x_boot^a))
    dl_db <- n/b + sum(log(1 - x_boot^a))
    return(c(-dl_da, -dl_db))
  }
  # Fit model to bootstrap sample
  fit_boot <- tryCatch(
    optim(par = c(a_hat, b_hat),  # Use original MLEs as starting values
          fn = kuma_loglik_boot,
          gr = kuma_grad_boot,
          method = "BFGS"),
    error = function(e) NULL
  )
  # Store estimates if optimization converged
  if (!is.null(fit_boot) && fit_boot$convergence == 0) {
    boot_a[i] <- fit_boot$par[1]
    boot_b[i] <- fit_boot$par[2]
  } else {
    boot_a[i] <- NA
    boot_b[i] <- NA
  }
}
# Remove any failed iterations
boot_a <- boot_a[!is.na(boot_a)]
boot_b <- boot_b[!is.na(boot_b)]
# ---- Plot 1: Bootstrap distribution of a_hat ----
plot(density(boot_a, kernel = "gaussian"),
     main = expression("Bootstrap Sampling Distribution of " * hat(a)),
     xlab = expression(hat(a)),
     ylab = "Density",
     col = "steelblue",
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3,
     cex.axis = 1.2)
abline(v = a_hat, col = "red", lty = 2, lwd = 2)  # Original MLE
legend("topright",
       legend = c("Gaussian KDE", "MLE"),
       col = c("steelblue", "red"),
       lty = c(1, 2),
       lwd = 2,
       cex = 1.2)

# ---- Plot 2: Bootstrap distribution of b_hat ----
plot(density(boot_b, kernel = "gaussian"),
     main = expression("Bootstrap Sampling Distribution of " * hat(b)),
     xlab = expression(hat(b)),
     ylab = "Density",
     col = "darkgreen",
     lwd = 3,
     cex.main = 1.5,
     cex.lab = 1.3,
     cex.axis = 1.2)
abline(v = b_hat, col = "red", lty = 2, lwd = 2)  # Original MLE
legend("topright",
       legend = c("Gaussian KDE", "MLE"),
       col = c("darkgreen", "red"),
       lty = c(1, 2),
       lwd = 2,
       cex = 1.2)

The \(95\%\) bootstrap confidence interval for \(b\) is obtained by taking the \(2.5\)th and \(97.5\)th percentiles of the bootstrap distribution of \(\hat{b}\).
The Wald CI requires the standard error of \(\hat{b}\), obtained from the inverse of the observed Fisher Information matrix. Since optim() minimizes the negative log-likelihood, the Hessian it returns is exactly the negative Hessian of the log-likelihood, i.e., the observed information matrix; inverting it gives the variance-covariance matrix.
# ------- Bootstrap CI for b -------
boot_ci_b <- quantile(boot_b, probs = c(0.025, 0.975))
cat("95% Bootstrap CI for b: [", round(boot_ci_b[1], 4),
",", round(boot_ci_b[2], 4), "]\n")95% Bootstrap CI for b: [ 5.1266 , 16.2958 ]
# ------- Wald CI for b -------
# The observed Fisher Information matrix is the negative Hessian of the log-likelihood
# optim() with hessian = TRUE returns the Hessian of the objective function
# (negative log-likelihood), so it is already the observed information matrix
fit_hess <- optim(par = c(a_hat, b_hat),
                  fn = kuma_loglik,
                  gr = kuma_grad,
                  method = "BFGS",
                  hessian = TRUE)
# Variance-covariance matrix is the inverse of the Hessian
vcov_matrix <- solve(fit_hess$hessian)
# Standard error of b_hat is the square root of the (2,2) entry
se_b <- sqrt(vcov_matrix[2, 2])
# 95% Wald CI for b
wald_ci_b <- c(b_hat - 1.96 * se_b, b_hat + 1.96 * se_b)
cat("95% Wald CI for b : [", round(wald_ci_b[1], 4),
",", round(wald_ci_b[2], 4), "]\n")95% Wald CI for b : [ 3.4842 , 12.2833 ]
# Compare with B2 result - does the CI contain 1?
cat("Bootstrap CI contains 1:", boot_ci_b[1] <= 1 & 1 <= boot_ci_b[2], "\n")Bootstrap CI contains 1: FALSE
Wald CI contains 1 : FALSE
The Wald confidence interval for \(b\) constructed in Part (2) relies on the assumption that \(\hat{b}\) is approximately normally distributed. We can assess this assumption by examining the shape of the bootstrap sampling distribution of \(\hat{b}\) from Part (1).
The bootstrap distribution of \(\hat{b}\) is clearly right-skewed, with a long tail extending well beyond the bulk of the distribution. This departure from normality suggests that the Wald CI for \(b\) is not well supported, and the bootstrap confidence interval would be the more reliable choice for inference on \(b\) in this setting.
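One way to make this visual assessment concrete is a normal Q-Q plot of the bootstrap replicates (a sketch using base R graphics); systematic curvature away from the reference line reflects the skewness described above:
# Q-Q plot of the bootstrap replicates of b_hat against the normal quantiles
qqnorm(boot_b, main = "Normal Q-Q Plot of Bootstrap Estimates of b")
qqline(boot_b, col = "red", lwd = 2)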
We perform a likelihood ratio test to assess whether the reservoir storage data follows a uniform distribution on \((0,1)\), which corresponds to the Kumaraswamy distribution with \(a = 1\) and \(b = 1\):
\[H_0: a = 1 \ \& \ b = 1 \quad \text{versus} \quad H_a: a \neq 1 \ \text{or} \ b \neq 1 \ \text{(or both)}\]
The LRT statistic compares the log-likelihood of the unrestricted Kumaraswamy model (from Problem B1) against the restricted uniform model where \(a = 1\) and \(b = 1\):
\[\Lambda = 2\left[\ell(\hat{a}, \hat{b}) - \ell(1, 1)\right]\]
Since two parameters are simultaneously restricted under \(H_0\), by Wilks’ theorem \(\Lambda \overset{d}{\to} \chi^2_2\) asymptotically.
# ------- Restricted Model (Uniform Distribution, a = 1, b = 1) -------
# Log-likelihood simplifies to 0 since log(1) = 0 and remaining terms vanish
ll_uniform <- 0
# ------- LRT Statistic -------
Lambda_b4 <- 2 * (ll_unrestricted - ll_uniform)
# ------- Critical Value and Decision -------
# Two parameters restricted simultaneously, so chi-squared with df = 2
critical_value_b4 <- qchisq(0.95, df = 2)
cat("LRT Statistic (Lambda) :", round(Lambda_b4, 4), "\n")LRT Statistic (Lambda) : 49.1254
Critical Value (chi^2_2, 0.05) : 5.9915
cat("Decision:", ifelse(Lambda_b4 > critical_value_b4,
"Reject H0 - data does not follow uniform distribution",
"Fail to reject H0 - data is consistent with uniform distribution"), "\n")Decision: Reject H0 - data does not follow uniform distribution
At \(\alpha = 0.05\), the LRT statistic \(\Lambda = 49.1254\) far exceeds the critical value \(\chi^2_{2,\, 0.05} = 5.9915\), leading us to strongly reject \(H_0\). This provides overwhelming evidence that the reservoir storage data does not follow a uniform distribution on \((0,1)\).
Practically, this means that during the dry season, not all storage levels are equally likely. The data concentrate at lower storage levels, consistent with the right-skewed pattern in the dataset, where drought conditions dominate and occasional replenishment events push storage higher only temporarily. The full Kumaraswamy distribution with both shape parameters \(\hat{a}\) and \(\hat{b}\) is therefore necessary to adequately model the daily reservoir storage proportions.