Assignment Objectives

  • Master the fundamental concepts of point estimation and performance metrics

  • Understand the theoretical foundation of the method of moments estimator (MME)

  • Implement MME in R, incorporating numerical approximation methods


Use of AI Tools

Policy on AI Tool Use: Students must adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

Code Inclusion Requirement: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.


Log-logistic Distribution

The log-logistic distribution (also known as the Fisk distribution) is a continuous probability distribution that is particularly useful when data are non-negative and skewed and when the hazard rate is unimodal (it increases to a peak and then decreases). It is widely used in survival analysis, reliability engineering, environmental science, economics, pharmacology, finance, and risk management.

For a given shape parameter \(\beta\) and scale parameter \(\alpha\), the cumulative distribution function is

\[ F(x) = \frac{1}{1+(x/\alpha)^{-\beta}} \]

As an exercise, you can derive the density in the following form

\[ f(x) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{[1+(x/\alpha)^\beta]^2}, \ \ \text{ for } \ \ x > 0. \]

After some algebra, we can find the \(k\)th moment

\[ \mu_k = E[X^k] = \alpha^k B\left(1+\frac{k}{\beta}, 1 - \frac{k}{\beta} \right). \]

This assignment focuses on finding the MME of the parameters \(\alpha\) and \(\beta\) from a real-world data set.


Question 1: Derive the log-logistic density function

Given the CDF of the two-parameter log-logistic distribution

\[ F(x) = \frac{1}{1+(x/\alpha)^{-\beta}}. \]

The probability density function is the derivative of the cumulative distribution function of the log-logistic distribution.

cdf = expression(1 / (1 + (x/alpha)^(-beta))) #write the CDF formula as an expression
pdf = D(cdf, "x") #take the derivative of the cdf with respect to x

print(pdf)


Question 2: Distribution of Recovery Time from A Surgery

The data below are times to recovery (in days) after a specific knee surgery procedure. They follow a typical log-logistic pattern in medical survival/recovery analysis:

8.23, 12.74, 14.83, 16.61, 18.16, 19.55, 20.80, 21.94, 23.00, 23.98, 24.89, 25.75, 26.56, 
27.34, 28.08, 28.79, 29.48, 30.15, 30.81, 31.45, 32.08, 32.70, 33.31, 33.92, 34.53, 35.13, 
35.73, 36.33, 36.93, 37.53, 38.14, 38.75, 39.37, 40.00, 40.64, 41.29, 41.95, 42.63, 43.33, 
44.05, 44.79, 45.56, 46.36, 47.20, 48.08, 49.02, 50.03, 51.12, 52.32, 53.65

Use the data above to perform the following analysis.

  a) Use the method of moments to estimate \(\alpha\) and \(\beta\); denote the estimates by \(\hat{\alpha}\) and \(\hat{\beta}\), respectively.
recovery = c(8.23, 12.74, 14.83, 16.61, 18.16, 19.55, 20.80, 21.94, 23.00, 23.98, 24.89, 25.75, 26.56, 
27.34, 28.08, 28.79, 29.48, 30.15, 30.81, 31.45, 32.08, 32.70, 33.31, 33.92, 34.53, 35.13, 
35.73, 36.33, 36.93, 37.53, 38.14, 38.75, 39.37, 40.00, 40.64, 41.29, 41.95, 42.63, 43.33, 
44.05, 44.79, 45.56, 46.36, 47.20, 48.08, 49.02, 50.03, 51.12, 52.32, 53.65)

hist(recovery)

s.recovery = sort(recovery) #sorted copy of the data (not used below)

length(recovery) #n = 50
[1] 50
m1 = mean(recovery) #first sample moment (sample mean), x_bar = 34.1922
m2 = mean(recovery^2) #second sample moment, about 1288

#moment equation for beta: the ratio E[X^2] / (E[X])^2 does not depend on alpha
mom_beta <- function(beta, m1, m2)
{
  term1 <- (2 * pi / beta) / sin(2 * pi / beta)  #E[X^2] divided by alpha squared
  term2 <- ((pi / beta) / sin(pi / beta))^2      #square of E[X] divided by alpha squared

  return(term1 / term2 - (m2 / m1^2)) #equals zero at the moment estimate of beta
}
#info found at: https://www.math.wm.edu/~leemis/chart/UDR/PDFs/Loglogistic.pdf

#the equation is nonlinear in beta, so solve it numerically (beta > 2 is needed for E[X^2] to exist)
sol_beta <- uniroot(mom_beta, interval = c(2.1, 20), m1 = m1, m2 = m2)$root
#substitute beta into the first-moment equation to find alpha
sol_alpha <- m1 / ((pi / sol_beta) / sin(pi / sol_beta))
    
# Print Results
cat("Estimated Alpha (Scale/Median):", sol_alpha, "\n") #Estimated Alpha (Scale/Median): 32.6543 
Estimated Alpha (Scale/Median): 32.6543 
cat("Estimated Beta (Shape/Width):", sol_beta, "\n")  #Estimated Beta (Shape/Width): 6.006232 
Estimated Beta (Shape/Width): 6.006232 

The method-of-moments estimates are \(\hat{\alpha} \approx 32.65\) and \(\hat{\beta} \approx 6.01\). Because \(\alpha\) is the median of the log-logistic distribution, the estimated median recovery time is about 32.65 days. The shape estimate is well above 2, so the fitted distribution has a finite variance, and larger values of \(\beta\) concentrate the distribution more tightly around its median. The fitted model therefore describes recovery times that are fairly similar across patients rather than widely dispersed.

  b) Since the moment estimates \(\hat{\alpha}\) and \(\hat{\beta}\) are random, construct bootstrap sampling distributions for each. To visualize these distributions, plot separate bootstrap histograms for \(\hat{\alpha}\) and \(\hat{\beta}\). Then, overlay a smooth density curve on each histogram using Gaussian kernel density estimation. Finally, describe the patterns of these density curves.
set.seed(325)

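B = 1000                 #number of bootstrap replicates
boot_alpha = numeric(B)  #storage for the bootstrap estimates of alpha
boot_beta = numeric(B)   #storage for the bootstrap estimates of beta

for(i in 1:B)
{
  #resample the recovery data with replacement, keeping the original sample size
  sample_i = sample(recovery, size = 50, replace = TRUE)

  #calculate the sample moments of this bootstrap sample
  m1_i = mean(sample_i)   #first moment
  m2_i = mean(sample_i^2) #second moment

  #solve the moment equation for beta, then substitute beta to find alpha
  boot_beta[i] = uniroot(mom_beta, interval = c(2.1, 20), m1 = m1_i, m2 = m2_i)$root
  boot_alpha[i] = m1_i / ((pi / boot_beta[i]) / sin(pi / boot_beta[i]))
}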



# Create data frame for plotting
boot_results <- data.frame(alpha = boot_alpha, beta = boot_beta)
# Plotting the boot straps of alpha and beta parameters (boot_results data frame)
bootsggp <- ggplot(boot_results) +
  #Graph the Alpha density
  geom_density(aes(x = alpha, color = "alpha"), size = 1) +
  # Graph the Beta density
  geom_density(aes(x = beta, color = "beta"), size = 1) +
  
  labs(title = "Bootstrap Distributions for Alpha (Recovery time) and Beta Log-Logistic Parameters",
       x = "Estimate Value", 
       y = "Density") +
  scale_color_manual(name = "Parameters", 
                     values = c("alpha" = "red", "beta" = "blue")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Interactive graph
ggplotly(bootsggp)

This graph places the bootstrap densities of \(\hat{\alpha}\) and \(\hat{\beta}\) on the same axes. The bootstrap estimate of \(\beta\) (5.837965) is far lower than the bootstrap estimate of \(\alpha\) (30.89606), which is expected because the two parameters are on different scales: \(\alpha\) is measured in days, while \(\beta\) is a unitless shape parameter. The \(\hat{\alpha}\) values also cover a wider absolute range than the \(\hat{\beta}\) values. Because of the difference in scale, this wider range does not by itself show that \(\beta\) is estimated more precisely; comparing precision would require a relative measure such as the coefficient of variation.

Additionally, the bootstrap estimates of \(\alpha\) are close to the method-of-moments estimate of 32.6542985, and the bootstrap estimates of \(\beta\) are close to the method-of-moments estimate of 6.0062322, suggesting that both estimators have little bias.

The bootstrap distributions of \(\hat{\alpha}\) and \(\hat{\beta}\) are graphed separately below, each with its Gaussian kernel density curve.

#Plot each bootstrap distribution separately: a histogram with a Gaussian kernel density curve overlaid

# Plotting Parameter Alpha (Scale)
p_alpha <- ggplot(boot_results, aes(x = alpha)) +
  # The raw data (Histogram)
  geom_histogram(aes(y = ..density..), bins = 30, fill = "gray80", color = "white") +
  # The Smoothing Line (Gaussian Kernel)
  geom_density(color = "red", size = 1, kernel = "gaussian") +
  labs(title = "Bootstrap Distribution for Alpha - Recovery Time",
       subtitle = "Red line = Gaussian Kernel Smoothing",
       x = "Alpha (Days)", y = "Density") +
  theme_minimal()

# Separate plot for Parameter Beta (Shape)
p_beta <- ggplot(boot_results, aes(x = beta)) +
  # The raw data (Histogram)
  geom_histogram(aes(y = ..density..), bins = 30, fill = "gray80", color = "white") +
  # The Smoothing Line (Gaussian Kernel)
  geom_density(color = "blue", size = 1, kernel = "gaussian") +
  labs(title = "Bootstrap Distribution for Beta",
       subtitle = "Blue line = Gaussian Kernel Smoothing",
       x = "Beta (Shape Parameter)", y = "Density") +
  theme_minimal()

#Print interactive graphs
ggplotly(p_alpha) 
ggplotly(p_beta)

The bootstrap distribution of alpha, the scale parameter, looks approximately normal under its Gaussian kernel density curve: it is unimodal and roughly symmetric. In other words, the bootstrap distribution looks as expected.

The bootstrap distribution of beta also looks roughly normal, although it may be slightly skewed. This suggests that a normal-based approximation for beta may be somewhat inaccurate and may not be the best choice. A further option is to adjust the bandwidth of the Gaussian kernel, although the current density curve already looks reasonably smooth.

For the recovery-time data, the estimated median knee surgery recovery time, alpha, is about 31 days, and the beta value indicates that recovery times are fairly consistent across most patients.

---
title: "Assignment 3: Methods of Moment Estimation"
author: "Ezana Rivers"
date: " Due: 2-24-26"
output:
  html_document: 
    toc: yes
    toc_depth: 4
    toc_float: yes
    number_sections: no
    toc_collapsed: yes
    code_folding: hide
    code_download: yes
    smooth_scroll: yes
    theme: lumen
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
    fig_width: 3
    fig_height: 3
  word_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    keep_md: yes
editor_options: 
  chunk_output_type: inline
---

```{css, echo = FALSE}
#TOC::before {
  content: "Table of Contents";
  font-weight: bold;
  font-size: 1.2em;
  display: block;
  color: navy;
  margin-bottom: 10px;
}


div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 22px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
  font-family: "Gill Sans", sans-serif;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 15px;
  font-weight: bold;
  font-family: system-ui;
  color: navy;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Gill Sans", sans-serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}

body { background-color:white; }

.highlightme { background-color:yellow; }

p { background-color:white; }
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.
if (!require("knitr")) {
   install.packages("knitr")
   library(knitr)
}
if (!require("pander")) {
   install.packages("pander")
   library(pander)
}
if (!require("ggplot2")) {
  install.packages("ggplot2")
  library(ggplot2)
}
if (!require("tidyverse")) {
  install.packages("tidyverse")
  library(tidyverse)
}

if (!require("plotly")) {
  install.packages("plotly")
  library(plotly)
}
####
knitr::opts_chunk$set(echo = TRUE,       # include code chunk in the output file
                      warning = FALSE,   # sometimes, your code may produce warning messages,
                                         # you can choose to include the warning messages in
                                         # the output file. 
                      results = TRUE,    # you can also decide whether to include the output
                                         # in the output file.
                      message = FALSE,
                      comment = NA
                      )  
```
 
 \
 
## **Assignment Objectives** 

* Master the fundamental concepts of point estimation and performance metrics

* Understand the theoretical foundation of the method of moments estimator (MME)

* Implement MME in R, incorporating numerical approximation methods

\

**Use of AI Tools**

**Policy on AI Tool Use**: Students must adhere to the AI tool policy specified in the course syllabus. The direct copying of AI-generated content is strictly prohibited. All submitted work must reflect your own understanding; where external tools are consulted, content must be thoroughly rephrased and synthesized in your own words.

**Code Inclusion Requirement**: Any code included in your essay must be properly commented to explain the purpose and/or expected output of key code lines. Submitting AI-generated code without meaningful, student-added comments will not be accepted.

\

**Log-logistic Distribution**

The log-logistic distribution (also known as the Fisk distribution) is a continuous probability distribution that is particularly useful when data are non-negative and skewed and when the hazard rate is unimodal (it increases to a peak and then decreases). It is widely used in survival analysis, reliability engineering, environmental science, economics, pharmacology, finance, and risk management.

For a given shape parameter $\beta$ and scale parameter $\alpha$, the cumulative distribution function is

$$
F(x) = \frac{1}{1+(x/\alpha)^{-\beta}}
$$

As an exercise, you can derive the density in the following form

$$
f(x) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{[1+(x/\alpha)^\beta]^2}, \ \ \text{ for } \ \ x > 0.
$$

After some algebra, we can find the $k$th moment

$$
\mu_k = E[X^k] = \alpha^k B\left(1+\frac{k}{\beta}, 1 - \frac{k}{\beta} \right).
$$
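
As a worked step, and assuming $k < \beta$ so that the moment exists, the beta function can be rewritten with gamma functions and simplified with Euler's reflection formula $\Gamma(z)\Gamma(1-z) = \pi/\sin(\pi z)$:

$$
B\left(1+\frac{k}{\beta}, 1-\frac{k}{\beta}\right)
= \frac{\Gamma\left(1+\frac{k}{\beta}\right)\Gamma\left(1-\frac{k}{\beta}\right)}{\Gamma(2)}
= \frac{k}{\beta}\,\Gamma\left(\frac{k}{\beta}\right)\Gamma\left(1-\frac{k}{\beta}\right)
= \frac{k\pi/\beta}{\sin(k\pi/\beta)}.
$$

In particular, $\mu_1 = \alpha\,\dfrac{\pi/\beta}{\sin(\pi/\beta)}$ when $\beta > 1$ and $\mu_2 = \alpha^2\,\dfrac{2\pi/\beta}{\sin(2\pi/\beta)}$ when $\beta > 2$.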

This assignment focuses on finding the MME of the parameters $\alpha$ and $\beta$ from a real-world data set.


\

## **Question 1: Derive the log-logistic density function**

Given the CDF of the two-parameter log-logistic distribution

$$
F(x) = \frac{1}{1+(x/\alpha)^{-\beta}}.
$$


The probability density function is the derivative of the cumulative distribution function of the log-logistic distribution.
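
As a worked sketch of that derivative, write $F(x) = \left[1+(x/\alpha)^{-\beta}\right]^{-1}$ for $x > 0$ and apply the chain rule:

$$
f(x) = \frac{d}{dx}F(x)
= -\left[1+(x/\alpha)^{-\beta}\right]^{-2}\cdot\left(-\frac{\beta}{\alpha}\right)(x/\alpha)^{-\beta-1}
= \frac{(\beta/\alpha)(x/\alpha)^{-\beta-1}}{\left[1+(x/\alpha)^{-\beta}\right]^{2}}.
$$

Multiplying the numerator and the denominator by $(x/\alpha)^{2\beta}$ gives the form stated earlier:

$$
f(x) = \frac{(\beta/\alpha)(x/\alpha)^{\beta-1}}{\left[1+(x/\alpha)^{\beta}\right]^{2}}, \ \ \text{ for } \ \ x > 0.
$$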

```{r}

cdf = expression(1 / (1 + (x/alpha)^(-beta))) #write the CDF formula as an expression
pdf = D(cdf, "x") #take the derivative of the cdf

print(pdf)

```
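
One way to check a symbolic derivative of the CDF is to evaluate it numerically and compare it with the closed-form density given earlier. The sketch below is illustrative only: the values `alpha = 30` and `beta = 6` and the grid `x` are hypothetical choices, not estimates from the data.

```r
# Hypothetical parameter values, chosen only for illustration
alpha <- 30
beta <- 6

# CDF of the log-logistic distribution as an R expression
cdf_expr <- expression(1 / (1 + (x / alpha)^(-beta)))

# Symbolic derivative of the CDF with respect to x
pdf_expr <- D(cdf_expr, "x")

# Evaluate both forms of the density on a small grid of positive x values
x <- c(5, 20, 30, 45, 80)
pdf_symbolic <- eval(pdf_expr)
pdf_closed <- (beta / alpha) * (x / alpha)^(beta - 1) / (1 + (x / alpha)^beta)^2

# TRUE when the two forms agree up to floating-point error
all.equal(pdf_symbolic, pdf_closed)
```

If the comparison fails, the usual cause is a misplaced parenthesis in the CDF expression, so this check is cheap insurance before using the density elsewhere.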


\

## **Question 2: Distribution of Recovery Time from A Surgery**

The data below are times to recovery (in days) after a specific knee surgery procedure. They follow a typical **log-logistic pattern** in medical survival/recovery analysis:

```
8.23, 12.74, 14.83, 16.61, 18.16, 19.55, 20.80, 21.94, 23.00, 23.98, 24.89, 25.75, 26.56, 
27.34, 28.08, 28.79, 29.48, 30.15, 30.81, 31.45, 32.08, 32.70, 33.31, 33.92, 34.53, 35.13, 
35.73, 36.33, 36.93, 37.53, 38.14, 38.75, 39.37, 40.00, 40.64, 41.29, 41.95, 42.63, 43.33, 
44.05, 44.79, 45.56, 46.36, 47.20, 48.08, 49.02, 50.03, 51.12, 52.32, 53.65
```
Use the data above to perform the following analysis.

a) Use the method of moments to estimate $\alpha$ and $\beta$; denote the estimates by $\hat{\alpha}$ and $\hat{\beta}$, respectively.
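
As a sketch of the system being solved, let $m_1 = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $m_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$ denote the first two sample moments (notation introduced here for illustration). Using $B(1+k/\beta,\, 1-k/\beta) = (k\pi/\beta)/\sin(k\pi/\beta)$, which holds for $k < \beta$ by Euler's reflection formula, equating population and sample moments gives

$$
m_1 = \hat{\alpha}\,\frac{\pi/\hat{\beta}}{\sin(\pi/\hat{\beta})},
\qquad
m_2 = \hat{\alpha}^2\,\frac{2\pi/\hat{\beta}}{\sin(2\pi/\hat{\beta})}.
$$

Dividing the second equation by the square of the first eliminates $\hat{\alpha}$:

$$
\frac{m_2}{m_1^2} = \frac{(2\pi/\hat{\beta})/\sin(2\pi/\hat{\beta})}{\left[(\pi/\hat{\beta})/\sin(\pi/\hat{\beta})\right]^2}.
$$

This is a single nonlinear equation in $\hat{\beta}$ that has to be solved numerically over $\hat{\beta} > 2$, after which $\hat{\alpha} = m_1 \sin(\pi/\hat{\beta})/(\pi/\hat{\beta})$.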



```{r}
recovery = c(8.23, 12.74, 14.83, 16.61, 18.16, 19.55, 20.80, 21.94, 23.00, 23.98, 24.89, 25.75, 26.56, 
27.34, 28.08, 28.79, 29.48, 30.15, 30.81, 31.45, 32.08, 32.70, 33.31, 33.92, 34.53, 35.13, 
35.73, 36.33, 36.93, 37.53, 38.14, 38.75, 39.37, 40.00, 40.64, 41.29, 41.95, 42.63, 43.33, 
44.05, 44.79, 45.56, 46.36, 47.20, 48.08, 49.02, 50.03, 51.12, 52.32, 53.65)

hist(recovery)

s.recovery = sort(recovery) #sorted copy of the data (not used below)

length(recovery) #n = 50
m1 = mean(recovery) #first sample moment (sample mean), x_bar = 34.1922
m2 = mean(recovery^2) #second sample moment, about 1288

#moment equation for beta: the ratio E[X^2] / (E[X])^2 does not depend on alpha
mom_beta <- function(beta, m1, m2)
{
  term1 <- (2 * pi / beta) / sin(2 * pi / beta)  #E[X^2] divided by alpha squared
  term2 <- ((pi / beta) / sin(pi / beta))^2      #square of E[X] divided by alpha squared

  return(term1 / term2 - (m2 / m1^2)) #equals zero at the moment estimate of beta
}
#info found at: https://www.math.wm.edu/~leemis/chart/UDR/PDFs/Loglogistic.pdf

#the equation is nonlinear in beta, so solve it numerically (beta > 2 is needed for E[X^2] to exist)
sol_beta <- uniroot(mom_beta, interval = c(2.1, 20), m1 = m1, m2 = m2)$root
#substitute beta into the first-moment equation to find alpha
sol_alpha <- m1 / ((pi / sol_beta) / sin(pi / sol_beta))
    
# Print Results
cat("Estimated Alpha (Scale/Median):", sol_alpha, "\n") #Estimated Alpha (Scale/Median): 32.6543 
cat("Estimated Beta (Shape/Width):", sol_beta, "\n")  #Estimated Beta (Shape/Width): 6.006232 





```
The method-of-moments estimates are $\hat{\alpha} \approx 32.65$ and $\hat{\beta} \approx 6.01$. Because $\alpha$ is the median of the log-logistic distribution, the estimated median recovery time is about 32.65 days. The shape estimate is well above 2, so the fitted distribution has a finite variance, and larger values of $\beta$ concentrate the distribution more tightly around its median. The fitted model therefore describes recovery times that are fairly similar across patients rather than widely dispersed.
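
To see what the estimates imply, the fitted mean, standard deviation, and quartiles can be computed from the moment formula and from the quantile function obtained by inverting the CDF. The sketch below plugs in the estimates reported above; the helper names `loglogis_moment` and `loglogis_quantile` are hypothetical and introduced only for illustration.

```r
# Estimates reported in the text (method of moments)
alpha_hat <- 32.6543
beta_hat <- 6.006232

# k-th moment of the log-logistic distribution (finite only when k < shape)
loglogis_moment <- function(k, scale, shape) {
  stopifnot(k < shape)
  scale^k * beta(1 + k / shape, 1 - k / shape)
}

# Quantile function from inverting F(x) = 1 / (1 + (x / scale)^(-shape))
loglogis_quantile <- function(p, scale, shape) {
  scale * (p / (1 - p))^(1 / shape)
}

fitted_mean <- loglogis_moment(1, alpha_hat, beta_hat)
fitted_sd <- sqrt(loglogis_moment(2, alpha_hat, beta_hat) - fitted_mean^2)
fitted_q <- loglogis_quantile(c(0.25, 0.50, 0.75), alpha_hat, beta_hat)

fitted_mean  # close to the sample mean used in the fit
fitted_sd    # spread implied by the fitted model
fitted_q     # the middle value equals alpha_hat, since alpha is the median
```

Re-running the quantile call with a smaller shape value widens the gap between the first and third quartiles, which is a quick way to see how $\beta$ controls spread.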

b) Since the moment estimates $\hat{\alpha}$ and $\hat{\beta}$ are random, construct bootstrap sampling distributions for each. To visualize these distributions, plot separate bootstrap histograms for $\hat{\alpha}$ and $\hat{\beta}$. Then, overlay a smooth density curve on each histogram using Gaussian kernel density estimation. Finally, describe the patterns of these density curves.

```{r}
set.seed(325)

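B = 1000                 #number of bootstrap replicates
boot_alpha = numeric(B)  #storage for the bootstrap estimates of alpha
boot_beta = numeric(B)   #storage for the bootstrap estimates of beta

for(i in 1:B)
{
  #resample the recovery data with replacement, keeping the original sample size
  sample_i = sample(recovery, size = 50, replace = TRUE)

  #calculate the sample moments of this bootstrap sample
  m1_i = mean(sample_i)   #first moment
  m2_i = mean(sample_i^2) #second moment

  #solve the moment equation for beta, then substitute beta to find alpha
  boot_beta[i] = uniroot(mom_beta, interval = c(2.1, 20), m1 = m1_i, m2 = m2_i)$root
  boot_alpha[i] = m1_i / ((pi / boot_beta[i]) / sin(pi / boot_beta[i]))
}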



# Create data frame for plotting
boot_results <- data.frame(alpha = boot_alpha, beta = boot_beta)
 

```
```{r}
# Plotting the boot straps of alpha and beta parameters (boot_results data frame)
bootsggp <- ggplot(boot_results) +
  #Graph the Alpha density
  geom_density(aes(x = alpha, color = "alpha"), size = 1) +
  # Graph the Beta density
  geom_density(aes(x = beta, color = "beta"), size = 1) +
  
  labs(title = "Bootstrap Distributions for Alpha (Recovery time) and Beta Log-Logistic Parameters",
       x = "Estimate Value", 
       y = "Density") +
  scale_color_manual(name = "Parameters", 
                     values = c("alpha" = "red", "beta" = "blue")) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

# Interactive graph
ggplotly(bootsggp)
```

This graph places the bootstrap densities of $\hat{\alpha}$ and $\hat{\beta}$ on the same axes. The bootstrap estimate of $\beta$ (5.837965) is far lower than the bootstrap estimate of $\alpha$ (30.89606), which is expected because the two parameters are on different scales: $\alpha$ is measured in days, while $\beta$ is a unitless shape parameter. The $\hat{\alpha}$ values also cover a wider absolute range than the $\hat{\beta}$ values. Because of the difference in scale, this wider range does not by itself show that $\beta$ is estimated more precisely; comparing precision would require a relative measure such as the coefficient of variation.

Additionally, the bootstrap estimates of $\alpha$ are close to the method-of-moments estimate of `r sol_alpha`, and the bootstrap estimates of $\beta$ are close to the method-of-moments estimate of `r sol_beta`, suggesting that both estimators have little bias.
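
The bias claim can be made concrete by computing the usual bootstrap bias estimate (mean of the bootstrap estimates minus the original estimate), along with bootstrap standard errors and percentile intervals. This is a minimal self-contained sketch, not the analysis above: `mme_loglogis` is a hypothetical helper, and the percentile interval is one common choice among several.

```r
recovery <- c(8.23, 12.74, 14.83, 16.61, 18.16, 19.55, 20.80, 21.94, 23.00, 23.98,
              24.89, 25.75, 26.56, 27.34, 28.08, 28.79, 29.48, 30.15, 30.81, 31.45,
              32.08, 32.70, 33.31, 33.92, 34.53, 35.13, 35.73, 36.33, 36.93, 37.53,
              38.14, 38.75, 39.37, 40.00, 40.64, 41.29, 41.95, 42.63, 43.33, 44.05,
              44.79, 45.56, 46.36, 47.20, 48.08, 49.02, 50.03, 51.12, 52.32, 53.65)

# Moment equation for the shape parameter (same form as mom_beta above)
mom_beta <- function(beta, m1, m2) {
  term1 <- (2 * pi / beta) / sin(2 * pi / beta)
  term2 <- ((pi / beta) / sin(pi / beta))^2
  term1 / term2 - m2 / m1^2
}

# Hypothetical helper: method-of-moments estimates for one sample
mme_loglogis <- function(x) {
  m1 <- mean(x)
  m2 <- mean(x^2)
  b <- uniroot(mom_beta, interval = c(2.1, 20), m1 = m1, m2 = m2)$root
  a <- m1 / ((pi / b) / sin(pi / b))
  c(alpha = a, beta = b)
}

set.seed(325)
est <- mme_loglogis(recovery)

# 1000 bootstrap replicates; rows are replicates, columns are alpha and beta
boot <- t(replicate(1000, mme_loglogis(sample(recovery, replace = TRUE))))

boot_bias <- colMeans(boot) - est                               # bootstrap bias estimate
boot_se <- apply(boot, 2, sd)                                   # bootstrap standard errors
boot_ci <- apply(boot, 2, quantile, probs = c(0.025, 0.975))    # 95% percentile intervals

rbind(estimate = est, bias = boot_bias, se = boot_se, boot_ci)
```

Dividing each standard error by its estimate gives a relative measure of precision that can be compared across the two parameters even though they are on different scales.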



The bootstrap distributions of $\hat{\alpha}$ and $\hat{\beta}$ are graphed separately below, each with its Gaussian kernel density curve.

```{r}
#Plot each bootstrap distribution separately: a histogram with a Gaussian kernel density curve overlaid

# Plotting Parameter Alpha (Scale)
p_alpha <- ggplot(boot_results, aes(x = alpha)) +
  # The raw data (Histogram)
  geom_histogram(aes(y = ..density..), bins = 30, fill = "gray80", color = "white") +
  # The Smoothing Line (Gaussian Kernel)
  geom_density(color = "red", size = 1, kernel = "gaussian") +
  labs(title = "Bootstrap Distribution for Alpha - Recovery Time",
       subtitle = "Red line = Gaussian Kernel Smoothing",
       x = "Alpha (Days)", y = "Density") +
  theme_minimal()

# Separate plot for Parameter Beta (Shape)
p_beta <- ggplot(boot_results, aes(x = beta)) +
  # The raw data (Histogram)
  geom_histogram(aes(y = ..density..), bins = 30, fill = "gray80", color = "white") +
  # The Smoothing Line (Gaussian Kernel)
  geom_density(color = "blue", size = 1, kernel = "gaussian") +
  labs(title = "Bootstrap Distribution for Beta",
       subtitle = "Blue line = Gaussian Kernel Smoothing",
       x = "Beta (Shape Parameter)", y = "Density") +
  theme_minimal()

#Print interactive graphs
ggplotly(p_alpha) 
ggplotly(p_beta)
 
```

The bootstrap distribution of alpha, the scale parameter, looks approximately normal under its Gaussian kernel density curve: it is unimodal and roughly symmetric. In other words, the bootstrap distribution looks as expected.


The bootstrap distribution of beta also looks roughly normal, although it may be slightly skewed. This suggests that a normal-based approximation for beta may be somewhat inaccurate and may not be the best choice. A further option is to adjust the bandwidth of the Gaussian kernel, although the current density curve already looks reasonably smooth.
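
The effect of the bandwidth can be explored with base R's `density()`. The sketch below uses simulated stand-in values (`beta_draws` is hypothetical and is not the bootstrap output from this analysis) and compares the default bandwidth, the Sheather-Jones bandwidth, and a doubled bandwidth.

```r
set.seed(1)
# Hypothetical stand-in for bootstrap estimates of beta (not the values from this analysis)
beta_draws <- rnorm(1000, mean = 6, sd = 0.6)

# Gaussian kernel density estimates under three bandwidth choices
d_default <- density(beta_draws, kernel = "gaussian")             # default rule of thumb (bw = "nrd0")
d_sj <- density(beta_draws, kernel = "gaussian", bw = "SJ")       # Sheather-Jones bandwidth
d_smooth <- density(beta_draws, kernel = "gaussian", adjust = 2)  # twice the default bandwidth

# Bandwidths actually used by each estimate
c(default = d_default$bw, SJ = d_sj$bw, doubled = d_smooth$bw)

# Overlay the three curves to see how smoothing changes the apparent shape
plot(d_default, main = "Effect of bandwidth on a Gaussian KDE")
lines(d_sj, lty = 2)
lines(d_smooth, lty = 3)
legend("topright", legend = c("default", "SJ", "adjust = 2"), lty = 1:3)
```

A larger bandwidth hides small bumps but can also mask real skewness, so it is worth viewing more than one choice before describing the shape of the curve.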

For the recovery-time data, the estimated median knee surgery recovery time, alpha, is about 31 days, and the beta value indicates that recovery times are fairly consistent across most patients.