1 Introduction

When statisticians want to understand how a variable behaves, they often plot its estimated probability density to see how the variable is distributed. The shape of that curve gives a first clue about how to describe it. Sometimes, however, they need to know whether the variable belongs to a specific family of distributions, such as the lognormal, gamma, uniform or normal. Estimation methods such as maximum likelihood estimation (MLE) and the method of moments (MOM) make this possible: each candidate family is fitted to the data, and the fitted distribution is compared with the empirical one to judge which family describes the variable best.

This tutorial shows how to use MLE and MOM to model the distributions of two variables: glycohemoglobin and adult female height. The workflow is as follows. First, plot the empirical density of the variable and inspect it to form an idea of which families might fit. Next, use MLE or MOM to estimate the parameters of each candidate family and build the corresponding estimated probability density function. Finally, compare each estimated distribution with the empirical distribution. The family that most closely matches the empirical distribution of glycohemoglobin or adult female height is selected as the best one.

2 Methods

The code chunk below shows how MLE is used to estimate the parameters of a candidate distribution. Taking the gamma distribution as the example, it finds the shape and scale values that minimize the negative log-likelihood of the glycohemoglobin data (gh).

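library(stats4)   # mle()
library(stringr)  # str_c()

# Return the density function for a distribution name, e.g. "gamma" -> dgamma
dist_n <- function(dist) {
  eval(parse(text = str_c("d", dist, sep = "")))
}
# Negative log-likelihood of gh under a gamma distribution
nLL <- function(shape, scale){
  fm <- dist_n("gamma")
  fs <- fm(
        x = gh
      , shape = shape
      , scale = scale
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(shape=1,scale=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))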

The fit estimates the shape to be about 40.7 and the scale to be about 0.14. For other distributions, such as the normal and Weibull, the parameters can be estimated in the same way by changing the name passed to dist_n and the arguments of nLL to match that distribution's parameters.
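
The same fitting step can be tried without the original data. The sketch below is only an illustration: it assumes simulated gamma draws in place of gh, and the names x, nll_gamma and fit_sim are hypothetical. It also starts the optimizer at the method-of-moments values and uses a small positive lower bound for both parameters, which is a design choice made here and not the setup used above.

```r
library(stats4)  # provides mle()

set.seed(1)
# Hypothetical stand-in for the gh vector: 1000 simulated gamma draws
x <- rgamma(1000, shape = 30, scale = 0.19)

# Negative log-likelihood of a gamma model for x
nll_gamma <- function(shape, scale) {
  -sum(dgamma(x, shape = shape, scale = scale, log = TRUE))
}

# Method-of-moments values give a starting point close to the optimum
start <- list(shape = mean(x)^2 / var(x), scale = var(x) / mean(x))

# Bounded optimisation keeps both parameters strictly positive
fit_sim <- mle(nll_gamma, start = start, method = "L-BFGS-B",
               lower = c(0.01, 0.01))
coef(fit_sim)
```

Starting near the method-of-moments values usually shortens the search, because the gamma likelihood is nearly flat along combinations of shape and scale that give the same mean.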

The code chunk below shows how MOM is used to estimate the same parameters. For the gamma distribution, the shape and scale are obtained by equating the distribution's theoretical mean and variance to the sample mean and variance of gh.

m <- mean(gh)
v <- var(gh)
mom_shape <- m^2/v
mom_scale <- v/m
m

## [1] 5.7246

v

## [1] 1.107222

mom_shape

## [1] 29.59754

mom_scale

## [1] 0.1934147

Here, MOM estimates the shape to be 29.5975363 and the scale to be 0.1934147.
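
A quick way to check method-of-moments estimates is to confirm that the fitted distribution reproduces the moments it was built from. The sketch below assumes the rounded mean and variance printed above as inputs, so it runs without gh.

```r
# Sample moments as printed above (rounded)
m <- 5.7246
v <- 1.107222

mom_shape <- m^2 / v
mom_scale <- v / m

# Moments implied by the fitted gamma distribution: these should
# return the sample mean and variance that were used as inputs
c(mean = mom_shape * mom_scale, var = mom_shape * mom_scale^2)
```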

The next three sections show how MOM derives the parameters of the gamma, normal and Weibull distributions for the glycohaemoglobin variable.

Gamma

The parameters of the gamma distribution are shape and scale.

\[E(X)=shape*scale\]
\[Var(X)=shape*scale^2\]
\[shape=E(X) \div scale\]
\[Var(X)=E(X)*scale\]
\[scale=Var(X) \div E(X)\]
\[E(X)=shape*(Var(X) \div E(X))\]
\[shape=E(X)^2 \div Var(X)\]

Hence, shape and scale are calculated as:

\[shape=5.7246^2 \div 1.1072 \approx 29.60\]
\[scale=1.1072 \div 5.7246 \approx 0.19\]

Normal

The normal distribution’s parameters are mean and standard deviation.

\[E(X)=mean\]
\[Var(X)= SD^2\]
\[mean=5.7246\]
\[SD = \sqrt{Var(X)}\]
\[SD = \sqrt{1.1072} \approx 1.05\]

Weibull

The parameters of the Weibull distribution are shape and scale. Lambda represents the scale and k represents the shape.

\[E(X)=\Lambda\Gamma(1 + (1/k))\]
\[Var(X)=\Lambda^2\Gamma(1 + (2/k))-E(X)^2\]

Substituting the sample mean and variance:

\[5.7246=\Lambda\Gamma(1 + (1/k))\]
\[1.107= \Lambda^2\Gamma(1 + (2/k))-32.771\]

Using simultaneous equations,

\[\Lambda(scale) = 6.1516\]
\[k(shape) = 6.3478\]
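
The two Weibull equations have no closed-form solution, so they have to be solved numerically. One possible approach, sketched below, is to divide the variance by the squared mean so that the scale cancels, solve the remaining one-dimensional equation for k with uniroot, and then recover the scale from the mean equation. This is an illustrative alternative, not the solver used in this report; the name cv2_gap and the search interval are assumptions.

```r
m <- 5.7246     # sample mean of gh reported above
v <- 1.107222   # sample variance of gh reported above

# Squared coefficient of variation implied by a Weibull with shape k,
# minus the observed value; the scale cancels out of this ratio
cv2_gap <- function(k) {
  gamma(1 + 2 / k) / gamma(1 + 1 / k)^2 - 1 - v / m^2
}

# Solve for the shape, then back out the scale from the mean equation
k <- uniroot(cv2_gap, interval = c(1, 50), tol = 1e-10)$root
lambda <- m / gamma(1 + 1 / k)

c(shape = k, scale = lambda)
```

Reducing the problem to a single unknown avoids having to pick starting values for a two-parameter search.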

Several helper functions were written to compare how well MLE and MOM model each variable. The pdf function takes the variable, a distribution name and that distribution's parameters, and overlays the estimated density curve on a histogram of the data, making it easy to judge visually whether the curve fits the empirical distribution. The cdf function does the same for cumulative probabilities: it overlays the estimated cumulative distribution function on the empirical cumulative distribution function (ECDF). The qplot function plots the sample quantiles against the theoretical quantiles of the estimated distribution, so a good fit shows up as points lying close to the y = x line. Note that the names pdf and qplot mask grDevices::pdf and, if ggplot2 is loaded, ggplot2::qplot.

The med_est function returns the median of the estimated distribution. This is the 0.5 quantile evaluated at the parameters obtained via MLE or MOM, that is, q...(0.5, parameters), where ... stands for norm, gamma or weibull. The med_sample function approximates the sampling distribution of the median to show how the median varies under simulation: it draws 1000 random samples of size 1000 from the estimated distribution, stores the median of each sample in a vector, and plots a histogram of that vector. Finally, the mid_95 function reports the range of the middle 95% of values, estimated as the 2.5% and 97.5% quantiles of 1000 values simulated from the estimated distribution. This range is useful because it indicates where 95% of the population is expected to lie.

For each of these functions, the MLE output is compared with the MOM output. The code chunk below also includes the helper used to calculate the MOM parameters of the Weibull distribution.

pdf <- function(y, dist, ..., word2) {
  fm <- eval(parse(text = str_c("d", dist, sep = "")))
  hist(y,freq=FALSE, main = sprintf("%s distribution's probability density function\n using %s", dist, word2),
       xlab = "value", ylab = "density")
  curve(fm(x, ...), add=TRUE, col = "blue")
}

cdf <- function(y, dist, ..., word2) {
  Fm <- eval(parse(text = str_c("p", dist, sep = "")))
  plot(ecdf(y), main = sprintf("Cumulative distribution function for %s \n distribution using %s", dist, word2),
       xlab = "value", ylab = "probability")
  curve(Fm(x, ...), add=TRUE, col = "blue")
}

qplot <- function(x,dist, ..., word2) {
  qm <- eval(parse(text = str_c("q", dist, sep = "")))
  ps <- ppoints(1000)
  theoretical <- qm(ps, ...)
  sample <- quantile(x, ps)
  plot(theoretical, sample, main = sprintf("QQPlot for %s distribution\n using %s", dist, word2),
       col = "blue")
  abline(0,1)
}

med_est <- function(dist,..., word2) {
  qm <- eval(parse(text = str_c("q", dist, sep = "")))
  val <- qm(0.5, ...)
  sprintf("The median of the estimated distribution under %s distribution using %s is %.2f", dist, word2, val)
}

med_sample <- function(dist, ..., word2) {
  rm <- eval(parse(text = str_c("r", dist, sep = "")))
  q <- c()
  for (i in 1:1000) {
    n <- rm(1000, ...)
    q[i] <- quantile(n, 0.5)
  }
  hist(q, freq=FALSE, main = sprintf("Histogram of the median of %s sampling\n distribution using %s", dist, word2), xlab = "value", ylab = "density")
}

mid_95 <- function(dist, ..., word2) {
  rm <- eval(parse(text = str_c("r", dist, sep = "")))
  n <- rm(1000, ...)
  l <- quantile(n, 0.025)
  h <- quantile(n, 0.975)
  sprintf("The middle 95 percent of the sampling distribution for %s distribution begins from %.2f and ends at %.2f using %s", dist, l, h, word2)
}

## Helper for the Weibull method-of-moments equations: returns the mean and the
## second raw moment, E(X^2) = Var(X) + E(X)^2, for scale a and shape b
weibull_fn <- function(a, b) {
  mean <- a*gamma(1 + (1/b))
  var_mean_2 <- a*a*gamma(1 + (2/b))
  
  return(c(mean, var_mean_2))
}
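
The median and the middle 95% range of a fitted distribution can also be read directly from its quantile function, without simulation. The sketch below shows one way to do that; quantile_fn and exact_summary are hypothetical names, and the values passed in are the rounded normal method-of-moments figures from the derivation above, used purely for illustration.

```r
# Look up a quantile function such as qnorm, qgamma or qweibull by name
quantile_fn <- function(dist) match.fun(paste0("q", dist))

# Exact median and middle 95% range of a fitted distribution
exact_summary <- function(dist, ...) {
  qm <- quantile_fn(dist)
  c(lower = qm(0.025, ...), median = qm(0.5, ...), upper = qm(0.975, ...))
}

# Illustrative call with the rounded normal estimates for gh
res <- exact_summary("norm", mean = 5.7246, sd = 1.05)
res
```

Because nothing is simulated, repeated calls return identical values, whereas a range based on random draws changes slightly from run to run.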

3 Results

The results below show how well the normal, gamma and Weibull distributions, fitted by MLE and MOM, model glycohemoglobin and adult female height, in order to see which distribution works best for each variable.

3.1 Glycohaemoglobin

3.1.1 Normal Distribution

3.1.1.1 Parameter Estimation

nLL <- function(mean, sd){
  fm <- dist_n("norm")
  fs <- fm(
        x = gh
      , mean = mean
      , sd = sd
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(mean=1,sd=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
mle_mean <- coef(fit)[1]
mle_sd <- coef(fit)[2]
mom_mean <- mean(gh)
mom_sd <- sqrt(var(gh))

Under the normal distribution, the maximum likelihood estimates for gh are 5.7245999 for the mean and 1.0517205 for the standard deviation. The method of moments estimates are 5.7246 for the mean and 1.0522462 for the standard deviation.
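
The small gap between the two standard deviations has a simple explanation: the normal MLE divides the sum of squared deviations by n, while var() divides by n - 1. The sketch below illustrates this with simulated data standing in for gh, which is an assumption made only so that the example runs on its own.

```r
set.seed(2)
# Hypothetical stand-in for gh: 1000 simulated values
x <- rnorm(1000, mean = 5.72, sd = 1.05)
n <- length(x)

sd_mle <- sqrt(mean((x - mean(x))^2))  # divides by n
sd_sample <- sd(x)                     # divides by n - 1

c(mle = sd_mle, sample = sd_sample, ratio = sd_mle / sd_sample)
sqrt((n - 1) / n)  # the ratio predicted by the two denominators
```

The ratio of the two estimates reported above, 1.0517205 / 1.0522462, is about 0.9995. That would be consistent with a sample of roughly 1000 observations, although the sample size is not stated here.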

3.1.1.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(gh, "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
pdf(gh, "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

There is no noticeable difference between the fits produced by MLE and MOM. However, the histogram of the original values is strongly positively skewed in both cases. This is because most of the original glycohaemoglobin values lie between 5 and 6.

par(mfrow = c(1,2))
cdf(gh, "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
cdf(gh, "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

The two graphs are again very similar, and the estimated and empirical distributions follow the same trend. However, the estimated CDFs do not lie exactly on the ECDFs of the data.

par(mfrow = c(1,2))
qplot(gh, dist = "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
qplot(gh, dist = "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

Although the Q-Q plots show slight deviations in both charts, the points fit the y = x line well.

3.1.1.3 Estimated distribution’s median

med_est(dist="norm", mean = mle_mean, sd = mle_sd, word2 = "mle")

## [1] "The median of the estimated distribution under norm distribution using mle is 5.72"

med_est(dist="norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

## [1] "The median of the estimated distribution under norm distribution using mom is 5.72"

3.1.1.4 Median Sampling distribution

med_sample(dist="norm", mean = mle_mean, sd = mle_sd, word2 = "mle")

med_sample(dist="norm", mean = mom_mean, sd = mom_sd, word2 = "mom")
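
The spread of the simulated medians can be compared with a standard large-sample approximation, in which the standard deviation of the sample median is about 1 / (2 f(m) sqrt(n)), where f(m) is the density at the median. The sketch below assumes the rounded normal estimates for gh and the sample size of 1000 used inside med_sample; the variable names are illustrative.

```r
set.seed(3)
n <- 1000       # size of each simulated sample, as in med_sample
mu <- 5.7246    # rounded normal estimates for gh
sigma <- 1.0517

# Medians of 1000 simulated samples from the fitted normal distribution
meds <- replicate(1000, median(rnorm(n, mean = mu, sd = sigma)))

# Large-sample approximation: sd(median) ~ 1 / (2 * f(median) * sqrt(n))
approx_sd <- 1 / (2 * dnorm(mu, mean = mu, sd = sigma) * sqrt(n))

c(simulated = sd(meds), approximate = approx_sd)
```

If the two numbers are close, the histogram produced by med_sample is behaving as the theory predicts for this sample size.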

3.1.1.5 Range of Middle 95% of sampling distribution

mid_95(dist="norm", mean=mle_mean, sd =mle_sd, word2="mle")

## [1] "The middle 95 percent of the sampling distribution for norm distribution begins from 3.61 and ends at 7.60 using mle"

mid_95(dist="norm", mean=mom_mean, sd =mom_sd, word2="mom")

## [1] "The middle 95 percent of the sampling distribution for norm distribution begins from 3.69 and ends at 7.71 using mom"

3.1.2 Gamma Distribution

3.1.2.1 Parameter Estimation

nLL <- function(shape, scale){
  fm <- dist_n("gamma")
  fs <- fm(
        x = gh
      , shape=shape
      , scale=scale
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(shape=1,scale=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
mle_shape <- coef(fit)[1]
mle_scale <- coef(fit)[2]
m <- mean(gh)
v <- var(gh)
mom_shape <- m^2/v
mom_scale <- v/m

Under the gamma distribution, the maximum likelihood estimates for gh are 40.7065048 for shape and 0.1406358 for scale. The method of moments estimates are 29.5975363 for shape and 0.1934147 for scale.

3.1.2.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(gh, "gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
pdf(gh, "gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

In the MLE chart on the left, the estimated gamma density follows the general shape of the empirical histogram. The MOM chart on the right looks much the same, because its estimated density curve is very similar.

par(mfrow = c(1,2))
cdf(gh, "gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
cdf(gh, "gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

In both charts, the estimated gamma CDF deviates slightly from the ECDF of the data.

par(mfrow = c(1,2))
qplot(gh, dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
qplot(gh, dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

The Q-Q plot produced with the MOM estimates is similar to the one produced with the MLE estimates.

3.1.2.3 Estimated distribution’s median

par(mfrow = c(1,2))
med_est(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The median of the estimated distribution under gamma distribution using mle is 5.68"

med_est(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The median of the estimated distribution under gamma distribution using mom is 5.66"

3.1.2.4 Median Sampling distribution

par(mfrow = c(1,2))
med_sample(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
med_sample(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

3.1.2.5 Range of Middle 95% of sampling distribution

par(mfrow = c(1,2))
mid_95(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The middle 95 percent of the sampling distribution for gamma distribution begins from 3.99 and ends at 7.72 using mle"

mid_95(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The middle 95 percent of the sampling distribution for gamma distribution begins from 3.93 and ends at 7.74 using mom"

3.1.3 Weibull Distribution

3.1.3.1 Parameter Estimation

nLL <- function(shape, scale){
  fm <- dist_n("weibull")
  fs <- fm(
        x = gh
      , shape=shape
      , scale=scale
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(shape=1,scale=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
mle_shape <- coef(fit)[1]
mle_scale <- coef(fit)[2]
m <- mean(gh)
v <- var(gh)
# using the function set up via the Weibull distribution for 
# estimating its parameters to calculate values in the   
# simultaneous equation below
fn2 <- function(x) {
  crossprod(weibull_fn(x[1], x[2]) - c(m, v + m^2))
}
d <- optim(c(1, 1), fn2)
mom_shape <- d$par[2] 
mom_scale <- d$par[1]

Using MOM to estimate weibull parameters

\[E(X)=\Lambda\Gamma(1 + (1/k))\]
\[Var(X)=\Lambda^2\Gamma(1 + (2/k))-E(X)^2\]
\[5.7246=\Lambda\Gamma(1 + (1/k))\]
\[1.107= \Lambda^2\Gamma(1 + (2/k))-32.771\]
\[33.878= \Lambda^2\Gamma(1 + (2/k))\]

Using simultaneous equations,

\[\Lambda(scale) = 6.1516\]
\[k(shape) = 6.3478\]

Under the Weibull distribution, the maximum likelihood estimates for gh are 4.1252537 for shape and 6.1738848 for scale. The method of moments estimates returned by optim are 6.3261437 for shape and 6.1520965 for scale.

3.1.3.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(gh, "weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
pdf(gh, "weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

The density curves estimated by MLE and MOM differ slightly. The MOM curve has a fatter tail and is more positively skewed than the MLE curve. Neither curve fits the histogram perfectly.

par(mfrow = c(1,2))
cdf(gh, "weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
cdf(gh, "weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

The two cumulative distribution curves do not differ by much. In both charts, the estimated CDF departs from the empirical CDF.

par(mfrow = c(1,2))
qplot(gh, dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
qplot(gh, dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

The two Q-Q plots cover roughly the same range of values. The MLE chart has values from about 2 to 10, and the MOM chart has values from about 2 to 9. Both also fit the y = x line well.

3.1.3.3 Estimated distribution’s median

par(mfrow = c(1,2))
med_est(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The median of the estimated distribution under weibull distribution using mle is 5.65"

med_est(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The median of the estimated distribution under weibull distribution using mom is 5.81"

3.1.3.4 Median Sampling distribution

par(mfrow = c(1,2))
med_sample(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
med_sample(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

3.1.3.5 Range of Middle 95% of sampling distribution

par(mfrow = c(1,2))
mid_95(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The middle 95 percent of the sampling distribution for weibull distribution begins from 2.53 and ends at 8.47 using mle"

mid_95(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The middle 95 percent of the sampling distribution for weibull distribution begins from 3.54 and ends at 7.57 using mom"

All the steps carried out above for the glycohaemoglobin variable are repeated for the height of adult females in the next section.

3.2 Heights of Adult Female

3.2.1 Normal Distribution

3.2.1.1 Parameter Estimation

nLL <- function(mean, sd){
  fm <- dist_n("norm")
  fs <- fm(
        x = ht
      , mean = mean
      , sd = sd
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(mean=1,sd=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
mle_mean <- coef(fit)[1]
mle_sd <- coef(fit)[2]
mom_mean <- mean(ht)
mom_sd <- sqrt(var(ht))

Under the normal distribution, the maximum likelihood estimates for ht are 160.7419005 for the mean and 7.3165022 for the standard deviation. The method of moments estimates are 160.7419 for the mean and 7.3201611 for the standard deviation.

3.2.1.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(ht, "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
pdf(ht, "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

The charts above show that both MLE and MOM describe the height of adult females well. The density curves built from the estimated parameters match the shape of the histogram.

par(mfrow = c(1,2))
cdf(ht, "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
cdf(ht, "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

The estimated CDFs from both estimation methods lie almost exactly on the empirical CDFs. The fit is of high quality.

par(mfrow = c(1,2))
qplot(ht, dist = "norm", mean = mle_mean, sd = mle_sd, word2 = "mle")
qplot(ht, dist = "norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

The qqplots above are very similar. The quantile values only differ at both ends of the distribution. It can be observed that all the quantile values fit well with the y = x line.

3.2.1.3 Estimated distribution’s median

med_est(dist="norm", mean = mle_mean, sd = mle_sd, word2 = "mle")

## [1] "The median of the estimated distribution under norm distribution using mle is 160.74"

med_est(dist="norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

## [1] "The median of the estimated distribution under norm distribution using mom is 160.74"

3.2.1.4 Median Sampling distribution

med_sample(dist="norm", mean = mle_mean, sd = mle_sd, word2 = "mle")

med_sample(dist="norm", mean = mom_mean, sd = mom_sd, word2 = "mom")

3.2.1.5 Range of Middle 95% of sampling distribution

mid_95(dist="norm", mean=mle_mean, sd =mle_sd, word2="mle")

## [1] "The middle 95 percent of the sampling distribution for norm distribution begins from 146.51 and ends at 174.87 using mle"

mid_95(dist="norm", mean=mom_mean, sd =mom_sd, word2="mom")

## [1] "The middle 95 percent of the sampling distribution for norm distribution begins from 145.37 and ends at 175.15 using mom"

3.2.2 Gamma Distribution

3.2.2.1 Parameter Estimation

nLL <- function(shape, scale){
  fm <- dist_n("gamma")
  fs <- fm(
        x = ht
      , shape=shape
      , scale=scale
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(shape=1,scale=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
mle_shape <- coef(fit)[1]
mle_scale <- coef(fit)[2]
m <- mean(ht)
v <- var(ht)
mom_shape <- m^2/v
mom_scale <- v/m

Under the gamma distribution, the maximum likelihood estimates for ht are 479.646083 for shape and 0.3351275 for scale. The method of moments estimates are 482.1885705 for shape and 0.333359 for scale.

3.2.2.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(ht, "gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
pdf(ht, "gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

The gamma distribution’s estimated probability density function fits the histogram well in the MLE chart, and the MOM estimated pdf fits well too.

par(mfrow = c(1,2))
cdf(ht, "gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
cdf(ht, "gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

The MLE chart shows that the estimated CDF fits the ECDF well, and the MOM chart shows the same good fit seen above for the pdf.

par(mfrow = c(1,2))
qplot(ht, dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
qplot(ht, dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

The quantiles produced with the MOM estimates are similar to those produced with the MLE estimates, so both sets of gamma parameters yield similar Q-Q plots.

3.2.2.3 Estimated distribution’s median

par(mfrow = c(1,2))
med_est(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The median of the estimated distribution under gamma distribution using mle is 160.63"

med_est(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The median of the estimated distribution under gamma distribution using mom is 160.63"

3.2.2.4 Median Sampling distribution

par(mfrow = c(1,2))
med_sample(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")
med_sample(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

3.2.2.5 Range of Middle 95% of sampling distribution

par(mfrow = c(1,2))
mid_95(dist="gamma", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The middle 95 percent of the sampling distribution for gamma distribution begins from 145.53 and ends at 175.89 using mle"

mid_95(dist="gamma", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The middle 95 percent of the sampling distribution for gamma distribution begins from 146.35 and ends at 175.03 using mom"

3.2.3 Weibull Distribution

3.2.3.1 Parameter Estimation

nLL <- function(shape, scale){
  fm <- dist_n("weibull")
  fs <- fm(
        x = ht
      , shape=shape
      , scale=scale
      , log = TRUE
    ) 
  -sum(fs)
}
fit <- mle(nLL, start = list(shape=1,scale=1),method = "L-BFGS-B", 
           lower = c(0, 0.01))
(mle_shape <- coef(fit)[1])

##    shape 
## 21.85398

(mle_scale <- coef(fit)[2])

##    scale 
## 164.2472

m <- mean(ht)
v <- var(ht)
# using the function set up via the Weibull distribution for 
# estimating its parameters to calculate values in the   
# simultaneous equation below
fn2 <- function(x) {
  crossprod(weibull_fn(x[1], x[2]) - c(m, v + m^2))
}
d <- optim(c(100, 10), fn2)
(mom_shape <- d$par[2])

## [1] 24.80321

(mom_scale <- d$par[1])

## [1] 164.2702

Under the Weibull distribution, the maximum likelihood estimates for ht are 21.8539788 for shape and 164.2471916 for scale. The method of moments estimates for ht are 24.8032112 for shape and 164.2701646 for scale.

Obtaining parameters via MOM

The parameters for weibull distribution are shape and scale. Lambda represents scale, k represents shape.

\[E(X)=\Lambda\Gamma(1 + (1/k))\]

\[Var(X)=\Lambda^2\Gamma(1 + (2/k))-E(X)^2\]
\[160.7419=\Lambda\Gamma(1 + (1/k))\]
\[53.58476= \Lambda^2\Gamma(1 + (2/k))-25837.95842\]
\[25891.54318 = \Lambda^2\Gamma(1 + (2/k))\]

Using simultaneous equations,

\[\Lambda(scale) = 164.2702\]
\[k(shape) = 24.8032\]

3.2.3.2 Estimated and Empirical distributions’ comparison

par(mfrow = c(1,2))
pdf(ht, "weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
pdf(ht, "weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

With the MLE approach, the Weibull curve models the behavior of the height variable quite accurately. The MOM approach does the same.

par(mfrow = c(1,2))
cdf(ht, "weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
cdf(ht, "weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

The MLE chart shows that the estimated CDF and the ECDF are very similar. The MOM chart shows the same, with a slightly closer fit.

par(mfrow = c(1,2))
qplot(ht, dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
qplot(ht, dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

The MOM and MLE fits of the Weibull distribution both produce quantiles similar to those of the heights of adult females. Both sets of points lie close to the y = x line.

3.2.3.3 Estimated distribution’s median

par(mfrow = c(1,2))
med_est(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The median of the estimated distribution under weibull distribution using mle is 161.52"

med_est(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The median of the estimated distribution under weibull distribution using mom is 161.86"

3.2.3.4 Median Sampling distribution

par(mfrow = c(1,2))
med_sample(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")
med_sample(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

3.2.3.5 Range of Middle 95% of sampling distribution

par(mfrow = c(1,2))
mid_95(dist="weibull", shape=mle_shape, scale = mle_scale, word2 = "mle")

## [1] "The middle 95 percent of the sampling distribution for weibull distribution begins from 138.53 and ends at 174.06 using mle"

mid_95(dist="weibull", shape=mom_shape, scale = mom_scale, word2 = "mom")

## [1] "The middle 95 percent of the sampling distribution for weibull distribution begins from 141.27 and ends at 173.22 using mom"

4 Conclusion

MLE and MOM are two useful methods for modeling a variable’s distribution when it is unclear how the variable is distributed. In this report, we found that MLE chooses the parameters that make the estimated distribution fit the observed data as closely as the likelihood allows. MOM’s formula-based approach, by contrast, only matches moments, so not every estimated distribution fits the empirical distribution well. MLE is therefore the better choice when trying to find the best distribution for describing a variable. Keep these points in mind when working with MLE or MOM.

The best model for the glycohaemoglobin variable is the normal distribution, since MLE and MOM yield nearly the same results and its curve most closely resembles the original distribution. The best model for the height variable is also the normal distribution, since both MLE and MOM produce closely fitting curves and simulated values that agree with the distribution of the height variable.

Modeling the unknown distribution with maximum likelihood and method of moments

Mubarak Ganiyu

November 18, 2021
