Method_of_Moments

Previous exercise

  • In Exercise 5.9 you comp ared a sampling distribution to a bootstrap distribution.
  • In both, the underlying population was assumed to have a gamma distribution with shape \( r=5 \) and rate \( \lambda=\frac{1}{4} \).

Huh?

  • Plot this distribution with ggdistribution.
  • Adjust the values on the horizontal axis to make sure you see enough of the distribution.
  • Describe the distribution.
library(ggfortify)
ggdistribution(dgamma, seq(0, 50, 0.1), rate=1/4, shape=5)

plot of chunk unnamed-chunk-1

  • This distribution is skewed right.
  • Roughly, most of the values lie between 8 and 30.

Gamma and Exponential distributions

  • We discussed the exponential distribution:

    • it is a “waiting time” distribution.
    • How long until the next volcano eruption?, etc.
  • The gamma distribution is similar - but more general.

    • The exponential distribution is a gamma distribution with \( r=1 \).

Service times

  • Wait times at a snack bar are in the data set Service.

  • Load this data set and plot a histogram.

    • Use prob=T in your histogram.
library(resampledata)
hist(Service$Times, prob=T)

plot of chunk unnamed-chunk-2

with a gamma distribution superimposed

Plot a gamma distribution on the same plot with:

hist(Service$Times, prob=T)
curve(dgamma(x, shape=10, rate=10), add=TRUE)

plot of chunk unnamed-chunk-3

What should the parameters be?

Experiment with the parameters a little and see if you can get the gamma distribution to match the distribution of the sample data (histogram).

What are the best parameters to use?

We'll use a method called the Method of Moments to determine the parameters to use.

Read the first part of section 6.2.

  • What values of \( k \) will we use?
  • What is \( f(x) \)?
  • What are the \( X_i \) s?
  • What is \( n \)?

The calculus part first

  • \( f \) is the pdf. You'll find this function in the Appendix C
  • What does the Appendix indicate is the domain of the gamma function pdf?
  • How will this affect the integrals shown on page 162?

Evaluate the integrals!

A few hints:

  • Look up \( \Gamma(r) \). It is in Appendix B.
  • The integrals are with respect to \( x \). Pull any constants outside the integral.
  • Start with a substitution: \( w=\lambda x \).
  • Next: integration by parts
  • It would be really nice if the factor of \( \Gamma(r) \) in the denominator canceled! Watch out for a factor of \( \Gamma(r) \) in the numerator.
  • When you get to the second integral, pay attention. At a certain point you can use work you already did for the first integral.

\[ \begin{align*} \Gamma(r)&=\int_0^\infty x^{r-1}e^{- x} dx \\ f(x,r,\lambda)&=\frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} \end{align*} \]

\[ \begin{align*} E[X]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^r e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^r e^{-w} dw \\ &=\frac{1}{\lambda \Gamma(r)} \int_0^\infty w^r e^{-w} dw && u=w^r, dv=e^{-w}dw\\ & && du = r w^{r-1} dw, v= -e^{-w}\\ \end{align*} \]

\[ \begin{align*} E[X]&=\\ &=\frac{1}{\lambda \Gamma(r)}\left[-w^re^{-w} \biggr|_0^\infty+r\int_0^\infty w^{r-1}e^{-w} dw \right]\\ &=\frac{1}{\lambda \Gamma(r)}\left[0+r\Gamma(r) \right]\\ &=\frac{r}{\lambda} \end{align*} \]

\[ \begin{align*} E[X^2]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x^2 \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^{r+1} e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^{r+1} e^{-w} dw \\ &=\frac{1}{\lambda^2 \Gamma(r)} \int_0^\infty w^{r+1} e^{-w} dw && u=w^{r+1}, dv=e^{-w}dw\\ & && du = (r+1) w^r dw, v= -e^{-w}\\ \end{align*} \]

\[ \begin{align*} E[X^2]&=\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)\int_0^\infty w^re^{-w} dw \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)r\Gamma(r) \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[0+r(r+1)\Gamma(r) \right]\\ &=\frac{r(r+1)}{\lambda^2} \end{align*} \]

Status

  • You should have the left hand side for two of the equations on page 162.
  • They should be expressions in \( r \) and \( \lambda \).

Right hand side

You could compute these with a loop, but R's vector structure provides an elegant way to compute the right hand side.

Here's an example. Say my \( X_i \) s are:

Xs <- c(12, 14, 17, 2, 3, 19, 7, 12)

(I just made them up.)

To compute \[ \sum_{i=1}^n X_i^3 \] I will cube each of the \( X_i \) s and then add them together.

Xs^3
[1] 1728 2744 4913    8   27 6859  343 1728
sum(Xs^3)
[1] 18350

All that's left to do is divide by \( n \).

I gave names to the values on the right hand for computation use later:

alpha <- sum((Service$Times)^1)/length(Service$Times)
beta <- sum((Service$Times)^2)/length(Service$Times)
alpha
[1] 0.6949234
beta
[1] 0.6637755

Solve

  • You should have 2 equations in 2 unknowns.
  • Solve for \( \lambda \) and \( r \).
  • Do the first part by hand,
  • And let R find numerical values.

\[ \begin{align*} \frac{r}{\lambda}&=\alpha\\ r&=\alpha \cdot \lambda\\ \\ \frac{r(r+1)}{\lambda^2}&=\beta\\ \frac{\alpha \cdot \lambda (\alpha \cdot \lambda+1)}{\lambda^2}&=\beta\\ \frac{\alpha (\alpha \cdot \lambda+1)}{\lambda}&=\beta\\ \alpha^2\lambda+\alpha&=\beta \lambda\\ \alpha^2\lambda-\beta\lambda&=-\alpha\\ \lambda&=\frac{-\alpha}{\alpha^2-\beta} \hspace{1cm} r=\frac{-\alpha^2}{\alpha^2-\beta} \end{align*} \]

lambdahat <- -alpha/(alpha^2-beta)
rhat <- -alpha^2/(alpha^2-beta)
lambdahat
[1] 3.84239
rhat
[1] 2.670167

Plot

Plot the histogram of the data with the gamma distribution with the values you found for \( \lambda \) and \( r \) superimposed

hist(Service$Times, prob=T)
curve(dgamma(x, shape=rhat, rate=lambdahat), add=TRUE)

plot of chunk unnamed-chunk-8

Also plot the ecdf

  • Remember: the plot you just provided is for the pdf.
  • It's also helpful to have the cdf.
  • Plot the ecdf: plot.ecdf, along with the cdf of the gamma function.
plot.ecdf(Service$Times)
curve(pgamma(x,rhat, lambdahat), add=TRUE, lwd=2, col="blue")

plot of chunk unnamed-chunk-9

Did the process do a good job?

  • We'll start a process called “goodness of fit”.

  • First, determine quantiles, say 10 of them, for the gamma distribution with our parameters.

  • These are the values below which are 10% of the distribution, 20% of the distribution, etc.

cuts <- seq(0,1,by=0.1)
q<- qgamma(cuts, shape=rhat, rate=lambdahat)
q
 [1] 0.0000000 0.2352488 0.3366274 0.4267464 0.5160412 0.6103626 0.7156841
 [8] 0.8410612 1.0051247 1.2648805       Inf

Compare

  • We'll count how many data points we expect to be in each interval
  • We'll count how many data points are in each interval from the collected data sample
  • For now, we'll eyeball the comparison and return to the issue later when we have another tool: \( \chi^2 \)-test.

Expected

10% of the data should be in each quantile - that's the definition of a quantile.

expected <- length(Service$Times)*0.1
expected
[1] 17.4

Observed

The following starts to plot a histogram, but instead counts how many data points are in each bar. Of course, we have to explicitly specify the breaks.

count <- hist(Service$Times, breaks=q, plot=F)$counts
count
 [1] 16 20 17 17 21 17 16 15 16 19

and for ease of comparison:

observed <- count
expected <- rep(expected, 10)
mytable <- cbind(observed, expected)
knitr::kable(mytable, digits=3)
observed expected
16 17.4
20 17.4
17 17.4
17 17.4
21 17.4
17 17.4
16 17.4
15 17.4
16 17.4
19 17.4

Conclusion

  • None of the observed values feel too far off.
  • We'll have to wait for a better answer….