Method_of_Moments

Previous exercise

In Exercise 5.9 you comp ared a sampling distribution to a bootstrap distribution.
In both, the underlying population was assumed to have a gamma distribution with shape \( r=5 \) and rate \( \lambda=\frac{1}{4} \).

Huh?

Plot this distribution with ggdistribution.
Adjust the values on the horizontal axis to make sure you see enough of the distribution.
Describe the distribution.

library(ggfortify)
ggdistribution(dgamma, seq(0, 50, 0.1), rate=1/4, shape=5)

plot of chunk unnamed-chunk-1

This distribution is skewed right.
Roughly, most of the values lie between 8 and 30.

Gamma and Exponential distributions

We discussed the exponential distribution:
- it is a “waiting time” distribution.
- How long until the next volcano eruption?, etc.
The gamma distribution is similar - but more general.
- The exponential distribution is a gamma distribution with \( r=1 \).

Service times

Wait times at a snack bar are in the data set Service.
Load this data set and plot a histogram.
- Use prob=T in your histogram.

library(resampledata)
hist(Service$Times, prob=T)

plot of chunk unnamed-chunk-2

with a gamma distribution superimposed

Plot a gamma distribution on the same plot with:

hist(Service$Times, prob=T)
curve(dgamma(x, shape=10, rate=10), add=TRUE)

plot of chunk unnamed-chunk-3

What should the parameters be?

Experiment with the parameters a little and see if you can get the gamma distribution to match the distribution of the sample data (histogram).

What are the best parameters to use?

We'll use a method called the Method of Moments to determine the parameters to use.

Read the first part of section 6.2.

What values of \( k \) will we use?
What is \( f(x) \)?
What are the \( X_i \) s?
What is \( n \)?

The calculus part first

\( f \) is the pdf. You'll find this function in the Appendix C
What does the Appendix indicate is the domain of the gamma function pdf?
How will this affect the integrals shown on page 162?

Evaluate the integrals!

A few hints:

Look up \( \Gamma(r) \). It is in Appendix B.
The integrals are with respect to \( x \). Pull any constants outside the integral.
Start with a substitution: \( w=\lambda x \).
Next: integration by parts
It would be really nice if the factor of \( \Gamma(r) \) in the denominator canceled! Watch out for a factor of \( \Gamma(r) \) in the numerator.
When you get to the second integral, pay attention. At a certain point you can use work you already did for the first integral.

\[ \begin{align*} \Gamma(r)&=\int_0^\infty x^{r-1}e^{- x} dx \\ f(x,r,\lambda)&=\frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} \end{align*} \]

\[ \begin{align*} E[X]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^r e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^r e^{-w} dw \\ &=\frac{1}{\lambda \Gamma(r)} \int_0^\infty w^r e^{-w} dw && u=w^r, dv=e^{-w}dw\\ & && du = r w^{r-1} dw, v= -e^{-w}\\ \end{align*} \]

\[ \begin{align*} E[X]&=\\ &=\frac{1}{\lambda \Gamma(r)}\left[-w^re^{-w} \biggr|_0^\infty+r\int_0^\infty w^{r-1}e^{-w} dw \right]\\ &=\frac{1}{\lambda \Gamma(r)}\left[0+r\Gamma(r) \right]\\ &=\frac{r}{\lambda} \end{align*} \]

\[ \begin{align*} E[X^2]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x^2 \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^{r+1} e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^{r+1} e^{-w} dw \\ &=\frac{1}{\lambda^2 \Gamma(r)} \int_0^\infty w^{r+1} e^{-w} dw && u=w^{r+1}, dv=e^{-w}dw\\ & && du = (r+1) w^r dw, v= -e^{-w}\\ \end{align*} \]

\[ \begin{align*} E[X^2]&=\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)\int_0^\infty w^re^{-w} dw \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)r\Gamma(r) \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[0+r(r+1)\Gamma(r) \right]\\ &=\frac{r(r+1)}{\lambda^2} \end{align*} \]

Status

You should have the left hand side for two of the equations on page 162.
They should be expressions in \( r \) and \( \lambda \).

Right hand side

You could compute these with a loop, but R's vector structure provides an elegant way to compute the right hand side.

Here's an example. Say my \( X_i \) s are:

Xs <- c(12, 14, 17, 2, 3, 19, 7, 12)

(I just made them up.)

To compute \[ \sum_{i=1}^n X_i^3 \] I will cube each of the \( X_i \) s and then add them together.

Xs^3

[1] 1728 2744 4913    8   27 6859  343 1728

sum(Xs^3)

[1] 18350

All that's left to do is divide by \( n \).

I gave names to the values on the right hand for computation use later:

alpha <- sum((Service$Times)^1)/length(Service$Times)
beta <- sum((Service$Times)^2)/length(Service$Times)
alpha

[1] 0.6949234

beta

[1] 0.6637755

Solve

You should have 2 equations in 2 unknowns.
Solve for \( \lambda \) and \( r \).
Do the first part by hand,
And let R find numerical values.

\[ \begin{align*} \frac{r}{\lambda}&=\alpha\\ r&=\alpha \cdot \lambda\\ \\ \frac{r(r+1)}{\lambda^2}&=\beta\\ \frac{\alpha \cdot \lambda (\alpha \cdot \lambda+1)}{\lambda^2}&=\beta\\ \frac{\alpha (\alpha \cdot \lambda+1)}{\lambda}&=\beta\\ \alpha^2\lambda+\alpha&=\beta \lambda\\ \alpha^2\lambda-\beta\lambda&=-\alpha\\ \lambda&=\frac{-\alpha}{\alpha^2-\beta} \hspace{1cm} r=\frac{-\alpha^2}{\alpha^2-\beta} \end{align*} \]

lambdahat <- -alpha/(alpha^2-beta)
rhat <- -alpha^2/(alpha^2-beta)
lambdahat

[1] 3.84239

rhat

[1] 2.670167

Plot

Plot the histogram of the data with the gamma distribution with the values you found for \( \lambda \) and \( r \) superimposed

hist(Service$Times, prob=T)
curve(dgamma(x, shape=rhat, rate=lambdahat), add=TRUE)

plot of chunk unnamed-chunk-8

Also plot the ecdf

Remember: the plot you just provided is for the pdf.
It's also helpful to have the cdf.
Plot the ecdf: plot.ecdf, along with the cdf of the gamma function.

plot.ecdf(Service$Times)
curve(pgamma(x,rhat, lambdahat), add=TRUE, lwd=2, col="blue")

plot of chunk unnamed-chunk-9

Did the process do a good job?

We'll start a process called “goodness of fit”.
First, determine quantiles, say 10 of them, for the gamma distribution with our parameters.
These are the values below which are 10% of the distribution, 20% of the distribution, etc.

cuts <- seq(0,1,by=0.1)
q<- qgamma(cuts, shape=rhat, rate=lambdahat)
q

 [1] 0.0000000 0.2352488 0.3366274 0.4267464 0.5160412 0.6103626 0.7156841
 [8] 0.8410612 1.0051247 1.2648805       Inf

Compare

We'll count how many data points we expect to be in each interval
We'll count how many data points are in each interval from the collected data sample
For now, we'll eyeball the comparison and return to the issue later when we have another tool: \( \chi^2 \)-test.

Expected

10% of the data should be in each quantile - that's the definition of a quantile.

expected <- length(Service$Times)*0.1
expected

[1] 17.4

Observed

The following starts to plot a histogram, but instead counts how many data points are in each bar. Of course, we have to explicitly specify the breaks.

count <- hist(Service$Times, breaks=q, plot=F)$counts
count

 [1] 16 20 17 17 21 17 16 15 16 19

and for ease of comparison:

observed <- count
expected <- rep(expected, 10)
mytable <- cbind(observed, expected)
knitr::kable(mytable, digits=3)

observed	expected
16	17.4
20	17.4
17	17.4
17	17.4
21	17.4
17	17.4
16	17.4
15	17.4
16	17.4
19	17.4

Conclusion

None of the observed values feel too far off.
We'll have to wait for a better answer….