library(ggfortify)
ggdistribution(dgamma, seq(0, 50, 0.1), rate=1/4, shape=5)
We discussed the exponential distribution:
The gamma distribution is similar - but more general.
Wait times at a snack bar are in the data set Service.
Load this data set and plot a histogram.
library(resampledata)
hist(Service$Times, prob=T)
Plot a gamma distribution on the same plot with:
hist(Service$Times, prob=T)
curve(dgamma(x, shape=10, rate=10), add=TRUE)
Experiment with the parameters a little and see if you can get the gamma distribution to match the distribution of the sample data (histogram).
We'll use a method called the Method of Moments to determine the parameters to use.
Read the first part of section 6.2.
A few hints:
\[ \begin{align*} \Gamma(r)&=\int_0^\infty x^{r-1}e^{- x} dx \\ f(x,r,\lambda)&=\frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} \end{align*} \]
\[ \begin{align*} E[X]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^r e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^r e^{-w} dw \\ &=\frac{1}{\lambda \Gamma(r)} \int_0^\infty w^r e^{-w} dw && u=w^r, dv=e^{-w}dw\\ & && du = r w^{r-1} dw, v= -e^{-w}\\ \end{align*} \]
\[ \begin{align*} E[X]&=\\ &=\frac{1}{\lambda \Gamma(r)}\left[-w^re^{-w} \biggr|_0^\infty+r\int_0^\infty w^{r-1}e^{-w} dw \right]\\ &=\frac{1}{\lambda \Gamma(r)}\left[0+r\Gamma(r) \right]\\ &=\frac{r}{\lambda} \end{align*} \]
\[ \begin{align*} E[X^2]&=\int_0^\infty x f(x,r,\lambda) dx\\ &= \int_0^\infty x^2 \frac{1}{\Gamma(r)}\lambda^rx^{r-1}e^{-\lambda x} dx\\ &=\frac{\lambda^r}{\Gamma(r)}\int_0^\infty x^{r+1} e^{-\lambda x} dx && w=\lambda x, dw=\lambda dx\\ &=\frac{\lambda^{r-1}}{\Gamma(r)}\int_0^\infty \left(\frac{w}{\lambda}\right)^{r+1} e^{-w} dw \\ &=\frac{1}{\lambda^2 \Gamma(r)} \int_0^\infty w^{r+1} e^{-w} dw && u=w^{r+1}, dv=e^{-w}dw\\ & && du = (r+1) w^r dw, v= -e^{-w}\\ \end{align*} \]
\[ \begin{align*} E[X^2]&=\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)\int_0^\infty w^re^{-w} dw \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[-w^{r+1}e^{-w} \biggr|_0^\infty+(r+1)r\Gamma(r) \right]\\ &=\frac{1}{\lambda^2 \Gamma(r)}\left[0+r(r+1)\Gamma(r) \right]\\ &=\frac{r(r+1)}{\lambda^2} \end{align*} \]
You could compute these with a loop, but R's vector structure provides an elegant way to compute the right hand side.
Here's an example. Say my \( X_i \) s are:
Xs <- c(12, 14, 17, 2, 3, 19, 7, 12)
(I just made them up.)
To compute \[ \sum_{i=1}^n X_i^3 \] I will cube each of the \( X_i \) s and then add them together.
Xs^3
[1] 1728 2744 4913 8 27 6859 343 1728
sum(Xs^3)
[1] 18350
All that's left to do is divide by \( n \).
I gave names to the values on the right hand for computation use later:
alpha <- sum((Service$Times)^1)/length(Service$Times)
beta <- sum((Service$Times)^2)/length(Service$Times)
alpha
[1] 0.6949234
beta
[1] 0.6637755
\[ \begin{align*} \frac{r}{\lambda}&=\alpha\\ r&=\alpha \cdot \lambda\\ \\ \frac{r(r+1)}{\lambda^2}&=\beta\\ \frac{\alpha \cdot \lambda (\alpha \cdot \lambda+1)}{\lambda^2}&=\beta\\ \frac{\alpha (\alpha \cdot \lambda+1)}{\lambda}&=\beta\\ \alpha^2\lambda+\alpha&=\beta \lambda\\ \alpha^2\lambda-\beta\lambda&=-\alpha\\ \lambda&=\frac{-\alpha}{\alpha^2-\beta} \hspace{1cm} r=\frac{-\alpha^2}{\alpha^2-\beta} \end{align*} \]
lambdahat <- -alpha/(alpha^2-beta)
rhat <- -alpha^2/(alpha^2-beta)
lambdahat
[1] 3.84239
rhat
[1] 2.670167
Plot the histogram of the data with the gamma distribution with the values you found for \( \lambda \) and \( r \) superimposed
hist(Service$Times, prob=T)
curve(dgamma(x, shape=rhat, rate=lambdahat), add=TRUE)
plot.ecdf(Service$Times)
curve(pgamma(x,rhat, lambdahat), add=TRUE, lwd=2, col="blue")
We'll start a process called “goodness of fit”.
First, determine quantiles, say 10 of them, for the gamma distribution with our parameters.
These are the values below which are 10% of the distribution, 20% of the distribution, etc.
cuts <- seq(0,1,by=0.1)
q<- qgamma(cuts, shape=rhat, rate=lambdahat)
q
[1] 0.0000000 0.2352488 0.3366274 0.4267464 0.5160412 0.6103626 0.7156841
[8] 0.8410612 1.0051247 1.2648805 Inf
10% of the data should be in each quantile - that's the definition of a quantile.
expected <- length(Service$Times)*0.1
expected
[1] 17.4
The following starts to plot a histogram, but instead counts how many data points are in each bar. Of course, we have to explicitly specify the breaks.
count <- hist(Service$Times, breaks=q, plot=F)$counts
count
[1] 16 20 17 17 21 17 16 15 16 19
observed <- count
expected <- rep(expected, 10)
mytable <- cbind(observed, expected)
knitr::kable(mytable, digits=3)
| observed | expected |
|---|---|
| 16 | 17.4 |
| 20 | 17.4 |
| 17 | 17.4 |
| 17 | 17.4 |
| 21 | 17.4 |
| 17 | 17.4 |
| 16 | 17.4 |
| 15 | 17.4 |
| 16 | 17.4 |
| 19 | 17.4 |