The short version is that the Beta distribution can be understood as representing a distribution of probabilities- that is, it represents all the possible values of a probability when we don’t know what that probability is.
This probability is given by the integral of this variable’s probability density function (PDF) over that range—that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range.
These parameters are chosen for two reasons: The mean is α/(α+β) = 81/(81+219) = 0.270 As you can see in the plot, this distribution lies almost entirely within (.2 - .35), the reasonable range for a batting average.
Anyone who follows baseball is familiar with batting averages- simply the number of times a player gets a base hit divided by the number of times he goes up at bat (so it’s just a percentage between 0 and 1). .266 is in general considered an average batting average, while .300 is considered an excellent one.
library(ExtDist)
## Warning: package 'ExtDist' was built under R version 3.3.3
##
## Attaching package: 'ExtDist'
## The following object is masked from 'package:stats':
##
## BIC
library(PerformanceAnalytics)
## Warning: package 'PerformanceAnalytics' was built under R version 3.3.3
## Loading required package: xts
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.3.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
x <- rBeta(n=500, params=list(shape1=81, shape2=219))
est.par <- eBeta(x); est.par
##
## Parameters for the Beta distribution.
## (found using the MOM method.)
##
## Parameter Type Estimate
## shape1 shape 77.98685
## shape2 shape 211.66731
plot(est.par)
curve(dbeta(x,81,219),col = "blue", xlab = "Batting Average", ylab = "Probability", xlim=c(0.1,0.4), ylim=c(0,25))
You asked what the x axis represents in a beta distribution density plot- here it represents his batting average. Thus notice that in this case, not only is the y-axis a probability (or more precisely a probability density), but the x-axis is as well (batting average is just a probability of a hit, after all)! The beta distribution is representing a probability distribution of probabilities.
Imagine the player gets a single hit. His record for the season is now “1 hit; 1 at bat.” We have to then update our probabilities- we want to shift this entire curve over just a bit to reflect our new information. While the math for proving this is a bit involved (it’s shown here), the result is very simple. The new beta distribution will be:
Beta(α0+hits,β0+misses)
curve(dbeta(x,81,219),col = "blue", xlab = "Batting Average", ylab = "Probability", xlim=c(0.1,0.4), ylim=c(0,25))
par(new=TRUE)
curve(dbeta(x,(81+6),(219)),col = "red", xlab = "Batting Average", ylab = "Probability", xlim=c(0.1,0.4),ylim=c(0,25))
However, the more the player hits over the course of the season, the more the curve will shift to accommodate the new evidence, and furthermore the more it will narrow based on the fact that we have more proof. Let’s say halfway through the season he has been up to bat 300 times, hitting 100 out of those times. The new distribution would be beta(81+100,219+200)beta(81+100,219+200):
curve(dbeta(x,81,219),col = "blue", xlab = "Batting Average", ylab = "Probability", xlim=c(0.1,0.4), ylim=c(0,25))
par(new=TRUE)
curve(dbeta(x,(81+100),(219+200)),col = "red", xlab = "Batting Average", ylab = "Probability", xlim=c(0.1,0.4),ylim=c(0,25))
In short, the beta distribution can be understood as representing a probability distribution of probabilities- that is, it represents all the possible values of a probability when we don’t know what that probability is.
Applications The Beta distribution is used in a range of disciplines including rule of succession, Bayesian statistics, and task duration modeling. Examples of events that may be modeled by Beta distribution include: The time it takes to complete a task The proportion of defective items in a shipment
The Standard Beta Distribution. Density, distribution, quantile, random number generation, and parameter estimation functions for the beta distribution with parameters shape1 and shape2. Parameter estimation can be based on a weighted or unweighted i.i.d. sample and can be carried out analytically or numerically.
The beta distribution with parameters shape1=α , and shape2=β The parameter estimates for α and β are as given in the http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm.{Engineering Statistics Handbook}.
dBeta gives the density, pBeta the distribution function, qBeta the quantile function, rBeta generates random deviates, and eBeta estimates the parameters. lBeta provides the log-likelihood function, sBeta the observed score function, and iBeta the observed information matrix.
x <- rBeta(n=500, params=list(shape1=2, shape2=2))
est.par <- eBeta(x); est.par
##
## Parameters for the Beta distribution.
## (found using the MOM method.)
##
## Parameter Type Estimate
## shape1 shape 1.889858
## shape2 shape 1.998561
plot(est.par)
dens <- dBeta(x=seq(0,1,length=100), params=list(shape1=2, shape2=2))
hist(x, breaks=10, probability=TRUE, ylim = c(0,1.2*max(dens)))
lines(seq(0,1,length=100), dens, col="blue")
lines(density(x), lty=2)
est.par[attributes(est.par)$par.type=="shape"]
## $shape1
## [1] 1.889858
##
## $shape2
## [1] 1.998561
Example from; Bury(1999) pp.253-255, parameter estimates as given by Bury are shape1 = 4.222 and shape2 = 6.317
data <- c(0.461, 0.432, 0.237, 0.113, 0.526, 0.278, 0.275, 0.309, 0.67, 0.428, 0.556,
0.402, 0.472, 0.226, 0.632, 0.533, 0.309, 0.417, 0.495, 0.241)
est.par <- eBeta(X=data, method="numerical.MLE"); est.par
##
## Parameters for the Beta distribution.
## (found using the numerical.MLE method.)
##
## Parameter Type Estimate S.E.
## shape1 shape 4.191737 1.283389
## shape2 shape 6.304651 1.969371
plot(est.par)
lBeta(data, param=est.par)
## [1] 10.5777
sBeta(data, param=est.par)
## shape1 shape2
## 3.857847e-07 -8.869954e-07
iBeta(data, param=est.par)
## shape1 shape2
## shape1 3.386125 -1.999060
## shape2 -1.999060 1.438018
H <- attributes(est.par)$nll.hessian;H
## shape1 shape2
## shape1 3.386125 -1.999060
## shape2 -1.999060 1.438018
var <- solve(H)
se <- sqrt(diag(var)); se
## shape1 shape2
## 1.283389 1.969371
x <- rBeta(n=500, params=list(shape1=1.283389, shape2=1.969371))
est.par <- eBeta(x); est.par
##
## Parameters for the Beta distribution.
## (found using the MOM method.)
##
## Parameter Type Estimate
## shape1 shape 1.199050
## shape2 shape 1.847258
plot(est.par)