With Kobe Bryant passing Michael Jordan for third place on the NBA's all-time scoring list, the 37-year-old has come back into the spotlight. There is no doubt that Kobe is one of the best players in NBA history, but can he still do what he does best at 37? After his injury, he played only 6 games last season. What kind of performance will he bring us this season? How many points can he average? These are the major questions his fans are concerned about.
In this report, Kobe Bryant's field goal data are investigated using a Bayesian approach, and the questions above are addressed in those terms.
Kobe Bryant's field goal data were obtained from Yahoo Sports, which provides statistics for every NBA player, season, and game. For example, the field goal data for Kobe's last 10 games are as follows:
Kobe is famous as a scorer, so only the shooting-related variables in the above table are of interest.
Here we use a conjugate hierarchical model to analyze Kobe Bryant's field goal data. First, note that the data are binomial: every field goal attempt is a Bernoulli trial, so each seasonal total of field goals made is binomial. Since Kobe takes a different number of field goal attempts each season and his shooting percentage may not stay the same, we want to inspect his performance across seasons. The model is therefore specified as \[y_t \sim \text{Bin}(n_t, \pi_t)\] where \(t\) indexes the seasons 1996 through 2014 and the \(n_t\)'s are known.
As in the rat tumor example in the book, if we assume Kobe's shooting percentages across seasons are similar (exchangeable), we can use a mixture of independent identical distributions to model the prior: \[\pi_t \mid \alpha, \beta \sim \text{Beta}(\alpha, \beta)\] The advantage of using such a Beta distribution as the conditional prior is that the conditional posteriors \(p(\pi_t \mid \alpha, \beta, y)\) will be independent Beta distributions too.
For each \(t\), \(\pi_t \sim \text{Beta}(\alpha, \beta)\). Since the Beta distribution is the conjugate prior for binomial models, \(\pi_t\) given \(\alpha, \beta, y\) follows \(\text{Beta}(\alpha + y_t, \beta + n_t - y_t)\). The conditional posterior of \(\pi\) given \(\alpha\) and \(\beta\) can therefore be written as a product of independent Beta distributions: \[p(\pi \mid \alpha, \beta, y) = \prod_{t=1996}^{2014} \frac{\Gamma(\alpha + \beta + n_t)}{\Gamma(\alpha+y_t)\Gamma(\beta+n_t-y_t)} \pi_t^{\alpha + y_t -1}(1-\pi_t)^{\beta + n_t - y_t - 1}\]
As suggested by the book, before assigning a prior to the hyperparameters, we reparameterize in terms of \(\gamma = \log(\alpha/\beta)\) and \(\delta = \log(\alpha + \beta)\), which are the logit of the mean and the logarithm of the "sample size" of the Beta distribution. A uniform prior on this scale would lead to an improper posterior density, so another prior is suggested: \[p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}\] which is equivalent to a uniform hyperprior on \(\big(\frac{\alpha}{\alpha + \beta}, (\alpha + \beta)^{-1/2}\big)\).
The marginal posterior density \(p(\alpha, \beta \mid y)\) is just the joint posterior divided by the conditional:
\[p(\alpha, \beta \mid y) \propto p(\alpha, \beta) \prod_{t=1996}^{2014} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \frac{\Gamma(\alpha+y_t)\Gamma(\beta+n_t-y_t)}{\Gamma(\alpha+\beta+n_t)}\]
If the diffuse prior \(p(\alpha, \beta) \propto (\alpha + \beta)^{-5/2}\) is used, we can compute the marginal posterior density of the hyperparameters on a grid of values.
We can estimate \(\alpha\) and \(\beta\) approximately by the method of moments. Following BDA3 p. 583, \[\alpha + \beta = \frac{E(\pi)(1-E(\pi))}{\text{Var}(\pi)} - 1 = \frac{0.3886348 \times (1-0.3886348)}{0.00577598} - 1 = 40.13549\] \[\alpha = (\alpha + \beta)\, E(\pi) = 40.13549 \times 0.3886348 = 15.59805 \quad \beta = (\alpha + \beta)(1 - E(\pi)) = 40.13549 \times (1-0.3886348) = 24.53744\]
That gives crude estimates \(\log(\alpha/\beta) = -0.4530543\) and \(\log(\alpha + \beta) = 3.692261\). Based on these estimates, we can build a grid centered at these two values and compute a contour of marginal densities.
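As a quick sanity check, the arithmetic above can be reproduced in a few lines of R (the mean 0.3886348 and variance 0.00577598 of the observed seasonal FG% are taken from the text):

```r
# Method-of-moments estimates from the observed seasonal FG% mean and variance
m <- 0.3886348
v <- 0.00577598
ab    <- m * (1 - m) / v - 1    # alpha + beta = 40.13549
alpha <- ab * m                 # 15.59805
beta  <- ab * (1 - m)           # 24.53744
c(gamma = log(alpha / beta), delta = log(ab))   # -0.4530543, 3.692261
```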
Given the grid values, the log marginal density at each point can be computed. After some trial and error, I set the grid to \([-1, .2] \times [0, 6]\) so that it contains essentially all of the posterior mass. To avoid numerical overflow, the maximum log marginal density is subtracted from each log marginal density value before exponentiating, yielding values of the unnormalized marginal posterior.
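A minimal sketch of this grid evaluation in R, assuming vectors `y` and `n` hold the seasonal field goals made and attempted; since the grid is on the \((\gamma, \delta)\) scale, the density picks up the Jacobian factor \(\alpha\beta\) from the reparameterization:

```r
# Log unnormalized marginal posterior of (gamma, delta) = (log(a/b), log(a+b))
log_post <- function(g, d, y, n) {
  a <- exp(g + d) / (1 + exp(g))   # back-transform to alpha
  b <- exp(d) / (1 + exp(g))       # back-transform to beta
  -2.5 * log(a + b) +              # hyperprior (alpha + beta)^(-5/2)
    sum(lgamma(a + b) - lgamma(a) - lgamma(b) +
        lgamma(a + y) + lgamma(b + n - y) - lgamma(a + b + n)) +
    log(a) + log(b)                # Jacobian of the reparameterization
}
gamma <- seq(-1, 0.2, length.out = 100)
delta <- seq(0, 6, length.out = 100)
lp <- outer(gamma, delta, Vectorize(function(g, d) log_post(g, d, y, n)))
lp <- exp(lp - max(lp))            # subtract the max, then exponentiate
contour(gamma, delta, lp)
```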
The following figures show the contour of the marginal posterior density and a discrete sample from it. The posterior sample is obtained by: 1. computing the discrete marginal distribution of \(\alpha\) by summing the grid over \(\beta\); 2. sampling \(\alpha\) from this marginal; 3. sampling \(\beta\) from the discrete conditional \(p(\beta \mid \alpha, y)\) corresponding to the just-sampled \(\alpha\).
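A minimal sketch of this two-stage discrete sampling on the grid, using the unnormalized matrix `lp` from above; the uniform jitter spreads draws within grid cells rather than restricting them to the grid points:

```r
S <- 1000
p_row <- rowSums(lp)                           # discrete marginal over gamma
i <- sample(seq_along(gamma), S, replace = TRUE, prob = p_row)
j <- sapply(i, function(r)                     # conditional of delta | gamma
  sample(seq_along(delta), 1, prob = lp[r, ]))
# jitter within the grid cells
gamma_s <- gamma[i] + runif(S, -0.5, 0.5) * diff(gamma)[1]
delta_s <- delta[j] + runif(S, -0.5, 0.5) * diff(delta)[1]
plot(gamma_s, delta_s)
```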
I sampled 1000 values of \(\alpha\) and \(\beta\); the scatter plot is shown on the same scale.
The samples can then be transformed back to the original scale. Based on the sampled values of \(\alpha\) and \(\beta\), we can sample each \(\pi_t\) directly from its Beta distribution. The posterior of \(\pi\) is shown in the following boxplot.
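Continuing the sketch, the draws are back-transformed to \((\alpha, \beta)\), each \(\pi_t\) is drawn from its conditional \(\text{Beta}(\alpha + y_t, \beta + n_t - y_t)\), and the boxplot can be produced:

```r
alpha_s <- exp(gamma_s + delta_s) / (1 + exp(gamma_s))
beta_s  <- exp(delta_s) / (1 + exp(gamma_s))
# one column of posterior draws of pi_t per season
pi_draws <- sapply(seq_along(y), function(t)
  rbeta(length(alpha_s), alpha_s + y[t], beta_s + n[t] - y[t]))
boxplot(pi_draws, names = 1996:2014)   # assumes 19 seasons, 1996-2014
```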
We can see an increasing trend in this model, which suggests that Kobe has gradually been finding his game again.
Since the two conditional posteriors for \(\alpha\) and \(\beta\) are not of any recognizable form, a Metropolis step can be used to sample each of them. The sampling framework looks like this:
```r
# Gibbs sampler with Metropolis steps for the hyperparameters: at each
# iteration, draw the pi_t's (theta) directly from their Beta conditionals,
# then update alpha and beta one at a time.
for (m in 2:MM) {
  theta[m, ] <- draw.thetas(alpha[m-1], beta[m-1])
  alpha[m]   <- draw.alpha(alpha[m-1], beta[m-1], theta[m, ], alpha.prop.sd)
  beta[m]    <- draw.beta(alpha[m], beta[m-1], theta[m, ], beta.prop.sd)
}
```
The proposals for \(\alpha\) and \(\beta\) are Metropolis random walks, and direct sampling is used for the \(\pi_t\)'s.
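As an illustration (not necessarily the exact functions used above), `draw.alpha` might be implemented as a random walk on \(\log\alpha\), with `draw.beta` analogous; the conditional density below assumes the \((\alpha+\beta)^{-5/2}\) hyperprior and the \(\text{Beta}(\alpha,\beta)\) prior on the \(\pi_t\)'s:

```r
# Log conditional density of alpha given beta and the current pi draws (theta)
log_cond_alpha <- function(a, b, theta) {
  if (a <= 0) return(-Inf)
  length(theta) * (lgamma(a + b) - lgamma(a)) +
    a * sum(log(theta)) - 2.5 * log(a + b)
}
draw.alpha <- function(alpha, beta, theta, prop.sd) {
  prop  <- alpha * exp(rnorm(1, 0, prop.sd))   # random walk on the log scale
  log.r <- log_cond_alpha(prop, beta, theta) -
           log_cond_alpha(alpha, beta, theta) +
           log(prop) - log(alpha)              # Jacobian for the log-scale walk
  if (log(runif(1)) < log.r) prop else alpha   # accept or keep current value
}
```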
This approach actually works for the rat tumor data if I run a chain for 100,000 iterations and choose a suitable thinning value, like 10 or 20. For this example, however, I could not get the chain to converge. The book discusses this issue: the contours drift toward infinity, indicating that the posterior is not integrable in that limit, so most of the mass of \(\alpha\) and \(\beta\) sits near infinity. The posterior plots obtained from the discrete sample show this clearly in the following figures.
In the above model, a mixture of Beta priors is used. The main advantage of using conjugate Beta priors is that the conditional posteriors are Beta too. However, the hyperparameters are hard to interpret, and it is hard to find priors for them.
Another commonly used model is the logit model. The data are still assumed to follow a binomial distribution: \[y_t \sim \text{Binomial}(n_t, \pi_t)\]
Define \(\theta_t\) to be the log odds of Kobe's field goal percentage; conveniently, it lives on the whole real line.
\[logit(\pi_t) = \log\big(\frac{\pi_t}{1-\pi_t}\big) = \theta_t\]
We can then assume a Normal prior on \(\theta_t\): \[\theta_t \sim N(\mu_\theta, \tau_\theta)\] in which \(\tau_\theta\) is the precision parameter.
If we want the prior on \(\theta_t\) to be noninformative, we can use: \[\mu_\theta \sim N(0, 100) \quad\text{and}\quad \tau_\theta \sim Gamma(.001, .001)\]
These hyperprior densities are chosen to be flat enough to be effectively noninformative.
Actually, there is some prior information we can use here. When modeling an NBA player's shooting performance, we know it is rare for a field goal percentage to fall below .35 or above .55, i.e., for \(\theta_t\) to fall outside \((\text{logit}(.35), \text{logit}(.55)) \approx (-.62, .20)\). Since a Normal density puts nearly all of its mass within 3\(\sigma\) of its mean, the midpoint \(-.2\) and \(\sigma \approx .41/3 \approx .135\) give \[\mu_\theta \sim N(-.2, 0.135^2) \quad\text{and}\quad \tau_\theta \sim Gamma(.001, .001)\]
This prior is more informative than the previous one and will give a posterior distribution with smaller variance.
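The numbers can be checked quickly in R (`qlogis` is base R's logit quantile function):

```r
qlogis(c(.35, .55))   # -0.619  0.201: midpoint ~ -0.2, half-width ~ 0.41
0.41 / 3              # ~ 0.135, the prior sd used for mu_theta
```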
This model is run in WinBUGS; a sketch of the model specification is given below. The result is shown in the box plot that follows.
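A sketch of what the model file might look like in BUGS syntax; the file name is hypothetical, `y`, `n`, and `T` are assumed to be the seasonal makes, attempts, and number of seasons, and BUGS parameterizes Normals by precision, so the prior sd \(0.135\) becomes \(1/0.135^2 \approx 54.87\):

```r
# Write a hypothetical WinBUGS model file for the logit model
logit_model <- "
model {
  for (t in 1:T) {
    y[t] ~ dbin(pi[t], n[t])
    logit(pi[t]) <- theta[t]
    theta[t] ~ dnorm(mu.theta, tau.theta)
  }
  mu.theta  ~ dnorm(-0.2, 54.87)     # precision = 1 / 0.135^2
  tau.theta ~ dgamma(0.001, 0.001)
}
"
writeLines(logit_model, "kobe_logit.txt")
```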
Since there is a clear increasing trend in Kobe's field goal percentage, we could use a time series AR(1) model to capture it. Assume \[y_t \sim \text{Binomial}(n_t, \pi_t)\] and \[\text{logit}(\pi_t) = \log\big(\frac{\pi_t}{1-\pi_t}\big) = \theta_t\] but change the prior on \(\theta_t\) to \[\theta_t \sim N(\phi\, \theta_{t-1}, \tau_\theta)\] in which \(\tau_\theta\) is the precision parameter and \(t\) starts from 1997 (the second season).
The hyperpriors are: \[\theta_{1996} \sim N(-.2, 0.135^2),\quad \phi \sim N(0, 100) \quad\text{and}\quad \tau_\theta \sim Gamma(.001, .001)\]
That is, the conditional prior for each season's log odds is Normal with mean \(\phi\) times the previous season's log odds. A BUGS sketch of this model follows.
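As before, this is a hedged sketch with a hypothetical file name; \(\phi\)'s variance 100 becomes precision 0.01 in BUGS, and the mean of the AR(1) Normal is defined as a separate node for WinBUGS compatibility:

```r
# Write a hypothetical WinBUGS model file for the AR(1) logit model
ar1_model <- "
model {
  for (t in 1:T) {
    y[t] ~ dbin(pi[t], n[t])
    logit(pi[t]) <- theta[t]
  }
  theta[1] ~ dnorm(-0.2, 54.87)      # season 1996; precision = 1 / 0.135^2
  for (t in 2:T) {
    mu[t] <- phi * theta[t-1]
    theta[t] ~ dnorm(mu[t], tau.theta)
  }
  phi       ~ dnorm(0, 0.01)         # variance 100 -> precision 0.01
  tau.theta ~ dgamma(0.001, 0.001)
}
"
writeLines(ar1_model, "kobe_ar1.txt")
```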
The DICs of these models are computed and compared in the following table.