require(coda)
## Loading required package: coda
setwd("F:/Academics/2016 Spring/Bayesian Statistics")
source("Bayesfunctions.r")
nba <- read.csv(file="NBA2015Data.csv", header=TRUE)
dim(nba)
## [1] 492 28
fit <- lm(PF ~ STL+BLK+MIN, data=nba)
summary(fit)
##
## Call:
## lm(formula = PF ~ STL + BLK + MIN, data = nba)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.69405 -0.30945 -0.03146 0.27926 1.53612
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.61890 0.05135 12.053 <2e-16 ***
## STL 0.12737 0.07484 1.702 0.0894 .
## BLK 0.51324 0.05129 10.006 <2e-16 ***
## MIN 0.04143 0.00373 11.108 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4661 on 488 degrees of freedom
## Multiple R-squared: 0.5762, Adjusted R-squared: 0.5736
## F-statistic: 221.1 on 3 and 488 DF, p-value: < 2.2e-16
confint.lm(fit)
## 2.5 % 97.5 %
## (Intercept) 0.51800734 0.71978727
## STL -0.01967918 0.27441343
## BLK 0.41245597 0.61402238
## MIN 0.03410368 0.04876113
[1] Beta2
We are 95% confident that if a player have one more steal per game, his personal foul per game will change as low as decreasing by -0.01967918, and as high as increasing by 0.27441343.
[2] Beta3
We are 95% confident that if a player have one more block per game, his personal foul per game will increase by as low as 0.41245597, and as high as 0.61402238.
[3] Beta4
We are 95% confident that if a player plays one more minute per game, his personal foul per game will increase by as low as 0.03410368, and as high as 0.04876113.
new <- data.frame(STL=1, BLK=1, MIN=25)
predict.lm(fit, newdata=new,interval="prediction")
## fit lwr upr
## 1 2.295314 1.376144 3.214484
A player that averages 1 steal, 1 block, and 25 minutes a game is perdicted to get 2.295314 per game. And we are 95% confident that his personal foul per game is from 1.376144 to 3.214484.
shapiro.test(nba$PF)
##
## Shapiro-Wilk normality test
##
## data: nba$PF
## W = 0.99538, p-value = 0.1532
par(mfrow=c(2,2))
plot(fit)
Shapiro test and QQ-plot show that personal foul is approximately normal, in accordance with the assumption.
There is a slight “^” pattern in the residual plot, showing that variance may not be constant for all the data. Transformation may make the model better.
First, I believe that STL, BLK and MIN are all positively associated with PF.
For the following four players, my believes are as follows:
(1) For players with 2 steals, 1 block, 35 minutes per game (similar with Kawhi Leonard), the mean of their personal fouls per game is 2.8; 95% percentle of their personal fouls per game is 3.5.
(2) For players with 1 steal, 2 blocks, 35 minutes per game (similar with DeAndre Jordan), the mean of their personal fouls per game is 3; 95% percentle of their personal fouls per game is 3.8.
(3) For players with 0.5 steal, 0.5 block, 35 minutes per game (average defensive ability), the mean of their personal fouls per game is 2; 95% percentle of their personal fouls per game is 2.6.
(4) For players with 0.1 steal, 0.1 block, 25 minutes per game (bad defensive ability), the mean of their personal fouls per game is 1; 95% percentle of their personal fouls per game is 1.5.
m <- rep(NA, 4) # Y.tilde in the book
D <- diag(rep(NA, 4)) # D(w.tilde) in the book
find.normal(prior.mean=2.8, percentile=3.5, p=0.95)
## $mean
## [1] 2.8
##
## $sd
## [1] 0.4255698
##
## $precision
## [1] 5.521517
find.normal(prior.mean=3, percentile=3.8, p=0.95)
## $mean
## [1] 3
##
## $sd
## [1] 0.4863655
##
## $precision
## [1] 4.227412
find.normal(prior.mean=2, percentile=2.6, p=0.95)
## $mean
## [1] 2
##
## $sd
## [1] 0.3647741
##
## $precision
## [1] 7.515398
find.normal(prior.mean=1, percentile=1.5, p=0.95)
## $mean
## [1] 1
##
## $sd
## [1] 0.3039784
##
## $precision
## [1] 10.82217
m <- c(2.8, 3, 2, 1)
D <- diag( c( (0.3039784)^2, (0.4863655)^2, (0.3647741)^2, (0.3039784)^2 ) )
player1 <- c(1, 2, 1, 35)
player2 <- c(1, 1, 2, 35)
player3 <- c(1, 0.5, 0.5, 35)
player4 <- c(1, 0.1, 0.1, 25)
X.tilde <- rbind(player1, player2, player3, player4)
X.tilde
## [,1] [,2] [,3] [,4]
## player1 1 2.0 1.0 35
## player2 1 1.0 2.0 35
## player3 1 0.5 0.5 35
## player4 1 0.1 0.1 25
qr(X.tilde)$rank # rank of X.tilde
## [1] 4
prior.mean.beta <- solve(X.tilde) %*% m
prior.mean.beta
## [,1]
## [1,] -0.690
## [2,] 0.350
## [3,] 0.550
## [4,] 0.064
prior.cov.beta <- solve(X.tilde) %*% D %*% t(solve(X.tilde))
prior.cov.beta
## [,1] [,2] [,3] [,4]
## [1,] 2.73672363 0.230776180 0.29564302 -0.098638221
## [2,] 0.23077618 0.100026112 -0.02841389 -0.009517496
## [3,] 0.29564302 -0.028413889 0.17210038 -0.012400467
## [4,] -0.09863822 -0.009517496 -0.01240047 0.003663589
coef.names <- c("Intercept", "Steals", "Blocks", "Minutes")
beta.prior.sample <- matrix(NA, nrow=1e5, ncol=4)
par(mfrow=c(2, 2))
for(j in 1:4){
beta.prior.sample[ , j] <- rnorm(1e5, mean=prior.mean.beta[j], sd=sqrt(prior.cov.beta[j,j]))
plot(density(beta.prior.sample[, j]), type="l", main=substitute(paste("Prior Density of ", beta[a]), list(a=coef.names[j])), xlab=substitute(beta[a], list(a=coef.names[j])), ylab="density")
}
For players with 0.5 steal, 0.5 block, 35 minutes per game (average defensive ability), the mean of their personal fouls per game is 2; 95% percentle of their personal fouls per game is 2.6; 95% percentile of that percentile is 3.**
normal.percentile.to.sd(mean.value=2, percentile=2.6, p=0.95) # Returns prior mode guess for sigma
## $sd
## [1] 0.3647741
normal.percentile.to.sd(mean.value=2, percentile=3, p=0.95) # Returns prior percentile guess for sigma
## $sd
## [1] 0.6079568
gamma.parameters <- find.tau.gamma(prior.sigma.mode=0.3647741, sigma.percentile=0.6079568, p=0.95) # Returns shape and rate parameters for the Gamma distribution of tau
gamma.parameters
## $a
## [1] 5.69
##
## $b
## [1] 0.8901724
tau.prior.shape <- gamma.parameters$a
tau.prior.rate <- gamma.parameters$b
par(mfrow=c(1, 2))
plot(density(rgamma(10000, shape=tau.prior.shape, rate=tau.prior.rate)), main=expression(paste("Prior Density of ", tau)), xlab=expression(tau), ylab="density")
plot(density(1/sqrt(rgamma(10000, shape=tau.prior.shape, rate=tau.prior.rate ))), main=expression(paste("Prior Density of ", sigma)), xlab=expression(sigma), ylab="density")
X <- model.matrix(fit)
y <- nba$PF
beta.hat <- solve(t(X)%*%X) %*% t(X) %*% y
MCMC <- Gibbs.lm(net.iterations=50000, n.thin=1, burn.in=100000,
start.beta = beta.hat, X=X, y=y, a=tau.prior.shape,
b=tau.prior.rate, prior.mean.beta, prior.cov.beta)
colnames(MCMC$beta) <- names(coef(fit))
beta.summary <- round(t(apply(MCMC$beta, MARGIN=2, FUN=summary.Bayes)), digits=3) # apply the summary.Bayes function to each regression coefficient
beta.summary
## Min Q1 Median Mean Q3 Max SD HPD-Lower
## (Intercept) 0.406 0.592 0.626 0.626 0.661 0.836 0.051 0.526
## STL -0.154 0.118 0.165 0.165 0.213 0.448 0.071 0.028
## BLK 0.318 0.488 0.522 0.522 0.556 0.747 0.050 0.426
## MIN 0.025 0.037 0.039 0.039 0.042 0.055 0.004 0.032
## HPD-Upper Level
## (Intercept) 0.725 0.95
## STL 0.306 0.95
## BLK 0.623 0.95
## MIN 0.046 0.95
round(summary.Bayes(MCMC$sigma, HPD.p=0.95), digits=3)
## Min Q1 Median Mean Q3 Max SD
## 0.410 0.455 0.465 0.465 0.475 0.534 0.015
## HPD-Lower HPD-Upper Level
## 0.437 0.494 0.950
# compared to frequentist
summary(fit)
##
## Call:
## lm(formula = PF ~ STL + BLK + MIN, data = nba)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.69405 -0.30945 -0.03146 0.27926 1.53612
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.61890 0.05135 12.053 <2e-16 ***
## STL 0.12737 0.07484 1.702 0.0894 .
## BLK 0.51324 0.05129 10.006 <2e-16 ***
## MIN 0.04143 0.00373 11.108 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4661 on 488 degrees of freedom
## Multiple R-squared: 0.5762, Adjusted R-squared: 0.5736
## F-statistic: 221.1 on 3 and 488 DF, p-value: < 2.2e-16
The mean of regression coeffients for Intercept, STL, BLK and MIN are 0.626, 0.166, 0.522, 0.039, respectively. They are quite close to the ones by frequentist approach (0.61890, 0.12737, 0.51324, 0.04143).
The mean of sigma is 0.465, quite close to the one by frequentist approach (0.4661).
coef.names <- c("Intercept", "Steals", "Blocks", "Minutes")
par(mfrow=c(2, 2))
#marginal posterior for each beta_j
for(j in 1:4){
plot(density(MCMC$beta[, j]), type="l", main=substitute(paste("Posterior Density of ", beta[a]), list(a=coef.names[j])), xlab=substitute(beta[a], list(a=coef.names[j])), ylab="density")
lines(HPDinterval(as.mcmc(MCMC$beta[,j])), c(0,0), col="red", cex=2, lty=3) # add PI line
legend("topright", legend="95% HPD interval", lty=2, col="red", cex=0.4)
}
These four plots shows posterior densities of the regression coeffcients for Intercept(topleft), Steals(Topright), Blocks(Bottomleft) and Minutes(Bottomright), respectively.
Their 95% HPD inteval are shown as red lines.
[1] Beta2
There is 95% probabililty that if a player have one more steal per game, his personal foul per game will increase by as low as 0.028, and as high as 0.307.
[2] Beta3
There is 95% probabililty that if a player have one more block per game, his personal foul per game will increase by as low as 0.423, and as high as 0.621.
[3] Beta4
There is 95% probabililty that if a player plays one more minute per game, his personal foul per game will increase by as low as 0.032, and as high as 0.046.
par(mfrow=c(1,1))
# marginal posterior for each sigma
plot(density(MCMC$sigma), type="l", main=expression(paste("Posterior Density of ", sigma)), xlab=expression(sigma), ylab="density")
lines(HPDinterval(as.mcmc(MCMC$sigma)), c(0,0), col="red", cex=2, lty=3) # add PI line
legend("topright", legend="95% HPD interval", lty=2, col="red", cex=0.7)
There is 95% probabililty that the standard deviation of players’ personal foul per game is as low as 0.437, and as high as 0.495.
x.f <- c(1, 1, 1, 25) # covariates of new person
pf.1s.1b.25.min <- as.vector(MCMC$beta %*% x.f)
summary.Bayes(pf.1s.1b.25.min)
## Min Q1 Median Mean Q3 Max
## 2.12400000 2.27300000 2.30000000 2.30000000 2.32600000 2.45700000
## SD HPD-Lower HPD-Upper Level
## 0.03894983 2.22364428 2.37597990 0.95000000
par(mfrow=c(1,1))
plot(density(pf.1s.1b.25.min), type="l", main=expression(paste("Posterior Density of Personal Foul \nif Steals=1, Blocks=1, Minutes=25")), xlab="Personal Foul per game", ylab="density")
lines(HPDinterval(as.mcmc(pf.1s.1b.25.min)), c(0,0), col="red", cex=2, lty=3) # add PI line
legend("topright", legend="95% HPD interval", lty=2, col="red", cex=0.7)