Since the 2014-2015 season, the Golden State Warriors have fielded a group of players popularly known as the NBA's ‘Death Lineup’. The name comes from its relatively shorter players, with an average height under 6.8 feet, who are faster, more agile and prolific scorers, and who have won championships much to the envy of other teams. The lineup consists of Stephen Curry, Klay Thompson, Draymond Green, Andre Iguodala and Kevin Durant, who joined the team in the 2016-2017 season.
This style of play, known as ‘small ball basketball’, is now being adopted by teams like the Boston Celtics, the Oklahoma City Thunder and the Houston Rockets. It pushes the pace of the offense and forces the defense to spread out around the court. It also relies on a faster, more agile power forward who would typically play small forward and who has a better three-point shooting percentage than a traditional forward. Kevin Durant and LeBron James are examples of players who fill this power forward role.
Because they play at a fast pace against the taller, more defensive traditional players on other teams, these smaller players are more likely to score outside the paint, shoot a higher three-point percentage and record more assists per game, as they pass the ball around the taller players until they find a comfortable three-point shooting position.
Though effective, small ball basketball comes with many challenges. These include the lower physical strength of smaller players and weaker offensive and defensive play in the post. Because the players are smaller, rebounding can be more difficult, as taller and bigger opponents can easily snatch the ball right off the rim. Because of their fast pace of play, they are also more likely to lose the ball, giving them a higher turnover rate and a lower blocking rate than more traditional teams.
The pros of small ball basketball might sometimes be outweighed by the cons, so in this project I consider which of these factors (number of assists, turnovers, blocks, total rebounds and three-point percentage) seem to have a greater influence on whether the team wins or loses a game.
The data set is from the Basketball Reference website and contains statistics for the games played by the GSW during the 2017/2018 season. For this project I use all games in the season with the exception of playoff games, from the quarterfinals to the finals. Specifically, I do some light data cleaning and extract the number of assists, turnovers, blocks, total rebounds and the three-point percentage for each game.
As expected, rebounds and assists are positively correlated with the probability of winning. This is an interesting result, as it suggests that teams that are active on the defensive end tend to be active on the offensive end as well. There are a few exceptions, however. The Warriors beat the Toronto Raptors, who finished 1st in the NBA Eastern Conference, despite a very low number of rebounds and only a fair number of assists. The Houston Rockets, who finished 1st in the NBA Western Conference, beat the Warriors even though the Warriors had a high number of assists and a fair number of rebounds. The Boston Celtics, who finished 2nd in the NBA Eastern Conference, lost to the Warriors when the Warriors had a very low number of assists and a fairly low number of rebounds.
The plot of the three-point percentage for each game shows that the team had more wins at higher three-point percentages. Games in which the Warriors had very low three-point percentages resulted in losses. The regions shaded purple and blue mark the games missed by two of the best shooters on the team, Steph Curry (the 2015-2016 scoring champion) and Kevin Durant (4x scoring champion), respectively. The two lowest three-point percentages were observed when these players were absent.
The plot of the number of blocks in each game shows that more games were won when the team had a high number of blocks. The 2016-2017 Defensive Player of the Year, Draymond Green, has the highest blocking record on the team. The green regions mark the games he missed. The highest numbers of blocks were recorded in games in which he played.
A scatterplot of the number of assists against the number of turnovers shows no linear relationship between the two. The higher the number of assists and the lower the number of turnovers, the greater the number of wins, and vice versa. There are a few exceptions. In the game against the Sacramento Kings, the Warriors recorded a very high number of rebounds and just a fair number of assists, yet they still won the game.
The effects plot shows a positive relationship between the probability of winning and the number of assists, rebounds, blocks and the three-point percentage. It shows a negative relationship, however, between the probability of winning and the number of turnovers. This is expected, as turnovers tend to hamper a team’s chances of winning.
I use a Bayesian logistic regression model to relate the outcome of each game to the three-point percentage, the number of blocks, the number of assists, the number of turnovers and the total number of rebounds in that game. The response of interest, whether the team wins or loses a game, is binary.
Of interest to us is the derivation of the posterior distribution of the regression coefficients \(\beta\).
The sampling model is defined by
\[Win_i \mid \tilde{x}_i,\tilde{\beta} \sim \text{Bernoulli}(\theta_i)\]
where
\(Win_i\) is a binary variable indicating whether or not the Warriors win game \(i\), \(\theta_i\) is the probability that they win game \(i\), and \(\log\left(\frac{\theta_i}{1-\theta_i}\right)=\tilde{\beta}^T\tilde{x}_i\).
The logistic regression model can be expressed as
\[\log\left(\frac{\theta_i}{1-\theta_i}\right)=\beta_0+\beta_1TPP_i+\beta_2TRB_i+\beta_3AST_i+\beta_4BLK_i+\beta_5TOV_i\]
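To make the link function concrete, the following base-R sketch maps a linear predictor to a win probability through the inverse logit. The coefficient and predictor values are made up for illustration; they are not estimates from the data.

```r
# Hypothetical coefficients (intercept, TPP, TRB, AST, BLK, TOV); illustration only
beta <- c(0.5, 1.0, 0.7, 0.9, 0.2, -0.8)
# One hypothetical game: a leading 1 for the intercept, then the five predictors
x <- c(1, 0.3, -0.2, 0.5, 0.0, 1.1)

eta <- sum(beta * x)     # linear predictor, i.e. the log odds of winning
theta <- plogis(eta)     # inverse logit: exp(eta) / (1 + exp(eta))
theta

# qlogis() is the logit, so it recovers the log odds from the probability
all.equal(qlogis(theta), eta)
```

Whatever the value of the linear predictor, `plogis()` returns a number strictly between 0 and 1, which is why the logit link suits a binary response.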
I use the Metropolis algorithm to sample the regression coefficients, with a normal proposal distribution \(J(\beta^{*}|\beta^{(s)}) = N(\beta^{(s)},\gamma^{2})\), which is symmetric and has a step size of \(\gamma^{2}=0.05\).
The Metropolis algorithm proceeds as follows:
Sample \(\beta^{*}|\beta^{(s)} \sim J(\beta^{*}|\beta^{(s)})\)
Compute the acceptance ratio:
\[r=\frac{p(\beta^{*}|y)}{p(\beta^{(s)}|y)}\]
\(\beta^{(s+1)} =\beta^{*}\) with probability \(\min(r,1)\)
or
\(\beta^{(s+1)} =\beta^{(s)}\) with probability \(1 - \min(r,1)\)
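The steps above can be sketched in base R for a one-parameter version of the problem. This is a minimal illustration under stated assumptions, not the sampler used for the analysis: the outcomes are simulated, the single parameter `b` is a log-odds intercept, and the prior and the proposal standard deviation `gamma` are assumed values.

```r
set.seed(1)
y <- rbinom(50, 1, 0.7)   # simulated win/loss outcomes (not the GSW data)

# Log posterior up to a constant: Bernoulli likelihood plus a N(0, 10^2) prior
log_post <- function(b) {
  sum(dbinom(y, 1, plogis(b), log = TRUE)) + dnorm(b, 0, 10, log = TRUE)
}

n_iter <- 5000
gamma <- 0.5                  # proposal standard deviation (assumed value)
draws <- numeric(n_iter)      # the chain starts at 0
accepted <- logical(n_iter)

for (s in 2:n_iter) {
  b_star <- rnorm(1, draws[s - 1], gamma)               # step 1: propose
  log_r <- log_post(b_star) - log_post(draws[s - 1])    # step 2: log acceptance ratio
  if (log(runif(1)) < log_r) {                          # step 3: accept with prob. min(r, 1)
    draws[s] <- b_star
    accepted[s] <- TRUE
  } else {
    draws[s] <- draws[s - 1]
  }
}

mean(accepted)           # acceptance rate
mean(draws[-(1:500)])    # posterior mean after discarding an assumed burn-in of 500
```

Working on the log scale avoids numerical underflow when the likelihood is a product of many small probabilities; comparing `log(runif(1))` with `log_r` is equivalent to accepting with probability min(r, 1).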
##Acceptance Ratio
The acceptance ratio was 0.4665, and the steps were sufficient for efficient mixing.
## [1] 0.4665
##Trace Plots
The trace plots show convergence for all of the coefficients.
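Visual inspection of a trace plot can be backed up with simple numerical checks. The sketch below is only an illustration: it uses a simulated autocorrelated series as a stand-in for one column of sampler output, and the burn-in length of 1000 is an assumed value.

```r
set.seed(2)
# Stand-in for one chain of MCMC draws: a simulated AR(1) series
chain <- as.numeric(arima.sim(list(ar = 0.8), n = 10000))

kept <- chain[-(1:1000)]    # drop an assumed burn-in of 1000 draws
half <- length(kept) %/% 2
first_mean  <- mean(kept[1:half])
second_mean <- mean(kept[(half + 1):length(kept)])
abs(first_mean - second_mean)   # small if the two halves agree

# Lag-1 autocorrelation: values close to 1 indicate slow mixing
lag1 <- acf(kept, lag.max = 1, plot = FALSE)$acf[2]
lag1
```

If the two half-chain means differ by much more than the spread of the draws, or the lag-1 autocorrelation is close to 1, a longer run or a different step size may be warranted.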
Because the predictors were standardized before fitting (see the Appendix code), each slope describes the change in the log odds of winning per one-standard-deviation increase in the predictor. The posterior mean for the slope associated with assists is 0.94, with a 95% credible interval of (0.23, 1.74). This indicates that the estimated log odds of winning increases by 0.94 for every one-standard-deviation increase in assists, after accounting for the other variables in the model.
The posterior mean for the slope associated with total rebounds is 0.76, with a 95% credible interval of (-0.01, 1.58). This indicates that the estimated log odds of winning increases by 0.76 for every one-standard-deviation increase in rebounds, after accounting for the other variables in the model. The interval narrowly includes zero.
The posterior mean for the slope associated with three-point percentage is 1.13, with a 95% credible interval of (0.47, 1.88). This indicates that the estimated log odds of winning increases by 1.13 for every one-standard-deviation increase in the three-point percentage, after accounting for the other variables in the model.
The posterior mean for the slope associated with blocks is 0.18, with a 95% credible interval of (-0.45, 0.82). This indicates that the estimated log odds of winning increases by 0.18 for every one-standard-deviation increase in blocks, after accounting for the other variables in the model. The interval includes zero, so the direction of this effect is uncertain.
The posterior mean for the slope associated with turnovers is -0.86, with a 95% credible interval of (-1.52, -0.26). This indicates that the estimated log odds of winning decreases by 0.86 for every one-standard-deviation increase in turnovers, after accounting for the other variables in the model.
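Log-odds slopes are often easier to read as odds ratios. The sketch below exponentiates the posterior means and interval endpoints reported above; the vector names are labels chosen here for illustration. Note that exponentiating a posterior mean gives the odds ratio at that mean, not the posterior mean of the odds ratio.

```r
# Posterior means and 95% interval endpoints as reported in the text and R output
post_mean <- c(AST = 0.94, TRB = 0.76, TPP = 1.13, BLK = 0.18, TOV = -0.86)
lower     <- c(AST = 0.23, TRB = -0.01, TPP = 0.47, BLK = -0.45, TOV = -1.52)
upper     <- c(AST = 1.74, TRB = 1.58, TPP = 1.88, BLK = 0.82, TOV = -0.26)

# Exponentiating a log-odds slope gives the multiplicative change in the odds
odds_ratio <- round(exp(cbind(mean = post_mean, lower = lower, upper = upper)), 2)
odds_ratio

# Flag intervals that contain 0 on the log-odds scale (an odds ratio of 1)
lower < 0 & upper > 0
```

An odds ratio above 1 means the odds of winning rise as the predictor increases, and a ratio below 1 means they fall.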
## [1] 0.94 0.76 1.13 0.18 -0.86
## [,1] [,2] [,3] [,4] [,5]
## 2.5% 0.23 -0.01 0.47 -0.45 -1.52
## 97.5% 1.74 1.58 1.88 0.82 -0.26
##Appendix
library(ggplot2)  # plots
library(effects)  # allEffects()
library(mnormt)   # rmnorm() and dmnorm(); assumed source of these functions
#TRB and AST
win<-as.factor(gsw$wins)
ggplot(data=gsw,aes(y=TRB,x=AST,label=Opp,colour=win))+
  geom_text(size=3.5)+
  ggtitle("A scatterplot of Total rebounds against the number of Assists")+
  scale_colour_discrete(labels=c("Lost","Win"))+
  # arrows highlighting three individual games
  annotate("segment",x=26,xend=27.5,y=30,yend=30,size=0.02,arrow=arrow())+
  annotate("segment",x=32,xend=33.5,y=41,yend=41,size=0.02,arrow=arrow())+
  annotate("segment",x=17,xend=18.5,y=43,yend=43,size=0.05,arrow=arrow())
##TPP and Wins
win<-as.factor(gsw$wins)
p<-ggplot(data=gsw,aes(y=TPP,x=G,colour=win))+
  geom_line(size=0.8)+
  ggtitle("Scatterplot of the 3 point percentage and Games (Steph Curry and Kevin Durant)")
# purple: games missed by Steph Curry; blue: games missed by Kevin Durant
p+annotate("rect",xmin=26,xmax=36,ymin=0, ymax=0.7, alpha=0.1,fill="purple")+
  annotate("rect",xmin=66,xmax=82,ymin=0, ymax=0.7, alpha=0.1,fill="purple")+
  annotate("rect",xmin=69,xmax=74,ymin=0, ymax=0.7, alpha=0.1,fill="blue")+
  scale_colour_discrete(labels=c("Lost","Win"))
#BLK and wins
win<-as.factor(gsw$wins)
k<-ggplot(data=gsw,aes(y=BLK,x=G,colour=win))+
  geom_line(size=0.8)+
  ggtitle("A plot of Blocks against games (considering Draymond Green)")
# green: games missed by Draymond Green
k+annotate("rect",xmin=72,xmax=74,ymin=0, ymax=20, alpha=0.1,fill="green")+
  annotate("rect",xmin=28,xmax=31,ymin=0, ymax=20, alpha=0.1,fill="green")+
  scale_colour_discrete(labels=c("Lost","Win"))
#AST and TOV
win<-as.factor(gsw$wins)
ggplot(data=gsw,aes(y=AST,x=TOV,label=Opp,colour=win))+
  geom_text(size=3.5)+
  ggtitle("A scatterplot of Assists against Turnovers")+
  scale_colour_discrete(labels=c("Lost","Win"))
#Effects plot from a logistic regression (binomial family for the binary response)
mod<-glm(wins~AST+TPP+TRB+BLK+TOV,data=gsw,family=binomial)
plot(allEffects(mod))
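#Metropolis sampler for the Bayesian logistic regression
set.seed(652631885)
#names(gsw)
#View(gsw)
# design matrix without an intercept column
X <- model.matrix(wins~-1 +AST+TPP+TRB+BLK+TOV,data=gsw)
n <- nrow(gsw)
p <- ncol(X)
# standardize each predictor to mean 0 and standard deviation 1
for(i in 1:p){
  X[,i] <- (X[,i] - mean(X[,i])) / sd(X[,i])
}
num.mcmc <- 10000
step.size <- .05                    # proposal variance
accept.ratio <- rep(0,num.mcmc)     # 1 if the proposal at iteration i was accepted
beta.mcmc <- matrix(0,num.mcmc,p)   # the chain starts at zero
beta.prior.var <- diag(p)*100       # prior covariance: independent N(0, 100)
for (i in 2:num.mcmc)
{
  # propose new coefficients from a normal centered at the current values
  beta.star <- beta.mcmc[i-1,] + rmnorm(1,0,diag(p) * step.size)
  # win probabilities under the proposed and current coefficients
  p.star <- exp(X %*% beta.star)/ (1 + exp(X %*% beta.star))
  p.current <- exp(X %*% beta.mcmc[i-1,])/ (1 + exp(X %*% beta.mcmc[i-1,]))
  # log posterior (up to a constant) = log likelihood + log prior
  log.p.star <- sum(dbinom(gsw$wins,1,p.star,log=T)) +
    dmnorm(beta.star,0,beta.prior.var ,log=T)
  log.p.current <- sum(dbinom(gsw$wins,1,p.current,log=T)) +
    dmnorm(beta.mcmc[i-1,],0,beta.prior.var ,log=T)
  # accept the proposal with probability min(r, 1), compared on the log scale
  if (log(runif(1)) < (log.p.star - log.p.current))
  {
    beta.mcmc[i,] <- beta.star
    accept.ratio[i] <- 1
  }
  else
  {
    beta.mcmc[i,] <- beta.mcmc[i-1,]
  }
}
mean(accept.ratio)   # acceptance ratio
# trace plots
par(mfcol=c(2,3))
for (i in 1:p)
{
  plot(beta.mcmc[,i],main=colnames(X)[i],type='l')
}
# posterior histograms
par(mfcol=c(3,2))
for (i in 1:p)
{
  hist(beta.mcmc[,i],main=colnames(X)[i], freq=FALSE, breaks=100)
}
# 95% credible intervals
round(apply(beta.mcmc,2,quantile,probs=c(.025,.975)), 2)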