I chose the nbastatR package (version 0.1.151, released under the MIT license) for the analysis.
The package contains almost all NBA box-score metrics, from 1946 to the present.
The data are obtained from:
NBA Stats API/Basketball-Reference
HoopsHype
nbadraft.net
realgm
Basketball Insiders
Because the 2014 season is usually regarded as the start of the small-ball era, I selected the 2014 and 2022 seasons for the basic analysis, to find the trend in offensive style and the factors most strongly correlated with winning in the regular season (the scheduled portion of the NBA season, before the playoffs).
| Variable name | Explanation |
|---|---|
| teamName | team name |
| slugSeason | season (year) |
| gp | regular season games played |
| fgm | field goals made |
| fga | field goals attempted |
| pctFG | field goal percentage |
| fg3m | three-pointers made |
| fg3a | three-pointers attempted |
| pctFG3 | three-point percentage |
| ftm | free throws made |
| fta | free throws attempted |
| pctFT | free throw percentage |
| oreb | offensive rebounds |
| dreb | defensive rebounds |
| treb | total rebounds |
| ast | assists |
| pf | personal fouls |
| stl | steals |
| tov | turnovers |
| blk | blocks |
| pts | points scored |
As mentioned above, the 2014 season is called the start of the small-ball era. Let’s see what happened in that season.
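library(dplyr)    # provides %>%, group_by() and mutate()
library(ggplot2)
library(scales)
# Three-point attempts per team, labelled with the share of all attempts taken from three
season2014_team_box_score %>%
  group_by(teamName) %>%
  mutate(three_pa_cover = fg3a / fga) %>%
  ggplot(aes(x = teamName, y = fg3a)) +
  geom_col() +
  geom_text(aes(label = scales::percent(three_pa_cover), y = fg3a),
            position = position_stack(vjust = 0.2))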
In the 2014 season, none of the top six regular-season teams attempted more than 2,000 three-pointers, and threes accounted for only 23.46% to 29.08% of their total shot attempts. The other 70%+ were two-point shots.
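The share of attempts quoted above is simply `fg3a / fga`. A minimal sketch of that arithmetic in base R, using two made-up teams (the team names and totals are hypothetical, not values from the data set):

```r
# Hypothetical season totals for two made-up teams (not real NBA data).
team_totals <- data.frame(
  teamName = c("Team A", "Team B"),
  fga  = c(7000, 6800),   # all field goal attempts
  fg3a = c(1750, 1904)    # three-point attempts
)

# Share of all attempts taken from three-point range.
team_totals$three_pa_cover <- team_totals$fg3a / team_totals$fga

# The remaining attempts are two-point shots.
team_totals$two_pa_cover <- 1 - team_totals$three_pa_cover

print(team_totals)
```

The same `three_pa_cover` column is what the plot labels show, formatted as a percentage.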
Let’s see how many three-point attempts actually turn into points.
# Three-pointers made per team, labelled with three-point percentage
season2014_team_box_score %>%
  group_by(teamName) %>%
  ggplot(aes(x = teamName, y = fg3m)) +
  geom_col() +
  geom_text(aes(label = scales::percent(pctFG3), y = pctFG3 * 1000),
            position = position_stack(vjust = 0.5))
There doesn’t seem to be much difference in three-point percentage, with a maximum difference of 4.5 percentage points. This means that a team that shoots more threes makes more. So why didn’t they shoot more three-pointers?
Let’s have a look at two-pointers!
# Two-pointers made per team, labelled with two-point percentage
season2014_team_box_score %>%
  group_by(teamName) %>%
  mutate(fg2m = fgm - fg3m) %>%
  ggplot(aes(x = teamName, y = fg2m)) +
  geom_col() +
  geom_text(aes(label = scales::percent(round((fgm - fg3m) / (fga - fg3a), 3)),
                y = round((fgm - fg3m) / (fga - fg3a), 2) * 1000),
            position = position_stack(vjust = 2.5))
Every team’s two-point percentage is higher than 47%, and the largest difference is 8.3 percentage points.
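The box score has no two-point columns, so the two-point percentage has to be derived by subtracting threes from all field goals. A small worked example with made-up totals (the numbers are hypothetical):

```r
# Hypothetical season totals for one team (not real NBA data).
fgm  <- 3000   # all field goals made
fga  <- 6500   # all field goals attempted
fg3m <- 600    # three-pointers made
fg3a <- 1700   # three-pointers attempted

# Two-point makes and attempts are whatever is left after removing threes.
fg2m <- fgm - fg3m
fg2a <- fga - fg3a

# Two-point percentage.
pct_fg2 <- fg2m / fg2a
print(pct_fg2)
```

This is the same `(fgm - fg3m) / (fga - fg3a)` expression used for the labels in the two-point chart.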
# Share of total points from threes, twos and free throws for each team
season2014_team_box_score %>%
  group_by(teamName) %>%
  mutate(three_pct = (fg3m * 3) / pts,
         two_pct = (fgm - fg3m) * 2 / pts,
         freethrow_pct = ftm / pts) %>%
  ggplot(aes(x = teamName)) +
  geom_line(aes(y = three_pct, color = "three_pct", group = 1)) +
  geom_line(aes(y = two_pct, color = "two_pct", group = 2)) +
  geom_line(aes(y = freethrow_pct, color = "freethrow_pct", group = 3))
From the chart above, we can conclude that two-pointers are the main scoring method (55%-60% of the total), three-pointers are second (20%-25%) and free throws are the lowest (15%-20%).
We can also see a peculiar phenomenon: the Raptors seem to be the most willing of the six teams to try new attacking patterns. They have the highest share of threes compared with the other teams. Although the Spurs shot a higher three-point percentage and the Clippers attempted more threes in total, the Raptors had the highest total field goal percentage as well as the highest share of points scored from threes, so it is clear that they relied more on threes and got some results.
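The three shares in the chart come from splitting total points into their sources: each two-pointer is worth 2, each three-pointer 3 and each free throw 1. A quick check with made-up totals (hypothetical numbers) shows that the shares add up to 100%:

```r
# Hypothetical season totals for one team (not real NBA data).
fgm  <- 3100   # all field goals made
fg3m <- 700    # three-pointers made
ftm  <- 1400   # free throws made

# Points by source.
pts_two   <- (fgm - fg3m) * 2
pts_three <- fg3m * 3
pts_ft    <- ftm * 1
pts       <- pts_two + pts_three + pts_ft

# Share of total points from each source.
shares <- c(two_pct = pts_two, three_pct = pts_three, freethrow_pct = pts_ft) / pts
print(round(shares, 3))
print(sum(shares))
```

Because the three shares must sum to 1, any growth in the three-point share has to come out of twos, free throws or both.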
library(ggpubr)
s2022_1<-season2022_team_box_score %>%
group_by(teamName) %>%
mutate(three_pa_cover=fg3a/fga ) %>%
ggplot(aes(x=teamName,y=fg3a))+
geom_col()+geom_text(aes(label = scales::percent(three_pa_cover),
y = fg3a
),
position = position_stack(vjust =0.2))
s2014_1 <- season2014_team_box_score %>%
group_by(teamName) %>%
mutate(three_pa_cover=fg3a/fga ) %>%
ggplot(aes(x=teamName,y=fg3a))+
geom_col()+geom_text(aes(label = scales::percent(three_pa_cover),
y = fg3a
),
position = position_stack(vjust =0.2))
ggarrange(s2022_1,s2014_1,ncol = 1,labels = c("2022 3-points attempt and pct of total shot attempt","2014 3-points attempt and pct of total shot attempt"))
In the 2022 season, none of the top six regular-season teams made fewer than 2,500 three-point attempts, and threes accounted for at least a third of each team’s total attempts. This is a significant increase in both the number of threes taken and their share of the offense compared with the 2014 season.
s2014_2 <- season2014_team_box_score %>%
group_by(teamName) %>%
ggplot(aes(x=teamName,y=fg3m))+
geom_col()+geom_text(aes(label = scales::percent(pctFG3), y = pctFG3*1000 ),
position = position_stack(vjust =0.7))
s2022_2 <- season2022_team_box_score %>%
group_by(teamName) %>%
ggplot(aes(x=teamName,y=fg3m))+
geom_col()+geom_text(aes(label = scales::percent(pctFG3), y = pctFG3*1000 ),
position = position_stack(vjust =0.7))
ggarrange(s2022_2,s2014_2,ncol = 1,labels = c("2022 3-points made and field goal pct ","2014 3-points made and field goal pct "))
There is no big difference in three-point percentage. The total number of three-pointers made, on the other hand, has increased substantially. The reason is clear: a significant increase in the number of three-pointers attempted.
s2014_3 <- season2014_team_box_score %>%
group_by(teamName) %>% mutate(fg2m=fgm-fg3m) %>%
ggplot(aes(x=teamName,y=fg2m))+
geom_col()+geom_text(aes(label =scales::percent (round((fgm-fg3m)/(fga-fg3a),3)), y = round((fgm-fg3m)/(fga-fg3a),2)*1000 ),
position = position_stack(vjust =2.5))
s2022_3 <- season2022_team_box_score %>%
group_by(teamName) %>% mutate(fg2m=fgm-fg3m) %>%
ggplot(aes(x=teamName,y=fg2m))+
geom_col()+geom_text(aes(label =scales::percent (round((fgm-fg3m)/(fga-fg3a),3)), y = round((fgm-fg3m)/(fga-fg3a),2)*1000 ),
position = position_stack(vjust =2.5))
ggarrange(s2022_3,s2014_3,ncol = 1,labels = c("2022 2-points made and field goal pct ","2014 2-points made and field goal pct "))
s2014_4 <- season2014_team_box_score %>%
group_by(teamName) %>%
mutate(three_pct= (fg3m*3)/pts,
two_pct=(fgm-fg3m)*2/pts,
freethrow_pct=ftm/pts) %>%
ggplot(aes(x=teamName))+
geom_line(aes(y=three_pct,color="three_pct",group=1))+
geom_line(aes(y=two_pct,color="two_pct",group=2))+
geom_line(aes(y=freethrow_pct,color="freethrow_pct",group=3))
s2022_4 <- season2022_team_box_score %>%
group_by(teamName) %>%
mutate(three_pct= (fg3m*3)/pts,
two_pct=(fgm-fg3m)*2/pts,
freethrow_pct=ftm/pts) %>%
ggplot(aes(x=teamName))+
geom_line(aes(y=three_pct,color="three_pct",group=1))+
geom_line(aes(y=two_pct,color="two_pct",group=2))+
geom_line(aes(y=freethrow_pct,color="freethrow_pct",group=3))
ggarrange(s2022_4,s2014_4,ncol = 1,labels = c("2022 Composition of the total score ","2014 Composition of the total score"))
These two charts compare the composition of scoring, and we can see that:
The share of points from free throws dropped from about 20% to about 10%: free throws decreased, while the number of shots taken did not increase significantly.
The share of points from three-pointers has increased substantially, from 20%-25% to 30%-40%. It is indeed a big change and can be called a REVOLUTION.
There was also a drop of almost 10 percentage points in the share of points from two-pointers.
Get the data set
The chart shows no great difference in three-point percentage; if anything it is slightly lower. The spread between teams’ three-point shooting, however, is narrowing.
In keeping with what we observed for the top six regular-season teams, teams are attempting fewer two-point shots and the number made has dropped. Let’s look at the two-point percentage again.
The 2022 season looks better than 2014 in terms of two-point percentage.
That may be related to fewer mid-range and long two-point shots being taken.
Teams are now more willing to shoot inside the paint, where they are more likely to score or draw a foul.
Free throw shooting has improved.
The total number of free throws has slightly decreased.
Earlier we saw no significant increase in either two-point or three-point percentage, while total points scored rose dramatically. Let’s look at the points per game.
There are several reasons for this:
Increased pace of play and more offensive possessions
Three-point attempts and makes both increased, with some former two-point attempts becoming threes, leading to an overall increase in scoring
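The second reason can be made concrete with expected points per attempt. This is a minimal sketch, assuming illustrative shooting percentages of 50% on twos and 36% on threes. These figures are hypothetical and are not taken from the data set:

```r
# Assumed shooting percentages (illustrative only, not from the data).
pct_two   <- 0.50
pct_three <- 0.36

# Expected points from a single attempt of each type.
ev_two   <- 2 * pct_two
ev_three <- 3 * pct_three

# Extra points over a season if 500 two-point attempts become threes.
shifted_attempts <- 500
extra_points <- shifted_attempts * (ev_three - ev_two)

print(c(ev_two = ev_two, ev_three = ev_three, extra_points = extra_points))
```

Under these assumptions a three is worth slightly more per attempt than a two, so moving attempts behind the line raises scoring even when neither percentage improves.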
Compared with earlier eras, NBA teams are now far more willing to shoot threes, which have become a regular weapon and account for over a third of points, a substantial increase. It also suggests that teams are now more confident than ever in hitting the three-point shot.
Two-point attempts are down but two-point percentage is up; leaving aside individual shooting ability, there is a good chance that teams are attempting closer shots.
Free throw totals are down, but only slightly. One possible reason is that the intensity of physical play is decreasing, as is the quality of defence.
Scoring average has improved, but this has been a general trend since the start of the 21st century.
The overall shooting ability of players has gone up, with percentages increasing regardless of the type of scoring, most notably the three-point shot.
Because the data lack shot-location coordinates, it is not possible to visualize shots further or to calculate shooting distance. Shot-clock timing data would also give further insight into offensive choices. Both play a heavy role in interpreting the offensive model.
library(tidyverse)
library(GGally)
small_ball_era_start <- small_ball_era_start%>%
group_by(dateGame,nameTeam) %>%
select(5,8,21,28,29,31,32,36,37,40:50) %>%
mutate(
Gameresult=case_when((outcomeGame=="W"~1),
(outcomeGame=="L"~0)),
tfgm=sum(fgm),tfga=sum(fga),tfg3m=sum(fg3m),
tfg3a=sum(fg3a),tfg2m=sum(fg2m),tfg2a=sum(fg2a),
tftm=sum(ftm),tfta=sum(fta),toreb=sum(oreb),tdreb=sum(dreb),ttreb=sum(treb),
tast=sum(ast),tstl=sum(stl),tblk=sum(blk),ttov=sum(tov),tpf=sum(pf),tpts=sum(pts)
)
small_ball_era_start <- small_ball_era_start%>%
select(-c(3:20))%>%
distinct_all()
# build the team box score for each game in the season, then compute correlations
# first, remove some highly correlated variables
c <- small_ball_era_start %>%
ungroup() %>%
select(-c(1,2,3))
ggcorr(data = c)
# then keep the variables that may be highly correlated with winning
g <- small_ball_era_start %>%
ungroup() %>%
select(-c(1,2,4,5,14,10))
ggcorr(data = g)
g <- g%>%
select(Gameresult,tpts,tdreb,tfg2m,tfg3m,tast,tstl)
g %>% ggpairs(.,
title = "Factors most strongly correlated with winning",
mapping = ggplot2::aes(colour=as.factor(Gameresult)),
lower = list(continuous = wrap("smooth", alpha = 0.3, size=0.1),
discrete = "blank", combo="blank"),
diag = list(discrete="barDiag",
continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(combo = wrap("box_no_facet", alpha=0.5),
continuous = wrap("cor", size=4, alignPercent=0.8))) +
theme(panel.grid.major = element_blank())
In the 2014 season, total points, defensive rebounds and assists are the three factors most strongly correlated with winning.
It is indeed difficult to identify the differences and further modelling may be required to get a closer answer.
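Because `Gameresult` is coded as 1 for a win and 0 for a loss, its correlation with a box-score total is an ordinary Pearson correlation against a 0/1 variable. A minimal sketch with six made-up games (all values are hypothetical) shows how the ranking of factors is read off:

```r
# Six hypothetical team-game rows (not real NBA data).
games <- data.frame(
  Gameresult = c(1, 1, 1, 0, 0, 0),          # 1 = win, 0 = loss
  tpts       = c(110, 105, 101, 99, 95, 90), # team points
  tstl       = c(7, 8, 6, 8, 7, 6)           # team steals
)

# Correlation of each statistic with the win indicator.
cors <- cor(games)["Gameresult", c("tpts", "tstl")]

# Rank the statistics by absolute correlation with winning.
print(sort(abs(cors), decreasing = TRUE))
```

In this toy data points separate wins from losses while steals do not, which is the kind of contrast the correlation plots are used to spot.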
library(rpart)
library(rpart.plot)
w <- rpart(Gameresult~., data=g, method="anova")
w$variable.importance
## tpts tdreb tast tfg2m tfg3m tstl
## 126.02848 59.39596 40.34308 35.77511 35.54322 12.29514
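To show how `variable.importance` is read, here is a small sketch that fits the same kind of regression tree to deterministic toy data. The data frame `toy` and its values are hypothetical; `rpart()` and the `variable.importance` component are used exactly as above:

```r
library(rpart)

# Deterministic toy data: wins are fully decided by points, steals are noise.
toy <- data.frame(
  tpts = seq(80, 120, length.out = 200),
  tstl = rep(c(6, 8), times = 100)
)
toy$Gameresult <- as.integer(toy$tpts > 100)

# Regression tree on the 0/1 outcome, as in the analysis above.
fit <- rpart(Gameresult ~ ., data = toy, method = "anova")

# Named vector, sorted from most to least important variable.
imp <- fit$variable.importance
print(imp)
```

The values are sums of the reduction in squared error credited to each variable, so only their relative size matters, not the raw numbers.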
last_season <- last_season%>%
group_by(dateGame,nameTeam) %>%
select(5,8,21,28,29,31,32,36,37,40:50) %>%
mutate(
Gameresult=case_when((outcomeGame=="W"~1),
(outcomeGame=="L"~0)),
tfgm=sum(fgm),tfga=sum(fga),tfg3m=sum(fg3m),
tfg3a=sum(fg3a),tfg2m=sum(fg2m),tfg2a=sum(fg2a),
tftm=sum(ftm),tfta=sum(fta),toreb=sum(oreb),tdreb=sum(dreb),ttreb=sum(treb),
tast=sum(ast),tstl=sum(stl),tblk=sum(blk),ttov=sum(tov),tpf=sum(pf),tpts=sum(pts)
)
last_season <- last_season%>%
select(-c(3:20))%>%
distinct_all()
# build the team box score for each game in the season, then compute correlations
# first, remove some highly correlated variables
h <- last_season %>%
ungroup() %>%
select(-c(1,2,3))
ggcorr(data = h)
# then keep the variables that may be highly correlated with winning
j <- last_season %>%
ungroup() %>%
select(-c(1,2,4,5,14,10))
ggcorr(data = j)
k <- j%>%
select(Gameresult,tpts,tdreb,tfg2m,tfg3m,tast,tblk)
k %>% ggpairs(.,
title = "Factors most strongly correlated with winning",
mapping = ggplot2::aes(colour=as.factor(Gameresult)),
lower = list(continuous = wrap("smooth", alpha = 0.3, size=0.1),
discrete = "blank", combo="blank"),
diag = list(discrete="barDiag",
continuous = wrap("densityDiag", alpha=0.5 )),
upper = list(combo = wrap("box_no_facet", alpha=0.5),
continuous = wrap("cor", size=4, alignPercent=0.8))) +
theme(panel.grid.major = element_blank())
q <- rpart(Gameresult~., data=k, method="anova")
q$variable.importance
## tpts tdreb tast tfg3m tfg2m
## 140.71582 65.55712 50.97707 39.07252 33.81742
The top three are still total points, defensive rebounds and assists.
There was a small increase in the importance of defensive rebounds, three-pointers made and assists for winning. This suggests that these three abilities matter more than they did in the past.
https://www-degruyter-com.ez.library.latrobe.edu.au/document/doi/10.1515/jqas-2015-0027/html
The research question is how to use a model to fairly evaluate the impact of players on team wins.
The study aims:
to identify highly paid players with low impact relative to their teammates
to identify players whose high impact is not captured by existing metrics
to estimate an individual player’s impact, after controlling for the other players on the court.
Play-by-play data were obtained from ESPN for 8365 of the 9840 (85%) scheduled regular season games across the eight seasons between 2006 and 2014. The other 15% had missing values.
The authors used a regression model to estimate the change in win probability and to evaluate the model’s accuracy.
The authors used a Bayesian linear regression model to evaluate the influence of players on the chances of winning games.
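To give a feel for what "estimating a player's impact after controlling for the others on the court" can look like, here is a heavily simplified sketch. It is not the authors' model. It assumes a +1/-1 coding for home and away players, a zero-mean Gaussian prior on each impact and a known noise variance, and every name and number in it is hypothetical:

```r
# Toy design matrix: one row per shift, one column per player.
# +1 = on court for the home team, -1 = on court for the away team (assumed coding).
X <- rbind(
  c( 1,  1, -1, -1),
  c( 1, -1,  1, -1),
  c( 1, -1, -1,  1)
)
colnames(X) <- c("player_a", "player_b", "player_c", "player_d")

# Made-up "true" impacts, used only to generate the toy response.
true_impact <- c(0.02, 0.01, -0.01, -0.02)

# Change in home win probability during each shift (noise-free for clarity).
y <- as.vector(X %*% true_impact)

# Posterior mean under a zero-mean Gaussian prior on each impact:
# (X'X + lambda * I)^(-1) X'y, where lambda = noise variance / prior variance.
lambda <- 1
post_mean <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
post_mean <- setNames(as.vector(post_mean), colnames(X))

print(round(post_mean, 3))
```

The prior shrinks every estimate toward zero but keeps the ordering of the players, which is the property that matters when ranking impact.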
Impact scores are context dependent. A player’s impact score from one season can help predict his score in the following season, because it stays fairly consistent unless his playing environment (team changes, injuries, playing rules, etc.) changes significantly.
A player’s Impact Score is not affected by year, and when combined with PER (a statistical metric that evaluates individual performance without considering context) it gives a more realistic picture of a player’s performance. When a player has both a high PER and a high Impact Score, he must be a player who plays a role in his team’s winning (e.g. Durant, James, Nowitzki).
A unit change in time corresponds to a smaller change in probability of winning than a unit change in lead, especially near the end of the game. This may introduce a slight bias against players who are often subbed in on defense and subbed out on offense, since such players would not be associated with a large change in win probability.
The impact of any one player in a single substitution on his team’s chances of winning is small, usually less than 1%.
No player is more likely to significantly improve his team’s chances of winning than any other player.
Win probability estimation is admittedly simplistic, and designing a more sophisticated procedure is an area for future work.
The model could be extended to entertain two-way or three-way player interactions, in case there are any on-court synergies or mismatches among small groups of players.
The variables σ, which measure the uncertainty of the win probability change by shift and by performance, need to be segmented.
Changing from evaluating one person to evaluating a group of people (three or four) adds a lot of computational work, but such groups play a very important role in a team’s ability to win games.
2.5 Comments
In Table 4, I found some interesting phenomena. In 2008-2010 LeBron was a one-man team and won back-to-back MVPs. In 2011-2014 he was with the Heat, forming the Big Three (three All-Star players playing together), and his influence must have declined compared with his previous stint with the Cavs, both offensively and defensively. You can also observe the disappearance of Wade from the list, although Wade was still in the prime of his career in those years. This evaluation model is not so friendly to teams with several star players, and gives some advantage to the leading players of weaker teams.
The Impact score does play a role in the overall assessment. Defensive bruisers have been underrated for a long time because there are very few box-score metrics for defence (the only direct ones are steals, blocks and defensive rebounds). They really do influence the game.
This individual Impact score is very interesting. Its starting point and the results of the final simulation are also very close to the results of the MVP ballot at that time. We have been struggling with LeBron’s impact not matching his stats at times (and of course other players such as Harden this season). I think this is a solution.
While it is fair as well as necessary to assess the impact of individuals on winning, basketball is ultimately a team sport. A lot of GMs have been talking about team chemistry, and a lot of NCAA coaches are talking about a winning culture, all suggesting that evaluating a group of players (three or four) is important to winning games. The Lakers changed their starting lineup 37 times last season, more times than they won, because they couldn’t find the right mix of players.