Stefan Nikolovski, Student Number: s3844709
2022-06-08
The NBA is a basketball league that has been around for 75 years. The American league is considered to have the best basketball competition on the planet. With 30 teams striving to win an NBA title each year, teams want to develop and sign the best players possible to give themselves the best chance of winning it all. Through the data visualisations in this report, we explore what NBA teams need in order to build the best possible team to compete for a championship.
The three main statistics in basketball are used: points per game (PTS), rebounds per game (TRB) and assists per game (AST). These statistics are used to find the player ages and positions that maximise production, and therefore the chance of winning an NBA championship.
The data is sourced from Kaggle and includes every NBA player’s statistics for the 2020-21 NBA regular season. Source: https://www.kaggle.com/datasets/umutalpaydn/nba-20202021-season-player-stats
To start, we load the required packages:
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Next, we load in the data:
Checking the first few observations of the data for any problems:
## Player Pos Age Tm G GS MP FG FGA FG. X3P X3PA X3P. X2P
## 1 Precious Achiuwa PF 21 MIA 28 2 14.6 2.6 4.4 0.590 0.0 0.0 0.000 2.6
## 2 Jaylen Adams PG 24 MIL 6 0 2.8 0.2 1.3 0.125 0.0 0.3 0.000 0.2
## 3 Steven Adams C 27 NOP 27 27 28.1 3.5 5.8 0.603 0.0 0.0 0.000 3.5
## 4 Bam Adebayo C 23 MIA 26 26 33.6 7.4 12.9 0.573 0.1 0.2 0.400 7.3
## 5 LaMarcus Aldridge C 35 SAS 18 18 26.7 5.9 12.5 0.476 1.3 3.7 0.358 4.6
## 6 Ty-Shon Alexander SG 22 PHO 3 0 2.7 0.0 1.0 0.000 0.0 0.3 0.000 0.0
## X2PA X2P. eFG. FT FTA FT. ORB DRB TRB AST STL BLK TOV PF PTS
## 1 4.4 0.590 0.590 1.3 2.4 0.561 1.3 2.7 4.0 0.6 0.4 0.5 1.0 1.9 6.5
## 2 1.0 0.167 0.125 0.0 0.0 0.000 0.0 0.5 0.5 0.3 0.0 0.0 0.0 0.2 0.3
## 3 5.7 0.606 0.603 1.1 2.3 0.468 4.3 4.6 8.9 2.1 1.0 0.6 1.7 1.9 8.0
## 4 12.7 0.576 0.576 5.1 6.0 0.841 1.9 7.3 9.2 5.3 1.0 1.0 3.0 2.6 19.9
## 5 8.8 0.525 0.529 0.9 1.2 0.762 0.8 3.5 4.3 1.9 0.4 0.9 0.9 1.5 14.1
## 6 0.7 0.000 0.000 0.0 0.0 0.000 0.0 0.3 0.3 0.3 0.0 0.0 0.0 0.3 0.0
Checking the structure of the data:
## 'data.frame': 497 obs. of 29 variables:
## $ Player: chr "Precious Achiuwa" "Jaylen Adams" "Steven Adams" "Bam Adebayo" ...
## $ Pos : chr "PF" "PG" "C" "C" ...
## $ Age : int 21 24 27 23 35 22 22 25 22 22 ...
## $ Tm : chr "MIA" "MIL" "NOP" "MIA" ...
## $ G : int 28 6 27 26 18 3 23 19 28 12 ...
## $ GS : int 2 0 27 26 18 0 3 8 10 5 ...
## $ MP : num 14.6 2.8 28.1 33.6 26.7 2.7 19.2 23.9 26.2 26.7 ...
## $ FG : num 2.6 0.2 3.5 7.4 5.9 0 3.3 3.2 4.4 3.7 ...
## $ FGA : num 4.4 1.3 5.8 12.9 12.5 1 8.2 7.4 6.8 5.4 ...
## $ FG. : num 0.59 0.125 0.603 0.573 0.476 0 0.41 0.429 0.642 0.677 ...
## $ X3P : num 0 0 0 0.1 1.3 0 1 2.3 0 0 ...
## $ X3PA : num 0 0.3 0 0.2 3.7 0.3 3.8 5.3 0.1 0 ...
## $ X3P. : num 0 0 0 0.4 0.358 0 0.276 0.436 0.25 0 ...
## $ X2P : num 2.6 0.2 3.5 7.3 4.6 0 2.3 0.8 4.3 3.7 ...
## $ X2PA : num 4.4 1 5.7 12.7 8.8 0.7 4.4 2.1 6.6 5.4 ...
## $ X2P. : num 0.59 0.167 0.606 0.576 0.525 0 0.525 0.41 0.651 0.677 ...
## $ eFG. : num 0.59 0.125 0.603 0.576 0.529 0 0.473 0.586 0.645 0.677 ...
## $ FT : num 1.3 0 1.1 5.1 0.9 0 1.1 1.7 3.6 3.8 ...
## $ FTA : num 2.4 0 2.3 6 1.2 0 1.4 1.9 4.7 5.1 ...
## $ FT. : num 0.561 0 0.468 0.841 0.762 0 0.781 0.892 0.758 0.754 ...
## $ ORB : num 1.3 0 4.3 1.9 0.8 0 0.2 0.4 2.9 3.2 ...
## $ DRB : num 2.7 0.5 4.6 7.3 3.5 0.3 2.4 2.5 6.1 7.3 ...
## $ TRB : num 4 0.5 8.9 9.2 4.3 0.3 2.7 2.9 9 10.4 ...
## $ AST : num 0.6 0.3 2.1 5.3 1.9 0.3 2 2.1 1.6 1.7 ...
## $ STL : num 0.4 0 1 1 0.4 0 1.1 1 0.5 0.6 ...
## $ BLK : num 0.5 0 0.6 1 0.9 0 0.3 0.2 1.6 1.6 ...
## $ TOV : num 1 0 1.7 3 0.9 0 1.3 1.1 1.5 1.8 ...
## $ PF : num 1.9 0.2 1.9 2.6 1.5 0.3 1.7 1.3 1.6 1.8 ...
## $ PTS : num 6.5 0.3 8 19.9 14.1 0 8.8 10.4 12.3 11.2 ...
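The code that reads the CSV is not shown. The column names above (`FG.`, `X3P`, `X3P.`, `eFG.`) are consistent with `read.csv()`. Its default `check.names = TRUE` passes headers through `make.names()`, which turns `%` into `.` and adds an `X` prefix to names that start with a digit. The sketch below assumes the Kaggle file uses Basketball-Reference-style headers such as `FG%` and `3P`, which is not confirmed here. It reads a two-row inline excerpt whose values are copied from the `head()` output above:

```r
# Two-row excerpt written inline. The header names are an assumption about
# the Kaggle file; the values are copied from the first rows shown above.
csv_text <- "Player,Pos,Age,FG%,3P,3P%,eFG%
Precious Achiuwa,PF,21,0.590,0.0,0.000,0.590
Jaylen Adams,PG,24,0.125,0.0,0.000,0.125"

# With check.names = TRUE (the default), make.names() replaces '%' with '.'
# and prefixes names that start with a digit with 'X'.
sample_data <- read.csv(text = csv_text)
print(names(sample_data))  # "Player" "Pos" "Age" "FG." "X3P" "X3P." "eFG."

# check.names = FALSE keeps the original headers, which must then be
# quoted with backticks, e.g. sample_data$`FG%`.
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)
```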
Now for some wrangling/preprocessing:
Here we changed the positions of players listed as ‘F-C’ to ‘C’, ‘SF-PF’ to ‘SF’ and ‘F’ to ‘SF’. This makes the data simpler and easier to use. Every player now has one of the five standard positions (PG, SG, SF, PF, C) and keeps essentially the same role, just under a single, more specific label.
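The recoding code itself is not shown. The sketch below does the same mapping in base R with a named lookup vector. `pos_map` and `simplify_pos()` are hypothetical names, the sample labels are illustrative, and the original analysis may have used dplyr instead:

```r
# Hypothetical sample of raw position labels (not the full data set).
raw_pos <- c("PG", "F-C", "SF-PF", "F", "C", "SG", "PF")

# Mapping described in the text: F-C -> C, SF-PF -> SF, F -> SF.
pos_map <- c("F-C" = "C", "SF-PF" = "SF", "F" = "SF")

simplify_pos <- function(pos) {
  needs_recode <- pos %in% names(pos_map)
  # Look up the new label; unname() keeps the result a plain character vector.
  pos[needs_recode] <- unname(pos_map[pos[needs_recode]])
  pos
}

clean_pos <- simplify_pos(raw_pos)
print(clean_pos)  # "PG" "C" "SF" "SF" "C" "SG" "PF"
# On the real data this would be: data$Pos <- simplify_pos(data$Pos)
```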
Here is a plot with five panels, one per position, showing points per game by age for the 2020-21 NBA season. Each point is one player’s season average.
p1 <- ggplot(data = data, aes(x = Age, y = PTS, colour = Pos)) + geom_point() + geom_smooth() + facet_grid(. ~ Pos) + ylim(0, 40)
p1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Each panel also includes a LOESS trend line. For positions C, PF, SF and SG, average points per game rise with age, but each position shows a dip in scoring production after a certain age. For PG’s, by contrast, there is no clear dip in points per game. As point guards get older they may gain experience, and many of them keep scoring as much as or more than younger players.
Let’s see whether the same pattern holds for rebounds and assists.
First, rebounds:
p2 <- ggplot(data = data, aes(x = Age, y = TRB, colour = Pos))
p2 <- p2 + geom_point() + geom_smooth() + facet_grid(. ~ Pos) + ylim(0, 20)
p2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
As the visualisation above shows, C’s, PF’s, SF’s and SG’s once again have a dip in rebounds per game once they reach a certain age. PG’s clearly do not: as they get older, they average more rebounds per game.
Now the Assists:
p3 <- ggplot(data = data, aes(x = Age, y = AST, colour = Pos))
p3 <- p3 + geom_point() + geom_smooth() + facet_grid(. ~ Pos) + ylim(0, 15)
p3
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Once again, the trend is positive for PG’s, while the other positions dip at some point as age increases.
Next, we compute the median of each statistic for each position:
data2 <- aggregate(data$PTS, list(data$Pos), FUN=median)
data3 <- aggregate(data$TRB, list(data$Pos), FUN=median)
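data4 <- aggregate(data$AST, list(data$Pos), FUN=median)
Medians were used rather than means because per-game statistics are skewed. Many players average very few points, rebounds or assists, while a handful of stars average far more. The median is much less sensitive to these extreme values, so it better represents a typical player at each position.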
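Here is a small illustration of why the median suits skewed per-game statistics. It uses nine made-up points-per-game values (hypothetical, not taken from the data set) in which one player scores far more than the rest:

```r
# Hypothetical points-per-game values: eight low-volume players and one
# high-volume scorer (illustrative only, not from the 2020-21 data).
ppg <- c(0.3, 0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 6.2, 30.1)

mean(ppg)    # [1] 5.866667, pulled upwards by the single 30.1 value
median(ppg)  # [1] 3.1, the middle value, unaffected by the size of the top value
```

Replacing 30.1 with 60 would change the mean considerably but leave the median at 3.1.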
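Let’s take a look at the new data frames. They hold the median PTS, TRB and AST by position, in that order (Group.1 is the position and x is the median):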
## Group.1 x
## 1 C 7.00
## 2 PF 6.45
## 3 PG 7.55
## 4 SF 6.40
## 5 SG 8.35
## Group.1 x
## 1 C 5.35
## 2 PF 3.60
## 3 PG 2.20
## 4 SF 3.10
## 5 SG 2.55
## Group.1 x
## 1 C 1.00
## 2 PF 1.05
## 3 PG 3.05
## 4 SF 1.10
## 5 SG 1.60
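The three tables above are printed without labels. An alternative, not the author's code, is a single `aggregate()` call with a formula, which returns one labelled table. The sketch below runs it on only the six players shown in the `head()` output, so its medians are not the league-wide values:

```r
# The first six players from the head() output above (PTS, TRB, AST columns).
sample_players <- data.frame(
  Player = c("Precious Achiuwa", "Jaylen Adams", "Steven Adams",
             "Bam Adebayo", "LaMarcus Aldridge", "Ty-Shon Alexander"),
  Pos = c("PF", "PG", "C", "C", "C", "SG"),
  PTS = c(6.5, 0.3, 8.0, 19.9, 14.1, 0.0),
  TRB = c(4.0, 0.5, 8.9, 9.2, 4.3, 0.3),
  AST = c(0.6, 0.3, 2.1, 5.3, 1.9, 0.3)
)

# cbind() on the left-hand side aggregates several columns at once,
# and the result keeps the column names PTS, TRB and AST.
medians_by_pos <- aggregate(cbind(PTS, TRB, AST) ~ Pos,
                            data = sample_players, FUN = median)
print(medians_by_pos)
# On the full data: aggregate(cbind(PTS, TRB, AST) ~ Pos, data = data, FUN = median)
```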
Let’s look at this data graphically:
p4 <- ggplot(data2, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p4 <- p4 + geom_bar(stat = 'identity') + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 10)+ labs(title = "Median PTS Per Game For Each Position 2020-21 NBA Season", x = "Position", y = "PTS Per Game Median") + scale_fill_discrete(name = "Position") + theme_minimal()
p4
Ranked by median points per game, the positions are SG, PG, C, PF and SF, with PF and SF almost level (6.45 and 6.40). The position with the most scoring production is therefore SG. If a team needs scoring, it should look to develop or sign players who play the SG position. With teams becoming more guard-oriented, this is feasible, and as shown above, the largest scoring production comes from SG’s (shooting guards) and PG’s (point guards).
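The ranking can be checked directly against the medians printed earlier. The short sketch below re-enters those five values by hand:

```r
# Median PTS by position, copied from the aggregate() output above.
pts_by_pos <- data.frame(
  Pos = c("C", "PF", "PG", "SF", "SG"),
  PTS = c(7.00, 6.45, 7.55, 6.40, 8.35)
)

# Sort from highest to lowest median.
ranked <- pts_by_pos[order(pts_by_pos$PTS, decreasing = TRUE), ]
print(ranked$Pos)  # SG, PG, C, PF, SF

# Gap between the highest and lowest position medians, in points per game.
print(max(pts_by_pos$PTS) - min(pts_by_pos$PTS))  # 1.95
```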
Next we will look at rebound stats:
p5 <- ggplot(data3, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p5 <- p5 + geom_bar(stat = 'identity') + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 7.5)+ labs(title = "Median TRB Per Game For Each Position 2020-21 NBA Season", x = "Position", y = "TRB Per Game Median") + scale_fill_discrete(name = "Position") + theme_minimal()
p5
As shown, C’s and PF’s have the highest median rebounds per game (5.35 and 3.60). This is likely because C’s and PF’s are generally taller than players at the other positions. Height gives them an advantage when the ball is in the air, because they can reach it before shorter players. If a team is lacking in rebounding, it should go after C’s and PF’s.
Next we will look at assist medians:
p6 <- ggplot(data4, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p6 <- p6 + geom_bar(stat = 'identity') + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 5)+ labs(title = "Median AST Per Game For Each Position 2020-21 NBA Season", x = "Position", y = "AST Per Game Median") + scale_fill_discrete(name = "Position") + theme_minimal()
p6
PG’s have the highest median assists of all five positions. As the heights of the bars show, PG’s are far ahead of every other position in assists: the median for PG’s is 3.05 per game, and the next highest, for SG’s, is 1.60. This further suggests that the NBA is a guard-oriented game, as guards generally rack up better numbers than other positions.
Next, let’s look at age versus production.
To start, we compute the median of each statistic for each age:
data5 <- aggregate(data$PTS, list(data$Age), FUN=median)
data6 <- aggregate(data$TRB, list(data$Age), FUN=median)
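data7 <- aggregate(data$AST, list(data$Age), FUN=median)
Here are the resulting data frames. They hold the median PTS, TRB and AST, in that order, for each age (Group.1 is the age and x is the median):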
## Group.1 x
## 1 19 5.20
## 2 20 4.20
## 3 21 5.00
## 4 22 5.30
## 5 23 4.60
## 6 24 7.05
## 7 25 6.85
## 8 26 7.95
## 9 27 7.95
## 10 28 10.00
## 11 29 7.40
## 12 30 9.50
## 13 31 11.80
## 14 32 10.70
## 15 33 8.70
## 16 34 11.20
## 17 35 5.90
## 18 36 10.75
## 19 37 4.40
## Group.1 x
## 1 19 2.70
## 2 20 2.40
## 3 21 2.20
## 4 22 2.70
## 5 23 2.10
## 6 24 2.95
## 7 25 2.95
## 8 26 3.60
## 9 27 3.50
## 10 28 4.20
## 11 29 3.65
## 12 30 3.75
## 13 31 4.80
## 14 32 3.80
## 15 33 3.50
## 16 34 3.10
## 17 35 4.60
## 18 36 3.85
## 19 37 3.60
## Group.1 x
## 1 19 1.20
## 2 20 0.90
## 3 21 0.90
## 4 22 1.10
## 5 23 0.90
## 6 24 1.40
## 7 25 1.30
## 8 26 1.70
## 9 27 1.35
## 10 28 1.60
## 11 29 1.70
## 12 30 2.95
## 13 31 1.70
## 14 32 2.50
## 15 33 1.70
## 16 34 3.10
## 17 35 1.40
## 18 36 1.55
## 19 37 2.30
The range of the ages (oldest minus youngest, in years):
## [1] 18
The youngest age:
## [1] 19
The oldest age:
## [1] 37
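The code behind these three numbers is not shown. In base R, `range()` returns the minimum and maximum, and the single value 18 is their difference. Here is a sketch on the ages listed in the tables above. On the full data this would presumably be `diff(range(data$Age))`, which is an assumption about the original code:

```r
# The distinct ages in the data set, taken from the Group.1 column above.
ages <- 19:37

print(range(ages))        # smallest and largest age: 19 37
print(diff(range(ages)))  # the single "range" value: 37 - 19 = 18
```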
Here is the first plot, showing median points per game against age:
p7 <- ggplot(data5, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p7 <- p7 + geom_line(size = 1.2, alpha = 0.9) + geom_smooth(method=lm) + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 15)+ labs(title = "Median PTS Per Game For Each Year In Age 2020-21 NBA Season", x = "Age", y = "PTS Per Game Median") + theme_minimal()
p7
## `geom_smooth()` using formula 'y ~ x'
As players in the NBA get older, they tend to average more points per game. The highest median points per game is at age 31 (11.80). After that, the medians fluctuate sharply up and down.
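Both the peak and the direction of the fitted line can be read off the medians themselves. The sketch below re-enters the `data5` values by hand. It then fits the same ordinary least-squares line that `geom_smooth(method = lm)` draws, where a slope above zero matches the upward trend described:

```r
# Median PTS per game for ages 19 to 37, copied from the data5 output above.
age <- 19:37
pts <- c(5.20, 4.20, 5.00, 5.30, 4.60, 7.05, 6.85, 7.95, 7.95, 10.00,
         7.40, 9.50, 11.80, 10.70, 8.70, 11.20, 5.90, 10.75, 4.40)

# Age with the highest median.
peak_age <- age[which.max(pts)]
print(peak_age)  # 31

# Straight-line fit of median PTS on age.
fit <- lm(pts ~ age)
print(coef(fit)["age"])  # change in median PTS per extra year, roughly 0.24
```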
Next we will look at rebounds:
p8 <- ggplot(data6, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p8 <- p8 + geom_line(size = 1.2, alpha = 0.9) + geom_smooth(method=lm) + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 5)+ labs(title = "Median TRB Per Game For Each Year In Age 2020-21 NBA Season", x = "Age", y = "TRB Per Game Median") + theme_minimal()
p8
## `geom_smooth()` using formula 'y ~ x'
As players in the NBA get older, they also tend to average more rebounds per game. Once again, the highest median rebounds per game is at age 31 (4.80). After that, the medians fluctuate up and down; in fact, they fall steadily from age 31 to 34 (4.80, 3.80, 3.50, 3.10).
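The decline after the peak can be confirmed from the `data6` medians. The sketch below re-enters those values by hand and takes the year-on-year differences from age 31 to 34:

```r
# Median TRB per game for ages 19 to 37, copied from the data6 output above.
age <- 19:37
trb <- c(2.70, 2.40, 2.20, 2.70, 2.10, 2.95, 2.95, 3.60, 3.50, 4.20,
         3.65, 3.75, 4.80, 3.80, 3.50, 3.10, 4.60, 3.85, 3.60)

print(age[which.max(trb)])  # age of the highest median: 31

# Year-on-year changes from age 31 to 34; all three are negative.
late <- trb[age %in% 31:34]
print(diff(late))  # -1.0 -0.3 -0.4
```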
Next we will look at assists:
p9 <- ggplot(data7, aes( x = Group.1, y = x, fill = Group.1, colour = Group.1))
p9 <- p9 + geom_line(size = 1.2, alpha = 0.9) + geom_smooth(method=lm) + geom_text(aes(label = x), size = 4, vjust = -1) + ylim(0, 5)+ labs(title = "Median AST Per Game For Each Year In Age 2020-21 NBA Season", x = "Age", y = "AST Per Game Median") + theme_minimal()
p9
## `geom_smooth()` using formula 'y ~ x'
As the graph above shows, the assist data trends upwards, although with many irregular up and down fluctuations. The peak age for median assists among NBA players in the 2020-21 season is 34 (3.10). The medians for ages 35 to 37 (1.40, 1.55 and 2.30) are all below that peak, but the fitted line still slopes upwards. Older players therefore tend to average more assists, although the year-to-year noise means this should be read with some caution.
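The sketch below re-enters the `data7` medians by hand to check three things: the peak age, the values after it, and the sign of the straight-line slope drawn by `geom_smooth(method = lm)`:

```r
# Median AST per game for ages 19 to 37, copied from the data7 output above.
age <- 19:37
ast <- c(1.20, 0.90, 0.90, 1.10, 0.90, 1.40, 1.30, 1.70, 1.35, 1.60,
         1.70, 2.95, 1.70, 2.50, 1.70, 3.10, 1.40, 1.55, 2.30)

print(age[which.max(ast)])         # age of the highest median: 34
print(ast[age > 34])               # medians after the peak: 1.40 1.55 2.30
print(coef(lm(ast ~ age))["age"])  # slope of the fitted line, about 0.076
```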
Overall, the data visualisations above suggest that the NBA is a guard-oriented league. PG’s and SG’s lead the league in median points per game and assists per game. According to the visualisations, guards carry the offensive load in today’s NBA.
At the same time, C’s (centres) and PF’s (power forwards) are still important in today’s game, as they provide the rebounding that guards do not. A good team is strong in all categories, so a team needs a PF and a C to succeed.
On the other hand, small forwards appear to be the least valuable position on these three measures. They have the lowest median points per game (6.40, only just below PF’s at 6.45), are at a very similar level to centres and power forwards in assists, and are mediocre at rebounding. Therefore, in today’s NBA, this position could be replaced with another guard, power forward or centre to get more production out of it.
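One way to summarise all three statistics at once is an unweighted rank sum. This is our own choice of summary, not a method used in the report, and it weights points, rebounds and assists equally. The sketch ranks each position on each median printed earlier and adds up the ranks:

```r
# Median PTS, TRB and AST by position, copied from the outputs above.
medians <- data.frame(
  Pos = c("C", "PF", "PG", "SF", "SG"),
  PTS = c(7.00, 6.45, 7.55, 6.40, 8.35),
  TRB = c(5.35, 3.60, 2.20, 3.10, 2.55),
  AST = c(1.00, 1.05, 3.05, 1.10, 1.60)
)

# Rank each statistic separately (1 = highest median).
ranks <- sapply(medians[, c("PTS", "TRB", "AST")], function(v) rank(-v))
rownames(ranks) <- medians$Pos
print(ranks)

# Sum of ranks per position: a lower total means better across all three.
total <- rowSums(ranks)
print(sort(total))  # SG 7, PG 8, C 9, PF 10, SF 11
```

Under this equal weighting SF comes last, even though it ranks third in both rebounds and assists. A different weighting could change the order.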
In addition, players generally perform better as they get older, likely because they gain experience at the level of basketball played in the NBA. For points and rebounds per game, players tend to hit their peak at around 31 and then decline inconsistently. The peak for assists per game comes later, at age 34. The visualisation also suggests that assist production does not drop off after that age as much as the other two statistics do.
Finally, this suggests that a team wanting to contend for an NBA title, and to optimise production from its players, would want PG’s and SG’s aged roughly 31 to carry the offensive load. It would also want someone aged roughly 34 who averages a high number of assists per game. In addition, it would want a PF and a C for rebounding, and the final starter can be either a guard, PF or C.