
Seasons 1950-2017





Setup Stage


Loading necessary packages.

library(dplyr)
library(tidyr)
library(ggplot2)
library(RColorBrewer)
library(plotly)
library(reshape2)
library(kableExtra)
library(wordcloud2)


Create additional functions

# Mode average
getmode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}
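
As a quick sanity check of the helper, here is a minimal sketch on made-up position histories (the input vectors are hypothetical, not taken from the dataset). It also shows that a tie resolves to whichever value appears first.

```r
# Same helper as above, repeated so this snippet runs on its own
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical position history of one player across five seasons
positions <- c("SG", "SG", "SF", "SG", "SF")
most_common <- getmode(positions)
print(most_common)

# With a tie, which.max() picks the first maximum, so the
# value that appears first in the input wins
tie_result <- getmode(c("PF", "C", "C", "PF"))
print(tie_result)
```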


Load the dataset


Dataset taken from: https://www.kaggle.com/drgilermo/nba-players-stats/version/2

If you need to see the glossary and the data tidying process, please visit the first part here.

Loading datasets…

NBA <- read.csv("NBA_TidySet.csv")[,-c(1)]
NBA_Scaled <- read.csv("NBA_Scaled_TidySet.csv")
NBA$Pos <- factor(NBA$Pos, levels = c("C", "PF", "SF", "SG", "PG"))
NBA_Scaled$Pos <- factor(NBA_Scaled$Pos, levels = c("C", "PF", "SF", "SG", "PG"))
PosColorCode <- c("C"="#FF0000", "PF"="#FFA500", "SF"="#DDDD00" ,"SG"="#0000FF", "PG"="#32CD32")


Displaying main table

NBA




POINTS


Points are commonly perceived as the most important indicator of a player’s ability to compete in the NBA. After all, the team with the most points at the end of a game is declared the winner. So, let the analysis begin…

Points come from three different sources:

  1. Field Goals (FG): for each FG successfully made from within the three-point line, the player scores 2 points.
  2. Three-Points Field Goal (3P): for each FG successfully made from beyond the three-point line, the player scores 3 points.
  3. Free-Throw (FT): for each FT successfully made, the player scores 1 point.
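
The three sources add up to a player’s point total. Here is a minimal sketch of that arithmetic with a made-up season line (the variable names and numbers are illustrative assumptions, not dataset values):

```r
# Hypothetical season line for one player (made-up numbers)
two_pointers   <- 400  # field goals made inside the three-point line
three_pointers <- 100  # field goals made beyond the three-point line
free_throws    <- 250  # free throws made

# Each source is weighted by its point value
points <- 2 * two_pointers + 3 * three_pointers + 1 * free_throws
print(points)

# Share of the total contributed by each source
shares <- c(FG2 = 2 * two_pointers,
            FG3 = 3 * three_pointers,
            FT  = free_throws) / points
print(round(shares, 3))
```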



Annual Team Points per Game


I will start with the widest scope and narrow down from there, beginning with the average points scored per team, year by year.

Since the dataset contains player statistics rather than team statistics, the numbers I want are not directly available, so I came up with the following method:

  1. Calculate the total points scored by all players each year.
  2. Count the number of teams that took part in the NBA each year.
  3. Determine the number of games each team played each year.
  4. Calculate the total number of NBA games each year by multiplying the number of games per team by the number of teams, then dividing by two (two teams compete in each game).
  5. Calculate the average team score per game by dividing the total points scored each year by the total number of games that year, then halving the result, since each game’s points are shared between two teams.
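
To make the arithmetic concrete, here is a minimal sketch of the five steps on a toy league. All numbers and variable names are made up for illustration; the real calculation runs on the dataset.

```r
# Toy league (all numbers are hypothetical)
total_points <- 2400  # step 1: points scored by every player in the season
n_teams      <- 4     # step 2: teams taking part
games_each   <- 6     # step 3: games played by each team

# Step 4: every game involves two teams, so halve the product
total_games <- n_teams * games_each / 2

# Step 5: points per game cover both teams, so halve again
# to get the average score of a single team
points_per_game <- total_points / total_games
avg_team_score  <- points_per_game / 2

print(total_games)
print(avg_team_score)
```

As a cross-check, the same figure falls out of dividing the total points by the number of team-games (4 teams times 6 games).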

To make the plot more meaningful, why not let it also show where those points came from?

So here we go…


Team_Scores <- NBA %>%
    group_by(Year) %>%
    summarise(Total_PTS = sum(PTS),
            Total_2FG = sum(X2P)*2,
            Total_3FG = sum(X3P, na.rm = T)*3,
            Total_FT = sum(FT),
            nTeam = n_distinct(Tm),
            nGames = max(G),
            Total_G = round((nGames*nTeam)/2, 0),
            FG2 = round((Total_2FG/Total_G)/2, 2),
            FG3 = round((Total_3FG/Total_G)/2, 2),
            FT = round((Total_FT/Total_G)/2, 2),
            avg_Tm_Scores = round((Total_PTS/Total_G)/2, 2)) %>%
    gather(Parameter, Count, FG2:FT, -c(Year:Total_G, avg_Tm_Scores))
Team_Scores %>%
    group_by(Year, Parameter) %>%
    ggplot(aes(Year, Count, fill=forcats::fct_rev(Parameter))) + 
    geom_bar(stat="identity") +
    geom_hline(yintercept = mean(Team_Scores$avg_Tm_Scores), col = "blue", alpha = 0.5) +
    ggtitle("Annual Team Scores per Game") +
    guides(fill=guide_legend(title="")) +
    xlab("Year") +
    ylab("Score per Game") +
    scale_fill_manual(name="", values=c("#FF5555", "#008500", "#0055FF")) +
    scale_x_continuous(breaks = seq(1950, 2020, 10)) +
    theme(legend.position="bottom")


  • The average team score per game across NBA history is 101.55.
  • The lowest average team score is 72.26, in the 1950-1951 season.
  • The highest average team score is 116.75, in the 1960-1961 season.
  • Two-point field goals account for 70% of all points scored, averaging 72.86 per game.
  • Three-point field goals account for 9.9% of all points scored, averaging 7.22 per game; in the most recent decade (2010 onward) they account for 22.5%.
  • Free throws account for 20.1% of all points scored, averaging 21.46 per game.



Annual Points per Game by Position


Points per game (PPG) is a standard gauge of a player’s ability to score points. It is calculated by dividing the total number of points by the number of games played.
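
Here is a minimal sketch of the calculation on a made-up three-player season (the data frame is hypothetical). It also illustrates one design choice: pooling with sum(PTS)/sum(G), the form used for group averages in this post, weights each player by games played, so it generally differs from a plain mean of individual PPG values.

```r
# Hypothetical season: three players with very different workloads
season <- data.frame(
  Player = c("A", "B", "C"),
  G      = c(80, 40, 10),
  PTS    = c(1600, 400, 50)
)

# Individual points per game
season$PPG <- season$PTS / season$G
print(season)

# Pooled PPG: total points over total games (weighted by games played)
pooled_ppg <- sum(season$PTS) / sum(season$G)

# Unweighted mean of the individual PPG values
simple_mean <- mean(season$PPG)

print(round(c(pooled = pooled_ppg, simple = simple_mean), 2))
```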

Here I want to compare the annual PPG of each position against the others and against the league average. To do that, I first build a yearly table with one column per position, filled with that position’s annual PPG, and then add a column for the annual average.


MeanPPG <- NBA %>%
    group_by(Year) %>%
    summarise(Average = round(sum(PTS)/sum(G), 2))
AvgPosPPGbyYear <- NBA %>%
    group_by(Year, Pos) %>%
    summarise(PPG = round(sum(PTS)/sum(G), 1)) %>%
    dcast(Year ~ Pos) %>%
    bind_cols(MeanPPG[,2])
AvgPosPPGbyYear %>%
    kable(escape = FALSE, align='c', caption = "Points per Game by Position") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(1, bold = T) %>%
    scroll_box(width = "100%", height = "300px")
Points per Game by Position
Year C PF SF SG PG Average
1950 9.2 8.1 7.7 7.5 8.3 8.02
1951 12.4 8.4 7.7 8.2 9.6 8.82
1952 11.0 9.1 9.0 8.3 9.1 9.21
1953 11.4 8.8 7.8 9.1 8.6 9.03
1954 13.6 7.7 6.7 8.7 8.3 8.67
1955 12.4 10.6 10.0 8.7 9.8 10.23
1956 13.7 9.2 10.0 9.2 9.2 10.25
1957 12.0 11.6 9.3 10.2 9.2 10.49
1958 12.3 12.3 12.0 10.0 9.5 11.23
1959 10.5 12.0 13.3 10.3 10.1 11.27
1960 12.8 12.3 14.3 11.2 10.2 12.17
1961 12.1 12.4 13.2 12.1 12.3 12.42
1962 14.8 11.2 14.2 12.0 11.3 12.67
1963 12.6 11.6 12.3 11.8 11.6 12.02
1964 12.1 12.3 11.9 10.9 11.0 11.65
1965 12.1 11.7 10.8 11.4 11.8 11.52
1966 11.8 11.8 11.3 12.5 13.6 12.14
1967 10.6 10.5 13.8 12.8 13.0 12.18
1968 11.1 11.9 12.7 12.7 12.5 12.19
1969 12.0 10.9 11.4 12.9 13.2 12.01
1970 12.0 11.1 11.8 11.9 14.3 12.17
1971 11.4 9.9 11.9 11.3 13.5 11.50
1972 11.4 9.9 12.3 10.6 13.2 11.38
1973 9.8 11.0 10.9 11.5 12.4 11.07
1974 9.5 10.1 11.3 11.7 11.6 10.83
1975 9.3 10.2 10.1 10.4 11.8 10.28
1976 9.4 9.9 10.1 11.3 11.3 10.38
1977 10.0 10.8 10.4 11.6 9.3 10.45
1978 11.2 10.0 11.4 12.9 9.4 11.04
1979 10.5 10.8 11.8 11.5 10.9 11.13
1980 10.7 10.2 12.3 12.5 9.4 11.06
1981 10.3 10.1 12.0 12.2 9.6 10.84
1982 10.2 9.4 11.3 11.5 10.0 10.50
1983 9.4 10.6 11.6 10.6 10.4 10.55
1984 10.0 9.6 12.5 11.6 9.3 10.61
1985 9.1 10.5 12.3 11.9 10.1 10.78
1986 9.4 9.6 12.9 11.0 10.1 10.57
1987 8.8 10.2 12.2 12.0 10.2 10.72
1988 8.9 10.1 12.4 12.3 9.3 10.56
1989 8.2 9.8 13.1 12.5 9.9 10.62
1990 8.8 10.1 12.3 12.3 9.3 10.55
1991 8.3 9.4 12.3 12.1 10.6 10.49
1992 8.1 10.3 11.8 11.9 9.8 10.40
1993 8.4 10.5 10.8 12.5 9.6 10.38
1994 8.9 9.3 10.6 11.6 9.4 9.95
1995 9.0 9.1 11.0 11.5 9.5 9.96
1996 8.4 9.2 10.9 11.1 9.9 9.87
1997 7.7 9.4 10.7 11.1 9.5 9.68
1998 8.4 9.0 10.6 10.5 8.7 9.45
1999 7.7 8.4 10.0 9.9 8.4 8.86
2000 7.9 9.2 10.9 11.2 8.5 9.49
2001 7.0 9.7 10.2 11.7 8.6 9.37
2002 7.2 9.3 10.7 10.9 9.3 9.44
2003 6.8 10.3 10.3 10.8 9.3 9.40
2004 6.7 9.7 9.2 11.4 9.3 9.24
2005 6.9 10.3 10.9 10.0 10.1 9.57
2006 6.5 10.0 10.5 10.9 10.2 9.58
2007 8.4 8.3 11.2 11.3 9.6 9.67
2008 8.4 9.0 11.6 11.0 9.4 9.84
2009 8.4 10.0 10.6 10.9 10.0 9.97
2010 8.6 10.2 10.0 11.0 10.1 9.97
2011 7.9 10.2 9.9 10.7 10.1 9.75
2012 7.6 9.1 9.0 10.2 9.9 9.19
2013 8.1 9.6 8.8 10.0 10.5 9.38
2014 8.1 10.2 9.3 9.9 11.1 9.73
2015 8.7 8.9 8.7 10.2 10.7 9.48
2016 9.0 8.6 9.8 10.5 10.7 9.72
2017 9.4 8.9 10.1 10.3 11.2 9.98


With a line plot, it is easy to see which positions run hot or cold in each year, using the average line as a threshold.

PosPPG <- AvgPosPPGbyYear %>%
    plot_ly(x = ~Year, opacity=0.5) %>%
    add_lines(y = ~C, name = "C", mode = 'lines', line=list(color='#FF0000')) %>%
    add_lines(y = ~PF, name = "PF", mode = 'lines', line=list(color='#FFA500')) %>%
    add_lines(y = ~SF, name = "SF", mode = 'lines', line=list(color='#DDDD00')) %>%
    add_lines(y = ~SG, name = "SG", mode = 'lines', line=list(color='#0000FF')) %>%
    add_lines(y = ~PG, name = "PG", mode = 'lines', line=list(color='#32CD32')) %>%
    add_lines(y = ~Average, name = "Average", mode = 'marker', opacity=1) %>%
    layout(legend = list(orientation = 'h'),
           title = 'Annual Points per Game by Position',
           yaxis = list(title = "Points per Game"),
           xaxis = list(title = ""))
#PosPPGChart <- api_create(PosPPG, filename = "PosPPG")
#PosPPGChart
PosPPG



Now, this may look like a messy spaghetti plot. This is where plot_ly comes to the rescue: click “Compare data on hover” and then move your pointer horizontally along the lines. With the right sauce, the spaghetti may satisfy your appetite.



Position Compared for Accumulated Points by Decade

Now I’d like to see how total points are distributed across positions. Instead of yearly totals, I group them by decade, which is easier on the eyes and lets me practice making a stacked bar plot grouped by decade.

NBA %>%
    mutate(Decade = as.factor(ifelse(Year %in% 1950:1959, "1950-59",
                               ifelse(Year %in% 1960:1969, "1960-69",
                                      ifelse(Year %in% 1970:1979, "1970-79",
                                             ifelse(Year %in% 1980:1989, "1980-89",
                                                    ifelse(Year %in% 1990:1999, "1990-99",
                                                           ifelse(Year %in% 2000:2009, "2000-09",
                                                                  "2010+")))))))) %>%
    group_by(Pos, Decade) %>%
    summarise(PTS = sum(PTS)) %>%
    ggplot(aes(Pos, PTS, fill=forcats::fct_rev(Decade))) + 
    geom_bar(stat="identity") +
    ggtitle("Points Distribution by Position") +
    guides(fill=guide_legend(title="Decades")) +
    xlab("Position") +
    ylab("Points") +
    scale_fill_brewer(palette="Set2")
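
As an aside, decade labels can also be derived arithmetically instead of through nested ifelse() calls. The helper below is a hypothetical sketch, not part of the original analysis; note that it labels the last bucket "2010-19" rather than "2010+".

```r
# Hypothetical helper: map a year to a label such as "1950-59"
decade_label <- function(year) {
  start <- (year %/% 10) * 10          # first year of the decade
  end   <- (start + 9) %% 100          # last two digits of the final year
  paste0(start, "-", sprintf("%02d", end))
}

labels <- decade_label(c(1950, 1959, 1999, 2000, 2009, 2010, 2017))
print(labels)
```

Because the label is computed from the year itself, boundary years such as 2009 and 2010 cannot end up in the wrong bucket.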



All-Time Points per Game Leaders


Many NBA fans probably already know that Michael Jordan, followed closely by Wilt Chamberlain, tops the all-time points per game leaderboard. What intrigues me is how their annual PPG rose and fell over the course of their careers. Here we are going to compare the top 10 all-time PPG leaders.

First, I need to create the leaderboard table as a base for generating the plot. This is probably one of the most celebrated tables in basketball, and one you can easily find on the web. Even so, I refrained from looking at it until mine was done, and only then checked the official table to make sure I had not made any mistakes.

AllTimePPG <- NBA %>%
    group_by(Player) %>%
    summarise(Pos = getmode(Pos),
              Team = getmode(Tm),
              ActiveYears = paste(getmode(YearStart), "-", getmode(YearEnd)),
              Games = sum(G),
              Points = sum(PTS),
              PPG = round(Points/Games, 2)) %>%
    filter(Games >= 400 | Points > 10000) %>%
    arrange(desc(PPG)) %>%
    head(n=20) %>%
    mutate(Rank = dense_rank(desc(PPG))) %>%
    select(Rank, everything())
AllTimePPG %>%
    mutate(Pos = cell_spec(Pos, color = "white", align = "c", 
                    background = factor(Pos, c("C", "PF", "SF", "SG", "PG"), 
                                        PosColorCode))) %>%
    kable(escape = FALSE, caption = "All-Time Points per Game Leaders") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(2, bold = T) %>%
    column_spec(1, bold = T, color = "yellow", background = "#FF0000") %>%
    column_spec(8, bold = T, color = "white", background = "#777777") %>%
    scroll_box(width = "100%", height = "300px")
All-Time Points per Game Leaders
Rank Player Pos Team ActiveYears Games Points PPG
1 Michael Jordan SG CHI 1985 - 2003 1072 32292 30.12
2 Wilt Chamberlain C LAL 1960 - 1973 1045 31419 30.07
3 Elgin Baylor SF LAL 1959 - 1972 846 23149 27.36
4 Kevin Durant SF OKC 2008 - 2018 703 19121 27.20
5 LeBron James SF CLE 2004 - 2018 1061 28787 27.13
6 Jerry West PG LAL 1961 - 1974 932 25192 27.03
7 Allen Iverson SG PHI 1997 - 2010 914 24368 26.66
8 Bob Pettit PF STL 1955 - 1965 792 20880 26.36
9 George Gervin SG SAS 1973 - 1986 791 20708 26.18
10 Oscar Robertson PG CIN 1961 - 1974 1040 26710 25.68
11 Karl Malone PF UTA 1986 - 2004 1476 36928 25.02
12 Kobe Bryant SG LAL 1997 - 2016 1346 33643 24.99
13 Dominique Wilkins SF ATL 1983 - 1999 1074 26668 24.83
14 Carmelo Anthony SF DEN 2004 - 2018 976 24156 24.75
15 Kareem Abdul-Jabbar C LAL 1970 - 1989 1560 38387 24.61
16 Larry Bird SF BOS 1980 - 1992 897 21791 24.29
17 Adrian Dantley SF UTA 1977 - 1991 955 23177 24.27
18 Pete Maravich SG NOJ 1971 - 1980 658 15948 24.24
19 Shaquille O'Neal C LAL 1993 - 2011 1207 28596 23.69
20 Dwyane Wade SG MIA 2004 - 2018 915 21317 23.30


Generating plot…

PPGAll <- NBA %>%
    filter(Player %in% head(AllTimePPG$Player, 10)) %>%
    group_by(Age, Player) %>%
    summarise(Games = sum(G),
              Points = sum(PTS),
              PPG = round(Points/Games, 2)) %>%
    ggplot() +
    geom_line(aes(Age, PPG, color=Player), alpha = 1) +
    ggtitle("All-Time Top Scorers' Chronological PPG") +
    theme(legend.position="bottom")
#ggplotly(PPGAll, session="knitr", kwargs=list(filename="PPGall_knitr", fileopt="overwrite"))
PPGAll



Again, add your favorite sauce to the spaghetti by clicking “Compare data on hover”, or double-click a player in the legend to isolate that trace.

Now let’s see the PPG leader at each position.

TopPPGPos <- AllTimePPG %>%
    group_by(Pos) %>%
    summarise(Player = Player[which.max(PPG)],
              Team = Team[which.max(PPG)],
              ActiveYears = ActiveYears[which.max(PPG)],
              Games = Games[which.max(PPG)],
              PPG = PPG[which.max(PPG)])

Best Points per Game by Position

C


Wilt Chamberlain

30.07
  • Team: LAL
  • Games: 1045
  • Years: 1960 - 1973

PF


Bob Pettit

26.36
  • Team: STL
  • Games: 792
  • Years: 1955 - 1965

SF


Elgin Baylor

27.36
  • Team: LAL
  • Games: 846
  • Years: 1959 - 1972

SG


Michael Jordan

30.12
  • Team: CHI
  • Games: 1072
  • Years: 1985 - 2003

PG


Jerry West

27.03
  • Team: LAL
  • Games: 932
  • Years: 1961 - 1974





Annual Points per Game Leaders


Still exploring points per game, we now look at the annual leaders and then plot them for our eyes’ delight.

TopScorer <- NBA %>%
    group_by(Year) %>%
    summarise(Player = as.character(Player[which.max(PpG)]),
              Team = Tm[which.max(PpG)],
              Pos = Pos[which.max(PpG)],
              Age = Age[which.max(PpG)],
              Games = G[which.max(PpG)],
              Shooting = TS.[which.max(PpG)],
              Top_PPG = max(round(PpG, 2))) %>%
    arrange(desc(Year))
TopScorer %>%
    mutate(Pos = cell_spec(Pos, color = "white", align = "c", 
                    background = factor(Pos, c("C", "PF", "SF", "SG", "PG"), 
                                        PosColorCode))) %>%
    kable(escape = FALSE, caption = "Annual Points per Game Leaders") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(1, bold = T, color = "yellow", background = "#FF0000") %>%
    column_spec(2, bold = T) %>%
    column_spec(8, bold = T, color = "white", background = "#777777") %>%
    scroll_box(width = "100%", height = "300px")
Annual Points per Game Leaders
Year Player Team Pos Age Games Shooting Top_PPG
2017 Russell Westbrook OKC PG 28 81 0.554 31.58
2016 Stephen Curry GSW PG 27 79 0.669 30.06
2015 Russell Westbrook OKC PG 26 67 0.536 28.15
2014 Kevin Durant OKC SF 25 81 0.635 32.01
2013 Carmelo Anthony NYK PF 28 67 0.560 28.66
2012 Kevin Durant OKC SF 23 66 0.610 28.03
2011 Kevin Durant OKC SF 22 78 0.589 27.71
2010 Kevin Durant OKC SF 21 82 0.607 30.15
2009 Dwyane Wade MIA SG 27 79 0.574 30.20
2008 LeBron James CLE SF 23 75 0.568 30.00
2007 Kobe Bryant LAL SG 28 77 0.580 31.56
2006 Kobe Bryant LAL SG 27 80 0.559 35.40
2005 Allen Iverson PHI PG 29 75 0.532 30.69
2004 Tracy McGrady ORL SG 24 67 0.526 28.03
2003 Tracy McGrady ORL SG 23 75 0.564 32.09
2002 Allen Iverson PHI SG 26 60 0.489 31.38
2001 Allen Iverson PHI SG 25 71 0.518 31.08
2000 Shaquille O'Neal LAL C 27 79 0.578 29.67
1999 Allen Iverson PHI SG 23 48 0.508 26.75
1998 Michael Jordan CHI SG 34 82 0.533 28.74
1997 Michael Jordan CHI SG 33 82 0.567 29.65
1996 Michael Jordan CHI SG 32 82 0.582 30.38
1995 Shaquille O'Neal ORL C 22 79 0.588 29.30
1994 David Robinson SAS C 28 80 0.577 29.79
1993 Michael Jordan CHI SG 29 78 0.564 32.58
1992 Michael Jordan CHI SG 28 80 0.579 30.05
1991 Michael Jordan CHI SG 27 82 0.605 31.46
1990 Michael Jordan CHI SG 26 82 0.606 33.57
1989 Michael Jordan CHI SG 25 81 0.614 32.51
1988 Michael Jordan CHI SG 24 82 0.603 34.98
1987 Michael Jordan CHI SG 23 82 0.562 37.09
1986 Dominique Wilkins ATL SF 26 78 0.536 30.33
1985 Bernard King NYK SF 28 55 0.585 32.89
1984 Adrian Dantley UTA SF 27 79 0.652 30.61
1983 Adrian Dantley UTA SF 26 22 0.661 30.73
1982 George Gervin SAS SG 29 79 0.562 32.29
1981 Adrian Dantley UTA SF 24 80 0.622 30.65
1980 George Gervin SAS SG 27 78 0.587 33.14
1979 George Gervin SAS SF 26 80 0.591 29.56
1978 John Williamson NJN SG 26 33 0.509 29.48
1977 Pete Maravich NOJ SG 29 73 0.492 31.14
1976 Bob McAdoo BUF C 24 78 0.542 31.12
1975 Bob McAdoo BUF C 23 82 0.569 34.52
1974 Bob McAdoo BUF C 22 74 0.594 30.55
1973 Tiny Archibald KCO PG 24 80 0.555 33.99
1972 Kareem Abdul-Jabbar MIL C 24 81 0.603 34.84
1971 Kareem Abdul-Jabbar MIL C 23 82 0.606 31.66
1970 Jerry West LAL PG 31 74 0.572 31.20
1969 Elvin Hayes SDR C 23 82 0.483 28.38
1968 Oscar Robertson CIN PG 29 65 0.588 29.17
1967 Rick Barry SFW SF 22 78 0.531 35.58
1966 Wilt Chamberlain PHI C 29 79 0.547 33.53
1965 Wilt Chamberlain SFW C 28 38 0.495 38.95
1964 Wilt Chamberlain SFW C 27 80 0.537 36.85
1963 Wilt Chamberlain SFW C 26 80 0.550 44.83
1962 Wilt Chamberlain PHW C 25 80 0.536 50.36
1961 Wilt Chamberlain PHW C 24 79 0.519 38.39
1960 Wilt Chamberlain PHW C 23 72 0.493 37.60
1959 Bob Pettit STL PF 26 72 0.519 29.24
1958 George Yardley DET SF 29 72 0.505 27.79
1957 Paul Arizin PHW SF 28 71 0.515 25.59
1956 Bob Pettit STL C 23 72 0.502 25.68
1955 Neil Johnston PHW C 25 72 0.536 22.65
1954 Neil Johnston PHW C 24 72 0.531 24.43
1953 Neil Johnston PHW C 23 70 0.534 22.34
1952 Paul Arizin PHW SF 23 66 0.546 25.36
1951 George Mikan MNL C 26 68 0.509 28.41
1950 George Mikan MNL C 25 68 0.487 27.43


Let’s now generate the plot using the same position color codes, with an additional line for the NBA average.

PPGmean <- NBA %>%
    group_by(Year) %>%
    summarise(meanPPG = mean(sum(PTS)/sum(G), na.rm = T))
TS <- TopScorer %>%
    ggplot() +
    geom_bar(aes(Year, Top_PPG, fill = Pos, text = paste("Player:", Player)), stat = "identity") +
    geom_line(aes(Year, meanPPG, linetype = "Average line"), data = PPGmean, color = "black") +
    ggtitle("Top Scorer by Year") +
    scale_x_continuous(breaks = seq(1950, 2020, 10)) +
    scale_fill_manual("Pos", values = PosColorCode) +
    ylab("Top Points per Game") +
    theme(legend.position='none')
TSplotly <- ggplotly(TS, session="knitr", kwargs=list(filename="TS_knitr", fileopt="overwrite"))
api_create(TSplotly, filename = "AnnualTS")
Found a grid already named: 'AnnualTS Grid'. Since fileopt='overwrite', I'll try to update it
Found a plot already named: 'AnnualTS'. Since fileopt='overwrite', I'll try to update it
AnnualTS


You can hover over the bars to see the details. The pattern clearly shows the Wilt Chamberlain and Michael Jordan eras, followed by a short Durant era.


  • Highest PPG leader in a season: 50.36 by Wilt Chamberlain in 1962
  • Lowest PPG leader in a season: 22.34 by Neil Johnston in 1953
  • Average PPG of the season leaders: 31.24, compared to the overall NBA average PPG of 10.41
  • Most seasons leading the league in PPG: 10 by Michael Jordan



Points per 36 Minutes


The concept of per-minute statistics is a basic building block of statistical analysis in basketball. While looking at per-game averages gives us some information about the productivity of a player, per-minute averages tell us a great deal more about how well, or poorly, that player is actually performing. (source)

As we know, a full-length NBA game lasts 48 minutes, divided into four quarters of 12 minutes each. So why 36? Well, as the argument goes, the average NBA starter plays about 36 minutes (three quarters out of four), and 36 is a more realistic basis for measuring stats because not many players play 40+ minutes, let alone 48.
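
The conversion itself is a simple rescaling: points per minute multiplied by 36. Here is a minimal sketch using a hypothetical helper named per36, fed with two pairs of career PPG and MPG figures from the leaderboard in this post:

```r
# Hypothetical helper: rescale a per-game scoring average to 36 minutes
per36 <- function(points_per_game, minutes_per_game) {
  points_per_game / minutes_per_game * 36
}

# Career PPG and MPG as listed in this post's per-36 leaderboard
jordan_36   <- round(per36(30.12, 38.26), 2)  # Michael Jordan
williams_36 <- round(per36(14.67, 20.45), 2)  # Freeman Williams

print(c(Jordan = jordan_36, Williams = williams_36))
```

A player averaging more than 36 minutes sees the per-36 figure drop below his PPG, while a player with limited minutes sees it rise above.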

PTS36 <- NBA %>%
    group_by(Player) %>%
    summarise(Pos = getmode(Pos),
              ActiveYears = paste(getmode(YearStart), "-", getmode(YearEnd)),
              Games = sum(G),
              MP = sum(MP),
              Points = sum(PTS),
              MPG = round(MP/Games, 2),
              PPG = round(Points/Games, 2),
              PPM36 = round(PPG/MPG * 36, 2)) %>%
    filter(Games > 100) %>%
    arrange(desc(PPM36)) %>%
    mutate(Rank = dense_rank(desc(PPM36))) %>%
    select(Rank, everything(), -c(MP, Points)) %>%
    head(n=20)
PTS36 %>%
    mutate(Pos = cell_spec(Pos,
                            color = "white",
                            align = "c",
                            background = factor(Pos, c("C", "PF", "SF", "SG", "PG"),
                                                PosColorCode))) %>%
    kable(escape = FALSE, caption = "Points per 36 Minutes") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(2, bold = T) %>%
    column_spec(1, bold = T, color = "yellow", background = "#FF0000") %>%
    column_spec(8, bold = T, color = "white", background = "#777777") %>%
    scroll_box(width = "100%", height = "300px")
Points per 36 Minutes
Rank Player Pos ActiveYears Games MPG PPG PPM36
1 Michael Jordan SG 1985 - 2003 1072 38.26 30.12 28.34
2 George Gervin SG 1973 - 1986 791 33.55 26.18 28.09
3 Kevin Durant SF 2008 - 2018 703 37.38 27.20 26.20
4 Freeman Williams SG 1979 - 1986 323 20.45 14.67 25.82
5 John Drew SF 1975 - 1985 739 29.54 20.69 25.21
6 Dominique Wilkins SF 1983 - 1999 1074 35.49 24.83 25.19
7 LeBron James SF 2004 - 2018 1061 38.90 27.13 25.11
8 Kobe Bryant SG 1997 - 2016 1346 36.13 24.99 24.90
9 David Thompson SG 1976 - 1984 509 32.03 22.13 24.87
10 Jerry West PG 1961 - 1974 932 39.24 27.03 24.80
11 Carmelo Anthony SF 2004 - 2018 976 36.20 24.75 24.61
11 Elgin Baylor SF 1959 - 1972 846 40.03 27.36 24.61
12 Shaquille O'Neal C 1993 - 2011 1207 34.73 23.69 24.56
13 Bob Pettit PF 1955 - 1965 792 38.75 26.36 24.49
14 Adrian Dantley SF 1977 - 1991 955 35.76 24.27 24.43
15 Walter Davis SG 1978 - 1992 1033 27.94 18.90 24.35
16 Karl Malone PF 1986 - 2004 1476 37.16 25.02 24.24
17 Alex English SF 1977 - 1991 1193 31.91 21.47 24.22
18 Kareem Abdul-Jabbar C 1970 - 1989 1560 36.82 24.61 24.06
19 Bernard King SF 1978 - 1993 874 33.66 22.49 24.05


For the plot, I’d like to see the discrepancy between points per 36 minutes (PTS36) and points per game (PPG), sorted by PTS36 rank.

PTS36 %>%
    mutate(Player = reorder(Player, desc(PPM36), FUN=median)) %>%
    ggplot(aes(group = 1)) +
    geom_segment(aes(x=Player, xend=Player, y=PPM36, yend=PPG), color="black") +
    geom_point(aes(Player, PPM36, color="#FF5800"), size=5) +
    geom_point(aes(Player, PPG, color="#009dff"), size=3, shape=18) +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    ggtitle("Points per 36 Minutes vs. Points per Game") +
    xlab("") +
    ylab("PTS36 vs. PPG") +
    scale_color_manual(name="", values=c("#009dff", "#FF5800"),
                       labels=c("Points/Game", "Points/36 minutes"),
                       guide = guide_legend(reverse=TRUE)) +
    theme(legend.position="bottom")


This simply shows how well they score and how much playing time their coaches give them in a game. If the blue diamond is inside the orange circle, their minutes per game (MPG) is around 36; if it falls below the circle, their MPG is lower than 36, and vice versa.



Most Points Career


Now we are about to see the most points accumulated by a player throughout his NBA career.

PTSMost <- NBA %>%
    group_by(Player) %>%
    filter(n_distinct(Born) == 1) %>%
    summarise(Pos = getmode(Position),
              ActiveYears = paste(getmode(YearStart), "-", getmode(YearEnd)),
              YearsActive = n_distinct(Year),
              Games = sum(G),
              MP = sum(MP),
              MPG = round(MP/Games, 2),
              Points = as.numeric(sum(PTS)),
              PPG = round(Points/Games, 2)) %>%
    filter(Games > 100) %>%
    arrange(desc(Points)) %>%
    head(n=20) %>%
    mutate(Rank = rank(desc(Points))) %>%
    select(Rank, everything(), -c(MP))
PTSMost %>%
    kable(escape = FALSE, caption = "Most Points Career") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(2, bold = T) %>%
    column_spec(1, bold = T, color = "yellow", background = "#FF0000") %>%
    column_spec(8, bold = T, color = "white", background = "#777777") %>%
    scroll_box(width = "100%", height = "300px")
Most Points Career
Rank Player Pos ActiveYears YearsActive Games MPG Points PPG
1 Kareem Abdul-Jabbar C 1970 - 1989 20 1560 36.82 38387 24.61
2 Karl Malone PF 1986 - 2004 19 1476 37.16 36928 25.02
3 Kobe Bryant SG 1997 - 2016 20 1346 36.13 33643 24.99
4 Michael Jordan SG 1985 - 2003 15 1072 38.26 32292 30.12
5 Wilt Chamberlain C 1960 - 1973 14 1045 45.80 31419 30.07
6 Dirk Nowitzki PF 1999 - 2018 19 1394 34.92 30260 21.71
7 LeBron James SF 2004 - 2018 14 1061 38.90 28787 27.13
8 Shaquille O'Neal C 1993 - 2011 19 1207 34.73 28596 23.69
9 Moses Malone C 1975 - 1995 19 1329 33.91 27409 20.62
10 Elvin Hayes PF 1969 - 1984 16 1303 38.37 27313 20.96
11 Hakeem Olajuwon C 1985 - 2002 18 1238 35.72 26946 21.77
12 Oscar Robertson PG 1961 - 1974 14 1040 42.20 26710 25.68
13 Dominique Wilkins SF 1983 - 1999 15 1074 35.49 26668 24.83
14 Tim Duncan C 1998 - 2016 19 1392 34.03 26496 19.03
15 Paul Pierce SF 1999 - 2017 19 1343 34.16 26397 19.66
16 John Havlicek SF 1963 - 1978 16 1270 36.59 26395 20.78
17 Kevin Garnett PF 1996 - 2016 21 1462 34.49 26071 17.83
18 Alex English SF 1977 - 1991 15 1193 31.91 25613 21.47
19 Reggie Miller SG 1988 - 2005 18 1389 34.28 25279 18.20
20 Jerry West PG 1961 - 1974 14 932 39.24 25192 27.03


Generating plot…

PTSMost %>%
    head(10) %>%
    arrange(Points) %>%
    mutate(Player=factor(Player, Player)) %>%
    ggplot(aes(Player, Points)) +
    geom_segment(aes(x=Player, xend=Player, y=0, yend=Points),
                 color="gray11", linetype = "dotted", size=1, alpha=0.6) +
    geom_point(color="orangered1", size=4) +
    geom_text(aes(label=Points), hjust=0.5, vjust=-1.5, size=3) +
    ggtitle("Most Points Career") +
    coord_flip() +
    xlab("") +
    ylab("Points")



Finding the BIGGEST Top Scorers in the League


Now that we have seen the best scorers in the league in several categories, I’d like to find the biggest name in scoring by combining all the categories from the previous leaderboards:

  1. Top 20 All-Time Points per Game Leaders
  2. Annual Points per Game Leaders
  3. Top 20 Points per 36 Minutes Leaders
  4. Top 20 Most Points Leaders

My arbitrary method for measuring their prominence as scorers uses tokens as the unit of valuation, as follows:

  • Rank 1 earns 10 tokens
  • Rank 2 earns 7 tokens
  • Rank 3 earns 4 tokens
  • Ranks 4 to 10 earn 2 tokens each
  • Ranks 11 to 20 earn 1 token each
  • Each annual PPG title earns 1 token
  • Career PPG must be above 16.0 (the cutoff applied in the code below)

Then we can rank them based on the total tokens they earn.
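
Before running this on the real leaderboards, here is a minimal sketch of the scoring rule as a standalone function. The name rank_to_tokens is a hypothetical helper introduced only for illustration; the ranks plugged in are Michael Jordan’s positions in the tables shown earlier in this post.

```r
# Hypothetical helper: convert a leaderboard rank (1 to 20) into tokens
rank_to_tokens <- function(rank) {
  ifelse(rank == 1, 10,
  ifelse(rank == 2, 7,
  ifelse(rank == 3, 4,
  ifelse(rank <= 10, 2, 1))))
}

# Michael Jordan's ranks in the three top-20 tables of this post
jordan_ranks <- c(all_time_ppg = 1, per_36 = 1, career_points = 4)

# One extra token for each season spent as the annual PPG leader
jordan_titles <- 10

total_tokens <- sum(rank_to_tokens(jordan_ranks)) + jordan_titles
print(total_tokens)
```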

So let’s get started…

# Step 1: Calculating tokens for All-Time Points per Game leaderboard:
PTSValuation1 <- AllTimePPG %>%
    group_by(Player) %>%
    summarise(n = (ifelse(Rank == 1, 10, ifelse(Rank == 2, 7,
                        ifelse(Rank == 3, 4, ifelse(Rank%in%c(4:10), 2, 1)))))) %>%
    mutate(Player = as.character(Player)) %>%
    arrange(desc(n))
# Step 2: Calculating tokens for Annual Points per Game leaders (one token per title):
PTSValuation2 <- TopScorer %>% count(Player) %>% arrange(desc(n))
# Step 3: Calculating tokens for Points per 36 Minutes leaderboard:
PTSValuation3 <- PTS36 %>%
    group_by(Player) %>%
    filter(PPG > 16) %>%
    summarise(n = (ifelse(Rank == 1, 10, ifelse(Rank == 2, 7,
                        ifelse(Rank == 3, 4, ifelse(Rank%in%c(4:10), 2, 1)))))) %>%
    mutate(Player = as.character(Player)) %>%
    arrange(desc(n))
# Step 4: Calculating tokens for Most Points in career leaderboard:
PTSValuation4 <- PTSMost %>%
    group_by(Player) %>%
    summarise(n = (ifelse(Rank == 1, 10, ifelse(Rank == 2, 7, 
                    ifelse(Rank == 3, 4, ifelse(Rank%in%c(4:10), 2, 1)))))) %>%
    mutate(Player = as.character(Player)) %>%
    arrange(desc(n))
# Step 5: Merge the dataframes and calculating total tokens:
AllScorers <- data.frame(Player = c(PTSValuation1$Player,
                                    PTSValuation2$Player,
                                    PTSValuation3$Player,
                                    PTSValuation4$Player),
                         Tokens = c(PTSValuation1$n,
                                    PTSValuation2$n,
                                    PTSValuation3$n,
                                    PTSValuation4$n)) %>%
    group_by(Player) %>%
    summarise(Total = sum(Tokens)) %>%
    arrange(desc(Total))
# Step 6: Display the final table:
AllScorers %>%
    group_by(Player) %>%
    mutate(Pos = getmode(NBA$Position[NBA$Player %in% Player]),
           PPG = round(sum(NBA$PTS[NBA$Player %in% Player])/sum(NBA$G[NBA$Player %in% Player]), 2),
           Tokens = paste(strrep("|", Total))) %>%
    select(-Total,everything()) %>%
    filter(PPG > 16) %>%
    kable(escape = FALSE, caption = "BIGGEST Top Scorers") %>%
    kable_styling(bootstrap_options = "striped", full_width = F, font_size = 11) %>%
    column_spec(1, bold = T) %>%
    column_spec(4, bold = T, color = "gold") %>%
    column_spec(5, bold = T, color = "white", background = "#777777", width = "8px") %>%
    scroll_box(width = "60%", height = "500px") %>%
    kable_styling(position = "float_left")
BIGGEST Top Scorers
Player Pos PPG Tokens Total
Michael Jordan SG 30.12 |||||||||||||||||||||||||||||||| 32
Wilt Chamberlain C 30.07 |||||||||||||||| 16
Kareem Abdul-Jabbar C 24.61 |||||||||||||| 14
George Gervin SG 26.18 |||||||||||| 12
Kevin Durant SF 27.20 |||||||||| 10
Karl Malone PF 25.02 ||||||||| 9
Kobe Bryant SG 24.99 ||||||||| 9
LeBron James SF 27.13 ||||||| 7
Allen Iverson SG 26.66 |||||| 6
Jerry West PG 27.03 |||||| 6
Shaquille O'Neal C 23.69 |||||| 6
Adrian Dantley SF 24.27 ||||| 5
Bob Pettit PF 26.36 ||||| 5
Dominique Wilkins SF 24.83 ||||| 5
Elgin Baylor SF 27.36 ||||| 5
Oscar Robertson PG 25.68 |||| 4
Bob McAdoo C 22.05 ||| 3
Carmelo Anthony SF 24.75 ||| 3
Elvin Hayes PF 20.96 ||| 3
Neil Johnston C 19.42 ||| 3
Alex English SF 21.47 || 2
Bernard King SF 22.49 || 2
David Thompson SG 22.13 || 2
Dirk Nowitzki PF 21.71 || 2
Dwyane Wade SG 23.30 || 2
George Mikan C 22.32 || 2
John Drew SF 20.69 || 2
Moses Malone C 20.62 || 2
Paul Arizin SF 22.81 || 2
Pete Maravich SG 24.24 || 2
Russell Westbrook PG 22.69 || 2
Tracy McGrady SG 19.60 || 2
David Robinson C 21.06 | 1
George Yardley SF 19.20 | 1
Hakeem Olajuwon C 21.77 | 1
John Havlicek SF 20.78 | 1
John Williamson SG 20.15 | 1
Kevin Garnett PF 17.83 | 1
Larry Bird SF 24.29 | 1
Paul Pierce SF 19.66 | 1
Reggie Miller SG 18.20 | 1
Rick Barry SF 23.17 | 1
Stephen Curry PG 22.80 | 1
Tim Duncan C 19.03 | 1
Tiny Archibald PG 18.81 | 1
Walter Davis SG 18.90 | 1


Based on this calculation, Michael Jordan remains the all-time king of scoring in the NBA. Let’s see the names, from the biggest to the smaller ones…

AllScorers %>%
    wordcloud2(size=0.5, color='random-light', backgroundColor="black")



SuperScorers FG-3P-FT Ratio


Now let’s take the top eight scorers (those with 7 or more tokens in the previous valuation), name them SuperScorers, and find out where their points came from.

Bear in mind that the NBA introduced the three-point line in the 1979–80 season, and three of our eight SuperScorers (Chamberlain, Gervin, and Abdul-Jabbar) were already active before then. With that in mind, let us now see how their points break down.
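
As a minimal sketch of why the missing three-point data matters when totals are summed: the vectors below are hypothetical, and the use of NA for seasons before 1979–80 is an assumption about how the tidy set encodes the missing stat (it is suggested by the `na.rm` argument in the code that follows, but not confirmed here).

```r
# Hypothetical per-season three-pointers made; NA marks seasons played before
# the three-point line existed (assumed encoding, not taken from the dataset).
x3p_pre_line <- c(NA, NA, NA)        # career entirely before 1979-80
x3p_straddle <- c(NA, NA, 1, 0, 2)   # career that straddles 1979-80

# Without na.rm, a single NA makes the whole total NA.
pts_naive <- sum(x3p_straddle) * 3

# With na.rm = TRUE, missing seasons are skipped and the rest are summed.
pts_straddle <- sum(x3p_straddle, na.rm = TRUE) * 3

# A career with only missing seasons sums to zero three-point points.
pts_pre_line <- sum(x3p_pre_line, na.rm = TRUE) * 3

print(c(naive = pts_naive, straddle = pts_straddle, pre_line = pts_pre_line))
```

Under this assumption, a player who retired before 1979–80 simply shows 0 points from three-pointers instead of breaking the whole summary.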


Here’s the table…

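# Career totals for the players in the first eight rows of AllScorers
SuperScorers <- NBA %>%
    filter(Player %in% head(AllScorers$Player, 8)) %>%
    group_by(Player) %>%
    summarise(Pos = getmode(Position),
              ActiveYears = paste(getmode(YearStart), "-", getmode(YearEnd)),
              YearsActive = n_distinct(Year),
              Games = sum(G),
              FGx2 = sum(X2P) * 2,                # points from two-point field goals
              FGx3 = sum(X3P, na.rm = TRUE) * 3,  # points from three-pointers, skipping missing seasons
              FT = sum(FT),                       # one point per free throw made
              PTS = sum(PTS),
              PPG = round(PTS/Games, 2)) %>%      # career points per game
    arrange(desc(PPG))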
SuperScorers %>%
    kable(escape = FALSE, caption = "SuperScorers 2Pts-3Pts-FT Ratio") %>%
    kable_styling("striped", full_width = T) %>%
    column_spec(1, bold = T)
SuperScorers 2Pts-3Pts-FT Ratio
Player Pos ActiveYears YearsActive Games FGx2 FGx3 FT PTS PPG
Michael Jordan SG 1985 - 2003 15 1072 23222 1743 7327 32292 30.12
Wilt Chamberlain C 1960 - 1973 14 1045 25362 0 6057 31419 30.07
Kevin Durant SF 2008 - 2018 10 703 10406 3780 4935 19121 27.20
LeBron James SF 2004 - 2018 14 1061 17912 4401 6474 28787 27.13
George Gervin SG 1973 - 1986 10 791 15936 231 4541 20708 26.18
Karl Malone PF 1986 - 2004 19 1476 26886 255 9787 36928 25.02
Kobe Bryant SG 1997 - 2016 20 1346 19784 5481 8378 33643 24.99
Kareem Abdul-Jabbar C 1970 - 1989 20 1560 31672 3 6712 38387 24.61
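
As a quick sanity check on the table, the three components should add up to total points, and PPG should equal PTS divided by Games. The sketch below uses base R and three rows copied from the table above; the data frame name `check` and its extra columns are illustrative only.

```r
# Figures copied from three rows of the SuperScorers table.
check <- data.frame(
  Player = c("Michael Jordan", "Wilt Chamberlain", "Kevin Durant"),
  Games  = c(1072, 1045, 703),
  FGx2   = c(23222, 25362, 10406),
  FGx3   = c(1743, 0, 3780),
  FT     = c(7327, 6057, 4935),
  PTS    = c(32292, 31419, 19121)
)

# Rebuild total points from the three sources and compare with PTS.
check$Rebuilt <- check$FGx2 + check$FGx3 + check$FT
check$Matches <- check$Rebuilt == check$PTS

# Recompute points per game the same way the summarise() call does.
check$PPG <- round(check$PTS / check$Games, 2)

print(check)
```

For these rows the rebuilt totals match PTS exactly, which is what one would expect if every point is either a two-pointer, a three-pointer, or a free throw.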


And let’s see the plot…

SuperScorers %>%
    gather(Parameter, Points, -c(Player:Games, PTS:PPG)) %>%
    select(-c(Pos, Games, PTS)) %>%
    mutate(Player = factor(Player, levels = c("Kareem Abdul-Jabbar", "Kobe Bryant", "Karl Malone", "George Gervin", "LeBron James", "Kevin Durant", "Wilt Chamberlain", "Michael Jordan"))) %>%
    ggplot(aes(Player, Points, fill=forcats::fct_rev(Parameter))) +
    geom_bar( stat="identity", position="fill") +
    ggtitle("Points Breakdown Among SuperScorers") +
    coord_flip() +
    xlab("") +
    ylab("Shooting") +
    guides(fill = guide_legend(reverse = TRUE)) +
    theme(legend.position="bottom") +
    theme(legend.title=element_blank())
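
Because the bars use `position="fill"`, each one is rescaled so that its segments sum to 1, so the plot shows shares of a player's points rather than raw totals. A minimal base R sketch of the same rescaling, using two rows copied from the table above (the object names `parts`, `share_raw`, and `share` are illustrative):

```r
# Points by source for two players, copied from the SuperScorers table.
parts <- matrix(
  c(25362,    0, 6057,
    19784, 5481, 8378),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Wilt Chamberlain", "Kobe Bryant"),
                  c("FGx2", "FGx3", "FT"))
)

# Divide each row by its own total, as position = "fill" does per bar.
share_raw <- prop.table(parts, margin = 1)

# Rounded copy for display.
share <- round(share_raw, 3)
print(share)
```

Each row of `share_raw` sums to 1, which is why every bar in the plot has the same length regardless of career totals.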



Scoring efficiency among the SuperScorers


Let’s take this a step further by comparing their shooting ability with each other and with the NBA average, using a radar plot.

Because missing data (in this case three-point data) would distort the plot, I exclude the three old-timers from the list and proceed with 5 players.
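
The exclusion can be sketched by name rather than by row position. The radar code further down picks players with hard-coded indices such as `RadarScore[7,]`, which works only as long as the summarised rows stay in alphabetical order. The toy table, the `Score` column, and the helper `row_for()` below are hypothetical and only illustrate the idea.

```r
# Toy stand-in for a summarised table (hypothetical scores, not real data).
# Rows are listed alphabetically, as a grouped summary would return them.
toy <- data.frame(
  Player = c("George Gervin", "Kareem Abdul-Jabbar", "Karl Malone",
             "Kevin Durant", "Kobe Bryant", "LeBron James",
             "Michael Jordan", "Wilt Chamberlain"),
  Score = 1:8,
  stringsAsFactors = FALSE
)

# Drop the three players whose careers began before the three-point line.
old_timers <- c("Wilt Chamberlain", "George Gervin", "Kareem Abdul-Jabbar")
modern <- toy[!toy$Player %in% old_timers, ]

# Hypothetical helper: fetch a player's numeric values by name, not by index.
row_for <- function(df, name) {
  vals <- df[df$Player == name, names(df) != "Player", drop = FALSE]
  as.numeric(unlist(vals, use.names = FALSE))
}

print(modern$Player)
print(row_for(modern, "Michael Jordan"))
```

Looking rows up by name keeps the trace labels and the data in sync even if the set of players changes.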

The variables used for comparison:

  • X2Pm: normalized 2-point field goals made per minute
  • X2Pa: normalized 2-point field goal attempts per minute
  • X2P.: normalized 2-point field goal percentage
  • X3Pm: normalized 3-point field goals made per minute
  • X3Pa: normalized 3-point field goal attempts per minute
  • X3P.: normalized 3-point field goal percentage
  • FTm: normalized free throws made per minute
  • FTa: normalized free throw attempts per minute
  • FT.: normalized free throw percentage
  • TS.: normalized true shooting percentage
  • eFG.: normalized effective field goal percentage
RadarScore <- NBA_Scaled %>%
    group_by(Player) %>%
    summarise(X2Pm = mean(X2P_pM),
              X2Pa = mean(X2PA_pM),
              X2P. = mean(X2P., na.rm=T),
              X3Pm = mean(X3P_pM),
              X3Pa = mean(X3PA_pM),
              X3P. = mean(X3P., na.rm=T),
              FTm = mean(FT_pM),
              FTa = mean(FTA_pM),
              FT. = mean(FT., na.rm=T),
              TS. = mean(TS., na.rm=T),
              eFG. = mean(eFG., na.rm=T)) %>%
    filter(Player %in% SuperScorers$Player) %>%
    select(-Player)
radarPPG <- plot_ly(type = 'scatterpolar',
        fill = 'toself',
        mode = 'lines') %>%
    add_trace(r = as.numeric(as.vector(RadarScore[7,])),
              theta = as.character(as.vector(colnames(RadarScore))),
              name = 'Michael Jordan') %>%
    add_trace(r = as.numeric(as.vector(RadarScore[4,])),
              theta = as.character(as.vector(colnames(RadarScore))),
              name = 'Kevin Durant') %>%
    add_trace(r = as.numeric(as.vector(RadarScore[6,])),
              theta = as.character(as.vector(colnames(RadarScore))),
              name = 'LeBron James') %>%
    add_trace(r = as.numeric(as.vector(RadarScore[3,])),
              theta = as.character(as.vector(colnames(RadarScore))),
              name = 'Karl Malone') %>%
    add_trace(r = as.numeric(as.vector(RadarScore[5,])),
              theta = as.character(as.vector(colnames(RadarScore))),
              name = 'Kobe Bryant') %>%
    layout(polar = list(radialaxis = list(visible = T,
                                          range = c(-1.5, 1.5))))
api_create(radarPPG, filename = "radarPPG")
Found a grid already named: 'radarPPG Grid'. Since fileopt='overwrite', I'll try to update it
Found a plot already named: 'radarPPG'. Since fileopt='overwrite', I'll try to update it


radarPPG

click image for interactive plotly graph

How to interpret the plot:

  • Click a player’s name in the legend at the upper right of the graph to show or hide that trace; double-click to isolate it. You can compare the players in any combination you like.
  • “0” on the scale is the average of each variable. Anything below “0” means the player’s performance in that category is below average, and anything above “0” means it is above average.
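
To illustrate why 0 can be read as "average", here is a minimal sketch of z-score standardization with base R’s `scale()`. It assumes the scaled set was standardized in this way, which the tidying step (not shown here) would have to confirm; the raw values are hypothetical.

```r
# Hypothetical raw values of one stat across six players (not real data).
raw <- c(0.10, 0.15, 0.20, 0.30, 0.35, 0.52)

# Standardize: subtract the mean, then divide by the standard deviation.
z <- as.numeric(scale(raw))

# After standardizing, the mean is 0 and the standard deviation is 1,
# so positive values are above average and negative values below it.
print(round(z, 2))
print(c(mean = round(mean(z), 10), sd = sd(z)))
```

On such a scale, a value of 1.5 would mean one and a half standard deviations above the average for that stat.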


  • Michael Jordan leads the group in both the X2Pm and X2Pa categories.
  • Kevin Durant leads the group in all three-point categories (X3Pm, X3Pa, X3P.) as well as FTm, FT., TS., and eFG.; he trails in X2Pa.
  • LeBron James leads the group in X2P. and trails in FTm.
  • Karl Malone leads the group in FTa. As the only post player in the group, he trails in FT. and in all three-point categories (X3Pm, X3Pa, X3P.).
  • Kobe Bryant trails in X2Pm, X2P., FTa, TS., and eFG., which makes him the least efficient scorer in this SuperScorers group.




End of Session


