Loading Data

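library(dplyr)

# Keep one row per player, season, and team; .keep_all = TRUE retains the other columns
nba <- nba %>%
  distinct(Year, Player, Tm, .keep_all = TRUE)

# Number of rows recorded under each team abbreviation
nba %>%
  group_by(Tm) %>%
  summarise(tm_count = n())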
## # A tibble: 41 × 2
##    Tm    tm_count
##    <chr>    <int>
##  1 ATL        721
##  2 BOS        687
##  3 BRK        207
##  4 CHA        176
##  5 CHH        237
##  6 CHI        685
##  7 CHO        133
##  8 CLE        740
##  9 DAL        688
## 10 DEN        686
## # ℹ 31 more rows

Unclear without documentation

  1. Tm - Team

    Tm is a simple column that shows which team a player played for during a season. Without some knowledge of the NBA and its current and historical franchises, some of these abbreviations may not make sense. The following gives a brief description of each abbreviation that appears in the dataset. It is important to know what these stand for and which seasons they were used in, so that when grouping by team you can explain why some group counts are much lower than others.

    Current NBA Teams:

    • ATL - Atlanta Hawks (1968-69 to current)
    • BOS - Boston Celtics (1946-47 to current)
    • BRK - Brooklyn Nets (2012-13 to current)
    • CHO - Charlotte Hornets (2014-15 to current)
    • CHI - Chicago Bulls (1966-67 to current)
    • CLE - Cleveland Cavaliers (1970-71 to current)
    • DAL - Dallas Mavericks (1980-81 to current)
    • DEN - Denver Nuggets (1976-77 to current)
    • DET - Detroit Pistons (1957-58 to current)
    • GSW - Golden State Warriors (1971-72 to current)
    • HOU - Houston Rockets (1971-72 to current)
    • IND - Indiana Pacers (1976-77 to current)
    • LAC - Los Angeles Clippers (1984-85 to current)
    • LAL - Los Angeles Lakers (1960-61 to current)
    • MEM - Memphis Grizzlies (2001-02 to current)
    • MIA - Miami Heat (1988-89 to current)
    • MIL - Milwaukee Bucks (1968-69 to current)
    • MIN - Minnesota Timberwolves (1989-90 to current)
    • NOP - New Orleans Pelicans (2013-14 to current)
    • NYK - New York Knicks (1946-47 to current)
    • OKC - Oklahoma City Thunder (2008-09 to current)
    • ORL - Orlando Magic (1989-90 to current)
    • PHI - Philadelphia 76ers (1963-64 to current)
    • PHO - Phoenix Suns (1968-69 to current)
    • POR - Portland Trail Blazers (1970-71 to current)
    • SAC - Sacramento Kings (1985-86 to current)
    • SAS - San Antonio Spurs (1976-77 to current)
    • TOR - Toronto Raptors (1995-96 to current)
    • UTA - Utah Jazz (1979-80 to current)
    • WAS - Washington Wizards (1997-98 to current)

    Historical NBA Teams:

    • CHA - Charlotte Bobcats (2004-05 to 2013-14)
    • CHH - Charlotte Hornets (1988-89 to 2001-02)
    • KCK - Kansas City Kings (1975-76 to 1984-85)
    • NJN - New Jersey Nets (1977-78 to 2011-12)
    • NOH - New Orleans Hornets (2002-03 to 2004-05 and 2007-08 to 2012-13)
    • NOK - New Orleans-Oklahoma City Hornets (2005-06 to 2006-07)
    • SDC - San Diego Clippers (1978-79 to 1983-84)
    • SEA - Seattle SuperSonics (1967-68 to 2007-08)
    • VAN - Vancouver Grizzlies (1995-96 to 2000-01)
    • WSB - Washington Bullets (1974-75 to 1996-97)

    Other:

    • TOT - Total
      • Used for players who were traded mid-season: TOT rows hold a player's cumulative stats for that season across all of the teams he played for.
  2. VORP - Value Over Replacement Player

    • A box score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level (-2.0) player, translated to an average team and prorated to an 82-game season. Multiply by 2.70 to convert to wins over replacement.
  3. OWS, DWS, WS, and WS/48

    • OWS - Offensive Win Shares
      • Offensive Win Shares are credited to players based on Dean Oliver’s points produced and offensive possessions. The formulas are quite detailed, so I would point you to Oliver’s book Basketball on Paper for complete details.
      • Calculate points produced for each player, using the formula from Basketball on Paper.
      • Calculate offensive possessions for each player, using the formula from Basketball on Paper.
      • Calculate marginal offense for each player. Marginal offense is equal to (points produced) - 0.92 * (league points per possession) * (offensive possessions). Note that this formula may produce a negative result for some players.
      • Calculate marginal points per win. Marginal points per win reduces to 0.32 * (league points per game) * ((team pace) / (league pace)).
      • Credit Offensive Win Shares to the players. Offensive Win Shares are credited using the following formula: (marginal offense) / (marginal points per win).
    • DWS - Defensive Win Shares
      • Crediting Defensive Win Shares to players is based on Dean Oliver's Defensive Rating, an estimate of the player's points allowed per 100 defensive possessions. The formulas again come from Dean Oliver's Basketball on Paper.
      • Calculate the Defensive Rating for each player. (Basketball Reference's write-up uses LeBron James as its running example; his Defensive Rating in 2008-09 was 99.1.)
      • Calculate marginal defense for each player. Marginal defense is equal to (player minutes played / team minutes played) * (team defensive possessions) * (1.08 * (league points per possession) - ((Defensive Rating) / 100)). Note that this formula may produce a negative result for some players.
      • Calculate marginal points per win. Marginal points per win reduces to 0.32 * (league points per game) * ((team pace) / (league pace)).
      • Credit Defensive Win Shares to the players. Defensive Win Shares are credited using the following formula: (marginal defense) / (marginal points per win).
    • WS - Win Shares
      • OWS + DWS
    • WS/48 - Win Shares per 48 minutes
      • An estimate of the number of wins contributed by the player per 48 minutes
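The final steps for VORP and win shares are plain arithmetic once the upstream quantities are known, so they can be sketched in R. This is a minimal sketch, assuming points produced, offensive possessions, Defensive Rating, and the team and league rates have already been computed with the Basketball on Paper formulas, which are not reproduced here. The function names and the example inputs are hypothetical and only exercise the formulas; they are not real player data. WS/48 is taken here as (OWS + DWS) * 48 / minutes played.

```r
# Hypothetical helpers for the final, purely arithmetic steps described above.
# Upstream inputs (points produced, possessions, Defensive Rating) are assumed precomputed.

# Marginal points per win = 0.32 * lgPPG * (team pace / league pace)
marginal_points_per_win <- function(lg_pts_per_game, team_pace, lg_pace) {
  0.32 * lg_pts_per_game * (team_pace / lg_pace)
}

# OWS = (points produced - 0.92 * lgPPP * offensive possessions) / marginal points per win
offensive_win_shares <- function(pts_produced, off_poss, lg_pts_per_poss,
                                 lg_pts_per_game, team_pace, lg_pace) {
  marginal_offense <- pts_produced - 0.92 * lg_pts_per_poss * off_poss
  marginal_offense / marginal_points_per_win(lg_pts_per_game, team_pace, lg_pace)
}

# DWS = (MP / team MP) * team def. possessions * (1.08 * lgPPP - DRtg / 100),
# divided by marginal points per win
defensive_win_shares <- function(mp, team_mp, team_def_poss, lg_pts_per_poss,
                                 drtg, lg_pts_per_game, team_pace, lg_pace) {
  marginal_defense <- (mp / team_mp) * team_def_poss *
    (1.08 * lg_pts_per_poss - drtg / 100)
  marginal_defense / marginal_points_per_win(lg_pts_per_game, team_pace, lg_pace)
}

# WS = OWS + DWS, rescaled to a per-48-minute rate
win_shares_per_48 <- function(ows, dws, mp) {
  (ows + dws) * 48 / mp
}

# Wins over replacement = VORP * 2.70
vorp_to_wins <- function(vorp) {
  vorp * 2.70
}

# Illustrative inputs only (not real player data)
ows <- offensive_win_shares(pts_produced = 1500, off_poss = 1400,
                            lg_pts_per_poss = 1.0, lg_pts_per_game = 100,
                            team_pace = 95, lg_pace = 95)
dws <- defensive_win_shares(mp = 1968, team_mp = 19680, team_def_poss = 8000,
                            lg_pts_per_poss = 1.0, drtg = 100,
                            lg_pts_per_game = 100, team_pace = 95, lg_pace = 95)
ws48 <- win_shares_per_48(ows, dws, mp = 1968)
ows    # approximately 6.625
dws    # approximately 2
ws48   # about 0.21
```

Note that both marginal offense and marginal defense can be negative, so these helpers can return negative win shares, as the steps above warn.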

Unclear after reading documentation

  1. 3PAr - 3-Point Field Goal Attempt Rate

    • This stat is not described in the documentation provided by Basketball Reference, so to work out what it measures I tested two candidate formulas against the reported value for a single player-season.
    • The first was 3PA/(2PA+3PA+FTA):
    nba %>%
      filter(Player == "LeBron James", Year == 2019) %>%
      select(Year, Player, `2PA`, FTA, `3PA`, `3PAr`) %>%
      mutate(test = round(`3PA`/(`3PA`+FTA+`2PA`),digits = 3)) %>%
      select(Year, Player, `3PAr`, test)
    ## # A tibble: 1 × 4
    ##    Year Player       `3PAr`  test
    ##   <dbl> <chr>         <dbl> <dbl>
    ## 1  2019 LeBron James  0.299 0.216
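    • This returns 0.216 rather than the reported 0.299, so free throw attempts are not part of the denominator.
    • The second was 3PA/(2PA+3PA):
    nba %>%
      filter(Player == "LeBron James", Year == 2019) %>%
      select(Year, Player, `2PA`, `3PA`, `3PAr`) %>%
      mutate(test = round(`3PA`/(`3PA`+`2PA`), digits = 3)) %>%
      select(Year, Player, `3PAr`, test)
    ## # A tibble: 1 × 4
    ##    Year Player       `3PAr`  test
    ##   <dbl> <chr>         <dbl> <dbl>
    ## 1  2019 LeBron James  0.299 0.299
    • This matches, so 3PAr is the share of a player's field goal attempts that are three-point attempts: 3PA/FGA, where FGA = 2PA + 3PA.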
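The LeBron James comparison checks a single player-season. A minimal sketch of running the same check on every row is below; it assumes the columns are named `2PA`, `3PA`, and `3PAr` as in `nba`. The helper name `check_3par()` and the example rows are hypothetical. Because 3PAr is reported to three decimals, the sketch compares the unrounded ratio within half a unit in the last place instead of rounding, and it skips rows with no field goal attempts, where the ratio is undefined.

```r
# Hypothetical helper: share of rows where 3PA / (2PA + 3PA) reproduces the
# reported 3PAr to within the 3-decimal rounding of the reported value.
check_3par <- function(df, tol = 5e-4) {
  fga <- df[["2PA"]] + df[["3PA"]]
  ratio <- df[["3PA"]] / fga
  # Skip rows with no attempts (0/0) or a missing reported 3PAr
  ok <- !is.na(fga) & fga > 0 & !is.na(df[["3PAr"]])
  mean(abs(ratio[ok] - df[["3PAr"]][ok]) <= tol + 1e-12)
}

# Illustrative rows only (not real player data); the last row has no attempts
example_rows <- data.frame(
  `2PA`  = c(700, 500, 0),
  `3PA`  = c(300, 100, 0),
  `3PAr` = c(0.300, 0.167, NA),
  check.names = FALSE
)
check_3par(example_rows)  # 1: both comparable rows match
```

Applied to the full tibble as `check_3par(nba)`, a result of 1 would mean the formula reproduces 3PAr for every player-season with at least one field goal attempt; anything lower points to the rows worth inspecting.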

Visualization

library(ggplot2)

# Every abbreviation starts grey; abbreviations that belong to the same
# franchise share a color so their bars can be compared
n_abbreviations <- length(unique(nba$Tm))
palette1 <- rep('darkgrey', times = n_abbreviations)
palette1_named <- setNames(object = palette1, nm = unique(nba$Tm))
palette1_named[c('SEA', 'OKC')] <- 'orange'
palette1_named[c('BRK', 'NJN')] <- 'black'
palette1_named[c('CHO', 'CHA', 'CHH')] <- 'lightblue'
palette1_named[c('MEM', 'VAN')] <- 'yellow'
palette1_named[c('NOP', 'NOH', 'NOK')] <- 'gold'
palette1_named[c('LAC', 'SDC')] <- 'red'
palette1_named[c('WAS', 'WSB')] <- 'blue'
palette1_named[c('SAC', 'KCK')] <- 'purple'
palette1_named['TOT'] <- 'pink'  # multi-team season totals, not a franchise
nba %>%
  ggplot() +
  geom_bar(mapping = aes(x=Tm, fill = Tm)) +
  scale_fill_manual(values = palette1_named) +
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5))

# Colored bars mark abbreviations that belong to the same franchise, e.g. SEA/OKC

This bar plot shows the number of rows recorded under each team abbreviation. The grey bars generally sit around the 700 to 750 mark (e.g. ATL 721, BOS 687, CLE 740 in the counts above), which is expected: every team carries a roster of similar size each season, so the counts should be close to uniform across the league. The colored bars that fall well below this level belong to franchises that used different abbreviations over the years covered by the data, such as SEA and OKC, so each abbreviation holds only part of that franchise's history. If this is not accounted for, it will distort team averages: grouping by Tm returns more groups than there are current NBA teams, and each relocated or renamed franchise is split across several groups, none of which reflects its full history.
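One way to handle this before computing team averages is to map each historical abbreviation to its current franchise and to drop the TOT rows, which would otherwise count traded players twice. The sketch below assumes the same franchise groupings as the plot colors (for example, CHA and CHH are both grouped with CHO); the lookup vector `franchise_map` and the helpers `to_franchise()` and `add_franchise()` are hypothetical, not part of the dataset.

```r
# Hypothetical lookup from historical abbreviation to current franchise,
# following the color groupings used in the plot above.
franchise_map <- c(
  SEA = "OKC", NJN = "BRK", CHA = "CHO", CHH = "CHO", VAN = "MEM",
  NOH = "NOP", NOK = "NOP", SDC = "LAC", WSB = "WAS", KCK = "SAC"
)

# Replace historical abbreviations; anything not in the lookup
# (i.e. current abbreviations) is returned unchanged.
to_franchise <- function(tm) {
  tm <- as.character(tm)  # guard against factor columns
  mapped <- unname(franchise_map[tm])
  ifelse(is.na(mapped), tm, mapped)
}

# Drop TOT rows (multi-team season totals), then add a Franchise column
# to group by instead of Tm.
add_franchise <- function(df) {
  df <- df[!(df$Tm %in% "TOT"), , drop = FALSE]
  df$Franchise <- to_franchise(df$Tm)
  df
}

to_franchise(c("SEA", "OKC", "KCK", "ATL"))  # "OKC" "OKC" "SAC" "ATL"
```

With the deduplicated data, something like `nba %>% add_franchise() %>% count(Franchise)` should then give 30 groups, one per current team. Whether a relocated franchise's history belongs with its current team is an analysis choice, so the lookup is kept as a plain named vector that is easy to edit.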