Loading Data
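# Drop duplicate rows so each Year/Player/Tm combination appears once
nba <- nba %>%
  distinct(Year, Player, Tm, .keep_all = TRUE)
# Count rows per team abbreviation
nba %>%
  group_by(Tm) %>%
  summarise(tm_count = n())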
## # A tibble: 41 × 2
## Tm tm_count
## <chr> <int>
## 1 ATL 721
## 2 BOS 687
## 3 BRK 207
## 4 CHA 176
## 5 CHH 237
## 6 CHI 685
## 7 CHO 133
## 8 CLE 740
## 9 DAL 688
## 10 DEN 686
## # ℹ 31 more rows
Unclear without documentation
Tm - Team
Tm is a simple column that shows which team a player played for during a
season. Without knowledge of the NBA and its current and historical
teams, some of these abbreviations may not make sense. The following
provides a brief description of each abbreviation that appears in the
dataset. It is important to know what these stand for and which years
they were used so that, when grouping by team, you can explain why some
group counts are much lower than others.
Current NBA Teams:
- ATL - Atlanta Hawks (1968-69 to current)
- BOS - Boston Celtics (1946-47 to current)
- BRK - Brooklyn Nets (2012-13 to current)
- CHO - Charlotte Hornets (2014-15 to current)
- CHI - Chicago Bulls (1966-67 to current)
- CLE - Cleveland Cavaliers (1970-71 to current)
- DAL - Dallas Mavericks (1980-81 to current)
- DEN - Denver Nuggets (1976-77 to current)
- DET - Detroit Pistons (1957-58 to current)
- GSW - Golden State Warriors (1971-72 to current)
- HOU - Houston Rockets (1971-72 to current)
- IND - Indiana Pacers (1976-77 to current)
- LAC - Los Angeles Clippers (1984-85 to current)
- LAL - Los Angeles Lakers (1960-61 to current)
- MEM - Memphis Grizzlies (2001-02 to current)
- MIA - Miami Heat (1988-89 to current)
- MIL - Milwaukee Bucks (1968-69 to current)
- MIN - Minnesota Timberwolves (1989-90 to current)
- NOP - New Orleans Pelicans (2013-14 to current)
- NYK - New York Knicks (1946-47 to current)
- OKC - Oklahoma City Thunder (2008-09 to current)
- ORL - Orlando Magic (1989-90 to current)
- PHI - Philadelphia 76ers (1963-64 to current)
- PHO - Phoenix Suns (1968-69 to current)
- POR - Portland Trail Blazers (1970-71 to current)
- SAC - Sacramento Kings (1985-86 to current)
- SAS - San Antonio Spurs (1976-77 to current)
- TOR - Toronto Raptors (1995-96 to current)
- UTA - Utah Jazz (1979-80 to current)
- WAS - Washington Wizards (1997-98 to current)
Historical NBA Teams:
- CHA - Charlotte Bobcats (2004-05 to 2013-14)
- CHH - Charlotte Hornets (1988-89 to 2001-02)
- KCK - Kansas City Kings (1975-76 to 1984-85)
- NJN - New Jersey Nets (1977-78 to 2011-12)
- NOH - New Orleans Hornets (2002-03 to 2004-05 and 2007-08 to
2012-13)
- NOK - New Orleans-Oklahoma City Hornets (2005-06 to 2006-07)
- SDC - San Diego Clippers (1978-79 to 1983-84)
- SEA - Seattle SuperSonics (1967-68 to 2007-08)
- VAN - Vancouver Grizzlies (1995-96 to 2000-01)
- WSB - Washington Bullets (1974-75 to 1996-97)
Other:
- TOT - Total
- Used for players who were traded mid-season. Rows with TOT hold a
player's cumulative stats for the season across all teams that he
played for.
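Because a traded player has a TOT row in addition to one row per team, it helps to decide up front which rows a given summary should use. The following is a minimal sketch on made-up rows, not the real dataset: the player names and the games column `G` are hypothetical, and only `Year`, `Player`, and `Tm` are taken from the code above.

```r
library(dplyr)

# Hypothetical rows: Player A was traded mid-season, Player B was not.
toy <- tibble(
  Year   = c(2019, 2019, 2019, 2019),
  Player = c("Player A", "Player A", "Player A", "Player B"),
  Tm     = c("TOT", "DAL", "NYK", "BOS"),
  G      = c(70, 40, 30, 82)  # assumed games-played column, illustrative values
)

# Team-level view: drop the cumulative rows so nothing is counted twice.
team_rows <- toy %>%
  filter(Tm != "TOT")

# Player-season view: keep the TOT row when a player has one,
# otherwise keep that player's single team row.
season_rows <- toy %>%
  group_by(Year, Player) %>%
  filter(Tm == "TOT" | n() == 1) %>%
  ungroup()
```

With this split, `team_rows` is the safer input for anything grouped by `Tm`, while `season_rows` gives one row per player per season.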
VORP - Value Over Replacement Player
- A box score estimate of the points per 100 TEAM possessions that a
player contributed above a replacement-level (-2.0) player, translated
to an average team and prorated to an 82-game season. Multiply by 2.70
to convert to wins over replacement.
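The conversion in the last sentence is a single multiplication. As a small sketch, assuming the column is named `VORP` as in the heading and using made-up values rather than rows from the dataset (the new column name `WAR` is also just an illustrative choice):

```r
library(dplyr)

# Hypothetical players and VORP values, for illustration only.
vorp_example <- tibble(
  Player = c("Player A", "Player B"),
  VORP   = c(5.0, -0.4)
)

# Multiply VORP by 2.70 to express it as wins over replacement.
vorp_example <- vorp_example %>%
  mutate(WAR = VORP * 2.70)
```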
OWS, DWS, WS, and WS/48
- OWS - Offensive Win Shares
- Offensive Win Shares are credited to players based on Dean Oliver’s
points produced and offensive possessions. The formulas are quite
detailed, so I would point you to Oliver’s book Basketball on Paper for
complete details.
- Calculate points produced for each player. Based on
formula from Basketball on Paper mentioned above.
- Calculate offensive possessions for each player.
Based on formula from Basketball on Paper mentioned above.
- Calculate marginal offense for each player.
Marginal offense is equal to (points produced) - 0.92 * (league points
per possession) * (offensive possessions). Note that this formula may
produce a negative result for some players.
- Calculate marginal points per win. Marginal points
per win reduces to 0.32 * (league points per game) * ((team pace) /
(league pace)).
- Credit Offensive Win Shares to the players.
Offensive Win Shares are credited using the following formula: (marginal
offense) / (marginal points per win).
- DWS - Defensive Win Shares
- Crediting Defensive Win Shares to players is based on Dean Oliver’s
Defensive Rating. Defensive Rating is an estimate of the player’s points
allowed per 100 defensive possessions. Once again using formulas from
Dean Oliver’s Basketball on Paper.
- Calculate the Defensive Rating for each player. For
example, LeBron James’s Defensive Rating in 2008-09 was 99.1.
- Calculate marginal defense for each player.
Marginal defense is equal to (player minutes played / team minutes
played) * (team defensive possessions) * (1.08 * (league points per
possession) - ((Defensive Rating) / 100)). Note that this formula may
produce a negative result for some players.
- Calculate marginal points per win. Marginal points
per win reduces to 0.32 * (league points per game) * ((team pace) /
(league pace)).
- Credit Defensive Win Shares to the players.
Defensive Win Shares are credited using the following formula: (marginal
defense) / (marginal points per win).
- WS - Win Shares
- WS/48 - Win Shares per 48 minutes
- An estimate of the number of wins contributed by the player per 48
minutes
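The arithmetic in the OWS and DWS steps can be sketched as a few small R functions. This is an illustration of the formulas as written above, not Basketball Reference's implementation: the function and argument names are made up, points produced, offensive possessions, and Defensive Rating are taken as inputs because their formulas live in Basketball on Paper, and the WS/48 helper assumes the stat is simply win shares scaled to 48 minutes of playing time.

```r
# Marginal points per win: 0.32 * (league points per game) * (team pace / league pace)
marginal_points_per_win <- function(league_ppg, team_pace, league_pace) {
  0.32 * league_ppg * (team_pace / league_pace)
}

# Offensive Win Shares: (marginal offense) / (marginal points per win)
offensive_win_shares <- function(points_produced, off_poss, league_ppp,
                                 league_ppg, team_pace, league_pace) {
  # Marginal offense can be negative for some players.
  marginal_offense <- points_produced - 0.92 * league_ppp * off_poss
  marginal_offense / marginal_points_per_win(league_ppg, team_pace, league_pace)
}

# Defensive Win Shares: (marginal defense) / (marginal points per win)
defensive_win_shares <- function(player_mp, team_mp, team_def_poss, def_rating,
                                 league_ppp, league_ppg, team_pace, league_pace) {
  # Marginal defense can also be negative.
  marginal_defense <- (player_mp / team_mp) * team_def_poss *
    (1.08 * league_ppp - def_rating / 100)
  marginal_defense / marginal_points_per_win(league_ppg, team_pace, league_pace)
}

# Assumed reading of WS/48: win shares scaled to 48 minutes played.
win_shares_per_48 <- function(ws, mp) {
  48 * ws / mp
}

# Made-up inputs, for illustration only.
ows <- offensive_win_shares(points_produced = 2000, off_poss = 1800,
                            league_ppp = 1.10, league_ppg = 100,
                            team_pace = 95, league_pace = 95)
dws <- defensive_win_shares(player_mp = 3000, team_mp = 19680,
                            team_def_poss = 7500, def_rating = 105,
                            league_ppp = 1.10, league_ppg = 100,
                            team_pace = 95, league_pace = 95)
ws48 <- win_shares_per_48(ws = ows + dws, mp = 3000)
```

The last line treats total Win Shares as the sum of the offensive and defensive components, which is an assumption made for this example.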
Unclear after reading documentation
3PAr - 3 Point FG Attempt Rate
- This stat is not recorded in the documentation provided by
Basketball Reference. To figure out what it means, I tested a
couple of candidate formulas against the recorded value.
- The first was 3PA/(2PA+3PA+FTA)
nba %>%
  filter(Player == "LeBron James", Year == 2019) %>%
  select(Year, Player, `2PA`, FTA, `3PA`, `3PAr`) %>%
  mutate(test = round(`3PA` / (`3PA` + FTA + `2PA`), digits = 3)) %>%
  select(Year, Player, `3PAr`, test)
## # A tibble: 1 × 4
## Year Player `3PAr` test
## <dbl> <chr> <dbl> <dbl>
## 1 2019 LeBron James 0.299 0.216
- The second was 3PA/(2PA+3PA)
nba %>%
  filter(Player == "LeBron James", Year == 2019) %>%
  select(Year, Player, `2PA`, `3PA`, `3PAr`) %>%
  mutate(test = round(`3PA` / (`3PA` + `2PA`), digits = 3)) %>%
  select(Year, Player, `3PAr`, test)
## # A tibble: 1 × 4
## Year Player `3PAr` test
## <dbl> <chr> <dbl> <dbl>
## 1 2019 LeBron James 0.299 0.299
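- As the output shows, 3PA/(2PA+3PA) reproduces the recorded 3PAr for
this row, so 3PAr is the share of a player's field goal attempts that
were three-pointers.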
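A single player-season is a thin basis for the conclusion, so the same comparison can be run over every row at once. The sketch below uses made-up rows with the same column names (`2PA`, `3PA`, `3PAr`) so that it runs on its own; swapping `toy` for `nba` would apply it to the real data. The 0.0015 tolerance is an assumption meant to absorb rounding in the recorded values.

```r
library(dplyr)

# Hypothetical rows using the dataset's column names; values are illustrative.
toy <- tibble(
  Player = c("Player A", "Player B", "Player C"),
  `2PA`  = c(700, 300, 0),
  `3PA`  = c(300, 100, 0),
  `3PAr` = c(0.300, 0.250, NA)
)

# Rows where the recorded 3PAr disagrees with 3PA / (2PA + 3PA).
# Players with no attempts have a missing 3PAr and are skipped.
mismatches <- toy %>%
  mutate(test = round(`3PA` / (`3PA` + `2PA`), digits = 3)) %>%
  filter(!is.na(`3PAr`), abs(`3PAr` - test) > 0.0015)
```

An empty `mismatches` table would support the formula for every row checked; any rows that remain are the ones worth inspecting by hand.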
Visualization
# Default every abbreviation to grey.
n_abbreviations <- length(unique(nba$Tm))
palette1 <- rep("darkgrey", times = n_abbreviations)
palette1_named <- setNames(object = palette1, nm = unique(nba$Tm))
# Give abbreviations that belong to the same franchise a shared color.
palette1_named[c("SEA", "OKC")] <- "orange"
palette1_named[c("BRK", "NJN")] <- "black"
palette1_named[c("CHO", "CHA", "CHH")] <- "lightblue"
palette1_named[c("MEM", "VAN")] <- "yellow"
palette1_named[c("NOP", "NOH", "NOK")] <- "gold"
palette1_named[c("LAC", "SDC")] <- "red"
palette1_named[c("WAS", "WSB")] <- "blue"
palette1_named[c("SAC", "KCK")] <- "purple"
# TOT is not a team; it marks cumulative rows for traded players.
palette1_named["TOT"] <- "pink"
# Bar chart of row counts per abbreviation, filled with the palette above.
nba %>%
  ggplot() +
  geom_bar(mapping = aes(x = Tm, fill = Tm)) +
  scale_fill_manual(values = palette1_named) +
  theme(axis.text.x = element_text(angle = 60, vjust = 0.5))

# Point out abbreviations that should go together, e.g. SEA/OKC
This bar plot shows the row count for each team abbreviation. The
non-colored (grey) teams all hover around the 750 mark, which is
expected because this count should be close to uniform across the
league. The teams with counts significantly below this level turn out
to have used more than one abbreviation over the years the data
covers, so their rows are split across several bars. If this is not
accounted for, the risk shows up when computing team averages. The
results would contain more groups than there are current NBA teams,
and the numbers would be inaccurate because each group would hold only
part of a franchise's data.
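One way to account for this is to map every historical abbreviation to its current one before grouping. The sketch below is one possible approach, not part of the original analysis: the lookup follows the color groupings used for the plot, the column name `Franchise` is a made-up choice, and the rows are toy data so the example runs on its own.

```r
library(dplyr)

# Historical abbreviation -> current abbreviation, following the plot's color groups.
current_abbr <- c(
  SEA = "OKC", NJN = "BRK", CHA = "CHO", CHH = "CHO", VAN = "MEM",
  NOH = "NOP", NOK = "NOP", SDC = "LAC", WSB = "WAS", KCK = "SAC"
)

# Hypothetical rows standing in for the Tm column of the dataset.
toy <- tibble(Tm = c("SEA", "OKC", "NJN", "BRK", "TOT", "ATL"))

franchise_counts <- toy %>%
  # TOT is a cumulative row for traded players, not a team.
  filter(Tm != "TOT") %>%
  # Look up the current abbreviation; keep Tm when there is no historical match.
  mutate(Franchise = coalesce(unname(current_abbr[Tm]), Tm)) %>%
  count(Franchise, name = "tm_count")
```

Grouping by `Franchise` instead of `Tm` then yields one group per current team, which is the shape needed for team averages.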