The RAPTOR data is a set of proprietary advanced metrics that rate NBA players based on “plus-minus,” a team’s point differential while a given player is on the floor. 538 bills RAPTOR as a way for NBA fans to evaluate players more like teams do: by their contributions to team victories as much as, or more than, by individual statistics such as points scored or shots blocked. As data literacy among journalists and viewers has increased in recent years, such metrics have grown in importance for player awards and public perception of skill.
How RAPTOR works: https://fivethirtyeight.com/features/how-our-raptor-metric-works/
Bringing in tidyverse for analysis
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Pulling in RAPTOR data from CSV
nba_raptor <- read.csv(file = 'modern_RAPTOR_by_player.csv')
head(nba_raptor)
## player_name player_id season poss mp raptor_box_offense
## 1 Alex Abrines abrinal01 2017 2387 1135 0.745505
## 2 Alex Abrines abrinal01 2018 2546 1244 0.317549
## 3 Alex Abrines abrinal01 2019 1279 588 -3.215683
## 4 Precious Achiuwa achiupr01 2021 1581 749 -4.122966
## 5 Precious Achiuwa achiupr01 2022 3802 1892 -2.521510
## 6 Quincy Acy acyqu01 2014 1716 847 -1.716079
## raptor_box_defense raptor_box_total raptor_onoff_offense raptor_onoff_defense
## 1 -0.3729378 0.3725672 -0.4185528 -3.85701115
## 2 -1.7253253 -1.4077762 -1.2917271 -0.04969363
## 3 1.0783985 -2.1372847 -6.1588565 4.90116827
## 4 1.3592780 -2.7636875 -4.0507790 -0.91971189
## 5 1.7635023 -0.7580077 -1.6878926 3.10344083
## 6 0.1331150 -1.5829641 -0.3248108 -1.66149523
## raptor_onoff_total raptor_offense raptor_defense raptor_total war_total
## 1 -4.275564 0.54342123 -1.1448318 -0.6014106 1.2490077
## 2 -1.341421 -0.02082588 -1.5026417 -1.5234676 0.7773045
## 3 -1.257688 -4.04015659 1.8856184 -2.1545382 0.1781673
## 4 -4.970491 -4.34759588 0.9548211 -3.3927748 -0.2460551
## 5 1.415548 -2.51737239 2.1441514 -0.3732210 2.2626580
## 6 -1.986306 -1.56525715 -0.2164768 -1.7817339 0.4159181
## war_reg_season war_playoffs predator_offense predator_defense predator_total
## 1 1.4477081 -0.1987004440 0.07710201 -1.0386773 -0.9615753
## 2 0.4659122 0.3113923090 -0.17462117 -1.1126254 -1.2872466
## 3 0.1781673 0.0000000000 -4.57767760 1.5432817 -3.0343959
## 4 -0.2467764 0.0007213383 -3.81771271 0.4748280 -3.3428847
## 5 2.3096110 -0.0469529228 -2.48395630 2.0243602 -0.4595961
## 6 0.4159181 0.0000000000 -1.46441703 -0.2237543 -1.6881713
## pace_impact
## 1 0.3264127
## 2 -0.4561412
## 3 -0.2680131
## 4 0.3291573
## 5 -0.7286095
## 6 -0.5548977
Limit the data to the 2022 season and to players who played at least 1,000 minutes in that season. Restricting to a single season eliminates duplicate players (each row is a “player season,” so a player can appear in several rows), and the minutes threshold keeps only players who saw a meaningful amount of playing time. 538 itself uses the 1,000-minute standard in some of its graphics.
df22 <- subset(nba_raptor, season == 2022 & mp >= 1000)
head(df22)
## player_name player_id season poss mp raptor_box_offense
## 5 Precious Achiuwa achiupr01 2022 3802 1892 -2.5215099
## 25 Steven Adams adamsst01 2022 4392 2113 0.4706397
## 30 Bam Adebayo adebaba01 2022 4893 2439 -0.3994730
## 59 LaMarcus Aldridge aldrila01 2022 2205 1050 -0.5158208
## 64 Nickeil Alexander-Walker alexani01 2022 3037 1471 -1.5011287
## 70 Grayson Allen allengr01 2022 4468 2110 0.4438660
## raptor_box_defense raptor_box_total raptor_onoff_offense
## 5 1.7635023 -0.7580077 -1.687893
## 25 1.3494540 1.8200937 6.291397
## 30 3.7219120 3.3224390 1.775652
## 59 2.2454400 1.7296192 1.093065
## 64 -1.2098121 -2.7109408 -2.399584
## 70 -0.3152036 0.1286624 -0.683337
## raptor_onoff_defense raptor_onoff_total raptor_offense raptor_defense
## 5 3.1034408 1.4155482 -2.517372390 2.14415141
## 25 -3.1653182 3.1260786 1.685480150 0.45034847
## 30 3.4814564 5.2571083 0.007084622 3.84834228
## 59 -0.4621391 0.6309262 -0.241157321 1.83076823
## 64 -0.9571289 -3.3567124 -1.773719916 -1.22306792
## 70 1.1312156 0.4478786 0.214310753 -0.06938761
## raptor_total war_total war_reg_season war_playoffs predator_offense
## 5 -0.3732210 2.2626580 2.3096110 -0.04695292 -2.48395630
## 25 2.1358286 5.2635871 5.1405873 0.12299984 1.37333950
## 30 3.8554269 8.2752185 6.1026294 2.17258916 0.30549220
## 59 1.5896109 2.3195180 2.3195180 0.00000000 0.29086793
## 64 -2.9967878 -0.1984665 -0.2948744 0.09640794 -0.74198252
## 70 0.1449231 3.1226038 2.8661344 0.25646938 0.02142403
## predator_defense predator_total pace_impact
## 5 2.0243602 -0.4595961 -0.72860950
## 25 1.5263545 2.8996940 -0.24242976
## 30 3.4100854 3.7155776 -0.11814838
## 59 0.8445189 1.1353869 -0.22604974
## 64 -1.3130290 -2.0550115 0.68626303
## 70 -0.2913752 -0.2699512 0.02209729
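Since tidyverse is already loaded, the same filter could also be written with dplyr. The following is a minimal sketch on a toy data frame; toy_raptor and its values are invented for illustration and are not taken from the RAPTOR file.

```r
library(dplyr)

# Toy stand-in for nba_raptor; every value here is invented for illustration.
toy_raptor <- data.frame(
  player_id = c("aaa01", "aaa01", "bbb01", "ccc01"),
  season    = c(2021, 2022, 2022, 2022),
  mp        = c(1500, 1800, 400, 1000),
  stringsAsFactors = FALSE
)

# Same condition as subset(x, season == 2022 & mp >= 1000):
# filter() combines comma-separated conditions with AND.
toy22 <- toy_raptor %>%
  filter(season == 2022, mp >= 1000)

toy22$player_id
```

Only the 2022 rows with at least 1,000 minutes survive, so the player who also has a 2021 row now appears once.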
Check that no player appears more than once after filtering.
length(df22$player_id) == length(unique(df22$player_id))
## [1] TRUE
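If the check above ever returned FALSE, it would help to know which IDs repeat. A small base R sketch, using an invented vector of IDs rather than the real data:

```r
# Toy player IDs (invented) with one deliberate repeat.
ids <- c("aaa01", "bbb01", "aaa01", "ccc01")

# anyDuplicated() returns 0 when every value is unique,
# otherwise the index of the first value that repeats an earlier one.
first_dup_index <- anyDuplicated(ids)

# duplicated() flags each repeat; unique() collapses them to one entry per ID.
repeated_ids <- unique(ids[duplicated(ids)])

first_dup_index
repeated_ids
```

On the real data frame, the same calls would be applied to df22$player_id.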
Eliminate unnecessary columns, including:
season: unnecessary since it's all 2022 now
player_id: possibly useful for a future analysis, but not intuitive for identifying which player a row refers to
most of the RAPTOR metrics other than raptor_total. The other metrics are interesting, but would overcrowd an analysis of overall RAPTOR
df22 <- subset(df22, select = c("player_name","poss","mp","raptor_total"))
head(df22)
## player_name poss mp raptor_total
## 5 Precious Achiuwa 3802 1892 -0.3732210
## 25 Steven Adams 4392 2113 2.1358286
## 30 Bam Adebayo 4893 2439 3.8554269
## 59 LaMarcus Aldridge 2205 1050 1.5896109
## 64 Nickeil Alexander-Walker 3037 1471 -2.9967878
## 70 Grayson Allen 4468 2110 0.1449231
Make column names more intuitive, and sort by raptor_total in descending order
colnames(df22) <- c("player_name","possessions_played","minutes_played","raptor_total")
df22 <- df22[order(df22$raptor_total, decreasing = TRUE), ]
head(df22)
## player_name possessions_played minutes_played raptor_total
## 2305 Nikola Jokic 5481 2647 14.572411
## 142 Giannis Antetokounmpo 5669 2652 8.066856
## 1291 Joel Embiid 5377 2682 7.775600
## 1565 Rudy Gobert 4762 2317 6.886323
## 1034 Stephen Curry 6218 2975 6.795283
## 1165 Luka Doncic 5683 2853 6.382791
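Assigning all column names at once depends on the columns being in the expected order. As a hedged alternative, the sketch below renames columns by their current name and then sorts; the toy data frame and its values are invented for illustration.

```r
# Toy stand-in for the trimmed df22; all values are invented.
toy <- data.frame(
  player_name  = c("Player A", "Player B", "Player C"),
  poss         = c(3000, 4000, 5000),
  mp           = c(1500, 2000, 2500),
  raptor_total = c(1.2, -0.5, 3.4),
  stringsAsFactors = FALSE
)

# Rename by matching the old name, so column order does not matter.
names(toy)[names(toy) == "poss"] <- "possessions_played"
names(toy)[names(toy) == "mp"]   <- "minutes_played"

# Sort from highest to lowest raptor_total, then reset the row labels.
toy <- toy[order(toy$raptor_total, decreasing = TRUE), ]
rownames(toy) <- NULL

toy
```

Resetting the row names is optional; it only replaces the original row indices with 1, 2, 3 so that they read as ranks.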
With the current dataframe, it is easy to see which high-minute players the RAPTOR metric rates as most effective. Unsurprisingly, and perhaps to the credit of the model builders, two-time league MVP Nikola Jokic tops the list, and is joined in the top five by other superstar players like Giannis Antetokounmpo and Stephen Curry.
In a future analysis, I would want to merge this dataframe with another that includes each player’s team. That way, I could conduct an analysis of which teams have the best RAPTOR scores on average among key players, and eventually identify the degree to which those figures are predictive of team success.
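A rough sketch of what that merge might look like, in base R. Everything here is hypothetical: the players and teams data frames, the team column, and all values are placeholders, since no team data has been loaded in this analysis.

```r
# Placeholder player ratings (invented values, not real RAPTOR scores).
players <- data.frame(
  player_name  = c("Player A", "Player B", "Player C", "Player D"),
  raptor_total = c(4.0, 2.0, -1.0, 3.0),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table mapping each player to a team.
teams <- data.frame(
  player_name = c("Player A", "Player B", "Player C", "Player D"),
  team        = c("Team X", "Team X", "Team Y", "Team Y"),
  stringsAsFactors = FALSE
)

# Join the two tables on the shared player_name column.
merged <- merge(players, teams, by = "player_name")

# Average raptor_total per team, sorted from best to worst.
team_avg <- aggregate(raptor_total ~ team, data = merged, FUN = mean)
team_avg <- team_avg[order(team_avg$raptor_total, decreasing = TRUE), ]

team_avg
```

Joining on player_name assumes names are unique and spelled identically in both sources. Keeping player_id in the dataframe would likely give a safer join key, provided the team data uses the same IDs.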