data607_hw1_nbaraptor

Overview

The RAPTOR data is a set of proprietary advanced metrics which rate NBA players based on “plus-minus,” or a team’s point differential when that player is on the floor. 538 bills RAPTOR as a way for NBA fans to evaluate players more like teams do: that is, based on their contributions to team victories as much as, or more than, individual metrics like points scored or shots blocked. As data literacy among journalists and viewers has increased in recent years, such metrics have become more important to player awards and to public perception of skill.

How RAPTOR works: https://fivethirtyeight.com/features/how-our-raptor-metric-works/
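
To make the “plus-minus” idea concrete, here is a minimal sketch using made-up stint data. The `stints` data frame and its column names are hypothetical and are not part of the RAPTOR file. It computes only a raw on-court point differential and a per-100-possessions rate; RAPTOR itself is considerably more involved, as the article linked above describes.

```r
# Hypothetical stints while one player was on the floor (invented numbers)
stints <- data.frame(
  team_points     = c(12, 8, 15),
  opponent_points = c(10, 11, 9),
  possessions     = c(10, 9, 12)
)

# Raw plus-minus: team points minus opponent points while the player is on court
raw_plus_minus <- sum(stints$team_points) - sum(stints$opponent_points)

# Scale to a per-100-possessions rate so that players with different
# amounts of playing time can be compared
plus_minus_per_100 <- 100 * raw_plus_minus / sum(stints$possessions)

raw_plus_minus
plus_minus_per_100
```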

Imports

Bringing in tidyverse for analysis

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Read Data

Pulling in RAPTOR data from CSV

nba_raptor <- read.csv(file = 'modern_RAPTOR_by_player.csv')

head(nba_raptor)

##        player_name player_id season poss   mp raptor_box_offense
## 1     Alex Abrines abrinal01   2017 2387 1135           0.745505
## 2     Alex Abrines abrinal01   2018 2546 1244           0.317549
## 3     Alex Abrines abrinal01   2019 1279  588          -3.215683
## 4 Precious Achiuwa achiupr01   2021 1581  749          -4.122966
## 5 Precious Achiuwa achiupr01   2022 3802 1892          -2.521510
## 6       Quincy Acy   acyqu01   2014 1716  847          -1.716079
##   raptor_box_defense raptor_box_total raptor_onoff_offense raptor_onoff_defense
## 1         -0.3729378        0.3725672           -0.4185528          -3.85701115
## 2         -1.7253253       -1.4077762           -1.2917271          -0.04969363
## 3          1.0783985       -2.1372847           -6.1588565           4.90116827
## 4          1.3592780       -2.7636875           -4.0507790          -0.91971189
## 5          1.7635023       -0.7580077           -1.6878926           3.10344083
## 6          0.1331150       -1.5829641           -0.3248108          -1.66149523
##   raptor_onoff_total raptor_offense raptor_defense raptor_total  war_total
## 1          -4.275564     0.54342123     -1.1448318   -0.6014106  1.2490077
## 2          -1.341421    -0.02082588     -1.5026417   -1.5234676  0.7773045
## 3          -1.257688    -4.04015659      1.8856184   -2.1545382  0.1781673
## 4          -4.970491    -4.34759588      0.9548211   -3.3927748 -0.2460551
## 5           1.415548    -2.51737239      2.1441514   -0.3732210  2.2626580
## 6          -1.986306    -1.56525715     -0.2164768   -1.7817339  0.4159181
##   war_reg_season  war_playoffs predator_offense predator_defense predator_total
## 1      1.4477081 -0.1987004440       0.07710201       -1.0386773     -0.9615753
## 2      0.4659122  0.3113923090      -0.17462117       -1.1126254     -1.2872466
## 3      0.1781673  0.0000000000      -4.57767760        1.5432817     -3.0343959
## 4     -0.2467764  0.0007213383      -3.81771271        0.4748280     -3.3428847
## 5      2.3096110 -0.0469529228      -2.48395630        2.0243602     -0.4595961
## 6      0.4159181  0.0000000000      -1.46441703       -0.2237543     -1.6881713
##   pace_impact
## 1   0.3264127
## 2  -0.4561412
## 3  -0.2680131
## 4   0.3291573
## 5  -0.7286095
## 6  -0.5548977

Filtering

Limit the data to the 2022 season and to players who played at least 1000 minutes in that season. Restricting to one season eliminates player duplicates (each row is a “player season,” so a player may appear in multiple rows), and the minutes threshold keeps only players who saw a meaningful amount of game time. 538 themselves use the 1000-minute standard in some of their graphics.

df22 <- subset(nba_raptor, season == 2022 & mp >= 1000)

head(df22)

##                 player_name player_id season poss   mp raptor_box_offense
## 5          Precious Achiuwa achiupr01   2022 3802 1892         -2.5215099
## 25             Steven Adams adamsst01   2022 4392 2113          0.4706397
## 30              Bam Adebayo adebaba01   2022 4893 2439         -0.3994730
## 59        LaMarcus Aldridge aldrila01   2022 2205 1050         -0.5158208
## 64 Nickeil Alexander-Walker alexani01   2022 3037 1471         -1.5011287
## 70            Grayson Allen allengr01   2022 4468 2110          0.4438660
##    raptor_box_defense raptor_box_total raptor_onoff_offense
## 5           1.7635023       -0.7580077            -1.687893
## 25          1.3494540        1.8200937             6.291397
## 30          3.7219120        3.3224390             1.775652
## 59          2.2454400        1.7296192             1.093065
## 64         -1.2098121       -2.7109408            -2.399584
## 70         -0.3152036        0.1286624            -0.683337
##    raptor_onoff_defense raptor_onoff_total raptor_offense raptor_defense
## 5             3.1034408          1.4155482   -2.517372390     2.14415141
## 25           -3.1653182          3.1260786    1.685480150     0.45034847
## 30            3.4814564          5.2571083    0.007084622     3.84834228
## 59           -0.4621391          0.6309262   -0.241157321     1.83076823
## 64           -0.9571289         -3.3567124   -1.773719916    -1.22306792
## 70            1.1312156          0.4478786    0.214310753    -0.06938761
##    raptor_total  war_total war_reg_season war_playoffs predator_offense
## 5    -0.3732210  2.2626580      2.3096110  -0.04695292      -2.48395630
## 25    2.1358286  5.2635871      5.1405873   0.12299984       1.37333950
## 30    3.8554269  8.2752185      6.1026294   2.17258916       0.30549220
## 59    1.5896109  2.3195180      2.3195180   0.00000000       0.29086793
## 64   -2.9967878 -0.1984665     -0.2948744   0.09640794      -0.74198252
## 70    0.1449231  3.1226038      2.8661344   0.25646938       0.02142403
##    predator_defense predator_total pace_impact
## 5         2.0243602     -0.4595961 -0.72860950
## 25        1.5263545      2.8996940 -0.24242976
## 30        3.4100854      3.7155776 -0.11814838
## 59        0.8445189      1.1353869 -0.22604974
## 64       -1.3130290     -2.0550115  0.68626303
## 70       -0.2913752     -0.2699512  0.02209729

Check that no duplicate players remain after filtering.

length(df22$player_id) == length(unique(df22$player_id))

## [1] TRUE
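
As a complement to the length comparison above, the sketch below shows one way to see which ids repeat, not just whether any do. It uses a toy data frame (`toy`, a name introduced here for illustration) built from the six rows printed by head(nba_raptor), so it runs without the CSV.

```r
# Toy subset copied from the head(nba_raptor) output above
toy <- data.frame(
  player_id = c("abrinal01", "abrinal01", "abrinal01",
                "achiupr01", "achiupr01", "acyqu01"),
  season    = c(2017, 2018, 2019, 2021, 2022, 2014),
  mp        = c(1135, 1244, 588, 749, 1892, 847)
)

# Before filtering, ids repeat across seasons; anyDuplicated() returns
# the index of the first repeated value, or 0 if there is none
anyDuplicated(toy$player_id)

# List the ids that occur more than once
id_counts <- table(toy$player_id)
names(id_counts[id_counts > 1])

# After restricting to a single season, each id appears at most once
toy22 <- subset(toy, season == 2022 & mp >= 1000)
anyDuplicated(toy22$player_id) == 0
```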

Column Curation

Eliminate unnecessary columns, including:

season: unnecessary since it's all 2022 now

player_id: maybe useful for a future analysis, but non-intuitive for knowing which player is being referenced

most of the remaining metrics (the RAPTOR box and on-off components, WAR, PREDATOR, and pace_impact), other than raptor_total. These metrics are interesting, but would overcrowd an analysis of overall RAPTOR

df22 <- subset(df22, select = c("player_name","poss","mp","raptor_total"))

head(df22)

##                 player_name poss   mp raptor_total
## 5          Precious Achiuwa 3802 1892   -0.3732210
## 25             Steven Adams 4392 2113    2.1358286
## 30              Bam Adebayo 4893 2439    3.8554269
## 59        LaMarcus Aldridge 2205 1050    1.5896109
## 64 Nickeil Alexander-Walker 3037 1471   -2.9967878
## 70            Grayson Allen 4468 2110    0.1449231

Tidy data

Make column names more intuitive, and sort by raptor_total in descending order

colnames(df22) <- c("player_name","possessions_played","minutes_played","raptor_total")

df22 <- df22[order(df22$raptor_total, decreasing = TRUE), ]

head(df22)

##                player_name possessions_played minutes_played raptor_total
## 2305          Nikola Jokic               5481           2647    14.572411
## 142  Giannis Antetokounmpo               5669           2652     8.066856
## 1291           Joel Embiid               5377           2682     7.775600
## 1565           Rudy Gobert               4762           2317     6.886323
## 1034         Stephen Curry               6218           2975     6.795283
## 1165           Luka Doncic               5683           2853     6.382791
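
Since tidyverse is already loaded, the filtering, column curation, renaming, and sorting steps could also be written as a single dplyr pipeline. The following is a sketch of that alternative, not the code that produced the output above. It runs on a toy data frame (`toy` and `tidy22` are names introduced here) whose rows are copied from the printed outputs.

```r
library(dplyr)

# Toy rows copied from the outputs above (only the columns the pipeline uses)
toy <- data.frame(
  player_name  = c("Alex Abrines", "Precious Achiuwa", "Precious Achiuwa",
                   "Steven Adams", "Bam Adebayo"),
  player_id    = c("abrinal01", "achiupr01", "achiupr01",
                   "adamsst01", "adebaba01"),
  season       = c(2019, 2021, 2022, 2022, 2022),
  poss         = c(1279, 1581, 3802, 4392, 4893),
  mp           = c(588, 749, 1892, 2113, 2439),
  raptor_total = c(-2.1545382, -3.3927748, -0.3732210, 2.1358286, 3.8554269)
)

tidy22 <- toy %>%
  filter(season == 2022, mp >= 1000) %>%   # same condition as the subset() call
  select(player_name,
         possessions_played = poss,        # select() can rename while selecting
         minutes_played = mp,
         raptor_total) %>%
  arrange(desc(raptor_total))              # highest RAPTOR first

tidy22
```

Keeping the steps in one chain avoids overwriting df22 several times, at the cost of making the intermediate results less visible than in the step-by-step version.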

Conclusion

With the current dataframe, it is easy to see which high-minute players the RAPTOR metric rates as most effective. Unsurprisingly, and perhaps to the credit of the model builders, 2-time league MVP Nikola Jokic tops the list, and is joined in the top 5 by other superstar players like Giannis Antetokounmpo and Steph Curry.

Next steps

In a future analysis, I would want to merge this dataframe with another that includes each player’s team. That way, I could conduct an analysis of which teams have the best RAPTOR scores on average among key players, and eventually identify the degree to which those figures are predictive of team success.
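
A rough sketch of what that merge might look like is below. The `teams` lookup is an assumption: it is typed by hand for illustration and is not a column in the RAPTOR file used here. The player rows and RAPTOR values are copied from the outputs above.

```r
# Rows copied from the outputs above
players <- data.frame(
  player_name  = c("Nikola Jokic", "Giannis Antetokounmpo",
                   "Grayson Allen", "Bam Adebayo"),
  raptor_total = c(14.572411, 8.066856, 0.1449231, 3.8554269)
)

# ASSUMPTION: hand-typed team lookup for illustration only;
# a real analysis would read this from a second data source
teams <- data.frame(
  player_name = c("Nikola Jokic", "Giannis Antetokounmpo",
                  "Grayson Allen", "Bam Adebayo"),
  team        = c("DEN", "MIL", "MIL", "MIA")
)

# Join on player_name, then average RAPTOR within each team
with_team <- merge(players, teams, by = "player_name")
team_avg  <- aggregate(raptor_total ~ team, data = with_team, FUN = mean)

# Teams ordered from highest to lowest average RAPTOR
team_avg[order(team_avg$raptor_total, decreasing = TRUE), ]
```

Joining on player_name is fragile if two sources spell names differently. If the team data also carries player_id, keeping that column and joining on it would likely be more robust.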