This report analyzes NBA play data to derive offense and defense ratings from metrics such as player positions, distances between players, and interactions on the court.
# Load necessary libraries
library(tidyverse)
library(ggforce)
library(fields)
library(RcppHungarian)
nba <- read.csv("AT2_nba_data.csv")
The following code visualizes a single play, plot_play, to better understand player positions and the court layout.
The data give the x and y coordinates of each player on offense and defense, along with indicators of whether a given player is beyond the three-point line, which player has the ball, and whether that player has used their dribble.
##    id_play quarter game_clock dribble threept player_id        x        y    speed team
## 1    93077       1     680.88       1       1      ball 25.51623 11.92743       NA ball
## 2    93077       1     680.88       1       1   on_ball 25.94406 11.73578 4.635967  off
## 3    93077       1     680.88       1       1      off1 14.83673  1.53336       NA  off
## 4    93077       1     680.88       1       1      off2 28.67697 40.97999       NA  off
## 5    93077       1     680.88       1       1      off3  5.15659  6.99852       NA  off
## 6    93077       1     680.88       1       1      off4 16.06194 44.83964       NA  off
## 7    93077       1     680.88       1       1      def1 13.82802 12.62237       NA  def
## 8    93077       1     680.88       1       1      def2 23.62568 14.20491       NA  def
## 9    93077       1     680.88       1       1      def3 18.89399 31.31310       NA  def
## 10   93077       1     680.88       1       1      def4  8.58958 33.07449       NA  def
## 11   93077       1     680.88       1       1      def5  7.10733 16.55830       NA  def
Using the play data provided, I built a dangerousity formula that combines an offensive rating and a defensive rating.
The speed of the on-ball player enters as a positive term: the faster the ball handler is moving, the more likely they are to be faster than their opponent.
Dribble Used enters as a negative term: once the player has used their dribble, the team's overall likelihood of scoring decreases.
Open Teammates Ahead is the number of open teammates ahead of the player with the ball. A teammate who is open ahead of the ball is open closer to the basket, so the threat depends on whether the player with the ball can find them.
A wide-open player is much more dangerous as a scorer, limited only by how far they are from the basket.
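As a minimal sketch of how an Open Teammates Ahead count could be computed for play 93077, the snippet below treats a teammate as open when no defender is within a fixed radius. The basket location, the 6 ft radius, the nearest-defender rule and every helper name are assumptions made for illustration; none of them is taken from the report.

```r
# Positions for play 93077, copied from the printed data (units assumed to be feet).
offense <- rbind(
  on_ball = c(25.94406, 11.73578),
  off1    = c(14.83673,  1.53336),
  off2    = c(28.67697, 40.97999),
  off3    = c( 5.15659,  6.99852),
  off4    = c(16.06194, 44.83964)
)
defense <- rbind(
  def1 = c(13.82802, 12.62237),
  def2 = c(23.62568, 14.20491),
  def3 = c(18.89399, 31.31310),
  def4 = c( 8.58958, 33.07449),
  def5 = c( 7.10733, 16.55830)
)

basket      <- c(5.25, 25)  # ASSUMPTION: hoop location on a 50 ft wide half court
open_radius <- 6            # ASSUMPTION: "open" means no defender within 6 ft

# Euclidean distance from each row of a coordinate matrix to a single point.
dist_to <- function(xy, point) {
  sqrt((xy[, 1] - point[1])^2 + (xy[, 2] - point[2])^2)
}

teammates <- offense[rownames(offense) != "on_ball", , drop = FALSE]

# Distance from each teammate to their nearest defender.
nearest_def <- apply(teammates, 1, function(p) min(dist_to(defense, p)))

# Open: no defender inside the radius. Ahead: closer to the basket than the ball handler.
is_open  <- nearest_def > open_radius
is_ahead <- dist_to(teammates, basket) <
  dist_to(offense["on_ball", , drop = FALSE], basket)

open_teammates_ahead <- sum(is_open & is_ahead)
```

Under these assumptions every off-ball teammate counts as open, so the result depends mainly on who is closer to the basket than the ball handler; a different radius or basket location would change the count.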
Ball Zone is based on the distance from the player with the ball to the basket. The offensive rating jumps as that distance shrinks, because a player closer to the basket is a greater scoring threat.
An Open Teammate Rating uses the same distance-to-basket calculation as Ball Zone, assigning each open player a rating based on how close they are to the basket.
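One way to picture a distance-based rating that jumps between zones is a step function over distance bands. The sketch below is illustrative only: the basket location, the band edges and the rating values are hypothetical, and the report's actual zones are not shown.

```r
basket <- c(5.25, 25)  # ASSUMPTION: hoop location on a 50 ft wide half court

# HYPOTHETICAL bands: lower edge of each band in feet, and the rating it earns.
zone_breaks  <- c(0, 8, 16, 24)
zone_ratings <- c(1.00, 0.75, 0.50, 0.25)

# Rating for a player at (x, y): look up the band containing their distance.
zone_rating <- function(x, y) {
  d <- sqrt((x - basket[1])^2 + (y - basket[2])^2)
  zone_ratings[findInterval(d, zone_breaks)]
}

# Ball Zone for the ball handler in play 93077 (on_ball row).
ball_zone <- zone_rating(25.94406, 11.73578)

# The same function reused as an open teammate rating, here for off3.
off3_rating <- zone_rating(5.15659, 6.99852)
```

Reusing one function for both the ball handler and open teammates mirrors the text, which describes the two ratings as sharing the same distance-to-basket calculation.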
Total Cost, calculated with the Hungarian algorithm, is a score based on the summed distance between paired players, with a lower cost indicating a better overall defensive structure.
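The pairing cost can be sketched for play 93077 as follows. This sketch assumes that each defender is paired with exactly one offensive player and that the cost of a pair is their Euclidean distance; the report does not spell out either detail. To stay self-contained it finds the minimum by brute force over all 120 pairings, which gives the same minimum total an optimal assignment solver would find for five defenders. It then cross-checks against HungarianSolver from the RcppHungarian package only when that package is installed.

```r
# Positions for play 93077, copied from the printed data.
offense <- rbind(
  on_ball = c(25.94406, 11.73578),
  off1    = c(14.83673,  1.53336),
  off2    = c(28.67697, 40.97999),
  off3    = c( 5.15659,  6.99852),
  off4    = c(16.06194, 44.83964)
)
defense <- rbind(
  def1 = c(13.82802, 12.62237),
  def2 = c(23.62568, 14.20491),
  def3 = c(18.89399, 31.31310),
  def4 = c( 8.58958, 33.07449),
  def5 = c( 7.10733, 16.55830)
)

# Cost matrix: rows are defenders, columns are offensive players.
cost <- as.matrix(dist(rbind(defense, offense)))[1:5, 6:10]

# All orderings of a vector (helper name is illustrative).
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], p)
    }
  }
  out
}

# Total distance of each pairing, where defender i guards offensive player p[i].
perms      <- permutations(1:5)
pair_costs <- vapply(perms, function(p) sum(cost[cbind(1:5, p)]), numeric(1))

total_cost   <- min(pair_costs)
best_pairing <- perms[[which.min(pair_costs)]]

# Optional cross-check with the package the report loads.
if (requireNamespace("RcppHungarian", quietly = TRUE)) {
  hungarian_cost <- RcppHungarian::HungarianSolver(cost)$cost
}
```

Brute force is only practical because there are five players per side; the Hungarian algorithm reaches the same minimum without enumerating every pairing.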
Defenders Ahead is the number of defenders who are closer to the basket than the player with the ball. The fewer defenders ahead of the ball, the more dangerous the offensive team is.
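The definition above translates directly into a count once a basket location is chosen. The sketch below uses play 93077 and assumes the hoop sits at (5.25, 25), which is not stated in the report.

```r
basket  <- c(5.25, 25)  # ASSUMPTION: hoop location on a 50 ft wide half court
on_ball <- c(25.94406, 11.73578)

defense <- rbind(
  def1 = c(13.82802, 12.62237),
  def2 = c(23.62568, 14.20491),
  def3 = c(18.89399, 31.31310),
  def4 = c( 8.58958, 33.07449),
  def5 = c( 7.10733, 16.55830)
)

# Distance to the basket for the ball handler and for each defender.
ball_dist <- sqrt(sum((on_ball - basket)^2))
def_dist  <- sqrt((defense[, 1] - basket[1])^2 + (defense[, 2] - basket[2])^2)

# A defender is "ahead" when they are closer to the basket than the ball handler.
defenders_ahead <- sum(def_dist < ball_dist)
```

With the ball near the three-point line in this play, all five defenders sit between the ball and the assumed basket location.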
Ball Pressure indicates how close defenders are to the player with the ball. Higher pressure forces more mistakes from the offense and limits the ball handler’s options, which lowers the dangerousity of the offense.
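A pressure indicator of this kind might be built from the distance between the ball handler and the nearest defender. In the sketch below, the 6 ft radius and the 1 / (1 + distance) transform are hypothetical choices for illustration, not the report's formula.

```r
on_ball <- c(25.94406, 11.73578)  # ball handler in play 93077

defense <- rbind(
  def1 = c(13.82802, 12.62237),
  def2 = c(23.62568, 14.20491),
  def3 = c(18.89399, 31.31310),
  def4 = c( 8.58958, 33.07449),
  def5 = c( 7.10733, 16.55830)
)

# Distance from each defender to the ball handler.
def_dist <- sqrt((defense[, 1] - on_ball[1])^2 + (defense[, 2] - on_ball[2])^2)

nearest_defender <- min(def_dist)
closest_name     <- names(which.min(def_dist))

pressure_radius     <- 6  # ASSUMPTION: defenders within 6 ft apply pressure
defenders_in_radius <- sum(def_dist < pressure_radius)

# HYPOTHETICAL transform: pressure rises toward 1 as the nearest defender closes in.
ball_pressure <- 1 / (1 + nearest_defender)
```

Either the count within the radius or the transformed distance could serve as the indicator; the second varies smoothly, while the first changes only when a defender crosses the radius.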
The formula gives a score between 0 and 1 for each play, with a higher score indicating that the offensive team is more likely to score.
## # A tibble: 6 × 4
##   id_play offense_rating defense_rating dangerousity
##     <int>          <dbl>          <dbl>        <dbl>
## 1   93077           3.32          3.09         0.557
## 2  109277           2.36          1.59         0.684
## 3  110877           3.93          3.02         0.713
## 4  128757           2.89          3.13         0.441
## 5  180356           3.42          3.08         0.585
## 6  188156           2.36          0.664        0.845
## # A tibble: 1 × 4
##   id_play offense_rating defense_rating dangerousity
##     <int>          <dbl>          <dbl>        <dbl>
## 1 5680255           3.24          0.987        0.905
## # A tibble: 1 × 4
##   id_play offense_rating defense_rating dangerousity
##     <int>          <dbl>          <dbl>        <dbl>
## 1  977260          0.574           3.03       0.0788
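The report does not state how the two ratings are combined into a score between 0 and 1. The printed values are, however, consistent with a logistic transform of the offense rating minus the defense rating. This is an inference from the eight rows shown above, not a confirmed formula, and the sketch below simply checks it against those rows.

```r
# The eight rows printed above, copied by hand.
scores <- data.frame(
  id_play        = c(93077, 109277, 110877, 128757, 180356, 188156, 5680255, 977260),
  offense_rating = c(3.32, 2.36, 3.93, 2.89, 3.42, 2.36, 3.24, 0.574),
  defense_rating = c(3.09, 1.59, 3.02, 3.13, 3.08, 0.664, 0.987, 3.03),
  dangerousity   = c(0.557, 0.684, 0.713, 0.441, 0.585, 0.845, 0.905, 0.0788)
)

# ASSUMED combination: plogis(x) = 1 / (1 + exp(-x)), applied to offense minus defense.
scores$logistic <- plogis(scores$offense_rating - scores$defense_rating)

# Largest gap between the assumed formula and the printed scores.
max_gap <- max(abs(scores$logistic - scores$dangerousity))
```

Because the printed ratings are rounded to three significant figures, small gaps are expected even if the assumed formula is the right one. A logistic transform also explains why equal ratings would give a score of 0.5.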