Dangerosity is determined by four key factors: defensive pressure, offensive pressure, zone of play and the player’s speed and dribble.

Defensive pressure is assessed from the number of defenders within a 5-foot radius of the player with the ball and the number of defenders positioned in front of that player relative to the hoop. Offensive pressure compares the area of the court covered by each team. Zone of play takes into account the player’s position relative to the hoop, and “fast-ness” considers the player’s speed and whether they can still dribble.

Variables used for this analysis are explained in Table 1.

Table 1. Summary of variables used in this analysis. Additional variables are marked with (*).
| Variable | Description |
|---|---|
| id_play | Unique ID of each play |
| dribble | Indicator: 1 if the player has used their dribble, 0 otherwise |
| player_id | Labels the ball, the player with the ball, the other 4 attacking players, and the 5 defenders |
| x | X coordinate in feet |
| y | Y coordinate in feet |
| speed | Speed of the player with the ball, in feet per second |
| team | Offense or defense |
| distance_from_on_ball* | Distance between a player and the player with the ball |
| angle_from_hoop* | Angle at the player with the ball between a given player and the centre of the hoop |
| chull_area* | Area of the convex hull enclosing the offence or defence team |
| distance_to_hoop* | Distance from the player with the ball to the centre of the hoop |
| fast* | 0.5 if the player has not yet used their dribble (dribble = 0) and speed ≥ 10 ft/s; 0 otherwise |

We start by importing the data from the provided dataset and creating custom functions to avoid repeating code.

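# Packages ----
library(dplyr)    # group_by(), mutate(), summarise(), joins, %>%
library(tidyr)    # pivot_wider()
library(ggplot2)  # court and player plots
library(ggforce)  # geom_circle()
library(splancs)  # areapl(): area of a polygon

# Import data ----
nba <- read.csv('AT2_nba_data.csv', stringsAsFactors = FALSE)

# Functions ----- 
# Angle (degrees) at point 1 (the player with the ball) between point 2 and
# the hoop centre at (5.25, 25), using the cosine rule
def_angle <- function(x1,y1,x2,y2) {
  d1 <- sqrt((x1 - x2)^2 + (y1 - y2)^2) # player on ball -> other player
  d2 <- sqrt((x1 - 5.25)^2 + (y1 - 25)^2) # player on ball -> hoop
  d3 <- sqrt((x2 - 5.25)^2 + (y2 - 25)^2) # other player -> hoop
  cos_angle <- (d1^2 + d2^2 - d3^2) / (2*d1*d2) # cosine rule
  angle <- acos(pmin(pmax(cos_angle, -1), 1)) * (180/pi) # clamp rounding error, convert to degrees
  return(angle)
}

# Area of the convex hull enclosing a set of player positions
def_chull_area <- function(x,y) {
  chull_rows <- chull(x,y) # indices of the points on the convex hull
  chull_data <- cbind(x[chull_rows], y[chull_rows]) # hull vertices as a 2-column matrix
  area <- areapl(chull_data) # polygon area (splancs)
  return(area)
}

# Euclidean distance between (x1, y1) and (x2, y2)
def_distance <- function(x1,y1, x2, y2) {
  dist <- sqrt((x1 - x2)^2 + (y1 - y2)^2) # Euclidean distance formula
  return(dist)
}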

Next, we plot the locations of all players and the ball, overlaid on the court.

# Plot play ----

# The plays in the dataset
unique(nba$id_play)

# Court outline with markings for the left half-court (dimensions in feet)
court <- ggplot() +
  # court
  geom_rect(aes(xmin = 0, xmax = 94, ymin = 0, ymax = 50), alpha = 0, col = 'black') + # court boundary
  geom_rect(aes(xmin = 0, xmax = 19, ymin = 19, ymax = 31), alpha = 0, col = 'black') + # 12-ft lane
  geom_rect(aes(xmin = 0, xmax = 19, ymin = 17, ymax = 33), alpha = 0, col = 'black') + # 16-ft lane
  geom_segment(aes(x = 47, xend = 47, y = 0, yend = 50)) + # half-court line
  geom_segment(aes(x = 4, xend = 4, y = 22, yend = 28)) +  # backboard
  geom_circle(aes(x0 = 5.25, y0 = 25, r = 0.75)) + # hoop
  geom_circle(aes(x0 = 47, y0 = 25, r = 6)) +      # centre circle
  geom_circle(aes(x0 = 19, y0 = 25, r = 6)) +      # free-throw circle
  geom_segment(aes(y = 3, x = 0, yend = 3, xend = 14)) +   # corner three-point line (y = 3)
  geom_segment(aes(y = 47, x = 0, yend = 47, xend = 14)) + # corner three-point line (y = 47)
  geom_curve(aes(y = 3, x = 14, yend = 47, xend = 14), curvature = .75,
             angle = 90) + # three-point arc
  # looks
  theme_bw() +
  coord_equal()

# Plot a single play:
plot_play <- 993980 

court +
  # players
  geom_point(data = subset(nba, id_play == plot_play), 
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(plot_play) # title is set here, after plot_play is defined

Defensive Pressure

Figure 1. Calculating angles from each player to player with the ball relative to the hoop.


Defensive pressure is assessed by considering both the number of defenders positioned in front of the player with the ball and the proximity of each defender to that player.

To determine whether a defender is positioned in front of the player with the ball, a line is drawn from the centre of the hoop to the player holding the ball. The angle between the defender and the centre of the hoop, with the player’s (x, y) coordinates as the vertex, is then calculated using the cosine rule. If the defender is in front of the player, the angle is less than 90 degrees; if the defender is behind the player, the angle exceeds 90 degrees (Figure 1). More defenders in front of the player means greater defensive pressure. This assumes that defenders in front are better positioned to defend against a scoring opportunity or pass than defenders behind.
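A minimal sketch of the in-front test in base R. The defender coordinates are made up for illustration, and `angle_at_ball()` is a hypothetical stand-in that mirrors `def_angle()` with the hoop centre assumed at (5.25, 25) ft, plus a clamp so rounding error cannot push the cosine outside [-1, 1].

```r
# Hoop centre (feet), as used by def_angle()
hoop_x <- 5.25
hoop_y <- 25

# Angle (degrees) at the ball handler between a defender and the hoop centre,
# computed with the cosine rule
angle_at_ball <- function(ball_x, ball_y, def_x, def_y) {
  d1 <- sqrt((ball_x - def_x)^2 + (ball_y - def_y)^2)   # ball handler -> defender
  d2 <- sqrt((ball_x - hoop_x)^2 + (ball_y - hoop_y)^2) # ball handler -> hoop
  d3 <- sqrt((def_x - hoop_x)^2 + (def_y - hoop_y)^2)   # defender -> hoop
  cos_angle <- (d1^2 + d2^2 - d3^2) / (2 * d1 * d2)
  acos(pmin(pmax(cos_angle, -1), 1)) * 180 / pi         # clamp, then radians -> degrees
}

# Hypothetical play: ball handler at (24, 25), one defender on each side
ball_x <- 24
ball_y <- 25
def_x <- c(20, 30)  # first defender sits between the ball handler and the hoop
def_y <- c(27, 22)

angles <- angle_at_ball(ball_x, ball_y, def_x, def_y)
in_front <- angles < 90
print(data.frame(def_x, def_y, angle = round(angles, 1), in_front))
```

The first defender is at about 26.6 degrees (in front) and the second at about 153.4 degrees (behind), so only the first would add to the in-front count.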

The distance of each defender from the player with the ball is calculated using the Euclidean distance formula. Under the assumption that proximity increases defensive pressure, we counted the number of defenders within a 5-foot radius of the player on-ball.
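The proximity count can be sketched the same way. The five defender positions below are hypothetical, `euclid()` is an illustrative copy of the Euclidean distance formula, and `dist_threshold` mirrors the 5-foot radius from the text.

```r
# Euclidean distance (feet) between (x1, y1) and (x2, y2); vectorised
euclid <- function(x1, y1, x2, y2) sqrt((x1 - x2)^2 + (y1 - y2)^2)

dist_threshold <- 5  # 5-foot radius around the ball handler

# Hypothetical play: ball handler at (24, 25) and five defenders
ball_x <- 24
ball_y <- 25
def_x <- c(20, 27, 35, 10, 24)
def_y <- c(27, 22, 40, 25, 31)

dists <- euclid(def_x, def_y, ball_x, ball_y)
count_close <- sum(dists <= dist_threshold)

round(dists, 2)  # 4.47 4.24 18.60 14.00 6.00
count_close      # 2
```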

Defenders far away from the player or behind them may be indicative of a situation where they are out of position or attempting to catch up with the player. In these instances, defensive pressure is typically lower, leading to an increased level of danger. In contrast, when the defenders are not only close to the player but also positioned in front of them, it becomes more challenging for the player to break free from the defenders to create a scoring opportunity. The player may be forced to pass the ball out to teammates at the three-point line, decreasing the level of danger.

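Defensive pressure is modelled with the equation:

$$DP = 0.15\,\frac{n_{\text{close}}}{5} + 0.15\,\frac{n_{\text{front}}}{5} + 0.7\,\frac{n_{\text{front,close}}}{3}$$

where $n_{\text{close}}$ is the number of defenders within 5 feet of the player with the ball, $n_{\text{front}}$ the number of defenders in front of that player, and $n_{\text{front,close}}$ the number of defenders that are both.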

Different weights were assigned to the three components, with the greatest emphasis on the number of close defenders who are also in front of the player. We divide the number of close defenders in front by 3 on the assumption that it is not ideal for all the defenders to surround the player on-ball, leaving the other four attackers unmarked.
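A quick check of the arithmetic in the defensive-pressure formula, using hypothetical defender counts. `def_pressure_score()` is an illustrative helper with the weights from the text. The last call also shows that, because the close-and-in-front count is divided by 3 rather than 5, the score can exceed 1 when four or five defenders are both close and in front.

```r
# Defensive pressure from the three counts, with the weights used in the text
def_pressure_score <- function(count_close, count_front, count_front_close) {
  0.15 * (count_close / 5) + 0.15 * (count_front / 5) + 0.7 * (count_front_close / 3)
}

def_pressure_score(2, 3, 1)  # 0.06 + 0.09 + 0.2333 = 0.3833333
def_pressure_score(3, 3, 3)  # 0.09 + 0.09 + 0.7    = 0.88
def_pressure_score(5, 5, 5)  # 0.15 + 0.15 + 1.1667 = 1.466667
```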

# Defensive Pressure ----

dist_threshold <- 5 # threshold for close defenders (5 feet)

def_pressure <- nba %>%
  group_by(id_play) %>%
  mutate(
    ball_x = x[player_id == "on_ball"], # x coordinate of player on ball
    ball_y = y[player_id == "on_ball"], # y coordinate of player on ball
    distance_from_on_ball = def_distance(x, y, ball_x, ball_y), # distance between each player and player on ball
    angle_btw_hoop_players = def_angle(ball_x, ball_y, x, y) # angle at player on ball between each player and hoop
  ) %>%
  summarise(
    count_close = sum(team == 'def' & distance_from_on_ball <= dist_threshold, na.rm = TRUE), # defenders within threshold
    count_front = sum(team == 'def' & angle_btw_hoop_players < 90, na.rm = TRUE), # defenders in front
    count_front_close = sum(team == 'def' & distance_from_on_ball <= dist_threshold &
                              angle_btw_hoop_players < 90, na.rm = TRUE), # defenders close and in front
    def_pressure = 0.15*(count_close/5) + 0.15*(count_front/5) + 0.7*(count_front_close/3) # def pressure formula
  ) # one row per play; roughly 0-1, but exceeds 1 when 4+ defenders are close and in front

Offensive Pressure

Offensive pressure considers the area of play of the offence team relative to the defense team.

Figure 2. Area of play marked by the defence team in green and offence team in blue.


The court coverage of each play’s offensive and defensive team is determined by calculating the area of the convex hull enclosing that team’s player positions. The difference between these areas reflects how effectively the offence positions itself to cover more ground while squeezing the defence into a smaller area.

A perfectly positioned man-to-man defence, in which the areas of the offence and defence are exactly the same, should not be treated as a zero-danger situation. We therefore assigned a weight of 0.85 to the offensive area, so that the score is not 0 in such instances: having every defender mark their player perfectly might still be risky and affects the level of danger.

Additionally, we take the absolute value to handle scenarios where the area of the defence is larger than the area of the offence. In such cases, the defence may be pushed farther out and become more spread apart, allowing the offensive team to dominate the area within. These situations are more dangerous, so taking the absolute value ensures that they increase the overall score rather than reduce it.
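The sketch below works through the offensive-pressure arithmetic for two made-up formations. To stay self-contained it computes the hull area with base R's `chull()` and the shoelace formula, a stand-in for the `splancs::areapl()` call used in `def_chull_area()`; `hull_area()` is a hypothetical helper.

```r
# Convex hull area: chull() gives the hull vertices, the shoelace formula the area
hull_area <- function(x, y) {
  idx <- chull(x, y)                # indices of hull vertices
  hx <- x[idx]
  hy <- y[idx]
  nxt <- c(seq_along(idx)[-1], 1)   # index of the next vertex, wrapping around
  abs(sum(hx * hy[nxt] - hx[nxt] * hy)) / 2
}

# Hypothetical formations (feet): offence spread wide, defence packed inside
off_x <- c(5, 25, 25, 5, 15);  off_y <- c(5, 5, 45, 45, 25)
def_x <- c(8, 20, 20, 8, 14);  def_y <- c(15, 15, 35, 35, 25)

a_off <- hull_area(off_x, off_y)  # 20 x 40 rectangle = 800 sq ft
a_def <- hull_area(def_x, def_y)  # 12 x 20 rectangle = 240 sq ft
abs(0.85 * a_off - a_def) / 1000  # |680 - 240| / 1000 = 0.44

# Identical areas (perfect man-to-man) still give a non-zero score
abs(0.85 * a_off - a_off) / 1000  # 0.12
```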

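Offensive pressure is modelled with the equation:

$$OP = \frac{\lvert 0.85\,A_{\text{off}} - A_{\text{def}} \rvert}{1000}$$

where $A_{\text{off}}$ and $A_{\text{def}}$ are the convex hull areas, in square feet, of the offence and defence.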

# Offensive Pressure ----

# convex hull area by team and play, then the scaled area difference
off_pressure <- nba %>% 
  filter(team %in% c('off', 'def')) %>% # keep player rows only (team coded 'off'/'def')
  group_by(id_play, team) %>% # area of defense and offense team 
  summarise(chull_area = def_chull_area(x,y)) %>% 
  pivot_wider(names_from = team, values_from = chull_area) %>% # one column per team: off, def
  mutate(chull_score = abs((0.85*off - def) * (1/1000))) # |0.85 * A_off - A_def| / 1000


Zone

Figure 3. Level of danger as distance from hoop increases.


Distance from hoop refers to the proximity of the player with the ball to the centre of the hoop and is calculated using the Euclidean distance formula, with the hoop’s centre fixed at (5.25, 25). We model an inverse relationship between distance from hoop and the level of danger. Ignoring other influencing factors and assuming that no defensive players are in the way, the level of danger is highest when the player with the ball is positioned just under the hoop. In this situation, the player can attempt a high-percentage shot or execute a fast break, assuming that accuracy is highest closest to the hoop. As distance from the hoop increases, the level of danger decreases, on the assumption that accuracy usually decreases further from the hoop and the team will need to make more moves to bring the ball closer for a scoring opportunity. Another assumption of this equation is that the level of danger never reaches 0 as distance from the hoop increases, since we chose not to ignore the 1% chance that the player scores a basket out of pure luck.

We model this with the equation:

$$Z(x) = 0.5^{x}$$

where x is the distance, in feet, between the player with the ball and the centre of the hoop.
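Evaluating the zone formula at a few distances shows how quickly it decays: the score halves with every additional foot, so almost all of it is concentrated within a few feet of the hoop. The distances are arbitrary examples and `zone_score()` is an illustrative helper.

```r
# Zone score as a function of distance (feet) from the hoop centre
zone_score <- function(dist_ft) 0.5^dist_ft

distances <- c(0, 1, 3, 5, 10)
data.frame(distance_ft = distances, zone = zone_score(distances))
# zone: 1, 0.5, 0.125, 0.03125, 0.0009765625
```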

Fast

When considering the “fast-ness” of a player, we consider both the player’s speed and whether the player can still dribble the ball. We assume that the indicator dribble = 1 means the player’s dribble has ended. A player who has not yet used their dribble (dribble = 0) and is moving at 10 ft/s or more scores 0.5; otherwise the score is 0. A “fast” player increases the level of danger. For instance, if there are no defenders in the way and the offensive player is sprinting at 16 ft/s while maintaining control of the dribble, he is very likely to reach the basket quickly for a scoring attempt without a steal or block by the opponents. In such scenarios, the level of danger is elevated.
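The fast rule reduces to a single vectorised condition. A minimal sketch, assuming (as the text does) that dribble = 0 means the dribble is still live; `fast_score()` is a hypothetical helper and the inputs are made up.

```r
# 0.5 if the dribble is still live (dribble == 0) and speed >= 10 ft/s, else 0
fast_score <- function(dribble, speed) ifelse(dribble == 0 & speed >= 10, 0.5, 0)

# Three hypothetical ball handlers: live and fast, live but slow, dribble used
fast_score(dribble = c(0, 0, 1), speed = c(16, 8, 16))
# [1] 0.5 0.0 0.0
```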

# zone of play ----
dist_toHoop <- nba %>% 
  group_by(id_play) %>%  
  summarise(dist_toHoop = def_distance(x[player_id == 'on_ball'], y[player_id == 'on_ball'], 5.25, 25)) %>% # distance (ft) from player on ball to hoop centre
  mutate(dist_toHoop_score = 0.5^dist_toHoop) # zone score: halves with every foot from the hoop

# get fast scores ----
fast <- nba %>% 
  filter(player_id == 'on_ball') %>% 
  group_by(id_play) %>% 
  summarise(fast = ifelse(dribble == 0 & speed >= 10, 0.5, 0)) # 0.5 if dribble not yet used and speed >= 10 ft/s, else 0


Dangerosity

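Bringing all these components together, we define dangerosity as:

$$D = 0.25\,OP + 0.4\,Z - 0.25\,DP + 0.1\,F + 0.5$$

where OP is offensive pressure, Z the zone score, DP defensive pressure and F the fast score.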

The zone term receives the highest weighting (0.4), reflecting the higher scoring probability that comes with being close to the hoop. The fast term receives the lowest weighting (0.1): although speed is worth considering, in most situations it is less important than offensive and defensive pressure. Defensive pressure is subtracted, because greater pressure reduces danger. We add a constant (0.5) to keep the scores above 0 for easier comparison.
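Plugging hypothetical component values into the weighted sum makes the weighting concrete. The inputs below are illustrative, not taken from the dataset, and `dangerosity_score()` is a hypothetical helper mirroring the `mutate()` call.

```r
# Weighted sum of the four components plus the 0.5 offset
dangerosity_score <- function(chull_score, zone, def_pressure, fast) {
  0.25 * chull_score + 0.4 * zone - 0.25 * def_pressure + 0.1 * fast + 0.5
}

# Open ball handler 1 ft from the hoop, live dribble at speed, little pressure
dangerosity_score(chull_score = 0.44, zone = 0.5^1, def_pressure = 0.06, fast = 0.5)
# 0.11 + 0.2 - 0.015 + 0.05 + 0.5 = 0.845

# Lower bound: zero for every positive term and the maximum defensive
# pressure (all five defenders close and in front, about 1.47)
dangerosity_score(chull_score = 0, zone = 0, def_pressure = 0.3 + 0.7 * 5 / 3, fast = 0)
# 0.5 - 0.25 * 1.466667 = 0.1333333, still above 0
```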

# dangerosity score -----
dangerosity <- off_pressure %>% 
  right_join(dist_toHoop, by = 'id_play') %>% # add zone score by play id
  right_join(def_pressure, by = 'id_play') %>% # add defensive pressure 
  right_join(fast, by = 'id_play') %>% # add fast score
  select(id_play, chull_score, dist_toHoop_score, def_pressure, fast) %>% # keep play id and the four components
  mutate(dangerosity = 0.25*chull_score + 0.4*dist_toHoop_score - 0.25*def_pressure + 0.1*fast + 0.5) # dangerosity score for each play

Test results

# Plot a single play:
plot_play <- 3421490 

court +
  # players
  geom_point(data = subset(nba, id_play == plot_play), 
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(plot_play) # title for the current play

In this play, the player with the ball is just under the hoop with the defenders positioned behind him. This suggests that he has an opportunity to score with little pressure from the defence. This play had a dangerosity score of 0.92, which captures the high level of danger in this situation well.

# Plot a single play:
plot_play <- 966140 

court +
  # players
  geom_point(data = subset(nba, id_play == plot_play), 
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(plot_play) # title for the current play

In contrast, the player with the ball here is closely guarded, with defenders directly in front of him at close range. The player is therefore likely to be forced to pass out to teammates on the three-point line to reset the play. This play had a lower dangerosity score of 0.33.