Calculating the Dangerousity of Attacking Plays in the NBA

Part 1: Purpose & Libraries

Dangerousity is a real-time measure that quantifies how likely a team is to score at any given moment based on player positioning, ball control, defensive pressure, and spatial dynamics. Rather than focusing on isolated events like shots or turnovers, it captures the continuous flow of the game and evaluates how threatening an offensive situation is as it unfolds. This kind of metric, though developed in football (soccer), has strong potential applications in the NBA, where player movement, spacing, and defensive reactions play a crucial role in creating scoring opportunities. By adapting the concept to basketball’s faster pace and smaller playing area, dangerousity could help coaches, analysts, and even broadcasters assess the effectiveness of offensive plays in real time and better understand how space and pressure shape scoring chances.

This code builds on previous research into dangerousity and applies it to the NBA through a newly proposed formula, which is then run on NBA test data. First, we’ll load the necessary libraries.

library(dplyr)
library(ggplot2)

Part 2: Loading and Initialising Data

Data Sources

I was given play data containing the x and y positions of each player and the ball, how fast each was travelling, and when in the game the play occurred. I also created new fields, including how close each defender is to the player with the ball, so I could use triangulation to estimate whether passing lanes are blocked.
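The derived distance field is not shown being built in this notebook, so the following is only a sketch of one way it could be computed. It assumes the column names that appear later in the code (`id_play`, `game_clock`, `team`, `player_id`, `x`, `y`, `defender_on`) and that the ball handler is flagged with `player_id == "on_ball"`. The helper `add_defender_on()` and the toy `frames` table are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: within each frame, measure every defender's distance
# to the ball handler and store it in defender_on (NA for non-defenders).
add_defender_on <- function(df) {
  df %>%
    group_by(id_play, game_clock) %>%
    mutate(
      ball_x = x[player_id == "on_ball"][1],
      ball_y = y[player_id == "on_ball"][1],
      defender_on = ifelse(team == "def",
                           sqrt((x - ball_x)^2 + (y - ball_y)^2),
                           NA_real_)
    ) %>%
    ungroup() %>%
    select(-ball_x, -ball_y)
}

# Toy single frame with made-up coordinates (feet)
frames <- tibble(
  id_play = 1, game_clock = 700,
  team = c("off", "off", "def", "def"),
  player_id = c("on_ball", "a2", "d1", "d2"),
  x = c(20, 10, 17, 8),
  y = c(25, 30, 21, 25)
)

toy <- add_defender_on(frames)
toy$defender_on  # NA NA 5 12
```

In this toy frame the two defenders end up 5 ft and 12 ft from the ball handler, so only the first would fall inside a 5 ft pressure radius.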

nba <- read.csv("AT2_nba_data.csv", stringsAsFactors = FALSE)

names(nba)

# The plays in the dataset
unique(nba$id_play)

# Plot a single play:
plot_play <- 3421490

ggplot(data = subset(nba, id_play == plot_play)) +
  # court outline, drawn once with annotate() rather than once per data row
  annotate("rect", xmin = 0, xmax = 94, ymin = 0, ymax = 50, fill = NA, colour = "black") +
  # lane: inner 12 ft box and outer 16 ft box
  annotate("rect", xmin = 0, xmax = 19, ymin = 19, ymax = 31, fill = NA, colour = "black") +
  annotate("rect", xmin = 0, xmax = 19, ymin = 17, ymax = 33, fill = NA, colour = "black") +
  # half-court line and backboard
  annotate("segment", x = 47, xend = 47, y = 0, yend = 50) +
  annotate("segment", x = 4, xend = 4, y = 22, yend = 28) +
  # player and ball positions, coloured by team
  geom_point(aes(x = x, y = y, colour = team), size = 2) +
  # looks
  theme_bw() +
  coord_equal() +
  ggtitle(plot_play)

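# calculate the distance from each player to the centre of the free-throw line,
# which sits 19 ft from the baseline (the end of the lane drawn above)
# this is a row-wise calculation, so no grouping is needed
nba <- nba %>%
  mutate(
    dist_toFreeThrow = sqrt((x - 19)^2 + (y - 25)^2)
  )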

# summarise each play based on a few factors
nbaplaydata <- nba %>%
  group_by(id_play) %>%
  summarise(
    # closest any defender gets to the basket at (5.25, 25) during the play
    d_deepest_def = min(sqrt((x[team == 'def'] - 5.25)^2 + (y[team == 'def'] - 25)^2)),
    dribble = dribble[1] # has the player used their dribble
  )

Dangerousity Formula

The dangerousity formula proposed by Link et al. in “Real Time Quantification of Dangerousity in Football Using Spatiotemporal Tracking Data” uses four main components: Zone (the danger of a player scoring from their spatial position), Control (how well the player can implement their skill based on ball dynamics), Pressure (the possibility of the defending team stopping the attack), and Density (the chance of defending the ball after the action). Basketball is played on a much smaller court than a football pitch, so the formulas are altered here and Density is replaced with Space. Zone and Control increase dangerousity, while Pressure and Space decrease it. Each component takes a value between 0 and 1.
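Read off the code that follows, the four components and their combination can be written compactly as below. The symbols are notation introduced here rather than taken from the paper: d is the ball handler's distance to the rim, v their speed, n the number of defenders within 5 ft of the ball handler, d_i those defenders' distances, and h_min the smallest triangle height over any passing lane. Edge cases such as a missing speed are left out.

```latex
\begin{align*}
ZO &= e^{-d/20} \\
CO &= \begin{cases}
        1 & \text{if } d < 14 \text{ and } v > 10 \\
        \min\bigl(\max(1 - k_2 v^2,\, 0),\, 1\bigr) & \text{otherwise}
      \end{cases} \\
PR &= \begin{cases}
        0 & \text{if } n = 0 \\
        \min\Bigl(\tfrac{1}{10}\exp\bigl(n - \tfrac{1}{5}\sum_{i=1}^{n} d_i\bigr),\, 1\Bigr) & \text{otherwise}
      \end{cases} \\
SP &= \min\Bigl(\max\bigl(\tfrac{6 - h_{\min}}{4},\, 0\bigr),\, 1\Bigr) \\
DA &= ZO \cdot \Bigl(1 - \frac{1 - CO + PR + SP}{k_1}\Bigr)
\end{align*}
```

Here k_1 = 2.5 and k_2 = 1/350, matching the constants defined in the code.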

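# constants

k1 <- 2.5     # divisor that scales the combined penalty in the DA formula
k2 <- 1/350   # speed penalty: control falls to 0 at a speed of sqrt(350), about 18.7
rim_x <- 5.25 # rim centre, matching the basket position used for d_deepest_def
rim_y <- 25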

dist_2d <- function(x1, y1, x2, y2) sqrt((x1 - x2)^2 + (y1 - y2)^2)

zone <- function(x, y) {
  d <- dist_2d(x, y, rim_x, rim_y)
  exp(-d / 20)
}

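control <- function(speed, dist_rim, dribble) {
  # dribble is passed in but is not currently used in the calculation
  if (is.na(speed)) return(0)
  # a fast move close to the rim counts as full control
  if (dist_rim < 14 && speed > 10) return(1)
  # otherwise control decays with the square of speed, clamped to [0, 1]
  co <- 1 - k2 * speed^2
  min(max(co, 0), 1)
}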

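pressure <- function(defender_dists) {
  # only defenders within 5 ft of the ball handler apply pressure
  valid <- defender_dists[!is.na(defender_dists) & defender_dists <= 5]
  n <- length(valid)
  if (n == 0) return(0)
  # more defenders and shorter distances both raise pressure; capped at 1
  pr <- exp(n - sum(valid) / 5) / 10
  min(pr, 1)
}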
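To get a feel for how the pressure term behaves, the short check below evaluates it on a few hand-picked sets of defender distances. The scenario names and distances are invented for illustration, and `pressure()` is repeated only so the snippet runs on its own.

```r
# pressure() repeated from above so this snippet runs on its own
pressure <- function(defender_dists) {
  valid <- defender_dists[!is.na(defender_dists) & defender_dists <= 5]
  n <- length(valid)
  if (n == 0) return(0)
  min(exp(n - sum(valid) / 5) / 10, 1)
}

# Made-up defender distances (feet) to the ball handler
scenarios <- list(
  nobody_close   = c(8, 12, NA),
  one_at_5ft     = c(5, 9),
  one_touching   = c(0, 9),
  two_at_2_and_3 = c(2, 3),
  three_touching = c(0, 0, 0)
)

pr_values <- round(sapply(scenarios, pressure), 3)
pr_values  # 0.000 0.100 0.272 0.272 1.000
```

A single defender can contribute at most e/10, roughly 0.27, so the cap of 1 is only reached when at least three defenders crowd the ball handler.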

space <- function(pwb, attackers, defenders, dist_rim_pwb) {
  # with no team-mates or no defenders, no passing lane is blocked (no penalty)
  if (nrow(attackers) == 0 || nrow(defenders) == 0) return(0)

  # keep only attackers closer to the rim than the player with the ball
  attackers <- attackers %>%
    mutate(dist_rim_att = dist_2d(x, y, rim_x, rim_y)) %>%
    filter(dist_rim_att < dist_rim_pwb)

  if (nrow(attackers) == 0) return(0)

  # for every attacker-defender pair, form the triangle passer-attacker-defender
  # and take its height over the passing lane (area from Heron's formula)
  heights <- c()
  for (i in seq_len(nrow(attackers))) {
    for (j in seq_len(nrow(defenders))) {
      side_ad <- dist_2d(attackers$x[i], attackers$y[i], defenders$x[j], defenders$y[j])
      lane <- dist_2d(attackers$x[i], attackers$y[i], pwb$x, pwb$y)
      side_dp <- dist_2d(defenders$x[j], defenders$y[j], pwb$x, pwb$y)
      s <- (side_ad + lane + side_dp) / 2
      area <- sqrt(max(s * (s - side_ad) * (s - lane) * (s - side_dp), 0))
      h <- if (lane != 0) 2 * area / lane else 0
      heights <- c(heights, h)
    }
  }
  # a defender within 2 ft of a lane gives SP = 1; beyond 6 ft gives SP = 0
  h_min <- min(heights)
  sp <- (6 - h_min) / 4
  min(max(sp, 0), 1)
}
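The triangle height used in space() is the perpendicular distance from a defender to the infinite line through passer and receiver, so a defender standing behind the passer but close to that line also registers as blocking. One possible refinement, which is not part of the formula above, is to measure distance to the passing segment instead. The helper name `dist_to_segment()` and the coordinates below are illustrative assumptions.

```r
# Hypothetical helper: shortest distance from point (px, py) to the segment
# running from (ax, ay) to (bx, by).
dist_to_segment <- function(px, py, ax, ay, bx, by) {
  dx <- bx - ax
  dy <- by - ay
  len2 <- dx^2 + dy^2
  # degenerate segment: passer and receiver share a position
  if (len2 == 0) return(sqrt((px - ax)^2 + (py - ay)^2))
  # position of the projection along the segment, clamped to its two ends
  t <- ((px - ax) * dx + (py - ay) * dy) / len2
  t <- min(max(t, 0), 1)
  sqrt((px - (ax + t * dx))^2 + (py - (ay + t * dy))^2)
}

# Passer at (20, 25), receiver at (10, 25); defender positions are made up
between <- dist_to_segment(15, 27, 20, 25, 10, 25)  # defender beside the lane
behind  <- dist_to_segment(26, 25, 20, 25, 10, 25)  # defender behind the passer
c(between = between, behind = behind)  # 2 and 6
```

The defender behind the passer is 6 ft from the segment, whereas its height above the infinite line is 0, which the triangle-based formula would treat as a fully blocked lane.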

# main function

calculate_dangerousity <- function(df) {
  df %>%
    group_by(id_play, quarter, game_clock) %>%
    group_modify(~{
      on_ball <- .x %>% filter(player_id == "on_ball")
      defenders <- .x %>% filter(team == "def")
      attackers <- .x %>% filter(team == "off" & player_id != "on_ball")

      # group_modify() needs a data frame back, so frames without a ball
      # handler return an empty tibble rather than NULL
      if (nrow(on_ball) == 0) {
        return(tibble(ZO = numeric(0), CO = numeric(0), PR = numeric(0),
                      SP = numeric(0), DA = numeric(0)))
      }

      dist_rim_pwb <- dist_2d(on_ball$x, on_ball$y, rim_x, rim_y)
      z <- zone(on_ball$x, on_ball$y)
      co <- control(on_ball$speed, dist_rim_pwb, on_ball$dribble)
      p <- pressure(defenders$defender_on)
      s <- space(on_ball, attackers, defenders, dist_rim_pwb)

      da <- z * (1 - ((1 - co + p + s) / k1))

      # id_play, quarter and game_clock are grouping keys: group_modify()
      # drops them from .x and adds them back to the result itself
      tibble(
        ZO = z,
        CO = co,
        PR = p,
        SP = s,
        DA = da
      )
    }) %>%
    ungroup()
}

# run model
nba_danger <- calculate_dangerousity(nba)
head(nba_danger)
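As a sanity check on the combined score, the sketch below evaluates the DA line from calculate_dangerousity() at its extremes and at one middling case. The wrapper name `da_value()` and the input values are assumptions chosen for illustration, not outputs from the NBA data.

```r
# Hypothetical wrapper around the DA line in calculate_dangerousity()
da_value <- function(z, co, p, s, k1 = 2.5) {
  z * (1 - (1 - co + p + s) / k1)
}

best   <- da_value(z = 1, co = 1, p = 0, s = 0)  # full control, no pressure, open lanes
worst  <- da_value(z = 1, co = 0, p = 1, s = 1)  # no control, full pressure, blocked lanes
middle <- da_value(z = 0.5, co = 0.8, p = 0.1, s = 0.5)
c(best = best, worst = worst, middle = middle)  # 1.00 -0.20 0.34
```

With k1 = 2.5 the combined penalty can reach 3, so DA dips slightly below zero in the worst case. Wrapping the result in `max(da, 0)` is one option if a strict 0-to-1 scale is wanted.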

Notes

While the proposed dangerousity framework provides a novel adaptation of spatiotemporal quantification to basketball, several limitations exist. First, the model is constructed on a two-dimensional plane, so it neglects the vertical dynamics of the game, such as shot arcs, defender jumping reach, and ball height during passes, which are highly influential in real scenarios. Second, the formula assumes perfectly accurate tracking data and instantaneous updates of player positions and speeds. In practice, optical tracking systems introduce small but meaningful errors and frame delays that could distort the calculated values of control, pressure, and space. Third, simplifications were made to preserve model tractability, such as assuming binary control states and fixed radii for defensive influence, which may not fully capture nuanced human decision-making. Fourth, constants such as k₁ and k₂ were selected theoretically rather than empirically calibrated against observed scoring outcomes. Future work should focus on calibrating these parameters using historical possession-level event data and evaluating predictive validity. Finally, psychological and contextual factors (player fatigue, game state, or team strategy) are not included but could significantly influence offensive “dangerousity” in real NBA play.