Dangerousity is a measure of how likely a team is to score from a given situation [1]. You can build a dangerousity formula from any number of parameters suited to the sport you are studying, and then apply it to freeze frames from a match.
I created a dangerousity formula for basketball using spatiotemporal tracking data from the NBA. In my formula, the higher the dangerousity score, the more likely a team is to score from that situation. Below is the paper I wrote explaining my formula for basketball dangerousity, followed by the code used to calculate it.
Overall Formula:
Dangerousity = (Player with the ball distance from basket score * (Velocity of player with the ball + Openness score of offensive team + Use of dribble score)) / Pressure on ball score

Below I outline each component of the Dangerousity formula and the rationale behind it.
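The formula can be sketched as a small R function. This is an illustrative sketch, not the script used for the analysis. The function and argument names are hypothetical, and the input values are invented for the example.

```r
# Illustrative sketch of the overall Dangerousity formula.
# Function and argument names are hypothetical, not taken from the script.
dangerousity <- function(dist_score, velocity, openness, dribble_score, pressure) {
  # pressure is always 1, 2 or 5, so the division is never by zero
  (dist_score * (velocity + openness + dribble_score)) / pressure
}

# Invented frame: ball handler 15 ft from the basket (score 50 - 15 = 35),
# speed 6, team openness 40, dribble still live (5), defender 3 ft away (pressure 2)
dangerousity(dist_score = 35, velocity = 6, openness = 40,
             dribble_score = 5, pressure = 2)
# [1] 892.5
```

The distance score multiplies the sum of the other three terms. Moving the ball handler closer to the basket therefore scales every other component up, while the pressure score divides the whole product.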
Parts of the formula:
Player with the ball distance from basket score:
How it is calculated: 50 - (distance of the player with the ball from the basket). Subtracting the distance from 50 means that the closer the player with the ball is to the basket, the higher the score for this part of the formula. Using the raw distance would do the opposite: the closer a player is to the basket, the lower their score would be. The value 50 is somewhat arbitrary, and a different number could be used. I chose 50 because only the front half of the NBA court is used for these calculations. The front half is 47 feet long and 50 feet wide. Even its farthest point from the basket, a corner at half court, is only about 48.7 feet away, so using 50 means this part of the formula never produces a negative score.

Rationale for use in formula: The closer the player with the ball is to the basket, the more dangerous they are. Players generally shoot higher percentages closer to the basket. A ball handler near the basket also draws defenders away from their own players to help, which leaves other players on the court more open. One of those players could then receive a pass and take an open shot, which is more likely to go in than a contested shot.
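The calculation can be sketched in R as follows. The basket centre at (5.25, 25) feet is the coordinate the script uses for its distance-to-basket calculation. The function name is hypothetical. The last call checks a corner at half court, which is the point in the front half farthest from the basket.

```r
# Sketch of the distance-from-basket score: 50 minus the Euclidean distance
# (in feet) from the point (x, y) to the basket centre.
basket_score <- function(x, y, basket_x = 5.25, basket_y = 25) {
  dist_to_basket <- sqrt((x - basket_x)^2 + (y - basket_y)^2)
  50 - dist_to_basket
}

basket_score(5.25, 25)    # directly under the basket: 50
basket_score(20.25, 25)   # 15 ft straight out: 35
basket_score(47, 0)       # corner at half court: about 1.34, still positive
```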
Velocity of the player with the ball:
How it is calculated: The speed of the player with the ball is taken from the dataset and used directly in this part of the formula.

Rationale for use in formula: A player travelling at a higher velocity with the ball can be more dangerous because they can be harder to stop. This part of the formula doesn't cause the overall Dangerousity score to fluctuate much, because velocity isn't one of the biggest factors in whether a play is dangerous. A play can still be dangerous when the player with the ball is moving slowly, since they can still make a good pass or take a good shot. It is included as a small factor because it can make a play slightly more dangerous.
Openness score of the offensive team:
How it is calculated: First, I calculated an openness score for each individual player in a play. I then combined these into an overall openness score for the team.

For an individual player to be considered 'open', the closest defender must be 10 feet or more away from them. I chose 10 feet because the NBA classifies a shot as open when the closest defender is 4 to 6 feet away [2]. My calculation, however, considers whether a player without the ball is open. I therefore increased the threshold to 10 feet to allow for the time the ball takes to reach the open player, during which the defender may close some of the distance.

Calculation for an individual player's openness: open or not (1 for open, 0 for not open) * (Distance from basket score + Distance from ball score), where:
Distance from basket score = 50 - distance from the basket
Distance from ball score = 50 - distance from the player with the ball

Calculation for the openness score of the offensive team: sum of the individual openness scores of the offensive players / number of players open in the play. This gives the average openness score of the players who are open, which is the value used in the Dangerousity formula. If no player is open, the script sets this score to 0.

Rationale for use in formula: Open players make a play more dangerous because, if they get the ball, they are more likely to be able to take and make a shot than players who are closely guarded. The open player's distance from basket score is used because the closer an open player is to the basket, the more likely they are to score. Their distance from ball score is used because the closer they are to the player with the ball, the more likely it is that the pass can be completed.
Use of dribble score:
How it is calculated: The dataset records whether the player with the ball has used their dribble. If they haven't used their dribble yet, the use of dribble score is 5; if they have, the use of dribble score is 0.

Rationale for use in formula: Use of dribble isn't a key factor in whether a play is dangerous. A play can still be quite dangerous after the player has used their dribble, because they may have used it to get past defenders or advance the ball. However, combined with the other components of this formula, a dribble that is still available may add a small amount of danger to the play, because the player could use it to get past defenders or get closer to the basket. It therefore has a small value that can bump the Dangerousity score up.
Pressure on the ball score:
How it is calculated: I calculated how close the closest defender was to the player with the ball:
- If the defender was less than two feet away, the pressure score is 5.
- If the defender was between two and four feet away, the pressure score is 2.
- If the defender was four feet or more away, the pressure score is 1, as the ball handler is under no pressure.

I used 1 instead of 0 because the formula divides by the pressure score, and division by zero is undefined. A pressure score of 1 leaves the Dangerousity score unchanged, so a ball handler under no pressure does not reduce the Dangerousity score.

Rationale for use in formula: If the player with the ball is under more pressure, with a defender close to them, the danger of the play is diminished. Even if teammates are open or the ball handler is close to the basket, a close defender makes it harder to score. I therefore used the pressure score to scale back the Dangerousity of plays where the ball handler was closely defended.
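The tiers can be sketched as a small function that mirrors the nested `ifelse()` used in the script. The function name is hypothetical.

```r
# Sketch of the pressure score: nearest-defender distance (feet) -> divisor.
pressure_score <- function(nearest_def) {
  ifelse(nearest_def < 2, 5,        # less than 2 ft: heavy pressure
         ifelse(nearest_def < 4, 2, # 2 ft up to 4 ft: some pressure
                1))                 # 4 ft or more: no pressure, score unchanged
}

pressure_score(c(1.5, 2, 3.9, 4, 12))
# [1] 5 2 2 1 1
```

On this scale, an undivided value of 1785 stays at 1785 with no pressure. It is halved to 892.5 with a defender 2 to 4 feet away, and cut to 357 with a defender closer than 2 feet.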
Extensions to the formula:
Below are some possible extensions that could have been added to the formula had the data been available:

Extension 1: Player with the ball
If the identity of the player with the ball was known, a rating of their ability could have been added to the formula. For example, if Chris Paul had the ball on the perimeter with open teammates, the play would be a lot more dangerous than if Steven Adams had the ball in the same situation. Using the same two players, now consider a play where the player with the ball is under the rim with heavy pressure. This play could be more dangerous with Steven Adams on the ball, because he shoots a higher percentage under the rim with pressure than Chris Paul does. So, if the player with the ball was known, a weighting could have been calculated from the player's passing and shooting skills and added to the formula. The ratings could come from the latest 2K game, or from shooting and assist statistics.

Extension 2: Player who is open
Consider a play where there is an open player in the corner on the three-point line. That sounds like a dangerous play. But what if that player was Andre Drummond, whose career three-point percentage is 12.8%? The play no longer seems as dangerous. On the other hand, if that player was Seth Curry, whose career three-point percentage is 43.5%, the play becomes a lot more dangerous. So, the identity of the open player and their shooting percentage from that area could also have been used as a weighting in the formula.

Extension 3: Numbers advantage
Having more offensive players than defensive players in the play, such as on a fast break, can be a huge advantage and could have been used as a weighting in the formula. However, the dataset contained only one example of this, which I didn't believe was enough to test whether such a weighting would work.
Extension 4: Number of defenders between the player with the ball and an open player
With more time, or had I been able to work out the code more quickly, I could have added the number of defenders between the player with the ball and an open player to the formula. A rectangle drawn between the player with the ball and the open player would define a space, and the count would be the number of defenders inside that space. More defenders in that space would make the play less dangerous, as it would be harder to get the pass to the open player. The same logic could be applied to the space between the player with the ball and the basket, to assess driving lanes.
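One way to implement the rectangle is to treat the pass (or drive) as a line segment from the ball handler to the target. A defender is then counted if their projection falls on that segment and they are within a chosen half-width of it. This is a hypothetical sketch: the function name and the 3-foot half-width are assumptions, not values from the analysis, and the coordinates are invented.

```r
# Hypothetical sketch: count defenders inside a rectangle ("lane") around the
# segment from A (ball handler) to B (open player or basket).
# half_width = 3 ft is an assumed lane half-width, not a value from the analysis.
defenders_in_lane <- function(ax, ay, bx, by, def_x, def_y, half_width = 3) {
  dx <- bx - ax
  dy <- by - ay
  len2 <- dx^2 + dy^2
  stopifnot(len2 > 0)  # A and B must be different points
  # Projection of each defender onto A->B: 0 at A, 1 at B
  t <- ((def_x - ax) * dx + (def_y - ay) * dy) / len2
  # Perpendicular distance from each defender to the line through A and B
  perp <- abs(dx * (def_y - ay) - dy * (def_x - ax)) / sqrt(len2)
  sum(t >= 0 & t <= 1 & perp <= half_width)
}

# Invented example: ball handler at (30, 25), open player at (10, 25)
defenders_in_lane(30, 25, 10, 25, def_x = c(20, 15, 25), def_y = c(26, 35, 23))
# [1] 2

# Driving lane: same ball handler, target is the basket centre (5.25, 25)
defenders_in_lane(30, 25, 5.25, 25, def_x = c(20, 15, 25), def_y = c(26, 35, 23))
```

In the example, the defender at (15, 35) is 10 feet off the passing line, so only the other two are counted. A defender behind the ball handler or beyond the target is also excluded, because their projection falls outside the 0 to 1 range.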
[1] Link D, Lang S, Seidenschwarz P. Real Time Quantification of Dangerousity in Football Using Spatiotemporal Tracking Data. PLoS One. 2016 Dec 30;11(12):e0168768. doi: 10.1371/journal.pone.0168768. PMID: 28036407; PMCID: PMC5201291.
[2] “NBA Contested Shooting Analysis” [Internet]. [Accessed 2023 Nov 2]. Available from: https://jeremylu43.github.io/nba_contested_shooting/#:~:text=For%20those%20unfamiliar%2C%20NBA.com,Tight%20%2D%202%2D4%20Feet
Please note: to execute this code you will need to replace my data with your own. Your data will need a play ID (id_play) for each play, x and y coordinates for each player, which player has the ball, the velocity of the player with the ball, and whether that player has used their dribble.
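The input layout can be sketched with a toy data frame for one play. The layout is inferred from the column names the script reads (id_play, player_id, team, x, y, speed and dribble) and the player labels it filters on (on_ball, off1 to off4, def1 to def5). All values are invented and the team labels are placeholders. Coordinates are in feet on a 94 by 50 court, with the attacking basket at (5.25, 25). Giving only the ball handler a non-zero speed is an assumption, made because the script sums speed over every row of a play.

```r
# Hypothetical single play in the layout the script reads (values invented).
toy <- data.frame(
  id_play   = 1,
  player_id = c("on_ball", "off1", "off2", "off3", "off4",
                "def1", "def2", "def3", "def4", "def5"),
  team      = rep(c("offense", "defense"), each = 5),   # placeholder labels
  x         = c(25, 8, 22, 30, 5, 22, 10, 20, 28, 6),   # feet; front half is x <= 47
  y         = c(25, 3, 45, 10, 40, 24, 6, 40, 12, 35),  # feet; 0 to 50 across the court
  speed     = c(6, rep(0, 9)),   # assumption: only the ball handler has a speed
  dribble   = c(0, rep(NA, 9)),  # 0 = dribble not yet used (read on the on_ball row only)
  stringsAsFactors = FALSE
)

# Every expected player label should appear exactly once per play
expected <- c("on_ball", paste0("off", 1:4), paste0("def", 1:5))
setequal(toy$player_id, expected) && !anyDuplicated(toy$player_id)
```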
# **********************************************************
# NBA
# **********************************************************
#### libraries required
library(dplyr)
library(ggplot2)
library(ggforce)
library(tidyr)
#### this library is to calculate distances from one point to another
library("spatstat.geom")
#### read in data
nba <- read.csv('AT2_nba_data.csv', stringsAsFactors = FALSE)
#### vector of unique play IDs to loop over
unique_plays <- unique(nba$id_play)
#### create plots to visualise plays
#### uncomment this code to see plots of individual plays
# for(i in unique_plays) {
# subset_data <- subset(nba, id_play == i)
# print(summary(subset_data))
#
# p <- ggplot(data = subset(nba, id_play == i)) +
# # court
# geom_rect(aes(xmin = 0, xmax = 94, ymin = 0, ymax = 50), alpha = 0, col = 'black')+
# geom_rect(aes(xmin = 0, xmax = 19, ymin = 19, ymax = 31), alpha = 0, col = 'black')+
# geom_rect(aes(xmin = 0, xmax = 19, ymin = 17, ymax = 33), alpha = 0, col = 'black')+
# geom_segment(x=47,xend=47,y=0,yend=50) +
# geom_segment(x=4,xend=4,y=22,yend=28) +
# geom_circle(aes(x0=5.25, y0=25, r=0.75)) +
# geom_circle(aes(x0=47, y0=25, r=6)) +
# geom_circle(aes(x0=19, y0=25, r=6)) +
# #players
# geom_point(aes(x = x, y = y, col = team), size = 2) +
# #looks
# theme_bw() +
# coord_equal() +
# ggtitle(paste('Play', i))
#
# print(p)
#
# }
#### calculate distance from player to basket
nba <- nba %>%
group_by(id_play) %>%
mutate(
dist_toBasket = sqrt( (x-5.25)^2 + (y-25)^2 )
)
#### Calculate Distance from basket score for dangerousity formula
nba <- nba %>%
mutate(DFBscore = 50 - dist_toBasket)
#### create empty dataframe for distances
distancetable = data.frame(id_play=numeric(0),player_id=character(0),distance=numeric(0))
#### Calculate distances from each offensive player to each defensive player,
#### work out which one is closest for each offensive player,
#### collate all that data into one df and then join to nba df
defender_ids <- c('def1', 'def2', 'def3', 'def4', 'def5')
attacker_ids <- c('off1', 'off2', 'off3', 'off4', 'on_ball')
for(i in unique_plays){
  gamei <- nba %>%
    filter(id_play == i)
  defenders <- gamei %>%
    filter(player_id %in% defender_ids)
  for(att_id in attacker_ids){
    attacker <- gamei %>%
      filter(player_id == att_id)
    #### crossdist() gives the distance to each defender; keep the closest one
    df <- data.frame(id_play = i,
                     player_id = att_id,
                     distance = min(crossdist(attacker$x, attacker$y, defenders$x, defenders$y)))
    distancetable <- bind_rows(distancetable, df)
  }
}
#### join distancetable to nba df
nba <- left_join(nba, distancetable, by = c('id_play' = 'id_play', 'player_id' = 'player_id'))
#### for each off player, 1 for open, 0 for not open
nba <- nba %>%
mutate(open = ifelse(distance >= 10, 1, 0))
#### create empty df for distance from ball for each offensive player
distancefromballtable = data.frame(id_play=numeric(0),player_id=character(0),distancefromball=numeric(0))
#### Calculate distance from player with ball of each offensive player
for(i in unique_plays){
  gamei <- nba %>%
    filter(id_play == i)
  onball <- gamei %>%
    filter(player_id == 'on_ball')
  offball <- gamei %>%
    filter(player_id %in% c('off1', 'off2', 'off3', 'off4'))
  #### crossdist() gives a 1 x 4 matrix: ball handler to each off-ball attacker
  df <- data.frame(id_play = i,
                   player_id = offball$player_id,
                   distancefromball = as.vector(crossdist(onball$x, onball$y, offball$x, offball$y)))
  distancefromballtable <- bind_rows(distancefromballtable, df)
}
#### Join results from above loop to nba dataset
nba <- left_join(nba, distancefromballtable, by = c('id_play' = 'id_play', 'player_id' = 'player_id'))
#### Calculating openness score
#### Calculating distance from ball score
nba <- nba %>%
mutate(distancefromballscore = 50 - distancefromball)
#### Calculate openness score for individual players
nba <- nba %>%
mutate(openness = open * (DFBscore + distancefromballscore))
#### Calculating pressure on ball score
pressure <- nba %>%
filter(player_id == 'on_ball') %>%
mutate(PressureScore = ifelse(distance < 2, 5,
ifelse(distance >=2 & distance < 4, 2, 1)))
pressure <- pressure %>%
select(c('id_play', 'player_id', 'PressureScore'))
#### join pressure score to nba dataset
nba <- left_join(nba, pressure, by = c('id_play' = 'id_play', 'player_id' = 'player_id'))
#### exclude players who aren't in the front half of the court
nba <- nba %>%
filter(x <= 47)
#### Get distance from basket for only player with ball and dribble score for if player has dribbled or not
PWBDFBS <- nba %>%
filter(player_id == 'on_ball') %>%
mutate(PWBDFBS = DFBscore) %>%
mutate(DribbleScore = ifelse(dribble == 0, 5, 0)) %>%
select(c(id_play, player_id, PWBDFBS, DribbleScore))
#### Join this back to main data set
nba <- left_join(nba, PWBDFBS, by = c('id_play' = 'id_play', 'player_id' = 'player_id'))
#### Set all NA values to 0 so they can be summed in summarise()
nba[is.na(nba)] <- 0
# summarise each play based on factors needed for formula
nbaplaydata <- nba %>%
group_by(id_play) %>%
summarise(
PWBDFBS = sum(PWBDFBS),
VelocityofPlayer = sum(speed),
OpenPlayers = sum(open),
UseofDribble = sum(DribbleScore),
SumofOpenness = sum(openness),
PressureonBall = sum(PressureScore)
)
#### Create final numbers for offensive team openness
nbaplaydata <- nbaplaydata %>%
mutate(Openness = SumofOpenness / OpenPlayers)
#### Replace NaN values created by dividing by 0 in openness calculations
nbaplaydata <- nbaplaydata %>%
mutate(
across(everything(), ~replace_na(.x, 0))
)
#### Create Dangerousity score using formula created
nbaplaydata <- nbaplaydata %>%
mutate(dangerousity = (PWBDFBS * (VelocityofPlayer + Openness + UseofDribble)) / PressureonBall)
#### Creating Dangerousity table with only playid and Dangerousity score
dangerousitytable <- nbaplaydata %>%
select(c('id_play', 'dangerousity'))
#### Arrange table from highest to lowest Dangerousity score
dangerousitytable <- dangerousitytable %>%
  arrange(desc(dangerousity))
#### Plot of highest and lowest dangerousity play score
#### highest and lowest dangerousity plays
playstoplot <- c(2897581, 8915350)
#### uncomment this code to plot highest and lowest dangerousity plays
# for(i in playstoplot) {
# subset_data <- subset(nba, id_play == i)
# print(summary(subset_data))
#
# p <- ggplot(data = subset(nba, id_play == i)) +
# # court
# geom_rect(aes(xmin = 0, xmax = 94, ymin = 0, ymax = 50), alpha = 0, col = 'black')+
# geom_rect(aes(xmin = 0, xmax = 19, ymin = 19, ymax = 31), alpha = 0, col = 'black')+
# geom_rect(aes(xmin = 0, xmax = 19, ymin = 17, ymax = 33), alpha = 0, col = 'black')+
# geom_segment(x=47,xend=47,y=0,yend=50) +
# geom_segment(x=4,xend=4,y=22,yend=28) +
# geom_circle(aes(x0=5.25, y0=25, r=0.75)) +
# geom_circle(aes(x0=47, y0=25, r=6)) +
# geom_circle(aes(x0=19, y0=25, r=6)) +
# #players
# geom_point(aes(x = x, y = y, col = team), size = 2) +
# #looks
# theme_bw() +
# coord_equal() +
# ggtitle(paste('Play', i))
#
# print(p)
#
# }