Dangerosity is determined by four key factors: defensive pressure, offensive pressure, zone of play and the player’s speed and dribble.
Defensive pressure is assessed based on the number of defenders within a 5-foot radius of the player with the ball, as well as the number of defenders positioned in front of that player relative to the hoop. Offensive pressure considers the court area covered by the players of each team. Zone of play takes into account the player’s position relative to the hoop, and “fast-ness” considers the player’s speed and use of the dribble.
Variables used for this analysis are explained in Table 1.
| Variable | Description |
|---|---|
| id_play | Unique IDs of all plays |
| dribble | 0 or 1 indicator if player has used their dribble |
| player_id | Labels the ball, the player with the ball, the other 4 attacking players, and the 5 defenders |
| x | X coordinate in feet |
| y | Y coordinate in feet |
| speed | Speed of player with ball in feet per second |
| team | Offense or defense |
| distance_from_on_ball* | Distance between a player and player with the ball |
| angle_from_hoop* | Angle between a player, centre of hoop and player with the ball |
| chull_area* | Area of play by offence or defense team |
| distance_to_hoop* | Distance from player with the ball to the hoop |
| fast* | 0.5 if player has not yet used up their dribble (dribble = 0) and speed >= 10 ft/s; 0 otherwise |
We start by importing the provided dataset and defining custom functions to avoid repeating code.
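# Packages ----
library(dplyr)    # data manipulation and the %>% pipe
library(tidyr)    # pivot_wider()
library(ggplot2)  # plotting
library(ggforce)  # geom_circle()
library(splancs)  # areapl()
# Import data ----
nba <- read.csv('AT2_nba_data.csv', stringsAsFactors = FALSE)
# Functions -----
# Angle (degrees) at vertex (x1, y1) between the hoop centre (5.25, 25) and the point (x2, y2)
def_angle <- function(x1, y1, x2, y2) {
  d1 <- sqrt((x1 - x2)^2 + (y1 - y2)^2)  # distance from (x1, y1) to (x2, y2)
  d2 <- sqrt((x1 - 5.25)^2 + (y1 - 25)^2)  # distance from (x1, y1) to the hoop
  d3 <- sqrt((x2 - 5.25)^2 + (y2 - 25)^2)  # distance from (x2, y2) to the hoop
  angle <- acos((d1^2 + d2^2 - d3^2) / (2*d1*d2)) * (180/pi)  # cosine rule to get the angle
  return(angle)
}
# Area of the convex hull spanned by a set of (x, y) positions
def_chull_area <- function(x, y) {
  chull_rows <- chull(x, y)  # indices of the points on the convex hull
  chull_data <- cbind(x[chull_rows], y[chull_rows])  # coordinates of the hull vertices
  area <- areapl(chull_data)  # area within the convex hull
  return(area)
}
# Euclidean distance between (x1, y1) and (x2, y2)
def_distance <- function(x1, y1, x2, y2) {
  dist <- sqrt((x1 - x2)^2 + (y1 - y2)^2)  # Euclidean distance formula
  return(dist)
}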
Next, we want to get a visualization of the locations of all players and the ball, overlaid on the court.
# Plot play ----
# The plays in the dataset
unique(nba$id_play)
court <- ggplot() +
  # court
  geom_rect(aes(xmin = 0, xmax = 94, ymin = 0, ymax = 50), alpha = 0, col = 'black') +
  geom_rect(aes(xmin = 0, xmax = 19, ymin = 19, ymax = 31), alpha = 0, col = 'black') +
  geom_rect(aes(xmin = 0, xmax = 19, ymin = 17, ymax = 33), alpha = 0, col = 'black') +
  geom_segment(aes(x = 47, xend = 47, y = 0, yend = 50)) +  # half-court line
  geom_segment(aes(x = 4, xend = 4, y = 22, yend = 28)) +  # backboard
  geom_circle(aes(x0 = 5.25, y0 = 25, r = 0.75)) +  # hoop
  geom_circle(aes(x0 = 47, y0 = 25, r = 6)) +  # centre circle
  geom_circle(aes(x0 = 19, y0 = 25, r = 6)) +  # free-throw circle
  geom_segment(aes(y = 3, x = 0, yend = 3, xend = 14)) +  # Left
  geom_segment(aes(y = 47, x = 0, yend = 47, xend = 14)) +  # Right
  geom_curve(aes(y = 3, x = 14, yend = 47, xend = 14), curvature = .75,
             angle = 90) +  # Curve
  # looks
  theme_bw() +
  coord_equal()
# Plot a single play:
plot_play <- 993980
court +
  # players
  geom_point(data = subset(nba, id_play == plot_play),
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(paste('Play', plot_play))  # title added here, after plot_play has been defined
Defensive pressure is assessed by considering both the number of defenders positioned in front of the player with the ball and the proximity of each defender to that player.
To determine whether a defender is positioned in front of the player with the ball, a straight line is drawn from the centre of the hoop to the player holding the ball. The angle between the defender and the centre of the hoop, with the ball handler’s (x,y) coordinates serving as the vertex, is then calculated using the cosine rule. If the defender is positioned in front of the player, the angle will be less than 90 degrees. Conversely, if the defender is behind the player, the angle will exceed 90 degrees (Figure 1). A higher number of defenders standing in front of the player leads to greater defensive pressure. This rests on the assumption that defenders in front are better positioned to defend against a scoring opportunity or pass than defenders positioned behind.
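The front/behind test can be illustrated with a minimal sketch. It uses the dot product of the two vectors that start at the ball handler, which gives the same angle as the cosine rule. The function name `angle_at_ball` and all player coordinates are hypothetical; only the hoop centre (5.25, 25) comes from the report.

```r
# Hoop centre, as used in the report's def_angle() function
hoop <- c(5.25, 25)

# Angle (degrees) at the ball handler between the hoop and another player.
# Hypothetical helper: dot-product form, equivalent to the cosine rule.
angle_at_ball <- function(ball, other, hoop) {
  to_hoop  <- hoop - ball
  to_other <- other - ball
  cos_theta <- sum(to_hoop * to_other) /
    (sqrt(sum(to_hoop^2)) * sqrt(sum(to_other^2)))
  # clamp to [-1, 1] to guard against floating-point overshoot
  acos(max(-1, min(1, cos_theta))) * 180 / pi
}

# Hypothetical positions (feet)
ball       <- c(25, 25)  # ball handler, straight out from the hoop
def_front  <- c(20, 27)  # defender between the ball handler and the hoop
def_behind <- c(30, 22)  # defender trailing the play

angle_front  <- angle_at_ball(ball, def_front, hoop)
angle_behind <- angle_at_ball(ball, def_behind, hoop)

angle_front < 90   # TRUE: counted as "in front"
angle_behind > 90  # TRUE: counted as "behind"
```

A defender exactly level with the ball handler would sit at 90 degrees and, with a strict `< 90` test, would not be counted as in front.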
The distance of each defender from the player with the ball is calculated using the Euclidean distance formula. Under the assumption that proximity increases defensive pressure, we counted the number of defenders within a 5-foot radius of the player on-ball.
Defenders far away from the player or behind them may be indicative of a situation where they are out of position or attempting to catch up with the player. In these instances, defensive pressure is typically lower, leading to an increased level of danger. In contrast, when the defenders are not only close to the player but also positioned in front of them, it becomes more challenging for the player to break free from the defenders to create a scoring opportunity. The player may be forced to pass the ball out to teammates at the three-point line, decreasing the level of danger.
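Defensive pressure is modelled with the equation:
$$\text{defensive pressure} = 0.15\,\frac{n_{\text{close}}}{5} + 0.15\,\frac{n_{\text{front}}}{5} + 0.7\,\frac{n_{\text{front,close}}}{3}$$
where $n_{\text{close}}$ is the number of defenders within 5 feet of the player with the ball, $n_{\text{front}}$ is the number of defenders in front of that player, and $n_{\text{front,close}}$ is the number of defenders who are both close and in front.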
Different weights were assigned to the three components, with a greater emphasis placed on the number of close defenders who are also in front of the player. We chose to divide the number of close defenders in front by 3 on the assumption that it is not ideal for all the defenders to surround the player on-ball, leaving the other four attacking players unmarked.
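To see how the weighting behaves, here is a small sketch that applies the weights from the report’s code (0.15, 0.15 and 0.7) to two made-up situations. The function name `defensive_pressure` and the defender counts are illustrative assumptions, not values from the dataset.

```r
# Weighted defensive pressure, using the weights from the report's code.
# Hypothetical helper for illustration only.
defensive_pressure <- function(count_close, count_front, count_front_close) {
  0.15 * (count_close / 5) +
    0.15 * (count_front / 5) +
    0.70 * (count_front_close / 3)
}

# Hypothetical scenario 1: one defender in front, nobody within 5 feet
open_lane <- defensive_pressure(count_close = 0, count_front = 1,
                                count_front_close = 0)

# Hypothetical scenario 2: four defenders in front, two of them within 5 feet
contested <- defensive_pressure(count_close = 2, count_front = 4,
                                count_front_close = 2)

open_lane  # 0.03
contested  # about 0.647
```

Because the close-and-in-front count is divided by 3 rather than 5, the score can exceed 1 if more than three defenders collapse on the ball handler.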
# Defensive Pressure ----
dist_threshold <- 5 # threshold for close defenders (5 feet in this case)
def_pressure <- nba %>%
  group_by(id_play) %>%
  mutate(
    ball_x = x[player_id == "on_ball"], # x coordinate of player on ball
    ball_y = y[player_id == "on_ball"], # y coordinate of player on ball
    distance_from_on_ball = def_distance(x, y, ball_x, ball_y), # distance between each player and player on ball
    angle_btw_hoop_players = def_angle(ball_x, ball_y, x, y) # angle between hoop, player on ball (vertex) and each player
  ) %>%
  summarise(
    # na.rm = TRUE guards against the undefined (NaN) angle of the ball handler with themselves
    count_close = sum(team == 'def' & distance_from_on_ball <= dist_threshold, na.rm = TRUE), # defenders within threshold
    count_front = sum(team == 'def' & angle_btw_hoop_players < 90, na.rm = TRUE), # defenders in front
    count_front_close = sum(team == 'def' & distance_from_on_ball <= dist_threshold & angle_btw_hoop_players < 90, na.rm = TRUE), # defenders close and in front
    def_pressure = 0.15*(count_close/5) + 0.15*(count_front/5) + 0.7*(count_front_close/3)) # def pressure formula, one row per play
Offensive pressure considers the area of play of the offence team relative to the defense team.
The court coverage of the offensive and defensive teams in each play is determined by calculating the area of the convex hull spanned by each team’s player positions. The difference between these values reflects how effectively the offense is positioned to cover more ground while squeezing the defense into a smaller area.
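As a rough illustration of the area calculation, the sketch below computes a convex hull area with base R’s `chull()` and the shoelace formula. This is a stand-in for the `areapl()` call used in the report, and the function name `hull_area` and the player positions are hypothetical.

```r
# Area of the convex hull of a set of points.
# Hypothetical helper: base R chull() plus the shoelace formula.
hull_area <- function(x, y) {
  idx <- chull(x, y)       # indices of the hull vertices, in order around the hull
  hx <- x[idx]
  hy <- y[idx]
  nx <- c(hx[-1], hx[1])   # next vertex, wrapping back to the first
  ny <- c(hy[-1], hy[1])
  abs(sum(hx * ny - nx * hy)) / 2
}

# Hypothetical positions (feet): four players on the corners of a
# 20 ft x 10 ft rectangle and a fifth player inside it
x <- c(10, 30, 30, 10, 20)
y <- c(20, 20, 30, 30, 25)

hull_area(x, y)  # 200: the interior player does not change the hull
```

The example shows one property of this measure: a player standing inside the hull contributes nothing to the team’s area.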
A perfectly positioned man-to-man defense, in which the offensive and defensive areas are exactly the same, should not produce a score of 0, because having every defender mark their player perfectly still carries some risk. To account for this, we assigned a weight of 0.85 to the offensive area, so that equal areas still contribute to the level of danger.
Additionally, we took the absolute value to address scenarios where the area of defense is larger than the area of offense. In such cases, the defense may find themselves being pushed farther out and become more spread apart, allowing the offensive team to dominate the area within. These situations are more dangerous, so the absolute value ensures that they raise the score rather than reduce it.
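The effect of the 0.85 weight and the absolute value can be checked on a few made-up areas. This sketch reuses the expression from the report’s code; the function name `chull_score_example` and the areas (in square feet) are hypothetical.

```r
# Offensive pressure score from team hull areas, as in the report's code:
# |0.85 * offensive area - defensive area| / 1000.
# Hypothetical helper for illustration only.
chull_score_example <- function(off_area, def_area) {
  abs(0.85 * off_area - def_area) / 1000
}

equal_areas  <- chull_score_example(off_area = 600, def_area = 600)
wide_offence <- chull_score_example(off_area = 900, def_area = 400)
wide_defence <- chull_score_example(off_area = 400, def_area = 900)

equal_areas   # 0.09: equal areas no longer score 0
wide_offence  # 0.365
wide_defence  # 0.56: a stretched defense also raises the score
```

With this form, the score is 0 only when the defensive area equals 85% of the offensive area.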
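Offensive pressure is modelled with the equation:
$$\text{offensive pressure} = \frac{\lvert 0.85\,A_{\text{off}} - A_{\text{def}} \rvert}{1000}$$
where $A_{\text{off}}$ and $A_{\text{def}}$ are the convex hull areas of the offensive and defensive teams in square feet.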
# Offensive Pressure ----
# get chull area by team and play (0-1)
off_pressure <- nba %>%
  group_by(id_play, team) %>% # area of defense and offense team
  summarise(chull_area = def_chull_area(x, y), .groups = 'drop_last') %>% # one area per team, still grouped by play
  pivot_wider(names_from = team, values_from = chull_area) %>% # one column per team for the mutate below
  mutate(chull_score = abs((0.85*off - def) * (1/1000))) # chull formula
Zone
Distance from hoop refers to the player’s proximity to the centre of the hoop and is calculated using the Euclidean distance formula, with the hoop’s centre coordinates fixed at (5.25, 25). We found an inverse relationship between distance from hoop and the level of danger. Ignoring other influencing factors and assuming that there are no defensive players in the way, the level of danger is highest when the player with the ball is positioned just under the hoop. In these situations, the player is able to attempt a high-percentage shot or finish a fast break, assuming that accuracy is highest closest to the hoop. As distance from hoop increases, the level of danger gradually decreases, on the assumption that accuracy usually decreases further from the hoop. The team will need to make more moves to bring the ball closer to the hoop for a scoring opportunity. Another assumption built into this equation is that the level of danger never reaches 0, however far the player is from the hoop: we have decided not to ignore the 1% chance that the player manages to score out of pure luck.
We model this with the equation:
$$\text{zone} = 0.5^{x}$$
where x is the distance in feet between the player on ball and the centre of the hoop.
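To get a feel for how quickly this score falls off, the sketch below evaluates the decay used in the report’s code (0.5 raised to the distance) at a few hypothetical distances. The function name `zone_score` is an assumption made for illustration.

```r
# Zone score as in the report's code: 0.5 raised to the distance in feet.
# Hypothetical helper for illustration only.
zone_score <- function(dist_ft) {
  0.5^dist_ft
}

# Hypothetical distances from the hoop (feet)
d <- c(0, 1, 2, 5, 10)
scores <- zone_score(d)

data.frame(distance_ft = d, zone_score = scores)
```

The score is 1 directly under the hoop and halves with every additional foot, so it is already below 0.001 at 10 feet while never reaching exactly 0.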
Fast
When considering the “fast-ness” of a player, we consider both the player’s speed and whether the player is still actively dribbling the ball. This is on the assumption that the indicator “dribble = 1” indicates that the player’s dribble has ended. A “fast” player increases the level of danger. For instance, if there are no defenders in the way and the offensive player is sprinting at a speed of 16 ft/s while maintaining control of the dribble, it is very likely that he will be able to reach the basket quickly for a scoring attempt without encountering any steals or blocks by the opponents. In such scenarios, the level of danger is elevated.
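The rule can be sketched as a small vectorised function that mirrors the condition in the report’s code (dribble still available and speed of at least 10 ft/s). The function name `fast_score` and the three example players are hypothetical.

```r
# Fast score as in the report's code: 0.5 when the dribble has not ended
# (dribble == 0) and speed is at least 10 ft/s, otherwise 0.
# Hypothetical helper for illustration only.
fast_score <- function(dribble, speed) {
  ifelse(dribble == 0 & speed >= 10, 0.5, 0)
}

# Hypothetical ball handlers:
#   1. still dribbling, sprinting at 16 ft/s
#   2. still dribbling, walking at 6 ft/s
#   3. dribble ended, moving at 16 ft/s
fast_values <- fast_score(dribble = c(0, 0, 1), speed = c(16, 6, 16))
fast_values  # 0.5 0.0 0.0
```

Only the first player counts as fast: speed alone is not enough once the dribble has ended.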
# zone of play ----
dist_toHoop <- nba %>%
  group_by(id_play) %>%
  summarise(dist_toHoop = def_distance(x[player_id == 'on_ball'], y[player_id == 'on_ball'], 5.25, 25)) %>% # distance from player on ball to hoop
  mutate(dist_toHoop_score = 0.5^dist_toHoop) # formula
# get fast scores ----
fast <- nba %>%
  filter(player_id == 'on_ball') %>%
  group_by(id_play) %>%
  summarise(fast = ifelse(dribble == 0 & speed >= 10, 0.5, 0)) # fast if dribble not finished and speed at least 10 ft/s
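Bringing all these components together, we define the dangerosity formula as:
$$\text{dangerosity} = 0.25\,(\text{offensive pressure}) + 0.4\,(\text{zone}) - 0.25\,(\text{defensive pressure}) + 0.1\,(\text{fast}) + 0.5$$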
The higher scoring probability that comes with being close to the hoop is accounted for in the equation by giving zone the highest weight. The lowest weight is given to fast: although it is a factor worth considering, in most situations it is less important than offensive and defensive pressure. We add a constant value (0.5) to keep the scores above 0 for easier comparison.
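As a worked illustration of how the components combine, the sketch below applies the weights from the report’s code to two made-up plays. The function name `dangerosity_score` and every component value are hypothetical and are not taken from the dataset.

```r
# Dangerosity as in the report's code:
# 0.25*offensive + 0.4*zone - 0.25*defensive + 0.1*fast + 0.5.
# Hypothetical helper for illustration only.
dangerosity_score <- function(chull_score, zone_score, def_pressure, fast) {
  0.25 * chull_score + 0.4 * zone_score - 0.25 * def_pressure +
    0.1 * fast + 0.5
}

# Hypothetical play 1: ball handler 1 ft from the hoop, barely defended, fast
open_layup <- dangerosity_score(chull_score = 0.3, zone_score = 0.5,
                                def_pressure = 0.03, fast = 0.5)

# Hypothetical play 2: ball handler 10 ft out, heavily defended, not fast
smothered <- dangerosity_score(chull_score = 0.1, zone_score = 0.001,
                               def_pressure = 0.65, fast = 0)

open_layup  # 0.8175
smothered   # about 0.363
```

Defensive pressure is the only component that enters with a negative sign, which is why the 0.5 offset is needed to keep heavily defended plays above 0.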
# dangerosity score -----
dangerosity <- off_pressure %>%
  right_join(dist_toHoop, by = 'id_play') %>% # combine dist score with off pressure by play id
  right_join(def_pressure, by = 'id_play') %>% # combine defensive pressure
  right_join(fast, by = 'id_play') %>% # combine fast scores
  select(chull_score, dist_toHoop_score, def_pressure, fast) %>% # select the four components
  mutate(dangerosity = (0.25*chull_score + 0.4*dist_toHoop_score - 0.25*def_pressure + 0.1*fast + 0.5)) # combine to get dangerosity scores for each play
# Plot a single play:
plot_play <- 3421490
court +
  # players
  geom_point(data = subset(nba, id_play == plot_play),
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(paste('Play', plot_play))  # title reflects the play being plotted
In this play, the player with the ball is just under the hoop with the defenders positioned behind him. This suggests that the player has an opportunity to score with little pressure from the defense. This play had a dangerosity score of 0.92, which captures the high level of danger in this situation well.
# Plot a single play:
plot_play <- 966140
court +
  # players
  geom_point(data = subset(nba, id_play == plot_play),
             aes(x = x, y = y, col = team), size = 2) +
  ggtitle(paste('Play', plot_play))  # title reflects the play being plotted
In contrast, the player with the ball here is closely guarded with defenders directly in front of him within a close distance. The player is thus forced to pass out to the players on the three-point line to reset the play. This resulted in a lower dangerosity score of 0.33.