Decisions made during the final moments of a game often determine the outcome — a recurring theme present in almost all sports. In the National Basketball Association (NBA), this is especially salient, where many games are decided by the last possession, and a single decision can determine a win or a loss. It is evident that fouling significantly impacts the result of a basketball game, in more ways than one, but is especially important at the end of games. A controversial endgame strategy endorsed by many is to deliberately foul the opposing team when leading by 3 points near the end of the game. The rationale behind this tactic is to force the opponent to earn points from free throws, limiting them to a maximum of 2 and preventing a game-tying three-point shot attempt. This approach not only maintains the lead for the fouling team but also grants them possession with time remaining on the clock.
Our project aims to explore the following questions:
To understand end-game strategies such as this one, enhance team performance, and ultimately improve game outcomes, we were interested in understanding how late-game fouls impact the result of an NBA game. This analysis aims to help coaches, players, and analysts make more informed decisions during late-game situations by providing data-driven insights into the effects of fouling strategies. We found that fouling may not be a good strategy when the opponent has been shooting well at the free-throw line earlier in the game. However, in general, there is no clear answer as to whether or not teams should foul late in the game. Nevertheless, our work lays a foundation for further analysis and research into end-game strategies in basketball.
NBA play-by-play and box score data were scraped from the hoopR package for seasons
from 2003 to 2023. The data was filtered to capture late-game scenarios,
specifically regular season and postseason games where the game clock
was within 24 seconds and the score difference was exactly 3 points in
the fourth quarter or any overtimes. A binary variable,
foul, was created to flag instances of teams fouling in
these critical moments. The outcome of each game was determined, and an
indicator variable success was added to denote whether the
leading team ultimately won the game. Game-level statistics such as
field goal percentage, three-point percentage, and free throw percentage
served as a proxy for team strength.
The descriptions for all of the key variables of interest are as follows:
| Variable | Description |
|---|---|
game_id |
Unique game ID |
season |
NBA season (year) |
foul |
Whether or not the leading team fouled (1 = yes) |
qtr |
Game period (4 = 4th quarter, 5 = OT1, 6 = OT2, etc) |
leading_team |
Team up by 3 points |
winner |
Team that won the game |
success |
Whether or not the leading team won the game (1 = yes) |
home_team_id |
Unique home team identification number |
away_team_id |
Unique away team identification number |
leading_fgpct |
Field goal percentage of the leading team in the game |
opponent_fgpct |
Field goal percentage of the opponent team in the game |
leading_3pct |
Three-point percentage of the leading team in the game |
opponent_3pct |
Three-point percentage of the opponent team in the game |
leading_FTpct |
Free throw percentage of the leading team in the game |
opponent_FTpct |
Free throw percentage of the opponent team in the game |
clock_seconds |
Seconds left until the end of the period |
Out of the 5428 games, only 306 games (~5.6% of the time) involved the leading team fouling when up by 3 points. After removing poor data, such as missing values (NA’s), our final dataset used for analysis comprised of 5418 games.
Figure 1 displays the frequency of fouls committed by leading teams during late-game situations (with 24 seconds left) across the past 21 seasons. This frequency was calculated by dividing the number of games with fouls by the total number of games in each season. The huge fluctuations may be attributed to the small sample of data with fouls, but we can still observe a decreasing trend in fouling frequency.
On average, the leading teams fouled with about 13.5 seconds remaining on the clock. As depicted in Figure 2, the occurrence of fouls committed by the leading team decreased steadily as the game clock approached 0. This decline in foul frequency is likely attributed to mounting game pressure and strategic considerations regarding the risks involved.
In our dataset, teams in the lead with 24 seconds left in the period won 93.25% of the time. We also compared the success rates of leading teams based on whether they fouled late in the game. The results showed a minimal difference in success rates between the two approaches: leading teams that did not foul won 93% of the time, slightly surpassing the 92% success rate for those opting to foul (Figure 3). This narrow margin implies that, in such tight late-game scenarios, the decision to foul or not foul may not be the decisive factor to the chances of winning the game. It should be noted that this comparison does not account for variables such as team strength and other relevant factors — and importantly, we are unable to determine if the late-game fouls were deliberate or inadvertent.
In this project, we developed a Bayesian binomial regression model
using the brms package that implements the model using the
probabilistic programming language Stan (4 chains, 2000 iterations
each). This model was designed to predict the success of the leading
team, defined as winning the game, while accounting for various
predictive factors. These included home-team advantage (i.e. whether the
leading team is the home team or away team) and in-game statistical
performance such as free-throw, field-goal, and three-point percentage
for both the leading and opposing teams. Importantly, interaction terms
between fouls and the opponents’ shooting percentages were also included
to assess the impact of fouling under different conditions.
The Bayesian binomial regression model assumes that the outcome
variable, success, follows a binomial distribution, which
is appropriate given the binary nature of the outcome (win or lose). The
model incorporates prior distributions on the parameters; without strong
pre-existing hypotheses about the parameters’ values, we decided to rely
on the model’s default prior distributions. Typically, these include
normal distributions centered at zero for intercepts and regression
coefficients, suggesting no initial bias towards positive or negative
effects. We opted for a Bayesian approach for another key reason: to
obtain 95% credible intervals and full posterior distributions for the
intercept and coefficient estimates, rather than just singular values.
This allows us to gain a more comprehensive understanding of their
complexity and better evaluate the uncertainty of our model results.
We employed leave-one-season-out cross-validation to assess our model’s effectiveness and its ability to predict success. This technique provides insights into the model’s misclassification rate and its generalizability to new seasons of unseen game data. Additionally, we computed the model’s Brier score, which measures prediction accuracy by quantifying the squared difference between the predicted success probability and the actual game outcome.
Our model revealed a positive coefficient for the foul
variable (1.23), however there is substantial variance (CI: [-4.04,
6.57]). In general, higher in-game statistics for the leading team are
associated with a higher probability of success. On the contrary, higher
opponent FG%, 3pt%, and FT% are associated with a lower probability of
success.
Interestingly, the effectiveness of fouling as a strategic tactic appears to be contingent upon the opponent’s performance, particularly concerning their accuracy at the free-throw line during the game (Figure 3). Specifically, when opponents have been shooting well at the free-throw line throughout the game, fouling may not yield the desired strategic advantage. However, we should note that there is high uncertainty surrounding this claim, as illustrated by the overlapping 95% credible intervals.
On the other hand, when opponents have been shooting well from the field, our model fails to offer convincing evidence in favor of fouling as an effective strategy. Notably, interactions between fouls and field goal percentage or three-point percentage indicate that the estimated coefficients center around zero with substantial variability, suggesting a lack of a significant impact on success. Due to high levels of uncertainty in our model results, these findings underscore the nuanced dynamics surrounding fouling strategies in basketball and imply that there may be a clear-cut guideline as to when, or if, fouling in late-game situations is effective.
Leave-one-season-out cross-validation revealed a misclassification rate of 6.98% and a Brier score of 6.26%. These results indicate that our model performed adequately, particularly considering how the occurrence of late-game fouls is rare in our the dataset. Additionally, we found no notable discrepancies in holdout performance across seasons.
Our project investigated the influence of late-game fouls on the outcome of NBA games in seasons from 2003 to 2023. Specifically, we examined the final 24 seconds of a game when a team is up by exactly 3 points. In our Bayesian binomial regression model, we accounted for home-team advantage, in-game statistics for both teams, and other relevant interactions between the variables. Our findings suggest that fouling may not be advantageous for the leading team, particularly when the opponent has demonstrated proficiency in free-throw shooting during the game. However, the efficacy of fouling remains uncertain when the opponent has been shooting well from the field or from the three-point line.
It is important to recognize the limitations of our analyses. First, late-game fouls are rare events in the NBA, which poses a challenge in developing a comprehensive model that can accurately predict their occurrence and impact. Additionally, not all fouls result in shooting opportunities (e.g., loose-ball fouls, non-shooting fouls), complicating the predictive accuracy of our model. Furthermore, our reliance on in-game statistics as a proxy for team strength may introduce biases, as these statistics may not fully capture team dynamics or team strength in general. Therefore, while we have attempted to control for various factors that may be important, other variables such as team ratings or the presence of specific players on the court may influence late-game scenarios and game outcomes. Despite these limitations, this project represents a significant step towards better understanding late-game strategies in the NBA. Nevertheless, further refinement and validation of our model are essential to tackle these constraints and improve our capacity to recommend a definitive late-game strategy for teams.
In the future, we would like to expand the model to include additional contextual factors such as player-specific performance metrics, team dynamics, and other situational variables to enhance predictive accuracy and model inference. We would also like to create interactive visualization tools to allow stakeholders, including coaches, analysts, and fans, to explore and interpret the model’s predictions and insights in an intuitive and user-friendly manner. Additionally, we may want to implement adaptive learning algorithms to continually update and refine the model’s performance over time as new data becomes available, ensuring its relevance as the league evolves. Although the occurrence of fouling in late-game scenarios is even more rare in the WNBA and NCAA, extending this analysis to other leagues could offer intriguing insights into the potential effectiveness of fouling strategies outside of the NBA.