This report seeks to answer the following question:
Is there a relationship between the number of points a basketball player scores per game and the number of times per game that player turns the ball over? (In basketball, a turnover is a change of possession due to some kind of mistake, such as committing an offensive foul or having the ball stolen.)
We will be using a data set called bball_data obtained from https://www.basketball-reference.com/. It contains per-game averages for the 245 NBA players from the 2022-23 season who played in the minimum number of games (58 out of 82) to be eligible for the scoring title. There are 29 variables for each player; the relevant ones in this report are PTS (the average number of points scored by the player per game), TOV (the average number of turnovers committed by the player per game), and MP (the average number of minutes played per game by the player). The full data set can be viewed below:
Throughout, we will need the functionality of the tidyverse package, mainly to create visualizations.
library(tidyverse)
The problem we are investigating deals with the relationship between a player’s scoring output and the frequency with which the player turns the ball over. It seems reasonable to suggest that the more points an NBA player scores, the better player he is. Furthermore, the better player he is, the less often we might expect him to turn the ball over. We might thus expect an inverse relationship between points per game and turnovers per game, meaning that as points per game increases, turnovers per game decreases. We can examine this hypothesis with a scatter plot:
ggplot(data = bball_data) +
  geom_point(mapping = aes(x = PTS, y = TOV)) +
  labs(x = "points per game",
       y = "turnovers per game",
       title = "Turnovers vs. Points Scored",
       caption = "data obtained from basketball-reference.com")
This scatter plot suggests that our conjecture is incorrect. It seems that the more points a player scores, the more often he turns the ball over. In fact, the player with the most turnovers per game also averaged more than 25 points per game. Why might this be? One explanation is that the players who score the most points per game are usually the ones who play the most minutes. After all, the more time a player spends on the court, the more opportunities he has to score. However, more playing time also means more opportunities to turn the ball over. We can test this hypothesis by color-coding the points in the above scatter plot according to each player's average number of minutes played per game.
ggplot(data = bball_data) +
  geom_point(mapping = aes(x = PTS, y = TOV, color = MP)) +
  labs(x = "points per game",
       y = "turnovers per game",
       color = "minutes played per game",
       title = "Turnovers vs. Points Scored",
       caption = "data obtained from basketball-reference.com")
This time, our hunch is supported. Moving from left to right in the scatter plot, the points shift from dark to light, and according to the legend, this means that the number of minutes played per game increases as well.
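Another way to take playing time into account is to rescale each player's averages to a common amount of court time. The sketch below assumes per-36-minute rates, a common basketball convention that this report does not itself compute; `add_per36` and the new columns `PTS36` and `TOV36` are hypothetical names introduced only for illustration.

```r
# Hypothetical helper (not part of the original analysis): rescale per-game
# averages to per-36-minute rates so that players with different playing
# time can be compared on an equal footing. Assumes MP > 0 for every row,
# which should hold for players who met the games-played minimum.
add_per36 <- function(df) {
  df$PTS36 <- df$PTS / df$MP * 36  # points per 36 minutes
  df$TOV36 <- df$TOV / df$MP * 36  # turnovers per 36 minutes
  df
}

# Possible usage with the report's data (not run here):
# bball36 <- add_per36(bball_data)
# ggplot(data = bball36) +
#   geom_point(mapping = aes(x = PTS36, y = TOV36))
```

If minutes played accounts for most of the pattern, the per-36 scatter plot should show a weaker trend than the per-game one; how much weaker is something the data would have to show.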
In summary, we can conclude that a player’s points per game and turnovers per game are positively related, meaning that as one increases, so does the other. Moreover, our data suggest a reason for this. Points per game and turnovers per game both appear to be explained largely by a third variable: minutes played per game. Taking minutes played into account helps us see why prolific scorers often commit a lot of turnovers.
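The "third variable" argument can be illustrated with simulated data. The sketch below uses made-up numbers (not the 2022-23 data) in which minutes played drives both points and turnovers, while points and turnovers have no direct link to each other. It compares the raw correlation of PTS and TOV with a partial correlation, computed by correlating the residuals left after regressing each variable on MP. The data frame `sim` and the coefficients used to generate it are assumptions chosen only for illustration.

```r
# Simulated illustration only: these numbers are invented and are NOT the
# 2022-23 NBA data. MP drives both PTS and TOV; PTS and TOV share no
# direct connection beyond their dependence on MP.
set.seed(2023)
n  <- 5000
MP <- runif(n, min = 15, max = 38)
sim <- data.frame(
  MP  = MP,
  PTS = 0.6  * MP + rnorm(n, sd = 3),
  TOV = 0.07 * MP + rnorm(n, sd = 0.4)
)

# Raw correlation: clearly positive, because both variables track MP.
raw_cor <- cor(sim$PTS, sim$TOV)

# Partial correlation: regress each variable on MP, then correlate what
# is left over (the part of PTS and TOV not linearly explained by MP).
pts_resid <- resid(lm(PTS ~ MP, data = sim))
tov_resid <- resid(lm(TOV ~ MP, data = sim))
partial_cor <- cor(pts_resid, tov_resid)

round(c(raw = raw_cor, partial = partial_cor), 2)
```

In this simulation the raw correlation is strong while the partial correlation is close to zero. The same residual steps could be applied to bball_data by substituting it for `sim`, under the assumption that each variable's relationship with MP is roughly linear; whether the real partial correlation is near zero is an empirical question this report does not settle.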