Week 2 Data Dive

This is my Week 2 Data Dive into my NBA Team per 100 possessions data set.

The columns of data include: Season, League, Team, Abbreviation, Playoffs, Games, Minutes_Played, FG_per_100, FGA_per_100, FG_Percent, X3p_per_100, X3pa_per_100, X3p_Percent, X2p_per_100, X2pa_per_100, X2p_Percent, FT_per_100, FTA_per_100, FT_Percent, ORB_per_100, DRB_per_100, TRB_per_100, AST_per_100, STL_per_100, BLK_per_100, TOV_per_100, PF_per_100, PTS_per_100

I am going to break down some of this data and discuss summaries, investigations, and visualizations.

Summaries

Here I show a numeric summary of the columns for Points, Rebounds, Assists, Steals, and Blocks per 100 possessions.

summary(NBA_Stats_100$PTS_per_100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    92.2   103.5   106.4   106.6   109.7   123.2
summary(NBA_Stats_100$TRB_per_100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   38.00   43.20   44.60   44.62   46.10   52.90
summary(NBA_Stats_100$AST_per_100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.70   22.60   24.00   24.07   25.57   30.40
summary(NBA_Stats_100$STL_per_100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.700   7.600   8.300   8.344   9.000  13.500
summary(NBA_Stats_100$BLK_per_100)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.300   4.500   5.100   5.184   5.800   8.700

These summaries give a baseline for five basic NBA stats: points, rebounds, assists, steals, and blocks per 100 possessions. Each one reports the minimum, first quartile, median, mean, third quartile, and maximum. For all five stats the mean and median are close together, which suggests the team values are not heavily skewed. Knowing the typical range makes it easier to spot which team seasons are unusual. In the future, I may want to compare which of these values matter most for a team's chance of making the playoffs or possibly even winning an NBA Championship.
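The five separate summary() calls can also be collapsed into a single table. This is a minimal sketch, assuming the column names listed at the top of this report. The data frame `toy_stats` is a hypothetical stand-in for `NBA_Stats_100`, and its values are invented so that the block runs on its own.

```r
# Toy stand-in for NBA_Stats_100 (invented values, for illustration only)
toy_stats <- data.frame(
  PTS_per_100 = c(100, 105, 110, 115),
  TRB_per_100 = c(42, 44, 45, 47),
  AST_per_100 = c(20, 23, 25, 28),
  STL_per_100 = c(7, 8, 8.5, 9.5),
  BLK_per_100 = c(4, 5, 5.5, 6.5)
)

stat_cols <- c("PTS_per_100", "TRB_per_100", "AST_per_100",
               "STL_per_100", "BLK_per_100")

# Apply summary() to each column; sapply() binds the results into a
# matrix with one column per stat and one row per summary statistic
summary_table <- sapply(toy_stats[stat_cols], summary)
print(round(summary_table, 2))
```

With the real data, replacing `toy_stats` with `NBA_Stats_100` should give the same numbers as the individual calls, side by side.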

Investigations

Now I will propose 3 questions to investigate the data further.

Question 1: How much has the average points per 100 possessions changed from 2010 to 2020?

Question 2: Has the average rebounds per 100 possessions increased or decreased since 2010?

Question 3: Has there been an increase in the average steals + blocks (or “stocks”) per 100 possessions since 2010?
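Question 3 needs a value that is not a column in the data set, since "stocks" is steals plus blocks. One way this could be set up is sketched here, assuming the `STL_per_100` and `BLK_per_100` columns. The data frame `toy_stats`, its values, and the new column name `STOCKS_per_100` are all hypothetical and only used for illustration.

```r
# Toy stand-in for NBA_Stats_100 (invented values, for illustration only)
toy_stats <- data.frame(
  Season      = c("2010-11", "2010-11", "2020-21", "2020-21"),
  STL_per_100 = c(7.5, 8.5, 7.0, 8.0),
  BLK_per_100 = c(5.0, 5.5, 4.5, 5.5)
)

# Derived column: steals + blocks per 100 possessions
toy_stats$STOCKS_per_100 <- toy_stats$STL_per_100 + toy_stats$BLK_per_100

# Average stocks per season, using the same aggregate() pattern as Question 1
stocks_by_season <- aggregate(
  STOCKS_per_100 ~ Season,
  data = toy_stats,
  FUN = mean
)
print(stocks_by_season)
```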

Now I will display an aggregate function that answers the first question I proposed.

nba_10_20 <- subset(
  NBA_Stats_100,
  Season %in% c("2010-11", "2020-21")
)

aggregate(
  PTS_per_100 ~ Season,
  data = nba_10_20,
  FUN = mean,
  na.rm = TRUE
)
##    Season PTS_per_100
## 1 2010-11    107.2567
## 2 2020-21    112.3533

Given this result, we can conclude that the average points per 100 possessions increased by approximately 5.1 points (from 107.26 to 112.35) between the 2010-11 season and the 2020-21 season. Because the stat is measured per 100 possessions, this means teams are scoring more with the same number of possessions, not just playing at a faster pace. Two seasons alone do not show what happened in the years between them, so this result is only a starting point. To further investigate this topic I may want to see if the average field goal, three point, and free throw percentages have also increased over the same range of time. This could help us understand whether the extra points come from more accurate shooting or from something else.
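The 5.1-point change is the difference between the two rows of the output (112.3533 - 107.2567 ≈ 5.10). It can be computed instead of read off the table, and the follow-up on shooting percentages can reuse the same aggregate() pattern with cbind() on the left-hand side of the formula. This is a sketch under assumptions: `toy_stats` and its values are invented, and the real percentage columns may be stored on a different scale than the fractions used here.

```r
# Toy stand-in for NBA_Stats_100 (invented values, for illustration only)
toy_stats <- data.frame(
  Season      = c("2010-11", "2010-11", "2020-21", "2020-21"),
  PTS_per_100 = c(106, 108, 111, 113),
  FG_Percent  = c(0.45, 0.47, 0.46, 0.48),
  X3p_Percent = c(0.35, 0.37, 0.36, 0.38),
  FT_Percent  = c(0.75, 0.77, 0.77, 0.79)
)

# Mean points per season; aggregate() returns the rows sorted by Season
pts_means <- aggregate(PTS_per_100 ~ Season, data = toy_stats, FUN = mean)

# Later season minus earlier season
pts_change <- diff(pts_means$PTS_per_100)
print(pts_change)

# Mean of all three shooting percentages per season in one call
pct_means <- aggregate(
  cbind(FG_Percent, X3p_Percent, FT_Percent) ~ Season,
  data = toy_stats,
  FUN = mean
)
print(pct_means)
```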

Visualizations

Here I will display 2 separate visualizations that I created.

The first one illustrates the fluctuation in average points per 100 possessions each season from 2010-11 to 2020-21. The visualization highlights the 2011-12 lockout season and the 2019-20 COVID season in red for context. Both of these had either a reduced number of games or a vastly different schedule.

nba_10_21 <- subset(
  NBA_Stats_100,
  Season >= "2010-11" & Season <= "2020-21"
)
pts_by_season <- aggregate(
  PTS_per_100 ~ Season,
  data = nba_10_21,
  FUN = mean,
  na.rm = TRUE
)
pts_by_season$Season <- factor(
  pts_by_season$Season,
  levels = sort(unique(pts_by_season$Season))
)
# Highlight 2011-12 and 2019-20
highlight <- ifelse(pts_by_season$Season %in% c("2011-12", "2019-20"), "red", "blue")

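# plot() draws boxplots when x is a factor, so plot the integer codes
# as a line and label the x-axis with the season names instead
plot(
  as.integer(pts_by_season$Season),
  pts_by_season$PTS_per_100,
  type = "l",
  xaxt = "n",
  xlab = " ",
  ylab = "Average Points per 100 Possessions",
  main = "NBA Scoring 2010–11 to 2020–21",
  las = 2
)
axis(1, at = seq_along(levels(pts_by_season$Season)),
     labels = levels(pts_by_season$Season), las = 2)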

points(
  pts_by_season$Season,
  pts_by_season$PTS_per_100,
  col = highlight,
  pch = 16
)

# Add legend
legend(
  "topleft",
  legend = c("Normal", "Lockout/Covid Season"),
  col = c("blue", "red"),
  pch = 16,
  cex = 0.8,
  pt.cex = 1
)

The second visualization illustrates how average rebounds per 100 possessions fluctuated from the 2010-11 season to the 2020-21 season. Once again the visualization adds context for the altered schedules of the 2011-12 lockout and 2019-20 COVID seasons.

nba_10_21 <- subset(
  NBA_Stats_100,
  Season >= "2010-11" & Season <= "2020-21"
)
reb_by_season <- aggregate(
  TRB_per_100 ~ Season,
  data = nba_10_21,
  FUN = mean,
  na.rm = TRUE
)
reb_by_season$Season <- factor(
  reb_by_season$Season,
  levels = sort(unique(reb_by_season$Season))
)
# Highlight 2011-12 and 2019-20
highlight <- ifelse(reb_by_season$Season %in% c("2011-12", "2019-20"), "red", "blue")

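# plot() draws boxplots when x is a factor, so plot the integer codes
# as a line and label the x-axis with the season names instead
plot(
  as.integer(reb_by_season$Season),
  reb_by_season$TRB_per_100,
  type = "l",
  xaxt = "n",
  xlab = " ",
  ylab = "Average Rebounds per 100 Possessions",
  main = "NBA Rebounding 2010–11 to 2020–21",
  las = 2
)
axis(1, at = seq_along(levels(reb_by_season$Season)),
     labels = levels(reb_by_season$Season), las = 2)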

points(
  reb_by_season$Season,
  reb_by_season$TRB_per_100,
  col = highlight,
  pch = 16
)

# Add legend
legend(
  "topright",
  legend = c("Normal", "Lockout/Covid Season"),
  col = c("blue", "red"),
  pch = 16,
  cex = 0.8,
  pt.cex = 1
)

Given the results in these visualizations, we can see that the average points per 100 possessions increased from the 2010-11 season to the 2020-21 season, while the average rebounds per 100 possessions decreased over the same time. A rebound is only available after a missed shot, so fewer rebounds per 100 possessions suggests that fewer shots are being missed. Together with the rise in points, this is evidence towards the theory that scoring efficiency has improved from 2010 to 2020. To further investigate this topic I may look into how field goal percentages, three point percentages, and free throw percentages have changed over the same range of time.
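The direction of each trend can also be checked with a number instead of by eye. One option, which is not something used elsewhere in this report, is to fit a simple straight line through the season averages with lm() and look at the sign of the slope. This is only a sketch: `toy_by_season` and its values are invented, and `season_index` is a hypothetical helper column that assumes the rows are already sorted by season.

```r
# Toy stand-in for the per-season averages (invented values, illustration only)
toy_by_season <- data.frame(
  Season      = c("2010-11", "2011-12", "2012-13", "2013-14"),
  PTS_per_100 = c(106, 107, 108, 109),
  TRB_per_100 = c(46, 45.5, 45, 44.5)
)

# Number the seasons 1, 2, 3, ... so they can be used as a numeric predictor
toy_by_season$season_index <- seq_len(nrow(toy_by_season))

# Slope = average change per season; positive means rising, negative means falling
pts_fit <- lm(PTS_per_100 ~ season_index, data = toy_by_season)
reb_fit <- lm(TRB_per_100 ~ season_index, data = toy_by_season)

pts_slope <- coef(pts_fit)[["season_index"]]
reb_slope <- coef(reb_fit)[["season_index"]]

print(c(points = pts_slope, rebounds = reb_slope))
```

A straight line is a rough description of eleven seasons, and the lockout and COVID seasons may not fit it well, so the slope is best read as a direction check rather than a precise estimate.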

Conclusion

Overall, I believe that I made some good progress towards understanding this dataset and beginning to pull out insights. I am excited to continue working with this data to find out as much as I can about trends in NBA team stats. I am especially interested in seeing if I can find which stats are most important for teams to make the playoffs or even win an NBA Championship.