NBA Player Stats 2024-25

Author

Cody Paulay-Simmons

NBA Player Game Logs: Pre All-Star break 2024–25

This dataset contains individual game logs for NBA players during the 2024–25 season, covering games up to the All-Star break. Each row represents one game for a single player and includes stats such as points scored (PTS), minutes played (MP), field goal percentage (FG%), assists (AST), rebounds (TRB), and Game Score (GmSc), a single-number rating of a player's overall impact in that game. Categorical variables include the player name (Player), team (Tm), opponent (Opp), and game result (Res). The dataset was posted on Kaggle; it appears to originate from Basketball Reference, which is the source credited in the plot caption.

In this project, I explore how players contribute to the game through scoring, rebounding, assists, steals, blocks, turnovers, and field goal percentage. Game Score is a simplified, single-game version of the Player Efficiency Rating (PER) that combines all of these statistics. I summarize each player's performance across all games to find trends in the averages of those who played in at least three quarters of the games before the All-Star break.
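To make the rating concrete, here is a minimal sketch of Game Score as a plain R function, using John Hollinger's published formula. Whether the GmSc column in this dataset was computed exactly this way is an assumption, and so is the presence of a personal fouls (PF) column, since the column listing in the load output is truncated. The function name game_score and the stat line are made up for illustration.

```r
# Game Score (Hollinger's formula). All arguments are single-game box-score totals.
game_score <- function(PTS, FG, FGA, FT, FTA, ORB, DRB, STL, AST, BLK, PF, TOV) {
  PTS + 0.4 * FG - 0.7 * FGA - 0.4 * (FTA - FT) +
    0.7 * ORB + 0.3 * DRB + STL + 0.7 * AST + 0.7 * BLK -
    0.4 * PF - TOV
}

# Hypothetical stat line (made-up numbers, not taken from the dataset)
example_gmsc <- game_score(
  PTS = 30, FG = 10, FGA = 20, FT = 8, FTA = 10,
  ORB = 2, DRB = 8, STL = 1, AST = 5, BLK = 1, PF = 3, TOV = 2
)
example_gmsc  # 25
```

Missed shots and turnovers subtract from the score, so a high-scoring but inefficient game can rate lower than a quieter, efficient one.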

Load the library and the NBA Data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nba <- read_csv("NBADatabase2425.csv")
Rows: 16512 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (4): Player, Tm, Opp, Res
dbl  (20): MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST,...
date  (1): Data

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Average performance per player, grouped by team

nba_summary <- nba |>
  group_by(Player, Tm) |>
  summarize(
    games_played = n(),
    avg_pts = mean(PTS, na.rm = TRUE),
    avg_fg_pct = mean(`FG%`, na.rm = TRUE),
    avg_3p_fg_pct = mean(`3P%`, na.rm = TRUE),
    avg_trb = mean(TRB, na.rm = TRUE),
    avg_ast = mean(AST, na.rm = TRUE),
    avg_stl = mean(STL, na.rm = TRUE),
    avg_blk = mean(BLK, na.rm = TRUE),
    avg_tov = mean(TOV, na.rm = TRUE),
    avg_gmsc = mean(GmSc, na.rm = TRUE)
  ) |>
  arrange(desc(avg_pts))
`summarise()` has grouped output by 'Player'. You can override using the
`.groups` argument.
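One caveat about avg_fg_pct and avg_3p_fg_pct: averaging per-game percentages weights every game equally, so the result generally differs from a season percentage computed as total makes divided by total attempts. The sketch below illustrates the gap with made-up numbers. The toy_logs tibble and its values are hypothetical; FG and FGA are the makes and attempts columns shown in the column specification.

```r
library(dplyr)

# Hypothetical two-game log for one player (made-up numbers, not from the dataset)
toy_logs <- tibble(
  Player = c("Player A", "Player A"),
  FG     = c(1, 9),    # field goals made
  FGA    = c(2, 10)    # field goals attempted
) |>
  mutate(`FG%` = FG / FGA)  # per-game percentages: 0.5 and 0.9

toy_pct <- toy_logs |>
  group_by(Player) |>
  summarize(
    mean_of_game_pct = mean(`FG%`, na.rm = TRUE),  # (0.5 + 0.9) / 2 = 0.7
    pooled_pct       = sum(FG) / sum(FGA)          # 10 / 12, about 0.833
  )

toy_pct
```

The 1-for-2 game counts as much as the 9-for-10 game in the first column, which is why it comes out lower. Either measure can be reasonable, as long as the label makes clear which one is being reported.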

Compute max_games, the largest number of games played by any player-team row, outside the pipeline

max_games <- max(nba_summary$games_played, na.rm = TRUE)

Keep players who played at least 3/4 of max_games, then take the top 10 by average Game Score

top_10 <- nba_summary |>
  filter(
    !is.na(avg_gmsc),
    games_played >= 0.75 * max_games
  ) |>
  arrange(desc(avg_gmsc)) |>
  head(10)
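The same filter-then-rank step can be checked on a small made-up table. This is only a sketch: toy_summary, its values, and the choice of slice_max() in place of arrange() plus head() are illustrative assumptions. The ungroup() call matters if the input is still grouped (as nba_summary is, per the summarise() message), because slice_max() works within each group.

```r
library(dplyr)

# Hypothetical summary table (made-up numbers, not from the dataset)
toy_summary <- tibble(
  Player       = c("A", "B", "C", "D"),
  games_played = c(52, 40, 30, 50),
  avg_gmsc     = c(20.1, 25.3, 27.8, 18.4)
)

toy_max_games <- max(toy_summary$games_played, na.rm = TRUE)  # 52
toy_cutoff    <- 0.75 * toy_max_games                         # 39 games

toy_top <- toy_summary |>
  ungroup() |>                           # no-op here; needed for grouped input
  filter(
    !is.na(avg_gmsc),
    games_played >= toy_cutoff
  ) |>
  slice_max(avg_gmsc, n = 2)             # top 2 by average Game Score

toy_top$Player  # "B" "A"
```

Player C has the highest average but only 30 games, so the 39-game cutoff removes them before ranking.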

Plot

Create the bar chart of the top 10 players by average Game Score

# Final Plot: Top 10 Players by Average Game Score (with all required elements)
ggplot(top_10, aes(x = reorder(Player, avg_gmsc), y = avg_gmsc, fill = avg_gmsc)) +
  geom_col(color = "black", width = 0.8) + 
  coord_flip() +
  scale_fill_gradientn(
    colors = c("#9FE2BF", "#40E0D0", "#3CB371"),
    name = "Game Score") +
  labs(title = "Top 10 Most Impactful NBA Players (Pre–All-Star 2024–25)",
       x = "Player",
       y = "Average Game Score",
       caption = "Source: Basketball Reference via Kaggle") +
  theme_bw()

Reflection

To prepare the data, I grouped the game logs by player and team and summarized each player's average game stats, including points, assists, rebounds, steals, blocks, turnovers, shooting percentages, and Game Score (GmSc). I then counted how many games each player had played and kept only those who appeared in at least 75% as many games as the most active player before the All-Star break. This way, the results reflect consistency and not just one-time performances.

For the final visualization, I used a bar chart to show the top 10 players with the highest average Game Score. I chose Game Score because it gives a fuller picture of a player's performance: not just scoring, but also assisting, rebounding, defense, and efficiency. I used a custom color gradient to help show who stood out the most, and I switched the ggplot theme to theme_bw() for a cleaner visual. This information is not in the dataset, but I noticed that the top 3 players are international and that 6 of the 10 players on this list are international. I thought it was interesting to see the game becoming more global.

If I had more advanced skills or tools, I would have loved to make this a scatterplot where each dot is a player's face and hovering over it shows their full statistics. I also wanted to divide the chart into four quadrants: top performers, underperformers, efficient scorers, and volume scorers. That would make it easier to see not just who is good overall, but also who struggles or excels in specific areas of the game.
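A static version of the quadrant idea is within reach of plain ggplot2. The sketch below is one possible reading, not a finished design: it assumes average points as the volume axis and average field goal percentage as the efficiency axis, splits each axis at its median, and uses a made-up toy_players table. Under that assumption, the upper right would hold top performers, the lower left underperformers, the upper left efficient scorers, and the lower right volume scorers.

```r
library(ggplot2)

# Hypothetical per-player averages (made-up numbers, not from the dataset)
toy_players <- data.frame(
  Player     = c("A", "B", "C", "D", "E", "F"),
  avg_pts    = c(28, 12, 25, 9, 18, 15),
  avg_fg_pct = c(0.52, 0.55, 0.43, 0.41, 0.48, 0.46)
)

quadrant_plot <- ggplot(toy_players, aes(x = avg_pts, y = avg_fg_pct, label = Player)) +
  # Dashed median lines split the panel into four quadrants
  geom_vline(xintercept = median(toy_players$avg_pts), linetype = "dashed") +
  geom_hline(yintercept = median(toy_players$avg_fg_pct), linetype = "dashed") +
  geom_point(size = 3, color = "#3CB371") +
  geom_text(vjust = -1) +
  labs(title = "Scoring Volume vs. Efficiency (toy data)",
       x = "Average Points",
       y = "Average Field Goal Percentage") +
  theme_bw()

quadrant_plot
```

For the hover behavior, passing a plot like this to plotly::ggplotly() is one option worth trying; replacing the dots with player photos would need image files and an extra package, which is beyond this sketch.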