This dataset contains individual game logs for NBA players during the 2024–25 season, covering games up to the All-Star break. Each row represents one game for a single player and includes stats such as points scored (PTS), minutes played (MP), field goal percentage (FG%), assists (AST), rebounds (TRB), and Game Score (GmSc), a single-number rating of a player's overall impact in a game. Categorical variables include the player name (Player), team (Tm), opponent (Opp), and game result (Res). The original data were sourced from the NBA and posted on Kaggle.
In this project, I plan to explore how players contribute to the game through scoring, rebounding, assists, steals, blocks, turnovers, and field goal percentage. Game Score is a simplified version of the Player Efficiency Rating (PER) that accounts for all of the statistics just mentioned. I will summarize player performance across all games to find trends in the averages of those who played more than three quarters of the season before the All-Star break.
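For reference, Game Score is commonly computed with John Hollinger's formula, as listed in the Basketball Reference glossary. The symbols are standard box-score abbreviations (STL = steals, BLK = blocks, PF = personal fouls, TOV = turnovers); I am assuming the GmSc column in this dataset follows the same definition.

```latex
\mathrm{GmSc} = \mathrm{PTS} + 0.4\,\mathrm{FG} - 0.7\,\mathrm{FGA} - 0.4\,(\mathrm{FTA} - \mathrm{FT})
              + 0.7\,\mathrm{ORB} + 0.3\,\mathrm{DRB} + \mathrm{STL} + 0.7\,\mathrm{AST}
              + 0.7\,\mathrm{BLK} - 0.4\,\mathrm{PF} - \mathrm{TOV}
```

Shooting efficiency enters through the made (FG) and attempted (FGA) field goals rather than through FG% directly, so a missed shot costs 0.7 points of Game Score.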
Load the library and the NBA Data
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
nba <- read_csv("NBADatabase2425.csv")
Rows: 16512 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Player, Tm, Opp, Res
dbl (20): MP, FG, FGA, FG%, 3P, 3PA, 3P%, FT, FTA, FT%, ORB, DRB, TRB, AST,...
date (1): Data
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Create the bar chart of the top 10 players by average Game Score
# Final plot: top 10 players by average Game Score (with all required elements)
ggplot(top_10, aes(x = reorder(Player, avg_gmsc), y = avg_gmsc, fill = avg_gmsc)) +
  geom_col(color = "black", width = 0.8) +
  coord_flip() +
  scale_fill_gradientn(
    colors = c("#9FE2BF", "#40E0D0", "#3CB371"),
    name = "Game Score"
  ) +
  labs(
    title = "Top 10 Most Impactful NBA Players (Pre–All-Star 2024–25)",
    x = "Player",
    y = "Average Game Score",
    caption = "Source: Basketball Reference via Kaggle"
  ) +
  theme_bw()
Reflection
To prepare the data, I grouped the game logs by player and team and summarized each player's average game stats, including points, assists, rebounds, steals, blocks, shooting percentages, and Game Score (GmSc). I then counted how many games each player had played and kept only those who appeared in at least 75% of the games before the All-Star break. This way, the results reflect consistency rather than one-off performances.
For the final visualization, I used a bar chart to show the 10 players with the highest average Game Score. I chose Game Score because it gives a fuller picture of a player's performance: not just scoring, but also assists, rebounding, defense, and efficiency. I used a custom color gradient to highlight who stood out the most, and I changed the ggplot theme for a cleaner look. This information is not in the dataset, but I noticed that the top 3 players are international and that 6 of the top 10 are international players. I thought it was interesting to see the game becoming more global.
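The plotting code uses a `top_10` table with `Player` and `avg_gmsc` columns. Below is a minimal sketch of how such a table could be built from the game logs. The helper name `top_players_by_gmsc` and its arguments are hypothetical, and the 75% rule is interpreted here as "at least 75% of the distinct game dates recorded for the player's team", which is an assumption rather than the exact rule used in this report.

```r
library(dplyr)

# Hypothetical helper (not the original code): average Game Score per player,
# keeping only players who appeared in at least `min_share` of their team's games.
top_players_by_gmsc <- function(games, min_share = 0.75, n_top = 10) {
  # Assumption: a team's game count is the number of distinct dates in `Data`.
  team_totals <- games %>%
    group_by(Tm) %>%
    summarise(team_games = n_distinct(Data), .groups = "drop")

  games %>%
    group_by(Player, Tm) %>%
    summarise(
      games_played = n(),                  # one row per game played
      avg_gmsc = mean(GmSc, na.rm = TRUE), # average Game Score
      .groups = "drop"
    ) %>%
    inner_join(team_totals, by = "Tm") %>%
    filter(games_played >= min_share * team_games) %>%
    slice_max(avg_gmsc, n = n_top, with_ties = FALSE)
}
```

With the data loaded as above, `top_10 <- top_players_by_gmsc(nba)` would produce the table passed to `ggplot()`. Grouping by both `Player` and `Tm` means a traded player is counted separately for each team, which may or may not be the desired behavior.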
If I had the advanced skills or tools, I would’ve loved to make this a scatterplot where each dot is a player’s face, and when you hover over it, it would show their full statistics. I also wanted to divide the chart into four quadrants: top performers, underperformers, efficient scorers, and volume scorers. That would make it easier to see not just who’s good overall, but also who struggles or excels in specific areas of the game.
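The four-quadrant idea can be roughed out in plain ggplot2 without any advanced tools. The sketch below is only an illustration: the `players` table, its column names (`avg_pts`, `avg_fg_pct`), and its values are made up, and splitting the quadrants at the medians is one assumption among several reasonable choices.

```r
library(ggplot2)

# Hypothetical per-player averages; the values are invented for illustration.
players <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  avg_pts = c(28, 12, 25, 9),
  avg_fg_pct = c(0.55, 0.58, 0.42, 0.40)
)

# Dashed lines at the medians split the plot into four quadrants:
# top right = top performers, top left = efficient scorers,
# bottom right = volume scorers, bottom left = underperformers.
quadrant_plot <- ggplot(players, aes(x = avg_pts, y = avg_fg_pct, label = Player)) +
  geom_vline(xintercept = median(players$avg_pts), linetype = "dashed") +
  geom_hline(yintercept = median(players$avg_fg_pct), linetype = "dashed") +
  geom_point(size = 3) +
  geom_text(vjust = -1) +
  labs(
    title = "Scoring volume vs. efficiency (illustrative data)",
    x = "Average points per game",
    y = "Average field goal percentage"
  ) +
  theme_bw()
```

Printing `quadrant_plot` draws the chart. Hover tooltips would need an interactive layer on top of this, for example by passing the plot to `plotly::ggplotly()`, and player faces would need an image-drawing extension that I have not tried here.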