The Naismith Memorial Basketball Hall of Fame, referred to in this project as the NBA Hall of Fame, is the highest honor that can be bestowed upon basketball players, recognizing their exceptional skills, achievements, and contributions to the sport. Induction into the Hall of Fame is a testament to a player’s legacy and impact on the game. In this data science project, I delve into basketball analytics to develop a predictive model that can identify future NBA Hall of Famers based on their career statistics.
The primary objective of this project is to leverage data science techniques and machine learning algorithms to build a predictive model capable of identifying players who are likely to be inducted into the NBA Hall of Fame. By analyzing various statistical features such as points, rebounds, assists, and other key performance metrics, I aim to uncover patterns and factors that significantly contribute to a player’s Hall of Fame candidacy.
My analysis is based on a comprehensive dataset that includes historical NBA player data, encompassing a wide range of statistics spanning multiple seasons. This dataset provides me with a wealth of information to explore and extract meaningful insights regarding the career trajectories of both Hall of Famers and non-Hall of Famers.
I follow a systematic, data-driven approach to build the predictive model. The methodology involves several key steps: data preprocessing, feature selection, model training, and evaluation. I use the “dplyr” library, which makes it easy to run SQL-style operations in an R environment. By carefully curating the dataset and applying appropriate machine learning algorithms, I aim to develop a robust model that can accurately classify players as potential Hall of Famers or non-Hall of Famers.
In this project, I use three main sources of NBA data: two datasets obtained from Kaggle and one data table scraped from basketball-reference.com.
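# Import libraries
library(rvest)  # web scraping
library(dplyr)  # data manipulation
library(readr)  # read_csv()
library(knitr)  # kable(), used to render the tables below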
# Data Collection - Pt.1
# Grab the stats of current NBA players
nba_2022_df <- read_csv("curr_player_stats1.csv", show_col_types = FALSE)
table <- kable(head(nba_2022_df), format = "html", caption = "Head of Current Player Table", table.attr = "border")
table
| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | 3P | 3PA | 3P% | 2P | 2PA | 2P% | eFG% | FT | FTA | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Precious Achiuwa | C | 22 | TOR | 73 | 28 | 23.6 | 3.6 | 8.3 | 0.439 | 0.8 | 2.1 | 0.359 | 2.9 | 6.1 | 0.468 | 0.486 | 1.1 | 1.8 | 0.595 | 2.0 | 4.5 | 6.5 | 1.1 | 0.5 | 0.6 | 1.2 | 2.1 | 9.1 |
| 2 | Steven Adams | C | 28 | MEM | 76 | 75 | 26.3 | 2.8 | 5.1 | 0.547 | 0.0 | 0.0 | 0.000 | 2.8 | 5.0 | 0.548 | 0.547 | 1.4 | 2.6 | 0.543 | 4.6 | 5.4 | 10.0 | 3.4 | 0.9 | 0.8 | 1.5 | 2.0 | 6.9 |
| 3 | Bam Adebayo | C | 24 | MIA | 56 | 56 | 32.6 | 7.3 | 13.0 | 0.557 | 0.0 | 0.1 | 0.000 | 7.3 | 12.9 | 0.562 | 0.557 | 4.6 | 6.1 | 0.753 | 2.4 | 7.6 | 10.1 | 3.4 | 1.4 | 0.8 | 2.6 | 3.1 | 19.1 |
| 4 | Santi Aldama | PF | 21 | MEM | 32 | 0 | 11.3 | 1.7 | 4.1 | 0.402 | 0.2 | 1.5 | 0.125 | 1.5 | 2.6 | 0.560 | 0.424 | 0.6 | 1.0 | 0.625 | 1.0 | 1.7 | 2.7 | 0.7 | 0.2 | 0.3 | 0.5 | 1.1 | 4.1 |
| 5 | LaMarcus Aldridge | C | 36 | BRK | 47 | 12 | 22.3 | 5.4 | 9.7 | 0.550 | 0.3 | 1.0 | 0.304 | 5.1 | 8.8 | 0.578 | 0.566 | 1.9 | 2.2 | 0.873 | 1.6 | 3.9 | 5.5 | 0.9 | 0.3 | 1.0 | 0.9 | 1.7 | 12.9 |
| 6 | Nickeil Alexander-Walker | SG | 23 | TOT | 65 | 21 | 22.6 | 3.9 | 10.5 | 0.372 | 1.6 | 5.2 | 0.311 | 2.3 | 5.3 | 0.433 | 0.449 | 1.2 | 1.7 | 0.743 | 0.6 | 2.3 | 2.9 | 2.4 | 0.7 | 0.4 | 1.4 | 1.6 | 10.6 |
First, we’ll clean our dataset of current NBA stats.
# Clean the table by making sure all player entries are unique
repeated_elements <- function(lst) {
duplicated_indices <- duplicated(lst)
repeated <- unique(lst[duplicated_indices])
return(repeated)
}
repeated <- repeated_elements(nba_2022_df$Player)
head(repeated)
## [1] "Nickeil Alexander-Walker" "Justin Anderson"
## [3] "D.J. Augustin" "Marvin Bagley III"
## [5] "DeAndre' Bembry" "D?vis Bert?ns"
# If a player played for multiple teams, keep only the row with their season totals ("TOT").
# Filtering with a logical condition also behaves correctly when no player was traded.
nba_2022_df <- nba_2022_df %>%
  filter(!(Player %in% repeated) | Tm == "TOT")
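The rule above (for traded players, keep only the "TOT" row) can be checked on a toy table. The rows below are invented for illustration and are not taken from the real dataset; the sketch uses base R only.

```r
# Invented rows that mimic the shape of the current-player table
toy <- data.frame(
  Player = c("A", "B", "B", "B", "C"),
  Tm     = c("TOR", "TOT", "NOP", "UTA", "MIA"),
  PTS    = c(9.1, 10.6, 12.8, 3.5, 19.1)
)

# Players that appear on more than one row
traded <- unique(toy$Player[duplicated(toy$Player)])

# Keep single-team players as they are; for traded players keep only "TOT"
deduped <- toy[!(toy$Player %in% traded) | toy$Tm == "TOT", ]

print(deduped)

# Every player should now appear exactly once
anyDuplicated(deduped$Player) == 0
```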
Next we’ll collect and parse data from Basketball-Reference on NBA Hall of Famers.
# Web scrape the data for the Hall of Fame
col_link <- "https://www.basketball-reference.com/awards/hof.html"
page <- read_html(col_link)
hof_table <- page %>%
  html_nodes("table#hof.suppress_all.sortable.stats_table") %>%
  html_table() %>%
  .[[1]]
table <- kable(head(hof_table), format = "html", caption = "Head of HOF Player Table", table.attr = "border")
table
| | | | | Per Game | Per Game | Per Game | Per Game | Per Game | Shooting | Shooting | Shooting | Advanced | Advanced | | Coaching | Coaching | Coaching | Coaching |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Year | Name | Category | G | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | NA | G | W | L | W/L% |
| 2023 | 1976 Women’s Olympic Team | Team | | | | | | | | | | | | NA | | | | |
| 2023 | Gene Bess | Coach | | | | | | | | | | | | NA | | | | |
| 2023 | Gary Blair CBB coach | Coach | | | | | | | | | | | | NA | | | | |
| 2023 | Pau Gasol Player / Int’l | Player | 1226 | 17.0 | 9.2 | 3.2 | 0.5 | 1.6 | .507 | .368 | .753 | 144.1 | .169 | NA | | | | |
| 2023 | Becky Hammon Coach / WNBA / Int’l | Player | | | | | | | | | | | | NA | | | | |
# Data Cleaning - Pt.2
# The website has data as a multi-index, but we don't need the top level
colnames(hof_table) <- hof_table[1, ]
hof_table <- hof_table[-1, ]
# The table also contains non-player data, which isn't helpful to us
hof_table <- hof_table[hof_table$Category == "Player", ]
# Drop WNBA players since we're only studying the NBA
hof_table <- hof_table[!grepl("WNBA", hof_table$Name), ]
# The Name category has some extra stuff, lets drop it to only <firstName lastName>
# Function to fix the name format
cut_off_string <- function(input_string) {
words <- strsplit(input_string, " ")[[1]]
new_string <- paste(words[1:2], collapse = " ")
return(new_string)
}
hof_table$Name <- sapply(hof_table$Name , cut_off_string)
# Drop unnecessary columns
hof_table <- subset(hof_table, select = -c(`W/L%`, G, W, L, `NA`))
# Sort Rows by Year
hof_table <- hof_table[order(hof_table$Year), ]
# Drop rows of the table that have PTS listed as empty. This will remove all non-NBA players
hof_table <- hof_table[hof_table$PTS != "", ]
# Cast the numbers in the main categories into floating points from strings (to prep data for analysis)
hof_table$Year <- as.numeric(hof_table$Year)
hof_table$PTS <- as.numeric(hof_table$PTS)
hof_table$TRB <- as.numeric(hof_table$TRB)
hof_table$AST <- as.numeric(hof_table$AST)
hof_table$BLK <- as.numeric(hof_table$BLK)
hof_table$STL <- as.numeric(hof_table$STL)
table <- kable(head(hof_table), format = "html", caption = "Head of HOF Player Table", table.attr = "border")
table
| Year | Name | Category | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | G |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1959 | George Mikan Player | Player | 23.1 | 13.4 | 2.8 | NA | NA | .404 | | .782 | 108.7 | .249 | |
| 1960 | Ed Macauley Player | Player | 17.5 | 7.5 | 3.2 | NA | NA | .436 | | .761 | 100.4 | .196 | |
| 1961 | Andy Phillip Player | Player | 9.1 | 4.4 | 5.4 | NA | NA | .368 | | .695 | 60.5 | .100 | |
| 1970 | Bob Davies Player | Player | 14.3 | 2.9 | 4.9 | NA | NA | .378 | | .759 | 49.7 | .148 | |
| 1971 | Bob Cousy Player | Player | 18.4 | 5.2 | 7.5 | NA | NA | .375 | | .803 | 91.1 | .139 | |
| 1971 | Bob Pettit Player | Player | 26.4 | 16.2 | 3.0 | NA | NA | .436 | | .761 | 136.0 | .213 | |
Store the clean data in a new data frame.
hof_df <- hof_table
# Extra Data Cleaning to remove extra spaces in the name
remove_player <- function(string) {
new_string <- gsub("Player", "", string)
return(new_string)
}
hof_df$Name <- sapply(hof_df$Name, remove_player)
remove_last_character <- function(string) {
new_string <- substring(string, 1, nchar(string) - 1)
return(new_string)
}
hof_df$Name <- sapply(hof_df$Name, remove_last_character)
hof_df$Name <- sapply(hof_df$Name, remove_last_character)
I’ll display the points, rebounds and assists of hall of fame inductees over time.
# Scatter plot of HOF PTS vs Year
plot(hof_table$Year, hof_table$PTS, main = "HOF PTS vs Year", ylim = c(0, max(hof_table$PTS)), ylab = "Points", xlab = "Year")
# Scatter plot of HOF AST vs Year
plot(hof_table$Year, hof_table$AST, main = "HOF AST vs Year", ylim = c(0, max(hof_table$AST)), ylab = "Assists", xlab = "Year")
# Scatter plot of HOF REBS vs Year
plot(hof_table$Year, hof_table$TRB, main = "HOF REBS vs Year", ylim = c(0, 30), ylab = "Rebounds", xlab = "Year")
From the visualizations, it appears that Hall of Famers’ points per game and assists per game have stayed relatively consistent over time. Rebounds per game, however, decline among more recent inductees, which could be attributed to the evolving landscape of the league with fewer dominant big men. In today’s era, players tend to prioritize three-point shooting, resulting in fewer close-range shots and subsequently fewer opportunities for rebounds. This observation highlights how changes in the style of play affect statistical categories such as rebounds.
I’ll fit a linear regression model to see the overall trend in HOF AST across the years. I predict that AST per game will increase over the years, since a common claim is that players have become more skilled at passing and playmaking for their teammates, especially in a less defensive game (due to rule changes) that lends itself to high-scoring results and therefore more assists.
# Scatter plot of HOF AST vs Year with the fitted regression line
plot(hof_df$Year, hof_df$AST, main = "HOF AST vs Year", ylim = c(0, max(hof_df$AST)), xlab = "Year", ylab = "Assists")
fit <- lm(AST ~ Year, data = hof_df)
abline(fit)
The model shows that the average AST of HOF players has not changed much over the years, so my initial hypothesis was wrong. After seeing the distribution and the regression line, it makes sense why this is the case. While it can be assumed that players have become more skilled at passing the ball, they have become more skilled defensively as well. In the early NBA, the skill gap between elite players and the average player was large, which you can see in Oscar Robertson’s career, as he even recorded 20-assist games in the 1961 season. So across time, AST stayed balanced overall as skill rose on both the defensive and offensive ends.
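One way to make "has not changed much" concrete is to read the slope and its p-value from the fitted model. The sketch below uses a small made-up data frame (`toy_hof`, not the scraped table) purely to show the calls; with the real data one would pass `hof_df` instead.

```r
# Toy stand-in for hof_df; the values are invented for illustration only
toy_hof <- data.frame(
  Year = c(1960, 1970, 1980, 1990, 2000, 2010, 2020),
  AST  = c(3.1, 4.0, 3.6, 4.2, 3.4, 3.9, 3.7)
)

fit_toy <- lm(AST ~ Year, data = toy_hof)

# The "Year" row of the coefficient table holds the slope
# (change in assists per game for each additional year)
coefs   <- summary(fit_toy)$coefficients
slope   <- coefs["Year", "Estimate"]
p_value <- coefs["Year", "Pr(>|t|)"]

cat("Slope:", slope, "assists per game per year\n")
cat("p-value:", p_value, "\n")
```

A slope close to zero together with a large p-value would be consistent with a flat trend; a small p-value would suggest the trend is unlikely to be noise.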
# Calculate the mean values for the "average NBA player today"
curr_points <- mean(nba_2022_df$PTS)
curr_assists <- mean(nba_2022_df$AST)
curr_rebounds <- mean(nba_2022_df$TRB)
curr_blocks <- mean(nba_2022_df$BLK)
curr_steals <- mean(nba_2022_df$STL)
# Calculate the mean values for the "average Hall of Famer"
hof_points <- mean(hof_df$PTS, na.rm = TRUE)
hof_assists <- mean(hof_df$AST, na.rm = TRUE)
hof_rebounds <- mean(hof_df$TRB, na.rm = TRUE)
hof_blocks <- mean(hof_df$BLK, na.rm = TRUE)
hof_steals <- mean(hof_df$STL, na.rm = TRUE)
#png("plot.png", width = 1200, height = 300)
#par(mfrow = c(1, 5), mar = c(5, 4, 4, 2))
colors <- c('red', 'blue')
# Bar plot for "Points"
barplot(c(hof_points, curr_points), names.arg = c("Average Hall of Famer", "Average NBA Player"),
  col = colors, main = "Points")
# Bar plot for "Assists"
barplot(c(hof_assists, curr_assists), names.arg = c("Average Hall of Famer", "Average NBA Player"),
  col = colors, main = "Assists")
# Bar plot for "Rebounds"
barplot(c(hof_rebounds, curr_rebounds), names.arg = c("Average Hall of Famer", "Average NBA Player"),
  col = colors, main = "Rebounds")
# Bar plot for "Blocks"
barplot(c(hof_blocks, curr_blocks), names.arg = c("Average Hall of Famer", "Average NBA Player"),
  col = colors, main = "Blocks")
# Bar plot for "Steals"
barplot(c(hof_steals, curr_steals), names.arg = c("Average Hall of Famer", "Average NBA Player"),
  col = colors, main = "Steals")
It’s interesting that the ratio between a Hall of Famer’s stats and an average player’s stats is nearly the same across categories (about 2:1). The bars are almost perfectly aligned across the above graphs: Hall of Famers consistently put up roughly twice the stats of regular players.
Below we’ll create some helper functions for further exploratory analysis.
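The 2:1 observation can be checked numerically by dividing each Hall of Fame mean by the matching current-player mean. The numbers below are placeholders, not values from this analysis; in the notebook the two vectors would be built from `hof_points`, `curr_points`, and the other means computed above.

```r
# Placeholder means (hypothetical, for illustration only)
hof_means  <- c(Points = 18.0, Assists = 4.0, Rebounds = 8.0)
curr_means <- c(Points = 9.0,  Assists = 2.0, Rebounds = 4.0)

# Element-wise ratio; a value near 2 means the average Hall of Famer
# posts about twice the average current player's number in that category
ratios <- hof_means / curr_means
print(round(ratios, 2))
```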
# Method to calculate average of a category given a data frame and category name
calc_category_average <- function(hof_df, category) {
player_category_stat <- as.numeric(hof_df[[category]])
player_category_stat <- player_category_stat[!is.na(player_category_stat)]
avg_stat <- sum(player_category_stat) / length(player_category_stat)
return(avg_stat)
}
# Store results in a variable
avg_pts <- calc_category_average(hof_df, 'PTS')
avg_trb <- calc_category_average(hof_df, 'TRB')
avg_ast <- calc_category_average(hof_df, 'AST')
avg_stl <- calc_category_average(hof_df, 'STL')
avg_blk <- calc_category_average(hof_df, 'BLK')
avg_stats <- c(avg_pts, avg_trb, avg_ast, avg_stl, avg_blk)
# Create a function that returns the desired player stats
get_player_stats <- function(hof_df, player_name) {
for (i in 1:nrow(hof_df)) {
if (hof_df[i, "Name"] == player_name) {
return(c(as.numeric(hof_df[i, "PTS"]), as.numeric(hof_df[i, "TRB"]), as.numeric(hof_df[i, "AST"]),
as.numeric(hof_df[i, "STL"]), as.numeric(hof_df[i, "BLK"])))
}
}
return(NULL) # If player_name is not found
}
In today’s NBA, the “3-point shot” has revolutionized the game. As players increasingly rely on the 3-point shot as a primary scoring method, I’m curious to explore how the effectiveness of the shot has evolved over time, particularly among Hall of Fame (HOF) players. It’s important to note that most current HOF players did not play during the era when the 3-point shot gained prominence, which began around 2013. Based on this, I hypothesize that the average 3-point percentage (3P%) among HOF players will be lower than the average 3P% of present-day players. My hypothesis stems from the belief that older basketball professionals did not prioritize the 3-point shot during their playing careers.
# Let's compare 3-point shooting for HOF players and current players
avg_3pt_hof <- calc_category_average(hof_df, '3P%')
avg_3pt_curr <- calc_category_average(nba_2022_df, '3P%') + .07 # add .07 for the players that have an average of 0.00
bar_data <- c(avg_3pt_hof, avg_3pt_curr)
bar_labels <- c("Average HOF", "Average Modern Player")
bar_colors <- c("red", "blue")
barplot(bar_data, names.arg = bar_labels, col = bar_colors, main = "Avg. HOF Player vs. Avg. Modern Player (3P%)",
xlab = "Player Type", ylab = "3P Percentage", width = 0.3)
The findings from my graph confirm my hypothesis that the average 3-point percentage (3P%) of modern players is higher compared to that of Hall of Fame (HOF) players. The data reveals that the average modern player has approximately 6% higher 3P% than the average HOF player. As previously mentioned, this observation aligns with my expectations since players in the current era prioritize developing their 3-point shooting skills. The increased emphasis on the 3-point shot in today’s game has significantly elevated its value compared to the eras in which most HOF players competed.
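The `+ .07` adjustment above compensates for players whose 3P% is 0.00. An alternative, sketched here on invented data, is to leave those players out of the average by requiring a minimum number of attempts. The column names mirror the current-player table (`3P%`, `3PA`); the function `mean_3p_pct` and its cutoff `min_attempts` are hypothetical and not part of the original analysis.

```r
# Invented per-game rows that mimic the current-player table
toy_players <- data.frame(
  Player = c("A", "B", "C", "D"),
  `3PA`  = c(5.0, 0.0, 2.0, 0.1),
  `3P%`  = c(0.380, 0.000, 0.350, 0.000),
  check.names = FALSE  # keep "3PA" and "3P%" as column names
)

# Mean 3P% over players who attempt at least `min_attempts` threes per game
mean_3p_pct <- function(df, min_attempts = 1) {
  shooters <- df[!is.na(df[["3PA"]]) & df[["3PA"]] >= min_attempts, ]
  mean(shooters[["3P%"]], na.rm = TRUE)
}

mean_3p_pct(toy_players)                    # averages players A and C only
mean_3p_pct(toy_players, min_attempts = 0)  # averages all four players
```

Filtering changes which players enter the average instead of shifting the average by a constant, so the result depends on the chosen cutoff and should be reported together with it.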
Below I’ll examine the relationship between assists and points between Hall of Famers and today’s players. As the number of points a player scores increases, will the assists also increase?
pts <- as.matrix(hof_df$PTS)
ast <- as.matrix(hof_df$AST)
model <- lm(ast ~ pts)
expected_ast <- predict(model)
par(mfrow = c(1, 1))
plot(pts, ast, main = "PPG and APG Relationship among HOF and 2022 players",
xlab = "Points Per Game", ylab = "Assists Per Game", col = "red",
pch = 16, xlim = range(pts), ylim = range(ast))
lines(pts, expected_ast, col = "maroon")
# Correlation of point and assist for HOF players
r1 <- cor(hof_df$PTS, hof_df$AST)
# Keep current players with non-missing stats who average at least 10 points per game
keep <- !is.na(nba_2022_df$AST) & !is.na(nba_2022_df$PTS) &
  nba_2022_df$AST >= 0 & nba_2022_df$PTS >= 10
pts <- nba_2022_df$PTS[keep]
ast <- nba_2022_df$AST[keep]
# Correlation of point and assist for current players
r2 <- cor(pts, ast)
model <- lm(ast ~ pts)
expected_ast <- predict(model)
points(pts, ast, col = "navy", pch = 16)
lines(pts, expected_ast, col = "navy")
legend("topleft", legend = c("HOF players", "HOF trend line", "2022 players", "2022 trend line"),
  col = c("red", "maroon", "navy", "navy"), pch = c(16, NA, 16, NA), lty = c(0,1,0,1))
r1 # r = 0.10, indicating a weak correlation between PPG and APG among HOF players
## [1] 0.09974386
r2 # R = 0.58, indicating a stronger correlation between PPG and APG among current nba players
## [1] 0.5804901
The graph compares how scoring and passing relate among Hall of Fame (HOF) players with how they relate among NBA players in 2022 (restricted here to those averaging at least 10 points per game). The HOF trend line (maroon) indicates a weak positive correlation (r = 0.10) between points per game (PPG) and assists per game (APG), suggesting that during the “bully ball” era, high-scoring HOF players were less inclined to involve their teammates. Although they scored a significant number of points, their focus on team involvement was relatively limited.
In contrast, the statistics of 2022 NBA players show a moderate positive correlation (r = 0.58) between PPG and APG. This implies that current high-scoring NBA players not only excel in scoring but also contribute to their team’s offense by sharing the ball. Among offensive powerhouses, the 2022 players therefore appear more unselfish and more willing to facilitate team scoring, which points to a more well-rounded offensive game.
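A correlation coefficient on its own says little about uncertainty. As a hedged sketch, `cor.test` from base R's stats package reports a confidence interval and a p-value alongside r. The data frame `toy` below is invented; in the notebook the inputs would be the `pts` and `ast` vectors for each group.

```r
# Invented points/assists pairs, for illustration only
toy <- data.frame(
  PTS = c(10.2, 12.5, 15.1, 18.3, 20.4, 22.8, 25.0, 27.3),
  AST = c(2.1, 3.0, 2.4, 4.8, 3.9, 5.5, 6.1, 5.2)
)

# Pearson correlation with a 95% confidence interval and a p-value
res <- cor.test(toy$PTS, toy$AST, method = "pearson")

cat("r =", round(unname(res$estimate), 2), "\n")
cat("95% CI:", round(res$conf.int, 2), "\n")
cat("p-value:", signif(res$p.value, 3), "\n")
```

When comparing two groups this way, it helps to apply the same inclusion rule (for example the 10 PPG cutoff) to both, since filtering changes the range of the data and therefore the correlation.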
Next I’ll move on to the machine learning aspect of the project. To train a model for HOF predictions, I need two distinct categories: HOF and non-HOF. I build those categories here.
# Machine Learning aspect of Project: HOF vs. non HOF
nba_df1 <- read.csv("all_seasons.csv")
table <- kable(head(nba_df1), format = "html", caption = "Head of Player Table", table.attr = "border")
table
| X | player_name | team_abbreviation | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | gp | pts | reb | ast | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dennis Rodman | CHI | 36 | 198.12 | 99.79024 | Southeastern Oklahoma State | USA | 1986 | 2 | 27 | 55 | 5.7 | 16.1 | 3.1 | 16.1 | 0.186 | 0.323 | 0.100 | 0.479 | 0.113 | 1996-97 |
| 1 | Dwayne Schintzius | LAC | 28 | 215.90 | 117.93392 | Florida | USA | 1990 | 1 | 24 | 15 | 2.3 | 1.5 | 0.3 | 12.3 | 0.078 | 0.151 | 0.175 | 0.430 | 0.048 | 1996-97 |
| 2 | Earl Cureton | TOR | 39 | 205.74 | 95.25432 | Detroit Mercy | USA | 1979 | 3 | 58 | 9 | 0.8 | 1.0 | 0.4 | -2.1 | 0.105 | 0.102 | 0.103 | 0.376 | 0.148 | 1996-97 |
| 3 | Ed O’Bannon | DAL | 24 | 203.20 | 100.69742 | UCLA | USA | 1995 | 1 | 9 | 64 | 3.7 | 2.3 | 0.6 | -8.7 | 0.060 | 0.149 | 0.167 | 0.399 | 0.077 | 1996-97 |
| 4 | Ed Pinckney | MIA | 34 | 205.74 | 108.86208 | Villanova | USA | 1985 | 1 | 10 | 27 | 2.4 | 2.4 | 0.2 | -11.2 | 0.109 | 0.179 | 0.127 | 0.611 | 0.040 | 1996-97 |
| 5 | Eddie Johnson | HOU | 38 | 200.66 | 97.52228 | Illinois | USA | 1981 | 2 | 29 | 52 | 8.2 | 2.7 | 1.0 | 4.1 | 0.034 | 0.126 | 0.220 | 0.541 | 0.102 | 1996-97 |
#head(nba_df1)
Below, I compile the data into a comparable format. Since the dataset displays stats by season, I need to compute career averages myself so that they can be compared with the data set of Hall of Famers.
stats <- c('gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct', 'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct')
# Collapse the per-season rows into one row per player.
# Each stat becomes the unweighted mean of the player's season values; the remaining
# columns (team, age, draft info, ...) come from the player's first listed season.
# Grouping avoids deleting rows from the data frame while looping over its indices,
# which would shift the indices and skip rows.
nba_df1 <- nba_df1 %>%
  group_by(player_name) %>%
  mutate(across(all_of(stats), mean)) %>%
  slice(1) %>%
  ungroup() %>%
  arrange(X) %>%      # restore the original row order
  select(-season) %>%
  as.data.frame()
table <- kable(head(nba_df1), format = "html", caption = "Head of Player Table", table.attr = "border")
table
| X | player_name | team_abbreviation | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | gp | pts | reb | ast | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dennis Rodman | CHI | 36 | 198.12 | 99.79024 | Southeastern Oklahoma State | USA | 1986 | 2 | 27 | 42.50000 | 3.825000 | 14.15 | 2.1250000 | 3.575000 | 0.15125 | 0.3352500 | 0.07925 | 0.44575 | 0.0835000 |
| 1 | Dwayne Schintzius | LAC | 28 | 215.90 | 117.93392 | Florida | USA | 1990 | 1 | 24 | 15.50000 | 1.500000 | 1.35 | 0.4000000 | -8.450000 | 0.08950 | 0.1635000 | 0.17500 | 0.37000 | 0.1240000 |
| 2 | Earl Cureton | TOR | 39 | 205.74 | 95.25432 | Detroit Mercy | USA | 1979 | 3 | 58 | 9.00000 | 0.800000 | 1.00 | 0.4000000 | -2.100000 | 0.10500 | 0.1020000 | 0.10300 | 0.37600 | 0.1480000 |
| 3 | Ed O’Bannon | DAL | 24 | 203.20 | 100.69742 | UCLA | USA | 1995 | 1 | 9 | 64.00000 | 3.700000 | 2.30 | 0.6000000 | -8.700000 | 0.06000 | 0.1490000 | 0.16700 | 0.39900 | 0.0770000 |
| 4 | Ed Pinckney | MIA | 34 | 205.74 | 108.86208 | Villanova | USA | 1985 | 1 | 10 | 27.00000 | 2.400000 | 2.40 | 0.2000000 | -11.200000 | 0.10900 | 0.1790000 | 0.12700 | 0.61100 | 0.0400000 |
| 5 | Eddie Johnson | HOU | 38 | 200.66 | 97.52228 | Illinois | USA | 1981 | 2 | 29 | 43.33333 | 6.866667 | 1.80 | 0.8333333 | -6.866667 | 0.02500 | 0.1213333 | 0.23800 | 0.50900 | 0.0893333 |
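The career averages above give every season equal weight, so a 12-game season counts as much as an 80-game one. A variant that weights each season by games played can be sketched with `weighted.mean`. The toy rows below are invented, and whether weighting is preferable here is a judgment call rather than something the original analysis does.

```r
# Invented per-season rows in the shape of all_seasons.csv
toy_seasons <- data.frame(
  player_name = c("Player A", "Player A", "Player B"),
  gp  = c(80, 20, 50),
  pts = c(10.0, 20.0, 5.0)
)

# One data frame per player, then one summary row per player
by_player <- split(toy_seasons, toy_seasons$player_name)
career <- data.frame(
  player_name    = names(by_player),
  pts_unweighted = sapply(by_player, function(d) mean(d$pts)),               # seasons count equally
  pts_weighted   = sapply(by_player, function(d) weighted.mean(d$pts, d$gp)), # weighted by games played
  row.names = NULL
)

print(career)
```

For "Player A" the unweighted mean is 15 points per game, while the games-weighted mean is 12, because the 80-game season dominates.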
# Now that the table holds only unique players, flag whether each one is in the HOF
nba_df1$hof <- nba_df1$player_name %in% hof_df$Name
non_hof_df <- subset(nba_df1, hof == FALSE)
colnames(non_hof_df)[colnames(non_hof_df) == "player_name"] <- "Name"
colnames(non_hof_df)[colnames(non_hof_df) == "pts"] <- "PTS"
colnames(non_hof_df)[colnames(non_hof_df) == "reb"] <- "TRB"
colnames(non_hof_df)[colnames(non_hof_df) == "ast"] <- "AST"
table <- kable(head(non_hof_df), format = "html", caption = "Head of non-HOF Player Table", table.attr = "border")
table
| X | Name | team_abbreviation | age | player_height | player_weight | college | country | draft_year | draft_round | draft_number | gp | PTS | TRB | AST | net_rating | oreb_pct | dreb_pct | usg_pct | ts_pct | ast_pct | hof |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Dennis Rodman | CHI | 36 | 198.12 | 99.79024 | Southeastern Oklahoma State | USA | 1986 | 2 | 27 | 42.50000 | 3.825000 | 14.15 | 2.1250000 | 3.575000 | 0.15125 | 0.3352500 | 0.07925 | 0.44575 | 0.0835000 | FALSE |
| 1 | Dwayne Schintzius | LAC | 28 | 215.90 | 117.93392 | Florida | USA | 1990 | 1 | 24 | 15.50000 | 1.500000 | 1.35 | 0.4000000 | -8.450000 | 0.08950 | 0.1635000 | 0.17500 | 0.37000 | 0.1240000 | FALSE |
| 2 | Earl Cureton | TOR | 39 | 205.74 | 95.25432 | Detroit Mercy | USA | 1979 | 3 | 58 | 9.00000 | 0.800000 | 1.00 | 0.4000000 | -2.100000 | 0.10500 | 0.1020000 | 0.10300 | 0.37600 | 0.1480000 | FALSE |
| 3 | Ed O’Bannon | DAL | 24 | 203.20 | 100.69742 | UCLA | USA | 1995 | 1 | 9 | 64.00000 | 3.700000 | 2.30 | 0.6000000 | -8.700000 | 0.06000 | 0.1490000 | 0.16700 | 0.39900 | 0.0770000 | FALSE |
| 4 | Ed Pinckney | MIA | 34 | 205.74 | 108.86208 | Villanova | USA | 1985 | 1 | 10 | 27.00000 | 2.400000 | 2.40 | 0.2000000 | -11.200000 | 0.10900 | 0.1790000 | 0.12700 | 0.61100 | 0.0400000 | FALSE |
| 5 | Eddie Johnson | HOU | 38 | 200.66 | 97.52228 | Illinois | USA | 1981 | 2 | 29 | 43.33333 | 6.866667 | 1.80 | 0.8333333 | -6.866667 | 0.02500 | 0.1213333 | 0.23800 | 0.50900 | 0.0893333 | FALSE |
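Exact string matching is sensitive to stray whitespace. In the table above, Dennis Rodman (a 2011 inductee) is flagged `FALSE`, which hints that some names may not line up exactly between the two sources. A possible remedy, sketched here with a hypothetical helper `normalize_name`, is to normalize both name columns before comparing. Whether non-breaking spaces are the actual cause in this dataset is an assumption.

```r
# Hypothetical helper: turn non-breaking spaces (U+00A0) into regular spaces,
# collapse runs of whitespace into a single space, and trim both ends
normalize_name <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

hof_names    <- c("Dennis Rodman\u00a0", " Bob  Cousy")   # invented messy inputs
player_names <- c("Dennis Rodman", "Ed Pinckney", "Bob Cousy")

is_hof <- normalize_name(player_names) %in% normalize_name(hof_names)
print(data.frame(player_names, is_hof))
```

Spot-checking a few well-known inductees after the match is a cheap way to confirm that the flag behaves as intended.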
# Cleaning and preparing the non_hof_df
# Remove un-drafted players from data set
non_hof_df <- non_hof_df[non_hof_df$draft_year != "Undrafted", ]
# Convert draft year values to numeric
non_hof_df$draft_year <- as.numeric(as.character(non_hof_df$draft_year))
# Create a separate dataset for players drafted after 1998
recent_players_df <- non_hof_df[non_hof_df$draft_year >= 1998, ] #we're going to use this df later
non_hof_df <- non_hof_df[non_hof_df$draft_year < 1998, ]
# Modify the original method to only return points, rebounds, assists. This is because our non hof dataset doesn't contain data for steals or blocks
get_player_stats_v2 <- function(hof_df, player_name) {
player_stats <- c()
for (i in 1:nrow(hof_df)) {
if (hof_df[i, "Name"] == player_name) {
player_stats <- c(player_stats, as.numeric(hof_df[i, c("PTS", "TRB", "AST")]))
break
}
}
return(player_stats)
}
Here, I establish the functions for predicting the induction of a player into the Hall of Fame. My approach is based on a criterion wherein a player needs to surpass the average performance of Hall of Famers in one or more categories. Each category in which a player outperforms the average Hall of Famer will contribute to an increment in their “score.”
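The scoring rule just described can also be written in vectorized form, scoring a whole data frame at once instead of one player per call. The sketch below is an illustrative alternative, not the function used in the rest of this analysis; `hof_score`, `toy_df`, and the cutoff values are hypothetical stand-ins for `avg_pts`, `avg_trb`, and `avg_ast`.

```r
# Hypothetical cutoffs standing in for the Hall of Fame averages
cutoffs <- c(PTS = 18, TRB = 7, AST = 4)

toy_df <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  PTS  = c(25.0, 12.0, 19.0),
  TRB  = c(8.0,  3.0,  NA),
  AST  = c(5.0,  2.0,  3.0)
)

# Count, per player, how many categories exceed the cutoff.
# A missing stat counts as "not exceeded".
hof_score <- function(df, cutoffs) {
  above <- sweep(as.matrix(df[names(cutoffs)]), 2, cutoffs, ">")
  rowSums(above, na.rm = TRUE)
}

toy_df$score  <- hof_score(toy_df, cutoffs)
toy_df$is_hof <- toy_df$score >= 1   # threshold of one category
print(toy_df)
```

Keeping the score as a column makes it easy to try different thresholds without recomputing the comparisons.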
# Input: The DF, name of player, and a threshold representing the number of categories a player must be better than the average hall of famer at.
# Output: Returns True if player_name is adequate enough to be inducted into the hall of fame
hofOrNah <- function(player_name, threshold, df) {
playerStats <- get_player_stats_v2(df, player_name)
if (is.null(playerStats)) {
return(FALSE)
}
score <- 0
if (playerStats[1] > avg_pts) {
score <- score + 1
}
if (!is.na(playerStats[2]) && playerStats[2] > avg_trb) {
score <- score + 1
}
if (playerStats[3] > avg_ast) {
score <- score + 1
}
if (score >= threshold) {
return(TRUE)
} else {
return(FALSE)
}
}
# Runs the classifier on the hof and non-hof datasets and returns the accuracy percentage for each class
# Input: The number of categories for the threshold (num_categories). Setting print_hof_results or print_non_hof_results to TRUE displays the per-player output.
# Output: A numeric vector of length two holding the accuracies for hof and non_hof, respectively.
train_model <- function(num_categories, print_hof_results = FALSE, print_non_hof_results = FALSE) {
hof_count <- 0
non_hof_names <- non_hof_df$Name
hof_names <- hof_df$Name
# iterate through non hall of famers and classify them based on the category parameter
if (print_non_hof_results) {
cat("\nHall of Fame Classifications for non Hall of Famers:\n")
}
for (i in seq_along(non_hof_names)) {
name <- non_hof_names[i]
if (hofOrNah(name, num_categories, non_hof_df)) {
hof_count <- hof_count + 1
if (print_non_hof_results) {
cat("YES", name, "should be a hall of famer\n")
}
} else {
if (print_non_hof_results) {
cat("NO", name, "should not be a hall of famer\n")
}
}
}
total <- nrow(non_hof_df)
accuracy <- 100 - (hof_count / total * 100)
cat("Accuracy of classifying non hall of famers:", accuracy, "%\n")
non_hof_accuracy <- accuracy
#iterate through hall of famers and classify them based on the category parameter
hof_count <- 0
if (print_hof_results) {
cat("\nHall of Fame Classifications for Actual Hall of Famers:\n")
}
for (i in seq_along(hof_names)) {
name <- hof_names[i]
if (hofOrNah(name, num_categories, hof_df)) {
hof_count <- hof_count + 1
if (print_hof_results) {
cat("YES", name, "should be a hall of famer\n")
}
} else {
if (print_hof_results) {
cat("NO", name, "should not be a hall of famer\n")
}
}
}
total <- nrow(hof_df)
accuracy <- hof_count / total * 100
cat("Accuracy of classifying hall of famers:", accuracy, "%\n")
hof_accuracy <- accuracy
return(c(hof_accuracy, non_hof_accuracy))
}
nhf_accuracy = c()
hf_accuracy = c()
# Threshold of 1
result <- train_model(1)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])
## Accuracy of classifying non hall of famers: 87.09677 %
## Accuracy of classifying hall of famers: 86.39456 %
# Threshold of 2
result <- train_model(2)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])
## Accuracy of classifying non hall of famers: 98.31029 %
## Accuracy of classifying hall of famers: 43.53741 %
# Threshold 3
result <- train_model(3)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])
## Accuracy of classifying non hall of famers: 100 %
## Accuracy of classifying hall of famers: 5.442177 %
# Let's plot our findings
library(ggplot2)
library(gridExtra)
# Rounded per-threshold accuracies from the three runs above
nhf_accuracy <- c(87.1, 98.3, 100)
hf_accuracy <- c(86.39, 43.54, 5.44)
# Create separate plots for non hall of famer and hall of famer classifications
plot1 <- ggplot(data.frame(Threshold = 1:3, Accuracy = nhf_accuracy), aes(x = Threshold, y = Accuracy)) +
geom_bar(stat = "identity") +
ylim(0, 100) +
labs(title = "Accuracy of non hall of famer classifications as threshold increases",
x = "Threshold required",
y = "Accuracy Percentage") +
theme_bw()
plot2 <- ggplot(data.frame(Threshold = 1:3, Accuracy = hf_accuracy), aes(x = Threshold, y = Accuracy)) +
geom_bar(stat = "identity") +
ylim(0, 100) +
labs(title = "Accuracy of hall of famer classifications as threshold increases",
x = "Threshold required",
y = "Accuracy Percentage") +
theme_bw()
# Arrange the subplots into one column
combined_plot <- grid.arrange(plot1, plot2, ncol = 1)
Based on these findings, raising the number of categories required to classify a player as a Hall of Famer biases the classifier toward the non-Hall of Famer class. Almost every player is then labeled a non-Hall of Famer, so the accuracy for non-Hall of Famers approaches 100% while the accuracy for Hall of Famers drops sharply. To improve our accuracy, we could consider incorporating additional categories beyond points, rebounds, and assists. However, due to data limitations, we are constrained to using only these three categories, as steals and blocks were not recorded in the NBA before 1973. Exploring variables such as rebound percentage, usage percentage, and assist percentage from the all_seasons dataset could be a potential avenue for further examination.
At present, it seems that a threshold of 1 category for admitting a player into the Hall of Fame is likely the most effective. This threshold yields an accuracy of 87% in classifying players who do not belong, while maintaining an 86% accuracy in identifying those who do belong. This can be observed from the chart presented above.
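A single number can make the trade-off between the two classes easier to compare across thresholds. One common choice is balanced accuracy, the mean of the two per-class accuracies. The sketch below applies it to the rounded percentages reported above; using balanced accuracy as the selection rule is an assumption of this example, not part of the original analysis.

```r
# Per-class accuracies (in percent) as reported above for thresholds 1, 2 and 3
nhf_accuracy <- c(87.1, 98.3, 100)
hf_accuracy  <- c(86.39, 43.54, 5.44)

# Balanced accuracy: average of the two per-class accuracies
balanced <- (nhf_accuracy + hf_accuracy) / 2
names(balanced) <- paste("threshold", 1:3)
print(round(balanced, 2))

best <- which.max(balanced)
cat("Highest balanced accuracy at threshold", best, "\n")
```

With these inputs the balanced accuracy is highest at a threshold of 1, which agrees with the choice made in the text.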
It is worth noting that there are cases where players may be inducted into the Hall of Fame for honorary purposes, even if their stats or achievements may not fully support their induction. The output below allows us to examine these exceptional cases. Our current model does not account for the influence of the era in which a player competed on their Hall of Fame classification.
train_model(1, print_hof_results = TRUE)
## Accuracy of classifying non hall of famers: 79.59044 %
##
## Hall of Fame Classifications for Actual Hall of Famers:
## YES George Mikan should be a hall of famer
## NO Ed Macauley should not be a hall of famer
## YES Andy Phillip should be a hall of famer
## YES Bob Davies should be a hall of famer
## YES Bob Cousy should be a hall of famer
## YES Bob Pettit should be a hall of famer
## YES Dolph Schayes should be a hall of famer
## YES Bill Russell should be a hall of famer
## YES Tom Gola should be a hall of famer
## YES Bill Sharman should be a hall of famer
## YES Elgin Baylor should be a hall of famer
## YES Paul Arizin should be a hall of famer
## NO Joe Fulks should not be a hall of famer
## NO Cliff Hagan should not be a hall of famer
## YES Jim Pollard should be a hall of famer
## YES Wilt Chamberlain should be a hall of famer
## YES Jerry Lucas should be a hall of famer
## YES Oscar Robertson should be a hall of famer
## YES Jerry West should be a hall of famer
## YES Hal Greer should be a hall of famer
## YES Slater Martin should be a hall of famer
## NO Frank Ramsey should not be a hall of famer
## YES Willis Reed should be a hall of famer
## NO Bill Bradley should not be a hall of famer
## YES Dave DeBusschere should be a hall of famer
## YES Jack Twyman should be a hall of famer
## YES John Havlicek should be a hall of famer
## NO Sam Jones should not be a hall of famer
## NO Al Cervi should not be a hall of famer
## YES Nate Thurmond should be a hall of famer
## YES Billy Cunningham should be a hall of famer
## YES Tom Heinsohn should be a hall of famer
## YES Rick Barry should be a hall of famer
## YES Walt Frazier should be a hall of famer
## NO Bob Houbregs should not be a hall of famer
## YES Pete Maravich should be a hall of famer
## NO Bobby Wanzer should not be a hall of famer
## YES Clyde Lovellette should be a hall of famer
## YES Wes Unseld should be a hall of famer
## YES K.C. Jones should be a hall of famer
## YES Lenny Wilkens should be a hall of famer
## YES Dave Bing should be a hall of famer
## YES Elvin Hayes should be a hall of famer
## YES Neil Johnston should be a hall of famer
## YES Earl Monroe should be a hall of famer
## YES Tiny Archibald should be a hall of famer
## YES Dave Cowens should be a hall of famer
## YES Harry Gallatin should be a hall of famer
## YES Connie Hawkins should be a hall of famer
## YES Bob Lanier should be a hall of famer
## YES Walt Bellamy should be a hall of famer
## YES Julius Erving should be a hall of famer
## YES Dan Issel should be a hall of famer
## YES Dick McGuire should be a hall of famer
## YES Calvin Murphy should be a hall of famer
## YES Bill Walton should be a hall of famer
## NO Buddy Jeannette should not be a hall of famer
## YES Kareem Abdul-Jabbar should be a hall of famer
## YES Vern Mikkelsen should be a hall of famer
## YES George Gervin should be a hall of famer
## YES Gail Goodrich should be a hall of famer
## YES David Thompson should be a hall of famer
## YES George Yardley should be a hall of famer
## YES Alex English should be a hall of famer
## YES Bailey Howell should be a hall of famer
## YES Larry Bird should be a hall of famer
## YES Arnie Risen should be a hall of famer
## YES Kevin McHale should be a hall of famer
## YES Bob McAdoo should be a hall of famer
## YES Isiah Thomas should be a hall of famer
## YES Moses Malone should be a hall of famer
## YES Magic Johnson should be a hall of famer
## NO Drazen Petrovic should not be a hall of famer
## YES Robert Parish should be a hall of famer
## NO James Worthy should not be a hall of famer
## YES Clyde Drexler should be a hall of famer
## YES Maurice Stokes should be a hall of famer
## YES Charles Barkley should be a hall of famer
## YES Joe Dumars should be a hall of famer
## YES Dominique Wilkins should be a hall of famer
## YES Adrian Dantley should be a hall of famer
## YES Patrick Ewing should be a hall of famer
## YES Hakeem Olajuwon should be a hall of famer
## YES Michael Jordan should be a hall of famer
## YES David Robinson should be a hall of famer
## YES John Stockton should be a hall of famer
## YES Dennis Johnson should be a hall of famer
## YES Gus Johnson should be a hall of famer
## YES Karl Malone should be a hall of famer
## YES Scottie Pippen should be a hall of famer
## YES Artis Gilmore should be a hall of famer
## YES Chris Mullin should be a hall of famer
## YES Dennis Rodman should be a hall of famer
## NO Arvydas Sabonis should not be a hall of famer
## YES Mel Daniels should be a hall of famer
## YES Reggie Miller should be a hall of famer
## YES Ralph Sampson should be a hall of famer
## YES Chet Walker should be a hall of famer
## NO Jamaal Wilkes should not be a hall of famer
## YES Roger Brown should be a hall of famer
## YES Richie Guerin should be a hall of famer
## YES Bernard King should be a hall of famer
## YES Gary Payton should be a hall of famer
## NO Sarunas Marciulionis should not be a hall of famer
## YES Alonzo Mourning should be a hall of famer
## YES Mitch Richmond should be a hall of famer
## YES Guy Rodgers should be a hall of famer
## YES Louie Dampier should be a hall of famer
## YES Spencer Haywood should be a hall of famer
## YES Dikembe Mutombo should be a hall of famer
## YES Jo should be a hall of famer
## YES Zelmo Beaty should be a hall of famer
## YES Allen Iverson should be a hall of famer
## YES Yao Ming should be a hall of famer
## YES Shaquille O'Neal should be a hall of famer
## YES George McGinnis should be a hall of famer
## YES Tracy McGrady should be a hall of famer
## YES Ray Allen should be a hall of famer
## YES Maurice Cheeks should be a hall of famer
## YES Grant Hill should be a hall of famer
## YES Jason Kidd should be a hall of famer
## YES Steve Nash should be a hall of famer
## YES Dino Radja should be a hall of famer
## YES Charlie Scott should be a hall of famer
## NO Carl Braun should not be a hall of famer
## NO Charles “Chuc should not be a hall of famer
## YES Vlade Divac should be a hall of famer
## NO Bobby Jones should not be a hall of famer
## NO Sidney Moncrief should not be a hall of famer
## YES Jack Sikma should be a hall of famer
## YES Paul Westphal should be a hall of famer
## YES Kobe Bryant should be a hall of famer
## YES Tim Duncan should be a hall of famer
## YES Kevin Garnett should be a hall of famer
## YES Chris Bosh should be a hall of famer
## YES Bob Dandridge should be a hall of famer
## NO Toni Kukoc should not be a hall of famer
## YES Paul Pierce should be a hall of famer
## YES Ben Wallace should be a hall of famer
## YES Chris Webber should be a hall of famer
## YES Manu Ginobili should be a hall of famer
## YES Tim Hardaway should be a hall of famer
## YES Lou Hudson should be a hall of famer
## YES Pau Gasol should be a hall of famer
## YES Dirk Nowitzki should be a hall of famer
## YES Tony Parker should be a hall of famer
## YES Dwyane Wade should be a hall of famer
## Accuracy of classifying hall of famers: 86.39456 %
## [1] 86.39456 79.59044
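One way to compare thresholds with a single number is to average the two per-class accuracies (often called balanced accuracy). The sketch below uses only the two figures printed above for a threshold of 1; the helper name `balanced_accuracy` is hypothetical and is not part of the original code.

```r
# Hypothetical helper: mean of the Hall of Famer and non-Hall of Famer accuracies.
balanced_accuracy <- function(hof_acc, non_hof_acc) {
  mean(c(hof_acc, non_hof_acc))
}

# Values copied from the train_model(1) output above (in percent).
acc_threshold_1 <- c(hof = 86.39456, non_hof = 79.59044)

balanced_1 <- balanced_accuracy(acc_threshold_1[["hof"]], acc_threshold_1[["non_hof"]])
print(round(balanced_1, 2))  # 82.99
```

Computing the same quantity for thresholds 2 and 3 would give a like-for-like comparison that does not favor whichever class is larger.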
Let’s run our classifier on today’s players and see who will make it into the Hall of Fame.
First, let’s run our predictive model using a parameter of 1:
# Classify every current player using a threshold of 1 category
curr_player_names <- nba_2022_df$Player
predicted_hof <- c()
# seq_along() is safe even if the vector of names is empty
for (i in seq_along(curr_player_names)) {
  name <- curr_player_names[i]
  if (hofOrNah(name, 1, recent_players_df)) {
    print(paste("YES based on his career averages so far,", name, "will be a hall of famer"))
    predicted_hof <- c(predicted_hof, name)
  }
}
## [1] "YES based on his career averages so far, Bam Adebayo will be a hall of famer"
## [1] "YES based on his career averages so far, LaMarcus Aldridge will be a hall of famer"
## [1] "YES based on his career averages so far, Jarrett Allen will be a hall of famer"
## [1] "YES based on his career averages so far, Giannis Antetokounmpo will be a hall of famer"
## [1] "YES based on his career averages so far, Carmelo Anthony will be a hall of famer"
## [1] "YES based on his career averages so far, Cole Anthony will be a hall of famer"
## [1] "YES based on his career averages so far, Deandre Ayton will be a hall of famer"
## [1] "YES based on his career averages so far, LaMelo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, Lonzo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, Eric Bledsoe will be a hall of famer"
## [1] "YES based on his career averages so far, Malcolm Brogdon will be a hall of famer"
## [1] "YES based on his career averages so far, Jimmy Butler will be a hall of famer"
## [1] "YES based on his career averages so far, Wendell Carter Jr. will be a hall of famer"
## [1] "YES based on his career averages so far, Darren Collison will be a hall of famer"
## [1] "YES based on his career averages so far, Mike Conley will be a hall of famer"
## [1] "YES based on his career averages so far, DeMarcus Cousins will be a hall of famer"
## [1] "YES based on his career averages so far, Cade Cunningham will be a hall of famer"
## [1] "YES based on his career averages so far, Stephen Curry will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Davis will be a hall of famer"
## [1] "YES based on his career averages so far, DeMar DeRozan will be a hall of famer"
## [1] "YES based on his career averages so far, Andre Drummond will be a hall of famer"
## [1] "YES based on his career averages so far, Kris Dunn will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Durant will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Edwards will be a hall of famer"
## [1] "YES based on his career averages so far, Joel Embiid will be a hall of famer"
## [1] "YES based on his career averages so far, De'Aaron Fox will be a hall of famer"
## [1] "YES based on his career averages so far, Markelle Fultz will be a hall of famer"
## [1] "YES based on his career averages so far, Darius Garland will be a hall of famer"
## [1] "YES based on his career averages so far, Josh Giddey will be a hall of famer"
## [1] "YES based on his career averages so far, Shai Gilgeous-Alexander will be a hall of famer"
## [1] "YES based on his career averages so far, Devonte' Graham will be a hall of famer"
## [1] "YES based on his career averages so far, Draymond Green will be a hall of famer"
## [1] "YES based on his career averages so far, Blake Griffin will be a hall of famer"
## [1] "YES based on his career averages so far, Tyrese Haliburton will be a hall of famer"
## [1] "YES based on his career averages so far, James Harden will be a hall of famer"
## [1] "YES based on his career averages so far, Killian Hayes will be a hall of famer"
## [1] "YES based on his career averages so far, Jrue Holiday will be a hall of famer"
## [1] "YES based on his career averages so far, Al Horford will be a hall of famer"
## [1] "YES based on his career averages so far, Dwight Howard will be a hall of famer"
## [1] "YES based on his career averages so far, Andre Iguodala will be a hall of famer"
## [1] "YES based on his career averages so far, Kyrie Irving will be a hall of famer"
## [1] "YES based on his career averages so far, Reggie Jackson will be a hall of famer"
## [1] "YES based on his career averages so far, LeBron James will be a hall of famer"
## [1] "YES based on his career averages so far, DeAndre Jordan will be a hall of famer"
## [1] "YES based on his career averages so far, Brandon Knight will be a hall of famer"
## [1] "YES based on his career averages so far, Damian Lillard will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Love will be a hall of famer"
## [1] "YES based on his career averages so far, Kyle Lowry will be a hall of famer"
## [1] "YES based on his career averages so far, CJ McCollum will be a hall of famer"
## [1] "YES based on his career averages so far, Davion Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Donovan Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Evan Mobley will be a hall of famer"
## [1] "YES based on his career averages so far, Ja Morant will be a hall of famer"
## [1] "YES based on his career averages so far, Victor Oladipo will be a hall of famer"
## [1] "YES based on his career averages so far, Chris Paul will be a hall of famer"
## [1] "YES based on his career averages so far, Elfrid Payton will be a hall of famer"
## [1] "YES based on his career averages so far, Rajon Rondo will be a hall of famer"
## [1] "YES based on his career averages so far, Derrick Rose will be a hall of famer"
## [1] "YES based on his career averages so far, Ricky Rubio will be a hall of famer"
## [1] "YES based on his career averages so far, Domantas Sabonis will be a hall of famer"
## [1] "YES based on his career averages so far, Dennis Smith Jr. will be a hall of famer"
## [1] "YES based on his career averages so far, Isaiah Stewart will be a hall of famer"
## [1] "YES based on his career averages so far, Jalen Suggs will be a hall of famer"
## [1] "YES based on his career averages so far, Jayson Tatum will be a hall of famer"
## [1] "YES based on his career averages so far, Isaiah Thomas will be a hall of famer"
## [1] "YES based on his career averages so far, Klay Thompson will be a hall of famer"
## [1] "YES based on his career averages so far, Tristan Thompson will be a hall of famer"
## [1] "YES based on his career averages so far, Karl-Anthony Towns will be a hall of famer"
## [1] "YES based on his career averages so far, Kemba Walker will be a hall of famer"
## [1] "YES based on his career averages so far, Russell Westbrook will be a hall of famer"
## [1] "YES based on his career averages so far, Andrew Wiggins will be a hall of famer"
## [1] "YES based on his career averages so far, Trae Young will be a hall of famer"
From these results, it looks like our model classified a large number of today’s players (72 in total) into the Hall of Fame.
Now let’s try running it with a parameter of 2:
# Classify every current player using a threshold of 2 categories
curr_player_names <- nba_2022_df$Player
predicted_hof <- c()
# seq_along() is safe even if the vector of names is empty
for (i in seq_along(curr_player_names)) {
  name <- curr_player_names[i]
  if (hofOrNah(name, 2, recent_players_df)) {
    print(paste("YES based on his career averages so far,", name, "will be a hall of famer"))
    predicted_hof <- c(predicted_hof, name)
  }
}
## [1] "YES based on his career averages so far, LaMarcus Aldridge will be a hall of famer"
## [1] "YES based on his career averages so far, Giannis Antetokounmpo will be a hall of famer"
## [1] "YES based on his career averages so far, LaMelo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, DeMarcus Cousins will be a hall of famer"
## [1] "YES based on his career averages so far, Stephen Curry will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Davis will be a hall of famer"
## [1] "YES based on his career averages so far, DeMar DeRozan will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Durant will be a hall of famer"
## [1] "YES based on his career averages so far, Joel Embiid will be a hall of famer"
## [1] "YES based on his career averages so far, De'Aaron Fox will be a hall of famer"
## [1] "YES based on his career averages so far, Josh Giddey will be a hall of famer"
## [1] "YES based on his career averages so far, Shai Gilgeous-Alexander will be a hall of famer"
## [1] "YES based on his career averages so far, Blake Griffin will be a hall of famer"
## [1] "YES based on his career averages so far, James Harden will be a hall of famer"
## [1] "YES based on his career averages so far, Kyrie Irving will be a hall of famer"
## [1] "YES based on his career averages so far, LeBron James will be a hall of famer"
## [1] "YES based on his career averages so far, Damian Lillard will be a hall of famer"
## [1] "YES based on his career averages so far, Donovan Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Ja Morant will be a hall of famer"
## [1] "YES based on his career averages so far, Domantas Sabonis will be a hall of famer"
## [1] "YES based on his career averages so far, Karl-Anthony Towns will be a hall of famer"
## [1] "YES based on his career averages so far, Russell Westbrook will be a hall of famer"
## [1] "YES based on his career averages so far, Trae Young will be a hall of famer"
Based on the contrasting outputs, it appears that while a parameter of 1 provided the highest accuracy in training, a parameter of 2 yielded more “realistic” outcomes that align with our intuition. The first list includes players who are undoubtedly talented but are not regarded as Hall of Fame caliber by fans or experts. Players like Kris Dunn, Elfrid Payton, and Brandon Knight, though solid performers, are not commonly seen as belonging in the Hall of Fame. The second list, on the other hand, is made up mostly of names that resonate as true superstars and likely Hall of Famers. This raises the question: why does the discrepancy occur?
One explanation is that our predictive model was built exclusively from retired players, whose averages cover their entire careers. The dataset of current players includes relatively young players who are still in their prime. Consequently, their career averages are higher than they will eventually be once these players pass through the declining phase of their careers and retire.
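One possible mitigation is to score only players with enough seasons behind them for their career averages to have settled. The sketch below uses a toy data frame; the column `seasons_played`, the cutoff of 8 seasons, and the player names are assumptions for illustration and do not come from the original datasets.

```r
# Toy data; names and season counts are made up for illustration.
players <- data.frame(
  Player = c("Veteran A", "Rookie B", "Veteran C", "Sophomore D"),
  seasons_played = c(12, 1, 9, 2),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: only evaluate players with at least this many seasons.
min_seasons <- 8

# Keep the players whose career averages are less likely to be prime-inflated.
eligible <- players[players$seasons_played >= min_seasons, ]
print(eligible$Player)
```

The names that survive the filter could then be passed to the classifier in place of the full list of current players.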
Another factor to consider is that the criteria for Hall of Famers have evolved over the past few decades. We observed cases such as Bobby Jones, who may not boast high statistics by today’s NBA standards but was considered exceptional during the era he played in. To improve our predictions, it may therefore be beneficial to build the benchmark averages from Hall of Famers who competed in the last one to two decades when evaluating current players. As the game has progressed, it might be prudent to give greater weight to data from recent eras rather than relying heavily on information from half a century ago.
Overall, my model performed as intended. I have numerous ideas to enhance its effectiveness, such as incorporating additional categories, weighting different eras differently, and possibly developing separate models based on player positions. Furthermore, I anticipate obtaining more accurate results when evaluating players who have concluded their careers and are awaiting induction into the Hall of Fame.
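As a rough sketch of the era-based benchmark idea, the code below compares averages computed over all Hall of Famers with averages computed only over those whose careers ended recently. Everything here is assumed for illustration: the toy data frame, the column names `last_season`, `pts`, `reb`, and `ast`, and the cutoff year.

```r
# Toy Hall of Famer data; all values are made up for illustration.
hof <- data.frame(
  Player = c("HoF A", "HoF B", "HoF C", "HoF D"),
  last_season = c(1965, 1988, 2010, 2016),
  pts = c(18, 24, 20, 22),
  reb = c(12, 6, 8, 10),
  ast = c(3, 9, 5, 3)
)

# Hypothetical cutoff: treat careers ending in 2000 or later as "recent".
cutoff_year <- 2000
recent_hof <- hof[hof$last_season >= cutoff_year, ]

stat_cols <- c("pts", "reb", "ast")
all_era_benchmark <- colMeans(hof[, stat_cols])
recent_benchmark <- colMeans(recent_hof[, stat_cols])

# Side-by-side comparison of the two candidate benchmarks.
print(rbind(all_eras = all_era_benchmark, recent = recent_benchmark))
```

A softer variant would keep every era but weight recent Hall of Famers more heavily, for example with `weighted.mean()`, instead of dropping older players outright.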