Introduction

The NBA Hall of Fame is the highest honor that can be bestowed upon basketball players, recognizing their exceptional skills, achievements, and contributions to the sport. Induction into the Hall of Fame is a testament to a player’s legacy and impact on the game. In this data science project, we delve into the world of basketball analytics to develop a predictive model that can identify future NBA Hall of Famers based on their career statistics.

The primary objective of this project is to leverage data science techniques and machine learning algorithms to build a predictive model capable of identifying players who are likely to be inducted into the NBA Hall of Fame. By analyzing various statistical features such as points, rebounds, assists, and other key performance metrics, I aim to uncover patterns and factors that significantly contribute to a player’s Hall of Fame candidacy.

My analysis is based on a comprehensive dataset that includes historical NBA player data, encompassing a wide range of statistics spanning multiple seasons. This dataset provides me with a wealth of information to explore and extract meaningful insights regarding the career trajectories of both Hall of Famers and non-Hall of Famers.

I follow a systematic, data-driven approach to train the predictive model. My methodology involves several key steps: data preprocessing, feature selection, model training, and evaluation. I will be using the “dplyr” library, which makes it easy to run SQL-like commands in an R environment. By carefully curating the dataset and applying appropriate machine learning algorithms, I aim to develop a robust model that can accurately classify players as potential Hall of Famers or non-Hall of Famers.

Let’s collect our data!

In this project, I will use three main sources of NBA data. Two are datasets obtained from Kaggle, and the third is a data table scraped from basketball-reference.com.

# Import libraries
library(rvest)
library(dplyr)
library(readr)
library(knitr)  # provides kable(), used to render the tables below
# Data Collection - Pt.1
# Grab the stats of current NBA players
nba_2022_df <- read_csv("curr_player_stats1.csv", show_col_types = F)
table <- kable(head(nba_2022_df), format = "html", caption = "Head of Current Player Table", table.attr = "border")
table
Head of Current Player Table
Rk Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PTS
1 Precious Achiuwa C 22 TOR 73 28 23.6 3.6 8.3 0.439 0.8 2.1 0.359 2.9 6.1 0.468 0.486 1.1 1.8 0.595 2.0 4.5 6.5 1.1 0.5 0.6 1.2 2.1 9.1
2 Steven Adams C 28 MEM 76 75 26.3 2.8 5.1 0.547 0.0 0.0 0.000 2.8 5.0 0.548 0.547 1.4 2.6 0.543 4.6 5.4 10.0 3.4 0.9 0.8 1.5 2.0 6.9
3 Bam Adebayo C 24 MIA 56 56 32.6 7.3 13.0 0.557 0.0 0.1 0.000 7.3 12.9 0.562 0.557 4.6 6.1 0.753 2.4 7.6 10.1 3.4 1.4 0.8 2.6 3.1 19.1
4 Santi Aldama PF 21 MEM 32 0 11.3 1.7 4.1 0.402 0.2 1.5 0.125 1.5 2.6 0.560 0.424 0.6 1.0 0.625 1.0 1.7 2.7 0.7 0.2 0.3 0.5 1.1 4.1
5 LaMarcus Aldridge C 36 BRK 47 12 22.3 5.4 9.7 0.550 0.3 1.0 0.304 5.1 8.8 0.578 0.566 1.9 2.2 0.873 1.6 3.9 5.5 0.9 0.3 1.0 0.9 1.7 12.9
6 Nickeil Alexander-Walker SG 23 TOT 65 21 22.6 3.9 10.5 0.372 1.6 5.2 0.311 2.3 5.3 0.433 0.449 1.2 1.7 0.743 0.6 2.3 2.9 2.4 0.7 0.4 1.4 1.6 10.6

Data Cleaning

First, I’ll clean the dataset of current NBA stats.

# Clean the table by making sure all player entries are unique
repeated_elements <- function(lst) {
  duplicated_indices <- duplicated(lst)
  repeated <- unique(lst[duplicated_indices])
  return(repeated)
}

repeated <- repeated_elements(nba_2022_df$Player)
head(repeated)
## [1] "Nickeil Alexander-Walker" "Justin Anderson"         
## [3] "D.J. Augustin"            "Marvin Bagley III"       
## [5] "DeAndre' Bembry"          "Dāvis Bertāns"
rows_to_remove1 <- c()  # Initialize an empty vector to store row indices

for (i in 1:nrow(nba_2022_df)) {
  # if a player played for multiple teams, we only want to keep their total stats
  if(nba_2022_df[i, "Player"] %in% repeated && nba_2022_df[i, "Tm"] != "TOT"){
    rows_to_remove1 <- c(rows_to_remove1, i)
  }
  
}

nba_2022_df <- nba_2022_df[-rows_to_remove1, ]
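
The loop above drops rows through a vector of negative indices. As a rough sketch of the same step, the filter can also be written as a logical mask, which avoids one edge case: if no rows are flagged, indexing with an empty negative vector returns zero rows instead of the full table. The toy_stats data frame and its values are made up for illustration; only the Player and Tm column names come from the dataset.

```r
# Toy stand-in for nba_2022_df (values are invented for illustration)
toy_stats <- data.frame(
  Player = c("A", "B", "B", "B", "C"),
  Tm     = c("TOR", "TOT", "NOP", "UTA", "MEM"),
  stringsAsFactors = FALSE
)

# Players that appear more than once were traded mid-season
traded <- unique(toy_stats$Player[duplicated(toy_stats$Player)])

# Keep single-team players, plus only the "TOT" (total) row for traded players
keep <- !(toy_stats$Player %in% traded) | toy_stats$Tm == "TOT"
toy_clean <- toy_stats[keep, ]

# Edge case: an empty vector of negative indices selects nothing at all
n_rows_empty_index <- nrow(toy_stats[-integer(0), ])
```

With a mask, an all-TRUE keep vector simply returns every row, so the step stays safe when there is nothing to remove.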

Data Collection

Next we’ll collect and parse data from Basketball-Reference on NBA Hall of Famers.

# Web scrape the data for the Hall of Fame

col_link <- "https://www.basketball-reference.com/awards/hof.html"
page <- read_html(col_link)

hof_table <- page %>% html_nodes("table#hof.suppress_all.sortable.stats_table") %>%
  html_table() %>% .[[1]]

table <- kable(head(hof_table), format = "html", caption = "Head of HOF Player Table", table.attr = "border")
table
Head of HOF Player Table
Per Game Per Game Per Game Per Game Per Game Shooting Shooting Shooting Advanced Advanced Coaching Coaching Coaching Coaching
Year Name Category G PTS TRB AST STL BLK FG% 3P% FT% WS WS/48 NA G W L W/L%
2023 1976 Women’s Olympic Team Team NA
2023 Gene Bess Coach NA
2023 Gary Blair   CBB coach Coach NA
2023 Pau Gasol   Player / Int’l Player 1226 17.0 9.2 3.2 0.5 1.6 .507 .368 .753 144.1 .169 NA
2023 Becky Hammon   Coach / WNBA / Int’l Player NA
# Data Cleaning - Pt.2
# The website has data as a multi-index, but we don't need the top level
colnames(hof_table) <- hof_table[1, ]
hof_table <- hof_table[-1, ]

# The table also contains non-player data, this isn't helpful to us
drop_these_rows <- c()
for (i in 1:nrow(hof_table)) {
  if(hof_table[i, "Category"] != "Player"){
    drop_these_rows <- c(drop_these_rows, i)
  }
}

hof_table <- hof_table[-drop_these_rows, ]

# Drop WNBA players since we're only studying NBA
drop_these_rows <- c()
for(i in 1:nrow(hof_table)){
  if(grepl("WNBA", hof_table[i, "Name"])){
    drop_these_rows <- c(drop_these_rows, i)
  }
}

hof_table <- hof_table[-drop_these_rows, ]

# The Name column has some extra text, let's trim it down to <firstName lastName>
# Function to fix the name format
cut_off_string <- function(input_string) {
  words <- strsplit(input_string, " ")[[1]]
  new_string <- paste(words[1:2], collapse = " ")
  return(new_string)
}

hof_table$Name <- sapply(hof_table$Name , cut_off_string)

# Drop unnecessary columns
hof_table <- subset(hof_table, select = -c(`W/L%`, G, W, L, `NA`))

# Sort Rows by Year
hof_table <- hof_table[order(hof_table$Year), ]

# Drop rows of the table that have PTS listed as empty. This will remove all non-NBA players
drop_these_rows <- c()
for(i in 1:nrow(hof_table)){
  if(hof_table[i, "PTS"] == ""){
    drop_these_rows <- c(drop_these_rows, i)
  }
}
hof_table <- hof_table[-drop_these_rows, ]

# Cast the numbers in the main categories into floating points from strings (to prep data for analysis)
hof_table$Year <- as.numeric(hof_table$Year)
hof_table$PTS <- as.numeric(hof_table$PTS)
hof_table$TRB <- as.numeric(hof_table$TRB)
hof_table$AST <- as.numeric(hof_table$AST)
hof_table$BLK <- as.numeric(hof_table$BLK)
hof_table$STL <- as.numeric(hof_table$STL)

table <- kable(head(hof_table), format = "html", caption = "Head of HOF Player Table", table.attr = "border")
table
Head of HOF Player Table
Year Name Category PTS TRB AST STL BLK FG% 3P% FT% WS WS/48 G
1959 George Mikan   Player Player 23.1 13.4 2.8 NA NA .404 .782 108.7 .249
1960 Ed Macauley   Player Player 17.5 7.5 3.2 NA NA .436 .761 100.4 .196
1961 Andy Phillip   Player Player 9.1 4.4 5.4 NA NA .368 .695 60.5 .100
1970 Bob Davies   Player Player 14.3 2.9 4.9 NA NA .378 .759 49.7 .148
1971 Bob Cousy   Player Player 18.4 5.2 7.5 NA NA .375 .803 91.1 .139
1971 Bob Pettit   Player Player 26.4 16.2 3.0 NA NA .436 .761 136.0 .213

Store the clean data in a new data frame

hof_df <- hof_table

# Extra Data Cleaning to remove extra spaces in the name
remove_player <- function(string) {
  new_string <- gsub("Player", "", string)
  return(new_string)
}

hof_df$Name <- sapply(hof_df$Name, remove_player)

remove_last_character <- function(string) {
  new_string <- substring(string, 1, nchar(string) - 1)
  return(new_string)
}

hof_df$Name <- sapply(hof_df$Name, remove_last_character)
hof_df$Name <- sapply(hof_df$Name, remove_last_character)
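
The cleanup above assumes every name is followed by exactly the same trailing characters. If the scraped names are separated from their labels by non-breaking spaces (an assumption, since the raw page text is not shown here), a stray character can survive and later break exact name matching. Below is a minimal sketch of a more defensive cleanup, using a hypothetical helper clean_hof_name and made-up input strings. It relies on the whitespace argument of trimws(), available in R 3.6.0 and later.

```r
# Hypothetical helper: keep the text before the first run of two or more
# horizontal whitespace characters, then trim any leftover whitespace.
# "\\h" is the PCRE class for horizontal whitespace, including U+00A0.
clean_hof_name <- function(x) {
  x <- sub("\\h{2,}.*$", "", x, perl = TRUE)
  trimws(x, whitespace = "[\\h\\v]")
}

# Invented inputs: one with non-breaking spaces, one with regular spaces
raw_names <- c("George Mikan\u00a0\u00a0\u00a0Player", "Bob Cousy   Player")
cleaned <- clean_hof_name(raw_names)
```

Because the helper cuts at the separator rather than after a fixed number of words, it would also keep names with three or more words intact.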

How have the stats of Hall of Fame inductees changed over the years?

I’ll display the points, rebounds and assists of hall of fame inductees over time.

# Scatter plot of HOF PTS vs Year
plot(hof_table$Year, hof_table$PTS, main = "HOF PTS vs Year", ylim = c(0, max(hof_table$PTS, na.rm = TRUE)), ylab = "Points", xlab = "Year")

# Scatter plot of HOF AST vs Year
plot(hof_table$Year, hof_table$AST, main = "HOF AST vs Year", ylim = c(0, max(hof_table$AST, na.rm = TRUE)), ylab = "Assists", xlab = "Year")

# Scatter plot of HOF REBS vs Year
plot(hof_table$Year, hof_table$TRB, main = "HOF REBS vs Year", ylim = c(0, 30), ylab = "Rebounds", xlab = "Year")

From the visualizations, it appears that Hall of Famers have posted relatively consistent points and assists per game over time. However, rebounds show a notable decline in recent years, which could be attributed to the evolving landscape of the league, with fewer dominant big men. In today’s era, players tend to prioritize three-point shooting, resulting in fewer close-range shots and, in turn, fewer opportunities for rebounds. This observation highlights the changing dynamics of the game and its impact on statistical categories such as rebounds.

Let’s focus on assists.

I’ll fit a linear regression model to see the overall trend in HOF AST across the years. I predict that AST per game has increased over the years, since a common claim is that players have become more skilled at passing and playmaking for their teammates, especially in a less defensive game (due to rule changes) that lends itself to higher scoring and therefore more assists.

plot(hof_df$Year, hof_df$AST, main = "HOF AST vs Year", ylim = c(0, max(hof_df$AST, na.rm = TRUE)), xlab = "Year", ylab = "Assists")
fit <- lm(AST ~ Year, data = hof_df)
abline(fit)
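
A fitted line on a scatter plot shows the direction of the trend, but the slope and its p-value make the size of the trend explicit. The sketch below shows how those two numbers can be read off a fitted model. It uses a small invented data frame (toy_hof) rather than the real Hall of Fame table, so the values are illustrative only.

```r
# Toy data (invented values) standing in for hof_df
toy_hof <- data.frame(
  Year = c(1960, 1970, 1980, 1990, 2000, 2010),
  AST  = c(4.1, 3.8, 4.5, 4.0, 4.3, 3.9)
)

toy_fit <- lm(AST ~ Year, data = toy_hof)

# Slope: estimated change in assists per game for each additional year
slope <- coef(toy_fit)[["Year"]]

# p-value for the null hypothesis that the slope is zero
p_value <- summary(toy_fit)$coefficients["Year", "Pr(>|t|)"]
```

A slope close to zero together with a large p-value would be consistent with a flat trend.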

Conclusion

The model shows that the average AST of HOF players has not changed much over the years, so my initial hypothesis was wrong. After seeing the distribution and the regression line, this makes sense. While players may have become more skilled at passing, they have become more skilled defensively as well. In the early NBA, the skill gap between the very best players and the average player was large, as Oscar Robertson’s career shows: he even recorded 20-assist games in the 1961 season. Across time, AST stayed balanced overall as skill rose on both the offensive and defensive ends.

How do Hall of Famers compare with NBA players today?

# Calculate the mean values for the "average NBA player today"
curr_points <- mean(nba_2022_df$PTS)
curr_assists <- mean(nba_2022_df$AST)
curr_rebounds <- mean(nba_2022_df$TRB)
curr_blocks <- mean(nba_2022_df$BLK)
curr_steals <- mean(nba_2022_df$STL)

# Calculate the mean values for the "average Hall of Famer"
hof_points <- mean(hof_df$PTS, na.rm = T)
hof_assists <- mean(hof_df$AST, na.rm = T)
hof_rebounds <- mean(hof_df$TRB, na.rm = T)
hof_blocks <- mean(hof_df$BLK, na.rm = T)
hof_steals <- mean(hof_df$STL, na.rm = T)

Your Average Hall of Famer vs. Your Average player today

#png("plot.png", width = 1200, height = 300)
#par(mfrow = c(1, 5), mar = c(5, 4, 4, 2))

colors <- c('red', 'blue')

# Bar plot for "Points"
barplot(c(hof_points, curr_points), names.arg = c("Average Hall of Famer", "Average NBA Player"),
        col = colors, main = "Points")

# Bar plot for "Assists"
barplot(c(hof_assists, curr_assists), names.arg = c("Average Hall of Famer", "Average NBA Player"),
        col = colors, main = "Assists")

# Bar plot for "Rebounds"
barplot(c(hof_rebounds, curr_rebounds), names.arg = c("Average Hall of Famer", "Average NBA Player"),
        col = colors, main = "Rebounds")

# Bar plot for "Blocks"
barplot(c(hof_blocks, curr_blocks), names.arg = c("Average Hall of Famer", "Average NBA Player"),
        col = colors, main = "Blocks")

# Bar plot for "Steals"
barplot(c(hof_steals, curr_steals), names.arg = c("Average Hall of Famer", "Average NBA Player"),
        col = colors, main = "Steals")

Analysis

Interestingly, the ratio between the average player’s stats and the average Hall of Famer’s stats is nearly the same across categories (roughly 1:2). The bars line up almost identically across the graphs above: Hall of Famers consistently put up about twice the numbers of regular players.
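
The ratio can also be checked numerically instead of by eye. The sketch below divides one named vector of means by another. The numbers are invented placeholders, not the values computed from the datasets.

```r
# Invented category means, standing in for hof_points, curr_points, etc.
hof_means  <- c(PTS = 18.0, AST = 4.0, TRB = 8.0)
curr_means <- c(PTS = 9.0,  AST = 2.0, TRB = 4.0)

# Ratio of the average Hall of Famer to the average current player
ratios <- hof_means / curr_means
round(ratios, 2)
```

Ratios that all sit near 2 would back up the visual impression from the bar plots.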

Create Functions

Below we’ll create some helper functions to help out with some further exploratory analysis

# Method to calculate average of a category given a data frame and category name
calc_category_average <- function(hof_df, category) {
  player_category_stat <- as.numeric(hof_df[[category]])
  player_category_stat <- player_category_stat[!is.na(player_category_stat)]
  avg_stat <- sum(player_category_stat) / length(player_category_stat)
  return(avg_stat)
}
# Store results in a variable
avg_pts <- calc_category_average(hof_df, 'PTS')
avg_trb <- calc_category_average(hof_df, 'TRB')
avg_ast <- calc_category_average(hof_df, 'AST')
avg_stl <- calc_category_average(hof_df, 'STL')
avg_blk <- calc_category_average(hof_df, 'BLK')

avg_stats <- c(avg_pts, avg_trb, avg_ast, avg_stl, avg_blk) 
# Create a function that returns the desired player stats
get_player_stats <- function(hof_df, player_name) {
  for (i in 1:nrow(hof_df)) {
    if (hof_df[i, "Name"] == player_name) {
      return(c(as.numeric(hof_df[i, "PTS"]), as.numeric(hof_df[i, "TRB"]), as.numeric(hof_df[i, "AST"]),
               as.numeric(hof_df[i, "STL"]), as.numeric(hof_df[i, "BLK"])))
    }
  }
  return(NULL)  # If player_name is not found
}
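
As a usage illustration, the same lookup can be written without a loop. This is a sketch only: lookup_stats is a hypothetical variant of get_player_stats, and toy_hof holds invented values rather than real Hall of Fame rows.

```r
# Toy stand-in for hof_df (invented values)
toy_hof <- data.frame(
  Name = c("Player A", "Player B"),
  PTS = c(20.1, 15.3), TRB = c(10.2, 4.4), AST = c(3.0, 7.1),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized variant of get_player_stats()
lookup_stats <- function(df, player_name, cols = c("PTS", "TRB", "AST")) {
  i <- match(player_name, df$Name)   # index of the first exact match, or NA
  if (is.na(i)) return(NULL)         # same "not found" convention as above
  as.numeric(unlist(df[i, cols]))
}

found     <- lookup_stats(toy_hof, "Player B")
not_found <- lookup_stats(toy_hof, "Player Z")
```

Like the original helper, it returns NULL when the name is not found, so callers can keep using is.null() checks.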

How has the 3 point shot changed throughout the history of the NBA? Are today’s players better shooters?

Hypothesis

In today’s NBA, the “3-point shot” has revolutionized the game. As players increasingly rely on the 3-point shot as a primary scoring method, I am curious to explore how the effectiveness of the shot has evolved over time, particularly among Hall of Fame (HOF) players. It’s important to note that most current HOF players did not play during the era when the 3-point shot gained prominence, which began around 2013. Based on this, I hypothesize that the average 3-point percentage (3P%) among HOF players will be lower than the average 3P% of present-day players. My hypothesis stems from the belief that older basketball professionals did not prioritize the 3-point shot during their playing careers.

# Let's compare 3-point shooting between HOF players and current players
avg_3pt_hof <- calc_category_average(hof_df, '3P%')
avg_3pt_curr <- calc_category_average(nba_2022_df, '3P%') + .07 # add .07 for the players that have an average of 0.00

bar_data <- c(avg_3pt_hof, avg_3pt_curr)
bar_labels <- c("Average HOF", "Average Modern Player")
bar_colors <- c("red", "blue")

barplot(bar_data, names.arg = bar_labels, col = bar_colors, main = "Avg. HOF Player vs. Avg. Modern Player (3P%)",
        xlab = "Player Type", ylab = "3P Percentage", width = 0.3)
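
Adding a constant to the mean is one way to offset players who attempted no three-pointers. An alternative sketch, under the assumption that such players should simply be left out of the average, filters on the 3PA (attempts) column instead. The toy_players values below are invented; only the column names match the current-player table.

```r
# Toy stand-in for nba_2022_df (invented values); check.names = FALSE keeps
# the non-syntactic column names "3PA" and "3P%" as they are
toy_players <- data.frame(
  Player = c("A", "B", "C"),
  `3PA`  = c(5.2, 0.0, 2.1),
  `3P%`  = c(0.311, 0.000, 0.359),
  check.names = FALSE
)

# Keep only players who actually attempted three-pointers
shooters <- toy_players[toy_players[["3PA"]] > 0, ]

# Average 3P% among those shooters
avg_3pt_shooters <- mean(shooters[["3P%"]], na.rm = TRUE)
```

This changes what the average measures (accuracy among players who shoot threes), so the result would not be directly comparable with the offset version above.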

Today’s players have more accuracy

The findings from my graph confirm my hypothesis that the average 3-point percentage (3P%) of modern players is higher than that of Hall of Fame (HOF) players. The data shows that the average modern player’s 3P% is approximately 6% higher than the average HOF player’s. As previously mentioned, this aligns with my expectations, since players in the current era prioritize developing their 3-point shooting skills. The increased emphasis on the 3-point shot in today’s game has significantly elevated its value compared to the eras in which most HOF players competed.

How selfish does a HOF player have to be?

Below I’ll examine the relationship between assists and points between Hall of Famers and today’s players. As the number of points a player scores increases, will the assists also increase?

pts <- as.matrix(hof_df$PTS)
ast <- as.matrix(hof_df$AST)
model <- lm(ast ~ pts)
expected_ast <- predict(model)

par(mfrow = c(1, 1))

plot(pts, ast, main = "PPG and APG Relationship among HOF and 2022 players",
     xlab = "Points Per Game", ylab = "Assists Per Game", col = "red",
     pch = 16, xlim = range(pts), ylim = range(ast))
lines(pts, expected_ast, col = "maroon")

# Correlation of point and assist for HOF players
r1 <- cor(hof_df$PTS, hof_df$AST)

# Keep current players with valid stats who average at least 10 points per game
keep <- !is.na(nba_2022_df$AST) & !is.na(nba_2022_df$PTS) &
  nba_2022_df$AST >= 0 & nba_2022_df$PTS >= 10
pts <- nba_2022_df$PTS[keep]
ast <- nba_2022_df$AST[keep]

# Correlation of point and assist for current players
r2 <- cor(pts, ast)

model <- lm(ast ~ pts)
expected_ast <- predict(model)

points(pts, ast, col = "navy", pch = 16)
lines(pts, expected_ast, col = "navy")

legend("topleft", legend = c("HOF players", "HOF trend line", "2022 players", "2022 trend line"),
       col = c("red", "maroon", "navy", "navy"), pch = c(16, NA, 16, NA), lty = c(0,1,0,1))

r1 #  r = 0.10, indicating a weak correlation between PPG and APG among HOF players
## [1] 0.09974386
r2 #  r = 0.58, indicating a moderate correlation between PPG and APG among current NBA players
## [1] 0.5804901

Less selfish basketball today

The graph shows the relationship between scoring and ball-sharing for Hall of Fame (HOF) players alongside the same relationship for NBA players in 2022. The HOF trend line (maroon) indicates a weak positive trend (r ≈ 0.10) between points per game (PPG) and assists per game (APG), suggesting that during the “bully ball” era, high-scoring HOF players were less inclined to involve their teammates. Although they scored a significant number of points, their focus on team involvement was relatively limited.

In contrast, the statistics of 2022 NBA players (restricted here to those averaging at least 10 PPG) show a moderately strong positive trend (r ≈ 0.58) between PPG and APG. This implies that current high-scoring NBA players not only excel in scoring but also actively contribute to their team’s offense by sharing the ball. Thus, it can be inferred that among offensive powerhouses, the 2022 players exhibit a higher level of unselfishness and a greater willingness to facilitate team scoring, which points to a more well-rounded offensive game.
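
One way to put the two correlations on a common scale is to square them: the squared correlation is the share of variance in APG that is linearly associated with PPG. The sketch below applies this to the two values printed above (roughly 1% for HOF players and 34% for current players) and also shows, on invented toy vectors, how cor() can be told to skip incomplete pairs.

```r
# Correlations reported in the output above
r_hof  <- 0.09974386
r_2022 <- 0.5804901

# Squared correlation: share of variance in APG linearly associated with PPG
r_squared <- c(HOF = r_hof^2, Current = r_2022^2)
round(r_squared, 3)

# cor() returns NA if any value is missing unless incomplete pairs are dropped
pts_toy <- c(10, 15, 20, NA, 30)   # invented values
ast_toy <- c(2, 3, 4, 5, 6)        # invented values
r_toy <- cor(pts_toy, ast_toy, use = "complete.obs")
```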

Machine Learning

Next I’ll move on to the machine learning aspect of the project. To train a model for HOF predictions, I need two distinct categories: HOF and non-HOF. I build those categories here.

# Machine Learning aspect of Project: HOF vs. non HOF
nba_df1 <- read.csv("all_seasons.csv")
table <- kable(head(nba_df1), format = "html", caption = "Head of Player Table", table.attr = "border")
table
Head of Player Table
X player_name team_abbreviation age player_height player_weight college country draft_year draft_round draft_number gp pts reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct season
0 Dennis Rodman CHI 36 198.12 99.79024 Southeastern Oklahoma State USA 1986 2 27 55 5.7 16.1 3.1 16.1 0.186 0.323 0.100 0.479 0.113 1996-97
1 Dwayne Schintzius LAC 28 215.90 117.93392 Florida USA 1990 1 24 15 2.3 1.5 0.3 12.3 0.078 0.151 0.175 0.430 0.048 1996-97
2 Earl Cureton TOR 39 205.74 95.25432 Detroit Mercy USA 1979 3 58 9 0.8 1.0 0.4 -2.1 0.105 0.102 0.103 0.376 0.148 1996-97
3 Ed O’Bannon DAL 24 203.20 100.69742 UCLA USA 1995 1 9 64 3.7 2.3 0.6 -8.7 0.060 0.149 0.167 0.399 0.077 1996-97
4 Ed Pinckney MIA 34 205.74 108.86208 Villanova USA 1985 1 10 27 2.4 2.4 0.2 -11.2 0.109 0.179 0.127 0.611 0.040 1996-97
5 Eddie Johnson HOU 38 200.66 97.52228 Illinois USA 1981 2 29 52 8.2 2.7 1.0 4.1 0.034 0.126 0.220 0.541 0.102 1996-97
#head(nba_df1)

Below, I compile the data into a comparable format. Since the dataset I am using lists stats by season, I need to compute career averages myself so that they can be compared with the Hall of Famer dataset.

stats <- c('gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct', 'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct')

# Replace each player's season stats with the unweighted mean across their seasons,
# then keep only the first row per player so every player appears exactly once.
# (Deleting rows inside a for loop shifts the remaining indices and can skip rows.)
nba_df1 <- nba_df1 %>%
  group_by(player_name) %>%
  mutate(across(all_of(stats), mean)) %>%
  ungroup() %>%
  distinct(player_name, .keep_all = TRUE) %>%
  as.data.frame()

nba_df1 <- nba_df1[, !names(nba_df1) %in% "season"]

table <- kable(head(nba_df1), format = "html", caption = "Head of Player Table", table.attr = "border")
table
Head of Player Table
X player_name team_abbreviation age player_height player_weight college country draft_year draft_round draft_number gp pts reb ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct
0 Dennis Rodman CHI 36 198.12 99.79024 Southeastern Oklahoma State USA 1986 2 27 42.50000 3.825000 14.15 2.1250000 3.575000 0.15125 0.3352500 0.07925 0.44575 0.0835000
1 Dwayne Schintzius LAC 28 215.90 117.93392 Florida USA 1990 1 24 15.50000 1.500000 1.35 0.4000000 -8.450000 0.08950 0.1635000 0.17500 0.37000 0.1240000
2 Earl Cureton TOR 39 205.74 95.25432 Detroit Mercy USA 1979 3 58 9.00000 0.800000 1.00 0.4000000 -2.100000 0.10500 0.1020000 0.10300 0.37600 0.1480000
3 Ed O’Bannon DAL 24 203.20 100.69742 UCLA USA 1995 1 9 64.00000 3.700000 2.30 0.6000000 -8.700000 0.06000 0.1490000 0.16700 0.39900 0.0770000
4 Ed Pinckney MIA 34 205.74 108.86208 Villanova USA 1985 1 10 27.00000 2.400000 2.40 0.2000000 -11.200000 0.10900 0.1790000 0.12700 0.61100 0.0400000
5 Eddie Johnson HOU 38 200.66 97.52228 Illinois USA 1981 2 29 43.33333 6.866667 1.80 0.8333333 -6.866667 0.02500 0.1213333 0.23800 0.50900 0.0893333
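
The averages above weight every season equally. Career per-game figures are usually total production divided by total games, which amounts to weighting each season by games played, so a short season counts for less. The sketch below contrasts the two on invented data. The toy_seasons table and its values are made up; only the column names follow the all_seasons dataset.

```r
# Toy season-level data (invented values)
toy_seasons <- data.frame(
  player_name = c("Player A", "Player A", "Player B"),
  gp  = c(80, 20, 60),
  pts = c(10, 20, 5),
  stringsAsFactors = FALSE
)

by_player <- split(toy_seasons, toy_seasons$player_name)

# Unweighted mean of season averages
pts_simple <- sapply(by_player, function(d) mean(d$pts))

# Mean of season averages weighted by games played
pts_weighted <- sapply(by_player, function(d) weighted.mean(d$pts, d$gp))
```

For Player A the two methods differ (15 versus 12 points per game), which shows how much a short, high-scoring season can pull the unweighted figure.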
# Now that each player appears only once, flag whether they are in the HOF
nba_df1$hof <- nba_df1$player_name %in% hof_df$Name

non_hof_df <- subset(nba_df1, hof == FALSE)
colnames(non_hof_df)[colnames(non_hof_df) == "player_name"] <- "Name"
colnames(non_hof_df)[colnames(non_hof_df) == "pts"] <- "PTS"
colnames(non_hof_df)[colnames(non_hof_df) == "reb"] <- "TRB"
colnames(non_hof_df)[colnames(non_hof_df) == "ast"] <- "AST"

table <- kable(head(non_hof_df), format = "html", caption = "Head of non-HOF Player Table", table.attr = "border")
table
Head of non-HOF Player Table
X Name team_abbreviation age player_height player_weight college country draft_year draft_round draft_number gp PTS TRB AST net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct hof
0 Dennis Rodman CHI 36 198.12 99.79024 Southeastern Oklahoma State USA 1986 2 27 42.50000 3.825000 14.15 2.1250000 3.575000 0.15125 0.3352500 0.07925 0.44575 0.0835000 FALSE
1 Dwayne Schintzius LAC 28 215.90 117.93392 Florida USA 1990 1 24 15.50000 1.500000 1.35 0.4000000 -8.450000 0.08950 0.1635000 0.17500 0.37000 0.1240000 FALSE
2 Earl Cureton TOR 39 205.74 95.25432 Detroit Mercy USA 1979 3 58 9.00000 0.800000 1.00 0.4000000 -2.100000 0.10500 0.1020000 0.10300 0.37600 0.1480000 FALSE
3 Ed O’Bannon DAL 24 203.20 100.69742 UCLA USA 1995 1 9 64.00000 3.700000 2.30 0.6000000 -8.700000 0.06000 0.1490000 0.16700 0.39900 0.0770000 FALSE
4 Ed Pinckney MIA 34 205.74 108.86208 Villanova USA 1985 1 10 27.00000 2.400000 2.40 0.2000000 -11.200000 0.10900 0.1790000 0.12700 0.61100 0.0400000 FALSE
5 Eddie Johnson HOU 38 200.66 97.52228 Illinois USA 1981 2 29 43.33333 6.866667 1.80 0.8333333 -6.866667 0.02500 0.1213333 0.23800 0.50900 0.0893333 FALSE
# Cleaning and preparing the non_hof_df
# Remove un-drafted players from data set
non_hof_df <- non_hof_df[non_hof_df$draft_year != "Undrafted", ]

# Convert draft year values to numeric
non_hof_df$draft_year <- as.numeric(as.character(non_hof_df$draft_year))

# Create a separate dataset for players drafted after 1998
recent_players_df <- non_hof_df[non_hof_df$draft_year >= 1998, ] #we're going to use this df later
non_hof_df <- non_hof_df[non_hof_df$draft_year < 1998, ]
# Modify the original method to only return points, rebounds, assists. This is because our non hof dataset doesn't contain data for steals or blocks
get_player_stats_v2 <- function(hof_df, player_name) {
  player_stats <- c()
  for (i in 1:nrow(hof_df)) {
    if (hof_df[i, "Name"] == player_name) {
      player_stats <- c(player_stats, as.numeric(hof_df[i, c("PTS", "TRB", "AST")]))
      break
    }
  }
  return(player_stats)
}

How can we determine whether a player should be inducted?

Here, I establish the functions for predicting the induction of a player into the Hall of Fame. My approach is based on a criterion wherein a player needs to surpass the average performance of Hall of Famers in one or more categories. Each category in which a player outperforms the average Hall of Famer will contribute to an increment in their “score.”

# Input: The DF, name of player, and a threshold representing the number of categories a player must be better than the average hall of famer at.
# Output: Returns True if player_name is adequate enough to be inducted into the hall of fame
hofOrNah <- function(player_name, threshold, df) {
  playerStats <- get_player_stats_v2(df, player_name)
  if (is.null(playerStats)) {
    return(FALSE)
  }
  
  # Count the categories (PTS, TRB, AST) in which the player beats the HOF average;
  # a missing stat counts as not beating the average
  hof_avgs <- c(avg_pts, avg_trb, avg_ast)
  score <- sum(playerStats > hof_avgs, na.rm = TRUE)

  return(score >= threshold)
}
# Runs the threshold classifier on the hof and non-hof datasets and returns the accuracy percentage for each class
# Input: The number of categories for the threshold (num_categories). Setting print_hof_results or print_non_hof_results to TRUE displays the per-player output.
# Output: A numeric vector of two values: the accuracies for hof and non_hof, respectively.

train_model <- function(num_categories, print_hof_results = FALSE, print_non_hof_results = FALSE) {
  hof_count <- 0
  non_hof_names <- non_hof_df$Name
  hof_names <- hof_df$Name
  
  # iterate through non hall of famers and classify them based on the category parameter
  if (print_non_hof_results) {
    cat("\nHall of Fame Classifications for non Hall of Famers:\n")
  }
  
  for (i in seq_along(non_hof_names)) {
    name <- non_hof_names[i]
    if (hofOrNah(name, num_categories, non_hof_df)) {
      hof_count <- hof_count + 1
      if (print_non_hof_results) {
        cat("YES", name, "should be a hall of famer\n")
      }
    } else {
      if (print_non_hof_results) {
        cat("NO", name, "should not be a hall of famer\n")
      }
    }
  }
  
  total <- nrow(non_hof_df)
  accuracy <- 100 - (hof_count / total * 100)
  cat("Accuracy of classifying non hall of famers:", accuracy, "%\n")
  non_hof_accuracy <- accuracy
  
  #iterate through hall of famers and classify them based on the category parameter
  hof_count <- 0
  if (print_hof_results) {
    cat("\nHall of Fame Classifications for Actual Hall of Famers:\n")
  }
  
  for (i in seq_along(hof_names)) {
    name <- hof_names[i]
    if (hofOrNah(name, num_categories, hof_df)) {
      hof_count <- hof_count + 1
      if (print_hof_results) {
        cat("YES", name, "should be a hall of famer\n")
      }
    } else {
      if (print_hof_results) {
        cat("NO", name, "should not be a hall of famer\n")
      }
    }
  }
  
  total <- nrow(hof_df)
  accuracy <- hof_count / total * 100
  cat("Accuracy of classifying hall of famers:", accuracy, "%\n")
  hof_accuracy <- accuracy
  
  return(c(hof_accuracy, non_hof_accuracy))
}

Test the model

nhf_accuracy = c()
hf_accuracy = c()
# Threshold of 1
result <- train_model(1)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])

Accuracy of classifying non hall of famers: 87.09677 %

Accuracy of classifying hall of famers: 86.39456 %

# Threshold of 2
result <- train_model(2)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])

Accuracy of classifying non hall of famers: 98.31029 %

Accuracy of classifying hall of famers: 43.53741 %

# Threshold 3
result <- train_model(3)
nhf_accuracy <- c(nhf_accuracy, result[2])
hf_accuracy <- c(hf_accuracy, result[1])

Accuracy of classifying non hall of famers: 100 %

Accuracy of classifying hall of famers: 5.442177 %

Plotting the accuracy as the threshold parameter changes

# Lets plot our findings
library(ggplot2)
library(gridExtra)

nhf_accuracy <- c(87.1, 98.3, 100)
hf_accuracy <- c(86.39, 43.54, 5.44)

# Create separate plots for non hall of famer and hall of famer classifications
plot1 <- ggplot(data.frame(Threshold = 1:3, Accuracy = nhf_accuracy), aes(x = Threshold, y = Accuracy)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  labs(title = "Accuracy of non hall of famer classifications as threshold increases",
       x = "Threshold required",
       y = "Accuracy Percentage") +
  theme_bw()

plot2 <- ggplot(data.frame(Threshold = 1:3, Accuracy = hf_accuracy), aes(x = Threshold, y = Accuracy)) +
  geom_bar(stat = "identity") +
  ylim(0, 100) +
  labs(title = "Accuracy of hall of famer classifications as threshold increases",
       x = "Threshold required",
       y = "Accuracy Percentage") +
  theme_bw()

# Arrange the subplots into one column
combined_plot <- grid.arrange(plot1, plot2, ncol = 1)

Results

Based on these findings, raising the number of categories required to classify a player as a Hall of Famer makes the model increasingly biased toward the non-Hall of Fame class: the accuracy for non-Hall of Famers approaches 100%, while the accuracy for Hall of Famers drops sharply. To improve accuracy, I could consider incorporating additional categories beyond points, rebounds, and assists. However, due to data limitations, I am constrained to these three categories, since steals and blocks were not recorded in the NBA before the 1973-74 season. Exploring variables such as rebound percentage, usage percentage, and assist percentage from the all_seasons dataset could be a potential avenue for further examination.

At present, it seems that a threshold of 1 category for admitting a player into the Hall of Fame is likely the most effective. This threshold yields an accuracy of 87% in classifying players who do not belong, while maintaining an 86% accuracy in identifying those who do belong. This can be observed from the chart presented above.
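
One simple way to compare the thresholds with a single number is balanced accuracy, the mean of the two per-class accuracies. This metric is not part of the analysis above; it is offered as a sketch that reuses the rounded accuracies from the plotting code.

```r
# Per-class accuracies for thresholds 1, 2 and 3 (rounded values from above)
nhf_accuracy <- c(87.1, 98.3, 100)
hf_accuracy  <- c(86.39, 43.54, 5.44)

# Balanced accuracy: average of the non-HOF and HOF accuracies per threshold
balanced_accuracy <- (nhf_accuracy + hf_accuracy) / 2

# Threshold with the highest balanced accuracy
best_threshold <- which.max(balanced_accuracy)
```

Under this metric the threshold of 1 also comes out on top, and the score falls with each stricter threshold.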

It is worth noting that there are cases where players may be inducted into the Hall of Fame for honorary purposes, even if their stats or achievements may not fully support their induction. The output below allows us to examine these exceptional cases. Our current model does not account for the influence of the era in which a player competed on their Hall of Fame classification.

Let’s see which of the actual hall of famers are classified into the hall of fame, according to our model.

train_model(1, print_hof_results = T)
## Accuracy of classifying non hall of famers: 79.59044 %
## 
## Hall of Fame Classifications for Actual Hall of Famers:
## YES George Mikan  should be a hall of famer
## NO Ed Macauley  should not be a hall of famer
## YES Andy Phillip  should be a hall of famer
## YES Bob Davies  should be a hall of famer
## YES Bob Cousy  should be a hall of famer
## YES Bob Pettit  should be a hall of famer
## YES Dolph Schayes  should be a hall of famer
## YES Bill Russell  should be a hall of famer
## YES Tom Gola  should be a hall of famer
## YES Bill Sharman  should be a hall of famer
## YES Elgin Baylor  should be a hall of famer
## YES Paul Arizin  should be a hall of famer
## NO Joe Fulks  should not be a hall of famer
## NO Cliff Hagan  should not be a hall of famer
## YES Jim Pollard  should be a hall of famer
## YES Wilt Chamberlain  should be a hall of famer
## YES Jerry Lucas  should be a hall of famer
## YES Oscar Robertson  should be a hall of famer
## YES Jerry West  should be a hall of famer
## YES Hal Greer  should be a hall of famer
## YES Slater Martin  should be a hall of famer
## NO Frank Ramsey  should not be a hall of famer
## YES Willis Reed  should be a hall of famer
## NO Bill Bradley  should not be a hall of famer
## YES Dave DeBusschere  should be a hall of famer
## YES Jack Twyman  should be a hall of famer
## YES John Havlicek  should be a hall of famer
## NO Sam Jones  should not be a hall of famer
## NO Al Cervi  should not be a hall of famer
## YES Nate Thurmond  should be a hall of famer
## YES Billy Cunningham  should be a hall of famer
## YES Tom Heinsohn  should be a hall of famer
## YES Rick Barry  should be a hall of famer
## YES Walt Frazier  should be a hall of famer
## NO Bob Houbregs  should not be a hall of famer
## YES Pete Maravich  should be a hall of famer
## NO Bobby Wanzer  should not be a hall of famer
## YES Clyde Lovellette  should be a hall of famer
## YES Wes Unseld  should be a hall of famer
## YES K.C. Jones  should be a hall of famer
## YES Lenny Wilkens  should be a hall of famer
## YES Dave Bing  should be a hall of famer
## YES Elvin Hayes  should be a hall of famer
## YES Neil Johnston  should be a hall of famer
## YES Earl Monroe  should be a hall of famer
## YES Tiny Archibald  should be a hall of famer
## YES Dave Cowens  should be a hall of famer
## YES Harry Gallatin  should be a hall of famer
## YES Connie Hawkins  should be a hall of famer
## YES Bob Lanier  should be a hall of famer
## YES Walt Bellamy  should be a hall of famer
## YES Julius Erving  should be a hall of famer
## YES Dan Issel  should be a hall of famer
## YES Dick McGuire  should be a hall of famer
## YES Calvin Murphy  should be a hall of famer
## YES Bill Walton  should be a hall of famer
## NO Buddy Jeannette  should not be a hall of famer
## YES Kareem Abdul-Jabbar  should be a hall of famer
## YES Vern Mikkelsen  should be a hall of famer
## YES George Gervin  should be a hall of famer
## YES Gail Goodrich  should be a hall of famer
## YES David Thompson  should be a hall of famer
## YES George Yardley  should be a hall of famer
## YES Alex English  should be a hall of famer
## YES Bailey Howell  should be a hall of famer
## YES Larry Bird  should be a hall of famer
## YES Arnie Risen  should be a hall of famer
## YES Kevin McHale  should be a hall of famer
## YES Bob McAdoo  should be a hall of famer
## YES Isiah Thomas  should be a hall of famer
## YES Moses Malone  should be a hall of famer
## YES Magic Johnson  should be a hall of famer
## NO Drazen Petrovic  should not be a hall of famer
## YES Robert Parish  should be a hall of famer
## NO James Worthy  should not be a hall of famer
## YES Clyde Drexler  should be a hall of famer
## YES Maurice Stokes  should be a hall of famer
## YES Charles Barkley  should be a hall of famer
## YES Joe Dumars  should be a hall of famer
## YES Dominique Wilkins  should be a hall of famer
## YES Adrian Dantley  should be a hall of famer
## YES Patrick Ewing  should be a hall of famer
## YES Hakeem Olajuwon  should be a hall of famer
## YES Michael Jordan  should be a hall of famer
## YES David Robinson  should be a hall of famer
## YES John Stockton  should be a hall of famer
## YES Dennis Johnson  should be a hall of famer
## YES Gus Johnson  should be a hall of famer
## YES Karl Malone  should be a hall of famer
## YES Scottie Pippen  should be a hall of famer
## YES Artis Gilmore  should be a hall of famer
## YES Chris Mullin  should be a hall of famer
## YES Dennis Rodman  should be a hall of famer
## NO Arvydas Sabonis  should not be a hall of famer
## YES Mel Daniels  should be a hall of famer
## YES Reggie Miller  should be a hall of famer
## YES Ralph Sampson  should be a hall of famer
## YES Chet Walker  should be a hall of famer
## NO Jamaal Wilkes  should not be a hall of famer
## YES Roger Brown  should be a hall of famer
## YES Richie Guerin  should be a hall of famer
## YES Bernard King  should be a hall of famer
## YES Gary Payton  should be a hall of famer
## NO Sarunas Marciulionis  should not be a hall of famer
## YES Alonzo Mourning  should be a hall of famer
## YES Mitch Richmond  should be a hall of famer
## YES Guy Rodgers  should be a hall of famer
## YES Louie Dampier  should be a hall of famer
## YES Spencer Haywood  should be a hall of famer
## YES Dikembe Mutombo  should be a hall of famer
## YES Jo  should be a hall of famer
## YES Zelmo Beaty  should be a hall of famer
## YES Allen Iverson  should be a hall of famer
## YES Yao Ming  should be a hall of famer
## YES Shaquille O'Neal  should be a hall of famer
## YES George McGinnis  should be a hall of famer
## YES Tracy McGrady  should be a hall of famer
## YES Ray Allen  should be a hall of famer
## YES Maurice Cheeks  should be a hall of famer
## YES Grant Hill  should be a hall of famer
## YES Jason Kidd  should be a hall of famer
## YES Steve Nash  should be a hall of famer
## YES Dino Radja  should be a hall of famer
## YES Charlie Scott  should be a hall of famer
## NO Carl Braun  should not be a hall of famer
## NO Charles “Chuc should not be a hall of famer
## YES Vlade Divac  should be a hall of famer
## NO Bobby Jones  should not be a hall of famer
## NO Sidney Moncrief  should not be a hall of famer
## YES Jack Sikma  should be a hall of famer
## YES Paul Westphal  should be a hall of famer
## YES Kobe Bryant  should be a hall of famer
## YES Tim Duncan  should be a hall of famer
## YES Kevin Garnett  should be a hall of famer
## YES Chris Bosh  should be a hall of famer
## YES Bob Dandridge  should be a hall of famer
## NO Toni Kukoc  should not be a hall of famer
## YES Paul Pierce  should be a hall of famer
## YES Ben Wallace  should be a hall of famer
## YES Chris Webber  should be a hall of famer
## YES Manu Ginobili  should be a hall of famer
## YES Tim Hardaway  should be a hall of famer
## YES Lou Hudson  should be a hall of famer
## YES Pau Gasol  should be a hall of famer
## YES Dirk Nowitzki  should be a hall of famer
## YES Tony Parker  should be a hall of famer
## YES Dwyane Wade  should be a hall of famer
## Accuracy of classifying hall of famers: 86.39456 %
## [1] 86.39456 79.59044

Which of Today’s NBA players will make it to the Hall of Fame?

Let’s run our classifier on today’s players and see who will make it into the Hall of Fame.

First, let’s run our predictive model using a parameter of 1.

# Names of the players in the 2022 dataset
curr_player_names <- c(nba_2022_df$Player)
predicted_hof <- c()

# Check each current player against the model with a threshold of 1 category
for (i in seq_along(curr_player_names)) {
  name <- curr_player_names[i]
  if (hofOrNah(name, 1, recent_players_df)) {
    print(paste("YES based on his career averages so far,", name, "will be a hall of famer"))
    predicted_hof <- c(predicted_hof, name)
  }
}
## [1] "YES based on his career averages so far, Bam Adebayo will be a hall of famer"
## [1] "YES based on his career averages so far, LaMarcus Aldridge will be a hall of famer"
## [1] "YES based on his career averages so far, Jarrett Allen will be a hall of famer"
## [1] "YES based on his career averages so far, Giannis Antetokounmpo will be a hall of famer"
## [1] "YES based on his career averages so far, Carmelo Anthony will be a hall of famer"
## [1] "YES based on his career averages so far, Cole Anthony will be a hall of famer"
## [1] "YES based on his career averages so far, Deandre Ayton will be a hall of famer"
## [1] "YES based on his career averages so far, LaMelo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, Lonzo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, Eric Bledsoe will be a hall of famer"
## [1] "YES based on his career averages so far, Malcolm Brogdon will be a hall of famer"
## [1] "YES based on his career averages so far, Jimmy Butler will be a hall of famer"
## [1] "YES based on his career averages so far, Wendell Carter Jr. will be a hall of famer"
## [1] "YES based on his career averages so far, Darren Collison will be a hall of famer"
## [1] "YES based on his career averages so far, Mike Conley will be a hall of famer"
## [1] "YES based on his career averages so far, DeMarcus Cousins will be a hall of famer"
## [1] "YES based on his career averages so far, Cade Cunningham will be a hall of famer"
## [1] "YES based on his career averages so far, Stephen Curry will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Davis will be a hall of famer"
## [1] "YES based on his career averages so far, DeMar DeRozan will be a hall of famer"
## [1] "YES based on his career averages so far, Andre Drummond will be a hall of famer"
## [1] "YES based on his career averages so far, Kris Dunn will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Durant will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Edwards will be a hall of famer"
## [1] "YES based on his career averages so far, Joel Embiid will be a hall of famer"
## [1] "YES based on his career averages so far, De'Aaron Fox will be a hall of famer"
## [1] "YES based on his career averages so far, Markelle Fultz will be a hall of famer"
## [1] "YES based on his career averages so far, Darius Garland will be a hall of famer"
## [1] "YES based on his career averages so far, Josh Giddey will be a hall of famer"
## [1] "YES based on his career averages so far, Shai Gilgeous-Alexander will be a hall of famer"
## [1] "YES based on his career averages so far, Devonte' Graham will be a hall of famer"
## [1] "YES based on his career averages so far, Draymond Green will be a hall of famer"
## [1] "YES based on his career averages so far, Blake Griffin will be a hall of famer"
## [1] "YES based on his career averages so far, Tyrese Haliburton will be a hall of famer"
## [1] "YES based on his career averages so far, James Harden will be a hall of famer"
## [1] "YES based on his career averages so far, Killian Hayes will be a hall of famer"
## [1] "YES based on his career averages so far, Jrue Holiday will be a hall of famer"
## [1] "YES based on his career averages so far, Al Horford will be a hall of famer"
## [1] "YES based on his career averages so far, Dwight Howard will be a hall of famer"
## [1] "YES based on his career averages so far, Andre Iguodala will be a hall of famer"
## [1] "YES based on his career averages so far, Kyrie Irving will be a hall of famer"
## [1] "YES based on his career averages so far, Reggie Jackson will be a hall of famer"
## [1] "YES based on his career averages so far, LeBron James will be a hall of famer"
## [1] "YES based on his career averages so far, DeAndre Jordan will be a hall of famer"
## [1] "YES based on his career averages so far, Brandon Knight will be a hall of famer"
## [1] "YES based on his career averages so far, Damian Lillard will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Love will be a hall of famer"
## [1] "YES based on his career averages so far, Kyle Lowry will be a hall of famer"
## [1] "YES based on his career averages so far, CJ McCollum will be a hall of famer"
## [1] "YES based on his career averages so far, Davion Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Donovan Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Evan Mobley will be a hall of famer"
## [1] "YES based on his career averages so far, Ja Morant will be a hall of famer"
## [1] "YES based on his career averages so far, Victor Oladipo will be a hall of famer"
## [1] "YES based on his career averages so far, Chris Paul will be a hall of famer"
## [1] "YES based on his career averages so far, Elfrid Payton will be a hall of famer"
## [1] "YES based on his career averages so far, Rajon Rondo will be a hall of famer"
## [1] "YES based on his career averages so far, Derrick Rose will be a hall of famer"
## [1] "YES based on his career averages so far, Ricky Rubio will be a hall of famer"
## [1] "YES based on his career averages so far, Domantas Sabonis will be a hall of famer"
## [1] "YES based on his career averages so far, Dennis Smith Jr. will be a hall of famer"
## [1] "YES based on his career averages so far, Isaiah Stewart will be a hall of famer"
## [1] "YES based on his career averages so far, Jalen Suggs will be a hall of famer"
## [1] "YES based on his career averages so far, Jayson Tatum will be a hall of famer"
## [1] "YES based on his career averages so far, Isaiah Thomas will be a hall of famer"
## [1] "YES based on his career averages so far, Klay Thompson will be a hall of famer"
## [1] "YES based on his career averages so far, Tristan Thompson will be a hall of famer"
## [1] "YES based on his career averages so far, Karl-Anthony Towns will be a hall of famer"
## [1] "YES based on his career averages so far, Kemba Walker will be a hall of famer"
## [1] "YES based on his career averages so far, Russell Westbrook will be a hall of famer"
## [1] "YES based on his career averages so far, Andrew Wiggins will be a hall of famer"
## [1] "YES based on his career averages so far, Trae Young will be a hall of famer"

From these results, it looks like our model classified a LOT of today’s players into the Hall of Fame: 72 in total.

Now let’s try running it with a parameter of 2.

# Names of the players in the 2022 dataset
curr_player_names <- c(nba_2022_df$Player)
predicted_hof <- c()

# Same check as before, now with a stricter threshold of 2 categories
for (i in seq_along(curr_player_names)) {
  name <- curr_player_names[i]
  if (hofOrNah(name, 2, recent_players_df)) {
    print(paste("YES based on his career averages so far,", name, "will be a hall of famer"))
    predicted_hof <- c(predicted_hof, name)
  }
}
## [1] "YES based on his career averages so far, LaMarcus Aldridge will be a hall of famer"
## [1] "YES based on his career averages so far, Giannis Antetokounmpo will be a hall of famer"
## [1] "YES based on his career averages so far, LaMelo Ball will be a hall of famer"
## [1] "YES based on his career averages so far, DeMarcus Cousins will be a hall of famer"
## [1] "YES based on his career averages so far, Stephen Curry will be a hall of famer"
## [1] "YES based on his career averages so far, Anthony Davis will be a hall of famer"
## [1] "YES based on his career averages so far, DeMar DeRozan will be a hall of famer"
## [1] "YES based on his career averages so far, Kevin Durant will be a hall of famer"
## [1] "YES based on his career averages so far, Joel Embiid will be a hall of famer"
## [1] "YES based on his career averages so far, De'Aaron Fox will be a hall of famer"
## [1] "YES based on his career averages so far, Josh Giddey will be a hall of famer"
## [1] "YES based on his career averages so far, Shai Gilgeous-Alexander will be a hall of famer"
## [1] "YES based on his career averages so far, Blake Griffin will be a hall of famer"
## [1] "YES based on his career averages so far, James Harden will be a hall of famer"
## [1] "YES based on his career averages so far, Kyrie Irving will be a hall of famer"
## [1] "YES based on his career averages so far, LeBron James will be a hall of famer"
## [1] "YES based on his career averages so far, Damian Lillard will be a hall of famer"
## [1] "YES based on his career averages so far, Donovan Mitchell will be a hall of famer"
## [1] "YES based on his career averages so far, Ja Morant will be a hall of famer"
## [1] "YES based on his career averages so far, Domantas Sabonis will be a hall of famer"
## [1] "YES based on his career averages so far, Karl-Anthony Towns will be a hall of famer"
## [1] "YES based on his career averages so far, Russell Westbrook will be a hall of famer"
## [1] "YES based on his career averages so far, Trae Young will be a hall of famer"
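To see exactly who drops out when the parameter goes from 1 to 2, the two prediction lists can be compared with `setdiff`. The sketch below assumes the result of each run is saved under its own name (both loops above write to `predicted_hof`, so the first result would otherwise be overwritten). The names `hof_threshold_1` and `hof_threshold_2` are hypothetical, and the vectors hold only a few names copied from the printed output, not the full lists.

```r
# A few names copied from the two printed lists above (not the full output)
hof_threshold_1 <- c("Bam Adebayo", "LaMarcus Aldridge", "Jarrett Allen",
                     "Giannis Antetokounmpo")
hof_threshold_2 <- c("LaMarcus Aldridge", "Giannis Antetokounmpo")

# Players predicted at threshold 1 but no longer predicted at threshold 2
dropped <- setdiff(hof_threshold_1, hof_threshold_2)
print(dropped)
```

Running the same comparison on the full vectors would list every player who passes with a parameter of 1 but not with a parameter of 2.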

Conclusion

Based on the contrasting outputs, it appears that while a parameter of 1 provided the highest accuracy in training, a parameter of 2 yielded more “realistic” outcomes that aligned with our intuition. The first list includes many players who are undoubtedly talented but may not be regarded as Hall of Fame caliber by fans or experts. Players like Kris Dunn, Elfrid Payton, and Brandon Knight, though solid performers, are not commonly seen as belonging in the Hall of Fame. The second list, on the other hand, is made up largely of names that resonate as true superstars and likely Hall of Famers. This disparity raises the question: why is this discrepancy occurring?

One explanation is that our predictive model was built exclusively from retired players, whereas the dataset we are classifying includes relatively young players who are still in their prime. Consequently, their career averages are higher than they will be once they have gone through the declining phase of their careers and retired.
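A small worked example shows how much a decline phase can pull a career average down. The numbers are made up for a hypothetical player, and the sketch assumes the same number of games in every season so that the career average is simply the mean of the season averages.

```r
# Hypothetical player: points per game by season (made-up numbers)
prime_seasons   <- rep(24, 8)  # eight prime seasons at 24 points per game
decline_seasons <- rep(12, 5)  # five later seasons at 12 points per game

# Assumption: equal games per season, so the career average is the mean
# of the season averages.
avg_mid_career    <- mean(prime_seasons)
avg_at_retirement <- mean(c(prime_seasons, decline_seasons))

cat("Average while still in prime:", avg_mid_career, "\n")
cat("Average at retirement:", round(avg_at_retirement, 2), "\n")
```

Here the career average falls from 24 to about 19.38 points per game, so a player who clears a benchmark mid-career could end up below it by retirement.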

Another factor to consider is that the criteria for Hall of Famers have evolved over the past few decades. We observed cases such as Bobby Jones, who may not boast high statistics by today’s NBA standards but was considered exceptional during the era he played in. Hence, to enhance our predictions, it may be beneficial to use averages based on Hall of Famers who competed in the last one to two decades as the benchmark for evaluating current players. As the game has progressed, it might be prudent to give greater weight to data from recent eras rather than relying heavily on information from half a century ago.
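One way this era-based benchmark could look is sketched below. It is only an illustration under stated assumptions: the data frame `hof_toy`, its columns, the helper `era_benchmark`, and the cutoff year 2000 are all hypothetical, and the values are made up rather than taken from the project’s data.

```r
# Toy Hall of Famer career averages (made-up values); `last_season` is the
# final season each player appeared in.
hof_toy <- data.frame(
  player      = c("A", "B", "C", "D"),
  last_season = c(1962, 1975, 2010, 2016),
  pts         = c(14, 18, 22, 20),
  reb         = c(9, 8, 6, 7),
  ast         = c(3, 4, 6, 5)
)

# Benchmark = mean career averages of the Hall of Famers whose careers
# ended in or after the year `since`.
era_benchmark <- function(df, since) {
  recent <- df[df$last_season >= since, ]
  colMeans(recent[, c("pts", "reb", "ast")])
}

all_time <- era_benchmark(hof_toy, since = -Inf)  # every Hall of Famer
recent   <- era_benchmark(hof_toy, since = 2000)  # recent eras only
print(rbind(all_time, recent))
```

With these toy values, the recent-era benchmark is higher for points and assists and lower for rebounds than the all-time benchmark, so the same current player could pass one and fail the other.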

Overall, my model performed as intended. I have numerous ideas to enhance its effectiveness, such as incorporating additional categories, weighting eras differently, and possibly developing separate models based on player positions. Furthermore, I anticipate more accurate results when evaluating players who have concluded their careers and are awaiting induction into the Hall of Fame.