This analysis explores NBA player statistics, categorizing scoring and assist levels, visualizing the data using boxplots and violin plots, and calculating correlation and confidence intervals. The goal is to derive meaningful insights into player performance.
nba_data <- read.csv("C:/Statistics/nba.csv")
glimpse(nba_data)
## Rows: 1,703
## Columns: 19
## $ bbrID <chr> "abdelal01", "abdulma02", "abdulta01", "abdursh01…
## $ Date <chr> "1993-03-16", "1991-04-02", "1998-04-19", "2001-1…
## $ Tm <chr> "BOS", "DEN", "SAC", "ATL", "OKC", "MIA", "ORL", …
## $ Opp <chr> "GSW", "DAL", "VAN", "DET", "CHO", "PHI", "WSB", …
## $ TRB <int> 10, 2, 2, 12, 2, 13, 10, 14, 2, 10, 4, 5, 10, 2, …
## $ AST <int> 2, 6, 3, 5, 0, 3, 1, 1, 8, 3, 3, 9, 2, 2, 0, 2, 1…
## $ STL <int> 0, 4, 1, 2, 0, 0, 0, 1, 5, 1, 4, 1, 1, 0, 1, 1, 2…
## $ BLK <int> 0, 0, 0, 1, 0, 1, 0, 0, 0, 3, 0, 0, 3, 1, 1, 2, 1…
## $ PTS <int> 25, 30, 31, 50, 25, 17, 18, 19, 31, 17, 22, 41, 2…
## $ GmSc <dbl> 22.7, 29.7, 26.4, 46.0, 17.1, 16.9, 19.2, 20.7, 3…
## $ Season <chr> "1992-93", "1990-91", "1997-98", "2001-02", "2018…
## $ Playoffs <chr> "false", "false", "false", "false", "false", "fal…
## $ Year <int> 1993, 1991, 1998, 2002, 2019, 2021, 1990, 2015, 1…
## $ GameIndex <int> 181, 64, 58, 386, 160, 8, 236, 124, 100, 4, 4, 25…
## $ GmScMovingZ <dbl> 4.13, 3.82, 4.11, 4.06, 3.37, 2.58, 4.27, 4.15, 3…
## $ GmScMovingZTop2Delta <dbl> 0.24, 0.64, 1.67, 0.84, 0.18, 0.05, 0.02, 0.93, 0…
## $ Date2 <chr> "1991-12-04", "1995-12-07", "1998-01-14", "2003-1…
## $ GmSc2 <dbl> 18.6, 40.1, 16.9, 34.3, 16.6, 16.8, 19.6, 18.5, 4…
## $ GmScMovingZ2 <dbl> 3.89, 3.18, 2.44, 3.22, 3.19, 2.53, 4.25, 3.22, 2…
Insight: This allows us to understand the structure of the dataset, identifying key variables for analysis.
nba_data <- nba_data %>%
mutate(
Scoring_Level = case_when(
PTS < 10 ~ "Low",
PTS >= 10 & PTS < 20 ~ "Medium",
PTS >= 20 ~ "High"
),
Assist_Level = case_when(
AST < 3 ~ "Low",
AST >= 3 & AST < 7 ~ "Medium",
AST >= 7 ~ "High"
)
)
nba_data$Scoring_Level <- factor(nba_data$Scoring_Level, levels = c("Low", "Medium", "High"), ordered = TRUE)
nba_data$Assist_Level <- factor(nba_data$Assist_Level, levels = c("Low", "Medium", "High"), ordered = TRUE)
Insight: Categorizing players based on their scoring and assist levels helps in grouping them into meaningful performance brackets.
ggplot(nba_data, aes(x = Scoring_Level, y = PTS, fill = Scoring_Level)) +
geom_boxplot() +
labs(title = "Boxplot of Points Scored by Scoring Level", x = "Scoring Level", y = "Points Scored") +
theme_minimal()
Insight: This visualization shows how points are
distributed within each scoring category, helping us compare
variability.
ggplot(nba_data, aes(x = Assist_Level, y = AST, fill = Assist_Level)) +
geom_violin() +
labs(title = "Violin Plot of Assists by Assist Level", x = "Assist Level", y = "Assists") +
theme_minimal()
Insight: The violin plot highlights the spread of
assist numbers within each level, showing potential outliers and
distribution density.
nba_data$Scoring_Level_Num <- as.numeric(nba_data$Scoring_Level)
nba_data$Assist_Level_Num <- as.numeric(nba_data$Assist_Level)
corr_pts <- cor(nba_data$PTS, nba_data$Scoring_Level_Num, method = "pearson")
corr_ast <- cor(nba_data$AST, nba_data$Assist_Level_Num, method = "pearson")
corr_results <- data.frame(Variable = c("PTS & Scoring Level", "AST & Assist Level"),
Correlation = c(corr_pts, corr_ast))
print(corr_results)
## Variable Correlation
## 1 PTS & Scoring Level 0.6615882
## 2 AST & Assist Level 0.8892969
Insight: The correlation between PTS and Scoring Level is expected to be strong since scoring level is derived from PTS. The correlation between AST and Assist Level is moderate due to variance within categories.
confidence_interval <- function(data, confidence = 0.95) {
mean_val <- mean(data, na.rm = TRUE)
std_err <- sd(data, na.rm = TRUE) / sqrt(length(data))
margin_of_error <- qt((1 + confidence) / 2, df = length(data) - 1) * std_err
return(c(mean_val - margin_of_error, mean_val + margin_of_error))
}
ci_pts <- confidence_interval(nba_data$PTS)
ci_ast <- confidence_interval(nba_data$AST)
ci_results <- data.frame(
Variable = c("PTS (Points Scored)", "AST (Assists)"),
CI_Lower = c(ci_pts[1], ci_ast[1]),
CI_Upper = c(ci_pts[2], ci_ast[2])
)
print(ci_results)
## Variable CI_Lower CI_Upper
## 1 PTS (Points Scored) 25.571603 26.551708
## 2 AST (Assists) 3.584751 3.896165
Insight: Confidence intervals provide an estimate of the range within which the true mean points and assists lie, helping in assessing variability in player performance.
This analysis explored scoring and assist trends among NBA players. The categorization, visualizations, and statistical analyses provided insights into performance levels. Further questions include: - How do other statistics (e.g., rebounds) correlate with performance levels? - Do different teams exhibit different patterns in scoring and assists?
This calls for further investigations into team-based and position-based performance trends.