knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(readxl)
library(tidyverse)
library(stringr)
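# Read every sheet of an Excel workbook and stack the sheets into one data frame.
load_all_sheets <- function(file_name) {
  sheetstotal <- readxl::excel_sheets(path = file_name)
  indsheets <- lapply(sheetstotal, function(sh) {
    df <- readxl::read_xlsx(path = file_name, sheet = sh)
    # Some sheets may lack an Awards column; add an empty one so columns line up.
    if (!"Awards" %in% names(df)) df$Awards <- NA_character_
    # Record which sheet each row came from (the sheet name doubles as the team).
    df$Team <- sh
    df$Sheet_name <- sh
    # 1 if any award is listed for the player, 0 otherwise.
    df$Won_Award <- ifelse(is.na(df$Awards) | trimws(df$Awards) == "", 0, 1)
    # Composite metrics: PRA = points + rebounds + assists; STOCKS = steals + blocks.
    df$PRA <- df$PTS + df$TRB + df$AST
    df$STOCKS <- df$STL + df$BLK
    return(df)
  })
  out <- dplyr::bind_rows(indsheets)
  return(out)
}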
NBAdata <- load_all_sheets("NBA.xlsx")
# Team-to-conference lookup; team names are lower-cased and trimmed so the join
# is insensitive to case and stray whitespace.
conference_lookup <- read_xlsx("Team Conferences.xlsx") %>%
  mutate(Team_clean = str_trim(str_to_lower(Team)))
NBAdata <- NBAdata %>%
  mutate(Sheet_clean = str_trim(str_to_lower(Sheet_name))) %>%
  left_join(conference_lookup, by = c("Sheet_clean" = "Team_clean")) %>%
  mutate(
    # Numeric coding used in the correlations below: East = 1, West = 0.
    Conference_binary = ifelse(Conference == "East", 1,
                               ifelse(Conference == "West", 0, NA))
  ) %>%
  relocate(Conference, Conference_binary, .after = Sheet_name) %>%
  dplyr::select(-Sheet_clean)
ggplot(NBAdata, aes(x = STOCKS, y = PRA, color = Conference)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title = "Relationship Between PRA and STOCKS by Conference",
    x = "STOCKS (Steals + Blocks)",
    y = "PRA (Points + Rebounds + Assists)",
    color = "Conference"
  ) +
  theme_minimal()
ggplot(NBAdata, aes(x = Age, y = `3P%`, color = Conference)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(
    title = "Relationship Between Age and 3 Point Percentage",
    x = "Age",
    y = "3P%",
    color = "Conference"
  ) +
  theme_minimal()
In the first plot, there appears to be a positive association between offensive output (PRA) and defensive output (STOCKS). Though this may reflect a true association between the two skills, it may also arise simply because players accumulate both more STOCKS and more PRA when they play more games. These metrics are absolute totals, not rates, so playing time must be taken into account when interpreting the relationship.
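One way to take playing time into account is to convert totals into rates. The sketch below is illustrative only: it assumes that `MP` holds total minutes played and that `PRA` and `STOCKS` are totals over the same period. The helper `per36()` and the toy data are hypothetical and are not part of the analysis above.

```r
# Hypothetical helper: rescale a counting stat to a per-36-minute rate.
# Players with zero minutes get NA rather than a division by zero.
per36 <- function(total, minutes) {
  ifelse(minutes > 0, total / minutes * 36, NA_real_)
}

# Toy data (made up): a starter, a bench player, and a player who never played.
toy <- data.frame(
  PRA    = c(1800, 90, 0),
  STOCKS = c(150, 6, 0),
  MP     = c(2400, 120, 0)
)

toy$PRA_per36    <- per36(toy$PRA, toy$MP)
toy$STOCKS_per36 <- per36(toy$STOCKS, toy$MP)
toy
```

In this toy example the starter and the bench player have very different totals but the same PRA per 36 minutes, which is the kind of distinction a plot of raw totals cannot show.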
In the second plot, there does not appear to be a strong association between age and 3-point percentage. In fact, 3-point percentage seems to be almost independent of age. It is important to note, however, that these percentages do not account for the number of shots attempted. It is almost certain that few, if any, of the players with 3-point percentages over 50% attempted a meaningful number of 3-point shots; among players who meet standard minimum-attempt criteria, only a small handful have ever achieved this.
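A minimum-attempts filter is one way to keep small-sample percentages from distorting the picture. The sketch below assumes a column of 3-point attempts named `3PA`, which does not appear in the code above, and uses a hypothetical cutoff of 100 attempts; both the cutoff and the toy data are made up for illustration.

```r
# Toy data (made up). check.names = FALSE keeps the non-syntactic column names.
toy <- data.frame(
  Age   = c(22, 27, 31, 35),
  `3PA` = c(3, 410, 250, 12),
  `3P%` = c(0.667, 0.389, 0.352, 0.583),
  check.names = FALSE
)

# Hypothetical cutoff: keep only players with enough attempts to be meaningful.
min_attempts <- 100
qualified <- toy[toy[["3PA"]] >= min_attempts, ]
qualified
```

Here both players above 50% drop out once the filter is applied, because each attempted only a handful of shots.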
cor_test_pra <- cor.test(NBAdata$Conference_binary, NBAdata$PRA, method = "pearson")
cor_test_pra
##
## Pearson's product-moment correlation
##
## data: NBAdata$Conference_binary and NBAdata$PRA
## t = -1.8195, df = 650, p-value = 0.0693
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.147164250 0.005629906
## sample estimates:
## cor
## -0.07118475
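There is a slight negative correlation between conference (coded East = 1, West = 0) and PRA (r = -.07), meaning that Eastern Conference players have marginally lower PRA on average. This correlation, though, is not statistically significant at the .05 level (p = .069).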
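Because the conference variable is binary, this Pearson correlation is a point-biserial correlation, and its significance test is equivalent to a pooled-variance two-sample t-test comparing the two conferences. The sketch below checks that equivalence on simulated data; the variable names and simulated values are assumptions for illustration, not the NBA data.

```r
set.seed(1)

# Simulated stand-ins: a 0/1 conference indicator and a continuous outcome.
east <- rep(c(1, 0), each = 50)
pra  <- rnorm(100, mean = 600 - 30 * east, sd = 300)

# Pearson correlation with a binary variable (point-biserial correlation).
r_test <- cor.test(east, pra, method = "pearson")

# Two-sample t-test with pooled variance on the same data.
t_test <- t.test(pra ~ east, var.equal = TRUE)

# The p-values agree, and the t statistics agree up to sign.
c(cor_p = r_test$p.value, t_p = t_test$p.value)
c(cor_t = unname(r_test$statistic), t_t = unname(t_test$statistic))
```

The sign differs only because `t.test()` reports the group coded 0 minus the group coded 1, while the correlation is signed relative to the group coded 1.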
cor_test_stocks <- cor.test(NBAdata$Conference_binary, NBAdata$STOCKS, method = "pearson")
cor_test_stocks
##
## Pearson's product-moment correlation
##
## data: NBAdata$Conference_binary and NBAdata$STOCKS
## t = -2.094, df = 650, p-value = 0.03665
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.157650363 -0.005105577
## sample estimates:
## cor
## -0.08185737
There is also a small negative correlation between conference and STOCKS (r = -.08). Unlike the PRA correlation, this one is statistically significant (p = .04): on average, Western Conference players record more STOCKS than Eastern Conference players. This may be due to different approaches to strategy between the two conferences.
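Statistical significance says little about how large an effect is. A quick way to gauge size is to square each correlation, which gives the proportion of variance shared with conference. The sketch below simply reuses the two estimates printed in the output above.

```r
# Correlation estimates copied from the cor.test() output above.
r_pra    <- -0.07118475
r_stocks <- -0.08185737

# Squared correlations: proportion of variance shared with conference.
r_squared <- c(PRA = r_pra^2, STOCKS = r_stocks^2)

# Expressed as percentages, rounded to two decimals.
round(100 * r_squared, 2)
# PRA: 0.51, STOCKS: 0.67
```

In both cases conference accounts for less than 1% of the variance, so even the significant STOCKS result describes a very small difference.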
corr_data <- NBAdata %>%
  dplyr::select(Age, PRA, STOCKS)
corr_matrix <- cor(corr_data, use = "complete.obs", method = "pearson")
corr_matrix
##               Age       PRA     STOCKS
## Age    1.00000000 0.1238926 0.07734898
## PRA    0.12389260 1.0000000 0.84021798
## STOCKS 0.07734898 0.8402180 1.00000000
library(ggcorrplot)
ggcorrplot(corr_matrix,
           lab = TRUE,
           lab_size = 4,
           method = "circle",
           type = "lower",
           colors = c("white", "yellow", "red"),
           title = "Correlation Matrix: Age, PRA, and STOCKS",
           ggtheme = ggplot2::theme_minimal)
Within the correlation matrix, there are weak associations between PRA and age (r = .12) and between STOCKS and age (r = .08). There is, however, a strong association between PRA and STOCKS (r = .84), for which there are at least two viable explanations. It may be that strong offensive players tend to be strong defensive players (and vice versa). It may, however, be that players who are stronger in either category tend to play more and thereby accumulate higher totals in both. It is necessary to control for time played to distinguish between these explanations.
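The second explanation can be illustrated with a small simulation. In the sketch below, per-minute offensive and defensive rates are generated independently, so the underlying skills are unrelated by construction, yet the resulting totals are still strongly correlated because both scale with minutes. All names and parameter values are made up for illustration.

```r
set.seed(42)
n <- 500

# Simulated playing time, spread widely across players.
minutes <- runif(n, min = 50, max = 3000)

# Per-minute rates drawn independently: skill in one area says nothing
# about skill in the other.
off_rate <- rnorm(n, mean = 0.60, sd = 0.15)
def_rate <- rnorm(n, mean = 0.05, sd = 0.015)

# Totals accumulate with minutes played.
pra    <- off_rate * minutes
stocks <- def_rate * minutes

cor(off_rate, def_rate)  # close to zero: the rates are unrelated
cor(pra, stocks)         # large: shared playing time links the totals
```

A strong correlation between totals is therefore not, on its own, evidence that the underlying skills are related.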
library(ppcor)
partial_result <- pcor.test(NBAdata$PRA, NBAdata$STOCKS, NBAdata$MP)
print(partial_result)
##     estimate    p.value statistic   n gp  Method
## 1 0.07911891 0.04359333  2.021931 652  1 pearson
Holding minutes played constant, there is still a slight positive correlation between PRA and STOCKS (r = .08), and the correlation is still significant (p = .04). However, the drop from r = .84 to r = .08 suggests that PRA and STOCKS are associated with one another primarily through the number of minutes played.
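For readers unfamiliar with partial correlation, the estimate can be understood as the correlation between what is left of each variable after minutes played has been regressed out. The sketch below uses simulated data and base R only; the variable names and parameter values are assumptions for illustration. It shows that the residual-based value matches the closed-form partial correlation formula, which is the quantity the `estimate` column of `pcor.test()` should correspond to.

```r
set.seed(7)
n <- 300

# Simulated data in which both totals depend on minutes played.
mp     <- runif(n, min = 100, max = 3000)
pra    <- 0.60 * mp + rnorm(n, sd = 200)
stocks <- 0.05 * mp + rnorm(n, sd = 20)

# Approach 1: correlate the residuals after regressing each total on minutes.
res_pra       <- resid(lm(pra ~ mp))
res_stocks    <- resid(lm(stocks ~ mp))
partial_resid <- cor(res_pra, res_stocks)

# Approach 2: closed-form first-order partial correlation.
r_xy <- cor(pra, stocks)
r_xz <- cor(pra, mp)
r_yz <- cor(stocks, mp)
partial_formula <- (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

# Raw correlation versus the two (identical) partial correlations.
c(raw = r_xy, residual = partial_resid, formula = partial_formula)
```

In the simulation, as in the NBA data, the raw correlation is large while the partial correlation is small, because minutes played drives most of the shared variation.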
Our statistical findings suggest that players in the Western Conference are slightly better defensively, though the difference may be negligible for practical purposes. Our findings also suggest that offensive and defensive output are associated with one another, but that this is largely due to minutes played rather than to any clear association between the skills themselves. When minutes played are controlled for, the association remains; however, it is quite weak. Further analyses might probe whether there are specific metrics (beyond overall offensive and defensive ability) that differ between the two conferences; this would allow the analysis to home in on the specific factors behind the gap in defensive output.
Moreover, further analyses might look at which specific skills or metrics improve with age and games played and which decline. These analyses might also differentiate the patterns for elite vs. non-elite players. For instance, some skills (such as 3-point percentage) appear to remain steady with age. It is unclear whether this would be the case for other specific skills as well. A more thorough knowledge of these patterns would allow coaches and recruiters to better allocate financial resources; after all, a player whose particular strengths increase with age might, ultimately, be a better financial investment than a player whose particular strengths have already peaked.