knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(readr)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
library(stats)
# **Load Dataset**
df <- read.csv("C:/Statistics/nba.csv") # Update with your file path
# Convert categorical variables to factors
df$Tm <- as.factor(df$Tm)
df$Season <- as.factor(df$Season)
# **ANOVA Test: Points Per Game by Team**
## **Define Hypotheses**
- **H₀**: The mean points per game (PTS) is the same for every team (Tm).
- **H₁**: At least one team's mean points per game differs from the others.
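To make these hypotheses concrete, the sketch below computes the one-way ANOVA F statistic by hand on simulated data and checks it against `aov()`. The data frame `sim`, its three team labels, the group means, and the seed are illustrative assumptions, not values from the NBA dataset.

``` r
# Simulated stand-in data (hypothetical): 3 teams, 30 observations each
set.seed(1)
sim <- data.frame(
  Tm  = factor(rep(c("A", "B", "C"), each = 30)),
  PTS = rnorm(90, mean = rep(c(20, 22, 24), each = 30), sd = 5)
)

k <- nlevels(sim$Tm)   # number of groups
n <- nrow(sim)         # total number of observations

grand_mean  <- mean(sim$PTS)
group_means <- tapply(sim$PTS, sim$Tm, mean)
group_sizes <- tapply(sim$PTS, sim$Tm, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((sim$PTS - group_means[as.character(sim$Tm)])^2)

# F = mean square between / mean square within
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

# The same F statistic as reported by aov()
f_aov <- summary(aov(PTS ~ Tm, data = sim))[[1]][["F value"]][1]

c(manual = f_manual, aov = f_aov, p = p_manual)
```

A large F means the team means vary more than the within-team spread would explain by chance, which is evidence against H₀.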
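## **Check Category Count**
# The team column is named Tm (there is no Team column in df)
num_teams <- length(unique(df$Tm))
num_teams
# If there are more than 10 teams, keep the 10 highest-scoring teams and group the rest as "Other".
# The grouping is stored in a new column, Tm_grouped, so Tm stays intact for the ANOVA.
if (num_teams > 10) {
  top_teams <- df %>%
    group_by(Tm) %>%
    summarise(avg_pts = mean(PTS, na.rm = TRUE)) %>%
    slice_max(avg_pts, n = 10) %>%
    pull(Tm) %>%
    as.character()
  # Compare as character: ifelse() on a factor would return integer codes
  df <- df %>%
    mutate(Tm_grouped = factor(ifelse(as.character(Tm) %in% top_teams, as.character(Tm), "Other")))
}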
anova_model <- aov(PTS ~ Tm, data = df)
summary(anova_model)
##               Df Sum Sq Mean Sq F value Pr(>F)
## Tm            37   5362   144.9   1.374 0.0672 .
## Residuals   1665 175583   105.5
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
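The ANOVA table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the mean squares, F statistic, p-value, and an eta-squared effect size from the sums of squares printed above. Those inputs are rounded as printed, so the results are approximate, and the variable names are illustrative.

``` r
# Values copied from the ANOVA table (rounded as printed)
ss_tm  <- 5362;   df_tm  <- 37
ss_res <- 175583; df_res <- 1665

ms_tm  <- ss_tm / df_tm      # about 144.9
ms_res <- ss_res / df_res    # about 105.5
f_stat <- ms_tm / ms_res     # about 1.374

# Upper-tail probability of the F distribution
p_value <- pf(f_stat, df_tm, df_res, lower.tail = FALSE)

# Eta squared: share of the total variation in PTS attributable to team
eta_sq <- ss_tm / (ss_tm + ss_res)   # about 0.03

round(c(F = f_stat, p = p_value, eta_sq = eta_sq), 4)
```

An eta squared of roughly 0.03 means team membership accounts for about 3% of the variation in PTS, which is consistent with an F test that is not significant at the 5% level.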
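# Plot a random sample of 40 rows (drawn with replacement); the ANOVA above uses the full dataset
set.seed(42) # arbitrary seed so the sample, and therefore the plot, is reproducible
df_sample <- df[sample(nrow(df), 40, replace = TRUE), ]
ggplot(df_sample, aes(x = Tm, y = PTS, fill = Tm)) +
  geom_boxplot() +
  theme_minimal() +
  theme(legend.position = "none") + # Removes the legend
  labs(title = "ANOVA: Points Per Game by Tm",
       x = "Tm",
       y = "Points Per Game (PTS)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotates x-axis labels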
We hypothesize that assists per game (AST) influence points per game (PTS), assuming a roughly linear relationship.
df_sample <- df[sample(nrow(df), 40, replace = TRUE), ]
cor(df$PTS, df$AST, use = "complete.obs")
## [1] 0.1649656
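In a simple linear regression with one predictor, R-squared equals the squared Pearson correlation. The sketch below squares the correlation printed above and then demonstrates the identity on simulated data. The data frame `sim`, its coefficients, and the seed are assumptions for illustration only. `cor.test()` is shown as one way to attach a confidence interval and p-value to a correlation.

``` r
# Correlation reported above, squared
r <- 0.1649656
r^2   # about 0.0272: the share of variance in PTS linearly associated with AST

# Demonstrate R^2 = r^2 on simulated data (hypothetical values)
set.seed(2)
sim <- data.frame(AST = runif(200, 0, 10))
sim$PTS <- 20 + 0.5 * sim$AST + rnorm(200, sd = 8)

r_sim  <- cor(sim$PTS, sim$AST)
r2_sim <- summary(lm(PTS ~ AST, data = sim))$r.squared
all.equal(r_sim^2, r2_sim)

# Test whether the correlation differs from zero
ct <- cor.test(sim$PTS, sim$AST)
ct$estimate
ct$conf.int
ct$p.value
```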
# Scatterplot
ggplot(df_sample, aes(x = AST, y = PTS)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", col = "red") +
  theme_minimal() +
  labs(title = "Scatterplot: Assists vs. Points Per Game",
       x = "Assists Per Game (AST)",
       y = "Points Per Game (PTS)")
## `geom_smooth()` using formula = 'y ~ x'
## **Run Linear Regression**
``` r
lm_model <- lm(PTS ~ AST, data = df)
summary(lm_model)
```

```
##
## Call:
## lm(formula = PTS ~ AST, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -25.831  -7.158  -1.677   5.823  55.842
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.11963    0.37419  64.458  < 2e-16 ***
## AST          0.51919    0.07526   6.898  7.4e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.17 on 1701 degrees of freedom
## Multiple R-squared:  0.02721, Adjusted R-squared:  0.02664
## F-statistic: 47.59 on 1 and 1701 DF,  p-value: 7.401e-12
```
## **Interpretation**
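- **Coefficient Interpretation:** The AST coefficient is 0.519 (standard error 0.075), so each additional assist per game is associated with about 0.52 more points per game on average. The intercept, 24.12, is the predicted PTS when AST is 0.
- **R-squared Value:** R² is 0.027, so AST explains only about 2.7% of the variation in PTS. This agrees with the weak correlation of 0.165 between the two variables.
- **Significance:** The slope is statistically significant (t = 6.898, p = 7.4e-12), but the effect is small. Assists alone are a weak predictor of points, and an association is not evidence that assists cause points, so these results give only limited support for prioritizing playmaking to boost scoring.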
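As a worked example of the coefficient interpretation, the sketch below uses the estimates printed in the regression summary to build an approximate 95% confidence interval for the AST slope and to compute fitted PTS values. `predict_pts` is a hypothetical helper, and the AST values passed to it are arbitrary. With the fitted model available, `confint(lm_model)` and `predict(lm_model, newdata = ...)` give the same quantities without rounding.

``` r
# Estimates copied from summary(lm_model)
b0 <- 24.11963
b1 <- 0.51919
se_b1 <- 0.07526
df_resid <- 1701

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit <- qt(0.975, df_resid)
ci_b1 <- b1 + c(-1, 1) * t_crit * se_b1
round(ci_b1, 3)   # about 0.372 to 0.667

# Hypothetical helper: fitted PTS for a given AST
predict_pts <- function(ast) b0 + b1 * ast
predict_pts(c(0, 5, 10))
```

The interval excludes zero, which matches the significant t test, but even its upper end implies well under one extra point per additional assist.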
### **Final Insights**
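- **ANOVA:** Differences in mean PTS across teams are not significant at the 5% level (F = 1.374 on 37 and 1665 degrees of freedom, p = 0.0672), so we fail to reject H₀. The result is only marginally significant at the 10% level and does not justify team-specific conclusions on its own.
- **Regression:** Assists are a statistically significant but weak predictor of points (slope 0.519, R² = 0.027). Assist-heavy playstyles may be associated with slightly higher scoring, but most of the variation in PTS is left unexplained.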
- **Future Investigation:** Explore additional variables (e.g., turnovers, shooting percentage) for deeper analysis.
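One way to follow up is to add predictors and compare the larger model with the assists-only model. The sketch below does this on simulated data. The column names `TOV` and `FG_pct`, the effect sizes, and the seed are hypothetical placeholders, since the actual column names in the NBA file are not shown here.

``` r
# Simulated stand-in data; TOV and FG_pct are hypothetical column names
set.seed(3)
n <- 300
sim <- data.frame(
  AST    = runif(n, 0, 10),
  TOV    = runif(n, 0, 5),
  FG_pct = runif(n, 0.35, 0.60)
)
sim$PTS <- 5 + 0.5 * sim$AST + 1.0 * sim$TOV + 30 * sim$FG_pct + rnorm(n, sd = 4)

# Simple vs. multiple regression
fit_simple   <- lm(PTS ~ AST, data = sim)
fit_multiple <- lm(PTS ~ AST + TOV + FG_pct, data = sim)

# Compare explained variance, adjusted for the number of predictors
c(simple   = summary(fit_simple)$adj.r.squared,
  multiple = summary(fit_multiple)$adj.r.squared)

# F test for whether the extra predictors improve the fit
anova(fit_simple, fit_multiple)
```

Adjusted R-squared is used for the comparison because plain R-squared never decreases when predictors are added.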