knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readr)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
library(stats)

# **Load Dataset**

# Load the dataset
df <- read.csv("C:/Statistics/nba.csv")  # Update with your file path

# Convert categorical variables
df$Tm <- as.factor(df$Tm)
df$Season <- as.factor(df$Season)


# **ANOVA Test: Points Per Game by Team**

## **Define Hypotheses**
- **H₀**: Mean points per game (PTS) is the same across all teams.
- **H₁**: At least one team's mean PTS differs from the others.

## **Check Category Count**
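# The team column is `Tm` (converted to a factor above); `df$Team` does not
# exist, which is why the original count came back as 0.
num_teams <- nlevels(df$Tm)
num_teams
# Consolidate if more than 10 teams. The grouping goes into a new column,
# Tm_grouped, so the ANOVA below still runs on the original Tm levels.
if (num_teams > 10) {
  top_teams <- df %>%
    group_by(Tm) %>%
    summarise(avg_pts = mean(PTS, na.rm = TRUE)) %>%
    slice_max(avg_pts, n = 10) %>%
    pull(Tm)
  # as.character() keeps team names; ifelse() on a factor would return integer codes
  df <- df %>%
    mutate(Tm_grouped = factor(ifelse(Tm %in% top_teams, as.character(Tm), "Other")))
}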

## **Run ANOVA Test**

anova_model <- aov(PTS ~ Tm, data = df)
summary(anova_model)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## Tm            37   5362   144.9   1.374 0.0672 .
## Residuals   1665 175583   105.5                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
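
The F-test above assumes roughly normal residuals and similar variance across teams. A minimal sketch of how those assumptions could be checked with base R follows. The data frame `toy` and its three team labels are simulated stand-ins (hypothetical), not the NBA file; the same calls would apply to `anova_model` and `df`.

```r
# Simulated stand-in for the NBA data (hypothetical values, three made-up teams)
set.seed(1)
toy <- data.frame(
  Tm  = factor(rep(c("AAA", "BBB", "CCC"), each = 30)),
  PTS = rnorm(90, mean = 25, sd = 10)
)

toy_model <- aov(PTS ~ Tm, data = toy)

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests non-normality)
normality <- shapiro.test(residuals(toy_model))

# Homogeneity of variance across groups: Bartlett's test
variance <- bartlett.test(PTS ~ Tm, data = toy)

normality$p.value
variance$p.value
```

Bartlett's test is itself sensitive to non-normality; `car::leveneTest()`, from the `car` package loaded above, is a commonly used, more robust alternative.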

## **Boxplot for Visualization**

# Plot the full data set so the boxes reflect the same observations as the ANOVA
# (a 40-row sample drawn with replacement leaves about one point per team)
ggplot(df, aes(x = Tm, y = PTS, fill = Tm)) +
  geom_boxplot() +
  theme_minimal() +
  theme(legend.position = "none",                             # removes the legend
        axis.text.x = element_text(angle = 45, hjust = 1)) +  # rotates x-axis labels
  labs(title = "Points Per Game by Team",
       x = "Team (Tm)",
       y = "Points Per Game (PTS)")

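## **Interpretation**

- **Decision rule:** If p < 0.05, reject H₀; if p ≥ 0.05, fail to reject H₀.
- **Result:** F(37, 1665) = 1.374, p = 0.0672. Since p ≥ 0.05, we fail to reject H₀: the data do not provide sufficient evidence at the 5% level that mean PTS differs across teams.
- **Significance:** The result is only marginal (p < 0.10), so team membership alone is at best a weak explanation of scoring differences in this data set and gives little basis for conclusions about player performance or team strategy.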
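
The ANOVA table is enough to recompute the F statistic and to add an effect size, which `summary()` does not print. The sketch below uses only the numbers shown in the ANOVA output above. The sums of squares are rounded there, so results match to about three significant digits. Eta squared is a standard effect-size measure added here for illustration; it is not something the original analysis reported.

```r
# Values copied from the ANOVA table above
ss_team  <- 5362;   df_team  <- 37
ss_resid <- 175583; df_resid <- 1665

# F = (between-team mean square) / (residual mean square)
f_stat <- (ss_team / df_team) / (ss_resid / df_resid)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_stat, df_team, df_resid, lower.tail = FALSE)

# Eta squared: share of total variation in PTS attributable to team
eta_sq <- ss_team / (ss_team + ss_resid)

round(c(F = f_stat, p = p_value, eta_sq = eta_sq), 4)
```

Eta squared comes out near 0.03, meaning team accounts for roughly 3% of the variation in PTS.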

# **Linear Regression: Points Per Game vs. Assists Per Game**

## **Define Relationship**

We hypothesize that assists per game (AST) influence points per game (PTS), assuming a roughly linear relationship.

## **Check Correlation & Linearity**

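# Random 40-row subsample for a readable scatterplot, drawn without replacement;
# the correlation below still uses the full data set
set.seed(123)  # arbitrary seed so the subsample is reproducible
df_sample <- df[sample(nrow(df), 40), ]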
cor(df$PTS, df$AST, use = "complete.obs")
## [1] 0.1649656
# Scatterplot
ggplot(df_sample, aes(x = AST, y = PTS)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", col = "red") +
  theme_minimal() +
  labs(title = "Scatterplot: Assists vs. Points Per Game",
       x = "Assists Per Game (AST)",
       y = "Points Per Game (PTS)")
## `geom_smooth()` using formula = 'y ~ x'

## **Run Linear Regression**

``` r
lm_model <- lm(PTS ~ AST, data = df)
summary(lm_model)
## 
## Call:
## lm(formula = PTS ~ AST, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.831  -7.158  -1.677   5.823  55.842 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.11963    0.37419  64.458  < 2e-16 ***
## AST          0.51919    0.07526   6.898  7.4e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.17 on 1701 degrees of freedom
## Multiple R-squared:  0.02721,    Adjusted R-squared:  0.02664 
## F-statistic: 47.59 on 1 and 1701 DF,  p-value: 7.401e-12
```

## **Interpretation**
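- **Coefficient Interpretation:** The AST coefficient is 0.519 (SE 0.075, p = 7.4e-12): each additional assist per game is associated with about 0.52 more points per game on average. The intercept, 24.12, is the predicted PTS when AST = 0.
- **R-squared Value:** R² = 0.027, so AST explains only about 2.7% of the variation in PTS, consistent with the weak correlation (r ≈ 0.165).
- **Significance:** The relationship is statistically significant but weak. Assists alone are a poor predictor of points, so this model by itself does not justify prioritizing playmaking strategies to boost scoring.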
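
To make the coefficient concrete, the fitted line can be evaluated by hand from the estimates printed in the regression summary. The sketch below uses only those reported numbers; the value of 5 assists per game is an arbitrary illustration, not a figure from the data.

```r
# Estimates copied from the regression summary above
intercept <- 24.11963
slope     <- 0.51919
slope_se  <- 0.07526
resid_df  <- 1701

# Predicted PTS for a hypothetical player averaging 5 assists per game
pts_hat <- intercept + slope * 5

# 95% confidence interval for the slope: estimate +/- t quantile * standard error
t_crit   <- qt(0.975, df = resid_df)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

# For simple regression, R-squared equals the squared correlation
r_squared <- 0.1649656^2

pts_hat
round(slope_ci, 3)
round(r_squared, 5)
```

The interval excludes zero, in line with the small p-value, and the squared correlation reproduces the reported R² of about 0.027.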

### **Final Insights**
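- **ANOVA:** Team differences in mean PTS were not significant at the 5% level (p = 0.0672), so team alone gives little guidance for performance analysis.
- **Regression:** Assists are a statistically significant but weak predictor of points (R² ≈ 0.027), so the data give only limited support for assist-heavy playstyles as a way to increase scoring.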
- **Future Investigation:** Explore additional variables (e.g., turnovers, shooting percentage) for deeper analysis.
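
One way the suggested follow-up could look is a multiple regression compared against the assists-only model. This is a sketch on simulated data, not a result from the NBA file: the data frame `toy`, the column names `TOV` and `FG_PCT`, and the coefficients used to generate `PTS` are all hypothetical.

```r
# Simulated stand-in data; TOV and FG_PCT are hypothetical column names
set.seed(2)
n <- 200
toy <- data.frame(
  AST    = runif(n, 0, 10),
  TOV    = runif(n, 0, 5),
  FG_PCT = runif(n, 0.35, 0.60)
)
# Made-up generating process, used only so the example has something to fit
toy$PTS <- 5 + 0.5 * toy$AST + 1.0 * toy$TOV + 30 * toy$FG_PCT + rnorm(n, sd = 3)

# Assists-only model versus a model with the additional predictors
simple_model <- lm(PTS ~ AST, data = toy)
multi_model  <- lm(PTS ~ AST + TOV + FG_PCT, data = toy)

# Nested-model F-test: do the extra predictors improve on AST alone?
comparison <- anova(simple_model, multi_model)

coef(multi_model)
comparison
```

With the real data, the same comparison would show whether the added variables explain variation in PTS beyond what AST captures.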

