2025-10-27

Introduction to Sports Analytics

Sports analytics uses statistical methods to:

  • Evaluate player performance and potential
  • Optimize team strategies and game tactics
  • Predict game outcomes and player injuries
  • Inform draft decisions and player trades

Modern sports teams rely heavily on data-driven insights to gain competitive advantages.

Statistical Models in Sports

Common statistical applications include:

  • Regression Analysis: Predicting points scored based on player statistics
  • Hypothesis Testing: Comparing performance across different conditions
  • Probability Models: Estimating win probabilities
  • Time Series Analysis: Tracking performance trends over seasons

Expected Points Model: \[E[Points] = \beta_0 + \beta_1(Shots) + \beta_2(Assists) + \beta_3(Minutes)\]

Generating Player Performance Data

# Simulate basketball player statistics
set.seed(2024)
n_players <- 50

players <- data.frame(
  player_id = 1:n_players,
  minutes_played = round(rnorm(n_players, 28, 6)),
  shots_attempted = round(rnorm(n_players, 12, 3)),
  assists = rpois(n_players, 4),   # Poisson draws are already whole numbers
  rebounds = rpois(n_players, 6)
)

# Calculate points as a linear function of shots and assists plus random noise
# (minutes_played and rebounds do not enter this formula)
players$points <- round(8 + 0.8*players$shots_attempted + 
                       0.5*players$assists + rnorm(n_players, 0, 3))
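
As a quick check on the simulation, the expected score implied by the formula above can be computed directly. This is a minimal sketch: the helper name `expected_points` is hypothetical, and its default coefficients are simply the values used in the simulation code (8, 0.8, and 0.5).

```r
# Hypothetical helper: expected points under the simulation's own coefficients
expected_points <- function(shots, assists,
                            b0 = 8, b_shots = 0.8, b_assists = 0.5) {
  b0 + b_shots * shots + b_assists * assists
}

# A player at the simulated averages: 12 shots and 4 assists per game
avg_player <- expected_points(shots = 12, assists = 4)
avg_player  # 8 + 9.6 + 2 = 19.6
```

A typical simulated player should therefore average a little under 20 points, which is in line with the sample mean of about 20 reported in the confidence interval section.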

Relationship: Shots vs Points Scored

[Figure not shown]

Performance Distribution Analysis

[Figure not shown]

3D Performance Analysis

[Figure not shown]

Hypothesis Testing: Home vs Away Performance

Testing whether players score differently in home games and away games:

Null Hypothesis: \(H_0: \mu_{home} = \mu_{away}\)

Alternative Hypothesis: \(H_A: \mu_{home} \neq \mu_{away}\)

# Simulate home and away performance: home adds about 2 points on average,
# away subtracts about 1, so the true mean difference is 3
players$points_home <- players$points + rnorm(n_players, 2, 2)
players$points_away <- players$points + rnorm(n_players, -1, 2)

# Perform paired t-test
t_test_result <- t.test(players$points_home, players$points_away, 
                        paired = TRUE)
t_test_result
## 
##  Paired t-test
## 
## data:  players$points_home and players$points_away
## t = 8.9766, df = 49, p-value = 6.389e-12
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  2.415816 3.809458
## sample estimates:
## mean difference 
##        3.112637
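
The paired test reduces to a one-sample t-test on the per-player differences, and the estimated mean difference of about 3.1 sits close to the 3 points built into the simulation. The sketch below shows that calculation step by step and compares it with `t.test()`. The helper name `paired_t_by_hand` and the five home/away values are illustrative assumptions, not the simulated players above.

```r
# Hypothetical helper: paired t-test computed from the per-player differences
paired_t_by_hand <- function(x, y) {
  d <- x - y                                   # one difference per player
  n <- length(d)
  se <- sd(d) / sqrt(n)                        # standard error of the mean difference
  t_stat <- mean(d) / se
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  c(t = t_stat, df = n - 1, p_value = p_value, mean_diff = mean(d))
}

# Illustrative values only
home <- c(24, 19, 22, 27, 21)
away <- c(20, 18, 19, 22, 20)

manual  <- paired_t_by_hand(home, away)
builtin <- t.test(home, away, paired = TRUE)

manual
builtin$statistic  # should match manual["t"]
```

With the simulated data, the equivalent call would be `paired_t_by_hand(players$points_home, players$points_away)`.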

Confidence Intervals for Average Performance

The 95% confidence interval for mean points per game:

\[\bar{X} \pm t_{\alpha/2, n-1} \times \frac{s}{\sqrt{n}}\]

Where:

  • \(\bar{X}\) is the sample mean
  • \(t_{\alpha/2, n-1}\) is the critical t-value
  • \(s\) is the sample standard deviation
  • \(n\) is the sample size

## Mean points: 20.06
## 95% CI: [18.86, 21.26]
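
The code behind this output is not shown, so the following is only a sketch of how the formula can be applied. The helper name `mean_ci` and the five example values are assumptions made for illustration.

```r
# Hypothetical helper: t-based confidence interval for a mean
mean_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  x_bar <- mean(x)
  se <- sd(x) / sqrt(n)                               # s / sqrt(n)
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical t-value
  c(mean = x_bar, lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
}

# Illustrative values only
pts <- c(18, 22, 20, 19, 21)
ci <- mean_ci(pts)
round(ci, 2)
```

Applied to the simulated data, the call would be `mean_ci(players$points)`. The built-in `t.test(x)$conf.int` returns the same interval.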

Regression Model for Point Prediction

Multiple linear regression model:

\[Points_i = \beta_0 + \beta_1(Shots_i) + \beta_2(Assists_i) + \beta_3(Minutes_i) + \epsilon_i\]

## 
## Call:
## lm(formula = points ~ shots_attempted + assists + minutes_played, 
##     data = players)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.5753 -2.0616 -0.0408  1.9100  6.8501 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     12.74684    2.64123   4.826 1.57e-05 ***
## shots_attempted  0.82003    0.16177   5.069 6.97e-06 ***
## assists          0.41213    0.23509   1.753   0.0862 .  
## minutes_played  -0.14288    0.07025  -2.034   0.0478 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.129 on 46 degrees of freedom
## Multiple R-squared:  0.4875, Adjusted R-squared:  0.4541 
## F-statistic: 14.59 on 3 and 46 DF,  p-value: 8.238e-07

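Interpretation: Each additional shot attempt is associated with about 0.82 more expected points, holding assists and minutes constant, which is close to the 0.8 used to simulate the data. The assists coefficient (0.41) is near its simulated value of 0.5 but is not significant at the 5% level in this sample. The negative minutes_played coefficient (p = 0.048) is a chance result, because minutes do not enter the simulated points formula.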
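
The fitting code is not shown, but the Call line in the output gives the model formula. The sketch below re-creates the simulated data so that it runs on its own, fits that formula, and adds coefficient intervals and a prediction. The `new_player` values are hypothetical, and the exact numbers may differ from the output above if the random number stream differs.

```r
# Re-create the simulated data from the earlier section
set.seed(2024)
n_players <- 50
players <- data.frame(
  player_id = 1:n_players,
  minutes_played = round(rnorm(n_players, 28, 6)),
  shots_attempted = round(rnorm(n_players, 12, 3)),
  assists = round(rpois(n_players, 4)),
  rebounds = round(rpois(n_players, 6))
)
players$points <- round(8 + 0.8 * players$shots_attempted +
                        0.5 * players$assists + rnorm(n_players, 0, 3))

# Fit the model named in the Call line of the output
fit <- lm(points ~ shots_attempted + assists + minutes_played, data = players)
summary(fit)

# 95% confidence intervals for the coefficients
confint(fit)

# Predicted points for a hypothetical player (values chosen for illustration)
new_player <- data.frame(shots_attempted = 15, assists = 5, minutes_played = 30)
pred <- predict(fit, newdata = new_player, interval = "prediction")
pred
```

The prediction interval is wider than a confidence interval for the mean because it also accounts for the game-to-game noise in an individual player's score.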

Key Takeaways

Sports analytics leverages statistics to:

  1. Quantify performance objectively using metrics
  2. Identify patterns that may not be visible to the naked eye
  3. Make predictions about future performance
  4. Test hypotheses about strategies and tactics

\[Performance = f(Skills, Strategy, Opportunity, Randomness)\]

The field continues to evolve as machine learning, computer vision, and real-time tracking data change how teams make decisions.