2024-10-31

Introduction to Hypothesis Testing

Hypothesis testing is a statistical method for drawing inferences about a population from sample data. Several kinds of test exist, but this presentation focuses on t-tests. A t-test uses the t-distribution to weigh an alternate hypothesis against a null hypothesis, and is especially useful for comparing means.

  • Null Hypothesis: No effect or no difference between samples (the new sample mean equals the previous sample mean)
  • Alternate Hypothesis: A real effect or difference between samples (the new sample mean differs from the previous sample mean)

Statistical calculations (shown in the next slide) are then used to compute a p-value: the probability of observing a difference at least as extreme as the one in the data if the null hypothesis were true. This p-value is compared to a pre-determined significance level alpha, usually 0.05. A p-value below alpha is considered statistically significant.
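For a two-sided test, this comparison can be written in symbols as follows. Here \(T_{df}\) is assumed to denote a random variable following a t-distribution with \(df\) degrees of freedom, and \(t\) is the observed test statistic (both quantities are defined on the following slides):

\[ p = 2 \, P\left( T_{df} \geq |t| \right), \qquad \text{reject the null hypothesis if } p < \alpha \]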

LaTeX Demonstration with T-Test Formulas

The test statistic \(t\) for a two-sample t-test that does not assume equal variances (Welch's t-test) can be calculated with the following formula:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

where \(\bar{X}_1\) and \(\bar{X}_2\) are the sample means, \(s_1^2\) and \(s_2^2\) are the sample variances, and \(n_1\) and \(n_2\) are the sample sizes.
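As a rough sketch of how this formula translates to R, the hypothetical helper `welch_t` below (not part of the original slides) computes the statistic from two numeric vectors and compares it with the built-in `t.test`. The vectors `x` and `y` are made-up illustration data, not NBA statistics.

```r
# Hypothetical helper: Welch two-sample t statistic from two numeric vectors
welch_t <- function(x1, x2) {
  # Standard error of the difference in means
  se <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
  (mean(x1) - mean(x2)) / se
}

# Made-up illustration data (not NBA statistics)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

t_manual  <- welch_t(x, y)
t_builtin <- unname(t.test(x, y)$statistic)  # Welch test is t.test's default

all.equal(t_manual, t_builtin)
```

If the helper matches the formula, the two values should agree up to floating-point error.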

Degrees of Freedom Formula in LaTeX

For a two-sample t-test with unequal variances, the degrees of freedom are not simply the sample size minus one. They are approximated with the Welch-Satterthwaite formula:

\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( \frac{s_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{s_2^2}{n_2} \right)^2}{n_2 - 1}} \]

where \(s_1^2\) and \(s_2^2\) are the sample variances and \(n_1\) and \(n_2\) are the sample sizes.
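The same formula can be sketched in R. The helper name `welch_df` and the vectors `x` and `y` are hypothetical and used only for illustration. The result is checked against the `parameter` element returned by `t.test`.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom
welch_df <- function(x1, x2) {
  a <- var(x1) / length(x1)  # s_1^2 / n_1
  b <- var(x2) / length(x2)  # s_2^2 / n_2
  (a + b)^2 / (a^2 / (length(x1) - 1) + b^2 / (length(x2) - 1))
}

# Made-up illustration data (not NBA statistics)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

df_manual  <- welch_df(x, y)
df_builtin <- unname(t.test(x, y)$parameter)  # df used by the Welch test

all.equal(df_manual, df_builtin)
```

Note that the result is generally not a whole number, which is why a fractional df can appear in a Welch test.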

Loading and Displaying the Data Set from Basketball Reference

Basketball Reference is a great resource for all sorts of NBA and WNBA statistics, with some records dating back to the 1940s. You can find their website here: https://www.basketball-reference.com/

The data sets used here for hypothesis testing are the Per-Game Team Statistics for the 2023-2024 NBA season and the Per-Game Team Statistics for the current 2024-2025 NBA season as of 10/30. The first rows of the 2024-2025 data set are as follows:

##   Rk                  Team G    MP   FG  FGA   FG.  X3P X3PA  X3PP  X2P X2PA
## 1  1        Boston Celtics 4 240.0 42.8 90.5 0.472 21.5 50.3 0.428 21.3 40.3
## 2  2 Golden State Warriors 4 240.0 45.8 96.0 0.477 18.8 46.8 0.401 27.0 49.3
## 3  3   Cleveland Cavaliers 4 240.0 44.8 84.8 0.528 14.3 34.8 0.410 30.5 50.0
## 4  4     Memphis Grizzlies 4 240.0 44.5 94.8 0.470 13.5 37.5 0.360 31.0 57.3
## 5  5         Brooklyn Nets 4 246.3 40.8 90.3 0.452 16.8 44.3 0.379 24.0 46.0
## 6  6    Los Angeles Lakers 4 240.0 42.3 89.0 0.475 11.8 30.8 0.382 30.5 58.3
##    X2PP   FT  FTA   FTP  ORB  DRB  TRB  AST  STL BLK  TOV   PF   PTS
## 1 0.528 17.3 21.0 0.821 10.0 31.0 41.0 24.3  7.0 4.3 10.5 17.0 124.3
## 2 0.548 13.5 17.8 0.761 14.8 34.8 49.5 31.5 13.3 6.3 15.5 23.5 123.8
## 3 0.610 19.8 26.8 0.738  8.3 32.8 41.0 26.3 11.5 7.0 14.3 21.0 123.5
## 4 0.541 17.8 26.3 0.676 12.8 33.8 46.5 31.8  7.5 6.8 15.0 26.3 120.3
## 5 0.522 19.5 24.0 0.813  9.8 32.3 42.0 25.8  7.0 4.5 14.8 30.3 117.8
## 6 0.524 21.0 26.5 0.792 11.3 32.3 43.5 28.0  7.5 5.5 12.8 19.5 117.3
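The slides do not show how the data was loaded. One plausible approach, sketched below, assumes the table was exported as a CSV with headers such as `3P%`. By default, `read.csv` rewrites headers that are not valid R names, which would explain column names like `X3P` and `FG.` in the output above. The in-memory CSV, the name `sample25_demo`, and the renaming step are illustrative assumptions. The two rows reuse values from the table above.

```r
# Hypothetical stand-in for a CSV export; values copied from the first two rows above
csv_text <- "Rk,Team,G,3P,3PA,3P%,PTS
1,Boston Celtics,4,21.5,50.3,0.428,124.3
2,Golden State Warriors,4,18.8,46.8,0.401,123.8"

# read.csv uses check.names = TRUE by default, so "3P" becomes "X3P"
# and "3P%" becomes "X3P."
sample25_demo <- read.csv(text = csv_text)
names(sample25_demo)

# Assumed renaming step so the percentage column matches the X3PP name in these slides
names(sample25_demo)[names(sample25_demo) == "X3P."] <- "X3PP"

head(sample25_demo)
```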

Visualization of PPG against 3P% for 2024-2025 NBA Teams

Another way to look specifically at 3 Point Percentage for 2024-2025 Season

Setup of 3 Dimensional Plot with Points Per Game, 3 Point Percentage, and Turnovers Per Game

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# 3D scatter plot: points per game (x), 3P% (y), turnovers per game (z)
p <- plot_ly(sample25, x = ~PTS, y = ~X3PP, z = ~TOV,
             type = 'scatter3d', mode = 'markers',
             text = ~paste("Team:", Team)) %>%
  layout(title = "3D Scatter Plot of PPG, 3P% and Turnovers",
         scene = list(xaxis = list(title = 'Points Per Game'),
                      yaxis = list(title = '3 Point Field Goal Percentage'),
                      zaxis = list(title = 'Turnovers')))

3D Plot of PPG, 3P%, and TOV including Team Names

Connecting Hypothesis Testing with NBA Teams’ Data from last season

Are Teams Shooting Unusually Well or Poorly This Season?

Null Hypothesis -> Mean team 3P percentage this season equals the mean from last season

Alternate Hypothesis -> Mean team 3P percentage this season differs from the mean from last season

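# Last season (2023-2024): mean, variance, and sample size of team 3P%
nullhypothesis24 <- mean(sample24$X3PP)
var24 <- var(sample24$X3PP)
n24 <- length(sample24$X3PP)  # one observation per team, as t.test() counts them

# This season (2024-2025): mean, variance, and sample size of team 3P%
alternatehypothesis25 <- mean(sample25$X3PP)
var25 <- var(sample25$X3PP)
n25 <- length(sample25$X3PP)  # number of teams, not total games played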

Hypothesis Test Results

When we plug the numbers calculated above into the formulas from the earlier slides, we get t = 1.77 and df = 36.9. We can check this against R's built-in “t.test” function, which performs a t-test automatically:

t.test(sample24$X3PP, sample25$X3PP, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  sample24$X3PP and sample25$X3PP
## t = 1.7716, df = 36.9, p-value = 0.08472
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.001869421  0.027869421
## sample estimates:
## mean of x mean of y 
## 0.3656667 0.3526667
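As a consistency check, the p-value and confidence interval can be approximately recovered from the rounded t, df, and sample means printed above. This is a sketch using base R's `pt` and `qt`. The variable names are illustrative, and the standard error is backed out from the printed values rather than computed from the raw data.

```r
# Values copied from the t.test output above (rounded as printed)
t_stat <- 1.7716
dof    <- 36.9
mean24 <- 0.3656667
mean25 <- 0.3526667
alpha  <- 0.05

# Two-sided p-value: probability of a |t| at least this large under the null
p_value <- 2 * pt(-abs(t_stat), dof)

# Standard error recovered from t = difference / SE
mean_diff <- mean24 - mean25
se        <- mean_diff / t_stat

# 95 percent confidence interval for the difference in means
ci <- mean_diff + c(-1, 1) * qt(1 - alpha / 2, dof) * se

round(p_value, 4)
round(ci, 4)
p_value < alpha  # TRUE would mean rejecting the null hypothesis
```

Because the inputs are rounded, the results should only be expected to agree with the printed output to about four decimal places.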

Conclusions about Hypothesis Testing and NBA 3PP

The p-value from the last slide, 0.08472, is above alpha = 0.05, so we fail to reject the null hypothesis. The drop in mean team 3P percentage from about 36.6% last season to about 35.3% so far this season is not statistically significant.

Thus, we were able to use a t-test to weigh our alternate hypothesis against our null hypothesis and judge whether the observed difference was statistically significant.