Week 7 Data Dive - Hypothesis Testing

For this week I am going to complete two different tests based on hypotheses about my dataset. For the first hypothesis I will be performing a Neyman-Pearson test. For the second hypothesis I will be performing a Fisher’s Significance test. For each of these tests I will provide a visualization that illustrates the results.

Hypothesis 1 - A higher points per 100 possessions (PTS_per_100) will lead to making the playoffs more often than not.

Hypothesis 2 - A higher rebounds per 100 possessions (TRB_per_100) will lead to making the playoffs more often than not.

Does a Higher Points Average Lead to Making the Playoffs?

Here I will test the first of my hypotheses.

I split the teams at the median PTS_per_100. Of the teams at or above the median, 502 made the playoffs and 202 missed them. Of the teams below the median, 263 made the playoffs and 435 missed them.

I chose these values for the following factors for the hypothesis test:

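alpha level - 0.05 (accepts a 5% chance of a false positive, which corresponds to a 95% confidence level)

power level - 0.80 (a standard choice for hypothesis testing; an 80% chance of detecting the minimum effect if it exists)

minimum effect size - 15 percentage points (a playoff rate of 0.65 compared to 0.50; 15 points is large enough to be considered a meaningful difference in chance to make the playoffs in the NBA)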
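
Written in symbols, the choices above amount to roughly the following set-up. The names p_H and p_L are my own notation (they are not columns in the dataset) for the playoff rates of the high-scoring and low-scoring groups, and this is only a sketch of the design, assuming a one-sided comparison of two proportions:

```latex
\begin{aligned}
H_0 &: p_H - p_L \le 0 \\
H_1 &: p_H - p_L > 0 \\
\alpha &= 0.05, \qquad 1 - \beta = 0.80 \quad \text{when } p_L = 0.50,\; p_H = 0.65 \;(\text{a gap of } 0.15)
\end{aligned}
```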

Now I will run through the Neyman-Pearson test with Points per 100 possessions:

# Remove missing values
NBA1 <- NBA[!is.na(NBA$PTS_per_100), ]

# Compute median
median_pts <- median(NBA1$PTS_per_100)

# Create groups
NBA1$OffGroup <- ifelse(
  NBA1$PTS_per_100 >= median_pts,
  "High_PTS",
  "Low_PTS"
)

NBA1$OffGroup <- as.factor(NBA1$OffGroup)

# Check table
table(NBA1$OffGroup, NBA1$Playoffs)
##           
##            FALSE TRUE
##   High_PTS   202  502
##   Low_PTS    435  263
# Sample size per group needed to detect 0.50 vs 0.65 at alpha = 0.05 with 80% power
power.prop.test(
  p1 = 0.50,
  p2 = 0.65,
  sig.level = 0.05,
  power = 0.80,
  alternative = "one.sided"
)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 133.2485
##              p1 = 0.5
##              p2 = 0.65
##       sig.level = 0.05
##           power = 0.8
##     alternative = one.sided
## 
## NOTE: n is number in *each* group
# Actual group sizes, for comparison with the required n per group
table(NBA1$OffGroup)
## 
## High_PTS  Low_PTS 
##      704      698
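tab1 <- table(NBA1$OffGroup, NBA1$Playoffs)

# Playoff teams in each group
success <- c(tab1["High_PTS","TRUE"],
             tab1["Low_PTS","TRUE"])

# Total teams in each group
total <- c(sum(tab1["High_PTS",]),
           sum(tab1["Low_PTS",]))

# One-sided test: is the High_PTS playoff rate greater than the Low_PTS rate?
prop.test(success,
          total,
          alternative = "greater",
          correct = FALSE)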
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  success out of total
## X-squared = 159.87, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.2950888 1.0000000
## sample estimates:
##    prop 1    prop 2 
## 0.7130682 0.3767908
# Playoff rate in each group, for plotting
playoff_high <- success[1] / total[1]
playoff_low  <- success[2] / total[2]

barplot(
  c(playoff_high, playoff_low),
  names.arg = c("High Scoring", "Low Scoring"),
  ylim = c(0,1),
  main = "Playoff Rate by Points per 100",
  ylab = "Proportion Making Playoffs"
)
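
As a check on the design, the sketch below asks two follow-up questions: how much power the actual group sizes give for the planned 0.50 vs 0.65 comparison, and whether a two-sided 95% confidence interval for the difference in playoff rates sits entirely above the 15-point minimum effect. The counts are copied by hand from the table above, and the variable names (`made`, `achieved_power`, `exceeds_minimum`) are my own. Using the smaller group size for `n` is a simplifying assumption, since `power.prop.test` expects equal groups.

```r
# Counts copied from the Points per 100 table above
made  <- c(High_PTS = 502, Low_PTS = 263)   # teams that made the playoffs
total <- c(High_PTS = 704, Low_PTS = 698)   # teams in each group

# Power to detect the planned 0.50 vs 0.65 gap, using the smaller group size
achieved <- power.prop.test(
  n = min(total),
  p1 = 0.50,
  p2 = 0.65,
  sig.level = 0.05,
  alternative = "one.sided"
)
achieved_power <- achieved$power

# Two-sided 95% confidence interval for the difference in playoff rates
ci <- prop.test(made, total, correct = FALSE)$conf.int

# Does the whole interval sit above the 15-point minimum effect?
exceeds_minimum <- ci[1] > 0.15

print(achieved_power)
print(ci)
print(exceeds_minimum)
```

If the lower end of the interval is above 0.15, the observed gap is not just statistically significant but also larger than the smallest difference I decided would matter.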

Insights, Significance, and Questions

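The insight from this hypothesis test is that higher-scoring teams make the playoffs at a significantly higher rate: about 0.71 of teams at or above the median in points per 100 possessions made the playoffs, compared to about 0.38 of teams below the median. Both groups (704 and 698 teams) are far larger than the roughly 134 teams per group that the power calculation required, and the p-value (< 2.2e-16) is well below the alpha level of 0.05, so I reject the null hypothesis. This is significant because it shows that scoring more points per 100 possessions is strongly associated with making the playoffs, although the test alone does not show that scoring causes playoff appearances. This could help shape roster construction or coaching strategies for teams. A potential question to look into: does 3-point scoring matter more than traditional field goal percentage?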

Does Better Rebounding Lead to Making the Playoffs?

Here I will test the second of my hypotheses.

I split the teams at the median TRB_per_100. Of the teams at or above the median, 461 made the playoffs and 251 missed them. Of the teams below the median, 304 made the playoffs and 386 missed them.

I am confident in this data and the conclusions that can be drawn from it because it is a large sample of NBA teams across the history of the league. There is also a clear binary outcome in the Playoffs column, which records whether each team made or missed the playoffs. Additionally, the per 100 possessions stats account for differences in pace across eras of the NBA.

Now I will run the Fisher’s Significance test for my hypothesis:

# Remove missing values
NBA2 <- NBA[!is.na(NBA$TRB_per_100), ]

# Compute median
median_trb <- median(NBA2$TRB_per_100)

# Create groups
NBA2$RebGroup <- ifelse(
  NBA2$TRB_per_100 >= median_trb,
  "High_REB",
  "Low_REB"
)

NBA2$RebGroup <- as.factor(NBA2$RebGroup)

# Check table
table(NBA2$RebGroup, NBA2$Playoffs)
##           
##            FALSE TRUE
##   High_REB   251  461
##   Low_REB    386  304
tab2 <- table(NBA2$RebGroup, NBA2$Playoffs)

# Two-sided Fisher's exact test on the 2x2 table
fisher.test(tab2)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab2
## p-value = 8.963e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3438181 0.5347610
## sample estimates:
## odds ratio 
##  0.4290708
# Playoff teams in each group
success2 <- c(tab2["High_REB","TRUE"],
              tab2["Low_REB","TRUE"])

# Total teams in each group
total2 <- c(sum(tab2["High_REB",]),
            sum(tab2["Low_REB",]))

# Playoff rate in each group, for plotting
playoff_high_reb <- success2[1] / total2[1]
playoff_low_reb  <- success2[2] / total2[2]

barplot(
  c(playoff_high_reb, playoff_low_reb),
  names.arg = c("High Rebounding", "Low Rebounding"),
  ylim = c(0,1),
  main = "Playoff Rate by Rebounding",
  ylab = "Proportion Making Playoffs"
)
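
The odds ratio of about 0.43 reported by `fisher.test` can be confusing, because the table lists FALSE before TRUE, so it describes the odds of missing the playoffs. The sketch below rebuilds the table from the counts above (entered by hand, so it runs without the NBA data), swaps the columns so the odds ratio describes making the playoffs, and also runs a one-sided version that matches the direction of my hypothesis. The object names (`tab`, `flipped`, `or_sample`) are my own.

```r
# Rebuild the 2x2 table from the counts above (matrix() fills column by column)
tab <- matrix(
  c(251, 386, 461, 304),
  nrow = 2,
  dimnames = list(RebGroup = c("High_REB", "Low_REB"),
                  Playoffs = c("FALSE", "TRUE"))
)

# Put TRUE first so the odds ratio is about making the playoffs
flipped <- tab[, c("TRUE", "FALSE")]

# Simple cross-product odds ratio: odds of playoffs for High_REB vs Low_REB
or_sample <- (flipped["High_REB", "TRUE"] * flipped["Low_REB", "FALSE"]) /
  (flipped["High_REB", "FALSE"] * flipped["Low_REB", "TRUE"])

# Fisher's exact test on the flipped table (two-sided, then one-sided)
ft_two <- fisher.test(flipped)
ft_one <- fisher.test(flipped, alternative = "greater")

print(or_sample)
print(ft_two$estimate)
print(ft_one$p.value)
```

The estimate from `fisher.test` is a conditional maximum likelihood estimate, so it can differ slightly from the simple cross-product ratio.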

Insights, Significance, and Questions

The insight I gather from this hypothesis test is that there is strong evidence (p = 8.963e-15) that better rebounding is associated with making the playoffs. About 0.65 of teams at or above the median in rebounds per 100 possessions made the playoffs, compared to about 0.44 of teams below the median. The reported odds ratio of about 0.43 is below 1 because the table lists FALSE first: high-rebounding teams have about 0.43 times the odds of missing the playoffs, or equivalently about 2.3 times the odds of making them. However, the association is weaker than it was with points per 100 possessions, where the playoff rates were about 0.71 and 0.38. Better rebounding goes along with a higher chance of making the playoffs, but the gap is smaller than the one for scoring. I would take this into account when constructing a roster or developing a coaching strategy. An additional question worth exploring: is the association with playoff chances different for offensive rebounds than for defensive rebounds?
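
To put the two results side by side, the sketch below computes the playoff rate in each group and the gap between the high and low groups for both stats. The counts are copied by hand from the two tables above and the names (`made`, `total`, `gap`) are my own. This is only a descriptive comparison: it is not a formal test of whether the two gaps differ, and the same teams appear in both splits.

```r
# Counts copied from the two tables above
made  <- c(PTS_high = 502, PTS_low = 263, REB_high = 461, REB_low = 304)
total <- c(PTS_high = 704, PTS_low = 698, REB_high = 712, REB_low = 690)

# Playoff rate in each group
rate <- made / total

# Gap in playoff rate between the high and low group for each stat
gap <- c(
  PTS = unname(rate["PTS_high"] - rate["PTS_low"]),
  REB = unname(rate["REB_high"] - rate["REB_low"])
)

print(round(rate, 3))
print(round(gap, 3))
```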