hypothesis testing

Hypothesis 1 (Neyman-Pearson)

Do unexpected playoff performances have a higher average Game Score than unexpected regular season performances?

Variables

The outcome variable is GmSc (Game Score), which is continuous. Group A is playoff games and Group B is regular season games.

Null and Alternative

H0: μPlayoffs = μRegular

HA: μPlayoffs > μRegular

Test Design Choices

Alpha = 0.05 (standard/moderate)

Power = 0.8 (80% chance of detecting a meaningful difference)

Minimum Effect Size = 2 GmSc points (a difference of less than 2 GmSc points has limited practical meaning in NBA performance terms)

Calculate Sample Size

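library(pwr)  # provides pwr.t.test()

# d is negative and alternative = "less" because the t.test() call compares
# regular season minus playoffs, so a playoff advantage is a negative difference.
pwr.t.test(
  d = -2 / sd(nba$GmSc, na.rm = TRUE),
  power = 0.80,
  sig.level = 0.05,
  type = "two.sample",
  alternative = "less"
)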

## 
##      Two-sample t test power calculation 
## 
##               n = 222.3259
##               d = -0.2361939
##       sig.level = 0.05
##           power = 0.8
##     alternative = less
## 
## NOTE: n is number in *each* group
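
As a cross-check on the required sample size, the same calculation can be sketched in base R. This is a minimal sketch, assuming the standardized effect size |d| = 0.2361939 reported in the output above; `power.t.test()` comes from the stats package, and `n_approx` and `n_exact` are names introduced here for illustration only.

```r
# Cross-check of the pwr.t.test() result using base R only.
d     <- 0.2361939  # |d| taken from the power calculation output
alpha <- 0.05
power <- 0.80

# Normal approximation for a one-sided, two-sample comparison:
# n per group ~ 2 * (z_{1 - alpha} + z_{power})^2 / d^2
n_approx <- 2 * (qnorm(1 - alpha) + qnorm(power))^2 / d^2

# Calculation based on the noncentral t distribution.
# With sd = 1, delta is in standard-deviation units, i.e. Cohen's d.
n_exact <- power.t.test(
  delta = d, sd = 1,
  sig.level = alpha, power = power,
  type = "two.sample", alternative = "one.sided"
)$n

c(approx = n_approx, exact = n_exact)
```

Both values should land close to the roughly 222 games per group reported above; the normal approximation tends to come out slightly lower because it ignores the extra uncertainty from estimating the variance.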

Perform Test

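# Groups are ordered false, then true, so alternative = "less" tests whether
# the regular season mean is below the playoff mean. Welch is the default.
t.test(
  GmSc ~ Playoffs,
  data = nba,
  alternative = "less"
)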

## 
##  Welch Two Sample t-test
## 
## data:  GmSc by Playoffs
## t = -3.6885, df = 49.023, p-value = 0.0002825
## alternative hypothesis: true difference in means between group false and group true is less than 0
## 95 percent confidence interval:
##       -Inf -2.870964
## sample estimates:
## mean in group false  mean in group true 
##            24.98882            30.25208
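
To make the Welch output easier to read, the statistic, degrees of freedom and one-sided p-value can be reproduced by hand. The sketch below uses simulated stand-in data, not the nba data set; `regular`, `playoff` and `welch_less()` are hypothetical names introduced for illustration. One point worth checking against the real data: the Welch degrees of freedom can never fall below the smaller group size minus one, so a df of about 49 implies that the smaller group holds at most about 50 games, which is short of the 222 per group from the power calculation. Running `table(nba$Playoffs)` would confirm the actual group sizes.

```r
# Hypothetical data for illustration only; NOT the nba data set.
set.seed(42)
regular <- rnorm(300, mean = 25, sd = 8.5)  # stand-in for regular season GmSc
playoff <- rnorm(50,  mean = 30, sd = 8.5)  # stand-in for playoff GmSc

# Welch statistic, Welch-Satterthwaite degrees of freedom, one-sided p-value
welch_less <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df, p = pt(t_stat, df))  # lower tail = alternative "less"
}

manual  <- welch_less(regular, playoff)
builtin <- t.test(regular, playoff, alternative = "less")

# The hand computation should agree with t.test()
all.equal(
  unname(manual),
  unname(c(builtin$statistic, builtin$parameter, builtin$p.value))
)

# df is bounded below by the smaller group size minus one
manual[["df"]] >= min(length(regular), length(playoff)) - 1
```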

Visualization 1

nba |>
  ggplot(aes(x = Playoffs, y = GmSc)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  theme_minimal() +
  labs(
    title = "Game Score: Playoffs vs Regular Season",
    x = "Playoff Status",
    y = "Game Score"
  )

The mean Game Score was 24.99 in the regular season, compared to 30.25 in the playoffs. The Welch t-test indicated a statistically significant difference (t = -3.69, df ≈ 49, p = 0.00028). At α = 0.05, we reject the null hypothesis and conclude that playoff performances have a higher mean Game Score than regular season performances. The one-sided 95% confidence bound indicates that the playoff mean exceeds the regular season mean by at least 2.87 points, which is above the 2-point minimum effect size and suggests both statistical and practical significance.

This raises further questions. Why do most of these NBA players tend to perform at a higher level in playoff games? Why does the box plot show so many potential outliers on the high end of regular season performances? Is this mostly due to the large difference between game types within this dataset?

Hypothesis 2 (Fisher)

Are 30+ point performances more common in playoff games than in regular season games?

Variables

The outcome, whether a player scores at least 30 points, is binary this time, but the two groups remain the same.

Null and Alternative

H0: There is no difference in the proportions of 30+ point performances in the playoffs vs. regular season

HA: There is a difference in the proportions of 30+ point performances in the playoffs vs. regular season

Perform Test

nba <- nba |>
  mutate(ThirtyPlus = PTS >= 30)

fisher.test(
  table(nba$Playoffs, nba$ThirtyPlus)
)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  table(nba$Playoffs, nba$ThirtyPlus)
## p-value = 0.0008181
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.477995 5.154833
## sample estimates:
## odds ratio 
##   2.742524
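
Two details of this output are easy to misread, and a small sketch may help. First, `fisher.test()` reports the conditional maximum likelihood estimate of the odds ratio, which is usually close to, but not identical to, the sample cross-product ratio. Second, the research question is directional, and `alternative = "greater"` is the one-sided option that would match it. The counts below are hypothetical and chosen only for illustration; they are not taken from the nba data.

```r
# Hypothetical 2x2 counts for illustration only; NOT the nba data.
tab <- matrix(
  c(300, 20,   # ThirtyPlus = FALSE: regular season, playoffs
     60, 12),  # ThirtyPlus = TRUE:  regular season, playoffs
  nrow = 2,
  dimnames = list(Playoffs   = c("FALSE", "TRUE"),
                  ThirtyPlus = c("FALSE", "TRUE"))
)

# Sample (cross-product) odds ratio; values above 1 mean higher odds
# of a 30+ point game in the playoffs.
sample_or <- (tab["FALSE", "FALSE"] * tab["TRUE", "TRUE"]) /
  (tab["FALSE", "TRUE"] * tab["TRUE", "FALSE"])

two_sided <- fisher.test(tab)                           # as in the analysis above
one_sided <- fisher.test(tab, alternative = "greater")  # directional version

sample_or           # cross-product ratio
two_sided$estimate  # conditional MLE reported by fisher.test()
c(two_sided = two_sided$p.value, greater = one_sided$p.value)
```

With the table laid out as `table(nba$Playoffs, nba$ThirtyPlus)`, an odds ratio above 1 points toward the playoffs, which is why the two-sided result can still be read directionally.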

Visualization 2

nba |>
  group_by(Playoffs) |>
  summarise(
    Prop30 = mean(ThirtyPlus, na.rm = TRUE)
  ) |>
  ggplot(aes(x = Playoffs, y = Prop30)) +
  geom_col(width = 0.5) +
  theme_minimal() +
  labs(
    title = "Proportion of 30+ Point Games",
    x = "Playoff Status",
    y = "Proportion of Games with 30+ Points"
  )

The p-value is well below 0.05: if the null were true, the probability of observing a difference in 30+ point performances at least this extreme would be only about 0.082%. We can reject the null hypothesis, since there is strong evidence that 30+ point games are not equally common in the playoffs and the regular season. The odds ratio indicates that the odds of a player scoring 30+ points in a playoff game are roughly 2.7 times the odds in a regular season game. The confidence interval (1.48–5.15) does not include 1, reinforcing that this difference is statistically significant.

We can trust this conclusion because Fisher’s Exact Test remains valid even with small counts in some cells, unlike the chi-squared test, which can be inaccurate with small expected counts. The binary outcome is clearly defined (30+ points or not), minimizing measurement error.

Because the estimated odds ratio is above 1, the difference favors the playoffs: 30+ point performances are more common in playoff games than in regular season games. This is possibly due to higher intensity, longer minutes or more strategic focus on star players.

Some further questions: Does the increase hold across all positions, or mostly for guards/forwards? Are star players simply playing more minutes in playoff games, with higher usage rates as well?

Lucas Tetrault

2026-02-26
