One Categorical Variable: Multinomial Test

Quarterback (QB) is the most important position in the NFL. Are quarterbacks equally likely to be picked in the fourth, fifth, sixth, and seventh rounds (the last 4 rounds) of the NFL draft?

Our Null Hypothesis is:

\[H_0: \pi_4 = \pi_5 = \pi_6 = \pi_7\]

Unlike our student survey example, we aren’t testing all of the proportions this time: the null hypothesis only involves 4 of the 7 proportions.

Data Summary

We’ll start by counting the number of quarterbacks picked in each round of the NFL Draft:

qb_round <- 
  draft |> 
  
  # Only looking at the quarterback position
  filter(Position == "QB") |> 
  
  # Counting the number of quarterbacks picked per round
  count(Round) |> 
  
  # and renaming the column from n to picks
  rename(picks = n) |> 
  
  # and finally calculating the sample proportions per round
  mutate(prop = picks/sum(picks))

qb_round

##   Round picks       prop
## 1     1    66 0.23571429
## 2     2    23 0.08214286
## 3     3    31 0.11071429
## 4     4    31 0.11071429
## 5     5    36 0.12857143
## 6     6    48 0.17142857
## 7     7    45 0.16071429

# Let's record the total number of quarterbacks drafted
N <- sum(qb_round$picks)

Expected Proportions

Our test statistic will look similar to the Goodness-of-fit test from the previous section:

\[\chi^2 = N \sum_{i = 1}^c \frac{(p_i - \pi_{i,0})^2}{\pi_{i,0}} \\ G = \sum_{i = 1}^c n_i \ln \frac{p_i}{\pi_{i,0}}\]

However, we have a little bit of an issue: what are \(\pi_{i,0}\) this time?

The null hypothesis doesn’t explicitly state what the population proportions are expected to be, just that \(\pi_4\) through \(\pi_7\) are all the same. So what do we do?

Good news: Any expected proportion not specified in the null hypothesis is set equal to its sample proportion, \(\pi_{i,0} = p_i\), so those groups contribute nothing to either test statistic.

Bad news: We need to estimate the expected proportions for the groups included in the null hypothesis.

If we are assuming that \(\pi_4\) through \(\pi_7\) are all equal, the expected proportions should be equal.

That means our expected proportions for \(\pi_4\) through \(\pi_7\) will just be the average of the four sample proportions!

qb_round2 <- 
  qb_round |> 
  
  # using filter() and between() to pick rounds 4 - 7
  filter(between(Round, 4, 7)) |> 
  
  # calculating the average proportion for the 4 remaining rounds:
  mutate(expected_prop = mean(prop))

qb_round2

##   Round picks      prop expected_prop
## 1     4    31 0.1107143     0.1428571
## 2     5    36 0.1285714     0.1428571
## 3     6    48 0.1714286     0.1428571
## 4     7    45 0.1607143     0.1428571

If the null hypothesis is true, we should expect each of the four sample proportions to be close to 14.3%.
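As a quick check on where 14.3% comes from, here is a minimal base R sketch that rebuilds the expected proportion from the counts printed in the table above. The names `late`, `prop_late`, `avg_prop`, and `pooled_prop` are illustrative and do not appear in the original code. The sketch assumes the counts are typed in by hand rather than read from `draft`.

```r
# Counts copied from the qb_round table above (rounds 1 through 7)
picks <- c(66, 23, 31, 31, 36, 48, 45)
N <- sum(picks)                      # total number of quarterbacks drafted

late <- 4:7                          # rounds named in the null hypothesis
prop_late <- picks[late] / N         # sample proportions for rounds 4 to 7

# Route 1: average of the four sample proportions (the approach used above)
avg_prop <- mean(prop_late)

# Route 2: share of all picks made in rounds 4 to 7, split evenly four ways
pooled_prop <- sum(picks[late]) / N / length(late)

all.equal(avg_prop, pooled_prop)     # the two routes agree
avg_prop * N                         # expected number of picks per round under H0
```

Because all four proportions share the same denominator, averaging them is the same as pooling the picks from rounds 4 to 7 and dividing evenly.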

Calculating the test statistics:

qb_round2 <- 
  qb_round2 |> 
  mutate(
    # individual chi2 contributions
    zi2 = N * (prop - expected_prop)^2/expected_prop, 
    # individual LRT G contributions
    gi = picks * log(prop/expected_prop)
  )

qb_round2

##   Round picks      prop expected_prop   zi2        gi
## 1     4    31 0.1107143     0.1428571 2.025 -7.901660
## 2     5    36 0.1285714     0.1428571 0.400 -3.792979
## 3     6    48 0.1714286     0.1428571 1.600  8.751435
## 4     7    45 0.1607143     0.1428571 0.625  5.300237

# Calculating the test stats

nfl_test_stats <- 
  c(
    chisq = sum(qb_round2$zi2),  # Calculating Pearson's chi-squared test stat
    nfl_G = sum(qb_round2$gi)    # Calculating the LRT G-test stat
  ) 

nfl_test_stats

##    chisq    nfl_G 
## 4.650000 2.357033
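To see why rounds 1 to 3 could be dropped before computing the statistics, the sketch below repeats the Pearson calculation over all seven rounds. It is a cross-check in base R under the assumption that the counts are copied from the table above; `expected_all` and `zi2_all` are illustrative names, not objects from the original code.

```r
# Counts copied from the qb_round table above (rounds 1 through 7)
picks <- c(66, 23, 31, 31, 36, 48, 45)
N <- sum(picks)
prop <- picks / N

# Expected proportions under H0: rounds 1 to 3 keep their sample proportions,
# rounds 4 to 7 share the average of their four sample proportions
expected_all <- prop
expected_all[4:7] <- mean(prop[4:7])

# Per-round Pearson contributions across all seven rounds
zi2_all <- N * (prop - expected_all)^2 / expected_all

round(zi2_all, 3)   # rounds 1 to 3 contribute exactly 0
sum(zi2_all)        # same total as the chisq value from rounds 4 to 7 alone
```

Since the expected and sample proportions coincide for rounds 1 to 3, their terms vanish and the seven-round total matches the four-round one.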

Determining the p-value

Like our previous \(\chi^2\) and \(G\)-tests, we’ll use a \(\chi^2\) distribution to find the p-value. But what are the degrees of freedom this time?

Again, it is the number of sample proportions \(p_i\) we had to estimate (\(r_1\)) minus the number of unique expected proportions \(\pi_{i,0}\) we had to estimate (\(r_0\)):

\(df = r_1 - r_0\)

r1

Since we are only working with some of the proportions (instead of all of them), the four won’t add up to 1 this time, so none of them can be found from the others. We needed to estimate all 4 proportions: \(r_1 = 4\)

r0

While we estimated four expected proportions, how many unique \(\pi_{i,0}\) did we estimate? Since we’re assuming the four proportions are all equal, we only needed to estimate one: \(r_0 = 1\)

That means our degrees of freedom are \(df = 4 - 1 = 3\).
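An equivalent way to reach the same number, offered here as a cross-check rather than as the method used in these notes, is to count the free proportions across all seven rounds with and without the null hypothesis:

\[
\begin{aligned}
\text{No restrictions:} &\quad \pi_1 + \dots + \pi_7 = 1 \;\Rightarrow\; 7 - 1 = 6 \text{ free proportions} \\
\text{Under } H_0\text{:} &\quad \pi_4 = \pi_5 = \pi_6 = \pi_7 = \tfrac{1}{4}\left(1 - \pi_1 - \pi_2 - \pi_3\right) \;\Rightarrow\; 3 \text{ free proportions} \\
df &= 6 - 3 = 3
\end{aligned}
\]

Both counting schemes differ by the same amount, so they agree on \(df = 3\).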

pchisq(q = nfl_test_stats, df = 3, lower.tail = FALSE)

##     chisq     nfl_G 
## 0.1992946 0.5016830

Although the two p-values are noticeably different, we reach the same conclusion from both: there is no evidence that the probability a QB is picked differs between rounds 4, 5, 6, and 7.
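One point worth verifying against the definition used in the previous section: the likelihood-ratio statistic is conventionally written with a leading factor of 2, \(G = 2 \sum_i n_i \ln (p_i / \pi_{i,0})\), and it is that doubled quantity that is compared with a \(\chi^2\) distribution. The sketch below is a cross-check under that assumption, rebuilt in base R from the counts in the table above. The names `picks_late`, `expected_count`, `G_conventional`, and `stats` are illustrative and are not objects from the original code.

```r
picks_late <- c(31, 36, 48, 45)        # rounds 4 to 7, copied from the table above
expected_count <- mean(picks_late)     # expected picks per round under H0

# Pearson's chi-squared statistic written in terms of counts
chisq <- sum((picks_late - expected_count)^2 / expected_count)

# Likelihood-ratio statistic with the conventional factor of 2 (an assumption here)
G_conventional <- 2 * sum(picks_late * log(picks_late / expected_count))

stats <- c(chisq = chisq, G = G_conventional)
stats

# Upper-tail p-values with the same 3 degrees of freedom
pchisq(q = stats, df = 3, lower.tail = FALSE)
```

Under this convention the two statistics land close to each other (both near 4.7), so their p-values are similar as well and the conclusion is unchanged.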

Second Test: Equally likely picked on day 2 and equally likely picked on day 3

Let’s work through a test that QBs are equally likely to be picked in rounds that occur on the same day:

\[H_0: \\ \pi_2 = \pi_3 \\ \pi_4 = \pi_5 = \pi_6 = \pi_7\]

qb_round |> 
  round(digits = 3)

##   Round picks  prop
## 1     1    66 0.236
## 2     2    23 0.082
## 3     3    31 0.111
## 4     4    31 0.111
## 5     5    36 0.129
## 6     6    48 0.171
## 7     7    45 0.161
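One way this second test could be carried out, mirroring the steps of the first test, is sketched below in base R. It is an illustration under stated assumptions, not the worked solution from these notes: the counts are copied from the table above, only Pearson's statistic is computed, and the names `expected2`, `chisq2`, and `df2` are hypothetical.

```r
# Counts copied from the qb_round table above (rounds 1 through 7)
picks <- c(66, 23, 31, 31, 36, 48, 45)
N <- sum(picks)
prop <- picks / N

# Expected proportions under the second H0:
# round 1 keeps its sample proportion, rounds 2 and 3 share one average,
# and rounds 4 to 7 share another
expected2 <- prop
expected2[2:3] <- mean(prop[2:3])
expected2[4:7] <- mean(prop[4:7])

# Pearson's chi-squared statistic over all seven rounds
chisq2 <- sum(N * (prop - expected2)^2 / expected2)

# Degrees of freedom by the r1 - r0 rule: six sample proportions appear in H0,
# and two unique expected proportions are estimated
df2 <- 6 - 2

c(chisq = chisq2, df = df2,
  p_value = pchisq(chisq2, df = df2, lower.tail = FALSE))
```

Round 1 is not part of either equality, so it contributes nothing to the statistic, just as rounds 1 to 3 did in the first test.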

One Categorical Variable: Multinomial Test - Some Proportions

Module 1

STAT 5350
