We’ll be looking at hair/eye color combos to see if they follow the pattern of dominant vs recessive traits that we would expect.
We’ll be converting hair and eye color to hair_dom and eye_dom.
For hair color, if the student has brown or black hair, they’ll be classified as “Yes”.
For eye color, if the student has brown eyes, they’ll be classified as “Yes”.
For any other hair or eye color, we’ll label the student as “No”. Then we’ll combine the results into one variable with 4 groups, which lets us test whether the dominant and recessive trait pattern for hair and eye color holds.
Let’s form our new variables first:
hair_eye <-
students |>
# Filter out students who replied "other" for either hair or eye color
filter(
Hair != "Other",
Eye != "Other"
) |>
# Using mutate to create new variables
# and if_else() to form the new binary groups
mutate(
hair_dom = if_else(condition = Hair %in% c("Brown", "Black"),
true = "Yes",
false = "No"),
eye_dom = if_else(condition = Eye == "Brown",
true = "Yes",
false = "No"),
hair_eye_dom = paste(hair_dom, eye_dom, sep = "/") |>
factor(levels = c("Yes/Yes", "Yes/No",
"No/Yes", "No/No"))
) |>
dplyr::select(Hair, Eye, hair_dom, eye_dom, hair_eye_dom)
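As a quick sanity check (not part of the original notes, just a sketch using dplyr::count()), we can tabulate the recoded groups against the original colors to confirm the if_else() logic did what we intended:
# Every (Hair, Eye) combination should map to exactly one dominant/recessive group
hair_eye |>
  count(Hair, Eye, hair_dom, eye_dom, hair_eye_dom)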
And create a summary result for the data:
hair_eye_results <-
hair_eye |>
summarize(
.by = hair_eye_dom,
count = n()
) |>
mutate(prop = count/sum(count))
hair_eye_results
## hair_eye_dom count prop
## 1 No/No 59 0.23505976
## 2 Yes/Yes 100 0.39840637
## 3 Yes/No 84 0.33466135
## 4 No/Yes 8 0.03187251
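As an optional visual check (a sketch, assuming ggplot2 is loaded along with the tidyverse), we can plot the observed proportions before testing them:
# Bar chart of the observed proportion in each of the four groups
ggplot(data = hair_eye_results,
       mapping = aes(x = hair_eye_dom, y = prop)) +
  geom_col()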
To test if hair and eye color combos follow the dominant and recessive trait pattern, our null hypothesis is
\[H_0: \pi_{yy} = \frac{9}{16}, \pi_{yn} = \frac{3}{16}, \pi_{ny} = \frac{3}{16}, \pi_{nn} = \frac{1}{16}\]
So how do we go about testing to see if all four stated proportions are correct?
Since we don’t have one proportion (or a difference in proportions), we can’t just calculate a \(z\) test statistic like we did previously.
What we can do is calculate a \(z\) test statistic for each group individually, square them, and combine them to form an overall test statistic!
\[z_i^2 = N\frac{(p_i - \pi_{i,0})^2}{\pi_{i,0}} \\ \chi^2 = \sum_{i=1}^c z_i^2\]
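If you’ve seen Pearson’s statistic written with observed and expected counts, the formula above is the same thing expressed with proportions; substituting \(n_i = Np_i\) and \(E_i = N\pi_{i,0}\) (a restatement, not an extra assumption) gives
\[z_i^2 = N\frac{(p_i - \pi_{i,0})^2}{\pi_{i,0}} = \frac{(Np_i - N\pi_{i,0})^2}{N\pi_{i,0}} = \frac{(n_i - E_i)^2}{E_i}\]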
Let’s take our summary results and add the expected counts as a column, then calculate the \(z_i^2\) for each group:
hair_eye_results2 <-
hair_eye_results |>
# Adding the expected proportions according to the null hypothesis
add_column(expected_prop = c(9/16, 3/16, 3/16, 1/16)) |>
# Calculating the individual zi^2 for each group:
mutate(zi2 = sum(count) * (prop - expected_prop)^2/expected_prop)
hair_eye_results2
## hair_eye_dom count prop expected_prop zi2
## 1 No/No 59 0.23505976 0.5625 47.842657
## 2 Yes/Yes 100 0.39840637 0.1875 59.545900
## 3 Yes/No 84 0.33466135 0.1875 28.990787
## 4 No/Yes 8 0.03187251 0.0625 3.767181
# and then the test stat
(gof_test_stat <- sum(hair_eye_results2$zi2))
## [1] 140.1465
We end up with a test statistic of about 140! Is that a large test statistic? How do we find the p-value?
We can use the \(\chi^2\) distribution to find our p-value, but it requires degrees of freedom to use.
For any \(\chi^2\) test in this class, the degrees of freedom are determined by the number of proportions estimated from the data and the number of proportions that have to be estimated for the null hypothesis. But what does that mean exactly?
In order to perform our test, we needed:
The sample proportions: \(p_i\)
Let’s call the number of sample proportions estimated \(r_1\)
The Null Hypothesis proportions: \(\pi_{i,0}\)
Let’s call the number of null hypothesis proportions estimated \(r_0\)
The degrees of freedom are going to be the difference in the number of estimated proportions for the sample vs the estimated Null Hypothesis proportions: \(r_1 - r_0\)
How many sample proportions did we need to estimate to conduct our test? Well, there are 4 groups and we estimated a proportion for each of them, so it makes sense that we have 4 estimated proportions.
But since proportions have to sum to 1, we only estimated 3 (if we know Yes/Yes, Yes/No, and No/Yes, then we know the proportion of No/No). So we only needed the data to estimate 3 of the 4!
So we have \(r_1 = 4 - 1 = 3\)
And since the null hypothesis told us exactly what the true proportions are, we didn’t need to estimate any of them! \(r_0 = 0\)
Which means our degrees of freedom are \(r_1 - r_0 = (c - 1) - 0 = (4 - 1) - 0 = 3\)
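As a rough yardstick (not part of the original notes), we can ask how large a \(\chi^2\) value with 3 degrees of freedom would need to be before it looks surprising at the 5% level:
# 95th percentile of the chi-squared distribution with df = 3
# qchisq(p = 0.95, df = 3) is about 7.81, so a test statistic near 140 is far out in the tail
qchisq(p = 0.95, df = 3)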
The p-value can be found using the \(\chi^2\) distribution function pchisq() with:
q = the test statistic (about 140)
df = the degrees of freedom (3)
lower = F
gof_pval <- pchisq(q = gof_test_stat, df = 3, lower = F)
gof_pval
## [1] 3.514523e-30
We get a p-value of essentially 0, indicating that we have strong evidence that at least one of the null hypothesis values is wrong!
Note that “at least one value is wrong” \(\ne\) “all of the values are wrong”
The chisq_test() function in the rstatix package will perform a goodness of fit test for us! The arguments are:
x = a vector of counts
p = the expected proportions under the null hypothesis
# Using the summarized counts, x = count
color_gof <-
rstatix::chisq_test(
x = hair_eye_results2$count,
p = hair_eye_results2$expected_prop
)
color_gof
## # A tibble: 1 × 6
## n statistic p df method p.signif
## * <int> <dbl> <dbl> <dbl> <chr> <chr>
## 1 4 140. 3.51e-30 3 Chi-square test ****
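As a cross-check (not in the original notes), base R’s chisq.test() takes the same x and p arguments and should reproduce the same statistic, degrees of freedom, and p-value:
# Same goodness of fit test using base R
chisq.test(x = hair_eye_results2$count,
           p = hair_eye_results2$expected_prop)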
Pearson’s \(\chi^2\) test is likely the version you’ve seen before this class, but there is an alternative test as well.
It works similarly to Pearson’s test: we still need the estimated proportions, \(p_i\), and the expected proportions, \(\pi_{i,0}\), but the test statistic is calculated a little differently:
\[G = 2\sum_{i = 1}^c n_i \ln \left( \frac{p_i}{\pi_{i,0}} \right)\]
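If you’ve seen the G-test written with observed and expected counts, this is the same statistic: since \(p_i/\pi_{i,0} = n_i/E_i\) when \(E_i = N\pi_{i,0}\),
\[G = 2\sum_{i = 1}^c n_i \ln \left( \frac{p_i}{\pi_{i,0}} \right) = 2\sum_{i = 1}^c n_i \ln \left( \frac{n_i}{E_i} \right)\]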
Since we already have the estimated and expected proportions, we can calculate the \(G\) test statistic using hair_eye_results2:
hair_eye_results2 <-
hair_eye_results2 |>
mutate(g_i = count*log(prop/expected_prop))
hair_eye_results2
## hair_eye_dom count prop expected_prop zi2 g_i
## 1 No/No 59 0.23505976 0.5625 47.842657 -51.480530
## 2 Yes/Yes 100 0.39840637 0.1875 59.545900 75.369368
## 3 Yes/No 84 0.33466135 0.1875 28.990787 48.664585
## 4 No/Yes 8 0.03187251 0.0625 3.767181 -5.387381
# and calculating the test statistic
G_color <- 2*sum(hair_eye_results2$g_i)
G_color
## [1] 134.3321
The test statistic will follow the same probability distribution as Pearson’s test stat: \(G \sim \chi^2_{c-1}\)
# Finding the p-value
pchisq(q = G_color, df = 3, lower = F)
## [1] 6.300664e-29
or using the canned function GTest() in the DescTools package:
pacman::p_load(DescTools)
# GTest() has the same arguments as chisq_test()
color_gtest <-
GTest(
x = hair_eye_results2$count,
p = c(9, 3, 3, 1)/16
)
color_gtest
##
## Log likelihood ratio (G-test) goodness of fit test
##
## data: hair_eye_results2$count
## G = 134.33, X-squared df = 3, p-value < 2.2e-16
While the results are not exactly the same, they are fairly similar, and most importantly the conclusion is exactly the same: we have strong evidence that at least one of the stated proportions is incorrect. But which proportion(s)?
If we reject the null hypothesis, we want to find which proportions are incorrect. We follow up by examining each of the standardized residuals:
\[z_i = \sqrt{N}\frac{(p_i - \pi_{i,0})}{\sqrt{\pi_{i,0}}}\]
As long as all the expected counts are at least 5 \((N \pi_{i,0} \ge 5)\), the \(z_i\) will follow a standard Normal distribution.
If there are many proportions (rule of thumb is 6 or more), we only consider \(|z_i| > 3\) to indicate a strong difference.
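We can compute these residuals by hand from hair_eye_results2 using the formula above (a sketch; the expected_count column is added here just to verify the “at least 5” condition, which is comfortably met since the smallest expected count is about 251/16, roughly 15.7):
# Hand-computed standardized residuals and expected counts
hair_eye_results2 |>
  mutate(
    expected_count = sum(count) * expected_prop,
    z_i = sqrt(sum(count)) * (prop - expected_prop) / sqrt(expected_prop)
  ) |>
  dplyr::select(hair_eye_dom, expected_count, z_i)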
Or we can find the residuals fairly quickly using the saved objects from the canned function, chisq_test(), by using pearson_residuals() (also in the rstatix package):
hair_eye_results2 |>
# Adding the residual column to the summary df
mutate(residuals = rstatix::pearson_residuals(color_gof)) |>
# Picking just the relevant columns
dplyr::select(hair_eye_dom, prop, expected_prop, zi2, g_i, residuals)
## hair_eye_dom prop expected_prop zi2 g_i residuals
## 1 No/No 0.23505976 0.5625 47.842657 -51.480530 -6.916839
## 2 Yes/Yes 0.39840637 0.1875 59.545900 75.369368 7.716599
## 3 Yes/No 0.33466135 0.1875 28.990787 48.664585 5.384309
## 4 No/Yes 0.03187251 0.0625 3.767181 -5.387381 -1.940923
We can see there is strong evidence that every group except No/Yes is different from what is expected, with that last category having a z-score close to 2 in magnitude!