We’ll be looking at hair/eye color combos to see if they follow the pattern of dominant vs recessive traits that we would expect.
We’ll be converting hair and eye color to hair_dom and eye_dom.
For hair color, if the student has brown or black hair, they’ll be classified as “Yes”.
For eye color, if the student has brown eyes, they’ll be classified as “Yes”.
For any other hair or eye color, we’ll label the student as “No”. Then we’ll combine the results into one variable with 4 groups, which lets us test whether the dominant and recessive trait pattern for hair and eye color holds.
Let’s form our new variables first:
hair_eye <-
students |>
# Filter out students who replied "other" for either hair or eye color
filter(
Hair != "Other",
Eye != "Other"
) |>
# Using mutate to create new variables
# and if_else() to form the new binary groups
mutate(
hair_dom = if_else(condition = Hair %in% c("Brown", "Black"),
true = "Yes",
false = "No"),
eye_dom = if_else(condition = Eye == "Brown",
true = "Yes",
false = "No"),
hair_eye_dom = paste(hair_dom, eye_dom, sep = "/") |>
factor(levels = c("Yes/Yes", "Yes/No",
"No/Yes", "No/No"))
) |>
dplyr::select(Hair, Eye, hair_dom, eye_dom, hair_eye_dom)
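As a quick sanity check (not part of the original notes, just a sketch using dplyr::count()), we can tabulate the recoded groups against the original colors to confirm the if_else() logic did what we intended:
# Every (Hair, Eye) combination should map to exactly one dominant/recessive group
hair_eye |>
  count(Hair, Eye, hair_dom, eye_dom, hair_eye_dom)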
And create a summary result for the data:
hair_eye_results <-
hair_eye |>
summarize(
.by = hair_eye_dom,
count = n()
) |>
mutate(prop = count/sum(count))
hair_eye_results
## hair_eye_dom count prop
## 1 No/No 59 0.23505976
## 2 Yes/Yes 100 0.39840637
## 3 Yes/No 84 0.33466135
## 4 No/Yes 8 0.03187251
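As an optional visual check (a sketch, assuming ggplot2 is loaded along with the tidyverse), we can plot the observed proportions before testing them:
# Bar chart of the observed proportion in each of the four groups
ggplot(data = hair_eye_results,
       mapping = aes(x = hair_eye_dom, y = prop)) +
  geom_col()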
To test if hair and eye color combos follow the dominant and recessive trait pattern, our null hypothesis is
\[H_0: \pi_{yy} = \frac{9}{16}, \pi_{yn} = \frac{3}{16}, \pi_{ny} = \frac{3}{16}, \pi_{nn} = \frac{1}{16}\]
So how do we go about testing to see if all four stated proportions are correct?
Since we don’t have one proportion (or a difference in proportions), we can’t just calculate a \(z\) test statistic like we did previously.
What we can do is calculate a \(z\) test statistic for each group individually, square them, and combine them to form an overall test statistic!
\[z_i^2 = N\frac{(p_i - \pi_{i,0})^2}{\pi_{i,0}} \\ \chi^2 = \sum_{i=1}^c z_i^2\]
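If you’ve seen Pearson’s statistic written with observed and expected counts, the formula above is the same thing expressed with proportions; substituting \(n_i = Np_i\) and \(E_i = N\pi_{i,0}\) (a restatement, not an extra assumption) gives
\[z_i^2 = N\frac{(p_i - \pi_{i,0})^2}{\pi_{i,0}} = \frac{(Np_i - N\pi_{i,0})^2}{N\pi_{i,0}} = \frac{(n_i - E_i)^2}{E_i}\]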
Let’s take our summary results and add the expected counts as a column, then calculate the \(z_i^2\) for each group:
hair_eye_results2 <-
hair_eye_results |>
# Adding the expected proportions according to the null hypothesis
add_column(expected_prop = c(9/16, 3/16, 3/16, 1/16)) |>
# Calculating the individual zi^2 for each group:
mutate(zi2 = sum(count) * (prop - expected_prop)^2/expected_prop)
hair_eye_results2
## hair_eye_dom count prop expected_prop zi2
## 1 No/No 59 0.23505976 0.5625 47.842657
## 2 Yes/Yes 100 0.39840637 0.1875 59.545900
## 3 Yes/No 84 0.33466135 0.1875 28.990787
## 4 No/Yes 8 0.03187251 0.0625 3.767181
# and then the test stat
(gof_test_stat <- sum(hair_eye_results2$zi2))
## [1] 140.1465
We end up with a test statistic of about 140! Is that a large test statistic? How do we find the p-value?
We can use the \(\chi^2\) distribution to find our p-value, but it requires degrees of freedom to use.
For any \(\chi^2\) test in this class, the degrees of freedom are determined by the number of proportions estimated from the data and the number of proportions that have to be estimated for the null hypothesis. But what does that mean exactly?
In order to perform our test, we needed:
The sample proportions: \(p_i\)
Let’s call the number of sample proportions estimated \(r_1\)
The Null Hypothesis proportions: \(\pi_{i,0}\)
Let’s call the number of null hypothesis proportions estimated \(r_0\)
The degrees of freedom are going to be the difference in the number of estimated proportions for the sample vs the estimated Null Hypothesis proportions: \(r_1 - r_0\)
How many sample proportions did we need to estimate to conduct our test? Well, there are 4 groups and we estimated a proportion for each of them, so it makes sense that we have 4 estimated proportions.
But since proportions have to sum to 1, we only estimated 3 (if we know Yes/Yes, Yes/No, and No/Yes, then we know the proportion of No/No). So we only needed the data to estimate 3 of the 4!
So we have \(r_1 = 4 - 1 = 3\)
And since the null hypothesis told us exactly what the true proportions are, we didn’t need to estimate any of them! \(r_0 = 0\)
Which means our degrees of freedom are \(r_1 - r_0 = (c - 1) - 0 = (4 - 1) - 0 = 3\)
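As a rough yardstick (not part of the original notes), we can ask how large a \(\chi^2\) value with 3 degrees of freedom would need to be before it looks surprising at the 5% level:
# 95th percentile of the chi-squared distribution with df = 3
# qchisq(p = 0.95, df = 3) is about 7.81, so a test statistic near 140 is far out in the tail
qchisq(p = 0.95, df = 3)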
The p-value can be found using the \(\chi^2\) distribution function pchisq() with:
q = the test statistic (about 140)
df = the degrees of freedom (3)
lower = F
gof_pval <- pchisq(q = gof_test_stat, df = 3, lower = F)
gof_pval
## [1] 3.514523e-30
We get a p-value of essentially 0, indicating that we have strong evidence that at least one of the null hypothesis values is wrong!
Note that “at least one value is wrong” \(\ne\) “all of the values are wrong”
The chisq_test() function in the rstatix package will perform a goodness of fit test for us! The arguments are:
x = a vector of counts
p = the expected proportions under the null hypothesis
# Using the summarized counts, x = count
color_gof <-
rstatix::chisq_test(
x = hair_eye_results2$count,
p = hair_eye_results2$expected_prop
)
color_gof
## # A tibble: 1 × 6
## n statistic p df method p.signif
## * <int> <dbl> <dbl> <dbl> <chr> <chr>
## 1 4 140. 3.51e-30 3 Chi-square test ****
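As a cross-check (not in the original notes), base R’s chisq.test() takes the same x and p arguments and should reproduce the same statistic, degrees of freedom, and p-value:
# Same goodness of fit test using base R
chisq.test(x = hair_eye_results2$count,
           p = hair_eye_results2$expected_prop)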
Pearson’s \(\chi^2\) test is likely the version you’ve seen before this class, but there is an alternative test as well.
It works similarly to Pearson’s test: we still need the estimated proportions, \(p_i\), and the expected proportions, \(\pi_{i,0}\), but the test statistic is calculated a little differently:
\[G = 2\sum_{i = 1}^c n_i \ln \left( \frac{p_i}{\pi_{i,0}} \right)\]
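If you’ve seen the G-test written with observed and expected counts, this is the same statistic: since \(p_i/\pi_{i,0} = n_i/E_i\) when \(E_i = N\pi_{i,0}\),
\[G = 2\sum_{i = 1}^c n_i \ln \left( \frac{p_i}{\pi_{i,0}} \right) = 2\sum_{i = 1}^c n_i \ln \left( \frac{n_i}{E_i} \right)\]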
Since we already have the estimated and expected proportions, we can calculate the \(G\) test statistic using hair_eye_results2:
hair_eye_results2 <-
hair_eye_results2 |>
mutate(g_i = count*log(prop/expected_prop))
hair_eye_results2
## hair_eye_dom count prop expected_prop zi2 g_i
## 1 No/No 59 0.23505976 0.5625 47.842657 -51.480530
## 2 Yes/Yes 100 0.39840637 0.1875 59.545900 75.369368
## 3 Yes/No 84 0.33466135 0.1875 28.990787 48.664585
## 4 No/Yes 8 0.03187251 0.0625 3.767181 -5.387381
# and calculating the test statistic
G_color <- 2*sum(hair_eye_results2$g_i)
G_color
## [1] 134.3321
The test statistic will follow the same probability distribution as Pearson’s test stat: \(G \sim \chi^2_{c-1}\)
# Finding the p-value
pchisq(q = G_color, df = 3, lower = F)
## [1] 6.300664e-29
or using the canned function GTest() in the DescTools package:
pacman::p_load(DescTools)
# GTest() has the same arguments as chisq_test()
color_gtest <-
GTest(
x = hair_eye_results2$count,
p = c(9, 3, 3, 1)/16
)
color_gtest
##
## Log likelihood ratio (G-test) goodness of fit test
##
## data: hair_eye_results2$count
## G = 134.33, X-squared df = 3, p-value < 2.2e-16
While the results are not exactly the same, they are fairly similar, and most importantly the conclusion is exactly the same: we have strong evidence that at least one of the stated proportions is incorrect. But which proportion(s)?
If we reject the null hypothesis, we want to find which proportions are incorrect. We follow up by examining each of the standardized residuals:
\[z_i = \sqrt{N}\frac{(p_i - \pi_{i,0})}{\sqrt{\pi_{i,0}}}\]
As long as all the expected counts are at least 5 \((N \pi_{i,0} \ge 5)\), the \(z_i\) will follow a standard Normal distribution.
If there are many proportions (rule of thumb is 6 or more), we only consider \(|z_i| > 3\) to indicate a strong difference.
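We can compute these residuals by hand from hair_eye_results2 using the formula above (a sketch; the expected_count column is added here just to verify the “at least 5” condition, which is comfortably met since the smallest expected count is about 251/16, roughly 15.7):
# Hand-computed standardized residuals and expected counts
hair_eye_results2 |>
  mutate(
    expected_count = sum(count) * expected_prop,
    z_i = sqrt(sum(count)) * (prop - expected_prop) / sqrt(expected_prop)
  ) |>
  dplyr::select(hair_eye_dom, expected_count, z_i)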
Or we can find the residuals fairly quickly using the saved objects from the canned function, chisq_test(), by using pearson_residuals() (also in the rstatix package):
hair_eye_results2 |>
# Adding the residual column to the summary df
mutate(residuals = rstatix::pearson_residuals(color_gof)) |>
# Picking just the relevant columns
dplyr::select(hair_eye_dom, prop, expected_prop, zi2, g_i, residuals)
## hair_eye_dom prop expected_prop zi2 g_i residuals
## 1 No/No 0.23505976 0.5625 47.842657 -51.480530 -6.916839
## 2 Yes/Yes 0.39840637 0.1875 59.545900 75.369368 7.716599
## 3 Yes/No 0.33466135 0.1875 28.990787 48.664585 5.384309
## 4 No/Yes 0.03187251 0.0625 3.767181 -5.387381 -1.940923
We can see there is strong evidence that every group except No/Yes is different from what is expected, with that last category having a z-score close to 2 in magnitude!