Introduction

Welcome! This R Markdown document serves as a demo session focusing on chi-square tests. Specifically, we will cover two topics:

  • Chi-Square Test of Goodness of Fit
  • Chi-Square Test of Independence

Chi-Square Test of Goodness of Fit

The Chi-Square Test of Goodness of Fit is a statistical test used to determine whether the observed frequencies of categorical data match the expected frequencies. It assesses whether a given distribution of data fits a specific theoretical model, allowing researchers to either reject or fail to reject the null hypothesis that the observed frequencies follow the expected distribution.

Let’s start by creating a simulated dataset for the purpose of this demonstration.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Create a sample dataset for Chi-Square Test of Goodness of Fit
set.seed(123)
categories <- c("Red", "Green", "Blue")
observed_counts <- sample(50:150, 3, replace = TRUE)
observed_counts
## [1]  80 128 100
# Create a data frame
data_goodness_of_fit <- data.frame(Category = categories, Observed = observed_counts)

# Calculate percentages
total_count <- sum(data_goodness_of_fit$Observed)
data_goodness_of_fit$Percentage <- (data_goodness_of_fit$Observed / total_count) * 100

# Compute positions for percentage labels
data_goodness_of_fit <- data_goodness_of_fit %>%
  arrange(desc(Category)) %>%
  mutate(LabelPosition = cumsum(Observed) - 0.5 * Observed)

data_goodness_of_fit
##   Category Observed Percentage LabelPosition
## 1      Red       80   25.97403            40
## 2    Green      128   41.55844           144
## 3     Blue      100   32.46753           258

The research question that can be addressed with the Chi-Square Test of Goodness of Fit for this dataset is: “Do the observed frequencies of colors (Red, Green, Blue) differ from an equal distribution across the categories?”

Data Visualization

We will use ggplot2 to create a stacked bar graph to visualize the dataset.

library(ggplot2)

# Create a stacked bar graph with custom colors and percentage annotations
ggplot(data_goodness_of_fit, 
       aes(x = "Categories", y = Observed, fill = Category)) + # Base layer specifying data and aesthetics
  geom_bar(stat = "identity") + # Add bars using the provided data without aggregation
  scale_fill_manual(values = c("Red" = "#E63946", "Green" = "#2A9D8F", "Blue" = "#264653")) + # Manually set colors for each category
  geom_text(aes(y = LabelPosition, # Specify the position of the text labels
                label = paste0(round(Percentage, 1), "%")),  # Add text labels for percentages
            color = "white", 
            fontface = "bold") +
  labs(title = "Stacked Bar Graph of Observed Counts with Percentages", # Set the title of the graph
       x = "",  # Remove x-axis label
       y = "Count")  # Set the y-axis label

Assumptions Check

Before conducting the test, we should check the following assumptions:

  • Observations are independent.
  • Random sampling.
  • Sample size is sufficiently large (usually at least 5 counts in each category).

In our simulated data, these assumptions are met.
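
As a quick check (a minimal sketch reusing the objects created above), we can compute the expected count for each category under the null hypothesis of an equal distribution and confirm that every expected count is at least 5:

# Expected counts under the null hypothesis of an equal distribution:
# the total count divided evenly across the categories
expected_counts <- rep(total_count / length(categories), length(categories))
expected_counts  # 308 / 3, so roughly 102.7 per category

# The large-sample assumption requires each expected count to be at least 5
all(expected_counts >= 5)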

Conducting the Test

We will use the chisq.test() function in R to conduct the test.

# Conducting the chi-square test of goodness of fit
chi_sq_result <- chisq.test(data_goodness_of_fit$Observed)

# Display results
chi_sq_result
## 
##  Chi-squared test for given probabilities
## 
## data:  data_goodness_of_fit$Observed
## X-squared = 11.325, df = 2, p-value = 0.003474

Results Interpretation

The p-value will tell us if our observed distribution significantly differs from an expected uniform distribution.

Based on the test output provided:

  • X-squared = 11.325: This is the test statistic, which quantifies the difference between observed and expected counts.
  • df = 2: This indicates that there are 2 degrees of freedom. The degrees of freedom for a goodness of fit test is calculated as the number of categories minus 1. In our case, 3 categories (Red, Green, Blue) minus 1 gives us 2 degrees of freedom.
  • p-value = 0.003474385: The p-value tells us the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from our sample, assuming that the null hypothesis is true. A low p-value suggests that the observed counts significantly differ from the expected counts.
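
To see where this p-value comes from, here is a small sketch that reproduces it from the stored test statistic and degrees of freedom using the upper tail of the chi-square distribution:

# Upper-tail probability of the chi-square distribution with df = 2,
# evaluated at the observed test statistic (reproduces p = 0.003474)
pchisq(chi_sq_result$statistic, df = chi_sq_result$parameter, lower.tail = FALSE)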

Effect Size

We can calculate the effect size using Cramér's V.

The formula used here is:

\[ V = \sqrt{\frac{\chi^2}{N \times \min(k - 1,\ 1)}} \]

where N is the total number of observations and k is the number of categories; for a one-way (goodness-of-fit) table the denominator therefore reduces to N.

# Calculating Cramér's V
k <- length(data_goodness_of_fit$Category)  # number of categories
N <- sum(data_goodness_of_fit$Observed)     # total number of observations
cramer_v <- sqrt(chi_sq_result$statistic / (N * min(k - 1, 1)))

# Display Cramér's V
paste("Cramér's V: ", round(cramer_v, 2))
## [1] "Cramér's V:  0.19"

Interpretation Guidelines:

Df   Small   Medium   Large
1    0.10    0.30     0.50
2    0.07    0.21     0.35
3    0.06    0.17     0.29
4    0.05    0.15     0.25
5    0.04    0.13     0.22

The Cramér’s V value of 0.19 suggests a small to medium effect size for the observed differences in the frequencies of the color categories (Red, Green, and Blue). This indicates that there’s a weak to moderate deviation from what would be expected if the colors were distributed equally.

Chi-Square Test of Independence

The Chi-square Test of Independence is used to determine if there is a significant association between two categorical variables. Specifically, it tests whether the observed frequencies for categories are different from what we would expect under the assumption of independence.
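
To make “expected under independence” concrete, here is a minimal sketch using a small hypothetical 2x2 table (the counts are invented purely for illustration): the expected count for each cell is the product of its row total and column total divided by the grand total.

# Hypothetical 2x2 table of counts (rows = groups, columns = outcomes)
m <- matrix(c(30, 10,
              20, 40), nrow = 2, byrow = TRUE)

# Expected counts under independence: (row total x column total) / grand total
expected <- outer(rowSums(m), colSums(m)) / sum(m)
expected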

Context of the Study

In our current study, we aim to examine the relationship between the method of reward used to train cats and the outcome of whether the cat can dance. The dataset, catsData, provides information on two variables:

  • Training: This represents the type of reward used to train the cat, and it can take one of two values - affection or food.
  • Dance: This denotes whether the cat can dance (yes) or not (no).

The dataset can be structured in two ways:

  • The first format consists of two columns, Training and Dance, where each row represents an individual cat’s data.
  • The second format presents the data in the form of a contingency table, catsTable.
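
The two formats are interchangeable. As a minimal sketch (using a small made-up set of counts, since catsData itself is loaded in the next section), a table of counts can be expanded to one row per cat with tidyr::uncount(), and the raw rows can be collapsed back into a contingency table with table():

library(tidyr)

# Small hypothetical counts in long form
counts <- data.frame(Training = c("Affection", "Affection", "Food", "Food"),
                     Dance    = c("No", "Yes", "No", "Yes"),
                     n        = c(3, 1, 1, 2))

raw <- uncount(counts, n)        # expand to one row per observation
table(raw$Training, raw$Dance)   # collapse back into a contingency table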

Data Exploration

Let’s begin by exploring our dataset visually.

library(ggplot2)
library(scales) 
library(dplyr)

# Provided dataset
catsData <- read.delim("cats.dat", header = TRUE)

head(catsData)
##         Training Dance
## 1 Food as Reward   Yes
## 2 Food as Reward   Yes
## 3 Food as Reward   Yes
## 4 Food as Reward   Yes
## 5 Food as Reward   Yes
## 6 Food as Reward   Yes
# Prepare the data for a stacked bar graph
plot_data <- catsData %>%  
  count(Training, Dance) %>%  
  group_by(Training) %>%  
  mutate(percent = n/sum(n))  # Calculate percentage within each Training group

# Generate labels for the bars
plot_data$label <- paste(as.character(plot_data$n), 
                         '\n(', 
                         as.character(percent(plot_data$percent)),')', 
                         sep = '')  # Create labels with counts and percentages

# Display the data
plot_data  # Output the processed data
## # A tibble: 4 × 5
## # Groups:   Training [2]
##   Training            Dance     n percent label         
##   <chr>               <chr> <int>   <dbl> <chr>         
## 1 Affection as Reward No      114   0.704 "114\n(70.4%)"
## 2 Affection as Reward Yes      48   0.296 "48\n(29.6%)" 
## 3 Food as Reward      No       10   0.263 "10\n(26.3%)" 
## 4 Food as Reward      Yes      28   0.737 "28\n(73.7%)"
# Create the stacked bar graph
ggplot(plot_data, aes(x = Training, y = percent, fill = Dance)) + 
  geom_col(position = "fill", width = 0.5) +  # Create bars with fill position and specified width
  geom_text(aes(label=label), position = "fill",color = 'black', vjust=2) +  # Add text labels to bars
  scale_y_continuous(labels = percent) +  # Format the y-axis labels as percentages
  scale_x_discrete(name = 'Types of training') +  # Label the x-axis
  scale_fill_brewer(palette="Pastel1", name = 'Dance')  # Choose a color palette and label the legend

For cats trained with Affection, only about 29.6% were observed to dance, while the majority (70.4%) did not. In contrast, when Food was used as a reward, a substantial 73.7% of cats danced, with only 26.3% refraining.

The graph suggests that cats are more inclined to dance when offered food as a reward compared to affection.

Assumptions for Chi-square Test of Independence:

  1. Assumption of independence: The observations should be independent of each other.
  2. Expected frequency > 5: Each cell of the contingency table should have an expected frequency of 5 or more.
  3. Random sampling.

In our data, these assumptions are met.
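
Assumption 2 can be verified directly: chisq.test() stores the expected frequencies it computes, so we can inspect them for the Training-by-Dance table (a quick sketch using the data loaded above):

# Expected cell frequencies under independence for the Training x Dance table;
# the smallest is 14.44, comfortably above the threshold of 5
chisq.test(table(catsData$Training, catsData$Dance))$expected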

Conducting the Test

To analyze the relationship between the type of training and a cat’s ability to dance, we’ll employ the CrossTable() function from the gmodels package. This function has several arguments:

  • The first argument(s) can either be the raw data columns (catsData$Training, catsData$Dance) or a contingency table (catsTable).
  • fisher: If set to TRUE, it computes Fisher’s exact test, which is useful when the sample size is small.
  • chisq: If set to TRUE, it calculates the chi-squared test.
  • expected: If set to TRUE, it displays the expected frequencies.
  • sresid: If set to TRUE, it shows the standardized residuals.
  • format: This determines the output format. For our purpose, we’ve set it to “SPSS” for a clean presentation.

Let’s start by entering our data:

# Entering data: the contingency table
table(catsData$Dance,catsData$Training)
##      
##       Affection as Reward Food as Reward
##   No                  114             10
##   Yes                  48             28
food <- c(10, 28)
affection <- c(114, 48)
catsTable <- cbind(food, affection)
catsTable
##      food affection
## [1,]   10       114
## [2,]   28        48

Now, let’s proceed with our analysis:

library(gmodels)

# Using the raw data:
gmodels::CrossTable(catsData$Training, 
                    catsData$Dance, 
                    fisher = TRUE, 
                    chisq = TRUE, 
                    expected = TRUE, 
                    sresid = TRUE, 
                    format = "SPSS")
## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |         Expected Values |
## | Chi-square contribution |
## |             Row Percent |
## |          Column Percent |
## |           Total Percent |
## |            Std Residual |
## |-------------------------|
## 
## Total Observations in Table:  200 
## 
##                     | catsData$Dance 
##   catsData$Training |       No  |      Yes  | Row Total | 
## --------------------|-----------|-----------|-----------|
## Affection as Reward |      114  |       48  |      162  | 
##                     |  100.440  |   61.560  |           | 
##                     |    1.831  |    2.987  |           | 
##                     |   70.370% |   29.630% |   81.000% | 
##                     |   91.935% |   63.158% |           | 
##                     |   57.000% |   24.000% |           | 
##                     |    1.353  |   -1.728  |           | 
## --------------------|-----------|-----------|-----------|
##      Food as Reward |       10  |       28  |       38  | 
##                     |   23.560  |   14.440  |           | 
##                     |    7.804  |   12.734  |           | 
##                     |   26.316% |   73.684% |   19.000% | 
##                     |    8.065% |   36.842% |           | 
##                     |    5.000% |   14.000% |           | 
##                     |   -2.794  |    3.568  |           | 
## --------------------|-----------|-----------|-----------|
##        Column Total |      124  |       76  |      200  | 
##                     |   62.000% |   38.000% |           | 
## --------------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  25.35569     d.f. =  1     p =  4.767434e-07 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  23.52028     d.f. =  1     p =  1.236041e-06 
## 
##  
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Sample estimate odds ratio:  6.579265 
## 
## Alternative hypothesis: true odds ratio is not equal to 1
## p =  1.311709e-06 
## 95% confidence interval:  2.837773 16.42969 
## 
## Alternative hypothesis: true odds ratio is less than 1
## p =  0.9999999 
## 95% confidence interval:  0 14.25436 
## 
## Alternative hypothesis: true odds ratio is greater than 1
## p =  7.7122e-07 
## 95% confidence interval:  3.193221 Inf 
## 
## 
##  
##        Minimum expected frequency: 14.44
# Using the contingency table:
gmodels::CrossTable(catsTable, 
                    fisher = TRUE, 
                    chisq = TRUE, 
                    expected = TRUE, 
                    sresid = TRUE, 
                    format = "SPSS")
## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |         Expected Values |
## | Chi-square contribution |
## |             Row Percent |
## |          Column Percent |
## |           Total Percent |
## |            Std Residual |
## |-------------------------|
## 
## Total Observations in Table:  200 
## 
##              |  
##              |      food  | affection  | Row Total | 
## -------------|-----------|-----------|-----------|
##         [1,] |       10  |      114  |      124  | 
##              |   23.560  |  100.440  |           | 
##              |    7.804  |    1.831  |           | 
##              |    8.065% |   91.935% |   62.000% | 
##              |   26.316% |   70.370% |           | 
##              |    5.000% |   57.000% |           | 
##              |   -2.794  |    1.353  |           | 
## -------------|-----------|-----------|-----------|
##         [2,] |       28  |       48  |       76  | 
##              |   14.440  |   61.560  |           | 
##              |   12.734  |    2.987  |           | 
##              |   36.842% |   63.158% |   38.000% | 
##              |   73.684% |   29.630% |           | 
##              |   14.000% |   24.000% |           | 
##              |    3.568  |   -1.728  |           | 
## -------------|-----------|-----------|-----------|
## Column Total |       38  |      162  |      200  | 
##              |   19.000% |   81.000% |           | 
## -------------|-----------|-----------|-----------|
## 
##  
## Statistics for All Table Factors
## 
## 
## Pearson's Chi-squared test 
## ------------------------------------------------------------
## Chi^2 =  25.35569     d.f. =  1     p =  4.767434e-07 
## 
## Pearson's Chi-squared test with Yates' continuity correction 
## ------------------------------------------------------------
## Chi^2 =  23.52028     d.f. =  1     p =  1.236041e-06 
## 
##  
## Fisher's Exact Test for Count Data
## ------------------------------------------------------------
## Sample estimate odds ratio:  0.1519927 
## 
## Alternative hypothesis: true odds ratio is not equal to 1
## p =  1.311709e-06 
## 95% confidence interval:  0.06086544 0.352389 
## 
## Alternative hypothesis: true odds ratio is less than 1
## p =  7.7122e-07 
## 95% confidence interval:  0 0.3131634 
## 
## Alternative hypothesis: true odds ratio is greater than 1
## p =  0.9999999 
## 95% confidence interval:  0.07015399 Inf 
## 
## 
##  
##        Minimum expected frequency: 14.44

Results Interpretation

CrossTab Data:

The main body of the table gives us the observed frequencies (counts) for each combination of the factors.

  • For example, 114 cats trained with affection did not dance, while 10 cats trained with food did not dance.
  • The percentages (Row, Column, and Total Percent) are computed from these counts. For instance, the 29.630% shown for “Affection as Reward” and “Yes” is a row percentage: the proportion of cats that danced out of all cats trained with affection.
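
These percentages can be reproduced directly from the raw data; for example, prop.table() with margin = 1 returns the proportions within each Training group (a small sketch):

# Row proportions (within each Training group), expressed as percentages;
# e.g. 29.63% of affection-trained cats danced
round(prop.table(table(catsData$Training, catsData$Dance), margin = 1) * 100, 3)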

Chi-squared Tests:

Pearson’s Chi-squared test:

  • The value of the test statistic is 25.35569 and is highly significant (the p-value of 4.76 x 10^-7 is well below the 0.05 significance threshold); a sketch reproducing this value follows this list. This indicates that the type of reward used is associated with the outcome of the training.
  • It’s commonly used for tables larger than 2x2 (though it can be applied to 2x2 tables as well) and when sample sizes are reasonably large.
  • Expected frequencies for each cell of the table should be 5 or greater for the chi-squared test to be valid.
  • Assumes that observations are independent of each other.
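
The same statistic can be reproduced outside of CrossTable() with chisq.test(), setting correct = FALSE to obtain the uncorrected Pearson value (a minimal sketch):

# Pearson's chi-squared test without the continuity correction;
# reproduces the Chi^2 of about 25.36 reported above
chisq.test(catsTable, correct = FALSE)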

Yates’ continuity correction:

  • Its significance value (p = 1.236041e-06) also supports the strong association between reward type and training outcome.
  • This is an adjusted version of Pearson’s chi-squared test for 2x2 tables, designed to correct the test’s tendency to overstate significance.
  • The correction subtracts 0.5 from the absolute difference between each observed value and its expected value before squaring (illustrated in the sketch after this list). This makes the chi-squared distribution a better approximation of the distribution of the test statistic, especially for small samples.
  • Used when the sample size is small.
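
To see what the correction does, here is a small sketch that applies the adjusted formula by hand, using the observed and expected counts shown in the table above; the result matches the corrected value of about 23.52:

observed <- c(10, 114, 28, 48)              # observed counts from catsTable
expected <- c(23.56, 100.44, 14.44, 61.56)  # expected counts under independence

# Yates' correction: subtract 0.5 from each absolute deviation before squaring
sum((abs(observed - expected) - 0.5)^2 / expected)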

Fisher’s Exact Test:

  • It’s used primarily for 2x2 tables, especially when the sample size is small, but it can be extended to larger tables (though computations can be intensive).
  • It’s particularly useful when expected frequencies in any cell of the table are below 5. It’s also suitable for very small sample sizes.
  • The sample odds ratio (OR) is estimated at 6.579265, suggesting that cats are about 6.6 times more likely to dance when trained with food than when trained with affection. Note that the second analysis, run on catsTable, reports the reciprocal value (0.1519927): cats trained with affection are only about 0.15 times as likely to dance as cats trained with food. Both values convey the same information from different reference points; the interpretation depends on which category is treated as the reference. With food as the reference, an OR greater than 1 indicates a higher likelihood of dancing; with affection as the reference, an OR less than 1 (like 0.1519927) indicates a lower likelihood. A hand calculation of the odds ratio is sketched after this list.
  • The p-value (1.31 x 10^-6) again indicates a highly significant association.
  • The 95% confidence interval for the odds ratio ranges from 2.837773 to 16.42969, meaning we’re 95% confident that the true odds ratio lies in this range.
  • When interpreting the alternative hypotheses: The first hypothesis tests if the true odds ratio is different from 1 (indicating an association). Its very small p-value suggests that the odds ratio is indeed not equal to 1. The second hypothesis tests if the odds ratio is less than 1. The high p-value (near 1) suggests that this is not the case. The third tests if the odds ratio is greater than 1. The small p-value (7.71 x 10^-7) supports this, in line with our original findings.
  • Minimum expected frequency: This is 14.44, and it’s an important measure for the chi-squared test, ensuring the validity of the test. Generally, all expected counts should be 5 or greater for the chi-squared test to be valid. Here, our minimum is 14.44, which is greater than 5, suggesting our test is reliable.
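
As a rough check on the odds ratio, the sketch below computes the simple cross-product estimate from the observed counts. Note that fisher.test() (and hence CrossTable()) reports a conditional maximum-likelihood estimate, so this value (about 6.65) differs slightly from the 6.58 above, but the interpretation is the same:

# Odds of dancing under each type of reward
odds_food      <- 28 / 10    # danced / did not dance, food as reward
odds_affection <- 48 / 114   # danced / did not dance, affection as reward

odds_food / odds_affection        # about 6.65: food-trained cats are far more likely to dance
1 / (odds_food / odds_affection)  # about 0.15: the same association from the other reference point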

Effect Size and Its Interpretation

The effect size helps us understand the practical significance of our results. There are two measures of effect size that we will calculate:

  • Odds Ratio: This measure is appropriate for 2x2 tables. It gives us the odds of an event occurring in one group relative to the odds of it occurring in another group. To compute the odds ratio, simply set fisher = TRUE in CrossTable().
  • Cramér’s V: Useful when the number of rows and/or columns exceeds 2. It provides a measure of association between two categorical variables, ranging from 0 (no association) to 1 (perfect association).
library(rcompanion)
## Registered S3 method overwritten by 'DescTools':
##   method         from 
##   reorder.factor gdata
# The new dataset
Species = c(rep("Species1", 16), rep("Species2", 16))
Color   = c(rep(c("blue", "blue", "blue", "green"),4),
            rep(c("green", "green", "green", "blue"),4))
d <- as.data.frame(cbind(Species, Color))

head(d)
##    Species Color
## 1 Species1  blue
## 2 Species1  blue
## 3 Species1  blue
## 4 Species1 green
## 5 Species1  blue
## 6 Species1  blue
# Compute Cramer's V
cramerV(matrix(table(d$Species, d$Color), ncol = 2), ci = T)
##   Cramer.V lower.ci upper.ci
## 1      0.5   0.1909   0.7893

Interpretation Guidelines:

Df   Small   Medium   Large
1    0.10    0.30     0.50
2    0.07    0.21     0.35
3    0.06    0.17     0.29
4    0.05    0.15     0.25
5    0.04    0.13     0.22

The Cramér’s V value of 0.5 in this example suggests a large effect size, indicating a strong association between the two categorical variables in the simulated dataset (Species and Color). Turning back to the cats data, the chi-square results and the odds ratio together show a statistically significant and strong association between the type of reward given to cats and their likelihood to dance: the method of rewarding cats (with food or affection) has a substantial influence on their behavior in this context, and rewarding cats with food is more effective in training them to dance.
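
For completeness, Cramér’s V can also be computed for the cats table itself; for a 2x2 table (df = 1) it reduces to the square root of chi-squared over N. This sketch uses the uncorrected Pearson statistic reported above:

# Cramér's V for the 2x2 cats table: sqrt(chi^2 / (N * min(r - 1, c - 1))) = sqrt(chi^2 / N)
chi2 <- chisq.test(catsTable, correct = FALSE)$statistic
sqrt(chi2 / sum(catsTable))  # roughly 0.36, between the medium and large benchmarks for df = 1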