The chi-square test of independence is a procedure for testing whether two categorical variables are related in a population. The null hypothesis is that there is no relationship between the two variables. To examine the relationship, the counts are arranged in a contingency table, in which the categories of the explanatory variable are placed in the columns and the categories of the response variable are placed in the rows (the result of the test is the same if rows and columns are swapped). Each combination of a row and a column is a cell. The row totals are found along the right side and the column totals along the bottom.

A chi-square test of independence is appropriate when the expected frequency in each cell of the contingency table is at least 5.

Background Theory and Example

Chi-square tests of independence analyze a contingency table that looks like the table below:

        No_Severe_Reaction  Severe_Reaction
Thigh                 4758               30
Arm                   8840               76

The above table is from a study by Jackson et al. (2013) that investigated the reactions of children aged 3 to 6 years to the DTaP vaccine (diphtheria, tetanus, and pertussis) injected in either the thigh or the arm, to determine the better location for the vaccine. Both of their variables were categorical: location (arm or thigh) and reaction (severe or not). The proportion of severe reactions was higher in the arm than in the thigh (0.85% vs. 0.63%). A chi-square test of independence was used to examine whether the location of the vaccine was related to the severity of the reaction.

To calculate a chi-square value, expected values are estimated from the observed values. For the vaccination example, there are 13704 total children (4758+8840+30+76), and 106 of them had severe reactions (30+76). The null hypothesis is therefore that 106/13704 = 0.7735% of the children given injections in the thigh would have severe reactions, and 0.7735% of children given injections in the arm would also have severe reactions. There are 4788 children given injections in the thigh (4758+30), so you expect 0.007735 × 4788 = 37.0 of the thigh children to have severe reactions if the null hypothesis is true. You could do the same kind of calculation for each of the cells in this 2×2 table of numbers. The observed values are compared to the expected values using the chi-square statistic, which sums (observed - expected)² / expected over all cells.
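The expected-count arithmetic above can be sketched in R. This is only a minimal illustration, assuming the counts from the table above; the object names (`observed`, `expected`, `chi_sq`) are made up for this example. Note that the statistic computed here is the uncorrected Pearson chi-square, without the Yates continuity correction that `chisq.test()` can apply to 2×2 tables, so it will not match a corrected test exactly.

```r
# Observed counts from the table above
# (rows: injection site; columns: reaction category)
observed <- matrix(c(4758, 30,
                     8840, 76),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Thigh", "Arm"),
                                   c("No.severe", "Severe")))

# Expected count for each cell = (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)

# Uncorrected Pearson chi-square: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq
```

The Thigh/Severe cell of `expected` should come out at about 37.0, matching the hand calculation.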

Loading the Data into RStudio

To keep your RStudio script file organized, include some information at the top of the file. Use the hash symbol (#) to add comments, which are lines that R ignores when it runs the script.

For each test, you should include the following lines at the top of each RStudio file:

#question:
#response variable:
#explanatory variable:
#test name:

Below is what it would look like for the vaccine example:

#question: is there a relationship between location of vaccine and severity of reaction?
#response variable: severity of reaction (categorical)
#explanatory variable: location of vaccine (categorical)
#test name: chi-square test of independence

You can directly enter your data into RStudio to run a chi-square test of independence. Below is the code to enter the vaccine data from above.

#code to enter the data: each row holds the not-severe and severe counts
R1 = c(4758, 30)
R2 = c(8840, 76)

rows   = 2

Matriz = matrix(c(R1, R2), nrow=rows, byrow=TRUE)

#code to name the rows and columns
rownames(Matriz) = c("Thigh", "Arm")
colnames(Matriz) = c("No.severe", "Severe")

Matriz
##       No.severe Severe
## Thigh      4758     30
## Arm        8840     76

The contingency table here is the same as the one above, so you know you entered your data correctly. Be careful to enter the cell counts and not the row totals (4788 and 8916).

Running the Chi-square Test of Independence

To run the chi-square test of independence, use the code below. Setting correct=TRUE applies Yates' continuity correction, which R uses only for 2×2 tables.

#code to run the chi-square test of independence on the vaccine data.
chisq.test(Matriz, correct=TRUE)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  Matriz
## X-squared = 1.7862, df = 1, p-value = 0.1814

The output shows that the chi-square test statistic (X-squared) was 1.79, the degrees of freedom (df) was 1, and the p-value was 0.181. The df are calculated as (# rows - 1) × (# columns - 1), so in this case it was (2 - 1) × (2 - 1), which is 1. Because the p-value was > 0.05, you cannot conclude that children aged 3-6 given DTaP vaccinations in the thigh have fewer severe reactions than those given injections in the arm.
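Two pieces of the reasoning above can be checked directly in R: the assumption that every expected count is at least 5, and the degrees-of-freedom calculation. The sketch below assumes the counts from the contingency table in the background section; the object names (`vaccine`, `result`, `df_manual`, `p_manual`) are illustrative only.

```r
# Counts from the contingency table in the background section
vaccine <- matrix(c(4758, 30,
                    8840, 76),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Thigh", "Arm"),
                                  c("No.severe", "Severe")))

# Store the test result so its components can be inspected
result <- chisq.test(vaccine, correct = TRUE)

# Expected counts under the null hypothesis; each should be at least 5
result$expected
all(result$expected >= 5)

# Degrees of freedom: (number of rows - 1) x (number of columns - 1)
df_manual <- (nrow(vaccine) - 1) * (ncol(vaccine) - 1)

# Upper-tail p-value of the chi-square distribution for the test statistic
p_manual <- pchisq(unname(result$statistic), df = df_manual,
                   lower.tail = FALSE)
p_manual
```

Storing the result in an object, rather than only printing it, makes the expected counts, the statistic, and the p-value available for checks like these.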

Presenting your Results

Results Statements

The results section of your paper should begin with a narrative of your results statements. These statements should be quantitative in nature and include

  1. the statistical significance and
  2. the biological significance.

  1. Statistical Significance: Your first sentence should state the biological result of the statistical test (in this case, whether the location of the vaccine shot was related to the severity of the reaction). Include the statistical details in parentheses at the end of the sentence only. You should never write “The p-value was…” or “The chi-square was…, which means…”. Write about the biological result, not about the statistics; the statistics in parentheses show whether the result was statistically significant.

  2. Biological Significance: For the quantitative statements, you can include quantitative comparisons of the counts or percentages (e.g., one is X% higher than the other).

The first results statement that includes the statistical results in parentheses should also reference the figure you are referring to. Remember to always

  - refer to figures in the order in which they appear, and
  - include your results narrative before the figure.
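The quantitative comparison for the biological-significance statement can be computed in R rather than by hand. This is a minimal sketch, assuming the counts from the contingency table in the background section; the names `vaccine`, `row_props`, and `rate_ratio` are illustrative only.

```r
# Counts from the contingency table in the background section
vaccine <- matrix(c(4758, 30,
                    8840, 76),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Thigh", "Arm"),
                                  c("No.severe", "Severe")))

# Proportion of children at each injection site in each reaction category
# (margin = 1 makes each row sum to 1)
row_props <- prop.table(vaccine, margin = 1)
round(100 * row_props, 2)

# How many times higher the severe-reaction rate is in the arm than the thigh
rate_ratio <- row_props["Arm", "Severe"] / row_props["Thigh", "Severe"]
round(rate_ratio, 2)
```

Comparing row percentages rather than raw counts accounts for the fact that nearly twice as many children were vaccinated in the arm as in the thigh.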

Results Figure

A clustered bar chart with the group counts or percentages is the most appropriate way to present these data. You can use Excel to create a bar chart. Use the link below to watch a video on how to make a bar chart in Excel: https://www.screencast.com/t/Dv6xg1yXJT
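If you would rather stay in R than switch to Excel, a clustered bar chart can also be drawn with base R's `barplot()`. The sketch below is one possible approach, not the method used for the example figure; it assumes the counts from the contingency table in the background section, and the object names and axis labels are illustrative.

```r
# Counts from the contingency table in the background section
vaccine <- matrix(c(4758, 30,
                    8840, 76),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Thigh", "Arm"),
                                  c("Not severe", "Severe")))

# Percent of children in each reaction category within each injection site.
# Transposed so that each column (site) becomes one cluster of bars.
percent_by_site <- 100 * t(prop.table(vaccine, margin = 1))

# beside = TRUE draws the bars side by side (clustered) instead of stacked
barplot(percent_by_site, beside = TRUE,
        ylim = c(0, 100),
        xlab = "Injection site",
        ylab = "Percent of children",
        legend.text = TRUE,
        args.legend = list(x = "top"))
```

Because severe reactions are rare (under 1% at both sites), the "Severe" bars will be very short on a 0 to 100 scale; plotting only the percent with a severe reaction is an alternative worth considering.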

Figure legend

A caption must be included below each figure.

The caption for a chi-square test of independence must include:

1.  A short descriptive title following the figure number
2.  A description of what you plotted
3.  Your sample size (e.g., # children/injection site)
4.  The p-value for the overall test.

Example Results Statement and Figure (Excel was used to generate this figure)

The severity of the reaction to the DTaP vaccine in children aged 3-6 was independent of where the vaccine was given (chi-square test of independence, X2 = 1.79, df = 1, p = 0.181, Figure 1). Severe reactions occurred in 0.85% of children vaccinated in the arm and 0.63% of children vaccinated in the thigh, a rate about 1.4 times higher in the arm.

Figure 1. Percent of children aged 3-6 with either a severe or not a severe reaction to the DTaP vaccine given in the thigh (n = 4788) or the arm (n = 8916). The location of the vaccine was independent of the severity of the reaction (p = 0.181).

Quick Chi-square Test of Independence

Here is all of the code you need to quickly run the chi-square test of independence.

#code to enter the data
R1 = c(4758, 30)
R2 = c(8840, 76)
rows   = 2

#code to create a contingency table
Matriz = matrix(c(R1, R2), nrow=rows, byrow=TRUE)

#code to name the rows and columns
rownames(Matriz) = c("Thigh", "Arm")          
colnames(Matriz) = c("No.severe", "Severe")    

#code to display the contingency table
Matriz

#code to run the chi-square test of independence on the vaccine data.
chisq.test(Matriz, correct=TRUE)  

*Written by Carrie L. Woods, August 2019. Modified from http://stats.pugetsound.edu/ecology/

References

Jackson, L.A., Peterson, D., Nelson, J.C., et al. (13 co-authors). 2013. Vaccination site and risk of local reactions in children one through six years of age. Pediatrics 131: 283-289.

McDonald, J. http://www.biostathandbook.com/chiind.html (accessed 9/1/2019)