Topic 6: Chi-Square Tests in jamovi


Welcome to Computer Lab 6 for the Data Analysis (DA) component of BIO2POS, our final DA computer lab for the semester!

Throughout this semester we have predominantly focused on assessing numeric variables. Our final DA topic, Topic 6, covered Chi-Square tests, which are used to assess categorical data.

  • Chi-Square Goodness of Fit tests can be used for assessing the features of one categorical variable, while
  • Chi-Square Tests of Independence can be used to assess whether or not there is an association between two categorical variables.

The assumptions of these tests were also discussed.

In this computer lab, you will continue to learn how to use the statistical software jamovi, and conduct Chi-Square tests using real data sets. You will also learn how to check the assumptions of these tests in jamovi, and how to interpret and summarise jamovi output for these tests.

Learning Outcomes

These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.

Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).

  • We recommend that you aim to complete at least the core component question(s) within the scheduled lab time.
  • If you have time, you can work through the extension component question(s) either during the lab, or later in your own time.
  • We recommend that you aim to complete all questions before the end of Week 11.

Having completed this lab, you will be able to conduct the following tests and calculations in jamovi:

  • Chi-Square Goodness of Fit test
  • Chi-Square Test of Independence

You will also be able to interpret the results of the above statistical techniques, check the assumptions of the tests, and provide clear summary statements highlighting the key statistical outputs of the tests.


Before you begin, please check the following:

  1. Have you attended this week’s lectures/watched the lecture recordings?
  2. Have you completed this week’s DA Revision Quiz?

Please aim to complete Step 1 before starting this lab, as doing so will help you to better understand the content covered. Please aim to complete Step 2 before the next week of DA content.


1 Coral Bleaching - Chi-Square Goodness of Fit test 🌱


Figure 1.1: Note. From File:LordHoweIsland NorthBay Reef 27.JPG, by Toby Hudson, 2012, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 AU DEED

Coral reefs can be severely impacted by anomalies and increases in ocean temperatures. A recent paper by Moriarty et al. (2023) documented and analysed the extent of coral bleaching and subsequent recovery within the Lord Howe Island lagoonal reef over an eight-month period in 2019. Their paper is freely available online (see References).

Data from their study has been formatted for jamovi analyses, and is available in the file coral_data.omv in this week’s tile on LMS. It contains recorded values for the following variables:

  • Site: The region in the lagoonal reef where the recording was taken - One of Sylph's Hole, North Bay, or Coral Garden
  • Month: The month in which the recording was taken - One of March, April/May, or October
  • Taxa: The coral taxon - One of Stylophora pistillata, Pocillopora damicornis, Porites spp., Seriatopora hystrix, Isopora cuneata, Acropora spp. or Other taxa
  • Bleaching_Status: The health status of the coral - One of Bleached, Dead, or Healthy

jamovi

1.0.1

To begin, create a descriptives table in row format for the coral_data.omv data, using the Bleaching_Status variable, and splitting results by Taxa. Also create a bar plot of your results.

1.0.2

Suppose you would like to conduct a Chi-Square Goodness of Fit test to check if the proportions of the different coral species in the Lord Howe Island lagoonal reef region are all the same. Since we have data on six species plus a seventh category for ‘Other taxa’, this would mean we are expecting a proportion of 1/7 ≈ 0.14 for each of the seven categories.

Based on your results from part 1.0.1, do you think it is reasonable to assume equal proportions across coral species? Explain your reasoning clearly.

1.0.3

Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.

We will now cover how to conduct this test in jamovi.

1.0.3.1

Write out an appropriate null and alternative hypothesis for this test.

  • You can use numbers or abbreviations for the taxa names in the subscripts, e.g. 1, 2, etc.

1.0.3.2

Navigate to the Frequencies tab, and select N Outcomes. Since we are interested in the different coral species, drag the Taxa variable across to the Variable box. You should see your Chi-Square Goodness of Fit test results appear automatically.

1.0.3.3

Click the Expected counts box, to display expected counts for each level (species) of the categorical variable Taxa.

1.0.3.4

Confirm the Expected Count of 300.286 by hand calculation.

Hint

Recall from the Topic 6A Lecture that, for a Chi-Square Goodness of Fit test with equal expected proportions, the expected count for each level is the sample size divided by the number of levels of your categorical variable.
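For reference, the general formulas are sketched below, where \(O_i\) and \(E_i\) are the observed and expected counts for level \(i\), \(p_i\) is the expected proportion for that level under \(H_0\), \(n\) is the sample size and \(k\) is the number of levels. This notation is our own and may differ slightly from the lecture slides; the equal-proportions case in the hint is the special case \(p_i = 1/k\).

```latex
\[
E_i = n\,p_i, \qquad \text{so under } H_0: p_1 = p_2 = \cdots = p_k = \tfrac{1}{k}, \qquad E_i = \frac{n}{k}
\]
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \text{df} = k - 1
\]
```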

1.0.3.5

Write a short, simple summary of your results. Make sure to confirm the test assumptions are satisfied.

1.0.4

Moriarty et al. (2023) note that the dominant coral species within the Lord Howe Island lagoonal reef are:

  • Stylophora pistillata
  • Pocillopora damicornis
  • Porites spp.
  • Seriatopora hystrix

Since it is expected these will be more prevalent, let us re-run our Chi-Square Goodness of Fit test, using the extended case where we assign specific expected proportions to each category.

1.0.4.1

In the Proportion Test (N Outcomes) section of jamovi, expand the Expected Proportions section. You will see that the ratios for all corals are set to 1, making all the proportions equal at 1/7 ≈ 0.143.

Change the ratios for the 4 dominant coral species to 0.2 each, then set the remaining 2 coral species to 0.075 each, and Other taxa to 0.05.

1.0.4.2

Check your new results, and write a summary of your Chi-Square Goodness of Fit test. Compare the observed test statistic values from the two versions of the test.

1.0.5

Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status variable, and summarise your results.

Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.

R

1.0.1

Open RStudio, open a new R script (File -> New File -> R Script), set your working directory to the folder where you downloaded the coral data, and read in the coral data, storing it in the object coral.

If you need to refresh your memory on any of the steps mentioned above, see the notes below.


Recall that to open .omv files in R, we’ll need the jmvReadWrite package installed and loaded. If you are not sure whether you have this package, copy-paste and then run the following code in your R script, one line at a time.

install.packages("jmvReadWrite") # this line installs the package we need
library(jmvReadWrite) # this line loads the package in our current session
# Note that anything after a # is called a comment in R, and isn't treated as executable code

To run a line of code in RStudio, place your cursor on that line and click the Run button (marked with a green arrow) at the top right of the script pane. The line of code will then be run (executed), and you should see the code, and any output, appear in the Console below your script.

Recall that to set your Working Directory (where R looks for files), the two simplest options in RStudio are:

  • Press Ctrl + Shift + H in Windows (Cmd + Shift + H in Mac), then navigate to where you saved your file (e.g. your 'Downloads' folder), or
  • In the RStudio menu bar, go to Session -> Set Working Directory -> Choose Directory…, then navigate to where you saved your file (e.g. your 'Downloads' folder)


Then:

  • Copy-paste the following line of code into your script file, below your other lines of code, and
  • Run the line of code
coral <- read_omv("coral_data.omv") 
# This line loads our coral data set into RStudio, 
# and stores the data in an object we've called coral

You should now see coral listed in the Environment section of RStudio in the top right - this means the data is loaded in RStudio, and ready for analysis!


In R, if we would like to check a descriptives table, often the summary function is the simplest starting point, e.g.:

summary(coral)

Since we have 3 Bleaching Status options, and 7 Taxa options, it may be helpful to split our bar plots based on either of these variables, in order to obtain a more detailed understanding of how the different taxa are impacted.

The R code below walks you through the set up for a simple bar plot - try adjusting the levels to produce the bar plots for the other Bleaching_Status options, or even switch the Taxa and Bleaching_Status positions, to create bar plots for each Taxa.

barplot(table(coral$Taxa[coral$Bleaching_Status == levels(coral$Bleaching_Status)[1]]),
        main = levels(coral$Bleaching_Status)[1], col = 2:8)

# Here, Taxa and Bleaching_Status are recorded as factor variables,
# so we use the table() function to count them in a format the barplot() function understands
# The == part subsets the data, keeping only the entries whose Bleaching_Status matches
# levels(coral$Bleaching_Status)[1], i.e. the first listed Bleaching_Status
# Similarly, levels(coral$Bleaching_Status)[2] would give the 2nd listed Bleaching_Status, etc.
# col = 2:8 gives each of the 7 Taxa its own colour
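Rather than changing the level number by hand, one option is to loop over all the Bleaching_Status levels and draw the bar plots side by side. The sketch below does this on a small made-up data frame called demo (hypothetical, with the same column names as the coral data, but NOT the real data); to use it on the real data, replace demo with coral.

```r
# Small made-up data frame with the same column names as the coral data
# (illustration only, NOT the real data)
demo <- data.frame(
  Taxa = factor(c("Acropora spp.", "Porites spp.", "Porites spp.",
                  "Other taxa", "Acropora spp.", "Other taxa")),
  Bleaching_Status = factor(c("Bleached", "Dead", "Healthy",
                              "Healthy", "Bleached", "Dead"))
)

# One plotting panel per Bleaching_Status level; keep the old settings to restore later
old_par <- par(mfrow = c(1, nlevels(demo$Bleaching_Status)))

for (status in levels(demo$Bleaching_Status)) {
  # Count each taxon among the rows with this Bleaching_Status
  # (table() on a factor keeps all levels, including zero counts)
  counts <- table(demo$Taxa[demo$Bleaching_Status == status])
  barplot(counts, main = status, col = seq(2, length.out = length(counts)))
}

par(old_par)  # restore the previous plotting layout
```

Swapping the roles of Taxa and Bleaching_Status in the loop gives one plot per taxon instead.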
1.0.2

Suppose you would like to conduct a Chi-Square Goodness of Fit test to check if the proportions of the different coral species in the Lord Howe Island lagoonal reef region are all the same. Since we have data on six species plus a seventh category for ‘Other taxa’, this would mean we are expecting a proportion of 1/7 ≈ 0.14 for each of the seven categories.

Based on your results from part 1.0.1, do you think it is reasonable to assume equal proportions across coral species? Explain your reasoning clearly.

1.0.3

Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.

We will now cover how to conduct this test in R.

1.0.3.1

Write out an appropriate null and alternative hypothesis for this test.

  • You can use numbers or abbreviations for the taxa names in the subscripts, e.g. 1, 2, etc.
1.0.3.2

To conduct a Chi-Square Goodness of Fit test in R, we can use the inbuilt chisq.test function. This function is simple to use, but we need to make sure the data are supplied in the right format: here, this means converting the Taxa data into a table of counts using the table function.

It will also be helpful to assign the output of the chisq.test function to a new object, since the function will generate various data which we may wish to assess separately afterwards.

To begin, run the following code:

coral_gof <- chisq.test(table(coral$Taxa))
# Here we are conducting the test across all the different coral species at once
# We are assuming equal proportions
coral_gof
# Run this line to then check the results of the test
1.0.3.3

Once we have run the test, we can access various secondary results stored within the coral_gof object using the $ operator.

For example, to check the expected counts for each category, run the following code:

coral_gof$expected
1.0.3.4

Confirm the Expected Count of 300.286 by hand calculation.

Hint

Recall from the Topic 6A Lecture that, for a Chi-Square Goodness of Fit test with equal expected proportions, the expected count for each level is the sample size divided by the number of levels of your categorical variable.
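If you would like to check a hand calculation in R, the sketch below repeats the steps on a small vector of counts. These numbers are hypothetical and are NOT the coral data. It computes the expected counts as n / k, then the test statistic and p-value by hand, and compares them with the chisq.test output. To try it on the coral data, you could replace obs with table(coral$Taxa).

```r
# Hypothetical observed counts for three categories (illustration only, NOT the coral data)
obs <- c(A = 30, B = 20, C = 50)

n <- sum(obs)      # sample size
k <- length(obs)   # number of levels (categories)

# Under equal proportions, every expected count is n / k
expected_manual <- rep(n / k, k)

# Chi-Square statistic, degrees of freedom and (upper-tail) p-value, by hand
stat_manual <- sum((obs - expected_manual)^2 / expected_manual)
df_manual <- k - 1
p_manual <- pchisq(stat_manual, df = df_manual, lower.tail = FALSE)

# The same test using R's inbuilt function
gof <- chisq.test(obs)

# Each comparison should print TRUE
all.equal(unname(gof$expected), expected_manual)
all.equal(unname(gof$statistic), stat_manual)
all.equal(gof$p.value, p_manual)
```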

1.0.3.5

Write a short, simple summary of your results. Make sure to confirm the test assumptions are satisfied.
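In R, checking the expected-count assumption mostly comes down to inspecting the expected counts stored in the test result. As a sketch, the helper below (summarise_expected is a name we have made up, not a built-in R function) reports the smallest expected count and the proportion of categories with an expected count below 5; compare these against the criterion given in the Topic 6 lectures. Note also that chisq.test itself gives the warning "Chi-squared approximation may be incorrect" whenever any expected count is below 5. The counts below are made up (NOT the coral data); for the coral data you could call summarise_expected(coral_gof).

```r
# summarise_expected() is a made-up helper name, not a built-in R function.
# It summarises the expected counts stored in a chisq.test() result.
summarise_expected <- function(test_result, threshold = 5) {
  e <- test_result$expected
  c(min_expected         = min(e),               # smallest expected count
    n_categories         = length(e),            # number of categories (or cells)
    prop_below_threshold = mean(e < threshold))  # proportion with expected count < threshold
}

# Made-up counts and proportions (illustration only, NOT the coral data).
# The expected counts are 30 * c(0.6, 0.3, 0.1) = 18, 9 and 3, so chisq.test()
# will also warn "Chi-squared approximation may be incorrect" here.
obs <- c(A = 20, B = 7, C = 3)
fit <- chisq.test(obs, p = c(0.6, 0.3, 0.1))
summarise_expected(fit)
```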

1.0.4

Moriarty et al. (2023) note that the dominant coral species within the Lord Howe Island lagoonal reef are:

  • Stylophora pistillata
  • Pocillopora damicornis
  • Porites spp.
  • Seriatopora hystrix

Since it is expected these will be more prevalent, let us re-run our Chi-Square Goodness of Fit test, using the extended case where we assign specific expected proportions to each category.

1.0.4.1

Suppose we set the proportion for each of the 4 dominant coral species to 0.2, the proportions for the 2 remaining coral species to 0.075 each, and the proportion for Other taxa to 0.05.

To specify different proportions for each category, we can use the p argument of the chisq.test function (by default, p gives every category a proportion of 1 divided by the number of categories).

If we check the levels of our Taxa data:

levels(coral$Taxa)

we observe that the 4 dominant coral species are listed as levels 7, 4, 5 and 6 respectively. Other taxa is level 3, leaving the less dominant coral species as levels 1 and 2.

Since chisq.test matches the values in p to the categories by position, not by name, the vector must follow this level order: p = c(0.075, 0.075, 0.05, 0.2, 0.2, 0.2, 0.2). While this process is a little unwieldy, it gives us full control over the expected proportion for each category.

Run the following R code to conduct the extended case test:

coral_gof_extended <- chisq.test(table(coral$Taxa), p = c(0.075, 0.075, 0.05, 0.2, 0.2, 0.2, 0.2))
# Here we are conducting the test across all the different coral species at once
# We are no longer assuming equal proportions
coral_gof_extended
# Run this line to then check the results of the test
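Because the order of p has to follow the level order exactly, it is easy to make a mistake. A safer approach, sketched below, is to write the proportions as a named vector and reorder them using the level names. The level names typed out here are an assumption: they must match levels(coral$Taxa) exactly (including the spelling of each taxon), so with the data loaded you would use taxa_levels <- levels(coral$Taxa) instead.

```r
# Assumed level names (should match levels(coral$Taxa) exactly), in level order 1 to 7.
# With the data loaded, use: taxa_levels <- levels(coral$Taxa)
taxa_levels <- c("Acropora spp.", "Isopora cuneata", "Other taxa",
                 "Pocillopora damicornis", "Porites spp.",
                 "Seriatopora hystrix", "Stylophora pistillata")

# Expected proportions written out by name, in whatever order is convenient
p_named <- c("Stylophora pistillata"  = 0.2,
             "Pocillopora damicornis" = 0.2,
             "Porites spp."           = 0.2,
             "Seriatopora hystrix"    = 0.2,
             "Acropora spp."          = 0.075,
             "Isopora cuneata"        = 0.075,
             "Other taxa"             = 0.05)

# Reorder the proportions so they line up with the level order of the data
p_ordered <- p_named[taxa_levels]

stopifnot(!anyNA(p_ordered))                     # every level has a proportion
stopifnot(isTRUE(all.equal(sum(p_ordered), 1)))  # proportions sum to 1
p_ordered

# With the coral data loaded, the test could then be run as:
# chisq.test(table(coral$Taxa), p = unname(p_ordered))
```

If you would rather enter ratios as in jamovi (e.g. 4, 4, 4, 4, 1.5, 1.5 and 1), chisq.test also has a rescale.p argument: setting rescale.p = TRUE rescales p so that it sums to 1.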
1.0.4.2

Check your new results, and write a summary of your Chi-Square Goodness of Fit test. Compare the observed test statistic values from the two versions of the test.

1.0.5

Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status variable, and summarise your results.

Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.


2 Coral Bleaching - Chi-Square Test of Association 🌱

Suppose we extend our analysis of the coral_data.omv data, and now take into account the different sites in the Lord Howe Island lagoonal reef.

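Specifically, suppose we would like to determine whether there is an association between the different sites in the Lord Howe Island lagoonal reef and the health status of the coral in the reef. This calls for a Chi-Square Test of Association (also called a Chi-Square Test of Independence).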

jamovi

2.0.1

To conduct a Chi-Square Test of Association of Bleaching_Status vs Site in jamovi, click on the Frequencies tab and select the Independent Samples option.

Drag Bleaching_Status and Site across to the Rows and Columns boxes respectively. While this will yield test results, we can include additional details via the following steps.

2.0.1.1

Write out an appropriate null and alternative hypothesis for this test.

2.0.1.2

Expand the Statistics section and select Phi and Cramer's V under the Nominal heading.

2.0.1.3

Expand the Cells section and select Observed counts and Expected counts under the Counts heading, and Row under the Percentages heading.

2.0.1.4

Expand the Plots section, select Bar Plot, and change the Bar Type from Side by side to Stacked.

2.0.2

Summarise your test results. Make sure to check the test assumptions.

To interpret the effect size, you may like to check e.g. Kim (2017).
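For reference, the usual formulas behind this output for a two-way table with \(r\) rows and \(c\) columns are sketched below, where \(O_{ij}\) and \(E_{ij}\) are the observed and expected counts in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, and \(n\) is the total sample size. This notation is our own and may differ slightly from the lecture slides.

```latex
\[
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1)
\]
\[
V = \sqrt{\frac{\chi^2}{n \, (\min(r, c) - 1)}}
\]
```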

2.0.3

Based on your test results and the bar plot produced, do you agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the fewest healthy coral colonies?

2.0.3.1

To gain further supporting information, add the Month variable to the Layers box. This will further split the analysis to consider the different months.

You may like to untick the Expected counts and Row percentages boxes, to make the results more concise.

2.0.3.2

To gain a different perspective on your data, change the Counts option to Percentages within rows for your bar plot. Do you prefer this version, or the version from 2.0.1.4 for interpretative purposes?

R

2.0.1

The good news is that we can continue to use the chisq.test function to conduct Chi-Square Tests of Association; we just need to make sure our code is in the correct format. We’ll cover all the steps over the next few subquestions.

2.0.1.1

To begin, write out an appropriate null and alternative hypothesis for this test.

2.0.1.2

In R, run the following code to:

  • Create a two-way table of our data
  • Run the Chi-Square Test of Association, using our two-way table data
coral_two_way_table <- table(coral$Bleaching_Status, coral$Site)
# This sets up our data in the right format for the test

coral_toa <- chisq.test(coral_two_way_table)
# Assign the test results to the object coral_toa (arbitrary name)
# Then run the object name to see the test results
coral_toa
2.0.1.3

To help us check the test assumptions, we will need to look at the expected counts across the different elements of the two-way table. To access this data, we can run the following code:

coral_toa$expected
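To see where these expected counts come from, the sketch below computes them by hand as (row total × column total) / n and compares them with chisq.test. The table is made up for illustration (NOT the coral data); the same steps would work on coral_two_way_table.

```r
# Hypothetical two-way table (illustration only, NOT the coral data):
# rows = health status, columns = (made-up) sites
tab <- matrix(c(20, 30, 50,
                10, 40, 50,
                30, 30, 40),
              nrow = 3, byrow = TRUE,
              dimnames = list(Status = c("Bleached", "Dead", "Healthy"),
                              Site   = c("Site A", "Site B", "Site C")))

n <- sum(tab)

# Expected count for each cell = (row total x column total) / n
expected_manual <- outer(rowSums(tab), colSums(tab)) / n

toa <- chisq.test(tab)

# Should print TRUE (unname() drops the row/column labels before comparing)
all.equal(unname(expected_manual), unname(toa$expected))

# Test statistic by hand; df = (rows - 1) x (columns - 1)
stat_manual <- sum((tab - expected_manual)^2 / expected_manual)
df_manual <- (nrow(tab) - 1) * (ncol(tab) - 1)
all.equal(unname(toa$statistic), stat_manual)
```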
2.0.1.4

There are various options and methods for visualising the data as you conduct your analysis. One option which may be helpful to us here is to produce stacked bar plots, to more easily identify differences across the different sites.

The R code below provides a framework for producing such plots:

barplot(coral_two_way_table, col = 2:4, legend.text = TRUE, ylim = c(0, 1400))
# Note that we need the data in the two-way table structure
# before we can plot the different sites together (you may like to test this for Q1 too)
# Here, we are including a legend, and increasing the y axis range
# to ensure the legend does not overlap the bar plot

barplot(prop.table(coral_two_way_table, margin = 2), col = 2:4, legend.text = TRUE, ylim = c(0, 1))
# Here, we are scaling the results to show proportions out of 1 within each site,
# so the sites can be compared directly even though their sample sizes differ
2.0.2

Summarise your test results. Make sure to check the test assumptions.
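Unlike jamovi, chisq.test does not report an effect size. As a sketch, Cramér's V can be computed from the test result as the square root of χ² / (n × (min(r, c) - 1)), where r and c are the numbers of rows and columns. Here cramers_v is a hypothetical helper name and the table is made up (NOT the coral data); for the coral data you could call cramers_v(coral_toa). For 2 x 2 tables, chisq.test applies Yates' continuity correction by default, so you may prefer to run it with correct = FALSE before computing V.

```r
# cramers_v() is a hypothetical helper, not a built-in R function
cramers_v <- function(test_result) {
  observed <- test_result$observed
  chi_sq <- unname(test_result$statistic)
  n <- sum(observed)
  k <- min(dim(observed))   # the smaller of the number of rows and columns
  sqrt(chi_sq / (n * (k - 1)))
}

# Hypothetical 3 x 3 table (illustration only, NOT the coral data)
tab <- matrix(c(20, 30, 50,
                10, 40, 50,
                30, 30, 40), nrow = 3, byrow = TRUE)

toa <- chisq.test(tab)
cramers_v(toa)
```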

2.0.3

Based on your test results and the bar plot produced, do you agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the fewest healthy coral colonies?

2.0.3.1

As a coding extension question, see if you can segment your data by the Month variable, to gain further insights.
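As a hint for one possible approach: a three-way table (e.g. table(coral$Bleaching_Status, coral$Site, coral$Month)) can be split along its third dimension with apply(), running a separate test within each layer. The sketch below shows the idea on R's built-in UCBAdmissions data (a 2 x 2 x 6 table of admissions by gender and department), used here only as a stand-in for the coral data.

```r
# R's built-in UCBAdmissions data is a 3-way table: Admit x Gender x Dept
dim(UCBAdmissions)

# Run a separate Chi-Square Test of Association within each level of the
# 3rd dimension (Dept); each layer passed to the function is a 2-way table
p_by_dept <- apply(UCBAdmissions, 3, function(layer) chisq.test(layer)$p.value)
round(p_by_dept, 3)
```

The same pattern could return other parts of each test (e.g. the statistic or the expected counts) by changing what the function extracts.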


3 Coral Bleaching - Reproducible Research 🌱

Open a copy of Moriarty et al. (2023), and navigate to plot C from Fig 2.

Over the next few steps in this question, we’ll see that we can easily reproduce this plot in jamovi (using the coral_data.omv data), highlighting jamovi's ability to produce publication-quality results.

3.1

With the data open in jamovi, go to the Variables tab and double-click on the Month variable. Notice that the levels are initially ordered April/May, then March, then October. Click on March and then click the up arrow next to the Levels box, so that the months appear in chronological order.

3.1.1

Change the order of the levels for Site, so they match up with the order shown in plot C from Fig 2. of Moriarty et al. (2023).
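If you have been following along in R, the same issue arises there: when a factor is created from text, R sorts its levels alphabetically by default, so March appears after April/May. A minimal sketch of fixing the order with factor() is below, using a short made-up vector; with the real data you would apply the same idea to coral$Month, and to coral$Site using the order shown in the paper.

```r
# Made-up example vector (illustration only)
month <- factor(c("October", "March", "April/May", "March"))
levels(month)   # alphabetical by default: "April/May" "March" "October"

# Put the levels in chronological order (the data values themselves are unchanged)
month <- factor(month, levels = c("March", "April/May", "October"))
levels(month)
```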

3.1.2

Recheck your stacked Bar Plot - it should now look very similar to plot C from Fig 2 of Moriarty et al. (2023). As a final challenge, try changing the colours so they match those in the paper.

Hint

Recall you can access plot options via the three vertical dots button on the top right of the jamovi user interface.


If you have made it to this stage by the end of the lab, that’s great! Completing the core questions will prepare you well for upcoming assessments. The following extension questions will help consolidate and extend your understanding of the material.


4 Caribbean Reef Sharks 🌳


Figure 4.1: Note. From File:Caribbean reef sharks and a lemon shark .jpg, by Albert kok, 2010, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 DEED

Recall the Caribbean Reef Shark example introduced in the DA Topic 6A Lecture.

This data, from Kohler et al. (2023), is available in the file reef_shark_data.omv in this week’s tile on LMS. It contains recorded values for numerous variables. We will focus just on the following variables:

  • Tag_ID: The tag ID given to each shark
  • Sex: The sex of the shark (F = female, M = male)
  • Mat_stage: The maturity of the shark (IM = immature, M = mature)
  • Tagging_Isl: The island at which the shark was tagged (GC = Grand Cayman, LC = Little Cayman, CB = Cayman Brac)

4.1

In the lecture, I purposefully ignored a few details in the data, to help simplify the introduction of Chi-Square Goodness of Fit tests. If you inspect the data, you may notice that some sharks are recorded multiple times; however, I treated them as separate sharks for the purposes of the analyses presented in the lectures.

In this question, I would like you to think about and discuss possible options for dealing with multiple observations per individual, in the context of Chi-Square tests. There are several potential approaches we could take. You may like to discuss options with your ED group members.

Make any adjustments you deem necessary to the reef_shark_data.omv data, and then conduct Chi-Square tests with the aim to obtain results as close as you can to those presented in the Results section of Kohler et al. (2023).

Namely, conduct Chi-Square Goodness of Fit tests to compare:

  1. Islands (paper result \(\chi^2 = 23.545\), \(p < 0.001\))
  2. Sex (paper result \(\chi^2 = 0.970\), \(p= 0.325\))
  3. Maturity (paper result \(\chi^2 = 0.546\), \(p = 0.460\))

If you manage to get any similar (or identical!) results, please make sure to let us know.

Hint

Kohler et al. (2023) used data for \(n=39\) sharks in their analyses. The Tag IDs for sharks with multiple observations are:

  • 43700
  • 43701
  • 43702
  • 45050
  • 45051


5 Pea Plant Data 🌳

Recall that in DA Computer Lab 1 we introduced a raw, messy data set on dwarf pea plant seedlings, which had been collected as part of an experiment in an LTU BIO1AP lab class in 2022. Figure 5.2 below contains this data.

We have been analysing this data throughout the semester, using the different statistical tests introduced in each DA topic.


Figure 5.1: Note. From File:Prof. Dr. Thomé’s Flora von Deutschland, Österreich und der Schweiz, in Wort und Bild, für Schule und Haus; mit … Tafeln … von Walter Müller (Pl. 453) (7982431787)c.png, by Migula, Walter; Thomé, Otto W., 1888, Wikimedia Commons (https://commons.wikimedia.org/). In the public domain.

Background Information

To recap, in this experiment dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine if application of GA to the seedlings has an impact.

For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:

  • C: a control group, sprayed with water
  • TA: a treatment group, sprayed with a 25 mg/L solution of GA
  • TB: a treatment group, sprayed with a 50 mg/L solution of GA

The height of the seedlings was then recorded at a later date. The pea plant data in Figure 5.2 has pea plant height (in mm) recordings, for the three treatments, across 7 different benches.

Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).


Figure 5.2: Pea Plant Raw Data

5.1

In DA Computer Lab 1 or DA Computer Lab 2 you should have created a data file in jamovi containing the cleaned pea plant data. If for whatever reason you do not have this data file saved, you can find a copy of the data in this week’s tile on LMS, in the file pea_plant_seedlings_data.omv.

5.2

As a final extension question, think about how you could apply a Chi-Square test to this pea plant seedling data, and consider:

  • Which test(s) would you use, and how?
  • Which type of effect size should be used?

Discuss your thought process with other students and/or your lab demonstrator.

If you have the time, try conducting the analysis/analyses in jamovi.

You may need to recode some data, and add additional columns to your original pea plant .omv file.


Congratulations, that’s the end of the final BIO2POS DA computer lab!


We hope you have enjoyed these sessions. This computer lab is a bit shorter than previous ones. You can use the remaining time to work on your current assessments, and/or begin revising earlier content. If you have any assessment questions, please ask your lab demonstrator.


Before you finish up, make sure to save your Word document to your OneDrive, for future reference.


References

  • Kim, H.-Y. (2017). Statistical notes for clinical researchers: Chi-squared test and Fisher’s exact test. Restorative Dentistry & Endodontics, 42(2), 152–155. https://doi.org/10.5395/rde.2017.42.2.152

  • Kohler, J., Gore, M., Ormond, R., Johnson, B., & Austin, T. (2023). Individual residency behaviours and seasonal long-distance movements in acoustically tagged Caribbean reef sharks in the Cayman Islands. PLOS ONE, 18(11), e0293884. https://doi.org/10.1371/journal.pone.0293884

  • Moriarty, T., Leggat, W., Heron, S. F., Steinberg, R., & Ainsworth, T. D. (2023). Bleaching, mortality and lengthy recovery on the coral reefs of Lord Howe Island. The 2019 marine heatwave suggests an uncertain future for high-latitude ecosystems. PLOS Climate, 2(4), e0000080. https://doi.org/10.1371/journal.pclm.0000080


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.