Welcome to Computer Lab 6 for the Data Analysis (DA) component of BIO2POS, our final DA computer lab for the semester!
Throughout this semester we have predominantly focused on assessing numeric variables. Our final DA topic, Topic 6, covered Chi-Square tests, which are used to assess categorical data.
The assumptions of these tests were also discussed.
In this computer lab, you will continue to learn how to use the statistical software jamovi, and conduct Chi-Square tests using real data sets. You will also learn how to check the assumptions of these tests in jamovi, and how to interpret and summarise jamovi output for these tests.
These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.
Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).
Having completed this lab, you will be able to conduct the following tests and calculations in jamovi:
You will also be able to interpret the results of the above statistical techniques, check the assumptions of the tests, and provide clear summary statements highlighting the key statistical outputs of the models.
Before you begin, please check the following:
Please aim to complete Step 1 before starting this lab, as doing so will help you to better understand the content covered. Please aim to complete Step 2 before the next week of DA content.
Figure 1.1: Note. From File:LordHoweIsland NorthBay Reef 27.JPG, by Toby Hudson, 2012, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 AU DEED
Coral reefs can be severely impacted by anomalies and increases in ocean temperatures. A recent paper by Moriarty et al. (2023) documented and analysed the extent of coral bleaching and subsequent recovery within the Lord Howe Island lagoonal reef over an eight-month period in 2019. Their paper is freely accessible; see the reference list at the end of this lab.
Data from their study has been formatted for jamovi analyses, and is available in the file coral_data.omv in this week’s tile on LMS. It contains recorded values for the following variables:
Site: Sylph's Hole, North Bay, or Coral Garden
Month: March, April/May, or October
Taxa: Stylophora pistillata, Pocillopora damicornis, Porites spp., Seriatopora hystrix, Isopora cuneata, Acropora spp., or Other taxa
Bleaching_Status: Bleached, Dead, or Healthy
To begin, create a descriptives table in row format for the coral_data.omv data, using the Bleaching_Status variable, and splitting results by Taxa. Also create a bar plot of your results.
Suppose you would like to conduct a Chi-Square Goodness of Fit test to check if the proportions of the different coral species in the Lord Howe Island lagoonal reef region are all the same. Since we have data on six species plus a seventh category for ‘Other taxa’, this would mean we are expecting proportions of approximately 0.14 for each species.
Based on your results from part 1.0.1, do you think it is reasonable to assume equal proportions across coral species? Explain your reasoning clearly.
Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.
We will now cover how to conduct this test in jamovi.
Write out an appropriate null and alternative hypothesis for this test.
Navigate to the Frequencies tab, and select N Outcomes. Since we are interested in the different coral species, drag the Taxa variable across to the Variable box. You should see your Chi-Square Goodness of Fit test results appear automatically.
Click the Expected counts box, to display expected counts for each level (species) of the categorical variable Taxa.
Confirm the Expected Count of 300.286 by hand calculation.
Recall from the Topic 6A Lecture that the expected count will be the sample size divided by the number of levels of your categorical variable, for a Chi-Square Goodness of Fit test.
Write a short, simple summary of your results. Make sure to confirm the test assumptions are satisfied.
Moriarty et al. (2023) note that the dominant coral species within the Lord Howe Island lagoonal reef are Stylophora pistillata, Pocillopora damicornis, Porites spp., and Seriatopora hystrix.
Since it is expected these will be more prevalent, let us re-run our Chi-Square Goodness of Fit test, using the extended case where we assign specific expected proportions to each category.
In the Proportion Test (N Outcomes) section of jamovi, expand the Expected Proportions button. You will see that the ratios for all corals are set to 1, making all the proportions equal at 0.143.
Change the ratios for the 4 dominant coral species to 0.2 each, and then set the remaining 2 coral species to 0.075, and Other taxa to 0.05.
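jamovi converts the ratios you enter into proportions by dividing each ratio by the sum of all ratios. The short R sketch below reproduces that arithmetic, so you can check your entries before running the test. The object names (equal_ratios, custom_ratios) are illustrative only and are not part of the lab files.

```r
# Ratios of 1 for all 7 categories, matching jamovi's default setting
equal_ratios <- rep(1, 7)
equal_props <- equal_ratios / sum(equal_ratios)
round(equal_props, 3)

# Values from this question: 4 dominant species, 2 remaining species, Other taxa
custom_ratios <- c(rep(0.2, 4), rep(0.075, 2), 0.05)
custom_props <- custom_ratios / sum(custom_ratios)

# The custom values should already sum to 1, so rescaling leaves them unchanged
sum(custom_ratios)
custom_props
```

If the sum is not 1, the proportions actually tested will differ from the values you typed in, so this is worth confirming.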
Check your new results, and write a summary of your Chi-Square Goodness of Fit test. Compare your observed test statistic value for both versions of the test.
Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status, and summarise your results.
Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.
Open up R, and open a new R script (File -> New File -> R Script), set your current working directory to where you downloaded the coral data, and read in the coral data, storing it in the object coral.
If you need to refresh your memory on any of the processes mentioned above, just check the Details box below:
Recall that to open .omv files in R, we’ll need the jmvReadWrite package installed and loaded. If you are not sure whether you have this package, copy-paste and then run the following code in your R script, one line at a time.
install.packages("jmvReadWrite") # this line installs the package we need
library(jmvReadWrite) # this line loads the package in our current session
# Note that anything after a # is called a comment in R, and isn't treated as executable code
To run a line of code in RStudio, just have your cursor on that line, and click the Run Selected Line(s) button at the top right of the script (where the green arrow is, see reference image below). Your line of code will then be run, or executed, and you should see the code and some other output appear in the Console section below your script file.
Recall that to set your Working Directory (where R looks for files), the two simplest options in RStudio are:
Then:
coral <- read_omv("coral_data.omv")
# This line loads our coral data set into RStudio,
# and stores the data in an object we've called coral
You should now see coral listed in the Environment section of RStudio in the top right - this means the data is loaded in RStudio, and ready for analysis!
In R, if we would like to check a descriptives table, often the summary function is the simplest starting point, e.g.:
summary(coral)
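The summary function gives one overview per variable. To mirror the jamovi descriptives table split by Taxa, a two-way table of counts may be more useful. The sketch below uses a small made-up data frame (toy_coral, with placeholder taxa A, B and C) rather than the real coral data, so the numbers are illustrative only. Once you are happy with how it works, you could swap in coral$Bleaching_Status and coral$Taxa.

```r
# Hypothetical toy data standing in for coral (NOT the real study values)
toy_coral <- data.frame(
  Taxa = factor(c("A", "A", "A", "B", "B", "B", "B", "C")),
  Bleaching_Status = factor(c("Bleached", "Healthy", "Healthy",
                              "Dead", "Bleached", "Bleached", "Healthy", "Healthy"))
)

# Counts of each Bleaching_Status (rows) within each Taxa (columns)
status_by_taxa <- table(toy_coral$Bleaching_Status, toy_coral$Taxa)
status_by_taxa

# Column proportions: the share of each status within a taxon
status_props <- prop.table(status_by_taxa, margin = 2)
round(status_props, 2)
```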
Since we have 3 Bleaching Status options, and 7 Taxa options, it may be helpful to split our bar plots based on either of these variables, in order to obtain a more detailed understanding of how the different taxa are impacted.
The R code below walks you through the set up for a simple bar plot - try adjusting the levels to produce the bar plots for the other Bleaching_Status options, or even switch the Taxa and Bleaching_Status positions, to create bar plots for each Taxa.
barplot(table(coral$Taxa[coral$Bleaching_Status == levels(coral$Bleaching_Status)[1]]),
        main = levels(coral$Bleaching_Status)[1], col = 2:8)
# Here, Taxa and Bleaching_Status are recorded as factor variables
# So we use the table() function to structure them in a way that the barplot function understands
# We are subsetting the data using the == part to select only those entries that match
# levels(coral$Bleaching_Status)[1], i.e. that match the first listed Bleaching_Status level
# col = 2:8 supplies 7 colours, one for each Taxa level
# Note for example levels(coral$Taxa)[2] would give us the 2nd listed Taxa, etc
Suppose you would like to conduct a Chi-Square Goodness of Fit test to check if the proportions of the different coral species in the Lord Howe Island lagoonal reef region are all the same. Since we have data on six species plus a seventh category for ‘Other taxa’, this would mean we are expecting proportions of approximately 0.14 for each species.
Based on your results from part 1.0.1, do you think it is reasonable to assume equal proportions across coral species? Explain your reasoning clearly.
Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.
We will now cover how to conduct this test in R.
Write out an appropriate null and alternative hypothesis for this test.
To conduct a Chi-Square Goodness of Fit test in R, we can use the inbuilt chisq.test function. This function is deceptively simple to use, but we need to ensure that the data provided is in the right format - in this instance, this means we will need to restructure the Taxa data to a table format.
It will also be helpful to assign the output of the chisq.test function to a new object, since the function will generate various data which we may wish to assess separately afterwards.
To begin, run the following code:
coral_gof <- chisq.test(table(coral$Taxa))
# Here we are conducting the test across all the different coral species at once
# We are assuming equal proportions
coral_gof
# Run this line to then check the results of the test
Once we have run the test, we can access various secondary results stored within the coral_gof object using the $ operator.
For example, to check the expected counts for each category, run the following code:
coral_gof$expected
Confirm the Expected Count of 300.286 by hand calculation.
Recall from the Topic 6A Lecture that the expected count will be the sample size divided by the number of levels of your categorical variable, for a Chi-Square Goodness of Fit test.
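If you would like to see the hand calculation next to the R output, the sketch below works through it for a small set of made-up counts (toy_counts is hypothetical and is not the coral data). The same steps apply to table(coral$Taxa).

```r
# Hypothetical counts for a 3-level categorical variable (not the coral data)
toy_counts <- c(A = 30, B = 50, C = 20)

n <- sum(toy_counts)       # sample size
k <- length(toy_counts)    # number of levels
expected_by_hand <- n / k  # expected count per level under equal proportions

# Run the Goodness of Fit test with the default equal proportions
toy_gof <- chisq.test(toy_counts)

expected_by_hand
toy_gof$expected           # each entry should match expected_by_hand
```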
Write a short, simple summary of your results. Make sure to confirm the test assumptions are satisfied.
Moriarty et al. (2023) note that the dominant coral species within the Lord Howe Island lagoonal reef are Stylophora pistillata, Pocillopora damicornis, Porites spp., and Seriatopora hystrix.
Since it is expected these will be more prevalent, let us re-run our Chi-Square Goodness of Fit test, using the extended case where we assign specific expected proportions to each category.
Suppose we set the proportions for each of the dominant coral species to 0.2 each, and set the remaining coral species proportions to 0.075, and Other taxa to 0.05.
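To specify different proportions for each category, we can use the p argument in the chisq.test function. By default, p is set to 1 divided by the number of categories, for every category. The values supplied to p must sum to 1, and must be listed in the same order as the levels of the variable.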
If we check the levels of our Taxa data:
levels(coral$Taxa)
we observe that the 4 dominant coral species are listed as levels 7, 4, 5 and 6 respectively. Other taxa is level 3, leaving the less dominant coral species as levels 1 and 2.
Therefore, we can use the vector p = c(0.075, 0.075, 0.05, 0.2, 0.2, 0.2, 0.2), which lists the proportions in level order. While the process is a little unwieldy, it does give us full control over the distribution of expected proportions.
Run the following R code to conduct the extended case test:
coral_gof_extended <- chisq.test(table(coral$Taxa), p = c(0.075, 0.075, 0.05, 0.2, 0.2, 0.2, 0.2))
# Here we are conducting the test across all the different coral species at once
# We are no longer assuming equal proportions
coral_gof_extended
# Run this line to then check the results of the test
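Typing the proportions in level order by hand is easy to get wrong. One possible safeguard, sketched below, is to label each proportion with its level name and let R reorder them. The factor toy_taxa, its three level names and the proportions are all made up for illustration. For the coral data you would need the exact names returned by levels(coral$Taxa).

```r
# Hypothetical factor with three levels (a stand-in for coral$Taxa)
toy_taxa <- factor(rep(c("Porites", "Acropora", "Other"), times = c(45, 35, 20)))

# Expected proportions written in any convenient order, labelled by level name
p_named <- c(Porites = 0.5, Other = 0.2, Acropora = 0.3)

# Reorder the proportions to match levels(toy_taxa), then confirm they sum to 1
p_ordered <- p_named[levels(toy_taxa)]
stopifnot(isTRUE(all.equal(sum(p_ordered), 1)))

# Run the extended Goodness of Fit test with the reordered proportions
toy_gof_named <- chisq.test(table(toy_taxa), p = unname(p_ordered))
toy_gof_named$expected
```

Indexing by level name returns NA for any misspelt name, so the sum check fails instead of the test quietly using the wrong order.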
Check your new results, and write a summary of your Chi-Square Goodness of Fit test. Compare your observed test statistic value for both versions of the test.
Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status, and summarise your results.
Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.
Suppose we extend our analysis of the coral_data.omv data, and now take into account the different sites in the Lord Howe Island lagoonal reef.
Specifically, suppose we would like to determine if there is an association between the different sites in the Lord Howe Island lagoonal reef and the health status of the coral in the reef.
To conduct a Chi-Square Test of Association of Bleaching_Status vs Site in jamovi, click on the Frequencies tab and select the Independent Samples option.
Drag Bleaching_Status and Site across to the Rows and Columns boxes respectively. While this will yield test results, we can include additional details via the following steps.
Write out an appropriate null and alternative hypothesis for this test.
Expand the Statistics section and select Phi and Cramer's V under the Nominal heading.
Expand the Cells section and select Observed counts and Expected counts under the Counts heading, and Row under the Percentages heading.
Expand the Plots section, select Bar Plot, and change the Bar Type from Side by side to Stacked.
Summarise your test results. Make sure to check the test assumptions.
To interpret the effect size, you may like to consult a reference such as Kim (2017).
Based on your test results and the bar plot produced, do you agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the least amount of healthy coral colonies?
To gain further supporting information, add the Month variable to the Layers box. This will further split the analysis to consider the different months.
You may like to untick the Expected counts and Row percentages boxes, to make the results more concise.
To gain a different perspective on your data, change the Counts option to Percentages within rows for your bar plot.
Do you prefer this version, or the version from 2.0.1.4 for interpretative purposes?
The good news is that we can continue to use the chisq.test function to conduct Chi-Square Tests of Association - we just need to ensure that the format of our code is correct. We’ll cover all the steps over the next few subquestions.
To begin, write out an appropriate null and alternative hypothesis for this test.
In R, run the following code to set up the two-way table and conduct the test:
coral_two_way_table <- table(coral$Bleaching_Status, coral$Site)
# This sets up our data in the right format for the test
coral_toa <- chisq.test(coral_two_way_table)
# Assign the test results to the object coral_toa (arbitrary name)
# Then run the object name to see the test results
coral_toa
To help us check the test assumptions, we will need to look at the expected counts across the different elements of the two-way table. To access this data, we can run the following code:
coral_toa$expected
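Rather than scanning the expected counts by eye, you can ask R for the smallest one. The sketch below uses a made-up 3 x 3 table (toy_table, with placeholder site names) and assumes the rule of thumb that every expected count should be at least 5. Please use whichever threshold was given in your lecture notes.

```r
# Hypothetical 3 x 3 table of counts (not the coral data)
toy_table <- matrix(c(40, 25, 35,
                      10, 15,  5,
                      50, 60, 60),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Status = c("Bleached", "Dead", "Healthy"),
                                    Site = c("Site1", "Site2", "Site3")))

toy_toa <- chisq.test(toy_table)

# Smallest expected count across all cells of the table
min_expected <- min(toy_toa$expected)
min_expected

# TRUE if every cell meets the assumed threshold of 5
all(toy_toa$expected >= 5)
```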
There are various options and methods for visualising the data as you conduct your analysis. One option which may be helpful to us here is to produce stacked bar plots, to more easily identify differences across the different sites.
The R code below provides a framework for producing such plots:
barplot(coral_two_way_table, col = 2:4, legend.text = TRUE, ylim = c(0, 1400))
# Note that we need the data in the two-way table structure,
# before we can plot the different sites together (you may like to test this for Q1 too)
# Here, we are including a legend, and increasing the y axis range
# to ensure the legend does not overlap the bar plot
barplot(prop.table(coral_two_way_table, margin = 2), col = 2:4, legend.text = TRUE, ylim = c(0, 1))
# Here, we are scaling the results to show proportions out of 1,
# so the different sites can be compared more directly
Summarise your test results. Make sure to check the test assumptions.
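The chisq.test output does not include an effect size. If you would like one for your summary, Cramér's V can be calculated from the test statistic as the square root of the statistic divided by n times (k - 1), where k is the smaller of the number of rows and columns. The sketch below applies this to a made-up table (toy_table is hypothetical, not the coral data).

```r
# Hypothetical 3 x 3 table of counts (not the coral data)
toy_table <- matrix(c(40, 25, 35,
                      10, 15,  5,
                      50, 60, 60),
                    nrow = 3, byrow = TRUE)
toy_toa <- chisq.test(toy_table)

n <- sum(toy_table)       # total sample size
k <- min(dim(toy_table))  # smaller of the number of rows and columns

# Cramer's V from the Chi-Square statistic
cramers_v <- sqrt(unname(toy_toa$statistic) / (n * (k - 1)))
cramers_v
```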
Based on your test results and the bar plot produced, do you agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the least amount of healthy coral colonies?
As a coding extension question, see if you can segment your data by the Month variable, to gain further insights.
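One possible starting point, sketched below, is to split the data by Month and run a separate test of association for each subset. The data frame toy is simulated at random, with placeholder site names, purely so that the code runs on its own. Its results carry no biological meaning, and you would replace toy with coral.

```r
set.seed(42)  # simulated data only, for illustration
toy <- data.frame(
  Bleaching_Status = factor(sample(c("Bleached", "Dead", "Healthy"), 600, replace = TRUE)),
  Site = factor(sample(c("Site1", "Site2", "Site3"), 600, replace = TRUE)),
  Month = factor(sample(c("March", "April/May", "October"), 600, replace = TRUE))
)

# Split the rows by Month, then run one test of association per month
tests_by_month <- lapply(split(toy, toy$Month), function(d) {
  chisq.test(table(d$Bleaching_Status, d$Site))
})

# Collect the p-value from each month's test
p_values <- sapply(tests_by_month, function(res) res$p.value)
p_values
```

Remember to check the expected counts for each month separately, since splitting the data reduces the count in every cell.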
Open a copy of Moriarty et al. (2023), and navigate to plot C from Fig 2.
Over the next few steps in this question, we’ll see that we can easily reproduce this plot in jamovi (using the coral_data.omv data), highlighting the ability of jamovi to conduct and produce journal article-grade results.
With the data open in jamovi, double-click on the Variables tab and select the Month variable. Notice how the Levels are initially set to April/May, then March, then October? Click on March and click the up arrow next to the Levels box to fix the order.
Change the order of the levels for Site, so they match up with the order shown in plot C from Fig 2. of Moriarty et al. (2023).
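If you are working in R rather than jamovi, the same reordering can be done with the factor function. The sketch below uses a short made-up vector (month) to show the idea. For the real data you would apply it to coral$Month.

```r
# Hypothetical Month values, stored as a factor with default (alphabetical) levels
month <- factor(c("October", "March", "April/May", "March"))
levels(month)

# Re-create the factor with the levels listed in chronological order
month_ordered <- factor(month, levels = c("March", "April/May", "October"))
levels(month_ordered)
```

Only the order of the levels changes; the recorded values themselves stay the same.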
Recheck your stacked Bar Plot - it should now look very similar to plot C from Fig 2. of Moriarty et al. (2023). As a final challenge, try changing the colours so they match those in the paper.
Recall you can access plot options via the three vertical dots button on the top right of the jamovi user interface.
Figure 4.1: Note. From File:Caribbean reef sharks and a lemon shark .jpg, by Albert kok, 2010, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 DEED
Recall the Caribbean Reef Shark example introduced in the DA Topic 6A Lecture.
This data, from Kohler et al. (2023), is available in the file reef_shark_data.omv in this week’s tile on LMS. It contains recorded values for numerous variables. We will focus just on the following variables:
In the lecture, I purposefully ignored a few details in the data, to help simplify the introduction of the concept of Chi-Square Goodness of Fit tests. Inspecting the data, you may notice that some sharks are recorded multiple times - however I have treated them as separate sharks for the purposes of the analyses presented in the lectures.
In this question, I would like you to think about and discuss possible options for dealing with multiple observations per individual, in the context of Chi-Square tests. There are several potential approaches we could take. You may like to discuss options with your ED group members.
Make any adjustments you deem necessary to the reef_shark_data.omv data, and then conduct Chi-Square tests with the aim of obtaining results as close as you can to those presented in the Results section of Kohler et al. (2023).
Namely, conduct Chi-Square Goodness of Fit tests to compare:
If you manage to get any similar (or identical!) results, please make sure to let us know.
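If you prefer to make your adjustments in R, one of several possible approaches is to keep a single record per shark. The sketch below keeps the first record for each tag in a made-up data frame. The column names Tag_ID and Sex and all values are hypothetical, so check them against the actual variable names in reef_shark_data.omv. Keeping the first record is an arbitrary choice made for illustration, not necessarily the approach used by Kohler et al. (2023).

```r
# Hypothetical shark records; Tag_ID and Sex are illustrative column names
toy_sharks <- data.frame(
  Tag_ID = c("T01", "T02", "T02", "T03", "T03", "T03"),
  Sex    = c("F", "M", "M", "F", "F", "F")
)

# Keep only the first record for each Tag_ID
one_per_shark <- toy_sharks[!duplicated(toy_sharks$Tag_ID), ]

nrow(one_per_shark)       # number of individual sharks
table(one_per_shark$Sex)  # counts per category, one record per shark
```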
Kohler et al. (2023) used data for \(n=39\) sharks in their analyses. The Tag IDs for sharks with multiple observations are:
Recall that in DA Computer Lab 1 we introduced a raw, messy data set on dwarf pea plant seedlings, which had been collected as part of an experiment in an LTU BIO1AP lab class in 2022. Figure 5.2 below contains this data.
We have been analysing this data throughout the semester, using the different statistical tests introduced in each DA topic.
Figure 5.1: Note. From File:Prof. Dr. Thomé’s Flora von Deutschland, Österreich und der Schweiz, in Wort und Bild, für Schule und Haus; mit … Tafeln … von Walter Müller (Pl. 453) (7982431787)c.png, by Migula, Walter; Thomé, Otto W., 1888, Wikimedia Commons (https://commons.wikimedia.org/). In the public domain.
To recap, in this experiment dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine if application of GA to the seedlings has an impact.
For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:
The height of the seedlings was then recorded at a later date. The pea plant data in Figure 5.2 has pea plant height (in mm) recordings, for the three treatments, across 7 different benches.
Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).
Figure 5.2: Pea Plant Raw Data
In DA Computer Lab 1 or DA Computer Lab 2 you should have created a data file in jamovi containing the cleaned pea plant data. If for whatever reason you do not have this data file saved, you can find a copy of the data in this week’s tile on LMS, in the file pea_plant_seedlings_data.omv.
As a final extension question, think about how you could apply a Chi-Square test to this pea plant seedling data, and consider:
Discuss your thought process with other students and/or your lab demonstrator.
If you have the time, try conducting the analysis/analyses in jamovi.
You may need to recode some data, and add additional columns to your original pea plant .omv file.
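As a hint for the recoding step, the sketch below shows one way to turn a numeric variable into categories in R using the cut function. The data frame toy_peas, its group labels, its heights and the 100 mm cut-off are all made up for illustration. Choosing a sensible cut-off for the real data is part of the question.

```r
# Hypothetical seedling heights (mm) and placeholder group labels, not the real lab data
toy_peas <- data.frame(
  Treatment = factor(c("Group1", "Group1", "Group2", "Group2", "Group3", "Group3")),
  Height_mm = c(35, 60, 80, 120, 150, 210)
)

# Recode height into two categories using an arbitrary cut-off of 100 mm
toy_peas$Height_Group <- cut(toy_peas$Height_mm,
                             breaks = c(-Inf, 100, Inf),
                             labels = c("Short", "Tall"))

# Two-way table of treatment group against the new height category
height_table <- table(toy_peas$Treatment, toy_peas$Height_Group)
height_table
```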
Before you finish up, make sure to save your Word document to your OneDrive, for future reference.
Kim, H.-Y. (2017). Statistical notes for clinical researchers: Chi-squared test and Fisher’s exact test. Restorative Dentistry & Endodontics, 42(2), 152–155. https://doi.org/10.5395/rde.2017.42.2.152
Kohler, J., Gore, M., Ormond, R., Johnson, B., & Austin, T. (2023). Individual residency behaviours and seasonal long-distance movements in acoustically tagged Caribbean reef sharks in the Cayman Islands. PloS One, 18(11), e0293884. https://doi.org/10.1371/journal.pone.0293884
Moriarty, T., Leggat, W., Heron, S. F., Steinberg, R., & Ainsworth, T. D. (2023). Bleaching, mortality and lengthy recovery on the coral reefs of Lord Howe Island. The 2019 marine heatwave suggests an uncertain future for high-latitude ecosystems. PLOS Climate, 2(4), e0000080. https://doi.org/10.1371/journal.pclm.0000080
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.