Welcome to Computer Lab 6 for the Data Analysis (DA) component of BIO2POS, our final DA computer lab for the semester!
Throughout this semester we have predominantly focused on assessing numeric variables. Our final DA topic, Topic 6, covered Chi-Square tests, which are used to assess categorical data.
The assumptions of these tests were also discussed.
In this computer lab, you will continue to learn how to use the statistical software jamovi, and conduct Chi-Square tests using real data sets. You will also learn how to check the assumptions of these tests in jamovi, and how to interpret and summarise jamovi output for these tests.
These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.
Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).
Having completed this lab, you will be able to conduct the following tests and calculations in jamovi:
You will also be able to interpret the results of the above statistical techniques, check the assumptions of the tests, and provide clear summary statements highlighting the key statistical outputs of the models.
Please complete at least step 1 first, as doing so will help you to better understand the concepts you will need for this computer lab.
Figure 1.1: Note. From File:LordHoweIsland NorthBay Reef 27.JPG, by Toby Hudson, 2012, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 AU DEED
Coral reefs can be severely impacted by anomalies and increases in ocean temperatures. A recent paper by Moriarty et al. (2023) documented and analysed the extent of coral bleaching and subsequent recovery within the Lord Howe Island lagoonal reef over an eight-month period in 2019. You can freely access their paper here.
Data from their study has been formatted for jamovi analyses, and is available in the file coral_data.omv in this week’s tile on LMS. It contains recorded values for the following variables:
Site: Sylph's Hole, North Bay, or Coral Garden
Month: March, April/May, or October
Taxa: Stylophora pistillata, Pocillopora damicornis, Porites spp., Seriatopora hystrix, Isopora cuneata, Acropora spp., or Other taxa
Bleaching_Status: Bleached, Dead, or Healthy
To begin, create a descriptives table in row format for the coral_data.omv data, using the Bleaching_Status variable, and splitting results by Taxa. Also create a bar plot of your results.
Suppose you would like to conduct a Chi-Square Goodness of Fit test to check if the proportions of the different coral species in the Lord Howe Island lagoonal reef region are all the same. Since we have data on six species plus a seventh category for ‘Other taxa’, this would mean we are expecting proportions of approximately 0.14 for each species.
Based on your results from part 1.1, do you think it is reasonable to assume equal proportions across coral species? Explain your reasoning clearly.
Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.
Write out an appropriate null and alternative hypothesis for this test.
Navigate to the Frequencies tab, and select N Outcomes. Since we are interested in the different coral species, drag the Taxa variable across to the Variable box. You should see your Chi-Square Goodness of Fit test results appear automatically.
Click the Expected counts box, to display expected counts for each level (species) of the categorical variable Taxa.
Confirm the Expected Count of 300.286 by hand calculation.
Recall from the Topic 6A Lecture that the expected count will be the sample size divided by the number of levels of your categorical variable, for a Chi-Square Goodness of Fit test.
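As a rough check of this calculation, the sketch below divides a total sample size by the number of levels. The value n = 2102 is an assumption, back-calculated from the stated expected count (300.286 × 7 ≈ 2102) rather than read from coral_data.omv, and the function name is hypothetical. Please confirm the sample size against your own jamovi output.

```python
def equal_expected_count(n_total: int, n_levels: int) -> float:
    """Expected count per level when all proportions are assumed equal."""
    if n_levels <= 0:
        raise ValueError("n_levels must be positive")
    return n_total / n_levels


# Assumption: n = 2102 is inferred from 300.286 * 7, not taken from the data file.
n_total = 2102
n_levels = 7  # six named taxa plus 'Other taxa'

print(round(equal_expected_count(n_total, n_levels), 3))  # 300.286
print(round(1 / n_levels, 3))  # equal proportion per level: 0.143
```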
Write a short, simple summary of your results. Make sure to confirm the test assumptions are satisfied.
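A commonly used rule of thumb for Chi-Square tests is that every expected count should be at least 5. The exact wording of the assumptions is given in the Topic 6 lectures, so the threshold below is an assumption, and the helper name is hypothetical. A minimal sketch of the check:

```python
def expected_counts_ok(expected, minimum=5.0):
    """Return True if every expected count meets the chosen minimum.

    The default minimum of 5 is a common rule of thumb, used here as an assumption.
    """
    return all(e >= minimum for e in expected)


# Equal expected counts for seven levels, as in the simple Goodness of Fit test.
expected = [300.286] * 7
print(expected_counts_ok(expected))
```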
Moriarty et al. (2023) note that the dominant coral species within the Lord Howe Island lagoonal reef are:
Since it is expected these will be more prevalent, let us re-run our Chi-Square Goodness of Fit test, using the extended case where we assign specific expected proportions to each category.
In the Proportion Test (N Outcomes) section of jamovi, expand the Expected Proportions button. You will see that the ratios for all corals are set to 1, making all the proportions equal at 0.143.
Change the ratios for the 4 dominant coral species to 0.2 each, and then set the remaining 2 coral species to 0.075, and Other taxa to 0.05.
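To see how ratios relate to expected proportions and expected counts, the sketch below rescales a set of ratios so they sum to 1, then computes the Goodness of Fit statistic as the sum of (observed − expected)² / expected. The observed counts are made up purely for illustration and are not the coral data; the function names are hypothetical, and no p-value is computed here (jamovi reports that for you).

```python
def ratios_to_proportions(ratios):
    """Rescale ratios so that they sum to 1."""
    total = sum(ratios)
    return [r / total for r in ratios]


def gof_statistic(observed, proportions):
    """Chi-Square Goodness of Fit statistic and the expected counts."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Ratios of 1 for all seven levels give equal proportions of about 0.143.
print([round(p, 3) for p in ratios_to_proportions([1] * 7)])

# Four dominant taxa, two remaining taxa, then Other taxa.
ratios = [0.2, 0.2, 0.2, 0.2, 0.075, 0.075, 0.05]
proportions = ratios_to_proportions(ratios)

# HYPOTHETICAL observed counts (n = 200), for illustration only.
observed = [50, 35, 45, 30, 20, 10, 10]
stat, expected = gof_statistic(observed, proportions)
print(round(stat, 3), "on", len(observed) - 1, "degrees of freedom")
```

Because the expected counts change when the proportions change, the test statistic for the same observed counts will generally differ between the equal-proportions and specified-proportions versions of the test.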
Check your new results, and write a summary of your Chi-Square Goodness of Fit test. Compare the observed test statistic values from the two versions of the test.
Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status, and summarise your results. Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.
Suppose we extend our analysis of the coral_data.omv data, and now take into account the different sites in the Lord Howe Island lagoonal reef.
Specifically, suppose we would like to determine if there is an association between the different sites in the Lord Howe Island lagoonal reef and the health status of the coral in the reef.
To conduct a Chi-Square Test of Association of Bleaching_Status vs Site in jamovi, click on the Frequencies tab and select the Independent Samples option.
Drag Bleaching_Status and Site across to the Rows and Columns boxes respectively. While this will yield test results, we can include additional details via the following steps.
Write out an appropriate null and alternative hypothesis for this test.
Expand the Statistics section and select Phi and Cramer's V under the Nominal heading.
Expand the Cells section and select Observed counts and Expected counts under the Counts heading, and Row under the Percentages heading.
Expand the Plots section, select Bar Plot, and change the Bar Type from Side by side to Stacked.
Summarise your test results. Make sure to check the test assumptions.
To interpret the effect size, you may like to consult, for example, Kim (2017).
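For reference, the quantities jamovi reports for a Test of Association can be sketched by hand. Each expected count is (row total × column total) / n, the statistic sums (observed − expected)² / expected over all cells, and Cramér's V is the square root of χ² / (n × (min(rows, columns) − 1)). The 3 × 3 table below is hypothetical and is not taken from coral_data.omv; the function name is also an assumption.

```python
import math


def association_test(table):
    """Chi-Square statistic, degrees of freedom, Cramér's V and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count for each cell under independence.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))
    return chi2, df, cramers_v, expected


# HYPOTHETICAL counts: rows are three status levels, columns are three sites.
table = [
    [30, 20, 10],
    [10, 10, 10],
    [20, 30, 40],
]
chi2, df, v, expected = association_test(table)
print(round(chi2, 3), df, round(v, 3))
```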
Based on your test results and the bar plot produced, do you agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the least amount of healthy coral colonies?
To gain further supporting information, add the Month variable to the Layers box. This will further split the analysis to consider the different months.
You may like to untick the Expected counts and Row percentages boxes, to make the results more concise.
To gain a different perspective on your data, change the Counts option to Percentages within rows for your bar plot.
Do you prefer this version, or the version from 2.1.4 for interpretative purposes?
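The difference between the two plot versions comes down to counts versus percentages within rows. A minimal sketch of the row-percentage calculation follows, using a hypothetical table (not the coral data) and a hypothetical function name:

```python
def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = []
    for row in table:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot compute percentages for an empty row")
        result.append([100 * cell / total for cell in row])
    return result


# HYPOTHETICAL counts, for illustration only.
table = [
    [30, 20, 10],
    [10, 10, 10],
    [20, 30, 40],
]
for row in row_percentages(table):
    print([round(p, 1) for p in row])
```

Percentages make rows with very different totals easier to compare, while counts keep the sample sizes visible.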
Open a copy of Moriarty et al. (2023), and navigate to plot C from Fig 2.
Over the next few steps in this question, we’ll see that we can easily reproduce this plot in jamovi (using the coral_data.omv data).
With the data open in jamovi, double-click on the Variables tab and select the Month variable. Notice how the Levels are initially set to April/May, then March, then October? Click on March and click the up arrow next to the Levels box to fix the order.
Change the order of the levels for Site, so they match up with the order shown in plot C from Fig 2. of Moriarty et al. (2023).
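The initial level order for Month (April/May, March, October) is consistent with alphabetical sorting. The sketch below contrasts alphabetical order with an explicit chronological order; the values and the function name are illustrative assumptions, not jamovi internals.

```python
def order_levels(values, desired_order):
    """Return the distinct levels of `values`, sorted by a user-supplied order."""
    rank = {level: i for i, level in enumerate(desired_order)}
    return sorted(set(values), key=lambda level: rank[level])


# A few illustrative Month values.
months = ["October", "March", "April/May", "March"]

# Alphabetical order puts April/May before March.
print(sorted(set(months)))

# Explicit chronological order.
print(order_levels(months, ["March", "April/May", "October"]))
```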
Recheck your stacked Bar Plot - it should now be looking very similar to the plot C from Fig 2. of Moriarty et al. (2023). As a final challenge, try changing the colours so they match those in the paper.
Recall you can access plot options via the three vertical dots button on the top right of the jamovi user interface.
Figure 4.1: Note. From File:Caribbean reef sharks and a lemon shark .jpg, by Albert kok, 2010, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 DEED
Recall the Caribbean Reef Shark example introduced in the DA Topic 6A Lecture.
This data, from Kohler et al. (2023), is available in the file reef_shark_data.omv in this week’s tile on LMS. It contains recorded values for numerous variables. We will focus just on the following variables:
In the lecture, I purposefully ignored a few details in the data, to help simplify the introduction of the concept of Chi-Square Goodness of Fit tests. Inspecting the data, you may notice that some sharks are recorded multiple times; however, I have treated them as separate sharks for the purposes of the analyses presented in the lectures.
In this question, I would like you to think about and discuss possible options for dealing with multiple observations per individual, in the context of Chi-Square tests. There are several potential approaches we could take. You may like to discuss options with your ED group members.
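One possible option (among several, and not necessarily the approach taken by Kohler et al.) is to keep a single record per individual so that observations are independent. The sketch below keeps the first record for each tag ID and then tabulates a categorical variable. The field names `tag_id` and `sex`, and all of the records, are hypothetical.

```python
from collections import Counter


def first_record_per_individual(records, id_key="tag_id"):
    """Keep only the first record seen for each individual."""
    seen = set()
    unique = []
    for rec in records:
        if rec[id_key] not in seen:
            seen.add(rec[id_key])
            unique.append(rec)
    return unique


# HYPOTHETICAL records; shark "A1" appears twice.
records = [
    {"tag_id": "A1", "sex": "F"},
    {"tag_id": "B2", "sex": "M"},
    {"tag_id": "A1", "sex": "F"},
    {"tag_id": "C3", "sex": "F"},
]

unique = first_record_per_individual(records)
print(len(unique), Counter(rec["sex"] for rec in unique))
```

Keeping the first record is only one choice; which record to keep (or whether to summarise across records) depends on the variable being analysed.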
Make any adjustments you deem necessary to the reef_shark_data.omv data, and then conduct Chi-Square tests with the aim of obtaining results as close as you can to those presented in the Results section of Kohler et al. (2023).
Namely, conduct Chi-Square Goodness of Fit tests to compare:
If you manage to get any similar (or identical!) results, please make sure to let us know.
Kohler et al. (2023) used data for \(n=39\) sharks in their analyses. The Tag IDs for sharks with multiple observations are:
Recall that in DA Computer Lab 1 we introduced a raw, messy data set on dwarf pea plant seedlings, which had been collected as part of an experiment in an LTU BIO1AP lab class in 2022. Figure 5.2 below contains this data.
We have been analysing this data throughout the semester, using the different statistical tests introduced in each DA topic.
Figure 5.1: Note. From File:Prof. Dr. Thomé’s Flora von Deutschland, Österreich und der Schweiz, in Wort und Bild, für Schule und Haus; mit … Tafeln … von Walter Müller (Pl. 453) (7982431787)c.png, by Migula, Walter; Thomé, Otto W., 1888, Wikimedia Commons (https://commons.wikimedia.org/). In the public domain.
To recap, in this experiment dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine if application of GA to the seedlings has an impact.
For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:
The height of the seedlings was then recorded at a later date. The pea plant data in Figure 5.2 has pea plant height (in mm) recordings, for the three treatments, across 7 different benches.
Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).
Figure 5.2: Pea Plant Raw Data
In DA Computer Lab 1 or DA Computer Lab 2 you should have created a data file in jamovi containing the cleaned pea plant data. If for whatever reason you do not have this data file saved, you can find a copy of the data in this week’s tile on LMS, in the file pea_plant_seedlings_data.omv.
As a final extension question, think about how you could apply a Chi-Square test to this pea plant seedling data, and consider:
Discuss your thought process with other students and/or your lab demonstrator.
If you have the time, try conducting the analysis/analyses in jamovi.
You may need to recode some data, and add additional columns to your original pea plant .omv file.
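As one illustration of recoding, a numeric variable such as height can be turned into a categorical one and then cross-tabulated against treatment group. The sketch below splits heights at the median into "Short" and "Tall". The group labels, heights, cut-off rule and function name are all hypothetical, and other recodings may suit the data better.

```python
from collections import Counter
from statistics import median


def recode_height(heights):
    """Label each height as 'Tall' if above the median, otherwise 'Short'."""
    cut = median(heights)
    return ["Tall" if h > cut else "Short" for h in heights]


# HYPOTHETICAL groups and heights (mm), for illustration only.
groups = ["A", "A", "B", "B", "C", "C"]
heights = [40, 55, 60, 72, 90, 110]

categories = recode_height(heights)

# Count each (group, category) combination, i.e. the cells of a contingency table.
cells = Counter(zip(groups, categories))
for (group, category), count in sorted(cells.items()):
    print(group, category, count)
```

With so few observations the expected counts would be far too small for a Chi-Square test; the point here is only the recoding step.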
Before you finish up, make sure to save your Word document to your OneDrive, for future reference.
Kim, H.-Y. (2017). Statistical notes for clinical researchers: Chi-squared test and Fisher’s exact test. Restorative Dentistry & Endodontics, 42(2), 152–155. https://doi.org/10.5395/rde.2017.42.2.152
Kohler, J., Gore, M., Ormond, R., Johnson, B., & Austin, T. (2023). Individual residency behaviours and seasonal long-distance movements in acoustically tagged Caribbean reef sharks in the Cayman Islands. PLOS ONE, 18(11), e0293884. https://doi.org/10.1371/journal.pone.0293884
Moriarty, T., Leggat, W., Heron, S. F., Steinberg, R., & Ainsworth, T. D. (2023). Bleaching, mortality and lengthy recovery on the coral reefs of Lord Howe Island. The 2019 marine heatwave suggests an uncertain future for high-latitude ecosystems. PLOS Climate, 2(4), e0000080. https://doi.org/10.1371/journal.pclm.0000080
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.