Topic 6: Chi-Square Tests in jamovi


These are the solutions for DA Computer Lab 6.

Please make sure to go over these after the lab session, and finish off any questions you may have missed during the lab.

1 Coral Bleaching - Chi-Square Goodness of Fit test 🌱

Figure 1.1: Note. From File:LordHoweIsland NorthBay Reef 27.JPG, by Toby Hudson, 2012, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 AU DEED

In this question we assessed the coral_data.omv data from Moriarty et al.’s (2023) study of coral bleaching at Lord Howe Island. We focused on the variables:

  • Site: The region in the lagoonal reef where the recording was taken - One of Sylph's Hole, North Bay, or Coral Garden
  • Month: The month in which the recording was taken - One of March, April/May, or October
  • Taxa: The species of coral - One of Stylophora pistillata, Pocillopora damicornis, Porites spp., Seriatopora hystrix, Isopora cuneata, Acropora spp. or Other taxa
  • Bleaching_Status: The health status of the coral - One of Bleached, Dead, or Healthy

1.1

1.2

Based on the previous results, it doesn’t seem reasonable to assume equal proportions across coral species, as we can see that some species were observed far more frequently (e.g. Pocillopora damicornis and Porites spp.) than others (e.g. Acropora spp.).

1.3

Regardless of your previous conclusion, suppose you begin by conducting a simple Chi-Square Goodness of Fit test of coral species’ proportions, under the assumption that proportions are equal across all categories.

1.3.1

Here we would be testing:

\[H_0: \pi_1 = \pi_2 = \cdots = \pi_7 \text{ vs } H_1: \text{ Not all $\pi_i$'s are equal }\]

Here:

  • \(\pi_1\) denotes the population proportion of Stylophora pistillata coral in the Lord Howe Island lagoonal reef
  • \(\pi_2\) denotes the population proportion of Pocillopora damicornis coral in the Lord Howe Island lagoonal reef
  • \(\pi_3\) denotes the population proportion of Porites spp. coral in the Lord Howe Island lagoonal reef
  • \(\pi_4\) denotes the population proportion of Seriatopora hystrix coral in the Lord Howe Island lagoonal reef
  • \(\pi_5\) denotes the population proportion of Isopora cuneata coral in the Lord Howe Island lagoonal reef
  • \(\pi_6\) denotes the population proportion of Acropora spp. coral in the Lord Howe Island lagoonal reef
  • \(\pi_7\) denotes the population proportion of Other taxa coral in the Lord Howe Island lagoonal reef

1.3.2

1.3.3

1.3.4

Here we have a total sample size of \(n = 2102\) (which you can check easily in e.g. the Exploration section). We have \(k = 7\) levels for our categorical variable.

Therefore the expected count will be \(n / k \approx 300.2857\).
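
As a quick check of this arithmetic, the following minimal Python sketch recomputes the expected proportion and expected count per taxon under equal proportions. The variable names are illustrative only and are not part of the lab materials.

```python
# Expected count per category under equal proportions (values from 1.3.4).
n = 2102  # total sample size
k = 7     # number of coral taxa (levels of the Taxa variable)

expected_proportion = 1 / k
expected_count = n / k

print(round(expected_proportion, 3))  # 0.143
print(round(expected_count, 4))       # 300.2857

# Test condition: every expected count should be at least 5.
condition_met = expected_count >= 5
```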

1.3.5

We have conducted a Chi-Square Goodness of Fit test to check if there was a difference in the observed and expected proportions of coral species present in the Lord Howe Island lagoonal reef. Equal proportions of each of the 7 taxa were expected (expected proportion of approximately 0.143 per taxon) with a total sample size of 2102.

The test conditions were satisfied, with all categories having expected counts larger than 5.

A statistically significant difference was found between the proportions of the 7 coral taxa at the \(\alpha = 0.05\) level of significance, with \(\chi^2_6 = 926.449\), \(p < .001\).
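
For reference, the test statistic reported here has the standard goodness-of-fit form. The symbols \(O_i\) and \(E_i\) are notation introduced for this note (they are not used elsewhere in these solutions) and denote the observed and expected counts for taxon \(i\):

\[\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \text{df} = k - 1 = 7 - 1 = 6\]

Under the equal-proportions assumption, every \(E_i = n / k \approx 300.29\), which is why the subscript on \(\chi^2_6\) is 6.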

1.4

No answer required.

1.4.1

1.4.2

We have conducted a Chi-Square Goodness of Fit test to check if there was a difference in the observed and expected proportions of coral species present in the Lord Howe Island lagoonal reef. The total sample size was 2102, and a specific distribution of expected proportions was assumed, with:

  • The expected proportion of Stylophora pistillata coral was 0.2
  • The expected proportion of Pocillopora damicornis coral was 0.2
  • The expected proportion of Porites spp. coral was 0.2
  • The expected proportion of Seriatopora hystrix coral was 0.2
  • The expected proportion of Isopora cuneata coral was 0.075
  • The expected proportion of Acropora spp. coral was 0.075
  • The expected proportion of Other taxa coral was 0.05

The test conditions were satisfied, with all categories having expected counts larger than 5.
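
The expected-count condition can be checked by hand by multiplying each specified proportion by the sample size. The sketch below does this in Python, assuming \(n = 2102\) as in 1.3.4; the dictionary and variable names are illustrative only.

```python
# Expected counts under the specified proportions (values from 1.4.2).
n = 2102  # total sample size, as in 1.3.4

expected_proportions = {
    "Stylophora pistillata": 0.2,
    "Pocillopora damicornis": 0.2,
    "Porites spp.": 0.2,
    "Seriatopora hystrix": 0.2,
    "Isopora cuneata": 0.075,
    "Acropora spp.": 0.075,
    "Other taxa": 0.05,
}

# The specified proportions should sum to 1 (allowing for rounding error).
total = sum(expected_proportions.values())
proportions_valid = abs(total - 1) < 1e-9

expected_counts = {taxon: n * p for taxon, p in expected_proportions.items()}
for taxon, count in expected_counts.items():
    print(f"{taxon}: {count:.2f}")

# Test condition: every expected count should be at least 5.
condition_met = all(count >= 5 for count in expected_counts.values())
```

The smallest expected count is for Other taxa (\(2102 \times 0.05 = 105.1\)), so the condition holds comfortably.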

A statistically significant difference was found between the observed and expected proportions of the 7 coral taxa at the \(\alpha = 0.05\) level of significance, with \(\chi^2_6 = 222.667\), \(p < .001\).

The result has remained statistically significant, but the test statistic has reduced dramatically (from 926.449 to 222.667), suggesting the distribution of proportions specified was closer to the observed distribution of proportions than for the assumed equal proportions case.

1.5

Conduct another Chi-Square Goodness of Fit test, this time using the Bleaching_Status, and summarise your results. Suppose that past results suggest that a typical distribution of proportions is 0.42 for Bleached coral, 0.18 for Dead coral, and 0.4 for Healthy coral.

We have conducted a Chi-Square Goodness of Fit test to check if there was a difference in the observed and expected proportions of the bleaching statuses of coral species present in the Lord Howe Island lagoonal reef. The total sample size was 2102, and a specific distribution of expected proportions was assumed, with:

  • The expected proportion of Bleached coral was 0.42
  • The expected proportion of Dead coral was 0.18
  • The expected proportion of Healthy coral was 0.4

The test conditions were satisfied, with all categories having expected counts larger than 5.

However, the test results were not statistically significant at the \(\alpha = 0.05\) level of significance, with \(\chi^2_2 = 1.682\), \(p = .431\).
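
The p-value for \(\chi^2 = 1.682\) on 2 degrees of freedom can be checked without jamovi. The sketch below is one way to do so using only the Python standard library; it relies on a closed-form upper-tail formula that is valid only for even degrees of freedom, and the function name `chi2_sf_even_df` is made up for this illustration.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form that holds only when df is a positive even integer:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)

# Expected counts for the three bleaching statuses (proportions from 1.5).
n = 2102
proportions = {"Bleached": 0.42, "Dead": 0.18, "Healthy": 0.4}
expected_counts = {status: n * p for status, p in proportions.items()}

# p-value for the reported test statistic with df = 3 - 1 = 2.
p_value = chi2_sf_even_df(1.682, df=2)
print(round(p_value, 3))  # 0.431
```

Since the p-value is well above 0.05, we do not reject the null hypothesis that the bleaching statuses follow the specified distribution.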


2 Coral Bleaching - Chi-Square Test of Association 🌱

2.1

2.1.1

Our null hypothesis is that there is no association between bleaching status and reef site, while our alternate hypothesis is that there is an association between the two variables. I.e.:

\[H_0: \text{ There is no association between the variables bleaching status and site, vs. } \\ H_1: \text{ There is an association between the variables bleaching status and site}\]
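
For reference, under \(H_0\) the expected count in each cell of the contingency table is found from the row and column totals. The symbols \(R_i\), \(C_j\), \(O_{ij}\) and \(E_{ij}\) are notation introduced for this note, denoting the total for row \(i\), the total for column \(j\), and the observed and expected counts in cell \((i, j)\):

\[E_{ij} = \frac{R_i \times C_j}{n}, \qquad \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \text{df} = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4\]

Here \(r = 3\) bleaching statuses and \(c = 3\) sites.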

2.1.2

2.1.3

2.1.4

2.2

We have conducted a Chi-Square Test of Association to determine if there is an association between bleaching status and reef site in the Lord Howe Island lagoonal reef.

Some key descriptive details include: 62% of all bleached coral and 77.3% of all dead coral were observed at Sylph’s Hole, while 50% of all healthy coral observations were at North Bay.

A statistically and clinically significant association was found between bleaching status and reef site, at the \(\alpha = 0.05\) level of significance, with \(\chi^2_4 = 551.567\), \(n = 2102\), \(p < .001\), and a large effect size, with Cramer’s \(V = 0.362\).
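
The reported effect size can be reproduced from the test statistic and sample size. The following short Python sketch applies the usual formula for Cramér's V; the variable names are illustrative only.

```python
import math

chi2 = 551.567     # test statistic from the jamovi output
n = 2102           # total sample size
rows, cols = 3, 3  # bleaching status (3 levels) by site (3 levels)

# Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
print(round(cramers_v, 3))  # 0.362
```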

It appears that Sylph’s Hole is more likely to have bleached or dead coral than Coral Garden or North Bay. Coral Garden is less likely to have bleached or dead coral than the other sites.

Note that if we further segment our data by month (in the following questions), we find that this association between bleaching status and reef site holds true across the 8-month period.

2.3

Yes, it seems reasonable to agree with Moriarty et al.’s (2023) conclusion that Sylph’s Hole consistently has the fewest healthy coral colonies. It clearly has the most bleached and dead coral, and results across the different months all show statistically significant differences in bleaching status across sites.

2.3.1

2.3.2

A preference here is down to personal opinion.


3 Coral Bleaching - Reproducible Research 🌱

3.1

3.1.1

3.1.2

Settings to produce the plot are also shown below:

Note that you could go further and e.g. also change the variable name for taxa to more closely resemble plot C from Moriarty et al. (2023).


4 Caribbean Reef Sharks 🌳

Figure 4.1: Note. From File:Caribbean reef sharks and a lemon shark .jpg, by Albert kok, 2010, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 3.0 DEED

In this question we assessed the Caribbean Reef Shark data from Kohler et al. (2023). We focused on the variables:

  • Tag_ID: The tag ID given to each shark
  • Sex: The sex of the shark (F = female, M = male)
  • Mat_stage: The maturity of the shark (IM = immature, M = mature)
  • Tagging_Isl: The island at which the shark was tagged (GC = Grand Cayman, LC = Little Cayman, CB = Cayman Brac)

4.1

We have not covered data interpolation or combination in any great detail in the BIO2POS DA content, and a detailed discussion of this part of data analysis is beyond the scope of the subject, so this question is intended mainly to stimulate discussion and thought.

If our data were numeric here, we would have several potential options, e.g.:

  1. Take the average of the results for each shark who has multiple observations
  2. Remove duplicate observations (but this is ignoring data, which is not generally recommended)
  3. Take a weighted average based on other data (e.g. if the maturity changes from IM to M, perhaps weight the M result more heavily?)

However, if we check the data, we see that the sharks with multiple recordings appear to simply have duplicate recordings - all the pairs of recorded values match for each shark, so it would seem reasonable to simply remove the duplicates, bringing us to \(n=39\).
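
The duplicate-removal step could be sketched as follows in Python. The records below are hypothetical and made up purely for illustration (they are not the Kohler et al. data), and `deduplicate` is an illustrative helper rather than anything provided by jamovi. The helper keeps one row per Tag_ID and raises an error if two rows for the same shark disagree, since in that case simply dropping a row would discard information.

```python
# Hypothetical records for illustration only; NOT the Kohler et al. (2023) data.
records = [
    {"Tag_ID": "A01", "Sex": "F", "Mat_stage": "M", "Tagging_Isl": "GC"},
    {"Tag_ID": "A01", "Sex": "F", "Mat_stage": "M", "Tagging_Isl": "GC"},
    {"Tag_ID": "A02", "Sex": "M", "Mat_stage": "IM", "Tagging_Isl": "LC"},
    {"Tag_ID": "A03", "Sex": "F", "Mat_stage": "IM", "Tagging_Isl": "CB"},
    {"Tag_ID": "A03", "Sex": "F", "Mat_stage": "IM", "Tagging_Isl": "CB"},
]

def deduplicate(rows, key="Tag_ID"):
    """Keep one row per key, refusing to drop rows that disagree."""
    unique = {}
    for row in rows:
        # setdefault stores the first row seen for this key and returns it.
        seen = unique.setdefault(row[key], row)
        if seen != row:
            raise ValueError(f"conflicting records for {row[key]}")
    return list(unique.values())

sharks = deduplicate(records)
print(len(sharks))  # 3
```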

If we run our Chi-Square Goodness of Fit tests with the reduced data set though, we still do not obtain identical results to Kohler et al. (2023) - it would be interesting to know what steps they took.

Example output is shown below:


5 Pea Plant Data 🌳

Recall that in DA Computer Lab 1 we introduced a raw, messy data set on dwarf pea plant seedlings, which had been collected as part of an experiment in an LTU BIO1AP lab class in 2022. Figure 5.2 below contains this data.

We have been analysing this data throughout the semester, using the different statistical tests introduced in each DA topic.

Figure 5.1: Note. From File:Prof. Dr. Thomé’s Flora von Deutschland, Österreich und der Schweiz, in Wort und Bild, für Schule und Haus; mit … Tafeln … von Walter Müller (Pl. 453) (7982431787)c.png, by Migula, Walter; Thomé, Otto W., 1888, Wikimedia Commons (https://commons.wikimedia.org/). In the public domain.

Background Information

To recap, in this experiment dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine if application of GA to the seedlings has an impact.

For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:

  • C: a control group, sprayed with water
  • TA: a treatment group, sprayed with a 25 mg/L solution of GA
  • TB: a treatment group, sprayed with a 50 mg/L solution of GA

The height of the seedlings was then recorded at a later date. The pea plant data in Figure 5.2 has pea plant height (in mm) recordings, for the three treatments, across 7 different benches.

Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).

Figure 5.2: Pea Plant Raw Data

5.1

In DA Computer Lab 1 or DA Computer Lab 2 you should have created a data file in jamovi containing the cleaned pea plant data. If for whatever reason you do not have this data file saved, you can find a copy of the data in this week’s tile on LMS, in the file pea_plant_seedlings_data.omv.

5.2

No answer required. Discuss your thought processes with other students and/or your lab demonstrator.

Analyses you may have considered include checking whether the proportion of seedlings given each treatment was equal across benches, and whether there is an association between the bench number and the distribution of seedlings that survived a given treatment. If we treat the treatment variable as ordinal (ordered by the strength of the GA solution used), it may be more reasonable to report the gamma effect size.
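
To illustrate what the gamma effect size measures, here is a minimal Python sketch of Goodman and Kruskal's gamma for a contingency table whose rows and columns are both ordered. The function name and the example counts are hypothetical (the counts are NOT the pea plant data), and jamovi's own implementation may differ in its details.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns.

    C counts concordant pairs and D counts discordant pairs of observations.
    Raises ZeroDivisionError if there are no concordant or discordant pairs.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare cell (i, j) with every cell in a later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical counts for illustration only (NOT the pea plant data):
# rows = treatment ordered by GA strength (C, TA, TB),
# columns = some ordered second variable with three levels.
example = [
    [6, 3, 1],
    [3, 4, 3],
    [1, 3, 6],
]
gamma = goodman_kruskal_gamma(example)
```

Gamma ranges from -1 to 1, with values near 0 indicating little ordinal association between the two variables.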

You may need to recode some data, and add additional columns to your original pea plant .omv file.


Congratulations, that’s the end of the final BIO2POS DA computer lab!


References

  • Kohler, J., Gore, M., Ormond, R., Johnson, B., & Austin, T. (2023). Individual residency behaviours and seasonal long-distance movements in acoustically tagged Caribbean reef sharks in the Cayman Islands. PLOS ONE, 18(11), e0293884. https://doi.org/10.1371/journal.pone.0293884

  • Moriarty, T., Leggat, W., Heron, S. F., Steinberg, R., & Ainsworth, T. D. (2023). Bleaching, mortality and lengthy recovery on the coral reefs of Lord Howe Island. The 2019 marine heatwave suggests an uncertain future for high-latitude ecosystems. PLOS Climate, 2(4), e0000080. https://doi.org/10.1371/journal.pclm.0000080


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.