Welcome to Computer Lab 2 for the Data Analysis (DA) component of BIO2POS!
In DA Topic 2, we introduced the concept of the \(t\)-distribution, and covered how to conduct one sample \(t\)-tests, paired \(t\)-tests, and two sample \(t\)-tests. We also went over the assumptions of these different tests, and outlined the non-parametric tests we could use in situations where the \(t\)-test assumptions were violated.
In this computer lab, you will continue to learn how to use the statistical software jamovi, and conduct various \(t\)-tests and the equivalent non-parametric tests using real data sets. You will also learn how to interpret and summarise jamovi output for these tests.
These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.
Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).
Having completed this lab, you will be able to obtain and discuss the following statistical outputs and results for a data set in jamovi:
Before you begin, please check the following:
Please aim to complete Step 1 before starting this lab, as doing so will help you to better understand the content covered. Please aim to complete Step 2 before the next week of DA content.
To begin, we will return to the red crab data, collected by Green (1997), which we began analysing in DA Computer Lab 1. However, in this lab we will assess an extended version of this data set, which contains recorded values for the following variables:
Figure 1.1: Note. From File:Christmas Island Red Crab.jpg, by ChrisBrayPhotography, 2017, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 4.0 DEED
This red crab data is available in the DA Topic 2 tile on LMS, in the file crab_data_extended.omv. Download this file now, and save it on your computer. Also open up a Word document, in which you can write down your responses and save your jamovi output as you work through the lab.
You may like to save the data and Word document to your OneDrive, so you can access them easily at a later date.
Open up jamovi, and click on the burger menu (the three horizontal white bars) on the top left. You should see a side panel appear. Click on Open, and load in the crab_data_extended.omv file.
If you would like to analyse this red crab data in R, you can download either the crab_data_extended.omv or crab_data_extended.csv file in the DA Topic 2 tile on LMS.
Also open up a text document (e.g. a Word document), in which you can write down your responses and save your R output as you work through the lab. Alternatively, you can simply save your R script, as the code you write is reproducible.
It is recommended that you save all your lab work, e.g. on OneDrive, so that you can access it easily at a later date.
Open up RStudio, and open a new R script (Ctrl + Shift + N in Windows, or go to File -> New File -> R Script).
Recall that to open .omv files in R, we’ll need the jmvReadWrite package installed and loaded. If you are not sure whether you have this package, copy-paste and then run the following code in your R script, one line at a time.
install.packages("jmvReadWrite") # this line installs the package we need
library(jmvReadWrite) # this line loads the package in our current session
# Note that anything after a # is called a comment in R, and isn't treated as executable code
To run a line of code in RStudio, just have your cursor on that line, and click the Run Selected Line(s) button at the top right of the script (where the green arrow is, see reference image below). Your line of code will then be run, or executed, and you should see the code and some other output appear in the Console section below your script file.
Set your working directory to where you downloaded the crab data - if you’re not sure how to do this, just expand the Details box below.
Recall that to set your Working Directory (where R looks for files), the two simplest options in RStudio are:
Then:
crab_data_extended <- read_omv("crab_data_extended.omv")
# This line loads our crab data set into RStudio,
# and stores the data in an object we've called crab_data_extended
You should now see crab_data_extended listed in the Environment section of RStudio in the top right - this means the data is loaded in RStudio, and ready for analysis!
Alternatively, if you wanted to use the .csv file, you can simply use the following code:
crab_data_extended <- read.csv("crab_data_extended.csv", header = TRUE)
# This line loads our crab data set into RStudio,
# and stores the data in an object we've called crab_data_extended
# The header = TRUE part ensures the column names (e.g. CW) are treated as names rather than observations
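After loading a data set, it is worth checking that the columns were read in as expected. The sketch below is self-contained: it writes a tiny, made-up CSV to a temporary file (the values, the file and the name toy_data are illustrative only, not the real crab data), and then reads it back in the same way as above.

```r
# Write a tiny, made-up CSV to a temporary file (illustrative values only)
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("CLAW,SEX", "41.2,Male", "38.5,Female", "40.1,Male"), tmp_file)

# Read it back in; header = TRUE treats the first row as column names
toy_data <- read.csv(tmp_file, header = TRUE)

str(toy_data)   # column names and the type of each column
head(toy_data)  # the first few rows
nrow(toy_data)  # the number of observations
```

Running str() and head() on crab_data_extended in the same way should show the variable names and a few rows of the real data.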
Suppose that research on a similar variety of crab has previously established that the mean (i.e. average) claw length of those crabs was 40mm. We would like to determine whether the data recorded in crab_data_extended aligns with this result.
Over the next few steps, we will conduct a one sample \(t\)-test in jamovi, to determine whether the mean claw length of the red crabs sample we have is also 40mm.
For this question you can assume that the claw lengths are known to come from a normal distribution. As you progress, copy the relevant output into your Word document.
If you would like to refresh your memory on one sample t-tests, check the Topic 2A Lecture.
To begin, click on the Analyses tab, and then click on T-Tests and select One Sample T-Test.
Since we are interested in the CLAW length of the crabs, drag the CLAW variable across to the Dependent Variables box. You will see that some automatic results will already appear in the Results section.
Under the Hypothesis heading, change the test value to 40, since this is our fixed reference value in this instance.
Under the Additional Statistics heading, select Mean difference, Confidence interval and Effect size.
Interpret the mean difference value presented in the output.
Under the Additional Statistics heading, also select Descriptives and use this information to compute the Cohen’s \(d\) effect size by hand.
Confirm your result matches the jamovi output (do not worry if it is slightly different, as this may be due to rounding).
See slides 11-12 of the Topic 2A Lecture for effect size details.
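As a reminder, one common form of Cohen's \(d\) for a one sample test is sketched below. This is the standard textbook definition rather than a copy of the lecture slide, so please check it against slides 11-12.

```latex
d = \frac{\bar{x} - \mu_0}{s}
```

Here \(\bar{x}\) is the sample mean, \(\mu_0\) is the reference value (40 in this question), and \(s\) is the sample standard deviation. The sample mean and standard deviation both appear in the Descriptives output.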
It is important to note that for jamovi one sample t-test output, the confidence interval reported is an interval for the difference from the specified reference value.
If our specified reference value is 0 then, as we might expect, the confidence interval will simply be for the parameter itself (e.g. a range for the likely population mean CLAW value). However, if we specify a non-zero reference value, as we have done above in 2.0.2, the confidence interval values shown should either be added to the reference value, or interpreted in the context of being the likely range for the mean difference.
To understand this better, try changing the test value back to 0, and compare how the results change.
Check the one sample \(t\)-test assumptions via:
1. A histogram of the CLAW observations, from the Exploration section
2. The Shapiro-Wilk normality test
3. A Normal Q-Q plot
Checks 2. and 3. can be selected under the Assumption Checks heading in the One Sample T-Test section.
If you do find that the assumptions are violated, you can conduct the non-parametric Wilcoxon Signed Rank test by selecting Wilcoxon rank under the Tests heading in the One Sample T-Test section.
Write a clear summary based on your \(t\)-test output, in the style presented in the lectures. Regardless of your findings in 2.0.6, assume that the relevant test assumptions have been satisfied.
Make sure to include the effect size and a 95% confidence interval (either of the CLAW length itself, or of the difference between the CLAW sample mean and the reference value).
Over the next few steps, we will conduct a one sample \(t\)-test in R, to determine whether the mean claw length of the red crabs sample we have is 40mm.
For this question you can assume that the claw lengths are known to come from a normal distribution. As you progress, copy the relevant output into your Word document.
If you would like to refresh your memory on one sample t-tests, check the Topic 2A Lecture.
Running statistical tests in R is quite different to the jamovi method. There is less structure, in the sense that we don’t have sections to navigate to, or buttons to select. On the other hand, we have more control over which tests we conduct, and the specifications within these tests.
To conduct a t-test in R, we can use the built-in t.test function.
We can access details about the function, and how to use it, by running the command ?t.test or, equivalently, help(t.test).
The structure of this function is as follows:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
Here, all the components within the ( ) brackets are the arguments of the t.test function. Many of these arguments have pre-specified values - what this means is that, if we don’t specify different values, R will use the default settings (e.g. conf.level = 0.95 by default).
To begin, copy-paste or type the following code into your R script:
t.test(crab_data_extended$CLAW)
Here, all we have done is tell R that we want to conduct a t-test on the CLAW variable from our crab data.
The next step is to specify our null hypothesis, \(H_0\). In this instance, we are testing \(H_0: \mu = 40\).
Therefore, let’s add the argument mu = 40 to our t.test code, so that we have:
t.test(crab_data_extended$CLAW, mu = 40)
This replaces the default mu = 0 specification.
Since we are conducting a one-sample t-test, we can ignore the y, paired and var.equal arguments. Given we would like to conduct the test as a two-sided test (i.e. \(H_1: \mu \neq 40\)) at the 5% level of significance (i.e. \(\alpha = 0.05\)), we can either leave the alternative and conf.level arguments as their defaults, or explicitly specify them, as follows:
t.test(crab_data_extended$CLAW, mu = 40, alternative = "two.sided", conf.level = 0.95)
Run this code now, and take a look over the output - what do you observe?
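If you would like to pull individual values out of the output (e.g. to copy them into your summary), one option is to store the test result in an object. The sketch below uses a short made-up vector, toy_claw, purely for illustration. These are not the real crab measurements, and the object names are our own.

```r
# Made-up claw lengths in mm (illustrative only, not the real crab data)
toy_claw <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)

# Store the test result in an object, rather than only printing it
toy_result <- t.test(toy_claw, mu = 40, alternative = "two.sided", conf.level = 0.95)

toy_result$statistic  # the t test statistic
toy_result$parameter  # the degrees of freedom (n - 1)
toy_result$p.value    # the p-value
toy_result$estimate   # the sample mean
toy_result$conf.int   # the 95% confidence interval for the population mean
```

The same approach should work with crab_data_extended$CLAW in place of toy_claw.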
Using the R functions mean and sd, compute the mean and standard deviation of the crabs’ claw values, respectively. Then, with these data, compute the Cohen’s \(d\) effect size by hand.
See slides 11-12 of the Topic 2A Lecture for effect size details.
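A minimal sketch of this calculation is shown below, assuming the usual one sample definition of Cohen's \(d\) (the mean difference divided by the sample standard deviation). The vector toy_claw contains made-up values for illustration only, so you will need to swap in crab_data_extended$CLAW to answer the question.

```r
# Made-up claw lengths in mm (illustrative only, not the real crab data)
toy_claw <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)
reference_value <- 40  # the value specified in the null hypothesis

toy_mean <- mean(toy_claw)  # sample mean
toy_sd <- sd(toy_claw)      # sample standard deviation

# Cohen's d: how many standard deviations the sample mean is from the reference value
cohens_d <- (toy_mean - reference_value) / toy_sd
cohens_d
```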
Unlike in jamovi, in R the reported confidence interval for our t-test will always simply be for the parameter itself (e.g. a range for the likely population mean CLAW value).
Write out the confidence interval now.
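If you would like the interval for the difference from the reference value instead (as jamovi reports), you can subtract the reference value from both ends of the interval. The sketch below illustrates this with a made-up vector, toy_claw, which is not the real crab data.

```r
# Made-up claw lengths in mm (illustrative only, not the real crab data)
toy_claw <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)
reference_value <- 40

toy_result <- t.test(toy_claw, mu = reference_value, conf.level = 0.95)

ci_mean <- toy_result$conf.int              # interval for the population mean
ci_difference <- ci_mean - reference_value  # interval for (mean - reference value)

ci_mean
ci_difference
```

The two intervals carry the same information. The second is simply shifted so that a value of 0 corresponds to no difference from the reference value.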
The one sample \(t\)-test has several assumptions. We can check these via the following:
1. A histogram of the CLAW observations
2. A Normal Q-Q plot
3. The Shapiro-Wilk normality test
To do so in R is a little harder than in jamovi, but we’ll go through all the steps together here.
To create the histogram, we have several options, but the simplest would be the following:
hist(crab_data_extended$CLAW)
To create a Normal Q-Q plot, we can use the following code:
qqnorm(crab_data_extended$CLAW)
qqline(crab_data_extended$CLAW)
To conduct the Shapiro-Wilk Normality Test, we can use the built-in shapiro.test function, as follows:
shapiro.test(crab_data_extended$CLAW)
If you do find that the assumptions are violated, you can conduct the non-parametric Wilcoxon Signed Rank test via the wilcox.test function.
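The wilcox.test function takes similar arguments to t.test, including mu for the reference value. A minimal sketch, using a made-up vector toy_claw rather than the real crab data, is shown below.

```r
# Made-up claw lengths in mm (illustrative only, not the real crab data)
toy_claw <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)

# One sample Wilcoxon Signed Rank test against a reference value of 40
toy_wilcox <- wilcox.test(toy_claw, mu = 40, alternative = "two.sided")

toy_wilcox$statistic  # the V statistic (sum of the ranks of the positive differences)
toy_wilcox$p.value    # the p-value
```

Note that if the real data contains tied values, R may print a warning that an exact p-value cannot be computed. In that case it uses a normal approximation instead.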
Write a clear summary based on your \(t\)-test output, in the style presented in the lectures. Regardless of your findings in 2.0.6, assume that the relevant test assumptions have been satisfied.
Make sure to include the effect size and a 95% confidence interval for the population mean CLAW length in your summary.
Suppose we are also interested in determining whether the average lengths of red crabs’ two claws differ.
For this question you can assume that the differences in claw lengths are known to come from a normal distribution. Copy the relevant output into your Word document.
When conducting paired t-tests, we technically need two observations/measurements of the same unit under different conditions/timepoints.
While we don’t technically have paired data for our red crabs, we do have measurements for their two claws (the CLAW and OtherClaw variables), which we’ll treat as repeated measurements here, simply so that we can go through the paired t-test process.
Using the CLAW and OtherClaw variables (which we’ll treat as repeated measurements here), let’s conduct a paired \(t\)-test in jamovi to test whether there is a difference in the mean length of the red crabs’ claws.
To begin, click on the Analyses tab, and then click on T-Tests and select Paired Samples T-Test. Then drag the CLAW and OtherClaw variables across into the Paired Variables box.
Are the paired \(t\)-test assumptions satisfied? Explain, with reference to the appropriate results.
Write a clear summary based on your paired \(t\)-test output, in the style presented in the lectures. Make sure to include the effect size and confidence interval.
Provide a clear interpretation of the confidence interval produced as part of the paired \(t\)-test. Make sure to consider why this confidence interval supports your conclusion above.
Using the CLAW and OtherClaw variables (which we’ll treat as repeated measurements here), let’s conduct a paired \(t\)-test in R to test whether there is a difference in the mean length of the red crabs’ claws.
The good news is that we can continue to use the t.test function introduced in the previous question - we just need to adjust our inputs. Since we now have the two variables, we’ll need to add both in, and also update the paired = FALSE argument to paired = TRUE, as follows:
t.test(crab_data_extended$CLAW, crab_data_extended$OtherClaw, mu = 0, alternative = "two.sided", paired = TRUE, conf.level = 0.95)
Note that we’ve changed mu to be equal to 0. This specifies the null hypothesis that the mean of the paired differences is 0, which is equivalent to saying that the mean claw lengths are equal.
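To see why the paired test is really a test on the differences, the sketch below runs a paired \(t\)-test and a one sample \(t\)-test on the differences, using two short made-up vectors (toy_claw and toy_other, which are illustrative only and not the real crab data). Both approaches should give the same test statistic and p-value.

```r
# Made-up claw lengths in mm for the same eight crabs (illustrative only)
toy_claw  <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)
toy_other <- c(37.5, 41.9, 38.6, 41.0, 40.9, 36.8, 39.5, 39.3)

# Paired t-test on the two sets of measurements
paired_result <- t.test(toy_claw, toy_other, mu = 0, paired = TRUE)

# One sample t-test on the differences
difference_result <- t.test(toy_claw - toy_other, mu = 0)

paired_result$statistic
difference_result$statistic
```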
The main assumption to check here is normality, which we can do via the Shapiro-Wilk test again. However, remember that we’ll be testing the set of differences between the paired values, so we’ll need to calculate them first. Read over the code below, and then run it to conduct the test:
differences <- crab_data_extended$CLAW - crab_data_extended$OtherClaw
shapiro.test(differences)
Write a clear summary based on your paired \(t\)-test output, in the style presented in the lectures. Make sure to include the effect size and confidence interval.
Provide a clear interpretation of the confidence interval produced as part of the paired \(t\)-test. Make sure to consider why this confidence interval supports your conclusion above.
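The t.test output in R does not include an effect size, so it needs to be computed separately. The sketch below assumes one common definition of Cohen's \(d\) for paired data (the mean of the differences divided by the standard deviation of the differences), so please check this against the lecture definition. The two vectors are made-up values for illustration, not the real crab data.

```r
# Made-up claw lengths in mm for the same eight crabs (illustrative only)
toy_claw  <- c(38.2, 41.5, 39.8, 42.1, 40.6, 37.6, 41.0, 39.1)
toy_other <- c(37.5, 41.9, 38.6, 41.0, 40.9, 36.8, 39.5, 39.3)

# The paired differences, one per crab
toy_differences <- toy_claw - toy_other

# Cohen's d for paired data: mean difference in units of the SD of the differences
paired_d <- mean(toy_differences) / sd(toy_differences)
paired_d
```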
To conclude our red crab study and \(t\)-tests overview, suppose we are interested in comparing the mean weights of male and female red crabs. Since each crab is an independent unit, the most appropriate test here would be an independent samples (also known as two sample) \(t\)-test.
For this question you can assume that the weights are known to come from a normal distribution. Copy the relevant output into your Word document.
The terms Independent Samples \(t\)-test and Two Sample \(t\)-test are synonymous.
To conduct a two sample \(t\)-test in jamovi, begin by clicking on the Analyses tab, and then click on T-Tests and select Independent Samples T-Test.
Since we are interested in comparing the mean weight of male and female red crabs, drag the WEIGHT variable across to the Dependent Variables box, and drag the SEX variable across to the Grouping Variable box - that’s it!
As part of your analysis, conduct a Levene’s test to check the equal variances assumption, by selecting the Homogeneity test box under the Assumption Checks heading.
Based on the test result, should you use the Student’s or Welch’s version of the two sample \(t\)-test, and why? Make sure to select the appropriate box under the Tests heading.
Write a clear summary based on your independent samples \(t\)-test output, in the style presented in the lectures. Make sure to include the effect size and confidence interval for the mean difference.
Suppose that you have concerns about the test assumptions for your two sample \(t\)-test. Repeat your comparison of the weights of the male crabs and female crabs, this time using the non-parametric Mann-Whitney U test. Produce the relevant output, save a copy in your Word document, and write a clear summary, in the style presented in the lectures.
As part of this question, you may like to produce assumption check results, such as a Normal Q-Q plot.
To conduct a two sample \(t\)-test in R, we can continue to use the versatile t.test function - we just need to adjust some of the argument specifications.
Since we are interested in comparing the mean weight of male and female red crabs, we need to specify both the WEIGHT and SEX variables in our code. To begin with, we will also assume the variances between the two groups are equal. Run the code below in R to conduct the two sample \(t\)-test:
t.test(crab_data_extended$WEIGHT ~ crab_data_extended$SEX, var.equal = TRUE, conf.level = 0.95)
Note here that for the first argument, we are using the format variable of interest - split by - grouping variable with the ~ symbol used to split the data by our grouping variable.
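The same formula can also be written using just the column names, with the data frame passed in via the data argument. The sketch below shows both forms on a small made-up data frame, toy_crabs (illustrative values only, not the real crab data), and they should give identical results.

```r
# Made-up weights for five male and five female crabs (illustrative only)
toy_crabs <- data.frame(
  WEIGHT = c(152, 148, 160, 171, 139, 128, 135, 142, 120, 131),
  SEX = c(rep("Male", 5), rep("Female", 5))
)

# The form used above: each variable is referred to with the $ symbol
long_form <- t.test(toy_crabs$WEIGHT ~ toy_crabs$SEX, var.equal = TRUE)

# An equivalent form: column names only, plus the data argument
short_form <- t.test(WEIGHT ~ SEX, data = toy_crabs, var.equal = TRUE)

long_form$statistic
short_form$statistic
short_form$estimate  # the two group means
```

Note that R orders the groups alphabetically, so in this sketch the reported difference is the Female mean minus the Male mean. This affects the sign of the test statistic and of the confidence interval.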
As part of our analysis, we should conduct a Levene’s test to check the equal variances assumption. In R, this can be done with the leveneTest function, which is contained within the car package. The car package is not part of base R, so if you have not used it before, run install.packages("car") once to install it. We then need to load the car package before we can use the leveneTest function.
Run the following code, one line at a time, to conduct the test:
library(car)
leveneTest(crab_data_extended$WEIGHT ~ as.factor(crab_data_extended$SEX))
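To get a feel for what this test is doing, the sketch below reproduces the idea using only base R. It assumes the median-centred version of Levene's test (also called the Brown-Forsythe test), which to our understanding is the default in leveneTest. The data frame toy_crabs contains made-up values for illustration, not the real crab data.

```r
# Made-up weights for five male and five female crabs (illustrative only)
toy_crabs <- data.frame(
  WEIGHT = c(152, 148, 160, 171, 139, 128, 135, 142, 120, 131),
  SEX = c(rep("Male", 5), rep("Female", 5))
)

# Step 1: the median weight of each crab's own group
group_medians <- ave(toy_crabs$WEIGHT, toy_crabs$SEX, FUN = median)

# Step 2: how far each weight is from its group median
abs_deviation <- abs(toy_crabs$WEIGHT - group_medians)

# Step 3: a one-way ANOVA comparing the mean absolute deviation between groups
levene_sketch <- anova(lm(abs_deviation ~ as.factor(toy_crabs$SEX)))
levene_sketch
```

If one group is much more spread out than the other, its absolute deviations will tend to be larger, and the test will return a small p-value.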
Based on the test result, should you use the Student’s or Welch’s version of the two sample \(t\)-test, and why?
If you decide to use the Welch’s version of the two sample \(t\)-test, adjust the var.equal = TRUE argument to var.equal = FALSE in your previous t.test code, and rerun the test.
Write a clear summary based on your independent samples \(t\)-test output, in the style presented in the lectures. Make sure to include the confidence interval for the mean difference.
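If you would also like to report an effect size, as in the jamovi version of this question, it can be computed by hand. The sketch below assumes the pooled standard deviation form of Cohen's \(d\), which pairs with the Student's (equal variances) version of the test, so please check it against the lecture definition. The data frame toy_crabs contains made-up values, not the real crab data.

```r
# Made-up weights for five male and five female crabs (illustrative only)
toy_crabs <- data.frame(
  WEIGHT = c(152, 148, 160, 171, 139, 128, 135, 142, 120, 131),
  SEX = c(rep("Male", 5), rep("Female", 5))
)

toy_male <- toy_crabs$WEIGHT[toy_crabs$SEX == "Male"]
toy_female <- toy_crabs$WEIGHT[toy_crabs$SEX == "Female"]
n_male <- length(toy_male)
n_female <- length(toy_female)

# Pooled standard deviation: a weighted average of the two group variances
pooled_sd <- sqrt(((n_male - 1) * var(toy_male) + (n_female - 1) * var(toy_female)) /
                    (n_male + n_female - 2))

# Cohen's d: the difference in group means, in units of the pooled SD
two_sample_d <- (mean(toy_male) - mean(toy_female)) / pooled_sd
two_sample_d
```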
Suppose that you have concerns about the test assumptions for your two sample \(t\)-test. Repeat your comparison of the weights of the male crabs and female crabs, this time using the non-parametric Mann-Whitney U test.
In R, the wilcox.test function is quite versatile, and can also be used to conduct the Mann-Whitney U test, so long as we specify the arguments correctly - two groups of data need to be provided, rather than one. The main trick here is that both groups must be numeric, so if we are splitting the data by a categorical variable (Male or Female), we need to set this up carefully. Read over the code below, and then run it to carry out the test:
male_crab_weights <- crab_data_extended[crab_data_extended$SEX == "Male", ]$WEIGHT
female_crab_weights <- crab_data_extended[crab_data_extended$SEX == "Female", ]$WEIGHT
wilcox.test(male_crab_weights, female_crab_weights)
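As with t.test, the wilcox.test function also accepts the formula format with the ~ symbol, which avoids having to split the data by hand. The sketch below compares the two approaches on a small made-up data frame, toy_crabs (illustrative values only, not the real crab data).

```r
# Made-up weights for five male and five female crabs (illustrative only)
toy_crabs <- data.frame(
  WEIGHT = c(152, 148, 160, 171, 139, 128, 135, 142, 120, 131),
  SEX = c(rep("Male", 5), rep("Female", 5))
)

# Approach 1: split the weights into two numeric vectors first
toy_male <- toy_crabs[toy_crabs$SEX == "Male", ]$WEIGHT
toy_female <- toy_crabs[toy_crabs$SEX == "Female", ]$WEIGHT
vector_form <- wilcox.test(toy_male, toy_female)

# Approach 2: let R split the weights by the grouping variable
formula_form <- wilcox.test(WEIGHT ~ SEX, data = toy_crabs)

vector_form$p.value
formula_form$p.value
```

The reported W statistic can differ between the two approaches, because it depends on which group is treated as the first group. For the two-sided test the p-value is the same either way.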
Produce the relevant output, save a copy in your Word document, and write a clear summary, in the style presented in the lectures.
As part of this question, you may like to produce assumption check results, such as a Normal Q-Q plot.
The Indo-Pacific Lionfish (Pterois volitans/miles) is an invasive species in parts of the Atlantic Ocean and the Caribbean Sea. Commercial fishing of lionfish has been proposed as a means of controlling population sizes. However, concerns have been raised about the safety of lionfish for human consumption, as they may contain potentially harmful levels of organic methylmercury (MeHg) due to bioaccumulation.
Figure 5.1: Note. From File:Common lion fish Pterois volitans.jpg, by Michael Gäbler, 2014, Wikimedia Commons (https://commons.wikimedia.org/). CC BY 3.0 DEED
Johnson et al. (2021) studied the total mercury (THg) levels in lionfish specimens taken from two locations in Florida.
Data from their study is available in the file lionfish_thg.omv in this week’s tile on LMS, and contains recorded values for the following variables:
Create a descriptives table in row format for the lionfish_thg.omv data, using the variables THG, SEX and LOCATION.
Comment on any interesting values you observe.
Create a histogram of the THG observed values with a density curve overlaid. Looking at the distribution, do you think that it is appropriate to use a \(t\)-test to analyse this data? Explain your reasoning.
Regardless of your answer to the previous question, suppose that you now would like to conduct a one sample \(t\)-test of this THG data.
Assume that the recommended limit for mercury concentration is 1 milligram per kilogram of fish (equivalent to 1 microgram per gram), in accordance with e.g. US EPA standards.
Suppose that it is currently believed that eating lionfish is borderline unsafe, and that the average lionfish mercury concentration (THG) is 1 microgram per gram. However, like Johnson et al. (2021), you would like to test if the mean THG levels are actually less than 1 microgram per gram.
Write out an appropriate null and alternative hypothesis for your test, using 1 microgram per gram as your reference value.
Conduct a one sample \(t\)-test of the THG data in jamovi, using your specified hypotheses. Record the test statistic, \(p\)-value, mean difference, and the 95% confidence interval for the mean difference.
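If you are working through this question in R rather than jamovi, the direction of the alternative hypothesis is set via the alternative argument of t.test. The sketch below is illustrative only: toy_thg is a made-up vector, not the lionfish data.

```r
# Made-up THg concentrations in micrograms per gram (illustrative only)
toy_thg <- c(0.12, 0.35, 0.08, 0.22, 0.41, 0.15, 0.27, 0.19)

# One-sided test: is the population mean less than the reference value of 1?
one_sided <- t.test(toy_thg, mu = 1, alternative = "less", conf.level = 0.95)

one_sided$statistic  # the t test statistic
one_sided$p.value    # the one-sided p-value
one_sided$conf.int   # a one-sided interval, with a lower limit of -Inf
```

Note that for a one-sided test, R reports a one-sided confidence interval. To obtain the usual two-sided 95% interval, rerun the test with alternative = "two.sided".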
Based on your results, what is your conclusion? Does it appear that lionfish are safe, or unsafe, to eat?
Compute and interpret the effect size for your one sample \(t\)-test.
Check the test assumptions of your one sample \(t\)-test via the Shapiro-Wilk test and Q-Q plot inspection. What do you conclude?
Regardless of your findings in the previous question, suppose you decide to conduct an equivalent non-parametric test of the THG levels. Note down the appropriate non-parametric test to use, and then carry this test out in jamovi, and provide a brief summary of your results.
Suppose you are interested in assessing if male and female lionfish exhibit different concentrations of mercury.
Conduct a two sample \(t\)-test to compare the mean THG levels of male and female lionfish. Make sure to check the test assumptions and compute the effect size, and write a short summary detailing your findings.
If you find that the two sample \(t\)-test assumptions are violated, conduct the appropriate non-parametric equivalent test.
Recall that in DA Computer Lab 1 we introduced a raw, messy data set on dwarf pea plant seedlings, which had been collected as part of an experiment in an LTU BIO1AP lab class in 2022. Figure 6.2 below contains this data.
Previously, we produced descriptive statistics and some initial plots of this data. In this DA computer lab, now that we have learnt how to conduct various t-tests in jamovi, we can begin to properly analyse this data, and test hypotheses.
Figure 6.1: Note. From File:Leaves of Pisum sativum (2).JPG, by Chmee2, 2011, Wikimedia Commons (https://commons.wikimedia.org/). CC BY 3.0 DEED
To recap, in this experiment dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine if application of GA to the seedlings has an impact.
For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:
The height of the seedlings was then recorded at a later date. The pea plant data in Figure 6.2 has pea plant height (in mm) recordings, for the three treatments, across 7 different benches.
Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).
Figure 6.2: Pea Plant Raw Data
In DA Computer Lab 1, you should have created a data file in jamovi containing the cleaned pea plant data. If you have this file to hand, skip this step and proceed to 6.2. If for whatever reason you do not have this data file saved, please complete the following steps:
Data view.
If you are stuck on any value, you may like to discuss this with other students and/or your lab demonstrator.
Suppose that based on previous studies, the mean height of pea plant seedlings which have been exposed to natural conditions is known to be 280mm. Using an appropriate \(t\)-test, test in jamovi if the mean height of the relevant pea plant seedlings in the jamovi data file you have prepared is different to 280mm.
Write a clear summary statement, and make sure to copy the relevant jamovi output to your Word document.
Assume that this mean value is for seedlings which have been growing for the same amount of time as the seedlings in the BIO1AP experiment had been, when their data was recorded in Figure 6.2.
Suppose we would like to now compare the mean heights of the pea plant seedlings from the BIO1AP experiment, for the different treatments. Using the appropriate test(s) in jamovi, compare the mean heights of pea plant seedlings exposed to treatment C and treatment TA.
In order to conduct this test, you may need to reformat your data slightly. A simple option is to remove the rows of data for TB observations, and then once the analysis is complete, close jamovi without saving the adjustments. For the next question, you can repeat the process, but this time removing the rows of data for C observations.
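If you are working in R rather than jamovi, you can keep the full data set intact and create a filtered copy instead. The sketch below is hypothetical throughout: the data frame toy_peas, its column names Treatment and Height, and all of the values are made up for illustration, so adjust them to match your own cleaned data file.

```r
# Made-up pea plant heights in mm (illustrative only, not the BIO1AP data)
toy_peas <- data.frame(
  Treatment = c("C", "C", "C", "TA", "TA", "TA", "TB", "TB", "TB"),
  Height = c(210, 225, 198, 305, 290, 320, 410, 395, 430)
)

# Keep only the rows for treatments C and TA; toy_peas itself is unchanged
c_vs_ta <- subset(toy_peas, Treatment %in% c("C", "TA"))

table(c_vs_ta$Treatment)  # check which treatments remain

# Two sample t-test comparing the two remaining treatments
t.test(Height ~ Treatment, data = c_vs_ta)
```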
As a separate test, also compare the mean heights of pea plant seedlings exposed to treatment TA and TB.
Write clear summary statements for your analyses in 6.3 and 6.4, and make sure to copy the relevant jamovi output to your Word document.
Make sure you check any relevant test assumptions before concluding your tests.
Before you finish up, make sure to save both your Word document and your pea plant jamovi file to your OneDrive, for future reference.
Green, P. T. (1997). Red crabs in rain forest on Christmas Island, Indian Ocean: activity patterns, density and biomass. Journal of Tropical Ecology, 13(1), 17-38.
Johnson, E.G., Dichiera, A., Goldberg, D., Swenarton, M. and Gelsleichter, J. (2021). Total mercury concentrations in invasive lionfish (Pterois volitans/miles) from the Atlantic coast of Florida. PLOS ONE 16(9): e0234534. https://doi.org/10.1371/journal.pone.0234534
These notes have been prepared by Rupert Kuveke and other members of the Department of Mathematical and Physical Sciences. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.