Welcome to Computer Lab 1 for the Data Analysis (DA) component of BIO2POS!
In DA Topic 1, we introduced the concept of variables, and discussed different measures we can use for describing data, such as mean, median and variance.
We also introduced hypothesis testing, which is a key component of statistical inference, and covered \(p\)-value and confidence interval calculations.
In this computer lab, you will start to learn how to use the statistical software jamovi, and conduct some descriptive and inferential statistical calculations using real data sets. You will learn how to compute descriptive statistics, produce various statistical tables and plots, and interpret jamovi output.
Don’t be alarmed if you see a lot of text in the following questions: much of it is guidance on navigating jamovi, in case you are not familiar with the software.
These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.
Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).
Having completed this lab, you will be able to obtain and discuss the following statistical outputs for a data set in jamovi:
Before you begin, please check the following:
Please complete at least step 1 first, as it will help you better understand the concepts you will need for this computer lab.
All the BIO2POS DA labs will use the jamovi software.
You can download jamovi free of charge here (pick the solid option).
If you do not already have jamovi installed on your personal device, please install it now, and ask the lab demonstrator for assistance if you encounter any issues.
To begin, we will focus on a small data set about red crabs from Christmas Island (Green, 1997).
If you already have some experience with jamovi, then this exercise will serve as a refresher.
The red crab data set contains recorded values for variables including:
Figure 2.1: Note. From File:Crab migration extra - chris bray-1 (cropped).jpg, by [ChrisBrayPhotography], 2019, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 4.0 DEED
This red crab data is available in this week’s tile on the LMS, in the file crab_data.omv. Download this file now and save it on your computer. Also open a text document (e.g. a Word document) in which you can write down your responses and save your jamovi output as you work through the lab.
It is recommended that you save all your lab work, e.g. on OneDrive, so that you can access it easily at a later date.
Open up jamovi, and click on the burger menu (the three horizontal white bars) on the top left. You should see a side panel appear. Click on Open, and load in the crab_data.omv file.
If you now check the Variables and Data tabs in jamovi, you should see all the crab variable names, and the recorded values for these variables.
How would you describe each of the variables, using the language introduced in the Topic 1A lecture? For example, is CW categorical or numerical? Is it nominal, ordinal, discrete or continuous?
To check your thinking, you can click on the names of the variables in the Variables tab to bring up more information.
We will often start a data analysis with an inspection of the data.
In jamovi, click on the Analyses tab in the top menu bar, and then click on the Exploration section, and select Descriptives. We can use this section to produce various descriptive statistics and plots, to help us better understand our data. In the following steps, we will cover some of the options available to us.
To begin, drag the CW variable across to the Variables box. This will produce a descriptives table summarising key details about the CW variable.
Make a note of the CW mean, median, and standard deviation.
Note that we can easily assess multiple variables simultaneously. Try dragging another variable into the Variables box - you should see a second column added to the descriptives table.
You can also change the layout so that the descriptive statistics are shown in rows, rather than columns - try this now by changing the Descriptives selection box from Variables across columns to Variables across rows.
You will have noticed that there is also a box called Split by. We can split or separate our data by a categorical variable, to help highlight similarities and differences between different categories. Add the SEX variable to the Split by box, and make a note of the results.
Do the mean and median CW values appear to differ much between male and female red crabs?
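To make the idea behind Split by concrete, here is a minimal Python sketch. It is an illustration only and is not how jamovi is implemented. The sketch groups observations by a categorical variable and reports the sample size, mean and median of each group. The (SEX, CW) records are made-up values and are not taken from crab_data.omv.

```python
import statistics
from collections import defaultdict

# Hypothetical (SEX, CW) records for illustration; NOT the crab_data.omv values.
records = [
    ("Male", 9.1), ("Male", 10.4), ("Male", 8.7), ("Male", 11.0),
    ("Female", 8.2), ("Female", 7.9), ("Female", 9.5),
]

def summarise_by_group(rows):
    """Return {group: (n, mean, median)}, similar in spirit to a 'Split by' table."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        g: (len(v), statistics.mean(v), statistics.median(v))
        for g, v in groups.items()
    }

for group, (n, mean, median) in summarise_by_group(records).items():
    print(f"{group}: n = {n}, mean = {mean:.2f}, median = {median:.2f}")
```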
Directly below the Descriptives section, you should see drop-down sections entitled Statistics and Plots.
Let’s take a look at the Statistics section. If you click on this, you should see a number of bold headings and boxes appear. Many of these boxes are already ticked - these are the details provided by default when we produce a descriptives table in jamovi.
You can try ticking and unticking various boxes to see the effect on the descriptives table in the Results section. At this stage of the semester, the most relevant additional statistics to include are the IQR (interquartile range) and possibly the Variance, although the standard deviation already gives us closely related information.
Recall that the standard deviation is simply the square root of the variance.
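The relationship between variance and standard deviation, and the calculation of the IQR, can be checked with the short Python sketch below. The widths are hypothetical values. We assume that jamovi reports the sample variance (dividing by \(n-1\)) and computes quartiles by linear interpolation, which is what `method="inclusive"` does in Python's `statistics.quantiles`. Other software may use a different quartile rule and report a slightly different IQR.

```python
import math
import statistics

# Hypothetical carapace widths, for illustration only.
widths = [8.2, 7.9, 9.5, 9.1, 10.4, 8.7, 11.0, 9.6]

variance = statistics.variance(widths)   # sample variance (divides by n - 1)
sd = statistics.stdev(widths)            # sample standard deviation
q1, _, q3 = statistics.quantiles(widths, n=4, method="inclusive")  # interpolated quartiles
iqr = q3 - q1                            # interquartile range

print(f"variance = {variance:.3f}")
print(f"sd = {sd:.3f}, sqrt(variance) = {math.sqrt(variance):.3f}")  # these two agree
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```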
With CW split by SEX, click the Confidence Interval for Mean option in the Statistics section (under Mean Dispersion). Write down the 95% confidence intervals for the mean carapace width of the male and female crabs, in the form \[(\text{lower value}, \text{upper value}).\]
If you compare these intervals, what do you notice? Is there evidence to suggest that the average carapace width is different between male and female red crabs? Make sure to explain your reasoning clearly.
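jamovi calculates the interval for you, but it can help to see the arithmetic. The sketch below assumes the usual t-based interval \(\bar{x} \pm t_{0.975,\,n-1} \times s/\sqrt{n}\). We believe this matches jamovi's default, but that is an assumption. To keep the example self-contained, the t critical value is found numerically with the standard library. In practice you would look it up in a table or use a library function such as `scipy.stats.t.ppf`. The widths are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df, upper=100.0):
    """Critical value t with P(T <= t) = p, for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(values, level=0.95):
    """t-based confidence interval for the population mean."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)      # standard error of the mean
    t_crit = t_quantile(1 - (1 - level) / 2, n - 1)   # e.g. t_{0.975, n-1} for 95%
    return mean - t_crit * se, mean + t_crit * se

widths = [8.2, 7.9, 9.5, 9.1, 10.4, 8.7, 11.0, 9.6]  # hypothetical values
lower, upper = mean_ci(widths)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```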
Rather than assessing data solely in a table format, it is often beneficial to visualise the data. Producing plots and graphs can help convey key details of the data clearly and make interpretation easier.
Click on the Plots drop-down section below the Statistics section, and take a look at the different options available.
To begin, create a histogram of CW by clicking the Histogram button. At this stage, do not split by SEX.
Comment on the shape of the data. Does the distribution look symmetric, or skewed?
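A histogram simply counts how many observations fall into each bin of equal width. The minimal Python sketch below shows this with a text-based histogram. jamovi chooses its bins automatically. Here the bin width and starting point are our own illustrative choices, the widths are hypothetical, and every value is assumed to be at least `start`.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Return [(left_edge, count), ...] for bins [left, left + bin_width).

    Assumes every value is >= start.
    """
    idx = Counter(math.floor((v - start) / bin_width) for v in values)
    return [(start + k * bin_width, idx.get(k, 0)) for k in range(max(idx) + 1)]

widths = [8.2, 7.9, 9.5, 9.1, 10.4, 8.7, 11.0, 9.6]  # hypothetical values
bw = 1.0
for left, count in histogram_counts(widths, bin_width=bw, start=7.0):
    print(f"[{left:4.1f}, {left + bw:4.1f})  {'#' * count}")
```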
Add a density curve to the histogram, by clicking the Density button.
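The density curve is a kernel density estimate: a small normal "bump" is placed on each observation, and the bumps are then averaged. A violin plot draws the same kind of curve, mirrored on both sides. The Python sketch below uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth, which is the default for R's `density()`. Whether jamovi uses exactly these settings is an assumption, and the widths are hypothetical.

```python
import math
import statistics

def silverman_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def gaussian_kde(values, x, bandwidth):
    """Density estimate at x: the average of normal bumps centred on each value."""
    total = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return total / (len(values) * bandwidth * math.sqrt(2 * math.pi))

widths = [8.2, 7.9, 9.5, 9.1, 10.4, 8.7, 11.0, 9.6]  # hypothetical values
bw = silverman_bandwidth(widths)
for x in (8.0, 9.0, 10.0, 11.0):
    print(f"density at {x:.1f}: {gaussian_kde(widths, x, bw):.3f}")
```

Because each bump integrates to 1 and the bumps are averaged, the whole curve also integrates to 1. This is why it can be overlaid on a histogram that has been scaled to show density.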
Next, create a box plot of the CW values, with the outliers (if any) labelled.
If you would like to show the individual observations, you can click the Data box in the Box Plots column.
Recall that the line in the middle of a box plot shows the median, not the mean! If you would like to also show the mean, you can select the Mean option in the Box Plots column.
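The Python sketch below shows how the parts of a box plot are commonly defined. It also shows why the median line and the mean can differ. We assume the conventional 1.5 × IQR rule for flagging outliers and interpolated quartiles. jamovi's exact conventions may differ slightly. The data are a made-up example with one extreme value.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker limits and outliers under the 1.5 x IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "Q1": q1, "median": median, "Q3": q3,
        "whiskers": (min(inside), max(inside)),  # most extreme non-outlying points
        "outliers": sorted(outliers),
        "mean": statistics.mean(values),         # not shown by the box itself
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one extreme value
print(box_plot_summary(data))
```

For this example the median is 5.5, but the single outlier pulls the mean up to 14.5.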
If we would like to show both the spread of our data, and the density of the data at a certain point, all in the one plot, then we can use a violin plot. You can think of a violin plot as a combination of a box plot and a density plot.
Click on the Violin box in the Box Plots column, and comment on the output.
jamovi is somewhat limited in terms of personalising graphs and plots. If you would like to change details such as axis labels, you can do this for each relevant variable (e.g. CW) by going to the Variables tab in the top menu, clicking on the variable name, and then editing the name in the box that appears. This should automatically update any Results you have produced (tables, plots etc).
Try changing the CW variable name to Carapace Width. If you get stuck, check with your lab demonstrator.
If you click on the three vertical dots at the top right-hand side of jamovi, you will notice that you can change various settings, such as Zoom, Results layout, and Plots layout. This is currently the main way to change the overall appearance of plots in jamovi.
The default appearance is fine, but you may like to test out the different options available in the Plot theme and Color palette settings, to find an alternative presentation you prefer (I am currently using Minimal and Dark2).
To conclude our overview of the Exploration section, create a histogram (with density overlaid), a box plot and a violin plot of Carapace Width, now split by SEX.
Comment on your results. Do you think that splitting the data by SEX has been useful? What new details do you observe, as a result of splitting the data by SEX?
… CLAW instead of CW.
Now that you have had a chance to cover the steps for producing descriptive statistics and plots in jamovi, we will introduce a new, slightly more detailed data set for analysis.
Continuing our crab theme, we will look at data on camouflage characteristics of shore crabs (Carcinus maenas) from the Cornish peninsula in the UK, collected as part of a study by Stevens et al. (2014). You can access the paper via this link, and it makes for an interesting read.
Figure 3.1: Note. From File:Carcinus maenas 138761951.jpg, by Victor Heng, 2021, Wikimedia Commons (https://commons.wikimedia.org/). CC0 1.0 DEED
The shore crab data from this paper is available in this week’s tile on the LMS, in the file shore_crabs.omv. Download this file now, and open it in jamovi.
You will notice that there is a large number of variables in this data set. For the purposes of this lab, we will focus on the following (although you are welcome to assess additional variables):
To begin, produce a descriptives table, and a histogram and box plot of the shore crabs’ carapace widths.
Comment on the distribution of the data - does it appear to be symmetric or skewed? Based on your choice, which measure of location would be better to report?
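One way to back up a visual judgement of skew is to compare the mean with the median, and to compute a skewness coefficient. The Python sketch below uses the adjusted Fisher-Pearson coefficient, which several statistics packages use. Whether it matches jamovi's Skewness output exactly is an assumption. In right-skewed data the long upper tail pulls the mean above the median, which is why the median is often the better measure of location for skewed data. Both data sets are made up.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: G1 = sqrt(n(n-1)) / (n-2) * g1."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return (n * (n - 1)) ** 0.5 / (n - 2) * g1

symmetric = [1, 2, 3, 4, 5]            # made-up symmetric data
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # made-up data with a long upper tail

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}, skewness = {sample_skewness(data):.2f}")
```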
Split the data by Site, and reassess the plots. Do you observe any interesting differences across the sites?
Split the data by both Site and Maturity, and reassess the plots. Do you notice any major changes between the immature and mature crabs at the different sites?
So far, we have not created any Bar Plots. These can be helpful when we are assessing data across different categories of a categorical variable. In the Plots drop-down section, click the Bar Plot box to produce a Bar Plot.
In your Split by box, you should have two variables: Site and Maturity. Perhaps surprisingly, the order in which they are placed directly affects the resulting plots. Try switching the order now, and observe the changes in the plots.
Write a short summary of the shore crab data, focusing on either the mean or median (whichever you believe is more appropriate) carapace width of the crabs. You can focus on the differences between sites, or the differences between immature and mature crabs, or both.
The red crab data we assessed in Question 2 had been nicely cleaned and prepared for analysis, and only some of the shore crab data from Question 3 was assessed, for simplicity. However, in practice it can often be the case that we are provided with raw, messy data, with missing or incorrectly recorded values. Such data will require cleaning, preparation and entry into a proper file format, before we can begin to consider conducting an actual statistical analysis. While it may not be your job as an analyst to perform such preparatory steps, sometimes you may not have much of a choice, due to time and budget constraints, or other limitations. Regardless, having some experience with interpreting, cleaning and preparing raw data is a valuable skill.
Figure 4.1 below contains raw, messy data recorded as part of an experiment in an LTU BIO1AP lab class in 2022. In this experiment, dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore it is of interest to determine whether application of GA to the seedlings has an impact.
For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:
The height of the seedlings was then recorded at a later date. The pea plant data in Figure 4.1 contains seedling heights (in mm) for the three treatments, across 7 different benches.
Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).
Figure 4.1: Pea Plant Raw Data
Suppose we would like to conduct an exploratory analysis of this pea plant data in jamovi. Our first step would be to record the data in an appropriate format in jamovi.
Take a look at the data in Figure 4.1 above, and consider how we might best record this information in jamovi. Discuss potential options with other students and/or your lab demonstrator.
In jamovi, click on the burger menu on the top left, and select New, to open a new jamovi window. You can now begin to add in the pea plant data to the spreadsheet in the Data section.
Based on your considerations in 4.1, you may have concluded that an appropriate way to prepare the data for analysis is along the lines of the following steps. Complete these steps now:
In the Variables section, rename variable A to Seedling Height, and rename variable B to Treatment Type. You will have to determine what type of variable each of these is.
In the Data section, manually type the data from Figure 4.1 into the appropriate columns. For the purposes of this lab, do not make a distinction between the different benches. Do not include recorded values which are crossed or scribbled out. For recorded values which are hard to read, either exclude them or type in your best guess - it’s up to you.
If you are stuck on a value, you may like to discuss this with other students and/or the lab demonstrator.
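The layout described above, with one row per surviving seedling and one column each for height and treatment, is often called "long" format. The Python sketch below converts a hypothetical transcription of a bench sheet into that layout and skips crossed-out values, which are marked as `None`. The heights are made-up numbers, and "Group 1" to "Group 3" are placeholders for the actual treatment names. The output is CSV text. jamovi can also open .csv files, although this lab asks you to type the values in directly.

```python
import csv
import sys

# Hypothetical transcription of a raw sheet: heights in mm, None = crossed out.
# "Group 1".."Group 3" are placeholder labels, not the real treatment names.
raw_sheet = {
    "Bench 1": {"Group 1": [52, 61, None], "Group 2": [47, 58], "Group 3": [55]},
    "Bench 2": {"Group 1": [48], "Group 2": [None, 66], "Group 3": [70, None]},
}

def to_long_format(sheet):
    """One (Seedling Height, Treatment Type) row per surviving seedling."""
    rows = []
    for _bench, groups in sheet.items():  # benches are ignored, as in this lab
        for treatment, heights in groups.items():
            for h in heights:
                if h is not None:  # skip crossed-out recordings
                    rows.append((h, treatment))
    return rows

writer = csv.writer(sys.stdout)
writer.writerow(["Seedling Height", "Treatment Type"])
writer.writerows(to_long_format(raw_sheet))
```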
With the pea plant data now nicely recorded in jamovi, it is time to compute some descriptive statistics and plots. Using the appropriate jamovi procedures, answer the following questions:
If you are not sure how to approach one or more of the jamovi-related questions below, use the steps outlined in Questions 2 and 3 as a guide.
Produce a descriptives table and a bar plot for the treatment groups.
What percentage of the surviving seedlings were in each treatment group?
Create histograms of the seedling heights, for each treatment group.
What is the overall sample size for this data?
What is the mean seedling height, overall and for each treatment group? Which treatment group has the highest mean height?
Provide a brief interpretation of the 95% confidence intervals reported.
Using these confidence intervals, can you conclude anything about the mean heights of the seedlings in the different treatment groups?
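To check your jamovi answers for the sample size, group percentages and means, the Python sketch below performs the same calculations on long-format rows. The heights are made-up numbers and the group labels are placeholders. Percentages are computed out of the surviving seedlings only, because crossed-out values are not in the data.

```python
import statistics
from collections import Counter

# Hypothetical long-format rows: (Seedling Height in mm, Treatment Type placeholder).
rows = [(52, "Group 1"), (61, "Group 1"), (48, "Group 1"),
        (47, "Group 2"), (58, "Group 2"), (66, "Group 2"),
        (55, "Group 3"), (70, "Group 3")]

n_total = len(rows)                                  # overall sample size
counts = Counter(t for _, t in rows)                 # seedlings per group
percentages = {t: 100 * c / n_total for t, c in counts.items()}
means = {t: statistics.mean(h for h, g in rows if g == t) for t in counts}
overall_mean = statistics.mean(h for h, _ in rows)

print(f"n = {n_total}, overall mean = {overall_mean:.2f} mm")
for t in sorted(counts):
    print(f"{t}: n = {counts[t]} ({percentages[t]:.1f}%), mean = {means[t]:.2f} mm")
```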
Take a few minutes to think about the study design for this experiment. When preparing the experiment, what are some key factors or steps that may have been considered? Discuss your thoughts with other students and/or the lab demonstrator.
Copy all the relevant jamovi output into your Word document, for your records.
Figure 5.1: Note. From File:Menura novaehollandiae novaehollandiae - Flickr - Lip Kee.jpg, by Lip Kee, 2014, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 2.0 DEED
In the recent ED lecture slides, the paper by Nugent et al. (2014) on the interactions between lyrebirds and fire in south-eastern Australia was discussed.
This paper contains several tables of descriptive and inferential statistics.
The Microsoft AI tool Copilot was used to summarise the statistical information in the paper, via the following prompt:
The AI output is included below. Have a quick look over it.
Certainly! In the study titled “Interactions between the superb lyrebird (Menura novaehollandiae) and fire in south-eastern Australia,” conducted by Daniel T. Nugent, Steven W. J. Leonard, and Michael F. Clarke, the researchers investigated the relationship between superb lyrebirds and fire in fire-prone Eucalyptus forests. Here are the key statistical findings:
Lyrebird Foraging and Litter Fuel Loads: Lyrebirds are important ecosystem engineers that accelerate litter decomposition in Eucalyptus forests through their foraging behavior. Lyrebird foraging significantly reduces litter fuel loads by approximately 25% (1.66 t/ha) in areas where they are free to forage. This reduction in surface fuel loads may inhibit fire spread.
Fire Behavior Modeling: Fire-behavior models were used to assess the potential influence of lyrebird scratchings on fire behavior. Lyrebird foraging led to a lower likelihood of fire occurrence and less intense fire.
Distribution and Foraging Patterns: Lyrebirds were present in both unburnt and ground-burnt sites at the landscape scale. They favored foraging in unburnt patches within patchily burnt sites. The distinct vegetation structure and composition between burnt and unburnt patches influenced lyrebird distribution and foraging patterns.
Implications: Alternative stable states may emerge in Eucalyptus forests due to feedback mechanisms among lyrebirds, vegetation, and fuel accumulation. Considering the ecological role of lyrebirds is crucial for forest fuel management and conservation in extensive, fire-prone forests in south-eastern Australia.
One thing you may have noticed is that, while this seems to be a helpful summary, no descriptive statistics (e.g. means, standard deviations etc) are actually discussed. This may be due to a sub-optimal prompt, but it highlights the fact that while AI tools can often assist us in our learning, it is always good to check things ourselves.
Access the Nugent et al. (2014) paper via the LTU library, and take a look at Tables 1-4. We will discuss the tests used later in the semester, and you are not expected to read the paper in depth. At this stage, your main objective is to understand how the data are presented in these tables, specifically the mean and standard deviation descriptive statistics.
Discuss the tables’ details with other students and/or the lab demonstrator.
Before you finish up, make sure to save both your Word document and your pea plant jamovi (.omv) file to your OneDrive, for future reference.
Green, P. T. (1997). Red crabs in rain forest on Christmas Island, Indian Ocean: activity patterns, density and biomass. Journal of Tropical Ecology, 13(1), 17-38.
Nugent, D. T., Leonard, S. W. J. and Clarke, M. F. (2014). Interactions between the superb lyrebird (Menura novaehollandiae) and fire in south-eastern Australia. Wildlife Research, 41(3), 203-211.
Stevens, M., Lown, A. E. and Wood, L. E. (2014). Camouflage and individual variation in shore crabs (Carcinus maenas) from different habitats. PLOS ONE, 9(12), e115586. https://doi.org/10.1371/journal.pone.0115586
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.