Topic 1: Descriptive Statistics and Graphs in jamovi


Welcome to Computer Lab 1 for the Data Analysis (DA) component of BIO2POS!

In DA Topic 1, we introduced the concept of variables, and discussed different measures we can use to describe data, such as the mean, median and variance.

We also introduced hypothesis testing, which is a key component of statistical inference, and covered the calculation of \(p\)-values and confidence intervals.

In this computer lab, you will start to learn how to use the statistical software jamovi, and conduct some descriptive and inferential statistical calculations using real data sets. You will learn how to compute descriptive statistics, produce various statistical tables and plots, and interpret jamovi output.

Don’t be alarmed if you see a lot of text in the following questions: much of it is guidance on navigating jamovi, in case you are not familiar with the software.


Learning Outcomes

These labs are designed to provide you with plenty of opportunities to practice different aspects of the statistical content covered in the lectures.

Each lab consists of core questions (with the 🌱 symbol) and extension questions (with the 🌳 symbol).

  • We recommend that you aim to complete at least the core component question(s) within the scheduled lab time
  • If you have time, you can work through the extension component question(s) either during the lab, or later in your own time
  • We recommend that you aim to complete all questions before the next DA lecture

Having completed this lab, you will be able to obtain and discuss the following statistical outputs for a data set in jamovi:

  • descriptive statistics
  • bar charts
  • box plots
  • histograms
  • violin plots
  • confidence intervals


Before you begin, please check the following:

  1. Have you attended this week’s lectures/watched the lecture recordings?
  2. Have you completed this week’s DA online learning activity (if applicable)?
  3. Have you completed this week’s DA Quiz (if applicable)?

Please complete at least step 1 first, as doing so will help you to better understand the concepts you will need for this computer lab.

1 jamovi 🌱

All the BIO2POS DA labs will use the jamovi software.

You can download jamovi free of charge from the jamovi website (pick the solid option).

If you do not already have jamovi installed on your personal device, please install it now, and ask the lab demonstrator for assistance if you encounter any issues.

2 Red Crab Data 🌱

To begin, we will focus on a small data set about red crabs from Christmas Island (Green, 1997).

If you already have some experience with jamovi, then this exercise will serve as a refresher.

The red crab data set contains recorded values for variables including:

  • CW: The carapace width (mm)
  • LEG: The leg length (mm)
  • CLAW: Claw length (mm)
  • WEIGHT: Weight (grams)
  • SEX: The crab’s sex (M/F)

Figure 2.1: Note. From File:Crab migration extra - chris bray-1 (cropped).jpg, by ChrisBrayPhotography, 2019, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 4.0 DEED

2.1

This red crab data is available in this week’s tile on the LMS, in the file crab_data.omv. Download this file now, and save it on your computer. Also open up a text document (e.g. a Word document), in which you can write down your responses and save your jamovi output as you work through the lab.

It is recommended that you save all your lab work, e.g. on OneDrive, so that you can access it easily at a later date.

2.2

Open up jamovi, and click on the burger menu (the three horizontal white bars) on the top left. You should see a side panel appear. Click on Open, and load in the crab_data.omv file.

2.3

If you now check the Variables and Data tabs in jamovi, you should see all the crab variable names, and the recorded values for these variables.

2.4

How would you describe each of the variables, using the language introduced in the Topic 1A lecture? For example, is CW categorical or numerical? If it is numerical, is it discrete or continuous? If it is categorical, is it nominal or ordinal?

To check your thinking, you can click on the names of the variables in the Variables tab to bring up more information.

2.5 Descriptives

We will often start a data analysis with an inspection of the data.

In jamovi, click on the Analyses tab in the top menu bar, and then click on the Exploration section, and select Descriptives. We can use this section to produce various descriptive statistics and plots, to help us better understand our data. In the following steps, we will cover some of the options available to us.

2.5.1

To begin, drag the CW variable across to the Variables box. This will produce a descriptives table summarising key details about the CW variable.

Make a note of the CW mean, median, and standard deviation.

2.5.2

Note that we can easily assess multiple variables simultaneously. Try dragging another variable into the Variables box - you should see a second column added to the descriptives table.

2.5.3

You can also change the layout so that the descriptive statistics are shown in rows, rather than columns - try this now by changing the Descriptives selection box from Variables across columns to Variables across rows.

2.5.4

You will have noticed that there is also a box called Split by. We can split or separate our data by a categorical variable, to help highlight similarities and differences between different categories. Add the SEX variable to the Split by box, and make a note of the results.

Does there appear to be a large difference in the mean and median CW values between male and female red crabs?
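Conceptually, the Split by box groups the observations by the levels of a categorical variable and computes the same statistics within each group. A minimal Python sketch of that idea is shown below; the `split_by` helper and the (SEX, CW) values are hypothetical and made up for illustration, not taken from crab_data.omv.

```python
import statistics
from collections import defaultdict

def split_by(records):
    """Group numeric values by category label, mimicking jamovi's Split by box."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return dict(groups)

# Hypothetical (SEX, CW) pairs; illustrative values only, not the real crab data
records = [("M", 41.0), ("F", 43.5), ("M", 44.0), ("F", 45.5), ("M", 47.0), ("F", 40.0)]

# Compute the mean and median separately for each sex
for sex, widths in sorted(split_by(records).items()):
    print(sex, "mean:", round(statistics.mean(widths), 2), "median:", statistics.median(widths))
```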

2.6 Statistics

Directly below the Descriptives section, you should see drop-down sections entitled Statistics and Plots.

Let’s take a look at the Statistics section. If you click on this, you should see a number of bold headings and boxes appear. Many of these boxes are already ticked - these are the details provided by default when we produce a descriptives table in jamovi.

2.6.1

You can try ticking and unticking various boxes to check the effect on the descriptives table in the Results section. At this stage of the semester, the most relevant new descriptive statistic to include is the IQR (interquartile range), and possibly the Variance (although we already have related information in the form of the standard deviation).

Recall that the standard deviation is simply the square root of the variance.
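For reference, the statistics discussed above can also be computed with Python's standard `statistics` module. This is a sketch on made-up numbers; the `describe` helper is our own, and we assume jamovi uses the sample (n - 1) versions of the variance and standard deviation and linear-interpolation quartiles (R's default), so other software may give slightly different IQRs.

```python
import math
import statistics

def describe(values):
    """Return the summary statistics discussed above for a list of numbers."""
    # 'inclusive' quartiles use linear interpolation (assumed to match jamovi)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "variance": variance,
        "iqr": q3 - q1,
    }

# Hypothetical measurements, for illustration only
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)

# The standard deviation is the square root of the variance
assert math.isclose(stats["sd"], math.sqrt(stats["variance"]))
```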

2.6.2

With the CW split by SEX, click the Confidence Interval for Mean option in the Statistics section (under Mean Dispersion). Write down the 95% confidence intervals for the male and female crabs’ carapace width, in the form \[(\text{lower value, upper value})\]

If you compare these intervals, what do you notice? Is there evidence to suggest that the average carapace width is different between male and female red crabs? Make sure to explain your reasoning clearly.
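As far as we know, jamovi's Confidence Interval for Mean is the usual t-based interval \(\bar{x} \pm t^{*}_{n-1} \times s/\sqrt{n}\), where \(s\) is the sample standard deviation and \(t^{*}_{n-1}\) is the critical value from a t distribution with \(n-1\) degrees of freedom. The sketch below computes it using only the Python standard library, finding the critical value numerically (if SciPy is installed, `scipy.stats.t.ppf(0.975, n - 1)` returns the same critical value directly). The helper names and sample values are hypothetical.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] and symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, level=0.95):
    """Critical value t* with P(-t* < T < t*) = level, found by bisection."""
    target = 0.5 + level / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(values, level=0.95):
    """t-based confidence interval for the population mean."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical(n - 1, level) * standard_error
    return (mean - margin, mean + margin)

# Hypothetical sample, for illustration only
print(mean_ci([2, 4, 4, 4, 5, 5, 7, 9]))
```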

2.7 Plots

Rather than assessing data solely in a table format, it is often beneficial to visualise the data. Producing plots and graphs can help convey key details of the data clearly and make interpretation easier.

Click on the Plots drop-down section below the Statistics section, and take a look at the different options available.

2.7.1

To begin, create a histogram of CW by clicking the Histogram button. At this stage, do not split by SEX.

Comment on the shape of the data. Does the distribution look symmetric, or skewed?
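A histogram splits the range of the data into equal-width bins and draws a bar for the number of observations in each bin; a long tail of bars on one side suggests skew. The sketch below prints a simple text histogram. jamovi chooses the bins automatically, so the 5 mm bin width, the starting point and the widths used here are arbitrary, made-up choices for illustration.

```python
import math

def histogram_counts(values, bin_width, start):
    """Counts per bin [start + k*w, start + (k+1)*w), keeping empty bins; needs start <= min(values)."""
    if start > min(values):
        raise ValueError("start must not exceed the smallest value")
    indices = [math.floor((v - start) / bin_width) for v in values]
    counts = [0] * (max(indices) + 1)
    for k in indices:
        counts[k] += 1
    return counts

# Hypothetical carapace widths (mm), for illustration only
widths = [38, 41, 42, 44, 45, 45, 46, 47, 49, 55, 62]
for k, count in enumerate(histogram_counts(widths, bin_width=5, start=35)):
    left = 35 + 5 * k
    print(f"{left}-{left + 5} mm | " + "#" * count)
```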

2.7.2

Add a density curve to the histogram, by clicking the Density button.
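The density curve is a smoothed version of the histogram. A common way to produce one is a kernel density estimate: place a small normal curve on each observation and average them. We assume a Gaussian kernel with Silverman's rule-of-thumb bandwidth (the default of R's `density()` function); jamovi's exact settings may differ, and the data below are made up.

```python
import math
import statistics

def silverman_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sd = statistics.stdev(values)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if the IQR is 0
    return 0.9 * spread * len(values) ** (-1 / 5)

def kde(x, values, bandwidth):
    """Gaussian kernel density estimate at x: the average of normal curves centred on each value."""
    total = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return total / (len(values) * bandwidth * math.sqrt(2 * math.pi))

# Hypothetical carapace widths (mm), for illustration only
widths = [38, 41, 42, 44, 45, 45, 46, 47, 49, 55, 62]
h = silverman_bandwidth(widths)
for x in range(35, 70, 5):
    print(f"{x} mm: density {kde(x, widths, h):.4f}")
```

Like a histogram scaled to proportions, the curve encloses a total area of 1.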

2.7.3

Next, create a box plot of the CW values, with the outliers (if any) labelled.

If you would like to show the individual observations, you can click the Data box in the Box Plots column.

Recall that the line in the middle of a box plot shows the median, not the mean! If you would like to also show the mean, you can select the Mean option in the Box Plots column.
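Most box plots follow a common convention: the box runs from the first quartile (Q1) to the third quartile (Q3), the whiskers extend to the most extreme observations within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. We believe jamovi follows this rule, but treat the sketch below (with made-up data and a hypothetical `box_plot_summary` helper) as an illustration rather than a description of jamovi's own code.

```python
import statistics

def box_plot_summary(values):
    """Quartiles, whisker ends and outliers under the 1.5 x IQR convention."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,  # the line drawn inside the box
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
        "mean": statistics.mean(values),  # only shown if you tick Mean in jamovi
    }

# Hypothetical carapace widths (mm), for illustration only
widths = [38, 41, 42, 44, 45, 45, 46, 47, 49, 55, 62]
print(box_plot_summary(widths))
```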

2.7.4

If we would like to show both the spread of our data, and the density of the data at a certain point, all in the one plot, then we can use a violin plot. You can think of a violin plot as a combination of a box plot and a density plot.

Click on the Violin box in the Box Plots column, and comment on the output.

  • Which of the plots you have produced so far (histogram, box plot, violin plot) do you prefer? If you are in a lab class, discuss this with the person sitting next to you and/or your lab demonstrator.

2.8

jamovi is somewhat limited in terms of personalising graphs and plots. To change details such as axis labels, you can rename the relevant variable (e.g. CW): navigate to the Variables tab along the top menu, click on the variable name, and edit the name in the box that appears. This should automatically update any Results you have produced (tables, plots etc.).

Try changing the CW variable name to Carapace Width. If you get stuck, check with your lab demonstrator.

2.8.1

If you click on the three vertical dots on the top right-hand side of jamovi, you will notice that you can change various settings, such as Zoom, Results layout, and Plots layout. Apart from renaming variables, this is currently the only way to change the appearance of plots in jamovi.

The default appearance is fine, but you may like to test out the different options available in the Plot theme and Color palette settings, to find an alternative presentation you prefer (I am currently using Minimal and Dark2).

2.9

To conclude our overview of the Exploration section, create a histogram (with density overlaid), a box plot and a violin plot of Carapace Width, now split by SEX.

Comment on your results. Do you think that splitting the data by SEX has been useful? What new details do you observe, as a result of splitting the data by SEX?


If you feel comfortable with the steps covered so far, that’s great!

If you would like to practice the previous steps, you could try repeating them using e.g. CLAW instead of CW.


3 Shore Crab Data 🌱

Now that you have had a chance to cover the steps for producing descriptive statistics and plots in jamovi, we will introduce a new, slightly more detailed data set for analysis.

Continuing our crab theme, we will look at data on camouflage characteristics of shore crabs (Carcinus maenas) from the Cornish peninsula in the UK, collected as part of a study by Stevens et al. (2014). The paper is open access (see the DOI in the References), and it makes for an interesting read.


Figure 3.1: Note. From File:Carcinus maenas 138761951.jpg, by Victor Heng, 2021, Wikimedia Commons (https://commons.wikimedia.org/). CC0 1.0 DEED

3.1

The shore crab data from this paper is available in this week’s tile on the LMS, in the file shore_crabs.omv. Download this file now, and open it in jamovi.

You will notice that there is a large number of variables in this data set. For the purposes of this lab, we will focus on the following (although you are welcome to assess additional variables):

  • Site: The location where the recording was made
  • Maturity: The maturity of the crab (i.e. adult, not adult)
  • Carapace width: The carapace width of the crab

3.1.1

To begin, produce a descriptives table, and a histogram and box plot of the shore crabs’ carapace widths.

Comment on the distribution of the data - does it appear to be symmetric or skewed? Based on your choice, which measure of location would be better to report?
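To back up a visual judgement of skew, you can compare the mean with the median and compute a skewness coefficient (we believe jamovi offers this as Skewness under Distribution in the Statistics section). The sketch below uses the adjusted Fisher-Pearson coefficient, which we believe is the version jamovi reports; values near 0 suggest symmetry and positive values a longer right tail. The data are made up.

```python
import math
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: about 0 if symmetric, > 0 for a longer right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

# Hypothetical carapace widths (mm), for illustration only
widths = [38, 41, 42, 44, 45, 45, 46, 47, 49, 55, 62]
print("mean:", round(statistics.mean(widths), 2), "median:", statistics.median(widths))
print("skewness:", round(sample_skewness(widths), 2))
```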

3.1.2

Split the data by Site, and reassess the plots. Do you observe any interesting differences across the sites?

3.1.3

Split the data by both Site and Maturity, and reassess the plots. Do you notice any major changes between the immature and mature crabs at the different sites?

3.1.4

So far, we have not created any Bar Plots. These can be helpful when we are assessing data across different categories of a categorical variable. In the Plots drop-down section, click the Bar Plot box to produce a Bar Plot.

3.1.5

In your Split by box, you should have two variables: Site and Maturity. Perhaps surprisingly, the order in which they are placed changes the plots produced. Try switching the order now, and observe the changes in the plots.

3.1.6

Write a short summary of the shore crab data, focusing on either the mean or median (whichever you believe is more appropriate) carapace width of the crabs. You can focus on the differences between sites, or the differences between immature and mature crabs, or both.


Well done on completing the previous core questions - you have now covered the main content for this lab! The following extension questions (denoted by the 🌳 symbol) will help consolidate your understanding and extend your skill set.


4 Pea Plant Data 🌳

The red crab data we assessed in Question 2 had been nicely cleaned and prepared for analysis, and only some of the shore crab data from Question 3 was assessed, for simplicity. However, in practice it can often be the case that we are provided with raw, messy data, with missing or incorrectly recorded values. Such data will require cleaning, preparation and entry into a proper file format, before we can begin to consider conducting an actual statistical analysis. While it may not be your job as an analyst to perform such preparatory steps, sometimes you may not have much of a choice, due to time and budget constraints, or other limitations. Regardless, having some experience with interpreting, cleaning and preparing raw data is a valuable skill.

Background Information

Figure 4.1 below contains raw, messy data recorded as part of an experiment in an LTU BIO1AP lab class in 2022. In this experiment, dwarf pea plant (Pisum sativum) seedlings were exposed to different concentrations of gibberellic acid (GA), in order to study the effect of GA application on plant growth. These dwarf pea plants are naturally deficient in GA, due to a mutation of a gene in the pathway for biosynthesis of GA. Therefore, it is of interest to determine whether applying GA to the seedlings affects their growth.

For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:

  • C: a control group, sprayed with water
  • TA: a treatment group, sprayed with a 25 mg/L solution of GA
  • TB: a treatment group, sprayed with a 50 mg/L solution of GA

The height of each seedling was then recorded at a later date. Figure 4.1 contains the pea plant heights (in mm) for the three treatments, across 7 different benches.

Note that the number of seedlings (1 to 6) in each of the three groups varied between benches, and that some recordings were crossed or scribbled out (perhaps due to the seedling being damaged or dying).


Figure 4.1: Pea Plant Raw Data

4.1

Suppose we would like to conduct an exploratory analysis of this pea plant data in jamovi. Our first step would be to record the data in an appropriate format in jamovi.

Take a look at the data in Figure 4.1 above, and consider how we might best record this information in jamovi. Discuss potential options with other students and/or your lab demonstrator.

4.2

In jamovi, click on the burger menu on the top left, and select New, to open a new jamovi window. You can now begin to add in the pea plant data to the spreadsheet in the Data section.

Based on your considerations in 4.1, you should have determined that an appropriate way in which to prepare the data for analysis is something along the lines of the following steps. Complete these steps now:

  1. In the Variables section, rename variable A to Seedling Height, and rename variable B to Treatment Type. You will have to determine what type of variable each of these is.

  2. In the Data section, manually type in the data from Figure 4.1 into the appropriate column. For the purposes of this lab, do not make a distinction between the different benches. Do not include recorded values which are crossed or scribbled out. For recorded values which are hard to read, either exclude them or type in your best guess - it’s up to you.

If you are stuck on a value, you may like to discuss this with other students and/or the lab demonstrator.
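If you would rather prepare the data outside jamovi, the layout described in the steps above (one row per seedling, with columns for Seedling Height and Treatment Type, ignoring benches) is easy to produce programmatically and save as a .csv file, which jamovi can open. In this sketch the nested `raw` dictionary is a hypothetical transcription with invented heights, not the values in Figure 4.1; `None` marks a crossed-out recording.

```python
# Hypothetical transcription of part of a hand-written sheet (invented values):
# bench -> treatment -> seedling heights in mm (None = crossed out)
raw = {
    1: {"C": [52, 60, None], "TA": [110, 98], "TB": [130]},
    2: {"C": [48], "TA": [None, 121], "TB": [125, 140]},
}

def to_long_format(raw):
    """One row per surviving seedling: (Seedling Height, Treatment Type); bench is ignored."""
    rows = []
    for bench in sorted(raw):
        for treatment, heights in raw[bench].items():
            for height in heights:
                if height is not None:  # skip crossed-out recordings
                    rows.append((height, treatment))
    return rows

rows = to_long_format(raw)
print("Seedling Height,Treatment Type")
for height, treatment in rows:
    print(f"{height},{treatment}")
```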

4.3

With the pea plant data now nicely recorded in jamovi, it is time to compute some descriptive statistics and plots. Using the appropriate processes, complete the following questions:

If you are not sure how to approach one or more of the jamovi-related questions below, use the steps outlined in Questions 2 and 3 as a guide.

4.3.1

Produce a descriptives table and a bar plot for the treatment groups.

4.3.2

What percentage of seedlings (that survived) were in each treatment group?
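Each percentage is the number of seedlings in a group divided by the overall sample size, multiplied by 100. In jamovi, we believe ticking Frequency tables in the Descriptives panel produces these for a nominal variable such as Treatment Type. The sketch below shows the arithmetic on invented counts.

```python
from collections import Counter

def group_percentages(labels):
    """Percentage of observations in each category (count / total * 100)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: 100 * count / total for group, count in counts.items()}

# Hypothetical treatment labels, one per surviving seedling (illustrative only)
treatments = ["C"] * 18 + ["TA"] * 20 + ["TB"] * 22
percentages = group_percentages(treatments)
print(percentages)
```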

4.3.3

Create histograms of the seedling heights, for each treatment group.

4.3.4

What is the overall sample size for this data?

4.3.5

What is the mean seedling height, overall and for each treatment group? Which treatment group has the highest mean height?

4.3.6

Provide a brief interpretation of the 95% confidence intervals reported.

Using these confidence intervals, can you conclude anything about the mean heights of the seedlings in the different treatment groups?

4.3.7

Take a few minutes to think about the study design for this experiment. When preparing the experiment, what are some key factors or steps that may have been considered? Discuss your thoughts with other students and/or the lab demonstrator.

4.3.8

Copy all the relevant jamovi output into your Word document, for your records.


5 Lyrebird Paper Review 🌳


Figure 5.1: Note. From File:Menura novaehollandiae novaehollandiae - Flickr - Lip Kee.jpg, by Lip Kee, 2014, Wikimedia Commons (https://commons.wikimedia.org/). CC BY-SA 2.0 DEED

In the recent ED lecture slides, the paper by Nugent et al. (2014) on the interactions between lyrebirds and fire in south-eastern Australia was discussed.

This paper contains several tables of descriptive and inferential statistics.

The Microsoft AI tool Copilot was used to summarise the statistical information in the paper, via the following prompt:

  • Summarise the statistical information and key statistical results from the following paper: Nugent, D. T. and Leonard, S. W. J. and Clarke, M. F. (2014). Interactions between the superb lyrebird (Menura novaehollandiae) and fire in south-eastern Australia. Wildlife Research, 41(3), 203-211

The AI output is reproduced below. Have a quick look over it.

AI answer

Certainly! In the study titled “Interactions between the superb lyrebird (Menura novaehollandiae) and fire in south-eastern Australia,” conducted by Daniel T. Nugent, Steven W. J. Leonard, and Michael F. Clarke, the researchers investigated the relationship between superb lyrebirds and fire in fire-prone Eucalyptus forests. Here are the key statistical findings:

Lyrebird Foraging and Litter Fuel Loads: Lyrebirds are important ecosystem engineers that accelerate litter decomposition in Eucalyptus forests through their foraging behavior. Lyrebird foraging significantly reduces litter fuel loads by approximately 25% (1.66 t/ha) in areas where they are free to forage. This reduction in surface fuel loads may inhibit fire spread.

Fire Behavior Modeling: Fire-behavior models were used to assess the potential influence of lyrebird scratchings on fire behavior. Lyrebird foraging led to a lower likelihood of fire occurrence and less intense fire.

Distribution and Foraging Patterns: Lyrebirds were present in both unburnt and ground-burnt sites at the landscape scale. They favored foraging in unburnt patches within patchily burnt sites. The distinct vegetation structure and composition between burnt and unburnt patches influenced lyrebird distribution and foraging patterns.

Implications: Alternative stable states may emerge in Eucalyptus forests due to feedback mechanisms among lyrebirds, vegetation, and fuel accumulation. Considering the ecological role of lyrebirds is crucial for forest fuel management and conservation in extensive, fire-prone forests in south-eastern Australia

5.1

One thing you may have noticed is that, while this seems to be a helpful summary, no descriptive statistics (e.g. means and standard deviations) are actually discussed. This may be due to a sub-optimal prompt, but it highlights that while AI tools can often assist our learning, it is always good to check things ourselves.

5.1.1

Access the Nugent et al. (2014) paper via the LTU library, and take a look at Tables 1-4. We will discuss the tests used later in the semester, and you are not expected to read the paper in depth. At this stage, your main objective is to try to understand the presentation of the data in these tables, specifically the mean and standard deviation descriptive statistics.

Discuss the tables’ details with other students and/or the lab demonstrator.


Great work! That concludes the DA Topic 1 jamovi computer lab.


Before you finish up, make sure to save both your Word document and your pea plant jamovi (.omv) file to your OneDrive, for future reference.


References

Green, P. T. (1997). Red crabs in rain forest on Christmas Island, Indian Ocean: activity patterns, density and biomass. Journal of Tropical Ecology, 13(1), 17-38.

Nugent, D. T., Leonard, S. W. J. and Clarke, M. F. (2014). Interactions between the superb lyrebird (Menura novaehollandiae) and fire in south-eastern Australia. Wildlife Research, 41(3), 203-211.

Stevens, M., Lown, A. E. and Wood, L. E. (2014). Camouflage and individual variation in shore crabs (Carcinus maenas) from different habitats. PLOS ONE, 9(12), e115586. https://doi.org/10.1371/journal.pone.0115586


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.