Topic 3: Probability and Distributions


🏡 In Topic 3 we introduced the concept of probability. In this computer lab, we will cover how to carry out probability calculations in jamovi, focusing on important continuous and discrete random variables and probability distributions.

By the end of this lab, you should feel confident in using and distinguishing between R functions for computing the density, distribution function, quantile function, and random generation of data, for the Normal and Binomial distributions.

In Lecture 4 (Part 2), we covered material that will be very helpful for today’s computer lab. If you have not already watched Lecture 4 (Part 2), you may wish to do so now.


After working through the questions in this computer lab, you will be ready to complete Quiz 4. If you have time during today’s lab, you may like to work on the quiz.

1 jamovi modules and using the Rj editor in jamovi

💻 To calculate probabilities in jamovi, we will be using the Rj editor within jamovi, which can be accessed by installing the Rj editor module. Please watch the following video before proceeding. It contains important information about using the Rj editor, including displaying your results to a required number of decimal places and rounding.


Note: If you are using jamovi cloud, you may not be able to install jamovi modules. If this is the case, you will need to use jamovi via the La Trobe virtual desktop. Please see LMS for details, or check with your computer lab demonstrator if you are unsure.

2 Using the norm R functions

🏡 In this question, we will learn how to use the various norm functions in R, which are related to the Normal Distribution (hence norm). More specifically, we will become familiar with the pnorm and qnorm functions.

Firstly, recall that we express the distribution of a normally distributed random variable \(X\) as \[X \sim N(\mu, \sigma^2).\] Notice that we use the \(\color{blue}{\text{variance}, \sigma^2}\), when specifying the \(\color{blue}{\text{distribution}}\).

\(\color{red}{\text{Be careful:}}\) When we use the various norm functions \(\color{red}{\text{in R}}\), we specify the \(\color{red}{\text{standard deviation (sd)}, \sigma}\), rather than the \(\color{blue}{\text{variance}}\).

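As we can see below, each function has an additional letter preceding norm that specifies the aspect of the Normal distribution in which we are interested. Hence we have:

  • dnorm(x, mean, sd): the density at \(x\)
  • pnorm(q, mean, sd): the distribution function, \(P(X \leq q)\)
  • qnorm(p, mean, sd): the quantile function, i.e. the value \(x\) for which \(P(X \leq x) = p\)
  • rnorm(n, mean, sd): random generation of \(n\) values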

\(\color{blue}{\text{Note:}}\) If we do not specify mean and sd, R will assume we want to calculate values from the Standard Normal Distribution. That is, R will assume we have mean = 0 and sd = 1.
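The two notes above can be checked directly. The following is a minimal sketch only: the distribution \(X \sim N(10, 4)\) and the object names are made up for illustration and are not part of the lab questions.

```r
# Omitting mean and sd is the same as asking for the standard Normal distribution
p_default  <- pnorm(0.5)
p_explicit <- pnorm(0.5, mean = 0, sd = 1)
isTRUE(all.equal(p_default, p_explicit))

# Illustrative example (assumed values): X ~ N(10, 4), i.e. variance = 4
variance  <- 4
p_correct <- pnorm(11, mean = 10, sd = sqrt(variance))  # sd = sqrt(4) = 2
p_wrong   <- pnorm(11, mean = 10, sd = variance)        # mistake: variance passed as sd

# The two answers differ, so mixing up variance and sd changes the result
round(c(correct = p_correct, wrong = p_wrong), 4)
```

The same code can be pasted into the Rj editor; only the numbers need changing for other distributions.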

To help us further understand how these functions work, we will now consider some examples.

\(\color{blue}{\text{Note:}}\) A walkthrough video for sections 2.1 - 2.1.2 is included below. Give these questions a go, and if you find yourself stuck at any stage, please refer to the video. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.

2.1 pnorm

🏡 Suppose we would like to know \(P(X \leq 1)\), the probability that the random variable \(X\) has an observed value less than or equal to 1 (where \(X \sim N(0,1)\)).

To calculate this, we can use the pnorm function as follows:

    pnorm(1, mean = 0, sd = 1)

This gives us a value of 0.84 (rounded to two decimal places). This probability is equivalent to the area under the standard Normal curve to the left of \(x = 1\). This area is shaded in blue in the plot below:
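As a quick check of this result, the sketch below rounds the probability using the base R function round, and then approximates the same area numerically with the base R function integrate applied to the Normal density dnorm. The object names are illustrative only.

```r
# P(X <= 1) for X ~ N(0, 1)
p <- pnorm(1, mean = 0, sd = 1)

round(p, 2)   # 0.84
round(p, 4)   # 0.8413

# The same probability viewed as an area: integrate the standard Normal
# density from -Inf up to 1 (a numerical approximation)
area <- integrate(dnorm, lower = -Inf, upper = 1)$value
all.equal(p, area, tolerance = 1e-4)
```

The agreement between p and area is what "area under the curve to the left of \(x = 1\)" means in practice.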

2.1.1

🏡 To help solidify our understanding of the pnorm function, let’s consider the probability that the random variable \(X\) has an observed value less than or equal to a different value, i.e. 1.5. \(P(X \leq 1.5)\) corresponds to the shaded blue area in the plot below:

If we compute this probability in R, we obtain the value 0.9332. See if you can use the pnorm function in R to verify this result.

2.1.2

🏡 If we compare the probabilities \(P(X \leq 1)\) and \(P(X \leq 1.5)\), and check the plots shown in 2.1 and 2.1.1, we can see that by increasing our observed \(x\) value from 1 to 1.5, we have covered an additional area of roughly 0.092 under the curve. This additional area is shaded red in the plot below, and represents the probability that \(X\) is between 1 and 1.5, that is, \(P(1 \leq X \leq 1.5)\):

To compute this value in R, we can use pnorm in the following manner:

    pnorm(1.5, mean = 0, sd = 1) - pnorm(1, mean = 0, sd = 1)

Note that here, the first use of pnorm is equivalent to calculating the blue shaded area in 2.1.1, while the second use of pnorm is equivalent to calculating the blue shaded area in 2.1. By subtracting the second area from the first (i.e. pnorm(1.5, mean = 0, sd = 1) - pnorm(1, mean = 0, sd = 1)), we end up with just the red shaded area shown in 2.1.2, i.e. \(P(1 \leq X \leq 1.5)\), as desired.
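If you find yourself repeating this subtraction, one option is to wrap it in a small function. The sketch below is only a suggestion: prob_between is a hypothetical helper written for this example (it is not a built-in R function), and the \(N(50, 10^2)\) values in the last line are assumed purely for illustration.

```r
# Hypothetical helper: P(lower <= X <= upper) for X ~ N(mean, sd^2)
prob_between <- function(lower, upper, mean = 0, sd = 1) {
  # area to the left of upper, minus area to the left of lower
  pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd)
}

# Reproduces the red shaded area from 2.1.2
round(prob_between(1, 1.5), 4)   # 0.0918

# Illustrative use with a non-standard Normal (assumed values)
prob_between(40, 60, mean = 50, sd = 10)
```

Because \(X\) is continuous, a single point has probability zero, so \(P(1 \leq X \leq 1.5)\) and \(P(1 < X < 1.5)\) are the same number.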

2.1.3

💻 As you can see, not only can we use pnorm to compute the probability that the random variable \(X\) has an observed value less than or equal to some value (e.g. \(x=1\)), but we can also use pnorm to compute the probability that the random variable \(X\) takes an observed value between two arbitrary values (e.g. between \(x=1\) and \(x=1.5\)).

To conclude this section, use the pnorm function to determine the following probabilities. For each of these calculations, try to sketch (by hand, not using R!) a picture of the relevant Normal distribution, and then fill in the relevant area under the curve.

\(\color{blue}{\text{Note:}}\) A walkthrough video for questions 1-5 is included below. Give these questions a go, and refer to the video if you find yourself stuck at any stage. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.


🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer at the relevant location of the shared Google Doc.


Remember: In some assessments, you will need to provide results rounded to a certain number of decimal places. If you have not already done so, make sure to watch the video in Question 1 which contains important information about using the Rj editor, including displaying your results to a required number of decimal places and rounding.

  1. \(P(X \leq 2)\) for \(X \sim N(0, 1)\).
  2. \(P(X \leq -1)\) for \(X \sim N(0, 1)\).
  3. \(P(X \leq 1)\) for \(X \sim N(2, 1)\).
  4. \(P(X \leq 1)\) for \(X \sim N(0, 2)\). Hint: always remember to specify the standard deviation in R rather than the variance. In R, the square root can be calculated using sqrt. Try the following code, but fill in the blanks where there are ...’s: pnorm(..., mean = ..., sd = sqrt(...))
  5. \(P(1 \leq X \leq 2)\) for \(X \sim N(0, 3)\).
  6. \(P(-1 \leq X \leq 1)\) for \(X \sim N(0, 2^2)\).
  7. \(P(X \geq 2)\) for \(X \sim N(0, 5)\). \(^\dagger\)
  8. \(P(X \geq -1)\) for \(X \sim N(0, 3^2)\). \(^\dagger\)
  9. \(P(X \leq -2) + P(X \geq 2)\) for \(X \sim N(0, 1)\).

\(^\dagger\)Hint: For questions 7 and 8, think about what the total area under a valid probability density curve must equal.
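The hint relies on the total area under the curve being 1. The following sketch shows the idea with a value that is not one of the questions above, \(P(X \geq 0.5)\) for \(X \sim N(0, 1)\). It also uses the lower.tail argument of pnorm, which is not covered in this lab but gives the upper-tail area directly.

```r
# Illustrative value (not one of the questions above)
p_left  <- pnorm(0.5, mean = 0, sd = 1)   # P(X <= 0.5)
p_right <- 1 - p_left                     # P(X >= 0.5), since the total area is 1

# pnorm can also return the area to the right via lower.tail = FALSE
p_right_direct <- pnorm(0.5, mean = 0, sd = 1, lower.tail = FALSE)

all.equal(p_right, p_right_direct)
```

Either approach is fine; subtracting from 1 keeps the connection to the sketch of the curve more visible.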

2.2 qnorm

💻 Note that in 2.1, our distribution function value was \(P(X \leq 1) = 0.8413\).

Suppose that we knew \(X \sim N(0,1)\), and were given the probability value 0.8413447, but did not know the value of \(x\) to which it corresponded (i.e. the quantile value). In this instance, we could calculate this value of \(x\) using the R function qnorm as follows:

    qnorm(0.8413447, mean = 0, sd = 1)
    ## [1] 0.9999998

As you can see, this gives us our quantile value of \(x = 1\) (to six decimal places; the tiny discrepancy arises because the probability 0.8413447 is itself a rounded value). To help visualise this, take a look at the plot below:

Note here that the area under the curve to the left of the quantile \(x=1\) (shown with the dashed red line) is equal to \(P(X \leq 1) = 0.8413447\) (shown in blue).
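In other words, qnorm undoes pnorm. The sketch below illustrates this round trip; the \(N(50, 10^2)\) distribution and the probability 0.90 in the second half are assumed values chosen only for illustration.

```r
# pnorm maps a value x to a probability; qnorm maps that probability back to x
p <- pnorm(1, mean = 0, sd = 1)   # kept at full precision, not rounded
x <- qnorm(p, mean = 0, sd = 1)
all.equal(x, 1)

# Supplying the rounded probability gives a value very slightly below 1
qnorm(0.8413447, mean = 0, sd = 1)

# Illustrative non-standard case (assumed values): X ~ N(50, 10^2)
q90 <- qnorm(0.90, mean = 50, sd = 10)   # value with 90% of the area to its left
pnorm(q90, mean = 50, sd = 10)           # recovers the probability 0.90
```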

2.2.1

💻 To conclude this section, use the qnorm function to determine the following quantiles. Remember, for each of these calculations, it might be helpful to sketch (by hand, not using R!) a picture of the standard Normal distribution, and then fill in the relevant area under the curve (roughly) to determine the position of the quantile value.

Make sure to round your answers to 2 decimal places.

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer at the relevant location of the shared Google Doc.


  1. Find \(x\) if \(P(X \leq x) = 0.9772499\) for \(X \sim N(0,1)\).
  2. Find \(x\) if \(P(X \leq x) = 0.5\) for \(X \sim N(0,1)\).
  3. Find \(x\) if \(P(X \leq x) = 0.7733726\) for \(X \sim N(0,1)\).
  4. Find \(x\) if \(P(X \leq x) = 0.2742531\) for \(X \sim N(1,1)\).
  5. Find \(x\) if \(P(X \leq x) = 0.7421539\) for \(X \sim N(3,2)\).

3 Using the binom R functions

🏡 Hopefully, working through Question 2 has made the differences between the various norm functions, and how each is used, much clearer.

As you may have guessed, these density, distribution, quantile and random number generating functions are not restricted to the Normal distribution.

Let us now apply what we have learnt in Question 2 to the Binomial Distribution, which is a discrete distribution. A discrete random variable takes values from a countable set of possible outcomes (for the Binomial, a finite set), and each of these outcomes has a probability of occurring, with the sum of these probabilities equal to 1.

3.1 Playing Cards Example

🏡 Recall the playing cards example introduced in the Topic 3 Workshop. In this example, a card was drawn from a standard deck of playing cards and we guessed its suit (hearts, diamonds, spades or clubs). This was repeated 10 times, with replacement.

Since there are only 4 suits, and each suit has the same number of cards, the probability of a correct guess was 25%, or 0.25.

Your number of correct guesses, which we will refer to as \(X\), could range from 0 to 10. However, given that the probability of a correct guess was 25%, it is highly unlikely that you made a large number of correct guesses (if you did, well done!).

We can actually quantify the probability associated with making a certain number of correct guesses, using the Binomial distribution.

Let’s take a look at how the Binomial distribution works.

3.2 The Binomial Distribution

🏡 Suppose we have \(n\) “trials”, each with an outcome of either “success” or “failure”. Further suppose that for each trial, the probability of “success” is equal to \(p\), and that \(X\) is the number of “successes” from the \(n\) trials. We can model \(X\) using the Binomial Distribution. In mathematical notation, we define the distribution as \[ X \sim BIN(n, p),\] where:

  • \(X\) is the number of successes
  • \(n\) is the number of trials
  • \(p\) is the probability of success for each trial.

If, for example, we are interested in the probability of some number \(x\) successes (out of the \(n\) trials), we can write this probability down as \(P(X = x)\).

Given that in our playing cards example we have \(n = 10\) trials with a probability of “success” of \(p = 0.25\) for each trial, we define the distribution in this example as: \[X \sim BIN(10, 0.25).\]

3.3 Overview of binom R functions

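🏡 The table below provides an overview of the binom functions we will be using in this question:

  Function                         Computes
  dbinom(x, size = n, prob = p)    \(P(X = x)\)
  pbinom(x, size = n, prob = p)    \(P(X \leq x)\)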

Notes:

  1. In R, the first argument in the pbinom function is called q. However, in the above table, we have referred to it as x for ease of notation.
  2. Although not required for this computer lab, the qbinom and rbinom functions are also available in R. If you would like to learn more about these functions, you can find more information in the help file, which can be opened by running ?qbinom.
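To make these notes concrete, the following sketch shows the argument names R uses for each function. The distribution \(X \sim BIN(5, 0.5)\), the probability 0.6 and the seed are all assumed values chosen for illustration, not part of the lab.

```r
# Illustrative distribution (assumed): X ~ BIN(5, 0.5)

# Positional and named arguments give the same answer
dbinom(2, 5, 0.5)                      # P(X = 2)
dbinom(x = 2, size = 5, prob = 0.5)    # same call using R's argument names

pbinom(2, 5, 0.5)                      # P(X <= 2)
pbinom(q = 2, size = 5, prob = 0.5)    # here the first argument is named q

# The two functions not required for this lab
qbinom(0.6, size = 5, prob = 0.5)      # smallest x with P(X <= x) >= 0.6
set.seed(1)                            # arbitrary seed, for reproducibility only
sims <- rbinom(3, size = 5, prob = 0.5)   # three simulated values of X
sims
```

Naming the arguments is optional, but it makes it harder to swap the number of trials and the probability by accident.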

3.4 dbinom

🏡 Let’s try computing \(P(X=x)\) in R, for a few different values of \(x\). To do so, we will be using the dbinom function.

Recalling that in our playing cards example we have \(n = 10\) trials, each with a probability of success of \(p = 0.25\), we can compute \(P(X=1)\) by running the following code:

    dbinom(1, 10, 0.25)

Make sure you understand the arguments here before proceeding. Using the above code, if you guessed the suit 10 times, what is the probability of guessing correctly exactly once?
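For readers who would like to see what dbinom is doing, it evaluates the standard Binomial probability formula \(P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}\). This formula is not stated in this lab and is included here as background only. The sketch below checks it for \(x = 4\), a value chosen arbitrarily, using the base R function choose, and then confirms that the probabilities of all possible outcomes add up to 1.

```r
n <- 10     # number of trials (card guesses)
p <- 0.25   # probability of a correct guess on each trial

# P(X = 4) from dbinom, and from the formula written out by hand
from_dbinom  <- dbinom(4, size = n, prob = p)
from_formula <- choose(n, 4) * p^4 * (1 - p)^(n - 4)
all.equal(from_dbinom, from_formula)

# The probabilities of the outcomes 0, 1, ..., 10 sum to 1
total <- sum(dbinom(0:n, size = n, prob = p))
total
```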

\(\color{blue}{\text{Note:}}\) A walkthrough video for this question is included below. Please refer to the video if you find yourself stuck at any stage with the dbinom coding process. If you feel comfortable with the process, proceed to section 3.4.1. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.

3.4.1

💻 Using the details in 3.4 as a guide, compute the following probabilities for \(X \sim BIN(10, 0.25)\):

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer at the relevant location of the shared Google Doc.


  1. \(P(X=0)\)
  2. \(P(X=2)\)
  3. \(P(X=3)\)
  4. \(P(X=9)\)
  5. \(P(X=10)\)

What do you notice about these results?

3.5 pbinom

💻 We can use the pbinom function to compute \(P(X \leq x)\) for \(X \sim BIN(n, p)\).
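Unlike the Normal distribution, for a discrete distribution \(P(X < x)\) and \(P(X \leq x)\) are generally different, so the direction and strictness of each inequality matter. The sketch below works through the common cases using an assumed distribution, \(X \sim BIN(5, 0.5)\), rather than the playing cards example.

```r
# Illustrative distribution (assumed): X ~ BIN(5, 0.5)
n <- 5
p <- 0.5

# pbinom adds up the dbinom probabilities from 0 up to and including x
pbinom(2, n, p)
sum(dbinom(0:2, n, p))

# For a discrete variable, < and <= describe different events
p_le_2 <- pbinom(2, n, p)        # P(X <= 2)
p_lt_2 <- pbinom(1, n, p)        # P(X < 2)  = P(X <= 1)
p_gt_2 <- 1 - pbinom(2, n, p)    # P(X > 2)  = 1 - P(X <= 2)
p_ge_2 <- 1 - pbinom(1, n, p)    # P(X >= 2) = 1 - P(X <= 1)

# P(1 <= X < 4) = P(X <= 3) - P(X <= 0)
p_1_to_3 <- pbinom(3, n, p) - pbinom(0, n, p)

c(p_le_2, p_lt_2, p_gt_2, p_ge_2, p_1_to_3)
```

A useful habit is to write out which whole numbers are included in the event before deciding which values to pass to pbinom.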

Use pbinom to determine the following probabilities assuming that \(X \sim BIN(10, 0.25)\):

🎧 Online students 💬 For each sub-question selected by the facilitator, enter your answer at the relevant location of the shared Google Doc.


  1. The probability of making 6 or fewer correct card guesses out of the 10 trials.
  2. The probability of making 3 or fewer correct card guesses out of the 10 trials.
  3. The probability of making more than 3 correct card guesses out of the 10 trials.
  4. The probability of making more than 8 correct card guesses out of the 10 trials.
  5. The probability of making 7 or 8 correct card guesses out of the 10 trials. (Think carefully about this.)
  6. \(P(X \leq 6)\)
  7. \(P(X < 6)\) (Think carefully about this.)
  8. \(P(X > 8)\)
  9. \(P(X \geq 8)\)
  10. \(P(5 \leq X < 8)\)

3.5.1

💻 Suppose that one student claims they carried out the experiment twice, guessing 7 out of 10 cards correctly in the first experiment and then 8 out of 10 cards correctly in the second experiment. Do you believe them?


4 Overlaying a density curve on a Histogram to assess normality

💻 Recall the happiness_income_2019.csv data (Gapminder 2021) that we considered in Computer Lab 3, which contains data on the 163 surveyed countries for 2019. Download this file from the LMS now, and save it in a relevant location on your PC.

Once you have done so, import the happiness_income_2019.csv file into jamovi. For revision on how to do this, see Computer Lab 1.

4.1

💻 In jamovi, create a histogram for both the income_2019 and happiness_2019 variables, and include the density on both plots.

🎧 Online students 💬 Volunteer to share your screen and explain your answers to this question.


4.2

💻 Looking at your histograms with densities overlaid, what conclusions can you make about the data? Do you think that either of the histograms you just created looks normally distributed? List some details about each histogram to support your conclusions.
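If it helps to calibrate your eye, the sketch below draws histograms with density curves for two simulated samples, one roughly Normal and one right-skewed. Everything here is an assumption made for illustration: the data are simulated with rnorm and rexp (they are not the Gapminder data), plot_with_density is a hypothetical helper, and the code assumes base R graphics are available where you run it.

```r
set.seed(2019)  # arbitrary seed so the simulated data are reproducible

# Simulated (not real) data, with the same number of values as the lab data set
symmetric_sample <- rnorm(163, mean = 0, sd = 1)   # roughly Normal
skewed_sample    <- rexp(163, rate = 1)            # right-skewed

# Hypothetical helper: histogram on the density scale with two curves overlaid
plot_with_density <- function(values, title) {
  m <- mean(values)
  s <- sd(values)
  hist(values, freq = FALSE, main = title, xlab = "Value")
  lines(density(values), col = "blue")             # smoothed density of the data
  curve(dnorm(x, mean = m, sd = s),                # Normal curve with the same mean and sd
        add = TRUE, col = "red", lty = 2)
}

plot_with_density(symmetric_sample, "Roughly Normal (simulated)")
plot_with_density(skewed_sample, "Right-skewed (simulated)")
```

When the blue data density follows the dashed red Normal curve closely, a Normal model is plausible; a long tail on one side, or a peak well away from the centre, suggests otherwise.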


Well done, that’s everything for today. There were quite a few questions, so if you didn’t manage to complete everything during the lab, don’t worry - just make sure to finish off any remaining questions before the next lab.

If you still have time, you may like to have a go at Quiz 4, which is based on the Topic 4 readings.

Before you finish up, remember to save your work (e.g. your jamovi and Word files) somewhere safe (e.g. OneDrive) so that you can access it at a later time.


References

Gapminder. 2021. “Happiness Score (WHR) [.csv File].” 2021. http://gapm.io/dhapiscore_whr.


These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.