🏡 In Topic 3 we introduced the concept of probability. In this computer lab, we will cover how to carry out probability calculations in jamovi, focusing on important continuous and discrete random variables and probability distributions.
By the end of this lab, you should feel confident in using and distinguishing between R functions for computing the density, distribution function, quantile function, and random generation of data, for the Normal and Binomial distributions.
In Lecture 4 (Part 2), we covered material that will be very helpful for today’s computer lab. If you have not already watched Lecture 4 (Part 2), you may wish to do so now.
After working through the questions in this computer lab, you will be ready to complete Quiz 4. If you have time during today’s lab, you may like to work on the quiz.
💻 To calculate probabilities in jamovi, we will be using the Rj editor within jamovi, which can be accessed by installing the Rj editor module. Please watch the following video before proceeding. It contains important information about using the Rj editor, including displaying your results to a required number of decimal places and rounding.
Note: If you are using jamovi cloud, you may not be able to install jamovi modules. If this is the case, you will need to use jamovi via the La Trobe virtual desktop. Please see LMS for details, or check with your computer lab demonstrator if you are unsure.
norm R functions
🏡 In this question, we will learn how to use the various norm functions in R, which are related to the Normal Distribution (hence norm). More specifically, we will become familiar with the pnorm and qnorm functions.
Firstly, recall that we express the distribution of a normally distributed random variable \(X\) as \[X \sim N(\mu, \sigma^2).\] Notice that we use the \(\color{blue}{\text{variance}, \sigma^2}\), when specifying the \(\color{blue}{\text{distribution}}\).
\(\color{red}{\text{Be careful:}}\) When we use the various norm functions \(\color{red}{\text{in R}}\), we specify the \(\color{red}{\text{standard deviation (sd)}, \sigma}\), rather than the \(\color{blue}{\text{variance}}\).
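As we can see below, each function has an additional letter preceding norm, which specifies the aspect of the Normal distribution in which we are interested. Hence we have:
dnorm: the density function
pnorm: the distribution function, \(P(X \leq x)\)
qnorm: the quantile function
rnorm: random generation of data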
\(\color{blue}{\text{Note:}}\) If we do not specify mean and sd, R will assume we want to calculate values from the Standard Normal Distribution. That is, R will assume we have mean = 0 and sd = 1.
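The short sketch below illustrates the two points above: the default values of mean and sd, and the need to supply the standard deviation rather than the variance. The distribution \(X \sim N(10, 4)\) and the value 12 are made up purely for illustration and do not come from the lab questions.

```r
# Sketch only: the N(10, 4) example is an illustrative assumption.

# With no mean or sd supplied, R uses the Standard Normal (mean = 0, sd = 1).
p_default  <- pnorm(1)
p_explicit <- pnorm(1, mean = 0, sd = 1)
print(c(p_default, p_explicit))  # the two values are identical

# Suppose X ~ N(10, 4): the variance is 4, so the standard deviation is sqrt(4) = 2.
p_correct <- pnorm(12, mean = 10, sd = sqrt(4))  # P(X <= 12), using sd = 2
p_wrong   <- pnorm(12, mean = 10, sd = 4)        # common mistake: variance passed as sd
print(c(p_correct, p_wrong))     # the two values differ
```

Because 12 is exactly one standard deviation above the mean of 10, p_correct matches pnorm(1), whereas p_wrong does not.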
To help us further understand how these functions work, we will now consider some examples.
\(\color{blue}{\text{Note:}}\) A walkthrough video for sections 2.1 - 2.1.2 is included below. Give these questions a go, and if you find yourself stuck at any stage, please refer to the video. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.
pnorm
🏡 Suppose we would like to know \(P(X \leq 1)\), the probability that the random variable \(X\) has an observed value less than or equal to 1 (where \(X \sim N(0,1)\)).
To calculate this, we can use the pnorm function as follows:
pnorm(1, mean = 0, sd = 1)
This gives us a value of 0.8413 (rounded to 4 decimal places). This probability is equivalent to computing the area under the standard Normal curve to the left of \(x = 1\). This area is shaded in blue in the plot below:
🏡 To help solidify our understanding of the pnorm function, let’s consider the probability that the random variable \(X\) has an observed value less than or equal to a different value, i.e. 1.5. \(P(X \leq 1.5)\) corresponds to the shaded blue area in the plot below:
If we compute this probability in R, we obtain the value 0.9332. See if you can use the pnorm function in R to verify this result.
🏡 If we compare the probabilities \(P(X \leq 1)\) and \(P(X \leq 1.5)\), and check the plots shown in 2.1 and 2.1.1, we can see that by increasing our observed \(x\) value from 1 to 1.5, we have covered an additional area of roughly 0.092 under the curve. This additional area is shaded red in the plot below, and represents the probability that \(X\) is between 1 and 1.5, that is, \(P(1 \leq X \leq 1.5)\):
To compute this value in R, we can use pnorm in the following manner:
pnorm(1.5, mean = 0, sd = 1) - pnorm(1, mean = 0, sd = 1)
Note that here, the first use of pnorm is equivalent to calculating the blue shaded area in 2.1.1, while the second use of pnorm is equivalent to calculating the blue shaded area in 2.1. By subtracting the second area from the first (i.e. pnorm(1.5, mean = 0, sd = 1) - pnorm(1, mean = 0, sd = 1)), we end up with just the red shaded area shown in 2.1.2, i.e. \(P(1 \leq X \leq 1.5)\), as desired.
💻 As you can see, not only can we use pnorm to compute the probability that the random variable \(X\) has an observed value less than or equal to some value (e.g. \(x=1\)), but we can also use pnorm to compute the probability that the random variable \(X\) takes an observed value between two arbitrary values (e.g. between \(x=1\) and \(x=1.5\)).
To conclude this section, use the pnorm function to determine the following probabilities. For each of these calculations, try to sketch (by hand, not using R!) a picture of the standard Normal distribution, and then shade the relevant area under the curve.
\(\color{blue}{\text{Note:}}\) A walkthrough video for questions a-e is included below. Give these questions a go, and refer to the video if you find yourself stuck at any stage. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.
Remember: In some assessments, you will need to provide results rounded to a certain number of decimal places. If you have not already done so, make sure to watch the video in Question 1 which contains important information about using the Rj editor, including displaying your results to a required number of decimal places and rounding.
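As a reminder of how rounding can be done in code, here is a minimal sketch using the base R round function. The video may show a different approach, and the number of decimal places used here is only an example.

```r
# Sketch only: the choice of 2 and 4 decimal places is illustrative.

p <- pnorm(1)          # full-precision value of P(X <= 1)
p_4dp <- round(p, 4)   # rounded to 4 decimal places
p_2dp <- round(p, 2)   # rounded to 2 decimal places
print(c(p_4dp, p_2dp))

# Round once, at the very end of a calculation ...
diff_late <- round(pnorm(1.5) - pnorm(1), 4)

# ... rather than rounding each piece first, which loses accuracy.
diff_early <- round(pnorm(1.5), 2) - round(pnorm(1), 2)
print(c(diff_late, diff_early))
```

Rounding the intermediate values first gives a noticeably cruder answer than rounding the final result, so it is usually safer to keep full precision until the last step.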
sqrt. Try the following code, but fill in the blanks where there are ...’s: pnorm(..., mean = ..., sd = sqrt(...))
\(^\dagger\)Hint: For g and h, think about what the total area under a valid probability density curve must equal.
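The idea behind this hint can be sketched in code. Since the total area under the curve is 1, the area to the right of a value is 1 minus the area to its left. The value 0.5 below is an arbitrary illustration and is not one of the lab questions; lower.tail is an optional argument of pnorm.

```r
# Sketch only: x = 0.5 is an arbitrary illustrative value.

x <- 0.5
p_left  <- pnorm(x)        # P(X <= x): area to the left of x
p_right <- 1 - p_left      # P(X > x): the remaining area to the right

# pnorm can also return the right-hand area directly.
p_right_alt <- pnorm(x, lower.tail = FALSE)

print(c(p_left, p_right, p_right_alt))
```

Both approaches give the same right-hand area, and the left and right areas always add up to 1.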
qnorm
💻 Note that in 2.1, our distribution function value was \(P(X \leq 1) = 0.8413\).
Suppose that we knew \(X \sim N(0,1)\), and were given the probability value 0.8413447, but did not know the value of \(x\) to which it corresponded (i.e. the quantile value). In this instance, we could calculate this value of \(x\) using the R function qnorm as follows:
qnorm(0.8413447, mean = 0, sd = 1)
## [1] 0.9999998
As you can see, this gives us our quantile value of \(x = 1\) (to a reasonable number of decimal places). The result is not exactly 1 because the probability we supplied, 0.8413447, is itself a rounded value. To help visualise this, take a look at the plot below:
Note here that the area under the curve to the left of the quantile \(x=1\) (shown with the dashed red line) is equal to \(P(X \leq 1) = 0.8413447\) (shown in blue).
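One way to see the relationship between the two functions is to note that qnorm undoes pnorm. The sketch below checks this for \(x = 1\); the variable names are our own and are used only for illustration.

```r
# Sketch only: variable names are illustrative.

p <- pnorm(1)                    # probability to the left of x = 1 (full precision)
x_back <- qnorm(p)               # feeding that probability to qnorm recovers x = 1
print(x_back)

x_rounded <- qnorm(0.8413447)    # a rounded probability gives a value very close to 1
print(round(x_rounded, 2))

x_median <- qnorm(0.5)           # half the area lies to the left of the mean, 0
print(x_median)
```

Using the full-precision probability recovers the quantile essentially exactly, while the rounded probability gives a value that agrees with 1 once rounded to 2 decimal places.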
💻 To conclude this section, use the qnorm function to determine the following quantiles. Remember, for each of these calculations, it might be helpful to sketch (by hand, not using R!) a picture of the standard Normal distribution, and then fill in the relevant area under the curve (roughly) to determine the position of the quantile value.
Make sure to round your answers to 2 decimal places.
binom R functions
🏡 Hopefully, working through Question 2 has made the differences between the various norm functions, and how to use them, much clearer.
As you may have guessed, these density, distribution, quantile and random number generating functions are not restricted to the Normal distribution.
Let us now consider applying what we have learnt in Question 2 to the Binomial Distribution, which is a discrete distribution. In a discrete distribution, the possible outcomes can be listed one by one (for the Binomial, there are finitely many), and each outcome has a probability of occurring, with these probabilities summing to 1.
🏡 Recall the playing cards example introduced in the Topic 3 Workshop. In this example, we guessed the suit of a playing card (hearts, diamonds, spades or clubs) 10 times, with replacement, from a standard deck of playing cards.
Since there are only 4 suits, and each suit has the same number of cards, the probability of a correct guess was 25%, or 0.25.
Your number of correct guesses, which we will refer to as \(X\), had a range from 0 up to 10. However, given the probability of a correct guess was 25%, it is highly unlikely that you will have made a large number of correct guesses (if you did, well done!).
We can actually quantify the probability associated with making a certain number of correct guesses, using the Binomial distribution.
Let’s take a look at how the Binomial distribution works.
🏡 Suppose we have \(n\) “trials”, each with an outcome of either “success” or “failure”. Further suppose that for each trial, the probability of “success” is equal to \(p\), and that \(X\) is the number of “successes” from the \(n\) trials. We can model \(X\) using the Binomial Distribution. In mathematical notation, we define the distribution as \[ X \sim BIN(n, p),\] where \(n\) is the number of trials and \(p\) is the probability of “success” on each trial.
If, for example, we are interested in the probability of some number \(x\) successes (out of the \(n\) trials), we can write this probability down as \(P(X = x)\).
Given in our playing cards example we have \(n = 10\) trials with a probability of “success” of \(p = 0.25\) for each trial, we define the distribution in this example as: \[X \sim BIN(10, 0.25).\]
binom R functions
🏡 The table below provides an overview of the binom functions we will be using in this question:
Notes:
The first argument of the pbinom function is called q. However, in the above table, we have referred to it as x for ease of notation.
The qbinom and rbinom functions are also available in R. If you would like to learn more about these functions, you can find more information in the help file, which can be opened by running ?qbinom.
dbinom
🏡 Let’s try computing \(P(X=x)\) in R, for a few different values of \(x\). To do so, we will be using the dbinom function.
Recalling that in our playing cards example we have \(n = 10\) trials, each with a probability of success of \(p = 0.25\), we can compute \(P(X=1)\) by running the following code:
dbinom(1, 10, 0.25)
Make sure you understand the arguments here before proceeding. Using the above code, if you guessed the suit 10 times, what is the probability of guessing correctly exactly once?
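To make the arguments explicit, the sketch below repeats the same call with named arguments (x, size and prob are the argument names used by dbinom). It also compares the result with the standard Binomial probability formula, \(\binom{n}{x} p^x (1-p)^{n-x}\), which is not stated in this lab and is included here only as a cross-check.

```r
# Sketch only: the formula cross-check is an addition for illustration.

n <- 10     # number of trials (guesses)
p <- 0.25   # probability of success (a correct guess) on each trial
x <- 1      # number of successes we are interested in

p_positional <- dbinom(1, 10, 0.25)               # arguments matched by position
p_named      <- dbinom(x = x, size = n, prob = p) # the same call, with names

# Standard Binomial formula: choose(n, x) * p^x * (1 - p)^(n - x)
p_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

print(c(p_positional, p_named, p_formula))
```

All three values agree, which confirms that the first argument is the number of successes, the second is the number of trials, and the third is the probability of success.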
\(\color{blue}{\text{Note:}}\) A walkthrough video for this question is included below. Please refer to the video if you find yourself stuck at any stage with the dbinom coding process. If you feel comfortable with the process, proceed to section 3.4.1. Please note that while the calculations in this video are conducted within the RStudio interface, which may appear unfamiliar, the same code used in the video can be used in the Rj editor in jamovi.
💻 Using the details in 3.4 as a guide, compute the following probabilities for \(X \sim BIN(10, 0.25)\):
What do you notice about these results?
pbinom
💻 We can use the pbinom function to compute \(P(X \leq x)\) for \(X \sim BIN(n, p)\).
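Because the Binomial distribution is discrete, \(P(X \leq x)\) is simply the sum of the individual probabilities \(P(X = 0), \ldots, P(X = x)\). The sketch below checks this relationship between pbinom and dbinom. The value \(x = 2\) is an arbitrary illustration and is not taken from the lab questions.

```r
# Sketch only: x = 2 is an arbitrary illustrative value.

n <- 10     # number of trials
p <- 0.25   # probability of success on each trial
x <- 2      # upper limit for the cumulative probability

p_cumulative <- pbinom(x, size = n, prob = p)        # P(X <= 2) in one call
p_summed     <- sum(dbinom(0:x, size = n, prob = p)) # P(X=0) + P(X=1) + P(X=2)
print(c(p_cumulative, p_summed))

# Summing over every possible outcome (0 to n) must give 1.
p_all <- pbinom(n, size = n, prob = p)
print(p_all)
```

The two approaches give the same answer, so pbinom can be thought of as a shortcut for adding up several dbinom values.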
Use pbinom to determine the following probabilities assuming that \(X \sim BIN(10, 0.25)\):
💻 Suppose that one student claims they carried out the experiment twice, and guessed 7 out of 10 cards correctly in the first experiment, and then 8 out of 10 cards correctly, in the second experiment. Do you believe them?
💻 Recall the happiness_income_2019.csv data (Gapminder 2021) that we considered in Computer Lab 3, which contains data on the 163 surveyed countries for 2019. Download this file from the LMS now, and save it in a relevant location on your PC.
Once you have done so, import the happiness_income_2019.csv file in jamovi. For revision on how to do this, see Computer Lab 1.
💻 In jamovi, create a histogram for both the income_2019 and happiness_2019 variables, and include the density on both plots.
💻 Looking at your histograms with densities overlaid, what conclusions can you make about the data? Do you think that either of the histograms you just created looks normally distributed? List some details about each histogram to support your conclusions.
These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.