STM1001 Topic 3 Lecture

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 3](https://bookdown.org/a_shaker/STM1001_Topic_3/) Lecture
## Probability and Distributions
### La Trobe University
This lecture complements the [Topic 3 readings](https://bookdown.org/a_shaker/STM1001_Topic_3/)

---

# Topic 3: Related Links

.pull-left[
## Readings
[Topic 3 Readings](https://bookdown.org/a_shaker/STM1001_Topic_3/)

## Notation

[Notation for Topics 3 and 4: Probability, Distributions and Sampling Distributions](https://bookdown.org/a_shaker/STM1001_Topic_0/notation-summary.html#topics-3-and-4-probability-distributions-and-sampling-distributions)

]

.pull-right[
## Maths Background

* [Squares, square roots and powers](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#squares-square-roots-and-powers)

* [Scientific notation and E-notation](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#scientific-notation-and-e-notation)

* [Division Symbols](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#division-symbols)

* [Equality and inequality operators](https://bookdown.org/a_shaker/STM1001_Topic_0/maths-background.html#equality-and-inequality-operators)

]

---

background-image: url(data:image/png;base64,#Cards.jpg)
background-position: 100% 50%
background-size: 50% 100%

.pull-left[
# Probability game (Kahoot)

* We have used a [playing card shuffler](https://www.random.org/playing-cards/?cards=1&decks=1&spades=on&hearts=on&diamonds=on&clubs=on&aces=on&twos=on&threes=on&fours=on&fives=on&sixes=on&sevens=on&eights=on&nines=on&tens=on&jacks=on&queens=on&kings=on&remaining=on) for the following game

* Your task is to guess the suit of each card and see how many you get right

* Prize for anyone who gets 10/10 correct guesses!
]

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot!

## Go to [www.kahoot.it](https://www.kahoot.it) and use

## the code provided

---

# Topic 3: Probability and Distributions

**Overview**

---
name: stat
class: middle
background-image: url(data:image/png;base64,#slide_1.png)
background-size: 110%

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_5.png)
background-size: 100%

---

# Some probability notation

## Sample Space

* Each time a card was randomly chosen in the game, what was your probability of guessing the correct suit?

`$\frac{1}{4}$` or `$0.25$`

* Focusing on just one 'trial' (one card selection), what are all of the possible outcomes?

[club, diamond, heart, spade]

* From [this topic's readings](https://bookdown.org/a_shaker/STM1001_Topic_3/1-1-some-probability-notation.html), this is called `$\Omega$` (a Greek letter pronounced 'omega'), the **sample space**: the set of all possible *outcomes* of an experiment

* For just one trial, we have `$\Omega = \{\text{club, diamond, heart, spade\}}$`

---

## Events

A ***simple event*** is any event with *just one outcome*.

* From our example, each of `$\{\text{club\}}$`, `$\{\text{diamond\}}$`, etc. is a simple event

An ***event*** can be slightly more complicated: a set of *one or more outcomes* from the sample space. For example:

--
  
* Let `$A$` be the event that a heart is chosen

* Let `$B$` be the event that a red suit is chosen
    
* Let `$C$` be the event that neither a red suit nor a black suit is chosen
    
--

* These events can be written down as:
`$$A = \{\text{heart\}}, B = \{\text{diamond, heart\}}, C = \emptyset$$`

* `$A, B$`, and `$C$` are all ***events***

* In addition, `$A$` is also a ***simple event***

* The symbol `$\emptyset$` denotes a ***null event*** (or ***null set***). Recalling `$C$` is the event that no red or black suits are chosen, it makes sense that `$C = \emptyset$`, because there are no outcomes within `$\Omega$` where this can occur.
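
As a rough illustration, the sample space and the events `$A$`, `$B$` and `$C$` can be modelled with sets. This is only a sketch in Python (a language this subject does not otherwise use); the helper names `is_event` and `is_simple_event` are made up for this example.

```python
# Sample space for one trial: the suit of a single randomly drawn card.
omega = {"club", "diamond", "heart", "spade"}

red = {"diamond", "heart"}
black = {"club", "spade"}

# Events are subsets of the sample space.
A = {"heart"}            # a heart is chosen
B = set(red)             # a red suit is chosen
C = omega - red - black  # neither red nor black: no outcomes are left


def is_event(s):
    """An event is any subset of the sample space (including the empty set)."""
    return s <= omega


def is_simple_event(s):
    """A simple event contains exactly one outcome."""
    return is_event(s) and len(s) == 1


print(is_simple_event(A), is_simple_event(B), C == set())
```

Here `A` is the only simple event of the three, and `C` comes out as the empty set, matching `$C = \emptyset$`.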

---

# Some probability facts

Probabilities are always between 0 and 1, so that, for some event `$A$`:

* `$P(A) = 0$` means that event `$A$` will definitely not occur

* `$P(A) = 0.5$` means that event `$A$` is equally likely to occur or not occur
  
--

* `$P(A) = 1$` means that event `$A$` will definitely occur
  
--

* Probabilities cannot be negative - the smallest a probability can be is 0

* `$P(\Omega) = 1$`. For example, when we draw a card, we know exactly one of the outcomes from the sample space will definitely occur (a club, a diamond, a heart or a spade)

---

# Some probability facts

* For one trial, we know the probability of getting a ‘heart’ is 0.25. What is the probability of NOT getting a heart?

* We can use the ***complement rule*** to answer this:

.content-box-blue[
.center[
**The complement rule**
]
`$$P(A^C) = 1 - P(A),$$`
where:

* `$A^C$` means "not `$A$`". That is, `$A^C$` is the complement of `$A$`.
]
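
A minimal sketch of the complement rule applied to the heart example. The function name `complement` is hypothetical, chosen only for this illustration.

```python
def complement(p):
    """Complement rule: P(A^C) = 1 - P(A)."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must be between 0 and 1")
    return 1 - p


p_heart = 0.25                     # P(A): a heart is chosen
p_not_heart = complement(p_heart)  # P(A^C): a heart is NOT chosen
print(p_not_heart)
```

So the probability of not getting a heart is 1 - 0.25 = 0.75.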

---

# Equally likely events

Selecting one playing card from a deck is an example of an experiment with ***equally likely events***.

* If all simple events in `$\Omega$` are equally likely to occur, their probability can simply be calculated as

`$$\displaystyle\frac{1}{\text{number of simple events in }\Omega}$$`

* Therefore, each time a card is drawn, the probability of guessing the correct suit is `$$\displaystyle\frac{1}{\text{number of simple events in }\Omega} = \frac{1}{4}$$`
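
The same calculation can be sketched in Python, using exact fractions so that no rounding is involved. The variable names are illustrative only.

```python
from fractions import Fraction

# Simple events in the sample space for one card selection.
omega = ["club", "diamond", "heart", "spade"]

# With equally likely simple events, each has probability 1 / (number of simple events).
p_simple = Fraction(1, len(omega))
print(p_simple, float(p_simple))

# The probabilities of all simple events add up to P(Omega) = 1.
total = sum(p_simple for _ in omega)
print(total)
```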

---

# Random variables

* Considering all 10 'trials' now, we know that for each trial, you had a 0.25 chance of guessing the correct answer

* We can let `$X$` denote the number of times you guessed correctly

* Then the possible values for `$X$` are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

* We can use something called the ***Binomial Distribution*** to see your chances for each one of these values of `$X$`

* The distribution on the next slide visualises the probabilities associated with each potential outcome (e.g. 0 correct guesses, 1 correct guess, etc)
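
Before turning to the Binomial Distribution itself, one informal way to get a feel for these chances is to simulate the game many times. The sketch below assumes that each card is equally likely to be any of the four suits, that trials are independent, and that guesses are made at random. The function name `play_game`, the seed and the number of simulated games are arbitrary choices for this example.

```python
import random
from collections import Counter

SUITS = ["club", "diamond", "heart", "spade"]


def play_game(rng, n_trials=10):
    """Play one game with random guesses; return the number of correct guesses."""
    correct = 0
    for _ in range(n_trials):
        card = rng.choice(SUITS)   # suit of the card that is drawn
        guess = rng.choice(SUITS)  # the player's guess
        if card == guess:
            correct += 1
    return correct


rng = random.Random(1001)  # fixed seed so the run is repeatable
n_games = 20_000
counts = Counter(play_game(rng) for _ in range(n_games))

# Estimated probability of each value of X (0 to 10 correct guesses).
proportions = {x: counts[x] / n_games for x in range(11)}
for x, p in proportions.items():
    print(f"X = {x:2d}: {p:.3f}")
```

The simulated proportions are only estimates, so they will differ slightly from the exact Binomial probabilities.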

---

---

# Random variables

In the previous example, `$X$` was a ***discrete random variable*** because we could write down the possible values of `$X$` sequentially with a recognisable pattern. In other words, they are ***countable***.

We can also have ***continuous random variables*** such as height, weight or age.

* The ***mean***, ***variance*** and ***standard deviation*** of a random variable `$X$` (whether continuous or discrete) can be denoted as follows:

.content-box-blue[
.center[
**Expected value, variance and standard deviation of a random variable**
]
* The ***expected value*** (or ***mean***) of `$X$` can be denoted `$\text{E}(X) = \mu$`

* The ***variance*** of `$X$` can be denoted `$\text{Var}(X) = \sigma^2$`

* The ***standard deviation*** of `$X$` is the square root of the variance and can be denoted `$\text{SD}(X) = \sqrt{\sigma^2} = \sigma$`
]

---

# The normal distribution

.pull-left[
* Bell-shaped and symmetric

* Total area under the curve is 1 (this includes all possible values for `$X$`)

* Can be specified by key parameters: the mean, `$\mu$`, and the standard deviation, `$\sigma$`, or alternatively the variance, `$\sigma^2$`

* We can express the distribution of an arbitrary normally distributed random variable `$X$` as `$X \sim N(\mu, \sigma^2)$`

* In this diagram, we have that `$X \sim N(0, 1)$`. This is known as the ***standard normal distribution***
]

.pull-right[

<img src="data:image/png;base64,#Topic_3_Lecture_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />
]

---

# The normal distribution

Compare the following distributions:

---

# The normal distribution

* Recall that we express the distribution of a normally distributed random variable `$X$` as `$X \sim N(\mu, \sigma^2)$`

* When writing down the distribution, it is sometimes convenient to express the variance as some number squared, so that we can easily see what the standard deviation is

* For example, knowing Plot D has `$X \sim N(5, 1.5^2)$`, we have that:

  * `$\mu = 5$` and 
  * `$\sigma^2 = 1.5^2$`

* Since the variance is expressed as `$\sigma^2 = 1.5^2$`, we immediately know that the standard deviation is `$\sigma = 1.5$`, since `$\sigma = \sqrt{\sigma^2} = \sqrt{1.5^2} = 1.5$`
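
The distinction between variance and standard deviation matters in software too. As a sketch, Python's standard-library `statistics.NormalDist` expects the standard deviation `sigma` rather than the variance, so `$N(5, 1.5^2)$` is set up as follows (Python is used here purely for illustration).

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(5, 1.5^2): the second parameter in the notation is the variance.
mu = 5
variance = 1.5 ** 2

# NormalDist takes the standard deviation (sigma), not the variance.
X = NormalDist(mu=mu, sigma=sqrt(variance))
print(X.mean, X.stdev, X.variance)
```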

---
# The normal distribution: Some rules of thumb

Where data are normally distributed, there are three very useful rules of thumb as follows:

.content-box-blue[
.center[
**Some rules of thumb for an approximately normal data set**
]
1. About 68% of values are within 1 standard deviation of the mean

1. About 95% of values are within 2 standard deviations of the mean

1. About 99.7% of values are within 3 standard deviations of the mean
]

We will apply these to an example on the next slide.

---
# The normal distribution: Some rules of thumb

Let's assume university students' heights follow a `$N(172.38, 9.85^2)$` distribution.

* Then we could estimate the following:

1. About 68% of students' heights are within the range `$172.38 \pm 1\times9.85 = (172.38 -  9.85, 172.38 + 9.85) = (162.53, 182.23)$`

1. About 95% of students' heights are within the range `$172.38 \pm 2\times 9.85 = (172.38 - 19.7, 172.38 + 19.7) = (152.68, 192.08)$`

1. About 99.7% of students' heights are within the range `$172.38 \pm 3\times 9.85 = (172.38 - 29.55, 172.38 + 29.55) = (142.83, 201.93)$`

This is visually represented in the plots on the following slide.
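
The three ranges above can be reproduced with a short calculation. This sketch also computes the exact normal probability for each range using `statistics.NormalDist` from Python's standard library, which shows how close the rules of thumb are; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 172.38, 9.85
heights = NormalDist(mu=mu, sigma=sigma)

results = {}
for k in (1, 2, 3):
    lower, upper = mu - k * sigma, mu + k * sigma
    # Exact probability of falling within k standard deviations of the mean.
    prob = heights.cdf(upper) - heights.cdf(lower)
    results[k] = (round(lower, 2), round(upper, 2), prob)
    print(f"within {k} sd: ({lower:.2f}, {upper:.2f})  probability = {prob:.4f}")
```

The exact probabilities are approximately 0.6827, 0.9545 and 0.9973, which is why "about 68%, 95% and 99.7%" are convenient rules of thumb.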

---

---
# Standardisation

* A very useful technique commonly used in statistics

* Puts everything on a ***standard*** scale that makes comparing things easier

* If we standardise a normally distributed variable, we obtain a special case of the normal distribution, the ***standard*** normal distribution, `$Z \sim N(0, 1)$`

* The usual convention is to use `$Z$` instead of `$X$` when using the ***standard*** normal distribution

* We can ***standardise*** values with the following formula:

`$$z = \frac{x - \mu}{\sigma}$$`

* This gives us a `$z$`-score

In general, `$z$`-scores close to zero tell us that the value is close to average. `$z$`-scores larger than 2 or 3 (or smaller than -2 or -3) can be considered more ***extreme***.

---
# Standardisation Examples

Recall in our height example, we had `$X \sim N(172.38, 9.85^2)$`.

* For a height of 172.38cm, we have

`$$z = (172.38 - 172.38) / 9.85 = 0 / 9.85 = 0$$`

* This makes sense, because 172.38 is the mean, so its corresponding value in the **standard** normal distribution is 0, since the standard normal distribution has a mean of 0.

* For a height of 182.23cm, we have

`$$z = (182.23 - 172.38) / 9.85 = 9.85 / 9.85 = 1$$`

* This means that 182.23 is exactly 1 standard deviation (sd) above the mean. (Do you know how to check this?)

* For a height of 165cm, we have

`$$z = (165 - 172.38) / 9.85 = -7.38 / 9.85 \approx -0.7492$$`

* A `$z$`-score of -0.7492 tells us that the value is below average (because it is negative) but within 1 sd of the mean (because it is between -1 and 0).
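
The three calculations above can be checked with a small function. This is a sketch only; the name `z_score` is chosen for this example.

```python
def z_score(x, mu, sigma):
    """Standardise x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


mu, sigma = 172.38, 9.85
for x in (172.38, 182.23, 165):
    print(f"x = {x}: z = {z_score(x, mu, sigma):.4f}")
```

The printed `$z$`-scores are 0.0000, 1.0000 and -0.7492, matching the worked examples.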

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot!

## Go to [www.kahoot.it](https://www.kahoot.it) and use

## the code provided

---
# Using the Normal distribution to calculate probabilities

We can use distributions to help us calculate probabilities.

* Recall that **the total area under the normal distribution curve is always equal to 1**

---
# Using the Normal distribution to calculate probabilities

For example, recalling `$X$` denotes the height in cm of university students and is normally distributed such that `$X \sim N(172.38,9.85^2 )$`, we may wish to ask questions like:

A. What is the probability a university student's height is 172.38cm or shorter?
--
**`$P(X \leq 172.38)$` in probability notation**

B. What is the probability a university student's height is 182.23cm or shorter?
--
**`$P(X \leq 182.23)$` in probability notation**

C. What is the probability a university student's height is 182.23cm or taller?
--
**`$P(X \geq 182.23)$` in probability notation**

D. What is the probability a university student's height is between 152.68 and 192.08?
--
**`$P(152.68 \leq X \leq 192.08)$` in probability notation**
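
For readers curious about the numerical answers, here is one possible way to evaluate questions A to D, using `statistics.NormalDist` from Python's standard library. This is an illustrative sketch and not the software used in this subject's computer labs.

```python
from statistics import NormalDist

X = NormalDist(mu=172.38, sigma=9.85)

p_a = X.cdf(172.38)                  # A: P(X <= 172.38)
p_b = X.cdf(182.23)                  # B: P(X <= 182.23)
p_c = 1 - X.cdf(182.23)              # C: P(X >= 182.23)
p_d = X.cdf(192.08) - X.cdf(152.68)  # D: P(152.68 <= X <= 192.08)

for label, p in zip("ABCD", (p_a, p_b, p_c, p_d)):
    print(f"{label}: {p:.4f}")
```

The `cdf` method returns the area under the curve to the left of a value, so C uses the area to the right (one minus the area to the left) and D uses the difference between two left-hand areas.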

---

These probabilities can be represented visually:

---

# Useful implications

If you convert a value `$x$` to its `$z$`-score, the probability of being at or below `$z$` under the standard normal distribution is exactly the same as the probability of being at or below `$x$` under the original distribution.

* For example, recall that for `$x = 172.38$` (the mean in our example), the corresponding `$z$`-score is 0. We then have that

`$$P(X \leq 172.38) = P(Z \leq 0) = 0.5$$`

We can use the **complement rule** to help work out probabilities.

* For example, we can see from the previous slide that

`$$P(X \leq 182.23) = 0.84$$`

Knowing the total area under the curve is equal to 1, it must therefore be the case that

`$$P(X \geq 182.23) = 1 - P(X \leq 182.23) = 1 - 0.84 = 0.16$$`
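
A quick numerical check of the first implication, sketched with `statistics.NormalDist` from Python's standard library: the probability computed on the original height scale agrees with the probability computed on the standardised scale.

```python
from statistics import NormalDist

X = NormalDist(mu=172.38, sigma=9.85)  # heights in cm
Z = NormalDist(mu=0, sigma=1)          # standard normal distribution

for x in (172.38, 182.23):
    z = (x - X.mean) / X.stdev  # standardise x
    p_x = X.cdf(x)              # P(X <= x)
    p_z = Z.cdf(z)              # P(Z <= z)
    print(f"x = {x}: z = {z:.2f}, P(X <= x) = {p_x:.2f}, P(Z <= z) = {p_z:.2f}")
```

Both rows show matching probabilities (0.50 for the mean and 0.84 for a height 1 sd above the mean).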

---

**Symmetry** is a very helpful property we can utilise. For example:
--
 
* We know that since the distribution is symmetric, the mean value of 172.38 is also the middle value (median). This means that, by symmetry, `$P(X \leq 172.38) = P(X \geq 172.38) = 0.5$`.

--
* Consider `$P(152.68 \leq X \leq 192.08) = 0.95$`, and recall that the range (152.68, 192.08) is `$\mu \pm 2\sigma$`.

* Since the probability of `$X$` being within this range is 0.95, the probability `$X$` is *not* within this range is 1 - 0.95 = 0.05, represented in Plot D by the white area in the upper and lower tails.

* By symmetry, we know that the area in the right tail is equal to the area in the left tail. Therefore, each tail has an area of 0.05 / 2 = 0.025. This means `$P(X \leq 152.68) = P(X \geq 192.08) = 0.025$`

We can use statistical software packages to help us calculate probabilities such as these, and will be learning how to do so in computer labs.
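
As one illustration (using Python's standard-library `statistics.NormalDist`, which is an assumption of this sketch rather than the package used in the labs), the symmetry of the two tails can be verified directly:

```python
from statistics import NormalDist

X = NormalDist(mu=172.38, sigma=9.85)

left_tail = X.cdf(152.68)               # P(X <= 152.68)
right_tail = 1 - X.cdf(192.08)          # P(X >= 192.08)
middle = X.cdf(192.08) - X.cdf(152.68)  # P(152.68 <= X <= 192.08)

print(f"left tail  = {left_tail:.4f}")
print(f"right tail = {right_tail:.4f}")
print(f"middle     = {middle:.4f}")
```

The two tails come out equal, as symmetry requires. Each is roughly 0.0228 and the middle area roughly 0.9545, slightly different from the 0.025 and 0.95 obtained from the "about 95%" rule of thumb.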

**A technical note on inequality signs for a continuous distribution**

For a continuous distribution, the probability that `$X$` is exactly equal to a given value `$x$` is zero. This is because the corresponding region under the curve is a vertical line, which has zero area. As a result, the inequalities `$\leq$` and `$<$`, and likewise `$\geq$` and `$>$`, are interchangeable. That is, `$P(X \leq x) = P(X < x)$` and `$P(X \geq x) = P(X > x)$`.
---

# Distribution functions

For each distribution, there is more than one ***distribution function***.

* When considering the Normal distribution so far, we have considered the ***Probability Density Function (PDF)***

Two other distribution functions are:

* The ***Cumulative Distribution Function (CDF)*** and

* The ***Quantile function***

We will describe each function now with the following example:

* We again consider the continuous random variable `$X$` that denotes the height in cm of university students and is normally distributed such that `$X \sim N(172.38, 9.85^2)$`.

* Considering a height of `$x = 165$` we will see how it can be represented in each type of function as follows.

---
# Probability Density Functions

For a continuous random variable `$X$`, the **Probability Density Function (PDF)** is a function that tells us the density of probability at a given value.

Or, more usefully, the area under the curve of a PDF tells us the probability of `$X$` falling within a certain range of values.

---
# Cumulative Distribution Functions

For a continuous random variable `$X$`, the **Cumulative Distribution Function (CDF)** is a function that tells us, for a given value `$x$`, the probability that `$X$` is less than or equal to `$x$`. That is, the value of the function at `$x$` is equal to `$P(X \leq x)$`.

---
# Quantile Function

For a continuous random variable `$X$`, the **Quantile function** tells us, for a given probability `$p$`, the value `$x$` for which `$P(X \leq x) = p$`. In other words, it returns the value of `$x$` that sits at a given quantile of the distribution.

---

# Distribution functions

* The three functions are related mathematically: For example, using calculus, if we know the PDF, we can derive the CDF by taking the integral of the PDF

* It is also true that the quantile function is the inverse of the CDF

* In this subject however, we will be using statistical software packages to help us navigate these functions as needed
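
To make these relationships concrete, here is a sketch for the height example at `$x = 165$`, using `statistics.NormalDist` from Python's standard library (an assumption of this example, not the subject's software). The area under the PDF is approximated with a simple midpoint Riemann sum rather than exact calculus; the starting point of 10 standard deviations below the mean and the number of strips are arbitrary choices.

```python
from statistics import NormalDist

X = NormalDist(mu=172.38, sigma=9.85)
x = 165

density = X.pdf(x)        # height of the PDF curve at x (a density, not a probability)
prob = X.cdf(x)           # CDF: P(X <= 165)
x_back = X.inv_cdf(prob)  # quantile function: undoes the CDF

# Approximate the area under the PDF to the left of x with a midpoint Riemann sum.
lower = X.mean - 10 * X.stdev  # far enough into the left tail that the missing area is negligible
n = 10_000                     # number of strips
width = (x - lower) / n
area = sum(X.pdf(lower + (i + 0.5) * width) for i in range(n)) * width

print(f"pdf = {density:.4f}, cdf = {prob:.4f}, area = {area:.4f}, quantile = {x_back:.2f}")
```

The approximate area agrees with the CDF value (about 0.227), and applying the quantile function to that probability returns the original height of 165.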

---

# Determining whether data are sampled from a Normal distribution

* Often in Statistics, it is useful to ask the question, *have these data been sampled from a Normal distribution?*

* One way to assess this is to overlay a Normal or density curve on a histogram - we will explore this further in computer labs

---

# The Binomial Distribution

* Recall the playing card example from earlier in this lecture, where the possible values for the number of correct guesses, `$X$`, were 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

* Also recall we can use the ***Binomial Distribution*** to see your chances for each one of these values of `$X$`:

---

# The Binomial Distribution

To conclude this lecture, we will now introduce the Binomial Distribution formally.

Suppose we have `$n$` "trials", each with an outcome of either "success" or "failure". Further suppose that for each trial, the probability of "success" is equal to `$p$`, and that `$X$` is the number of "successes" from the `$n$` trials.

Given this context, we can model `$X$` using the *Binomial Distribution*. In mathematical notation, we define the distribution as 
`$$X \sim BIN(n, p)$$`
where:

* `$X$` is the number of successes

* `$n$` is the number of trials

* `$p$` is the probability of success for each trial

If, for example, we are interested in the probability of some number of successes `$x$`  (out of the `$n$` trials), we can write this probability down as `$P(X = x)$`.

---

# The Binomial Distribution

* In our playing cards example, we have `$n = 10$` trials, each with a probability of "success" of `$p = 0.25$`, so we define the distribution in this example as:
`$$X \sim BIN(10, 0.25).$$`

* Using this distribution, we can calculate the probability of particular events, for example:

* `$P(X = 0) = 0.056$`
    
--

* `$P(X = 1) = 0.188$`

* etc.

We will be learning how to calculate probabilities like these in future lectures and computer labs.
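
In the meantime, the values above can be reproduced with a short sketch. It uses the standard Binomial probability formula, which has not been introduced in this lecture, so treat it as a preview rather than the method used in this subject. The function name `binomial_pmf` is made up for this example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p): (n choose x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.25
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"P(X = {x:2d}) = {prob:.3f}")
```

The first two lines of output give 0.056 and 0.188, as above. The last value, `$P(X = 10)$`, is `$0.25^{10}$`, or about 1 in a million, so a prize for 10/10 correct guesses is very unlikely to be claimed by random guessing.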

---

# Other Distributions

If you are interested, you can see examples of other [continuous](https://bookdown.org/a_shaker/STM1001_Topic_3/5-other-continuous-distributions.html) and [discrete](https://bookdown.org/a_shaker/STM1001_Topic_3/6-some-discrete-distributions.html) distributions in this topic's readings.

---

background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>