The probability of an event \(E\) is the number of ways \(E\) can occur divided by the total number of possible outcomes. Decisions are routinely made under conditions of uncertainty, so knowing the chance that a particular event occurs is an important aid to decision making.
Here are three rules that come up all the time.
\(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\)
If \(A\) and \(B\) are independent, \(P(A \cap B) = P(A)P(B)\), and hence \(P(A \mid B)=P(A)\).
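For a quick worked example of the first rule, roll a fair die and let \(A\) be “the roll is even” and \(B\) be “the roll is greater than 3”. Then
\[P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{2/6}{3/6} = \frac{2}{3},\]
which matches intuition: of the outcomes \(\{4, 5, 6\}\), two are even.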
A discrete random variable \(X\) is described by its probability mass function \(f(x) = P(X = x)\). The set of \(x\) values for which \(f(x) > 0\) is called the support. If the distribution depends on unknown parameter(s) \(\theta\) we write it as \(f(x; \theta)\) (frequentist) or \(f(x | \theta)\) (Bayesian).
Context: A single trial with two outcomes, success/failure
\(X \sim \text{Bern}(p)\), where \(p\) is the probability of success
\(x\) | \(P(X=x)\) |
---|---|
1 | \(p\) |
0 | \(1-p\) |
Example: \(X\) is the random variable indicating whether a newborn is female
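In R, a Bernoulli trial is simply a binomial trial with size = 1. A minimal simulation sketch (the 10 draws and p = 0.5 are arbitrary choices for illustration):

# simulate 10 independent Bernoulli(0.5) trials; returns a vector of 0s and 1s
rbinom(n = 10, size = 1, prob = 0.5)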
If \(X\) is the count of successes in \(n\) identical and independent Bernoulli trials, each with success probability \(\pi\), then \(X\) has a binomial distribution, \(X \sim \text{Bin}(n,\pi)\).
\[f(x;n, \pi) = \frac{n!}{x!(n-x)!} \pi^x (1-\pi)^{n-x} \hspace{1cm} x \in \{0, 1, \ldots, n\}, \hspace{2mm} \pi \in [0, 1]\]
Context: Total number of successes from a fixed number of independent Bernoulli trials, all with the same probability of success
\(X \sim \text{Bin}(N,p)\), where \(p\) is the probability of success and \(N\) the number of trials
\[P(X=x) = {{N!}\over{x!(N-x)!}}p^x(1-p)^{N-x} = \binom{N}{x}p^x(1-p)^{N-x}\]
Example: \(X\) is the number of heads in a series of \(N\) coin flips
\(x\) | \(P(X=x)\) |
---|---|
0 | \((1-p)^N\) |
1 | \(Np(1-p)^{N-1}\) |
… | … |
N | \(p^N\) |
Let’s say \(X \sim \text{Bin}(N=10,p=0.5)\) counts the number of males among 10 births. What is the probability of having at most 1 male?
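Here \(P(X \le 1) = P(X=0) + P(X=1) = (0.5)^{10} + 10 \times (0.5)^{10} = 11/1024 \approx 0.0107\). In R:

pbinom(q = 1, size = 10, prob = 0.5)
## [1] 0.01074219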
Now suppose instead \(N=4,\ p=\frac{4}{5}\), and we want \(P(X>2)\). Since \(N=4\),
\[P(X>2)= P(X=3)+P(X=4)\]
which is generally easier to compute via the complement:
\[P(X>2)=1-P(X \leq 2), \hspace{1cm} P(X \leq 2)=P(X=0)+P(X=1)+P(X=2)\] which is equal to
\[\binom{4}{0}\left(\tfrac{4}{5}\right)^0\left(1-\tfrac{4}{5}\right)^{4-0}+\binom{4}{1}\left(\tfrac{4}{5}\right)^1\left(1-\tfrac{4}{5}\right)^{4-1}+\binom{4}{2}\left(\tfrac{4}{5}\right)^2\left(1-\tfrac{4}{5}\right)^{4-2}\]
# P(X <= 2), computed term by term from the binomial pmf
choose(4,0)*(4/5)^0*(1-4/5)^4 +
choose(4,1)*(4/5)^1*(1-4/5)^(4-1) +
choose(4,2)*(4/5)^2*(1-4/5)^(4-2)
## [1] 0.1808
so \(P(X>2) = 1 - 0.1808 = 0.8192\). Note also that
\[P(X \geq 3)= P(X=3)+P(X=4)\]
is the same event as \(P(X>2)\) above, since \(X\) only takes integer values.
Fortunately, R has this pre-programmed as dbinom(x, size, prob):
# same calculation using the built-in binomial pmf
dbinom(x = 0, size = 4, prob = 4/5) +
dbinom(x = 1, size = 4, prob = 4/5) +
dbinom(x = 2, size = 4, prob = 4/5)
## [1] 0.1808
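The cumulative distribution function pbinom computes such sums in one call; with lower.tail = FALSE it returns the upper tail \(P(X > q)\) directly:

pbinom(q = 2, size = 4, prob = 4/5, lower.tail = FALSE)  # P(X > 2)
## [1] 0.8192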
Related R functions: rbinom, dbinom
Context: Number of occurrences of an event over a given unit of space or time.
\(X \sim \text{Poisson}(\lambda)\), where \(\lambda\) is the expected number of occurrences
\[P(X=x) = {{e^{-\lambda}\lambda^x}\over{x!}}\]
Example: \(X\) is the number of birds counted at a colony during the breeding season
\(x\) | \(P(X=x)\) |
---|---|
0 | \(e^{-\lambda}\) |
1 | \(\lambda e^{-\lambda}\) |
… | … |
Example: A small town’s police department issues 5 speeding tickets per month on average. Using a Poisson random variable, what is the probability that the police department issues 3 or fewer tickets in one month?
First, we note that here \(P(Y \le 3) = P(Y=0) + P(Y=1) + \cdots + P(Y=3)\). Applying the probability mass function for a Poisson distribution with \(\lambda = 5\), we find that
\[\begin{align*} P(Y \le 3) &= P(Y=0) + P(Y=1) + P(Y=2) + P(Y=3) \\ &= \frac{e^{-5}5^0}{0!} + \frac{e^{-5}5^1}{1!} + \frac{e^{-5}5^2}{2!} + \frac{e^{-5}5^3}{3!}\\ &\approx 0.265. \end{align*}\]
We can verify through R:
sum(dpois(0:3, lambda = 5))
## [1] 0.2650259
Therefore, there is about a 27% chance of 3 or fewer tickets being issued within one month.
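As with dbinom and pbinom, the cumulative function ppois gives this in a single call:

ppois(q = 3, lambda = 5)
## [1] 0.2650259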
Related R functions: rpois, dpois
Context: Distribution of “adding lots of things together”. It arises from the Central Limit Theorem, which says that the sum of a large number of independent samples from the same distribution is approximately normally distributed.
\(X \sim \text{Normal}(\mu,\sigma^2)\) where \(\mu\) is the mean and \(\sigma^2\) the variance
\[f(x) = {{1}\over{\sqrt{2\pi\sigma^2}}}\exp\left( - {{(x-\mu)^2}\over{2\sigma^2}} \right)\]
Example: Practically everything.
Many quantities in nature are approximately normally distributed. Think of exam scores at your university, the heights and weights of students, or the weights of newborn babies.
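A quick simulation illustrates this; here each observation is the sum of 30 uniform draws (the choice of the uniform distribution, 30 terms, and 10,000 replicates is arbitrary):

# each replicate is the sum of 30 independent Uniform(0, 1) draws
sums <- replicate(10000, sum(runif(30)))
hist(sums)  # approximately bell-shaped, centered near 30 * 0.5 = 15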
Related R functions: rnorm, dnorm
Example 1: The weight of a box of Fruity Tootie cereal is approximately normally distributed with an average weight of 15 ounces and a standard deviation of 0.5 ounces. What is the probability that the weight of a randomly selected box is more than 15.5 ounces?
Using a normal distribution,
\[\begin{align*} P(Y > 15.5) = \int_{15.5}^{\infty} \frac{e^{-(y-15)^2/ (2\cdot 0.5^2)}}{\sqrt{2\pi\cdot 0.5^2}}dy = 0.159 \end{align*}\]
However, this integral is hard to work with directly, hence the need to standardize and use tables. We transform \(Y\) into \(Z\) via \[Z=\frac{Y-\mu}{\sigma},\] so we would calculate as follows:
\[P(Y > 15.5)=P(Z> \frac{15.5-15}{0.5})=P(Z> 1)\]
In a high school or university course, the value above would be found in statistical tables, using \[P(Z>1)=1-P(Z<1)=1-\Phi(1),\] but it is easier to compute in R using pnorm():
pnorm(1, mean = 0, sd = 1, lower.tail = FALSE)
## [1] 0.1586553
We can use R with the original values as well:
pnorm(15.5, mean = 15, sd = 0.5, lower.tail = FALSE)
## [1] 0.1586553
There is a 16% chance of a randomly selected box weighing more than 15.5 ounces.
Example 2: Suppose IQ scores are distributed \(X \sim N\left(100, 16^2\right)\). What is the probability that a randomly selected person’s IQ is less than 90?
In R:
pnorm(q = 90, mean = 100, sd = 16, lower.tail = TRUE)
## [1] 0.2659855
Example 3: What is the probability that a randomly selected person’s IQ is greater than 140?
In R:
pnorm(q = 140, mean = 100, sd = 16, lower.tail = FALSE)
## [1] 0.006209665
Example 4: What is the probability that a randomly selected person’s IQ is between 92 and 114?
In R:
pnorm(q = 114, mean = 100, sd = 16, lower.tail = TRUE) -
pnorm(q = 92, mean = 100, sd = 16, lower.tail = TRUE)
## [1] 0.5006755
The normal distribution has nice properties, such as:
- If \(X \sim \text{N}(\mu,\sigma^2)\), then \(Z = \displaystyle{{{X - \mu}\over{\sigma}} \sim \text{N}(0,1)}\)
- It is a limiting distribution (Central Limit Theorem)
- It can be a good approximation for other distributions
By the central limit theorem (CLT), the binomial distribution \(X \sim \text{Bin}(n,p)\) approaches the normal distribution with mean \(\mu = np\) and variance \(\sigma^2=np(1-p)\) as \(n \rightarrow \infty\). The approximation is useful when the expected numbers of successes and failures are both at least 5, i.e. \(np \geq 5\) and \(n(1-p) \geq 5\).
For example, with \(X \sim \text{Bin}(1000, 0.5)\), compare \(P(X \leq 460)\). Exact binomial:
pbinom(q = 460, size = 1000, prob = 0.50, lower.tail = TRUE)
## [1] 0.006222073
Normal approximation:
pnorm(q = 460, mean = 0.50 * 1000, sd = sqrt(1000 * 0.50 * (1 - 0.50)), lower.tail = TRUE)
## [1] 0.005706018
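A common refinement, not shown in the comparison above, is the continuity correction: evaluate the normal CDF at 460.5 instead of 460 to account for approximating a discrete distribution by a continuous one.

# normal approximation with continuity correction
pnorm(q = 460.5, mean = 0.50 * 1000, sd = sqrt(1000 * 0.50 * (1 - 0.50)), lower.tail = TRUE)
# roughly 0.0062, closer to the exact value above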
Suppose \(X \sim \text{Bin}(N=50,p=0.3)\). Then:
- Mean is \(Np = 50 \times 0.3 = 15\)
- Variance is \(Np(1-p) = 50 \times 0.3 \times 0.7 = 10.5\)
Therefore, \(X\) can be approximated by \(Y \sim \text{N}(15, 10.5)\), i.e. a normal with standard deviation \(\sqrt{10.5}\).
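As a quick sanity check of this approximation in R, compare the two CDFs at an arbitrary cutoff, say 10:

pbinom(q = 10, size = 50, prob = 0.3)      # exact P(X <= 10)
pnorm(q = 10, mean = 15, sd = sqrt(10.5))  # normal approximation to P(Y <= 10)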
Distribution Name | pmf / pdf | Parameters | Possible Y Values | Description |
---|---|---|---|---|
Binomial | \({n \choose y} p^y (1-p)^{n-y}\) | \(p,\ n\) | \(0, 1, \ldots , n\) | Number of successes after \(n\) trials. |
Geometric | \((1-p)^yp\) | \(p\) | \(0, 1, \ldots, \infty\) | Number of failures until the first success. |
Negative Binomial | \({y + r - 1\choose r-1} (1-p)^{y}(p)^r\) | \(p,\ r\) | \(0, 1, \ldots, \infty\) | Number of failures before \(r\) successes. |
Hypergeometric | \({m \choose y}{N-m \choose n-y}\big/{N \choose n}\) | \(n,\ m,\ N\) | \(0, 1, \ldots , \min(m,n)\) | Number of successes after \(n\) trials without replacement. |
Poisson | \({e^{-\lambda}\lambda^y}\big/{y!}\) | \(\lambda\) | \(0, 1, \ldots, \infty\) | Number of events in a fixed interval. |
Exponential | \(\lambda e^{-\lambda y}\) | \(\lambda\) | \((0, \infty)\) | Wait time for one event in a Poisson process. |
Gamma | \(\displaystyle\frac{\lambda^r}{\Gamma(r)} y^{r-1} e^{-\lambda y}\) | \(\lambda, \ r\) | \((0, \infty)\) | Wait time for \(r\) events in a Poisson process. |
Normal | \(\displaystyle\frac{e^{-(y-\mu)^2/ (2 \sigma^2)}}{\sqrt{2\pi\sigma^2}}\) | \(\mu,\ \sigma\) | \((-\infty,\ \infty)\) | Used to model many naturally occurring phenomena. |
Beta | \(\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} y^{\alpha-1} (1-y)^{\beta-1}\) | \(\alpha,\ \beta\) | \((0,\ 1)\) | Useful for modeling probabilities. |
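Each of these distributions comes with the standard d/p/q/r function family in R (pmf or pdf, CDF, quantile function, random draws). Using the binomial as an example:

dbinom(2, size = 4, prob = 0.5)    # pmf: P(X = 2)
pbinom(2, size = 4, prob = 0.5)    # CDF: P(X <= 2)
qbinom(0.5, size = 4, prob = 0.5)  # quantile function: the median
rbinom(3, size = 4, prob = 0.5)    # three random draws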