A phenomenon is random if individual outcomes are uncertain, but there is nonetheless a regular distribution of outcomes in a large number of repetitions.
The probability of any outcome of a random phenomenon can be defined as the proportion of times the outcome would occur in a very long series of repetitions.
A probability distribution is a description that gives the probability for each value of the random variable. It is often expressed as a table, formula, or graph.
A random variable is a variable that has a single numeric value, determined by chance, for each outcome of a procedure.
A discrete random variable has a collection of values that is finite or countable.
A continuous random variable has infinitely many values, and the collection of values is not countable.
Consider the event: "Rolling a 1 with a die"
If all outcomes are equally likely,
\[{\text{probability of an event}} = \frac{\text{number of outcomes in the event}}{\text{total number of outcomes}}\]
Let \(\hat{p_n}\) be the proportion of outcomes that are \(1\) after \(n\) rolls. As the number of rolls \(n\) increases, \(\hat{p_n}\) (the relative frequency of \(1\)s) will converge to the probability of rolling a \(1\), \(p = 1/6\). The figure shows the convergence for \(100{,}000\) die rolls. The tendency of \(\hat{p_n}\) to stabilize around \(p\), i.e. the tendency of the relative frequency to stabilize around the true probability, is described by the Law of Large Numbers.
Law of Large Numbers
As more observations are collected, the observed proportion \(\hat{p_n}\) of occurrences with a particular outcome after \(n\) trials converges to the true probability \(p\) of that outcome.
The figure shows the fraction of die rolls that are \(1\) at each stage in a simulation. The relative frequency tends to get closer to the probability \(1/6 \approx 0.167\) as the number of rolls increases.
The figure shows the fraction of tosses that are heads at each stage in a simulation. The relative frequency tends to get closer to the probability \(1/2 \approx 0.50\) as the number of tosses increases.
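This convergence is easy to reproduce with a short simulation. A minimal sketch in Python (assuming NumPy is available; the seed and roll count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n = 100_000                            # number of die rolls
rolls = rng.integers(1, 7, size=n)     # simulated rolls of a fair die
is_one = (rolls == 1)

# Running proportion of 1s after each roll: p_hat_n = (# of 1s in first n rolls) / n
p_hat = np.cumsum(is_one) / np.arange(1, n + 1)

print(p_hat[9], p_hat[999], p_hat[99_999])  # proportions after 10, 1000, 100000 rolls
# The printed values tend toward p = 1/6 ~= 0.167 as n grows.
```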
There is a numerical (not categorical) random variable \(x\), and each of its values is associated with a corresponding probability.
\(\sum P(x)=1\), where \(x\) assumes all possible values.
\(0 \le P(x) \le 1\) for every individual value of the random variable \(x\).
\[
\begin{array}{c|c}
x \text{ (number of heads when two coins are tossed)} & P(x) \\
\hline
0 & 0.25 \\
1 & 0.50 \\
2 & 0.25
\end{array}
\]
(a) Find the probability of getting two heads in two tosses.
(b) Find the probability of getting at least one head in two tosses.
(c) Find the probability of getting at most one head in two tosses.
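Reading the answers from the table:
\[ \begin{align} (a)\ P(x = 2) &= 0.25 \\ (b)\ P(x \ge 1) &= P(1) + P(2) = 0.50 + 0.25 = 0.75 \\ (c)\ P(x \le 1) &= P(0) + P(1) = 0.25 + 0.50 = 0.75 \end{align} \]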
Mean \(\mu\) of a probability distribution
\(\mu = \sum [x \cdot P(x)]\)
Variance \(\sigma^2\) for a probability distribution
\(\sigma^2 = \sum[(x-\mu)^2 \cdot P(x)] = \sum[x^2 \cdot P(x)]-\mu^2\)
Standard deviation \(\sigma\) for a probability distribution
\(\sigma = \sqrt{\sum[x^2 \cdot P(x)]-\mu^2}\)
In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents. For example, the expected value in rolling a six-sided die is \(3.5\), because the average of all the numbers that come up in an extremely large number of rolls is close to \(3.5\). The law of large numbers states that the arithmetic mean of the values almost surely converges to the expected value as the number of repetitions approaches infinity.
The expected value of a discrete random variable is the probability-weighted average of all possible values. In other words, each possible value the random variable can assume is multiplied by its probability of occurring, and the resulting products are summed to produce the expected value. The same principle applies to an absolutely continuous random variable, except that an integral of the variable with respect to its probability density replaces the sum.
The expected value is a key aspect of how one characterizes a probability distribution; it is one type of location parameter. By contrast, the variance is a measure of dispersion of the possible values of the random variable around the expected value. The variance itself is defined in terms of two expectations: it is the expected value of the squared deviation of the variable's value from the variable's expected value.
Expected value of a discrete random variable
If \(X\) takes outcomes \(x_1, x_2, \ldots, x_m\) with probabilities \(p_1, p_2, \ldots, p_m\), the expected value of \(X\) is the sum of each outcome multiplied by its corresponding probability:
\[ \begin{align} E(X) &= \mu_x = x_1 \times p_1 + x_2 \times p_2 + \ldots + x_m \times p_m \\ &= \sum^{m}_{i=1}(x_i \times p_i) \end{align} \]
\(\text{Random Variable } X: \text{the number of spots on one roll of a die}\)
Probability distribution table for \(X\)
\[ \begin{array}{c|c|c|c|c|c|c} \text{Value} & 1 & 2 & 3 & 4 & 5 & 6\\ \hline \text{Probability} & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{array} \] \[ E(X) = 1 \times (1/6) + 2 \times (1/6) + 3 \times (1/6) + 4 \times (1/6) + 5 \times (1/6) + 6 \times (1/6) = 3.5 \]
Variance and standard deviation of a discrete random variable
If \(X\) takes outcomes \(x_1, x_2,...,x_m\) with probabilities \(p_1, p_2,...,p_m\) and expected value \(\mu_x = E(X),\) then to find the standard deviation of \(X\), we first find the variance and then take its square root.
\[ \begin{align} Var(X)=\sigma^2_x &= (x_1 - \mu_x)^2 \times p_1 + (x_2 - \mu_x)^2 \times p_2 +...+ (x_m - \mu_x)^2 \times p_m \\ &= \sum^m_{i=1}(x_i-\mu_x)^2 \times p_i \\ \\ SD(X) = \sigma_x &= \sqrt{\sum^m_{i=1}(x_i-\mu_x)^2 \times p_i} \\ \\ Var(X)=\sigma^2_x &= [(1 - 3.5)^2 + (2 - 3.5)^2 +...+ (6 - 3.5)^2]\times (1/6) \\ &= 2.92 \\ \\ SD(X) = \sigma_x &= \sqrt{2.92} = 1.71 \end{align} \]
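These formulas translate directly into a few lines of code. A minimal sketch in Python for the die example (the values and probabilities come from the table above):

```python
import math

# Probability distribution of X: number of spots on one roll of a fair die
values = [1, 2, 3, 4, 5, 6]
probs  = [1/6] * 6

mu  = sum(x * p for x, p in zip(values, probs))               # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # Var(X)
sd  = math.sqrt(var)                                          # SD(X)

print(mu, round(var, 2), round(sd, 2))   # 3.5 2.92 1.71
```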
\[ \begin{array}{c|c} x \text{ (number of heads when two coins are tossed)} & P(x) \\ \hline 0 & 0.25 \\ 1 & 0.50 \\ 2 & 0.25 \end{array} \] Calculate the expected value (average number of heads) and the SD for two tosses.
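Applying the formulas to the table:
\[ \begin{align} \mu &= 0(0.25) + 1(0.50) + 2(0.25) = 1.0 \\ \sigma^2 &= (0-1)^2(0.25) + (1-1)^2(0.50) + (2-1)^2(0.25) = 0.50 \\ \sigma &= \sqrt{0.50} \approx 0.71 \end{align} \]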
Probability distribution of the number of successes \((x)\) in \(3\) independent success/failure trials, each of which is a success with chance \(\frac{1}{6}.\)
\[ \begin{array}{lclr} k & \text{pattern} & \text{chance of pattern} & \text{chance of value} \\ \hline 0 & FFF & (5/6)^3 & (5/6)^3=0.5787 \\ \hline 1 & SFF & (1/6)(5/6)^2 & \\ & FSF & (1/6)(5/6)^2 & \\ & FFS & (1/6)(5/6)^2 & 3(1/6)(5/6)^2 = 0.3472 \\ \hline 2 & SSF & (1/6)^2(5/6) & \\ & SFS & (1/6)^2(5/6) & \\ & FSS & (1/6)^2(5/6) & 3(1/6)^2(5/6) = 0.0694 \\ \hline 3 & SSS & (1/6)^3 & (1/6)^3 = 0.0046 \end{array} \]
Suppose the probability of a single trial being a success is \(p\). Then the probability of observing exactly \(x\) successes in \(n\) independent trials is given by
\[ \bbox[yellow,5px] { \color{black}{P(x) = \binom{n}{x}p^x(1-p)^{n-x}=\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}} } \]
for \(x = 0, 1, 2, \ldots, n\)
where
\(n\) = number of trials
\(x\) = number of successes among \(n\) trials
\(p\) = probability of success in any one trial
When \(x = 0,\) the chance of no successes (in other words, the chance of \(n\) failures in a row) is
\[ \frac{n!}{0!(n)!}p^0(1-p)^{n} = (1-p)^n \]
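The formula is straightforward to implement. A minimal sketch in Python (the function name is an illustrative choice):

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Chance of exactly x successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)
```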
A random number generator draws at random with replacement from the ten digits \(0,1,2,3,4,5,6,7,8,9. \space\) Run the generator 20 times.
Find the chance that \(0\) appears exactly once. \[\text{binomial: } n=20,\ p = 0.1,\ k = 1: \quad \binom{20}{1}(0.1)^1(0.9)^{19} = 0.2702\]
Find the chance that \(0\) appears at most once. \[\text{binomial: } n=20,\ p = 0.1,\ k \in \{0,1\}: \\ \binom{20}{0}(0.1)^0(0.9)^{20} + \binom{20}{1}(0.1)^1(0.9)^{19} = 0.3917\]
Find the chance that \(0\) appears more than once. \[\text{binomial: } n=20,\ p = 0.1,\ k \in \{2,3,\ldots,20\}: \\ 1-P(k \le 1)= 1 - 0.3917 = 0.6083 \]
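Using the binomial_pmf sketch from above to check all three answers:

```python
n, p = 20, 0.1

exactly_once   = binomial_pmf(1, n, p)                 # ~0.2702
at_most_once   = binomial_pmf(0, n, p) + exactly_once  # ~0.3917
more_than_once = 1 - at_most_once                      # ~0.6083

print(round(exactly_once, 4), round(at_most_once, 4), round(more_than_once, 4))
```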
\(X:\) the number of successes in a single trial, where the probability of success on each trial is \(p\).
Probability distribution table for \(X:\)
\[ \begin{array}{c|c|c} \text{Value} & 1 & 0 \\ \hline \text{Probability} & p & 1-p \end{array} \]
\[ \begin{align} \text{Average successes per trial: } E(X) &= 1 \times p + 0 \times (1-p) = p \\ \text{Expected total successes in } n \text{ trials: } n\,E(X) &= n[1 \times p + 0 \times (1-p)] = np \end{align} \]
\[ \begin{align} \text {for one trial} \\ SD(X) &= \sqrt{(1-p)^2 \times p + (0-p)^2 \times (1-p)} \\ &= \sqrt{(1-p)^2 \times p + p^2 \times (1-p)} \\ &= \sqrt{p(1-p)(1-p+p)} \\ &= \sqrt{p(1-p)} \\ \text {for n trials} \\ SD(X) &= \sqrt{np(1-p)} \end{align} \]
\(X\) has the binomial distribution with parameters \(n=100\) and \(p=0.5\)
\[ \begin{align} P(X=k) &= \binom{100}{k}(0.5)^k(1-0.5)^{100-k}, k = 0,1,2,...,100 \\ \\ E(X) &= 100 \times 0.5 \\ &= 50 \\ \\ SE(X) &= \sqrt{100 \times 0.5 \times 0.5} \\ &= 5 \end{align} \]
\(X\) has the binomial distribution with parameters \(n=100\) and \(p=1/6\)
\[ \begin{align} P(X=k) &= \binom{100}{k}(1/6)^k(1-1/6)^{100-k}, k = 0,1,2,...,100 \\ \\ E(X) &= 100 \times 1/6 \\ &= 16.7 \\ \\ SE(X) &= \sqrt{100 \times 1/6 \times 5/6} \\ &= 3.73 \end{align} \]
\[P(x) = \binom{100}{x}(0.5)^x(1-0.5)^{100-x}, \quad x = 0, 1, \ldots, 100\]
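A quick simulation agrees with these formulas. A sketch in Python for the \(p = 1/6\) case (assuming NumPy; the replication count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 10,000 replications of X ~ Binomial(n=100, p=1/6)
samples = rng.binomial(n=100, p=1/6, size=10_000)

print(samples.mean())  # close to E(X)  = 100 * (1/6)            ~= 16.7
print(samples.std())   # close to SE(X) = sqrt(100*(1/6)*(5/6))  ~= 3.73
```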
A Poisson probability distribution is a discrete probability distribution that applies to occurrences of some event over a specified interval. The random variable \(x\) is the number of occurrences of the event in an interval. The interval can be time, distance, area, volume, or a similar unit. The probability of the event occurring \(x\) times over an interval is given by the formula,
\[ P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!} \]
\[ \begin{aligned} \text{where } \\ e &= 2.71828\ldots \\ \mu &= \text{mean number of occurrences of the event in the interval} \end{aligned} \]
For the \(55\)-year period since 1960, there were \(336\) Atlantic hurricanes. Assume that the Poisson distribution is a suitable model. Find the probability that in a randomly selected year, there are exactly \(8\) hurricanes.
The mean number of hurricanes per year
\[ \begin{aligned} \mu = \frac{\text {number of hurricanes}}{\text{number of years}} &= \frac{336}{55}=6.1 \\ \\ P(x) &= \frac{\mu^x \cdot e^{-\mu}}{x!} \\ \\ P(x = 8) &= \frac{(6.1)^8 \cdot (2.71828)^{-6.1}}{8!} = 0.107 \end{aligned} \]
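The same computation as a short Python sketch (the function name is an illustrative choice):

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """Chance of exactly x occurrences in an interval with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 336 / 55                          # ~6.1 hurricanes per year
print(round(poisson_pmf(8, mu), 3))    # ~0.107
```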
Requirements:
\[ \left. \begin{array}{l} 1. \text { } n \ge 100 \\ 2. \text { } np \le 10 \end{array} \right\} \text{indicating a rare event} \]
Mean for Poisson as an approximation to Binomial
\[ \mu = np \]
In a lottery game, you pay \(50\)¢ to select a sequence of four digits (0-9). If you play this game once every day, find the probability of winning at least once in a year with 365 days.
\[ \begin{aligned} \text{The number of ways you can pick 4 digits} &= (10)^4 = 10000 \\ \text{The probability of a win is } p &= \frac{1}{10000} \\ n &= 365 \\ \mu &= 365 \times \frac{1}{10000} = 0.0365 \\ \text{Probability of at least one win} &= \text{1 - Probability of "no win"} \\ &= 1 - P(x = 0) \\ &= 1- \frac{\mu^x \cdot e^{-\mu}}{x!} \\ &= 1- \frac{(0.0365)^0 \cdot (2.71828)^{-0.0365}}{0!} \\ &= 1 - 0.9642 \\ &= 0.0358 \end{aligned} \]
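Since this uses the Poisson approximation, a quick Python sketch can compare it with the exact binomial answer:

```python
from math import exp

p, n = 1 / 10_000, 365
mu = n * p                          # 0.0365

poisson_approx = 1 - exp(-mu)       # P(at least one win), Poisson approximation
exact_binomial = 1 - (1 - p) ** n   # P(at least one win), exact binomial

print(round(poisson_approx, 4))     # ~0.0358
print(round(exact_binomial, 4))     # ~0.0358 -- the approximation is very close
```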
Two events or outcomes are called "disjoint or mutually exclusive" if they cannot both happen in the same trial.
When rolling a die, the outcomes \(1\) and \(2\) are disjoint, and we compute the probability that one of these outcomes will occur by adding their separate probabilities: \[P(1 \text{ or } 2)=P(1)+P(2)=1/6+1/6=1/3\]
What about the probability of rolling a 1, 2, 3, 4, 5, or 6?
\[ \begin{array}{ll} P(1 \text{ or } 2 \text{ or } 3 \text{ or } 4 \text{ or } 5 \text{ or }6) = P(1)+P(2)+P(3)+P(4)+P(5)+P(6) \\ =1/6+1/6+1/6+1/6+1/6+1/6 =1 \end{array} \]
If \(A_1,...,A_k\) represent \(k\) disjoint outcomes, then the probability that one of them occurs is given by: \[P(A_1\text{ or }A_2 \text{ or ... or }A_k)=P(A_1)+P(A_2)+...+P(A_k)\]
Consider a standard deck of cards.
\[ \text {4 suits} \left\{ \begin{array}{ll} \text{hearts: } \color{red}{\heartsuit} \\ \text{diamonds: } \color{red}{\diamondsuit} \\ \text{spades: } \spadesuit \\ \text{clubs: } \clubsuit \end{array} \right. \] \[\text{13 cards in each suit: Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King}\] One card is dealt from a well-shuffled deck.
\[ \begin{align} P(\text{the card is an ace or a king}) &= P(\text{it's an ace})+P(\text {it's a king}) \\ & = 4/52+4/52 \\ & = 8/52 \\ & = 2/13 \end{align} \]
\[ \begin{align} & P(\text{the card is an ace or a heart}) \\ &= P(\text{it's an ace})+P(\text {it's a heart})-P(\text{it's an ace & heart}) \\ & = 4/52+13/52 - \underbrace{1/52}_{\text {adjustment made to avoid double-counting of the ace of hearts}} \\ & = 16/52 \\ & = 4/13 \end{align} \]
\[ \bbox[yellow,5px] {\color{black}{P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)}} \] where \(P(A \text{ and } B)\) is the probability that both events occur.
If \(A\) and \(B\) are disjoint, then \(P(A \text{ and } B) = 0.\) Therefore,
\[ P(A \text{ or } B) = P(A) + P(B)\]
The complement of event \(A\) is denoted \(A^c\), and \(A^c\) represents all outcomes not in \(A\). \(A\) and \(A^c\) are mathematically related:
\[ \begin{align} & P(A) + P(A^c) = 1 \\ or, \space & P(A^c) = 1 - P(A) \end{align} \]
Example: if an event has chance \(40\%\), then the chance that it doesn't happen is \(60\%\).
Example: suppose \(73\%\) of people use email, \(62\%\) use text messages, and \(49\%\) use both.
\[ \begin{align} P(\text{email}) &=0.73 \\ P(\text{text}) &= 0.62 \\ P(\text{email and text}) &= 0.49 \\ P(\text{only email}) &= 0.73 - 0.49 = 0.24 \\ P(\text{only text}) &= 0.62 - 0.49 = 0.13 \\ P(\text{neither email nor text}) &= 1 - (0.24 + 0.49 + 0.13) = 0.14 \end{align} \]
If \(A\) and \(B\) represent events from two different and independent processes, then the probability that both \(A\) and \(B\) occur can be calculated as the product of their separate probabilities:
\[P(A \text{ and } B) = P(A) \times P(B)\]
Similarly, if there are \(k\) events \(A_1,...,A_k\) from \(k\) independent processes, then the probability that they all occur is
\[ \bbox[yellow,5px] { \color{black} {P(A_1\text{ and }A_2 \text{ and ... and }A_k)=P(A_1)\times P(A_2)\times...\times P(A_k)} } \]
Example 1: If a card is randomly drawn from a well-shuffled deck, what is the probability that it is the ace of hearts? [Note: the card's rank (ace) and suit (hearts) are independent.]
\[ \begin{align} P(Ace \text{ and } Hearts) &= P(Ace) \times P(Hearts) \\ &= (4/52) \times (13/52) = 1/52 \end{align} \]
Example 2: About \(9\%\) of people are left-handed. Suppose \(5\) people are selected at random from the US population.
(a) What is the probability that all are right-handed?
(b) What is the probability that all are left-handed?
(c) What is the probability that not all of them are right-handed?
\[ \begin{align} &(a) \space P\text{(All are RH)} = (1-0.09)^5 = 0.624 \\ &(b) \space P\text{(All are LH)} = (0.09)^5 = 0.0000059 \\ &(c) \space P\text{(not all RH)} = 1- P(\text {all RH}) = 1-0.624 = 0.376 \end{align} \]
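A quick check of the same arithmetic in Python:

```python
p_left = 0.09   # probability a randomly chosen person is left-handed
n = 5

all_right     = (1 - p_left) ** n   # (a) ~0.624
all_left      = p_left ** n         # (b) ~0.0000059
not_all_right = 1 - all_right       # (c) ~0.376

print(round(all_right, 3), all_left, round(not_all_right, 3))
```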
The conditional probability of the outcome of interest \(A\) given condition \(B\) is computed as the following:
\[P(A|B) = \frac{P(A \text{ and } B)}{P(B)}\]
College enrollment and parents' educational attainment
\[ \begin{array} {l|cc|r} & \text{parents: degree} & \text{parents: no degree} & \text{total} \\ \hline \text {teen: college} & 231 & 214 & 445 \\ \text {teen: no college} & 49 & 298 & 347 \\ \hline \text {total} & 280 & 512 & 792 \end{array} \]
If a probability is based on a single variable, it is a marginal probability. The probability of outcomes for two or more variables or processes is called a joint probability.
College enrollment and parents' educational attainment
\[ \begin{array} {l|cc|c} & \text{parents: degree} & \text{parents: no degree} & \text{marginal} \\ \hline \text {teen: college} & \color{red}{0.29} & \color{red}{0.27} & \color{blue}{0.56} \\ \text {teen: no college} & \color{red}{0.06} & \color{red}{0.38} & \color{blue}{0.44} \\ \hline \text {marginal} & \color{blue}{0.35} & \color{blue}{0.65} & 1.00 \end{array} \]
\[ \begin{align} &\color{blue}{\text{Marginal Probability: }} P(\text{teen: college})=\frac{445}{792}=0.56 \\ &\color{red}{\text{Joint Probability: }} P(\text {teen: college and parents: no degree})=\frac{214}{792}=0.27 \end{align} \]
\[ \begin{align} P(\text {teen college | parents degree}) &= \frac{231/792}{280/792} = 0.825 \\ P(\text {teen college | parents no degree}) &= \frac{214/792}{512/792} = 0.418 \\ P(\text {teen no college | parents degree}) &= \frac{49/792}{280/792} = 0.175 \\ P(\text {teen no college | parents no degree}) &= \frac{298/792}{512/792} = 0.582 \end{align} \]
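These conditional probabilities can be computed from the counts table directly. A minimal sketch in Python (the dictionary keys are illustrative labels):

```python
# Counts from the college enrollment table
counts = {
    ("college", "degree"): 231,
    ("college", "no degree"): 214,
    ("no college", "degree"): 49,
    ("no college", "no degree"): 298,
}
total = sum(counts.values())   # 792

def conditional(teen: str, parents: str) -> float:
    """P(teen outcome | parents outcome) = joint / marginal."""
    joint = counts[(teen, parents)] / total
    marginal = sum(v for (t, p), v in counts.items() if p == parents) / total
    return joint / marginal

print(round(conditional("college", "degree"), 3))      # 0.825
print(round(conditional("college", "no degree"), 3))   # 0.418
```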
If \(A\) and \(B\) represent two outcomes or events, then
\[ \bbox[yellow,5px] {\color{black}{P(A \space and \space B) = P(A|B) \times P(B)}} \]
If asked to determine whether events \(A\) and \(B\) are independent, verify whether one of the following equations holds:
\[ \begin{align} P(A|B) &= P(A) \tag 1 \\ P(A \space and \space B) &=P (A) \times P(B) \tag 2 \end{align} \] Check if the equality holds in the following equation:
\[ \begin{align} P(\text{teen college | parent degree})&\stackrel{?}{=} P(\text {teen college}) \\ 0.825 &\ne 0.560 \end{align} \] Because the two sides are not equal, teen college attendance and parents' degree status are not independent.
If \(A\) and \(B\) are mutually exclusive events, then they cannot occur at the same time. If asked to determine if events \(A\) and \(B\) are mutually exclusive, verify one of the following equations holds:
\[ \begin{align} P(\text{A and B})&= 0 \tag 1 \\ P(\text{A or B}) &= P(A)+P(B) \tag 2 \end{align} \]
If the equation that is checked holds true, \(A\) and \(B\) are mutually exclusive. If the equation does not hold, then \(A\) and \(B\) are not mutually exclusive.
Find \(P(A|B)\)
From general multiplication rule, we can write:
\[ \begin{align} P(A \space and \space B) &= P(A|B) \times P(B) \\ \Rightarrow P(A|B) &= \frac{P(A \space and \space B)}{P(B)} \\ &= \frac{P(A \space and \space B)}{P(A \text { and } B)+P(A' \text{ and } B)} \\ &= \frac{P(A)\times P(B|A)}{P(A)\times P(B|A) + P(A')\times P(B|A')} \\ &= \frac{(0.8 \times 0.01)}{(0.8 \times 0.01)+(0.2 \times 0.02)} \\ &= 0.67 \end{align} \]
Let's say \(1\%\) of the population has a rare disease.
Error rates: the test misses the disease \(0.5\%\) of the time, so \(P(+|D) = 0.995,\) and it returns a false positive \(0.8\%\) of the time, so \(P(+|ND) = 0.008.\)
A person is picked at random and tested. Given that the test result is \(+\), what is the probability that the person has the disease?
Find \(P(D|+)\)
From general multiplication rule, we can write:
\[ \begin{align} P(D \space and \space +) &= P(D|+) \times P(+) \\ \Rightarrow P(D|+) &= \frac{P(D \space and \space +)}{P(+)} \\ &= \frac{P(D \space and \space +)}{P(D \text { and } +)+P(ND \text{ and } +)} \\ &= \frac{P(D)\times P(+|D)}{P(D)\times P(+|D) + P(ND)\times P(+|ND)} \\ &= \frac{(0.01 \times 0.995)}{(0.01 \times 0.995)+(0.99 \times 0.008)} \\ &= 0.56 \end{align} \]
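The same calculation as a small Python sketch (the function and argument names are illustrative choices):

```python
def bayes_posterior(prior: float, p_pos_given_d: float, p_pos_given_nd: float) -> float:
    """P(disease | positive test) via Bayes' theorem."""
    numerator = prior * p_pos_given_d
    denominator = numerator + (1 - prior) * p_pos_given_nd
    return numerator / denominator

# 1% prevalence, 0.5% false-negative rate, 0.8% false-positive rate
print(round(bayes_posterior(0.01, 0.995, 0.008), 2))   # ~0.56
```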
Consider the following conditional probability for event 1 and event 2: \[P(\text {outcome } A_1 \text{ of event 1 | outcome } B \text{ of event 2})\]
Bayes' Theorem states that this conditional probability can be identified as the following fraction:
\[\frac {P(B|A_1)P(A_1)}{P(B|A_1)P(A_1)+P(B|A_2)P(A_2)+...+P(B|A_k)P(A_k)}\]
A poker hand (5 cards) is dealt from a well-shuffled deck. What is the chance that there is at least one ace in the hand?
\[ \begin{align} &P(\text{at least one ace}) \\ &=1-P(\text{no aces}) \\ &=1-(48/52) \times (47/51) \times (46/50) \times (45/49) \times (44/48) \\ &=34.11\% \end{align} \]
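Equivalently, with combinations: a short Python check using math.comb:

```python
from math import comb

# P(at least one ace) = 1 - P(no aces) = 1 - C(48,5) / C(52,5)
p_at_least_one_ace = 1 - comb(48, 5) / comb(52, 5)

print(round(100 * p_at_least_one_ace, 1))   # ~34.1 (percent)
```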