Alban Guillaumet, Troy University
“I believe that we do not know anything for certain, but everything probably.”
- Christiaan Huygens
Can you give some examples of the importance of the probability concept in biology?
Sex of babies in humans/animals (heterogametic sex)
DNA mutation rate (hypermutator)
Probability is essential because we use samples to investigate the world
As we have seen, chance plays a major role in the properties of samples
e.g., the 95% confidence interval of the mean, the probability of capturing or seeing a bird given that it is present (used when estimating abundance), etc.
Here we will discuss basic probability calculations
Definition: A random trial is a process or experiment that has two or more possible outcomes whose occurrence cannot be predicted with certainty.
Definition: An event is any potential subset of all the possible outcomes of a random trial.
Definition: The probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions. Probability ranges between zero and one.
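The "proportion of times" definition can be illustrated by simulation. The sketch below is only an illustration: the fair six-sided die, the event "roll a six", and the helper name `estimate_probability` are assumptions introduced here, not part of the lecture.

```python
import random

def estimate_probability(event, trial, n_trials, seed=0):
    """Proportion of n_trials random trials in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return hits / n_trials

def roll_die(rng):
    """One random trial: roll a fair six-sided die (assumed example)."""
    return rng.randint(1, 6)

def is_six(outcome):
    """The event of interest: the die shows a six."""
    return outcome == 6

# The observed proportion should settle near 1/6 as the trial is repeated.
for n in (100, 10_000, 100_000):
    print(n, estimate_probability(is_six, roll_die, n))
```

With few repetitions the proportion can be far from 1/6; it stabilizes only as the number of repetitions grows.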
Definition: The probability of an event not occurring is one minus the probability that it occurs. \[ \mathrm{Pr[{\it not}\ A]} = 1-\mbox{Pr[A]} \]
Definition: General addition rule \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} - \mathrm{Pr[A \ and \ B]} \]
Example: blood type O or Rhesus factor + \[ \mathrm{Pr[O \ or \ +]} = \mathrm{Pr[O]} + \mathrm{Pr[+]} - \mathrm{Pr[O \ and \ +]} \]
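The slides give no blood-type frequencies, so the sketch below checks the general addition rule on an assumed stand-in: one roll of a fair die, with A = "even number" and B = "greater than 3". The event choices and the helper `pr` are hypothetical.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one roll of a fair die (assumed)
A = {x for x in outcomes if x % 2 == 0}     # even: {2, 4, 6}
B = {x for x in outcomes if x > 3}          # greater than 3: {4, 5, 6}

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

direct = pr(A | B)                          # count outcomes in A or B
by_rule = pr(A) + pr(B) - pr(A & B)         # general addition rule

print(direct, by_rule)                      # both are 2/3
print(pr(outcomes - A), 1 - pr(A))          # complement rule: both are 1/2
```

Subtracting Pr[A and B] avoids counting the outcomes 4 and 6 twice, since they belong to both events.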
Definition: The conditional probability of an event is the probability of that event occurring given that another event has already occurred.
Definition: The conditional probability of an event B given that A occurred is \[ \mathrm{Pr[B \ | \ A]} = \frac{\mathrm{Pr[A \ and \ B]}}{\mathrm{Pr[A]}} \]
Definition: General multiplication rule \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]}\times\mathrm{Pr[B \ | \ A]} \]
Commonly confused!
Definition: Two events are mutually exclusive if they cannot both occur at the same time. \[ \mathrm{Pr[A \ and \ B]} = 0 \]
Definition: Two events are independent if the occurrence of one does not inform us about the probability that the second will occur. \[ \mathrm{Pr[B \ | \ A]} = \mathrm{Pr[B]} \]
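To see why the two notions are easy to confuse, the sketch below contrasts them on one roll of a fair die. The events `low`, `mid` and `even` and the helpers `pr` and `pr_given` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))   # one roll of a fair die (assumed)

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def pr_given(b, a):
    """Conditional probability Pr[B | A] = Pr[A and B] / Pr[A]."""
    return pr(a & b) / pr(a)

low = {1, 2}
mid = {3, 4}
even = {2, 4, 6}

# low and mid are mutually exclusive, hence NOT independent:
print(pr(low & mid))                  # 0
print(pr_given(mid, low), pr(mid))    # 0 versus 1/3

# low and even are independent, hence NOT mutually exclusive:
print(pr_given(low, even), pr(low))   # 1/3 and 1/3
print(pr(low & even))                 # 1/6
```

Knowing that one of two mutually exclusive events occurred tells us the other did not, which is about as far from independence as two events with nonzero probability can be.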
These two conditions simplify the general addition and multiplication rules:
If two events are mutually exclusive, then \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} \]
Example: blood type O
\[ \mathrm{Pr[O]} = \mathrm{Pr[O+ \ or \ O-]} = \mathrm{Pr[O+]} + \mathrm{Pr[O-]} - \mathrm{Pr[O+ \ and \ O-]} \]
Since O+ and O- are mutually exclusive, \( \mathrm{Pr[O+ \ and \ O-]} = 0 \), so
\[ \mathrm{Pr[O]} = \mathrm{Pr[O+]} + \mathrm{Pr[O-]} \]
If two events are independent, then \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]} \times \mathrm{Pr[B]} \]
Example: two sixes in two rolls of a die
\[ \mathrm{Pr[2 \ six]} = \mathrm{Pr[first \ roll \ is \ six \ and \ 2nd \ roll \ is \ six]} = 1 / 6 \times 1 / 6 = 1 / 36 \]
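The two-sixes result can be checked by brute force. This minimal sketch assumes a fair six-sided die and enumerates all ordered pairs of rolls; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two rolls of a fair die.
rolls = list(product(range(1, 7), repeat=2))

# Direct count of the outcomes where both rolls are six.
n_two_sixes = sum(1 for first, second in rolls if first == 6 and second == 6)
pr_two_sixes = Fraction(n_two_sixes, len(rolls))

# Multiplication rule for independent events.
pr_product = Fraction(1, 6) * Fraction(1, 6)

print(pr_two_sixes, pr_product)   # 1/36 1/36
```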
Probability trees
Law of total probability
Contingency tables
Probability distributions
\[ \mathrm{Pr[P \ and \ M]} = ? \]
\[ \mathrm{Pr[P \ and \ M]} = \mathrm{Pr[P]}\times\mathrm{Pr[M \ | \ P]} = 0.18 \]
What is \( \mathrm{Pr[M]} \)?
Definition: The law of total probability is given by \[ \begin{aligned} \mathrm{Pr[A]} & = \sum_{All \ values \ of \ B}\mathrm{Pr[A \ and \ B]} \\ & = \sum_{All \ values \ of \ B} \mathrm{Pr[B]}\ \mathrm{Pr[A\ | \ B]}, \end{aligned} \] where \( B \) represents all possible mutually exclusive values of the condition.
\( \mathrm{Pr[M]} = \mathrm{Pr[P]}\times\mathrm{Pr[M \ | \ P]} + \mathrm{Pr[NP]}\times\mathrm{Pr[M \ | \ NP]} \)
\( \mathrm{Pr[M]} = (0.20\times 0.90) + (0.80\times 0.05) = 0.22 \)
\( (\mathrm{Pr[P \ and \ M]}=0.18)\neq (\mathrm{Pr[P]}\times\mathrm{Pr[M]}= 0.20\times 0.22 = 0.044) \)
The probability of getting a male (M) depends on whether the host was already parasitized (P) or not (NP), i.e., the events are not independent.
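The calculation above can be reproduced in a few lines. The probabilities are the ones used on the slides (Pr[P] = 0.20, Pr[M | P] = 0.90, Pr[M | NP] = 0.05); the dictionary layout and variable names are assumptions made for this sketch.

```python
from fractions import Fraction

# Probability of each mutually exclusive condition (host parasitized or not).
pr_condition = {"P": Fraction("0.20"), "NP": Fraction("0.80")}

# Conditional probability of a male given each condition.
pr_male_given = {"P": Fraction("0.90"), "NP": Fraction("0.05")}

# Law of total probability: sum over all values of the condition.
pr_male = sum(pr_condition[b] * pr_male_given[b] for b in pr_condition)

# General multiplication rule for Pr[P and M].
pr_p_and_male = pr_condition["P"] * pr_male_given["P"]

# Independence would require Pr[P and M] == Pr[P] * Pr[M].
independent = pr_p_and_male == pr_condition["P"] * pr_male

print(float(pr_male))                             # 0.22
print(float(pr_p_and_male))                       # 0.18
print(float(pr_condition["P"] * pr_male))         # 0.044
print(independent)                                # False
```

Exact fractions are used instead of floats so that the equality test for independence is not affected by rounding.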
Smoking and cancer contingency table
                   health
status         cancer   not cancer      Sum
smoker           8944        43056    52000
not smoker        624        47376    48000
Sum              9568        90432   100000
Question: What is Pr[smoker]?
Answer: 52000/100000 = 0.52
Question: What is Pr[cancer]?
Answer: 9568/100000 = 0.09568
Question: What is Pr[cancer | smoker]?
Answer: 8944/52000 = 0.172
Question: What is Pr[smoker | cancer]?
Answer: 8944/9568 = 0.9347826
Question: What is Pr[smoker AND cancer]?
Answer: 8944/100000 = 0.08944
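All of the answers above follow from the four cell counts of the table. The sketch below recomputes them; the dictionary representation and the helper `pr` are assumptions made for illustration.

```python
# Cell counts from the smoking and cancer contingency table.
counts = {
    ("smoker", "cancer"): 8944,
    ("smoker", "not cancer"): 43056,
    ("not smoker", "cancer"): 624,
    ("not smoker", "not cancer"): 47376,
}
total = sum(counts.values())  # grand total: 100000

def pr(status=None, health=None):
    """Marginal or joint probability; None means 'any value' of that variable."""
    matching = sum(
        n for (s, h), n in counts.items()
        if (status is None or s == status) and (health is None or h == health)
    )
    return matching / total

pr_smoker = pr(status="smoker")                    # marginal, row total / grand total
pr_cancer = pr(health="cancer")                    # marginal, column total / grand total
pr_smoker_and_cancer = pr("smoker", "cancer")      # joint, one cell / grand total

# Conditional probabilities: joint probability divided by the marginal.
pr_cancer_given_smoker = pr_smoker_and_cancer / pr_smoker
pr_smoker_given_cancer = pr_smoker_and_cancer / pr_cancer

print(pr_smoker, pr_cancer, pr_smoker_and_cancer)
print(round(pr_cancer_given_smoker, 3), round(pr_smoker_given_cancer, 3))
```

Note that Pr[cancer | smoker] and Pr[smoker | cancer] share the same numerator but are divided by different marginals, which is why they differ so much.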
Definition: A probability distribution is a list of the probabilities of all mutually exclusive outcomes of a random trial.
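As a concrete instance of this definition, the sketch below lists the probability distribution of X = the sum of two rolls of a fair six-sided die. The fair die and the variable names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely ordered roll pairs give each sum.
ways = Counter(first + second for first, second in product(range(1, 7), repeat=2))

# Probability distribution: one probability per mutually exclusive value of X.
distribution = {x: Fraction(n, 36) for x, n in sorted(ways.items())}

for x, p in distribution.items():
    print(f"P[X={x}] = {p} ~ {float(p):.3f}")

# A sum of 1 is impossible, so it is absent from the list (probability 0).
print(distribution.get(1, Fraction(0)))
```

Because the listed values are mutually exclusive and cover every possible outcome, their probabilities sum to one.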
How do you calculate P[X=1]? (X = result of a single roll of a die)
P[X=1] = 1/6
How do you calculate P[X=1], P[X=2] and P[X=3]? (X = sum of two rolls)
P[X=1] = 0
P[X=2] = P[roll#1 = 1 AND roll#2 = 1]
P[X=2] = (1/6) * (1/6) = 1/36 ≈ 0.028
P[X=3] = P[(roll#1 = 1 AND roll#2 = 2) OR (roll#1 = 2 AND roll#2 = 1)]
Mutually exclusive events: P[X=3] = (1/36) + (1/36) ≈ 0.056
Definition: We describe a continuous probability distribution with a curve whose height is the probability density.
Unlike discrete probability distributions, the height of a continuous probability curve (say, at Y = 1) is not the probability of obtaining Y = 1.
Instead, the probability of obtaining a value of Y within some range is given by the area under the curve.
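The difference between height and area can be made concrete numerically. The sketch below assumes a normal density purely as an example (the slides do not name a distribution) and approximates the area with the trapezoid rule; `normal_pdf` and `area_under_curve` are hypothetical helper names.

```python
import math

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Height of the normal probability density curve at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(pdf, lower, upper, n_steps=10_000):
    """Approximate the area under pdf between lower and upper (trapezoid rule)."""
    width = (upper - lower) / n_steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, n_steps):
        total += pdf(lower + i * width)
    return total * width

height_at_1 = normal_pdf(1.0)                         # a density, not a probability
pr_0_to_1 = area_under_curve(normal_pdf, 0.0, 1.0)    # Pr[0 <= Y <= 1]
total_area = area_under_curve(normal_pdf, -8.0, 8.0)  # essentially the whole curve

# A density can exceed 1, which a probability never can.
tall_height = normal_pdf(0.0, sigma=0.1)

print(round(height_at_1, 3), round(pr_0_to_1, 3), round(total_area, 3))
print(round(tall_height, 2))
```

The total area under the curve is one, while the height at any single point carries no such constraint.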
Probability densities