Probability

Alban Guillaumet, Troy University

“I believe that we do not know anything for certain, but everything probably.”

- Christiaan Huygens

The importance of Probability in biology

  • Can you give some examples of why the concept of probability matters in biology?

  • Sex of babies in humans/animals (heterogametic sex)

  • DNA mutation rate (hypermutator)

The importance of Probability in Data analysis

  • Probability is essential because we use samples to investigate the world

  • As we have seen, chance plays a major role in the properties of samples

  • e.g., the 95% confidence interval of the mean, or the probability of capturing or seeing a bird given that it is present (used to estimate abundance)

  • Here we will discuss basic probability calculations

Probability Basics

Definition: A random trial is a process or experiment that has two or more possible outcomes whose occurrence cannot be predicted with certainty.

Definition: An event is any potential subset of all the possible outcomes of a random trial.

Definition: The probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions. Probability ranges between zero and one.

Definition: The probability of an event not occurring is one minus the probability that it occurs. \[ \mathrm{Pr[not \ A]} = 1 - \mathrm{Pr[A]} \]
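The long-run definition of probability and the complement rule can be illustrated with a small simulation. This is only a sketch: the fair six-sided die, the helper name `estimate_probability`, the number of trials, and the seed are all assumptions made for the example.

```python
import random

def estimate_probability(event, trials=100_000, seed=1):
    """Estimate Pr[event] as the proportion of simulated die rolls in which it occurs."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is reproducible
    hits = sum(event(rng.randint(1, 6)) for _ in range(trials))
    return hits / trials

# Same seed, so both estimates are computed from the same sequence of rolls.
pr_six = estimate_probability(lambda roll: roll == 6)
pr_not_six = estimate_probability(lambda roll: roll != 6)

print(f"Pr[six]     ~ {pr_six:.3f}  (exact value is 1/6)")
print(f"Pr[not six] ~ {pr_not_six:.3f}  (1 - Pr[six] = {1 - pr_six:.3f})")
```

Increasing `trials` should bring the estimate closer to 1/6, which is the sense in which probability is a proportion over many repetitions of the random trial.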

The Formulas

Definition: General addition rule \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} - \mathrm{Pr[A \ and \ B]} \]

Example: blood type O or Rhesus factor + \[ \mathrm{Pr[O \ or \ +]} = \mathrm{Pr[O]} + \mathrm{Pr[+]} - \mathrm{Pr[O \ and \ +]} \]
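The general addition rule can be checked by counting outcomes. Since the slides give no blood-type frequencies, the sketch below uses a hypothetical example instead: one roll of a fair die, with A = "the roll is even" and B = "the roll is greater than 3".

```python
from fractions import Fraction

outcomes = set(range(1, 7))                 # one roll of a fair six-sided die (assumed)
A = {x for x in outcomes if x % 2 == 0}     # even roll: {2, 4, 6}
B = {x for x in outcomes if x > 3}          # roll greater than 3: {4, 5, 6}

def pr(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

pr_A_or_B = pr(A | B)                       # counted directly from the union
addition_rule = pr(A) + pr(B) - pr(A & B)   # general addition rule

print(pr_A_or_B, addition_rule)             # both are 2/3
```

Subtracting Pr[A and B] corrects for the outcomes 4 and 6, which would otherwise be counted twice.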

Conditional Probabilities

Definition: The conditional probability of an event is the probability of that event occurring given that another event has already occurred.

Definition: The conditional probability of an event B given that A occurred is \[ \mathrm{Pr[B \ | \ A]} = \frac{\mathrm{Pr[A \ and \ B]}}{\mathrm{Pr[A]}} \]

Definition: General multiplication rule \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]}\times\mathrm{Pr[B \ | \ A]} \]
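A short sketch of the conditional probability formula and the general multiplication rule, using a hypothetical example that is not in the slides: drawing two cards without replacement from a standard 52-card deck, with A = "first card is an ace" and B = "second card is an ace".

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical deck: 4 aces and 48 other cards, identified by position 0..51.
deck = ["ace"] * 4 + ["other"] * 48
draws = list(permutations(range(52), 2))    # every ordered pair of distinct cards

def pr(event):
    """Proportion of equally likely two-card draws in which the event occurs."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

pr_A = pr(lambda d: deck[d[0]] == "ace")
pr_A_and_B = pr(lambda d: deck[d[0]] == "ace" and deck[d[1]] == "ace")

pr_B_given_A = pr_A_and_B / pr_A            # conditional probability formula
print(pr_B_given_A)                         # 1/17, i.e. 3 aces left among 51 cards
print(pr_A * pr_B_given_A == pr_A_and_B)    # multiplication rule holds: True
```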

Mutually exclusive vs. independence

Commonly confused!

Definition: Two events are mutually exclusive if they cannot both occur at the same time. \[ \mathrm{Pr[A \ and \ B]} = 0 \]

Definition: Two events are independent if the occurrence of one does not inform us about the probability that the second will occur. \[ \mathrm{Pr[B \ | \ A]} = \mathrm{Pr[B]} \]

These two conditions simplify the general addition and multiplication rules:

If two events are mutually exclusive, then \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} \]

Example: blood type O

\[ \mathrm{Pr[O]} = \mathrm{Pr[O+ \ or \ O-]} = \mathrm{Pr[O+]} + \mathrm{Pr[O-]} - \mathrm{Pr[O+ \ and \ O-]} \]

Because a person cannot be both O+ and O-, \( \mathrm{Pr[O+ \ and \ O-]} = 0 \), so

\[ \mathrm{Pr[O]} = \mathrm{Pr[O+]} + \mathrm{Pr[O-]} \]

If two events are independent, then \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]} \times \mathrm{Pr[B]} \]

Example: two sixes in two rolls of a fair die

\[ \mathrm{Pr[two \ sixes]} = \mathrm{Pr[first \ roll \ is \ six \ and \ second \ roll \ is \ six]} = 1 / 6 \times 1 / 6 = 1 / 36 \]
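Because mutual exclusivity and independence are easy to confuse, it may help to check both on the 36 equally likely outcomes of two die rolls. In this sketch the fair die and the second pair of events ("first roll is one" and "first roll is six") are assumptions added for contrast.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all (first roll, second roll) pairs

def pr(event):
    """Proportion of the 36 equally likely outcomes in which the event occurs."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def first_six(r):
    return r[0] == 6

def second_six(r):
    return r[1] == 6

def first_one(r):
    return r[0] == 1

# Independent events: the joint probability equals the product.
pr_two_sixes = pr(lambda r: first_six(r) and second_six(r))
sixes_independent = pr_two_sixes == pr(first_six) * pr(second_six)

# Mutually exclusive events: the joint probability is zero, not the product.
pr_one_and_six = pr(lambda r: first_one(r) and first_six(r))
exclusive_independent = pr_one_and_six == pr(first_one) * pr(first_six)

print(pr_two_sixes, sixes_independent)          # 1/36 True
print(pr_one_and_six, exclusive_independent)    # 0 False
```

Mutually exclusive events with non-zero probabilities are never independent: knowing that one occurred tells us the other did not.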

Objectives

  • Probability trees

  • Law of total probability

  • Contingency tables

  • Probability distributions

Visualizing... Probability trees

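[Figure: probability tree with a first split on whether the host is already parasitized (P) or not (NP), and a second split on whether a male (M) is produced. Branch probabilities: Pr[P] = 0.20, Pr[NP] = 0.80, Pr[M | P] = 0.90, Pr[M | NP] = 0.05.]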

\[ \mathrm{Pr[P \ and \ M]} = ? \]

  • \[ \mathrm{Pr[P \ and \ M]} = \mathrm{Pr[P]}\times\mathrm{Pr[M \ | \ P]} = 0.20\times 0.90 = 0.18 \]

    What is \( \mathrm{Pr[M]} \)?

    Law of total probability

    Definition: The law of total probability is given by \[ \begin{aligned} \mathrm{Pr[A]} & = \sum_{All \ values \ of \ B}\mathrm{Pr[A \ and \ B]} \\ & = \sum_{All \ values \ of \ B} \mathrm{Pr[B]}\ \mathrm{Pr[A\ | \ B]}, \end{aligned} \] where \( B \) represents all possible mutually exclusive values of the condition.

    So what is \( \mathrm{Pr[M]} \)?

    Calculating Pr[M]

    Definition: The law of total probability is given by \[ \mathrm{Pr[A]} = \sum_{All \ values \ of \ B} \mathrm{Pr[B]}\ \mathrm{Pr[A\ | \ B]} \]

    \( \mathrm{Pr[M]} = \mathrm{Pr[P]}\times\mathrm{Pr[M \ | \ P]} + \mathrm{Pr[NP]}\times\mathrm{Pr[M \ | \ NP]} \)

    \( \mathrm{Pr[M]} = (0.20\times 0.90) + (0.80\times 0.05) = 0.22 \)
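The same calculation can be written as a sum over the values of the condition. This is a minimal sketch using the probabilities from the slide; the dictionary names are illustrative.

```python
# Probabilities of each value of the condition (host parasitized or not).
pr_condition = {"P": 0.20, "NP": 0.80}

# Conditional probabilities of a male given each value of the condition.
pr_M_given = {"P": 0.90, "NP": 0.05}

# Law of total probability: sum Pr[B] * Pr[M | B] over all values of B.
pr_M = sum(pr_condition[b] * pr_M_given[b] for b in pr_condition)

print(round(pr_M, 2))  # 0.22
```

The sum is only valid because P and NP are mutually exclusive and together cover every possibility, so their probabilities add up to one.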

    Are P and M independent events?

  • \( (\mathrm{Pr[P \ and \ M]}=0.18)\neq (\mathrm{Pr[P]}\times\mathrm{Pr[M]}= 0.20\times 0.22 = 0.044) \)

    The probability of producing a male depends on whether the host was already parasitized, i.e., the two events are not independent
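The independence check can be sketched in two equivalent ways, using the values from the slides: compare Pr[P and M] with Pr[P] × Pr[M], or compare Pr[M | P] with Pr[M]. `math.isclose` is used here only to avoid a false mismatch from floating-point rounding.

```python
import math

pr_P = 0.20
pr_M = 0.22
pr_M_given_P = 0.90

pr_P_and_M = pr_P * pr_M_given_P          # general multiplication rule: 0.18
product_if_independent = pr_P * pr_M      # what the joint would be under independence: 0.044

joint_matches = math.isclose(pr_P_and_M, product_if_independent)
conditional_matches = math.isclose(pr_M_given_P, pr_M)

print(joint_matches, conditional_matches)  # False False
```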

    Practice: Contingency tables

    Smoking and cancer contingency table

                health
    status       cancer not cancer    Sum
      smoker       8944      43056  52000
      not smoker    624      47376  48000
      Sum          9568      90432 100000
    

    Question: What is Pr[smoker]?

    Answer: 52000/100000 = 0.52

    Question: What is Pr[cancer]?

    Answer: 9568/100000 = 0.09568

    Question: What is Pr[cancer | smoker]?

    Answer: 8944/52000 = 0.172

    Question: What is Pr[smoker | cancer]?

    Answer: 8944/9568 ≈ 0.935

    Question: What is Pr[smoker AND cancer]?

    Answer: 8944/100000 = 0.08944
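The answers to the contingency-table questions can be reproduced from the four cell counts. The sketch below stores the table in a plain dictionary; the variable and function names are illustrative choices, not part of the original material.

```python
# Cell counts from the smoking and cancer contingency table: (status, health) -> count.
counts = {
    ("smoker", "cancer"): 8944,
    ("smoker", "not cancer"): 43056,
    ("not smoker", "cancer"): 624,
    ("not smoker", "not cancer"): 47376,
}
total = sum(counts.values())

def marginal(index, value):
    """Marginal probability: sum the cells where the given variable takes `value`."""
    return sum(n for key, n in counts.items() if key[index] == value) / total

pr_smoker = marginal(0, "smoker")
pr_cancer = marginal(1, "cancer")
pr_smoker_and_cancer = counts[("smoker", "cancer")] / total

# Conditional probabilities: joint probability divided by the probability of the condition.
pr_cancer_given_smoker = pr_smoker_and_cancer / pr_smoker
pr_smoker_given_cancer = pr_smoker_and_cancer / pr_cancer

print(f"Pr[smoker]            = {pr_smoker:.5f}")
print(f"Pr[cancer]            = {pr_cancer:.5f}")
print(f"Pr[smoker AND cancer] = {pr_smoker_and_cancer:.5f}")
print(f"Pr[cancer | smoker]   = {pr_cancer_given_smoker:.3f}")
print(f"Pr[smoker | cancer]   = {pr_smoker_given_cancer:.3f}")
```

Note that Pr[cancer | smoker] and Pr[smoker | cancer] share the same numerator but differ in the denominator, which is why they are so different.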

    Visualizing probability - Mosaic plots

    [Figures: mosaic plots]

    Visualizing probability - Probability trees

    [Figure: probability tree]

    Probability distributions

    Definition: A probability distribution is a list of the probabilities of all mutually exclusive outcomes of a random trial.

    Discrete probability distributions

    [Figure: probability distribution of X, the outcome of a single roll of a die]

    How do you calculate P[X=1]?

    [Figure: probability distribution of X, the sum of two rolls of a die]

    How do you calculate P[X=1], P[X=2] and P[X=3]?

    For a single roll: P[X=1] = 1/6

    For the sum of two rolls: P[X=1] = 0

    P[X=2] = P[roll#1 = 1 AND roll#2 = 1]

    P[X=2] = (1/6) * (1/6) = 1/36 ~ 0.028

    P[X=3] = P[ (roll#1 = 1 AND roll#2 = 2) OR (roll#1 = 2 AND roll#2 = 1)]

    These two outcomes are mutually exclusive, so their probabilities add:

    P[X=3] = (1/36) + (1/36) = 2/36 ~ 0.056
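The whole distribution of the sum of two rolls can be built by enumerating the 36 equally likely pairs of rolls. This is a sketch that assumes a fair six-sided die.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (roll 1, roll 2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of X = sum of the two rolls.
distribution = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P[X={x}] = {p} ~ {float(p):.3f}")

print(sum(distribution.values()))  # 1
```

The value X = 1 does not appear in the output because no pair of rolls sums to one, and the probabilities of all listed values add up to one, as they must for a probability distribution.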

    Continuous probability distributions

    Definition: We describe a continuous probability distribution with a curve whose height is the probability density.

    [Figure: probability density curve]

    Unlike discrete probability distributions, the height of a continuous probability curve (say, at Y = 1) is not the probability of obtaining Y = 1.

    Instead, the probability of obtaining a value of Y within some range is given by the area under the curve over that range.
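The difference between the height of a density curve and a probability can be sketched numerically. The slides do not say which distribution is plotted, so the standard normal density used here is an assumption, as are the helper names and the trapezoidal rule used to approximate the area.

```python
import math

def normal_density(y, mean=0.0, sd=1.0):
    """Height of the normal probability density curve at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under_curve(f, lower, upper, steps=10_000):
    """Approximate the area under f between lower and upper (trapezoidal rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (f(lower) + f(upper))
    total += sum(f(lower + i * width) for i in range(1, steps))
    return total * width

height_at_1 = normal_density(1.0)                             # a density, not a probability
pr_between = area_under_curve(normal_density, -1.0, 1.0)      # Pr[-1 <= Y <= 1]
pr_exactly_1 = area_under_curve(normal_density, 1.0, 1.0)     # a range of zero width

print(f"height at Y = 1   : {height_at_1:.4f}")
print(f"Pr[-1 <= Y <= 1]  : {pr_between:.4f}")
print(f"Pr[Y = 1 exactly] : {pr_exactly_1:.4f}")
```

The height at Y = 1 is about 0.24, yet the probability of obtaining exactly Y = 1 is zero, because a single point spans a range of zero width and therefore has no area above it.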

    Probability densities