Joel Correa da Rosa
January 4th 2017
Aristotle: “The probable is what usually happens”
Cicero: “Probability is the very guide of life”
Democritus: “Everything existing in the universe is the fruit of chance”
Randomness is the cause of uncertainty.
When sampling from a population, randomness is present.
In our everyday lives, almost everything is a random experiment.
A random experiment is an experiment whose outcome cannot be predicted with certainty before it is run.
If the same results are obtained when an experiment is repeated under the same conditions, the experiment is deterministic.
Probability and Statistics are the branches of mathematics that have been developed to deal with random experiments.
The sample space \( \Omega \) is the set of all possible outcomes of the random experiment.
Example #1 : Consider the random experiment that will sample 5 HIV+ subjects and verify how many subjects have adverse events after vaccination.
The sample space is: \( \Omega = \{0,1,2,3,4,5\} \)
Example #2 : Consider the random experiment that will sample 10 psoriatic subjects and measure the average IL-17 gene expression from micro-array in log2 scale in their skin tissue.
The sample space is: \( \Omega= (-\infty,+\infty) \)
An event is a subset of the sample space.
Example #1. Let's consider the following event: “More than 3 subjects have adverse events after vaccination”. This is a subset of the sample space \( A = \{4,5 \} \).
Example #2 : Define the event: “The average log2 expression for IL-17 in 10 psoriatic patients is greater than 7.” \( A = (7,+\infty) \).
Set operations are important for defining the events of interest and for understanding the probability rules. Consider two events (subsets of the sample space) \( A \) and \( B \).
Set operation # 1: Union (\( A \cup B \))
Set operation # 2: Intersection (\( A \cap B \))
Set operation # 3: Complement (\( A^c \))
Comment: The Venn-Diagram is an auxiliary tool for visualization of set operations.
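The three set operations can be tried directly on the sample space of Example #1. The following is a minimal sketch using base R; the event `B` ("at most 4 subjects have adverse events") is a hypothetical event introduced only for illustration.

```r
# Sample space from Example #1: number of subjects (out of 5) with adverse events
omega <- 0:5
# Event from Example #1: more than 3 subjects have adverse events
A <- c(4, 5)
# Hypothetical second event (not in the original): at most 4 subjects
B <- 0:4

A_union_B <- union(A, B)        # outcomes in A or in B (here, all of omega)
A_inter_B <- intersect(A, B)    # outcomes in both A and B
A_compl   <- setdiff(omega, A)  # outcomes of omega that are not in A
```

Note that the complement is always taken relative to the sample space, which is why `setdiff()` needs `omega` as its first argument.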
Probability is a number assigned to each event, intended to measure the chance of its occurrence in a random experiment.
The probability distribution is a function that maps events to real numbers.
The theory of probability is founded on 3 axioms.
\( 0 \leq P(A) \leq 1~~;A\subseteq \Omega \) (A probability is never negative and never exceeds 1)
\( P(\Omega)=1 \) (The probability of the whole sample space is 1)
\( P(A \cup B)=P(A)+P(B)~~;A \cap B=\emptyset \) (The probability of a union of disjoint events is the sum of their probabilities)
Comment: Two events \( A \) and \( B \) are said to be disjoint if \( A \cap B=\emptyset \)
\( P(A \cup B) = P(A) + P(B) - P(A\cap B) \)
As a consequence of the axioms, if \( A \cap B \) is an empty set, i.e. there is no intersection, \( P(A \cap B)=0 \).
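One way to see why the rule for \( P(A \cup B) \) follows from the axioms is to split the sets into disjoint pieces and apply the third axiom to each split:
\( A \cup B = A \cup (B \cap A^c) \), so \( P(A \cup B) = P(A) + P(B \cap A^c) \)
\( B = (A \cap B) \cup (B \cap A^c) \), so \( P(B) = P(A \cap B) + P(B \cap A^c) \)
Solving the second line for \( P(B \cap A^c) \) and substituting into the first gives \( P(A \cup B) = P(A) + P(B) - P(A \cap B) \).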
Two schools of thought:
Frequentist (Classical) : \( P(A) \approx \frac{\text{# occurrences of } A}{\text{# of experiments}} \)
Bayesian : \( \text{posterior} \propto \text{prior} \times \text{likelihood} \)
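The frequentist ratio can be illustrated by simulation. The sketch below repeats the experiment of Example #1 many times and reports the relative frequency of the event "more than 3 subjects have adverse events". It assumes independent subjects and a 50% probability of an adverse event per subject (the same values the document assumes later for this example); the seed and the number of runs are arbitrary choices.

```r
set.seed(2017)        # arbitrary seed, only for reproducibility
n_runs <- 100000      # number of repetitions of the random experiment

# Each run: count adverse events among 5 subjects, each with probability 0.5
x <- rbinom(n_runs, size = 5, prob = 0.5)

# Relative frequency of the event A = {4, 5}
freq_A <- mean(x > 3)
freq_A
```

As the number of runs grows, the relative frequency should settle near the probability of the event under the assumed model.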
If the occurrence of event \( B \) modifies the sample space, the probability of occurrence of event \( A \) is updated according to the following law (defined when \( P(B) > 0 \)):
\( P(A|B) = \frac{P(A \cap B)}{P(B)} \)
If \( A \) is independent of \( B \), \( P(A|B)=P(A) \)
It is also true that if \( A \) is independent of \( B \), \( P(A \cap B) = P(A)P(B) \)
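The product rule for independent events follows in one step from the definition of conditional probability, assuming \( P(B) > 0 \):
\( P(A \cap B) = P(A|B)P(B) = P(A)P(B) \)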
Assuming:
1) Occurrence of an adverse event in a subject does not depend on occurrence in other subjects;
2) Probability of an adverse event is 50%
In 5 HIV+ subjects, the number of subjects with adverse events follows a binomial distribution with \( n = 5 \) and \( p = 0.5 \).
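# Binomial distribution: P(X = k) for k = 0, ..., 5 with n = 5 and p = 0.5
barplot(dbinom(0:5, size = 5, prob = 0.5), names.arg = 0:5,
        xlab = "Subjects with adverse events", ylab = "Probability")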
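Under the two assumptions above, the probability of the event from Example #1 (\( A = \{4,5\} \), more than 3 subjects with adverse events) can be computed exactly. A minimal sketch with base R's `dbinom()` and `pbinom()`:

```r
# P(A) for A = {4, 5}: sum the binomial probabilities of the outcomes in A
p_A <- sum(dbinom(4:5, size = 5, prob = 0.5))

# Same probability from the upper tail of the distribution: P(X > 3)
p_A_alt <- pbinom(3, size = 5, prob = 0.5, lower.tail = FALSE)

p_A   # 0.1875, i.e. 6/32
```

Both routes apply the third axiom: the outcomes 4 and 5 are disjoint, so their probabilities add.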
Assuming:
1) log2 expressions of IL-17 are normally distributed;
2) Average log2 expression in the population is 7;
3) Standard deviation for the log2 expressions in the population is 2.5
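# Normal distribution: density of log2 IL-17 expression with mean 7 and SD 2.5
curve(dnorm(x, mean = 7, sd = 2.5), from = 0, to = 16,
      xlab = "log2 expression", ylab = "Density")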
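These assumptions also give the distribution of the average over 10 subjects from Example #2. If the subjects are sampled independently (an additional assumption), the sample mean is normal with mean 7 and standard deviation \( 2.5/\sqrt{10} \), so the event "average greater than 7" has probability 0.5 by symmetry. The sketch below computes the probability for a hypothetical cutoff of 8, chosen only for illustration.

```r
mu     <- 7     # population mean of log2 expression (from the text)
sigma  <- 2.5   # population standard deviation (from the text)
n      <- 10    # number of sampled subjects (Example #2)
cutoff <- 8     # hypothetical threshold, not from the original

# Under independent sampling, the sample mean has SD sigma / sqrt(n)
se <- sigma / sqrt(n)

# P(sample mean > cutoff), taken from the upper tail of the normal distribution
p_above <- pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)
p_above
```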
Assuming that a diagnostic test is a composition of two random experiments:
a) Observe the result of a diagnostic test (\( \Omega_1 =\{T+,T-\} \))
b) Observe the result of a gold standard test (\( \Omega_2 =\{D+,D-\} \))
the sample space for the random experiment that results from this composition has four elements :
\( \Omega = \Omega_1 \times \Omega_2 = \{(T+,D+),(T+,D-),(T-,D+),(T-,D-)\} \)
Note: \( (T+,D+) = T+ \cap D+ \). Each element in the sample space is an intersection of two events.
Based on a frequentist approach, the probabilities for events in this random experiment can be calculated from a 2x2 table with the frequencies of occurrences in each cell (e.g. \( a \) is the number of occurrences of \( T+ \cap D+ \)).
| | D+ | D- |
|---|---|---|
| T+ | a | b |
| T- | c | d |
\( n = a+b+c+d \): number of experiment runs
sensitivity : \( P(T+|D+) \)
specificity : \( P(T-|D-) \)
false positive rate : \( P(T+|D-) \)
false negative rate : \( P(T-|D+) \)
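Each of these four quantities is a conditional probability, so under the frequentist approach it is a cell count divided by its column total. The sketch below uses hypothetical counts (not from the original) for the cells \( a, b, c, d \).

```r
# Hypothetical 2x2 table; matrix() fills by column, so the values are a, c, b, d
tab <- matrix(c(80, 20, 95, 895), nrow = 2,
              dimnames = list(test = c("T+", "T-"), disease = c("D+", "D-")))

sensitivity <- tab["T+", "D+"] / sum(tab[, "D+"])   # a / (a + c)
specificity <- tab["T-", "D-"] / sum(tab[, "D-"])   # d / (b + d)
fpr         <- tab["T+", "D-"] / sum(tab[, "D-"])   # b / (b + d)
fnr         <- tab["T-", "D+"] / sum(tab[, "D+"])   # c / (a + c)

round(c(sensitivity = sensitivity, specificity = specificity,
        fpr = fpr, fnr = fnr), 3)
```

Because each pair conditions on the same column, sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate.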
Bayes' theorem is a consequence of the definition of conditional probability.
Consider two events \( A \) and \( B \)
\( P(B|A)=\frac{P(A|B)P(B)}{P(A)} \)
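The step from the definition of conditional probability to this formula is short. Writing the intersection in both conditioning orders, and assuming \( P(A) > 0 \) and \( P(B) > 0 \):
\( P(A \cap B) = P(A|B)P(B) = P(B|A)P(A) \)
Dividing both sides of the second equality by \( P(A) \) gives the expression for \( P(B|A) \).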
If a set of events constitutes a partition of the sample space (e.g. \( B \) and \( B^c \)), then for every event \( A \):
\( P(A) = P(A\cap B) + P(A \cap B^c) \)
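Rewriting each intersection with the definition of conditional probability (assuming \( 0 < P(B) < 1 \)) gives the form that is usually needed for the denominator of Bayes' theorem:
\( P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) \)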
Assume that a test has 80% sensitivity and 90.4% specificity for a cancer that is 1% prevalent in the population. Consider the random experiment that selects an individual from this population and performs this diagnostic test.
What is the probability of having cancer ?
What is the probability of having cancer if the test is positive ?
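Both questions can be worked through with the numbers given. Before the test, the probability of having cancer is the prevalence, \( P(D+) = 0.01 \). After a positive test, Bayes' theorem gives \( P(D+|T+) \), with \( P(T+) \) expanded over the partition \( D+ \), \( D- \). A minimal sketch (the variable names are illustrative):

```r
prev <- 0.01    # P(D+): prevalence, from the text
sens <- 0.80    # P(T+ | D+): sensitivity, from the text
spec <- 0.904   # P(T- | D-): specificity, from the text

# P(T+) = P(T+|D+) P(D+) + P(T+|D-) P(D-), with P(T+|D-) = 1 - specificity
p_pos <- sens * prev + (1 - spec) * (1 - prev)

# Bayes' theorem: P(D+ | T+) = P(T+|D+) P(D+) / P(T+)
p_cancer_given_pos <- sens * prev / p_pos
round(p_cancer_given_pos, 3)
```

Under these inputs the probability of cancer given a positive test is about 0.078, so the positive result raises the probability from 1% to roughly 8%; most positive tests still come from the much larger group without cancer.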