Probability is a numerical measure of the likelihood that an outcome will occur; it takes values between 0 (the outcome cannot occur) and 1 (the outcome is certain to occur), with values near 0 indicating an unlikely outcome and values near 1 a likely one.
Note: 1/2, 0.5, and 50% all mean the same thing, so they are all correct. When working with computers, decimals are the preferred format.
Experiment is any process that can result in one of several well-defined outcomes that cannot be predicted with certainty beforehand (e.g. roll a die, make an investment).
Sample space is the set of all possible outcomes in an experiment (written as S).
Sample point is a member of the sample space; any one particular experimental outcome.
Event is a well-defined collection of sample points; a subset of the sample space.
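As a small illustration of these terms, the sketch below enumerates the sample space for tossing a coin twice and builds one event as a subset of it. The two-toss experiment and the variable names are assumptions chosen for illustration; they do not come from the notes.

```python
from itertools import product

# Experiment (assumed for illustration): toss a coin twice.
# Each sample point is one particular outcome, e.g. ("H", "T").
sample_space = set(product("HT", repeat=2))

# An event is a subset of the sample space: "at least one head".
at_least_one_head = {point for point in sample_space if "H" in point}

print(len(sample_space))                   # number of sample points in S
print(len(at_least_one_head))              # number of sample points in the event
print(at_least_one_head <= sample_space)   # subset check
```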
Classical method assumes equally likely outcomes.
Relative frequency method is based more on real-world empirical observation than on theoretical assumptions about the likelihood of any experimental outcome.
Subjective method is used when experimental outcomes cannot be assumed equally likely and relative frequency data are unavailable or cannot be collected; the assigned probability then reflects a judgment or degree of belief.
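To contrast the classical and relative frequency methods, the following sketch assigns the probability of rolling a 4 on a fair die both ways. The die example, the seed, and the number of simulated rolls are arbitrary assumptions; the simulation only stands in for real-world observations.

```python
import random
from fractions import Fraction

# Classical method: six equally likely faces, so Pr(roll a 4) = 1/6.
classical = Fraction(1, 6)

# Relative frequency method: estimate the same probability from observed rolls.
rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
n_rolls = 60_000         # number of simulated observations (arbitrary)
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
relative_frequency = rolls.count(4) / n_rolls

print(float(classical), relative_frequency)
```

With enough observations the relative frequency should settle close to the classical value, although any single run will differ slightly.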
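When assigning probabilities, two requirements must be met:
The probability assigned to each experimental outcome must be between 0 and 1, inclusive.
The probabilities of all experimental outcomes must sum to 1.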
The probability of the intersection of two events is written as \(Pr(A \cap B)\) and is read as the probability that both A and B occur.
If A and B have no sample points in common, then \(Pr(A \cap B) = 0\) and the two events are said to be mutually exclusive.
The probability of the union of two events is written as \(Pr(A \cup B)\) and is read as the probability that A or B (or both) occurs.
The probability of the complement of an event is written as \(Pr(A^c)\) and is read as the probability that A does not occur.
The union of any two events can be calculated with the following rule:
\[ Pr(A \cup B) = Pr(A) + Pr(B) - Pr(A \cap B) \]
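The addition rule can be checked by counting sample points. The sketch below assumes one roll of a fair die with equally likely outcomes; the events A and B and the helper `pr` are hypothetical names introduced only for this example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die

def pr(event):
    """Classical probability: favourable sample points / all sample points."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

union_direct = pr(A | B)                    # count the union directly
union_by_rule = pr(A) + pr(B) - pr(A & B)   # addition rule
complement_A = pr(S - A)                    # probability that A does not occur

print(union_direct, union_by_rule, complement_A)
```

Subtracting the intersection corrects for the sample points 4 and 6, which would otherwise be counted once in A and again in B.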
Conditional Probability is the probability of one event occurring after taking into account the occurrence of another event.
Notation: Pr(A|B), where the vertical bar means “given”.
Independent Events: If Pr(A|B) = Pr(A), then the two events are independent.
For independent events the following rule is always true:
\[ Pr(A \cap B) = Pr(A) \times Pr(B) \]
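A conditional probability is commonly computed as Pr(A|B) = Pr(A ∩ B) / Pr(B) whenever Pr(B) > 0; the notes do not spell this out, so it is stated here as the standard definition. The sketch below uses it to test independence on one roll of a fair die. The events A, B, C and the helpers `pr` and `pr_given` are hypothetical names for this example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die

def pr(event):
    """Classical probability under equally likely outcomes."""
    return Fraction(len(event & S), len(S))

def pr_given(a, b):
    """Pr(a | b) = Pr(a and b) / Pr(b); only defined when Pr(b) > 0."""
    return pr(a & b) / pr(b)

A = {2, 4, 6}      # the roll is even
B = {1, 2, 3, 4}   # the roll is at most 4
C = {4, 5, 6}      # the roll is greater than 3

a_given_b = pr_given(A, B)
a_given_c = pr_given(A, C)

# Independence check via the multiplication rule.
independent_ab = pr(A & B) == pr(A) * pr(B)
independent_ac = pr(A & C) == pr(A) * pr(C)

print(a_given_b, a_given_c, independent_ab, independent_ac)
```

Knowing B leaves the probability of A unchanged, so A and B are independent; knowing C changes it, so A and C are not.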
A random variable is a variable whose specific outcomes are assumed to arise by chance or according to some random or stochastic mechanism.
When considering a random variable, we assume that no observation has been made yet.
The probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.
There are two types of random variables, depending on their outcome:
Discrete if its realizations can take on only certain distinct, countable values.
Continuous if its realizations can take any value within an interval.
A cumulative probability for a random variable X is the probability of observing a value less than or equal to x, written as \(Pr(X \leq x)\).
For discrete random variables we can calculate \(Pr(X = x)\).
For continuous random variables we calculate probabilities for an interval (using cumulative probabilities).
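The two cases can be sketched side by side. The discrete part assumes a fair die; the continuous part assumes a standard normal variable Z, chosen only because its cumulative probability can be written with the standard library's error function. Neither example comes from the notes.

```python
import math
from fractions import Fraction

# Discrete case: fair die, Pr(X = x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def discrete_cdf(x):
    """Pr(X <= x) as a sum of point probabilities."""
    return sum(p for value, p in pmf.items() if value <= x)

# Continuous case (assumed example): standard normal variable Z.
def normal_cdf(z):
    """Pr(Z <= z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

pr_x_at_most_4 = discrete_cdf(4)

# Interval probability as a difference of cumulative probabilities:
# Pr(-1 < Z <= 1) = Pr(Z <= 1) - Pr(Z <= -1).
pr_z_in_interval = normal_cdf(1.0) - normal_cdf(-1.0)

print(pr_x_at_most_4, round(pr_z_in_interval, 4))
```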
For a discrete random variable X taking values \(x_1, \dots, x_n\), the mean \(\mu_x\) (also called the expectation or expected value, E[X]) is interpreted as the average outcome that you can expect over many realizations.
\[ \mu_x = E[X] = \sum_{i=1}^n x_i \cdot Pr(X=x_i) \]
For X, the variance \(\sigma^2_x\), also written as Var[X], quantifies the variability inherent in realizations of X.
\[ \sigma^2_x = Var[X] = \sum_{i=1}^n (x_i - \mu_x)^2 \cdot Pr(X=x_i) \]
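Both sums translate directly into code. The sketch below assumes X is the result of one roll of a fair die; the `pmf` dictionary is a hypothetical representation of Pr(X = x), not something defined in the notes.

```python
from fractions import Fraction

# Assumed example: X is one roll of a fair die, Pr(X = x) = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid assignment of probabilities must sum to 1.
total_probability = sum(pmf.values())

# Mean: sum of x_i * Pr(X = x_i).
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x_i - mean)^2 * Pr(X = x_i).
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(total_probability, mean, variance)
```

Using `Fraction` keeps the results exact, which makes it easier to compare them with a hand calculation.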
A distribution is symmetric if you can draw a vertical line down its center so that each side mirrors the other, with probability 0.5 falling on either side of the line.
If a distribution is asymmetric, we say that it is skewed.
Modality describes the number of easily identifiable peaks in the distribution of interest.
If an experiment can be characterized as a sequence of N steps with \(n_1\) possible results on the first step, \(n_2\) possible results on the second step, and so on up to \(n_N\) on the last step, then the total number of outcomes for the overall experiment is equal to the product \(n_1 \times n_2 \times \dots \times n_N\).
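This counting rule can be verified by enumerating outcomes. The three-step experiment below (toss a coin, roll a die, toss a coin again) is a hypothetical example chosen for illustration.

```python
from itertools import product
from math import prod

# Hypothetical three-step experiment: coin toss, die roll, coin toss.
steps = [["H", "T"], [1, 2, 3, 4, 5, 6], ["H", "T"]]

# Counting rule: multiply the number of possible results on each step.
count_by_rule = prod(len(step) for step in steps)

# Cross-check by listing every outcome of the overall experiment.
count_by_enumeration = len(list(product(*steps)))

print(count_by_rule, count_by_enumeration)
```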
The counting rule for combinations counts the number of experimental outcomes when n objects are to be selected from a larger set of N objects, without regard to the order of selection.
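Taking this to be the counting rule for combinations, where the order of selection does not matter, the count is N! / (n! (N − n)!). The sketch below compares that formula with Python's built-in `math.comb` and with a direct enumeration; the values N = 5 and n = 2 are arbitrary assumptions.

```python
from itertools import combinations
from math import comb, factorial

N, n = 5, 2  # select 2 objects from a set of 5 (arbitrary example values)

# Formula: N! / (n! * (N - n)!)
count_by_formula = factorial(N) // (factorial(n) * factorial(N - n))

# Same count from the standard library.
count_by_comb = comb(N, n)

# Cross-check by listing every unordered selection.
count_by_enumeration = len(list(combinations(range(N), n)))

print(count_by_formula, count_by_comb, count_by_enumeration)
```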