Module 2: Probability Mass Functions

In this module we discuss probability, the foundation of statistical analysis. Probability assigns a number between 0 and 1 to events to give a sense of the “chance” of the event. Probability has become our default model for apparently random phenomena. Our eventual goal is to use probability models, our formal mechanism for connecting our data to a population. However, before we get to probability models, we need to understand the basics of probability calculus. The next few lectures cover these basics.

Discrete and Continuous

  • discrete = values you can count (qualitative or quantitative)
  • continuous = values that can fall anywhere in a range
  • PMF = probability mass function; evaluated at a value, it gives the probability that a random variable takes that value. To be a valid pmf, a function \(p\) must satisfy:
  1. \(p(x) \ge 0\) for every value \(x\)
  2. The sum of \(p(x)\) over all possible values \(x\) the random variable can take must equal one

\[p(1) + p(2) + p(3) + p(4) + p(5) + p(6) = 1 \]
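The two pmf conditions can be checked numerically. The sketch below assumes the six values in the sum above are the faces of a fair die, so each has probability 1/6; the variable names are illustrative only.

```r
# pmf of a fair six-sided die (assumed example: each face has probability 1/6)
x <- 1:6
p <- rep(1 / 6, 6)

# condition 1: every probability is larger than or equal to 0
all_nonneg <- all(p >= 0)

# condition 2: the probabilities add up to one (compared with a tolerance,
# because floating point sums are not always exact)
sums_to_one <- isTRUE(all.equal(sum(p), 1))

is_valid_pmf <- all_nonneg && sums_to_one
print(is_valid_pmf)
```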

Bernoulli Distribution

Capital \(X\) is the result of a fair coin flip, where \(X=0\) represents tails and \(X=1\) represents heads.

\[p(x) = (1/2)^x (1/2)^{1-x} \quad \text{for } x=0,1\]

\[p(0) = (1/2)^0(1/2)^{1-0} = 1/2\]

\[p(1) = (1/2)^1(1/2)^{1-1} = 1/2\]
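As a quick check of the coin flip pmf, the sketch below evaluates the formula directly and compares it with R's built-in dbinom(), using size = 1 so that the binomial reduces to a Bernoulli. The helper name bern_pmf and its theta argument are illustrative, not part of the notes.

```r
# Bernoulli pmf: theta^x * (1 - theta)^(1 - x) for x in {0, 1}
# (bern_pmf is a hypothetical helper; theta = 0.5 is the fair coin)
bern_pmf <- function(x, theta = 0.5) {
  theta^x * (1 - theta)^(1 - x)
}

p_manual  <- bern_pmf(c(0, 1))
p_builtin <- dbinom(c(0, 1), size = 1, prob = 0.5)

print(p_manual)
print(p_builtin)
```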

PDF Probability Density Functions

Probability density functions describe continuous random variables. To be a valid pdf, a function must satisfy:

  1. It must be larger than or equal to zero everywhere
  2. The total area under it must be one

Areas under pdfs correspond to probabilities for that random variable.

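# density f(x) = 2x on [0, 1], zero elsewhere
x <- c(-0.5, 0, 1, 1, 1.5)
y <- c(0, 0, 2, 0, 0)
# vertices of the triangle under the density between 0 and 0.75
p1 <- c(0, 0.75, 0.75)
p2 <- c(0, 1.5, 0)
plot(x, y, lwd = 3, frame = FALSE, type = "l")
# shade the probability area using the polygon function
polygon(c(0, p1), c(0, p2), col = "red", border = FALSE)
lines(p1, p2, lwd = 3)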

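Suppose the density above, \(f(x) = 2x\) for \(0 < x < 1\), models the proportion of calls that get addressed. First we check that it is a valid pdf: it is never negative, and the area of the full triangle is \(\frac{1}{2} \times 1 \times 2 = 1\). Next, what is the probability that 75% or fewer of calls get addressed? It is the area of the shaded triangle, which has base 0.75 and height \(2 \times 0.75 = 1.5\). This density is also the Beta(2, 1) density, so pbeta(0.75, 2, 1) returns the same probability; its first argument is the value \(x\) and the other two are the shape parameters of the beta distribution.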

1.5*0.75/2
## [1] 0.5625
# or
pbeta(0.75,2,1)
## [1] 0.5625
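Both pdf conditions and the shaded area can also be checked by numerical integration. This is a minimal sketch using integrate() from base R; the function name f is illustrative, and the density is only integrated over [0, 1], where it equals 2x.

```r
# density from the plot above: f(x) = 2x on [0, 1]
f <- function(x) 2 * x

# total area under the density (should be one for a valid pdf)
total_area <- integrate(f, lower = 0, upper = 1)$value

# P(X <= 0.75): area under the density from 0 to 0.75
prob_075 <- integrate(f, lower = 0, upper = 0.75)$value

print(total_area)
print(prob_075)
```

The second value should agree with the triangle area and the pbeta() call above, up to numerical integration error.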

CDF and Survival Function

The cumulative distribution function (CDF) of a random variable \(X\) returns the probability that the random variable is less than or equal to the value \(x\). This definition applies whether the distribution is discrete or continuous.

\[ F(x) = P(X \le x)\]

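The survival function of a random variable \(X\) is defined as the probability that the random variable is greater than the value \(x\). Notice that \(S(x) = 1 - F(x)\).

\[ S(x) = P(X > x) \]

For the triangular density above, with \(0 \le x \le 1\), the CDF is the area of a triangle with base \(x\) and height \(2x\):

\[ F(x) = P(X \le x) = \frac{1}{2} \text{Base} \times \text{Height} = \frac{1}{2}(x) \times (2x) = x^2 \]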

pbeta(c(0.4,0.5,0.6),2,1)
## [1] 0.16 0.25 0.36
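The survival function can be obtained the same way. The sketch below uses the lower.tail = FALSE argument of pbeta(), which returns P(X > x) instead of P(X <= x), and compares it with 1 - F(x); the variable names are illustrative.

```r
x <- c(0.4, 0.5, 0.6)

# CDF: F(x) = P(X <= x) = x^2 for the Beta(2, 1) density
cdf <- pbeta(x, 2, 1)

# survival function: S(x) = P(X > x), requested via lower.tail = FALSE
surv <- pbeta(x, 2, 1, lower.tail = FALSE)

print(surv)
print(1 - cdf)  # same values, since S(x) = 1 - F(x)
```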

Quantiles

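If you were at the 95th percentile on an exam, you know that 95% of people scored worse than you and 5% scored better. These are sample quantiles. Here we define their population analogs: the \(a\)th quantile of a distribution with distribution function \(F\) is the point \(x_a\) such that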

\[F(x_{a}) = a\]

  • percentile is simply a quantile with \(a\) expressed as a percent
  • the median is the 50th percentile.

The function below is the quantile function of the beta distribution, as indicated by the q prefix. Here it returns the median of the Beta(2, 1) density.

qbeta(0.5,2,1)
## [1] 0.7071068
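Because F(x) = x^2 for this density, solving F(x_a) = a by hand gives x_a = sqrt(a). The sketch below checks that against qbeta() and confirms that the CDF and the quantile function undo each other; the variable names are illustrative.

```r
# median of the Beta(2, 1) density from the quantile function
med <- qbeta(0.5, 2, 1)

# solving x^2 = 0.5 by hand gives the same point
med_by_hand <- sqrt(0.5)

# plugging the median back into the CDF recovers a = 0.5
a_back <- pbeta(med, 2, 1)

print(c(med, med_by_hand, a_back))
```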

Module 3: Conditional Probability

Conditional probability is a very intuitive idea: “What is the probability given partial information about what has occurred?” The probability of getting hit by lightning is small. However, it’s much larger for people playing outside in open fields during a lightning storm! In these lectures we go over the formal rules of conditional probability.

Let \(B\) be an event so that \(P(B)>0\)

\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]

Example: for a fair die roll, P(one given that roll is odd) \(= P(A|B)\)

\[ A = \{1\}; \; B = \{1,3,5\} \]

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3}\]
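The die example can be reproduced by enumerating the outcomes. This is a minimal sketch assuming a fair die; omega, prob and the other names are illustrative.

```r
# sample space of a fair die and the probability of each face
omega <- 1:6
prob  <- rep(1 / 6, 6)

A <- c(1)        # event: roll is a one
B <- c(1, 3, 5)  # event: roll is odd

# P(B) and P(A and B), summing the probabilities of the faces in each event
p_B       <- sum(prob[omega %in% B])
p_A_and_B <- sum(prob[omega %in% intersect(A, B)])

# definition of conditional probability
p_A_given_B <- p_A_and_B / p_B
print(p_A_given_B)
```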

Bayes Rule

Bayes' rule reverses the direction of conditioning, giving \(P(B|A)\) in terms of \(P(A|B)\). It is especially useful for interpreting diagnostic tests.

Diagnostic Tests:

Let + and - be the events that the result of a diagnostic test is positive or negative respectively. Let \(D\) and \(D^c\) be the event that the subject of the test has or does not have the disease respectively.

  • Sensitivity = \(P(+|D)\)
  • Specificity = \(P(-|D^c)\)

Two Likelihood Ratios and Bayes' Theorem:

\[P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D)+P(+|D^c)P(D^c)}\]

\[P(D^c|+) = \frac{P(+|D^c)P(D^c)}{P(+|D)P(D)+P(+|D^c)P(D^c)}\]

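The numerator changes but not the denominator, so dividing the first equation by the second cancels the denominator:

\[\frac{P(D|+)}{P(D^c|+)} = \frac{P(+|D)}{P(+|D^c)} \times \frac{P(D)}{P(D^c)}\]

The ratio \(P(+|D)/P(+|D^c)\) is the likelihood ratio for a positive test: it converts the odds of disease before the test into the odds of disease after a positive result.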
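To see how the formula behaves, the sketch below plugs in numbers. The sensitivity, specificity and prevalence values are hypothetical, chosen only for illustration, and are not taken from these notes.

```r
# hypothetical inputs (assumptions for illustration only)
sensitivity <- 0.997  # P(+ | D)
specificity <- 0.985  # P(- | D^c)
prevalence  <- 0.001  # P(D)

# P(+ | D^c) is one minus the specificity
p_pos_given_no_d <- 1 - specificity

# shared denominator: P(+) by the law of total probability
p_pos <- sensitivity * prevalence + p_pos_given_no_d * (1 - prevalence)

# Bayes' rule for both posterior probabilities
p_d_given_pos    <- sensitivity * prevalence / p_pos
p_no_d_given_pos <- p_pos_given_no_d * (1 - prevalence) / p_pos

print(c(p_d_given_pos, p_no_d_given_pos))
```

With a prevalence this low, the probability of disease after a positive test stays small even though the test itself is accurate, because most positives come from the much larger disease-free group.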

Independence

Event \(A\) is independent of event \(B\) if:

\(P(A|B) = P(A)\) where \(P(B) > 0\)

Equivalently, events \(A\) and \(B\) are independent if:

\(P(A \cap B) = P(A)P(B)\)

Example:

\(A = \{\text{Head on flip 1}\}, \; P(A) = 0.5\)

\(B = \{\text{Head on flip 2}\}, \; P(B) = 0.5\)

\(A \cap B = \{\text{Head on flips 1 and 2}\}\)

\(P(A \cap B) = P(A)P(B) = 0.5 \times 0.5 = 0.25\)
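The product rule in this example can be verified by listing all outcomes of two flips. The sketch assumes the four outcomes HH, HT, TH and TT are equally likely, which is what a fair coin flipped twice gives; the object names are illustrative.

```r
# all outcomes of two coin flips, each assumed to have probability 1/4
flips <- expand.grid(flip1 = c("H", "T"), flip2 = c("H", "T"))
prob  <- rep(1 / 4, nrow(flips))

A <- flips$flip1 == "H"  # head on flip 1
B <- flips$flip2 == "H"  # head on flip 2

p_A  <- sum(prob[A])
p_B  <- sum(prob[B])
p_AB <- sum(prob[A & B])  # head on both flips

# independence: P(A and B) equals P(A) * P(B)
print(c(p_AB, p_A * p_B))
```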

More about independence…

IID random variables

Random variables are said to be iid if they are independent and identically distributed:

  • Independent: statistically unrelated to one another
  • Identically distributed: all having been drawn from the same population distribution

iid random variables are the default model for random samples.

Module 4: Expected Values

The empirical average is a very intuitive idea; it’s the middle of our data in a sense. But, what is it estimating? We can formally define the middle of a population distribution. This is the expected value. Expected values are very useful for characterizing populations and usually represent the first thing that we’re interested in estimating.

The population mean: the expected value or mean of a random variable is the center of its distribution.

The idea of an expected value is similar to finding the center of mass along a weighted line. The empirical mean balances the empirical distribution in the same way.
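For a discrete random variable, the expected value is the sum of each value times its probability. The sketch below uses a fair die as an assumed example and checks the center-of-mass property: the probability-weighted deviations from the mean cancel out.

```r
# fair die (assumed example): values and their probabilities
x <- 1:6
p <- rep(1 / 6, 6)

# expected value: sum of value times probability
mu <- sum(x * p)

# center of mass: weighted deviations from mu sum to zero
balance <- sum((x - mu) * p)

print(mu)
print(balance)
```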

Facts about expected values

Summarizing this module

  1. expected values are properties of distributions
  2. the population mean is the center of mass of the population
  3. the sample mean is the center of mass of the observed data
  4. the sample mean is an estimate of the population mean
  5. the sample mean is unbiased: the population mean of its distribution is the mean that it's trying to estimate
  6. the more data that goes into the sample mean, the more concentrated its density / mass function is around the population mean
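The last two points can be illustrated with a small simulation. This is a sketch under assumed settings: iid rolls of a fair die (population mean 3.5), 10,000 simulated samples per sample size, and a fixed seed; the helper mean_of_n is hypothetical.

```r
set.seed(1)      # fixed seed so the simulation is reproducible
n_sims <- 10000  # number of simulated samples per sample size

# simulate n_sims sample means, each from n iid rolls of a fair die
mean_of_n <- function(n) {
  replicate(n_sims, mean(sample(1:6, n, replace = TRUE)))
}

means_10  <- mean_of_n(10)
means_100 <- mean_of_n(100)

# unbiasedness: both sets of sample means are centered near 3.5
print(c(mean(means_10), mean(means_100)))

# concentration: the spread of the sample mean shrinks as n grows
print(c(sd(means_10), sd(means_100)))
```

Both averages should land close to 3.5, while the standard deviation for samples of 100 rolls should be clearly smaller than for samples of 10 rolls.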