Monday, November 30, 2015

Why should you learn statistical inference?

What is statistical inference?

Coursera: "Generating conclusions about a population from noisy data"

Issues:

  • does the sample represent the population?
  • are there confounding variables?
  • is there systematic bias in the data?

Examples

  • election polls
  • weather predictions
  • A/B marketing campaign
  • drug efficacy / educational efficacy

Terms to understand

  • experiment / trial
  • population / sample

Probability demonstration - 6-sided die

P(0) =

P(1) =

P(odd) =

P(any number 1-6) =

In groups, roll the die 12 times. Record the expected and actual counts of rolls for each number 1-6.

6-sided die simulation in R

# set possible values for die roll
die <- 1:6

# take a sample of 100 die rolls
die_roll <- sample(die,100,replace=TRUE)
table(die_roll)
## die_roll
##  1  2  3  4  5  6 
## 22 15 16 20 13 14

Re-run this simulation with either a two-sided coin or 10-sided die.
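
One way to adapt the simulation, offered as a sketch: the names `coin`, `coin_flip`, `d10`, and `d10_roll`, and the seed value, are illustrative choices rather than part of the original exercise.

```r
# fix the seed so the run is reproducible (the value is arbitrary)
set.seed(42)

# two-sided coin: 100 flips
coin <- c("H", "T")
coin_flip <- sample(coin, 100, replace = TRUE)
table(coin_flip)

# 10-sided die: 100 rolls
d10 <- 1:10
d10_roll <- sample(d10, 100, replace = TRUE)
table(d10_roll)
```

With only 100 draws the counts will usually be uneven, just as in the 6-sided example.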

Probability rules

  • P(nothing) = 0
  • P(anything) = 1
  • 0 <= P(x) <= 1
  • The probability of x occurring + the probability of the opposite of x occurring = 1. [1 - P(x) = P(opposite_x)]
  • If x and y are mutually exclusive, the probability of x or y occurring is P(x) + P(y)
  • If x and y are NOT mutually exclusive, the probability of at least one occurring is P(x) + P(y) - P(x and y), where P(x and y) is the probability of their intersection.
  • If event x implies event y occurred, P(x) <= P(y)
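
These rules can be checked on a single roll of a fair six-sided die. The helper `p()` and the event names below are made up for this sketch; they assume every face is equally likely.

```r
# sample space for one roll of a fair six-sided die
die <- 1:6

# probability of an event = share of equally likely outcomes it contains
p <- function(event) length(intersect(event, die)) / length(die)

odd  <- c(1, 3, 5)
even <- c(2, 4, 6)   # the opposite of odd
low  <- c(1, 2, 3)   # overlaps with odd, so not mutually exclusive
one  <- 1            # rolling a 1 implies rolling an odd number

p(integer(0))                            # P(nothing) = 0
p(die)                                   # P(anything) = 1
p(odd) + p(even)                         # complement rule: adds to 1
p(union(odd, even))                      # mutually exclusive: p(odd) + p(even)
p(union(odd, low))                       # direct count of "odd or low"
p(odd) + p(low) - p(intersect(odd, low)) # same value from the union rule
p(one) <= p(odd)                         # TRUE
```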

Coursera example

Suppose 10% of Americans have sleep apnea, and 3% of Americans have restless leg syndrome. How many Americans have at least one of the two sleep problems?

Shiny – Venn Diagram

Union - no overlap

\[ P(A \cup B) = P(A) + P(B) \]

Union - overlap

\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Subset

\[ \mbox{If } A \subset B, \quad P(A) \le P(B) \]

Intersection

\[ P(A \cap B) = P(A) + P(B) - P(A \cup B) \]

Union example

What is the probability that you will draw a spade OR an ace from a standard deck of playing cards?
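
A sketch of one way to check the answer in R, by building the deck and counting. The data frame `deck` and the vectors `is_spade` and `is_ace` are names chosen for this illustration.

```r
# build a standard 52-card deck: 13 ranks x 4 suits
deck <- expand.grid(
  rank = c("A", 2:10, "J", "Q", "K"),
  suit = c("spade", "heart", "diamond", "club"),
  stringsAsFactors = FALSE
)

is_spade <- deck$suit == "spade"
is_ace   <- deck$rank == "A"

# direct count: share of cards that are a spade OR an ace
mean(is_spade | is_ace)

# same value from the union rule: P(A) + P(B) - P(A and B)
mean(is_spade) + mean(is_ace) - mean(is_spade & is_ace)
```

Both lines give 16/52, because the ace of spades would otherwise be counted twice.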

Coursera - Intersection example

Consider influenza epidemics for two-parent heterosexual families.

Suppose that the probability is 15% that at least one of the parents has contracted the disease. The probability that the father has influenza is 10% and the probability that the mother has influenza is 9%.

What is the probability that both parents contracted influenza?
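
The intersection formula can be applied directly to the numbers in this example. The variable names are illustrative.

```r
p_either <- 0.15  # P(at least one parent has influenza)
p_father <- 0.10  # P(father has influenza)
p_mother <- 0.09  # P(mother has influenza)

# P(A and B) = P(A) + P(B) - P(A or B)
p_both <- p_father + p_mother - p_either
p_both  # about 0.04, i.e. 4%
```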

Union / Subset / Intersection in R

Use sample() to create vectors A & B, each containing at least 20 unique numbers. Only use numbers >= 0 and <= 50.

Find the union and intersection for A & B. Hint:

?union
?intersect
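
A minimal sketch of one possible solution. The seed is arbitrary, and drawing exactly 20 values per vector is just one way to satisfy "at least 20 unique numbers".

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# sampling without replacement (the default) guarantees unique values
A <- sample(0:50, 20)
B <- sample(0:50, 20)

union(A, B)      # values that appear in A, in B, or in both
intersect(A, B)  # values that appear in both A and B

# the set sizes follow the union rule
length(union(A, B)) == length(A) + length(B) - length(intersect(A, B))
```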

Random variables

Can be discrete / categorical (binary) vs. continuous

 - coin flip (D)  
 - die roll (D)  
 - web site traffic (D)  
 - BMI (C)  
 - BMI category (D)  
 - IQ (C) 

Probability mass functions (PMF)

A PMF evaluated at a value corresponds to the probability that a discrete random variable takes that value

Rules:

  • probability >= 0
  • the probabilities of all the values the variable can take must sum to 1

Example: 6-sided die
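
As a sketch, the PMF of a fair six-sided die can be written as a small R function and checked against both rules. The function name `die_pmf` is made up for this illustration.

```r
# PMF of a fair six-sided die: 1/6 for each face, 0 for anything else
die_pmf <- function(x) ifelse(x %in% 1:6, 1 / 6, 0)

die_pmf(3)              # probability of rolling a 3
die_pmf(7)              # impossible value, so 0
all(die_pmf(1:6) >= 0)  # rule 1: every probability is >= 0
sum(die_pmf(1:6))       # rule 2: sums to 1 (up to floating-point rounding)
```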

Probability density functions (PDF)

PDF: gives the density of a continuous random variable at each value. The probability of any single exact value is 0; probabilities come from areas.

Rules:

  • must be >= 0 everywhere
  • total area below function = 1

Areas under PDFs correspond to probabilities for that random variable

PDF example - IQ bell curve

Call center example

Suppose the proportion of calls that are answered in a call center can be represented by f(x) = 2x for 0 < x < 1, and 0 otherwise. Is this a valid PDF?
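
Both PDF rules can be checked numerically. This is a sketch: the function name `f` mirrors the slide, and the grid used for the non-negativity check is an arbitrary choice.

```r
# density from the example: 2x on (0, 1), 0 elsewhere
f <- function(x) ifelse(x > 0 & x < 1, 2 * x, 0)

# rule 1: non-negative everywhere (checked on a grid of points)
all(f(seq(-1, 2, by = 0.01)) >= 0)

# rule 2: total area under the curve is 1
integrate(f, lower = 0, upper = 1)$value
```

The area can also be found by hand: a right triangle with base 1 and height 2 has area 1 * 2 / 2 = 1.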

Call center example (cont.)

What is the probability that 75% or fewer calls get addressed?

Call center example (cont.)

# find area of blue triangle
1.5 * 0.75/2
## [1] 0.5625
# find probability of this outcome
pbeta(0.75, 2, 1)
## [1] 0.5625

Cumulative distribution function (CDF)

The CDF of a random variable X returns the probability that X is <= a given value x.

F(x) = P(X <= x)

Can be applied to discrete or continuous variables

Survival function

Probability that random variable X is > the value of x.

S(x) = P(X > x)

S(x) = 1 - F(x)
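
A short sketch using the call center density, whose CDF matches the `pbeta(x, 2, 1)` calls used in this deck. The names `F_x` and `S_x` are illustrative.

```r
# CDF: probability that 75% or fewer of the calls are answered
F_x <- pbeta(0.75, 2, 1)

# survival function: probability that more than 75% are answered
S_x <- pbeta(0.75, 2, 1, lower.tail = FALSE)

F_x      # 0.5625
S_x      # 0.4375
1 - F_x  # same as S_x
```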

CDF example

Use pbeta() to calculate the probability that at most 40%, 50%, and 60% of calls are answered on a given day.

## [1] 0.16 0.25 0.36
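
The output above is what a call like the following returns. The vector name `props` is illustrative; `pbeta()` accepts a vector of values, so one call covers all three.

```r
props <- c(0.4, 0.5, 0.6)

# CDF of the call center density at each proportion
pbeta(props, shape1 = 2, shape2 = 1)

# the CDF of f(x) = 2x on (0, 1) is F(x) = x^2, so this gives the same values
props^2
```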

Quantiles

  • percentile: quantile represented as a %
  • median: 50th percentile, most common quantile

Using R to approximate quantiles:

qbeta(0.5, 2, 1)
## [1] 0.7071068

This tells us that for the right triangle with base 1 and height 2, the median is about 0.707: there is a 50% chance of answering 70.7% of the calls or fewer on a given day.
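
The quantile can be cross-checked by hand, as a sketch: the CDF of the call center density is F(x) = x^2, so the p-th quantile solves x^2 = p. The variable name `p` is illustrative.

```r
p <- 0.5

sqrt(p)         # solve x^2 = p directly
qbeta(p, 2, 1)  # same value from qbeta()

# round trip: the CDF evaluated at the median gives back 0.5
pbeta(qbeta(p, 2, 1), 2, 1)
```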

Quantile practice

Using qbeta(), determine the percent likelihood of answering calls for a given day with the parameters below:

  1. Right triangle with base 2 and height 4; percentiles = 25%, 50%, 75%.
  2. Right triangle with base 1 and height 10; percentiles = 10%, 30%, 40%
  3. Right triangle with base 5 and height 5; percentiles = 1%, 99%

Conditional probability

xkcd: lightning and statistics

Conditional probability example - shark bites

http://californiadiver.com/were-not-on-the-menu-california-shark-attacks-down-91-in-past-60-years/

"Abalone divers may be at the greatest risk, statistically. The Stanford study shows abalone diving creates the greatest exposure to shark incidents, followed by surfing, scuba diving and swimming. In 2013, the chances of a shark attack on an abalone diver were one in 1.44 million. For surfers, the chances were one in 17 million, and for scuba divers, one in 136 million. Swimmers had the lowest chance of being attacked by a shark, with one attack for every 738 million beach visits."

Conditional probability definition

P(A | B): the probability of A given that B has occurred

\[ P(A ~|~ B) = \frac{P(A \cap B)}{P(B)} \]

If A and B are independent, then:

\[ P(A ~|~ B) = \frac{P(A)P(B)}{P(B)} = P(A) \]

Conditional probability - dice example

A: roll 1
B: roll odd

What are the probabilities of rolling a 1 and of rolling an odd number?
P(A) = 1/6
P(B) = 1/2

If you know the roll was an odd number, what is the probability the roll was 1?

Every roll of 1 is also an odd roll, so \(P(A \cap B) = P(A)\):

\[ P(A ~|~ B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3} \]
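
The same result can be found by enumerating the six faces in R. A sketch, with `A` and `B` as logical vectors marking which faces belong to each event:

```r
die <- 1:6

A <- die == 1        # event A: roll a 1
B <- die %% 2 == 1   # event B: roll an odd number

# P(A | B) = P(A and B) / P(B)
mean(A & B) / mean(B)  # 1/3
```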

Bayes' rule

Useful tool for calculating conditional probabilities

\[ P(B ~|~ A) = \frac{P(A ~|~ B) P(B)}{P(A ~|~ B) P(B) + P(A ~|~ B^c)P(B^c)} \]

Bayes' rule example - fair & unfair coin

Two-Face has one unfair (heads on both sides) and one fair (heads & tails) coin in his pocket.

He takes one coin from his pocket at random, tosses it, and obtains a heads.

What is the probability that Two-Face flipped the fair coin?

Two-Face example - tree solution

Two-Face example - Bayes' formula solution

\[ P(FC ~|~ H) = \frac{P(H ~|~ FC) P(FC)}{P(H ~|~ FC) P(FC) + P(H ~|~ UFC)P(UFC)} \]
\[ = \frac{.5 \times .5}{.5 \times .5 + 1 \times .5} \]
\[ = \frac{.25}{.25 + .5} = \frac{.25}{.75} = \frac{1}{3} \]
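
The formula can be checked in R, both by direct calculation and by simulation. This is a sketch: the variable names, the seed, and the number of simulated draws are arbitrary choices.

```r
# direct calculation with Bayes' rule
p_fair     <- 0.5  # P(FC): chance of drawing the fair coin
p_unfair   <- 0.5  # P(UFC): chance of drawing the two-headed coin
p_h_fair   <- 0.5  # P(H | FC)
p_h_unfair <- 1    # P(H | UFC)

posterior <- (p_h_fair * p_fair) /
  (p_h_fair * p_fair + p_h_unfair * p_unfair)
posterior  # 1/3

# simulation: draw a coin at random, toss it, keep only the heads
set.seed(2015)
n <- 100000
fair  <- sample(c(TRUE, FALSE), n, replace = TRUE)
heads <- ifelse(fair, runif(n) < 0.5, TRUE)
mean(fair[heads])  # share of heads that came from the fair coin, near 1/3
```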

Monty Hall problem
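
The slide gives only the title, so the sketch below assumes the standard three-door version: one prize, the contestant picks a door, and the host always opens a different door with no prize behind it. The variable names and seed are illustrative.

```r
set.seed(3)  # arbitrary seed
n <- 10000

prize <- sample(1:3, n, replace = TRUE)  # door hiding the prize
pick  <- sample(1:3, n, replace = TRUE)  # contestant's first choice

# Staying wins only if the first pick was right.
# Switching wins whenever the first pick was wrong, because the host
# removes the other losing door.
stay_wins   <- pick == prize
switch_wins <- pick != prize

mean(stay_wins)    # near 1/3
mean(switch_wins)  # near 2/3
```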

Diagnostic tests & terms

  • \(+\) : positive test
  • \(-\) : negative test
  • \(D\) : subject has disease
  • \(D^c\) : subject does not have disease

  • Sensitivity: \(P(+ ~|~ D)\)
  • Specificity: \(P(- ~|~ D^c)\)

  • Positive predictive value: \(P(D ~|~ +)\)
  • Negative predictive value: \(P(D^c ~|~ -)\)
  • Prevalence: \(P(D)\)

Terms (cont.)

  • Diagnostic likelihood ratio of a positive test: \(DLR_+\)
    • \(P(+ ~|~ D) / P(+ ~|~ D^c)\)
    • \(sensitivity / (1 - specificity)\)
  • Diagnostic likelihood ratio of a negative test: \(DLR_-\)
    • \(P(- ~|~ D) / P(- ~|~ D^c)\)
    • \((1 - sensitivity) / specificity\)

Group work - Coursera example

  • A study comparing the efficacy of HIV tests reports on an experiment which concluded that HIV antibody tests have a sensitivity of 99.7% and a specificity of 98.5%
  • Suppose that a subject, from a population with a .1% prevalence of HIV, receives a positive test result. What is the probability that this subject has HIV?
  • Mathematically, we want \(P(D ~|~ +)\) given the sensitivity, \(P(+ ~|~ D) = .997\), the specificity, \(P(- ~|~ D^c) =.985\), and the prevalence \(P(D) = .001\)

Group work - (HIDE) solution using Bayes' formula

\[ P(D ~|~ +) = \frac{P(+~|~D)P(D)}{P(+~|~D)P(D) + P(+~|~D^c)P(D^c)} \]
\[ = \frac{P(+~|~D)P(D)}{P(+~|~D)P(D) + \{1-P(-~|~D^c)\}\{1 - P(D)\}} \]
\[ = \frac{.997\times .001}{.997 \times .001 + .015 \times .999} = .062 \]
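
The same calculation as a small R function, offered as a sketch. The function name `ppv` and its argument names are made up for this illustration.

```r
# positive predictive value P(D | +) from sensitivity, specificity, prevalence
ppv <- function(sens, spec, prev) {
  true_pos  <- sens * prev              # P(+ | D) P(D)
  false_pos <- (1 - spec) * (1 - prev)  # P(+ | D^c) P(D^c)
  true_pos / (true_pos + false_pos)
}

ppv(sens = 0.997, spec = 0.985, prev = 0.001)  # about 0.062

# diagnostic likelihood ratio of a positive test for the same numbers
dlr_pos <- 0.997 / (1 - 0.985)
dlr_pos  # about 66
```

Even with a very accurate test, the low prevalence keeps the probability of disease after one positive result at roughly 6%.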

Independence

  • Two events \(A\) and \(B\) are independent if \[P(A \cap B) = P(A)P(B)\]
  • Two random variables, \(X\) and \(Y\) are independent if for any two sets \(A\) and \(B\) \[P([X \in A] \cap [Y \in B]) = P(X\in A)P(Y\in B)\]
  • If \(A\) is independent of \(B\), then:

\(A^c\) is independent of \(B\)
\(A\) is independent of \(B^c\)
\(A^c\) is independent of \(B^c\)

Example

What is the probability of getting two consecutive heads?

  • \(A = \{\mbox{Head on flip 1}\}\), \(P(A) = .5\)
  • \(B = \{\mbox{Head on flip 2}\}\), \(P(B) = .5\)
  • \(A \cap B = \{\mbox{Head on flips 1 and 2}\}\)
  • \(P(A \cap B) = P(A)P(B) = .5 \times .5 = .25\)

Practice - on your own

Steph Curry is currently shooting about 94% from the free throw line.

Assuming his free throw attempts are independent events, what is the probability of Steph making 10 free throws in a row?

  • Good: calculate probability using calculator
  • Better: calculate probability using R
  • Best: write function with two arguments, "ft_per" & "attempts", that calculates probability
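
A minimal sketch of the "Best" option. The function name `ft_streak_prob` is made up; the argument names `ft_per` and `attempts` follow the exercise, and independence between attempts is assumed as stated.

```r
# probability of making `attempts` free throws in a row,
# assuming each attempt is independent with success rate `ft_per`
ft_streak_prob <- function(ft_per, attempts) {
  stopifnot(ft_per >= 0, ft_per <= 1, attempts >= 0)
  ft_per^attempts
}

ft_streak_prob(ft_per = 0.94, attempts = 10)  # about 0.54
```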

Stats gone bad

  • Volume 309 of Science reports on a physician who was on trial for expert testimony in a criminal trial.
  • Based on an estimated prevalence of sudden infant death syndrome (SIDS) of \(1\) out of \(8,543\), Dr. Meadow testified that the probability of a mother having two children with SIDS was \(\left(\frac{1}{8,543}\right)^2\).
  • The mother on trial was convicted of murder.

Stats gone bad (cont.)

  • From a stats perspective, the principal mistake was to assume that the probabilities of SIDS within a family are independent.
  • That is, \(P(A_1 \cap A_2)\) is not necessarily equal to \(P(A_1)P(A_2)\).
  • Biological processes that are believed to have a genetic or familial environmental component, of course, tend to be dependent within families.

Groups

Come up with two examples of:

  • data that meet rules for independence
  • data that do not meet rules for independence

IID random variables

  • Random variables are said to be "iid" if they are independent and identically distributed.
    • Independent: statistically unrelated to one another
    • Identically distributed: all having been drawn from the same population distribution.
  • iid random variables are the default model for random samples.
  • Many of the important theories of statistics are founded on assumptions stating that variables are iid.
  • Assuming a random sample and iid will be our default starting point of inference.