Uncertainty & Probability

M. Drew LaMar
September 4, 2020

“I believe that we do not know anything for certain, but everything probably.”

- Christiaan Huygens

Course Announcements

  • Reading Assignment for Monday - W&S, Chapter 5 (QUIZ)

Language: Sampling Distributions

Definition: The sampling distribution is the probability distribution of all values for an estimate that we might obtain when we sample a population.

Definition: The standard error of an estimate is the standard deviation of the estimate’s sampling distribution.

Definition: The standard error of the mean is given by
\[ \sigma_{\overline{Y}} = \frac{\sigma}{\sqrt{n}} \]
Because the population standard deviation is rarely known, it is replaced by the sample standard deviation, giving the estimated standard error of the mean
\[ \mathrm{SE}_{\overline{Y}} = \frac{s}{\sqrt{n}} \]
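The formula for the standard error of the mean can be checked by simulation. The sketch below is only an illustration: the normal population (mean 10, standard deviation 2), the sample size of 25, and all variable names are assumptions, not part of the course materials.

```r
# Simulation sketch: population parameters and sample size are assumed values
set.seed(1)
sigma <- 2       # assumed population standard deviation
n <- 25          # assumed sample size
nReps <- 10000   # number of repeated samples

# Draw many samples of size n and record the mean of each one
sampleMeans <- replicate(nReps, mean(rnorm(n, mean = 10, sd = sigma)))

# The standard deviation of the sample means approximates the standard error
sdOfMeans <- sd(sampleMeans)
theoreticalSE <- sigma / sqrt(n)

# With a single sample, the standard error is estimated using s in place of sigma
oneSample <- rnorm(n, mean = 10, sd = sigma)
estimatedSE <- sd(oneSample) / sqrt(n)

print(c(simulated = sdOfMeans, theoretical = theoreticalSE, estimated = estimatedSE))
```

The simulated and theoretical values should agree closely, while the estimate from one sample varies from sample to sample.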

Sampling distributions tutorial

Chalk talk - Sampling distributions and 95% confidence intervals

Language: Confidence Intervals

Definition: A confidence interval is a range of values surrounding the sample estimate that is likely to contain the population parameter.

Definition: A 95% confidence interval provides a most-plausible range for a parameter. Values lying within the interval are most plausible, whereas those outside are less plausible, based on the data.
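One way to see what a 95% confidence interval for a mean looks like in practice is to compute it for a small sample. The sketch below uses invented measurements (hypothetical, for illustration only) and compares the rough "2 SE" rule of thumb with the t-based interval that `t.test` reports.

```r
# Hypothetical measurements, invented for illustration only
y <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5, 4.9, 5.1)

n <- length(y)
yBar <- mean(y)
se <- sd(y) / sqrt(n)

# Rough rule of thumb: mean plus or minus 2 standard errors
approxCI <- c(lower = yBar - 2 * se, upper = yBar + 2 * se)

# t-based 95% interval, using the critical value with n - 1 degrees of freedom
tCrit <- qt(0.975, df = n - 1)
tCI <- c(lower = yBar - tCrit * se, upper = yBar + tCrit * se)

print(rbind(approxCI, tCI))
```

For a sample this small the t-based interval is somewhat wider than the 2 SE approximation, because the critical value exceeds 2.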

Confidence intervals tutorial

Error bars

How to do these in R?

Read and inspect the data.

locustData <- read.csv("../../Datasets/chapter02/chap02f1_2locustSerotonin.csv")
head(locustData)
  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0
str(locustData)
'data.frame':   30 obs. of  2 variables:
 $ serotoninLevel: num  5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
 $ treatmentTime : int  0 0 0 0 0 0 0 0 0 0 ...

Error bars

First, calculate the statistics by group needed for the error bars: the mean and standard error. Here, tapply is used to obtain each quantity by treatment group.

meanSerotonin <- tapply(locustData$serotoninLevel, 
                        locustData$treatmentTime, 
                        mean)
sdSerotonin <- tapply(locustData$serotoninLevel, 
                      locustData$treatmentTime, 
                      sd)
nSerotonin <- tapply(locustData$serotoninLevel, 
                     locustData$treatmentTime, 
                     length)
seSerotonin <- sdSerotonin / sqrt(nSerotonin)

Error bars

Draw the strip chart and then add the error bars.

\[ \bar{Y} \pm SE_{\bar{Y}} \]

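offsetAmount <- 0.2   # horizontal shift so the error bars sit beside the points

# Strip chart of the raw data, one column of jittered points per treatment group
stripchart(serotoninLevel ~ treatmentTime, 
           data = locustData, 
           method = "jitter", 
           vertical = TRUE)

# Vertical error bars from mean - SE to mean + SE for each of the three groups
segments(1:3 + offsetAmount, 
         meanSerotonin - seSerotonin, 
         1:3 + offsetAmount, 
         meanSerotonin + seSerotonin)

# Group means as filled circles on top of the error bars
points(1:3 + offsetAmount, 
       meanSerotonin, 
       pch = 16, 
       cex = 1.2)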

Error bars

[Figure: strip chart of serotonin level by treatment time, with each group's mean and standard error bars drawn beside the points]

Error bars can mean different things!!!

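[Figure: plot comparing different kinds of error bars]

Different error bars!!!

  • Standard deviation: \[ \bar{Y} \pm \mathrm{sd} \]
  • Standard error: \[ \bar{Y} \pm SE_{\bar{Y}} \]
  • Approximate 95% confidence interval: \[ \bar{Y} \pm 2\times SE_{\bar{Y}} \]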
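To see how much the choice of error bar matters, the three versions can be computed side by side for one group. This is a sketch on simulated data (an assumed normal sample of size 10, not the locust data); the names `errorBars` and `barWidth` are illustrative.

```r
# Hypothetical sample, simulated for illustration (not the locust data)
set.seed(42)
y <- rnorm(10, mean = 8, sd = 3)

yBar <- mean(y)
s <- sd(y)
se <- s / sqrt(length(y))

# Lower and upper ends of each kind of error bar
errorBars <- rbind(
  "mean +/- sd"   = c(lower = yBar - s,      upper = yBar + s),
  "mean +/- SE"   = c(lower = yBar - se,     upper = yBar + se),
  "mean +/- 2 SE" = c(lower = yBar - 2 * se, upper = yBar + 2 * se)
)

# Total length of each bar
barWidth <- errorBars[, "upper"] - errorBars[, "lower"]

print(errorBars)
print(barWidth)
```

With 10 observations the standard deviation bar is the longest and the single standard error bar the shortest, so a figure legend should always state which one is shown.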

Probability Basics

Definition: A random trial is a process or experiment that has two or more possible outcomes whose occurrence cannot be predicted with certainty.

Definition: An event is any potential subset of all the possible outcomes of a random trial.

Definition: The probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions. Probability ranges between zero and one.
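The "repeat the trial over and over" idea can be sketched with a simulation. Here the random trial is the roll of a fair six-sided die and the event is "the roll is even"; both choices, and the number of trials, are assumptions made for illustration.

```r
# Random trial: roll a fair six-sided die (assumed example)
set.seed(2020)
nTrials <- 100000
rolls <- sample(1:6, size = nTrials, replace = TRUE)

# Event: the roll is even
isEven <- rolls %% 2 == 0

# Proportion of trials in which the event occurred
propEven <- mean(isEven)

# Running proportion after each trial, to see it settle down
runningProp <- cumsum(isEven) / seq_len(nTrials)

print(propEven)
print(runningProp[c(10, 100, 1000, 10000, 100000)])
```

As the number of trials grows, the running proportion should settle near the true probability of 0.5.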

Random sampling as a random trial

Instead of events, we have values of random variables.

Parasitic wasps (yuck!): Two categorical variables - Parasitized or not; sex of laid egg (M or F)

The Formulas and Venn Diagrams

Definition: General addition rule \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} - \mathrm{Pr[A \ and \ B]} \]
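The general addition rule can be verified by enumerating the outcomes of a simple random trial. The sketch below assumes a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3"; these events are chosen only for illustration.

```r
# Equally likely outcomes of one roll of a fair die (assumed example)
outcomes <- 1:6

A <- outcomes %% 2 == 0   # event A: roll is even      -> {2, 4, 6}
B <- outcomes > 3         # event B: roll is above 3   -> {4, 5, 6}

prA <- mean(A)
prB <- mean(B)
prAandB <- mean(A & B)    # {4, 6}

# General addition rule
prAorB_rule <- prA + prB - prAandB

# Direct count of outcomes in A or B: {2, 4, 5, 6}
prAorB_direct <- mean(A | B)

print(c(rule = prAorB_rule, direct = prAorB_direct))
```

Subtracting Pr[A and B] prevents the outcomes 4 and 6 from being counted twice.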

Conditional Probabilities

Definition: The conditional probability of an event is the probability of that event occurring given that another event has already occurred.

Definition: The conditional probability of an event B given that A occurred is \[ \mathrm{Pr[B \ | \ A]} = \frac{\mathrm{Pr[A \ and \ B]}}{\mathrm{Pr[A]}} \]

Definition: General multiplication rule \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]}\times\mathrm{Pr[B \ | \ A]} \]
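A conditional probability can be read off a table of joint probabilities. The sketch below borrows the wasp setting, with A = "host is already parasitized" and B = "egg laid is male", but the joint probabilities are hypothetical numbers chosen for illustration, not data.

```r
# Hypothetical joint probabilities (assumed for illustration, not real data)
joint <- matrix(c(0.18, 0.02,
                  0.04, 0.76),
                nrow = 2, byrow = TRUE,
                dimnames = list(host = c("parasitized", "notParasitized"),
                                sex  = c("male", "female")))

# Pr[A]: host is parasitized (sum across the row)
prA <- sum(joint["parasitized", ])

# Pr[A and B]: host is parasitized and egg is male
prAandB <- joint["parasitized", "male"]

# Conditional probability Pr[B | A]
prBgivenA <- prAandB / prA

# General multiplication rule recovers the joint probability
prAandB_rule <- prA * prBgivenA

print(c(prA = prA, prBgivenA = prBgivenA, prAandB = prAandB_rule))
```

Conditioning on A restricts attention to the first row of the table and rescales it to sum to one.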

Bayes Rule

Definition: The conditional probability of an event B given that A occurred is \[ \mathrm{Pr[B \ | \ A]} = \frac{\mathrm{Pr[A \ and \ B]}}{\mathrm{Pr[A]}} \]

Definition: General multiplication rule \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[B \ | \ A]}\times\mathrm{Pr[A]} \]

Definition: General multiplication rule, with the roles of A and B swapped \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A \ | \ B]}\times\mathrm{Pr[B]} \]

Definition: Bayes Rule \[ \mathrm{Pr[B \ | \ A]} = \frac{\mathrm{Pr[A \ | \ B]}\times \mathrm{Pr[B]}}{\mathrm{Pr[A]}} \]
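A short numerical sketch of Bayes rule, again in the wasp setting. Here A = "egg laid is male" and B = "host is already parasitized". All three input probabilities are assumed values for illustration, and Pr[A] is obtained by summing over the two cases B and not B.

```r
# Assumed inputs (hypothetical values for illustration)
prB <- 0.20            # Pr[B]: host is already parasitized
prAgivenB <- 0.90      # Pr[A | B]: male egg given a parasitized host
prAgivenNotB <- 0.05   # Pr[A | not B]: male egg given an unparasitized host

# Pr[A]: add up the two ways a male egg can occur
prA <- prAgivenB * prB + prAgivenNotB * (1 - prB)

# Bayes rule: Pr[B | A] = Pr[A | B] * Pr[B] / Pr[A]
prBgivenA <- prAgivenB * prB / prA

print(c(prA = prA, prBgivenA = prBgivenA))
```

Under these assumed numbers, seeing a male egg raises the probability that the host was parasitized from 0.20 to roughly 0.82.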

Mutually exclusive vs. independent

Commonly confused!

Definition: Two events are mutually exclusive if they cannot both occur at the same time. \[ \mathrm{Pr[A \ and \ B]} = 0 \]

Definition: Two events are independent if the occurrence of one does not inform us about the probability that the second will occur. \[ \mathrm{Pr[B \ | \ A]} = \mathrm{Pr[B]} \]

Mutually exclusive vs. independent

These two conditions simplify the general addition and multiplication rules:

If two events are mutually exclusive, then \[ \mathrm{Pr[A \ or \ B]} = \mathrm{Pr[A]} + \mathrm{Pr[B]} \]

If two events are independent, then \[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]} \times \mathrm{Pr[B]} \]
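The difference between the two conditions shows up clearly on a small example. The sketch below assumes a fair six-sided die; the events and the helper functions `isMutuallyExclusive` and `isIndependent` are hypothetical names introduced for illustration.

```r
# Equally likely outcomes of one roll of a fair die (assumed example)
outcomes <- 1:6

A <- outcomes %% 2 == 0   # roll is even    -> {2, 4, 6}
B <- outcomes <= 4        # roll is at most 4 -> {1, 2, 3, 4}
C <- outcomes %% 2 == 1   # roll is odd     -> {1, 3, 5}

# Mutually exclusive: Pr[X and Y] = 0
isMutuallyExclusive <- function(x, y) mean(x & y) == 0

# Independent: Pr[X and Y] = Pr[X] * Pr[Y]
isIndependent <- function(x, y) {
  isTRUE(all.equal(mean(x & y), mean(x) * mean(y)))
}

print(c(AB_exclusive = isMutuallyExclusive(A, B),
        AB_independent = isIndependent(A, B),
        AC_exclusive = isMutuallyExclusive(A, C),
        AC_independent = isIndependent(A, C)))
```

A and B are independent but not mutually exclusive, while A and C are mutually exclusive and therefore strongly dependent: knowing that A occurred tells us C did not.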

Visualizing dependency

Independent events

Dependent events

Mosaic plots are awesome!
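A mosaic plot can be drawn in base R with `mosaicplot`. The sketch below uses a hypothetical table of counts for the wasp setting (invented for illustration) and compares it with the counts expected if host status and egg sex were independent.

```r
# Hypothetical counts (invented for illustration, not real data)
counts <- matrix(c(18,  2,
                    4, 76),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(host   = c("parasitized", "not parasitized"),
                                 eggSex = c("male", "female")))

# Proportion of each egg sex within each host type (rows sum to 1)
rowProps <- prop.table(counts, margin = 1)

# Counts expected under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Side-by-side mosaic plots: observed (dependent) vs. expected (independent)
oldPar <- par(mfrow = c(1, 2))
mosaicplot(counts, main = "Observed", color = c("grey30", "grey80"))
mosaicplot(expected, main = "Expected if independent", color = c("grey30", "grey80"))
par(oldPar)

print(rowProps)
```

When the variables are independent the tiles line up across groups; when they are dependent the split differs from one group to the next.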