Estimating with Uncertainty

September 18, 2023

Quote of the day

“I believe that we do not know anything for certain, but everything probably.”

- Christiaan Huygens

Precision vs Accuracy

Random sampling

The main assumption of all statistical techniques is that your data come from a random sample.

Definition: In a random sample, each member of a population has an equal and independent chance of being selected.

Random sampling

minimizes bias (because every member has an equal chance of selection) and
makes it possible to measure the amount of sampling error, that is, to quantify precision (because selections are independent)
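As a minimal sketch of drawing a random sample in R: the population of 5000 ID numbers, the seed, and the variable names below are assumptions made up for illustration, not part of the course data.

```r
# Hypothetical population: individuals labelled 1..5000 (illustrative assumption)
set.seed(1)                      # arbitrary seed so the draw is reproducible
populationIDs <- 1:5000
n <- 10

# sample() without replacement: every possible set of 10 IDs is equally likely,
# so no individual is favoured over any other
randomSample <- sample(populationIDs, size = n, replace = FALSE)
randomSample
```

Leaving out the `prob` argument is what keeps the chances equal; supplying unequal weights there would produce a biased sample.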

Random sampling

Suppose we have 1000 households with 5 members per household. We measure two variables from each person (e.g. height and weight). Presumably, those two variables will be similar for members of the same household (i.e. members from the same household are dependent samples).
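The household example can be sketched as a small simulation. The height model (overall mean of 170, a shared household effect, individual noise) and all numeric values are assumptions chosen for illustration only.

```r
set.seed(42)                     # arbitrary seed for reproducibility
nHouseholds <- 1000
membersPerHousehold <- 5

# Assumed model: height = 170 + shared household effect + individual noise
householdEffect <- rnorm(nHouseholds, mean = 0, sd = 8)
population <- data.frame(
  household = rep(1:nHouseholds, each = membersPerHousehold)
)
population$height <- 170 + householdEffect[population$household] +
  rnorm(nrow(population), mean = 0, sd = 4)

# Random sample: 10 people drawn from all 5000 individuals
randomSample <- population[sample(nrow(population), 10), ]

# Pseudoreplicated sample: every member of 2 randomly chosen households
chosenHouseholds <- sample(nHouseholds, 2)
clusteredSample <- population[population$household %in% chosenHouseholds, ]

# Number of distinct households represented in each sample of 10 people
length(unique(randomSample$household))
length(unique(clusteredSample$household))
```

Both samples contain 10 people, but the clustered one carries information from only 2 households, so its 10 measurements are not 10 independent observations.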

Random sampling

Unbiased sample

Unbiased sample (n=10)

Pseudoreplicated sample

Pseudoreplicated sample (n=10): Lack of independence

Biased sample

Biased sample (increased chance of selection for larger x values)

100 samples of size 10

TL;DR #1: Pseudoreplication (lack of independence) affects precision
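A rough way to see this effect is to repeat the sampling 100 times and compare the spread of the sample means. The simulated household population below is a hypothetical stand-in (the model and its parameters are assumptions), not the data behind the slide figures.

```r
set.seed(2023)                   # arbitrary seed for reproducibility
nHouseholds <- 1000
membersPerHousehold <- 5
household <- rep(1:nHouseholds, each = membersPerHousehold)

# Assumed model: shared household effect plus individual variation
x <- rnorm(nHouseholds, mean = 0, sd = 8)[household] +
  rnorm(length(household), mean = 170, sd = 4)

# Mean of one random sample of 10 individuals
randomMean <- function() mean(x[sample(length(x), 10)])

# Mean of one pseudoreplicated sample: all 5 members of 2 random households
clusteredMean <- function() mean(x[household %in% sample(nHouseholds, 2)])

randomMeans <- replicate(100, randomMean())
clusteredMeans <- replicate(100, clusteredMean())

# Spread of the 100 estimates: a larger spread means lower precision
sd(randomMeans)
sd(clusteredMeans)
```

Under these assumptions the clustered means should vary noticeably more from sample to sample, even though each sample still has n = 10.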

100 samples of size 10

TL;DR #2: Bias (lack of equality) affects accuracy
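The same kind of repeated-sampling sketch can illustrate bias. The population values and the selection weights (which grow exponentially with x) are arbitrary assumptions chosen to make the effect visible.

```r
set.seed(2023)                   # arbitrary seed for reproducibility
x <- rnorm(5000, mean = 50, sd = 10)   # hypothetical population values

# Unbiased: every individual has the same chance of selection
unbiasedMeans <- replicate(100, mean(sample(x, 10)))

# Biased: chance of selection increases with x (assumed weighting scheme)
selectionWeight <- exp((x - mean(x)) / sd(x))
biasedMeans <- replicate(100, mean(sample(x, 10, prob = selectionWeight)))

# Centre of the estimates compared with the true population mean
mean(x)
mean(unbiasedMeans)
mean(biasedMeans)
```

The unbiased means should scatter around the population mean, while the biased means are shifted upward as a group: the estimates are systematically off target, which is a loss of accuracy rather than precision.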

Language: Sampling Distributions

Definition: The sampling distribution is the probability distribution of all values for an estimate that we might obtain when we sample a population.

Definition: The standard error of an estimate is the standard deviation of the estimate’s sampling distribution.
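These two definitions can be approximated by brute force: draw many samples, compute the estimate for each, and look at the resulting distribution. The skewed population below and the number of repetitions are assumptions for illustration.

```r
set.seed(7)                      # arbitrary seed for reproducibility
population <- rexp(10000, rate = 1 / 5)   # hypothetical skewed population
n <- 10

# Approximate the sampling distribution of the mean with 5000 repeated samples
sampleMeans <- replicate(5000, mean(sample(population, n)))

hist(sampleMeans,
     main = "Approximate sampling distribution of the mean (n = 10)",
     xlab = "Sample mean")

# The standard error is the standard deviation of the sampling distribution
sd(sampleMeans)

# For comparison: population standard deviation divided by sqrt(n)
sd(population) / sqrt(n)
```

The two printed values should be close, although the simulated one carries some Monte Carlo noise.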

Language: Sampling Distributions

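Definition: The standard error of the mean is given by \[ \sigma_{\overline{Y}} = \frac{\sigma}{\sqrt{n}} \] where σ is the population standard deviation and n is the sample size. Because σ is rarely known, the standard error is estimated from the sample by \[ \mathrm{SE}_{\overline{Y}} = \frac{s}{\sqrt{n}} \] where s is the sample standard deviation.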
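A short derivation shows where the square root of n comes from. It assumes the n observations Y_1, ..., Y_n are independent draws with a common variance σ², which is where the independence part of random sampling is used:

\[ \mathrm{Var}(\overline{Y}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{\sigma^2}{n}, \qquad \sigma_{\overline{Y}} = \sqrt{\mathrm{Var}(\overline{Y})} = \frac{\sigma}{\sqrt{n}} \]

The middle step (variance of a sum equals the sum of the variances) fails when observations are dependent, which is why pseudoreplicated samples do not follow this formula.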

Sampling distributions tutorial

“Chalk” talk - Sampling distributions and 95% confidence intervals

Language: Confidence Intervals

Definition: A confidence interval is a range of values surrounding the sample estimate that is likely to contain the population parameter.

Definition: A 95% confidence interval provides a most-plausible range for a parameter. Values lying within the interval are most plausible, whereas those outside are less plausible, based on the data.
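One way to make "likely to contain the population parameter" concrete is to simulate many samples from a population with a known mean and count how often the interval captures it. This sketch assumes a normal population with made-up parameters and uses the t-based interval for a mean, which these slides do not derive.

```r
set.seed(11)                     # arbitrary seed for reproducibility
mu <- 20                         # hypothetical population mean
sigma <- 5                       # hypothetical population standard deviation
n <- 10

coversMu <- replicate(2000, {
  y <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(y) / sqrt(n)                 # estimated standard error of the mean
  tCrit <- qt(0.975, df = n - 1)        # critical value for a 95% interval
  lower <- mean(y) - tCrit * se
  upper <- mean(y) + tCrit * se
  lower <= mu && mu <= upper            # TRUE if this interval contains mu
})

# Proportion of the 2000 intervals that contain the true mean
mean(coversMu)
```

Under these assumptions the proportion should land near 0.95. Each individual interval either contains the true mean or it does not; the 95% describes the long-run behaviour of the procedure.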

Confidence intervals tutorial

Error bars

How to do these in R?

Read and inspect the data.

locustData <- read.csv(here::here("Datasets/chapter02/chap02f1_2locustSerotonin.csv"))
head(locustData)

  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0

str(locustData)

'data.frame':   30 obs. of  2 variables:
 $ serotoninLevel: num  5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
 $ treatmentTime : int  0 0 0 0 0 0 0 0 0 0 ...

Error bars

First, calculate the statistics needed for the error bars, the mean and standard error, for each group. Here, summarize and group_by from the dplyr package are used to obtain each quantity by treatment group.

library(dplyr)  # provides group_by(), summarize(), and n()

# The outer parentheses print the result as well as assigning it
(locustStats <- summarize(group_by(locustData, treatmentTime), 
                         mean = mean(serotoninLevel), 
                         sd = sd(serotoninLevel), 
                         n = n(), 
                         se = sd/sqrt(n)))

# A tibble: 3 × 5
  treatmentTime  mean    sd     n    se
          <int> <dbl> <dbl> <int> <dbl>
1             0  6.36  4.82    10  1.52
2             1  8.04  4.96    10  1.57
3             2 10.8   5.33    10  1.68
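As a check on the first row of the table (treatmentTime = 0), the standard error can be reproduced by hand from the rounded values shown:

\[ \mathrm{SE}_{\overline{Y}} = \frac{s}{\sqrt{n}} = \frac{4.82}{\sqrt{10}} \approx \frac{4.82}{3.162} \approx 1.52 \]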

Error bars

Draw the strip chart and then add the error bars.

\[ \overline{Y} \pm \mathrm{SE}_{\overline{Y}} \]

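offsetAmount <- 0.2  # shift the error bars right so they do not cover the data

# Strip chart of the raw data, one column per treatment group
stripchart(serotoninLevel ~ treatmentTime, 
           data = locustData, 
           method = "jitter", 
           vertical = TRUE)

# Vertical segments from mean - SE to mean + SE for each group
segments(1:3 + offsetAmount, 
         locustStats$mean - locustStats$se,
         1:3 + offsetAmount, 
         locustStats$mean + locustStats$se)

# Group means as filled circles at the same offset positions
points(1:3 + offsetAmount, 
       locustStats$mean, 
       pch = 16, 
       cex = 1.2)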

Error bars can mean different things!!!

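Different error bars!!! \[ \begin{aligned} &\overline{Y} \pm s && \text{(standard deviation)} \\ &\overline{Y} \pm \mathrm{SE}_{\overline{Y}} && \text{(standard error)} \\ &\overline{Y} \pm 2 \times \mathrm{SE}_{\overline{Y}} && \text{(approximate 95\% confidence interval)} \end{aligned} \]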
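To see how much the choice matters, here is a small sketch that computes the half-width of each kind of error bar for the locust data. The group summaries are copied by hand from the rounded locustStats table shown earlier, so the results are approximate; the halfWidths name is just for illustration.

```r
# Group summaries copied from the rounded locustStats output
locustStats <- data.frame(
  treatmentTime = c(0, 1, 2),
  mean = c(6.36, 8.04, 10.8),
  sd   = c(4.82, 4.96, 5.33),
  n    = c(10, 10, 10)
)
locustStats$se <- locustStats$sd / sqrt(locustStats$n)

# Half-width of each kind of error bar, by treatment group
halfWidths <- data.frame(
  treatmentTime = locustStats$treatmentTime,
  sd    = locustStats$sd,        # mean +/- standard deviation
  se    = locustStats$se,        # mean +/- standard error
  twoSE = 2 * locustStats$se     # mean +/- 2 SE (approximate 95% CI)
)
halfWidths
```

With n = 10 per group, the standard deviation bars are about three times as long as the standard error bars, so a figure should always state which quantity its error bars show.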