Estimating with Uncertainty

M. Drew LaMar
September 15, 2021

“I believe that we do not know anything for certain, but everything probably.”

- Christiaan Huygens

Course Announcements

  • Reading Assignment for Friday - W&S, Chapter 5 (QUIZ)

Moving on...

Make sure you read the book for the following topics:

  • How to compute a mean and standard deviation from a frequency table

Question: Why is this important to know?

  • Rounding rules for displaying tables and statistics
  • Effect of changing measurement scale
  • Cumulative frequency distributions (we will cover this later as well)

My point here is that you are responsible for all book material, even if we don't cover it in lecture!
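As an illustration of the first topic above, here is a minimal sketch of computing a mean and standard deviation from a frequency table. The values and counts are made up for illustration (they are not from W&S), and the variable names are hypothetical.

```r
# Hypothetical frequency table: each value and how many times it was observed
values <- c(0, 1, 2, 3, 4)
freq   <- c(3, 7, 5, 4, 1)

n <- sum(freq)                                      # total sample size
meanY <- sum(freq * values) / n                     # frequency-weighted mean
varY  <- sum(freq * (values - meanY)^2) / (n - 1)   # sample variance
sdY   <- sqrt(varY)                                 # sample standard deviation

# Cross-check by expanding the table back into raw observations
rawData <- rep(values, times = freq)
all.equal(meanY, mean(rawData))
all.equal(sdY, sd(rawData))
```

Both checks should return TRUE, since the weighted formulas are just the usual formulas with repeated values grouped together.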

Describing data in R

Measure                 R command
\( \overline{Y} \)      mean
\( s^2 \)               var
\( s \)                 sd
\( IQR \)               IQR\( ^* \)
Multiple                summary

\( ^* \) Note that R offers several algorithms for the IQR, because there are several ways to calculate quantiles (for curious souls, see ?quantile). To match the algorithm in W&S, use IQR(___, type=5). The default in R is type=7. For the HW, either version is acceptable.
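To see that the choice of algorithm can matter, here is a small sketch comparing the two types on a made-up vector (the data and variable names are hypothetical, chosen only for illustration).

```r
# Hypothetical data, not from W&S
x <- c(1, 2, 3, 3, 5, 8, 11, 17, 20, 62)

iqrDefault <- IQR(x)            # R's default quantile algorithm (type = 7)
iqrType5   <- IQR(x, type = 5)  # the type suggested above for matching W&S

# IQR is the difference between the 75% and 25% quantiles of the same type
quartiles5 <- quantile(x, c(0.25, 0.75), type = 5, names = FALSE)

print(c(default = iqrDefault, type5 = iqrType5))
print(diff(quartiles5))
```

For small samples like this one the two answers can differ; for large samples the difference is usually negligible.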

Describing data in R

summary(mydata)
    breadth     
 Min.   : 1.00  
 1st Qu.: 3.00  
 Median : 8.00  
 Mean   :11.88  
 3rd Qu.:17.00  
 Max.   :62.00  

From this output, the IQR is the third quartile minus the first quartile: \( 17-3 = 14 \).

Precision vs Accuracy

Language: Sampling Distributions

Definition: The sampling distribution is the probability distribution of all values for an estimate that we might obtain when we sample a population.

Definition: The standard error of an estimate is the standard deviation of the estimate’s sampling distribution.

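Definition: The standard error of the mean is given by
\[ \sigma_{\overline{Y}} = \frac{\sigma}{\sqrt{n}} \]
where \( \sigma \) is the population standard deviation and \( n \) is the sample size. Since \( \sigma \) is rarely known, the standard error of the mean is approximated using the sample standard deviation \( s \):
\[ \mathrm{SE}_{\overline{Y}} = \frac{s}{\sqrt{n}} \]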
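The definitions above can be checked by simulation. The following is a minimal sketch, assuming a normal population with made-up parameters (mu, sigma, and n are hypothetical choices): it draws many samples, computes the mean of each, and compares the standard deviation of those means with \( \sigma / \sqrt{n} \).

```r
set.seed(1)   # for reproducibility

# Hypothetical population parameters and sample size
mu    <- 10
sigma <- 2
n     <- 25

# Approximate the sampling distribution of the mean with 10,000 samples
sampleMeans <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

empiricalSE   <- sd(sampleMeans)    # sd of the sampling distribution
theoreticalSE <- sigma / sqrt(n)    # sigma / sqrt(n)

# In practice we have only one sample, so we estimate the SE from it
oneSample   <- rnorm(n, mean = mu, sd = sigma)
estimatedSE <- sd(oneSample) / sqrt(n)   # s / sqrt(n)

print(c(empirical = empiricalSE,
        theoretical = theoreticalSE,
        estimated = estimatedSE))
```

The empirical and theoretical values should agree closely, while the single-sample estimate will vary from sample to sample.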

Sampling distributions tutorial

"Chalk" talk - Sampling distributions and 95% confidence intervals

Language: Confidence Intervals

Definition: A confidence interval is a range of values surrounding the sample estimate that is likely to contain the population parameter.

Definition: A 95% confidence interval provides a most-plausible range for a parameter. Values lying within the interval are most plausible, whereas those outside are less plausible, based on the data.
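One way to get a feel for what "95%" means is to simulate it. This sketch assumes a normal population with made-up parameters and uses the rough interval \( \overline{Y} \pm 2\,\mathrm{SE}_{\overline{Y}} \) as an approximate 95% confidence interval. All names and values are hypothetical.

```r
set.seed(2021)   # for reproducibility

# Hypothetical population parameters and sample size
mu    <- 10
sigma <- 2
n     <- 25

# For each simulated sample, build mean +/- 2 SE and check whether
# the interval contains the true population mean
covers <- replicate(5000, {
  y  <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(y) / sqrt(n)
  lower <- mean(y) - 2 * se
  upper <- mean(y) + 2 * se
  lower <= mu && mu <= upper
})

coverage <- mean(covers)   # fraction of intervals that captured mu
print(coverage)
```

The fraction of intervals that contain the population mean should come out close to 0.95. Any single interval either contains the parameter or it does not.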

Confidence intervals tutorial

Error bars

How to do these in R?

Read and inspect the data.

locustData <- read.csv("../../Datasets/chapter02/chap02f1_2locustSerotonin.csv")
head(locustData)
  serotoninLevel treatmentTime
1            5.3             0
2            4.6             0
3            4.5             0
4            4.3             0
5            4.2             0
6            3.6             0
str(locustData)
'data.frame':   30 obs. of  2 variables:
 $ serotoninLevel: num  5.3 4.6 4.5 4.3 4.2 3.6 3.7 3.3 12.1 18 ...
 $ treatmentTime : int  0 0 0 0 0 0 0 0 0 0 ...

Error bars

First, calculate the statistics by group needed for the error bars: the mean and standard error. Here, tapply is used to obtain each quantity by treatment group.

meanSerotonin <- tapply(locustData$serotoninLevel, 
                        locustData$treatmentTime, 
                        mean)
sdSerotonin <- tapply(locustData$serotoninLevel, 
                      locustData$treatmentTime, 
                      sd)
nSerotonin <- tapply(locustData$serotoninLevel, 
                     locustData$treatmentTime, 
                     length)
seSerotonin <- sdSerotonin / sqrt(nSerotonin)
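An alternative is to wrap the standard error calculation in a small helper function and pass it to tapply directly. This is only a sketch: the function name standardError and the toy data frame are hypothetical, used so that the example runs without the locust CSV file.

```r
# Hypothetical helper: standard error of the mean for a numeric vector
standardError <- function(y) {
  sd(y) / sqrt(length(y))
}

# Hypothetical toy data with two groups
toy <- data.frame(
  level = c(4.1, 5.0, 3.8, 9.2, 11.5, 8.7),
  time  = c(0, 0, 0, 1, 1, 1)
)

# One call to tapply per statistic, grouped by time
meanByGroup <- tapply(toy$level, toy$time, mean)
seByGroup   <- tapply(toy$level, toy$time, standardError)

print(meanByGroup)
print(seByGroup)
```

This gives the same numbers as computing sd and length separately, with one fewer intermediate object per statistic.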

Error bars

Draw the strip chart and then add the error bars.

\[ \bar{Y} \pm SE_{\bar{Y}} \]

offsetAmount <- 0.2
stripchart(serotoninLevel ~ treatmentTime, 
           data = locustData, 
           method = "jitter", 
           vertical = TRUE)

segments(1:3 + offsetAmount, 
         meanSerotonin - seSerotonin, 
         1:3 + offsetAmount, 
         meanSerotonin + seSerotonin)

points(1:3 + offsetAmount, 
       meanSerotonin, 
       pch = 16, 
       cex = 1.2)
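If you prefer error bars with end caps, one option is to use arrows with flat heads in place of segments. The sketch below is self-contained and uses a made-up data set (toyData and its columns are hypothetical stand-ins for the locust data).

```r
# Hypothetical toy data standing in for the locust data set
toyData <- data.frame(
  response = c(4.1, 5.0, 3.8, 4.6,
               9.2, 11.5, 8.7, 10.1,
               12.3, 14.0, 11.8, 13.1),
  group = rep(c(0, 1, 2), each = 4)
)

# Mean and standard error for each group
groupMean <- tapply(toyData$response, toyData$group, mean)
groupSE   <- tapply(toyData$response, toyData$group,
                    function(y) sd(y) / sqrt(length(y)))

# Horizontal positions of the error bars, shifted right of the points
xPos <- seq_along(groupMean) + 0.2

stripchart(response ~ group,
           data = toyData,
           method = "jitter",
           vertical = TRUE)

# angle = 90 and code = 3 draw a flat cap at both ends of each bar
arrows(xPos, groupMean - groupSE,
       xPos, groupMean + groupSE,
       angle = 90, code = 3, length = 0.05)

points(xPos, groupMean, pch = 16, cex = 1.2)
```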

Error bars

[Figure: strip chart of serotonin level by treatment time, with the mean and standard error bars drawn beside each group]

Error bars can mean different things!!!

[Figure: comparison of different error bars]

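Different error bars!!!

  • \( \bar{Y} \pm s \) (standard deviation)
  • \( \bar{Y} \pm \mathrm{SE}_{\bar{Y}} \) (standard error)
  • \( \bar{Y} \pm 2\times \mathrm{SE}_{\bar{Y}} \) (approximate 95% confidence interval)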
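To see how much these choices differ, here is a short sketch that computes the half-width of each kind of error bar for a single made-up sample (the data and variable names are hypothetical).

```r
set.seed(42)   # for reproducibility

# Hypothetical sample of 20 measurements
y <- rnorm(20, mean = 10, sd = 3)
n <- length(y)

s  <- sd(y)          # standard deviation: spread of the data
se <- s / sqrt(n)    # standard error: precision of the mean

# Half-width of each kind of error bar around the sample mean
halfWidths <- c(sd = s, se = se, twoSE = 2 * se)
print(round(halfWidths, 2))

# Lower and upper ends of each bar
print(rbind(lower = mean(y) - halfWidths,
            upper = mean(y) + halfWidths))
```

With n = 20 the standard deviation bar is more than four times as wide as the standard error bar, which is why a figure should always state which one it shows.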