Estimating with uncertainty

M. Drew LaMar
August 31, 2020

“Absolute certainty is a privilege of uneducated minds and fanatics. It is, for scientific folk, an unattainable ideal.”

- Cassius J. Keyser

Class Announcements

  • Lab #3 will be available this Wednesday: Intermediate R
    • Conditionals and loops
  • No reading assignment for Wednesday
  • Homework #1 extension: Due tomorrow (Tuesday) at 11:59 pm
  • Homework #2 will go live tonight at 11:59 pm

Measures of variability - Variance

Definition: The population variance \( \sigma^{2} \) is the average of the squared deviations of all observations from the population mean. Assuming a finite population of size \( N \) with mean \( \mu \), we have
\[ \sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}(Y_{i}-\mu)^2 \]
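
As a minimal sketch of this formula in R: the helper pop_var and the numbers below are hypothetical, made up for illustration. Note that R's built-in var() divides by \( n-1 \), so the population version has to be written by hand.

```r
# Hypothetical helper: population variance (divides by N, not N - 1)
pop_var <- function(y) mean((y - mean(y))^2)

# Made-up "population" of 8 values, treated as the entire population
y <- c(2, 4, 4, 4, 5, 5, 7, 9)

pop_var(y)  # average squared deviation from the population mean
var(y)      # built-in var() divides by n - 1, so it is slightly larger
```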

Measures of variability - Variance

Definition: The sample variance \( s^{2} \) is the average of the squared deviations from the sample mean,
\[ s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_{i}-\overline{Y})^2 \]

Question: Why \( n-1 \)??

Answer: Dividing by \( n-1 \) instead of \( n \) is needed for \( s^{2} \) to be an unbiased estimate of \( \sigma^{2} \)!!
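
One way to see this, sketched under the assumption of a tiny made-up population (the values 1 to 6) sampled with replacement: list every possible sample of size 2 and average the two candidate estimators over all of them. All object names here are hypothetical.

```r
# Made-up population and its variance (divide by N)
population <- 1:6
sigma2 <- mean((population - mean(population))^2)

# Every possible sample of size n = 2, drawn with replacement (36 samples)
samples <- expand.grid(y1 = population, y2 = population)

# Estimator dividing by n - 1 (this is what var() does)
s2_n_minus_1 <- apply(samples, 1, var)

# Estimator dividing by n
s2_n <- apply(samples, 1, function(y) mean((y - mean(y))^2))

# Averages over all samples, compared with the true value
c(truth = sigma2,
  divide_by_n_minus_1 = mean(s2_n_minus_1),
  divide_by_n = mean(s2_n))
```

Averaged over all samples, the \( n-1 \) version matches \( \sigma^{2} \), while the \( n \) version comes out too small.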

Measures of variability - Standard deviation

Definition: The population standard deviation \( \sigma \) is the square root of the population variance,
\[ \sigma = \sqrt{\sigma^{2}} \]

Definition: The sample standard deviation \( s \) is the square root of the sample variance,
\[ s = \sqrt{s^{2}} \]

Note #1: \( s \) is in general a biased estimator of \( \sigma \). The bias gets smaller as the sample size gets larger.

Note #2: \( s \) and \( \sigma \) have the same units as the random variable!!!
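
Note #1 can be checked on a small example. The sketch below assumes a made-up population (the values 1 to 6) sampled with replacement, and the helper mean_s is hypothetical: it averages \( s \) over every possible sample of a given size.

```r
# Made-up population and its standard deviation (divide by N)
population <- 1:6
sigma <- sqrt(mean((population - mean(population))^2))

# Hypothetical helper: average of s over all possible samples of size n
mean_s <- function(n) {
  samples <- expand.grid(rep(list(population), n))
  mean(apply(samples, 1, sd))
}

bias_n2 <- mean_s(2) - sigma  # bias of s for samples of size 2
bias_n4 <- mean_s(4) - sigma  # bias of s for samples of size 4

c(bias_n2 = bias_n2, bias_n4 = bias_n4)
```

Both biases are negative (on average \( s \) underestimates \( \sigma \)), and the bias is closer to zero for the larger sample size.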

Measures of variability - Standard deviation

Note #3: If the frequency distribution is bell shaped, then about two-thirds (roughly 68%) of the observations will lie within one standard deviation of the mean, and about 95% of the observations will lie within two standard deviations of the mean.
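
For an exactly normal distribution (an assumption; real data are only approximately bell shaped), these fractions can be computed with pnorm, the normal cumulative distribution function in R. The variable names are hypothetical.

```r
# Fraction of a normal distribution within 1 and 2 standard deviations of the mean
within_1sd <- pnorm(1) - pnorm(-1)
within_2sd <- pnorm(2) - pnorm(-2)

round(c(within_1sd = within_1sd, within_2sd = within_2sd), 3)
```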

Measures of variability - Interquartile range

Definition: The interquartile range \( IQR \) is the difference between the third and first quartiles of the data. It is the span of the middle 50% of the data.

Measures of variability - Interquartile range

Spiders with huge pedipalps, copulatory organs that make up about 10% of a male's mass.

Measures of variability - Interquartile range

  • Middle bar of box is median
  • Bottom of box is first quartile
  • Top of box is third quartile
  • Whiskers extend up to \( 1.5\times IQR \) above and below the box\( ^{*} \)
  • Data outside whiskers (extreme values) are plotted as dots

\( ^{*} \) Each whisker stops at the most extreme observation within that range, so a whisker never extends past the max or min of the data
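
The numbers behind a box plot can be inspected with boxplot.stats, sketched here on made-up data with one extreme value. One caveat: boxplot.stats uses Tukey's hinges for the box edges, which can differ slightly from the quartiles returned by quantile.

```r
# Hypothetical data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

b <- boxplot.stats(x)  # whiskers reach at most 1.5 x IQR from the box by default

b$stats  # lower whisker, bottom of box, median, top of box, upper whisker
b$out    # extreme values, which a box plot would draw as dots
```

Here the upper whisker stops at the largest observation that is within \( 1.5\times IQR \) of the box, and the remaining value is flagged as extreme.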

Standard deviation or interquartile range?

Heuristic #1: The measures of location (mean and median) and of spread (interquartile range and standard deviation) give similar information when the frequency distribution is symmetric and unimodal (i.e. bell shaped).

Heuristic #2: The mean and standard deviation become less informative when the distribution is strongly skewed or there are extreme observations.
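
A small made-up example of Heuristic #2: replacing one value with an extreme observation shifts the mean and standard deviation a lot, while the median and interquartile range do not move. The data and names are hypothetical.

```r
# Hypothetical data, without and with one extreme observation
x         <- c(2, 3, 4, 5, 6)
x_extreme <- c(2, 3, 4, 5, 60)

# Hypothetical helper collecting the four summaries
describe <- function(y) {
  c(mean = mean(y), sd = sd(y), median = median(y), IQR = IQR(y))
}

rbind(original = describe(x), extreme = describe(x_extreme))
```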

Coefficient of variation

Because in biology the standard deviation often scales with the mean, it can be more informative to look at the coefficient of variation.

Definition: The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean: \[ CV = \frac{s}{\bar{Y}}\times 100\% \]

In other words, the CV answers the question “How much variation is there relative to the mean?”
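
Base R has no built-in CV function as far as the commands in this lecture go, so the sketch below defines a hypothetical helper cv from the formula. The masses are made-up numbers.

```r
# Hypothetical helper: coefficient of variation, in percent
cv <- function(y) sd(y) / mean(y) * 100

# Made-up body masses, in grams and in milligrams
mass_g  <- c(12.1, 13.4, 11.8, 12.9, 13.0)
mass_mg <- mass_g * 1000

cv(mass_g)
cv(mass_mg)  # same CV: changing the units rescales sd and mean together
```

Because the CV is a ratio, it has no units, which is what makes it useful for comparing variation between groups with very different means.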

Moving on...

Make sure you read the book for the following discussions:

  • How to compute a mean and standard deviation from a frequency table

Question: Why is this important to know?

  • Rounding rules for displaying tables and statistics
  • Effect of changing measurement scale
  • Cumulative frequency distributions (we will cover this later as well)

My point here is that you are responsible for all book material, even if we don't cover it in lecture!
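
As a rough sketch of the first item in that list (mean and standard deviation from a frequency table): the table below is made up, and the book's presentation may differ in notation. Each value is weighted by how often it occurs, and the result is checked against the expanded raw data.

```r
# Hypothetical frequency table: each value and how many times it was observed
value <- c(0, 1, 2, 3)
freq  <- c(4, 3, 2, 1)

n <- sum(freq)                                        # total number of observations
mean_ft <- sum(freq * value) / n                      # weighted mean
var_ft  <- sum(freq * (value - mean_ft)^2) / (n - 1)  # sample variance
sd_ft   <- sqrt(var_ft)

# Check: expand the table back into raw observations
raw <- rep(value, times = freq)
c(mean_ft = mean_ft, mean_raw = mean(raw))
c(sd_ft = sd_ft, sd_raw = sd(raw))
```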

Describing data in R

Measure               R command
\( \overline{Y} \)    mean
\( s^2 \)             var
\( s \)               sd
\( IQR \)             IQR\( ^* \)
Multiple              summary

\( ^* \) Note that IQR can use different algorithms, because there are different ways to calculate quantiles (for curious souls, see ?quantile). The default in R is type=7. To match the algorithm in W&S, you should use IQR(___, type=5). For the HW, either version is acceptable.
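
To see that the choice of algorithm can matter, here is a small sketch on made-up data (the numbers 1 to 10), comparing the default with type=5.

```r
# Hypothetical data
x <- 1:10

iqr_default <- IQR(x)            # R's default quantile algorithm (type = 7)
iqr_type5   <- IQR(x, type = 5)  # the type suggested above for matching W&S

c(default = iqr_default, type5 = iqr_type5)

# The quartiles behind each version
quantile(x, c(0.25, 0.75))
quantile(x, c(0.25, 0.75), type = 5)
```

The two answers are close but not identical. The difference tends to matter most for small data sets.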

Describing data in R

summary(mydata)
    breadth     
 Min.   : 1.00  
 1st Qu.: 3.00  
 Median : 8.00  
 Mean   :11.88  
 3rd Qu.:17.00  
 Max.   :62.00  

The IQR is then the third quartile minus the first quartile: \( 17-3 = 14 \).
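
The same subtraction can be done in R by pulling the quartiles out of the summary. The vector below is made up (it is not the data behind the output shown above), chosen only so that the quartiles come out as whole numbers.

```r
# Hypothetical data, NOT the lecture's mydata
breadth <- c(1, 3, 3, 8, 17, 17, 62)

s <- summary(breadth)
s

# IQR from the summary: third quartile minus first quartile
iqr_from_summary <- unname(s["3rd Qu."] - s["1st Qu."])
iqr_from_summary

IQR(breadth)  # same value, computed directly
```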

Online Tutorials - Estimating with Uncertainty