10 November 2015


  • experimental design
  • data exploration
  • statistical tests & assumptions
  • analysis platforms

source code

experimental design

the most important thing

  • Design your experiment well and execute it well:
    you needn't worry too much in advance about statistics
  • Don't: you're doomed - statistics can't save you
  • "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." (Fisher)
  • "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (Tukey)
  • randomization, replication, control


randomization

  • random assignment to treatments
  • poorer alternative: haphazard assignment
  • stratification
    (i.e., randomize within groups)
  • related: experimental blinding
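
A minimal sketch of random and stratified assignment in base R. The subject table, the stratum variable (`sex`), and the treatment labels are hypothetical, made up for illustration only.

```r
set.seed(101)  # arbitrary seed, only so the example is reproducible

## hypothetical experiment: 12 subjects in 2 strata
subjects <- data.frame(id = 1:12,
                       sex = rep(c("F", "M"), each = 6))
labels <- c("control", "treated")

## complete randomization: shuffle a balanced vector of treatment labels
subjects$trt_simple <- sample(rep(labels, length.out = nrow(subjects)))

## stratified randomization: shuffle separately within each stratum
subjects$trt_strat <- NA_character_
for (s in unique(subjects$sex)) {
  idx <- which(subjects$sex == s)
  subjects$trt_strat[idx] <- sample(rep(labels, length.out = length(idx)))
}

## check balance: each stratum should get equal numbers per treatment
table(subjects$sex, subjects$trt_strat)
```

Shuffling a balanced label vector (rather than flipping a coin per subject) guarantees equal group sizes; stratifying additionally guarantees balance within each stratum.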


replication

  • how big does your experiment need to be?
  • power: probability of detecting an effect of a particular size,
    if one exists
    • more generally: how much information? what kinds of mistakes? (Gelman & Carlin, 2014)
  • underpowered studies
    • failure is likely
    • cheating is likely
    • significance filter \(\to\) biased estimates
  • overpowered studies waste time, lives, $
  • pseudoreplication: confounding sampling units with treatment units (Davies & Gray, 2015; Hurlbert, 1984)
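
The significance filter can be illustrated with a small simulation. This is a sketch under assumed values (true effect of 0.2 SD, 10 subjects per group, 5000 simulated experiments); none of these numbers come from a real study.

```r
set.seed(101)        # arbitrary seed for reproducibility
true_effect <- 0.2   # assumed true difference in means (in SD units)
n_per_group <- 10    # small sample: an underpowered design
n_sim <- 5000        # number of simulated experiments

## simulate one two-group experiment; return the estimate and its p-value
sim_one <- function() {
  ctrl <- rnorm(n_per_group, mean = 0, sd = 1)
  trt  <- rnorm(n_per_group, mean = true_effect, sd = 1)
  c(estimate = mean(trt) - mean(ctrl),
    p = t.test(trt, ctrl)$p.value)
}
res <- as.data.frame(t(replicate(n_sim, sim_one())))

power_est <- mean(res$p < 0.05)                      # fraction "significant"
mean_all  <- mean(res$estimate)                      # unbiased on average
mean_sig  <- mean(abs(res$estimate[res$p < 0.05]))   # after the filter

c(power = power_est, mean_all = mean_all, mean_sig = mean_sig)
```

Averaged over all experiments the estimate is close to the true effect, but the subset that reaches p < 0.05 has a much larger average magnitude: reporting only significant results overstates the effect.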

power analysis

  • need to guess effect size and variability
    • biological common sense
    • previous studies
    • ask your supervisor
  • OK to simplify design (e.g. ANOVA \(\to\) \(t\)-test)
  • methods
apropos("^power")  ## base-R functions
library("sos"); findFn("{power analysis}")
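
As a sketch of one of the base-R functions found this way, `power.t.test()` solves for whichever of `n`, `delta`, `sd`, or `power` is left unspecified. The effect size and standard deviation below are assumed guesses, not values from any study.

```r
## assumed inputs: difference of 1 unit between groups, SD of 1 unit
pw <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8)
n_needed <- ceiling(pw$n)   # subjects per group, rounded up
n_needed

## conversely: what power does a small design (5 per group) achieve?
pw5 <- power.t.test(n = 5, delta = 1, sd = 1, sig.level = 0.05)
pw5$power
```

This treats the design as a simple two-sample t-test, in line with the advice above that simplifying the design is acceptable for a power calculation.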


control

  • maximize desired variation
    • e.g. large doses (tradeoff with biological realism)
  • minimize undesired variation
    • within-subjects designs
      • e.g. paired, randomized-block, crossover
    • minimize environmental variation
      • tradeoff with generality
      • e.g. environmental chambers;
        clonal or inbred lines
  • isolate desired effects: positive/negative controls
    (vehicle-only, cage treatments, etc.)
  • control for variation statistically (e.g. ANCOVA)
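
A sketch of why within-subjects designs help, using simulated data. The between-subject SD (3), measurement SD (0.5), and treatment effect (1) are assumptions chosen for illustration.

```r
set.seed(101)   # arbitrary seed for reproducibility
n <- 10

## each subject has its own baseline: large between-subject variation
subject_effect <- rnorm(n, mean = 10, sd = 3)
before <- subject_effect + rnorm(n, sd = 0.5)
after  <- subject_effect + 1 + rnorm(n, sd = 0.5)  # assumed effect of 1

## same data, analyzed ignoring vs. using the pairing
unpaired <- t.test(after, before)
paired   <- t.test(after, before, paired = TRUE)

## width of the 95% confidence interval under each analysis
ci_width <- c(unpaired = as.numeric(diff(unpaired$conf.int)),
              paired   = as.numeric(diff(paired$conf.int)))
ci_width
```

The paired analysis removes the between-subject variation from the comparison, so its confidence interval is much narrower for the same number of measurements.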

data exploration

descriptive statistics

  • check that values are reasonable
  • categorical
    • tabulation, cross-tabulation
  • univariate
    • mean, standard deviation
    • median, minimum, maximum
  • bivariate
    • correlations
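
A sketch of these summaries in base R, using the built-in `mtcars` data set as a stand-in (it is not the data set used in this talk).

```r
data("mtcars")

## categorical: tabulation and cross-tabulation
cyl_tab   <- table(mtcars$cyl)
cross_tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

## univariate: mean, standard deviation, median, minimum, maximum
mpg_summary <- summary(mtcars$mpg)   # min, quartiles, median, mean, max
mpg_sd      <- sd(mtcars$mpg)

## bivariate: correlation between two numeric variables
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)

cyl_tab; cross_tab; mpg_summary; mpg_sd; mpg_wt_cor
```

Scanning the minimum and maximum is a quick way to check that values are reasonable (e.g. no negative weights or impossible codes).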


  • univariate
    • histogram
  • grouped
    • box-and-whisker (violin)
  • bivariate
    • scatterplots
  • multivariate
    • scatterplot matrices
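
A sketch of the corresponding base-graphics calls, again with the built-in `mtcars` data as a stand-in; the choice of variables is arbitrary.

```r
data("mtcars")

## univariate: histogram
h <- hist(mtcars$mpg, main = "", xlab = "miles per gallon")

## grouped: box-and-whisker plot of a response by a grouping variable
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "cylinders", ylab = "miles per gallon")

## bivariate: scatterplot
plot(mpg ~ wt, data = mtcars,
     xlab = "weight (1000 lb)", ylab = "miles per gallon")

## multivariate: scatterplot matrix of a few numeric variables
pairs(mtcars[, c("mpg", "wt", "hp", "disp")])
```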

statistical tests


  • independence (hard to test!)
  • homogeneity of variance (vs. heteroscedasticity)
  • linearity
  • Normality (least important)
    • outliers; skew; "fat tails" (Student, 1927)
    • distributional assumptions apply to the conditional distribution of the response variable


  • hypothesis tests are not generally appropriate:
    they answer the wrong question
  • graphical diagnostics
    • residuals plots (linearity, heteroscedasticity)
    • influence plots (outliers)
    • Q-Q plots (Normality)
    • Box-Cox plots (transformations)
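
A sketch of these graphical diagnostics for a linear model. The model (`mpg ~ wt` on the built-in `mtcars` data) is an arbitrary example; `boxcox()` comes from the MASS package, which ships with standard R installations.

```r
library(MASS)   # for boxcox()
data("mtcars")

fit <- lm(mpg ~ wt, data = mtcars)

plot(fit, which = 1)   # residuals vs. fitted: linearity
plot(fit, which = 3)   # scale-location: heteroscedasticity
plot(fit, which = 5)   # residuals vs. leverage: influential points
plot(fit, which = 2)   # Q-Q plot: Normality of residuals

## Box-Cox profile: which power transformation of the response fits best?
bc <- boxcox(fit, plotit = TRUE)
lambda_best <- bc$x[which.max(bc$y)]   # lambda = 0 corresponds to log
lambda_best
```

The diagnostics are applied to the residuals of the fitted model, consistent with the point that distributional assumptions apply to the conditional distribution of the response.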


(the data set)

dealing with violations

  • drop outliers (report both analyses)
  • transform (e.g. log transform; use Box-Cox analysis to choose)
  • non-parametric (rank-based) tests
    (e.g. Mann-Whitney-Wilcoxon, Kruskal-Wallis)
  • relax assumptions/do fancier stats, e.g.
    • logistic regression (0/1 outcomes)
    • quadratic regression (nonlinearity)
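
A sketch of how some of these options look in base R. The variables from the built-in `mtcars` data are stand-ins chosen only to make the calls runnable.

```r
data("mtcars")

## transform the response (log scale)
fit_log <- lm(log(mpg) ~ wt, data = mtcars)

## rank-based tests: two groups (Mann-Whitney-Wilcoxon), several groups
## (exact = FALSE: use the Normal approximation, since the data contain ties)
mww <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
kw  <- kruskal.test(mpg ~ factor(cyl), data = mtcars)

## logistic regression for a 0/1 outcome
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)

## quadratic regression for a nonlinear relationship
fit_quad <- lm(mpg ~ poly(wt, 2), data = mtcars)

mww$p.value; kw$p.value
coef(fit_logit)
coef(fit_quad)
```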

what should you use?

  • try to connect scientific & statistical questions
  • data type
    • see decision tree or table
    • if your question doesn't fit in this tree,
      think about how much you like statistics …
  • nonparametric stats
    • slight loss of power
    • stronger assumptions than you think
    • \(p\)-values only - no effect size


  • focus on effect sizes/CIs
  • eschew vacuous hypothesis statements
  • don't accept the null hypothesis
  • "the difference between significant and non-significant is not significant" (Gelman & Stern, 2006)
  • avoid snooping/\(p\)-hacking (Simmons et al., 2011):
    preregister your analyses
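
The point that a significant and a non-significant result need not differ significantly can be checked with a little arithmetic. The estimates and standard errors below are illustrative numbers, not data, and the calculation assumes the two estimates are independent and approximately Normal.

```r
## two hypothetical effect estimates with equal standard errors
est1 <- 25; se1 <- 10
est2 <- 10; se2 <- 10

## two-sided p-values for each estimate against zero
p1 <- 2 * pnorm(-abs(est1 / se1))   # "significant" at 0.05
p2 <- 2 * pnorm(-abs(est2 / se2))   # "not significant" at 0.05

## the comparison that actually matters: the difference between them
diff_est <- est1 - est2
diff_se  <- sqrt(se1^2 + se2^2)     # assumes independent estimates
p_diff   <- 2 * pnorm(-abs(diff_est / diff_se))

c(p1 = p1, p2 = p2, p_diff = p_diff)
```

Reporting the difference with its confidence interval (`diff_est ± 1.96 * diff_se`) is more informative than comparing the two significance labels.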

computational platforms


  • simple/weak vs. complex/powerful
  • GUI vs command-line
  • default: use what your lab uses


Excel

  • ubiquitous
  • open alternatives (Open Office)
  • data in plain sight
  • good enough for simple stuff
  • occasional traps (McCullough & Heiser, 2008)
  • archive your data as CSV, not XLSX
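
A sketch of a CSV round trip in R, for checking that an archived file reads back intact. The built-in `mtcars` data and the temporary file path are stand-ins for a real data set and archive location.

```r
data("mtcars")

## write the data as plain-text CSV (row names become the first column)
csv_file <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_file, row.names = TRUE)

## read it back and compare with the original
dat <- read.csv(csv_file, row.names = 1)
dim(dat)
all.equal(dat$mpg, mtcars$mpg)
```

CSV is plain text, so it stays readable by any software (or by eye) long after a particular spreadsheet format has changed.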

stats packages

  • SPSS, JMP, SAS, …
  • more reliable than Excel
  • more powerful than Excel
  • point & click (mostly)


R

  • powerful; free & open
  • reproducible: script-based
  • hardest to learn
  • R Commander if you need a GUI
  • great for data manipulation, graphics
    (once you learn how)

Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.

R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.

Greg Snow, R-help (May 2006)

t-test (R)

x1 = c(1.5,2.5,2.1)
x2 = c(1.1,1.4,1.5)
t.test(x1,x2)   ## Welch (unequal-variance) test is the default
##  Welch Two Sample t-test
## data:  x1 and x2
## t = 2.226, df = 2.6648, p-value = 0.1236
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3759079  1.7759079
## sample estimates:
## mean of x mean of y 
##  2.033333  1.333333
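
In keeping with the emphasis on effect sizes and confidence intervals, a short sketch of pulling those pieces out of the test object instead of reading only the p-value. The variable names `tt`, `effect`, and `tt_pooled` are introduced here for illustration.

```r
x1 <- c(1.5, 2.5, 2.1)
x2 <- c(1.1, 1.4, 1.5)

tt <- t.test(x1, x2)   # Welch test, as above

## effect size: estimated difference in means, with its 95% CI
effect <- unname(tt$estimate[1] - tt$estimate[2])
ci     <- tt$conf.int
effect; ci

## classical (pooled-variance) t-test, for comparison
tt_pooled <- t.test(x1, x2, var.equal = TRUE)
tt_pooled$parameter   # degrees of freedom: n1 + n2 - 2
```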

t-test (Excel)

Further resources


Davies, G. M., & Gray, A. (2015). Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring). Ecology and Evolution. http://doi.org/10.1002/ece3.1782

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. Perspectives on Psychological Science, 9(6), 641–651. http://doi.org/10.1177/1745691614551642

Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331. http://doi.org/10.1198/000313006X152649

Hurlbert, S. H. (1984). Pseudoreplication and the design of ecological field experiments. Ecological Monographs, 54(2), 187–211. http://doi.org/10.2307/1942661

McCullough, B. D., & Heiser, D. A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis, 52(10), 4570–4578. http://doi.org/10.1016/j.csda.2008.03.004

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. http://doi.org/10.1177/0956797611417632

Student. (1927). Errors of routine analysis. Biometrika, 19(1/2), 151–164. http://doi.org/10.2307/2332181