- experimental design
- data exploration
- statistical tests & assumptions
- analysis platforms

10 November 2015

- Design your experiment well and execute it well: you needn't worry too much in advance about statistics
- Don't: you're doomed; statistics can't save you
- "To consult the statistician after an experiment is finished is often merely to ask him to conduct a *post mortem* examination. He can perhaps say what the experiment died of." (Fisher)
- "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." (Tukey)
- *randomization*, *replication*, *control*

- random *assignment to treatments*
- poorer alternative: *haphazard* assignment
- stratification (i.e., randomize within groups)
- related: experimental *blinding*
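As a minimal sketch of random and stratified assignment in base R: the animals, cages, and treatment names below are hypothetical, and the seed is arbitrary (set only so the assignment can be reproduced).

```r
set.seed(101)  # arbitrary seed, for reproducibility only

## hypothetical experiment: 12 animals housed in 3 cages, 2 treatments
animals <- data.frame(id = 1:12,
                      cage = rep(c("A", "B", "C"), each = 4))
trts <- c("control", "drug")

## completely randomized: shuffle a balanced vector of treatment labels
animals$trt_simple <- sample(rep(trts, length.out = nrow(animals)))

## stratified: shuffle a balanced vector separately within each cage
animals$trt_strat <- NA_character_
for (cg in unique(animals$cage)) {
  rows <- which(animals$cage == cg)
  animals$trt_strat[rows] <- sample(rep(trts, length.out = length(rows)))
}

## cross-tabulate to check balance within cages
table(animals$cage, animals$trt_simple)
table(animals$cage, animals$trt_strat)
```

The stratified assignment is balanced within every cage by construction; the simple one is balanced only overall.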

- how big does your experiment need to be?
- **power**: probability of detecting an effect of a particular size, if one exists
- more generally: how much information? what kinds of mistakes? (Gelman & Carlin, 2014)
- *underpowered* studies
  - failure is likely
  - cheating is likely
  - *significance filter* \(\to\) biased estimates
- *overpowered* studies waste time, lives, $
- **pseudoreplication** (Davies & Gray, 2015; Hurlbert, 1984): confounding sampling units with treatment units
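The significance filter can be illustrated with a small simulation. This is a sketch only: the true effect, standard deviation, and sample size are made-up values chosen to give an underpowered two-group design.

```r
set.seed(1)          # arbitrary seed
true_effect <- 0.5   # hypothetical true difference in means
sigma <- 1           # hypothetical within-group standard deviation
n <- 5               # per-group sample size (deliberately small)
nsim <- 2000

## simulate many experiments; record the estimated difference and p-value
res <- replicate(nsim, {
  x <- rnorm(n, mean = 0, sd = sigma)
  y <- rnorm(n, mean = true_effect, sd = sigma)
  tt <- t.test(y, x)
  c(est = unname(tt$estimate[1] - tt$estimate[2]), p = tt$p.value)
})

sig <- res["p", ] < 0.05
mean(sig)               # fraction of experiments that reach p < 0.05 (power)
mean(res["est", ])      # average estimate over all experiments
mean(res["est", sig])   # average estimate among "significant" experiments only
```

Averaged over all experiments the estimate is close to the true effect, but the subset that passes the significance threshold overestimates it.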

- need to guess *effect size* and variability
  - biological common sense
  - previous studies
  - ask your supervisor
- OK to simplify design (e.g. ANOVA \(\to\) \(t\)-test)
- methods
  - seat-of-the-pants
  - web calculators
  - in R

    apropos("^power")  ## base-R functions
    library("sos"); findFn("{power analysis}")
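For a simplified two-group design, base R's `power.t.test()` can solve for sample size or power. The effect size (`delta`) and standard deviation (`sd`) below are placeholder guesses, not values from any study.

```r
## sample size per group needed for 80% power
## (delta and sd are hypothetical guesses)
pw <- power.t.test(delta = 1, sd = 1.5, power = 0.8, sig.level = 0.05)
pw$n

## power achieved if only 10 subjects per group are available
pw10 <- power.t.test(n = 10, delta = 1, sd = 1.5, sig.level = 0.05)
pw10$power
```

Leaving exactly one of `n`, `delta`, `sd`, `power`, or `sig.level` unspecified tells the function which quantity to solve for.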

- maximize *desired* variation
  - e.g. large doses (tradeoff with biological realism)
- minimize *undesired* variation
  - within-subjects designs
    - e.g. paired, randomized-block, crossover
  - minimize environmental variation
    - tradeoff with generality
    - e.g. environmental chambers; clonal or inbred lines
- isolate desired effects: **positive/negative controls** (vehicle-only, cage treatments, etc.)
- control for variation statistically (e.g. ANCOVA)
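To see why within-subjects designs help, here is a simulated sketch comparing a paired and an unpaired analysis of the same data. All numbers (subject-to-subject variation, measurement noise, treatment effect) are invented for illustration.

```r
set.seed(2)  # arbitrary seed
n <- 10

## large between-subject variation, small measurement noise (hypothetical)
subject_effect <- rnorm(n, mean = 10, sd = 3)
before <- subject_effect + rnorm(n, sd = 0.5)
after  <- subject_effect + 1 + rnorm(n, sd = 0.5)  # true treatment effect = 1

## ignoring the pairing: between-subject variation swamps the effect
unpaired <- t.test(after, before)

## using the pairing: each subject serves as its own control
paired <- t.test(after, before, paired = TRUE)

c(unpaired = unpaired$p.value, paired = paired$p.value)
rbind(unpaired = unpaired$conf.int, paired = paired$conf.int)
```

The paired analysis removes the between-subject variation from the comparison, so its confidence interval for the treatment effect is much narrower.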

- check that values are reasonable
- categorical
  - tabulation, cross-tabulation
- univariate
  - mean, standard deviation
  - median, minimum, maximum
- bivariate
  - correlations
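These numeric checks take only a few lines of base R. The sketch below uses the built-in `ToothGrowth` data set purely as a stand-in for your own data.

```r
data("ToothGrowth")  # built-in example data: len, supp, dose

## categorical: tabulation and cross-tabulation
table(ToothGrowth$supp)
xt <- with(ToothGrowth, table(supp, dose))
xt

## univariate: mean, standard deviation, median, minimum, maximum
mean(ToothGrowth$len)
sd(ToothGrowth$len)
summary(ToothGrowth$len)

## bivariate: correlation
r <- cor(ToothGrowth$len, ToothGrowth$dose)
r
```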

- univariate
  - histogram
- grouped
  - box-and-whisker (violin)
- bivariate
  - scatterplots
- multivariate
  - scatterplot matrices
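Each of these plots has a base-graphics one-liner. The sketch uses the built-in `ToothGrowth` and `iris` data sets as stand-ins; violin plots need an add-on package and are not shown.

```r
data("ToothGrowth"); data("iris")  # built-in example data

## univariate: histogram
h <- hist(ToothGrowth$len, main = "Tooth length", xlab = "len")

## grouped: box-and-whisker plot
boxplot(len ~ supp, data = ToothGrowth)

## bivariate: scatterplot
plot(len ~ dose, data = ToothGrowth)

## multivariate: scatterplot matrix of the four numeric iris columns
pairs(iris[, 1:4])
```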

- independence (hard to test!)
- homogeneity of variance (vs. *heteroscedasticity*)
- linearity
- Normality (*least* important)
  - outliers; skew; "fat tails" (Student, 1927)
- distributional assumptions apply to the *conditional distribution* of the response variable

- hypothesis tests are **not** generally appropriate: they answer the wrong question
- graphical diagnostics
  - residuals plots (linearity, heteroscedasticity)
  - influence plots (outliers)
  - Q-Q plots (Normality)
  - Box-Cox plots (transformations)
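For a fitted linear model, base R's `plot()` method produces most of these diagnostics. This is a sketch on the built-in `ToothGrowth` data; the model is only an example, and Box-Cox plots (e.g. `MASS::boxcox()`) are not shown.

```r
data("ToothGrowth")
fit <- lm(len ~ dose, data = ToothGrowth)  # example model only

op <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(fit, which = 1)  # residuals vs fitted: linearity
plot(fit, which = 3)  # scale-location: heteroscedasticity
plot(fit, which = 2)  # Normal Q-Q plot of residuals
plot(fit, which = 5)  # residuals vs leverage: influential points
par(op)               # restore graphics settings
```

Note that the diagnostics are computed from the residuals, in line with the point that the assumptions concern the conditional distribution of the response.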

- drop outliers (report both analyses)
- transform (e.g. log transform: *Box-Cox* analysis)
- non-parametric (rank-based) tests (e.g. Mann-Whitney-Wilcoxon, Kruskal-Wallis)
- relax assumptions/do fancier stats, e.g.
  - logistic regression (0/1 outcomes)
  - quadratic regression (nonlinearity)
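Most of these remedies are one-line changes in R. The sketch below uses the built-in `ToothGrowth` and `mtcars` data sets as stand-ins; the particular models are illustrative, not recommendations.

```r
data("ToothGrowth"); data("mtcars")

## transformation: model the log of the response
fit_log <- lm(log(len) ~ dose, data = ToothGrowth)

## rank-based tests
## (exact = FALSE: the data contain ties, so use the Normal approximation)
mww <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)
kw  <- kruskal.test(len ~ factor(dose), data = ToothGrowth)

## logistic regression for a 0/1 outcome (am: 0 = automatic, 1 = manual)
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)

## quadratic regression for a nonlinear relationship
fit_quad <- lm(len ~ dose + I(dose^2), data = ToothGrowth)

mww$p.value
kw$p.value
coef(fit_logit)
coef(fit_quad)
```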

- try to connect scientific & statistical questions
- data type
- see decision tree or table
- if your question *doesn't* fit in this tree, think about how much you like statistics …

- nonparametric stats
  - slight loss of power
  - stronger assumptions than you think
  - \(p\)-values only - no effect size

- focus on effect sizes/CIs
- eschew vacuous hypothesis statements
- don't accept the null hypothesis
- "the difference between significant and non-significant is not significant" (Gelman & Stern, 2006)
- avoid snooping/\(p\)-hacking (Simmons et al., 2011): **preregister** your analyses
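Reporting an effect size with its confidence interval is straightforward in R. A sketch on the built-in `ToothGrowth` data, used here only as an example:

```r
data("ToothGrowth")

## two-group comparison: estimated group means and a CI for their difference
tt <- t.test(len ~ supp, data = ToothGrowth)
tt$estimate   # group means
tt$conf.int   # 95% CI for the difference in means
effect <- unname(tt$estimate[1] - tt$estimate[2])
effect

## the same idea for a linear model: CIs for the coefficients
fit <- lm(len ~ supp, data = ToothGrowth)
ci <- confint(fit)
ci
```

The interval shows the range of differences compatible with the data, which is more informative than the \(p\)-value alone and does not invite "accepting" the null hypothesis.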

- simple/weak vs. complex/powerful
- GUI vs command-line
- **default:** use what your lab uses

- ubiquitous
- open alternatives (Open Office)
- data in plain sight
- good enough for simple stuff
- occasional traps (McCullough & Heiser, 2008)
- archive your data as CSV, not XLSX
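If the data pass through R at any point, archiving to CSV is a single call. A minimal sketch, with a hypothetical file name and the built-in `ToothGrowth` data standing in for your own:

```r
data("ToothGrowth")

## hypothetical file name; written to a temporary directory in this sketch
f <- file.path(tempdir(), "toothgrowth.csv")
write.csv(ToothGrowth, f, row.names = FALSE)

## read the file back in and compare it with the original
dat <- read.csv(f, stringsAsFactors = TRUE)
str(dat)
all.equal(dat, ToothGrowth)
```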

- SPSS, JMP, SAS, …
- more reliable than Excel
- more powerful than Excel
- point & click (mostly)

- powerful; free & open
- reproducible: script-based
- hardest to learn
- R Commander if you need a GUI
- great for data manipulation, graphics (once you learn how)

Greg Snow, R-help (May 2006):

Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.

R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS.

    x1 = c(1.5,2.5,2.1)
    x2 = c(1.1,1.4,1.5)
    t.test(x1,x2)

    ## 
    ##  Welch Two Sample t-test
    ## 
    ## data:  x1 and x2
    ## t = 2.226, df = 2.6648, p-value = 0.1236
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -0.3759079  1.7759079
    ## sample estimates:
    ## mean of x mean of y 
    ##  2.033333  1.333333

- *Nature* web collection, "Statistics for Biologists"
- UCLA statistics consulting (many examples in SAS, SPSS, R …)
- CrossValidated

Davies, G. M., & Gray, A. (2015). Don’t let spurious accusations of pseudoreplication limit our ability to learn from natural experiments (and other messy kinds of ecological monitoring). *Ecology and Evolution*. http://doi.org/10.1002/ece3.1782

Gelman, A., & Carlin, J. (2014). Beyond power calculations: Assessing type S (sign) and type M (magnitude) errors. *Perspectives on Psychological Science*, *9*(6), 641–651. http://doi.org/10.1177/1745691614551642

Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. *The American Statistician*, *60*(4), 328–331. http://doi.org/10.1198/000313006X152649

Hurlbert, S. H. (1984). Pseudoreplication and the design of ecological field experiments. *Ecological Monographs*, *54*(2), 187–211. http://doi.org/10.2307/1942661

McCullough, B. D., & Heiser, D. A. (2008). On the accuracy of statistical procedures in Microsoft Excel 2007. *Computational Statistics & Data Analysis*, *52*(10), 4570–4578. http://doi.org/10.1016/j.csda.2008.03.004

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. *Psychological Science*, *22*(11), 1359–1366. http://doi.org/10.1177/0956797611417632

Student. (1927). Errors of routine analysis. *Biometrika*, *19*(1/2), 151–164. http://doi.org/10.2307/2332181