Principles of Experimental Design

Jon Lefcheck
February 7, 2014

Principles of Experimental Design

What is the hypothesis?
What are the treatments?
Control
Randomization
Replication
Blocking
Independence

What is the hypothesis?

A hypothesis is a testable statement (or set of statements) about the question you hope to answer through experimentation.

Hypotheses should ideally be developed before data are collected.

Hypotheses should be developed to address the range of possible outcomes, not just “it worked” or “it didn't.”

What are the treatments?

An experimental treatment is a manipulation or procedure that is administered to experimental replicates.

Treatments should be selected to directly address your hypotheses.

Control

A control represents unmanipulated replicates against which treatment replicates are compared.

[Image: urchin grazing]

Example: Andrew & Underwood 1993

Do the effects of sea urchin grazing on filamentous algae depend on urchin density?

Densities: 100% (natural density, the control), 66%, 33%, and 0%.

Control - The Data

  TREAT PATCH QUAD ALGAE
1    0%     1    1    46
2    0%     1    2    44
3    0%     1    3    41
4    0%     1    4    29
5    0%     1    5    11
6    0%     2    1    65

Control - Plotting the Data

[Figure: plot of the data]

Control - The Actual Analysis

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
TREAT	3	4526.42	1508.81	2.84	0.0434
TREAT: C vs 66	1	1109.47	1109.47	2.09	0.1524
TREAT: C vs 33	1	605.49	605.49	1.14	0.2889
TREAT: C vs 0	1	2811.45	2811.45	5.30	0.0241
Residuals	75	39811.79	530.82
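
The columns of this table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and each F value is that mean square divided by the residual mean square. A minimal Python sketch that recomputes the F column from the Df and Sum Sq values above (the variable names are illustrative and are not part of the original analysis):

```python
# Rows copied from the ANOVA table above: (label, Df, Sum Sq).
rows = [
    ("TREAT", 3, 4526.42),
    ("TREAT: C vs 66", 1, 1109.47),
    ("TREAT: C vs 33", 1, 605.49),
    ("TREAT: C vs 0", 1, 2811.45),
]
residual_df, residual_ss = 75, 39811.79

# Mean square = sum of squares / degrees of freedom.
residual_ms = residual_ss / residual_df

f_values = {}
for label, df, ss in rows:
    mean_sq = ss / df
    # F = treatment (or contrast) mean square / residual mean square.
    f_values[label] = mean_sq / residual_ms
    print(f"{label:16s} MS = {mean_sq:8.2f}  F = {f_values[label]:.2f}")
```

Every F value therefore depends on the residual mean square, which is why changing the set of treatments in the model changes every test.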

Control - Remove Control & Re-run Analysis

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
TREAT	2	3521.62	1760.81	2.50	0.0915
TREAT: 66 vs 33	1	1725.21	1725.21	2.45	0.1235
TREAT: 66 vs 0	1	1796.41	1796.41	2.55	0.1161
Residuals	56	39500.07	705.36

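Without the natural-density control, the treatment effect is no longer significant (p = 0.09). And what does a change from 66% to 33% of natural density mean when natural density itself is not in the experiment? How would you interpret this?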

Randomization

Randomization means assigning replicates to treatments by chance, so that no systematic bias determines which replicate receives which treatment.

Bad randomization: all replicates in location A receive Treatment A, all replicates in location B receive Treatment B.
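
One way to avoid the situation above is to let a random number generator decide the assignment. The sketch below is a minimal illustration, not a procedure from the studies cited here; the plot labels, treatment names, and the `randomize` helper are all hypothetical.

```python
import random

def randomize(replicates, treatments, seed=None):
    """Randomly assign replicates to treatments in equal numbers."""
    if len(replicates) % len(treatments) != 0:
        raise ValueError("replicates must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(replicates)
    rng.shuffle(shuffled)  # chance alone decides the order
    per_treatment = len(shuffled) // len(treatments)
    assignment = {}
    for i, treatment in enumerate(treatments):
        for replicate in shuffled[i * per_treatment:(i + 1) * per_treatment]:
            assignment[replicate] = treatment
    return assignment

# Hypothetical plots: six in location A and six in location B.
plots = [f"A{i}" for i in range(1, 7)] + [f"B{i}" for i in range(1, 7)]
assignment = randomize(plots, ["Treatment A", "Treatment B"], seed=1)
for plot in plots:
    print(plot, assignment[plot])
```

Fixing the seed makes the assignment reproducible, which is useful for documenting how the design was generated.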

Example: Quinn 1988

Do limpet density (8, 15, 30, and 45 individuals) and season (spring vs summer) influence egg production?

[Image: limpets]

Randomization - The Data

   DENSITY SEASON  EGGS
1        8 spring 2.875
2        8 spring 2.625
3        8 spring 1.750
4        8 summer 2.125
5        8 summer 1.500
6        8 summer 1.875
7       15 spring 2.600
8       15 spring 1.866
9       15 spring 2.066
10      15 summer 0.867

Randomization - Plotting the Data

[Figure: plot of the data]

Randomization - The Actual Analysis

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
DENSITY	3	5.28	1.76	9.67	0.0007
SEASON	1	3.25	3.25	17.84	0.0006
DENSITY:SEASON	3	0.16	0.05	0.30	0.8240
Residuals	16	2.91	0.18

Both density and season have strong effects, but the two do not interact (DENSITY:SEASON, p = 0.82)!

Randomization - Re-Assign Treatments & Re-run

Let's put all low densities in spring and all high densities in summer.

   DENSITY SEASON  EGGS DENS
1        8 spring 2.875    8
2        8 spring 2.625    8
3        8 spring 1.750    8
4        8 spring 2.125    8
5        8 spring 1.500    8
6        8 spring 1.875    8
7       15 spring 2.600   15
8       15 spring 1.866   15
9       15 spring 2.066   15
10      15 spring 0.867   15

Randomization - Re-Assign Treatments & Re-run

[Figure: plot of the re-assigned data]

As we planned, low densities are in spring and high densities are in summer.

Randomization - Re-Assign Treatments & Re-run

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
DENSITY	3	5.28	1.76	5.57	0.0061
Residuals	20	6.33	0.32

Uh-oh, we can no longer estimate an effect of season! Density and season are now completely confounded, because no density treatment is replicated in both spring and summer.
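
A quick way to spot this problem before running an experiment is to count replicates in every combination of factor levels. The sketch below is a minimal Python illustration using the density and season levels from this example; the `empty_cells` helper is hypothetical, and the replicate counts (three per cell originally, six per density after re-assignment) are read off the data shown above.

```python
from collections import Counter
from itertools import product

densities = [8, 15, 30, 45]
seasons = ["spring", "summer"]

def empty_cells(design):
    """Return the density x season combinations that have no replicates."""
    counts = Counter(design)
    return [cell for cell in product(densities, seasons) if counts[cell] == 0]

# Original design: every density in both seasons, three replicates per cell.
crossed = [(d, s) for d in densities for s in seasons for _ in range(3)]

# Re-assigned design: low densities only in spring, high densities only in summer.
confounded = [(d, "spring" if d <= 15 else "summer")
              for d in densities for _ in range(6)]

print(empty_cells(crossed))     # no empty cells in the crossed design
print(empty_cells(confounded))  # half of the cells are empty
```

Any empty cell means that the corresponding combination cannot be estimated; when whole rows or columns of cells are empty, as here, the two factors cannot be separated at all.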

Replication

Replication is the act of applying each experimental treatment to multiple independent experimental units.

Replication increases the precision of your estimates, and therefore the statistical power to detect treatment effects.

Example: Andrew & Underwood 1993

Do the effects of sea urchin grazing on filamentous algae depend on urchin density?

[Image: urchin grazing]

Replication - The Actual Analysis

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
TREAT	3	4526.42	1508.81	2.84	0.0434
Residuals	75	39811.79	530.82

Replication - How many measurements per treatment?

[Figure: number of measurements per treatment]

Replication - Reducing Replication

What happens when we halve the number of replicates?

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
TREAT	3	3687.97	1229.32	1.87	0.1519
Residuals	35	22953.46	655.81

Replication - Reducing Replication

If we repeat this procedure 100 times, how many times is the TREAT effect significant?

	significant
FALSE	93
TRUE	7

Not very many, which would lead us to conclude that urchin density has no effect on algal cover.

Replication - Increasing Replication

But what if we double the number of replicates, and repeat 100 times again?

	significant
FALSE	6
TRUE	94

Increasing replication (and thus power) shows that algal cover is indeed related to urchin density. The response is simply so variable that we could not detect the effect with only a few samples!
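
The same idea can be explored by simulation. The sketch below is a rough Python illustration under stated assumptions, not a reproduction of the resampling above: the data are simulated from normal distributions with hypothetical group means and standard deviation (not the urchin data), and a permutation test stands in for the F-distribution p-value so that only the standard library is needed.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))

def permutation_p(groups, rng, n_perm=99):
    """P-value from shuffling observations among groups (permutation test)."""
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def power(n_per_group, means, sd, rng, n_experiments=100, alpha=0.05):
    """Fraction of simulated experiments in which the treatment is significant."""
    significant = 0
    for _ in range(n_experiments):
        groups = [[rng.gauss(m, sd) for _ in range(n_per_group)] for m in means]
        if permutation_p(groups, rng) <= alpha:
            significant += 1
    return significant / n_experiments

rng = random.Random(2014)
means = [0.0, 0.0, 0.0, 1.0]  # hypothetical: one treatment differs by 1 SD
power_small = power(5, means, sd=1.0, rng=rng)
power_large = power(20, means, sd=1.0, rng=rng)
print("5 replicates per treatment: ", power_small)
print("20 replicates per treatment:", power_large)
```

The true effect is identical in both runs; only the number of replicates per treatment changes, and with it the proportion of experiments that detect the effect.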

Blocking

Blocking is the act of grouping replicates into discrete spatial or temporal units.

Blocking helps account for variation associated with being in a particular place or time, so that it is not mistaken for (or allowed to mask) a treatment effect.

Randomized block design: each block receives at least one replicate of each treatment.
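
A randomized block layout can be sketched in a few lines of Python. This is a minimal illustration, assuming exactly one replicate of each treatment per block; the block labels, treatment names, and the `randomized_block_design` helper are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Give every block one replicate of each treatment, in random order."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # randomize treatment positions within the block
        layout[block] = order
    return layout

blocks = ["block 1", "block 2", "block 3"]  # hypothetical block labels
treatments = ["A", "B", "C"]                # hypothetical treatments
layout = randomized_block_design(blocks, treatments, seed=42)
for block, order in layout.items():
    print(block, order)
```

Randomization still happens, but only within each block, so every treatment experiences every block's conditions.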

Example: Caffrey 1982

How do different substrate types (granite, slate & cement) influence barnacle settlement?

Blocking - The Data

  substrate patch abundance
1   granite     1         8
2     slate     1         2
3    cement     1         3
4   granite     2        14
5     slate     2        11
6    cement     2         8

Blocking - Plotting the Data

[Figure: plot of the data]

Some patches are more variable than others!

Blocking - The Actual Analysis

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
substrate	2	117.60	58.80	4.01	0.0362
Residuals	18	263.73	14.65

Blocking - Removing the Blocks

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
substrate	2	117.60	58.80	2.90	0.0724
Residuals	27	547.90	20.29

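Removing the blocking term moves the variation among patches (547.90 − 263.73 = 284.17 sum of squares) into the residual, which lowers the F value from 4.01 to 2.90. The effect of substrate type on abundance is no longer significant (p = 0.07)!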

Independence

Independence means that the outcome of one replicate neither influences nor is influenced by the outcome of any other replicate.

Non-independence arises when replicates are close together in space and/or time, and could influence one another non-randomly.

Non-independence also arises when subreplicates are taken from the same replicate and treated as independent, a condition known as pseudoreplication.
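
One common remedy for pseudoreplication is to summarize subreplicates to the level of the true replicate before analysis. The sketch below is a minimal Python illustration using the six rows of urchin data shown earlier, under the assumption that quadrats are subreplicates and the patch is the replicate; it is not the analysis used in these slides.

```python
from collections import defaultdict
from statistics import mean

# (patch, quadrat, algal cover): the rows of urchin data shown earlier.
rows = [
    (1, 1, 46), (1, 2, 44), (1, 3, 41), (1, 4, 29), (1, 5, 11),
    (2, 1, 65),
]

# Collect the quadrat measurements that belong to each patch.
by_patch = defaultdict(list)
for patch, quad, algae in rows:
    by_patch[patch].append(algae)

# One value per patch: the mean of its quadrats.
patch_means = {patch: mean(values) for patch, values in by_patch.items()}
print(patch_means)
```

The analysis then has one observation per patch, so the degrees of freedom reflect the number of independent replicates and not the number of measurements.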

Independence - The Data

Example: Driscoll & Roberts 1997

Investigated the impact of fuel-reduction burning on the number of individual male frogs calling.

[Image: frog]

Measured the difference in the number of calls between matched burned and unburned sites across three years: 1992 pre-burn, 1993 post-burn, and 1994 post-burn.

Non-independence: the two post-burn years are more closely related to each other than either is to the pre-burn year.

Independence - The Data

     BLOCK BLCK YEAR CALLS
1  logging    1    1     4
2   angove    2    1   -10
3  newpipe    3    1   -15
4 oldquinE    4    1   -14
5 newquinW    5    1    -4
6 newquinE    6    1     0

Independence - Plotting the Data

[Figure: plot of the data]

Independence - Actual Analysis

This analysis accounts for the closer relationship between 1993 and 1994.

	numDF	denDF	F-value	p-value
(Intercept)	1	28.00	0.92	0.34
YEAR	2	28.00	4.47	0.02

Independence - Do Not Account for Years

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
YEAR	2	287.17	143.58	2.43	0.1068
Residuals	28	1657.50	59.20

Year is no longer significant (p = 0.11)!

Think Critically!

Before conducting an experiment, consider:

What is my question?
Do my treatments address this question?
Do I have a baseline against which to test treatment effects?
What are my replicates?
Are replicates random and independent?
Are there any additional sources of error?