Intro to Data & Experimental Design

M. Drew LaMar
January 20, 2017

“You can't fix by analysis what you bungled by design.”

- Light, Singer and Willett

This is Water

Cognitive Biases

Arranged and designed by John Manoogian III (jm3). Categories and descriptions originally by Buster Benson.

In the News - Politics & Science

In the News

What Does It Mean When Cancer Findings Can't Be Reproduced?

“But research with living systems is never simple, so there are many possible sources of variation in any experiment, ranging from the animals and cells to the details of lab technique.”
- Richard Harris, NPR


Link:
http://www.npr.org/sections/health-shots/2017/01/18/510304871/what-does-it-mean-when-cancer-findings-cant-be-reproduced

Course Triad: Content + Skills

Data Process

  1. Data Planning (Experimental Design)
    • Pilot Studies (small-scale version of steps 2-4 below)
  2. Data Collection (Experiment/Field Study)
  3. Data Cleaning/Curation (e.g. remove missing values, standardize units)
  4. Data Exploration & Analysis
    • Data Validation (sanity checks, e.g. do values make biological sense?)
    • Data Munging/Wrangling (raw -> processed)
    • Data Analysis (Statistics)
    • Data Visualization
  5. Data Dissemination (Data Communication)
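To make steps 3 and 4 concrete, here is a minimal sketch of cleaning and validating a few field records. The records, sheep IDs, and the "plausible" mass range of 20 to 160 kg are hypothetical assumptions chosen for illustration, not real data or an official threshold.

```python
# Hypothetical raw field records: (sheep_id, mass, unit). None marks a missing value.
RAW = [
    ("s1", 62.0, "kg"),
    ("s2", None, "kg"),
    ("s3", 140.0, "lb"),
    ("s4", 0.0, "kg"),
    ("s5", 71.5, "kg"),
]

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound

def clean(records):
    """Cleaning/curation: drop missing values and convert every mass to kg."""
    cleaned = []
    for sheep_id, mass, unit in records:
        if mass is None:
            continue  # remove missing value
        if unit == "lb":
            mass = mass * LB_TO_KG  # standardize units
        elif unit != "kg":
            raise ValueError(f"unknown unit {unit!r} for {sheep_id}")
        cleaned.append((sheep_id, round(mass, 1)))
    return cleaned

def validate(records, low_kg=20.0, high_kg=160.0):
    """Validation: flag masses outside an ASSUMED plausible range for adult sheep."""
    return [(sid, m) for sid, m in records if not (low_kg <= m <= high_kg)]

processed = clean(RAW)
suspect = validate(processed)
print(processed)  # [('s1', 62.0), ('s3', 63.5), ('s4', 0.0), ('s5', 71.5)]
print(suspect)    # [('s4', 0.0)]
```

The 0.0 kg record survives cleaning (it is not missing) but fails the sanity check: a likely data-entry or scale error that only biological knowledge flags.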

Data as Information

“Modern statisticians are familiar with the notion that any finite body of data contains only a limited amount of information on any point under examination; that this limit is set by the nature of the data themselves, and cannot be increased by any amount of ingenuity expended in their statistical examination: that the statistician's task, in fact, is limited to the extraction of the whole of the available information on any particular issue.”

- R. A. Fisher (biologist!)

Data as Information

There is desired and undesired information in data.

Goals:

  • Get accurate information by reducing bias (do we have the right signal?)

  • Get precise information by reducing sampling error due to random variation (increase signal-to-noise ratio)

    “An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”

    - John Tukey
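One way to see the difference between the two goals is a small simulation, a sketch under assumed numbers: a hypothetical true mean mass of 70 kg, between-animal SD of 8 kg, and a scale that reads 2 kg too high. Larger samples shrink the spread of the estimates (sampling error), but the systematic offset (bias) does not go away.

```python
import random
import statistics

TRUE_MEAN = 70.0  # hypothetical true mean mass (kg)
SD = 8.0          # hypothetical between-animal standard deviation (kg)
BIAS = 2.0        # hypothetical systematic error, e.g. a mis-zeroed scale (kg)

def sample_mean(n, bias, rng):
    """Mean of n measurements, each = true value + random variation + bias."""
    return statistics.fmean(rng.gauss(TRUE_MEAN, SD) + bias for _ in range(n))

def summarize(n, bias, reps=1000, seed=1):
    """Repeat the 'experiment' reps times and describe the estimates."""
    rng = random.Random(seed)
    means = [sample_mean(n, bias, rng) for _ in range(reps)]
    avg_error = statistics.fmean(means) - TRUE_MEAN  # accuracy: systematic offset
    spread = statistics.stdev(means)                 # precision: sampling error
    return avg_error, spread

for n in (5, 50, 500):
    err, sd = summarize(n, BIAS)
    print(f"n={n:4d}  average error={err:+.2f}  spread of estimates={sd:.2f}")
```

The average error should stay near +2 at every sample size, while the spread should fall roughly as SD divided by the square root of n.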

Data as Information

For your question, there is desired (signal) and undesired (noise) information in your data.

Goals:

  • Isolate desired information by reducing or controlling for confounding factors (i.e. undesired information)

“The aim … is to provide a clear and rigorous basis for determining when a causal ordering can be said to hold between two variables or groups of variables in a model…”

- H. Simon

Data as Information


Experimental Design, Data & Statistics

“Designing experiments is as much about learning to think scientifically as it is about the mechanics of the statistics that we use to analyse the data once we have it. It is about having confidence in your data, and knowing that you are measuring what you think you are measuring. It is about knowing what can be concluded from a particular type of experiment and what cannot.”

- Ruxton & Colegrave

Experimental Design, Data & Statistics

Design your experiment so that:

  • Measurements lead to useful data.
  • Useful data has information addressing your hypothesis.
  • Statistics are tailored to your data and powerful enough to separate out signal from noise.
  • Results of statistics can be properly interpreted as evidence for or against your original hypothesis.

Two key concepts of experimental design

“It might be said that the two major goals of designing experiments are to minimize random variation and account for confounding factors.”

- Ruxton & Colegrave

Definition: Random variation refers to the differences between measured values of the same variable taken from different experimental subjects.

Good experiments minimize or control for “unwanted” random variation, so that any variation due to the factors of interest can be detected more easily.
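A sketch of why this matters: simulate many two-group experiments with the same true treatment effect and count how often a difference is detected. The effect size, group size, and noise levels are hypothetical, and the test is a two-sample z-test that assumes the noise SD is known, a simplification chosen so the code needs only the standard library.

```python
import math
import random
import statistics

def detection_rate(effect, sd, n=20, reps=2000, alpha=0.05, seed=2):
    """Fraction of simulated experiments in which a two-sample z-test
    (noise SD ASSUMED known) declares the group difference significant."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n)  # standard error of the difference in means
    hits = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(effect, sd) for _ in range(n)]
        z = (statistics.fmean(treated) - statistics.fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / reps

noisy = detection_rate(effect=2.0, sd=5.0)  # lots of unwanted random variation
quiet = detection_rate(effect=2.0, sd=2.0)  # same effect, less random variation
print(noisy, quiet)
```

With these settings the theoretical detection rates are roughly 0.24 and 0.89, so the simulated values should land close to them: the same real effect is found far more often once unwanted variation is reduced.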

Definition: If we want to study the effect of variable A on variable B, but variable C also affects B (and is associated with A), then C is a confounding factor: its effect on B can be mistaken for an effect of A.
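The following sketch simulates a hypothetical population in which A has no effect on B at all, while C both raises the chance of A and raises B. All probabilities and effect sizes are assumptions for illustration. Because C is linked to A as well as to B, a naive comparison finds a spurious "effect" of A; comparing only within each level of C (stratification, one simple way to account for a known confounder) removes it.

```python
import random
import statistics

def simulate(n=20000, seed=3):
    """Hypothetical population: A has NO effect on B,
    but C raises both the chance of A and the value of B."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                    # confounder C (True/False)
        a = rng.random() < (0.8 if c else 0.2)    # C is associated with A
        b = 10.0 + 5.0 * c + rng.gauss(0.0, 1.0)  # C affects B; A does not
        rows.append((a, b, c))
    return rows

def mean_diff(rows):
    """Mean of B when A is True minus mean of B when A is False."""
    b1 = [b for a, b, _ in rows if a]
    b0 = [b for a, b, _ in rows if not a]
    return statistics.fmean(b1) - statistics.fmean(b0)

rows = simulate()
naive = mean_diff(rows)                                    # ignores C
within_c = [mean_diff([r for r in rows if r[2] == level])  # compares only within C
            for level in (False, True)]
print(f"naive A effect: {naive:.2f}")
print(f"A effect within each level of C: {[round(d, 2) for d in within_c]}")
```

Under these assumptions the naive difference should come out near 3.0 (C is present in 80% of the A group but only 20% of the non-A group, and C adds 5 to B), while both within-C differences should be near 0.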

Let's Talk

Q1.1 If we wanted to measure the prevalences of left-handedness and religious practices among prison inmates, what population would we sample from?

Let's Talk

Q1.2 If we find that two people in our sample have been sharing a prison cell for the last 12 months, will they be independent sample units?

Let's Talk

Q1.3 Humans have tremendous variation in the patterning of grooves on our fingers, allowing us to be individually identified by our fingerprints. Why do you think there is such variation?

Let's Talk

Q1.4 If we are interested in comparing eyesight between smokers and non-smokers, what other factors could contribute to variation between people in the quality of their eyesight? Are any of the factors you have chosen likely to be related to someone's propensity to smoke?

Let's Talk

Q1.5 Faced with two flocks of sheep 25 km apart, how might you go about measuring sample masses in such a way as to reduce or remove the effect of time-of-measurement as a confounding factor?

Final Remarks

“Designing effective experiments needs thinking about biology more than it does mathematical calculations.”

“Experimental design is about the biology of the system, and that is why the best people to devise biological experiments are biologists themselves.”

- Ruxton & Colegrave

Reading for Monday

Whitlock & Schluter, Chapter 1 (PDF will be posted on Blackboard today)