# Designing studies, experiments, surveys (Ch 14)

## “The goal of experimental design is to eliminate **BIAS** and reduce **SAMPLING ERROR** when **ESTIMATING** [calculating the mean] and **TESTING** [hypothesis testing w/ a p value] the effects of one variable on another [looking for causal connections]” (pg 47)

**Today:** focus on true experiments

## (but principles apply to all studies)

**Wed:** Focus on observational studies

# BIAS and SAMPLING ERROR

**Bias** = wrong answer

## = inaccurate

## = “systematic discrepancy” (pg 6) (overshoot or undershoot)

## = caused by properties of instrument (not calibrated),

## experimental design, or stat procedure

**Sampling error** = creates noise around answer

## = makes estimates imprecise

## = due to random variation in sampling unit

# = (like roll of dice, flip of coin)

# Classic Accuracy vs. Precision Illustration:

# Bullseye Diagram

## See chapter 1 in book (pg 6)

# Lake Erie Stakeholders:

## PA, OH, NY, MI, Ontario, USFWS, Canada FWS, Trout Unlimited

## Thought experiment:

### Everyone:

### -wants to estimate abundance of steelhead

### -uses the exact same method (method “A”) using randomized sampling

### -BUT draw diff. random numbers to locate sample points

## If the REAL number of steelhead in the lake is 100 million, what range of numbers might you expect from these 8 stakeholders if “method A” is…

### 1)accurate and precise?

### 2)accurate but not precise?

### 3)precise but not accurate?
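The three cases in the thought experiment can be sketched with a small simulation. This is only an illustration: each stakeholder's estimate is modeled as the true abundance plus a systematic bias plus random sampling error, and the bias and noise values are hypothetical numbers picked to make the contrast visible, not properties of any real survey method.

```python
import random
import statistics

TRUE_ABUNDANCE = 100_000_000  # the "real" number of steelhead in the thought experiment
N_STAKEHOLDERS = 8

def simulate_estimates(bias, noise_sd, seed):
    """One estimate per stakeholder: truth + systematic bias + random sampling error."""
    rng = random.Random(seed)
    return [TRUE_ABUNDANCE + bias + rng.gauss(0, noise_sd)
            for _ in range(N_STAKEHOLDERS)]

# Hypothetical bias and noise values (assumptions for illustration only).
scenarios = {
    "accurate and precise": simulate_estimates(bias=0, noise_sd=1_000_000, seed=1),
    "accurate but not precise": simulate_estimates(bias=0, noise_sd=20_000_000, seed=2),
    "precise but not accurate": simulate_estimates(bias=30_000_000, noise_sd=1_000_000, seed=3),
}

for name, estimates in scenarios.items():
    low, high = min(estimates), max(estimates)
    print(f"{name}: range {low / 1e6:.0f} to {high / 1e6:.0f} million, "
          f"mean {statistics.mean(estimates) / 1e6:.0f} million")
```

Bias shifts all eight estimates in the same direction; sampling error spreads them out around wherever the method is centered.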

# “Estimation” vs “Hypothesis testing”

## Concepts are inter-related

**Estimation:** estimating the value of an unknown parameter / quantity

### eg, number of steelhead in Lake Erie, population growth rate of Allegheny County, incidence of HIV in West Africa

## Goal of estimation: calculate mean value from sample data that is accurate (the right answer)

## and precise (un-ambiguous)

**Testing:** are 2 estimated values different from each other?

### eg, are steelhead more abundant in PA or NY streams, is HIV declining over time

### Involves a statistical model, p-values, etc
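A minimal sketch of the difference between the two tasks, using simulated (not real) steelhead counts for PA and NY streams. The true means, spread, and sample sizes are assumptions. The test shown is a simple permutation test, chosen here because it needs only the standard library; it is not necessarily the test used in the book.

```python
import random

rng = random.Random(42)

def mean(values):
    return sum(values) / len(values)

# Simulated (not real) counts per stream; true means of 50 and 65 are assumptions.
pa_counts = [rng.gauss(50, 10) for _ in range(30)]
ny_counts = [rng.gauss(65, 10) for _ in range(30)]

# Estimation: a point estimate of each unknown mean.
pa_mean = mean(pa_counts)
ny_mean = mean(ny_counts)

# Testing: is the gap between the two estimates larger than chance alone would produce?
def permutation_p_value(group_a, group_b, n_permutations=2000):
    observed = abs(mean(group_b) - mean(group_a))
    pooled = group_a + group_b
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel streams at random, as if state did not matter
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if abs(mean(shuffled_b) - mean(shuffled_a)) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)

p_value = permutation_p_value(pa_counts, ny_counts)
print(f"PA mean {pa_mean:.1f}, NY mean {ny_mean:.1f}, p = {p_value:.4f}")
```

The two means are the estimation step; the p-value is the testing step, and it depends on both the size of the difference and the precision of the estimates.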

# -Increasing *sample size* is easiest way to increase precision

# -*Random sampling* best way to reduce bias

### Which value is most precise relative to the real value?
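The link between sample size and precision can be checked by repeated sampling. In this sketch the population (mean 100, standard deviation 20) is hypothetical; the point is only that the spread of the sample mean shrinks as the sample size grows, roughly in proportion to 1/sqrt(n).

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_repeats=500, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        # Hypothetical population: mean 100, standard deviation 20.
        sample = [rng.gauss(100, 20) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return statistics.stdev(means)

for n in (10, 100, 1000):
    print(f"n = {n:4d}: spread of estimates = {spread_of_sample_means(n):.2f}")
```

Note that a larger sample only tightens the estimates around whatever value the method is centered on; it does not remove bias.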

# Why do randomized experiments?

## Deals with “confounding” variables

## “randomization minimizes the influence of **confounding variables**, allowing the experimenter to [conclusively] isolate the effects of the treatment variable” (pg 424) and be confident about **causation**.

## Randomized sampling in observational studies & randomized allocation of treatments in experiments can be said to **“break up”** the effects of “confounding variables” (p 435)

# Confounding variables

## Definition:

## “A confounding variable is a variable that masks or distorts the causal relationship between measured variables in a study.”

## Consequences of confounding:

## -Biased estimation of means

## -Incorrect conclusions about causation

## -can reverse the apparent direction of causation

# Book Ex. of Confounding: Breastfeeding Studies

## (Kramer et al 2002)

## -Initial observational study: breast-fed babies weighed less @ 6 mo

## -Later experimental study w/ randomization: breast-feds weighed more

## -Confounding variable: misc, including socio-economic status

### “w/ an experiment, random assignment of treatments to participants allows researchers to tease apart the effects of the explanatory variable. With random assignment, no confounding variables will be associated w/ treatment **except by chance**” (pg 435)
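How a confounder can flip the sign of an estimated effect, and how random assignment removes the problem, can be sketched with made-up numbers. Everything here is hypothetical: a generic yes/no confounder raises the chance of receiving the treatment and lowers the outcome, while the treatment itself has a true effect of +1. This is not a model of the breastfeeding data.

```python
import random

TRUE_EFFECT = 1.0         # assumed true benefit of the treatment
CONFOUNDER_EFFECT = -4.0  # assumed effect of the confounder on the outcome

def mean(values):
    return sum(values) / len(values)

def estimated_effect(randomized, n=20000, seed=7):
    """Difference in mean outcome, treated minus control."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        confounder = rng.random() < 0.5
        if randomized:
            # Experiment: a coin flip decides treatment, ignoring the confounder.
            gets_treatment = rng.random() < 0.5
        else:
            # Observational: the confounder makes treatment much more likely.
            gets_treatment = rng.random() < (0.8 if confounder else 0.2)
        outcome = 10.0 + rng.gauss(0, 1)
        if gets_treatment:
            outcome += TRUE_EFFECT
        if confounder:
            outcome += CONFOUNDER_EFFECT
        (treated if gets_treatment else control).append(outcome)
    return mean(treated) - mean(control)

print(f"observational estimate: {estimated_effect(randomized=False):+.2f}")
print(f"randomized estimate:    {estimated_effect(randomized=True):+.2f}")
```

In the observational version the treated group is dominated by individuals carrying the confounder, so the comparison is biased; with random assignment the confounder is split evenly between groups except by chance.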

# Example of confounding

## Made up example: parasites & fish

# Research Question: Does parasite infection reduce fish health & mass?

## Say we notice a lot of sickly fish in a lake

## Dissection indicates that they have intestinal parasites

## We sample a bunch of fish and see that mass ~ parasite load

# Aside on Causality: **Proximate** vs. **Ultimate** causes

## Does temperature variation drive variation in parasite loads?

## -Parasites are the **proximate** cause

## -Temperature change is the real driver (the **ultimate** cause)

## -If lake temp drops, parasite abundance goes down, and fish health improves

# Alternative hypothesis:

**Temperature stress** impacts fish health & immune system

## Fish with compromised immune systems more likely to acquire parasites.

## Parasites not causing poor health; poor health is resulting in parasite infection

## KEY: An observational study, even if it uses random sampling, would have great difficulty in determining the right answer

## only some kind of experiment could figure this out

## OR a long-term study following individual fish over time
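The alternative hypothesis can be sketched as a simulation of the made-up lake. In this hypothetical world parasites have no effect on mass at all: temperature stress lowers mass, and stressed fish are more likely to become infected. All numbers (masses in grams, infection probabilities) are assumptions for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def mass_gap(experiment, n_fish=10000, seed=11):
    """Mean mass of infected fish minus mean mass of uninfected fish."""
    rng = random.Random(seed)
    infected_mass, clean_mass = [], []
    for _ in range(n_fish):
        stress = rng.random()  # temperature stress, 0 (none) to 1 (severe)
        # Stress alone lowers mass; infection never changes mass in this sketch.
        mass = 500 - 200 * stress + rng.gauss(0, 20)
        if experiment:
            infected = rng.random() < 0.5     # researcher assigns infection at random
        else:
            infected = rng.random() < stress  # stressed fish acquire parasites more often
        (infected_mass if infected else clean_mass).append(mass)
    return mean(infected_mass) - mean(clean_mass)

print(f"field survey gap: {mass_gap(experiment=False):+.1f} g")
print(f"experiment gap:   {mass_gap(experiment=True):+.1f} g")
```

A field survey of this lake would show infected fish to be clearly lighter even though parasites do nothing to mass here, while an experiment that assigns infection at random would show essentially no gap.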

# Problem w/ Experiments: Experimental Artifacts

## Definition: “An **experimental artifact** is a **bias** in a measurement produced by unintended consequences of experimental procedures”(pg 425)

## What experimental artifacts could occur with exclusion experiments?

### -Turkey exclusion (Chips et al 2014)