Designing studies, experiments, surveys (Ch 14)

“The goal of experimental design is to eliminate BIAS and reduce SAMPLING ERROR when ESTIMATING [calculating the mean] and TESTING [hypothesis testing w/ a p value] the effects of one variable on another [looking for causal connections]” (pg 47)


Today: focus on true experiments

(but the principles apply to all studies)


Wed: Focus on observational studies


BIAS and SAMPLING ERROR

Bias = wrong answer

= inaccurate

= “systematic discrepancy” (pg 6) (overshoot or undershoot)

= caused by properties of the instrument (e.g., not calibrated), the experimental design, or the statistical procedure


Sampling error = creates noise around the answer

= makes estimates imprecise

= due to chance: which sampling units happen to end up in the sample

= (like a roll of the dice or a flip of a coin)
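The same two ideas in symbols, using standard textbook notation (θ for the true value and θ̂ for an estimate; these symbols are not from the slides). Bias is how far the average estimate sits from the truth, sampling error is the scatter of estimates around their own average, and together they make up the mean squared error of the estimate:

```latex
\begin{aligned}
\text{Bias}(\hat\theta) &= E[\hat\theta] - \theta \\
\text{SE}(\hat\theta)   &= \sqrt{E\big[(\hat\theta - E[\hat\theta])^2\big]} \\
E\big[(\hat\theta - \theta)^2\big] &= \text{Bias}(\hat\theta)^2 + \text{SE}(\hat\theta)^2
\end{aligned}
```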


Classic Accuracy vs. Precision Illustration:

Bullseye diagram

See chapter 1 in book (pg 6)


Lake Erie Stakeholders:

PA, OH, NY, MI, Ontario, USFWS, Canada FWS, Trout Unlimited


Thought experiment:

Everyone:

-wants to estimate the abundance of steelhead

-uses the exact same method (method “A”) with randomized sampling

-BUT draws different random numbers to locate sample points


If the REAL number of steelhead in the lake is 100 million, what range of numbers might you expect from these 8 stakeholders if “method A” is…


1) accurate and precise?

2) accurate but not precise?

3) precise but not accurate?
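A minimal Python sketch of the thought experiment, assuming each stakeholder's estimate is simply the true value plus a fixed offset (bias) plus normally distributed noise (sampling error). The bias and noise values are made up for illustration; they are not properties of any real survey method.

```python
import random
import statistics

TRUE_N = 100_000_000  # true number of steelhead, from the thought experiment
STAKEHOLDERS = ["PA", "OH", "NY", "MI", "Ontario", "USFWS", "Canada FWS", "Trout Unlimited"]

def run_method_a(bias, noise_sd, rng):
    """One stakeholder's estimate: truth + systematic offset + random sampling error.

    `bias` and `noise_sd` are hypothetical values chosen only for illustration.
    """
    return TRUE_N + bias + rng.gauss(0, noise_sd)

def simulate(bias, noise_sd, seed=1):
    """Estimates from all 8 stakeholders using the same method."""
    rng = random.Random(seed)
    return [run_method_a(bias, noise_sd, rng) for _ in STAKEHOLDERS]

scenarios = {
    "accurate and precise": (0, 2_000_000),
    "accurate but not precise": (0, 30_000_000),
    "precise but not accurate": (40_000_000, 2_000_000),
}

for label, (bias, sd) in scenarios.items():
    est = simulate(bias, sd)
    print(f"{label:>26}: mean = {statistics.mean(est) / 1e6:6.1f} M, "
          f"range = {min(est) / 1e6:6.1f} to {max(est) / 1e6:6.1f} M")
```

Because every scenario reuses the same seed, the eight random draws are identical across scenarios; only the offset and the scale of the scatter change, which isolates the two effects.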


“Estimation” vs “Hypothesis testing”


Estimation: estimating the value of an unknown parameter / quantity

e.g., number of steelhead in Lake Erie, population growth rate of Allegheny County, incidence of HIV in West Africa

Goal of estimation: calculate a mean value from sample data that is accurate (the right answer) and precise (repeat samples would give nearly the same value)
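A minimal sketch of estimation, assuming ten made-up steelhead counts from randomly placed sampling points. The standard error (SE) measures precision; the rough 95% interval uses the 2SE rule of thumb rather than an exact t-based interval. Nothing computed from the sample alone can reveal bias.

```python
import math
import statistics

# Hypothetical steelhead counts from 10 randomly chosen sampling points (made-up data)
counts = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]

def estimate_mean(sample):
    """Return (mean, standard error, rough 95% interval using the 2SE rule of thumb)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, se, (mean - 2 * se, mean + 2 * se)

mean, se, (lo, hi) = estimate_mean(counts)
print(f"mean = {mean:.2f}, SE = {se:.2f}, rough 95% CI = ({lo:.2f}, {hi:.2f})")
```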


Testing: are 2 estimated values different from each other?

e.g., are steelhead more abundant in PA or NY streams? Is HIV declining over time?

Involves a statistical model, p-values, etc.
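One way to test whether two estimates differ, sketched here as a permutation test (chosen because it needs only the Python standard library; a t-test is the more common textbook choice). The PA and NY counts are made up. The test asks how often randomly reshuffling the state labels produces a difference in means at least as large as the one observed.

```python
import random
import statistics

# Hypothetical steelhead counts per stream (made-up data for illustration)
pa = [14, 18, 11, 16, 15, 19, 13, 17]
ny = [10, 12, 9, 13, 11, 8, 12, 10]

def permutation_test(a, b, n_perm=10_000, seed=42):
    """Two-sided permutation test for a difference in means.

    Shuffles the group labels many times; the p-value is the fraction of
    shuffles whose |difference| is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    k = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return observed, (extreme + 1) / (n_perm + 1)

diff, p = permutation_test(pa, ny)
print(f"difference in means = {diff:.2f}, permutation p = {p:.4f}")
```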


-Increasing sample size is the easiest way to increase precision

-Random sampling is the best way to reduce bias
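A quick simulation of the first bullet, assuming a made-up normal population (mean 50, SD 10): the spread of sample means across repeated samples, which is the standard error, shrinks roughly as 1/√n. A larger sample does nothing to remove bias, which is why random sampling is still needed.

```python
import random
import statistics

def spread_of_means(n, n_reps=2000, seed=0):
    """SD of sample means across repeated random samples of size n.

    The population is a hypothetical normal distribution (mean 50, SD 10).
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(50, 10) for _ in range(n)) for _ in range(n_reps)]
    return statistics.stdev(means)

for n in (5, 20, 80):
    print(f"n = {n:3d}: SD of sample means = {spread_of_means(n):.2f} "
          f"(theory 10/sqrt(n) = {10 / n ** 0.5:.2f})")
```

Quadrupling the sample size roughly halves the spread, so gains in precision get more expensive as n grows.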

Which value is most precise relative to the real value?


Why do randomized experiments?

Randomization deals with “confounding” variables

“randomization minimizes the influence of confounding variables, allowing the experimenter to [conclusively] isolate the effects of the treatment variable” (pg 424) and be confident about causation.


Randomized allocation of treatments in experiments can be said to “break up” the effects of “confounding variables” (pg 435); randomized sampling in observational studies reduces sampling bias but does not break the link between confounders and the explanatory variable
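A minimal sketch of how random assignment breaks the link between a confounder and the treatment. Each of 10,000 hypothetical units has a made-up confounder value z. In the "observational" case, units with higher z are more likely to end up treated (the logistic link is an assumption); in the randomized case a fair coin decides. The printed gap is the difference in mean z between treated and control units.

```python
import math
import random
import statistics

rng = random.Random(7)
N = 10_000
# Hypothetical confounder (e.g., body condition) for each unit; made-up distribution
z = [rng.gauss(0, 1) for _ in range(N)]

def gap(assign):
    """Difference in mean confounder between treated (True) and control (False) units."""
    treated = [zi for zi, a in zip(z, assign) if a]
    control = [zi for zi, a in zip(z, assign) if not a]
    return statistics.fmean(treated) - statistics.fmean(control)

# "Observational": units with higher z are more likely to end up treated (assumed logistic link)
observational = [rng.random() < 1 / (1 + math.exp(-2 * zi)) for zi in z]
# Randomized: a fair coin decides treatment, independent of z
randomized = [rng.random() < 0.5 for _ in z]

print(f"confounder gap, observational: {gap(observational):.3f}")
print(f"confounder gap, randomized:    {gap(randomized):.3f}")
```

Expect a large gap in the observational case and a gap near zero, differing from zero only by chance, in the randomized case.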


Confounding variables


Definition:

“A confounding variable is a variable that masks or distorts the causal relationship between measured variables in a study.”

Consequences of confounding:

-Biased estimation of means

-Incorrect conclusions about causation

-Can even reverse the apparent direction of an effect
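A made-up numeric example of the last bullet (often called Simpson's paradox). Units fall into two strata of a confounding variable; within each stratum the treated group has the higher mean, but most treated units sit in the "low" stratum, so pooling the strata reverses the sign of the difference. All counts and means are hypothetical.

```python
# Hypothetical summary data (made up): (number of units, mean response) per cell
data = {
    "low":  {"treated": (90, 5.0), "control": (10, 4.5)},
    "high": {"treated": (10, 8.0), "control": (90, 7.5)},
}

def pooled_mean(group):
    """Mean response for a group, ignoring the confounder strata."""
    total_n = sum(data[s][group][0] for s in data)
    total = sum(n * m for n, m in (data[s][group] for s in data))
    return total / total_n

for stratum, cells in data.items():
    diff = cells["treated"][1] - cells["control"][1]
    print(f"within '{stratum}' stratum: treated - control = {diff:+.1f}")

print(f"ignoring the confounder: treated - control = "
      f"{pooled_mean('treated') - pooled_mean('control'):+.1f}")
```

Expected output: +0.5 within each stratum, but -1.9 when the strata are pooled.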


Book Ex. of Confounding: Breastfeeding Studies

(Kramer et al 2002)

-Initial observational study: breast-fed babies weighed less @ 6 mo

-Later experimental study w/ randomization: breast-fed babies weighed more

-Confounding variables: several, including socio-economic status

“With an experiment, random assignment of treatments to participants allows researchers to tease apart the effects of the explanatory variable. With random assignment, no confounding variables will be associated with treatment except by chance” (pg 435)


Example of confounding

Made up example: parasites & fish

Research Question: Does parasite infection reduce fish health & mass?

Say we notice a lot of sickly fish in a lake

Dissection indicates that they have intestinal parasites

We sample a bunch of fish and see that mass is related to parasite load (mass ~ parasite load: more parasites, lower mass)


Aside on Causality: Proximate vs. Ultimate causes

Does temperature variation drive variation in parasite loads?

-Parasites are the proximate cause

-Temperature change is the ultimate cause (the real driver)

-If lake temp drops, parasite abundance goes down, and fish health improves


Alternative hypothesis:

Temperature stress impacts fish health & immune system

Fish with compromised immune systems are more likely to acquire parasites

Parasites are not causing poor health; poor health is resulting in parasite infection


KEY: An observational study, even one that uses random sampling, would have great difficulty determining the right answer

Only some kind of experiment could sort this out

OR a long-term study following individual fish over time
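A minimal sketch of the random allocation step for such an experiment, assuming a hypothetical lab setup with 20 tagged, uninfected fish, half to be exposed to parasites and half kept as controls (the IDs, group names, and sizes are made up). Shuffling the whole list and then dealing fish out in turn gives every fish the same chance of landing in each group while keeping the groups equal in size.

```python
import random

def allocate(fish_ids, treatments, seed=None):
    """Randomly assign fish to treatments in (near-)equal numbers.

    Shuffling the full list, then dealing IDs out in turn, gives every fish
    the same chance of landing in each treatment.
    """
    rng = random.Random(seed)
    ids = list(fish_ids)
    rng.shuffle(ids)
    groups = {t: [] for t in treatments}
    for i, fish in enumerate(ids):
        groups[treatments[i % len(treatments)]].append(fish)
    return groups

# Hypothetical design: 20 tagged lab fish, half exposed to parasites, half kept as controls
groups = allocate(range(1, 21), ["parasite-exposed", "control"], seed=11)
for t, members in groups.items():
    print(t, sorted(members))
```

Because health cannot influence which fish get infected, a later mass difference between the groups would point to the parasites themselves; under the alternative hypothesis, no difference would be expected.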


Problem w/ Experiments: Experimental Artifacts


Definition: “An experimental artifact is a bias in a measurement produced by unintended consequences of experimental procedures” (pg 425)


What experimental artifacts could occur with exclusion experiments?

-Turkey exclusion (Chips et al 2014)

-????