MT5762 Lecture 3

C. Donovan

Sampling biases [leftover from Tues]

These usually arise because we did not sample the population we thought we were sampling, or because our means of measuring alters the result.

  • Just bad sampling design: part of the population is not represented as intended, e.g. not proportionately
  • Questionnaire/questioner biases: questions are leading/misleading, interviewer intimidates/influences subjects
  • Non-response/response biases: people lie, forget, get bored, or refuse to participate
  • Self-selection biases: unselected people choose to participate - what population is this?
  • Survivorship biases: only selecting survivors/winners, but inferring to a larger population

Cause and effect, observation and experimentation

This section is intended to:

  • contrast randomized designed experiments with observational studies,
  • introduce some terminology of experiments,
  • discuss some principles of good experiments.

Terminology of Experiments

Jargon

  • Experimental units: the individuals on which the experiment is done
  • Subjects: experimental units that are human beings
  • Treatment: a specific experimental condition that is applied to the units
  • Factors: explanatory variables in an experiment
  • Response variable: the variable being measured on the units after receiving a treatment

Jargon

  • Placebo: a non-effective treatment that should be similar to effective treatments in all other aspects, e.g. a saline injection versus a drug injection.
  • Control group: experimental units that receive no treatment (although they may receive a non-effective treatment/placebo)
  • Confounding variable/factor: a possible cause of any effects detected other than the treatment under study

Example - acupuncture

  • There have been many studies about the efficacy of acupuncture.
  • Some studies consist of comparing:
    • Standard treatments
    • Placebo acupuncture
    • Standard acupuncture
    • No treatment
  • Why is it done this way, and what does this mean in practice?

Example - acupuncture

For example, in a pain relief study:

  • Needles were applied to subjects using acupuncture principles
  • Needles were applied to subjects somewhat randomly
  • Subjects were given ordinary pain relief
  • Subjects were untreated

It was found that there was a needle effect, but you could just stick the needles in without reference to the acupuncture charts and achieve the same effect.

Signal versus noise

I'll return to this periodically - I believe it is a very useful way to think of all statistical analysis….

Experimentation versus observation

We are often interested in cause-and-effect.

There are two broad types of studies with different abilities to demonstrate causes. The primary distinction between experimental and observational studies is the level of control we have over the speculative cause. Specifically:

  • in an experiment the units (individuals, subjects) are assigned a treatment,
  • in an observational study the units simply have the treatment (either naturally or by choice)
  • observational studies are generally retrospective - the 'treatment' was imposed before we decided to do our study

Observation vs experiment

However, the general purposes of an experiment and an observational study are similar:

  • to determine whether a treatment, or experimental condition, has an effect on an experimental unit (e.g., human subject, animal),
  • to estimate the magnitude of any effect.

Examples

How might the following be investigated?

  • Does growing up with a dog increase a child's immunity to allergens?
  • Does drinking alcohol in moderation lower the risk of heart attacks?
  • Does drinking coffee delay the onset of Alzheimer's disease?
  • Does eating meat sausages “give you cancer”?

Experiments

Advantages of Randomized Experiment

Experiments

  • The primary advantage of a (properly designed) randomized experiment over an observational study is the ability to argue for causation.
  • A randomized experiment generally makes the comparison groups homogeneous (specifically equally representative of the population of interest)
  • therefore differences in the "long-run" average responses should be largely due to any treatment we decide to apply.
  • observational studies have 'treatment' groups that are less likely to be homogeneous (in many respects they are self-selected!)
  • you are less protected against confounding factors in an observational study
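
A small simulation can illustrate why randomisation tends to give comparable groups while self-selection does not. This is only a sketch under made-up assumptions: the population of ages, the opt-in rule (younger people opt in more often), and the names `mean_age_gap`, `randomized_gap` and `self_selected_gap` are all hypothetical and not taken from any real study.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 2000 people with ages spread evenly over 20-80.
ages = [rng.uniform(20, 80) for _ in range(2000)]

def mean_age_gap(ages, assign):
    """Mean age of the treated group minus mean age of the control group."""
    flags = [assign(age) for age in ages]  # decide each person's group once
    treated = [a for a, f in zip(ages, flags) if f]
    control = [a for a, f in zip(ages, flags) if not f]
    return statistics.mean(treated) - statistics.mean(control)

# Randomised experiment: a coin flip decides the group, ignoring age.
randomized_gap = mean_age_gap(ages, lambda age: rng.random() < 0.5)

# Observational study (made-up rule): the younger you are, the more
# likely you are to choose the "treatment" yourself.
self_selected_gap = mean_age_gap(ages, lambda age: rng.random() < (80 - age) / 60)

print("age gap, randomised:   ", round(randomized_gap, 1))
print("age gap, self-selected:", round(self_selected_gap, 1))
```

Under these assumptions the randomised groups should have nearly the same mean age, while the self-selected "treatment" group is much younger. Age is then a potential confounder in the observational comparison, but not in the randomised one.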

Principles of good experiments

Randomization

  • experimental units are randomly assigned to particular treatments
  • use a random number generator
  • the different treatment groups should be relatively similar or homogeneous in all regards except for the different treatments each receives - randomisation eliminates systematic differences between the groups on average.
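
Random assignment with a random number generator can be sketched in a few lines. The helper `randomize`, the subject labels and the four arm names (loosely based on the acupuncture example) are illustrative assumptions, not a prescribed procedure.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the treatments in turn, like cards.
    k = len(treatments)
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

subjects = [f"S{i:02d}" for i in range(1, 21)]  # hypothetical subject labels
arms = ["acupuncture", "random needles", "pain relief", "untreated"]
groups = randomize(subjects, arms, seed=1)

for arm, members in groups.items():
    print(arm, members)
```

Shuffling and then dealing keeps the group sizes equal, which a separate coin flip per subject would not guarantee.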

Replication

  • Multiple experimental units receive each treatment (obvs?)
  • How many? Not clear, but more is better:
    • estimates of treatment effects become increasingly precise as the number of replicates increases;
    • replicates allow one to estimate the size of chance error (natural variation)
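
The gain in precision from replication can be seen in a simulation. This is a sketch assuming normally distributed responses with a made-up true effect of 1 and standard deviation of 2; the function names and all numbers are hypothetical. Each simulated experiment estimates the treatment effect as a difference in group means, and the spread of those estimates is compared across group sizes.

```python
import random
import statistics

def estimated_effect(n, true_effect, sd, rng):
    """One simulated experiment with n replicates per group."""
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

def spread_of_estimates(n, true_effect=1.0, sd=2.0, n_experiments=2000, seed=0):
    """Standard deviation of the effect estimate over many repeated experiments."""
    rng = random.Random(seed)
    estimates = [estimated_effect(n, true_effect, sd, rng)
                 for _ in range(n_experiments)]
    return statistics.stdev(estimates)

spreads = {n: spread_of_estimates(n) for n in (5, 20, 80)}
for n, s in spreads.items():
    print(f"{n:3d} replicates per group: spread of estimate = {s:.2f}")
```

Under these assumptions the spread should roughly halve each time the number of replicates is quadrupled, in line with the standard error of a difference in means, sd × sqrt(2/n).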

Reduction of variance

This helps increase precision. Do this by:

  • control for known sources of variation - partition the experimental units into homogeneous groups or blocks (also called blocking, or pairing, or matching).
  • e.g. age affects the recommended heart rate for aerobic training, thus if doing an experiment comparing exercise regimes, group the subjects by age classes then randomly assign treatments within each group. (Note: similar to stratified random sampling.)

Unexplained variability in the response is reduced, yielding more precision in the estimate of treatment effect.
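
Randomising within blocks might look like the sketch below, following the age-class example. The age cut-offs, the subject list and the names `age_class` and `block_randomize` are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def age_class(age):
    """Hypothetical age classes used as blocks."""
    if age < 40:
        return "under 40"
    if age < 60:
        return "40-59"
    return "60+"

def block_randomize(subjects, treatments, seed=None):
    """Group subjects into age blocks, then randomise treatments within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for name, age in subjects:
        blocks[age_class(age)].append(name)
    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)  # random order within the block
        for i, name in enumerate(members):
            assignment[name] = (block, treatments[i % len(treatments)])
    return assignment

# Made-up subjects: (label, age)
subjects = [("S01", 23), ("S02", 35), ("S03", 31), ("S04", 28),
            ("S05", 45), ("S06", 52), ("S07", 58), ("S08", 41),
            ("S09", 63), ("S10", 70), ("S11", 66), ("S12", 61)]

assignment = block_randomize(subjects, ["regime A", "regime B"], seed=7)
for name, (block, treatment) in sorted(assignment.items()):
    print(name, block, treatment)
```

Every age class ends up with the same number of subjects on each regime, so age cannot differ systematically between the treatment groups.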

Experiments with human subjects

  • Use of placebos for a control group
  • Double blinding

Example

Mozart effect: “Researchers have reported that college students who listened to the first ten minutes of Mozart's Sonata for Two Pianos in D Major subsequently scored significantly higher on a spatial-temporal task than after listening to ten minutes of progressive relaxation instructions, silence, a story, Philip Glass' ‘Music With Changing Parts’, or British-style trance music” - Rauscher & Shaw, Perceptual and Motor Skills, 1998, 86, 835-841

Is this an observational study or an experiment? Why? What are the explanatory and response variables?

Remedies for confounding variables

Confounding fixes

  • if it is an experiment, design it to avoid confounding factors.
  • use matching: make comparisons between treatment and control groups with similar levels of the confounding variable. For example, when comparing non-drinkers, moderate drinkers, and heavy drinkers, make comparisons within the same sex (males, females) and the same age group (<50, 50-59, 60+)
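
To see why comparisons within matched groups matter, here is a sketch with entirely made-up counts (not real data). For brevity it uses only age groups and two drinking categories, whereas the example above also matches on sex. The names `counts`, `crude_rate` and `stratum_rate` are hypothetical.

```python
# Made-up illustrative counts:
# (age group, drinking status) -> (number of people, number of heart attacks)
counts = {
    ("<50", "non-drinker"): (400, 8),
    ("<50", "moderate"): (100, 2),
    ("50-59", "non-drinker"): (250, 15),
    ("50-59", "moderate"): (250, 15),
    ("60+", "non-drinker"): (100, 10),
    ("60+", "moderate"): (400, 40),
}

def crude_rate(status):
    """Heart attack rate for a drinking status, ignoring age."""
    people = sum(n for (age, s), (n, e) in counts.items() if s == status)
    events = sum(e for (age, s), (n, e) in counts.items() if s == status)
    return events / people

def stratum_rate(age_group, status):
    """Heart attack rate for a drinking status within one age group."""
    people, events = counts[(age_group, status)]
    return events / people

for status in ("non-drinker", "moderate"):
    print(f"crude rate, {status}: {crude_rate(status):.3f}")

for age_group in ("<50", "50-59", "60+"):
    rates = {s: stratum_rate(age_group, s) for s in ("non-drinker", "moderate")}
    print(age_group, rates)
```

In these invented numbers the moderate drinkers are older on average, so the crude comparison suggests drinking raises the risk, yet within every age group the two rates are identical. The apparent effect comes from the confounder (age), not from the treatment.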

Arguing causation based on observational studies

So is it possible to argue causation from an observational study? Criteria of the Surgeon General of the US (p. 27 of Wild & Seber, 2000).

  • a strong relationship exists (e.g., the incidence rate of a disease for the control group is 4 times that for the treatment group)
  • a strong research design (e.g., matching, controlling for confounding factors was done to the degree possible)
  • temporal relationship (cause precedes effect; e.g., in contrast to some of the subjects in the Wakefield MMR study)
  • dose-response relationship (e.g., the incidence rate of a disease increases as the level of the dose increases)
  • reversible association (e.g., if treatment is stopped, incidence rate of a disease decreases)
  • consistency (e.g., study after study by different investigators with different subjects at different places and times produces similar results)
  • biological plausibility (e.g., that the treatment does have an effect isn't at odds with current understanding of the underlying biological processes)
  • coherence with known facts

Recap and look-forwards

We've covered:

  • Further terminology
  • The differences between experiments and observational studies and why experiments are superior for establishing cause and effect
  • Identifying potential confounders (which almost always exist in observational studies) and how we might avoid them

Next:

  • Classes of data, how we treat them

Fin

...