One of the central concepts of statistics states: “Correlation does not imply causation”. This notion is fundamental not only to understanding regression but also to experimental design. While regression analysis (linear, logistic, multivariate, etc.) can be extremely useful for analyzing data and identifying potential trends, experiments can provide reliable evidence that supports a causal relationship between two variables. The basic outline of experimental design includes an explanatory variable and a response variable, randomization, control, and replication. When properly implemented, these principles increase the reliability and validity of the results and allow the experimenter to exclude confounding or lurking variables as an alternative explanation.
Studies that do not follow experimental design principles are labeled observational studies, in which researchers observe the effects of a risk factor or treatment without intervening. These are particularly useful in fields such as epidemiology and psychology. Studies that fall between the two are termed quasi-experimental designs. Whether one is appropriate depends on the research question of interest, but these kinds of studies can also be useful for validating treatment methods or investigating potential associations on a more exploratory basis.
The aim of an experiment is to study the relationship between two variables. Following the scientific method, experiments are designed to test hypotheses formulated from theories and observations of the natural world. Sound experimental design is crucial to producing the accurate, reliable data needed to infer a causal relationship between the two variables. Of the two, the independent (explanatory) variable is the one that generates a change in the dependent (response) variable.
In an experiment, the independent variable (shown on the x-axis) is the manipulated element, while the dependent variable (shown on the y-axis) is measured. The graph on the next tab displays a common hypothetical experiment in which the researcher provides various amounts of fertilizer to a type of plant to observe how this affects plant height. In this case, the hypothesis would be that more fertilizer causes this type of plant to grow taller. By keeping other variables constant (e.g. amount of water, exposure to sunlight, amount and type of soil, species of plant), the researcher can infer that adding more fertilizer causes plants to grow taller, instead of only observing that the amount of fertilizer is positively associated with plant height.
In addition, experiments may test multiple independent variables (IVs) and measure even more dependent variables (DVs), since one IV may influence many different things. For the sake of simplicity, only one IV and one DV are shown in the graph.
Controlling for other factors is essential to reducing the possibility of alternative explanations for the results, which are termed confounding variables. For instance, if the plants were not watered the same amount, then one plant receiving more water could explain its greater height rather than the amount of fertilizer. Furthermore, confounders can affect IVs, DVs, or both at once. Thus, what appear at first to be clear experimental results can quickly turn into a muddled outcome.
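To make the watering example concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the function names (`pearson`, `grow_plants`), the coefficients, and the assumption that well-watered plants also happen to receive more fertilizer are chosen only for illustration. In this toy model fertilizer has no effect on height at all, yet the two are strongly correlated whenever water is left uncontrolled.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def grow_plants(n, control_water, seed):
    """Simulate n plants; return (fertilizer amounts, plant heights)."""
    rng = random.Random(seed)
    fertilizer, height = [], []
    for _ in range(n):
        # Water is the confounder: fixed when controlled, free to vary otherwise.
        water = 1.0 if control_water else rng.uniform(0.5, 1.5)
        # Assumption: plants that get more water also tend to get more fertilizer.
        fertilizer.append(10 * water + rng.gauss(0, 1))
        # Height depends on water only; fertilizer has NO effect in this toy model.
        height.append(20 * water + rng.gauss(0, 1))
    return fertilizer, height


r_uncontrolled = pearson(*grow_plants(2000, control_water=False, seed=1))
r_controlled = pearson(*grow_plants(2000, control_water=True, seed=1))

print(f"water uncontrolled: r = {r_uncontrolled:.2f}")  # strongly positive
print(f"water held constant: r = {r_controlled:.2f}")   # close to zero
```

Holding water constant removes the apparent fertilizer effect, which is the point of controlling for other factors: the correlation in the uncontrolled run came entirely from the confounder.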
Plot: Variables = Source: https://explorable.com/independent-variable
Diagram: Confounders = Source: https://explorable.com/confounding-variables
Plot: Spurious Correlations = Source: The Truthful Art, Chapter 9, pg. 235
In order to determine whether experimental results are valid, researchers must follow several design principles: control, randomization, and replication.
CONTROLS AND BIASES
Think back to the mention of confounding variables. How do we reduce or eliminate their effect? By controlling for the effects that lurking variables can have on the DV. Failing to control properly can introduce several biases. The most prominent is experimenter bias, which occurs when the researcher unconsciously influences the results or data in favor of certain outcomes. Another bias, common in medical settings, is the placebo effect, which occurs when patients respond to a treatment they believe will have a positive effect even though no active treatment has actually been given.
In addition, the reverse may affect the results: if participants suspect that they received no actual treatment, that suspicion could change their behavior, and the efficacy of the treatment could not be accurately evaluated. To guard against these problems, double-blinding is a common and recommended method in which neither the researchers nor the participants know which group each participant belongs to.
RANDOMIZATION
Relying on personal judgment to eliminate one’s own subjective bias is not a reliable method. Thus, randomization is one of the key features that academic journals expect of an experiment. In addition, it is crucial to practice random sampling when selecting subjects for an experiment. Failure to do so means that conclusions cannot be generalized to the wider population, since the sample does not accurately reflect it. For instance, if only participants under the age of 18 are used to test a medication, then the results could not be generalized to the adult population, especially since children are known to react differently to medication dosages because of differences in hormone levels, age, or weight.
Two common randomized experimental designs are detailed below:
In a completely randomized design, items or participants are assigned to groups completely at random using objective methods such as random number generators. One common practice is to label each subject and then assign treatments using a table of random numbers, or to run the subject labels through a computer program that selects a random sample. Imagine a simple medical experiment: a company wishes to test the efficacy of a new vaccine and randomly selects 1000 people from across the US to test it. Subjects are randomly assigned to either the placebo or the vaccine treatment. A table of this design is shown on the next tab.
Note: While the participants are randomly distributed between the two treatments, observe that the number of participants per treatment is equal. In experiments, sample size influences the variability of the results, so it is important to keep the treatment groups roughly the same size. In statistics, as the sample size increases, the variability of the group average decreases because individual differences average out. Essentially, with larger samples it is less likely that the outcome is due to random chance.
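The random assignment described above can be sketched in a few lines. This is only one possible approach, and the function name `completely_randomized`, the subject labels 1 to 1000, and the seed are illustrative assumptions: the labels are shuffled and then dealt out to the treatments in turn, which keeps the group sizes equal.

```python
import random
from collections import Counter


def completely_randomized(subject_ids, treatments, seed=None):
    """Assign each subject to a treatment at random, with balanced group sizes."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # random order removes any pattern in the labels
    # Deal shuffled subjects out in turn so group sizes differ by at most one.
    return {
        sid: treatments[i % len(treatments)]
        for i, sid in enumerate(shuffled)
    }


# Hypothetical labels 1..1000 for the vaccine example.
assignment = completely_randomized(range(1, 1001), ["placebo", "vaccine"], seed=42)
counts = Counter(assignment.values())
print(counts["placebo"], counts["vaccine"])  # prints "500 500"
```

Passing a seed makes the assignment reproducible, which helps when the allocation needs to be audited later; leaving it as `None` gives a fresh random assignment each run.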
In a randomized block design, the experimenter first distributes subjects into subgroups, called blocks, chosen so that subjects within a block are more alike than subjects in different blocks. Then the subjects within each block are randomly assigned to treatment conditions. For example, suppose a sample of 1000 participants is part of an experiment testing the efficacy of a vaccine. The experimenters decide to split the sample into two subgroups: male and female. The participants are assigned to blocks based on gender, and within each block the 500 participants are randomly split between treatments (placebo or vaccine). A table of this design is shown on the next tab.
Note: Gender is commonly used to split participants into blocks since it has been repeatedly observed that men and women have differences in physiology as well as reaction to medication. This particular design explicitly removes gender as a source of variability and as a confounder.
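Blocking changes only where the randomization happens: subjects are grouped by the blocking variable first, and the shuffle is done separately inside each block. The sketch below assumes a hypothetical list of `(subject_id, gender)` pairs with 500 subjects per gender, and the function name `randomized_block` is invented for illustration.

```python
import random
from collections import Counter, defaultdict


def randomized_block(subjects, treatments, seed=None):
    """subjects: iterable of (subject_id, block_label) pairs.

    Returns {subject_id: (block_label, treatment)} with treatments
    balanced inside every block.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for sid, block in subjects:
        blocks[block].append(sid)

    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)  # randomize only within this block
        for i, sid in enumerate(members):
            assignment[sid] = (block, treatments[i % len(treatments)])
    return assignment


# Hypothetical sample: subjects 1..500 are male, 501..1000 are female.
subjects = [(i, "male" if i <= 500 else "female") for i in range(1, 1001)]
assignment = randomized_block(subjects, ["placebo", "vaccine"], seed=7)
cell_counts = Counter(assignment.values())
for cell in sorted(cell_counts):
    print(cell, cell_counts[cell])  # each of the four cells holds 250 subjects
```

Because every gender contributes equally to both treatments, a difference between the placebo and vaccine groups cannot be explained by one group simply containing more men or more women.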
REPLICATION
After an experiment is designed following the principles above and the results are analyzed, we must still be cautious about drawing strong conclusions from a single experiment, no matter the statistical significance shown. For instance, the “success” of one experiment may still be due to random chance, bias, or confounders, even when controlling for other variables. Thus, replication shows whether a treatment is actually effective over time through the long-term averaging effect of multiple experiments. It also reduces experimenter bias, since the original researchers would not be involved in the follow-up experiments.
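A short simulation illustrates why a single significant result is weak evidence. The sketch below makes several simplifying assumptions that are not part of the text above: the treatment has no true effect, outcomes are normally distributed with a known standard deviation of 1, and significance is judged with a two-sided z-test at the 0.05 level. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def p_value_one_experiment(rng, n_per_group, true_effect=0.0):
    """Two-sided z-test p-value for one simulated two-group experiment."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_group)]
    diff = sum(treated) / n_per_group - sum(control) / n_per_group
    # Standard error of the difference in means when both sds are known to be 1.
    z = diff / (2.0 / n_per_group) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments, n_per_group, alpha=0.05, seed=0):
    """Share of null experiments that look 'significant' purely by chance."""
    rng = random.Random(seed)
    hits = sum(
        p_value_one_experiment(rng, n_per_group) < alpha
        for _ in range(n_experiments)
    )
    return hits / n_experiments


rate = false_positive_rate(n_experiments=2000, n_per_group=50)
print(f"'significant' results with no real effect: {rate:.1%}")  # near 5%
```

Under these assumptions about one null experiment in twenty appears to succeed. If an independent replication is also required to reach significance, the chance that both do so by luck alone falls to roughly 0.05 × 0.05 = 0.0025.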
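Completely randomized design:

| Treatment | Count |
|---|---|
| Placebo | 500 |
| Vaccine | 500 |

Randomized block design:

| Gender | Treatment | Count |
|---|---|---|
| Male | Placebo | 250 |
| Male | Vaccine | 250 |
| Female | Placebo | 250 |
| Female | Vaccine | 250 |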
This is an observational study of the number of US citizens who were tested for COVID-19 between October and December 2020. It is not a controlled experiment but rather data pulled from a survey. The data are useful for analyzing trends in behavior and in the number of COVID-19 tests performed, but no variable was manipulated here. These types of studies are conducted when the independent variable cannot be controlled by the experimenter because of ethical or logistical constraints.
Here’s a glimpse (first 10 entries) of the data that was plotted:
As Alberto Cairo mentions in Chapter Two of The Truthful Art, successful visualizations have five qualities. They are:
- Truthful: based on thorough and reliable data conducted by honest research
- Functional: accurately displays the data in a meaningful way
- Beautiful: follows aesthetic design principles to make the graph easy to interpret and accessible to all audiences (scientists as well as the general public)
- Insightful: reveals associations that would otherwise be hard to identify
- Enlightening: persuades the viewer to interpret the data in the way it was intended and provides insight
As statisticians and researchers, we have an obligation not to deliberately misinterpret data or present it in a way that sends a misleading message. Statistics are powerful. To condense all the natural phenomena that occur every day and make sense of the universe, we categorize our observations to form logical conclusions and develop theories about them. To support specific hypotheses derived from those theories, we conduct experiments that test the validity of those hypotheses and the relationship between two variables. Experimental results and statistical analyses (which help determine the significance and validity of the results) inform our decisions in society, from policy making to corporations making their next business move.
Furthermore, we also have a responsibility to make accurate and pleasing visualizations accessible to the public in both form and function. For instance, we must create graphs that are interpretable by colorblind viewers and adjust the content and format of graphics to the intended audience, whether it be academic, political, corporate, or public. Copious amounts of data are uploaded and displayed on the web every day, but it is another matter to parse through that data. When the data is incomplete or there were constraints on the study design, it is appropriate for statisticians and scientists to acknowledge these limitations while illustrating potential associations and future pathways to investigate.
Reuters published a graph that flipped the y-axis to persuade readers that the “stand your ground” law enacted in Florida in 2005 was a reasonable decision, when in actuality the graph shows that the number of deaths due to firearms spiked in the following few years. While there is still debate as to whether this law directly led to increased gun deaths, the purpose of this graphic was to deliberately misinform readers. In a technological age where things can be shared with thousands or millions of people in a matter of seconds, it is essential to provide informative and accurate visualizations in public and accessible places like the internet and the news.
Source: https://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2
Mark Twain popularized the phrase “lies, damned lies, and statistics” as a reminder of the persuasive power of statistics and the ways in which data can be manipulated. While we should remain cautious about believing graphics at first glance, visualizations remain an essential part of informing us about the world and making sense of complex trends.