Guide: Exploratory Analysis |
|
This guide outlines basic analytic graphics and principles for exploratory analysis. While no analysis can provide definitive answers, the goal is to develop a plausible and testable hypothesis. Keep this checklist handy during your analysis process.
Considering a competing hypothesis will often strengthen testing of the original hypothesis to 1) Eliminate alternative explanations 2) critically evaluate strengthen hypothesis with robust testing 3) Avoid a baked analysis which reinforces bias and produces what one wishes
Ideally one should formulate plausible alternative hypothesis and design experiments to test both and weigh evidence e.g. “plants grow faster in red light” should be tested against “plants grow faster in day light” e.g. “air purifier improves athsma symptoms” should be tested against control group with no purifier
Use Bayesian reasoning - update the probability of a hypothesis being true based on new evidence. Testing competing hypothesis allows for more comparitive evaluation of evidence - adjusting the likelihood of your hypothesis being true relative to others.
Every hypothesis should be backed by an explination or ratonale. A well founded hypothesis emerges from existing theory, research or observations. Providing a foundation for a hypothesis provides justification for why it should be investigates. Dredging produces artificial hypothesis which seem pulled out of thin air.
Explanation provides further hooks to test and refute hypothesis and opens avenues for further research. e.g.) “Airpurifier improves Athsma symptoms”: why? “reduces particulate matter which effects lungs” this should be tested along with original results.
Multivariate analysis is preferable to analysing single correlation between two separate variables since both variables are likely influenced by other variables - known or suspected.
We should always attempt to test a hypothesis in isolation: multivariate analysis permits more control over confounding variables. We can try to isolate a specific effect on the dependent variable due to the independent variable. e.g.)instead of comparing treatment survival rates, we should compare treatment survival rates over ages - to remove the confounding variable of age. Separating and analysing different effects on the independent variable lead to a better understanding and more transferrable/repeatable studies.
e.g.) false positive correlation: Intelligence increases with shoe size—this is likely due to age as the confounding variable.
e.g.) false negative correlation: mortality rates seen to decrease air pollutants in areas where old folks pick up illness in winters and pollutants rise in summer due to wild fires
Don’t test multiple models without adjusting for multiple comparisons. Overworking a dataset increases the likelihood of finding false positives. This is especially important in legal or medical contexts where wrongful conclusions, like a potentially unsafe conviction, can arise from faulty analysis.
Avoid letting your tools constrain your analysis. Use words, numbers, images, and diagrams as needed to present your ideas effectively. Combining different forms of evidence helps communicate findings more clearly.
Reproducibility is key in research. Be transparent, use appropriate scales, reference sources, and ensure that your analysis is documented in a way that others can replicate. Make your scripts and methodologies available for verification.
Never try to polish bad content. Poor quality data or ideas will undermine any study, no matter how much effort you put into presentation or tools.