M. Drew LaMar
January 22, 2016
“You can't fix by analysis what you bungled by design.”
- Light, Singer and Willett
K. Batygin and M. E. Brown, Astron. J. 151, 22 (2016)
“Orbital calculations suggest that Planet Nine, if it exists, is about ten times the mass of Earth and swings an elliptical path around the Sun once every 10,000–20,000 years.”
- Nature
Reddit /r/science discussion
https://www.reddit.com/r/science/comments/41up8u/astronomers_have_announced_the_potential/
The Six Divisions of Greater Data Science
What's missing?
(don't include #6 in your discussion)
“Modern statisticians are familiar with the notion that any finite body of data contains only a limited amount of information on any point under examination; that this limit is set by the nature of the data themselves, and cannot be increased by any amount of ingenuity expended in their statistical examination: that the statistician's task, in fact, is limited to the extraction of the whole of the available information on any particular issue.”
- R. A. Fisher (biologist!)
For your question, there is desired and undesired information in your data.
Goals:
Get accurate information by reducing bias
Get precise information by reducing sampling error due to random variation (increase signal-to-noise ratio)
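The two goals above can be told apart with a small simulation. This is a minimal sketch, not part of the original slides: the true mean, the size of the bias, the noise level, and the function names `simulate_estimates` and `summarize` are all assumptions chosen for illustration. It repeats a hypothetical study many times and reports how far the estimates sit from the truth on average (bias, which affects accuracy) and how much they scatter from study to study (sampling error, which affects precision).

```python
import random
import statistics

def simulate_estimates(true_mean, bias, noise_sd, sample_size, n_studies, seed=0):
    """Simulate n_studies studies and return the sample mean from each.

    Every measurement is drawn from a normal distribution centered on
    true_mean + bias, so `bias` models a systematic measurement error
    (hypothetical; not taken from the slides).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean + bias, noise_sd) for _ in range(sample_size)]
        estimates.append(statistics.fmean(sample))
    return estimates

def summarize(estimates, true_mean):
    """Return (average error, study-to-study spread) of the estimates."""
    average_error = statistics.fmean(estimates) - true_mean  # accuracy
    spread = statistics.stdev(estimates)                     # precision
    return average_error, spread

TRUE_MEAN = 50.0  # assumed "true" population value

# Biased measurements, small sample per study
small_biased = summarize(simulate_estimates(TRUE_MEAN, 5.0, 10.0, 10, 1000), TRUE_MEAN)
# Same bias, ten times the sample size per study
large_biased = summarize(simulate_estimates(TRUE_MEAN, 5.0, 10.0, 100, 1000), TRUE_MEAN)
# Large sample and no bias
large_unbiased = summarize(simulate_estimates(TRUE_MEAN, 0.0, 10.0, 100, 1000), TRUE_MEAN)

for label, (err, spread) in [
    ("biased, n=10", small_biased),
    ("biased, n=100", large_biased),
    ("unbiased, n=100", large_unbiased),
]:
    print(f"{label:>16}: average error = {err:6.2f}, spread = {spread:5.2f}")
```

Under these assumptions, a larger sample shrinks the spread but leaves the average error where it was. Collecting more data improves precision; only a better design removes bias.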
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”
- John Tukey
“The aim … is to provide a clear and rigorous basis for determining when a causal ordering can be said to hold between two variables or groups of variables in a model…”
- H. Simon
Q1.1 If we wanted to measure the prevalences of left-handedness and religious practices among prison inmates, what population would we sample from?
Q1.2 If we find that two people in our sample have been sharing a prison cell for the last 12 months, will they be independent sample units?
Q1.3 Humans have tremendous variation in the patterning of grooves on our fingers, allowing us to be individually identified by our fingerprints. Why do you think there is such variation?
Q1.4 If we are interested in comparing eyesight between smokers and non-smokers, what other factors could contribute to variation between people in the quality of their eyesight? Are any of the factors you have chosen likely to be related to someone's propensity to smoke?
Q1.5 Faced with two flocks of sheep 25 km apart, how might you go about measuring sample masses in such a way as to reduce or remove the effect of time-of-measurement as a confounding factor?
“Designing effective experiments needs thinking about biology more than it does mathematical calculations.”
“Experimental design is about the biology of the system, and that is why the best people to devise biological experiments are biologists themselves.”
- Ruxton & Colegrave
Whitlock & Schluter, Chapter 1 (PDF will be posted on Blackboard today)