Intro to Statistical Modeling
January 30, 2023
Quote: “A concern with truth enlivens these models, but just like a golem or a modern robot, scientific models are neither true nor false, neither prophets nor charlatans. Rather they are constructs engineered for some purpose. These constructs are incredibly powerful, dutifully conducting their programmed calculations.”
Golems…
Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting, drawing inference from, and presenting empirical data.
Definition: A parameter is a quantity describing a population, whereas an estimate or statistic is a related quantity calculated from a sample.
Parameter examples: Averages, proportions, measures of variation, and measures of relationship
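The distinction between a parameter and an estimate can be sketched with a small simulation. Everything here is an illustrative assumption, not course data: the population is simulated, and its mean of 98.2 and standard deviation of 0.7 are made-up values.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 body temperatures (assumed mean 98.2, sd 0.7).
population = [rng.gauss(98.2, 0.7) for _ in range(100_000)]

# Parameter: describes the whole population (rarely observable in practice).
mu = statistics.fmean(population)

# Estimate (statistic): the related quantity calculated from a sample.
sample = rng.sample(population, 100)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"parameter mu   = {mu:.3f}")
print(f"estimate x_bar = {x_bar:.3f} (sample sd = {s:.3f})")
```

The estimate lands near the parameter but not exactly on it, and a different sample would give a different estimate.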
Quote: Nearly every branch of science relies upon the senses of statistical golems. In many cases, it is no longer possible to even measure phenomena of interest, without making use of a model. To measure the strength of natural selection or the speed of a neutrino or the number of species in the Amazon, we must use models. The golem is a prosthesis, doing the measuring for us, performing impressive calculations, finding patterns where none are obvious.
Definition: Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual, compared to what we would expect to see if the null hypothesis were true, then the null hypothesis is rejected.
Definition: A null hypothesis is a specific statement about a population parameter made for the purpose of argument.
Definition: The alternative hypothesis includes all other feasible values for the population parameter besides the value stated in the null hypothesis.
Example: The average human body temperature is 98 degrees Fahrenheit.
\[ H_{0}: \mu = 98 \\ H_{A}: \mu \neq 98 \]
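A minimal sketch of how such a test could be carried out, assuming a large sample so that a normal approximation to the test statistic is reasonable (a t-test would be the more usual choice for small samples). The function name `z_test_mean` and the simulated data (true mean 98.6, sd 0.7) are hypothetical choices for illustration only.

```python
import math
import random
import statistics

def z_test_mean(sample, mu0):
    """Two-sided large-sample test of H0: mu = mu0 (normal approximation)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = (x_bar - mu0) / se                         # how unusual is x_bar under H0?
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value

rng = random.Random(42)
# Hypothetical data: 100 temperatures from a population whose true mean is 98.6.
sample = [rng.gauss(98.6, 0.7) for _ in range(100)]

z, p = z_test_mean(sample, mu0=98)
print(f"z = {z:.2f}, p = {p:.4g}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The 0.05 cutoff is a convention, not something the data supply.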
Figure: Twenty different samples of 100 measurements each, shown with 95% confidence intervals. The null hypothesized value is 98. How many would reject the null?
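One way to explore this question is by simulation. The sketch below draws twenty samples of 100 measurements, builds an approximate 95% confidence interval for each (using the normal critical value of about 1.96 as a large-sample assumption), and counts how many intervals exclude 98. The true means and the standard deviation of 0.7 are assumed values, since the data behind the figure are not given here.

```python
import math
import random
import statistics

def count_rejections(true_mean, mu0=98.0, sd=0.7, n=100, n_samples=20, seed=0):
    """Count samples whose approximate 95% CI for the mean excludes mu0."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    rejections = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        half_width = z_crit * statistics.stdev(sample) / math.sqrt(n)
        # Reject H0 when the null value falls outside the interval.
        if not (x_bar - half_width <= mu0 <= x_bar + half_width):
            rejections += 1
    return rejections

# If the null is true, roughly 1 in 20 intervals is expected to miss 98.
print("null true (mu = 98):   ", count_rejections(true_mean=98.0))
# If the true mean is far from 98, nearly every interval excludes it.
print("null false (mu = 98.6):", count_rejections(true_mean=98.6))
```

Even when the null hypothesis is exactly true, some samples will reject it by chance; that is what the 5% error rate means.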
Quote: “Science is not described by the falsification standard.”
Example: \(H_{0}\): The Ivory-billed Woodpecker is extinct.
Quote: There are mistaken confirmations (false positives) and mistaken disconfirmations (false negatives). Against this background of measurement difficulties, scientists who already believe that the Ivory-billed Woodpecker is extinct will always be suspicious of a claimed falsification. Those who believe it is still alive will tend to count the vaguest evidence as falsification.
Binary hypotheses and binary thinking exacerbate bias!
Example: \(H_{0}\): Black swans are rare.
Quote: The task here is not to disprove or prove a hypothesis of this kind, but rather to estimate and explain the distribution of swan coloration as accurately as we can.
Quote: But falsification is always consensual, not logical. In light of the real problems of measurement error and the continuous nature of natural phenomena, scientific communities argue towards consensus about the meaning of evidence. These arguments can be messy. After the fact, some textbooks misrepresent the history so it appears like logical falsification.
Quote: Bayesian golems treat “randomness” as a property of information, not the world. … We just use randomness to describe our uncertainty in the face of incomplete knowledge.
Quote: Fitting is easy; prediction is hard.
Cross-validation and information criteria help us in three ways.
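The gap between fitting and prediction can be illustrated with leave-one-out cross-validation. This is only a sketch under assumed conditions: the data are simulated from a straight line plus noise, and the two models compared (a least-squares line and a nearest-neighbour "memorizer") are illustrative choices rather than models from the course.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """'Memorizer': predicts the y of the closest training x (1-nearest neighbour)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def in_sample_mse(fit, xs, ys):
    """Error on the same data used for fitting."""
    predict = fit(xs, ys)
    return statistics.fmean([(predict(x) - y) ** 2 for x, y in zip(xs, ys)])

def loo_cv_mse(fit, xs, ys):
    """Leave-one-out CV: refit without point i, then score the prediction for point i."""
    errors = []
    for i in range(len(xs)):
        predict = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append((predict(xs[i]) - ys[i]) ** 2)
    return statistics.fmean(errors)

rng = random.Random(3)
# Simulated data: y = 2 + 0.5*x + noise (assumed for illustration).
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]

for name, fit in [("line", fit_line), ("memorizer", fit_nearest)]:
    print(f"{name:10s} in-sample MSE = {in_sample_mse(fit, xs, ys):.3f}, "
          f"LOO-CV MSE = {loo_cv_mse(fit, xs, ys):.3f}")
```

The memorizer fits the sample perfectly, yet cross-validation shows it predicts held-out points worse than the simple line. In-sample fit alone would have picked the wrong model.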
Quote: Models that are causally incorrect can make better predictions than those that are causally correct.
Quote: A statistical model is an amazing association engine. It makes it possible to detect associations between causes and effects. But a statistical model is never sufficient for inferring cause, because the statistical model makes no distinction between the wind causing the branches to sway and the branches causing the wind to blow. Facts outside the data are needed to decide which explanation is correct.
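A small sketch of why association alone cannot settle the direction of causation. In this simulation the wind is constructed to cause the sway (an assumption built into the fake data), yet the correlation coefficient is the same whichever variable is listed first. The variable names and numeric values are hypothetical.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(7)
# By construction, wind causes sway (not the other way around).
wind = [rng.gauss(10, 3) for _ in range(500)]
sway = [0.8 * w + rng.gauss(0, 1) for w in wind]

r_wind_sway = pearson_r(wind, sway)
r_sway_wind = pearson_r(sway, wind)
print(f"r(wind, sway) = {r_wind_sway:.3f}")
print(f"r(sway, wind) = {r_sway_wind:.3f}")
```

Nothing in the two numbers reveals which variable was generated from the other; that knowledge comes from how the data were produced, which lies outside the data.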
Intro to Quantitative Biology, Spring 2023