The Golem of Prague

Intro to Statistical Modeling
January 29, 2024

Models as Golems

Golems

  • Clay robots
  • Powerful
  • No wisdom or foresight
  • Dangerous

Models as Golems

Quote: “A concern with truth enlivens these models, but just like a golem or a modern robot, scientific models are neither true nor false, neither prophets nor charlatans. Rather they are constructs engineered for some purpose. These constructs are incredibly powerful, dutifully conducting their programmed calculations.”

Models as Golems

Golems…

  • …do not know if they are being used in the right context.
  • …have a designed purpose (specificity).

Statistical Golems

Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting, drawing inference from, and presenting empirical data.

Populations vs Samples

Definition: A parameter is a quantity describing a population, whereas an estimate or statistic is a related quantity calculated from a sample.

Parameter examples: Averages, proportions, measures of variation, and measures of relationship
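
Sketch: the parameter/estimate distinction in Python. The population is simulated, and its values (mean 98.2, standard deviation 0.7) are made up for illustration. Only the contrast between `mu` and `x_bar` matters.

```python
import random
import statistics

random.seed(1)

# Hypothetical population of 100,000 body temperatures (values are assumptions).
population = [random.gauss(98.2, 0.7) for _ in range(100_000)]

# Parameter: a quantity describing the whole population.
mu = statistics.fmean(population)

# Estimate (statistic): the related quantity calculated from a sample.
sample = random.sample(population, 100)
x_bar = statistics.fmean(sample)

print(f"parameter mu     = {mu:.3f}")
print(f"estimate  x_bar  = {x_bar:.3f}")
```

In practice the population is never fully observed, so `mu` stays unknown and `x_bar` is all we have.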

What is statistics?

  • Statistics is a technology that describes and measures aspects of nature from samples.
  • Statistics lets us quantify the uncertainty of these measures.
  • Statistics is about estimation, the process of inferring an unknown quantity of a target population using sample data.

What is probability?

The Importance of Statistical Models

Quote: Nearly every branch of science relies upon the senses of statistical golems. In many cases, it is no longer possible to even measure phenomena of interest, without making use of a model. To measure the strength of natural selection or the speed of a neutrino or the number of species in the Amazon, we must use models. The golem is a prosthesis, doing the measuring for us, performing impressive calculations, finding patterns where none are obvious.

Statistical Golem Types

Frequentism

Sampling Distributions

https://www.zoology.ubc.ca/~whitlock/Kingfisher/SamplingNormal.htm
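
Sketch: a sampling distribution can also be approximated by simulation. The code below assumes a normal population with made-up values (`mu=98.0`, `sigma=0.7`). The helper `sample_mean` is hypothetical and not part of the linked applet.

```python
import math
import random
import statistics

random.seed(2)

def sample_mean(n, mu=98.0, sigma=0.7):
    """Mean of one simulated sample of size n from a normal population."""
    return statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])

# Repeat the sampling many times to approximate the sampling distribution.
means_n10 = [sample_mean(10) for _ in range(2000)]
means_n100 = [sample_mean(100) for _ in range(2000)]

# The spread of the sampling distribution is the standard error.
se_n10 = statistics.stdev(means_n10)
se_n100 = statistics.stdev(means_n100)

print(f"n=10:  simulated SE {se_n10:.3f}, theory {0.7 / math.sqrt(10):.3f}")
print(f"n=100: simulated SE {se_n100:.3f}, theory {0.7 / math.sqrt(100):.3f}")
```

Larger samples give a narrower sampling distribution, roughly in proportion to one over the square root of `n`.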

Frequentism

Confidence Intervals

https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm
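
Sketch: an approximate 95% confidence interval for a mean. This uses the large-sample normal approximation as a simplifying assumption; a t-based interval would be slightly wider. The data are simulated, and `mean_ci` is a hypothetical helper.

```python
import math
import random
import statistics

random.seed(3)

# Simulated sample of 100 measurements (population values are assumptions).
sample = [random.gauss(98.0, 0.7) for _ in range(100)]

def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(data)
    x_bar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)          # estimated standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se

low, high = mean_ci(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```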

Frequentism

Null Hypothesis Significance Testing (NHST)

Definition: Hypothesis testing compares data to what we would expect to see if a specific null hypothesis were true. If the data are too unusual, compared to what we would expect to see if the null hypothesis were true, then the null hypothesis is rejected.

Frequentism: NHST

Definition: A null hypothesis is a specific statement about a population parameter made for the purpose of argument.

Definition: The alternative hypothesis includes all other feasible values for the population parameter besides the value stated in the null hypothesis.

Frequentism

Null Hypothesis Significance Testing (NHST)

Example: The average human body temperature is 98 degrees Fahrenheit.

\[ \begin{aligned} H_{0}&: \mu = 98 \\ H_{A}&: \mu \neq 98 \end{aligned} \]
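
Sketch: testing these hypotheses on simulated data. It uses a large-sample z approximation in place of the usual one-sample t-test, which is a simplifying assumption. The true mean of 98.25 used to generate the data is made up, and `z_test` is a hypothetical helper.

```python
import math
import random
import statistics

random.seed(5)

def z_test(data, mu0):
    """Two-sided large-sample test of H0: mu = mu0. Returns (z, p_value)."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    z = (statistics.fmean(data) - mu0) / se
    # Probability of a result at least this unusual if H0 were true.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value

# Simulated measurements; the generating mean (98.25) is an assumption.
temps = [random.gauss(98.25, 0.7) for _ in range(100)]

z, p_value = z_test(temps, mu0=98.0)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The 0.05 cutoff is the conventional choice, not something the test itself dictates.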

Frequentism

Figure: 95% confidence intervals from twenty different samples of 100 measurements each. The null hypothesized value is 98. How many samples would reject the null?
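
Sketch: one way to explore this question by simulation. It assumes the null is actually true (`mu = 98`), a normal population with a made-up standard deviation of 0.7, and normal-approximation intervals. Under those assumptions, about 1 in 20 intervals should miss 98 in the long run.

```python
import math
import random
import statistics

random.seed(4)

MU0 = 98.0
Z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96

def ci_excludes_null(n=100, mu=MU0, sigma=0.7):
    """Draw one sample and report whether its 95% CI misses MU0."""
    data = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(data)
    half_width = Z * statistics.stdev(data) / math.sqrt(n)
    return abs(x_bar - MU0) > half_width

# Twenty samples, as on the slide.
rejections_in_20 = sum(ci_excludes_null() for _ in range(20))

# Long-run rejection rate when the null is true.
long_run_rate = sum(ci_excludes_null() for _ in range(2000)) / 2000

print(f"rejections among 20 samples: {rejections_in_20}")
print(f"long-run rejection rate:     {long_run_rate:.3f}")
```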

Pre-manufactured black-box golems

Pre-manufactured black-box golems

  • Statistical golems do not understand cause and effect. They only understand association.
  • Classical frequentist tools can work for some tightly controlled experiments with simple hypotheses.

Pre-manufactured black-box golems

  • In general, classical tools are not diverse enough to handle many common research questions.
  • Researchers need a unified theory of golem engineering, a set of principles for designing, building, and refining special-purpose statistical procedures.

The problem with falsification in science

Hypotheses vs models

Quote: “Science is not described by the falsification standard.”

Probabilistic nature of evidence

  • Observations are prone to error, especially at the boundaries of scientific knowledge.
  • Most hypotheses are quantitative, concerning degree of existence, not presence/absence.

Probabilistic nature of evidence

Observations prone to error

Example: \(H_{0}\): The Ivory-billed Woodpecker is extinct.

Quote: There are mistaken confirmations (false positives) and mistaken disconfirmations (false negatives). Against this background of measurement difficulties, scientists who already believe that the Ivory-billed Woodpecker is extinct will always be suspicious of a claimed falsification. Those who believe it is still alive will tend to count the vaguest evidence as falsification.

Binary hypotheses and binary thinking exacerbate bias!

Probabilistic nature of evidence

Continuous hypotheses

Example: \(H_{0}\): Black swans are rare.

Quote: The task here is not to disprove or prove a hypothesis of this kind, but rather to estimate and explain the distribution of swan coloration as accurately as we can.

Probabilistic nature of evidence

Quote: But falsification is always consensual, not logical. In light of the real problems of measurement error and the continuous nature of natural phenomena, scientific communities argue towards consensus about the meaning of evidence. These arguments can be messy. After the fact, some textbooks misrepresent the history so it appears like logical falsification.

Tools for golem engineering

Bayesian data analysis

  • Takes a question in the form of a model and uses logic to produce an answer in the form of probability distributions.
  • Uses probability theory to count the number of ways the data could happen, according to assumptions.
  • Represents a statistical parameter as a probability distribution (not an unknown fixed value).
  • Uses probability to describe all types of uncertainty, whether empirical or epistemological.
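
Sketch: the points above, illustrated with a grid approximation. The data (6 successes in 9 trials) and the flat prior are assumptions chosen for illustration. For each candidate value of the parameter, the binomial likelihood counts the relative number of ways the data could happen.

```python
from math import comb

# Candidate values for an unknown proportion p.
grid = [i / 100 for i in range(101)]

# Flat prior: every candidate is equally plausible beforehand (assumption).
prior = [1.0] * len(grid)

# Hypothetical data: 6 successes in 9 trials.
successes, trials = 6, 9

# Relative number of ways each candidate p could produce the data.
likelihood = [
    comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
    for p in grid
]

# Posterior is proportional to likelihood times prior, normalized to sum to 1.
unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

map_p = grid[posterior.index(max(posterior))]
post_mean = sum(p * w for p, w in zip(grid, posterior))
print(f"most plausible p: {map_p:.2f}, posterior mean: {post_mean:.3f}")
```

The answer is the whole `posterior` list, a distribution over parameter values, rather than a single point estimate.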

Tools for golem engineering

Bayesian data analysis

Quote: Bayesian golems treat “randomness” as a property of information, not the world. … We just use randomness to describe our uncertainty in the face of incomplete knowledge.

Tools for golem engineering

Model comparison and prediction

Which is the best model?

https://xkcd.com/2048/

Tools for golem engineering

Model comparison and prediction

Quote: Fitting is easy; prediction is hard.

Cross-validation and information criteria help us in three ways.

  • They provide useful expectations of predictive accuracy, rather than merely fit to sample.
  • They give an estimate of the tendency for a model to overfit.
  • They help us spot highly influential observations.
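
Sketch: leave-one-out cross-validation on simulated data. It illustrates only the first two points above and does not cover information criteria. The data-generating line and both models (`fit_line`, `fit_nearest`) are hypothetical choices. The nearest-neighbour model memorizes the sample, so its fit to the sample is perfect while its out-of-sample error is not.

```python
import random
import statistics

random.seed(7)

# Simulated data: a straight line plus noise (all values are assumptions).
xs = [random.uniform(0, 10) for _ in range(40)]
data = [(x, 1.0 + 0.5 * x + random.gauss(0, 1)) for x in xs]

def fit_line(train):
    """Ordinary least squares for y = a + b*x; returns a predictor."""
    mx = statistics.fmean([x for x, _ in train])
    my = statistics.fmean([y for _, y in train])
    b = sum((x - mx) * (y - my) for x, y in train) / sum(
        (x - mx) ** 2 for x, _ in train
    )
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(train):
    """1-nearest-neighbour: predicts the y of the closest training x."""
    return lambda x: min(train, key=lambda pt: abs(pt[0] - x))[1]

def in_sample_mse(fit, points):
    predict = fit(points)
    return statistics.fmean([(y - predict(x)) ** 2 for x, y in points])

def loo_mse(fit, points):
    """Refit without each point in turn, then predict the held-out point."""
    errors = []
    for i, (x, y) in enumerate(points):
        train = points[:i] + points[i + 1:]
        errors.append((y - fit(train)(x)) ** 2)
    return statistics.fmean(errors)

results = {}
for name, fit in [("line", fit_line), ("nearest", fit_nearest)]:
    results[name] = (in_sample_mse(fit, data), loo_mse(fit, data))
    print(f"{name:8s} in-sample MSE {results[name][0]:.3f}  LOO MSE {results[name][1]:.3f}")
```

The gap between the two columns is a rough indication of how much each model overfits.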

Tools for golem engineering

Graphical causal models

Quote: Models that are causally incorrect can make better predictions than those that are causally correct.

Quote: A statistical model is an amazing association engine. It makes it possible to detect associations between causes and effects. But a statistical model is never sufficient for inferring cause, because the statistical model makes no distinction between the wind causing the branches to sway and the branches causing the wind to blow. Facts outside the data are needed to decide which explanation is correct.
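
Sketch: the association is symmetric. In this simulation, wind causes sway by construction, and all numbers are made up. The correlation is identical whichever variable is treated as the "cause", so the data alone cannot reveal the direction.

```python
import math
import random
import statistics

random.seed(8)

# By construction, wind causes sway (an assumption built into the simulation).
wind = [random.gauss(10, 3) for _ in range(500)]
sway = [2 * w + random.gauss(0, 2) for w in wind]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

r_wind_sway = pearson(wind, sway)
r_sway_wind = pearson(sway, wind)

print(f"cor(wind, sway) = {r_wind_sway:.3f}")
print(f"cor(sway, wind) = {r_sway_wind:.3f}")
```

Only knowledge from outside the data, here the fact that we wrote the simulation, tells us which way the arrow points.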