Parsimony and Collinearity

M. Drew LaMar
February 15, 2019

Announcements

  • Take-home Exam #1 will be live at 11:59 pm Sunday night and due the following Monday at 11:59 pm
    • This is open-universe, except you may not directly ask other people or AI to help you.
    • It is about 30 multiple-choice questions
    • You will submit your answers on Blackboard
  • Solutions to Homework #1 and #2 are on Blackboard
    • Solutions to Homework #3 will be posted after due date

Models in Science

Quote: “Models must be derived to carefully represent each of the science hypotheses.”

\[ H_{1} \Leftrightarrow g_{1}, \ H_{2} \Leftrightarrow g_{2}, \ldots, H_{k} \Leftrightarrow g_{k}. \]

Scientific Question: What is the support or empirical evidence for the ith hypothesis (via its corresponding model), relative to others in the set?

Model Selection: What is the evidence for each of the hypotheses (and their associated models), given the data?

Models are Approximations

“All models are wrong, but some are useful.”
- Box

Example: Population survival \[ n_{t+1} = s\cdot n_{t} \]

Assumptions:

  • Population survival rate \( s \) does not change over time.
  • Each individual most likely has a different survival rate (\( s \) represents the population average).
  • Biotic and abiotic factors that influence survival rate are being ignored.
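To make the first assumption concrete, here is a minimal sketch that iterates the survival model with a constant \( s \). The function name and the starting values (1000 individuals, \( s = 0.8 \)) are hypothetical, chosen only for illustration.

```python
def project_population(n0, s, steps):
    """Iterate n_{t+1} = s * n_t and return [n_0, n_1, ..., n_steps]."""
    trajectory = [n0]
    for _ in range(steps):
        # The same survival rate s is applied at every time step.
        trajectory.append(s * trajectory[-1])
    return trajectory


# Hypothetical starting population and survival rate (illustration only).
trajectory = project_population(n0=1000.0, s=0.8, steps=5)
print(trajectory)
```

Because \( s \) never changes, repeated multiplication gives the closed form \( n_{t} = s^{t} n_{0} \), which is why the model cannot capture year-to-year variation in survival.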

Models are Approximations

Discuss: What about Hardy-Weinberg equilibrium? What are the assumptions and approximations that go into this model?

Parameter estimation and model fit

Three common approaches have emerged for general parameter estimation:

  • least squares, LS (or “regression”),
  • maximum likelihood, ML, and
  • Bayesian methods.
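As a rough sketch of the least squares approach, the survival model \( n_{t+1} = s\cdot n_{t} \) can be fit as a regression through the origin, where the estimate of \( s \) minimizes the sum of squared differences between observed and predicted counts. The census counts and function names below are made up for illustration.

```python
def least_squares_survival(counts):
    """Least squares estimate of s in n_{t+1} = s * n_t (no intercept)."""
    pairs = list(zip(counts[:-1], counts[1:]))
    numerator = sum(n_t * n_next for n_t, n_next in pairs)
    denominator = sum(n_t * n_t for n_t, _ in pairs)
    return numerator / denominator


def sum_of_squares(s, counts):
    """Sum of squared residuals for a candidate survival rate s."""
    return sum((n_next - s * n_t) ** 2
               for n_t, n_next in zip(counts[:-1], counts[1:]))


# Hypothetical yearly census counts (not real data).
counts = [1000, 810, 640, 515, 410]
s_hat = least_squares_survival(counts)
print(round(s_hat, 3))
```

Any other value of \( s \) gives a larger sum of squares than `s_hat`, which is what "least squares" means here.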

Definition: The maximum likelihood estimate (MLE) is the value of the parameter that maximizes the likelihood, i.e., the value under which the observed data are most probable, given the model.
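A minimal sketch of maximum likelihood, assuming a binomial model in which each of `total` individuals survives independently with probability \( s \). The data (40 survivors out of 50) and the grid search are illustrative assumptions, not a recommended fitting procedure.

```python
import math


def log_likelihood(s, survivors, total):
    """Binomial log-likelihood of survival probability s (constant term dropped)."""
    return survivors * math.log(s) + (total - survivors) * math.log(1 - s)


# Hypothetical data: 40 of 50 individuals survive.
survivors, total = 40, 50

# Evaluate the log-likelihood on a grid of candidate values and keep the best one.
grid = [i / 1000 for i in range(1, 1000)]
s_hat = max(grid, key=lambda s: log_likelihood(s, survivors, total))
print(s_hat)
```

For this model the grid search lands on the analytical MLE, the observed proportion `survivors / total`.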

The Principle of Parsimony

Quote: “A person new to statistical thinking often finds it difficult to relate data, model, and model parameters that must be estimated. These are hard concepts to understand and the concepts are wound into the issue of parsimony. Let the data be fixed and then realize the information in the data is also fixed, then some of this information is ‘expended’ each time a parameter is estimated. Thus, the data will only ‘support’ a certain number of estimates, as this limit is exceeded parameter estimates become either very uncertain (e.g., large standard errors) or reach the point where they are not estimable.”
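
Illustration (an assumption added for concreteness, not part of the quoted text): in an ordinary linear regression with \( n \) observations and \( k \) estimated coefficients, the residual variance and the standard errors are computed as

\[ \hat{\sigma}^{2} = \frac{\mathrm{RSS}}{n - k}, \qquad \mathrm{SE}(\hat{\beta}_{j}) = \sqrt{\hat{\sigma}^{2}\left[(X^{\top}X)^{-1}\right]_{jj}}. \]

Each estimated coefficient uses up one residual degree of freedom. At \( k = n \) the model fits the data exactly, so \( \mathrm{RSS} = 0 \) and \( n - k = 0 \), which leaves \( \hat{\sigma}^{2} \) undefined and the standard errors impossible to compute. For \( k > n \), \( X^{\top}X \) is singular and the coefficients are no longer uniquely estimable.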