M. Drew LaMar
September 19, 2016
“Statisticians, like artists, have the bad habit of falling in love with their models.”
- Box
Quote: “Meaningful data of sufficient quantity are the grist of scientific bread.”
Quote: “If data are collected in an appropriate manner, then there is information in the sample data about the process or system under study.”
Quantification is essential due to variation and complexity.
Quote: “Unless one is engaged in simple descriptive studies, they [the empirical sciences] must deal with mathematical models.”
Quote: “We are not trying to model the data; instead, we are trying to model the information in the data.”
Quote: “Data contain both information and noise; fitting the data perfectly would include modeling the noise and this is counter to our science objective.”
Quote: “Models must be derived to carefully represent each of the science hypotheses.”
\[ H_{1} \Leftrightarrow g_{1}, \ H_{2} \Leftrightarrow g_{2}, \ldots, H_{k} \Leftrightarrow g_{k}. \]
Scientific Question: What is the support or empirical evidence for the ith hypothesis (via its corresponding model),
relative to others in the set .
Model Selection: What is the the
evidence for each of the hypotheses (and their associated models),given the data .
“All models are wrong, but some are useful.”
- Box
Example: Population survival \[ n_{t+1} = s\cdot n_{t} \]
Assumptions:
Discuss: What about Hardy-Weinberg equilibrium? What are the assumptions and approximations that go into this model?
Three common approaches have emerged for general parameter estimation:
Definition: The
maximum likelihood estimate (MLE) is the value of the parameter that is most likely, given the data and model.
Quote: “A person new to statistical thinking often finds it difficult to relate data, model, and model parameters that must be estimated. These are hard concepts to understand and the concepts are wound into the issue of parsimony. Let the data be fixed and then realize the information in the data is also fixed, then some of this information is "expended” each time a parameter is estimated. Thus, the data will only “support” a certain number of estimates, as this limit is exceeded parameter estimates become either very uncertain (e.g., large standard errors) or reach the point where they are not estimable.“
“…too few parameters and the model will be so unrealistic as to make prediction unreliable, but too many parameters and the model will be so specific to the particular data set so to make prediction unreliable.”
- Edwards
Quote: “Each time a parameter is estimated, some information is "taken out” of the data, leaving less information available for the estimation of still more parameters.“
Quote: "In model selection, we are really asking which is the best model
for a given sample size .”
In other words, what's the best model given the amount of information that we have?
Quote: “We are really asking - how much
model structure will the data support?”