7M1. Write down and compare the definitions of AIC and WAIC. Which of these criteria is most general? Which assumptions are required to transform the more general criterion into a less general one?

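AIC: The Akaike information criterion (AIC) estimates out-of-sample deviance, and thereby the relative predictive quality of statistical models fit to a given set of data. It is defined as \(AIC = D_{train} + 2p\), where \(D_{train}\) is the in-sample (training) deviance and \(p\) is the number of free parameters. AIC is reliable only when (1) the priors are flat or overwhelmed by the likelihood, (2) the posterior is approximately multivariate Gaussian, and (3) the sample size is much greater than the number of parameters. Given a collection of models for the data, AIC estimates the quality of each model relative to the others, and so provides a means of comparing them.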

WAIC: The Widely Applicable Information Criterion (WAIC) is \(-2(lppd - pWAIC)\), where lppd is the log pointwise predictive density and pWAIC is a penalty equal to the sum, over observations, of the posterior variance of each observation's log-likelihood. This penalty acts as an effective number of parameters and adjusts for overfitting. WAIC is the more general criterion, because it makes no assumption about the shape of the posterior. It approximately reduces to AIC when the priors are flat (or overwhelmed by the likelihood), the posterior is approximately multivariate Gaussian, and the sample size is much greater than the number of parameters.
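The relationship between the two criteria can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes a toy model (Gaussian data with known standard deviation, a flat prior on the mean, and an analytically known posterior), and the names `waic`, `logmeanexp`, and `log_normal_pdf` are illustrative helpers rather than functions from any package. Under these assumptions the AIC conditions hold, so the two values should be close and pWAIC should be near 1, the number of free parameters.

```python
import numpy as np

def log_normal_pdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2

def logmeanexp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic(loglik):
    """WAIC from an (S draws x N observations) log-likelihood matrix."""
    lppd = logmeanexp(loglik, axis=0).sum()       # log pointwise predictive density
    p_waic = loglik.var(axis=0, ddof=1).sum()     # summed posterior variance of log-lik
    return -2 * (lppd - p_waic), lppd, p_waic

rng = np.random.default_rng(0)
n, sigma = 200, 1.0                               # assumed toy data: known sigma
y = rng.normal(0.5, sigma, size=n)

# Flat prior on mu with known sigma: posterior is Normal(ybar, sigma / sqrt(n)).
mu_samples = rng.normal(y.mean(), sigma / np.sqrt(n), size=4000)
loglik = log_normal_pdf(y[None, :], mu_samples[:, None], sigma)

waic_value, lppd, p_waic = waic(loglik)

# AIC: training deviance at the maximum likelihood estimate plus 2 * (1 parameter).
aic_value = -2 * log_normal_pdf(y, y.mean(), sigma).sum() + 2 * 1

print(f"WAIC = {waic_value:.2f}, pWAIC = {p_waic:.2f}, AIC = {aic_value:.2f}")
```

Replacing the flat prior with a strongly informative one, or using a skewed posterior, would be one way to see the two criteria drift apart.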

7M2. Explain the difference between model selection and model comparison. What information is lost under model selection?

Model selection means choosing the model with the lowest criterion value and discarding the others. This procedure loses the information about relative model accuracy contained in the differences among the CV/PSIS/WAIC values: those differences, together with their standard errors, indicate how confident we can be that one model will actually predict better than another. Instead of model selection, it is better to use model comparison. This is a more general approach that uses multiple models to understand how different variables influence predictions and, in combination with a causal model and its implied conditional independencies among variables, helps us infer causal relationships.
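To make the "information in the differences" concrete, here is a small sketch that compares two candidate models through the difference in WAIC and a standard error for that difference. Everything in it is an assumption for illustration: the toy data, the two priors (`tau=10.0` and `tau=0.05`), and the helper `pointwise_waic`. The standard error is computed as the square root of the number of cases times the variance of the pointwise differences.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 100, 1.0                      # assumed toy data with known sigma
y = rng.normal(1.0, sigma, size=n)

def pointwise_waic(tau, draws=4000):
    """Pointwise WAIC for y ~ Normal(mu, sigma) with prior mu ~ Normal(0, tau)."""
    post_prec = n / sigma**2 + 1.0 / tau**2          # conjugate posterior precision
    post_mean = (y.sum() / sigma**2) / post_prec     # conjugate posterior mean
    mu = rng.normal(post_mean, 1.0 / np.sqrt(post_prec), size=draws)
    loglik = (-0.5 * np.log(2 * np.pi * sigma**2)
              - 0.5 * ((y[None, :] - mu[:, None]) / sigma) ** 2)
    top = loglik.max(axis=0)                         # stabilise the log-mean-exp
    lppd_i = top + np.log(np.mean(np.exp(loglik - top), axis=0))
    p_i = loglik.var(axis=0, ddof=1)
    return -2 * (lppd_i - p_i)

waic_wide = pointwise_waic(tau=10.0)     # weakly informative prior
waic_tight = pointwise_waic(tau=0.05)    # prior concentrated far from the data

diff_i = waic_tight - waic_wide          # per-observation difference
d_waic = diff_i.sum()
d_se = np.sqrt(n * diff_i.var(ddof=1))   # standard error of the difference

print(f"dWAIC = {d_waic:.1f}, dSE = {d_se:.1f}")
```

Selecting only the lower-WAIC model would throw away `d_waic` and `d_se`, which are exactly the quantities that say how decisively one model is expected to out-predict the other.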

7M3. When comparing models with an information criterion, why must all models be fit to exactly the same observations? What would happen to the information criterion values, if the models were fit to different numbers of observations? Perform some experiments, if you are not sure.

The models must be fit to exactly the same observations because information criteria are based on deviance, which is a sum over observations rather than a mean. Adding observations adds terms to that sum, so a model fit to more observations will usually have a higher deviance, and therefore appear less accurate, regardless of how well it actually predicts. Differences in information criterion values between models fit to different numbers of observations therefore mostly reflect sample size rather than model quality, and are not appropriate for comparison. Even with equal sample sizes, some observations are harder to predict than others, so the values are only comparable when the data are identical.
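The experiment suggested by the question can be sketched as follows. This is an illustrative setup only: the same Gaussian model is fit by maximum likelihood to nested subsets of one simulated data set, and `deviance` is a hypothetical helper. The deviance per observation stays roughly constant while the total grows with the number of observations.

```python
import numpy as np

def deviance(y, mu, sigma):
    """Deviance: -2 times the summed Gaussian log-likelihood."""
    loglik = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    return -2 * loglik.sum()

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=400)       # assumed toy data

results = {}
for n in (50, 100, 200, 400):
    subset = y[:n]
    # Maximum likelihood estimates of the mean and standard deviation.
    results[n] = deviance(subset, subset.mean(), subset.std())
    print(f"n = {n:3d}  deviance = {results[n]:7.1f}  per observation = {results[n] / n:.2f}")
```

The model is identical in every row, yet the criterion value changes by large amounts, which is why values from different sample sizes cannot be compared. The growth is typical rather than guaranteed: if the predictive density exceeds 1 for most observations, the log-likelihood terms are positive and the sum can move in the other direction.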

7M4. What happens to the effective number of parameters, as measured by PSIS or WAIC, as a prior becomes more concentrated? Why? Perform some experiments, if you are not sure.

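As a prior becomes more concentrated, the effective number of parameters decreases. A concentrated prior is more regularizing: it restricts the range of parameter values the posterior can take, so the model is less flexible. In WAIC, the penalty term is the posterior variance of the log-likelihood summed over observations, and a narrower posterior yields a smaller variance.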
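A small experiment along these lines, under assumed conditions: Gaussian data with known standard deviation and a conjugate prior `mu ~ Normal(0, tau)`, so the posterior is available in closed form. The function `p_waic_for_prior` and the grid of `tau` values are illustrative choices. The same standard-normal draws are reused for every prior so that differences reflect the prior rather than sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 50, 1.0                        # assumed toy data with known sigma
y = rng.normal(1.0, sigma, size=n)
z = rng.standard_normal(4000)             # shared draws, reused for every prior

def p_waic_for_prior(tau):
    """pWAIC for y ~ Normal(mu, sigma) with prior mu ~ Normal(0, tau)."""
    post_prec = n / sigma**2 + 1.0 / tau**2          # conjugate posterior precision
    post_mean = (y.sum() / sigma**2) / post_prec     # conjugate posterior mean
    mu = post_mean + z / np.sqrt(post_prec)          # posterior samples of mu
    loglik = (-0.5 * np.log(2 * np.pi * sigma**2)
              - 0.5 * ((y[None, :] - mu[:, None]) / sigma) ** 2)
    return loglik.var(axis=0, ddof=1).sum()

p_waic_by_tau = {tau: p_waic_for_prior(tau) for tau in (10.0, 1.0, 0.1, 0.01)}
for tau, p in p_waic_by_tau.items():
    print(f"prior sd = {tau:5.2f}  pWAIC = {p:.3f}")
```

With a wide prior the penalty should sit near 1, matching the single free parameter, and it should shrink toward 0 as the prior tightens and the posterior can no longer move.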

7M5. Provide an informal explanation of why informative priors reduce overfitting.

Overfitting occurs when a model fits the training data too closely, learning the noise in the sample along with its regular features, and therefore predicts unseen data poorly. An informative prior guides the model during fitting by making extreme parameter values implausible, so the estimates are not driven by a few extreme observations in the sample.

7M6. Provide an informal explanation of why overly informative priors result in underfitting.

If the priors are too narrow, the model cannot learn enough from the data: the posterior stays close to the prior even when the data point elsewhere, so the model fails to pick up regular patterns, resulting in underfitting.