Benjamin et al. use Bayesian arguments to recommend that the default threshold for claiming statistical significance be lowered from \(P < 0.05\) to \(P < 0.005\) to make results more reproducible. Their point is that claiming an effect at \(\alpha = 0.05\) produces too many false positives. Today we discuss elements of the argument and review progress on the exercises from last time.
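A rough way to see why \(\alpha = 0.05\) yields many false positives is to ask what fraction of "significant" results come from true nulls. The Python sketch below is not Benjamin et al.'s own calculation. The function name false_positive_risk, the prior odds of 1:10 that a tested effect is real, and the 80% power are all illustrative assumptions.

```python
def false_positive_risk(alpha, power, prior_odds):
    """Fraction of results with P < alpha that come from true nulls.

    prior_odds: assumed odds (real : null) that a tested effect is real.
    """
    p_real = prior_odds / (1.0 + prior_odds)  # prior probability effect is real
    p_null = 1.0 - p_real
    false_pos = alpha * p_null                # null true, yet P < alpha
    true_pos = power * p_real                 # effect real and detected
    return false_pos / (false_pos + true_pos)


# Hypothetical inputs: 1 real effect per 10 nulls, 80% power.
for alpha in (0.05, 0.005):
    print(alpha, round(false_positive_risk(alpha, 0.8, 1 / 10), 3))
```

With these assumed inputs, roughly 38% of results with \(P < 0.05\) would be false positives, compared with about 6% at \(P < 0.005\). Different prior odds or power change these numbers substantially.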
Logistics
Resources
web notes from last time
Software includes:
- class code on Sakai: clarkFunctions2021.r
Discussion reading:
- Redefining statistical significance, in which classical and Bayesian statisticians compromise: if \(P\) values are to be used, then a threshold of \(P = 0.05\) is far too high, Nature Human Behaviour.
Optional background:
- Why Big Data Could Be a Big Fail, Jordan on the potential and limitations of Big Data (misleading title).
- Why environmental scientists are becoming Bayesians, Clark on the proliferation of Bayes in environmental science, Ecol Letters.
- Bayesian methods for hierarchical models: Are ecologists making a Faustian bargain?, Lele and Dennis offer a contrarian view, Ecol Appl.
For next time
- review the basic rules in this document
- review the next unit here
- Exercises from Unit 1 are due 28 January, using exerciseTemplate.Rmd
Today’s plan
- check in on exerciseTemplate.Rmd for assignments
- breakout discussion of Redefining statistical significance: consensus and remaining issues, guided by the discussion questions
- group summaries
- Jim: \(P\) values, Bayes’ theorem for the normal distribution and regression, graphs, factors, R
Objectives:
- Where does a \(P\) value come from?
- Derive the posterior mean for the “normal-normal” model
- Use R for basic operations
A few rules to recall:
logs and exponentiation:
these are equivalent: \(\exp(x) = e^x\)
\(\log( e^x ) = e^{ \log x } = x\) (the second form requires \(x > 0\))
\(\exp(y_1) \exp(y_2) = \exp(y_1 + y_2)\)
\(\prod_{i=1}^n \exp( y_i ) = \exp \left( \sum_{i=1}^n y_i \right)\)
\(\log( a/b ) = \log(a) - \log(b)\)
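These identities are easy to spot-check numerically. Here is a minimal Python sketch that uses only the standard library. The function name check_log_exp_rules and the test values are arbitrary choices. Floating-point results are compared with math.isclose rather than ==.

```python
import math


def check_log_exp_rules(x, a, b, ys):
    """Numerically check the log/exp identities; x, a, b must be positive."""
    return all([
        math.isclose(math.exp(x), math.e ** x),                    # exp(x) = e^x
        math.isclose(math.log(math.exp(x)), x),                    # log(e^x) = x
        math.isclose(math.exp(math.log(x)), x),                    # e^{log x} = x, x > 0
        math.isclose(math.prod(math.exp(y) for y in ys),
                     math.exp(sum(ys))),                           # prod exp = exp sum
        math.isclose(math.log(a / b), math.log(a) - math.log(b)),  # log of a ratio
    ])


print(check_log_exp_rules(1.7, 3.0, 0.4, [0.2, -1.1, 0.5]))  # True
```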
Simple derivatives
\(\frac{d}{dx} \left( x^a \right) = a x^{a-1}\)
\(\frac{d}{dx} \left( e^x \right) = e^x\)
\(\frac{d}{dx} \left( e^{f(x)} \right) = \frac{df}{dx} e^{f(x)}\) (chain rule)
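The derivative rules can be checked against a central finite difference, \((f(x+h) - f(x-h))/2h\). The following Python sketch does this. The names numerical_derivative and inner, the step size h = 1e-5, and the inner function \(f(x) = x^2\) used for the chain rule are illustrative assumptions.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation to df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def inner(t):
    """Hypothetical inner function f(t) = t^2, so df/dt = 2t."""
    return t ** 2


x, a = 1.3, 2.5

# d/dx x^a = a x^(a-1): numerical vs analytic
print(numerical_derivative(lambda t: t ** a, x), a * x ** (a - 1))
# d/dx e^x = e^x
print(numerical_derivative(math.exp, x), math.exp(x))
# chain rule: d/dx e^{f(x)} = f'(x) e^{f(x)}
print(numerical_derivative(lambda t: math.exp(inner(t)), x),
      2 * x * math.exp(inner(x)))
```

Each line prints a numerical and an analytic value that agree to several decimal places.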
Some sample statistics
The notation \(E(y)\) refers to the expectation of \(y\). For a sample of data, it is computed as the sample mean. We do not simply call it a mean because it is more general: a probability distribution has an expectation even if there are no data.
mean: \(E(y) = \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\)
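variance: \(Var(y) = E(y^2) - E^2(y) =\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n} \sum_{i=1}^n y^2_i - \bar{y}^2\), where \(E^2(y) = [E(y)]^2\). This form divides by \(n\), whereas the unbiased sample variance divides by \(n - 1\).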
covariance: \(Cov(x,y) = E(xy) - E(x)E(y) = \frac{1}{n} \sum_{i=1}^n x_i y_i - \bar{x} \bar{y}\)
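Here are the three sample statistics written out directly in Python. The sketch follows the divide-by-\(n\) convention of the formulas above, and the data values are made up. For comparison, the standard library's statistics.pvariance also divides by \(n\), while statistics.variance (like R's var()) divides by \(n - 1\).

```python
def mean(v):
    """Sample mean, (1/n) * sum of v."""
    return sum(v) / len(v)


def variance(y):
    """Divide-by-n variance from squared deviations about the mean."""
    ybar = mean(y)
    return mean([(yi - ybar) ** 2 for yi in y])


def variance_moments(y):
    """The same quantity computed as E(y^2) - E(y)^2."""
    return mean([yi ** 2 for yi in y]) - mean(y) ** 2


def covariance(x, y):
    """Divide-by-n covariance, E(xy) - E(x)E(y)."""
    return mean([xi * yi for xi, yi in zip(x, y)]) - mean(x) * mean(y)


# Made-up data for illustration.
y = [2.0, 4.0, 4.0, 5.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean(y), variance(y), variance_moments(y), covariance(x, y))
```

The functions variance and variance_moments agree up to floating-point rounding. This illustrates the identity \(E(y^2) - E^2(y) = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2\).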
Models we will encounter
Univariate response models include the following:
| Name | response | additional attributes |
|---|---|---|
| linear (LM) | normal | linear in parameters |
| linear mixed model (LMM) | normal | LM with random effects |
| generalized linear model (GLM) | non-normal, often discrete | linear in parameters on link scale |
| logistic regression | binomial | GLM with logit link |
| probit regression | binomial | GLM with probit link |
| Poisson regression | Poisson | GLM typically with log link |
| mixed GLM (GLMM) | binomial, Poisson, … | GLM with random effects |
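In the table, "linear in parameters on link scale" means the following: a GLM is linear in its coefficients for the linear predictor \(\eta\), and the inverse link maps \(\eta\) back to the mean of the response. Below is a Python sketch for the logit, probit, and log links. The coefficient names b0 and b1 and their values are hypothetical.

```python
import math


def logit(p):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit: the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


# Hypothetical coefficients b0, b1 and one predictor value x.
b0, b1, x = -1.0, 0.8, 2.0
eta = b0 + b1 * x  # linear in parameters on the link scale

# Mean response under logistic, probit, and Poisson (log link) regression.
print(inv_logit(eta), inv_probit(eta), math.exp(eta))
```

The probit inverse link is the standard normal CDF, written here with math.erf. The standard library's statistics.NormalDist().cdf would serve equally well.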
Multivariate response models have a vector of responses:
| Name | response | additional attributes |
|---|---|---|
| linear (LM) | MVN | linear in parameters |
| categorical | multinomial | multiple classes, one outcome, MV logit or probit link |
| multinomial | multinomial | multiple classes, multiple outcomes |
| generalized joint attribute model (GJAM) | all types | linear in parameters |
Time series have dependence in time. State-space models for continuous states can be normal or non-normal. The Kalman filter is a simple (normal-normal) state-space model. The term hidden Markov model is most often applied to discrete states. Autoregressive models are normal and depend on the \(p\) previous times, denoted AR(\(p\)).
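To see why the Kalman filter counts as normal-normal, consider one filtering step. Here is a minimal Python sketch for the simplest case: a scalar random walk observed with normal noise (the "local level" model). The function name kalman_local_level, the prior mean and variance m0 and v0, the process and observation variances q and r, and the data values are all hypothetical.

```python
def kalman_local_level(ys, m0, v0, q, r):
    """Scalar Kalman filter for a random walk observed with normal noise.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Prior:       x_0 ~ N(m0, v0)
    Returns the filtered means and variances of x_t given y_1..y_t.
    """
    m, v = m0, v0
    means, variances = [], []
    for y in ys:
        # Predict: a random walk keeps the mean and adds process variance q.
        v_pred = v + q
        # Update: normal prior times normal likelihood gives a normal
        # posterior; its mean is a precision-weighted average of m and y.
        k = v_pred / (v_pred + r)  # Kalman gain
        m = m + k * (y - m)
        v = (1.0 - k) * v_pred
        means.append(m)
        variances.append(v)
    return means, variances


# Hypothetical observations and variances, for illustration only.
means, variances = kalman_local_level([1.0, 1.2, 0.9, 1.4], m0=0.0, v0=10.0,
                                      q=0.1, r=0.5)
print([round(mu, 3) for mu in means])
print([round(s2, 3) for s2 in variances])
```

Each update is the normal-normal posterior: the gain k weights the new observation against the prediction. The gain approaches 1 when the observation variance r is small relative to the prediction variance.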
Spatial models have dependence in space. They can be LM or GLM with \(n = 1\) (a single realization across all locations) and an \(m \times m\) covariance matrix \(\Sigma\), where \(m\) is the number of locations. Kriging is a traditional spatial model for continuous space. Spatial autoregressive models are used where space is viewed as discrete blocks (e.g., census tracts, counties, …).
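One common way to fill in \(\Sigma\) for continuous space is to let covariance decay with distance. This Python sketch assumes an exponential covariance function, \(\Sigma_{ij} = \sigma^2 \exp(-d_{ij}/\phi)\), which is only one of several choices used in kriging. The function name exponential_covariance, the locations, and the values of \(\sigma^2\) and \(\phi\) are hypothetical.

```python
import math


def exponential_covariance(coords, sigma2, phi):
    """m x m covariance matrix with Sigma_ij = sigma2 * exp(-d_ij / phi).

    coords: list of (x, y) locations; d_ij is their Euclidean distance.
    sigma2 (variance) and phi (range) are assumed known here.
    """
    m = len(coords)
    sigma = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            d = math.dist(coords[i], coords[j])
            sigma[i][j] = sigma2 * math.exp(-d / phi)
    return sigma


# Hypothetical locations; covariance decays with distance.
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
for row in exponential_covariance(locs, sigma2=2.0, phi=1.5):
    print([round(val, 3) for val in row])
```

The diagonal equals \(\sigma^2\), and the matrix is symmetric because \(d_{ij} = d_{ji}\). Nearby locations share more covariance than distant ones.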