Benjamin et al. use Bayesian arguments to recommend lowering the \(P\)-value threshold for claiming a new effect, from \(\alpha = 0.05\) to \(\alpha = 0.005\), to make results more reproducible: claiming an effect at \(\alpha = 0.05\) produces too many false positives. Today we discuss elements of the argument and review progress on exercises from last time.
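
To make the false-positive argument concrete, here is a minimal sketch in Python (standard library only) of the Bayes' theorem bookkeeping behind it. The function name is hypothetical, and the power of 0.8 and prior odds of 1:10 that a tested effect is real are illustrative assumptions rather than values taken from these notes. Holding power fixed while the threshold changes is also a simplification.

```python
def false_positive_share(alpha, power, prior_odds):
    """Share of 'significant' results that are false positives.

    prior_odds is the assumed odds (H1 : H0) that a tested effect is real.
    """
    p_h1 = prior_odds / (1.0 + prior_odds)   # prior probability the effect is real
    p_h0 = 1.0 - p_h1                        # prior probability there is no effect
    false_pos = alpha * p_h0                 # P(significant and no effect)
    true_pos = power * p_h1                  # P(significant and real effect)
    return false_pos / (false_pos + true_pos)


# Compare the conventional threshold with a stricter one (assumed inputs).
for alpha in (0.05, 0.005):
    share = false_positive_share(alpha, power=0.8, prior_odds=1 / 10)
    print(f"alpha = {alpha}: false-positive share = {share:.3f}")
```

Under these assumed inputs, roughly 38% of the results declared significant at 0.05 are false positives, compared with about 6% at 0.005.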

Logistics

Resources

For next time

  • review basic rules in this document

  • review next unit here

  • Exercises from Unit 1 due 28 January using exerciseTemplate.Rmd

Today’s plan

  • check in on exerciseTemplate.Rmd for assignments

  • Breakout discussion of Redefining statistical significance: consensus/remaining issues with discussion questions

  • group summaries

  • Jim: P values, Bayes’ theorem for normal distribution and regression, graphs, factors, R

Objectives:

  1. Where does a P value come from?
  2. Derive posterior mean for the “normal-normal”
  3. Use R for basic operations


A few rules to recall:

logs and exponentiation:

  • these are equivalent: \(\exp(x) = e^x\)

  • \(\log( e^x ) = x\), and \(e^{ \log x } = x\) for \(x > 0\)

  • \(\exp(y_1) \exp(y_2) = \exp(y_1 + y_2)\)

  • \(\prod_{i=1}^n \exp( y_i ) = \exp \left( \sum_{i=1}^n y_i \right)\)

  • \(\log( a/b ) = \log(a) - \log(b)\)
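
The rules above can be checked numerically. The following is a small sketch in Python (standard library only); the values of `x`, `ys`, `a`, and `b` are arbitrary choices made for illustration.

```python
import math

x = 1.7                      # arbitrary positive value
ys = [0.3, -1.2, 2.5]        # arbitrary values y_1, ..., y_n
a, b = 6.0, 4.0              # arbitrary positive values

# log and exp undo each other (exp(log(x)) needs x > 0)
log_of_exp = math.log(math.exp(x))
exp_of_log = math.exp(math.log(x))

# a product of exponentials equals the exponential of the sum
prod_of_exps = math.prod(math.exp(y) for y in ys)
exp_of_sum = math.exp(sum(ys))

# the log of a ratio equals the difference of the logs
log_ratio = math.log(a / b)
diff_of_logs = math.log(a) - math.log(b)

print(log_of_exp, exp_of_log)      # both match x up to rounding
print(prod_of_exps, exp_of_sum)    # match up to rounding
print(log_ratio, diff_of_logs)     # match up to rounding
```

The product rule is the reason likelihoods built from exponentials are usually handled on the log scale, where products become sums.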

Simple derivatives

  • \(\frac{d}{dx} \left( x^a \right) = a x^{a-1}\)

  • \(\frac{d}{dx} \left( e^x \right) = e^x\)

  • \(\frac{d}{dx} \left( e^{f(x)} \right) = \frac{df}{dx} e^{f(x)}\)
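
As a sanity check on the three derivative rules, the sketch below compares each one with a central finite difference in Python. The helper `central_diff`, the evaluation point, the exponent, and the choice \(f(x) = -x^2/2\) (which makes \(e^{f(x)}\) the kernel of a standard normal density) are all assumptions made for illustration.

```python
import math


def central_diff(f, x, h=1e-5):
    """Numerical derivative of f at x by a central difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


x = 1.3                         # arbitrary evaluation point
a = 3.0                         # arbitrary exponent for the power rule


def f(t):
    return -0.5 * t ** 2        # hypothetical f(x), so e^{f(x)} is a normal kernel


def dfdx(t):
    return -t                   # derivative of f, worked out by hand


# each pair is (numerical derivative, value given by the rule)
checks = {
    "x^a": (central_diff(lambda t: t ** a, x), a * x ** (a - 1)),
    "e^x": (central_diff(math.exp, x), math.exp(x)),
    "e^f(x)": (central_diff(lambda t: math.exp(f(t)), x), dfdx(x) * math.exp(f(x))),
}

for name, (numeric, analytic) in checks.items():
    print(f"{name}: numeric = {numeric:.6f}, analytic = {analytic:.6f}")
```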

Some sample statistics

The notation \(E(y)\) refers to the expectation of \(y\). For a sample of data it is estimated by the sample mean. We do not simply call it a mean, because it is more general: a probability distribution has an expectation even when there are no data.

  • mean: \(E(y) = \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\)

  • variance: \(Var(y) = E(y^2) - E^2(y) =\frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n} \sum_{i=1}^n y^2_i - \bar{y}^2\)

  • covariance: \(Cov(x,y) = E(xy) - E(x)E(y) = \frac{1}{n} \sum_{i=1}^n x_i y_i - \bar{x} \bar{y}\)
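
The sketch below evaluates these formulas in Python on a small made-up data set (the values of `x` and `y` are hypothetical) and confirms that the definitional and shortcut forms agree. Note that the formulas above divide by \(n\); many software defaults, including R's `var()` and Python's `statistics.variance`, divide by \(n - 1\) instead.

```python
import statistics

x = [1.0, 2.0, 4.0, 7.0]     # hypothetical data
y = [2.0, 3.0, 7.0, 12.0]    # hypothetical data
n = len(y)


def mean(v):
    return sum(v) / len(v)


xbar, ybar = mean(x), mean(y)

# variance: definition and shortcut E(y^2) - E(y)^2, both with divisor n
var_def = sum((yi - ybar) ** 2 for yi in y) / n
var_short = mean([yi ** 2 for yi in y]) - ybar ** 2

# covariance: definition and shortcut E(xy) - E(x)E(y), both with divisor n
cov_def = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
cov_short = mean([xi * yi for xi, yi in zip(x, y)]) - xbar * ybar

print("mean:", ybar)
print("variance (divisor n):", var_def, var_short, statistics.pvariance(y))
print("variance (divisor n - 1):", statistics.variance(y))
print("covariance (divisor n):", cov_def, cov_short)
```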

Models we will encounter

Univariate response models include the following:

Name | response | additional attributes
linear (LM) | normal | linear in parameters
linear mixed model (LMM) | normal | LM with random effects
generalized linear model (GLM) | discrete | linear in parameters on link scale
logistic regression | binomial | GLM includes logit link
probit regression | binomial | GLM includes probit link
Poisson regression | Poisson | GLM typically with log link
mixed GLM (GLMM) | binomial, Poisson, … | GLM with random effects
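
To illustrate what "linear in parameters on link scale" means, here is a minimal Python sketch of the inverse logit, probit, and log links named in the table. Each maps a linear predictor \(\eta = \beta_0 + \beta_1 x\) back to the scale of the response mean. The coefficient values and function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def inv_logit(eta):
    """Inverse logit link: linear predictor -> probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse probit link: standard normal CDF of the linear predictor."""
    return NormalDist().cdf(eta)


def inv_log(eta):
    """Inverse log link: linear predictor -> positive mean (e.g., Poisson)."""
    return math.exp(eta)


beta0, beta1 = -1.0, 0.5      # hypothetical coefficients

for x in (-2.0, 0.0, 2.0, 4.0):
    eta = beta0 + beta1 * x   # linear in the parameters on the link scale
    print(f"x = {x:4.1f}  eta = {eta:5.2f}  "
          f"logit^-1 = {inv_logit(eta):.3f}  "
          f"probit^-1 = {inv_probit(eta):.3f}  "
          f"exp = {inv_log(eta):.3f}")
```

The model stays linear in \(\beta_0\) and \(\beta_1\); only the mapping from \(\eta\) to the mean changes with the link.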

Multivariate response models have a vector of responses:

Name | response | additional attributes
linear (LM) | MVN | linear in parameters
categorical | multinomial | multiple classes, one outcome, MV logit or probit link
multinomial | multinomial | multiple classes, multiple outcomes
generalized joint attribute model (GJAM) | all types | linear in parameters

Time series have dependence in time. They can be state-space models, normal or not, for continuous states. The Kalman filter is a simple (normal-normal) state-space model. The term hidden Markov model (HMM) is most often applied to models with discrete states. Autoregressive models are normal and depend on the \(p\) previous times, e.g., AR(\(p\)).
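
As an illustration of dependence on \(p\) previous times, the sketch below simulates an AR(\(p\)) series in Python with normal errors. The function name, the coefficients `phi`, the error standard deviation, the zero starting values, and the seed are all assumptions made for this example.

```python
import random


def simulate_ar(phi, sigma, n, seed=1):
    """Simulate y_t = phi[0]*y_{t-1} + ... + phi[p-1]*y_{t-p} + eps_t,
    with eps_t ~ N(0, sigma^2) and the p starting values set to zero."""
    rng = random.Random(seed)
    p = len(phi)
    y = [0.0] * p                                   # p starting values
    for t in range(p, n + p):
        mean_t = sum(phi[k] * y[t - 1 - k] for k in range(p))
        y.append(mean_t + rng.gauss(0.0, sigma))    # add a normal error
    return y[p:]                                    # drop the starting values


# AR(2) with hypothetical coefficients
series = simulate_ar(phi=[0.6, 0.3], sigma=1.0, n=200)
print(len(series), series[:5])
```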

Spatial models have dependence in space. They can be LM or GLM with \(n = 1\) and an \(m \times m\) covariance matrix \(\Sigma\), where \(m\) is the number of locations. Kriging is a traditional spatial model for continuous space. Spatial autoregressive models are used where space is viewed as discrete blocks (e.g., census tracts, counties, …).
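
To show what an \(m \times m\) covariance matrix over locations can look like, here is a minimal Python sketch. It assumes an exponential covariance function, \(\Sigma_{ij} = \sigma^2 \exp(-d_{ij}/\rho)\), where \(d_{ij}\) is the distance between locations \(i\) and \(j\). That functional form, the parameter values, and the coordinates are illustrative assumptions; the notes do not specify a covariance function.

```python
import math


def exp_covariance(coords, sigma2=1.0, rho=2.0):
    """Build an m x m covariance matrix from location coordinates,
    using an (assumed) exponential decay of covariance with distance."""
    m = len(coords)
    return [[sigma2 * math.exp(-math.dist(coords[i], coords[j]) / rho)
             for j in range(m)]
            for i in range(m)]


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]   # m = 3 hypothetical locations
Sigma = exp_covariance(coords)

for row in Sigma:
    print([round(v, 3) for v in row])
```

Nearby locations get a covariance close to \(\sigma^2\), and the covariance decays toward zero as distance grows.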