Bayesian Statistical Models for Analysing Time-Dependent Data with Measurement Uncertainties.

Dr Niamh Cahill (she/her)

Early Career Workshop on Bayesian Statistics

What is Bayesian statistics?

\[P(A|B) = \frac{P(B|A) P(A)}{P(B)}\]

Bayes’ theorem in English

Bayes’ theorem can be written in words as:

\[\mbox{posterior is proportional to likelihood times prior}\] … or …

\[\mbox{posterior} \propto \mbox{likelihood} \times \mbox{prior}\]

Each of the three terms (posterior, likelihood, and prior) is a probability distribution (pdf).

In a Bayesian model, every item of interest is either data (which we will write as \(y\) or \(x\)) or a parameter (which we will write as \(\theta\) or \(\sigma\)). Often the parameters are divided into parameters of interest and nuisance parameters.

Bayes’ rule applied to parameters and data

Given a set of observed data points \(y\) and a set of parameters \(\theta\), we write Bayes’ rule as

\[\underset{\text{posterior}}{P(\theta|y)} = \frac{\underset{\text{likelihood}}{P(y|\theta)}\underset{\text{prior}}{P(\theta)}}{\underset{\text{marginal likelihood}}{P(y)}}\]

\(P(y)\) is a constant that is often difficult to calculate, so Bayes’ rule is often written more simply as a statement of proportionality:

\[\underset{\text{posterior}}{P(\theta|y)} \propto \underset{\text{likelihood}}{P(y|\theta)}\underset{\text{prior}}{P(\theta)}\]


Example: one parameter to estimate

\[\theta \sim N(3, .2^2)\]

\[y|\theta \sim N(\theta, .5^2)\]

Example continued

\[\theta \sim N(3, .2^2)\] \[y|\theta \sim N(\theta, .5^2)\]

Note: posterior mean for \(\theta|y\) is 3.1 mm/yr and standard deviation is 0.19 mm/yr.
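For a Normal prior and a Normal likelihood, the posterior is available in closed form, so results like these can be checked directly. Below is a minimal sketch in Python; the single observation y = 3.7 mm/yr is an assumption (chosen to be consistent with the uniform-prior case later, where the posterior essentially reproduces the likelihood), and small differences from the slide values may arise if those were obtained by sampling.

```python
import numpy as np

def normal_posterior(y, mu0, sd0, sd_lik):
    """Conjugate Normal-Normal update for a single observation y:
    prior theta ~ N(mu0, sd0^2), likelihood y|theta ~ N(theta, sd_lik^2)."""
    prec = 1 / sd0**2 + 1 / sd_lik**2              # posterior precision
    mean = (mu0 / sd0**2 + y / sd_lik**2) / prec   # precision-weighted mean
    return mean, np.sqrt(1 / prec)

y_obs = 3.7  # hypothetical observed rate (mm/yr); not stated on the slide
for sd0 in (0.2, 0.4, 0.6):
    m, s = normal_posterior(y_obs, mu0=3.0, sd0=sd0, sd_lik=0.5)
    print(f"prior N(3, {sd0}^2): posterior mean {m:.2f}, sd {s:.2f}")
```

Note how the posterior is pulled further towards the data as the prior becomes less informative.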

Example continued

\[\theta \sim N(3, .4^2)\] \[y|\theta \sim N(\theta, .5^2)\]

Note: posterior mean for \(\theta|y\) is 3.27 mm/yr and standard deviation is 0.31 mm/yr.

Example continued

\[\theta \sim N(3, .6^2)\] \[y|\theta \sim N(\theta, .5^2)\]

Note: posterior mean for \(\theta|y\) is 3.35 mm/yr and standard deviation is 0.35 mm/yr.

Example continued

\[\theta \sim U(0, 10)\] \[y|\theta \sim N(\theta, .5^2)\]

Note: posterior mean for \(\theta|y\) is 3.7 mm/yr and standard deviation is 0.5 mm/yr.

Calculating the posterior vs sampling from it

Things you can do with posterior samples

Having samples from the posterior distribution enables you to compute exactly the quantities you are interested in.
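For example, with a vector of posterior draws you can compute means, credible intervals, tail probabilities, and derived quantities directly. A sketch, using simulated draws as a stand-in for MCMC output:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(3.1, 0.19, size=4000)     # stand-in for MCMC draws of a rate (mm/yr)

print(theta.mean())                          # posterior mean
print(np.percentile(theta, [5, 95]))         # 90% credible interval
print((theta > 3.0).mean())                  # Pr(theta > 3 mm/yr | data)
total_rise = theta * 100                     # derived quantity: rise over 100 years (mm)
print(np.percentile(total_rise, [5, 95]))    # uncertainty propagates automatically
```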

Introducing some data

Bayesian regression models, some general notation

Armed with some background information about Bayesian methods and some data we want to analyse, we will now explore some regression models.

We will use the following general notation:

  - \(y\): the observed outcome data
  - \(x\): an explanatory variable (e.g., age)
  - \(\tilde{y}\): the true (latent) value of the process underlying \(y\)
  - \(\theta\): parameters of the process model
  - \(\sigma\): parameters of the data (error) model

Bayesian regression models, general model specification

When it comes to specifying Bayesian models, we typically define:

  1. The process model - describes latent trends in the outcome of interest
  2. The data model - describes how the data are generated and how they link to the process model
  3. The priors - describe what we know about parameters in the process and data models.

Putting these three components together gives the general model specification.

Model specification cont.

\(\tilde{y} = f(\theta,x)\), where \(f\) = a function, e.g., linear

\(y \sim \pi (y|\theta,\sigma,x)\), where \(\pi\) = probability distribution, e.g., Normal

Priors = Constraints

A modelling option: simple linear regression

Now let’s assume \(y\) contains measurements of relative sea level and that we are using age as an explanatory variable.

Process model

\(\tilde{y} = f(\theta,x = age)\), where f = linear

\[\tilde{y} = \alpha + \beta age\]

Data model

\(y \sim \pi (y|\theta,\sigma,x = age)\), where \(\pi\) = Normal

\[y|\theta, \sigma \sim N(\tilde{y},\sigma^2)\]

Priors = Constraints
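As an illustration, here is a minimal sketch of this model in PyMC. The slides do not specify the software, and the simulated data and the particular priors below are assumptions, not the values used for the results that follow.

```python
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(1)
age = rng.uniform(0, 2, 30)                    # simulated ages (placeholder data)
y = -4 + 1.9 * age + rng.normal(0, 0.1, 30)    # simulated RSL observations

with pm.Model() as slr:
    alpha = pm.Normal("alpha", 0, 10)          # prior: intercept
    beta = pm.Normal("beta", 0, 10)            # prior: slope
    sigma = pm.HalfNormal("sigma", 1)          # prior: residual sd
    y_tilde = alpha + beta * age               # process model
    pm.Normal("y", mu=y_tilde, sigma=sigma, observed=y)  # data model
    idata = pm.sample(random_seed=1)

print(az.summary(idata, hdi_prob=0.9))
```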

Results: simple linear regression

| variable | mean | q5 | q95 |
|----------|------------|------------|------------|
| alpha    | -4.0769872 | -4.3666123 | -3.7898817 |
| beta     | 1.9112899  | 1.7514559  | 2.0764976  |
| sigma    | 0.0169143  | 0.0011484  | 0.0420047  |

Here q5 and q95 denote the 5% and 95% posterior quantiles.

Let’s take another look at the New Jersey data

So what happens when x has associated measurement errors?

Instead of \[\tilde{y} = f(\theta,age) = \alpha + \beta age\] what we want is \[\tilde{y} = f(\theta,\tilde{age}) = \alpha + \beta \tilde{age}\] where \(\tilde{age}\) is the “true” age.

But we don’t know \(\tilde{age}\) 😿

We can use an Errors-in-Variables (EIV) approach 😃

We can assume that

\(x = \tilde{x} + error\), where error \(\sim\) Normal

\[age = \tilde{age} + error\] \[error \sim Normal(0, \sigma^2_{age})\] which is the same as

\[age \sim N(\tilde{age}, \sigma^2_{age})\]

Priors = Constraints

This is called an Errors-in-Variables model.
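A minimal sketch of the EIV extension in the same illustrative PyMC style, assuming known age errors `age_sd` (all data and priors here are placeholders):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n = 30
age_true_sim = rng.uniform(0, 2, n)                 # simulated "true" ages
age_sd = np.full(n, 0.05)                           # known age measurement errors
age_obs = age_true_sim + rng.normal(0, age_sd)      # noisy observed ages
y = -4 + 1.9 * age_true_sim + rng.normal(0, 0.1, n)

with pm.Model() as eiv_slr:
    alpha = pm.Normal("alpha", 0, 10)
    beta = pm.Normal("beta", 0, 10)
    sigma = pm.HalfNormal("sigma", 1)
    age_tilde = pm.Uniform("age_tilde", 0, 10, shape=n)  # prior on true ages
    pm.Normal("age_lik", mu=age_tilde, sigma=age_sd,
              observed=age_obs)                     # age ~ N(age_tilde, sigma_age^2)
    y_tilde = alpha + beta * age_tilde              # process model uses true ages
    pm.Normal("y", mu=y_tilde, sigma=sigma, observed=y)  # data model
    idata = pm.sample(random_seed=2)
```

Treating the true ages as latent parameters lets their uncertainty propagate into the posterior for the slope and intercept.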

Results: EIV simple linear regression

| variable | mean | q5 | q95 |
|----------|------------|------------|------------|
| alpha    | -4.0435842 | -4.3040220 | -3.7435280 |
| beta     | 1.8929372  | 1.7252241  | 2.0391864  |
| sigma    | 0.0164157  | 0.0012616  | 0.0426489  |

Let’s take a short detour to explore this further

We’ll look at this web application to see the impact of accounting for vs not accounting for age errors in a simple linear regression model.

Extending these ideas: change-point regression

We can extend the Errors-in-Variables idea to be used with more complex process models.

Process model

\(\tilde{y} = f(\theta,\tilde{x})\), where f = change point model

\[\tilde{y} = \alpha + \beta_1 (\tilde{age} - cp) \hspace{2em} \text{for $\tilde{age}$ $<$ $cp$}\]

\[\tilde{y} = \alpha + \beta_2 (\tilde{age} - cp) \hspace{2em} \text{for $\tilde{age}$ $\geq$ $cp$}\]
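The process model is piecewise linear, and the shared intercept \(\alpha\) at the change point makes the two segments join continuously. A small sketch of this mean function, evaluated at the posterior means reported in the results table below:

```python
import numpy as np

def changepoint_mean(age, alpha, beta1, beta2, cp):
    """Piecewise-linear mean: slope beta1 before cp, beta2 after.
    Both pieces equal alpha at age == cp, so the fit is continuous."""
    return np.where(age < cp,
                    alpha + beta1 * (age - cp),
                    alpha + beta2 * (age - cp))

age = np.linspace(0, 2.2, 5)
print(changepoint_mean(age, alpha=-0.6, beta1=1.47, beta2=4.07, cp=1.87))
```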

Results: EIV change-point linear regression

| variable | mean | q5 | q95 |
|----------|------------|------------|------------|
| alpha    | -0.6006002 | -0.8107149 | -0.3775718 |
| beta[1]  | 1.4692624  | 1.0578567  | 1.8249054  |
| beta[2]  | 4.0727349  | 2.5725674  | 7.6454496  |
| cp       | 1.8692600  | 1.7819783  | 1.9559629  |
| sigma    | 0.0142591  | 0.0012358  | 0.0341196  |

Extending these ideas: Gaussian process regression

Process model

\(\tilde{y} = f(\theta,\tilde{x})\), where f = Gaussian process model

Consider the multivariate normal distribution for some k-dimensional random vector \(\tilde{y}\):

\[\tilde{y} \sim MVN(0, \Sigma)\]

To start, the covariance matrix \(\Sigma\) is diagonal, i.e., the elements of \(\tilde{y}\) are independent:

\[\Sigma = \left[ \begin{array}{cccc} \sigma_g^2 & 0 & \ldots & 0 \\ 0 & \sigma_g^2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma_g^2 \end{array} \right]\]

Gaussian processes cont.

A zero-mean Gaussian process is defined as follows:

\[\tilde{y} \sim GP(0,K)\]

\[K_{i,j} = \sigma_g^2 \exp\bigg(-\rho^2(\tilde{x}_i-\tilde{x}_j)^2\bigg)\]
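A sketch of this covariance function in Python, also printing how the implied correlation decays with separation (the value \(\rho = 0.5\) is illustrative, not taken from the slides):

```python
import numpy as np

def sq_exp_kernel(x, sigma_g, rho):
    """K[i, j] = sigma_g^2 * exp(-rho^2 * (x_i - x_j)^2)."""
    d2 = (x[:, None] - x[None, :])**2
    return sigma_g**2 * np.exp(-rho**2 * d2)

# Correlation between points separated by distance d (same units as x):
for d in (0.5, 1.0, 2.0):            # e.g., thousands of years
    print(d, np.exp(-0.5**2 * d**2)) # rho = 0.5: correlation decays slowly
```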

Gaussian processes cont.

Some intuition for the correlation function.

In this case the correlation decay is “slow” and there is still some correlation even between points that are 2000 years apart.

Gaussian processes cont.

Let’s assume that the data suggest that the decay should be much faster than this. This is what \(\rho\) controls: larger values of \(\rho\) result in a correlation function that decays much more quickly with separation.

Gaussian processes cont.

So what this means in practice is that the GP parameters control how “wiggly” the curves are, as the sketch below illustrates.
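To see this, one can draw curves from the GP prior with different values of \(\rho\): larger \(\rho\) gives faster correlation decay and wigglier curves. A minimal sketch, summarising each draw by the spread of its successive differences (all parameter values are illustrative):

```python
import numpy as np

x = np.linspace(0, 2, 100)
d2 = (x[:, None] - x[None, :])**2
rng = np.random.default_rng(3)
for rho in (0.5, 2.0, 8.0):
    K = 1.0 * np.exp(-rho**2 * d2)             # sigma_g = 1
    draw = rng.multivariate_normal(np.zeros(x.size),
                                   K + 1e-8 * np.eye(x.size))  # jitter for stability
    print(f"rho={rho}: sd of successive differences = {np.diff(draw).std():.3f}")
```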

Results: EIV Gaussian Process regression

| variable | mean | q5 | q95 |
|----------|-----------|------------|-----------|
| alpha    | 0.4662679 | -3.2557742 | 5.5385965 |
| phi      | 1.2327465 | 0.3897824  | 2.8894180 |
| sigma    | 0.0147301 | 0.0009978  | 0.0372906 |
| sigma_g  | 3.1253644 | 0.6055551  | 8.3126435 |

Extending these ideas: integrated Gaussian processes

Process model

\(\tilde{y} = f(\theta,\tilde{x})\), where f = integrated Gaussian process model

We assume a GP model for the rate of sea-level change

\[\omega \sim GP(0, K)\]

Then the sea-level process is obtained by integrating the rate process

\[\tilde{y} = \int_0^x \omega(u) du\]
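Numerically, one sea-level curve can be obtained from one rate draw by cumulative (trapezoid) integration. A minimal sketch, reusing the same illustrative kernel parameters as above:

```python
import numpy as np

x = np.linspace(0, 2, 200)
d2 = (x[:, None] - x[None, :])**2
K = 1.0 * np.exp(-2.0**2 * d2)                  # squared-exponential kernel (sigma_g=1, rho=2)
rng = np.random.default_rng(4)
omega = rng.multivariate_normal(np.zeros(x.size),
                                K + 1e-8 * np.eye(x.size))  # one draw of the rate process

dx = np.diff(x)
increments = 0.5 * (omega[1:] + omega[:-1]) * dx            # trapezoid rule per interval
y_tilde = np.concatenate([[0.0], np.cumsum(increments)])    # integrated sea-level curve
```

Because the level is an integral of the rate, the resulting sea-level curves are smoother than the underlying GP draws.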

Results: EIV Integrated Gaussian Process (EIV-IGP)

| variable | mean | q5 | q95 |
|----------|-----------|-----------|-----------|
| phi      | 0.1626012 | 0.0344187 | 0.3518412 |
| sigma    | 0.0146700 | 0.0013467 | 0.0340291 |
| sigma_g  | 2.8305896 | 0.9060651 | 6.3122754 |

Let’s explore these ideas further