09/16/2014

Content of the course

  1. a quick introduction to R
  2. linear models
  3. mixed effects models
  4. Bayesian inference

1. a quick introduction to R

What is R?

R is a free open source programming language and software environment for statistical computing & graphics.

The user can run workflows stored in one or several script files, which allows for reproducible research & easy communication.

A core set of packages is included with the installation of R, with more than 5,800 additional packages available, including:

  • siar
  • MixSIAR
  • IsotopeR
  • IsoriX (soon)

[for more information, visit http://cran.r-project.org/]

Basics of R

my.first.addition <- 1+1
my.first.addition
## [1] 2
foo <- rnorm(n=8, mean=2, sd=1)
foo
## [1] 0.6617029 1.8210471 3.3543329 2.6440125 1.8559594 3.4805985 2.0055959
## [8] 2.2677038
mean(x=foo)
## [1] 2.261369
?rnorm  # will get you some help

Basics of R

#setwd(dir="/home/alex/Boulot/My_teaching/stat_isotopes")
dir(pattern="csv")
## [1] "calibration_example1.csv" "dataset_saskatoon.csv"
my.data <- read.csv(file="dataset_saskatoon.csv", header=TRUE)
dim(x=my.data)
## [1] 288   4

Basics of R

head(x=my.data)
##      Month   O18    DEU precip
## 1 Jun-1990  -9.6  -69.2   59.8
## 2 Jul-1990 -13.2  -87.0   75.9
## 3 Aug-1990 -10.0  -84.8    6.3
## 4 Sep-1990 -15.5 -127.9    6.6
## 5 Oct-1990 -16.9 -127.7    5.8
## 6 Nov-1990 -20.6 -153.4   21.2
tail(x=my.data)
##        Month   O18     DEU precip
## 283 Dec-2013 -27.8 -216.73     NA
## 284 Jan-2014    NA      NA     NA
## 285 Feb-2014    NA      NA     NA
## 286 Mar-2014    NA      NA     NA
## 287 Apr-2014    NA      NA     NA
## 288 May-2014    NA      NA     NA

Basics of R

mean(x=my.data$O18)
## [1] NA
mean(x=my.data$O18, na.rm=TRUE)
## [1] -16.74853
my.data <- na.omit(my.data)  # to avoid this kind of problem
dim(my.data)
## [1] 159   4
mean(x=my.data$O18)
## [1] -16.80572

Basics of R

plot(x=my.data$O18, y=my.data$DEU, xlab=expression(paste(delta^18,"O")),
     ylab=expression(paste(delta^2,"H")), las=1,
     main="Isotopes in Saskatoon")

2. linear models

A meteoric water line is a linear model!

(done directly with R)

An isoscape is a linear model!

(done with IsoriX)

A transfer function is a linear model!

(done directly with R)

An assignment is a linear model!

(done with IsoriX)

A mixing analysis is a linear model!

(done with MixSIAR)

Linear models are the most commonly used statistical tools!

Some well known linear models:

t-test / anova / ancova / linear regression (lm) / generalized linear models (glm) / mixed effects models (glmm) / hglm / hierarchical linear models / multilevel models / logistic regression / polynomial regressions / probit model / F-test / multiple linear regression / …

They apply to all cases where there is one response variable and one or several explanatory variables thought to influence it.

A linear model is nothing but a sum!

\[ y_i = \alpha + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + \dots + \beta_p \times x_{pi} + \varepsilon_i \]

You know:

  • \(y_i\) = the value of the response variable for the individual \(i\)

  • \(x_{ki}\) = the value of the explanatory variable \(k\) for \(i\)

You want:

  • \(\alpha\) = intercept, i.e. the estimate of \(y\) when all \(x\) are zero

  • \(\beta\) = parameter estimates; they relate the \(x\) to the \(y\)

You wish:

  • \(\varepsilon_i\), i.e. the residuals, to be small
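
As a toy illustration (made-up data and parameter values, not from the course), one can simulate such a sum and recover the parameters with lm():

set.seed(123)
x1 <- rnorm(100)
x2 <- rnorm(100)
y <- 1 + 2*x1 - 0.5*x2 + rnorm(100, sd=1)  # alpha=1, beta1=2, beta2=-0.5
coef(lm(y ~ x1 + x2))  # estimates should be close to 1, 2 and -0.5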

Different linear models

An example of a quadratic transfer function:

\(\delta\ ^2\textrm{Hprecip}_i = \alpha + \beta_1 \times \delta\ ^2\textrm{Htissue}_i + \beta_2 \times \delta\ ^2\textrm{Htissue}^2_i + \varepsilon_i\)

An example of a mixing analysis:

\(\delta\ ^{13}\textrm{C}_i = 0 + \beta_\textrm{C3} \times \left(\delta\ ^{13}\textrm{C in C3}_i + \textrm{TEF}_\textrm{C3}\right)+\) \(\beta_\textrm{C4} \times \left(\delta\ ^{13}\textrm{C in C4}_i + \textrm{TEF}_\textrm{C4}\right)+ \varepsilon_i\)

[with \(\beta_\textrm{C3}+\beta_\textrm{C4}=1\)]

Different linear models

An example of isoscape:

\(\delta\ ^2\textrm{H}_i = \alpha + \beta_\textrm{lat.abs} \times \textrm{lat.abs}_i + \beta_{\textrm{lat}^2} \times \textrm{lat}^2_i+\) \(\beta_\textrm{elevation} \times \textrm{elevation}_i+u_i + \varepsilon_i\)

[with \(u_i=f\left(\textrm{lat}_i,\textrm{long}_i\right)\) ]

An example of a meteoric water line:

\(\delta\ ^2\textrm{H}_i = \alpha + \beta \times \delta\ ^{18}\textrm{O}_i + \varepsilon_i\)

Example 1: the meteoric water line

A least-squares regression

my.model <- lm(DEU~O18, data=my.data)
my.model
## 
## Call:
## lm(formula = DEU ~ O18, data = my.data)
## 
## Coefficients:
## (Intercept)          O18  
##      -1.427        7.730

That means: \[ \delta\ ^2\textrm{H}_i=-1.43 + 7.73 \times \delta\ ^{18}\textrm{O}_i + \varepsilon_i \]
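
The same numbers can be extracted directly from the fitted object:

coef(my.model)  # intercept (alpha) and slope (beta) used in the equation above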

summary(my.model)

## 
## Call:
## lm(formula = DEU ~ O18, data = my.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.572  -2.959   1.767   4.452  17.565 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.4265     2.1603   -0.66     0.51    
## O18           7.7302     0.1229   62.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.995 on 157 degrees of freedom
## Multiple R-squared:  0.9618, Adjusted R-squared:  0.9616 
## F-statistic:  3957 on 1 and 157 DF,  p-value: < 2.2e-16

Predictions

What is the prediction for \(\delta ^2\)H when \(\delta ^{18}\)O is –9.6?

\[ \delta\ ^2\textrm{H}_\textrm{predicted}=-1.43 + 7.73 \times -9.6\]

my.prediction <- predict(my.model, newdata=data.frame(O18=-9.6))
my.prediction
##         1 
## -75.63621
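
As a quick check, the prediction can be recomputed by hand from the coefficients:

coef(my.model)["(Intercept)"] + coef(my.model)["O18"] * -9.6  # should match my.prediction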

Predictions

O18.for.predictions <- seq(from=min(my.data$O18),
                          to=max(my.data$O18), length=8)
O18.for.predictions
## [1] -32.62000 -29.04571 -25.47143 -21.89714 -18.32286 -14.74857 -11.17429
## [8]  -7.60000
my.predictions <- predict(my.model,
                          newdata=data.frame(O18=O18.for.predictions))
my.predictions
##          1          2          3          4          5          6 
## -253.58491 -225.95504 -198.32518 -170.69532 -143.06545 -115.43559 
##          7          8 
##  -87.80572  -60.17586

Predictions (plot)

plot(x=my.data$O18, y=my.data$DEU)
points(x=O18.for.predictions, y=my.predictions,
  col="red", type="l", lwd=2)

Alternative:

plot(x=my.data$O18, y=my.data$DEU)
abline(my.model, lty=2, col=2, lwd=2)

Proportion of variance explained

  cor(x=my.data$DEU, y=predict(my.model))^2  # r-squared
## [1] 0.961841
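
As a check, this is the same value as the Multiple R-squared reported by summary():

summary(my.model)$r.squared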

Residuals

my.prediction <- predict(my.model, newdata=data.frame(O18=-9.6))
my.prediction
##         1 
## -75.63621
my.data[1, ]  # observed values for the first station
##      Month  O18   DEU precip
## 1 Jun-1990 -9.6 -69.2   59.8
resid(my.model)[1]  # = my.data[1, "DEU"] - my.prediction
##        1 
## 6.436213
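
As a quick check, the residual is simply observed minus predicted:

my.data[1, "DEU"] - my.prediction  # same value as resid(my.model)[1]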

Residuals

summary(resid(my.model))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -26.570  -2.959   1.767   0.000   4.452  17.560
par(mfrow=c(1, 2), las=1)
hist(resid(my.model), col="blue", breaks=10, main="Histogram")
plot(ecdf(resid(my.model)), main="Cumulative distribution function")

Prediction interval at 0.95

The idea: 95% of new observations should fall within it!
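
A rough check on our data (indicative only, since the interval is really about new observations, not the ones used for fitting):

PI <- predict(my.model, interval="prediction")  # R warns: predictions on the fitting data
mean(my.data$DEU >= PI[, "lwr"] & my.data$DEU <= PI[, "upr"])  # should be close to 0.95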

Prediction interval (computation)

Easy:

my.predicts <- predict(
  my.model, newdata=data.frame(O18=O18.for.predictions),
  interval="prediction")
my.predicts
##          fit        lwr        upr
## 1 -253.58491 -269.88530 -237.28452
## 2 -225.95504 -242.07322 -209.83687
## 3 -198.32518 -214.30622 -182.34414
## 4 -170.69532 -186.58547 -154.80516
## 5 -143.06545 -158.91176 -127.21914
## 6 -115.43559 -131.28548  -99.58569
## 7  -87.80572 -103.70660  -71.90484
## 8  -60.17586  -76.17468  -44.17704

Prediction interval (computation)

Less easy:

my.predicts2 <- predict(my.model,
    newdata=data.frame(O18=O18.for.predictions), se.fit=TRUE)
my.predicts2
## $fit
##          1          2          3          4          5          6 
## -253.58491 -225.95504 -198.32518 -170.69532 -143.06545 -115.43559 
##          7          8 
##  -87.80572  -60.17586 
## 
## $se.fit
##         1         2         3         4         5         6         7 
## 2.0441131 1.6322608 1.2393424 0.8907753 0.6609153 0.6826086 0.9385718 
##         8 
## 1.2968025 
## 
## $df
## [1] 157
## 
## $residual.scale
## [1] 7.995408

Prediction interval (computation)

Less easy:

sd.predictions <- sqrt(my.predicts2$se.fit^2+my.predicts2$residual.scale^2)
sd.predictions
##        1        2        3        4        5        6        7        8 
## 8.252572 8.160320 8.090891 8.044876 8.022678 8.024494 8.050308 8.099892
my.predicts2$fit + qt(p=0.975, df=my.predicts2$df) * sd.predictions
##          1          2          3          4          5          6 
## -237.28452 -209.83687 -182.34414 -154.80516 -127.21914  -99.58569 
##          7          8 
##  -71.90484  -44.17704
my.predicts[, "upr"]
##          1          2          3          4          5          6 
## -237.28452 -209.83687 -182.34414 -154.80516 -127.21914  -99.58569 
##          7          8 
##  -71.90484  -44.17704

Prediction interval (computation)

Details:

my.predicts2$residual.scale
## [1] 7.995408
var.predicts <- sum(residuals(my.model)^2)/my.predicts2$df
sd.residuals <- sqrt(var.predicts)
sd.residuals
## [1] 7.995408
df.residuals <- nrow(my.data)-length(coef(my.model))
df.residuals == my.predicts2$df
## [1] TRUE

Confidence interval at 0.95

The idea: the true line falls within it for 95% of datasets!

Confidence interval at 0.95

Is our model compatible with the Craig Meteoric Water Line?
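
An informal first look (sketch): the CMWL has intercept 10 and slope 8, so we can ask whether these values fall within the 95% confidence intervals of our estimates.

confint(my.model, level=0.95)  # is 10 inside the (Intercept) interval? is 8 inside the O18 interval?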

A formal test of the CMWL hypothesis

my.data$prediction.CMWL <- 8 * my.data$O18 + 10
model1 <- lm(DEU-prediction.CMWL ~ O18, data=my.data)
model2 <- lm(DEU-prediction.CMWL ~ -1, data=my.data)
anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: DEU - prediction.CMWL ~ O18
## Model 2: DEU - prediction.CMWL ~ -1
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1    157 10036                                  
## 2    159 17897 -2   -7860.6 61.481 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Main assumptions of linear regressions

About \(x_k\):

  • strict exogeneity = no error on \(x_{k}\)
  • linearity = additive effects
  • lack of multicollinearity = the \(x_{k}\) are independent

About the statistical error:

  • independence –> no autocorrelation in residuals
  • homoscedasticity –> residuals have a constant variance
  • normality –> residuals are normally distributed

plot(my.model)

The iid covariance matrix

Covariance matrix of residuals in lm, for \(i=\)A, B, C:

\[ \begin{array}{c|ccc} & \textrm{A} & \textrm{B} & \textrm{C} \\ \hline \textrm{A} & \sigma^2 & 0 & 0 \\ \textrm{B} & 0 & \sigma^2 & 0 \\ \textrm{C} & 0 & 0 & \sigma^2 \end{array} \]

Errors are independent & identically distributed!

It means that errors come from a single normal distribution.

Modeling heteroscedasticity

Covariance matrix of residuals with structured dispersion:

\[ \begin{array}{c|ccc} & \textrm{A} & \textrm{B} & \textrm{C} \\ \hline \textrm{A} & \sigma^2_\textrm{A} & 0 & 0 \\ \textrm{B} & 0 & \sigma^2_\textrm{B} & 0 \\ \textrm{C} & 0 & 0 & \sigma^2_\textrm{C} \end{array} \]

Errors are independent & NOT (necessarily) identically distributed!

It means that errors come from several independent normal distributions.

Modeling heteroscedasticity using spaMM

library(spaMM)
## spaMM (version 1.4.1) is loaded.
## Type 'help(spaMM)' for a short introduction.
my.model.spaMM <- HLfit(DEU~O18, resid.formula=~O18, data=my.data)

my.model.spaMM
## formula: DEU ~ O18
## REML: Estimation of phi by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity )
##  ------- Fixed effects (beta) -------
##             Estimate Cond. SE t-value
## (Intercept)  -0.2341   2.0959 -0.1117
## O18           7.7978   0.1074 72.5808
##  -------- Residual variance  --------
## phi formula: "phi" ~ O18
## Coefficients for log[ phi= residual var ]
##             Estimate Cond. SE
## (Intercept)   5.1976  0.38646
## O18           0.0651  0.02204
##  -------- Likelihood values  --------
##                         logLik
## p(h)   (Likelihood): -550.8483
##   p_beta(h)   (ReL): -550.8483

Modeling heteroscedasticity using spaMM

pred.spaMM <- predict(my.model.spaMM,
                newX=data.frame(O18=O18.for.predictions),
                predVar=TRUE, residVar=TRUE)
sqrt(attr(pred.spaMM, "residVar"))
## [1]  4.650973  5.224785  5.869391  6.593526  7.407000  8.320836  9.347417
## [8] 10.500652
my.predicts2$residual.scale  # result for lm
## [1] 7.995408

A more accurate test of the CMWL hypothesis

model3 <- HLfit(DEU-prediction.CMWL ~ O18, resid.formula=~O18,
                data=my.data, HLmethod="ML")
model4 <- HLfit(DEU-prediction.CMWL ~ 0, resid.formula=~O18,
                data=my.data, HLmethod="ML")
anova(model3, model4)
##        LR2 df pvalue
## 1 90.05665  2      0
pchisq(90.06, df=2, lower.tail=FALSE)
## [1] 2.777918e-20

Alternatives to fit a meteoric water line

  • The Precipitation-Weighted Least Squares Regression

Hughes & Crawford (2012), Journal of Hydrology 464:344-351

weighed.model <- update(my.model, weights=precip)
weighed.model
## 
## Call:
## lm(formula = DEU ~ O18, data = my.data, weights = precip)
## 
## Coefficients:
## (Intercept)          O18  
##     -0.1356       7.7384

Alternatives to fit a meteoric water line

  • The Reduced Major Axis regression

It does capture errors in \(x\) & \(y\), but it is not designed for making predictions!

library(lmodel2)
RMA <- lmodel2(DEU~O18, data=my.data)  # confusing: here RMA = Ranged MA
## RMA was not requested: it will not be computed.
## 
## No permutation test will be performed
RMA$regression.results[3, ]   # look at SMA, this is our RMA! 
##   Method Intercept    Slope Angle (degrees) P-perm (1-tailed)
## 3    SMA  1.125407 7.882026        82.76946                NA

3. mixed effects models

What are mixed (effects) models?

\[ y_i = \textrm{fix}_i + u_i + e_i \]

with:

\(\textrm{fix}_i = \alpha + \beta_1 \times x_{1i} + \beta_2 \times x_{2i} + \dots + \beta_p \times x_{pi}\)

. \(\textrm{fix}_i\) = fixed effect

. \(u_i\) = random effect

. \(e_i\) = new iid residuals

Going from non-mixed to mixed linear models,

\(\varepsilon_i\) becomes \(u_i+e_i\)
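
A minimal simulation of this decomposition (made-up numbers, not the course data; spaMM is already loaded): each group gets one random deviation u, every observation gets its own residual e.

set.seed(1)
group <- factor(rep(1:10, each=20))
x <- rnorm(200)
u <- rnorm(10, mean=0, sd=2)   # one random effect per group, var(u)=4
e <- rnorm(200, sd=1)          # iid residuals, var(e)=1
y <- 1 + 0.5*x + u[group] + e  # fix + u + e
HLfit(y ~ x + (1|group), data=data.frame(y, x, group))  # lambda should be roughly 4, phi roughly 1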

What are mixed (effects) models?

The covariance matrix of \(e_i\):

\[ \begin{array}{c|ccc} & \textrm{A} & \textrm{B} & \textrm{C} \\ \hline \textrm{A} & \sigma^2 & 0 & 0 \\ \textrm{B} & 0 & \sigma^2 & 0 \\ \textrm{C} & 0 & 0 & \sigma^2 \end{array} \]

For this error component, errors are independent & identically distributed!

It means that these errors come from a single normal distribution.

What are mixed (effects) models?

The covariance matrix of \(u_i\):

\[ \begin{array}{c|ccc} & \textrm{A} & \textrm{B} & \textrm{C} \\ \hline \textrm{A} & \sigma^2_\textrm{A} & \sigma_\textrm{AB} & \sigma_\textrm{AC} \\ \textrm{B} & \sigma_\textrm{AB} & \sigma^2_\textrm{B} & \sigma_\textrm{BC} \\ \textrm{C} & \sigma_\textrm{AC} & \sigma_\textrm{BC} & \sigma^2_\textrm{C} \end{array} \]

For this error component, errors are NOT (necessarily) independent & NOT (necessarily) identically distributed!

It means that these errors come from a multivariate normal distribution.

\(\sigma^2_\textrm{A}=\) variance from which the error of A is drawn

\(\sigma_\textrm{AB}=\) covariance for the joint-error of A & B
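
To make the multivariate normal concrete, a minimal sketch (made-up covariance values) drawing correlated errors for A, B and C with MASS::mvrnorm:

library(MASS)
Sigma <- matrix(c(1.0, 0.8, 0.3,
                  0.8, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow=3)  # covariance matrix of u for A, B, C
u <- mvrnorm(n=1000, mu=c(0, 0, 0), Sigma=Sigma)
cov(u)  # should be close to Sigma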

When shall you use a mixed model?

When:

. you expect a structure in the errors that a non-mixed model would produce (i.e. structured residuals). The usual suspects are: kinship (from individuals to the phylogeny), spatial structure, time series

. you have predictors to account for the \(u_i\)

. you have a lot of data

. you can already handle non-mixed models with ease!

Typical NO GO: you have 5 repeated measures for 3 different years and you think that treating year as a random effect instead of a fixed factor will spare you some degrees of freedom and help you get your p-value below 0.05.

Mixed model in R

Some famous packages:

  • nlme (old, GLMM not possible, flexible if Gaussian)
  • lme4 (current, GLMM, limited, the standard, messy)
  • MCMCglmm (current, GLMM, hyper flexible including multivariate, highly technical, super slow)
  • spaMM (new, GLMM, flexible, clean, still a bit young)

A useful link: http://glmm.wikidot.com/faq

Comparison of lme4 & spaMM

data(wafers)  # load dataset
table(wafers$batch)
## 
##  1  2  3  4  5  6  7  8  9 10 11 
## 18 18 18 18 18 18 18 18 18 18 18
library(lme4)
## Loading required package: Matrix
## Loading required package: Rcpp
mod.lme4 <- lmer(formula=y ~ X1 + (1|batch), data=wafers)
mod.spaMM <- HLfit(formula=y ~ X1 + (1|batch), data=wafers)

Comparison of lme4 & spaMM

mod.lme4
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ X1 + (1 | batch)
##    Data: wafers
## REML criterion at convergence: 2454.533
## Random effects:
##  Groups   Name        Std.Dev.
##  batch    (Intercept)  32.86  
##  Residual             120.88  
## Number of obs: 198, groups:  batch, 11
## Fixed Effects:
## (Intercept)           X1  
##      264.42        23.68
mod.spaMM

mod.spaMM
## formula: y ~ X1 + (1 | batch)
## REML: Estimation of lambda and phi by REML.
##       Estimation of fixed effects by ML.
## Family: gaussian ( link = identity )
##  ------- Fixed effects (beta) -------
##             Estimate Cond. SE t-value
## (Intercept)   264.42   13.113  20.164
## X1             23.68    9.862   2.401
##  ---------- Random effects ----------
## Family: gaussian ( link = identity )
## Coefficients for log[ lambda = var(u) ]: 
##  Group        Term Estimate Cond.SE
##  batch (Intercept)    6.985  0.5919
## Estimate of lambda:  1080 
## # of obs: 198; # of groups: batch, 11 
##  -------- Residual variance  --------
## phi formula: "phi" ~ 1
## Coefficients for log[ phi= residual var ]
##             Estimate Cond. SE
## (Intercept)     9.59   0.1025
## Estimate of phi=residual var:  14610 
##  -------- Likelihood values  --------
##                         logLik
## p_v(h) (marginal L): -1233.967
##   p_beta,v(h) (ReL): -1227.266

rbind(lme4=mod.lme4@beta, spaMM=mod.spaMM$fixef)  # fixed effect coeff
##       (Intercept)       X1
## lme4      264.416 23.67608
## spaMM     264.416 23.67608
c(lme4=getME(mod.lme4, "sigma")^2,
  spaMM=exp(mod.spaMM$phi.object$beta_phi)) # var e
##             lme4 spaMM.(Intercept) 
##         14611.06         14611.06
c(lme4=as.numeric(VarCorr(mod.lme4)), spaMM=mod.spaMM$lambda)  # var u
##     lme4    spaMM 
## 1079.774 1079.789
c(logLik(mod.lme4), mod.spaMM$APHLs$p_bv)  # likelihood
## [1] -1227.266 -1227.266

More fun: let's model an isoscape!

library(IsoriX)
data(stationdata1)
corrHLfit(formula=y ~ lat.abs + lat.2 + elev + Matern(1|long+lat), 
          data=stationdata1)

We will see the output tomorrow, but let's look now at the Matérn effect:

distance <- seq(0,100, length=100)
corr <- Matern.corr(d=distance, rho = 0.05, nu = 1.5)
plot(corr~distance, type="l", ylim=c(0,1), las=1, lwd=3)
coor <- Matern.corr(d=distance, rho = 0.05, nu = 0.5)
points(coor~distance, col=2, type="l", lwd=3)
coor <- Matern.corr(d=distance, rho = 0.12, nu = 2)
points(coor~distance, col=3, type="l", lwd=3)
legend("topright", bty="n", lty=1, lwd=3, col=1:3,
       legend=c("0.05; 1.5", "0.05; 0.5", "0.12; 2"), title="rho; nu")

The Matérn correlation structure

From correlations predicted between all observations, we can deduce the covariance matrix. Remember: cor(A,B)\(=\frac{\sigma_\textrm{AB}}{\sigma_\textrm{A}\ \sigma_\textrm{B}}\)
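
A minimal numerical sketch (made-up distance and parameter values): with a common random-effect variance lambda, covariance = lambda × correlation, since sd(A) = sd(B) = sqrt(lambda).

corr.AB <- Matern.corr(d=10, rho=0.05, nu=1.5)  # correlation between two stations 10 units apart
lambda <- 4                                     # common variance of the spatial random effect (made up)
cov.AB <- lambda * corr.AB                      # the corresponding covariance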

4. Bayesian inference

Basics of Bayesian inference

\[P(\theta|\textbf{D}) = \frac{1}{P(\textbf{D})} \times P(\theta) \times P(\textbf{D} |\theta)\]

where:

. \(\theta\) = model parameters

. \(\textbf{D}\) = data

. \(P(\theta|\textbf{D})\) = posterior

. \(P(\textbf{D})\) = an uninteresting constant

. \(P(\theta)\) = prior

. \(P(\textbf{D} |\theta)\) = likelihood
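
A toy illustration of the formula (not part of the course data): the probability of heads after observing 7 heads in 10 tosses, obtained by evaluating prior × likelihood on a grid and normalising, so that \(P(\textbf{D})\) is never computed explicitly.

theta <- seq(0, 1, length=201)                 # grid of candidate parameter values
prior <- dbeta(theta, 1, 1)                    # flat prior
likelihood <- dbinom(7, size=10, prob=theta)   # P(D | theta)
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)        # normalising replaces the 1/P(D) constant
theta[which.max(posterior)]                    # posterior mode, close to 7/10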

Bayesian vs Frequentist approach

Bayesian inference tries to get estimates that optimize the posterior (i.e. the probability of the model for a fixed dataset)

Frequentist inference tries to get estimates that optimize the likelihood (i.e. the probability of the dataset for a fixed model)

Psychologically, the Bayesian approach looks better!

BUT the posterior depends on the likelihood (and the prior), so we do not escape building the inference on the probability of the dataset for a fixed model, and we add the complexity of the prior…

So is it any better?

Only if you know very well your prior!

Brute force methods hidden in Bayesian wrapping

In essence, Bayesian inference leads to the same information as frequentist inference, i.e. parameter estimates from which one can derive tests, predictions, intervals…

So if a package seller claims he can give you more using Bayesian stuff, this has nothing to do with Bayes

It is because he uses stochastic optimization methods (key words: Gibbs, Metropolis, MCMC) that compute the likelihood using computer power rather than math

Doing so necessarily requires priors, and the prior may influence results, so a Bayesian wrapping saves appearances…

Those methods (whether sold as Bayesian or not) are the only options for complex problems
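
For the curious, a minimal Metropolis sampler for the mean of a made-up sample (a rough sketch, not the course's code): the chain visits parameter values in proportion to prior × likelihood, evaluated numerically at each step.

set.seed(1)
obs <- rnorm(50, mean=3, sd=1)        # made-up data
log.post <- function(mu) sum(dnorm(obs, mean=mu, sd=1, log=TRUE))  # flat prior: log-posterior = log-likelihood
chain <- numeric(5000)
chain[1] <- 0
for (i in 2:5000) {
  proposal <- chain[i-1] + rnorm(1, sd=0.5)  # random walk proposal
  if (log(runif(1)) < log.post(proposal) - log.post(chain[i-1])) {
    chain[i] <- proposal    # accept
  } else {
    chain[i] <- chain[i-1]  # reject: keep the current value
  }
}
mean(chain[-(1:500)])  # posterior mean after burn-in, close to mean(obs)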

The example 1 revisited with JAGS

JAGS: Just Another Gibbs Sampler

library(coda)
## Loading required package: lattice
library(rjags)  # needs JAGS on your computer
## Linked to JAGS 3.4.0
## Loaded modules: basemod,bugs
data <- list(DEU=my.data$DEU, O18=my.data$O18, N=nrow(my.data))

The example 1 revisited with JAGS

The model (in Jags syntax):

model.str <- "model {for (i in 1:N) 
  {DEU[i] ~ dnorm(mu[i], sdresid)        
    mu[i] <- alpha + beta*O18[i]           
  }
  alpha ~ dnorm(0, 0.00001) 
  beta ~ dnorm(0, 0.00001)
  sdresid <- pow(varresid, -2)
  varresid ~ dunif(0, 100)
}"

The example 1 revisited with JAGS

parameters <- c("alpha", "beta")
model <- jags.model(file=textConnection(model.str), data=data,
                    n.adapt=100)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 604
## 
## Initializing model
output <- coda.samples(model=model, variable.names=parameters,
                       n.iter=10000)

summary(output)
## 
## Iterations = 101:10100
## Thinning interval = 1 
## Number of chains = 1 
## Sample size per chain = 10000 
## 
## 1. Empirical mean and standard deviation for each variable,
##    plus standard error of the mean:
## 
##         Mean    SD Naive SE Time-series SE
## alpha -1.344 2.097  0.02097       0.094464
## beta   7.735 0.119  0.00119       0.005376
## 
## 2. Quantiles for each variable:
## 
##         2.5%    25%    50%     75% 97.5%
## alpha -5.387 -2.771 -1.354 0.06749 2.761
## beta   7.505  7.653  7.735 7.81434 7.965

plot(output)

The end