Dummy data

For the sake of this demontration we use a very simple dataset with a very high signal versus noise ratio. Let’s look at a simple timeseries with multiple observations per timepoint.

n.year <- 30
n.replicate <- 10
sd.noise <- 0.1
dataset <- expand.grid(
  Replicate =  seq_len(n.replicate),
  Year = seq_len(n.year)
)
dataset$fYear <- factor(dataset$Year)
dataset$Noise <- rnorm(nrow(dataset), mean = 0, sd = sd.noise)
dataset$Linear <- 1 + 2 * dataset$Year + dataset$Noise
dataset$Quadratic <- 1 + 20 * dataset$Year - 0.5 * dataset$Year ^ 2 + 
  dataset$Noise

ggplot(dataset, aes(x = Year, y = Linear)) + geom_point()

ggplot(dataset, aes(x = Year, y = Quadratic)) + geom_point()

Quadratic trend but fit as a linear trend

Assume that we assume that the trend is linear but in reality it is quadratic. In this case the random intercept of year will pickup the differences with the linear trend.

model.quadratric <- lmer(Quadratic ~ Year + (1|Year), data = dataset)
summary(model.quadratric)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Quadratic ~ Year + (1 | Year)
##    Data: dataset
## 
## REML criterion at convergence: -112.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.65081 -0.64298  0.01145  0.59170  2.59158 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Year     (Intercept) 1.199e+03 34.6304 
##  Residual             1.023e-02  0.1011 
## Number of obs: 300, groups:  Year, 30
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  83.6586    12.9681   6.451
## Year          4.5000     0.7305   6.160
## 
## Correlation of Fixed Effects:
##      (Intr)
## Year -0.873
dataset$QuadraticPredict <- predict(model.quadratric)
dataset$QuadraticFixed <- predict(model.quadratric, re.form = ~0)
ggplot(dataset, aes(x = Year, y = Quadratic)) + 
  geom_point() + 
  geom_line(aes(y = QuadraticFixed), colour = "red") + 
  geom_line(aes(y = QuadraticPredict), colour = "blue")

The random effects will have a strong pattern. Indicating the a second order polynomial makes more sense than a linear trend.

rf <- data.frame(
  Year = seq_len(n.year),
  RandomIntercept = ranef(model.quadratric)$Year[, 1]
)
ggplot(rf, aes(x = Year, y = RandomIntercept)) + geom_point()

What is the trend is perfectly linear?

Both the fixed effect and the random effect can perfectly model the pattern in the data. So the model seems to be unidentifiable. However the likelihood of the model contains a penalty term for the random intercept. The strong the absolute value of the random effect, the stronger the penalty. The fixed effects have no penalty term. Hence, model with strong fixed effect will have a higher likelihood than exactly the same fit generated by strong random effects.

In this case the linear trend is very strong compared to the noise. So the linear trend in the fixed effect fits the data very well. Note that the random effect variance is zero.

model.linear <- lmer(Linear ~ Year + (1|Year), data = dataset)
summary(model.linear)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Linear ~ Year + (1 | Year)
##    Data: dataset
## 
## REML criterion at convergence: -505.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.91649 -0.74745  0.04343  0.68103  2.69821 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Year     (Intercept) 0.00000  0.0000  
##  Residual             0.01019  0.1009  
## Number of obs: 300, groups:  Year, 30
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) 0.9919179  0.0119519      83
## Year        2.0000319  0.0006732    2971
## 
## Correlation of Fixed Effects:
##      (Intr)
## Year -0.873
dataset$LinearPredict <- predict(model.linear)
dataset$LinearFixed <- predict(model.linear, re.form = ~0)
ggplot(dataset, aes(x = Year, y = Linear)) + 
  geom_point() + 
  geom_line(aes(y = LinearFixed), colour = "red") + 
  geom_line(aes(y = LinearPredict), colour = "blue")

What about fitting year as a factor in the fixed effects

Combining the same variables as a factor in the fixed effects and as a random intercept doesn’t make sense. They allow extactly the same model fit and thus the random intercept will always shrink to zero.