For the sake of this demontration we use a very simple dataset with a very high signal versus noise ratio. Let’s look at a simple timeseries with multiple observations per timepoint.
n.year <- 30
n.replicate <- 10
sd.noise <- 0.1
dataset <- expand.grid(
Replicate = seq_len(n.replicate),
Year = seq_len(n.year)
)
dataset$fYear <- factor(dataset$Year)
dataset$Noise <- rnorm(nrow(dataset), mean = 0, sd = sd.noise)
dataset$Linear <- 1 + 2 * dataset$Year + dataset$Noise
dataset$Quadratic <- 1 + 20 * dataset$Year - 0.5 * dataset$Year ^ 2 +
dataset$Noise
ggplot(dataset, aes(x = Year, y = Linear)) + geom_point()
ggplot(dataset, aes(x = Year, y = Quadratic)) + geom_point()
Assume that we assume that the trend is linear but in reality it is quadratic. In this case the random intercept of year will pickup the differences with the linear trend.
model.quadratric <- lmer(Quadratic ~ Year + (1|Year), data = dataset)
summary(model.quadratric)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Quadratic ~ Year + (1 | Year)
## Data: dataset
##
## REML criterion at convergence: -112.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.65081 -0.64298 0.01145 0.59170 2.59158
##
## Random effects:
## Groups Name Variance Std.Dev.
## Year (Intercept) 1.199e+03 34.6304
## Residual 1.023e-02 0.1011
## Number of obs: 300, groups: Year, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 83.6586 12.9681 6.451
## Year 4.5000 0.7305 6.160
##
## Correlation of Fixed Effects:
## (Intr)
## Year -0.873
dataset$QuadraticPredict <- predict(model.quadratric)
dataset$QuadraticFixed <- predict(model.quadratric, re.form = ~0)
ggplot(dataset, aes(x = Year, y = Quadratic)) +
geom_point() +
geom_line(aes(y = QuadraticFixed), colour = "red") +
geom_line(aes(y = QuadraticPredict), colour = "blue")
The random effects will have a strong pattern. Indicating the a second order polynomial makes more sense than a linear trend.
rf <- data.frame(
Year = seq_len(n.year),
RandomIntercept = ranef(model.quadratric)$Year[, 1]
)
ggplot(rf, aes(x = Year, y = RandomIntercept)) + geom_point()
Both the fixed effect and the random effect can perfectly model the pattern in the data. So the model seems to be unidentifiable. However the likelihood of the model contains a penalty term for the random intercept. The strong the absolute value of the random effect, the stronger the penalty. The fixed effects have no penalty term. Hence, model with strong fixed effect will have a higher likelihood than exactly the same fit generated by strong random effects.
In this case the linear trend is very strong compared to the noise. So the linear trend in the fixed effect fits the data very well. Note that the random effect variance is zero.
model.linear <- lmer(Linear ~ Year + (1|Year), data = dataset)
summary(model.linear)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Linear ~ Year + (1 | Year)
## Data: dataset
##
## REML criterion at convergence: -505.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.91649 -0.74745 0.04343 0.68103 2.69821
##
## Random effects:
## Groups Name Variance Std.Dev.
## Year (Intercept) 0.00000 0.0000
## Residual 0.01019 0.1009
## Number of obs: 300, groups: Year, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.9919179 0.0119519 83
## Year 2.0000319 0.0006732 2971
##
## Correlation of Fixed Effects:
## (Intr)
## Year -0.873
dataset$LinearPredict <- predict(model.linear)
dataset$LinearFixed <- predict(model.linear, re.form = ~0)
ggplot(dataset, aes(x = Year, y = Linear)) +
geom_point() +
geom_line(aes(y = LinearFixed), colour = "red") +
geom_line(aes(y = LinearPredict), colour = "blue")
Combining the same variables as a factor in the fixed effects and as a random intercept doesn’t make sense. They allow extactly the same model fit and thus the random intercept will always shrink to zero.