mixed_effects.R

# Mixed-effects model for Split-Plot design
# Author: Prof Dr Md Kamrul Hasan

# Load the required libraries
library(readxl)
library(lme4)

## Loading required package: Matrix

library(lmerTest)

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

library(car)

## Loading required package: carData

library(sjstats)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::expand() masks Matrix::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ tidyr::pack()   masks Matrix::pack()
## ✖ dplyr::recode() masks car::recode()
## ✖ purrr::some()   masks car::some()
## ✖ tidyr::unpack() masks Matrix::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Load the data
crop = read_excel('datasets.xlsx', sheet = 'anova', range = 'AC13:AF45')
crop

## # A tibble: 32 × 4
##    Replication Irrigation Variety Yield
##          <dbl> <chr>      <chr>   <dbl>
##  1           1 control    A         8.9
##  2           1 control    B         9.5
##  3           1 control    C        11.7
##  4           1 control    D        15  
##  5           1 new        A        16.7
##  6           1 new        B        14.6
##  7           1 new        C        18.2
##  8           1 new        D        18.1
##  9           2 control    A        11.6
## 10           2 control    B         7.7
## # ℹ 22 more rows

factors = c('Irrigation', 'Variety', 'Replication')
crop[factors] = lapply(crop[factors], factor)
str(crop)

## tibble [32 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Replication: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 2 2 ...
##  $ Irrigation : Factor w/ 2 levels "control","new": 1 1 1 1 2 2 2 2 1 1 ...
##  $ Variety    : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Yield      : num [1:32] 8.9 9.5 11.7 15 16.7 14.6 18.2 18.1 11.6 7.7 ...

# What is the design of this data?
# The design is a split-plot design with Irrigation (control and new) as the whole plot factor and Variety (A, B, C, D) as the subplot factor.
# There are 4 replications for each combination. So, the total number experimental plots is 2 (Irrigation) * 4 (Variety) * 4 (Replication) = 32. 
# The response variable is Yield (kg per plot).

# Calculate group means
group_means = crop %>% 
  group_by(Irrigation, Variety) %>% 
  summarise(mean_yield = mean(Yield))

## `summarise()` has grouped output by 'Irrigation'. You can override using the
## `.groups` argument.

# Visualize the design with ggplot
# In reality the sequence of the plots is not like this. The treatment Irrigation (whole-plot factor) is randomly assigned in the whole plots at the first stage. In the second stage, the treatment Variety (split-plot factor) is again randomly assigned in the sub-plots. Therefore, split-plot design has two randomization stages.

ggplot(crop)+
  aes(x = Replication, y = Variety, fill = Irrigation, color = Replication) + 
  geom_tile(color = 'white', linewidth = 0) + 
  geom_text(aes(label = Yield), color = 'white', size = 6, angle = 0) + 
  facet_grid(~Irrigation) +
  theme_test() + theme(legend.position = 'none') +
  theme(panel.spacing = unit(0, "lines"))

# See the interaction between Irrigation and Variety
# The interaction plot shows that the effect lines of new and control irrigation are parallel. This indicates that there is unlikely to have interaction between Irrigation and Variety. This means that the variety has not changed the effects of irrigation on yield and vice versa.

with(crop, interaction.plot(Variety, Irrigation, Yield))

# Mixed-effect model
# Fixed effects: Irrigation + Variety + Irrigation:Variety
# Random effects: 1|Irrigation (intercepts vary by Irrigation)
# 1|Variety (intercepts vary by Variety)
# 1|Irrigation:Variety (intercepts vary by Irrigation and Variety)
# The random effects are nested within each other. The random effects are crossed with each other. The random effects are not nested within each other. The random effects are not crossed with each other.
# Irrigation|Variety (slopes vary by Irrigation and intercepts vary by Variety)
# 1|Irrigation/Variety (intercepts vary by Variety nested in Irrigation)


# Fit the mixed-effects model
# We are interested in the fixed effects of Irrigation, Variety, and the interaction between Irrigation and Variety on Yield.
# The factor Irrigation is our random effect. We are interested in the random effect of Irrigation on Yield. The intercepts are allowed to vary by Irrigation. 

model_mixed = lmerTest::lmer(Yield ~ Irrigation*Variety + (1|Irrigation), data = crop)

## Warning in as_lmerModLT(model, devfun): Model may not have converged with 1
## eigenvalue close to zero: 6.7e-10

summary(model_mixed)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Yield ~ Irrigation * Variety + (1 | Irrigation)
##    Data: crop
## 
## REML criterion at convergence: 98.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.5290 -0.4873  0.0252  0.5209  2.4363 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  Irrigation (Intercept) 8.471    2.911   
##  Residual               2.214    1.488   
## Number of obs: 32, groups:  Irrigation, 2
## 
## Fixed effects:
##                        Estimate Std. Error     df t value Pr(>|t|)   
## (Intercept)              10.325      3.004 24.000   3.437  0.00215 **
## Irrigationnew             6.100      4.248 24.000   1.436  0.16396   
## VarietyB                 -0.950      1.052 24.000  -0.903  0.37554   
## VarietyC                  1.725      1.052 24.000   1.640  0.11415   
## VarietyD                  3.900      1.052 24.000   3.707  0.00110 **
## Irrigationnew:VarietyB   -1.700      1.488 24.000  -1.143  0.26451   
## Irrigationnew:VarietyC   -1.275      1.488 24.000  -0.857  0.39998   
## Irrigationnew:VarietyD   -1.825      1.488 24.000  -1.227  0.23190   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) Irrgtn VartyB VartyC VartyD Irr:VB Irr:VC
## Irrigatinnw -0.707                                          
## VarietyB    -0.175  0.124                                   
## VarietyC    -0.175  0.124  0.500                            
## VarietyD    -0.175  0.124  0.500  0.500                     
## Irrgtnnw:VB  0.124 -0.175 -0.707 -0.354 -0.354              
## Irrgtnnw:VC  0.124 -0.175 -0.354 -0.707 -0.354  0.500       
## Irrgtnnw:VD  0.124 -0.175 -0.354 -0.354 -0.707  0.500  0.500

# with nice formatting
jtools::summ(model_mixed)

Observations	32
Dependent variable	Yield
Type	Mixed effects linear regression

AIC	118.27
BIC	132.93
Pseudo-R² (fixed effects)	0.47
Pseudo-R² (total)	0.89

Fixed Effects
	Est.	S.E.	t val.	d.f.	p
(Intercept)	10.32	3.00	3.44	24.00	0.00
Irrigationnew	6.10	4.25	1.44	24.00	0.16
VarietyB	-0.95	1.05	-0.90	24.00	0.38
VarietyC	1.73	1.05	1.64	24.00	0.11
VarietyD	3.90	1.05	3.71	24.00	0.00
Irrigationnew:VarietyB	-1.70	1.49	-1.14	24.00	0.26
Irrigationnew:VarietyC	-1.28	1.49	-0.86	24.00	0.40
Irrigationnew:VarietyD	-1.83	1.49	-1.23	24.00	0.23
p values calculated using Satterthwaite d.f.

Random Effects
Group	Parameter	Std. Dev.
Irrigation	(Intercept)	2.91
Residual		1.49

Grouping Variables
Group	# groups	ICC
Irrigation	2	0.79

# The fixed part explains (not exactly like the R2) 47% of the total variance.
# The total variance explained by the model is 89%
# Unexplained variance = 11%

# See the random effects part
# Residual variance is 2.214
# Irrigation variance is 8.471
# So, the Irrigation can explain 8.471/(8.471+2.214) = 79.3% of the 11% variance not explained by the fixed effects.

# See the ANOVA of mixed effects model
# Only variety has significant effect on yield
# Irrigation and the interaction between Irrigation and Variety are not significant

anova(model_mixed)

## Type III Analysis of Variance Table with Satterthwaite's method
##                    Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
## Irrigation          3.087   3.087     1    24  1.3944    0.2492    
## Variety            96.431  32.144     3    24 14.5187 1.335e-05 ***
## Irrigation:Variety  4.173   1.391     3    24  0.6282    0.6039    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Now we will analysis usual ANOVA with this data and compare the results with the mixed-effects model.

mod_int = aov(Yield ~ Replication + Irrigation*Variety, data = crop)
summary(mod_int)

##                    Df Sum Sq Mean Sq F value   Pr(>F)    
## Replication         3  10.08    3.36   1.638    0.211    
## Irrigation          1 192.08  192.08  93.679 3.42e-09 ***
## Variety             3  96.43   32.14  15.677 1.40e-05 ***
## Irrigation:Variety  3   4.17    1.39   0.678    0.575    
## Residuals          21  43.06    2.05                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# We see both irrigation and variety have significant effect but their interactions are not significant. Replications are also not significant. We will remove this variable from the final model.

# Now additive model without replication
mod_add = aov(Yield ~ Irrigation + Variety, data = crop)
summary(mod_add)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Irrigation   1 192.08  192.08   90.50 4.10e-10 ***
## Variety      3  96.43   32.14   15.14 5.64e-06 ***
## Residuals   27  57.31    2.12                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Now we will compare the models with anova
anova(mod_int, mod_add)

## Analysis of Variance Table
## 
## Model 1: Yield ~ Replication + Irrigation * Variety
## Model 2: Yield ~ Irrigation + Variety
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     21 43.059                           
## 2     27 57.308 -6   -14.249 1.1582  0.365

# We see that the additive model is better than the interaction model. Based on the p-value of F (p = 0.365, H0: Both model are same) The interaction model is not significantly different from the additive model. So, we will choose the additive model as the final model. We should not loose degree of freedom in the residuals by adding non-significant factors in the model. Interactive model with replication has 21 df for residuals, while this is 27 in case of additive model without replication.

# Now we will compare the mixed effect model with the additive model
anova(model_mixed, mod_add)

## refitting model(s) with ML (instead of REML)

## Data: crop
## Models:
## mod_add: Yield ~ Irrigation + Variety
## model_mixed: Yield ~ Irrigation * Variety + (1 | Irrigation)
##             npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## mod_add        6 121.46 130.25 -54.729   109.46                     
## model_mixed   10 127.04 141.70 -53.520   107.04 2.4191  4     0.6592

# p = 0.65, so models are not different
# Principle of parsimony (simplicity): select the simple model if other models are not significantly better.

# Now you can continue with assumptions, post hoc analysis, cv, variance explained and visualization.

mixed_effects.R

kamru

2025-02-23