Linear mixed models allow us to handle repeated measurements such as clinical trial in which individuals are assigned to different groups and have a response variable of interest that is recorded on different days. Repeated data generally have random effects and mixed effects. Fixed effects are the factors of interest that we manipulate in the study and Random effects whose level whose levels were sampled randomly from a larger population about which we wish to generalize.
Benefits of Linear Mixed Models:
In this study, I will analyze alcoholic drink study and its influences on word produced.
Data:
The data set contains 31 respondent, and each respondent tried 3 different drink conditions:
1. Alcohol,
2. Placebo drink (smells and taste like alcohol),
3. Water.
Response variable is word, it is the number of words produced by each respondent. Each participant was asked to produce as many words as possible, starting with a given letter, in 1 minute.
Objective:
Find the influences the alcoholic drinks on the number of words produced by each respondent.
Data Preparation
#Loading necessary libraries
library(tidyverse) # For data manipulation
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lmerTest) # Returns comprehensive anova output
## Loading required package: lme4
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
library(ez)
#PREPARING WORK SPAcE
# Clear the workspace:
rm(list = ls())
# Disable scientific notation:
options(scipen=999, digits=4)
# Load data
df <- read.csv("one_way2.csv", header = TRUE)
head(df)
## individuals drink words drink.1 words.1 drink.2 words.2
## 1 1 1 57 2 80 3 75
## 2 2 1 58 2 67 3 64
## 3 3 1 39 2 49 3 48
## 4 4 1 45 2 47 3 41
## 5 5 1 45 2 43 3 36
## 6 6 1 34 2 43 3 49
Our data is in wide format, we need to convert into long format to run the rmle model.
#Reshaping data from wide to long format
dflong <- reshape(df, direction="long",
idvar=c("individuals"),
times= c(1,2,3),
v.names=c("Words","DRINKS"),
varying=c("drink","words", "drink.1", "words.1", "drink.2", "words.2"))
head(dflong)
## individuals time Words DRINKS
## 1.1 1 1 57 1
## 2.1 2 1 58 1
## 3.1 3 1 39 1
## 4.1 4 1 45 1
## 5.1 5 1 45 1
## 6.1 6 1 34 1
#Data manipulation
dat<-dflong %>%
select(-c(DRINKS)) %>%
rename(IDs = individuals,
Drink = time)
head(dat)
## IDs Drink Words
## 1.1 1 1 57
## 2.1 2 1 58
## 3.1 3 1 39
## 4.1 4 1 45
## 5.1 5 1 45
## 6.1 6 1 34
And now, we need to convert our Fixed Effect (Drink variable) into categorical variable, otherwise, ANOVA function will treat as an numeric value.
#Convert to categorical variable
dat$Drink <- factor(dat$Drink,
levels = c(1,2,3),
labels= c("alcohol", "placebo", "control"))
myvars <- c("IDs", "Drink", "Words")
newdata <- dat[myvars]
summary(newdata)
## IDs Drink Words
## Min. : 1 alcohol:31 Min. :24.0
## 1st Qu.: 8 placebo:31 1st Qu.:38.0
## Median :16 control:31 Median :45.0
## Mean :16 Mean :45.4
## 3rd Qu.:24 3rd Qu.:52.0
## Max. :31 Max. :80.0
str(newdata)
## 'data.frame': 93 obs. of 3 variables:
## $ IDs : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Drink: Factor w/ 3 levels "alcohol","placebo",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Words: int 57 58 39 45 45 34 38 44 38 41 ...
Our interest is to find the effect of drinking condition on words produced.
I will use Linear-mixed model(lmer):
“lmer” requires response variable (Words), categorical factor variable (Drink) and (1/IDs) for the repeated measurements. Note that, Words is our response variable, Drink is a Mixed variable and IDs is a random variable.
sum <- newdata %>%
group_by(Drink) %>%
summarise(meanword = mean(Words),
sdword = sd(Words))
sum
## # A tibble: 3 x 3
## Drink meanword sdword
## <fct> <dbl> <dbl>
## 1 alcohol 42.1 9.57
## 2 placebo 46.0 11.8
## 3 control 48.3 11.4
#Graph the sumary test
ggplot(sum, aes(x=Drink, y=meanword))+
geom_col(colour = "black")
We can see that the number of words produced in average increase from alcohol level to control(water) level.
fit_LM <- lmer(Words ~ Drink + (1|IDs), data=newdata)
fit_LM
## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: Words ~ Drink + (1 | IDs)
## Data: newdata
## REML criterion at convergence: 641.2
## Random effects:
## Groups Name Std.Dev.
## IDs (Intercept) 9.50
## Residual 5.49
## Number of obs: 93, groups: IDs, 31
## Fixed Effects:
## (Intercept) Drinkplacebo Drinkcontrol
## 42.06 3.90 6.23
anova(fit_LM)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Drink 614 307 2 60 10.2 0.00016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that there was a significant effect of drink(alcohol) on words produced F(2,60) =10.19, P< .001
ezPlot(data = newdata, dv = Words,
within = Drink,
wid = IDs,
x = Drink, #title of y axis
y_lab = 'Mean of words produced by Fixed level') #title of y axis
Results
Based on the results of the ANOVA, we can see that the F statistic was significant at p < .001 and can therefore conclude that there was a significant effect of drink(alcohol) on words produced with P< .001.