Analysis of Repeated Data

Introduction

Linear mixed models allow us to handle repeated measurements such as clinical trial in which individuals are assigned to different groups and have a response variable of interest that is recorded on different days. Repeated data generally have random effects and mixed effects. Fixed effects are the factors of interest that we manipulate in the study and Random effects whose level whose levels were sampled randomly from a larger population about which we wish to generalize.

Benefits of Linear Mixed Models:

It can handle some missing data
It can also handle unbalanced data where you have different amounts of data in different conditions.

In this study, I will analyze alcoholic drink study and its influences on word produced.

Data:

The data set contains 31 respondent, and each respondent tried 3 different drink conditions:

1.  Alcohol, 
2.  Placebo drink (smells and taste like alcohol),
3.  Water.

Response variable is word, it is the number of words produced by each respondent. Each participant was asked to produce as many words as possible, starting with a given letter, in 1 minute.

Objective:

Find the influences the alcoholic drinks on the number of words produced by each respondent.

Data Preparation

#Loading necessary libraries
library(tidyverse) # For data manipulation

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lmerTest)  # Returns comprehensive anova output

## Loading required package: lme4

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

library(ez)

#PREPARING WORK SPAcE
# Clear the workspace: 
rm(list = ls())

# Disable scientific notation:
options(scipen=999, digits=4)

# Load data
df <- read.csv("one_way2.csv", header = TRUE)

head(df)

##   individuals drink words drink.1 words.1 drink.2 words.2
## 1           1     1    57       2      80       3      75
## 2           2     1    58       2      67       3      64
## 3           3     1    39       2      49       3      48
## 4           4     1    45       2      47       3      41
## 5           5     1    45       2      43       3      36
## 6           6     1    34       2      43       3      49

Our data is in wide format, we need to convert into long format to run the rmle model.

#Reshaping data from wide to long format 
dflong <- reshape(df, direction="long",
                  idvar=c("individuals"),
                  times= c(1,2,3),
                  v.names=c("Words","DRINKS"),
                  varying=c("drink","words", "drink.1", "words.1", "drink.2", "words.2"))
head(dflong)

##     individuals time Words DRINKS
## 1.1           1    1    57      1
## 2.1           2    1    58      1
## 3.1           3    1    39      1
## 4.1           4    1    45      1
## 5.1           5    1    45      1
## 6.1           6    1    34      1

#Data manipulation
dat<-dflong %>%
  select(-c(DRINKS)) %>%
  rename(IDs   = individuals,
         Drink = time)

head(dat)

##     IDs Drink Words
## 1.1   1     1    57
## 2.1   2     1    58
## 3.1   3     1    39
## 4.1   4     1    45
## 5.1   5     1    45
## 6.1   6     1    34

And now, we need to convert our Fixed Effect (Drink variable) into categorical variable, otherwise, ANOVA function will treat as an numeric value.

#Convert to categorical variable
dat$Drink <- factor(dat$Drink,
                   levels = c(1,2,3),
                   labels= c("alcohol", "placebo", "control"))

myvars <- c("IDs", "Drink", "Words")
newdata <- dat[myvars]
summary(newdata)

##       IDs         Drink        Words     
##  Min.   : 1   alcohol:31   Min.   :24.0  
##  1st Qu.: 8   placebo:31   1st Qu.:38.0  
##  Median :16   control:31   Median :45.0  
##  Mean   :16                Mean   :45.4  
##  3rd Qu.:24                3rd Qu.:52.0  
##  Max.   :31                Max.   :80.0

str(newdata)

## 'data.frame':    93 obs. of  3 variables:
##  $ IDs  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Drink: Factor w/ 3 levels "alcohol","placebo",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Words: int  57 58 39 45 45 34 38 44 38 41 ...

Setting up a Linear Mixed Model

Our interest is to find the effect of drinking condition on words produced.

I will use Linear-mixed model(lmer):

“lmer” requires response variable (Words), categorical factor variable (Drink) and (1/IDs) for the repeated measurements. Note that, Words is our response variable, Drink is a Mixed variable and IDs is a random variable.

sum <- newdata %>% 
  group_by(Drink) %>%
  summarise(meanword = mean(Words),
            sdword   = sd(Words))
sum

## # A tibble: 3 x 3
##   Drink   meanword sdword
##   <fct>      <dbl>  <dbl>
## 1 alcohol     42.1   9.57
## 2 placebo     46.0  11.8 
## 3 control     48.3  11.4

#Graph the sumary test
ggplot(sum, aes(x=Drink, y=meanword))+
  geom_col(colour = "black")

We can see that the number of words produced in average increase from alcohol level to control(water) level.

fit_LM <- lmer(Words ~ Drink + (1|IDs), data=newdata)
fit_LM

## Linear mixed model fit by REML ['lmerModLmerTest']
## Formula: Words ~ Drink + (1 | IDs)
##    Data: newdata
## REML criterion at convergence: 641.2
## Random effects:
##  Groups   Name        Std.Dev.
##  IDs      (Intercept) 9.50    
##  Residual             5.49    
## Number of obs: 93, groups:  IDs, 31
## Fixed Effects:
##  (Intercept)  Drinkplacebo  Drinkcontrol  
##        42.06          3.90          6.23

anova(fit_LM)

## Type III Analysis of Variance Table with Satterthwaite's method
##       Sum Sq Mean Sq NumDF DenDF F value  Pr(>F)    
## Drink    614     307     2    60    10.2 0.00016 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see that there was a significant effect of drink(alcohol) on words produced F(2,60) =10.19, P< .001

ezPlot(data = newdata, dv = Words, 
       within = Drink,
       wid = IDs, 
       x = Drink, #title of y axis
       y_lab = 'Mean of words produced by Fixed level') #title of y axis

Results

Based on the results of the ANOVA, we can see that the F statistic was significant at p < .001 and can therefore conclude that there was a significant effect of drink(alcohol) on words produced with P< .001.

**************

Analysis of Repeated Data

Mustafa Arslan

9/6/2021