Overview

The purpose of this document is to work through the code from the paper below, tidyverse-ising it where possible and adding notes to help me understand what the code is doing to illustrating how linear mixed models work with this dataset.

Things I learned along the way…

  • this code might not be 100% relevant to using LMM with EMG because they are using glmer() rather than lmer(). glmer() is equivalent to logisitic regression and is necessary here because the outcome variable is binary (correct v. not)
  • this code is oversimplified. The examples add intercepts first and then slopes, but in Models 2-4 (with slopes) the slope for each fixed effect is added separately for illustrative purposes. In reality, you would want to include intercepts AND slopes for each fixed effect.

In the real paper, they report …

Accuracy ~ Cuetype + (1 + Condition | Subject) + (1 | Word)

… which is a model including random intercepts for Subject and slopes for condition across subject (1 + Condition | Subject) AND random intercepts across words/items (1 | Word)

Then they report….

Accuracy ~ PhonologicalCue + zLength + zFrequency + zImageability + zNameAgreement + Phonological Cue: zVisualComplexity + (1 + Condition | Subject) + (1 | Word)

… which is cuetype reduced to 2 levels (phono vs not), plus all kinds of word parameters (length, freq, imageability, nameagreement) plus the INTERACTION between phono and visualComplexity, AND random intercepts of subject and slope of condition across subjects, and random intercepts for word/items.

TAKE home: in the real world this is really complex!

An aside about the broom package
  • the broom() package, which is part of the tidymodels framework, looks like it could be really useful, but I haven’t worked out all the functions yet.
    • The tidy() function turns the output of these models into a tidy dataframe,
    • The glance() function gets high-level stats related to the model.
    • I had high hopes for the augment() function, which added fitted values to the original dataframe, but for reasons I havent managed to work out yet, it doesn’t produce EXACTLY the same values as fitted.values() from the stats package. So in extracting fitted.values and random effects I have stuck with base::stats for now.

Data source

Meteyard, L., & Bose, A. (2018). What does a cue do? comparing phonological and semantic cues for picture naming in aphasia. Journal of Speech, Language, and Hearing Research, 61(3), 658-674.

Experiment details

Picture naming experiment with 10 individuals with aphasia

Four conditions / two sessions - Semantic: related (associated) and unrelated (not associated) word presented with picture - Phonological: phonological cue (word onset) and tone presented with picture Counterbalanced so that in each session, half items cued and half uncued - 2 phonological sessions, 2 semantic sessions

Pictures taken from the Philadelphia Naming Test, 175 pictures. Vary along different lexical and picture variables.

For the purpose of this example, looking at the fixed effects of CueType, Length and Frequency.

Dependent variable: Accuracy of naming (coded as 0/1) Generalised LMM used with logistic link function / logistic regression

Data dictionary

  • Word - word to be named from the picture
  • Subject - ID for participant
  • Trial - trial number during naming session
  • ItemNumber - Number code for picture
  • Session - testing session (1-4)
  • Condition - Phonological or Semantic condition of session
  • Acc - naming accuracy (0 or 1)
  • Cue - whether item was cued (associated word or word onset) or not cued (unassociated word or tone)
  • CueCondition - full label for condition - Phonological Cue (PhCue), Associated word (SemCue), Unassociated word (Unassoc) or Tone (tone)
  • Length.Phon - Length of word to be named in phonemes
  • FreqLogLemma - Frequency of the word, log lemma value from Celex

Notes on models

For each model, ONLY the random effect under consideration is fit (see manuscript). This allowed the figures to be generated more easily, and each one to be considered on its own. In reality, a model was fit with multiple random effects (e.g. for the effects of frequency and length and Cue type)

library(tidyverse)
library(broom.mixed)
library(here)
library(lmerTest)
library(lme4)
library(effects)
library(multcomp)
library(merTools)


here::here()
## [1] "/Users/jenny/Desktop/bestPractice_LMM"

Data import and checking

NamingData <- read_csv(here("data", "naming_data.csv"))

Create tables to check responses across conditions/sessions

table(NamingData$Acc,NamingData$Subject,NamingData$Condition)
## , ,  = Phonological
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 299  37  46 169  39 177 163  42 151  83
##   1  51 310 302 162 237 162 132 293 135 199
## 
## , ,  = Semantic
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 302  65  38 195  49 213 156  70 206  83
##   1  48 280 309 145 206 126 140 256  74 205
table(NamingData$Acc,NamingData$Subject,NamingData$Session)
## , ,  = 1
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 150  31  25  87  19  87  78  19  92  41
##   1  25 142 149  77 114  82  68 147  33  99
## 
## , ,  = 2
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 149  34  21  82  20  90  85  23 114  42
##   1  26 138 153  85 123  80  64 146  41 100
## 
## , ,  = 3
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 150  18  17 101  23 103  77  41  73  42
##   1  25 155 157  68 104  63  71 123  73 103
## 
## , ,  = 4
## 
##    
##      AM  AW  CB  DH  EW  FF  JK  JV  MH  WR
##   0 152  19  21  94  26 110  79  29  78  41
##   1  23 155 152  77 102  63  69 133  62 102

Mean center and standardize continuous IVs

Length and Frequence are continuous IVs, check coding of predictors

Make new varirables zLength zFreq using scale function.

NamingData$zLengthPh <- scale(NamingData$Length.Phon, scale = TRUE, center = TRUE)
NamingData$zFreq <- scale(NamingData$FreqLogLemma, scale = TRUE, center = TRUE)

summary(NamingData) # check new z scores mean 0
##      Word             Subject              Trial         ItemNo   
##  Length:7000        Length:7000        Min.   :  1   Min.   :  1  
##  Class :character   Class :character   1st Qu.: 44   1st Qu.: 44  
##  Mode  :character   Mode  :character   Median : 88   Median : 88  
##                                        Mean   : 88   Mean   : 88  
##                                        3rd Qu.:132   3rd Qu.:132  
##                                        Max.   :175   Max.   :175  
##                                                                   
##     Session      Condition              Acc             Cue           
##  Min.   :1.00   Length:7000        Min.   :0.0000   Length:7000       
##  1st Qu.:1.75   Class :character   1st Qu.:0.0000   Class :character  
##  Median :2.50   Mode  :character   Median :1.0000   Mode  :character  
##  Mean   :2.50                      Mean   :0.5935                     
##  3rd Qu.:3.25                      3rd Qu.:1.0000                     
##  Max.   :4.00                      Max.   :1.0000                     
##                                    NA's   :645                        
##     ACC          CueCondition        Length.Phon      FreqLogLemma   
##  Mode :logical   Length:7000        Min.   : 1.000   Min.   :0.0000  
##  FALSE:2583      Class :character   1st Qu.: 3.000   1st Qu.:0.9031  
##  TRUE :3772      Mode  :character   Median : 4.000   Median :1.2788  
##  NA's :645                          Mean   : 4.497   Mean   :1.3376  
##                                     3rd Qu.: 5.000   3rd Qu.:1.8325  
##                                     Max.   :11.000   Max.   :3.2119  
##                                                                      
##     zLengthPh.V1           zFreq.V1      
##  Min.   :-1.980182   Min.   :-2.1157856  
##  1st Qu.:-0.847725   1st Qu.:-0.6872390  
##  Median :-0.281496   Median :-0.0929472  
##  Mean   : 0.000000   Mean   : 0.0000000  
##  3rd Qu.: 0.284732   3rd Qu.: 0.7829096  
##  Max.   : 3.682103   Max.   : 2.9648797  
## 

Response variable coding

For this logistic model, coding of response variable needs to be factor with 0=error and 1=correct, currently integer.

glimpse(NamingData)
## Rows: 7,000
## Columns: 14
## $ Word         <chr> "ambulance", "ambulance", "ambulance", "ambulance",…
## $ Subject      <chr> "FF", "WR", "AW", "FF", "MH", "AW", "AW", "CB", "JV…
## $ Trial        <dbl> 156, 65, 156, 40, 40, 65, 132, 65, 132, 40, 65, 65,…
## $ ItemNo       <dbl> 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54,…
## $ Session      <dbl> 1, 4, 3, 3, 1, 1, 4, 3, 2, 3, 4, 3, 2, 3, 2, 3, 1, …
## $ Condition    <chr> "Phonological", "Semantic", "Phonological", "Semant…
## $ Acc          <dbl> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, …
## $ Cue          <chr> "Cued", "Cued", "Cued", "NotCued", "NotCued", "Cued…
## $ ACC          <lgl> FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, …
## $ CueCondition <chr> "PhCue", "SemCue", "PhCue", "UnAssoc", "UnAssoc", "…
## $ Length.Phon  <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, …
## $ FreqLogLemma <dbl> 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.9…
## $ zLengthPh    <dbl[,1]> <matrix[25 x 1]>
## $ zFreq        <dbl[,1]> <matrix[25 x 1]>

In the original data, Acc is double, and ACC is logical. Need to have variable for accuracy that is 0 and 1 as a factor (to include in model) and one that is 0 and 1 as an integer (to do calc on). Leave Acc as integer and convert ACC (which is currently logical) to factor).

NamingData <- NamingData %>%
  mutate(ACC = case_when(ACC == FALSE ~ 0, 
                         ACC == TRUE ~ 1))  

NamingData$ACC <- as.factor(NamingData$ACC)

NamingData$CueCondition <- as.factor(NamingData$CueCondition)

Make Subject factor, then check level names. Recode subject initials to participant = A-J.

NamingData$Subject <- as.factor(NamingData$Subject)


# first check the level names for the original Subject ID
levels(NamingData$Subject)
##  [1] "AM" "AW" "CB" "DH" "EW" "FF" "JK" "JV" "MH" "WR"
# then recode the participant ID (levels), creating a new ID coding variable, Participant, in the same dataset
NamingData$Participant <- fct_recode(NamingData$Subject, 
                              A = "CB", B = "AW", C = "EW", 
                              D = "JV",E = "WR", F = "JK", 
                              G = "FF", H = "DH", I = "MH", 
                              J = "AM")

levels(NamingData$Participant)                                                    
##  [1] "J" "B" "A" "H" "C" "G" "F" "D" "I" "E"

Model 1 Random INTERCEPTS (1|Participant)

First step random effect to add is intercepts for participants. In this random intercepts only model, we are predicting accuracy (ACC) from fixed effects of CueCondition, length and frequency, plus the random effect due to between-subject variation in intercept.

This allows the intercept to vary for individual participants; in effect this accounts for overall between subject varibility in performance generally. Becuase this study is of people with aphasia, allowing for random intercepts means that the model accounts for differences between participants in the severity of their aphasia.

Here using generalised linear model (GLM) because ACC is binary (correct/incorrect). This is equivalent to logistic regression.

Run model predicting ACC from CueCondition, word length and freq, with random effects of Participant. The notations is (1|Participant) which means we are allowing the intercept (1) to vary by (|) participant.

Run model 1

Model.lmer1 <- glmer(ACC ~ CueCondition + zLengthPh + zFreq + (1|Participant), 
            data = NamingData, family = "binomial",
            control = glmerControl(optimizer="bobyqa"))

Get model 1 summary

summary(Model.lmer1) # standard model summary
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: ACC ~ CueCondition + zLengthPh + zFreq + (1 | Participant)
##    Data: NamingData
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6600.7   6648.0  -3293.4   6586.7     6348 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0036 -0.6876  0.3306  0.5796  4.0567 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Participant (Intercept) 1.58     1.257   
## Number of obs: 6355, groups:  Participant, 10
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.95584    0.40249   2.375   0.0176 *  
## CueConditionSemCue  -0.62757    0.08667  -7.241 4.45e-13 ***
## CueConditionTone    -0.59457    0.08720  -6.818 9.22e-12 ***
## CueConditionUnAssoc -0.62037    0.08768  -7.075 1.49e-12 ***
## zLengthPh           -0.34702    0.03439 -10.092  < 2e-16 ***
## zFreq                0.21299    0.03432   6.205 5.46e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) CCndSC CCndtT CCndUA zLngtP
## CueCndtnSmC -0.114                            
## CueCondtnTn -0.113  0.525                     
## CCndtnUnAss -0.113  0.523  0.520              
## zLengthPh   -0.007  0.031  0.029  0.028       
## zFreq        0.005 -0.019 -0.016 -0.021  0.407
tidymodel1 <- tidy(Model.lmer1) # model summary as tidy df

REPRODUCE FIGURE 1

The purpose of Figure 1A in the paper is to plot the mean performance of each participant (averaged across conditions) and the grand mean (averaged across all conditions and participants). This shows that there are BIG differences in the performance across individuals in the study (probably due to differences in the severity of their aphasia. The data in Figure 1A are pulled using the fitted.values() function.

The individual differences in performance across participants is what the random effect of participant is trying to model. The data in Figure 1B comes from pulling ranef() from the model and represents the same differences seen in Figure 1A, but as deviations from the grand mean.

plotDat1 (fitted.values)

Using fitted.values() from stats package

plotDat1 = df of fitted values with original data for plotting

plotDat1 <- na.omit(NamingData) #remove NAs from original data so aligns with model values

plotDat1 <- plotDat1 %>%
  mutate(fitted_values = fitted.values(Model.lmer1)) 

names(plotDat1)
##  [1] "Word"          "Subject"       "Trial"         "ItemNo"       
##  [5] "Session"       "Condition"     "Acc"           "Cue"          
##  [9] "ACC"           "CueCondition"  "Length.Phon"   "FreqLogLemma" 
## [13] "zLengthPh"     "zFreq"         "Participant"   "fitted_values"

Create new plotDat1_mean df to plot summary stats, meanAcc for each participant, plus the grand mean across participants.

Plot 1A setup

As I look at this code again, noticing that this plot is calcuating means for each participantf rom their actual Acc values, not from fitted.values, which makes me wonder why the code above was pulling fitted.values go back and check original code

glimpse(plotDat1)
## Rows: 6,355
## Columns: 16
## $ Word          <chr> "ambulance", "ambulance", "ambulance", "ambulance"…
## $ Subject       <fct> FF, WR, AW, FF, MH, AW, AW, CB, JV, JK, FF, EW, CB…
## $ Trial         <dbl> 156, 65, 156, 40, 40, 65, 132, 65, 132, 40, 65, 65…
## $ ItemNo        <dbl> 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54…
## $ Session       <dbl> 1, 4, 3, 3, 1, 1, 4, 3, 2, 3, 4, 3, 2, 3, 2, 3, 1,…
## $ Condition     <chr> "Phonological", "Semantic", "Phonological", "Seman…
## $ Acc           <dbl> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ Cue           <chr> "Cued", "Cued", "Cued", "NotCued", "NotCued", "Cue…
## $ ACC           <fct> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ CueCondition  <fct> PhCue, SemCue, PhCue, UnAssoc, UnAssoc, SemCue, To…
## $ Length.Phon   <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ FreqLogLemma  <dbl> 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.…
## $ zLengthPh     <dbl[,1]> <matrix[25 x 1]>
## $ zFreq         <dbl[,1]> <matrix[25 x 1]>
## $ Participant   <fct> G, E, B, G, I, B, B, A, D, F, G, C, A, J, H, D, E,…
## $ fitted_values <dbl> 0.29341517, 0.42874248, 0.78257076, 0.18254126, 0.…
# get meanacc by participant
plotDat1_mean <- plotDat1 %>%
  group_by(Participant) %>%
  summarise(MeanAcc = mean(Acc)) 
## `summarise()` ungrouping output (override with `.groups` argument)
glimpse(plotDat1_mean) # check structure, note 'Participant' is factor but not ordered
## Rows: 10
## Columns: 2
## $ Participant <fct> J, B, A, H, C, G, F, D, I, E
## $ MeanAcc     <dbl> 0.1414286, 0.8526012, 0.8791367, 0.4575261, 0.834275…
plotDat1_mean$Participant <- as.character(plotDat1_mean$Participant) # make participant character so can add Mean label below


# in the 11th row, put "Mean" in the 1st col
# in the 11th row, in the 2nd col, calculate the mean Acc across all participants (grand mean)

plotDat1_mean[11,1] <- "Mean"

plotDat1_mean[11,2] <- mean(plotDat1$Acc)

plotDat1_mean$MeanAcc <- round(plotDat1_mean$MeanAcc, digits = 2) # round numbers for plotting

glimpse(plotDat1_mean)
## Rows: 11
## Columns: 2
## $ Participant <chr> "J", "B", "A", "H", "C", "G", "F", "D", "I", "E", "M…
## $ MeanAcc     <dbl> 0.14, 0.85, 0.88, 0.46, 0.83, 0.42, 0.46, 0.83, 0.37…

FIGURE 1A

Plot accuracy by subject and with grand mean, this code reproduces Figure 1a from the paper, mean acc for each participant with the grand mean.

plotDat1_mean %>% ggplot(aes(x=as.factor(Participant), y=MeanAcc)) + 
  geom_col() +  
  xlab("Participant and Grand Mean Accuracy") + 
  ylab("Mean Accuracy") + 
  ggtitle("1a) Raw participant accuracy and group/grand mean") + 
  expand_limits(y = c(0,1)) + 
  theme_bw() +
  theme(text=element_text(size=14))

In the original code, there was a quick and dirty lattice plot here that visualises the random effects of participant, that is the extent to which average performance of each participant differs from the grand mean. Below is how to reproduce that in ggplot.

Plot 1B set up (df_randomeffects1)

Use ranef() to pull the random effects from the model object. This function returns a list of 1 thing, so its easy to turn it into a dataframe.

randomeffects1 <- ranef(Model.lmer1) # ranef() gets a list of 1 with by subject intercepts

df_randomeffects1 <- randomeffects1[[1]] # the list only contains 1 thing so it is easier to work with as a df

glimpse(df_randomeffects1) 
## Rows: 10
## Columns: 1
## $ `(Intercept)` <dbl> -2.3830710, 1.3388007, 1.5818498, -0.6859396, 1.17…
# the pp-nos are in the row names, so need to make them a column 

df_randomeffects1 <- df_randomeffects1 %>%
 rownames_to_column(var = "subject") 

df_randomeffects1 <- df_randomeffects1 %>%
  dplyr::rename(intercept = "(Intercept)")


names(df_randomeffects1)
## [1] "subject"   "intercept"

FIGURE 1B

Plot random effects for subjects, this plot reproduces Figure 1B in the paper.

df_randomeffects1 %>% 
  ggplot(aes(x=as.factor(subject), y=intercept)) + geom_point(stat="identity") +
  ylab("Deviation from Grand Mean Intercept (0)") + 
  xlab("Participant") + 
  geom_hline(yintercept=0) + 
  ggtitle("1b) Participant accuracy as deviations from the grand mean") + theme_bw() + 
  theme(text=element_text(size=14))

Model 2: Random SLOPES (0 + CueCondition|Subject)

While the random effect of participant (intercept) accounts for individual differences in performance that the participant came to the experiment with (in this case probably differences in the severity of their aphasia), it is also possible that there are participant differences in the extent to which each participant is affected by the manipulation. To account for the possibility that the magnitude of the effect of CueCondition is difference across participants, here we add a random effect of participant on SLOPE cue condition effects.

Note that here we specify the random effect of subjects on the slope of the CueCondition effect by specifying +

(0 + CueCondition|Subject)

NOTE: here we are asking the glmer() function to ignore potential deviations between subjects in the intercepts, i.e. we are excluding the random effect of subjects on intercepts.

The (0 + CueCondidion|Subject) term models how the effect of Cue Condition might have a different slope for different people i.e. impact different people to a greater or lesser extent.

Run model 2

Model.lmer2 <-  glmer(ACC ~ CueCondition + zLengthPh + zFreq + 
                        (0 + CueCondition|Subject), 
                      data = NamingData, family = "binomial",
                      control = glmerControl(optimizer="bobyqa"))
## boundary (singular) fit: see ?isSingular

Get model 2 summary

summary(Model.lmer2)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: ACC ~ CueCondition + zLengthPh + zFreq + (0 + CueCondition |  
##     Subject)
##    Data: NamingData
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6601.2   6709.3  -3284.6   6569.2     6339 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.3847 -0.6704  0.3238  0.5808  3.9320 
## 
## Random effects:
##  Groups  Name                Variance Std.Dev. Corr          
##  Subject CueConditionPhCue   1.608    1.268                  
##          CueConditionSemCue  1.585    1.259    0.94          
##          CueConditionTone    1.715    1.310    0.99 0.97     
##          CueConditionUnAssoc 1.572    1.254    0.97 0.99 0.99
## Number of obs: 6355, groups:  Subject, 10
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.95129    0.40647   2.340 0.019264 *  
## CueConditionSemCue  -0.62146    0.16964  -3.663 0.000249 ***
## CueConditionTone    -0.58364    0.10478  -5.570 2.55e-08 ***
## CueConditionUnAssoc -0.61683    0.13411  -4.599 4.24e-06 ***
## zLengthPh           -0.34961    0.03448 -10.141  < 2e-16 ***
## zFreq                0.21370    0.03442   6.209 5.34e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) CCndSC CCndtT CCndUA zLngtP
## CueCndtnSmC -0.231                            
## CueCondtnTn -0.011  0.647                     
## CCndtnUnAss -0.204  0.821  0.665              
## zLengthPh   -0.008  0.018  0.026  0.020       
## zFreq        0.004 -0.008 -0.012 -0.012  0.407
## convergence code: 0
## boundary (singular) fit: see ?isSingular
tidymodel2 <- tidy(Model.lmer2)

REPRODUCE FIGURE 2

Figure 2A plots main effect of CueCondition averaged across participants. Before when we were plotting random effect of participant, we used ranef() to pull the random effect of participant. Now we want the get the fixed effect of CueCondition, so use the allEffects() function. Sadly this function creates a super complex list, that you then need to index pieces out of.

Plot 2A set up (fixed.effects)

The allEffects() function returns a “large efflist”. This code gets the fixed effects that are associated with CueCondition and turns them into a dataframe, adds lower and upper CIs and standard error, then adds labels to the values in the df to make the plot make sense.

library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:here':
## 
##     here
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
effects <- allEffects(Model.lmer2) # pull allEffects into "large efflist"

fixed.effects <- effects$CueCondition$fit    # pull effects of cue condition

fixed.effects<- as.data.frame(fixed.effects) # make a dataframe

fixed.effects[,2]<- effects$CueCondition$lower # add col 2 with lower CI

fixed.effects[,3]<- effects$CueCondition$upper # add col 3 with upper CI

fixed.effects[,4]<- effects$CueCondition$se # add col 4 with se

fixed.effects[,5] <-c("PhCue","SemCue","Tone","UnAssoc") # add col 5 with condition type

fixed.effects[,6]<-c("Shared onset","Associated word","Tone","Non-associated word") # add col 6 with condition labels

fixed.effects <- fixed.effects %>%
  dplyr::rename(fit = V1, lower = V2, upper = V3, se = V4, cue = V5, label = V6) #rename V1-6

fixed.effects$label <- fct_relevel(fixed.effects$label, c("Shared onset", "Associated word", "Tone", "Non-associated word")) #fix levels of label
                                  
                             
levels(fixed.effects$label)
## [1] "Shared onset"        "Associated word"     "Tone"               
## [4] "Non-associated word"

FIGURE 2A

Mean accuracy across four cue conditions averaged across participants, use fixed.effects df to plot fitted values w 95% confidence intervals

fixed.effects %>%
  ggplot(aes(x = label, y = fit)) +
  geom_col() +
  geom_errorbar(aes(ymin=lower,ymax=upper),
                size = .3, 
                width=.2, 
                position=position_dodge(.9)) +
  ylim(-0.5, 2) +
  theme_bw() + 
  theme(text=element_text(size=14)) +
  xlab("Cue Type") +
  ylab("Accuracy (model fit)") +
  labs(title = "2a) Average accuracy across four Cue Type conditions")

plotDat2 (fitted.values)

Figure 2B illustrates how the effect of CueCondition differs for each participant but plotting fitted values from the model using a boxplot and facet_grid(). This plot illustrates the individual subject level effect of cue condition

The plotDat2 df pulls fitted.values (from the new model lmer2 which includes random slopes) and adds them to the original NamingData.

The original code did this with lattice- can I get the same plot with ggplot?

plotDat2 <- na.omit(NamingData) #remove NAs from original data so aligns with model values

plotDat2 <- plotDat2 %>%
  mutate(fitted_values = fitted.values(Model.lmer2)) 

glimpse(plotDat2)
## Rows: 6,355
## Columns: 16
## $ Word          <chr> "ambulance", "ambulance", "ambulance", "ambulance"…
## $ Subject       <fct> FF, WR, AW, FF, MH, AW, AW, CB, JV, JK, FF, EW, CB…
## $ Trial         <dbl> 156, 65, 156, 40, 40, 65, 132, 65, 132, 40, 65, 65…
## $ ItemNo        <dbl> 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54…
## $ Session       <dbl> 1, 4, 3, 3, 1, 1, 4, 3, 2, 3, 4, 3, 2, 3, 2, 3, 1,…
## $ Condition     <chr> "Phonological", "Semantic", "Phonological", "Seman…
## $ Acc           <dbl> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ Cue           <chr> "Cued", "Cued", "Cued", "NotCued", "NotCued", "Cue…
## $ ACC           <fct> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ CueCondition  <fct> PhCue, SemCue, PhCue, UnAssoc, UnAssoc, SemCue, To…
## $ Length.Phon   <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ FreqLogLemma  <dbl> 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.…
## $ zLengthPh     <dbl[,1]> <matrix[25 x 1]>
## $ zFreq         <dbl[,1]> <matrix[25 x 1]>
## $ Participant   <fct> G, E, B, G, I, B, B, A, D, F, G, C, A, J, H, D, E,…
## $ fitted_values <dbl> 0.30829898, 0.46656863, 0.80474230, 0.17829396, 0.…
plotDat2$Participant <- fct_relevel(plotDat2$Participant, c("A" ,"B", "C", "D", "E", "F", "G", "H", "I", "J"))

levels(plotDat2$Participant)
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"

FIGURE 2B

plotDat2 %>%
  ggplot(aes(x = CueCondition, y = fitted_values)) +
  geom_boxplot() +
  facet_wrap(~ Participant) +
  ylab("Accuracy (model fit)") +
   xlab("Cue Type") +
  labs(title = "2b) Effect of Cue Type by Participant")

# WOOT

Plot 2c set up (df_randomeffects2)

Figure 2c is a little like Figure 1b, except rather than just the random effect of participant, this is the random slope effect of cue type. For each participant, it is plotting the effect of CueCondition as a deviation from the condition mean, showing that while across the board the Cues with shared onset result in higher perforamnce than the other cues (Figure2a) this effet is not the same across all participants.

#Extract random effect variances
randomeffects2 <- ranef(Model.lmer2)  #list of 1, 40 obs of 5 variables

str(randomeffects2) #look at what has been extracted
## List of 1
##  $ Subject:'data.frame': 10 obs. of  4 variables:
##   ..$ CueConditionPhCue  : num [1:10] -2.459 1.486 1.281 -0.675 1.266 ...
##   ..$ CueConditionSemCue : num [1:10] -2.235 1.195 1.789 -0.681 1.074 ...
##   ..$ CueConditionTone   : num [1:10] -2.501 1.454 1.535 -0.712 1.259 ...
##   ..$ CueConditionUnAssoc: num [1:10] -2.323 1.296 1.642 -0.685 1.143 ...
##   ..- attr(*, "postVar")= num [1:4, 1:4, 1:10] 0.02446 -0.00254 0.01569 0.00596 -0.00254 ...
##  - attr(*, "class")= chr "ranef.mer"
df_randomeffects2 <- as.data.frame(randomeffects2)

df_randomeffects2 <- df_randomeffects2 %>%
  dplyr::rename(participant = grp) %>%
dplyr::rename(deviation = condval)  %>%
  mutate(participant = recode(participant, 
                              "CB" = "A", "AW" = "B","EW" = "C",
                              "JV" = "D","WR" = "E", "FF" = "G", 
                              "JK" = "F", "DH" = "H","MH" = "I",
                              "AM" = "J")) %>%
  mutate(cuetype = case_when(term == "CueConditionPhCue" ~ "Shared Onset", 
                             term == "CueConditionSemCue" ~ "Associated Word", 
                             term == "CueConditionTone" ~ "Tone",
                             term == "CueConditionUnAssoc" ~ "Non associated word"))    
    

glimpse(df_randomeffects2)
## Rows: 40
## Columns: 6
## $ grpvar      <chr> "Subject", "Subject", "Subject", "Subject", "Subject…
## $ term        <fct> CueConditionPhCue, CueConditionPhCue, CueConditionPh…
## $ participant <fct> J, B, A, H, C, G, F, D, I, E, J, B, A, H, C, G, F, D…
## $ deviation   <dbl> -2.4590051, 1.4858826, 1.2811261, -0.6754017, 1.2657…
## $ condsd      <dbl> 0.15638615, 0.18284801, 0.17790477, 0.12606286, 0.18…
## $ cuetype     <chr> "Shared Onset", "Shared Onset", "Shared Onset", "Sha…
df_randomeffects2$participant <- fct_relevel(df_randomeffects2$participant, c("A" ,"B", "C", "D", "E", "F", "G", "H", "I", "J"))

df_randomeffects2$cuetype <- fct_relevel(df_randomeffects2$cuetype, c("Shared Onset" ,"Associated Word", "Tone", "Non associated word"))

levels(df_randomeffects2$participant) 
##  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
levels(df_randomeffects2$cuetype) 
## [1] "Shared Onset"        "Associated Word"     "Tone"               
## [4] "Non associated word"
#Quick plot of random effects (will order participants by deviation, and panel by Cue Type)
lattice::dotplot(ranef(Model.lmer2))
## $Subject

FIGURE 2c

df_randomeffects2 %>%
  ggplot(aes(x = participant, y = deviation)) +
  geom_point() +
  ylab("Deviation from Condition Mean (0)") + xlab("Participant") +
  geom_hline(yintercept=0) +
   facet_grid(. ~ cuetype) +
  ggtitle("2c) Effect of Cue Type by Participant, as deviations from Condition Mean") +
   theme_bw() + theme(text=element_text(size=14))

I think this alternative to Figure 2c by participant, which swaps x axis and facet to plot the effect of cueCond for each participant, makes it easier to see the random slopes of cue condition. In the average plot in Figure 2A, shared onset has a clear advantage over other conditions, but in individual participant plots that is not always the case. The pattern seen in averages (that Shared onset performance is better than the other 3 conditions) only holds in half the participants.

df_randomeffects2 %>%
  ggplot(aes(x = cuetype, y = deviation)) +
  geom_point() +
  ylab("Deviation from Condition Mean (0)") + xlab("Cue Condition") +
  scale_x_discrete(labels=c("Shared Onset" = "S", "Associated Word" = "A",
                              "Tone" = "T", "Non associated word" = "N")) +
  geom_hline(yintercept=0) +
   facet_grid(. ~ participant) +
   ggtitle("2c alt) Effect of Cue Type by Participant, as deviations from Condition Mean") +
   theme_bw() + 
  theme(text=element_text(size=14)) 

tidy model 2 output

Filtering tidymodel2 output to only display sd for random slopes CueCondition aka varince in slopes associated with each cue type. Important things to note, “on average within each condition, participants vary around the mean by ~ 1.3 units”.

tidymodel2 %>%
  filter(effect == "ran_pars") %>%
  filter(term %in% c("sd__CueConditionPhCue","sd__CueConditionSemCue",  "sd__CueConditionTone", "sd__CueConditionUnAssoc")) %>%
  kableExtra::kable()
effect group term estimate std.error statistic p.value
ran_pars Subject sd__CueConditionPhCue 1.267931 NA NA NA
ran_pars Subject sd__CueConditionSemCue 1.258864 NA NA NA
ran_pars Subject sd__CueConditionTone 1.309690 NA NA NA
ran_pars Subject sd__CueConditionUnAssoc 1.253928 NA NA NA

Model 3 : Intercepts for participant and slopes for length (1+zLengthPh|Subject)

This model includes BOTH intercepts for subject (accounting for the overall differences in performance between subjects - as in model 1) AND random slopes for Length, that is the possiblity that the effect of word length on participants naming differs across participants. This model includes fixed effects of CueCondition, Length, and Frequency, as well as random intercepts for Subject and random slopes for the effect of Length across subjects.

Run model 3

Model.lmer3 = glmer(ACC ~ CueCondition + zLengthPh + zFreq + (1+zLengthPh|Subject), 
                      data = NamingData, family = "binomial",
                      control = glmerControl(optimizer="bobyqa"))

get model 3 summary

summary(Model.lmer3)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: 
## ACC ~ CueCondition + zLengthPh + zFreq + (1 + zLengthPh | Subject)
##    Data: NamingData
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6595.9   6656.7  -3288.9   6577.9     6346 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.7441 -0.6793  0.3361  0.5689  4.2515 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr
##  Subject (Intercept) 1.58402  1.2586       
##          zLengthPh   0.01816  0.1348   0.34
## Number of obs: 6355, groups:  Subject, 10
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.94763    0.40307   2.351   0.0187 *  
## CueConditionSemCue  -0.62891    0.08686  -7.241 4.46e-13 ***
## CueConditionTone    -0.59569    0.08742  -6.814 9.46e-12 ***
## CueConditionUnAssoc -0.62221    0.08789  -7.080 1.44e-12 ***
## zLengthPh           -0.34817    0.05531  -6.295 3.07e-10 ***
## zFreq                0.21124    0.03432   6.154 7.55e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) CCndSC CCndtT CCndUA zLngtP
## CueCndtnSmC -0.114                            
## CueCondtnTn -0.114  0.526                     
## CCndtnUnAss -0.113  0.524  0.521              
## zLengthPh    0.256  0.019  0.018  0.017       
## zFreq        0.005 -0.018 -0.016 -0.020  0.254
tidymodel3 <- tidy(Model.lmer3)

Lattice gets quick plot of Random effects (both intercept = overall performance and Length). Eyeballing it looks like intercepts much greater difference across subjects that the effect of length (all deviations hover around 0)

lattice::dotplot(ranef(Model.lmer3))
## $Subject

## REPRODUCE FIGURE 3

Figure 3 has 3 parts. - 3a illustrates the effet of length on naming performance, averaged across the group (pp are less accurate at naming longer words) - 3b illustrates how that main effect of length, differs in slope for each participant (there are some pp whose slops are alot worse than the others, but across the group, not much variability in the effect of length). - 3c illustrates the random effects, first intercept, which is the differences across the group in overall perforamnce (ie the effect of aphasia severity as in Figure 1B), then slope, which is the difference in the effect of length across participants (ie does length affect some participants more than others- not really all hovering around the mean)

plotDat3 (fitted.values)

plotDat3 <- na.omit(NamingData) #remove NAs from original data so aligns with model values

plotDat3 <- plotDat3 %>%
  mutate(fitted_values = fitted.values(Model.lmer3)) 

names(plotDat3)
##  [1] "Word"          "Subject"       "Trial"         "ItemNo"       
##  [5] "Session"       "Condition"     "Acc"           "Cue"          
##  [9] "ACC"           "CueCondition"  "Length.Phon"   "FreqLogLemma" 
## [13] "zLengthPh"     "zFreq"         "Participant"   "fitted_values"
plotDat3$Participant <-as.character(plotDat3$Participant) #convert participant labels to text

# also need version of participant that is factor
plotDat3 %>%
  mutate(PartCodeFact = fct_relevel(plotDat3$Participant, c("A" ,"B", "C", "D", "E", "F", "G", "H", "I", "J"))) 
## # A tibble: 6,355 x 17
##    Word  Subject Trial ItemNo Session Condition   Acc Cue   ACC  
##    <chr> <fct>   <dbl>  <dbl>   <dbl> <chr>     <dbl> <chr> <fct>
##  1 ambu… FF        156     54       1 Phonolog…     0 Cued  0    
##  2 ambu… WR         65     54       4 Semantic      1 Cued  1    
##  3 ambu… AW        156     54       3 Phonolog…     1 Cued  1    
##  4 ambu… FF         40     54       3 Semantic      0 NotC… 0    
##  5 ambu… MH         40     54       1 Semantic      0 NotC… 0    
##  6 ambu… AW         65     54       1 Semantic      1 Cued  1    
##  7 ambu… AW        132     54       4 Phonolog…     1 NotC… 1    
##  8 ambu… CB         65     54       3 Semantic      1 Cued  1    
##  9 ambu… JV        132     54       2 Phonolog…     1 NotC… 1    
## 10 ambu… JK         40     54       3 Semantic      0 NotC… 0    
## # … with 6,345 more rows, and 8 more variables: CueCondition <fct>,
## #   Length.Phon <dbl>, FreqLogLemma <dbl>, zLengthPh[,1] <dbl>,
## #   zFreq[,1] <dbl>, Participant <chr>, fitted_values <dbl>,
## #   PartCodeFact <fct>

Figure 3a

Plot average slope for effect of length

plotDat3 %>%
  ggplot(aes(x=zLengthPh, y=fitted_values)) +
geom_smooth(method = "glm", colour = "black") +
  xlab("Length in Phonemes (z score)") + 
  ylab("Accuracy (model fit)") + 
  ggtitle("3a) Average (group) effect of Length in Phonemes") +
ylim(0,1) +
theme_bw() + 
  theme(text=element_text(size=14))
## `geom_smooth()` using formula 'y ~ x'

Figure 3b

Plot individual slopes for effect of length, this linetype = Participant trick is something I haven’t seen before (clever)

plotDat3 %>%
  ggplot(aes(x=zLengthPh, y=fitted_values, linetype=Participant)) + 
  geom_smooth(se = FALSE, method = "glm", colour = "black") +
  xlab("Length in Phonemes (z score)") + 
  ylab("Accuracy (model fit)") +
ggtitle("3b) Effect of Length in Phonemes by Participant") +
ylim(0,1) +
theme(text=element_text(size=14)) +
scale_colour_grey() + theme_bw() 
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 23 rows containing missing values (geom_smooth).

plot 3c set up (df_randomeffects3)

As for Figure 2c, pull random effects slope for Length.

#Extract random effect variances
randomeffects3 <- ranef(Model.lmer3)  #list of 1, 10 rows 2 col

str(randomeffects3) #look at what has been extracted
## List of 1
##  $ Subject:'data.frame': 10 obs. of  2 variables:
##   ..$ (Intercept): num [1:10] -2.396 1.326 1.58 -0.672 1.165 ...
##   ..$ zLengthPh  : num [1:10] -0.0779 0.0962 0.0326 0.0503 0.1077 ...
##   ..- attr(*, "postVar")= num [1:2, 1:2, 1:10] 0.01297 0.00295 0.00295 0.00882 0.01187 ...
##  - attr(*, "class")= chr "ranef.mer"
df_randomeffects3 <- randomeffects3[[1]]

# the pp-nos are in the row names, so need to make them a column , rename intercept, record Subject into A:J

df_randomeffects3 <- df_randomeffects3 %>%
 rownames_to_column(var = "subject") %>%
  dplyr::rename(intercept = "(Intercept)") %>%
  mutate(participant= recode(subject, 
                              "CB" = "A", "AW" = "B","EW" = "C",
                              "JV" = "D","WR" = "E", "FF" = "G", 
                              "JK" = "F", "DH" = "H","MH" = "I",
                              "AM" = "J"))
# quick plot of random effects (will order participants by deviation, and using original participants codes from model)
lattice::dotplot(ranef(Model.lmer3))
## $Subject

NOW.. Fgure 3c this is a little different to the other random effects plots (1b,2c), because we have more than 1 random effect that we want to plot. In Figure 1b we were just plotting the random intercept for participants and in Figure 2c, we were plotting the random slope for cueCondition alone (WITHOUT the random intercept for participants). This is the first time we are wanting to plot a random intercept and a random slope at the same time.

So, we need to make the data long so we can facet by variable (intercept vs slope) and plot the deviation (intercept and slope for length) comparatively.

df_randomeffects3_long <-  df_randomeffects3 %>%
  pivot_longer(names_to = "variable", values_to = "deviation", 2:3)

Figure 3c

df_randomeffects3_long %>% 
  ggplot(aes(x=as.factor(participant), y=deviation)) + 
      geom_point(stat="identity") +
  facet_grid(~ variable) +
  ylab("Deviation from Mean (0)") + xlab("Participant") +
ggtitle("3c) Participant intercepts and slopes for Length") +
geom_hline(yintercept=0) +
  theme_bw() + theme(text=element_text(size=14))

tidy model 3 output

Filtering tidymodel3 output to include only random effects. This model includes both random intercepts for participants (estiamte = 1.259), and random slopes for length. Variance for random effect of length = 0.134, telling us that participants vary much much more in their overall perforamnce, than in the extent to which the length of words makes a difference to their performance.

tidymodel3 %>%
  filter(effect == "ran_pars")  %>%
  kableExtra::kable()
effect group term estimate std.error statistic p.value
ran_pars Subject sd__(Intercept) 1.2585801 NA NA NA
ran_pars Subject cor__(Intercept).zLengthPh 0.3403571 NA NA NA
ran_pars Subject sd__zLengthPh 0.1347630 NA NA NA

Model 4: intercepts for participant and slopes for frequency (1+zFreq|Subject)

Model 4 is the same as model 3, except instead of including random intercepts for participant AND slope for Length, we are including intercepts for participant AND slope for Frequency.

NOTE- probably in the real analysis they included both.

Model.lmer4 = glmer(ACC ~ CueCondition + zLengthPh + zFreq + (1+zFreq|Subject), 
                      data = NamingData, family = "binomial",
                      control = glmerControl(optimizer="bobyqa"))

summary(Model.lmer4)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: ACC ~ CueCondition + zLengthPh + zFreq + (1 + zFreq | Subject)
##    Data: NamingData
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6577.3   6638.1  -3279.6   6559.3     6346 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9959 -0.6727  0.3477  0.5622  5.7565 
## 
## Random effects:
##  Groups  Name        Variance Std.Dev. Corr 
##  Subject (Intercept) 1.60698  1.2677        
##          zFreq       0.03981  0.1995   -0.84
## Number of obs: 6355, groups:  Subject, 10
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.93782    0.40592   2.310   0.0209 *  
## CueConditionSemCue  -0.63507    0.08714  -7.288 3.14e-13 ***
## CueConditionTone    -0.59970    0.08766  -6.841 7.88e-12 ***
## CueConditionUnAssoc -0.62466    0.08813  -7.088 1.36e-12 ***
## zLengthPh           -0.36034    0.03513 -10.256  < 2e-16 ***
## zFreq                0.17644    0.07238   2.438   0.0148 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) CCndSC CCndtT CCndUA zLngtP
## CueCndtnSmC -0.114                            
## CueCondtnTn -0.113  0.528                     
## CCndtnUnAss -0.113  0.525  0.522              
## zLengthPh   -0.006  0.031  0.029  0.029       
## zFreq       -0.726 -0.008 -0.007 -0.009  0.205
tidymodel4 <- tidy(Model.lmer4)
# quick plot of random effects from model
lattice::dotplot(ranef(Model.lmer4))
## $Subject

REPRODUCE FIGURE 4

plotDat4 (fitted.values)

plotDat4 <- na.omit(NamingData) #remove NAs from original data so aligns with model values

plotDat4 <- plotDat4 %>%
  mutate(fitted_values = fitted.values(Model.lmer3)) 

# participant labels are factor, need version that is text

plotDat4 <- plotDat4 %>%
  mutate(PartCodeFact = fct_relevel(plotDat4$Participant, c("A" ,"B", "C", "D", "E", "F", "G", "H", "I", "J"))) # partcodefact is factos


plotDat4$Participant <-as.character(plotDat4$Participant) #convert participant labels to text

glimpse(plotDat4)
## Rows: 6,355
## Columns: 17
## $ Word          <chr> "ambulance", "ambulance", "ambulance", "ambulance"…
## $ Subject       <fct> FF, WR, AW, FF, MH, AW, AW, CB, JV, JK, FF, EW, CB…
## $ Trial         <dbl> 156, 65, 156, 40, 40, 65, 132, 65, 132, 40, 65, 65…
## $ ItemNo        <dbl> 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54…
## $ Session       <dbl> 1, 4, 3, 3, 1, 1, 4, 3, 2, 3, 4, 3, 2, 3, 2, 3, 1,…
## $ Condition     <chr> "Phonological", "Semantic", "Phonological", "Seman…
## $ Acc           <dbl> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ Cue           <chr> "Cued", "Cued", "Cued", "NotCued", "NotCued", "Cue…
## $ ACC           <fct> 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1,…
## $ CueCondition  <fct> PhCue, SemCue, PhCue, UnAssoc, UnAssoc, SemCue, To…
## $ Length.Phon   <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ FreqLogLemma  <dbl> 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.9542, 0.…
## $ zLengthPh     <dbl[,1]> <matrix[25 x 1]>
## $ zFreq         <dbl[,1]> <matrix[25 x 1]>
## $ Participant   <chr> "G", "E", "B", "G", "I", "B", "B", "A", "D", "F", …
## $ fitted_values <dbl> 0.16892833, 0.44141248, 0.81807048, 0.09837124, 0.…
## $ PartCodeFact  <fct> G, E, B, G, I, B, B, A, D, F, G, C, A, J, H, D, E,…

Figure 4a

Plot average slope for effect of freq, across participants and conditions, overall the less frequent a word is the harder it is to name.

plotDat4 %>%
  ggplot(aes(x=zFreq, y=fitted_values)) +
geom_smooth(method = "glm", colour = "black") +
  xlab("Frequency (z score)") + 
  ylab("Accuracy (model fit)") + 
  ggtitle("4a) Average (group) effect of Freq") +
ylim(0,1) +
theme_bw() + 
  theme(text=element_text(size=14))
## `geom_smooth()` using formula 'y ~ x'

Figure 4b

Plot individual slopes for effect of length, showing that while overall infreqent words are harder to name, the extent to which frequency impacts naming perofmrance differs across participants. The slope of the effect differs across participants.

plotDat4 %>%
  ggplot(aes(x=zFreq, y=fitted_values, linetype=Participant)) + 
  geom_smooth(se = FALSE, method = "glm", colour = "black") +
  xlab("Frequency(z score)") + 
  ylab("Accuracy (model fit)") +
ggtitle("4b) Effect of Frequnecy by Participant") +
ylim(0,1) +
theme(text=element_text(size=14)) +
scale_colour_grey() + theme_bw() 
## `geom_smooth()` using formula 'y ~ x'

# quick plot of random effects (will order participants by deviation, and using original participants codes from model)
lattice::dotplot(ranef(Model.lmer4))
## $Subject

#### plot 4c set up (df_randomeffects3)

Plot both random intercepts and slopes for Length in phonemes

#Extract random effect variances
randomeffects4 <- ranef(Model.lmer4)  #list of 1, 10 rows 2 col

str(randomeffects4) #look at what has been extracted
## List of 1
##  $ Subject:'data.frame': 10 obs. of  2 variables:
##   ..$ (Intercept): num [1:10] -2.456 1.332 1.569 -0.658 1.189 ...
##   ..$ zFreq      : num [1:10] 0.315 -0.143 -0.246 0.014 -0.257 ...
##   ..- attr(*, "postVar")= num [1:2, 1:2, 1:10] 0.01403 -0.00352 -0.00352 0.00646 0.01145 ...
##  - attr(*, "class")= chr "ranef.mer"
df_randomeffects4 <- randomeffects4[[1]]

# the pp-nos are in the row names, so need to make them a column, rename intercept, recode participant values

df_randomeffects4 <- df_randomeffects4 %>%
 rownames_to_column(var = "subject") %>%
  dplyr::rename(intercept = "(Intercept)") %>%
  mutate(participant= recode(subject, 
                              "CB" = "A", "AW" = "B","EW" = "C",
                              "JV" = "D","WR" = "E", "FF" = "G", 
                              "JK" = "F", "DH" = "H","MH" = "I",
                              "AM" = "J"))


# As for plot 3, data needs to be long in order to plot BOTH intercepts and slopes.

df_randomeffects4_long <-  df_randomeffects4 %>%
  pivot_longer(names_to = "variable", values_to = "deviation", 2:3)

Figure 4c

Plot both random intercepts (overall differences in performance as a function of participant) and slopes for frequency (the extext to which the effect of frequency differs across participants).

Slope for frequency effect a bit more systematic here. Looks like the worse the aphasia, the bigger the effect of freq?

df_randomeffects4_long %>% 
  ggplot(aes(x=as.factor(participant), y=deviation)) + 
      geom_point(stat="identity") +
  facet_grid(~ variable) +
  ylab("Deviation from Mean (0)") + xlab("Participant") +
ggtitle("4c) Participant intercepts and slopes for Freq") +
geom_hline(yintercept=0) +
  theme_bw() + theme(text=element_text(size=14))

tidy model 4 output

Filtering tidymodel4 output to include only random effects. This model includes both random intercepts for participants (estiamte = 1.267), and random slopes for frequency. Variance for random effect of length = 0.200, also correlation between intercept and freq effect is high 0.84 telling us that participants higher intercepts (better accuracy overall) are less impated by frequency.

tidymodel4 %>%
  filter(effect == "ran_pars")  %>%
  kableExtra::kable()
effect group term estimate std.error statistic p.value
ran_pars Subject sd__(Intercept) 1.2676658 NA NA NA
ran_pars Subject cor__(Intercept).zFreq -0.8432434 NA NA NA
ran_pars Subject sd__zFreq 0.1995297 NA NA NA