For analysis, all behavioural measures were transformed as follows: count data, which are bounded at zero, were first incremented by 1 and then log-transformed; proportions (data ranging between 0 and 1) were logit-transformed. All outcome values were then standardised (centred and scaled).
# Data: log-transform counts and lengths (after adding 1), logit-transform
# proportions, then centre and scale all numeric outcomes
data <- read_csv(file = "../data/by_token_estimates.csv") %>%
  mutate(across(c(starts_with("n_"), starts_with("len_")), ~log(. + 1)),
         across(starts_with("prop_"), logit_scaled),
         across(c(participant, session_no), as.factor),
         across(where(is.numeric), ~scale(.)[,1]))
# Randomly remove 3 of the participants whose session 1 used prompt S
set.seed(365)
rnd_subs <- filter(data, session_no == "1", prompt == "S") %>%
  pull(participant) %>% factor() %>% levels() %>%
  sample(3)
data <- data %>% filter(!(participant %in% rnd_subs))
# Reshape to long format: one row per participant and process measure,
# with one column per session
data <- data %>%
  select(-token, -prompt) %>%
  pivot_wider(names_from = session_no,
              values_from = n_major_block:prob_long_event_duration_within_word,
              names_sep = "+",
              names_prefix = "session_") %>%
  pivot_longer(-participant,
               names_to = c("outcome", ".value"),
               names_sep = "\\+")
# Preview data
data
## # A tibble: 600 × 5
## participant outcome session_1 session_2 session_3
## <fct> <chr> <dbl> <dbl> <dbl>
## 1 71 n_major_block 1.77 2.20 2.20
## 2 71 n_jumps -2.01 -0.829 -2.01
## 3 71 n_edges -0.377 0.151 0.151
## 4 71 n_sustained_reading -0.0485 0.522 0.673
## 5 71 prop_edits_before_sentence -0.792 -0.449 -2.10
## 6 71 prop_edits_before_word -1.46 -0.942 -2.02
## 7 71 prop_edits_within_word -1.61 -0.591 -2.03
## 8 71 prop_lookbacks_before_sentence 1.03 0.957 1.86
## 9 71 prop_lookbacks_before_word 1.31 1.38 1.44
## 10 71 prop_lookbacks_within_word 0.857 1.29 1.60
## # … with 590 more rows
Data were analysed using Bayesian linear mixed-effects models (Gelman et al. 2014; McElreath 2016). The R (R Core Team 2020) package brms (Bürkner 2017, 2018) was used to model the data.
Models were fitted with weakly informative priors (see McElreath 2016) and run with 10,000 iterations on 3 chains, with a warm-up of 5,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman and Rubin 1992) and by visual inspection of the Markov chain Monte Carlo chains.
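For reference, a minimal sketch of such a fitting call is given below; the object name fit is illustrative, and the weakly informative priors, which would be supplied via brms::prior(), are not reproduced here.
# Minimal sketch of the fitting call (assumes brms is attached and
# "formula" is one of the bf() objects defined below; the exact
# weakly informative priors are not reproduced here)
fit <- brm(formula,
           data = data,
           chains = 3,
           iter = 10000,
           warmup = 5000,
           thin = 1,   # no thinning
           cores = 3,
           seed = 365)
# Convergence checks: Rhat values and trace plots of the MCMC chains
rhat(fit)
plot(fit)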
Scatterplots of the by-participant and by-session estimates are shown in Figure 2.1.
Figure 2.1: Observed correlations
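A sketch of how such scatterplots can be produced from the long-format data above follows (illustrative plotting code, assuming the tidyverse is attached; not necessarily the code used for Figure 2.1).
# Illustrative sketch: session 1 against session 2 estimates, one panel
# per process measure (the session 3 plot is analogous)
data %>%
  ggplot(aes(x = session_1, y = session_2)) +
  geom_point(alpha = .5) +
  facet_wrap(~ outcome, scales = "free") +
  theme_minimal()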
The model formula used was the following (in brms syntax).
formula <- bf(mvbind(session_2, session_3) ~ session_1:outcome +
                (outcome |s| participant), family = gaussian())
By-participant estimates for each process measure of sessions 2 and 3 were predicted by the corresponding process measure (outcome) of session 1. The colon notation omits the main effects of session 1 and process measure, as their coefficients would not be meaningful here.
The model was fitted with random intercepts for participants and with by-measure slope adjustments, allowing the participant effects to vary across process measures; the |s| identifier correlates these group-level effects across the session 2 and session 3 response variables.
Model coefficients are shown in Table 3.1.
Predictor (process measure) | Session 2 estimate [95% PI] | Session 3 estimate [95% PI] |
---|---|---|
len_nonlookbacks | 0.7 [0.46 – 0.94] | 0.58 [0.37 – 0.8] |
len_prodseq | 0.83 [0.59 – 1.07] | 0.94 [0.72 – 1.16] |
lookback_duration | 0.38 [0.03 – 0.72] | 0.54 [0.06 – 1.05] |
n_edges | 0.59 [0.31 – 0.85] | 0.67 [0.4 – 0.95] |
n_jumps | 0.56 [0.28 – 0.84] | 0.75 [0.47 – 1.03] |
n_major_block | 0.61 [0.33 – 0.9] | 0.52 [0.17 – 0.87] |
n_sustained_reading | 0.77 [0.45 – 1.07] | 0.82 [0.5 – 1.14] |
prob_long_event_duration_before_sentence | 0.5 [0.17 – 0.83] | 0.59 [0.2 – 0.99] |
prob_long_event_duration_before_word | 0.96 [0.68 – 1.24] | 0.68 [0.4 – 0.96] |
prob_long_event_duration_within_word | 0.37 [0.03 – 0.71] | 0.36 [-0.04 – 0.77] |
prob_long_lookback_duration | 0.43 [0.07 – 0.79] | 0.3 [-0.1 – 0.7] |
prop_edits_before_sentence | 0.54 [0.21 – 0.88] | 0.5 [0.05 – 0.94] |
prop_edits_before_word | 0.64 [0.42 – 0.88] | 0.68 [0.47 – 0.89] |
prop_edits_within_word | 0.49 [0.27 – 0.71] | 0.63 [0.42 – 0.84] |
prop_lookbacks_before_sentence | 0.41 [0.14 – 0.67] | 0.33 [0.03 – 0.62] |
prop_lookbacks_before_word | 0.48 [0.17 – 0.8] | 0.45 [0.18 – 0.72] |
prop_lookbacks_within_word | 0.46 [0.09 – 0.83] | 0.37 [0.06 – 0.67] |
short_event_duration_before_sentence | 0.85 [0.6 – 1.1] | 0.62 [0.37 – 0.87] |
short_event_duration_before_word | 0.95 [0.71 – 1.18] | 0.9 [0.68 – 1.11] |
short_event_duration_within_word | 0.89 [0.65 – 1.13] | 0.8 [0.58 – 1.01] |
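The coefficients in Table 3.1 can be read off the population-level effects of the fitted model; a minimal sketch, assuming the fitted model object is called fit:
# Hypothetical extraction of population-level effects with 95% intervals
# (assumes the tidyverse is attached; "fit" is illustrative)
fixef(fit, probs = c(.025, .975)) %>%
  as.data.frame() %>%
  rownames_to_column("parameter")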
Posterior coefficients are shown in Figure 3.1.
Figure 3.1: Inferred model coefficients with 95% probability intervals (PIs) by process measure.
To assess the correlations between the process measures after controlling for possible differences between sessions, we set up another multivariate mixed-effects model with all process measures as outcome variables and session number (session_no) as predictor variable. Participants were modelled as random intercepts with by-session slope adjustments:
formula <- bf(mvbind(n_major_block,
n_jumps,
n_edges,
n_sustained_reading,
prop_edits_before_sentence,
prop_edits_before_word,
prop_edits_within_word,
prop_lookbacks_before_sentence,
prop_lookbacks_before_word,
prop_lookbacks_within_word,
len_nonlookbacks,
len_prodseq,
lookback_duration,
prob_long_lookback_duration,
short_event_duration_before_sentence,
short_event_duration_before_word,
short_event_duration_within_word,
prob_long_event_duration_before_sentence,
prob_long_event_duration_before_word,
prob_long_event_duration_within_word) ~
session_no + (session_no|s|participant), family = gaussian())
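Note that recent brms versions require the residual correlations between response variables to be requested explicitly. A hedged sketch of fitting this model and extracting the correlations (the object name fit_mv is illustrative):
# Request residual correlations explicitly via set_rescor()
fit_mv <- brm(formula + set_rescor(TRUE),
              data = data,  # transformed by-token data (before reshaping)
              chains = 3, iter = 10000, warmup = 5000,
              cores = 3, seed = 365)
# Posterior summary of the residual correlation matrix (the exact
# accessor may differ across brms versions)
VarCorr(fit_mv)$residual__$cor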
The residual correlations of all process measures are shown in Figure 4.1 and Table 4.1.
Figure 4.1: Residual correlation coefficients. Correlations with \(|r| \le .2\) are omitted.
| | n_major_block | n_jumps | n_edges | n_sustained_reading | prop_edits_before_sentence | prop_edits_before_word | prop_edits_within_word | prop_lookbacks_before_sentence | prop_lookbacks_before_word | prop_lookbacks_within_word | len_nonlookbacks | len_prodseq | lookback_duration | prob_long_lookback_duration | short_event_duration_before_sentence | short_event_duration_before_word | short_event_duration_within_word | prob_long_event_duration_before_sentence | prob_long_event_duration_before_word |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n_jumps | .45 [.24, .65] |
| n_edges | .35 [.12, .57] | .87 [.78, .94] |
| n_sustained_reading | .35 [.09, .58] | .40 [.17, .59] | .38 [.15, .58] |
| prop_edits_before_sentence | -.07 [-.35, .23] | .17 [-.10, .43] | .10 [-.17, .37] | .27 [-.01, .53] |
| prop_edits_before_word | .15 [-.09, .37] | .25 [.04, .45] | .22 [.00, .42] | .28 [.05, .49] | .26 [-.01, .50] |
| prop_edits_within_word | .04 [-.20, .27] | .21 [-.00, .41] | .17 [-.05, .37] | .28 [.05, .50] | .32 [.05, .55] | .79 [.65, .89] |
| prop_lookbacks_before_sentence | .28 [.04, .50] | .07 [-.15, .28] | .07 [-.15, .28] | .38 [.17, .58] | -.05 [-.32, .21] | -.14 [-.36, .09] | -.08 [-.30, .14] |
| prop_lookbacks_before_word | .15 [-.09, .39] | -.05 [-.25, .16] | -.05 [-.25, .16] | .36 [.14, .55] | .15 [-.13, .40] | -.12 [-.35, .11] | -.05 [-.28, .17] | .55 [.39, .69] |
| prop_lookbacks_within_word | .24 [.02, .46] | .12 [-.08, .30] | .11 [-.10, .30] | .45 [.26, .62] | .12 [-.13, .36] | .01 [-.20, .22] | .02 [-.19, .23] | .57 [.40, .72] | .61 [.46, .74] |
| len_nonlookbacks | -.25 [-.46, -.02] | -.11 [-.30, .10] | -.05 [-.25, .15] | -.51 [-.67, -.32] | -.13 [-.37, .13] | .01 [-.21, .24] | -.04 [-.26, .18] | -.69 [-.80, -.56] | -.85 [-.90, -.79] | -.86 [-.92, -.79] |
| len_prodseq | -.11 [-.37, .17] | -.28 [-.51, -.03] | -.26 [-.50, -.01] | -.16 [-.43, .12] | -.20 [-.48, .10] | -.46 [-.67, -.23] | -.46 [-.67, -.23] | .19 [-.08, .44] | .08 [-.19, .35] | .14 [-.11, .39] | -.09 [-.35, .17] |
| lookback_duration | .07 [-.31, .42] | -.06 [-.40, .30] | -.07 [-.41, .28] | .05 [-.32, .41] | .04 [-.34, .40] | -.01 [-.36, .33] | -.04 [-.38, .31] | .21 [-.21, .53] | .09 [-.28, .42] | .08 [-.27, .41] | -.12 [-.45, .25] | .01 [-.34, .37] |
| prob_long_lookback_duration | .06 [-.22, .34] | .23 [-.02, .46] | .26 [.01, .48] | .37 [.11, .58] | .05 [-.25, .34] | -.05 [-.29, .19] | -.15 [-.38, .10] | .18 [-.06, .40] | -.03 [-.26, .20] | .07 [-.17, .30] | -.03 [-.24, .20] | .06 [-.22, .32] | .06 [-.32, .40] |
| short_event_duration_before_sentence | .18 [-.14, .48] | .21 [-.11, .48] | .17 [-.13, .46] | -.01 [-.32, .32] | .05 [-.29, .38] | .02 [-.27, .31] | -.02 [-.31, .28] | .03 [-.27, .34] | -.08 [-.38, .24] | .01 [-.28, .31] | .06 [-.26, .36] | -.05 [-.36, .26] | .01 [-.37, .39] | .28 [-.10, .57] |
| short_event_duration_before_word | -.18 [-.51, .22] | -.08 [-.42, .26] | -.06 [-.40, .28] | -.03 [-.38, .32] | -.01 [-.37, .36] | -.09 [-.42, .27] | -.10 [-.43, .26] | -.05 [-.40, .30] | -.01 [-.37, .35] | -.04 [-.38, .31] | .04 [-.32, .39] | .14 [-.25, .47] | .03 [-.37, .41] | .11 [-.28, .44] | .02 [-.35, .38] |
| short_event_duration_within_word | -.01 [-.29, .27] | .04 [-.22, .30] | .05 [-.21, .31] | .15 [-.14, .42] | .05 [-.26, .35] | -.12 [-.37, .15] | -.09 [-.34, .18] | .13 [-.16, .41] | -.13 [-.41, .17] | .02 [-.25, .29] | .01 [-.28, .28] | -.14 [-.41, .16] | .08 [-.30, .43] | .19 [-.10, .47] | .06 [-.27, .38] | .12 [-.29, .46] |
| prob_long_event_duration_before_sentence | .13 [-.15, .39] | .05 [-.20, .29] | .02 [-.23, .27] | .35 [.08, .58] | .02 [-.27, .31] | .24 [-.01, .47] | .24 [-.01, .47] | .24 [-.02, .46] | .15 [-.10, .39] | .18 [-.06, .41] | -.27 [-.48, -.03] | -.03 [-.31, .24] | .05 [-.32, .41] | -.07 [-.34, .22] | -.29 [-.58, .08] | -.10 [-.44, .28] | -.01 [-.31, .27] |
| prob_long_event_duration_before_word | .05 [-.28, .37] | .15 [-.17, .45] | .16 [-.17, .46] | .25 [-.12, .57] | .25 [-.15, .56] | .09 [-.22, .38] | .17 [-.15, .45] | .17 [-.19, .47] | .09 [-.25, .40] | .14 [-.20, .43] | -.11 [-.42, .22] | -.11 [-.42, .23] | -.05 [-.42, .35] | .12 [-.22, .44] | .01 [-.35, .36] | -.19 [-.57, .27] | .05 [-.29, .39] | .10 [-.24, .42] |
| prob_long_event_duration_within_word | .04 [-.33, .40] | .05 [-.31, .39] | .08 [-.29, .42] | .14 [-.27, .48] | -.08 [-.44, .32] | -.04 [-.38, .32] | -.02 [-.36, .33] | .08 [-.29, .42] | .12 [-.27, .45] | .16 [-.24, .49] | -.14 [-.47, .25] | .01 [-.35, .37] | -.02 [-.41, .37] | .04 [-.32, .39] | -.03 [-.39, .35] | -.03 [-.42, .37] | -.05 [-.40, .33] | .08 [-.31, .44] | .07 [-.33, .45] |
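A sketch of how a heatmap like Figure 4.1 might be drawn follows; rescor_long is a hypothetical long-format data frame of posterior-mean residual correlations with columns row, col, and estimate.
# Illustrative heatmap; correlations <= |.2| are blanked, as in Figure 4.1
rescor_long %>%
  mutate(estimate = if_else(abs(estimate) <= .2, NA_real_, estimate)) %>%
  ggplot(aes(x = col, y = row, fill = estimate)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1), na.value = "white") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))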
We also tested an alternative parameterisation, similar to the first model, using a wide data format.
# Data (same preprocessing as above)
data <- read_csv(file = "../data/by_token_estimates.csv") %>%
  mutate(across(c(starts_with("n_"), starts_with("len_")), ~log(. + 1)),
         across(starts_with("prop_"), logit_scaled),
         across(c(participant, session_no), as.factor),
         across(where(is.numeric), ~scale(.)[,1]))
# Randomly remove 3 of the participants whose session 1 used prompt S
set.seed(365)
rnd_subs <- filter(data, session_no == "1", prompt == "S") %>%
  pull(participant) %>% factor() %>% levels() %>%
  sample(3)
data <- data %>% filter(!(participant %in% rnd_subs))
# Reshape to wide format: one column per process measure and session
data_wide <- data %>%
  select(-token, -prompt) %>%
  pivot_wider(names_from = session_no,
              values_from = n_major_block:prob_long_event_duration_within_word)
# Check variables
glimpse(data_wide)
The brms syntax is as follows.
formula <- bf(mvbind(n_major_block_2,
n_jumps_2,
n_edges_2,
n_sustained_reading_2,
prop_edits_before_sentence_2,
prop_edits_before_word_2,
prop_edits_within_word_2,
prop_lookbacks_before_sentence_2,
prop_lookbacks_before_word_2,
prop_lookbacks_within_word_2,
len_nonlookbacks_2,
len_prodseq_2,
lookback_duration_2,
prob_long_lookback_duration_2,
short_event_duration_before_sentence_2,
short_event_duration_before_word_2,
short_event_duration_within_word_2,
prob_long_event_duration_before_sentence_2,
prob_long_event_duration_before_word_2,
prob_long_event_duration_within_word_2,
n_major_block_3,
n_jumps_3,
n_edges_3,
n_sustained_reading_3,
prop_edits_before_sentence_3,
prop_edits_before_word_3,
prop_edits_within_word_3,
prop_lookbacks_before_sentence_3,
prop_lookbacks_before_word_3,
prop_lookbacks_within_word_3,
len_nonlookbacks_3,
len_prodseq_3,
lookback_duration_3,
prob_long_lookback_duration_3,
short_event_duration_before_sentence_3,
short_event_duration_before_word_3,
short_event_duration_within_word_3,
prob_long_event_duration_before_sentence_3,
prob_long_event_duration_before_word_3,
prob_long_event_duration_within_word_3) ~
n_major_block_1 +
n_jumps_1 +
n_edges_1 +
n_sustained_reading_1 +
prop_edits_before_sentence_1 +
prop_edits_before_word_1 +
prop_edits_within_word_1 +
prop_lookbacks_before_sentence_1 +
prop_lookbacks_before_word_1 +
prop_lookbacks_within_word_1 +
len_nonlookbacks_1 +
len_prodseq_1 +
lookback_duration_1 +
prob_long_lookback_duration_1 +
short_event_duration_before_sentence_1 +
short_event_duration_before_word_1 +
short_event_duration_within_word_1 +
prob_long_event_duration_before_sentence_1 +
prob_long_event_duration_before_word_1 +
prob_long_event_duration_within_word_1, family = gaussian())
This model returns coefficients for the relationships between all session 1 process measures and all session 2 and session 3 process measures. Note, though, that it also returns the relationships between all other pairs of process measures, which are not needed here, as we are only interested in the relationship of each process measure with itself across sessions.
Also, this model returns the residual correlations between the session 2 and session 3 process measures only, excluding session 1 (the correlations above included all three sessions).
Intercepts were omitted from the model summary: as all process measures were centred and standardised, the model intercepts are expected to be close to zero.
Model coefficients are shown in Table 5.1. This table summarises only the relevant coefficients, i.e. the effect each process measure of session 1 has on the same process measure in sessions 2 and 3 (as in Table 3.1 above).
Predictor (process measure) | Session 2 estimate [95% PI] | Session 3 estimate [95% PI] |
---|---|---|
n_major_block | 0.41 [0.05 – 0.79] | 0.53 [0.08 – 0.97] |
n_jumps | 0.24 [-0.23 – 0.82] | 0.35 [-0.13 – 0.96] |
n_edges | 0.39 [-0.12 – 1.03] | 0.48 [-0.06 – 1.16] |
n_sustained_reading | 0.42 [-0.1 – 1.08] | 0.85 [0.22 – 1.46] |
prop_edits_before_sentence | 0.32 [-0.09 – 0.78] | 0.38 [-0.08 – 0.91] |
prop_edits_before_word | 0.41 [0 – 0.9] | 0.82 [0.18 – 1.5] |
prop_edits_within_word | 0.21 [-0.17 – 0.64] | 0.03 [-0.38 – 0.46] |
prop_lookbacks_before_sentence | 0.26 [-0.02 – 0.54] | 0.03 [-0.33 – 0.41] |
prop_lookbacks_before_word | 0.01 [-0.46 – 0.46] | 0.12 [-0.31 – 0.57] |
prop_lookbacks_within_word | 0.42 [-0.06 – 0.97] | 0.13 [-0.27 – 0.57] |
len_nonlookbacks | 0.17 [-0.29 – 0.73] | 0.25 [-0.21 – 0.83] |
len_prodseq | 0.62 [0.34 – 0.91] | 0.66 [0.33 – 0.97] |
lookback_duration | 0.4 [0.01 – 0.81] | 0.37 [-0.14 – 1.02] |
prob_long_lookback_duration | 0.16 [-0.2 – 0.55] | -0.03 [-0.44 – 0.36] |
short_event_duration_before_sentence | 0.66 [0.34 – 0.96] | 0.35 [0.08 – 0.63] |
short_event_duration_before_word | 0.76 [0.48 – 1.03] | 0.83 [0.59 – 1.06] |
short_event_duration_within_word | 0.83 [0.46 – 1.16] | 0.67 [0.31 – 1.01] |
prob_long_event_duration_before_sentence | 0.46 [0.1 – 0.81] | 0.4 [-0.02 – 0.85] |
prob_long_event_duration_before_word | 0.69 [0.26 – 1.11] | 0.59 [0.15 – 1.04] |
prob_long_event_duration_within_word | 0.22 [-0.13 – 0.6] | 0.15 [-0.18 – 0.53] |
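The rows of Table 5.1 are a subset of the population-level effects. A hedged sketch of how they might be filtered out of the full fixef() output follows; the object name fit_wide and the string handling are illustrative (brms strips underscores from response names, so e.g. n_major_block_2 becomes the response nmajorblock2).
# Keep only the "same measure across sessions" coefficients
fixef(fit_wide) %>%
  as.data.frame() %>%
  rownames_to_column("parameter") %>%
  # parameter names have the form "response_predictor",
  # e.g. "nmajorblock2_n_major_block_1"
  separate(parameter, into = c("response", "predictor"),
           sep = "_", extra = "merge") %>%
  filter(str_remove_all(str_remove(predictor, "_1$"), "_") ==
           str_remove(response, "[23]$"))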
Posterior coefficients are shown in Figure 5.1.
Figure 5.1: Inferred model coefficients with 95% probability intervals (PIs) by process measure.
The residual correlations of all process measures are shown in Figure 5.2. Again, for simplicity, we extracted only the correlations within session 2 and within session 3, not the correlations across sessions.
Figure 5.2: Residual correlation coefficients. Correlations with \(|r| \le .2\) are omitted.