This document describes a real‑time activation model used to estimate the availability of propositional content (elementary discourse units, EDUs, e.g. sentences) in memory during reading‑to‑write tasks. The model treats reading as a stream of memory‑updating events that dynamically shape which ideas are available for writing under time pressure and interference. It is driven by eye‑movement–derived events but abstracts away from word‑level processing: it is intended to capture proposition‑level availability, not surface recall.
The model builds on activation‑based and interference‑based accounts of memory in psycholinguistics (ACT‑R, cue‑based retrieval, discourse models), according to which linguistic representations are continuously updated, decay over time, and compete for retrieval. Like cue‑based retrieval and discourse comprehension models, it assumes that forgetting reflects reduced accessibility rather than loss of representations. EDUs are treated as propositional units whose activation is strengthened by reading, weakened by interference from new content, and eroded by non‑reading activities such as writing. Availability is defined in relative terms, capturing the selective accessibility of propositions under competition. The model thus integrates insights from activation‑based memory theory, discourse processing, and eye‑movement research into a real‑time account of source memory during writing.
Below is an example of the input format that the model expects. All variables are required except token.
Rows: 8,354
Columns: 6
$ token <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ edu <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "001kgfc", NA, NA, "002smtf", "003twnf", "002sm…
$ t_start <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ event_type <chr> "other", "other", "other", "other", "other", "other", "other", "other", "other", "other…
$ duration <int> 797, 355, 1788, 4286, 1761, 2183, 2565, 2076, 120, 93, 160, 140, 107, 602, 307, 221, 19…
$ words_in_edu <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, 22, 24, 22, 16, 16, 16, NA, 18, NA,…
In these data, event_type is "read" whenever edu is not NA and "other" whenever edu is NA. words_in_edu comes from the rst_tsv file:
Rows: 246
Columns: 3
$ edu <chr> "001wfwa", "002tiwf", "003atti", "004saec", "005osit", "006netd", "007tpid", "008bats",…
$ source <chr> "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_…
$ words_in_edu <int> 20, 25, 21, 18, 5, 39, 4, 25, 29, 19, 11, 36, 14, 32, 29, 16, 8, 23, 20, 31, 18, 35, 38…
The input data format requires minimal data transformation from the incremental report returned by cywrite and no data cleaning. The data wrangling that would be minimally required (in R) is shown here:
library(tidyverse)

data <- jsonlite::fromJSON(txt = token, flatten = TRUE) %>%
  select(eye) %>%
  unnest(eye) %>%
  # label events as "read" (an EDU was fixated) or "other"
  mutate(event_type = case_when(!is.na(edu) ~ "read", TRUE ~ "other")) %>%
  # add number of words in edu
  left_join(data_info, by = join_by(edu))
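Before the events are passed to the model, the invariants described above can be checked. The following is a minimal sketch in base R, not part of the model code: the helper check_events and the toy data frame are hypothetical, and the toy values are made up to mimic the input format.

```r
# Hypothetical helper (not part of the model code): checks the input invariants
check_events <- function(data) {
  required <- c("edu", "t_start", "event_type", "duration", "words_in_edu")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  is_read <- data$event_type == "read"
  stopifnot(
    # only the two event types used by the model
    all(data$event_type %in% c("read", "other")),
    # "read" exactly when an EDU is attached to the event
    all(is_read == (!is.na(data$edu))),
    # every read event needs a positive EDU length
    all(!is.na(data$words_in_edu[is_read]) & data$words_in_edu[is_read] > 0),
    # durations must be present and non-negative
    all(!is.na(data$duration) & data$duration >= 0)
  )
  invisible(TRUE)
}

# Toy data mimicking the input format (values are made up)
toy <- data.frame(
  edu          = c(NA, "001kgfc", NA, "002smtf"),
  t_start      = c(0, 800, 1000, 1500),
  event_type   = c("other", "read", "other", "read"),
  duration     = c(797L, 160L, 355L, 602L),
  words_in_edu = c(NA, 16L, NA, 22L)
)
check_events(toy)
```

The check stops with an error as soon as one invariant is violated, which is usually preferable to letting NA values propagate into the activations.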
The following code illustrates how the reading history function update_reading_history can be applied to slices of data. These data slices can have any size from 1 to the overall length of the writing session. Details of the implementation of update_reading_history and a conceptual description can be found below.
# Model parameters
lambda <- .08 # slow passive time decay (/sec)
alpha <- .9 # moderate encoding strength
beta <- .1 # strong propositional-level interference from reading
gamma <- .18 # strong decay during writing / "other"
refix_window <- 2.0 # seconds
k_maint <- 0.3 # refixation boost
theta <- 0.15 # retrieval threshold (normalised)
# Vector for initial states
state <- init_state()
# Data frame for results
rh <- tibble()
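# Loop over single-row slices, taking every 20th event (to simulate real time).
# Rows in between are skipped; to process every event in chunks of 20,
# use slice(i:min(i + 19, nrow(data))) instead.
for (i in seq(1, nrow(data), 20)) {
  tmp <- data %>% slice(i)
  res <- update_reading_history(tmp, state)
  state <- res$state
  rh <- bind_rows(rh, res$output)
}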
# Add source information for plotting
rh <- left_join(rh, data_info, by = "edu")
The model returns the following information:
Rows: 22,492
Columns: 7
$ token <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ t_start <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ activation <dbl> 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.1…
$ available <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, F…
$ edu <chr> "006bssf", "006bssf", "010isat", "006bssf", "010isat", "006bssf", "010isat", "006bssf",…
$ source <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_…
$ words_in_edu <int> 18, 18, 9, 18, 9, 18, 9, 18, 9, 12, 18, 9, 12, 15, 18, 9, 12, 15, 19, 18, 9, 12, 15, 19…
The model results are visualised in Figure 1. Activation values are relative indices of memory prominence: the model does not estimate absolute memory strength or assign memory‑strength units to activation values. Instead, it models dynamic changes in relative availability under time‑based decay and interference, which are assumed to be the relevant determinants of writing behaviour, including rereading and source‑use decisions. The model therefore permits statements about the relative prominence of propositions in memory (e.g. EDU “X” being more active than EDU “Y” at a given moment).
Figure 1: Reading history. Each line represents an EDU. Colour indicates whether the propositional information of the EDU is available or not.
Each EDU is treated as a memory trace whose activation varies over time. Activation increases when an EDU is read and decays due to time, interference from new information, and non‑reading activities such as writing or planning.
Core assumptions:
The model outputs a memory snapshot after every event, giving the estimated activation and availability of all EDUs seen so far.
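At any time \(t\), the model maintains the activation \(A_e(t)\) of every EDU \(e\) seen so far, the time of the last processed event \(t_{\text{last}}\), and the time of the most recent fixation on each EDU \(t^{\text{fix}}_e\).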
The complete model state is:
\[ \text{state}(t) = \{ A(t), t_{\text{last}}, t^{\text{fix}} \} \]
lambda = 0.08 # passive decay rate (/sec)
alpha = 0.9 # encoding strength
beta = 0.10 # interference from new reading
gamma = 0.18 # decay during non-reading (“other”)
tau = 2.0 # refixation window (sec)
k = 0.3 # refixation maintenance gain
theta = 0.15 # availability threshold (normalised)
These values are intended for adult L1 readers processing propositional content. In the code, tau and k appear as refix_window and k_maint.
Each input event corresponds to a single attentional or task-related episode. Events are processed sequentially in time order.
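Each event has the following attributes: t_start (onset timestamp in ms), duration (in ms), and event_type ∈ {read, other}. For read events only: edu (the identifier of the fixated EDU) and words_in_edu (its length in words).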
The model makes no assumptions about dwell structure or fixation clustering; any event stream that satisfies these conditions can be used.
Between any two events, all EDU activations decay exponentially:
\[ \Delta t = \frac{t - t_{\text{last}}}{1000} \]
\[ A_e(t) = A_e(t_{\text{last}}) \cdot e^{-\lambda \Delta t} \]
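where \(\lambda\) is the passive decay rate (per second) and \(\Delta t\) is the time elapsed since the previous event in seconds (timestamps are in milliseconds).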
In other words, propositional activation fades over time even in the absence of interference.
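As an illustration of the passive decay step, the sketch below applies the formula to a toy activation vector. The helper passive_decay and the activation values are hypothetical; only lambda is taken from the parameter list.

```r
lambda <- 0.08  # passive decay rate (/sec), as in the parameter list

# Hypothetical helper: activation remaining after passive decay
passive_decay <- function(A, t_now_ms, t_last_ms, lambda) {
  dt <- (t_now_ms - t_last_ms) / 1000  # ms -> sec
  A * exp(-lambda * dt)
}

A <- c(edu_a = 1.0, edu_b = 0.5)  # made-up activations
decayed <- passive_decay(A, t_now_ms = 5000, t_last_ms = 0, lambda = lambda)

# Time for an activation to halve under passive decay alone
half_life <- log(2) / lambda  # roughly 8.7 sec for lambda = 0.08
```

With lambda = 0.08, an EDU that is neither reread nor disturbed loses half of its activation after roughly 8.7 seconds.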
For non-reading events (event_type == "other"; e.g. writing, planning, thinking), no new content is encoded. Instead, existing activations decay further:
\[ A_e \leftarrow A_e \cdot e^{-\gamma \cdot (d / 1000)} \]
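where \(d\) is the event duration in milliseconds and \(\gamma\) is the decay rate during non-reading activity (per second).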
In other words, non-reading activity competes with source-memory activation.
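The non-reading decay can be sketched in the same way. The helper other_decay and the activation values are hypothetical; gamma is taken from the parameter list.

```r
gamma <- 0.18  # decay rate during non-reading events (/sec), from the parameter list

# Hypothetical helper: extra decay applied for an "other" event
other_decay <- function(A, duration_ms, gamma) {
  A * exp(-gamma * (duration_ms / 1000))  # duration is converted from ms to sec
}

A <- c(edu_a = 1.0, edu_b = 0.5)  # made-up activations
after_writing <- other_decay(A, duration_ms = 2000, gamma = gamma)

# Every EDU is scaled by the same factor, exp(-0.36), so ratios are preserved
ratio_before <- A[["edu_a"]] / A[["edu_b"]]
ratio_after  <- after_writing[["edu_a"]] / after_writing[["edu_b"]]
```

Because the factor is identical for all EDUs, this step lowers raw activations but, taken on its own, leaves the ratios between EDUs unchanged.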
For reading events (event_type == "read"), let EDU \(e\) be fixated for duration \(d\).
Encoding input is computed as fixation duration on a logarithmic scale, normalised by EDU length:
\[ I = \frac{\log(d + 1)}{w_e} \]
Here \(w_e\) is the number of words in EDU \(e\). The normalisation ensures that long sentences do not receive disproportionately large activation.
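A small sketch of the encoding input, assuming a hypothetical helper encoding_input and made-up durations and EDU lengths:

```r
# Hypothetical helper: log-scaled fixation duration, normalised by EDU length
encoding_input <- function(duration_ms, words_in_edu) {
  log(duration_ms + 1) / words_in_edu
}

# Same fixation duration on a short and on a long EDU (made-up values)
short_edu <- encoding_input(duration_ms = 300, words_in_edu = 5)   # about 1.14
long_edu  <- encoding_input(duration_ms = 300, words_in_edu = 30)  # about 0.19

# Doubling the duration increases the input only modestly (log compression)
doubled <- encoding_input(duration_ms = 600, words_in_edu = 5)
```

For equal fixation durations, the input is inversely proportional to the number of words, and the logarithm means that doubling the duration adds far less than double the input.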
Reading a new EDU suppresses activation of all existing EDUs:
\[ A_j \leftarrow A_j \cdot e^{-\beta I} \quad \forall j \]
where \(\beta\) controls interference strength. Thus propositions compete for limited attentional and memory resources.
If the same EDU was fixated within a short temporal window:
\[ \text{if } (t - t^{\text{fix}}_e) < \tau: \quad A_e \leftarrow A_e \cdot (1 + k) \]
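where \(t^{\text{fix}}_e\) is the time of the previous fixation on EDU \(e\), \(\tau\) is the refixation window, and \(k\) is the maintenance gain.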
This models short‑term conceptual rehearsal.
After interference and maintenance, the current EDU gains activation:
\[ A_e \leftarrow A_e + \alpha I \]
Finally, its fixation time is updated:
\[ t^{\text{fix}}_e \leftarrow t \]
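The three steps of a reading event (interference, refixation maintenance, encoding) can be traced on a toy example. This is a simplified sketch, not the implementation given later in this document: read_update is a hypothetical helper, the starting state is made up, and passive decay between the events is deliberately left out.

```r
alpha <- 0.9   # encoding strength
beta  <- 0.10  # interference from new reading
tau   <- 2.0   # refixation window (sec)
k     <- 0.3   # refixation maintenance gain

# Hypothetical helper: one "read" update on a named activation vector
read_update <- function(A, t_fix, edu, t_sec, duration_ms, words) {
  I <- log(duration_ms + 1) / words
  A <- A * exp(-beta * I)                            # interference on all EDUs
  seen <- edu %in% names(A)
  if (seen && (t_sec - t_fix[[edu]]) < tau) {
    A[edu] <- A[edu] * (1 + k)                       # refixation maintenance
  }
  A[edu] <- (if (seen) A[[edu]] else 0) + alpha * I  # encoding
  t_fix[edu] <- t_sec                                # remember this fixation
  list(A = A, t_fix = t_fix)
}

# Made-up starting state: one EDU already in memory
s <- list(A = c(edu_a = 1.0), t_fix = c(edu_a = 10.0))
# First fixation on a new EDU, then a refixation one second later
s <- read_update(s$A, s$t_fix, "edu_b", t_sec = 11.0, duration_ms = 300, words = 10)
s <- read_update(s$A, s$t_fix, "edu_b", t_sec = 12.0, duration_ms = 300, words = 10)
```

After the second call, edu_a has been suppressed twice by interference, while edu_b has received the refixation boost on top of two encoding gains.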
To estimate which EDUs are likely retrievable at a given moment, activations are normalised:
\[ A^*_e = \frac{A_e}{\max_j A_j} \]
An EDU is considered available if:
\[ A^*_e \ge \theta \]
where \(\theta\) is a threshold parameter. Thus availability reflects relative retrievability under competition, not absolute memory strength.
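The normalisation and threshold step can be illustrated on a made-up activation vector; theta is taken from the parameter list, everything else is hypothetical.

```r
theta <- 0.15  # availability threshold (normalised), from the parameter list

A <- c(edu_a = 0.80, edu_b = 0.30, edu_c = 0.05)  # made-up raw activations

A_norm    <- A / max(A)      # the strongest EDU is scaled to 1
available <- A_norm >= theta

# Scaling all raw activations by a common factor leaves availability unchanged
available_scaled <- (0.5 * A) / max(0.5 * A) >= theta
```

Here edu_c falls below the threshold (0.05 / 0.80 = 0.0625) and is treated as unavailable, while edu_a and edu_b remain available.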
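For each incoming event, in temporal order:
1. Apply passive decay for the time elapsed since the previous event.
2. If the event is "other": apply the additional non-reading decay.
3. If the event is "read": compute the encoding input, apply interference to all EDUs, apply the refixation boost if the EDU was fixated within the refixation window, add the encoding gain, and update the EDU's fixation time.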
A complete memory snapshot is produced after every event.
# passive decay
A <- A * exp(-lambda * dt)

if (row$event_type == "other") {
  A <- A * exp(-gamma * (row$duration / 1000))
}

if (row$event_type == "read") {
  input <- log(row$duration + 1) / row$words_in_edu
  # interference
  A <- A * exp(-beta * input)
  # refixation maintenance
  if (edu %in% names(last_fix_time) &&
      (now - last_fix_time[edu]) < refix_window) {
    A[edu] <- A[edu] * (1 + k_maint)
  }
  # encoding
  A[edu] <- (A[edu] %||% 0) + alpha * input
  last_fix_time[edu] <- now
}

A_norm <- A / max(A)
available <- A_norm >= theta
The model is intended to:
The model is not intended to:
This model is designed for analyses such as:
Parameters are expected to be weakly identifiable. Model evaluation should focus on robust qualitative behaviour, not exact numerical estimates.
init_state <- function() {
  list(
    A = numeric(),             # named activation vector
    last_t = NA_real_,         # last event time (ms)
    last_fix_time = numeric()  # last fixation per EDU (sec)
  )
}
# Note: the model parameters (lambda, gamma, beta, alpha, refix_window, k_maint)
# are not passed as arguments; they are looked up in the calling environment.
update_one_event <- function(row, state) {
  A <- state$A
  last_fix_time <- state$last_fix_time

  # time elapsed since the previous event (sec)
  if (is.na(state$last_t)) {
    dt <- 0
  } else {
    dt <- (row$t_start - state$last_t) / 1000
  }

  # universal passive decay
  if (length(A) > 0 && dt > 0) {
    A <- A * exp(-lambda * dt)
  }

  # ---- EVENT TYPES ----
  if (row$event_type == "other") {
    # task engagement without source input
    if (length(A) > 0) {
      A <- A * exp(-gamma * (row$duration / 1000))
    }
  } else if (row$event_type == "read") {
    edu <- row$edu
    now <- row$t_start / 1000

    # encoding strength (EDU-length normalised)
    input <- log(row$duration + 1) / row$words_in_edu

    # interference from new reading
    if (length(A) > 0) {
      A <- A * exp(-beta * input)
    }

    # refixation maintenance (same EDU fixated within refix_window)
    if (!is.na(edu) &&
        edu %in% names(last_fix_time) &&
        !is.na(last_fix_time[edu]) &&
        (now - last_fix_time[edu]) < refix_window) {
      A[edu] <- (A[edu] %||% 0) * (1 + k_maint)
    }

    # encoding
    A[edu] <- (A[edu] %||% 0) + alpha * input
    last_fix_time[edu] <- now
  }

  list(
    A = A,
    last_t = row$t_start,
    last_fix_time = last_fix_time
  )
}
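# NA-aware default operator: returns y if x is NULL or NA.
# This masks the NULL-only `%||%` exported by purrr/rlang and base R (>= 4.4).
`%||%` <- function(x, y) if (is.null(x) || is.na(x)) y else x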
# Note: theta is looked up in the calling environment.
update_reading_history <- function(df, state) {
  # process events in temporal order
  df <- df[order(df$t_start), ]
  outputs <- vector("list", nrow(df))

  for (i in seq_len(nrow(df))) {
    state <- update_one_event(df[i, ], state)
    A <- state$A

    # reporting snapshot: normalise by the currently strongest EDU
    if (length(A) > 0 && any(is.finite(A))) {
      A_norm <- A / max(A)
    } else {
      A_norm <- A
    }

    outputs[[i]] <- tibble(
      token = df$token[i],
      t_start = df$t_start[i],
      edu = names(A_norm),
      activation = as.numeric(A_norm),
      available = A_norm >= theta
    )
  }

  list(
    state = state,
    output = dplyr::bind_rows(outputs)
  )
}