This document describes a real‑time activation model used to estimate the availability of propositional content (elementary discourse units, EDUs, e.g. sentences) in memory during reading‑to‑write tasks. The model treats reading as a stream of memory‑updating events that dynamically shape which ideas are available for writing under time pressure and interference. It is driven by eye‑movement–derived events but abstracts away from word‑level processing: it is intended to capture proposition‑level availability, not surface recall.

The model builds on activation‑based and interference‑based accounts of memory in psycholinguistics (ACT‑R, cue‑based retrieval, discourse models), according to which linguistic representations are continuously updated, decay over time, and compete for retrieval. Like cue‑based retrieval and discourse comprehension models, it assumes that forgetting reflects reduced accessibility rather than loss of representations. EDUs are treated as propositional units whose activation is strengthened by reading, weakened by interference from new content, and eroded by non‑reading activities such as writing. Availability is defined in relative terms, capturing the selective accessibility of propositions under competition. The model thus integrates insights from activation‑based memory theory, discourse processing, and eye‑movement research into a real‑time account of source memory during writing.

Input data format

Below is an example of the input format the model expects. All variables except token are required.

Rows: 8,354
Columns: 6
$ token        <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ edu          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "001kgfc", NA, NA, "002smtf", "003twnf", "002sm…
$ t_start      <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ event_type   <chr> "other", "other", "other", "other", "other", "other", "other", "other", "other", "other…
$ duration     <int> 797, 355, 1788, 4286, 1761, 2183, 2565, 2076, 120, 93, 160, 140, 107, 602, 307, 221, 19…
$ words_in_edu <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, 22, 24, 22, 16, 16, 16, NA, 18, NA,…

In these data, event_type is "read" whenever edu is not NA and "other" whenever edu is NA. The words_in_edu column comes from the rst_tsv file:

Rows: 246
Columns: 3
$ edu          <chr> "001wfwa", "002tiwf", "003atti", "004saec", "005osit", "006netd", "007tpid", "008bats",…
$ source       <chr> "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_…
$ words_in_edu <int> 20, 25, 21, 18, 5, 39, 4, 25, 29, 19, 11, 36, 14, 32, 29, 16, 8, 23, 20, 31, 18, 35, 38…
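
The constraints on the input can be checked before running the model. The following is a minimal sketch, assuming the column names shown above; check_events is a hypothetical helper and not part of the model code, and the toy values are made up.

```r
# Hypothetical helper (not part of the model code): checks an event table
# against the input format described above.
check_events <- function(df) {
  required <- c("edu", "t_start", "event_type", "duration", "words_in_edu")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  is_read <- df$event_type == "read"
  # "read" events must carry an EDU label; "other" events must not
  if (!identical(is_read, !is.na(df$edu))) {
    stop("event_type must be 'read' exactly when edu is not NA")
  }
  # encoding strength divides by words_in_edu, so reads need a word count
  if (any(is.na(df$words_in_edu[is_read]))) {
    stop("read events need a non-missing words_in_edu")
  }
  invisible(TRUE)
}

# Toy events (made-up values)
events <- data.frame(
  token        = "test",
  edu          = c(NA, "001kgfc", NA),
  t_start      = c(0, 800, 1000),
  event_type   = c("other", "read", "other"),
  duration     = c(797L, 160L, 140L),
  words_in_edu = c(NA, 16L, NA)
)
check_events(events)
```

The last check is worth keeping in practice: a read event whose EDU has no match in the rst_tsv table would otherwise enter the model with a missing word count.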

The input format requires only minimal transformation of the incremental report returned by cywrite, and no data cleaning. The minimal data wrangling (in R) is shown here:

library(dplyr)
library(tidyr)

data <- jsonlite::fromJSON(txt = token, flatten = TRUE) %>%
  select(eye) %>%
  unnest(eye) %>%
  # label events as "read" or "other"
  mutate(event_type = case_when(!is.na(edu) ~ "read", TRUE ~ "other")) %>%
  # add number of words in edu
  left_join(data_info, by = join_by(edu))

The following code illustrates how the reading history function update_reading_history can be applied to slices of data. These data slices can have any size from 1 to the overall length of the writing session. Details of the implementation of update_reading_history and a conceptual description can be found below.

# Model parameters
lambda  <- .08       # slow passive time decay (/sec)
alpha   <- .9        # moderate encoding strength
beta    <- .1        # strong propositional-level interference from reading
gamma   <- .18       # strong decay during writing / "other"
refix_window <- 2.0  # seconds
k_maint      <- 0.3  # refixation boost
theta        <- 0.15 # retrieval threshold (normalised)

# Vector for initial states
state <- init_state()

# Data frame for results
rh <- tibble()

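# Loop over single-event slices, taking every 20th event (to simulate real time).
# To process all events in chunks of 20 instead, use
# slice(i:min(i + 19, nrow(data))).
for (i in seq(1, nrow(data), 20)) {
  tmp <- data %>% slice(i)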
  res <- update_reading_history(tmp, state)
  state <- res$state
  rh <- bind_rows(rh, res$output)
}

# Add source information for plotting
rh <- left_join(rh, data_info, by = "edu")

The model returns the following information:

Rows: 22,492
Columns: 7
$ token        <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ t_start      <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ activation   <dbl> 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.1…
$ available    <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, F…
$ edu          <chr> "006bssf", "006bssf", "010isat", "006bssf", "010isat", "006bssf", "010isat", "006bssf",…
$ source       <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_…
$ words_in_edu <int> 18, 18, 9, 18, 9, 18, 9, 18, 9, 12, 18, 9, 12, 15, 18, 9, 12, 15, 19, 18, 9, 12, 15, 19…

The model results are visualised in Figure 1. Activation values are relative indices of memory prominence, not absolute memory strength, and carry no absolute units. The model captures dynamic changes in relative availability under time‑based decay and interference, which are assumed to be the relevant determinants of writing behavior, in particular of rereading and source‑use decisions. It therefore permits statements about the relative prominence of propositions in memory (e.g. EDU “X” being more active than EDU “Y” at a given moment), but not about how strongly any single proposition is represented.

Figure 1: Reading history. Each line represents an EDU. Colour indicates whether the propositional information of the EDU is available or not.

Conceptual Overview

Each EDU is treated as a memory trace whose activation varies over time. Activation increases when an EDU is read and decays due to time, interference from new information, and non‑reading activities such as writing or planning.

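Core assumptions:

  • Reading an EDU strengthens its activation.
  • All activations decay passively over time.
  • Reading new content interferes with the activation of EDUs already in memory.
  • Non‑reading activity (e.g. writing, planning) erodes activation further.
  • Refixating an EDU within a short window maintains its activation.
  • Availability is relative: EDUs compete, and forgetting reflects reduced accessibility rather than loss.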

The model outputs a memory snapshot after every event, giving the estimated activation and availability of all EDUs seen so far.

State Variables

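At any time \(t\), the model maintains:

  • \(A(t)\): the activation of every EDU seen so far (a named vector)
  • \(t_{\text{last}}\): the start time of the last processed event (ms)
  • \(t^{\text{fix}}\): the time at which each EDU was last fixated (sec)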

The complete model state is:

\[ \text{state}(t) = \{ A(t), t_{\text{last}}, t^{\text{fix}} \} \]


Parameters (Current Defaults)

lambda = 0.08   # passive decay rate (/sec)
alpha  = 0.9    # encoding strength
beta   = 0.10   # interference from new reading
gamma  = 0.18   # decay during non-reading ("other")
tau    = 2.0    # refixation window (sec); refix_window in the R code
k      = 0.3    # refixation maintenance gain; k_maint in the R code
theta  = 0.15   # availability threshold (normalised)

These values are intended for adult L1 readers processing propositional content.

Event Representation

Each input event corresponds to a single attentional or task-related episode. Events are processed sequentially in time order.

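Each event has the following attributes:

  • t_start: event onset (timestamp in milliseconds)
  • duration: event duration (milliseconds)
  • event_type: "read" or "other"
  • edu: identifier of the EDU being read (NA for "other" events)
  • words_in_edu: number of words in that EDU (NA for "other" events)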

The model makes no assumptions about dwell structure or fixation clustering; any event stream that satisfies these conditions can be used.

Mathematical Model

Passive Time‑Based Decay (All Events)

Between any two events, all EDU activations decay exponentially:

\[ \Delta t = \frac{t - t_{\text{last}}}{1000} \]

\[ A_e(t) = A_e(t_{\text{last}}) \cdot e^{-\lambda \Delta t} \]

where:

  • \(A_e(t)\) is the activation of EDU \(e\)
  • \(\lambda\) is the passive decay rate (per second)

In other words, propositional activation fades over time even in the absence of interference.
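
As a rough illustration of what the default decay rate implies, the sketch below computes the fraction of activation retained after a few gaps and the corresponding half-life. The function name retained and the gap values are illustrative assumptions, not part of the model code.

```r
lambda <- 0.08                      # passive decay rate (/sec), default above

# Fraction of activation that survives a gap of dt seconds
retained <- function(dt, lambda) exp(-lambda * dt)

gaps <- c(1, 5, 10, 30)             # illustrative gaps in seconds
round(retained(gaps, lambda), 3)

# Half-life implied by lambda: the gap after which activation has halved
half_life <- log(2) / lambda        # about 8.7 s for lambda = 0.08
```

With the default rate, an EDU that is neither reread nor disturbed loses half of its activation in under nine seconds.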

Non‑Reading Events (event_type == "other")

For non-reading events (e.g. writing, planning, thinking), no new content is encoded. Instead, existing activations decay further:

\[ A_e \leftarrow A_e \cdot e^{-\gamma \cdot (d / 1000)} \]

where:

  • \(\gamma\) reflects decay due to cognitive engagement
  • \(d\) is event duration in milliseconds

In other words, non-reading activity competes with source-memory activation.
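
The sketch below shows how task‑based decay combines with passive decay. It assumes, for illustration only, that events follow each other without gaps, so that the elapsed time between event onsets equals the event duration; other_decay and the 2 s burst are made‑up names and values.

```r
lambda <- 0.08   # passive decay rate (/sec)
gamma  <- 0.18   # additional decay during "other" events (/sec)

# Task-based decay factor for an "other" event lasting d_ms milliseconds
other_decay <- function(d_ms, gamma) exp(-gamma * d_ms / 1000)

d_ms <- 2000                                   # made-up 2 s writing burst
task_factor    <- other_decay(d_ms, gamma)     # exp(-0.36)
passive_factor <- exp(-lambda * d_ms / 1000)   # passive decay over the same 2 s

# If events follow each other without gaps, both factors apply to the same
# stretch of time, so the effective decay rate is lambda + gamma
combined <- task_factor * passive_factor
```

Under that assumption, activation during sustained writing decays at roughly lambda + gamma = 0.26 per second with the default values.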

Reading Events (event_type == "read")

Let EDU \(e\) be fixated for duration \(d\).

Encoding Strength

Encoding input is computed as fixation duration on a logarithmic scale, normalised by EDU length:

\[ I = \frac{\log(d + 1)}{w_e} \]

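where \(d\) is the fixation duration in milliseconds and \(w_e\) is the number of words in EDU \(e\) (words_in_edu). The logarithm compresses long fixations, and dividing by \(w_e\) ensures that long sentences do not receive disproportionately large activation.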
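
A small numerical sketch of the encoding input, using made‑up fixation durations and EDU lengths; encoding_input is an illustrative name, not a function of the model code.

```r
# Encoding input: log fixation duration (ms) divided by EDU length (words)
encoding_input <- function(d_ms, words) log(d_ms + 1) / words

# Same 300 ms fixation on a 16-word and a 32-word EDU (made-up values)
short_edu <- encoding_input(300, 16)
long_edu  <- encoding_input(300, 32)   # half the input of the 16-word EDU

# Log compression: a tenfold longer fixation adds far less than tenfold input
ratio <- encoding_input(3000, 16) / encoding_input(300, 16)
```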

Interference from New Content

Every reading event suppresses the activation of all EDUs already in memory, including the one being read, before any encoding takes place:

\[ A_j \leftarrow A_j \cdot e^{-\beta I} \quad \forall j \]

where \(\beta\) controls interference strength. Thus propositions compete for limited attentional and memory resources.
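
The following sketch applies the interference step to two made‑up activations. It is meant only to show that the suppression is multiplicative and therefore leaves the ratio between stored EDUs unchanged; the names and values are assumptions for illustration.

```r
beta <- 0.10                             # interference strength

A <- c(edu_a = 0.80, edu_b = 0.40)       # made-up activations before the read
I <- log(300 + 1) / 16                   # input of a 300 ms read on a 16-word EDU

A_after <- A * exp(-beta * I)            # every stored EDU is scaled down

# The scaling is multiplicative, so the ratio between EDUs is preserved
A_after["edu_a"] / A_after["edu_b"]
```

Competition between propositions therefore arises from the encoding step that follows, which raises only the EDU currently being read.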

Refixation‑Based Maintenance

If the same EDU was last fixated within a short temporal window, its activation is boosted:

\[ \text{if } (t - t^{\text{fix}}_e) < \tau: \quad A_e \leftarrow A_e \cdot (1 + k) \]

where:

  • \(\tau\) is the refixation window in seconds (\(t\) and \(t^{\text{fix}}_e\) are both expressed in seconds here)
  • \(k\) is the maintenance gain

This models short‑term conceptual rehearsal.
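
A minimal sketch of the maintenance rule on made‑up values. The helper maintain and its argument names are hypothetical and chosen for illustration; all times are in seconds.

```r
tau <- 2.0    # refixation window (sec)
k   <- 0.3    # maintenance gain

# Boost the activation of `edu` if it was last fixated less than tau seconds ago.
# `A` and `last_fix` are named vectors; times are in seconds.
maintain <- function(A, last_fix, edu, now, tau, k) {
  seen <- edu %in% names(last_fix)
  if (seen && (now - last_fix[[edu]]) < tau) {
    A[edu] <- A[edu] * (1 + k)
  }
  A
}

A        <- c(edu_a = 0.50, edu_b = 0.50)   # made-up activations
last_fix <- c(edu_a = 10.0, edu_b = 3.0)    # last fixation times (sec)

A <- maintain(A, last_fix, "edu_a", now = 11.2, tau, k)  # 1.2 s ago: boosted
A <- maintain(A, last_fix, "edu_b", now = 11.2, tau, k)  # 8.2 s ago: unchanged
```

Because the boost is proportional to the current activation, it benefits EDUs that are already active and does nothing for an EDU that has never been read.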

Encoding Update

After interference and maintenance, the current EDU gains activation:

\[ A_e \leftarrow A_e + \alpha I \]

Finally, its fixation time is updated:

\[ t^{\text{fix}}_e \leftarrow t \]

Availability (Retrieval Readiness)

To estimate which EDUs are likely retrievable at a given moment, activations are normalised:

\[ A^*_e = \frac{A_e}{\max_j A_j} \]

An EDU is considered available if:

\[ A^*_e \ge \theta \]

where \(\theta\) is a threshold parameter. Thus availability reflects relative retrievability under competition, not absolute memory strength.
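
The sketch below normalises three made‑up activations and applies the default threshold. It also illustrates the relative nature of availability: an EDU can become available without any change to its own activation.

```r
theta <- 0.15                                      # availability threshold

A <- c(edu_a = 0.90, edu_b = 0.30, edu_c = 0.10)   # made-up raw activations

A_norm    <- A / max(A)        # most active EDU is scaled to 1
available <- A_norm >= theta   # edu_c falls below threshold (0.10 / 0.90 is about 0.11)

# Availability is relative: if the strongest trace weakens, edu_c can cross
# the threshold without its own activation changing
A2 <- replace(A, "edu_a", 0.50)
available2 <- (A2 / max(A2)) >= theta
```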

Algorithmic Summary

For each incoming event, in temporal order:

  1. Compute elapsed time \(\Delta t\)
  2. Apply passive decay to all EDUs
  3. If event type is "other":
    • apply task‑based decay
  4. If event type is "read":
    • compute encoding strength
    • apply interference
    • apply refixation maintenance if applicable
    • add encoding to the current EDU
    • update last fixation time
  5. Normalise activations
  6. Determine availability

A complete memory snapshot is produced after every event.
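
The steps above can be sketched as a single update function with explicit parameters. This is an illustrative sketch under stated assumptions, not the implementation given later in this document: step_event, pars and the four toy events are made‑up names and values.

```r
# Sketch of one update step with explicit parameters. `step_event`, `pars`
# and the toy events are illustrative names, not part of the model code.
pars <- list(lambda = 0.08, alpha = 0.9, beta = 0.10, gamma = 0.18,
             tau = 2.0, k = 0.3, theta = 0.15)

step_event <- function(state, ev, p) {
  A <- state$A
  # 1-2. elapsed time (sec) and passive decay
  dt <- if (is.na(state$last_t)) 0 else (ev$t_start - state$last_t) / 1000
  A  <- A * exp(-p$lambda * dt)
  if (ev$event_type == "other") {
    # 3. task-based decay
    A <- A * exp(-p$gamma * ev$duration / 1000)
  } else {
    # 4. read: encoding strength, interference, maintenance, encoding
    now  <- ev$t_start / 1000
    I    <- log(ev$duration + 1) / ev$words_in_edu
    A    <- A * exp(-p$beta * I)
    seen <- ev$edu %in% names(A)
    if (seen && (now - state$last_fix[[ev$edu]]) < p$tau) {
      A[ev$edu] <- A[ev$edu] * (1 + p$k)
    }
    A[ev$edu] <- (if (seen) A[[ev$edu]] else 0) + p$alpha * I
    state$last_fix[ev$edu] <- now
  }
  state$A <- A
  state$last_t <- ev$t_start
  state
}

events <- data.frame(
  edu          = c("e1", "e2", NA, "e1"),
  t_start      = c(0, 400, 900, 6900),        # ms
  event_type   = c("read", "read", "other", "read"),
  duration     = c(350, 450, 6000, 300),      # ms
  words_in_edu = c(16, 22, NA, 16)
)

state <- list(A = numeric(), last_t = NA_real_, last_fix = numeric())
for (i in seq_len(nrow(events))) {
  state <- step_event(state, events[i, ], pars)
  # 5-6. normalise and threshold after every event
  A_norm <- state$A / max(state$A)
  print(round(A_norm, 3))
}
available <- A_norm >= pars$theta
```

In this toy trace, e2 is read once and then followed by a long "other" event and a reread of e1, which leaves e2 below the availability threshold at the end while e1 remains available.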

R Implementation

Core Update Logic

# Excerpt from update_one_event(); dt, edu and now are set as in the
# full implementation
# passive decay
A <- A * exp(-lambda * dt)

if (row$event_type == "other") {
  A <- A * exp(-gamma * (row$duration / 1000))
}

if (row$event_type == "read") {
  input <- log(row$duration + 1) / row$words_in_edu

  # interference
  A <- A * exp(-beta * input)

  # refixation maintenance
  if (edu %in% names(last_fix_time) &&
      (now - last_fix_time[edu]) < refix_window) {
    A[edu] <- A[edu] * (1 + k_maint)
  }

  # encoding
  A[edu] <- (A[edu] %||% 0) + alpha * input
  last_fix_time[edu] <- now
}

Normalisation and Availability

A_norm <- A / max(A)
available <- A_norm >= theta

Scope and Limitations

The model is intended to:

The model is not intended to:

Intended Use

This model is designed for analyses such as:

Parameters are expected to be weakly identifiable. Model evaluation should focus on robust qualitative behavior, not exact numerical estimates.
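
One simple way to probe such qualitative robustness is to vary a parameter and check which conclusions survive. The sketch below does this for the availability threshold on a made‑up activation vector; it is an illustration of the idea under these assumptions, not a fitting or evaluation procedure from this document.

```r
# Toy sensitivity check: how many EDUs count as available as theta varies?
A <- c(e1 = 0.38, e2 = 0.21, e3 = 0.09, e4 = 0.05, e5 = 0.02)  # made-up activations

thetas <- c(0.05, 0.10, 0.15, 0.20, 0.30)
n_available <- sapply(thetas, function(th) sum(A / max(A) >= th))

# The rank order of EDUs does not depend on theta; only the cut-off moves
data.frame(theta = thetas, n_available = n_available)
```

Conclusions that hold across a plausible range of thresholds, such as which EDU is most prominent, are safer to report than the exact number of available EDUs at any single threshold.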

Full R Implementation

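# Requires tibble and dplyr. The parameters (lambda, alpha, beta, gamma,
# refix_window, k_maint, theta) are read from the global environment.
library(tibble)

init_state <- function() {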
  list(
    A = numeric(),              # named activation vector
    last_t = NA_real_,          # last event time (ms)
    last_fix_time = numeric()   # last fixation per EDU (sec)
  )
}

update_one_event <- function(row, state) {

  A <- state$A
  last_fix_time <- state$last_fix_time

  # time elapsed (sec)
  if (is.na(state$last_t)) {
    dt <- 0
  } else {
    dt <- (row$t_start - state$last_t) / 1000
  }

  # universal passive decay
  if (length(A) > 0 && dt > 0) {
    A <- A * exp(-lambda * dt)
  }

  # ---- EVENT TYPES ----

  if (row$event_type == "other") {

    # task engagement without source input
    if (length(A) > 0) {
      A <- A * exp(-gamma * (row$duration / 1000))
    }

  } else if (row$event_type == "read") {

    edu <- row$edu
    now <- row$t_start / 1000

    # encoding strength (EDU-length normalised)
    input <- log(row$duration + 1) / row$words_in_edu

    # interference from new reading
    if (length(A) > 0) {
      A <- A * exp(-beta * input)
    }

    if (!is.na(edu) &&
        edu %in% names(last_fix_time) &&
        !is.na(last_fix_time[edu]) &&
        (now - last_fix_time[edu]) < refix_window) {

      A[edu] <- (A[edu] %||% 0) * (1 + k_maint)
    }

    # encoding
    A[edu] <- (A[edu] %||% 0) + alpha * input

    last_fix_time[edu] <- now
  }

  list(
    A = A,
    last_t = row$t_start,
    last_fix_time = last_fix_time
  )
}

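# NA/NULL-coalescing helper. It masks base R's %||% (R >= 4.4.0), which only
# handles NULL; here a missing (NA) activation also falls back to y.
`%||%` <- function(x, y) if (is.null(x) || is.na(x)) y else x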

update_reading_history <- function(df, state) {

  df <- df[order(df$t_start), ]

  outputs <- vector("list", nrow(df))

  for (i in seq_len(nrow(df))) {

    state <- update_one_event(df[i, ], state)

    A <- state$A

    # reporting snapshot
    if (length(A) > 0 && any(is.finite(A))) {
      A_norm <- A / max(A)
    } else {
      A_norm <- A
    }

    outputs[[i]] <- tibble(
      token      = df$token[i],
      t_start    = df$t_start[i],
      edu        = names(A_norm),
      activation = as.numeric(A_norm),
      available  = A_norm >= theta
    )
  }

  list(
    state = state,
    output = dplyr::bind_rows(outputs)
  )
}