Real‑time propositional reading‑history model

This document describes a real‑time activation model used to estimate the availability of propositional content (EDUs, e.g. sentences) in memory during reading‑to‑write tasks. The model treats reading as a stream of memory‑updating events that dynamically shape which ideas are available for writing under time pressure and interference. It is driven by eye‑movement–derived events but abstracts away from word‑level processing: it is intended to capture proposition‑level availability, not surface recall.

The model builds on activation‑based and interference‑based accounts of memory in psycholinguistics (ACT‑R, cue‑based retrieval, discourse models), according to which linguistic representations are continuously updated, decay over time, and compete for retrieval. Like cue‑based retrieval and discourse comprehension models, it assumes that forgetting reflects reduced accessibility rather than loss of representations. EDUs are treated as propositional units whose activation is strengthened by reading, weakened by interference from new content, and eroded by non‑reading activities such as writing. Availability is defined in relative terms, capturing the selective accessibility of propositions under competition. The model thus integrates insights from activation‑based memory theory, discourse processing, and eye‑movement research into a real‑time account of source memory during writing.

Input data format

Below is an example of the input format that the model expects. All variables except token are required.

Rows: 8,354
Columns: 6
$ token        <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ edu          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "001kgfc", NA, NA, "002smtf", "003twnf", "002sm…
$ t_start      <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ event_type   <chr> "other", "other", "other", "other", "other", "other", "other", "other", "other", "other…
$ duration     <int> 797, 355, 1788, 4286, 1761, 2183, 2565, 2076, 120, 93, 160, 140, 107, 602, 307, 221, 19…
$ words_in_edu <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, 22, 24, 22, 16, 16, 16, NA, 18, NA,…

In these data, event_type has the value "read" whenever edu is not NA and "other" whenever edu is NA. words_in_edu comes from the rst_tsv file:

Rows: 246
Columns: 3
$ edu          <chr> "001wfwa", "002tiwf", "003atti", "004saec", "005osit", "006netd", "007tpid", "008bats",…
$ source       <chr> "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_…
$ words_in_edu <int> 20, 25, 21, 18, 5, 39, 4, 25, 29, 19, 11, 36, 14, 32, 29, 16, 8, 23, 20, 31, 18, 35, 38…
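
The constraints on the input described above (required columns; event_type is "read" exactly when edu is not NA) can be checked before the model is run. The following is a minimal base‑R sketch: check_events is a hypothetical helper that is not part of the model code, and the toy values are made up for illustration.

```r
# Hypothetical helper (not part of the model code): checks the input
# constraints stated above on a data frame of events.
check_events <- function(df) {
  required <- c("edu", "t_start", "event_type", "duration", "words_in_edu")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  is_read <- df$event_type == "read"
  stopifnot(
    all(df$event_type %in% c("read", "other")),
    # "read" exactly when an EDU is attached to the event
    all(is_read == (!is.na(df$edu))),
    # read events need a positive word count for length normalisation
    all(!is.na(df$words_in_edu[is_read]) & df$words_in_edu[is_read] > 0),
    all(df$duration >= 0)
  )
  invisible(TRUE)
}

# Toy data in the expected format (values invented for illustration)
toy <- data.frame(
  token        = "test",
  edu          = c(NA, "001kgfc", NA),
  t_start      = c(0, 800, 960),
  event_type   = c("other", "read", "other"),
  duration     = c(797L, 160L, 140L),
  words_in_edu = c(NA, 16L, NA),
  stringsAsFactors = FALSE
)
check_events(toy)
```

The check stops with an error if a required column is absent or if a read event lacks an EDU or word count; token is deliberately not checked because it is optional.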

The input data format requires only minimal transformation of the incremental report returned by cywrite and no data cleaning. The minimal data wrangling required (in R) is shown here:

library(dplyr)   # select(), mutate(), case_when(), left_join()
library(tidyr)   # unnest()

jsonlite::fromJSON(txt = token, flatten = TRUE) %>%
  select(eye) %>%
  unnest(eye) %>%
  # label events as "read" (an EDU was fixated) or "other"
  mutate(event_type = case_when(!is.na(edu) ~ "read", TRUE ~ "other")) %>%
  # add number of words in edu
  left_join(data_info, by = join_by(edu))

The following code illustrates how the reading history function update_reading_history can be applied to slices of data. These data slices can have any size from 1 to the overall length of the writing session. Details of the implementation of update_reading_history and a conceptual description can be found below.

# Model parameters
lambda  <- .08       # slow passive time decay (/sec)
alpha   <- .9        # moderate encoding strength
beta    <- .1        # strong proposition-level interference from reading
gamma   <- .18       # strong decay during writing / "other"
refix_window <- 2.0  # seconds
k_maint      <- 0.3  # refixation boost
theta        <- 0.15 # retrieval threshold (normalised)

# Vector for initial states
state <- init_state()

# Data frame for results
rh <- tibble()

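# Loop over the data in steps of 20 (to simulate real time).
# slice(i) passes a single event per step, i.e. every 20th event;
# use slice(i:min(i + 19, nrow(data))) to pass consecutive chunks of 20 instead.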
for(i in seq(1, nrow(data), 20)){
  tmp <- data %>% slice(i)
  res <- update_reading_history(tmp, state)
  state <- res$state
  rh <- bind_rows(rh, res$output)
}

# Add source information for plotting
rh <- left_join(rh, data_info, by = "edu")

The model returns the following information:

Rows: 22,492
Columns: 7
$ token        <chr> "test", "test", "test", "test", "test", "test", "test", "test", "test", "test", "test",…
$ t_start      <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.7…
$ activation   <dbl> 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.29487149, 1.00000000, 0.1…
$ available    <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, F…
$ edu          <chr> "006bssf", "006bssf", "010isat", "006bssf", "010isat", "006bssf", "010isat", "006bssf",…
$ source       <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_short_…
$ words_in_edu <int> 18, 18, 9, 18, 9, 18, 9, 18, 9, 12, 18, 9, 12, 15, 18, 9, 12, 15, 19, 18, 9, 12, 15, 19…

The model results are visualised in Figure 1. Activation values are relative indices of memory prominence rather than measures of absolute memory strength. The model captures dynamic changes in relative availability under time‑based decay and interference, which are assumed to drive rereading and source‑use decisions during writing. It therefore permits statements about the relative prominence of propositions in memory (e.g. EDU “X” being more active than EDU “Y” at a given moment), but it does not assign absolute memory‑strength units to activation values.

Figure 1: Reading history. Each line represents an EDU. Colour indicates whether the propositional information of the EDU is available or not.

Conceptual Overview

Each EDU is treated as a memory trace whose activation varies over time. Activation increases when an EDU is read and decays due to time, interference from new information, and non‑reading activities such as writing or planning.

Core assumptions:

EDUs correspond to propositional units (sentence meaning).
Memory activation is graded, competitive, and time‑dependent.
Reading strengthens the currently fixated EDU but suppresses others.
Writing or other non‑reading activities erode source memory.
Only sufficiently active EDUs are assumed to be retrieval‑ready (available).

The model outputs a memory snapshot after every event, giving the estimated activation and availability of all EDUs seen so far.

State Variables

At any time \(t\), the model maintains:

\(A_e(t)\): activation of EDU \(e\)
\(t_{\text{last}}\): time of the previous event (ms)
\(t^{\text{fix}}_e\): most recent fixation time (s) for EDU \(e\)

The complete model state is:

\[ \text{state}(t) = \{ A(t), t_{\text{last}}, t^{\text{fix}} \} \]

Parameters (Current Defaults)

lambda = 0.08   # passive decay rate (/sec)
alpha  = 0.9    # encoding strength
beta   = 0.10   # interference from new reading
gamma  = 0.18   # decay during non-reading (“other”)
tau    = 2.0    # refixation window (sec); refix_window in the code
k      = 0.3    # refixation maintenance gain; k_maint in the code
theta  = 0.15   # availability threshold (normalised)

These values are intended for adult L1 readers processing propositional content.

Event Representation

Each input event corresponds to a single attentional or task-related episode. Events are processed sequentially in time order.

Each event has the following attributes:

Time stamp \(t\) (milliseconds)
Duration \(d\) (milliseconds)
Event type \(\in\) {read, other}
For read events only:
- Target EDU identifier \(e\)
- EDU length \(w_e\) (number of words)

The model makes no assumptions about dwell structure or fixation clustering; any event stream that satisfies these conditions can be used.

Mathematical Model

Passive Time‑Based Decay (All Events)

Between any two events, all EDU activations decay exponentially:

\[ \Delta t = \frac{t - t_{\text{last}}}{1000} \]

\[ A_e(t) = A_e(t_{\text{last}}) \cdot e^{-\lambda \Delta t} \]

where:

\(A_e(t)\) is the activation of EDU \(e\)
\(\lambda\) is the passive decay rate (per second)

In other words, propositional activation fades over time even in the absence of interference.
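
As a worked example of the decay equation, the sketch below applies passive decay to two made‑up activations over a 5 second gap, using the default lambda. The function name passive_decay and the EDU labels are illustrative assumptions, not part of the model code.

```r
lambda <- 0.08  # passive decay rate (/sec), the default given in this document

# Passive decay of a vector of activations; time stamps are in milliseconds
passive_decay <- function(A, t_now_ms, t_last_ms, lambda) {
  dt <- (t_now_ms - t_last_ms) / 1000  # elapsed time in seconds
  A * exp(-lambda * dt)
}

A <- c(edu_a = 1.0, edu_b = 0.5)  # made-up activations
decayed <- passive_decay(A, t_now_ms = 5000, t_last_ms = 0, lambda = lambda)
decayed

# Time (sec) for an activation to halve under passive decay alone
half_life <- log(2) / lambda
half_life  # about 8.7 seconds
```

Because the decay is multiplicative, the ratio between the two activations is unchanged; passive decay alone never reorders EDUs.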

Non‑Reading Events (`event_type == "other"`)

For non-reading events (e.g. writing, planning, thinking), no new content is encoded. Instead, existing activations decay further:

\[ A_e \leftarrow A_e \cdot e^{-\gamma \cdot (d / 1000)} \]

where:

\(\gamma\) is the additional decay rate (per second) due to cognitive engagement
\(d\) is event duration in milliseconds

In other words, non-reading activity competes with source-memory activation.
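
The task‑based decay is applied on top of passive decay, so an "other" event reduces activation by both factors. A minimal sketch with made‑up values (a 2 second gap followed by a 4286 ms non‑reading event); other_decay is a hypothetical name used only for illustration.

```r
lambda <- 0.08  # passive decay rate (/sec)
gamma  <- 0.18  # decay rate during non-reading events (/sec)

# Task-based decay for an "other" event; duration is in milliseconds
other_decay <- function(A, duration_ms, gamma) {
  A * exp(-gamma * duration_ms / 1000)
}

A <- c(edu_a = 1.0)                          # made-up activation
A_passive <- A * exp(-lambda * 2)            # 2 s elapsed since the last event
A_after   <- other_decay(A_passive, 4286, gamma)
A_after

# Both steps multiply, so the combined factor is a single exponential
combined <- exp(-lambda * 2 - gamma * 4.286)
```

With the default values, gamma is more than twice lambda, so time spent on an "other" event contributes more to the loss of activation than the same amount of elapsed time between events.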

Reading Events (`event_type == "read"`)

Let EDU \(e\) be fixated for duration \(d\).

Encoding Strength

Encoding input is computed as fixation duration on a logarithmic scale, normalised by EDU length:

\[ I = \frac{\log(d + 1)}{w_e} \]

This ensures that long sentences do not receive disproportionately large activation.
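
The behaviour of the encoding input can be checked on a few made‑up values. This is a sketch only; encoding_input is a hypothetical wrapper around the formula above.

```r
# Encoding input: log fixation duration (ms), normalised by EDU length (words)
encoding_input <- function(duration_ms, words_in_edu) {
  log(duration_ms + 1) / words_in_edu
}

# The same 300 ms fixation on a short (9-word) and a long (24-word) EDU
short_edu <- encoding_input(300, 9)
long_edu  <- encoding_input(300, 24)
c(short = short_edu, long = long_edu)

# Doubling the duration adds roughly log(2) / w_e rather than doubling the input
gain <- encoding_input(600, 16) - encoding_input(300, 16)
gain
```

The logarithm gives diminishing returns for longer fixations, and the division by \(w_e\) means that a single event contributes less to a long EDU than to a short one.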

Interference from New Content

Reading a new EDU suppresses activation of all existing EDUs:

\[ A_j \leftarrow A_j \cdot e^{-\beta I} \quad \forall j \]

where \(\beta\) controls interference strength. Thus propositions compete for limited attentional and memory resources.

Refixation‑Based Maintenance

If the same EDU was fixated within a short temporal window:

\[ \text{if } \left(\frac{t}{1000} - t^{\text{fix}}_e\right) < \tau: \quad A_e \leftarrow A_e \cdot (1 + k) \]

where:

\(\tau\) is the refixation window (seconds)
\(k\) is the maintenance gain

This models short‑term conceptual rehearsal.

Encoding Update

After interference and maintenance, the current EDU gains activation:

\[ A_e \leftarrow A_e + \alpha I \]

Finally, its fixation time is updated:

\[ t^{\text{fix}}_e \leftarrow \frac{t}{1000} \]
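
The steps of a read event (interference, refixation maintenance, encoding, fixation‑time update) can be traced on made‑up numbers. The following is a minimal sketch using the default parameters; the activations, fixation times and event values are invented for illustration, and the event time is converted from milliseconds to seconds before it is compared with the stored fixation time.

```r
alpha <- 0.9; beta <- 0.10; tau <- 2.0; k <- 0.3  # defaults from this document

A        <- c(e1 = 0.6, e2 = 0.4)     # made-up activations after passive decay
last_fix <- c(e1 = 10.0, e2 = 11.2)   # last fixation times (sec)

# Incoming read event: e1 fixated at t = 11500 ms for 250 ms; e1 has 16 words
e   <- "e1"
now <- 11500 / 1000                   # event time in seconds
I   <- log(250 + 1) / 16              # encoding input

A <- A * exp(-beta * I)               # 1. interference applies to all EDUs
if ((now - last_fix[[e]]) < tau) {    # 2. e1 was fixated 1.5 s ago (< tau)
  A[e] <- A[e] * (1 + k)
}
A[e] <- A[e] + alpha * I              # 3. encoding of the fixated EDU
last_fix[e] <- now                    # 4. update its fixation time

A
last_fix
```

In this trace e2 only loses activation through interference, whereas e1 receives both the maintenance boost and the encoding increment.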

Availability (Retrieval Readiness)

To estimate which EDUs are likely retrievable at a given moment, activations are normalised:

\[ A^*_e = \frac{A_e}{\max_j A_j} \]

An EDU is considered available if:

\[ A^*_e \ge \theta \]

where \(\theta\) is a threshold parameter. Thus availability reflects relative retrievability under competition, not absolute memory strength.
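
A small numerical example of the normalisation and threshold, with made‑up activations and the default theta:

```r
theta <- 0.15                               # availability threshold (normalised)

A <- c(e1 = 0.82, e2 = 0.31, e3 = 0.09)     # made-up raw activations
A_norm    <- A / max(A)                     # most active EDU is scaled to 1
available <- A_norm >= theta

data.frame(edu = names(A), activation = as.numeric(A_norm),
           available = as.logical(available))
```

Here e3 falls below the threshold (0.09 / 0.82 is about 0.11), while e1 and e2 remain available. Because of the normalisation, the most active EDU is always available, and whether any other EDU is available depends on its activation relative to that maximum.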

Algorithmic Summary

For each incoming event, in temporal order:

Compute elapsed time \(\Delta t\)
Apply passive decay to all EDUs
If event type is "other":
- apply task‑based decay
If event type is "read":
- compute encoding strength
- apply interference
- apply refixation maintenance if applicable
- add encoding to the current EDU
- update last fixation time
Normalise activations
Determine availability

A complete memory snapshot is produced after every event.

R Implementation

Core Update Logic

# passive decay
A <- A * exp(-lambda * dt)

if (row$event_type == "other") {
  A <- A * exp(-gamma * (row$duration / 1000))
}

if (row$event_type == "read") {
  input <- log(row$duration + 1) / row$words_in_edu

  # interference
  A <- A * exp(-beta * input)

  # refixation maintenance
  if (edu %in% names(last_fix_time) &&
      (now - last_fix_time[edu]) < refix_window) {
    A[edu] <- A[edu] * (1 + k_maint)
  }

  # encoding
  A[edu] <- (A[edu] %||% 0) + alpha * input
  last_fix_time[edu] <- now
}

Normalisation and Availability

A_norm <- A / max(A)
available <- A_norm >= theta

Scope and Limitations

The model is intended to:

track real‑time availability of propositional content
model decay and interference during reading‑to‑write tasks
support predictions about rereading and source reuse

The model is not intended to:

explain fixation durations or word‑recognition processes
model verbatim recall
provide absolute memory-strength estimates

Intended Use

This model is designed for analyses such as:

identifying when propositions become unavailable during writing
predicting rereading behavior
estimating which propositions are usable without rereading
examining interference between topically similar ideas

Parameters are expected to be weakly identifiable. Model evaluation should focus on robust qualitative behavior, not exact numerical estimates.
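
One identifiability property follows directly from the update rules as stated: starting from an empty state, every decay, interference and maintenance step is multiplicative, and the only additive term is \(\alpha I\). Raw activations are therefore proportional to alpha, and the normalised activations do not depend on it. The sketch below illustrates this with a compact restatement of the update rules; run_model and the toy events are assumptions made for illustration, not the implementation documented here.

```r
# Compact restatement of the update rules (illustrative only)
run_model <- function(events, alpha, lambda = 0.08, beta = 0.10,
                      gamma = 0.18, tau = 2.0, k = 0.3) {
  A        <- numeric()   # named activation vector
  last_fix <- numeric()   # last fixation time per EDU (sec)
  last_t   <- NA_real_    # previous event time (ms)
  for (i in seq_len(nrow(events))) {
    ev <- events[i, ]
    # passive decay since the previous event
    if (!is.na(last_t)) A <- A * exp(-lambda * (ev$t_start - last_t) / 1000)
    if (ev$event_type == "other") {
      A <- A * exp(-gamma * ev$duration / 1000)
    } else {
      e   <- ev$edu
      now <- ev$t_start / 1000
      I   <- log(ev$duration + 1) / ev$words_in_edu
      A   <- A * exp(-beta * I)                          # interference
      if (e %in% names(A)) {
        if ((now - last_fix[[e]]) < tau) A[e] <- A[e] * (1 + k)
        A[e] <- A[e] + alpha * I                         # encoding
      } else {
        A[e] <- alpha * I                                # first encounter
      }
      last_fix[e] <- now
    }
    last_t <- ev$t_start
  }
  A / max(A)  # normalised activations after the last event
}

# Toy event stream (values invented for illustration)
events <- data.frame(
  t_start      = c(0, 400, 900, 3000, 8000),
  duration     = c(300, 450, 200, 2500, 350),
  event_type   = c("read", "read", "read", "other", "read"),
  edu          = c("e1", "e2", "e1", NA, "e3"),
  words_in_edu = c(16, 22, 16, NA, 9),
  stringsAsFactors = FALSE
)

a_hi <- run_model(events, alpha = 0.9)
a_lo <- run_model(events, alpha = 0.3)
all.equal(a_hi, a_lo)  # normalised activations agree up to floating-point error
```

Under these assumptions, alpha cannot be estimated from normalised activations or availability alone, which is one concrete reason to evaluate the model on qualitative behavior.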

Full R Implementation

init_state <- function() {
  list(
    A = numeric(),              # named activation vector
    last_t = NA_real_,          # last event time (ms)
    last_fix_time = numeric()   # last fixation per EDU (sec)
  )
}

update_one_event <- function(row, state) {

  A <- state$A
  last_fix_time <- state$last_fix_time

  # time elapsed (sec)
  if (is.na(state$last_t)) {
    dt <- 0
  } else {
    dt <- (row$t_start - state$last_t) / 1000
  }

  # universal passive decay
  if (length(A) > 0 && dt > 0) {
    A <- A * exp(-lambda * dt)
  }

  # ---- EVENT TYPES ----

  if (row$event_type == "other") {

    # task engagement without source input
    if (length(A) > 0) {
      A <- A * exp(-gamma * (row$duration / 1000))
    }

  } else if (row$event_type == "read") {

    edu <- row$edu
    now <- row$t_start / 1000

    # encoding strength (EDU-length normalised)
    input <- log(row$duration + 1) / row$words_in_edu

    # interference from new reading
    if (length(A) > 0) {
      A <- A * exp(-beta * input)
    }

    if (!is.na(edu) &&
        edu %in% names(last_fix_time) &&
        !is.na(last_fix_time[edu]) &&
        (now - last_fix_time[edu]) < refix_window) {

      A[edu] <- (A[edu] %||% 0) * (1 + k_maint)
    }

    # encoding
    A[edu] <- (A[edu] %||% 0) + alpha * input

    last_fix_time[edu] <- now
  }

  list(
    A = A,
    last_t = row$t_start,
    last_fix_time = last_fix_time
  )
}

`%||%` <- function(x, y) if (is.null(x) || is.na(x)) y else x

update_reading_history <- function(df, state) {

  df <- df[order(df$t_start), ]

  outputs <- vector("list", nrow(df))

  for (i in seq_len(nrow(df))) {

    state <- update_one_event(df[i, ], state)

    A <- state$A

    # reporting snapshot
    if (length(A) > 0 && any(is.finite(A))) {
      A_norm <- A / max(A)
    } else {
      A_norm <- A
    }

    outputs[[i]] <- tibble(
      token      = df$token[i],
      t_start    = df$t_start[i],
      edu        = names(A_norm),
      activation = as.numeric(A_norm),
      available  = A_norm >= theta
    )
  }

  list(
    state = state,
    output = dplyr::bind_rows(outputs)
  )
}

Real‑time propositional reading‑history model

Jens Roeser

Update: 2026-04-03

