Model description

This document describes a real‑time activation model used to estimate the availability of propositional content in memory during reading‑ and writing‑from‑source tasks. The model treats reading as a stream of temporally ordered memory‑updating events that dynamically shape which ideas are available for subsequent production under time pressure and interference. It is driven by eye‑movement–derived events but abstracts away from word‑level processing, focusing instead on proposition‑level representations associated with sentence‑level units of text. The model is intended to capture the availability and accessibility of meaning rather than verbatim recall.

The model builds on activation‑based and interference‑based accounts of memory in psycholinguistics (e.g. ACT‑R, cue‑based retrieval, and related comprehension models), according to which linguistic representations are continuously updated, decay over time, and compete for retrieval (Anderson 1983; Anderson and Bower 1973; Lewis and Vasishth 2005). As in cue‑based retrieval frameworks, forgetting is treated not as loss of information but as a reduction in accessibility under competition (Lewis, Vasishth, and Van Dyke 2006; Myers and O’Brien 1998). Propositions are treated as the basic representational units whose activation is strengthened incrementally by reading, weakened by interference from new information, and eroded by time and non‑source‑reading activities such as writing.

At any point in time, each proposition has a continuous activation value reflecting its latent memory strength. This latent activation evolves continuously as a function of decay, interference, encoding, maintenance, and spreading activation but is not itself used directly for retrieval or learning decisions. Instead, availability is defined as a derived quantity that reflects how accessible a proposition is relative to other propositions within the same information source and under current attentional conditions.

Concretely, latent activation values are first normalised within each source using a divisive, mass‑based normalisation scheme. This source‑local normalisation yields a bounded relative activation signal that captures competitive accessibility among propositions belonging to the same text, independent of propositions from other sources. Relative activation therefore reflects within‑source competition and resource sharing rather than absolute memory strength.
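
The normalisation step can be illustrated with a minimal R sketch. It is not the model's implementation: the function name `normalise_within_source()` and the semi-saturation constant `sigma` are assumptions introduced here for illustration. With `sigma > 0`, the relative activations within a source sum to less than 1; with `sigma = 0`, the signal is exactly invariant to uniform rescaling of activation.

```r
# Divisive (mass-based) normalisation within each source.
# ASSUMPTION: sigma is a hypothetical semi-saturation constant, not a documented parameter.
normalise_within_source <- function(activation, source, sigma = 1) {
  stopifnot(length(activation) == length(source), all(activation >= 0))
  # Total activation mass of each source, aligned with each proposition
  mass <- ave(activation, source, FUN = sum)
  activation / (sigma + mass)
}

A   <- c(e1 = 0.4, e2 = 0.1, e3 = 0.5)   # latent activation (arbitrary units)
src <- c("text_A", "text_A", "text_B")   # source of each proposition
rel <- normalise_within_source(A, src)
rel
```

Because the denominator only pools activation from the same source, `e3` is unaffected by how strongly `e1` and `e2` are activated.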

The present model extends a time‑ and interference‑based activation framework by embedding propositions in structured networks over which activation can spread, and by allowing aspects of this structure to change during reading. Propositions are embedded in a semantic similarity network derived from distributional representations of sentence meaning, supporting limited spreading activation among related propositions. In addition, the model implements episodic learning: when multiple propositions are simultaneously accessible, their mutual associations can strengthen through gated co‑activation learning. Episodic learning is selective and availability‑dependent, allowing central ideas to emerge over time while peripheral propositions remain weakly connected.

In addition to competition among propositions, the model incorporates an explicit attentional control mechanism operating at the level of information sources (e.g. texts or discourse streams). While propositions from multiple sources may remain active in memory, only propositions belonging to currently attended sources are considered functionally available. Attentional focus is implemented as a graded, source‑level weight that decays over time and partially recovers when a source is revisited. This attentional signal modulates availability after source‑local normalisation, without directly altering the underlying latent activation of propositions. As a consequence, memory representations can persist outside the current focus of attention while remaining temporarily inaccessible for retrieval, spreading, and learning.

In line with Fuzzy Trace Theory (Reyna and Brainerd 1995; Reyna 2008), gist is treated as a meaning‑based representation that is more stable than verbatim content. In the present model, gist is not implemented as a separate representational trace. Instead, gist‑like representations emerge from stable patterns of relative activation and learned episodic connectivity among propositions within a source. Propositions that are both competitively central and strongly integrated within the episodic network contribute most strongly to gist, whereas transient or weakly connected propositions correspond to peripheral or unstable interpretations. Gist is computed source‑locally from the interaction of relative activation and episodic centrality, rather than from global activation across sources.

Overall, the model implements a hybrid activation‑based memory system with the following properties:

  • Items: Propositions are the basic representational units, associated with sentence‑level units of text.
  • State: Each proposition has a graded activation value reflecting latent memory strength.
  • Time dynamics: Activation decays continuously over time and is modulated by interference from ongoing processing, independently of competitive normalisation.
  • Competition: Availability is shaped by source‑local competitive normalisation, reflecting limited processing resources within each source.
  • Control: Availability is further modulated by attentional gating at the level of information sources.
  • Structure: Propositions are embedded in a semantic similarity network, supplemented by episodic associations learned through gated co‑activation during reading.
  • Learning: Episodic associations strengthen selectively among co‑active and available propositions, with learning magnitude determined by latent activation.
  • Derived signals: In addition to latent activation, the model derives relative and attention‑gated activation measures for retrieval, learning, and descriptive analyses of memory persistence over time.

This architecture aligns with a large body of psycholinguistic work in which forgetting during comprehension and production is understood as a consequence of decay, interference, competitive retrieval, and learning in a structured memory space, rather than as the loss of stored representations (e.g. Lewis, Vasishth, and Van Dyke 2006; Myers and O’Brien 1998).

Full functions can be found at the end of this document and will be loaded here.

# Load functions for reading history model with activation spreading
source("../r/rh-sa.R")

Model inputs

Example data

Below is an example of the input format that the model expects. All variables are required except token. Each row is a time-stamped event (t_start) with a fixation duration (duration) and an event_type. event_type is "read" whenever edu is not NA and "other" whenever edu is NA. A "read" event is therefore always associated with an edu, i.e. a text unit within the source texts, to which the fixation duration is attributed. The code assumes that edu values are unambiguous identifiers of text regions across source texts.

Rows: 8,354
Columns: 7
$ token        <chr> "test", "test", "test", "test", "test", "test", "test", "test", "te…
$ source       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "RW1_short_ascii", NA, NA, …
$ edu          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "001kgfc", NA, NA, "002smtf…
$ t_start      <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+1…
$ event_type   <chr> "other", "other", "other", "other", "other", "other", "other", "oth…
$ duration     <int> 797, 355, 1788, 4286, 1761, 2183, 2565, 2076, 120, 93, 160, 140, 10…
$ words_in_edu <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, 22, 24, 22, 16,…
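
The constraints on the input format described above can be checked before running the model. The following is a minimal sketch; `check_events()` is a hypothetical helper introduced here and is not part of rh-sa.R.

```r
# Check that an event table has the required columns and that
# event_type is "read" exactly when edu is not NA.
# ASSUMPTION: check_events() is an illustrative helper, not a model function.
check_events <- function(events) {
  required <- c("source", "edu", "t_start", "event_type", "duration", "words_in_edu")
  missing_cols <- setdiff(required, names(events))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  expected <- ifelse(is.na(events$edu), "other", "read")
  mismatch <- which(is.na(events$event_type) | events$event_type != expected)
  if (length(mismatch) > 0) {
    stop("event_type is inconsistent with edu in rows: ",
         paste(mismatch, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy example with one non-reading event and one reading event
toy <- data.frame(
  source       = c(NA, "RW1_short_ascii"),
  edu          = c(NA, "001kgfc"),
  t_start      = c(1.738678e+12, 1.738678e+12 + 93),
  event_type   = c("other", "read"),
  duration     = c(93L, 160L),
  words_in_edu = c(NA, 16L)
)
check_events(toy)
```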

words_in_edu comes from the rst_tsv file:

Rows: 246
Columns: 4
$ edu          <chr> "001wfwa", "002tiwf", "003atti", "004saec", "005osit", "006netd", "…
$ text         <chr> "when faced with a particularly tough question on rounds during my …
$ source       <chr> "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short…
$ words_in_edu <int> 20, 25, 21, 18, 5, 39, 4, 25, 29, 19, 11, 36, 14, 32, 29, 16, 8, 23…

Span information from the rst_tsv file is required to calculate the weights between PEs.

Rows: 183
Columns: 3
$ source <chr> "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii", "AM1_short_ascii…
$ edu    <chr> "001wfwa,002tiwf", "001wfwa,002tiwf,003atti,004saec,005osit", "001wfwa,00…
$ level  <dbl> 9, 7, 5, 3, 1, 8, 10, 8, 6, 8, 6, 4, 4, 2, 4, 3, 0, 2, 2, 4, 5, 3, 1, 3, …

The input data format requires minimal data transformation from the incremental report returned by CyWrite (Chukharev-Hudilainen 2019) and no data cleaning. The data wrangling that would be minimally required (in R) is shown here:

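library(dplyr)  # %>%, select(), mutate(), case_when(), left_join(), join_by()
library(tidyr)  # unnest()

jsonlite::fromJSON(txt = token, flatten = TRUE) %>%
  select(eye) %>%
  unnest(eye) %>%
  # label events as "read" (fixation on an EDU) or "other"
  mutate(event_type = case_when(!is.na(edu) ~ "read", TRUE ~ "other")) %>%
  # add number of words in edu
  left_join(data_pe, by = join_by(edu))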

Finally we need a named vector with source as values and EDUs as names.

# EDUs by source 
edu_source <- setNames(data_pe$source, data_pe$edu)

# Preview
edu_source[1:2]
          001wfwa           002tiwf 
"AM1_short_ascii" "AM1_short_ascii" 

The raw fixation data are shown in Figure 1.

Figure 1: Raw eye-tracking data.

Semantic similarity

The function build_S_sbert() constructs a semantic similarity matrix for knowledge‑based spreading activation for each source text on the basis of the data stored in data_pe. This matrix is derived from sentence‑level semantic embeddings allowing semantic relations to be captured independently of surface lexical overlap. This matrix provides the structural basis for landscape‑style semantic spreading activation, in which activation can flow between propositions that are semantically related in a conceptual sense, even if they do not share words or are not directly connected by discourse structure.

Each EDU is mapped onto a dense semantic vector using a Sentence‑BERT (SBERT) model (Reimers and Gurevych 2019). SBERT models are trained to represent the meaning of sentences such that semantically equivalent or closely related sentences have similar vector representations, even when they differ substantially in wording. This makes them particularly well suited for short, propositional EDUs, where lexical overlap may be minimal.

SBERT is accessed via Python.

library(reticulate)
use_python("/usr/bin/python3", required = TRUE)
transformers <- import("sentence_transformers")
np <- import("numpy")

# initialise model (do once per session)
sbert_model <- transformers$SentenceTransformer("all-MiniLM-L6-v2")

# Generate SBERT embeddings for EDUs
emb_sbert_by_source <- data_pe %>%
  split(.$source) %>%
  lapply(compute_sbert_embeddings)
saveRDS(emb_sbert_by_source, file = "objects/emb_sbert_by_source.rda")

Semantic representations are constructed as follows:

  1. Sentence encoding: Each EDU text is passed to a pretrained SBERT model, which produces a fixed‑length, dense vector representation intended to capture the EDU’s propositional meaning.
  2. Embedding normalisation: EDU embeddings are L2‑normalised to ensure that cosine similarity reflects angular distance in semantic space rather than vector magnitude.
  3. Per‑source processing: Embeddings are constructed separately for each source text, and semantic similarity matrices are computed independently for each source. This prevents semantic spreading across sources.

# Build semantic similarity matrix for each source and its embeddings
edu_list <- data_pe %>% split(.$source)

common_names <- intersect(names(edu_list), names(emb_sbert_by_source))

S_by_source <- mapply(
  build_S_sbert,
  edu_df = edu_list[common_names],
  embeddings = emb_sbert_by_source[common_names],
  SIMPLIFY = FALSE)

SBERT embeddings approximate long‑term semantic knowledge learned from large corpora, including information about synonymy, paraphrase, and typical conceptual associations.

For each source text, a semantic similarity matrix \(S\) is computed by taking the cosine similarity between all pairs of EDU embeddings. For EDUs \(i\) and \(j\):

\[ S_{ij} = \frac{\vec{v}_i \cdot \vec{v}_j}{\|\vec{v}_i\| \|\vec{v}_j\|} \]

where \(\vec{v}_i\) and \(\vec{v}_j\) are the sentence‑level embedding vectors of EDUs \(i\) and \(j\), respectively. A preview is shown here:

          001kgfc   002smtf   003twnf   004eahf   005wtsg   006bssf
001kgfc 1.0000000 0.2677321 0.1633561 0.4572471 0.1959719 0.1470794
002smtf 0.2677321 1.0000000 0.3595323 0.3795831 0.4427657 0.5298123
003twnf 0.1633561 0.3595323 1.0000000 0.2868217 0.4470820 0.3980450
004eahf 0.4572471 0.3795831 0.2868217 1.0000000 0.3417116 0.4565610
005wtsg 0.1959719 0.4427657 0.4470820 0.3417116 1.0000000 0.4074331
006bssf 0.1470794 0.5298123 0.3980450 0.4565610 0.4074331 1.0000000
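
The cosine similarity computation can be sketched in base R as follows. This is an illustration of the formula above rather than the body of build_S_sbert(), which is not shown here; `cosine_similarity_matrix()` and the toy embeddings are assumptions made for the example.

```r
# Cosine similarity between all pairs of rows of an embedding matrix.
# ASSUMPTION: illustrative helper; rows are EDUs, columns are embedding dimensions.
cosine_similarity_matrix <- function(emb, ids = rownames(emb)) {
  norms <- sqrt(rowSums(emb^2))
  stopifnot(all(norms > 0))
  # L2-normalise each row (recycling divides element [i, j] by norms[i])
  unit <- emb / norms
  # Dot products of unit vectors are cosine similarities
  S <- tcrossprod(unit)
  dimnames(S) <- list(ids, ids)
  S
}

# Toy 2-dimensional "embeddings" for four EDUs
emb <- rbind(e1 = c(1, 0), e2 = c(0, 2), e3 = c(1, 1), e4 = c(-1, 0))
S_toy <- cosine_similarity_matrix(emb)
round(S_toy, 3)
```

The toy vectors are chosen so that the result covers orthogonal (`e1`, `e2`), partially aligned (`e1`, `e3`), and opposite (`e1`, `e4`) directions.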

Cosine similarity is bounded in \([-1, 1]\), where higher values indicate greater semantic relatedness in embedding space; the values in the preview above all fall between 0 and 1. Because embeddings encode meaning beyond surface form, high similarity values can arise even when EDUs share few or no lexical items. The SBERT-based semantic similarities between EDUs are visualised in Figure 2 for both RW source texts.

Figure 2: SBERT-based semantic similarity matrix.

The resulting semantic similarity matrix \(S\) has the following properties:

  • Rows and columns correspond to EDUs from the same source text
  • \(S_{ij} = S_{ji}\) (the matrix is symmetric)
  • \(S_{ii} = 1\) (self‑similarity)
  • Off‑diagonal values encode graded semantic relatedness
  • The matrix is dense and continuous, reflecting probabilistic semantic associations rather than categorical links

Row and column names are set to EDU identifiers, allowing direct alignment with the activation state vector.

The SBERT‑based semantic similarity matrix is intended to approximate the reader’s semantic memory structure in a distributional sense. Embedding‑based similarity captures conceptual relatedness that supports:

  • synonymy and paraphrase relations
  • bridging and predictive inferences
  • resonance between propositions with similar meanings
  • maintenance of topic‑level coherence despite variation in wording

In the model, this matrix supports landscape‑style semantic activation spreading whereby activation of one proposition partially reactivates other propositions that are semantically related at the level of meaning rather than surface form. Semantic spreading is implemented with a small weighting parameter ensuring that knowledge‑based resonance provides weak but persistent background support rather than dominating direct encoding or discourse‑structural integration mechanisms.

Application of reading history model

The following code illustrates how the reading history function update_reading_history can be applied to slices of data. These data slices can have any size from 1 to the overall length of the writing session. Details of the implementation of update_reading_history and a conceptual description can be found below. Default parameter values are chosen to yield qualitatively plausible dynamics for skilled adult readers, including sparse availability, strong attentional modulation, and gradual emergence of gist, rather than to provide quantitative fits to individual readers.

# Decay and attention
lambda <- 0.3     # passive decay rate of latent activation (per second)
kappa <- 0.3      # attentional disengagement from source (per second)

# Encoding and interference
alpha <- 0.3      # moderate encoding strength
# proposition-level interference from reading
beta_within <- 0.2 # stronger interference within source
beta_cross <- 0.1  # weaker interference across sources

# Refixation
refix_window <- 2.0  # refixation window (seconds)
k_maint <- 0.1  # refixation boost

# Rehearsal during non-reading events
k_rehearse <- 0.05   # very small maintenance gain
K_rehearse <- 2      # max number of propositions rehearsed

# Availability
theta <- 0.08 # retrieval threshold (normalised)

# Episodic learning
eta <- 0.001  # learning rate
rho <- 0.0002  # slow decay

# Activation spreading
semantic_spread_rate <- 0.02 # how much does semantic resonance matter
episodic_spread_rate <- 0.01 # episodic learning influence
spread_threshold <- theta * 1.2 # minimum activation for spreading
spread_gain <- 0.12

# Source activation recovery
source_recovery <- 0.6

# The code needs to know which sources the participant is looking at.
# Select only the S matrices for the current data
# sources <- c("RW1_short_ascii", "RW2_short_ascii") # would do
sources <- unique(drop_na(data, source) %>% pull(source))
S_by_source <- S_by_source[sources]

# Select source-edu info for current data
edu_source <- edu_source[edu_source %in% unique(data$source)]

# Vector for initial state
state <- init_state(S_by_source, edu_source)

# For testing load functions here (can be removed later)
source("../r/rh-sa.R")

# input data can be any slice of data
out <- update_reading_history(new_events = data, state = state, theta = theta) 

# state needs to be extracted when this should be applied incrementally
# (otherwise the function will revert to initial state)
#state <- out$state

# bind lists into data frames
rh_activation <- bind_rows(out$activation_history)
rh_gist <- bind_rows(out$gist_history)
rh_episodic <- bind_rows(out$episodic_history)
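
When the model is applied incrementally, the state returned by one call has to be passed into the next. A minimal sketch of such a loop is given below. `run_incrementally()` and `chunk_size` are hypothetical names introduced here; the sketch only assumes that the update function accepts `new_events` and `state` and returns a list with a `state` element, as update_reading_history does in the call above.

```r
# Apply an update function to consecutive slices of the event table,
# carrying the state forward between slices.
# ASSUMPTION: illustrative wrapper, not part of rh-sa.R.
run_incrementally <- function(events, state, update_fn, chunk_size = 100) {
  stopifnot(nrow(events) > 0, chunk_size >= 1)
  starts  <- seq(1, nrow(events), by = chunk_size)
  outputs <- vector("list", length(starts))
  for (k in seq_along(starts)) {
    rows <- starts[k]:min(starts[k] + chunk_size - 1, nrow(events))
    out  <- update_fn(new_events = events[rows, , drop = FALSE], state = state)
    state <- out$state   # without this, every slice would start from the initial state
    outputs[[k]] <- out
  }
  list(state = state, outputs = outputs)
}

# Possible use with the model (not run here, requires rh-sa.R and the data):
# res <- run_incrementally(
#   data, state,
#   function(new_events, state) {
#     update_reading_history(new_events = new_events, state = state, theta = theta)
#   }
# )
```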

The model returns the following variables for rh_activation

Rows: 587,377
Columns: 6
$ t_start        <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e…
$ source         <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW1_sho…
$ edu            <chr> "001kgfc", "001kgfc", "001kgfc", "001kgfc", "002smtf", "001kgfc",…
$ activation_raw <dbl> 0.09515951, 0.09440978, 0.09428620, 0.08527568, 0.08727624, 0.067…
$ activation_rel <dbl> 0.08689100, 0.08626548, 0.08616229, 0.07272657, 0.07443273, 0.055…
$ available      <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…

and the following variables for rh_gist

Rows: 74,560
Columns: 6
$ source              <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii", "RW…
$ edu                 <chr> "004eahf", "003twnf", "002smtf", "006bssf", "001kgfc", "005w…
$ t_start             <dbl> 1.738679e+12, 1.738679e+12, 1.738679e+12, 1.738679e+12, 1.73…
$ episodic_centrality <dbl> 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, …
$ gist_score          <dbl> 7.427182e-01, 7.425269e-01, 1.484897e-02, 1.391411e-02, 2.34…
$ gist_confidence     <dbl> 0.0001912892, 0.0001912892, 0.0001912892, 0.0001912892, 0.00…

and the following variables for rh_episodic (episodic learning updates)

Rows: 3,052
Columns: 3
$ t_start                 <dbl> 1.738678e+12, 1.738678e+12, 1.738678e+12, 1.738678e+12, …
$ source                  <chr> "RW1_short_ascii", "RW1_short_ascii", "RW1_short_ascii",…
$ total_episodic_strength <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…

Activation

The model returns two activation-related signals that serve different functional roles. At the core of the model is a latent activation value for each proposition, which reflects its continuous memory strength as it evolves under encoding, decay, interference, maintenance, and spreading activation. This latent activation is expressed in arbitrary units and is not intended to represent absolute memory strength in psychophysical terms. Instead, it provides a continuous index of how strongly a proposition is represented in memory relative to its own prior state.

Crucially, latent activation is not used directly to determine retrieval or learning. Access to memory is mediated by competitive and attentional mechanisms. For this reason, the model’s predictions focus on availability rather than on raw activation alone. Availability is defined as a derived quantity that depends on (i) competitive normalisation of activation within a source and (ii) attentional gating at the level of information sources. These mechanisms determine which propositions are functionally accessible at a given moment and therefore able to influence rereading, spreading activation, episodic learning, and writing behaviour.

Figure 3 visualises the latent activation of propositions over time. Availability is indicated separately by graphical transparency: propositions that are available under the model’s availability criterion are plotted with full opacity, whereas unavailable propositions are shown with reduced opacity. In this way, the figure distinguishes memory strength (latent activation) from functional accessibility (availability), allowing both signals to be inspected simultaneously. The figure therefore supports statements about relative prominence in memory (e.g., proposition X being more strongly activated than proposition Y) while making explicit that only a subset of active propositions is functionally available at any given moment.

Figure 3: Reading history (raw activation). Each line / colour represents an EDU. Shading indicates whether the propositional information of the EDU is available (full opacity) or not (reduced opacity).

Episodic learning

Figure 4 shows whether the episodic structure accumulates, stabilises, or decays over time.

Figure 4: Global episodic learning over time.

Figure 5 shows the episodic centrality of individual EDUs over time. This visualisation can inform which propositions become central to the episodic structure, and when.

Figure 5: Episodic centrality. Each line represents the episodic centrality of an EDU.

Gist

Reported gist measures reflect a post‑hoc evaluation of the activation landscape and do not feed back into the activation or learning dynamics.

Emergence of gist candidates over time

Figure 6 shows how candidate gist propositions emerge, compete, and stabilise over time within each source. Each line represents the gist score of a proposition, reflecting its combined activation prominence and episodic integration. Early in reading, multiple propositions compete for gist status; as reading progresses, a smaller set of propositions comes to dominate, indicating consolidation of meaning.

Figure 6: Gist candidates over time. Lines represent EDUs.

Stability of gist interpretation over time

Figure 7 illustrates the confidence of the model’s gist representation over time, defined as the difference between the strongest and second‑strongest gist candidates. Low confidence reflects ambiguity and competition among interpretations, whereas increasing confidence indicates consolidation around a dominant meaning representation. Drops in confidence correspond to moments of reinterpretation or interference.

Figure 7: Gist confidence (top–2 gap). Lines represent source texts.
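
The top-2 gap described above can be written as a short function. This is a sketch of the definition only; `gist_confidence()` is a hypothetical helper, and the example assumes that the first four scores in the rh_gist preview belong to the same source and time stamp.

```r
# Top-2 gap: difference between the strongest and second-strongest gist score.
# ASSUMPTION: illustrative helper; the model's own implementation is not shown here.
gist_confidence <- function(gist_score) {
  s <- sort(gist_score, decreasing = TRUE)  # sort() drops NA values by default
  if (length(s) < 2) return(NA_real_)       # undefined with fewer than two candidates
  unname(s[1] - s[2])
}

# First four gist scores from the rh_gist preview
scores <- c("004eahf" = 0.7427182, "003twnf" = 0.7425269,
            "002smtf" = 0.01484897, "006bssf" = 0.01391411)
conf_toy <- gist_confidence(scores)
conf_toy
```

For these values the gap is about 0.00019, in line with the gist_confidence column of the preview: two propositions are almost tied, so confidence is low.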

Persistence of dominant gist propositions

Figure 8 visualises which proposition is the dominant gist at each moment in time. Each horizontal band represents a source, and colour indicates the currently dominant gist proposition. Long uninterrupted bands reflect stable gist persistence, whereas frequent colour changes indicate ongoing reinterpretation.

Figure 8: Gist proposition dominance. EDUs are represented as colours.
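
The dominant gist proposition per source and time point can be extracted from a table with the columns of rh_gist. The sketch below uses base R only; `dominant_gist()` is a hypothetical helper and the toy data are invented for illustration.

```r
# For each source and time stamp, keep the EDU with the highest gist score.
# ASSUMPTION: illustrative helper operating on columns source, t_start, edu, gist_score.
dominant_gist <- function(rh_gist) {
  groups <- split(rh_gist, list(rh_gist$source, rh_gist$t_start), drop = TRUE)
  rows <- lapply(groups, function(g) {
    g[which.max(g$gist_score), c("source", "t_start", "edu")]
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$source, out$t_start), ]
  rownames(out) <- NULL
  out
}

toy_gist <- data.frame(
  source     = c("A", "A", "A", "A"),
  t_start    = c(1, 1, 2, 2),
  edu        = c("e1", "e2", "e1", "e2"),
  gist_score = c(0.2, 0.5, 0.7, 0.1)
)
dom <- dominant_gist(toy_gist)
dom
```

In the toy data the dominant proposition switches from `e2` to `e1` between the two time points, which would appear as a colour change within one band.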

Relate gist to what is written next (or semantic overlap)

Later.

Parameter overview

An overview of the model parameters is given in Table 1. Unless otherwise stated, all reported results use a single default parametrisation. Qualitative model behaviour was robust across reasonable parameter ranges; parameters primarily modulate the rate and selectivity of activation, rather than the structure of the process itself. Default parameter values are chosen to yield qualitatively plausible dynamics for skilled adult readers, including sparse availability, strong attentional modulation, and gradual emergence of gist, rather than to provide quantitative fits to individual readers.

Table 1: Model parameters, units, and qualitative interpretation
| Parameter | Unit | Interpretation | Effect of larger values |
|---|---|---|---|
| lambda | per second | Passive decay rate of latent activation over time | Faster loss of latent activation between events |
| kappa | per second | Rate of attentional disengagement from an information source | More rapid loss of attentional focus on sources |
| source_recovery | dimensionless | Minimum attentional weight when a source is revisited | Faster recovery of availability when revisiting a source |
| alpha | dimensionless | Strength of encoding per reading event | Faster accumulation of latent activation during reading |
| beta_within | dimensionless | Strength of interference among propositions within the same source | Stronger suppression of competing propositions within a text |
| beta_cross | dimensionless | Strength of interference among propositions across different sources | Stronger cross-source competition during reading |
| refix_window | seconds | Temporal window within which refixations count as maintenance | More fixations treated as rehearsal rather than new encoding |
| k_maint | dimensionless | Multiplicative maintenance gain applied during refixation | Stronger stabilisation of recently fixated propositions |
| k_rehearse | dimensionless | Maintenance gain during non-reading rehearsal | Greater persistence of salient ideas during writing or planning |
| K_rehearse | count | Maximum number of propositions rehearsed during non-reading events | Broader set of propositions maintained during non-reading events |
| theta | dimensionless | Threshold for attention-gated relative activation to count as available | More selective availability; fewer propositions considered retrievable |
| eta | dimensionless | Learning rate for episodic association updates | Faster growth of episodic associations |
| rho | per update | Decay rate of episodic associations over time | Faster weakening of episodic associations without reinforcement |
| semantic_spread_rate | dimensionless | Weight of semantic similarity in activation spreading | Stronger influence of background semantic knowledge |
| episodic_spread_rate | dimensionless | Weight of episodic associations in activation spreading | Stronger influence of learned episodic structure |
| spread_gain | dimensionless | Overall gain applied during spreading activation | More rapid redistribution of activation across related propositions |
| spread_threshold | dimensionless | Minimum effective activation required to participate in spreading | Stricter gating of spreading to highly available propositions |

Conceptual overview

The model treats reading and writing as a continuous stream of cognitively demanding events that update the activation state of propositional memory representations in real time. Sentence‑level EDUs serve as perceptual and attentional units during reading (i.e., areas of interest), while the underlying memory representations are propositions associated with those sentences. Each proposition is represented as a graded activation trace whose value evolves continuously over time. This activation reflects latent memory strength rather than permanent storage in long‑term memory. Momentary accessibility is not equated with raw activation but is derived from it through competitive normalisation and attentional gating.

Latent activation increases incrementally when a proposition is processed during reading and decreases as a function of time‑based decay, interference from newly processed information, and diversion of attention during non‑reading activities such as writing or planning. Interference is implemented as a direct reduction in latent activation, reflecting competition for limited processing resources during comprehension and production. Interference operates both within and across sources, with stronger suppression among propositions belonging to the currently attended source and weaker suppression across sources.
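
The interplay of decay, interference, and encoding for a single reading event can be sketched as follows. The functional forms are assumptions chosen for illustration (exponential decay, multiplicative interference, and encoding that saturates at 1); the text above does not fix them, and the sketch ignores any dependence of encoding on fixation duration. `update_latent()` is a hypothetical name.

```r
# One reading event on proposition `read_edu` after `dt` seconds.
# ASSUMPTIONS: exponential decay, multiplicative interference, saturating encoding.
update_latent <- function(A, edu_source, read_edu, dt,
                          lambda = 0.3, alpha = 0.3,
                          beta_within = 0.2, beta_cross = 0.1) {
  stopifnot(read_edu %in% names(A), all(names(A) %in% names(edu_source)), dt >= 0)
  # 1. Passive time-based decay of all propositions
  A <- A * exp(-lambda * dt)
  # 2. Interference from the newly read proposition, stronger within its own source
  is_target   <- names(A) == read_edu
  same_source <- unname(edu_source[names(A)] == edu_source[[read_edu]])
  A[!is_target & same_source]  <- A[!is_target & same_source]  * (1 - beta_within)
  A[!is_target & !same_source] <- A[!is_target & !same_source] * (1 - beta_cross)
  # 3. Incremental encoding of the read proposition
  A[is_target] <- A[is_target] + alpha * (1 - A[is_target])
  A
}

A0  <- c(e1 = 0.5, e2 = 0.5, e3 = 0.5)
src <- c(e1 = "text_A", e2 = "text_A", e3 = "text_B")
A1  <- update_latent(A0, src, read_edu = "e1", dt = 0)
A1
```

With `dt = 0` the example isolates encoding and interference: the read proposition gains activation, its same-source competitor loses the most, and the proposition from the other source loses less.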

Competition among propositions is implemented at the level of accessibility rather than as a direct suppression of raw activation. At each time point, latent activation values are normalised within each information source using a divisive (mass‑based) normalisation scheme. This source‑local normalisation yields a bounded relative activation signal that reflects competitive accessibility among propositions belonging to the same text and captures resource sharing within a source. Relative activation is therefore invariant to uniform scaling of activation and reflects competition rather than memory strength per se.

In addition, relative activation is modulated by an explicit attentional control mechanism operating at the level of information sources. While propositions from multiple sources may remain active in memory, only propositions belonging to currently attended sources are considered functionally accessible. Attentional focus is implemented as a graded, source‑level weight that decays over time and partially recovers when a source is revisited. This attentional weight gates accessibility after source‑local normalisation, yielding an effective activation signal that determines which propositions are available for retrieval, spreading activation, and learning at a given moment. Importantly, attentional gating modulates access without directly altering latent activation, allowing memory representations to persist outside the current focus of attention while remaining temporarily unavailable.
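
A source-level attentional weight with decay and partial recovery could look like the sketch below. It assumes exponential decay at rate kappa and treats source_recovery as a floor applied when a source is revisited, following the parameter description in Table 1. `update_attention()` is a hypothetical helper, and the exact update rule of the model may differ.

```r
# Update graded attentional weights for all sources after `dt` seconds.
# ASSUMPTIONS: exponential decay; revisited source is raised to at least source_recovery.
update_attention <- function(w, dt, attended = NULL,
                             kappa = 0.3, source_recovery = 0.6) {
  stopifnot(dt >= 0)
  # Attentional disengagement from all sources over time
  w <- w * exp(-kappa * dt)
  # Partial recovery for the source that is currently (re)visited
  if (!is.null(attended)) {
    stopifnot(attended %in% names(w))
    w[attended] <- pmax(w[attended], source_recovery)
  }
  w
}

w0 <- c(RW1_short_ascii = 1, RW2_short_ascii = 1)
w1 <- update_attention(w0, dt = 2, attended = "RW2_short_ascii")
w1
```

Latent activation is not touched by this function, which mirrors the point that gating changes access rather than memory strength.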

Propositions are interdependent rather than independent units. Each source is associated with a semantic similarity network derived from distributional representations of sentence meaning, supporting limited spreading activation among related propositions. Spreading activation is availability‑dependent: only propositions that are sufficiently active and accessible participate in spreading, preventing indiscriminate propagation and runaway excitation.
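
Availability-dependent spreading can be sketched as a thresholded, weighted matrix product. The combination rule below (a weighted sum of semantic similarity `S` and episodic associations `W`, scaled by spread_gain) is an assumption made for illustration, as are the function name `spread_activation()` and the argument `A_eff` for attention-gated relative activation.

```r
# One spreading step within a source.
# ASSUMPTIONS: linear spreading; only propositions with A_eff >= spread_threshold send.
spread_activation <- function(A, A_eff, S, W,
                              semantic_spread_rate = 0.02,
                              episodic_spread_rate = 0.01,
                              spread_gain = 0.12,
                              spread_threshold = 0.096) {
  stopifnot(identical(names(A), rownames(S)), identical(dim(S), dim(W)),
            length(A) == length(A_eff))
  # Only sufficiently accessible propositions act as senders
  senders <- A_eff >= spread_threshold
  # Combined link strength; no self-excitation
  links <- semantic_spread_rate * S + episodic_spread_rate * W
  diag(links) <- 0
  # Activation received from senders (non-senders contribute zero)
  incoming <- as.vector(links %*% (A * senders))
  A + spread_gain * incoming
}

ids   <- c("e1", "e2")
S_toy <- matrix(c(1, 0.5, 0.5, 1), 2, 2, dimnames = list(ids, ids))
W_toy <- matrix(0, 2, 2, dimnames = list(ids, ids))
A     <- c(e1 = 0.5, e2 = 0.2)     # latent activation
A_eff <- c(e1 = 0.2, e2 = 0.05)    # gated relative activation; only e1 exceeds the threshold
A_new <- spread_activation(A, A_eff, S_toy, W_toy)
A_new
```

Because `e2` is below the threshold it receives a small boost from `e1` but sends nothing back, which is what prevents runaway excitation in this sketch.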

In addition to fixed semantic relations, the model implements episodic learning during reading. When multiple propositions are simultaneously active and accessible within a source, their mutual associations may strengthen through gated co‑activation learning. Episodic learning is conditioned on availability but scales with latent activation, ensuring that learning strength reflects graded differences in processing depth while remaining selective. These episodic associations reflect reader‑ and episode‑specific integration of the text and depend on reading order, attentional patterns, and task demands. They are not interpreted as new propositional representations but as changes in the connectivity among existing propositions.
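
Gated co-activation learning can be sketched as a Hebbian-style update that is switched on by availability and scaled by latent activation. The specific rule below (product of latent activations, proportional decay by rho) is an assumption for illustration; `update_episodic()` is a hypothetical helper.

```r
# One episodic learning step within a source.
# ASSUMPTIONS: Hebbian product rule gated by availability; proportional decay of W.
update_episodic <- function(W, A, available, eta = 0.001, rho = 0.0002) {
  stopifnot(nrow(W) == length(A), ncol(W) == length(A),
            length(A) == length(available))
  # Slow decay of existing associations
  W <- W * (1 - rho)
  # Learning is gated: both propositions must be available, and no self-links
  gate <- outer(available, available, "&")
  diag(gate) <- FALSE
  # Learning magnitude scales with latent activation of both propositions
  W + eta * outer(A, A) * gate
}

ids   <- c("e1", "e2", "e3")
W0    <- matrix(0, 3, 3, dimnames = list(ids, ids))
A     <- c(e1 = 0.5, e2 = 0.4, e3 = 0.9)
avail <- c(e1 = TRUE, e2 = TRUE, e3 = FALSE)
W1    <- update_episodic(W0, A, avail)
W1
```

Although `e3` has the highest latent activation, it forms no associations in this step because it is not available.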

Within this framework, higher‑level understanding such as gist is not represented as a separate memory trace. Instead, gist emerges as a stable pattern of relative activation and learned episodic connectivity among propositions within a source. Propositions that are both competitively central and strongly integrated within the episodic network contribute most strongly to gist‑like representations, whereas peripheral propositions form weaker or more transient connections and are more susceptible to decay and interference. Operationally, gist is computed source‑locally from the interaction of relative activation and episodic centrality rather than from global activation across sources.

The model operates in real time and produces a memory snapshot after each event, yielding continuous estimates of latent activation, source‑local relative activation, attention‑gated availability, and a bounded decay‑preserving activation signal for all propositions encountered so far. The decay‑preserving signal is used exclusively for descriptive and visual analyses of memory persistence over time and does not influence competition, learning, or retrieval. This architecture allows the model to track how the set of accessible ideas changes dynamically during writing‑from‑source tasks as a function of reading, attention, interference, and learning.

Activation and accessibility

The model distinguishes explicitly between latent activation, relative activation, and availability, which serve different functional roles. At any time \(t\), each proposition is associated with a latent activation value that reflects its current memory strength. This activation evolves continuously as a function of encoding, decay, interference, maintenance, and spreading activation. Latent activation is expressed in arbitrary units and is not interpreted as direct retrieval readiness.

Accessibility is derived from latent activation through a two-stage process. First, activation values are normalised source‑locally using a divisive (mass‑based) normalisation scheme, yielding a bounded relative activation signal that reflects competitive accessibility among propositions belonging to the same information source. This relative activation captures within‑source competition and resource sharing. In the implementation the denominator includes an additive constant, so relative activation is approximately, rather than exactly, invariant to uniform scaling of activation, with the approximation improving as the summed activation within a source grows.

Second, relative activation is modulated by attentional focus via a source‑level gating mechanism. Each information source is associated with a graded attentional weight that decays over time and partially recovers when the source is revisited. Availability is defined as the attention‑gated relative activation and determines whether a proposition is functional for retrieval, spreading activation, episodic learning, and production at a given moment.

As a consequence, propositions may retain substantial latent activation while remaining temporarily unavailable for retrieval, allowing the model to dissociate memory strength from moment‑to‑moment accessibility under attentional and contextual constraints.
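The dissociation between memory strength and accessibility can be illustrated with a minimal R sketch. It assumes an additive constant of 1 in the normalisation denominator, as in the implementation listing at the end of this document; the activation values, source labels, and attentional weights are invented for illustration.

```r
# Hypothetical latent activations for propositions from two sources (arbitrary units)
A <- c(a1 = 2.0, a2 = 1.0, b1 = 3.0)
edu_source <- c(a1 = "A", a2 = "A", b1 = "B")
source_weight <- c(A = 1.0, B = 0.2)   # source B has not been attended recently

# Stage 1: source-local divisive normalisation (constant 1 in the denominator)
mass  <- ave(A, edu_source[names(A)], FUN = sum)   # summed activation per source
A_rel <- A / (mass + 1)

# Stage 2: attentional gating by the source-level weight
A_eff <- A_rel * unname(source_weight[edu_source[names(A)]])

A_rel   # a1 = 0.50, a2 = 0.25, b1 = 0.75
A_eff   # a1 = 0.50, a2 = 0.25, b1 = 0.15
```

In this example, b1 has the highest latent activation but the lowest availability, because its source currently carries little attentional weight.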

State variables

At any time \(t\), the model maintains the following state variables:

  • \(A_e(t)\): the latent activation of proposition \(e\), reflecting its current memory strength.
  • \(t_{\text{last}}\): the time (in milliseconds) at which the previous event was processed.
  • \(t^{\text{fix}}_e\): the most recent fixation time (in milliseconds, on the same clock as the event time stamps) for EDU \(e\), used to detect refixation‑based maintenance during reading.
  • \(w^{(s)}(t)\): a source‑level attentional weight for each source \(s\), modulating the availability of propositions associated with that source.
  • \(S^{(s)}\): a semantic similarity matrix for each source \(s\), encoding knowledge‑based relations among propositions derived from sentence‑level semantics.
  • \(E^{(s)}(t)\): an episodic association matrix for each source \(s\), encoding learned associations between propositions as a function of gated co‑activation during the current reading episode.

The complete model state at time \(t\) can thus be characterised as:

\[ \text{state}(t) = \bigl\{ A(t),\; t_{\text{last}},\; t^{\text{fix}},\; w^{(s)}(t),\; S^{(s)},\; E^{(s)}(t) \bigr\} \]
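As a rough sketch, the state can be held in a named list in R. The field names below follow the implementation listing at the end of this document, but the constructor `init_state_sketch` and its initial values (for example, attentional weights starting at 1) are assumptions, since the original initialisation function is not shown.

```r
# Hypothetical constructor; initial values are assumptions made for illustration
init_state_sketch <- function(S_by_source, edu_source) {
  sources <- names(S_by_source)
  list(
    A_raw         = numeric(0),   # latent activation A_e(t); grows as EDUs are read
    last_t        = NA_real_,     # t_last: time stamp of the previous event
    last_fix_time = setNames(rep(NA_real_, length(edu_source)), names(edu_source)),
    source_weight = setNames(rep(1, length(sources)), sources),    # w^(s)(t)
    S_by_source   = S_by_source,                                    # fixed S^(s)
    E_by_source   = lapply(S_by_source, function(S) S * 0),         # E^(s)(t), all zero
    edu_source    = edu_source                                      # EDU id -> source id
  )
}

# Toy example with one source containing two EDUs
S_A <- matrix(c(1, 0.4, 0.4, 1), 2, 2,
              dimnames = list(c("a1", "a2"), c("a1", "a2")))
state <- init_state_sketch(list(A = S_A), c(a1 = "A", a2 = "A"))
str(state, max.level = 1)
```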

In addition to these persistent state variables, the model computes several derived signals that are not stored as part of the state. Source‑local relative activation \(\tilde{A}(t)\) is obtained by divisive normalisation of latent activation values within each source and reflects competitive accessibility among propositions in the same text. Availability is then computed by applying attentional gating to relative activation, yielding an effective activation signal that determines which propositions are functionally accessible for retrieval, spreading, learning, and production at a given moment. The model also derives a bounded, decay‑preserving activation signal for descriptive analyses of memory persistence over time.

In summary, propositional information is strengthened and weakened over time at the level of latent activation, while retrieval, learning, and availability‑dependent behaviour are governed by source‑local competitive normalisation, attentional control at the level of information sources, and structured relations among propositions. Latent activation supports persistence of meaning in memory, whereas relative activation and attentional gating jointly determine which propositions are functionally accessible at any given moment.

Event representation

Each input event corresponds to a single attentional or task‑related episode and is processed sequentially in temporal order. Events are derived from eye‑movement data but are treated abstractly as discrete updates to the memory state, rather than as low‑level perceptual or oculomotor units.

Each event is characterised by the following attributes:

  • Time stamp \(t\) (milliseconds), indicating when the event begins
  • Duration \(d\) (milliseconds), indicating the temporal extent of the event
  • Event type \(\in \{\texttt{read}, \texttt{other}\}\)
  • For \(\texttt{read}\) events only:
    • Target EDU identifier \(e\)
    • EDU length \(w_e\) (number of words)

The time stamp determines the amount of passive decay and attentional disengagement since the previous event, while event duration is used to scale encoding strength and task‑related interference during processing. The model makes no assumptions about fixation clustering, dwell structure, or saccade patterns. Any temporally ordered event stream satisfying these conditions can serve as input, allowing the model to be applied to eye‑movement data as well as to other event‑segmented reading or writing paradigms.
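For illustration, an event stream of this kind could be represented as a data frame in R. The column names follow those used in the implementation listing (`t_start`, `duration`, `event_type`, `edu`, `source`, `words_in_edu`); the values and the validity checks are invented for this sketch.

```r
# Hypothetical event stream with three reading events and one non-reading event
events <- data.frame(
  t_start      = c(0, 450, 1200, 5200),           # ms
  duration     = c(400, 700, 3900, 350),          # ms
  event_type   = c("read", "read", "other", "read"),
  edu          = c("a1", "a2", NA, "a1"),         # NA for non-reading events
  source       = c("A", "A", NA, "A"),
  words_in_edu = c(12, 8, NA, 12),
  stringsAsFactors = FALSE
)

# Basic checks an input stream should satisfy before being passed to the model
stopifnot(
  !is.unsorted(events$t_start),                          # temporally ordered
  all(events$event_type %in% c("read", "other")),
  all(!is.na(events$edu[events$event_type == "read"]))   # read events name a target EDU
)
```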

Algorithmic summary

For each incoming event, processed sequentially in time order, the model performs the following operations:

  1. Compute elapsed time

    • Compute the elapsed time \(\Delta t\) since the previous event.
    • Cap the elapsed time to prevent unbounded decay during long pauses.
  2. Apply passive time‑based decay

    • The latent activation of all propositions decays exponentially as a function of \(\Delta t\).
    • This decay operates independently of event type, availability, or attentional state.
  3. Update source‑level attention

    • Attentional weights for all sources decay as a function of elapsed time.
    • If the current event is associated with a source, its attentional weight partially recovers.
    • Attentional weights modulate later availability but do not alter latent activation.
  4. Handle event‑type–specific processing

    (a) Non‑reading events (event_type == "other"):

    • No new content is encoded.
    • Selective rehearsal may occur among a small number of currently salient propositions.
    • Latent activation continues to decay, reflecting sustained diversion of attention from the source text.

    (b) Reading events (event_type == "read"):

    • Compute encoding strength from fixation duration, normalised by EDU length.
    • Apply interference to all previously activated propositions as a function of encoding strength:
      • stronger interference for propositions from the same source,
      • weaker interference for propositions from other sources.
    • If the currently fixated EDU was fixated recently, apply refixation‑based maintenance (conceptual rehearsal).
    • Add encoding‑related activation to propositions associated with the currently fixated EDU.
    • Update the most recent fixation time for that EDU.

    Interference, maintenance, and encoding operate directly on latent activation and precede all availability‑based computations.

  5. Compute competitive accessibility

    • Compute source‑local relative activation by divisive (mass‑based) normalisation of latent activation values within each source.
    • Apply attentional gating by modulating relative activation with the source‑level attentional weight, yielding an effective activation signal.
  6. Apply activation spreading

    • For the current source, apply spreading activation to propositions whose effective activation exceeds a minimum threshold.
    • Spreading integrates:
      • fixed semantic similarity relations, and
      • episodic associations learned through prior co‑activation.
    • Spreading is gain‑controlled and subject to a soft upper bound to prevent runaway growth.
    • Spreading operates on effective (attention‑gated) activation and influences subsequent availability indirectly.
  7. Determine availability for retrieval

    • Propositions whose effective activation exceeds an availability threshold are considered retrieval‑ready.
    • Availability determines which propositions can participate in learning and later influence behaviour.
  8. Update episodic associations (reading events only)

    • Select a small set of the most strongly activated available propositions within the current source.
    • Strengthen episodic associations among these propositions using a normalised co‑activation rule.
    • Apply slow decay to all episodic associations, allowing weakly supported links to fade.
    • Episodic learning is strictly gated by availability and capacity limits, but learning magnitude scales with latent activation.

After each event, the model produces a memory snapshot containing:

  • latent activation of all propositions encountered so far,
  • source‑local relative activation and attention‑gated availability,
  • retrieval availability status,
  • a bounded decay‑preserving activation signal for descriptive analyses,
  • updated episodic association strengths, and
  • source‑local gist estimates and confidence values computed from relative activation and episodic centrality.

This per‑event update cycle allows the model to track how propositional activation, accessibility, integration, and gist‑level understanding evolve dynamically over time as a function of reading, attention, interference, and learning.

Mathematical model

Passive time‑based decay

Between any two events, the latent activation of all propositions decays exponentially as a function of elapsed time:

\[ \Delta t = \frac{t - t_{\text{last}}}{1000} \]

\[ A_e(t) = A_e(t_{\text{last}}) \cdot e^{-\lambda \Delta t} \]

where:

  • \(A_e(t)\) is the latent activation of proposition \(e\), reflecting its current memory strength rather than its momentary accessibility,
  • \(\lambda\) is the passive decay rate (per second).

Elapsed time \(\Delta t\) is computed from event time stamps and may be capped to prevent excessive decay during long pauses. Passive decay operates directly on latent activation and is applied prior to all event‑type–specific updates. It is independent of competitive normalisation, attentional gating, or task context and provides a continuous temporal baseline against which the effects of interference, maintenance, and spreading activation are evaluated.

This operation implements a smooth loss of activation over time, consistent with activation‑based memory models in which representations gradually fade unless reactivated or maintained. Importantly, decay affects memory strength rather than retrieval or learning directly; its influence on accessibility is indirect and mediated through subsequent competitive normalisation and attentional gating.
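A small numerical sketch of capped decay in R follows. The helper `compute_dt_sketch`, the 30‑second cap, and the decay rate are assumptions chosen for illustration; the implementation listing calls a `compute_dt` function whose definition is not shown.

```r
# Hypothetical helper: elapsed time in seconds, capped at dt_max (an assumed value)
compute_dt_sketch <- function(now_ms, prev_ms, dt_max = 30) {
  if (is.na(prev_ms)) return(0)            # first event: nothing to decay yet
  min(max((now_ms - prev_ms) / 1000, 0), dt_max)
}

lambda <- 0.05                              # assumed passive decay rate (per second)
A <- c(a1 = 1.0, a2 = 0.5)

dt_short <- compute_dt_sketch(3000, 1000)     # 2 s between events
dt_long  <- compute_dt_sketch(600000, 1000)   # pause of about 10 min, capped at 30 s

A * exp(-lambda * dt_short)   # mild decay
A * exp(-lambda * dt_long)    # decay limited by the cap
```

Because decay is multiplicative, the ratio between a1 and a2 is unchanged in both cases; only their absolute strength shrinks.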

Non‑reading events

For non‑reading events (event_type == "other"; e.g. writing, planning, or thinking), no new propositional content is encoded and no reading‑related interference is applied. Changes in latent activation arise from passive time‑based decay and, optionally, from selective rehearsal of a small number of salient propositions.

Non‑reading events may trigger selective rehearsal. Relative activation \(\tilde{A}_e\) is first computed by source‑local divisive normalisation. When episodic associations are available, relative activation may be combined with episodic centrality \(C_e\) to yield a rehearsal priority score:

\[ R_e = \begin{cases} \tilde{A}_e, & \text{if no episodic associations are present} \\ \alpha_r \tilde{A}_e + (1 - \alpha_r) C_e, & \text{otherwise} \end{cases} \]

where \(\alpha_r \in [0,1]\) controls the influence of activation‑based versus episodic information.

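The top \(K_{\text{rehearse}}\) propositions according to \(R_e\) are selected for rehearsal, where \(K_{\text{rehearse}}\) is a small capacity limit distinct from the learning capacity used for episodic associations. For each rehearsed proposition \(e\), latent activation is multiplicatively increased: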

\[ A_e \leftarrow A_e \cdot (1 + k_{\text{rehearse}}) \]

where \(k_{\text{rehearse}}\) is a small maintenance gain.

All other propositions are unaffected by rehearsal during non‑reading events. No episodic learning or spreading activation occurs during these periods. This implementation captures the assumption that non‑reading activities divert attention away from source material while still allowing a limited number of salient ideas to remain active through conceptual rehearsal, without introducing new information or associations.
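The rehearsal rule can be traced with a short R example for a single source. The weighting of 0.7 matches the value hard‑coded in the implementation listing; the activation and centrality values are invented, and the constant 1 in the normalisation denominator is taken from the listing.

```r
alpha_r    <- 0.7     # weight on relative activation (value used in the listing)
k_rehearse <- 0.05    # maintenance gain
K_rehearse <- 2       # number of propositions rehearsed per non-reading event

A     <- c(a1 = 2.0, a2 = 1.0, a3 = 1.5)   # latent activation (made up)
A_rel <- A / (sum(A) + 1)                   # single source, constant 1 in denominator
C     <- c(a1 = 0.2, a2 = 1.0, a3 = 0.1)    # normalised episodic centrality (made up)

# Rehearsal priority and top-K selection
R <- alpha_r * A_rel + (1 - alpha_r) * C
rehearsed <- names(sort(R, decreasing = TRUE))[seq_len(min(K_rehearse, length(R)))]

# Multiplicative maintenance for the selected propositions only
A[rehearsed] <- A[rehearsed] * (1 + k_rehearse)
rehearsed   # "a2" "a1"
A           # a1 = 2.10, a2 = 1.05, a3 = 1.50
```

Here a2 is rehearsed despite having the lowest latent activation, because its episodic centrality raises its priority above that of a3.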

Encoding, interference, and maintenance during reading

During reading events (event_type == "read"), activation dynamics reflect the combined effects of encoding new information, interference from concurrent processing, and short‑term maintenance through rereading. All operations in this section act directly on latent activation and precede competitive normalisation and attentional gating.

Encoding

Encoding input is computed as fixation duration on a logarithmic scale, normalised by EDU length:

\[ I = \frac{\log(d)}{w_e} \]

where EDU \(e\) is fixated for duration \(d\) (in milliseconds) and contains \(w_e\) words.

This transformation serves three purposes. First, logarithmic scaling reflects diminishing returns of fixation duration: very long fixations increase encoding strength, but not linearly. Second, normalisation by EDU length prevents longer sentences from receiving disproportionately large activation simply because they contain more words. Third, continuous scaling allows encoding strength to vary smoothly rather than categorically.

As a result, \(I\) can be interpreted as an approximation of processing depth at the propositional level rather than raw visual exposure. Encoding increases the latent activation of propositions associated with the currently fixated EDU in proportion to \(I\):

\[ A_e \leftarrow A_e + \alpha I \]

where \(\alpha\) is an encoding strength parameter.
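A brief R sketch shows how the encoding input behaves. The function name `encoding_input` is introduced here for illustration only; the natural logarithm and millisecond durations follow the implementation listing.

```r
# Encoding input: log fixation duration (ms) divided by EDU length in words
encoding_input <- function(d_ms, n_words) log(d_ms) / n_words

encoding_input(200, 10)    # short fixation on a 10-word EDU
encoding_input(2000, 10)   # tenfold longer fixation: input grows by far less than tenfold
encoding_input(2000, 20)   # same duration on an EDU twice as long: input is halved
```

Note that \(I\) is positive only for \(d > 1\) ms, which holds for any realistic fixation duration.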

Interference

Interference during reading is implemented as a graded reduction in latent activation of other propositions, reflecting competition for limited processing resources. When a proposition associated with source \(s\) is processed, activation of other propositions is attenuated as a function of encoding strength \(I\), with stronger interference applied to propositions from the same source than to propositions from other sources:

\[ A_e \leftarrow \begin{cases} A_e \cdot \bigl(1 - \beta_{\text{within}} I\bigr), & \text{if } e \in s \\ A_e \cdot \bigl(1 - \beta_{\text{cross}} I\bigr), & \text{if } e \notin s \end{cases} \]

with \(\beta_{\text{within}} > \beta_{\text{cross}}\).

This formulation captures similarity‑independent interference, a core principle of cue‑based and activation‑based memory models: encoding new information reduces the strength of competing representations because all propositions draw on shared processing resources. Interference is scaled by encoding strength \(I\), such that deeper engagement with new material produces stronger competition than superficial processing. The distinction between within‑source and cross‑source interference reflects the assumption that propositions embedded in the same discourse context compete more strongly than propositions from unrelated sources.

Importantly, interference operates directly on latent activation and precedes competitive normalisation and attentional gating. Its consequences for accessibility and retrieval are therefore indirect, mediated through subsequent computations of relative activation and availability.
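The graded attenuation can be sketched in R as follows. The two interference rates are assumed values satisfying \(\beta_{\text{within}} > \beta_{\text{cross}}\). As in the implementation listing, the attenuation is applied to every proposition currently in memory before encoding is added, and negative values are clamped at zero.

```r
beta_within <- 0.10    # assumed within-source interference rate
beta_cross  <- 0.02    # assumed cross-source interference rate
I <- log(400) / 10     # encoding input of the current reading event

A <- c(a1 = 1.0, a2 = 0.8, b1 = 1.0)
edu_source <- c(a1 = "A", a2 = "A", b1 = "B")
src <- "A"             # source currently being read

same <- edu_source[names(A)] == src
A[same]  <- A[same]  * (1 - beta_within * I)   # stronger attenuation within the source
A[!same] <- A[!same] * (1 - beta_cross  * I)   # weaker attenuation across sources
A <- pmax(A, 0)        # guard: the factor can become negative when beta * I exceeds 1
A
```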

Refixation‑based maintenance

When the same EDU is refixated within a short temporal window, this is treated not as a new encoding event but as short‑term conceptual rehearsal that stabilises an already active representation. Formally, if the time elapsed since the previous fixation of EDU \(e\) is less than a refixation window \(\tau\):

\[ \text{if } (t - t^{\text{fix}}_e) < \tau: \quad A_e \leftarrow A_e \cdot (1 + k) \]

where \(k\) is a maintenance gain.

Multiplicative scaling increases latent activation of propositions associated with the refixated EDU while preserving their proportional strength relative to one another. This mechanism captures the intuition that brief rereading supports short‑term maintenance of an idea in working accessibility rather than creating a qualitatively new memory trace.
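A minimal R sketch of the refixation rule is given below. The window of 2000 ms and the gain of 0.1 are assumed values, the helper `refixate` is introduced only for this example, and time stamps are taken to be in milliseconds as in the event stream.

```r
refix_window <- 2000   # ms; assumed refixation window (tau)
k_maint      <- 0.1    # assumed maintenance gain (k)

A <- c(a1 = 1.0, a2 = 0.6)
last_fix_time <- c(a1 = 1000, a2 = NA)   # ms; a2 has not been fixated before

# Apply maintenance if the EDU was fixated within the window, then update its time stamp
refixate <- function(A, last_fix_time, edu, now) {
  recent <- !is.na(last_fix_time[edu]) && (now - last_fix_time[edu]) < refix_window
  if (recent) A[edu] <- A[edu] * (1 + k_maint)
  last_fix_time[edu] <- now
  list(A = A, last_fix_time = last_fix_time)
}

res_recent <- refixate(A, last_fix_time, "a1", now = 2500)   # 1.5 s later: gain applies
res_late   <- refixate(A, last_fix_time, "a1", now = 9000)   # 8 s later: no gain
res_recent$A
res_late$A
```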

Availability‑dependent activation spreading

In addition to direct encoding and interference effects, the model implements an activation‑spreading mechanism that captures resonance among semantically and episodically related propositions. Spreading is availability‑dependent: only propositions that are sufficiently prominent within their source and fall within the current attentional focus can participate.

For each source text \(s\), a semantic similarity matrix \(S^{(s)}\) is constructed such that:

  • Rows and columns correspond to EDUs from the same source,
  • \(S^{(s)}_{ij} \in [0,1]\) reflects semantic similarity between EDU texts,
  • Similarity is derived from sentence‑level distributional representations,
  • The matrix is symmetric and non‑negative.

In addition, the model maintains an episodic association matrix \(E^{(s)}(t)\), which is learned incrementally during reading. Semantic and episodic influences are combined into an effective spreading matrix:

\[ M^{(s)}(t) = \gamma \, S^{(s)} + \epsilon \, E^{(s)}(t), \]

where \(\gamma\) controls the contribution of fixed semantic similarity and \(\epsilon\) controls the influence of learned episodic associations.

Let \(A^{\text{eff}}_e(t)\) denote the attention‑gated relative activation (availability signal) of proposition \(e\). Activation spreads only if at least two propositions within the source exceed a minimum spreading threshold. When this condition is met, effective activation is updated according to:

\[ A^{\text{eff,new}}_i(t) = A^{\text{eff}}_i(t) + g \sum_j M^{(s)}_{ij}(t) \, A^{\text{eff}}_j(t), \]

where \(g\) is a small spreading‑gain parameter.

To prevent uncontrolled growth of activation, spreading is subject to gain control and a soft upper bound. Propositions whose effective activation falls below the spreading threshold neither send nor receive spreading activation.

By operating on attention‑gated relative activation, this mechanism ensures that spreading reflects resonance among propositions that are simultaneously salient and attended, rather than indiscriminate propagation across the semantic network. Early in reading, spreading is dominated by semantic similarity; as episodic associations accumulate, spreading increasingly reflects reader‑ and episode‑specific integration of the text.
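One spreading step can be traced with a toy example in R. All parameter values and matrices are invented; following the implementation listing, the diagonal of the spreading matrix is set to zero and the upper bound is applied by clipping.

```r
gamma <- 0.5; eps <- 0.5          # assumed semantic and episodic spreading rates
g <- 0.1                          # spreading gain
threshold <- 0.2                  # minimum effective activation to participate
ceiling_val <- 10                 # upper bound on activation

edus <- c("a1", "a2", "a3")
S <- matrix(c(1.0, 0.6, 0.1,
              0.6, 1.0, 0.3,
              0.1, 0.3, 1.0), 3, 3, dimnames = list(edus, edus))
E <- matrix(0, 3, 3, dimnames = list(edus, edus))
E["a1", "a2"] <- E["a2", "a1"] <- 0.4       # one learned episodic link

A_eff <- c(a1 = 0.5, a2 = 0.3, a3 = 0.1)    # a3 is below the spreading threshold

active <- names(A_eff)[A_eff >= threshold]
if (length(active) >= 2) {
  M <- gamma * S[active, active] + eps * E[active, active]
  diag(M) <- 0                               # no self-excitation
  A_eff[active] <- pmin(A_eff[active] + g * as.numeric(M %*% A_eff[active]),
                        ceiling_val)
}
A_eff   # a1 = 0.515, a2 = 0.325, a3 = 0.100
```

The sub‑threshold proposition a3 neither sends nor receives activation, while a1 and a2 reinforce each other through their combined semantic and episodic link.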

Episodic learning via co‑activation

The model includes a mechanism for episodic learning that allows associations between propositions to change as a function of their co‑activation during reading. Episodic learning occurs after availability has been determined and after any availability‑dependent activation spreading has taken place.

For each source text \(s\), the model maintains an episodic association matrix \(E^{(s)}(t)\), initialised with zero weights. Learning is gated by availability: only propositions whose attention‑gated relative activation exceeds an availability threshold are eligible for learning. In addition, learning is capacity‑limited. At each update, only the most strongly activated available propositions (up to a fixed maximum \(K\)) are allowed to form or strengthen associations.

For two propositions \(i\) and \(j\) selected for learning at time \(t\), episodic associations are updated according to a normalised co‑activation rule:

\[ \Delta E^{(s)}_{ij}(t) = \eta \cdot \frac{A_i(t)\, A_j(t)}{\sum_{k \in \mathcal{L}} A_k^2(t) + \varepsilon}, \]

where:

  • \(\eta\) is a small learning rate,
  • \(\mathcal{L}\) is the set of propositions selected for learning,
  • \(\varepsilon\) is a small constant preventing division by zero.

Learning magnitude scales with latent activation, ensuring that deeper or more sustained processing leads to stronger integration, while availability determines whether learning occurs at all.

Episodic associations decay slowly over time according to:

\[ E^{(s)}(t+1) \leftarrow (1 - \rho)\, E^{(s)}(t), \]

allowing weakly supported associations to fade while consistently reinforced links become stable.

Episodic associations feed back into subsequent activation dynamics by contributing to availability‑dependent spreading. Over time, propositions that are repeatedly co‑activated and integrated acquire greater episodic centrality, becoming more influential contributors to gist‑level understanding.
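The gating, capacity limit, and normalised update can be sketched together in R. Parameter values and activations are invented for illustration; the structure follows the update rule above.

```r
eta <- 0.05; rho <- 0.01    # assumed learning rate and decay rate
K <- 2                      # assumed capacity limit
theta <- 0.2                # assumed availability threshold

edus <- c("a1", "a2", "a3")
E <- matrix(0, 3, 3, dimnames = list(edus, edus))
A     <- c(a1 = 2.0, a2 = 1.0, a3 = 1.5)      # latent activation
A_eff <- c(a1 = 0.40, a2 = 0.25, a3 = 0.10)   # attention-gated relative activation

eligible <- names(A)[A_eff >= theta]                        # availability gate
selected <- head(sort(A[eligible], decreasing = TRUE), K)   # capacity limit

if (length(selected) >= 2) {
  # Normalised co-activation: outer product of latent activations over their squared sum
  delta <- eta * tcrossprod(selected) / (sum(selected^2) + 1e-6)
  diag(delta) <- 0
  sel <- names(selected)
  E[sel, sel] <- E[sel, sel] + delta
}
E <- (1 - rho) * E                                          # slow decay of all links
E
```

Although a3 has higher latent activation than a2, it falls below the availability threshold and therefore forms no association in this step.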

Gist computation

In the present model, gist is not represented as a separate memory trace but is derived from the interaction of activation dynamics and episodic structure. At each time point, gist reflects which propositions are both highly accessible and centrally embedded within the learned episodic network for a given source.

Gist computation is source‑local and is applied independently for each source text. For a source \(s\), the model combines relative activation and episodic centrality to estimate the contribution of each proposition to gist‑level understanding.

Relative activation \(\tilde{A}^{(s)}_e(t)\) captures the momentary accessibility of proposition \(e\) within source \(s\), as defined by divisive competitive normalisation of latent activation (see Activation and accessibility). Episodic centrality \(C^{(s)}_e(t)\) reflects the degree to which a proposition is integrated with other propositions via episodic associations and is computed as the sum of its episodic connection strengths:

\[ C^{(s)}_e(t) = \sum_j E^{(s)}_{ej}(t) \]

To ensure comparability with relative activation, episodic centrality is normalised within each source:

\[ \tilde{C}^{(s)}_e(t) = \frac{C^{(s)}_e(t)}{\max_j C^{(s)}_j(t)} \]

where normalisation is defined only when episodic associations are non‑zero.

The gist score for proposition \(e\) at time \(t\) is then defined as a weighted combination of relative activation and episodic centrality:

\[ G^{(s)}_e(t) = \alpha_g \cdot \tilde{A}^{(s)}_e(t) + (1 - \alpha_g) \cdot \tilde{C}^{(s)}_e(t) \]

where \(\alpha_g \in [0,1]\) controls the relative contribution of momentary accessibility versus longer‑term episodic integration.

Gist scores are computed only for propositions that are represented in both the activation state and the episodic association matrix for the current source. Propositions with high gist scores are those that are both currently accessible and strongly integrated with other propositions through episodic learning. Peripheral propositions, by contrast, may be episodically connected but inaccessible, or accessible but weakly connected, and therefore contribute less to gist‑level understanding.

To quantify how well‑defined the current gist is, the model computes a confidence measure based on the dominance of the strongest gist candidate over alternatives. Gist confidence at time \(t\) is defined as the difference between the highest and second‑highest gist scores:

\[ \text{Conf}(t) = G^{(s)}_{(1)}(t) - G^{(s)}_{(2)}(t) \]

where \(G^{(s)}_{(1)}(t)\) and \(G^{(s)}_{(2)}(t)\) denote the largest and second‑largest gist scores within source \(s\) at time \(t\). Higher confidence values indicate a clear, dominant interpretation, whereas lower values reflect competition or ambiguity among candidate gist propositions.
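The gist score and its confidence can be computed for a toy source as follows. This is a sketch under assumed values: \(\alpha_g = 0.6\), the relative activations, and the episodic matrix are invented, and returning a missing value when fewer than two propositions exist is an assumption of this example rather than something stated above.

```r
alpha_g <- 0.6                                   # assumed weighting

edus <- c("a1", "a2", "a3")
A_rel <- c(a1 = 0.40, a2 = 0.30, a3 = 0.10)      # source-local relative activation
E <- matrix(c(0,   0.2, 0.1,
              0.2, 0,   0.3,
              0.1, 0.3, 0), 3, 3, dimnames = list(edus, edus))

# Episodic centrality and its max-normalisation (undefined when E is all zero)
C <- rowSums(E)
C_norm <- if (max(C) > 0) C / max(C) else C

# Gist score per proposition and confidence as the gap between the top two scores
G <- alpha_g * A_rel[edus] + (1 - alpha_g) * C_norm[edus]
G_sorted <- sort(G, decreasing = TRUE)
confidence <- if (length(G_sorted) >= 2) unname(G_sorted[1] - G_sorted[2]) else NA_real_

G            # a1 = 0.48, a2 = 0.58, a3 = 0.38
confidence   # 0.10
```

In this example a2 is the strongest gist candidate even though a1 is more accessible, because a2 is the most central proposition in the episodic network.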

Full R implementation

# Main reading history update function ------------------------------------
# Iterate over events and store relevant information
update_reading_history <- function(new_events,
                                   state,
                                   theta) {

  activation_out <- list()
  gist_out <- list()
  episodic_out <- list()

  for (i in seq_len(nrow(new_events))) {

    # get event
    row <- slice(new_events, i)

    # for testing only (never executed)
    if (FALSE) {
      state <- init_state(S_by_source, edu_source)
      for(i in 1:41){
        row <- slice(data, i)
        state <- update_one_event(state, row, source_recovery)
      }
      row <- slice(data, 42)
    }

    # 1. state update
    state <- update_one_event(state, row, source_recovery)

    # 2. activation snapshot
    activation_out[[i]] <- snapshot_activation(state, row, theta)

    # 3. episodic snapshot
    episodic_out[[i]] <- snapshot_episodic(state, row)

    # 4. gist snapshot
    gist_out[[i]] <- snapshot_gist(state, row, theta)

  }

  list(state = state,
    activation_history = bind_rows(activation_out),
    gist_history = bind_rows(gist_out),
    episodic_history = bind_rows(episodic_out))
}

# Update activation for each event
update_one_event <- function(state,
                             row,
                             source_recovery,
                             A_ceiling = 10) {
  # For testing only (never executed)
  if (FALSE) {
    state <- init_state(S_by_source, edu_source)
    row <- slice(data, 1000)
    state <- update_one_event(state, row, source_recovery)
    row <- slice(data, 1001)
    state <- update_one_event(state, row, source_recovery)
    row <- slice(data, 1010)
    state <- update_one_event(state, row, source_recovery)
    row <- slice(data, 1015)

  }

  # Last fixation time
  last_fix_time <- state$last_fix_time

  # time elapsed (sec)
  now <- row$t_start
  prev <- state$last_t
  dt_eff <- compute_dt(now, prev)

  # source-level attention decay
  state$source_weight <- apply_source_weight_decay(state, kappa, dt_eff)

  # 1. Passive decay
  A <- state$A_raw
  A <- apply_exponential_decay(A, lambda, dt_eff)

  # event types
  if (row$event_type == "other") {

    # Rehearsal for non reading events
    A <- apply_rehearsal(A,
                         state,
                         k_rehearse = k_rehearse,
                         K_rehearse = K_rehearse)

  } else if (row$event_type == "read") {

    edu <- row$edu # current EDU
    src <- row$source # current source

    # reset attention for current source
    state$source_weight[src] <- reset_source_attention(state, src, source_recovery)

    # Transform fixation duration so it's relative to number of words in edu
    input <- log(row$duration) / row$words_in_edu

    # 2. Interference (within- and cross-source)
    A <- apply_interference(A, input, src, state$edu_source, beta_within, beta_cross)

    # Refixation maintenance
    res <- apply_refixation(A, edu, now, last_fix_time, refix_window, k_maint)

    # Extract activation after re-fixation maintenance
    A <- res$A
    last_fix_time <- res$last_fix_time

    # 3. Encoding / maintenance
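    # A[edu] is NA (not NULL) for an EDU not yet in A, so `%||%` cannot supply the default
    A_prev <- unname(A[edu])
    if (is.na(A_prev)) A_prev <- 0
    A[edu] <- A_prev + alpha * input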

    # Get relative activation
    A_rel <- compute_relative_activation(A, state)

    # Attentional gating via source activation
    A_eff <- apply_attentional_gating(A_rel, state)

    # 4. Spreading (prior and episodic knowledge)
    A_eff <- apply_spreading(A_eff,
                         state,
                         src,
                         A_ceiling,
                         spread_gain,
                         spread_threshold)

    # 5. Episodic learning update
    # Update coactivation
    state$E_by_source[[src]] <- update_E_by_coactivation(
        state,
        A,
        available = A_eff >= theta,  # Determine if activation is high enough
        src,
        eta,
        rho)
  }

  # 6. Hygiene
  A <- pmax(A, 0) # clamp negative activations (including -Inf) to 0

  list(
    A_raw = A,
    last_t = row$t_start,
    last_fix_time = last_fix_time,
    S_by_source = state$S_by_source,
    E_by_source = state$E_by_source,
    edu_source = state$edu_source,
    source_weight = state$source_weight
  )
}

# Exponential decay ----------------------------------------------
apply_exponential_decay <- function(A, rate, dt) {
  if (length(A) == 0 || dt <= 0) return(A)
  A * exp(-rate * dt)
}

# Source weight decay -----------------------------------------------------
apply_source_weight_decay <- function(state, kappa, dt_eff){
  if (!is.na(state$last_t) && dt_eff > 0) return(state$source_weight * exp(-kappa * dt_eff))
  state$source_weight
}

# Source level interference -----------------------------------------------
apply_interference <- function(A,
                               input,
                               src,
                               edu_source,
                               beta_within,
                               beta_cross) {
  if (length(A) == 0) return(A)

  src_of_A <- edu_source[names(A)]

  A[src_of_A == src]  <- A[src_of_A == src] * (1 - beta_within * input)
  A[src_of_A != src]  <- A[src_of_A != src] * (1 - beta_cross * input)
  A
}

# Refixation maintenance --------------------------------------------------
apply_refixation <- function(A, edu, now, last_fix_time,
                             refix_window, k_maint) {

  if (!is.na(last_fix_time[edu]) &&
      (now - last_fix_time[edu]) < refix_window) {
    A[edu] <- (A[edu] %||% 0) * (1 + k_maint)
  }

  last_fix_time[edu] <- now
  list(A = A, last_fix_time = last_fix_time)
}


# Selective rehearsal -----------------------------------------------------
apply_rehearsal <- function(A,
                            state,
                            k_rehearse = 0.05,
                            K_rehearse = 2) {

  if (length(A) == 0) return(A)

  # relative activation
  A_rel <- compute_relative_activation(A, state)

  # bias toward strong / central items
  E <- combine_E_matrices(state)
  if (is.null(E) || nrow(E) == 0 || all(E == 0)) {
    score <- A_rel
  } else {
    C <- episodic_centrality(E)
    score <- 0.7 * A_rel + 0.3 * C
  }

  # select top-K
  rehearsed <- names(sort(score, decreasing = TRUE))[
    seq_len(min(K_rehearse, length(score)))
  ]

  # apply maintenance
  A[rehearsed] <- A[rehearsed] * (1 + k_rehearse)

  return(A)

}


# Activation spreading ----------------------------------------------------
apply_spreading <- function(A,
                            state,
                            src_id,
                            A_ceiling = 10,
                            spread_gain = 0.1,
                            spread_threshold = 0.2) {

  S_full <- state$S_by_source[[src_id]]
  E_full <- state$E_by_source[[src_id]]
  edus <- intersect(names(A), rownames(S_full))

  spread_edus <- edus[A[edus] >= spread_threshold]

  if (length(spread_edus) < 2) return(A)

  M <- combine_spreading_matrices(
    S_full[spread_edus, spread_edus, drop = FALSE],
    E_full[spread_edus, spread_edus, drop = FALSE],
    semantic_rate  = semantic_spread_rate,
    episodic_rate  = episodic_spread_rate
  )

  # Ceiling to prevent explosion: pmin() clips values above A_ceiling, so ratios among clipped values are not preserved
  A[spread_edus] <- pmin(
    A[spread_edus] + spread_gain * as.numeric(M %*% A[spread_edus]),
    A_ceiling)

  return(A)
}

# Reset source attention --------------------------------------------------
reset_source_attention <- function(state, src, source_recovery){
  return(pmax(state$source_weight[src], source_recovery))
}

# Source‑local normalisation ---------------------------------------------
compute_relative_activation <- function(A_raw, state) {
  # Copy for relative activation
  A_rel <- A_raw

  # Currently active sources
  active_sources <- unique(state$edu_source[names(A_raw)])

  # Normalise activation values within source
  for(s in active_sources){
    # get edus for each source
    edus <- names(A_raw)[state$edu_source[names(A_raw)] == s]
    A_rel[edus] <- A_raw[edus] / (sum(A_raw[edus]) + 1)
  }

  # Return relative activation
  return(A_rel)
}

# Attentional gating ------------------------------------------------------
apply_attentional_gating <- function(A_rel, state){
  # Copy for relative activation
  A_eff <- A_rel

  # Currently active sources
  active_sources <- unique(state$edu_source[names(A_rel)])

  # Gate attention values within source
  for(s in active_sources){
    # get edus for each source
    edus <- names(A_rel)[state$edu_source[names(A_rel)] == s]
    A_eff[edus] <- A_rel[edus] * state$source_weight[s]
  }

  return(A_eff)
}
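
# Worked example: gating and availability on toy values -------------------
# A minimal sketch of how gated activation relates to the availability
# decision (A_eff >= theta) used when results are saved. The helper name, the
# source weights and the threshold of 0.2 are hypothetical.
example_gating <- function(theta = 0.2) {
  A_rel  <- c(a1 = 0.5, a2 = 0.25, b1 = 0.4)
  source <- c(a1 = "A", a2 = "A", b1 = "B")
  source_weight <- c(A = 1, B = 0.4)      # source B is currently less attended

  # Scale each relative activation by the weight of its own source
  A_eff <- A_rel * unname(source_weight[source])

  list(A_eff = A_eff, available = A_eff >= theta)
}
# b1 has a higher relative activation than a2 but falls below the threshold
# once the lower weight of its source is applied.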

# Episodic learning update ------------------------------------------------
update_E_by_coactivation <- function(state, A, available, src, eta, rho, K = 4) {
  # `available`: logical vector aligned with `A` (i.e. A_eff >= theta)

  # EDU source lookup
  edu_source <- state$edu_source

  # Extract episodic history of current source
  E <- state$E_by_source[[src]]

  # Get activation values of edus in current source
  edus <- rownames(E)
  valid <- names(A)[names(A) %in% edus & available]
  A_avail <- A[valid]

  if (length(valid) >= 2) {
    # Keep only the K most active available EDUs
    a_vec <- head(sort(A_avail, decreasing = TRUE), K)
    norm <- sum(a_vec^2) + 1e-6
    delta <- eta * tcrossprod(a_vec) / norm
    diag(delta) <- 0
    valid <- names(a_vec)
    E[valid, valid] <- E[valid, valid] + delta
  }

  E <- (1 - rho) * E
  diag(E) <- 0
  return(E)
}
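
# Worked example: one co-activation update on toy values ------------------
# A minimal sketch of the normalised outer-product update followed by decay,
# as in the function above. The helper name and the values of a_vec, eta and
# rho are hypothetical.
example_episodic_update <- function(eta = 0.1, rho = 0.05) {
  a_vec <- c(e1 = 0.6, e2 = 0.3)
  E <- matrix(0, 2, 2, dimnames = list(names(a_vec), names(a_vec)))

  # Increment proportional to the product of activations, scaled by their
  # squared norm so that the total increment does not grow with activation
  delta <- eta * tcrossprod(a_vec) / (sum(a_vec^2) + 1e-6)
  diag(delta) <- 0                        # no self-associations

  # Strengthen the link first, then apply multiplicative decay
  (1 - rho) * (E + delta)
}
# The resulting matrix is symmetric with a zero diagonal; the e1-e2 link is
# roughly 0.95 * 0.1 * 0.18 / 0.45.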

episodic_centrality <- function(E) {
  if (is.null(E) || nrow(E) == 0 || all(E == 0)) return(numeric())
  # Row sums of episodic weights, scaled so the most central EDU scores 1
  centrality <- rowSums(E)
  centrality / max(centrality)
}

episodic_strength <- function(E) {
  sum(E) / 2   # divide by 2 because matrix is symmetric
}

combine_E_matrices <- function(state) {
  # Episodic history
  E_by_source <- state$E_by_source

  # Get currently known edus
  known_edus <- names(state$A_raw)

  # initialize empty matrix
  E_combined <- matrix(0,
                       nrow = length(known_edus),
                       ncol = length(known_edus),
                       dimnames = list(known_edus, known_edus))

  for (E in E_by_source) {
    if (is.null(E) || nrow(E) == 0) next

    edus <- intersect(rownames(E), known_edus)
    E_combined[edus, edus] <- E_combined[edus, edus] + E[edus, edus]
  }

  diag(E_combined) <- 0
  return(E_combined)
}

# Combine spreading matrices ----------------------------------------------
combine_spreading_matrices <- function(S, E,
                                       semantic_rate,
                                       episodic_rate) {

  M <- semantic_rate  * S + episodic_rate  * E
  diag(M) <- 0
  return(M)
}

# Save results ------------------------------------------------------------
snapshot_activation <- function(state, row, theta) {

  # extract raw activation
  A_raw <- state$A_raw

  # encountered edus
  known_edus <- names(state$A_raw)

  if (length(known_edus) == 0) return(tibble())

  # relative activation by source
  A_rel <- compute_relative_activation(A_raw, state)

  # attentional gating
  A_eff <- apply_attentional_gating(A_rel, state)

  tibble(
    t_start = row$t_start,
    edu = names(A_raw),
    activation_raw = as.numeric(A_raw),
    activation_rel = as.numeric(A_rel),
    available = A_eff >= theta) %>%
      mutate(source = state$edu_source[edu]) %>%
      relocate(t_start, source)
}

snapshot_episodic <- function(state, row) {

  src <- row$source
  if (is.na(src)) return(tibble())

  E <- state$E_by_source[[src]]
  if (is.null(E) || nrow(E) == 0) return(tibble())

  tibble(
    t_start = row$t_start,
    source  = src,
    total_episodic_strength = episodic_strength(E)
  )
}

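# alpha: weight of relative activation in the gist score (passed to
# compute_gist, same default); theta is currently unused in this function
snapshot_gist <- function(state, row, theta, alpha = 0.6) {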
  # current source
  src <- row$source

  if (is.na(src) || length(state$A_raw) == 0) return(NULL)

  gist_now <- compute_gist(state, src, alpha)

  if (is.null(gist_now) || nrow(gist_now) == 0) return(NULL)

  conf <- gist_confidence(gist_now)

  gist_now %>%
    mutate(t_start = row$t_start,
           source = src,
           gist_confidence = conf) %>%
    relocate(source, edu, t_start, episodic_centrality)
}

# Real time gist source ---------------------------------------------------
compute_gist <- function(state, src, alpha = 0.6) {

  # Episodic history of current source
  E <- state$E_by_source[[src]]

  # Compute relative activation
  A_rel <- compute_relative_activation(state$A_raw, state)

  if (length(A_rel) == 0) return(NULL)

  # Calculate episodic centrality
  C <- episodic_centrality(E)

  # Get current EDUs
  edus <- intersect(names(A_rel), names(C))

  if (length(edus) == 0) return(NULL)

  # Calculate the gist score
  gist_score <- alpha * A_rel[edus] + (1 - alpha) * C[edus]

  # Return results
  tibble::tibble(
    edu = edus,
    gist_score = gist_score,
    episodic_centrality = C[edus]
  ) %>% arrange(desc(gist_score))
}

gist_confidence <- function(gist_df) {
  if (is.null(gist_df) || nrow(gist_df) < 2) return(NA_real_)
  gist_df$gist_score[1] - gist_df$gist_score[2]
}
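
# Worked example: gist score and confidence on toy values -----------------
# A minimal sketch of the weighted combination above: the gist score mixes
# relative activation and episodic centrality, and confidence is the gap
# between the two highest scores. The helper name and all numbers are
# hypothetical.
example_gist <- function(alpha = 0.6) {
  A_rel <- c(e1 = 0.5, e2 = 0.25, e3 = 0.1)
  E <- matrix(c(0,   0.2, 0.1,
                0.2, 0,   0.4,
                0.1, 0.4, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(names(A_rel), names(A_rel)))

  # Centrality: row sums scaled so the most connected EDU scores 1
  C <- rowSums(E) / max(rowSums(E))

  score <- sort(alpha * A_rel + (1 - alpha) * C, decreasing = TRUE)
  list(score = score, confidence = unname(score[1] - score[2]))
}
# e2 is not the most active EDU, yet it ranks first because it is the most
# strongly connected one; the small gap to e1 gives a low confidence value.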

# Semantic similarity matrix for spreading activation ---------------------

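# Compute EDU embeddings (one unit-length row per EDU); `sbert_model` is
# expected to be a sentence-embedding model object loaded elsewhere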
compute_sbert_embeddings <- function(edu_df) {

  # edu_df: columns edu, text
  texts <- edu_df$text

  emb <- sbert_model$encode(
    texts,
    convert_to_numpy = TRUE,
    normalize_embeddings = TRUE)

  rownames(emb) <- edu_df$edu
  emb
}

# Build semantic similarity matrix
build_S_sbert <- function(edu_df, embeddings) {

  edus <- edu_df$edu
  V <- embeddings[edus, , drop = FALSE]

  # cosine similarity
  S <- tcrossprod(V)

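  # numerical safety: force exact unit self-similarity and clip negative
  # similarities to zero so that spreading is never inhibitory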
  diag(S) <- 1
  S[S < 0] <- 0

  S
}
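
# Worked example: similarity matrix from toy embeddings -------------------
# A minimal sketch of the steps above with hand-made two-dimensional unit
# vectors standing in for sentence embeddings. The helper name and the
# vectors are hypothetical.
example_similarity <- function() {
  V <- rbind(e1 = c(1, 0),
             e2 = c(0.6, 0.8),
             e3 = c(-1, 0))               # every row has length 1

  # For unit-length rows, dot products equal cosine similarities
  S <- tcrossprod(V)
  diag(S) <- 1
  S[S < 0] <- 0                           # opposite meanings do not spread
  S
}
# e1 and e2 share a similarity of 0.6, whereas e3 points away from both and
# therefore ends up with zero similarity to either.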


# Misc helper functions ----------------------------------------------------

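# Fallback operator: return y when x is NULL or a single NA, otherwise x
# (the length check avoids a non-scalar condition inside `||`)
`%||%` <- function(x, y) if (is.null(x) || (length(x) == 1 && is.na(x))) y else x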

# initial state
init_state <- function(S_by_source, edu_source) {

  E_by_source <- lapply(S_by_source, function(S) {
    matrix(0, nrow = nrow(S), ncol = ncol(S),
           dimnames = dimnames(S))
  })

  list(
    A_raw = numeric(),
    last_t = NA_real_,
    last_fix_time = numeric(),
    S_by_source = S_by_source,
    E_by_source = E_by_source,
    edu_source = edu_source,

    source_weight = setNames(
      rep(1, length(S_by_source)),
      names(S_by_source)
    )

  )
}

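# Time passed since the last event, in seconds, capped at `cap` seconds;
# the division by 1000 assumes timestamps in milliseconds.
# Returns 0 for the first event (no previous timestamp).
compute_dt <- function(t_now, t_last, cap = 5) {
  if (is.na(t_last)) return(0)
  min((t_now - t_last) / 1000, cap)
}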

References

Anderson, John R. 1983. The Architecture of Cognition. Cambridge, MA: Harvard University Press.
Anderson, John R., and Gordon H. Bower. 1973. Human Associative Memory. Washington, DC: Winston.
Chukharev-Hudilainen, Evgeny. 2019. “Empowering Automated Writing Evaluation with Keystroke Logging.” In Observing Writing, edited by Eva Lindgren and Kirk Sullivan, 38:125–42. Brill.
Lewis, Richard L., and Shravan Vasishth. 2005. “An Activation-Based Model of Sentence Processing as Skilled Memory Retrieval.” Cognitive Science 29 (3): 375–419. https://doi.org/10.1207/s15516709cog0000_25.
Lewis, Richard L., Shravan Vasishth, and Julie A. Van Dyke. 2006. “Computational Principles of Working Memory in Sentence Comprehension.” Trends in Cognitive Sciences 10 (10): 447–54. https://doi.org/10.1016/j.tics.2006.08.007.
Myers, Jerome L., and Edward J. O’Brien. 1998. “Accessing the Discourse Representation During Reading.” Discourse Processes 26 (2-3): 131–57. https://doi.org/10.1080/01638539809545042.
Reimers, Nils, and Iryna Gurevych. 2019. “Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://arxiv.org/abs/1908.10084.
Reyna, Valerie F. 2008. “A Theory of Medical Decision Making and Health: Fuzzy Trace Theory.” Medical Decision Making 28 (6): 850–65. https://doi.org/10.1177/0272989X08327066.
Reyna, Valerie F., and Charles J. Brainerd. 1995. “Fuzzy-Trace Theory: An Interim Synthesis.” Learning and Individual Differences 7 (1): 1–75. https://doi.org/10.1016/1041-6080(95)90031-4.