Reading-history model

Jens Roeser

Compiled Oct 10 2024

1 Preamble

This is a possible reading-history model intended to capture, based on gaze data, the mental representation of what a participant remembers of what they have read. It’s unlikely to be perfect, and I don’t know if it has any cognitive validity, but I think that would be true even if this were the perfect reading-history model. It might be worth testing whether the reading history based on eye data mirrors recall accuracy when participants are prompted to answer questions about paragraphs at various points throughout a reading task.

2 Reading history calculation

Input variables were calculated for each dwell id. A dwell id is a consecutive index for each period during which the gaze remained inside one EDU (paragraph) before moving on to another EDU, so a new dwell id marks a change of attention from one EDU to another. I’m using the following three measures, calculated for each dwell id:

dwell time: the sum of all fixation durations inside an EDU before leaving it again.
number of fixations: the count of all fixations inside an EDU before leaving it again.
re-fixation: whether an EDU has previously been fixated, coded 0 if the EDU has not been fixated before and 1 if it has.

These three measures can be seen in Figure 2.1A to Figure 2.1C across dwell id.
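
As a minimal sketch of how the three measures could be derived, the following Python collapses a temporally ordered fixation sequence into one record per dwell. The input format (a list of `(edu, duration_ms)` tuples), the function name `dwell_measures`, and the example values are all assumptions made for illustration; they are not the data structures used for the figures.

```python
from itertools import groupby

def dwell_measures(fixations):
    """Collapse a fixation sequence into one record per dwell.

    `fixations` is a list of (edu, duration_ms) tuples in temporal order
    (a hypothetical input format chosen for this sketch).
    """
    seen = set()  # EDUs that were fixated during any earlier dwell
    dwells = []
    # groupby merges consecutive fixations on the same EDU into one dwell
    runs = groupby(fixations, key=lambda fix: fix[0])
    for dwell_id, (edu, group) in enumerate(runs, start=1):
        durations = [duration for _, duration in group]
        dwells.append({
            "dwell_id": dwell_id,
            "edu": edu,
            "dwell_time": sum(durations),    # summed fixation durations
            "n_fixations": len(durations),   # count of fixations in the dwell
            "refixation": 1 if edu in seen else 0,
        })
        seen.add(edu)
    return dwells

# Made-up example: two fixations on A, one on B, then three more on A
fixations = [("A", 200), ("A", 250), ("B", 180),
             ("A", 300), ("A", 220), ("A", 150)]
dwells = dwell_measures(fixations)
```

Here the third dwell returns to EDU A, so its re-fixation value is 1, while the first two dwells are coded 0.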

On the basis of these three measures and the dwell id, an overall attention index was calculated, defined as the product of the eye measures and their dwell id. Including the dwell id is important for the calculation of the reading history in the next step. Expressed as a formula, the calculation is

\[ \text{attention index}_i = \text{log}(\text{dwell time}_i) \times \text{log}(\text{no. of fixations}_i) \times (\text{re-fixation}_i + 1) \times \text{dwell id}_i, \]

where \(i \in \{1, \ldots, n\}\) and \(n\) is the total number of dwell ids. The log transformation de-emphasizes large values. The attention index is shown in Figure 2.1D.
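
The attention index can be sketched in a few lines of Python. This assumes the natural logarithm (the formula does not state a base) and uses a hypothetical function name, `attention_index`, with made-up input values. One consequence of the formula as written is that a dwell with a single fixation has an index of 0, because log(1) = 0.

```python
import math

def attention_index(dwell_time, n_fixations, refixation, dwell_id):
    """Attention index for one dwell, following the formula in the text.

    Assumes natural logs; dwell_time and n_fixations must be positive.
    """
    return (math.log(dwell_time)
            * math.log(n_fixations)
            * (refixation + 1)   # doubles the index for re-fixated EDUs
            * dwell_id)          # later dwells receive a larger weight

# Made-up values: a first-pass dwell and a later re-reading dwell
first_pass = attention_index(dwell_time=450, n_fixations=2, refixation=0, dwell_id=1)
re_reading = attention_index(dwell_time=670, n_fixations=3, refixation=1, dwell_id=3)
single_fix = attention_index(dwell_time=180, n_fixations=1, refixation=0, dwell_id=2)
```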

Figure 2.1: Eye data and reading history.

The attention index was then used to calculate an indicator of the reading history: the cumulative sum of the attention index from the first dwell id to the current dwell id, divided by the current dwell id,

\[ \text{reading history}_k = \frac{\sum_{i=1}^{k} \text{attention index}_i}{\text{dwell id}_k}, \quad \text{for } k \in 1, \ldots, n, \]

where \(i\) indexes the dwell ids being summed, \(k\) is the current dwell id, and \(n\) is the total number of dwell ids. The cumulative sum captures how information is accumulated, and dividing the cumulative sum by the dwell id allows the function to decrease, which captures memory decay.
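
A small sketch may make the decay behaviour concrete. It assumes that the formula is applied to one series per EDU in which the attention index is 0 for dwells on other EDUs; the text does not spell this out, so treat it as one possible reading. The function name and the numbers are made up.

```python
from itertools import accumulate

def reading_history(attention, dwell_ids):
    """Cumulative sum of the attention index divided by the current dwell id."""
    cumulative = accumulate(attention)  # running total up to each dwell
    return [total / dwell_id for total, dwell_id in zip(cumulative, dwell_ids)]

# Hypothetical series for one EDU: attended at dwells 1 and 4,
# with 0 attention while the gaze was on other EDUs (assumption)
attention = [4.0, 0.0, 0.0, 6.0]
dwell_ids = [1, 2, 3, 4]
history = reading_history(attention, dwell_ids)
# history falls from 4.0 to 2.0 to about 1.33 while the EDU is not attended,
# then rises to 2.5 when it is fixated again
```

While the EDU receives no attention, the numerator stays constant and the growing dwell id in the denominator pulls the value down, which is the decay described above.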

Finally, the reading-history index was normalised by dividing it by the maximum value in the current reading history

\[ \text{reading history}'_i = \frac{\text{reading history}_i}{\max(\text{reading history})}, \]

where \(i \in \{1, \ldots, n\}\) and \(n\) is the total number of dwell ids. This normalisation was applied separately for each source (tab) in order to avoid a numeric bias towards EDUs in sources where fewer EDUs had been fixated so far. The normalised reading-history index is shown in Figure 2.1E.
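
The per-source normalisation could look roughly like the following. The record layout (dictionaries with `source` and `reading_history` keys), the function name, and the zero-maximum guard are assumptions for illustration. The sketch also takes the maximum over the whole series for each source, which is only one reading of "the maximum value in the current reading history".

```python
def normalise_by_source(records):
    """Divide each reading-history value by the maximum within its source."""
    # Largest reading-history value observed within each source (tab)
    max_by_source = {}
    for rec in records:
        src = rec["source"]
        max_by_source[src] = max(max_by_source.get(src, 0.0),
                                 rec["reading_history"])

    normalised = []
    for rec in records:
        peak = max_by_source[rec["source"]]
        # Guard against division by zero (an added assumption, not in the text)
        value = rec["reading_history"] / peak if peak > 0 else 0.0
        normalised.append({**rec, "reading_history_norm": value})
    return normalised

# Made-up values for two sources
records = [
    {"source": "tab1", "reading_history": 2.0},
    {"source": "tab1", "reading_history": 4.0},
    {"source": "tab2", "reading_history": 0.5},
]
result = normalise_by_source(records)
```

Because each source is scaled by its own maximum, the single value in the second source is mapped to 1 even though its raw value is much smaller than those in the first source.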

Figure 2.1 summarizes the data from an expert writer who was asked to first read and then summarise paragraphs 4-6. Attention to the target paragraphs is shown in black, individually for each of the three corresponding EDUs. All other paragraphs were grouped in triplets. The increase in reading history for large dwell ids shows when the participant started reading for the summary. There is also a small increase in re-reading for the EDU immediately preceding the target section, which may be deliberate or due to imprecision in the eye-tracking data. The latter is likely to account for the early increase, which happened simultaneously with the increase in looks to the target paragraphs.

3 Comparing reading histories

A comparison with all reading histories from four expert writers can be found in Figure 3.1.

From these plots it can be seen that the participants in the right-hand panels dedicated more time to re-reading the target EDUs. The participant in the top-left panel dedicated less time to re-reading the target EDUs but paid more attention to them when encountering them for the first time. The participant in the bottom-left panel neither paid more attention to the target EDUs when reading them for the first time nor dedicated more time to re-reading these paragraphs at the end of the session.

Figure 3.1: Reading-history comparison. Each line represents an EDU and the extent to which it is active in the mental reading history.

4 Notes

Decay is represented continuously as a function of dwell id. Dwells are not equally long, though this is captured by including the dwell time.
Decay might be faster when more EDUs were read before because of memory interference but this isn’t captured in this model.
Decay is more rapid when the number of fixations is not log-transformed.
How do we know that this model has cognitive validity?
We could compare the reading history for our expert writers and the data from novice writers.