1 Introduction

The Exemplar-Based Random Walk (EBRW) is a dynamic extension of the Generalized Context Model (GCM). According to the GCM, each time you experience an item in a particular context, it leaves a “trace” in memory. This trace is called an “exemplar” or an “instance” and it consists of a record of the values the item had for particular features.

1.1 Example: Memory for Colors

For example, colors can be described in terms of three features: hue, chroma (or “saturation”), and luminance (or “brightness”). Imagine that you saw many reddish-hued colors that varied in chroma and luminance. According to the GCM, each of the colors you saw would leave a trace in memory. Each trace would consist of the values of the color you saw along those three dimensions. In this example, all the colors have the same value for “hue”, which is 0. The value of “0” for “hue” corresponds to a reddish color. The colors differ in their values for chroma and luminance, as illustrated in the graph below:

colDF <- expand_grid(h = hue, c = seq(30, 70, length.out = 5), l = seq(30, 70, length.out = 5)) %>%
    mutate(col = hcl(h, c, l))

colDF %>%
    ggplot(aes(x = c, y = l, fill = col)) +
    geom_tile(width = 2.5, height = 2.5) +
    scale_fill_identity() +
    coord_equal() +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", caption = bquote(plain(hue) == .(round(hue))))

Your memory traces can also be written as a matrix, where each row corresponds to an item (color) and each column corresponds to a feature (h for “hue”, c for “chroma”, and l for “luminance”):

knitr::kable(head(colDF %>% select(h, c, l), 10), row.names = TRUE)
h c l
1 0 30 30
2 0 30 40
3 0 30 50
4 0 30 60
5 0 30 70
6 0 40 30
7 0 40 40
8 0 40 50
9 0 40 60
10 0 40 70

1.2 Perceived distance and similarity

The perceived distance between any two of these colors is the Euclidean distance between their feature values, weighted by the degree of attention given to each feature: \[ d_{ij} = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{ik} - x_{jk} \right)^2} \] where \(w_k\) is the weight given to feature \(k\), \(x_{ik}\) is the value of item \(i\) on feature \(k\), \(x_{jk}\) is the value of item \(j\) on feature \(k\), \(d_{ij}\) is the perceived distance between items \(i\) and \(j\), and \(N_F\) is the number of features which in this example is 3 (hue, chroma, and luminance). Note also that while Euclidean distance is appropriate for colors, other types of distance metrics may be more appropriate for other kinds of stimuli.

Perceived similarity between two items, \(s_{ij}\), is an exponential function of the psychological distance between those items: \[ s_{ij} = \exp(-c d_{ij}) \] where \(c\) is a sensitivity parameter that controls how quickly perceived similarity decreases with distance, as illustrated in the graph below:

expand_grid(d = seq(0, 10, length.out = 151), c = seq(0.25, 3, by = 0.25)) %>%
    mutate(s = exp(-c * d)) %>%
    ggplot(aes(x = d, y = s, color = c, group = c)) +
    geom_line() +
    scale_color_continuous_sequential() +
    labs(x = expression(d[ij]), y = expression(s[ij])) +
    theme(legend.position = c(1, 1), legend.justification = c(1, 1))

1.3 Summed similarity

Imagine that you have just been shown two colors from among the set shown earlier. We then present you with a third color—a probe item—and ask whether it was one of the two you just saw. You answer this question on the basis of the summed similarity between the probe item and the two items in memory. The probe item has its own vector of feature values, \(\mathbf{x_q}\). The perceived distance between the probe item and each of the memory items is, as above, the Euclidean distance between the probe’s feature values and those of the memory item: \[ \begin{align} d_{qj} & = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{qk} - x_{jk} \right)^2} \\ s_{qj} & = \exp \left( -c d_{qj} \right) \\ S & = \sum_{j = 1}^{N_M} s_{qj} \end{align} \] and, again, the perceived similarity \(s_{qj}\) between the probe \(q\) and memory item \(j\) is an exponential function of perceived distance \(d_{qj}\). Finally, summed similarity \(S\) is the sum of the perceived similarities across all \(N_M\) items in memory.

The graphs below illustrate how this works. The left graph shows contours of equal similarity from each of two study items. The right graph shows summed similarity as a function of the chroma and luminance of a probe item (assuming the same hue).

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

sens <- 0.05

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-sens * d))

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 1, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

1.3.1 Varying item strength

It is reasonable to believe that some memory traces are stronger than others, likely due to things like primacy and recency. In GCM/EBRW, “strength” is operationalized as a scaling factor \(m_j\) applied to perceived similarity: \[ s_{qj} = m_j \exp \left( -c d_{qj} \right) \]

Stronger traces have their similarity multiplied by a large value (\(m_j\) is large if trace \(j\) is strong) while weaker traces have their similarity multiplied by a small value (\(m_j\) is small if trace \(j\) is weak). This is illustrated in the pair of graphs below. A probe item does not need to be as similar to a strong item in order to evoke the same level of perceived similarity.

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

sens <- 0.05
strength <- c(1, 2)

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l)) %>%
    mutate(label = c("1", "2"))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-sens * d) * strength[item])

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    geom_text(aes(label = label), color = "black") +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

1.3.2 Varying item specificity

The specificity parameter \(c\) represents the fact that items are assumed to be encoded with some degree of error/uncertainty. Just as items may be encoded with more or less strength, it is reasonable to assume that items can be encoded in memory with more or less specificity. Thus, we add a subscript to the \(c\) parameter corresponding to each study item: \[ s_{qj} = m_j \exp \left( -c_j d_{qj} \right) \]

If an item is encoded with high specificity, then it will only be perceived as similar to the probe if the probe is very close in psychological space. This is illustrated in the pair of graphs below, where item 2 is not only stronger (\(m_2 = 2\) vs. \(m_1 = 1\)) but also more precise (\(c_2 = 0.1\) vs. \(c_1 = 0.05\)).

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

specificity <- c(0.05, 0.1)
strength <- c(1, 2)

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l)) %>%
    mutate(label = c("1", "2"))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-specificity[item] * d) * strength[item])

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    geom_text(aes(label = label), color = "black") +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

2 Making a recognition decision: From random walk to diffusion

According to the EBRW, each trace \(j\) in memory races to be retrieved at a rate proportional to its perceived similarity to the probe item, \(s_{qj}\). Traces race not just against one another, but against a criterion. If a memory trace wins the race, this is taken as evidence that the probe item matches something in memory, thus favoring a “yes” recognition response. If the criterion wins the race instead, this is taken as evidence that the probe item does not match anything in memory, thus favoring a “no” recognition response.

The idea is that, if the probe item matches something in memory, then the corresponding memory trace (or one sufficiently similar to probably match) should be able to win against the criterion. If nothing wins against the criterion, then this suggests there are no traces in memory that are a good match to the probe.

2.1 Accumulating memory evidence

The outcome of each race is added to a running tally which starts at zero. The value of the tally at any given time \(t\), which we can denote \(x(t)\), constitutes the current state of evidence from memory for making a recognition decision. Assume that each race takes \(\Delta\) seconds to complete. Each time a memory trace wins the race, the tally gets incremented by one; each time the criterion wins the race, the tally gets decremented by one. Put formally, we can write this process as \[ x\left( t + \Delta \right) = \begin{cases} x\left( t \right) + 1 & \text{if trace wins} \\ x\left( t \right) - 1 & \text{if criterion wins} \end{cases} \] where \(x(0) = 0\).

2.2 Decision boundaries

You commit to a recognition decision when your accumulated memory evidence is either sufficiently high, leading to a “yes” response, or sufficiently low, leading to a “no” response. We can write the upper and lower decision boundaries as \(B_Y\) and \(B_N\), respectively. For example, imagine setting \(B_Y = 7\) and \(B_N = -8\). In that case, you would say “yes” when \(x(t) = B_Y = 7\), i.e., when a memory trace had won seven more races than the criterion. You would say “no” when \(x(t) = B_N = -8\), i.e., when the criterion had won eight more races than any memory trace. In this example, you would have a slight bias toward saying “yes”, in the sense that you require less evidence to say “yes” than to say “no”. We might therefore expect that, all else being equal, you would tend to say “yes” faster than you would say “no”, since in general it will take less time to accumulate an advantage of \(B_Y = 7\) as opposed to an advantage of \(|B_N| = 8\).

2.3 Step probabilities

Although we have specified that whether memory evidence goes up or down depends on whether a trace or the criterion wins the race, we have not yet specified how the outcome of that race is determined. The winner of each race is random but depends on the similarity \(s_{qj}\) between each trace \(j\) and the probe \(q\). Specifically, trace \(j\) wins the race with probability: \[ \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} \] where \(\kappa\) is a nonnegative number that represents how stringent the criterion is. In other words, the equation above says that the probability that trace \(j\) wins the race is the similarity between trace \(j\) and the probe \(q\) divided by the summed similarity across all traces plus the criterion \(\kappa\). The probability that the criterion wins is \[ \frac{\kappa}{\sum_{k = 1}^N s_{qk} + \kappa} \] In a sense, we can think of the criterion is like a “virtual memory trace” that races alongside the \(N\) actual memory traces.

Remember that we increment memory strength whenever any trace wins the race, regardless of which one it is. Because only one trace can win each race, the probability that any trace wins is just the sum of the probabilities of winning across all \(N\) traces, i.e.: \[ \sum_{j = 1}^N \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} = \frac{1}{\sum_{k = 1}^N s_{qk} + \kappa} \left( \sum_{j = 1}^Ns_{qj} \right) = \frac{S}{S + \kappa} \] where \(S = \sum_{j = 1}^N s_{qj}\) is the summed similarity across all \(N\) traces.

2.4 Adjusting criterion for within-list similarity

It is reasonable to think that the criterion \(\kappa\) could be adjusted in response to a number of factors. One such factor might be list length; it would make sense to set a more stringent criterion for a longer list, since there will be a lot of incidental similarity between probes and traces.

However, in the present application, we only have one list length (\(N = 2\)). Another factor that can be important in adjusting criterion is within-list similarity. If the items within a list are themselves highly similar, then it makes sense to set a more stringent criterion since otherwise you may falsely recognize a probe that happens to be similar to both items. In other words, recognition should depend, at least in part, on judging whether the probe and trace(s) are more similar to one another than the traces are to themselves.

Conceptually, within-list similarity is the average similarity between all pairs of list items. Practically, this is a bit challenging to calculate because similarity depends on the strength and specificity with which the items are encoded.

To calculate within-list similarity, let’s first consider a pair of traces \(j\) and \(k\). If we treat trace \(k\) as if it were a probe (like \(q\)), then the similarity between \(k\) and \(j\) would be \[ s_{kj} = m_j \exp \left(-c_j d_{kj} \right) \] where the strength \(m_j\) and specificity \(c_j\) come from trace \(j\). We can also go the other direction and treat trace \(j\) as if it were a probe, resulting in \[ s_{jk} = m_k \exp \left(-c_k d_{jk} \right) \] We assume that the perceived similarity between traces \(j\) and \(k\) is just the average of \(s_{kj}\) and \(s_{jk}\), i.e., \[ \frac{s_{kj} + s_{jk}}{2} = \frac{1}{2} \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \]

In our present application, \(N = 2\), so the formula above is all we need to find the within-list similarity; just set \(j = 1\) and \(k = 2\). In general, however, within-list similarity can be written as \[ \begin{align} W & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \end{align} \] where we divide by \(N \left(N - 1 \right)\) because that is the number of pairwise similarities entering into the average. Notice also that the indexes \(j\) and \(k\) are cleverly chosen so as to cover every unique \(jk\) pair.

Finally, we can write the probability of taking a step up (i.e., of a trace winning the race) as \[ \frac{S}{S + \gamma W + \kappa} \] where \(\gamma W + \kappa\) reflects the criterion adjusted for within-list similarity, and \(\gamma\) is a free parameter (assumed to be nonnegative) that reflects the strength of the adjustment.

2.5 Exemplar-Based Random Walk

Finally, we are in a position to fully specify the Exemplar-Based Random Walk (EBRW) model. Let \(i\) be the index of the current experimental trial and \(s_i\) indicate the participant who produced trial \(i\). That participant is associated with a matrix of perceived distances \(\mathbf{d_{s_i}}\), where each entry in the matrix \(d_{{s_i}, jk}\) is the perceived distance between items \(j\) and \(k\) for participant \(s_i\). The probe item on trial \(i\) is item \(q_i\). The item studied in position \(j\) on trial \(i\) is item \(j_i\) (same goes for \(k\)). The summed similarity \(S_i\), within-list similarity \(W_i\), and step probability \(p_i\) for trial \(i\) can then be written \[ \begin{align} S_i & = \sum_{j = 1}^N m_{s_i, j} \exp \left( -c_{s_i, j} d_{s_i, q_i, j_i} \right) \\ W_i & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_{s_i, j} \exp \left(-c_{s_i, j} d_{s_i, k_i, j_i} \right) + m_{s_i, k} \exp \left(-c_{s_i, k} d_{s_i, j_i, k_i} \right) \right] \\ p_i & = \frac{S_i}{S_i + \gamma_{s_i} W_i + \kappa_{s_i}} \end{align} \] where \(N\) is the number of items in the list, \(m_{s_i, j}\) are free parameters for the strengths of items in different study positions, \(c_{s_i, j}\) are the specificity parameters for items in different study positions, and \(\gamma_{s_i}\) and \(\kappa_{s_i}\) are free parameters for the participant’s within-list similarity and criterion.

The EBRW models the speed of decision making in terms of how many races must be run until the accumulated win advantage in favor of either a “yes” or “no” response reaches a decision boundary. To convert this to “real time”, we must say how long each race takes and allow for a residual time that accounts for the additional time needed to orient attention to the trial and execute the selected motor response. Above, we used \(\Delta\) to stand for the amount of time (in seconds) each race takes to run. We add a subscript \(s_i\) to make \(\Delta_{s_i}\) to make clear that \(\Delta\) is a parameter that can vary between people. In addition, it will be convenient later to think instead of \(\nu_{s_i} = \frac{1}{\Delta_{s_i}}\), where \(\nu_{s_i}\) is the number of races per second. Finally, parameter \(R_{s_i}\) stands for the additional residual time.

The figure below shows an example of how memory evidence evolves during a single trial in which \(p_i = 0.55\), \(\nu_{s_i} = 40\), \(B_{s_i, Y} = 7\), \(B_{s_i, N} = -8\), and \(R_{s_i} = 0.2\). In addition, the graphs above and below the evidence trajectory illustrate the relative frequency with which, across many identical trials, each type of response would be made at each unit of time. Note that these distributions are discrete because the random walk operates in discrete time intervals, each of duration \(\Delta_{s_i}\) (which in this example is \(\Delta_{s_i} = \frac{1}{\nu_{s_i}} = 0.025\) seconds).

set.seed(1)

nu <- 40
p <- 0.55

B <- c(-8, 7)
resid <- 0.2

### RT distributions

Y_rw <- seq(B[1], B[2])
P_rw <- matrix(0, nrow = length(Y_rw), ncol = length(Y_rw))
P_rw[cbind(2:(nrow(P_rw) - 1), 1:(ncol(P_rw) - 2))] <- 1 - p
P_rw[cbind(2:(nrow(P_rw) - 1), 3:ncol(P_rw))] <- p
P_rw[1, 1] <- P_rw[nrow(P_rw), ncol(P_rw)] <- 1

### Simulation

while (TRUE) {
    winner <- 0
    x_rw <- 0

    while (TRUE) {
        s <- 2 * (runif(n = 1) < p) - 1
        x_rw <- c(x_rw, x_rw[length(x_rw)] + s)
        if (x_rw[length(x_rw)] <= B[1]) {
            winner <- 1
            break
        } else if (x_rw[length(x_rw)] >= B[2]) {
            winner <- 2
            break
        }
    }
    
    if (winner == 2 & (length(x_rw) / nu) > 1.5) {
        break
    }
}

RT_rw <- matrix(0, nrow = 2, ncol = length(x_rw))
Z_rw <- 1 * c(Y_rw == 0)

for (i in 1:length(x_rw)) {
    Z_rw <- Z_rw %*% P_rw
    RT_rw[1, i] <- Z_rw[1]
    RT_rw[2, i] <- Z_rw[length(Z_rw)]
}

dRT_rw <- apply(RT_rw, MARGIN = 1, FUN = diff) * nu / 2

rtPlot1 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(fill = "#eeb211", color = NA, width = 1 / nu) +
    coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_rw)))) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rtPlot0 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(fill = "#46166b", color = NA, width = 1 / nu) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    scale_x_continuous(limits = c(0, NA)) +
    scale_y_reverse(limits = c(max(c(dRT_rw)), 0)) +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rwPlot <- tibble(t = resid + 1:length(x_rw) / nu, x = x_rw) %>%
    ggplot(aes(x = t, y = x)) +
    geom_step(size = 1.5) +
    geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", size = 2) +
    geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", size = 2) +
    geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", size = 2) +
    geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
    geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
    geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
    coord_cartesian(xlim = c(0, NA)) +
    labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu))))

rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))

2.6 From discrete random walk to continuous diffusion

A random walk takes discrete-valued steps either up or down in discrete units of time. A Wiener diffusion process takes continuous-valued steps sampled from a normal distribution in infinitely small units of time, thus effectively operating in continuous time. We are going to approximate the discrete EBRW with a continuous EBD, or Exemplar-Based Diffusion.

In going from a random walk to a diffusion model, we are making an important psychological claim: We are saying that, instead of memory evidence arriving in discrete units at regular intervals, memory evidence is a continuous value that continually evolves as new information arrives. We can think of this as saying that, instead of only knowing the outcome of each race, you can see who is ahead and who is behind at any given time; this is the move from discrete time to continuous time. Moreover, instead of only scoring each race as a win or loss for the memory traces, the races are assigned a continuous value depending on how clear the winner is; this is the move from discrete evidence to continuous evidence.

2.6.1 Mean and standard deviation of diffusion

We can write the update equation for the random walk like we did above: \[ \begin{align} x \left( t + \Delta_{s_i} \right) & = \begin{cases} x(t) + 1 & \text{with probability } p_i \\ x(t) - 1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & = \begin{cases} 1 & \text{with probability } p_i \\ -1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & \sim 2 \times \text{Bernoulli} \left( p_i \right) - 1 \end{align} \] where we have rearranged terms and used the shorthand in the final line to emphasize the idea that each step of the random walk can be thought of as a sample from a Bernoulli distribution with parameter \(p_i\) that is then transformed from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\).

The update equation above is for a single race, but what if we want to know the distribution of the random walk after a particular span of time? Specifically, what is the distribution of the total change in memory evidence over one second of time? It depends on how many races can be run within one second, which is \(\frac{1}{\Delta_{s_i}} = \nu_{s_i}\). Because the total change over one second is the sum of \(\nu\) Bernoulli random variables each with parameter \(p_i\), it is distributed as a Binomial distribution. As above, this distribution gets transformed to reflect the change in scale in the outcome of each race from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\): \[ x \left( t + 1 \right) - x(t) \sim 2 \times \text{Binomial} \left( \nu_{s_i}, p_i \right) - 1 \]

One last step to turn this into a continuous diffusion: We need to approximate the binomial distribution with a normal distribution that has the same mean and variance. The mean and variance of this normal distribution are the mean and variance of the diffusion process that approximates the random walk. The mean of a binomial distribution is \(\nu_{s_i} p_i\) and its variance is \(\nu_{s_i} p_i \left(1 - p_i \right)\). This leads to the normal approximation \[ \begin{align} x \left( t + 1 \right) - x(t) & \sim 2 \times \text{Normal} \left[ \nu_{s_i} p_i, \nu_{s_i} p_i \left(1 - p_i \right) \right] - 1 \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left[ 2\nu_{s_i} p_i - 1, 4\nu_{s_i} p_i \left(1 - p_i \right) \right] \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left( \mu_i, \sigma^2_i \right) \\ \end{align} \] where, to summarize, the mean of the diffusion on trial \(i\) is \(\mu_i\) and the variance of the diffusion on trial \(i\) is \(\sigma_i^2\), both of which depend on step probability \(p_i\) and rate \(\nu_{s_i}\): \[ \begin{align} \mu_i & = \nu_{s_i} \left( 2 p_i - 1 \right) \\ \sigma^2_i & = 4 \nu_{s_i} p_i \left(1 - p_i \right) \\ \sigma_i & = 2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)} \end{align} \]

The mean \(\mu_i\) is the drift rate of the diffusion process. The variance \(\sigma_i^2\) is the noise in the diffusion process. For each infinitesimal unit of time \(dt\), the current state of memory evidence is updated according to \[ dx(t) = \mu_i dt + \epsilon_i(t) \] where \[ \epsilon_i(t) \sim \text{Normal} \left( 0, \sigma_i^2 dt \right) \] is a random variable sampled from a Normal distribution with mean zero and variance \(\sigma_i^2 dt\).

Notice an important property of the noise \(\sigma^2_i\): It is largest when \(p_i = 0.5\) and approaches zero as \(p_i\) approaches either 0 or 1. In other words, the more uncertain the outcome of each race, the more noise there is in the diffusion. This is illustrated below:

expand_grid(p = seq(0, 1, length.out = 101), nu = seq(10, 40, by = 10)) %>%
    mutate(sigma2 = 4 * nu * p * (1 - p)) %>%
    ggplot(aes(x = p, y = sigma2, color = nu, group = nu)) +
    geom_line() +
    labs(x = expression(p[i]), y = expression(sigma[i]^2), color = expression(nu[s[i]])) +
    theme(legend.position = c(1, 1), legend.justification = c(1, 1))

To summarize, the difference between the random walk and the diffusion is that we have swapped out a discrete binomial distribution of evidence increments per unit time with a continuous normal distribution of evidence increments per unit time. Everything else is the same: You still respond “yes” if and when the accumulated evidence \(x(t)\) reaches either the upper boundary \(B_{s_i, Y}\) or the lower boundary \(B_{s_i, N}\) and

2.6.2 Closeness of predictions

To illustrate how well the diffusion process approximates the random walk, the graphs below show the diffusion approximation to the same random walk example used above. The smooth lines in the upper and lower graphs are the probability of responding per unit time (i.e., the probability density function) according to the Wiener diffusion model. The open bars are the same probabilities from the random walk. The diffusion model’s predictions hew very closely to those of the random walk!

mu <- nu * (2 * p - 1)
sigma2 <- 4 * nu * p * (1 - p)
boundsep <- B[2] - B[1]
bias <- (0 - B[1]) / (B[2] - B[1])
delta_t <- 0.001

while (TRUE) {
    winner_diff <- NA
    x_diff <- 0
    
    while (TRUE) {
        x_diff <- c(x_diff, x_diff[length(x_diff)] + rnorm(n = 1, mean = mu * delta_t, sd = sqrt(sigma2 * delta_t)))
        if (x_diff[length(x_diff)] <= B[1]) {
            winner_diff <- 1
            break
        } else if (x_diff[length(x_diff)] >= B[2]) {
            winner_diff <- 2
            break
        }
    }
    
    if (winner == winner_diff & abs((length(x_diff) * delta_t) - (length(x_rw) / nu)) < (1 / nu)) {
        break
    }
}

x_diff <- pmax(pmin(x_diff, B[2]), B[1])

t <- seq(1, length(x_diff)) * delta_t

dRT_diff <- cbind(
    WienerPDF(t = t, response = "lower", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value,
    WienerPDF(t = t, response = "upper", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value
)

rtPlot1 <- tibble(t = resid + t, p = dRT_diff[,2]) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])), color = "#eeb211aa", fill = NA, width = 1 / nu) +
    geom_area(fill = "#eeb21177", color = "#eeb211") +
    coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_diff)))) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rtPlot0 <- tibble(t = resid + t, p = dRT_diff[,1]) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])), color = "#46166baa", fill = NA, width = 1 / nu) +
    geom_area(fill = "#46166b77", color = "#46166b") +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    scale_x_continuous(limits = c(0, NA)) +
    scale_y_reverse(limits = c(max(c(dRT_diff)), 0)) +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rwPlot <- tibble(t = resid + t, x = x_diff) %>%
    ggplot(aes(x = t, y = x)) +
    geom_line() +
    geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", size = 2) +
    geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", size = 2) +
    geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", size = 2) +
    geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
    geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
    geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
    coord_cartesian(xlim = c(0, NA)) +
    labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu), mu[i] == .(signif(mu, 3)), sigma[i] == .(signif(sqrt(sigma2), 3)))))

rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))

3 Implementation in Stan

To find the EBD parameters that best fit the recognition data from each participant, I implemented the EBD in Stan. The Stan implementation of the EBD as applied to our data involves some additional complications and considerations.

3.1 The data

We model the data at the level of individual trials. For each trial i, we know

  • subject[i]: The numerical index of the participant who produced the trial.
  • study_item[i, 1] and study_item[i, 2]: These are the numerical indexes of the first and second items presented on the study list just prior to this trial. For example, if the study list for trial i was items 6 and 3 in that order, then study_item[i, 1] would be 6 and study_item[i, 2] would be 3.
  • probe_item[i]: The numerical index of the probe item presented on trial i.
  • resp[i]: The response the participant gave on trial i, either 0 (for “no”) or 1 (for “yes”).
  • rt[i]: The response time of the participant on trial i.
  • dist[1:nSubj, 1:nItem, 1:nItem]: A list of matrices of the perceived distances between all nItem items, inferred from Multidimensional Scaling for each participant. dist[s, j, k] gives the distance perceived by participant s between item j and item k.

3.2 Parameters to be estimated

For each participant s, there are a total of nine parameters to be estimated. These are:

  • retrieval_rate[s]: This is the number of races run per second, labeled \(\nu\) above.
  • boundsep[s]: This is the difference \(B_Y - B_N\), called boundary separation. Since the decision boundaries of a diffusion process do not need to be whole numbers, it is convenient to use this and the following parameter instead of \(B_Y\) and \(B_N\).
  • bias[s]: This is a number between 0 and 1 representing the degree of bias in how participant s sets their decision boundaries. \(B_Y\) is given by boundsep[s] * (1 - bias[s]) and \(B_N\) is given by -boundsep[s] * bias[s]. That way, if bias[s] is 0.5, the diffusion process is unbiased because \(B_Y = -B_N\) and you need exactly the same amount of evidence before you are willing to make either a “yes” or “no” response. If bias[s] is greater than 0.5, then s is biased in favor of “yes” responses while if bias[s] is less than 0.5, then s is biased in favor of “no” responses.
  • resid[s]: The residual time, other than that needed for the diffusion process, needed to orient to the probe item on the trial and execute the motor response. By definition, this must be a number between 0 and the shortest RT produced by participant s. This was called \(R\) above.
  • strength1[s]: The strength in memory of the trace of the first list item, \(m_{s_i, 1}\) above. Since strengths are on an arbitrary scale, we assume that \(m_{s_i, 2} = 1\) for all participants. That means strength1[s] is the strength of the first studied item relative to the strength of the second for participant s.
  • specificity[s]: This is the mean specificity for participant s, called parameter \(c\) above, which together with the next parameter is used to define the specificity for each serial position.
  • specificity_diff[s]: This is a number between 0 and 1 that, combined with the parameter above, specifies the specificity with which subject s encodes items in each serial position. Specifically, specificity_pos[1, s] = 2 * specificity[s] * (1 - specificity_diff[s]) (the result corresponds to \(c_{s_i, 1}\) above) and specificity_pos[1, s] = 2 * specificity[s] * specificity_diff[s] (the result corresponds to \(c_{s_i, 2}\) above). If specificity_diff[s] is 0.5, then subject s encodes items in each position with the same specificity, specificity[s]. If specificity_diff[s] > 0.5, then the second (more recent) item is encoded with greater precision than the first; if specificity_diff[s] < 0.5, then the first item is encoded with greater precision than the second.
  • within_sim[s]: This is how much the criterion for participant s is raised by within-list similarity; this corresponds to parameter \(\gamma\) above.
  • criterion[s]: This is the baseline criterion for participant s, corresponding to parameter \(\kappa\) above.

3.3 Hierarchical priors

All the parameters to be estimated with the exception of bias and specificity_diff are restricted to be non-negative real numbers. For these parameters, we assume that they are themselves sampled from group-level Gamma distributions characterized by their mode and standard deviation. The group-level modes and standard deviations are themselves free parameters. For bias and specificity_diff, since these parameters are restricted to be between 0 and 1, we assume that they are sampled from group-level Beta distributions characterized by their mode and concentration. The group-level modes and concentrations are also free parameters.

Thus, in addition to the nine model parameters for each participant, we estimate parameters corresponding to the group-level central tendency (modes) and variability (either standard deviation or concentration) of each of the nine parameters. This structure, along with moderately informative priors on the group-level parameters, is specified in the model block of the Stan program:

retrieval_rate_mode ~ exponential(0.033); // Mean = 30, SD = 30
retrieval_rate_sd ~ gamma(2.618, 0.054);  // Mode = 30, SD = 30
retrieval_rate ~ gamma(retrieval_rate_shape, retrieval_rate_rate);

boundsep_mode ~ exponential(0.033); // Mean = 30, SD = 30
boundsep_sd ~ gamma(2.618, 0.054);  // Mode = 30, SD = 30
boundsep ~ gamma(boundsep_shape, boundsep_rate);

bias_mode ~ beta(2.0, 2.0);             // Mean of 0.5, goes to zero at 0 or 1
bias_concentration ~ exponential(0.05); // Mean = 20, SD = 20
bias ~ beta(bias_shape1, bias_shape2);

resid_mode ~ exponential(2.0);  // Mean = 0.5, SD = 0.5
resid_sd ~ gamma(2.618, 3.236); // Mode = 0.5, SD = 0.5
resid ~ gamma(resid_shape, resid_rate);

strength1_mode ~ exponential(1.0);  // Mean = 1, SD = 1
strength1_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
strength1 ~ gamma(strength1_shape, strength1_rate);

specificity_mode ~ exponential(0.5);  // Mean = 2, SD = 2
specificity_sd ~ gamma(2.618, 0.809); // Mode = 2, SD = 2
specificity ~ gamma(specificity_shape, specificity_rate);

specificity_diff_mode ~ beta(2.0, 2.0);             // Mean of 0.5, goes to zero at 0 or 1
specificity_diff_concentration ~ exponential(0.05); // Mean = 20, SD = 20
specificity_diff ~ beta(specificity_diff_shape1, specificity_diff_shape2);

within_sim_mode ~ exponential(1.0);  // Mean = 1, SD = 1
within_sim_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
within_sim ~ gamma(within_sim_shape, within_sim_rate);

criterion_mode ~ exponential(1.0);  // Mean = 1, SD = 1
criterion_sd ~ gamma(2.618, 1.618); // Mode = 1, SD = 1
criterion ~ gamma(criterion_shape, criterion_rate);

For the parameters that are nonnegative, there is a prior on the group-level mode as well as the group-level standard deviation. The priors on the group-level modes are exponential() distributions chosen so that the mean and standard deviation of the exponential distribution are the same as the overall scale of the corresponding parameter (by definition, the mean and SD of an exponential distribution are identical). In some cases, the choice of overall scales is somewhat arbitrary (e.g., assuming that boundary separation \(A_{s_i}\) and retrieval rate \(\nu_{s_i}\) are on the order of 30) but in other cases it is dictated by meaningful considerations (e.g., strength and criterion-related parameters should be on a unit scale because they are all relative to the strength of the second item, which is 1). Instead of exponential distribution where the mean and SD are equal, the priors on the group-level standard deviations are gamma distributions where the mode and SD are equal. with a single parameter that specifies the overall scale of the parameter while the group-level standard deviations have a gamma() prior with two parameters (a shape and rate), also tied to the overall scale of the parameter. By “scale”, I’m referring to the mean/mode/SD of the priors on the group-level parameters. The figure below gives some examples of exponential and gamma distributions with different scales.

expand_grid(x = seq(0, 10, length.out = 101), scale = c(0.2, 0.5, 1, 2, 3, 4)) %>%
    mutate(exponential = dexp(x, rate = 1 / scale), gamma = dgamma(x, shape = (3 + sqrt(5)) / 2, rate = (1 + sqrt(5)) / (2 * scale))) %>%
    pivot_longer(cols = c(exponential, gamma), names_to = "Distribution", values_to = "d") %>%
    ggplot(aes(x = x, y = d, color = Distribution)) +
    geom_line() +
    facet_wrap("scale", labeller = label_both)

The reason for using a gamma() prior on standard deviations (as opposed to another exponential prior) is to avoid implosive shrinkage of the participant-level parameters onto the group mode. In preliminary explorations, I noticed that varying some parameters (like within_sim) does not have a strong influence on model predictions. As a result, you would need a lot of data (hundreds of trials per person) to uniquely identify the action of some parameters. Another way of saying that is that, if you have relatively few trials per person, that sparse data would probably be consistent with many possible values of some model parameters. When the data provide only a weak constraint on model parameters, the prior exerts itself. If the prior allows for zero between-participant variability, then implosive shrinkage can occur. With implosive shrinkage, the group-level SD is estimated to be zero and everyone gets assigned the same value for a parameter. As shown in the figure above, the gamma priors used on the group-level SD’s assign zero prior probability to values of zero. Thus, the gamma priors prevent implosive shrinkage while still allowing for a wide range of plausible values on a reasonable scale.

Finally, I note that the _mode and _sd group-level parameters are transformed into _shape and _rate parameters in the transformed parameters block of the Stan program, following the conversion formulas given by Kruschke: \[ \begin{align} r & = \frac{\omega + \sqrt{\omega^2 + 4 \sigma^2}}{2 \sigma^2} \\ s & = 1 + \omega r \end{align} \] where \(s\) and \(r\) are the shape and rate of the gamma distribution, respectively, and \(\omega\) and \(\sigma\) are the mode and standard deviation of the gamma distribution, respectively. Similarly, for the two parameters that fall between 0 and 1, their _mode and _concentration parameters are converted to the two shape parameters (_shape1 and _shape2) of a Beta distribution: \[ \begin{align} s_1 & = \omega \kappa + 1 \\ s_2 & = \left(1 - \omega \right) \kappa + 1 \end{align} \] where \(s_1\) and \(s_2\) are the two shape parameters of the Beta distribution and \(\omega\) and \(\kappa\) are the mode and concentration of the Beta distribution, respectively.

3.4 The likelihood function

The final line of the model block in the Stan program is

rt ~ wiener(trial_boundsep, resid[subject], trial_bias, trial_drift);

This line is a shorter and more efficient way of writing the following:

for (i in 1:nData) {
    rt[i] ~ wiener(trial_boundsep[i], resid[subject[i]], trial_bias[i], trial_drift[i]);
}

What these lines say is that the likelihood of the response time on each trial is the likelihood that a Wiener diffusion process with boundary separation trial_boundsep, residual time resid[subject], bias trial_bias, drift rate trial_drift, and drift variance 1 reaches its upper boundary at time rt. Notice that this poses two challenges:

  1. The variance of the diffusion is assumed to be 1, but according to the model, the variance \(\sigma_i^2\) changes from trial to trial depending on \(p_i\). As described below, we can account for this by re-scaling the boundary separation and mean drift rate.
  2. Stan computes the likelihood assuming that the process hit the upper boundary at time rt. This means that, in order to get the likelihood of hitting the lower boundary at time rt, we will need to “flip” the process. As described below, we can account for this by inverting the drift rate and bias on these trials.

3.4.1 Re-scaling to unit variance

The noise in the diffusion process, \(\sigma_i\), sets the scale for the diffusion process. If we multiply the response boundaries, mean drift rate \(\mu_i\), and standard deviation \(\sigma_i\) by the same amount, the model’s predictions will be unchanged. This means that, to make the process have unit variance, all we need to do is divide the mean drift on trial \(i\), \(\mu_i\), as well as the boundary separation for the subject on trial \(i\) (written below as \(A_{s_i}\)) by \(\sigma_i\) to find the re-scaled mean drift \(d_i\) and boundary separation \(b_i\) for trial \(i\). Mathematically, this can be written \[ \begin{align} d_i & = \frac{\mu_i}{\sigma_i} = \left(p_i - \frac{1}{2} \right) \sqrt{\frac{\nu_{s_i}}{p_i \left(1 - p_i \right)}} \\ b_i & = \frac{A_{s_i}}{\sigma_i} = \frac{A_{s_i}}{2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)}} \end{align} \]

3.4.2 Flipping to hit the upper boundary

Recall that we record the response made on trial i as resp[i], which is either 0 (for a “no” response) or 1 (for a “yes” response). That means that, for purposes of using Stan to calculate the likelihood, we need to “flip” both the drift rate and the bias on any trial i for which resp[i] is zero. The idea is that hitting the lower boundary with drift \(d\) and bias \(w\) is equivalent to hitting the upper boundary with drift \(-d\) and bias \(1 - w\).

Mathematically, if we let \(\mathbb{R}_i\) be an indicator variable that stands for resp[i] and \(\beta_{s_i}\) stand for the participant on trial \(i\)’s bias parameter (bias[subject[i]] in Stan) , we can write this additional transformation as \[ \begin{align} \tilde{d}_i & = \left(2 \mathbb{R}_i - 1 \right) d_i = \left(2 \mathbb{R}_i - 1 \right)\left(p_i - \frac{1}{2} \right) \sqrt{\frac{\nu_{s_i}}{p_i \left(1 - p_i \right)}} \\ b_i & = \frac{A_{s_i}}{\sigma_i} = \frac{A_{s_i}}{2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)}} \\ w_i & = \mathbb{R}_i \beta_{s_i} + \left(1 - \mathbb{R}_i \right) \left(1 - \beta_{s_i} \right) \end{align} \] where \(\beta_{s_i}\) is the bias parameter for participant \(s_i\) (bias[subject[i]]) and \(\tilde{d}_i\), \(b_i\), and \(w_i\) correspond to trial_drift, trial_boundsep, and trial_bias in the Stan code below. Note that we use \(d_i\) to stand for the drift rate on trial \(i\) and \(\tilde{d}_i\) to stand for the “flipped” drift rate used for the purpose of calculating the likelihood in Stan.

3.4.3 Stan code defining trial-level parameters

The Stan code for defining the trial-level parameters from the EBD is part of the transformed parameters block and is a direct translation of the equations above from math into Stan:

vector[nSubj] sqrt_retrieval_rate;
vector[nSubj] scaled_boundsep;

vector[nData] trial_sum_sim;
vector[nData] trial_within_sim;
vector[nData] p;

sqrt_retrieval_rate = sqrt(retrieval_rate);
scaled_boundsep = 0.5 * boundsep ./ sqrt_retrieval_rate;

for (i in 1:nData) {
    trial_sum_sim[i] = strength1[subject[i]] * exp(-specificity_pos[1, subject[i]] * dist[subject[i], study_item[i, 1], probe_item[i]]) + exp(-specificity_pos[2, subject[i]] * dist[subject[i], study_item[i, 2], probe_item[i]]);
    trial_within_sim[i] = 0.5 * (strength1[subject[i]] * exp(-specificity_pos[1, subject[i]] * dist[subject[i], study_item[i, 1], study_item[i, 2]]) + exp(-specificity_pos[2, subject[i]] * dist[subject[i], study_item[i, 1], study_item[i, 2]]));
}

p = trial_sum_sim ./ (trial_sum_sim + criterion[subject] + within_sim[subject] .* trial_within_sim);

trial_drift = sqrt_retrieval_rate[subject] .* (2.0 * resp - 1.0) .* (p - 0.5) .* inv_sqrt(p .* (1.0 - p));
trial_boundsep = scaled_boundsep[subject] .* inv_sqrt(p .* (1.0 - p));
trial_bias = bias[subject] .* resp + (1.0 - bias[subject]) .* (1.0 - resp);

3.5 Predicted mean RT’s and response probabilities

When we fit the model, it is important to check whether its predictions are close to what was observed. Therefore, it will be important to be able to calculate, for each trial \(i\), three things:

  1. The predicted probability of saying “yes” on trial \(i\), denoted \(\hat{p}_i\).
  2. The predicted mean RT for “yes” responses on trial \(i\), denoted \(\widehat{RT}_{Y, i}\).
  3. The predicted mean RT for “no” responses on trial \(i\), denoted \(\widehat{RT}_{N, i}\).

Since we know whether the probe item on trial \(i\) was a target (from the list) or not, we can use \(\widehat{RT}_{Y, i}\) and \(\widehat{RT}_{N, i}\) to find the predicted mean correct and error RT’s. If trial \(i\) is a target, then the predicted mean correct RT is \(\widehat{RT}_{Y, i}\) and the predicted mean error RT is \(\widehat{RT}_{N, i}\). These are reversed if the probe on trial \(i\) is a foil.

You may ask why it is useful to find the mean RT’s and response probabilities for each trial when we only see one outcome per trial. You can consider the predicted mean RT’s and response probabilities as what we would get if we simulated many many repetitions of trial \(i\). In that case, the mean RT’s and response probabilities represent the best possible guesses we could make about how trial \(i\) actually turned out.

As it turns out, there are closed-form (if ugly) expressions for the response probability and mean RT at each boundary for a Wiener process. These are given below and are computed as part of the Stan program. Note that these formulas refer to \(d_i\), the non-flipped scaled drift rate on trial \(i\), and \(\beta_{s_i}\), the non-flipped bias parameter for participant \(s_i\). \[ \begin{align} \hat{p}_i & = \frac{1 - \exp \left( -2 b_i d_i \beta_{s_i} \right)}{1 - \exp \left( -2 b_i d_i \right)} \\ \widehat{RT}_{Y, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\beta_{s_i}}{\tanh \left(b_i d_i \beta_{s_i} \right)} \right\rbrace \\ \widehat{RT}_{N, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\left(1 - \beta_{s_i} \right)}{\tanh \left[b_i d_i \left(1 - \beta_{s_i} \right) \right]} \right\rbrace \\ \end{align} \]

The trial-level predictions and log-likelihoods are computed in the generated quantities block of the Stan program:

generated quantities {
    vector[nData] p_resp;
    vector[nData] mean_rt_corr;
    vector[nData] mean_rt_err;
    vector[nData] ll;
    
    for (i in 1:nData) {
        real this_drift = trial_drift[i] / (2.0 * resp[i] - 1.0);
        
        p_resp[i] = (1.0 - exp(-2.0 * trial_boundsep[i] * this_drift * bias[subject[i]])) / (1.0 - exp(-2.0 * trial_boundsep[i] * this_drift));
        
        if (isTarget[i] == 1) {
            mean_rt_corr[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - bias[subject[i]] / tanh(this_drift * trial_boundsep[i] * bias[subject[i]])) / this_drift;
            mean_rt_err[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - (1.0 - bias[subject[i]]) / tanh(this_drift * trial_boundsep[i] * (1.0 - bias[subject[i]]))) / this_drift;
        }
        else {
            mean_rt_err[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - bias[subject[i]] / tanh(this_drift * trial_boundsep[i] * bias[subject[i]])) / this_drift;
            mean_rt_corr[i] = resid[subject[i]] + trial_boundsep[i] * (1.0 / tanh(this_drift * trial_boundsep[i]) - (1.0 - bias[subject[i]]) / tanh(this_drift * trial_boundsep[i] * (1.0 - bias[subject[i]]))) / this_drift;
        }
        
        ll[i] = wiener_lpdf(rt[i] | trial_boundsep[i], resid[subject[i]], trial_bias[i], trial_drift[i]);
    }
}