1 Introduction

The Exemplar-Based Random Walk (EBRW) is a dynamic extension of the Generalized Context Model (GCM). According to these models, each time you experience an item in a particular context, it leaves a “trace” in memory. This trace is called an “exemplar” or an “instance” and it consists of a record of the values the item had for particular features. When you are presented with a probe item, it activates each of these exemplars in memory in proportion to how similar the features of the probe are to the features of the exemplars in memory. The summed activation across all exemplars in memory is termed the familiarity of the probe. The extent to which familiarity is higher than a criterion determines the extent to which the probe will be recognized as having been seen before. Specifically, familiarity determines the rate at which evidence accumulates toward either a “yes” or “no” decision regarding whether the probe item was seen before. If the accumulated evidence reaches an upper threshold, a “yes” decision is made; otherwise, if the accumulated evidence reaches a lower threshold, a “no” decision is made.

The text below first describes how familiarity is determined by the summed similarity between a probe and the exemplars in memory, using the concrete example of items consisting of patches of color. Then, the dynamics of evidence accumulation are described. These dynamics are first described as a random walk between the thresholds for “yes” and “no” decisions. Then, we illustrate how the same dynamics can be described in continuous time as a diffusion process instead of a random walk. Finally, we see how the continuous diffusion process makes it easier to fit the parameters of the EBRW model to recognition data.

2 Summed similarity

For concreteness, we imagine an experiment in which several color patches were studied. Colors can be described in terms of three features: hue, chroma (or “saturation”), and luminance (or “brightness”). Imagine that you saw many reddish-hued colors that varied in chroma and luminance. According to the GCM, each of the colors you saw would leave a trace in memory. Each trace would consist of the values of the color you saw along those three dimensions. In this example, all the colors have the same value for “hue”, which is 0. The value of “0” for “hue” corresponds to a reddish color. The colors differ in their values for chroma and luminance, as illustrated in the graph below:

colDF <- expand_grid(h = hue, c = seq(30, 70, length.out = 5), l = seq(30, 70, length.out = 5)) %>%
    mutate(col = hcl(h, c, l))

colDF %>%
    ggplot(aes(x = c, y = l, fill = col)) +
    geom_tile(width = 2.5, height = 2.5) +
    scale_fill_identity() +
    coord_equal() +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", caption = bquote(plain(hue) == .(round(hue))))

Your memory traces can also be written as a matrix, where each row corresponds to an item (color) and each column corresponds to a feature (h for “hue”, c for “chroma”, and l for “luminance”):

knitr::kable(head(colDF %>% select(h, c, l), 10), row.names = TRUE)
h c l
1 0 30 30
2 0 30 40
3 0 30 50
4 0 30 60
5 0 30 70
6 0 40 30
7 0 40 40
8 0 40 50
9 0 40 60
10 0 40 70

2.1 Perceived distance and similarity

The perceived distance between any two of these colors is the Euclidean distance between their feature values, weighted by the degree of attention given to each feature: \[ d_{ij} = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{ik} - x_{jk} \right)^2} \] where \(w_k\) is the weight given to feature \(k\), \(x_{ik}\) is the value of item \(i\) on feature \(k\), \(x_{jk}\) is the value of item \(j\) on feature \(k\), \(d_{ij}\) is the perceived distance between items \(i\) and \(j\), and \(N_F\) is the number of features which in this example is 3 (hue, chroma, and luminance). Note also that while Euclidean distance is appropriate for colors, other types of distance metrics may be more appropriate for other kinds of stimuli.

Perceived similarity between two items, \(s_{ij}\), is an exponential function of the psychological distance between those items: \[ s_{ij} = \exp(-c d_{ij}) \] where \(c\) is a sensitivity parameter that controls how quickly perceived similarity decreases with distance, as illustrated in the graph below:

expand_grid(d = seq(0, 10, length.out = 151), c = seq(0.25, 3, by = 0.25)) %>%
    mutate(s = exp(-c * d)) %>%
    ggplot(aes(x = d, y = s, color = c, group = c)) +
    geom_line() +
    scale_color_continuous_sequential() +
    labs(x = expression(d[ij]), y = expression(s[ij])) +
    theme(legend.position = c(1, 1), legend.justification = c(1, 1))

2.2 Summed similarity

Imagine that you have just been shown two colors from among the set shown earlier. We then present you with a third color—a probe item—and ask whether it was one of the two you just saw. You answer this question on the basis of the summed similarity between the probe item and the two items in memory. The probe item has its own vector of feature values, \(\mathbf{x_q}\). The perceived distance between the probe item and each of the memory items is, as above, the Euclidean distance between the probe’s feature values and those of the memory item: \[ \begin{align} d_{qj} & = \sqrt{\sum_{k = 1}^{N_F} w_k \left(x_{qk} - x_{jk} \right)^2} \\ s_{qj} & = \exp \left( -c d_{qj} \right) \\ S & = \sum_{j = 1}^{N_M} s_{qj} \end{align} \] and, again, the perceived similarity \(s_{qj}\) between the probe \(q\) and memory item \(j\) is an exponential function of perceived distance \(d_{qj}\). Finally, summed similarity \(S\) is the sum of the perceived similarities across all \(N_M\) items in memory.

The graphs below illustrate how this works. The left graph shows contours of equal similarity from each of two study items. The right graph shows summed similarity as a function of the chroma and luminance of a probe item (assuming the same hue).

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

sens <- 0.05

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-sens * d))

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 1, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

2.2.1 Varying item strength

It is reasonable to believe that some memory traces are stronger than others, likely due to things like primacy and recency. In GCM/EBRW, “strength” is operationalized as a scaling factor \(m_j\) applied to perceived similarity: \[ s_{qj} = m_j \exp \left( -c d_{qj} \right) \]

Stronger traces have their similarity multiplied by a large value (\(m_j\) is large if trace \(j\) is strong) while weaker traces have their similarity multiplied by a small value (\(m_j\) is small if trace \(j\) is weak). This is illustrated in the pair of graphs below. A probe item does not need to be as similar to a strong item in order to evoke the same level of perceived similarity.

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

sens <- 0.05
strength <- c(1, 2)

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l)) %>%
    mutate(label = c("1", "2"))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-sens * d) * strength[item])

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    geom_text(aes(label = label), color = "black") +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), c == .(signif(sens, 3)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

2.2.2 Varying item specificity

The specificity parameter \(c\) represents the fact that items are assumed to be encoded with some degree of error/uncertainty. Just as items may be encoded with more or less strength, it is reasonable to assume that items can be encoded in memory with more or less specificity. Thus, we add a subscript to the \(c\) parameter corresponding to each study item: \[ s_{qj} = m_j \exp \left( -c_j d_{qj} \right) \]

If an item is encoded with high specificity, then it will only be perceived as similar to the probe if the probe is very close in psychological space. This is illustrated in the pair of graphs below, where item 2 is not only stronger (\(m_2 = 2\) vs. \(m_1 = 1\)) but also more precise (\(c_2 = 0.1\) vs. \(c_1 = 0.05\)).

X <- matrix(
    c(40, 60,
      60, 50),
    nrow = 2,
    ncol = 2,
    byrow = TRUE
)

specificity <- c(0.05, 0.1)
strength <- c(1, 2)

colDF <- tibble(h = hue, c = X[,1], l = X[,2]) %>%
    mutate(col = hcl(h, c, l)) %>%
    mutate(label = c("1", "2"))

item_cols <- as.vector(colDF$col)
names(item_cols) <- as.character(1:length(item_cols))

simDF <- expand_grid(c = seq(30, 70, length.out=101), l = seq(30, 70, length.out=101), item = c(1, 2)) %>%
    mutate(d = sqrt((c - X[item, 1])^2 + (l - X[item, 2])^2)) %>%
    mutate(s = exp(-specificity[item] * d) * strength[item])

conPlot <- colDF %>%
    ggplot(aes(x = c, y = l)) +
    geom_contour(aes(z = s, color = factor(item)), data = simDF, alpha = 2/3, breaks = seq(0, 2, by = 0.2)) +
    geom_tile(aes(fill = col), width = 2.5, height = 2.5) +
    geom_text(aes(label = label), color = "black") +
    scale_fill_identity(aesthetics = "fill") +
    scale_color_manual(values = item_cols, aesthetics = "color", guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Contours of equal similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))

sumSimPlot <- simDF %>%
    group_by(c, l) %>%
    summarize(summed_sim = sum(s), .groups = "keep") %>%
    ggplot(aes(x = c, y = l, fill = summed_sim)) +
    geom_raster() +
    scale_fill_viridis_c(option = "magma", limits = c(0, 3), guide = "none") +
    coord_equal(xlim = c(30, 70), ylim = c(30, 70)) +
    labs(x = "Chroma (saturation)", y = "Luminance (brightness)", subtitle = "Summed similarity", caption = bquote(list(plain(hue) == .(round(hue)), m[1] == .(signif(strength[1], 3)), m[2] == .(signif(strength[2], 3)), c[1] == .(signif(specificity[1], 3)), c[2] == .(signif(specificity[2], 3)))))

conPlot + sumSimPlot + plot_layout(nrow = 1)

3 Making a recognition decision: From random walk to diffusion

According to the EBRW, each trace \(j\) in memory races to be retrieved at a rate proportional to its perceived similarity to the probe item, \(s_{qj}\). Traces race not just against one another, but against a criterion. If a memory trace wins the race, this is taken as evidence that the probe item matches something in memory, thus favoring a “yes” recognition response. If the criterion wins the race instead, this is taken as evidence that the probe item does not match anything in memory, thus favoring a “no” recognition response.

The idea is that, if the probe item matches something in memory, then the corresponding memory trace (or one sufficiently similar to probably match) should be able to win against the criterion. If nothing wins against the criterion, then this suggests there are no traces in memory that are a good match to the probe.

3.1 Accumulating memory evidence

The outcome of each race is added to a running tally which starts at zero. The value of the tally at any given time \(t\), which we can denote \(x(t)\), constitutes the current state of evidence from memory for making a recognition decision. Assume that each race takes \(\Delta\) seconds to complete. Each time a memory trace wins the race, the tally gets incremented by one; each time the criterion wins the race, the tally gets decremented by one. Put formally, we can write this process as \[ x\left( t + \Delta \right) = \begin{cases} x\left( t \right) + 1 & \text{if trace wins} \\ x\left( t \right) - 1 & \text{if criterion wins} \end{cases} \] where \(x(0) = 0\).

3.2 Decision boundaries

You commit to a recognition decision when your accumulated memory evidence is either sufficiently high, leading to a “yes” response, or sufficiently low, leading to a “no” response. We can write the upper and lower decision boundaries as \(B_Y\) and \(B_N\), respectively. For example, imagine setting \(B_Y = 7\) and \(B_N = -8\). In that case, you would say “yes” when \(x(t) = B_Y = 7\), i.e., when a memory trace had won seven more races than the criterion. You would say “no” when \(x(t) = B_N = -8\), i.e., when the criterion had won eight more races than any memory trace. In this example, you would have a slight bias toward saying “yes”, in the sense that you require less evidence to say “yes” than to say “no”. We might therefore expect that, all else being equal, you would tend to say “yes” faster than you would say “no”, since in general it will take less time to accumulate an advantage of \(B_Y = 7\) as opposed to an advantage of \(|B_N| = 8\).

3.3 Step probabilities

Although we have specified that whether memory evidence goes up or down depends on whether a trace or the criterion wins the race, we have not yet specified how the outcome of that race is determined. The winner of each race is random but depends on the similarity \(s_{qj}\) between each trace \(j\) and the probe \(q\). Specifically, trace \(j\) wins the race with probability: \[ \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} \] where \(\kappa\) is a nonnegative number that represents how stringent the criterion is. In other words, the equation above says that the probability that trace \(j\) wins the race is the similarity between trace \(j\) and the probe \(q\) divided by the summed similarity across all traces plus the criterion \(\kappa\). The probability that the criterion wins is \[ \frac{\kappa}{\sum_{k = 1}^N s_{qk} + \kappa} \] In a sense, we can think of the criterion is like a “virtual memory trace” that races alongside the \(N\) actual memory traces.

Remember that we increment memory strength whenever any trace wins the race, regardless of which one it is. Because only one trace can win each race, the probability that any trace wins is just the sum of the probabilities of winning across all \(N\) traces, i.e.: \[ \sum_{j = 1}^N \frac{s_{qj}}{\sum_{k = 1}^N s_{qk} + \kappa} = \frac{1}{\sum_{k = 1}^N s_{qk} + \kappa} \left( \sum_{j = 1}^Ns_{qj} \right) = \frac{S}{S + \kappa} \] where \(S = \sum_{j = 1}^N s_{qj}\) is the summed similarity across all \(N\) traces.

3.4 Adjusting criterion for within-list similarity

It is reasonable to think that the criterion \(\kappa\) could be adjusted in response to a number of factors. One such factor might be list length; it would make sense to set a more stringent criterion for a longer list, since there will be a lot of incidental similarity between probes and traces.

However, in the present application, we only have one list length (\(N = 2\)). Another factor that can be important in adjusting criterion is within-list similarity. If the items within a list are themselves highly similar, then it makes sense to set a more stringent criterion since otherwise you may falsely recognize a probe that happens to be similar to both items. In other words, recognition should depend, at least in part, on judging whether the probe and trace(s) are more similar to one another than the traces are to themselves.

Conceptually, within-list similarity is the average similarity between all pairs of list items. Practically, this is a bit challenging to calculate because similarity depends on the strength and specificity with which the items are encoded.

To calculate within-list similarity, let’s first consider a pair of traces \(j\) and \(k\). If we treat trace \(k\) as if it were a probe (like \(q\)), then the similarity between \(k\) and \(j\) would be \[ s_{kj} = m_j \exp \left(-c_j d_{kj} \right) \] where the strength \(m_j\) and specificity \(c_j\) come from trace \(j\). We can also go the other direction and treat trace \(j\) as if it were a probe, resulting in \[ s_{jk} = m_k \exp \left(-c_k d_{jk} \right) \] We assume that the perceived similarity between traces \(j\) and \(k\) is just the average of \(s_{kj}\) and \(s_{jk}\), i.e., \[ \frac{s_{kj} + s_{jk}}{2} = \frac{1}{2} \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \]

In our present application, \(N = 2\), so the formula above is all we need to find the within-list similarity; just set \(j = 1\) and \(k = 2\). In general, however, within-list similarity can be written as \[ \begin{align} W & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_j \exp \left(-c_j d_{kj} \right) + m_k \exp \left(-c_k d_{jk} \right) \right] \end{align} \] where we divide by \(N \left(N - 1 \right)\) because that is the number of pairwise similarities entering into the average. Notice also that the indexes \(j\) and \(k\) are cleverly chosen so as to cover every unique \(jk\) pair.

Finally, we can write the probability of taking a step up (i.e., of a trace winning the race) as \[ \frac{S}{S + \gamma W + \kappa} \] where \(\gamma W + \kappa\) reflects the criterion adjusted for within-list similarity, and \(\gamma\) is a free parameter (assumed to be nonnegative) that reflects the strength of the adjustment.

3.5 Exemplar-Based Random Walk

Finally, we are in a position to fully specify the Exemplar-Based Random Walk (EBRW) model. Let \(i\) be the index of the current experimental trial and \(s_i\) indicate the participant who produced trial \(i\). That participant is associated with a matrix of perceived distances \(\mathbf{d_{s_i}}\), where each entry in the matrix \(d_{{s_i}, jk}\) is the perceived distance between items \(j\) and \(k\) for participant \(s_i\). The probe item on trial \(i\) is item \(q_i\). The item studied in position \(j\) on trial \(i\) is item \(j_i\) (same goes for \(k\)). The summed similarity \(S_i\), within-list similarity \(W_i\), and step probability \(p_i\) for trial \(i\) can then be written \[ \begin{align} S_i & = \sum_{j = 1}^N m_{s_i, j} \exp \left( -c_{s_i, j} d_{s_i, q_i, j_i} \right) \\ W_i & = \frac{1}{N \left(N - 1 \right)}\sum_{j = 1}^{N - 1} \sum_{k = j + 1}^N \left[ m_{s_i, j} \exp \left(-c_{s_i, j} d_{s_i, k_i, j_i} \right) + m_{s_i, k} \exp \left(-c_{s_i, k} d_{s_i, j_i, k_i} \right) \right] \\ p_i & = \frac{S_i}{S_i + \gamma_{s_i} W_i + \kappa_{s_i}} \end{align} \] where \(N\) is the number of items in the list, \(m_{s_i, j}\) are free parameters for the strengths of items in different study positions, \(c_{s_i, j}\) are the specificity parameters for items in different study positions, and \(\gamma_{s_i}\) and \(\kappa_{s_i}\) are free parameters for the participant’s within-list similarity and criterion.

The EBRW models the speed of decision making in terms of how many races must be run until the accumulated win advantage in favor of either a “yes” or “no” response reaches a decision boundary. To convert this to “real time”, we must say how long each race takes and allow for a residual time that accounts for the additional time needed to orient attention to the trial and execute the selected motor response. Above, we used \(\Delta\) to stand for the amount of time (in seconds) each race takes to run. We add a subscript \(s_i\) to make \(\Delta_{s_i}\) to make clear that \(\Delta\) is a parameter that can vary between people. In addition, it will be convenient later to think instead of \(\nu_{s_i} = \frac{1}{\Delta_{s_i}}\), where \(\nu_{s_i}\) is the number of races per second. Finally, parameter \(R_{s_i}\) stands for the additional residual time.

The figure below shows an example of how memory evidence evolves during a single trial in which \(p_i = 0.55\), \(\nu_{s_i} = 40\), \(B_{s_i, Y} = 7\), \(B_{s_i, N} = -8\), and \(R_{s_i} = 0.2\). In addition, the graphs above and below the evidence trajectory illustrate the relative frequency with which, across many identical trials, each type of response would be made at each unit of time. Note that these distributions are discrete because the random walk operates in discrete time intervals, each of duration \(\Delta_{s_i}\) (which in this example is \(\Delta_{s_i} = \frac{1}{\nu_{s_i}} = 0.025\) seconds).

set.seed(1)

nu <- 40
p <- 0.55

B <- c(-8, 7)
resid <- 0.2

### RT distributions

Y_rw <- seq(B[1], B[2])
P_rw <- matrix(0, nrow = length(Y_rw), ncol = length(Y_rw))
P_rw[cbind(2:(nrow(P_rw) - 1), 1:(ncol(P_rw) - 2))] <- 1 - p
P_rw[cbind(2:(nrow(P_rw) - 1), 3:ncol(P_rw))] <- p
P_rw[1, 1] <- P_rw[nrow(P_rw), ncol(P_rw)] <- 1

### Simulation

while (TRUE) {
    winner <- 0
    x_rw <- 0

    while (TRUE) {
        s <- 2 * (runif(n = 1) < p) - 1
        x_rw <- c(x_rw, x_rw[length(x_rw)] + s)
        if (x_rw[length(x_rw)] <= B[1]) {
            winner <- 1
            break
        } else if (x_rw[length(x_rw)] >= B[2]) {
            winner <- 2
            break
        }
    }
    
    if (winner == 2 & (length(x_rw) / nu) > 1.5) {
        break
    }
}

RT_rw <- matrix(0, nrow = 2, ncol = length(x_rw))
Z_rw <- 1 * c(Y_rw == 0)

for (i in 1:length(x_rw)) {
    Z_rw <- Z_rw %*% P_rw
    RT_rw[1, i] <- Z_rw[1]
    RT_rw[2, i] <- Z_rw[length(Z_rw)]
}

dRT_rw <- apply(RT_rw, MARGIN = 1, FUN = diff) * nu / 2

rtPlot1 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(fill = "#eeb211", color = NA, width = 1 / nu) +
    coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_rw)))) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rtPlot0 <- tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(fill = "#46166b", color = NA, width = 1 / nu) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    scale_x_continuous(limits = c(0, NA)) +
    scale_y_reverse(limits = c(max(c(dRT_rw)), 0)) +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rwPlot <- tibble(t = resid + 1:length(x_rw) / nu, x = x_rw) %>%
    ggplot(aes(x = t, y = x)) +
    geom_step(linewidth = 1.5) +
    geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", linewidth = 2) +
    geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", linewidth = 2) +
    geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", linewidth = 2) +
    geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
    geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
    geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
    coord_cartesian(xlim = c(0, NA)) +
    labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu))))

rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))

3.6 From discrete random walk to continuous diffusion

A random walk takes discrete-valued steps either up or down in discrete units of time. A Wiener diffusion process takes continuous-valued steps sampled from a normal distribution in infinitely small units of time, thus effectively operating in continuous time. We are going to approximate the discrete EBRW with a continuous diffusion process (so technically we should call this model the EBD for Exemplar-Based Diffusion, but we will keep calling it the EBRW for posterity).

In going from a random walk to a diffusion model, we are making an important psychological claim: We are saying that, instead of memory evidence arriving in discrete units at regular intervals, memory evidence is a continuous value that continually evolves as new information arrives. We can think of this as saying that, instead of only knowing the outcome of each race, you can see who is ahead and who is behind at any given time; this is the move from discrete time to continuous time. Moreover, instead of only scoring each race as a win or loss for the memory traces, the races are assigned a continuous value depending on how clear the winner is; this is the move from discrete evidence to continuous evidence.

3.6.1 Mean and standard deviation of diffusion

We can write the update equation for the random walk like we did above: \[ \begin{align} x \left( t + \Delta_{s_i} \right) & = \begin{cases} x(t) + 1 & \text{with probability } p_i \\ x(t) - 1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & = \begin{cases} 1 & \text{with probability } p_i \\ -1 & \text{with probability } 1 - p_i \end{cases} \\ x \left( t + \Delta_{s_i} \right) - x(t) & \sim 2 \times \text{Bernoulli} \left( p_i \right) - 1 \end{align} \] where we have rearranged terms and used the shorthand in the final line to emphasize the idea that each step of the random walk can be thought of as a sample from a Bernoulli distribution with parameter \(p_i\) that is then transformed from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\).

The update equation above is for a single race, but what if we want to know the distribution of the random walk after a particular span of time? Specifically, what is the distribution of the total change in memory evidence over one second of time? It depends on how many races can be run within one second, which is \(\frac{1}{\Delta_{s_i}} = \nu_{s_i}\). Because the total change over one second is the sum of \(\nu\) Bernoulli random variables each with parameter \(p_i\), it is distributed as a Binomial distribution. As above, this distribution gets transformed to reflect the change in scale in the outcome of each race from \(\lbrace 0, 1 \rbrace\) to \(\lbrace -1, 1 \rbrace\): \[ x \left( t + 1 \right) - x(t) \sim 2 \times \text{Binomial} \left( \nu_{s_i}, p_i \right) - 1 \]

One last step to turn this into a continuous diffusion: We need to approximate the binomial distribution with a normal distribution that has the same mean and variance. The mean and variance of this normal distribution are the mean and variance of the diffusion process that approximates the random walk. The mean of a binomial distribution is \(\nu_{s_i} p_i\) and its variance is \(\nu_{s_i} p_i \left(1 - p_i \right)\). This leads to the normal approximation \[ \begin{align} x \left( t + 1 \right) - x(t) & \sim 2 \times \text{Normal} \left[ \nu_{s_i} p_i, \nu_{s_i} p_i \left(1 - p_i \right) \right] - 1 \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left[ 2\nu_{s_i} p_i - 1, 4\nu_{s_i} p_i \left(1 - p_i \right) \right] \\ x \left( t + 1 \right) - x(t) & \sim \text{Normal} \left( \mu_i, \sigma^2_i \right) \\ \end{align} \] where, to summarize, the mean of the diffusion on trial \(i\) is \(\mu_i\) and the variance of the diffusion on trial \(i\) is \(\sigma_i^2\), both of which depend on step probability \(p_i\) and rate \(\nu_{s_i}\): \[ \begin{align} \mu_i & = \nu_{s_i} \left( 2 p_i - 1 \right) \\ \sigma^2_i & = 4 \nu_{s_i} p_i \left(1 - p_i \right) \\ \sigma_i & = 2 \sqrt{\nu_{s_i} p_i \left(1 - p_i \right)} \end{align} \]

The mean \(\mu_i\) is the drift rate of the diffusion process. The variance \(\sigma_i^2\) is the noise in the diffusion process. For each infinitesimal unit of time \(dt\), the current state of memory evidence is updated according to \[ dx(t) = \mu_i dt + \epsilon_i(t) \] where \[ \epsilon_i(t) \sim \text{Normal} \left( 0, \sigma_i^2 dt \right) \] is a random variable sampled from a Normal distribution with mean zero and variance \(\sigma_i^2 dt\).

Notice an important property of the noise \(\sigma^2_i\): It is largest when \(p_i = 0.5\) and approaches zero as \(p_i\) approaches either 0 or 1. In other words, the more uncertain the outcome of each race, the more noise there is in the diffusion. This is illustrated below:

expand_grid(p = seq(0, 1, length.out = 101), nu = seq(10, 40, by = 10)) %>%
    mutate(sigma2 = 4 * nu * p * (1 - p)) %>%
    ggplot(aes(x = p, y = sigma2, color = nu, group = nu)) +
    geom_line() +
    labs(x = expression(p[i]), y = expression(sigma[i]^2), color = expression(nu[s[i]])) +
    theme(legend.position = c(1, 1), legend.justification = c(1, 1))

To summarize, the difference between the random walk and the diffusion is that we have swapped out a discrete binomial distribution of evidence increments per unit time with a continuous normal distribution of evidence increments per unit time. Everything else is the same: You still respond “yes” if and when the accumulated evidence \(x(t)\) reaches either the upper boundary \(B_{s_i, Y}\) or the lower boundary \(B_{s_i, N}\) and

3.6.2 Closeness of predictions

To illustrate how well the diffusion process approximates the random walk, the graphs below show the diffusion approximation to the same random walk example used above. The smooth lines in the upper and lower graphs are the probability of responding per unit time (i.e., the probability density function) according to the Wiener diffusion model. The open bars are the same probabilities from the random walk. The diffusion model’s predictions hew very closely to those of the random walk!

mu <- nu * (2 * p - 1)
sigma2 <- 4 * nu * p * (1 - p)
boundsep <- B[2] - B[1]
bias <- (0 - B[1]) / (B[2] - B[1])
delta_t <- 0.001

while (TRUE) {
    winner_diff <- NA
    x_diff <- 0
    
    while (TRUE) {
        x_diff <- c(x_diff, x_diff[length(x_diff)] + rnorm(n = 1, mean = mu * delta_t, sd = sqrt(sigma2 * delta_t)))
        if (x_diff[length(x_diff)] <= B[1]) {
            winner_diff <- 1
            break
        } else if (x_diff[length(x_diff)] >= B[2]) {
            winner_diff <- 2
            break
        }
    }
    
    if (winner == winner_diff & abs((length(x_diff) * delta_t) - (length(x_rw) / nu)) < (1 / nu)) {
        break
    }
}

x_diff <- pmax(pmin(x_diff, B[2]), B[1])

t <- seq(1, length(x_diff)) * delta_t

dRT_diff <- cbind(
    WienerPDF(t = t, response = "lower", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value,
    WienerPDF(t = t, response = "upper", a = boundsep / sqrt(sigma2), v = mu / sqrt(sigma2), w = bias)$value
)

rtPlot1 <- tibble(t = resid + t, p = dRT_diff[,2]) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,2])), color = "#eeb211aa", fill = NA, width = 1 / nu) +
    geom_area(fill = "#eeb21177", color = "#eeb211") +
    coord_cartesian(xlim = c(0, NA), ylim = c(0, max(c(dRT_diff)))) +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rtPlot0 <- tibble(t = resid + t, p = dRT_diff[,1]) %>%
    ggplot(aes(x = t, y = p)) +
    geom_col(data = tibble(t = resid + 1:length(x_rw) / nu, p = c(0, dRT_rw[,1])), color = "#46166baa", fill = NA, width = 1 / nu) +
    geom_area(fill = "#46166b77", color = "#46166b") +
    labs(x = NULL, y = "Pr(Respond at time t)") +
    scale_x_continuous(limits = c(0, NA)) +
    scale_y_reverse(limits = c(max(c(dRT_diff)), 0)) +
    theme(axis.text = element_blank(), axis.ticks.y = element_blank())

rwPlot <- tibble(t = resid + t, x = x_diff) %>%
    ggplot(aes(x = t, y = x)) +
    geom_line() +
    geom_hline(yintercept = B[2], linetype = "solid", color = "#eeb211", linewidth = 2) +
    geom_hline(yintercept = B[1], linetype = "solid", color = "#46166b", linewidth = 2) +
    geom_vline(xintercept = resid, linetype = "dashed", color = "#666666", linewidth = 2) +
    geom_text(data = tibble(x = 0, y = B[2], label = paste0("B[list(Y, s[i])] == ", B[2])), mapping = aes(x = x, y = y, label = label), color = "#eeb211", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = 2) +
    geom_text(data = tibble(x = 0, y = B[1], label = paste0("B[list(N, s[i])] == ", B[1])), mapping = aes(x = x, y = y, label = label), color = "#46166b", inherit.aes = FALSE, parse = TRUE, hjust = 0, vjust = -1) +
    geom_text(data = tibble(x = resid, y = 0, label = paste0("R[s[i]] == ", resid)), mapping = aes(x = x, y = y, label = label), color = "#666666", inherit.aes = FALSE, parse = TRUE, hjust = 1.1, vjust = 1.5) +
    coord_cartesian(xlim = c(0, NA)) +
    labs(x = "Retrieval time (s)", y = "Memory evidence", caption = bquote(list(p[i] == .(p), nu[s[i]] == .(nu), mu[i] == .(signif(mu, 3)), sigma[i] == .(signif(sqrt(sigma2), 3)))))

rtPlot1 + rwPlot + rtPlot0 + plot_layout(ncol = 1, heights = c(1, 1.75, 1))

4 Implementation in R

To find the EBRW parameters that best fit the recognition data from each participant, I implemented the EBRW in R using the WienR package.

4.1 The data

We model the data at the level of individual trials. For each trial i, we know

  • subject[i]: The numerical index of the participant who produced the trial.
  • study_item[i, 1] and study_item[i, 2]: These are the numerical indexes of the first and second items presented on the study list just prior to this trial. For example, if the study list for trial i was items 6 and 3 in that order, then study_item[i, 1] would be 6 and study_item[i, 2] would be 3.
  • probe_item[i]: The numerical index of the probe item presented on trial i.
  • resp[i]: The response the participant gave on trial i, either 0 (for “no”) or 1 (for “yes”).
  • rt[i]: The response time of the participant on trial i.
  • dist[1:nSubj, 1:nItem, 1:nItem]: An array which is basically a list of matrices of the perceived distances between all nItem items, inferred from Multidimensional Scaling for each participant. dist[s, j, k] gives the distance perceived by participant s between item j and item k.

4.2 Parameters to be estimated

For each participant s, there are a total of nine parameters to be estimated. These are:

  • retrieval_rate[s]: This is the number of races run per second, labeled \(\nu\) above.
  • boundsep[s]: This is the difference \(B_Y - B_N\), called boundary separation. Since the decision boundaries of a diffusion process do not need to be whole numbers, it is convenient to use this and the following parameter instead of \(B_Y\) and \(B_N\).
  • bias[s]: This is a number between 0 and 1 representing the degree of bias in how participant s sets their decision boundaries. \(B_Y\) is given by boundsep[s] * (1 - bias[s]) and \(B_N\) is given by -boundsep[s] * bias[s]. That way, if bias[s] is 0.5, the diffusion process is unbiased because \(B_Y = -B_N\) and you need exactly the same amount of evidence before you are willing to make either a “yes” or “no” response. If bias[s] is greater than 0.5, then s is biased in favor of “yes” responses while if bias[s] is less than 0.5, then s is biased in favor of “no” responses.
  • resid[s]: The residual time, other than that needed for the diffusion process, needed to orient to the probe item on the trial and execute the motor response. By definition, this must be a number between 0 and the minimum RT produced by participant s. This was called \(R\) above.
  • strength1[s]: The strength in memory of the trace of the first list item, \(m_{s_i, 1}\) above. Since strengths are on an arbitrary scale, we assume that \(m_{s_i, 2} = 1\) for all participants. That means strength1[s] is the strength of the first studied item relative to the strength of the second for participant s.
  • specificity[s]: This is the mean specificity for participant s, called parameter \(c\) above, which together with the next parameter is used to define the specificity for each serial position.
  • specificity_diff[s]: This is a number between 0 and 1 that, combined with the parameter above, specifies the specificity with which subject s encodes items in each serial position. Specifically, specificity_pos[1, s] = 2 * specificity[s] * (1 - specificity_diff[s]) (the result corresponds to \(c_{s_i, 1}\) above) and specificity_pos[2, s] = 2 * specificity[s] * specificity_diff[s] (the result corresponds to \(c_{s_i, 2}\) above). If specificity_diff[s] is 0.5, then subject s encodes items in each position with the same specificity, specificity[s]. If specificity_diff[s] > 0.5, then the second (more recent) item is encoded with greater precision than the first; if specificity_diff[s] < 0.5, then the first item is encoded with greater precision than the second.
  • within_sim[s]: This is how much the criterion for participant s is raised by within-list similarity; this corresponds to parameter \(\gamma\) above.
  • criterion[s]: This is the baseline criterion for participant s, corresponding to parameter \(\kappa\) above.

4.3 The likelihood function

The whole reason why we took the original EBRW and turned it into a Wiener diffusion process was so that we could compute the likelihood of making a “yes” or “no” response at any time. What we need to do now is find the set of parameter values, for each participant, that, out of all possible sets of parameter values, assigns the highest likelihood to the observed set of responses/RT’s across all trials produced by subject s. This means we need to tell R how to calculate that likelihood. We do this by specifying a function that takes a set of parameters as well as a set of data and returns this likelihood value. We will break down the design of this function below.

4.3.1 Actually, negative log-likelihood

For one thing, we actually don’t work directly with the likelihood function. The reason is twofold: First, because the likelihood involves multiplying many small numbers, it is numerically unstable; multiply too many small numbers together and even a computer can’t distinguish that result from zero. Second, most optimization routines are minimizers by default, so we need to “reverse-code” the likelihood.

The first issue above we deal with by taking the natural logarithm of the likelihood. Let’s define the likelihood explicitly here:

\[ \text{likelihood} = \prod_{i = 1}^N f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \] where \(f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right)\) is shorthand for the likelihood of the response and RT observed on trial \(i\), given a set of EBRW parameters. Because we assume that each trial is conditionally independent (conditional on the parameters characterizing the participant that produced that trial), the likelihood of all \(N\) trials is the product of their individual likelihoods. These are all the tiny numbers that need to get added up. By taking the natural logarithm, we turn the product into a sum of log-likelihoods:

\[ \text{log-likelihood} = \sum_{i = 1}^N \log f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \]

Finally, to “reverse-code” the likelihood for the purposes of using typical optimization routines, we just take the negative log-likelihood, often abbreviated NLL.

\[ NLL = \text{negative log-likelihood} = -\sum_{i = 1}^N \log f \left(\text{resp}_i, \text{RT}_i | \text{parameters} \right) \]

Our job is now subtly changed. We are looking for the set of parameters that result in the lowest possible value of the negative log-likelihood (NLL).

4.3.2 What the function needs

Having set our goals, we must now define a function to compute the NLL for a set of observed responses/RT’s, given a set of parameters. The function itself, in outline form, looks like the one below. I have written comments for the things that the function needs to accomplish to get from what is given to the function (in the parentheses following function) to what the function needs to return at the end.

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 4. Return final result
    return(-sum(trial_log_likelihood))
}

Let’s now fill in each of those sections in turn.

4.3.3 Transforming parameters

The vector par that is the first argument to the ebrw_nll function should be a named vector that looks like the following:

par <- c(
    "retrieval_rate" = 3,
    "boundsep" = 2.3,
    "bias" = 0,
    "resid" = 0,
    "strength1" = -0.22,
    "specificity" = 0.41,
    "specificity_diff" = 0.41,
    "within_sim" = -1.2,
    "criterion" = -0.92
)

The vector par is a bit odd, however, because it specifies the values of each of those named parameters on a scale that goes from \(-\infty\) to \(\infty\). This is because many optimization routines do not allow for constraints on parameter values. But this doesn’t pose a problem for us, because we can transform the “raw” values in the par vector into the actual parameter values on their corresponding ranges, so that we can use those parameters as they were intended.

We use basically two functions to perform these transformations: exp() and plogis(). The exponential function exp() takes numbers that can fall across the entire real line and returns a value that is strictly positive. So we will use the exp() function for any parameters that are constrained to be positive but that do not have an upper bound.

The plogis() function is also called the “logistic” function. It takes values that can fall across the entire real line and returns a value that is between 0 or 1. We will use the plogis() function for any parameters that have both an upper and lower bound. If the upper bound is different from 1, then we can just multiply the transformed parameter by its proper maximum (this is what we do for residual time).

The graph below illustrates the exp() and plogis() functions.

expand_grid(x = seq(-4, 4, length.out = 201), trans = c("exp", "plogis")) %>%
    mutate(y = if_else(trans == "exp", exp(x), plogis(x))) %>%
    ggplot(aes(x = x, y = y, color = trans, linetype = trans)) +
    geom_hline(yintercept = c(0, 1), linetype = "dotted", color = "black") +
    geom_line() +
    coord_cartesian(ylim = c(0, 3)) +
    labs(x = "Raw value", y = "Transformed value", color = NULL, linetype = NULL) +
    theme(legend.position = c(0.01, 0.99), legend.justification = c(0, 1))

We can now update our definition of the ebrw_nll function to include the necessary parameter transformations:

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
    min_rt <- min(rt)
    
    # 1b. Transform the raw parameters as needed
    retrieval_rate <- exp(par["retrieval_rate"])
    boundsep <- exp(par["boundsep"])
    bias <- plogis(par["bias"])
    resid <- plogis(par["resid"]) * min_rt
    strength1 <- exp(par["strength1"])
    specificity <- exp(par["specificity"])
    specificity_diff <- plogis(par["specificity_diff"])
    within_sim <- exp(par["within_sim"])
    criterion <- exp(par["criterion"])
    
    # 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
    specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
    specificity_pos2 <- 2 * specificity * specificity_diff
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 4. Return final result
    return(-sum(trial_log_likelihood))
}

4.3.4 Computing the mean and SD of the drift for each trial

Armed with our properly-transformed parameter values, we can now use them, in conjunction with the dist, study_item, and probe_item arguments, to compute what the mean and SD of the drift will be for each trial.

4.3.4.1 A single trial example

Assume that the index variable i indicates which trial we are currently looking at. Then the study items for that trial can be found by looking up study_items[i, ]. Specifically, study_items[i, 1] and study_items[i, 2] tell us the identities of the first and second items studied on trial i. Meanwhile, probe_item[i] tells us the identity of the probe item on trial i. As a result, dist[study_items[i, 1], probe_item[i]] will give us the perceived distance between the first study item and the probe item on trial i. The similarity between the probe item and the first studied item on trial i is then given by strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], probe_item[i]]). Finally, the summed similarity between the probe item and both studied items on trial i can be computed and stored as trial_sum_sim:

trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], probe_item]) + exp(-specificity_pos2 * dist[study_items[i, 2], probe_item])

We also need to compute the average similarity between items within the list, too. As described in mathematical form above, the similarity between the first and second items is strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], study_items[i, 2]]) while the similarity between the second and first items is strength2 * exp(-specificity_pos2 * dist[study_items[i, 1], study_items[i, 2]]). Thus, we can define the average within-list similarity on trial i (called trial_list_sim) as:

trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items[i, 1], study_items[i, 2]]) + exp(-specificity_pos2 * dist[study_items[i, 1], study_items[i, 2]])) / 2

At this point, we can directly define the probability of taking an upward step in the random walk and then the mean and standard deviation of the corresponding diffusion process by translating the formulas above into R code.

trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
    
trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))

4.3.4.2 Vectorizing to work with multiple trials at once

The code above was written under the assumption that we were dealing with just a single trial with index i. However, R makes it easy to do the same set of calculations for all trials at once via the miracle of vectorization. “Vectorization” means that an operation that works on a single number can also be applied at once to an entire vector (or matrix or even array) of numbers. We will can use this trick because probe_item is a vector, and so are the columns of study_items (e.g., study_items[, 1] is a vector of the first studied item in every trial).

To index the appropriate entries in the dist matrix, we use the cbind function (short for “bind into columns”) to stitch together the indices we need. For example, cbind(study_items[, 1], probe_item) gives a matrix where each row is a trial, column 1 is the first studied item on each trial, and column 2 is the second studied item on each trial. Then dist[cbind(study_items[, 1], probe_item)] gives a vector of the distances between all the first-studied items and probe items across every trial.

Armed with this vectorization trick, we can define trial_sum_sim not just for trial i, but as a vector where each entry is the summed similarity on a different trial:

trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])

For within-list similarity, we don’t even need the cbind function, since study_items is already a matrix where each row is a trial and the two columns indicate the first and second studied items:

trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2

4.3.4.3 Interim summary

We have now completed the second step of writing the ebrw_nll function, as summarized below.

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
    min_rt <- min(rt)
    
    # 1b. Transform the raw parameters as needed
    retrieval_rate <- exp(par["retrieval_rate"])
    boundsep <- exp(par["boundsep"])
    bias <- plogis(par["bias"])
    resid <- plogis(par["resid"]) * min_rt
    strength1 <- exp(par["strength1"])
    specificity <- exp(par["specificity"])
    specificity_diff <- plogis(par["specificity_diff"])
    within_sim <- exp(par["within_sim"])
    criterion <- exp(par["criterion"])
    
    # 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
    specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
    specificity_pos2 <- 2 * specificity * specificity_diff
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 2a. Compute the summed similarity between the probe and study items on each trial
    trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
    
    # 2b. Compute the within-list similarity between study items on each trial
    trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
    
    # 2c. Compute the probability of taking a step up in the random walk
    trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
    
    # 2d. Compute the mean and standard deviation of the corresponding diffusion
    trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
    trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 4. Return final result
    return(-sum(trial_log_likelihood))
}

4.3.5 Calculating the log-likelihood

This step is almost too easy. We are using the WienR package, which includes a function called WienerPDF. The WienerPDF function computes for us both the likelihood and log-likelihood of the response/RT for each trial. We just need to tell it the drift rate, boundary separation, bias, and residual time parameters.

There is only one thing we need to do before we can use the WienerPDF function. The WienerPDF function assumes that the standard deviation of the diffusion is always equal to one. As such, we need to standardize the drift rate and boundary separation before we send them to the WienerPDF function by dividing each by trial_drift_sd:

trial_standard_drift <- trial_drift_mean / trial_drift_sd
trial_standard_boundsep <- boundsep / trial_drift_sd

Then, we can send these standardized values to the WienerPDF function:

trial_wiener <- WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid)

Notice that WienerPDF expects response to be either 1 or 2 for “no” and “yes”, respectively, so we have to add 1 to resp.

Our code now looks like the following:

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
    min_rt <- min(rt)
    
    # 1b. Transform the raw parameters as needed
    retrieval_rate <- exp(par["retrieval_rate"])
    boundsep <- exp(par["boundsep"])
    bias <- plogis(par["bias"])
    resid <- plogis(par["resid"]) * min_rt
    strength1 <- exp(par["strength1"])
    specificity <- exp(par["specificity"])
    specificity_diff <- plogis(par["specificity_diff"])
    within_sim <- exp(par["within_sim"])
    criterion <- exp(par["criterion"])
    
    # 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
    specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
    specificity_pos2 <- 2 * specificity * specificity_diff
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 2a. Compute the summed similarity between the probe and study items on each trial
    trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
    
    # 2b. Compute the within-list similarity between study items on each trial
    trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
    
    # 2c. Compute the probability of taking a step up in the random walk
    trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
    
    # 2d. Compute the mean and standard deviation of the corresponding diffusion
    trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
    trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 3a. Standardize the drift rates and boundary separation for each trial
    trial_standard_drift <- trial_drift_mean / trial_drift_sd
    trial_standard_boundsep <- boundsep / trial_drift_sd
    
    # 3b. Find the likelihoods of the given responses/RT
    trial_wiener <- WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid)
    
    # 4. Return final result
    return(-sum(trial_log_likelihood))
}

4.3.6 Regularization (or keeping parameters in line)

When we are fitting the model to participants from whom we have only a few trials (e.g., a few dozen), then it is possible for the maximum likelihood parameters to be very bizarre. This is because the parameters are just fitting the idiosyncratic noise in the data. One way to counteract this would be to place hard constraints on how large each parameter can get (all the parameters are such that zero is as small as they can get). The problem with this approach is that the exact choice of constraint may matter and it is difficult to justify. Moreover, it may well be that a participant really is best characterized by a “bizarre” parameter value and we do not want to rule those out completely.

An alternative approach is to place soft constraints on the parameters. This is called regularization, because parameter estimates are regularized toward “regular” values when the data are ambiguous. Regularization is, in fact, the same idea as placing a prior distribution in a Bayesian analysis. The prior distribution acts to regularize the resulting parameter estimates so that they tend not to over-fit noise in the data.

For the purpose of estimating the EBRW, I chose the following prior distributions for each free parameter. These priors encourage the parameters not to get too bizarre but do not entirely prevent that from happening if the data require it.

log_prior <-
    # Prevents from getting too large
    dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
    # Prevents from getting too large
    dexp(boundsep, rate = 1 / 50, log = TRUE) +
    # Prevents from getting too close to 0 or 1
    dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
    # Prevents from getting too close to 0 or 1 (notice the division below)
    dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
    # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 1)
    dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
    # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 10)
    dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
    # Prevents from getting too close to 0 or 1
    dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
    # Prevents from getting too large
    dexp(within_sim, rate = 1 / 20, log = TRUE) +
    # Prevents from getting too large
    dexp(criterion, rate = 1 / 20, log = TRUE)

Notice that the quantity log_prior is the sum of the log-priors for each EBRW parameter. We will incorporate log_prior into the negative summed log-likelihood that the function returns. Doing so has the effect of forcing the optimization to find parameters that achieve the best possible balance between the likelihood and the regularizing prior.

4.3.7 Error-checking and the final result

As part of filling in the blanks for the final step, in which we return the final result, it is important for us to do some error checking. Sometimes, a particular combination of parameters will make it impossible for the WienerPDF function to calculate the log-likelihood. When that happens, it gives an error. In essence, such a result tells us that the model cannot work with that combination of parameters. Thus, rather than an “error”, that is really telling us that we should assign zero likelihood to that set of parameters, which is equivalent to a log-likelihood of \(-\infty\).

We can do that kind of check in R by putting the WienerPDF function call within try(). If the WienerPDF function gives an error, then the result that gets stored in trial_wiener is also an error. Otherwise, it just gives us the log-likelihoods that we want.

Let’s set up an “if…else” structure to do this check:

trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))

if (class(trial_wiener) == "try-error") {
    trial_log_likelihood <- -Inf
} else {
    trial_log_likelihood <- trial_wiener$logvalue
}

You will notice, too, that the log-likelihoods that we want are stored under the name logvalue in the list that is returned by the WienerPDF function.

4.3.8 The complete likelihood function

We now have a complete definition of the likelihood function below.

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
    min_rt <- min(rt)
    
    # 1b. Transform the raw parameters as needed
    retrieval_rate <- exp(par["retrieval_rate"])
    boundsep <- exp(par["boundsep"])
    bias <- plogis(par["bias"])
    resid <- plogis(par["resid"]) * min_rt
    strength1 <- exp(par["strength1"])
    specificity <- exp(par["specificity"])
    specificity_diff <- plogis(par["specificity_diff"])
    within_sim <- exp(par["within_sim"])
    criterion <- exp(par["criterion"])
    
    # 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
    specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
    specificity_pos2 <- 2 * specificity * specificity_diff
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 2a. Compute the summed similarity between the probe and study items on each trial
    trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
    
    # 2b. Compute the within-list similarity between study items on each trial
    trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
    
    # 2c. Compute the probability of taking a step up in the random walk
    trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
    
    # 2d. Compute the mean and standard deviation of the corresponding diffusion
    trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
    trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 3a. Standardize the drift rates and boundary separation for each trial
    trial_standard_drift <- trial_drift_mean / trial_drift_sd
    trial_standard_boundsep <- boundsep / trial_drift_sd
    
    # 3b. Find the likelihoods of the given responses/RT
    trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))
    
    # 4. Return final result
    
    # 4a. Check if the Wiener PDF returned an error or not
    if (class(trial_wiener) == "try-error") {
        trial_log_likelihood <- -Inf
    } else {
        trial_log_likelihood <- trial_wiener$logvalue
    }
    
    # 4b. Compute regularizing prior terms to encourage (but not constrain) parameters to stay within reasonable ranges
    log_prior <-
        # Prevents from getting too large
        dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
        # Prevents from getting too large
        dexp(boundsep, rate = 1 / 50, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 1)
        dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
        # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 10)
        dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too large
        dexp(within_sim, rate = 1 / 20, log = TRUE) +
        # Prevents from getting too large
        dexp(criterion, rate = 1 / 20, log = TRUE)
    
    # 4b. The value to be returned is the negative of the summed log likelihood plus log prior
    return(-(sum(trial_log_likelihood) + log_prior))
}

4.4 Predicted mean RT’s and response probabilities

When we fit the model, it is important to check whether its predictions are close to what was observed. Therefore, it will be important to be able to calculate, for each trial \(i\), three things:

  1. The predicted probability of saying “yes” on trial \(i\), denoted \(\hat{p}_i\).
  2. The predicted mean RT for “yes” responses on trial \(i\), denoted \(\widehat{RT}_{Y, i}\).
  3. The predicted mean RT for “no” responses on trial \(i\), denoted \(\widehat{RT}_{N, i}\).

Since we know whether the probe item on trial \(i\) was a target (from the list) or not, we can use \(\widehat{RT}_{Y, i}\) and \(\widehat{RT}_{N, i}\) to find the predicted mean correct and error RT’s. If trial \(i\) is a target, then the predicted mean correct RT is \(\widehat{RT}_{Y, i}\) and the predicted mean error RT is \(\widehat{RT}_{N, i}\). These are reversed if the probe on trial \(i\) is a foil.

You may ask why it is useful to find the mean RT’s and response probabilities for each trial when we only see one outcome per trial. You can consider the predicted mean RT’s and response probabilities as what we would get if we simulated many many repetitions of trial \(i\). In that case, the mean RT’s and response probabilities represent the best possible guesses we could make about how trial \(i\) actually turned out.

4.4.1 Formulae for predictions

As it turns out, there are closed-form (if ugly) expressions for the response probability and mean RT at each boundary for a Wiener process. These are given below and are computed as part of the Stan program. Note that these formulas refer to \(d_i\), the non-flipped scaled drift rate on trial \(i\), and \(\beta_{s_i}\), the non-flipped bias parameter for participant \(s_i\). \[ \begin{align} \hat{p}_i & = \frac{1 - \exp \left( -2 b_i d_i \beta_{s_i} \right)}{1 - \exp \left( -2 b_i d_i \right)} \\ \widehat{RT}_{Y, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\beta_{s_i}}{\tanh \left(b_i d_i \beta_{s_i} \right)} \right\rbrace \\ \widehat{RT}_{N, i} & = r_{s_i} + \frac{b_i}{d_i} \left\lbrace \frac{1}{\tanh \left( b_i d_i \right)} - \frac{\left(1 - \beta_{s_i} \right)}{\tanh \left[b_i d_i \left(1 - \beta_{s_i} \right) \right]} \right\rbrace \\ \end{align} \]

4.4.2 Computing predictions in R

Rather than define an entirely new function that will compute the predictions of the EBRW, it will be easier to add some functionality to our existing ebrw_nll function. Specifically, we will add a new function argument called return.pred which we can use to tell the function whether we want it to return the model predictions (return.pred = TRUE) or just the negative summed log-likelihood (return.pred = FALSE).

Let’s focus our attention on the block of code we will add to our ebrw_nll function. It will be surrounded by an if statement that checks whether return.pred is TRUE. If it is, then we will compute the predictions for each trial.

This block of code is shown below, along with comments describing the function of each line:

if (return.pred) {
    # Define a logical vector that is TRUE for trials in which the probe item was a target and FALSE otherwise.  We define this vector by seeing whether at least one study item is equal to the probe item on that trial.  If the number of study items that are equal to the probe item is greater than 0, that means the probe matches at least one studied item and is, therefore, a target.
    is_target <- (rowSums(study_items == matrix(probe_item, nrow = nrow(study_items), ncol = ncol(study_items))) > 0)
    
    # This is a vector giving the predicted probability of making a "yes" response for each trial.
    p_resp <- (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift * bias)) / (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift))
    
    # The following two vectors give the predicted mean "yes" and "no" response times for each trial.
    mean_yes_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - bias / tanh(trial_standard_drift * trial_standard_boundsep * bias)) / trial_standard_drift
    mean_no_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - (1 - bias) / tanh(trial_standard_drift * trial_standard_boundsep * (1 - bias))) / trial_standard_drift
    
    # The following two lines use our knowledge about which response is correct to convert "yes" and "no" into "correct" and "error" response times, depending on whether the probe is a target or not.
    mean_corr_rt <- mean_yes_rt * is_target + mean_no_rt * (1 - is_target)
    mean_err_rt <- mean_yes_rt * (1 - is_target) + mean_no_rt * is_target
    
    # Finally, since we are returning the predictions and not the summed negative log-likelihood, let's put all the predictions into a data frame where each row is a trial.
    return(data.frame(p_resp = p_resp, mean_corr_rt = mean_corr_rt, mean_err_rt = mean_err_rt))
}

We will insert the block of code above just before we computed our log-likelihoods before.

# Function arguments:
# par: this is a named vector of parameter values
# dist: this is a matrix of perceived distances (derived from MDS) between all pairs of stimuli
# study_items: this is a matrix where each row is a trial and there are two columns; the value in the first column indicates the first studied item, the value in the second column indicates the second studied item
# probe_item: this is a vector giving the index of the probe item on each trial
# resp: this is a vector where each value is 0 or 1, depending on whether the participant responsed "yes" (1) or "no" (0) on that trial
# rt: this is a vector of the response times from each trial
# return.pred: A logical indicator telling the function whether we want the model predictions for each trial (TRUE) or just the negative summed log-likelihood (FALSE); default is "FALSE"
ebrw_nll <- function(par, dist, study_items, probe_item, resp, rt, return.pred = FALSE) {
    # 1. Transform the given parameters into their appropriate ranges
    
    # 1a. Find the minimum observed RT; this is the upper bound for the residual time parameter
    min_rt <- min(rt)
    
    # 1b. Transform the raw parameters as needed
    retrieval_rate <- exp(par["retrieval_rate"])
    boundsep <- exp(par["boundsep"])
    bias <- plogis(par["bias"])
    resid <- plogis(par["resid"]) * min_rt
    strength1 <- exp(par["strength1"])
    specificity <- exp(par["specificity"])
    specificity_diff <- plogis(par["specificity_diff"])
    within_sim <- exp(par["within_sim"])
    criterion <- exp(par["criterion"])
    
    # 1c. As noted above, this trick lets us take the overall specificity ("specificity") and allocate it to each of the positions according to "specificity_diff".
    specificity_pos1 <- 2 * specificity * (1 - specificity_diff)
    specificity_pos2 <- 2 * specificity * specificity_diff
    
    # 2. Compute the mean and SD of the drift for each trial
    
    # 2a. Compute the summed similarity between the probe and study items on each trial
    trial_sum_sim <- strength1 * exp(-specificity_pos1 * dist[cbind(study_items[, 1], probe_item)]) + exp(-specificity_pos2 * dist[cbind(study_items[, 2], probe_item)])
    
    # 2b. Compute the within-list similarity between study items on each trial
    trial_list_sim <- (strength1 * exp(-specificity_pos1 * dist[study_items]) + exp(-specificity_pos2 * dist[study_items])) / 2
    
    # 2c. Compute the probability of taking a step up in the random walk
    trial_p <- trial_sum_sim / (trial_sum_sim + criterion + within_sim * trial_list_sim)
    
    # 2d. Compute the mean and standard deviation of the corresponding diffusion
    trial_drift_mean <- retrieval_rate * (2 * trial_p - 1)
    trial_drift_sd <- 2 * sqrt(retrieval_rate * trial_p * (1 - trial_p))
    
    # 3. Calculate the log-likelihood of each observed response/RT on each trial
    
    # 3a. Standardize the drift rates and boundary separation for each trial
    trial_standard_drift <- trial_drift_mean / trial_drift_sd
    trial_standard_boundsep <- boundsep / trial_drift_sd
    
    # --- If we only need to return the predictions, then compute and return them now ---
    if (return.pred) {
        # Define a logical vector that is TRUE for trials in which the probe item was a target and FALSE otherwise.  We define this vector by seeing whether at least one study item is equal to the probe item on that trial.  If the number of study items that are equal to the probe item is greater than 0, that means the probe matches at least one studied item and is, therefore, a target.
        is_target <- (rowSums(study_items == matrix(probe_item, nrow = nrow(study_items), ncol = ncol(study_items))) > 0)
        
        # This is a vector giving the predicted probability of making a "yes" response for each trial.
        p_resp <- (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift * bias)) / (1 - exp(-2 * trial_standard_boundsep * trial_standard_drift))
        
        # The following two vectors give the predicted mean "yes" and "no" response times for each trial.
        mean_yes_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - bias / tanh(trial_standard_drift * trial_standard_boundsep * bias)) / trial_standard_drift
        mean_no_rt <- resid + trial_standard_boundsep * (1 / tanh(trial_standard_drift * trial_standard_boundsep) - (1 - bias) / tanh(trial_standard_drift * trial_standard_boundsep * (1 - bias))) / trial_standard_drift
        
        # The following two lines use our knowledge about which response is correct to convert "yes" and "no" into "correct" and "error" response times, depending on whether the probe is a target or not.
        mean_corr_rt <- mean_yes_rt * is_target + mean_no_rt * (1 - is_target)
        mean_err_rt <- mean_yes_rt * (1 - is_target) + mean_no_rt * is_target
        
        # Finally, since we are returning the predictions and not the summed negative log-likelihood, let's put all the predictions into a data frame where each row is a trial.
        return(data.frame(p_resp = p_resp, mean_corr_rt = mean_corr_rt, mean_err_rt = mean_err_rt))
    }
    
    # --- If we did not need to compute and return the predictions, the program will continue here ---
    
    # 3b. Find the likelihoods of the given responses/RT
    trial_wiener <- try(WienerPDF(t = rt, response = resp + 1, a = trial_standard_boundsep, v = trial_standard_drift, w = bias, t0 = resid))
    
    # 4. Return final result
    
    # 4a. Check if the Wiener PDF returned an error or not
    if (class(trial_wiener) == "try-error") {
        trial_log_likelihood <- -Inf
    } else {
        trial_log_likelihood <- trial_wiener$logvalue
    }
    
    # 4b. Compute regularizing prior terms to encourage (but not constrain) parameters to stay within reasonable ranges
    log_prior <-
        # Prevents from getting too large
        dexp(retrieval_rate, rate = 1 / 50, log = TRUE) +
        # Prevents from getting too large
        dexp(boundsep, rate = 1 / 50, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(bias, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(resid / min_rt, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 1)
        dgamma(strength1, shape = 2.618, rate = 1.618, log = TRUE) +
        # Prevents from getting too large or small (Gamma distribution has mode = 1, SD = 10)
        dgamma(specificity, shape = 1.1, rate = 0.1, log = TRUE) +
        # Prevents from getting too close to 0 or 1
        dbeta(specificity_diff, shape1 = 1.01, shape2 = 1.01, log = TRUE) +
        # Prevents from getting too large
        dexp(within_sim, rate = 1 / 20, log = TRUE) +
        # Prevents from getting too large
        dexp(criterion, rate = 1 / 20, log = TRUE)
    
    # 4b. The value to be returned is the negative of the summed log likelihood plus log prior
    return(-(sum(trial_log_likelihood) + log_prior))
}