I’ve been exploring how it may be possible to fit SCRI to behavior alone. In our prior modeling, we fit SCRI to neural data and then predicted behavior by feeding simulated SCRI activity into a Gated Accumulator Model (GAM). There are two main challenges in fitting SCRI to behavior directly, without first fitting it to neural data.
The first challenge is actually just about GAM, not SCRI per se. Because GAM does not have a closed-form likelihood expression, it requires extensive simulation to obtain predictions. For one thing, this makes GAM extremely inefficient to work with; for example, my seven-year-old laptop can fit SCRI to a neuron in a matter of minutes, while my new multicore workstation takes an entire day to fit GAM to the RTs of a single monkey. For another, it means that we have no guarantee of finding a maximum likelihood fit, making it difficult to assess the stability of the estimated parameters and hard to compare different models. Therefore, the first challenge is how to turn GAM into a tractable and efficient model that doesn’t sacrifice its ability to fit data.
The second challenge is that, in the absence of neural data, we cannot expect to be able to identify the different contributions of SCRI’s many different competitive mechanisms. For example, we know that feedforward and lateral inhibition are both important for explaining neural dynamics, but it may not be possible to distinguish between these two forms of inhibition from behavior alone. Therefore, the second challenge is to find a minimal set of SCRI mechanisms that is identifiable from behavior alone and that, in combination with the simplified GAM, yields satisfactory fits to behavior.
From GAM to DAM
As noted above, GAM does not have a closed-form likelihood expression. There are three basic reasons for this: First, because the input to GAM is not homogeneous in time (since FEF visual neuron activity is not homogeneous in time), it is not possible to rely on existing models that assume a time-homogeneous evidence generating process, like traditional diffusion or race models. Second, the presence of lateral inhibition between GAM accumulators makes it difficult to derive approximate expressions for the joint choice-RT distributions GAM predicts. Third, GAM’s reflecting boundary at zero also makes it challenging to approximate its joint choice-RT distributions.
Since our ultimate goal is to infer SCRI dynamics from behavior, we cannot avoid dealing with the first difficulty noted above. In other words, we will have to accept that the inputs to the accumulators are dynamic, since the nature of those dynamics is exactly what we want to learn about. Instead, we can relax the other two difficulties by eschewing lateral inhibition and the reflecting boundary at zero.
These simplifications are not so drastic as they may seem at first: Although we won’t allow for lateral inhibition between accumulators, we will allow for lateral inhibition between the inputs to those accumulators, potentially capturing any effects this form of interaction would produce (Brown and Heathcote took a similar approach in their 2005 simplification of the LCA model). Also, we will see that the technique we will use to allow for time-inhomogeneous inputs will also effectively deal with the lack of an explicit reflecting boundary at zero.
Even with time-inhomogeneous inputs, the resulting model is easier to work with because we can model each accumulator independently. All we need to do is figure out how to approximate the first-passage time density for a single accumulator with a single response threshold. Then we can treat the decision as a race between dynamic accumulators.
Dealing with time-varying evidence
To model accumulation of time-inhomogeneous evidence, I adopt the basic approach described by Smith (2000), which I’ve also used to derive choice/RT predictions in my recognition modeling (Cox, 2024; Cox & Shiffrin, 2017). We first compute the means and covariances of the accumulated evidence across a number of equally-spaced timepoints starting from the beginning of a trial. This allows us to compute the probability that the level of accumulated evidence is above threshold at each timepoint. Finally, we use the covariance information to compute the probability that the accumulator first crossed threshold at each timepoint, which approximates the first-passage time density.
Assume that we have divided the time since the start of the trial into \(T\) intervals each of duration \(\Delta t\). Let \(\mathbf{\mu_i}\) be a vector describing the mean levels of accumulated evidence in each of the \(T\) intervals for accumulator \(i\) and let \(\mathbf{\Sigma_i}\) be the covariance matrix of the accumulated evidence across each of the \(T\) intervals. If \(\theta > 0\) is the response threshold, then the probability that the level of accumulated evidence is above threshold at time interval \(t\) is \[
p_i[t] = 1 - \Phi \left( \frac{\theta - \mu_i[t]}{\sqrt{\Sigma_{i}[t, t]}} \right)
\] where \(\Phi\) is the standard normal CDF. Meanwhile, we can write the conditional probability that the accumulator is above threshold at interval \(t\), given that it was above threshold at a previous interval \(s\) (\(s < t\)), as \(r_i[s, t]\); this probability must be computed numerically. Finally, we obtain the probability \(f_i[t]\) that the accumulator first crosses threshold during interval \(t\) as \[
f_i[t] = p_i[t] - \sum_{s = 1}^{t - 1} p_i[s] r_i[s, t]
\] where the sum \(\sum_{s = 1}^{t - 1} p_i[s] r_i[s, t]\) gives the probability that the accumulator was above threshold during a prior interval \(s\) and is above threshold again during interval \(t\).
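To make this concrete, here is a minimal R sketch of the first-passage computation for a single accumulator, given its mean vector and covariance matrix. It obtains the joint above-threshold probabilities (from which \(r_i[s, t]\) follows) using `mvtnorm::pmvnorm`; the function name and interface are placeholders for illustration, not the code used for the fits reported below.

```r
library(mvtnorm)

# Approximate first-passage probabilities f[t] for one accumulator, given
# the mean vector `mu` (length T), covariance matrix `Sigma` (T x T), and
# response threshold `theta`.
first_passage <- function(mu, Sigma, theta) {
  n <- length(mu)
  # Marginal probability of being above threshold at each interval: p[t]
  p <- 1 - pnorm(theta, mean = mu, sd = sqrt(diag(Sigma)))
  f <- numeric(n)
  f[1] <- p[1]
  for (t in seq_len(n)[-1]) {
    correction <- 0
    for (s in seq_len(t - 1)) {
      # Joint probability of being above threshold at both s and t, from
      # the bivariate normal implied by mu and Sigma
      joint <- pmvnorm(lower = c(theta, theta), upper = c(Inf, Inf),
                       mean = mu[c(s, t)], sigma = Sigma[c(s, t), c(s, t)])[1]
      # Since r[s, t] = P(above at t | above at s), p[s] * r[s, t] = joint
      correction <- correction + joint
    }
    # f[t] = p[t] - sum_s p[s] * r[s, t], clipped at zero
    f[t] <- max(p[t] - correction, 0)
  }
  f
}
```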
Obtaining evidence from SCRI
Armed with a method for computing the first-passage probabilities for accumulators with time-varying inputs, we now need to get \(\mathbf{\mu_i}\) and \(\mathbf{\Sigma_i}\) from SCRI. Let \(v_i[t]\) represent the activity of a SCRI salience unit with receptive field \(i\) at time \(t\). If we treat \(v_i[t]\) as the rate of a Poisson spike-generating process, then the distribution of the number of spikes produced at time \(t\) has a mean of \(v_i[t]\) and variance \(v_i[t]\) (since the mean and variance of a Poisson distribution are equal). We will retain the assumption from GAM that the input to each accumulator comes from several visual salience neurons. For simplicity, let’s treat the input to each accumulator as the average activity of \(L\) neurons. If those neurons are all described by the same dynamics, then the momentary input to the accumulator for movement field \(i\) at time \(t\) has mean \(d_i[t]\) and variance \(s^2_i[t]\): \[\begin{align}
d_i[t] & = v_i[t] \\
s^2_i[t] & = \frac{v_i[t]}{L}
\end{align}\] Note that this is closely aligned with Smith (2023), who argued for accumulator models in which the variance of the evidence is linearly related to its mean. As shown above, this is a natural consequence of assuming that the evidence is an average of Poisson processes.
The expressions above describe the samples of evidence generated at each time point \(t\). To find the mean and variance of the accumulated evidence by time \(t\), we must sum these samples and also allow for leakage of accumulated evidence over time. Letting \(\lambda_m\) represent the rate at which evidence “leaks” out of an accumulator, we can express the mean and variance of the accumulator for movement field \(i\) at time \(t\) as \[\begin{align}
\mu_i[t] & = \sum_{s = 1}^t \exp \left[-\lambda_m \left(t - s \right) \Delta t \right] d_i[s] \\
\Sigma_i[t, t] & = \sum_{s = 1}^t \exp \left[-2 \lambda_m \left(t - s \right) \Delta t \right] s^2_i[s]
\end{align}\] Note that each variance term is weighted by the squared exponential weight, since variance scales with the square of a multiplicative constant. We can express the entire covariance matrix \(\mathbf{\Sigma_i}\) more compactly by defining a \(T \times T\) matrix \(\mathbf{S}\) in which each row gives the weight with which each of the \(T\) samples contributes to the accumulated evidence at each time. Specifically, we can define the entry in row \(t\) and column \(s\) of \(\mathbf{S}\) as \[
S[t, s] = \begin{cases}
\exp \left[-\lambda_m \left(t - s \right) \Delta t \right] & \text{if } s \leq t \\
0 & \text{if } s > t
\end{cases}
\] such that, for example, if \(\Delta t = 10\) and \(\lambda_m = 0.01\), the first 5 rows and columns of \(\mathbf{S}\) would be \[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0.905 & 1 & 0 & 0 & 0 \\
0.819 & 0.905 & 1 & 0 & 0 \\
0.741 & 0.819 & 0.905 & 1 & 0 \\
0.670 & 0.741 & 0.819 & 0.905 & 1
\end{bmatrix}
\]
Then, we can express the mean and covariance of the accumulated evidence in accumulator \(i\) as \[\begin{align}
\mathbf{\mu_i} & = \mathbf{S} \mathbf{d_i} \\
\mathbf{\Sigma_i} & = \mathbf{S} \, \text{diag} \left( \mathbf{s^2_i} \right) \mathbf{S}^T
\end{align}\] where \(\text{diag} \left( \mathbf{s^2_i} \right)\) is a \(T \times T\) diagonal matrix with the vector of variances \(\mathbf{s^2_i}\) along the diagonal. We thus have all the ingredients necessary to compute the distribution of first-passage times for a single accumulator, given its input from SCRI.
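As a sketch of how these moments might be assembled in practice, the following R function builds \(\mathbf{S}\) and returns \(\mathbf{\mu_i}\) and \(\mathbf{\Sigma_i}\) from a salience-unit trajectory. Here `v` stands in for the simulated SCRI activity \(v_i[t]\); the function name and interface are placeholders.

```r
# Moments of the leaky-accumulated evidence given momentary input.
# `v`: length-T vector of salience activity v_i[t] (from simulating SCRI)
# `L`: effective number of input neurons
# `lambda_m`: movement-unit leak rate; `dt`: interval duration
accumulate_moments <- function(v, L, lambda_m, dt) {
  n <- length(v)
  d  <- v       # mean of momentary input (Poisson rate)
  s2 <- v / L   # variance of an average of L Poisson processes
  # Weight matrix: S[t, s] = exp(-lambda_m * (t - s) * dt) for s <= t, else 0
  lags <- outer(seq_len(n), seq_len(n), "-")
  S <- exp(-lambda_m * lags * dt)
  S[lags < 0] <- 0
  list(mu = as.vector(S %*% d),
       Sigma = S %*% diag(s2) %*% t(S))
}
```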
Choice-RT likelihood
Since each accumulator is conditionally independent of the others, the likelihood of making a saccade into the movement field of unit \(i\) at time interval \(t\) is \[
L \left(i, t \right) = \frac{f_i[t]}{\Delta t} \prod_{j \neq i} \left( 1 - F_j[t] \right)
\] where dividing by \(\Delta t\) (the duration of each interval) is necessary to express a probability density (as opposed to mass) and \(F_j[t]\) is the cumulative probability of unit \(j\) having reached threshold by time \(t\), i.e. \[
F_j[t] = \sum_{s = 1}^t f_j[s]
\]
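A minimal R sketch of this race likelihood, assuming the per-accumulator first-passage vectors have already been computed (e.g., with the `first_passage` sketch above); names are again placeholders.

```r
# Likelihood of a saccade into movement field i during interval t.
# `f_list`: list of first-passage vectors f_j[t], one per accumulator
race_likelihood <- function(f_list, i, t, dt) {
  F_cum <- lapply(f_list, cumsum)  # cumulative crossing probabilities F_j[t]
  others <- setdiff(seq_along(f_list), i)
  # Probability that every other accumulator has not yet crossed by t
  survive <- prod(sapply(others, function(j) 1 - F_cum[[j]][t]))
  (f_list[[i]][t] / dt) * survive  # divide by dt to get a density
}
```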
Opening the gate, breaking the dam
As noted above, one simplification in the present model relative to GAM is the lack of lateral inhibition between movement units, which allows us to treat them as conditionally independent. The other simplification is the removal of the gate. This second simplification is not made for computational reasons, but because the gate parameter is no longer identifiable: because the model no longer needs to account for the baseline activity of neurons, which would typically fall below the gate, the gate parameter and the baseline tonic excitation parameters trade off with one another. In essence, the simplified model is a model only of above-gate neural activity, since this is the only activity that makes an identifiable contribution to behavior.
I therefore call the resulting model “DAM”, for Dynamic Accumulator Model, rather than “GAM”. Since the model no longer includes an explicit gate, we must drop the “G”, but since it does include an explicit account of the dynamics of the evidence being accumulated, we can replace it with a “D”. Also, “DAM” evokes the image of a dam on a river: we are modeling the activity that “spills over” the dam and thus contributes to downstream behavior.
Simplifying SCRI
The original SCRI model allowed for a variety of potential competitive interactions between salience and identification units to account for the diversity of neural responses. In addition, the model needed to account for the fact that neurons have nonzero baseline activity. As noted above, however, this baseline activity is not identifiable in the current model since it does not contribute to behavior. Moreover, it is unlikely that behavior alone would be able to distinguish between all the different varieties of competitive interaction present in the full version of SCRI.
Summary of simplifications
Here, I summarize the key simplifications I propose for SCRI when fitting to behavior alone.
No baseline tonic excitation
Behavior depends only on activity that is above the “gate” for movement units to accumulate. A visual neuron’s baseline activity must be below this gate; otherwise, primates would constantly make saccades to random endpoints. As a result, we fix the baseline activity parameter at \(b = 0\).
No lateral inhibition between identification units
This mechanism was only weakly supported by our original model comparisons. Moreover, because this form of inhibition only has an indirect effect on salience unit activation, it is doubtful that this mechanism could be identified from behavior alone.
Same value of leak parameter for salience and identification units
Because the absolute level of activation is not constrained by neural data, it is also not possible to uniquely identify different rates of leakage for salience and identification units. We therefore assume that salience and identification units share the same leakage parameter, i.e., \(\lambda_v = \lambda_z\). Below, I refer to just a single “visual” leakage parameter \(\lambda_v\), although movement units are permitted to have their own leakage parameter, \(\lambda_m\).
Recurrent gating only, no additional delay
Finally, although the full version of SCRI allows for an additional delay in the onset of identification information, we fix this parameter to \(\kappa = 0\). Again, it is doubtful that this additional delay would be identifiable from behavior alone, since it would trade off with the onset time of the localization signal (which is no longer constrained by neural data).
Final set of simplified SCRI equations and parameters
Putting it all together, the simplified version of SCRI for fitting behavior is expressed via the following equations:
For reference, these are the remaining free parameters in the model.
| Parameter | Description |
|-----------|-------------|
| \(\chi\) | Strength of localization signal |
| \(\eta_i\) | Maximum strength of identification signal for item in location \(i\) (presumed to differ between targets and distractors) |
| \(\omega_p\) | Time of peak localization signal |
| \(\omega_s\) | Standard deviation of localization signal over time |
| \(\lambda_v\) | Leakage of salience and identification unit activity over time |
| \(\alpha_x\) | Strength of feedforward inhibition from localization signal |
| \(\alpha_z\) | Strength of feedforward inhibition from identification units |
| \(\beta_v\) | Strength of lateral inhibition between salience units |
| \(\rho_v\) | Range parameter describing spatial extent of lateral inhibition between salience units |
| \(L\) | Effective number of salience units contributing to evidence accumulated by movement units |
| \(\lambda_m\) | Leakage of movement unit activity over time |
| \(\theta\) | Threshold on movement units for initiating a saccade |
Note that the shape \(s\) and rate \(r\) of the Gamma distribution describing the localization signal are calculated from the \(\omega_p\) and \(\omega_s\) parameters: \[\begin{align*}
r & = \frac{\omega_p + \sqrt{\omega_p^2 + 4 \omega_s^2}}{2 \omega_s^2} \\
s & = 1 + \omega_p r
\end{align*}\]
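This parameterization makes the Gamma distribution’s mode equal to \(\omega_p\) and its standard deviation equal to \(\omega_s\). In R, the conversion is a simple helper (the function name is a placeholder):

```r
# Gamma shape and rate such that the mode is omega_p and the SD is omega_s
loc_gamma_pars <- function(omega_p, omega_s) {
  r <- (omega_p + sqrt(omega_p^2 + 4 * omega_s^2)) / (2 * omega_s^2)
  c(shape = 1 + omega_p * r, rate = r)
}
```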
Exploratory fits and model comparisons
To test out the simplified SCRI+DAM model, I tried fitting different versions of the model to the RT and accuracy data from monkeys Q and S. These monkeys performed a visual search task in which set size varied from 2 to 4 to 8. As shown below, error rates were relatively consistent across set sizes, but saccade RTs were slower at larger set sizes for both correct and error responses.
I tried fitting the model with all the parameters in the table above. The graphs below show both the internal SCRI dynamics as well as predicted/observed choice/RT distributions for each monkey.
```r
obs_all <- obs_all %>%
  mutate(
    subject = factor(subject, levels = levels(data$subject)),
    correct = factor(resp == 1, levels = c(TRUE, FALSE), labels = c("Correct", "Error")),
    setsize = factor(cond, levels = 1:nrow(items),
                     labels = paste("Set size", levels(factor(data$setsize))))
  )

pred_all <- pred_all %>%
  mutate(
    subject = factor(subject, levels = levels(data$subject)),
    correct = factor(array_position == 1, levels = c(TRUE, FALSE), labels = c("Correct", "Error")),
    setsize = factor(cond, levels = 1:nrow(items),
                     labels = paste("Set size", levels(factor(data$setsize))))
  )

for (subj in levels(data$subject)) {
  print(
    pred_all %>%
      filter(subject == subj) %>%
      ggplot(aes(x = t, y = y, color = array_position)) +
      geom_col(aes(fill = array_position), alpha = 0.5, position = "identity",
               data = obs_all %>% filter(subject == subj), color = NA) +
      geom_line() +
      expand_limits(y = 0) +
      coord_cartesian(xlim = c(0, 1000)) +
      facet_grid(type ~ setsize, scales = "free_y", switch = "y") +
      plot_theme +
      theme(strip.placement = "outside") +
      labs(x = "Time from array (ms)", y = NULL,
           color = "Array position", fill = "Array position",
           title = paste("Monkey", subj))
  )
}
```
The model does a good job of fitting the choice/RT distributions. Moreover, the best-fitting SCRI dynamics resemble those of the FEF visual neurons that SCRI was originally built to explain, with an initial peak followed by a later phase in which activity diminishes but is higher with a target in the RF than a distractor.
The best-fitting parameter values for each monkey are given below.
```r
knitr::kable(
  parDF %>%
    filter(model == 11) %>%
    mutate(par = factor(par,
      levels = c("loc", "id[1]", "id[2]", "loc_peak", "loc_spread", "leak",
                 "ff_loc", "ff_id", "lat_vis", "lat_vis_spread", "N",
                 "leak_mov", "threshold"),
      labels = c("Localization strength", "Target ID strength", "Distractor ID strength",
                 "Localization peak time", "Localization time SD", "Salience/ID leakage",
                 "Localization feedforward inhibition", "ID feedforward inhibition",
                 "Salience lateral inhibition", "Lateral inhibition range",
                 "Effective number of input units", "Movement unit leakage",
                 "Movement unit threshold"))) %>%
    pivot_wider(id_cols = par, names_from = subject, values_from = raw) %>%
    arrange(par),
  digits = 4
)
```
| Parameter | Q | S |
|-----------|-----------:|-----------:|
| Localization strength | 0.9932 | 0.9677 |
| Target ID strength | 0.0033 | 0.0052 |
| Distractor ID strength | 0.0029 | 0.0033 |
| Localization peak time | 3.8228 | 37.2612 |
| Localization time SD | 3.2569 | 7.3608 |
| Salience/ID leakage | 0.0105 | 0.0080 |
| Localization feedforward inhibition | 0.0279 | 0.0567 |
| ID feedforward inhibition | 0.0000 | 0.0023 |
| Salience lateral inhibition | 0.0002 | 0.0003 |
| Lateral inhibition range | 94674.9867 | 7.1114 |
| Effective number of input units | 15.5913 | 1.4247 |
| Movement unit leakage | 0.0051 | 0.0067 |
| Movement unit threshold | 44.9747 | 49.9715 |
I highlight two features of these estimates: First, the peak time for the localization signal, for both monkeys, is estimated to be very early—much more so than when SCRI was fit to individual neurons. This may be a consequence of the fact that the present modeling only addresses activity that would fall above the “gate” in GAM, such that the resulting dynamics are distorted relative to what we would see if we could model activity that falls below the gate. In other words, the interpretation of “peak localization signal” may need to change in this simplified model.
Second, the estimated range of lateral inhibition is quite large (effectively infinite for monkey Q), suggesting that the corresponding parameter can be fixed to infinity without sacrificing model fit, an issue we now address with model comparison.
Model comparisons
The table below gives AIC and BIC for eleven different SCRI+DAM variants, defined by which combination of inhibitory mechanisms is included in the model. The model variant yielding the lowest AIC and BIC (lower values indicate better fit) for each monkey is indicated in the rightmost columns.
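For reference, both criteria are computed in the usual way from each variant’s maximized log-likelihood; a sketch of the convention assumed here, where `logL` is the maximized log-likelihood, `k` the number of free parameters, and `n` the number of trials:

```r
# Standard definitions; lower values indicate better fit
aic <- function(logL, k) 2 * k - 2 * logL
bic <- function(logL, k, n) k * log(n) - 2 * logL
```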
According to BIC, the preferred model for both monkeys is model 7, which includes location-based feedforward inhibition and lateral inhibition between salience units, but no identification-based feedforward inhibition or range falloff for lateral inhibition. For monkey Q, this same model is also preferred by AIC. For monkey S, however, the model preferred by AIC also includes identification-based feedforward inhibition. Consistent with this, the table of best-fitting parameters above indicates that the best-fitting value of the ID feedforward inhibition parameter (\(\alpha_z\)) is effectively zero for monkey Q but non-zero for monkey S.
BIC-preferred model
For simplicity, however, we focus on the model variant that was preferred for both monkeys based on BIC. As shown below, this simpler model displays qualitatively similar dynamics and model fit compared with the full model above.
```r
obs_all <- obs_all %>%
  mutate(
    subject = factor(subject, levels = levels(data$subject)),
    correct = factor(resp == 1, levels = c(TRUE, FALSE), labels = c("Correct", "Error")),
    setsize = factor(cond, levels = 1:nrow(items),
                     labels = paste("Set size", levels(factor(data$setsize))))
  )

pred_all <- pred_all %>%
  mutate(
    subject = factor(subject, levels = levels(data$subject)),
    correct = factor(array_position == 1, levels = c(TRUE, FALSE), labels = c("Correct", "Error")),
    setsize = factor(cond, levels = 1:nrow(items),
                     labels = paste("Set size", levels(factor(data$setsize))))
  )

for (subj in levels(data$subject)) {
  print(
    pred_all %>%
      filter(subject == subj) %>%
      ggplot(aes(x = t, y = y, color = array_position)) +
      geom_col(aes(fill = array_position), alpha = 0.5, position = "identity",
               data = obs_all %>% filter(subject == subj), color = NA) +
      geom_line() +
      expand_limits(y = 0) +
      coord_cartesian(xlim = c(0, 1000)) +
      facet_grid(type ~ setsize, scales = "free_y", switch = "y") +
      plot_theme +
      theme(strip.placement = "outside") +
      labs(x = "Time from array (ms)", y = NULL,
           color = "Array position", fill = "Array position",
           title = paste("Monkey", subj))
  )
}
```
The best-fitting parameters, shown below, are comparable to those estimated with the full model. It is still the case, for example, that the peak time of the localization signal is quite early, which again may reflect the incomplete picture of SCRI dynamics afforded by behavior alone. It is also worth noting that monkey S is estimated to have a smaller effective number of input units to each of his movement units. While it is probably not wise to interpret that parameter as a literal number of neurons—since it reflects not just the number of units but the degree to which they are independent vs. correlated—it is consistent with the fact that Purcell et al. (2012) also estimated fewer input neurons for monkey S compared to monkey Q.
```r
knitr::kable(
  parDF %>%
    filter(model == 7) %>%
    mutate(par = factor(par,
      levels = c("loc", "id[1]", "id[2]", "loc_peak", "loc_spread", "leak",
                 "ff_loc", "ff_id", "lat_vis", "lat_vis_spread", "N",
                 "leak_mov", "threshold"),
      labels = c("Localization strength", "Target ID strength", "Distractor ID strength",
                 "Localization peak time", "Localization time SD", "Salience/ID leakage",
                 "Localization feedforward inhibition", "ID feedforward inhibition",
                 "Salience lateral inhibition", "Lateral inhibition range",
                 "Effective number of input units", "Movement unit leakage",
                 "Movement unit threshold"))) %>%
    pivot_wider(id_cols = par, names_from = subject, values_from = raw) %>%
    arrange(par),
  digits = 4
)
```