Modeling feature activation

Features

Imagine that different types of features become active at different rates. I model activation as a function that grows from zero to one, and as such can be interpreted as the probability that the given feature has been activated by a particular time. For simplicity, I assume that this function is exponential: \[ v_k(t) = 1 - \exp(-\rho_k t) \] where \(v_k(t)\) is the degree of activation of feature \(k\) at time \(t\) and \(\rho_k\) is the rate at which feature \(k\) becomes active.
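As a minimal sketch of this activation function (the rate value is arbitrary, chosen only for illustration):

```python
import numpy as np

def activation(t, rho):
    """Degree of activation, v_k(t) = 1 - exp(-rho_k * t)."""
    return 1.0 - np.exp(-rho * t)

t = np.arange(0.0, 2.0, 0.01)   # time grid (arbitrary units)
v = activation(t, rho=5.0)      # rho chosen only for illustration
```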

Conjunctions

By definition, one cannot make a conjunction until both elements of the conjunction are available. You cannot encode “A & B” until you have both A and B. By that logic, the degree of activation of a conjunction is the product of the degrees of activation of the features being conjoined (if the features activate independently, this product is the probability that both are active). For example, if \(v_{\text{Item}}(t)\) and \(v_{\text{Pos.}}(t)\) are the degrees of activation of item and position features, respectively, at time \(t\), then the degree of activation of the conjunction of position and item information is \[ v_{\text{Pos. & Item}}(t) = v_{\text{Pos.}}(t) \times v_{\text{Item}}(t) \]
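Continuing the sketch above (rates again arbitrary), the conjunction's activation is just the elementwise product of the two feature activations:

```python
import numpy as np

def activation(t, rho):
    """Degree of activation, v_k(t) = 1 - exp(-rho_k * t)."""
    return 1.0 - np.exp(-rho * t)

t = np.arange(0.0, 2.0, 0.01)
v_item = activation(t, rho=5.0)   # illustrative rate for item features
v_pos = activation(t, rho=5.0)    # illustrative rate for position features
v_conj = v_pos * v_item           # conjunction activation is the product
```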

Cascade analogy

The present formulation implies that conjunctions will, in general, take longer to become active than features. In particular, as shown in the examples below, the degree of activation of a conjunction is strongly determined by the slowest of the features being conjoined. This is similar to serial processing in that making a decision on the basis of a conjunction requires waiting until the component features are sufficiently activated. However, this process is not strictly serial because it allows for partial activation of features and conjunctions. It also allows for component features to be processed in parallel, even if conjunctions are slower. Thus, this process is more like a cascade than a strictly serial model.

Modeling response times

I view features and conjunctions as different dimensions of a stimulus to which you might attend. If you attend to a feature or a conjunction, then it contributes to the drift rate of a two-boundary diffusion decision process (I just picked this because I had the code ready for it; we could also explore accumulator models).

From feature activations to drift rates

I assume that each feature/conjunction is associated with an asymptotic drift rate \(\mu_k\) and drift variance \(\sigma^2_k\) that are approached as the feature/conjunction becomes fully active. As such, the time-varying drift rate \(d_k(t)\) and drift variance \(s^2_k(t)\) associated with feature/conjunction \(k\) are given by the following equations: \[ \begin{align} d_k(t) & = v_k(t) \mu_k \\ s^2_k(t) & = v_k(t) \sigma^2_k \end{align} \]

We can then model the mean \(x_k(t)\) and variance \(\xi^2_k(t)\) of the accumulated evidence based on feature/conjunction \(k\) at time \(t\) as \[ \begin{align} x_k(t) & = \int_{\tau = 0}^t v_k(\tau) \mu_k \, d\tau = \mu_k \int_{\tau = 0}^t v_k(\tau) \, d\tau \\ \xi^2_k(t) & = \int_{\tau = 0}^t v_k(\tau) \sigma^2_k \, d\tau = \sigma^2_k \int_{\tau = 0}^t v_k(\tau) \, d\tau \end{align} \]
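Since \(v_k(\tau) = 1 - \exp(-\rho_k \tau)\), this integral has a closed form, which can serve as a check on the discrete-time approximation below: \[ \int_{\tau = 0}^t v_k(\tau) \, d\tau = t - \frac{1 - \exp(-\rho_k t)}{\rho_k}, \quad \text{so} \quad x_k(t) = \mu_k \left[ t - \frac{1 - \exp(-\rho_k t)}{\rho_k} \right] \quad \text{and} \quad \xi^2_k(t) = \sigma^2_k \left[ t - \frac{1 - \exp(-\rho_k t)}{\rho_k} \right] \]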

Discrete time approximation

In practice, I employ a discrete time approximation to find the mean and variance of the evidence state within time intervals of length \(\Delta t\). Specifically \[ \begin{align} x_k \left( t + \Delta t \right) & = x_k \left( t \right) + v_k(t) \mu_k \Delta t \\ \xi^2_k \left( t + \Delta t \right) & = \xi^2_k \left( t \right) + v_k(t) \sigma^2_k \Delta t \end{align} \]
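A sketch of this update in code (the parameter values are placeholders I chose for illustration, not values from the examples below):

```python
import numpy as np

def evidence_moments(rho, mu, sigma2, dt=0.01, t_max=2.0):
    """Discrete-time approximation to the mean x_k(t) and variance xi^2_k(t)
    of accumulated evidence for a feature/conjunction with activation rate
    rho, asymptotic drift mu, and asymptotic drift variance sigma2."""
    n = int(round(t_max / dt))
    t = np.arange(n + 1) * dt
    x = np.zeros(n + 1)       # x_k(0) = 0
    xi2 = np.zeros(n + 1)     # xi^2_k(0) = 0
    for i in range(n):
        v = 1.0 - np.exp(-rho * t[i])      # activation at the current time
        x[i + 1] = x[i] + v * mu * dt
        xi2[i + 1] = xi2[i] + v * sigma2 * dt
    return t, x, xi2

# Example call with arbitrary parameter values
t, x, xi2 = evidence_moments(rho=5.0, mu=1.5, sigma2=1.0)
```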

Multivariate normal distribution of evidence over time

The result of the discrete time approximation is the mean vector and covariance matrix of a multivariate normal distribution of evidence across time. Let \(\boldsymbol{\eta_k}\) stand for the random vector of evidence states based on feature/conjunction \(k\) at times \(0, \Delta t, 2 \Delta t, \ldots\): \[ \boldsymbol{\eta_k} \sim \mathcal{MVN} \left( \begin{bmatrix} x_k \left( 0 \right) \\ x_k \left( \Delta t \right) \\ x_k \left( 2 \Delta t \right) \\ \vdots \end{bmatrix}, \begin{bmatrix} \xi^2_k \left( 0 \right) & \xi^2_k \left( 0 \right) & \xi^2_k \left( 0 \right) & \cdots \\ \xi^2_k \left( 0 \right) & \xi^2_k \left( \Delta t \right) & \xi^2_k \left( \Delta t \right) & \cdots \\ \xi^2_k \left( 0 \right) & \xi^2_k \left( \Delta t \right) & \xi^2_k \left( 2 \Delta t \right) & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ \end{bmatrix} \right) \] The structure of the covariance matrix comes from the fact that each interval of time adds new activation on top of what was accumulated at earlier intervals, so the covariance between the evidence at two time points is the variance at the earlier of the two.
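A small helper that builds these moments from the discrete-time output above (a sketch; it just applies the rule that the covariance between two time points is the variance at the earlier one):

```python
import numpy as np

def mvn_moments(x, xi2):
    """Mean vector and covariance matrix of the evidence path, given the
    per-step means x and variances xi2 from the discrete approximation."""
    x, xi2 = np.asarray(x), np.asarray(xi2)
    idx = np.arange(len(xi2))
    i, j = np.meshgrid(idx, idx, indexing="ij")
    cov = xi2[np.minimum(i, j)]   # Cov[eta(i*dt), eta(j*dt)] = xi2[min(i, j)]
    return x, cov
```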

Crossing decision boundaries

Imagine that we have two decision bounds, an upper one \(B_Y > 0\) for saying “yes” and a lower one \(B_N < 0\) for saying “no”. The probability that memory evidence crossed the upper response boundary between times \(t - \Delta t\) and \(t\) is the joint probability that \(\boldsymbol{\eta_k}(t) > B_Y\) and \(\boldsymbol{\eta_k} \left( t - \Delta t \right) < B_Y\). This joint probability, which we denote \(p_Y(t)\), is a bivariate normal cumulative probability that can be computed numerically. We can similarly define and compute \(p_N(t)\), the probability that memory evidence crossed the lower response boundary between times \(t - \Delta t\) and \(t\). To obtain joint distributions of response time and response outcome, we need to know the probability that memory evidence first crossed either the upper or lower boundary between times \(t - \Delta t\) and \(t\). We take a renewal-process approach to find this probability as described by Phil in his 2000 paper.
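A sketch of how \(p_Y(t)\) could be computed with SciPy's bivariate normal CDF (the variable names are mine, and it assumes both variances are strictly positive):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def p_cross_upper(x_prev, x_now, xi2_prev, xi2_now, B_Y):
    """Probability that evidence is below B_Y at t - dt and above B_Y at t.
    Uses Pr[A < b, B > b] = Pr[A < b] - Pr[A < b, B < b]."""
    mean = np.array([x_prev, x_now])
    cov = np.array([[xi2_prev, xi2_prev],     # covariance equals the variance
                    [xi2_prev, xi2_now]])     # at the earlier time point
    p_prev_below = norm.cdf(B_Y, loc=x_prev, scale=np.sqrt(xi2_prev))
    p_both_below = multivariate_normal.cdf([B_Y, B_Y], mean=mean, cov=cov)
    return p_prev_below - p_both_below
```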

Consider just the first crossing distribution for the upper boundary. If memory evidence crossed the upper boundary between times \(t\) and \(t - \Delta t\), then there are three mutually exclusive and exhaustive possibilities:

  1. This is the first time memory evidence crossed either boundary; call the probability of this \(f_Y(t)\).
  2. Memory evidence previously crossed the upper boundary and is crossing it again; call the probability of this \(r_{YY}(t)\).
  3. Memory evidence previously crossed the lower boundary and is now crossing the upper boundary; call the probability of this \(r_{NY}(t)\).

Because these possibilities are mutually exclusive and exhaustive, we can find \(f_Y(t)\) by rearranging terms: \[ \begin{align} p_Y(t) & = f_Y(t) + r_{YY}(t) + r_{NY}(t) \\ f_Y(t) & = p_Y(t) - r_{YY}(t) - r_{NY}(t) \end{align} \]

To find the probabilities \(r_{YY}(t)\) and \(r_{NY}(t)\), we have to find the probability, for each preceding interval of time, that either the upper (for \(r_{YY}(t)\)) or lower (for \(r_{NY}(t)\)) boundary was first crossed during that interval and that, conditional on that happening, memory evidence again crossed the upper boundary in the interval preceding time \(t\): \[ \begin{align} r_{YY}(t) & = \sum_{l = 0}^{\frac{t}{\Delta t} - 1} f_Y(l \Delta t) \Pr \left[ \boldsymbol{\eta_k}(t) > B_Y \middle\vert \boldsymbol{\eta_k} (l \Delta t) > B_Y \right] \\ r_{NY}(t) & = \sum_{l = 0}^{\frac{t}{\Delta t} - 1} f_N(l \Delta t) \Pr \left[ \boldsymbol{\eta_k}(t) > B_Y \middle\vert \boldsymbol{\eta_k}(l \Delta t) < B_N \right] \end{align} \]

Analogous computations yield the first passage time distribution at the lower boundary, denoted \(f_N(t)\). Together, \(f_Y(t)\) and \(f_N(t)\) give the probability of making each response during a particular time interval.
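To make the bookkeeping explicit, here is a rough sketch of the full renewal computation (the helper names, guard conditions, and use of SciPy's bivariate normal CDF are my choices; it takes the per-step evidence means and variances from the discrete-time approximation above and is written for clarity rather than speed):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def _joint(mean, cov, a, b, tail_a, tail_b):
    """Pr[evidence at the earlier time is in tail_a of a AND evidence at the
    later time is in tail_b of b], for a bivariate normal (mean, cov)."""
    cdf_a = norm.cdf(a, loc=mean[0], scale=np.sqrt(cov[0, 0]))
    cdf_b = norm.cdf(b, loc=mean[1], scale=np.sqrt(cov[1, 1]))
    cdf_ab = multivariate_normal.cdf([a, b], mean=mean, cov=cov)
    if tail_a == "below":
        return cdf_ab if tail_b == "below" else cdf_a - cdf_ab
    return cdf_b - cdf_ab if tail_b == "below" else 1.0 - cdf_a - cdf_b + cdf_ab

def first_passage(x, xi2, B_Y=1.0, B_N=-1.0):
    """Renewal-process sketch: per-step first-passage probabilities f_Y, f_N,
    given evidence means x and variances xi2 (index 0 corresponds to t = 0)."""
    n = len(x)
    f_Y, f_N = np.zeros(n), np.zeros(n)

    def mom(l, i):  # mean vector and covariance matrix of (eta(l*dt), eta(i*dt))
        return (np.array([x[l], x[i]]),
                np.array([[xi2[l], xi2[l]], [xi2[l], xi2[i]]]))

    for i in range(1, n):
        if xi2[i - 1] <= 0.0 or xi2[i] <= xi2[i - 1]:
            continue  # no diffusion yet within this interval
        p_Y = _joint(*mom(i - 1, i), B_Y, B_Y, "below", "above")
        p_N = _joint(*mom(i - 1, i), B_N, B_N, "above", "below")
        r_YY = r_NY = r_NN = r_YN = 0.0
        for l in range(1, i):  # preceding intervals only
            if xi2[l] <= 0.0 or xi2[i] <= xi2[l]:
                continue
            mean, cov = mom(l, i)
            above_l = 1.0 - norm.cdf(B_Y, loc=x[l], scale=np.sqrt(xi2[l]))
            below_l = norm.cdf(B_N, loc=x[l], scale=np.sqrt(xi2[l]))
            if f_Y[l] > 0 and above_l > 0:
                r_YY += f_Y[l] * _joint(mean, cov, B_Y, B_Y, "above", "above") / above_l
                r_YN += f_Y[l] * _joint(mean, cov, B_Y, B_N, "above", "below") / above_l
            if f_N[l] > 0 and below_l > 0:
                r_NY += f_N[l] * _joint(mean, cov, B_N, B_Y, "below", "above") / below_l
                r_NN += f_N[l] * _joint(mean, cov, B_N, B_N, "below", "below") / below_l
        f_Y[i] = max(p_Y - r_YY - r_NY, 0.0)
        f_N[i] = max(p_N - r_NN - r_YN, 0.0)
    return f_Y, f_N
```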

Examples

In the following examples, I set \(B_Y = 1\), \(B_N = -1\), and \(\Delta t = 0.01\). I assume that there are two types of features, position and item, as well as their conjunction, position & item. I assume that \(\mu_{\text{Position}} = \sigma^2_{\text{Position}} = 0\) to reflect the idea that encoding position information alone provides no evidence either for or against a match. I also assume that \(\sigma^2_{\text{Item}} = \sigma^2_{\text{Position & Item}} = 1\) for simplicity.

The RT predictions come from two different models. In one model, the drift of the diffusion is determined by item features; in the other, the drift is determined by the conjunction of position & item.
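A sketch of the setup for these two models (the boundaries, time step, and variances are as stated above; the activation rates and asymptotic drift rates are placeholder values I picked for illustration, since they are not specified here):

```python
import numpy as np

# Parameters stated in the text
B_Y, B_N, dt = 1.0, -1.0, 0.01
sigma2_item = sigma2_conj = 1.0

# Placeholder values (not specified in the text): activation rates and
# asymptotic drift rates for the item feature and the conjunction
rho_item = rho_pos = 5.0
mu_item = mu_conj = 1.5

t = np.arange(0.0, 3.0, dt)
v_item = 1.0 - np.exp(-rho_item * t)
v_pos = 1.0 - np.exp(-rho_pos * t)
v_conj = v_pos * v_item               # conjunction lags behind either feature

# Mean and variance of accumulated evidence under each model
x_item = np.cumsum(v_item * mu_item * dt)
xi2_item = np.cumsum(v_item * sigma2_item * dt)
x_conj = np.cumsum(v_conj * mu_conj * dt)
xi2_conj = np.cumsum(v_conj * sigma2_conj * dt)
```

The resulting means and variances for either model could then be passed to the first-passage sketch above to obtain predicted joint distributions of response and response time.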

Position and item activate at the same rate

Even if position and item features activate at the same rate, responses based on the conjunction of position & item are slower because they need to wait until both feature types are active.

Position is available immediately

One way to think about pre-cuing the position is that the position feature is already active at the start of the trial. In this case, there is no delay for decisions driven by the conjunction of position & item relative to decisions driven by item features alone.

Position is slower than item

We know that, in general, it takes time to focus attention on the cued position. The example below illustrates how this slows responding based on the conjunction, because now you have to wait even longer before all the necessary information is activated.