Classification experiments:
Under SDT, if stimuli \(S_1, S_2, S_3\) are perceptually one-dimensional and the means of the distributions they give rise to can be ordered such that \(\mu_1< \mu_2< \mu_3\), then, assuming a common variance so that \(d'(i,j) = (\mu_j - \mu_i)/\sigma\), \[ d'(1,3) = d'(1,2)+d'(2,3) \]
* cumulative d': the d' between any stimulus S and an end-point stimulus of the series.
Thurstone (1927) developed models of one-dimensional classification based on the normal distribution, known as “Thurstonian scaling”.
Recognition memory experiment:
## items pYes zPyes
## 1 Old 0.85 1.0364334
## 2 New 0.30 -0.5244005
## 3 New-Lures 0.80 0.8416212
Can calculate \(d'_{(Old, New)}\) as usual:
Can calculate \(d'_{(Old, New-Lures)}\) as usual:
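For concreteness, both can be computed from the "yes" rates in the table above (a minimal sketch):
# d' between Old and New, and between Old and New-Lures, from the table above
round(qnorm(0.85) - qnorm(0.30), 2)   # d'(Old, New) = 1.56
round(qnorm(0.85) - qnorm(0.80), 2)   # d'(Old, New-Lures) = 0.19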
And can also calculate \(d'_{(New-Lures, New)}\): \(\text{pseudo-}d' = z(p(\text{"old"}|\text{New-Lures})) - z(p(\text{"old"}|\text{New})) = 0.84 - (-0.52) = 1.36\)
## Stimulus Yes No
## 1 1 0.02 0.98
## 2 2 0.06 0.94
## 3 3 0.15 0.85
## 4 4 0.40 0.60
## 5 5 0.80 0.20
## 6 6 0.92 0.08
## 7 7 0.96 0.04
Can calculate d’ directly for each adjacent or non-adjacent stimulus pair:
round(qnorm(0.06) - qnorm(0.02),3)
## [1] 0.499
round(qnorm(0.15) - qnorm(0.02),3)
## [1] 1.017
Can calculate d’ indirectly for non-adjacent stimulus pairs:
Here, we calculate d’ for each adjacent pair, and calculate the cumulative sum:
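The data frame audExpt used below is assumed to hold the table above; a minimal construction:
# assumed construction of audExpt from the proportions tabled above
audExpt = data.frame(Stimulus = 1:7,
                     Yes = c(0.02, 0.06, 0.15, 0.40, 0.80, 0.92, 0.96))
audExpt$No = 1 - audExpt$Yes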
audExpt$zYes = round(qnorm(audExpt$Yes),3)
audExpt$dPrime = c(NA,(diff(audExpt$zYes)))
audExpt$cumulativeDp = c(NA, cumsum(audExpt$dPrime[2:7]))
audExpt
## Stimulus Yes No zYes dPrime cumulativeDp
## 1 1 0.02 0.98 -2.054 NA NA
## 2 2 0.06 0.94 -1.555 0.499 0.499
## 3 3 0.15 0.85 -1.036 0.519 1.018
## 4 4 0.40 0.60 -0.253 0.783 1.801
## 5 5 0.80 0.20 0.842 1.095 2.896
## 6 6 0.92 0.08 1.405 0.563 3.459
## 7 7 0.96 0.04 1.751 0.346 3.805
The cumulative d' in row 3 is the indirectly calculated d'(1,3) = 1.018, very close to the directly obtained d' = 1.017.
Representation of the stimuli in decision space:
Typically, the mean of an endpoint stimulus is set to 0 (here, S1). Cumulative d’ from that endpoint specifies the means of the distributions S1..S7.
The criterion location can be found from any stimulus: on the common axis (with the mean of S1 at 0), \(\lambda = \mu_S - z(p(\text{"yes"}|S))\), which for S1 itself reduces to \(\lambda = -z(FA)\).
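A quick check with the data above (means is an illustrative name, not from the text):
# distribution means anchored at S1 = 0, then criterion = mean - z("yes")
means = c(0, audExpt$cumulativeDp[2:7])
round(means - audExpt$zYes, 3)   # the same criterion, 2.054, from every stimulus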
If we plot the p(“yes”) or cumulative d’ as a function of the stimulus, we get a psychometric function.
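For the data above, a minimal sketch of the p("yes") version:
# psychometric function: p("yes") as a function of the stimulus
plot(audExpt$Stimulus, audExpt$Yes, type = "b",
     xlab = "Stimulus", ylab = "p('yes')")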
Threshold:
The \(d'\)-based definition is response-bias free, while the p(yes) definition depends only on 'hits', so it is not free of response bias.
Participants respond whether the second line was 'longer' or 'shorter' than the standard.
round(qnorm(0.25),3)
## [1] -0.674
We are typically interested in the 'category boundary', which, like the PSE, is the stimulus at which p(/ga/) = 0.5.
## Stimulus ka ga zKa dPrime cumulativeDp
## 1 1 0.02 0.98 -2.054 NA NA
## 2 2 0.06 0.94 -1.555 0.499 0.499
## 3 3 0.15 0.85 -1.036 0.519 1.018
## 4 4 0.40 0.60 -0.253 0.783 1.801
## 5 5 0.80 0.20 0.842 1.095 2.896
## 6 6 0.92 0.08 1.405 0.563 3.459
## 7 7 0.96 0.04 1.751 0.346 3.805
Experimental data yield a confusion matrix:
Stimuli = c(1,2,3,4)
Resp2 = c(11, 12, 21, 0)
Resp1 = c(39, 17, 0, 0)
Resp3 = c(0, 21, 12, 17)
Resp4 = c(0, 0, 17, 33)
intensity = data.frame(Stimuli, Resp1, Resp2, Resp3, Resp4)
intensity
## Stimuli Resp1 Resp2 Resp3 Resp4
## 1 1 39 11 0 0
## 2 2 17 12 21 0
## 3 3 0 21 12 17
## 4 4 0 0 17 33
cat(' proportions\n')
## proportions
intensityProportions = intensity[,2:5]/apply(intensity[,2:5], 1, sum)
data.frame(Stimuli, intensityProportions)
## Stimuli Resp1 Resp2 Resp3 Resp4
## 1 1 0.78 0.22 0.00 0.00
## 2 2 0.34 0.24 0.42 0.00
## 3 3 0.00 0.42 0.24 0.34
## 4 4 0.00 0.00 0.34 0.66
cat('cumulative proportions\n')
## cumulative proportions
intensityCumulativeProportions = t(apply(intensityProportions,1,cumsum))
data.frame(Stimuli, intensityCumulativeProportions)
## Stimuli Resp1 Resp2 Resp3 Resp4
## 1 1 0.78 1.00 1.00 1
## 2 2 0.34 0.58 1.00 1
## 3 3 0.00 0.42 0.66 1
## 4 4 0.00 0.00 0.34 1
cat('z-transforms\n')
## z-transforms
intensityZ = round(qnorm(intensityCumulativeProportions),3)
intensityZ
## Resp1 Resp2 Resp3 Resp4
## [1,] 0.772 Inf Inf Inf
## [2,] -0.412 0.202 Inf Inf
## [3,] -Inf -0.202 0.412 Inf
## [4,] -Inf -Inf -0.412 Inf
cat('d-primes\n')
## d-primes
dP_1vs2 = intensityZ[1,]-intensityZ[2,]
dP_2vs3 = intensityZ[2,]-intensityZ[3,]
dP_3vs4 = intensityZ[3,]-intensityZ[4,]
dPrimes = rbind(dP_1vs2, dP_2vs3, dP_3vs4)
rownames(dPrimes) = c('d(1,2)', 'd(2,3)', 'd(3,4)')
(dPrimes)
## Resp1 Resp2 Resp3 Resp4
## d(1,2) 1.184 Inf NaN NaN
## d(2,3) Inf 0.404 Inf NaN
## d(3,4) NaN Inf 0.824 NaN
cat('Can estimate d(1,3) using d(1,2) + d(2,3) = 1.18 + 0.404 \n')
## Can estimate d(1,3) using d(1,2) + d(2,3) = 1.18 + 0.404
1.18 + 0.404
## [1] 1.584
cat('# Total d\': d(1,2)+d(2,3)+d(3,4) = 1.18 + 0.404 + 0.824\n')
## # Total d': d(1,2)+d(2,3)+d(3,4) = 1.18 + 0.404 + 0.824
1.18 + 0.404 + 0.824
## [1] 2.408
As long as there is at least one response category for which neither stimulus of a pair has a rate of 0 or 1, we can calculate their d' directly. If not, we can use the indirect method, summing d' across intermediate stimuli.
Psychological scaling of the 4 stimuli based on cumulative d’:
x = seq(-3,6,len = 100)
plot(x, dnorm(x = x, mean = 0, sd = 1), col = 1, type = "l", ylab = c(''))
lines(x, dnorm(x = x, mean = 1.18, sd = 1), col = 1, lty =2, type = "l")
lines(x, dnorm(x = x, mean = 1.18+0.404, sd = 1), col = 1, lty =3, type = "l")
lines(x, dnorm(x = x, mean = 1.18+0.404+0.824, sd = 1), col = 1, lty =4, type = "l")
legend("topleft", legend = c('S1', 'S2', "S3", "S4"), lty = c(1,2,3,4), bty = "n")
The most common analysis method is to calculate the mean response rating for each stimulus.
intensityProportions
## Resp1 Resp2 Resp3 Resp4
## 1 0.78 0.22 0.00 0.00
## 2 0.34 0.24 0.42 0.00
## 3 0.00 0.42 0.24 0.34
## 4 0.00 0.00 0.34 0.66
plot(Stimuli,as.matrix(intensityProportions)%*%c(1,2,3,4), type = "b",
xlim = c(1,4), ylim = c(1,4), ylab = 'Average response rating')
abline(a = 0, b = 1, lty = 2)
Since the mean rating takes into account responses to only one stimulus, it is, like the hit rate, a measure of response bias rather than sensitivity.
The slope, or difference, between average response ratings, is often used as a measure of sensitivity. However, it is flawed as well.
\[\text{Mean rating for S1} = 1\times P(\text{"1"}|S1) + 2\times P(\text{"2"}|S1)\\ \text{Considering S2 as the 'signal' and S1 as the 'noise', } P(\text{"1"}|S1) = CR = 1-F \text{ and } P(\text{"2"}|S1) = F \text{, so }\\ \text{Mean rating for S1} = 1 \times (1-F) + 2 \times F = 1+F\]
\[\text{Mean rating for S2} = 1\times P(\text{"1"}|S2) + 2\times P(\text{"2"}|S2)\\ \text{Still considering S2 as the 'signal' and S1 as the 'noise', } P(\text{"1"}|S2) = \text{Miss} = 1-H \text{ and } P(\text{"2"}|S2) = H \text{, so}\\ \text{Mean rating for S2} = 1 \times (1-H) + 2 \times H = 1+H\]
\[ \text{Mean rating S2 - Mean rating S1} = 1+H - (1+F) = H - F \text{, which is not bias-free.} \]
If the stimuli are not uni-dimensional, they cannot be represented on a single axis, and will live instead in a multidimensional space. The field of multidimensional scaling seeks to analyze such situations.
This chapter deals with the detection and discrimination of compound stimuli, i.e., stimuli composed of two or more distinct perceptual components, and serves as an introduction to Multidimensional Detection Theory.
Examples of compound stimuli:
* Is an aircraft a plane or a helicopter?
* Detecting a combined light-and-beep signal, or a cell phone that both vibrates and plays a ringtone.
Typical experimental question of interest: how much is detection of a multidimensional stimulus better than detection of the individual components? Signal detection theory provides a quantitative framework for comparing these situations.
3D and 2D representations of the joint distribution of two variables, loudness and brightness, produced when a tone and a light are presented together.
Here, the variability in the two dimensions differs, generating an elliptical 2D representation.
The axes above are perpendicular, and the long and short axes of the ellipse are oriented parallel to the main axes. This represents perceptually independent variables.
Perceptually dependent variables are shown using non-perpendicular axes and/or elliptical distributions oriented at an angle to the main axes, as shown below:
In the maximum rule, an observer says "yes" only if both loudness and brightness are above their criteria.
The min rule occurs when the observer says "yes" if either the tone or the light is above the criterion.
So far, the decision axis has always been either one or the other variable, but it does not have to be. Below, you see a decision axis as a line at 45 deg to both axes. This corresponds to an observer that adds the values of loudness and brightness and compares the sum to their decision criterion to decide if the tone-light pair was presented.
In multidimensional tasks, observers may adopt a number of possible response strategies. To compare decision strategies, we can use the proportion correct of an unbiased observer. For a yes/no task, \(p(c)_{max} = \Phi(d'/2)\).
This strategy consists of ignoring one component and basing the decision entirely on the other.
Assuming equal presentation probabilities, \(p(c) = 0.5*(H + CR) = 0.5*(\Phi(d'_x-k_x) + \Phi(k_x))\). An unbiased observer would place the criterion at \(d'_x/2\), so \(p(c)_{max} = \Phi(d'_x/2)\). If \(d'_x = 1\), \(p(c)_{max} = 0.69\).
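A quick numerical check (with \(d'_x = 1\)):
# p(c)_max when only one component is used, d'_x = 1
round(pnorm(1/2), 2)   # 0.69, as above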
Notice that the min rule has a much higher FA and Hit rate than the max rule, although both lead to the same p(c).
Calculating the compound H and FA rates from SDT using the product rule, we can obtain expressions for the hits and false alarms for compound stimuli with respect to any criterion location, k:
\[\begin{align} \text{maximum rule: } &H_{xy} = \Phi(d'-k)^2 \text{ and }\\ &F_{xy} = \Phi(-k)^2\\ \text{minimum rule: } &H_{xy} = 1-\Phi(-d'+k)^2 \text{ and }\\ &F_{xy} = 1-\Phi(k)^2 \end{align}\]
Using the Pythagorean theorem, we can derive the distance between the null and compound stimulus along the decision axis:
\[ d'_{compound} = \sqrt{{d'_x}^2+{d'_y}^2 } \]
If \(d'_x = d'_y = d'\), then \(d'_{compound} = \sqrt{2{d'}^2} = \sqrt{2}\, d'\).
For d' = 1, \(p(c)_{max} = \Phi(d'_{compound}/2) = \Phi(\sqrt{2}\,d'/2) = \Phi(\sqrt{2}\times 0.5) = 0.76\). Clearly this strategy yields the highest proportion correct (for an unbiased observer).
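To compare these options numerically, here is a minimal sketch assuming \(d'_x = d'_y = 1\), equal presentation probabilities, and a criterion chosen to maximize p(c) under each rule (the object names below are illustrative):
# best-criterion p(c) for each strategy, assuming d'_x = d'_y = 1
dp = 1
pcSingle = pnorm(dp/2)                        # use one component only (0.69)
pcMax = optimize(function(k) 0.5*(pnorm(dp - k)^2 + 1 - pnorm(-k)^2),
                 interval = c(-3, 3), maximum = TRUE)$objective
pcMin = optimize(function(k) 0.5*(1 - pnorm(k - dp)^2 + pnorm(k)^2),
                 interval = c(-3, 3), maximum = TRUE)$objective
pcSum = pnorm(sqrt(2)*dp/2)                   # optimal (sum) rule (0.76)
round(c(single = pcSingle, max = pcMax, min = pcMin, sum = pcSum), 3)
The max and min rules come out equal, falling between the single-component and the optimal values.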
Expressions for the ROC curve can be obtained from the decision axis:
\[ H_{xy} = \Phi(\sqrt{2}d'-k) \\ F_{xy} = \Phi(-k) \]
This figure compares the ROC curves for these different decision strategies. The min and max rules have similar sensitivity, and both are worse than the optimal rule. The improvement in accuracy for the optimal rule arises because this strategy combines information across components before making a decision, rather than making separate decisions about the individual components.
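A minimal sketch that draws such a comparison from the expressions above, assuming \(d'_x = d'_y = 1\):
# ROC curves for the max, min, and optimal (sum) rules, d'_x = d'_y = 1
dp = 1
k = seq(-5, 6, len = 300)                     # sweep the criterion
plot(pnorm(-k)^2, pnorm(dp - k)^2, type = "l", lty = 1,
     xlab = "False alarm rate", ylab = "Hit rate")
lines(1 - pnorm(k)^2, 1 - pnorm(k - dp)^2, lty = 2)
lines(pnorm(-k), pnorm(sqrt(2)*dp - k), lty = 3)
abline(a = 0, b = 1, lty = 4)                 # chance performance
legend("bottomright", legend = c("max rule", "min rule", "optimal rule", "chance"),
       lty = 1:4, bty = "n")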
Macmillan and Creelman classify 2AFC as a comparison design, which requires a comparison between two stimulus presentations, with the possible stimuli drawn from only two classes.
There is nothing new about the 'forced choice' part, so the name of the task design is not very diagnostic. The distinction from Yes/No is that both stimulus classes are presented on every trial, in random (spatial or temporal) order, and the observer reports the order of the stimuli (signal in the first or second interval).
Data for such an experiment can be represented as a table of response counts by presentation order.
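A reconstruction consistent with the counts used in the worked example below (25 trials of each order; the 9 is inferred as 25 - 16):
# response counts by presentation order (cells A, B in row 1; C, D in row 2)
resp2AFC = matrix(c(16, 9, 7, 18), nrow = 2, byrow = TRUE,
                  dimnames = list(c("<Old, New>", "<New, Old>"),
                                  c("old on top", "old on bottom")))
resp2AFC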
The unbiased decision rule is to pick the interval with the larger value.
For \(X_a \sim N(\mu_a, \sigma_a^2)\) and \(X_b \sim N(\mu_b, \sigma_b^2)\), if the observations are independent, then \(Y_{a-b} = X_a -X_b\sim N(\mu_a-\mu_b, \sigma_a^2+ \sigma_b^2)\).
\[ \text{For <Old, New> trials, } Y_{<Old, New>} = X_1 - X_2 = X_s - X_n \sim N(\mu_s, 1+\sigma_s^2)\\ \text{For <New, Old> trials, } Y_{<New, Old>} = X_1 - X_2 = X_n - X_s \sim N(-\mu_s, 1+\sigma_s^2)\]
Best performance is obtained by an unbiased observer (shown in blue) that compares \(X_1- X_2\) to 0. However, the observer could have a 'position bias' (e.g. shown in green), whereby the criterion on \(X_1 - X_2\) is shifted away from 0, favoring one interval over the other.
Notice that \(X_1-X_2\) and \(X_2 - X_1\) have the same variance (assuming no interactions between position and stimulus). This means that we can use the equal variance model equations for d’ here even when \(X_s\) and \(X_n\) do not have equal variance.
\[ d'_{2AFC} = \frac{\text{difference in means}}{\text{common standard deviation}} = \frac{2\mu_s}{\sqrt{1+\sigma_s^2}} \]
Under the equal variance model, \(\mu_s = d'\) and \(\sigma_s^2 = 1\).
\[ d'_{2AFC} = \frac{2d'}{\sqrt{1+1}} = \sqrt{2}d'\]
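A small simulation sketch of this relation, assuming the equal-variance model with d' = 1 (the names and parameters here are illustrative, not from the text):
# simulate 2AFC trials under the equal-variance model and recover d'_2AFC
set.seed(1)
n = 1e5
dp = 1
# unbiased difference rule: respond "signal first" if X1 - X2 > 0
H2 = mean(rnorm(n, mean = dp) - rnorm(n, mean = 0) > 0)   # <signal, noise> trials
F2 = mean(rnorm(n, mean = 0) - rnorm(n, mean = dp) > 0)   # <noise, signal> trials
qnorm(H2) - qnorm(F2)   # close to sqrt(2) * dp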
We can also represent the same situation using 2 axes. The internal value generated by the top word is plotted on the x- axis and bottom word on the y-axis. The plot shows the two distributions associated with \(<Old,New>\) trials (bottom right) and \(<New,Old>\) trials, top left. For each trial, the observer responds depending on which axis (x or y) is closer, i.e., uses the positive diagonal as a boundary.
The distance between the two distribution means is \(d'_{2AFC} = \sqrt{d'^2 + d'^2} = \sqrt{2}\,d'\), as we saw before.
So, \[ d'_{yes/no} = \frac{z(H) - z(FA)}{\sqrt{2}} \]
Let’s say “signal” is \(<Old, New>\) and noise is \(<New, Old>\), then \[ H = P("\text{old on top}"|<Old, New>) = A\\ FA = P("\text{old on top}"|<New,Old>) = C\\\]
(H = 16/25)
## [1] 0.64
(FA = 7/25)
## [1] 0.28
\[ d'_{2AFC} = z(A) - z(C)\\ \lambda_{2AFC} = -0.5 * (z(A) + z(C))\\ log(\beta)_{2AFC} = 0.5 * (z^2(C) - z^2(A))\\ \]
(dprime2AFC = qnorm(H) - qnorm(FA))
## [1] 0.9413003
i.e., the same formulas as in the Yes/No equal-variance model.
\(d'_{YesNo} = d'_{2AFC}/\sqrt{2}\)
(dprimeYesNo = dprime2AFC/sqrt(2))
## [1] 0.6655998
We can also recast these in terms of P(c), using
\[ PC_{<Old, New>} = P(\text{old on top}|<Old, New>) = A = H \\ PC_{<New, Old>} = P(\text{old on bottom}|<New, Old>) = D = 1-C = 1-FA \]
Since \(z(x) = -z(1-x)\), \(z(C) = -z(1-C) = -z(D)\), so we can replace \(z(C)\) with \(-z(PC_{<New,Old>})\) and \(z(A)\) with \(z(PC_{<Old,New>})\).
\[ d'_{2AFC} = z(PC_{<Old,New>}) + z(PC_{<New,Old>})\\ \lambda_{2AFC} = 0.5(z(PC_{<New,Old>}) - z(PC_{<Old, New>}))\\ log(\beta)_{2AFC} = 0.5(z^2(PC_{<New,Old>}) - z^2(PC_{<Old, New>}))\\ \]
# calculate values for the example
PCnewold = 18/25
PColdnew = 16/25
# qnorm(x) = -qnorm(1-x)
(dprime2AFC_2 = qnorm(PColdnew) + qnorm(PCnewold))
## [1] 0.9413003
(lambda2AFC = 0.5*(qnorm(PCnewold) - qnorm(PColdnew)))
## [1] 0.1121914
# lambda is slightly positive, so bias to say 'old at the bottom'
Overall \(P_C\) is a combination of the correct-response probabilities for each order, \(PC_{<Old, New>}\) and \(PC_{<New, Old>}\), weighted by the presentation probabilities of each trial type.
\[ P_C = P_{<Old,New>} PC_{<Old,New>}+ P_{<New, Old>} PC_{<New, Old>}\] For an unbiased observer, \(PC_{<Old,New>} = PC_{<New,Old>}\), and \(\lambda\) should be \(0\) when the two orders are presented equally often. When they are unequal, \(\lambda_{2AFC}\) should shift to increase \(P_C\) for the more frequent type.
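For the running example (equal numbers of <Old, New> and <New, Old> trials), the overall proportion correct works out as:
# equal presentation probabilities, so each order gets weight 0.5
(Pc = 0.5 * (16/25) + 0.5 * (18/25))   # 0.68, the value used below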
For an unbiased observer, the 2AFC Area Theorem states that \[ p(c)_{2AFC} = \text{Area under the Yes/No isosensitivity curve} = A_z\] Under the equal variance model, the unbiased observer places their criterion on the minor diagonal and proportion correct at this point is found by
\[ p(c)_{max} = \Phi( \frac{z(H) - z(F)}{2} )\] In the Yes/No task, \[ p(c)_{max, YN} = \Phi(d'/ 2)\]
In the 2AFC task, \[ p(c)_{max, 2AFC} = \Phi(\frac{\sqrt{2}d'}{2}) = \Phi(\frac{d'}{\sqrt{2}}) \] According to the Area theorem,
\[ p(c)_{max, 2AFC} = \Phi(\frac{d'}{\sqrt{2}}) = A_z\]
We can solve the above for d’, to give
\[ d' = \sqrt{2}\, z(p(c)_{max,2AFC})\]
# for the example above, p(c) = 0.68.
(dprime = sqrt(2)*qnorm(0.5*(16/25 + 18/25)))
## [1] 0.661426