How to Fit Signal Detection Models as Bayesian Generalized Linear Models

Andrew Ellis

2019-03-27

Main points

Equal-variance signal detection models can formulated as a subclass of generalized linear models (GLM) (DeCarlo 1998; Knoblauch and Maloney 2012)
SDT models can be estimated using multilevel GLMs
Multilevel models provide regularizarion (shrinkage) of group-level estimates
More generally, we can formulate any variation of SDT as a probabilistic model, e.g. unequal variance SDT, extreme value SDT
Parameters can be estimated using Bayesian inference, resulting in posterior distributions.
Advantages of Bayesian framework:
- flexibility
- access to full statistical workflow, i.e. model checking, model comparison, etc.

GLM

What is a GLM? A quick reminder

The GLM generalizes a regression model by allowing the linear predictor to be related to the response variable via a link function.

We have to the following components:

Response (outcome): Observable behaviour. E.g. subjects’ choices in an experiments, number of cars driving by per hour, etc.
Linear model: We use known variables (stimulus identity) to predict the expected value of the response. E.g. when a previously seen image is presented, the subject is more likely to respond “I’ve seen that before”.
Link function: This is a function that relates the output of the linear model to the response. E.g. in a 2AFC experiment, the linear predictor gives a value on the real line. The link function is chosen to map this real number to a probability (in this case the inverse of a sigmoidal function mapping the interval \((0, 1)\) to \((-\infty, \infty)\)).

Probit model

The most common model for binary responses is the logistic model. The probit model is related, but uses a different link function. The probit function is the quantile function associated with the normal distribution. The quantile function is also the inverse cumulative distribution function:

\[probit(x) = \Phi^{-1}(x) \]

and looks like this:

For reference, this is the cumulative distribution function:

As a GLM (with a single predictor \(X\)):

\[ P(y = 1 | X) = \Phi(\alpha + \beta \cdot X) \] or more conventionally:

\[ probit(P(y = 1 | X)) = \alpha + \beta \cdot X \]

or \[ \Phi^{-1}(P(y = 1 | X)) = \alpha + \beta \cdot X \]

Generally:

\[ g(E[P(y = 1 | X)]) = \alpha + \beta \cdot X \]

where \(g()\) is the link function.

Summary:

\(\Phi\) is the inverse link function: this maps a number on the real line to a probability, and is therefore a cumulative distribution function.
\(\Phi^{-1}\) is the link function: this maps a probability to a real, and is therefore a quantile function.

Signal detection theory

We consider an experiment in which a person has to classify a stimulus into one of two possible categories:

new / old
left / right
yes / no

We can neglect the underlying task, as the math is the same. In the general case, say we present two stimulus categories A and B, that vary along some dimension. The task of the subject in our experiment is to perform a binary classification with the response options A and B. The subject’s performance can be summarised in a classification table, with four possible outcomes:

hit: Stimulus is A, subject responds A
miss: Stimulus is A, subject responds B
false alarm: Stimulus is B, subject responds A
correct rejection: Stimulus is B, subject responds B

Given the stimulus, the subject has two response options. Therefore, we consider only the A responses when the stimulus is A (hits) or B (false alarms).

The SDT model assumes that on each trial \(i\), a person’s information about a stimulus can be modeled as a random variable \(X_i\).
This is drawn from one of two possible distributions, which (in equal variance SDT) differ only in their location, but not their scale ( we assume that \(\sigma = 1\)).

Example: familiarity. When the subject is shown an image, this evokes a feeling of ’familiarity`. This is a latent construct.

SDT does not require any particular distributions, but in practise, Gaussians are often chosen.

Thought experiment: you are a subject in a memory experiment. You were previously shown a number of images, and now you are presented with a mixture of old and new items, and have to say whether you have previously seen the test image.

This can be formulated as the following statistical problem:

You are given a random variable \(X\), i.e. a draw from a normal distribution with a known standard deviation. You also know that distribution can have either of two known means, you just don’t know which one. The two distributions differ only in their mean, and the difference in means is called d'.
You are asked to say which distribution that \(X\) is most likely to have come from. This is a decision, so you need some sort of decision rule. In this case you can choose a criterion, and compare \(X\) to this.
You will produce four types of responses: you will either correctly classifiy the presented stimulus, or its internal representation \(X\), as either old or new. You will do this correctly (hits / correct rejections), or you will produce a missclassification (false alarms / misses).
From your behavioural data, the number of hits and false alarms, we want to estimate your hit rate and false alarm rate, and then compute your internal (latent) quantities that guided your behaviour.

The internal signal evoked by old and new items is often shown like this:

New items produce less familiarity than old items, but the internal representation is noisy.

In order to classify the presented stimulus, based on the evoked familiarity (decision variable), we need a decision rule:

A simple rule is to compare the signal with a criterion \(c\). If the signal \(X > c\), then respond old (“I have previously seen it”), otherwise respond new (“haven’t seen it before”).

Signal detection parameters

The commonly known SDT parameters are \(d'\) and and \(c\).

d’: distance between distributions

\[ d' = c - \phi^{-1}(1-p_{H}) = \phi^{-1}(p_{H}) - \phi^{-1}(p_{FA}) \] which can also be written as: \[ d' = \phi^{-1}(P(y = 1 | old)) - \phi^{-1}(P(y = 1 | new)) \]

The expression for \(d'\)

requires estimating probabilities conditional on the identity of a signal
requires taking the difference on a transformed (probit) scale
this is equivalent to a contrast between levels of a factor with two levels as the linear predictor for a response in a GLM

c: decision criterion \[ c = \phi^{-1}(1-p_{FA}) = -\phi^{-1}(p_{FA}) \]

Better: distance to optimal decision boundary \[ c' = -\frac{1}{2} \left[\phi^{-1}(p_{H}) + \phi^{-1}(p_{FA})\right] \]

What we are doing here is estimating the hit rate and false alarm rate from observed hits and false alarms using maximum likelihood estimation, and then computing d’ and c from these estimated probabilities.

We can also write this the other way round:

When the stimulus is new, we will produce false alarms with probability:

\[ p_{FA} = P(y = 1 | X = 0) = 1 - \Phi(c) \]

When the stimulus is old, we will produce hits with probability: \[ p_{H} = P(y = 1 | X=1) = 1 - \Phi(c-d') \]

We can write this in one equation:

\[ P(y = 1 | X = x) = 1 - \Phi(c-d'X) = \Phi(-c + d'X) \] where \(X\) is an indicator variable, i.e. takes the value 1 for old and 0 for new.

This produces the probability of giving an old response, given the stimulus. If the stimulus is old, this is the probability of a hit, if the stimulus is new, this is the probability of a false alarm.

Compare the signal detection model

\[ P(y = 1 | X) = \Phi(-c + d'X) \]

to a GLM: \[ P(y = 1 | X) = \Phi(\alpha + \beta \cdot X) \]

We can estimate SDT parameters c and d' using a probit GLM, if we use dummy coding (or effect coding) for a two-level stimulus factor.
The intercept provides an estimate of the normal quantile of the false alarm rate.
The stimulus parameter (\(\beta\) or \(d'\)) provides an estimate of the difference between hit and false alarm rates on the probit scale.

Memory experiment: single subject

Let’s look at an example (borrowing heavily from this blog post).

The data are from a recognition memory experiment:

library(sdtalt)
library(tidyverse)
library(brms)
library(tidybayes)

data(confcontr)

confcontr <- as_tibble(confcontr) %>% 
  mutate(subno = as_factor(subno),
         item = isold - 0.5)

confcontr

First we classify each response as hit, miss, correct rejection (cr) or false alarm (fa):

sdt <- confcontr %>% 
  mutate(type = "hit",
         type = ifelse(isold==1 & sayold==0, "miss", type),
         type = ifelse(isold==0 & sayold==0, "cr", type),  
         type = ifelse(isold==0 & sayold==1, "fa", type))

And then count the number of hits, etc.

sdt <- sdt %>% 
  group_by(subno, type) %>% 
  summarise(count = n()) %>% 
  spread(type, count)
sdt

Next, we estimate the hit and false alarm rates, based on the observed number of hits and false alarms, and compute \(d'\), \(c\) and \(bias\) using the formulae given above.

sdt <- sdt %>% 
  mutate(hr = hit / (hit+miss),
         far = fa / (fa+cr),
         zhr = qnorm(hr),
         zfa = qnorm(far),
         dprime = zhr-zfa,
         c = -zfa,
         bias = -0.5*(zhr+zfa),
         model = "no pooling")

sdt %>% 
  select(subno, hr, far, zhr, zfa, dprime, c, bias)

For simplicity, we first look at the data from subject 53 only:

sdt %>% 
  filter(subno ==53) %>% 
  select(subno, hr, far, zhr, zfa, dprime, c, bias)

A (standard) GLM will give us

an intercept: this corresponds to -c
a parameter for the indicator isold: this corresponds to d'

subno53 <- confcontr %>% 
  filter(subno == 53)

fit_glm_53 <- glm(sayold ~ isold, 
                  family = binomial(link = "probit"),
                  data = subno53)
summary(fit_glm_53)

## 
## Call:
## glm(formula = sayold ~ isold, family = binomial(link = "probit"), 
##     data = subno53)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2322  -0.9734  -0.9734   1.1236   1.3961  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.3124     0.1752  -1.783   0.0746 .
## isold         0.3925     0.2534   1.549   0.1214  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.63  on 99  degrees of freedom
## Residual deviance: 135.22  on 98  degrees of freedom
## AIC: 139.22
## 
## Number of Fisher Scoring iterations: 4

We can use Bayesian inference for the same model:

fit_53 <- brm(sayold ~ isold, 
              family = bernoulli(link="probit"), 
              data = subno53,
              cores = parallel::detectCores(),
              file = here::here("models/sdtmodel-1_1"))

summary(fit_53)

##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ isold 
##    Data: subno53 (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.32      0.18    -0.68     0.02       3318 1.00
## isold         0.40      0.25    -0.10     0.88       3432 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

fit_53 %>% 
  gather_draws(b_Intercept, b_isold) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

But what if we would rather obtain an estimate for the bias, rather than c?

We can use a special effect coding: the stimulus item is coded as 1/2 if it is old, and -1/2 if it is new. We are then either adding \(\frac{1}{2}d'\) to the intercept or or subtracting \(-\frac{1}{2}d'\) from the intercept, depending on the stimulus, and the intercept will give us the negative bias (difference between actual and optimal decision boundary).

\[ P(y = 1 | X) = \Phi(-c + d'X) \]

Using a standard GLM:

fit_glm_53_alt <- glm(sayold ~ item, 
                  family = binomial(link = "probit"),
                  data = subno53)
summary(fit_glm_53_alt)

## 
## Call:
## glm(formula = sayold ~ item, family = binomial(link = "probit"), 
##     data = subno53)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2322  -0.9734  -0.9734   1.1236   1.3961  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.1162     0.1267  -0.917    0.359
## item          0.3925     0.2534   1.549    0.121
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.63  on 99  degrees of freedom
## Residual deviance: 135.22  on 98  degrees of freedom
## AIC: 139.22
## 
## Number of Fisher Scoring iterations: 4

Or a Bayesian GLM:

fit_53_alt <- brm(sayold ~ item, 
              family = bernoulli(link="probit"), 
              data = subno53,
              cores = parallel::detectCores(),
              file = here::here("models/sdtmodel-1_1_alt"))

summary(fit_53_alt)

##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ item 
##    Data: filter(confcontr, subno == 53) (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.12      0.13    -0.36     0.14       3708 1.00
## item          0.40      0.25    -0.08     0.89       3574 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

fit_53_alt %>% 
  gather_draws(b_Intercept, b_item) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

Another trick is to use a non-linear model in brms:

m53_nl <- bf(sayold ~ Phi(dprime*item - c), 
         dprime ~ 1, c ~ 1, 
         nl = TRUE)

Priors <- c(prior(normal(.5, 1.5), nlpar = "dprime"),
            prior(normal(0, 1.5), nlpar = "c"))

fit_53_nl <- brm(m53_nl, 
            family = bernoulli(link="identity"), 
            data = subno53,
            prior = Priors,
            control = list(adapt_delta = .99),
            cores = parallel::detectCores(),
            file = here::here("models/sdtmodel-1_2"))

summary(fit_53_nl)

##  Family: bernoulli 
##   Links: mu = identity 
## Formula: sayold ~ Phi(dprime * item - c) 
##          dprime ~ 1
##          c ~ 1
##    Data: subno53 (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dprime_Intercept     0.39      0.25    -0.08     0.86       2170 1.00
## c_Intercept          0.11      0.13    -0.13     0.35       2356 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

fit_53_nl %>% 
  gather_draws(b_c_Intercept, b_dprime_Intercept) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

Memory experiment: multilevel model

Traditionally, we would summarise the manually calculated subject-specific point estimates of \(d'\) and \(c\) with their sample means and standard deviations.

sdt_sum <- sdt %>% 
  select(subno, dprime, c, bias) %>%
  gather(parameter, value, -subno) %>%  
  group_by(parameter) %>%  
  summarise(n = n(), 
            mu = mean(value), 
            sd = sd(value), 
            se = sd/sqrt(n))
sdt_sum

Alternatively, we can estimate both population and group level parameters in a multilevel model. Note that I am using effect coding (1/2 and -1/2) instead of dummy coding.

fitglmm <- brm(sayold ~ 1 + item + (1 + item | subno), 
               family = bernoulli(link = "probit"), 
               data = confcontr,
               prior = c(prior(student_t(3, 0, 1), class = sd),
                         prior(lkj(1), class = cor)),
               cores = parallel::detectCores(),
               file = here::here("models/sdtmodel-2_1"))

summary(fitglmm)

##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ 1 + item + (1 + item | subno) 
##    Data: confcontr (Number of observations: 3100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~subno (Number of levels: 31) 
##                     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)           0.21      0.04     0.13     0.30       1953 1.00
## sd(item)                0.40      0.09     0.25     0.59       1999 1.00
## cor(Intercept,item)     0.11      0.26    -0.39     0.59       1791 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.13      0.04    -0.22    -0.04       1703 1.00
## item          1.06      0.09     0.89     1.23       2697 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The following figure shows the population-level parameters (also know to frequentists as fixed effects):

In the next figure, we can see a visualization of the shrinkage of individual, or group level, parameters (only mean is shown) towards to overall mean in the partial pooling model, as compared to the maximum likelihood (no pooling) estimates.

coefs <- coef(fitglmm)

crit <- coefs[[1]][,,1][,1] %>% 
  enframe() %>% 
  select(subno = name,
         crit = value) %>% 
  mutate(crit = -crit)
  
dprime <- coefs[[1]][,,2][,1] %>% 
  enframe() %>% 
  select(subno = name,
         dprime = value)

estimates <- crit %>% 
  left_join(dprime) %>% 
  mutate(model = "partial pooling",
         subno = as_factor(subno)) %>% 
  select(subno, c = crit, dprime, model)

sdt %>% 
  select(subno, c = bias, dprime, model) %>% 
  bind_rows(estimates) %>% 
  mutate(model = as_factor(model)) %>% 
  ggplot(aes(x=dprime, y = c, color =  model)) + 
  geom_hline(yintercept = 0,
             linetype = "dotted",
             alpha = 0.6) +
  geom_line(aes(group = subno), 
            color = "grey60", alpha = 0.2,
            linetype = 1) +
  geom_point(size = 2) +
  scale_color_viridis_d()

Of course, we can use a non-linear model here as well, in order to get a direct estimate of the bias:

glmm2 <- bf(sayold ~ Phi(dprime*item - c), 
            dprime ~ 1 + (1 |s| subno), 
            c ~ 1 + (1 |s| subno), 
            nl = TRUE)

Priors <- c(prior(normal(0, 3), nlpar = "dprime", lb = 0),
            prior(normal(0, 3), nlpar = "c"),
            prior(student_t(10, 0, 1), class = "sd", nlpar = "dprime"),
            prior(student_t(10, 0, 1), class = "sd", nlpar = "c"),
            prior(lkj(4), class = "cor"))

fitglmm2 <- brm(glmm2, 
                family = bernoulli(link = "identity"), 
                data = confcontr,
                prior = Priors,
                control = list(adapt_delta = .99),
                cores = parallel::detectCores(), 
                inits = 0,
                file = here::here("models/sdtmodel-2_2"))

summary(fitglmm2)

##  Family: bernoulli 
##   Links: mu = identity 
## Formula: sayold ~ Phi(dprime * item - c) 
##          dprime ~ 1 + (1 | s | subno)
##          c ~ 1 + (1 | s | subno)
##    Data: confcontr (Number of observations: 3100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~subno (Number of levels: 31) 
##                                   Estimate Est.Error l-95% CI u-95% CI
## sd(dprime_Intercept)                  0.40      0.08     0.25     0.57
## sd(c_Intercept)                       0.21      0.04     0.14     0.30
## cor(dprime_Intercept,c_Intercept)    -0.07      0.21    -0.47     0.36
##                                   Eff.Sample Rhat
## sd(dprime_Intercept)                    1561 1.00
## sd(c_Intercept)                         1690 1.00
## cor(dprime_Intercept,c_Intercept)       1307 1.00
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dprime_Intercept     1.06      0.09     0.89     1.23       1458 1.00
## c_Intercept          0.13      0.05     0.04     0.22       1321 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior predictive checks

And finally, because this is always a good idea, we can perform some posterior predictive checks to make sure that the data are well-described by the model.

We simulate 500 data sets from the posterior predictive distribution, and then compute empirical estimates of \(c\) and \(d'\).

preds_fitglmm <- fitglmm$data %>%
    select(subno, item) %>%
    add_predicted_draws(model = fitglmm, n = 500)

sdt_pred <- preds_fitglmm  %>%
    group_by(subno, .draw) %>%
    mutate(type = "hit",
           type = ifelse(item > 0 & .prediction == 0, "miss", type),
           type = ifelse(item < 0 & .prediction == 0, "cr", type),
           type = ifelse(item < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred <- sdt_pred %>%
    group_by(subno, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_pred <- sdt_pred %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred %>%
    group_by(subno) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = subno)) +
    geom_pointinterval(fatten_point = 2,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_point(aes(y = dprime, x = subno),
               data = sdt,
               shape = 21, color = "black",
               fill = "pink", stroke = 1, size = 3) +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
  theme_tidybayes()

Line bisection

We have the following binary data from a line bisection task. Each of 6 subjects performed the task at several time points.

A common approach is to compute d’ for each subject in each session, as a measure of performance.

DeCarlo (1998); Knoblauch and Maloney (2012) describe how d’ can be computed using a type of generalized linear model (Probit regression)
Here, we implement a Bayesian multilevel probit regression model.

Bayesian multilevel probit regression model

In order to obtain d’ as a parameter, we use an effect coding for the stimulus:

PilotData %>%
  select(subject, measurement, offset, offsetnum, sayright) %>%
  head() %>%
  knitr::kable()

subject	measurement	offset	offsetnum	sayright
ALPE94	1	left	-0.5	0
ALPE94	1	right	0.5	1
ALPE94	1	left	-0.5	0
ALPE94	1	right	0.5	1
ALPE94	1	left	-0.5	1
ALPE94	1	left	-0.5	0

fit_line <- brm(sayright ~ 0 + measurement + measurement:offsetnum +
                      (0 + measurement + measurement:offsetnum | subject),
                  family = bernoulli("probit"),
                  data = PilotData,
                  control = list(adapt_delta = .99),
                  prior = c(prior(normal(1, 1), class = b),
                            prior(student_t(3, 0, 1), class = sd)),
                  inits = 0,
                  file = here::here("models/fit-line_bisection"))

Posterior predictive check

To check whether the model captures the important features of the data, we generate 500 data sets from the posterior predictive distribution.
Next, we calculate d’ for each dataset, and then plot the distribution of d’. We overlay this with the d’ computed from the actual data.

preds_dprime_c <- fit_line$data %>%
    select(subject, measurement, offsetnum) %>%
    add_predicted_draws(model = fit_line, n = 500)

sdt_line_pred <- preds_dprime_c %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_line_pred <- sdt_line_pred %>%
    group_by(subject, measurement, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_line_pred <- sdt_line_pred %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_line_pred %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 2,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_point(aes(y = dprime, x = measurement),
               data = sdt_line,
               shape = 21, color = "black",
               fill = "pink", stroke = 1, size = 3) +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)

d’ estimates: population level

These are the population level effects (called fixed effects in frequentist jargon)

plot(fit_line, "b_measurement")

plot(hypothesis(fit_line,
                "measurement2:offsetnum > measurement1:offsetnum"))

d’ estimates: subject level

We plot the posterior estimtates of d’ for each individual subject (sum of population level and group level effects)

coefs <- purrr::array_tree(coef(object = fit_line,
                      summary = FALSE)$subject[,,],
                  margin = 3)

f <- function(x) {
    x <- as_tibble(x)
    x <- gather(x, key = subject, value = dprime)
    x
}

coefs <- coefs %>% map_df(f, .id = "measurement") %>%
    filter(str_detect(measurement, 'offsetnum')) %>%
    mutate(subject = as_factor(subject),
           measurement = str_extract(measurement, "(\\d)+"),
           measurement = as_factor(measurement))

dprimes <- coefs %>%
    group_by(subject, measurement) %>%
    median_qi(.width = c(.95, .66))

compare_d_primes <- sdt_line %>%
    mutate(point_estimate = dprime) %>%
    select(subject, measurement, point_estimate) %>%
    left_join(dprimes)

p_dprime <- coefs %>%
    ggplot(aes(y = measurement, x = dprime)) +
    geom_vline(xintercept = 0, linetype = 2, color = "grey",
               size = 1, alpha = 0.9) +
    geom_halfeyeh(fill = "steelblue4", alpha = 0.4) +
    geom_point(aes(x = point_estimate),
               data = compare_d_primes,
               color = 'red') +
    facet_wrap(~subject) +
    theme_tidybayes()

For comparison, the red data points represent the d’ values compputed from the data.

Attentional lapses and guessing

We could extend this by implementing a non-linear model that can account for attentional lapses and guessing. This is equivalent to a binomial mixture model, in which subjects lapse with probability \(\lambda\) and pay attention with probability \(1-\lambda\). If they lapse, they have to guess, otherwise they respond as in the previous model.

BF <- bf(sayright ~ guess + (1-guess-lapse) * Phi(eta),
         eta ~ 0 + measurement:offsetnum +
             (0 + measurement:offsetnum |s| subject),
         guess ~ 1,
         lapse ~  (1 |s| subject),
         family = bernoulli(link="identity"),
         nl = TRUE)

prior_guessing <- c(
    prior(normal(1, 1), class = "b", nlpar = "eta"),
    prior(beta(1, 1), nlpar = "lapse", lb = 0, ub = 1),
    prior(beta(1, 1), nlpar = "guess", lb = 0, ub = 1))

fit_dprime_guessing <- brm(BF,
                           data = PilotData,
                           control = list(adapt_delta = .99),
                           inits = 0,
                           prior = prior_guessing,
                           file = here::here("models/fit_dprime-guessing"))

The model can also be implemented as a mixture:

mix <- mixture(bernoulli("probit"), bernoulli("probit"))
p_mix <- c(
    prior(normal(0, 0.1), class = Intercept, dpar = mu1),
    prior(normal(0, 1), class = "b", dpar = mu2))

fit_mix <- brm(bf(y ~ 1, mu1 ~ 1, mu2 ~ x),
               data = PilotData,
               family = mix,
               prior = p_mix)

Predict new subjects

Finally, we can predict data for new subjects (since we have a multilevel model, and the estimated variance of the subjects. In other words, each subject’s parameters are drawn from population level distributions).

preds_new_subjects <- fit_line$data %>%
    data_grid(subject = str_c("subject_", 1:6),
              measurement,
              offsetnum) %>%
    add_predicted_draws(model = fit_line,
                        allow_new_levels = TRUE,
                        n = 100)

# preds_new_subjects_2 <- fit_dprime_c$data %>%
#     data_grid(measurement, offsetnum) %>%
#     add_predicted_draws(model = fit_dprime_c,
#                         allow_new_levels = TRUE,
#                         re_formula = NULL,
#                         # re_formula = ~ (1|subject),
#                         = 4010


sdt_pred_new <- preds_new_subjects %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred_new %>%
    group_by(subject, measurement, type) %>%
    summarise(count = n()) %>%
    spread(type, count) -> sdt_pred_new

sdt_pred_new <- sdt_pred_new %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           hit_rate = hit / (hit+miss),
           fa_rate = fa / (fa+cr),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred_new %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 1,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)

References

DeCarlo, Lawrence T. 1998. “Signal Detection Theory and Generalized Linear Models.” Psychological Methods 3 (2): 186–205.

Knoblauch, Kenneth, and Laurence T. Maloney. 2012. Modeling Psychophysical Data in R. Use R! New York: Springer-Verlag.