How to Fit Signal Detection Models as Bayesian Generalized Linear Models

Andrew Ellis

2019-03-27


GLM

The GLM generalizes ordinary linear regression by allowing the linear predictor to be related to the mean of the response variable via a link function.

We have the following components:

  1. A probability distribution for the response variable (for binary responses, the Bernoulli distribution).

  2. A linear predictor \(\eta = \alpha + \beta \cdot X\).

  3. A link function \(g\) that connects the two: \(g(E[y]) = \eta\).

Probit model

The most common model for binary responses is the logistic model. The probit model is related, but uses a different link function. The probit function is the quantile function associated with the normal distribution. The quantile function is also the inverse cumulative distribution function:

\[probit(x) = \Phi^{-1}(x) \]

and maps probabilities in \((0, 1)\) onto the whole real line, rising steeply near 0 and 1; its inverse, the cumulative distribution function \(\Phi\), is the familiar S-shaped curve.
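Both functions are built into R: qnorm() is the probit (quantile) function and pnorm() is the CDF, so the two curves can be drawn with a couple of lines of base R:

# probit (quantile) function: maps probabilities to z-scores
curve(qnorm(x), from = 0.001, to = 0.999, xlab = "p", ylab = "probit(p)")
# cumulative distribution function: maps z-scores back to probabilities
curve(pnorm(x), from = -3, to = 3, xlab = "x", ylab = "pnorm(x)")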

As a GLM (with a single predictor \(X\)):

\[ P(y = 1 | X) = \Phi(\alpha + \beta \cdot X) \] or more conventionally:

\[ probit(P(y = 1 | X)) = \alpha + \beta \cdot X \]

or \[ \Phi^{-1}(P(y = 1 | X)) = \alpha + \beta \cdot X \]

Generally:

\[ g(E[y | X]) = \alpha + \beta \cdot X \]

where \(g()\) is the link function.

Summary: a probit GLM models a binary response as \(P(y = 1 | X) = \Phi(\alpha + \beta \cdot X)\), so the linear predictor lives on the latent z-score scale.

Signal detection theory

We consider an experiment in which a person has to classify a stimulus into one of two possible categories (for example old vs. new, or signal vs. noise).

We can neglect the underlying task, as the math is the same. In the general case, say we present two stimulus categories A and B that vary along some dimension. The subject's task is to perform a binary classification with the response options A and B. The subject's performance can be summarised in a classification table with four possible outcomes:

              respond A       respond B
stimulus A    hit             miss
stimulus B    false alarm     correct rejection

Since the subject has only two response options for each stimulus, the row counts are fixed, and it suffices to consider the A responses: when the stimulus is A these are hits, and when the stimulus is B they are false alarms.

Example: familiarity. When the subject is shown an image, this evokes a feeling of 'familiarity'. This is a latent construct.

SDT does not require any particular distributions, but in practice, Gaussians are often chosen.

Thought experiment: you are a subject in a memory experiment. You were previously shown a number of images, and now you are presented with a mixture of old and new items, and have to say whether you have previously seen the test image.

This can be formulated as the following statistical problem:

  1. You are given a random variable \(X\), i.e. a draw from a normal distribution with a known standard deviation. You also know that the distribution has one of two known means; you just don't know which one. The two distributions differ only in their means, and the difference between the means is called \(d'\).

  2. You are asked to say which distribution \(X\) is most likely to have come from. This is a decision, so you need some sort of decision rule. In this case you can choose a criterion and compare \(X\) to it.

  3. There are four types of responses: you classify the presented stimulus, via its internal representation \(X\), as either old or new, and you either do so correctly (hits / correct rejections) or produce a misclassification (false alarms / misses).

  4. From your behavioural data, the numbers of hits and false alarms, we want to estimate your hit rate and false alarm rate, and then compute the internal (latent) quantities that guided your behaviour.

The internal signal evoked by old and new items is usually depicted as two overlapping normal distributions along the familiarity axis.

New items produce less familiarity than old items, but the internal representation is noisy.

In order to classify the presented stimulus, based on the evoked familiarity (decision variable), we need a decision rule:

A simple rule is to compare the signal with a criterion \(c\). If the signal \(X > c\), then respond old (“I have previously seen it”), otherwise respond new (“haven’t seen it before”).
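This decision rule is easy to simulate. The following sketch, with arbitrarily chosen true values \(d' = 1\) and \(c = 0.5\), draws familiarity signals for old and new items, applies the criterion rule, and recovers \(d'\) and \(c\) from the simulated hit and false alarm rates:

set.seed(1)
dprime <- 1                            # true sensitivity
criterion <- 0.5                       # true criterion
n <- 1e4
isold <- rbinom(n, 1, 0.5)             # stimulus: 1 = old, 0 = new
x <- rnorm(n, mean = dprime * isold)   # noisy familiarity signal
sayold <- as.numeric(x > criterion)    # decision rule: respond old if x > c

hr  <- mean(sayold[isold == 1])        # hit rate
far <- mean(sayold[isold == 0])        # false alarm rate
qnorm(hr) - qnorm(far)                 # approximately recovers d' = 1
-qnorm(far)                            # approximately recovers c = 0.5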

Signal detection parameters

The commonly known SDT parameters are \(d'\) and \(c\).

\[ d' = c - \Phi^{-1}(1-p_{H}) = \Phi^{-1}(p_{H}) - \Phi^{-1}(p_{FA}) \] which can also be written as: \[ d' = \Phi^{-1}(P(y = 1 | old)) - \Phi^{-1}(P(y = 1 | new)) \]

The expression for \(d'\) measures sensitivity as the standardized distance between the means of the two distributions, independently of the criterion.

A better measure of response bias is the distance of the criterion from the optimal decision boundary (the midpoint between the two distributions): \[ c' = -\frac{1}{2} \left[\Phi^{-1}(p_{H}) + \Phi^{-1}(p_{FA})\right] \]

What we are doing here is estimating the hit rate and false alarm rate from observed hits and false alarms using maximum likelihood estimation, and then computing d’ and c from these estimated probabilities.

We can also write this the other way round:

When the stimulus is new, we will produce false alarms with probability:

\[ p_{FA} = P(y = 1 | X = 0) = 1 - \Phi(c) \]

When the stimulus is old, we will produce hits with probability: \[ p_{H} = P(y = 1 | X=1) = 1 - \Phi(c-d') \]

\[ P(y = 1 | X = x) = 1 - \Phi(c-d'X) = \Phi(-c + d'X) \] where \(X\) is an indicator variable, i.e. takes the value 1 for old and 0 for new.

This produces the probability of giving an old response, given the stimulus. If the stimulus is old, this is the probability of a hit, if the stimulus is new, this is the probability of a false alarm.
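This equation is a one-liner in R. Plugging in the simulated values \(d' = 1\), \(c = 0.5\) from above reproduces the hit and false alarm rates (p_old is just an illustrative helper):

# P(respond old | stimulus), as a function of the indicator X
p_old <- function(X, dprime, c) pnorm(-c + dprime * X)
p_old(X = 1, dprime = 1, c = 0.5)  # hit rate, about 0.69
p_old(X = 0, dprime = 1, c = 0.5)  # false alarm rate, about 0.31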

Compare the signal detection model

\[ P(y = 1 | X) = \Phi(-c + d'X) \]

to a GLM: \[ P(y = 1 | X) = \Phi(\alpha + \beta \cdot X) \] The two coincide with \(\alpha = -c\) and \(\beta = d'\): the negated intercept estimates the criterion, and the slope estimates the sensitivity.
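Continuing the simulation sketch from above (assuming sayold and isold are still in the workspace), a probit GLM recovers these quantities directly:

# intercept estimates -c (about -0.5), slope estimates d' (about 1)
coef(glm(sayold ~ isold, family = binomial(link = "probit")))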

Memory experiment: single subject

Let’s look at an example (borrowing heavily from this blog post).

The data are from a recognition memory experiment:

library(sdtalt)
library(tidyverse)
library(brms)
library(tidybayes)

data(confcontr)

confcontr <- as_tibble(confcontr) %>% 
  mutate(subno = as_factor(subno),
         item = isold - 0.5)  # effect coding: -0.5 = new, +0.5 = old
confcontr

First we classify each response as hit, miss, correct rejection (cr) or false alarm (fa):

sdt <- confcontr %>% 
  mutate(type = "hit",  # default case: isold == 1 & sayold == 1
         type = ifelse(isold==1 & sayold==0, "miss", type),
         type = ifelse(isold==0 & sayold==0, "cr", type),  
         type = ifelse(isold==0 & sayold==1, "fa", type))

And then count the number of hits, etc.

sdt <- sdt %>% 
  group_by(subno, type) %>% 
  summarise(count = n()) %>% 
  spread(type, count)
sdt

Next, we estimate the hit and false alarm rates from the observed numbers of hits and false alarms, and compute \(d'\), \(c\) and the bias \(c'\) using the formulae given above.

sdt <- sdt %>% 
  mutate(hr = hit / (hit+miss),     # hit rate
         far = fa / (fa+cr),        # false alarm rate
         zhr = qnorm(hr),           # z-transformed hit rate
         zfa = qnorm(far),          # z-transformed false alarm rate
         dprime = zhr-zfa,
         c = -zfa,                  # criterion
         bias = -0.5*(zhr+zfa),     # distance from the optimal boundary (c')
         model = "no pooling")
sdt %>% 
  select(subno, hr, far, zhr, zfa, dprime, c, bias)

For simplicity, we first look at the data from subject 53 only:

sdt %>% 
  filter(subno ==53) %>% 
  select(subno, hr, far, zhr, zfa, dprime, c, bias)

A (standard) GLM gives us the following estimates:

subno53 <- confcontr %>% 
  filter(subno == 53)

fit_glm_53 <- glm(sayold ~ isold, 
                  family = binomial(link = "probit"),
                  data = subno53)
summary(fit_glm_53)
## 
## Call:
## glm(formula = sayold ~ isold, family = binomial(link = "probit"), 
##     data = subno53)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2322  -0.9734  -0.9734   1.1236   1.3961  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.3124     0.1752  -1.783   0.0746 .
## isold         0.3925     0.2534   1.549   0.1214  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.63  on 99  degrees of freedom
## Residual deviance: 135.22  on 98  degrees of freedom
## AIC: 139.22
## 
## Number of Fisher Scoring iterations: 4

We can use Bayesian inference for the same model:

fit_53 <- brm(sayold ~ isold, 
              family = bernoulli(link="probit"), 
              data = subno53,
              cores = parallel::detectCores(),
              file = here::here("models/sdtmodel-1_1"))
summary(fit_53)
##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ isold 
##    Data: subno53 (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.32      0.18    -0.68     0.02       3318 1.00
## isold         0.40      0.25    -0.10     0.88       3432 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
fit_53 %>% 
  gather_draws(b_Intercept, b_isold) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

But what if we would rather obtain an estimate of the bias \(c'\) than of the criterion \(c\)?

We can use a special effect coding: the stimulus item is coded as 1/2 if it is old, and -1/2 if it is new. Depending on the stimulus, we then add \(\frac{1}{2}d'\) to the intercept or subtract \(\frac{1}{2}d'\) from it, and the intercept gives us the negative bias (the difference between the actual and the optimal decision boundary):

\[ P(y = 1 | X) = \Phi(-c' + d' \cdot X), \qquad X \in \{-\tfrac{1}{2}, +\tfrac{1}{2}\} \]

so that \(\Phi^{-1}(p_{H}) = -c' + \frac{1}{2}d'\) and \(\Phi^{-1}(p_{FA}) = -c' - \frac{1}{2}d'\).

Using a standard GLM:

fit_glm_53_alt <- glm(sayold ~ item, 
                  family = binomial(link = "probit"),
                  data = subno53)
summary(fit_glm_53_alt)
## 
## Call:
## glm(formula = sayold ~ item, family = binomial(link = "probit"), 
##     data = subno53)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2322  -0.9734  -0.9734   1.1236   1.3961  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.1162     0.1267  -0.917    0.359
## item          0.3925     0.2534   1.549    0.121
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 137.63  on 99  degrees of freedom
## Residual deviance: 135.22  on 98  degrees of freedom
## AIC: 139.22
## 
## Number of Fisher Scoring iterations: 4

Or a Bayesian GLM:

fit_53_alt <- brm(sayold ~ item, 
              family = bernoulli(link="probit"), 
              data = subno53,
              cores = parallel::detectCores(),
              file = here::here("models/sdtmodel-1_1_alt"))
summary(fit_53_alt)
##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ item 
##    Data: filter(confcontr, subno == 53) (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.12      0.13    -0.36     0.14       3708 1.00
## item          0.40      0.25    -0.08     0.89       3574 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
fit_53_alt %>% 
  gather_draws(b_Intercept, b_item) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

Another trick is to use a non-linear model in brms. Note that with the ±1/2 coding of item, the parameter called c below is again the bias, i.e. the distance from the optimal decision boundary:

m53_nl <- bf(sayold ~ Phi(dprime*item - c), 
         dprime ~ 1, c ~ 1, 
         nl = TRUE)

Priors <- c(prior(normal(.5, 1.5), nlpar = "dprime"),
            prior(normal(0, 1.5), nlpar = "c"))

fit_53_nl <- brm(m53_nl, 
            family = bernoulli(link="identity"), 
            data = subno53,
            prior = Priors,
            control = list(adapt_delta = .99),
            cores = parallel::detectCores(),
            file = here::here("models/sdtmodel-1_2"))
summary(fit_53_nl)
##  Family: bernoulli 
##   Links: mu = identity 
## Formula: sayold ~ Phi(dprime * item - c) 
##          dprime ~ 1
##          c ~ 1
##    Data: subno53 (Number of observations: 100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dprime_Intercept     0.39      0.25    -0.08     0.86       2170 1.00
## c_Intercept          0.11      0.13    -0.13     0.35       2356 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
fit_53_nl %>% 
  gather_draws(b_c_Intercept, b_dprime_Intercept) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()

Memory experiment: multilevel model

Traditionally, we would summarise the manually calculated subject-specific point estimates of \(d'\) and \(c\) with their sample means and standard deviations.

sdt_sum <- sdt %>% 
  select(subno, dprime, c, bias) %>%
  gather(parameter, value, -subno) %>%  
  group_by(parameter) %>%  
  summarise(n = n(), 
            mu = mean(value), 
            sd = sd(value), 
            se = sd/sqrt(n))
sdt_sum

Alternatively, we can estimate both population and group level parameters in a multilevel model. Note that I am using effect coding (1/2 and -1/2) instead of dummy coding.

fitglmm <- brm(sayold ~ 1 + item + (1 + item | subno), 
               family = bernoulli(link = "probit"), 
               data = confcontr,
               prior = c(prior(student_t(3, 0, 1), class = sd),
                         prior(lkj(1), class = cor)),
               cores = parallel::detectCores(),
               file = here::here("models/sdtmodel-2_1"))
summary(fitglmm)
##  Family: bernoulli 
##   Links: mu = probit 
## Formula: sayold ~ 1 + item + (1 + item | subno) 
##    Data: confcontr (Number of observations: 3100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~subno (Number of levels: 31) 
##                     Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)           0.21      0.04     0.13     0.30       1953 1.00
## sd(item)                0.40      0.09     0.25     0.59       1999 1.00
## cor(Intercept,item)     0.11      0.26    -0.39     0.59       1791 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.13      0.04    -0.22    -0.04       1703 1.00
## item          1.06      0.09     0.89     1.23       2697 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The following figure shows the population-level parameters (also known to frequentists as fixed effects):
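One way to produce such a figure, following the same tidybayes pattern as for the single-subject fits:

fitglmm %>% 
  gather_draws(b_Intercept, b_item) %>%
  ggplot(aes(y = .variable, x = .value)) +
  geom_halfeyeh()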

In the next figure, we see the shrinkage of the individual (group-level) parameters (only the means are shown) towards the overall mean in the partial pooling model, compared to the maximum likelihood (no pooling) estimates.

coefs <- coef(fitglmm)

# subject-specific intercepts (posterior means), negated to obtain the bias
crit <- coefs[[1]][,,1][,1] %>% 
  enframe() %>% 
  select(subno = name,
         crit = value) %>% 
  mutate(crit = -crit)
  
# subject-specific slopes (posterior means), i.e. d'
dprime <- coefs[[1]][,,2][,1] %>% 
  enframe() %>% 
  select(subno = name,
         dprime = value)

estimates <- crit %>% 
  left_join(dprime) %>% 
  mutate(model = "partial pooling",
         subno = as_factor(subno)) %>% 
  select(subno, c = crit, dprime, model)

sdt %>% 
  select(subno, c = bias, dprime, model) %>% 
  bind_rows(estimates) %>% 
  mutate(model = as_factor(model)) %>% 
  ggplot(aes(x=dprime, y = c, color =  model)) + 
  geom_hline(yintercept = 0,
             linetype = "dotted",
             alpha = 0.6) +
  geom_line(aes(group = subno), 
            color = "grey60", alpha = 0.2,
            linetype = 1) +
  geom_point(size = 2) +
  scale_color_viridis_d()

Of course, we can use a non-linear model here as well, in order to get a direct estimate of the bias:

glmm2 <- bf(sayold ~ Phi(dprime*item - c), 
            dprime ~ 1 + (1 |s| subno), 
            c ~ 1 + (1 |s| subno), 
            nl = TRUE)

Priors <- c(prior(normal(0, 3), nlpar = "dprime", lb = 0),
            prior(normal(0, 3), nlpar = "c"),
            prior(student_t(10, 0, 1), class = "sd", nlpar = "dprime"),
            prior(student_t(10, 0, 1), class = "sd", nlpar = "c"),
            prior(lkj(4), class = "cor"))

fitglmm2 <- brm(glmm2, 
                family = bernoulli(link = "identity"), 
                data = confcontr,
                prior = Priors,
                control = list(adapt_delta = .99),
                cores = parallel::detectCores(), 
                inits = 0,
                file = here::here("models/sdtmodel-2_2"))
summary(fitglmm2)
##  Family: bernoulli 
##   Links: mu = identity 
## Formula: sayold ~ Phi(dprime * item - c) 
##          dprime ~ 1 + (1 | s | subno)
##          c ~ 1 + (1 | s | subno)
##    Data: confcontr (Number of observations: 3100) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~subno (Number of levels: 31) 
##                                   Estimate Est.Error l-95% CI u-95% CI
## sd(dprime_Intercept)                  0.40      0.08     0.25     0.57
## sd(c_Intercept)                       0.21      0.04     0.14     0.30
## cor(dprime_Intercept,c_Intercept)    -0.07      0.21    -0.47     0.36
##                                   Eff.Sample Rhat
## sd(dprime_Intercept)                    1561 1.00
## sd(c_Intercept)                         1690 1.00
## cor(dprime_Intercept,c_Intercept)       1307 1.00
## 
## Population-Level Effects: 
##                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dprime_Intercept     1.06      0.09     0.89     1.23       1458 1.00
## c_Intercept          0.13      0.05     0.04     0.22       1321 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior predictive checks

And finally, because this is always a good idea, we can perform some posterior predictive checks to make sure that the data are well-described by the model.

We simulate 500 data sets from the posterior predictive distribution, and then compute empirical estimates of \(c\) and \(d'\).
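For a quick generic check, brms also provides pp_check(); with a binary outcome the bar plot version is the most readable. Here, however, we want checks on the SDT-specific statistics:

pp_check(fitglmm, type = "bars", nsamples = 100)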

preds_fitglmm <- fitglmm$data %>%
    select(subno, item) %>%
    add_predicted_draws(model = fitglmm, n = 500)

sdt_pred <- preds_fitglmm  %>%
    group_by(subno, .draw) %>%
    mutate(type = "hit",
           type = ifelse(item > 0 & .prediction == 0, "miss", type),
           type = ifelse(item < 0 & .prediction == 0, "cr", type),
           type = ifelse(item < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred <- sdt_pred %>%
    group_by(subno, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_pred <- sdt_pred %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred %>%
    group_by(subno) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = subno)) +
    geom_pointinterval(fatten_point = 2,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_point(aes(y = dprime, x = subno),
               data = sdt,
               shape = 21, color = "black",
               fill = "pink", stroke = 1, size = 3) +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
  theme_tidybayes()

Line bisection

We have the following binary data from a line bisection task. Each of 6 subjects performed the task at several time points.

A common approach is to compute d’ for each subject in each session, as a measure of performance.
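These per-subject, per-session point estimates (stored as sdt_line, which the plots below compare against) are computed exactly as for the memory data; a sketch, assuming the PilotData columns shown in the next table:

sdt_line <- PilotData %>%
  mutate(type = "hit",  # default case: offsetnum > 0 & sayright == 1
         type = ifelse(offsetnum > 0 & sayright == 0, "miss", type),
         type = ifelse(offsetnum < 0 & sayright == 0, "cr", type),
         type = ifelse(offsetnum < 0 & sayright == 1, "fa", type)) %>%
  group_by(subject, measurement, type) %>%
  summarise(count = n()) %>%
  spread(type, count) %>%
  mutate(zhr = qnorm(hit / (hit + miss)),
         zfa = qnorm(fa / (fa + cr)),
         dprime = zhr - zfa,       # sensitivity
         crit = -zfa,              # criterion
         c = -0.5 * (zhr + zfa))   # bias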

Bayesian multilevel probit regression model

In order to obtain d’ as a parameter, we use an effect coding for the stimulus:

PilotData %>%
  select(subject, measurement, offset, offsetnum, sayright) %>%
  head() %>%
  knitr::kable()
subject   measurement   offset   offsetnum   sayright
ALPE94    1             left     -0.5        0
ALPE94    1             right     0.5        1
ALPE94    1             left     -0.5        0
ALPE94    1             right     0.5        1
ALPE94    1             left     -0.5        1
ALPE94    1             left     -0.5        0

fit_line <- brm(sayright ~ 0 + measurement + measurement:offsetnum +
                      (0 + measurement + measurement:offsetnum | subject),
                  family = bernoulli("probit"),
                  data = PilotData,
                  control = list(adapt_delta = .99),
                  prior = c(prior(normal(1, 1), class = b),
                            prior(student_t(3, 0, 1), class = sd)),
                  inits = 0,
                  file = here::here("models/fit-line_bisection"))

Posterior predictive check

preds_dprime_c <- fit_line$data %>%
    select(subject, measurement, offsetnum) %>%
    add_predicted_draws(model = fit_line, n = 500)

sdt_line_pred <- preds_dprime_c %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_line_pred <- sdt_line_pred %>%
    group_by(subject, measurement, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_line_pred <- sdt_line_pred %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_line_pred %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 2,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_point(aes(y = dprime, x = measurement),
               data = sdt_line,
               shape = 21, color = "black",
               fill = "pink", stroke = 1, size = 3) +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)

d’ estimates: population level

plot(fit_line, "b_measurement")

plot(hypothesis(fit_line,
                "measurement2:offsetnum > measurement1:offsetnum"))

d’ estimates: subject level

coefs <- purrr::array_tree(coef(object = fit_line,
                      summary = FALSE)$subject[,,],
                  margin = 3)

f <- function(x) {
    x <- as_tibble(x)
    x <- gather(x, key = subject, value = dprime)
    x
}

coefs <- coefs %>% map_df(f, .id = "measurement") %>%
    filter(str_detect(measurement, 'offsetnum')) %>%
    mutate(subject = as_factor(subject),
           measurement = str_extract(measurement, "(\\d)+"),
           measurement = as_factor(measurement))

dprimes <- coefs %>%
    group_by(subject, measurement) %>%
    median_qi(.width = c(.95, .66))

compare_d_primes <- sdt_line %>%
    mutate(point_estimate = dprime) %>%
    select(subject, measurement, point_estimate) %>%
    left_join(dprimes)

p_dprime <- coefs %>%
    ggplot(aes(y = measurement, x = dprime)) +
    geom_vline(xintercept = 0, linetype = 2, color = "grey",
               size = 1, alpha = 0.9) +
    geom_halfeyeh(fill = "steelblue4", alpha = 0.4) +
    geom_point(aes(x = point_estimate),
               data = compare_d_primes,
               color = 'red') +
    facet_wrap(~subject) +
    theme_tidybayes()
p_dprime

For comparison, the red points show the d' values computed directly from the data.

Attentional lapses and guessing

We could extend this by implementing a non-linear model that accounts for attentional lapses and guessing. This is equivalent to a binomial mixture model in which subjects lapse with probability \(\lambda\) and pay attention with probability \(1-\lambda\); if they lapse, they have to guess, otherwise they respond as in the previous model. In the code below this takes the form \[ P(y = 1 | \eta) = \gamma + (1 - \gamma - \lambda) \, \Phi(\eta) \] with guess rate \(\gamma\) and lapse rate \(\lambda\).

BF <- bf(sayright ~ guess + (1-guess-lapse) * Phi(eta),
         eta ~ 0 + measurement:offsetnum +
             (0 + measurement:offsetnum |s| subject),
         guess ~ 1,
         lapse ~  (1 |s| subject),
         family = bernoulli(link="identity"),
         nl = TRUE)

prior_guessing <- c(
    prior(normal(1, 1), class = "b", nlpar = "eta"),
    prior(beta(1, 1), nlpar = "lapse", lb = 0, ub = 1),
    prior(beta(1, 1), nlpar = "guess", lb = 0, ub = 1))
fit_dprime_guessing <- brm(BF,
                           data = PilotData,
                           control = list(adapt_delta = .99),
                           inits = 0,
                           prior = prior_guessing,
                           file = here::here("models/fit_dprime-guessing"))

The model can also be implemented as a mixture:

mix <- mixture(bernoulli("probit"), bernoulli("probit"))
p_mix <- c(
    prior(normal(0, 0.1), class = Intercept, dpar = mu1),
    prior(normal(0, 1), class = "b", dpar = mu2))

# the generic y/x placeholders are filled in with the line-bisection variables
fit_mix <- brm(bf(sayright ~ 1, mu1 ~ 1, mu2 ~ offsetnum),
               data = PilotData,
               family = mix,
               prior = p_mix)

Predict new subjects

Finally, we can predict data for new subjects: since we have a multilevel model, each subject's parameters are drawn from population-level distributions, so the estimated population means and variances let us simulate responses for unseen subjects.

library(modelr)  # provides data_grid()

preds_new_subjects <- fit_line$data %>%
    data_grid(subject = str_c("subject_", 1:6),
              measurement,
              offsetnum) %>%
    add_predicted_draws(model = fit_line,
                        allow_new_levels = TRUE,
                        n = 100)

sdt_pred_new <- preds_new_subjects %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred_new <- sdt_pred_new %>%
    group_by(subject, measurement, .draw, type) %>%  # keep .draw: one data set per draw
    summarise(count = n()) %>%
    spread(type, count)

sdt_pred_new <- sdt_pred_new %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           hit_rate = hit / (hit+miss),
           fa_rate = fa / (fa+cr),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred_new %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 1,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)
