Beyond Significant

Andrew Ellis

2019-02-20

Load R packages

library(tidyr)
library(stringr)
library(dplyr)
library(forcats)
library(modelr)
library(purrr)
library(ggplot2)
library(ggstance)
library(ggridges)
library(patchwork)
library(rstan)
library(brms)
library(tidybayes)

theme_set(theme_tidybayes())

Overview

Problem: non-significant result

IQ example


The group means are:


Group mean
Placebo 100.36
SmartDrug 101.91
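
These means can be reproduced directly from the data (a quick check using the TwoGroupIQ data frame used throughout):

TwoGroupIQ %>%
    group_by(Group) %>%
    summarise(mean = mean(IQ))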


It is obvious that the data contain several ‘outliers’.


Two sample t-test / Welch test

We can perform a two-sample t-test, or a Welch test:

t.test(IQ ~ Group,
       data = TwoGroupIQ,
       var.equal = TRUE,
       alternative = "less")
## 
##  Two Sample t-test
## 
## data:  IQ by Group
## t = -1.5587, df = 87, p-value = 0.06135
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf 0.1037991
## sample estimates:
##   mean in group Placebo mean in group SmartDrug 
##                100.3571                101.9149
t.test(IQ ~ Group,
       data = TwoGroupIQ,
       var.equal = FALSE,
       alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  IQ by Group
## t = -1.6222, df = 63.039, p-value = 0.05488
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf 0.04532157
## sample estimates:
##   mean in group Placebo mean in group SmartDrug 
##                100.3571                101.9149


Problem: Neither yields a significant result. What do we do?

Problems with NHST

The (Bayesian) New Statistics

Bayesian methods

🤗


😧

Quantifying uncertainty


The figure shows a frequentist confidence interval (OLS model) and a Bayesian credible interval. Which one is easier to interpret?

Parameters are random variables


Three steps of Bayesian data analysis

According to Gelman (2014):

  1. Set up probability model (joint probability distribution for observed (\(y\), \(x\)) and latent quantities \(\theta\)).

  2. Condition on observed data: calculate the posterior distribution \(p(\theta | y) \propto p(y | \theta) \, p(\theta)\).

  3. Evaluate model and implications of the posterior distribution.
    • How well does the model fit the data?
    • Are the substantive conclusions reasonable?
    • How sensitive are the results to the modelling assumptions?
    • Does the model need to be revised?

Box’s loop

Blei (2014) describes data analysis as an iterative process.



He calls this Box’s loop, a modern interpretation of the perspective of the statistician George Box (Box 1976).

Bayesian workflow

Posterior evaluation

Model comparison

Case study: Robust t-test

Let’s return to the IQ example.

Bayes factor to the rescue?

We might want to compute a Bayes factor. Can we at least quantify evidence for the null hypothesis?

Not in this case. The data provide little evidence either for or against our hypothesis.

Bayes factor

Let’s have another look at Bayes rule, making the dependency of the parameters \(\mathbf{\theta}\) on the model \(\mathcal{M}\) explicit:

\[ p(\theta | y, \mathcal{M}) = \frac{p(y|\theta, \mathcal{M}) p(\theta | \mathcal{M})}{p(y | \mathcal{M})}\]

where \(\mathcal{M}\) refers to a specific model. The marginal likelihood \(p(y | \mathcal{M})\) now gives the probability of the data, averaged over all possible parameter values under model \(\mathcal{M}\).

Writing out the marginal likelihood \(p(y | \mathcal{M})\): \[ p(y | \mathcal{M}) = \int{p(y | \theta, \mathcal{M}) p(\theta|\mathcal{M})d\theta}\]

we see that this is averaged over all possible values of \(\theta\) that the model will allow.

The important thing to consider here is that the model evidence will depend on what kind of predictions a model can make. This gives us a measure of complexity – a complex model is a model that can make many predictions.

The problem with making many predictions is that most of these predictions will turn out to be false.

The complexity of a model will depend on (among other things):

When a parameter’s priors are broad (uninformative), those parts of the parameter space where the likelihood is high are assigned low prior probability. Intuitively, by hedging one’s bets, one assigns low probability to the parameter values that make good predictions.

All this leads to the fact that more complex models have comparatively lower marginal likelihoods.
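
A toy illustration of this point (not part of the original analysis; the data and priors here are made up): we can estimate the marginal likelihood of a simple normal-mean model by Monte Carlo, averaging the likelihood over draws from the prior, once with a moderately informative prior and once with a very broad one.

set.seed(1)
y <- rnorm(20, mean = 1, sd = 1)  # made-up data

# Monte Carlo estimate of p(y | M): average the likelihood over prior draws
marginal_lik <- function(prior_sd, n_draws = 1e5) {
    theta <- rnorm(n_draws, mean = 0, sd = prior_sd)
    lik <- vapply(theta, function(m) prod(dnorm(y, mean = m, sd = 1)), numeric(1))
    mean(lik)
}

marginal_lik(prior_sd = 2)   # moderately informative prior
marginal_lik(prior_sd = 50)  # very broad prior: lower marginal likelihood

The broad prior spreads its mass over many values of \(\theta\) that predict the data poorly, so its average likelihood is lower.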

Therefore, when we compare models and prefer those with higher marginal likelihood, we are applying Ockham’s razor in a principled manner.

We can also write Bayes rule applied to a comparison between models (marginalized over all parameters within the model):

\[ p(\mathcal{M}_1 | y) = \frac{P(y | \mathcal{M}_1) p(\mathcal{M}_1)}{p(y)}\]

and

\[ p(\mathcal{M}_2 | y) = \frac{P(y | \mathcal{M}_2) p(\mathcal{M}_2)}{p(y)}\]

This tells us that for model \(\mathcal{M}_m\), the posterior probability of the model is proportional to the marginal likelihood times the prior probability of the model.

Now, one is usually less interested in absolute evidence than in relative evidence; we want to compare the predictive performance of one model over another.

To do this, we simply form the ratio of the model probabilities:

\[ \frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)} = \frac{P(y | \mathcal{M}_1) \, p(\mathcal{M}_1) / p(y)}{P(y | \mathcal{M}_2) \, p(\mathcal{M}_2) / p(y)}\]

The term \(p(y)\) cancels out, giving us: \[ \frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)} = \frac{P(y | \mathcal{M}_1) \, p(\mathcal{M}_1)}{P(y | \mathcal{M}_2) \, p(\mathcal{M}_2)}\]

The ratio \(\frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}\) is called the prior odds, and the ratio \(\frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)}\) is therefore the posterior odds.

We are particularly interested in the ratio of the marginal likelihoods:

\[\frac{P(y | \mathcal{M}_1)}{P(y | \mathcal{M}_2)}\]

This is the Bayes factor, and it can be interpreted as the change from prior odds to posterior odds that is indicated by the data.

If we consider the prior odds to be \(1\), i.e. we do not favour one model over another a priori, then we are only interested in the Bayes factor. We write this as:

\[ BF_{12} = \frac{P(y | \mathcal{M}_1)}{P(y | \mathcal{M}_2)}\]

Here, \(BF_{12}\) indicates the extent to which the data support model \(\mathcal{M}_1\) over model \(\mathcal{M}_2\).

As an example, if we obtain a \(BF_{12} = 5\), this means that the data are 5 times more likely to have occurred under model 1 than under model 2. Conversely, if \(BF_{12} = 0.2\), then the data are 5 times more likely to have occurred under model 2.
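
As a quick worked example: with prior odds of 1 and \(BF_{12} = 5\), the posterior odds are also 5, which corresponds to a posterior model probability of \(p(\mathcal{M}_1 | y) = 5 / (5 + 1) \approx 0.83\).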

The following classification is sometimes used, although it is unnecessary.

T-Test as general linear model

In order to go Bayesian, we first formulate our t-test as a GLM.

Assuming equal variances, we can write

\[ Y = \alpha + \beta X + \epsilon\] \[ \epsilon \sim N(0, \sigma^2) \]

where \(X\) indicates group membership.

levels(TwoGroupIQ$Group)
## [1] "Placebo"   "SmartDrug"


Using R’s formula notation, we can write this as

fit_ols <- lm(IQ ~ Group,
              data = TwoGroupIQ)

(The outcome IQ is modeled as a function of Group)
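
We can make the dummy coding explicit by looking at the design matrix (a quick check; model.matrix() is base R):

# Intercept column plus a 0/1 indicator for the SmartDrug group
head(model.matrix(~ Group, data = TwoGroupIQ))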


More formally:

\[ IQ = Placebo + \beta \cdot SmartDrug + \epsilon\]

\[\epsilon \sim N(0, \sigma^2) \]

The \(\beta\) parameter therefore represents the difference between groups.

 summary(fit_ols)
## 
## Call:
## lm(formula = IQ ~ Group, data = TwoGroupIQ)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.9149  -0.9149   0.0851   1.0851  22.0851 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    100.3571     0.7263 138.184   <2e-16 ***
## GroupSmartDrug   1.5578     0.9994   1.559    0.123    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.707 on 87 degrees of freedom
## Multiple R-squared:  0.02717,    Adjusted R-squared:  0.01599 
## F-statistic:  2.43 on 1 and 87 DF,  p-value: 0.1227

Generative model

For a Bayesian model, we need prior distributions on all parameters.

The outcome \(y\) can be modelled as a normally distributed random variable. The normal distribution has two parameters: the mean \(\mu\) and the standard deviation \(\sigma\). Both need priors (see the sketch below).
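
A minimal generative sketch (with made-up priors, for illustration only): draw the parameters from their priors, then simulate data.

mu    <- rnorm(1, mean = 100, sd = 10)      # assumed prior on the mean
sigma <- abs(rcauchy(1, 0, 5))              # assumed half-Cauchy prior on the SD
y_sim <- rnorm(100, mean = mu, sd = sigma)  # simulated IQ scores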

Linear regression model as a probabilistic model

Software

Probabilistic programming languages:


R interface / GUI:

Inference using Stan

We can translate the graphical model into a Stan program (this is what brms does).
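
We do not have to write the Stan code by hand; brms can show us the generated program (output omitted here):

make_stancode(IQ ~ Group, data = TwoGroupIQ, family = gaussian())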

brms: equal variance model

fit_eqvar <- brm(IQ ~ Group,
                 data = TwoGroupIQ,
                 file = here::here("models/fit_iq-eqvar"))
summary(fit_eqvar)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: IQ ~ Group 
##    Data: TwoGroupIQ (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept        100.35      0.72    98.92   101.74       3865 1.00
## GroupSmartDrug     1.57      0.99    -0.37     3.51       3793 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     4.76      0.36     4.13     5.53       3109 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Estimated parameter

Marginal posterior distribution of the regression parameter \(\beta\)

fit_eqvar %>%
    gather_draws(b_GroupSmartDrug) %>%
    ggplot(aes(y = .variable, x = .value)) +
    geom_halfeyeh(fill = "Steelblue4") +
    geom_vline(xintercept = 0, color = "white", linetype = 1, size = 1) +
    ylab("") +
    xlab("Estimated difference") +
    theme_classic()

Posterior predictions: normal model

We can plot the predicted marginal means and the predictive bands alongside the data

grid <- TwoGroupIQ %>%
    data_grid(Group)

fits_IQ <- grid %>%
    add_fitted_draws(fit_eqvar)

preds_IQ <- grid %>%
    add_predicted_draws(fit_eqvar)

pp_eqvar <- TwoGroupIQ %>%
    ggplot(aes(x = IQ, y = Group)) +
    geom_halfeyeh(aes(x = .value),
                  relative_scale = 0.7,
                  position = position_nudge(y = 0.1),
                  data = fits_IQ,
                  .width = c(.66, .95, 0.99)) +
    stat_intervalh(aes(x = .prediction),
                   data = preds_IQ,
                   .width = c(.66, .95, 0.99)) +
    scale_x_continuous(limits = c(75, 125)) +
    geom_point(data = TwoGroupIQ) +
    scale_color_brewer() +
    labs(title = "Equal variance model predictions")

pp_eqvar

Robust model


We now have an extra parameter, the degrees of freedom \(\nu\) of the Student-t distribution, for which we need a prior. A recommended prior is the gamma distribution:
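
For reference, this is the Gamma(2, 0.1) prior used for \(\nu\) in the model below:

curve(dgamma(x, shape = 2, rate = 0.1), from = 1, to = 100,
      xlab = expression(nu), ylab = "Prior density")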

brms: robust model

fit_robust_1 <- brm(bf(IQ ~ 0 + Group, sigma ~ Group),
                  family = student,
                  data = TwoGroupIQ,
                  prior = c(set_prior("normal(100, 10)", class = "b"),
                            set_prior("cauchy(0, 1)", class = "b", dpar = "sigma"),
                            set_prior("gamma(2, 0.1)", class = "nu")),
                  cores = parallel::detectCores(),
                  file = here::here("models/fit_iq-robust_1"))
summary(fit_robust_1)
##  Family: student 
##   Links: mu = identity; sigma = log; nu = identity 
## Formula: IQ ~ 0 + Group 
##          sigma ~ Group
##    Data: TwoGroupIQ (Number of observations: 89) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_Intercept          0.02      0.19    -0.36     0.40       4165 1.00
## GroupPlacebo           100.53      0.21   100.12   100.94       4451 1.00
## GroupSmartDrug         101.55      0.36   100.83   102.29       4053 1.00
## sigma_GroupSmartDrug     0.61      0.25     0.12     1.10       3973 1.00
## 
## Family Specific Parameters: 
##    Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## nu     1.84      0.49     1.14     2.94       3310 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior estimates: robust model

fit_robust_1 %>%
    spread_draws(b_GroupSmartDrug, b_GroupPlacebo) %>%
    mutate(`SmartDrug - Placebo` = b_GroupSmartDrug - b_GroupPlacebo) %>%
    gather(variable, difference, `SmartDrug - Placebo`) %>%
    ggplot(aes(y = variable, x = difference)) +
    geom_halfeyeh(fill = "Steelblue4") +
    geom_vline(xintercept = 0, color = "white", linetype = 1, size = 1) +
    ylab("") +
    xlab("Estimated difference") +
    theme_tidybayes()

Posterior predictions: robust model

Model comparison

We can use the loo package to carry out Pareto smoothed importance-sampling leave-one-out cross-validation (PSIS-LOO)

loo(fit_eqvar, fit_robust_1)
##                           LOOIC    SE
## fit_eqvar                539.50 39.67
## fit_robust_1             437.11 27.72
## fit_eqvar - fit_robust_1 102.39 24.71

Bayes factors: Savage Dickey

For the robust model, we can obtain a Bayes factor using either the Savage-Dickey density ratio or a recently implemented approach called bridge sampling. The Savage-Dickey method compares the posterior density of the effect at zero with its prior density at zero; the hypothesis() function returns this evidence ratio in favour of the point null, so we take its reciprocal to obtain the Bayes factor in favour of an effect.

fit_robust_bf <- brm(bf(IQ ~ 0 + intercept + Group,
                        sigma ~ Group),
                     family = student,
                     data = TwoGroupIQ,
                     prior = c(set_prior("normal(100, 10)",
                                         class = "b", coef = "intercept"),
                               set_prior("cauchy(0, 0.707)",
                                         class = "b", coef = "GroupSmartDrug"),
                               set_prior("cauchy(0, 1)",
                                         class = "b", dpar = "sigma"),
                               set_prior("gamma(2, 0.1)",
                                         class = "nu")),
                     cores = parallel::detectCores(),
                     sample_prior = TRUE,
                     file = here::here("models/fit_iq-robust_bf"))
BF <- hypothesis(fit_robust_bf,
                hypothesis = 'GroupSmartDrug = 0')
1/BF$hypothesis$Evid.Ratio
## [1] 2.669161

Bayes factors: Bridge sampling

fit_robust_bridge <- brm(bf(IQ ~ 0 + intercept + Group,
                            sigma ~ Group),
                         family = student,
                         data = TwoGroupIQ,
                         prior = c(set_prior("normal(100, 10)",
                                             class = "b", coef = "intercept"),
                                   set_prior("cauchy(0, 0.707)",
                                             class = "b",
                                             coef = "GroupSmartDrug"),
                                   set_prior("cauchy(0, 1)",
                                             class = "b", dpar = "sigma"),
                                   set_prior("gamma(2, 0.1)",
                                             class = "nu")),
                         cores = parallel::detectCores(),
                         save_all_pars = TRUE,
                         file = here::here("models/fit_iq-robust_bridge"))

We have to specify a null model (leaving out the Group variable):

fit_robust_bridge_null <- brm(bf(IQ ~ 0 + intercept,
                                 sigma ~ Group),
                              family = student,
                              data = TwoGroupIQ,
                              prior = c(set_prior("normal(100, 10)",
                                                  class = "b",
                                                  coef = "intercept"),
                                        set_prior("cauchy(0, 1)",
                                                  class = "b",
                                                  dpar = "sigma"),
                                        set_prior("gamma(2, 0.1)",
                                                  class = "nu")),
                              cores = parallel::detectCores(),
                              save_all_pars = TRUE,
                              file = here::here("models/fit_iq-robust_bridge-null"))
BF_bridge <- bayes_factor(fit_robust_bridge, fit_robust_bridge_null)
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
## Iteration: 1
## Iteration: 2
## Iteration: 3
## Iteration: 4
## Iteration: 5
BF_bridge$bf
## [1] 3.22826

Compare OLS and Bayesian estimates

fit_ols <- lm(IQ ~ Group,
              data = TwoGroupIQ)


d <- broom::tidy(fit_ols, conf.int = TRUE) %>%
    mutate(
        low = estimate - std.error,
        high = estimate + std.error
    )

linear_results <- fit_ols %>%
    broom::tidy(conf.int = TRUE) %>%
    filter(term == "GroupSmartDrug") %>%
    mutate(model = "OLS")


bayes_results <- fit_robust_1 %>%
    spread_draws(b_GroupSmartDrug, b_GroupPlacebo) %>%
    mutate(GroupSmartDrug = b_GroupSmartDrug - b_GroupPlacebo) %>%
    median_qi(GroupSmartDrug) %>%
    gather(term, estimate, GroupSmartDrug) %>%
    to_broom_names() %>%
    mutate(model = "Robust Bayesian model")


bind_rows(linear_results, bayes_results) %>%
    mutate(term = term) %>%
    ggplot(aes(y = term,
               x = estimate,
               xmin = conf.low,
               xmax = conf.high,
               color = model)) +
    geom_pointrangeh(position = position_dodgev(height = .3)) +
    geom_vline(xintercept = 0, linetype = 3,
               color = "grey", alpha = 0.8) +
    scale_color_viridis_d(option = "B", direction = -1,
                          begin = 1/3, end = 2/3) +
    scale_y_discrete(breaks = NULL) +
    ylab("Comparison: SmartDrug - Placebo")

Is this b-hacking?

Multilevel model

set.seed(5)

n_obs <- 30
n_condition <- 5
ABC <- tibble(condition = rep(LETTERS[1:n_condition], n_obs),
                  response = rnorm(n_obs * n_condition,
                                   mean = c(-2, -1, 0, 1, 2),
                                   sd = 0.5))
knitr::kable(head(ABC))
condition response
A -2.4204277
B -0.3078203
C -0.6277459
D 1.0350714
E 2.8557204
A -2.3014540
ABC %>%
    ggplot(aes(y = condition, x = response, color = condition)) +
    geom_point() +
    scale_color_viridis_d(option = "B", begin = 1/3, end = 2/3)

No pooling

fit_abc_np <- brm(response ~ 0 + condition,
              data = ABC,
              # control = list(adapt_delta = .99),
              prior = c(prior(normal(0, 10), class = b),
                        prior(student_t(3, 0, 1), class = sigma)),
              file = here::here("models/fit_abc_no-pooling"))
summary(fit_abc_np)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: response ~ 0 + condition 
##    Data: ABC (Number of observations: 150) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##            Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## conditionA    -1.95      0.10    -2.15    -1.76       5375 1.00
## conditionB    -1.11      0.09    -1.29    -0.93       5449 1.00
## conditionC     0.07      0.09    -0.11     0.26       5613 1.00
## conditionD     1.00      0.10     0.80     1.20       4698 1.00
## conditionE     1.97      0.10     1.78     2.15       4198 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.53      0.03     0.47     0.59       4678 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Partial pooling

fit_abc_pp <- brm(response ~ 1 + (1|condition),
              data = ABC,
              # control = list(adapt_delta = .99),
              prior = c(prior(normal(0, 1), class = Intercept),
                        prior(student_t(3, 0, 1), class = sd),
                        prior(student_t(3, 0, 1), class = sigma)),
              file = here::here("models/fit_abc_partial-pooling"))
summary(fit_abc_pp)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: response ~ 1 + (1 | condition) 
##    Data: ABC (Number of observations: 150) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Group-Level Effects: 
## ~condition (Number of levels: 5) 
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(Intercept)     1.58      0.51     0.88     2.83        872 1.00
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept    -0.03      0.58    -1.22     1.13        984 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.53      0.03     0.47     0.59       2596 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
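
To see the pooling at work, we can inspect the condition-level estimates implied by the multilevel model (coef() combines the population-level intercept with the group-level deviations); these are typically shrunk slightly towards the grand mean relative to the no-pooling estimates above.

coef(fit_abc_pp)$condition[, , "Intercept"]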

Fit is the same

loo(fit_abc_pp, fit_abc_np)
##                          LOOIC    SE
## fit_abc_pp              237.36 14.76
## fit_abc_np              237.45 14.90
## fit_abc_pp - fit_abc_np  -0.09  0.34
pp <- fit_abc_pp$data %>%
    data_grid(condition) %>%
    add_fitted_draws(fit_abc_pp) %>%
    do(data_frame(.value = quantile(.$.value, ppoints(100)))) %>%
    mutate(model = "partial pooling") %>%
    bind_rows()

np <- fit_abc_np$data %>%
    data_grid(condition) %>%
    add_fitted_draws(fit_abc_np) %>%
    do(data_frame(.value = quantile(.$.value, ppoints(100)))) %>%
    mutate(model = "no pooling")

pp_np <- bind_rows(pp, np)

pp_np_summary <- pp_np %>%
    group_by(condition) %>%
    summarise(mean = mean(.value))


pp_np %>%
    ggplot(aes(x = .value, fill = condition)) +
    # geom_dotplot(binwidth = .04) +
    geom_density() +
    # scale_color_viridis_d(option = "A", end = 4/7) +
    geom_vline(aes(xintercept = mean), data = pp_np_summary,
               color = "white", linetype = 1, size = 1,
               alpha = 0.8) +
    scale_fill_viridis_d(option = "A", end = 6/7) +
    facet_wrap(~model, nrow = 2) +
    scale_y_continuous(breaks = NULL)

Signal detection model

We have the following binary data from a line bisection task. Each of 6 subjects performed the task at two time points. We expect that performance decreases from t1 to t2.

A common approach is to compute d’ for each subject in each session, as a measure of performance.
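
For reference, the d’ and c columns in the table below are computed from the z-transformed hit and false-alarm rates (a hypothetical helper, checked against the first row of the table):

compute_sdt <- function(hit, miss, fa, cr) {
    zhr <- qnorm(hit / (hit + miss))  # z-transformed hit rate
    zfa <- qnorm(fa / (fa + cr))      # z-transformed false alarm rate
    tibble(dprime = zhr - zfa, c = -0.5 * (zhr + zfa))
}

compute_sdt(hit = 27, miss = 13, fa = 19, cr = 21)  # first row: ALPE94, measurement 1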

subject measurement cr fa hit miss dprime c
ALPE94 1 21 19 27 13 0.516 -0.196
ALPE94 2 25 15 33 7 1.253 -0.308
ANCH96 1 26 14 26 14 0.771 0.000
ANCH96 2 29 11 32 8 1.439 -0.122
BRMA88 1 33 7 22 18 1.060 0.404
BRMA88 2 31 9 29 11 1.353 0.079
BRMA96 1 31 9 26 14 1.141 0.185
BRMA96 2 30 10 22 18 0.800 0.274
REER95 1 25 15 31 9 1.074 -0.218
REER95 2 25 15 31 9 1.074 -0.218
URST96 1 30 10 27 13 1.128 0.110
URST96 2 28 12 30 10 1.199 -0.075

Bayesian multilevel probit regression model

In order to obtain d’ as a parameter, we use an effect coding for the stimulus, with the offset coded as \(-0.5\) (left) and \(+0.5\) (right). The probit regression slope then corresponds to d’ and the negative intercept to the criterion c (DeCarlo 1998):

PilotData %>%
  select(subject, measurement, offset, offsetnum, sayright) %>%
  head() %>%
  knitr::kable()
subject measurement offset offsetnum sayright
ALPE94 1 left -0.5 0
ALPE94 1 right 0.5 1
ALPE94 1 left -0.5 0
ALPE94 1 right 0.5 1
ALPE94 1 left -0.5 1
ALPE94 1 left -0.5 0
fit_dprime_c <- brm(sayright ~ 0 + measurement + measurement:offsetnum +
                      (0 + measurement + measurement:offsetnum | subject),
                  family = bernoulli("probit"),
                  data = PilotData,
                  control = list(adapt_delta = .99),
                  prior = c(prior(normal(1, 1), class = b),
                            prior(student_t(3, 0, 1), class = sd)),
                  inits = 0,
                  file = here::here("models/fit_dprime-c"))

Posterior predictive check

preds_dprime_c <- fit_dprime_c$data %>%
    select(subject, measurement, offsetnum) %>%
    add_predicted_draws(model = fit_dprime_c, n = 500)

sdt_pred <- preds_dprime_c %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred <- sdt_pred %>%
    group_by(subject, measurement, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_pred <- sdt_pred %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 2,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_point(aes(y = dprime, x = measurement),
               data = sdt,
               shape = 21, color = "black",
               fill = "pink", stroke = 1, size = 3) +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)

d’ estimates: population level

plot(fit_dprime_c, "b_measurement")

plot(hypothesis(fit_dprime_c,
                "measurement2:offsetnum > measurement1:offsetnum"))

d’ estimates: subject level

coefs <- purrr::array_tree(coef(object = fit_dprime_c,
                      summary = FALSE)$subject[,,],
                  margin = 3)

f <- function(x) {
    x <- as_tibble(x)
    x <- gather(x, key = subject, value = dprime)
    x
}

coefs <- coefs %>% map_df(f, .id = "measurement") %>%
    filter(str_detect(measurement, 'offsetnum')) %>%
    mutate(subject = as_factor(subject),
           measurement = str_extract(measurement, "(\\d)+"),
           measurement = as_factor(measurement))

dprimes <- coefs %>%
    group_by(subject, measurement) %>%
    median_qi(.width = c(.95, .66))

compare_d_primes <- sdt %>%
    mutate(point_estimate = dprime) %>%
    select(subject, measurement, point_estimate) %>%
    left_join(dprimes)

p_dprime <- coefs %>%
    ggplot(aes(y = measurement, x = dprime)) +
    geom_vline(xintercept = 0, linetype = 2, color = "grey",
               size = 1, alpha = 0.9) +
    geom_halfeyeh(fill = "steelblue4", alpha = 0.4) +
    geom_point(aes(x = point_estimate),
               data = compare_d_primes,
               color = 'red') +
    facet_wrap(~subject) +
    theme_tidybayes()

For comparison, the red data points represent the d’ values computed from the data.

Attentional lapses and guessing

We could extend this by implementing a non-linear model that can account for attentional lapses and guessing (Kuss, Jäkel, and Wichmann 2005). This is equivalent to a binomial mixture model, in which subjects lapse with probability \(\lambda\) and pay attention with probability \(1-\lambda\). If they lapse, they have to guess, otherwise they respond as in the previous model.
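
Written out, the nonlinear formula used below corresponds to

\[ P(\text{sayright} = 1) = \gamma + (1 - \gamma - \lambda)\,\Phi(\eta) \]

where \(\gamma\) (guess) and \(\lambda\) (lapse) bound the response probability away from 0 and 1, and \(\eta\) is the linear predictor of the probit model.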

BF <- bf(sayright ~ guess + (1-guess-lapse) * Phi(eta),
         eta ~ 0 + measurement:offsetnum +
             (0 + measurement:offsetnum |s| subject),
         guess ~ 1,
         lapse ~  (1 |s| subject),
         family = bernoulli(link="identity"),
         nl = TRUE)

prior_guessing <- c(
    prior(normal(1, 1), class = "b", nlpar = "eta"),
    prior(beta(1, 1), nlpar = "lapse", lb = 0, ub = 1),
    prior(beta(1, 1), nlpar = "guess", lb = 0, ub = 1))
fit_dprime_guessing <- brm(BF,
                           data = PilotData,
                           control = list(adapt_delta = .99),
                           inits = 0,
                           prior = prior_guessing,
                           file = here::here("models/fit_dprime-guessing"))

The model can also be implemented as a mixture:

mix <- mixture(bernoulli("probit"), bernoulli("probit"))
p_mix <- c(
    # first component: guessing, with the response probability kept close to 0.5
    prior(normal(0, 0.1), class = Intercept, dpar = mu1),
    # second component: attentive responding, driven by the stimulus
    prior(normal(0, 1), class = "b", dpar = mu2))

fit_mix <- brm(bf(sayright ~ 1, mu1 ~ 1, mu2 ~ offsetnum),
               data = PilotData,
               family = mix,
               prior = p_mix)
loo(fit_dprime_guessing, fit_dprime_c)
##                                      LOOIC    SE
## fit_dprime_guessing                1178.93 23.91
## fit_dprime_c                       1179.04 24.90
## fit_dprime_guessing - fit_dprime_c   -0.11  4.15

A model comparison shows that the lapse model’s predictive accuracy is no better than that of the model without lapses and guessing.

Predict new subjects

Finally, we can predict data for new subjects: because this is a multilevel model, each subject’s parameters are drawn from population-level distributions, so the estimated between-subject variation lets us simulate responses for subjects we have not observed.

preds_new_subjects <- fit_dprime_c$data %>%
    data_grid(subject = str_c("subject_", 1:6),
              measurement,
              offsetnum) %>%
    add_predicted_draws(model = fit_dprime_c,
                        allow_new_levels = TRUE,
                        n = 100)

sdt_pred_new <- preds_new_subjects %>%
    group_by(subject, measurement, .draw) %>%
    mutate(type = "hit",
           type = ifelse(offsetnum > 0 & .prediction == 0, "miss", type),
           type = ifelse(offsetnum < 0 & .prediction == 0, "cr", type),
           type = ifelse(offsetnum < 0 & .prediction == 1, "fa", type)) %>%
    ungroup()

sdt_pred_new <- sdt_pred_new %>%
    group_by(subject, measurement, .draw, type) %>%
    summarise(count = n()) %>%
    spread(type, count)

sdt_pred_new <- sdt_pred_new %>%
    mutate(zhr = qnorm(hit / (hit+miss)),
           zfa = qnorm(fa / (fa+cr)),
           hit_rate = hit / (hit+miss),
           fa_rate = fa / (fa+cr),
           dprime = zhr-zfa,
           crit = -zfa,
           c = -0.5*(zhr+zfa),
           beta = exp(0.5 * (zfa^2 - zhr^2))) %>%
    mutate_if(is.numeric, round, 3)


sdt_pred_new %>%
    group_by(subject, measurement) %>%
    median_qi(dprime, .width = c(.80, .95), na.rm = TRUE) %>%
    ggplot(aes(y = dprime, x = measurement)) +
    geom_pointinterval(fatten_point = 1,
                       size_range = c(0.6, 2),
                       color = "black") +
    geom_hline(yintercept = 0, linetype = 3, color = "grey", alpha = 0.9) +
    facet_wrap(~subject)

Further reading

References

Blei, David M. 2014. “Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models.” Annual Review of Statistics and Its Application 1 (1): 203–32. https://doi.org/10.1146/annurev-statistics-022513-115657.

Box, George E. P. 1976. “Science and Statistics.” Journal of the American Statistical Association 71 (356): 791–99. https://doi.org/10.1080/01621459.1976.10480949.

Cumming, Geoff. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29. https://doi.org/10.1177/0956797613504966.

DeCarlo, Lawrence T. 1998. “Signal Detection Theory and Generalized Linear Models.” Psychological Methods 3 (2): 186–205.

Gelman, Andrew. 2014. Bayesian Data Analysis. Third edition. Chapman & Hall/CRC Texts in Statistical Science. Boca Raton: CRC Press.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. https://doi.org/10.1017/CBO9780511790942.

Gigerenzer, Gerd. 2004. “Mindless Statistics.” The Journal of Socio-Economics 33 (5): 587–606. https://doi.org/10.1016/j.socec.2004.09.033.

———. 2018. “Statistical Rituals: The Replication Delusion and How We Got There.” Advances in Methods and Practices in Psychological Science 1 (2): 198–218. https://doi.org/10.1177/2515245918771329.

Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.

Knoblauch, Kenneth, and Laurence T. Maloney. 2012. Modeling Psychophysical Data in R. Use R! New York: Springer-Verlag.

Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.

Kruschke, John K., and Torrin M. Liddell. 2018. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. https://doi.org/10.3758/s13423-016-1221-4.

Kuss, Malte, Frank Jäkel, and Felix A. Wichmann. 2005. “Bayesian Inference for Psychometric Functions.” Journal of Vision 5 (5): 8–8. https://doi.org/10.1167/5.5.8.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. 1st ed. Cambridge ; New York: Cambridge University Press. https://doi.org/10.1017/CBO9781139087759.

Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA’s Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.