General overview

Binomial GLMM on participants’ image-classification responses (N = 10,368 trials from 54 participants). Model: response_binary ~ image_type * sound_condition * PAS + display_time + (1 | id) with logit link.

The three-way interaction was not significant (Table 1). Because null sound modulation was expected a priori at the PAS extremes, the sections below nonetheless examine the two-way image_type × sound_condition interaction at each level of subjective awareness (PAS).


Table 1: Omnibus (Type III Wald) tests

Table 1. Type III Wald χ² tests for the full model
Binomial GLMM, N = 10,368 trials, 54 participants
Effect χ² df p
image_type 620.21 1 < .001 ***
sound_condition 9.36 2 0.009 **
PAS 185.75 3 < .001 ***
display_time 13.92 6 0.031 *
image_type:sound_condition 9.56 2 0.008 **
image_type:PAS 501.43 3 < .001 ***
sound_condition:PAS 7.16 6 0.307
image_type:sound_condition:PAS 9.75 6 0.136
Model: response_binary ~ image_type × sound_condition × PAS + display_time + (1 | id).
* p < .05, ** p < .01, *** p < .001.

Conclusions. Strong main effects of image type and PAS and a smaller main effect of sound condition, plus significant two-way interactions image_type:sound_condition (p = .008) and image_type:PAS (p < .001). The three-way interaction is not significant (p = .136), so the omnibus test gives no evidence that PAS moderates the sound × image effect; the per-PAS breakdown that follows should be read with that in mind.
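The p-values in Table 1 can be cross-checked from the reported χ² and df alone. The sketch below is only an arithmetic check in Python, not the R code used for the analysis; the function name `chi2_sf_even_df` is hypothetical, and the closed form it uses covers even df only, which happens to include every row of Table 1 that reports an exact p-value.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

# (chi-square, df, reported p) copied from Table 1
rows = {
    "sound_condition": (9.36, 2, 0.009),
    "display_time": (13.92, 6, 0.031),
    "image_type:sound_condition": (9.56, 2, 0.008),
    "sound_condition:PAS": (7.16, 6, 0.307),
    "image_type:sound_condition:PAS": (9.75, 6, 0.136),
}

recomputed = {name: chi2_sf_even_df(x, df) for name, (x, df, _) in rows.items()}

for name, (x, df, reported) in rows.items():
    print(f"{name:32s} chi2({df}) = {x:5.2f}  p = {recomputed[name]:.4f}  (reported {reported})")
```

Small discrepancies in the third decimal are expected because the χ² statistics in the table are themselves rounded to two decimals.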


Figure 1: Predicted probability of salty response by type, sound, PAS


Table 2: Sound modulation of image discrimination at each PAS level

Difference-of-differences on the probability scale. A positive estimate means that the first-named condition increased the gap between responses to salty vs. sweet images (i.e., improved discrimination) relative to the second-named (comparison) condition.

Table 2. Interaction contrasts: sound modulation of image discrimination
Difference-of-differences (probability scale), by PAS level
PAS Contrast Δ SE CI lower CI upper z p
1 sweetmusic - silent −0.019 0.048 −0.112 0.074 −0.400 0.689
saltymusic - silent −0.024 0.048 −0.119 0.070 −0.502 0.615
saltymusic - sweetmusic −0.005 0.048 −0.099 0.088 −0.108 0.914
2 sweetmusic - silent 0.120 0.060 0.002 0.238 1.987 0.047 *
saltymusic - silent 0.026 0.063 −0.097 0.149 0.413 0.680
saltymusic - sweetmusic −0.094 0.062 −0.216 0.029 −1.502 0.133
3 sweetmusic - silent 0.020 0.032 −0.043 0.083 0.629 0.529
saltymusic - silent −0.068 0.036 −0.139 0.003 −1.870 0.061
saltymusic - sweetmusic −0.088 0.035 −0.156 −0.020 −2.532 0.011 *
4 sweetmusic - silent 0.003 0.007 −0.011 0.017 0.454 0.650
saltymusic - silent −0.007 0.008 −0.022 0.009 −0.879 0.379
saltymusic - sweetmusic −0.010 0.008 −0.025 0.005 −1.333 0.183
Positive Δ = music enhanced image discrimination (salty–sweet gap) relative to comparison.
p-values uncorrected for multiple comparisons. With 12 contrasts, Bonferroni α ≈ .004; the PAS = 2 and PAS = 3 effects should be treated as suggestive.

Conclusions. Sound has no detectable effect when participants report no awareness (PAS 1) or complete awareness (PAS 4). The only nominally significant signals are in the partial-awareness range: at PAS 2 sweet music increased discrimination relative to silence (Δ = +0.12, p = .047), and at PAS 3 sweet music outperformed salty music (saltymusic − sweetmusic Δ = −0.09, p = .011).
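The z, p and CI columns of Table 2 are related by standard Wald arithmetic. The sketch below assumes normal-approximation (asymptotic) intervals, which the report does not state explicitly. The function name `wald_summary` is hypothetical, and the code is a plain-Python check of the two nominally significant rows rather than the original analysis code.

```python
import math

def wald_summary(delta, se, z=None, crit=1.959964):
    """Return (z, two-sided p, (ci_low, ci_high)) under a normal approximation."""
    # Use the reported z when given; otherwise fall back to delta / se,
    # which is coarser because delta and se are rounded in the table.
    z = delta / se if z is None else z
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area
    return z, p, (delta - crit * se, delta + crit * se)

# Values copied from Table 2
pas2 = wald_summary(delta=0.120, se=0.060, z=1.987)    # PAS 2, sweetmusic - silent
pas3 = wald_summary(delta=-0.088, se=0.035, z=-2.532)  # PAS 3, saltymusic - sweetmusic

for label, (z, p, (lo, hi)) in (("PAS 2", pas2), ("PAS 3", pas3)):
    print(f"{label}: z = {z:.3f}, p = {p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The recomputed intervals can differ from the table in the third decimal because Δ and SE are rounded before they are combined here.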

Figure 2: Sound modulation of image discrimination by PAS level


Table 3: Simple effects: image discrimination within each sound × PAS cell

Table 3. Image discrimination (salty vs. sweet) within each sound condition
Odds ratios and Δ-probability, by PAS level
PAS Sound OR CI lower CI upper z p ΔP
1 silent 1.06 0.81 1.40 0.456 0.648 0.016
sweetmusic 0.99 0.75 1.29 −0.106 0.915 −0.004
saltymusic 0.97 0.74 1.26 −0.255 0.799 −0.009
2 silent 3.00 2.08 4.31 5.913 < .001 0.268
sweetmusic 5.16 3.46 7.69 8.067 < .001 0.388
saltymusic 3.37 2.26 5.02 5.953 < .001 0.294
3 silent 154.79 77.56 308.90 14.302 < .001 0.850
sweetmusic 223.86 105.31 475.85 14.064 < .001 0.870
saltymusic 68.53 38.67 121.45 14.480 < .001 0.783
4 silent 5,991.67 2,638.36 13,606.99 20.785 < .001 0.973
sweetmusic 7,591.89 3,253.00 17,718.06 20.663 < .001 0.976
saltymusic 3,331.80 1,669.54 6,649.07 23.008 < .001 0.966
OR > 1 → higher probability of ‘salty’ response for salty images (correct discrimination).
ΔP = difference in predicted probability (salty − sweet image).
At PAS 1 discrimination is near chance (ΔP ≈ 0); at PAS 4 discrimination is at ceiling in every sound condition.

Conclusions. Discrimination climbs with PAS in every sound condition. The huge ORs at PAS 4 reflect near-complete separation (ceiling): ΔP is the more interpretable quantity at that level.
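To make the link between the odds ratios and ΔP concrete, the sketch below recomputes both from the predicted cell probabilities reported in Table 4. It assumes the reported ORs equal the ratio of odds of those back-transformed cell means, which holds if the averaging over display time was done on the logit scale. The helper name `odds` is hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# (P(salty) for sweet image, P(salty) for salty image), copied from Table 4
cells = {
    ("PAS 2", "silent"): (0.362, 0.630),
    ("PAS 2", "sweetmusic"): (0.280, 0.668),
    ("PAS 2", "saltymusic"): (0.330, 0.623),
    ("PAS 4", "silent"): (0.009, 0.982),
}

results = {}
for key, (p_sweet_img, p_salty_img) in cells.items():
    odds_ratio = odds(p_salty_img) / odds(p_sweet_img)
    delta_p = p_salty_img - p_sweet_img
    results[key] = (odds_ratio, delta_p)
    print(f"{key[0]} {key[1]:11s} OR = {odds_ratio:9.2f}  dP = {delta_p:.3f}")
```

At PAS 2 the recomputed ORs agree with Table 3 to within rounding. At PAS 4 both probabilities sit next to 0 or 1, so third-decimal rounding of the inputs moves the OR by a noticeable amount while ΔP barely changes, which illustrates why ΔP is the steadier summary at ceiling.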

Figure 3: Image discrimination by sound and PAS


Table 4: Model-predicted cell means

Predicted probability of a ‘salty’ response, back-transformed from the logit scale and averaged over display-time levels.

Table 4. Predicted probability of a ‘salty’ response
Averaged over display_time, by PAS × image × sound
PAS Image Sound P(salty) SE CI lower CI upper
1 sweet silent 0.424 0.040 0.348 0.505
salty silent 0.440 0.041 0.362 0.520
sweet sweetmusic 0.426 0.040 0.349 0.506
salty sweetmusic 0.422 0.040 0.347 0.501
sweet saltymusic 0.464 0.041 0.385 0.544
salty saltymusic 0.455 0.041 0.377 0.535
2 sweet silent 0.362 0.042 0.285 0.448
salty silent 0.630 0.039 0.550 0.703
sweet sweetmusic 0.280 0.040 0.210 0.364
salty sweetmusic 0.668 0.038 0.589 0.739
sweet saltymusic 0.330 0.043 0.251 0.418
salty saltymusic 0.623 0.041 0.540 0.699
3 sweet silent 0.066 0.018 0.038 0.111
salty silent 0.916 0.019 0.870 0.947
sweet sweetmusic 0.048 0.015 0.025 0.088
salty sweetmusic 0.918 0.018 0.875 0.947
sweet saltymusic 0.093 0.022 0.059 0.146
salty saltymusic 0.876 0.023 0.824 0.914
4 sweet silent 0.009 0.003 0.004 0.018
salty silent 0.982 0.005 0.970 0.989
sweet sweetmusic 0.008 0.003 0.004 0.017
salty sweetmusic 0.984 0.004 0.973 0.991
sweet saltymusic 0.017 0.004 0.010 0.029
salty saltymusic 0.983 0.005 0.971 0.990
Cell shading: pink → ‘sweet’ response bias, blue → ‘salty’ response bias.
At PAS 1 all cells sit near 0.45 regardless of image type (chance-level discrimination with a slight ‘sweet’ bias); at PAS 4 they saturate near 0 or 1, as expected.
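The asymmetric intervals in Table 4 are what one would expect if the confidence limits were formed on the logit scale and then back-transformed. The sketch below illustrates that step for one cell. It assumes a delta-method standard error and a normal critical value, neither of which is stated in the report, and the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def backtransformed_ci(p, se_p, crit=1.959964):
    """Approximate CI for a probability, built on the logit scale."""
    # Delta method: SE on the logit scale is roughly SE_p / (p * (1 - p)).
    se_logit = se_p / (p * (1.0 - p))
    centre = logit(p)
    return inv_logit(centre - crit * se_logit), inv_logit(centre + crit * se_logit)

# PAS 1, sweet image, silent: P(salty) = 0.424, SE = 0.040 (Table 4)
ci_low, ci_high = backtransformed_ci(0.424, 0.040)
print(f"95% CI [{ci_low:.3f}, {ci_high:.3f}]  (Table 4 reports [0.348, 0.505])")
```

Agreement is limited to about the third decimal because the SE in the table is rounded before it is converted back to the logit scale.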

Table 5: Response bias at 0 ms

Table 5. Response bias at 0 ms display time
Raw proportions vs. model-estimated P(salty)
Sound N P(salty) - observed SE P(salty) - model
silent 864 0.471 0.017 0.439
sweetmusic 864 0.459 0.017 0.424
saltymusic 864 0.494 0.017 0.469
At 0 ms no image is shown; image type was randomly assigned.
Observed proportions hover around chance (0.46–0.49); no significant sound-driven response bias.

Conclusions. The sound conditions do not produce a detectable response bias when there is nothing to see. The effects in Table 2 are therefore more consistent with perceptual enhancement than with response-level priming.
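The SE column for the observed proportions in Table 5 matches the simple binomial formula sqrt(p(1 − p)/N). The sketch below reproduces it as an arithmetic check. It is an assumption that this formula was used, and it ignores the clustering of trials within participants, so it should not be read as the model-based uncertainty. The function name `binomial_se` is hypothetical.

```python
import math

def binomial_se(p, n):
    """Naive standard error of a proportion (treats trials as independent)."""
    return math.sqrt(p * (1.0 - p) / n)

# Observed P(salty) at 0 ms and trial counts, copied from Table 5
observed = {"silent": (0.471, 864), "sweetmusic": (0.459, 864), "saltymusic": (0.494, 864)}

naive_se = {sound: binomial_se(p, n) for sound, (p, n) in observed.items()}
for sound, se in naive_se.items():
    print(f"{sound:11s} SE = {se:.3f}")
```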


Analysis approach. Classification responses were coded 1 = “salty”, 0 = “sweet” and analysed with a binomial generalized linear mixed-effects model (lme4::glmer, logit link). Fixed effects were image_type, sound_condition (silent / sweet music / salty music), PAS (ordered factor, levels 1–4) and their three-way interaction, plus an additive 7-level covariate for display_time (0–50 ms); participants contributed a random intercept. Type III Wald χ² tests (car::Anova) tested omnibus effects, and follow-up contrasts on the probability scale were computed with emmeans. All p-values are two-tailed and uncorrected unless noted.

Omnibus effects. Image type (χ²(1) = 620.21, p < .001) and PAS (χ²(3) = 185.75, p < .001) had the largest effects, with a large image_type × PAS interaction (χ²(3) = 501.43, p < .001) reflecting the steep rise in discrimination with subjective awareness (Figure 1, Table 3). Sound condition produced a small overall effect (χ²(2) = 9.36, p = .009) that interacted with image type (χ²(2) = 9.56, p = .008). The three-way interaction was not significant (χ²(6) = 9.75, p = .136); given a priori expectations of null modulation at the PAS extremes, we decomposed the image × sound interaction at each PAS level.

Sound modulation by awareness. Figure 2 and Table 2 present the interaction contrasts at each PAS level. Contrasts were negligible at PAS 1 (|Δ| ≤ 0.024, p ≥ .61) and PAS 4 (|Δ| ≤ 0.010, p ≥ .18), consistent with floor and ceiling discrimination (Figure 3, Table 3). All nominally significant modulation was confined to partial awareness. At PAS 2, sweet music widened the salty–sweet discrimination gap by 12.0 percentage points relative to silence (Δ = +0.120, 95% CI [+0.002, +0.238], z = 1.99, p = .047). At PAS 3, sweet music outperformed salty music (Δ = −0.088 for saltymusic − sweetmusic, 95% CI [−0.156, −0.020], z = −2.53, p = .011), a difference driven by reduced discrimination under salty music (ΔP = 0.783) relative to silent (0.850) and sweet music (0.870). Across twelve contrasts neither effect survives Holm or Benjamini–Hochberg correction (smallest adjusted value: Holm p = BH q = 0.132; Table 2b); we treat them as suggestive and in need of replication.

Response-bias control. A separate GLMM on 0-ms trials, where no image was shown and image type was randomly assigned, found no reliable effect of sound (observed P(salty) = 0.471, 0.459, 0.494 for silent, sweet and salty music; all pairwise p ≥ .15; Table 5). The PAS-2/PAS-3 effects are therefore unlikely to reflect simple response priming. The additive display-time specification also appears adequate on visual inspection: within-panel curves in Figure 4 run approximately parallel across 0–50 ms, which suggests that the findings do not depend on any particular display-time subset.


Supplement: Display Time (Figure 4)


Table 2b: Multiple-comparison correction for the interaction contrasts

Table 2b. Multiple-comparison correction across the 12 interaction contrasts
Holm, Benjamini–Hochberg (q) and Bonferroni adjustments applied to the family in Table 2
Contrast raw p Holm p BH q Bonferroni p
PAS 1 sweetmusic − silent 0.689 1.000 0.752 1.000
saltymusic − silent 0.615 1.000 0.752 1.000
saltymusic − sweetmusic 0.914 1.000 0.914 1.000
PAS 2 sweetmusic − silent 0.047 0.517 0.244 0.564
saltymusic − silent 0.680 1.000 0.752 1.000
saltymusic − sweetmusic 0.133 1.000 0.399 1.000
PAS 3 sweetmusic − silent 0.529 1.000 0.752 1.000
saltymusic − silent 0.061 0.610 0.244 0.732
saltymusic − sweetmusic 0.011 0.132 0.132 0.132
PAS 4 sweetmusic − silent 0.650 1.000 0.752 1.000
saltymusic − silent 0.379 1.000 0.752 1.000
saltymusic − sweetmusic 0.183 1.000 0.439 1.000
Rows highlighted in lavender had raw p < .05. Family of 12 contrasts
(4 PAS levels × 3 pairwise sound comparisons). Neither nominally-significant
effect survives any standard family-wise or false-discovery correction:
the best adjusted value is BH q = 0.132 (PAS 3, saltymusic − sweetmusic).

Conclusions. The two contrasts that were nominally significant in Table 2 (PAS 2 sweetmusic − silent; PAS 3 saltymusic − sweetmusic) do not survive correction under any of the three standard procedures. The strongest effect, PAS 3 saltymusic − sweetmusic, reaches Holm-adjusted p = .132 / BH q = .132, i.e. well above the conventional α = .05 threshold. A narrower, a priori confirmatory family (e.g. the four “music vs. silent” contrasts at PAS 2 and PAS 3 only, αBonferroni = .0125) would not help: none of those four contrasts reaches .0125 (smallest raw p = .047, PAS 2 sweetmusic − silent), and the strongest contrast overall, PAS 3 saltymusic − sweetmusic, falls outside that restricted family. The PAS-2/PAS-3 pattern is therefore suggestive and requires replication.
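The adjusted values in Table 2b follow from the raw p-values by the standard step-down (Holm) and step-up (Benjamini–Hochberg) rules. The sketch below re-derives them in plain Python as an arithmetic check. It is not the code used to produce the table, and the function names are hypothetical.

```python
def bonferroni(pvals):
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def holm(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), then enforce monotonicity.
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

def benjamini_hochberg(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted, running = [0.0] * m, 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted

# Raw p-values in Table 2b row order (PAS 1 to PAS 4, three contrasts each)
raw_p = [0.689, 0.615, 0.914, 0.047, 0.680, 0.133,
         0.529, 0.061, 0.011, 0.650, 0.379, 0.183]

holm_p, bh_q, bonf_p = holm(raw_p), benjamini_hochberg(raw_p), bonferroni(raw_p)
for p, h, q, b in zip(raw_p, holm_p, bh_q, bonf_p):
    print(f"raw {p:.3f}  Holm {h:.3f}  BH {q:.3f}  Bonferroni {b:.3f}")
```

The smallest adjusted value under each of the three procedures is 0.132, so no contrast in the family of twelve falls below .05 after correction.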


Limitations and methodological considerations

Random-effects structure. The reported GLMM includes only a by-participant random intercept, (1 | id). For a fully within-subjects design the principled specification is a random-slopes model (at minimum (1 + image_type | id) and ideally (1 + image_type + sound_condition | id)) so that by-participant variability in the very effects being tested is modelled explicitly; omitting such slopes tends to make the tests of those effects anti-conservative. With 192 trials per participant there is in principle sufficient data to support random slopes.

Multiple-comparison correction. As shown in Table 2b, none of the twelve interaction contrasts survives correction under Holm, Benjamini–Hochberg or Bonferroni procedures applied to the full family. Two paths forward are 1) pre-registering a targeted replication with the PAS-2/PAS-3 “music vs. silent” contrasts as the primary confirmatory tests, or 2) narrowing the confirmatory family a priori (e.g. the four “music vs. silent” contrasts at PAS 2–3, αBonferroni = .0125). The present effects are therefore described as suggestive rather than confirmatory.

Theoretical framing vs. empirical pattern. The observed pattern does not match a classical semantic congruence account. Under congruence, sweet music should selectively facilitate “sweet” responses to sweet images (and salty music should do the same for salty images). Instead, sweet music was associated with enhanced discrimination generally, and salty music was neutral at PAS 2 and mildly suppressive at PAS 3 (Table 3). This is more consistent with a general-enhancement or arousal-based account (e.g. one in which the sweet-music clip raises attentional engagement across image types). An acoustic/affective comparison of the two music clips (tempo, loudness, valence, arousal etc.) would help distinguish these accounts.

0-ms trials and the PAS = 1 cell. Approximately 14% of trials were presented at 0 ms (no image shown), and these trials almost exclusively produce PAS = 1. Because image_type was randomly assigned on 0-ms trials, the PAS-1 row of the analysis is driven in large part by trials without a genuine stimulus. The present specification absorbs this through the additive display_time covariate, and the PAS-1 contrasts are reported as null (consistent with expectation). As a robustness check, one could refit the model excluding 0-ms trials to confirm that the PAS-2/PAS-3 contrasts are unchanged.

Convergence diagnostics. The reported model converged under the bobyqa optimizer without warnings. Extended models (e.g., random slopes) may require additional tuning.

Other. The present analysis does not 1) include a variance-explained measure beyond ΔP (e.g. marginal/conditional pseudo-R² via MuMIn::r.squaredGLMM); 2) analyse reaction-time data; or 3) document participant- or trial-level exclusions.