PNAS Binder: Lower Triangle Regression

Chris Cox

Overview

Patterns of neural activity were summarized as RSMs within regions of interest. Various semantic models were also summarized as RSMs. Unidimensional psycholinguistic confounds were extracted via PCA. Each confound was re-represented as a matrix of pairwise distances.

Lower triangle regression treats the lower triangle of each of these matrices as a variable in a regression model. There are \((N^2 - N) / 2\) data points going into this regression, but they are derived from only \(N\) distinct observations. Thus, the standard errors associated with the regression coefficients will be too small.
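As a quick check of the count, here is a minimal base R sketch. The value N = 300 is borrowed from Study 1 purely for illustration; the variable names are not from the analysis code.

```r
# Number of unique pairwise entries in the lower triangle of an N x N matrix.
N <- 300
n_pairs <- (N^2 - N) / 2

# The same count, obtained by marking the lower triangle of a matrix directly.
n_lower <- sum(lower.tri(matrix(0, N, N)))

# choose(N, 2) is the closed form for the number of item pairs.
c(n_pairs = n_pairs, n_lower = n_lower, n_choose = choose(N, 2))
```

With 300 items the regression sees 44,850 rows, all of them built from the same 300 observations.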

To compensate, we perform Mantel tests.

Mantel test

Assesses the correlation between two dissimilarity matrices, \(D_1\) and \(D_2\).
The rows and columns of \(D_1\) are symmetrically permuted many times.
Permuted matrices are correlated with \(D_2\).
Significance is determined relative to this simulated null distribution.
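The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the vegan implementation: mantel_sketch() is a hypothetical helper, the data are simulated, and the one-sided p-value that counts the observed statistic as one permutation is an assumed convention.

```r
# Minimal permutation Mantel test on two square, symmetric distance matrices.
# mantel_sketch() is a hypothetical helper for illustration only.
mantel_sketch <- function(D1, D2, n_perm = 999) {
    lt <- lower.tri(D1)
    r_obs <- cor(D1[lt], D2[lt])
    r_null <- replicate(n_perm, {
        # Permute rows and columns of D1 with the same index vector.
        i <- sample(nrow(D1))
        cor(D1[i, i][lt], D2[lt])
    })
    # One-sided p-value; the observed statistic counts as one permutation.
    p <- (sum(r_null >= r_obs) + 1) / (n_perm + 1)
    list(r = r_obs, p = p, null_sd = sd(r_null))
}

# Simulated example: D2 is a noisy copy of the structure in D1.
set.seed(1)
coords <- matrix(rnorm(20 * 2), ncol = 2)
D1 <- as.matrix(dist(coords))
D2 <- as.matrix(dist(coords + matrix(rnorm(20 * 2, sd = 0.1), ncol = 2)))
res <- mantel_sketch(D1, D2)
```

Permuting rows and columns together keeps each item's full set of distances intact, which is what lets the null distribution respect the dependence among the pairwise entries.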

Regression model definition

\[ \textrm{N} = \beta_0 + \beta_1 \textrm{M} + \beta_2 \textrm{RC}_\textrm{Fam} + \beta_3 \textrm{RC}_\textrm{Conc} + \beta_4 \textrm{RC}_\textrm{RT} + \beta_5 \textrm{RC}_\textrm{Orth} + \epsilon \]

N is the lower triangle of the neural RSM
M is the lower triangle of the model RSM
RC is the lower triangle of a distance matrix for one confound.
Before modeling, the RSMs (i.e., correlation matrices) are converted to Euclidean distances:

\[ d = \sqrt{2 (1 - r)} \]
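One way to see where this conversion comes from: for two patterns that are centered and scaled to unit length, the squared Euclidean distance is \(2 - 2r\). The sketch below checks this numerically on simulated vectors; unit() is a hypothetical helper, and nothing here is a claim about how the RSMs themselves were computed.

```r
set.seed(2)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Center a pattern and scale it to unit length (hypothetical helper).
unit <- function(v) {
    v <- v - mean(v)
    v / sqrt(sum(v^2))
}
xu <- unit(x)
yu <- unit(y)

r <- cor(x, y)
d_formula <- sqrt(2 * (1 - r))          # distance from the correlation
d_direct <- sqrt(sum((xu - yu)^2))      # distance computed directly
all.equal(d_formula, d_direct)
```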

Semi-partial Mantel tests of regression terms

To perform Mantel tests for (sets of) regressors, we need to correlate the term of interest with the residuals of a model that includes all other variables.
For example:

\[\begin{align} \textrm{N} &= \beta_0 + \beta_1 \textrm{M} + \beta_2 \textrm{RC}_\textrm{Conc} + \beta_3 \textrm{RC}_\textrm{RT} + \beta_4 \textrm{RC}_\textrm{Orth} + \epsilon_\textrm{Fam} \\ \epsilon_\textrm{Fam} &= \beta_0 + \beta_1 \textrm{RC}_\textrm{Fam} + \epsilon \end{align}\]

Since the Mantel test works with correlations, both variables in the second model can be treated as standardized, in which case \(\beta_0 = 0\) and \(\beta_1\) is Pearson’s r.
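The two-step logic can be sketched with ordinary vectors standing in for the lower triangles. The data are simulated and the names (fam, conc, z) are hypothetical; the sketch only shows the residualize-then-correlate step, not the permutation test.

```r
set.seed(3)
n <- 200
fam <- rnorm(n)
conc <- 0.4 * fam + rnorm(n)
y <- 0.5 * fam + 0.3 * conc + rnorm(n)

# Step 1: fit the model that omits the term of interest (fam).
reduced <- lm(y ~ conc)
e_fam <- resid(reduced)

# Step 2: correlate the term of interest with those residuals.
r_sp <- cor(e_fam, fam)

# With both variables z-scored, the slope equals Pearson's r and the
# intercept is zero up to floating point error.
z <- function(v) (v - mean(v)) / sd(v)
fit_z <- lm(z(e_fam) ~ z(fam))
coef(fit_z)
```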

Semi-partial Mantel example code

In practice, the full model is used to generate predicted values when the target term is held constant at its mean.

  RC1_Fam = mantel(
    vec_as_dist(d$RC1_Fam_lt),
    vec_as_dist(d$nsm_lt - predict(m, mutate(d, RC1_Fam_lt = mean(RC1_Fam_lt))))
  )
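To make the quantity passed to mantel() concrete: for a linear model, subtracting predictions made with the target term fixed at its mean gives the full-model residuals plus the centered contribution of that term. The sketch below checks this identity on simulated data; the column names are hypothetical stand-ins for the lower-triangle variables.

```r
set.seed(4)
n <- 100
d <- data.frame(rsm = rnorm(n), fam = rnorm(n))
d$nsm <- 0.6 * d$rsm + 0.2 * d$fam + rnorm(n)
m <- lm(nsm ~ rsm + fam, data = d)

# Predictions with the target term (fam) fixed at its mean.
d_fixed <- d
d_fixed$fam <- mean(d$fam)
partial <- d$nsm - predict(m, d_fixed)

# Equivalent form: full-model residuals plus the centered fam contribution.
by_hand <- resid(m) + coef(m)["fam"] * (d$fam - mean(d$fam))
all.equal(unname(partial), unname(by_hand))
```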

Convert RSM lower triangle to RDM lower triangle

The vegan::mantel() function expects dist objects, so correlations are converted to Euclidean distances and packaged.

cor2dist <- function(r, as_dist = FALSE) {
    d <- sqrt(2 * (1 - r))
    if (as_dist) vec_as_dist(d) else d
}

vec_as_dist <- function(x, upper_tri = FALSE) {
    n <- (1 + sqrt(1 + (8 * length(x)))) / 2
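    # are_equal() only returns TRUE/FALSE; wrap it in assert_that() so a
    # vector whose length is not a triangular number raises an error.
    assertthat::assert_that(assertthat::are_equal(n, floor(n)))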
    structure(
        x,
        class = "dist",
        Size = as.integer(n),
        Diag = FALSE,
        Upper = upper_tri
    )
}
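The packaging step relies on two facts: the matrix size can be recovered from the length of the lower-triangle vector, and a dist object stores the lower triangle column by column, which is the order produced by R[lower.tri(R)]. A base R sketch on simulated data, building the dist object by hand rather than calling the functions above:

```r
set.seed(5)
X <- matrix(rnorm(5 * 10), nrow = 5)   # 5 items, 10 features (simulated)
R <- cor(t(X))                         # 5 x 5 correlation matrix (an RSM)
r_lt <- R[lower.tri(R)]                # lower triangle, column by column

# Recover the matrix size from the vector length.
n <- (1 + sqrt(1 + 8 * length(r_lt))) / 2

# Convert to distances and attach the attributes a "dist" object needs.
d_lt <- sqrt(2 * (1 - r_lt))
D <- structure(d_lt, class = "dist", Size = as.integer(n),
               Diag = FALSE, Upper = FALSE)

# Reference: let as.dist() build the object from the full distance matrix.
# pmax() guards against tiny negative values on the diagonal.
D_ref <- as.dist(sqrt(2 * pmax(1 - R, 0)))
all.equal(as.vector(D), as.vector(D_ref))
```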

Lower triangle regression code

This function expects binary blobs containing the lower triangles of the RSMs (correlations), read straight from the database. It converts the correlations to distances, builds a data frame for modeling, fits the model, and runs Mantel tests for:

Full model
Model RSM, alone
Confounds, as a set
Each confound, alone

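# Requires dplyr, purrr, and tibble, plus vegan for mantel().
# raw_to_numeric() is a helper that is not shown in this document.
lower_triangle_regression <- function(nsm_blob, rsm_blob, conf_df) {
    n <- nrow(conf_df)
    # Number of entries in the lower triangle (unique off-diagonal pairs).
    uniq_diag_elements <- (n^2 - n) / 2
    # Pairwise distances for each confound (columns named RC*).
    x <- map(conf_df |> select(starts_with("RC")), ~ as.numeric(dist(.)))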
    d <- tibble(
        nsm_lt = nsm_blob |> raw_to_numeric(uniq_diag_elements) |> cor2dist(),
        rsm_lt = rsm_blob |> raw_to_numeric(uniq_diag_elements) |> cor2dist(),
        RC1_Fam_lt = x$RC1_Fam,
        RC2_Conc_lt = x$RC2_Conc,
        RC3_RT_lt = x$RC3_RT,
        RC4_Orth_lt = x$RC4_Orth
    )
    m <- lm(
        nsm_lt ~ rsm_lt + RC1_Fam_lt + RC2_Conc_lt + RC3_RT_lt + RC4_Orth_lt,
        data = d
    )
    tests <- list(
        omnibus = mantel(
            vec_as_dist(d$nsm_lt),
            vec_as_dist(fitted(m))
        ),
        model_rsm = mantel(
            vec_as_dist(d$rsm_lt),
            vec_as_dist(d$nsm_lt - predict(m, mutate(d, rsm_lt = mean(rsm_lt))))
        ),
        confounds = list(
            omnibus = mantel(
                vec_as_dist(predict(m, mutate(d, rsm_lt = mean(rsm_lt)))),
                vec_as_dist(d$nsm_lt - predict(m, mutate(d, across(starts_with("RC"), mean))))
            ),
            RC1_Fam = mantel(
                vec_as_dist(d$RC1_Fam_lt),
                vec_as_dist(d$nsm_lt - predict(m, mutate(d, RC1_Fam_lt = mean(RC1_Fam_lt))))
            ),
            RC2_Conc = mantel(
                vec_as_dist(d$RC2_Conc_lt),
                vec_as_dist(d$nsm_lt - predict(m, mutate(d, RC2_Conc_lt = mean(RC2_Conc_lt))))
            ),
            RC3_RT = mantel(
                vec_as_dist(d$RC3_RT_lt),
                vec_as_dist(d$nsm_lt - predict(m, mutate(d, RC3_RT_lt = mean(RC3_RT_lt))))
            ),
            RC4_Orth = mantel(
                vec_as_dist(d$RC4_Orth_lt),
                vec_as_dist(d$nsm_lt - predict(m, mutate(d, RC4_Orth_lt = mean(RC4_Orth_lt))))
            )
        )
    )
    list(model = m, tests = tests)
}

Averaging, before or after modeling

NSMs exist for each subject within each ROI.
Each can be modeled individually, in which case it would be reasonable to consider the standard error of the mean over subjects to draw statistical inferences.
Alternatively, the neural RSMs can be averaged and then modeled.
- This may improve the signal-to-noise ratio and reveal more representational structure.
- Averaging over RSMs, rather than raw neural activity, has the advantage of side-stepping uninteresting inter-subject variability.
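The two options can be sketched on simulated lower triangles. Everything here is hypothetical: signal stands in for a model RSM, a plain correlation stands in for the full regression, and the simple mean over subjects is an assumed way of averaging.

```r
set.seed(6)
n_items <- 6
n_subj <- 4
n_lt <- choose(n_items, 2)

# Simulated lower triangles, one column per subject.
signal <- runif(n_lt, -0.2, 0.6)
subj_lt <- sapply(seq_len(n_subj), function(s) signal + rnorm(n_lt, sd = 0.1))

# Option 1: summarize each subject separately, then use the SE over subjects.
r_subj <- apply(subj_lt, 2, function(v) cor(v, signal))
se_subj <- sd(r_subj) / sqrt(n_subj)

# Option 2: average the lower triangles first, then summarize once.
avg_lt <- rowMeans(subj_lt)
r_avg <- cor(avg_lt, signal)
```

Option 1 yields one estimate per subject and an error term over subjects; option 2 yields a single estimate, so its uncertainty has to come from elsewhere, such as a permutation null.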

Notes on Fernandino et al (2022)

Study 1 vs Study 2

Study 1

8 participants and 300 concepts.
Concepts span a variety of semantic categories, but the categories are not balanced.

Study 2

36 participants and 320 concepts
- 62 objects and 34 events were also used in Study 1
160 object nouns (animals, foods, tools, and vehicles—40 of each)
160 event nouns (social events, verbal communication events, nonverbal sound events, and negative events—40 of each)
- SI Appendix, Tables S5 and S6.

Study 1 Model RSMs

Fernandino et al (2022) PNAS, Figure 2A

Study 2 Model RSMs

Fernandino et al (2022) PNAS, Figure 3A

Model RSMs

The following is quoted from Fernandino et al (2022, PNAS):

Taxonomic Models

WordNet is the most influential representational model based on taxonomic information, having been used in several neuroimaging studies to successfully model semantic content (2, 22, 54, 55). It is organized as a knowledge graph in which words are grouped into sets of synonyms (“synsets”), with each expressing a distinct concept. Synsets are richly interconnected according to taxonomic relations, resulting in a hierarchically structured network encompassing 81,426 noun concepts (in the English version). Concept similarity is computed based on the shortest path connecting the two concepts in this network.

In contrast to WordNet, which provides a comprehensive taxonomy of lexical concepts, the Categorical model was customized to encode the particular taxonomic structure of the concept set in each study, based on a set of a priori categories. Therefore, the Categorical model ignores concept categories that were not included in the study, such as “furniture” or “clothing.” To reduce the level of subjectivity involved in the assignment of items to categories and in the evaluation of intercategory similarities, we tested 18 a priori versions of the Categorical model, with different numbers of categories and different levels of hierarchical structure, for the concepts in Study 1 (SI Appendix, Fig. S1). Each version was tested via RSA against the fMRI data, and the best-performing version was selected for comparisons against other types of models. The selected model for Study 1 (model N in SI Appendix, Fig. S1) consisted of 19 hierarchically structured categories, as follows: abstract (mental abstract, social abstract, social event, other abstract), event (social event, concrete event), animate (animal, human, body part), inanimate (artifact [musical instrument, vehicle, other artifact], food, other inanimate), and place.

In Study 2, the Categorical model consisted of two higher-level categories—object and event—with each consisting of four subcategories (animal, plant/food, tool, and vehicle; sound event, social event, communication event, and negative event).

Experiential Models

The Exp48 model consists of 48 dimensions corresponding to distinct aspects of phenomenal experience, such as color, shape, manipulability, and pleasantness (SI Appendix, Table S2). This model is based on the experiential salience norms of Binder and colleagues (40), in which each dimension encodes the relative importance of an experiential attribute according to crowd-sourced ratings. Exp48 encompasses all perceptual, motor, spatial, temporal, causal, valuation, and valence dimensions present in those norms.

The SM8 model consists of a subset of the Exp48 dimensions, focusing exclusively on sensory-motor information. These dimensions represent the relevance of each sensory modality (vision, audition, touch, taste, and smell) and of action schemas performed with each motor effector (hand, foot, and mouth) to the concept. The concept “apple,” for instance, has high values for vision, touch, taste, mouth actions, and hand actions and low values for the other dimensions.

Distributional Models

Distributional information was modeled with two of the most prominent distributional models available. Namely, word2vec (56) uses a deep neural network trained to predict a word based on a context window of a few words preceding and following the target word. In contrast, the model GloVe (57) is based on the ratio of cooccurrence probabilities between pairs of words across the entire corpus. In a comparative evaluation of distributional semantic models (17), word2vec and GloVe emerged as the two top-performing models in predicting human behavior across a variety of semantic tasks.

Regions of interest

Primarily interested in Angular Gyrus (AG)
Also interested in anterior temporal lobe (ATL)—specifically, ventral ATL.
The ATL definition is based on Rice, Lambon Ralph, & Hoffman (2015).
There are also ROIs for anterior inferior temporal gyrus (aITG) and anterior fusiform gyrus (aFusG).

ROI Definitions

Dotted line defines anterior temporal lobe (Rice, Lambon Ralph, & Hoffman, 2015).
aFusG and aITG refer to the portion of fusiform and inferior temporal gyrus ROIs that are anterior to this line.

Fernandino et al (2022) PNAS, Figure 1D

Semantic Network ROI

Distributed, functionally defined ROI based on the voxel-based meta-analysis (Binder et al, 2009, Cereb. Cortex).

Fernandino et al (2022) PNAS, Figure 1E

Modeling individual subject NSMs

Figure conventions

Error bars are standard error (specifically, standard deviation over subjects divided by the square root of the number of subjects in the study).
Asterisks indicate statistical significance, FDR-adjusted \(p<.05\).
- FDR adjustments are with respect to data within each panel, not over everything.
The y-axes for all plots are fixed to a range that accommodates the upper and lower limits of all data shown in this section.
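These conventions can be sketched on a toy panel. The per-subject correlations are simulated, the test labels a, b, and c are hypothetical, and the two-sided t-test against zero is an assumption made for this illustration.

```r
# Hypothetical per-subject correlations for three tests shown in one panel.
set.seed(7)
r <- list(
    a = rnorm(8, 0.10, 0.05),
    b = rnorm(8, 0.02, 0.05),
    c = rnorm(8, 0.00, 0.05)
)

stats <- do.call(rbind, lapply(names(r), function(k) {
    n <- length(r[[k]])
    se <- sd(r[[k]]) / sqrt(n)          # standard error over subjects
    t <- mean(r[[k]]) / se
    data.frame(
        test = k, r_avg = mean(r[[k]]), se = se,
        p = 2 * pt(abs(t), df = n - 1, lower.tail = FALSE)
    )
}))

# FDR adjustment is applied over the tests within this panel only.
stats$p_fdr <- p.adjust(stats$p, method = "fdr")
stats
```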

Load data and setup

library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)

dsubj <- readRDS("data/lt_reg_stats.rds") |>
  bind_rows(readRDS("data/lt_reg_aITG_stats.rds")) |>
  bind_rows(readRDS("data/lt_reg_aFusG_stats.rds")) |>
  as_tibble() |>
  mutate(
    model_rsm = as.factor(code),
    test = factor(
      test,
      c("omnibus", "model_rsm", "confounds", "RC1_Fam", "RC2_Conc", "RC3_RT", "RC4_Orth")
    ),
    study_id = factor(study_id, c(1,2))
  ) |>
  select(-sid, -model_id, -code) |>
  group_by(study_id, region_id, hemisphere, abbreviation, model_rsm, test) |>
  summarize(n_subj = n(), r_avg = mean(r), se = sd(r) / sqrt(n_subj), p_med = median(p)) |>
  mutate(
    t_subjects_se = r_avg / se,
    # pt(|t|, lower.tail = FALSE) is one tail area; double it for a two-sided p-value
    p_subjects_se = 2 * pt(abs(t_subjects_se), df = n_subj - 1, lower.tail = FALSE)
  ) |>
  group_by(study_id, region_id, hemisphere) |>
  mutate(p_subjects_se_fdr = p.adjust(p_subjects_se, method = "fdr")) |>
  ungroup()

Left Hemisphere: AG, ATL

Left Hemisphere: aFusG, aITG

Left Hemisphere: AG, SemNet

Right Hemisphere: AG, ATL

Right Hemisphere: aFusG, aITG

Right Hemisphere: AG, SemNet

Modeling averaged NSMs

Figure conventions

Error bars are standard error (specifically, standard deviation of the simulated null distribution from the Mantel test).
Asterisks indicate statistical significance, FDR-adjusted \(p<.05\).
- FDR adjustments are with respect to data within each panel, not over everything.
The y-axes for all plots are fixed to a range that accommodates the upper and lower limits of all data shown in this section.

Load data and setup

library(dplyr)
library(purrr)
library(tidyr)
library(ggplot2)

davg <- readRDS("data/lt_reg_COMBINED_avg_stats.rds") |>
  as_tibble() |>
  mutate(
    model_rsm = as.factor(code),
    test = factor(
      test,
      c("omnibus", "model_rsm", "confounds", "RC1_Fam", "RC2_Conc", "RC3_RT", "RC4_Orth")
    ),
    study_id = factor(study_id, c(1,2))
  ) |>
  select(-model_id, -code) |>
  group_by(study_id, region_id, hemisphere) |>
  mutate(p_fdr = p.adjust(p, method = "fdr")) |>
  ungroup()