Introduction to Representational Similarity Analysis

Author

Chris Cox

What is Representational Similarity Analysis (RSA)?

There are many very good papers about the logic, strengths, and weaknesses of RSA, and they are worth digging into; I have included a short list at the end of this document. One of my many languishing projects is writing another paper in this vein…

RSA boils down to a comparison of similarity or dissimilarity (i.e., distance) matrices. Similarity (and dissimilarity) matrices are symmetric, meaning that x[i,j] == x[j,i] is TRUE for all i and j. All values along the diagonal of a dissimilarity matrix (i.e., whenever i==j) will be zero. For similarity matrices useful for RSA, such as correlation matrices or cosine similarity matrices, the diagonal values will always be one.1
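As a minimal sketch of these properties, consider a small made-up matrix m (5 items by 4 features):

# Verify the symmetry and diagonal properties described above.
m <- matrix(rnorm(20), nrow = 5, ncol = 4)  # 5 items, 4 features (made up)
r <- cor(t(m))                              # similarity: correlations among the 5 items
d <- as.matrix(dist(m))                     # dissimilarity: Euclidean distances among items
isSymmetric(r)                 # TRUE: r[i, j] == r[j, i]
isSymmetric(d)                 # TRUE
all(abs(diag(r) - 1) < 1e-12)  # TRUE: diagonal of a correlation matrix is 1
all(diag(d) == 0)              # TRUE: diagonal of a distance matrix is 0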

Exploring similarity and distance matrices

Let’s load some packages …

library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tibble)
library(tidyr)
library(ggplot2)
library(coop)
Note

The coop package provides efficient implementations of covariance, correlation, and cosine similarity.

… and a selection of data packaged with R to utilize for a demonstration:

data("mtcars")
x <-  mtcars[, c(1,3,4,5,6,7)]
knitr::kable(x, digits = 1)
mpg disp hp drat wt qsec
Mazda RX4 21.0 160.0 110 3.9 2.6 16.5
Mazda RX4 Wag 21.0 160.0 110 3.9 2.9 17.0
Datsun 710 22.8 108.0 93 3.9 2.3 18.6
Hornet 4 Drive 21.4 258.0 110 3.1 3.2 19.4
Hornet Sportabout 18.7 360.0 175 3.1 3.4 17.0
Valiant 18.1 225.0 105 2.8 3.5 20.2
Duster 360 14.3 360.0 245 3.2 3.6 15.8
Merc 240D 24.4 146.7 62 3.7 3.2 20.0
Merc 230 22.8 140.8 95 3.9 3.1 22.9
Merc 280 19.2 167.6 123 3.9 3.4 18.3
Merc 280C 17.8 167.6 123 3.9 3.4 18.9
Merc 450SE 16.4 275.8 180 3.1 4.1 17.4
Merc 450SL 17.3 275.8 180 3.1 3.7 17.6
Merc 450SLC 15.2 275.8 180 3.1 3.8 18.0
Cadillac Fleetwood 10.4 472.0 205 2.9 5.2 18.0
Lincoln Continental 10.4 460.0 215 3.0 5.4 17.8
Chrysler Imperial 14.7 440.0 230 3.2 5.3 17.4
Fiat 128 32.4 78.7 66 4.1 2.2 19.5
Honda Civic 30.4 75.7 52 4.9 1.6 18.5
Toyota Corolla 33.9 71.1 65 4.2 1.8 19.9
Toyota Corona 21.5 120.1 97 3.7 2.5 20.0
Dodge Challenger 15.5 318.0 150 2.8 3.5 16.9
AMC Javelin 15.2 304.0 150 3.1 3.4 17.3
Camaro Z28 13.3 350.0 245 3.7 3.8 15.4
Pontiac Firebird 19.2 400.0 175 3.1 3.8 17.0
Fiat X1-9 27.3 79.0 66 4.1 1.9 18.9
Porsche 914-2 26.0 120.3 91 4.4 2.1 16.7
Lotus Europa 30.4 95.1 113 3.8 1.5 16.9
Ford Pantera L 15.8 351.0 264 4.2 3.2 14.5
Ferrari Dino 19.7 145.0 175 3.6 2.8 15.5
Maserati Bora 15.0 301.0 335 3.5 3.6 14.6
Volvo 142E 21.4 121.0 109 4.1 2.8 18.6

I generate a new matrix of random values (e) to add to this data to reduce the similarity among items.

e <- matrix(
    rnorm(nrow(x) * ncol(x), sd = apply(x, 2, sd)),  # noise scaled to each column's SD
    nrow = nrow(x),
    ncol = ncol(x),
    byrow = TRUE  # fill by row so the per-column SDs line up with the columns of x
)

Then I compute five matrices: a correlation similarity matrix, a correlation distance matrix (i.e., dissimilarity), a cosine similarity matrix, a cosine distance matrix, and a Euclidean distance matrix.

matrices <- list(
    correl_similarity = cor(t(x + e)),
    correl_dissimilar = (2 - (cor(t(x + e)) + 1)) / 2,
    cosine_similarity = coop::cosine(t(x + e)),
    cosine_dissimilar = (2 - (coop::cosine(t(x + e)) + 1)) / 2,
    euclid_distance = as.matrix(dist(x + e))
)

Why all three of these metrics? Purely for the sake of comparing them in the context of this demonstration. In a real analysis, you will want to think ahead of time about which metric makes the most sense for your use case and question.

Euclidean distance is computed relative to the points as they are provided. There is no scaling at all.

With cosine distance (and similarity), you can think of each point as defining a direction away from the origin. The distances from the origin themselves are not important; only the angle between the rays from the origin through each pair of points matters. In other words, cosine distance is invariant to scale: you could multiply the vector defining each item's point by a different positive number and the cosine similarity would not change.

With correlation distance (and similarity), you subtract the mean and then divide by the standard deviation of the vector for each item. This means that correlation distance is not affected by adding a constant to an item-vector or by multiplying it by a positive constant.

Because cosine and correlation “distance” are invariant to scale, these are sometimes referred to as dissimilarities rather than distances.
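A minimal sketch of these invariances (v1 and v2 are made-up vectors):

# Cosine similarity is unchanged when either vector is rescaled by a positive factor;
# correlation is additionally unchanged when a constant is added.
v1 <- c(1, 3, 2, 5)
v2 <- c(2, 1, 4, 3)
coop::cosine(cbind(v1, v2))[1, 2]
coop::cosine(cbind(10 * v1, 0.5 * v2))[1, 2]
cor(v1, v2)
cor(3 * v1 + 100, v2 - 7)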

To plot these matrices with ggplot, I will represent them as tibbles (which are just data.frames with some conveniences baked in):

d <- map(matrices, ~ {
    .x |>
        as_tibble(rownames = "carA") |>
        pivot_longer(
            cols = -carA,
            names_to = "carB",
            values_to = "value"
        )
}) |>
    list_rbind(names_to = "metric") |>
    mutate(
        metric = factor(
            metric,
            levels = c(
                "correl_similarity",
                "correl_dissimilar",
                "cosine_similarity",
                "cosine_dissimilar",
                "euclid_distance"
            ),
            labels = c(
                "correlation similarity",
                "correlation distance",
                "cosine similarity",
                "cosine distance",
                "Euclidean distance"
            )
        ),
        across(c(carA, carB), ~ factor(.x, levels = rownames(x)))
    )

I will also define a logical matrix that can be used as a filter to select the lower triangle of each of these matrices:

lt <- lower.tri(matrices$cosine_similarity)
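For a sense of what lower.tri() returns, here is a minimal example on a small made-up matrix:

m <- matrix(1:9, nrow = 3)
m[lower.tri(m)]  # lower.tri(m) is TRUE below the diagonal; indexing with it extracts those values as a vector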

Now I will plot the correlation and cosine similarity and distance matrices:

d |>
    filter(metric != "Euclidean distance") |>
    ggplot(aes(x = carA, y = carB, fill = value)) +
        geom_raster() +
        scale_fill_gradient2() +
        theme(
            axis.title.y = element_blank(),
            axis.title.x = element_blank(),
            axis.text.x = element_blank(),
            axis.ticks.x = element_blank()
        ) +
        facet_wrap(vars(metric))

Notice that each similarity matrix and its corresponding distance matrix express the same pattern of similarity, just inverted and on different scales. Indeed, the correlation between the lower triangles of each similarity matrix and its distance counterpart is -1:

cor(matrices$correl_similarity[lt], matrices$correl_dissimilar[lt])
[1] -1
cor(matrices$cosine_similarity[lt], matrices$cosine_dissimilar[lt])
[1] -1

Also note that the correlation and cosine similarity matrices are themselves nearly identical, as shown by correlating their lower triangles:

cor(matrices$correl_similarity[lt], matrices$cosine_similarity[lt])
[1] 0.9838994

Distances cannot be negative, so 0 is the bottom of the scale for the correlation and cosine distance matrices.

Now I will plot the Euclidean distance alongside the cosine distance, scaling each so that its maximum value is 1 to make them comparable:

d |>
    filter(metric %in% c("cosine distance", "Euclidean distance")) |>
    group_by(metric) |>
    mutate(value = value / max(value)) |>
    ggplot(aes(x = carA, y = carB, fill = value)) +
        geom_raster() +
        scale_fill_gradient2() +
        theme(
            axis.title.y = element_blank(),
            axis.title.x = element_blank(),
            axis.text.x = element_blank(),
            axis.ticks.x = element_blank()
        ) +
    facet_wrap(vars(metric))

Notice that the Euclidean distances among items do not convey structure similar to the matrices above:

imap(matrices, ~ tibble("{.y}" := .x[lt])) |>
    list_cbind() |>
    cor() |>
    knitr::kable(digits = 3)
correl_similarity correl_dissimilar cosine_similarity cosine_dissimilar euclid_distance
correl_similarity 1.000 -1.000 0.984 -0.984 -0.415
correl_dissimilar -1.000 1.000 -0.984 0.984 0.415
cosine_similarity 0.984 -0.984 1.000 -1.000 -0.417
cosine_dissimilar -0.984 0.984 -1.000 1.000 0.417
euclid_distance -0.415 0.415 -0.417 0.417 1.000

Why is this? Reflect on the differences between how correlation and Euclidean distance are computed. In the following equations, \(x\) and \(y\) refer to different items, like Honda Civic and Fiat 128, and the subscript \(i\) indexes features, like mpg (miles per gallon) and hp (horsepower).

\[ \textrm{Euclidean Dist.} = \sqrt{\sum(x_i - y_i)^2} \] \[ \textrm{Pearson's } r = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum(x_i - \bar{x})^2\sum(y_i-\bar{y})^2}} \]

First, notice that the correlation subtracts the mean from each vector. While it is not as obvious in the formulation above, we could also think about Pearson’s \(r\) as:

\[ \begin{align} \textrm{Pearson's } r &= \frac{\sum z(x)_i \, z(y)_i}{n-1} \\ z(a)_i &= \frac{a_i - \bar{a}}{s(a)} \\ s(a) &= \sqrt{\frac{\sum (a_i - \bar{a})^2}{n-1}} \end{align} \]

In other words, each variable is also standardized so that the vectors being correlated both have standard deviation equal to one (i.e., “unit variance”).
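We can verify this numerically (a minimal sketch; a and b are made-up vectors):

# Pearson's r as the sum of products of z-scored values, divided by n - 1.
a <- c(2, 4, 3, 7, 5)
b <- c(1, 3, 5, 6, 4)
cor(a, b)
sum(scale(a) * scale(b)) / (length(a) - 1)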

The cosine similarity is defined as:

\[ \textrm{Cosine Sim.} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}} \]

In this equation, notice that \(\sqrt{\sum x_i^2}\) is the Euclidean distance from the point defined by the vector \(x\) to the origin (i.e., the magnitude of the vector), and likewise for \(\sqrt{\sum y_i^2}\). The denominator is therefore the product of the distances from the origin, so the equation standardizes with respect to distance from the origin. The numerator is the dot product of the vectors \(x\) and \(y\). For now, it suffices to know that when the dot product of two vectors is zero, the vectors are orthogonal, so the cosine similarity is zero when the vectors are orthogonal. On its own, the dot product can be arbitrarily large in magnitude, but once divided by the denominator the values this equation can yield are bounded between -1 and 1.
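To tie the formula back to code, we can build the cosine similarity of two vectors from its parts and compare it with coop::cosine() (a minimal sketch; a and b are made-up vectors):

# Cosine similarity from the dot product and the two vector magnitudes.
a <- c(1, 3, 2)
b <- c(2, 1, 4)
sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
coop::cosine(cbind(a, b))[1, 2]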

It is important to consider what structure and relationships are being emphasized by the choice of similarity or distance metric.

While similarity and dissimilarity matrices have some important differences when considering relationships among items, when focusing on the mechanics of RSA they are interchangeable. From this point on, in the interest of brevity, I will write “similarity matrix” and not explicitly mention dissimilarity (or distance) matrices.

First example

Simulating stimulus structure

We are going to simulate an experiment where there are 100 stimuli. A study designed with the intention of applying RSA will typically be asking a question something like: “are there neural representations that express the similarity among these stimuli?”. The linking hypothesis between the cognitive scientific and the neuroscientific side of things is that patterns of activity in the brain related to concept representation will be more similar when the concepts are more similar, and vice versa.

For the purpose of the simulation, we do not need to care what the concepts are. We just need to define how they relate to each other. One easy way to do that is to assume that each concept maps to a point. The point can be specified in as many dimensions as you like, and each dimension can be thought of as aligning with some important axis of knowledge organization… maybe animacy, size, and dangerousness. :)

Digression on treating animacy as graded

Typically, animacy is treated as binary, but people do tend to rate insects and fish as less animate than birds and mammals. Plants are living but not animate; rocks are nonliving but natural; artifacts are nonliving and unnatural. There is some evidence for an “animacy continuum” in the posterior ventral temporal lobe (Sha et al., 2015).

For our simulation, let’s generate 100 points in 3D space.

x <- matrix(
    rnorm(300),
    nrow = 100,
    ncol = 3,
    dimnames = list(
        items = NULL,
        dimensions = NULL
    )
)

print(x[1:10, ])
       dimensions
items         [,1]       [,2]        [,3]
   [1,]  0.8260602  1.0767403  0.05287858
   [2,] -0.7483852  0.1497702 -0.58155601
   [3,]  0.9965190 -1.0652409  0.37140860
   [4,] -1.3437493 -0.8740077 -0.11609976
   [5,]  1.5553229 -0.3513053  1.57906259
   [6,] -0.7530725  1.0150058 -0.34767186
   [7,]  0.8347282  1.2615556  0.85840413
   [8,] -0.1670500  0.8495761 -1.74176826
   [9,]  0.6983030  0.1102902  0.04996594
  [10,] -1.8071028  0.2691978  0.47618229

While it is true that there is no interesting topology to the 100 points, they are still 100 points in space, each with some similarity/distance relative to every other point. As long as these positions are systematic, in that different attempts to measure these points will result in them being positioned similarly in space, that’s all we need in order to say there is structure.

Now, we can define the target representational similarity matrix (RSM) as the pairwise cosine similarities among items.

y <- coop::cosine(t(x))
str(y)
 num [1:100, 1:100] 1 -0.374 -0.149 -0.942 0.325 ...
Warning

The t() (transpose) function essentially rotates a matrix. It changes x from a \(100 \times 3\) matrix to a \(3 \times 100\) matrix. This is important here because coop::cosine(), like cor(), produces pairwise values among columns.

Meanwhile, the dist() function produces pairwise distances among rows. This is a quirk of R’s built-in functions that is worth committing to memory.
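A quick way to see this difference in orientation (a minimal sketch with a small made-up matrix):

# cor() and coop::cosine() operate on columns; dist() operates on rows.
m <- matrix(rnorm(20), nrow = 5, ncol = 4)  # 5 rows, 4 columns
dim(cor(m))              # 4 x 4: pairwise among columns
dim(coop::cosine(m))     # 4 x 4: pairwise among columns
dim(as.matrix(dist(m)))  # 5 x 5: pairwise among rows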

Just to drive home the consequence of choosing a scale-invariant metric like correlation distance rather than, say, Euclidean distance, consider the following:

cosine_sim <- function(x) {
    # Cosine similarity among the rows of x
    assertthat::assert_that(is.matrix(x))
    y <- x / sqrt(rowSums(x * x))  # scale each row to unit length
    tcrossprod(y)
}
cosine_dist <- function(x) {
    1 - cosine_sim(x)
}
cor_dist <- function(x) {
    # Correlation distance among the columns of x, rescaled to [0, 1]
    r <- cor(x)
    (2 - (r + 1)) / 2
}
std_dist <- function(x) {
    # Euclidean distance among the rows of x, scaled so the maximum is 1
    d <- as.matrix(dist(x))
    d / max(d)
}
tmp <- rbind(
    A = c( 1,  3,  2),
    B = c(-3, -1, -2),
    C = c( 0,  1,  1)
)
print(tmp)
  [,1] [,2] [,3]
A    1    3    2
B   -3   -1   -2
C    0    1    1
cor_dist(t(tmp))
          A         B         C
A 0.0000000 0.0000000 0.0669873
B 0.0000000 0.0000000 0.0669873
C 0.0669873 0.0669873 0.0000000
std_dist(tmp)
          A         B         C
A 0.0000000 1.0000000 0.3535534
B 1.0000000 0.0000000 0.6770032
C 0.3535534 0.6770032 0.0000000

Notice that A and B have a correlation distance of \(0\), meaning they are perfectly correlated, but have a (standardized) Euclidean distance of 1, meaning this is the largest of the three pairwise distances among vectors A, B, and C.

Similarly, to drive home the consequence of using cosine distance rather than, say, Euclidean distance, consider what happens when one of the vectors (B) is rescaled by a factor of 10: the cosine distances are unchanged, but the Euclidean distances are not.

cosine_dist <- function(x) {
    d <- 1 - coop::cosine(x)
    d / max(d)
}
euclidean_dist <- function(x) {
    d <- as.matrix(dist(x))
    d / max(d)
}
y1 <- rbind(
    A = c( 1,  3,  2),
    B = c(-3, -1, -2),
    C = c( 0,  1,  1)
)
y2 <- rbind(
    A = c( 1,  3,  2),
    B = c(-3, -1, -2) * 10,
    C = c( 0,  1,  1)
)
print(y1)
  [,1] [,2] [,3]
A    1    3    2
B   -3   -1   -2
C    0    1    1
cosine_dist(t(y1))
           A         B          C
A 0.00000000 1.0000000 0.03213514
B 1.00000000 0.0000000 0.91405225
C 0.03213514 0.9140522 0.00000000
euclidean_dist(y1)
          A         B         C
A 0.0000000 1.0000000 0.3535534
B 1.0000000 0.0000000 0.6770032
C 0.3535534 0.6770032 0.0000000
print(y2)
  [,1] [,2] [,3]
A    1    3    2
B  -30  -10  -20
C    0    1    1
cosine_dist(t(y2))
           A         B          C
A 0.00000000 1.0000000 0.03213514
B 1.00000000 0.0000000 0.91405225
C 0.03213514 0.9140522 0.00000000
euclidean_dist(y2)
           A         B          C
A 0.00000000 1.0000000 0.06097108
B 1.00000000 0.0000000 0.95174789
C 0.06097108 0.9517479 0.00000000

Simulating fMRI data

The assumption going into this simulation is that this 3D structure will be expressed in patterns of neural activity somehow.

The true 3D-structure will be a part of the matrices to which we will apply RSA. First, we’ll create some “MRI data” by:

1. Replicating the columns
2. Adding some noise on top of the true signal
3. Appending some completely unrelated columns
noise_sd <- 1
e <- matrix(rnorm(300 * 10, sd = noise_sd), nrow = 100, ncol = 3 * 10)  # noise for the signal voxels
xe <- matrix(c(x) + c(e), nrow = 100, ncol = 3 * 10)  # true 3D structure recycled across 30 columns, plus noise
E <- matrix(rnorm(300 * 10, sd = noise_sd), nrow = 100, ncol = 3 * 10)  # 30 pure-noise voxels
X <- cbind(xe, E)  # 100 items by 60 voxels

print(dim(X))
[1] 100  60

In the code above, I leverage the facts that R will recycle the shorter vector to match the longer one (provided the length of the longer one is a multiple of the length of the shorter one) and that matrices are populated one column at a time by default. For example:

matrix(1:6, nrow = 3, ncol = 2)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

We’ll say X is the “fMRI data”. There are 60 voxels in total. The first 30 carry signal, and the last 30 are completely unrelated to the true 3D structure. The first voxel is derived from the first dimension of the true structure, the second is derived from the second dimension, and likewise for the third. This pattern repeats 10 times.

I set the standard deviation of the noise to 1, which is equal to the standard deviation of all three dimensions of “signal”. In other words, the signal-to-noise ratio is 1:1 for any signal-carrying voxel.
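As a quick check of this construction, we can correlate the three true dimensions with a few of the voxels. Each signal voxel should correlate with exactly one dimension (roughly .7, given the 1:1 signal-to-noise ratio), and the pure-noise voxels with none (a minimal sketch; only the first six signal voxels and the first three noise voxels are shown):

# Rows: true dimensions 1-3. Columns: voxels 1-6 (signal) and 31-33 (noise).
round(cor(x, X[, c(1:6, 31:33)]), 2)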

We know the signal to noise ratio exactly since this is a simulation. But thinking about real data in terms of potential sources of variance is very helpful, even if you do not know what they all are.

You might think that a situation where the noise has the same intensity as the signal is hopeless, but there are two reasons why it is not.

  1. We are interested in how this structure correlates with a target structure, which means that even if half of the variance we observe is completely unrelated to the target structure, half of the variance is related to the target.
  2. Additionally, each of the three dimensions of signal is repeated across multiple voxels. This means that while the noise is uncorrelated between voxels, the signal is correlated between voxels. This is true of this simulation, and also has some truth in many real datasets.

To highlight this second point, we will first compute the cosine similarity matrix among voxels.

r <- coop::cosine(X)
dim(r)
[1] 60 60

Then we will decompose that similarity matrix into its eigenvalues and eigenvectors.

e <- eigen(r)
str(e)
List of 2
 $ values : num [1:60] 6.62 5.83 5.33 2.43 2.07 ...
 $ vectors: num [1:60, 1:60] 0.276 -0.136 0.147 0.279 -0.088 ...
 - attr(*, "class")= chr "eigen"

The eigenvectors are essentially principal components: they are all orthogonal.

Two vectors are orthogonal when they are perpendicular to each other. This is hard to envision in more than three dimensions. However, for any pair of orthogonal vectors, their dot product is zero.

We know the vectors \(<1, 0>\) and \(<0, 1>\) are orthogonal.

a <- c(x = 1, y = 0)
b <- c(x = 0, y = 1)
plot(0, type = "n", xlim = c(-2, 2), ylim = c(-2, 2), xlab = "", ylab = "")
abline(v = 0, h = 0)
arrows(
    x0 = c(0, 0),
    y0 = c(0, 0),
    x1 = c(a["x"], b["x"]),
    y1 = c(a["y"], b["y"]),
    lwd = 3
)

The dot product is the sum of elementwise products between the vectors:

unname((a[1] * b[1]) + (a[2] * b[2]))
[1] 0
sum(a * b)
[1] 0

If those vectors are represented as matrices, we can formulate this slightly differently.

A <- matrix(a, nrow = 2, ncol = 1) 
B <- matrix(b, nrow = 2, ncol = 1)
list(A = A, B = B)
$A
     [,1]
[1,]    1
[2,]    0

$B
     [,1]
[1,]    0
[2,]    1

Notice that each matrix is a “column vector”: because it is a matrix, the vector could be laid out as a row or as a column, and that matters when doing operations with other matrices. When multiplying two matrices, each row of the first matrix is paired with each column of the second matrix, and the sum of their elementwise products becomes one entry of the result. If we want to use matrix operations to take the dot product of these two vectors, we will need to transpose one or the other of them.

t(A) %*% B
     [,1]
[1,]    0

If we combine A and B into a single \(2 \times 2\) matrix C, we can multiply it with itself.

C <- cbind(A = A, B = B)
t(C) %*% C
     [,1] [,2]
[1,]    1    0
[2,]    0    1

This is the matrix cross product, and can be expressed in R simply as:

crossprod(C)
     [,1] [,2]
[1,]    1    0
[2,]    0    1

We see this produces the identity matrix: the zeros off the diagonal indicate that A and B are orthogonal (their dot product is zero), and the ones on the diagonal indicate that each vector has unit length.

This has all been building up to showing that the cross product of the matrix of eigenvectors (each eigenvector is a column in this matrix) produces a matrix with ones along the diagonal and zeros everywhere else, demonstrating that they are orthogonal. In the following, I compute the cross product of only the first 10 eigenvectors to constrain the output, but the same is true for all pairs of eigenvectors.

crossprod(e$vectors[, 1:10]) |> knitr::kable(digits = 2)
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1

The eigenvectors are ordered from most to least “important”, where importance is determined by the proportion of variance each explains in the similarity matrix. This “importance” is captured by the eigenvalues, which we can plot below:

plot(e$values, xlab = "component", ylab = "eigenvalue")
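To make “importance” concrete, each eigenvalue can be divided by the sum of all eigenvalues to express it as a proportion of the total (a minimal sketch; only the first 10 proportions are shown):

# The eigenvalues of this 60 x 60 cosine similarity matrix sum to its trace (60),
# so each one can be read as a proportion of the total.
round(e$values / sum(e$values), 3)[1:10]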

The first three eigenvalues clearly separate from the remaining 57. A principal components analysis would show that the first 30 voxels load on these three components, while the last 30 voxels do not.
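Here is one way to check that claim (a minimal sketch using prcomp(); the absolute loadings on the first three components are averaged within the signal and noise groups):

# PCA on the simulated fMRI data. Signal voxels (1-30) should load much more
# strongly on the first three components than noise voxels (31-60).
p <- prcomp(X, scale. = TRUE)
abs_loadings <- abs(p$rotation[, 1:3])
rbind(
    signal_voxels = colMeans(abs_loadings[1:30, ]),
    noise_voxels  = colMeans(abs_loadings[31:60, ])
)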

Let’s now consider how the eigenvalues would look if we construct the correlation matrix based on a few subsets of voxels:

Omitting most signal carrying voxels

We can see that the first three eigenvalues shrink substantially.

r <- coop::cosine(X[, c(1:3, 31:60)])  # one voxel per signal dimension, plus all 30 noise voxels
plot(eigen(r)$values)

Omitting all voxels relevant to dimensions 2 and 3

We can see that the first eigenvalue is large, but 2 and 3 are reduced substantially.

r <- coop::cosine(X[, c(seq(1, 30, by = 3), 31:60)])  # only voxels derived from dimension 1, plus all 30 noise voxels
plot(eigen(r)$values)

In summary, we have a matrix X of simulated fMRI data that expresses the similarity structure in y, but:

  1. No single voxel contains all the information needed to explain variance in y. y is the cosine similarity of points in a 3D space, and no voxel in X contains information about more than one of those dimensions.
  2. Half of the voxels carry no signal at all.
  3. When all 30 signal carrying voxels are considered together, the three dimensions of signal are clear in the data, but if we under-sample the signal carrying voxels the structure becomes less apparent in the cosine similarity matrix.

Doing basic RSA

RSA tests whether two similarity matrices are more correlated than would be expected by chance, given the number of observations in their lower triangles; in other words, whether the similarity among experimental items is reflected in the patterns of neural activity they are associated with (Kriegeskorte et al., 2008).

All we are doing in the following is taking the lower triangle of the target cosine similarity matrix, y, which I am calling rsm (representational similarity matrix) and correlating it with the lower triangle of a cosine similarity matrix derived from a set of voxels from the fMRI data, which I am calling nsm (neural similarity matrix).

When doing RSA on real data, you do not necessarily know where the signal is, and there are many more than 60 voxels. In the series of analyses below, I make different selections from the fMRI data to show how the correlation between rsm and nsm is affected.

Note that the signal in this simulated fMRI dataset is MUCH stronger than in real data. Real RSA analyses typically report correlations \(<.1\).

Finally, I report the Pearson and Spearman correlations for each analysis. It is conventional in the RSA literature to use the Spearman (rank) correlation. This relaxes the analysis somewhat: with Pearson’s correlation, we are looking for linear differences in similarity between pairs of items to be recovered from the neural activity; with Spearman’s, we are only looking for the rank order of the pairwise similarities to be preserved between the rsm and the nsm.
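As a reminder of the relationship between the two (a minimal sketch with made-up vectors and no tied values):

# Spearman's rho is Pearson's r computed on the ranks of the values.
a <- c(3, 10, 2, 6, 1)
b <- c(2, 9, 4, 7, 3)
cor(a, b, method = "spearman")
cor(rank(a), rank(b), method = "pearson")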

methods <- c("Pearson's r" = "pearson", "Spearman's Rho" = "spearman")
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X)[lt]  # all 60 voxels
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.7504577      0.7551937 
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X[, 1:30])[lt]  # only the 30 signal-carrying voxels
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.7930671      0.7960409 
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X[, seq(1, 30, by = 3)])[lt]  # 10 voxels, all derived from dimension 1
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.5315229      0.5306405 
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X[, 1:15])[lt]  # 15 signal voxels (each dimension represented 5 times)
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.6811627      0.6800492 
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X[, 16:60])[lt]  # 15 signal voxels plus all 30 noise voxels
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.6248428      0.6228495 
lt <- lower.tri(y)
rsm <- y[lt]
nsm <- coop::tcosine(X[, 28:60])[lt]  # only 3 signal voxels plus all 30 noise voxels
vapply(methods, function(m) cor(rsm, nsm, method = m), numeric(1))
   Pearson's r Spearman's Rho 
     0.2064991      0.1976949 

A short selection of papers on RSA

Cai, M. B., Schuck, N. W., Pillow, J. W., & Niv, Y. (2019). Representational structure or task structure? Bias in neural representational similarity analysis and a bayesian method for reducing bias. PLOS Computational Biology, 15(5), e1006299. https://doi.org/10.1371/journal.pcbi.1006299
Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLOS Computational Biology, 13(4), e1005508. https://doi.org/10.1371/journal.pcbi.1005508
Dimsdale-Zucker, H. R., & Ranganath, C. (2018). Representational similarity analyses. In Handbook of in vivo neural plasticity techniques (pp. 509–525). Elsevier. https://doi.org/10.1016/b978-0-12-812028-6.00027-6
Dujmović, M., Bowers, J. S., Adolfi, F., & Malhotra, G. (2022). Obstacles to inferring mechanistic similarity using Representational Similarity Analysis. bioRxiv. https://doi.org/10.1101/2022.04.05.487135
Friston, K. J., Diedrichsen, J., Holmes, E., & Zeidman, P. (2019). Variational representational similarity analysis. NeuroImage, 201, 115986. https://doi.org/10.1016/j.neuroimage.2019.06.064
Kaniuth, P., & Hebart, M. N. (2022). Feature-reweighted representational similarity analysis: A method for improving the fit between computational models, brains, and behavior. NeuroImage, 257, 119294. https://doi.org/10.1016/j.neuroimage.2022.119294
Keshavarz, S., Babaee, N., & Hossein-Zadeh, G.-A. (2024). Comparison of different dissimilarity measures in representational similarity analysis of multimodal EEG-fMRI data. 2024 10th International Conference on Signal Processing and Intelligent Systems (ICSPIS), 224–229. https://doi.org/10.1109/icspis65223.2024.10931068
Kriegeskorte, N., Mur, M., & Bandettini, P. (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. https://doi.org/10.3389/neuro.06.004.2008
Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553. https://doi.org/10.1371/journal.pcbi.1003553
Sha, L., Haxby, J. V., Abdi, H., Guntupalli, J. S., Oosterhof, N. N., Halchenko, Y. O., & Connolly, A. C. (2015). The animacy continuum in the human ventral vision pathway. Journal of Cognitive Neuroscience, 27(4), 665–678. https://doi.org/10.1162/jocn_a_00733

Footnotes

  1. A covariance matrix is an example of a similarity matrix that does not have ones on the diagonal. Instead, it has the item variances on the diagonal.