This document is a tutorial on how to calculate Goodman-Kruskal gamma correlations in R and RStudio. These calculations are important for metacognitive research in which judgments of learning (JOLs), feelings of knowing (FOKs), and confidence judgments (CJs) are compared to memory performance to assess relative accuracy (see note 1).

What are Gamma Correlations?

Simply put, gamma correlations are a measure of ordinal association between two vectors of numbers. Whereas a “normal” Pearson correlation measures the linear relationship between two sets of continuous or interval data, the gamma correlation treats scale numbers as ordinal (“rank”) data. Thus, scale metacognitive judgments, such as JOLs ranging from 0 to 100, are not analyzed by their absolute values; rather, they are treated as rank judgments of memory likelihood. A judgment of “10”, for example, would be considered a greater perceived likelihood than a judgment of “1”, but would not be considered 10 times more likely than “1” in a gamma correlation.
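To make the rank-only property concrete, here is a minimal base-R sketch. The vectors jol and recalled are made-up values for illustration, not data from this tutorial. Squaring the judgments changes their absolute values but not their order, so a rank-based statistic such as gamma is unaffected, while the Pearson correlation shifts.

```r
# Hypothetical judgments (0-100 scale) and recall outcomes, for illustration only
jol      <- c(1, 10, 35, 60, 100)
recalled <- c(0, 0, 1, 0, 1)

# A monotone transformation changes the values but keeps the ordering
jol_squared <- jol^2
same_ranks  <- identical(rank(jol), rank(jol_squared))
print(same_ranks)

# Pearson uses the absolute values, so it differs between the two versions
pearson_raw     <- cor(jol, recalled)
pearson_squared <- cor(jol_squared, recalled)
print(c(raw = pearson_raw, squared = pearson_squared))
```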

Gamma correlations are computed pair-wise, meaning that each memory outcome (0 or 1) is paired with its corresponding judgment (JOL, FOK, etc.), and each of these observations is compared to every other observation. Two observations that are ranked in the same order on both variables are considered “concordant”, and two that are ranked in the reverse order are considered “discordant”; observations that are tied on either variable are ignored. Instances are tallied, and the difference between the numbers of concordant and discordant pairs is divided by their sum to return a value from -1 to 1, with values less than 0 indicating a negative relationship between the judgment and subsequent memory (e.g. larger JOLs for items that are unrecalled) and values greater than 0 indicating a positive relationship (e.g. larger JOLs for items that are recalled).

\(G = \frac{N_c - N_d}{N_c + N_d}\)
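The formula can be checked by hand with a few lines of base R. The sketch below is only an illustration of the pair-counting idea: gk_gamma_manual is a hypothetical helper name (it is not part of any package), and the toy vectors are made up.

```r
# Count concordant and discordant pairs directly (base R only).
# gk_gamma_manual is a hypothetical helper name, not a package function.
gk_gamma_manual <- function(x, y) {
  # Drop observations with a missing value on either variable
  keep <- !is.na(x) & !is.na(y)
  x <- x[keep]
  y <- y[keep]
  # Sign of the difference on each variable for every pair (i, j)
  dx <- sign(outer(x, x, "-"))
  dy <- sign(outer(y, y, "-"))
  # Keep each unordered pair once; a product of 0 marks a tie and is ignored
  agreement <- (dx * dy)[upper.tri(dx)]
  n_c <- sum(agreement > 0)  # same order on both variables
  n_d <- sum(agreement < 0)  # opposite order
  list(concordant = n_c, discordant = n_d, gamma = (n_c - n_d) / (n_c + n_d))
}

# Made-up toy data: five items with a recognition outcome and an FOK
recog_toy <- c(0, 0, 1, 1, 1)
fok_toy   <- c(10, 40, 30, 80, 80)
result <- gk_gamma_manual(recog_toy, fok_toy)
print(result)
```

For this toy input, five pairs are concordant and one is discordant, so G = (5 - 1) / (5 + 1), or about 0.67. The four pairs that share a recognition outcome are ties and do not enter the calculation.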

Calculations in R

As a note: If you are viewing this in R as a Notebook, then you can execute the code in the boxes by clicking Run or by clicking inside of the chunk and pressing Ctrl+Shift+Enter. If you are viewing this as a PDF or HTML, the output will appear below the associated box.

The first thing that you will need to do (assuming that you already have RStudio set up) is install and initialize the packages associated with gammas and the computations below. The code here will only initialize it, but installation code is available in the commented sections.

# install.packages("Hmisc")
library(Hmisc)

Hmisc is a freely-available package (abbreviated from “Harrell Miscellaneous”) that supplements the functions that ship with R. Most of its functions are of no interest to us here, except for rcorr.cens, which provides the Goodman-Kruskal correlation, provided that you call it with the correct arguments. We will get into that later.

To begin, let’s generate a dataset to work with. For this example, we will be using scale FOKs from 0 to 100, with an FOK of 0 meaning no confidence and an FOK of 100 meaning absolute confidence, and recognition outcome data, with 0 meaning an incorrect recognition and 1 meaning a correct recognition. The code below randomly generates 60 FOKs from 0 - 100 and 60 recognition memory outcomes:

# Randomly generate FOK dataset
recog <- sample(0:1,60,replace=TRUE)
fok <- sample(0:100,60,replace=TRUE)
# Show generated data
print(recog)
 [1] 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 1 1 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 0 1 0 0 1 0
print(fok)
 [1] 13 29 12 88 74 71 56 62 97 65 75 53 45 26 57 77 94 14 13 74 52 96 41 36 41 35 66 32 87 64 21 11 44 76 36 44 15 19 34 85
[41] 82  6  6 18 22 78 40 37 40 38 27 23 20 32 18 88 80 12 48 77

The easiest way to calculate the GK gamma statistic is to run the command rcorr.cens(x, y, outx = TRUE), where x and y are the two vectors of numbers you want to compare. The outx = TRUE argument is crucial to the gamma calculation because, by default (outx = FALSE), rcorr.cens keeps pairs that are tied on x, and tied pairs are not part of the gamma calculation. Specifying outx = TRUE throws out those ties and gives you a true gamma statistic.
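A small sketch of why the argument matters, using made-up toy vectors with deliberate ties (not the tutorial's data). It assumes Hmisc is installed. Counting by hand, these five items give 5 concordant and 1 discordant pair, so gamma should be 2/3. The call with outx = TRUE should reproduce that value, while the default call should return a smaller number because the pairs tied on x stay in the denominator.

```r
library(Hmisc)

# Made-up toy data with ties on both variables (illustration only)
recog_toy <- c(0, 0, 1, 1, 1)
fok_toy   <- c(10, 40, 30, 80, 80)

# Ties on x dropped: Dxy is the Goodman-Kruskal gamma
with_outx <- rcorr.cens(recog_toy, fok_toy, outx = TRUE)["Dxy"]

# Default (outx = FALSE): pairs tied on x are kept as relevant pairs
without_outx <- rcorr.cens(recog_toy, fok_toy, outx = FALSE)["Dxy"]

print(c(outx_TRUE = unname(with_outx), outx_FALSE = unname(without_outx)))
```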

Here is an implementation of the code using the fok and recog data we generated a second ago:

# Generate gamma correlation
rcorr.cens(recog, fok, outx = TRUE)
       C Index            Dxy           S.D.              n        missing     uncensored Relevant Pairs     Concordant 
     0.4420697     -0.1158605      0.1552779     60.0000000      0.0000000     60.0000000   1778.0000000    786.0000000 
     Uncertain 
     0.0000000 

Notice that the output gives you explicit information about the calculation being made, including the number of relevant pairs (those that remain after ties are thrown out), the number of concordant pairs, and so on. The most important piece of output here is Dxy, which is the gamma statistic. For this example, the generated data have a weak negative relationship (-0.116), but if you do this on your own, you will generate a different statistic (see note 2).
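As a check on how the pieces of this output fit together, the gamma formula can be applied to the counts by hand. The sketch below copies the numbers printed above and assumes that, with no uncertain pairs, every relevant pair is either concordant or discordant, which is consistent with the reported C Index (786 / 1778).

```r
# Values copied from the rcorr.cens output above
relevant   <- 1778
concordant <- 786
c_index    <- 0.4420697

# Assumption: with "Uncertain" equal to 0, the remaining relevant pairs are discordant
discordant <- relevant - concordant

# Gamma from the counts: (N_c - N_d) / (N_c + N_d)
gamma_from_counts <- (concordant - discordant) / (concordant + discordant)

# Dxy is a rescaling of the C Index from the 0-1 range to the -1 to 1 range
dxy_from_c <- 2 * c_index - 1

print(round(c(from_counts = gamma_from_counts, from_c_index = dxy_from_c), 4))
```

Both routes land on the Dxy value in the output, -0.116 after rounding.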

The output is useful, but for batch commands, sorting through the numbers and copying and pasting can be tedious. A better way to compute the gamma statistic is to specify that you only want the second element of the output (i.e., Dxy):

# Generate gamma statistic only (element 2 of the output is Dxy)
gamma <- rcorr.cens(recog, fok, outx = TRUE)[2]
# Display gamma statistic
print(paste0("The gamma correlation is: ", gamma))
[1] "The gamma correlation is: -0.115860517435321"

By specifying the information from the computation that you want to save (i.e., the gamma statistic, or [2]), you are telling R to save it as a scalar, making it easier to write to file.
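For a batch of participants, the same extraction can be wrapped in a loop over subjects. The following is a minimal sketch under stated assumptions: the data frame all_data, its column names (subj, recog, fok), and its values are all hypothetical, and the file is written to a temporary location. It extracts Dxy by name, which should be equivalent to [2] given the output layout shown earlier.

```r
library(Hmisc)

# Hypothetical long-format data: one row per item, two made-up participants.
# The column names subj, recog, and fok are assumptions for this sketch.
all_data <- data.frame(
  subj  = rep(c("s01", "s02"), each = 6),
  recog = c(0, 0, 1, 1, 0, 1,   1, 0, 1, 0, 0, 1),
  fok   = c(10, 20, 60, 80, 30, 90,   50, 70, 40, 30, 60, 80)
)

# One gamma per participant, extracted by name rather than by position
gammas <- sapply(split(all_data, all_data$subj), function(d) {
  unname(rcorr.cens(d$recog, d$fok, outx = TRUE)["Dxy"])
})

# Collect the results in a data frame with one row per participant
gamma_table <- data.frame(subj = names(gammas), gamma = unname(gammas))
print(gamma_table)

# Write the per-participant gammas to a CSV file (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(gamma_table, out_file, row.names = FALSE)
```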

Alternatives

While using Hmisc seems to be the easiest way to calculate and store gamma correlations for large datasets, there are other ways to calculate gammas in R. The vcdExtra package has a function called GKgamma that performs the same operation:

# Initialize vcdExtra, or install if needed
# install.packages("vcdExtra")
library(vcdExtra)
# Make FOK data into a table
gammaTable <- table(recog,fok)
# Compute gamma using GKgamma
GKgamma(gammaTable)
gamma        : -0.116 
std. error   : 0.155 
CI           : -0.42 0.188 

GKgamma requires that the data be arranged as a two-way contingency table in order to run the calculation, which is why recog and fok are cross-tabulated into gammaTable in the code (one row per recog value, one column per distinct fok value). The function also does not print all of the specific information that rcorr.cens does (as demonstrated in the output), but you can save the statistic as a scalar to write to file:
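To see what such a table looks like and why it carries enough information for gamma, here is a base-R sketch with made-up toy vectors. The hand count below is only an illustration of the idea and is not taken from the vcdExtra source: for each cell, observations in cells further down and to the right are concordant with it, and those further down and to the left are discordant.

```r
# Made-up toy data (illustration only)
recog_toy <- c(0, 1, 1, 0, 1, 0)
fok_toy   <- c(20, 80, 50, 20, 20, 50)

# Cross-tabulate: rows are recog levels, columns are distinct fok values
toy_table <- table(recog_toy, fok_toy)
print(toy_table)
print(dim(toy_table))

# Hand count of concordant and discordant pairs from the table cells
n_c <- 0
n_d <- 0
n_row <- nrow(toy_table)
n_col <- ncol(toy_table)
for (i in seq_len(n_row - 1)) {
  below <- toy_table[(i + 1):n_row, , drop = FALSE]
  for (j in seq_len(n_col)) {
    # Cells below and to the right rank higher on both variables
    if (j < n_col) n_c <- n_c + toy_table[i, j] * sum(below[, (j + 1):n_col])
    # Cells below and to the left rank higher on one variable, lower on the other
    if (j > 1)     n_d <- n_d + toy_table[i, j] * sum(below[, 1:(j - 1)])
  }
}
gamma_from_table <- (n_c - n_d) / (n_c + n_d)
print(gamma_from_table)
```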

# Generate gamma statistic only ($gamma returns a plain number; [1] would return a list)
gamma2 <- GKgamma(gammaTable)$gamma
# Display gamma statistic
print(paste0("The gamma correlation is: ", gamma2))
[1] "The gamma correlation is: -0.115860517435321"


Example With Real Data

Below is an example with real data collected from an experiment conducted at the Georgia Institute of Technology. The dataset, which I have named subjdata, holds the data from a single participant and is stored in a CSV file, which I loaded outside of the code presented above. Here are the data from that participant:

print(subjdata)

Ignore every column other than “Recog” and “CJ” for now (for example, “Recall” and “Err”). We will just be calculating the gamma correlation between recognition memory accuracy (“Recog”) and confidence judgments (“CJ”) for the first example.

The calculation here is the same as before, but with slightly different notation: Because we are pulling data from a data frame instead of standalone vectors, we have to specify the columns from subjdata that we want to analyze. For CJs, we will put in subjdata$CJ, and for the recognition data, we will put in subjdata$Recog:

# Compute gamma statistic
rcorr.cens(subjdata$Recog, subjdata$CJ, outx = TRUE)
       C Index            Dxy           S.D.              n        missing     uncensored Relevant Pairs     Concordant 
     0.8259494      0.6518987      0.1333313     40.0000000      0.0000000     40.0000000    632.0000000    522.0000000 
     Uncertain 
     0.0000000 

As you can see, this particular participant’s CJs were well-aligned with actual recognition memory performance, yielding a moderately high positive gamma correlation (0.652).

More can be done with the data, such as computing the correlation between FOKs and recognition memory performance for unrecalled items. To do this, I must select the data associated with unrecalled items as a subset:

# Select FOKs for unrecalled items
x <- as.numeric(unlist(subset(subjdata, Recall == 0, select=c(FOK))))
# Select recognition data for unrecalled items
y <- as.numeric(unlist(subset(subjdata, Recall == 0, select=c(Recog))))
# Compute gamma correlation
rcorr.cens(x, y, outx = TRUE)
       C Index            Dxy           S.D.              n        missing     uncensored Relevant Pairs     Concordant 
     0.6034483      0.2068966      0.2521036     22.0000000      1.0000000     22.0000000    232.0000000    140.0000000 
     Uncertain 
     0.0000000 

The correlation here is less impressive (0.207), but does show that this particular participant has some understanding of their own future recognition memory performance when giving an FOK for an unrecalled item, as indicated by the weak positive relationship.
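The output above reports one missing observation, which rcorr.cens drops before counting pairs. To check a subset before computing gamma, something like the following base-R sketch can help. The data frame subjdata_toy and its values are made up; only the column names (Recall, Recog, FOK) are taken from the code above.

```r
# Hypothetical stand-in for subjdata, with made-up values and two missing FOKs
subjdata_toy <- data.frame(
  Recall = c(1, 0, 0, 1, 0, 0),
  Recog  = c(1, 1, 0, 1, 0, 1),
  FOK    = c(90, 70, 20, NA, NA, 55)
)

# Keep only the unrecalled items and the two columns of interest
unrecalled <- subset(subjdata_toy, Recall == 0, select = c(FOK, Recog))

# Count the rows with a missing value on either column
n_missing <- sum(!complete.cases(unrecalled))

# Rows that would actually enter the pair counting
usable <- unrecalled[complete.cases(unrecalled), ]

print(unrecalled)
print(c(selected = nrow(unrecalled), missing = n_missing, usable = nrow(usable)))
```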


  1. JOLs are prospective memory judgments made during study to gauge confidence in future recall performance and are compared to recall outcomes. FOKs are also prospective memory judgments, but are made about future recognition performance during recall. CJs are retrospective memory judgments and gauge confidence that a recalled or recognized item is correct.

  2. Note that because these are randomly generated data, the FOK and recognition data have essentially no relationship to each other, so the expected value of gamma is \(G = 0\); any single sample will deviate from 0 by chance.

