Visualizing Correlations with Corrgrams

by Gaston Sanchez

Intro

In this RPub, we look at different options in R for producing corrgrams, which visualize a matrix of correlations, like the figure below:

[Figure: example corrgram]


Corrgrams

A corrgram (sometimes mistakenly called a correlogram, a term that in time series analysis refers to a plot of autocorrelations) is a visual display technique that represents the pattern of relations among a set of variables in terms of their correlations.

Basically, a corrgram is a graphical representation of the cells of a correlation matrix. The idea is to display the pattern of correlations in terms of their signs and magnitudes, using visual thinning and correlation-based variable ordering so that similar variables end up next to each other. Moreover, the cells of the matrix can be shaded or colored to show the correlation value.
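
To make the idea of correlation-based variable ordering more concrete, here is a minimal sketch in base R. It sorts the variables by the angle of their coordinates on the first two eigenvectors of the correlation matrix, in the spirit of the ordering discussed in Friendly's paper. The helper name pca_order and the use of the built-in mtcars data are assumptions made purely for illustration; this is not the code that the packages below run internally.

```r
# Hypothetical helper (not part of any package): order variables by the
# angle of their coordinates on the first two eigenvectors of a
# correlation matrix.
pca_order <- function(cor_mat) {
  vecs <- eigen(cor_mat, symmetric = TRUE)$vectors
  e1 <- vecs[, 1]
  e2 <- vecs[, 2]
  # angle of each variable in the plane of the first two eigenvectors
  alpha <- ifelse(e1 > 0, atan(e2 / e1), atan(e2 / e1) + pi)
  order(alpha)
}

# Toy example: mtcars is only a stand-in dataset for this sketch
R_toy <- cor(mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")])
ord <- pca_order(R_toy)

# Apply the same permutation to rows and columns
R_toy_ordered <- R_toy[ord, ord]
round(R_toy_ordered, 2)
```

After reordering, variables with similar correlation patterns tend to sit in adjacent rows and columns, which is what makes blocks of positive or negative correlation easier to spot in a corrgram.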

In R, we have two useful packages that provide functions to plot corrgrams: corrgram (by Kevin Wright) and ellipse (by Duncan Murdoch and E. D. Chow):

library(corrgram)
library(ellipse)

To learn more about corrgrams, see "Corrgrams: Exploratory Displays for Correlation Matrices" by Michael Friendly.


Data

For illustration purposes, let's consider the dataset decathlon that comes with the package FactoMineR (by Husson, Josse, Le, and Mazet):

# 'FactoMineR'
library(FactoMineR)
# load data 'decathlon'
data(decathlon)
# more info about the data
help(decathlon)

Since our purpose is to work with the correlations among variables, we need to calculate the matrix of correlations of the first 10 variables in the data, which hold the results of the ten decathlon events:

# matrix of correlations
R = cor(decathlon[, 1:10])
round(R, 3)
##               100m Long.jump Shot.put High.jump   400m 110m.hurdle Discus
## 100m         1.000    -0.599   -0.356    -0.246  0.520       0.580 -0.222
## Long.jump   -0.599     1.000    0.183     0.295 -0.602      -0.505  0.194
## Shot.put    -0.356     0.183    1.000     0.489 -0.138      -0.252  0.616
## High.jump   -0.246     0.295    0.489     1.000 -0.188      -0.283  0.369
## 400m         0.520    -0.602   -0.138    -0.188  1.000       0.548 -0.118
## 110m.hurdle  0.580    -0.505   -0.252    -0.283  0.548       1.000 -0.326
## Discus      -0.222     0.194    0.616     0.369 -0.118      -0.326  1.000
## Pole.vault  -0.083     0.204    0.061    -0.156 -0.079      -0.003 -0.150
## Javeline    -0.158     0.120    0.375     0.172  0.004       0.009  0.158
## 1500m       -0.061    -0.034    0.116    -0.045  0.408       0.038  0.258
##             Pole.vault Javeline  1500m
## 100m            -0.083   -0.158 -0.061
## Long.jump        0.204    0.120 -0.034
## Shot.put         0.061    0.375  0.116
## High.jump       -0.156    0.172 -0.045
## 400m            -0.079    0.004  0.408
## 110m.hurdle     -0.003    0.009  0.038
## Discus          -0.150    0.158  0.258
## Pole.vault       1.000   -0.030  0.247
## Javeline        -0.030    1.000 -0.180
## 1500m            0.247   -0.180  1.000
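
A table with 100 cells is hard to scan by eye, which is the motivation for the plots that follow. As a complementary check, the sketch below lists the variable pairs with the largest absolute correlation. The helper name strongest_pairs and the small toy matrix (made-up values) are assumptions for illustration only.

```r
# Hypothetical helper: list the variable pairs with the largest
# absolute correlation in a correlation matrix.
strongest_pairs <- function(cor_mat, n = 5) {
  # lower triangle only: each pair once, diagonal excluded
  idx <- which(lower.tri(cor_mat), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    r    = cor_mat[idx],
    stringsAsFactors = FALSE
  )
  # sort by magnitude, ignoring the sign
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Toy correlation matrix with made-up values
toy <- matrix(c( 1.0, 0.2, -0.8,
                 0.2, 1.0,  0.5,
                -0.8, 0.5,  1.0),
              nrow = 3,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
res <- strongest_pairs(toy, 2)
res
```

Calling strongest_pairs(R, 3) on the matrix above should single out Shot.put and Discus (0.616), Long.jump and 400m (-0.602), and 100m and Long.jump (-0.599), which matches the printed table.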

Corrgrams with corrgram()

The primary function in the package corrgram is corrgram(). The main argument for this function is either a data frame with the data or a matrix of correlations. Here's how to use it (with default options):

# default corrgram
corrgram(R)

[Figure: default corrgram of the matrix R]

As you can see, with corrgram() we get a display of the cells in matrix R. The positive correlations are shown in blue, while the negative correlations are shown in red. The darker the hue, the greater the magnitude of the correlation.
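
Because corrgram() accepts either raw data or a matrix of correlations, it can be worth checking that the object you pass in really is what you think it is. The sketch below is a simple sanity check written in base R. The helper name looks_like_cor_matrix is hypothetical, and this is not how corrgram() itself decides; it is only a check you can run on your side.

```r
# Hypothetical helper (not corrgram's internal logic): does 'm' look
# like a correlation matrix?
looks_like_cor_matrix <- function(m, tol = 1e-8) {
  isTRUE(
    is.matrix(m) && is.numeric(m) &&
      nrow(m) == ncol(m) &&                  # square
      isSymmetric(unname(m), tol = tol) &&   # symmetric
      all(abs(diag(m) - 1) < tol) &&         # ones on the diagonal
      all(abs(m) <= 1 + tol)                 # values within [-1, 1]
  )
}

# mtcars is only a stand-in dataset for this sketch
looks_like_cor_matrix(cor(mtcars))        # TRUE
looks_like_cor_matrix(as.matrix(mtcars))  # FALSE (raw data, not square)
```

In this post, looks_like_cor_matrix(R) should be TRUE. Recent versions of corrgram also document a type argument ("data" or "cor") for stating the kind of input explicitly; check ?corrgram for the version you have installed.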


Lower Triangular corrgram()

Another plotting option is to display just one of the triangular parts of the matrix. This is done by specifying the arguments lower.panel and upper.panel:

# corrgram (lower triangular)
corrgram(R, order = FALSE, lower.panel = panel.shade, upper.panel = NULL, text.panel = panel.txt,
    main = "Decathlon Data")

[Figure: lower triangular corrgram, "Decathlon Data"]


corrgram() with Pie Charts

Besides shading and coloring each cell of the displayed matrix, it is also possible to use pie charts for visualizing the correlation values. I don't really like this type of representation but here's how you could use it:

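# corrgram with pie charts in the upper triangle and shading in the lower one;
# order = TRUE reorders the variables (PCA-based ordering)
corrgram(R, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie,
    text.panel = panel.txt, main = "Decathlon Data")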

[Figure: corrgram with pie charts, "Decathlon Data"]


Corrgrams with plotcorr()

The package ellipse provides the function plotcorr(), which also helps us visualize correlations. plotcorr() draws an ellipse-shaped glyph for each entry of the correlation matrix. Here's the default plot using our matrix R:

# default corrgram
plotcorr(R)

[Figure: default plotcorr() display of the matrix R]


plotcorr() with colors

You can use the argument col to specify colors when using plotcorr():

# colored corrgram
plotcorr(R, col = colorRampPalette(c("firebrick3", "white", "navy"))(10))

[Figure: plotcorr() display with a red-white-blue palette]

As you can tell, the colors here do not represent the magnitude of the correlation. They are assigned by position in the matrix, so with ten colors and ten variables each row of ellipses simply gets its own fill color.
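
If you do want the fill color to follow the correlation value, one option is to build a vector with one color per cell and pass that to col. The sketch below maps correlations in [-1, 1] onto a diverging palette using base R only. The helper name cor_to_col and the choice of an 11-step palette are assumptions for illustration, not something provided by the ellipse package.

```r
# Hypothetical helper: map each correlation to a colour on a diverging
# palette (first colour for -1, middle for 0, last for +1).
cor_to_col <- function(cor_mat,
                       palette = colorRampPalette(
                         c("firebrick3", "white", "navy"))(11)) {
  n <- length(palette)
  # rescale [-1, 1] to palette positions 1..n
  idx <- round((cor_mat + 1) / 2 * (n - 1)) + 1
  # guard against tiny numerical overshoot beyond [-1, 1]
  idx <- pmin(pmax(idx, 1), n)
  # returns one colour per cell, in column-major order for a matrix
  palette[idx]
}

# Quick check on the extremes and the midpoint
cols <- cor_to_col(c(-1, 0, 1))
cols
```

With the matrix from this post, a call such as plotcorr(R, col = cor_to_col(R)) should then give red ellipses for negative correlations and blue ones for positive correlations, with paler fills near zero.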


# another colored corrgram, showing only the lower triangle (type = "lower")
plotcorr(R, col = colorRampPalette(c("#E08214", "white", "#8073AC"))(10), type = "lower")

[Figure: lower triangular plotcorr() display with an orange-white-purple palette]