In this RPubs post, we are going to look at different options in R for producing corrgrams to visualize a matrix of correlations, like the figure below:
A corrgram (sometimes mistakenly called a correlogram, a term usually reserved for plots of autocorrelations) is a visual display technique that helps us represent the pattern of relations among a set of variables in terms of their correlations.
In other words, a corrgram is a graphical representation of the cells of a correlation matrix. The idea is to display the pattern of correlations in terms of their signs and magnitudes, using visual thinning and correlation-based variable ordering. In addition, the cells of the matrix can be shaded or colored to show the correlation value.
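To make the shading idea concrete, here is a minimal base-R sketch that maps correlations onto a diverging palette: red for negative values, white for zero and blue for positive values. It is not code taken from corrgram or ellipse; the helper name cor_to_color() and the palette are illustrative assumptions.

```r
# Hypothetical helper (not part of corrgram or ellipse): map correlations
# in [-1, 1] to colours from a diverging red-white-blue palette
cor_to_color <- function(r, n = 11) {
  palette <- colorRampPalette(c("red", "white", "blue"))(n)
  # rescale r from [-1, 1] to a palette index in 1..n
  idx <- round((r + 1) / 2 * (n - 1)) + 1
  palette[idx]
}

cor_to_color(c(-1, -0.5, 0, 0.5, 1))
```

Values near -1 or 1 land at the ends of the palette and get the most saturated colours, while values near 0 fade to white. This is the kind of sign-and-magnitude encoding described above.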
In R, we have two useful packages that provide functions to plot corrgrams: corrgram (by Kevin Wright) and ellipse (by Duncan Murdoch and E. D. Chow):
library(corrgram)
library(ellipse)
To learn more about corrgrams, see "Corrgrams: Exploratory Displays for Correlation Matrices" by Michael Friendly.
For illustration purposes, let's consider the dataset decathlon that comes with the package FactoMineR (by Husson, Josse, Le, and Mazet):
# 'FactoMineR'
library(FactoMineR)
# load data 'decathlon'
data(decathlon)
# more info about the data
help(decathlon)
Since we want to work with the correlations among variables, we need to compute the correlation matrix of the first 10 variables in the data (the ten decathlon events):
# matrix of correlations
R = cor(decathlon[, 1:10])
round(R, 3)
##               100m Long.jump Shot.put High.jump   400m 110m.hurdle Discus
## 100m         1.000    -0.599   -0.356    -0.246  0.520       0.580 -0.222
## Long.jump   -0.599     1.000    0.183     0.295 -0.602      -0.505  0.194
## Shot.put    -0.356     0.183    1.000     0.489 -0.138      -0.252  0.616
## High.jump   -0.246     0.295    0.489     1.000 -0.188      -0.283  0.369
## 400m         0.520    -0.602   -0.138    -0.188  1.000       0.548 -0.118
## 110m.hurdle  0.580    -0.505   -0.252    -0.283  0.548       1.000 -0.326
## Discus      -0.222     0.194    0.616     0.369 -0.118      -0.326  1.000
## Pole.vault  -0.083     0.204    0.061    -0.156 -0.079      -0.003 -0.150
## Javeline    -0.158     0.120    0.375     0.172  0.004       0.009  0.158
## 1500m       -0.061    -0.034    0.116    -0.045  0.408       0.038  0.258
##             Pole.vault Javeline  1500m
## 100m            -0.083   -0.158 -0.061
## Long.jump        0.204    0.120 -0.034
## Shot.put         0.061    0.375  0.116
## High.jump       -0.156    0.172 -0.045
## 400m            -0.079    0.004  0.408
## 110m.hurdle     -0.003    0.009  0.038
## Discus          -0.150    0.158  0.258
## Pole.vault       1.000   -0.030  0.247
## Javeline        -0.030    1.000 -0.180
## 1500m            0.247   -0.180  1.000
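With 45 distinct pairs, it is easy to lose track of the strongest relationships in a printed matrix. The base-R sketch below lists each pair once, taken from the upper triangle, sorted by the absolute value of its correlation. top_pairs() is a hypothetical helper, not a function from the packages used here, and the demo uses the built-in mtcars data so that the block runs on its own.

```r
# Hypothetical helper: list each pair of variables once (upper triangle)
# and sort the pairs by the absolute value of their correlation
top_pairs <- function(R, k = 5) {
  idx <- which(upper.tri(R), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(R)[idx[, "row"]],
    var2 = colnames(R)[idx[, "col"]],
    r = R[idx],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]
  rownames(pairs) <- NULL
  head(pairs, k)
}

# self-contained demo on a built-in dataset
top_pairs(cor(mtcars), k = 3)
```

Calling top_pairs(R) on the decathlon matrix above would put the Shot.put/Discus pair first, since its correlation of 0.616 is the largest in absolute value among the off-diagonal entries.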
corrgram()
The primary function in the package corrgram is corrgram(). Its main argument is either a data frame with the data or a matrix of correlations. Here's how to use it (with default options):
# default corrgram
corrgram(R)
As you can see, corrgram() displays the cells of the matrix R. Positive correlations are shown in blue and negative correlations in red; the darker the shade, the greater the magnitude of the correlation.
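The blue/red scheme is only the default. In recent versions of corrgram, the col.regions argument takes a palette function (for example one returned by colorRampPalette()) that corrgram() calls to obtain its shading colours. The sketch below swaps in an orange-white-purple palette; it uses the built-in mtcars data so that it runs on its own, and the colours are just an example choice.

```r
library(corrgram)

# correlation matrix of a built-in dataset (the decathlon R works the same way)
R_cars <- cor(mtcars)

# col.regions expects a palette *function*, not a vector of colours
orange_purple <- colorRampPalette(c("#E08214", "white", "#8073AC"))
corrgram(R_cars, col.regions = orange_purple, main = "mtcars")
```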
corrgram()
Another plotting option is to display just one of the triangular parts of the matrix. This is done with the arguments lower.panel and upper.panel; setting one of them to NULL leaves that triangle empty:
# corrgram (lower triangle only; variables kept in their original order)
corrgram(R, order = FALSE, lower.panel = panel.shade, upper.panel = NULL,
         text.panel = panel.txt, main = "Decathlon Data")
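Since corrgram() also accepts the raw data, the two triangles can hold panels that need the individual observations, such as scatterplots (panel.pts), which cannot be drawn from a correlation matrix alone. The sketch below passes a few columns of the built-in mtcars data; decathlon[, 1:10] could be passed the same way. The choice of columns and panels is illustrative.

```r
library(corrgram)

# raw data (not a correlation matrix): corrgram() computes the
# correlations itself, so data-based panels can be used
car_vars <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
corrgram(car_vars, order = TRUE,
         lower.panel = panel.shade,   # shaded correlation cells
         upper.panel = panel.pts,     # scatterplot of each pair
         diag.panel = panel.minmax,   # min and max of each variable
         text.panel = panel.txt,
         main = "mtcars (raw data input)")
```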
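corrgram() with Pie Charts
Besides shading or coloring each cell of the displayed matrix, it is also possible to use pie charts to visualize the correlation values, with the filled portion of each pie proportional to the size of the correlation. I don't really like this type of representation, but here's how you could use it. Note that the call also sets order = TRUE, which reorders the variables based on a principal component analysis of the correlation matrix: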
# corrgram with pie charts
corrgram(R, order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie,
text.panel = panel.txt, main = "Decathlon Data")
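The order = TRUE argument in the call above reorders the variables. Friendly's paper describes one such ordering: take the first two eigenvectors of the correlation matrix, compute the angle of each variable in that plane, and sort by angle. The base-R sketch below implements that idea under a hypothetical helper name, angular_order(); corrgram's internal code may differ in details such as where the ordering starts.

```r
# Sketch of eigenvector-angle ordering (after Friendly, 2002).
# angular_order() is a hypothetical helper, not a corrgram function.
angular_order <- function(R) {
  # first two eigenvectors of the correlation matrix
  e <- eigen(R, symmetric = TRUE)$vectors[, 1:2]
  # angle of each variable in the (e1, e2) plane
  alpha <- ifelse(e[, 1] > 0,
                  atan(e[, 2] / e[, 1]),
                  atan(e[, 2] / e[, 1]) + pi)
  order(alpha)
}

R_cars <- cor(mtcars)
ord <- angular_order(R_cars)
colnames(R_cars)[ord]          # variables in angular order
R_ordered <- R_cars[ord, ord]  # reordered matrix, ready to plot
```

Variables whose correlation profiles are similar end up with similar angles, so they sit next to each other in the reordered matrix.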
plotcorr()
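The package ellipse provides the function plotcorr(), which also helps us visualize correlations. plotcorr() draws an ellipse-shaped glyph for each entry of the correlation matrix: the narrower the ellipse, the stronger the correlation, and ellipses tilted to the right indicate positive correlations while those tilted to the left indicate negative ones. Here's the default plot using our matrix R: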
# default corrgram
plotcorr(R)
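The shape of each glyph depends only on the correlation r. One standard way to draw such a glyph is the parametric curve (cos(a + d/2), cos(a - d/2)) for a from 0 to 2π, with d = acos(r): it is a circle when r = 0, narrows as |r| grows, and collapses to a line when r = ±1. The base-R sketch below draws a few of these glyphs; cor_ellipse() is a hypothetical helper, not plotcorr()'s internal code.

```r
# Hypothetical helper: points on an ellipse whose shape encodes the
# correlation r, using (cos(a + d/2), cos(a - d/2)) with d = acos(r)
cor_ellipse <- function(r, npoints = 100) {
  d <- acos(r)
  a <- seq(0, 2 * pi, length.out = npoints)
  cbind(x = cos(a + d / 2), y = cos(a - d / 2))
}

# draw glyphs for a few correlation values side by side
rs <- c(-0.9, -0.5, 0, 0.5, 0.9)
op <- par(mfrow = c(1, length(rs)), mar = c(1, 1, 2, 1))
for (r in rs) {
  plot(cor_ellipse(r), type = "n", asp = 1, axes = FALSE,
       xlab = "", ylab = "", main = paste("r =", r))
  polygon(cor_ellipse(r), col = "grey")
}
par(op)
```

A handy property of this parameterization is that the points on the curve themselves have correlation (approximately) r.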
plotcorr() with colors
You can use the argument col to specify colors when using plotcorr():
# colored corrgram
plotcorr(R, col = colorRampPalette(c("firebrick3", "white", "navy"))(10))
As you can tell, the colors do not represent the magnitude of the correlation. The 10 colors are simply recycled over the cells of the 10 x 10 matrix, so each row of ellipses gets filled with a single color.
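Assuming plotcorr() recycles the col vector over the cells column by column (R's usual matrix order), the one-colour-per-row pattern can be reproduced in base R without plotting:

```r
# 10 colours, as in the plotcorr() call above
cols <- colorRampPalette(c("firebrick3", "white", "navy"))(10)

# recycle them over the 100 cells of a 10 x 10 matrix, filling
# column by column (R's default matrix order)
cell_cols <- matrix(rep(cols, length.out = 100), nrow = 10)

# every cell in row i holds cols[i], i.e. one colour per row
all(cell_cols == cols[row(cell_cols)])
```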
# another colored corrgram
plotcorr(R, col = colorRampPalette(c("#E08214", "white", "#8073AC"))(10), type = "lower")
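To make the colours encode the correlation itself, you can build one colour per cell from the correlation values, similar to the example on plotcorr()'s help page, which indexes a palette with 5 * corr + 6. The sketch below follows that idea with an 11-colour palette and the built-in mtcars data; the palette colours are our own choice.

```r
library(ellipse)

R_cars <- cor(mtcars)

# 11 diverging colours: index 1 for r = -1, 6 for r = 0, 11 for r = 1
palette11 <- colorRampPalette(c("firebrick3", "white", "navy"))(11)

# one colour per cell, picked by the value of that cell's correlation
cell_cols <- palette11[round(5 * R_cars) + 6]

plotcorr(R_cars, col = cell_cols)
```

With this mapping, darker red marks stronger negative correlations and darker blue stronger positive ones, so colour and ellipse shape carry the same information.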