Introduction

The corrplot package provides a graphical display of a correlation matrix and its confidence intervals. It also offers many options for choosing colors, titles, text labels, label rotation, and so on, and it can visualize general (non-correlation) matrices as well.

The corrplot library can be loaded with the command below.

library('corrplot')
## corrplot 0.84 loaded
# Loading ISLR for reference dataset 'College'.
library(ISLR)

Usage

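corrplot(corr, method = c("circle", "square", "ellipse", "number", "shade",
    "color", "pie"), type = c("full", "lower", "upper"), add = FALSE,
    col = NULL, title = "", diag = TRUE, mar = c(0, 0, 0, 0),
    order = c("original", "AOE", "FPC", "hclust", "alphabet"),
    addrect = NULL, rect.col = "black", tl.pos = NULL, ...)

Only the arguments used in this article are listed here; the reference manual documents the rest.

Arguments

  corr          the correlation matrix to visualize.

  method        the visualization method, one of "circle" (the default), "square", "ellipse", "number", "shade", "color" or "pie".

  type          the layout: "full" (the default) shows the whole matrix, "lower" and "upper" show only the lower or upper triangle.

  add           if TRUE, the plot is added to an existing plot instead of starting a new one, default=FALSE.

  col           a vector of colors for the glyphs, default is NULL, which uses the built-in palette.

  title         the title of the plot.

  diag          whether to display the coefficients on the principal diagonal, default=TRUE.

  mar           the plot margins, see par.

  order         the ordering of the variables: "original" (the default), "AOE" (angular order of the eigenvectors), "FPC" (first principal component order), "hclust" (hierarchical clustering order) or "alphabet".

  addrect       number of rectangles drawn around the hierarchical clusters, only used when order is "hclust". The default is NULL, which draws none.

  rect.col      color of the rectangle borders, default is "black".

  tl.pos        position of the text labels, for example "lt" (left and top), "d" (diagonal) or "n" (no labels).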
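The color, title and label options mentioned in the introduction can be sketched as follows. This is a minimal example, assuming the ISLR College data used throughout this article. The palette `pal` and the title text are illustrative choices, not package defaults.

```r
library(corrplot)
library(ISLR)  # provides the College dataset

# Correlation matrix of the numeric columns only
College_num  <- College[, sapply(College, is.numeric)]
College_corr <- cor(College_num)

# Illustrative diverging palette: red (negative) through white to blue (positive)
pal <- colorRampPalette(c("firebrick", "white", "steelblue"))(200)

corrplot(College_corr,
         method = "color",
         col    = pal,
         title  = "Correlations in the College data",
         tl.col = "black",          # label color
         tl.srt = 45,               # label rotation in degrees
         tl.cex = 0.8,              # label size
         mar    = c(0, 0, 2, 0))    # top margin so the title is not clipped
```

Without a non-zero top margin in mar, the title tends to be cut off, which is why it is set explicitly here.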

Visualization Methods

The corrplot package supports seven visualization methods (parameter method): “circle”, “square”, “ellipse”, “number”, “shade”, “color” and “pie”.

# corrplot plots correlations between numeric fields only, so extract all numeric fields from the 'College' dataset.

# Returns a logical vector: TRUE for columns that hold numeric values, FALSE otherwise.
num_only <- unlist(lapply(College, is.numeric))

# Create a new data frame with only the numeric fields from the College dataset
College_num <- College[ , num_only]
library("knitr")
kable(College_num[1:5,1:11], caption = "College dataset")
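College dataset

|                              | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal |
|:-----------------------------|-----:|-------:|-------:|----------:|----------:|------------:|------------:|---------:|-----------:|------:|---------:|
| Abilene Christian University | 1660 |   1232 |    721 |        23 |        52 |        2885 |         537 |     7440 |       3300 |   450 |     2200 |
| Adelphi University           | 2186 |   1924 |    512 |        16 |        29 |        2683 |        1227 |    12280 |       6450 |   750 |     1500 |
| Adrian College               | 1428 |   1097 |    336 |        22 |        50 |        1036 |          99 |    11250 |       3750 |   400 |     1165 |
| Agnes Scott College          |  417 |    349 |    137 |        60 |        89 |         510 |          63 |    12960 |       5450 |   450 |      875 |
| Alaska Pacific University    |  193 |    146 |     55 |        16 |        44 |         249 |         869 |     7560 |       4120 |   800 |     1500 |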

The plot below shows the correlations between all numeric columns in the ‘College’ dataset. Positive correlations are displayed in blue and negative correlations in red. Both the color intensity and the size of each circle are proportional to the correlation coefficient.

# Compute the correlation between every pair of columns in the input dataset and return the result as a matrix
College_corr <- cor(College_num) # by default the function calculates the Pearson correlation
# Plot the correlation matrix
corrplot(College_corr, method="circle")

Setting method = ‘number’ displays the correlation coefficients as numbers.

#Plotting the correlation with method = 'number'
corrplot(College_corr, method = 'number')
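The other methods can be compared in the same way. The sketch below draws all seven on one device, assuming the ISLR College data. The 2 x 4 grid and the small label size are arbitrary choices made to keep the panels readable.

```r
library(corrplot)
library(ISLR)  # provides the College dataset

College_num  <- College[, sapply(College, is.numeric)]
College_corr <- cor(College_num)

methods <- c("circle", "square", "ellipse", "number", "shade", "color", "pie")

# Split the device into a 2 x 4 grid; the last cell stays empty.
old_par <- par(mfrow = c(2, 4))
for (m in methods) {
  corrplot(College_corr, method = m, title = m,
           tl.cex = 0.5,            # small labels so 17 variables fit
           cl.pos = "n",            # no color legend in the small panels
           mar = c(0, 0, 2, 0))     # room for the panel title
}
par(old_par)  # restore the previous layout
```

With 17 variables the “number” panel is crowded at this size, so that method is better suited to a full-size plot.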

Layout

There are three permissible layout types (parameter type): “full” (shown above), “upper” (below left) and “lower” (below right).
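The two triangular layouts can be drawn side by side. This is a minimal sketch, assuming the ISLR College data and using par(mfrow) to split the plotting device into two panels.

```r
library(corrplot)
library(ISLR)  # provides the College dataset

College_num  <- College[, sapply(College, is.numeric)]
College_corr <- cor(College_num)

old_par <- par(mfrow = c(1, 2))
corrplot(College_corr, type = "upper", tl.cex = 0.6)  # left panel: upper triangle
corrplot(College_corr, type = "lower", tl.cex = 0.6)  # right panel: lower triangle
par(old_par)  # restore the previous layout
```

Because a correlation matrix is symmetric, either triangle carries the same information as the full layout.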

Mixed

Corrplot can also combine different plot styles in a single figure.

# Mixed plot: squares in the upper triangle, ellipses in the lower triangle.
corrplot(College_corr, order = "AOE", type = "upper", method = "square",
         tl.pos = "lt", mar = c(5, 10, 0, 0))
corrplot(College_corr, add = TRUE, type = "lower", method = "ellipse", order = "AOE",
         diag = FALSE, tl.pos = "n", mar = c(5, 5, 0, 0))

corrplot.mixed can also be used to plot mixed visualization styles.

# Squares in the lower triangle, colored cells in the upper triangle, nothing on the diagonal
corrplot.mixed(College_corr, lower = "square", upper = "color", diag = "n", tl.pos = "lt")

Setting order to “hclust” arranges the variables in hierarchical clustering order, and addrect draws rectangles to highlight the clusters.

corrplot(College_corr, order = "hclust", addrect = 4, rect.col = "blue")
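To see which variables end up in which rectangle, the ordering and an approximate cluster assignment can be inspected directly. The sketch below uses corrMatOrder from the corrplot package for the ordering. The cluster membership is computed with base R under the assumption that the distance is 1 - r with complete linkage, so treat it as an approximation of what the plot draws rather than a statement about the package internals.

```r
library(corrplot)
library(ISLR)  # provides the College dataset

College_num  <- College[, sapply(College, is.numeric)]
College_corr <- cor(College_num)

# Permutation of the variables that order = "hclust" applies
ord <- corrMatOrder(College_corr, order = "hclust")
print(colnames(College_corr)[ord])

# Assumption: distance 1 - r and complete linkage; cut the tree into 4 groups
hc       <- hclust(as.dist(1 - College_corr), method = "complete")
clusters <- cutree(hc, k = 4)
print(split(names(clusters), clusters))
```

The second printed list gives the variable names grouped by cluster, which is useful for labeling the rectangles in the accompanying text.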

Conclusion

This is only a brief introduction to an extremely functional visualization tool in R. Apart from the fundamental options discussed above, there are many more to explore and test. The references below are an excellent place to gain in-depth knowledge of the package. Any suggestions or tips on the package are welcome.

References

  1. https://cran.r-project.org/web/packages/corrplot/corrplot.pdf
  2. https://rdrr.io/cran/corrplot/f/vignettes/corrplot-intro.Rmd