The corrplot package provides a graphical display of a correlation matrix and of confidence intervals. It also offers a wide range of options for choosing colors, titles, labels, text label rotation, and so on. Corrplot can also visualize general matrices.
The corrplot library can be loaded with the command below.
library('corrplot')
## corrplot 0.84 loaded
# Loading ISLR for reference dataset 'College'.
library(ISLR)
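# Note: apart from 'method', the arguments listed here (varnames, cutpts, abs, ...) match arm::corrplot();
# corrplot::corrplot() expects a correlation matrix as its first argument.
corrplot(data, method = c("circle", "square", "ellipse", "number", "shade",
                          "color", "pie"), varnames = NULL, cutpts = NULL,
         abs = TRUE, details = TRUE,
         n.col.legend = 5, cex.col = 0.7,
         cex.var = 0.9, digits = 1, color = FALSE)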
data a data matrix
varnames variable names of the data matrix, if not provided use default variable names
abs if TRUE, transform all correlation values into positive values, default=TRUE.
cutpts a vector of cutting points for the color legend, default is NULL. The function will decide the cutting points if cutpts is not assigned.
details if TRUE, show correlation values with more than one digit. The default is TRUE; FALSE is suggested for readable output.
n.col.legend number of legend entries for the color thermometer.
cex.col font size of the color thermometer.
cex.var font size of the variable names.
digits number of digits shown in the text of the color thermometer.
color color of the plot, the default is FALSE, which uses grayscale.
The corrplot package supports seven visualization methods (parameter method): “circle”, “square”, “ellipse”, “number”, “shade”, “color” and “pie”.
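As a rough illustration of the seven methods, the sketch below draws the same correlation matrix once per method in a single plotting window. It assumes the corrplot package is installed and uses the built-in `mtcars` data as a stand-in so that it runs on its own; the names `M` and `methods` are illustrative only.

```r
library(corrplot)

# Stand-in correlation matrix (assumption: mtcars is used only for illustration)
M <- cor(mtcars[, 1:6])

# The seven values accepted by the 'method' argument
methods <- c("circle", "square", "ellipse", "number", "shade", "color", "pie")

# Draw one panel per method on a 2 x 4 grid, using the method name as the title
old_par <- par(mfrow = c(2, 4))
for (m in methods) {
  corrplot(M, method = m, title = m, mar = c(0, 0, 2, 0), tl.cex = 0.7)
}
par(old_par)  # restore the previous layout
```

The top margin in `mar` leaves room for the title, which would otherwise be clipped.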
# corrplot plots correlations between numeric fields only, so extract all the numeric fields from the 'College' dataset.
# The call below returns a logical vector: TRUE for columns that hold numeric values, FALSE otherwise.
num_only <- unlist(lapply(College, is.numeric))
#Create a new data frame with only numeric fields from the dataset College
College_num <- College[ , num_only]
library("knitr")
kable(College_num[1:5,1:11], caption = "College dataset")
| | Apps | Accept | Enroll | Top10perc | Top25perc | F.Undergrad | P.Undergrad | Outstate | Room.Board | Books | Personal |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Abilene Christian University | 1660 | 1232 | 721 | 23 | 52 | 2885 | 537 | 7440 | 3300 | 450 | 2200 |
| Adelphi University | 2186 | 1924 | 512 | 16 | 29 | 2683 | 1227 | 12280 | 6450 | 750 | 1500 |
| Adrian College | 1428 | 1097 | 336 | 22 | 50 | 1036 | 99 | 11250 | 3750 | 400 | 1165 |
| Agnes Scott College | 417 | 349 | 137 | 60 | 89 | 510 | 63 | 12960 | 5450 | 450 | 875 |
| Alaska Pacific University | 193 | 146 | 55 | 16 | 44 | 249 | 869 | 7560 | 4120 | 800 | 1500 |
The code below plots the correlations between all numeric columns of the ‘College’ dataset. Positive correlations are displayed in blue and negative correlations in red. The color intensity and the size of each circle are proportional to the correlation coefficient.
# Compute the correlation between every pair of columns in the input dataset and store the results in a matrix
College_corr <- cor(College_num) # by default the function calculates Pearson correlation
#plot the correlation matrix
corrplot(College_corr, method="circle")
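The `cor()` call above uses Pearson correlation by default. As a minimal sketch, the snippet below shows the same base R function with the rank-based Spearman method and with pairwise handling of missing values. It uses a few columns of the built-in `mtcars` data as a stand-in for `College_num` so that it runs on its own; the object names are illustrative only.

```r
# Stand-in numeric data (assumption: mtcars replaces College_num here)
num_data <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pearson correlation, the default method of cor()
pearson_corr <- cor(num_data)

# Spearman rank correlation; "pairwise.complete.obs" drops missing values
# separately for each pair of columns instead of failing on NA
spearman_corr <- cor(num_data, method = "spearman", use = "pairwise.complete.obs")

# Both results are symmetric matrices with 1 on the diagonal
print(round(pearson_corr, 2))
print(round(spearman_corr, 2))
```

Either matrix can be passed to `corrplot()` in the same way as `College_corr`.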
Setting method = ‘number’ displays the correlation coefficients themselves as numbers.
#Plotting the correlation with method = 'number'
corrplot(College_corr, method = 'number')
There are three permissible layout types (parameter type): “full” (shown above), “upper” (below left) and “lower” (below right).
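A minimal sketch of the two triangular layouts follows. It assumes the corrplot package is installed and uses the built-in `mtcars` data as a stand-in for the College correlation matrix; the name `M` is illustrative only.

```r
library(corrplot)

# Stand-in correlation matrix (assumption: mtcars replaces the College data here)
M <- cor(mtcars)

# Draw the two triangular layouts side by side
old_par <- par(mfrow = c(1, 2))
corrplot(M, method = "circle", type = "upper")  # upper triangle only
corrplot(M, method = "circle", type = "lower")  # lower triangle only
par(old_par)  # restore the previous layout
```

Because a correlation matrix is symmetric, either triangle carries the full information, which is why the triangular layouts are often preferred for larger matrices.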
Corrplot can also visualize mixed plot styles.
# Upper triangle drawn with the square method, lower triangle with the ellipse method.
corrplot(College_corr, order = "AOE", type = "upper", method = "square", tl.pos = "lt", mar = c(5, 10, 0, 0))
corrplot(College_corr, add = TRUE, type = "lower", method = "ellipse", order = "AOE",
         diag = FALSE, tl.pos = "n", mar = c(5, 5, 0, 0))
corrplot.mixed can also be used to plot mixed visualization styles.
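# Lower triangle drawn as squares, upper triangle as colored tiles; diag = "n" leaves the diagonal unplotted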
corrplot.mixed(College_corr, lower = "square", upper = "color", diag = "n", tl.pos = "lt")
Setting order = "hclust" arranges the variables in hierarchical clustering order, and addrect draws rectangles to highlight the clusters.
corrplot(College_corr, order = "hclust", addrect = 4, rect.col = "blue")
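To make the clustering order more concrete, the sketch below reproduces the idea in base R: treat one minus the correlation as a distance, cluster the variables hierarchically, and cut the tree into four groups (the same number passed to `addrect` above). This is an illustration under assumptions, not necessarily the exact procedure corrplot follows internally; the `"complete"` linkage and the `mtcars` stand-in data are assumptions, and the object names are illustrative.

```r
# Stand-in correlation matrix (assumption: mtcars replaces the College data here)
M <- cor(mtcars)

# Distance between variables: strongly positively correlated variables are close
d <- as.dist(1 - M)

# Hierarchical clustering; "complete" linkage is the default of hclust()
hc <- hclust(d, method = "complete")

# Variable order along the dendrogram, and membership in 4 clusters
var_order <- hc$labels[hc$order]
clusters <- cutree(hc, k = 4)

print(var_order)
print(split(names(clusters), clusters))
```

Variables that end up in the same group here are the ones a rectangle would be expected to enclose in a plot ordered this way.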
This has been a brief introduction to a highly functional visualization tool in R. Apart from the fundamental options discussed above, there are many more to explore and test. The references below are an excellent place to develop in-depth knowledge of the package. Any suggestions or tips on the package are welcome.