Plotting a correlation matrix in R

Using the PerformanceAnalytics package

# install.packages("PerformanceAnalytics")
# I already have this package, so I have commented out the install line.

We will use the iris dataset.

The first rows of the dataset look like this:

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

To plot correlations we need a table that contains only numbers. So, we have to remove the Species column and keep only the four numeric variables. This can be done using the select function from the dplyr package.

library(dplyr)
# Keep columns 1 to 4 (the numeric measurements) and drop Species
iris1 <- select(iris, 1:4)
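If dplyr is not installed, the same subset can be built in base R. The following is a minimal sketch, not part of the original workflow; the names numeric_cols and iris_numeric are chosen here only for illustration.

```r
# Base R alternative: flag the numeric columns, then keep only those
numeric_cols <- vapply(iris, is.numeric, logical(1))
iris_numeric <- iris[, numeric_cols]

# Inspect the result: four numeric columns, Species is gone
str(iris_numeric)
```

Selecting by type rather than by position keeps working if the column order of the data frame changes.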

Now we can plot using the iris1 data.

library(PerformanceAnalytics)
chart.Correlation(iris1, histogram = TRUE, pch = 15)
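To cross-check the numbers shown in the chart, the correlation matrix can also be computed directly with base R's cor function. This is a small sketch that assumes the Pearson method, which is the default for both cor and chart.Correlation; iris_num and cor_mat are illustrative names.

```r
# Recreate the numeric subset so this snippet runs on its own
iris_num <- iris[, 1:4]

# Pearson correlation matrix of the four measurements
cor_mat <- cor(iris_num, method = "pearson")

# Round for easier reading
round(cor_mat, 2)
```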

Result interpretation

The distribution of each variable is shown on the diagonal.

Below the diagonal: the bivariate scatter plots with a fitted line are displayed.

Above the diagonal: the value of the correlation plus the significance level as stars.

Each significance level is associated with a symbol: p-value cutpoints (0, 0.001, 0.01, 0.05, 0.1, 1) <=> symbols ("***", "**", "*", ".", " "). That is, "***" marks p <= 0.001, "**" marks p <= 0.01, "*" marks p <= 0.05, "." marks p <= 0.1, and a blank marks anything larger.
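The mapping from p-values to symbols can be reproduced with base R's symnum function, using the cutpoints and symbols listed above. This is only a sketch for illustration: the values in example_p are made up, and the single pair of variables tested with cor.test was picked arbitrarily.

```r
# Made-up p-values, one for each significance band
example_p <- c(0.0005, 0.005, 0.03, 0.07, 0.5)

# Map each p-value to its symbol using the cutpoints from the text
example_stars <- symnum(example_p, corr = FALSE, na = FALSE,
                        cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                        symbols = c("***", "**", "*", ".", " "))
print(example_stars)

# Same mapping for a real test: Sepal.Length vs Petal.Length
test <- cor.test(iris$Sepal.Length, iris$Petal.Length)
test_stars <- symnum(test$p.value, corr = FALSE, na = FALSE,
                     cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                     symbols = c("***", "**", "*", ".", " "))
print(test_stars)
```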