Plotting a correlation matrix in R
Using the PerformanceAnalytics package
# install.packages("PerformanceAnalytics")
# I already have this package installed, so the line above is commented out.
We will use the iris dataset.
The first few rows of the dataset look like this:
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
To plot correlations we need a matrix of numbers, so we have to remove the Species column and keep only the four numeric variables. This can be done using the select function from the dplyr package.
library(dplyr)
iris1 <- select(iris, 1:4)
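As an alternative sketch, the numeric columns can also be picked by type rather than by position, which avoids relying on the column order. This version uses only base R; the names `numeric_cols` and `iris_numeric` are illustrative and not part of the original code.

```r
# Flag the columns of iris that hold numeric data (Species is a factor).
numeric_cols <- sapply(iris, is.numeric)

# Keep only those columns; for iris this gives the same four variables as iris1.
iris_numeric <- iris[, numeric_cols]

# Check which columns were kept.
names(iris_numeric)
```

For iris the result should match the positional selection, but selecting by type may be more robust if columns are reordered.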
Now we can plot the correlations using the iris1 data.
library(PerformanceAnalytics)
chart.Correlation(iris1, histogram = TRUE, pch = 15)
Result interpretation
The distribution of each variable is shown on the diagonal.
Below the diagonal: the bivariate scatter plots are displayed with a fitted line.
Above the diagonal: the value of the correlation is shown, with the significance level as stars.
Each significance level is associated with a symbol: p-value cutpoints (0, 0.001, 0.01, 0.05, 0.1, 1) <=> symbols ("***", "**", "*", ".", " ")
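To see where one of these numbers and its stars come from, here is a minimal sketch that reproduces a single cell of the upper panel by hand. It assumes the default Pearson method and uses base R's `cor.test` and `symnum` with the cutpoints listed above; the variable names (`x`, `test`, `r`, `p`, `stars`) are illustrative only.

```r
# The same four numeric variables that were passed to chart.Correlation.
x <- iris[, 1:4]

# Pearson correlation test for one pair of variables.
test <- cor.test(x$Sepal.Length, x$Petal.Length)
r <- unname(test$estimate)  # correlation coefficient
p <- test$p.value           # p-value of the test

# Map the p-value to a significance symbol using the cutpoints above.
stars <- symnum(p, corr = FALSE, na = FALSE,
                cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                symbols = c("***", "**", "*", ".", " "))

# Print the correlation, the p-value, and the matching symbol.
cat(sprintf("r = %.2f, p = %.3g, symbol = %s\n", r, p, as.character(stars)))
```

A smaller p-value falls into an earlier interval and therefore gets more stars; a p-value above 0.1 gets no symbol at all.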