Problem Statement
Develop a script in R to calculate and visualize a correlation matrix for a given dataset, with color-coded cells indicating the strength and direction of each correlation, using ggplot2’s geom_tile() function.
# Load the required libraries
# ggplot2 draws the plot; tidyr and dplyr are loaded but not required by the code below
library(ggplot2)
library(tidyr)
library(dplyr)
Dataset
We use R's built-in mtcars dataset, which describes 32 cars with 11 numeric variables.
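The task asks for a script that works on "a given dataset". However, cor() only accepts numeric input, so other columns must be dropped first. The following is a minimal sketch of one way to do that. The helper name numeric_cor() is hypothetical. The choice of use = "pairwise.complete.obs" for missing values is also an assumption, not part of the original script. All 11 columns of mtcars are numeric, so for mtcars it returns the same matrix as cor(mtcars).

```r
# Hypothetical helper (not from the original script): correlation matrix
# over the numeric columns of an arbitrary data frame
numeric_cor <- function(df, method = "pearson") {
  # Keep only numeric columns; cor() errors on factor or character columns
  num_cols <- vapply(df, is.numeric, logical(1))
  if (sum(num_cols) < 2) {
    stop("need at least two numeric columns")
  }
  # Assumed NA handling: each coefficient uses the rows complete for that pair
  cor(df[, num_cols, drop = FALSE], use = "pairwise.complete.obs", method = method)
}

# mtcars: every column is numeric, so nothing is dropped
m <- numeric_cor(mtcars)
dim(m)

# iris: the Species factor column is dropped before correlating
dim(numeric_cor(iris))
```

For mtcars this gives an 11 x 11 matrix. For iris it gives a 4 x 4 matrix, because only the four measurement columns remain.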
# Preview the first six rows of the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Load the built-in mtcars dataset explicitly
# (optional: the datasets package is attached by default)
data(mtcars)
# Compute the correlation matrix (Pearson correlation by default)
cor_matrix <- cor(mtcars)
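By default, cor() computes the Pearson coefficient:

r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

The short check below writes that formula out for one pair, mpg and wt, and compares it with the corresponding entry of the matrix. The helper name pearson() is illustrative only.

```r
# Pearson correlation written out from its definition (illustrative helper)
pearson <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson(mtcars$mpg, mtcars$wt)
r_cor <- cor(mtcars)["mpg", "wt"]

round(r_manual, 4)           # -0.8677
all.equal(r_manual, r_cor)   # TRUE
```

If a rank-based measure is preferred, cor() also accepts method = "spearman" or method = "kendall". The rest of the script is unchanged by that choice.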
# Convert the matrix to a long data frame for plotting
cor_df <- as.data.frame(as.table(cor_matrix))
head(cor_df)
  Var1 Var2       Freq
1  mpg  mpg  1.0000000
2  cyl  mpg -0.8521620
3 disp  mpg -0.8475514
4   hp  mpg -0.7761684
5 drat  mpg  0.6811719
6   wt  mpg -0.8676594
Explanation
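cor(mtcars) computes the Pearson correlation between every pair of columns and returns a symmetric 11-by-11 matrix with 1 on the diagonal.
as.table() marks the matrix as a table, and as.data.frame() then flattens that table into long format, with one row per pair of variables (11 × 11 = 121 rows).
The result has three columns: Var1, Var2, and Freq. Freq is the default name that as.data.frame() gives to table values; here it holds the correlation coefficient, not a count.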
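The long table relates to the matrix as follows. as.data.frame() varies Var1 fastest, so row k holds cor_matrix[Var1[k], Var2[k]]. The matrix is symmetric, so every off-diagonal coefficient appears twice. The sketch below recomputes both objects so it runs on its own, checks these facts, and shows one optional variant that keeps only the lower triangle. The original plot does not use that variant.

```r
cor_matrix <- cor(mtcars)
cor_df <- as.data.frame(as.table(cor_matrix))

# One row per (Var1, Var2) cell of the 11 x 11 matrix
nrow(cor_df)

# Row 2 is Var1 = "cyl", Var2 = "mpg", i.e. the entry cor_matrix["cyl", "mpg"]
cor_df[2, ]
cor_df$Freq[2] == cor_matrix["cyl", "mpg"]

# Optional variant: blank out the upper triangle and the diagonal,
# then drop those rows so each distinct pair appears once
lower <- cor_matrix
lower[upper.tri(lower, diag = TRUE)] <- NA
lower_df <- na.omit(as.data.frame(as.table(lower)))
nrow(lower_df)
```

With 11 variables, lower_df keeps 11 * 10 / 2 = 55 rows. Passing it to ggplot() in place of cor_df would draw a triangular heatmap.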
Visualize with ggplot2
# Map variable pairs to the axes and the correlation value to the fill color
p <- ggplot(cor_df, aes(x = Var1, y = Var2, fill = Freq))
p
# Draw one tile per cell, with white borders between tiles
p <- p +
  geom_tile(color = "white")
p
# Diverging palette: blue for negative, white at 0, red for positive correlations
p <- p +
  scale_fill_gradient2(
    low = "blue", mid = "white", high = "red",
    midpoint = 0, limits = c(-1, 1),
    name = "Correlation"
  )
p
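scale_fill_gradient2() builds a diverging scale. With midpoint = 0 and limits c(-1, 1), a correlation of -1 gets the low color, 0 gets white, and +1 gets the high color, and values in between are interpolated. ggplot2 documents CIE Lab as its interpolation space. The code below is only a rough base-R approximation of that mapping. It uses grDevices::colorRamp() rather than ggplot2's own palette code, so intermediate colors may differ slightly from the plot. The helper name fill_for() is hypothetical.

```r
# Rough approximation of the diverging fill mapping (not ggplot2's internal code)
fill_for <- function(r) {
  # Rescale [-1, 1] to [0, 1] so that the midpoint 0 lands at 0.5
  pos <- (r + 1) / 2
  ramp <- grDevices::colorRamp(c("blue", "white", "red"), space = "Lab")
  rgb_vals <- ramp(pos)  # matrix of RGB values on a 0-255 scale
  grDevices::rgb(rgb_vals, maxColorValue = 255)
}

fill_for(c(-1, 0, 1))                 # approximately blue, white, red
fill_for(cor(mtcars)["mpg", "wt"])    # strong negative: a deep blue
```

Because the limits are fixed at -1 and 1, the colors stay comparable across datasets. The scale does not stretch to the observed range.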
# Print each coefficient, rounded to two decimals, inside its tile
p <- p +
  geom_text(aes(label = round(Freq, 2)), size = 3)
p
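With label = round(Freq, 2), the label is a number, so trailing zeros are typically lost when it is drawn as text. The diagonal shows "1", and a value like 0.60 would show "0.6". Formatting with sprintf() is one way to make all labels uniform. This is a suggested alternative, not what the plot above uses. The value 0.6 below is a made-up example.

```r
cor_matrix <- cor(mtcars)
# Diagonal entry, the mpg-wt coefficient, and a hypothetical value 0.6
vals <- c(cor_matrix["mpg", "mpg"], cor_matrix["mpg", "wt"], 0.6)

# Numeric rounding: trailing zeros disappear once converted to text
as.character(round(vals, 2))  # "1" "-0.87" "0.6"

# Fixed two-decimal text labels
sprintf("%.2f", vals)         # "1.00" "-0.87" "0.60"
```

To apply this in the plot, the layer would become geom_text(aes(label = sprintf("%.2f", Freq)), size = 3).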
# Switch to a minimal theme
p <- p +
  theme_minimal()
p
# Add a title and remove the axis titles
p <- p +
  labs(
    title = "Correlation Matrix (mtcars)",
    x = "", y = ""
  )
p
# Rotate the x-axis labels 45 degrees so they do not overlap
p <- p +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p