Correlation Matrix Visualization using ggplot2
Develop a script in R to calculate and visualize a correlation matrix for a given dataset, with color-coded cells indicating the strength and direction of correlations, using ggplot2’s geom_tile function.
Step 1: Load the Libraries.
# Load necessary libraries
library(ggplot2)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Dataset.
# Preview the dataset
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Compute the Correlation Matrix.
# Use built-in mtcars dataset
data(mtcars)
# Compute correlation matrix
cor_matrix <- cor(mtcars)
# Convert matrix to a data frame for plotting
cor_df <- as.data.frame(as.table(cor_matrix))
head(cor_df)
  Var1 Var2       Freq
1  mpg  mpg  1.0000000
2  cyl  mpg -0.8521620
3 disp  mpg -0.8475514
4   hp  mpg -0.7761684
5 drat  mpg  0.6811719
6   wt  mpg -0.8676594
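Explanation:
cor(mtcars) computes the correlation between every pair of columns (Pearson by default).
as.table() followed by as.data.frame() reshapes the matrix into a long-format table with one row per pair of variables.
The result has 3 columns: Var1, Var2, and the correlation value (Freq).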
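The reshaping step can be checked with a minimal sketch that uses only base R and the same mtcars data. It confirms that the 11 x 11 matrix unrolls into 121 rows and looks up a single pair by name. The variable name wt_mpg is illustrative and does not appear in the original script.

```r
# Compute the correlation matrix for the built-in mtcars data
cor_matrix <- cor(mtcars)

# as.table() marks the matrix as a table; as.data.frame() then unrolls it
# into one row per (Var1, Var2) pair, with the value stored in Freq
cor_df <- as.data.frame(as.table(cor_matrix))

# An 11 x 11 matrix becomes 11 * 11 = 121 rows
dim(cor_matrix)
nrow(cor_df)

# Look up one pair by name; wt_mpg is an illustrative name
wt_mpg <- subset(cor_df, Var1 == "wt" & Var2 == "mpg")
wt_mpg$Freq  # matches cor_matrix["wt", "mpg"]
```

Because the matrix is symmetric, each pair of distinct variables appears twice in the long table, once as (wt, mpg) and once as (mpg, wt).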
Step 2: Visualize Using ggplot2::geom_tile.
Create a heatmap in which each tile is colored by its correlation value.
ggplot(cor_df, aes(x = Var1, y = Var2, fill = Freq)) +
  geom_tile(color = "white") + # Draw tile borders
  scale_fill_gradient2(
    low = "blue", mid = "white", high = "red",
    midpoint = 0, limits = c(-1, 1), # Fix the scale to the full correlation range
    name = "Correlation"
  ) +
  geom_text(aes(label = round(Freq, 2)), size = 3) + # Show values
  theme_minimal() +
  labs(
    title = "Correlation Matrix (mtcars)",
    x = "", y = ""
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Tilt x-axis labels
Explanation:
cor(): Computes correlation values between numeric variables.
as.table() + as.data.frame(): Converts the matrix into a long format suitable for plotting.
ggplot(): Initializes the plot using long-form data.
geom_tile(): Creates color-coded tiles based on correlation values.
scale_fill_gradient2(): Applies a diverging color scale: red (strong positive), blue (strong negative), white (near zero).
geom_text(): Adds correlation values as text in each cell.
theme_minimal(): Cleans up the plot visually.
axis.text.x rotation: Tilts x-axis labels for better readability.
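To apply the same steps to a dataset other than mtcars, they can be wrapped in a function. The sketch below is one possible approach, not part of the original script: plot_cor_matrix is a hypothetical helper name, and the choices to drop non-numeric columns with dplyr and to use pairwise complete observations for missing values are assumptions.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper (not from the original script): builds the same heatmap
# for any data frame by keeping only its numeric columns first
plot_cor_matrix <- function(data, title = "Correlation Matrix") {
  cor_df <- data %>%
    select(where(is.numeric)) %>%            # cor() needs numeric input
    cor(use = "pairwise.complete.obs") %>%   # assumption: tolerate missing values
    as.table() %>%
    as.data.frame()                          # long format: Var1, Var2, Freq

  ggplot(cor_df, aes(x = Var1, y = Var2, fill = Freq)) +
    geom_tile(color = "white") +
    scale_fill_gradient2(
      low = "blue", mid = "white", high = "red",
      midpoint = 0, limits = c(-1, 1), name = "Correlation"
    ) +
    geom_text(aes(label = round(Freq, 2)), size = 3) +
    theme_minimal() +
    labs(title = title, x = NULL, y = NULL) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Example: iris has one non-numeric column (Species), which is dropped
p <- plot_cor_matrix(iris, title = "Correlation Matrix (iris)")
print(p)
```

Selecting numeric columns up front avoids the error that cor() raises when it is given factor or character columns.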