The Art of Data Visualization: Creating Professional Heatmaps in R

Authors: Rasouli, H., Moein, F.
Date: December 20, 2025
Affiliation: ABSA, Tarbiat Modares University, Tehran, Iran
Workshop: R club for researchers

Quick start

Master clustered heatmap visualization in under 60 minutes! This tutorial provides complete, ready-to-use code examples that you can immediately apply to your own datasets. Whether you’re analyzing biological data, financial metrics, survey responses, or any multivariate dataset, these techniques will accelerate your data exploration workflow.

1. Installation and Setup

# Installation
install.packages("heatmaply")

2. Loading the library

# Load the package
library(heatmaply)

3. Using built-in data as input

# Use the built-in mtcars dataset as input
data <- mtcars
# Show the first four rows
head(data, 4)

4. head() function output

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1

2. Simplest Usage - Basic Heatmap

# Basic heatmap
heatmaply(data)

Output: An interactive heatmap with default colors showing values of each variable.

3. Correlation Heatmap with heatmaply_cor()

# Plot the pairwise correlations between the variables of the input dataset
heatmaply_cor(
  cor(mtcars),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2
)

Code Components Explained:

  • cor(mtcars): The input correlation matrix
  • xlab = "Features": Title of the X axis (string)
  • ylab = "Features": Title of the Y axis (string)
  • k_col = 2 and k_row = 2: Number of clusters to highlight in the column and row dendrograms, respectively

Output: A correlation heatmap with a diverging color scale fixed to the range -1 to +1

Clustering Guidelines:

Number of Variables   Clustering Type              Recommended k
10-12 variables       Simple clustering            k = 2 or 3
12-20 variables       Balanced detail/simplicity   k = 3 or 4
> 20 variables        More detailed grouping       k = 4 to 6
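To see which variables end up in each of the k groups, the clustering can be reproduced outside the plot. The sketch below assumes heatmaply's default settings (Euclidean distance via dist() and complete linkage via hclust()); the object names r, hc, and groups are illustrative, and the membership may differ if other distance or linkage options are passed to heatmaply_cor().

```r
# Correlation matrix of the built-in mtcars data
r <- cor(mtcars)

# Hierarchical clustering of the rows (Euclidean distance, complete linkage)
hc <- hclust(dist(r))

# Cut the tree into k = 2 groups, mirroring k_row = 2
groups <- cutree(hc, k = 2)

# List the variables that fall into each group
print(split(names(groups), groups))
```

Changing k in cutree() shows how the groups split further as the recommended k grows with the number of variables.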

4. Output Interpretation

  • An interactive plot whose color scale spans correlations from -1 to +1
    • +1: Perfect positive correlation
    • 0: No correlation
    • -1: Perfect negative correlation
  • Two colors mark the extremes:
    • Red: Complete positive correlation
    • Blue: Complete negative correlation
  • Interactive feature: Hover the mouse cursor over any heatmap cell to see its numerical value

5. Practical Applications

  1. Identifying related variables
  2. Detecting multicollinearity for modeling
  3. Exploratory data analysis
  4. Feature selection
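For the first two applications, the strongly correlated pairs can also be listed as a table instead of being read off the plot. This is a minimal sketch; the cutoff of 0.8 and the names cutoff, idx, and strong_pairs are assumptions chosen for illustration, not a rule from the workshop.

```r
r <- cor(mtcars)

# Keep each pair once (upper triangle) and only where |r| exceeds the cutoff
cutoff <- 0.8
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = round(r[idx], 3)
)

# Sort from the strongest to the weakest absolute correlation
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs)
```

Pairs that appear here are candidates for multicollinearity checks before modeling; a lower or higher cutoff may suit other datasets.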

6. Customizing Gradient Colors

Before customizing gradients, it helps to know a little about choosing colors.

Using color picker tools to choose eye-catching colors for R plots

An online color picker lets you browse a wide range of colors and copy their hex codes for use in R.
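Hex codes copied from a color picker are plain strings in R. The sketch below uses three hypothetical hex codes (they are placeholders, not colors recommended by the workshop) and checks them with base R's col2rgb(), which raises an error for a malformed code.

```r
# Hypothetical hex codes, as they might be copied from a color picker
picked <- c(low = "#2166AC", mid = "#F7F7F7", high = "#B2182B")

# col2rgb() errors on malformed codes, so this doubles as a validity check
rgb_values <- col2rgb(picked)
print(rgb_values)
```

A vector like this can then be supplied to the colors argument, for example colors = unname(picked).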

👁️ Understanding Color Vision Deficiency

Color vision deficiency, commonly called color blindness, is the decreased ability to see color differences. Most types are inherited, affect more men than women, and involve difficulties distinguishing between specific colors, most commonly reds and greens.

🟢 Green-Blind (Deuteranopia)

Difficulty distinguishing greens from reds and from some shades of gray.

🔴 Red-Blind (Protanopia)

Difficulty distinguishing reds from greens and also from blues.

🔵 Blue-Blind (Tritanopia)

Difficulty distinguishing blues from yellows and violets from reds.

⚫ Desaturated (Achromatopsia / Monochromacy)

Inability to see any color at all; vision is limited to shades of black, white, and gray.

Basic Color Gradient:

# Three-color gradient
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = c("blue", "purple", "red")  # blue → purple → red
)

Output: A heatmap with blue-purple-red gradient

# Alternative color combination
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = c("green", "pink", "red")
)

Output: A heatmap with green-pink-red gradient
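A short vector of anchor colors becomes a smooth gradient through interpolation. The sketch below illustrates the idea with base R's colorRampPalette(); it is not heatmaply's internal code, and the intermediate shades that heatmaply draws may differ slightly. The names gradient_fun and gradient_5 are illustrative.

```r
# Build an interpolating function from three anchor colors
gradient_fun <- colorRampPalette(c("blue", "purple", "red"))

# Sample 5 evenly spaced colors along the gradient
gradient_5 <- gradient_fun(5)
print(gradient_5)

# The first, middle, and last entries are the anchor colors themselves
print(gradient_5[c(1, 3, 5)])
```

With more anchor colors or a larger sample size the same function yields finer gradients.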

Built-In Continuous Color Scales:

# Example using RdYlBu
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = RdYlBu(256)  # Red-Yellow-Blue palette function exported by heatmaply
)

Output: A heatmap with RdYlBu palette

7. Advanced Colors with RColorBrewer

# Install and load RColorBrewer
install.packages("RColorBrewer")
library(RColorBrewer)

# Example 1: Spectral palette
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = brewer.pal(11, "Spectral")
)

Output: A heatmap with Spectral palette

# Example 2: YlOrRd palette
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = brewer.pal(9, "YlOrRd")
)

Output: A heatmap with YlOrRd palette

# Example 3: YlGn palette
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = brewer.pal(5, "YlGn")
)

Output: A heatmap with YlGn palette

📋 RColorBrewer Palettes Reference:

Palette    Max Colors   Category   Colorblind Safe
BrBG       11           div        TRUE
PiYG       11           div        TRUE
PRGn       11           div        TRUE
PuOr       11           div        TRUE
RdBu       11           div        TRUE
RdGy       11           div        FALSE
RdYlBu     11           div        TRUE
RdYlGn     11           div        FALSE
Spectral   11           div        FALSE
Blues      9            seq        TRUE
BuGn       9            seq        TRUE
YlGn       9            seq        TRUE
YlOrBr     9            seq        TRUE
YlOrRd     9            seq        TRUE
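The same information is available inside R through the brewer.pal.info data frame of RColorBrewer. The sketch below filters it down to diverging and sequential palettes flagged as colorblind safe; it assumes RColorBrewer is installed and skips quietly otherwise, and the names info and safe are illustrative.

```r
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # One row per palette: maxcolors, category, colorblind
  info <- RColorBrewer::brewer.pal.info

  # Keep diverging and sequential palettes flagged as colorblind safe
  safe <- info[info$category %in% c("div", "seq") & info$colorblind, ]
  print(safe)
}
```

Checking this table before picking a palette helps keep a heatmap readable for viewers with color vision deficiency.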

8. Viridis Color Scales

# Install and load viridis
install.packages("viridis")
library(viridis)

Available viridis palettes:

  • viridis - Default viridis palette
  • magma - Black-purple-red-yellow
  • plasma - Blue-purple-orange-yellow
  • inferno - Black-red-orange-yellow
  • cividis - Blue-yellow, designed to stay readable for people with color vision deficiency
  • mako - Black-blue-light green
  • turbo - Rainbow-like, smoother than the classic rainbow (jet) palette
  • rocket - Dark purple-red-orange
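Each palette is a function that returns a vector of hex color codes, which is what the colors argument receives. A minimal sketch, assuming the viridis package is installed (it skips quietly otherwise); default_5 and cividis_5 are illustrative names.

```r
if (requireNamespace("viridis", quietly = TRUE)) {
  # Five colors from the default palette, returned as hex codes with alpha
  default_5 <- viridis::viridis(5)
  print(default_5)

  # The same generator with a named option selects another palette
  cividis_5 <- viridis::viridis(5, option = "cividis")
  print(cividis_5)
}
```

Requesting more colors, such as plasma(100), gives a smoother gradient in the heatmap.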

# Example with plasma
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = plasma(100)
)

Output: A heatmap with plasma palette

# Example with turbo
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = turbo(100)
)

Output: A heatmap with turbo palette

# Example with inferno
heatmaply_cor(
  cor(data),
  xlab = "Features",
  ylab = "Features",
  k_col = 2,
  k_row = 2,
  colors = inferno(100)
)

Output: A heatmap with inferno palette

9. Advanced Correlation with P-Values

Step 1: Calculate Correlation Matrix

# Calculates Pearson correlation coefficients between all variables
# r is an 11×11 matrix of correlation values (-1 to 1)
r <- cor(mtcars)
print(round(r, 3))

Output:

       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg   1.000 -0.852 -0.848 -0.776  0.681 -0.868  0.419  0.664  0.600  0.480 -0.551
cyl  -0.852  1.000  0.902  0.832 -0.700  0.782 -0.591 -0.811 -0.523 -0.493  0.527
disp -0.848  0.902  1.000  0.791 -0.710  0.888 -0.434 -0.710 -0.591 -0.556  0.395
hp   -0.776  0.832  0.791  1.000 -0.449  0.659 -0.708 -0.723 -0.243 -0.126  0.750
drat  0.681 -0.700 -0.710 -0.449  1.000 -0.712  0.091  0.440  0.713  0.700 -0.091
wt   -0.868  0.782  0.888  0.659 -0.712  1.000 -0.175 -0.555 -0.692 -0.583  0.428
qsec  0.419 -0.591 -0.434 -0.708  0.091 -0.175  1.000  0.745 -0.230 -0.213 -0.656
vs    0.664 -0.811 -0.710 -0.723  0.440 -0.555  0.745  1.000  0.168  0.206 -0.570
am    0.600 -0.523 -0.591 -0.243  0.713 -0.692 -0.230  0.168  1.000  0.794  0.058
gear  0.480 -0.493 -0.556 -0.126  0.700 -0.583 -0.213  0.206  0.794  1.000  0.274
carb -0.551  0.527  0.395  0.750 -0.091  0.428 -0.656 -0.570  0.058  0.274  1.000

Step 2: Create P-value Matrix Function

# Creates a matrix of p-values from correlation tests
# Uses cor.test() to test if each correlation is statistically significant
# outer() applies the test to all variable pairs
# Returns a matrix where each cell is the p-value for that correlation pair
cor.test.p <- function(x){
  FUN <- function(x, y) cor.test(x, y)[["p.value"]]
  z <- outer(
    colnames(x), 
    colnames(x), 
    Vectorize(function(i,j) FUN(x[,i], x[,j]))
  )
  dimnames(z) <- list(colnames(x), colnames(x))
  z
}

Step 3: Calculate P-value Matrix

# Applies the function to mtcars
# p is an 11×11 matrix of p-values (0 to 1)
# Small p-value (< 0.05) means correlation is statistically significant
p <- cor.test.p(mtcars)
print(round(p, 4))

Sample Output:

       mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg  0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0171 0.0000 0.0002 0.0054 0.0011
cyl  0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0004 0.0000 0.0022 0.0042 0.0019
disp 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0131 0.0000 0.0004 0.0010 0.0250
hp   0.0000 0.0000 0.0000 0.0000 0.0090 0.0000 0.0000 0.0000 0.1800 0.4930 0.0000
drat 0.0000 0.0000 0.0000 0.0090 0.0000 0.0000 0.6190 0.0134 0.0000 0.0000 0.6220
wt   0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.3380 0.0009 0.0000 0.0003 0.0134
qsec 0.0171 0.0004 0.0131 0.0000 0.6190 0.3380 0.0000 0.0000 0.2050 0.2420 0.0000
vs   0.0000 0.0000 0.0000 0.0000 0.0134 0.0009 0.0000 0.0000 0.3380 0.2570 0.0010
am   0.0002 0.0022 0.0004 0.1800 0.0000 0.0000 0.2050 0.3380 0.0000 0.0000 0.7540
gear 0.0054 0.0042 0.0010 0.4930 0.0000 0.0003 0.2420 0.2570 0.0000 0.0000 0.1290
carb 0.0011 0.0019 0.0250 0.0000 0.6220 0.0134 0.0000 0.0010 0.7540 0.1290 0.0000
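Individual cells of the p-value matrix can be cross-checked by hand. For a Pearson correlation, cor.test() uses a two-sided t test with n - 2 degrees of freedom; the sketch below recomputes the p-value for one pair (mpg and qsec, chosen only as an example) and compares it with the built-in result. All object names are illustrative.

```r
x <- mtcars$mpg
y <- mtcars$qsec

# Built-in test of the Pearson correlation
pair_test <- cor.test(x, y)
p_builtin <- pair_test$p.value

# The same p-value from the t statistic with n - 2 degrees of freedom
n <- length(x)
r_xy <- cor(x, y)
t_stat <- r_xy * sqrt(n - 2) / sqrt(1 - r_xy^2)
p_manual <- 2 * pt(-abs(t_stat), df = n - 2)

print(c(r = r_xy, p_builtin = p_builtin, p_manual = p_manual))
```

The two p-values should agree up to floating-point precision, which confirms what each cell of the matrix represents.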

Step 4: Create Scatterplot Heatmap

heatmaply_cor(
  r,
  node_type = "scatter",
  point_size_mat = -log10(p), 
  point_size_name = "-log10(p-value)",
  label_names = c("x", "y", "Correlation"),
  main = "Functional Correlation Plot with Significance"
)

Output: A scatterplot heatmap

10. Data Transformation (Scaling, Normalizing, and Percentizing)

If we assume that all variables follow a normal distribution, standardizing them (subtracting the mean and dividing by the standard deviation) puts them all on the scale of the standard normal distribution. In this standardized form, each data point expresses its distance from the mean in standard deviation units. The scale parameter in heatmaply applies this standardization along columns (scale = "column") or rows (scale = "row"); the default, scale = "none", leaves the data unchanged. When normality is not a reasonable assumption, heatmaply's normalize() and percentize() functions offer alternative transformations.
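The three transformations can be sketched in base R to show what each one does to a column. These are illustrative equivalents written under the assumption that normalizing means rescaling to the 0 to 1 range and percentizing means replacing each value by its empirical percentile; rescale01 is a hypothetical helper, not heatmaply's code.

```r
# Standardize each column: subtract its mean, divide by its standard deviation
z <- scale(mtcars)
print(round(colMeans(z), 3))      # column means are approximately 0
print(round(apply(z, 2, sd), 3))  # column standard deviations are 1

# Min-max rescaling of each column to the range 0 to 1
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
mtcars_01 <- mtcars
mtcars_01[] <- lapply(mtcars, rescale01)

# Empirical percentile of each value within its column
mtcars_pct <- mtcars
mtcars_pct[] <- lapply(mtcars, function(x) ecdf(x)(x))

print(head(round(mtcars_pct, 2), 4))
```

In heatmaply itself the corresponding calls would be heatmaply(mtcars, scale = "column"), heatmaply(normalize(mtcars)), and heatmaply(percentize(mtcars)).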

Key Parameters Explained (scatterplot heatmap from Section 9, Step 4):

node_type = "scatter"

  • Changes from traditional heatmap squares to scatterplot points
  • Each variable pair becomes a point instead of a colored square

point_size_mat = -log10(p)

  • Transformation: -log10(p-value)
    • p-value 0.05 → -log10(0.05) = 1.30
    • p-value 0.01 → -log10(0.01) = 2.00
    • p-value 0.001 → -log10(0.001) = 3.00
  • Why this transformation?
    • Larger values for more significant correlations
    • Points get bigger as p-values get smaller
    • Makes highly significant correlations more visible
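The transformation is easy to verify at the console. The sketch below reproduces the three values listed above and also shows an edge case: a p-value of exactly 0, as the diagonal appears in the sample p-value matrix, maps to Inf. The cap of 10 is an arbitrary assumption for illustration, not a heatmaply setting.

```r
# The three example p-values from the list above
p_example <- c(0.05, 0.01, 0.001)
size_example <- -log10(p_example)
print(round(size_example, 2))  # values: 1.30, 2.00, 3.00

# A p-value of exactly 0 maps to an infinite point size
print(-log10(0))

# Optional: cap the sizes at an assumed maximum so every point stays finite
size_cap <- 10
size_capped <- pmin(-log10(c(0, 0.001, 0.5)), size_cap)
print(size_capped)
```

If capping is wanted, the same pmin() call could be applied to the whole matrix before passing it as point_size_mat.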

Visual Interpretation:

  • Large red point: Strong positive AND statistically significant correlation
  • Large blue point: Strong negative AND statistically significant correlation
  • Small light-colored point: Weak correlation AND not statistically significant

📋 11. Summary

This tutorial covers:

✅ 1. Basic correlation heatmap creation

  • Using heatmaply_cor() to display correlation matrices
  • Setting axis titles with xlab and ylab

✅ 2. Data clustering

  • Using k_col and k_row for grouping
  • Guidelines for selecting cluster numbers based on variable count

✅ 3. Color customization

  • Basic colors with colors = c("color1", "color2", "color3")
  • RColorBrewer palettes with brewer.pal()
  • Viridis palettes for modern designs

✅ 4. Advanced analysis with p-values

  • Calculating statistical significance of correlations
  • Displaying scatterplot with node_type = "scatter"
  • Point size based on significance level

✅ 5. Practical applications

  • Identifying related variables
  • Detecting multicollinearity
  • Exploratory data analysis
  • Feature selection for modeling

Key Insights:

  1. Strong positive correlation = Dark red color
  2. Strong negative correlation = Dark blue color
  3. No correlation = Light/neutral colors
  4. Statistical significance = Larger point size in scatterplot mode

Next Steps for Learning:

  1. Apply to larger datasets
  2. Combine with other visualization techniques
  3. Use in professional reports and presentations
  4. Integrate with Shiny for interactive dashboards

The heatmaply package provides a powerful tool for interactive visualization of relationships in data and can be used at various stages of data analysis.

🎥 Previous Session Video Archive

Session Part   Filename               Direct Link
Part 1         part-1-R452-2025.mp4   📥 Download
Part 2         Part-2-2.mp4           📥 Download
Part 3         Part-3.mp4             📥 Download
Part 4         par-4.mp4              📥 Download

File Format: All videos are provided in the MP4 format. This is a universal container format commonly used for streaming and storing digital video and audio, ensuring broad compatibility with media players.