INTRODUCTION:

Welcome to our tutorial on how to create visually appealing heatmaps! Heatmaps are powerful visualization tools that allow researchers to identify patterns, trends, and outliers across two categorical variables simultaneously. This tutorial will teach you how to use the ggplot2 and plotly packages in R to create both static and interactive versions of a heatmap, focusing specifically on longitudinal data from various countries. By the end of this tutorial you should be able to (1) construct a basic heatmap using ggplot2, (2) customize a heatmap’s color scheme to your liking, and (3) convert a static heatmap into an interactive one so you can explore the data in more detail. These skills become especially useful as your data grow more complex, helping you create accessible visualizations for a wide range of audiences.

Heatmaps have many uses. This tutorial focuses on how they can be used in data science to examine and compare temporal trends in the social sciences. You will also learn to create interactive visualizations that let audiences hover over the heatmap to see more information about the data. The examples use real-world labor force participation data, but the skills transfer to any dataset that pairs categorical variables with a continuous measure.

In this tutorial, we will use three primary R packages: ggplot2 for creating the base visualizations, plotly for adding interactivity, and dplyr (part of the tidyverse) for manipulating the data. We also load RColorBrewer, which provides additional color palettes. The dataset consists of female labor force participation rates from 2014 to 2024, sourced from the World Bank. We focus on the G20 nations to examine how women’s workforce participation has evolved across key economies, providing a case study of gender equality across cultural contexts.

KEY CONCEPTS:

As stated, this tutorial will help you use ggplot2 and plotly to create different styles of heatmaps. After learning how to create foundational heatmaps, we will explore customization options and interactivity features that enhance data exploration and presentation. Below is the overall structure of the tutorial:

  1. Install and Load Necessary Packages
  2. Load and Observe the Dataset
  3. Plotting a Basic Static Heatmap
  4. Customizing Color Schemes
  5. Creating an Interactive Heatmap
  6. Tutorial Conclusion

STEP-BY-STEP GUIDE ON HOW TO CREATE HEATMAPS USING GGPLOT2:

SECTION 0: Install and Load Necessary Packages

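To begin, load the packages needed for this tutorial. If any of them are not installed yet, install them first with install.packages() (for example, install.packages("plotly")).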

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
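library(countrycode)   # converts between country names and country codes
library(RColorBrewer)  # additional color palettes
library(dplyr)         # already attached by library(tidyverse); loading it again is harmless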
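The Section 0 heading mentions installing packages, but the code above only loads them. A minimal sketch of an install-if-missing step, assuming you run it once before the library() calls; the names pkgs, is_installed, and missing_pkgs are illustrative, not part of any package:

```r
# Packages used in this tutorial (tidyverse already includes dplyr and ggplot2)
pkgs <- c("tidyverse", "plotly", "countrycode", "RColorBrewer")

# requireNamespace() returns TRUE if a package is installed and loadable,
# without attaching it to the search path
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0 && interactive()) {
  # Only install when someone is at the console, not while knitting
  install.packages(missing_pkgs)
} else if (length(missing_pkgs) > 0) {
  message("Please install: ", paste(missing_pkgs, collapse = ", "))
}
```

The interactive() guard is a design choice: knitting runs non-interactively, and installing packages in the middle of rendering a document is usually undesirable, so in that case the sketch only reports what is missing.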

SECTION 1: Load Data from G20 Nations and Observe the Dataset

Before we start creating our heatmaps, let’s first take a look at the dataset we will be using, stored in the file “G20_data.csv”.

# Load the dataset (G20_data.csv must be in your working directory)
G20_data <- read.csv("G20_data.csv")

# Look at the first six rows of the dataset
head(G20_data)
# Check the dimensions (rows, columns) of the data frame
dim(G20_data)
## [1] 176   5
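dim() reports 176 rows and 5 columns. If the file holds exactly one row per economy for each year from 2014 to 2024 (11 years), that would correspond to 16 economies, since 16 × 11 = 176; whether this holds depends on the file itself. Because a heatmap silently hides some data problems (duplicate rows draw overlapping tiles, missing country-year pairs leave empty cells), a few checks before plotting can help. The sketch below runs them on a small synthetic data frame, demo_data, whose values are randomly generated placeholders (not World Bank figures) but whose three columns match the ones used in the plots; replace demo_data with G20_data to check the real file:

```r
# Synthetic stand-in with the three columns the heatmaps use.
# These rates are random placeholders, NOT World Bank figures.
set.seed(1)
demo_data <- expand.grid(
  Economy = c("Country A", "Country B", "Country C"),
  year = 2014:2024,
  stringsAsFactors = FALSE
)
demo_data$female_labour_participation_rate <- round(runif(nrow(demo_data), 20, 70), 1)

# 1. Are the columns used in the plots present?
needed_cols <- c("year", "Economy", "female_labour_participation_rate")
has_cols <- all(needed_cols %in% names(demo_data))

# 2. Exactly one row per economy-year? Duplicates would draw overlapping tiles.
n_duplicates <- sum(duplicated(demo_data[, c("Economy", "year")]))

# 3. Is the economy-by-year grid complete? Missing pairs appear as empty cells.
#    (With no duplicates, this row count check is sufficient.)
n_economies <- length(unique(demo_data$Economy))
n_years <- length(unique(demo_data$year))
grid_complete <- nrow(demo_data) == n_economies * n_years

# 4. How many rates are missing? NA values are drawn in the scale's na.value color.
n_missing_rates <- sum(is.na(demo_data$female_labour_participation_rate))
```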

SECTION 2: Creating a Basic, Static Heatmap

The female labor force participation rate measures the percentage of women in the working-age population who are employed or actively looking for work. It is a key economic indicator, shaped by many factors, including education, government policy, and cultural and social norms. It can also serve as one indicator of a country’s level of gender equality: globally, men participate in the labor force at much higher rates than women. Here, we will look at the female labor force participation rate of the G20 nations over a decade (2014–2024). We restrict the data to the G20 because plotting every country and region in the World Bank data would put hundreds of rows on the y axis, making trends difficult to read.

In this section, you will create a basic heatmap with year on the x axis, country on the y axis, and the rate shown by the color of each tile. Heatmaps use a color gradient to encode the value in each cell; in the viridis palette used below, dark purple marks lower values and yellow marks higher values. The legend on the right-hand side of the heatmap shows which color corresponds to which rate (%).

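ggplot(G20_data, aes(
    x = year,                                # years along the x axis
    y = Economy,                             # one row of tiles per country
    fill = female_labour_participation_rate  # tile color encodes the rate
)) +
  geom_tile() +                              # one tile per country-year row
  scale_fill_viridis_c() +                   # continuous viridis color scale
  labs(
    title = "Female Labour Force Participation Rate (2014–2024)",
    x = "Year",
    y = "Country",
    fill = "Rate (%)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))  # vertical year labels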
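geom_tile() draws one tile per row of data using the x, y, and fill aesthetics, so the data need to be in long format: one row per country-year, with the rate in a single column. G20_data already appears to have that layout, since it has a year column. If you download the indicator yourself, World Bank exports often arrive wide instead, with one column per year. A sketch of reshaping such a table with tidyr::pivot_longer(), assuming read.csv() has prefixed the numeric year headers with "X" (its default check.names behavior); the table name wide and its values are hypothetical:

```r
library(tidyr)

# Hypothetical wide table: one column per year, named the way read.csv()
# renames numeric headers ("2014" becomes "X2014"). Values are made up.
wide <- data.frame(
  Economy = c("Country A", "Country B"),
  X2014 = c(45.2, 60.1),
  X2015 = c(46.0, 59.8),
  X2016 = c(46.9, 60.3)
)

# Reshape to one row per Economy-year, the layout geom_tile() expects
long <- pivot_longer(
  wide,
  cols = starts_with("X"),                      # every year column
  names_to = "year",                            # old column names go here
  names_prefix = "X",                           # strip the "X" prefix
  names_transform = list(year = as.integer),    # "2014" -> 2014L
  values_to = "female_labour_participation_rate"
)

long
```

The resulting long table has the same column names used in the ggplot() calls in this tutorial, so it can be passed straight to them.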

SECTION 3: Changing the Color Scheme

In this section, you will change the color scheme of the heatmap so that the trend is easier to see. In the map above, the colors range from purple through blue and green to yellow, and without checking the legend it can be hard to tell which colors mean higher values and which mean lower ones. Here, you will instead use a gradient from white to dark green. Because this scale uses a single hue, lighter tiles simply mean lower rates and darker tiles mean higher rates. Feel free to substitute whatever colors work best for you! The code also rotates the x-axis labels by 45 degrees so that they do not overlap.

ggplot(G20_data, aes(x = year, y = Economy, fill = female_labour_participation_rate)) +
  geom_tile() +
  # Map the lowest rate to white and the highest rate to dark green
  scale_fill_gradient(low = "white", high = "darkgreen") +
  labs(x = "Year", y = "Country", fill = "Rate (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels so they do not overlap
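The introduction lists RColorBrewer, but the code never calls it. One way to use it, sketched here on made-up data (demo_data is a hypothetical stand-in, not the World Bank figures), is to pass a ColorBrewer sequential palette to scale_fill_gradientn(). The "Greens" palette runs from near-white to dark green, similar in spirit to the white-to-darkgreen gradient above but with more intermediate shades:

```r
library(ggplot2)
library(RColorBrewer)

# Hypothetical data: random rates for three placeholder countries
set.seed(42)
demo_data <- expand.grid(
  Economy = c("Country A", "Country B", "Country C"),
  year = 2014:2024,
  stringsAsFactors = FALSE
)
demo_data$female_labour_participation_rate <- runif(nrow(demo_data), 20, 70)

# Nine shades from the sequential ColorBrewer palette "Greens" (light to dark)
greens <- brewer.pal(9, "Greens")

p_brewer <- ggplot(demo_data, aes(x = year, y = Economy,
                                  fill = female_labour_participation_rate)) +
  geom_tile() +
  scale_fill_gradientn(colours = greens) +  # interpolate between the nine shades
  labs(x = "Year", y = "Country", fill = "Rate (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_brewer
```

ggplot2 also offers scale_fill_distiller(palette = "Greens", direction = 1), which interpolates a ColorBrewer palette without calling brewer.pal() yourself; direction = 1 keeps the light shades at the low end of the scale.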

This color scheme makes the trend easier to see because the two endpoint colors contrast strongly (white is much lighter than dark green). For Saudi Arabia, for example, we can clearly see that the female labor force participation rate has been increasing over time. For many countries, the tiles also turn slightly lighter in 2020, indicating a small decrease in the rate. Could this be related to the onset of the COVID-19 pandemic? Could the resulting layoffs have affected more women than men? Answering that second question would require comparing these figures with male participation rates.
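Small color differences are easy to misread, so a dip like the one around 2020 can be checked numerically. A minimal sketch: compute each economy's year-over-year change with dplyr and list the economies whose rate fell in 2020. The data frame here is a hypothetical example with made-up values; with the real data you would start from G20_data instead:

```r
library(dplyr)

# Made-up values for two hypothetical countries
demo_data <- data.frame(
  Economy = rep(c("Country A", "Country B"), each = 4),
  year = rep(2018:2021, times = 2),
  female_labour_participation_rate = c(50.0, 51.0, 48.5, 50.2,
                                       30.0, 31.5, 32.0, 32.5),
  stringsAsFactors = FALSE
)

# Change from the previous year within each economy
# (dplyr::lag() is written explicitly because stats::lag() does something else)
yoy <- demo_data %>%
  arrange(Economy, year) %>%
  group_by(Economy) %>%
  mutate(change = female_labour_participation_rate -
           dplyr::lag(female_labour_participation_rate)) %>%
  ungroup()

# Economies whose rate fell between 2019 and 2020
fell_in_2020 <- yoy %>%
  filter(year == 2020, change < 0) %>%
  pull(Economy)

fell_in_2020
```

The first year of each economy has no previous value, so its change is NA and it is dropped by the filter() condition.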

SECTION 4: Adapting to Create an Interactive Heatmap

While static heatmaps display overall patterns and trends well, they fall short when audiences need to examine specific data points or explore the data in detail. Interactive heatmaps address this by letting you hover over individual tiles to reveal exact values and additional context. This is particularly valuable when presenting to diverse audiences: policymakers may want to focus on their own country’s trajectory, researchers may need exact figures to cite, and general readers may want to compare values that look similar in color. In this section, you will turn your static heatmap into an interactive visualization using the plotly package. You’ll learn to create custom tooltip text that displays the country name, year, and exact participation rate when users hover over any tile. The resulting interactive graphic can be embedded in HTML reports, shared as a standalone web file, or included in presentations, making your analysis more accessible and engaging for any audience!

# Create tooltip text column
G20_data <- G20_data %>%
  mutate(text = paste0(
    "Country: ", Economy, "\n",
    "Year: ", year, "\n",
    "Female Labor Force Participation: ", round(female_labour_participation_rate, 2), "%"
  ))

# Create ggplot with text in aes
p <- ggplot(G20_data, aes(
    x = year,
    y = Economy,
    fill = female_labour_participation_rate,
    text = text
)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(
    title = "Female Labour Force Participation Rate (2014–2024)",
    x = "Year",
    y = "Country",
    fill = "Rate (%)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

# Convert to interactive plotly
ggplotly(p, tooltip = "text")
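The section notes that an interactive heatmap can be shared as a standalone web file. Because ggplotly() returns an htmlwidget, one way to do this is htmlwidgets::saveWidget(). The sketch rebuilds a small interactive heatmap from made-up data (demo_data is hypothetical) so it runs on its own; with the tutorial's data you would pass the result of ggplotly(p, tooltip = "text") instead. The output file name is arbitrary:

```r
library(ggplot2)
library(plotly)
library(htmlwidgets)

# Hypothetical data with the same columns as G20_data, plus a tooltip column
set.seed(7)
demo_data <- expand.grid(
  Economy = c("Country A", "Country B"),
  year = 2014:2024,
  stringsAsFactors = FALSE
)
demo_data$female_labour_participation_rate <- runif(nrow(demo_data), 20, 70)
demo_data$text <- paste0(
  "Country: ", demo_data$Economy, "\n",
  "Year: ", demo_data$year, "\n",
  "Rate: ", round(demo_data$female_labour_participation_rate, 2), "%"
)

p_demo <- ggplot(demo_data, aes(x = year, y = Economy,
                                fill = female_labour_participation_rate,
                                text = text)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_minimal()

# ggplotly() converts the ggplot into a plotly htmlwidget
widget <- ggplotly(p_demo, tooltip = "text")

# Write the widget to an HTML file that opens in any web browser
out_file <- file.path(tempdir(), "female_lfpr_heatmap.html")
saveWidget(widget, out_file, selfcontained = FALSE)
```

With selfcontained = FALSE, the JavaScript dependencies are written to a folder next to the HTML file, so share both together. The default, selfcontained = TRUE, bundles everything into a single file, which is easier to send around but may require pandoc on some setups.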

CONCLUSION:

Thank you for following along with our tutorial! We went over how to make a basic heatmap with ggplot2, which displays a value across two dimensions (here, country and year) as a grid of colored tiles. Then, we covered how to customize the color scheme of a heatmap using the scale_fill_gradient() function, as well as how to rotate the x-axis labels so that they do not overlap. Finally, we made an interactive heatmap using the ggplotly() function.

This lets viewers hover over the cells and read the value for each country in a given year. We hope this tutorial helped you learn some basic ggplot2 functions to create a heatmap, adapt its design to your needs, and convert it into an interactive graphic. Interactivity with plotly not only enhances the visualization but also conveys the information to the audience more directly: because each value appears when viewers hover over a cell, they do not have to go back to the dataset and search for the figures they are looking for. Heatmaps can be used for many different purposes, such as visualizing election results, environmental data like temperature or air quality, and even where users scroll or click on a website. We hope you can use this tutorial to make a heatmap for your data of choice!

Here are some resources you can use to learn more about creating heatmaps using R:

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
