Welcome to our tutorial on how to create visually appealing heatmaps! Heatmaps are powerful visualization tools that allow researchers to identify patterns, trends, and outliers across two categorical variables simultaneously. This tutorial will teach you how to use the ggplot2 and plotly packages in R to create both static and interactive versions of a heatmap, focusing specifically on longitudinal data from various countries. By the end of this tutorial you should be able to (1) construct a basic heatmap using ggplot2, (2) customize a heatmap’s color scheme to your liking, and (3) convert a static heatmap into an interactive one, allowing you to explore the data in more detail. These skills become especially useful as your data grow more complex, helping you create accessible visualizations for a wide range of audiences.
Heatmaps have many uses. This tutorial focuses on how they can be used in data science to examine and compare temporal trends in the social sciences. You will also learn how to create interactive visualizations that let audiences hover over the heatmap to see more information about the data. The techniques are demonstrated on real-world labor force participation data, but the skills transfer to any dataset that combines categorical variables with a continuous measure.
Within this tutorial, we will use three primary R packages: ggplot2 for creating the base visualizations, plotly for adding interactivity, and dplyr (part of the tidyverse) for manipulating the data. Other packages include RColorBrewer, which provides additional color palettes. The dataset we will examine consists of female labor force participation rates from 2014–2024, sourced from the World Bank. We focus on the G20 nations to examine how women’s workforce participation has evolved across key economic regions, providing a case study of gender equality across cultural contexts.
As stated, this tutorial will help you use ggplot2 and plotly to create different styles of heatmaps. After learning how to create foundational heatmaps, we will explore customization options and interactivity features that enhance data exploration and presentation. Below is the overall structure of the tutorial:
In this section, you will create a foundational heatmap displaying female labor force participation rates across G20 countries over time. This baseline visualization will demonstrate the core structure of heatmap creation in ggplot2.
Main Functions: ggplot(), aes(), geom_tile()
Additional Functions: scale_fill_viridis_c(), labs(), theme_minimal(), theme()
In this section, you will modify the color palette of our baseline heatmap to explore how different color schemes affect data interpretation and visual appeal. We’ll move from the viridis palette used in the baseline heatmap to a custom two-color gradient.
Main Function: scale_fill_gradient()
Additional Parameters: low and high color specifications
In this section, you will transform the static heatmap into an interactive visualization that allows users to hover over tiles to see detailed information. You’ll add custom tooltip text and convert the ggplot object into an interactive plotly widget.
Main Functions: mutate(), ggplotly()
Additional Functions: text aesthetic in aes(), tooltip in ggplotly()
To begin, please load the packages needed for this tutorial.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(countrycode)
library(RColorBrewer)
library(dplyr)
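If any of the library() calls above fails with "there is no package called ...", that package is not installed yet. The sketch below lists which of the tutorial's packages are missing; missing_packages() is a hypothetical helper written for this example, not a function from any package.

```r
# Packages loaded in this tutorial
tutorial_packages <- c("tidyverse", "plotly", "countrycode", "RColorBrewer")

# Hypothetical helper (not from any package): returns the packages that are not installed.
# requireNamespace(quietly = TRUE) returns FALSE instead of raising an error.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(tutorial_packages)
to_install # character(0) means everything is already installed
```

Any names it returns can then be installed once with install.packages(to_install) before re-running the library() calls.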
Before we jump into creating our heatmaps, let’s first take a look at the dataset we will be using, “G20_data”.
#load the dataset
G20_data <- read.csv("G20_data.csv")
#Take a look at the first six rows of the dataset
head(G20_data)
#Check the dimensions of the dataset (rows, columns)
dim(G20_data)
## [1] 176 5
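Because geom_tile() draws one tile per row, a duplicated country-year pair produces tiles drawn on top of each other, and a missing pair leaves a blank cell. A complete panel of n economies observed in each of the 11 years from 2014 to 2024 has 11 × n rows, so the 176 rows reported by dim() are consistent with 16 economies. The sketch below checks both conditions with dplyr and tidyr; it uses a small made-up data frame with the same column names as G20_data (Economy, year, female_labour_participation_rate), and its rates are placeholders, not World Bank figures.

```r
library(dplyr)
library(tidyr)

# Made-up placeholder values that mimic the columns of G20_data
toy <- data.frame(
  Economy = c("Country A", "Country A", "Country A", "Country B", "Country B", "Country B"),
  year = c(2014, 2015, 2016, 2014, 2015, 2015),
  female_labour_participation_rate = c(50.1, 50.4, 50.9, 30.2, 31.0, 31.0)
)

# Country-year pairs that occur more than once (their tiles would overlap)
duplicates <- toy %>%
  count(Economy, year) %>%
  filter(n > 1)

# Country-year combinations with no row at all (these cells would be left blank)
missing_pairs <- toy %>%
  expand(Economy, year) %>%
  anti_join(toy, by = c("Economy", "year"))

duplicates
missing_pairs
```

On the real data, replace toy with G20_data; two empty results mean every country has exactly one value per year.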
The female labor force participation rate measures the percentage of women in the working-age population who are employed or actively looking for work. It is a key economic indicator, and it is influenced by multiple factors, including education, government policies, cultural norms, and other societal factors. The rate can also serve as an indicator of a country’s level of gender equality. Globally, the male labor force participation rate is much higher than the female rate. Here, we will look at the female labor force participation rates of G20 nations over a decade (2014–2024). We chose the G20 nations because including every country in the world would put hundreds of rows on the y axis, making any trend difficult to decipher.

In this section, you will begin by creating a basic heatmap with year on the x axis, country on the y axis, and the rate indicated by the color of each tile. Heatmaps use a color gradient to represent the value in each cell; with the viridis palette used here, dark purple indicates lower rates and yellow indicates higher rates. The legend on the right-hand side of the heatmap shows which color corresponds to which rate (%).
ggplot(G20_data, aes(
x = year,
y = Economy,
fill = female_labour_participation_rate
)) +
geom_tile() +
scale_fill_viridis_c() +
labs(
title = "Female Labour Force Participation Rate (2014–2024)",
x = "Year",
y = "Country",
fill = "Rate (%)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
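How the x axis looks depends on how year was read in. If read.csv() stored it as a number, ggplot2 treats it as a continuous variable and picks its own axis breaks, which may not label every year. Converting it to a factor gives one discrete position, and one label, per year. A minimal sketch under that assumption, again using made-up placeholder values rather than the World Bank data:

```r
library(ggplot2)

# Made-up placeholder values; year is numeric, as read.csv() would store it
toy <- data.frame(
  Economy = rep(c("Country A", "Country B"), each = 3),
  year = rep(2014:2016, times = 2),
  female_labour_participation_rate = c(50.1, 50.4, 50.9, 30.2, 31.0, 31.8)
)

# factor(year) makes the x axis discrete, with one tick and label per year
p_factor <- ggplot(toy, aes(
  x = factor(year),
  y = Economy,
  fill = female_labour_participation_rate
)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(x = "Year", y = "Country", fill = "Rate (%)") +
  theme_minimal()

p_factor
```

With G20_data, the same change is simply x = factor(year) in the aes() call above; if year is already stored as text, factor() leaves the labels unchanged.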
In this section, you will change the color scheme of the heatmap so that the trend is easier to see. In the map above, the tile colors range from purple through blue and green to yellow, and it can be difficult to tell at a glance which colors indicate higher values and which indicate lower ones. Here, you will make the colors range from white to dark green instead, which limits the color variety so that lighter tiles read as lower rates and darker tiles as higher rates. This color scheme can be changed to whatever you feel works best, so please feel free to change the colors! The code also shows how to rotate the x-axis labels by 45 degrees so that they do not overlap.
ggplot(G20_data, aes(x = year, y = Economy, fill = female_labour_participation_rate)) +
geom_tile() +
scale_fill_gradient(low = "white", high = "darkgreen") + # lighter tiles = lower rates, darker tiles = higher rates
labs(x = "Year", y = "Country", fill = "Rate (%)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels so they do not overlap
This color scheme makes the trend easier to see because the two colors chosen contrast strongly (white is much lighter than dark green). For Saudi Arabia, for example, we can clearly see that the female labor force participation rate is increasing over time. For many countries, the tiles also turn slightly lighter in 2020, indicating a slight decrease in the rate. Does this have something to do with the onset of the COVID-19 pandemic? Could it be that the resulting layoffs affected more women than men?
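Behind the scenes, scale_fill_gradient(low = "white", high = "darkgreen") rescales the rates to the range 0 to 1 (the lowest rate becomes 0, the highest becomes 1) and interpolates a color between the two endpoints for each value. The sketch below reproduces that idea with the scales package, which is installed as a dependency of ggplot2, so you can see which colors particular rates receive. The three rates are made-up examples, and the exact hex codes should be treated as illustrative.

```r
# Made-up example rates (%), standing in for a column of the data
rates <- c(20, 45, 70)

# Palette with the same two endpoint colors used in the heatmap above,
# interpolated in the Lab color space (ggplot2's default for gradients)
white_to_green <- scales::seq_gradient_pal(low = "white", high = "darkgreen", space = "Lab")

# Rescale rates to 0-1 (lowest -> 0, highest -> 1), then look up one color per value
tile_colors <- white_to_green(scales::rescale(rates))
tile_colors
```

Because the rescaling uses the range of the data, the lightest tile always marks the lowest rate in the dataset, not a rate of 0%.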
While static heatmaps effectively display overall patterns and trends, they have limitations when audiences need to examine specific data points or explore the data in detail. Interactive heatmaps address this by letting you hover over individual tiles to reveal exact values and additional context. This is particularly valuable when presenting to diverse audiences: policymakers may want to focus on their own country’s trajectory, researchers may need exact figures to cite, and general audiences may be curious about the exact values, especially when comparing tiles whose colors look similar. In this section, you will transform your static heatmap into an interactive visualization using the plotly package. You’ll learn to create custom tooltip text that displays the country name, year, and exact participation rate when users hover over any tile. The resulting interactive graphic can be embedded in HTML reports, shared as standalone web files, or included in presentations, making your data analysis more accessible and engaging for any audience!
# Create tooltip text column
G20_data <- G20_data %>%
mutate(text = paste0(
"Country: ", Economy, "\n",
"Year: ", year, "\n",
"Female Labor Force Participation: ", round(female_labour_participation_rate, 2), "%"
))
# Create ggplot with text in aes
p <- ggplot(G20_data, aes(
x = year,
y = Economy,
fill = female_labour_participation_rate,
text = text
)) +
geom_tile() +
scale_fill_viridis_c() +
labs(
title = "Female Labour Force Participation Rate (2014–2024)",
x = "Year",
y = "Country",
fill = "Rate (%)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
# Convert to interactive plotly
ggplotly(p, tooltip = "text")
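The introduction to this section mentions sharing the interactive heatmap as a standalone web file. Since the object returned by ggplotly() is an htmlwidget, one way to do that is htmlwidgets::saveWidget(). A minimal sketch, using a small made-up data frame and a temporary output path (both placeholders):

```r
library(ggplot2)
library(plotly)

# Made-up placeholder values with the same columns used in the tutorial
toy <- data.frame(
  Economy = rep(c("Country A", "Country B"), each = 2),
  year = rep(2014:2015, times = 2),
  female_labour_participation_rate = c(50.1, 50.4, 30.2, 31.0)
)
toy$text <- paste0("Country: ", toy$Economy, "\n", "Year: ", toy$year)

p_toy <- ggplot(toy, aes(x = year, y = Economy,
                         fill = female_labour_participation_rate, text = text)) +
  geom_tile()

widget <- ggplotly(p_toy, tooltip = "text")

# selfcontained = FALSE writes the JavaScript libraries to a folder next to the HTML file
out_file <- file.path(tempdir(), "g20_heatmap.html")
htmlwidgets::saveWidget(widget, out_file, selfcontained = FALSE)
file.exists(out_file)
```

In the tutorial itself, widget would simply be ggplotly(p, tooltip = "text"). Leaving selfcontained at its default of TRUE instead tries to bundle everything into one HTML file, which some htmlwidgets versions can only do when pandoc is available.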
Thank you for following along with our tutorial! We went over how to make a basic heatmap using ggplot2, which displays how a value varies across two categorical dimensions (here, country and year). Then, we covered how to customize the color scheme of a heatmap using the scale_fill_gradient() function, as well as how to rotate the x-axis labels so that they do not overlap. Finally, we made an interactive heatmap using the ggplotly() function.
This enables viewers to hover over the cells and read the value for each country in a specific year. Hopefully, this tutorial helped you learn some basic ggplot2 functions to create a heatmap, adapt its design to your needs, and convert it into an interactive graphic. Interactivity with plotly not only enhances the visualization but also conveys the information to the audience more effectively: since each data point is revealed by hovering over its cell, viewers do not have to go back to the dataset and search for the information they are looking for. Heatmaps can be used for many different purposes, such as visualizing election results, environmental data like temperature or air quality, and even user scrolling behavior on a website. We hope you can use this tutorial to make a heatmap for your data of choice!
Here are some resources you can use to learn more about creating heatmaps using R:
The R Graph Gallery: ggplot2 heatmap, including an interactive plotly version https://r-graph-gallery.com/79-levelplot-with-ggplot2.html
Little Miss Data: Time Based Heatmaps in R https://www.littlemissdata.com/blog/heatmaps