
An Introduction to Treemaps
We live in a data-driven world, but raw numbers are not always quick or
easy to interpret. Visualizations such as charts and treemaps (also
written "tree maps"; the technique is called treemapping) present data
in a form that is much easier to digest.
Treemaps display hierarchical (tree-structured) data as a set of
nested rectangles. Each branch of the tree is given a rectangle, which
is then tiled with smaller rectangles representing sub-branches. A leaf
node’s rectangle has an area proportional to a specified dimension of
the data. Often the leaf nodes are colored to show a separate dimension
of the data.
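As a rough illustration of the proportionality rule described above, the following base R sketch computes the fraction of the drawing area that each leaf and each branch would receive. The hierarchy and its values are made up for the example, and the names `leaves`, `area_share`, and `branch_share` are hypothetical.

```r
# Toy hierarchy (made-up values): two branches, each with a few leaf nodes.
leaves <- data.frame(
  branch = c("A", "A", "B", "B", "B"),
  leaf   = c("a1", "a2", "b1", "b2", "b3"),
  value  = c(30, 10, 20, 25, 15)
)

# A leaf's share of the drawing area is its value divided by the grand total.
leaves$area_share <- leaves$value / sum(leaves$value)

# A branch rectangle has to hold all of its leaves, so its share is their sum.
branch_share <- tapply(leaves$area_share, leaves$branch, sum)

print(leaves)
print(branch_share)
```

Whatever tiling algorithm is used afterwards only decides where these areas go and what shape they take; the shares themselves are fixed by the data.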
When the color and size dimensions are correlated in some way with
the tree structure, one can often easily see patterns that would be
difficult to spot in other ways, such as whether a certain color is
particularly relevant. A second advantage of treemaps is that, by
construction, they make efficient use of space. As a result, they can
legibly display thousands of items on the screen simultaneously.
History of Treemaps
Area-based visualizations have existed for decades. For example,
mosaic plots (also known as Marimekko diagrams) use rectangular tilings
to show joint distributions (i.e., most commonly they are essentially
stacked column plots where the columns are of different widths). The
main distinguishing feature of a treemap, however, is the recursive
construction that allows it to be extended to hierarchical data with any
number of levels. This idea was invented by Professor Ben Shneiderman at
the University of Maryland Human-Computer Interaction Lab in the early
1990s. Shneiderman and his collaborators then deepened the idea by
introducing a variety of interactive techniques for filtering and
adjusting treemaps.
These early treemaps all used the simple “slice-and-dice” tiling
algorithm. Despite many desirable properties (it is stable, preserves
ordering, and is easy to implement), the slice-and-dice method often
produces tilings with many long, skinny rectangles. In 1994 Mountaz Hascoet and Michel
Beaudouin-Lafon invented a “squarifying” algorithm, later
popularized by Jarke van Wijk,
that created tilings whose rectangles were closer to square. In 1999 Martin
Wattenberg used a variation of the “squarifying” algorithm that he
called “pivot and slice” to create the first Web-based treemap, the
SmartMoney Map of the Market, which displayed data on hundreds of
companies in the U.S. stock market. Following its launch, treemaps
enjoyed a surge of interest, especially in financial contexts.
A third wave of treemap innovation came around 2004, after Marcos
Weskamp created the Newsmap, a treemap that displayed news headlines.
This example of a non-analytical treemap inspired many imitators, and
introduced treemaps to a new, broad audience. In recent
years, treemaps have made their way into the mainstream media, including
usage by the New York Times. The Treemap Art Project produced 12 framed
images for the National Academies (United States), shown in the Every
AlgoRiThm has ART in It exhibit in Washington, DC, and another set for
the collection of the Museum of Modern Art in New York.
Tiling Algorithms for Treemaps
To create a treemap, one must define a tiling algorithm, that is, a
way to divide a region into sub-regions of specified areas. Ideally, a
treemap algorithm would create regions that satisfy the following
criteria:
A small aspect ratio—ideally close to one. Regions with a small
aspect ratio (i.e., fat objects) are easier to perceive.
Preserve some sense of the ordering in the input data
(ordered).
Change only slightly when the underlying data changes slightly (high
stability).
These properties have an inverse relationship. As the aspect ratio is
optimized, the order of placement becomes less predictable. As the order
becomes more stable, the aspect ratio is degraded.
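To make the trade-off concrete, here is a minimal base R sketch of the slice-and-dice tiling mentioned in the history section, together with an aspect-ratio measure. It is an illustration under stated assumptions rather than a reference implementation: the helper names `slice_rect` and `aspect_ratio`, the toy tree, and the choice to measure aspect ratio as max(width/height, height/width) are all assumptions made for this example.

```r
# Split one rectangle into strips whose areas are proportional to `values`.
# vertical = TRUE places the strips side by side along the x axis.
slice_rect <- function(values, xmin, ymin, xmax, ymax, vertical = TRUE) {
  values <- unname(values)
  ends   <- cumsum(values) / sum(values)  # cumulative fractions, ending at 1
  starts <- c(0, head(ends, -1))
  if (vertical) {
    data.frame(xmin = xmin + starts * (xmax - xmin),
               xmax = xmin + ends   * (xmax - xmin),
               ymin = ymin, ymax = ymax)
  } else {
    data.frame(xmin = xmin, xmax = xmax,
               ymin = ymin + starts * (ymax - ymin),
               ymax = ymin + ends   * (ymax - ymin))
  }
}

# Aspect ratio as max(w/h, h/w): 1 is a square, larger values are skinnier.
aspect_ratio <- function(r) {
  w <- r$xmax - r$xmin
  h <- r$ymax - r$ymin
  pmax(w / h, h / w)
}

# Toy two-level tree (made-up values): branch name -> leaf values.
tree <- list(A = c(30, 10), B = c(20, 25, 15))

# Level 1: one vertical strip per branch, sized by the branch total.
branches <- slice_rect(sapply(tree, sum), 0, 0, 1, 1, vertical = TRUE)

# Level 2: cut each branch strip in the other direction (the axis alternates).
leaves <- do.call(rbind, lapply(seq_along(tree), function(i) {
  b <- branches[i, ]
  slice_rect(tree[[i]], b$xmin, b$ymin, b$xmax, b$ymax, vertical = FALSE)
}))
leaves$area   <- (leaves$xmax - leaves$xmin) * (leaves$ymax - leaves$ymin)
leaves$aspect <- aspect_ratio(leaves)
print(leaves)

# A single level with many equal items shows the long, skinny strips.
strips <- slice_rect(rep(1, 20), 0, 0, 1, 1)
print(max(aspect_ratio(strips)))
```

The input order is preserved exactly and a small change in one value only nudges the neighbouring cuts, which is why slice-and-dice scores well on ordering and stability. The price is the aspect ratio, which grows with the number of items placed in a single strip.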
Advantages of Treemaps
Here are some advantages of tree maps over bar (or pie) charts:
Efficient Space Utilization. Treemaps are excellent
for displaying hierarchical (tree-structured) data and for efficiently
using space. Unlike bar or pie charts that may require more space to
display many items distinctly, tree maps can show hundreds or even
thousands of items in a compact space.
Effective for Large Data Sets. Tree maps can handle
large datasets much better than pie charts and, to a certain extent, bar
charts. They can display a vast number of items at once, making it
easier to compare different segments without flipping through multiple
charts.
Hierarchical Representation. Tree maps can show
parts of a whole and how those parts are subdivided into smaller parts,
which is something pie charts cannot do and bar charts struggle with.
This makes tree maps ideal for visualizing nested data in a way that
immediately reveals the structure of the data.
Color Coding and Size Dimensions. Tree maps use both
size and color to represent different dimensions of data, offering a
multifaceted view of the dataset. For example, the size of each box can
represent a quantity, while the color indicates a category or metric,
providing a dense and rich informational summary at a glance.
Better for Comparing Proportions. While pie charts
can show proportions, they can be misleading or hard to interpret when
there are many small slices. Tree maps can more accurately represent
proportions, especially for categories that make up a smaller percentage
of the whole, because the size of each rectangle can be more precisely
compared than the angles or areas of pie slices.
Readability of Small Categories. In bar and pie
charts, small categories can become indistinguishable or require
additional labeling that clutters the chart. In a tree map, even small
categories can be more easily identified and analyzed without
overwhelming the visual presentation.
Intuitive for Certain Types of Data. For datasets
where the hierarchical structure or categorization is significant, tree
maps provide an intuitive and immediate understanding of the data’s
structure and composition. This makes them especially useful for certain
types of financial, organizational, or categorical data.
While tree maps have these advantages, the choice between a tree map
and a bar or pie chart should be based on the specific goals of the data
visualization, the nature of the data, and the audience’s familiarity
with these tools. Each type of chart has its place in data presentation,
and understanding the strengths of each can help in selecting the most
effective way to communicate data insights.
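The hierarchical and color-coding points above can be sketched with the treemapify package, which is also used in the application later in this article. The example relies on the G20 data set that ships with treemapify; mapping `region` to the subgroup, GDP to area, and HDI to fill is a choice made for this illustration only.

```r
library(ggplot2)
library(treemapify)

# G20 is an example data set bundled with treemapify.
# area = tile size, fill = colour dimension, subgroup = parent level of the tree.
p <- ggplot(G20, aes(area = gdp_mil_usd, fill = hdi,
                     label = country, subgroup = region)) +
  geom_treemap() +
  geom_treemap_subgroup_border() +
  geom_treemap_subgroup_text(place = "centre", grow = TRUE, alpha = 0.5,
                             colour = "black", fontface = "italic") +
  geom_treemap_text(colour = "white", place = "topleft", reflow = TRUE)

print(p)
```

Each country is a leaf tile, the thick borders group countries by region, and the fill shows a second measure, so one picture carries the structure, the sizes, and an extra dimension at once.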
A Real-world Application
Below is the R code used to create the treemap shown above:
#--------------------------------------------
# Stage 0: Load R packages and select font
#--------------------------------------------
# Load R packages:
library(rvest)
library(ggplot2)
library(dplyr)
library(stringr)
library(DescTools)  # For capitalizing the first letter of a string.
library(treemapify) # For plotting the treemap chart.
library(viridis)    # For using Viridis color scales.
library(showtext)   # For using Google fonts.
# Select Open Sans font:
my_font <- "Open Sans"
font_add_google(name = my_font, family = my_font)
showtext_auto()
# Extract GDP data by country:
url <- "https://www.worldometers.info/gdp/gdp-by-country/"
url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="example2"]') %>%
  html_table() %>%
  .[[1]] -> gdpData
#-------------------------------
# Stage 1: Data pre-processing
#-------------------------------
# Keep columns 2 to 4 (country, nominal GDP, abbreviated GDP) and shorten the names:
gdpData %>%
  select(2, 3, 4) %>%
  rename(gdp = `GDP (nominal, 2022)`, gdpAbb = `GDP (abbrev.)`) -> gdpData
# Strip "$" and "," so gdp can become numeric; split gdpAbb into a number and a short unit:
gdpData %>%
  mutate(gdp = str_replace_all(gdp, pattern = "\\$|\\,", replacement = ""),
         gdpAbb = str_replace_all(gdpAbb, pattern = "\\$", replacement = "")) %>%
  mutate(unit = str_replace_all(gdpAbb, pattern = "[0-9]|\\.| ", replacement = "")) %>%
  mutate(unit = str_sub(unit, start = 1, end = 3)) %>%
  mutate(unit = StrCap(unit)) %>%
  mutate(gdpAbb = str_replace_all(gdpAbb, pattern = "[a-z]| ", replacement = "")) %>%
  mutate(gdp = as.numeric(gdp)) -> gdpData
# Share of total GDP as a one-decimal percentage string such as "5.0%":
gdpData %>%
  mutate(shareGdp = 100*gdp / sum(gdp)) %>%
  mutate(shareGdp = round(shareGdp, 1)) %>%
  mutate(shareGdp = as.character(shareGdp)) %>%
  mutate(shareGdp = case_when(!str_detect(shareGdp, "\\.") ~ str_c(shareGdp, ".0"), TRUE ~ shareGdp)) %>%
  mutate(shareGdp = str_c(shareGdp, "%")) -> gdpData
# Tile label: country on the first line, then abbreviated GDP, unit, and share:
gdpData %>%
  mutate(label = str_c(Country, "\n", gdpAbb, " ", unit, " (", shareGdp, ")")) -> gdpData
#-------------------------------
# Stage 2: Plot Tree Map
#-------------------------------
gdpData %>%
  ggplot(aes(area = gdp, fill = gdp, label = label)) +
  geom_treemap(show.legend = FALSE) +
  geom_treemap_text(colour = "white",
                    place = "centre",
                    family = my_font,
                    size = 13) +
  scale_fill_viridis(option = "H", direction = -1) +
  theme(legend.title = element_blank()) +
  labs(title = "Nominal GDP Ranked by Country 2022",
       subtitle = "Gross Domestic Product (GDP) is the monetary market value of all final goods and services made within a country during a specific period.\nAs of 2022, the United States and China would occupy the first two places. The US is ahead of China by $7 trillion in 2022 but the margin\nis coming down in nominal ranking as China's GDP growth rate of 2023 (5.01%) is higher than the US's 2.09%.",
       caption = "Source: https://www.worldometers.info/gdp/gdp-by-country/") +
  theme(plot.margin = unit(c(0.5, 0.5, 0.5, 0.5), "cm")) +
  theme(text = element_text(family = my_font)) +
  theme(plot.title = element_text(size = 18, color = "grey10"),
        plot.subtitle = element_text(size = 10, color = "grey30"),
        plot.caption = element_text(color = "grey30", size = 8))
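
The cleaning and share-formatting steps in Stage 1 can also be checked offline, without scraping the page. The sketch below uses base R only and made-up input strings in the same shape as the scraped GDP column; `gdp_raw` and its values are hypothetical. It also shows `sprintf()` as one possible shorter way to get a one-decimal percentage string, which may round exact halves slightly differently from `round()`.

```r
# Made-up strings shaped like the scraped nominal GDP column.
gdp_raw <- c("$1,500,000", "$400,000", "$100,000")

# Drop the dollar sign and the thousands separators, then convert to numeric.
gdp <- as.numeric(gsub("[$,]", "", gdp_raw))

# Percentage share with exactly one decimal place; "%%" prints a literal "%".
share <- sprintf("%.1f%%", 100 * gdp / sum(gdp))
print(share)
```

Because `sprintf("%.1f")` always writes one decimal, the separate step that appends ".0" to whole numbers is not needed in this variant.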
