Recently, SocialCops has devoted considerable attention to the topic of geospatial data visualization. Vasavi Ayalasomayajula highlighted seven techniques to visualize geospatial data. I looked at how the R packages {shiny} and {leaflet} could be used to create an interactive choropleth. SocialCops also published a free Introduction to GIS in R course, which included lengthy lessons on creating static, animated and interactive maps.

Expanding on this topic, I’d like to compare the strengths and weaknesses of a few of the most common types of thematic maps. The “best” style of geospatial data visualization often depends on the type of data at hand and what you hope to communicate as a storyteller. It is therefore important to understand the fundamental differences among your thematic mapping options.

The options included for discussion here are a choropleth, a dot density map, a proportional symbols map, and a 3D choropleth. To better emphasize the respective merits of these options, I have visualized the same dataset through each thematic mapping style, noting what kinds of stories each has to tell.

The dataset covers the three most recent decades of Indian census figures depicting household access to electricity and latrines at the district level. In addition to containing easily understood quantities, I find this dataset particularly instructive for three main reasons. First, it represents a highly uneven population density common in geospatial data of human settlements. Second, it contains information which we want to communicate in a raw count in some cases but also as a standardized rate in others. The tension between visualizing a raw count versus a standardized rate encapsulates the fundamental differences in the thematic mapping options discussed here.

Lastly, on top of being an instructive tool for data visualization principles, this dataset holds stories that are valuable in their own right. At a fine-grain level, and cross-tabulated by key social identities, this dataset tracks thirty years of progress (and lack thereof in some cases) for hundreds of millions of people towards access to two fundamental amenities needed for healthy, productive lives. Inside the dataset is rich context for stories of public health, economic growth, competitive federalism, and societal inequality to name a few.

Once again I’ll use {shiny} to allow for quick comparisons of each type of thematic map using the same dataset. Instead of {leaflet} however, I’ll use the {mapdeck} library from David Cooley. Mapdeck is a relatively new R library under active development which makes it easy to plot interactive maps using Mapbox GL and Deck.gl.

The shiny app running throughout the examples in this post can be found here:

https://seanangio.shinyapps.io/thematic_mapping/

Choropleths

The choropleth is likely the most ubiquitous thematic map. This style maps color to enumeration units (such as districts in India in this case) according to the value of some quantity.

In the example below, we can see how the percentage of households having access to electricity sharply varies at the district-level with respect to parameters such as time, societal cross-section, or demographic cross-section. In particular, we can observe how many parts of India, particularly South India and Gujarat, become more yellow as more households have gained access to electricity over time. At the same time, other regions, such as parts of Uttar Pradesh and Bihar, have remained stubbornly blue and purple, representing very low levels of electricity access.

India’s Household Access to Electricity (1991-2011); visit https://seanangio.shinyapps.io/thematic_mapping/ to explore the app on your own


It is important to note that here I am using a choropleth to map a rate or a ratio as opposed to a raw count. A percentage is one example of a standardized rate. Our data must fall in the range of 0 to 100.

The nature of standardized data is very different from that of raw count data. The number of households having electricity in a district is an example of a raw count. It has not been manipulated or transformed in any way. Converting this raw count to a percentage standardizes it. When using R, whatever package we choose (such as {tmap}, {ggplot2}, {leaflet} or {mapdeck}) will allow us to create a choropleth with raw count data, but it is usually a bad idea. Depending on the range of your data, color is often a poor aesthetic to communicate the magnitude of differences amongst enumeration units.
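As a minimal sketch of this standardization step, the base R snippet below converts raw household counts into percentages. The district names reuse examples from this post, but the figures and column names are invented for illustration and are not census values.

```r
# Hypothetical figures for illustration only; not actual census values.
districts <- data.frame(
  district       = c("Bangalore", "Leh"),
  hh_total       = c(2000000, 20000),   # all households (raw count)
  hh_electricity = c(1900000, 15000)    # households with electricity (raw count)
)

# Standardize: a percentage always falls between 0 and 100,
# regardless of how large or small the district is.
districts$pct_electricity <- 100 * districts$hh_electricity / districts$hh_total

districts
```

In this made-up example the raw counts differ by a factor of more than 100, while the percentages (95 and 75) sit on a shared 0 to 100 scale, which is what makes color a workable encoding.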

In this case, mapping percentages through a choropleth quickly provides a great deal of insight into a number of important questions. My previous blog post explored some of these questions in greater detail. However, a choropleth in this case has a few major failings. In particular, it conceals the vastly different populations within individual enumeration units. Geographically large enumeration units can distract attention from smaller units that may in fact hold much larger populations. This is a non-trivial problem for Indian districts, where geographically small metropolitan districts (such as Bangalore) have very large populations compared to sparsely populated, primarily rural districts (such as Leh).

Examining the sea of colors in the choropleth, we have to be mindful of the fact that India’s population density, as is generally the case elsewhere in the world, is hardly uniform. Accordingly, although the choropleth effectively communicates relative differences in a standardized rate across a spatial area, it is unable to honestly represent the magnitude of differences in raw counts. From a mapmaking perspective, how might we address this challenge?

Dot Density Maps

A dot density map is one visualization option that can potentially address this problem. Unlike a choropleth, a dot density map is often an excellent choice for visualizing raw count data because it places dots at random locations within each enumeration unit, with the number of dots proportional to the unit’s value. This makes it very easy to see where values cluster in geographic space.

Typically, dot density maps have been a little more complicated to create in R than choropleths. The dots.R script I wrote draws heavily on functions from a blog by Andreas Beger. His functions are themselves modifications of the sf::st_sample() function. However, the recent addition of an “exact” argument to the sf::st_sample() function should make this process much simpler in the future.
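For reference, here is a rough sketch of how sf::st_sample() can place dots, assuming a recent version of {sf} that supports the exact argument and accepts one sample size per polygon. The two square "districts", their household counts, and the helper name square() are made up for illustration; this is not the dots.R script used for the app.

```r
library(sf)

# Helper (hypothetical): a 1 x 1 square polygon with its lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two invented districts with invented household counts.
districts <- st_sf(
  district   = c("A", "B"),
  households = c(250000, 75000),
  geometry   = st_sfc(square(0, 0), square(2, 0))
)

dot_value <- 25000
n_dots <- round(districts$households / dot_value)   # dots per district

set.seed(1)  # random placement, so fix the seed for reproducibility
dots <- st_sample(districts, size = n_dots, type = "random", exact = TRUE)

length(dots)  # total number of dots placed across both districts
```

Passing a vector to size asks for a separate sample in each polygon, so each district receives its own number of randomly placed dots.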

Our shiny app includes an example of a “one-to-many” dot density map, where each dot represents 25,000 households. Note that in each case thus far, we manipulated the data in some way. In the choropleth, I converted raw counts to a percentage. In the dot density map, I divided raw counts by a chosen dot value, in this case 25,000.

Unlike a choropleth, a dot density map enables me to depict raw population growth and where it clusters over time as more dots are plotted. It also conveys the difference between dense urban centers and sparsely populated rural areas more accurately than the choropleth does.

The example below compares the vastly different population clusters when mapping households in India having access to both electricity and latrines versus those having neither of the two amenities. The first case reveals a large concentration of dots in India’s largest urban centers – places like Delhi, Bangalore, Mumbai, Kolkata, and the state of Kerala. Toggling to depict households with access to neither electricity nor a latrine, however, reveals a major population shift to Uttar Pradesh and Bihar.

India’s 2011 Households with Access to both Electricity and Latrines vs. those having Neither Amenity


Examining the same parameter choices through the choropleth highlights the stark changes in colors, but the dot density map is better able to represent the raw differences in these populations.

Dot density maps also have the unique advantage of being able to map multivariate data. For example, we can plot both urban and rural populations at the same time using different colored dots. This may be their most valuable advantage compared to other geospatial visualizations. With each dot representing 25,000 households, the graphic below shows that the overwhelming majority of 2011 households in India with access to neither electricity nor a latrine were classified as rural rather than urban.

Of course, the dot density map is not without its own failings. Perhaps most importantly, we are unable to retrieve numeric data from the map. Although it is possible to examine clusters of populations for any given parameter, calculating exactly how many households fall into any given category is usually not possible.

By choosing to map population counts, we lose insight into the percentage in any given area. Do most households in a certain district have access to electricity? Using the choropleth, we can easily match a district’s color to the percentage given in the legend. By contrast, the dot density map is a poor choice for answering this kind of question.

Another downside is that the final appearance of the dot density map can highly depend on two largely subjective factors: dot value and dot size. What value should a single dot represent? 25,000 households or 50,000 households? We may have some heuristics, but no definitive answer. Secondly, how large, either in pixels or a unit like meters depending on your software, should each dot be? Both questions can have a large impact on the appearance of the final map.
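To illustrate how much the choice of dot value matters, the base R sketch below counts dots for three invented districts under the two candidate values mentioned above. The household counts are hypothetical, and simple rounding to the nearest whole dot is an assumption of this example.

```r
# Hypothetical household counts for three districts (not census values).
households <- c(metro = 1200000, town = 140000, rural = 20000)

# Number of dots per district for two candidate dot values,
# using simple rounding to the nearest whole dot.
dots_25k <- round(households / 25000)
dots_50k <- round(households / 50000)

rbind(dots_25k, dots_50k)
```

Here, doubling the dot value halves the dots in the large district (48 to 24) but drops the smallest district from one dot to none, so it disappears from the map entirely.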

There are other more technical considerations at hand when drawing a dot density map. For instance, how do you handle populations falling just below the dot value threshold? If a single dot represented 25,000 households and a district had 24,999 households in any given category, should the map assign a dot? Techniques like stochastic rounding that take into account probability are useful in this situation. Compared to a choropleth, the dot density map may be more difficult for an average viewer to interpret.
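Stochastic rounding can be sketched in a few lines of base R. The function name and the household counts below are hypothetical; the idea is only that a district with 24,999 households receives a dot with probability 24,999/25,000 rather than never receiving one.

```r
# Stochastic rounding: round up with probability equal to the fractional part,
# so the expected number of dots equals the unrounded value.
stochastic_round <- function(x) {
  lower <- floor(x)
  lower + as.integer(runif(length(x)) < (x - lower))
}

set.seed(42)                            # for reproducibility
dot_value  <- 25000
households <- c(24999, 12500, 60000)    # hypothetical counts

# Each result is either the floor or the ceiling of households / dot_value.
stochastic_round(households / dot_value)
```

Over many districts the rounding errors tend to cancel out, so the total number of dots stays close to the total count divided by the dot value.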

With all of these concerns in mind, is there a way to simultaneously map the percentage and the raw count?

Proportional Symbols Maps

One method for mapping both the percentage and the raw total of a given variable is a proportional symbols map, more informally known as a bubble map. In this type of thematic map, we draw a symbol (most typically a circle) centered on the enumeration unit. We can assign the radius of the circle to reflect the raw total and the color to represent the percentage.

Compared to a choropleth, some of the geographic information of the map has been lost – district shapes are covered by circles. Nevertheless, the proportional symbols map retains a geospatial arrangement that will be sufficient for many use cases. Without needing the exact geographic shapes, the circles can still reveal regional trends.

In the accompanying shiny app, the colors of the circles communicate the percentage of households having access to electricity or a latrine – a fact I was unable to show in the dot density map. At the same time, the size of the circles reflects the raw count, solving the choropleth’s problem of concealing population totals.

The example below traces India’s progression of household access to latrines from 1991 to 2011 through a proportional symbols map. The graphic below tells at least two stories simultaneously. First, through the increasing size of the circles, we observe that, in raw terms, India’s population with access to a latrine has grown considerably since 1991. Secondly, the colors of the circles communicate a typical value for access to latrines in a particular district. Note that the dot density map is able to represent the first story, but not the second, whereas the choropleth tells the second story, but not the first.

To tell this story for just one district, note how, in each decade, the size of the circle representing Bangalore has grown substantially, suggesting that its population with access to a latrine has increased in raw terms. At the same time, the color changes from the 70-80% band, to 80-90%, to 90-100%, suggesting that the rate of access has improved alongside the growth in population.

India’s Household Access to Latrines (1991-2011)


The proportional symbols map can quite flexibly handle many types of data, but a common problem is congestion when the number of enumeration units is high. With 640 districts in 2011, this is certainly a problem for this particular dataset. If instead we were perhaps dealing with Indian states or large cities, this option might have been more effective.

When facing congestion, we often need to scale or transform the circles by some factor. Like choosing a dot value in the dot density map, this can also be somewhat arbitrary. {mapdeck}’s add_scatterplot() requires circle radius to be in meters, and so I divided the raw household counts by a factor of 10. This outcome works better for some parameter selections than others.
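To make the scaling choice concrete, the sketch below compares the divide-by-10 radius described above with a common alternative in which the radius grows with the square root of the count, so that circle area rather than radius is proportional to the value. The counts and the constant k are hypothetical, and the square-root version is not what the app uses.

```r
# Hypothetical household counts (not census values).
households <- c(small = 10000, large = 40000)

# Linear scaling, as in the divide-by-10 approach: radius in meters.
radius_linear <- households / 10

# Square-root scaling: circle area is proportional to the count.
k <- 10                                  # arbitrary constant, chosen by eye
radius_sqrt <- k * sqrt(households)

# Ratio of large to small circle *areas* under each scheme.
area_ratio <- function(r) unname((r["large"] / r["small"])^2)
area_ratio(radius_linear)   # 16: a 4x count is drawn with 16x the area
area_ratio(radius_sqrt)     # 4: area matches the 4x difference in count
```

Linear radius scaling exaggerates large districts, which can worsen congestion. Square-root scaling is gentler, but still leaves the constant as a judgment call.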

Another problem can lie in the interpretation of circle sizes themselves. From a perceptual standpoint, a two-dimensional quantity such as area can be difficult to interpret accurately compared to a one-dimensional quantity such as length. With this in mind, sometimes you will find a “graduated” symbols map where symbol size is binned to a few categories to make it easier to match a circle to the size that it represents. In this case however, a legend for circle size is regrettably absent.
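A graduated symbols map could be sketched by binning counts into a handful of size classes before assigning radii. The break points and radii below are arbitrary choices for illustration, not values from the app.

```r
# Hypothetical household counts (not census values).
households <- c(8000, 45000, 120000, 900000)

# Arbitrary class breaks and one fixed radius (in meters) per class.
breaks <- c(0, 10000, 100000, 500000, Inf)
radii  <- c(2000, 5000, 10000, 20000)

# cut() returns the index of the class each count falls into;
# right = FALSE makes each class include its lower bound.
size_class <- cut(households, breaks = breaks, labels = FALSE, right = FALSE)
radius     <- radii[size_class]

data.frame(households, size_class, radius)
```

With only four possible sizes, a reader can match any circle to a legend entry, at the cost of hiding differences within each class.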

Do we have any other options to depict both a raw count and a percentage in geographic space?

3D Choropleths

One last option, made possible by the elevation argument of {mapdeck}’s add_polygon() function, is a 3D choropleth.

This visually-striking option manages to map three important quantities. It maps the percentage to color and the raw count to height, while at the same time maintaining the geographic shape of each enumeration unit. The choropleth could achieve only the first and third items; the proportional symbols map only the first two.
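A rough sketch of this mapping with {mapdeck} might look like the following. The two square "districts", their values, the column names, and the token string are all placeholders of my own; only the general pattern (a percentage column passed to fill_colour and a raw count column passed to elevation) reflects the approach described here.

```r
library(sf)
library(mapdeck)

# Helper (hypothetical): a 1 x 1 degree square with lower-left corner at (x0, y0).
square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two invented districts around 77-79 E, 12-13 N (not real boundaries or values).
districts <- st_sf(
  district    = c("A", "B"),
  pct_latrine = c(35, 92),          # percentage, mapped to color
  households  = c(400000, 90000),   # raw count, mapped to height
  geometry    = st_sfc(square(77, 12), square(78, 12), crs = 4326)
)

m <- mapdeck(
  token    = "YOUR_MAPBOX_TOKEN",   # placeholder; supply your own Mapbox token
  style    = mapdeck_style("dark"),
  pitch    = 45,                    # tilt the camera so heights are visible
  location = c(78, 12.5),
  zoom     = 6
)

m <- add_polygon(
  m,
  data        = districts,
  fill_colour = "pct_latrine",
  elevation   = "households",
  legend      = TRUE
)
```

Printing m in an interactive session should render the map, provided a valid token is supplied. Depending on the range of the counts, the elevation_scale argument of add_polygon() may need adjusting so that the tallest districts do not dwarf the rest.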

Below, we can see how over time, as represented by the rising elevation of the district shapes, the raw population of households having access to a latrine has grown, particularly from 2001 to 2011. The traditional choropleth fails to capture this population growth because it cannot communicate raw counts. At the same time, the colors of the district shapes communicate typical values in a way that the dot density map failed to do.

The example below explores, in three-dimensional space, the distribution of India’s 2011 population with access to neither electricity nor a latrine. The low purple areas represent districts with small populations where it is very unlikely for a household to lack both electricity and a latrine. In a much more vivid way compared to the dot density map, the third dimension allows us to see the concentration of India’s population lacking these key amenities in the states of Uttar Pradesh, Bihar, and West Bengal. Although the colors are the same as they would be in a flat choropleth, the height parameter adds an entirely new dimension to the story.

India’s 2011 Households with Access to neither Electricity nor Latrines

Not surprisingly though, the 3D choropleth suffers from a common dilemma associated with any kind of 3D visualization. It can be difficult to see in its entirety. The view is routinely blocked or obscured by other parts of the visualization. Without being able to rotate and tilt the visualization at will, and sometimes even then, it is very difficult to comprehend the details of the entire map. In contrast, the other thematic mapping options discussed in this blog are all perfectly viable as static maps.

Further Resources

Using the {mapdeck} library and a single source of data, this post has attempted to highlight the strengths and weaknesses of the most common thematic maps, including choropleths, dot density maps, proportional symbols maps, and 3D choropleths. Although this is far from an exhaustive list of thematic mapping options, hopefully it has introduced the idea of tradeoffs inherent in visualization choices depending on the type of data at hand and the story that the designer hopes to communicate. At the same time, I hope it has helped to unearth some of the most important stories in the history of global development.

Read More: For more resources on geospatial data visualization, be sure to check out some of the links below.