Design principles for effective data visualization

An effective data visualization should be clear, concise, and visually appealing. The key design principles underlying such a visualization include:

Choose the right chart type for the data being presented
Use appropriate labels and titles to provide context and clarify data
Minimize clutter and distractions in the visualization
Ensure consistency in design elements such as font size, color scheme, and layout

Two very useful sources on depicting data are Tufte (2001) and Zelazny (2001). Two videos are valuable too: "Creating effective figures and tables" by Karl W. Broman and the "Data looks better naked" series by Darkhorse Analytics.

Color wheel

The color wheel is a tool that helps designers and artists understand color relationships and create harmonious color schemes. It consists of a circle with colors arranged in a specific order, typically starting with red, then moving through orange, yellow, green, blue, and purple. The color wheel can be divided into different segments based on color relationships, such as complementary colors, analogous colors, or triadic colors.
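The color relationships named above can be expressed as rotations of the hue. The following is a minimal sketch using Python's standard `colorsys` module. It assumes the HSV hue circle rather than the traditional painter's wheel, so the complement of red comes out as cyan, not green. The function names and the 30-degree step for analogous colors are illustrative choices, not taken from the text.

```python
import colorsys


def rotate_hue(hex_color, degrees):
    """Rotate the hue of a '#rrggbb' color by `degrees` on the HSV hue circle."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360) % 1.0
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return "#{:02x}{:02x}{:02x}".format(
        round(r2 * 255), round(g2 * 255), round(b2 * 255)
    )


def complementary(color):
    """The color on the opposite side of the wheel."""
    return rotate_hue(color, 180)


def analogous(color, step=30):
    """The color with its two neighbours; `step` is an assumed spacing."""
    return [rotate_hue(color, -step), color, rotate_hue(color, step)]


def triadic(color):
    """Three colors evenly spaced around the wheel."""
    return [color, rotate_hue(color, 120), rotate_hue(color, 240)]


print(complementary("#ff0000"))  # #00ffff
print(triadic("#ff0000"))        # ['#ff0000', '#00ff00', '#0000ff']
print(analogous("#ff0000"))
```

The online tools listed further below do the same kind of calculation interactively and usually offer more refined color models.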

The color wheel was invented in 1666 by Isaac Newton (1643–1727), who mapped the color spectrum onto a circle. It is the basis of color theory, because it shows the relationships between colors.

Source: DataNovia

There are several online tools that can be used to choose the appropriate colors for your graphs:

Adobe Color is a tool that allows you to create color schemes based on the color wheel.
Paletton is another tool that allows you to explore color schemes based on the color wheel. It includes a color wheel visualization, as well as options to create custom color schemes and preview them in different contexts.
Canva is a graphic design tool that includes a built-in color wheel. You can use this feature to select colors for your designs and see how they relate to each other on the color wheel.

Data-to-ink ratio

To be clear and concise, a plot must contain only relevant information. The data-to-ink ratio is a design principle suggested by Edward Tufte that emphasizes displaying only the necessary information in a graph. The goal is to minimize the amount of ink used to display non-essential information (chartjunk), while still conveying the important data effectively. This improves readability and reduces clutter in a visualization.

How to maximize the data-to-ink ratio?

Remove unnecessary grid lines or borders.
Use lighter colors for background or non-data elements.
Simplify chart types to eliminate redundant or unnecessary elements.

It can be useful to think of the ink that your printer will consume in order to plot your graph. If you want to save money on ink but still need to print the graph, think about how to use as little ink as possible.
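The printer analogy can be put into numbers. Tufte defines the data-ink ratio as the ink spent on data divided by the total ink used in the graphic. The sketch below applies this definition to made-up ink amounts: the element names, the quantities, and the decision to count only the plotted lines as data ink are all assumptions made for illustration.

```python
def data_ink_ratio(ink_by_element, data_elements):
    """Share of the total 'ink' that is spent on elements encoding data."""
    total = sum(ink_by_element.values())
    if total == 0:
        raise ValueError("the graph contains no ink")
    data_ink = sum(
        amount for name, amount in ink_by_element.items() if name in data_elements
    )
    return data_ink / total


# Hypothetical ink amounts in arbitrary units (not measured from any real graph).
cluttered = {"lines": 20, "axis_labels": 5, "grey_background": 60, "gridlines": 15}

# The same graph after dropping the background and the gridlines.
cleaned = {
    name: ink
    for name, ink in cluttered.items()
    if name not in {"grey_background", "gridlines"}
}

DATA_ELEMENTS = {"lines"}  # assumption: only the plotted lines count as data ink

print(data_ink_ratio(cluttered, DATA_ELEMENTS))  # 0.2
print(data_ink_ratio(cleaned, DATA_ELEMENTS))    # 0.8
```

In practice nobody measures ink this way. The point is only that every non-data element removed raises the ratio.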

Consider the graph below. It is clearly overloaded with unnecessary “ink”. First, it has a grey background. Second, it contains gridlines that are not needed. Thus, its data-to-ink ratio is very low.

After removing these unnecessary elements, we obtain the following graph. It is still far from perfect, but much better than the previous one.

Edward Tufte also introduced the notion of chartjunk: all visual elements of a graph that are not necessary to understand the information shown or that distract the viewer from this information. Several examples of chartjunk are presented below.

The north-west panel has an unnecessary background, and the bars are both colored and contain a grid. A graph without a background and with monochrome bars would be much more appropriate. The north-east panel also has a superfluous background and, in addition, contains a three-dimensional graph where none is needed, since it displays only two dimensions: time and the values of the variables. In the south-west panel, the perspective distracts and, moreover, produces a biased impression of the magnitudes of the two variables displayed in the graph. The area diagrams are not needed here either: both variables could have been depicted using simple lines. Such graph elements are to be avoided at all costs.

Visual exploratory analysis

Exploratory analysis is a necessary step of any data-based research. It allows one to investigate properties of the data that are otherwise impossible to uncover. During such an analysis, various anomalies in the data can be found, including outliers and skewed distributions.

Any graph must satisfy the following requirements. Its title must be informative and interesting. It should not only describe what is shown in the graph but also, especially in presentations, tell a story. For example, instead of “Dynamics of housing prices in 1990-2020” it could be “Falling prices after a decade of steady growth”. The axes (especially the vertical one) must be described. The legend must be informative and must not intersect with the lines. All numbers must be placed horizontally. One must also avoid rotating axis labels, since rotated values are more difficult for the reader to grasp.

Standard graphs should have the following format. The upper panel shows the general format, while the lower panel contains more specific titles and labels.

The graphs should have the following elements:

Informative title
Legend
Axis description
Horizontally placed axis labels
Sources of data

Avoid such graphs! They can be improved by adding or correcting the title, legend, and axis labels and by removing unnecessary elements.

The north-west panel contains no descriptions. In the north-east panel, the legend is misplaced, for it intersects with the lines. In the south-west panel, the labels of the vertical axis are not horizontal and, hence, are more difficult to read. In the south-east panel, the grid lines are superfluous. They could be acceptable had they been plotted at wider intervals, in a lighter color, and dotted or dashed.

Avoid pie diagrams!

Pie diagrams are a popular means of illustrating the structure of a whole. However, they should be avoided because they distort the perception of relative sizes: the human eye cannot correctly evaluate the size of the segments.

Compare a pie diagram and a barplot. Both show the same thing: the distribution of Russian exports in 2021 by partner country. In the pie diagram, it appears that exports to China are twice as large as exports to the Netherlands. However, the barplot makes it clear that they differ by less than 40%. The pie diagram requires many colors, while the barplot is more parsimonious, since it uses only one. Moreover, in the pie diagram the names of the countries overlap, whereas in the barplot they can be read easily.

Sources: UN Comtrade and own representation

Both charts share a small problem. The largest category is the rest of the world (ROW). Typically, you would try to avoid having a dominant category about which you cannot say much. Therefore, we combined all the EU member states into a single EU aggregate. In this way, we managed to reduce the rest-of-world category.
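The aggregation step can be sketched as follows. The export values are invented for illustration and are not the UN Comtrade figures; the set of EU members is deliberately incomplete, and the helper names `aggregate` and `top_n_with_rest` are hypothetical.

```python
# Hypothetical export values in arbitrary units (NOT the UN Comtrade data).
exports = {
    "China": 68, "Netherlands": 42, "Germany": 30, "Turkey": 26,
    "Italy": 19, "Kazakhstan": 18, "Poland": 17, "Finland": 10,
}
# Illustrative subset of EU member states present in the data above.
EU_MEMBERS = {"Netherlands", "Germany", "Italy", "Poland", "Finland"}


def aggregate(values, members, label):
    """Collapse all categories listed in `members` into one named `label`."""
    result = {}
    for name, value in values.items():
        key = label if name in members else name
        result[key] = result.get(key, 0) + value
    return result


def top_n_with_rest(values, n, rest_label="ROW"):
    """Keep the n largest categories and sum the remainder into `rest_label`."""
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    top = dict(ranked[:n])
    rest = sum(value for _, value in ranked[n:])
    if rest:
        top[rest_label] = rest
    return top


before = top_n_with_rest(exports, 3)
after = top_n_with_rest(aggregate(exports, EU_MEMBERS, "EU"), 3)

print(before)  # {'China': 68, 'Netherlands': 42, 'Germany': 30, 'ROW': 90}
print(after)   # {'EU': 118, 'China': 68, 'Turkey': 26, 'ROW': 18}
```

With these made-up numbers, ROW is the largest category before the aggregation and the smallest after it, because the EU members that were previously hidden in ROW now belong to a category that can be named.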

Sources: UN Comtrade and own calculation

In the new graph, we managed to dramatically reduce the ROW, or “Other” category. Now, the largest category is the European Union.

One feature — one color!

In a cross-section, if you compare different objects, use the same color for all of them. Painting each bar in a different color is superfluous. Consider, for instance, the following chart.

Here, the frequencies of three groups of dwellings (with one, two, and three rooms) are plotted, each in a different color. The color is an additional element that repeats information already known from the labels of the vertical axis. Therefore, it is superfluous and should be removed. In addition, the labels on the horizontal axis are placed vertically, which makes them more difficult to read. And, of course, the box around the graph should be removed without mercy.

The improved graph could look as follows.

In addition to the changes mentioned above, I changed the color of the label for 3-room dwellings to make it more visible on the dark background and rounded the percentage shares displayed in the labels to make them more uniform. Strictly speaking, the percentage sign in the labels can be dropped too, since this information is already reflected in the horizontal axis label.

However, a different color can be useful if you want to stress some object of interest, for instance a particular country. In the graph below, I show the percentage change in real housing prices in the third quarter of 2022 compared to the second quarter. The graph is aimed at a German audience. Therefore, I need to stress Germany. I do so by plotting the corresponding bar in a different color.

Source: OECD and own representation

Although the different color for Germany does not carry any additional information, it helps the audience to see immediately the object they are most interested in.
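The "one neutral color plus one accent" rule is easy to encode. The sketch below builds a list of bar colors in which only the highlighted category differs. The function name `bar_colors`, the country list, and the two hex colors are illustrative assumptions.

```python
def bar_colors(labels, highlight, base="#9e9e9e", accent="#d62728"):
    """Return one color per label: `accent` for the highlighted label, `base` otherwise."""
    return [accent if label == highlight else base for label in labels]


# Hypothetical list of countries shown in the chart.
countries = ["France", "Germany", "Italy", "Spain"]
colors = bar_colors(countries, highlight="Germany")

print(colors)  # ['#9e9e9e', '#d62728', '#9e9e9e', '#9e9e9e']
```

Many plotting libraries accept such a list directly. In matplotlib, for example, it can be passed as the `color` argument of a bar plot.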

For count data, one could use a dotplot. An advantage of the dotplot is that the number of dots is equal to the represented value, which makes the diagram easier to grasp. Below, I show the monthly number of passengers at the 10 largest airports worldwide.

Source: Civil Aviation Administration of China

It can be seen that the largest airports are similar in terms of their capacity: each month they serve between six and nine million passengers.
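The idea that the number of dots equals the represented value can be shown with a plain-text dotplot. This is a minimal sketch: the airport names and passenger counts are placeholders rather than the actual data, and the function name `dotplot` is hypothetical.

```python
def dotplot(counts, unit=1, dot="o"):
    """Return text rows with one dot per `unit`, largest value first.

    Values are rounded to whole units before drawing.
    """
    width = max(len(name) for name in counts)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [f"{name.ljust(width)}  {dot * round(value / unit)}" for name, value in ranked]


# Placeholder monthly passenger counts in millions (not the real figures).
passengers = {"Airport B": 7, "Airport A": 9, "Airport C": 6}

for row in dotplot(passengers):
    print(row)
# Airport A  ooooooooo
# Airport B  ooooooo
# Airport C  oooooo
```

Because each dot stands for one million passengers, the reader can recover the value by counting, without consulting an axis.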

If you want to provide additional information, for example the country where the airport is located, you can place the flags of the corresponding countries in the diagram.

Source: Civil Aviation Administration of China

It can be seen immediately that half of the 10 largest airports are located in the USA.

Sometimes, for illustrative purposes, it can be useful to plot images instead of lines or bars. Below, we represent the gender structure of the population of two imaginary countries, A and B. Let one male or female image represent one million persons. If much more populous countries are considered, the value of one image can be increased to 10 or even 100 million.

It can be seen that the population of country B is less than half that of country A and that country B has relatively more females (56%) than country A (41%). The images are taken from a list of Egyptian hieroglyphs.

The following graph also uses images of the objects under discussion. It compares the number of cats and dogs per 10 persons in several countries. Since the ratios are seldom integer numbers, the fractional part is displayed as a part of the corresponding pet. The images of the pets are borrowed from the isotypes of Gerd Arntz.

It is important to choose an appropriate scale for the ratios. If the number of pets per person is displayed, the ratio is going to be smaller than one. If, however, pets per 100 persons are considered, the ratio will be close to an integer but too high to be grasped at a single glance. Therefore, here I show the number of cats and dogs per 10 persons.
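The effect of the scale can be checked with a small calculation. The sketch below converts a pet count and a population into the number of whole icons and the fraction of the last icon to draw. The figures (15 million cats, 60 million persons) are invented for illustration, and the function name `pictogram_units` is hypothetical.

```python
def pictogram_units(count, population, per=10):
    """Return (whole_icons, fraction_of_last_icon) for `count` items per `per` persons."""
    ratio = count / population * per
    whole = int(ratio)
    return whole, ratio - whole


# Invented figures in millions: 15 million cats in a country of 60 million persons.
cats, persons = 15, 60

print(pictogram_units(cats, persons, per=1))    # (0, 0.25)
print(pictogram_units(cats, persons, per=10))   # (2, 0.5)
print(pictogram_units(cats, persons, per=100))  # (25, 0.0)
```

Per person, not even one whole icon can be drawn; per 100 persons, 25 icons are too many to take in at once; per 10 persons, two and a half icons are easy to read.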

Source: FEDIAF Annual Report 2024 and own calculations

As can be seen in the figure, the inhabitants of Austria, France, and Switzerland are predominantly “cat people,” while Spaniards and especially Portuguese are “dog people.” In the UK, however, the numbers of dogs and cats are balanced. Germany and Russia are also close to equal ratios of cats and dogs. Turkey has very few pets relative to the number of people compared to other countries considered here.

Analysis of the distribution

Correct identification of the true distribution is of utmost importance, since it can affect the conclusions drawn from the analysis of the data.

Typically, the distribution of both numeric and nominal data is analyzed using histograms. In the case of numeric data, an intermediate step is needed: the whole range of values is broken down into several groups and the observations are assigned to them.
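The intermediate grouping step for numeric data can be sketched as follows. The example assumes groups of equal width, which is a common choice but only one of several; the sample values and the function name `histogram_counts` are made up for illustration.

```python
def histogram_counts(values, n_bins):
    """Split [min, max] into `n_bins` equal-width groups and count observations in each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no range to split")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        # The maximum would land one past the end, so it is put in the last group.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up sample with one large value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, 4)

print(edges)   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(counts)  # [3, 5, 1, 1]
```

Even in this tiny sample the counts show a long right tail: most observations fall into the first two groups, while the value 9 sits alone in the last one.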