GCAMMaps Tutorial: Choropleth Map

Jason Evanoff

2020-06-16

Introduction

One of the most commonly produced map types in research is the choropleth map, a thematic map that shades geographic areas in proportion to a statistical variable such as population density or per-capita income. This vignette explains how to use the choropleth() function in this package to generate and customize a simple choropleth map from a shape file and a data file. The function is designed to be flexible in the scope and complexity of its arguments while remaining simple to use, with defaults available for most arguments. This tutorial walks through these arguments step by step.

Building Your First Map: Show World Health Organization (WHO) Healthy Life Expectancy (HALE) at Birth

To build a choropleth map, you must pass five arguments that have no defaults: shape_data, map_data (or shape_data_field, discussed later), data_col, shape_key_field, and data_key_field.

The values in shape_key_field (a field of the shape data) and data_key_field (a column of map_data) must match so that the two can be joined. The function performs a left join, so every shape is preserved. Rows of map_data whose key matches no shape are dropped.
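The join behaviour described above can be sketched in base R. The tables below (shape_attrs and map_data) are hypothetical toy stand-ins, not the package's internals. merge() with all.x = TRUE has left-join semantics: every shape row survives, shapes with no data get NA, and data rows whose key matches no shape are dropped.

```r
# Hypothetical stand-in for the attribute table of a shape file.
shape_attrs <- data.frame(NAME = c("A", "B", "C"), stringsAsFactors = FALSE)

# Hypothetical stand-in for map_data; "Z" has no matching shape.
map_data <- data.frame(Country = c("A", "B", "Z"),
                       X2016 = c(10, 20, 99),
                       stringsAsFactors = FALSE)

# Left join on NAME = Country: keep all shape rows.
joined <- merge(shape_attrs, map_data,
                by.x = "NAME", by.y = "Country", all.x = TRUE)
print(joined)
```

Shape "C" is kept with an NA value, and the unmatched data row "Z" is dropped. This is why key spellings (for example country names) must agree exactly between the two inputs.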

This example passes only the five required arguments:

### Load the example scenario data.
data_file <- system.file("data", "data.csv", package="gcammaptools") 
shape_file <- system.file("data", "tm_world_borders_simpl-0.3.shp", package="gcammaptools")
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   map_data = data_file, 
                                   data_col = "X2016",
                                   shape_key_field = "NAME", 
                                   data_key_field = "Country")
plot(output)

You should see a figure that shows WHO HALE data by country, using the default palette “blues”.

Saving the Output

Before looking at further customization, let’s look at how to save the output and what options are available. The relevant arguments are output_file and dpi. The output_file argument should be a full path that R can write to, including the file name and extension, such as “c:/temp/output.png”. GCAMMaps detects the file type from the extension. The accepted types are “eps”, “ps”, “tex”, “pdf”, “jpeg”, “tiff”, “png”, “bmp”, and “svg”. Note that this vignette writes to a temporary file instead of a fixed path.
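Extension-based detection can be sketched with base R's tools::file_ext(). The helper detect_output_type() is hypothetical and not part of gcammaptools. The vignette does not show how the package performs the check, so this only mirrors the behaviour described above. The sketch also shows how a temporary output path can be created with tempfile().

```r
# File types this vignette lists as accepted for output_file.
accepted_types <- c("eps", "ps", "tex", "pdf", "jpeg", "tiff", "png", "bmp", "svg")

# Hypothetical helper (not part of gcammaptools): return the output type
# implied by a path's extension, or stop if it is not in the accepted list.
detect_output_type <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% accepted_types) {
    stop("Unsupported output type: '", ext, "'")
  }
  ext
}

# A temporary file path, as used in this vignette instead of a fixed path.
file_out <- tempfile(fileext = ".png")
detect_output_type(file_out)               # "png"
detect_output_type("c:/temp/output.PDF")   # "pdf"
```

Note that “jpeg” is in the list but “jpg” is not. Under this sketch's strict matching, a .jpg path would be rejected.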

The dpi argument sets the dots per inch resolution and should be a number between 30 and 300 depending on your output and printing requirements.
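For raster formats such as png, jpeg, tiff, and bmp, the pixel size of an image is its size in inches multiplied by dpi. The sketch below assumes the map size is given in inches, as with the map_width_height_in = c(12, 8) value used later in this vignette. The pixel_size() helper is hypothetical.

```r
# Hypothetical helper: pixel size of a raster image given its
# width and height in inches and a dpi value.
pixel_size <- function(width_height_in, dpi) {
  # Mirror the 30 to 300 dpi guidance given above.
  if (dpi < 30 || dpi > 300) stop("dpi should be between 30 and 300")
  round(width_height_in * dpi)
}

pixel_size(c(12, 8), dpi = 150)   # 1800 1200
pixel_size(c(12, 8), dpi = 300)   # 3600 2400
```

Doubling dpi doubles each pixel dimension and quadruples the pixel count, which is worth weighing against file size.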

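From this point on, the calls in this vignette save the map by passing output_file = file_out and dpi = 150, where file_out is a temporary path created with tempfile(fileext = ".png").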

You can verify that the file was saved by checking R’s temporary directory (returned by tempdir()) or by running file.exists(file_out).

Customizing the Output: Text Labels

With the basic map in place, let’s fill in some details: the map and legend titles and the axis labels. Four arguments control this text: map_title, map_legend_title, map_x_label, and map_y_label.

The arguments added in this call are map_title, map_legend_title, map_x_label, and map_y_label.

### Customize the map text labels
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_key_field = "NAME", 
                                   data_col = "X2016",
                                   map_data = data_file, 
                                   data_key_field = "Country", 
                                   map_title = "2016 Healthy Life Expectancy (HALE) at Birth",
                                   map_legend_title = "HALE (years)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude" )
plot(output)

Note the added title, legend title, and axis labels.

Customizing the Output: Bins and Binning Options

A choropleth map usually contains a range of values that need to be divided into categories, or “bins”. The number of bins as well as the method of categorization are both customizable within this function.

Two arguments control this: bins (default 8) and bin_method (default “pretty”). The right value for bins depends on your data. Choose the number of categories that subdivides the dataset appropriately. The bin_method argument must be one of “quantile”, “equal”, “pretty”, or “kmeans”. These methods are described in the documentation for classIntervals() in the classInt package. To read it, load classInt and run help("classIntervals") in your R console.

Note that the function may override the requested number of bins, depending on the bin_method, the number of bins entered, and the particulars of your dataset. If this happens and it matters for your map, reexamine your data in relation to the chosen bin_method and number of bins.
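To see how the methods differ, the sketch below computes “equal” and “quantile” breaks directly in base R on hypothetical toy values, and calls base R's pretty() for comparison. This is an illustration of the classification ideas, not a reproduction of what choropleth() or classIntervals() does internally. “kmeans” is omitted. The pretty() breaks are left unasserted because the number of intervals it returns can differ from the number requested. That behaviour is one way the bin count can change, as noted above.

```r
# Hypothetical toy values to classify.
x <- c(10, 14, 18, 25, 30)
bins <- 4

# "equal": split the data range into bins of identical width.
equal_breaks <- seq(min(x), max(x), length.out = bins + 1)
equal_breaks                     # 10 15 20 25 30

# "quantile": put roughly the same number of values in each bin.
quantile_breaks <- quantile(x, probs = seq(0, 1, length.out = bins + 1),
                            names = FALSE)

# "pretty": rounded, human-friendly breaks; the number of intervals
# can differ from the number requested.
pretty_breaks <- pretty(x, n = bins)

# Assign each value to an equal-width bin (1 = lowest).
as.integer(cut(x, breaks = equal_breaks, include.lowest = TRUE))   # 1 1 2 3 4
```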

The arguments added in this call are bins and bin_method. The call also saves the map to file_out at 150 dpi.

### Customize bins and bin method
file_out <- tempfile(fileext = ".png")  # temporary output path used by the remaining examples
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_key_field = "NAME", 
                                   data_col = "X2016",
                                   map_data = data_file, 
                                   data_key_field = "Country", 
                                   map_title = "2016 Healthy Life Expectancy (HALE) at Birth",
                                   map_legend_title = "HALE (years)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude", 
                                   output_file = file_out,
                                   dpi = 150, 
                                   bins = 4, 
                                   bin_method = "equal" )
plot(output)

Note that there are now four bins, as specified, and that the “equal” method gives each bin the same width, 7.8 years here.

Customizing the Output: Color Palettes

In addition to layout and formatting, the color scheme for the map can also be customized. This can be done in several ways. The map_palette_reverse argument can be set to TRUE, which inverts/reverses the current color scheme. The map_palette_type argument can be set to one of three presets: “seq” for sequential data, “div” for divergent data, and “qual” for qualitative data sets. By using this option, you will use the preselected JGCRI color defaults for that map type. Finally, the palette can be directly changed with the map_palette argument.

The map_palette argument can be set to the name of any palette from the RColorBrewer package. To view the available palettes, go to https://colorbrewer2.org/ or run display.brewer.all() in your R console after loading the package with library(RColorBrewer). Note that map_palette overrides map_palette_type.
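The precedence rule (an explicit map_palette wins over map_palette_type) and the effect of reversing can be sketched as follows. Everything here is hypothetical. The per-type defaults are placeholders, because the vignette does not list the actual JGCRI palettes. Colours come from base grDevices::hcl.colors() so the sketch runs without extra packages. With RColorBrewer installed, RColorBrewer::brewer.pal(n, "Spectral") would return the ColorBrewer colours instead.

```r
# Hypothetical stand-ins for the per-type defaults (NOT the JGCRI palettes).
# The names are grDevices HCL palette names.
type_defaults <- c(seq = "viridis", div = "Blue-Red", qual = "Dark 3")

# Hypothetical helper: an explicit palette wins over palette_type.
resolve_palette <- function(palette_type = "seq", palette = NULL) {
  if (!is.null(palette)) return(palette)
  unname(type_defaults[palette_type])
}

# Build n colours and optionally reverse their order.
palette_colors <- function(n, palette_name, reverse = FALSE) {
  cols <- grDevices::hcl.colors(n, palette = palette_name)
  if (reverse) rev(cols) else cols
}

resolve_palette("seq")                          # "viridis"
resolve_palette("seq", palette = "Spectral")    # "Spectral"
palette_colors(4, "viridis", reverse = TRUE)
```

Reversing only changes the order in which colours are assigned to bins. With a sequential palette, that swaps which end of the data range is drawn darkest.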

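The arguments added in this call are map_palette_reverse, map_palette_type, and map_palette. The bins and bin_method arguments return to their defaults (8 and “pretty”).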

### Customize palette
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_key_field = "NAME", 
                                   data_col = "X2016",
                                   map_data = data_file, 
                                   data_key_field = "Country", 
                                   map_title = "2016 Healthy Life Expectancy (HALE) at Birth",
                                   map_legend_title = "HALE (years)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude", 
                                   output_file = file_out,
                                   dpi = 150, 
                                   bins = 8,
                                   bin_method = "pretty", 
                                   map_palette_reverse = FALSE,
                                   map_palette_type = "seq", 
                                   map_palette = "Spectral")
plot(output)

Note that the output now uses the chosen “Spectral” palette.

Customizing the Shape Fields

Several shape-specific options can also be customized if needed. The most important is shape_geom_field, which designates the field within the shape object that holds the shape geometry. The standard field name, and the default, is “geometry”. If your shape object is non-standard, however, you must inspect it to determine which field to use. The other argument that controls how the shape object is plotted is shape_xy_fields (default c(“LON”, “LAT”)). It designates the fields in the shape object used to generate the x and y axes. As with shape_geom_field, if your shape is non-standard, you must inspect it to determine the correct value.
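One way to inspect a shape object is with the sf package. This is an assumption: the vignette does not say which spatial package gcammaptools uses internally, although a “geometry” field is the sf convention. The sketch uses the nc.shp sample file bundled with sf as a stand-in. Substitute the shape file you are mapping.

```r
library(sf)

# Stand-in shape file bundled with sf; substitute your own shape file.
shp <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Name of the geometry column (a candidate value for shape_geom_field).
geom_col <- attr(shp, "sf_column")
geom_col                                  # "geometry"

# All attribute fields, to look for key, label, or coordinate columns.
names(st_drop_geometry(shp))

# Check whether the default shape_xy_fields exist in this shape.
xy_fields <- c("LON", "LAT")
all(xy_fields %in% names(shp))            # FALSE for nc.shp
```

For nc.shp the check returns FALSE, so the defaults would not fit that file. The same check tells you whether you need to pass different shape_xy_fields for your own data.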

Other shape-related arguments that control the output are simplify, shape_label_field, and shape_label_size. Setting simplify to TRUE runs an algorithm that reduces the complexity and number of lines drawn. This can improve map aesthetics and/or rendering time in some situations. The shape_label_field and shape_label_size arguments work together to add optional geographic labels such as country names. shape_label_field designates the field that contains the labels, and shape_label_size controls their size.
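The vignette does not name the algorithm that simplify = TRUE runs. The Ramer-Douglas-Peucker algorithm is one common line-simplification technique, and the base R sketch below shows the idea on a hypothetical polyline. It is an illustration, not the package's implementation. Points that deviate from a straight chord by less than a tolerance are dropped, so fewer line segments are drawn.

```r
# Perpendicular distance from point p to the line through a and b.
perp_dist <- function(p, a, b) {
  if (all(a == b)) return(sqrt(sum((p - a)^2)))
  unname(abs((b[1] - a[1]) * (a[2] - p[2]) - (a[1] - p[1]) * (b[2] - a[2])) /
           sqrt(sum((b - a)^2)))
}

# Ramer-Douglas-Peucker: keep the endpoints and find the interior point
# farthest from the chord; if it is farther than tol, keep it and recurse
# on both halves, otherwise drop every interior point.
rdp <- function(pts, tol) {
  n <- nrow(pts)
  if (n < 3) return(pts)
  d <- vapply(2:(n - 1),
              function(i) perp_dist(pts[i, ], pts[1, ], pts[n, ]),
              numeric(1))
  k <- which.max(d) + 1          # row index of the farthest interior point
  if (d[k - 1] > tol) {
    left  <- rdp(pts[1:k, , drop = FALSE], tol)
    right <- rdp(pts[k:n, , drop = FALSE], tol)
    rbind(left[-nrow(left), , drop = FALSE], right)   # drop duplicated split point
  } else {
    pts[c(1, n), , drop = FALSE]
  }
}

# Hypothetical polyline: small wiggles on each side of one large bump.
path_pts <- cbind(x = 0:4, y = c(0, 0.05, 2, 0.05, 0))
rdp(path_pts, tol = 1)   # keeps (0, 0), (2, 2), (4, 0)
```

A larger tolerance removes more detail. With tol = 5, only the two endpoints of this path remain.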

The arguments added in this call are shape_geom_field, shape_xy_fields, simplify, shape_label_field, and shape_label_size.

### Customize shape options
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_key_field = "NAME", 
                                   data_col = "X2016",
                                   map_data = data_file,
                                   data_key_field = "Country", 
                                   map_title = "2016 Healthy Life Expectancy (HALE) at Birth",
                                   map_legend_title = "HALE (years)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude",
                                   output_file = file_out,
                                   dpi = 150, 
                                   bins = 8,
                                   bin_method = "pretty", 
                                   map_palette_reverse = FALSE,
                                   map_palette_type = "seq", 
                                   map_palette = "Spectral",
                                   shape_geom_field = "geometry",
                                   shape_xy_fields = c("LON", "LAT"),
                                   simplify = TRUE,
                                   shape_label_field = "NAME", 
                                   shape_label_size = 1)
plot(output)

Note the shape labels showing each country’s name, and the slightly reduced complexity of the map from setting simplify = TRUE.

Customizing the Output: Map Adjustments

A few remaining arguments affect the layout and aesthetics of the output map. The map_font_adjust argument scales the size of all map text except the title. The default, 1.0, leaves the size unadjusted. Accepted values range from 0.5 (50%) to 2.0 (200%).

The map_width_height_in argument sets the output size of the map as width and height, in inches as the _in suffix indicates. Finally, map_xy_min_max sets the minimum and maximum values of the x and y scales, zooming the map in or out.
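To make the four numbers in map_xy_min_max concrete, the sketch below tests which points fall inside an extent. The order c(x_min, x_max, y_min, y_max) is an assumption inferred from the example call c(-90, 90, -45, 45), not something the vignette states. The in_extent() helper and the points are hypothetical.

```r
# Hypothetical helper. ASSUMPTION: map_xy_min_max is read as
# c(x_min, x_max, y_min, y_max), i.e. longitude limits, then latitude limits.
in_extent <- function(lon, lat, xy_min_max) {
  lon >= xy_min_max[1] & lon <= xy_min_max[2] &
    lat >= xy_min_max[3] & lat <= xy_min_max[4]
}

# Hypothetical points (longitude, latitude).
lon <- c(-100, 10, 30)
lat <- c(40, 20, 60)
in_extent(lon, lat, c(-90, 90, -45, 45))   # FALSE TRUE FALSE
```

Under this reading, the first point lies outside the longitude limits and the third lies outside the latitude limits.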

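The arguments added in this call are map_font_adjust, map_width_height_in, expand_xy, and map_xy_min_max. Shape labels are turned off by setting shape_label_field and shape_label_size to NULL.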

### Customize misc map options
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_key_field = "NAME", 
                                   data_col = "X2016",
                                   map_data = data_file,
                                   data_key_field = "Country", 
                                   map_title = "2016 Healthy Life Expectancy (HALE) at Birth",
                                   map_legend_title = "HALE (years)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude",
                                   output_file = file_out,
                                   dpi = 150, 
                                   bins = 8,
                                   bin_method = "pretty", 
                                   map_palette_reverse = FALSE,
                                   map_palette_type = "seq", 
                                   map_palette = "Spectral",
                                   shape_geom_field = "geometry",
                                   shape_xy_fields = c("LON", "LAT"),
                                   simplify = TRUE,
                                   shape_label_field = NULL, 
                                   shape_label_size = NULL,
                                   map_font_adjust = 0.5,
                                   map_width_height_in = c(12,8),
                                   expand_xy = c(0.5,0.5),
                                   map_xy_min_max = c(-90, 90, -45, 45) )
plot(output)

Note the reduced font size on the axes and legend, and that the map extent is now limited to the 90/45-degree bounds set by map_xy_min_max.

Using Shape Data Fields (instead of map_data)

The shape_data_field argument plots data contained in the shape object itself, instead of data from a separate map_data argument. shape_data_field and map_data cannot both be passed in the same call, because the function uses only one of them. The plot and bins are then calculated from the data field in the shape. Note that not all shape objects have data fields.

Note that when using this method you do not need to use the map_data, data_col, data_key_field, and shape_key_field arguments.
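To find fields that could serve as shape_data_field, you can list the numeric attribute columns of a shape file. The sketch assumes the sf package and uses the nc.shp sample bundled with sf as a stand-in. In the world borders file used in this vignette, POP2005 is such a field.

```r
library(sf)

# Stand-in shape file bundled with sf; substitute your own shape file.
shp <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Attribute table without the geometry column.
attrs <- st_drop_geometry(shp)

# Numeric fields are the natural candidates for shape_data_field.
numeric_fields <- names(attrs)[vapply(attrs, is.numeric, logical(1))]
numeric_fields

# Summarise one candidate field before mapping it.
summary(attrs$BIR74)
```

If no numeric fields appear, the shape has no plottable data of its own and you would pass map_data instead.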

The argument added in this call is shape_data_field, which replaces map_data, data_col, data_key_field, and shape_key_field.

### Plot a data field stored in the shape file
output <- gcammaptools::choropleth(shape_data = shape_file, 
                                   shape_data_field = "POP2005", 
                                   map_title = "World Population 2005",
                                   map_legend_title = "Population (millions)",
                                   map_x_label = "Longitude", 
                                   map_y_label = "Latitude",
                                   output_file = file_out,
                                   dpi = 150, 
                                   bins = 7,
                                   bin_method = "quantile", 
                                   map_palette_reverse = FALSE,
                                   map_palette_type = "seq", 
                                   shape_geom_field = "geometry",
                                   shape_xy_fields = c("LON", "LAT"),
                                   simplify = TRUE,
                                   shape_label_field = NULL, 
                                   shape_label_size = NULL)
plot(output)

Note that the map now reflects 2005 world population from data embedded in the shape file.