The dream: an easy-to-use toolkit for R users to make quick, yet beautiful cartographic data visualizations

The woodson mapping suite is designed as a wrapper around, and supplement to, the plotting and mapping functionality of ggplot2.

Why use this package?

This code was created with the hope and intent of making cartography quick and easy.

Used without some kind of wrapper, ggplot2 can require long, confusing stretches of code that clutter the script in which the mapping code is embedded.
Fortifying your shapefiles and merging and slicing your data using ggplot2 alone is fraught with complexities and pitfalls; this package ensures that your colors and geometries match up with the right data.

How do I download and use this library?

This library is freely available on GitHub at https://github.com/RebeccaStubbs/woodson. To use it as a library, install it with the install_github() function from the devtools package.

# The first time you want to use this repository:
install.packages("devtools") # Only do this if you haven't already installed the devtools package
library(devtools) # This has the install_github function in it
install_github("RebeccaStubbs/woodson")

Every other time you want to use this repository, simply use library(woodson), and all of the functions and documentation will be loaded in.

library(woodson)

What kind of data formats do I need to use it?

To use these tools, you need your spatial data already loaded into R as a SpatialPolygonsDataFrame, with the attribute data (including the geographic ID you want to use to link your tabular and spatial data) stored as a data.table. For a tutorial on bringing shapefiles into R as SpatialPolygonsDataFrames, see:

https://rpubs.com/BeccaStubbs/bringing_shapefiles_into_R

Your tabular data also needs to be a data.table. These tools use data.tables because they are fast, convenient, and have clear syntax. It’s easy to convert data.frames to data.tables to take advantage of these benefits.
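As a minimal sketch of that conversion (the table and its values are made up for illustration; the column names simply mirror those used in the examples further on):

```r
library(data.table)

# A small illustrative data.frame (values are invented for this sketch)
df <- data.frame(mcnty = c(1L, 2L, 3L),
                 unemployed = c(5.2, 7.9, 6.1))

# as.data.table() returns a converted copy and leaves df untouched
dt <- as.data.table(df)

# setDT() converts in place (by reference), which avoids copying large tables
setDT(df)

# data.table syntax: filter rows and pull out a column in one call
high <- dt[unemployed > 6, mcnty]
print(high)
```

Either route works; setDT() is usually the better choice for large tables because no copy is made.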

Woodson color palettes

In addition to creating maps, this suite also includes a variety of color palettes that you can use to plot your data. Please see the introduction to woodson palettes. The colors you choose greatly impact how your data is interpreted, so decide what you are trying to achieve and choose wisely.

Using the Map Function

Using the “wmap” function, you can make a variety of choropleth maps (maps where the colors of the polygons correspond to data values). Note that the function’s own argument is spelled chloropleth_map. This function now supports categorical or binned data. However, because binned colors often distort the apparent inequalities between areas, it is recommended that numeric data be plotted on a continuous color ramp. This mapping function can be used for making one-off maps, or for generating a series of maps based on a dimension such as year.

The function needs very few inputs to create a map. The most stripped-down call will plot your SpatialPolygonsDataFrame using the default color palette (easter to earth, from the woodson palettes), using variables that are already contained within the SpatialPolygonsDataFrame’s data.table.

If the variable you want to map is already within the @data slot of the SpatialPolygonsDataFrame, and you only want to plot one dimension/version of that variable, you’re all set. However, if the data you would like to plot is in a separate table, this function can merge your data onto your map object and then plot it (assuming the geog_id values match and are the same data type). You can also specify a variety of options to adjust your map’s aesthetics.
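Because the merge depends on matching IDs of the same type, it can be worth checking this before mapping. The following is a rough sketch of such a check, not part of the woodson package; map_data and external are hypothetical stand-ins for the map’s @data table and a separate data table.

```r
library(data.table)

# Hypothetical stand-ins: map_data plays the role of the map's @data table,
# and external is a separate table holding the variable to plot.
map_data <- data.table(mcnty = c(1L, 2L, 3L))
external <- data.table(mcnty = c("1", "2", "4"),
                       poverty = c(12.5, 30.1, 18.7))

# The ID columns differ in type (integer vs. character), so align them first
print(class(map_data$mcnty))
print(class(external$mcnty))
external[, mcnty := as.integer(mcnty)]

# IDs present in the map but missing from the data would plot without values
missing_ids <- setdiff(map_data$mcnty, external$mcnty)
print(missing_ids)

# A left join keeps every map unit, with NA where no data matched
merged <- merge(map_data, external, by = "mcnty", all.x = TRUE)
print(merged)
```

Checking for unmatched IDs up front makes it easier to tell a genuinely missing value from a failed merge.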

Making Maps

These examples walk through some fundamentals of the woodson mapping suite. Refer to the parameter appendix at the end for more details on options and default values.

Example 1: Basic maps and histograms

This function will either print a map (and histogram, if desired) to the screen or a PDF, or, if return_map_object_only is set to TRUE, it will return the ggplot object of the map.

In this example, we start with the number of letters in each state’s name, which is already contained within the data of the SpatialPolygonsDataFrame, and use it to color each county.

wmap(chloropleth_map=mcnty_map, 
           geog_id="mcnty", 
           variable="n_letters")

Let’s also add a histogram at the bottom, and also make sure we can tell the differences between each of the counties by adding in a small border line around the counties.

wmap(chloropleth_map=mcnty_map, 
           geog_id="mcnty", 
           variable="n_letters",
           chlor_lcol="white",
           chlor_lsize=0.01,
           histogram=TRUE)

Sometimes it’s difficult to tell what’s happening in the distribution by eye. The dist_stats parameter adds lines to the histogram that mark summary statistics. In this case, I’ve added some quantiles: the 10th, 25th, 75th, and 90th percentiles, as well as the 0.5 quantile (which the function recognizes as the median).

wmap(chloropleth_map=mcnty_map, 
           geog_id="mcnty", 
           variable="n_letters",
           histogram=TRUE,
           dist_stats=c(.1,.25,.5,.75,.9))

You can also add in statistics based on the mean and standard deviation, as well as make the histogram bars a more boring color:

wmap(chloropleth_map=mcnty_map, 
           geog_id="mcnty", 
           variable="n_letters",
           histogram=TRUE,
           hist_color="grey",
           dist_stats=c("mean","sd"))

You can also create a combination of quantiles and mean-based statistics:

wmap(chloropleth_map=mcnty_map, 
           geog_id="mcnty", 
           variable="n_letters",
           histogram=TRUE,
           dist_stats=c("mean","sd",.5,.9))
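To see which values those lines correspond to, the same statistics can be computed directly in base R. This is only a sketch with made-up numbers; it does not reproduce how wmap() calculates or draws the lines internally.

```r
# Illustrative values only; in practice this would be the mapped variable
x <- c(4, 5, 6, 7, 7, 8, 8, 9, 10, 13)

# Quantile lines: the 0.5 quantile is the median
q <- quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))

# Mean-based lines: the mean, and one standard deviation on either side
# (an assumption of this sketch; the text does not say how many SDs wmap draws)
m <- mean(x)
s <- sd(x)
mean_lines <- c(lower = m - s, mean = m, upper = m + s)

print(q)
print(mean_lines)
```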

Example 2: Adding on another data set and changing colors

Let’s try adding in some external data that isn’t already within the spatial polygons object. We’ll also add a map title! The data.table you want to use to map variable values must contain both the specified geog_id and the variable you want to plot. In addition, we can provide a vector of colors that we want to stretch across the values. Any vector of R-recognized colors will do (hex, named, etc.), and you can input a color scheme that you designed yourself, one from R’s base packages (like topo.colors), or one from a library like RColorBrewer.

wmap(chloropleth_map=mcnty_map,
           data=us[year==2013],
           geog_id="mcnty",
           variable="unemployed",
           map_title="Percent Unemployed in 2013",
           color_ramp=c("purple","blue","green"))

Alternately, you can use one of the custom color schemes that the wpal() function has to offer, designed to maximize value-differentiation without using diverging color scales.

wmap(chloropleth_map=mcnty_map,
           data=us[year==2013],
           geog_id="mcnty",
           variable="unemployed",
           map_title="Percent Unemployed in 2013",
           color_ramp=wpal("cool_toned"))

Looking good! Now, what about variables that have positive and negative values, or cases where we want to visualize a z-score of a variable? We can use the argument diverging_centerpoint in conjunction with a diverging color scheme to set the center of our color ramp at a predefined value (which does not need to be 0; it only needs to be between the minimum and maximum values of your data set). Diverging color schemes often get overused because the additional colors make differences between values easier to detect by eye. However, they really should be reserved for variables where some value between the minimum and maximum is meaningful in some way.

wmap(chloropleth_map=mcnty_map,
           data=us[year==2013],
           geog_id="mcnty",
           variable="income_median_zscore_by_year",
           map_title="Z-score of HH Median Income in 2013",
           color_ramp=wpal("orange_blue_diverging_from_purple"),
           diverging_centerpoint=0)

We can also come up with our own diverging color schemes, combining different palettes accessed by the wpal() function (see below for lots more information on these color schemes and how to use them):

diverging1<-c(rev(wpal("cool_green_grassy")),(wpal("warm_darkfire")))

wmap(chloropleth_map=mcnty_map,
           data=us[year==2013],
           geog_id="mcnty",
           variable="income_median_zscore_by_year",
           map_title="Z-score of HH Median Income in 2013",
           color_ramp=diverging1,
           diverging_centerpoint=0)
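To make the idea of a centerpoint concrete, here is a rough base R sketch of how the middle color of a diverging ramp can be pinned to a chosen value when the data are not symmetric around it. The helper names are hypothetical, and this is not necessarily how wmap() implements diverging_centerpoint.

```r
# Rescale a centerpoint to the 0-1 position it occupies between the data limits
center_position <- function(x, centerpoint) {
  rng <- range(x, na.rm = TRUE)
  # The centerpoint must lie strictly inside the range of the data
  stopifnot(centerpoint > rng[1], centerpoint < rng[2])
  (centerpoint - rng[1]) / (rng[2] - rng[1])
}

# Place n_colors positions so that the middle color sits on the centerpoint
diverging_positions <- function(x, centerpoint, n_colors) {
  stopifnot(n_colors %% 2 == 1)  # an odd count gives a single middle color
  mid  <- center_position(x, centerpoint)
  half <- (n_colors + 1) / 2
  c(seq(0, mid, length.out = half),
    seq(mid, 1, length.out = half)[-1])
}

z   <- c(-1, -0.5, 0, 1, 2, 3)  # illustrative z-scores, skewed toward positive values
pos <- diverging_positions(z, centerpoint = 0, n_colors = 5)
print(pos)
```

Positions like these are the kind of input accepted by the values argument of ggplot2’s scale_fill_gradientn(), which places each color at a given point between 0 and 1.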

Example 3: Subsetting geographies

Sometimes, looking at the whole map is overwhelming, and you’re really just interested in what’s going on in one state or area. We can subset the choropleth map object to plot only the geographic area of interest. We’ll also turn off the histogram (it won’t be that interesting with so few observations), and we can tinker with the color scheme as well. We will also manually set the limits of the color scale: in this case, 10% will serve as the lower limit of the color ramp, and 40% as the upper limit.

wmap(chloropleth_map=mcnty_map[mcnty_map@data$state==1,],
           data=us[year==2013],
           geog_id="mcnty",
           variable="poverty",
           map_title="% of Population Below the Poverty Line in 2013 (Alabama)",
           color_ramp=wpal("black_to_light_10"),
           override_scale=c(10,40))

Hmm. That’s a lot of colors for relatively few observations, and it looks kind of chunky. A palette with fewer intermediate colors might serve better. Also, let’s try an intensity palette: one where the colors get more saturated as they get darker. [Note: See the examples further below for more nuance about how to pick a color scale.]

wmap(chloropleth_map=mcnty_map[mcnty_map@data$state==1,],
           data=us[year==2013],
           geog_id="mcnty",
           variable="poverty",
           map_title="% of Population Below the Poverty Line in 2013 (Alabama)",
           color_ramp=rev(wpal("pink_to_purple_intensity")))

Example 4: Adding an outline geometry

We can also add outlines of a different geography, if we so desire. In this example, we subset the data to Washington State and then add an outline. Here, we will outline all of the counties.

wmap(chloropleth_map=mcnty_map[mcnty_map@data$state==53,],
           outline_map=mcnty_map[mcnty_map@data$state==53,],
           data=us[year==2014],
           geog_id="mcnty",
           variable="elev_range",
           map_title="Range of Elevation within County",
           destination_folder=NULL,
           color_ramp=wpal("brown_to_sea_green"))

## Regions defined for each Polygons

That’s great, but we’re really just interested in King County specifically. Let’s call attention to that county with a thicker, brighter border:

wmap(chloropleth_map=mcnty_map[mcnty_map@data$state==53,],
           outline_map=mcnty_map[mcnty_map@data$mcnty==2937,],
           data=us[year==2014],
           geog_id="mcnty",
           variable="pop_density",
           map_title="Population Density",
           fontsize=20,
           color_ramp=wpal("purple_to_sea_green"),
           histogram=FALSE,
           outline_size=1.2,
           outline_color="orange")

## Regions defined for each Polygons

Making Series Maps

You can also make a series of maps based on a data.table with information that repeats for each unit of geography– for example, you might have a variable you want to plot over time.

Example 5: Repeating a map over a certain dimension

This time, we’ll plot some King County data that is available in multiple years. Let’s see how the share of people over 25 with a bachelor’s education has changed over time. To achieve a series map, you need to set two more parameters: series_dimension (the field you want to subset your data on, such as year or vaccine type), and an (optional) series_sequence that lists a subset of the unique observations within your dimension variable. This is useful if you don’t want to make a map of every year in the time period. The code will automatically check that the items in the sequence you provide actually exist in the data set you have provided. The color ramp will be based on the full series (time series, etc.) unless you manually override it.

Providing a map subtitle will paste that text (in this case, “Year:”), and the series dimension, underneath the map title. If no subtitle text is provided, but there is a dimension you want to map over, the subtitle will simply be the value of the dimension you are mapping over. If you are making these maps for publication, consider generating a new variable that is a string with pretty, nicely labeled categories rather than something numeric (ex: sexes as Male, Female, Both rather than 1,2,3), and use that variable as the series dimension.

wmap(chloropleth_map=shp,
           data=king,
           geog_id="mtract",
           variable="edu_ba",
           map_title="% over 25 with a Bachelor's Education",
           map_subtitle="Year: ",
           series_dimension="year",
           series_sequence=seq(2000,2010,5),
           color_ramp=rev(wpal("bright_fire")),
           histogram=TRUE)
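The suggestion above about using nicely labeled categories as the series dimension can be sketched as follows. The table is made up, and the 1/2/3 coding follows the Male, Female, Both example in the text.

```r
library(data.table)

# Illustrative table with a numeric sex code (1, 2, 3)
dt <- data.table(sex = c(1, 2, 3, 1),
                 value = c(0.2, 0.4, 0.3, 0.5))

# Build a string version of the code to use as the series dimension;
# factor() does the lookup, as.character() turns the result into plain strings
dt[, sex_label := as.character(factor(sex,
                                      levels = c(1, 2, 3),
                                      labels = c("Male", "Female", "Both")))]

print(dt)
```

The new sex_label column could then be named as the series_dimension so that the subtitle shows the label rather than the numeric code.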

Saving, manipulating, and writing map objects to a PDF

Writing to a PDF

You may have noticed that calling the wmap() function prints maps to the screen. This is convenient in that you can open a pdf() device, call the function, and include a series of maps in the same PDF as other plots describing your data of interest. However, you can also have the maps written directly to a PDF by setting destination_folder to the folder you want the PDF saved into. The PDF will be named after the variable you provide.
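The first approach, opening a PDF device yourself, looks roughly like this in base R. The sketch uses placeholder plots and a temporary file so that it runs without the woodson package or any map data; the file name is hypothetical.

```r
# Write to a temporary folder so the sketch does not touch your project files
out_file <- file.path(tempdir(), "my_maps.pdf")

pdf(out_file, width = 11, height = 8.5)  # open the PDF device

# Any other plot describing the data goes on its own page
hist(c(4, 5, 6, 7, 7, 8, 9, 13), main = "Some other plot of the data")

# A wmap(...) call placed here would add its map as another page;
# a placeholder plot stands in for it in this sketch
plot(1:10, main = "Stand-in for a map")

invisible(dev.off())  # close the device to finish writing the file

print(file.exists(out_file))
```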

Looping over more than 1 dimension

Let’s say you have a variable by age-sex-year. How do you write PDFs for each age-sex-year combination?

Example 6: Setting up code to loop over multiple dimensions

This code would generate a PDF of time series maps for each age-sex combination. Each file is written to the directory provided, with the additional variable name string appended to the PDF file name after the variable name, for example folder_specified/armed_forces_age_1_sex_2.pdf. The title of the map also updates based on which age-sex group is being iterated over.

# Mapping out variable by age/sex
for (s in unique(armed_forces$sex)){
  for (a in unique(armed_forces$age)){
    
    wmap(chloropleth_map=copy(mcnty_map),
               outline_map=state_map,
               data=copy(armed_forces[age==a & sex==s]),
               geog_id="mcnty",
               variable="armed_forces",
               map_title=paste0("% in Armed Forces; age:",a,", sex:",s),
               additional_variable_name_string=paste0("age_",a,"_sex_",s),
               series_dimension="year",
               series_sequence=c(1980, 1985, 1990, 1995, 2000, 2003, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014),
               destination_folder=paste0(parent_dir, "covariates/counties/mapped_covariates/armed_forces/" ),
               color_ramp=wpal("easter_to_earth"),
               histogram=TRUE)
  } # closing age loop
} # closing sex loop
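Before running a long loop like this, it can help to list the combinations and the file names you expect. The sketch below builds them in base R with made-up age and sex values; the naming pattern follows the description in the text rather than the package’s source code.

```r
# Illustrative dimension values
ages  <- c(1, 2)
sexes <- c(1, 2)

# Every age-sex combination, one row per PDF that the loop would write
combos <- expand.grid(age = ages, sex = sexes)

# The suffix built inside the loop, and the file name it would lead to
# (folder_specified is the placeholder folder name used in the text)
combos$suffix <- paste0("age_", combos$age, "_sex_", combos$sex)
combos$file   <- file.path("folder_specified",
                           paste0("armed_forces_", combos$suffix, ".pdf"))

print(combos$file)
```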

Fine-Tuning your Legend, Title, and Font Aesthetics

There are a lot of different tuning parameters included in this function that allow for modifications to the map’s legend and titles.

Example 7: Messing with the fonts, legends, and title

Returning to the Alabama poverty data, let’s modify how the legend is portrayed, and also change the title font. In addition, I want to highlight very particular values in the legend’s color bar, and name them appropriately.

I can also change the base font size and font family of the text on the map, using fontsize and fontfamily. If I’d like to be very specific about the size of my title or legend text, I can override those sizes manually using the title_font_size and legend_font_size parameters. However, fontsize scales the title and legend text to keep everything readable at the size you have specified, so this shouldn’t be strictly necessary. If you’d rather not have a serif font, you can specify the fontfamily as “sans”.

wmap(chloropleth_map=mcnty_map[mcnty_map@data$state==1,],
           outline_map=NULL,
           data=us[year==2013],
           geog_id="mcnty",
           variable="poverty",
           map_title="% of Pop Below the Poverty \n Line in 2013 (Alabama)",
           map_subtitle="Just how bad is it, really?",
           legend_name="How many below \nthe poverty line",
           color_ramp=wpal("cool_toned"),
           override_scale=c(10,40),
           legend_bar_width=.6,
           legend_bar_length=10,
           legend_position="right",
           legend_font_face="italic",
           legend_breaks=c(10,20,30,40),
           legend_labels=c("1 in 10","1 in 5","1 in 3","Close to half"),
           title_font_size=16,
           title_font_face="bold",
           fontfamily="sans",
           fontsize=14,
           histogram=FALSE)

Example 8: Getting super custom: scrapping the function-built legend altogether and adding one of your own

Say you want to use this function to merge and map your data, but you want a very specific legend and color scale. You can (of course) use whatever colors you want, including the RColorBrewer palettes, in conjunction with whatever new legend specification is desired. To do this, save the map object as a variable, and then add whatever plot components you desire (including overriding any of the formatting you dislike by adding a new theme() portion to the plot).

This example computes a specific scale, and then modifies the legend to include a title, add back tick marks, etc.

get_scale <- function(x, p=0.01, max_length=6) {
  # step size for limits and breaks
  step <- 10^(floor(log10(max(abs(x)))) - 1)
  # get limits
  limits <- c(step * floor(quantile(x, p)/step),
              step * ceiling(quantile(x, 1-p)/step))
  # get breaks
  int <- step
  while(T) {
    brks <- seq(limits[1], limits[2], int)
    if (length(brks) < max_length) break
    int <- int + step
  }
  brks <- unique(c(brks, limits[2]))
  # get labels
  labs <- as.character(brks)
  if (min(x) < limits[1]) labs[1] <- paste0("<", labs[1])
  if (max(x) > limits[2]) labs[length(brks)] <- paste0(">", labs[length(brks)])
  return(list(limits=limits, brks=brks, labs=labs))
}

scale<-get_scale(simulated_us_data[time==9,]$death_rate,p=0,max_length=6)

library(RColorBrewer)

main<-series_map(chloropleth_map =mcnty_map,
           outline_map = state_map,
           data=simulated_us_data[time==9,],
           geog_id="mcnty",
           variable="death_rate",
           map_title = " ",
           histogram = F,
           override_scale=scale$limits,
           color_ramp = brewer.pal(11, "Spectral"),
           legend_position="bottom",
           title_font_size = 14,
           legend_font_size=14,
           legend_bar_width = 1.5,
           legend_bar_length = 40,
           color_value_breaks=NULL,
           outline_color="black",
           return_map_object_only=TRUE,
           legend_breaks=scale$brks,
           legend_labels=scale$labs)

## Regions defined for each Polygons

## [1] " "

modified_map<-main+guides(fill=guide_colourbar(title="Hypothetical Variable", 
                                 barheight=1, barwidth=25, label=TRUE, ticks=TRUE, draw.ulim=TRUE, 
                                 title.position="top", 
                                 title.theme=element_text(family="sans", size=10, angle=0),
                                 label.theme = element_text(family="sans",size=8, angle = 0)))

print(modified_map)

Mapping binned or categorical data

A new functionality in this version of the mapping suite is the ability to map categorical and discretely binned variables. The function accomplishes this by plotting the choropleth map with the data (or bins) as factors with assigned colors. This means that the function will assign colors to your data using the palette you choose, ordered based on the factor’s levels. Re-ordering factored data and generating “breaks” is a topic for another document, so here’s a quick example of how to generate a binned data map.

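library(classInt) # Provides classIntervals()

# Adding a new column (as "ordered", an ordinal factor data type) based on quantile breaks.
# include.lowest=TRUE keeps the minimum value in the first bin instead of turning it into NA.
simulated_us_data[,new_classed:=as.ordered(cut(random_county,classIntervals(simulated_us_data$random_county, 5, style = "quantile")$brks,include.lowest=TRUE))]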

wmap(chloropleth_map=mcnty_map,
     outline_map=state_map,
     data=simulated_us_data[time==1],
     geog_id="mcnty",
     variable="new_classed",
     map_title="An Example of Binned Data",
     color_ramp=wpal("stormy_seas"),
     legend_position="bottom",
     label_position="bottom",
     patch_width=5,
     patch_height=1)

## Regions defined for each Polygons
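For readers who want to see what the binning step produces, here is a small base R sketch of quantile binning with made-up values. It uses quantile() and cut() only, as an alternative illustration rather than the method used in the example above.

```r
# Illustrative numeric variable
x <- c(2, 4, 4, 5, 7, 8, 10, 12, 15, 20)

# Quantile break points for 5 bins
brks <- quantile(x, probs = seq(0, 1, length.out = 6))

# include.lowest = TRUE keeps the minimum value in the first bin;
# ordered_result = TRUE returns an ordinal factor
binned <- cut(x, breaks = brks, include.lowest = TRUE, ordered_result = TRUE)

# The level order is the order in which palette colors would be assigned
print(levels(binned))
print(table(binned))
```

Without include.lowest = TRUE, cut() treats the first interval as open on the left, so the smallest observation falls outside every bin and becomes NA.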

In general, I discourage binning your data: the differences between areas can be perceived to be larger than they are when values are near the break thresholds. To compare, here is the same data set, this time plotted with a continuous color ramp. Note that neighboring areas now don’t seem so different from one another.

wmap(chloropleth_map=mcnty_map,
     outline_map=state_map,
     data=simulated_us_data[time==1],
     geog_id="mcnty",
     variable="random_county",
     map_title="Mapped with a continuous color ramp",
     color_ramp=wpal("stormy_seas"),
     legend_position="bottom",
     label_position="bottom")

## Regions defined for each Polygons

Telling different versions of the truth with color

The way in which you use color breaks, and what color schemes you use, greatly influences the story you tell with your data. If you are concerned about how outliers are driving your color scale, my recommendation is that you choose a percentile in your data frame, and manually overwrite the data points above/below that cutoff to equal the value of the percentile you have chosen. However, you can use the way the colors are scaled through the ramp to highlight or downplay certain values. Note that this is purely a trick of color– these maps are all true representations of the data- they just have very different appearances to a viewer.

As such, we aren’t even going to use real data for these examples– we will use simulated data for the United States at the county level, and show how different the same (totally fake) data set looks with different color schemes.
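The outlier advice above (pick a percentile and overwrite values beyond it) can be sketched as follows. The data and the choice of the 5th and 95th percentiles are illustrative assumptions; writing the capped values to a new column is a design choice of this sketch so that the original values are not lost.

```r
library(data.table)

# Illustrative data with one extreme outlier
dt <- data.table(mcnty = 1:10,
                 rate = c(3, 4, 4, 5, 5, 6, 6, 7, 8, 60))

# Choose the cutoff percentiles (here the 5th and 95th)
limits <- unname(quantile(dt$rate, probs = c(0.05, 0.95)))

# Cap values below/above the cutoffs, storing the result in a new column
dt[, rate_capped := pmin(pmax(rate, limits[1]), limits[2])]

print(range(dt$rate))
print(range(dt$rate_capped))
```

If you map the capped column, it is worth saying so in the legend labels (for example with a “>” prefix on the top label), since the top color now represents everything at or above the cutoff.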

Example 9: How different color palettes portray results

It’s crucial to note that we attach value judgements to particular color schemes. For example, reds are often viewed as “bad”, while greens and blues are often “good”. It’s tempting to use a diverging color scheme that stretches from a “good” color to a “bad” one, but be wary of using diverging color schemes when the middle value is not theoretically significant. Also, keep in mind that the colors you use when plotting covariates like race probably should not reflect “good” or “bad” color judgements. As such, I prefer to use color palettes that stretch through a variety of colors (the easter-to-earth and cool-toned palettes) without going from a “good” to a “bad” color.

Look how much the color scale matters for differentiating between high and low values (of the exact same data!) in the following examples:

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Simple light->dark scale)",
           color_ramp=wpal("stormy_seas"),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Dark->light scale with color differentiation)",
           color_ramp=rev(wpal("sea_green_to_pink")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Diverging color scale)",
           color_ramp=rev(wpal("purple_blue_diverging_from_white")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (scary-looking diverging scale)",
           color_ramp=wpal("orange_blue_diverging_from_purple"),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Judgement-free, non-diverging value differentiation)",
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           color_ramp=rev(wpal("cool_toned")),
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

Example 10: How different color breaks within the same palette portray results

In addition to which color ramp you use, how you stretch the colors within that ramp also matters. Let’s start with a normal portrayal of the variable, and then use color_value_breaks to make more values look extreme. Note that the color_value_breaks vector needs to start at 0 and end at 1. The default is an evenly spaced distribution of colors between the low and high values. Modifying this vector so that more of the color space is taken up by the colors usually present only at the extremes will highlight areas of low or high value, while reducing the color space of the extreme values will create a muffled effect.

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (explicit, evenly spaced breaks-same as default)",
           color_ramp=rev(wpal("tan_blue_multi_diverging_from_green")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           color_value_breaks=c(0,.25,.5,.75,1),
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Uneven breaks making more values appear high)",
           color_ramp=rev(wpal("tan_blue_multi_diverging_from_green")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           color_value_breaks=c(0,.2,.5,.6,1),
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Uneven breaks making more values appear low)",
           color_ramp=rev(wpal("tan_blue_multi_diverging_from_green")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           color_value_breaks=c(0,.4,.5,.8,1),
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons

wmap(chloropleth_map=mcnty_map,
           outline_map=state_map,
           data=simulated_us_data[time==1],
           geog_id="mcnty",
           variable="random_county",
           map_title="Totally hypothetical variable \n (Uneven breaks making more values appear midrange)",
           color_ramp=rev(wpal("tan_blue_multi_diverging_from_green")),
           legend_position="left",
           legend_bar_width = 1.5,
           legend_bar_length = 10,
           color_value_breaks=c(0,.1,.5,.9,1),
           return_map_object_only=TRUE,
           histogram=FALSE)

## Regions defined for each Polygons
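To see why moving the breaks changes the picture, here is a rough base R sketch of the underlying idea: each color is pinned to a position between 0 and 1, and a value’s color depends on where it falls relative to those positions. The function color_for() and the three-color ramp are hypothetical, and wmap() may well handle color_value_breaks differently internally.

```r
# Three illustrative colors: low, middle, high
ramp_colors <- c("navy", "white", "firebrick")

# Map a normalized data value (0-1) to a color, given the position of each color
color_for <- function(v, positions, colors) {
  # Find where v falls relative to the color positions...
  t <- approx(x = positions,
              y = seq(0, 1, length.out = length(colors)),
              xout = v)$y
  # ...then interpolate along the evenly spaced ramp
  rgb_vals <- colorRamp(colors)(t)
  rgb(rgb_vals[, 1], rgb_vals[, 2], rgb_vals[, 3], maxColorValue = 255)
}

v <- 0.25  # a value one quarter of the way between the minimum and maximum

even   <- color_for(v, positions = c(0, 0.5, 1),  colors = ramp_colors)
uneven <- color_for(v, positions = c(0, 0.25, 1), colors = ramp_colors)

print(even)    # partway between navy and white
print(uneven)  # the middle color, because its position now sits at 0.25
```

The data value is identical in both calls; only the placement of the middle color changed.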

The Woodson Color Palettes

The woodson color palette function contains a variety of custom color palettes generated using a cubehelix scheme (basically, a corkscrew through a 3-D space of red, green, and blue, running from dark to light). These color ramps are named, and can be used anywhere you can enter a vector of colors. The woodson color palette script also includes some functions for plotting color schemes.
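For the curious, the corkscrew idea can be sketched in a few lines of base R using the published cubehelix formula (Green, 2011). The function name, argument names, and default settings below belong to this sketch; the woodson palettes were presumably generated with their own settings, so these colors will not match any named wpal() ramp.

```r
# Cubehelix colors: brightness rises steadily while hue spirals around the grey axis
cubehelix_sketch <- function(n, start = 0.5, rotations = -1.5, hue = 1, gamma = 1) {
  lambda <- seq(0, 1, length.out = n)                   # position along the ramp
  lg     <- lambda^gamma                                # brightness at each position
  phi    <- 2 * pi * (start / 3 + rotations * lambda)   # angle around the helix
  amp    <- hue * lg * (1 - lg) / 2                     # distance from the grey axis

  r <- lg + amp * (-0.14861 * cos(phi) + 1.78277 * sin(phi))
  g <- lg + amp * (-0.29227 * cos(phi) - 0.90649 * sin(phi))
  b <- lg + amp * ( 1.97294 * cos(phi))

  clamp <- function(z) pmin(pmax(z, 0), 1)              # keep channels in the valid range
  rgb(clamp(r), clamp(g), clamp(b))
}

pal <- cubehelix_sketch(10)
print(pal)
```

Because brightness increases monotonically along the ramp, palettes built this way still read correctly when printed in greyscale.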

The “Woodson Palettes” function contains a list of named color ramps (either diverging or moving from dark to light). Accessing the colors is easy: the color ramps are stored as named lists within the function. You can explore the palettes in the following ways:

# Getting the vector of colors in a named palette
# (this is how you "call" the colors in code)
wpal("black_to_light_1")

##  [1] "#000000" "#0C1D2C" "#0F433F" "#20663C" "#4A7E35" "#848941" "#BD9068"
##  [8] "#E19EA2" "#EEB9D8" "#F0DDF8"

If you would like to see what any vector of colors looks like as a palette, use the plot_colors() function, also contained within the woodson palettes code, to view a plot of the colors in question. The second argument is a string giving a name for the color scheme or plot.

plot_colors(c("lightgreen","seagreen","darkgreen"),"IHME On Brand Colors")

## Warning: Ignoring unknown aesthetics: stat

If you would like to explore any of the named palettes, you can use view_wpal() to plot the palettes native to the palette module:

view_wpal("thanksgiving")

## [1] "Plotting wpal color scheme thanksgiving"

Say you want to reverse a color scheme so that dark to light is reversed– this is trivial, just use rev() on whatever color scheme you are interested in.

plot_colors(rev(wpal("black_to_light_1")),"Reversed Black to Light Palette")

## Warning: Ignoring unknown aesthetics: stat

If you just want a list of the names of the palettes, without needing to plot them, you can call:

# Getting a list of the names of the palettes:
names(wpal())

Don’t remember what your favorite color ramp is? No worries. You can quickly plot all of the color ramps available from the woodson palettes function!

view_wpal()

## [1] "No color specified; plotting all colors"

Appendix: Function Inputs

This is a full, exhaustive listing of the function parameter options.

Mandatory Inputs

chloropleth_map

   A SpatialPolygons object with data as a data.table rather than a data.frame. The data.table must include a unique geographic ID.

geog_id

   A string-- the name of the column that serves as the geographic ID that specifies the unit of analyisis for your data

variable

   A string-- the name of the column that will serve as the values you want to plot by your geog_id

You may have noticed that there are a lot of default settings in place! More options are detailed below.

Optional Inputs

Optional data/geometry

data

A data.table that contains the data you want to map. It must contain geog_id and the variable of interest, if specified. If a series dimension and/or series sequence is defined, those must also exist in this data set.
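
Before mapping, it can save time to confirm that the table really has the columns described above. The helper below is a hypothetical sketch (check_map_data is not part of woodson), written in base R with a data.frame so it runs on its own; the same calls work on a data.table. The uniqueness check is an assumption: it supposes each geographic unit should appear once per map.

```r
# Hypothetical helper, not part of woodson: checks the columns a map needs.
check_map_data <- function(data, geog_id, variable, series_dimension = NULL) {
  needed <- c(geog_id, variable, series_dimension)
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: within a single map, each geographic unit appears only once
  key <- if (is.null(series_dimension)) {
    data[[geog_id]]
  } else {
    paste(data[[geog_id]], data[[series_dimension]])
  }
  if (anyDuplicated(key) > 0) {
    stop("Duplicate ", geog_id, " values found within a single map")
  }
  invisible(TRUE)
}

# Toy data with illustrative column names
toy <- data.frame(
  fips = c("53033", "53053", "53061"),
  rate = c(1.2, 3.4, 2.2),
  stringsAsFactors = FALSE
)

check_map_data(toy, geog_id = "fips", variable = "rate")
```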

outline_map

Another SpatialPolygons object that you want to use the outlines from. Make sure your outline map and main map have the same projection.

Do you want the geography you are mapping outlined?

chlor_lcol

Color of outline (the geography your values map to). Default value is NA (no outline). You can enter any valid color here.

chlor_lsize

Size of outline (the geography your values map to). Default value is 0.0.

What elements do you want the plot to contain?

histogram

  TRUE/FALSE. If "TRUE", the plot will contain a histogram of the values at the bottom. For categorical data, this will be a bar chart with frequency by category.

hist_color

Default=NULL. If a character string for a color (or colors) is entered (ex: "grey"), the histogram will be that color rather than the color ramp used for the main map. Only available for numeric data.

dist_stats

   Vertical lines on the histogram plot showing summary statistics. To show them, provide a vector containing numeric values between 0 and 1 (used as quantiles) and, optionally, the strings "mean" and "sd". Example: c("mean","sd",.1,.5,.9). Default=NULL. Only available for numeric data.
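
To get a feel for where those lines would fall, the same statistics can be computed by hand in base R. This is only a sketch on simulated values; in particular, drawing the "sd" lines one standard deviation either side of the mean is an assumption, not something the description above confirms.

```r
set.seed(1)
values <- rnorm(500, mean = 50, sd = 10)  # stand-in for the column being mapped

# Mixing strings and numbers in c() yields a character vector
dist_stats <- c("mean", "sd", .1, .5, .9)

# Separate the numeric quantile probabilities from the keyword options
probs <- suppressWarnings(as.numeric(dist_stats))
probs <- probs[!is.na(probs)]

quantile_lines <- quantile(values, probs = probs)  # 10th, 50th, 90th percentiles
mean_line <- mean(values)
# Assumption: "sd" marks one standard deviation below and above the mean
sd_lines <- mean_line + c(-1, 1) * sd(values)

print(quantile_lines)
print(c(mean = mean_line, lower = sd_lines[1], upper = sd_lines[2]))
```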

return_map_object_only

  If "TRUE", the function returns the map portion of the plot as a ggplot object, so you can assign the result to a variable and combine it with other graphics at will. Default value is FALSE. This will never return the histogram.

destination_folder

  A string file path to the folder in which you want a PDF of your map(s) created. The PDF's title will be the variable name, plus any additional_variable_name_string you specify. If this is specified, a PDF with the map(s) will be created.

Inputs for the color scheme of the maps

color_ramp

  A list of colors that will serve as the colors you "stretch" through based on your data values. This will default to a color scheme described in woodson pallettes called "Easter to Earth" that displays variation well when there are many geographic units. The fewer geographic units, the simpler you probably want your color ramp to be. See woodson palletes for more options, or create your own.
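
If none of the named palettes fit, a custom ramp can be built with base R alone. The sketch below assumes color_ramp accepts any character vector of colors R recognizes; the anchor colors and the names make_ramp and my_ramp are purely illustrative.

```r
# colorRampPalette() lives in grDevices, which is attached by default.
# The three anchor colors are illustrative, not part of woodson.
make_ramp <- colorRampPalette(c("#0C1D2C", "#4A7E35", "#F0DDF8"))

# Interpolate 10 evenly spaced colors between the anchors
my_ramp <- make_ramp(10)

# Reverse it so it runs light to dark instead of dark to light
my_ramp_reversed <- rev(my_ramp)

print(my_ramp)
```

The resulting vector could then be previewed with plot_colors(my_ramp, "My Ramp") before passing it as color_ramp.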

outline_color

  What color you want the outline of the additional geography to be (if provided). This can be any color R recognizes--suggestions might be "black","yellow", or "white". Default is white.

outline_size

  A numeric value that specifies how thick you want the outlines to be if you have specified an outline you want shown on your map. Default value is .1.

override_scale

  Values that will be used to stretch the color ramp instead of the min/max values present in the entire data set. Should either be structured "c(min,max)", with numeric values, or be "each_dimension", which will create a map series where each individual map in the series is based on the min/max from that subset of data.
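
The difference between a shared scale and a per-map scale is easy to see on toy numbers. The base R sketch below only computes the ranges involved; the data and column names are made up for illustration, and it does not call any woodson function.

```r
# Toy data: four values per year (names and numbers are illustrative)
toy <- data.frame(
  year  = rep(c(1990, 2000, 2010), each = 4),
  value = c(2, 4, 6, 8,  5, 7, 9, 11,  10, 12, 14, 16)
)

# One scale shared by every map in the series (min/max of the entire data set)
full_range <- range(toy$value)

# A hand-picked scale in the c(min,max) form, e.g. to match another figure
fixed_scale <- c(0, 20)

# What "each_dimension" implies: a separate min/max for each year
per_year <- tapply(toy$value, toy$year, range)

print(full_range)
print(per_year)
```

A shared scale keeps colors comparable across maps in the series; a per-map scale shows more contrast within each map but the same color no longer means the same value from one map to the next.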

color_value_breaks

  How you want the colors "stretched" across the range of minimum/maximum values. Default is NULL, which spreads the color ramp uniformly between the minimum and maximum data values provided. If supplied, the vector must begin with 0 and end with 1.

diverging_centerpoint

Accepts any numeric value between the minimum and maximum of your data set. Sets the center of your color scheme to the value defined. This is meant to be used with diverging color schemes. It will override any previously defined color_value_breaks. Default=NULL.
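
To see how a centerpoint relates to a breaks vector that begins with 0 and ends with 1, here is one plausible way to work it out by hand. This is a sketch under stated assumptions, not the package's internal calculation: centered_breaks is a hypothetical helper, and it is restricted to ramps with an odd number of colors so that the middle color lands exactly on the centerpoint.

```r
# Hypothetical helper, not part of woodson.
centered_breaks <- function(values, centerpoint, n_colors) {
  rng <- range(values, na.rm = TRUE)
  stopifnot(centerpoint > rng[1], centerpoint < rng[2],
            n_colors >= 3, n_colors %% 2 == 1)
  # Position of the centerpoint once the data range is rescaled to [0, 1]
  center_pos <- (centerpoint - rng[1]) / (rng[2] - rng[1])
  half <- (n_colors + 1) / 2
  # Spread the first half of the colors below the center, the rest above it
  c(seq(0, center_pos, length.out = half),
    seq(center_pos, 1, length.out = half)[-1])
}

# Data running from -2 to 8, centered on 0, with a 5-color diverging ramp
breaks <- centered_breaks(c(-2, 8), centerpoint = 0, n_colors = 5)
print(breaks)  # 0.0 0.1 0.2 0.6 1.0
```

Here 0 sits 20% of the way from -2 to 8, so the middle color is pinned at 0.2 rather than at 0.5.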

mean_color

   The color of lines you want to represent mean and standard deviation statistics, only relevant if dist_stats!=NULL. Default="red".

quantile_color

   The color of lines you want to represent the median and quantile lines on the histogram, only relevant if dist_stats!=NULL. Default="black".

Inputs for map titles

map_title

  A string that serves as the basis for your map title. If no dimensions are specified, the title will appear exactly as specified. If a dimension is specified, a phrase constructed using the series dimension and what you are mapping will be added to the plot title [ex="year:1990"].

map_subtitle

  A string that serves as the basis for your map subtitle. If you are not mapping a series based on a dimension (you are only making 1 map), this subtitle text will appear as-is. If you are mapping based on a dimension (for example, over time, using the "year" field), and no text is provided, the subtitle to your map will be the dimension value that you are mapping your data set by. If you would like some kind of text ahead of this subtitle (for example, "Year: "), enter it here and it will appear before the subset dimension. If your subset dimension is ugly or numeric (for example, 1,2,3 instead of "Male","Female","Both"), it is recommended to generate a new, string variable that you can subset on, with a better name for the map titles.

fontfamily

  The family ("serif" or "sans") that will serve as the font type for all text on the plots.

fontsize

  The base size that all text is based around-- increasing this will make sure that your map text is readable.

additional_variable_name_string

  This is an additional string that you want to add to the name of the PDF to describe any other breakdowns you might be using. For example, if you had to map something by year, age, and sex, you would first need to subset your data to one age/sex group before plotting it out by year. If you subset your data in a loop, you could use this string to specify something along the lines of paste0("age_",a,"_sex_",s). NOTE: You need to put a similar paste0 statement in your map title if you also want this sub-breakdown described in the title of your map, not just the file path to the PDF.
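
The loop described above might build its strings like this. The sketch is base R only and stops short of calling any mapping function; the age and sex values and the title wording are illustrative.

```r
ages  <- c(20, 40)
sexes <- c("Male", "Female")

suffixes <- character(0)
titles   <- character(0)

for (a in ages) {
  for (s in sexes) {
    # String for the PDF name (what additional_variable_name_string would receive)
    file_suffix <- paste0("age_", a, "_sex_", s)
    # Matching text for the map title, built the same way
    title_text <- paste0("Mortality Rate, Age ", a, ", ", s)

    # Here you would subset your data to this age/sex group and map it by year.

    suffixes <- c(suffixes, file_suffix)
    titles   <- c(titles, title_text)
  }
}

print(suffixes)
```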

title_font_size

  How large you want the title font to be. No explicit default; if unset, the size falls back to the default of the ggthemes tufte theme.

title_font_face

  Special properties of the title font. Options include "plain", "bold", "italic". Default is plain.

Inputs for generating series-maps

These inputs will help you create a series of maps, rather than 1 single output.

series_dimension

 A string-- the name of the column that will serve as the variable you loop through to create a series map. For example, year.

series_sequence

  A vector c(x,y,z...) that specifies a subset of the series dimensions you want to map. For example, if you have a data set that contains all years between 1980-2014, you can specify that you only want to plot out every other year by setting series sequence to be seq(1980,2014,2). This function will make sure all of the items you specify actually exist within your series_dimension.
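
The every-other-year example can be checked in base R before it is passed along. The existence check below mirrors the kind of validation described above but is only a sketch; years_in_data stands in for the values of your series_dimension column.

```r
# Stand-in for the values present in the series_dimension column
years_in_data <- 1980:2014

# Every other year from 1980 through 2014
series_sequence <- seq(1980, 2014, 2)

# Any requested years that are absent from the data?
missing_years <- setdiff(series_sequence, years_in_data)
if (length(missing_years) > 0) {
  stop("Not found in series_dimension: ", paste(missing_years, collapse = ", "))
}

print(length(series_sequence))  # 18 maps would be produced
```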

Inputs for map legend

legend_position

  Where you want the legend to go. Options are "top","bottom","right","left", and "none", which will create a map with no legend (useful if you want to return the map only and add a custom legend yourself). Default is "bottom".

legend_font_size

  How large you want the legend font to be. No explicit default; if unset, the size falls back to the default of the ggthemes tufte theme.

legend_font_face

  Special properties of the legend font. Options include "plain", "bold", "italic". Default is plain.

legend_bar_width

  How fat you want the color bar that serves as the legend to be. Default value is 0.4.

legend_bar_length

  How long you want the color bar that serves as the legend to be. Default value is 20.

legend_breaks

  An optional vector of the values you want to label in your legend's color scale.

legend_labels

  An optional vector of the character strings you want to use to label your legend's color scale (must be same length as legend_breaks)
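
A matching pair of vectors might be prepared like this. The values are illustrative; the only requirement taken from the descriptions above is that both vectors have the same length.

```r
# Values to label on the color scale (illustrative)
legend_breaks <- c(0, 25, 50, 75, 100)

# One label per break, built from the breaks so the two cannot drift apart
legend_labels <- paste0(legend_breaks, "%")

stopifnot(length(legend_breaks) == length(legend_labels))
print(legend_labels)
```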

verbose

  Whether you want print statements from the function (default=F)

Inputs specific to discrete/categorical data

scramble_colors

  Mixes up the color ramp generated for the factor data you have chosen to map. Default is FALSE.

patch_width

 How wide the color swatch in the legend is. Default=.25.

patch_height

 How tall the color swatch in the legend is. Default=.25.

label_position

 Where the category labels appear compared to the color swatches. Default= "right".

Questions or Suggestions? Contact Rebecca Stubbs at stubbsrw@gmail.com

Introduction to the Woodson Mapping Suite (v3.0)

Rebecca Stubbs