Reddit Question - Adding a Legend to a Choropleth

User Question

I’ve put together a neat map with gun-related data, and I need to add a key. Unfortunately, there are multiple layers, so I have to set it up myself. This is no problem, except for finding a location for it.

I know there is an x,y coordinate option in the legend function, but since the graph is longitudinal (I think), I can’t use those. I also can’t use a set location like ‘bottomright’. I’ve tried to use specific lat/long coordinates, but that also doesn’t solve the problem. Does anyone have any suggestions for how I should approach this? Or am I even approaching it the right way? Here is my code so far:

# Sample Code

library(ggplot2)
library(fiftystater)
library(colorplaner)
library(RColorBrewer)
library(maps)
library(mapdata)
library(dplyr)

#Gun ownership and locations of shootings over the last 12 months.

# prepare data frame
data("fifty_states")

# Set up data frame for Arrest Data
df_Arrests <- USArrests

# Set up data frame for gun ownership rate data
df_GunOwnership <- read.csv("MarchDVDat.csv")

#Don't use code like the following. It's unclear what you are doing.
#It's also very fragile. If your input data file changes, it will break.
#Instead, use a 'tidy data' approach. I use the 'filter()' function from the
# 'dplyr' package as an example.
#df_gunownership <- df_gunownership[-9, ]

#Exclude DC from the data frame using dplyr::filter(). This approach is much clearer
#and more maintainable than your original approach.
df_GunOwnership <- df_GunOwnership %>% filter(State != 'DC')

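#Get rid of the now-unused 'DC' level. This only matters if read.csv() returned
#'State' as a factor (the default before R 4.0.0). droplevels() has no method for a
#character vector, and the transmute() below converts 'State' to character anyway.
if (is.factor(df_GunOwnership$State)) {
  df_GunOwnership$State <- droplevels(df_GunOwnership$State)
}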
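```r
# A minimal sketch of why droplevels() is needed, assuming 'State' was read in
# as a factor. The codes below are hypothetical, not rows from MarchDVDat.csv.
demo_state <- factor(c("AL", "DC", "WY"))
demo_kept <- demo_state[demo_state != "DC"]
levels(demo_kept)              # "AL" "DC" "WY": 'DC' survives as an empty level
levels(droplevels(demo_kept))  # "AL" "WY"
```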

#Rename 'State' to 'state'. Personally, I wouldn't bother renaming for
#such a minimal change, but there's nothing wrong with it if that's what you would rather do.
#What you've done here is OK, but a 'tidy data' approach, like the one I use below, is cleaner.
#colnames(data)[1] <- "state"

#Using the 'tidy data' approach
#If I'm going to go to the trouble to change 'State', I'm also going to change 'Gun_Own'
#Note that I rename 'State' as 'stateabb'. This change will be very important
#for the 'merge()' operation below.
df_GunOwnership <- df_GunOwnership %>% transmute(stateabb = as.character(State), gunown = Gun_Own)

#What you were doing here is also very fragile. See below for one possible better approach.
#data$state <- df$state

#Your real goal here is to merge your arrest data and Gun ownership data.
#We could use some 'R-magic' to join the two data frames in one command.
#However, since you're still learning, let's do it in steps.

#First, let's create a new data frame from df_Arrests, with the only difference
#being an added column of state abbreviations. (USArrests stores the state names
#as row names, so we also copy them into a 'state' column.)
#R has two built-in vectors, state.name and state.abb, for converting between
#state names and abbreviations, which will be very useful to you.

df_arr2 <- df_Arrests %>%
  mutate(stateabb = state.abb[match(rownames(.), state.name)],
         state = rownames(.))
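```r
# A quick, hedged check of the state.name/state.abb lookup used above. USArrests,
# state.name and state.abb all ship with base R, so this runs without
# MarchDVDat.csv. The demo_* names are illustrative only.
demo_arr <- USArrests
demo_arr$state <- rownames(demo_arr)
demo_arr$stateabb <- state.abb[match(demo_arr$state, state.name)]
head(demo_arr[, c("state", "stateabb")], 3)
# A nonzero count would mean a row name that is not an exact state.name match.
sum(is.na(demo_arr$stateabb))
```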

#Second, let's join the two data frames on their shared 'stateabb' column
df_all <- merge(df_arr2, df_GunOwnership, by = "stateabb")
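```r
# A sketch of the same join with the key spelled out. merge() joins on every
# column name the two data frames share, so naming 'by' guards against an
# accidental extra shared column. Here state.region (base R) stands in for the
# gun ownership data; it is NOT the data from MarchDVDat.csv.
demo_left  <- data.frame(stateabb = state.abb, state = state.name,
                         stringsAsFactors = FALSE)
demo_right <- data.frame(stateabb = state.abb, region = state.region,
                         stringsAsFactors = FALSE)
demo_join <- merge(demo_left, demo_right, by = "stateabb")
nrow(demo_join)  # 50: every abbreviation found a match
# merge() is an inner join by default, so unmatched rows silently disappear:
nrow(merge(demo_left, demo_right[-1, ], by = "stateabb"))  # 49
```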

#min(data$Gun_Own)
#max(data$Gun_Own)

#My suggestion is to get rid of all this code and replace it with the
#cut() function. See below.
# data$Bucket = 0
# for (i in 1:nrow(data)) {
#   if (data[i, 2] <= 10) {
#       data[i, 3] = 1
#   } else if (data[i, 2] > 10 & data[i, 2] <= 20) {
#       data[i, 3] = 2
#   } else if (data[i, 2] > 20 & data[i, 2] <= 30) {
#       data[i, 3] = 3
#   } else if (data[i, 2] > 30 & data[i, 2] <= 40) {
#       data[i, 3] = 4
#   } else if (data[i, 2] > 40 & data[i, 2] <= 50) {
#       data[i, 3] = 5
#   } else if (data[i, 2] > 50) {
#       data[i, 3] = 6
#   }
# }

df_all$bucket <- cut(df_all$gunown, breaks = c(0, 10, 20, 30, 40, 50, Inf), include.lowest = TRUE, labels = FALSE)
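```r
# A sketch checking that cut() reproduces the if/else loop it replaces. The
# values in demo_x are arbitrary test inputs chosen to land on the bin edges;
# demo_bucket_loop() is a hypothetical helper that restates the old loop.
demo_x <- c(0, 5, 10, 10.5, 20, 35, 50, 61.7)
demo_bucket_loop <- function(v) {
  if (v <= 10) return(1L)
  if (v <= 20) return(2L)
  if (v <= 30) return(3L)
  if (v <= 40) return(4L)
  if (v <= 50) return(5L)
  6L
}
demo_loop <- vapply(demo_x, demo_bucket_loop, integer(1))
# Bins are closed on the right, e.g. (10, 20]; include.lowest = TRUE puts 0 in
# the first bin, matching the loop's '<= 10' test.
demo_cut <- cut(demo_x, breaks = c(0, 10, 20, 30, 40, 50, Inf),
                include.lowest = TRUE, labels = FALSE)
demo_cut                        # 1 1 1 2 2 4 5 6
identical(demo_loop, demo_cut)  # TRUE
```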

p1 <- ggplot(df_all) +
  # map points to the fifty_states shape data
  geom_map(
    aes(fill = factor(bucket), map_id = tolower(state)),
    map = fifty_states,
    color = 'gray') + 
  expand_limits(x = fifty_states$long, y = fifty_states$lat) +
  coord_map() +
  # long and lat are continuous, so use continuous scales to hide the axes
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL) +
  labs(x = "", y = "") +
  theme(legend.position = "right",
        panel.background = element_blank())
#p1

# Get color palettes here: http://colorbrewer2.org/#type=sequential&scheme=YlGn&n=6
# (The six colors used below are the 6-class OrRd scheme, not YlGn.)

#Shooting locations since Jan 1, 2017

labs <- data.frame(
  long = c(-80.137314,
           -90.405647,
           -89.6201,
           -81.3792,
           -106.3186,
           -115.1398,
           -98.0567,
           -122.4580,
           -79.3892,
           -80.2532,
           -76.4621),
  lat = c(26.122438,
          32.855133,
          44.8872,
          28.5383,
          36.2072,
          36.1699,
          29.2733,
          40.0007,
          40.0520,
          26.3108,
          38.2576),
  names = c('Ft. Lauderdale',
            'Yazoo City',
            'Rothschild',
            'Orlando',
            'Abiquiu',
            'Las Vegas',
            'Sutherland Springs',
            'Rancho Tehama',
            'Melcroft',
            'Parkland',
            'Lexington Park'),
  victims = c(11, 4, 4, 5, 5, 480, 46, 15, 5, 32, 2),
  stringsAsFactors = FALSE
)


p2 <- p1 + fifty_states_inset_boxes() +
  scale_fill_manual(
    name = 'Percent Gun Owners',
    values = c(
      '#fef0d9',
      '#fdd49e',
      '#fdbb84',
      '#fc8d59',
      '#e34a33',
      '#b30000'),
    labels = c('0-10', '10-20', '20-30', '30-40', '40-50', '50-62'))
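```r
# A hedged sketch of a sturdier fill legend, assuming a 'gunown' column like
# the one in df_all (the three values below are arbitrary). With labels = FALSE
# and unnamed colours, an empty bucket would misalign the colours and labels.
# Labelling the bins in cut(), naming each colour after its bin, and passing
# drop = FALSE keeps all six bins in the legend even when a bin has no states.
# show.legend = TRUE asks for keys even for empty bins (some ggplot2 versions
# only draw those keys when it is set). geom_tile() and the demo_* names are
# stand-ins for illustration only.
library(ggplot2)
demo_bins <- c('0-10', '10-20', '20-30', '30-40', '40-50', '50-62')
demo_cols <- setNames(c('#fef0d9', '#fdd49e', '#fdbb84',
                        '#fc8d59', '#e34a33', '#b30000'), demo_bins)
demo_fill <- data.frame(x = 1:3, gunown = c(5, 25, 61.7))
demo_fill$bucket <- cut(demo_fill$gunown,
                        breaks = c(0, 10, 20, 30, 40, 50, Inf),
                        include.lowest = TRUE, labels = demo_bins)
demo_fill_p <- ggplot(demo_fill, aes(x = x, y = 1, fill = bucket)) +
  geom_tile(show.legend = TRUE) +
  scale_fill_manual(name = 'Percent Gun Owners', values = demo_cols,
                    drop = FALSE)
demo_fill_p
```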

p3 <- p2 +
      geom_point(
        data = labs,
        aes(x = long, y = lat),
        color = ifelse(labs$victims > 60, 'black', 'blue'),
        size = ifelse(labs$victims > 60, 100, labs$victims / 5 * 8),
        alpha = .5)
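```r
# A sketch of why the points have no legend: ggplot2 only builds a legend for
# aesthetics mapped inside aes(). p3 and p4 set colour and size as fixed
# arguments, so nothing is mapped and no legend is drawn. Below, three rows
# copied from 'labs' are plotted with size mapped to victims and colour mapped
# to a constant label. scale_size_area(), max_size = 20, the legend titles and
# the demo_* names are illustrative choices, not part of the original code.
# On the map, each geom_point() would also need data = labs (p1 was built from
# df_all), and the black/blue split for large events is omitted for brevity.
library(ggplot2)
demo_pts <- data.frame(long    = c(-80.137314, -115.1398, -98.0567),
                       lat     = c(26.122438, 36.1699, 29.2733),
                       victims = c(11, 480, 46))
demo_pts_p <- ggplot(demo_pts, aes(x = long, y = lat)) +
  # size is mapped, so it gets a 'Victims' legend
  geom_point(aes(size = victims), colour = 'blue', alpha = .5) +
  # mapping colour to a constant string creates a one-key legend entry
  geom_point(aes(colour = 'Shooting location'), size = 1.5) +
  scale_size_area(name = 'Victims', max_size = 20) +
  scale_colour_manual(name = NULL, values = c('Shooting location' = 'green'))
demo_pts_p
```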
p3

p4 <- p3 +
      geom_point(data = labs,
                aes(x = long, y = lat),
                color = 'green',
                size = 1.5)
p4

p5 <- p4 +
      geom_text(data = labs,
                aes(x = long, y = lat, label = names),
                vjust = "top", hjust = "left")
p5

#legend(
#    'bottomright',
#    legend = c('Scaled Victims', 'Shooting Location'),
#    pch = 16,
#    col = c('blue', 'green')
#)

My First Reply

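The reason you’re having trouble with a legend is that you are building your plot with ggplot2 (which is the right choice, BTW), but you are using the legend() function, which works only with base graphics. legend() places itself within the coordinate system of the current base-graphics plot, and a ggplot is drawn by the separate grid system, so legend() has nothing to attach to.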
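Here is a minimal sketch of that failure. It assumes a fresh R session with no base plot open. pdf(NULL) opens a throwaway device so nothing is written to disk, and demo_msg is an illustrative name. Because plot.new() has never been called on the device, legend() stops with an error instead of drawing.

```r
library(graphics)
library(grDevices)
pdf(NULL)  # throwaway device: nothing has called plot.new() on it yet
demo_msg <- tryCatch({
  legend('bottomright',
         legend = c('Scaled Victims', 'Shooting Location'),
         pch = 16, col = c('blue', 'green'))
  'no error'
}, error = conditionMessage)
invisible(dev.off())
demo_msg  # the message reports that plot.new has not been called yet
```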

Although you can mix and match ggplot2 and base graphics, I would not do so unless you have a deep understanding of how the various plotting functions work at the implementation level, which fewer than 1% of R users have. Even when a mixed result works initially, it is unlikely to be stable or portable. ggplot2 has most (if not all) of the capabilities of base graphics, and because ggplot2 builds a graph from successive layers, debugging a ggplot2 graph is much easier than debugging one generated with base graphics.
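To illustrate the layer-by-layer point, here is a small sketch that uses the built-in mtcars data rather than the map (the demo_* names are hypothetical). Each `+` returns a new plot object with one more layer. ggplot2's layer_data() returns the data computed for a single layer, so you can inspect one layer at a time instead of the whole graph.

```r
library(ggplot2)
demo_step1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
demo_step2 <- demo_step1 + geom_smooth(method = 'lm', formula = y ~ x)
length(demo_step1$layers)        # 1
length(demo_step2$layers)        # 2
head(layer_data(demo_step2, 1))  # the x, y, colour, size, ... computed for the points
```

Keeping each stage in its own object, as the p1 to p5 chain in the question already does, means you can print or inspect any stage on its own.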

I have the following recommendations for you:

Do all of your plotting with ggplot2 functions only. To get some ideas of how to manipulate the legend with ggplot(), take a look at http://www.cookbook-r.com/Graphs/Legends_(ggplot2)/.
Save your plot more frequently at intermediate stages.
Indent your code to make it easier to read and debug.
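Following the first recommendation, here is a sketch of a ggplot2 counterpart to the commented-out legend('bottomright', ...) call. It assumes ggplot2 is installed; the corner coordinates and the demo_* names are illustrative choices. ggplot2 places an inside legend using normalised 0 to 1 coordinates rather than data units, so the position does not depend on longitude and latitude. Because ggplot2 3.5.0 moved inside placement to legend.position.inside, the sketch checks the installed version before building the theme.

```r
library(ggplot2)
# Bottom-right corner of the plot, in 0-1 coordinates (not long/lat)
demo_corner <- c(0.98, 0.02)
demo_legend_theme <- if (utils::packageVersion('ggplot2') >= '3.5.0') {
  theme(legend.position = 'inside',
        legend.position.inside = demo_corner,
        legend.justification = c(1, 0))  # anchor the legend's own bottom-right corner
} else {
  theme(legend.position = demo_corner,
        legend.justification = c(1, 0))
}
demo_legend_p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  demo_legend_theme
demo_legend_p
```

On the map, you would add the same theme with p5 + demo_legend_theme.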

Your code is hard to debug because of the way it is formatted. Your formatting isn’t terrible, but a few simple changes to your indentation would make debugging easier.

If you still need help, share a copy of MarchDVDat.csv, and I’ll fix your legend and reformat your code to show you how to make it easier to read and debug.

Thaufas

March 30, 2018
