Choropleth Map Tutorial

A choropleth map divides a geographic area into discrete regions and colors each region according to the value of a single variable. This guide shows how to create one using the highcharter package in R.

Getting the Data

First, you need a data set to work with. For this example, we’re going to use a data set of complaints about consumer financial products and services received by the Consumer Financial Protection Bureau of the United States, and map the number of complaints filed within each state.

First, read the data and store it in R. In this example, the data is a CSV file stored in the same folder as this R file. As such, we can just use read_csv("[filename]") with the file name alone: a path without a folder is resolved relative to R's working directory, which for a knitted R Markdown file defaults to the folder containing the file itself.

library(tidyverse) # Loads readr (read_csv), dplyr, and tidyr, which this guide relies on.
Complaints <- read_csv("complaints.csv")

## Rows: 4208806 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (15): Product, Sub-product, Issue, Sub-issue, Consumer complaint narrat...
## dbl   (1): Complaint ID
## date  (2): Date received, Date sent to company
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
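If read_csv cannot find the file, the working directory is the usual place to look. The following is a minimal base-R sketch of checking it before reading; the file name matches the one used above, and the message text is only illustrative.

```r
# Show the folder that relative paths such as "complaints.csv" are resolved against.
working_dir <- getwd()
print(working_dir)

# Build the full path R would use and check that the file is really there.
csv_path <- file.path(working_dir, "complaints.csv")
if (!file.exists(csv_path)) {
  message("complaints.csv not found in ", working_dir)
}
```

If the message appears, either move the file into that folder or pass read_csv a fuller path.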

Data Processing

Next, we’re going to need to process the data. This step involves thinking through the context of your map. Because we’re dealing with raw numbers of events occurring within specific regions, we need to control for population if we want to draw meaningful interpretations from our map.

To do that, we’re going to need data on the population of each US state. To simplify things, we can look at a specific year: 2018.
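To see why controlling for population matters, here is a small worked example. The two states and all of their numbers are made up for illustration and are not taken from the complaints data.

```r
# Hypothetical counts and populations for two made-up states (not real data).
toy <- data.frame(
  state      = c("Large State", "Small State"),
  complaints = c(5000, 1000),
  population = c(10000000, 1000000)
)

# Complaints per 10,000 residents: raw count divided by population, times 10,000.
toy$per_10000 <- toy$complaints / toy$population * 10000

print(toy)
```

The larger state has five times as many complaints in total, but the smaller state has twice the rate (10 versus 5 per 10,000 people). A map of raw counts would hide that.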

First, we need to alter the Complaints data to only include entries from the year 2018.

Complaints <-
  Complaints %>%
  mutate(year = lubridate::year(`Date received`)) %>% # This creates a new column, year, containing the year extracted from the 'Date received' column.
  filter(year == 2018) # This filters the data so that only rows from 2018 are kept.
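The effect of this step can be checked on a few dates around the year boundary. This is a base-R sketch of the same idea, using format() instead of lubridate::year(); the dates and the column name date_received are hypothetical.

```r
# Hypothetical dates standing in for the `Date received` column.
toy <- data.frame(
  date_received = as.Date(c("2017-12-31", "2018-01-01", "2018-12-31", "2019-01-01"))
)

# Extract the year as an integer, then keep only the 2018 rows.
toy$year <- as.integer(format(toy$date_received, "%Y"))
toy_2018 <- toy[toy$year == 2018, ]

print(toy_2018)
```

Only the two dates that fall within 2018 remain.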

Then, we need to get population data for the United States, divided by state, in 2018. To do this, we’re going to import a data set from the internet and clean up its state names so they can be matched against our complaint data.

census_pop_est_2018 <- read_csv("https://bcheggeseth.github.io/112_spring_2023/data/us_census_2018_state_pop_est.csv") %>% # This imports the linked dataset.
  separate(state, into = c("dot", "state"), extra = "merge") %>% # This splits the "state" column to remove the dot at the start of each state name.
  select(-dot) # This removes the dot column that was just split off.

## Rows: 51 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (1): est_pop_2018
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
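The separate() and select() steps above exist only to strip the dot in front of each state name. As a sketch of what that cleanup achieves, here is the same idea written with base R's sub() on hypothetical sample values. It is an alternative illustration, not the method used above.

```r
# Hypothetical raw values in the style of the imported "state" column.
raw_names <- c(".Alabama", ".New York", ".District of Columbia")

# Remove a single leading dot; spaces inside multi-word names are left alone.
clean_names <- sub("^\\.", "", raw_names)

print(clean_names)
```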

Third, we need to alter the Complaints data to ensure it follows the same state naming scheme as our population data, and clean it up so that it only shows the data we want: the full state name and the number of complaints for that state. To do so, we’re going to preserve the existing Complaints data set and instead create a new data set named complaints_by_state.

library(openintro) # Provides abbr2state(), used below.

complaints_by_state <- Complaints %>%
  filter(`State` %in% c(state.abb, "DC")) %>% # This keeps only the 50 states (state.abb is R's built-in vector of their abbreviations) and the District of Columbia.
  count(`State`) %>% # This creates a new column, n, which counts the number of complaint rows for each state.
  mutate(state_name = abbr2state(`State`)) %>% # This creates a new column, state_name, which contains the full name of each state. This ensures the state identification is consistent between population data and complaint data.
  select(state_name, n) # This removes all columns besides state_name and n.
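To spot-check the abbreviation-to-name conversion, R's built-in state.abb and state.name vectors can serve as an independent lookup. The helper abbr_to_name below is hypothetical and written only for this check. It handles "DC" by hand because the built-in vectors cover the 50 states only.

```r
# Hypothetical helper: look up full state names using R's built-in vectors.
abbr_to_name <- function(abbr) {
  # match() returns each abbreviation's position in state.abb (NA if absent).
  full <- state.name[match(abbr, state.abb)]
  # The built-in vectors have no entry for DC, so fill it in explicitly.
  full[abbr %in% "DC"] <- "District of Columbia"
  full
}

print(abbr_to_name(c("MN", "NY", "DC")))
```

Any abbreviation the helper cannot resolve comes back as NA, which makes unexpected values easy to find.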

Finally, we can combine the two data sets into a new data set, complaints_with_2018_pop_est.

complaints_with_2018_pop_est <-
  complaints_by_state %>%
  left_join(census_pop_est_2018,
    by = c("state_name" = "state")
  ) %>% # This merges the two data sets, combining each row by matching the state identifiers.
  mutate(complaints_per_10000 = (n / est_pop_2018) * 10000) # This creates our final variable, complaints_per_10000, which is equal to the number of complaints filed in each state per 10,000 people.
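A left join keeps every complaint row even when no matching population row exists, in which case the population comes back as NA. The sketch below shows one way to surface such rows; the toy tables and their numbers are hypothetical, and "Atlantis" is included on purpose as a name with no match.

```r
library(dplyr)

# Hypothetical toy tables; "Atlantis" deliberately has no population entry.
toy_complaints <- data.frame(state_name = c("Minnesota", "Atlantis"), n = c(120, 30))
toy_pop <- data.frame(state = c("Minnesota", "Iowa"), est_pop_2018 = c(5600000, 3150000))

toy_joined <- toy_complaints %>%
  left_join(toy_pop, by = c("state_name" = "state"))

# Rows whose population is NA found no partner in the join.
unmatched <- toy_joined %>% filter(is.na(est_pop_2018))

print(unmatched)
```

Running the same filter on complaints_with_2018_pop_est should return zero rows if every state name matched.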

Choropleths

To make the choropleth map itself, we first need to find and save the background map and its regions. When using highcharter, we do this by browsing the Highcharts map collection: https://code.highcharts.com/mapdata/.

Earlier, we filtered our data to only include the 50 US states and the District of Columbia. Therefore, we need a map that only includes those regions. Since download_map_data fetches a map's js file from that site, we locate the URL https://code.highcharts.com/mapdata/countries/us/us-all.js and pass the part of the path that follows mapdata/, without the .js extension (see below).

library(highcharter) # Provides download_map_data(), get_data_from_map(), and hcmap().
mapdata <- get_data_from_map(download_map_data("countries/us/us-all")) # Saves the map to mapdata.

glimpse(mapdata) # Previewing the mapdata file.

## Rows: 52
## Columns: 19
## $ `hc-group`    <chr> "admin1", "admin1", "admin1", "admin1", "admin1", "admin…
## $ `hc-middle-x` <dbl> 0.36, 0.56, 0.51, 0.47, 0.41, 0.43, 0.71, 0.46, 0.51, 0.…
## $ `hc-middle-y` <dbl> 0.47, 0.52, 0.67, 0.52, 0.38, 0.40, 0.67, 0.38, 0.50, 0.…
## $ `hc-key`      <chr> "us-ma", "us-wa", "us-ca", "us-or", "us-wi", "us-me", "u…
## $ `hc-a2`       <chr> "MA", "WA", "CA", "OR", "WI", "ME", "MI", "NV", "NM", "C…
## $ labelrank     <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "…
## $ hasc          <chr> "US.MA", "US.WA", "US.CA", "US.OR", "US.WI", "US.ME", "U…
## $ `woe-id`      <chr> "2347580", "2347606", "2347563", "2347596", "2347608", "…
## $ `state-fips`  <chr> "25", "53", "6", "41", "55", "23", "26", "32", "35", "8"…
## $ fips          <chr> "US25", "US53", "US06", "US41", "US55", "US23", "US26", …
## $ `postal-code` <chr> "MA", "WA", "CA", "OR", "WI", "ME", "MI", "NV", "NM", "C…
## $ name          <chr> "Massachusetts", "Washington", "California", "Oregon", "…
## $ country       <chr> "United States of America", "United States of America", …
## $ region        <chr> "Northeast", "West", "West", "West", "Midwest", "Northea…
## $ longitude     <chr> "-71.99930000000001", "-120.361", "-119.591", "-120.386"…
## $ `woe-name`    <chr> "Massachusetts", "Washington", "California", "Oregon", "…
## $ latitude      <chr> "42.3739", "47.4865", "36.7496", "43.8333", "44.3709", "…
## $ `woe-label`   <chr> "Massachusetts, US, United States", "Washington, US, Uni…
## $ type          <chr> "State", "State", "State", "State", "State", "State", "S…

hcmap("countries/us/us-all") # Previewing the map.
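The relationship between the map path used above and the URL on the Highcharts site can be sketched as follows. This only illustrates the naming pattern with base R and does not call highcharter; the variable names are hypothetical.

```r
# The path passed to download_map_data() and hcmap() in this guide.
map_path <- "countries/us/us-all"

# The corresponding file on the Highcharts map collection site.
map_url <- paste0("https://code.highcharts.com/mapdata/", map_path, ".js")

print(map_url)
```

To use a different map, find its js file on the site and take the same portion of its URL.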

Finally, we apply our data to this region map. When displaying data graphically, make sure to include alternative text for visually impaired users; in R Markdown, this is set in the code chunk's {r} header.

hcmap(
  "countries/us/us-all", # Determines the background map, see above.
  data = complaints_with_2018_pop_est, # Determines the data set being used.
  value = "complaints_per_10000", # Determines the value that each state will be colored to represent.
  joinBy = c("name", "state_name"), # Indicates how states should be identified and paired between the complaint data set and the map. The first value is the name for the map data, and the second is the name for the complaint data.
  dataLabels = list(enabled = TRUE, format = "{point.name}", fontSize = 0.3), # Indicates that the state names should be shown on the map, and their display settings. Set enabled to FALSE to turn these off in your own map.
  borderColor = "#FAFAFA", # Sets the border color between regions (states), using a hex code. 
  borderWidth = 0.1, # Sets the border width.
  tooltip = list( # Configures the tooltip shown when one hovers over a state.
    valueDecimals = 1, # Sets the number of decimal places shown for each state's value to 1.
    valueSuffix = " Complaints" # Appends "Complaints" at the end of each state's tooltip value.
  )
) %>%
  hc_title(text = "Complaints per 10,000 People") %>% # Sets the title of the map. 
  hc_subtitle(text = "By: Ev K.") # Sets the subtitle of the map.