Introduction

A choropleth map uses shades or colors to represent data values across geographic regions. Choropleths work well for any data tied to geographic areas, such as population density, election results, or income levels.

In this tutorial, we’ll learn how to create beautiful choropleth maps using R!

What You’ll Need

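# Install packages if you don't have them:
# install.packages(c("tidyverse", "maps", "mapproj", "viridis", "gridExtra"))

library(tidyverse)   # For data manipulation and ggplot2
library(maps)        # For map data
library(viridis)     # For color-blind friendly palettes
# mapproj is used by coord_map(); it only needs to be installed, not attached.
# gridExtra is loaded later, where plots are arranged side by side.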

Getting Started: US State Data

Let’s start with a simple example: mapping sample unemployment rates across US states.

Step 1: Get Map Data

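The maps package provides boundary coordinates for US states, and ggplot2’s map_data() converts them into a data frame. The state database covers the 48 contiguous states plus the District of Columbia, so Alaska and Hawaii do not appear on these maps.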

# Get US state map data
states_map <- map_data("state")

# Let's see what this looks like
head(states_map)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

What’s in this data?

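  • long and lat: longitude and latitude coordinates
  • group: identifies separate polygons (some states have multiple parts)
  • order: the sequence in which points are connected to draw each polygon
  • region: the state name (lowercase)
  • subregion: the name of a part of a multi-part state (NA otherwise)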
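To see why the group column matters, the sketch below counts how many separate polygons make up each state. It assumes the maps package is installed; the names polygon_counts and multi_part are illustrative only.

```r
library(ggplot2)  # provides map_data()
library(dplyr)

states_map <- map_data("state")

# Count the distinct polygons (values of group) that make up each state
polygon_counts <- states_map %>%
  group_by(region) %>%
  summarize(n_polygons = n_distinct(group), .groups = "drop")

# Keep only the states drawn from more than one polygon
multi_part <- polygon_counts %>%
  filter(n_polygons > 1)

multi_part
```

Without group = group in aes(), ggplot2 would connect the points of these separate pieces into a single shape and draw stray lines across the map.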

Step 2: Prepare Your Data

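Let’s create some sample data to map. In a real project, you’d load your own dataset. The region column must use the same lowercase state names as the map data, because it is the join key in the next step. Alaska and Hawaii are listed here for completeness, but the map data has no polygons for them.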

# Create sample unemployment data for each state
state_unemployment <- data.frame(
  region = c("alabama", "alaska", "arizona", "arkansas", "california", 
             "colorado", "connecticut", "delaware", "florida", "georgia",
             "hawaii", "idaho", "illinois", "indiana", "iowa",
             "kansas", "kentucky", "louisiana", "maine", "maryland",
             "massachusetts", "michigan", "minnesota", "mississippi", "missouri",
             "montana", "nebraska", "nevada", "new hampshire", "new jersey",
             "new mexico", "new york", "north carolina", "north dakota", "ohio",
             "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
             "south dakota", "tennessee", "texas", "utah", "vermont",
             "virginia", "washington", "west virginia", "wisconsin", "wyoming"),
  unemployment_rate = c(3.2, 4.5, 3.8, 3.5, 4.1, 2.9, 3.7, 3.5, 3.3, 3.4,
                       2.8, 2.9, 4.2, 3.1, 2.7, 2.8, 3.9, 4.3, 2.9, 3.2,
                       3.0, 4.0, 2.8, 4.8, 3.1, 3.5, 2.6, 4.1, 2.6, 3.5,
                       4.4, 3.9, 3.7, 2.5, 3.8, 3.2, 3.7, 4.1, 3.4, 3.2,
                       2.7, 3.4, 3.5, 2.5, 2.8, 2.9, 3.9, 4.2, 2.9, 3.3)
)

# Preview the data
head(state_unemployment)
##       region unemployment_rate
## 1    alabama               3.2
## 2     alaska               4.5
## 3    arizona               3.8
## 4   arkansas               3.5
## 5 california               4.1
## 6   colorado               2.9
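When you load your own data, state names rarely arrive in the lowercase form the map data uses. Here is a minimal sketch of the cleanup, assuming a hypothetical raw table with a State column (raw_data, clean_data, and the column names are illustrative, not part of any package):

```r
# Hypothetical raw input, standing in for something like read.csv("your_file.csv")
raw_data <- data.frame(
  State = c("Alabama", " New York", "TEXAS"),
  rate  = c(3.2, 3.9, 3.5)
)

# Trim stray whitespace and lowercase the names so they match map_data("state")
clean_data <- data.frame(
  region = tolower(trimws(raw_data$State)),
  unemployment_rate = raw_data$rate
)

clean_data

# Base R also ships the 50 state names, handy for building or checking keys
head(tolower(state.name))
```

Doing this cleanup before the join avoids states silently ending up with missing values on the map.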

Step 3: Merge Map and Data

We need to combine the geographic data with our values.

# Join the map coordinates with our data
map_data_merged <- states_map %>%
  left_join(state_unemployment, by = "region")

# Check it worked
head(map_data_merged)
##        long      lat group order  region subregion unemployment_rate
## 1 -87.46201 30.38968     1     1 alabama      <NA>               3.2
## 2 -87.48493 30.37249     1     2 alabama      <NA>               3.2
## 3 -87.52503 30.37249     1     3 alabama      <NA>               3.2
## 4 -87.53076 30.33239     1     4 alabama      <NA>               3.2
## 5 -87.57087 30.32665     1     5 alabama      <NA>               3.2
## 6 -87.58806 30.32665     1     6 alabama      <NA>               3.2

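Every coordinate point now carries its state’s unemployment rate. Because left_join() keeps all rows of the map data, any region without a match in our data (here, the District of Columbia) gets NA and is drawn in the scale’s missing-value color.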
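Before plotting, it can help to check which keys failed to match in either direction. The sketch below uses dplyr’s anti_join() for that; my_data is a small stand-in table, and the object names are illustrative.

```r
library(ggplot2)
library(dplyr)

states_map <- map_data("state")

# A small stand-in for your own data (values taken from the sample data above)
my_data <- data.frame(
  region = c("alabama", "alaska", "hawaii", "texas"),
  unemployment_rate = c(3.2, 4.5, 2.8, 3.5)
)

# One row per region that has polygons in the map data
map_regions <- distinct(states_map, region)

# Rows in the data with no matching polygons (they never reach the map)
data_without_map <- anti_join(my_data, map_regions, by = "region")

# Map regions with no matching data (they are drawn with the NA color)
map_without_data <- anti_join(map_regions, my_data, by = "region")

data_without_map$region
head(map_without_data$region)
```

Unmatched rows usually point to spelling or capitalization differences in the join key, or to regions the map data simply does not include.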

Step 4: Create Your First Choropleth

ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +  # linewidth replaces size for outlines in ggplot2 >= 3.4.0
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_viridis(option = "plasma", name = "Unemployment\nRate (%)") +
  labs(title = "US Unemployment Rates by State",
       subtitle = "Sample data for demonstration",
       caption = "Data: Fictional example") +
  theme_void() +
  theme(
    plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5),
    legend.position = "right"
  )

What’s happening here?

  • geom_polygon(): draws the state shapes
  • fill = unemployment_rate: colors states by unemployment rate
  • coord_map(): applies a map projection (here Albers equal-area, with lat0 and lat1 as its standard parallels); requires the mapproj package
  • scale_fill_viridis(): applies a color-blind friendly color scale
  • theme_void(): removes axes and gridlines for a clean map
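The same pieces can be seen in isolation on a toy example. The sketch below draws two hand-made squares instead of real states, so it needs only ggplot2; the squares data frame and its values are invented for illustration.

```r
library(ggplot2)

# Two hand-made unit squares; each row is one corner, listed in drawing order
squares <- data.frame(
  long  = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,   0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),   # which polygon each corner belongs to
  value = rep(c(10, 20), each = 4)  # one value per polygon, repeated per corner
)

toy_map <- ggplot(squares, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(color = "white") +
  coord_equal() +
  theme_void()

toy_map
```

A real choropleth has exactly this structure, only with many more corner points per polygon and a map projection in place of coord_equal().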

Customizing Your Map

Different Color Schemes

# Try different viridis palettes
p1 <- ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_viridis(option = "magma", name = "Rate (%)") +
  labs(title = "Magma Palette") +
  theme_void()

p2 <- ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_gradient(low = "lightblue", high = "darkblue", name = "Rate (%)") +
  labs(title = "Blue Gradient") +
  theme_void()

# Display side by side
library(gridExtra)
grid.arrange(p1, p2, ncol = 2)
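Another option is to bin the values into a few classes and use a discrete palette, which can make the legend easier to read. The sketch below is one way to do that with cut(); the break points and labels are arbitrary choices for illustration, and only three states are drawn to keep it short.

```r
library(ggplot2)
library(dplyr)
library(viridis)

# Three states from the sample data above
rates <- data.frame(
  region = c("iowa", "alabama", "mississippi"),
  unemployment_rate = c(2.7, 3.2, 4.8)
)

# Bin the continuous rate into labeled classes; the breaks are an arbitrary choice
rates$rate_class <- cut(
  rates$unemployment_rate,
  breaks = c(2, 3, 4, 5),
  labels = c("2-3%", "3-4%", "4-5%")
)

# Keep only the map rows for these states, then fill by class
binned_map <- map_data("state") %>%
  inner_join(rates, by = "region") %>%
  ggplot(aes(x = long, y = lat, group = group, fill = rate_class)) +
  geom_polygon(color = "white") +
  coord_map() +
  scale_fill_viridis(discrete = TRUE, name = "Rate") +
  theme_void()

binned_map
```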

County-Level Map

Want more detail? Let’s make a county-level map!

# Get county data
counties_map <- map_data("county")

# Create random sample values for the counties of three states
set.seed(42)  # fix the seed so the random values are reproducible
county_data <- counties_map %>%
  filter(region %in% c("california", "oregon", "washington")) %>%
  select(region, subregion) %>%
  distinct() %>%
  mutate(value = rnorm(n(), mean = 50, sd = 15))

# Merge
county_merged <- counties_map %>%
  filter(region %in% c("california", "oregon", "washington")) %>%
  left_join(county_data, by = c("region", "subregion"))

# Plot
ggplot(county_merged, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(color = "gray80", linewidth = 0.1) +
  coord_map() +
  scale_fill_viridis(option = "inferno", name = "Value") +
  labs(title = "West Coast Counties",
       subtitle = "More detailed geographic resolution") +
  theme_void() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5)
  )

Pro Tips

1. Handling Missing Data

# What if some states don't have data?
incomplete_data <- state_unemployment %>%
  slice(1:40)  # Keep only the first 40 rows (alabama through south carolina)

map_with_missing <- states_map %>%
  left_join(incomplete_data, by = "region")

ggplot(map_with_missing, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_viridis(option = "plasma", name = "Unemployment\nRate (%)", na.value = "gray90") +
  labs(title = "Handling Missing Data",
       subtitle = "Gray = No data available") +
  theme_void()

2. Adding State Labels

# Approximate a label position for each state by averaging its boundary points
# (a rough stand-in for a true centroid)
state_centers <- states_map %>%
  group_by(region) %>%
  summarize(long = mean(long), lat = mean(lat)) %>%
  left_join(state_unemployment, by = "region")

# Create map with labels
ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  geom_text(data = state_centers, 
            aes(x = long, y = lat, label = round(unemployment_rate, 1), group = NULL),
            size = 2.5, color = "white", fontface = "bold") +
  coord_map("albers", lat0 = 39, lat1 = 45) +
  scale_fill_viridis(option = "plasma", name = "Rate (%)") +
  labs(title = "Map with Data Labels") +
  theme_void()
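Averaging boundary points can pull a label toward a detailed coastline, because those stretches contribute more points. As an alternative sketch, base R’s built-in state.center dataset gives an approximate geographic center for each state; builtin_centers is an illustrative name.

```r
# state.name and state.center ship with base R (the datasets package)
builtin_centers <- data.frame(
  region = tolower(state.name),
  long   = state.center$x,  # longitude (negative, i.e. degrees west)
  lat    = state.center$y   # latitude
)

# Alaska and Hawaii are placed off the West Coast in this dataset, and the
# map data has no polygons for them, so drop them before labeling
builtin_centers <- subset(builtin_centers, !(region %in% c("alaska", "hawaii")))

head(builtin_centers)
```

This table can be joined to state_unemployment by region and passed to geom_text(data = ...) in the same way as state_centers.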

Quick Reference

Here’s the basic structure to remember when creating choropleth maps:

# TEMPLATE - Replace the ALL-CAPS placeholders with your actual values:

ggplot(YOUR_MERGED_DATA, 
       aes(x = long, y = lat, group = group, fill = YOUR_VARIABLE)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_map() +
  scale_fill_viridis() +  # or scale_fill_gradient(low = "color1", high = "color2")
  theme_void()

# Example using the sample data from this tutorial (uncomment to run):
# ggplot(map_data_merged, 
#        aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
#   geom_polygon(color = "white", linewidth = 0.2) +
#   coord_map() +
#   scale_fill_viridis() +
#   theme_void()
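If you make several maps, the template can be wrapped in a small function. The sketch below is one possible shape for such a helper; plot_choropleth and its arguments are hypothetical names, not part of ggplot2 or viridis.

```r
library(ggplot2)
library(viridis)

# Hypothetical helper: wraps the template so only the data and the name of the
# fill column change between maps. coord_map() needs the mapproj package.
plot_choropleth <- function(merged_data, fill_var, legend_title = fill_var) {
  ggplot(merged_data,
         aes(x = long, y = lat, group = group, fill = .data[[fill_var]])) +
    geom_polygon(color = "white") +
    coord_map() +
    scale_fill_viridis(name = legend_title) +
    theme_void()
}

# Usage with a tiny merged data set: one state and its value from the sample data
demo_data <- subset(map_data("state"), region == "utah")
demo_data$value <- 2.5
demo_plot <- plot_choropleth(demo_data, "value", legend_title = "Rate (%)")
demo_plot
```

Passing the column name as a string and looking it up with .data[[fill_var]] keeps the function usable with any merged data frame that has long, lat, and group columns.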

Next Steps

Now you can:

  • Load your own data and join it to map data
  • Experiment with different color palettes
  • Add more layers like points, labels, or highlights
  • Try other geographic regions (world maps, specific states)
  • Export your maps with ggsave()
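As a starting point for exporting, here is a minimal ggsave() sketch. It uses a simple stand-in plot and writes to a temporary file; the file name, size, and resolution are example values to adjust for your own map.

```r
library(ggplot2)

# A simple stand-in plot; in practice this would be one of the maps above
example_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write to a temporary file; replace with a path such as "my_map.png"
out_file <- file.path(tempdir(), "choropleth_example.png")

# width and height are in inches by default; dpi sets the resolution
ggsave(out_file, plot = example_plot, width = 8, height = 5, dpi = 300)

file.exists(out_file)
```

ggsave() picks the output format from the file extension, so changing the name to end in .pdf would produce a PDF instead.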