A choropleth map uses different shades or colors to represent data values across geographic regions. Choropleth maps work well for visualizing population density, election results, income levels, or any other data tied to geographic areas.
In this tutorial, we’ll learn how to create beautiful choropleth maps using R!
Let’s start with a simple example using US state data. In this code walkthrough, we’ll map unemployment rates across states.
The maps package provides geographic coordinates for US states, and ggplot2’s map_data() function converts them into a data frame we can plot.
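# Load packages (map_data() and coord_map() also need maps and mapproj installed)
library(ggplot2)
library(dplyr)
library(viridis)
# Get US state map data
states_map <- map_data("state")
# Let's see what this looks like
head(states_map)
## long lat group order region subregion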
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
What’s in this data?
long and lat: longitude and latitude coordinates
group: identifies separate polygons (some states have multiple parts)
region: the state name (lowercase)
Let’s create some sample data to map. In a real project, you’d load your own dataset.
# Create sample unemployment data for each state
state_unemployment <- data.frame(
region = c("alabama", "alaska", "arizona", "arkansas", "california",
"colorado", "connecticut", "delaware", "florida", "georgia",
"hawaii", "idaho", "illinois", "indiana", "iowa",
"kansas", "kentucky", "louisiana", "maine", "maryland",
"massachusetts", "michigan", "minnesota", "mississippi", "missouri",
"montana", "nebraska", "nevada", "new hampshire", "new jersey",
"new mexico", "new york", "north carolina", "north dakota", "ohio",
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina",
"south dakota", "tennessee", "texas", "utah", "vermont",
"virginia", "washington", "west virginia", "wisconsin", "wyoming"),
unemployment_rate = c(3.2, 4.5, 3.8, 3.5, 4.1, 2.9, 3.7, 3.5, 3.3, 3.4,
2.8, 2.9, 4.2, 3.1, 2.7, 2.8, 3.9, 4.3, 2.9, 3.2,
3.0, 4.0, 2.8, 4.8, 3.1, 3.5, 2.6, 4.1, 2.6, 3.5,
4.4, 3.9, 3.7, 2.5, 3.8, 3.2, 3.7, 4.1, 3.4, 3.2,
2.7, 3.4, 3.5, 2.5, 2.8, 2.9, 3.9, 4.2, 2.9, 3.3)
)
# Preview the data
head(state_unemployment)
## region unemployment_rate
## 1 alabama 3.2
## 2 alaska 4.5
## 3 arizona 3.8
## 4 arkansas 3.5
## 5 california 4.1
## 6 colorado 2.9
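In a real project, the state names in your file will often not match the lowercase names used by the map data. As a minimal sketch, assuming a hypothetical CSV with columns named State and Rate (inlined here as text so the example runs on its own), the names can be normalized before joining:

```r
# Hypothetical input: stands in for read.csv("your_file.csv")
raw_csv <- "State,Rate
Alabama,3.2
 New York ,3.9
WEST VIRGINIA,4.2"

my_data <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Match the map data's convention: trimmed, lowercase names in a column called region
my_data$region <- tolower(trimws(my_data$State))
my_data$unemployment_rate <- my_data$Rate
my_data <- my_data[, c("region", "unemployment_rate")]

print(my_data)
```

The column names State and Rate are assumptions for illustration; adjust them to whatever your file actually contains.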
We need to combine the geographic data with our values.
# Join the map coordinates with our data
map_data_merged <- states_map %>%
left_join(state_unemployment, by = "region")
# Check it worked
head(map_data_merged)
## long lat group order region subregion unemployment_rate
## 1 -87.46201 30.38968 1 1 alabama <NA> 3.2
## 2 -87.48493 30.37249 1 2 alabama <NA> 3.2
## 3 -87.52503 30.37249 1 3 alabama <NA> 3.2
## 4 -87.53076 30.33239 1 4 alabama <NA> 3.2
## 5 -87.57087 30.32665 1 5 alabama <NA> 3.2
## 6 -87.58806 30.32665 1 6 alabama <NA> 3.2
Notice that now each coordinate point has an associated unemployment rate!
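A left join keeps every map row and fills in NA wherever no value matches, so mismatched names fail silently. The sketch below uses small made-up data frames in place of states_map and state_unemployment to list the names that appear on only one side. The "state" map in the maps package should cover only the contiguous states plus the District of Columbia, so with the sample data you would likely see alaska and hawaii go unused and district of columbia left without a value. It is worth running the check yourself to confirm.

```r
# Toy stand-ins for states_map and state_unemployment (made-up rows)
toy_map <- data.frame(
  region = c("alabama", "alabama", "district of columbia"),
  long = c(-87.5, -87.6, -77.0),
  lat = c(30.4, 30.3, 38.9)
)
toy_values <- data.frame(
  region = c("alabama", "alaska"),
  unemployment_rate = c(3.2, 4.5)
)

# Regions drawn on the map that will get NA after the join
no_value <- setdiff(unique(toy_map$region), toy_values$region)
# Rows in the data that never reach the map
not_drawn <- setdiff(toy_values$region, unique(toy_map$region))

print(no_value)
print(not_drawn)
```

With the real objects, replace toy_map with states_map and toy_values with state_unemployment.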
ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_viridis(option = "plasma", name = "Unemployment\nRate (%)") +
labs(title = "US Unemployment Rates by State",
subtitle = "Sample data for demonstration",
caption = "Data: Fictional example") +
theme_void() +
theme(
plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5),
legend.position = "right"
)
What’s happening here?
geom_polygon(): draws the state shapes
fill = unemployment_rate: colors states by unemployment rate
coord_map(): uses a proper map projection (Albers equal-area)
scale_fill_viridis(): applies a color-blind friendly color scale
theme_void(): removes axes and gridlines for a clean map
# Try different color palettes
p1 <- ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_viridis(option = "magma", name = "Rate (%)") +
labs(title = "Magma Palette") +
theme_void()
p2 <- ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_gradient(low = "lightblue", high = "darkblue", name = "Rate (%)") +
labs(title = "Blue Gradient") +
theme_void()
# Display side by side
library(gridExtra)
grid.arrange(p1, p2, ncol = 2)
Want more detail? Let’s make a county-level map!
# Get county data
counties_map <- map_data("county")
# Create sample data for a few states
county_data <- counties_map %>%
filter(region %in% c("california", "oregon", "washington")) %>%
select(region, subregion) %>%
distinct() %>%
mutate(value = rnorm(n(), mean = 50, sd = 15))
# Merge
county_merged <- counties_map %>%
filter(region %in% c("california", "oregon", "washington")) %>%
left_join(county_data, by = c("region", "subregion"))
# Plot
ggplot(county_merged, aes(x = long, y = lat, group = group, fill = value)) +
geom_polygon(color = "gray80", linewidth = 0.1) +
coord_map() +
scale_fill_viridis(option = "inferno", name = "Value") +
labs(title = "West Coast Counties",
subtitle = "More detailed geographic resolution") +
theme_void() +
theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5)
)
# What if some states don't have data?
incomplete_data <- state_unemployment %>%
slice(1:40) # Only first 40 states
map_with_missing <- states_map %>%
left_join(incomplete_data, by = "region")
ggplot(map_with_missing, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_viridis(option = "plasma", name = "Unemployment\nRate (%)", na.value = "gray90") +
labs(title = "Handling Missing Data",
subtitle = "Gray = No data available") +
theme_void()
# Approximate state centers for labels (mean of each state's boundary points)
state_centers <- states_map %>%
group_by(region) %>%
summarize(long = mean(long), lat = mean(lat)) %>%
left_join(state_unemployment, by = "region")
# Create map with labels
ggplot(map_data_merged, aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
geom_polygon(color = "white", linewidth = 0.2) +
geom_text(data = state_centers,
aes(x = long, y = lat, label = round(unemployment_rate, 1), group = NULL),
size = 2.5, color = "white", fontface = "bold") +
coord_map("albers", lat0 = 39, lat1 = 45) +
scale_fill_viridis(option = "plasma", name = "Rate (%)") +
labs(title = "Map with Data Labels") +
theme_void()
Here’s the basic structure to remember when creating choropleth maps:
# TEMPLATE - Replace the ALL-CAPS placeholders with your actual values:
ggplot(YOUR_MERGED_DATA,
aes(x = long, y = lat, group = group, fill = YOUR_VARIABLE)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_map() +
scale_fill_viridis() + # or scale_fill_gradient(low = "color1", high = "color2")
theme_void()
# Example using the sample data from this tutorial (uncomment to run):
# ggplot(map_data_merged,
# aes(x = long, y = lat, group = group, fill = unemployment_rate)) +
# geom_polygon(color = "white", linewidth = 0.2) +
# coord_map() +
# scale_fill_viridis() +
# theme_void()
Now you can:
ggsave()
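As a minimal sketch of saving a map with ggsave(), the example below writes a small placeholder plot to a temporary PNG. The file path, dimensions, and resolution are illustrative assumptions; in practice you would pass one of the map objects built earlier in the tutorial.

```r
library(ggplot2)

# Placeholder plot: a single square polygon standing in for a finished map
square <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1), value = 1)
p <- ggplot(square, aes(x = long, y = lat, fill = value)) +
  geom_polygon() +
  theme_void()

# Illustrative path and size; width and height are in inches by default
out_file <- file.path(tempdir(), "choropleth_map.png")
ggsave(out_file, plot = p, width = 8, height = 5, dpi = 300)

file.exists(out_file)
```

Passing the plot explicitly avoids relying on whichever plot was displayed last, which is what ggsave() falls back to when the plot argument is omitted.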