A choropleth map shades each area, such as a state or country, according to a data value. It’s a good fit for things like population or election results. In this tutorial, we’ll make a map of the United States colored by randomly generated population figures for each state. Don’t worry, the steps are simple, and I’ll explain everything as we go!
To get started, we’ll need a few packages in R. Packages are like tools that add extra features to R. Here’s how to install and load them:
# Install packages if you don’t have them already
if (!require(tidyverse)) install.packages("tidyverse") # For data manipulation and visualization
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require(maps)) install.packages("maps") # For map data
## Loading required package: maps
##
## Attaching package: 'maps'
##
## The following object is masked from 'package:purrr':
##
## map
if (!require(viridis)) install.packages("viridis") # For pretty colors
## Loading required package: viridis
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
##
## The following object is masked from 'package:maps':
##
## unemp
# Load the libraries
library(tidyverse)
library(maps)
library(viridis)
Once these are loaded, we’re ready to make a map!
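The three install checks above all follow the same pattern, so they can be folded into one small helper. This is only a sketch of one way to do it: `find_missing()` is a name made up for this example, and it relies on base R's `requireNamespace()`, which reports whether a package can be loaded without attaching it.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded
find_missing <- function(pkgs) {
  # requireNamespace() returns TRUE or FALSE and does not attach the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Which of the tutorial's packages still need installing?
missing_pkgs <- find_missing(c("tidyverse", "maps", "viridis"))
missing_pkgs
```

If `missing_pkgs` is not empty, `install.packages(missing_pkgs)` installs everything in one call.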
The maps package includes outline data for U.S. states, and ggplot2’s map_data() function turns it into a data frame. We’ll start by getting that data and looking at what it contains.
# Get U.S. state map data
state_data <- map_data("state")
# Take a peek at the data
head(state_data)
## long lat group order region subregion
## 1 -87.46201 30.38968 1 1 alabama <NA>
## 2 -87.48493 30.37249 1 2 alabama <NA>
## 3 -87.52503 30.37249 1 3 alabama <NA>
## 4 -87.53076 30.33239 1 4 alabama <NA>
## 5 -87.57087 30.32665 1 5 alabama <NA>
## 6 -87.58806 30.32665 1 6 alabama <NA>
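This data describes the border of each state, so R can draw it on a map. Each row is one point (longitude and latitude) on a state’s outline, group identifies which polygon the point belongs to, and order gives the sequence in which the points are connected. The "state" map covers the 48 contiguous states plus the District of Columbia, so Alaska and Hawaii are not in it.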
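To get a feel for how the outline data is organized, it can help to count regions and polygons. The snippet below is a small exploratory sketch; the variable names (`n_regions`, `n_groups`, `multi_polygon`) are chosen just for this example.

```r
library(ggplot2) # provides map_data(); the maps package must be installed

state_data <- map_data("state")

# How many distinct regions and polygons does the data contain?
n_regions <- length(unique(state_data$region))
n_groups  <- length(unique(state_data$group))

# Count the polygons that make up each region
polygons_per_region <- tapply(
  state_data$group, state_data$region,
  function(g) length(unique(g))
)

# Regions drawn from more than one polygon (for example, states with islands)
multi_polygon <- names(polygons_per_region[polygons_per_region > 1])

n_regions
n_groups
multi_polygon
```

Because some states are made of several polygons, the plot needs the group column to know which points belong together.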
To make our map interesting, let’s pretend we have population data for each state. We’ll create some random numbers to represent populations.
# Create random population data for each state
set.seed(123) # Ensures we get the same random numbers every time
state_pop <- data.frame(
  region = tolower(state.name), # Lowercase state names to match the map data's region column
  population = runif(50, min = 1e6, max = 4e7) # Random numbers between 1 million and 40 million
)
# Look at the first few rows of the data
head(state_pop)
## region population
## 1 alabama 12215523
## 2 alaska 31743900
## 3 arizona 16950100
## 4 arkansas 35437679
## 5 california 37678224
## 6 colorado 2776703
Now we’ll combine this population data with the map data, so we can color the map based on population.
# Combine the map data with the population data
choropleth_data <- state_data %>%
  left_join(state_pop, by = "region")
# Check the combined data
head(choropleth_data)
## long lat group order region subregion population
## 1 -87.46201 30.38968 1 1 alabama <NA> 12215523
## 2 -87.48493 30.37249 1 2 alabama <NA> 12215523
## 3 -87.52503 30.37249 1 3 alabama <NA> 12215523
## 4 -87.53076 30.33239 1 4 alabama <NA> 12215523
## 5 -87.57087 30.32665 1 5 alabama <NA> 12215523
## 6 -87.58806 30.32665 1 6 alabama <NA> 12215523
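A left join keeps every map row even when there is no matching population, so it is worth checking which names failed to line up. The sketch below uses dplyr's anti_join() for that; `missing_pop` and `unused_pop` are names picked for this example.

```r
library(ggplot2)
library(dplyr)

# Rebuild the two inputs so this snippet runs on its own
state_data <- map_data("state")
set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)

# Map regions that have no row in the population data
missing_pop <- state_data %>%
  distinct(region) %>%
  anti_join(state_pop, by = "region")

# Population rows that have no outline in the map data
unused_pop <- state_pop %>%
  anti_join(state_data, by = "region")

missing_pop$region
unused_pop$region
```

Here the District of Columbia is on the map but not in state.name, so its population is NA after the join, while Alaska and Hawaii have population values but no outline in the "state" map.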
Here’s the fun part! We’ll use the ggplot2 package to create the map.
# Create the choropleth map
ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") + # Draw state borders in white
  scale_fill_viridis_c(option = "C") + # Use a color scale to show population
  theme_void() + # Remove background and axis labels
  labs(
    title = "U.S. State Population Map",
    fill = "Population"
  )
Here’s what’s happening in this code:
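- aes() tells R which columns to use for the map. long and lat are the coordinates of the points along each state border, group keeps each polygon separate, and fill colors each state by its population.
- geom_polygon() draws each state as a filled shape; color = "white" sets the border color.
- scale_fill_viridis_c() applies a continuous viridis color scale to the population values; option = "C" selects the "plasma" palette.
- theme_void() removes the background, grid lines, and axes for a clean map.
- labs() adds the plot title and the legend title.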
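One thing this plot does not set is a map projection: longitude and latitude are treated as plain x and y values, so the map can look stretched depending on the size of the plot window. A minimal sketch of one possible fix is to add ggplot2's coord_quickmap(), which picks an approximate aspect ratio suited to the latitudes being drawn. The object name `p` is just for this example.

```r
library(ggplot2)
library(dplyr)

# Rebuild the joined data so this snippet runs on its own
state_data <- map_data("state")
set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)
choropleth_data <- left_join(state_data, state_pop, by = "region")

p <- ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  coord_quickmap() + # Approximate projection that keeps the map's proportions sensible
  scale_fill_viridis_c(option = "C") +
  theme_void()

p
```

coord_quickmap() is an approximation that works best for areas that are not too large or too close to the poles, which is usually good enough for a quick look at the contiguous United States.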
Let’s make the map easier to read by changing the color legend and centering the title.
ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    name = "Population (millions)", # Change the legend title
    labels = scales::label_number(scale = 1e-6, suffix = "M") # Show numbers in millions
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"), # Center and style title
    legend.position = "right" # Place the legend on the right
  ) +
  labs(title = "Choropleth Map: U.S. State Populations")
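Another optional tweak concerns regions with no data. state.name has no entry for the District of Columbia, so its population is NA after the join, and the fill scale draws it in a default gray. As a sketch, the na.value argument of scale_fill_viridis_c() lets you choose that color yourself; "grey85" and the object name `p_na` are arbitrary choices for this example.

```r
library(ggplot2)
library(dplyr)

# Rebuild the joined data so this snippet runs on its own
state_data <- map_data("state")
set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)
choropleth_data <- left_join(state_data, state_pop, by = "region")

p_na <- ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    na.value = "grey85", # Color for regions whose population is NA
    labels = scales::label_number(scale = 1e-6, suffix = "M")
  ) +
  theme_void()

p_na
```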
That’s it! You’ve made a choropleth map in R. Here’s what we did:
- Loaded the map data for U.S. states.
- Created random population data.
- Combined the map and population data.
- Used ggplot2 to create a colorful map.
- Customized the map to make it more readable.
This is just the start! You can use real data, try other regions, or explore different color schemes to create maps for your projects.
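As a first step toward real data, here is a rough sketch that swaps the random numbers for state.x77, a dataset that ships with R. Its Population column holds 1970s estimates in thousands, so the figures are dated and serve only as an illustration. The names `real_pop`, `real_data`, and `p_real` are made up for this example.

```r
library(ggplot2)
library(dplyr)

# state.x77 is a matrix with one row per state; Population is in thousands
real_pop <- data.frame(
  region = tolower(rownames(state.x77)),
  population = state.x77[, "Population"] * 1000,
  row.names = NULL
)

# Join the real figures to the state outlines, as with the random data
real_data <- left_join(map_data("state"), real_pop, by = "region")

p_real <- ggplot(real_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    name = "Population",
    labels = scales::label_number(scale = 1e-6, suffix = "M")
  ) +
  theme_void() +
  labs(title = "U.S. State Populations (state.x77 estimates)")

p_real
```

For current figures you would replace `real_pop` with a data frame built from an up-to-date source, keeping a lowercase region column so the join still matches.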