Introduction

A choropleth map shades areas, like states or countries, with different colors according to a data value. It’s great for showing things like population or election results. In this tutorial, we’ll make a map of the United States colored by made-up (randomly generated) population numbers for each state. Don’t worry, the steps are simple, and I’ll explain everything as we go!

Step 1: Set Up Your Tools

To get started, we’ll need a few packages in R. Packages are like tools that add extra features to R. Here’s how to install and load them:

# Install packages if you don’t have them already
if (!require(tidyverse)) install.packages("tidyverse") # For data manipulation and visualization
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
if (!require(maps)) install.packages("maps") # For map data
## Loading required package: maps
## 
## Attaching package: 'maps'
## 
## The following object is masked from 'package:purrr':
## 
##     map
if (!require(viridis)) install.packages("viridis") # For pretty colors
## Loading required package: viridis
## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## 
## The following object is masked from 'package:maps':
## 
##     unemp
# Load the libraries
library(tidyverse)
library(maps)
library(viridis)

Once these are loaded, we’re ready to make a map!

Step 2: Get the Map Data

The maps package includes outline data for places like U.S. states, and ggplot2’s map_data() function turns it into a data frame we can plot. We’ll start by getting that data and looking at what it contains.

# Get U.S. state map data
state_data <- map_data("state")

# Take a peek at the data
head(state_data)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

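This data describes the borders of each state, so R can draw them on a map. Each row is one point on an outline: long and lat give its position, group says which polygon it belongs to, order gives the sequence for connecting the points, and region holds the lowercase state name.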
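
To see how those rows are organized, a quick check like the one below can help. It is only a sketch: it rebuilds state_data so it runs on its own, and the names n_regions and points_per_region are illustrative, not part of the tutorial's code.

```r
library(ggplot2) # Provides map_data(); the maps package must be installed
library(dplyr)

state_data <- map_data("state")

# How many distinct regions (states) does the map data contain?
n_regions <- n_distinct(state_data$region)

# For each region, count its outline points (rows) and its polygons (groups)
points_per_region <- state_data %>%
  group_by(region) %>%
  summarise(points = n(), polygons = n_distinct(group)) %>%
  arrange(desc(polygons))

n_regions
head(points_per_region)
```

With this dataset, n_regions should come out as 49: the 48 contiguous states plus the District of Columbia. Alaska and Hawaii are not included. Some states are made of several polygons (islands, for example), which is why the group column matters when drawing.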

Step 3: Add Population Data

To make our map interesting, let’s pretend we have population data for each state. We’ll create some random numbers to represent populations.

# Create random population data for each state
set.seed(123) # Ensures we get the same random numbers every time
state_pop <- data.frame(
  region = tolower(state.name), # Match state names with the map data
  population = runif(50, min = 1e6, max = 4e7) # Random numbers between 1 million and 40 million
)

# Look at the first few rows of the data
head(state_pop)
##       region population
## 1    alabama   12215523
## 2     alaska   31743900
## 3    arizona   16950100
## 4   arkansas   35437679
## 5 california   37678224
## 6   colorado    2776703

Now we’ll combine this population data with the map data, so we can color the map based on population.

# Combine the map data with the population data
choropleth_data <- state_data %>%
  left_join(state_pop, by = "region")

# Check the combined data
head(choropleth_data)
##        long      lat group order  region subregion population
## 1 -87.46201 30.38968     1     1 alabama      <NA>   12215523
## 2 -87.48493 30.37249     1     2 alabama      <NA>   12215523
## 3 -87.52503 30.37249     1     3 alabama      <NA>   12215523
## 4 -87.53076 30.33239     1     4 alabama      <NA>   12215523
## 5 -87.57087 30.32665     1     5 alabama      <NA>   12215523
## 6 -87.58806 30.32665     1     6 alabama      <NA>   12215523
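
After a join, it’s worth checking whether every region found a match. The sketch below is self-contained (it rebuilds both data frames), and map_only and pop_only are illustrative names. It compares the region names on each side of the join.

```r
library(ggplot2) # Provides map_data(); the maps package must be installed

state_data <- map_data("state")

set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)

# Regions drawn on the map that have no population row (their fill will be NA)
map_only <- setdiff(unique(state_data$region), state_pop$region)

# Regions with a population value but no outline in the map data
pop_only <- setdiff(state_pop$region, unique(state_data$region))

map_only
pop_only
```

Here state.name lists all 50 states, while the map data covers the 48 contiguous states plus the District of Columbia. So the District of Columbia ends up with an NA population after the join, and Alaska and Hawaii have population values but nothing to draw.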

Step 4: Make the Map

Here’s the fun part! We’ll use the ggplot2 package to create the map.

# Create the choropleth map
ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") + # Draw state borders in white
  scale_fill_viridis_c(option = "C") + # Use a color scale to show population
  theme_void() + # Remove background and axis labels
  labs(
    title = "U.S. State Population Map",
    fill = "Population"
  )

Here’s what’s happening in this code:

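- aes() tells R which columns to use. long and lat are the coordinates of points along each state’s border, group keeps the points of each polygon together so separate shapes aren’t connected, and fill sets the color from the population column.

- geom_polygon() draws each state as a filled shape; color = "white" sets the border color.

- scale_fill_viridis_c() maps population to a continuous viridis color scale; option = "C" picks the "plasma" palette.

- theme_void() removes the background, grid lines, and axes for a clean map.

- labs() adds the plot title and the legend title.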
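
The Step 4 plot uses ggplot2’s default coordinates, so the map’s proportions stretch with the plot window. One possible tweak, not part of the original steps, is to add coord_quickmap(), which picks an aspect ratio that approximates a map projection. The sketch rebuilds the data so it can run independently; map_plot is an illustrative name.

```r
library(ggplot2) # Provides map_data(); the maps package must be installed
library(dplyr)

state_data <- map_data("state")

set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)

choropleth_data <- left_join(state_data, state_pop, by = "region")

map_plot <- ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(option = "C") +
  coord_quickmap() + # Keep east-west and north-south distances roughly in proportion
  theme_void() +
  labs(title = "U.S. State Population Map", fill = "Population")

map_plot
```

coord_quickmap() is an approximation that works best for smaller areas, but it is usually enough to stop the country from looking squashed or stretched.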

Step 5: Customize the Map

Let’s make the map easier to read by changing the color legend and centering the title.

ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    name = "Population (millions)", # Change the legend title
    labels = scales::label_number(scale = 1e-6) # Show numbers in millions (the unit is in the legend title)
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"), # Center and style title
    legend.position = "right" # Place the legend on the right
  ) +
  labs(title = "Choropleth Map: U.S. State Populations")
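
One more customization you might consider: the District of Columbia has no row in the random population data, so its fill value is NA and ggplot2 draws it in a default grey. If you would rather choose that color yourself, scale_fill_viridis_c() accepts an na.value argument. A minimal sketch, rebuilding the data so it runs by itself (custom_plot is an illustrative name, and "grey85" is just an example color):

```r
library(ggplot2) # Provides map_data(); the maps package must be installed
library(dplyr)

state_data <- map_data("state")

set.seed(123)
state_pop <- data.frame(
  region = tolower(state.name),
  population = runif(50, min = 1e6, max = 4e7)
)

choropleth_data <- left_join(state_data, state_pop, by = "region")

custom_plot <- ggplot(choropleth_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    name = "Population (millions)",
    labels = scales::label_number(scale = 1e-6),
    na.value = "grey85" # Color for regions with no population value
  ) +
  theme_void() +
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold")) +
  labs(title = "Choropleth Map: U.S. State Populations")

custom_plot
```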

Conclusion

That’s it! You’ve made a choropleth map in R. Here’s what we did:

  1. Loaded the map data for U.S. states.

  2. Created random population data.

  3. Combined the map and population data.

  4. Used ggplot2 to create a colorful map.

  5. Customized the map to make it more readable.

This is just the start! You can use real data, try other regions, or explore different color schemes to create maps for your projects.
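
As one example of swapping in real data, R ships with the state.x77 dataset, which includes a Population column (in thousands, from 1975 estimates). The sketch below assumes you are happy with those historical figures; real_pop, real_map_data, and real_plot are illustrative names.

```r
library(ggplot2) # Provides map_data(); the maps package must be installed
library(dplyr)

# state.x77 is a matrix in R's built-in datasets; its row names are state names
real_pop <- data.frame(
  region = tolower(rownames(state.x77)),
  population = state.x77[, "Population"] * 1000 # Stored in thousands
)

real_map_data <- map_data("state") %>%
  left_join(real_pop, by = "region")

real_plot <- ggplot(real_map_data, aes(x = long, y = lat, group = group, fill = population)) +
  geom_polygon(color = "white") +
  scale_fill_viridis_c(
    option = "C",
    name = "Population (millions)",
    labels = scales::label_number(scale = 1e-6)
  ) +
  theme_void() +
  labs(title = "U.S. State Populations, 1975")

real_plot
```

For current figures you would load your own data frame instead, as long as it has a lowercase region column that matches the map data.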