Introduction

The field of data visualization in R has come a long way, especially in the realm of mapping. Modern R packages have significantly simplified the process of creating maps, moving beyond traditional GIS tools. This lesson will explore various packages and techniques for creating static and interactive maps in R, focusing on both US and world maps. We’ll cover a range of tools from basic plotting to advanced interactive visualizations, providing you with a comprehensive toolkit for geographical data representation.

Setting Up

library(tidyverse)  # For data manipulation and visualization
library(ggplot2)    # For creating static maps
library(usmap)      # Specialized for US maps
library(sf)         # For working with spatial data
library(rnaturalearth)  # For world map data
library(rnaturalearthdata)  # Additional data for rnaturalearth
library(USAboundaries)  # For historical US boundaries
library(statebins)  # For creating state-based tile grid maps
library(leaflet)    # For interactive web maps
library(plotly)     # For interactive plots and maps
library(highcharter) # For highcharts visualizations)

Using the usmap Package

The usmap package is a powerful tool for creating US maps at both state and county levels. It simplifies the process of plotting US maps by handling the map data and providing convenience functions.

Basic State Map

Let’s start with a basic state map:

plot_usmap()

This command creates a blank state map of the United States. Notice how it uses a pleasing projection where the top of the US isn’t a straight line, making it more visually appealing and geographically accurate.

County Map

We can easily switch to a county-level map:

plot_usmap(regions = "counties")

This creates a detailed map showing all US counties. It’s particularly useful when you need to visualize data at a more granular level than states.

Understanding the Data Structure

The usmap package uses FIPS (Federal Information Processing Standards) codes to identify states and counties. Let’s look at how this data is structured:

states_df <- usmap::us_map()
counties_df <- usmap::us_map(regions = "counties")

# View a sample of the state data
states_df %>% slice_sample(n=20)
## Simple feature collection with 20 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -2590847 ymin: -2608148 xmax: 2523581 ymax: 731407.9
## Projected CRS: NAD27 / US National Atlas Equal Area
## # A tibble: 20 × 4
##    fips  abbr  full                                                         geom
##    <chr> <chr> <chr>                                          <MULTIPOLYGON [m]>
##  1 04    AZ    Arizona        (((-1388676 -1254584, -1389181 -1251856, -1384522…
##  2 22    LA    Louisiana      (((561191.4 -1344308, 560485.1 -1332803, 559919 -…
##  3 53    WA    Washington     (((-1681560 657002.6, -1669385 655249, -1665145 6…
##  4 36    NY    New York       (((2289409 -18529.61, 2295994 -14228.32, 2298140 …
##  5 26    MI    Michigan       (((1185004 212637.9, 1193870 206862.7, 1201942 20…
##  6 55    WI    Wisconsin      (((1017118 129040.8, 1024178 133478.9, 1027153 12…
##  7 42    PA    Pennsylvania   (((1623291 -261838.4, 1617848 -238044.8, 1617646 …
##  8 46    SD    South Dakota   (((-319754.7 7697.221, -318287.6 7745.276, -31765…
##  9 28    MS    Mississippi    (((801980.1 -1481934, 807488.2 -1482080, 812481 -…
## 10 01    AL    Alabama        (((1093777 -1378535, 1093269 -1374223, 1092965 -1…
## 11 40    OK    Oklahoma       (((-268949.9 -918945.8, -268044 -893009.6, -26773…
## 12 32    NV    Nevada         (((-1706439 -424290.8, -1703047 -410342.2, -17005…
## 13 06    CA    California     (((-1719946 -1090033, -1709611 -1090026, -1700882…
## 14 23    ME    Maine          (((2421620 348270.4, 2424412 351787.5, 2428573 34…
## 15 10    DE    Delaware       (((2042506 -284367.3, 2043078 -280000.3, 2044959 …
## 16 49    UT    Utah           (((-1236030 -716929.3, -1233521 -701434.4, -12309…
## 17 37    NC    North Carolina (((1425709 -953634.5, 1425623 -951299.1, 1430148 …
## 18 05    AR    Arkansas       (((483065.2 -927788.2, 506062 -926263.3, 531512.5…
## 19 02    AK    Alaska         (((-2396847 -2547721, -2393297 -2546391, -2391552…
## 20 47    TN    Tennessee      (((885956.1 -1055210, 889039.6 -1053574, 895108.2…
# View a sample of the county data
counties_df %>% slice_sample(n=20)
## Simple feature collection with 20 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -1612996 ymin: -1695531 xmax: 2348353 ymax: 372931.3
## Projected CRS: NAD27 / US National Atlas Equal Area
## # A tibble: 20 × 5
##    fips  abbr  full           county                                        geom
##    <chr> <chr> <chr>          <chr>                           <MULTIPOLYGON [m]>
##  1 20203 KS    Kansas         Wichita County  (((-136467.8 -698185.1, -129263.3…
##  2 40045 OK    Oklahoma       Ellis County    (((-246.8828 -932801.4, 35414.8 -…
##  3 47029 TN    Tennessee      Cocke County    (((1499142 -861232.6, 1500442 -83…
##  4 48343 TX    Texas          Morris County   (((485190 -1275077, 487986.4 -127…
##  5 19075 IA    Iowa           Grundy County   (((571930.1 -247145.1, 610580.2 -…
##  6 25021 MA    Massachusetts  Norfolk County  (((2343612 120557.6, 2347608 1207…
##  7 51047 VA    Virginia       Culpeper County (((1879152 -471300.5, 1894451 -44…
##  8 05101 AR    Arkansas       Newton County   (((585878.4 -989139.5, 591777 -97…
##  9 48421 TX    Texas          Sherman County  (((-194175.9 -940600, -182493 -94…
## 10 45029 SC    South Carolina Colleton County (((1758922 -1131380, 1770484 -112…
## 11 16049 ID    Idaho          Idaho County    (((-1293058 229643.3, -1288935 23…
## 12 40111 OK    Oklahoma       Okmulgee County (((345829.4 -1030612, 344852.5 -1…
## 13 29229 MO    Missouri       Wright County   (((647159.7 -806261.7, 685549.5 -…
## 14 20095 KS    Kansas         Kingman County  (((136114.3 -834366.6, 135606.6 -…
## 15 13193 GA    Georgia        Macon County    (((1480243 -1262957, 1484985 -125…
## 16 48015 TX    Texas          Austin County   (((328414 -1649075, 360177.4 -164…
## 17 48459 TX    Texas          Upshur County   (((456986 -1354417, 455709.5 -132…
## 18 37097 NC    North Carolina Iredell County  (((1695701 -832462.6, 1699850 -82…
## 19 41025 OR    Oregon         Harney County   (((-1607531 -34203.41, -1600676 -…
## 20 16057 ID    Idaho          Latah County    (((-1280978 372931.3, -1268124 35…

These dataframes contain the polygon data needed to draw the maps. The FIPS codes are crucial for merging your data with the map data.

Working with FIPS Codes

You can use the fips() function to get FIPS codes:

fips(state = "MA")
## [1] "25"
fips(state = "Massachusetts")
## [1] "25"

Or perform a reverse lookup:

fips_info(c("30", "33", "34"))
##   abbr fips          full
## 1   MT   30       Montana
## 2   NH   33 New Hampshire
## 3   NJ   34    New Jersey
fips_info(c("01001", "01003", "01005", "01007"))
##      full abbr         county  fips
## 1 Alabama   AL Autauga County 01001
## 2 Alabama   AL Baldwin County 01003
## 3 Alabama   AL Barbour County 01005
## 4 Alabama   AL    Bibb County 01007

This is particularly useful when working with datasets that use different state identifiers.

Adding Data to Maps

Now, let’s create a map that visualizes data. We’ll use the built-in statepop dataset to create a choropleth map of state populations:

data(statepop)

plot_usmap(data = statepop, values = "pop_2022", color = "red") + 
  scale_fill_continuous(
    low = "white", high = "red", name = "Population (2022)", label = scales::comma
  ) + 
  theme(legend.position = "right") +
  labs(title = "US State Populations", subtitle = "2022 Estimates")

This map uses color intensity to represent population size, with darker red indicating higher population. The scale_fill_continuous() function allows us to customize the color scale.

Focusing on Specific Regions

You can also focus on specific states or regions:

plot_usmap(
  data = statepop, 
  values = "pop_2022", 
  include = c("CA", "ID", "NV", "OR", "WA"), 
  color = "blue"
) + 
  scale_fill_continuous(
    low = "white", high = "blue", name = "Population (2022)", label = scales::comma
  ) + 
  labs(title = "Western US States", subtitle = "Population Estimates for Pacific States") +
  theme(legend.position = "right")

This map focuses on the Pacific states, using a blue color scheme. The include parameter allows you to specify which states to show.

County-Level Data

We can also visualize county-level data. Let’s use the countypov dataset to map poverty rates:

data(countypov)

plot_usmap(data = countypov, values = "pct_pov_2021", 
           include = c("CT", "ME", "MA", "NH", "VT"), color = "purple") + 
  scale_fill_continuous(low = "white", high = "purple", 
                        name = "Poverty Percentage", label = scales::comma) + 
  labs(title = "New England Region", 
       subtitle = "Poverty Percentage Estimates for New England Counties in 2021") +
  theme(legend.position = "right")

This map shows poverty rates across New England counties, demonstrating how usmap can handle detailed county-level data.

Visualizing Census county-level data

library(tidycensus)

census_api_key("2c5ba48be91062db570c7fc5fa49dbca03306c33")
## To install your API key for use in future sessions, run this function with `install = TRUE`.
# Download county-level median household income data
county_income <- get_acs(
  geography = "county",
  variables = "B19013_001",
  year = 2019,
  geometry = FALSE
) %>%
  mutate(
    fips = as.character(GEOID),
    income = estimate
  ) %>%
  select(fips, income) %>%
  mutate(fips = str_pad(fips, width = 5, pad = "0")) # Ensure fips is a character vector and has leading zeros where necessary
## Getting data from the 2015-2019 5-year ACS
# Create the choropleth map for Midwest
plot_usmap(
  data = county_income,
  values = "income",
  include = .midwest_region,
  color = "black",
  size = 0.1,
  regions = "counties"
) +
  scale_fill_viridis_c(
    name = "Median Household Income",
    label = scales::dollar
  ) +
  labs(title = "Median Household Income by County in the Midwest",
       subtitle = "2019 5-year ACS estimates") +
  theme(legend.position = "right")

This code creates a detailed county-level choropleth map of the Midwest region, showcasing median household income data. It begins by using the tidycensus package to fetch the latest available county-level income estimates from the American Community Survey. The data is then processed to ensure proper formatting of FIPS codes, which are crucial for accurate mapping. The plot_usmap() function is employed to generate the map, utilizing the built-in .midwest_region parameter to focus specifically on Midwest states. By setting regions = "counties", the map maintains county-level granularity, providing a detailed view of income distribution across the region. The resulting visualization uses a viridis color scale to represent income levels, with darker colors indicating higher median household incomes. This approach leverages the pre-defined regions in the usmap package for streamlined map creation.

Using Predefined Regions

The usmap package also includes predefined regions:

plot_usmap(include = .south_region)

You can combine or exclude regions:

plot_usmap(include = .south_region, exclude = .east_south_central)

These predefined regions make it easy to focus on specific parts of the country without manually specifying states.

Exploring the sf Package

The sf (Simple Features) package provides a modern approach to working with spatial data in R. It integrates seamlessly with the tidyverse and ggplot2, making it a powerful tool for spatial data analysis and visualization.

Creating an SF Object

Let’s create a simple feature object for US states:

us_states <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))

ggplot(data = us_states) +
  geom_sf() +
  theme_minimal() +
  ggtitle("US States using sf and ggplot2")

This code converts the state map data into an sf object and then plots it using ggplot2. The geom_sf() function automatically handles the spatial nature of the data.

Working with Spatial Operations

One of the strengths of sf is its ability to perform spatial operations. Let’s find the centroids of states:

# Get US states data
us_states <- ne_states(country = "United States of America", returnclass = "sf")

# Calculate centroids
state_centroids <- st_centroid(us_states)
## Warning: st_centroid assumes attributes are constant over geometries
# Create the plot
ggplot() +
  geom_sf(data = us_states, fill = "white", color = "black") +
  geom_sf(data = state_centroids, color = "red", size = 2) +
  theme_minimal() +
  labs(title = "US States with Centroids") +
  coord_sf(crs = st_crs(us_states), xlim = c(-125, -65), ylim = c(25, 50))

This map shows each state with a red dot representing its centroid. Such operations can be useful for placing labels or symbols on your maps.

Using sf with Coordinate Data

We can convert our coordinate data to an sf object for more advanced spatial operations:

# Get US states data from rnaturalearth, specifying returnclass = "sf".
us_states_ne_sf <- ne_states(country = "United States of America", returnclass = "sf")

# Sample city data with latitude, longitude, and population.
cities_df <- data.frame(
  name = c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix"),
  lat = c(40.7128, 34.0522, 41.8781, 29.7604, 33.4484),
  lon = c(-74.0060, -118.2437, -87.6298, -95.3698, -112.0740),
  population = c(8336817, 3898747, 2746388, 2304580, 1608139)
)

# st_as_sf converts the dataframe to an sf object.
cities_points_sf <- st_as_sf(cities_df, coords = c("lon", "lat"), crs = 4326)

# Create the plot, focused on North America
ggplot() +
  geom_sf(data = us_states_ne_sf) +
  geom_sf(data = cities_points_sf, aes(size = population), color = "red", alpha = 0.7) +
  scale_size_continuous(name = "Population", label = scales::comma) +
  coord_sf(xlim = c(-125, -65), ylim = c(25, 50)) +
  theme_minimal() +
  labs(title = "Major US Cities", subtitle = "Sized by Population")

This approach allows us to use sf’s spatial functions with our point data.

World Maps with rnaturalearth

While usmap is great for US maps, rnaturalearth provides similar functionality for world maps. It’s an excellent complement to usmap when you need to create global visualizations.

Getting Country Data

We can retrieve country boundaries at different scales:

world_countries <- ne_countries(scale = "medium", returnclass = "sf")

ggplot(data = world_countries) +
  geom_sf(aes(fill = pop_est)) +
  scale_fill_viridis_c(option = "plasma", trans = "sqrt", name = "Population") +
  theme_minimal() +
  labs(title = "World Countries", subtitle = "Colored by Population Estimates")

Historical Boundaries with USAboundaries

The USAboundaries package is invaluable for working with historical US boundaries. This is particularly useful for historical analysis or showing how US state boundaries have changed over time.

states_1840 <- us_states("1840-03-12")

ggplot(data = states_1840) +
  geom_sf() +
  theme_minimal() +
  ggtitle("U.S. State Boundaries on March 12, 1840")

This map shows the state boundaries as they existed in 1840, providing a snapshot of US geography at that time.

Alternative Visualizations with statebins

Sometimes, a traditional map projection isn’t the best way to visualize state-level data. The statebins package offers an alternative by creating a tile grid map where each state is represented by a square.

data("USArrests")
USArrests$state <- rownames(USArrests)

statebins(USArrests, value_col = "Assault", name = "Assault", round = TRUE) +
  theme_statebins(legend_position = "right") +
  ggtitle("Assault Arrests by State")

This visualization represents each state as a square, colored by the number of assault arrests. It’s particularly useful when you want to give equal visual weight to each state, regardless of its geographic size.

Interactive Maps with leaflet and plotly

While static maps are useful, interactive maps can provide a more engaging user experience and allow for exploration of the data.

Leaflet for Interactive Web Maps

Leaflet is a popular JavaScript library for interactive maps, and the R package makes it easy to use in R:

leaflet() %>%
  addTiles() %>%
  addMarkers(lng = -95.307, lat = 29.7601, 
             popup = "Houston")

This creates an interactive map centered on Houston. Users can zoom, pan, and click on the marker to see the popup.

Plotly for Interactive Choropleth Maps

We can create a more detailed interactive choropleth map of the US using Plotly:

data(statepop, package = "usmap")

# Create an interactive choropleth map directly with plotly
fig <- plot_ly(
  data = statepop,
  type = 'choropleth',
  locations = ~abbr, # Use state abbreviations for locations
  locationmode = 'USA-states',
  z = ~pop_2022, # The variable to color by (population)
  colorscale = 'Viridis',
  # Custom hover text. We use the 'full' column for state names
  # and format the population with commas.
  text = ~paste(full, '<br>Population:', scales::comma(pop_2022)),
  hoverinfo = 'text',
  # Customize the color bar (legend)
  colorbar = list(title = 'Population')
)

# Customize the map's layout, such as the title and geographic projection
fig <- fig %>% layout(
  title = 'US State Population (Interactive)',
  geo = list(
    scope = 'usa', # Focus the map on the USA
    showlakes = TRUE,
    lakecolor = toRGB('white')
  )
)

# Display the final interactive figure
fig

Highchart

hcmap("countries/us/us-all")|>
  hc_title(text = "United States")
hcmap("countries/ua/ua-all")|>
  hc_title(text = "Ukraine")