The field of data visualization in R has come a long way, especially in the realm of mapping. Modern R packages have significantly simplified the process of creating maps, moving beyond traditional GIS tools. This lesson will explore various packages and techniques for creating static and interactive maps in R, focusing on both US and world maps. We’ll cover a range of tools from basic plotting to advanced interactive visualizations, providing you with a comprehensive toolkit for geographical data representation.
library(tidyverse) # For data manipulation and visualization
library(ggplot2) # For creating static maps
library(usmap) # Specialized for US maps
library(sf) # For working with spatial data
library(rnaturalearth) # For world map data
library(rnaturalearthdata) # Additional data for rnaturalearth
library(USAboundaries) # For historical US boundaries
library(statebins) # For creating state-based tile grid maps
library(leaflet) # For interactive web maps
library(plotly) # For interactive plots and maps
library(highcharter) # For highcharts visualizations)
usmap
PackageThe usmap
package is a powerful tool for creating US
maps at both state and county levels. It simplifies the process of
plotting US maps by handling the map data and providing convenience
functions.
Let’s start with a basic state map:
plot_usmap()
This command creates a blank state map of the United States. Notice how it uses a pleasing projection where the top of the US isn’t a straight line, making it more visually appealing and geographically accurate.
We can easily switch to a county-level map:
plot_usmap(regions = "counties")
This creates a detailed map showing all US counties. It’s particularly useful when you need to visualize data at a more granular level than states.
The usmap
package uses FIPS (Federal Information
Processing Standards) codes to identify states and counties. Let’s look
at how this data is structured:
states_df <- usmap::us_map()
counties_df <- usmap::us_map(regions = "counties")
# View a sample of the state data
states_df %>% slice_sample(n=20)
## Simple feature collection with 20 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -2590847 ymin: -2608148 xmax: 2523581 ymax: 731407.9
## Projected CRS: NAD27 / US National Atlas Equal Area
## # A tibble: 20 × 4
## fips abbr full geom
## <chr> <chr> <chr> <MULTIPOLYGON [m]>
## 1 04 AZ Arizona (((-1388676 -1254584, -1389181 -1251856, -1384522…
## 2 22 LA Louisiana (((561191.4 -1344308, 560485.1 -1332803, 559919 -…
## 3 53 WA Washington (((-1681560 657002.6, -1669385 655249, -1665145 6…
## 4 36 NY New York (((2289409 -18529.61, 2295994 -14228.32, 2298140 …
## 5 26 MI Michigan (((1185004 212637.9, 1193870 206862.7, 1201942 20…
## 6 55 WI Wisconsin (((1017118 129040.8, 1024178 133478.9, 1027153 12…
## 7 42 PA Pennsylvania (((1623291 -261838.4, 1617848 -238044.8, 1617646 …
## 8 46 SD South Dakota (((-319754.7 7697.221, -318287.6 7745.276, -31765…
## 9 28 MS Mississippi (((801980.1 -1481934, 807488.2 -1482080, 812481 -…
## 10 01 AL Alabama (((1093777 -1378535, 1093269 -1374223, 1092965 -1…
## 11 40 OK Oklahoma (((-268949.9 -918945.8, -268044 -893009.6, -26773…
## 12 32 NV Nevada (((-1706439 -424290.8, -1703047 -410342.2, -17005…
## 13 06 CA California (((-1719946 -1090033, -1709611 -1090026, -1700882…
## 14 23 ME Maine (((2421620 348270.4, 2424412 351787.5, 2428573 34…
## 15 10 DE Delaware (((2042506 -284367.3, 2043078 -280000.3, 2044959 …
## 16 49 UT Utah (((-1236030 -716929.3, -1233521 -701434.4, -12309…
## 17 37 NC North Carolina (((1425709 -953634.5, 1425623 -951299.1, 1430148 …
## 18 05 AR Arkansas (((483065.2 -927788.2, 506062 -926263.3, 531512.5…
## 19 02 AK Alaska (((-2396847 -2547721, -2393297 -2546391, -2391552…
## 20 47 TN Tennessee (((885956.1 -1055210, 889039.6 -1053574, 895108.2…
# View a sample of the county data
counties_df %>% slice_sample(n=20)
## Simple feature collection with 20 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -1612996 ymin: -1695531 xmax: 2348353 ymax: 372931.3
## Projected CRS: NAD27 / US National Atlas Equal Area
## # A tibble: 20 × 5
## fips abbr full county geom
## <chr> <chr> <chr> <chr> <MULTIPOLYGON [m]>
## 1 20203 KS Kansas Wichita County (((-136467.8 -698185.1, -129263.3…
## 2 40045 OK Oklahoma Ellis County (((-246.8828 -932801.4, 35414.8 -…
## 3 47029 TN Tennessee Cocke County (((1499142 -861232.6, 1500442 -83…
## 4 48343 TX Texas Morris County (((485190 -1275077, 487986.4 -127…
## 5 19075 IA Iowa Grundy County (((571930.1 -247145.1, 610580.2 -…
## 6 25021 MA Massachusetts Norfolk County (((2343612 120557.6, 2347608 1207…
## 7 51047 VA Virginia Culpeper County (((1879152 -471300.5, 1894451 -44…
## 8 05101 AR Arkansas Newton County (((585878.4 -989139.5, 591777 -97…
## 9 48421 TX Texas Sherman County (((-194175.9 -940600, -182493 -94…
## 10 45029 SC South Carolina Colleton County (((1758922 -1131380, 1770484 -112…
## 11 16049 ID Idaho Idaho County (((-1293058 229643.3, -1288935 23…
## 12 40111 OK Oklahoma Okmulgee County (((345829.4 -1030612, 344852.5 -1…
## 13 29229 MO Missouri Wright County (((647159.7 -806261.7, 685549.5 -…
## 14 20095 KS Kansas Kingman County (((136114.3 -834366.6, 135606.6 -…
## 15 13193 GA Georgia Macon County (((1480243 -1262957, 1484985 -125…
## 16 48015 TX Texas Austin County (((328414 -1649075, 360177.4 -164…
## 17 48459 TX Texas Upshur County (((456986 -1354417, 455709.5 -132…
## 18 37097 NC North Carolina Iredell County (((1695701 -832462.6, 1699850 -82…
## 19 41025 OR Oregon Harney County (((-1607531 -34203.41, -1600676 -…
## 20 16057 ID Idaho Latah County (((-1280978 372931.3, -1268124 35…
These dataframes contain the polygon data needed to draw the maps. The FIPS codes are crucial for merging your data with the map data.
You can use the fips()
function to get FIPS codes:
fips(state = "MA")
## [1] "25"
fips(state = "Massachusetts")
## [1] "25"
Or perform a reverse lookup:
fips_info(c("30", "33", "34"))
## abbr fips full
## 1 MT 30 Montana
## 2 NH 33 New Hampshire
## 3 NJ 34 New Jersey
fips_info(c("01001", "01003", "01005", "01007"))
## full abbr county fips
## 1 Alabama AL Autauga County 01001
## 2 Alabama AL Baldwin County 01003
## 3 Alabama AL Barbour County 01005
## 4 Alabama AL Bibb County 01007
This is particularly useful when working with datasets that use different state identifiers.
Now, let’s create a map that visualizes data. We’ll use the built-in
statepop
dataset to create a choropleth map of state
populations:
data(statepop)
plot_usmap(data = statepop, values = "pop_2022", color = "red") +
scale_fill_continuous(
low = "white", high = "red", name = "Population (2022)", label = scales::comma
) +
theme(legend.position = "right") +
labs(title = "US State Populations", subtitle = "2022 Estimates")
This map uses color intensity to represent population size, with
darker red indicating higher population. The
scale_fill_continuous()
function allows us to customize the
color scale.
You can also focus on specific states or regions:
plot_usmap(
data = statepop,
values = "pop_2022",
include = c("CA", "ID", "NV", "OR", "WA"),
color = "blue"
) +
scale_fill_continuous(
low = "white", high = "blue", name = "Population (2022)", label = scales::comma
) +
labs(title = "Western US States", subtitle = "Population Estimates for Pacific States") +
theme(legend.position = "right")
This map focuses on the Pacific states, using a blue color scheme.
The include
parameter allows you to specify which states to
show.
We can also visualize county-level data. Let’s use the
countypov
dataset to map poverty rates:
data(countypov)
plot_usmap(data = countypov, values = "pct_pov_2021",
include = c("CT", "ME", "MA", "NH", "VT"), color = "purple") +
scale_fill_continuous(low = "white", high = "purple",
name = "Poverty Percentage", label = scales::comma) +
labs(title = "New England Region",
subtitle = "Poverty Percentage Estimates for New England Counties in 2021") +
theme(legend.position = "right")
This map shows poverty rates across New England counties,
demonstrating how usmap
can handle detailed county-level
data.
library(tidycensus)
census_api_key("2c5ba48be91062db570c7fc5fa49dbca03306c33")
## To install your API key for use in future sessions, run this function with `install = TRUE`.
# Download county-level median household income data
county_income <- get_acs(
geography = "county",
variables = "B19013_001",
year = 2019,
geometry = FALSE
) %>%
mutate(
fips = as.character(GEOID),
income = estimate
) %>%
select(fips, income) %>%
mutate(fips = str_pad(fips, width = 5, pad = "0")) # Ensure fips is a character vector and has leading zeros where necessary
## Getting data from the 2015-2019 5-year ACS
# Create the choropleth map for Midwest
plot_usmap(
data = county_income,
values = "income",
include = .midwest_region,
color = "black",
size = 0.1,
regions = "counties"
) +
scale_fill_viridis_c(
name = "Median Household Income",
label = scales::dollar
) +
labs(title = "Median Household Income by County in the Midwest",
subtitle = "2019 5-year ACS estimates") +
theme(legend.position = "right")
This code creates a detailed county-level choropleth map of the Midwest
region, showcasing median household income data. It begins by using the
tidycensus
package to fetch the latest available
county-level income estimates from the American Community Survey. The
data is then processed to ensure proper formatting of FIPS codes, which
are crucial for accurate mapping. The plot_usmap()
function
is employed to generate the map, utilizing the built-in
.midwest_region
parameter to focus specifically on Midwest
states. By setting regions = "counties"
, the map maintains
county-level granularity, providing a detailed view of income
distribution across the region. The resulting visualization uses a
viridis color scale to represent income levels, with darker colors
indicating higher median household incomes. This approach leverages the
pre-defined regions in the usmap
package for streamlined
map creation.
The usmap
package also includes predefined regions:
plot_usmap(include = .south_region)
You can combine or exclude regions:
plot_usmap(include = .south_region, exclude = .east_south_central)
These predefined regions make it easy to focus on specific parts of the country without manually specifying states.
sf
PackageThe sf
(Simple Features) package provides a modern
approach to working with spatial data in R. It integrates seamlessly
with the tidyverse and ggplot2, making it a powerful tool for spatial
data analysis and visualization.
Let’s create a simple feature object for US states:
us_states <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))
ggplot(data = us_states) +
geom_sf() +
theme_minimal() +
ggtitle("US States using sf and ggplot2")
This code converts the state map data into an sf object and then
plots it using ggplot2. The geom_sf()
function
automatically handles the spatial nature of the data.
One of the strengths of sf
is its ability to perform
spatial operations. Let’s find the centroids of states:
# Get US states data
us_states <- ne_states(country = "United States of America", returnclass = "sf")
# Calculate centroids
state_centroids <- st_centroid(us_states)
## Warning: st_centroid assumes attributes are constant over geometries
# Create the plot
ggplot() +
geom_sf(data = us_states, fill = "white", color = "black") +
geom_sf(data = state_centroids, color = "red", size = 2) +
theme_minimal() +
labs(title = "US States with Centroids") +
coord_sf(crs = st_crs(us_states), xlim = c(-125, -65), ylim = c(25, 50))
This map shows each state with a red dot representing its centroid. Such operations can be useful for placing labels or symbols on your maps.
We can convert our coordinate data to an sf object for more advanced spatial operations:
# Get US states data from rnaturalearth, specifying returnclass = "sf".
us_states_ne_sf <- ne_states(country = "United States of America", returnclass = "sf")
# Sample city data with latitude, longitude, and population.
cities_df <- data.frame(
name = c("New York", "Los Angeles", "Chicago", "Houston", "Phoenix"),
lat = c(40.7128, 34.0522, 41.8781, 29.7604, 33.4484),
lon = c(-74.0060, -118.2437, -87.6298, -95.3698, -112.0740),
population = c(8336817, 3898747, 2746388, 2304580, 1608139)
)
# st_as_sf converts the dataframe to an sf object.
cities_points_sf <- st_as_sf(cities_df, coords = c("lon", "lat"), crs = 4326)
# Create the plot, focused on North America
ggplot() +
geom_sf(data = us_states_ne_sf) +
geom_sf(data = cities_points_sf, aes(size = population), color = "red", alpha = 0.7) +
scale_size_continuous(name = "Population", label = scales::comma) +
coord_sf(xlim = c(-125, -65), ylim = c(25, 50)) +
theme_minimal() +
labs(title = "Major US Cities", subtitle = "Sized by Population")
This approach allows us to use sf’s spatial functions with our point data.
rnaturalearth
While usmap
is great for US maps,
rnaturalearth
provides similar functionality for world
maps. It’s an excellent complement to usmap
when you need
to create global visualizations.
We can retrieve country boundaries at different scales:
world_countries <- ne_countries(scale = "medium", returnclass = "sf")
ggplot(data = world_countries) +
geom_sf(aes(fill = pop_est)) +
scale_fill_viridis_c(option = "plasma", trans = "sqrt", name = "Population") +
theme_minimal() +
labs(title = "World Countries", subtitle = "Colored by Population Estimates")
USAboundaries
The USAboundaries
package is invaluable for working with
historical US boundaries. This is particularly useful for historical
analysis or showing how US state boundaries have changed over time.
states_1840 <- us_states("1840-03-12")
ggplot(data = states_1840) +
geom_sf() +
theme_minimal() +
ggtitle("U.S. State Boundaries on March 12, 1840")
This map shows the state boundaries as they existed in 1840, providing a snapshot of US geography at that time.
statebins
Sometimes, a traditional map projection isn’t the best way to
visualize state-level data. The statebins
package offers an
alternative by creating a tile grid map where each state is represented
by a square.
data("USArrests")
USArrests$state <- rownames(USArrests)
statebins(USArrests, value_col = "Assault", name = "Assault", round = TRUE) +
theme_statebins(legend_position = "right") +
ggtitle("Assault Arrests by State")
This visualization represents each state as a square, colored by the number of assault arrests. It’s particularly useful when you want to give equal visual weight to each state, regardless of its geographic size.
leaflet
and
plotly
While static maps are useful, interactive maps can provide a more engaging user experience and allow for exploration of the data.
Leaflet is a popular JavaScript library for interactive maps, and the R package makes it easy to use in R:
leaflet() %>%
addTiles() %>%
addMarkers(lng = -95.307, lat = 29.7601,
popup = "Houston")
This creates an interactive map centered on Houston. Users can zoom, pan, and click on the marker to see the popup.
We can create a more detailed interactive choropleth map of the US using Plotly:
data(statepop, package = "usmap")
# Create an interactive choropleth map directly with plotly
fig <- plot_ly(
data = statepop,
type = 'choropleth',
locations = ~abbr, # Use state abbreviations for locations
locationmode = 'USA-states',
z = ~pop_2022, # The variable to color by (population)
colorscale = 'Viridis',
# Custom hover text. We use the 'full' column for state names
# and format the population with commas.
text = ~paste(full, '<br>Population:', scales::comma(pop_2022)),
hoverinfo = 'text',
# Customize the color bar (legend)
colorbar = list(title = 'Population')
)
# Customize the map's layout, such as the title and geographic projection
fig <- fig %>% layout(
title = 'US State Population (Interactive)',
geo = list(
scope = 'usa', # Focus the map on the USA
showlakes = TRUE,
lakecolor = toRGB('white')
)
)
# Display the final interactive figure
fig
hcmap("countries/us/us-all")|>
hc_title(text = "United States")
hcmap("countries/ua/ua-all")|>
hc_title(text = "Ukraine")