Load packages

Install the following packages (ggplot2, sf, dplyr, tidylog, and ggiraph) if you do not have them yet, for example with install.packages(), and then load them.

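library(ggplot2)  # plotting
library(sf)       # reading and handling spatial (simple features) data
library(dplyr)    # data manipulation
library(tidylog)  # prints a short summary of what each dplyr verb did
library(ggiraph)  # interactive versions of ggplot2 geoms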

Data Wrangling

Step 1

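Download the county boundary shapefile from the Minnesota Geospatial Commons (we found the page by searching Google for "Minnesota shapefile"). Then read it with read_sf() from the sf package. Its first argument is the folder containing the shapefile, and its second is the layer name (the file name without the .shp extension). read_sf() returns a simple feature table, which is a tibble with a geometry column. We call it mn.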

mn <- read_sf("./mn_shapefile_2/", "mn_county_boundaries_500")
head(mn)
## Simple feature collection with 6 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 190012.2 ymin: 5166465 xmax: 591752.2 ymax: 5472428
## Projected CRS: NAD83 / UTM zone 15N
## # A tibble: 6 × 13
##           AREA PERIMETER CTYONLY_ CTYONLY_ID  COUN CTY_NAME    CTY_ABBR CTY_FIPS
##          <dbl>     <dbl>    <dbl>      <dbl> <int> <chr>       <chr>       <int>
## 1  4608320924.   388250.        2          1    39 Lake of th… LOTW           77
## 2  2862183702.   263017.        3          2    35 Kittson     KITT           69
## 3  4347098503.   302591.        4          3    68 Roseau      ROSE          135
## 4  8167237871.   412897.        5          4    36 Koochiching KOOC           71
## 5  4698732288.   374208.        6          5    45 Marshall    MARS           89
## 6 17451037319.   682518.        7          6    69 St. Louis   STLO          137
## # ℹ 5 more variables: MaxSimpTol <dbl>, MinSimpTol <dbl>, Shape_Leng <dbl>,
## #   Shape_Area <dbl>, geometry <POLYGON [m]>
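A shapefile is really a set of files that share one layer name. The .shp, .shx, and .dbf files are required, and the .prj file holds the coordinate reference system. If read_sf() cannot open the layer, a quick base R check like the sketch below can confirm that every part was extracted from the download. The helper check_shapefile_parts() is hypothetical and is not part of sf.

```r
# Hypothetical helper (not part of sf): report which component files of a
# shapefile layer are present in a folder.
check_shapefile_parts <- function(dir, layer,
                                  exts = c("shp", "shx", "dbf", "prj")) {
  paths <- file.path(dir, paste0(layer, ".", exts))
  present <- file.exists(paths)
  names(present) <- exts
  present
}

# Usage with this tutorial's folder and layer name:
# check_shapefile_parts("./mn_shapefile_2", "mn_county_boundaries_500")

# Self-contained demo: a fake layer that is missing its .prj file
demo_dir <- file.path(tempdir(), "shp_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (ext in c("shp", "shx", "dbf")) {
  file.create(file.path(demo_dir, paste0("counties.", ext)))
}
parts <- check_shapefile_parts(demo_dir, "counties")
print(parts)  # shp, shx, dbf are TRUE; prj is FALSE
```

A missing .prj does not stop the file from being read. However, the data would then have no CRS, so the "Projected CRS" line in the output above would not appear.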

Step 2

  • Next, get the 2010 Minnesota county population data, which adds population, households, and persons per household.
  • On the PopFinder for Minnesota Counties page, click "Download all the data in this PopFinder tool" (an Excel file).
  • Open the file in Excel and use Save As with the CSV format. Save it in your working folder (the same folder as your R file) as mn_county_bigdata.csv. Renaming the Excel file to .csv does not convert it, and read.csv() cannot parse a workbook.
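An .xlsx workbook is a ZIP archive under the hood, so its first four bytes are the ZIP signature "PK" followed by 0x03 0x04. That gives a cheap way to catch a workbook that was renamed to .csv instead of being saved as CSV. The sketch below assumes the download is in the modern .xlsx format, not legacy .xls. The function name looks_like_xlsx() is hypothetical.

```r
# Hypothetical helper: TRUE if the file starts with the ZIP signature
# "PK\x03\x04" that every .xlsx workbook (a ZIP archive) begins with.
looks_like_xlsx <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  first_bytes <- readBin(con, what = "raw", n = 4)
  identical(first_bytes, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Demo: a real CSV versus a file that only pretends to be one
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Geography Type,Geography Name,Year", "County,Aitkin,2010"),
           csv_path)

fake_path <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00)), fake_path)

looks_like_xlsx(csv_path)   # FALSE: plain text, safe for read.csv()
looks_like_xlsx(fake_path)  # TRUE: still a workbook, re-save it as CSV
```

Running looks_like_xlsx("mn_county_bigdata.csv") before Step 3 should return FALSE.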

Step 3

Clean the data by keeping only the rows whose geography type is County and whose year is 2010.

mn_clean <- read.csv("mn_county_bigdata.csv")
mn_clean <- mn_clean %>% 
  filter(Year==2010 & Geography.Type=="County")
head(mn_clean)
##   Geography.Type Geography.Name Year Population Households
## 1         County         Aitkin 2010    16,202      7,299 
## 2         County          Anoka 2010   330,844    121,227 
## 3         County         Becker 2010    32,504     13,224 
## 4         County       Beltrami 2010    44,442     16,846 
## 5         County         Benton 2010    38,451     15,079 
## 6         County      Big Stone 2010     5,269      2,293 
##   Persons.Per.Household..PPH.
## 1                        2.18
## 2                        2.70
## 3                        2.42
## 4                        2.51
## 5                        2.48
## 6                        2.24
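The odd column name Persons.Per.Household..PPH. comes from read.csv(). By default (check.names = TRUE), it passes each header through make.names(), which replaces spaces and parentheses with dots. The sketch below reproduces this behavior, assuming the spreadsheet header reads "Persons Per Household (PPH)". It also shows the Step 3 filter in base R with subset(). The table rows are made-up placeholders, not PopFinder data.

```r
# Presumed original spreadsheet headers (an assumption for illustration)
raw_headers <- c("Geography Type", "Geography Name", "Year",
                 "Persons Per Household (PPH)")
make.names(raw_headers)
# "Geography.Type" "Geography.Name" "Year" "Persons.Per.Household..PPH."

# Small made-up table written to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Geography Type,Geography Name,Year,Persons Per Household (PPH)",
  "County,Example A,2010,2.50",
  "County,Example A,2020,2.40",
  "State,Example State,2010,2.45"
), csv_path)

pop <- read.csv(csv_path)   # check.names = TRUE by default
names(pop)

# Same filter as Step 3, in base R: keep county rows for 2010 only
pop_2010 <- subset(pop, Year == 2010 & Geography.Type == "County")
pop_2010
```

If you prefer the original headers, pass check.names = FALSE to read.csv(). The later code would then need backticks around names that contain spaces.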

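Perform a left join onto our main table mn so that it gains these additional columns. The join matches each county's CTY_NAME in the shapefile to Geography.Name in the population table, so the county names must be spelled identically in both.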

# Add the Population, Households, and Persons Per Household columns to the main table
mn <- mn %>% left_join(mn_clean, by = c("CTY_NAME" = "Geography.Name"))
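left_join() keeps every county polygon. When a CTY_NAME has no exact match in Geography.Name, for example because of a spelling or capitalization difference, the new columns are filled with NA. Comparing the two key columns before joining catches such mismatches. tidylog, loaded earlier, also reports unmatched rows for each join. The sketch below shows the check in base R with setdiff(), plus merge(all.x = TRUE), which behaves like a left join. The county names are made-up placeholders, and the tutorial itself uses dplyr's left_join().

```r
# Made-up key columns standing in for mn$CTY_NAME and mn_clean$Geography.Name
shapes <- data.frame(CTY_NAME = c("Alpha", "Beta", "Gamma"))
pops <- data.frame(Geography.Name = c("Alpha", "Beta", "gamma"),
                   Population = c("1,000", "2,500", "3,200"))

# Names present in the shapefile table but missing from the population table
unmatched <- setdiff(shapes$CTY_NAME, pops$Geography.Name)
unmatched   # "Gamma": the spelling differs only by case

# Base R equivalent of a left join: keep every row of `shapes`
joined <- merge(shapes, pops, by.x = "CTY_NAME", by.y = "Geography.Name",
                all.x = TRUE)
joined      # Gamma gets NA for Population
```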

Step 4

Now, let's group the counties into five population categories, from very small to very large.

# Define the breakpoints for the population categories
breakpoints <- c(0, 5000, 10000, 50000, 100000, Inf)

# Population is stored as text such as "16,202": strip the thousands
# separators, then convert to integer
mn$Pop_int <- gsub(",", "", mn$Population) %>%
  as.integer()

# Assign each county to a category
mn$pop_category <- cut(mn$Pop_int,
                       breaks = breakpoints,
                       labels = c("very small (0-5,000)",
                                  "small (5,000 - 10,000)",
                                  "medium (10,000 - 50,000)",
                                  "large (50,000 - 100,000)",
                                  "very large (> 100,000)"),
                       include.lowest = TRUE)
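By default, cut() builds intervals that are closed on the right (right = TRUE). A county with exactly 5,000 people therefore falls into "very small", and one with 5,001 falls into "small". The include.lowest = TRUE option makes the first interval include 0. Stripping the commas matters because as.integer() on a string like "16,202" returns NA with a warning. The sketch below checks both behaviors on a few made-up values, using shortened labels.

```r
breakpoints <- c(0, 5000, 10000, 50000, 100000, Inf)
cat_labels <- c("very small", "small", "medium", "large", "very large")

# Made-up population strings with thousands separators, as in the CSV
pop_text <- c("0", "5,000", "5,001", "100,000", "100,001")

# Without stripping the commas, the conversion fails
suppressWarnings(as.integer("16,202"))   # NA

pop_int <- as.integer(gsub(",", "", pop_text))
pop_cat <- cut(pop_int, breaks = breakpoints, labels = cat_labels,
               include.lowest = TRUE)
data.frame(pop_int, pop_cat)
# 0 and 5000 -> very small, 5001 -> small,
# 100000 -> large, 100001 -> very large
```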

The code chunk below combines the features we want to show for each county into a single character string, info. This string becomes the tooltip, so hovering over a county on the map displays its name, population category, population, households, and persons per household.

# Build one tooltip string per county; paste0() adds no extra spaces
mn <- mn %>%
  mutate(info = paste0(
    "Name: ", CTY_NAME,
    "\nPopulation Category: ", pop_category,
    "\nPopulation: ", Population,
    "\nHouseholds: ", Households,
    "\nPersons Per Household: ", Persons.Per.Household..PPH.
  ))
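paste() separates its arguments with a space by default. As a result, each "Label: " is followed by two spaces and each value by a trailing space. paste0() uses no separator, which suits a label-value string like info. The sketch below compares the two on the Aitkin row from the Step 3 output. The "medium" category is what the Step 4 breakpoints give for 16,202.

```r
# One row taken from the 2010 output shown in Step 3
row <- list(CTY_NAME = "Aitkin", pop_category = "medium (10,000 - 50,000)",
            Population = "16,202", Households = "7,299",
            Persons.Per.Household..PPH. = 2.18)

with_paste  <- paste("Name: ", row$CTY_NAME, "\nPopulation: ", row$Population)
with_paste0 <- paste0("Name: ", row$CTY_NAME, "\nPopulation: ", row$Population)

with_paste    # "Name:  Aitkin \nPopulation:  16,202"  (extra spaces)
with_paste0   # "Name: Aitkin\nPopulation: 16,202"

# Full tooltip text built with paste0(), one field per line
info <- paste0(
  "Name: ", row$CTY_NAME,
  "\nPopulation Category: ", row$pop_category,
  "\nPopulation: ", row$Population,
  "\nHouseholds: ", row$Households,
  "\nPersons Per Household: ", row$Persons.Per.Household..PPH.
)
cat(info, "\n")
```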

Step 5

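First, let's plot a static choropleth map of Minnesota counties by 2010 population category with ggplot2. The geom_sf_interactive() layer from ggiraph draws the county polygons just like geom_sf(). It also stores the info tooltip, which becomes active once the plot is rendered as an interactive map in the final step.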

plot <- ggplot(mn, aes(fill = pop_category)) +
  # Draw the county polygons once; the tooltip is used when rendered interactively
  geom_sf_interactive(aes(tooltip = info)) +
  scale_fill_manual(
    values = c(
      "very small (0-5,000)" = "#eff3ff",
      "small (5,000 - 10,000)" = "#bdd7e7",
      "medium (10,000 - 50,000)" = "#6baed6",
      "large (50,000 - 100,000)" = "#3182bd",
      "very large (> 100,000)" = "#08519c")) +
  labs(title = "Minnesota counties by population category in 2010",
       subtitle = "The most populous counties are in eastern MN",
       fill = "",
       caption = "Long Truong (5/12/2023)\nSource: Minnesota Geospatial Commons and Demographic Center") +
  theme(plot.caption = element_text(hjust = 0),  # left-align the caption
        legend.position = "right",
        # Remove the grid, background, and axis labels for a clean map
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())

plot

Choropleth map showing Minnesota counties by population in 2010
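scale_fill_manual() matches the names in values against the factor levels of pop_category. If a label is spelled even slightly differently in cut() and in the color vector, those counties lose their intended color. One way to prevent this is to define the labels once, build the named palette from them with setNames(), and check that every category has a color. The sketch below does this; the variable names are illustrative and the populations are made up.

```r
# One shared vector of labels, reused for cut() and for the palette
pop_labels <- c("very small (0-5,000)", "small (5,000 - 10,000)",
                "medium (10,000 - 50,000)", "large (50,000 - 100,000)",
                "very large (> 100,000)")
pop_colors <- setNames(c("#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"),
                       pop_labels)

# Categories as cut() would produce them for a few made-up populations
pop_category <- cut(c(3000, 20000, 250000),
                    breaks = c(0, 5000, 10000, 50000, 100000, Inf),
                    labels = pop_labels, include.lowest = TRUE)

# Every factor level should have a color; TRUE when the labels match
all(levels(pop_category) %in% names(pop_colors))

# The palette can then be passed as scale_fill_manual(values = pop_colors)
```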

Final Product

Finally, we pass the plot to girafe() from ggiraph to produce the interactive map. girafe() replaces the older ggiraph() function, which is deprecated.

final <- girafe(ggobj = plot)
final

Resources used

ggiraph interactive map documentation
Minnesota Geospatial Commons
Additional Minnesota population data