First we are going to need to install the necessary packages. Since creating an interactive map is a major portion of this, we’ll be needing the necessary packages to do so.

# install.packages("devtools")
# install.packages("stringr")
# install.packages(c("maps", "mapdata"))

Then we load them.

library(ggplot2)
library(maps)
library(mapdata)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)
library(viridis)

## Loading required package: viridisLite

## 
## Attaching package: 'viridis'

## The following object is masked from 'package:maps':
## 
##     unemp

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ tibble  3.1.2     ✓ purrr   0.3.4
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::map()    masks maps::map()

Starting basic

We load up a map of the US to start with. We get the map from the “maps” pacakge.

usa <- map_data("usa")

dim(usa)

## [1] 7243    6

# head(usa)
# tail(usa)

Then we load a different map, “world2Hires” from mapdata, which is centered around the Pacific Ocean.

w2hr <- map_data("world2Hires")

dim(w2hr)

## [1] 2274539       6

head(w2hr)

##       long      lat group order region subregion
## 1 226.6336 58.42416     1     1 Canada      <NA>
## 2 226.6314 58.42336     1     2 Canada      <NA>
## 3 226.6122 58.41196     1     3 Canada      <NA>
## 4 226.5911 58.40027     1     4 Canada      <NA>
## 5 226.5719 58.38864     1     5 Canada      <NA>
## 6 226.5528 58.37724     1     6 Canada      <NA>

tail(w2hr)

##             long      lat group   order      region subregion
## 2276817 125.0258 11.18471  2284 2276817 Philippines     Leyte
## 2276818 125.0172 11.17142  2284 2276818 Philippines     Leyte
## 2276819 125.0114 11.16110  2284 2276819 Philippines     Leyte
## 2276820 125.0100 11.15555  2284 2276820 Philippines     Leyte
## 2276821 125.0111 11.14861  2284 2276821 Philippines     Leyte
## 2276822 125.0155 11.13887  2284 2276822 Philippines     Leyte

Creating the first map

From here let’s create a basic map of the USA. It’ll be filled with black and nothing else.

ggplot() + 
  geom_polygon(data = usa, aes(x = long, y = lat, group = group)) + 
  coord_quickmap()

Next, let’s make it a bit more advanced by coloring the border red.

ggplot() + 
  geom_polygon(data = usa, aes(x = long, y = lat, group = group), fill = NA, color = "red") + 
  coord_quickmap()

Then let’s fill the map with a violet color.

gg1 <- ggplot() + 
  geom_polygon(data = usa, aes(x = long, y = lat, group = group), fill = "violet", color = "blue") + 
  coord_quickmap()
gg1

Let’s add black and yellow points at the NMFS lab in Santa Cruz and at the Northwest Fisheries Science Center lab in Seattle.

labs <- tibble(
  long = c(-122.064873, -122.306417),
  lat = c(36.951968, 47.644855),
  names = c("SWFSC-FED", "NWFSC"))  

gg1 + 
  geom_point(data = labs, aes(x = long, y = lat), shape = 21, color = "black", fill = "yellow", size = 5) +
  geom_text(data = labs, aes(x = long, y = lat, label = names), hjust = 0, nudge_x = 1)

The group aesthetics importance can be seen here as there will be lines connecting certain points.

ggplot() + 
      geom_polygon(data = usa, aes(x = long, y = lat), fill = "violet", color = "blue") + 
      geom_point(data = labs, aes(x = long, y = lat), shape = 21, color = "black", fill = "yellow", size = 5) +
      geom_text(data = labs, aes(x = long, y = lat, label = names), hjust = 0, nudge_x = 1) +
      coord_quickmap()

State Maps

Now that we know how to make a basic graph, let’s make it a bit more advanced. This time we’ll be looking at California specifically. So first we need to load the data for states.

states <- map_data("state")

dim(states)

## [1] 15537     6

head(states)

##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

tail(states)

##            long      lat group order  region subregion
## 15594 -106.3295 41.00659    63 15594 wyoming      <NA>
## 15595 -106.8566 41.01232    63 15595 wyoming      <NA>
## 15596 -107.3093 41.01805    63 15596 wyoming      <NA>
## 15597 -107.9223 41.01805    63 15597 wyoming      <NA>
## 15598 -109.0568 40.98940    63 15598 wyoming      <NA>
## 15599 -109.0511 40.99513    63 15599 wyoming      <NA>

Now we just fill in the USA with the different states.

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, fill = region, group = group), color = "white") + 
  coord_quickmap() +
  guides(fill = FALSE)  # do this to leave off the color legend

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

Plotting subsets

We can even choose to include certain states. Let’s graph the 3 states California, Oregon, and Washington.

west_coast <- states %>%
  filter(region %in% c("california", "oregon", "washington"))

ggplot(data = west_coast) + 
  geom_polygon(aes(x = long, y = lat), fill = "palegreen", color = "black")

Something seems off, let’s include “group()” and “quickmap()” to see if there’s a difference.

ggplot(data = west_coast) + 
  geom_polygon(aes(x = long, y = lat, group = group), fill = "palegreen", color = "black") + 
  coord_quickmap()

Now it’s much better to look at over the previous map which was elongated a bit too much.

Focusing on California

Now let’s focus on a specific state - California.

ca_df <- states %>%
  filter(region == "california")

head(ca_df)

##        long      lat group order     region subregion
## 1 -120.0060 42.00927     4   667 california      <NA>
## 2 -120.0060 41.20139     4   668 california      <NA>
## 3 -120.0060 39.70024     4   669 california      <NA>
## 4 -119.9946 39.44241     4   670 california      <NA>
## 5 -120.0060 39.31636     4   671 california      <NA>
## 6 -120.0060 39.16166     4   672 california      <NA>

Now let’s add the country lines to the map of Califronia.

counties <- map_data("county")
ca_county <- counties %>%
  filter(region == "california")

head(ca_county)

##        long      lat group order     region subregion
## 1 -121.4785 37.48290   157  6965 california   alameda
## 2 -121.5129 37.48290   157  6966 california   alameda
## 3 -121.8853 37.48290   157  6967 california   alameda
## 4 -121.8968 37.46571   157  6968 california   alameda
## 5 -121.9254 37.45998   157  6969 california   alameda
## 6 -121.9483 37.47717   157  6970 california   alameda

Before we add any counties, let’s see if the basic map does indeed display Califronia.

ca_base <- ggplot(data = ca_df, mapping = aes(x = long, y = lat, group = group)) + 
  coord_quickmap() + 
  geom_polygon(color = "black", fill = "gray")
ca_base + theme_void()

Now that we’re set let’s fill in the counties.

ca_base + theme_void() + 
  geom_polygon(data = ca_county, fill = NA, color = "white") +
  geom_polygon(color = "black", fill = NA)  # get the state border back on top

Population Data

Since we were able to divide the country into counties, I wonder if we could add other data to the map. Let’s try to color the map by population. First, we need to pull the population data of each country first.

# make a data frame
x <- readLines("/Users/justinpark/Desktop/DATA 110/Unit 5/HW/CA Counties Text.txt")

## Warning in readLines("/Users/justinpark/Desktop/DATA 110/Unit 5/HW/CA Counties
## Text.txt"): incomplete final line found on '/Users/justinpark/Desktop/DATA 110/
## Unit 5/HW/CA Counties Text.txt'

    pop_and_area <- str_match(x, "^([a-zA-Z ]+)County\t.*\t([0-9,]{2,10})\t([0-9,]{2,10}) sq mi$")[, -1] %>%
      na.omit() %>%
      as.data.frame(stringsAsFactors = FALSE) %>%
      mutate(subregion = str_trim(V1) %>% tolower(),
             population = as.numeric(str_replace_all(V2, ",", "")),
             area = as.numeric(str_replace_all(V3, ",", ""))
      ) %>%
      dplyr::select(subregion, population, area) %>%
      tbl_df()

## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.

    head(pop_and_area)

## # A tibble: 6 x 3
##   subregion population  area
##   <chr>          <dbl> <dbl>
## 1 alameda      1578891   738
## 2 alpine          1159   739
## 3 amador         36519   593
## 4 butte         222090  1640
## 5 calaveras      44515  1020
## 6 colusa         21358  1151

Now we need to attach the population numbers to each county area. We do this using “left_join” and we’ll also add a new variable “people_per_mile”.

cacopa <- left_join(ca_county, pop_and_area, by = "subregion") %>%
  mutate(people_per_mile = population / area)

head(cacopa)

##        long      lat group order     region subregion population area
## 1 -121.4785 37.48290   157  6965 california   alameda    1578891  738
## 2 -121.5129 37.48290   157  6966 california   alameda    1578891  738
## 3 -121.8853 37.48290   157  6967 california   alameda    1578891  738
## 4 -121.8968 37.46571   157  6968 california   alameda    1578891  738
## 5 -121.9254 37.45998   157  6969 california   alameda    1578891  738
## 6 -121.9483 37.47717   157  6970 california   alameda    1578891  738
##   people_per_mile
## 1        2139.419
## 2        2139.419
## 3        2139.419
## 4        2139.419
## 5        2139.419
## 6        2139.419

Now let’s create the modified map.

elbow_room1 <- ca_base + 
      geom_polygon(data = cacopa, aes(fill = people_per_mile), color = "white") +
      geom_polygon(color = "black", fill = NA) +
      theme_void()

elbow_room1

The population density in Califronia is too great of a number. We need to change the scaling to get a better understanding.

elbow_room1 + scale_fill_gradient(trans = "log10")

There’s contrast among the county densities. However, the colors should still be more distinguishable.

eb2 <- elbow_room1 + 
    scale_fill_gradientn(colours = rev(rainbow(7)),
                         breaks = c(2, 4, 10, 100, 1000, 10000),
                         trans = "log10")

eb2

Very colorful. But perhaps a bit too colorfu. This could be unpleasant to look at for some people. Let’s use a different color package “viridis” to get better colors.

eb2 <- elbow_room1 + 
  scale_fill_viridis(breaks = c(2, 4, 10, 100, 1000, 10000),
                         trans = "log10")

eb2

Zooming In

So we’ve mapped the entire country, specific states, and even counties. But can we zoom in further to include only the Bay Area?

eb2 + xlim(-123, -121.0) + ylim(36, 38)

Unfortunately, this was unsuccessful. Parts of the map got cutout and even a random line appeared connecting 2 points (the same line from before). Let’s modify this to get rid of the line and include the full Bay Area.

eb2 + coord_quickmap(xlim = c(-123, -121.0),  ylim = c(36, 38))

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Great, now the map is complete. We were able to map and observe the entirety of the USA, specific states, counties, and even a specific region of a state.

eb2 + coord_quickmap(xlim = c(-123, -121.0),  ylim = c(36, 38))

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Justin Park - Making Maps with R

Justin Park