Making maps in GGplot

October 2023

Luke Brokensha

This markdown document will provide you with a step by step process on how to make maps using the package ggplot2. The first step for any project, is to organise our files and set our working directory. I usually make a new folder on my desktop where all my data, code and outputs will go. In this example, I have made a folder called “Making Maps”. To set the working directory use the “Session” dropdown at the top of R studio, choose “Set Working Directory”, and then “Choose Directory”. Navigate to the folder you have created and press “open”. Save all the data you need into this folder, so it is easy to access.

Install and load packages

The next step is to load the packages that you will need - these are below. You might need to install some of these before you can load them. This is done using the “install packages…” button under the “tools” dropdown at the top of R Studio. Installing “SOmap” is a little different. The code to install this package is below

install.packages("SOmap", repos = c(SCAR = "https://scar.r-universe.dev",
                                    CRAN = "https://cloud.r-project.org"))

Once all packages are installed, run the code below

library(ggplot2)
library(xfun)
library(dplyr)
library(reshape)
library(maps)
library(mapdata)
library(ggtext)
library(SOmap)
library(gganimate)

Loading data

Next, we will need our data. For this example, I have used Continuous Plankton Recorder zooplankton data taken from the AADC - https://data.aad.gov.au/metadata/records/AADC-00099. The data is downloaded as a .csv file, which is great for using in R. If opened in Excel, the data should look like the image below.

We need to load this data set into our R work space. When loading the data in, it will be turned into whats called a “data frame”. We use the code below to load it in. In the code, we are telling R to make a data frame called “data”, and we are using the “read.csv” function to find our file. If you have set your working directory correctly and saved the .csv into that folder, you should be able to find the file easily. Inside the parentheses, type “” and then press the tab button. This will show you the files that are currently in your working directory, and you can navigate to your desired .csv file.

data <- read.csv("AADC_Zoop_CPR_Data.csv")

A data frame has now been created, and will appear in the top right of your work space (image below)

We can click on this data frame and it will open in R, showing us our rows, columns and values.

Cleaning the data

The columns we are interested in keeping for our map are “Latitude”, “Longitude”, “Year_Local”, “Month_Local” and all the species columns at the end. To use this data effectively in a map, we need to remove the columns we don’t want. We can do this with the line of code below. Instead of cleaning the original data frame, I have decided to make a new data frame called “data_clean”. This means we will keep the original data frame untouched, and our clean up will happen on a new data frame. The reason for keeping the original data frame will become important later on.

data_clean = subset(data, select = -c(1, 2, 3, 4, 7, 10, 11))

In the code above, we are asking R to grab our data frame, and make a subset of it (into a new data frame). when we use “-c”, we are asking R to remove columns for us. The columns we want removed are chosen by the numbers in the parentheses. Once you run this code, the data frame should now look like the image below

Currently, each species has its own column. For mapping data points in ggplot, it is much easier if all species are in the one column. This makes it easier for us to select individual species for mapping, as well as plotting different species abundance against one another. To do this, we need to “melt” the data frame, which is done in the code below.

data_clean <- melt(data_clean, id = c("Month", "Year", "Latitude", "Longitude"))

We are asking R to take the data frame, and stack the species columns on top of each other. We do not want to stack the columns “Latitude”, “Longitude”, “Year_local”, and “Month_local”, so we mention this in the code above. R stacks the species in a new column and calls it “variable.” It also makes a new column called “value” which is the abundance count for each species at each lat/long. The result should look like the image below.

If you scroll down through the data frame, you will now see that each species is stacked, starting with Abylidae, and then onto Acartia etc. The next step we take, is to change the column names. This is not necessary, but it makes the data frame easier to read - code below

colnames(data_clean)[5] = "Species"
colnames(data_clean)[6] = "Abundance"

Next, we want to remove 0’s from our data. This is done with the code below

data_clean = data_clean[data_clean$Abundance > 0.1,]

Making a species specific data frame

Now that our data is ready, we can start to target a specific species. To do this, we need to make a separate data frame. Let’s say we want to target Calanus propinquus- we need to know exactly how it is spelt in our data frame. We can do this by clicking on our data frame and using the search function in the top right. Once we know how it is spelt, we use the dplyr filter function to make a new data frame (shown below). We have called our new data frame “Cal_prop”. If successful, you will see a new data frame appear in the top right window of your R program.

Cal_prop <- dplyr::filter(data_clean, Species == "Calanus.propinquus")

Now we can start to build our map using ggplot. There are a few steps we need to take to make sure the map works. Firstly, we need to give ggplot some data for mapping the world. When installing our packages, we installed one called “mapworld”. We are going to make a data frame using data from this package with the code below. We call this data frame “world”.

world <- map_data("world")

We should now have 4 data frames. These are “data”, “data_clean”, “kerguelensis” and “world”.

Making a simple map and setting boundaries

We are going to make a simple plot to see if everything is working as it should be. In the first line of code, we are asking ggplot to make a plot. The “+” symbol tells ggplot that we want to add more to this plot. In the second line, the “geom_polygon” function allows us to fill the plot space with data. under “data” we are choosing the recently made “world” data frame. The “aes” function lets us determine how the X and Y axis are filled. In this case, we use long and lat to fill x and y. These values come from the “world” data frame. We want to outline the countries with colour “darkgrey”, and fill them with colour “grey”. The alpha value controls the transparency of the data points (in this case, the world map). A value of 1 makes data points completely solid, where as 0.5 would make data points halfway transparent.

ggplot() +
    geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1)

We don’t need to see the whole world when our data is focused in the Southern Ocean. We can easily define our plot boundaries by adding a new line at the end of our code. This tells ggplot that we want to keep adding values and rules to our plot. I have chosen an xlimit (xlim) and ylimit (ylim) that correspond to lats and longs of interest. You will notice that a “+” symbol has been added to the end of the previous line of code. We only need to map between 50 and 160 East, and -30 and -70 South, so these values are entered into the code below. The coordinates line of code (coord_sf) is usually the last line of code for the plot, so anytime you want to add new lines of code, make sure they go above this line.

ggplot() +
    geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
    coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Adding data points to our map

Now, we can start to add data points. The code below can be copied, pasted and reused - as long as what you enter after “data” (in the geom_point line of code) corresponds to your data frame, and “x” and “y” correspond to values within that data frame. For example, the value for “x” is “Longitude”. That is exactly how it is spelt in my data frame. Putting a value of “long” or “longitude” (not capitalized), would cause an error. I have put the data frame “Cal_prop” into the data value, and the data points now show up on our map. I have chosen “deepskyblue4” as the colour for these points, but this can be easily changed to another colour. https://derekogle.com/NCGraphing/resources/colors provides a great resource for colours in R. If using Hexadecimal colours, make sure you place a “#” before the text.

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude), colour="steelblue4", pch=20, size=2) +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

The map above shows us the occurrence of Calanus propinquus from CPR records, but it doesn’t show us all CPR effort. We need to understand where the CPR sampled and didn’t find Calanus propinquus to be able to make any assumptions about the data. To do this, we need to also map all CPR samples by adding another line of code. This is the main reason that we kept the original “data” data frame as it was.

The new line of code is below:

geom_point(data=data, aes(x=Longitude, y=Latitude), colour=“grey”, pch=20, size=2) +

This line of code is very similar to the code that produced our Calanus data points, with two small changes. Under the “data” prompt, instead of “Cal_prop” like before, we have chosen the “data” data frame. We have also changed the colour from “steelblue4” to “grey”, so we can differentiate between the two data sets. I put this line of code directly under the geom_polygon line, as I think it should be the first thing we place on the world map. Make sure you include the “+” at the end of the line, so R knows to move to the next line and continue adding more to our plot. When put all together, the map looks like this:

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
    geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=2) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude), colour="steelblue4", pch=20, size=2) +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Cleaning the map

This map looks okay, but it is quite messy and hard to read. One way to clean up the data, is to group the lat and long into rounded numbers. For example, instead of a data point at -52.0365, and -52.0347, both data points would be grouped under -52. This means you lose some resolution in the data, but it becomes much easier to visualize. Whether you do this or not, depends on the question you are trying to answer. When rounding lat and long, we round all data frames, uncluding the “data”, “data_clean”, and the “Cal_prop” data frame. The code to round the lat and long is below

data <- dplyr::mutate(data, Latitude = round(Latitude, digits = 0))
data <- dplyr::mutate(data, Longitude = round(Longitude, digits = 0))
data_clean <- dplyr::mutate(data_clean, Latitude = round(Latitude, digits = 0))
data_clean <- dplyr::mutate(data_clean, Longitude = round(Longitude, digits = 0))
Cal_prop <- dplyr::mutate(Cal_prop, Latitude = round(Latitude, digits = 0))
Cal_prop <- dplyr::mutate(Cal_prop, Longitude = round(Longitude, digits = 0))

Once this is done, we can re run our map code, and see how it looks.

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
    geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=2) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude), colour="steelblue4", pch=20, size=2) +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

This looks much better. For more contrast between CPR data points and Calanus propinquus presence, we can change the size of the data points. For this example, let’s change the size of the CPR data points from 2 to 1. We make this change in the “size=” argument at the end of our target line, which in this case is the first “geom_point” line.

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey" ,fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude), colour="steelblue4", pch=20, size=2) +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Adding a legend and figure title

Adding a legend can become a little tricky when you are using multiple data frames, but with some small manipulation to our code, we can get it done. Firstly, we want to move the “colour=” prompt inside the “aes()” prompt. To do this, we just need to move one bracket. We have simply moved the end bracket from “y=Latitude” to colour=steelblue4”. See the difference between the two lines below.

geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude), colour=“steelblue4”, pch=20, size=2) +

geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour=“steelblue4”), pch=20, size=2) +

Next, we need to change the colour value from “steelblue4” to “Calanus propinquus”. This only works in conjunction with a new line of code, which is below:

 scale_color_manual(values = c("Calanus propinquus" = "steelblue4"))

By putting the colour prompt inside the aes(), we are telling ggplot that we want to have a legend. By saying “Calanus propinquus” inside the colour prompt, we are telling R, this is the name of my data point in the legend. By adding the scale_colour_manual line, we are telling R what colour we want this data point to be. This is the final code, and final result.

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
  scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

The position of the legend is squishing our plot too much. We want to move the legend to the bottom, and we also want to get rid of the word “colour” from the legend. We can do both of these things with the following line.

  theme(legend.title = element_blank(), legend.position = "bottom")

When put all together..

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
  scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
  theme(legend.title = element_blank(), legend.position = "bottom") +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Finally, we will add a plot title. This is achieved with the following line of code

  labs(title = "Calanus Propinquus", subtitle = "Southern Ocean distribution")

When mapped all together:

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
  scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
  theme(legend.title = element_blank(), legend.position = "bottom") +
  labs(title = "Calanus Propinquus", subtitle = "Southern Ocean distribution") +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Adding more species

We can easily add another species to our map by following some previous steps from above. Firstly, we make a new data frame with our next target species - let’s choose Euphausia vallentini. Because we have already rounded the lat and long in our “data_clean” file, we don’t need to do this again, as this is where we make our new data frame from.

Euph_val <- dplyr::filter(data_clean, Species == "Euphausia.vallentini")

Once we have our new species data frame, we can add a new line to our code. We can copy and paste the Calanus Propinquus “geom_point” code, and change two variables. We change data from “Cal_prop” to “Euph_val” and we change the colour from “Calanus propinquus” to “Euphausia vallentini”. We then go to our “scale_color_manual” line and add a new colour. Both new lines look like this:

geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour=“Euphausia vallentini”), pch=20, size=2) +

scale_color_manual(values = c(“Calanus propinquus” = “steelblue4”, “Euphausia vallentini” = “olivedrab4”)) +

When you add the new geom_point line, and add the new colour to the scale_colour_manual line, the overall code looks like this:

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
  geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour="Euphausia vallentini"), pch=20, size=2) +
  scale_color_manual(values = c("Calanus propinquus" = "steelblue4", "Euphausia vallentini" = "olivedrab4")) +
  theme(legend.title = element_blank(), legend.position = "bottom") +
  labs(title = "Calanus Propinquus + Euphausia vallentini", subtitle = "Southern Ocean distribution") +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Adding SO fronts to the map

The SOmap package includes Southern Ocean fronts built into the package. Because we have installed and loaded that package, we can borrow these fronts for our map. We do this with the code below. This makes a data frame called “fronts”.

fronts <- SOmap_data$fronts_park

We can then add this to our map using a “geom_sf” line near the end of our code.

ggplot()+
  geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
  geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
  geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
  geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour="Euphausia vallentini"), pch=20, size=2) +
  scale_color_manual(values = c("Calanus propinquus" = "steelblue4", "Euphausia vallentini" = "olivedrab4")) +
  theme(legend.title = element_blank(), legend.position = "bottom") +
  labs(title = "Calanus Propinquus + Euphausia vallentini", subtitle = "Southern Ocean distribution") +
  geom_sf(data=fronts, inherit.aes = FALSE, colour = "darkgrey") +
  coord_sf(xlim = c(50,160), ylim=c(-70,-30))

Mapping abundance instead of presence/absence

Finally, we will map abundance of species instead of presence/absence