Making maps in GGplot
October 2023
- Install and load packages
- Loading data
- Cleaning the data
- Making a species specific data frame
- Making a simple map and setting boundaries
- Adding data points and a legend to our map
- Cleaning the map
- Adding a figure title
- Adding more species
- Adding SO fronts to the map
- Mapping abundance instead of presence/absence
This markdown document will provide you with a step by step process on how to make maps using the package ggplot2. The first step for any project, is to organise our files and set our working directory. I usually make a new folder on my desktop where all my data, code and outputs will go. In this example, I have made a folder called “Making Maps”. To set the working directory use the “Session” dropdown at the top of R studio, choose “Set Working Directory”, and then “Choose Directory”. Navigate to the folder you have created and press “open”. Save all the data you need into this folder, so it is easy to access.
Install and load packages
The next step is to load the packages that you will need - these are below. You might need to install some of these before you can load them. This is done using the “install packages…” button under the “tools” dropdown at the top of R Studio. Installing “SOmap” is a little different. The code to install this package is below
install.packages("SOmap", repos = c(SCAR = "https://scar.r-universe.dev",
CRAN = "https://cloud.r-project.org"))Once all packages are installed, run the code below
library(ggplot2)
library(xfun)
library(dplyr)
library(reshape)
library(maps)
library(mapdata)
library(ggtext)
library(SOmap)
library(gganimate)Loading data
Next, we will need our data. For this example, I have used Continuous Plankton Recorder zooplankton data taken from the AADC - https://data.aad.gov.au/metadata/records/AADC-00099. The data is downloaded as a .csv file, which is great for using in R. If opened in Excel, the data should look like the image below.
We need to load this data set into our R work space. When loading the data in, it will be turned into whats called a “data frame”. We use the code below to load it in. In the code, we are telling R to make a data frame called “data”, and we are using the “read.csv” function to find our file. If you have set your working directory correctly and saved the .csv into that folder, you should be able to find the file easily. Inside the parentheses, type “” and then press the tab button. This will show you the files that are currently in your working directory, and you can navigate to your desired .csv file.
data <- read.csv("AADC_Zoop_CPR_Data.csv")A data frame has now been created, and will appear in the top right of your work space (image below)
We can click on this data frame and it will open in R, showing us our rows, columns and values.
Cleaning the data
The columns we are interested in keeping for our map are “Latitude”, “Longitude”, “Year_Local”, “Month_Local” and all the species columns at the end. To use this data effectively in a map, we need to remove the columns we don’t want. We can do this with the line of code below. Instead of cleaning the original data frame, I have decided to make a new data frame called “data_clean”. This means we will keep the original data frame untouched, and our clean up will happen on a new data frame. The reason for keeping the original data frame will become important later on.
data_clean = subset(data, select = -c(1, 2, 3, 4, 7, 10, 11))In the code above, we are asking R to grab our data frame, and make a subset of it (into a new data frame). when we use “-c”, we are asking R to remove columns for us. The columns we want removed are chosen by the numbers in the parentheses. Once you run this code, the data frame should now look like the image below
Currently, each species has its own column.
For mapping data points in ggplot, it is much easier if all species are
in the one column. This makes it easier for us to select individual
species for mapping, as well as plotting different species abundance
against one another. To do this, we need to “melt” the data frame, which
is done in the code below.
data_clean <- melt(data_clean, id = c("Month", "Year", "Latitude", "Longitude"))We are asking R to take the data frame, and stack the species columns on top of each other. We do not want to stack the columns “Latitude”, “Longitude”, “Year_local”, and “Month_local”, so we mention this in the code above. R stacks the species in a new column and calls it “variable.” It also makes a new column called “value” which is the abundance count for each species at each lat/long. The result should look like the image below.
If you scroll down through the data frame, you will now see that each species is stacked, starting with Abylidae, and then onto Acartia etc. The next step we take, is to change the column names. This is not necessary, but it makes the data frame easier to read - code below
colnames(data_clean)[5] = "Species"
colnames(data_clean)[6] = "Abundance"Next, we want to remove 0’s from our data. This is done with the code below
data_clean = data_clean[data_clean$Abundance > 0.1,]Making a species specific data frame
Now that our data is ready, we can start to target a specific species. To do this, we need to make a separate data frame. Let’s say we want to target Calanus propinquus- we need to know exactly how it is spelt in our data frame. We can do this by clicking on our data frame and using the search function in the top right. Once we know how it is spelt, we use the dplyr filter function to make a new data frame (shown below). We have called our new data frame “Cal_prop”. If successful, you will see a new data frame appear in the top right window of your R program.
Cal_prop <- dplyr::filter(data_clean, Species == "Calanus.propinquus")Now we can start to build our map using ggplot. There are a few steps we need to take to make sure the map works. Firstly, we need to give ggplot some data for mapping the world. When installing our packages, we installed one called “mapworld”. We are going to make a data frame using data from this package with the code below. We call this data frame “world”.
world <- map_data("world")We should now have 4 data frames. These are “data”, “data_clean”, “kerguelensis” and “world”.
Making a simple map and setting boundaries
We are going to make a simple plot to see if everything is working as it should be. In the first line of code, we are asking ggplot to make a plot. The “+” symbol tells ggplot that we want to add more to this plot. In the second line, the “geom_polygon” function allows us to fill the plot space with data. under “data” we are choosing the recently made “world” data frame. The “aes” function lets us determine how the X and Y axis are filled. In this case, we use long and lat to fill x and y. These values come from the “world” data frame. We want to outline the countries with colour “darkgrey”, and fill them with colour “grey”. The alpha value controls the transparency of the data points (in this case, the world map). A value of 1 makes data points completely solid, where as 0.5 would make data points halfway transparent.
ggplot() +
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1)We don’t need to see the whole world when our data is focused in the Southern Ocean. We can easily define our plot boundaries by adding a new line at the end of our code. This tells ggplot that we want to keep adding values and rules to our plot. I have chosen an xlimit (xlim) and ylimit (ylim) that correspond to lats and longs of interest. You will notice that a “+” symbol has been added to the end of the previous line of code. We only need to map between 50 and 160 East, and -30 and -70 South, so these values are entered into the code below. The coordinates line of code (coord_sf) is usually the last line of code for the plot, so anytime you want to add new lines of code, make sure they go above this line.
ggplot() +
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
coord_sf(xlim = c(50,160), ylim=c(-70,-30))Adding data points and a legend to our map
Now, we can start to add data points. The code below can be copied, pasted and reused - as long as what you enter after “data” (in the geom_point line of code) corresponds to your data frame, and “x” and “y” correspond to values within that data frame. For example, the value for “x” is “Longitude”. That is exactly how it is spelt in my data frame. Putting a value of “long” or “longitude” (not capitalized), would cause an error. I have put the data frame “Cal_prop” into the data value, and the data points now show up on our map.
If we want a legend to be added with our data, we need to tell R what each data point is to be called, and what colours are to be used. We do this with the ” scale_color_manual” function. Firstly, we make sure the “colour” argument is within the aes() in our geom_point line. Within this colour argument, we write what we want the data point to be called - in this instance, it’s “Calanus propinquus”. Finally, we add a scale_color_manual line, to tell R what colour we want our data point to be.
For our first set of data, I have chosen “deepskyblue4” as the colour for these points, but this can be easily changed to another colour. https://derekogle.com/NCGraphing/resources/colors provides a great resource for colours in R. If using Hexadecimal colours, make sure you place a “#” before the text.
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour="Calanus propinquus"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
coord_sf(xlim = c(50,160), ylim=c(-70,-30))The position of the legend is squishing our plot too much. We want to move the legend to the bottom, we want to get rid of the word “colour” from the legend, and we want to change the size of the legend text. We can all of these things with the following line.
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12))All together, it looks like this:
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour="Calanus propinquus"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12)) +
coord_sf(xlim = c(50,160), ylim=c(-70,-30))The map above shows us the occurrence of Calanus propinquus from CPR records, but it doesn’t show us all CPR effort. We need to understand where the CPR sampled and didn’t find Calanus propinquus to be able to make any assumptions about the data. To do this, we need to also map all CPR samples by adding another line of code. This is the main reason that we kept the original “data” data frame as it was.
The new line of code is below:
geom_point(data=data, aes(x=Longitude, y=Latitude), colour=“grey”, pch=20, size=2) +
This line of code is very similar to the code that produced our Calanus data points, with two small changes. Under the “data” prompt, instead of “Cal_prop” like before, we have chosen the “data” data frame. We have also changed the colour from “steelblue4” to “grey”, so we can differentiate between the two data sets. I put this line of code directly under the geom_polygon line, as I think it should be the first thing we place on the world map. Make sure you include the “+” at the end of the line, so R knows to move to the next line and continue adding more to our plot. When put all together, the map looks like this:
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=2) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour="Calanus propinquus"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12)) coord_sf(xlim = c(50,160), ylim=c(-70,-30))Cleaning the map
This map looks okay, but it is quite messy and hard to read. One way to clean up the data, is to group the lat and long into rounded numbers. For example, instead of a data point at -52.0365, and -52.0347, both data points would be grouped under -52. This means you lose some resolution in the data, but it becomes much easier to visualize. Whether you do this or not, depends on the question you are trying to answer. When rounding lat and long, we round all data frames, including the “data”, “data_clean”, and the “Cal_prop” data frame. The code to round the lat and long is below
data <- dplyr::mutate(data, Latitude = round(Latitude, digits = 0))
data <- dplyr::mutate(data, Longitude = round(Longitude, digits = 0))
data_clean <- dplyr::mutate(data_clean, Latitude = round(Latitude, digits = 0))
data_clean <- dplyr::mutate(data_clean, Longitude = round(Longitude, digits = 0))
Cal_prop <- dplyr::mutate(Cal_prop, Latitude = round(Latitude, digits = 0))
Cal_prop <- dplyr::mutate(Cal_prop, Longitude = round(Longitude, digits = 0))Once this is done, we can re run our map code, and see how it looks.
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=2) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour="Calanus propinquus"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12)) coord_sf(xlim = c(50,160), ylim=c(-70,-30))This looks much better. For more contrast between CPR data points and Calanus propinquus presence, we can change the size of the data points. For this example, let’s change the size of the CPR data points from 2 to 1. We make this change in the “size=” argument at the end of our target line, which in this case is the first “geom_point” line.
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey" ,fill="grey", alpha=1) +
geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, colour="Calanus propinquus"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12))+
coord_sf(xlim = c(50,160), ylim=c(-70,-30))Adding a figure title
We will now need to add a plot title. We can do this with the line of code below - This also sets the size of the title text. I have chosen size 22 for the main title, and 18 for the sub title.
labs(title = "Calanus Propinquus", subtitle = "Southern Ocean distribution", plot.title = element_text(size=22), plot.subtitle = element_text(size=18))Adding more species
We can easily add another species to our map by following some previous steps from above. Firstly, we make a new data frame with our next target species - let’s choose Euphausia vallentini. Because we have already rounded the lat and long in our “data_clean” file, we don’t need to do this again, as this is where we make our new data frame from.
Euph_val <- dplyr::filter(data_clean, Species == "Euphausia.vallentini")Once we have our new species data frame, we can add a new line to our code. We can copy and paste the Calanus Propinquus “geom_point” code, and change two variables. We change data from “Cal_prop” to “Euph_val” and we change the colour from “Calanus propinquus” to “Euphausia vallentini”. We then go to our “scale_color_manual” line and add a new colour. Both new lines look like this:
geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour=“Euphausia vallentini”), pch=20, size=2) +
scale_color_manual(values = c(“Calanus propinquus” = “steelblue4”, “Euphausia vallentini” = “olivedrab4”)) +
When you add the new geom_point line, and add the new colour to the scale_colour_manual line, the overall code looks like this:
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour="Euphausia vallentini"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4", "Euphausia vallentini" = "olivedrab4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12)) +
labs(title = "Calanus Propinquus + Euphausia vallentini", subtitle = "Southern Ocean distribution", plot.title = element_text(size=22), plot.subtitle = element_text(size=18)) +
coord_sf(xlim = c(50,160), ylim=c(-70,-30))Adding SO fronts to the map
The SOmap package includes Southern Ocean fronts built into the package. Because we have installed and loaded that package, we can borrow these fronts for our map. We do this with the code below. This makes a data frame called “fronts”.
fronts <- SOmap_data$fronts_parkWe can then add this to our map using a “geom_sf” line near the end of our code.
ggplot()+
geom_polygon(data = world, aes(x=long, y=lat, group=group), colour="darkgrey",fill="grey", alpha=1) +
geom_point(data=data, aes(x=Longitude, y=Latitude), colour="grey", pch=20, size=1) +
geom_point(data=Cal_prop, aes(x=Longitude, y=Latitude, size=2, colour="Calanus propinquus"), pch=20, size=2) +
geom_point(data=Euph_val, aes(x=Longitude, y=Latitude, size=2, colour="Euphausia vallentini"), pch=20, size=2) +
scale_color_manual(values = c("Calanus propinquus" = "steelblue4", "Euphausia vallentini" = "olivedrab4")) +
theme(legend.title = element_blank(), legend.position = "bottom", legend.text=element_text(size=12)) +
labs(title = "Calanus Propinquus + Euphausia vallentini", subtitle = "Southern Ocean distribution", plot.title = element_text(size=22), plot.subtitle = element_text(size=18)) +
geom_sf(data=fronts, inherit.aes = FALSE, colour = "darkgrey") +
coord_sf(xlim = c(50,160), ylim=c(-70,-30))Mapping abundance instead of presence/absence
Finally, we will map abundance of species instead of presence/absence