1 Introduction

Geospatial data contains information related to locations on Earth and are an important type of data with a wide range of applications. Geospatial data comes in two main forms:

The R language has a collection of packages that make working with geospatial data quick and intuitive. In this vignette I have presented an example that shows off the basic functionalities of the follow packages:

2 Australian agricultural land use and climate data

This example makes use of one source of vector data and one source of raster data.

The vector data is land use commodities from the Australian Department of Agriculture. It is a multipart polygon feature in which each polygon is an area that has been identified as being used to produce a particular agriculture commodity.

The raster data is annual rainfall from the Australian Bureau of Meteorology. The average annual rainfall is calculated over the years 1961 to 1990 and is given in a resolution of 5 km.

library(here)

# download land use data from https://www.agriculture.gov.au/abares/aclump/land-use/catchment-scale-land-use-of-australia-commodities-update-december-2018
land_use_zip <- tempfile()
download.file("https://www.agriculture.gov.au/sites/default/files/documents/clum_commodities_2018_v2.zip", land_use_zip)
unzip(land_use_zip)
unlink(land_use_zip)

# download rainfall from http://www.bom.gov.au/jsp/ncc/climate_averages/rainfall/index.jsp?period=an&area=oz#maps
rainfall_zip <- tempfile()
download.file("http://www.bom.gov.au/web01/ncc/www/climatology/rainfall/rainan.zip", rainfall_zip)
unzip(rainfall_zip, exdir   = here("data"))
unlink(rainfall_zip)

3 Loading the data

The vector based land use data is loaded using readOGR from the rgdal package. To read the shapefile the “data source name” is the full file destination and the “layer” is the filename. We can quickly view the data using the plot function to see the land use polygons which are scattered across Australia.

library(rgdal)

# load shapefile
land_use <- readOGR(dsn  = here("data\\Spatial_data_zip\\CLUM_Commodities_2018_v2.shp"), layer = "CLUM_Commodities_2018_v2")
plot(land_use)

summary gives an overview of the attributes. We can see that the “Commod_dsc” and “Tertiary” attributes give detailed information about the land use type, however for this example we will only use “Broad_type” which categorises the land uses into 12 groups.

## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x 113.66138 153.59937
## y -43.35307 -11.21385
## Is projected: FALSE 
## proj4string : [+proj=longlat +ellps=GRS80 +no_defs]
## Data attributes:
##         Commod_dsc                   Broad_type      Source_yr    
##  cattle dairy:29240   Animals             :61191   2017   :68486  
##  grapes      :15201   Fruits              :58392   2016   :43147  
##  cattle meat :12047   Other crops         : 5350   2015   :20790  
##  grapes wine :11170   Nuts                : 4965   2014   : 1464  
##  sheep       : 7061   Vegetables and herbs: 4208   2009   : 1087  
##  grapes table: 6982   Cereals             : 1545   2012   : 1079  
##  (Other)     :56198   (Other)             : 2248   (Other): 1846  
##      State          Area_ha           Lucodev8n    
##  Vic    :84164   Min.   :     0.0   320    :30079  
##  WA     :15227   1st Qu.:     0.7   444    :15565  
##  NSW    :11538   Median :     3.4   449    :14587  
##  SA     :10690   Mean   :   433.9   441    :14068  
##  NT     : 7364   3rd Qu.:    21.9   420    :12363  
##  Qld    : 5275   Max.   :753505.0   210    : 4780  
##  (Other): 3641                      (Other):46457  
##                                       Tertiary       Shape_Leng       
##  3.2.0 Grazing modified pastures          :30079   Min.   : 0.000000  
##  4.4.4 Irrigated vine fruits              :15565   1st Qu.: 0.004179  
##  4.4.9 Irrigated grapes                   :14587   Median : 0.009225  
##  4.4.1 Irrigated tree fruits              :14068   Mean   : 0.025297  
##  4.2.0 Grazing irrigated modified pastures:12363   3rd Qu.: 0.023028  
##  2.1.0 Grazing native vegetation          : 4780   Max.   :14.367872  
##  (Other)                                  :46457                      
##    Shape_Area       
##  Min.   :0.0000000  
##  1st Qu.:0.0000006  
##  Median :0.0000033  
##  Mean   :0.0003771  
##  3rd Qu.:0.0000218  
##  Max.   :0.6570078  
## 

The rainfall data is loaded using raster. Again plot gives us a quick inital look at the data and the scale which in annual rainfall in mm.

library(raster)

# load raster
rainfall <- raster(here("data\\rnozan.txt"))
plot(rainfall)

4 Filtering the data

The national datasets are quite slow to work with due to their size. For this example we are going to filter out only the data for Tasmania. We can use indexing of the “State” attribute to filter the land use spatial data frame.

# filter subset of polygon data
land_use_TAS <- land_use[land_use$State == "Tas", ]
plot(land_use_TAS)

And for the rainfall data we use crop from the raster package. Here we have described the latitude and longitude of the extent we want directly. An alternative method is drawExtent which allows you to select the extent by clicking on the plot.

# filter subset of raster data
rainfall_TAS <- crop(rainfall, extent(142, 152, -46, -39))
plot(rainfall_TAS)

5 Coordinate Reference Systems (CRS)

Coordinate Reference Systems are used to define how 2D geospatial data relates to actual locations on the Earth’s surface. When combining spatial data its important to ensure that they have the same CRS. Shapefiles have a source CRS that can be checked with proj4string.

# check the source CRS of the polygon data
proj4string(land_use_TAS) 
## [1] "+proj=longlat +ellps=GRS80 +no_defs"

We then assign the same CRS to the rainfall data, which did not have a CRS defined in the source file.

# set CRS projection for the raster
projection(rainfall_TAS) <- CRS("+proj=longlat +ellps=GRS80 +no_defs")

More details on CRS’s in R can be found here.

6 Mapping with Leaflet

Leaflet is a powerful tool for creating interactive maps. Leaflets can include features such as maps from external sources, vector layers, rasters, control buttons and popups with attribute details. Here we create a leaflet with our vector and raster data overlayed on an OpenStreetMap background.

library(leaflet)

# create colour palettes for both objects
land_use_pal <- colorFactor("Dark2", land_use_TAS$Broad_type)
rainfall_pal <- colorNumeric("Blues", values(rainfall_TAS))

# create leaflet with Openstreetmap background, vector and raster data overlayed
leaflet(land_use_TAS) %>% 
  addProviderTiles(providers$CartoDB.Positron, options = tileOptions(minZoom = 8, maxZoom = 12)) %>% 
  addPolygons(weight = 2,  smoothFactor = 0.8, opacity = 1.0, color = ~land_use_pal(Broad_type)) %>% 
  addRasterImage(rainfall_TAS, colors = rainfall_pal, opacity = 0.5) %>% 
  addLegend("bottomright", pal = land_use_pal, values = ~Broad_type, title = "Type of land use") %>%
  addLegend("topright", pal = rainfall_pal, values = values(rainfall_TAS), title = "Average annual </br> rainfall (mm)")

7 Combining the data

Often a key step in analysing vector and raster data is to combine them. Here we obtain rainfall values within each land use polygon using raster’s extract. The polygon vector object is a S4 object and to assign a new attribute to it we use “@” to reference the “data” slot of the object.

# extract the raster values within each polygon to a list object
rainfall_TAS.vals <- raster::extract(rainfall_TAS, land_use_TAS)

# the rainfall raster pixels are relatively large so most polygons only have one value
# to obtain one average value for every polygon calculate the mean values with lapply
rainfall_TAS.mean <- unlist(lapply(rainfall_TAS.vals, FUN=mean))

# join the average values to a new attribute in the "data" slot of the polygon S4 object
land_use_TAS@data <- data.frame(land_use_TAS@data, mean_rainfall = rainfall_TAS.mean)

8 Anlaysis with dplyr and graphing with ggplot2

Now that our vector object contains the combined information we can manipulate it and visualise the results as we would a conventional dataframe. Here we access the attibute data of the polygon object by coercing it to a data frame using as.data.frame.

library(dplyr)
library(ggplot2)
library(stringr)

# visualise the annual rainfall in each land use area using a boxplot
as.data.frame(land_use_TAS) %>% 
  ggplot(aes(x = reorder(Broad_type, -mean_rainfall, mean), y = mean_rainfall, fill = Broad_type)) +
  geom_boxplot(show.legend = FALSE) +
  ggtitle("Rainfall in Tasmanian agricultural land uses") +
  scale_x_discrete(name = "Land use", labels = function(x) str_wrap(x, width = 10)) +
  scale_y_continuous(name = "Average annual rainfall (mm)") +
  scale_fill_brewer(palette = "Dark2")

Here we read and write to the attribute values using the “@data” slot method.

# create a new attribute that splits the rainfall results into bins
land_use_TAS@data <- as.data.frame(land_use_TAS) %>% 
  mutate(rainfall_bin = cut(mean_rainfall, breaks = c(500, 625, 750, 875, 1000, 1125, 1250, 1375, 1500), labels=c("625","750","875","1000","1125","1250","1375","1500")))

# calculate proportion of land area type that falls into each rainfall bin and chart 
land_use_TAS@data %>%
  group_by(Broad_type, rainfall_bin) %>% 
  summarise(sub_area = sum(Area_ha)) %>%
  filter(!is.na(rainfall_bin)) %>% 
  mutate(percent_area = sub_area / sum(sub_area)) %>% 
  ggplot(aes(x = rainfall_bin, y = percent_area, fill = Broad_type)) +
  geom_col() +
  ggtitle("Distribution of Tasmanian agricultural land by annual rainfall") +
  theme(axis.text.x  = element_text(angle=90, vjust=0.3), legend.position = "none", aspect.ratio = 3/5) +
  scale_x_discrete(name = "Average annual rainfall (mm)") +
  scale_y_continuous(name = "Percentage of land use area", labels = scales::percent) +
  facet_wrap(~Broad_type) +
  scale_fill_brewer(palette = "Dark2")

9 References

Agafonkin V. 2014, Leaflet for R, RStudio, viewed 14 August 2020, https://rstudio.github.io/leaflet/

Bivand R. et al. 2020, Package ‘rgdal’, CRAN, viewed 13 August 2020, https://cran.r-project.org/web/packages/rgdal/rgdal.pdf

Bureau of Meteorology 2016, Average annual, seasonal and monthly rainfall, Australian Government, viewed 13 August 2020, http://www.bom.gov.au/jsp/ncc/climate_averages/rainfall/index.jsp?period=an&area=oz#maps

Department of Agriculture, Water and the Environment 2019, Catchment scale land use of Australia - Commodities - Update December 2018, Australian Government, viewed 9 August 2020, https://www.agriculture.gov.au/abares/aclump/land-use/catchment-scale-land-use-of-australia-commodities-update-december-2018

Hijmans R.J. 2020, Package ‘raster’, CRAN, viewed 13 August 2020, https://cran.r-project.org/web/packages/raster/raster.pdf

Frazier M. 2020, Overview of Coordinate Reference Systems (CRS) in R, National Center for Ecological Analysis and Synthesis, viewed 16 August 2020, https://www.nceas.ucsb.edu/sites/default/files/2020-04/OverviewCoordinateReferenceSystems.pdf