My fourth R Notebook: Joining geospatial data and non-spatial attributes

1. Introduction

This is the four notebook which Geomatica Basica students have to write to get started with R & RStudio. It aims at learning how to join non-spatial attributes to geospatial data. We will illustrate the topic by joining a spatial feature representing municipalities in Boyacá (i.e. a shapefile provided by DANE, 2017) and a non-spatial data frame with agriculture statistics (i.e a csv provided by the Ministerio de Agricultura y Desarrollo Rural, 2020).

2. Loading the required libraries

Let’s start loading several libraries, in particular dplyr:

# install packages before anything
library(tidyverse)
library(dplyr)
library(ggplot2)
library(sf)

3. Reading a table with Boyaca’s agriculture statistics

We start by reading a file with EVA statistics for Boyaca.

eva_boyaca <- read_csv(file = "../../datos/EVA_Boyaca_2019.csv")

eva_boyaca

4. Reading a shapefile with Boyaca’s municipalities

We will use the shapefile corresponding to Marco Geoestadistico Departamental which is available at DANE Geoportal.

Let’s start reading the data using the sf library:

mun_boyaca <- sf::st_read("../../datos/shapefiles/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp")

## Reading layer `MGN_MPIO_POLITICO' from data source `/Users/ivanlizarazo/Documents/ivan/GB/datos/shapefiles/15_BOYACA/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp' using driver `ESRI Shapefile'
## Simple feature collection with 123 features and 2 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -74.66496 ymin: 4.655196 xmax: -71.94885 ymax: 7.055557
## geographic CRS: WGS 84

What is mun_boyaca?

mun_boyaca

Note that mun_boyaca is a simple feature collection. Make sure to review this link for understanding what is a simple feature.

Note also that the data uses the WGS1984 geographic coordinate reference system (i.e. 4326 epsg code).

Let’s calculate area of each municipality:

mun_boyaca$KM2 <- st_area(st_transform(mun_boyaca, 3116))/1E6

mun_boyaca$KM2 <- as.numeric(mun_boyaca$KM2)

mun_boyaca$KM2 <- round(mun_boyaca$KM2,3)

min(mun_boyaca$KM2)

## [1] 25.353

max(mun_boyaca$KM2)

## [1] 1513.601

Let’s make a map of municipalities:

library(leaflet)
bins <- c(0, 150, 300, 450, 600, 750, 900, 1200, 1600)
pal <- colorBin("RdYlGn", domain = mun_boyaca$KM2, bins = bins)

  mapa <- leaflet(data = mun_boyaca) %>%
  addTiles() %>%
  addPolygons(label = ~KM2,
              popup = ~MPIO_CNMBR,
              fillColor = ~pal(KM2),
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
              ) %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  addLegend("bottomright", pal = pal, values = ~KM2,
    title = "Municipalities extent [Km2] (DANE, 2018)",
    opacity = 1
  )

mapa

5. Joining attributes to the spatial object

5.1 The joining task

We can use the left_join function, provided by the dplyr package, to join the municipalities and the agricultural statistics.

Let’s review how left_join works:

Left joins are a type of mutating join, since they simply add columns to the first table. To perform a left join with dplyr, call left_join(), passing two tibbles and a character vector of columns to join on.

# do not uncomment
# this is just an explanation of the function syntax
#left_join(a_tibble, another_tibble, by = c("id_col1", "id_col2"))

We need a common attribute (or shared variable) to base the left join on. The best attribute is an ID. In mun_boyaca, the MPIO_CCDGO attribute seems fine (it reads 15001 for Tunja). In eva_boyaca, the corresponding attribute is COD_MUN (it reads 15001 for Tunja).

However, the MPIO_CCDGO attribute is a string:

class(mun_boyaca$MPIO_CCDGO)

## [1] "character"

And the COD_MUN attribute is numeric:

class(eva_boyaca$COD_MUN)

## [1] "numeric"

To be able to do the join, we need to change both data type and contents for the code which identifies every municipality. For this task, it is a good idea to create a copy of the original statistical data. Using this approach, any false movement will not spoil the original data.

5.2 Editing tasks

Let’s proceed step by step:

mun_boyaca$COD_MUN <-  as.double(mun_boyaca$MPIO_CCDGO)

class(mun_boyaca$COD_MUN)

## [1] "numeric"

5.3 Selecting, filtering, and tyding statistics:

We saw in a previous notebook several functions for “editing” a tibble. Let’s filter the EVA tibble by crop type (e.g. an important crop as identified in the third notebook). In addition, let’s select several columns.

papa_boyaca <- eva_boyaca %>%  filter(CULTIVO == "PAPA")  %>%  dplyr::select(MUNICIPIO, COD_MUN, YEAR, PERIODO, TON_PROD, RENDIM)

papa_boyaca

head(papa_boyaca)

tail(papa_boyaca)

unique(papa_boyaca$YEAR)

##  [1] 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018

unique(papa_boyaca$PERIODO)

##  [1] "2006B" "2007A" "2007B" "2008A" "2008B" "2009A" "2009B" "2010A" "2010B"
## [10] "2011A" "2011B" "2012A" "2012B" "2013A" "2013B" "2014A" "2014B" "2015A"
## [19] "2015B" "2016A" "2016B" "2017A" "2017B" "2018A"

papa_boyaca %>% 
  gather("YEAR", "PERIODO", "TON_PROD", "RENDIM" , key = variable, value = number)

head(papa_boyaca)

tail(papa_boyaca)

summary(papa_boyaca)

##   MUNICIPIO            COD_MUN           YEAR        PERIODO         
##  Length:2228        Min.   :15001   Min.   :2006   Length:2228       
##  Class :character   1st Qu.:15226   1st Qu.:2009   Class :character  
##  Mode  :character   Median :15494   Median :2012   Mode  :character  
##                     Mean   :15475   Mean   :2012                     
##                     3rd Qu.:15757   3rd Qu.:2015                     
##                     Max.   :15879   Max.   :2018                     
##     TON_PROD         RENDIM     
##  Min.   :    2   Min.   : 0.50  
##  1st Qu.:  180   1st Qu.:10.00  
##  Median :  900   Median :13.00  
##  Mean   : 4298   Mean   :14.14  
##  3rd Qu.: 3721   3rd Qu.:17.68  
##  Max.   :95450   Max.   :44.55

5.4 Converting from long format into wide format

Before going further, let’s review several functions provided by the tidyr package: - gather() makes “wide” data longer - spread() makes “long” data wider - separate() splits a single column into multiple columns - unite() combines multiple columns into a single column

papa_boyaca %>% replace(is.na(.), 0) -> papa_boyaca2

papa_boyaca %>% group_by(MUNICIPIO, COD_MUN, YEAR) %>%
   summarize(TON_PROD=sum(TON_PROD)) -> papa_boyaca2

## `summarise()` has grouped output by 'MUNICIPIO', 'COD_MUN'. You can override using the `.groups` argument.

head(papa_boyaca2)

tail(papa_boyaca2)

The following chunk a code seems hard to understand. I would suggest that you study the documentation of the functions group, gather, unite, and pivot to gain an appropriate understanding of the chunk:

papa_boyaca2 %>% 
  group_by(COD_MUN) %>% 
  gather("TON_PROD", key = variable, value = number)   %>% 
  unite(combi, variable, YEAR) %>%
  pivot_wider(names_from = combi, values_from = number, values_fill = 0) ->                                                                    papa_boyaca3

Let’s check the head of the output:

head(papa_boyaca3)

Let’s check the tail of the output:

tail(papa_boyaca3)

5.5 Join attributes to the spatial object

Now, we can join the non-spatial attributes to the spatial simple feature object:

mun_boyaca_papa = left_join(mun_boyaca, papa_boyaca3, by="COD_MUN")

summary(mun_boyaca_papa)

##   MPIO_CCDGO         MPIO_CNMBR             KM2             COD_MUN     
##  Length:123         Length:123         Min.   :  25.35   Min.   :15001  
##  Class :character   Class :character   1st Qu.:  64.26   1st Qu.:15237  
##  Mode  :character   Mode  :character   Median : 122.70   Median :15500  
##                                        Mean   : 188.11   Mean   :15482  
##                                        3rd Qu.: 203.16   3rd Qu.:15722  
##                                        Max.   :1513.60   Max.   :15897  
##                                                                         
##   MUNICIPIO         TON_PROD_2007   TON_PROD_2008   TON_PROD_2009     
##  Length:123         Min.   :    0   Min.   :    0   Min.   :     0.0  
##  Class :character   1st Qu.:  359   1st Qu.:  168   1st Qu.:   236.5  
##  Mode  :character   Median : 2155   Median : 1320   Median :  1715.0  
##                     Mean   : 7036   Mean   : 7660   Mean   :  9085.0  
##                     3rd Qu.: 7238   3rd Qu.: 7428   3rd Qu.:  7802.5  
##                     Max.   :63000   Max.   :64000   Max.   :181575.0  
##                     NA's   :31      NA's   :31      NA's   :31        
##  TON_PROD_2010      TON_PROD_2011      TON_PROD_2012     TON_PROD_2013    
##  Min.   :     0.0   Min.   :     0.0   Min.   :    0.0   Min.   :    0.0  
##  1st Qu.:   274.5   1st Qu.:   251.2   1st Qu.:  264.2   1st Qu.:  265.5  
##  Median :  1887.5   Median :  1443.0   Median : 2365.5   Median : 2184.0  
##  Mean   :  9073.2   Mean   :  8069.9   Mean   : 7812.2   Mean   : 9058.9  
##  3rd Qu.:  6232.5   3rd Qu.:  6232.5   3rd Qu.: 7312.5   3rd Qu.: 8655.0  
##  Max.   :162990.0   Max.   :116510.0   Max.   :98160.0   Max.   :96000.0  
##  NA's   :31         NA's   :31         NA's   :31        NA's   :31       
##  TON_PROD_2014      TON_PROD_2015     TON_PROD_2016      TON_PROD_2017  
##  Min.   :     0.0   Min.   :    0.0   Min.   :     0.0   Min.   :    0  
##  1st Qu.:   228.2   1st Qu.:  279.5   1st Qu.:   238.5   1st Qu.:  255  
##  Median :  1965.0   Median : 1833.5   Median :  1500.0   Median : 1645  
##  Mean   :  9408.5   Mean   : 7889.8   Mean   : 10407.7   Mean   : 9347  
##  3rd Qu.:  9987.5   3rd Qu.: 6852.0   3rd Qu.:  7762.5   3rd Qu.: 7200  
##  Max.   :101140.0   Max.   :86000.0   Max.   :115500.0   Max.   :99100  
##  NA's   :31         NA's   :31        NA's   :31         NA's   :31     
##  TON_PROD_2018     TON_PROD_2006              geometry  
##  Min.   :    0.0   Min.   :    0.0   POLYGON      :123  
##  1st Qu.:  223.5   1st Qu.:  193.2   epsg:4326    :  0  
##  Median :  949.0   Median : 1150.0   +proj=long...:  0  
##  Mean   : 5997.5   Mean   : 3233.8                      
##  3rd Qu.: 4050.0   3rd Qu.: 3516.2                      
##  Max.   :66750.0   Max.   :30000.0                      
##  NA's   :31        NA's   :31

head(mun_boyaca_papa[,1:10])

tail(mun_boyaca_papa[,1:10])

We can see that CUBARA and BERBEO municipalities do not have values for the TON_PROD and REND attributes for the PAPA crop.

Why this happened? It is because the papa statistics for these municipalities are not available (NA).

papa_cubara <-  papa_boyaca2 %>% filter(COD_MUN==15223)

papa_cubara

6. Map potato production in Boyaca’s municipalities

Let’s load the libraries needed to make the map:

#install.packages("RColorBrewer")
library(RColorBrewer)
library(leaflet)

Let’s make a map of total production of potato in 2017:

bins <- c(0, 500, 1000, 5000, 10000, 15000, 25000, 50000, 75000)
pal <- colorBin("YlOrRd", domain = mun_boyaca_papa$TON_PROD_2017, bins = bins)

  mapa <- leaflet(data = mun_boyaca_papa) %>%
  addTiles() %>%
  addPolygons(label = ~TON_PROD_2017,
              popup = ~MPIO_CNMBR,
              fillColor = ~pal(TON_PROD_2017),
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
              ) %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  addLegend("bottomright", pal = pal, values = ~TON_PROD_2017,
    title = "Potato crop production in Boyaca [Ton] (2017)",
    opacity = 1
  )

mapa

7. Map potato harvested area in Boyaca’s municipalities

Now, we are going to repeat the process to make a different map, in this case the harvested area of potato crop in every municipality.

Let’s remind the EVA variables names:

summary(eva_boyaca)

##     COD_DPTO      DPTO              COD_MUN       MUNICIPIO        
##  Min.   :15   Length:20576       Min.   :15001   Length:20576      
##  1st Qu.:15   Class :character   1st Qu.:15238   Class :character  
##  Median :15   Mode  :character   Median :15511   Mode  :character  
##  Mean   :15                      Mean   :15486                     
##  3rd Qu.:15                      3rd Qu.:15753                     
##  Max.   :15                      Max.   :15897                     
##                                                                    
##     GRUPO             SUBGRUPO           CULTIVO            DESAGREG        
##  Length:20576       Length:20576       Length:20576       Length:20576      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##       YEAR        PERIODO            HA_SIEMBRA        HA_COSECHA     
##  Min.   :2006   Length:20576       Min.   :   0.00   Min.   :   0.00  
##  1st Qu.:2009   Class :character   1st Qu.:   7.00   1st Qu.:   6.00  
##  Median :2012   Mode  :character   Median :  20.00   Median :  18.00  
##  Mean   :2012                      Mean   :  94.11   Mean   :  81.19  
##  3rd Qu.:2015                      3rd Qu.:  61.00   3rd Qu.:  55.00  
##  Max.   :2018                      Max.   :8443.00   Max.   :7528.00  
##                                                                       
##     TON_PROD           RENDIM          CICLO          
##  Min.   :    0.0   Min.   :  0.07   Length:20576      
##  1st Qu.:   17.0   1st Qu.:  1.29   Class :character  
##  Median :   66.0   Median :  4.80   Mode  :character  
##  Mean   :  959.8   Mean   :  9.93                     
##  3rd Qu.:  315.0   3rd Qu.: 12.00                     
##  Max.   :95450.0   Max.   :200.00                     
##                    NA's   :193

Now, apply the filter & select functions to get the area cosechada en papa:

papa_boyaca <- eva_boyaca %>%  filter(CULTIVO == "PAPA")  %>%  dplyr::select(MUNICIPIO, COD_MUN, YEAR, PERIODO, HA_COSECHA)

Then, let’s aggregate PERIODO values into YEAR:

papa_boyaca %>% group_by(MUNICIPIO, COD_MUN, YEAR) %>%
   summarize(HA_COSECHA=sum(HA_COSECHA)) -> papa_boyaca2

Then, let’s reformat the statistics into wider format:

papa_boyaca2 %>% 
  group_by(COD_MUN) %>% 
  gather("HA_COSECHA", key = variable, value = number)   %>% 
  unite(combi, variable, YEAR) %>%
  pivot_wider(names_from = combi, values_from = number, values_fill = 0) ->                                                                    papa_boyaca3

Now, it’s join time:

mun_boyaca_papa = left_join(mun_boyaca, papa_boyaca3, by="COD_MUN")

Let’s check the output:

mun_boyaca_papa

Finally, mapping time:

bins <- c(0, 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000)
pal <- colorBin("RdYlGn", domain = mun_boyaca_papa$HA_COSECHA_2017, bins = bins)

  mapa <- leaflet(data = mun_boyaca_papa) %>%
  addTiles() %>%
  addPolygons(label = ~HA_COSECHA_2017,
              popup = ~MPIO_CNMBR,
              fillColor = ~pal(HA_COSECHA_2017),
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
              ) %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  addLegend("bottomright", pal = pal, values = ~HA_COSECHA_2017,
    title = "Potato harvested area in Boyaca [Ha] (2017)",
    opacity = 1
  )

The complete map can be displayed:

mapa

8. Further tasks

Feel free to explore more functionalities of the packages used here to examine and map agriculture trends in your department.

9. Reproducibility

sessionInfo()

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2 leaflet_2.0.4.1    sf_0.9-7           forcats_0.5.1     
##  [5] stringr_1.4.0      dplyr_1.0.4        purrr_0.3.4        readr_1.4.0       
##  [9] tidyr_1.1.2        tibble_3.0.6       ggplot2_3.3.3      tidyverse_1.3.0   
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0        xfun_0.21               bslib_0.2.4            
##  [4] haven_2.3.1             leaflet.providers_1.9.0 colorspace_2.0-0       
##  [7] vctrs_0.3.6             generics_0.1.0          htmltools_0.5.1.1      
## [10] yaml_2.2.1              rlang_0.4.10            e1071_1.7-4            
## [13] jquerylib_0.1.3         pillar_1.4.7            glue_1.4.2             
## [16] withr_2.4.1             DBI_1.1.1               dbplyr_2.1.0           
## [19] modelr_0.1.8            readxl_1.3.1            lifecycle_1.0.0        
## [22] munsell_0.5.0           gtable_0.3.0            cellranger_1.1.0       
## [25] rvest_0.3.6             htmlwidgets_1.5.3       evaluate_0.14          
## [28] knitr_1.31              crosstalk_1.1.1         class_7.3-18           
## [31] broom_0.7.5             Rcpp_1.0.6              KernSmooth_2.23-18     
## [34] scales_1.1.1            backports_1.2.1         classInt_0.4-3         
## [37] jsonlite_1.7.2          farver_2.0.3            fs_1.5.0               
## [40] hms_1.0.0               digest_0.6.27           stringi_1.5.3          
## [43] grid_4.0.4              cli_2.3.0               tools_4.0.4            
## [46] magrittr_2.0.1          sass_0.3.1              crayon_1.4.1           
## [49] pkgconfig_2.0.3         ellipsis_0.3.1          xml2_1.3.2             
## [52] reprex_1.0.0            lubridate_1.7.9.2       assertthat_0.2.1       
## [55] rmarkdown_2.7           httr_1.4.2              rstudioapi_0.13        
## [58] R6_2.5.0                units_0.6-7             compiler_4.0.4