This is our fourth R notebook :Joining geospatiaal data and non-spatial attributes, written using R Markdown Notebook. It aims at learning how to join non-spatial attributes to geospatial data. We will illustrate the topic by joining a spatial feature representing municipalities in Sucre (i.e. a shapefile provided by DANE, 2017) and a non-spatial data frame with agriculture statistics (i.e a csv provided by the Ministerio de Agricultura y Desarrollo Rural, 2020).
library(RColorBrewer)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(leaflet)
library(sf)
We start by reading a file with EVA statistics and filter for the Sucre departament.
EVA <- read.csv(file = "/Users/Acer/Documents/Materias U/Geomatica Basica/Evaluaciones_Agropecuarias_Municipales_EVA.csv", header=T, sep = ";") #Load the all file
eva_sucre <- filter(EVA, DEPARTAMENTO == "SUCRE");eva_sucre; #Show the file only with Sucre
We will use the shapefile corresponding to Marco Geoestadistico Departamental which is available at DANE Geoportal.
Let’s start reading the data using the sf library:
#Leer el archivo .shp que contiene los poligonos de cada municipio del Sucre
mun_sucre <- read_sf("/Users/Acer/Documents/Materias U/Geomatica Basica/70_SUCRE/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp"); select(mun_sucre, MPIO_CCDGO, MPIO_CNMBR,)
Note that mun_sucre is a simple feature collection. Make sure to review this link for understanding what is a simple feature.
Note also that the data uses the WGS1984 geographic coordinate reference system (i.e. 4326 epsg code).
Let’s calculate area of each municipality:
mun_sucre$KM2 <- st_area(st_transform(mun_sucre, 3116))/1E6
mun_sucre$KM2 <- as.numeric(mun_sucre$KM2)
mun_sucre$KM2 <- round(mun_sucre$KM2,3)
min(mun_sucre$KM2)
## [1] 56.555
max(mun_sucre$KM2)
## [1] 1494.163
Let’s make a map of municipalities:
bins <- c(56, 186.7, 317.4, 448.1, 578.8, 709.5, 840.2, 970.9, 1101.6,1232.3, 1495)
pal <- colorBin("RdBu", domain = mun_sucre$KM2, bins = bins)
mapa <- leaflet(data = mun_sucre) %>%
addTiles() %>%
addPolygons(label = ~KM2,
popup = ~MPIO_CNMBR,
fillColor = ~pal(KM2),
color = "#444444",
weight = 1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
) %>%
addProviderTiles(providers$OpenStreetMap) %>%
addLegend("bottomright", pal = pal, values = ~KM2,
title = "Municipalities extent [Km2] (DANE, 2018)",
opacity = 1
);mapa
We can use the left_join function, provided by the dplyr package, to join the municipalities and the agricultural statistics. We need a common attribute (or shared variable) to base the left join on. The best attribute is an ID. In mun_sucre, the MPIO_CCDGO attribute seems fine (it reads 70001 for Sincelejo). In eva_sucre, the corresponding attribute is COD_MUN. (it reads 70001 for Sincelejo).
Now let’s see what type these attributes are
class(mun_sucre$MPIO_CCDGO); class(eva_sucre$COD_MUN.)
## [1] "character"
## [1] "integer"
To be able to do the join, we need to change both data type and contents for the code which identifies every municipality. For this task, it is a good idea to create a copy of the original statistical data. Using this approach, any false movement will not spoil the original data.
Let’s proceed step by step:
mun_sucre$COD_MUN. <- as.integer(mun_sucre$MPIO_CCDGO); class(mun_sucre$COD_MUN.); mun_sucre
## [1] "integer"
We saw in a previous notebook several functions for “editing” a tibble. Let’s filter the EVA tibble by crop type (e.g. an important crop as identified in the third notebook). In addition, let’s select several columns.
arroz_sucre <- eva_sucre %>% filter(CULTIVO == "ARROZ") %>% dplyr::select(MUNICIPIO, COD_MUN., YEAR, PERIODO, TON_PRODUCCION, RENDIMIENTO); arroz_sucre
unique(arroz_sucre$YEAR)
## [1] 2010 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018
unique(arroz_sucre$PERIODO)
## [1] "2010A" "2010B" "2006B" "2007A" "2007B" "2008A" "2008B" "2009A" "2009B"
## [10] "2011B" "2011A" "2012B" "2012A" "2013A" "2013B" "2014A" "2014B" "2015A"
## [19] "2015B" "2016A" "2016B" "2017A" "2017B" "2018A"
arroz_sucre%>%
gather("YEAR","PERIODO","TON_PRODUCCION","RENDIMIENTO", key = variable, value = number)
summary(arroz_sucre)
## MUNICIPIO COD_MUN. YEAR PERIODO
## Length:634 Min. : 704 Min. :2006 Length:634
## Class :character 1st Qu.:70204 1st Qu.:2009 Class :character
## Mode :character Median :70265 Median :2012 Mode :character
## Mean :58914 Mean :2012
## 3rd Qu.:70708 3rd Qu.:2015
## Max. :70823 Max. :2018
##
## TON_PRODUCCION RENDIMIENTO
## Min. : 0.00 Min. :0.300
## 1st Qu.: 25.25 1st Qu.:1.500
## Median : 145.00 Median :2.800
## Mean : 1185.19 Mean :2.752
## 3rd Qu.: 566.00 3rd Qu.:3.890
## Max. :47018.00 Max. :6.500
## NA's :5
Before going further, let’s review several functions provided by the tidyr package: - gather() makes “wide” data longer - spread() makes “long” data wider - separate() splits a single column into multiple columns - unite() combines multiple columns into a single column
arroz_sucre %>% replace(is.na(.), 0) -> arroz_sucre2
arroz_sucre %>% group_by(MUNICIPIO, COD_MUN., YEAR) %>%
summarize(TON_PROD=sum(TON_PRODUCCION)) -> arroz_sucre2 ; head(arroz_sucre2); tail(arroz_sucre2)
## `summarise()` has grouped output by 'MUNICIPIO', 'COD_MUN.'. You can override using the `.groups` argument.
The following chunk a code seems hard to understand. I would suggest that you study the documentation of the functions group, gather, unite, and pivot to gain an appropriate understanding of the chunk:
arroz_sucre2 %>%
group_by(COD_MUN.) %>%
gather("TON_PROD", key = variable, value = number) %>%
unite(combi, variable, YEAR) %>%
pivot_wider(names_from = combi, values_from = number, values_fill = 0) -> arroz_sucre3 ; head(arroz_sucre3); tail(arroz_sucre3)
Now, we can join the non-spatial attributes to the spatial simple feature object:
mun_sucre$COD_MUN. <- as.integer(mun_sucre$MPIO_CCDGO)
mun_sucre_arroz = left_join(mun_sucre, arroz_sucre3, by="COD_MUN."); summary(mun_sucre_arroz); select(mun_sucre_arroz,MPIO_CCDGO,MPIO_CNMBR,KM2,COD_MUN.,MUNICIPIO,TON_PROD_2010,TON_PROD_2006,TON_PROD_2007,TON_PROD_2008,TON_PROD_2009,TON_PROD_2011,TON_PROD_2012,TON_PROD_2013,TON_PROD_2014,TON_PROD_2015,TON_PROD_2016,TON_PROD_2017,TON_PROD_2018)
## DPTO_CCDGO MPIO_CCDGO MPIO_CNMBR MPIO_CRSLC
## Length:26 Length:26 Length:26 Length:26
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## MPIO_NAREA MPIO_NANO DPTO_CNMBR Shape_Leng
## Min. : 56.56 Min. :2017 Length:26 Min. :0.4269
## 1st Qu.: 180.54 1st Qu.:2017 Class :character 1st Qu.:0.8558
## Median : 282.49 Median :2017 Mode :character Median :1.1494
## Mean : 407.38 Mean :2017 Mean :1.2339
## 3rd Qu.: 417.35 3rd Qu.:2017 3rd Qu.:1.4341
## Max. :1494.16 Max. :2017 Max. :2.9850
##
## Shape_Area geometry KM2 COD_MUN.
## Min. :0.004652 POLYGON :26 Min. : 56.55 Min. :70001
## 1st Qu.:0.014847 epsg:4326 : 0 1st Qu.: 180.54 1st Qu.:70231
## Median :0.023241 +proj=long...: 0 Median : 282.49 Median :70451
## Mean :0.033493 Mean : 407.38 Mean :70459
## 3rd Qu.:0.034320 3rd Qu.: 417.35 3rd Qu.:70707
## Max. :0.122777 Max. :1494.16 Max. :70823
##
## MUNICIPIO TON_PROD_2010 TON_PROD_2006 TON_PROD_2007
## Length:26 Min. : 0.00 Min. : 0.0 Min. : 0.0
## Class :character 1st Qu.: 12.09 1st Qu.: 32.0 1st Qu.: 39.5
## Mode :character Median : 74.50 Median : 145.0 Median : 458.0
## Mean : 860.05 Mean : 1527.3 Mean : 1563.9
## 3rd Qu.: 845.25 3rd Qu.: 667.9 3rd Qu.: 1331.8
## Max. :5950.00 Max. :16925.0 Max. :12516.5
## NA's :8 NA's :8 NA's :8
## TON_PROD_2008 TON_PROD_2009 TON_PROD_2011 TON_PROD_2012
## Min. : 0.0 Min. : 0.00 Min. : 0 Min. : 0.00
## 1st Qu.: 53.5 1st Qu.: 38.76 1st Qu.: 30 1st Qu.: 59.25
## Median : 242.5 Median : 103.00 Median : 195 Median : 173.50
## Mean : 2594.7 Mean : 783.01 Mean : 1832 Mean : 2246.82
## 3rd Qu.: 1580.2 3rd Qu.: 789.00 3rd Qu.: 1902 3rd Qu.: 2349.97
## Max. :21711.2 Max. :4767.19 Max. :10599 Max. :16455.68
## NA's :8 NA's :8 NA's :8 NA's :8
## TON_PROD_2013 TON_PROD_2014 TON_PROD_2015 TON_PROD_2016
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 82.25 1st Qu.: 103.0 1st Qu.: 102.8 1st Qu.: 15
## Median : 454.00 Median : 455.5 Median : 482.5 Median : 422
## Mean :1424.64 Mean : 4122.9 Mean : 6156.0 Mean : 8237
## 3rd Qu.:1872.31 3rd Qu.: 4106.2 3rd Qu.: 5203.4 3rd Qu.: 4314
## Max. :6130.00 Max. :26092.0 Max. :36666.0 Max. :62557
## NA's :8 NA's :8 NA's :8 NA's :8
## TON_PROD_2017 TON_PROD_2018
## Min. : 0.00 Min. : 0
## 1st Qu.: 33.75 1st Qu.: 0
## Median : 650.40 Median : 31
## Mean : 5469.56 Mean : 3518
## 3rd Qu.: 2753.75 3rd Qu.: 3248
## Max. :43394.00 Max. :21401
## NA's :8 NA's :8
select(mun_sucre_arroz,MPIO_CCDGO,MPIO_CNMBR,KM2,COD_MUN.,MUNICIPIO,TON_PROD_2010,TON_PROD_2006,TON_PROD_2007,TON_PROD_2008,TON_PROD_2009,TON_PROD_2011,TON_PROD_2012,TON_PROD_2013,TON_PROD_2014,TON_PROD_2015,TON_PROD_2016,TON_PROD_2017,TON_PROD_2018)
We can see that BUENAVISTA and CHALAN municipalities do not have values for the TON_PROD and REND attributes for the ARROZ crop.
Why this happened? It is because the ARROZ statistics for these municipalities are not available (NA).
arroz_buenavista <- arroz_sucre2 %>% filter(COD_MUN.==70110); arroz_buenavista
Let’s make a map of total production of rice in 2017:
bins <- c(5000, 10000, 15000, 25000, 50000, 75000)
pal <- colorBin("YlOrRd", domain = mun_sucre_arroz$TON_PROD_2017, bins = bins)
mapa <- leaflet(data = mun_sucre_arroz) %>%
addTiles() %>%
addPolygons(label = ~TON_PROD_2017,
popup = ~MPIO_CNMBR,
fillColor = ~pal(TON_PROD_2017),
color = "#444444",
weight = 1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
) %>%
addProviderTiles(providers$OpenStreetMap) %>%
addLegend("bottomright", pal = pal, values = ~TON_PROD_2017,
title = "rice crop production in Sucre [Ton] (2017)",
opacity = 1
); mapa
## Warning in pal(TON_PROD_2017): Some values were outside the color scale and will
## be treated as NA
Now, we are going to repeat the process to make a different map, in this case the harvested area of rice crop in every municipality.
Let’s remind the EVA variables names:
summary(eva_sucre)
## ï..COD_DEP. DEPARTAMENTO COD_MUN. MUNICIPIO
## Min. :70 Length:4503 Min. : 704 Length:4503
## 1st Qu.:70 Class :character 1st Qu.:70124 Class :character
## Median :70 Mode :character Median :70265 Mode :character
## Mean :70 Mean :58496
## 3rd Qu.:70 3rd Qu.:70678
## Max. :70 Max. :70823
##
## GRUPO SUBGRUPO CULTIVO
## Length:4503 Length:4503 Length:4503
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## DESAGREGACION_REGIONAL_Y.O_SISTEMA_PRODUCTIVO YEAR
## Length:4503 Min. :2006
## Class :character 1st Qu.:2009
## Mode :character Median :2012
## Mean :2012
## 3rd Qu.:2015
## Max. :2018
##
## PERIODO HA_SEMBRADA HA._COSECHADA TON_PRODUCCION
## Length:4503 Min. : 1.0 Min. : 0.0 Min. : 0.0
## Class :character 1st Qu.: 10.0 1st Qu.: 8.0 1st Qu.: 12.0
## Mode :character Median : 50.0 Median : 40.0 Median : 78.0
## Mean :132.1 Mean :120.3 Mean : 591.6
## 3rd Qu.:160.0 3rd Qu.:140.0 3rd Qu.: 349.0
## Max. :990.0 Max. :995.0 Max. :47018.0
##
## RENDIMIENTO ESTADO_FISICO_PRODUCCION NOMBRE._CIENTIFICO
## Min. : 0.100 Length:4503 Length:4503
## 1st Qu.: 1.500 Class :character Class :character
## Median : 3.200 Mode :character Mode :character
## Mean : 5.038
## 3rd Qu.: 7.000
## Max. :70.000
## NA's :129
## CICLO
## Length:4503
## Class :character
## Mode :character
##
##
##
##
Now, apply the filter & select functions to get the area cosechada en arroz:
arroz_sucre <- eva_sucre %>% filter(CULTIVO == "ARROZ") %>% dplyr::select(MUNICIPIO, COD_MUN., YEAR, PERIODO, HA._COSECHADA)
Then, let’s aggregate PERIODO values into YEAR:
arroz_sucre %>% group_by(MUNICIPIO, COD_MUN., YEAR) %>%
summarize(HA._COSECHADA=sum(HA._COSECHADA)) -> arroz_sucre2
## `summarise()` has grouped output by 'MUNICIPIO', 'COD_MUN.'. You can override using the `.groups` argument.
Then, let’s reformat the statistics into wider format:
arroz_sucre2 %>%
group_by(COD_MUN.) %>%
gather("HA._COSECHADA", key = variable, value = number) %>%
unite(combi, variable, YEAR) %>%
pivot_wider(names_from = combi, values_from = number, values_fill = 0) -> arroz_sucre3
mun_sucre$COD_MUN. <- as.integer(mun_sucre$MPIO_CCDGO)
Now, it’s join time:
mun_sucre_arroz = left_join(mun_sucre, arroz_sucre3, by="COD_MUN."); mun_sucre_arroz
Finally, mapping time:
bins <- c(0, 100, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000)
pal <- colorBin("RdYlGn", domain = mun_sucre_arroz$HA._COSECHADA_2017, bins = bins)
mapa <- leaflet(data = mun_sucre_arroz) %>%
addTiles() %>%
addPolygons(label = ~HA._COSECHADA_2017,
popup = ~MPIO_CNMBR,
fillColor = ~pal(HA._COSECHADA_2017),
color = "#444444",
weight = 1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
) %>%
addProviderTiles(providers$OpenStreetMap) %>%
addLegend("bottomright", pal = pal, values = ~HA._COSECHADA_2017,
title = "rice harvested area in Sucre [Ha] (2017)",
opacity = 1
); mapa
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252
## [3] LC_MONETARY=Spanish_Colombia.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Colombia.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sf_0.9-7 leaflet_2.0.4.1 forcats_0.5.1 stringr_1.4.0
## [5] dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3
## [9] tibble_3.1.0 ggplot2_3.3.3 tidyverse_1.3.0 RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 lubridate_1.7.10 leaflet.providers_1.9.0
## [4] class_7.3-18 assertthat_0.2.1 digest_0.6.27
## [7] utf8_1.1.4 R6_2.5.0 cellranger_1.1.0
## [10] backports_1.2.1 reprex_1.0.0 evaluate_0.14
## [13] e1071_1.7-4 httr_1.4.2 pillar_1.5.1
## [16] rlang_0.4.10 readxl_1.3.1 rstudioapi_0.13
## [19] rmarkdown_2.7 htmlwidgets_1.5.3 munsell_0.5.0
## [22] broom_0.7.5 compiler_4.0.4 modelr_0.1.8
## [25] xfun_0.21 pkgconfig_2.0.3 htmltools_0.5.1.1
## [28] tidyselect_1.1.0 fansi_0.4.2 crayon_1.4.1
## [31] dbplyr_2.1.0 withr_2.4.1 grid_4.0.4
## [34] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.0
## [37] DBI_1.1.1 magrittr_2.0.1 units_0.7-0
## [40] scales_1.1.1 KernSmooth_2.23-18 cli_2.3.1
## [43] stringi_1.5.3 farver_2.1.0 fs_1.5.0
## [46] xml2_1.3.2 ellipsis_0.3.1 generics_0.1.0
## [49] vctrs_0.3.6 tools_4.0.4 glue_1.4.2
## [52] hms_1.0.0 crosstalk_1.1.1 yaml_2.2.1
## [55] colorspace_2.0-0 classInt_0.4-3 rvest_1.0.0
## [58] knitr_1.31 haven_2.3.1