1. Why this notebook?

This is an R Markdown Notebook illustrating estadisticas agricolas for the Antioquia departamento in Colombia. It has been created, compiled and published from RStudio Cloud. You are advised to use your own RStudio local installation, if you have one. Otherwise, do it on the cloud.

This notebook aims to help Geomatica Basica students at Universidad Nacional to understand basic geomatics concepts useful for agronomy. Every student should replicate this notebook BUT adapting its contents to her/his department.

The due date for publishing this notebook is on 22th April 2020 at 11:59 am. Its publication at RPubs counts as the first technical report for the course. Its mark represents 15% of the final grade.

2. GIS functionalities

Exploration of non-spatial statistics is essential for understanding what is going on at territories. Several R libraries, in particular dplyr and tidyverse, are very useful for exploring and summarizing statistics.

On the another hand, geospatial operations can improve our understanding of several issues affecting geographic regions. For example, you want to figure out what is the location of those municipalities whose crop yields are outstanding (or, alternatively, lower than the average). For doing such exploration, we need to join non-spatial data with spatial data. We did already this task using QGIS. Now, we will do it with R.

In addition, we could also explore spatial joins. These operations are based on the intersection between two spatial objects, often points and the polygons. There are many ways we can join objects, which may include specific options like crosses,near, within, touches, etc. The point being not to push you, we can do this in another R Notebook!

Let’s start by removing the memory’s contents:

rm(list=ls())

Now, let’s install the libraries we need. Note that, in the following chunk, any package is installed only if it has not been previously installed.

list.of.packages <- c("here", "tidyverse", "rgeos", "maptools", "raster", "sf",  "viridis", "rnaturalearth", "GSODR", "ggrepel", "cowplot")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

Now, let’s load the libraries.

library(here)
## here() starts at /cloud/project
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rgeos)
## Loading required package: sp
## rgeos version: 0.5-2, (SVN revision 621)
##  GEOS runtime version: 3.5.1-CAPI-1.9.1 
##  Linking to sp version: 1.4-1 
##  Polygon checking: TRUE
library(maptools)
## Checking rgeos availability: TRUE
library(raster)
## 
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:tidyr':
## 
##     extract
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.2, PROJ 4.9.2
library(viridis)
## Loading required package: viridisLite
library(rnaturalearth)
library(GSODR)
library(ggrepel)
library(cowplot)
## 
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
##   default ggplot2 theme anymore. To recover the previous
##   behavior, execute:
##   theme_set(theme_cowplot())
## ********************************************************

3. Exploring agricultural statistics in Antioquia

Previously, I have downloaded statistical data, in csv format, on Estadisticas Municipales Agropecuarias to my computer. Then, I have used excel to eliminate rows for municipalities in departments different than Antioquia. I saved the file with the remaining rows as a local file with name EVA_Antioquia.csv at my computer. Then, I uploaded the file into the agro folder at RStudio Cloud.

Let’s read the csv file with “estadisticas municipales agropecuarias” for Antioquia.

datos <- read_csv("./agro/EVA_Antioquia.csv")
## Parsed with column specification:
## cols(
##   COD_DEP = col_double(),
##   DEPARTAMENTO = col_character(),
##   C0D_MUN = col_double(),
##   MUNICIPIO = col_character(),
##   GRUPO = col_character(),
##   SUBGRUPO = col_character(),
##   CULTIVO = col_character(),
##   YEAR = col_double(),
##   Area_Siembra = col_double(),
##   Area_Cosecha = col_double(),
##   Produccion = col_double(),
##   Rendimiento = col_double(),
##   ESTADO = col_character(),
##   CICLO = col_character()
## )

Let’s check what are the attributes of datos.

head(datos)
## # A tibble: 6 x 14
##   COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##     <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
## 1       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2016
## 2       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2016
## 3       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2017
## 4       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2017
## 5       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2018
## 6       5 ANTIOQUIA       5475 MURINDO   TUBE… MALANGA  MALANGA  2011
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## #   Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>
tail(datos)
## # A tibble: 6 x 14
##   COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##     <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
## 1       5 ANTIOQUIA       5761 SOPETRAN  FRUT… ZAPOTE   ZAPOTE   2017
## 2       5 ANTIOQUIA       5854 VALDIVIA  FRUT… ZAPOTE   ZAPOTE   2017
## 3       5 ANTIOQUIA       5142 CARACOLI  FRUT… ZAPOTE   ZAPOTE   2017
## 4       5 ANTIOQUIA       5761 SOPETRAN  FRUT… ZAPOTE   ZAPOTE   2018
## 5       5 ANTIOQUIA       5854 VALDIVIA  FRUT… ZAPOTE   ZAPOTE   2018
## 6       5 ANTIOQUIA       5142 CARACOLI  FRUT… ZAPOTE   ZAPOTE   2018
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## #   Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>

Note that every municipality has statistics about area sembrada, area cosechada, y rendimiento for different crops in different years. The attribute SUBGRUPO and CULTIVO seem to refer to the same thing (i.e. a crop). Crops are further classified into a given GRUPO.

In this table, we do not have units. However, if we check the original csv file, we find that area units are hectares and that rendimiento units are Ton/ha

We will use the dplyr library to explore the contents of the datos object.

First, let’s get a summary of rendimiento (i.e. the average yield during several years) by grupo and municipio:

datos %>%
  group_by(MUNICIPIO, GRUPO) %>%
  summarise(rend_prom = mean(Rendimiento, na.rm = TRUE)) -> rend_resumen
### Let's visualize only the first six records
head(rend_resumen)
## # A tibble: 6 x 3
## # Groups:   MUNICIPIO [1]
##   MUNICIPIO GRUPO             rend_prom
##   <chr>     <chr>                 <dbl>
## 1 ABEJORRAL CEREALES               3.63
## 2 ABEJORRAL FIBRAS               NaN   
## 3 ABEJORRAL FLORES Y FOLLAJES     17.3 
## 4 ABEJORRAL FRUTALES              11.7 
## 5 ABEJORRAL HORTALIZAS           110   
## 6 ABEJORRAL LEGUMINOSAS            1.41

We can also calculate the average yield per GRUPO at Antioquia’s municipalities:

datos %>%
  group_by(GRUPO) %>%
  summarise(rend_dep = mean(Rendimiento, na.rm = TRUE)) -> rend_antioquia

rend_antioquia
## # A tibble: 11 x 2
##    GRUPO                                            rend_dep
##    <chr>                                               <dbl>
##  1 CEREALES                                             1.87
##  2 FIBRAS                                               1.50
##  3 FLORES Y FOLLAJES                                    9.59
##  4 FORESTALES                                           1.06
##  5 FRUTALES                                            14.0 
##  6 HORTALIZAS                                          39.5 
##  7 LEGUMINOSAS                                          1.36
##  8 OLEAGINOSAS                                          2.86
##  9 OTROS PERMANENTES                                    2.01
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES     4.34
## 11 TUBERCULOS Y PLATANOS                               10.3

Note that the highest yields correspond to HORTALIZAS, FRUTALES and TUBERCULOS Y PLATANOS.

Then, let’s find what are the municipalites with largest yield for every group of crops in 2018:

datos %>% 
  filter(YEAR==2018) %>% 
  group_by(GRUPO, MUNICIPIO) %>%
  summarize(max_rend = max(Rendimiento, na.rm = TRUE)) %>%
    slice(which.max(max_rend)) -> rend_max_18

rend_max_18
## # A tibble: 11 x 3
## # Groups:   GRUPO [11]
##    GRUPO                                            MUNICIPIO   max_rend
##    <chr>                                            <chr>          <dbl>
##  1 CEREALES                                         LA UNION        33.6
##  2 FIBRAS                                           GOMEZ PLATA      2.6
##  3 FLORES Y FOLLAJES                                GUARNE          35  
##  4 FORESTALES                                       CACERES          2  
##  5 FRUTALES                                         NECOCLI         99  
##  6 HORTALIZAS                                       GUATAPE        192. 
##  7 LEGUMINOSAS                                      PEÑOL            9  
##  8 OLEAGINOSAS                                      DABEIBA          7  
##  9 OTROS PERMANENTES                                BRICEÑO         28  
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES RIONEGRO        15.2
## 11 TUBERCULOS Y PLATANOS                            URRAO           40.5

Now, let’s find what are the municipalites with largest harvested area for every group of crops in 2018:

datos %>% 
  filter(YEAR==2018) %>% 
  group_by(GRUPO, MUNICIPIO) %>%
  summarize(max_area_cosecha = max(Area_Cosecha, na.rm = TRUE)) %>%
    slice(which.max(max_area_cosecha)) -> area_cosecha_max

area_cosecha_max
## # A tibble: 11 x 3
## # Groups:   GRUPO [11]
##    GRUPO                                       MUNICIPIO        max_area_cosecha
##    <chr>                                       <chr>                       <dbl>
##  1 CEREALES                                    NECOCLI                      3900
##  2 FIBRAS                                      URRAO                         157
##  3 FLORES Y FOLLAJES                           CARMEN DE VIBOR…              678
##  4 FORESTALES                                  TARAZA                        760
##  5 FRUTALES                                    TURBO                       10560
##  6 HORTALIZAS                                  CARMEN DE VIBOR…             1000
##  7 LEGUMINOSAS                                 SAN VICENTE                   911
##  8 OLEAGINOSAS                                 MUTATA                        700
##  9 OTROS PERMANENTES                           ANDES                        7648
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDIC… RIONEGRO                       52
## 11 TUBERCULOS Y PLATANOS                       TURBO                        9441

Note that the larget harvested area in 2018 corresponded to FRUTALES in TURB0 (Urabá). You may know that Turbo’s economy is based on banana crop and tourism industry.

First, let’s select banana’s production (Tons) in Turbo for every year:

datos %>% 
  filter(MUNICIPIO=="TURBO" & SUBGRUPO=="BANANO") %>% 
  group_by(YEAR, CULTIVO) ->  turbo_banano

turbo_banano
## # A tibble: 18 x 14
## # Groups:   YEAR, CULTIVO [18]
##    COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##      <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
##  1       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2007
##  2       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2008
##  3       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2009
##  4       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2010
##  5       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2011
##  6       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2012
##  7       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2013
##  8       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2014
##  9       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2015
## 10       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2016
## 11       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2017
## 12       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANO   2018
## 13       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2013
## 14       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2014
## 15       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2015
## 16       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2016
## 17       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2017
## 18       5 ANTIOQUIA       5837 TURBO     FRUT… BANANO   BANANI…  2018
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## #   Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>

Let’s do a basic graphic exploration:

# we use the ggplot 2 library
g <- ggplot(aes(x=YEAR, y=Produccion/1000), data = turbo_banano) + geom_bar(stat='identity') + labs(y='Banana Production [Ton x 1000]')
g + ggtitle("Evolution of Banana Production in Turbo from 2007 to 2018") + labs(caption= "Based on EMA data (DANE, 2018)")

Now, let’s investigate which crops had the largest harvested area in 2018:

datos %>% 
  filter(YEAR==2018) %>% 
  group_by(GRUPO) %>%
  summarize(sum_area_cosecha = sum(Area_Cosecha, na.rm = TRUE)) %>%
     arrange(desc(sum_area_cosecha)) -> total_area_cosecha

total_area_cosecha
## # A tibble: 11 x 2
##    GRUPO                                            sum_area_cosecha
##    <chr>                                                       <dbl>
##  1 OTROS PERMANENTES                                          153313
##  2 TUBERCULOS Y PLATANOS                                       71233
##  3 FRUTALES                                                    66233
##  4 CEREALES                                                    32996
##  5 LEGUMINOSAS                                                  8071
##  6 HORTALIZAS                                                   4693
##  7 OLEAGINOSAS                                                  2289
##  8 FLORES Y FOLLAJES                                            1946
##  9 FORESTALES                                                   1769
## 10 FIBRAS                                                        762
## 11 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES              147

We can see that other permanent crops had the largest share of harvested area in 2018 for Antioquia. Which crops belong to this GRUPO? You may look for more information at DANE, the government agency in charge of national statistics in Colombia.

We can also look for this information at the data themselves:

datos %>%
  filter(GRUPO=="OTROS PERMANENTES" & YEAR==2018) %>%
  group_by(CULTIVO) %>%
  summarize(sum_cosecha = sum(Area_Cosecha, na.rm = TRUE)) %>%
     arrange(desc(sum_cosecha)) -> total_cosecha


total_cosecha
## # A tibble: 3 x 2
##   CULTIVO       sum_cosecha
##   <chr>               <dbl>
## 1 CAFE                98038
## 2 CAÑA PANELERA       37989
## 3 CACAO               17286

Note that cafe is key in the Antioquian agriculture!

Now, let’s find which are the municipalites with largest harvested area for every permanent crop in 2018:

datos %>% 
  filter(YEAR==2018 & GRUPO=="OTROS PERMANENTES") %>% 
  group_by(CULTIVO, MUNICIPIO) %>%
  summarize(max_area2 = max(Area_Cosecha, na.rm = TRUE)) %>%
    slice(which.max(max_area2)) -> area_cosecha2

area_cosecha2
## # A tibble: 3 x 3
## # Groups:   CULTIVO [3]
##   CULTIVO       MUNICIPIO max_area2
##   <chr>         <chr>         <dbl>
## 1 CACAO         TURBO          1761
## 2 CAFE          ANDES          7648
## 3 CAÑA PANELERA YOLOMBO        5220

Let’s return to permanent crops data. Before plotting, we will need to add, to the total_area_cosecha object, a new field with abbreviatures for every GROUP of crops. Otherwise, the plot will look messy.

total_area_cosecha$CROP <- abbreviate(total_area_cosecha$GRUPO, 3)

Now, it"s plotting time:

# we use the ggplot 2 library
g <- ggplot(aes(x=CROP, y=sum_area_cosecha), data = total_area_cosecha) + geom_bar(stat='identity') + labs(y='Total Harvested Area [Ha]')
g+ ggtitle("Total harvested area by crop groups in 2018 for Antioquia") + theme(plot.title = element_text(hjust = 0.5)) +
   labs(caption= "Based on EMA data (DANE, 2018)")

You can further explore the statistics corresponding to your department. Try to gain insight about the performance of different crops, mainly those which are essential to the economy of your department.

4. Joining agricultural statistics to municipalities

I have already uploaded to RStudio Cloud the administrative data for Antioquia. As previously discussed on a virtual class, it is a good idea to use the shapefile corresponding to Marco Geoestadistico Departamental which is available at DANE Geoportal.

Let’s start reading the data using the sf library:

ant_munic <- sf::st_read("./antioquia/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp")
## Reading layer `MGN_MPIO_POLITICO' from data source `/cloud/project/antioquia/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp' using driver `ESRI Shapefile'
## Simple feature collection with 125 features and 9 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -77.12783 ymin: 5.418558 xmax: -73.88128 ymax: 8.873974
## CRS:            4326

What is ant_munic?

ant_munic
## Simple feature collection with 125 features and 9 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -77.12783 ymin: 5.418558 xmax: -73.88128 ymax: 8.873974
## CRS:            4326
## First 10 features:
##    DPTO_CCDGO MPIO_CCDGO     MPIO_CNMBR
## 1          05      05001       MEDELLÍN
## 2          05      05101 CIUDAD BOLÍVAR
## 3          05      05107        BRICEÑO
## 4          05      05113       BURITICÁ
## 5          05      05120        CÁCERES
## 6          05      05093        BETULIA
## 7          05      05091        BETANIA
## 8          05      05088          BELLO
## 9          05      05086        BELMIRA
## 10         05      05079        BARBOSA
##                                           MPIO_CRSLC MPIO_NAREA MPIO_NANO
## 1                                               1965   374.8280      2017
## 2                                               1869   260.4461      2017
## 3               Ordenanza 27 de Noviembre 26 de 1980   376.3468      2017
## 4                                               1812   355.2103      2017
## 5  Decreto departamental 160 del 16 de Marzo de 1903  1873.8033      2017
## 6  Decreto departamental 629 del 28 de Enero de 1884   262.3675      2017
## 7               Ordenanza 42 del 24 de Abril de 1920   180.5260      2017
## 8                Ordenanza 48 del 29 deAbril de 1913   147.7589      2017
## 9                                               1814   296.1532      2017
## 10                                              1812   205.6662      2017
##    DPTO_CNMBR Shape_Leng Shape_Area                       geometry
## 1   ANTIOQUIA  1.0327835 0.03060723 POLYGON ((-75.66974 6.37359...
## 2   ANTIOQUIA  0.7085039 0.02124224 POLYGON ((-76.04467 5.92774...
## 3   ANTIOQUIA  1.0044720 0.03078496 POLYGON ((-75.45818 7.22284...
## 4   ANTIOQUIA  0.9637233 0.02902757 POLYGON ((-75.90857 6.97378...
## 5   ANTIOQUIA  2.9333643 0.15350440 POLYGON ((-75.20358 7.95716...
## 6   ANTIOQUIA  0.8476756 0.02141352 POLYGON ((-76.00304 6.28171...
## 7   ANTIOQUIA  0.6923058 0.01472138 POLYGON ((-75.95474 5.79522...
## 8   ANTIOQUIA  0.6143227 0.01206804 POLYGON ((-75.58203 6.42510...
## 9   ANTIOQUIA  1.1730035 0.02420036 POLYGON ((-75.69252 6.75917...
## 10  ANTIOQUIA  0.7700180 0.01680390 POLYGON ((-75.32148 6.51265...

Note that ant_munic is a simple feature collection. Make sure to review this link for understanding what is a simple feature.

Note also that the data uses the WGS1984 geographic coordinate reference system (i.e. 4326 epsg code).

We can use the left_join function to join the municipalities and selected agricultural statistics. Let’s look for help to understand how this function works:

## don't do this inside the notebook
## do it in the R console
## ?left_join
##

We need a common attribute (or shared variable) to base the join on. The best attribute is an ID. In ant_munic, the MPIO_CCDGO attribute seems fine (it reads 05001 for Medellin). In datos, the corresponding attribute is COD_MUN (it reads 5001 for Medellin).

Let’s verify the latter statement:

datos %>% filter (MUNICIPIO =="MEDELLIN") ->  med_datos
med_datos
## # A tibble: 595 x 14
##    COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##      <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
##  1       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2007
##  2       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2008
##  3       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2009
##  4       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2010
##  5       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2011
##  6       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2012
##  7       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2013
##  8       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2014
##  9       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2015
## 10       5 ANTIOQUIA       5001 MEDELLIN  FRUT… AGUACATE AGUACA…  2016
## # … with 585 more rows, and 6 more variables: Area_Siembra <dbl>,
## #   Area_Cosecha <dbl>, Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>,
## #   CICLO <chr>
class(med_datos$C0D_MUN)
## [1] "numeric"

To be able to do the join, we need to change both data type and contents for the code which identifies every municipality. For this task, it is a good idea to create a copy of the original statistical data. Using this approach, any false movement will not spoil the original data.

Let’s proceed step by step:

datos2 <- datos
datos2$TEMP <-  as.character(datos2$C0D_MUN)
datos2$MPIO_CCDGO <- as.factor(paste(0, datos2$TEMP, sep=""))
head(datos2)
## # A tibble: 6 x 16
##   COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##     <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
## 1       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2016
## 2       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2016
## 3       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2017
## 4       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2017
## 5       5 ANTIOQUIA       5440 MARINILLA HORT… ACELGA   ACELGA   2018
## 6       5 ANTIOQUIA       5475 MURINDO   TUBE… MALANGA  MALANGA  2011
## # … with 8 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## #   Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>, TEMP <chr>,
## #   MPIO_CCDGO <fct>

Make sure to check, in the datos 2 object, characteristics of the new attribute MPIO_CCDGO.

Now, let’s filter a single year and select only two relevant attributes:

datos2 %>% filter(CULTIVO == "CAFE")  -> datos3
head(datos3)
## # A tibble: 6 x 16
##   COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO  YEAR
##     <dbl> <chr>          <dbl> <chr>     <chr> <chr>    <chr>   <dbl>
## 1       5 ANTIOQUIA       5101 CIUDAD B… OTRO… CAFE     CAFE     2007
## 2       5 ANTIOQUIA       5034 ANDES     OTRO… CAFE     CAFE     2007
## 3       5 ANTIOQUIA       5642 SALGAR    OTRO… CAFE     CAFE     2007
## 4       5 ANTIOQUIA       5209 CONCORDIA OTRO… CAFE     CAFE     2007
## 5       5 ANTIOQUIA       5091 BETANIA   OTRO… CAFE     CAFE     2007
## 6       5 ANTIOQUIA       5093 BETULIA   OTRO… CAFE     CAFE     2007
## # … with 8 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## #   Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>, TEMP <chr>,
## #   MPIO_CCDGO <fct>
class(datos3)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
datos4 <- datos3 %>% dplyr::select(MUNICIPIO, MPIO_CCDGO, YEAR, Produccion, Rendimiento) 
datos4
## # A tibble: 1,099 x 5
##    MUNICIPIO      MPIO_CCDGO  YEAR Produccion Rendimiento
##    <chr>          <fct>      <dbl>      <dbl>       <dbl>
##  1 CIUDAD BOLIVAR 05101       2007       9935        1.17
##  2 ANDES          05034       2007      13574        1.6 
##  3 SALGAR         05642       2007       4720        1.12
##  4 CONCORDIA      05209       2007       6182        1.23
##  5 BETANIA        05091       2007       5482        1.01
##  6 BETULIA        05093       2007       4826        1.06
##  7 ABEJORRAL      05002       2007       2329        0.61
##  8 SONSON         05756       2007       3600        1.2 
##  9 ITUANGO        05361       2007       3110        1.08
## 10 NARIÑO         05483       2007       2643        1   
## # … with 1,089 more rows
datos4 %>% 
  gather("YEAR", "Produccion", "Rendimiento" , key = variable, value = number)
## # A tibble: 3,297 x 4
##    MUNICIPIO      MPIO_CCDGO variable number
##    <chr>          <fct>      <chr>     <dbl>
##  1 CIUDAD BOLIVAR 05101      YEAR       2007
##  2 ANDES          05034      YEAR       2007
##  3 SALGAR         05642      YEAR       2007
##  4 CONCORDIA      05209      YEAR       2007
##  5 BETANIA        05091      YEAR       2007
##  6 BETULIA        05093      YEAR       2007
##  7 ABEJORRAL      05002      YEAR       2007
##  8 SONSON         05756      YEAR       2007
##  9 ITUANGO        05361      YEAR       2007
## 10 NARIÑO         05483      YEAR       2007
## # … with 3,287 more rows
head(datos4)
## # A tibble: 6 x 5
##   MUNICIPIO      MPIO_CCDGO  YEAR Produccion Rendimiento
##   <chr>          <fct>      <dbl>      <dbl>       <dbl>
## 1 CIUDAD BOLIVAR 05101       2007       9935        1.17
## 2 ANDES          05034       2007      13574        1.6 
## 3 SALGAR         05642       2007       4720        1.12
## 4 CONCORDIA      05209       2007       6182        1.23
## 5 BETANIA        05091       2007       5482        1.01
## 6 BETULIA        05093       2007       4826        1.06

This is a key task. It involves several steps to be able to convert the attribute table from long format into wide format. More information about these steps here.

datos4 %>% 
  group_by(MPIO_CCDGO) %>% 
  mutate(Visit = 1:n()) %>% 
  gather("YEAR", "Produccion", "Rendimiento", key = variable, value = number) %>% 
  unite(combi, variable, Visit) %>%
  spread(combi, number) -> datos5
head(datos5)
## # A tibble: 6 x 38
## # Groups:   MPIO_CCDGO [125]
##   MUNICIPIO MPIO_CCDGO Produccion_1 Produccion_10 Produccion_11 Produccion_12
##   <chr>     <fct>             <dbl>         <dbl>         <dbl>         <dbl>
## 1 ABEJORRAL 05002              2329          3352          5018          4718
## 2 ABRIAQUI  05004                81           114           182           193
## 3 ALEJANDR… 05021               733           682           788           786
## 4 AMAGA     05030               570           643           822           791
## 5 AMALFI    05031              1439          1825          1194          1212
## 6 ANDES     05034             13574         10029          9463          9888
## # … with 32 more variables: Produccion_2 <dbl>, Produccion_3 <dbl>,
## #   Produccion_4 <dbl>, Produccion_5 <dbl>, Produccion_6 <dbl>,
## #   Produccion_7 <dbl>, Produccion_8 <dbl>, Produccion_9 <dbl>,
## #   Rendimiento_1 <dbl>, Rendimiento_10 <dbl>, Rendimiento_11 <dbl>,
## #   Rendimiento_12 <dbl>, Rendimiento_2 <dbl>, Rendimiento_3 <dbl>,
## #   Rendimiento_4 <dbl>, Rendimiento_5 <dbl>, Rendimiento_6 <dbl>,
## #   Rendimiento_7 <dbl>, Rendimiento_8 <dbl>, Rendimiento_9 <dbl>,
## #   YEAR_1 <dbl>, YEAR_10 <dbl>, YEAR_11 <dbl>, YEAR_12 <dbl>, YEAR_2 <dbl>,
## #   YEAR_3 <dbl>, YEAR_4 <dbl>, YEAR_5 <dbl>, YEAR_6 <dbl>, YEAR_7 <dbl>,
## #   YEAR_8 <dbl>, YEAR_9 <dbl>
tail(datos5)
## # A tibble: 6 x 38
## # Groups:   MPIO_CCDGO [125]
##   MUNICIPIO MPIO_CCDGO Produccion_1 Produccion_10 Produccion_11 Produccion_12
##   <chr>     <fct>             <dbl>         <dbl>         <dbl>         <dbl>
## 1 VALPARAI… 05856               734           545           497           515
## 2 VEGACHI   05858               619           383           222           216
## 3 VENECIA   05861               586           718           869           880
## 4 YALI      05885                64           314           172           154
## 5 YARUMAL   05887               350           309           341           332
## 6 YOLOMBO   05890              1360          1337          1168          1190
## # … with 32 more variables: Produccion_2 <dbl>, Produccion_3 <dbl>,
## #   Produccion_4 <dbl>, Produccion_5 <dbl>, Produccion_6 <dbl>,
## #   Produccion_7 <dbl>, Produccion_8 <dbl>, Produccion_9 <dbl>,
## #   Rendimiento_1 <dbl>, Rendimiento_10 <dbl>, Rendimiento_11 <dbl>,
## #   Rendimiento_12 <dbl>, Rendimiento_2 <dbl>, Rendimiento_3 <dbl>,
## #   Rendimiento_4 <dbl>, Rendimiento_5 <dbl>, Rendimiento_6 <dbl>,
## #   Rendimiento_7 <dbl>, Rendimiento_8 <dbl>, Rendimiento_9 <dbl>,
## #   YEAR_1 <dbl>, YEAR_10 <dbl>, YEAR_11 <dbl>, YEAR_12 <dbl>, YEAR_2 <dbl>,
## #   YEAR_3 <dbl>, YEAR_4 <dbl>, YEAR_5 <dbl>, YEAR_6 <dbl>, YEAR_7 <dbl>,
## #   YEAR_8 <dbl>, YEAR_9 <dbl>

We will also make a copy of the simple features collection (again, just in case of a false move):

ant_munic2 <- ant_munic

Now, it’s join time:

ant_munic_stat = left_join(ant_munic2, datos5, by="MPIO_CCDGO")
summary(ant_munic_stat)
##  DPTO_CCDGO   MPIO_CCDGO       MPIO_CNMBR    MPIO_CRSLC    MPIO_NAREA     
##  05:125     05001  :  1   ABEJORRAL :  1   1814   :  9   Min.   :  15.84  
##             05002  :  1   ABRIAQUÍ  :  1   1833   :  4   1st Qu.: 140.35  
##             05004  :  1   ALEJANDRÍA:  1   1812   :  3   Median : 258.09  
##             05021  :  1   AMAGÁ     :  1   1813   :  2   Mean   : 503.74  
##             05030  :  1   AMALFI    :  1   1817   :  2   3rd Qu.: 535.18  
##             05031  :  1   ANDES     :  1   1821   :  2   Max.   :2959.36  
##             (Other):119   (Other)   :119   (Other):103                    
##    MPIO_NANO        DPTO_CNMBR    Shape_Leng       Shape_Area      
##  Min.   :2017   ANTIOQUIA:125   Min.   :0.1730   Min.   :0.001293  
##  1st Qu.:2017                   1st Qu.:0.6143   1st Qu.:0.011454  
##  Median :2017                   Median :0.8907   Median :0.021059  
##  Mean   :2017                   Mean   :1.1800   Mean   :0.041079  
##  3rd Qu.:2017                   3rd Qu.:1.4197   3rd Qu.:0.043801  
##  Max.   :2017                   Max.   :6.6118   Max.   :0.238949  
##                                                                    
##   MUNICIPIO          Produccion_1     Produccion_10   Produccion_11    
##  Length:125         Min.   :    0.0   Min.   :   17   Min.   :   14.0  
##  Class :character   1st Qu.:  227.2   1st Qu.:  271   1st Qu.:  300.2  
##  Mode  :character   Median :  733.5   Median :  718   Median :  871.0  
##                     Mean   : 1282.8   Mean   : 1346   Mean   : 1593.4  
##                     3rd Qu.: 1371.2   3rd Qu.: 1607   3rd Qu.: 1927.0  
##                     Max.   :13574.0   Max.   :10029   Max.   :11249.0  
##                     NA's   :31        NA's   :36      NA's   :37       
##  Produccion_12      Produccion_2      Produccion_3      Produccion_4    
##  Min.   :   13.0   Min.   :    0.0   Min.   :   11.0   Min.   :   13.0  
##  1st Qu.:  295.0   1st Qu.:  262.8   1st Qu.:  203.8   1st Qu.:  238.2  
##  Median :  902.5   Median :  707.0   Median :  541.0   Median :  694.5  
##  Mean   : 1610.4   Mean   : 1208.7   Mean   : 1104.5   Mean   : 1291.3  
##  3rd Qu.: 1844.2   3rd Qu.: 1217.5   3rd Qu.: 1252.8   3rd Qu.: 1410.8  
##  Max.   :11439.0   Max.   :13574.0   Max.   :13594.0   Max.   :13437.0  
##  NA's   :37        NA's   :31        NA's   :31        NA's   :31       
##   Produccion_5      Produccion_6     Produccion_7      Produccion_8    
##  Min.   :   10.0   Min.   :   0.0   Min.   :   12.0   Min.   :   12.0  
##  1st Qu.:  224.0   1st Qu.: 193.5   1st Qu.:  179.5   1st Qu.:  182.0  
##  Median :  611.5   Median : 525.5   Median :  495.0   Median :  543.5  
##  Mean   : 1228.0   Mean   : 976.6   Mean   : 1124.9   Mean   : 1237.3  
##  3rd Qu.: 1295.2   3rd Qu.:1259.2   3rd Qu.: 1139.5   3rd Qu.: 1335.0  
##  Max.   :10344.0   Max.   :5943.0   Max.   :10080.0   Max.   :10296.0  
##  NA's   :31        NA's   :31       NA's   :34        NA's   :35       
##   Produccion_9   Rendimiento_1    Rendimiento_10  Rendimiento_11 
##  Min.   :   17   Min.   :0.4300   Min.   :0.620   Min.   :0.660  
##  1st Qu.:  264   1st Qu.:0.8800   1st Qu.:1.030   1st Qu.:1.010  
##  Median :  716   Median :1.0000   Median :1.080   Median :1.260  
##  Mean   : 1351   Mean   :0.9796   Mean   :1.087   Mean   :1.289  
##  3rd Qu.: 1601   3rd Qu.:1.1200   3rd Qu.:1.120   3rd Qu.:1.520  
##  Max.   :10097   Max.   :1.6000   Max.   :1.830   Max.   :1.820  
##  NA's   :36      NA's   :32       NA's   :36      NA's   :37     
##  Rendimiento_12  Rendimiento_2    Rendimiento_3    Rendimiento_4  
##  Min.   :0.620   Min.   :0.3800   Min.   :0.3800   Min.   :0.250  
##  1st Qu.:1.030   1st Qu.:0.7800   1st Qu.:0.6350   1st Qu.:0.800  
##  Median :1.290   Median :0.9500   Median :0.8950   Median :1.000  
##  Mean   :1.316   Mean   :0.9231   Mean   :0.8534   Mean   :1.004  
##  3rd Qu.:1.550   3rd Qu.:1.0900   3rd Qu.:1.0200   3rd Qu.:1.238  
##  Max.   :1.870   Max.   :1.6000   Max.   :1.6000   Max.   :1.600  
##  NA's   :37      NA's   :32       NA's   :31       NA's   :31     
##  Rendimiento_5    Rendimiento_6    Rendimiento_7    Rendimiento_8   
##  Min.   :0.2500   Min.   :0.4000   Min.   :0.3800   Min.   :0.4100  
##  1st Qu.:0.8025   1st Qu.:0.7500   1st Qu.:0.6300   1st Qu.:0.6800  
##  Median :0.9600   Median :0.7500   Median :0.7500   Median :0.8100  
##  Mean   :0.9805   Mean   :0.8625   Mean   :0.7929   Mean   :0.8461  
##  3rd Qu.:1.2000   3rd Qu.:1.0000   3rd Qu.:0.9400   3rd Qu.:1.0100  
##  Max.   :2.0000   Max.   :1.5000   Max.   :1.5000   Max.   :1.4900  
##  NA's   :31       NA's   :32       NA's   :34       NA's   :35      
##  Rendimiento_9       YEAR_1        YEAR_10        YEAR_11        YEAR_12    
##  Min.   :0.660   Min.   :2007   Min.   :2016   Min.   :2017   Min.   :2018  
##  1st Qu.:0.990   1st Qu.:2007   1st Qu.:2016   1st Qu.:2017   1st Qu.:2018  
##  Median :1.040   Median :2007   Median :2016   Median :2017   Median :2018  
##  Mean   :1.051   Mean   :2007   Mean   :2016   Mean   :2017   Mean   :2018  
##  3rd Qu.:1.090   3rd Qu.:2007   3rd Qu.:2016   3rd Qu.:2017   3rd Qu.:2018  
##  Max.   :1.770   Max.   :2013   Max.   :2018   Max.   :2017   Max.   :2018  
##  NA's   :36      NA's   :31     NA's   :36     NA's   :37     NA's   :37    
##      YEAR_2         YEAR_3         YEAR_4         YEAR_5         YEAR_6    
##  Min.   :2008   Min.   :2009   Min.   :2010   Min.   :2011   Min.   :2012  
##  1st Qu.:2008   1st Qu.:2009   1st Qu.:2010   1st Qu.:2011   1st Qu.:2012  
##  Median :2008   Median :2009   Median :2010   Median :2011   Median :2012  
##  Mean   :2008   Mean   :2009   Mean   :2010   Mean   :2011   Mean   :2012  
##  3rd Qu.:2008   3rd Qu.:2009   3rd Qu.:2010   3rd Qu.:2011   3rd Qu.:2012  
##  Max.   :2014   Max.   :2015   Max.   :2016   Max.   :2017   Max.   :2018  
##  NA's   :31     NA's   :31     NA's   :31     NA's   :31     NA's   :31    
##      YEAR_7         YEAR_8         YEAR_9              geometry  
##  Min.   :2013   Min.   :2014   Min.   :2015   POLYGON      :125  
##  1st Qu.:2013   1st Qu.:2014   1st Qu.:2015   epsg:4326    :  0  
##  Median :2013   Median :2014   Median :2015   +proj=long...:  0  
##  Mean   :2013   Mean   :2014   Mean   :2015                      
##  3rd Qu.:2013   3rd Qu.:2014   3rd Qu.:2015                      
##  Max.   :2018   Max.   :2018   Max.   :2017                      
##  NA's   :34     NA's   :35     NA's   :36

5. Plotting

#install.packages("RColorBrewer")
library(RColorBrewer)
library(leaflet)

Let’s plot the municipalities with their corresponding coffee production for a single year:

bins <- c(0, 250, 500, 1000, 2000, 5000, 10000, 15000)
pal <- colorBin("YlOrRd", domain = ant_munic_stat$Produccion_12, bins = bins)

  mapa <- leaflet(data = ant_munic_stat) %>%
  addTiles() %>%
  addPolygons(label = ~Produccion_12,
              popup = ~MPIO_CNMBR,
              fillColor = ~pal(Produccion_12),
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.5,
              highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
              ) %>%
  addProviderTiles(providers$OpenStreetMap) %>%
  addLegend("bottomright", pal = pal, values = ~Produccion_12,
    title = "Coffee production in Antioquia [Ton] (2018)",
    opacity = 1
  )
mapa

Note that I used both labels and popups in this map. In case you want to learn more about drawing interactive maps with leaflet, follow this link.

Now, it’s your time for replicationg, adapting and improving this notebook for exploring agricultural statistics for your department.

Good luck!!!

sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/atlas-base/atlas/libblas.so.3.0
## LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] leaflet_2.0.3       RColorBrewer_1.1-2  cowplot_1.0.0      
##  [4] ggrepel_0.8.2       GSODR_2.0.1         rnaturalearth_0.1.0
##  [7] viridis_0.5.1       viridisLite_0.3.0   sf_0.9-0           
## [10] raster_3.0-12       maptools_0.9-9      rgeos_0.5-2        
## [13] sp_1.4-1            forcats_0.5.0       stringr_1.4.0      
## [16] dplyr_0.8.5         purrr_0.3.3         readr_1.3.1        
## [19] tidyr_1.0.2         tibble_2.1.3        ggplot2_3.3.0      
## [22] tidyverse_1.3.0     here_0.1           
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.1              jsonlite_1.6.1          modelr_0.1.6           
##  [4] assertthat_0.2.1        cellranger_1.1.0        yaml_2.2.1             
##  [7] pillar_1.4.3            backports_1.1.5         lattice_0.20-38        
## [10] glue_1.3.2              digest_0.6.25           rvest_0.3.5            
## [13] leaflet.providers_1.9.0 colorspace_1.4-1        htmltools_0.4.0        
## [16] pkgconfig_2.0.3         broom_0.5.5             haven_2.2.0            
## [19] scales_1.1.0            farver_2.0.3            generics_0.0.2         
## [22] ellipsis_0.3.0          withr_2.1.2             cli_2.0.2              
## [25] magrittr_1.5            crayon_1.3.4            readxl_1.3.1           
## [28] evaluate_0.14           fs_1.3.2                fansi_0.4.1            
## [31] nlme_3.1-139            xml2_1.2.5              foreign_0.8-71         
## [34] class_7.3-15            tools_3.6.0             data.table_1.12.8      
## [37] hms_0.5.3               lifecycle_0.2.0         munsell_0.5.0          
## [40] reprex_0.3.0            compiler_3.6.0          e1071_1.7-3            
## [43] rlang_0.4.5             classInt_0.4-2          units_0.6-6            
## [46] grid_3.6.0              rstudioapi_0.11         htmlwidgets_1.5.1      
## [49] crosstalk_1.1.0.1       labeling_0.3            rmarkdown_2.1          
## [52] gtable_0.3.0            codetools_0.2-16        DBI_1.1.0              
## [55] R6_2.4.1                gridExtra_2.3           lubridate_1.7.4        
## [58] knitr_1.28              utf8_1.1.4              rprojroot_1.3-2        
## [61] KernSmooth_2.23-15      stringi_1.4.6           Rcpp_1.0.3             
## [64] vctrs_0.2.4             dbplyr_1.4.2            tidyselect_1.0.0       
## [67] xfun_0.12