This is an R Markdown Notebook illustrating estadisticas agricolas for the Antioquia departamento in Colombia. It has been created, compiled and published from RStudio Cloud. You are advised to use your own RStudio local installation, if you have one. Otherwise, do it on the cloud.
This notebook aims to help Geomatica Basica students at Universidad Nacional to understand basic geomatics concepts useful for agronomy. Every student should replicate this notebook BUT adapting its contents to her/his department.
The due date for publishing this notebook is on 22th April 2020 at 11:59 am. Its publication at RPubs counts as the first technical report for the course. Its mark represents 15% of the final grade.
Exploration of non-spatial statistics is essential for understanding what is going on at territories. Several R libraries, in particular dplyr and tidyverse, are very useful for exploring and summarizing statistics.
On the another hand, geospatial operations can improve our understanding of several issues affecting geographic regions. For example, you want to figure out what is the location of those municipalities whose crop yields are outstanding (or, alternatively, lower than the average). For doing such exploration, we need to join non-spatial data with spatial data. We did already this task using QGIS. Now, we will do it with R.
In addition, we could also explore spatial joins. These operations are based on the intersection between two spatial objects, often points and the polygons. There are many ways we can join objects, which may include specific options like crosses,near, within, touches, etc. The point being not to push you, we can do this in another R Notebook!
Let’s start by removing the memory’s contents:
rm(list=ls())
Now, let’s install the libraries we need. Note that, in the following chunk, any package is installed only if it has not been previously installed.
list.of.packages <- c("here", "tidyverse", "rgeos", "maptools", "raster", "sf", "viridis", "rnaturalearth", "GSODR", "ggrepel", "cowplot")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
Now, let’s load the libraries.
library(here)
## here() starts at /cloud/project
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(rgeos)
## Loading required package: sp
## rgeos version: 0.5-2, (SVN revision 621)
## GEOS runtime version: 3.5.1-CAPI-1.9.1
## Linking to sp version: 1.4-1
## Polygon checking: TRUE
library(maptools)
## Checking rgeos availability: TRUE
library(raster)
##
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
##
## select
## The following object is masked from 'package:tidyr':
##
## extract
library(sf)
## Linking to GEOS 3.5.1, GDAL 2.2.2, PROJ 4.9.2
library(viridis)
## Loading required package: viridisLite
library(rnaturalearth)
library(GSODR)
library(ggrepel)
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
Previously, I have downloaded statistical data, in csv format, on Estadisticas Municipales Agropecuarias to my computer. Then, I have used excel to eliminate rows for municipalities in departments different than Antioquia. I saved the file with the remaining rows as a local file with name EVA_Antioquia.csv at my computer. Then, I uploaded the file into the agro folder at RStudio Cloud.
Let’s read the csv file with “estadisticas municipales agropecuarias” for Antioquia.
datos <- read_csv("./agro/EVA_Antioquia.csv")
## Parsed with column specification:
## cols(
## COD_DEP = col_double(),
## DEPARTAMENTO = col_character(),
## C0D_MUN = col_double(),
## MUNICIPIO = col_character(),
## GRUPO = col_character(),
## SUBGRUPO = col_character(),
## CULTIVO = col_character(),
## YEAR = col_double(),
## Area_Siembra = col_double(),
## Area_Cosecha = col_double(),
## Produccion = col_double(),
## Rendimiento = col_double(),
## ESTADO = col_character(),
## CICLO = col_character()
## )
Let’s check what are the attributes of datos.
head(datos)
## # A tibble: 6 x 14
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2016
## 2 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2016
## 3 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2017
## 4 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2017
## 5 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2018
## 6 5 ANTIOQUIA 5475 MURINDO TUBE… MALANGA MALANGA 2011
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## # Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>
tail(datos)
## # A tibble: 6 x 14
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5761 SOPETRAN FRUT… ZAPOTE ZAPOTE 2017
## 2 5 ANTIOQUIA 5854 VALDIVIA FRUT… ZAPOTE ZAPOTE 2017
## 3 5 ANTIOQUIA 5142 CARACOLI FRUT… ZAPOTE ZAPOTE 2017
## 4 5 ANTIOQUIA 5761 SOPETRAN FRUT… ZAPOTE ZAPOTE 2018
## 5 5 ANTIOQUIA 5854 VALDIVIA FRUT… ZAPOTE ZAPOTE 2018
## 6 5 ANTIOQUIA 5142 CARACOLI FRUT… ZAPOTE ZAPOTE 2018
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## # Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>
Note that every municipality has statistics about area sembrada, area cosechada, y rendimiento for different crops in different years. The attribute SUBGRUPO and CULTIVO seem to refer to the same thing (i.e. a crop). Crops are further classified into a given GRUPO.
In this table, we do not have units. However, if we check the original csv file, we find that area units are hectares and that rendimiento units are Ton/ha
We will use the dplyr library to explore the contents of the datos object.
First, let’s get a summary of rendimiento (i.e. the average yield during several years) by grupo and municipio:
datos %>%
group_by(MUNICIPIO, GRUPO) %>%
summarise(rend_prom = mean(Rendimiento, na.rm = TRUE)) -> rend_resumen
### Let's visualize only the first six records
head(rend_resumen)
## # A tibble: 6 x 3
## # Groups: MUNICIPIO [1]
## MUNICIPIO GRUPO rend_prom
## <chr> <chr> <dbl>
## 1 ABEJORRAL CEREALES 3.63
## 2 ABEJORRAL FIBRAS NaN
## 3 ABEJORRAL FLORES Y FOLLAJES 17.3
## 4 ABEJORRAL FRUTALES 11.7
## 5 ABEJORRAL HORTALIZAS 110
## 6 ABEJORRAL LEGUMINOSAS 1.41
We can also calculate the average yield per GRUPO at Antioquia’s municipalities:
datos %>%
group_by(GRUPO) %>%
summarise(rend_dep = mean(Rendimiento, na.rm = TRUE)) -> rend_antioquia
rend_antioquia
## # A tibble: 11 x 2
## GRUPO rend_dep
## <chr> <dbl>
## 1 CEREALES 1.87
## 2 FIBRAS 1.50
## 3 FLORES Y FOLLAJES 9.59
## 4 FORESTALES 1.06
## 5 FRUTALES 14.0
## 6 HORTALIZAS 39.5
## 7 LEGUMINOSAS 1.36
## 8 OLEAGINOSAS 2.86
## 9 OTROS PERMANENTES 2.01
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES 4.34
## 11 TUBERCULOS Y PLATANOS 10.3
Note that the highest yields correspond to HORTALIZAS, FRUTALES and TUBERCULOS Y PLATANOS.
Then, let’s find what are the municipalites with largest yield for every group of crops in 2018:
datos %>%
filter(YEAR==2018) %>%
group_by(GRUPO, MUNICIPIO) %>%
summarize(max_rend = max(Rendimiento, na.rm = TRUE)) %>%
slice(which.max(max_rend)) -> rend_max_18
rend_max_18
## # A tibble: 11 x 3
## # Groups: GRUPO [11]
## GRUPO MUNICIPIO max_rend
## <chr> <chr> <dbl>
## 1 CEREALES LA UNION 33.6
## 2 FIBRAS GOMEZ PLATA 2.6
## 3 FLORES Y FOLLAJES GUARNE 35
## 4 FORESTALES CACERES 2
## 5 FRUTALES NECOCLI 99
## 6 HORTALIZAS GUATAPE 192.
## 7 LEGUMINOSAS PEÑOL 9
## 8 OLEAGINOSAS DABEIBA 7
## 9 OTROS PERMANENTES BRICEÑO 28
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES RIONEGRO 15.2
## 11 TUBERCULOS Y PLATANOS URRAO 40.5
Now, let’s find what are the municipalites with largest harvested area for every group of crops in 2018:
datos %>%
filter(YEAR==2018) %>%
group_by(GRUPO, MUNICIPIO) %>%
summarize(max_area_cosecha = max(Area_Cosecha, na.rm = TRUE)) %>%
slice(which.max(max_area_cosecha)) -> area_cosecha_max
area_cosecha_max
## # A tibble: 11 x 3
## # Groups: GRUPO [11]
## GRUPO MUNICIPIO max_area_cosecha
## <chr> <chr> <dbl>
## 1 CEREALES NECOCLI 3900
## 2 FIBRAS URRAO 157
## 3 FLORES Y FOLLAJES CARMEN DE VIBOR… 678
## 4 FORESTALES TARAZA 760
## 5 FRUTALES TURBO 10560
## 6 HORTALIZAS CARMEN DE VIBOR… 1000
## 7 LEGUMINOSAS SAN VICENTE 911
## 8 OLEAGINOSAS MUTATA 700
## 9 OTROS PERMANENTES ANDES 7648
## 10 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDIC… RIONEGRO 52
## 11 TUBERCULOS Y PLATANOS TURBO 9441
Note that the larget harvested area in 2018 corresponded to FRUTALES in TURB0 (Urabá). You may know that Turbo’s economy is based on banana crop and tourism industry.
First, let’s select banana’s production (Tons) in Turbo for every year:
datos %>%
filter(MUNICIPIO=="TURBO" & SUBGRUPO=="BANANO") %>%
group_by(YEAR, CULTIVO) -> turbo_banano
turbo_banano
## # A tibble: 18 x 14
## # Groups: YEAR, CULTIVO [18]
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2007
## 2 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2008
## 3 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2009
## 4 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2010
## 5 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2011
## 6 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2012
## 7 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2013
## 8 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2014
## 9 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2015
## 10 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2016
## 11 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2017
## 12 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANO 2018
## 13 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2013
## 14 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2014
## 15 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2015
## 16 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2016
## 17 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2017
## 18 5 ANTIOQUIA 5837 TURBO FRUT… BANANO BANANI… 2018
## # … with 6 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## # Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>
Let’s do a basic graphic exploration:
# we use the ggplot 2 library
g <- ggplot(aes(x=YEAR, y=Produccion/1000), data = turbo_banano) + geom_bar(stat='identity') + labs(y='Banana Production [Ton x 1000]')
g + ggtitle("Evolution of Banana Production in Turbo from 2007 to 2018") + labs(caption= "Based on EMA data (DANE, 2018)")
Now, let’s investigate which crops had the largest harvested area in 2018:
datos %>%
filter(YEAR==2018) %>%
group_by(GRUPO) %>%
summarize(sum_area_cosecha = sum(Area_Cosecha, na.rm = TRUE)) %>%
arrange(desc(sum_area_cosecha)) -> total_area_cosecha
total_area_cosecha
## # A tibble: 11 x 2
## GRUPO sum_area_cosecha
## <chr> <dbl>
## 1 OTROS PERMANENTES 153313
## 2 TUBERCULOS Y PLATANOS 71233
## 3 FRUTALES 66233
## 4 CEREALES 32996
## 5 LEGUMINOSAS 8071
## 6 HORTALIZAS 4693
## 7 OLEAGINOSAS 2289
## 8 FLORES Y FOLLAJES 1946
## 9 FORESTALES 1769
## 10 FIBRAS 762
## 11 PLANTAS AROMATICAS, CONDIMENTARIAS Y MEDICINALES 147
We can see that other permanent crops had the largest share of harvested area in 2018 for Antioquia. Which crops belong to this GRUPO? You may look for more information at DANE, the government agency in charge of national statistics in Colombia.
We can also look for this information at the data themselves:
datos %>%
filter(GRUPO=="OTROS PERMANENTES" & YEAR==2018) %>%
group_by(CULTIVO) %>%
summarize(sum_cosecha = sum(Area_Cosecha, na.rm = TRUE)) %>%
arrange(desc(sum_cosecha)) -> total_cosecha
total_cosecha
## # A tibble: 3 x 2
## CULTIVO sum_cosecha
## <chr> <dbl>
## 1 CAFE 98038
## 2 CAÑA PANELERA 37989
## 3 CACAO 17286
Note that cafe is key in the Antioquian agriculture!
Now, let’s find which are the municipalites with largest harvested area for every permanent crop in 2018:
datos %>%
filter(YEAR==2018 & GRUPO=="OTROS PERMANENTES") %>%
group_by(CULTIVO, MUNICIPIO) %>%
summarize(max_area2 = max(Area_Cosecha, na.rm = TRUE)) %>%
slice(which.max(max_area2)) -> area_cosecha2
area_cosecha2
## # A tibble: 3 x 3
## # Groups: CULTIVO [3]
## CULTIVO MUNICIPIO max_area2
## <chr> <chr> <dbl>
## 1 CACAO TURBO 1761
## 2 CAFE ANDES 7648
## 3 CAÑA PANELERA YOLOMBO 5220
Let’s return to permanent crops data. Before plotting, we will need to add, to the total_area_cosecha object, a new field with abbreviatures for every GROUP of crops. Otherwise, the plot will look messy.
total_area_cosecha$CROP <- abbreviate(total_area_cosecha$GRUPO, 3)
Now, it"s plotting time:
# we use the ggplot 2 library
g <- ggplot(aes(x=CROP, y=sum_area_cosecha), data = total_area_cosecha) + geom_bar(stat='identity') + labs(y='Total Harvested Area [Ha]')
g+ ggtitle("Total harvested area by crop groups in 2018 for Antioquia") + theme(plot.title = element_text(hjust = 0.5)) +
labs(caption= "Based on EMA data (DANE, 2018)")
You can further explore the statistics corresponding to your department. Try to gain insight about the performance of different crops, mainly those which are essential to the economy of your department.
I have already uploaded to RStudio Cloud the administrative data for Antioquia. As previously discussed on a virtual class, it is a good idea to use the shapefile corresponding to Marco Geoestadistico Departamental which is available at DANE Geoportal.
Let’s start reading the data using the sf library:
ant_munic <- sf::st_read("./antioquia/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp")
## Reading layer `MGN_MPIO_POLITICO' from data source `/cloud/project/antioquia/ADMINISTRATIVO/MGN_MPIO_POLITICO.shp' using driver `ESRI Shapefile'
## Simple feature collection with 125 features and 9 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -77.12783 ymin: 5.418558 xmax: -73.88128 ymax: 8.873974
## CRS: 4326
What is ant_munic?
ant_munic
## Simple feature collection with 125 features and 9 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -77.12783 ymin: 5.418558 xmax: -73.88128 ymax: 8.873974
## CRS: 4326
## First 10 features:
## DPTO_CCDGO MPIO_CCDGO MPIO_CNMBR
## 1 05 05001 MEDELLÍN
## 2 05 05101 CIUDAD BOLÍVAR
## 3 05 05107 BRICEÑO
## 4 05 05113 BURITICÁ
## 5 05 05120 CÁCERES
## 6 05 05093 BETULIA
## 7 05 05091 BETANIA
## 8 05 05088 BELLO
## 9 05 05086 BELMIRA
## 10 05 05079 BARBOSA
## MPIO_CRSLC MPIO_NAREA MPIO_NANO
## 1 1965 374.8280 2017
## 2 1869 260.4461 2017
## 3 Ordenanza 27 de Noviembre 26 de 1980 376.3468 2017
## 4 1812 355.2103 2017
## 5 Decreto departamental 160 del 16 de Marzo de 1903 1873.8033 2017
## 6 Decreto departamental 629 del 28 de Enero de 1884 262.3675 2017
## 7 Ordenanza 42 del 24 de Abril de 1920 180.5260 2017
## 8 Ordenanza 48 del 29 deAbril de 1913 147.7589 2017
## 9 1814 296.1532 2017
## 10 1812 205.6662 2017
## DPTO_CNMBR Shape_Leng Shape_Area geometry
## 1 ANTIOQUIA 1.0327835 0.03060723 POLYGON ((-75.66974 6.37359...
## 2 ANTIOQUIA 0.7085039 0.02124224 POLYGON ((-76.04467 5.92774...
## 3 ANTIOQUIA 1.0044720 0.03078496 POLYGON ((-75.45818 7.22284...
## 4 ANTIOQUIA 0.9637233 0.02902757 POLYGON ((-75.90857 6.97378...
## 5 ANTIOQUIA 2.9333643 0.15350440 POLYGON ((-75.20358 7.95716...
## 6 ANTIOQUIA 0.8476756 0.02141352 POLYGON ((-76.00304 6.28171...
## 7 ANTIOQUIA 0.6923058 0.01472138 POLYGON ((-75.95474 5.79522...
## 8 ANTIOQUIA 0.6143227 0.01206804 POLYGON ((-75.58203 6.42510...
## 9 ANTIOQUIA 1.1730035 0.02420036 POLYGON ((-75.69252 6.75917...
## 10 ANTIOQUIA 0.7700180 0.01680390 POLYGON ((-75.32148 6.51265...
Note that ant_munic is a simple feature collection. Make sure to review this link for understanding what is a simple feature.
Note also that the data uses the WGS1984 geographic coordinate reference system (i.e. 4326 epsg code).
We can use the left_join function to join the municipalities and selected agricultural statistics. Let’s look for help to understand how this function works:
## don't do this inside the notebook
## do it in the R console
## ?left_join
##
We need a common attribute (or shared variable) to base the join on. The best attribute is an ID. In ant_munic, the MPIO_CCDGO attribute seems fine (it reads 05001 for Medellin). In datos, the corresponding attribute is COD_MUN (it reads 5001 for Medellin).
Let’s verify the latter statement:
datos %>% filter (MUNICIPIO =="MEDELLIN") -> med_datos
med_datos
## # A tibble: 595 x 14
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2007
## 2 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2008
## 3 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2009
## 4 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2010
## 5 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2011
## 6 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2012
## 7 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2013
## 8 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2014
## 9 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2015
## 10 5 ANTIOQUIA 5001 MEDELLIN FRUT… AGUACATE AGUACA… 2016
## # … with 585 more rows, and 6 more variables: Area_Siembra <dbl>,
## # Area_Cosecha <dbl>, Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>,
## # CICLO <chr>
class(med_datos$C0D_MUN)
## [1] "numeric"
To be able to do the join, we need to change both data type and contents for the code which identifies every municipality. For this task, it is a good idea to create a copy of the original statistical data. Using this approach, any false movement will not spoil the original data.
Let’s proceed step by step:
datos2 <- datos
datos2$TEMP <- as.character(datos2$C0D_MUN)
datos2$MPIO_CCDGO <- as.factor(paste(0, datos2$TEMP, sep=""))
head(datos2)
## # A tibble: 6 x 16
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2016
## 2 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2016
## 3 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2017
## 4 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2017
## 5 5 ANTIOQUIA 5440 MARINILLA HORT… ACELGA ACELGA 2018
## 6 5 ANTIOQUIA 5475 MURINDO TUBE… MALANGA MALANGA 2011
## # … with 8 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## # Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>, TEMP <chr>,
## # MPIO_CCDGO <fct>
Make sure to check, in the datos 2 object, characteristics of the new attribute MPIO_CCDGO.
Now, let’s filter a single year and select only two relevant attributes:
datos2 %>% filter(CULTIVO == "CAFE") -> datos3
head(datos3)
## # A tibble: 6 x 16
## COD_DEP DEPARTAMENTO C0D_MUN MUNICIPIO GRUPO SUBGRUPO CULTIVO YEAR
## <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 5 ANTIOQUIA 5101 CIUDAD B… OTRO… CAFE CAFE 2007
## 2 5 ANTIOQUIA 5034 ANDES OTRO… CAFE CAFE 2007
## 3 5 ANTIOQUIA 5642 SALGAR OTRO… CAFE CAFE 2007
## 4 5 ANTIOQUIA 5209 CONCORDIA OTRO… CAFE CAFE 2007
## 5 5 ANTIOQUIA 5091 BETANIA OTRO… CAFE CAFE 2007
## 6 5 ANTIOQUIA 5093 BETULIA OTRO… CAFE CAFE 2007
## # … with 8 more variables: Area_Siembra <dbl>, Area_Cosecha <dbl>,
## # Produccion <dbl>, Rendimiento <dbl>, ESTADO <chr>, CICLO <chr>, TEMP <chr>,
## # MPIO_CCDGO <fct>
class(datos3)
## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
datos4 <- datos3 %>% dplyr::select(MUNICIPIO, MPIO_CCDGO, YEAR, Produccion, Rendimiento)
datos4
## # A tibble: 1,099 x 5
## MUNICIPIO MPIO_CCDGO YEAR Produccion Rendimiento
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CIUDAD BOLIVAR 05101 2007 9935 1.17
## 2 ANDES 05034 2007 13574 1.6
## 3 SALGAR 05642 2007 4720 1.12
## 4 CONCORDIA 05209 2007 6182 1.23
## 5 BETANIA 05091 2007 5482 1.01
## 6 BETULIA 05093 2007 4826 1.06
## 7 ABEJORRAL 05002 2007 2329 0.61
## 8 SONSON 05756 2007 3600 1.2
## 9 ITUANGO 05361 2007 3110 1.08
## 10 NARIÑO 05483 2007 2643 1
## # … with 1,089 more rows
datos4 %>%
gather("YEAR", "Produccion", "Rendimiento" , key = variable, value = number)
## # A tibble: 3,297 x 4
## MUNICIPIO MPIO_CCDGO variable number
## <chr> <fct> <chr> <dbl>
## 1 CIUDAD BOLIVAR 05101 YEAR 2007
## 2 ANDES 05034 YEAR 2007
## 3 SALGAR 05642 YEAR 2007
## 4 CONCORDIA 05209 YEAR 2007
## 5 BETANIA 05091 YEAR 2007
## 6 BETULIA 05093 YEAR 2007
## 7 ABEJORRAL 05002 YEAR 2007
## 8 SONSON 05756 YEAR 2007
## 9 ITUANGO 05361 YEAR 2007
## 10 NARIÑO 05483 YEAR 2007
## # … with 3,287 more rows
head(datos4)
## # A tibble: 6 x 5
## MUNICIPIO MPIO_CCDGO YEAR Produccion Rendimiento
## <chr> <fct> <dbl> <dbl> <dbl>
## 1 CIUDAD BOLIVAR 05101 2007 9935 1.17
## 2 ANDES 05034 2007 13574 1.6
## 3 SALGAR 05642 2007 4720 1.12
## 4 CONCORDIA 05209 2007 6182 1.23
## 5 BETANIA 05091 2007 5482 1.01
## 6 BETULIA 05093 2007 4826 1.06
This is a key task. It involves several steps to be able to convert the attribute table from long format into wide format. More information about these steps here.
datos4 %>%
group_by(MPIO_CCDGO) %>%
mutate(Visit = 1:n()) %>%
gather("YEAR", "Produccion", "Rendimiento", key = variable, value = number) %>%
unite(combi, variable, Visit) %>%
spread(combi, number) -> datos5
head(datos5)
## # A tibble: 6 x 38
## # Groups: MPIO_CCDGO [125]
## MUNICIPIO MPIO_CCDGO Produccion_1 Produccion_10 Produccion_11 Produccion_12
## <chr> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 ABEJORRAL 05002 2329 3352 5018 4718
## 2 ABRIAQUI 05004 81 114 182 193
## 3 ALEJANDR… 05021 733 682 788 786
## 4 AMAGA 05030 570 643 822 791
## 5 AMALFI 05031 1439 1825 1194 1212
## 6 ANDES 05034 13574 10029 9463 9888
## # … with 32 more variables: Produccion_2 <dbl>, Produccion_3 <dbl>,
## # Produccion_4 <dbl>, Produccion_5 <dbl>, Produccion_6 <dbl>,
## # Produccion_7 <dbl>, Produccion_8 <dbl>, Produccion_9 <dbl>,
## # Rendimiento_1 <dbl>, Rendimiento_10 <dbl>, Rendimiento_11 <dbl>,
## # Rendimiento_12 <dbl>, Rendimiento_2 <dbl>, Rendimiento_3 <dbl>,
## # Rendimiento_4 <dbl>, Rendimiento_5 <dbl>, Rendimiento_6 <dbl>,
## # Rendimiento_7 <dbl>, Rendimiento_8 <dbl>, Rendimiento_9 <dbl>,
## # YEAR_1 <dbl>, YEAR_10 <dbl>, YEAR_11 <dbl>, YEAR_12 <dbl>, YEAR_2 <dbl>,
## # YEAR_3 <dbl>, YEAR_4 <dbl>, YEAR_5 <dbl>, YEAR_6 <dbl>, YEAR_7 <dbl>,
## # YEAR_8 <dbl>, YEAR_9 <dbl>
tail(datos5)
## # A tibble: 6 x 38
## # Groups: MPIO_CCDGO [125]
## MUNICIPIO MPIO_CCDGO Produccion_1 Produccion_10 Produccion_11 Produccion_12
## <chr> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 VALPARAI… 05856 734 545 497 515
## 2 VEGACHI 05858 619 383 222 216
## 3 VENECIA 05861 586 718 869 880
## 4 YALI 05885 64 314 172 154
## 5 YARUMAL 05887 350 309 341 332
## 6 YOLOMBO 05890 1360 1337 1168 1190
## # … with 32 more variables: Produccion_2 <dbl>, Produccion_3 <dbl>,
## # Produccion_4 <dbl>, Produccion_5 <dbl>, Produccion_6 <dbl>,
## # Produccion_7 <dbl>, Produccion_8 <dbl>, Produccion_9 <dbl>,
## # Rendimiento_1 <dbl>, Rendimiento_10 <dbl>, Rendimiento_11 <dbl>,
## # Rendimiento_12 <dbl>, Rendimiento_2 <dbl>, Rendimiento_3 <dbl>,
## # Rendimiento_4 <dbl>, Rendimiento_5 <dbl>, Rendimiento_6 <dbl>,
## # Rendimiento_7 <dbl>, Rendimiento_8 <dbl>, Rendimiento_9 <dbl>,
## # YEAR_1 <dbl>, YEAR_10 <dbl>, YEAR_11 <dbl>, YEAR_12 <dbl>, YEAR_2 <dbl>,
## # YEAR_3 <dbl>, YEAR_4 <dbl>, YEAR_5 <dbl>, YEAR_6 <dbl>, YEAR_7 <dbl>,
## # YEAR_8 <dbl>, YEAR_9 <dbl>
We will also make a copy of the simple features collection (again, just in case of a false move):
ant_munic2 <- ant_munic
Now, it’s join time:
ant_munic_stat = left_join(ant_munic2, datos5, by="MPIO_CCDGO")
summary(ant_munic_stat)
## DPTO_CCDGO MPIO_CCDGO MPIO_CNMBR MPIO_CRSLC MPIO_NAREA
## 05:125 05001 : 1 ABEJORRAL : 1 1814 : 9 Min. : 15.84
## 05002 : 1 ABRIAQUÍ : 1 1833 : 4 1st Qu.: 140.35
## 05004 : 1 ALEJANDRÍA: 1 1812 : 3 Median : 258.09
## 05021 : 1 AMAGÁ : 1 1813 : 2 Mean : 503.74
## 05030 : 1 AMALFI : 1 1817 : 2 3rd Qu.: 535.18
## 05031 : 1 ANDES : 1 1821 : 2 Max. :2959.36
## (Other):119 (Other) :119 (Other):103
## MPIO_NANO DPTO_CNMBR Shape_Leng Shape_Area
## Min. :2017 ANTIOQUIA:125 Min. :0.1730 Min. :0.001293
## 1st Qu.:2017 1st Qu.:0.6143 1st Qu.:0.011454
## Median :2017 Median :0.8907 Median :0.021059
## Mean :2017 Mean :1.1800 Mean :0.041079
## 3rd Qu.:2017 3rd Qu.:1.4197 3rd Qu.:0.043801
## Max. :2017 Max. :6.6118 Max. :0.238949
##
## MUNICIPIO Produccion_1 Produccion_10 Produccion_11
## Length:125 Min. : 0.0 Min. : 17 Min. : 14.0
## Class :character 1st Qu.: 227.2 1st Qu.: 271 1st Qu.: 300.2
## Mode :character Median : 733.5 Median : 718 Median : 871.0
## Mean : 1282.8 Mean : 1346 Mean : 1593.4
## 3rd Qu.: 1371.2 3rd Qu.: 1607 3rd Qu.: 1927.0
## Max. :13574.0 Max. :10029 Max. :11249.0
## NA's :31 NA's :36 NA's :37
## Produccion_12 Produccion_2 Produccion_3 Produccion_4
## Min. : 13.0 Min. : 0.0 Min. : 11.0 Min. : 13.0
## 1st Qu.: 295.0 1st Qu.: 262.8 1st Qu.: 203.8 1st Qu.: 238.2
## Median : 902.5 Median : 707.0 Median : 541.0 Median : 694.5
## Mean : 1610.4 Mean : 1208.7 Mean : 1104.5 Mean : 1291.3
## 3rd Qu.: 1844.2 3rd Qu.: 1217.5 3rd Qu.: 1252.8 3rd Qu.: 1410.8
## Max. :11439.0 Max. :13574.0 Max. :13594.0 Max. :13437.0
## NA's :37 NA's :31 NA's :31 NA's :31
## Produccion_5 Produccion_6 Produccion_7 Produccion_8
## Min. : 10.0 Min. : 0.0 Min. : 12.0 Min. : 12.0
## 1st Qu.: 224.0 1st Qu.: 193.5 1st Qu.: 179.5 1st Qu.: 182.0
## Median : 611.5 Median : 525.5 Median : 495.0 Median : 543.5
## Mean : 1228.0 Mean : 976.6 Mean : 1124.9 Mean : 1237.3
## 3rd Qu.: 1295.2 3rd Qu.:1259.2 3rd Qu.: 1139.5 3rd Qu.: 1335.0
## Max. :10344.0 Max. :5943.0 Max. :10080.0 Max. :10296.0
## NA's :31 NA's :31 NA's :34 NA's :35
## Produccion_9 Rendimiento_1 Rendimiento_10 Rendimiento_11
## Min. : 17 Min. :0.4300 Min. :0.620 Min. :0.660
## 1st Qu.: 264 1st Qu.:0.8800 1st Qu.:1.030 1st Qu.:1.010
## Median : 716 Median :1.0000 Median :1.080 Median :1.260
## Mean : 1351 Mean :0.9796 Mean :1.087 Mean :1.289
## 3rd Qu.: 1601 3rd Qu.:1.1200 3rd Qu.:1.120 3rd Qu.:1.520
## Max. :10097 Max. :1.6000 Max. :1.830 Max. :1.820
## NA's :36 NA's :32 NA's :36 NA's :37
## Rendimiento_12 Rendimiento_2 Rendimiento_3 Rendimiento_4
## Min. :0.620 Min. :0.3800 Min. :0.3800 Min. :0.250
## 1st Qu.:1.030 1st Qu.:0.7800 1st Qu.:0.6350 1st Qu.:0.800
## Median :1.290 Median :0.9500 Median :0.8950 Median :1.000
## Mean :1.316 Mean :0.9231 Mean :0.8534 Mean :1.004
## 3rd Qu.:1.550 3rd Qu.:1.0900 3rd Qu.:1.0200 3rd Qu.:1.238
## Max. :1.870 Max. :1.6000 Max. :1.6000 Max. :1.600
## NA's :37 NA's :32 NA's :31 NA's :31
## Rendimiento_5 Rendimiento_6 Rendimiento_7 Rendimiento_8
## Min. :0.2500 Min. :0.4000 Min. :0.3800 Min. :0.4100
## 1st Qu.:0.8025 1st Qu.:0.7500 1st Qu.:0.6300 1st Qu.:0.6800
## Median :0.9600 Median :0.7500 Median :0.7500 Median :0.8100
## Mean :0.9805 Mean :0.8625 Mean :0.7929 Mean :0.8461
## 3rd Qu.:1.2000 3rd Qu.:1.0000 3rd Qu.:0.9400 3rd Qu.:1.0100
## Max. :2.0000 Max. :1.5000 Max. :1.5000 Max. :1.4900
## NA's :31 NA's :32 NA's :34 NA's :35
## Rendimiento_9 YEAR_1 YEAR_10 YEAR_11 YEAR_12
## Min. :0.660 Min. :2007 Min. :2016 Min. :2017 Min. :2018
## 1st Qu.:0.990 1st Qu.:2007 1st Qu.:2016 1st Qu.:2017 1st Qu.:2018
## Median :1.040 Median :2007 Median :2016 Median :2017 Median :2018
## Mean :1.051 Mean :2007 Mean :2016 Mean :2017 Mean :2018
## 3rd Qu.:1.090 3rd Qu.:2007 3rd Qu.:2016 3rd Qu.:2017 3rd Qu.:2018
## Max. :1.770 Max. :2013 Max. :2018 Max. :2017 Max. :2018
## NA's :36 NA's :31 NA's :36 NA's :37 NA's :37
## YEAR_2 YEAR_3 YEAR_4 YEAR_5 YEAR_6
## Min. :2008 Min. :2009 Min. :2010 Min. :2011 Min. :2012
## 1st Qu.:2008 1st Qu.:2009 1st Qu.:2010 1st Qu.:2011 1st Qu.:2012
## Median :2008 Median :2009 Median :2010 Median :2011 Median :2012
## Mean :2008 Mean :2009 Mean :2010 Mean :2011 Mean :2012
## 3rd Qu.:2008 3rd Qu.:2009 3rd Qu.:2010 3rd Qu.:2011 3rd Qu.:2012
## Max. :2014 Max. :2015 Max. :2016 Max. :2017 Max. :2018
## NA's :31 NA's :31 NA's :31 NA's :31 NA's :31
## YEAR_7 YEAR_8 YEAR_9 geometry
## Min. :2013 Min. :2014 Min. :2015 POLYGON :125
## 1st Qu.:2013 1st Qu.:2014 1st Qu.:2015 epsg:4326 : 0
## Median :2013 Median :2014 Median :2015 +proj=long...: 0
## Mean :2013 Mean :2014 Mean :2015
## 3rd Qu.:2013 3rd Qu.:2014 3rd Qu.:2015
## Max. :2018 Max. :2018 Max. :2017
## NA's :34 NA's :35 NA's :36
#install.packages("RColorBrewer")
library(RColorBrewer)
library(leaflet)
Let’s plot the municipalities with their corresponding coffee production for a single year:
bins <- c(0, 250, 500, 1000, 2000, 5000, 10000, 15000)
pal <- colorBin("YlOrRd", domain = ant_munic_stat$Produccion_12, bins = bins)
mapa <- leaflet(data = ant_munic_stat) %>%
addTiles() %>%
addPolygons(label = ~Produccion_12,
popup = ~MPIO_CNMBR,
fillColor = ~pal(Produccion_12),
color = "#444444",
weight = 1,
smoothFactor = 0.5,
opacity = 1.0,
fillOpacity = 0.5,
highlightOptions = highlightOptions(color = "white", weight = 2, bringToFront = TRUE)
) %>%
addProviderTiles(providers$OpenStreetMap) %>%
addLegend("bottomright", pal = pal, values = ~Produccion_12,
title = "Coffee production in Antioquia [Ton] (2018)",
opacity = 1
)
mapa