1.Introduction

This is the first R Markdown Notebook for the Geomatica Basica 2022 course. It aims to know what are the agricultural crops with largest production in a given department. We will use as main source an excel file containing the Evaluaciones Agropecuarias Municipales (EVA) produced by UPRA for 2020.

For writing the notebook follow these directions:

You may use Preview to get a rendered HTML copy of the contents of the editor. However, if you want to run all R code chunks, you need to use Knit to HTML which is available under the Preview option.

2. Setup

In this step, we will install and load the required R libraries.

#uncomment for first-time execution
#install.packages('tidyverse') 
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readxl)

3. Read an excel file with estadisticas agropecuarias municipales

To know names of the excel files existing in our workspace type:

(archivos = list.files(pattern='xls'))
## [1] "~$GB_IL_lista.xlsx"        "12-Anexos-municipales.xls"
## [3] "BaseSIPRA2020.xlsx"        "CNPV-2018-NBI.xlsx"       
## [5] "divipola_editado.xls"      "divipola.xls"             
## [7] "GB_IL_lista.xls"           "GB_IL_lista.xlsx"         
## [9] "NBI.xlsx"

To know names of the excel sheets existing in the EVA file:

(hojas = readxl::excel_sheets("BaseSIPRA2020.xlsx"))
## [1] "Agrícola_SIPRA_AGRONET"       "InventarioBovino"            
## [3] "InventarioPorcino"            "InvBufalosCaprinoOvinoEquino"
## [5] "InvAves"

To read the excel sheet we are interested:

eva2020 = readxl::read_excel("BaseSIPRA2020.xlsx", sheet = "Agrícola_SIPRA_AGRONET")

To know what object is eva2020:

eva2020

4. Subset data relevant to our department

As we are interested in just a department we need to filter eva2020:

stder2020 = dplyr::filter(eva2020, Departamento == "Santander")

To display what the filtered object:

stder2020

5. Clean the filtered data

Note that there a few colummns that are not relevant (e.g.  Codigo del departamento and Departamento). Let’s keep the relevant colummns in another object:

stder2020.tmp <-  stder2020 %>% select('Código del Municipio':'Ciclo del cultivo')

To display the result:

stder2020.tmp

Note that there are several columns with strange names (i.e. blank spaces or accent marks). We need to rename those columns:

stder2020.tmp %>% rename(Cod_Mun = 'Código del Municipio', 
                         Grupo = 'Grupo cultivo según especie', 
                         Subgrupo = 'Subgrupo cultivo según especie',
                         AreaSiembra = 'Area Sembrada (ha)',
                         AreaCosecha = 'Area Cosechada (ha)',
                         Produccion = 'Producción (t)',                              
                         Rendimiento = 'Rendimiento (t/ha)', Ciclo='Ciclo del cultivo') ->                                  nstder2020

To display the result:

nstder2020

Note also that several columns which represent numeric values (e.g. Produccion or Rendimiento) are defined as character data types. We need to change the type of those columns:

nstder2020 %>% mutate(AreaSiembra = as.numeric(AreaSiembra),
                       AreaCosecha = as.numeric(AreaCosecha),
                       Produccion = as.numeric(Produccion),
                       Rendimiento = as.numeric(Rendimiento)) -> nstder2020 

To display the result:

nstder2020

6. Data processing & analysis

Many data analysis tasks can be approached using the split-apply-combine paradigm: split the data into groups, apply some analysis to each group, and then combine the results.

dplyr makes this very easy through the use of the group_by() function, which splits the data into groups. When the data is grouped in this way summarize() can be used to collapse each group into a single-row summary. summarize() does this by applying an aggregating or summary function to each group.

For example, if we wanted to group by crop and find the total production (Ton/ha) in our department’s municipalities, we type:

nstder2020 %>%
  filter(Produccion > 0) %>%
  group_by(Cultivo) %>%
  summarize(total_produccion = sum(Produccion)) %>% 
  arrange(desc(total_produccion))

We found the most important crops in our department.

Now we want to find the municipalities with higher production for every crop:

nstder2020 %>%
  group_by(Cultivo, Municipio) %>%
  summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
  slice(which.max(max_prod))  %>%
  arrange(desc(max_prod))
## `summarise()` has grouped output by 'Cultivo'. You can override using the `.groups` argument.
nstder2020 %>%
  group_by(Grupo,Municipio) %>%
  summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
  slice(which.max(max_prod))  %>%
  arrange(desc(max_prod))
## `summarise()` has grouped output by 'Grupo'. You can override using the `.groups` argument.

Now, let’s select a group of crops and export the data for future reuse:

nstder2020 %>%
  group_by(Cod_Mun, Municipio, Grupo) %>%
  filter(Grupo=='Frutales') %>% 
  summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
  arrange(desc(max_prod)) -> frutales2020
## `summarise()` has grouped output by 'Cod_Mun', 'Municipio'. You can override using the `.groups` argument.
frutales2020 
nstder2020 %>%
  group_by(Cod_Mun, Municipio, Grupo) %>%
  filter(Grupo=='Leguminosas Y Oleaginosas') %>% 
  summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
  arrange(desc(max_prod)) -> oleaginosas2020
## `summarise()` has grouped output by 'Cod_Mun', 'Municipio'. You can override using the `.groups` argument.

7. Write selected data to file

#write_csv(frutales2020, "./stder_frutales_2020.csv")
write_csv(oleaginosas2020, "./stder_oleag_2020.csv")

8. Reproducibility

Cite this work as Lizarazo, I. Reading and processing municipal agricultural statistics for 2020. Available at: https://rpubs.com/ials2un/readingEVAv1.

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] readxl_1.3.1    forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7    
##  [5] purrr_0.3.4     readr_2.1.2     tidyr_1.2.0     tibble_3.1.3   
##  [9] ggplot2_3.3.5   tidyverse_1.3.1
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.1  xfun_0.24         bslib_0.2.5.1     haven_2.4.3      
##  [5] colorspace_2.0-2  vctrs_0.3.8       generics_0.1.0    htmltools_0.5.1.1
##  [9] yaml_2.2.1        utf8_1.2.2        rlang_0.4.11      jquerylib_0.1.4  
## [13] pillar_1.6.2      glue_1.4.2        withr_2.4.2       DBI_1.1.1        
## [17] bit64_4.0.5       dbplyr_2.1.1      modelr_0.1.8      lifecycle_1.0.0  
## [21] munsell_0.5.0     gtable_0.3.0      cellranger_1.1.0  rvest_1.0.1      
## [25] evaluate_0.14     knitr_1.33        tzdb_0.2.0        parallel_4.1.0   
## [29] fansi_0.5.0       broom_0.7.12      Rcpp_1.0.7        backports_1.4.1  
## [33] scales_1.1.1      vroom_1.5.7       jsonlite_1.7.2    bit_4.0.4        
## [37] fs_1.5.0          hms_1.1.1         digest_0.6.27     stringi_1.7.3    
## [41] grid_4.1.0        cli_3.0.1         tools_4.1.0       magrittr_2.0.1   
## [45] sass_0.4.0        crayon_1.4.1      pkgconfig_2.0.3   ellipsis_0.3.2   
## [49] xml2_1.3.2        reprex_2.0.1      lubridate_1.8.0   rstudioapi_0.13  
## [53] assertthat_0.2.1  rmarkdown_2.9     httr_1.4.2        R6_2.5.0         
## [57] compiler_4.1.0