This is the first R Markdown Notebook for the Geomatica Basica 2022 course. It aims to know what are the agricultural crops with largest production in a given department. We will use as main source an excel file containing the Evaluaciones Agropecuarias Municipales (EVA) produced by UPRA for 2020.
For writing the notebook follow these directions:
Each chunk is executed by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
A new chunk is added by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
The notebook should be saved with the .Rmd extension in the same directory where the UPRA EVA file was stored.
You may use Preview to get a rendered HTML copy of the contents of the editor. However, if you want to run all R code chunks, you need to use Knit to HTML which is available under the Preview option.
In this step, we will install and load the required R libraries.
#uncomment for first-time execution
#install.packages('tidyverse')
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## Warning: package 'tidyr' was built under R version 4.1.2
## Warning: package 'readr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
To know names of the excel files existing in our workspace type:
(archivos = list.files(pattern='xls'))
## [1] "~$GB_IL_lista.xlsx" "12-Anexos-municipales.xls"
## [3] "BaseSIPRA2020.xlsx" "CNPV-2018-NBI.xlsx"
## [5] "divipola_editado.xls" "divipola.xls"
## [7] "GB_IL_lista.xls" "GB_IL_lista.xlsx"
## [9] "NBI.xlsx"
To know names of the excel sheets existing in the EVA file:
(hojas = readxl::excel_sheets("BaseSIPRA2020.xlsx"))
## [1] "Agrícola_SIPRA_AGRONET" "InventarioBovino"
## [3] "InventarioPorcino" "InvBufalosCaprinoOvinoEquino"
## [5] "InvAves"
To read the excel sheet we are interested:
eva2020 = readxl::read_excel("BaseSIPRA2020.xlsx", sheet = "Agrícola_SIPRA_AGRONET")
To know what object is eva2020:
eva2020
As we are interested in just a department we need to filter eva2020:
stder2020 = dplyr::filter(eva2020, Departamento == "Santander")
To display what the filtered object:
stder2020
Note that there a few colummns that are not relevant (e.g. Codigo del departamento and Departamento). Let’s keep the relevant colummns in another object:
stder2020.tmp <- stder2020 %>% select('Código del Municipio':'Ciclo del cultivo')
To display the result:
stder2020.tmp
Note that there are several columns with strange names (i.e. blank spaces or accent marks). We need to rename those columns:
stder2020.tmp %>% rename(Cod_Mun = 'Código del Municipio',
Grupo = 'Grupo cultivo según especie',
Subgrupo = 'Subgrupo cultivo según especie',
AreaSiembra = 'Area Sembrada (ha)',
AreaCosecha = 'Area Cosechada (ha)',
Produccion = 'Producción (t)',
Rendimiento = 'Rendimiento (t/ha)', Ciclo='Ciclo del cultivo') -> nstder2020
To display the result:
nstder2020
Note also that several columns which represent numeric values (e.g. Produccion or Rendimiento) are defined as character data types. We need to change the type of those columns:
nstder2020 %>% mutate(AreaSiembra = as.numeric(AreaSiembra),
AreaCosecha = as.numeric(AreaCosecha),
Produccion = as.numeric(Produccion),
Rendimiento = as.numeric(Rendimiento)) -> nstder2020
To display the result:
nstder2020
Many data analysis tasks can be approached using the split-apply-combine paradigm: split the data into groups, apply some analysis to each group, and then combine the results.
dplyr makes this very easy through the use of the group_by() function, which splits the data into groups. When the data is grouped in this way summarize() can be used to collapse each group into a single-row summary. summarize() does this by applying an aggregating or summary function to each group.
For example, if we wanted to group by crop and find the total production (Ton/ha) in our department’s municipalities, we type:
nstder2020 %>%
filter(Produccion > 0) %>%
group_by(Cultivo) %>%
summarize(total_produccion = sum(Produccion)) %>%
arrange(desc(total_produccion))
We found the most important crops in our department.
Now we want to find the municipalities with higher production for every crop:
nstder2020 %>%
group_by(Cultivo, Municipio) %>%
summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
slice(which.max(max_prod)) %>%
arrange(desc(max_prod))
## `summarise()` has grouped output by 'Cultivo'. You can override using the `.groups` argument.
nstder2020 %>%
group_by(Grupo,Municipio) %>%
summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
slice(which.max(max_prod)) %>%
arrange(desc(max_prod))
## `summarise()` has grouped output by 'Grupo'. You can override using the `.groups` argument.
Now, let’s select a group of crops and export the data for future reuse:
nstder2020 %>%
group_by(Cod_Mun, Municipio, Grupo) %>%
filter(Grupo=='Frutales') %>%
summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
arrange(desc(max_prod)) -> frutales2020
## `summarise()` has grouped output by 'Cod_Mun', 'Municipio'. You can override using the `.groups` argument.
frutales2020
nstder2020 %>%
group_by(Cod_Mun, Municipio, Grupo) %>%
filter(Grupo=='Leguminosas Y Oleaginosas') %>%
summarize(max_prod = max(Produccion, na.rm = TRUE)) %>%
arrange(desc(max_prod)) -> oleaginosas2020
## `summarise()` has grouped output by 'Cod_Mun', 'Municipio'. You can override using the `.groups` argument.
#write_csv(frutales2020, "./stder_frutales_2020.csv")
write_csv(oleaginosas2020, "./stder_oleag_2020.csv")
Cite this work as Lizarazo, I. Reading and processing municipal agricultural statistics for 2020. Available at: https://rpubs.com/ials2un/readingEVAv1.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [5] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.3
## [9] ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.1 xfun_0.24 bslib_0.2.5.1 haven_2.4.3
## [5] colorspace_2.0-2 vctrs_0.3.8 generics_0.1.0 htmltools_0.5.1.1
## [9] yaml_2.2.1 utf8_1.2.2 rlang_0.4.11 jquerylib_0.1.4
## [13] pillar_1.6.2 glue_1.4.2 withr_2.4.2 DBI_1.1.1
## [17] bit64_4.0.5 dbplyr_2.1.1 modelr_0.1.8 lifecycle_1.0.0
## [21] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.1
## [25] evaluate_0.14 knitr_1.33 tzdb_0.2.0 parallel_4.1.0
## [29] fansi_0.5.0 broom_0.7.12 Rcpp_1.0.7 backports_1.4.1
## [33] scales_1.1.1 vroom_1.5.7 jsonlite_1.7.2 bit_4.0.4
## [37] fs_1.5.0 hms_1.1.1 digest_0.6.27 stringi_1.7.3
## [41] grid_4.1.0 cli_3.0.1 tools_4.1.0 magrittr_2.0.1
## [45] sass_0.4.0 crayon_1.4.1 pkgconfig_2.0.3 ellipsis_0.3.2
## [49] xml2_1.3.2 reprex_2.0.1 lubridate_1.8.0 rstudioapi_0.13
## [53] assertthat_0.2.1 rmarkdown_2.9 httr_1.4.2 R6_2.5.0
## [57] compiler_4.1.0