This is a very simple notebook which illustrates how to subset EVA 2019-2022 agricultural statistics data provided by UPRA at https://upra.gov.co/es-co/Evas_Documentos/BaseEVA_Agr%C3%ADcola20192022.xlsx.
It aims to help Geomatica Basica students at UNAL to gain digital skills in R.
It is assumed that you have already downloaded the xlsx file above mentioned and saved it into the same directory where this notebook lives.
When opened in Excel, the first rows of the BaseEVA_Agricola20192022.xlsx file look like this:
Then, you need to prune the original file in order to remove banners as well as to rename the columns to be able to read it from R. There should not be blank spaces nor strangeous characters on any column name.
Clean the temporal memory:
rm(list=ls())
Load the R libraries:
library(readxl)
library(readr)
library(dplyr)
Let’s check names of files which live in the same directory than this notebook:
list.files()
## [1] "~$BaseEVA_Agrícola20192022.xlsx" "BaseEVA_Agrícola20192022.xlsx"
## [3] "EVA_19_22.xlsx" "figs"
## [5] "Produccion2018.nb.html" "Produccion2022.html"
## [7] "Produccion2022.nb.html" "Produccion2022.Rmd"
## [9] "rsconnect" "stder_frutales_2022.csv"
In this case, the fixed file is EVA_19_22.xlsx
As this is an excel file, it is worth to know how many sheets are there:
hojas <- excel_sheets("EVA_19_22.xlsx")
Let’s see:
hojas
## [1] "Datos"
There is a single sheet there. Let’s read it:
EVA19_22 <- read_excel("EVA_19_22.xlsx")
EVA19_22
Note that the COD_MUN column is a character. This is the common data type for identifiers.
As we are interested in just a department we need to filter EVA19_22:
(stder19_22 = dplyr::filter(EVA19_22, DPTO == "Santander"))
Now, we will select only data for 2022:
(stder_22 = dplyr::filter(stder19_22, YEAR == 2022))
Now, let’s select only the group of crops we are interested in, and summarize the production per municipality:
stder_22 %>%
group_by(COD_MUN, MUNICIPIO, GRUPO) %>%
filter(GRUPO=='Frutales') %>%
summarize(max_prod = max(PRODUCCION, na.rm = TRUE)) %>%
arrange(desc(max_prod)) -> frutales2022
## `summarise()` has grouped output by 'COD_MUN', 'MUNICIPIO'. You can override
## using the `.groups` argument.
frutales2022
We will use the readr library to save our data in csv format:
write_csv(frutales2022, "stder_frutales_2022.csv")
Let’s check that the file has been created:
list.files()
## [1] "~$BaseEVA_Agrícola20192022.xlsx" "BaseEVA_Agrícola20192022.xlsx"
## [3] "EVA_19_22.xlsx" "figs"
## [5] "Produccion2018.nb.html" "Produccion2022.html"
## [7] "Produccion2022.nb.html" "Produccion2022.Rmd"
## [9] "rsconnect" "stder_frutales_2022.csv"
Let’s read the file to check that the data are correct:
los_datos22 <- read_csv("stder_frutales_2022.csv")
## Rows: 86 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): MUNICIPIO, GRUPO
## dbl (2): COD_MUN, max_prod
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
los_datos22
Note that the COD_MUN column is now a double (i.e. numeric). This is not the common data type for identifiers.
In case we need to change such data type, we can use mutate:
los_datos22 %>% mutate(COD_MUN = as.character(COD_MUN)) -> nuevos_datos22
head(nuevos_datos22)
Well done! Now, you can proceed to create the notebook where we will join the 2022 production data to the polygons representing municipalities in our department.
If you use code from this notebook in your project please cite it as follows:
Lizarazo, I. 2023. How to read and subset .xlsx agricultural statistics data. Available at https://rpubs.com/otra_vez_eva
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Ventura 13.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Bogota
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.2 readr_2.1.4 readxl_1.4.2
##
## loaded via a namespace (and not attached):
## [1] crayon_1.5.2 vctrs_0.6.3 cli_3.6.1 knitr_1.43
## [5] rlang_1.1.1 xfun_0.39 generics_0.1.3 jsonlite_1.8.7
## [9] bit_4.0.5 glue_1.6.2 htmltools_0.5.5 sass_0.4.6
## [13] hms_1.1.3 fansi_1.0.4 rmarkdown_2.22 cellranger_1.1.0
## [17] evaluate_0.21 jquerylib_0.1.4 tibble_3.2.1 tzdb_0.4.0
## [21] fastmap_1.1.1 yaml_2.3.7 lifecycle_1.0.3 compiler_4.3.0
## [25] pkgconfig_2.0.3 rstudioapi_0.15.0 digest_0.6.31 R6_2.5.1
## [29] tidyselect_1.2.0 utf8_1.2.3 parallel_4.3.0 vroom_1.6.3
## [33] pillar_1.9.0 magrittr_2.0.3 bslib_0.4.2 bit64_4.0.5
## [37] tools_4.3.0 cachem_1.0.8