The survey of attributes associated with a set of geospatial data is essential to identify points for correlation between different types of geographic information, as presented by Machlis (2017). The recognition of these attributes allows social and environmental information to be connected within the mapping programming, generating cartography for different areas of knowledge, as discussed by Anderson (2021). From this perspective, Moreno and Basile (2021) showed that through this integration it is possible, in the universe of the R® language, to produce maps with sufficient quality to compose territorial, spatial, socio-environmental, environmental and cultural analyses, thus serving the most diverse scientific fields. . Considering this, Gauthier (2021) and Ellis, Beusen and Goldewijk (2020) demonstrated, through the global mapping of anthropogenic biomes (anthromes), that the correlation between different types of geospatial information, in addition to being possible, proves to be a tool modern approach to socio-ecological studies. These authors carried out the indicated mapping using computational language, R® language itself, and were able to generate a relevant mapping on a 96 km² hexagonal grid of the DGG.
According to the information reported in the structural bases on anthromes, Ellis and Ramankutty (2008) pointed out that demographic information, such as demographic density, and land use and coverage, such as coverage typologies, are fundamental for the identification, classification and mapping of different anthropogenic classes. Both in the cartographic product reported by these two authors, and in that published by Gauthier (2021) and Ellis, Beusen and Goldewijk (2021), it is noticeable that the territorial sections of land uses (structured polygons) represent significant information for the classification and for mapping anthromes.
Assuming the cartographic products described above and the indications on database attributes, and intending to produce local mapping of Brazilian anthromes in the future, we carried out an exploratory analysis of data on land use and cover. These data are produced by the Brazilian Institute of Geography and Statistics (IBGE), following the technical description documented in the Technical Manual for Land Use and Cover (IBGE, 2013b).
The central objectives of this investigation were to identify the attributes that made up the data made available by the institution, recognize the spatialization of geographic information on land use and land cover and verify whether there were key attributes for correlating these geospatial data with demographic data, analyzed by us in another work. To achieve these objectives, we based ourselves on the logical and analytical formatting presented in the works of Lovelace et al. (2019) and Anderson (2021), where the functions for exploratory data analysis in the R® software are presented.
The data on land use and coverage used in this research were those available on the Brazilian Institute of Geography and Statistics (IBGE) platform. This data was downloaded and saved in the workbook (directory) associated with this stage of the investigation. We emphasize that it is part of Brazilian national open data policies (IBGE, 2019) and is available for remote access at the link:
We highlight that land use and land cover data were produced by IBGE following the methodological guidelines expressed in the Technical Manual on Land Use and Cover (IBGE, 2013b). Previously, we analyzed the quality of the data made available by the Brazilian institute, checking their compliance with the intended use for the regionalization of Brazilian anthromes (GOBBO, 2020).
Continuing our PhD research, in this exploratory analysis we investigated attributes that were part of the data collection made available by IBGE. The central objective was to identify which attributes associated with this geographic information would allow the correlation between land use and cover data and demographic data, as well as the plotting of this information in the local mapping of anthromes, as suggested by Guathier (2021) and Ellis , Beusen and Goldewijk (2020).
To this end, the exploratory analysis of land use and land cover data was carried out using the R® software, aligning our investigation with those carried out by the aforementioned authors on anthromes. The analytical guidelines used in the R® language were taken from the works of Lovelace et al. (2019) and Anderson (2021), where the authors follow an investigative path to identify the constituent attributes of the data and the possibility of integrating information into mappings.
As we previously presented in the Thesis Methodology, our focus was on the State of São Paulo, justifying ourselves in the representativeness of the Federation Unit for national economy, politics and management and by the territorial divisions and sociocultural diversity of the populations that make up the São Paulo mosaic. From this perspective, data relating to São Paulo’s land cover for the period between 2010 and 2018 were downloaded from the IBGE digital collection.
Following the methodological recommendations of Gauthier (2021) for classifying and mapping anthromes, which indicates that use and coverage and census data must have the same dating to optimize correlation and reduce information distortions, we chose to use raster data referring to to 2010, the last year in which data from the census operation was fully available on the institute’s platform (IBGE, 2013a).
To present and confirm the working directory for the exploratory analysis of data on land use and cover, we use the getwd() function, which informs the name of the folder in which the files to be analyzed were saved, as shown below.
getwd()
## [1] "C:/ARQUIVOS COMPUTADOR/DOUTORADO/DOUTORADO TESE/03 DADOS GEOESPACIAIS/03.1 LAND USE AND COVER"
We highlight that this Brazilian data is of the raster type and, therefore, follows a different loading logic than the vector data in the R® software. Therefore, we use the shapefile() function to load this data into the program. Between the parentheses of the function, we specify the name of the file in the directory, whose name was coverage_sp.shp. We emphasize that this name reflected a set of files made available by IBGE with different extensions, which should be included in the work folder so that the shapefile() function could interpret them together.
Script 12 expresses in the first line how the data was loaded. After loading the raster data set, we create an object (data frame) from it, using the indicator (arrow) preceded by the given name of coverage_sp, as demonstrated in the second line of the code. Next, we confirm the creation of the object and its composition using only the name of the coverage_sp set (highlighted in the script).
Script 01: Loading the file coverage_sp.shp.
coverage_sp <- st_read("coverage_sp.shp")
## Reading layer `coverage_sp' from data source
## `C:\ARQUIVOS COMPUTADOR\DOUTORADO\DOUTORADO TESE\03 DADOS GEOESPACIAIS\03.1 LAND USE AND COVER\coverage_sp.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 248217 features and 8 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -53.10706 ymin: -25.30115 xmax: -44.16211 ymax: -19.78065
## Geodetic CRS: SIRGAS 2000
Source: the authors (2023). Caption: script describing the loading of data from “coverage_sp.shp” through the st_read() function and the creation of the coverage_sp data frame. The script presents different information that constitutes the file and the loaded data.
The results presented in Script 12 provided relevant information about the composition of the coverage_sp set. Firstly, we observed that the class of the object is of the data frame type of spatial polygons, that is, the data described a set of polygons that represented the areas that made up the territorial mosaic of the State of São Paulo. Additionally, we verified that the object was composed of 248,217 lines (features) and 8 variables. These variables had the following names: INDICE_GRE, USO2000, USO2010, USO2012, USO2014, USO2016, USO2018 and ID_GRE.
In minimum and maximum values (respectively, min values and max values) the minimum and maximum values for each of the variables that made up the data set were presented. On the other hand, in extend, we are faced with the indication of the minimum and maximum extension of the X and Y values that structured the set of polygons, that is, the values of the ordered pairs for plotting the data. Additionally, through the CRS indicated in Script 12, we recognized that there was a geographic projection and positioning system linked to the structure of the data set.
In order to broaden our perception of the coverage_sp set and confirm the above statements, we use the summary () function to identify basic characteristics of the polygonal data frame. Subsequently, we use the class () function in the analysis to confirm the class that the set belongs to. Sequentially, we used the names () function to identify the names expressed in the first line of the data frame, that is, the attributes that comprised it. To confirm the number of lines that make up the data frame, we apply the dim () function. The functions and their results are reported in Script 13 below.
Script 02: Preliminary coverage_sp analysis
class(coverage_sp)
## [1] "sf" "data.frame"
names(coverage_sp)
## [1] "INDICE_GRE" "USO2000" "USO2010" "USO2012" "USO2014"
## [6] "USO2016" "USO2018" "ID_GRE" "geometry"
dim(coverage_sp)
## [1] 248217 9
summary(coverage_sp)
## INDICE_GRE USO2000 USO2010 USO2012
## Length:248217 Min. : 1.000 Min. : 1.000 Min. : 1.000
## Class :character 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000
## Mode :character Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 3.848 Mean : 3.697 Mean : 3.643
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. :13.000 Max. :13.000 Max. :13.000
## USO2014 USO2016 USO2018 ID_GRE
## Min. : 1.000 Min. : 1.000 Min. : 1.000 Min. :1.124e+13
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.:1.124e+13
## Median : 3.000 Median : 3.000 Median : 3.000 Median :1.124e+13
## Mean : 3.624 Mean : 3.605 Mean : 3.597 Mean :1.124e+13
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.:1.124e+13
## Max. :13.000 Max. :13.000 Max. :13.000 Max. :1.124e+13
## geometry
## POLYGON :248217
## epsg:4674 : 0
## +proj=long...: 0
##
##
##
Source: the authors (2023). Caption: script with preliminary coverage analysis, where the class(), names(), dim() and summary() functions were used.
According to the results presented above (Script 13), we confirmed through the class () function that the object belonged to the spatial polygon data frame class. In the structure of the object, we also verified that the object had a geographic coordinate system (CRS) integrated into it, which was represented in the Script by the minimum and maximum values of X and Y and by the expression “proj4string”. This expression revealed the geographic projection capacity of the data thanks to the coordinates indicated in the object structure (XY ordered pairs).
Furthermore, we proved that coverage_sp had the spatial characteristic integrated into its structure, revealed by the expression “sp”, which comes from the English term spatial. Another important aspect was the indication of “attr”, an expression that demonstrated that the object was composed of a set of attributes (attr), which were stored in package format.
Furthermore, the results from the names () function reaffirmed the names of the attributes of the coverage_sp set, as well as their distribution in 8 columns. Regarding the dim () function, the results showed that the object consisted of 248,217 lines and 8 columns. We thus verified that each of the lines referred to different polygonal structures in the territory of São Paulo, that is, they did not portray the locations and typologies of use and coverage that make up the mosaic of the State of São Paulo.
On the other hand, considering the results obtained by the names() function and the typology of results for each of the attributes, indicated by the summary() function, we recognize that the attributes “USO2000”, “USO2010”, “USO2012”, “USO2014” , “USO2016” and “USO2018” portrayed the different years of Earth use and coverage monitoring, an operation carried out by IBGE every two years (IBGE, 2017; 2018; 2020).
From this perspective, as we were interested in the year 2010 for structuring the decision tree for classifying Brazilian anthromes, we filtered the data referring to the year in the plot() function, using the name of the attribute “USO2010” between the square brackets [] in the function. Figure 01 reports how the coverage data was plotted for the year 2010.
Figure 01: Plot of coverage data for the year 2010.
Source: the authors (2023). Caption: Figure produced by plotting the coverage data, using the plot (_) function for production and the brackets [] to filter the data for the year 2010.
Figure 01, produced by plotting the data set, demonstrated that the polygonal structure of the State of São Paulo was visibly structured within the coverage_sp data frame. However, the complete filling of the territorial area did not allow the differentiation of land uses and covers that made up the mosaic. However, from the results obtained previously, we were able to identify that there are numbers in the USO2010 column and, as suggested by the IBGE metadata, we know that these numbers refer to different types of land use and cover. This fact can be demonstrated by applying the head() function, whose application deals with demonstrating the first lines of the set. The following code illustrates the use of the function.
Script: Application of the head() function in coverage_sp.
head(coverage_sp)
## Simple feature collection with 6 features and 8 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -49.89744 ymin: -23.05592 xmax: -49.88686 ymax: -23.00143
## Geodetic CRS: SIRGAS 2000
## INDICE_GRE USO2000 USO2010 USO2012 USO2014 USO2016 USO2018 ID_GRE
## 1 1KME5422N8762 4 4 4 4 4 4 1.123535e+13
## 2 1KME5422N8763 2 2 2 2 2 2 1.123535e+13
## 3 1KME5422N8764 2 2 2 2 2 2 1.123535e+13
## 4 1KME5422N8765 2 2 2 2 2 2 1.123535e+13
## 5 1KME5422N8766 4 4 4 4 4 4 1.123535e+13
## 6 1KME5422N8767 4 4 4 4 4 4 1.123535e+13
## geometry
## 1 POLYGON ((-49.88686 -23.055...
## 2 POLYGON ((-49.88701 -23.046...
## 3 POLYGON ((-49.88715 -23.037...
## 4 POLYGON ((-49.88729 -23.028...
## 5 POLYGON ((-49.88744 -23.019...
## 6 POLYGON ((-49.88758 -23.010...
Source: the authors (2024). Caption: demonstration of the head() function to find the numbers in the first six lines of the set coverage_sp.
As seen in the results presented by the head() function, we notice that in the USO2010 column there are different numbers for each line in the set. Synergistically, we notice that in the figure generated previously a scale is presented that goes from 0 to 14, which reveals which numbers are present in this set. Given these findings, we moved forward with our analyzes for data mining and manipulation.
Considering our focus on 2010 data, given the demographic data previously analyzed, our mining began by isolating the columns from the coverage_sp set that would be useful for subsequent analyses. Therefore, using the select() function, we created a new set with the columns USO2010 and geometry, which was named coverage_sp_2010, as illustrated in the following code.
Script: creation of the coverage_sp_2010 set.
coverage_sp_2010 <- coverage_sp[c("USO2010", "geometry")]
coverage_sp_2010
## Simple feature collection with 248217 features and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -53.10706 ymin: -25.30115 xmax: -44.16211 ymax: -19.78065
## Geodetic CRS: SIRGAS 2000
## First 10 features:
## USO2010 geometry
## 1 4 POLYGON ((-49.88686 -23.055...
## 2 2 POLYGON ((-49.88701 -23.046...
## 3 2 POLYGON ((-49.88715 -23.037...
## 4 2 POLYGON ((-49.88729 -23.028...
## 5 4 POLYGON ((-49.88744 -23.019...
## 6 4 POLYGON ((-49.88758 -23.010...
## 7 4 POLYGON ((-49.88772 -23.001...
## 8 4 POLYGON ((-49.88787 -22.992...
## 9 4 POLYGON ((-49.88801 -22.983...
## 10 4 POLYGON ((-49.93661 -22.983...
Source: the authors (2024). Caption: application of the select() function to create a new data set for the year 2010. In the code, the INDICE_GRE, USO2010, ID_GRE and geometry columns from the coverage_sp set were isolated.
As can be seen in the results, the set coverage_sp_2010 follows the same structure as cobertura_sp, however only with the relevant data for subsequent analyzes referring to the year 2010. Based on this, we carried out a survey in the column USO2010 to confirm what was previously presented about the values present in this column. To carry out such polling, we use the unique() function, which reports the values, in this case numeric, present in the column. Below is the presentation of the code and results.
Script: application of the unique() function in coverage_sp_2010.
unique(coverage_sp_2010$USO2010)
## [1] 4 2 1 3 12 5 6 11 10 13 9
Source: the authors (2024). Legend: application of the unique() function in the coverage_sp_2010 data set in the USO2010 column to identify the numeric values present in the column.
According to the results reported by the unique() function, we note the numeric identifiers: 1,2,3,4,5,6,9,10,11,12 and 13. We call them numerical identifiers because, According to the IBGE technical documentation, these values are assigned to different land uses and covers. Considering this, we must provide a synthesis, based on the document, of the respective uses and coverage in order to advance in our analyses. In the metadata made available by IBGE, we find the file coverage_sp2018.xlsx, which contains the respective uses and coverage and a brief description for each of the numbers identified above. This information is reported in full below.
1.Artificial area: Areas where non-agricultural anthropogenic surfaces predominate. These are those structured by buildings and road systems, which include metropolises, cities, towns, indigenous villages and quilombola communities, areas occupied by industrial and commercial complexes and buildings that may, in some cases, be located in peri-urban areas. . Areas where exploration or extraction of mineral substances occurs, through mining or mining, also belong to this class.
Agricultural Area: Area characterized by temporary, semi-perennial and permanent crops, irrigated or not, with the land used for the production of food, fiber, fuel and other raw materials. It follows the parameters adopted in IBGE agricultural research and includes all cultivated areas, including those that are fallow or located on flood able land. It can be represented by heterogeneous agricultural zones or extensive areas of plantations. Includes aquaculture tanks.
Managed Pasture: Areas intended for grazing cattle and other animals, with cultivated herbaceous vegetation (brachiaria, ryegrass, etc.) or field vegetation (natural), both presenting high-intensity anthropogenic interference. These interferences may include planting; cleaning the land (destopping and destonering); elimination of weeds mechanically or chemically (application of herbicides); harrowing; liming; fertilizing; among others that distort the natural coverage.
Mosaic of Occupations in Forest Area: Area characterized by mixed occupation of agricultural, pasture and/or forestry areas associated or not with forest remnants, in which individualization of its components is not possible. It also includes areas with natural and anthropogenic disturbances, mechanical or non-mechanical, that make it difficult to characterize the area.
Silviculture: Area characterized by forest plantations of exotic or native species as monocultures. It follows the parameters adopted in IBGE’s plant extraction and forestry research.
Forest Vegetation: Area occupied by forests. Tree formations exceeding 5 meters in height are considered forestry, including areas of Dense Rainforest, Open Rainforest, Seasonal Forest, as well as Mixed Rainforest. It includes other features due to their size exceeding 5 m in height, such as the Forested Savanna, Forested Campinarana, Forested Steppe Savanna, Mangroves and Buritizais, According to the Technical Manual for Land Use (IBGE, 2013).
Wet Area: Area characterized by natural herbaceous or shrubby vegetation (coverage of 10% or more), permanently or periodically flooded by fresh or brackish water. Includes ponds, swamps, wet fields, estuaries, among others. The flooding period must be at least 2 months per year. Shrub or tree vegetation may occur, as long as they occupy less than 10% of the total area.
Countryside Vegetation: Area characterized by grassland formations. Countryside is understood as the different categories of vegetation that are physiognomically very different from forestry, that is, those that are characterized by a predominantly shrubby stratum, sparsely distributed over a grassy-woody stratum. This category includes Savannas, Steppes, Steppe Savannas, Pioneer Formations and Ecological Refuges. They are spread across different phytogeographic regions, comprising different primary typologies: plateau steppes, rocky fields in coastal mountains and coastal hydro-sandy fields (restinga), According to the Technical Manual for Land Use (IBGE, 2013). These areas may be subject to grazing and other low-intensity anthropogenic interference, such as the unmanaged pasture areas of Rio Grande do Sul and Pantanal.
Mosaic of Occupations in Countryside Areas: Area characterized by mixed occupation of agricultural, pasture and/or forestry areas associated or not with countryside remnants, in which individualization of its components is not possible. It also includes areas with natural and anthropogenic disturbances, mechanical or non-mechanical, that make it difficult to characterize the area.
Continental body of water: Includes all inland waters, such as rivers, streams, canals and other linear bodies of water. It also encompasses naturally closed bodies of water (natural lakes) and artificial reservoirs (artificial water dams built for irrigation, flood control, water supply and electricity generation). Does not include aquaculture tanks.
13: Coastal body of water: Includes waters within 12 nautical miles, According to Law No. 8,617, of January 4, 1993.
Based on this extract from the IBGE document, we resumed our table of anthropogenic classes to establish a parallel between the categories of land use and cover and the anthropogenic classes. The following table represents the categories and colors referring to each anthrome to be classified.
Table: Categories and colors of anthromes.
## categories colors
## 1 Urban (urban center and dense population cluster) #FF0000
## 2 Mixed urban (urban periphery) #FF4747
## 3 Agricultural villages #ED833B
## 4 Livestock villages #FFD966
## 5 Traditional Communities #9CC2E5
## 6 Mining villages #8C8C8C
## 7 Agricultural land #F4B488
## 8 Livestock land #FFEEB7
## 9 Mining lands #B6B6B6
## 10 Forestry #E2EFD9
## 11 Forest areas for sustainable use #A8D08D
## 12 Preservation and conservation areas #538135
Source: the authors (2024). Caption: Categories of anthromes and their respective color codes (RGB).
Considering this table and the classes previously presented for land uses and covers, we must make some notes to help understand the next analytical steps. These notes cover the adjustments made to establish the parallel between the classes of anthromes and the classes of land use and cover present in the coverage_sp_2010 data. We will list the adjustments below.
The class of artificial area brings together different types of population clusters, which makes their distinction impossible at first. Therefore, we established the nomenclature and color for this class referring to the anthropogenic category of mixed urban, considering the range of anthropogenic classes that these data can group together.
For the typology agricultural area (use and coverage) we established the nomenclature and coloring of agricultural land, considering the description provided by IBGE for this category of land use and coverage. The same was applied to the category of use and coverage referring to managed pasture, whose reference for anthromes was livestock land. Furthermore, since silviculture is a common category between the use and cover classes and anthromes, the same coloration was established.
For the land use and cover categories countryside vegetation, humid area and forest vegetation, we established the nomenclature and coloring of preservation and conservation areas, merging them as in a single set. Our choice was made based on the description of the uses and coverage that these areas describe. According to IBGE, the three types of use and coverage refer to natural areas that have not been disturbed. Notably, we disregarded the phytophysiological and biogeographical differences associated with the three typologies, however, this does not preclude further analysis and can be considered as a limitation of our analyses.
The category of use and coverage mosaic of occupations in forest areas was understood by us as forest areas of sustainable use in the categories of anthromes. Therefore, we established this nomenclature and coloring for such areas. On the other hand, we consider the category mosaic of occupations in rural areas as synergistic with agricultural/livestock villages; thus, we established the nomenclature and coloring of agricultural villages (anthrome category) for this category of use and coverage. It should be noted that, as in artificial areas, it was not possible to distinguish between agriculture and livestock; therefore, the distinction was made a posteriori with the integration of demographic data with land use and land cover
For the categories continental water body and coastal water body there were no representative categories in the anthromes. In order not to remove this data set from our analyses, we established the color “#000066” (dark blue) to include the data in subsequent analyzes and defined the typology as waters. The maintenance of this set aimed to portray the relevance of water courses for the maintenance and supply of communities, as well as for territorial planning policies.
After these words, we structure, using the if/else() function, the code so that the data is interpreted as mentioned above. Therefore, we determined in coverage_sp_2010 what each of the values in the USO2010 column represents as a typology of land use and coverage and the color defined for each of the typologies. The following code illustrates the procedure for inserting two new columns into the coverage_sp_2010 set: typology and color.
Script: Determination of land use and coverage and the color of the typology based on the numeric attribute in the USO2010 column.
# Add usage and coverage typology column with if/else
coverage_sp_2010$tipology <- ifelse(coverage_sp_2010$USO2010 == 1, "mixed urban",
ifelse(coverage_sp_2010$USO2010 == 2, "agricultural lands",
ifelse(coverage_sp_2010$USO2010 == 3, "livestock lands",
ifelse(coverage_sp_2010$USO2010 == 4, "sustainable use forest areas",
ifelse(coverage_sp_2010$USO2010 == 5, "silviculture",
ifelse(coverage_sp_2010$USO2010 == 6, "preservation and conservation areas",
ifelse(coverage_sp_2010$USO2010 == 9, "preservation and conservation areas",
ifelse(coverage_sp_2010$USO2010 == 10, "preservation and conservation areas",
ifelse(coverage_sp_2010$USO2010 == 11, "agricultural villages",
ifelse(coverage_sp_2010$USO2010 == 12, "waters",
ifelse(coverage_sp_2010$USO2010 == 13, "waters", NA)))))))))))
# Add color column specified with if/else
coverage_sp_2010$colors <- ifelse(coverage_sp_2010$USO2010 == 1, "#FF4747",
ifelse(coverage_sp_2010$USO2010 == 2, "#F4B488",
ifelse(coverage_sp_2010$USO2010 == 3, "#FFEEB7",
ifelse(coverage_sp_2010$USO2010 == 4, "#A8D08D",
ifelse(coverage_sp_2010$USO2010 == 5, "#E2EFD9",
ifelse(coverage_sp_2010$USO2010 == 6, "#538135",
ifelse(coverage_sp_2010$USO2010 == 9, "#538135",
ifelse(coverage_sp_2010$USO2010 == 10, "#538135",
ifelse(coverage_sp_2010$USO2010 == 11, "#ED833B",
ifelse(coverage_sp_2010$USO2010 == 12, "#000066",
ifelse(coverage_sp_2010$USO2010 == 13, "#000066", NA)))))))))))
Source: the authors (2024). Caption: application of the if/else() function to construct the typology (land use and coverage) and color columns, using as a factor the numeric attribute of the USO2010 column of coverage_sp_2010.
After inserting these two pieces of information into the coverage_sp_2010 set, we fragmented it into the different groups of land use and coverage represented by the codes: 1,2,3,4,5,6,9,10,11 .12 and 13. The following script illustrates the process for constructing anthrome classes based on soil use and land cover attributes.
Script: fragmentation of the coverage_sp_2010 set into land use and land cover groups.
# mixed urban (1)
ucs_mixed_urban = coverage_sp_2010 %>% filter(USO2010 == 1)
# agricultural lands (2)
ucs_agricultural_lands = coverage_sp_2010 %>% filter(USO2010==2)
# livestock lands (3)
ucs_livestock_lands = coverage_sp_2010 %>% filter(USO2010 == 3)
# sustainable use forest areas (4)
ucs_sustainable_use_forest = coverage_sp_2010 %>% filter(USO2010 == 4)
# silviculture (5)
ucs_silviculture = coverage_sp_2010 %>% filter(USO2010 == 5)
# preservation and conservation areas - combination of Forest Vegetation (6), Wet Area (9) and Countryside Vegetation (10)
ucs_preservation_conservation = coverage_sp_2010 %>% filter(USO2010 %in% c(6,9,10))
# Mosaic of occupations in rural areas (11)
ucs_agricultural_villages = coverage_sp_2010 %>% filter(USO2010 == 11)
# Continental (12) and coastal water bodies (13)
ucs_waters = coverage_sp_2010 %>% filter(USO2010 %in% c(12,13))
Source: the authors (2024). Caption: creation of different classes of anthromes based on the attribute of soil use and land cover (from coverage_sp_2010).
In the script above, we fragmented the **coverage_sp_2010* set using the codes for the different types of land use and coverage. We highlight that the areas were aggregated in accolorsdance with the previously postulated. Coming from this fragmentation, also understood as data mining, we created the following sets (8 in total):
These sets represent the types of land use and cover present in the State of São Paulo and also the anthromes classified based on the uses and covers present in the Federation Unit. It should be noted that, as we previously reported, some typologies aggregate data from different territorial contexts. An example is the artificial area which clusters areas such as cities, towns, indigenous villages and quilombola communities. From this perspective, we had a warning for future analyses, considering that these areas describe different populated anthromes. Therefore, at this stage we consider them only as artificial areas, treating them together. However, when combining land use and coverage data with population data, the distinction between the typology of the anthropogenic biome was referred to based on data from populated anthromes, with a view to better differentiating the occupations and uses of anthropogenic biomes.
After data mining and manipulation was carried out, creating sets of land use and land cover anthromes present in the State of São Paulo, we moved on to viewing these sets individually. In the following code, we establish, through st_geometry() the mechanism for operating the plot() function, indicating that it uses the geometry of the polygons of each type of use and coverage as a form for plotting. Furthermore, we used the col factor so that the plot() function assigned the respective colors intended for the anthropogenic typology to the polygons. Furthermore, we remove the edges of the polygons, allowing only the plotting of the colored shape (border = NA). Below are represented the different anthromes based on land uses and covers.
Figure: coded structure for producing images referring to anthrome’s typologies based on the attribute land use and cover.
# Plot mixed urban (1)
plot(st_geometry(ucs_mixed_urban), col = ucs_mixed_urban$colors, main = "mixed urban", border = NA)
# Plot agricultural lands (2)
plot(st_geometry(ucs_agricultural_lands), col = ucs_agricultural_lands$colors, main = "agricultural lands", border = NA)
# Plot livestock lands
plot(st_geometry(ucs_livestock_lands), col = ucs_livestock_lands$colors, main = "livestock lands", border = NA)
# Plot sustainable use forest areas (4)
plot(st_geometry(ucs_sustainable_use_forest), col = ucs_sustainable_use_forest$colors, main = "sustainable use forest areas", border = NA)
# Plot silviculture (5)
plot(st_geometry(ucs_silviculture), col = ucs_silviculture$colors, main = "silviculture", border = NA)
# Plot conservation and preservation areas (6)
plot(st_geometry(ucs_preservation_conservation), col = ucs_preservation_conservation$colors, main = "conservation and preservation areas", border = NA)
# Plot agricultural villages (11)
plot(st_geometry(ucs_agricultural_villages), col = ucs_agricultural_villages$colors, main = "agricultural villages", border = NA)
# Plot bodies of water
plot(st_geometry(ucs_waters), col = ucs_waters$colors, main = "bodies of water", border = NA)
Source: the authors (2024). Caption: São Paulo anthromes isolated based on the typology of land use and cover.
The images generated by plotting the sets revealed how São Paulo’s anthromes are distributed within the Federation Unit separately. In the figure artificial area we observe the distribution of population clusters, mainly those associated with municipalities, towns, districts and traditional communities. It is worth noting that in the images mosaic of occupations in forest areas and mosaic of occupations in rural areas there is also the presence of a population, however, they are represented here in a generic way. As we pointed out previously, at the junction of land use and land cover anthromes with populated anthromes, we will be able to distinguish where the populations are actually inserted in these two sets.
Regarding the images agricultural area and managed pasture, we have agricultural areas and livestock areas represented separately. Here we have reference to the planting and breeding areas, with no directly associated population. Likewise, native forests, conservation and preservation represent anthromes where there is no human presence and associated human activity. In contrast, the silviculture image represents areas destined for the production of tree species or areas of use, but they are also areas where human presence is not very representative.
Finally, the image bodies of water represents the magnitude of the rivers, lakes, lagoons and watercourses, including coastal ones, that run through the State. Although it does not exactly represent an anthromes, we consider the relevance of rivers both for maintaining the quality of life and the environment, as well as for their significant importance for territorial planning, as recommended by national standards and regulations.
Continuing with our analyses, we proceeded to join these sets to jointly plot the anthromes of land use and cover, in order to obtain a general overview of the distribution of these anthromes in the territory of São Paulo. To this end, it was necessary to create a legend that represented the typologies and colors associated with each of the soil use and cover anthromes. To do so, we extracted this information (typology and color) from the coverage_sp_2010 set to construct the legend. The following script illustrates the process of creating anthromes_ucs_legend.
Script: creation of the legend for land use and land cover anthromes (anthromes_ucs_legend).
# Set the caption manually
anthromes_ucs_legend <- data.frame(tipology = c("mixed urban", "agricultural lands", "livestock lands", "sustainable use forest", "silviculture", "conservation and preservation areas", "agricultural villages", "waters"),colors = c("#FF4747", "#F4B488", "#FFEEB7", "#A8D08D", "#E2EFD9", "#538135", "#ED833B", "#000066"))
Source: the authors (2024). Legend: script demonstrating the mechanism for creating the legend to be plotted on the image of land use and land cover anthromes.
Once these procedures were carried out to construct the image of soil use and land cover anthromes, we proceeded to plotting the sets together. Using the operator add = TRUE we place the different sets of land use and land cover anthromes on the same plot. The following image represents the anthromes of use and coverage present in the State of São Paulo.
Figure: anthromes of land use and cover in the State of São Paulo.
# Plot all geometries with their corresponding colors
plot(st_geometry(ucs_mixed_urban), col = ucs_mixed_urban$colors, border = NA, main = "land use and cover anthromes")
plot(st_geometry(ucs_agricultural_lands), col = ucs_agricultural_lands$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_livestock_lands), col = ucs_livestock_lands$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_sustainable_use_forest), col = ucs_sustainable_use_forest$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_silviculture), col = ucs_silviculture$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_preservation_conservation), col = ucs_preservation_conservation$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_agricultural_villages), col = ucs_agricultural_villages$colors, border = NA, add = TRUE)
plot(st_geometry(ucs_waters), col = ucs_waters$colors, border = NA, add = TRUE)
# Add caption
legend("topright", legend = anthromes_ucs_legend$tipology, fill = anthromes_ucs_legend$colors, title = "Tipology", cex = 0.8)
Source: the authors (2024). Caption: figure showing the different soil use and cover anthromes present in the State of São Paulo. The legend in the figure represents the colors associated with each of the anthromes typologies.
The generated Figure represents the different anthromes of land use and cover. In this plot we have the 8 classes listed previously colored as stipulated for the classes. It should be noted that the plot, despite considering the geographic referencing system, does not represent the mapping in the coordinate plane, being an ordered extract of the data. In the next steps we will join the data for plotting using the ggplot() function, where the cartographic plane is visible.
Continuing the analyses, in this stage we carried out the construction of the static mapping using the ggplot() function. However, to accomplish this, it was first necessary to combine the eight classes of soil use and land cover anthromes. Next, the code for joining the data into a single data frame (anthromes_ucs_sp) is presented. Furthermore, it was necessary to create a set containing the colors associated with each of the types of use and coverage (anthromes_ucs_sp_colors); the code for this procedure is also represented in the sequence.
Script: Joining the anthromes classes of land use and land cover into a single data frame(anthromes_ucs_sp) and joining colors ((anthromes_ucs_sp_colors))
# Create a data frame with corresponding geometries and colors
anthromes_ucs_sp <- rbind(ucs_mixed_urban, ucs_agricultural_villages, ucs_agricultural_lands, ucs_livestock_lands, ucs_silviculture, ucs_sustainable_use_forest, ucs_preservation_conservation, ucs_waters)
# Get the unique levels from the "color" column
anthromes_ucs_sp_colors <- unique(anthromes_ucs_sp$colors)
Source: the authors (2024). Legend: script presenting the model for building the anthromes_ucs_sp set, which groups together the 8 classes of land use and cover anthromes, and the anthromes_ucs_sp_colors set, which integrates the colors associated with the different land use and cover typologies.
According to what was observed in the code above, the anthromes_ucs_sp set was constructed correctly, aggregating the different classes of land use and land cover anthromes into a single data frame. Starting from this set, we proceeded to create the static mapping of soil use and cover anthromes. The following code demonstrates how the ggplot() function was structured to map data on land use and land cover anthromes, followed by the image referring to the Static Mapping of Land Use and Cover anthromes.
Script: Static Mapping of Land Use and Cover anthromes
# Plot with ggplot2 and add the legend directly
map_anthromes_ucs <- ggplot(anthromes_ucs_sp, aes(fill = factor(colors, levels = unique(anthromes_ucs_legend$colors)))) +
geom_sf(color = NA) +
scale_fill_manual(values = anthromes_ucs_legend$colors,
labels = anthromes_ucs_legend$tipology,
name = "Tipology") +
labs(title = "Land Use and Cover Anthromes",
subtitle = "Study Area: State of São Paulo",
x = "Longitude",
y = "Latitude") +
theme_minimal()
print(map_anthromes_ucs)
Source: the authors (2024). Caption: the image generated by the script represents the anthromes of land use and cover in the State of São Paulo. In detail, in ggplot() we inserted the name of the set (anthromes_ucs_sp) and indicated that the polygons were plotted with the fills (fill) present in the set, following and pairing with the colors (colors) present in the set anthromes_ucs_sp_colors. In geom_sf we determined that the edges of the polygons were not represented (color = NA), so as not to interfere with the colors represented by the fill.
Additionally, in scale_fill_manual() we determine the format for constructing the legend present in the mapping; in values = caption_anthromes_ucs\(colors* we indicate that the values would be the colors present in the set **anthromes_ucs_legend**, while in *labels = caption_anthromes_ucs\)typology we indicate the categories that should be present in the legend. Finally, in name = “typology” we determine the name written on the caption.
In labs() we determine graphic aspects for the plot, such as title (title), subtitle (subtitle), graphic dimensions x (longitude) and y (latitude). Finally, in theme_minimal() we have the graphical representation format of the mapping, such as lines and visual distributions, not requiring adjustments for this.
With this, we were able to carry out static mapping of soil use and land cover anthromes in the State of São Paulo. Analysis to validate the mapping will be carried out later. From this point, we started building the interactive mapping, using the Open Street Maps (OSM) database.
Once the static mapping of land use and land cover anthromes in the State of São Paulo was completed, we moved on to interactive mapping. The interactive mapping aimed to spatialize the classified results in a dynamic layout, which could be integrated with other desktop, mobile or web platforms. To this end, the procedure for creating the interactive mapping is represented below. It is noteworthy that in attempts not reported in this work, the anthromes_ucs_sp data set required a large amount of processing time, compromising the computer’s memory. With this in mind, we started using the 8 subsets that made up the larger set, that is, those 8 sets that dealt with the different typologies of land use and land cover anthromes, namely:
That said, the first step in building the interactive mapping was to determine that the 8 subsets had the same geometric projection base or coordinate system (crs) as Open Street Maps (OSM). To do this, we use the st_transform() function on the 8 subsets, selecting the crs +proj=longlat +datum=WGS84 in st_crs(). At this stage, the names of the subsets were kept the same, as the change indicated by the functions only protects the data projection structure.
Once the crs transformation of the subsets was carried out, we proceeded to merge the polygons that made up the subsets. As we presented previously, the attempt to create interactive mapping with fragmented data required a lot of processing time and did not achieve the expected result. Therefore, we fused the polygons using the st_union() function, which allowed the construction of multipolygon layers. For each of the 8 subsets, layers were constructed to represent them, namely:
Once the 8 layers were structured for the subsets of land use and land cover, we carried out the layer simplification procedure. The objective of this procedure was also to reduce processing time and memory demand for building the interactive mapping. According to the literature, this is a common procedure when the amount of data to be mapped is significant (more than 200 thousand polygons in the State of São Paulo, for example). Therefore, following the authors’ suggestions, we applied a tolerance index (tolerancia_deg) of 0.001, which is a value that would not harm the resolution of the interactive mapping.
After determining the tolerance index for simplification, we built the function to simplify the multipolygons together. The function established the structure for simplification to occur, in each of the 8 layers, in a unique way, using the basis of the tolerance index for simplification. The use of this function culminated in the creation of 8 simplified layers referring to the different types of use and coverage, namely:
Once the simplification stage is complete, we move on to organizing the layout of the layers to be plotted in the interactive mapping. This organization was due to the structure of how the data was organized and distributed in the XY dimensions. In order for the layers to be processed in a single way (reducing processing time), the data that made up the layers needed to be read in a single way. Therefore, we assigned two other keys “[]” to the list, so that the structure could be read individually, as illustrated in the following code.
Script: Organization for reading the layer in a unique way.
# Simplified Artificial Area Layer
mixed_urban_multipolygons <- layer_ucs_mixed_urban_simplified[[1]]
agricultural_lands_multipolygons <- layer_ucs_agricultural_lands_simplified[[1]]
livestock_lands_multipolygons <- layer_ucs_livestock_lands_simplified[[1]]
sustainable_use_forest_multipolygons <- layer_ucs_sustainable_use_forest_simplified[[1]]
silviculture_multipolygons <- layer_ucs_silviculture_simplified[[1]]
preservation_conservation_multipolygons <- layer_ucs_preservation_conservation_simplified[[1]]
agricultural_villages_multipolygons <- layer_ucs_agricultural_villages_simplified[[1]]
waters_multipolygons <- layer_ucs_waters_simplified[[1]]
Source: the authors (2024). Legend: code demonstrating the format for establishing a single reading per layer of multipolygons, in order to reduce data processing time during information mapping.
As noted in the code above, 8 layers were structured in the model for representation in interactive mapping. Each of them represents a typology of land use and land cover anthromes to be included in the mapping. The constructed layers are:
With these 8 structured layers, we started building the interactive mapping using the leaflet() function. The basis (background) for mapping the data was the same used in the construction of the interactive mapping of populated anthromes, Open Street Maps (OSM). We reiterate that our choice was made due to the free nature of OSM services and the collaboration in the construction of mappings that this mapping source allows. Regarding the mapping structure, we direct the visualization to the region of the State of São Paulo through the setView() function, using the information lng = -47.9292, lat = -23.5505, zoom = 7 (longitude, latitude and zoom, respectively) for such.
The addPolygons() function was responsible for integrating the 8 layers into the mapping. In it, we direct the data set to be plotted (date), the color of the edges (color), the filling color of the polygons (fillColor), the opacity of the filling (fillOpacity) and the name indicator for that layer (popup) . We must point out that the colors used in the fillColor mapping attribute were the same as those established in anthromes_ucs_legend, in order to maintain the colorimetric standard used throughout the work. Additionally, we removed the edges in the layer representation (color = NA), so that they would not overlap other polygons, affecting the visualization of other layers.
Finally, we added layer by layer to the interactive mapping so that we could observe whether the processing time was optimized and that the computer would be able to generate the interactive mapping of the land use and land cover anthromes in the State of São Paulo. Therefore, the following code represents the model for building interactive layer mapping.
Script: Construction of interactive mapping of soil use and land cover anthromes in the State of São Paulo by layers.
# Create a Leaflet map based on OpenStreetMap and add the artificial area layer
anthromes_ucs_interactive <- leaflet() %>%
addTiles() %>%
setView(lng = -47.9292, lat = -23.5505, zoom = 7) %>%
addPolygons(data = mixed_urban_multipolygons,
color = NA,
fillColor = "#FF4747",
fillOpacity = 1,
popup = "Mixed Urban")
# Add agricultural lands layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = agricultural_lands_multipolygons,
color = NA,
fillColor = "#F4B488",
fillOpacity = 1,
popup = "Agricultural Lands")
# Add pasture layer with management
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = livestock_lands_multipolygons,
color = NA,
fillColor = "#FFEEB7",
fillOpacity = 1,
popup = "Livestock Lands")
# Add forest occupation layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = sustainable_use_forest_multipolygons,
color = NA,
fillColor = "#A8D08D",
fillOpacity = 1,
popup = "Sustainable Use Forest")
# Add silviculture layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = silviculture_multipolygons,
color = NA,
fillColor = "#E2EFD9",
fillOpacity = 1,
popup = "Silviculture")
# Add preservation and conservation layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = preservation_conservation_multipolygons,
color = NA,
fillColor = "#538135",
fillOpacity = 1,
popup = " Preservation and Conservation")
#Add agricultural villages layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = agricultural_villages_multipolygons,
color = NA,
fillColor = "#ED833B",
fillOpacity = 1,
popup = "Agricultural Villages")
# Add waters layer
anthromes_ucs_interactive <- anthromes_ucs_interactive %>% addPolygons(data = waters_multipolygons,
color = NA,
fillColor = "#000066",
fillOpacity = 1,
popup = "Waters")
# Add the legend to the map
anthromes_ucs_interactive <- anthromes_ucs_interactive %>%
addLegend(position = "bottomright",
colors = anthromes_ucs_legend$colors,
labels = anthromes_ucs_legend$tipology,
title = "Tipology")
Source: the authors (2024). Caption: code illustrating the construction of the interactive mapping of anthropogenic biomes of land use and land cover in the State of São Paulo through the leaflet() function in the Open Street Maps database. Each command line represents the insertion of one of the layers that represent the different anthromes in the State.
Once the mapping was structured, we moved on to visualizing the interactive mapping of land use and land cover anthromes in the State of São Paulo.
Figure: Interactive mapping of land use and land cover anthromes in the State of São Paulo.
anthromes_ucs_interactive