Classificação de folhas e mapa de localização

Author

Adriana Simonetti, 2024

1. Instalar R, seguido de RStudio

Baixar R-4.3.1 for Windows (79 megabytes, 64 bit) – Instalar primeiro
Baixar e instalar RStudio (ou atualize sua instalação)
OBS: Caso já esteja instalado no seu PC, seguir para o próximo passo

2. Consulte os livros online

“Metadata Standards - Guide”

Artigo: “Tabular strategies for metadata in ecology, evolution, and the environmental sciences”

“ggplot: Elegant Graphics for Data Analysis”

3. Baixar pasta do projeto

Baixar a pasta ‘aula_organização_dados’, neste link de GDrive.
Baixar a pasta inteira. Na página de destino do link clicar no diretório ‘aula_organização_dados’ com o botão direito do mouse e selecione ‘download’. Deve cair na sua pasta de Downloads. Retire do zip e coloque a pasta ‘aula_organização_dados’ em qualquer lugar no seu micro.

3. Ativar o projeto e abrir o script QMD

No RStudio: File - Open Project - navegue até o local de sua pasta ‘aula_organização_dados’ e clicar duas vezes no arquivo ‘Biometria_2024.Rproj’. File - Open File - clicar duas vezes no arquivo ‘exerc_organização_dados_no_R.qmd’.
Agora seus caminhos dentro do script qmd serão relativos a esta pasta. A estrutura de pastas e subpastas acima da pasta de projeto não deve ser incluída nos caminhos.

Exercício: Ler dados coletados; classificar folhas; mapas de localizacao

1. Instale e carregue pacotes

Rodar este primeiro bloco de código usando ‘Run current chunk’

# Vetor contendo nomes dos pacotes; 
packages <- c("dplyr", 
              "randomForest", 
              "caret", 
              "sf",
              "terra", 
              "tidyterra",
              "tidyverse",
              "viridis",
              "ggplot2")

# Verifica se cada pacote está instalado e, se não, instala 
for (item in packages) { if (!item %in% installed.packages()[,"Package"]) {install.packages(item)}  
  
# Carrega na memória RAM cada pacote (cada item em pac_nomes) 
  
  library(item, character.only = TRUE)}

Warning: package 'dplyr' was built under R version 4.3.2


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Warning: package 'randomForest' was built under R version 4.3.2

randomForest 4.7-1.1

Type rfNews() to see new features/changes/bug fixes.


Attaching package: 'randomForest'

The following object is masked from 'package:dplyr':

    combine

Loading required package: ggplot2

Warning: package 'ggplot2' was built under R version 4.3.2


Attaching package: 'ggplot2'

The following object is masked from 'package:randomForest':

    margin

Loading required package: lattice

Warning: package 'sf' was built under R version 4.3.2

Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE

Warning: package 'terra' was built under R version 4.3.2

terra 1.7.65

Warning: package 'tidyterra' was built under R version 4.3.3


Attaching package: 'tidyterra'

The following object is masked from 'package:stats':

    filter

Warning: package 'readr' was built under R version 4.3.2

Warning: package 'purrr' was built under R version 4.3.2

Warning: package 'stringr' was built under R version 4.3.2

Warning: package 'lubridate' was built under R version 4.3.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.0
✔ readr     2.1.5

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ randomForest::combine() masks dplyr::combine()
✖ tidyr::extract()        masks terra::extract()
✖ tidyterra::filter()     masks dplyr::filter(), stats::filter()
✖ dplyr::lag()            masks stats::lag()
✖ purrr::lift()           masks caret::lift()
✖ ggplot2::margin()       masks randomForest::margin()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Warning: package 'viridis' was built under R version 4.3.2

Loading required package: viridisLite

2. Importar dados coletados (em csv) para o formato dataframe

# Carregue o dataset de um específico arquivo CSV 

# Dataset com 8 variáveis coletadas com 30 repeticoes de cada espécie (3 espécies), totalizando 90 obs
dat <- read.csv("./tables/dat_example.csv", 
                            header = T, sep = ",")

3. Examine as estrutura dos dados e defina as classes

# Examine a estrutura dos dados
str(dat)

'data.frame':   90 obs. of  9 variables:
 $ tree_class        : int  1 1 1 1 1 1 1 1 1 1 ...
 $ altura_cm         : int  28 31 28 30 29 32 30 28 31 29 ...
 $ largura_cm        : int  16 17 15 16 15 17 16 15 17 16 ...
 $ massa_fresca_g    : int  130 140 130 135 130 140 135 125 135 130 ...
 $ area_foliar_cm2   : int  310 315 305 320 300 325 310 305 320 265 ...
 $ esbeltez          : num  1.75 1.82 1.87 1.88 1.93 1.88 1.88 1.87 1.82 1.81 ...
 $ y_coord_geographic: num  -3.09 -3.09 -3.09 -3.09 -3.09 ...
 $ x_coord_geographic: num  -60 -60 -60 -60 -60 ...
 $ dap               : int  50 50 50 50 50 50 50 50 50 50 ...

# Examine as estatísticas dos dados numéricos
summary(dat)

   tree_class   altura_cm       largura_cm    massa_fresca_g  area_foliar_cm2
 Min.   :1    Min.   :18.00   Min.   :10.00   Min.   : 95.0   Min.   :190.0  
 1st Qu.:1    1st Qu.:26.00   1st Qu.:15.00   1st Qu.:130.0   1st Qu.:214.2  
 Median :2    Median :30.00   Median :17.00   Median :140.0   Median :307.5  
 Mean   :2    Mean   :30.56   Mean   :16.68   Mean   :153.4   Mean   :308.4  
 3rd Qu.:3    3rd Qu.:35.00   3rd Qu.:19.00   3rd Qu.:193.8   3rd Qu.:388.8  
 Max.   :3    Max.   :43.00   Max.   :23.00   Max.   :215.0   Max.   :480.0  
    esbeltez     y_coord_geographic x_coord_geographic      dap       
 Min.   :1.120   Min.   :-3.092     Min.   :-59.99     Min.   :15.00  
 1st Qu.:1.820   1st Qu.:-3.092     1st Qu.:-59.99     1st Qu.:15.00  
 Median :1.870   Median :-3.092     Median :-59.99     Median :30.00  
 Mean   :1.849   Mean   :-3.092     Mean   :-59.99     Mean   :31.67  
 3rd Qu.:1.897   3rd Qu.:-3.092     3rd Qu.:-59.99     3rd Qu.:50.00  
 Max.   :2.700   Max.   :-3.092     Max.   :-59.99     Max.   :50.00

# Transforme a coluna "tree_class" em classes
dat$tree_class <- factor(dat$tree_class)

# Examine a estrutura dos dados novamente
str(dat)

'data.frame':   90 obs. of  9 variables:
 $ tree_class        : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ altura_cm         : int  28 31 28 30 29 32 30 28 31 29 ...
 $ largura_cm        : int  16 17 15 16 15 17 16 15 17 16 ...
 $ massa_fresca_g    : int  130 140 130 135 130 140 135 125 135 130 ...
 $ area_foliar_cm2   : int  310 315 305 320 300 325 310 305 320 265 ...
 $ esbeltez          : num  1.75 1.82 1.87 1.88 1.93 1.88 1.88 1.87 1.82 1.81 ...
 $ y_coord_geographic: num  -3.09 -3.09 -3.09 -3.09 -3.09 ...
 $ x_coord_geographic: num  -60 -60 -60 -60 -60 ...
 $ dap               : int  50 50 50 50 50 50 50 50 50 50 ...

# Remove two columns
dat_subset <- subset(dat, select = -c(y_coord_geographic, x_coord_geographic))

# Check the class counts
class_counts_dat <- table(dat_subset$tree_class)
class_counts_dat


 1  2  3 
30 30 30

# 1  2  3 
# 30 30 30

Random Forest method

5-fold cross validation

4. Criar modelo (rf) usando dados de treinamento e teste (5f-cv)

set.seed(123)

# Define a configuração de validação cruzada com 10 folders
fitcontrol_10fold <- trainControl(method = "cv", number = 10, search = "random", savePredictions = TRUE)

# Train the Random Forest model with 10-fold cross-validation 
mod_10cv <- train(factor(tree_class) ~ ., 
                  data = dat_subset, 
                  method = "rf", 
                  trControl = fitcontrol_10fold, 
                  tuneLength = 2, 
                  ntree = 10,
                  maxnodes = 30, # Número máximo de nós na árvore
                  max_depth = 10) # Profundidade máxima da árvore

# Check the best parameters
mod_10cv$bestTune

  mtry
1    3

# Plot variable importance
plot(varImp(mod_10cv, scale = FALSE), main = "Var. Imp. RF 5 Fold CV")

# Subset predictions with the best mtry parameter
sub_mod_10cv <- subset(mod_10cv$pred, mtry == mod_10cv$bestTune$mtry)

# Confusion Matrix
conf_matrix_mod <- confusionMatrix(table(Classificado = sub_mod_10cv$pred, Reference = sub_mod_10cv$obs))

print(conf_matrix_mod)

Confusion Matrix and Statistics

            Reference
Classificado  1  2  3
           1 30  0  0
           2  0 30  0
           3  0  0 30

Overall Statistics
                                     
               Accuracy : 1          
                 95% CI : (0.9598, 1)
    No Information Rate : 0.3333     
    P-Value [Acc > NIR] : < 2.2e-16  
                                     
                  Kappa : 1          
                                     
 Mcnemar's Test P-Value : NA         

Statistics by Class:

                     Class: 1 Class: 2 Class: 3
Sensitivity            1.0000   1.0000   1.0000
Specificity            1.0000   1.0000   1.0000
Pos Pred Value         1.0000   1.0000   1.0000
Neg Pred Value         1.0000   1.0000   1.0000
Prevalence             0.3333   0.3333   0.3333
Detection Rate         0.3333   0.3333   0.3333
Detection Prevalence   0.3333   0.3333   0.3333
Balanced Accuracy      1.0000   1.0000   1.0000

Matriz de confusao (acurácia usuário e produtor)

Criando mapa de localizacao com Modelo Digital do Terreno(MDT) da cidade de Manaus

5. Importar raster geotiff para o formato SpatRaster

mdt_1 <- terra::rast("./rasters/manaus_s1.tif")

mdt_2 <- terra::rast("./rasters/manaus_s2.tif")

print(mdt_1)

class       : SpatRaster 
dimensions  : 3601, 3601, 1  (nrow, ncol, nlyr)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : -60.00014, -58.99986, -4.000139, -2.999861  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : manaus_s1.tif 
name        : manaus_s1

print(mdt_2)

class       : SpatRaster 
dimensions  : 3601, 3601, 1  (nrow, ncol, nlyr)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : -61.00014, -59.99986, -4.000139, -2.999861  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : manaus_s2.tif 
name        : manaus_s2

6. Combinar os rasters

# Combinar os rasters
mdt_combined <- terra::merge(mdt_1, mdt_2)

print(mdt_combined)

class       : SpatRaster 
dimensions  : 3601, 7201, 1  (nrow, ncol, nlyr)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : -61.00014, -58.99986, -4.000139, -2.999861  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source(s)   : memory
varname     : manaus_s1 
name        : manaus_s1 
min value   :       -44 
max value   :       129

# Plotar o raster combinado
plot(mdt_combined, main = "MDT Combined")

7. Recortar o raster pela área de estudo

# Visualizar o MDT com escala de cor viridis
plot(mdt_combined, col = viridis(256), main = "MDT Combined")

# Importar o vetor da área de estudo
study_area <- sf::st_read("./shp/study_local.shp")

Reading layer `study_local' from data source 
  `C:\R_projects\Biometria_2024\Biometria_2024\shp\study_local.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -59.99672 ymin: -3.093296 xmax: -59.99209 ymax: -3.090149
Geodetic CRS:  WGS 84

# Criar um buffer em torno da área de estudo
study_area_buffer <- sf::st_buffer(study_area, dist = 200)

# Converter o buffer para o mesmo CRS que mdt_combined
study_area_buffer <- sf::st_transform(study_area_buffer, crs(mdt_combined))

# Cortar o mdt_combined usando o buffer da study_area
mdt_combined_cropped <- terra::crop(mdt_combined, study_area_buffer)

plot(mdt_combined_cropped, col = viridis(256), main = "Study area")

8. Insira as coordenadas das árvores (em grau decimal)

# Selecione o rótulo das árvores, o dap e as coordenadas 
coords_geog <- dat %>% select(tree_class
,x_coord_geographic, y_coord_geographic, dap)

# Mantenha somente uma linha de coordenada por árvore
unique_coords <- coords_geog %>%
  distinct(x_coord_geographic, y_coord_geographic, .keep_all = T)

# Remova os NAS
unique_coords <- na.omit(unique_coords)

# Conferir o df
head(unique_coords)

  tree_class x_coord_geographic y_coord_geographic dap
1          1          -59.99302           -3.09190  50
2          2          -59.99319           -3.09164  15
3          3          -59.99363           -3.09162  30

# Criar um objeto vetor com as coordenadas e os atributos desejados
unique_coords_sf <- unique_coords %>%
                    st_as_sf(coords = c("x_coord_geographic", "y_coord_geographic"), crs = 4326) %>%
                    mutate(dap = unique_coords$dap,
                    tree_class = unique_coords$tree_class)

#Escale os dados de DAP
(unique_coords_sf$dap)/10

[1] 5.0 1.5 3.0

9. Plote o mapa final de localizacao

# Plot a area de estudo
p_1 <- ggplot() +
     geom_spatraster(data = mdt_combined_cropped) +
     scale_fill_viridis() +
     labs(title = "Study area", fill = "Elevation") +
     theme_minimal()+
     geom_sf(data = unique_coords_sf, color = "white", size = 2) +
    ggspatial::annotation_scale(location = "bl", 
              pad_x=unit(1, "cm"), 
              pad_y = unit(1, "cm")) +
    ggspatial::annotation_north_arrow(location = "tr", 
              height = unit(1.0, "cm"), # encolhe a seta
              width = unit(1.0, "cm"), # encolhe a seta
              pad_x=unit(1, "cm"), 
              pad_y = unit(1, "cm"))

p_1

# Plot a area de estudo com o DAP proporcional ao tamanho

cores_tree_class <- c("1" = "red", "2" = "blue", "3" = "green")  

p_2 <- ggplot() +
     geom_spatraster(data = mdt_combined_cropped) +
     scale_fill_gradientn(colors = c("black", "white")) +
     labs(title = "Study area", fill = "Elevation") +
     theme_minimal()+
     geom_sf(data = unique_coords_sf, aes(color = as.factor(tree_class)), size = (unique_coords_sf$dap)/10) +
     scale_color_manual(values = cores_tree_class,name = "Tree Class")+
     ggspatial::annotation_scale(location = "bl", 
              pad_x=unit(1, "cm"), 
              pad_y = unit(1, "cm")) +
     ggspatial::annotation_north_arrow(location = "tr", 
              height = unit(1.0, "cm"), # encolhe a seta
              width = unit(1.0, "cm"), # encolhe a seta
              pad_x=unit(1, "cm"), 
              pad_y = unit(1, "cm"))

p_2

10. Exercício para entregar

Faca um metadado com os dados coletados, define os tipos de dados, unidade e todas as informacoes necessárias para os metadados.