analysis.knit

Context & Objectives

To be continued…

We’ll work through 2 processes:
1. ISSA process method well described into by Chatterjee et AL. in 2024 (Section III.1) 2. The more classic State– space models method using and testing the aniMotum package described by Jonsen et AL. in 2022 (Section III.2) 3. Conclusion between the results brought from the two methods

In addition, we’ll also provide descriptive analysis regarding the nest phenology depending the factor “enclosure/open habitats”: 1. Nesting 2. Chick-rearing

STEP 1 - Download and extract the RProject from GitLab to your device :
https://gitlab.com/TheMaskou/tly_habitat_selection_bauges/-/archive/master/tly_habitat_selection_bauges-master.zip

STEP 2 - Open the analysis.R file located in /R folder and run its code - also detailed step by step here below.

STEP 3 - Read below how the study has been processed through the R code, and get better informed with the details provided.

I. SETTING : Set the project & download the data

Run the following code to set-up the project: set the working directory, check and update R and Rstudio (optional), load and update packages and functions, download the data from G-drive.

          #Settings the working directory - here below the default (delete the dash and run on .R)
          #current_wd                         <- dirname(rstudioapi::getActiveDocumentContext()$path)

          #Settings the project
          suppressWarnings(source(here::here("project_setting.R")))

## Error in eval(ei, envir): object 'user_choice_update_R' not found

          #Cleaning Global environment (delete the dash here below)
          #rm(cran_packages, github_packages, n_i_p_cran, n_i_p_ghub)

          #Specify the Google link to download the files from (drive_id = );
          #Set the location where g_data will be stored (path_data = ./data, the default); 
          #Run the code and download the data
          yn_data_download(drive_id  = "1gtA-lACNH2kIbn5SuXUm17p8-KMH7Jsw", 
                           path_data = path_data)

## Error in yn_data_download(drive_id = "1gtA-lACNH2kIbn5SuXUm17p8-KMH7Jsw", : could not find function "yn_data_download"

II. DATA : Load the data

Open the data just downloaded and used in this study, and load them in the environment of Rstudio :

          #Opening the 'sf' data set (birds)
          data_sum                          <- readRDS(here("data", "data_sum.rds"))
          data_all                          <- read_rds(here("data", "tot.ind.bauges.rds")) %>% st_transform(2154)
      
          #Optional data
          data_BDD                          <- st_read(here("data", "bilan_BDD_nid_poule_bauges_2024.gpkg"))

## Reading layer `t_nid_nid' from data source 
##   `D:\Documents\Etudes supérieures & Pro\5-POST_MASTER\FRANCE\OFB\2024 - Méribel\Bauges_Sc\bauges_tly_female-habitat-selection\data\bilan_BDD_nid_poule_bauges_2024.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 9 features and 15 fields (with 2 geometries empty)
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 6.224168 ymin: 45.61211 xmax: 6.255551 ymax: 45.70174
## Geodetic CRS:  WGS 84

          #Opening the area data
          study_area                        <- read_stars(here("data", "lidar_map.tif"))

Information about the data you should have now appearing in your environment: - data_sum: table that summarize all of the meta information concerning the birds studied - edited by the author - data_BDD: table providing additional meta information to the data_sum - provided by OFB - data_all: data-set providing all of the GPS information - collected from GPS trackers and cleaned-up by Marc Montadert - study_area: raster manually completed on QGIS to acquire good enough data. Indeed, since from the 2 initial raster, resolutions and metadata were too different, we could not use efficiently an automated command (ie. packages as ‘terra’ ; ‘sf’ or ‘stars’).

Open here the data for one specified bird (Optional) :

          #Bird selected
          bird                              <- "Arnica"

          #Opening the bird selected data
          head(assign(bird,
                      st_read(here("data", paste0(bird, ".gpkg"))),
                      envir = .GlobalEnv))

## Reading layer `Arnica' from data source 
##   `D:\Documents\Etudes supérieures & Pro\5-POST_MASTER\FRANCE\OFB\2024 - Méribel\Bauges_Sc\bauges_tly_female-habitat-selection\data\Arnica.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 598 features and 21 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 951250.1 ymin: 6506947 xmax: 951765.1 ymax: 6507531
## Projected CRS: RGF93 v1 / Lambert-93

## Simple feature collection with 6 features and 21 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 951283.9 ymin: 6507434 xmax: 951420 ymax: 6507499
## Projected CRS: RGF93 v1 / Lambert-93
##   ani_nom tag_id                date code_jour       jour    heure
## 1  Arnica  17459 2017-07-18 20:03:41       199 2017-07-18 19:03:41
## 2  Arnica  17459 2017-07-18 20:13:26       199 2017-07-18 19:13:26
## 3  Arnica  17459 2017-07-18 20:24:35       199 2017-07-18 19:24:35
## 4  Arnica  17459 2017-07-18 20:33:20       199 2017-07-18 19:33:20
## 5  Arnica  17459 2017-07-18 20:43:20       199 2017-07-18 19:43:20
## 6  Arnica  17459 2017-07-18 20:53:27       199 2017-07-18 19:53:27
##   heure_relative_solaire     am_pm period_jour period_chant saison   cor
## 1               00:12:54 apresmidi        jour      journee    ete FALSE
## 2               00:03:09 apresmidi        jour      journee    ete FALSE
## 3              -00:08:00 apresmidi  crepuscule      journee    ete FALSE
## 4              -00:16:45 apresmidi  crepuscule      journee    ete FALSE
## 5              -00:26:45 apresmidi  crepuscule      journee    ete FALSE
## 6              -00:36:52 apresmidi  crepuscule      journee    ete FALSE
##       dist  dt  rel_angle altitude mnt_altitude temperature acc_x acc_y acc_z
## 1 71.30711 585         NA     1714      1785.21          22    42   -79  1026
## 2 22.22461 669  2.0198721     1721      1761.20          23    76   -77  1026
## 3 23.20683 525  1.9254387     1764      1756.01          23    81    56  1028
## 4 81.92260 600 -1.7887462     1756      1767.50          23  -293   182   765
## 5 44.21306 607 -0.4373428     1750      1772.85          20   219  -573   829
## 6 26.11233 593  1.6212039     1779      1784.86          19   155  -600   799
##                       geom
## 1   POINT (951393 6507434)
## 2   POINT (951420 6507499)
## 3 POINT (951397.8 6507498)
## 4 POINT (951407.2 6507477)
## 5 POINT (951326.8 6507461)
## 6 POINT (951283.9 6507471)

Information about the data you should have now appearing in your environment: - “bird”: a sf object called depending the name of the bird you want to work on and that you specified line 84 is now in your environment. NB: Rather change and re-specify the ‘bird’ object with the bird’s name you want to work on than using filter() on data_sum. Unique bird files are stored in your /data.

III. ANALYSIS :

## III.1 ISSA process method (Chatterjee et AL., 2024)

# Presentation of the method

Integrated step- selection analysis (ISSA) offers an appealing framework for modelling complex movements in response to local environmental features. ISSA is the ability to model both habitat selection, using covariates at the end of the movement step, and habitat- dependent movement, by allowing an animal’s step lengths (distances between sequential locations) and turn angles (changes in direction) to depend on habitat at the start of the movement step. Tentative movement parameters are first estimated using observed step lengths and turn angles, ignoring the effect of habitat selection on observed movements, and then movement parameters are updated using additional coefficients from the fitted conditional logistic regression model. Mixed ISSA can be used to study among-individual variation in both movement and habitat-selection parameters.

# Prepare the data: step lengths & turn angles extraction

Packages used : ‘sf’, ‘amt’

          #Create .X and .Y columns
          Arnica_track                       <- Arnica %>%
                                                  mutate(x = st_coordinates(geom)[, "X"],
                                                         y = st_coordinates(geom)[,"Y"])

          #Create the bird_track
          Arnica_track                       <- make_track(Arnica_track, 
                                                           .x = x, 
                                                           .y = y, 
                                                           .t = date)
          
          #Extract step lengths & turn angles (in radians)
          Arnica_track                       <- steps(Arnica_track)
          
          
          hist(Arnica_track$ta_)

# Fit a distribution for step length

Packages used : ‘fitdistrplus’

          #Visualize sl_
          hist(Arnica_track$sl_)

          #Fitting different distributions to sl_ 
          sl_exp                            <- fitdist(Arnica_track$sl_, "exp"    , method="mle")
          sl_logis                          <- fitdist(Arnica_track$sl_, "logis"  , method="mle")
          sl_norm                           <- fitdist(Arnica_track$sl_, "norm"   , method="mle")
          sl_wb                             <- fitdist(Arnica_track$sl_, "weibull", method="mle", start = list(scale = 1, shape = 0.5)) # (Fandos et AL., 2023)
          
          #Visualize the fitness sl_ data VS distribution law
          par(mfrow = c(2,3))
          plot(sl_exp)

          plot(sl_logis)

          plot(sl_norm)

          plot(sl_wb)

          #Log-Likelihood comparison (highest is best - position 1)
          print(sort(sapply(list("Exp dist. Log(L):"     = sl_exp, 
                                 "Logi dist. Log(L):"    = sl_logis, 
                                 "Norm dist. Log(L):"    = sl_norm,
                                 "Weibull dist. Log(L):" = sl_wb), logLik), decreasing = T))

## Weibull dist. Log(L):     Exp dist. Log(L):    Logi dist. Log(L): 
##             -2462.030             -2478.465             -2615.043 
##    Norm dist. Log(L): 
##             -2730.887

          #AIC comparison for distribution selection (lowest is best - position 1) 
          print(sort(sapply(list("Exp dist. AIC"     = sl_exp, 
                                 "Logi dist. AIC"    = sl_logis, 
                                 "Norm dist. AIC"    = sl_norm,
                                 "Weibull dist. AIC" = sl_wb), AIC)))

## Weibull dist. AIC     Exp dist. AIC    Logi dist. AIC    Norm dist. AIC 
##          4928.060          4958.930          5234.087          5465.774

          #GOF comparison for distribution selection (lowest is best - position 1) 
          #Anderson-Darling test focused in - gives more weight to the tails of the distribution (McClintock & Michelot, 2021)
          print(sort(sapply(list("Exp dist. GOF"     = sl_exp, 
                                 "Logi dist. GOF"    = sl_logis, 
                                 "Norm dist. GOF"    = sl_norm,
                                 "Weibull dist. GOF" = sl_wb), function(x) gofstat(x)$ad)))

## Weibull dist. GOF.1-mle-weibull         Exp dist. GOF.1-mle-exp 
##                        7.046866                       14.972344 
##      Logi dist. GOF.1-mle-logis       Norm dist. GOF.1-mle-norm 
##                       21.650764                             Inf

          #Clean environment
          rm(sl_logis, sl_norm)

Reminder: AIC is an estimator of prediction error and relative quality of statistical models for a given set of data. Based on the classic formula, the AIC for von Mises distribution is computed as:
- AIC = 2Np − 2log(L), where Np is the number of parameters and L log_likelihood

Regarding the steps length data ‘sl_’ distribution, has been tested - visually plot() and regarding the AIC values - different function of density to fit the data.

We then find above:
- the Weibull distribution is the most likely against Exponential, Logistic and Normal distributions (log(L): -2462 > -2478 > -2615 < 2730) and suggests a more appropriate model for turn angles distribution - the Weibull distribution is rather selected by the Akaike Information criterion than the Exponential, Logistic and Normal distributions (AIC: 4928 < 4958 < 5234 < 5465) - Regarding the GOF - Anderson-Darling test result focused on, (McClintock & Michelot, 2021) - the Weibull value is the lowest in comparison to the Exponential, Logistic and Normal distribution values (7 < 14 < 21 < Infinity). It suggests the Weibull distribution as the one having the less differences with our data (sl_) in their distributions.

We will then fit an Weibull model to the sl_ data-set - step lengths. A very flexible distribution (Wilson et AL., 2022; Fandos et AL., 2022)

# Fit a distribution for turn angle

Packages used : ‘fitdistrplus’, ‘circular’

          #Visualize ta_
          hist(na.omit(Arnica_track$ta_))

          #Fitting von Mises distributions to ta_ (assuming that is the best distribution)
          ta_unif                            <- fitdist(c(na.omit(Arnica_track$ta_)), "unif", method = "mle") #c() use because works with vector only
          ta_vm                              <- mle.vonmises(na.omit(Arnica_track$ta_)) #mle.vonmises() used because fitdist() doesn't support von Mises method

## Warning in as.circular(x): an object is coerced to the class 'circular' using default value for the following components:
##   type: 'angles'
##   units: 'radians'
##   template: 'none'
##   modulo: 'asis'
##   zero: 0
##   rotation: 'counter'
## conversion.circularxradians0counter2pi

          ta_wc                              <- mle.wrappedcauchy(na.omit(Arnica_track$ta_)) #mle.vonmises() used because fitdist() doesn't support Wrapped Cauchy method

## Warning in as.circular(x): an object is coerced to the class 'circular' using default value for the following components:
##   type: 'angles'
##   units: 'radians'
##   template: 'none'
##   modulo: 'asis'
##   zero: 0
##   rotation: 'counter'
## conversion.circularxradians0counter2pi

          #Visualize the fitness ta_ data VS distribution law
          #Struggling, might have to create a custumized function?
          
          #Distribution selection:
          
          #Log-Likelihood comparison (highest is best)
          print(sort(sapply(list(
                     "van Mises dist. Log(L):"       = suppressWarnings(
                                                       sum(log(dvonmises(na.omit(Arnica_track$ta_), mu = ta_vm$mu, kappa = ta_vm$kappa)))),
                     "Uniform dist. Log(L):"         = logLik(ta_unif),
                     "Wrapped Cauchy dist. Log(L):"  = suppressWarnings(
                                                       sum(log(dwrappedcauchy(na.omit(Arnica_track$ta_), mu = ta_wc$mu, rho = ta_wc$rho))))),
                            as.numeric), decreasing = T))

## Wrapped Cauchy dist. Log(L):      van Mises dist. Log(L): 
##                    -1065.168                    -1067.496 
##        Uniform dist. Log(L): 
##                    -1092.334

          #AIC comparison for distribution selection (lowest is best - position 1) 
          print(sort(sapply(list(
                     "Uniform dist. AIC:"        = ta_unif$aic, 
                     "Wrapped Cauchy dist. AIC:" = 2*2 - 2 * suppressWarnings(
                                                   sum(log(dwrappedcauchy(na.omit(Arnica_track$ta_), mu = ta_wc$mu, rho = ta_wc$rho)))),
                     "von Mises dist. AIC:"      = 2*2 - 2 * suppressWarnings(
                                                   sum(log(dvonmises(na.omit(Arnica_track$ta_), mu = ta_vm$mu, kappa = ta_vm$kappa))))),
                             as.numeric)))

## Wrapped Cauchy dist. AIC:      von Mises dist. AIC:        Uniform dist. AIC: 
##                  2134.335                  2138.993                  2188.668

          #Calculate U² statistic for GOF (lowest is best - position 1)
          print(sort(sapply(list(
                     "GOF for von Mises dist.:"      = GOF_vm(na.omit(Arnica_track$ta_)),
                     "GOF for Wrapped Cauchy dist.:" = GOF_wc(na.omit(Arnica_track$ta_)),
                     "GOF for Uniform dist.:"        = GOF_unif(na.omit(Arnica_track$ta_))),
                             as.numeric)))

##        GOF for Uniform dist.:      GOF for von Mises dist.: 
##                      95.65582                     125.00311 
## GOF for Wrapped Cauchy dist.: 
##                    8325.79088

          #Cleaning environment
          rm(ta_unif)

Visually, the distribution of our turn angles - hist(na.omit(Arnica_track$ta_)) - data would fit what is called von Mises, and it’s relyable on the study made by Lond et AL, 2022. Since the hist() suggests a uniform distribution too, both are tested.

The packages do not always perform the extraction of criteria (log-likelihood, AIC, GOF) to select the best distribution to apply. Some have been done manually.

Reminder: Likelihood is the product of every probabilities for each x to appear. To simplify with log the likelihood to transform all products in sums. Based on the classic formula, the Log-Likelihood (L) has been computed as:
- L = Log OF the sum of every ‘x’ THAT ARE distributed under the ‘von Mises function of density’ WITH the 2 parameters ‘mu’ and ‘kappa’.

Reminder: We will use for the Goodness Of Fit (GOF) the Watson’s U² statistic to evaluate the fit of circular data. Extracted manually, functions build with internet (Might have to be better dug out).

We then find above:

the Wrapped Cauchy distribution is the most likely against an van Mises and Uniform distribution (log(L): -1065 > -1067 > -1092) and suggests a more appropriate model for turn angles distribution
the Wrapped Cauchy distribution is rather selected by the Akaike Information criterion than the van Mises and Uniform distribution (AIC: 2134 < 2138 < 2189)
Regarding the GOF (Watson’s U² stat) results, since the Wrapped Cauchy distribution is very much defavored against Univorm and von Mises distribution (12705 > 125 > 96). Watson U² Statistic specifically measures how well the empirical distribution of the data matches the theoretical distribution. The high U² value given to Wrapped Cauchy model indicates a poor fit, suggesting that the observed angles deviate significantly from what would be expected under the Wrapped Cauchy distribution. It also suggests that bird might prefer turning uniformly (regarding the data-set available here) at random rather than turning angles (based on Wikipedia research and GOF_ta.pdf in /resource).

Log-Likelihood and AIC are metrics that assess how well a Wrapped Cauchy model explains the data based on the likelihood of observing the given data under that model. As the lowest AIC provided, it indicates a better model when considering both fit and complexity. Even if the U² value does not favor this model, we will fit a Wrapped Cauchy distribution to our data-set of ta_ - turn angles.

# Random steps VS Observed steps using

Packages used : ‘amt’

          #To be continued...
          #random_steps()

# III.1 Prepare the data & Fit the model (Jonsen et AL., 2022)

Packages used : ‘animotum’

          #To be continued...