Hooded Warbler (Setophaga citrina)

Source for this page is from the weecology lecture: Introduction to Species Distribution Modeling Using R

Required data for this work: download here

Libraries

library(dplyr)
library(ggplot2)
library(dismo)
library(conflicted)

Preventing conflicts between functions with similar name in different packages

conflicts_prefer(raster::extract)
conflicts_prefer(dplyr::select)
conflicts_prefer(dplyr::filter)

Step 1: Preparing the data

The basic idea is to prepare and visualize the relationship between the environmental conditions and the presence of the Hooded Warbler. By combining species location data with environmental data, the code enables the visualization of species occurrence within the environmental space, helping to understand how environmental factors like temperature and precipitation influence the distribution of the species.

Loading data for Hooded Warbler

hooded_warb_data <- read.csv("C:/Users/Matěj Tvarůžka/Documenty/Programming/R_statistics/Geocomputing_with_R/data/hooded_warb_locations.csv")

Loading data for environment

env_data_current <- stack("C:/Users/Matěj Tvarůžka/Documenty/Programming/R_statistics/Geocomputing_with_R/data/env_current.grd")
env_data_forecast <- stack("C:/Users/Matěj Tvarůžka/Documenty/Programming/R_statistics/Geocomputing_with_R/data/env_forecast.grd")

Plotting the world map for precipitation and minimum temperature

plot(env_data_current$tmin)

plot(env_data_current$precip)

Selecting longitude and latitude for extraction of suitable environmental conditions for the species

hooded_warb_locations <- select(hooded_warb_data, lon, lat)

Creating data.frame of suitable environmental conditions for the species

hooded_warb_env <- extract(env_data_current, hooded_warb_locations)

Join the environmental data.frame with data about warblers locations and presence

hooded_warb_data <- cbind(hooded_warb_data, hooded_warb_env)

Plot showing where species occurs in environmental space

ggplot(hooded_warb_data, mapping = aes(x = tmin, y = precip, color = present)) + geom_point()

Step 2: Building generalized linear model using multivariate logistic regression

Building a predictive model that can determine the likelihood of the Hooded Warbler’s presence based on environmental conditions, specifically minimum temperature and precipitation. By using logistic regression, the model can estimate the probability of species presence as a function of these environmental predictors.

Creating the general linear model

logistic_regr_model <- glm(present ~ tmin + precip,
                           family = binomial(link = "logit"),
                           data = hooded_warb_data)
summary(logistic_regr_model)
## 
## Call:
## glm(formula = present ~ tmin + precip, family = binomial(link = "logit"), 
##     data = hooded_warb_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6098  -0.5055  -0.2265  -0.1149   2.3704  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.1321879  0.1817786  -33.73   <2e-16 ***
## tmin         0.0115134  0.0009303   12.38   <2e-16 ***
## precip       0.0410928  0.0019430   21.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5773.6  on 5688  degrees of freedom
## Residual deviance: 3886.1  on 5686  degrees of freedom
##   (1 observation deleted due to missingness)
## AIC: 3892.1
## 
## Number of Fisher Scoring iterations: 6

There are significant coefficients for both precipitation and minimum temperature.

Step 3: Evaluation of the model

Assessing the performance of the logistic regression model by using the ROC curve. The ROC curve helps to visualize how well the model distinguishes between the presence and absence of the Hooded Warbler based on the predicted probabilities.

What is ROC (Receiver Operating Characteristic) model?

A tool for evaluating the performance of binary classification models. The ROC curve helps to understand how many correct positive predictions (true positives) the model makes versus incorrect positive predictions (false positives) across different thresholds.

Model Evaluation:

By examining the ROC curve, you can assess the model’s ability to correctly classify the presence and absence of the species. A model that perfectly distinguishes between the two will have an ROC curve that passes through the top left corner of the plot (indicating a true positive rate of 1 and a false positive rate of 0).

presence_data <- filter(hooded_warb_data, present == 1)
absence_data <- filter(hooded_warb_data, present == 0)

ROC model

evaluation <- evaluate(presence_data, absence_data, logistic_regr_model)
plot(evaluation , "ROC")

Step 4: Making predictions from the model

Using the logistic regression model to predict the presence of the Hooded Warbler under current and future environmental conditions, visualize these predictions, and assess changes over time.

Current predictions

Predicting the probability of the HW presence under current environmental conditions:

predictions <- predict(env_data_current, logistic_regr_model, type = "response")
plot(predictions, ext = extent(-140, -50, 25, 60), main = "Presence under Current Environmental Conditions")
# Adding points showing actual presence based on BBS data
points(presence_data[c("lon", "lat")], pch = "+", cex = 0.5)

The black crosses shows real points of presence. We can setup threshold Plotting a binary map where areas with a predicted probability greater than 50 % are shown as suitable for the HW

plot(predictions > 0.5, ext = extent(-140, -50, 25, 60),
     main = "Suitable areas with predicted probability greater than 50 %")

Calculating a more refined threshold for presence/absence based on the model evaluation, specifically using the prevalence statistic.

tr <- threshold(evaluation, stat = "prevalence")
plot(predictions > tr, ext = extent(-140, -50, 25, 60),
     main = "Suitable Areas with Predicted Probability by Refined Threshold")
points(presence_data[c("lon", "lat")], pch = "+", cex = 0.5)

Forecast predictions

Using the logistic regression model to predict the probability of the HW presence under forecasted (future) environmental conditions

forecasts <- predict(env_data_forecast, logistic_regr_model, type = "response")
plot(forecasts, ext = extent(-140, -50, 25, 60),
     main = "Probability of Presence under Forecasted Environmental Conditions
     in The Next 50 Years")

plot(forecasts > tr, ext = extent(-140, -50, 25, 60),
     main = "Probability of Presence under Forecasted Environmental Conditions
     in The Next 50 Years Using Refined Threshold")

Visualize the difference between the forecasted and current probabilities, showing the change in the suitability of the environment for the HW. Areas with positive values indicate increased suitability, while negative values indicate decreased suitability.

plot(forecasts - predictions, ext = extent(-140, -50, 25, 60),
     main = "Change in The Suitability of The Environment in The Next 50 Years")