PART 1 - BACKGROUND AND GOALS

Introduction

The whooping crane (Grus americana) is the rarest crane species (family Gruidae) and was listed in 1970 as endangered under the Endangered Species Preservation Act. The species is endemic to North America (French, Converse, & Austin, 2019) and faces critical challenges during its long migratory journey (Armbruster, 1990). The population of whooping cranes once exceeded 10,000 individuals (Canadian Wildlife Service and U.S. Fish and Wildlife Service, 2007), but by the late 1940s the species was close to extinction, with only about 15 individuals remaining in the wild. Through combined conservation efforts, including habitat restoration, captive breeding programs, and other protection measures, the whooping crane population has increased significantly, to approximately 600 individuals in the wild today (Leaverton & Fellows, 2023; U.S. Fish & Wildlife Service, 2023). Despite this increase, whooping cranes still face threats from land use changes, predation, and hunting pressure during migration (Pearse, Metzger, Brandt, Shaffer, & Bidwell, 2021; U.S. Fish and Wildlife Service, 2009).

Despite extensive research on whooping crane stopover habitat selection, a significant information gap remains, especially along the Kansas portion of the migratory corridor. Prior studies have primarily focused on broad-scale habitat preferences across the Central Flyway (Pearse et al., 2018; Harrell & Bidwell, 2020), often overlooking the finer spatial and temporal variation critical for targeted conservation efforts. Habitat suitability models have traditionally been built using GIS-based methods (Kaminski, Comer, Garner, Hung, & Calkins, 2013; Watershed Institute, Inc., 2013), but opportunities exist to combine high-resolution spatial data with modern machine learning techniques for habitat analyses (Cutler et al., 2007). Because Kansas is a critical part of the whooping crane migration route, and a potential geographic bottleneck for the species, a more detailed approach to refining conservation and habitat management strategies is warranted.

Research Goals

This study aims to improve understanding of whooping crane stopover ecology to inform conservation strategies along the Aransas-Wood Buffalo migratory flyway. One question to be investigated is whether characteristics of preferred whooping crane stopover habitats vary spatially and, potentially, temporally (i.e., depending on where and when the birds are in the migratory corridor). This study will also assess whether key variables change within years (from one migratory season to the next) and between years by identifying important environmental factors (e.g., wetland cover, access to food, urban development, proximity to roads, and other land cover types) influencing whooping crane habitat selection in Kansas, using telemetry data in a random forest model (Breiman, 2001). Comparing the relative importance of these key environmental variables between Kansas and Nebraska may uncover important intra- and interstate differences. This study builds on past research suggesting that whooping cranes select areas with a higher percentage of wetlands and access to food while avoiding regions with substantial urban development.

Given the high wind resource potential in Kansas, another question to be explored is how these migration preferences bear on wind energy development within the state. This work highlights the importance of integrating habitat modeling with land-use planning to facilitate prioritization of habitat protection and restoration while addressing potential conflicts with urbanization and renewable energy development along the migratory corridor.

Study Area

The Aransas-Wood Buffalo population is the only self-sustaining whooping crane population in the wild. It makes a biannual migration of about 4,000 km (Hefley, Baasch, Tyre, & Blankenship, 2015), traveling in the fall from its breeding grounds in Wood Buffalo National Park (WBNP), Alberta, Canada, to its wintering grounds in Aransas National Wildlife Refuge (ANWR) in coastal Texas (Pearse et al., 2020). During migration, the species has been observed to travel through the Canadian Prairies and the Great Plains of North America (Pearse et al., 2015; Baasch, Farrell, Pearse, Brandt, & Caven, 2019). Observational sighting data show the birds' presence in the Canadian provinces of Alberta and Saskatchewan and the states of North Dakota, South Dakota, Nebraska, Kansas, Oklahoma, and Texas, with the 75% confidence interval core corridor being narrowest in central Kansas (Pearse et al., 2018).

The study area for this research is the entire state of Kansas, which was selected because of increased farming activity and the continuing conversion of prairie grassland into cropland. Another factor is the growth of wind energy development within the state, which could disrupt the migratory route and potentially cause a shift in the flyway. In addition, although the whooping crane is an endangered species with highly specific habitat preferences during migration, few studies have examined its habitat preferences within the state. For our initial analysis, we limited the geographic scope to the 95% confidence interval migratory corridor bounds but later expanded it to include the entire state of Kansas. The rationale was to include all areas within Kansas where the probability of detection was greater than 0.

Figure 1. Whooping crane GPS telemetry points from their breeding habitat in Wood Buffalo National Park to their wintering habitat in Aransas National Wildlife Refuge along the expanse of their migratory corridor from 2010 to 2016.

The state has an area of 82,278 mi² (213,100 km²) and is the 15th largest state by area (United States Census Bureau, 2010). There are 105 counties within the state, and it has eight distinct Level III ecoregions: Central Great Plains, Central Irregular Plains, Cross Timbers, Flint Hills, High Plains, Ozark Highlands, Southwestern Tablelands, and Western Corn Belt Plains (Bailey, Avers, King, & McNab, 1994). Of the 105 counties, the 95% bound of the cranes' migratory corridor sits within 60 counties and encompasses four of the eight ecoregions, with 77.8% falling within the Central Great Plains ecoregion (Figure 2). This ecoregion is primarily prairie and contains the largest area of mixed grassland, with a few trees scattered across the region (Burke et al., 1991). It is home to approximately 2,900 plant species and over 500 animal species (World Wildlife Fund, 2024). The eastern portion of this region receives the most rain and is known as the prairie pothole region (USGS: Land Management Research Program, n.d.).

Figure 2. Kansas: 95% confidence interval of migratory corridor, ecoregions, and county boundaries.

Materials and Methods

Materials:

We acquired whooping crane sightings data (1942–2020) from the database compiled by the U.S. Fish and Wildlife Service (USFWS, Nebraska Field Office, unpublished data) and whooping crane GPS telemetry data from 68 unique radio-tagged individuals from the Aransas-Wood Buffalo population spanning 2009–2017 (Pearse et al., 2015). Because of potential biases in the observational data (Austin & Richert, 2001), we used only the telemetry database, which has the most precise location points. There were records from 56 unique individuals (n = 1,253) for Kansas, with an average of 4 hours of observations per day; all data points were included in the analysis. According to Belaire et al. (2014), conservation efforts are targeted toward used habitats across the flyway; therefore, we used all the stopover sites in the database without regard to migration season (fall or spring). We obtained 30 m × 30 m landcover classification data from the National Land Cover Database (NLCD). To select the year for our land cover analysis, we tested how much of the land cover had changed during the data collection period and found that 96% of the study area remained unchanged from 2008 to 2021. We therefore chose the 2011 land cover dataset because it falls within our study period. The habitat selection approach used for this study is multi-level, single-scale habitat selection; that is, multiple factors are assessed at a single scale within each level. A grid size of 1 km × 1 km was adopted because past whooping crane literature indicates that the environmental characteristics relevant to the species are best analyzed at this scale (Belaire et al., 2014). Four land cover types previously associated with the whooping crane were tested as predictor variables: agricultural land, wetlands and rivers, urban areas, and roads (Table 1). To generate our covariates, we computed the percent cover of each of these variables within each 1 km² grid cell. Two additional covariates were generated to investigate how fine-scale and broad-scale landscape factors impact stopover habitats.
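
As an illustration of this covariate generation step, the sketch below computes percent wetland cover within each grid cell. The object names (nlcd, grid) are assumptions for illustration, the terra package (loaded later via pacman) is used, and NLCD classes 90 and 95 (woody and emergent herbaceous wetlands) stand in for the wetland layer.

library(terra)
library(sf)

# Reclassify wetland pixels to 1 and everything else to 0
# (nlcd: 2011 NLCD raster; grid: sf polygons of the 1-km grid; both assumed)
wetland <- classify(nlcd, rcl = cbind(c(90, 95), 1), others = 0)

# Mean of the 0/1 layer within each 1-km cell = fraction of wetland pixels
pcw <- terra::extract(wetland, vect(grid), fun = mean, na.rm = TRUE)
grid$Percent_cover_Wetland <- pcw[, 2] * 100  # convert fraction to percent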

Methods:

We developed a categorical ecotone variable with 16 categories, A–P, for fine-scale investigation. This variable quantifies the structure of the landscape in terms of the arrangement of certain features: each ecotone category encodes the distance (in meters) of each 1 km² cell from the nearest agricultural area and the nearest wetland. This layer was added to examine whether whooping cranes select areas close to these landcover types and to identify the distance thresholds at which stopover habitats start to become unfavorable. A second variable, bearing, was developed to analyze broad-scale landscape interactions. It describes the directional heading from the centroid of each cell toward the wintering grounds in ANWR (Belaire et al., 2014), providing information on how far along the migratory corridor cranes start to adhere to the center of the corridor, regardless of surrounding land cover type, and helping the model differentiate between used and available habitats. The bearing was computed with the geosphere package in R (Hijmans, Williams, & Vennes, 2011), and to avoid a bimodal distribution, 180 was subtracted from each value (Belaire et al., 2014).

Species Distribution Modeling and Validation

A species distribution model was developed using a machine-learning approach to estimate the relative probability that a specific area is used, based on its environmental characteristics, and to generate a suitability map for the state. The random forest method (package 'randomForest'; Liaw & Wiener, 2002) was chosen because it is a statistical classifier that can effectively model the non-linear and complex relationships between environmental variables, offering simple interpretations and performing better than many other classification procedures (Cutler et al., 2007). A random forest is a collection of decision trees that classify data. According to Breiman (2001), it consists of tree-structured classifiers {h(x, Θ_k), k = 1, …}, where the {Θ_k} are independent, identically distributed random vectors, and each tree casts a unit vote for the most popular class at input x. Unlike single classification trees, which are prone to overfitting and perform poorly on new data, the random forest model combines the simplicity of decision trees with flexibility, resulting in more accurate predictions.
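
As a minimal, self-contained illustration of this unit-vote idea (using R's built-in iris data rather than the crane data, so every name here is illustrative):

library(randomForest)

# Each of the 500 trees casts one vote per observation;
# the forest predicts the majority class
set.seed(1)
rf_toy <- randomForest(Species ~ ., data = iris, ntree = 500)
head(predict(rf_toy, iris, type = "vote"))      # per-class vote fractions
head(predict(rf_toy, iris, type = "response"))  # majority-vote class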

Information on both species presence and absence is required to model the environmental space a species can inhabit (Iturbide et al., 2015). Obtaining true absence points is daunting for mobile species, especially those that migrate long distances, so pseudo-absence points are used as background data instead. When using machine learning models, model accuracy is affected by the number of pseudo-absence points added to the model (Barbet-Massin, Jiguet, Albert, & Thuiller, 2012), so it is essential to find the right balance between presence and pseudo-absence points. To avoid sampling bias, it is ideal to have an equally balanced sample of presence and absence points (i.e., a 50/50 split) (Brizuela-Torres, Elith, Guillera-Arroita, & Briscoe, 2024), but in the absence of true absence points, Barbet-Massin et al. (2012) proposed the 1/3 rule of sample balance, which suggests that at least 33% of the sample points be presence locations. For our study, we used a Poisson point process to generate pseudo-absences: we generated 2,000 random points across the entire study space and used these as our pseudo-absence points. This number meets the 1/3 rule, with presence locations making up 38.5% of the sample. We split the data into training and test sets for our model runs, withholding 40% of the data during model training and ensuring each run had an equal number of presence and pseudo-absence points. Next, we tested for multicollinearity among all variables at α = 0.05 using a leave-one-out assessment (Sullins et al., 2019).
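
A minimal sketch of the pseudo-absence step is shown below; the KSBoundary study-area polygon is an assumed input, and sf's uniform random sampler stands in for the Poisson point process described above.

library(sf)

set.seed(123)
# Draw 2,000 background points within the study-area polygon
pseudo_abs <- st_sample(KSBoundary, size = 2000, type = "random")
pseudo_abs <- st_sf(Obs_Type = "Absence", geometry = pseudo_abs)

# With n = 1,253 presences, presences make up 1253 / (1253 + 2000),
# i.e., about 38.5% of all points, satisfying the 1/3 rule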

We then estimated variable importance within the model using the mean decrease in accuracy (MDA), which measures how much model accuracy is lost when a variable is removed (more precisely, when its values are permuted). The MDA has no fixed range, but in general a higher mean decrease in accuracy indicates that a variable is more important for classifying the data (Cutler et al., 2007). Ranks of model variable importance were then estimated using the mean decrease in out-of-bag error. We validated the model with data that were not used in model training, based on the accuracy, specificity, and sensitivity of the predictions. The area under the receiver operating characteristic curve (AUC) was also estimated and used to evaluate the predictive power of the model; it assesses the ability of the model to correctly categorize data. AUC is an indicator of a model's accuracy and ranges from 0 to 1, with 0.5 indicating that the model performs no better than random (Fielding & Bell, 1997). Finally, we generated a relative probability function map showing suitability within the study area. To test how well the probability map estimates occurrence, we overlaid the test data set on the map and calculated how accurately the map depicted presence and pseudo-absence.
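
The final map-validation step could look like the following sketch, where test_pts (an sf layer of withheld presence and pseudo-absence points with an Obs_Type column) is an assumed object, the suitability raster is the one produced later in this workflow, and 0.46 is an example threshold:

library(raster)
library(sf)

# Extract predicted suitability at the withheld test locations
vals <- raster::extract(KSsuitability_map, st_coordinates(test_pts))

# Classify with the example threshold and score agreement with the labels
predicted <- ifelse(vals >= 0.46, "Present", "Absent")
mean(predicted == test_pts$Obs_Type, na.rm = TRUE)  # proportion depicted correctly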

PART 2 - ANALYSIS

All analyses were conducted using R statistical software. The randomForest, tidyverse, caret, caTools, raster, sf, pROC, verification, rfUtilities, tools, ROCR, lubridate, geosphere, readr, adehabitatHS, lme4, lmerTest, amt, ggplot2, and broom.mixed packages were used for data manipulation, analysis, and visualization, and the pacman package was used for package management.

library(pacman)
p_load(tidyverse, randomForest, caret, caTools, raster, sf, pROC, verification, rfUtilities, tools, ROCR, lubridate, geosphere, readr, adehabitatHS, lme4, lmerTest, amt, ggplot2, broom.mixed, ggspatial, spData, terra)

Exploratory analysis

We ran an exploratory analysis on the telemetry data to look at the number of observations recorded per day, the transmission cycle, the first and last transmissions each day, whether bird locations formed clusters, and the flight distance between consecutive transmissions.

# Load the KS Telemetry file
WHCR_KS_Telemetry <- read.csv("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/KS_Telemetry_GPS_Data.csv", header=TRUE, sep=",")

# Convert time columns to proper datetime format
WHCR_KS_Telemetry$TIME_OM <- as.POSIXct(WHCR_KS_Telemetry$TIME_OM, format = "%m/%d/%Y %H:%M")
WHCR_KS_Telemetry$TIME_CTZ <- as.POSIXct(WHCR_KS_Telemetry$TIME_CTZ, format = "%m/%d/%Y %H:%M")
WHCR_KS_Telemetry$TIME_CTZ_R <- as.POSIXct(WHCR_KS_Telemetry$TIME_CTZ_R, format = "%m/%d/%Y %H:%M")

# Task 1: Average observations per bird per day
(avg_obs_per_day <- WHCR_KS_Telemetry %>%
  group_by(PTT_ID, DATE = as.Date(TIME_CTZ)) %>%
  summarise(observations = n()) %>%
  summarise(avg_observations = mean(observations)))
## # A tibble: 56 × 2
##    PTT_ID avg_observations
##     <int>            <dbl>
##  1  98794             2   
##  2  98801             2.29
##  3  98802             2.42
##  4  98803             3   
##  5  98804             2.67
##  6  98805             2.67
##  7  98806             2.08
##  8  98807             1.5 
##  9  98808             3.06
## 10  98809             3.42
## # ℹ 46 more rows
#Calculating overall average observations
(overall_avg <- mean(avg_obs_per_day$avg_observations, na.rm = TRUE))
## [1] 2.698673
# Task 2: Telemetry transmitting cycle (converted to hours)
data <- arrange(WHCR_KS_Telemetry, PTT_ID, TIME_CTZ_R)

# Differences are taken between consecutive fixes within each bird, so gaps
# between separate stopover visits (often spanning months) inflate the
# average cycle reported below
transmitting_cycle_minutes <- data %>%
  group_by(PTT_ID) %>%
  mutate(transmitting_cycle = difftime(TIME_CTZ_R, lag(TIME_CTZ_R), units = "mins")) %>%
  summarise(avg_transmitting_cycle = mean(as.numeric(transmitting_cycle, units = "mins"), na.rm = TRUE))


# Calculate overall average per bird
overall_avg_per_bird <- mean(transmitting_cycle_minutes$avg_transmitting_cycle, na.rm = TRUE)

# Convert overall average to hours
overall_avg_hours <- overall_avg_per_bird / 60


# Task 3: First and last time of transmitting per day
first_last_transmit <- WHCR_KS_Telemetry %>%
  group_by(PTT_ID, DATE = as.Date(TIME_OM)) %>%
  summarise(first_transmit = min(TIME_CTZ), last_transmit = max(TIME_CTZ))

# Task 4: Clustering of bird points
# Assuming clustering by latitude and longitude within a certain radius
clustered_points <- WHCR_KS_Telemetry %>%
  group_by(PTT_ID, DATE = as.Date(TIME_OM)) %>%
  summarise(cluster_count = n_distinct(paste0(round(LAT, 3), round(LONG, 3))))



# Task 5: Flight distance between consecutive transmissions
# Distances are computed per bird, in time order; distVincentyEllipsoid() is
# applied elementwise (distm() would return a full pairwise distance matrix
# rather than step-by-step distances)
flight_distance <- WHCR_KS_Telemetry %>%
  arrange(PTT_ID, TIME_OM) %>%
  group_by(PTT_ID) %>%
  mutate(previous_time = lag(TIME_OM),
         previous_lat = lag(LAT),
         previous_long = lag(LONG)) %>%
  ungroup() %>%
  filter(!is.na(previous_time)) %>%
  mutate(distance = distVincentyEllipsoid(cbind(previous_long, previous_lat), cbind(LONG, LAT)))

# Print results
print("Task 1: Average observations per bird per day")
## [1] "Task 1: Average observations per bird per day"
print(avg_obs_per_day)
## # A tibble: 56 × 2
##    PTT_ID avg_observations
##     <int>            <dbl>
##  1  98794             2   
##  2  98801             2.29
##  3  98802             2.42
##  4  98803             3   
##  5  98804             2.67
##  6  98805             2.67
##  7  98806             2.08
##  8  98807             1.5 
##  9  98808             3.06
## 10  98809             3.42
## # ℹ 46 more rows
print("Task 2: Telemetry transmitting cycle")
## [1] "Task 2: Telemetry transmitting cycle"
print(overall_avg_hours)
## [1] 778.4052
print("Task 3: First and last time of transmitting per day")
## [1] "Task 3: First and last time of transmitting per day"
print(first_last_transmit)
## # A tibble: 435 × 4
## # Groups:   PTT_ID [56]
##    PTT_ID DATE       first_transmit      last_transmit      
##     <int> <date>     <dttm>              <dttm>             
##  1  98794 2011-04-08 2011-04-08 01:47:00 2011-04-08 06:46:00
##  2  98801 2010-10-30 2010-10-30 16:32:00 2010-10-30 16:32:00
##  3  98801 2010-10-31 2010-10-30 22:31:00 2010-10-31 04:30:00
##  4  98801 2011-04-03 2011-04-02 19:59:00 2011-04-02 19:59:00
##  5  98801 2011-04-04 2011-04-03 19:55:00 2011-04-04 13:53:00
##  6  98801 2011-04-05 2011-04-04 19:53:00 2011-04-05 07:50:00
##  7  98801 2012-11-07 2012-11-07 01:03:00 2012-11-07 07:01:00
##  8  98801 2013-05-07 2013-05-06 20:13:00 2013-05-07 08:12:00
##  9  98802 2010-11-16 2010-11-15 23:38:00 2010-11-16 05:38:00
## 10  98802 2012-04-03 2012-04-03 00:16:00 2012-04-03 12:15:00
## # ℹ 425 more rows
print("Task 4: Clustering of bird points")
## [1] "Task 4: Clustering of bird points"
print(clustered_points)
## # A tibble: 435 × 3
## # Groups:   PTT_ID [56]
##    PTT_ID DATE       cluster_count
##     <int> <date>             <int>
##  1  98794 2011-04-08             2
##  2  98801 2010-10-30             1
##  3  98801 2010-10-31             2
##  4  98801 2011-04-03             1
##  5  98801 2011-04-04             3
##  6  98801 2011-04-05             2
##  7  98801 2012-11-07             2
##  8  98801 2013-05-07             2
##  9  98802 2010-11-16             1
## 10  98802 2012-04-03             1
## # ℹ 425 more rows
print(paste0("Average cluster is ", mean(clustered_points$cluster_count)))
## [1] "Average cluster is 2.0183908045977"
print("Task 5: Flight distance between consecutive transmissions")
## [1] "Task 5: Flight distance between consecutive transmissions"
print(mean(flight_distance$distance))
## [1] 143527.6

Bearing Calculations

Bearing captures the directional heading from the centroid of each grid cell to the wintering grounds in the Aransas National Wildlife Refuge (ANWR), as described by Belaire et al. (2014). It provides a sense of orientation within the migratory corridor, indicating how aligned a particular location is with the general southward path cranes follow during migration. In doing so, it reflects how far along the migration route cranes begin to follow the central axis of the corridor, regardless of the specific land cover in the area. This spatial orientation helps the model distinguish between areas the cranes are likely to use and those that are simply available but not chosen. Bearing was calculated using the geosphere package in R (Hijmans, Williams, & Vennes, 2011). To account for the circular nature of directional data and to avoid a bimodal distribution (where north- and south-facing directions might be misinterpreted), 180 degrees was subtracted from each value, following the approach in Belaire et al. (2014).

# Load table with cell and wintering habitat coordinates
data <- read.csv("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/BearingTable.csv", header=TRUE, sep=",")

# Create bearing function using the geosphere package
# p1 = wintering ground and p2 = cell centroid
f <- function(x) {
  p1 <- c(x["POINT1_X"], x["POINT1_Y"])
  p2 <- c(x["POINT_X"], x["POINT_Y"])
  bearing(p1, p2)
}

# Apply bearing function to each row in the input, then add it to the dataframe
bearing <- apply(data, 1, f)
data2 <- data.frame(bearing)

# Write an output csv file that can be joined in GIS
write.csv(data2, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/OutputBearings.csv")

Ecotone Categories Analysis

We developed a categorical ecotone layer with 16 distinct classes, labeled A through P, to support a fine-scale analysis of landscape structure. This ecotone classification captures the spatial relationship between key landcover types, specifically agricultural areas and wetlands. Each 1 km² cell in the study area was assigned a category based on its distance (in meters) to the nearest agricultural land and the nearest wetland. These categories reflect a gradient, from areas located very close to both landcover types to those farther from both; the full scheme is tabulated below.
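
For reference, the 16 classes implemented in the code below combine the two distance bands as follows (rows: distance to the nearest wetland; columns: distance to the nearest agricultural land; all distances in meters):

                     Agric 0-100   100-500   500-1000   >1000
Wetland 0-100             A            B         E         J
Wetland 100-500           C            D         F         K
Wetland 500-1000          G            H         I         L
Wetland >1000             M            N         O         P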

# Read CSV file
WHCRMCGridCell_dist <- read.csv("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/WHCR_KS_GridCell_Dist.csv")

# Display structure of the data frame
str(WHCRMCGridCell_dist)
## 'data.frame':    230256 obs. of  6 variables:
##  $ OID_            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ bearing         : num  -25.5 -25.5 -25.4 -25.4 -25.3 ...
##  $ DistanceAgric   : num  13972 13917 13909 13869 13836 ...
##  $ DistanceWetlands: num  3127 3917 4787 5699 4947 ...
##  $ Shape_Length    : num  4000 4000 4000 4000 4000 4000 4000 4000 4000 4000 ...
##  $ Shape_Area      : num  1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 ...
# Convert columns to numeric
WHCRMCGridCell_dist <- WHCRMCGridCell_dist %>%
  mutate(DistanceWetlands = as.numeric(DistanceWetlands),
         DistanceAgric = as.numeric(DistanceAgric))

# Define a function to categorize distances
categorize_distance <- function(DistanceWetlands, DistanceAgric) {
  ifelse(DistanceWetlands >= 0 & DistanceWetlands <= 100,
         ifelse(DistanceAgric >= 0 & DistanceAgric <= 100, "A",
                ifelse(DistanceAgric > 100 & DistanceAgric <= 500, "B",
                       ifelse(DistanceAgric > 500 & DistanceAgric <= 1000, "E", "J"))),
         ifelse(DistanceWetlands > 100 & DistanceWetlands <= 500,
                ifelse(DistanceAgric >= 0 & DistanceAgric <= 100, "C",
                       ifelse(DistanceAgric > 100 & DistanceAgric <= 500, "D",
                              ifelse(DistanceAgric > 500 & DistanceAgric <= 1000, "F", "K"))),
                ifelse(DistanceWetlands > 500 & DistanceWetlands <= 1000,
                       ifelse(DistanceAgric >= 0 & DistanceAgric <= 100, "G",
                              ifelse(DistanceAgric > 100 & DistanceAgric <= 500, "H",
                                     ifelse(DistanceAgric > 500 & DistanceAgric <= 1000, "I", "L"))),
                       ifelse(DistanceAgric >= 0 & DistanceAgric <= 100, "M",
                              ifelse(DistanceAgric > 100 & DistanceAgric <= 500, "N",
                                     ifelse(DistanceAgric > 500 & DistanceAgric <= 1000, "O", "P"))))))
}

# Create a new column 'ecotone_code'
ecotone <- WHCRMCGridCell_dist %>%
  #select(-(6:33)) %>%
  mutate(ecotone_code = categorize_distance(DistanceWetlands, DistanceAgric))


write_csv(ecotone, file = "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/ecotone.csv")

Data Preparation

All required data, as described above, were imported and processed. The processing and formatting steps included:

  • Coordinate System Standardization: All spatial datasets were standardized to a consistent coordinate reference system (CRS) using the st_transform() function in the sf package for vector data and the projectRaster() function in the raster package for raster data (a minimal sketch follows this list).

  • Study Area Definition: The study area comprised the entire state of Kansas, and a Kansas boundary layer was used to delineate results.

  • Tidying and Cropping: Spatial datasets were intersected with the study area boundary to retain only relevant features within the study area. Dates falling within the study period were randomly generated for all pseudo-absence points, and all variables were converted to the appropriate formats. Additionally, raster datasets were clipped to the extent of the study area using the mask() function.
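
A minimal sketch of the CRS standardization step, with illustrative object names (boundary, landcover) and an example target CRS:

library(sf)
library(raster)

target <- st_crs(5070)  # example target: CONUS Albers Equal Area
boundary <- st_transform(boundary, target)  # vector data

# Nearest-neighbor resampling ("ngb") preserves categorical landcover classes
landcover <- projectRaster(landcover, crs = target$proj4string, method = "ngb")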

########## Import Data

WC_data <- read.csv("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/Final RF Variables.csv", header=TRUE, sep=",")
head(WC_data)
##   OID_    bearing ecotone_code Obs_Type Percent_cover_Agric Percent_cover_Road
## 1    1 -11.783226            A  Absence           19.952263          0.0000000
## 2    2  -7.102490            A  Absence           14.251602          1.3203970
## 3    3 -21.483424            M  Absence           95.493309          0.0000000
## 4    4   7.672267            A  Absence           29.459473          1.8332956
## 5    5 -10.377231            A  Absence           28.288782          0.6606433
## 6    6   1.283704            A  Absence            5.849168          1.1900764
##   Percent_cover_Urban Percent_cover_Wetland  Latitude  Longitude TIME_OM
## 1                   0             0.5400493 -288885.8 -271998.85        
## 2                   0             0.0000000 -190885.8 -199998.85        
## 3                   0             0.0000000 -202885.8 -479998.85        
## 4                   0             0.6866135 -149885.8   80001.15        
## 5                   0             1.0903701 -149885.8 -267998.85        
## 6                   0             2.3642426 -286885.8  -46998.85
########## Create Time Variable

# Define the date range and calculate the number of days
start_date <- as.Date("2010-03-23")
end_date <- as.Date("2016-11-25")
num_days <- as.numeric(difftime(end_date, start_date, units = "days"))


# Generate 2,000 random day offsets from 1 to num_days
set.seed(123)
random_days <- sample(1:num_days, size = 2000, replace = TRUE)


#Convert the random numbers to actual dates
random_dates <- start_date + random_days


# Create data frame for pseudo-absence points with their assigned random dates
pseudo_absence_date <- data.frame(
  OID_ = 1:2000,  
  random_date = random_dates
)
head(pseudo_absence_date)
##   OID_ random_date
## 1    1  2016-04-27
## 2    2  2011-08-31
## 3    3  2010-10-04
## 4    4  2015-04-08
## 5    5  2013-05-08
## 6    6  2013-08-27
#convert time column to date format
WC_data$TIME_OM <- as.Date(WC_data$TIME_OM, format = "%m/%d/%Y")
str(WC_data)
## 'data.frame':    3253 obs. of  11 variables:
##  $ OID_                 : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ bearing              : num  -11.78 -7.1 -21.48 7.67 -10.38 ...
##  $ ecotone_code         : chr  "A" "A" "M" "A" ...
##  $ Obs_Type             : chr  "Absence" "Absence" "Absence" "Absence" ...
##  $ Percent_cover_Agric  : num  20 14.3 95.5 29.5 28.3 ...
##  $ Percent_cover_Road   : num  0 1.32 0 1.833 0.661 ...
##  $ Percent_cover_Urban  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Percent_cover_Wetland: num  0.54 0 0 0.687 1.09 ...
##  $ Latitude             : num  -288886 -190886 -202886 -149886 -149886 ...
##  $ Longitude            : num  -271999 -199999 -479999 80001 -267999 ...
##  $ TIME_OM              : Date, format: NA NA ...
# Merge the pseudo-absence dates into WC_data based on OID
WC_data <- WC_data %>%
  left_join(pseudo_absence_date, by = "OID_") %>%
  mutate(
    obs_date = coalesce(as.Date(random_date), as.Date(TIME_OM))  # Ensure Date format & prioritize random_date
  ) %>%
  dplyr::select(-OID_, -TIME_OM, -random_date)  # Remove the original columns
head(WC_data)
##      bearing ecotone_code Obs_Type Percent_cover_Agric Percent_cover_Road
## 1 -11.783226            A  Absence           19.952263          0.0000000
## 2  -7.102490            A  Absence           14.251602          1.3203970
## 3 -21.483424            M  Absence           95.493309          0.0000000
## 4   7.672267            A  Absence           29.459473          1.8332956
## 5 -10.377231            A  Absence           28.288782          0.6606433
## 6   1.283704            A  Absence            5.849168          1.1900764
##   Percent_cover_Urban Percent_cover_Wetland  Latitude  Longitude   obs_date
## 1                   0             0.5400493 -288885.8 -271998.85 2016-04-27
## 2                   0             0.0000000 -190885.8 -199998.85 2011-08-31
## 3                   0             0.0000000 -202885.8 -479998.85 2010-10-04
## 4                   0             0.6866135 -149885.8   80001.15 2015-04-08
## 5                   0             1.0903701 -149885.8 -267998.85 2013-05-08
## 6                   0             2.3642426 -286885.8  -46998.85 2013-08-27
write.csv(WC_data, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Tables/Final RF Variables_Date.csv")


########## Tidy Data

# Transform data types
WC_data <- as.data.frame(WC_data) %>% 
  mutate(
    sd = as.factor(ifelse(Obs_Type %in% c("A", "Absent", "Absence"), "Absent", "Present")), # sd = species distribution (the response)
    
    ec = as.factor(ecotone_code),
    pca = as.factor(Percent_cover_Agric),
    pcu = as.factor(Percent_cover_Urban),
    pcw = as.factor(Percent_cover_Wetland),
    pcr = as.factor(Percent_cover_Road),
    be = as.factor(bearing),
    Date = as.Date(obs_date),
    Long = as.factor(Longitude),
    Lat = as.factor(Latitude)
  ) %>% 
  dplyr::select(-(1:10)) 
str(WC_data)
## 'data.frame':    3253 obs. of  10 variables:
##  $ sd  : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec  : Factor w/ 16 levels "A","B","C","D",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca : Factor w/ 1805 levels "0","0.0011969",..: 500 372 1685 617 602 191 1 1128 853 1 ...
##  $ pcu : Factor w/ 171 levels "0","0.000963",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ pcw : Factor w/ 906 levels "0","0.0001182",..: 265 1 1 329 452 661 1 37 1 536 ...
##  $ pcr : Factor w/ 1512 levels "0","0.0224564",..: 1 1156 1 1359 546 971 1 611 187 1 ...
##  $ be  : Factor w/ 2332 levels "-25.2043131071031",..: 736 1181 73 2182 826 1806 193 490 1655 2122 ...
##  $ Date: Date, format: "2016-04-27" "2011-08-31" ...
##  $ Long: Factor w/ 606 levels "-506998.852700001",..: 229 299 27 571 233 449 109 113 407 567 ...
##  $ Lat : Factor w/ 369 levels "-352885.777899999",..: 65 163 151 204 204 67 8 336 283 348 ...
# Convert ecotone categories to numbers
WC_data$ec <- as.factor(match(WC_data$ec, LETTERS[1:16]))
head(WC_data)
##       sd ec        pca pcu       pcw       pcr                be       Date
## 1 Absent  1 19.9522629   0 0.5400493         0 -11.7832260925023 2016-04-27
## 2 Absent  1 14.2516022   0         0  1.320397 -7.10249040029449 2011-08-31
## 3 Absent 13  95.493309   0         0         0 -21.4834239076578 2010-10-04
## 4 Absent  1 29.4594727   0 0.6866135 1.8332956  7.67226661416858 2015-04-08
## 5 Absent  1 28.2887821   0 1.0903701 0.6606433 -10.3772312136036 2013-05-08
## 6 Absent  1  5.8491683   0 2.3642426 1.1900764  1.28370396103281 2013-08-27
##                Long               Lat
## 1 -271998.852700001 -288885.777899999
## 2 -199998.852700001 -190885.777899999
## 3 -479998.852700001 -202885.777899999
## 4  80001.1472999994 -149885.777899999
## 5 -267998.852700001 -149885.777899999
## 6 -46998.8527000006 -286885.777899999
str(WC_data)
## 'data.frame':    3253 obs. of  10 variables:
##  $ sd  : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec  : Factor w/ 16 levels "1","2","3","4",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca : Factor w/ 1805 levels "0","0.0011969",..: 500 372 1685 617 602 191 1 1128 853 1 ...
##  $ pcu : Factor w/ 171 levels "0","0.000963",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ pcw : Factor w/ 906 levels "0","0.0001182",..: 265 1 1 329 452 661 1 37 1 536 ...
##  $ pcr : Factor w/ 1512 levels "0","0.0224564",..: 1 1156 1 1359 546 971 1 611 187 1 ...
##  $ be  : Factor w/ 2332 levels "-25.2043131071031",..: 736 1181 73 2182 826 1806 193 490 1655 2122 ...
##  $ Date: Date, format: "2016-04-27" "2011-08-31" ...
##  $ Long: Factor w/ 606 levels "-506998.852700001",..: 229 299 27 571 233 449 109 113 407 567 ...
##  $ Lat : Factor w/ 369 levels "-352885.777899999",..: 65 163 151 204 204 67 8 336 283 348 ...
# Adding Season variable
wcDataframe_season <- WC_data %>%
  mutate(
    Date = as.Date(Date),  # Keep as Date type
    season = case_when(
      month(Date) %in% 3:5 ~ "Spring",
      month(Date) %in% 9:11 ~ "Fall",
      month(Date) %in% 6:8 ~ "Summer",
      TRUE ~ "Winter"))


# Replace NA values
# Convert factor columns to numeric and replace NA values with 0
wcDataframe_season <- wcDataframe_season %>%
  mutate(
    DateNum = as.numeric(as.Date(Date)),
    season = as.factor(as.character(season)),
    pca = as.numeric(as.character(pca)),
    pcu = as.numeric(as.character(pcu)),
    pcw = as.numeric(as.character(pcw)),
    pcr = as.numeric(as.character(pcr)),
    be = as.numeric(as.character(be)),
    Long = as.numeric(as.character(Long)),
    Lat = as.numeric(as.character(Lat))
  ) %>%
  mutate(
    DateNum = replace_na(DateNum, 0),
    season = replace_na(season, 0),
    pca = replace_na(pca, 0),
    pcu = replace_na(pcu, 0),
    pcw = replace_na(pcw, 0),
    pcr = replace_na(pcr, 0),
    be = replace_na(be, 0),
    Long = replace_na(Long, 0),
    Lat = replace_na(Lat, 0)
  ) %>% 
  dplyr::select(-Date)


str(wcDataframe_season)
## 'data.frame':    3253 obs. of  11 variables:
##  $ sd     : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec     : Factor w/ 16 levels "1","2","3","4",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca    : num  20 14.3 95.5 29.5 28.3 ...
##  $ pcu    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pcw    : num  0.54 0 0 0.687 1.09 ...
##  $ pcr    : num  0 1.32 0 1.833 0.661 ...
##  $ be     : num  -11.78 -7.1 -21.48 7.67 -10.38 ...
##  $ Long   : num  -271999 -199999 -479999 80001 -267999 ...
##  $ Lat    : num  -288886 -190886 -202886 -149886 -149886 ...
##  $ season : Factor w/ 4 levels "Fall","Spring",..: 2 3 1 2 2 3 1 4 4 1 ...
##  $ DateNum: num  16918 15217 14886 16533 15833 ...
# Remove Season Variable
wcDataframe <- wcDataframe_season %>%
  dplyr::select(-season)

str(wcDataframe)
## 'data.frame':    3253 obs. of  10 variables:
##  $ sd     : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec     : Factor w/ 16 levels "1","2","3","4",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca    : num  20 14.3 95.5 29.5 28.3 ...
##  $ pcu    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pcw    : num  0.54 0 0 0.687 1.09 ...
##  $ pcr    : num  0 1.32 0 1.833 0.661 ...
##  $ be     : num  -11.78 -7.1 -21.48 7.67 -10.38 ...
##  $ Long   : num  -271999 -199999 -479999 80001 -267999 ...
##  $ Lat    : num  -288886 -190886 -202886 -149886 -149886 ...
##  $ DateNum: num  16918 15217 14886 16533 15833 ...

PART 3 - BELAIRE MODEL ADAPTATION

Areas within the study area that are unsuitable for whooping cranes due to proximity to human disturbances such as roads, railroads, power lines, and bridges were identified and excluded from the analysis. This was achieved by creating buffers around areas of human disturbance and assessing the extent to which these overlap with wetland habitats. Finally, because of the size of individual birds (approximately 5 feet tall, with a wingspan of about 7 feet) and the birds' preference for habitats of a certain size (Watershed Institute, Inc., 2013), all wetlands of 2.5 acres and below were screened out as unsuitable habitat.
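
A minimal sketch of this screening, with illustrative layer names (roads, powerlines, wetlands) and an assumed 400 m buffer distance (the distances actually applied follow Watershed Institute, Inc., 2013):

library(sf)

# Buffer human-disturbance features (400 m is illustrative)
disturbance <- st_union(c(st_geometry(roads), st_geometry(powerlines)))
buffered <- st_buffer(disturbance, dist = 400)

# Screen out wetlands of 2.5 acres and below
wetlands$acres <- as.numeric(st_area(wetlands)) / 4046.86  # m^2 to acres
suitable <- wetlands[wetlands$acres > 2.5, ]

# Drop wetlands that intersect the disturbance buffers
suitable <- suitable[lengths(st_intersects(suitable, buffered)) == 0, ]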

Data Formatting

# Set Seed
set.seed(123)

##### Select Required Variables
WcD_Belaire <- wcDataframe %>% 
  dplyr::select(-(8:10))
str(WcD_Belaire)
## 'data.frame':    3253 obs. of  7 variables:
##  $ sd : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec : Factor w/ 16 levels "1","2","3","4",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca: num  20 14.3 95.5 29.5 28.3 ...
##  $ pcu: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pcw: num  0.54 0 0 0.687 1.09 ...
##  $ pcr: num  0 1.32 0 1.833 0.661 ...
##  $ be : num  -11.78 -7.1 -21.48 7.67 -10.38 ...
# Set up the formula for random forest
formula <- sd ~ pcu + pcw + pca + pcr + ec + be

# Split the data into training and testing sets
trainIndex <- createDataPartition(WcD_Belaire$sd, p = 0.75, list = FALSE)
trainData <- WcD_Belaire[trainIndex, ]
testData <- WcD_Belaire[-trainIndex, ]

Random Forest Model

# Set Seed
set.seed(123)

# Check for NA Values and view summary of data
colSums(is.na(WcD_Belaire))
##  sd  ec pca pcu pcw pcr  be 
##   0   0   0   0   0   0   0
summary(WcD_Belaire)
##        sd             ec            pca               pcu         
##  Absent :2000   1      :1283   Min.   :  0.000   Min.   : 0.0000  
##  Present:1253   13     : 497   1st Qu.:  1.029   1st Qu.: 0.0000  
##                 3      : 493   Median : 27.919   Median : 0.0000  
##                 7      : 335   Mean   : 37.109   Mean   : 0.3045  
##                 5      : 150   3rd Qu.: 69.218   3rd Qu.: 0.0000  
##                 10     : 143   Max.   :100.000   Max.   :67.6093  
##                 (Other): 352                                      
##       pcw               pcr                be         
##  Min.   :  0.000   Min.   : 0.0000   Min.   :-25.204  
##  1st Qu.:  0.000   1st Qu.: 0.0000   1st Qu.:-11.986  
##  Median :  0.000   Median : 0.6601   Median : -7.509  
##  Mean   :  9.287   Mean   : 0.6651   Mean   : -6.784  
##  3rd Qu.:  1.405   3rd Qu.: 0.8708   3rd Qu.: -1.895  
##  Max.   :100.000   Max.   :19.5900   Max.   : 10.991  
## 
# Fit the Random Forest model
(SDmodel <- randomForest(formula, 
                         data = trainData, 
                         importance = TRUE,
                         proximity = TRUE,
                         ntree = 500))
## 
## Call:
##  randomForest(formula = formula, data = trainData, importance = TRUE,      proximity = TRUE, ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 7.09%
## Confusion matrix:
##         Absent Present class.error
## Absent    1424      76  0.05066667
## Present     97     843  0.10319149
#Plot variable Importance
importance(SDmodel)
##        Absent   Present MeanDecreaseAccuracy MeanDecreaseGini
## pcu  7.837266  32.68563             32.24021          19.2514
## pcw 38.384192 107.82836             99.08106         259.3597
## pca 11.657001  79.58942             74.76961         181.5055
## pcr  7.567096  74.22031             66.41497         122.0975
## ec   8.987979  41.32317             45.01034          92.6506
## be  61.272301 109.25317            115.02634         395.7718
varImpPlot(SDmodel, main = "Species Distribution Model", scale = TRUE)

Optimizing Model

# Set Seed
set.seed(123)

# Optimum nTrees
# Plotting the error rates for each tree based on a matrix within the model called err.rate
head(SDmodel$err.rate, 5)
##            OOB    Absent   Present
## [1,] 0.1355353 0.1235741 0.1534091
## [2,] 0.1249142 0.1153415 0.1400709
## [3,] 0.1117287 0.1084777 0.1168091
## [4,] 0.1194391 0.1101426 0.1339950
## [5,] 0.1149478 0.1060493 0.1287703
oob.error.data <- data.frame(
  Trees = rep(1:nrow(SDmodel$err.rate), times = 3),
  Type = rep(c("OOB", "Present", "Absent"), each = nrow(SDmodel$err.rate)),
  Error = c(SDmodel$err.rate[ , "OOB"],
            SDmodel$err.rate[ , "Present"],
            SDmodel$err.rate[ , "Absent"])
)

oob.error.data %>% 
  ggplot(aes(Trees, Error)) +
  geom_line(aes(color = Type))

# Refit the model with a larger ntree
(SDmodelnT <- randomForest(sd ~ ., 
                           data = trainData, 
                           ntree = 1000,
                           importance = TRUE,
                           proximity = TRUE))
## 
## Call:
##  randomForest(formula = sd ~ ., data = trainData, ntree = 1000,      importance = TRUE, proximity = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 1000
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 6.93%
## Confusion matrix:
##         Absent Present class.error
## Absent    1428      72   0.0480000
## Present     97     843   0.1031915
# To see whether the larger ntree improves the fit, plot the error rates again
oob.error.data1 <- data.frame(
  Trees = rep(1:nrow(SDmodelnT$err.rate), times = 3),
  Type = rep(c("OOB", "Present", "Absent"), each = nrow(SDmodelnT$err.rate)),
  Error = c(SDmodelnT$err.rate[ , "OOB"],
            SDmodelnT$err.rate[ , "Present"],
            SDmodelnT$err.rate[ , "Absent"])
)

oob.error.data1 %>% 
  ggplot(aes(Trees, Error)) +
  geom_line(aes(color = Type))

# Optimum Variables per node

oob.values <- vector(length = 10) 

for (i in 1:10) { 
  temp.model <- randomForest(sd ~ ., data = trainData, mtry = i, ntree = 1500) 
  oob.values[i] <- temp.model$err.rate[nrow(temp.model$err.rate),1]
}

oob.values # print the OOB error values; the mtry with the lowest error is the number of variables that works best
##  [1] 0.10778689 0.07049180 0.07008197 0.07213115 0.07377049 0.07540984
##  [7] 0.07418033 0.07336066 0.07336066 0.07540984

Final Random Forest Model

# Set Seed
set.seed(123)

## find the minimum error
min(oob.values)
## [1] 0.07008197
## find the optimal value for mtry...
mtryV <- which.min(oob.values)
## create a model for proximities using the best value for mtry
(SDmodel_Belaire <- randomForest(sd ~ ., 
                                 data = trainData,
                                 ntree = 1001, 
                                 proximity = TRUE,
                                 importance = TRUE,
                                 mtry = mtryV))
## 
## Call:
##  randomForest(formula = sd ~ ., data = trainData, ntree = 1001,      proximity = TRUE, importance = TRUE, mtryV = mtryV) 
##                Type of random forest: classification
##                      Number of trees: 1001
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 6.93%
## Confusion matrix:
##         Absent Present class.error
## Absent    1428      72   0.0480000
## Present     97     843   0.1031915
importance(SDmodel_Belaire)
##        Absent   Present MeanDecreaseAccuracy MeanDecreaseGini
## ec  12.924712  61.95951             67.70381         95.68005
## pca 15.687921 116.28165            110.18667        182.96273
## pcu  7.840469  46.48404             42.62872         18.95825
## pcw 56.786387 154.03701            149.82857        256.46308
## pcr 11.272700  99.54822             91.78421        123.61201
## be  89.536225 155.64528            164.19442        391.58428
varImpPlot(SDmodel_Belaire, main = "Species Distribution Model", scale = TRUE)

Making Predictions

# Set Seed
set.seed(123)

# Predict probabilities on the test set
testData$Predicted <- predict(SDmodel_Belaire, testData, type = "prob")[, "Present"]

# Convert the predicted probabilities to "Absent" or "Present" based on a threshold
testData$PredictedClass <- ifelse(testData$Predicted > 0.5, "Present", "Absent")
testData$PredictedClass <- factor(testData$PredictedClass, levels = c("Absent", "Present"))


# Evaluate model performance
(cm <- confusionMatrix(data = testData$PredictedClass, 
                       reference = testData$sd,
                       positive = "Present"))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Absent Present
##    Absent     486      29
##    Present     14     284
##                                           
##                Accuracy : 0.9471          
##                  95% CI : (0.9294, 0.9615)
##     No Information Rate : 0.615           
##     P-Value [Acc > NIR] : < 2e-16         
##                                           
##                   Kappa : 0.8873          
##                                           
##  Mcnemar's Test P-Value : 0.03276         
##                                           
##             Sensitivity : 0.9073          
##             Specificity : 0.9720          
##          Pos Pred Value : 0.9530          
##          Neg Pred Value : 0.9437          
##              Prevalence : 0.3850          
##          Detection Rate : 0.3493          
##    Detection Prevalence : 0.3665          
##       Balanced Accuracy : 0.9397          
##                                           
##        'Positive' Class : Present         
## 
# Extract the confusion matrix table
cm_table <- as.table(cm$table)


# Convert the table to a data frame
cm_df <- as.data.frame(cm_table)

# Plot confusion matrix
ggplot(data = cm_df, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq), color = "white", size = 5) +
  scale_fill_gradient(low = "brown", high = "darkslateblue") +
  labs(x = "Actual Class", y = "Predicted Class") +
  theme_minimal()

Plotting AUC Curve

par(pty = "s")

# Plot the ROC curve using the test set response and predicted probabilities
ROC <- roc(testData$sd, testData$Predicted, percent = TRUE, 
           levels = c("Absent", "Present"), legacy.axes=TRUE)
ROC
## 
## Call:
## roc.default(response = testData$sd, predictor = testData$Predicted,     levels = c("Absent", "Present"), percent = TRUE, legacy.axes = TRUE)
## 
## Data: testData$Predicted in 500 controls (testData$sd Absent) < 313 cases (testData$sd Present).
## Area under the curve: 98.68%
str(ROC)
## List of 15
##  $ percent           : logi TRUE
##  $ sensitivities     : num [1:312] 100 100 100 100 100 100 100 100 100 100 ...
##  $ specificities     : num [1:312] 0 6.8 9.4 11.2 13.8 17.4 20.2 22.2 24.2 25.2 ...
##  $ thresholds        : num [1:312] -Inf 0.0005 0.0015 0.0025 0.0035 ...
##  $ direction         : chr "<"
##  $ cases             : num [1:313] 0.205 0.769 0.398 0.781 0.781 ...
##  $ controls          : num [1:500] 0.047 0 0.004 0.0779 0.1798 ...
##  $ fun.sesp          :function (thresholds, controls, cases, direction)  
##  $ auc               : 'auc' num 98.7
##   ..- attr(*, "partial.auc")= logi FALSE
##   ..- attr(*, "percent")= logi TRUE
##   ..- attr(*, "roc")=List of 15
##   .. ..$ percent           : logi TRUE
##   .. ..$ sensitivities     : num [1:312] 100 100 100 100 100 100 100 100 100 100 ...
##   .. ..$ specificities     : num [1:312] 0 6.8 9.4 11.2 13.8 17.4 20.2 22.2 24.2 25.2 ...
##   .. ..$ thresholds        : num [1:312] -Inf 0.0005 0.0015 0.0025 0.0035 ...
##   .. ..$ direction         : chr "<"
##   .. ..$ cases             : num [1:313] 0.205 0.769 0.398 0.781 0.781 ...
##   .. ..$ controls          : num [1:500] 0.047 0 0.004 0.0779 0.1798 ...
##   .. ..$ fun.sesp          :function (thresholds, controls, cases, direction)  
##   .. ..$ auc               : 'auc' num 98.7
##   .. .. ..- attr(*, "partial.auc")= logi FALSE
##   .. .. ..- attr(*, "percent")= logi TRUE
##   .. .. ..- attr(*, "roc")=List of 8
##   .. .. .. ..$ percent      : logi TRUE
##   .. .. .. ..$ sensitivities: num [1:312] 100 100 100 100 100 100 100 100 100 100 ...
##   .. .. .. ..$ specificities: num [1:312] 0 6.8 9.4 11.2 13.8 17.4 20.2 22.2 24.2 25.2 ...
##   .. .. .. ..$ thresholds   : num [1:312] -Inf 0.0005 0.0015 0.0025 0.0035 ...
##   .. .. .. ..$ direction    : chr "<"
##   .. .. .. ..$ cases        : num [1:313] 0.205 0.769 0.398 0.781 0.781 ...
##   .. .. .. ..$ controls     : num [1:500] 0.047 0 0.004 0.0779 0.1798 ...
##   .. .. .. ..$ fun.sesp     :function (thresholds, controls, cases, direction)  
##   .. .. .. ..- attr(*, "class")= chr "roc"
##   .. ..$ call              : language roc.default(response = testData$sd, predictor = testData$Predicted, levels = c("Absent",      "Present"), percent| __truncated__
##   .. ..$ original.predictor: num [1:813] 0.047 0 0.004 0.0779 0.1798 ...
##   .. ..$ original.response : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ predictor         : num [1:813] 0.047 0 0.004 0.0779 0.1798 ...
##   .. ..$ response          : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ levels            : chr [1:2] "Absent" "Present"
##   .. ..- attr(*, "class")= chr "roc"
##  $ call              : language roc.default(response = testData$sd, predictor = testData$Predicted, levels = c("Absent",      "Present"), percent| __truncated__
##  $ original.predictor: num [1:813] 0.047 0 0.004 0.0779 0.1798 ...
##  $ original.response : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ predictor         : num [1:813] 0.047 0 0.004 0.0779 0.1798 ...
##  $ response          : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ levels            : chr [1:2] "Absent" "Present"
##  - attr(*, "class")= chr "roc"
AUC_Plot <- plot.roc(ROC, col = "#00008B", lwd = 4, print.auc = TRUE,
                     xlab = "False Positive Percentage", ylab = "True Positive Percentage")

par(pty = "m")

Partial Dependence Plots

# Function to normalize a vector to a 0-1 scale
normalize <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Get importance of variables from the final model and sort by mean decrease in accuracy
imp <- importance(SDmodel_Belaire)
impvar <- rownames(imp)[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE)]

# Define labels for ecotone category (A-P)
ecotone_labels <- LETTERS[1:16]  # Creates A-P

# Set up the plot layout
op <- par(mfrow = c(2, 3))

# Loop through each variable and plot
for (i in seq_along(impvar)) {
  if (impvar[i] == "ec") {
    # Handle categorical variable (ecotone category) as a bar plot
    mean_values <- tapply(predict(SDmodel_Belaire, trainData, type = "prob")[, "Present"], 
                          trainData[[impvar[i]]], mean)
    
    # Normalize the values
    mean_values <- normalize(mean_values)
    
    # Barplot with custom ecotone labels (A-P)
    barplot(mean_values, 
            names.arg = ecotone_labels,  # Replace numbers with A-P
            main = "Partial Dependence on Ecotone Category", 
            xlab = "Ecotope Category", 
            ylab = "Normalized Predicted Suitability", 
            col = "blue")
  } else {
    # Generate partial dependence data
    pd_data <- partialPlot(SDmodel_Belaire, trainData, impvar[i], which.class = "Present", plot = FALSE)
    
    # Normalize y-values
    pd_data$y <- normalize(pd_data$y)
    
    # Plot the normalized partial dependence plot
    plot(pd_data$x, pd_data$y, type = "l", col = "#4daf4a", lwd = 3, 
         main = paste("Partial Dependence on", impvar[i]), 
         xlab = impvar[i], 
         ylab = "Normalized Predicted Suitability")
  }
}

# Reset plot layout
par(op)

Habitat Suitability Map

# Note: "pcu.tif" is assumed here to match the layer naming pattern; the
# original read "Pcr.tif", which on a case-insensitive filesystem resolves to
# the same file as the road raster below
urban_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcu.tif")
wetland_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcw.tif")
agric_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pca.tif")
road_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcr.tif")
ecotone_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/ec.tif")
bearing_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/be.tif")
Longitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Long.tif")
Latitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Lat.tif")

# Stack the rasters
env_var <- stack(urban_raster, wetland_raster, agric_raster, road_raster, ecotone_raster, bearing_raster)
names(env_var) <- c("pcu", "pcw", "pca", "pcr", "ec", "be")


# Predict the suitability/probability of occurrence across the landscape
suitability_map <- predict(env_var, SDmodel_Belaire, type = "prob", index = 2, progress = "window")


#######################################################
# Creating suitability map for Kansas boundary
#######################################################

# Importing Kansas Boundary
KSBoundary <- st_read("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Scripts/Whoopin-crane---Belaire/KansasBoundary/KansasBoundary.shp")
## Reading layer `KansasBoundary' from data source 
##   `C:\Users\ifeom\OneDrive - Kansas State University\Desktop\New Research Stuff to Merge with SSD Drive\HabitatAssessment_Belaire\Scripts\Whoopin-crane---Belaire\KansasBoundary\KansasBoundary.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 55 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -507498.9 ymin: -353385.8 xmax: 116034 ymax: 15392.83
## Projected CRS: North_America_Albers_Equal_Area_Conic
# st_crs(KSBoundary)
# 
# crs <- '+proj=utm +zone=11 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
# 
# # Use project for SpatRasters in terra
# 
# suitability_map_prj <- project(suitability_map, crs, res = res(suitability_map), method = "near")

# Cropping out Kansas
KSsuitability_map <- mask(suitability_map, mask = KSBoundary)

# Plot the suitability map
plot(KSsuitability_map, main = "Whooping Crane Habitat Suitability Map")

# Save the suitability map
# writeRaster(KSsuitability_map, "G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/SuitabilityMap_Belaire.tif", format = "GTiff", overwrite = TRUE)
writeRaster(KSsuitability_map, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - Belaire/RelativeProbabilityFunction_Belaire_2011_1km2.tif", format = "GTiff", overwrite = TRUE)

Model Selection

# Set Seed
set.seed(123)

# This approach finds the most parsimonious model that does not fit noise.
# It tests candidate parameter sets and selects the model with the best error component.
(rf.model <- rf.modelSel(x=WcD_Belaire[,2:ncol(WcD_Belaire)], 
                         y=WcD_Belaire$sd, 
                         imp.scale="mir", 
                         ntree=1001))
## Selected variables: 
##   ec pca pcw be 
## 
## Variables in parameter set 1 
##   ec pca pcu pcw pcr be 
## 
## Variables in parameter set 2 
##   ec pca pcw be 
## 
## Variables in parameter set 3 
##   pca pcw be 
## 
## Variables in parameter set 4 
##   pcw be 
## 
## Variable importance for selected parameters: 
## 
##            imp
## ec  0.41915728
## pca 0.59487973
## pcu 0.05363442
## pcw 0.89398487
## pcr 0.41368972
## be  1.00000000
## 
## Variable importance test for selected parameters: 
## 
##   THRESHOLD OOBERROR CLASS.ERROR NPARAMETERS x1  x2   x3   x4   x5   x6
## 2 0.4150566 6.332616    7.702415           4 be pcw  pca   ec <NA> <NA>
## 1 0.0000000 5.625576    7.741421           6 be pcw  pca   ec  pcr  pcu
## 3 0.5070185 7.470028   11.157627           3 be pcw  pca <NA> <NA> <NA>
## 4 0.8192086 9.867814   17.567435           2 be pcw <NA> <NA> <NA> <NA>
# First we select the default parameters
sel.vars <- rf.model$parameters[[1]]

# Set up the formula for random forest
SelVar <- as.formula(paste("sd ~", paste(sel.vars, collapse = " + ")))

Identify Habitats as a Binary Variable

rm(env_var) #Remove stack to free up some memory

# Occurrence thresholds
occurrence.threshold(SDmodel_Belaire, trainData[,sel.vars], class="Present", p = seq(0.1, 0.9, 0.02), type = "delta.ss")
## Evaluation statistic: delta.ss 
## 
## Moments for thresholds: 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00067 0.01879 0.05010 0.06654 0.09615 0.26677 
## 
## Probability threshold with minimum delta sensitivity/specificity =  0.46
occurrence.threshold(SDmodel_Belaire, trainData[,sel.vars], class="Present", p = seq(0.1, 0.9, 0.02), type = "kappa")
## Evaluation statistic: kappa 
## 
## Moments for thresholds: 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7229  0.8864  0.9419  0.9216  0.9776  0.9896 
## 
## Probability threshold with maximum kappa =  0.46
occurrence.threshold(SDmodel_Belaire, trainData[,sel.vars], class="Present", p = seq(0.1, 0.7, 0.02), type = "sum.ss")
## Evaluation statistic: sum.ss 
## 
## Moments for thresholds: 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.733   1.928   1.968   1.937   1.982   1.990 
## 
## Probability threshold with maximum cummlative sensitivity/specificity =  0.46
# change raster to binary layer
suitability_mapbinary <- KSsuitability_map

# Threshold the probabilities to create a binary (1/0) habitat map; the summed sensitivity/specificity analysis above suggests 0.46, rounded down to 0.4 here
suitability_mapbinary[suitability_mapbinary >= 0.4] <- 1
suitability_mapbinary[suitability_mapbinary < 0.4] <- 0

# double check
head(suitability_mapbinary)
##     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## 1  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 2  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 3  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 4  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 6  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 7  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 8  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 9  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
plot(suitability_mapbinary)

suitability_mapbinary[suitability_mapbinary == 0] <- NA
writeRaster(suitability_mapbinary, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - Belaire/RelativeProbabilityFunction_Binary.tif", format = "GTiff", overwrite = TRUE)

PART 4 - IMPROVED MODEL

The quality of the remaining wetlands was evaluated using key habitat characteristics known to influence whooping crane habitat selection: water regime, wetland size, wetland type (natural vs. artificial), wetland density, and distance to food. A scoring function was defined for each characteristic and applied through spatial analysis, and each wetland's characteristic scores were then summed into a composite habitat quality score, as sketched below.
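
A minimal sketch of this scoring scheme is given below, assuming an sf object wetlands with hypothetical attribute columns (regime, area_ha, wet_type, density_km2, dist_food_m); the cut-points and score ranges are illustrative placeholders, not the fitted values used in the analysis.

library(sf)
library(dplyr)

# Illustrative scoring functions: each returns an ordinal score (higher = better);
# every cut-point below is a hypothetical placeholder
score_regime  <- function(x) ifelse(x %in% c("semipermanent", "seasonal"), 3, 1)
score_size    <- function(ha) cut(ha, breaks = c(-Inf, 0.5, 4, Inf), labels = FALSE)       # 1 (small) to 3 (large)
score_type    <- function(x) ifelse(x == "natural", 3, 1)                                  # natural > artificial
score_density <- function(n) cut(n, breaks = c(-Inf, 1, 5, Inf), labels = FALSE)           # wetlands per km^2
score_food    <- function(m) 4 - cut(m, breaks = c(-Inf, 500, 2000, Inf), labels = FALSE)  # nearer food scores higher

# Composite habitat quality score: the row-wise sum of the characteristic scores
wetlands <- wetlands %>%
  mutate(quality = score_regime(regime) + score_size(area_ha) + score_type(wet_type) +
                   score_density(density_km2) + score_food(dist_food_m))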

Multicollinearity and Sample Balance Test

# Identify multicollinear variables in columns 3 to end
cl <- multi.collinear(wcDataframe[, 3:ncol(wcDataframe)], p = 0.05)
cl
## NULL
# Only proceed if cl is not NULL
if (!is.null(cl)) {
  # Perform "leave one out" test
  for (l in cl) {
    cl.test <- wcDataframe[, -which(names(wcDataframe) == l)]
    print(paste("Remove variable", l, sep = ": "))
    multi.collinear(cl.test[, 3:ncol(cl.test)], p = 0.05)
  }

  # Only remove variables that exist in wcDataframe (safety check)
  existing_cl <- cl[cl %in% names(wcDataframe)]
  wcDataframe <- wcDataframe[, !names(wcDataframe) %in% existing_cl]
}

str(wcDataframe)
## 'data.frame':    3253 obs. of  10 variables:
##  $ sd     : Factor w/ 2 levels "Absent","Present": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ec     : Factor w/ 16 levels "1","2","3","4",..: 1 1 13 1 1 1 16 1 3 10 ...
##  $ pca    : num  20 14.3 95.5 29.5 28.3 ...
##  $ pcu    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ pcw    : num  0.54 0 0 0.687 1.09 ...
##  $ pcr    : num  0 1.32 0 1.833 0.661 ...
##  $ be     : num  -11.78 -7.1 -21.48 7.67 -10.38 ...
##  $ Long   : num  -271999 -199999 -479999 80001 -267999 ...
##  $ Lat    : num  -288886 -190886 -202886 -149886 -149886 ...
##  $ DateNum: num  16918 15217 14886 16533 15833 ...
##############################################
##############################################
# Calculate the percent of the positive
# (Present) class to check for sample
# balance (zero inflation issues) in the model
##############################################
##############################################
(nrow(wcDataframe[wcDataframe$sd == "Present", ]) / nrow(wcDataframe)) * 100
## [1] 38.51829

Model Selection

# Set Seed
set.seed(123)

# This approach finds the most parsimonious model that does not fit noise.
# It tests candidate parameter sets and selects the model with the best error component.
(rf.model <- rf.modelSel(x=wcDataframe[,2:ncol(wcDataframe)],
                         y=wcDataframe$sd,
                         imp.scale="mir",
                         ntree=1001))
## Selected variables: 
##   pca pcw be Long DateNum 
## 
## Variables in parameter set 1 
##   ec pca pcu pcw pcr be Long Lat DateNum 
## 
## Variables in parameter set 2 
##   ec pca pcw be Long Lat DateNum 
## 
## Variables in parameter set 3 
##   pca pcw be Long DateNum 
## 
## Variables in parameter set 4 
##   pcw be Long 
## 
## Variable importance for selected parameters: 
## 
##                imp
## ec      0.39956062
## pca     0.45445521
## pcu     0.03061287
## pcw     0.84738695
## pcr     0.26557168
## be      1.00000000
## Long    0.84147098
## Lat     0.44589037
## DateNum 0.45157014
## 
## Variable importance test for selected parameters: 
## 
##   THRESHOLD OOBERROR CLASS.ERROR NPARAMETERS x1  x2   x3   x4      x5   x6   x7
## 3 0.4515701 3.473717    3.617987           5 be pcw Long  pca DateNum <NA> <NA>
## 2 0.3995606 3.412235    4.567501           7 be pcw Long  pca DateNum  Lat   ec
## 1 0.0000000 3.688903    5.666401           9 be pcw Long  pca DateNum  Lat   ec
## 4 0.8414710 6.793729   10.312813           3 be pcw Long <NA>    <NA> <NA> <NA>
##     x8   x9
## 3 <NA> <NA>
## 2 <NA> <NA>
## 1  pcr  pcu
## 4 <NA> <NA>
# Default: select the variables chosen by rf.modelSel
# sel.vars <- rf.model$selvars

# Alternatively, choose a specific parameter set (here, set 2)
sel.vars <- rf.model$parameters[[2]]


# Set up the formula for random forest
SelVar <- as.formula(paste("sd ~", paste(sel.vars, collapse = " + ")))

Train and Test Data

# Set Seed
set.seed(123)

# Create a combined stratification variable
# (wcDataframe_season is assumed to be a companion data frame, built earlier,
#  that holds a season label for each row of wcDataframe)
wcDataframe <- wcDataframe %>%
  mutate(strata = interaction(sd, wcDataframe_season$season, drop = TRUE))

# Verify the distribution
table(wcDataframe$strata)
## 
##    Absent.Fall   Present.Fall  Absent.Spring Present.Spring  Absent.Summer 
##            496            596            502            597            525 
## Present.Summer  Absent.Winter Present.Winter 
##             24            477             36
trainIndex <- createDataPartition(wcDataframe$strata,
                                  p = 0.60,
                                  list = FALSE,
                                  times = 1)
trainData <- wcDataframe[trainIndex, ]
testData <- wcDataframe[-trainIndex, ]

Random Forest Model

# Set Seed
set.seed(123)

# Note: "Proximity" and "positive" are not randomForest() arguments (the
# proximity argument is lowercase, and "positive" belongs to confusionMatrix()),
# so both are silently ignored here.
(SDmodelI <- randomForest(SelVar,
                          data = trainData,
                          importance = TRUE,
                          Proximity = TRUE,
                          norm.votes=TRUE,
                          positive = "Present",
                          ntree = 1001))
## 
## Call:
##  randomForest(formula = SelVar, data = trainData, importance = TRUE,      Proximity = TRUE, norm.votes = TRUE, positive = "Present",      ntree = 1001) 
##                Type of random forest: classification
##                      Number of trees: 1001
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 5.21%
## Confusion matrix:
##         Absent Present class.error
## Absent    1159      43  0.03577371
## Present     59     695  0.07824934
importance(SDmodelI)
##            Absent   Present MeanDecreaseAccuracy MeanDecreaseGini
## ec       8.938493  59.90689             62.20867         64.81423
## pca     13.596324  76.42473             75.56720         93.32090
## pcw     55.513690 116.82531            115.34086        180.29211
## be      46.632190  86.82085             92.66660        204.55167
## Long    43.755288  78.18554             86.62223        174.68949
## Lat     15.730088  77.54051             73.69115         96.99982
## DateNum 19.301837  89.03476             82.89617        106.23566
varImpPlot(SDmodelI, main = "Species Distribution Model", scale = TRUE)

Making Predictions

# Set Seed
set.seed(123)

# Predict probabilities on the test set
testData$Predicted <- predict(SDmodelI, testData, type = "prob")[, "Present"]

# Convert the predicted probabilities to "Absent" or "Present" based on a threshold
testData$PredictedClass <- ifelse(testData$Predicted > 0.5, "Present", "Absent")
testData$PredictedClass <- factor(testData$PredictedClass, levels = c("Absent", "Present"))

# Evaluate model performance
(cm <- confusionMatrix(data = testData$PredictedClass,
                       reference = testData$sd,
                       positive = "Present"))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Absent Present
##    Absent     774      31
##    Present     24     468
##                                           
##                Accuracy : 0.9576          
##                  95% CI : (0.9452, 0.9679)
##     No Information Rate : 0.6153          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9102          
##                                           
##  Mcnemar's Test P-Value : 0.4185          
##                                           
##             Sensitivity : 0.9379          
##             Specificity : 0.9699          
##          Pos Pred Value : 0.9512          
##          Neg Pred Value : 0.9615          
##              Prevalence : 0.3847          
##          Detection Rate : 0.3608          
##    Detection Prevalence : 0.3793          
##       Balanced Accuracy : 0.9539          
##                                           
##        'Positive' Class : Present         
## 
########## Plot Confusion Matrix
# Extract the confusion matrix table
cm_table <- as.table(cm$table)

# Convert the table to a data frame
cm_df <- as.data.frame(cm_table)

# Plot confusion matrix
ggplot(data = cm_df, aes(x = Reference, y = Prediction, fill = Freq)) +
  geom_tile() +
  geom_text(aes(label = Freq), color = "white", size = 5) +
  scale_fill_gradient(low = "brown", high = "darkslateblue") +
  labs(x = "Actual Class", y = "Predicted Class") +
  theme_minimal()

Plotting AUC Curve

par(pty = "s")

# Plot the ROC curve using the test set response and predicted probabilities
ROC <- roc(testData$sd, testData$Predicted, percent = TRUE,
           levels = c("Absent", "Present"), legacy.axes=TRUE)

# Plot the AUC using the ROC Curve
AUC_Plot <- plot.roc(ROC, col = "#00008B", lwd = 4, print.auc = TRUE,
                     xlab="False Positive Percentage", ylab="True Positive Percentage")

par(pty = "m")

Making Partial Dependence Plots

########## Smoothed Normalized Plots

# Set up the plot layout
op <- par(mfrow = c(3, 3))

# Function to normalize a vector to a 0-1 scale
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Define labels for ecotone category (A-P)
ecotone_labels <- LETTERS[1:16]  # Creates A-P

# Get importance of variables and sort them
imp <- importance(SDmodelI)
impvar <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]

for (i in seq_along(impvar)) {
  var_name <- impvar[i]  # Get variable name

  if (impvar[i] == "ec") {
    # Handle categorical variable (ecotone category) as a bar plot
    mean_values <- tapply(predict(SDmodelI, testData, type = "prob")[, "Present"],
                          testData[[impvar[i]]], mean)

    # Normalize the values
    mean_values <- normalize(mean_values)

    # Barplot with custom ecotone labels (A-P)
    barplot(mean_values,
            names.arg = ecotone_labels,  # Replace numbers with A-P
            main = "Partial Dependence on Ecotone Category",
            xlab = "Ecotone Category",
            ylab = "Normalized Predicted Suitability",
            col = "blue")
  } else {
    # Process numerical variables normally
    rf.partial.prob(SDmodelI, testData, var_name, "Present",
                    main = paste("Partial Dependence on", var_name),
                    smooth = "loess", raw = TRUE, rug = FALSE,
                    ylab = "Normalized Probability of Use")
  }
}

# Restore previous plotting settings
par(op)

Create Resource Selection Function for a Day

# Define the raster folder
raster_folder <- "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data"

# Input date and create day raster
InputDate <- "2015-11-15"
DatePredict <- as.numeric(as.Date(InputDate))
date <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcw.tif")  # Create a template raster

values(date) <- DatePredict  # Assign the future date value to all cells


# Get all variables that are included in the random forest model (including "Date")
raster_vars <- rownames(SDmodelI$importance)

# Construct file paths for all raster variables
raster_files <- file.path(raster_folder, paste0(raster_vars, ".tif"))

# Check if the files exist before stacking
raster_files <- raster_files[file.exists(raster_files)]

# Stack the raster layers
env_var <- stack(raster_files, date)

# Ensure variable names match the model (the appended date layer takes the
# last model name, "DateNum"; this assumes no DateNum.tif exists on disk)
(names(env_var) <- raster_vars)
## [1] "ec"      "pca"     "pcw"     "be"      "Long"    "Lat"     "DateNum"
##############################################
##############################################
########## Predict RSF for that Day
##############################################
##############################################

# Uncomment the writeRaster() call below that matches the variable set used in the model: "All Variables", "Default Parameters", or "Selected Parameters"
predict(env_var, SDmodelI, "suitabilityModel_AllVariables_2015-11-15.tif", type="prob", index=2, na.rm=TRUE, overwrite=TRUE, progress="window")
## class      : RasterLayer 
## dimensions : 369, 624, 230256  (nrow, ncol, ncell)
## resolution : 1000, 1000  (x, y)
## extent     : -507498.9, 116501.1, -353385.8, 15614.22  (xmin, xmax, ymin, ymax)
## crs        : +proj=aea +lat_0=40 +lon_0=-96 +lat_1=20 +lat_2=60 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs 
## source     : suitabilityModel_AllVariables_2015-11-15.tif 
## names      : suitabilityModel_AllVariables_2015.11.15 
## values     : 0, 0.982018  (min, max)
# Load predicted probability raster
r <- raster("suitabilityModel_AllVariables_2015-11-15.tif")
# Plot the probability raster
# plot(r)
# Save the suitability map
# writeRaster(r, "G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/Suitability Map - Selected Parameters/Suitability.model_Parameter(2)15Oct15.tif", format = "GTiff", overwrite = TRUE)
# writeRaster(r, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - All Variables/RelativeProbabilityFunction_AllVariables1_2015-11-15.tif", format = "GTiff", overwrite = TRUE)


#######################################################
# Creating suitability map for Kansas boundary
#######################################################

# Cropping out Kansas
KSr <- mask(r, mask = KSBoundary)

# Plot the suitability map
plot(KSr, main = "Whooping Crane Habitat Suitability Map")

writeRaster(KSr, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - All Variables/RelativeProbabilityFunction_AllVariables1_2015-11-15.tif", format = "GTiff", overwrite = TRUE)

Identify Habitats as a Binary Variable

rm(env_var) #Remove stack to free up some memory

# Occurrence thresholds
occurrence.threshold(SDmodelI, trainData[,sel.vars], class="Present", p = seq(0.1, 0.9, 0.02), type = "delta.ss")
## Evaluation statistic: delta.ss 
## 
## Moments for thresholds: 
## 
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.000000 0.000000 0.008251 0.030528 0.044515 0.210471 
## 
## Probability threshold with minimum delta sensitivity/specificity =  0.38
occurrence.threshold(SDmodelI, trainData[,sel.vars], class="Present", p = seq(0.1, 0.9, 0.02), type = "kappa")
## Evaluation statistic: kappa 
## 
## Moments for thresholds: 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7934  0.9487  0.9892  0.9662  1.0000  1.0000 
## 
## Probability threshold with maximum kappa =  0.38
occurrence.threshold(SDmodelI, trainData[,sel.vars], class="Present", p = seq(0.1, 0.7, 0.02), type = "sum.ss")
## Evaluation statistic: sum.ss 
## 
## Moments for thresholds: 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.790   1.987   1.996   1.975   2.000   2.000 
## 
## Probability threshold with maximum cummlative sensitivity/specificity =  0.38
# change raster to binary layer
rbinary <- KSr

# Threshold the probabilities to create a binary (1/0) habitat map; the summed sensitivity/specificity analysis above suggests 0.38, rounded up to 0.4 here
rbinary[rbinary >= 0.4] <- 1
rbinary[rbinary < 0.4] <- 0

# double check
head(rbinary)
##     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## 1  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 2  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 3  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 4  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 6  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 7  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 8  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 9  NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
plot(rbinary)

rbinary[rbinary == 0] <- NA
writeRaster(rbinary, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - All Variables/RelativeProbabilityFunction_AllVariables_Binary.tif", format = "GTiff", overwrite = TRUE)

PART 5 - TEMPORAL ANALYSIS

Variations between migration seasons (Spring and Fall)

A comprehensive assessment of habitat suitability for whooping cranes within the Quivira National Wildlife Refuge (NWR) gives valuable insight into the factors influencing their habitat selection. The findings highlighted in this research show how various habitat characteristics shape habitat quality for these endangered birds.

The spatial analysis revealed habitat quality scores ranging from 6 to 18 across the refuge, indicating substantial variability in habitat suitability for whooping cranes. Habitats scoring 12 or higher are considered potentially suitable and cover approximately 22.56 hectares of the study area; the sketch below shows how such an area figure is derived from a score raster.
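
As a worked example, the sketch below reclassifies a composite-score raster at the 12-point threshold and converts the count of suitable cells to hectares; the file name quality_score.tif is an assumed placeholder.

library(raster)

# Hypothetical composite-score raster (scores 6-18, as reported above)
score <- raster("quality_score.tif")

# Cells scoring 12 or higher are treated as potentially suitable (TRUE/FALSE)
suitable <- score >= 12

# Suitable area in hectares: TRUE-cell count x cell area (m^2), divided by 10,000
n_cells <- cellStats(suitable, sum)
(area_ha <- n_cells * prod(res(score)) / 10000)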

# Make sure wcDataframe has a Date column in Date format
# (WC_data is assumed to be the raw telemetry data frame loaded earlier,
#  with rows aligned to wcDataframe)
wcDataframe$Date <- as.Date(WC_data$Date)

# Define function to classify season
season <- function(date) {
  month <- month(date)
  if (month %in% c(3, 4, 5, 6)) {
    return("Spring")
  } else if (month %in% c(9, 10, 11, 12)) {
    return("Fall")
  } else if (month %in% c(7, 8)) {
    return("Summer")
  } else {
    return("Winter")
  }
}

# Add Year and Season columns
wcDataframe <- wcDataframe %>%
  mutate(Year = year(Date),
         Season = sapply(Date, season))

# Specify year and season of interest
target_year <- 2015
target_season <- "Fall"  # or "Spring", "Summer", "Winter"

# Filter for specific season within a year
subset_data <- wcDataframe %>%
  filter(Year == target_year, Season == target_season)

# Check data size
cat("Subset size:", nrow(subset_data), "rows\n")
## Subset size: 271 rows
# Partition data
set.seed(1234)
train_idx <- createDataPartition(subset_data$sd, p = 0.70, list = FALSE)
train_data <- subset_data[train_idx, ]
test_data <- subset_data[-train_idx, ]


formula <- sd ~ pcu + pcw + pca + pcr + ec + be + Long + Lat #+ DateNum
# Train model
(rf_season <- randomForest(formula,
                           data = train_data,
                           importance = TRUE,
                           Proximity = TRUE,
                           norm.votes=TRUE,
                           positive = "Present",
                           ntree = 500))
## 
## Call:
##  randomForest(formula = formula, data = train_data, importance = TRUE,      Proximity = TRUE, norm.votes = TRUE, positive = "Present",      ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 3.14%
## Confusion matrix:
##         Absent Present class.error
## Absent      77       5 0.060975610
## Present      1     108 0.009174312
importance(rf_season)
##         Absent   Present MeanDecreaseAccuracy MeanDecreaseGini
## pcu  -1.001002  2.703372             2.340957        0.1684258
## pcw  16.133145 21.593140            23.051593       13.8612699
## pca   1.235138 13.624943            13.144120        4.2452686
## pcr  11.122032 20.675033            21.548863       10.2761745
## ec    4.010309 18.782027            17.888123       11.7668060
## be   19.765988 28.070765            29.551590       22.9744817
## Long 18.013874 24.881990            26.635754       19.2681742
## Lat   6.407583 22.728649            20.932984       10.1661639
varImpPlot(rf_season, main = "Species Distribution Model", scale = TRUE)

##############################################
##############################################
########## Making Predictions
##############################################
##############################################

# Predict probabilities on the test set
# (note: this scores the full-year hold-out set "testData" from Part 4,
#  not the seasonal split "test_data" created above)
testData$Predicted <- predict(rf_season, testData, type = "prob")[, "Present"]

# Convert the predicted probabilities to "Absent" or "Present" based on a threshold
testData$PredictedClass <- ifelse(testData$Predicted > 0.5, "Present", "Absent")
testData$PredictedClass <- factor(testData$PredictedClass, levels = c("Absent", "Present"))

# Evaluate model performance
(cm <- confusionMatrix(data = testData$PredictedClass, 
                       reference = testData$sd,
                       positive = "Present"))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Absent Present
##    Absent     766     309
##    Present     32     190
##                                           
##                Accuracy : 0.7371          
##                  95% CI : (0.7122, 0.7609)
##     No Information Rate : 0.6153          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3802          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.3808          
##             Specificity : 0.9599          
##          Pos Pred Value : 0.8559          
##          Neg Pred Value : 0.7126          
##              Prevalence : 0.3847          
##          Detection Rate : 0.1465          
##    Detection Prevalence : 0.1712          
##       Balanced Accuracy : 0.6703          
##                                           
##        'Positive' Class : Present         
## 
##############################################
##############################################
########## Plotting AUC Curve
##############################################
##############################################
par(pty = "s")

# Plot the ROC curve using the test set response and predicted probabilities
ROC <- roc(testData$sd, testData$Predicted, percent = TRUE,
           levels = c("Absent", "Present"), legacy.axes=TRUE)

# Plot the AUC using the ROC Curve
AUC_Plot <- plot.roc(ROC, col = "#00008B", lwd = 4, print.auc = TRUE,
                     xlab="False Positive Percentage", ylab="True Positive Percentage")

par(pty = "m")


##############################################
##############################################
########## Normalized Plots - Smoothing
##############################################
##############################################

# Set up the plot layout
op <- par(mfrow = c(3, 3))

# Function to normalize a vector to a 0-1 scale
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Define labels for ecotone category (A-P)
ecotone_labels <- LETTERS[1:16]  # Creates A-P

# Get importance of variables and sort them
imp <- importance(rf_season)
impvar <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]

for (i in seq_along(impvar)) {
  var_name <- impvar[i]  # Get variable name
  
  if (impvar[i] == "ec") {
    # Handle categorical variable (ecotone category) as a bar plot
    mean_values <- tapply(predict(rf_season, testData, type = "prob")[, "Present"], 
                          testData[[impvar[i]]], mean)
    
    # Normalize the values
    mean_values <- normalize(mean_values)
    
    # Barplot with custom ecotone labels (A-P)
    barplot(mean_values, 
            names.arg = ecotone_labels,  # Replace numbers with A-P
            main = "Partial Dependence on Ecotone Category", 
            xlab = "Ecotone Category", 
            ylab = "Normalized Predicted Suitability", 
            col = "blue")
  } else {
    # Process numerical variables normally
    rf.partial.prob(rf_season, testData, var_name, "Present",
                    main = paste("Partial Dependence on", var_name),
                    smooth = "loess", raw = TRUE, rug = FALSE,
                    ylab = "Normalized Probability of Use")
  }
}

# Restore previous plotting settings
par(op)

#######################################
#######################################
####### Habitat Suitability Map
#######################################
#######################################

# Load the raster layers
# urban_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/Pcr.tif")
# wetland_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/pcw.tif")
# agric_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/pca.tif")
# road_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/pcr.tif")
# ecotone_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/ec.tif")
# bearing_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/be.tif")
# Longitude_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/Long.tif")
# Latitude_raster <- raster("G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data/Lat.tif")

urban_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Pcr.tif")
wetland_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcw.tif")
agric_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pca.tif")
road_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcr.tif")
ecotone_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/ec.tif")
bearing_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/be.tif")
Longitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Long.tif")
Latitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Lat.tif")

# Stack the rasters
env_var <- stack(urban_raster, wetland_raster, agric_raster, road_raster, ecotone_raster, bearing_raster, Longitude_raster, Latitude_raster)
names(env_var) <- c("pcu", "pcw", "pca", "pcr", "ec", "be", "Long", "Lat")


# Predict the suitability/probability of occurrence across the landscape
suitability_map <- predict(env_var, rf_season, type = "prob", index = 2, progress = "window")

# Plot the suitability map
plot(suitability_map, main = "Whooping Crane Habitat Suitability Map")

writeRaster(suitability_map, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - All Variables/RelativeProbabilityFunction_Fall15.tif", format = "GTiff", overwrite = TRUE)

Variations between years

# Set target year
target_year <- 2010

# Filter data for that year
subset_data <- wcDataframe %>%
  filter(Year == target_year)

# Check sample size
cat("Subset size:", nrow(subset_data), "rows\n")
## Subset size: 328 rows
# Split into training and testing
set.seed(1234)
train_idx <- createDataPartition(subset_data$sd, p = 0.70, list = FALSE)
train_data <- subset_data[train_idx, ]
test_data <- subset_data[-train_idx, ]

# Fit Random Forest model
formula <- sd ~ pcu + pcw + pca + pcr + ec + be + Long + Lat #+ DateNum

# Train model
(rf_year <- randomForest(formula,
                         data = train_data,
                         importance = TRUE,
                         Proximity = TRUE,
                         norm.votes=TRUE,
                         positive = "Present",
                         ntree = 500))
## 
## Call:
##  randomForest(formula = formula, data = train_data, importance = TRUE,      Proximity = TRUE, norm.votes = TRUE, positive = "Present",      ntree = 500) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 2.61%
## Confusion matrix:
##         Absent Present class.error
## Absent     155       2  0.01273885
## Present      4      69  0.05479452
importance(rf_year)
##         Absent   Present MeanDecreaseAccuracy MeanDecreaseGini
## pcu   1.612750  6.484834             6.649647        0.6263223
## pcw   8.956994 18.707413            19.102456        5.9273510
## pca   8.276827 25.358564            24.369277       12.1297920
## pcr   3.138296 17.230106            15.627606        5.7840279
## ec    5.339288 23.326454            21.614132        9.5726858
## be   21.419320 35.547618            35.008933       25.8215037
## Long 21.034911 33.026611            33.750272       23.1067555
## Lat   7.035034 27.607614            25.028676       14.6853681
varImpPlot(rf_year, main = "Species Distribution Model", scale = TRUE)

##############################################
##############################################
########## Making Predictions
##############################################
##############################################

# Predict probabilities on the test set
# (as above, this scores the full-year hold-out set "testData", not the
#  year-specific split "test_data")
testData$Predicted <- predict(rf_year, testData, type = "prob")[, "Present"]

# Convert the predicted probabilities to "Absent" or "Present" based on a threshold
testData$PredictedClass <- ifelse(testData$Predicted > 0.5, "Present", "Absent")
testData$PredictedClass <- factor(testData$PredictedClass, levels = c("Absent", "Present"))

# Evaluate model performance
(cm <- confusionMatrix(data = testData$PredictedClass, 
                       reference = testData$sd,
                       positive = "Present"))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Absent Present
##    Absent     794     376
##    Present      4     123
##                                           
##                Accuracy : 0.707           
##                  95% CI : (0.6814, 0.7317)
##     No Information Rate : 0.6153          
##     P-Value [Acc > NIR] : 2.856e-12       
##                                           
##                   Kappa : 0.2807          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.24649         
##             Specificity : 0.99499         
##          Pos Pred Value : 0.96850         
##          Neg Pred Value : 0.67863         
##              Prevalence : 0.38473         
##          Detection Rate : 0.09483         
##    Detection Prevalence : 0.09792         
##       Balanced Accuracy : 0.62074         
##                                           
##        'Positive' Class : Present         
## 
##############################################
##############################################
########## Plotting AUC Curve
##############################################
##############################################
par(pty = "s")

# Plot the ROC curve using the test set response and predicted probabilities
ROC <- roc(testData$sd, testData$Predicted, percent = TRUE,
           levels = c("Absent", "Present"), legacy.axes=TRUE)

# Plot the AUC using the ROC Curve
AUC_Plot <- plot.roc(ROC, col = "#00008B", lwd = 4, print.auc = TRUE,
                     xlab="False Positive Percentage", ylab="True Positive Percentage")

par(pty = "m")


##############################################
##############################################
########## Normalized Plots - Smoothing
##############################################
##############################################

# Set up the plot layout
op <- par(mfrow = c(3, 3))

# Function to normalize a vector to a 0-1 scale
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Define labels for ecotone category (A-P)
ecotone_labels <- LETTERS[1:16]  # Creates A-P

# Get importance of variables and sort them
imp <- importance(rf_year)
impvar <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]

for (i in seq_along(impvar)) {
  var_name <- impvar[i]  # Get variable name
  
  if (impvar[i] == "ec") {
    # Handle categorical variable (ecotone category) as a bar plot
    mean_values <- tapply(predict(rf_year, testData, type = "prob")[, "Present"], 
                          testData[[impvar[i]]], mean)
    
    # Normalize the values
    mean_values <- normalize(mean_values)
    
    # Barplot with custom ecotone labels (A-P)
    barplot(mean_values, 
            names.arg = ecotone_labels,  # Replace numbers with A-P
            main = "Partial Dependence on Ecotone Category", 
            xlab = "Ecotone Category", 
            ylab = "Normalized Predicted Suitability", 
            col = "blue")
  } else {
    # Process numerical variables normally
    rf.partial.prob(rf_year, testData, var_name, "Present",
                    main = paste("Partial Dependence on", var_name),
                    smooth = "loess", raw = TRUE, rug = FALSE,
                    ylab = "Normalized Probability of Use")
  }
}

# Restore previous plotting settings
par(op)

#######################################
#######################################
####### Habitat Suitability Map
#######################################
#######################################

# Load the raster layers
urban_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Pcr.tif")
wetland_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcw.tif")
agric_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pca.tif")
road_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/pcr.tif")
ecotone_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/ec.tif")
bearing_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/be.tif")
Longitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Long.tif")
Latitude_raster <- raster("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Lat.tif")

# Stack the rasters
env_var <- stack(urban_raster, wetland_raster, agric_raster, road_raster, ecotone_raster, bearing_raster, Longitude_raster, Latitude_raster)
names(env_var) <- c("pcu", "pcw", "pca", "pcr", "ec", "be", "Long", "Lat")


# Predict the suitability/probability of occurrence across the landscape
suitability_map <- predict(env_var, rf_year, type = "prob", index = 2, progress = "window")

# Plot the suitability map
plot(suitability_map, main = "Whooping Crane Habitat Suitability Map")

writeRaster(suitability_map, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/Suitability Map - All Variables/RelativeProbabilityFunction_2010.tif", format = "GTiff", overwrite = TRUE)

Time Series Analysis

# Define the raster folder
# raster_folder <- "G:/KDWP_WhoopingCrane/HabitatAssessment_Belaire_New/HabitatAssessment_Belaire/Raster Data"
raster_folder <- "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data"

# Define years and migration seasons
years <- seq(2010, 2016, by = 3)  # three years (2010, 2013, 2016) at 3-year intervals
spring_dates <- c("-03-23", "-04-15", "-05-10")  # Spring migration dates
fall_dates <- c("-09-15", "-10-10", "-11-05")  # Fall migration dates

# List to store suitability rasters
suitability_rasters <- list()

# Loop through years and seasons
for (year in years) {
  for (season_name in names(list(Spring = spring_dates, Fall = fall_dates))) {
    dates <- if (season_name == "Spring") spring_dates else fall_dates  # Select correct dates
    
    for (date_suffix in dates) {
      InputDate <- paste0(year, date_suffix)  # Construct date
      DatePredict <- as.numeric(as.Date(InputDate))  # Convert to numeric format for model
      
      # Load a reference raster to match extent
      date_raster <- raster(file.path(raster_folder, "pcw.tif"))
      values(date_raster) <- DatePredict  # Assign date value
      
      # Get environmental variables used in the model
      raster_vars <- rownames(SDmodelI$importance)  # Ensure this matches model variables
      raster_files <- file.path(raster_folder, paste0(raster_vars, ".tif"))
      raster_files <- raster_files[file.exists(raster_files)]  # Keep only existing files
      
      # Stack raster layers
      env_var <- stack(raster_files, date_raster)
      names(env_var) <- raster_vars  # Ensure names match the model
      
      # Generate output file name
      output_file <- file.path(raster_folder, paste0("Suitability_", year, "_", season_name, "_", gsub("-", "", date_suffix), ".tif"))
      
      # Remove existing file if needed
      if (file.exists(output_file)) {
        file.remove(output_file)
      }
      
      # Predict habitat suitability and save as GeoTIFF
      suitability_raster <- predict(env_var, SDmodelI, filename = output_file, type = "prob",
                                    index = 2, na.rm = TRUE, overwrite = TRUE, progress = "window")
      
      # Optional: Plot suitability map
      plot(suitability_raster, main = paste("Habitat Suitability -", season_name, year, date_suffix))
      
      # Store raster for animation
      suitability_rasters <- append(suitability_rasters, list(suitability_raster))
    }
  }
}
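
The rasters accumulated in suitability_rasters can then be reviewed as a simple animation; a minimal sketch using raster::animate is shown below (the 0.5-s frame pause is arbitrary).

# Combine the per-date predictions into a single multi-layer stack
suitability_stack <- stack(suitability_rasters)

# Step through the layers as an animation: one pass (n = 1), 0.5 s per frame
animate(suitability_stack, pause = 0.5, n = 1)

# Optionally write the stack out for external animation or GIS tools
# writeRaster(suitability_stack, file.path(raster_folder, "Suitability_TimeSeries.tif"),
#             format = "GTiff", overwrite = TRUE)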

PART 6 - WIND RESOURCE ANALYSIS

Proximity Analysis

wksp <- "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire"

# Load data
# Combine path correctly
crane_path <- file.path(wksp, "Tables/Final RF Variables_Date.csv")


cranes <- read.csv(crane_path, header=TRUE, sep=",") %>% 
  st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326) %>%
  st_transform(32614) # UTM zone for study area

turbine_path <- file.path(wksp, "Tables/TurbinesinHighSuitableAreas.shp")

turbines <- st_read(turbine_path) %>%
  st_transform(32614)
## Reading layer `TurbinesinHighSuitableAreas' from data source 
##   `C:\Users\ifeom\OneDrive - Kansas State University\Desktop\New Research Stuff to Merge with SSD Drive\HabitatAssessment_Belaire\Tables\TurbinesinHighSuitableAreas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 344 features and 28 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -99.78402 ymin: 37.32852 xmax: -97.05572 ymax: 38.91019
## Geodetic CRS:  NAD83
# Classify crane points by turbine proximity
# (crane points can never fall "within" turbine points, so a distance-based
#  test is used; the 800-m influence zone is an assumed, illustrative value)
turbine_zone <- st_union(st_buffer(turbines, dist = 800))

cranes <- cranes %>%
  mutate(
    NearTurbine = lengths(st_intersects(., turbine_zone)) > 0,
    # Note: Spring is defined as months 3-5 here, versus 3-6 in season() above
    Season = ifelse(month(obs_date) %in% 3:5, "Spring", "Fall")
  )


study_area <- st_read("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Scripts/Whoopin-crane---Belaire/KansasBoundary/KansasBoundary.shp") %>% 
  st_transform(32614)
## Reading layer `KansasBoundary' from data source 
##   `C:\Users\ifeom\OneDrive - Kansas State University\Desktop\New Research Stuff to Merge with SSD Drive\HabitatAssessment_Belaire\Scripts\Whoopin-crane---Belaire\KansasBoundary\KansasBoundary.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 55 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -507498.9 ymin: -353385.8 xmax: 116034 ymax: 15392.83
## Projected CRS: North_America_Albers_Equal_Area_Conic
total_study_area <- sum(st_area(study_area)) / 1e6  # convert to km²


# Calculate use rate (locations/km²)
# (st_area() of turbine points is zero, so the buffered turbine zone built
#  above supplies the near-turbine area)
turbine_area_km2 <- as.numeric(sum(st_area(turbine_zone))) / 1e6

use_rates <- cranes %>%
  group_by(Season, NearTurbine) %>%
  summarise(
    Locations = n(),
    Area_km2 = ifelse(first(NearTurbine),
                      turbine_area_km2,
                      as.numeric(total_study_area) - turbine_area_km2),
    UseRate = Locations / Area_km2,
    .groups = "drop"
  )

Convert Wind Data to Raster

# Replace with your actual file path
wind_polygons <- st_read("C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/New Data/KS_50mwind.shp")
## Reading layer `KS_50mwind' from data source 
##   `C:\Users\ifeom\OneDrive - Kansas State University\Desktop\New Research Stuff to Merge with SSD Drive\HabitatAssessment_Belaire\New Data\KS_50mwind.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 23815 features and 2 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -11360350 ymin: 4438143 xmax: -10529660 ymax: 4866419
## Projected CRS: WGS 84 / Pseudo-Mercator
wind_crs <- "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs"

# Compare CRS objects directly (a WKT string never equals a proj4 string)
if (st_crs(wind_polygons) != st_crs(wind_crs)) {
  wind_polygons <- st_transform(wind_polygons, crs = wind_crs)
}

wind_vect <- vect(wind_polygons)

# Create template grid (adjust extent as needed)
template <- rast(extent = ext(wind_vect),
                 resolution = 1000,  # 1-km cells, matching the model rasters
                 crs = wind_crs)
# Create a raster with 500m resolution (adjust as needed)
#template <- rast(ext(wind_vect), resolution = 1000, crs(wind_crs))

# template <- rast("existing_raster.tif")

# Replace "class" with the actual field name containing your 6 classes
wind_raster <- rasterize(wind_vect, template, field = "WPC")
plot(wind_raster)

writeRaster(wind_raster, "C:/Users/ifeom/OneDrive - Kansas State University/Desktop/New Research Stuff to Merge with SSD Drive/HabitatAssessment_Belaire/Raster Data/wind_power_density.tif", overwrite = TRUE)

CONCLUSION AND FINAL THOUGHTS

Conclusion

Whooping cranes rely on a network of ephemeral and variable habitats, so static conservation models fail to capture the complexity of their spatial ecology. By incorporating time-sensitive and location-specific predictors and predicting habitat suitability across space and time, resource managers can proactively safeguard wetlands while managing agricultural landscapes to reduce displacement. Identifying drought-sensitive habitats allows alternative stopover sites to be managed pre-emptively, before water shortages occur.

Final Thoughts

The findings of this study provide a robust foundation for proactive conservation planning, offering actionable strategies to safeguard whooping crane habitats while accommodating the growing demands of renewable energy development. These results show that whooping cranes demonstrate behavioral plasticity in their habitat selection, so the same conservation strategies should not be applied uniformly along the migratory corridor. The high suitability of wetland-adjacent areas suggests that efforts should prioritize preserving and restoring wetlands, particularly those near existing agricultural landscapes, to maintain ecological functionality within the migratory corridor.

By integrating dynamic habitat suitability models with wind resource data, this research identifies critical stopover sites that must be prioritized for protection, as well as areas where wind energy projects could be developed with minimal ecological disruption. The spatial-temporal approach reveals how migratory corridors shift in response to environmental changes, enabling managers to anticipate and mitigate future conflicts. For instance, the identification of high-probability habitats in Kansas emphasizes the need for stringent protection in these regions, including buffer zones and restrictions on infrastructure development.

Wind energy developers can leverage the study’s suitability maps to guide project siting, ensuring turbines are placed in areas with low predicted crane activity. The analysis highlights western Kansas as optimal for low-impact development, where high wind potential coincides with low habitat suitability. However, even in these zones, precautionary measures—such as turbine shutdowns during peak migration periods and the use of radar systems to detect approaching flocks—can further reduce risks. Collaboration between conservationists, policymakers, and energy companies is essential to implement these strategies effectively. Policies that mandate pre-construction habitat assessments and incentivize wildlife-friendly siting practices can help balance renewable energy goals with species protection.
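
A minimal sketch of such an overlay is shown below, assuming the wind power class raster written in Part 6 and one of the suitability surfaces from Part 4 (paths shortened); the wind class cutoff (>= 4) and suitability cutoff (< 0.4) are illustrative thresholds, not siting recommendations.

library(raster)

# Wind power class raster (Part 6) and a predicted suitability surface (Part 4)
wind <- raster("wind_power_density.tif")
suit <- raster("suitabilityModel_AllVariables_2015-11-15.tif")

# Align grids: nearest-neighbour resampling preserves the discrete wind classes
wind_aligned <- projectRaster(wind, suit, method = "ngb")

# Candidate low-conflict cells: strong wind resource and low predicted crane use
low_conflict <- (wind_aligned >= 4) & (suit < 0.4)

plot(low_conflict, main = "Candidate Low-Conflict Wind Development Areas")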

References

Armbruster, M. J. (1990). Characterization of Habitat Used by Whooping Cranes During Migration. U.S. Department of the Interior. Springfield, VA: U.S. Fish and Wildlife Service.

Austin, J. E., & Richert, A. L. (2001). comprehensive review of observational and site evaluation data of migrant whooping cranes in the United States, 1943–99. Jamestown, North Dakota: U.S. Geological Survey, Northern Prairie Wildlife Research Center .

Baasch, D. M., Farrell, P. D., Pearse, A. T., Brandt, D. A., & Caven, A. J. (2019). Diurnal habitat selection of migrating Whooping Crane in the Great Plains. Avian Conservation and Ecology, 14(1), 1-16. doi:10.5751/ACE-01317-140106

Bailey, R. G., Avers, P., King, T., & McNab, W. (1994). Ecoregions and Subregions of the United States [Map]. Washington, D.C.: U.S. Department of Agriculture–Forest Service. doi:10.3133/70250183

Barbet-Massin, M., Jiguet, F., Albert, C. H., & Thuiller, W. (2012, January 19). Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution, 3(2), 327-338. Retrieved from https://doi.org/10.1111/j.2041-210X.2011.00172.x

Belaire, J. A., Kreakie, B. J., Keitt, T., & Minor, E. (2014, April). Predicting and mapping potential Whooping Crane stopover habitat to guide site selection for wind energy projects. Conservation Biology, 28(2), 541-50. doi:10.1111/cobi.12199

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Brizuela-Torres, D., Elith, J., Guillera-Arroita, G., & Briscoe, N. (2024, January 16). Dealing with sampling bias and inferring absence data to improve distribution models of a widely distributed vulnerable marsupial. Austral Ecology, 49(1), e13474. Retrieved from https://doi.org/10.1111/aec.13474

Burke, I., Kittel, T., Lauenroth, W., Snook, P., Yonker, C., & Parton, W. (1991, November). Regional Analysis of the Central Great Plains: Sensitivity to climate variability. BioScience, 41(10), 685-692. Retrieved from https://doi.org/10.2307/1311763

Canadian Wildlife Service and U.S. Fish and Wildlife Service. (2007). International recovery plan for the whooping crane. Albuquerque, New Mexico: Ottawa: Recovery of Nationally Endangered Wildlife (RENEW), and U.S. Fish and Wildlife Service.

Cutler, D. R., Edwards, T. C., Jr., Beard, K. H., Cutler, A., Hess, K. T., Gibson, J., & Lawler, J. J. (2007, December). Random Forests for Classification in Ecology. Ecology, 88(11), 2783-2792. doi:10.1890/07-0539.1

Fielding, A. H., & Bell, J. F. (1997). A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation, 24, 38-49.

French, J. J., Converse, S. J., & Austin, J. E. (2019). Whooping Cranes Past and Present. Whooping Cranes: Biology and Conservation, 3-14. Retrieved from https://doi.org/10.1016/B978-0-12-803555-9.00001-3

Harrell, W., & Bidwell, M. (2020). Climate impacts on whooping crane migration: A review of weather patterns and habitat availability. Ecological Applications, 30(4), e02145.

Hefley, T. J., Baasch, D. M., Tyre, A. J., & Blankenship, E. E. (2015, January 14). Use of opportunistic sightings and expert knowledge to predict and compare Whooping Crane stopover habitat. Conservation Biology, 29(5), 1337-1346. doi:10.1111/cobi.12515

Hijmans, R. J., Williams, E., & Veness, C. (2011). geosphere: spherical trigonometry. R package version 1.2-18: http://CRAN.R-project.org/package=geosphere

Iturbide, M., Bedia, J., Herrera, S., Hierro, O., Pinto, M., & Gutiérrez, J. M. (2015, May). A framework for species distribution modelling with improved pseudo-absence generation. Ecological Modelling, 312, 166-174. Retrieved from http://dx.doi.org/10.1016/j.ecolmodel.2015.05.018

Kaminski, D., Comer, C., Garner, N., Hung, I.-K., & Calkins, G. (2013, September 19). Using GIS-based, regional extent habitat suitability modeling to identify conservation priority areas: A case study of the Louisiana black bear in east Texas. Journal of Wildlife Management, 77(8), 1639-1649. Retrieved from https://doi.org/10.1002/jwmg.611

Leaverton , C., & Fellows, V. (2023, August 22). Whooping Cranes: Reflecting on 50 Years of ESA Protection and Habitat Conservation with Our Partners. Retrieved from U.S.

Fish & Wildlife Service Website: https://www.fws.gov/story/2023-08/whooping-cranes-reflecting-50-years-esa-protection-and-habitat-conservation#:~:text=In%20response%2C%20the%20whooping%20crane,of%201966%2C%20the%20ESA’s%20predecessor.

Liaw, A., & Wiener, M. (2002, December 03). Classification and Regression by randomForest. R News, 2, pp. 18-22.

Pearse, A. T., Brandt, D. A., Harrell, W. C., Metzger, K. L., Baasch, D. M., & Hefley, T. J. (2015, January). Whooping Crane Stopover Site Use Intensity Within the Great Plains. U.S. Geological Survey Open-File Report 2015-1166, 1-24. doi:10.3133/ofr20151166

Pearse, A. T., Metzger, K. L., Brandt, D. A., Bidwell, M. T., Harner, M. J., Baasch, D. M., & Harrell, W. (2020, March). Heterogeneity in migration strategies of Whooping Cranes. The Condor, 122(1), 1-15. Retrieved from https://doi.org/10.1093/condor/duz056.

Pearse, A. T., Metzger, K. L., Brandt, D. A., Shaffer, J. A., & Bidwell, M. T. (2021, April). Migrating Whooping Cranes avoid wind‐energy infrastructure when selecting stopover habitat. Ecological Applications, 31(5). doi:10.1002/eap.2324

Pearse, A. T., Rabbe, M., Juliusson, L. M., Bidwell, M. T., Craig-Moore, L., Brandt, D. A., & Harrell, W. (2018, February 15). Delineating and identifying long-term changes in the whooping crane (Grus americana) migration corridor. PLoS ONE, 13(2), 1-15. Retrieved from https://doi.org/10.1371/journal.pone.0192737

Sullins, D., Haukos, D., Lautenbach, J., Lautenbach, J., Robinson, S., Rice, M., . . . Hagen, C. (2019, August 11). Strategic conservation for lesser prairie-chickens among landscapes of varying anthropogenic influence. Biological Conservation, 238, 108213. Retrieved from https://doi.org/10.1016/j.biocon.2019.108213

U.S. Fish & WIldlife Service. (2023, May 23). 2023 Wintering Whooping Crane Count. Retrieved from U.S. Fish & WIldlife Service: https://www.fws.gov/press-release/2023-05/2023-wintering-whooping-crane-count

U.S. Fish and Wildlife Service. (2009). Whooping Cranes and Wind Development. Regions 2 and 6: An Issue Paper. U.S. Fish and Wildlife Service.

United States Census Bureau. (2010, January 1). State Area Measurements and Internal Point Coordinates. Retrieved from census.gov: https://www.census.gov/geographies/reference-files/2010/geo/state-area.html

USGS: Land Management Research Program. (n.d.). America’s Grasslands: Great Plains and Pothole Prairies. Retrieved February 18, 2025, from U.S. Geological Survey: https://www.usgs.gov/programs/land-management-research-program/americas-grasslands-great-plains-and-pothole-prairies#:~:text=America%27s%20grasslands%20are%20in%20the,%2C%20Black%2Dfoot%20ferret%29.

Watershed Institute, Inc. (2013). Potentially Suitable Habitat Assessment for the Whooping Crane. Topeka, Kansas: The Watershed Institute.

World Wildlife Fund. (2024). Great Plains. Retrieved from www.worldwildlife.org: https://www.worldwildlife.org/places/great-plains#:~:text=This%20sweeping%20landscape%20in%20the%20heart%20of,tens%20of%20thousands%20of%20species%20of%20insects.

Xu, D., Bisht, G., Tan, Z., Sinha, E., Di Vittorio, A., Zhou, T., . . . Leung, L. (2024, March 18). Climate change will reduce North American inland wetland areas and disrupt their seasonal regimes. Nature Communications, 15, 2438. Retrieved from https://doi.org/10.1038/s41467-024-45286-z