Title of Project: Spatial Analysis of Invasive Species Across Vermont

Abstract

Initially intended to examine invasive species within Niquette Bay State Park, this study expanded to encompass all of Vermont due to the limited availability of observational data. Utilizing point data on species occurrences along with WorldClim rasters for climatic variables, a Species Distribution Model (SDM) was developed to assess the potential distribution of key invasive species across the state under prevailing environmental conditions. Complementing this, high-resolution weather data from OpenWeather for Vermont facilitated detailed hotspot detection and cluster analysis. This integrative approach revealed significant spatial trends and ecological impacts of invasive species, identifying critical areas requiring targeted management interventions. The study illustrates the value of adaptive research frameworks in ecological studies, demonstrating how expansive data analysis can provide comprehensive insights into species dynamics and inform effective conservation strategies. Through rigorous analysis, the project not only maps current distributions but also predicts future spread, serving as a crucial tool for ecological management and decision-making in Vermont.

Introduction and Background

Introduction

Invasive species are a critical ecological challenge in Vermont, a region known for its rich biodiversity but increasingly threatened by non-native species. This project initially focused on the impacts of invasive species in Niquette Bay State Park and then expanded to the entire state to support a more comprehensive analysis. Invasive species in Vermont disrupt local habitats and compete with native flora and fauna, which makes effective management strategies urgent. The urgency and relevance of this study are underscored by the work of Jane Elith and her colleagues, who have emphasized the critical role of accurate Species Distribution Modeling (SDM) for conservation and ecological management (Elith et al., 2011). The expansion of the project is driven by the need to understand not only local impacts but also broader ecological impacts across Vermont.

Background

Vermont’s ecosystems are at great risk from invasive species that destabilize local ecological networks and reduce biodiversity. Vermont Emergency Management reports from 2018 show that nearly one-third of Vermont’s plant species are classified as invasive. Such statistics underscore the urgent need to develop effective strategies to curb the spread and impact of these species. To accomplish this, the study uses advanced SDM techniques that integrate environmental variables with species occurrence data to provide a detailed prediction of their potential spread. This approach is consistent with methods discussed in the literature by Elith and others (Elith et al., 2010) and provides a robust framework for strategic conservation planning.

Research Question

This study is guided by a central question that addresses both the ecological and management dimensions of Vermont’s invasive species problem: “What is the potential distribution of key invasive species in Vermont under current environmental conditions?” This question aims to direct the study toward a detailed examination of geographic and environmental data to identify the areas where invasive species are most likely to spread. The answer will help formulate strategic conservation planning that can be implemented to effectively manage and potentially mitigate the impacts of these invasive species throughout the state.

Data Overview

This section describes the data and resources used in the study, grouped into the categories below. Each category plays a role in analyzing and modeling the distribution and impacts of invasive species throughout Vermont.

Species Observation Data

Environmental Variables Data

  • OpenWeather: Utilized for obtaining real-time and historical weather data, integrating this with species data to examine correlations with invasion events.
  • WorldClim: Provides historical climate data essential for modeling the potential future spread of species under different climate scenarios.

Data Preparation and Analysis Tools

  • R Packages: Used for comprehensive data manipulation, statistical analysis, and spatial modeling, crucial for SDM.

Additional Resources for Validation and Enhancement

  • US Geological Survey (USGS): Supplements the primary data with additional environmental data for a broader ecological assessment.
  • National Oceanic and Atmospheric Administration (NOAA): Provides long-term climate data to understand trends and forecast future ecological changes.

Methodology Overview

Figure 1: Methodology Flowchart

Methodology Approach Overview

This research is centered around the exploration of spatial dynamics of invasive species, with a particular focus on Agrilus planipennis (emerald ash borer) and Adelges tsugae (hemlock woolly adelgid), across the Vermont region. The goal is to use spatial analysis and Species Distribution Modeling (SDM) to understand the ecological factors that drive the spread of these species and to develop effective management and conservation strategies.

The methodology integrates the use of RStudio and GIS for the following key processes:

Tools and Processes

  1. Data Acquisition and Preparation: Collection of extensive species occurrence and environmental data.
  2. Exploratory Analysis: Initial analysis to identify patterns and trends.
  3. Spatial Clustering: Identification of clusters of invasive species observations across the study area.
  4. Species Distribution Modeling (SDM): Prediction of potential species distributions from environmental variables.
  5. Visualization: Creation of visual data representations to illustrate the results.
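
Step 3 of the list above can be illustrated with a small sketch. The script loads the dbscan package for clustering; the example below instead uses single-linkage hierarchical clustering from base R as a simple stand-in, so it should not be read as the study's clustering method. The toy coordinates (assumed to be in a projected CRS with units of kilometres) and the `link_km` threshold are hypothetical.

```r
# Toy occurrence coordinates in a projected CRS (units: km); values are invented.
pts <- data.frame(
  x = c(0.0, 0.4, 0.7, 10.0, 10.3, 25.0),
  y = c(0.0, 0.3, 0.1, 10.0, 10.2, 5.0)
)

# Hypothetical linking distance: points closer than this are chained together.
link_km <- 1

# Single-linkage clustering cut at link_km groups points that are connected
# through a chain of neighbours no more than link_km apart.
tree <- hclust(dist(pts), method = "single")
pts$cluster <- unname(cutree(tree, h = link_km))

# Flag clusters with a single member as isolated observations.
pts$isolated <- ave(pts$cluster, pts$cluster, FUN = length) == 1

print(pts)
```

Groups with several members are candidate hotspots, while isolated points may be early detections or data-entry outliers. A density-based method such as DBSCAN additionally requires a minimum number of neighbours before a group counts as a cluster.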

Recognized Limitations

  • Data dependence: The depth of the study depends on the availability of data and the accuracy of the models.
  • Ecosystem dynamics: Continuous data collection and model refinement are necessary to adapt to ecological changes.
  • Conservation flexibility: Management strategies need to adapt to new knowledge and environmental changes.

# ----------------------------------------------------------------------
## Step 1: Set up Environment for Spatial Analysis
# ----------------------------------------------------------------------

# Set system environment variables for the 'sf' package. These variables specify the directories 
# where the GDAL and PROJ data files are located, which are necessary for spatial data operations in R.
Sys.setenv(GDAL_DATA = "C:/OSGeo4W/share/gdal")   # Location of GDAL data files
Sys.setenv(PROJ_LIB = "C:/OSGeo4W/share/proj")    # Location of PROJ data files
Sys.setenv(PATH = paste("C:/OSGeo4W/bin", Sys.getenv("PATH"), sep=";"))  # Add GDAL binaries to system PATH

# Check the versions of GDAL, GEOS, and PROJ used by the 'sf' package. This command outputs the 
# versions of these libraries to ensure they are correctly loaded and compatible.
sf::sf_extSoftVersion()

# ----------------------------------------------------------------------
## Step 2: Load Necessary Libraries
# ----------------------------------------------------------------------

# Function to check and install any missing packages
install_if_missing <- function(package) {
  if (!require(package, character.only = TRUE)) {
    install.packages(package)
    library(package, character.only = TRUE)
  }
}

# List of required packages (each package listed once)
packages <- c("dismo", "raster", "sp", "readr", "dplyr", "ggplot2", 
              "terra", "sf", "tmap", "tmaptools", "lubridate", "stringr", 
              "rasterVis", "RANN", "rJava", "predicts", "akima", "gstat",
              "tidyverse", "dbscan", "maps", "plotly", "rnaturalearth")

# Load all required libraries
invisible(sapply(packages, install_if_missing))

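# Confirm loading, reporting any package that is still not attached
not_loaded <- packages[!paste0("package:", packages) %in% search()]
if (length(not_loaded) == 0) {
  message("All required libraries are loaded successfully!")
} else {
  warning("Packages not loaded: ", paste(not_loaded, collapse = ", "))
}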

library(dismo)      # Tools for species distribution modeling
library(raster)     # Provides an interface for handling and analyzing raster data
library(sp)         # Handles spatial data frames
library(readr)      # Efficient reading and writing of data files
library(dplyr)      # Data manipulation within the tidyverse ecosystem
library(ggplot2)    # Creation of complex plots from data frames
library(terra)      # Enhanced raster data analysis and manipulation methods
library(sf)         # Handling of spatial data frames, integrating powerful libraries like GDAL and PROJ
library(tmap)       # Thematic maps designed for spatial data visualization
library(tmaptools)  # Additional tools for working with 'tmap' package
library(lubridate)  # Date-time data manipulation
library(stringr)    # Manipulation of strings
library(rasterVis)  # Visualization tools for raster data
library(RANN)       # Nearest neighbor search and classification
library(rJava)      # Integration of Java within R, enabling Java-based operations
library(predicts)   # Spatial predictive modeling tools, including species distribution models
library(akima)      # For interpolation of irregularly spaced data
library(gstat)      # For spatial data analysis and geostatistics
library(tidyverse)  # Comprehensive suite of packages for data manipulation and visualization
library(dbscan)     # Implements the DBSCAN clustering algorithm for spatial data analysis
library(maps)       # For creating geographical maps
library(plotly)     # Interactive plotting and graphical tools
library(rnaturalearth)  # Tools for accessing natural earth map data


# ----------------------------------------------------------------------
## Step 3: Define Base Path for Data Files
# ----------------------------------------------------------------------

# Set the base directory
base_directory <- "D:/GEOG_588/SDM_Invasive_species"

# Set the base directory as the working directory
setwd(base_directory)

# ----------------------------------------------------------------------
## Step 4: Read Species and Observation Data
# ----------------------------------------------------------------------

# Reading in species and observation data
species_data <- read_csv("cleaned_invasive_species.csv")
observation_data <- read_csv("cleaned_Observation_VT.csv")

# ----------------------------------------------------------------------
## Step 5: Convert Data to Spatial Objects and Perform Spatial Join
# ----------------------------------------------------------------------

# Convert data to spatial objects while preserving original longitude and latitude
species_data_sf <- st_as_sf(species_data, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)
observation_data_sf <- st_as_sf(observation_data, coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)

# Spatial join of datasets with clear suffixes for overlapping columns
master_observation_list_sf <- st_join(observation_data_sf, species_data_sf, join = st_nearest_feature, left = TRUE, suffix = c(".obs", ".spec"))

# ----------------------------------------------------------------------
## Step 6: Rename Columns and Convert Spatial Object to Data Frame
# ----------------------------------------------------------------------

# After join, rename columns to clearly indicate their source
master_observation_list_sf <- master_observation_list_sf %>%
  rename(
    latitude_obs = latitude.obs,
    longitude_obs = longitude.obs,
    latitude_spec = latitude.spec,
    longitude_spec = longitude.spec
    # Add other renames as necessary
  )

# Extract longitude and latitude from the geometry column so they are retained
# as plain columns alongside the geometry (coordinates are computed once)
obs_coords <- st_coordinates(master_observation_list_sf)
master_observation_list_sf$longitude <- obs_coords[, 1]
master_observation_list_sf$latitude <- obs_coords[, 2]

# Convert spatial object to a data frame (after adding the coordinate columns) and check data
master_observation_list <- as.data.frame(master_observation_list_sf)

# ----------------------------------------------------------------------
## Step 7: Save Spatial Data as GeoPackage and Regular Data as CSV
# ----------------------------------------------------------------------

# Make sure the output directory exists before writing
dir.create("output", showWarnings = FALSE)

# Save the spatial data with geometry as a GeoPackage
write_sf(master_observation_list_sf, "output/Master_Observation_List_with_NA.gpkg", layer = "observations", driver = "GPKG")

# Drop the geometry column to get a regular data frame for the non-spatial CSV version
master_observation_list <- st_drop_geometry(master_observation_list_sf)

# Save as CSV without the geometry column
write_csv(master_observation_list, "output/Master_Observation_List_with_NA.csv")

# ----------------------------------------------------------------------
## Step 8: Loading the Geometry Master List
# ----------------------------------------------------------------------

# 8.1: Load Spatial Dataframe
master_observation_list_sf <- read_sf("output/Master_Observation_List_with_NA.gpkg")

# 8.2: Check for Missing Geometry Data
# st_is_empty() flags features whose geometry is empty; is.na() does not detect these
if (any(st_is_empty(master_observation_list_sf))) {
  stop("Geometry data missing in master_observation_list_sf")
}

# 8.3: Load Species Data
species_data <- read_csv("cleaned_invasive_species.csv")

# 8.4: Convert Species Data to Spatial Data Frame
species_data_sf <- st_as_sf(species_data, coords = c("longitude", "latitude"), crs = 4326)

# 8.5: Fetch Vermont map data (map_data() comes from ggplot2 and needs the 'maps' package, loaded in Step 2)
vermont_map <- map_data("state", region = "vermont")

# map_data() already returns a data frame, so no conversion is needed; check its structure
vermont_df <- vermont_map
print(head(vermont_df))

# 8.6: Convert Data Frame to Spatial Format (sf object)
vermont_sf <- st_as_sf(vermont_df, coords = c("long", "lat"), crs = 4326) %>%
  st_combine() %>%
  st_sfc() %>%
  st_cast("POLYGON") %>%
  st_cast("MULTIPOLYGON")

# 8.7: Create a Single sf Object Containing Vermont's Geometry
vermont_sf <- st_sf(geometry = vermont_sf)

Spatial Distribution of Invasive Species Across Vermont

Overview: The map presented below illustrates the spatial distribution of various invasive species across Vermont, clearly delineated by the state’s geographic boundaries.

Features:

  • Color-Coding: Each invasive species is marked with a distinct color, providing a rapid visual reference that aids in assessing biodiversity issues related to invasive species.
  • Geographic Clarity: The state boundaries are prominently displayed, ensuring that the spatial context of the data is immediately apparent.

Purpose: This visualization is designed to aid in the quick assessment of biodiversity issues by providing an easily interpretable graphical representation of data, which is crucial for ecological management and decision-making processes.

# 8.8: Plot Vermont's Boundary
vermont_plot <- ggplot(vermont_sf) +
  geom_sf(fill = "lightblue", color = "darkblue") +
  ggtitle("Boundary of Vermont")
print(vermont_plot)  # Display the boundary plot; without this the object is only stored

# 8.9: Visualize Distribution of Invasive Species
if ("invasive_name" %in% colnames(species_data)) {
  species_map <- ggplot(data = species_data_sf) +
    geom_sf(aes(color = invasive_name)) +
    geom_sf(data = vermont_sf, fill = NA, color = "black") +
    labs(title = "Distribution of Invasive Species",
         subtitle = "Spatial Distribution of Invasive Species Occurrences",
         x = "Longitude", y = "Latitude") +
    theme_minimal() +
    theme(legend.position = "right", 
          legend.justification = "top") # Adjusting legend position
  print(species_map)
} else {
  cat("Column 'invasive_name' does not exist in the dataset. Check column names with colnames(species_data_sf).\n")
}
Figure 2: Spatial distribution of invasive species in Vermont.


# Boundary of Vermont Plot: This is a straightforward representation of Vermont’s state boundary. It helps one understand the physical outline and geographical location of Vermont within a larger map or context.
# Distribution of Invasive Species Plot: This plot provides insight into where different invasive species have been identified within Vermont. 
# By differentiating species by color, it allows for a quick visual assessment of biodiversity concerns and can help in environmental management and conservation efforts. 
# This plot not only shows where invasive species are located but also indicates areas potentially at risk of ecological impacts due to these species.


# 8.10: Print Column Names from Master Observation List
print(colnames(master_observation_list_sf))



Figure 2: Spatial Analysis of Invasive Species Spread in Vermont

Overview: This chart visualizes the spatial distribution and density of invasive species across Vermont from 2010 to 2020. Data points on the map, differentiated by color, represent specific locations where various species categories have been observed. The data, sourced from iNaturalist Vermont, the Vermont Invasive Species Database, and the Vermont Open Geodata Portal, highlight areas with rising occurrences and potential hotspots.



Summary of Spatial Analysis Workflow

Introduction: This R script provides a comprehensive approach to preparing and analyzing geographical data, targeting the spread of invasive species in Vermont. Each step builds logically on the previous, ensuring a thorough analysis.

  • Setting up the Environment for Spatial Analysis:
    Configures the environment to recognize essential geographical data libraries such as GDAL and PROJ, crucial for enabling subsequent handling of spatial data.

  • Loading Necessary Libraries:
    Loads various R packages that support data manipulation, visualization, and spatial analysis, ensuring all necessary tools are available.

  • Defining the Base Path for Data Files:
    Establishes a base directory for the project’s data files, streamlining access and clarifying their locations within the system.

  • Reading Species and Observation Data:
    Imports key data on invasive species occurrences and observations from CSV files into R, forming the basis for detailed analysis.

  • Converting Data to Spatial Objects and Performing Spatial Join:
    Converts the imported data into spatial objects, enabling manipulation of their geographical properties. Performs a spatial join, merging data based on geographic proximity.

  • Renaming Columns and Converting Spatial Object to Data Frame:
    Refines column names in the joined data for clarity, and converts the spatial object into a data frame for easier manipulation in tabular form.

  • Saving Spatial Data as GeoPackage and Regular Data as CSV:
    Saves the processed spatial data in both GeoPackage and CSV formats to accommodate different analysis needs and software compatibility.

  • Loading the Geometry Master List:
    Loads the previously saved GeoPackage, checking for completeness and integrity of the spatial data, essential for accurate analysis.

  • Visualizing and Checking Data:
    Generates a visual representation of Vermont’s map and its boundaries to confirm the accuracy of the spatial data setup.

  • Conclusion:
    This structured approach facilitates analysis of invasive species distribution and aids in projecting their future spread, providing critical insights for ecological management strategies.
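
The proximity-based join summarized above can be sketched in base R. The script itself uses `st_join()` with `st_nearest_feature` on sf objects; the version below is only an illustration of the idea on plain coordinate columns, using a haversine distance on an assumed spherical Earth. The helper `haversine_km()` and both toy tables are hypothetical.

```r
# Great-circle distance in km between points given in decimal degrees
# (haversine formula, spherical Earth with an assumed radius of 6371 km).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy tables; coordinates are invented for illustration.
observations <- data.frame(
  obs_id = 1:3,
  longitude = c(-73.19, -72.58, -72.90),
  latitude  = c(44.59, 44.26, 43.61)
)
species <- data.frame(
  invasive_name = c("Emerald Ash Borer", "Hemlock Woolly Adelgid"),
  longitude = c(-73.21, -72.95),
  latitude  = c(44.48, 43.60)
)

# Distance matrix: one row per observation, one column per species record.
dist_km <- sapply(seq_len(nrow(species)), function(j) {
  haversine_km(observations$longitude, observations$latitude,
               species$longitude[j], species$latitude[j])
})

# Index of the closest species record for each observation.
nearest_idx <- apply(dist_km, 1, which.min)

# Left join: every observation is kept and gains its nearest species record.
joined <- data.frame(
  observations,
  invasive_name = species$invasive_name[nearest_idx],
  distance_km = dist_km[cbind(seq_len(nrow(observations)), nearest_idx)]
)
print(joined)
```

Keeping the matched distance next to each row makes it easy to spot joins where the nearest record is far away and the match is probably not meaningful.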



Exploratory Data Analysis (EDA)

From Data Acquisition to Exploratory Data Analysis (EDA)

Transition: Having successfully collected and prepared the dataset, the exploratory data analysis (EDA) phase begins, aiming to provide initial insights that can guide further analysis.

Purpose of EDA:

  • Understanding the Dataset: Crucial for revealing the structure, patterns, and potential relationships within the data.
  • Visualization Techniques: Focuses on summarizing the data and creating various visualizations, such as maps, to explore geographical and spatial distributions, identify trends, detect anomalies, and investigate areas requiring further exploration.

Impact of EDA:

  • Systematic Interpretation: Ensures systematic interpretation of the rich information contained in the dataset, enhancing understanding and decision-making processes.
  • Foundation for Ongoing Research: The insights gained are critical to ongoing research and analysis, particularly in understanding the spatial dynamics of invasive species.
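
A few tabular summaries are often useful before any maps are drawn. The sketch below assumes a table with the `invasive_name`, `longitude`, and `latitude` columns used in the script; the rows of `species_toy` are invented and only stand in for the real occurrence file.

```r
# Toy occurrence table; column names follow the script, rows are invented.
species_toy <- data.frame(
  invasive_name = c("Emerald Ash Borer", "Emerald Ash Borer",
                    "Hemlock Woolly Adelgid", "Japanese Knotweed", NA),
  longitude = c(-73.21, -73.05, -72.95, -72.60, -72.80),
  latitude  = c(44.48, 44.20, 43.60, NA, 44.00)
)

# Records per species, most frequent first (missing names counted explicitly).
species_counts <- sort(table(species_toy$invasive_name, useNA = "ifany"),
                       decreasing = TRUE)

# Missing values per column: records without coordinates cannot be mapped.
missing_per_column <- colSums(is.na(species_toy))

# Geographic extent (min and max) of the records with complete coordinates.
complete_rows <- complete.cases(species_toy[, c("longitude", "latitude")])
extent <- sapply(species_toy[complete_rows, c("longitude", "latitude")], range)

print(species_counts)
print(missing_per_column)
print(extent)
```

An extent that falls outside Vermont, or a large share of missing coordinates, would be a signal to revisit data cleaning before clustering or modeling.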



Filter species data for specific invasive species and plot their distribution within Vermont

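# ---------------------------------------------------------------------------------------------------------
# Step 9: Filtering Species Data and Visualizing Geographic Features
# ---------------------------------------------------------------------------------------------------------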

# Filter species data for specific invasive species and plot their distribution within Vermont
target_species <- c("Emerald Ash Borer", "Hemlock Woolly Adelgid")
filtered_species_data_sf <- species_data_sf %>%
  filter(invasive_name %in% target_species)

if (nrow(filtered_species_data_sf) > 0) {
  species_map_filtered <- ggplot(data = filtered_species_data_sf) +
    geom_sf(aes(color = invasive_name)) +
    geom_sf(data = vermont_sf, fill = NA, color = "black") +
    labs(title = "Target Invasive Species Distribution",
         subtitle = "Emerald Ash Borer and Hemlock Woolly Adelgid",
         x = "Longitude", y = "Latitude") +
    scale_color_manual(values = c("Emerald Ash Borer" = "darkgreen", "Hemlock Woolly Adelgid" = "darkmagenta")) +
    theme_minimal()
  print(species_map_filtered)
} else {
  cat("No data found for the specified invasive species.\n")
}
Figure 3: Filtered distribution of specific invasive species in Vermont.




Figure 3: Geographical Distribution of Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont

Overview: This map visualizes the geographical spread of two invasive species, the Emerald Ash Borer and Hemlock Woolly Adelgid, across Vermont. Observation locations are marked with distinguishing colors to underscore areas currently impacted or at risk.

Utility:

  • Ecological Management: Facilitates the identification of hotspots for targeted intervention.
  • Stakeholder Engagement: Serves as a clear, accessible representation of data for stakeholders involved in Vermont’s environmental health.

Design Effectiveness:

  • Communication: Ensures that essential information is communicated effectively, promoting understanding and engagement in the state’s conservation strategies.



What This Figure Represents:

Description: This visualization depicts the geographical spread of the Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont.

Details:

  • Observation Marking: The map distinctly marks locations where these species have been observed, emphasizing regions currently impacted or vulnerable to future invasions.

Why This Is Important:

Significance: The map’s utility lies in its ability to aid environmental conservation and resource management.

Impact:

  • Forest Health and Biodiversity: Both invasive species pose significant threats to forest health and biodiversity.
  • Proactive Measures: By charting their presence, this tool supports proactive measures in controlling the infestation and provides valuable insights for ongoing ecological monitoring efforts.



Distribution of Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont

Focus: Advanced Geographic Filtering and Visualization

Objective: Implement advanced geographic filtering to refine visualization and focus on critical areas more effectively.

Outcome:

  • Enhanced Clarity: Improves the visualization of impacted regions, allowing for more precise management actions and resource allocation.
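
Geographic filtering around a point in longitude and latitude requires translating a ground distance into degree offsets, and the two axes scale differently. The sketch below shows that conversion under the assumption of a spherical Earth with a 6371 km radius; the helper names `km_per_deg_lon()` and `km_to_degrees()` are hypothetical, and the 0.5 km half-width is just an example value. The latitude and the two degree offsets in the reverse check are the ones used in the script for Niquette Bay State Park.

```r
# Approximate length of one degree, assuming a spherical Earth of radius 6371 km.
km_per_deg_lat <- 2 * pi * 6371 / 360            # about 111.2 km at any latitude
km_per_deg_lon <- function(lat_deg) {
  km_per_deg_lat * cos(lat_deg * pi / 180)       # shrinks toward the poles
}

# Convert a half-width in km to degree offsets at a given latitude.
km_to_degrees <- function(half_width_km, lat_deg) {
  c(lat = half_width_km / km_per_deg_lat,
    lon = half_width_km / km_per_deg_lon(lat_deg))
}

park_lat <- 44.5919  # park latitude used in the script

# Degree offsets for an example half-width of 0.5 km.
offsets_half_km <- km_to_degrees(0.5, park_lat)

# Reverse check: ground distance covered by offsets of 0.0078 and 0.0109 degrees.
script_offsets_km <- c(lat = 0.0078 * km_per_deg_lat,
                       lon = 0.0109 * km_per_deg_lon(park_lat))

print(round(offsets_half_km, 4))
print(round(script_offsets_km, 2))
```

Under these assumptions, offsets of 0.0078 and 0.0109 degrees each span a little under 0.9 km at this latitude. For more exact buffers, a common alternative is to project the data to a metric CRS and buffer in metres.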



# ---------------------------------------------------------------------------------------------------------
# Step 10: Advanced Geographic Filtering and Visualization
# ---------------------------------------------------------------------------------------------------------

# Print new column names to confirm
print(colnames(species_data))

# Define target species for detailed analysis
target_species <- c("Emerald Ash Borer", "Hemlock Woolly Adelgid")

# Filter the dataset to include only the specified invasive species
filtered_species_data_sf <- species_data_sf %>%
  filter(invasive_name %in% target_species)

# Check if any records are found for the specified species and plot their spatial distribution
if (nrow(filtered_species_data_sf) > 0) {
  species_map_filtered <- ggplot(data = filtered_species_data_sf) +
    geom_sf(aes(color = invasive_name)) +  # Ensure this matches the actual column name
    geom_sf(data = vermont_sf, fill = NA, color = "black") +  # Overlay Vermont state boundary for reference
    labs(title = "Target Invasive Species Distribution",
         subtitle = "Emerald Ash Borer and Hemlock Woolly Adelgid",
         x = "Longitude", y = "Latitude") +
    scale_color_manual(values = c("Emerald Ash Borer" = "darkgreen", "Hemlock Woolly Adelgid" = "darkmagenta")) +
    theme_minimal()
  
  print(species_map_filtered)
} else {
  cat("No data found for the specified invasive species.\n")
}
Figure 4: Distribution of Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont.


# Specific location - Niquette Bay State Park

# Define the coordinates for Niquette Bay State Park
park_lat <- 44.5919
park_lon <- -73.1900

# Define degree offsets around the park coordinates to build a rectangular approximation of the park's area
buffer_deg_lat <- 0.0078  # Latitude offset, roughly 0.87 km on the ground
buffer_deg_lon <- 0.0109  # Longitude offset, roughly 0.86 km at this latitude

# Create a matrix defining the park's boundary polygon based on the buffered coordinates
park_polygon_coords <- matrix(
  c(park_lon - buffer_deg_lon, park_lat - buffer_deg_lat,
    park_lon + buffer_deg_lon, park_lat - buffer_deg_lat,
    park_lon + buffer_deg_lon, park_lat + buffer_deg_lat,
    park_lon - buffer_deg_lon, park_lat + buffer_deg_lat,
    park_lon - buffer_deg_lon, park_lat - buffer_deg_lat),  # Close the polygon by repeating the first point
  ncol = 2, byrow = TRUE)

# Convert the polygon matrix into a spatial object (sf) for use in geographic operations
study_area_sf <- st_sf(geometry = st_sfc(st_polygon(list(park_polygon_coords)), crs = st_crs(4326)))

# Filter the species dataset to only include observations within the park's boundary
filtered_species_in_park <- st_intersection(species_data_sf, study_area_sf)

# Plot the filtered dataset 
if (nrow(filtered_species_in_park) > 0) {
  park_species_map <- ggplot(data = filtered_species_in_park) +
    geom_sf(aes(color = invasive_name)) +  # Same species-name column as in the earlier steps
    labs(title = "Invasive Species within Niquette Bay State Park",
         subtitle = "Filtered to park boundaries",
         x = "Longitude", y = "Latitude") +
    theme_minimal()
  
  print(park_species_map)
} else {
  message("No invasive species data found within the park boundaries.")
}


# Advanced Geographic Filtering and Visualization
# Summary: Here, the code delves deeper into Niquette Bay State Park, filtering invasive species data specifically for this location. 
# By focusing on the park's boundaries, it offers insights crucial for park management and conservation strategies. 
# The plotted distribution of invasive species within the park aids in understanding local ecological impacts and informs conservation decisions.

# While both steps involve filtering species data and visualizing their distribution, 
# Step 10 provides a more detailed analysis by focusing on a specific location, Niquette Bay State Park. Unlike Step 9, which maps the distribution of the selected invasive species across Vermont as a whole, 
# Step 10 zooms in on the park's boundaries for a more localized perspective. 
# This allows for a targeted assessment of invasive species presence within the park, offering insights tailored to park management and conservation efforts.



Figure 4: Invasive Species Observations within Niquette Bay State Park

Overview: This detailed map highlights the filtered observations of invasive species specifically within the boundaries of Niquette Bay State Park. The visualization focuses on the distribution of the Emerald Ash Borer and Hemlock Woolly Adelgid.

Features:

  • Color Coding: Each observation is color-coded to distinguish between the species, facilitating immediate visual identification of the areas most affected.
  • Targeted Assessment: Provides a refined view that is critical for local ecological assessment and park management.

Utility:

  • Conservation Strategy Development: Instrumental for park authorities in developing precise conservation strategies.
  • Priority Setting: Helps prioritize efforts to control infestations, thereby protecting the park’s natural biodiversity and forest health.



Comparison with Previous Step (Step 9)

Objective: Delineate the distinctions between the previous step and the current mapping efforts.

Key Differences:

  • Focus of Analysis: While both steps involve filtering species data and visualizing their distribution, Step 10 offers a more detailed analysis by concentrating specifically on Niquette Bay State Park.
  • Scope of Visualization: Unlike Step 9, which mapped the distribution of selected invasive species across Vermont as a whole, Step 10 zooms in on the park’s boundaries for a more localized perspective.

Implications:

  • Targeted Assessment: Allows for a targeted assessment of invasive species presence within the park, offering insights tailored to park management and conservation efforts.



Mapping Invasive Species Observations within the Park Area

Park Area Definition and Spatial Analysis Setup:

Objective: Define the park area using spatial data tools and set up the analysis for mapping invasive species observations within Niquette Bay State Park.

Approach:

  • Spatial Definition: Creates a spatial feature object to define the park area with precise boundaries.
  • Data Filtering: Filters invasive species observations to those specifically within the park, enhancing the relevance of the data for local management decisions.
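
Because the park area in the script is a rectangle in longitude and latitude, the filtering step can be illustrated without any spatial library. The sketch below is a base R stand-in for the `st_intersection()` call in the script, not a replacement for it; `filter_to_bbox()` and the rows of `toy_points` are hypothetical, while the park centre and degree offsets are taken from the script.

```r
# Keep the rows of a point table that fall inside an axis-aligned box
# (boundaries inclusive; rows with missing coordinates are dropped).
filter_to_bbox <- function(points, lon_min, lon_max, lat_min, lat_max) {
  inside <- points$longitude >= lon_min & points$longitude <= lon_max &
    points$latitude >= lat_min & points$latitude <= lat_max
  points[inside & !is.na(inside), , drop = FALSE]
}

# Park centre and degree offsets as used in the script.
park_lon <- -73.1900
park_lat <- 44.5919
buffer_deg_lon <- 0.0109
buffer_deg_lat <- 0.0078

# Toy observations; rows are invented for illustration.
toy_points <- data.frame(
  invasive_name = c("Emerald Ash Borer", "Hemlock Woolly Adelgid",
                    "Emerald Ash Borer"),
  longitude = c(-73.1950, -72.9500, -73.2100),
  latitude  = c(44.5900, 43.6000, 44.5919)
)

in_park <- filter_to_bbox(
  toy_points,
  lon_min = park_lon - buffer_deg_lon, lon_max = park_lon + buffer_deg_lon,
  lat_min = park_lat - buffer_deg_lat, lat_max = park_lat + buffer_deg_lat
)
print(in_park)
```

This shortcut only works for rectangular areas in the same coordinate system as the points. An actual park boundary polygon would need a true point-in-polygon test such as the sf operations used in the script.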



# Description:
# This section defines the park area as a spatial object using specified boundaries and a coordinate reference system (CRS).
# It transforms the CRS of the species data to match that of Vermont and the park boundary for accurate spatial analysis.
# Finally, it creates a ggplot map displaying the Vermont state boundary, the park area, and species observations for context.

# Define the park area as an sf object with specified boundaries and coordinate reference system
park_area_sf <- st_sf(geometry = st_sfc(st_polygon(list(park_polygon_coords)), crs = st_crs(4326)))

# Transform the CRS of the species data to match that of Vermont and park boundary for accurate spatial analysis
species_data_sf <- st_transform(species_data_sf, st_crs(vermont_sf))
park_area_sf <- st_transform(park_area_sf, st_crs(vermont_sf))

# Create a ggplot map displaying the Vermont state boundary, the park area, and species observations for context
species_v2_map <- ggplot() +
  geom_sf(data = vermont_sf, fill = NA, color = "black", linewidth = 0.5) +  # 'linewidth' sets outline width (ggplot2 >= 3.4.0)
  geom_sf(data = park_area_sf, fill = NA, color = "red", linewidth = 0.5) +
  geom_sf(data = species_data_sf, color = "blue", size = 1.5, alpha = 0.6) +
  labs(title = "Invasive Species Observations in Niquette Bay State Park",
       subtitle = "A Geographic Overlay with Vermont State Boundary",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(legend.position = "bottom")

# Display the map to show results
print(species_v2_map)
Figure 5: Distribution of Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont.


# What It Shows: This map indicates where invasive species have been found across Vermont. It is designed to particularly highlight the presence or absence of these species within the boundaries of Niquette Bay State Park.
# Key Takeaway: Despite the broader issue of invasive species across the state, as indicated by the numerous blue dots, Niquette Bay State Park appears to be free from these observations, 
# which is implied by the absence of blue dots within the park's red boundary. This could suggest that conservation efforts within the park are effective, or that further monitoring is needed to confirm these findings.
# Why This Matters: Invasive species can significantly impact native ecosystems and biodiversity. This map is a tool for ecologists, conservationists, and park managers to understand where to focus their efforts to control and monitor invasive species.



Figure 5: Invasive Species Observations in Niquette Bay State Park

Overview: This map overlays invasive species observations within the state of Vermont, focusing specifically on Niquette Bay State Park. The state boundary is outlined for context, and observations of the Emerald Ash Borer and Hemlock Woolly Adelgid are marked.

Insights: - Local vs. State Challenges: Offers a visual representation of the ecological challenges at both the state and local park level. - Conservation Effectiveness: The absence of marked observations within the red boundary of the park may suggest effective conservation efforts or the need for further ecological monitoring.

Utility: - Tool for Action: This visualization is critical for directing conservation and control efforts where they are most needed, aiming to safeguard Vermont’s native ecosystems and biodiversity.



Key Takeaway: - Despite the widespread presence of invasive species across the state, indicated by numerous blue dots, Niquette Bay State Park appears free from these observations, as evidenced by the absence of blue dots within the park’s red boundary. This may reflect effective conservation efforts or simply a lack of survey records inside the park, so continued monitoring is recommended to confirm the finding.

Why This Matters: - Invasive species significantly impact native ecosystems and biodiversity. This map serves as a valuable tool for ecologists, conservationists, and park managers, helping them focus their efforts where they are most needed to control and monitor invasive species.



Filtering and Mapping Invasive Species Observations within the Park Area

Park Area Definition and Spatial Filtering:

Objective: - This section creates a new sf (simple features) object to define the park area with updated boundaries.

Process: - Spatial Intersection: Performs a spatial intersection to filter observations within the park. - Visualization: Checks and plots the filtered observations if available, showing whether any records fall inside the park boundary.
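
The filtering idea in this process can be illustrated with a small, self-contained sketch. The rectangle and the three observations below are hypothetical stand-ins (they are not the real park boundary or real records), and `st_filter()` is used here only as one way to keep points that intersect a polygon; the report's own code uses `st_intersection()`.

```r
library(sf)

# Hypothetical rectangle standing in for the park polygon (NOT the real boundary).
# The first and last coordinate pairs are identical so that the ring is closed.
toy_coords <- matrix(c(-73.20, 44.57,
                       -73.17, 44.57,
                       -73.17, 44.60,
                       -73.20, 44.60,
                       -73.20, 44.57), ncol = 2, byrow = TRUE)
toy_park_sf <- st_sf(geometry = st_sfc(st_polygon(list(toy_coords)), crs = 4326))

# Hypothetical observations: one inside the rectangle, two outside it
toy_obs <- data.frame(
  invasive_name = c("Species A", "Species B", "Species C"),
  lon = c(-73.185, -72.50, -73.30),
  lat = c(44.585, 44.00, 44.58),
  stringsAsFactors = FALSE
)
toy_obs_sf <- st_as_sf(toy_obs, coords = c("lon", "lat"), crs = 4326)

# Both layers must share a CRS before a spatial predicate is evaluated
stopifnot(st_crs(toy_obs_sf) == st_crs(toy_park_sf))

# Keep only the points that intersect the polygon
toy_in_park <- st_filter(toy_obs_sf, toy_park_sf)
print(nrow(toy_in_park))
```

Checking the CRS explicitly before filtering is a cheap guard: a mismatch would otherwise surface as an error or, if a CRS was mislabeled, as a silently empty result.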



# ---------------------------------------------------------------------------------------------------------
# Step 12: Filtering and Mapping Invasive Species Observations within the Park Area
# ---------------------------------------------------------------------------------------------------------

# Description:
# This section creates a new sf object to define the park area with updated boundaries.
# It performs spatial intersection to filter observations within the park.
# Finally, it checks and plots the filtered observations if available.

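# Create a new sf object to define the park area with updated boundaries.
# The polygon coordinates are longitude/latitude, so declare them as EPSG:4326 first
# and then transform them to the CRS of the filtered species data.
park_area_sf <- st_sf(geometry = st_sfc(st_polygon(list(park_polygon_coords)), crs = st_crs(4326)))
park_area_sf <- st_transform(park_area_sf, st_crs(filtered_species_data_sf))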

# Perform spatial intersection to filter observations within the park
filtered_species_in_park <- st_intersection(filtered_species_data_sf, park_area_sf)

# Check and plot filtered observations if available
if (nrow(filtered_species_in_park) > 0) {
  park_species_map <- ggplot(data = filtered_species_in_park) +
    geom_sf(aes(color = invasive_name)) +
    labs(title = "Invasive Species within Niquette Bay State Park",
         subtitle = "Filtered to park boundaries",
         x = "Longitude", y = "Latitude") +
    theme_minimal()
  
  print(park_species_map)
} else {
  message("No invasive species data found within the park boundaries.")
}

# Understanding the Results: The spatial analysis aimed to identify and map invasive species specifically within Niquette Bay State Park. The code was set up to visualize this information.
# Key Message: The analysis concludes that no invasive species have been detected within the park boundaries according to the available data.
# Significance: The absence of invasive species within the park could indicate effective conservation efforts. However, it's also a reminder that monitoring and data collection are essential for accurate ecological assessments.
# Next Steps: The result calls for ongoing ecological monitoring to ensure that invasive species have not been overlooked and to maintain the health of the park's ecosystem.



Analysis Summary for Filtering and Mapping Invasive Species Observations within the Park Area

Overview: The spatial analysis was conducted to identify and map the presence of invasive species specifically within Niquette Bay State Park.

Key Findings: - Result: The analysis concludes that no invasive species have been detected within the park boundaries according to the available data. - Significance: This outcome may reflect effective conservation efforts. Nonetheless, it underscores the necessity of continued monitoring and comprehensive data collection to maintain accurate ecological assessments.

Next Steps: - Ongoing Monitoring: Calls for persistent ecological monitoring to verify that invasive species have not been overlooked and to sustain the health of the park’s ecosystem.



Analyzing Invasive Species Distribution within Park Boundaries

Objective: - This section aims to identify and analyze the presence of invasive species within the specified boundaries of Niquette Bay State Park, focusing on assessing their density and distribution.

Approach: - Visualization: Includes visualizing invasive species data within the park boundary to provide clear management insights. - Expansion of Analysis: If no observations of invasive species are found within the park, the analysis is extended to encompass the entire state of Vermont, ensuring a comprehensive assessment of the ecological threat posed by invasive species.

Management Implications: - Data-Driven Decisions: The findings aid in making informed decisions about the conservation and management strategies necessary to protect the park and potentially similar ecosystems throughout Vermont.



# ---------------------------------------------------------------------------------------------------------
# Step 13: Analyzing Invasive Species Distribution within Park Boundaries
# ---------------------------------------------------------------------------------------------------------

# Description:
# This section identifies and analyzes invasive species within the specified park boundaries.
# It assesses the density and distribution of invasive species within the park.
# Additionally, it visualizes invasive species data within the park boundary and provides management insights.
# If no observations of invasive species are found within the park, it expands the analysis to the entire state of Vermont.

# Identify and analyze invasive species within the specified park boundaries.
species_in_park <- st_intersection(species_data_sf, park_area_sf)

# Assess the density and distribution of invasive species within park boundaries.
if (nrow(species_in_park) > 0) {
  # Extract longitude and latitude from the geometry column for plotting.
  species_in_park$longitude <- st_coordinates(species_in_park)[, 1]
  species_in_park$latitude <- st_coordinates(species_in_park)[, 2]
  
  # Visualize the density and specific locations of invasive species using a 2D density plot combined with point data.
  ggplot(data = species_in_park) +
    geom_density_2d(aes(x = longitude, y = latitude, color = invasive_name)) +
    geom_point(aes(x = longitude, y = latitude, color = invasive_name), size = 1, alpha = 0.6) +
    labs(title = "Density and Spread of Invasive Species within Niquette Bay State Park") +
    theme_minimal()
} else {
  # Display a message and an empty plot if no invasive species are found within the park.
  message_text <- "No invasive species found within Niquette Bay State Park."
  ggplot() +
    annotate("text", x = 0.5, y = 0.5, label = message_text, size = 6, color = "red", hjust = 0.5, vjust = 0.5) +
    labs(title = "No Invasive Species Found") +
    theme_void()
}
Figure 6: Density and Spread of Invasive Species within Niquette Bay State Park


# Final Confirmation: This step acts as a final confirmation that there are indeed no invasive species detected within the boundaries of Niquette Bay State Park based on the data provided.
# Visualization of Absence: Rather than leaving the result implicit, the code explicitly visualizes the absence of data through an annotated message within a plot. 
# This reinforces the finding visually, which can be particularly impactful for stakeholders.
# Implications for Park Management: The repeated absence of invasive species findings is a positive sign for the park's ecosystem but also suggests that ongoing monitoring is essential. 
# It indicates that either the park's environmental management practices are effective or that there might be gaps in data collection.
# Cautious Interpretation: Absence of evidence is not necessarily evidence of absence. 
# It’s crucial to maintain vigilance and continue regular monitoring to ensure the park remains free of invasive species and to verify that the current results are not due to insufficient data.



Figure 6: Analysis of Invasive Species Distribution within Niquette Bay State Park

Overview: This figure presents the investigation into the presence of invasive species within Niquette Bay State Park. When observations fall within the park, a density plot shows their distribution; because none were found in this dataset, the figure instead displays the message "No Invasive Species Found".

Key Points: - Visualization Purpose: Should no observations be present, the visualization communicates this explicitly rather than leaving the result implicit. - Management Tool: The data acts as a critical resource for environmental management, indicating either the success of protective measures or the need for enhanced monitoring.



Final Confirmation: Absence of Invasive Species

Cautious Interpretation: - Important Reminder: The absence of evidence is not necessarily evidence of absence. - Recommended Action: It’s crucial to maintain vigilance and continue regular monitoring to ensure the park remains free of invasive species and to verify that the current results are not due to insufficient data collection.



# Management Efforts: Overlay the park boundary and locations of invasive species for management planning.
ggplot() +
  geom_sf(data = species_in_park, aes(color = invasive_name)) +
  geom_sf(data = park_area_sf, fill = NA, color = "black", linewidth = 0.5) +
  labs(title = "Invasive Species within Niquette Bay State Park",
       subtitle = "Mapped with park boundary overlay",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(legend.position = "bottom")
Figure 7: Invasive Species Management Planning within Niquette Bay State Park


# Intended Data Check: This step was conducted as a precautionary measure to ensure that no invasive species were overlooked in previous analyses within the boundaries of Niquette Bay State Park.
# Visualization Outcome: The plot does not display any invasive species data, which indicates that after a thorough check, no invasive species have been recorded within the park according to the dataset being used.
# Why This Step Is Important: Performing such checks is crucial in environmental management. It helps confirm the effectiveness of existing conservation measures or highlights areas that may require additional surveying and protection efforts.
# Management Implications: The absence of data points on the map indicates that, based on the current information, no invasive species have been recorded within the park's bounds. 
# This information can be used for future park management and conservation planning to maintain the integrity of the park's ecosystem.
# In essence, the map suggests that the park is currently in a good state concerning invasive species, but it also underlines the necessity for ongoing vigilance in environmental monitoring and data collection.



Figure 7: Invasive Species Management Planning within Niquette Bay State Park

Overview: This map serves as an essential instrument for conservation planning, depicting the spatial relation of invasive species observations within the boundaries of Niquette Bay State Park. Notably, the map shows an absence of detected invasive species, potentially reflecting the effectiveness of current conservation efforts.

Key Points: - Visualization: Provides a clear visual representation of an area with no recorded invasive species observations. - Conservation Success: The lack of recorded invasive species may indicate successful conservation practices within the park.



Data Check for Invasive Species, Figure 7 above

Objective: Conduct a precautionary check to ensure no invasive species were overlooked in Niquette Bay State Park.

Importance: - Environmental Management: Crucial for confirming the effectiveness of existing conservation measures and identifying any areas that may require further surveying and protection efforts. - Outcome: Indicates that, based on current data, no invasive species have been recorded in the park, supporting ongoing management and conservation planning.

Conclusion: - Current Status: Suggests the park is in a good state concerning invasive species management. - Ongoing Vigilance: Underlines the necessity for continuous environmental monitoring and data collection to maintain the integrity of the park’s ecosystem.



Analyzing Invasive Species Distribution in Vermont

Expansion to Statewide Analysis:

Objective: Broaden the scope of the analysis to encompass the entire state of Vermont, examining the distribution of all invasive species using spatial data.

Purpose: - Statewide Overview: Checks and visualizes the distribution of invasive species across Vermont, offering insights into broader ecological challenges. - Data Utilization: Employs spatial data tools to provide a comprehensive overview of invasive species presence and impact statewide.



# Because no observations of invasive species were found within the park, expand the analysis to the entire state of Vermont.
species_in_vermont <- st_intersection(species_data_sf, vermont_sf)

# Check and print invasive species data for Vermont, handling cases where no data is found.
if (nrow(species_in_vermont) > 0) {
  print("Invasive species data within Vermont:")
  print(species_in_vermont)
} else {
  message("No invasive species data found within Vermont boundaries.")
}

# Visualize the distribution of all invasive species in Vermont using spatial data.
species_map_v3 <- ggplot(species_in_vermont) +
  geom_sf(aes(color = invasive_name)) +  # Use color to differentiate species
  geom_sf(data = vermont_sf, fill = NA, color = "black") +  # Outline Vermont state boundary
  labs(title = "Distribution of Invasive Species in Vermont",
       subtitle = "Spatial distribution of invasive species occurrences",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(legend.position = "left")
print(species_map_v3)
Figure 8: Distribution of Invasive Species in Vermont




Figure 8: Distribution of Invasive Species in Vermont

Overview: This map showcases the localized occurrences of invasive species within the boundaries of Vermont, emphasizing specific areas of detection.

Purpose: - Highlight Critical Areas: Identifies locations that may require immediate attention and intervention due to the presence of invasive species. - Visualization Aid: Assists in understanding the distribution patterns within the state, crucial for targeted ecological management and effective resource allocation.



# Display a map of all recorded invasive species observations across Vermont for comprehensive insight.
all_species_vermont_map_v4 <- ggplot(data = species_data_sf) +
  geom_sf(aes(color = invasive_name)) +
  geom_sf(data = vermont_sf, fill = NA, color = "black") +
  labs(title = "Distribution of Invasive Species Across Vermont",
       subtitle = "All Recorded Species Observations",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  theme(legend.position = "left")
print(all_species_vermont_map_v4)
Figure 9: Analyzing Invasive Species Distribution in Vermont


# The two sections of code within Step 14 differ as follows:

# Checking and Visualizing Invasive Species Data within Vermont Boundaries:
# This section intersects the species data with the boundaries of Vermont to check for the presence of invasive species within the state.
# If invasive species data is found within Vermont, it prints the data.
# It then visualizes the distribution of the invasive species records that fall within Vermont's boundaries.
# Visualizing the Distribution of All Recorded Invasive Species Observations Across Vermont:
# This section plots every record in species_data_sf directly, without first intersecting it with the Vermont boundary.
# Any record that lies outside the state outline therefore still appears on this map, giving a view of the full dataset.


# Contextual Expansion: After confirming that Niquette Bay State Park did not show invasive species within its boundaries, the analysis was expanded to see if this was also the case across Vermont. 
# The results confirm the presence of invasive species elsewhere in the state.
# Comprehensive View: The visualization of this expanded dataset will illustrate the extent of the invasive species issue across Vermont, providing a visual and data-driven narrative of the ecological challenges the state faces.
# Implications for Policy and Management: Understanding the distribution of invasive species at the state level allows policymakers and environmental managers to allocate resources effectively, prioritize areas for treatment, and track the success of conservation efforts.
# Importance of Repeated Analysis: Reiterating the analysis at different scales (statewide vs. park-level) is crucial for accuracy in environmental monitoring. It ensures that broader patterns are not missed when focusing on smaller, specific areas like Niquette Bay State Park.



Figure 9: Spatial Distribution of All Recorded Invasive Species Observations Across Vermont

Overview: This comprehensive map presents all recorded observations of invasive species across Vermont, providing a holistic view of the ecological challenges posed by these species throughout the state.

Purpose: - State-wide Assessment: Essential for informing policymakers and environmental managers about the widespread nature of these threats. - Strategic Planning: Aids in strategic planning for conservation efforts, offering a broad perspective crucial for effective ecological management.



Summary of Analysis of Invasive Species in Vermont

Checking and Visualizing Invasive Species Data within Vermont Boundaries

Objective: Detect and visualize the presence of invasive species confined within Vermont’s geographic boundaries using spatial data tools.

Process: - Spatial Intersection: Involves the intersection of species data with Vermont’s geographic boundaries. - Visualization: Detected occurrences are displayed to assess the distribution of invasive species within the state.

Visualizing the Distribution of All Recorded Invasive Species Observations Across Vermont

Comparison: - Unrestricted Data: Unlike previous analyses, this segment visualizes all recorded observations without restricting the dataset to state boundaries. - Broad Overview: Provides an extensive look at the ecological challenges Vermont faces, underscoring the importance of comprehensive assessments.
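
One way to quantify the difference between the boundary-restricted map and the unrestricted one is to flag each record as inside or outside the boundary and count both groups. The sketch below uses a hypothetical rectangle and four made-up records (not Vermont's real outline or real observations); the names `toy_state_sf`, `toy_records_sf`, and `inside_boundary` are illustrative assumptions.

```r
library(sf)

# Hypothetical rectangle standing in for a state boundary (NOT Vermont's real outline)
toy_state_coords <- matrix(c(-73.5, 42.7,
                             -71.5, 42.7,
                             -71.5, 45.0,
                             -73.5, 45.0,
                             -73.5, 42.7), ncol = 2, byrow = TRUE)
toy_state_sf <- st_sf(geometry = st_sfc(st_polygon(list(toy_state_coords)), crs = 4326))

# Hypothetical records: two inside the rectangle, two outside it
toy_records <- data.frame(
  invasive_name = c("Species A", "Species A", "Species B", "Species B"),
  lon = c(-72.5, -72.9, -74.2, -70.9),
  lat = c(44.0, 43.5, 44.1, 43.9),
  stringsAsFactors = FALSE
)
toy_records_sf <- st_as_sf(toy_records, coords = c("lon", "lat"), crs = 4326)

# TRUE where a record intersects the boundary polygon, FALSE otherwise
toy_records_sf$inside_boundary <-
  lengths(st_intersects(toy_records_sf, toy_state_sf)) > 0

n_inside <- sum(toy_records_sf$inside_boundary)
n_outside <- sum(!toy_records_sf$inside_boundary)
print(c(inside = n_inside, outside = n_outside))
```

If the outside count is zero for the real data, the two maps show the same records and differ only in how they were produced.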

Contextual Expansion of Analysis Beyond Niquette Bay State Park

Initial Findings: - No Invasive Species Detected: Initial analysis at Niquette Bay State Park showed no invasive species presence. - State-wide Expansion: The scope was broadened to include the entire state following initial findings, confirming the presence of invasive species in various statewide locations.

Implications for Policy and Management

Strategic Importance: - Resource Allocation: Crucial for efficient resource allocation and prioritization of treatment areas. - Policy Making: Supports the evaluation of conservation effort effectiveness, guiding policy adjustments.

Importance of Repeated Analysis

Methodology: - Repeated Analyses: Essential for ensuring precise environmental monitoring by performing analyses at various scales, from specific locales to statewide assessments. - Ecological Patterns: Helps identify and address larger ecological patterns, preventing oversight that can occur by focusing solely on localized areas.



Filtering and Analyzing Target Invasive Species Data Across Vermont

# ----------------------------------------------------------------------------------
# Step 16: Filtering and Analyzing Target Invasive Species Data Across Vermont
# ----------------------------------------------------------------------------------

# Description:
# This step filters the species data to include only the target invasive species across Vermont.
# It focuses on the distribution of the Emerald Ash Borer and Hemlock Woolly Adelgid observations.
# The filtered data is visualized to analyze the spatial distribution of these target invasive species across Vermont.

# Filter and analyze target invasive species data across Vermont.
target_species <- c("Emerald Ash Borer", "Hemlock Woolly Adelgid")
filtered_species_vermont <- species_data_sf %>%
  filter(invasive_name %in% target_species)

# Visualize the distribution of targeted invasive species across Vermont.
filtered_species_vermont_map <- ggplot(data = filtered_species_vermont) +
  geom_sf(aes(color = invasive_name)) +
  geom_sf(data = vermont_sf, fill = NA, color = "black") +
  labs(title = "Distribution of Target Invasive Species Across Vermont",
       subtitle = "Emerald Ash Borer and Hemlock Woolly Adelgid Observations",
       x = "Longitude", y = "Latitude") +
  scale_color_manual(values = c("Emerald Ash Borer" = "darkgreen",
                                "Hemlock Woolly Adelgid" = "darkmagenta")) +
  theme_minimal()

print(filtered_species_vermont_map)
Figure 10: Filtering and Analyzing Invasive Species Data Across Vermont


# Narrowed Focus: After a broader examination of all invasive species in Vermont, this analysis narrows the focus to two critical species to understand their specific distribution patterns.
# Color-Coded Clarity: The map's use of distinct colors allows anyone, regardless of their expertise, to see where each of these species has been observed. This clarity is essential for communicating complex data simply.
# Environmental Insight: The visual distribution offers immediate insights into which areas may be more affected or at risk. This information is vital for planning targeted responses to mitigate the impact of these invasive species.
# Management Strategy Development: The map aids in identifying 'hotspots' of activity for each invasive species, allowing for more efficient allocation of resources for management and control efforts.
# In summary, the data tells us where we need to pay attention and potentially direct our conservation efforts to address the specific threats posed by the Emerald Ash Borer and Hemlock Woolly Adelgid in Vermont.



Figure 10: Impact of Invasive Species Distribution in Vermont

Objective: This figure illustrates the distribution of two significant invasive species in Vermont: the Emerald Ash Borer and the Hemlock Woolly Adelgid, highlighting critical hotspots where these species pose substantial threats to biodiversity and forest health.

Action Required: - Immediate Management: Targeted management actions are necessary to effectively address the challenges posed by these hotspots.

Discussion on Figure 10: Strategic Implications

Strategic Allocation: - Resource Optimization: The map aids in strategically allocating conservation resources by identifying regions requiring urgent intervention. - Effort Focusing: Focusing efforts on these high-risk areas enables environmental managers to optimize strategies for controlling the spread of invasive species, enhancing the effectiveness of conservation efforts.

Discussion on Figure 10: Future Considerations

Adaptive Management: - Continuous Monitoring: Continuous monitoring and periodic analysis updates are crucial as environmental conditions and species behaviors evolve. - Strategy Adjustment: Expanding monitoring efforts to include emerging hotspots and adjusting management strategies accordingly are essential for staying ahead of new challenges.

Outcome: - Sustainable Practices: This proactive approach ensures the long-term effectiveness of conservation efforts, adapting to new challenges and promoting sustainable ecological practices across similar settings.



Comprehensive Analysis of Invasive Species Distribution and Environmental Impacts in Vermont

# ------------------------------------------------------------------------------------------------
# Step 17: Analyze Species Occurrences
# ------------------------------------------------------------------------------------------------

# Description:
# In this step, we calculate the count of occurrences for each invasive species, focusing on identifying the most prevalent ones based on the provided species data.

# Calculate the count of occurrences for each species.
species_count <- species_data_sf %>%
  group_by(invasive_name) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

# Display the count of observations per species.
print(species_count)
## Simple feature collection with 34 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -73.41649 ymin: 42.72899 xmax: -71.62245 ymax: 45.01408
## Geodetic CRS:  WGS 84
## # A tibble: 34 × 3
##    invasive_name          count                                         geometry
##    <chr>                  <int>                                 <MULTIPOINT [°]>
##  1 Hemlock Woolly Adelgid   331 ((-72.23133 43.7532), (-72.22273 43.82385), (-7…
##  2 Tatarian honeysuckle      84 ((-71.62476 44.68753), (-71.62245 44.688), (-71…
##  3 Common buckthorn          66 ((-72.2355 43.8118), (-72.614 43.7345), (-72.63…
##  4 Emerald Ash Borer         59 ((-72.13964 44.41226), (-72.15172 44.41078), (-…
##  5 Common reed               57 ((-71.63021 44.71069), (-71.67049 44.68651), (-…
##  6 Japanese honeysuckle      49 ((-72.06293 44.65791), (-72.18078 44.5078), (-7…
##  7 Elongate Hemlock Scale    48 ((-72.23122 43.75284), (-72.22223 43.83185), (-…
##  8 Japanese knotweed         46 ((-72.06206 44.65957), (-71.79214 44.53043), (-…
##  9 Goutweed                  39 ((-72.2366 43.8112), (-72.4068 43.6402), (-72.6…
## 10 Oriental bittersweet      37 ((-71.79917 44.5807), (-72.30571 44.22848), (-7…
## # ℹ 24 more rows
# Prevalence Ranking: The output is a simple feature collection table listing the invasive species names and the count of their occurrences. This table shows that the Hemlock Woolly Adelgid has the highest number of recorded occurrences, followed by other species like Tatarian honeysuckle, Common buckthorn, and the Emerald Ash Borer.
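# Most Frequently Recorded Species: The Hemlock Woolly Adelgid appears as the most prevalent invasive species within the observed data, signaling a potentially significant threat to local ecosystems. Observation counts also depend on where and how often records are submitted, so they indicate reported prevalence rather than true abundance.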
# Potential Actions: This data can inform conservationists and policymakers where to focus efforts for surveys, control, and prevention measures. Species with higher counts may require urgent action to prevent further spread and mitigate ecological impact.
# In essence, this analysis helps to clarify the magnitude of invasive species issues by quantifying observations, thereby guiding decision-making processes for ecological management and resource allocation.
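
The output above shows that summarising an sf object by group also combines each group's points into a MULTIPOINT geometry. When only the counts are needed, a lighter alternative is to drop the geometry first. The following is a minimal sketch on hypothetical toy data (the species labels and coordinates are made up), not a replacement for the report's own Step 17 code.

```r
library(sf)
library(dplyr)

# Hypothetical observations; names and coordinates are illustrative only
toy_obs <- data.frame(
  invasive_name = c("Species A", "Species A", "Species B",
                    "Species A", "Species C", "Species B"),
  lon = c(-72.5, -72.6, -72.7, -72.8, -72.9, -73.0),
  lat = c(44.0, 44.1, 44.2, 44.3, 44.4, 44.5),
  stringsAsFactors = FALSE
)
toy_obs_sf <- st_as_sf(toy_obs, coords = c("lon", "lat"), crs = 4326)

# Drop the geometry column, then count rows per species in descending order
toy_count <- toy_obs_sf %>%
  st_drop_geometry() %>%
  count(invasive_name, name = "count", sort = TRUE)

print(toy_count)
```

The result is an ordinary data frame with one row per species, which prints compactly and avoids the geometry union that a grouped `summarise()` performs on sf objects.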



Comprehensive Analysis of Invasive Species Distribution and Environmental Impacts in Vermont

Objective: This section synthesizes data from various sources to visualize the current distribution and potential future spread of invasive species across Vermont.

Data Sources: - iNaturalist Vermont: Provides geo-tagged observations of invasive species. - Vermont Invasive Species Database & Vermont Open Geodata Portal: Offer spatial data crucial for identifying areas with significant invasive activity. - OpenWeather and WorldClim: Supply environmental variables and historical climate data to assess correlations between weather patterns and invasive species occurrences.

Insights: - Predictive Analysis: Utilizes predictive insights to forecast future threats under varying climate scenarios. - Strategic Planning: Supports strategic conservation planning with a nuanced understanding of ecological dynamics.

Discussion on Detailed Analysis of Species Prevalence and Impact

Focus: This analysis highlights the prevalence and geographic distribution of key invasive species within Vermont, particularly the Emerald Ash Borer and Hemlock Woolly Adelgid.

Key Points: - Hotspot Identification: Identifies specific areas heavily affected by these invasive species to target management actions. - Resource Allocation: Facilitates the efficient allocation of resources and planning of containment and eradication strategies. - Adaptive Management: Emphasizes the importance of continual monitoring and updating of analyses to respond to changing environmental conditions and the adaptability of invasive species.

Outcome: - Enhanced Management Efficiency: Ensures that management efforts are effective and future strategies are informed by accurate and timely data.



Identify and Visualize Top Two Most Common Invasive Species



#---------------------------------------------------------------------------------
# Step 18: Identify and Visualize Top Two Most Common Invasive Species
#---------------------------------------------------------------------------------

# Description:
# In this step, we identify and visualize the distribution of the top two most common invasive species in Vermont based on the occurrence counts calculated in the previous step.

# Identify the names of the top two most common invasive species.
top_species_names <- head(species_count$invasive_name, 2)

# Filter the species data to include only observations of the top two species.
top_species_data <- species_data_sf %>%
  filter(invasive_name %in% top_species_names)

# Create a ggplot map to visualize the distribution of the top two species across Vermont.
top_species_map <- ggplot(data = top_species_data) +
  geom_sf(aes(color = invasive_name)) +
  geom_sf(data = vermont_sf, fill = NA, color = "black") +
  labs(title = "Distribution of the Top Two Most Common Invasive Species in Vermont",
       subtitle = "Observations of the Two Most Prevalent Species",
       x = "Longitude", y = "Latitude") +
  scale_color_manual(values = setNames(rainbow(2), top_species_names)) +
  theme_minimal()

# Display the map.
print(top_species_map)
Figure 11: Identify and Visualize Top Two Most Common Invasive Species


# Unexpected Findings: Contrary to initial reports and publications that indicated the Emerald Ash Borer and Hemlock Woolly Adelgid were the most prevalent invasive species, 
# the data reveals a different story. Based on observed occurrences, the most common species are Hemlock Woolly Adelgid, with the highest count, and another species that was not initially expected to be as prevalent.
# Visual Evidence: The ggplot map provides visual evidence of the distribution of these two species, emphasizing the actual impact as reflected by the data.
# Implications for Research and Management: This discrepancy between expectation and data highlights the importance of direct data analysis in ecological studies. While prior research and reports are valuable, 
# empirical data can sometimes tell a different story, which can lead to updated priorities and strategies in managing invasive species.
# Effective Communication: The data underscores that while certain species may be anticipated to be predominant due to their reputation or impact in other areas, local data analysis is essential for accurate ecological assessment and effective resource management.
# The map generated in this step tells the factual story of Vermont's current situation regarding invasive species, based on actual data, allowing for a technical and logical approach to addressing the ecological challenges presented by these species.



Figure 11: Distribution Discrepancy Analysis of Invasive Species in Vermont

Objective: Present an analysis of observed occurrences of invasive species in Vermont, focusing on unexpected findings regarding the prevalence of certain species.

Details:

  • Unexpected Findings: Contrary to initial expectations, the Hemlock Woolly Adelgid appears as the most common species, alongside another unexpectedly prevalent species.
  • Visualization: Uses a ggplot map to represent the distribution of these species, providing empirical evidence that challenges prior assumptions.
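The ranking behind this figure comes down to a tally of occurrences per species. The sketch below illustrates that idea with base R on a small made-up set of records; the species labels and counts are hypothetical and are not the study's data, and `obs`, `species_tally`, and `top_two` are names introduced only for this example.

```r
# Hypothetical observation records; names and counts are illustrative only.
obs <- data.frame(
  invasive_name = c("Species A", "Species B", "Species A", "Species C",
                    "Species A", "Species B"),
  stringsAsFactors = FALSE
)

# Tally occurrences per species and sort from most to least common.
species_tally <- sort(table(obs$invasive_name), decreasing = TRUE)

# Names of the two most frequently recorded species.
top_two <- names(species_tally)[1:2]

print(species_tally)
print(top_two)
```

Printing the full tally alongside the top two makes it easy to see how close the third-ranked species is, which matters when the ranking contradicts prior expectations.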



Implications for Research and Management

Key Insight: The discrepancy between expectation and empirical data underscores the importance of direct data analysis in ecological studies.

Management Strategy:

  • Updated Priorities: Empirical data may lead to updated priorities and strategies in managing invasive species.
  • Communication: Communicating these findings clearly is crucial so that local data analysis can guide accurate ecological assessment and resource management.

Conclusion

Map Analysis: The map generated from this analysis provides a factual representation of Vermont’s current invasive species situation, facilitating a technical and logical approach to addressing these ecological challenges and guiding future research and management efforts.



Transition from Hotspot Identification to Clustering with DBSCAN, Elbow, and K-means

From Hotspot Identification:

The analysis has successfully pinpointed regions in Vermont where invasive species are most prevalent, setting the stage for a deeper exploration of spatial distribution patterns.

Introduction to Clustering Techniques:

The analysis now turns to two clustering techniques, DBSCAN and K-means, together with the Elbow method for choosing the number of K-means clusters, to identify underlying patterns and potential ecological drivers.

DBSCAN:

DBSCAN groups closely packed points into clusters and marks points in low-density areas as outliers, helping identify dense regions of invasive species occurrences.
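To make the density idea concrete, here is a minimal base R sketch of the DBSCAN logic on made-up points. It is an illustration under simplifying assumptions (Euclidean distance, a full distance matrix), not the implementation used in this report; `simple_dbscan` and `min_pts` are hypothetical names introduced for the example.

```r
# Minimal DBSCAN sketch in base R (illustrative; use a tested package for real work).
simple_dbscan <- function(xy, eps, min_pts) {
  d <- as.matrix(dist(xy))  # pairwise Euclidean distances
  neighbours <- lapply(seq_len(nrow(d)), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbours) >= min_pts  # counts include the point itself
  cluster <- integer(nrow(d))                # 0 = noise or not yet assigned
  current <- 0L
  for (i in seq_len(nrow(d))) {
    if (!is_core[i] || cluster[i] != 0L) next
    current <- current + 1L
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      if (cluster[p] != 0L) next
      cluster[p] <- current
      # Only core points extend the cluster to their unassigned neighbours.
      if (is_core[p]) {
        nb <- neighbours[[p]]
        queue <- c(queue, nb[cluster[nb] == 0L])
      }
    }
  }
  cluster
}

# Two tight groups plus one isolated point (made-up coordinates).
pts <- rbind(
  c(0.00, 0.00), c(0.01, 0.00), c(0.00, 0.01), c(0.01, 0.01),
  c(1.00, 1.00), c(1.01, 1.00), c(1.00, 1.01), c(1.01, 1.01),
  c(5.00, 5.00)
)
labels <- simple_dbscan(pts, eps = 0.05, min_pts = 3)
print(labels)
```

The isolated point keeps the label 0, which is how DBSCAN reports outliers, while each tight group receives its own positive label.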

Elbow Method:

This heuristic helps determine the number of clusters for K-means by identifying the “elbow point” in a plot of the total within-cluster sum of squares (WSS) against the number of clusters.

K-means Clustering:

K-means partitions the dataset into distinct, non-overlapping clusters, iteratively assigning data points to the nearest cluster centroid and updating centroids based on the assigned points’ mean.
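The assign-then-update loop described above can be sketched in a few lines of base R. The example uses one-dimensional made-up values and hypothetical starting centroids purely for readability; the report itself relies on R's built-in `kmeans()` applied to coordinates.

```r
# One-dimensional toy data with two obvious groups (made-up values).
x <- c(1.0, 1.2, 0.8, 5.0, 5.2, 4.8)
centroids <- c(0, 6)  # hypothetical starting centroids

for (iter in 1:10) {
  # Assignment step: index of the nearest centroid for each point.
  assignment <- sapply(x, function(v) which.min(abs(v - centroids)))
  # Update step: each centroid becomes the mean of its assigned points.
  # (This toy version assumes no cluster ever ends up empty.)
  new_centroids <- as.numeric(tapply(x, assignment, mean))
  if (isTRUE(all.equal(new_centroids, centroids))) break  # converged
  centroids <- new_centroids
}

print(assignment)
print(centroids)
```

The loop stops as soon as the centroids no longer move, which is the convergence criterion the description implies.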

Rationale for Transition:

Moving from hotspot identification to clustering provides insights beyond high-density areas, uncovering complex spatial patterns and groupings of invasive species occurrences, informing targeted conservation and management strategies.



Spatial Clustering Analysis

Objective: Apply spatial clustering algorithms to identify clusters, or hotspots, of invasive species occurrences.

Method:

  • Technique Used: Spatial clustering algorithms group observations into clusters based on their geographical proximity.
  • Outcome: The resulting clusters reveal insights into the spatial distribution patterns of invasive species.

DBSCAN Clustering of Observations



# ----------------------------------------------------------------------------

# Spatial Clustering Analysis

# ----------------------------------------------------------------------------

# This step involves conducting spatial clustering analysis on the species data to identify clusters or 
# hotspots of invasive species occurrences. It employs spatial clustering algorithms to group spatially close observations into clusters based on their geographical proximity. 
# The resulting clusters provide insights into the spatial distribution patterns of invasive species.

## --------------------------------------------------------------------------------
## Step 19: DBSCAN Clustering Analysis
## --------------------------------------------------------------------------------

# Setting Up Environment
# Ensure the 'sf' package is available, install it if not
if (!requireNamespace("sf", quietly = TRUE)) {
  install.packages("sf")
}

# Extracting Coordinates
# Extract geographical coordinates (longitude, latitude) from the 'master_observation_list_sf' spatial dataframe
example_coords <- st_coordinates(master_observation_list_sf)

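# Perform DBSCAN Clustering
# Apply the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to the extracted coordinates.
# 'eps' is the maximum distance between two points in the same neighborhood; because the coordinates are longitude/latitude, it is measured in decimal degrees.
# 'MinPts' is the minimum number of points required to form a dense region. Tune both to the data.
# Note: dbscan() must come from a clustering package loaded earlier (e.g. 'fpc' or 'dbscan'); the 'dbscan' package spells this argument 'minPts'.
# Points that belong to no cluster are labeled 0 (noise).
db_clusters <- dbscan(example_coords, eps = 0.01, MinPts = 5)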

# Incorporate clustering results into the original spatial dataframe
# Create a new column 'cluster' as a factor to represent different clusters identified by DBSCAN
master_observation_list_sf$cluster <- as.factor(db_clusters$cluster)

# Visualization of DBSCAN Clustering Results

# Create a map visualizing the results of DBSCAN clustering, with different colors representing different clusters:

cluster_map <- ggplot() +
  geom_sf(data = master_observation_list_sf, aes(color = cluster)) +  # Plot clusters using ggplot2
  labs(title = "DBSCAN Clustering of Observations",
       subtitle = "Clusters based on geographical coordinates",
       x = "Longitude", y = "Latitude") +  # Add titles and labels
  theme_minimal() +  # Apply minimal theme for visualization aesthetics
  theme(legend.position = "right")  # Keep the legend to the right of the map
print(cluster_map)
Figure 12: DBSCAN Clustering of Observations


# Visualization and Data Story:
# Cluster Map Creation: A map is generated to visualize these clusters, with different colors representing each cluster, clearly distinguishing the groups of observations.
# Clustering Insights: The visualization shows where invasive species occurrences are not random but instead clustered in certain areas – these are the hotspots where species are more densely located.
# Management Implications: Understanding these clusters can help in identifying areas that might be at higher risk of invasion and thus could benefit from focused management efforts.
# Summary for Novices:
# What is DBSCAN?: DBSCAN is a method that finds neighbors that are closely packed together and groups them into clusters. It helps us see if there are any 'hotspots' where invasive species are especially numerous.
# The Plot’s Message: The clusters on the map show us where the invasive species are not just scattered randomly but are concentrated in specific areas. Each color on the map represents a different cluster, or a 'neighborhood,' of invasive species.
# Why This Matters: By knowing where these clusters are, we can better manage invasive species because these are the areas where they're most likely to cause problems. We might need to take extra care in these hotspots to protect the local environment.
# Practicality of Findings: The analysis gives us a practical way to use limited resources more effectively by targeting the areas where the need is greatest.
# In essence, the DBSCAN clustering tells us a story about where the invasive species are gathering in groups across the landscape, which can be crucial information for making smart decisions about environmental management.



Figure 12: Visualization of DBSCAN Clustering Results

Objective: Display the results of DBSCAN clustering on a map, each cluster represented by a different color for clear differentiation.

Details:

  • Visualization: The map shows the spatial distribution of invasive species occurrences, with each cluster differentiated by color.
  • Insights: Reveals that invasive species occurrences are clustered in certain areas, representing hotspots of activity.



Spatial Clustering Analysis of Invasive Species Occurrences in Vermont

Objective

Present a clear visual representation using DBSCAN clustering to identify where invasive species occurrences are concentrated across Vermont.

Cluster Map Creation

  • Visualization Method: Utilize varied colors on the map to represent distinct clusters, enhancing the clarity of spatially clustered observations.

Clustering Insights

  • Distribution Pattern: The analysis shows that invasive species occurrences are not spread evenly but are concentrated in specific areas.
  • Identification of Hotspots: These clusters highlight areas with dense populations of invasive species.
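One way to back these insights with numbers is to summarise the cluster labels directly. The sketch below assumes the usual DBSCAN convention that label 0 marks noise; the label vector is made up for illustration and `cluster_labels` is a hypothetical name. It is a descriptive summary, not a statistical test of clustering.

```r
# Hypothetical DBSCAN labels: 0 marks noise, positive integers mark clusters.
cluster_labels <- c(1, 1, 1, 0, 2, 2, 0, 1, 2, 2, 3, 3, 3, 0)

# Number of observations left unclustered.
n_noise <- sum(cluster_labels == 0)

# Cluster sizes, largest first, excluding noise.
cluster_sizes <- sort(table(cluster_labels[cluster_labels != 0]), decreasing = TRUE)

# Fraction of all observations that fall inside some cluster.
share_clustered <- mean(cluster_labels != 0)

print(cluster_sizes)
cat("Noise points:", n_noise, "\n")
cat("Share of observations in a cluster:", round(share_clustered, 2), "\n")
```

A high share of noise points would suggest that `eps` is too small or `MinPts` too large for the data at hand.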

Management Implications

Importance of Recognizing Clusters

Understanding these clusters is vital for pinpointing areas potentially at greater risk of invasion, guiding targeted management efforts effectively.

Summary

What is DBSCAN?

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering method that groups closely packed points into clusters, effectively identifying dense ‘hotspots’ of activity.
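What counts as "closely packed" depends on the neighborhood radius `eps`, which Step 19 sets to 0.01 in longitude/latitude units. As a rough sanity check, the sketch below converts 0.01 degrees into kilometres with the haversine formula. The reference point (roughly central Vermont) and the spherical Earth radius are assumptions made for illustration, and `haversine_km` is a hypothetical helper.

```r
# Great-circle distance in kilometres between two lon/lat points (haversine formula).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Assumed reference point, roughly central Vermont.
lon0 <- -72.7
lat0 <- 44.0
eps_deg <- 0.01

north_south_km <- haversine_km(lon0, lat0, lon0, lat0 + eps_deg)
east_west_km   <- haversine_km(lon0, lat0, lon0 + eps_deg, lat0)

cat("0.01 degrees north-south:", round(north_south_km, 2), "km\n")
cat("0.01 degrees east-west:  ", round(east_west_km, 2), "km\n")
```

At this latitude the radius is about 1.1 km north-south but only about 0.8 km east-west, so neighborhoods defined in degrees are not circular on the ground. Projecting the points to a metric coordinate system before clustering would avoid that distortion.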

The Plot’s Message

  • Spatial Distribution: The map demonstrates that invasive species are not randomly dispersed but are notably concentrated in specific areas.
  • Cluster Delineation: Each cluster is marked by a different color, distinguishing various ‘neighborhoods’ of species.

Why This Matters

Identifying these clusters enables more effective management of invasive species by focusing resources and conservation efforts on the most impacted areas.

Practicality of Findings

The insights from the clustering analysis offer a strategic approach to allocate resources efficiently, concentrating on areas with the most urgent needs.

In summary, DBSCAN clustering provides valuable insights into the distribution of invasive species across Vermont, serving as a critical tool for informed environmental management decisions.



K-Means Clustering Analysis



## ----------------------------------------------------------------------------------------------------------------
## Step 20: K-Means Clustering Analysis
## ----------------------------------------------------------------------------------------------------------------

# Convert 'master_observation_list' to a spatial dataframe (sf object) with geographical coordinates
master_observation_list_sf <- st_as_sf(master_observation_list, coords = c("longitude", "latitude"), crs = 4326)

# K-Means Clustering:

# Perform K-Means clustering on the spatial coordinates
set.seed(123)  # Ensure reproducibility
k_means_result <- kmeans(st_coordinates(master_observation_list_sf), centers = 5)

# Add the K-Means cluster assignments to the spatial dataframe
master_observation_list_sf$cluster_kmeans <- as.factor(k_means_result$cluster)

# Keep a plain data frame copy of the results (the map below plots the sf object directly)
master_observation_list_df <- as.data.frame(master_observation_list_sf)

# Plot K-Means clusters with ggplot2
k_means_map <- ggplot() +
  geom_sf(data = master_observation_list_sf, aes(color = cluster_kmeans)) +  # Plot clusters using ggplot2
  labs(title = "K-Means Clustering of Observations",
       subtitle = "Clusters based on geographical coordinates",
       x = "Longitude", y = "Latitude") +  # Add titles and labels
  theme_minimal()  # Apply minimal theme for visualization aesthetics

# Display the K-Means cluster map
print(k_means_map)
Figure 13: K-Means Clustering Analysis


# Visualization and Data Story:
# Map Generation: The ggplot2 package creates a map that displays these clusters, each marked with a different color for easy differentiation.
# Interpretation: The map illustrates how the observations are not randomly scattered but tend to group together in certain areas—these are the clusters identified by the algorithm.
# Insights for Management: Identifying these clusters helps focus conservation efforts where they're most needed. If a cluster corresponds to a critical habitat or a highly affected area, it might require more intensive management actions.
# Summary for Novices:
# What is K-Means?: K-Means is a way to organize scattered data into groups (clusters). It's like organizing scattered points on a map into five different regions based on their location.
# Map's Role: The map with different colored points shows the 'regions' or clusters where similar observations are grouped, helping to visualize patterns in the distribution of invasive species.
# Management Relevance: Knowing these patterns helps us decide where to act to control invasive species. It's a strategy to use resources wisely, by concentrating on areas with the most observations.
# Data-Driven Decisions: This step illustrates the importance of using data to make informed decisions in managing the environment. It confirms that invasive species have a pattern in their spread, which we can target for better ecological outcomes.
# This K-Means clustering thus tells a data-driven story about how and where invasive species congregate within Vermont, guiding future action to address these ecological concerns in an efficient manner.



Figure 13: K-Means Clustering of Invasive Species Distribution

Objective: Demonstrate the results of K-Means clustering on the spatial distribution of invasive species using the ggplot2 package in R.

Details:

  • Visualization: Each cluster is depicted in a unique color to simplify the differentiation between groups of spatially aggregated observations.
  • Insights: The map partitions the observations into distinct spatial groups, and groups containing many observations indicate potential hotspots. K-Means always returns the requested number of groups, so the grouping by itself does not demonstrate non-random clustering.
  • Application: This information is crucial for directing targeted management strategies, particularly in critical habitats or heavily impacted areas.

Impact:

  • Conservation Priorities: Aids in understanding spatial patterns to inform conservation priorities.
  • Resource Allocation: Facilitates strategic deployment of resources to mitigate the impact of invasive species.



Understanding K-Means Clustering for Ecological Management

What is K-Means?

K-Means clustering is a method used to organize scattered data into meaningful groups or clusters. In the context of ecological management, it organizes observations of invasive species into distinct regions based on geographical distribution.

Role of Mapping

  • Visualization: Clusters are visualized on a map, each represented by a different color.
  • Pattern Recognition: Helps ecologists and environmental managers identify patterns in the distribution of invasive species.

Relevance in Management

Understanding these patterns is essential for making informed decisions about managing invasive species, focusing resources and efforts on areas with the highest concentration of observations to combat invasive species strategically.
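A simple way to act on this is to rank the K-Means clusters by how many observations they contain. The sketch below does so on simulated coordinates (a dense group and a sparse group, not the study's data) using the `size` and `centers` components that base R's `kmeans()` returns; `coords` and `cluster_summary` are names introduced for the example.

```r
set.seed(123)  # reproducible toy data and k-means starts

# Made-up coordinates: one dense group of 40 points and one sparse group of 10.
coords <- rbind(
  cbind(rnorm(40, mean = -73.0, sd = 0.05), rnorm(40, mean = 44.5, sd = 0.05)),
  cbind(rnorm(10, mean = -72.0, sd = 0.05), rnorm(10, mean = 43.5, sd = 0.05))
)
colnames(coords) <- c("longitude", "latitude")

km <- kmeans(coords, centers = 2, nstart = 10)

# Observations per cluster alongside each cluster's centroid, largest cluster first.
cluster_summary <- data.frame(
  cluster = seq_along(km$size),
  n_obs = km$size,
  km$centers
)
cluster_summary <- cluster_summary[order(-cluster_summary$n_obs), ]
print(cluster_summary)
```

The centroid columns give a rough location for each group, which can be matched against parks or towns when deciding where to direct effort.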

Importance of Data-Driven Decisions

  • Decision Making: Emphasizes the importance of using data to drive decisions in environmental management.
  • Efficiency: Recognizing and targeting patterns in the spread of invasive species leads to more efficient and effective ecological outcomes.

Conclusion

K-Means clustering provides insights into how and where invasive species aggregate within a specific area, such as Vermont. These insights guide future actions and interventions aimed at addressing ecological concerns in a manner that optimizes resource allocation and conservation efforts.



Elbow Method for Optimal Cluster Count Selection

# -----------------------------------------------------------------------------------------------------------------------------------
# Step 21: Elbow Method for Optimal Cluster Count Selection
# -----------------------------------------------------------------------------------------------------------------------------------
# Description:
# The Elbow Method is employed to determine the optimal number of clusters for the K-Means clustering algorithm. By plotting the total within-cluster sum of squares against the number of clusters (k), this method helps identify the point where the rate of decrease in within-cluster variance slows down, indicating the optimal number of clusters.

# Compute the total within-cluster sum of squares for each cluster count k from 1 to 10.
# map_dbl() comes from the purrr package; set.seed() makes the random k-means starts reproducible.
set.seed(123)
wss <- map_dbl(1:10, function(k) {
  kmeans(example_coords, centers = k, nstart = 10)$tot.withinss
})

# Visualize the Elbow plot to identify the optimal number of clusters (k).
elbow_plot <- data.frame(k = 1:10, wss = wss) %>%
  ggplot(aes(x = k, y = wss)) +
  geom_line() +
  geom_point() +
  labs(title = "Elbow Method for Choosing k",
       x = "Number of Clusters (k)",
       y = "Total Within-Cluster Sum of Squares")

print(elbow_plot)
Figure 14: Elbow Method for Optimal Cluster Count Selection


# Visualization and Interpretation:
# The Plot: On the Elbow plot, the x-axis represents the number of clusters, and the y-axis shows the WSS. Points on the plot show the WSS for each k.
# Finding the Elbow: The "elbow" is the point where increases in k result in smaller reductions in WSS. It suggests adding more clusters doesn't provide a much better fit.
# Optimal Clusters: The k at the elbow is considered the optimal number of clusters because it's a good trade-off between compactness (low WSS) and the number of clusters.
# Summary for Novices:
# What's the Elbow Method?: It's like trying to find the right place to cut a tree branch—the spot where you get the best cut with the least effort. Here, we look for the point where adding more clusters doesn't make our model much better.
# What Does the Plot Show?: The graph helps us see how well our data fits into a certain number of groups. At first, adding more groups (clusters) really helps, but after a certain point, it doesn't improve much.
# Why Is This Useful?: This method helps us avoid two things: having too many groups, which is unnecessary, or too few, which might lump different observations together.
# Decisions Based on Data: The Elbow Method ensures that the choice of how many groups to divide the data into is not random but based on actual trends in the data.
# In this way, the Elbow Method tells a story about finding balance—enough clusters to accurately reflect the data without overcomplicating the model. It guides us to a logical decision on the number of clusters to use for our invasive species analysis.



Figure 14: Elbow Plot for Determining Optimal Number of Clusters

Objective: Illustrate the relationship between the number of clusters (k) and the Within-Cluster Sum of Squares (WSS) to aid in determining the optimal number of clusters for the K-Means algorithm.

Details:

  • X-axis: Number of clusters (k).
  • Y-axis: Corresponding WSS values.
  • Data Points: Each point on the plot represents the WSS for a specific value of k.

Interpreting the Elbow Plot:

  • Finding the Elbow: Identify the “elbow” point where increases in k result in diminishing reductions in WSS, depicting the trade-off between the number of clusters and the compactness of the data.
  • Determining Optimal Clusters: The k value at the elbow is considered optimal, balancing low WSS (indicative of compact clusters) and minimized complexity from additional clusters.
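Reading the elbow by eye can be supplemented with a simple numeric heuristic: pick the k where the curve bends most sharply, measured by the second difference of the WSS values. The sketch below uses made-up WSS values (not the study's results), and this heuristic is only one of several possible choices; `wss_example`, `bend`, and `elbow_k` are hypothetical names.

```r
# Hypothetical WSS values for k = 1..8 (illustrative numbers only).
wss_example <- c(1000, 600, 250, 200, 170, 150, 135, 125)
k_values <- seq_along(wss_example)

# Second differences measure how sharply the curve bends at each interior k.
bend <- diff(wss_example, differences = 2)

# bend[i] corresponds to k = i + 1, so shift the index by one.
elbow_k <- k_values[which.max(bend) + 1]
print(elbow_k)
```

When the largest drop happens between k = 1 and k = 2, this heuristic tends to favor small k, so it is best treated as a cross-check on the plot rather than a replacement for it.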



Understanding the Elbow Method for Cluster Analysis

What is the Elbow Method?

The Elbow Method is a technique akin to finding the optimal place to prune a tree branch—it helps identify the point where adding more clusters does not significantly improve the model’s fit.

Insights from the Plot

  • Initial Improvement: Adding more clusters initially enhances the fit of the model.
  • Diminishing Returns: Beyond a certain point, the improvement in fit becomes marginal, indicating the optimal clustering threshold.

Utility of the Elbow Method

  • Balance Between Clusters: This method helps strike a balance between too many clusters, which can introduce unnecessary complexity, and too few clusters, which may merge distinct observations.
  • Data-Driven Decision Making: Ensures that the choice of the number of clusters is informed by the underlying trends in the data, rather than being arbitrary.

Conclusion

The Elbow Method provides a narrative of finding equilibrium—selecting an appropriate number of clusters that accurately represent the data without overly complicating the model. It serves as a guide for making logical and data-driven decisions in determining the number of clusters for our analysis of invasive species.

Spatial Distribution of Invasive Species



# ------------------------------------------------------------------------------------
# Step 22: Spatial Distribution of Invasive Species
# ------------------------------------------------------------------------------------

# top_species_data

# Check if top_species_data exists; if not, create a mock data frame for demonstration.
# The mock uses the same 'invasive_name' column that the later steps expect.
if (!exists("top_species_data")) {
  message("top_species_data does not exist, creating a mock data frame for demonstration.")
  top_species_data <- data.frame(
    invasive_name = sample(c("Species1", "Species2", "Species3"), 415, replace = TRUE),
    longitude = runif(415, min = -73.5, max = -71.5),
    latitude = runif(415, min = 43.5, max = 45.5)
  )
}

# Convert data frame to an sf object while retaining original longitude and latitude columns
top_species_data_sf <- top_species_data %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, remove = FALSE)  # Set remove = FALSE to keep the original columns

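# Inspect the structure to verify that the geometry column has been added and the original columns are retained.
# Note: str() prints its own output and returns NULL, so the print() wrapper only adds the trailing NULL line.
print(str(top_species_data_sf))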
## sf [415 × 26] (S3: sf/tbl_df/tbl/data.frame)
##  $ x.id                      : num [1:415] 454405 454462 454516 454470 454317 ...
##  $ y.id                      : num [1:415] 156319 156518 156218 156452 156364 ...
##  $ site_name                 : chr [1:415] "Branbury S.P." "Branbury S.P." "Branbury S.P." "Branbury S.P." ...
##  $ observation_id            : num [1:415] 103 104 105 106 107 108 109 110 120 135 ...
##  $ invasive_name             : chr [1:415] "Tatarian honeysuckle" "Tatarian honeysuckle" "Tatarian honeysuckle" "Tatarian honeysuckle" ...
##  $ observation_date          : chr [1:415] "09/06/2010" "10/06/2010" "09/06/2010" "09/06/2010" ...
##  $ observer                  : chr [1:415] "Tess Greaves" "Tess Greaves" "Tess Greaves" "Tess Greaves" ...
##  $ survey_type               : chr [1:415] "Contractor" "Contractor" "Contractor" "Contractor" ...
##  $ town                      : chr [1:415] "Salisbury" "Salisbury" "Salisbury" "Salisbury" ...
##  $ plant_description         : chr [1:415] NA NA NA NA ...
##  $ distribution_name         : chr [1:415] NA "Scattered Plants or Clumps" NA NA ...
##  $ assessmen_date            : chr [1:415] "Jun  9 2010" "Jun 10 2010" "Jun  9 2010" "Jun  9 2010" ...
##  $ treatment_date            : chr [1:415] NA NA NA NA ...
##  $ treatment_effectiveness   : chr [1:415] NA NA NA NA ...
##  $ treatment_type            : chr [1:415] NA NA NA NA ...
##  $ treatment_person          : chr [1:415] "Tess Greaves" "Tess Greaves" "Tess Greaves" "Tess Greaves" ...
##  $ treatment_assessment_date : chr [1:415] NA NA NA NA ...
##  $ assessor                  : chr [1:415] NA NA NA NA ...
##  $ treatment_assessment_notes: chr [1:415] NA NA NA NA ...
##  $ chemical_name             : chr [1:415] NA NA NA NA ...
##  $ chemical_consentration    : chr [1:415] NA NA NA NA ...
##  $ chemical_ounces           : chr [1:415] NA NA NA NA ...
##  $ application_method        : chr [1:415] NA NA NA NA ...
##  $ certified_applicator      : chr [1:415] NA NA NA NA ...
##  $ eparegistation_number     : chr [1:415] NA NA NA NA ...
##  $ geometry                  :sfc_POINT of length 415; first list element:  'XY' num [1:2] -73.1 43.9
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:25] "x.id" "y.id" "site_name" "observation_id" ...
## NULL
# Data Integrity: The first action, checking if data exists and creating mock data if necessary, shows an important step in data analysis—making sure that you have the data you need to work with, and if not, how to create a stand-in for demonstration.
# Spatial Analysis Readiness: By converting the data to a spatial format while keeping original columns, the data is now ready for spatial analysis, like creating maps or conducting other geospatial computations.
# Geospatial Visualization: This step is foundational for creating a map that will visually show where the most invasive species are located in Vermont. These maps can help us understand how invasive species are spread across the landscape.



Data Preparation for Spatial Analysis

This document outlines the steps for preparing and analyzing spatial data, focusing on the analysis of invasive species in Vermont.

Ensuring Data Integrity

Objective: Verify the availability of essential data and generate mock data if necessary.

Details:

  • Data Verification: Check the availability of the required datasets for the analysis.
  • Mock Data Creation: If the original data is unavailable, create a substitute dataset so that the analysis can proceed without interruption. This mock dataset simulates the characteristics of the expected real data.

Readiness for Spatial Analysis

Objective: Convert data into a spatially compatible format while preserving the integrity of the original columns.

Details:

  • Data Transformation: Convert the dataset into a format suitable for spatial analysis (here, an sf object in R created with st_as_sf()).
  • Column Preservation: Ensure that all original data columns are retained during the conversion (remove = FALSE keeps the longitude and latitude columns) to maintain data integrity.
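Before converting, it can help to screen the coordinates, since by default `st_as_sf()` refuses missing coordinate values. The base R sketch below drops records with missing or out-of-range coordinates. The records are made up, and the bounding box simply reuses the ranges from the mock data in Step 22 as an assumption; it is not an authoritative outline of Vermont.

```r
# Hypothetical raw records (illustrative only).
raw <- data.frame(
  invasive_name = c("Species A", "Species B", "Species C", "Species D"),
  longitude = c(-73.1, -72.4, NA, -70.0),
  latitude  = c(43.9, 44.6, 44.0, 44.2)
)

# Assumed bounding box, taken from the mock data ranges in Step 22.
lon_range <- c(-73.5, -71.5)
lat_range <- c(43.5, 45.5)

# Keep rows with complete coordinates that fall inside the bounding box.
ok <- !is.na(raw$longitude) & !is.na(raw$latitude) &
  raw$longitude >= lon_range[1] & raw$longitude <= lon_range[2] &
  raw$latitude  >= lat_range[1] & raw$latitude  <= lat_range[2]

clean <- raw[ok, ]
cat("Dropped", sum(!ok), "of", nrow(raw), "records\n")
print(clean)
```

Reporting how many records were dropped keeps the screening step transparent, which matters when observation counts drive the later rankings.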

Geospatial Visualization

Objective: Create visual representations of the data to illustrate the spatial distribution of invasive species across Vermont.

Details:

  • Map Creation: Develop maps that display the geographical spread of invasive species.
  • Analysis Tool: Utilize these maps as analytical tools to aid in decision-making and to inform ecological management strategies.

Spatial Distribution of Invasive Species



# ------------------------------------------------------------------------
#  Step 23: Spatial Distribution of Invasive Species 
# ------------------------------------------------------------------------

#  'top_species_data_sf' is your spatial dataframe and includes 'invasive_name'
# Check if invasive_name column exists to avoid runtime errors
if ("invasive_name" %in% colnames(top_species_data_sf)) {
  # Prepare for interactive visualization
  ggplot_data <- ggplot(top_species_data_sf) +
    geom_sf(aes(color = invasive_name, text = paste("Invasive Species:", invasive_name)), size = 3) +
    labs(title = "Spatial Distribution of Invasive Species",
         x = "Longitude", y = "Latitude") +
    theme_minimal()
  
  # Convert to Plotly for interactivity
  ggplotly_obj <- ggplotly(ggplot_data, tooltip = "text") %>%
    layout(legend = list(orientation = "h", y = -0.3))
  
  # Print the interactive plot
  ggplotly_obj
} else {
  warning("Column 'invasive_name' does not exist in the dataset. Please check the dataset.")
}

Figure 15: Spatial Distribution of Top Invasive Species

# What's Happening?: Think of the dataset like a guest list for an event, where the 'invasive_name' column is the name of each guest. The code first checks to make sure the guest list is there. Then it creates a map to show where guests are seated, 
# with different colors for different guests, and labels to identify them.
# Visualization Purpose: By turning the list into a colorful map, it becomes much easier to see patterns—like if certain guests are grouped together, which in our case, would mean certain invasive species are found more in some areas than others.
# Why Interactive?: The interactive map allows you to get more information by moving your cursor over the points. It's like walking around the event and getting to know the guests by having a quick chat with them.
# Understanding the Message: The data tells us how these unwanted 'guests' (invasive species) are spread out through our 'event' (the region). It helps us understand where we need to focus our efforts to manage these species.



Figure 15: Spatial Distribution of Invasive Species Across Vermont

Objective: This map illustrates the spatial distribution of various invasive species across Vermont, clearly delineated by the state’s geographic boundaries.

Details:

  • Color-Coding: Each invasive species is marked with a distinct color, aiding in rapid assessment of biodiversity issues.
  • Geographic Clarity: The prominent display of state boundaries ensures immediate spatial context.

Purpose: This visualization provides an easily interpretable graphical representation of data, crucial for ecological management and decision-making processes.



What’s Happening?

Analogy Explanation: Think of the dataset like a guest list for an event, where the ‘invasive_name’ column represents the name of each guest. The code first ensures that the guest list exists, then creates a map to display where guests are seated, using different colors and labels to identify them.

Visualization Purpose

Data Representation: The visualization transforms the guest list into a colorful map, simplifying the identification of patterns. For example, certain guests (invasive species) may be grouped together, indicating higher concentrations in specific areas.

Why Interactive?

User Interaction: The interactive map lets users obtain additional information by hovering over data points. This feature simulates the experience of walking around the event and engaging with guests (invasive species) for deeper insights.

Understanding the Message

Data Insights: The dataset reveals the spatial distribution of invasive species throughout Vermont. Analyzing this data helps prioritize management efforts by identifying areas with higher concentrations of invasive species.

Spatial Clustering Analysis of Top Invasive Species



# ------------------------------------------------------------------------------
# Step 24: Spatial Clustering Analysis of Top Invasive Species
# ------------------------------------------------------------------------------
# Assumes 'top_species_data_sf' already exists as an sf point layer with an 'invasive_name' column
library(sf)
library(ggplot2)
library(plotly)

# Number of clusters; adjust this based on your specific needs or analysis
num_clusters <- 3

# Perform k-means clustering on the point coordinates.
# k-means starts from random centers, so fix the seed and use several random starts
# to make the cluster assignment repeatable between runs.
set.seed(123)
coordinates <- st_coordinates(top_species_data_sf)
kmeans_result <- kmeans(coordinates, centers = num_clusters, nstart = 25)
top_species_data_sf$cluster <- as.factor(kmeans_result$cluster)  # Add cluster labels to the sf object

# Optional: check how many observations fall in each cluster
print(table(top_species_data_sf$cluster))

# Prepare for interactive visualization
# Map each cluster to a color and use invasive species names in the hover text
# (ggplot2 may warn about the unknown 'text' aesthetic; ggplotly() uses it for the tooltip)
ggplot_data <- ggplot(top_species_data_sf) +
  geom_sf(aes(color = cluster,
              text = paste("Invasive Species:", invasive_name, "<br>Cluster:", cluster)),
          size = 3) +
  labs(title = "Spatial Distribution of Invasive Species by Cluster",
       x = "Longitude", y = "Latitude") +
  theme_minimal()

# Convert to Plotly for interactivity and place the legend below the map
ggplotly_obj <- ggplotly(ggplot_data, tooltip = "text") %>%
  layout(legend = list(orientation = "h", y = -0.3))

# Display the interactive plot
ggplotly_obj

Figure 16: Spatial Distribution of Top Invasive Species by Regions


Overview: This figure depicts the spatial distribution of top invasive species across different regions, identified through clustering analysis. Each region is represented by a distinct color, enhancing the visualization of spatial patterns in invasive species distribution.

Features: - Interactive Tooltips: Provide additional information about specific invasive species and their respective clusters, enabling a more interactive and informative user experience.



What Does This Analysis Show?

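Clustering Analysis: - K-means partitions the observations into three groups based on the geographic proximity of their coordinates. The number of points in each cluster, and how tightly they sit together, indicates where records concentrate. K-means does not measure density directly, so clusters with many closely packed points are best read as candidate regions of concern rather than confirmed hotspots.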
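
The choice of three clusters is a parameter rather than a result, so it is worth checking. One common heuristic is an elbow plot of the total within-cluster sum of squares against the number of clusters. The sketch below uses synthetic coordinates as a stand-in for the real observation coordinates. The data, the seed, `nstart`, and the range of `k_values` are all assumptions for illustration, not part of the original analysis.

```r
set.seed(42)  # k-means starts from random centers; fix the seed for repeatability

# Synthetic stand-in for the observation coordinates: three loose groups of points
coords <- rbind(
  cbind(rnorm(40, -73.2, 0.05), rnorm(40, 44.5, 0.05)),
  cbind(rnorm(40, -72.6, 0.05), rnorm(40, 44.3, 0.05)),
  cbind(rnorm(40, -72.9, 0.05), rnorm(40, 43.2, 0.05))
)
colnames(coords) <- c("X", "Y")

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(coords, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k beyond which extra clusters barely reduce the within-cluster SS
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

Because distances measured in longitude and latitude degrees are not uniform, it may also be worth transforming the points to a projected coordinate system before clustering, so that proximity is measured in meters rather than degrees.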

Purpose of Interactive Visualization

Interactive Elements: - Stakeholder Engagement: The interactive elements allow stakeholders, such as conservation managers or educational groups, to explore the data more deeply. By hovering over points, they can obtain specific information about the species at each location and identify the cluster each point belongs to.

Understanding the Outcome

Visual Clarity: - The colors represent different clusters, making it visually apparent which areas are grouped together in the analysis. This clarity aids in planning targeted management actions or further ecological studies.

Practical Use

Application: - This visualization shows where observations of invasive species are concentrated and how they group geographically. This insight supports effective environmental management and resource allocation.
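
To see how species are distributed across the clusters, the cluster labels can be cross-tabulated against species names. This is a minimal sketch on a hypothetical table: the `clustered` data frame and its values are invented for illustration, while the `invasive_name` and `cluster` column names follow the clustering step above.

```r
# Hypothetical stand-in for the attribute table of the clustered observations
clustered <- data.frame(
  invasive_name = c("Species A", "Species A", "Species B",
                    "Species B", "Species B", "Species C"),
  cluster = factor(c(1, 1, 1, 2, 2, 3))
)

# Rows: species, columns: clusters, cells: number of observations
composition <- table(clustered$invasive_name, clustered$cluster)
print(composition)

# Share of each cluster's observations that belongs to each species (columns sum to 1)
print(round(prop.table(composition, margin = 2), 2))
```

A cluster dominated by a single species may call for a different management response than one where several species overlap.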

Summary

Comprehensive View: - This step leverages both clustering analysis and interactive visualization to provide a comprehensive view of invasive species distribution, enhancing understanding and facilitating more informed decision-making.



Spatial Analysis and Transition to Species Distribution Modeling (SDM)

Overview

Context: Having completed exploratory data analysis, clustering, and hotspot identification, we now have a nuanced understanding of the spatial patterns and prevalence of invasive species across Vermont. This foundational knowledge sets the stage for more advanced ecological assessments.

Advancing to Species Distribution Modeling (SDM)

Transition to SDM

Objective: As we transition into species distribution modeling (SDM), our focus shifts to integrating raster-based climate data (WorldClim). This enriches our models with environmental variables such as temperature and precipitation, which play a significant role in predicting where species can occur under given climatic conditions.
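
Integrating raster data with point observations means looking up, for each observation, the value of the raster cell that contains it. In practice a raster package would normally do this lookup. The base-R sketch below only illustrates the idea on a small made-up grid. The grid extent, resolution, temperature values, and the helper `extract_at_points()` are all hypothetical.

```r
# Hypothetical coarse temperature grid (degrees C)
# Rows run north to south, columns run west to east
xmin <- -73.5; xmax <- -71.5
ymin <-  42.5; ymax <-  45.5
res  <- 0.5  # cell size in degrees
temp_grid <- matrix(seq(5, 9.6, length.out = 24), nrow = 6, ncol = 4, byrow = TRUE)

# Look up the grid value under each point; points outside the grid get NA
extract_at_points <- function(grid, lon, lat) {
  col <- floor((lon - xmin) / res) + 1
  row <- floor((ymax - lat) / res) + 1
  inside <- col >= 1 & col <= ncol(grid) & row >= 1 & row <= nrow(grid)
  out <- rep(NA_real_, length(lon))
  out[inside] <- grid[cbind(row[inside], col[inside])]
  out
}

# Hypothetical observation points; the last one lies outside the grid
pts <- data.frame(lon = c(-73.2, -72.6, -70.0), lat = c(44.4, 44.3, 44.0))
pts$temp <- extract_at_points(temp_grid, pts$lon, pts$lat)
print(pts)
```

The resulting table pairs each occurrence with an environmental value, which is the basic input a species distribution model needs.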

Significance of the Transition

Enhanced Predictive Models: This step moves the analysis from describing where invasive species have been observed to predicting where conditions suit them. By combining climatic variables with occurrence data, we aim to improve the accuracy and applicability of our predictive models, giving more precise insight into potential spread and possible new hotspots.

Expectations from SDM

Improved Outcomes:
- Model Enrichment: The inclusion of environmental variables is expected to provide a richer, more contextually accurate framework for our species distribution models.
- Decision Support: Enhanced predictive models will support more informed decision-making in ecological management and conservation planning.

Conclusion

Next Steps: - The transition to species distribution modeling represents a pivotal phase in our ongoing research. It promises to bring deeper insights into invasive species behavior and adaptation under changing climatic scenarios, ensuring that our conservation strategies are as effective and forward-looking as possible.