Analysis of Invasive Species Observations

Introduction

This R script outlines a comprehensive analysis workflow focused on understanding patterns of invasive species observations in relation to environmental factors such as climate. By integrating, transforming, and visualizing the data, the script aims to uncover insights into how the abundance and distribution of invasive species are influenced by climate variables and temporal changes.

Storyline: The overarching story revolves around the investigation of invasive species dynamics over time and across various locations, with a particular focus on how climatic factors such as temperature and precipitation influence these patterns. This analysis seeks to provide ecological insights into the resilience and spread of invasive species in response to environmental conditions, aiding in conservation efforts and the development of strategies for managing invasive populations.

# Load libraries

library(tidyverse)
library(here)
library(ggbeeswarm)
library(ggplot2)
library(ggrepel)
library(viridisLite)
library(RColorBrewer)
library(magick)
library(beepr)
library(gganimate)


# Read in the data
plotspecies <- read_csv(here("data", "new_clean_invasive_species.csv"))

Data Preparation

The R code includes several steps that prepare the invasive species and climate data for analysis. The most important steps are:

  • Merging two CSV files: Annual Summaries of Vermont Climate Trends and Invasive Species
  • Loading libraries
  • Reading data sets
  • Transforming variables
  • Creating charts
  • Summarizing observations
  • Additional transformations for plotting
  • Aggregating and summarizing data
  • Renaming columns
  • Writing data to CSV
  • Analysis with climate data
  • Sampling and filtered analysis
  • Calculations and transformations for the analysis

These steps for data preparation are fundamental for the subsequent analysis and visualization tasks and create the prerequisites for answering the research questions posed in the analysis. They show how important data manipulation, summarization and transformation are in order to gain insights from complex data sets.
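
The merging and transformation steps listed above are not shown in this script, so the following is only a minimal sketch of how they might look. It assumes the two tables share a `Year` key, that `log_species_count` is a natural log, and that `species_count_is_above_average` compares each count with the overall mean. The toy tables `species_obs` and `climate_annual` and all their values are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the two source tables (hypothetical values).
species_obs <- data.frame(
  Year = c(2019, 2019, 2020, 2020),
  site_town = c("Town A", "Town B", "Town A", "Town B"),
  invasive_name = c("Species X", "Species Y", "Species X", "Species Y"),
  species_count = c(3, 12, 5, 20),
  stringsAsFactors = FALSE
)

climate_annual <- data.frame(
  Year = c(2019, 2020),
  max_temp = c(31.2, 33.0),
  precip_mm_avg_month = c(88.5, 102.3)
)

# Attach the annual climate summary to every observation (assumed key: Year),
# then derive the helper columns that later plots rely on.
joined <- species_obs %>%
  left_join(climate_annual, by = "Year") %>%
  mutate(
    log_species_count = log(species_count), # assumes counts are > 0
    species_count_is_above_average =
      species_count > mean(species_count, na.rm = TRUE)
  )

# Writing the result is left commented out so the sketch has no side effects.
# readr::write_csv(joined, here::here("data", "joined_climate_species_data.csv"))
```

If some counts can be zero, `log1p(species_count)` may be a safer choice than `log()`, since `log(0)` returns `-Inf`.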

Exploratory Data Analysis

Observations of Invasive Species by Year and Location

Impact of Climatic Variables on Invasive Species Distribution

# Switch to a second dataset, built by merging two CSVs, for a fuller story -------------------------

# Read in the data --------------------
climate <- read_csv(here("data", "joined_climate_species_data.csv"))

Impact of climate variables on the spread of invasive species

Understanding how climatic variables affect the spread of invasive species is crucial for predicting change and implementing management strategies. To analyze these effects, I focused on scatterplots and faceted plots that relate climatic variables (such as maximum temperature and average monthly precipitation) to species counts.

# Scatter plot of species count by maximum temperature, colored by year

climate %>%
  ggplot(aes(x = max_temp, y = species_count, color = Year)) +
  geom_point() + # Add points
  geom_smooth(method = "lm", se = FALSE, color = "black") + # Linear regression line without confidence interval
  theme_minimal() + # Minimal theme for a cleaner look
  labs(
  title = "Species Count by Maximum Temperature",
  x = "Maximum Temperature (°C)",
  y = "Species Count",
  color = "Year"
  )

# Scatter plot of species count by average monthly precipitation, facet by Year

climate %>%
  ggplot(aes(x = precip_mm_avg_month, y = species_count)) +
  geom_jitter(alpha = 0.5) + # Add jitter to better distinguish overlapping points
  facet_wrap(~Year, scales = "free_y") + # Facet by Year to compare across years
  labs(
    title = "Species Count by Average Monthly Precipitation, Faceted by Year",
    x = "Average Monthly Precipitation (mm)",
    y = "Species Count"
  ) +
  theme_minimal() # Use a minimal theme for a clean appearance

# Take a random sample of 10% of the data to make the plot more manageable
set.seed(123) # Arbitrary seed so the same sample is drawn on every run
sampled_climate_data <- climate %>% slice_sample(prop = 0.1)

# Plot species count by maximum temperature with increased jitter and an adjusted theme
# Species names are added as text labels next to the points

sampled_climate_data %>% 
  ggplot(aes(x = max_temp, y = species_count, color = as.factor(species_count), label = invasive_name)) +
  geom_jitter(width = 0.4, height = 0.2, alpha = 0.5) +
  geom_text(check_overlap = TRUE, vjust = -0.5, hjust = 0.5, size = 2.5) + # Add species names as labels
  scale_color_brewer(palette = "Set1") +
  theme_minimal() +
  theme(
    text = element_text(size = 10),
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
   labs(
    title = "Sampled Species Count by Maximum Temperature",
    x = "Maximum Temperature (°C)",
    y = "Species Count"
  )

# Scatter plot of species count by average monthly precipitation, colored by maximum temperature

climate %>%
  ggplot(aes(x = precip_mm_avg_month, y = species_count)) +
  geom_point(aes(color = max_temp)) + # Color only the points so the smoother is fit once across all data
  geom_smooth(method = "loess", se = FALSE) + # Add a smooth line without the standard error shading
  theme_classic() + # Use the classic theme
  labs(
    x = "Average Monthly Precipitation (mm)",
    y = "Species Count",
    title = "Species Count by Average Monthly Precipitation, Colored by Maximum Temperature",
    color = "Max Temperature (°C)" # Legend title for the color scale
  ) +
  theme(
    text = element_text(size = 12), # Adjust the global text size
    axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x axis labels
    legend.position = "bottom" # Move the legend to the bottom
  )

Analysis

Plotting species count against maximum temperature, color-coded by year, allows me to explore the relationship between temperature and invasive species. Adding a regression line helps visualize how species counts change with temperature.

Faceting the precipitation scatterplot by year enables me to investigate how precipitation relates to invasive species counts in each year. I can identify unusual patterns, such as those in extremely wet or dry years.

Certain species tend to be more common in warmer climates, while others thrive in cooler conditions. It’s important to analyze overlapping species in specific temperature ranges.

I’ve observed a non-linear relationship between average monthly precipitation and species counts, which suggests an optimal precipitation range in which counts are highest. Peaks occur at moderate temperatures rather than extreme highs.

As precipitation increases, the data points become more widely scattered, which may indicate challenges for species dispersal during extreme weather. Outliers, especially at lower precipitation levels, suggest the influence of other factors.

This graph highlights the ecological complexity and the necessity of considering various climatic interactions when managing invasive species. It’s clear that sophisticated statistical analysis is needed to fully comprehend these relationships and develop effective management strategies.

Identifying specific climatic conditions that favor invasive species allows for the development of targeted conservation strategies. For instance, an increase in invasive species with rising temperatures might signal vulnerability to climate change, necessitating adaptive management practices.
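
As one possible first step toward the more formal statistical analysis called for above, the sketch below compares a straight-line fit with a quadratic fit that allows for a peak at moderate precipitation. The data are simulated purely for illustration and the names `toy`, `precip` and `count` are hypothetical; with the real data, `precip_mm_avg_month` and `species_count` would take their place. It is not a substitute for a model suited to count data.

```r
# Simulated data for illustration only: counts peak at moderate precipitation
# by construction, so the quadratic model should fit better here.
set.seed(42)
toy <- data.frame(precip = runif(200, min = 20, max = 200))
toy$count <- 40 - 0.004 * (toy$precip - 110)^2 + rnorm(200, sd = 2)

# Spearman rank correlation measures monotonic association, so a peaked
# relationship can give a value near zero even when the pattern is strong.
rank_cor <- cor.test(toy$precip, toy$count, method = "spearman", exact = FALSE)

# Fit a straight line and a second-degree polynomial.
fit_linear <- lm(count ~ precip, data = toy)
fit_quad <- lm(count ~ poly(precip, 2), data = toy)

# F-test for nested models: a small p-value favors the curved fit.
model_comparison <- anova(fit_linear, fit_quad)

print(rank_cor$estimate)
print(model_comparison)
```

A generalized linear model (for example a Poisson family in `glm()`) may be more appropriate for real counts; the linear models here only illustrate the comparison.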

Prevalence of Specific Invasive Species in Certain Climatic Conditions or Years

To investigate the prevalence of certain invasive species under certain climatic conditions or over different years, I used facet plots and jitter plots to visualize these relationships.

# Facet wrap of species counts by maximum temperature, separated by invasive species name

climate %>%
  ggplot(aes(x = max_temp, y = species_count, color = invasive_name)) +
  geom_jitter(alpha = 0.5) + # Use jitter to display overlapping points
  facet_wrap(~invasive_name, scales = "free_y") + # Facet by invasive species name
  labs(
    title = "Species Count by Maximum Temperature for Different Invasive Species",
    x = "Maximum Temperature (°C)",
    y = "Species Count",
    color = "Invasive Species"
  ) +
  theme_minimal() # Clean theme for better readability

# Scatter plot with jitter and color by Year, focusing on a specific invasive species
# This example filters for one species; replace "Tatarian honeysuckle" below with the species of interest
specific_species_data <- climate %>% 
  filter(invasive_name == "Tatarian honeysuckle") # Filter for the species of interest

specific_species_data %>%
  ggplot(aes(x = precip_mm_avg_month, y = species_count, color = Year)) +
  geom_jitter(alpha = 0.5) + # Jitter for better visibility
  labs(
    title = "Impact of Average Monthly Precipitation on Specific Species Count Across Years",
    x = "Average Monthly Precipitation (mm)",
    y = "Species Count",
    color = "Year"
  ) +
  theme_minimal()

Analysis

The faceted display of species numbers by maximum temperature, separated by invasive species, offers me a comprehensive overview of how various species react to temperature shifts. This visualization unveils species-specific temperature preferences or tolerances and spotlights those likely to proliferate more in warmer conditions.

The scatterplot that zooms in on a particular invasive species (here, Tatarian honeysuckle) plots species count against average monthly precipitation and is color-coded by year. This setup enables me to examine how fluctuations in rainfall across different years relate to the species’ numbers. This analysis can show whether the species thrives in wetter or drier conditions and reveal temporal trends that may suggest adaptation to changing climatic conditions.

Through these assessments, I can pinpoint environmental conditions or time periods conducive to the prevalence of specific invasive species. Such insights are pivotal for crafting targeted management and mitigation strategies aimed at curbing the spread of invasive species in response to climate change.

Detailed Data Insights

Aggregated and Summarized Data Insights

This section focuses on the aggregated and summarized insights derived from the climate dataset. Here we focus on the observations per year and location and assess the variability in these observations with error bars for the mean species numbers. Our aim is to uncover patterns that could give us insight into how climatic conditions influence the distribution and prevalence of invasive species.

# Summarizing observations per year
yearly_summary <- climate %>%
  group_by(Year) %>%
  summarise(mean_species_count = mean(species_count, na.rm = TRUE),
            sd = sd(species_count, na.rm = TRUE),
            n = sum(!is.na(species_count)), # Count only non-missing values, consistent with na.rm = TRUE
            se = sd / sqrt(n))

# Plotting mean species count per year with error bars
ggplot(yearly_summary, aes(x = Year, y = mean_species_count)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = mean_species_count - se, ymax = mean_species_count + se), width = 0.2) +
  labs(title = "Mean Species Count per Year with Error Bars", x = "Year", y = "Mean Species Count")

# Summarizing observations by site town
town_summary <- climate %>%
  group_by(site_town) %>%
  summarise(mean_species_count = mean(species_count, na.rm = TRUE),
            sd = sd(species_count, na.rm = TRUE),
            n = sum(!is.na(species_count)), # Count only non-missing values, consistent with na.rm = TRUE
            se = sd / sqrt(n))

# Plotting mean species count by site town with error bars
ggplot(town_summary, aes(x = reorder(site_town, mean_species_count), y = mean_species_count)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean_species_count - se, ymax = mean_species_count + se), width = 0.4) +
  coord_flip() +
  labs(title = "Mean Species Count by Site Town with Error Bars", x = "Site Town", y = "Mean Species Count")

# Compute the mean species count for each year and invasive species
mean_species_data <- climate %>%
  group_by(Year, invasive_name) %>%
  summarise(meanspecies = mean(species_count, na.rm = TRUE)) %>%
  ungroup()

# Create a column chart faceted by invasive species
mean_species_data %>%
  ggplot(aes(x = Year, y = meanspecies, fill = meanspecies)) + # fill (rather than color) shades the bar interiors
  geom_col() +
  labs(title = "Mean Species Count by Year and Invasive Name",
       x = "Year",
       y = "Mean Species Count",
       fill = "Mean Species Count") +
  facet_wrap(~invasive_name)  # Add faceting by invasive_name

Analysis

The diagrams and summaries provide me with a clear overview of the distribution and frequency of observations of invasive species over time and in different locations. The presentation of annual observations shows trends or fluctuations in the data, indicating periods of increased or decreased invasive activity.

As I look at observations by locality, I can see which areas are most affected by invasive species, highlighting regions that may require more targeted management measures.

Examining mean species counts with error bars gives me insight into the variability and reliability of the observed counts. This helps me identify species with consistently high or low counts and guides further research into the factors that determine these patterns.
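
To make the error-bar calculation concrete, here is a small worked sketch on a made-up vector of counts. It computes the standard error of the mean from the non-missing values and, as an optional extension that the plots in this script do not use, an approximate 95% confidence interval based on the t distribution.

```r
# Toy counts for one group; values are made up for illustration.
counts <- c(2, 4, 4, 6, NA)

n <- sum(!is.na(counts))                  # number of non-missing observations
m <- mean(counts, na.rm = TRUE)           # sample mean
se <- sd(counts, na.rm = TRUE) / sqrt(n)  # standard error of the mean

# Approximate 95% confidence interval using the t distribution.
t_crit <- qt(0.975, df = n - 1)
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)

print(c(mean = m, se = se))
print(ci)
```

Error bars of plus or minus one standard error are narrower than a 95% interval, so it helps to state in the caption which of the two a plot shows.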

The faceted bar chart allows me to quickly see which species are most prevalent in certain years and observe general trends over the entire time frame of the study. I can also determine whether certain invasive species are becoming more common or less common, indicating changes in the ecosystem or the effectiveness of management strategies. However, such a detailed chart can become overcrowded, especially when many species or years are included, so interactive visualizations or additional data filtering may be needed for clearer insights.
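
One way to do the additional filtering mentioned above is to keep only the most frequently recorded species before faceting. The sketch below uses a made-up table and a hypothetical cutoff `n_keep`; with the real data, `climate` would replace `toy`.

```r
library(dplyr)

# Toy data for illustration only.
toy <- data.frame(
  Year = rep(c(2019, 2020), each = 4),
  invasive_name = rep(c("Species A", "Species B", "Species C", "Species D"), times = 2),
  species_count = c(10, 3, 7, 1, 12, 2, 9, 1),
  stringsAsFactors = FALSE
)

# Rank species by total count and keep the top n_keep (hypothetical cutoff).
n_keep <- 2
top_species <- toy %>%
  group_by(invasive_name) %>%
  summarise(total = sum(species_count, na.rm = TRUE), .groups = "drop") %>%
  slice_max(total, n = n_keep, with_ties = FALSE) %>%
  pull(invasive_name)

# Restrict the data to those species; the result can be passed to the
# faceted plot in place of the full table.
toy_top <- toy %>% filter(invasive_name %in% top_species)
```

Ranking by total count is only one possible criterion; the number of sites where a species occurs would be an equally reasonable choice.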

Through this aggregated and summarized data, I gain deeper insight into overall trends, variability of observations, and general patterns in invasive species monitoring. This informs our strategies for managing invasive species and protecting ecosystems.

Understanding these trends is critical for predicting potential future impacts of invasive species and for evaluating the success of past and ongoing management efforts. This aggregated view can highlight successes in controlling invasive species or, conversely, highlight where efforts may need to be intensified.

Effects of Environmental Conditions on Species Counts in Specific Sites or Towns

To investigate how local conditions affect the spread of invasive species, I focus on selected sites and on observations with higher-than-average species counts. This means selecting subsets of the data based on certain criteria (e.g. specific sites or towns, above-average species counts) and then plotting these subsets to observe patterns or trends related to environmental conditions. Focusing on specific locations in this way can reveal unique local patterns and suggest potential management strategies.


# First, calculate the average species count across the entire dataset for comparison
average_species_count <- mean(climate$species_count, na.rm = TRUE)

# Filter data for favorite sites or those with above-average species counts
# Replace 'favorite_sites' with actual site names of interest
favorite_sites <- c("eagle point", "Cotton Mill") # Example site names
filtered_data <- climate %>%
  filter(site_town %in% favorite_sites | species_count > average_species_count)

# Plotting species counts for these selected sites/towns, colored by environmental condition (e.g., max_temp)
ggplot(filtered_data, aes(x = site_town, y = species_count, color = max_temp)) +
  geom_jitter(alpha = 0.5) +
  theme_minimal() +
  labs(
    title = "Species Counts in Selected Sites/Towns Colored by Maximum Temperature",
    x = "Site/Town",
    y = "Species Count",
    color = "Max Temp (°C)"
  )

# Additionally, if looking into precipitation effects, you can change 'color = max_temp' to 'color = precip_mm_avg_month'

Analysis

In this analysis, I examine the relationship between environmental conditions and invasive species counts in specific locations, especially those with above-average counts. By calculating the average species count and filtering the data for particular sites or for above-average counts, my aim is to uncover the environmental preferences of invasive species. Plotting this filtered data, colored by an environmental factor such as maximum temperature, offers insights into how different conditions influence species distribution and abundance.

This approach is vital for understanding the ecological dynamics of invasive species spread, crafting targeted management strategies, and forecasting the impact of climate change on biodiversity. Essentially, it underscores the pivotal role of local environmental conditions in shaping biological communities and stresses the significance of tailored conservation measures.

Identifying Patterns, Outliers, and Informing Management Strategies

To recognize patterns and outliers and to inform management strategies based on the climate dataset, I use a mixture of boxplots, violin plots, scatterplots and histograms. These visualizations help show the distribution of species counts, the influence of climatic variables and any anomalies.


# Combination Boxplot and Scatter Plot

climate %>%
  filter(species_count_is_above_average == "TRUE") %>%
  ggplot(aes(x = site_town, y = log_species_count)) +
  geom_boxplot() +
  geom_point(aes(color = as.factor(Year))) +
  coord_flip() +
  labs(title = "Log-Transformed Species Count by Site Town, Colored by Year", 
       x = "Site Town", 
       y = "Log-transformed Species Count",
       color = "Year")

# Violin plot of species counts by year to observe distribution patterns
ggplot(climate, aes(x = factor(Year), y = species_count)) +
  geom_violin(fill = "lightblue") +
  labs(title = "Species Counts by Year", x = "Year", y = "Species Count")

# Animated histogram of species counts, showing the distribution one year at a time
ggplot(climate, aes(x = species_count, group = Year)) +
  geom_histogram(binwidth = 1, fill = "darkgreen", color = "black") +
  labs(title = "Distribution of Species Counts in Year: {round(frame_time)}", x = "Species Count", y = "Frequency") + # round() avoids fractional years in interpolated frames
  theme_minimal() +
  transition_time(Year) + # Animate over 'Year'
  ease_aes('linear')

Analysis

The log transformation compresses the wide range of counts and reveals patterns that remain hidden in the raw values, especially when counts differ widely between sites. Through this visualization, I can discern how species counts vary across different sites, with year-specific color coding revealing trends over time. Some sites exhibit consistent patterns across the years, while others display spikes or declines in counts, possibly signaling ecological events or the outcomes of management efforts.

The violin plot, segmented by year, shows the distribution of species counts in each year, enabling me to track shifts over time. This offers insights into whether certain years experienced more conducive conditions for the proliferation of invasive species.

Furthermore, the histogram, animated by year, gives an overview of how species counts are distributed. A skewed distribution might indicate a significantly higher concentration of species in a few locations, warranting targeted management strategies.
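
The plots above show outliers visually but do not list them. A minimal sketch of flagging them numerically follows, using the 1.5 × IQR rule (the same convention `geom_boxplot()` uses by default for points drawn beyond the whiskers). The counts are made up for illustration; with the real data, `climate$species_count` would take their place.

```r
# Toy counts with one unusually large value; made up for illustration.
counts <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)

# First and third quartiles and the interquartile range.
q <- quantile(counts, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]

# Flag values more than 1.5 * IQR below the first or above the third quartile.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
is_outlier <- counts < lower | counts > upper

print(counts[is_outlier])
```

Flagged values are candidates for a closer look (data entry errors, unusual sites or years), not automatic removals.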

Conclusion

My analysis of invasive species and their relationship to climatic conditions in different locations and years reveals important implications for invasive species management. In particular, I found that higher temperatures and fluctuating precipitation are associated with an increase in the number of invasive species, highlighting the exacerbating effect of climate change on the challenges posed by invasive species.

However, it is important to recognize the limitations of my study. The crowding of data points in some visualizations compromised clarity and obscured patterns. In addition, by using only the available climate and species count data, other influencing factors such as habitat characteristics or human-induced changes may have been overlooked. The observational nature of my dataset also limits the causal determination between environmental conditions and the occurrence of invasive species.

The lack of data in important columns further complicated the analysis and made it necessary to rely on an R-Ladies video and this week’s reading material to try to create a vibrant story. Addressing these limitations will help me improve the reliability of future research.

Sources: