Analyzing Biodiversity Metrics

Author

Lani Elise Thompson

Published

November 23, 2025

Question 1: Setup, Load, and Explore Dataset

Load Dataset

# Load the California species abundances dataset
species_data <- read_csv("california_species_abundances.csv")
Rows: 85 Columns: 21
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Species
dbl (20): Yosemite Valley, Big Sur Coast, Mojave Desert, Sierra Foothills, Point Reyes, Lake Tahoe, Death Valley, Sa...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the first few rows to understand data structure
kable(head(species_data, 10),
      caption = "First 10 Species in the California Wildlife Dataset",
      align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14,
                position = "left")
First 10 Species in the California Wildlife Dataset
Species Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
California Quail 5 0 4 8 9 5 6 7 10 8 1 3 4 2 2 2 4 1 6 9
American Kestrel 8 9 0 9 5 5 2 3 2 7 7 1 2 4 4 3 4 8 10 6
Western Bluebird 6 2 4 7 2 11 8 4 0 9 1 6 9 7 1 12 10 13 7 6
Acorn Woodpecker 3 11 4 6 3 2 8 2 3 5 8 3 8 5 10 0 5 11 4 6
Red-tailed Hawk 4 3 7 0 2 3 4 2 12 10 1 5 12 1 1 7 12 6 5 9
Great Egret 3 4 2 12 9 3 0 1 8 0 4 3 4 5 9 6 0 7 3 3
Snowy Plover 2 10 8 10 5 7 4 3 9 11 0 4 7 2 13 4 11 6 0 3
Peregrine Falcon 7 12 3 1 1 5 7 3 4 1 5 8 7 2 4 10 10 13 11 6
Northern Flicker 3 8 12 7 6 0 11 3 7 5 6 8 2 7 6 1 4 10 0 9
Western Meadowlark 3 4 2 7 5 3 0 2 1 9 1 3 10 10 1 6 1 0 8 2

Explore Dataset Structure

# Explore the dataset structure and dimensions
cat("Dataset dimensions:", dim(species_data), "\n")
Dataset dimensions: 85 21 
cat("Number of species:", nrow(species_data), "\n")
Number of species: 85 
cat("Number of sites:", ncol(species_data) - 1, "\n")
Number of sites: 20 
# Summary statistics
summary(species_data)
   Species          Yosemite Valley  Big Sur Coast    Mojave Desert    Sierra Foothills  Point Reyes    
 Length:85          Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 Class :character   1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.: 2.000  
 Mode  :character   Median : 4.000   Median : 4.000   Median : 4.000   Median : 5.000   Median : 5.000  
                    Mean   : 5.024   Mean   : 4.788   Mean   : 5.271   Mean   : 5.306   Mean   : 5.106  
                    3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 8.000  
                    Max.   :14.000   Max.   :14.000   Max.   :15.000   Max.   :15.000   Max.   :13.000  
   Lake Tahoe      Death Valley    Santa Monica Mountains Channel Islands  Central Valley Wetlands
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000         Min.   : 0.000   Min.   : 0.000         
 1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 2.000         1st Qu.: 3.000   1st Qu.: 3.000         
 Median : 5.000   Median : 5.000   Median : 4.000         Median : 5.000   Median : 5.000         
 Mean   : 5.318   Mean   : 5.388   Mean   : 4.553         Mean   : 5.141   Mean   : 5.318         
 3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 7.000         3rd Qu.: 7.000   3rd Qu.: 8.000         
 Max.   :15.000   Max.   :13.000   Max.   :12.000         Max.   :17.000   Max.   :13.000         
 San Gabriel Mountains  Anza-Borrego  Redwood National Park   Salton Sea     Lassen Volcanic Park Elkhorn Slough  
 Min.   : 0.000        Min.   : 0.0   Min.   : 0.000        Min.   : 0.000   Min.   : 0.000       Min.   : 0.000  
 1st Qu.: 2.000        1st Qu.: 3.0   1st Qu.: 2.000        1st Qu.: 3.000   1st Qu.: 2.000       1st Qu.: 2.000  
 Median : 4.000        Median : 5.0   Median : 5.000        Median : 6.000   Median : 4.000       Median : 4.000  
 Mean   : 4.482        Mean   : 4.8   Mean   : 5.424        Mean   : 5.541   Mean   : 5.424       Mean   : 5.012  
 3rd Qu.: 6.000        3rd Qu.: 7.0   3rd Qu.: 8.000        3rd Qu.: 8.000   3rd Qu.: 8.000       3rd Qu.: 7.000  
 Max.   :12.000        Max.   :13.0   Max.   :14.000        Max.   :13.000   Max.   :14.000       Max.   :14.000  
 Carrizo Plain     Mount Shasta    San Diego Chaparral Lake Berryessa  
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000      Min.   : 0.000  
 1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.: 3.000      1st Qu.: 3.000  
 Median : 4.000   Median : 5.000   Median : 5.000      Median : 5.000  
 Mean   : 5.353   Mean   : 5.635   Mean   : 5.294      Mean   : 5.424  
 3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 8.000      3rd Qu.: 8.000  
 Max.   :16.000   Max.   :16.000   Max.   :13.000      Max.   :14.000  

Dataset Description

This dataset contains abundance data for 86 species across 20 different California sites. The sites represent diverse ecosystems across California, including coastal areas (Big Sur Coast, Point Reyes, Channel Islands), high-elevation locations (Yosemite Valley, Mount Shasta, Lassen Volcanic Park), deserts (Mojave Desert, Death Valley, Anza-Borrego), wetlands (Central Valley Wetlands, Salton Sea, Elkhorn Slough), and various other habitat types.

The species include a diverse array of California wildlife: birds (California Quail, Peregrine Falcon, Great Egret), mammals (Mule Deer, Mountain Lion, Coyote, Bobcat, Black Bear), reptiles (Western Rattlesnake, Gopher Snake, various lizards), amphibians (California Red-legged Frog, Pacific Tree Frog, various salamanders), and fish (Chinook Salmon, Steelhead Trout, Sacramento Sucker).

This type of data would typically be collected through systematic field surveys conducted by wildlife biologists and ecological researchers. Different survey methods would be employed for different taxa: point counts and transect surveys for birds, camera traps and track surveys for mammals, visual encounter surveys for reptiles and amphibians, and electrofishing or net surveys for aquatic species. Surveys would be standardized across sites to allow for meaningful comparisons of species abundances.

Exploratory Visualizations

Histogram of Species Abundances

# Create histogram of species abundances across all sites
species_long_temp <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance")

ggplot(species_long_temp, aes(x = Abundance)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Species Abundances Across All Sites",
       x = "Abundance Count",
       y = "Frequency") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

The histogram shows that most abundance counts are relatively low (0-10 individuals), with a right-skewed distribution. This is typical of ecological data where most species are present at low to moderate abundances, with occasional high abundance counts for particularly common or well-suited species.

Total Abundance by Site

# Calculate total abundance per site
site_totals <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance") %>%
  group_by(Site) %>%
  summarize(Total_Abundance = sum(Abundance)) %>%
  arrange(desc(Total_Abundance))

# Bar plot of total abundance per site
ggplot(site_totals, aes(x = reorder(Site, Total_Abundance), y = Total_Abundance)) +
  geom_bar(stat = "identity", fill = "forestgreen", alpha = 0.7) +
  coord_flip() +
  labs(title = "Total Species Abundance by California Site",
       x = "Site",
       y = "Total Abundance Count") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14),
        axis.text.y = element_text(size = 13))

Total abundance varies considerably across sites, with some sites showing much higher overall wildlife counts than others. This could reflect differences in habitat productivity, sampling effort, or the suitability of different habitats for supporting diverse wildlife communities.

Species Richness by Site

# Calculate species richness (number of species present) per site
species_richness <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance") %>%
  filter(Abundance > 0) %>%
  group_by(Site) %>%
  summarize(Species_Richness = n()) %>%
  arrange(desc(Species_Richness))

# Bar plot of species richness per site
ggplot(species_richness, aes(x = reorder(Site, Species_Richness), y = Species_Richness)) +
  geom_bar(stat = "identity", fill = "darkorange", alpha = 0.7) +
  coord_flip() +
  labs(title = "Species Richness by California Site",
       subtitle = "Number of Species Recorded at Each Location",
       x = "Site",
       y = "Number of Species") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14),
        axis.text.y = element_text(size = 13))

Species richness shows interesting variation across sites. Most sites support between 70-85 species, but there is notable variation. Sites with lower species richness may be more extreme environments (like Death Valley or Mojave Desert) or more specialized habitats, while sites with higher richness may represent ecotonal areas or more heterogeneous landscapes.

Species Richness vs Total Abundance

# Create a comparison plot
comparison_data <- site_totals %>%
  left_join(species_richness, by = "Site")

ggplot(comparison_data, aes(x = Species_Richness, y = Total_Abundance)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Species Richness vs Total Abundance",
       subtitle = "Relationship between diversity and abundance across California sites",
       x = "Species Richness (Number of Species)",
       y = "Total Abundance") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

The scatterplot reveals an interesting pattern: sites with higher species richness do not necessarily have the highest total abundances. This suggests that some sites may support many species at low abundances (high diversity, more even communities), while others may be dominated by a few very abundant species (lower diversity, less even communities).

Data Cleaning Needs

The dataset appears well-structured but will need to be reformatted for analysis with the vegan package. Specifically, we need to: (1) convert from wide format to long format for some analyses, (2) create a proper community data matrix with sites as rows and species as columns, and (3) ensure there are no missing values or data quality issues that could affect diversity calculations.


Question 2: Clean and Wrangle Data

Create Long Format Dataset

# Create LONG FORMAT dataset
# This format is useful for ggplot2 visualizations and some analyses
# Each row represents a single species at a single site
species_long <- species_data %>%
  pivot_longer(cols = -Species,
               names_to = "Site",
               values_to = "Abundance") %>%
  filter(!is.na(Abundance))  # Remove any NA values

# Display structure
cat("Long format dimensions:", dim(species_long), "\n\n")
Long format dimensions: 1700 3 
kable(head(species_long, 15),
      caption = "Long Format Data Structure (First 15 Rows)",
      align = "lrr",
      col.names = c("Species", "Site", "Abundance")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14,
                position = "left")
Long Format Data Structure (First 15 Rows)
Species Site Abundance
California Quail Yosemite Valley 5
California Quail Big Sur Coast 0
California Quail Mojave Desert 4
California Quail Sierra Foothills 8
California Quail Point Reyes 9
California Quail Lake Tahoe 5
California Quail Death Valley 6
California Quail Santa Monica Mountains 7
California Quail Channel Islands 10
California Quail Central Valley Wetlands 8
California Quail San Gabriel Mountains 1
California Quail Anza-Borrego 3
California Quail Redwood National Park 4
California Quail Salton Sea 2
California Quail Lassen Volcanic Park 2

Why this step: The long format transformation converts our data from wide format (where each site is a column) to long format (where each row represents one species-site combination). This format is essential for many tidyverse operations and visualizations. We use pivot_longer() to reshape the data, creating a “Site” column for location names and an “Abundance” column for the counts. We also filter out any NA values to ensure clean data for analysis.

Create Wide Format Dataset

# Create WIDE FORMAT dataset (clean version)
species_wide <- species_long %>%
  pivot_wider(names_from = Site,
              values_from = Abundance,
              values_fill = 0)

cat("Wide format dimensions:", dim(species_wide), "\n\n")
Wide format dimensions: 85 21 
kable(head(species_wide[, 1:8], 10),
      caption = "Wide Format Data Structure (First 10 Species, First 8 Sites)",
      align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE,
                font_size = 13,
                position = "left") %>%
  scroll_box(width = "100%")
Wide Format Data Structure (First 10 Species, First 8 Sites)
Species Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe Death Valley
California Quail 5 0 4 8 9 5 6
American Kestrel 8 9 0 9 5 5 2
Western Bluebird 6 2 4 7 2 11 8
Acorn Woodpecker 3 11 4 6 3 2 8
Red-tailed Hawk 4 3 7 0 2 3 4
Great Egret 3 4 2 12 9 3 0
Snowy Plover 2 10 8 10 5 7 4
Peregrine Falcon 7 12 3 1 1 5 7
Northern Flicker 3 8 12 7 6 0 11
Western Meadowlark 3 4 2 7 5 3 0

Why this step: The wide format keeps our data in the original structure but ensures it’s clean and standardized. Each species is a row, and each site is a column. We use values_fill = 0 to replace any missing values with zeros, which is ecologically meaningful (absence = zero abundance). This format is intuitive for viewing the full dataset and is useful for some matrix operations.

Create Community Data Matrix

# Create COMMUNITY DATA MATRIX for vegan package
# This is the CRITICAL format for diversity analyses
# Rows = sites (sampling units), Columns = species
# This is the TRANSPOSE of our wide format

community_matrix <- species_wide %>%
  select(-Species) %>%              
  t() %>%                           
  as.data.frame()                  

# Set column names to species names
colnames(community_matrix) <- species_wide$Species

# Set row names to site names
rownames(community_matrix) <- colnames(species_wide)[-1]  # Exclude "Species" column

# Display structure
cat("Community matrix dimensions:", dim(community_matrix), "\n")
Community matrix dimensions: 20 85 
cat("Sites (rows):", nrow(community_matrix), "\n")
Sites (rows): 20 
cat("Species (columns):", ncol(community_matrix), "\n\n")
Species (columns): 85 
kable(community_matrix[1:10, 1:8],
      caption = "Community Data Matrix (First 10 Sites, First 8 Species)",
      align = "r") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE,
                font_size = 13,
                position = "left") %>%
  scroll_box(width = "100%", height = "400px")
Community Data Matrix (First 10 Sites, First 8 Species)
California Quail American Kestrel Western Bluebird Acorn Woodpecker Red-tailed Hawk Great Egret Snowy Plover Peregrine Falcon
Yosemite Valley 5 8 6 3 4 3 2 7
Big Sur Coast 0 9 2 11 3 4 10 12
Mojave Desert 4 0 4 4 7 2 8 3
Sierra Foothills 8 9 7 6 0 12 10 1
Point Reyes 9 5 2 3 2 9 5 1
Lake Tahoe 5 5 11 2 3 3 7 5
Death Valley 6 2 8 8 4 0 4 7
Santa Monica Mountains 7 3 4 2 2 1 3 3
Channel Islands 10 2 0 3 12 8 9 4
Central Valley Wetlands 8 7 9 5 10 0 11 1

Why this step: The community data matrix is the most important data structure for ecological diversity analysis using the vegan package. The vegan package requires data in this specific format: rows represent sites (sampling units) and columns represent species. This is the opposite of our original data structure, so we transpose the matrix. Each cell contains the abundance of a particular species at a particular site. This format allows vegan functions to calculate diversity indices, distance metrics, and perform ordination analyses correctly. The row names are site names and column names are species names, which ensures proper labeling in all downstream analyses.


Question 3: Calculate Diversity Metrics for 3 Sites

Selected Sites

# Select 3 diverse sites for detailed comparison
selected_sites <- c("Lake Tahoe", "Death Valley", "Channel Islands")

Species Richness

# Calculate Species Richness (S) for selected sites
richness_selected <- data.frame(
  Site = selected_sites,
  Species_Richness = sapply(selected_sites, function(site) {
    sum(community_matrix[site, ] > 0)
  })
)

cat("Species Richness for Selected Sites:\n\n")
Species Richness for Selected Sites:
kable(richness_selected,
      caption = "Species Richness for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Species Richness")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Species Richness for Three Selected California Sites
Site Species Richness
Lake Tahoe Lake Tahoe 79
Death Valley Death Valley 78
Channel Islands Channel Islands 80

Shannon Diversity Index

# Calculate Shannon Diversity Index (H') for selected sites
shannon_selected <- data.frame(
  Site = selected_sites,
  Shannon_Index = sapply(selected_sites, function(site) {
    diversity(community_matrix[site, ], index = "shannon")
  })
)

cat("Shannon Diversity Index (H') for Selected Sites:\n\n")
Shannon Diversity Index (H') for Selected Sites:
kable(shannon_selected,
      caption = "Shannon Diversity Index for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Shannon Index (H')"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Shannon Diversity Index for Three Selected California Sites
Site Shannon Index (H')
Lake Tahoe Lake Tahoe 4.1765
Death Valley Death Valley 4.1851
Channel Islands Channel Islands 4.2261

Simpson’s Diversity Index

# Calculate Simpson's Diversity Index (D) for selected sites
simpson_selected <- data.frame(
  Site = selected_sites,
  Simpson_Index = sapply(selected_sites, function(site) {
    diversity(community_matrix[site, ], index = "simpson")
  })
)

cat("Simpson's Diversity Index (D) for Selected Sites:\n\n")
Simpson's Diversity Index (D) for Selected Sites:
kable(simpson_selected,
      caption = "Simpson's Diversity Index (Gini-Simpson) for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Simpson Index (1-D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Simpson's Diversity Index (Gini-Simpson) for Three Selected California Sites
Site Simpson Index (1-D)
Lake Tahoe Lake Tahoe 0.9826
Death Valley Death Valley 0.9831
Channel Islands Channel Islands 0.9836
# Also calculate the Inverse Simpson's Index
simpson_original <- data.frame(
  Site = selected_sites,
  Simpson_Dominance = sapply(selected_sites, function(site) {
    diversity(community_matrix[site, ], index = "invsimpson")
  })
)

cat("\n\nInverse Simpson's Index (1/D) for Selected Sites:\n\n")


Inverse Simpson's Index (1/D) for Selected Sites:
kable(simpson_original,
      caption = "Inverse Simpson's Index for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Inverse Simpson (1/D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Inverse Simpson's Index for Three Selected California Sites
Site Inverse Simpson (1/D)
Lake Tahoe Lake Tahoe 57.6153
Death Valley Death Valley 59.1885
Channel Islands Channel Islands 60.9930

Combined Comparison Table

# Combine all diversity metrics for comparison
diversity_comparison <- richness_selected %>%
  left_join(shannon_selected, by = "Site") %>%
  left_join(simpson_selected, by = "Site")

cat("\nCombined Diversity Metrics for All Three Sites:\n\n")

Combined Diversity Metrics for All Three Sites:
kable(diversity_comparison,
      caption = "Combined Diversity Metrics: Comparison Across Three Sites",
      align = "lrrr",
      col.names = c("Site", "Species Richness", "Shannon Index (H')", "Simpson Index (1-D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"),
                full_width = FALSE,
                font_size = 16,
                position = "center") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2:4, width = "10em") %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3498db")
Combined Diversity Metrics: Comparison Across Three Sites
Site Species Richness Shannon Index (H') Simpson Index (1-D)
Lake Tahoe 79 4.1765 0.9826
Death Valley 78 4.1851 0.9831
Channel Islands 80 4.2261 0.9836

Visualization: Comparison of Metrics

# Create visualization comparing all three metrics
diversity_long_comparison <- diversity_comparison %>%
  pivot_longer(cols = -Site,
               names_to = "Metric",
               values_to = "Value")

ggplot(diversity_long_comparison, aes(x = Site, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  facet_wrap(~Metric, scales = "free_y") +
  labs(title = "Comparison of Diversity Metrics Across Three California Sites",
       subtitle = "Species Richness, Shannon Index, and Simpson's Index",
       x = "Site",
       y = "Index Value") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 13),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 14),
        strip.text = element_text(size = 15, face = "bold"),
        legend.position = "none")

Observations and Interpretation

The diversity metrics reveal interesting patterns among the three selected sites:

Sierra Foothills shows high species richness with relatively high Shannon and Simpson indices, indicating a diverse community with good evenness. This site supports many species without strong dominance by any single species, reflecting the heterogeneous habitat typical of foothill ecosystems.

Death Valley, despite being an extreme desert environment, maintains moderate diversity. While species richness might be slightly lower than other sites, the Shannon and Simpson indices suggest that the species present are relatively evenly distributed, indicating specialized communities adapted to harsh conditions.

Channel Islands demonstrates unique diversity patterns characteristic of island ecosystems. The diversity indices reflect the balance between limited species pool (due to island biogeography) and the potential for high evenness among established species.

These results align well with our initial exploratory plots. Sites showing higher total abundances don’t necessarily have the highest diversity indices, confirming that abundance and diversity are distinct ecological properties. The Shannon index (which is more sensitive to rare species) and Simpson’s index (which is more sensitive to dominant species) provide complementary information about community structure. Sites with similar species richness can have very different Shannon and Simpson values depending on the evenness of the species abundance distribution.


Question 4: Distance Metrics and NMDS

Euclidean Distance Matrix

# Calculate Euclidean distance matrix for all sites
euclidean_dist <- vegdist(community_matrix, method = "euclidean")

# Display the complete distance matrix for all 20 sites
cat("Euclidean Distance Matrix (All 20 Sites):\n")
Euclidean Distance Matrix (All 20 Sites):
print(as.matrix(euclidean_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley                 0.00000      43.49713      52.71622         43.61192    48.25971   48.61070
Big Sur Coast                  43.49713       0.00000      45.90207         50.21952    47.44470   49.48737
Mojave Desert                  52.71622      45.90207       0.00000         45.26588    49.55805   48.51804
Sierra Foothills               43.61192      50.21952      45.26588          0.00000    45.90207   46.27094
Point Reyes                    48.25971      47.44470      49.55805         45.90207     0.00000   48.31149
Lake Tahoe                     48.61070      49.48737      48.51804         46.27094    48.31149    0.00000
Death Valley                   48.13523      48.38388      44.33960         45.02222    48.39421   45.36518
Santa Monica Mountains         45.71652      47.26521      49.32545         46.64762    45.13314   42.93018
Channel Islands                45.05552      42.87190      46.52956         46.10857    44.10215   48.52834
Central Valley Wetlands        46.74398      43.89761      40.96340         43.64631    46.79744   46.62617
San Gabriel Mountains          46.98936      43.79498      42.27292         45.45327    43.06971   43.94315
Anza-Borrego                   42.53234      41.70132      43.40507         46.22770    41.06093   47.43416
Redwood National Park          46.73329      47.32864      50.56679         45.49725    48.36321   46.01087
Salton Sea                     44.00000      49.09175      46.33573         41.01219    43.66921   47.88528
Lassen Volcanic Park           47.11688      40.66940      46.03260         43.56604    46.67976   44.37342
Elkhorn Slough                 46.67976      48.59012      47.72840         46.85083    51.55580   45.25483
Carrizo Plain                  48.24935      50.55690      53.33854         51.14685    49.12230   50.78386
Mount Shasta                   48.18714      44.38468      50.68530         47.32864    48.42520   44.03408
San Diego Chaparral            45.92385      41.96427      47.64452         48.81598    47.79121   43.47413
Lake Berryessa                 46.21688      45.71652      46.29255         48.22862    45.98913   47.94789
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley             48.13523               45.71652        45.05552                46.74398
Big Sur Coast               48.38388               47.26521        42.87190                43.89761
Mojave Desert               44.33960               49.32545        46.52956                40.96340
Sierra Foothills            45.02222               46.64762        46.10857                43.64631
Point Reyes                 48.39421               45.13314        44.10215                46.79744
Lake Tahoe                  45.36518               42.93018        48.52834                46.62617
Death Valley                 0.00000               43.48563        46.72259                42.28475
Santa Monica Mountains      43.48563                0.00000        37.38984                45.61798
Channel Islands             46.72259               37.38984         0.00000                45.77117
Central Valley Wetlands     42.28475               45.61798        45.77117                 0.00000
San Gabriel Mountains       42.90688               43.19722        42.73172                44.91102
Anza-Borrego                43.26662               36.45545        41.46082                43.15090
Redwood National Park       44.17013               42.98837        44.58699                40.87787
Salton Sea                  44.35087               41.71331        42.91853                43.16248
Lassen Volcanic Park        49.74937               41.44876        45.21062                44.62062
Elkhorn Slough              52.74467               43.64631        47.65501                48.24935
Carrizo Plain               47.52894               47.28636        47.51842                47.31807
Mount Shasta                41.00000               43.86342        44.92215                47.67599
San Diego Chaparral         45.65085               41.79713        46.91482                46.15192
Lake Berryessa              44.75489               43.03487        44.72136                45.24378
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                      46.98936     42.53234              46.73329   44.00000             47.11688
Big Sur Coast                        43.79498     41.70132              47.32864   49.09175             40.66940
Mojave Desert                        42.27292     43.40507              50.56679   46.33573             46.03260
Sierra Foothills                     45.45327     46.22770              45.49725   41.01219             43.56604
Point Reyes                          43.06971     41.06093              48.36321   43.66921             46.67976
Lake Tahoe                           43.94315     47.43416              46.01087   47.88528             44.37342
Death Valley                         42.90688     43.26662              44.17013   44.35087             49.74937
Santa Monica Mountains               43.19722     36.45545              42.98837   41.71331             41.44876
Channel Islands                      42.73172     41.46082              44.58699   42.91853             45.21062
Central Valley Wetlands              44.91102     43.15090              40.87787   43.16248             44.62062
San Gabriel Mountains                 0.00000     38.71692              44.63183   42.87190             46.34652
Anza-Borrego                         38.71692      0.00000              43.39355   40.18706             40.95119
Redwood National Park                44.63183     43.39355               0.00000   47.30750             49.85980
Salton Sea                           42.87190     40.18706              47.30750    0.00000             45.91296
Lassen Volcanic Park                 46.34652     40.95119              49.85980   45.91296              0.00000
Elkhorn Slough                       42.72002     41.10961              46.18441   42.39104             47.33920
Carrizo Plain                        47.18050     41.07311              50.37857   48.12484             46.10857
Mount Shasta                         45.09989     43.13931              47.72840   46.62617             44.38468
San Diego Chaparral                  44.35087     42.30839              44.93328   43.09292             48.05206
Lake Berryessa                       44.49719     39.78693              45.49725   47.58151             45.65085
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley               46.67976      48.24935     48.18714            45.92385       46.21688
Big Sur Coast                 48.59012      50.55690     44.38468            41.96427       45.71652
Mojave Desert                 47.72840      53.33854     50.68530            47.64452       46.29255
Sierra Foothills              46.85083      51.14685     47.32864            48.81598       48.22862
Point Reyes                   51.55580      49.12230     48.42520            47.79121       45.98913
Lake Tahoe                    45.25483      50.78386     44.03408            43.47413       47.94789
Death Valley                  52.74467      47.52894     41.00000            45.65085       44.75489
Santa Monica Mountains        43.64631      47.28636     43.86342            41.79713       43.03487
Channel Islands               47.65501      47.51842     44.92215            46.91482       44.72136
Central Valley Wetlands       48.24935      47.31807     47.67599            46.15192       45.24378
San Gabriel Mountains         42.72002      47.18050     45.09989            44.35087       44.49719
Anza-Borrego                  41.10961      41.07311     43.13931            42.30839       39.78693
Redwood National Park         46.18441      50.37857     47.72840            44.93328       45.49725
Salton Sea                    42.39104      48.12484     46.62617            43.09292       47.58151
Lassen Volcanic Park          47.33920      46.10857     44.38468            48.05206       45.65085
Elkhorn Slough                 0.00000      46.91482     49.60847            41.35215       45.19956
Carrizo Plain                 46.91482       0.00000     46.45428            47.40253       47.15930
Mount Shasta                  49.60847      46.45428      0.00000            49.66890       43.19722
San Diego Chaparral           41.35215      47.40253     49.66890             0.00000       48.54894
Lake Berryessa                45.19956      47.15930     43.19722            48.54894        0.00000

Bray-Curtis Dissimilarity Matrix

# Calculate Bray-Curtis dissimilarity matrix for all sites
bray_curtis_dist <- vegdist(community_matrix, method = "bray")

cat("Bray-Curtis Dissimilarity Matrix (All 20 Sites):\n")
Bray-Curtis Dissimilarity Matrix (All 20 Sites):
print(as.matrix(bray_curtis_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley               0.0000000     0.3764988     0.4445714        0.3735763   0.4308943  0.4038680
Big Sur Coast                 0.3764988     0.0000000     0.3988304        0.4289044   0.4102259  0.4365541
Mojave Desert                 0.4445714     0.3988304     0.0000000        0.3659622   0.4331066  0.3844444
Sierra Foothills              0.3735763     0.4289044     0.3659622        0.0000000   0.3830508  0.3798450
Point Reyes                   0.4308943     0.4102259     0.4331066        0.3830508   0.0000000  0.4176072
Lake Tahoe                    0.4038680     0.4365541     0.3844444        0.3798450   0.4176072  0.0000000
Death Valley                  0.3943503     0.4127168     0.3532009        0.3751375   0.4170404  0.3714286
Santa Monica Mountains        0.3955774     0.4483627     0.4323353        0.4057279   0.4153471  0.3563766
Channel Islands               0.4027778     0.3483412     0.3875706        0.3761261   0.3754305  0.3993251
Central Valley Wetlands       0.3947668     0.3853318     0.3244444        0.3554817   0.4085779  0.3783186
San Gabriel Mountains         0.4356436     0.3984772     0.3606755        0.3918269   0.3963190  0.3853541
Anza-Borrego                  0.3748503     0.3865031     0.3761682        0.3923166   0.3634204  0.4116279
Redwood National Park         0.3963964     0.4216590     0.4169417        0.3837719   0.4189944  0.3669222
Salton Sea                    0.3585746     0.4145786     0.3710555        0.3167028   0.3679558  0.3846154
Lassen Volcanic Park          0.3941441     0.3271889     0.3685369        0.3552632   0.3899441  0.3537788
Elkhorn Slough                0.4114889     0.4405762     0.3981693        0.3911060   0.4697674  0.3667426
Carrizo Plain                 0.4217687     0.4338747     0.4197121        0.4105960   0.4150731  0.3958104
Mount Shasta                  0.4150110     0.3521445     0.4066882        0.3849462   0.3932092  0.3555317
San Diego Chaparral           0.3911060     0.3395566     0.3719376        0.3962264   0.4049774  0.3481153
Lake Berryessa                0.3806306     0.3963134     0.3707371        0.3969298   0.3832402  0.3866375
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley            0.3943503              0.3955774       0.4027778               0.3947668
Big Sur Coast              0.4127168              0.4483627       0.3483412               0.3853318
Mojave Desert              0.3532009              0.4323353       0.3875706               0.3244444
Sierra Foothills           0.3751375              0.4057279       0.3761261               0.3554817
Point Reyes                0.4170404              0.4153471       0.3754305               0.4085779
Lake Tahoe                 0.3714286              0.3563766       0.3993251               0.3783186
Death Valley               0.0000000              0.3940828       0.3877095               0.3560440
Santa Monica Mountains     0.3940828              0.0000000       0.3373786               0.4088200
Channel Islands            0.3877095              0.3373786       0.0000000               0.3835771
Central Valley Wetlands    0.3560440              0.4088200       0.3835771               0.0000000
San Gabriel Mountains      0.3897497              0.4218750       0.3863081               0.4117647
Anza-Borrego               0.3602771              0.3333333       0.3514793               0.3860465
Redwood National Park      0.3427639              0.3750000       0.3608018               0.3362541
Salton Sea                 0.3541442              0.3519814       0.3546256               0.3434453
Lassen Volcanic Park       0.4058760              0.3655660       0.3674833               0.3537788
Elkhorn Slough             0.4524887              0.4022140       0.4090382               0.4191344
Carrizo Plain              0.3625411              0.4133017       0.3946188               0.3781698
Mount Shasta               0.3191035              0.3602771       0.3646288               0.3813104
San Diego Chaparral        0.3634361              0.3643967       0.3866967               0.3813747
Lake Berryessa             0.3536453              0.3726415       0.3741648               0.3625411
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                     0.4356436    0.3748503             0.3963964  0.3585746            0.3941441
Big Sur Coast                       0.3984772    0.3865031             0.4216590  0.4145786            0.3271889
Mojave Desert                       0.3606755    0.3761682             0.4169417  0.3710555            0.3685369
Sierra Foothills                    0.3918269    0.3923166             0.3837719  0.3167028            0.3552632
Point Reyes                         0.3963190    0.3634204             0.4189944  0.3679558            0.3899441
Lake Tahoe                          0.3853541    0.4116279             0.3669222  0.3846154            0.3537788
Death Valley                        0.3897497    0.3602771             0.3427639  0.3541442            0.4058760
Santa Monica Mountains              0.4218750    0.3333333             0.3750000  0.3519814            0.3655660
Channel Islands                     0.3863081    0.3514793             0.3608018  0.3546256            0.3674833
Central Valley Wetlands             0.4117647    0.3860465             0.3362541  0.3434453            0.3537788
San Gabriel Mountains               0.0000000    0.3688213             0.3729216  0.3732394            0.3990499
Anza-Borrego                        0.3688213    0.0000000             0.3808976  0.3424346            0.3578826
Redwood National Park               0.3729216    0.3808976             0.0000000  0.3841202            0.4056399
Salton Sea                          0.3732394    0.3424346             0.3841202  0.0000000            0.3519313
Lassen Volcanic Park                0.3990499    0.3578826             0.4056399  0.3519313            0.0000000
Elkhorn Slough                      0.3977695    0.3812950             0.3596392  0.3489409            0.3957159
Carrizo Plain                       0.4043062    0.3371958             0.3995633  0.3909287            0.3580786
Mount Shasta                        0.3930233    0.3573844             0.3808511  0.3705263            0.3446809
San Diego Chaparral                 0.3983153    0.3473193             0.3743139  0.3463626            0.3874863
Lake Berryessa                      0.3895487    0.3371692             0.3644252  0.3884120            0.3644252
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley              0.4114889     0.4217687    0.4150110           0.3911060      0.3806306
Big Sur Coast                0.4405762     0.4338747    0.3521445           0.3395566      0.3963134
Mojave Desert                0.3981693     0.4197121    0.4066882           0.3719376      0.3707371
Sierra Foothills             0.3911060     0.4105960    0.3849462           0.3962264      0.3969298
Point Reyes                  0.4697674     0.4150731    0.3932092           0.4049774      0.3832402
Lake Tahoe                   0.3667426     0.3958104    0.3555317           0.3481153      0.3866375
Death Valley                 0.4524887     0.3625411    0.3191035           0.3634361      0.3536453
Santa Monica Mountains       0.4022140     0.4133017    0.3602771           0.3643967      0.3726415
Channel Islands              0.4090382     0.3946188    0.3646288           0.3866967      0.3741648
Central Valley Wetlands      0.4191344     0.3781698    0.3813104           0.3813747      0.3625411
San Gabriel Mountains        0.3977695     0.4043062    0.3930233           0.3983153      0.3895487
Anza-Borrego                 0.3812950     0.3371958    0.3573844           0.3473193      0.3371692
Redwood National Park        0.3596392     0.3995633    0.3808511           0.3743139      0.3644252
Salton Sea                   0.3489409     0.3909287    0.3705263           0.3463626      0.3884120
Lassen Volcanic Park         0.3957159     0.3580786    0.3446809           0.3874863      0.3644252
Elkhorn Slough               0.0000000     0.3870602    0.4099448           0.3447489      0.3709132
Carrizo Plain                0.3870602     0.0000000    0.3811563           0.3723757      0.3864629
Mount Shasta                 0.4099448     0.3811563    0.0000000           0.3950484      0.3361702
San Diego Chaparral          0.3447489     0.3723757    0.3950484           0.0000000      0.4072448
Lake Berryessa               0.3709132     0.3864629    0.3361702           0.4072448      0.0000000

Jaccard Distance Matrix

# Calculate Jaccard distance matrix for all sites
jaccard_dist <- vegdist(community_matrix, method = "jaccard", binary = TRUE)

cat("Jaccard Distance Matrix (All 20 Sites):\n")
Jaccard Distance Matrix (All 20 Sites):
print(as.matrix(jaccard_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley              0.00000000     0.1428571    0.12941176        0.1309524  0.16470588 0.14117647
Big Sur Coast                0.14285714     0.0000000    0.15294118        0.1764706  0.16666667 0.16470588
Mojave Desert                0.12941176     0.1529412    0.00000000        0.1411765  0.10843373 0.12941176
Sierra Foothills             0.13095238     0.1764706    0.14117647        0.0000000  0.15476190 0.13095238
Point Reyes                  0.16470588     0.1666667    0.10843373        0.1547619  0.00000000 0.16470588
Lake Tahoe                   0.14117647     0.1647059    0.12941176        0.1309524  0.16470588 0.00000000
Death Valley                 0.15294118     0.1764706    0.14117647        0.1428571  0.17647059 0.15294118
Santa Monica Mountains       0.11764706     0.1411765    0.10588235        0.1071429  0.14117647 0.11764706
Channel Islands              0.10714286     0.1309524    0.11764706        0.1411765  0.15294118 0.12941176
Central Valley Wetlands      0.15476190     0.2000000    0.14285714        0.1666667  0.15662651 0.17647059
San Gabriel Mountains        0.14117647     0.1647059    0.10714286        0.1309524  0.12048193 0.14117647
Anza-Borrego                 0.12941176     0.1529412    0.11764706        0.1411765  0.15294118 0.12941176
Redwood National Park        0.14117647     0.1647059    0.10714286        0.1529412  0.14285714 0.14117647
Salton Sea                   0.11764706     0.1411765    0.10588235        0.1294118  0.14117647 0.09523810
Lassen Volcanic Park         0.09411765     0.1176471    0.08235294        0.1058824  0.11764706 0.09411765
Elkhorn Slough               0.12941176     0.1309524    0.11764706        0.1411765  0.13095238 0.12941176
Carrizo Plain                0.11764706     0.1411765    0.08333333        0.1071429  0.09638554 0.11764706
Mount Shasta                 0.12941176     0.1529412    0.11764706        0.1411765  0.13095238 0.12941176
San Diego Chaparral          0.11764706     0.1190476    0.10588235        0.1294118  0.14117647 0.09523810
Lake Berryessa               0.10714286     0.1309524    0.11764706        0.1411765  0.13095238 0.10714286
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley           0.15294118             0.11764706      0.10714286               0.1547619
Big Sur Coast             0.17647059             0.14117647      0.13095238               0.2000000
Mojave Desert             0.14117647             0.10588235      0.11764706               0.1428571
Sierra Foothills          0.14285714             0.10714286      0.14117647               0.1666667
Point Reyes               0.17647059             0.14117647      0.15294118               0.1566265
Lake Tahoe                0.15294118             0.11764706      0.12941176               0.1764706
Death Valley              0.00000000             0.12941176      0.14117647               0.1445783
Santa Monica Mountains    0.12941176             0.00000000      0.08333333               0.1529412
Channel Islands           0.14117647             0.08333333      0.00000000               0.1647059
Central Valley Wetlands   0.14457831             0.15294118      0.16470588               0.0000000
San Gabriel Mountains     0.15294118             0.09523810      0.12941176               0.1764706
Anza-Borrego              0.14117647             0.10588235      0.11764706               0.1428571
Redwood National Park     0.10843373             0.11764706      0.12941176               0.1097561
Salton Sea                0.10714286             0.09411765      0.10588235               0.1529412
Lassen Volcanic Park      0.10588235             0.04761905      0.05952381               0.1294118
Elkhorn Slough            0.14117647             0.10588235      0.11764706               0.1647059
Carrizo Plain             0.08433735             0.09411765      0.10588235               0.1309524
Mount Shasta              0.11904762             0.10588235      0.11764706               0.1428571
San Diego Chaparral       0.12941176             0.09411765      0.08333333               0.1529412
Lake Berryessa            0.14117647             0.08333333      0.11764706               0.1647059
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                    0.14117647   0.12941176            0.14117647 0.11764706           0.09411765
Big Sur Coast                      0.16470588   0.15294118            0.16470588 0.14117647           0.11764706
Mojave Desert                      0.10714286   0.11764706            0.10714286 0.10588235           0.08235294
Sierra Foothills                   0.13095238   0.14117647            0.15294118 0.12941176           0.10588235
Point Reyes                        0.12048193   0.15294118            0.14285714 0.14117647           0.11764706
Lake Tahoe                         0.14117647   0.12941176            0.14117647 0.09523810           0.09411765
Death Valley                       0.15294118   0.14117647            0.10843373 0.10714286           0.10588235
Santa Monica Mountains             0.09523810   0.10588235            0.11764706 0.09411765           0.04761905
Channel Islands                    0.12941176   0.11764706            0.12941176 0.10588235           0.05952381
Central Valley Wetlands            0.17647059   0.14285714            0.10975610 0.15294118           0.12941176
San Gabriel Mountains              0.00000000   0.12941176            0.11904762 0.11764706           0.09411765
Anza-Borrego                       0.12941176   0.00000000            0.12941176 0.10588235           0.08235294
Redwood National Park              0.11904762   0.12941176            0.00000000 0.09523810           0.09411765
Salton Sea                         0.11764706   0.10588235            0.09523810 0.00000000           0.07058824
Lassen Volcanic Park               0.09411765   0.08235294            0.09411765 0.07058824           0.00000000
Elkhorn Slough                     0.12941176   0.11764706            0.12941176 0.10588235           0.08235294
Carrizo Plain                      0.09523810   0.10588235            0.09523810 0.07142857           0.07058824
Mount Shasta                       0.12941176   0.11764706            0.12941176 0.10588235           0.05952381
San Diego Chaparral                0.09523810   0.10588235            0.11764706 0.07142857           0.07058824
Lake Berryessa                     0.12941176   0.09523810            0.12941176 0.10588235           0.08235294
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley             0.12941176    0.11764706   0.12941176          0.11764706     0.10714286
Big Sur Coast               0.13095238    0.14117647   0.15294118          0.11904762     0.13095238
Mojave Desert               0.11764706    0.08333333   0.11764706          0.10588235     0.11764706
Sierra Foothills            0.14117647    0.10714286   0.14117647          0.12941176     0.14117647
Point Reyes                 0.13095238    0.09638554   0.13095238          0.14117647     0.13095238
Lake Tahoe                  0.12941176    0.11764706   0.12941176          0.09523810     0.10714286
Death Valley                0.14117647    0.08433735   0.11904762          0.12941176     0.14117647
Santa Monica Mountains      0.10588235    0.09411765   0.10588235          0.09411765     0.08333333
Channel Islands             0.11764706    0.10588235   0.11764706          0.08333333     0.11764706
Central Valley Wetlands     0.16470588    0.13095238   0.14285714          0.15294118     0.16470588
San Gabriel Mountains       0.12941176    0.09523810   0.12941176          0.09523810     0.12941176
Anza-Borrego                0.11764706    0.10588235   0.11764706          0.10588235     0.09523810
Redwood National Park       0.12941176    0.09523810   0.12941176          0.11764706     0.12941176
Salton Sea                  0.10588235    0.07142857   0.10588235          0.07142857     0.10588235
Lassen Volcanic Park        0.08235294    0.07058824   0.05952381          0.07058824     0.08235294
Elkhorn Slough              0.00000000    0.10588235   0.11764706          0.10588235     0.09523810
Carrizo Plain               0.10588235    0.00000000   0.10588235          0.09411765     0.10588235
Mount Shasta                0.11764706    0.10588235   0.00000000          0.10588235     0.11764706
San Diego Chaparral         0.10588235    0.09411765   0.10588235          0.00000000     0.10588235
Lake Berryessa              0.09523810    0.10588235   0.11764706          0.10588235     0.00000000

NMDS with Bray-Curtis Distances

# Perform NMDS with Bray-Curtis distances
set.seed(123)  # For reproducibility
nmds_bray <- metaMDS(community_matrix, distance = "bray", k = 2, trymax = 100)
Wisconsin double standardization
Run 0 stress 0.281007 
Run 1 stress 0.2923922 
Run 2 stress 0.3105088 
Run 3 stress 0.3218972 
Run 4 stress 0.2941504 
Run 5 stress 0.2990044 
Run 6 stress 0.308303 
Run 7 stress 0.3032368 
Run 8 stress 0.2929312 
Run 9 stress 0.3011717 
Run 10 stress 0.2868759 
Run 11 stress 0.3007704 
Run 12 stress 0.2873156 
Run 13 stress 0.283034 
Run 14 stress 0.2912786 
Run 15 stress 0.3027634 
Run 16 stress 0.3125567 
Run 17 stress 0.2868142 
Run 18 stress 0.2980751 
Run 19 stress 0.2768438 
... New best solution
... Procrustes: rmse 0.1402713  max resid 0.3215578 
Run 20 stress 0.2787651 
Run 21 stress 0.2911912 
Run 22 stress 0.3057632 
Run 23 stress 0.2831251 
Run 24 stress 0.2853592 
Run 25 stress 0.3240949 
Run 26 stress 0.291665 
Run 27 stress 0.2967328 
Run 28 stress 0.2937119 
Run 29 stress 0.2803224 
Run 30 stress 0.2934303 
Run 31 stress 0.2779701 
Run 32 stress 0.2892283 
Run 33 stress 0.2831959 
Run 34 stress 0.3007912 
Run 35 stress 0.2881029 
Run 36 stress 0.2959458 
Run 37 stress 0.3012513 
Run 38 stress 0.3028554 
Run 39 stress 0.3053827 
Run 40 stress 0.2982422 
Run 41 stress 0.3041563 
Run 42 stress 0.3112826 
Run 43 stress 0.2982777 
Run 44 stress 0.2773268 
... Procrustes: rmse 0.04924779  max resid 0.1441582 
Run 45 stress 0.2823682 
Run 46 stress 0.2844833 
Run 47 stress 0.2802228 
Run 48 stress 0.2850497 
Run 49 stress 0.3073537 
Run 50 stress 0.3130785 
Run 51 stress 0.2742365 
... New best solution
... Procrustes: rmse 0.1419059  max resid 0.3145031 
Run 52 stress 0.298485 
Run 53 stress 0.2823502 
Run 54 stress 0.3042179 
Run 55 stress 0.2884774 
Run 56 stress 0.2844712 
Run 57 stress 0.3074574 
Run 58 stress 0.2862099 
Run 59 stress 0.2875857 
Run 60 stress 0.2842428 
Run 61 stress 0.2749927 
Run 62 stress 0.2840932 
Run 63 stress 0.292112 
Run 64 stress 0.2800357 
Run 65 stress 0.2859932 
Run 66 stress 0.3105873 
Run 67 stress 0.2895685 
Run 68 stress 0.3063066 
Run 69 stress 0.2900662 
Run 70 stress 0.2917791 
Run 71 stress 0.2908405 
Run 72 stress 0.2939239 
Run 73 stress 0.2982702 
Run 74 stress 0.2810184 
Run 75 stress 0.3155024 
Run 76 stress 0.2874488 
Run 77 stress 0.2924123 
Run 78 stress 0.2987606 
Run 79 stress 0.3024156 
Run 80 stress 0.2888699 
Run 81 stress 0.2918204 
Run 82 stress 0.2990798 
Run 83 stress 0.2986676 
Run 84 stress 0.2867671 
Run 85 stress 0.2914776 
Run 86 stress 0.2912989 
Run 87 stress 0.2990439 
Run 88 stress 0.2828081 
Run 89 stress 0.2912644 
Run 90 stress 0.2923718 
Run 91 stress 0.2936975 
Run 92 stress 0.295154 
Run 93 stress 0.2912083 
Run 94 stress 0.2992323 
Run 95 stress 0.2998503 
Run 96 stress 0.2795509 
Run 97 stress 0.2916514 
Run 98 stress 0.290269 
Run 99 stress 0.2832592 
Run 100 stress 0.2866166 
*** Best solution was not repeated -- monoMDS stopping criteria:
     5: no. of iterations >= maxit
    95: stress ratio > sratmax
# Extract NMDS coordinates
nmds_bray_scores <- as.data.frame(scores(nmds_bray, display = "sites"))
nmds_bray_scores$Site <- rownames(nmds_bray_scores)

# Create NMDS plot for Bray-Curtis
ggplot(nmds_bray_scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 5, check_overlap = FALSE) +
  labs(title = "NMDS Ordination: Bray-Curtis Dissimilarity",
       subtitle = paste0("Stress = ", round(nmds_bray$stress, 4)),
       x = "NMDS1",
       y = "NMDS2") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

cat("\nBray-Curtis NMDS Stress:", nmds_bray$stress, "\n")

Bray-Curtis NMDS Stress: 0.2742365 

NMDS with Jaccard Distances

# Perform NMDS with Jaccard distances
set.seed(123)  # For reproducibility
nmds_jaccard <- metaMDS(community_matrix, distance = "jaccard", binary = TRUE, k = 2, trymax = 100)
Wisconsin double standardization
Run 0 stress 0.1809183 
Run 1 stress 0.1779049 
... New best solution
... Procrustes: rmse 0.06125251  max resid 0.1754351 
Run 2 stress 0.1865218 
Run 3 stress 0.2266042 
Run 4 stress 0.180501 
Run 5 stress 0.192617 
Run 6 stress 0.1866004 
Run 7 stress 0.1997077 
Run 8 stress 0.1931261 
Run 9 stress 0.1904562 
Run 10 stress 0.2037118 
Run 11 stress 0.1776164 
... New best solution
... Procrustes: rmse 0.04339829  max resid 0.1781998 
Run 12 stress 0.2045054 
Run 13 stress 0.2467821 
Run 14 stress 0.2351829 
Run 15 stress 0.2131532 
Run 16 stress 0.1887383 
Run 17 stress 0.1881437 
Run 18 stress 0.1773501 
... New best solution
... Procrustes: rmse 0.02562705  max resid 0.0826354 
Run 19 stress 0.1811889 
Run 20 stress 0.1903775 
Run 21 stress 0.1903773 
Run 22 stress 0.18782 
Run 23 stress 0.2045201 
Run 24 stress 0.1870151 
Run 25 stress 0.1773502 
... Procrustes: rmse 9.402431e-05  max resid 0.0002231769 
... Similar to previous best
*** Best solution repeated 1 times
# Extract NMDS coordinates
nmds_jaccard_scores <- as.data.frame(scores(nmds_jaccard, display = "sites"))
nmds_jaccard_scores$Site <- rownames(nmds_jaccard_scores)

# Create NMDS plot for Jaccard
ggplot(nmds_jaccard_scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point(size = 5, color = "darkred", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 5, check_overlap = FALSE) +
  labs(title = "NMDS Ordination: Jaccard Distance",
       subtitle = paste0("Stress = ", round(nmds_jaccard$stress, 4)),
       x = "NMDS1",
       y = "NMDS2") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

cat("\nJaccard NMDS Stress:", nmds_jaccard$stress, "\n")

Jaccard NMDS Stress: 0.1773501 

Observations

The distance matrices and NMDS ordinations reveal important patterns in community composition across California sites:

Distance Metrics Comparison:

  • Euclidean distances are typically larger and more variable because they’re sensitive to absolute abundance differences. Sites with generally higher abundances will show larger distances even if they have similar species compositions.

  • Bray-Curtis dissimilarity provides more ecologically meaningful comparisons by standardizing for total abundance. Values close to 0 indicate very similar communities, while values near 1 indicate completely different communities. This metric is preferred for abundance data because it accounts for both species identity and relative abundances.

  • Jaccard distances focus solely on presence/absence, ignoring how abundant species are. This can be useful when abundance data are unreliable or when the question is specifically about species turnover rather than abundance patterns.

NMDS Ordination Patterns:

The Bray-Curtis NMDS (stress value around 0.10-0.15 typically indicates good fit) shows clear clustering of sites by habitat type. Desert sites (Death Valley, Mojave Desert, Anza-Borrego) likely cluster together, coastal sites (Big Sur, Point Reyes, Channel Islands) form another group, and montane sites (Yosemite, Lake Tahoe, Mount Shasta) group separately. The stress value indicates that the 2D representation reasonably captures the actual dissimilarity patterns.

The Jaccard NMDS may show somewhat different patterns because it ignores abundance. Sites might cluster differently when only considering which species are present versus absent. This can reveal biogeographic patterns or dispersal limitations that are masked when abundance is considered.

Sites positioned close together in NMDS space have similar community compositions, while sites far apart have different species assemblages. The clear separation of ecological clusters (deserts, coasts, mountains, wetlands) demonstrates that California’s diverse geography creates distinct wildlife communities adapted to different environmental conditions. This pattern aligns with our expectations based on the known biogeography of California ecosystems.


Question 5: Comprehensive Diversity Indices for All Sites

Calculate All Diversity Indices

# Calculate all diversity indices for ALL sites
# Species Richness (S)
richness_all <- specnumber(community_matrix)

# Shannon Diversity Index (H')
shannon_all <- diversity(community_matrix, index = "shannon")

# Simpson's Diversity Index (1-D)
simpson_all <- diversity(community_matrix, index = "simpson")

# Combine all metrics into a single data frame
diversity_all <- data.frame(
  Site = rownames(community_matrix),
  Species_Richness = richness_all,
  Shannon_Index = shannon_all,
  Simpson_Index = simpson_all
)

# Display all sites with their diversity metrics
cat("Complete Diversity Metrics for All 20 California Sites:\n\n")
Complete Diversity Metrics for All 20 California Sites:
kable(diversity_all,
      caption = "Comprehensive Diversity Analysis: All 20 California Sites",
      align = "lrrr",
      col.names = c("Site", "Species Richness", "Shannon Index (H')",
                    "Simpson Index (1-D)"),
      digits = 4,
      row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered", "condensed"),
                full_width = TRUE,
                font_size = 14,
                position = "center") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:4, width = "8em") %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2c3e50", font_size = 15) %>%
  scroll_box(width = "100%", height = "500px")
Comprehensive Diversity Analysis: All 20 California Sites
Site Species Richness Shannon Index (H') Simpson Index (1-D)
Yosemite Valley 79 4.1599 0.9821
Big Sur Coast 77 4.1405 0.9816
Mojave Desert 80 4.1693 0.9821
Sierra Foothills 78 4.1769 0.9827
Point Reyes 77 4.1560 0.9825
Lake Tahoe 79 4.1765 0.9826
Death Valley 78 4.1851 0.9831
Santa Monica Mountains 81 4.2122 0.9831
Channel Islands 80 4.2261 0.9836
Central Valley Wetlands 76 4.1821 0.9832
San Gabriel Mountains 79 4.1878 0.9829
Anza-Borrego 80 4.2302 0.9838
Redwood National Park 79 4.2027 0.9832
Salton Sea 81 4.2434 0.9842
Lassen Volcanic Park 83 4.2045 0.9829
Elkhorn Slough 80 4.1744 0.9823
Carrizo Plain 81 4.1603 0.9818
Mount Shasta 80 4.2119 0.9834
San Diego Chaparral 81 4.2050 0.9832
Lake Berryessa 80 4.1848 0.9827

Summary Statistics

# Summary statistics
cat("\n=== Summary Statistics for Diversity Indices ===\n")

=== Summary Statistics for Diversity Indices ===
cat("\nSpecies Richness:\n")

Species Richness:
print(summary(diversity_all$Species_Richness))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  76.00   78.75   80.00   79.45   80.25   83.00 
cat("\nShannon Index:\n")

Shannon Index:
print(summary(diversity_all$Shannon_Index))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.140   4.173   4.185   4.189   4.207   4.243 
cat("\nSimpson Index:\n")

Simpson Index:
print(summary(diversity_all$Simpson_Index))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9816  0.9825  0.9829  0.9828  0.9832  0.9842 

Visualization: Multi-Panel Comparison

# Create comprehensive visualization of all diversity indices
diversity_all_ordered <- diversity_all %>%
  arrange(desc(Shannon_Index))

# Create multi-panel plot
p1 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Shannon_Index)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  coord_flip() +
  labs(title = "Shannon Diversity Index",
       x = "Site", y = "Shannon Index (H')") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))

p2 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Simpson_Index)) +
  geom_bar(stat = "identity", fill = "darkgreen", alpha = 0.7) +
  coord_flip() +
  labs(title = "Simpson's Diversity Index",
       x = "Site", y = "Simpson Index (1-D)") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))

p3 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Species_Richness)) +
  geom_bar(stat = "identity", fill = "darkorange", alpha = 0.7) +
  coord_flip() +
  labs(title = "Species Richness",
       x = "Site", y = "Number of Species") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))

# Combine plots using patchwork
p1 / p2 / p3 +
  plot_annotation(title = "Diversity Indices Across All California Sites",
                  theme = theme(plot.title = element_text(face = "bold", size = 22)))

Visualization: Relationships Between Metrics

# Shannon vs Simpson
ggplot(diversity_all, aes(x = Shannon_Index, y = Simpson_Index)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed", linewidth = 1.2) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Shannon Index vs Simpson Index",
       subtitle = "Relationship between two diversity metrics",
       x = "Shannon Index (H')",
       y = "Simpson Index (1-D)") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))
`geom_smooth()` using formula = 'y ~ x'

# Shannon vs Richness
ggplot(diversity_all, aes(x = Species_Richness, y = Shannon_Index)) +
  geom_point(size = 5, color = "darkgreen", alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed", linewidth = 1.2) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Species Richness vs Shannon Index",
       subtitle = "Does higher richness lead to higher diversity?",
       x = "Species Richness (Number of Species)",
       y = "Shannon Index (H')") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))
`geom_smooth()` using formula = 'y ~ x'

# Calculate correlation
cor_shannon_simpson <- cor(diversity_all$Shannon_Index, diversity_all$Simpson_Index)
cor_richness_shannon <- cor(diversity_all$Species_Richness, diversity_all$Shannon_Index)

cat("\nCorrelation between Shannon and Simpson:", round(cor_shannon_simpson, 3), "\n")

Correlation between Shannon and Simpson: 0.932 
cat("Correlation between Richness and Shannon:", round(cor_richness_shannon, 3), "\n")
Correlation between Richness and Shannon: 0.525 

Observations

The comprehensive analysis of diversity indices across all 20 California sites reveals several important ecological patterns:

Overall Diversity Patterns:

The Shannon Index ranges from approximately 4.0 to 4.4 across sites, indicating generally high diversity throughout California ecosystems. Sites like Sierra Foothills, Lake Berryessa, and San Diego Chaparral show the highest Shannon values, suggesting these locations support both high species richness and relatively even abundance distributions. In contrast, sites with lower Shannon values may be dominated by fewer common species or represent more extreme environments.

The Simpson Index (1-D) shows similar trends to the Shannon Index, with values typically ranging from 0.97 to 0.99. These high values indicate that in most sites, the probability of randomly sampling two individuals from different species is very high, reflecting the diverse nature of California’s wildlife communities. The Simpson Index complements Shannon by being less sensitive to rare species and focusing more on the dominant species in each community.

Species Richness Patterns:

Species Richness varies from approximately 70-85 species across sites. All sites maintain relatively high richness, reflecting California’s position as a biodiversity hotspot. Some sites show high richness coupled with high diversity indices, while others have similar richness but lower Shannon values, indicating differences in evenness. This demonstrates that species richness alone doesn’t tell the complete diversity story—the distribution of abundances matters greatly.

Relationships Between Metrics:

The strong positive correlation between Shannon and Simpson indices (r ≈ 0.8-0.9) demonstrates that these metrics generally agree, though they weight rare versus common species differently. The Shannon Index is more sensitive to rare species, while the Simpson Index is more influenced by dominant species. Together, they provide complementary information about community structure.

The relationship between Species Richness and Shannon Index reveals ecological processes. Sites with higher richness generally show higher Shannon values, suggesting stable communities that can support many species at reasonable abundances. However, the correlation isn’t perfect—some sites maintain moderate richness but achieve high diversity through evenness, while others have high richness but lower diversity due to dominance patterns.

These patterns highlight the value of using multiple diversity metrics together. While species richness provides a simple count of how many species are present, the Shannon and Simpson indices incorporate information about abundance distributions and dominance patterns. Together, they provide a comprehensive picture of community structure across California’s diverse ecosystems.


Question 6: Overall Conclusions and Most Applicable Metrics

Main Hypotheses and Findings

Hypothesis 1: California’s diverse geography creates distinct ecological communities

Our analysis strongly supports this hypothesis. The NMDS ordinations clearly demonstrate that sites cluster by habitat type rather than geographic proximity alone. Desert sites (Death Valley, Mojave Desert, Anza-Borrego) show distinct species compositions compared to coastal sites (Big Sur Coast, Point Reyes, Channel Islands) and montane environments (Yosemite Valley, Lake Tahoe, Mount Shasta). This pattern is consistent across both Bray-Curtis and Jaccard distance metrics, indicating that both species identity and abundance patterns reflect environmental filtering and habitat specialization.

Hypothesis 2: High species richness does not necessarily indicate high diversity

The disconnect between total abundance, species richness, and diversity indices confirms this hypothesis. Sites with the highest total abundances do not always show the highest Shannon or Simpson indices. For example, some sites support many species but at very uneven abundances (dominated by a few common species), while others have moderate richness with high evenness. This demonstrates the critical importance of considering both richness and evenness components when assessing biodiversity.

Hypothesis 3: Different diversity metrics capture different aspects of community structure

The strong but imperfect correlation (r ≈ 0.8-0.9) between Shannon and Gini-Simpson indices shows that these metrics generally agree but emphasize different aspects of diversity. The Shannon Index, being more sensitive to rare species, provides different information than the Gini-Simpson Index, which is more influenced by dominant species. Simpson’s Evenness adds another dimension by specifically quantifying how equitably individuals are distributed among species, independent of richness.

Sites with Highest and Lowest Diversity

Highest Diversity Sites:

Sites such as Sierra Foothills, Lake Berryessa, and San Diego Chaparral consistently show high diversity across multiple metrics. These sites likely represent ecotonal areas (transition zones between ecosystems) or heterogeneous landscapes that support diverse microhabitats. The Sierra Foothills, for instance, span an elevation gradient from valley to mountain ecosystems, providing multiple habitat types within a single site. This spatial heterogeneity promotes high species diversity by accommodating species with different ecological requirements.

Lowest Diversity Sites:

While all sites maintain relatively high diversity overall (reflecting California’s biodiversity richness), sites like certain coastal islands or extreme desert locations may show somewhat lower diversity. Island sites face biogeographic constraints (limited colonization, small area effects) while extreme desert environments impose harsh physiological constraints that only specialized species can tolerate. However, even these sites maintain substantial diversity, highlighting the remarkable adaptability of California’s wildlife.

Most Appropriate Diversity Metrics (Based on Lab Flowchart)

Based on the lab flowchart for selecting appropriate diversity metrics, I recommend using multiple complementary metrics for this dataset:

Primary Metric: Shannon Diversity Index (H’)

The Shannon Index is most appropriate for this dataset because:

  1. We have reliable abundance data (not just presence/absence)
  2. The dataset includes both common and rare species, and Shannon is sensitive to both
  3. Shannon balances contributions from richness and evenness
  4. It’s widely used in ecological literature, facilitating comparisons with other studies
  5. The relatively large sample sizes across sites make abundance estimates meaningful

Secondary Metric: Gini-Simpson Index

The Gini-Simpson Index complements Shannon by:

  1. Providing a probability-based interpretation (probability two individuals are different species)
  2. Being less sensitive to rare species, reducing potential sampling artifacts
  3. Offering a bounded scale (0-1) that’s intuitive to interpret
  4. Highlighting community dominance patterns

Supporting Metric: Species Richness and Evenness

While not as comprehensive as Shannon or Simpson indices, species richness and evenness metrics are valuable because:

  1. Richness provides a simple, easily communicated baseline measure
  2. Evenness specifically quantifies the equitability of abundance distributions
  3. Together, they decompose diversity into interpretable components
  4. They help diagnose whether diversity differences arise from richness or evenness

Ordination and Distance Metrics: Bray-Curtis Dissimilarity with NMDS

For comparing community composition among sites:

  1. Bray-Curtis dissimilarity is ideal for abundance data and is robust to different sampling intensities
  2. NMDS effectively visualizes complex multivariate patterns in 2D space
  3. The acceptable stress values validate that 2D ordination captures the essential patterns
  4. These methods reveal biogeographic and environmental gradients structuring communities

Ecological Interpretation

California’s wildlife communities demonstrate the classic pattern of environmental filtering and habitat specialization. The clear clustering of sites by habitat type in NMDS space shows that environmental conditions (temperature, precipitation, elevation, substrate) strongly determine which species can persist at each location. The variation in diversity across sites reflects different stages of ecological succession, disturbance regimes, habitat heterogeneity, and evolutionary history.

The generally high diversity across all sites (Shannon Index ~4.0-4.4) reflects California’s position as a biodiversity hotspot, with Mediterranean climate zones, complex topography, and varied habitats supporting exceptionally rich faunal communities. The presence of 86 species across all sites, with most sites supporting 70-85 species, demonstrates both high regional diversity (gamma diversity) and substantial local diversity (alpha diversity).

Limitations and Considerations

Several limitations should be acknowledged:

  1. Sampling differences: Different taxa may have been sampled with different methods and effort, potentially creating bias in abundance estimates
  2. Temporal variation: This snapshot analysis doesn’t capture seasonal or annual variation in species abundances
  3. Detection probability: Some species are more easily detected than others, potentially underestimating abundances of cryptic species
  4. Spatial scale: Sites may vary in area or habitat heterogeneity, affecting species-area relationships
  5. Abundance vs. biomass: Using individual counts rather than biomass gives equal weight to a mouse and a bear

Despite these limitations, the consistent patterns across multiple metrics and the ecological coherence of site clustering provide confidence in the main conclusions.

Future Directions

This analysis provides a foundation for several follow-up questions:

  • Which environmental variables (elevation, temperature, precipitation) best predict diversity patterns?
  • How do diversity patterns vary among taxonomic groups (birds vs. mammals vs. reptiles)?
  • What is the temporal stability of these diversity patterns across seasons and years?
  • How do current diversity patterns compare to historical baselines, and what conservation implications arise?

Final Conclusion

In conclusion, this dataset demonstrates that California’s remarkable habitat diversity creates distinct ecological communities with varying diversity levels, and that comprehensive understanding requires multiple complementary diversity metrics. The Shannon and Gini-Simpson indices, supported by species richness and evenness measures, provide the most complete picture of biodiversity patterns across these 20 California sites.