Analyzing Biodiversity Metrics

Author

Lani Elise Thompson

Published

November 25, 2025

Question 1: Setup, Load, and Explore Dataset

Load Dataset

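library(tidyverse)   # read_csv(), pivot_longer(), dplyr verbs, ggplot2
library(vegan)       # diversity(), vegdist()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), column_spec(), scroll_box()

species_data <- read_csv("california_species_abundances.csv")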
Rows: 85 Columns: 21
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Species
dbl (20): Yosemite Valley, Big Sur Coast, Mojave Desert, Sierra Foothills, Point Reyes, Lake Tahoe, Death Valley, Sa...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
kable(head(species_data, 10),
      caption = "First 10 Species in the California Wildlife Dataset",
      align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14,
                position = "left")
First 10 Species in the California Wildlife Dataset
Species Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
California Quail 5 0 4 8 9 5 6 7 10 8 1 3 4 2 2 2 4 1 6 9
American Kestrel 8 9 0 9 5 5 2 3 2 7 7 1 2 4 4 3 4 8 10 6
Western Bluebird 6 2 4 7 2 11 8 4 0 9 1 6 9 7 1 12 10 13 7 6
Acorn Woodpecker 3 11 4 6 3 2 8 2 3 5 8 3 8 5 10 0 5 11 4 6
Red-tailed Hawk 4 3 7 0 2 3 4 2 12 10 1 5 12 1 1 7 12 6 5 9
Great Egret 3 4 2 12 9 3 0 1 8 0 4 3 4 5 9 6 0 7 3 3
Snowy Plover 2 10 8 10 5 7 4 3 9 11 0 4 7 2 13 4 11 6 0 3
Peregrine Falcon 7 12 3 1 1 5 7 3 4 1 5 8 7 2 4 10 10 13 11 6
Northern Flicker 3 8 12 7 6 0 11 3 7 5 6 8 2 7 6 1 4 10 0 9
Western Meadowlark 3 4 2 7 5 3 0 2 1 9 1 3 10 10 1 6 1 0 8 2

Explore Dataset Structure

cat("Dataset dimensions:", dim(species_data), "\n")
Dataset dimensions: 85 21 
cat("Number of species:", nrow(species_data), "\n")
Number of species: 85 
cat("Number of sites:", ncol(species_data) - 1, "\n")
Number of sites: 20 
summary(species_data)
   Species          Yosemite Valley  Big Sur Coast    Mojave Desert    Sierra Foothills  Point Reyes    
 Length:85          Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 Class :character   1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.: 2.000  
 Mode  :character   Median : 4.000   Median : 4.000   Median : 4.000   Median : 5.000   Median : 5.000  
                    Mean   : 5.024   Mean   : 4.788   Mean   : 5.271   Mean   : 5.306   Mean   : 5.106  
                    3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 8.000  
                    Max.   :14.000   Max.   :14.000   Max.   :15.000   Max.   :15.000   Max.   :13.000  
   Lake Tahoe      Death Valley    Santa Monica Mountains Channel Islands  Central Valley Wetlands
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000         Min.   : 0.000   Min.   : 0.000         
 1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.: 2.000         1st Qu.: 3.000   1st Qu.: 3.000         
 Median : 5.000   Median : 5.000   Median : 4.000         Median : 5.000   Median : 5.000         
 Mean   : 5.318   Mean   : 5.388   Mean   : 4.553         Mean   : 5.141   Mean   : 5.318         
 3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 7.000         3rd Qu.: 7.000   3rd Qu.: 8.000         
 Max.   :15.000   Max.   :13.000   Max.   :12.000         Max.   :17.000   Max.   :13.000         
 San Gabriel Mountains  Anza-Borrego  Redwood National Park   Salton Sea     Lassen Volcanic Park Elkhorn Slough  
 Min.   : 0.000        Min.   : 0.0   Min.   : 0.000        Min.   : 0.000   Min.   : 0.000       Min.   : 0.000  
 1st Qu.: 2.000        1st Qu.: 3.0   1st Qu.: 2.000        1st Qu.: 3.000   1st Qu.: 2.000       1st Qu.: 2.000  
 Median : 4.000        Median : 5.0   Median : 5.000        Median : 6.000   Median : 4.000       Median : 4.000  
 Mean   : 4.482        Mean   : 4.8   Mean   : 5.424        Mean   : 5.541   Mean   : 5.424       Mean   : 5.012  
 3rd Qu.: 6.000        3rd Qu.: 7.0   3rd Qu.: 8.000        3rd Qu.: 8.000   3rd Qu.: 8.000       3rd Qu.: 7.000  
 Max.   :12.000        Max.   :13.0   Max.   :14.000        Max.   :13.000   Max.   :14.000       Max.   :14.000  
 Carrizo Plain     Mount Shasta    San Diego Chaparral Lake Berryessa  
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000      Min.   : 0.000  
 1st Qu.: 2.000   1st Qu.: 3.000   1st Qu.: 3.000      1st Qu.: 3.000  
 Median : 4.000   Median : 5.000   Median : 5.000      Median : 5.000  
 Mean   : 5.353   Mean   : 5.635   Mean   : 5.294      Mean   : 5.424  
 3rd Qu.: 8.000   3rd Qu.: 8.000   3rd Qu.: 8.000      3rd Qu.: 8.000  
 Max.   :16.000   Max.   :16.000   Max.   :13.000      Max.   :14.000  

Exploratory Visualizations

Histogram of Species Abundances

species_long_temp <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance")

ggplot(species_long_temp, aes(x = Abundance)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "black", alpha = 0.7) +
  labs(title = "Distribution of Species Abundances Across All Sites",
       x = "Abundance Count",
       y = "Frequency") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

Total Abundance by Site

# Calculate total abundance per site
site_totals <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance") %>%
  group_by(Site) %>%
  summarize(Total_Abundance = sum(Abundance)) %>%
  arrange(desc(Total_Abundance))

# Bar plot of total abundance per site
ggplot(site_totals, aes(x = reorder(Site, Total_Abundance), y = Total_Abundance)) +
  geom_bar(stat = "identity", fill = "forestgreen", alpha = 0.7) +
  coord_flip() +
  labs(title = "Total Species Abundance by California Site",
       x = "Site",
       y = "Total Abundance Count") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14),
        axis.text.y = element_text(size = 13))

Species Richness by Site

# Calculate species richness (number of species present) per site
species_richness <- species_data %>%
  pivot_longer(cols = -Species, names_to = "Site", values_to = "Abundance") %>%
  filter(Abundance > 0) %>%
  group_by(Site) %>%
  summarize(Species_Richness = n()) %>%
  arrange(desc(Species_Richness))

# Bar plot of species richness per site
ggplot(species_richness, aes(x = reorder(Site, Species_Richness), y = Species_Richness)) +
  geom_bar(stat = "identity", fill = "darkorange", alpha = 0.7) +
  coord_flip() +
  labs(title = "Species Richness by California Site",
       subtitle = "Number of Species Recorded at Each Location",
       x = "Site",
       y = "Number of Species") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14),
        axis.text.y = element_text(size = 13))

Species Richness vs Total Abundance

# Create a comparison plot
comparison_data <- site_totals %>%
  left_join(species_richness, by = "Site")

ggplot(comparison_data, aes(x = Species_Richness, y = Total_Abundance)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Species Richness vs Total Abundance",
       subtitle = "Relationship between diversity and abundance across California sites",
       x = "Species Richness (Number of Species)",
       y = "Total Abundance") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

Data Description

This dataset includes abundance records for 85 species collected from 20 sites across California. Each row corresponds to a species and each column represents a sampling location. The sites span a wide range of ecosystems, from coastal regions like Big Sur, Point Reyes, and the Channel Islands to desert environments like Death Valley, the Mojave Desert, and Anza-Borrego. Several high-elevation areas like Yosemite Valley, Mount Shasta, and Lassen Volcanic Park contrast with wetlands like the Salton Sea, Central Valley Wetlands, and Elkhorn Slough.

The species represented at these sites are equally diverse. The full dataset covers a wide range of California wildlife, including birds, reptiles, amphibians, and fish. The first 10 species, shown in the preview table above, are all birds: the California Quail, American Kestrel, Western Bluebird, Acorn Woodpecker, Red-tailed Hawk, Great Egret, Snowy Plover, Peregrine Falcon, Northern Flicker, and Western Meadowlark.

The main trend I noticed is that species abundances vary widely across sites. Some species, such as the Western Bluebird and Snowy Plover, show relatively high counts at multiple sites, while others have patchier distributions. That said, no species is consistently abundant across all sites, and a few sites show many low or zero values, potentially indicating unsuitable habitat or low detectability.

The data comes from a combination of wildlife survey methods, including direct observation, tracking, tagging, camera traps, and other standardized techniques used to record how many individuals of each species are present at each site. The dataset looks to be well-structured but needs to be reshaped for analysis with the vegan package. Specifically, it needs to be converted from wide format to long format for some of the analyses, and checked to ensure there are no missing values or data quality issues that could affect diversity calculations.
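
The missing-value and data-quality check mentioned above could look something like the sketch below. It runs on a small made-up table (the `toy` data frame and its values are hypothetical, not taken from the California dataset), and the specific checks are only my assumption about what "data quality issues" might mean for count data.

```r
# Hypothetical toy table standing in for species_data (values are made up)
toy <- data.frame(
  Species  = c("Sp A", "Sp B", "Sp C"),
  `Site 1` = c(5, 0, 3),
  `Site 2` = c(2, NA, 7),
  check.names = FALSE   # keep the spaces in the site names
)

counts <- toy[, -1]     # drop the Species column, keep only the site counts

n_missing    <- sum(is.na(counts))                           # cells with no value
n_negative   <- sum(counts < 0, na.rm = TRUE)                # counts cannot be negative
n_noninteger <- sum(counts != round(counts), na.rm = TRUE)   # counts should be whole numbers
dup_species  <- toy$Species[duplicated(toy$Species)]         # species listed more than once

cat("Missing:", n_missing,
    "| Negative:", n_negative,
    "| Non-integer:", n_noninteger,
    "| Duplicated species:", length(dup_species), "\n")
```

Running the same four checks on `species_data` before any diversity calculation would show whether the NA filter and zero fill used later actually change anything.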


Question 2: Clean and Wrangle Data

Create Long Format Dataset

species_long <- species_data %>%
  pivot_longer(cols = -Species,
               names_to = "Site",
               values_to = "Abundance") %>%
  filter(!is.na(Abundance))  # Remove any NA values

cat("Long format dimensions:", dim(species_long), "\n\n")
Long format dimensions: 1700 3 
kable(head(species_long, 15),
      caption = "Long Format Data Structure (First 15 Rows)",
      align = "lrr",
      col.names = c("Species", "Site", "Abundance")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = FALSE,
                font_size = 14,
                position = "left")
Long Format Data Structure (First 15 Rows)
Species Site Abundance
California Quail Yosemite Valley 5
California Quail Big Sur Coast 0
California Quail Mojave Desert 4
California Quail Sierra Foothills 8
California Quail Point Reyes 9
California Quail Lake Tahoe 5
California Quail Death Valley 6
California Quail Santa Monica Mountains 7
California Quail Channel Islands 10
California Quail Central Valley Wetlands 8
California Quail San Gabriel Mountains 1
California Quail Anza-Borrego 3
California Quail Redwood National Park 4
California Quail Salton Sea 2
California Quail Lassen Volcanic Park 2

Why this step: I converted the dataset to long format because it makes the analysis much easier. Instead of having each site as its own column, the long format gives one row per species-site combination, with separate columns for the site name and the abundance value. I used pivot_longer() to reshape the data. I also removed any NA values so the dataset is clean before moving on to the rest of the analysis. 
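
To make the reshape concrete, here is a minimal sketch on a made-up table (`toy_wide` and its values are hypothetical). It shows how two species at three sites become one row per species-site combination, and how dropping NA values shortens the result.

```r
library(tidyr)

# Hypothetical wide table: one row per species, one column per site
toy_wide <- data.frame(
  Species  = c("Sp A", "Sp B"),
  `Site 1` = c(5, 0),
  `Site 2` = c(2, 4),
  `Site 3` = c(NA, 1),
  check.names = FALSE
)

# Reshape: every site column becomes a (Site, Abundance) pair
toy_long <- pivot_longer(toy_wide,
                         cols = -Species,
                         names_to = "Site",
                         values_to = "Abundance")

# Drop the one missing record, mirroring the filter(!is.na(Abundance)) step
toy_long <- toy_long[!is.na(toy_long$Abundance), ]

# 2 species x 3 sites = 6 combinations, minus 1 NA leaves 5 rows
print(toy_long)
```

By the same logic, 85 species at 20 sites give the 1700 rows reported above, which also indicates that the NA filter removed nothing from the real dataset.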

Create Wide Format Dataset

species_wide <- species_long %>%
  pivot_wider(names_from = Site,
              values_from = Abundance,
              values_fill = 0)

cat("Wide format dimensions:", dim(species_wide), "\n\n")
Wide format dimensions: 85 21 
kable(head(species_wide[, 1:8], 10),
      caption = "Wide Format Data Structure (First 10 Species, First 7 Sites)",
      align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE,
                font_size = 13,
                position = "left") %>%
  scroll_box(width = "100%")
Wide Format Data Structure (First 10 Species, First 7 Sites)
Species Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe Death Valley
California Quail 5 0 4 8 9 5 6
American Kestrel 8 9 0 9 5 5 2
Western Bluebird 6 2 4 7 2 11 8
Acorn Woodpecker 3 11 4 6 3 2 8
Red-tailed Hawk 4 3 7 0 2 3 4
Great Egret 3 4 2 12 9 3 0
Snowy Plover 2 10 8 10 5 7 4
Peregrine Falcon 7 12 3 1 1 5 7
Northern Flicker 3 8 12 7 6 0 11
Western Meadowlark 3 4 2 7 5 3 0

Why this step: I kept a wide version of the dataset because it preserves the original layout while making sure everything is clean and consistent. In this format, each species stays as a row and each site is a column. I used values_fill = 0 to replace any missing species-site combinations with zeros (since a blank cell usually means the species wasn’t observed).
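
The effect of values_fill = 0 is easiest to see on a small made-up example (`toy_long` and its values are hypothetical). Here one species-site combination is absent from the long table, and the fill turns it into an explicit zero in the wide table.

```r
library(tidyr)

# Hypothetical long table in which "Sp B" has no record for "Site 2"
toy_long <- data.frame(
  Species   = c("Sp A", "Sp A", "Sp B"),
  Site      = c("Site 1", "Site 2", "Site 1"),
  Abundance = c(5, 2, 4)
)

# Without values_fill the missing combination would become NA;
# with values_fill = 0 it is treated as "not observed"
toy_wide <- pivot_wider(toy_long,
                        names_from = Site,
                        values_from = Abundance,
                        values_fill = 0)

print(toy_wide)
```

Treating an unrecorded combination as a true absence is an assumption. If a site was never surveyed for a species, zero would understate what is unknown.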

Create Community Data Matrix

community_matrix <- species_wide %>%
  select(-Species) %>%              
  t() %>%                           
  as.data.frame()                  

colnames(community_matrix) <- species_wide$Species

rownames(community_matrix) <- colnames(species_wide)[-1]  # Exclude "Species" column

cat("Community matrix dimensions:", dim(community_matrix), "\n")
Community matrix dimensions: 20 85 
cat("Sites (rows):", nrow(community_matrix), "\n")
Sites (rows): 20 
cat("Species (columns):", ncol(community_matrix), "\n\n")
Species (columns): 85 
kable(community_matrix[1:10, 1:8],
      caption = "Community Data Matrix (First 10 Sites, First 8 Species)",
      align = "r") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE,
                font_size = 13,
                position = "left") %>%
  scroll_box(width = "100%", height = "400px")
Community Data Matrix (First 10 Sites, First 8 Species)
California Quail American Kestrel Western Bluebird Acorn Woodpecker Red-tailed Hawk Great Egret Snowy Plover Peregrine Falcon
Yosemite Valley 5 8 6 3 4 3 2 7
Big Sur Coast 0 9 2 11 3 4 10 12
Mojave Desert 4 0 4 4 7 2 8 3
Sierra Foothills 8 9 7 6 0 12 10 1
Point Reyes 9 5 2 3 2 9 5 1
Lake Tahoe 5 5 11 2 3 3 7 5
Death Valley 6 2 8 8 4 0 4 7
Santa Monica Mountains 7 3 4 2 2 1 3 3
Channel Islands 10 2 0 3 12 8 9 4
Central Valley Wetlands 8 7 9 5 10 0 11 1

Why this step: I created the community data matrix because it’s the format the vegan package needs for running diversity analyses. The package expects sites as rows and species as columns, which is the transpose of the original layout. In this matrix, each cell shows the abundance of a given species at a given site.
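
As a small illustration of the sites-by-species orientation, the sketch below transposes a made-up wide table (`toy_wide` and its values are hypothetical) using base R only. It is a simplified variant of the step above, not a replacement for it.

```r
# Hypothetical wide table: species in rows, sites in columns
toy_wide <- data.frame(
  Species  = c("Sp A", "Sp B", "Sp C"),
  `Site 1` = c(5, 0, 3),
  `Site 2` = c(2, 4, 7),
  check.names = FALSE
)

# Drop the Species column, transpose so sites become rows,
# then label the columns with the species names
toy_comm <- t(as.matrix(toy_wide[, -1]))
colnames(toy_comm) <- toy_wide$Species

dim(toy_comm)           # 2 sites (rows) x 3 species (columns)
rowSums(toy_comm)       # total abundance per site
rowSums(toy_comm > 0)   # species richness per site
```

Once sites are rows, per-site summaries such as total abundance and richness reduce to row operations, which is the convention functions like diversity() and vegdist() rely on.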


Question 3: Calculate Diversity Metrics for 3 Sites

Selected Sites

selected_sites <- c("Lake Tahoe", "Death Valley", "Channel Islands")

Species Richness

richness_selected <- data.frame(
  Site = selected_sites,
  Species_Richness = sapply(selected_sites, function(site) {
    sum(community_matrix[site, ] > 0)  # count species with non-zero abundance
  }),
  row.names = NULL  # drop the names sapply() adds so the site is not printed twice
)

cat("Species Richness for Selected Sites:\n\n")
Species Richness for Selected Sites:
kable(richness_selected,
      caption = "Species Richness for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Species Richness")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Species Richness for Three Selected California Sites
Site Species Richness
Lake Tahoe 79
Death Valley 78
Channel Islands 80

Shannon Diversity Index

shannon_selected <- data.frame(
  Site = selected_sites,
  Shannon_Index = sapply(selected_sites, function(site) {
    diversity(community_matrix[site, ], index = "shannon")
  }),
  row.names = NULL  # drop the names sapply() adds so the site is not printed twice
)

cat("Shannon Diversity Index (H') for Selected Sites:\n\n")
Shannon Diversity Index (H') for Selected Sites:
kable(shannon_selected,
      caption = "Shannon Diversity Index for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Shannon Index (H')"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Shannon Diversity Index for Three Selected California Sites
Site Shannon Index (H')
Lake Tahoe 4.1765
Death Valley 4.1851
Channel Islands 4.2261

Simpson’s Diversity Index

simpson_selected <- data.frame(
  Site = selected_sites,
  Simpson_Index = sapply(selected_sites, function(site) {
    # index = "simpson" in vegan returns the Gini-Simpson form, 1 - D
    diversity(community_matrix[site, ], index = "simpson")
  }),
  row.names = NULL  # drop the names sapply() adds so the site is not printed twice
)

cat("Simpson's Diversity Index (1 - D) for Selected Sites:\n\n")
Simpson's Diversity Index (1 - D) for Selected Sites:
kable(simpson_selected,
      caption = "Simpson's Diversity Index (Gini-Simpson) for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Simpson Index (1-D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Simpson's Diversity Index (Gini-Simpson) for Three Selected California Sites
Site Simpson Index (1-D)
Lake Tahoe 0.9826
Death Valley 0.9831
Channel Islands 0.9836
# Note: despite the object and column names, this holds the inverse Simpson
# index (1/D), not Simpson's dominance D itself
simpson_original <- data.frame(
  Site = selected_sites,
  Simpson_Dominance = sapply(selected_sites, function(site) {
    diversity(community_matrix[site, ], index = "invsimpson")
  }),
  row.names = NULL  # drop the names sapply() adds so the site is not printed twice
)

cat("\n\nInverse Simpson's Index (1/D) for Selected Sites:\n\n")


Inverse Simpson's Index (1/D) for Selected Sites:
kable(simpson_original,
      caption = "Inverse Simpson's Index for Three Selected California Sites",
      align = "lr",
      col.names = c("Site", "Inverse Simpson (1/D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE,
                font_size = 16,
                position = "left") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2, width = "10em")
Inverse Simpson's Index for Three Selected California Sites
Site Inverse Simpson (1/D)
Lake Tahoe 57.6153
Death Valley 59.1885
Channel Islands 60.9930

Combined Comparison Table

diversity_comparison <- richness_selected %>%
  left_join(shannon_selected, by = "Site") %>%
  left_join(simpson_selected, by = "Site")

cat("\nCombined Diversity Metrics for All Three Sites:\n\n")

Combined Diversity Metrics for All Three Sites:
kable(diversity_comparison,
      caption = "Combined Diversity Metrics: Comparison Across Three Sites",
      align = "lrrr",
      col.names = c("Site", "Species Richness", "Shannon Index (H')", "Simpson Index (1-D)"),
      digits = 4) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered"),
                full_width = FALSE,
                font_size = 16,
                position = "center") %>%
  column_spec(1, bold = TRUE, width = "15em") %>%
  column_spec(2:4, width = "10em") %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3498db")
Combined Diversity Metrics: Comparison Across Three Sites
Site Species Richness Shannon Index (H') Simpson Index (1-D)
Lake Tahoe 79 4.1765 0.9826
Death Valley 78 4.1851 0.9831
Channel Islands 80 4.2261 0.9836

Visualization: Comparison of Metrics

diversity_long_comparison <- diversity_comparison %>%
  pivot_longer(cols = -Site,
               names_to = "Metric",
               values_to = "Value")

ggplot(diversity_long_comparison, aes(x = Site, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.7) +
  facet_wrap(~Metric, scales = "free_y") +
  labs(title = "Comparison of Diversity Metrics Across Three California Sites",
       subtitle = "Species Richness, Shannon Index, and Simpson's Index",
       x = "Site",
       y = "Index Value") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 13),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 14),
        strip.text = element_text(size = 15, face = "bold"),
        legend.position = "none")

Observations and Interpretation

The diversity metrics calculated for the three California sites show only slight differences in species diversity and abundance distribution. Species richness, which simply counts the number of species present, was highest at Channel Islands (80 species), followed by Lake Tahoe (79) and Death Valley (78). The Shannon diversity index (H′), which accounts for both the number of species and how evenly individuals are distributed among them, was also highest at Channel Islands (4.2261), indicating slightly higher richness combined with fairly even abundances. Death Valley (4.1851) and Lake Tahoe (4.1765) were slightly lower. The Simpson diversity index (1 - D) measures the probability that two randomly chosen individuals belong to different species, and it gives more weight to the most abundant species. All sites had values near 1 (Channel Islands: 0.9836; Death Valley: 0.9831; Lake Tahoe: 0.9826), meaning no species dominates any of the three communities. Overall, all three metrics rank Channel Islands as the most diverse site. The Shannon and Simpson indices rank Lake Tahoe lowest, whereas richness places Death Valley lowest (78 versus 79 species), so the ordering of those two sites depends on the metric. The differences are small in every case, indicating that all three sites support similarly diverse communities.
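
To show what these indices measure, the sketch below computes them by hand in base R for a made-up community (the `counts` vector is hypothetical). The formulas are the standard ones: H′ = -Σ p·ln(p), Gini-Simpson = 1 - Σ p², and inverse Simpson = 1 / Σ p². Pielou's evenness (H′ divided by ln of the richness) is my addition and is not part of the analysis above.

```r
# Hypothetical community: 4 species with perfectly even abundances
counts <- c(10, 10, 10, 10)

p <- counts / sum(counts)   # relative abundance of each species
p <- p[p > 0]               # absent species contribute nothing

shannon      <- -sum(p * log(p))            # equals log(4) for 4 equally common species
gini_simpson <- 1 - sum(p^2)                # probability two random individuals differ
inv_simpson  <- 1 / sum(p^2)                # "effective" number of equally common species
pielou       <- shannon / log(length(p))    # evenness: 1 means perfectly even

cat("Shannon:", shannon, "\n")
cat("Gini-Simpson:", gini_simpson, "\n")
cat("Inverse Simpson:", inv_simpson, "\n")
cat("Pielou evenness:", pielou, "\n")

# The same evenness ratio for Channel Islands, using the values reported above
4.2261 / log(80)
```

For a perfectly even community, H′ reaches its maximum of ln(S). Channel Islands' H′ of 4.2261 against ln(80) gives an evenness of roughly 0.96, which fits the reading that abundances at these sites are fairly even.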


Question 4: Distance Metrics and NMDS

Euclidean Distance Matrix

euclidean_dist <- vegdist(community_matrix, method = "euclidean")


cat("Euclidean Distance Matrix (All 20 Sites):\n")
Euclidean Distance Matrix (All 20 Sites):
print(as.matrix(euclidean_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley                 0.00000      43.49713      52.71622         43.61192    48.25971   48.61070
Big Sur Coast                  43.49713       0.00000      45.90207         50.21952    47.44470   49.48737
Mojave Desert                  52.71622      45.90207       0.00000         45.26588    49.55805   48.51804
Sierra Foothills               43.61192      50.21952      45.26588          0.00000    45.90207   46.27094
Point Reyes                    48.25971      47.44470      49.55805         45.90207     0.00000   48.31149
Lake Tahoe                     48.61070      49.48737      48.51804         46.27094    48.31149    0.00000
Death Valley                   48.13523      48.38388      44.33960         45.02222    48.39421   45.36518
Santa Monica Mountains         45.71652      47.26521      49.32545         46.64762    45.13314   42.93018
Channel Islands                45.05552      42.87190      46.52956         46.10857    44.10215   48.52834
Central Valley Wetlands        46.74398      43.89761      40.96340         43.64631    46.79744   46.62617
San Gabriel Mountains          46.98936      43.79498      42.27292         45.45327    43.06971   43.94315
Anza-Borrego                   42.53234      41.70132      43.40507         46.22770    41.06093   47.43416
Redwood National Park          46.73329      47.32864      50.56679         45.49725    48.36321   46.01087
Salton Sea                     44.00000      49.09175      46.33573         41.01219    43.66921   47.88528
Lassen Volcanic Park           47.11688      40.66940      46.03260         43.56604    46.67976   44.37342
Elkhorn Slough                 46.67976      48.59012      47.72840         46.85083    51.55580   45.25483
Carrizo Plain                  48.24935      50.55690      53.33854         51.14685    49.12230   50.78386
Mount Shasta                   48.18714      44.38468      50.68530         47.32864    48.42520   44.03408
San Diego Chaparral            45.92385      41.96427      47.64452         48.81598    47.79121   43.47413
Lake Berryessa                 46.21688      45.71652      46.29255         48.22862    45.98913   47.94789
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley             48.13523               45.71652        45.05552                46.74398
Big Sur Coast               48.38388               47.26521        42.87190                43.89761
Mojave Desert               44.33960               49.32545        46.52956                40.96340
Sierra Foothills            45.02222               46.64762        46.10857                43.64631
Point Reyes                 48.39421               45.13314        44.10215                46.79744
Lake Tahoe                  45.36518               42.93018        48.52834                46.62617
Death Valley                 0.00000               43.48563        46.72259                42.28475
Santa Monica Mountains      43.48563                0.00000        37.38984                45.61798
Channel Islands             46.72259               37.38984         0.00000                45.77117
Central Valley Wetlands     42.28475               45.61798        45.77117                 0.00000
San Gabriel Mountains       42.90688               43.19722        42.73172                44.91102
Anza-Borrego                43.26662               36.45545        41.46082                43.15090
Redwood National Park       44.17013               42.98837        44.58699                40.87787
Salton Sea                  44.35087               41.71331        42.91853                43.16248
Lassen Volcanic Park        49.74937               41.44876        45.21062                44.62062
Elkhorn Slough              52.74467               43.64631        47.65501                48.24935
Carrizo Plain               47.52894               47.28636        47.51842                47.31807
Mount Shasta                41.00000               43.86342        44.92215                47.67599
San Diego Chaparral         45.65085               41.79713        46.91482                46.15192
Lake Berryessa              44.75489               43.03487        44.72136                45.24378
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                      46.98936     42.53234              46.73329   44.00000             47.11688
Big Sur Coast                        43.79498     41.70132              47.32864   49.09175             40.66940
Mojave Desert                        42.27292     43.40507              50.56679   46.33573             46.03260
Sierra Foothills                     45.45327     46.22770              45.49725   41.01219             43.56604
Point Reyes                          43.06971     41.06093              48.36321   43.66921             46.67976
Lake Tahoe                           43.94315     47.43416              46.01087   47.88528             44.37342
Death Valley                         42.90688     43.26662              44.17013   44.35087             49.74937
Santa Monica Mountains               43.19722     36.45545              42.98837   41.71331             41.44876
Channel Islands                      42.73172     41.46082              44.58699   42.91853             45.21062
Central Valley Wetlands              44.91102     43.15090              40.87787   43.16248             44.62062
San Gabriel Mountains                 0.00000     38.71692              44.63183   42.87190             46.34652
Anza-Borrego                         38.71692      0.00000              43.39355   40.18706             40.95119
Redwood National Park                44.63183     43.39355               0.00000   47.30750             49.85980
Salton Sea                           42.87190     40.18706              47.30750    0.00000             45.91296
Lassen Volcanic Park                 46.34652     40.95119              49.85980   45.91296              0.00000
Elkhorn Slough                       42.72002     41.10961              46.18441   42.39104             47.33920
Carrizo Plain                        47.18050     41.07311              50.37857   48.12484             46.10857
Mount Shasta                         45.09989     43.13931              47.72840   46.62617             44.38468
San Diego Chaparral                  44.35087     42.30839              44.93328   43.09292             48.05206
Lake Berryessa                       44.49719     39.78693              45.49725   47.58151             45.65085
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley               46.67976      48.24935     48.18714            45.92385       46.21688
Big Sur Coast                 48.59012      50.55690     44.38468            41.96427       45.71652
Mojave Desert                 47.72840      53.33854     50.68530            47.64452       46.29255
Sierra Foothills              46.85083      51.14685     47.32864            48.81598       48.22862
Point Reyes                   51.55580      49.12230     48.42520            47.79121       45.98913
Lake Tahoe                    45.25483      50.78386     44.03408            43.47413       47.94789
Death Valley                  52.74467      47.52894     41.00000            45.65085       44.75489
Santa Monica Mountains        43.64631      47.28636     43.86342            41.79713       43.03487
Channel Islands               47.65501      47.51842     44.92215            46.91482       44.72136
Central Valley Wetlands       48.24935      47.31807     47.67599            46.15192       45.24378
San Gabriel Mountains         42.72002      47.18050     45.09989            44.35087       44.49719
Anza-Borrego                  41.10961      41.07311     43.13931            42.30839       39.78693
Redwood National Park         46.18441      50.37857     47.72840            44.93328       45.49725
Salton Sea                    42.39104      48.12484     46.62617            43.09292       47.58151
Lassen Volcanic Park          47.33920      46.10857     44.38468            48.05206       45.65085
Elkhorn Slough                 0.00000      46.91482     49.60847            41.35215       45.19956
Carrizo Plain                 46.91482       0.00000     46.45428            47.40253       47.15930
Mount Shasta                  49.60847      46.45428      0.00000            49.66890       43.19722
San Diego Chaparral           41.35215      47.40253     49.66890             0.00000       48.54894
Lake Berryessa                45.19956      47.15930     43.19722            48.54894        0.00000

Bray-Curtis Dissimilarity Matrix

bray_curtis_dist <- vegdist(community_matrix, method = "bray")

cat("Bray-Curtis Dissimilarity Matrix (All 20 Sites):\n")
Bray-Curtis Dissimilarity Matrix (All 20 Sites):
print(as.matrix(bray_curtis_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley               0.0000000     0.3764988     0.4445714        0.3735763   0.4308943  0.4038680
Big Sur Coast                 0.3764988     0.0000000     0.3988304        0.4289044   0.4102259  0.4365541
Mojave Desert                 0.4445714     0.3988304     0.0000000        0.3659622   0.4331066  0.3844444
Sierra Foothills              0.3735763     0.4289044     0.3659622        0.0000000   0.3830508  0.3798450
Point Reyes                   0.4308943     0.4102259     0.4331066        0.3830508   0.0000000  0.4176072
Lake Tahoe                    0.4038680     0.4365541     0.3844444        0.3798450   0.4176072  0.0000000
Death Valley                  0.3943503     0.4127168     0.3532009        0.3751375   0.4170404  0.3714286
Santa Monica Mountains        0.3955774     0.4483627     0.4323353        0.4057279   0.4153471  0.3563766
Channel Islands               0.4027778     0.3483412     0.3875706        0.3761261   0.3754305  0.3993251
Central Valley Wetlands       0.3947668     0.3853318     0.3244444        0.3554817   0.4085779  0.3783186
San Gabriel Mountains         0.4356436     0.3984772     0.3606755        0.3918269   0.3963190  0.3853541
Anza-Borrego                  0.3748503     0.3865031     0.3761682        0.3923166   0.3634204  0.4116279
Redwood National Park         0.3963964     0.4216590     0.4169417        0.3837719   0.4189944  0.3669222
Salton Sea                    0.3585746     0.4145786     0.3710555        0.3167028   0.3679558  0.3846154
Lassen Volcanic Park          0.3941441     0.3271889     0.3685369        0.3552632   0.3899441  0.3537788
Elkhorn Slough                0.4114889     0.4405762     0.3981693        0.3911060   0.4697674  0.3667426
Carrizo Plain                 0.4217687     0.4338747     0.4197121        0.4105960   0.4150731  0.3958104
Mount Shasta                  0.4150110     0.3521445     0.4066882        0.3849462   0.3932092  0.3555317
San Diego Chaparral           0.3911060     0.3395566     0.3719376        0.3962264   0.4049774  0.3481153
Lake Berryessa                0.3806306     0.3963134     0.3707371        0.3969298   0.3832402  0.3866375
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley            0.3943503              0.3955774       0.4027778               0.3947668
Big Sur Coast              0.4127168              0.4483627       0.3483412               0.3853318
Mojave Desert              0.3532009              0.4323353       0.3875706               0.3244444
Sierra Foothills           0.3751375              0.4057279       0.3761261               0.3554817
Point Reyes                0.4170404              0.4153471       0.3754305               0.4085779
Lake Tahoe                 0.3714286              0.3563766       0.3993251               0.3783186
Death Valley               0.0000000              0.3940828       0.3877095               0.3560440
Santa Monica Mountains     0.3940828              0.0000000       0.3373786               0.4088200
Channel Islands            0.3877095              0.3373786       0.0000000               0.3835771
Central Valley Wetlands    0.3560440              0.4088200       0.3835771               0.0000000
San Gabriel Mountains      0.3897497              0.4218750       0.3863081               0.4117647
Anza-Borrego               0.3602771              0.3333333       0.3514793               0.3860465
Redwood National Park      0.3427639              0.3750000       0.3608018               0.3362541
Salton Sea                 0.3541442              0.3519814       0.3546256               0.3434453
Lassen Volcanic Park       0.4058760              0.3655660       0.3674833               0.3537788
Elkhorn Slough             0.4524887              0.4022140       0.4090382               0.4191344
Carrizo Plain              0.3625411              0.4133017       0.3946188               0.3781698
Mount Shasta               0.3191035              0.3602771       0.3646288               0.3813104
San Diego Chaparral        0.3634361              0.3643967       0.3866967               0.3813747
Lake Berryessa             0.3536453              0.3726415       0.3741648               0.3625411
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                     0.4356436    0.3748503             0.3963964  0.3585746            0.3941441
Big Sur Coast                       0.3984772    0.3865031             0.4216590  0.4145786            0.3271889
Mojave Desert                       0.3606755    0.3761682             0.4169417  0.3710555            0.3685369
Sierra Foothills                    0.3918269    0.3923166             0.3837719  0.3167028            0.3552632
Point Reyes                         0.3963190    0.3634204             0.4189944  0.3679558            0.3899441
Lake Tahoe                          0.3853541    0.4116279             0.3669222  0.3846154            0.3537788
Death Valley                        0.3897497    0.3602771             0.3427639  0.3541442            0.4058760
Santa Monica Mountains              0.4218750    0.3333333             0.3750000  0.3519814            0.3655660
Channel Islands                     0.3863081    0.3514793             0.3608018  0.3546256            0.3674833
Central Valley Wetlands             0.4117647    0.3860465             0.3362541  0.3434453            0.3537788
San Gabriel Mountains               0.0000000    0.3688213             0.3729216  0.3732394            0.3990499
Anza-Borrego                        0.3688213    0.0000000             0.3808976  0.3424346            0.3578826
Redwood National Park               0.3729216    0.3808976             0.0000000  0.3841202            0.4056399
Salton Sea                          0.3732394    0.3424346             0.3841202  0.0000000            0.3519313
Lassen Volcanic Park                0.3990499    0.3578826             0.4056399  0.3519313            0.0000000
Elkhorn Slough                      0.3977695    0.3812950             0.3596392  0.3489409            0.3957159
Carrizo Plain                       0.4043062    0.3371958             0.3995633  0.3909287            0.3580786
Mount Shasta                        0.3930233    0.3573844             0.3808511  0.3705263            0.3446809
San Diego Chaparral                 0.3983153    0.3473193             0.3743139  0.3463626            0.3874863
Lake Berryessa                      0.3895487    0.3371692             0.3644252  0.3884120            0.3644252
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley              0.4114889     0.4217687    0.4150110           0.3911060      0.3806306
Big Sur Coast                0.4405762     0.4338747    0.3521445           0.3395566      0.3963134
Mojave Desert                0.3981693     0.4197121    0.4066882           0.3719376      0.3707371
Sierra Foothills             0.3911060     0.4105960    0.3849462           0.3962264      0.3969298
Point Reyes                  0.4697674     0.4150731    0.3932092           0.4049774      0.3832402
Lake Tahoe                   0.3667426     0.3958104    0.3555317           0.3481153      0.3866375
Death Valley                 0.4524887     0.3625411    0.3191035           0.3634361      0.3536453
Santa Monica Mountains       0.4022140     0.4133017    0.3602771           0.3643967      0.3726415
Channel Islands              0.4090382     0.3946188    0.3646288           0.3866967      0.3741648
Central Valley Wetlands      0.4191344     0.3781698    0.3813104           0.3813747      0.3625411
San Gabriel Mountains        0.3977695     0.4043062    0.3930233           0.3983153      0.3895487
Anza-Borrego                 0.3812950     0.3371958    0.3573844           0.3473193      0.3371692
Redwood National Park        0.3596392     0.3995633    0.3808511           0.3743139      0.3644252
Salton Sea                   0.3489409     0.3909287    0.3705263           0.3463626      0.3884120
Lassen Volcanic Park         0.3957159     0.3580786    0.3446809           0.3874863      0.3644252
Elkhorn Slough               0.0000000     0.3870602    0.4099448           0.3447489      0.3709132
Carrizo Plain                0.3870602     0.0000000    0.3811563           0.3723757      0.3864629
Mount Shasta                 0.4099448     0.3811563    0.0000000           0.3950484      0.3361702
San Diego Chaparral          0.3447489     0.3723757    0.3950484           0.0000000      0.4072448
Lake Berryessa               0.3709132     0.3864629    0.3361702           0.4072448      0.0000000

Jaccard Distance Matrix

jaccard_dist <- vegdist(community_matrix, method = "jaccard", binary = TRUE)

cat("Jaccard Distance Matrix (All 20 Sites):\n")
Jaccard Distance Matrix (All 20 Sites):
print(as.matrix(jaccard_dist))
                        Yosemite Valley Big Sur Coast Mojave Desert Sierra Foothills Point Reyes Lake Tahoe
Yosemite Valley              0.00000000     0.1428571    0.12941176        0.1309524  0.16470588 0.14117647
Big Sur Coast                0.14285714     0.0000000    0.15294118        0.1764706  0.16666667 0.16470588
Mojave Desert                0.12941176     0.1529412    0.00000000        0.1411765  0.10843373 0.12941176
Sierra Foothills             0.13095238     0.1764706    0.14117647        0.0000000  0.15476190 0.13095238
Point Reyes                  0.16470588     0.1666667    0.10843373        0.1547619  0.00000000 0.16470588
Lake Tahoe                   0.14117647     0.1647059    0.12941176        0.1309524  0.16470588 0.00000000
Death Valley                 0.15294118     0.1764706    0.14117647        0.1428571  0.17647059 0.15294118
Santa Monica Mountains       0.11764706     0.1411765    0.10588235        0.1071429  0.14117647 0.11764706
Channel Islands              0.10714286     0.1309524    0.11764706        0.1411765  0.15294118 0.12941176
Central Valley Wetlands      0.15476190     0.2000000    0.14285714        0.1666667  0.15662651 0.17647059
San Gabriel Mountains        0.14117647     0.1647059    0.10714286        0.1309524  0.12048193 0.14117647
Anza-Borrego                 0.12941176     0.1529412    0.11764706        0.1411765  0.15294118 0.12941176
Redwood National Park        0.14117647     0.1647059    0.10714286        0.1529412  0.14285714 0.14117647
Salton Sea                   0.11764706     0.1411765    0.10588235        0.1294118  0.14117647 0.09523810
Lassen Volcanic Park         0.09411765     0.1176471    0.08235294        0.1058824  0.11764706 0.09411765
Elkhorn Slough               0.12941176     0.1309524    0.11764706        0.1411765  0.13095238 0.12941176
Carrizo Plain                0.11764706     0.1411765    0.08333333        0.1071429  0.09638554 0.11764706
Mount Shasta                 0.12941176     0.1529412    0.11764706        0.1411765  0.13095238 0.12941176
San Diego Chaparral          0.11764706     0.1190476    0.10588235        0.1294118  0.14117647 0.09523810
Lake Berryessa               0.10714286     0.1309524    0.11764706        0.1411765  0.13095238 0.10714286
                        Death Valley Santa Monica Mountains Channel Islands Central Valley Wetlands
Yosemite Valley           0.15294118             0.11764706      0.10714286               0.1547619
Big Sur Coast             0.17647059             0.14117647      0.13095238               0.2000000
Mojave Desert             0.14117647             0.10588235      0.11764706               0.1428571
Sierra Foothills          0.14285714             0.10714286      0.14117647               0.1666667
Point Reyes               0.17647059             0.14117647      0.15294118               0.1566265
Lake Tahoe                0.15294118             0.11764706      0.12941176               0.1764706
Death Valley              0.00000000             0.12941176      0.14117647               0.1445783
Santa Monica Mountains    0.12941176             0.00000000      0.08333333               0.1529412
Channel Islands           0.14117647             0.08333333      0.00000000               0.1647059
Central Valley Wetlands   0.14457831             0.15294118      0.16470588               0.0000000
San Gabriel Mountains     0.15294118             0.09523810      0.12941176               0.1764706
Anza-Borrego              0.14117647             0.10588235      0.11764706               0.1428571
Redwood National Park     0.10843373             0.11764706      0.12941176               0.1097561
Salton Sea                0.10714286             0.09411765      0.10588235               0.1529412
Lassen Volcanic Park      0.10588235             0.04761905      0.05952381               0.1294118
Elkhorn Slough            0.14117647             0.10588235      0.11764706               0.1647059
Carrizo Plain             0.08433735             0.09411765      0.10588235               0.1309524
Mount Shasta              0.11904762             0.10588235      0.11764706               0.1428571
San Diego Chaparral       0.12941176             0.09411765      0.08333333               0.1529412
Lake Berryessa            0.14117647             0.08333333      0.11764706               0.1647059
                        San Gabriel Mountains Anza-Borrego Redwood National Park Salton Sea Lassen Volcanic Park
Yosemite Valley                    0.14117647   0.12941176            0.14117647 0.11764706           0.09411765
Big Sur Coast                      0.16470588   0.15294118            0.16470588 0.14117647           0.11764706
Mojave Desert                      0.10714286   0.11764706            0.10714286 0.10588235           0.08235294
Sierra Foothills                   0.13095238   0.14117647            0.15294118 0.12941176           0.10588235
Point Reyes                        0.12048193   0.15294118            0.14285714 0.14117647           0.11764706
Lake Tahoe                         0.14117647   0.12941176            0.14117647 0.09523810           0.09411765
Death Valley                       0.15294118   0.14117647            0.10843373 0.10714286           0.10588235
Santa Monica Mountains             0.09523810   0.10588235            0.11764706 0.09411765           0.04761905
Channel Islands                    0.12941176   0.11764706            0.12941176 0.10588235           0.05952381
Central Valley Wetlands            0.17647059   0.14285714            0.10975610 0.15294118           0.12941176
San Gabriel Mountains              0.00000000   0.12941176            0.11904762 0.11764706           0.09411765
Anza-Borrego                       0.12941176   0.00000000            0.12941176 0.10588235           0.08235294
Redwood National Park              0.11904762   0.12941176            0.00000000 0.09523810           0.09411765
Salton Sea                         0.11764706   0.10588235            0.09523810 0.00000000           0.07058824
Lassen Volcanic Park               0.09411765   0.08235294            0.09411765 0.07058824           0.00000000
Elkhorn Slough                     0.12941176   0.11764706            0.12941176 0.10588235           0.08235294
Carrizo Plain                      0.09523810   0.10588235            0.09523810 0.07142857           0.07058824
Mount Shasta                       0.12941176   0.11764706            0.12941176 0.10588235           0.05952381
San Diego Chaparral                0.09523810   0.10588235            0.11764706 0.07142857           0.07058824
Lake Berryessa                     0.12941176   0.09523810            0.12941176 0.10588235           0.08235294
                        Elkhorn Slough Carrizo Plain Mount Shasta San Diego Chaparral Lake Berryessa
Yosemite Valley             0.12941176    0.11764706   0.12941176          0.11764706     0.10714286
Big Sur Coast               0.13095238    0.14117647   0.15294118          0.11904762     0.13095238
Mojave Desert               0.11764706    0.08333333   0.11764706          0.10588235     0.11764706
Sierra Foothills            0.14117647    0.10714286   0.14117647          0.12941176     0.14117647
Point Reyes                 0.13095238    0.09638554   0.13095238          0.14117647     0.13095238
Lake Tahoe                  0.12941176    0.11764706   0.12941176          0.09523810     0.10714286
Death Valley                0.14117647    0.08433735   0.11904762          0.12941176     0.14117647
Santa Monica Mountains      0.10588235    0.09411765   0.10588235          0.09411765     0.08333333
Channel Islands             0.11764706    0.10588235   0.11764706          0.08333333     0.11764706
Central Valley Wetlands     0.16470588    0.13095238   0.14285714          0.15294118     0.16470588
San Gabriel Mountains       0.12941176    0.09523810   0.12941176          0.09523810     0.12941176
Anza-Borrego                0.11764706    0.10588235   0.11764706          0.10588235     0.09523810
Redwood National Park       0.12941176    0.09523810   0.12941176          0.11764706     0.12941176
Salton Sea                  0.10588235    0.07142857   0.10588235          0.07142857     0.10588235
Lassen Volcanic Park        0.08235294    0.07058824   0.05952381          0.07058824     0.08235294
Elkhorn Slough              0.00000000    0.10588235   0.11764706          0.10588235     0.09523810
Carrizo Plain               0.10588235    0.00000000   0.10588235          0.09411765     0.10588235
Mount Shasta                0.11764706    0.10588235   0.00000000          0.10588235     0.11764706
San Diego Chaparral         0.10588235    0.09411765   0.10588235          0.00000000     0.10588235
Lake Berryessa              0.09523810    0.10588235   0.11764706          0.10588235     0.00000000

NMDS with Bray-Curtis Distances

set.seed(123)  # For reproducibility
nmds_bray <- metaMDS(community_matrix, distance = "bray", k = 2, trymax = 100)
Wisconsin double standardization
Run 0 stress 0.281007 
Run 1 stress 0.2923922 
Run 2 stress 0.3105088 
Run 3 stress 0.3218972 
Run 4 stress 0.2941504 
Run 5 stress 0.2990044 
Run 6 stress 0.308303 
Run 7 stress 0.3032368 
Run 8 stress 0.2929312 
Run 9 stress 0.3011717 
Run 10 stress 0.2868759 
Run 11 stress 0.3007704 
Run 12 stress 0.2873156 
Run 13 stress 0.283034 
Run 14 stress 0.2912786 
Run 15 stress 0.3027634 
Run 16 stress 0.3125567 
Run 17 stress 0.2868142 
Run 18 stress 0.2980751 
Run 19 stress 0.2768438 
... New best solution
... Procrustes: rmse 0.1402713  max resid 0.3215578 
Run 20 stress 0.2787651 
Run 21 stress 0.2911912 
Run 22 stress 0.3057632 
Run 23 stress 0.2831251 
Run 24 stress 0.2853592 
Run 25 stress 0.3240949 
Run 26 stress 0.291665 
Run 27 stress 0.2967328 
Run 28 stress 0.2937119 
Run 29 stress 0.2803224 
Run 30 stress 0.2934303 
Run 31 stress 0.2779701 
Run 32 stress 0.2892283 
Run 33 stress 0.2831959 
Run 34 stress 0.3007912 
Run 35 stress 0.2881029 
Run 36 stress 0.2959458 
Run 37 stress 0.3012513 
Run 38 stress 0.3028554 
Run 39 stress 0.3053827 
Run 40 stress 0.2982422 
Run 41 stress 0.3041563 
Run 42 stress 0.3112826 
Run 43 stress 0.2982777 
Run 44 stress 0.2773268 
... Procrustes: rmse 0.04924779  max resid 0.1441582 
Run 45 stress 0.2823682 
Run 46 stress 0.2844833 
Run 47 stress 0.2802228 
Run 48 stress 0.2850497 
Run 49 stress 0.3073537 
Run 50 stress 0.3130785 
Run 51 stress 0.2742365 
... New best solution
... Procrustes: rmse 0.1419059  max resid 0.3145031 
Run 52 stress 0.298485 
Run 53 stress 0.2823502 
Run 54 stress 0.3042179 
Run 55 stress 0.2884774 
Run 56 stress 0.2844712 
Run 57 stress 0.3074574 
Run 58 stress 0.2862099 
Run 59 stress 0.2875857 
Run 60 stress 0.2842428 
Run 61 stress 0.2749927 
Run 62 stress 0.2840932 
Run 63 stress 0.292112 
Run 64 stress 0.2800357 
Run 65 stress 0.2859932 
Run 66 stress 0.3105873 
Run 67 stress 0.2895685 
Run 68 stress 0.3063066 
Run 69 stress 0.2900662 
Run 70 stress 0.2917791 
Run 71 stress 0.2908405 
Run 72 stress 0.2939239 
Run 73 stress 0.2982702 
Run 74 stress 0.2810184 
Run 75 stress 0.3155024 
Run 76 stress 0.2874488 
Run 77 stress 0.2924123 
Run 78 stress 0.2987606 
Run 79 stress 0.3024156 
Run 80 stress 0.2888699 
Run 81 stress 0.2918204 
Run 82 stress 0.2990798 
Run 83 stress 0.2986676 
Run 84 stress 0.2867671 
Run 85 stress 0.2914776 
Run 86 stress 0.2912989 
Run 87 stress 0.2990439 
Run 88 stress 0.2828081 
Run 89 stress 0.2912644 
Run 90 stress 0.2923718 
Run 91 stress 0.2936975 
Run 92 stress 0.295154 
Run 93 stress 0.2912083 
Run 94 stress 0.2992323 
Run 95 stress 0.2998503 
Run 96 stress 0.2795509 
Run 97 stress 0.2916514 
Run 98 stress 0.290269 
Run 99 stress 0.2832592 
Run 100 stress 0.2866166 
*** Best solution was not repeated -- monoMDS stopping criteria:
     5: no. of iterations >= maxit
    95: stress ratio > sratmax
nmds_bray_scores <- as.data.frame(scores(nmds_bray, display = "sites"))
nmds_bray_scores$Site <- rownames(nmds_bray_scores)


ggplot(nmds_bray_scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 5, check_overlap = FALSE) +
  labs(title = "NMDS Ordination: Bray-Curtis Dissimilarity",
       subtitle = paste0("Stress = ", round(nmds_bray$stress, 4)),
       x = "NMDS1",
       y = "NMDS2") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

cat("\nBray-Curtis NMDS Stress:", nmds_bray$stress, "\n")

Bray-Curtis NMDS Stress: 0.2742365 
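
The run log above ends with "Best solution was not repeated", and the final stress is above the 0.2 level that is often treated as the upper limit for a usable two-dimensional ordination. One way to follow up is to compare stress across dimensions and inspect a Shepard diagram. The sketch below is only an illustration: it uses vegan's built-in `dune` data as a stand-in for `community_matrix` (an assumption, so the numbers will not match this report), and it sets `autotransform = FALSE` so that the ordination is based on untransformed Bray-Curtis dissimilarities rather than on Wisconsin double-standardized data.

```r
library(vegan)

# Stand-in data (assumption): vegan's built-in dune meadow dataset.
# In this report, community_matrix would take its place.
data(dune)

set.seed(123)  # For reproducibility

# Fit NMDS in 2 and 3 dimensions with otherwise identical settings.
# autotransform = FALSE skips the square-root / Wisconsin standardization.
fit_k2 <- metaMDS(dune, distance = "bray", k = 2, trymax = 50,
                  autotransform = FALSE, trace = FALSE)
fit_k3 <- metaMDS(dune, distance = "bray", k = 3, trymax = 50,
                  autotransform = FALSE, trace = FALSE)

# Collect the final stress of each fit for comparison
stress_by_k <- c(k2 = fit_k2$stress, k3 = fit_k3$stress)
print(round(stress_by_k, 4))

# Shepard diagram: ordination distance against observed dissimilarity
stressplot(fit_k2)
```

If stress drops substantially with a third dimension, the two-dimensional plot is likely hiding structure, and site positions in it should be read with extra caution.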

NMDS with Jaccard Distances

set.seed(123)  # For reproducibility
nmds_jaccard <- metaMDS(community_matrix, distance = "jaccard", binary = TRUE, k = 2, trymax = 100)
Wisconsin double standardization
Run 0 stress 0.1809183 
Run 1 stress 0.1779049 
... New best solution
... Procrustes: rmse 0.06125251  max resid 0.1754351 
Run 2 stress 0.1865218 
Run 3 stress 0.2266042 
Run 4 stress 0.180501 
Run 5 stress 0.192617 
Run 6 stress 0.1866004 
Run 7 stress 0.1997077 
Run 8 stress 0.1931261 
Run 9 stress 0.1904562 
Run 10 stress 0.2037118 
Run 11 stress 0.1776164 
... New best solution
... Procrustes: rmse 0.04339829  max resid 0.1781998 
Run 12 stress 0.2045054 
Run 13 stress 0.2467821 
Run 14 stress 0.2351829 
Run 15 stress 0.2131532 
Run 16 stress 0.1887383 
Run 17 stress 0.1881437 
Run 18 stress 0.1773501 
... New best solution
... Procrustes: rmse 0.02562705  max resid 0.0826354 
Run 19 stress 0.1811889 
Run 20 stress 0.1903775 
Run 21 stress 0.1903773 
Run 22 stress 0.18782 
Run 23 stress 0.2045201 
Run 24 stress 0.1870151 
Run 25 stress 0.1773502 
... Procrustes: rmse 9.402431e-05  max resid 0.0002231769 
... Similar to previous best
*** Best solution repeated 1 times
nmds_jaccard_scores <- as.data.frame(scores(nmds_jaccard, display = "sites"))
nmds_jaccard_scores$Site <- rownames(nmds_jaccard_scores)


ggplot(nmds_jaccard_scores, aes(x = NMDS1, y = NMDS2)) +
  geom_point(size = 5, color = "darkred", alpha = 0.7) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 5, check_overlap = FALSE) +
  labs(title = "NMDS Ordination: Jaccard Distance",
       subtitle = paste0("Stress = ", round(nmds_jaccard$stress, 4)),
       x = "NMDS1",
       y = "NMDS2") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))

cat("\nJaccard NMDS Stress:", nmds_jaccard$stress, "\n")

Jaccard NMDS Stress: 0.1773501 

Observations

These distance matrices and NMDS ordinations describe how species composition differs across the California sites. Euclidean distance measures the straight-line difference between two sites based on raw species abundances. Larger values in the matrix mean two sites differ more in the abundances of individual species, and smaller values mean their abundances are more alike. Because Euclidean distance is sensitive to absolute abundance, it produces larger and more variable values than Bray-Curtis or Jaccard, which are both bounded between 0 and 1. Since it emphasizes absolute abundance rather than ecological composition, it is not typically used for NMDS plots of community data.

Bray-Curtis dissimilarity, on the other hand, scales abundance differences by the combined abundance of the two sites, so it reflects both species identity and relative abundance. Values near 0 indicate very similar communities, while values near 1 indicate very different ones. Here the pairwise values fall in a fairly narrow band, roughly 0.32 to 0.47. The Bray-Curtis NMDS suggests some grouping by habitat type. For example, coastal habitats like Point Reyes, Big Sur Coast, and the Channel Islands occupy the left side of the plot, showing that their communities resemble each other more than they resemble, say, desert systems. The stress value for this NMDS is 0.2742, which is above the 0.2 level commonly used as a rule of thumb for a usable two-dimensional ordination, and the run log shows that the best solution was not repeated in 100 tries. This means the broad patterns should be treated as tentative, and fine-scale distances between sites should not be interpreted.

Finally, Jaccard distance only considers species presence or absence, ignoring abundance. The Jaccard distances are all small (about 0.05 to 0.20), which means most species occur at most sites. The Jaccard NMDS shows slightly different clusters than Bray-Curtis because it treats all species equally, regardless of how many individuals are present. Its stress is lower (0.177) and the best solution was repeated, so this ordination is the better supported of the two. Clustering in the Jaccard NMDS is more compact than in the Bray-Curtis NMDS. For example, both analyses show clustering of deserts like the Mojave and Death Valley, but in Bray-Curtis they sit far from other sites because abundance differences add to the separation. In Jaccard, these deserts remain distinct but sit closer to the center.
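
To make the three measures concrete, the sketch below computes each one by hand in base R for Yosemite Valley and Big Sur Coast, using only the ten species shown in the preview table at the top of the report. Because it covers 10 of the 85 species, the results will not match the full-matrix values printed above. The variable names are illustrative.

```r
# Abundances of the first ten species (from the preview table) at two sites
yosemite <- c(5, 8, 6, 3, 4, 3, 2, 7, 3, 3)
big_sur  <- c(0, 9, 2, 11, 3, 4, 10, 12, 8, 4)

# Euclidean: straight-line distance on raw abundances
euclidean <- sqrt(sum((yosemite - big_sur)^2))

# Bray-Curtis: summed absolute differences scaled by combined abundance
bray_curtis <- sum(abs(yosemite - big_sur)) / sum(yosemite + big_sur)

# Jaccard (binary): 1 minus shared species over species found at either site
present_y <- yosemite > 0
present_b <- big_sur > 0
shared <- sum(present_y & present_b)
either <- sum(present_y | present_b)
jaccard <- 1 - shared / either

print(round(c(euclidean = euclidean,
              bray_curtis = bray_curtis,
              jaccard = jaccard), 4))
```

For this subset, the summed absolute difference is 39 and the combined abundance is 107, so Bray-Curtis is 39/107 (about 0.364). Nine of the ten species occur at both sites, so the Jaccard distance is 0.1. The Euclidean distance, the square root of 223 (about 14.9), is on a different scale altogether, which is why its matrix looks so different from the other two.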


Question 5: Comprehensive Diversity Indices for All Sites

Calculate All Diversity Indices

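# Species richness: number of species present at each site
richness_all <- specnumber(community_matrix)

# Shannon index (H')
shannon_all <- diversity(community_matrix, index = "shannon")

# Gini-Simpson index (1 - D); vegan's "simpson" option already returns 1 - D
gini_simpson_all <- diversity(community_matrix, index = "simpson")

# Simpson's evenness: inverse Simpson (1 / D) divided by richness
inverse_simpson <- diversity(community_matrix, index = "invsimpson")
simpson_evenness <- inverse_simpson / richness_all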


diversity_all <- data.frame(
  Site = rownames(community_matrix),
  Shannon_Index = shannon_all,
  Gini_Simpson_Index = gini_simpson_all,
  Simpson_Evenness = simpson_evenness
)


cat("Complete Diversity Metrics for All 20 California Sites:\n\n")
Complete Diversity Metrics for All 20 California Sites:
kable(diversity_all,
      caption = "Comprehensive Diversity Analysis: All 20 California Sites",
      align = "lrrr",
      col.names = c("Site", "Shannon Index (H')", "Gini-Simpson Index (1-D)",
                    "Simpson's Evenness"),
      digits = 4,
      row.names = FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "bordered", "condensed"),
                full_width = TRUE,
                font_size = 14,
                position = "center") %>%
  column_spec(1, bold = TRUE, width = "12em") %>%
  column_spec(2:4, width = "8em") %>%
  row_spec(0, bold = TRUE, color = "white", background = "#2c3e50", font_size = 15) %>%
  scroll_box(width = "100%", height = "500px")
Comprehensive Diversity Analysis: All 20 California Sites
Site Shannon Index (H') Gini-Simpson Index (1-D) Simpson's Evenness
Yosemite Valley 4.1599 0.9821 0.7069
Big Sur Coast 4.1405 0.9816 0.7070
Mojave Desert 4.1693 0.9821 0.6988
Sierra Foothills 4.1769 0.9827 0.7419
Point Reyes 4.1560 0.9825 0.7426
Lake Tahoe 4.1765 0.9826 0.7293
Death Valley 4.1851 0.9831 0.7588
Santa Monica Mountains 4.2122 0.9831 0.7288
Channel Islands 4.2261 0.9836 0.7624
Central Valley Wetlands 4.1821 0.9832 0.7810
San Gabriel Mountains 4.1878 0.9829 0.7394
Anza-Borrego 4.2302 0.9838 0.7712
Redwood National Park 4.2027 0.9832 0.7546
Salton Sea 4.2434 0.9842 0.7809
Lassen Volcanic Park 4.2045 0.9829 0.7029
Elkhorn Slough 4.1744 0.9823 0.7071
Carrizo Plain 4.1603 0.9818 0.6785
Mount Shasta 4.2119 0.9834 0.7534
San Diego Chaparral 4.2050 0.9832 0.7331
Lake Berryessa 4.1848 0.9827 0.7229

Summary Statistics

cat("\n=== Summary Statistics for Diversity Indices ===\n")

=== Summary Statistics for Diversity Indices ===
cat("\nShannon Index:\n")

Shannon Index:
print(summary(diversity_all$Shannon_Index))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.140   4.173   4.185   4.189   4.207   4.243 
cat("\nGini-Simpson Index:\n")

Gini-Simpson Index:
print(summary(diversity_all$Gini_Simpson_Index))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9816  0.9825  0.9829  0.9828  0.9832  0.9842 
cat("\nSimpson's Evenness:\n")

Simpson's Evenness:
print(summary(diversity_all$Simpson_Evenness))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.6785  0.7071  0.7363  0.7351  0.7557  0.7810 

Visualization: Multi-Panel Comparison

diversity_all_ordered <- diversity_all %>%
  arrange(desc(Shannon_Index))


p1 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Shannon_Index)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  coord_flip() +
  labs(title = "Shannon Diversity Index",
       x = "Site", y = "Shannon Index (H')") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))

p2 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Gini_Simpson_Index)) +
  geom_bar(stat = "identity", fill = "darkgreen", alpha = 0.7) +
  coord_flip() +
  labs(title = "Gini-Simpson Index",
       x = "Site", y = "Gini-Simpson Index (1-D)") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))

p3 <- ggplot(diversity_all_ordered, aes(x = reorder(Site, Shannon_Index), y = Simpson_Evenness)) +
  geom_bar(stat = "identity", fill = "darkorange", alpha = 0.7) +
  coord_flip() +
  labs(title = "Simpson's Evenness",
       x = "Site", y = "Simpson's Evenness (E)") +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 18),
        axis.title = element_text(size = 14, face = "bold"),
        axis.text = element_text(size = 12),
        axis.text.y = element_text(size = 12))


p1 / p2 / p3 +
  plot_annotation(title = "Diversity Indices Across All California Sites",
                  theme = theme(plot.title = element_text(face = "bold", size = 22)))

Visualization: Relationships Between Metrics

ggplot(diversity_all, aes(x = Shannon_Index, y = Gini_Simpson_Index)) +
  geom_point(size = 5, color = "darkblue", alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed", linewidth = 1.2) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Shannon Index vs Gini-Simpson Index",
       subtitle = "Relationship between two diversity metrics",
       x = "Shannon Index (H')",
       y = "Gini-Simpson Index (1-D)") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))
`geom_smooth()` using formula = 'y ~ x'

ggplot(diversity_all, aes(x = Shannon_Index, y = Simpson_Evenness)) +
  geom_point(size = 5, color = "darkgreen", alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed", linewidth = 1.2) +
  geom_text(aes(label = Site), hjust = -0.1, vjust = 0.5, size = 4.5, check_overlap = TRUE) +
  labs(title = "Shannon Index vs Simpson's Evenness",
       subtitle = "Does higher diversity correlate with higher evenness?",
       x = "Shannon Index (H')",
       y = "Simpson's Evenness (E)") +
  theme_minimal(base_size = 16) +
  theme(plot.title = element_text(face = "bold", size = 20),
        plot.subtitle = element_text(size = 14),
        axis.title = element_text(size = 16, face = "bold"),
        axis.text = element_text(size = 14))
`geom_smooth()` using formula = 'y ~ x'

cor_shannon_gini <- cor(diversity_all$Shannon_Index, diversity_all$Gini_Simpson_Index)
cor_shannon_evenness <- cor(diversity_all$Shannon_Index, diversity_all$Simpson_Evenness)

cat("\nCorrelation between Shannon and Gini-Simpson:", round(cor_shannon_gini, 3), "\n")

Correlation between Shannon and Gini-Simpson: 0.932 
cat("Correlation between Shannon and Evenness:", round(cor_shannon_evenness, 3), "\n")
Correlation between Shannon and Evenness: 0.629 
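
With only 20 sites, a single Pearson coefficient can be sensitive to one or two unusual sites. The sketch below shows one way to add a confidence interval and a rank-based check using base R's `cor.test()`. The `toy` data frame and its values are made up for illustration and only stand in for `diversity_all`; nothing here is output from the report's data.

```r
# Hypothetical stand-in for diversity_all; these values are invented, not from the dataset
toy <- data.frame(
  Shannon_Index      = c(4.10, 4.15, 4.20, 4.22, 4.25, 4.30),
  Gini_Simpson_Index = c(0.9816, 0.9822, 0.9827, 0.9831, 0.9835, 0.9842)
)

# Pearson: strength of the linear relationship, with a 95% confidence interval
pearson <- cor.test(toy$Shannon_Index, toy$Gini_Simpson_Index, method = "pearson")

# Spearman: rank-based, so it is less sensitive to outliers and to non-linearity
spearman <- cor.test(toy$Shannon_Index, toy$Gini_Simpson_Index, method = "spearman")

cat("Pearson r:", round(pearson$estimate, 3),
    " 95% CI:", round(pearson$conf.int, 3), "\n")
cat("Spearman rho:", round(spearman$estimate, 3), "\n")
```

If the Pearson and Spearman estimates agree on the real data, the relationship is probably not driven by a single site. If they differ noticeably, the scatterplots above are worth a second look.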

Observations

Using all 20 California sites, the Gini-Simpson Index, Simpson’s Evenness, and the Shannon Index reveal more about community structure than species counts alone. This analysis expands on Question 3, which focused on three specific sites, to look at the dataset as a whole.

The Gini-Simpson Index measures the probability that two randomly chosen individuals belong to different species, so values near 1 indicate high diversity. All sites scored high (0.9816 to 0.9842), meaning no single species dominates at any site. Simpson’s Evenness shows how evenly individuals are distributed among species, with higher values indicating more balanced communities. Most sites are fairly even, though Channel Islands and Anza-Borrego are slightly higher. Finally, the Shannon Index combines richness and evenness, increasing when sites have many species with relatively equal abundances.

The correlations above are consistent with this picture: the Shannon and Gini-Simpson indices track each other closely (r = 0.932), while the Shannon Index and Simpson’s Evenness are only moderately correlated (r = 0.629), so evenness captures something the diversity indices do not. Taken together, these metrics show that California’s ecosystems are very diverse and generally even, while small differences between sites reflect variation in habitat and environmental conditions.
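
To make the three metrics described above concrete, here is a minimal base R sketch that computes them for two toy sites with the same richness and the same total number of individuals. The abundance vectors and the `diversity_summary()` helper are hypothetical, and Simpson's Evenness is assumed to be the common form E = (1/D)/S, which may differ from the definition used earlier in this report.

```r
# Hypothetical abundance counts for two toy sites (not taken from the dataset)
even_site   <- c(10, 10, 10, 10)   # four species, equal abundances
skewed_site <- c(37, 1, 1, 1)      # four species, one dominant

diversity_summary <- function(counts) {
  counts <- counts[counts > 0]     # absent species do not contribute
  p <- counts / sum(counts)        # relative abundance of each species
  S <- length(counts)              # species richness
  D <- sum(p^2)                    # Simpson's dominance index
  c(Richness         = S,
    Shannon_Index    = -sum(p * log(p)),  # H' = -sum(p * ln p)
    Gini_Simpson     = 1 - D,             # chance two individuals differ in species
    Simpson_Evenness = (1 / D) / S)       # assumed definition: E = (1/D) / S
}

round(rbind(even   = diversity_summary(even_site),
            skewed = diversity_summary(skewed_site)), 3)
```

Both toy sites have four species, yet the skewed site scores lower on all three metrics, which is why richness alone is not enough to describe a community. The example also shows why the observed Gini-Simpson values sit in such a narrow band: the index cannot exceed 1 - 1/S, and with at most 85 species in this dataset that ceiling is about 0.988.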



Question 6: Overall Conclusions and Most Applicable Metrics

Based on these analyses of the California Species Abundance dataset, California’s ecosystems are generally highly diverse. Most sites support many species that are fairly evenly distributed, though dominance and abundance patterns differ somewhat by habitat type. Within this overall pattern of high diversity, a reasonable hypothesis is that sites with more extreme or variable environmental conditions will have slightly lower diversity, while more stable habitats will support more balanced and diverse communities. Desert, coastal, and montane sites each show subtle differences, reflecting how environmental conditions shape community structure.

I found these patterns easiest to understand through the NMDS plots, which show sites clustered by ecological and geographic patterns. The Bray-Curtis and Jaccard distances were especially effective for comparing sites by abundance and by composition. Among the diversity metrics, the Shannon Index was particularly useful in this report because it captures both richness and evenness. Simpson diversity complemented it by highlighting patterns of species dominance.
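
As a reminder of why the two distances mentioned above give different views of the same sites, the sketch below computes both by hand for a pair of toy sites. The site vectors and the helper functions `bray_curtis()` and `jaccard()` are hypothetical illustrations of the standard definitions, not the code used to produce the NMDS plots in this report.

```r
# Hypothetical abundances for the same four species at two toy sites
site_a <- c(quail = 5, kestrel = 8, bluebird = 0, egret = 3)
site_b <- c(quail = 0, kestrel = 4, bluebird = 6, egret = 3)

# Bray-Curtis dissimilarity: uses abundances
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard dissimilarity: uses presence/absence only
jaccard <- function(x, y) {
  present_x <- x > 0
  present_y <- y > 0
  1 - sum(present_x & present_y) / sum(present_x | present_y)
}

cat("Bray-Curtis:", round(bray_curtis(site_a, site_b), 3), "\n")
cat("Jaccard:", round(jaccard(site_a, site_b), 3), "\n")
```

Jaccard only asks which species are shared, while Bray-Curtis also responds to how many individuals of each species are present, so two sites with the same species list can still be far apart under Bray-Curtis.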