This workshop is designed for entomology students who are new to R and may need a refresher on ecological concepts related to biodiversity data analysis. No prior R experience is required!
By the end of this workshop, you will be able to:
When we set pitfall traps across different habitats, weβre fundamentally asking:
βDo different environments support different insect communities, and why?β
This question connects to foundational ecological theories:
Key Ecological Concepts:
Niche Theory: Each species has specific environmental requirements (temperature, moisture, food). Species occur where conditions match their niche.
Habitat Filtering: The environment βfiltersβ which species can survive. Forest conditions filter out sun-loving species; agricultural disturbance filters out sensitive specialists.
Species-Area Relationship: Larger, more complex habitats typically support more species.
Indicator Species: Some species are sensitive to environmental conditions and can indicate habitat quality or disturbance level.
Our statistical analyses are tools to detect these ecological patterns and test whether the differences we observe are real or just due to chance.
Before writing any code, understand the journey from raw data to ecological insights:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β COMPLETE ANALYSIS WORKFLOW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββ
β RAW DATA β
β (Your pitfall β
β trap counts) β
ββββββββββ¬βββββββββ
β
ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PHASE 1: DATA PREPARATION (Day 1, Sessions 1-3) β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£ β
β β β’ Import data into R β β
β β β’ Check for errors (typos, impossible values) β β
β β β’ Standardize names (taxonomy is messy!) β β
β β β’ Create community matrix (sites Γ species) β β
β β β β
β β KEY QUESTION: "Is my data clean and trustworthy?" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PHASE 2: DATA EXPLORATION (Day 1, Session 4) β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£ β
β β β’ Summarize: How many species? How many individuals? β β
β β β’ Explore: Which insect ORDERS are abundant? β β
β β β’ Decide: Which groups have "interesting stories"? β β
β β β’ Visualize: What patterns can we SEE? β β
β β β β
β β KEY QUESTION: "Which insect groups should I focus β β
β β my analysis on?" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
D β βΌ β
A β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
Y β β PHASE 3: DIVERSITY ANALYSIS (Day 2, Session 5) β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£ β
1 β β ALPHA DIVERSITY (within-site) β β
β β β’ How many species at each site? (Richness) β β
| β β β’ How diverse is each site? (Shannon, Simpson) β β
β β β’ Are differences due to sampling effort? β β
D β β (Rarefaction) β β
A β β β β
Y β β KEY QUESTION: "Which habitats are most diverse?" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
2 β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PHASE 4: COMMUNITY ANALYSIS (Day 2, Sessions 6-7) β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£ β
β β BETA DIVERSITY (between-sites) β β
β β β’ How different are communities? (Distance matrix) β β
β β β’ Can we VISUALIZE the pattern? (NMDS/PCoA) β β
β β β’ Is the pattern SIGNIFICANT? (PERMANOVA) β β
β β β’ WHICH species drive differences? (SIMPER) β β
β β β β
β β KEY QUESTION: "Do habitats have different β β
β β communities? Which species matter?" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β PHASE 5: INTERPRETATION (Day 2, Session 8) β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£ β
β β β’ What do the numbers MEAN ecologically? β β
β β β’ What MECHANISMS might explain the patterns? β β
β β β’ What are the LIMITATIONS? β β
β β β’ What NEW QUESTIONS arise? β β
β β β β
β β KEY QUESTION: "So what? Why does this matter?" β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
ββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββ
βΌ
βββββββββββββββββββ
β PUBLICATION / β
β THESIS CHAPTER β
βββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DAY 1: DATA FOUNDATIONS & EXPLORATION 9:00 - 15:00 β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β 09:00 - 10:15 Session 1: R Basics & Importing Data 75 min β
β 10:15 - 10:30 β Break 15 min β
β 10:30 - 11:30 Session 2: Data Wrangling with tidyverse 60 min β
β 11:30 - 12:30 π½οΈ Lunch 60 min β
β 12:30 - 13:45 Session 3: Data Cleaning & Community Matrices 75 min β
β 13:45 - 14:00 β Break 15 min β
β 14:00 - 15:00 Session 4: Choosing Your Focal Taxa 60 min β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β DAY 2: ANALYSIS & INTERPRETATION 9:00 - 15:00 β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β 09:00 - 10:00 Session 5: Diversity Metrics & Visualization 60 min β
β 10:00 - 10:15 β Break 15 min β
β 10:15 - 11:15 Session 6: Ordination - Seeing Community Patterns 60 min β
β 11:15 - 12:15 π½οΈ Lunch 60 min β
β 12:15 - 13:30 Session 7: Statistical Testing 75 min β
β 13:30 - 13:45 β Break 15 min β
β 13:45 - 14:45 Session 8: Interpretation & Synthesis 60 min β
β 14:45 - 15:00 Wrap-up & Next Steps 15 min β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Weβll use sample_insect_data.csv - a
pitfall trap survey of ground-dwelling arthropods.
About Pitfall Traps:
Pitfall traps are cups buried flush with the ground surface. Ground-active arthropods fall in and are collected. This method is:
Always consider sampling bias when interpreting results!
| Column | Description |
|---|---|
| site | Site identifier (S01-S12) |
| habitat | Habitat type (forest, grassland, agriculture) |
| landscape | Landscape complexity (complex, simple) |
| order | Insect order (Coleoptera, Hymenoptera, etc.) |
| family | Insect family |
| morphospecies | Species identifier |
| abundance | Number of individuals |
Time: 09:00 - 10:15 (75 minutes)
Why do we use R instead of Excel?
Reproducibility: Your analysis is saved as a script. Anyone (including future you) can re-run it exactly.
Ecological packages: R has specialized packages
like vegan designed specifically for community
ecology.
Handling complexity: Ecological data often has hundreds of species and complex designs. R handles this easily.
Publication-quality figures: R creates beautiful, customizable plots ready for journals.
# Packages are collections of functions that extend R's capabilities
# We load them at the start of EVERY session
library(tidyverse) # Data wrangling and visualization
library(vegan) # Community ecology analyses (diversity, ordination)
library(ape) # Phylogenetic tools and PCoA ordination
# If you get "there is no package called 'x'", install it first:
# install.packages("x")Why these packages?
filter, select,
mutate# R as a calculator
2 + 3 # Addition
10 / 2 # Division
5^2 # Power (5 squared = 25)
sqrt(16) # Square root
# Creating objects with <- (assignment operator)
species_count <- 42 # Store a number
site_name <- "Forest_A" # Store text
# Vectors: collections of values (fundamental to R!)
abundance <- c(12, 45, 23, 8, 67, 34)
species <- c("Carabidae_sp1", "Carabidae_sp2", "Staphylinidae_sp1")
# Basic operations on vectors
length(abundance) # How many elements? β 6
sum(abundance) # Total β 189
mean(abundance) # Average β 31.5
max(abundance) # Maximum β 67
min(abundance) # Minimum β 8
# Extracting elements (indexing)
abundance[1] # First element β 12
abundance[c(1, 3, 5)] # Elements 1, 3, and 5
abundance[abundance > 30] # Only values > 30 β 45, 67, 34# Import our main dataset
# Make sure the file is in your working directory or provide full path
insect_data <- read.csv("data/raw/sample_insect_data.csv")
# More robust import with options for common issues:
insect_data <- read.csv(
"data/raw/sample_insect_data.csv",
header = TRUE, # First row contains column names
stringsAsFactors = FALSE, # Keep text as text (not factors)
na.strings = c("", "NA", "N/A", "-") # Recognize these as missing data
)Why explore data immediately?
Before any analysis, you need to know:
Never trust data blindly - always look at it first!
# First look at your data
head(insect_data) # First 6 rows
tail(insect_data) # Last 6 rows
View(insect_data) # Spreadsheet view (capital V!)
# Structure and dimensions
str(insect_data) # Column types and preview
dim(insect_data) # Rows Γ columns
nrow(insect_data) # Number of rows
ncol(insect_data) # Number of columns
names(insect_data) # Column names
# Summary statistics
summary(insect_data) # Statistical summary of each column# Missing values (NA = "Not Available")
sum(is.na(insect_data)) # Total NAs in entire dataset
colSums(is.na(insect_data)) # NAs per column
# Check categorical columns for typos
unique(insect_data$habitat) # What habitat values exist?
unique(insect_data$order) # What insect orders are present?
# Frequency tables
table(insect_data$habitat) # How many records per habitat?
table(insect_data$order) # How many records per order?
# Check numeric ranges for impossible values
range(insect_data$abundance) # Min and max abundance
summary(insect_data$abundance) # Full summary
# Look for suspicious values
insect_data[insect_data$abundance < 0, ] # Negative abundances?β BREAK: 10:15 - 10:30
Time: 10:30 - 11:30 (60 minutes)
Why use pipes (%>%)?
Pipes make code readable by turning nested operations into a
step-by-step recipe. Read %>% as
βTHENβ.
Instead of: βBake(Mix(Measure(flour, sugar), eggs))β
Write: βTake flour and sugar, THEN measure, THEN mix with eggs, THEN bakeβ
library(tidyverse)
# Without pipe - confusing nested functions:
round(mean(sqrt(c(1, 4, 9, 16))), 2)
# With pipe - clear step-by-step:
c(1, 4, 9, 16) %>% # Start with these numbers
sqrt() %>% # THEN take square root
mean() %>% # THEN calculate mean
round(2) # THEN round to 2 decimals
# Keyboard shortcut: Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac)Think of these as actions you perform on your data:
| Verb | Action | Example Use |
|---|---|---|
filter() |
Keep rows that match a condition | Keep only forest sites |
select() |
Keep or remove columns | Keep only site, species, abundance |
mutate() |
Create new columns | Calculate log-transformed abundance |
arrange() |
Sort rows | Sort by abundance (high to low) |
summarise() |
Collapse to summary statistics | Calculate mean abundance |
group_by() |
Group data for operations | Calculate mean BY habitat |
# Keep only forest sites
insect_data %>%
filter(habitat == "forest")
# Multiple conditions with AND (comma or &)
insect_data %>%
filter(habitat == "forest", abundance > 10)
# Multiple options with OR (|) or %in%
insect_data %>%
filter(habitat %in% c("forest", "grassland"))
# Exclude with !=
insect_data %>%
filter(habitat != "agriculture")# Keep specific columns
insect_data %>%
select(site, habitat, morphospecies, abundance)
# Remove columns with minus sign
insect_data %>%
select(-family)
# Select helpers for patterns
insect_data %>%
select(starts_with("site")) # Columns starting with "site"
insect_data %>%
select(contains("species")) # Columns containing "species"Why is this so powerful?
In ecology, we almost always want to compare GROUPS (habitats,
treatments, sites). group_by() tells R to do calculations
separately for each group.
# Calculate summary statistics BY habitat
insect_data %>%
group_by(habitat) %>%
summarise(
total_abundance = sum(abundance),
n_species = n_distinct(morphospecies),
n_sites = n_distinct(site),
mean_abundance = mean(abundance),
.groups = "drop" # Clean up grouping after summarise
)
# Multiple grouping variables
insect_data %>%
group_by(habitat, order) %>%
summarise(
total_abundance = sum(abundance),
n_species = n_distinct(morphospecies),
.groups = "drop"
)Ecological Data Formats:
Long format: One row per observation (site-species-abundance). Good for: data entry, plotting, dplyr operations.
Wide format (Community Matrix): Rows = sites, Columns = species. Required for: vegan package, ordination, diversity indices.
Most raw data is long; most analyses need wide. Youβll convert between them constantly!
# LONG TO WIDE: Create a community matrix
species_wide <- insect_data %>%
group_by(site, morphospecies) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
pivot_wider(
names_from = morphospecies, # Species become columns
values_from = abundance, # Cell values are abundances
values_fill = 0 # Missing = 0 (species absent)
)
head(species_wide)
# WIDE TO LONG: For plotting
species_long <- species_wide %>%
pivot_longer(
cols = -site, # All columns except site
names_to = "species", # Column names go to "species"
values_to = "abundance" # Values go to "abundance"
)π½οΈ LUNCH: 11:30 - 12:30
Time: 12:30 - 13:45 (75 minutes)
βGarbage In, Garbage Outβ
Taxonomic data is notoriously messy. Common problems:
Your cleaning decisions affect your conclusions. Document what you did!
# Check for problems first
unique(insect_data$habitat) # Look for typos, case issues
# Common cleaning operations
clean_data <- insect_data %>%
mutate(
# Standardize text: lowercase and remove extra spaces
habitat = tolower(trimws(habitat)),
order = tolower(trimws(order)),
# Fix specific typos (if any exist)
habitat = case_when(
habitat == "forrest" ~ "forest", # Common typo
habitat == "grasland" ~ "grassland",
TRUE ~ habitat # Keep all others as-is
)
) %>%
# Remove impossible or missing values
filter(
abundance >= 0, # No negative abundances
!is.na(abundance), # No missing abundances
!is.na(morphospecies) # No missing species IDs
)
# Verify cleaning worked
unique(clean_data$habitat)What is a Community Matrix?
The community matrix is the fundamental data structure in community ecology:
Species_A Species_B Species_C Species_D
Site_1 12 5 0 8
Site_2 0 23 15 2
Site_3 7 0 0 45
This matrix is what ordination and diversity analyses use!
# For this workshop, we'll focus on beetles (Coleoptera)
# We'll learn HOW to choose focal taxa in Session 4
comm_matrix <- insect_data %>%
filter(order == "Coleoptera") %>% # Focus on beetles
group_by(site, morphospecies) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
pivot_wider(
names_from = morphospecies,
values_from = abundance,
values_fill = 0 # CRITICAL: Missing species = 0, not NA
) %>%
column_to_rownames("site") # Site names become row names
# Check the result
dim(comm_matrix) # Sites Γ Species
head(comm_matrix[, 1:5]) # First 5 speciesWhy values_fill = 0?
In ecology, if we didnβt record a species at a site, we assume it was absent (abundance = 0), not βunknownβ (NA).
This is an important assumption! It assumes: - Equal sampling effort across sites - Species detection is reliable
If sampling effort varied, you should use rarefaction (Session 5).
# Environmental data: one row per site with habitat info
env_data <- insect_data %>%
select(site, habitat, landscape) %>%
distinct() %>% # One row per site
column_to_rownames("site") # Site names as row names
# CRITICAL: Ensure rows match between matrices!
env_data <- env_data[rownames(comm_matrix), , drop = FALSE]
# rownames(comm_matrix): Extracts the unique IDs (like "S01", "S02") from your insect data.
# env_data[..., ]: Selects and reorders rows in the environmental table based on those IDs
# , (The blank space): Tells R to keep all columns (e.g., habitat, elevation, distance).
# drop = FALSE -- A "defensive" command. It ensures that even if you only have one environmental variable, R keeps the data as a data frame instead of turning it into a simple list (vector)..
# Verify alignment (MUST be TRUE!)
all(rownames(comm_matrix) == rownames(env_data))β BREAK: 13:45 - 14:00
Time: 14:00 - 15:00 (60 minutes)
The Central Question of This Session:
Your pitfall traps caught multiple insect orders (Coleoptera, Hymenoptera, Araneae, etc.). Which groups should you analyze in detail?
Not all groups are equal! Some have ecological stories to tell; others are just noise.
Why Not Analyze Everything?
Good focal taxa have:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DECISION FLOWCHART: CHOOSING FOCAL INSECT GROUPS β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
For each insect ORDER in your data:
β
βΌ
βββββββββββββββββββββββββββββββββ
β CRITERION 1: ABUNDANCE β
β Total individuals β₯ 50? β
βββββββββββββββββ¬ββββββββββββββββ
β
NO βββββββββ΄ββββββββΊ YES
β β
βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββββββββββββββββ
β EXCLUDE β β CRITERION 2: DIVERSITY β
β Too few data β β Number of morphospecies β₯ 5? β
β for statistics β βββββββββββββββββ¬ββββββββββββββββ
βββββββββββββββββββ β
NO βββββββββ΄ββββββββΊ YES
β β
βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββββββββββββββββ
β POSSIBLE but β β CRITERION 3: OCCURRENCE β
β Limited β β Present in β₯ 50% of sites? β
β (can only β βββββββββββββββββ¬ββββββββββββββββ
β analyze abund- β β
β ance, not β NO βββββββββ΄ββββββββΊ YES
β diversity) β β β
βββββββββββββββββββ βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββββββββββββββββ
β CAUTION β β CRITERION 4: VARIATION β
β May be habitat β β Shows differences between β
β specialist or β β habitats? (CV > 30%) β
β rare β βββββββββββββββββ¬ββββββββββββββββ
βββββββββββββββββββ β
NO βββββββββ΄ββββββββΊ YES
β β
βΌ βΌ
βββββββββββββββββββ βββββββββββββββββββ
β POSSIBLE β β EXCELLENT β
β May be β β FOCAL TAXON! β
β generalist β β β
β (still β β High priority β
β interesting!) β β for analysis β
βββββββββββββββββββ βββββββββββββββββββ
# What do we have?
cat("=== DATASET OVERVIEW ===\n")
cat("Total records:", nrow(insect_data), "\n")
cat("Total individuals:", sum(insect_data$abundance), "\n")
cat("Number of sites:", n_distinct(insect_data$site), "\n")
cat("Number of habitats:", n_distinct(insect_data$habitat), "\n")
cat("Number of orders:", n_distinct(insect_data$order), "\n")
cat("Number of morphospecies:", n_distinct(insect_data$morphospecies), "\n")
# How are individuals distributed across orders?
insect_data %>%
group_by(order) %>%
summarise(
total_abundance = sum(abundance),
percent_of_total = round(sum(abundance) / sum(insect_data$abundance) * 100, 1)
) %>%
arrange(desc(total_abundance))# Calculate all criteria for each order
order_evaluation <- insect_data %>%
group_by(order) %>%
summarise(
# Criterion 1: Abundance
total_abundance = sum(abundance),
# Criterion 2: Diversity
n_morphospecies = n_distinct(morphospecies),
# Criterion 3: Occurrence
n_sites_present = n_distinct(site[abundance > 0]),
pct_sites = round(n_distinct(site[abundance > 0]) / n_distinct(insect_data$site) * 100, 0),
.groups = "drop"
) %>%
arrange(desc(total_abundance))
print(order_evaluation)
# Visualize: Which orders are most abundant?
ggplot(order_evaluation, aes(x = reorder(order, total_abundance),
y = total_abundance)) +
geom_col(fill = "steelblue") +
geom_text(aes(label = paste0("n=", n_morphospecies, " spp")),
hjust = -0.1, size = 3) +
coord_flip() +
labs(
x = "Order",
y = "Total Abundance",
title = "Which Insect Orders Dominate Your Samples?",
subtitle = "Numbers show morphospecies count"
) +
theme_bw()# For Criterion 4: Do abundances vary between habitats?
habitat_variation <- insect_data %>%
group_by(order, habitat) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
group_by(order) %>%
summarise(
# Coefficient of Variation: higher = more variation between habitats
cv_abundance = round(sd(abundance) / mean(abundance) * 100, 1),
# Which habitat has most individuals?
dominant_habitat = habitat[which.max(abundance)],
# How much more in best vs worst habitat?
max_min_ratio = round(max(abundance) / (min(abundance) + 1), 1),
.groups = "drop"
)
print(habitat_variation)Interpreting Coefficient of Variation (CV):
Both patterns are ecologically interesting! Generalists tell us habitats are functionally similar for that group; specialists show habitat filtering.
# Combine criteria and score each order
focal_taxa_scores <- order_evaluation %>%
left_join(habitat_variation, by = "order") %>%
mutate(
# Score each criterion (0-3 points each)
score_abundance = case_when(
total_abundance >= 100 ~ 3,
total_abundance >= 50 ~ 2,
total_abundance >= 30 ~ 1,
TRUE ~ 0
),
score_diversity = case_when(
n_morphospecies >= 10 ~ 3,
n_morphospecies >= 5 ~ 2,
n_morphospecies >= 3 ~ 1,
TRUE ~ 0
),
score_occurrence = case_when(
pct_sites >= 80 ~ 3,
pct_sites >= 50 ~ 2,
pct_sites >= 30 ~ 1,
TRUE ~ 0
),
score_variation = case_when(
cv_abundance >= 50 ~ 3,
cv_abundance >= 30 ~ 2,
cv_abundance >= 15 ~ 1,
TRUE ~ 0
),
# Total score (max = 12)
total_score = score_abundance + score_diversity + score_occurrence + score_variation,
# Final recommendation
recommendation = case_when(
total_score >= 10 ~ "β
β
β
Highly recommended",
total_score >= 7 ~ "β
β
Recommended",
total_score >= 4 ~ "β
Possible with caution",
TRUE ~ "β Not recommended"
)
) %>%
arrange(desc(total_score))
# Display decision table
cat("\n========================================\n")
cat(" FOCAL TAXA RECOMMENDATION TABLE\n")
cat("========================================\n\n")
focal_taxa_scores %>%
select(order, total_abundance, n_morphospecies, pct_sites,
cv_abundance, total_score, recommendation) %>%
print(n = Inf)# Create visual summary of recommendations
ggplot(focal_taxa_scores, aes(x = reorder(order, total_score),
y = total_score,
fill = recommendation)) +
geom_col() +
geom_hline(yintercept = c(4, 7, 10), linetype = "dashed", alpha = 0.5) +
coord_flip() +
scale_fill_manual(values = c(
"β
β
β
Highly recommended" = "#27ae60",
"β
β
Recommended" = "#3498db",
"β
Possible with caution" = "#f39c12",
"β Not recommended" = "#95a5a6"
)) +
labs(
x = "Order",
y = "Suitability Score (max = 12)",
title = "Which Insect Groups Should You Analyze?",
subtitle = "Based on abundance, diversity, occurrence, and habitat variation",
fill = "Recommendation"
) +
theme_bw() +
theme(legend.position = "bottom")Common Pitfall Trap Groups and Their Ecological Value:
| Group | Pitfall Suitability | Ecological Role | Indicator Value |
|---|---|---|---|
| Carabidae (ground beetles) | β β β Excellent | Predators, seed eaters | High - respond to disturbance, moisture |
| Formicidae (ants) | β β β Excellent | Ecosystem engineers | High - sensitive to temperature, disturbance |
| Araneae (spiders) | β β Good | Predators | High - indicate prey availability |
| Staphylinidae (rove beetles) | β β Good | Predators, decomposers | Moderate - very diverse |
| Scarabaeidae (dung beetles) | β β Good | Decomposers | High - depend on mammal presence |
| Orthoptera (grasshoppers) | β Moderate | Herbivores | Moderate - vegetation-dependent |
| Diptera (flies) | β Poor | Various | Low - mostly fly away from traps |
| Lepidoptera (moths) | β Poor | Herbivores, pollinators | Low - rarely caught in pitfalls |
# Once you've chosen a focal group (e.g., Coleoptera), preview the patterns
# Create community matrix for chosen group
focal_matrix <- insect_data %>%
filter(order == "Coleoptera") %>% # Replace with your chosen order
group_by(site, morphospecies) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
pivot_wider(names_from = morphospecies, values_from = abundance, values_fill = 0) %>%
column_to_rownames("site")
# Get environmental data
focal_env <- insect_data %>%
select(site, habitat, landscape) %>%
distinct() %>%
column_to_rownames("site")
focal_env <- focal_env[rownames(focal_matrix), ]
# Quick NMDS preview (we'll learn this properly tomorrow!)
library(vegan)
set.seed(123)
nmds_preview <- metaMDS(focal_matrix, distance = "bray", k = 2, trymax = 50)
# Plot preview
nmds_scores <- as.data.frame(scores(nmds_preview, display = "sites"))
nmds_scores$habitat <- focal_env$habitat
ggplot(nmds_scores, aes(x = NMDS1, y = NMDS2, color = habitat)) +
geom_point(size = 4) +
stat_ellipse(linetype = 2) +
labs(
title = "Preview: Do Beetle Communities Differ by Habitat?",
subtitle = paste("NMDS stress =", round(nmds_preview$stress, 3),
"- We'll interpret this properly tomorrow!"),
color = "Habitat"
) +
scale_color_brewer(palette = "Set1") +
theme_bw() +
coord_equal()End of Day 1 Checkpoint:
Before tomorrow, you should have:
Objects weβll use tomorrow:
insect_data - The raw datacomm_matrix - Community matrix for your focal
groupenv_data - Environmental data matching the community
matrixHappy coding and happy bug hunting! π