╔════════════════════════════════════════════════════════════════════════════╗
║ DAY 1: DATA FOUNDATIONS                                       9:00 - 15:00 ║
╠════════════════════════════════════════════════════════════════════════════╣
║ 09:00 - 10:15  Session 1: Project Setup & R Essentials              75 min ║
║ 10:15 - 10:30  Break                                                15 min ║
║ 10:30 - 11:30  Session 2: Importing & Exploring Data                60 min ║
║ 11:30 - 12:30  LUNCH                                                60 min ║
║ 12:30 - 13:45  Session 3: Data Wrangling with tidyverse             75 min ║
║ 13:45 - 14:00  Break                                                15 min ║
║ 14:00 - 15:00  Session 4: Data Exploration & Cleaning               60 min ║
╠════════════════════════════════════════════════════════════════════════════╣
║ DAY 2: ANALYSIS & INTERPRETATION                              9:00 - 15:00 ║
╠════════════════════════════════════════════════════════════════════════════╣
║ 09:00 - 10:00  Session 5: Visualization with ggplot2                60 min ║
║ 10:00 - 10:15  Break                                                15 min ║
║ 10:15 - 11:15  Session 6: Diversity Metrics                         60 min ║
║ 11:15 - 12:15  LUNCH                                                60 min ║
║ 12:15 - 13:30  Session 7: Multivariate Analysis (NMDS)              75 min ║
║ 13:30 - 13:45  Break                                                15 min ║
║ 13:45 - 14:45  Session 8: Statistical Testing                       60 min ║
║ 14:45 - 15:00  Wrap-up & Next Steps                                 15 min ║
╚════════════════════════════════════════════════════════════════════════════╝
Before this workshop, you should have completed:
When we sample insects across different habitats or landscapes, we're fundamentally asking:
"Do different environments support different insect communities, and why?"
This connects to ecological theory:
Our statistical analyses detect these patterns and test hypotheses.
RAW DATA → CLEAN & PREPARE → EXPLORE → ANALYZE → INTERPRET
                 │              │         │
                 │              │         ├── Alpha diversity
                 │              │         ├── NMDS ordination
                 │              │         └── PERMANOVA
                 │              │
                 │              └── Which taxa to focus on?
                 │
                 └── Community matrix + Environmental data
KEY CONCEPT: Always work within an R Project. Never use setwd().
Go to the Insect_Ecology_Workshop folder and open the .Rproj file.
Verify you're in the right place:
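A minimal sketch of one way to run this check, using only base R. It assumes the project folder is named Insect_Ecology_Workshop, as in this handout; the variable names `wd` and `in_project` are illustrative.

```r
# Print the current working directory. Inside the project it should end
# with "Insect_Ecology_Workshop" (folder name assumed from this handout).
wd <- getwd()
print(wd)

# TRUE if the working directory is the project folder itself
in_project <- basename(wd) == "Insect_Ecology_Workshop"
print(in_project)

# List any .Rproj files found in the working directory
list.files(pattern = "\\.Rproj$")
```

If `in_project` is FALSE, close RStudio and reopen it through the .Rproj file instead of calling setwd().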
Insect_Ecology_Workshop/
│
├── Insect_Ecology_Workshop.Rproj   ← Always open THIS file!
│
├── data/
│   ├── raw/                        ← Original data (NEVER modify!)
│   │   ├── sample_insect_data.csv
│   │   └── pollinator_data.csv
│   └── processed/                  ← Cleaned data goes here
│       └── beetles_clean.csv
│
├── scripts/                        ← Your R scripts
│   ├── 01_data_import.R
│   ├── 02_exploration.R
│   ├── 03_diversity.R
│   └── 04_ordination.R
│
├── output/                         ← Tables, results
│   └── diversity_table.csv
│
└── figures/                        ← Saved plots
    ├── nmds_plot.png
    └── diversity_boxplot.pdf
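If these folders do not exist yet, they can be created from R rather than by hand. The sketch below is one possible way, using base R's `dir.create()`. The `root` variable and the temporary directory are assumptions made so the example is safe to run anywhere; inside your own R Project you would set `root <- "."`.

```r
# Hypothetical location for the skeleton: a temporary directory,
# so running this sketch never touches your real project.
root <- file.path(tempdir(), "Insect_Ecology_Workshop")

# Folder names taken from the structure shown above
folders <- c("data/raw", "data/processed", "scripts", "output", "figures")

for (f in folders) {
  # recursive = TRUE also creates parent folders such as data/
  dir.create(file.path(root, f), recursive = TRUE, showWarnings = FALSE)
}

# Show what was created
list.dirs(root, full.names = FALSE, recursive = TRUE)
```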
# ❌ BAD - Absolute paths that break on other computers:
setwd("C:/Users/John/Desktop/thesis/chapter2/data")
data <- read.csv("beetles.csv")
# ❌ BAD - Files scattered everywhere:
data <- read.csv("C:/Users/John/Downloads/beetles.csv")
# ✅ GOOD - Relative paths from project folder:
data <- read.csv("data/raw/beetles.csv")
# Works on ANY computer with the same folder structure!
# ✅ GOOD - Saving outputs to organized folders:
write.csv(results, "output/diversity_results.csv")
ggsave("figures/nmds_plot.png")

Never modify files in data/raw/!
Your raw data should remain exactly as you received it. If you need to clean or modify data:
Read the original file from data/raw/
Save the cleaned version to data/processed/

# Load dplyr for %>%, filter(), and mutate()
library(dplyr)
# Read raw data
beetles_raw <- read.csv("data/raw/beetle_survey.csv")
# Clean it: beetles_clean is the new, cleaned object
beetles_clean <- beetles_raw %>%
  # filter() keeps only rows that meet a condition;
  # here, rows where abundance is not missing
  filter(!is.na(abundance)) %>%
  # mutate() creates a new column or modifies an existing one;
  # tolower() converts all text in the habitat column to lowercase
  mutate(habitat = tolower(habitat))
# Save the cleaned version in the processed folder
write.csv(beetles_clean, "data/processed/beetles_clean.csv", row.names = FALSE)

# Vectors are collections of values
abundances <- c(12, 45, 23, 8, 67, 34)
# Operations on vectors
sum(abundances) # 189
mean(abundances) # 31.5
abundances * 2 # Doubles each element
# Indexing: extract elements with [ ]
abundances[1] # First element: 12
abundances[c(1, 3, 5)] # Elements 1, 3, 5
abundances[abundances > 30] # Elements greater than 30

# Data frames are like spreadsheets
beetle_data <- data.frame(
site = c("A", "B", "C", "D"),
habitat = c("forest", "forest", "grassland", "grassland"),
abundance = c(45, 32, 67, 51)
)
# Access columns with $
beetle_data$abundance
beetle_data$habitat
# Access rows with [ row , column ]
beetle_data[1, ] # First row
beetle_data[, 3] # Third column
beetle_data[beetle_data$habitat == "forest", ] # Forest rows only

Create a new script for this workshop:
scripts/01_day1_analysis.R
Script header template:
#============================================================================
# Insect Community Analysis - Day 1
# Author: [Your Name]
# Date: [Today's Date]
# Description: Import, clean, and explore insect survey data
#============================================================================
# Load packages ----
library(tidyverse)
library(vegan)
# Import data ----
# Data exploration ----
# Analysis ----

Practice: in scripts/practice.R, create a vector my_counts with values 5, 12, 8, 20, 15 and store its mean as avg_count.

# Make sure your data file is in data/raw/
# Import with read.csv()
insect_data <- read.csv("data/raw/sample_insect_data.csv")
# Common options for messy data:
insect_data <- read.csv(
"data/raw/sample_insect_data.csv",
header = TRUE, # First row is column names
stringsAsFactors = FALSE, # Keep text as text
na.strings = c("", "NA", "N/A") # Recognize these as missing
)

Always explore data immediately after importing!
# View the first few rows
head(insect_data)
# View in spreadsheet format (capital V!)
View(insect_data)
# Structure: column types and preview
str(insect_data)
# Dimensions: rows × columns
dim(insect_data)
nrow(insect_data)
ncol(insect_data)
# Column names
names(insect_data)
# Summary statistics
summary(insect_data)

# Missing values
sum(is.na(insect_data)) # Total NAs
colSums(is.na(insect_data)) # NAs per column
# Unique values in categorical columns
unique(insect_data$habitat)
unique(insect_data$order)
# Check for unexpected values
table(insect_data$habitat) # Frequency table
table(insect_data$order)
# Check numeric ranges
range(insect_data$abundance, na.rm = TRUE) # na.rm = TRUE ignores missing values
summary(insect_data$abundance)

# Filter rows by condition
forest_data <- insect_data[insect_data$habitat == "forest", ]
coleoptera <- insect_data[insect_data$order == "Coleoptera", ]
# Select specific columns
selected <- insect_data[, c("site", "habitat", "morphospecies", "abundance")]
# Combine conditions
forest_beetles <- insect_data[
insect_data$habitat == "forest" & insect_data$order == "Coleoptera",
]

sample_insect_data.csv
The pipe %>% chains operations together. Read it as "then".
library(tidyverse)
# Without pipe (nested, hard to read):
round(mean(sqrt(c(1, 4, 9, 16))), 2)
# With pipe (step by step, easy to read):
c(1, 4, 9, 16) %>%
sqrt() %>%
mean() %>%
round(2)
# Read as: "Take 1,4,9,16, THEN sqrt, THEN mean, THEN round"

Keyboard shortcut: Ctrl + Shift + M (Windows) or Cmd + Shift + M (Mac)
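To make the "then" reading concrete, the short sketch below checks that a piped expression and its nested equivalent give the same result. It loads dplyr, which re-exports `%>%`; the names `values`, `piped`, and `nested` are illustrative.

```r
library(dplyr)  # re-exports the %>% pipe

values <- c(1, 4, 9, 16)

# x %>% f(y) is evaluated as f(x, y):
# the left-hand side becomes the first argument of the next function
piped  <- values %>% sqrt() %>% mean() %>% round(2)
nested <- round(mean(sqrt(values)), 2)

identical(piped, nested)  # TRUE
piped                     # 2.5
```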
# Keep only forest sites
insect_data %>%
filter(habitat == "forest")
# Multiple conditions (AND)
insect_data %>%
filter(habitat == "forest", abundance > 10)
# Multiple options (OR)
insect_data %>%
filter(habitat %in% c("forest", "grassland"))
# Exclude
insect_data %>%
filter(habitat != "agriculture")

Grouping with group_by() and summarising with summarise() is extremely powerful!
# Summary by habitat
insect_data %>%
group_by(habitat) %>%
summarise(
total_abundance = sum(abundance),
n_species = n_distinct(morphospecies),
n_sites = n_distinct(site),
.groups = "drop"
)
# Summary by habitat AND order
insect_data %>%
group_by(habitat, order) %>%
summarise(
mean_abundance = mean(abundance),
n_species = n_distinct(morphospecies),
.groups = "drop"
)

Long format (one observation per row):
site | species | abundance
------|-------------|----------
S01 | Carabus_sp1 | 12
S01 | Carabus_sp2 | 5
S02 | Carabus_sp1 | 18
Wide format (community matrix):
site | Carabus_sp1 | Carabus_sp2
------|-------------|------------
S01 | 12 | 5
S02 | 18 | 0
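The conversion from long to wide format can be sketched with `pivot_wider()` from tidyr. The toy data frame below simply reproduces the illustration above; the object names `long` and `wide` are placeholders, not part of the workshop data.

```r
library(tidyr)

# Toy long-format data matching the illustration above
long <- data.frame(
  site      = c("S01", "S01", "S02"),
  species   = c("Carabus_sp1", "Carabus_sp2", "Carabus_sp1"),
  abundance = c(12, 5, 18)
)

# One row per site, one column per species.
# values_fill = 0 records a species that was not found at a site as 0.
wide <- pivot_wider(
  long,
  names_from  = species,
  values_from = abundance,
  values_fill = 0
)

wide
```

Note that S02 has no Carabus_sp2 row in the long table; `values_fill = 0` is what turns that absence into an explicit zero in the community matrix.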
# Complete data preparation pipeline
beetle_summary <- insect_data %>%
# Filter to beetles only
filter(order == "Coleoptera") %>%
# Group by site and habitat
group_by(site, habitat, landscape) %>%
# Calculate summary statistics
summarise(
abundance = sum(abundance),
richness = n_distinct(morphospecies),
.groups = "drop"
) %>%
# Add new columns
mutate(
log_abundance = log(abundance + 1)
) %>%
# Sort by richness
arrange(desc(richness))
beetle_summary

Using insect_data:
Data exploration is detective work. Find the patterns before testing them.
# Quick overview
cat("=== DATASET OVERVIEW ===\n")
cat("Total records:", nrow(insect_data), "\n")
cat("Total individuals:", sum(insect_data$abundance), "\n")
cat("Number of sites:", n_distinct(insect_data$site), "\n")
cat("Number of habitats:", n_distinct(insect_data$habitat), "\n")
cat("Number of orders:", n_distinct(insect_data$order), "\n")
cat("Number of morphospecies:", n_distinct(insect_data$morphospecies), "\n")

Not all groups are suitable for analysis. Choose wisely!
| Criterion | Minimum | Why |
|---|---|---|
| Total abundance | ≥ 50 | Statistical power |
| Species richness | ≥ 5 | Diversity to analyze |
| Sites present | ≥ 50% of sites | Not just rare occurrence |
| Habitat variation | CV > 20% | Interesting patterns |
| Ecological relevance | High | Meaningful interpretation |
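The first three criteria can be computed per order with a grouped summary. The sketch below is an assumption about how such a table might be built, not the workshop's own code: `toy_insects` is made-up data standing in for insect_data, and the column `prop_sites` is a hypothetical name for the share of sites where an order occurs.

```r
library(dplyr)

# Made-up stand-in for insect_data (illustrative values only)
toy_insects <- data.frame(
  site          = c("S01", "S01", "S02", "S02", "S03"),
  habitat       = c("forest", "forest", "grassland", "grassland", "forest"),
  order         = c("Coleoptera", "Diptera", "Coleoptera", "Coleoptera", "Coleoptera"),
  morphospecies = c("Col_sp1", "Dip_sp1", "Col_sp1", "Col_sp2", "Col_sp3"),
  abundance     = c(30, 4, 25, 10, 8)
)

# Total number of sites, used for the "Sites present" criterion
n_sites_total <- n_distinct(toy_insects$site)

# One row per order with the quantities named in the criteria table
order_summary_toy <- toy_insects %>%
  group_by(order) %>%
  summarise(
    total_abundance = sum(abundance),
    n_species       = n_distinct(morphospecies),
    n_sites         = n_distinct(site),
    .groups = "drop"
  ) %>%
  mutate(prop_sites = n_sites / n_sites_total)

order_summary_toy
```

Running the same pipeline on insect_data would give a per-order table with total_abundance and n_species columns, which can then be compared against the thresholds above.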
# Calculate habitat variation
habitat_variation <- insect_data %>%
group_by(order, habitat) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
group_by(order) %>%
summarise(
cv_abundance = sd(abundance) / mean(abundance) * 100,
.groups = "drop"
)
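# Combine with order summary
# (order_summary must already exist: one row per order, with
# total_abundance and n_species columns)
order_evaluation <- order_summary %>%
  left_join(habitat_variation, by = "order") %>%
  mutate(
    recommended = total_abundance >= 50 &
      n_species >= 5 &
      cv_abundance > 20 # CV > 20%, as in the criteria table
  )
order_evaluation

| Group | Pitfall Suitability | Notes |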
|---|---|---|
| Carabidae (ground beetles) | Excellent | Well-studied indicators |
| Formicidae (ants) | Excellent | Colonial, sensitive to disturbance |
| Araneae (spiders) | Good | Predators, hunting guilds |
| Staphylinidae (rove beetles) | Good | Diverse decomposers |
| Orthoptera (grasshoppers) | Moderate | Vegetation-dependent |
| Flying insects | Poor | Undersampled by pitfalls |
# Check for problems
unique(insect_data$habitat) # Look for typos, case issues
# Standardize text
clean_data <- insect_data %>%
mutate(
habitat = tolower(trimws(habitat)), # Lowercase, remove spaces
habitat = case_when( # Fix typos
habitat == "forrest" ~ "forest",
habitat == "grasland" ~ "grassland",
TRUE ~ habitat
)
)
# Handle impossible values
clean_data <- clean_data %>%
filter(abundance >= 0) %>% # Remove negative values
filter(!is.na(abundance)) # Remove missing values

# For Coleoptera analysis
comm_matrix <- insect_data %>%
filter(order == "Coleoptera") %>%
group_by(site, morphospecies) %>%
summarise(abundance = sum(abundance), .groups = "drop") %>%
pivot_wider(
names_from = morphospecies,
values_from = abundance,
values_fill = 0
) %>%
column_to_rownames("site")
head(comm_matrix)

# Create matching environmental data
env_data <- insect_data %>%
select(site, habitat, landscape) %>%
distinct() %>%
column_to_rownames("site")
# Ensure same order as community matrix
env_data <- env_data[rownames(comm_matrix), , drop = FALSE]
# Verify alignment
all(rownames(comm_matrix) == rownames(env_data)) # Must be TRUE!

data/processed/
Today you learned:
Happy coding and happy bug hunting!