Reviewed by: Adam B. Mitchell, Matthew Sato, Tyler McMahon, and Addison Singleton

Presentation and Poster References

Cole, M., P. Lindeque, C. Halsband, and T. S. Galloway. 2011. Microplastics as contaminants in the marine environment: a review. Marine Pollution Bulletin 62:2588–2597.

Andrady, A. L. 2011. Microplastics in the marine environment. Marine Pollution Bulletin 62:1596–1605.

McGeoch, M. A. 1998. The selection, testing and application of terrestrial insects as bioindicators. Biological Reviews of the Cambridge Philosophical Society 73:181–201.

Hilbe, J. M. 2011. Negative binomial regression, 2nd ed. Negative binomial regression, 2nd ed, Cambridge University Press, New York, NY, US.

Data Import and Packages

# DATA IMPORT
setwd("C:/Data/R")
swesa26 <- read.csv("swesa26.csv")

# LOAD PACKAGES
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tibble' was built under R version 4.3.3
## Warning: package 'tidyr' was built under R version 4.3.3
## Warning: package 'readr' was built under R version 4.3.3
## Warning: package 'purrr' was built under R version 4.3.3
## Warning: package 'dplyr' was built under R version 4.3.3
## Warning: package 'stringr' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'

Quality Control and Data Cleaning

# Data structure check
head(swesa26) # quick review of data structure
##    site transect       order         family functional_group count dryweight
## 1 PRBMF        1 Hymenoptera     Formicidae      Opportunist   397    0.0595
## 2 PRBMF        1  Coleoptera Mycetophagidae           Grazer    20    0.0052
## 3 PRBMF        1  Coleoptera      Carabidae Pursuit Predator     2    0.0144
## 4 PRBMF        1  Coleoptera     Scarabidae       Decomposer     1    0.0424
## 5 PRBMF        1 Hymenoptera   Pteromalidae       Pollinator     1   <0.0001
## 6 PRBMF        1     Araneae     Salticidae Pursuit Predator    23    0.0119
##   total_mp fragments fibers films microbeads
## 1      280       278      2     0          0
## 2      174       142     32     0          0
## 3      525       517      7     0          1
## 4       24        18      8     0          0
## 5      217       212      4     1          0
## 6      581       576      5     0          0
str(swesa26) # 79 observations of 12 variables, correct
## 'data.frame':    79 obs. of  12 variables:
##  $ site            : chr  "PRBMF" "PRBMF" "PRBMF" "PRBMF" ...
##  $ transect        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ order           : chr  "Hymenoptera" "Coleoptera" "Coleoptera" "Coleoptera" ...
##  $ family          : chr  "Formicidae" "Mycetophagidae" "Carabidae" "Scarabidae" ...
##  $ functional_group: chr  "Opportunist" "Grazer" "Pursuit Predator" "Decomposer" ...
##  $ count           : int  397 20 2 1 1 23 1 1 2 27 ...
##  $ dryweight       : chr  "0.0595" "0.0052" "0.0144" "0.0424" ...
##  $ total_mp        : int  280 174 525 24 217 581 12 33 35 28 ...
##  $ fragments       : int  278 142 517 18 212 576 12 30 27 17 ...
##  $ fibers          : int  2 32 7 8 4 5 0 3 8 11 ...
##  $ films           : int  0 0 0 0 1 0 0 0 0 0 ...
##  $ microbeads      : int  0 0 1 0 0 0 0 0 0 0 ...
names(swesa26) # columns names, correct
##  [1] "site"             "transect"         "order"            "family"          
##  [5] "functional_group" "count"            "dryweight"        "total_mp"        
##  [9] "fragments"        "fibers"           "films"            "microbeads"
colSums(is.na(swesa26)) # 16 missing values, under dryweight, fix error
##             site         transect            order           family 
##                0                0                0                0 
## functional_group            count        dryweight         total_mp 
##                0                0                0                0 
##        fragments           fibers            films       microbeads 
##                0                0                0                0
  sort(unique(swesa26$dryweight))
##  [1] "<0.0001" "0.0001"  "0.0002"  "0.0003"  "0.0005"  "0.0006"  "0.0007" 
##  [8] "0.0008"  "0.0013"  "0.0014"  "0.0015"  "0.0016"  "0.0017"  "0.0018" 
## [15] "0.002"   "0.0021"  "0.0023"  "0.0027"  "0.0031"  "0.0052"  "0.0058" 
## [22] "0.0068"  "0.0119"  "0.0139"  "0.0144"  "0.0277"  "0.0295"  "0.0424" 
## [29] "0.0553"  "0.0595"  "0.0599"  "0.0604"  "0.0810"  "0.1267"  "0.2044" 
## [36] "0.3240"  "0.5478"
# R cannot read <0.0001, replace with half the detection limit
  swesa26$dryweight <- as.character(swesa26$dryweight)
  swesa26$dryweight[grepl("<", swesa26$dryweight)] <- "0.00005"
  swesa26$dryweight <- as.numeric(swesa26$dryweight)
  summary(swesa26$dryweight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00005 0.00010 0.00130 0.02195 0.00415 0.54780
  colSums(is.na(swesa26))
##             site         transect            order           family 
##                0                0                0                0 
## functional_group            count        dryweight         total_mp 
##                0                0                0                0 
##        fragments           fibers            films       microbeads 
##                0                0                0                0
# Atomic Data
swesa26$functional_group <- as.factor(swesa26$functional_group)
swesa26$order <- as.factor(swesa26$order)
swesa26$family <- as.factor(swesa26$family)
swesa26$dryweight <- as.numeric(swesa26$dryweight)
swesa26$count <- as.numeric(swesa26$count)
swesa26$total_mp <- as.numeric(swesa26$total_mp)
swesa26$fragments <- as.numeric(swesa26$fragments)
swesa26$fibers <- as.numeric(swesa26$fibers)
swesa26$films <- as.numeric(swesa26$films)
swesa26$microbeads <- as.numeric(swesa26$microbeads)

# Detailed Descriptions of the Variables
# functional_group (factor): Designated as pursuit predator, ambush predator, fluid feeder, grazer, pollinator, opportunist, and variable. The variable functional group designation indicates a group where most, if not all, listed functional groups might be represented by the taxa included.
# family (factor): Taxonomic family.
# dryweight (numeric): Measured in grams in the CSV file, but changed to milligrams in this s script for ease of figure interpretation.
# count (numeric): The number of individuals in a taxonomic family sample from one sampling location.
# total_mp (numeric): The total number of microplastics in one taxonomic family from one sampling location. Includes fragment, fiber, film, and microbead categories.
# fragments (numeric): Microplastic category.
# fibers (numeric): Microplastic category.
# films (numeric): Microplastic category.
# microbeads (numeric): Microplastic category.

# Data structure check
str(swesa26)
## 'data.frame':    79 obs. of  12 variables:
##  $ site            : chr  "PRBMF" "PRBMF" "PRBMF" "PRBMF" ...
##  $ transect        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ order           : Factor w/ 16 levels "Acari","Amphipoda",..: 9 4 4 4 9 3 8 8 4 1 ...
##  $ family          : Factor w/ 43 levels "Acari","Anthicidae",..: 21 29 7 37 33 36 3 16 41 1 ...
##  $ functional_group: Factor w/ 9 levels "Ambush Predator",..: 6 4 8 2 7 8 5 3 6 9 ...
##  $ count           : num  397 20 2 1 1 23 1 1 2 27 ...
##  $ dryweight       : num  0.0595 0.0052 0.0144 0.0424 0.00005 0.0119 0.0016 0.0018 0.0003 0.0005 ...
##  $ total_mp        : num  280 174 525 24 217 581 12 33 35 28 ...
##  $ fragments       : num  278 142 517 18 212 576 12 30 27 17 ...
##  $ fibers          : num  2 32 7 8 4 5 0 3 8 11 ...
##  $ films           : num  0 0 0 0 1 0 0 0 0 0 ...
##  $ microbeads      : num  0 0 1 0 0 0 0 0 0 0 ...
# Response variable check
summary(swesa26$total_mp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    12.0    32.5    59.0   137.0   169.0   982.0
hist(swesa26$total_mp) # negatively skewed distribution

# Samples per functional group
table(swesa26$functional_group) # data error! "Opportinist"
## 
##  Ambush Predator       Decomposer     Fluid Feeder           Grazer 
##                3               13               11                6 
##      Opportinist      Opportunist       Pollinator Pursuit Predator 
##                1               17                8               11 
##         Variable 
##                9
# Fix data error
swesa26$functional_group <- as.character(swesa26$functional_group)
swesa26$functional_group <- trimws(swesa26$functional_group)
swesa26$functional_group[swesa26$functional_group == "Opportinist"] <- "Opportunist"
swesa26$functional_group <- as.factor(swesa26$functional_group)

unique(swesa26$functional_group)
## [1] Opportunist      Grazer           Pursuit Predator Decomposer      
## [5] Pollinator       Fluid Feeder     Variable         Ambush Predator 
## 8 Levels: Ambush Predator Decomposer Fluid Feeder Grazer ... Variable
table(swesa26$functional_group) # error fixed!
## 
##  Ambush Predator       Decomposer     Fluid Feeder           Grazer 
##                3               13               11                6 
##      Opportunist       Pollinator Pursuit Predator         Variable 
##               18                8               11                9
# Change grams to milligrams for clarity
swesa26$dryweight_mg <- swesa26$dryweight * 1000
summary(swesa26$dryweight_mg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.05    0.10    1.30   21.95    4.15  547.80
view(swesa26)

# Check for zeros in predictor (dryweight_mg) and response (total_mp) variables
sum(swesa26$total_mp == 0) # no zeros [1] 0
## [1] 0
sum(swesa26$dryweight_mg == 0) # no zeroes [1] 0
## [1] 0

Is greater total biomass indicative of higher MP load?

model_biomass <- glm.nb(total_mp ~ dryweight_mg, data = swesa26)
model_biomass_null <- glm.nb(total_mp ~ 1, data = swesa26)
summary(model_biomass)
## 
## Call:
## glm.nb(formula = total_mp ~ dryweight_mg, data = swesa26, init.theta = 1.269939025, 
##     link = log)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  4.537536   0.104645  43.361  < 2e-16 ***
## dryweight_mg 0.008854   0.001325   6.685 2.31e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(1.2699) family taken to be 1)
## 
##     Null deviance: 125.671  on 78  degrees of freedom
## Residual deviance:  88.168  on 77  degrees of freedom
## AIC: 909.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  1.270 
##           Std. Err.:  0.184 
## 
##  2 x log-likelihood:  -903.803
# ANOVA testing
anova(model_biomass_null, model_biomass, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##          Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
## 1            1 0.9210679        78       -935.5962                      
## 2 dryweight_mg 1.2699390        77       -903.8026 1 vs 2     1 31.79357
##       Pr(Chi)
## 1            
## 2 1.71459e-08
# create figure
objective_1 <- ggplot(swesa26, aes(x = dryweight_mg, y = total_mp)) +
  geom_point(alpha = 0.7, size = 2, color = "cyan4") +
  geom_smooth(method = "glm.nb", method.args = list(family = "poisson"), se = TRUE, color = "red4", linewidth = 1) +
  scale_x_log10() +
  scale_y_log10() +
  theme_classic(base_size = 13) +
  labs(
    x = "Dry weight (mg)",
    y = "Total microplastics") +
  theme(
    axis.title = element_text(face = "bold"),
    axis.text = element_text(size = 11))

print(objective_1)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Failed to fit group -1.
## Caused by error in `glm.control()`:
## ! unused argument (family = "poisson")

Microplastic abundance increased significantly with biomass (negative binomial GLM; likelihood ratio test: χ² = 31.79, df = 1, p < 0.001).

Do functional groups differ in MP load?

model_guild_null <- glm.nb(total_mp ~ 1, data = swesa26)
model_guild_mass <- glm.nb(total_mp ~ dryweight_mg, data = swesa26)
model_guild_full <- glm.nb(total_mp ~ functional_group + dryweight_mg, data = swesa26)

# ANOVA testing
anova(model_guild_null, model_guild_mass, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##          Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
## 1            1 0.9210679        78       -935.5962                      
## 2 dryweight_mg 1.2699390        77       -903.8026 1 vs 2     1 31.79357
##       Pr(Chi)
## 1            
## 2 1.71459e-08
anova(model_guild_mass, model_guild_full, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##                             Model    theta Resid. df    2 x log-lik.   Test
## 1                    dryweight_mg 1.269939        77       -903.8026       
## 2 functional_group + dryweight_mg 1.762277        70       -873.3255 1 vs 2
##      df LR stat.      Pr(Chi)
## 1                            
## 2     7 30.47711 7.760854e-05
# create figure
objective_2 <- ggplot(swesa26, aes(functional_group, total_mp, fill = functional_group)) +
  geom_boxplot(outlier.shape = NA, color = "black", alpha = 0.85) +
  geom_jitter(width = 0.2, color = "black", alpha = 0.85) +
  scale_y_log10() +
  scale_fill_manual(values = c(
    "Ambush Predator"   = "#FD5901",
    "Decomposer"        = "#F78104",
    "Fluid Feeder"      = "#FAAB36",
    "Grazer"            = "#008062",
    "Opportunist"       = "#008083",
    "Pollinator"        = "#005F60",
    "Pursuit Predator"  = "#003738",
    "Variable"          = "#BDBDBD"
  )) +
  theme_classic(base_size = 13) +
  xlab("Functional Group") +
  ylab("Total Microplastics") +
  theme(
    axis.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")

print(objective_2)

Both biomass and functional group significantly influenced microplastic abundance. Functional group explained additional variation beyond biomass (χ² = 30.48, df = 7, p < 0.001).

Do predaceous groups (ambush predators and pursuit predators) accumulate more MPs than herbivorous groups (grazers and fluid feeders) while accounting for mass?

# create predator and herbivore categories
swesa26$feeding_category <- NA
swesa26$feeding_category[swesa26$functional_group %in% 
                           c("Ambush Predator","Pursuit Predator")] <- "Predator"
swesa26$feeding_category[swesa26$functional_group %in% 
                           c("Fluid Feeder","Grazer")] <- "Herbivore"
swesa26$feeding_category <- as.factor(swesa26$feeding_category)
table(swesa26$feeding_category)
## 
## Herbivore  Predator 
##        17        14
pred_herb <- subset(swesa26, feeding_category %in% c("Predator","Herbivore"))
table(pred_herb$feeding_category)
## 
## Herbivore  Predator 
##        17        14
# create models
model_predherb_null <- glm.nb(total_mp ~ 1, data = pred_herb)
model_predherb_mass <- glm.nb(total_mp ~ dryweight_mg, data = pred_herb)
model_predherb <- glm.nb(total_mp ~ feeding_category + dryweight_mg, data = pred_herb)

pred_herb$mp_per_mg <- pred_herb$total_mp / pred_herb$dryweight_mg
lm(mp_per_mg ~ feeding_category, data = pred_herb) # checks MPs per gram of biomass
## 
## Call:
## lm(formula = mp_per_mg ~ feeding_category, data = pred_herb)
## 
## Coefficients:
##              (Intercept)  feeding_categoryPredator  
##                    228.1                    -111.7
# ANOVA testing
anova(model_predherb_null, model_predherb_mass, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##          Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
## 1            1 0.8199511        30       -380.1502                      
## 2 dryweight_mg 1.2774899        29       -362.7469 1 vs 2     1 17.40327
##        Pr(Chi)
## 1             
## 2 3.023056e-05
anova(model_predherb_mass, model_predherb, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##                             Model    theta Resid. df    2 x log-lik.   Test
## 1                    dryweight_mg 1.277490        29       -362.7469       
## 2 feeding_category + dryweight_mg 1.633791        28       -353.6988 1 vs 2
##      df LR stat.     Pr(Chi)
## 1                           
## 2     1 9.048097 0.002629685
# create figure
objective_3 <- ggplot(pred_herb, aes(x = feeding_category, y = total_mp, fill = feeding_category)) +
  geom_boxplot(width = 0.55, outlier.shape = NA, color = "black", linewidth = 0.5) +
  geom_jitter(width = 0.12, height = 0, alpha = 0.7, size = 2) +
  scale_fill_manual(values = c("Herbivore" = "#008062", "Predator" = "#FD5901")) +
  theme_classic(base_size = 13) +
  labs(x = "Feeding category", y = "Total microplastics, controlling for biomass") +
  theme(
    legend.position = "none",
    axis.title = element_text(face = "bold"))

print(objective_3)

Microplastic abundance was significantly influenced by both biomass and feeding category. Biomass alone significantly improved model fit (likelihood ratio test: χ² = 17.40, df = 1, p < 0.001), and the addition of feeding category further improved the model (χ² = 9.05, df = 1, p = 0.003), indicating that predators contain significantly higher microplastic loads than herbivores even after controlling for biomass.

Do predaceous groups contain more MPs per gram of weight than herbivorous groups?

# create models
model_predherb_offset_null <- glm.nb(total_mp ~ offset(log(dryweight_mg)), data = pred_herb)
model_predherb_mp <- glm.nb(total_mp ~ feeding_category + offset(log(dryweight_mg)), data = pred_herb)

# ANOVA testing
anova(model_predherb_offset_null, model_predherb_mp, test = "Chisq")
## Likelihood ratio tests of Negative Binomial Models
## 
## Response: total_mp
##                                          Model     theta Resid. df
## 1                    offset(log(dryweight_mg)) 0.5374813        30
## 2 feeding_category + offset(log(dryweight_mg)) 0.5639459        29
##      2 x log-lik.   Test    df LR stat.  Pr(Chi)
## 1       -398.4307                               
## 2       -396.4601 1 vs 2     1 1.970564 0.160388
# create figure
objective_4 <- ggplot(pred_herb, aes(x = feeding_category, y = mp_per_mg, fill = feeding_category)) +
  geom_boxplot(width = 0.55, outlier.shape = NA, color = "black", linewidth = 0.5) +
  geom_jitter(width = 0.12, height = 0, alpha = 0.7, size = 2) +
  scale_fill_manual(values = c("Herbivore" = "#008062", "Predator" = "#FD5901")) +
  theme_classic(base_size = 13) +
  labs(
    x = "Feeding category",
    y = "Microplastics per mg dry weight") +
  theme(
    legend.position = "none",
    axis.title = element_text(face = "bold"))

print(objective_4)

Although predators exhibited significantly higher total microplastic loads than herbivore, feeding category did not significantly influence microplastic abundance per unit biomass (χ² = 1.97, df = 1, p = 0.16), suggesting that increased microplastic loads in predators are driven by biomass rather than trophic transfer.