SEA 2026 start

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

1 + 1

[1] 2

You can add options to executable code like this

[1] 4

The echo: false option disables the printing of code (only output is displayed).

Import CSV file (make a box using ctr+alt+i). Right click CSV file and select copy as path.
Replace all “\” with “/” in the path

smallMP<-read.csv("C:/Users/jdesw/Desktop/AEON LAB/Purency Results/smaller than 100/SEA_2026_R/Complete MP small SEA prep.csv")

add libraries

library(tidyverse)

Warning: package 'readr' was built under R version 4.5.2

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Make sure the data is there

glimpse(smallMP)

Rows: 618
Columns: 7
$ Station    <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ Polymer    <chr> "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP",…
$ Length     <dbl> 80.27000, 244.54000, 181.17000, 387.27000, 81.08000, 68.337…
$ Width      <dbl> 75.080000, 207.820000, 81.600000, 317.150000, 11.980000, 35…
$ Aspect     <chr> "1.069126265", "1.176691368", "2.220220588", "1.22109412", …
$ Morphology <chr> "Fragment", "Fragment", "Fragment", "Fragment", "Fiber", "F…
$ Colour     <chr> "Orange", "Transluscent", "Transluscent", "Orange", "Transl…

pca_ready <- smallMP %>%
  select(Station, Polymer, Morphology, Length, Width, Aspect) %>%
  mutate(across(c(Length, Width, Aspect),
                ~as.numeric(gsub("[^0-9.]", "", .)))) %>%
  drop_na(Length, Width, Aspect)

What we did here was keep station and polymer attached while cleaning so when rows are removed, everything stays synchronized. pca_ready becomes the main pca dataset.

Now extract only the numeric columns for the pca

pca_data <- pca_ready %>%
  select(Length, Width, Aspect)

Then re scale then run the pca

pca_scaled <- scale(pca_data)
pca <- prcomp(pca_scaled)

Now we can extract the scores and the row counts will be the same

scores <- as.data.frame(pca$x)

Try and attach metadata again:

scores$Polymer <- pca_ready$Polymer
scores$Station <- pca_ready$Station
scores$Morphology <- pca_ready$Morphology

In the future to make sure everything has the right amount of rows, we can use the nrows(data) fro any data sheet to see if they are equal.

Now lets plot it

ggplot(scores, aes(PC1, PC2, colour = Polymer))+
  geom_point(size = 3, alpha = 0.7)+
  theme_minimal()

ggplot(scores, aes(PC1, PC2, colour = Station))+
  geom_point(size = 3, alpha = 0.7)+
  theme_minimal()

We must understand what the PCs mean.

pca$rotation

             PC1         PC2        PC3
Length 0.7107533 -0.05378882  0.7013818
Width  0.1227224 -0.97229984 -0.1989277
Aspect 0.6926535  0.22746380 -0.6844643

We observe:

PC1 is strongly influenced by Length and Aspect (long particles).
- as we approach 0 it is more compact particle
PC2 is strongly influenced by Width (wide vs very thin particles)
- As we approach 0 we get less wide particles.

Lets also look at morphology:

ggplot(scores, aes(PC1, PC2, colour = Morphology))+
  geom_point(size = 3, alpha = 0.7)+
  theme_minimal()

summary(pca)

Importance of components:
                          PC1    PC2     PC3
Standard deviation     1.3529 1.0112 0.38362
Proportion of Variance 0.6101 0.3408 0.04905
Cumulative Proportion  0.6101 0.9509 1.00000

From the summary we can say:
- 61.0% of the varience is explained by PC1
- 34.1% of the varience is explained by PC2
- 4.9% of the varience is explained by PC3
PC1 + PC2 = 95.1%

Now Lets determine the amount of plastics per station:

station_counts <- smallMP %>%
  count(Station)

plot it:

ggplot(station_counts, aes(x = Station, y = n)) +
  geom_col() +
  labs(y = "Number of Microplastics")

Lets look at polymer distribution per station

poly_station <- smallMP %>%
  count(Station, Polymer)

plot with colour blind friendly themes:

ggplot(poly_station, aes(x = Station, y = n, fill = Polymer)) +
  geom_col(position = "stack")+
  scale_fill_viridis_d(option = "turbo") + #high conrast colour palette
  labs(
    title = "Polymer Distribution by Station",
    x = "station ID",
    y = "total count",
    fill = "polymer type"
  )+
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Another way to visualize it

ggplot(poly_station, aes(x = Station, y = n, fill = Polymer)) +
  geom_col(position = "fill") +
  scale_fill_viridis_d(option = "turbo") + 
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Relative Polymer Composition by Station",
    x = "Station ID",
    y = "Percentage",
    fill = "Polymer Type"
  ) +
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Lets now look at Fibers vs Fragments across stations.

morph_station <- smallMP %>%
  count(Station, Morphology)

Plot:

ggplot(morph_station, aes(x = Station, y = n, fill = Morphology)) +
  geom_col()

and

ggplot(morph_station, aes(x = Station, y = n, fill = Morphology)) +
  geom_col(position = "fill")+
  scale_fill_viridis_d(option = "turbo") + 
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Relative MP Morphology Composition by Station",
    x = "Station ID",
    y = "Percentage",
    fill = "Morphology Type"
  ) +
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Now lets look at the size distribution

ggplot(smallMP, aes(x = Length)) +
  geom_histogram(bins = 100, binwidth = 20) +
  xlim(0,800)+
  facet_wrap(~Station)

Warning: Removed 16 rows containing non-finite outside the scale range
(`stat_bin()`).

Warning: Removed 18 rows containing missing values or values outside the scale range
(`geom_bar()`).

Keep in mind this cuts off anything 800um+

Back to the PCA, we want to know if stations differ in particle type and not just count. via PERMANOVA

install and load the ecological stats package vegan.

library(vegan)

Warning: package 'vegan' was built under R version 4.5.2

Loading required package: permute

Warning: package 'permute' was built under R version 4.5.2

build a matrix of PCA coordinates using only PC1 and PC2

pc_matrix <- scores[, c("PC1", "PC2")]

PERMANOVA:

adonis2(pc_matrix ~ Station, data = scores)

Warning in vegdist(as.matrix(lhs), method = method, ...): results may be meaningless because data have negative entries
                 in method "bray"

Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = pc_matrix ~ Station, data = scores)
          Df SumOfSqs      R2      F Pr(>F)
Model      8 10391833 0.77642 263.93   0.65
Residual 608  2992422 0.22358              
Total    616 13384255 1.00000

This looks at if MPs from different stations are physically different (size and shape)
It does not look at MP abundance or polymer.
The R2 is the % of variation per station
The high P value (0.653) means that separation by station due to size does not favour certain stations and clustering in the visual PCA may be due to chance.

MP physical characteristics are largely homogenized across the bay

But we also want to check to make sure the PERMANOVA didn’t fail to detect differences due to dispersion differences among groups. we test the homogeneity of dispersion:

bd <- betadisper(dist(pc_matrix), scores$Station)
anova(bd)

Analysis of Variance Table

Response: Distances
           Df  Sum Sq Mean Sq F value   Pr(>F)   
Groups      8   64.75  8.0941   3.147 0.001687 **
Residuals 608 1563.79  2.5720                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This confirms: stations don’t contain different kinds of particles (PERMANOVA p = 0.653), but there is a significant difference in dispersion (betadisper p = 0.0017).
Stations don’t contain different types of particles but they differ in how variable their particles are.
We can then assume some stations have more similarity between particles shape/size than others.

Lets visualize this: (centroid = mean location of points in PCA space)

boxplot(bd, xlab = "Station", ylab = "Distance to Centroid")

Much more can be done here, but in general, the taller the box and outliers, the more variability there is in size/shape per station.