1 + 1[1] 2
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
1 + 1[1] 2
You can add options to executable code like this
[1] 4
The echo: false option disables the printing of code (only output is displayed).
smallMP<-read.csv("C:/Users/jdesw/Desktop/AEON LAB/Purency Results/smaller than 100/SEA_2026_R/Complete MP small SEA prep.csv")library(tidyverse)Warning: package 'readr' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.0 ✔ stringr 1.5.2
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(smallMP)Rows: 618
Columns: 7
$ Station <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ Polymer <chr> "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP", "PP",…
$ Length <dbl> 80.27000, 244.54000, 181.17000, 387.27000, 81.08000, 68.337…
$ Width <dbl> 75.080000, 207.820000, 81.600000, 317.150000, 11.980000, 35…
$ Aspect <chr> "1.069126265", "1.176691368", "2.220220588", "1.22109412", …
$ Morphology <chr> "Fragment", "Fragment", "Fragment", "Fragment", "Fiber", "F…
$ Colour <chr> "Orange", "Transluscent", "Transluscent", "Orange", "Transl…
pca_ready <- smallMP %>%
select(Station, Polymer, Morphology, Length, Width, Aspect) %>%
mutate(across(c(Length, Width, Aspect),
~as.numeric(gsub("[^0-9.]", "", .)))) %>%
drop_na(Length, Width, Aspect)What we did here was keep station and polymer attached while cleaning so when rows are removed, everything stays synchronized. pca_ready becomes the main pca dataset.
Now extract only the numeric columns for the pca
pca_data <- pca_ready %>%
select(Length, Width, Aspect)Then re scale then run the pca
pca_scaled <- scale(pca_data)
pca <- prcomp(pca_scaled)Now we can extract the scores and the row counts will be the same
scores <- as.data.frame(pca$x)Try and attach metadata again:
scores$Polymer <- pca_ready$Polymer
scores$Station <- pca_ready$Station
scores$Morphology <- pca_ready$MorphologyIn the future to make sure everything has the right amount of rows, we can use the nrows(data) fro any data sheet to see if they are equal.
ggplot(scores, aes(PC1, PC2, colour = Polymer))+
geom_point(size = 3, alpha = 0.7)+
theme_minimal()ggplot(scores, aes(PC1, PC2, colour = Station))+
geom_point(size = 3, alpha = 0.7)+
theme_minimal()pca$rotation PC1 PC2 PC3
Length 0.7107533 -0.05378882 0.7013818
Width 0.1227224 -0.97229984 -0.1989277
Aspect 0.6926535 0.22746380 -0.6844643
We observe:
PC1 is strongly influenced by Length and Aspect (long particles).
PC2 is strongly influenced by Width (wide vs very thin particles)
Lets also look at morphology:
ggplot(scores, aes(PC1, PC2, colour = Morphology))+
geom_point(size = 3, alpha = 0.7)+
theme_minimal()summary(pca)Importance of components:
PC1 PC2 PC3
Standard deviation 1.3529 1.0112 0.38362
Proportion of Variance 0.6101 0.3408 0.04905
Cumulative Proportion 0.6101 0.9509 1.00000
From the summary we can say:
61.0% of the varience is explained by PC1
34.1% of the varience is explained by PC2
4.9% of the varience is explained by PC3
PC1 + PC2 = 95.1%
Now Lets determine the amount of plastics per station:
station_counts <- smallMP %>%
count(Station)plot it:
ggplot(station_counts, aes(x = Station, y = n)) +
geom_col() +
labs(y = "Number of Microplastics")poly_station <- smallMP %>%
count(Station, Polymer)plot with colour blind friendly themes:
ggplot(poly_station, aes(x = Station, y = n, fill = Polymer)) +
geom_col(position = "stack")+
scale_fill_viridis_d(option = "turbo") + #high conrast colour palette
labs(
title = "Polymer Distribution by Station",
x = "station ID",
y = "total count",
fill = "polymer type"
)+
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Another way to visualize it
ggplot(poly_station, aes(x = Station, y = n, fill = Polymer)) +
geom_col(position = "fill") +
scale_fill_viridis_d(option = "turbo") +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Relative Polymer Composition by Station",
x = "Station ID",
y = "Percentage",
fill = "Polymer Type"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))morph_station <- smallMP %>%
count(Station, Morphology)Plot:
ggplot(morph_station, aes(x = Station, y = n, fill = Morphology)) +
geom_col()and
ggplot(morph_station, aes(x = Station, y = n, fill = Morphology)) +
geom_col(position = "fill")+
scale_fill_viridis_d(option = "turbo") +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Relative MP Morphology Composition by Station",
x = "Station ID",
y = "Percentage",
fill = "Morphology Type"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))ggplot(smallMP, aes(x = Length)) +
geom_histogram(bins = 100, binwidth = 20) +
xlim(0,800)+
facet_wrap(~Station)Warning: Removed 16 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 18 rows containing missing values or values outside the scale range
(`geom_bar()`).
Back to the PCA, we want to know if stations differ in particle type and not just count. via PERMANOVA
library(vegan)Warning: package 'vegan' was built under R version 4.5.2
Loading required package: permute
Warning: package 'permute' was built under R version 4.5.2
pc_matrix <- scores[, c("PC1", "PC2")]PERMANOVA:
adonis2(pc_matrix ~ Station, data = scores)Warning in vegdist(as.matrix(lhs), method = method, ...): results may be meaningless because data have negative entries
in method "bray"
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = pc_matrix ~ Station, data = scores)
Df SumOfSqs R2 F Pr(>F)
Model 8 10391833 0.77642 263.93 0.65
Residual 608 2992422 0.22358
Total 616 13384255 1.00000
This looks at if MPs from different stations are physically different (size and shape)
It does not look at MP abundance or polymer.
The R2 is the % of variation per station
The high P value (0.653) means that separation by station due to size does not favour certain stations and clustering in the visual PCA may be due to chance.
MP physical characteristics are largely homogenized across the bay
But we also want to check to make sure the PERMANOVA didn’t fail to detect differences due to dispersion differences among groups. we test the homogeneity of dispersion:
bd <- betadisper(dist(pc_matrix), scores$Station)
anova(bd)Analysis of Variance Table
Response: Distances
Df Sum Sq Mean Sq F value Pr(>F)
Groups 8 64.75 8.0941 3.147 0.001687 **
Residuals 608 1563.79 2.5720
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This confirms: stations don’t contain different kinds of particles (PERMANOVA p = 0.653), but there is a significant difference in dispersion (betadisper p = 0.0017).
Stations don’t contain different types of particles but they differ in how variable their particles are.
We can then assume some stations have more similarity between particles shape/size than others.
Lets visualize this: (centroid = mean location of points in PCA space)
boxplot(bd, xlab = "Station", ylab = "Distance to Centroid")