Introduction- According to Libretexts (2022), soil formation
involves 45% inorganic minerals, mostly sand, silt, and clay, allowing
elements such as ammonia, calcium, and zinc to be available. The organic
matter in soil constitutes 5% (Libretexts, 2022) of macroorganisms and
microorganisms, which may include dead or decaying matter. These are
crucial for completing the ecosystem with continuous nutrient cycles
such as phosphorus, carbon, and nitrogen (Bormann & Likens, 1970).
Soil also contains 25% water (Libretexts, 2022), which provides
essential hydrocarbons for cellular processes for macro and
microorganisms, and 25% air (Libretexts, 2022), which is ideal for
aerobic organisms.The diversity of microbial populations in soil
fluctuates and adapts quickly due to the quick reproduction of
generations, DNA mutations and genome or epigenetic modifications (Elena
& Lenski, 2003) due to changes in PH, temperature, and nutrient
availability in soil. (Daniel, 2005). These environmental factors affect
the microbial’s functional genes, adaptations, species distributions,
interactions, and evolutionary relationships (Elena & Lenski, 2003).
These complex microscopic communities’ genotypes and phenotypes can be
observed and compared with other abundant microbial communities in
various habitats (Fierer & Jackson, 2006). Originating from
discovering microorganisms under a microscope (Gest, 2004) by Robert
Hooke and Antoni Van Leeuwenhoek. The continuous modern advances of
inspection and deeper understanding of the biodiversity of prokaryotic
species today uncover their function, purpose and possibilities, as well
as resemblance from DNA and RNA sequencing and data analysis. (Burke et
al., 2011). In particular, the “Third-generation sequencing” (Petersen
et al., 2019) Oxford Nanopore sequencing 16S Barcoding Kit is
specifically designed to target the rRNA gene, allowing for identifying
bacterial communities. The technology relies on DNA strands passing
through nanopores, altering electrical currents to determine base
sequences (Petersen et al., 2019). Unlike Next Generation Sequencing,
where DNA fragments are short-reads (Soliman et al., 2017), nanopore
technology can sequence long-reads, generating additional DNA sequences
for further analysis; this makes it an optimal choice for observing
microorganisms in soil samples. This study investigates and compares
soil samples from three locations: the lake (on campus), the Wivenhoe
trail (outside campus), and the garden (outside campus). To explore what
species are present in soil samples in the various locations, critical
functional genes in the microorganisms found in soil samples and any
unique DNA adaptations or traits in the samples that evolved in response
to habitat conditions. In addition, metagenomic analysis of soil DNA
provides insights into community interactions, for example, ecological
competition or symbiosis. Hypothesis 1: The study aims to show that the
conserved functional genes in microorganisms found in soil differ
significantly in different environmental habitats.Hypothesis 2:
Microorganisms in soil samples show unique genetic adaptations and
traits within different locations and environmental conditions.
install.packages("readr")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library("readr")
data = read_tsv("data.tsv")
## Rows: 3432 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (1): tax
## dbl (26): barcode01, barcode02, barcode03, barcode04, barcode05, barcode06, ...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 27
## tax barcode01 barcode02 barcode03 barcode04 barcode05 barcode06 barcode07
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bacteri… 28213 345 21467 62888 664 0 0
## 2 Bacteri… 6073 52 36614 87607 168 0 0
## 3 Bacteri… 7734 31 16041 44558 302 0 0
## 4 Bacteri… 8797 92 8632 43361 269 0 0
## 5 Bacteri… 7951 31 13659 31858 85 0 0
## 6 Bacteri… 39851 113 1628 2193 199 0 0
## # ℹ 19 more variables: barcode08 <dbl>, barcode09 <dbl>, barcode10 <dbl>,
## # barcode11 <dbl>, barcode12 <dbl>, barcode13 <dbl>, barcode14 <dbl>,
## # barcode15 <dbl>, barcode16 <dbl>, barcode17 <dbl>, barcode18 <dbl>,
## # barcode19 <dbl>, barcode20 <dbl>, barcode21 <dbl>, barcode22 <dbl>,
## # barcode23 <dbl>, barcode24 <dbl>, unclassified <dbl>, total <dbl>
# Install and load dplyr package
install.packages("dplyr")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Select specific columns
df_cleaned <- data %>%
select(`tax`, `barcode01`, `barcode02`, `barcode03`, `barcode04`, `barcode05`, `barcode10`, unclassified, total)
# View the cleaned dataframe
head(df_cleaned)
## # A tibble: 6 × 9
## tax barcode01 barcode02 barcode03 barcode04 barcode05 barcode10 unclassified
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Bact… 28213 345 21467 62888 664 23479 3543
## 2 Bact… 6073 52 36614 87607 168 1646 3789
## 3 Bact… 7734 31 16041 44558 302 6639 2094
## 4 Bact… 8797 92 8632 43361 269 1264 1666
## 5 Bact… 7951 31 13659 31858 85 1043 1631
## 6 Bact… 39851 113 1628 2193 199 5357 1170
## # ℹ 1 more variable: total <dbl>
# Aggregate data by species
species_aggregate <- df_cleaned %>%
group_by(tax) %>%
summarise(across(starts_with("barcode"), sum, na.rm = TRUE)) # Summing up all barcode columns for each species
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `across(starts_with("barcode"), sum, na.rm = TRUE)`.
## ℹ In group 1: `tax =
## "Archaea;Archaea_none;Euryarchaeota;Halobacteria;Halobacteriales;Haloarculaceae;Haloarcula"`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
# View the aggregated data
head(species_aggregate)
## # A tibble: 6 × 7
## tax barcode01 barcode02 barcode03 barcode04 barcode05 barcode10
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Archaea;Archaea_n… 1 0 0 0 0 0
## 2 Bacteria;Bacteria… 91 12 145 66 4 28
## 3 Bacteria;Bacteria… 529 22 2299 1237 22 35
## 4 Bacteria;Bacteria… 124 5 767 236 2 5
## 5 Bacteria;Bacteria… 171 2 654 848 4 8
## 6 Bacteria;Bacteria… 350 9 2456 1023 6 15
# Install the phyloseq package
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
## Bioconductor version '3.12' is out-of-date; the current release version '3.18'
## is available with R version '4.3'; see https://bioconductor.org/install
BiocManager::install("phyloseq")
## 'getOption("repos")' replaces Bioconductor standard repositories, see
## 'help("repositories", package = "BiocManager")' for details.
## Replacement repositories:
## CRAN: https://cloud.r-project.org
## Bioconductor version 3.12 (BiocManager 1.30.22), R 4.0.4 (2021-02-15)
## Warning: package(s) not installed when version(s) same as or greater than current; use
## `force = TRUE` to re-install: 'phyloseq'
## Installation paths not writeable, unable to update packages
## path: /usr/lib/R/library
## packages:
## boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS,
## Matrix, mgcv, nlme, nnet, rpart, spatial, survival
install.packages("lattice")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(phyloseq)
library(lattice)
install.packages(c("ggplot2", "dplyr", "tidyr", "readr", "purrr", "tibble", "stringr", "forcats", "magrittr" ))
## Installing packages into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(magrittr)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
install.packages("magrittr")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(magrittr)
install.packages("tidyr")
## Installing package into '/home/catherinetaylor35/R/x86_64-pc-linux-gnu-library/4.0'
## (as 'lib' is unspecified)
library(tidyr)
df_cleaned <- species_aggregate %>% separate(tax, sep = ";", into = c("Domain", "Kingdom", "Phylum", "Class", "Order", "Family", "Genus"))
head(df_cleaned)
## # A tibble: 6 × 13
## Domain Kingdom Phylum Class Order Family Genus barcode01 barcode02 barcode03
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 Archaea Archae… Eurya… Halo… Halo… Haloa… Halo… 1 0 0
## 2 Bacteria Bacter… Abdit… Abdi… Abdi… Abiti… Abdi… 91 12 145
## 3 Bacteria Bacter… Acido… Acid… Acid… Acido… Acid… 529 22 2299
## 4 Bacteria Bacter… Acido… Acid… Acid… Acido… Acid… 124 5 767
## 5 Bacteria Bacter… Acido… Acid… Acid… Acido… Acid… 171 2 654
## 6 Bacteria Bacter… Acido… Acid… Acid… Acido… Acid… 350 9 2456
## # ℹ 3 more variables: barcode04 <dbl>, barcode05 <dbl>, barcode10 <dbl>
tax_data <- df_cleaned %>% select(c("Phylum", "Class", "Order", "Family", "Genus"))
tax_ranks <- colnames(tax_data)
tax_data <- tax_table(tax_data)
## Warning in .local(object): Coercing from data.frame class to character matrix
## prior to building taxonomyTable.
## This could introduce artifacts.
## Check your taxonomyTable, or coerce to matrix manually.
dimnames(tax_data)[[2]] <- tax_ranks
head(tax_data)
## Taxonomy Table: [6 taxa by 5 taxonomic ranks]:
## Phylum Class Order Family
## sp1 "Euryarchaeota" "Halobacteria" "Halobacteriales" "Haloarculaceae"
## sp2 "Abditibacteriota" "Abditibacteria" "Abditibacteriales" "Abitibacteriaceae"
## sp3 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp4 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp5 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp6 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## Genus
## sp1 "Haloarcula"
## sp2 "Abditibacterium"
## sp3 "Acidicapsa"
## sp4 "Acidipila"
## sp5 "Acidisarcina"
## sp6 "Acidobacterium"
# first, extract all columns starting with 'barcode'
otu_tab <- df_cleaned %>% select(starts_with("barcode"))
# now we convert it to an otuTable object
# the 'taxa_are_rows' argument tells phyloseq which orientation our data are in
otu_tab <- otu_table(otu_tab, taxa_are_rows = T)
head(otu_tab)
## OTU Table: [6 taxa and 6 samples]
## taxa are rows
## barcode01 barcode02 barcode03 barcode04 barcode05 barcode10
## sp1 1 0 0 0 0 0
## sp2 91 12 145 66 4 28
## sp3 529 22 2299 1237 22 35
## sp4 124 5 767 236 2 5
## sp5 171 2 654 848 4 8
## sp6 350 9 2456 1023 6 15
microbiome <- phyloseq(otu_tab, tax_data)
microbiome # prints a nice summary of the data
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 3432 taxa and 6 samples ]
## tax_table() Taxonomy Table: [ 3432 taxa by 5 taxonomic ranks ]
head(otu_table(microbiome))
## OTU Table: [6 taxa and 6 samples]
## taxa are rows
## barcode01 barcode02 barcode03 barcode04 barcode05 barcode10
## sp1 1 0 0 0 0 0
## sp2 91 12 145 66 4 28
## sp3 529 22 2299 1237 22 35
## sp4 124 5 767 236 2 5
## sp5 171 2 654 848 4 8
## sp6 350 9 2456 1023 6 15
head(tax_table(microbiome))
## Taxonomy Table: [6 taxa by 5 taxonomic ranks]:
## Phylum Class Order Family
## sp1 "Euryarchaeota" "Halobacteria" "Halobacteriales" "Haloarculaceae"
## sp2 "Abditibacteriota" "Abditibacteria" "Abditibacteriales" "Abitibacteriaceae"
## sp3 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp4 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp5 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## sp6 "Acidobacteria" "Acidobacteriia" "Acidobacteriales" "Acidobacteriaceae"
## Genus
## sp1 "Haloarcula"
## sp2 "Abditibacterium"
## sp3 "Acidicapsa"
## sp4 "Acidipila"
## sp5 "Acidisarcina"
## sp6 "Acidobacterium"
# Can change which taxonomic rank the plot shows
plot_bar(microbiome, fill = "Phylum")

The graph presents a comparative analysis of alpha diversity within
samples using two metrics: the ACE Index and the Shannon Index. The ACE
Index, represented by black error bars, estimates species richness with
a focus on rare species, while the Shannon Index, indicated by blue
triangles, evaluates species’ abundance and evenness. The x-axis labels
‘barcode01’ through ‘barcode10’ likely denote individual samples or
sequences, and the dual y-axes allow for a direct comparison between the
two indices. This visualization aids in discerning the biodiversity of
each sample, where higher values on both indices indicate greater
diversity. The error bars capture variability in the ACE Index, which
provides insight into the confidence of these estimations.
# First, create the data frame with your data
data <- data.frame(
Barcode = c("barcode01", "barcode02", "barcode03", "barcode04", "barcode05", "barcode10"),
ACE = c(1240.594, 1538.493, 1251.353, 1055.012, 1071.254, 1130.701),
Shannon = c(5.147862, 5.196808, 5.693757, 4.823278, 5.250453, 5.345040)
)
# Descriptive statistics for ACE
ace_mean <- mean(data$ACE)
ace_sd <- sd(data$ACE)
ace_se <- ace_sd / sqrt(length(data$ACE))
# Descriptive statistics for Shannon
shannon_mean <- mean(data$Shannon)
shannon_sd <- sd(data$Shannon)
shannon_se <- shannon_sd / sqrt(length(data$Shannon))
# Output the descriptive statistics
cat("Descriptive Statistics for ACE:\n")
## Descriptive Statistics for ACE:
cat("Mean:", ace_mean, "SD:", ace_sd, "SE:", ace_se, "\n\n")
## Mean: 1214.568 SD: 178.8791 SE: 73.02707
cat("Descriptive Statistics for Shannon:\n")
## Descriptive Statistics for Shannon:
cat("Mean:", shannon_mean, "SD:", shannon_sd, "SE:", shannon_se, "\n")
## Mean: 5.242866 SD: 0.2831964 SE: 0.1156144
nmdsResult <- ordinate(rarefiedData, "NMDS", "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.005619765
## Run 1 stress 0.002148052
## ... New best solution
## ... Procrustes: rmse 0.03755366 max resid 0.0677368
## Run 2 stress 0.004187829
## Run 3 stress 0.2761339
## Run 4 stress 0.002923068
## Run 5 stress 0.001145372
## ... New best solution
## ... Procrustes: rmse 0.01144639 max resid 0.02058192
## Run 6 stress 0.002429543
## Run 7 stress 0.002722903
## Run 8 stress 0.00488279
## Run 9 stress 0.001715008
## Run 10 stress 0.001243233
## ... Procrustes: rmse 0.004500633 max resid 0.009126539
## ... Similar to previous best
## Run 11 stress 0.001265862
## ... Procrustes: rmse 0.004603063 max resid 0.009135091
## ... Similar to previous best
## Run 12 stress 9.961051e-05
## ... New best solution
## ... Procrustes: rmse 0.01077764 max resid 0.02029719
## Run 13 stress 0.002386339
## Run 14 stress 0.0002277899
## ... Procrustes: rmse 0.001327226 max resid 0.002401117
## ... Similar to previous best
## Run 15 stress 0.0007979903
## Run 16 stress 0.005543951
## Run 17 stress 0.001823967
## Run 18 stress 0.003237181
## Run 19 stress 0.007731411
## Run 20 stress 0.0002161793
## ... Procrustes: rmse 0.001209897 max resid 0.002157458
## ... Similar to previous best
## *** Best solution repeated 2 times
## Warning in metaMDS(veganifyOTU(physeq), distance, ...): stress is (nearly)
## zero: you may have insufficient data
plot_ordination(rarefiedData, nmdsResult, type = "sample")
## No available covariate data to map on the points for this plot `type`
