0.1 Introduction

Amanita is a genus of fungi that is primarily made up of agarics, which are a particular kind of fungus that grows on mushrooms. There are about 600 distinct agaric species in the genus Amanita of fungus. Among the many different types of fungi that are known to exist are the deadliest mushrooms in the world, as well as some highly valued culinary varieties. The genus Amanita of mushrooms belongs to the Amanitaceae family of fungi that create mushrooms.

The Classification of This Genus Follows the Levels as Follows -

Kingdom: Fungi

Division: Basidiomycota

Class: Agaricomycetes

Order: Agaricales

Family: Amanitaceae

0.2 R packages used

library(traits)
library(treeio)
library(ggtree)
library(reshape2)
library(ggstance)
library(ape)
library(seqinr)
library(adegenet)
library(DECIPHER)
library(viridis)
library(ggplot2)
library(tidyverse)

0.3 Downloading Data from NCBI

0.3.1 we’ll be using the Library trait

Sequence <- ncbi_byid(ids = c("NR_187078.1", "NR_187077.1", "NR_187076.1", "NR_187075.1", "NR_187074.1", "NR_187073.1", "NR_187072.1", "NR_187071.1", "NR_187070.1", "NR_187069.1", "NR_187068.1", "NR_187067.1", "NR_187066.1", "NR_185704.1", "NR_185703.1", "NR_184989.1", "NR_184934.1", "NR_182949.1", "NR_182712.1", "NR_182711.1", "NR_182482.1", "NR_178171.1", "NR_177541.1", "NR_177182.1", "NR_177133.1", "NR_176704.1", "NR_175723.1", "NR_175722.1", "NR_175721.1", "NR_175720.1", "NR_175719.1", "NR_175718.1", "NR_175717.1", "NR_175716.1", "NR_175715.1", "NR_175714.1", "NR_175713.1", "NR_175712.1", "NR_175711.1", "NR_175710.1", "NR_175709.1", "NR_175708.1", "NR_175707.1", "NR_174910.1", "NR_173939.1", "NR_173938.1", "NR_173801.1", "NR_173776.1", "NR_173773.1", "NR_173766.1", "NR_173190.1", "NR_173189.1", "NR_173188.1", "NR_173187.1", "NR_173159.1", "NR_173158.1", "NR_172802.1", "NR_169902.1", "NR_166224.1", "NR_164607.1", "NR_164606.1", "NR_164493.1", "NR_119968.1", "NR_119715.1", "NR_119714.1", "NR_119713.1", "NR_119499.1", "NR_119498.1", "NR_119390.1", "NR_119389.1", "NR_119388.1", "NR_119387.1", "NR_159596.1", "NR_159595.1", "NR_159593.1", "NR_159592.1", "NR_159591.1", "NR_159590.1", "NR_159589.1", "NR_159588.1", "NR_159587.1", "NR_159586.1", "NR_159585.1", "NR_159584.1", "NR_159583.1", "NR_159582.1", "NR_159581.1", "NR_159580.1", "NR_159579.1", "NR_159577.1", "NR_159576.1", "NR_159575.1", "NR_159574.1", "NR_159572.1", "NR_159571.1", "NR_159570.1", "NR_159569.1", "NR_159568.1", "NR_159567.1", "NR_159564.1", "NR_151657.1", "NR_151656.1", "NR_158347.1", "NR_158316.1", "NR_154703.1", "NR_154693.1", "NR_154692.1", "NR_154691.1", "NR_154690.1", "NR_154689.1", "NR_154683.1", "NR_154668.1", "NR_151654.1", "NR_151653.1", "NR_151652.1", "NR_151651.1", "NR_151650.1", "NR_151649.1", "NR_147634.1", "NR_147633.1", "NR_147632.1", "NR_137609.1", "NR_137116.1", "NR_151655.1"), verbose = TRUE)

0.3.2 View Data

Sequence

0.4 Preprocessing

0.4.1 Impact of filtering and normalization

#create a table from the data by selecting columns using dplyr
Seq <- Sequence %>% 
  select(acc_no, taxon, journal, country, sequence, first_author)
Seq

0.4.2 table of filtered and normalized data

Seqq <- Seq %>%
  filter(journal != "Unpublished") #get rid of unpublished data
Seqqq <- Seqq %>%
  filter(first_author != "NA") #get rid of data with no known author
Seqqqq <- Seqqq %>%
  filter(country != "NA") #get rid of data with no country recorded
Seqqqq

0.4.3 Selecting Relevant column

Seqs <- Seqqqq %>%
  select(acc_no, country, taxon, sequence)
Seqs

0.4.4 load the sequences from the file

seqs <- readDNAStringSet("newickk.txt", format = "fasta")
seqs
## DNAStringSet object of length 69:
##      width seq                                              names               
##  [1]   601 TTGAATAAAACCCCCAATGGTTG...TTTTGGACAAAGTTGAACAAAT A.cretaceaverruca_1
##  [2]   600 TTGAATGCTTTAAACCCATTGGC...ATGAGCAATTATACTTGTATAT A.brunneola_1 
##  [3]   600 TTGAATGCTTTAAACCCATTGGC...ATGAGCAATTATACTTGTATAT A.brunneola_2
##  [4]   605 TTGAATGCTTTAAACCCATTGGC...ATGAGCAATTATACTTGTATAT A.brunneola_3
##  [5]   604 TTGAATGCTTTAAACCCATTGGC...ATGAGCAATTATACTTGTATAT A.brunneola_4 
##  ...   ... ...
## [65]   610 GGATCATTAGTGAAATGAACTTT...GAAATGCACAACTTGACCTCAA A.griseorosea 
## [66]   606 GGATCATTAGTGAAATGAACCAT...GAATTGACCAACTTGACCTCAA A.molliuscula 
## [67]   604 GGATCATTAAAGAAATGAACCCT...AACTTGACCAACTTGACCTCAA A.subfuliginea 
## [68]   632 GGAAGTAAAAGTCGTAACAAGGT...AAGCATATCAATAAGCGGAGGA A.drummondii 
## [69]   677 AAGTCGTAACAAGGTTTCCGTAG...TAGGACTACCCGCTGAACTTAA A.brunneitoxicaria

0.5 nucleotide sequences need to be in the same orientation

seqs <- OrientNucleotides(seqs)

seqs <- OrientNucleotides(seqs)
## ========================================================================================================================================================================================================
## 
## Time difference of 0.14 secs

0.6 Perform the alignment

aligned <- AlignSeqs(seqs)
## Determining distance matrix based on shared 9-mers:
## ================================================================================
## 
## Time difference of 0.14 secs
## 
## Clustering into groups by similarity:
## ================================================================================
## 
## Time difference of 0.03 secs
## 
## Aligning Sequences:
## ================================================================================
## 
## Time difference of 1.78 secs
## 
## Iteration 1 of 2:
## 
## Determining distance matrix based on alignment:
## ================================================================================
## 
## Time difference of 0.02 secs
## 
## Reclustering into groups by similarity:
## ================================================================================
## 
## Time difference of 0.03 secs
## 
## Realigning Sequences:
## ================================================================================
## 
## Time difference of 1.31 secs
## 
## Iteration 2 of 2:
## 
## Determining distance matrix based on alignment:
## ================================================================================
## 
## Time difference of 0.02 secs
## 
## Reclustering into groups by similarity:
## ================================================================================
## 
## Time difference of 0.03 secs
## 
## Realigning Sequences:
## ================================================================================
## 
## Time difference of 0.67 secs
## 
## Refining the alignment:
## ================================================================================
## 
## Time difference of 0.56 secs

1 view the alignment in a browser (optional)

BrowseSeqs(aligned, highlight=0)

2 write the alignment to a new FASTA file

writeXStringSet(aligned,
                file="Amanita_aligned1.fasta")

2.1 Read in the aligned data

dna1 <- read.alignment("Amanita_aligned1.fasta", format = "fasta")

2.2 Ceate a distance matrix for the alignment

D1 <- dist.alignment(dna1, matrix = "similarity")

2.3 View the distance matrix

Darker shades of gray mean a larger distance

temp <- as.data.frame(as.matrix(D1))
table.paint(temp, cleg=0, clabel.row=.5, clabel.col=.5)+
  scale_color_viridis()

## NULL

2.4 Creating the tree using Neighbour Joining

All trees created using ape package will be of class phylo

tre1 <- nj(D1)
class(tre1) 
## [1] "phylo"
tre1 <- ladderize(tre1)

2.5 Sneak view of the tree constructed

Using base R plotting we get to view the newly construucted tree

plot(tre1, cex = 0.6)
title("similarity in Amanita (ITS)")

2.6 Cluster dendrogram

It will help to see the height and the distance or the dissimilarities of sequences

h_cluster <- hclust(D1, method = "average", members = NULL) # method = average is used for UPGMA, members can be equal to NULL or a vector with a length of size D
plot(h_cluster, cex = 0.6)

2.6.1 Plot the allignment with the tree

Plotting the alignment with the tree helps to note the rate of similarities, diversions in single neuclotide plymorphism (SNPs)

tre1$tip.label <- aligned@ranges@NAMES
msaplot(p=ggtree(tre1), fasta="Amanita_aligned1.fasta", window=c(150, 175))+ scale_fill_viridis_d(alpha = 0.8)

# The next move is to use the Molecular Evolution Genetic Analysis software ————————————————————————

2.7 Exporting and Importing with MEGA

The next step is done with MEGA software, the Sequences are copied out to MEGA software and aligned and edited. The phylogenetic tree is constructed and the export via newick file format into R

nkk<- '((((((((((((A.simulans:0.02907337,A.glarea:0.04496334):0.00809242,A.variicolor:0.06018683):0.00431265,(A.lividopallescens:0.00989372,A.griseofusca:0.03271156):0.01275494):0.00524249,A.vladimirii:0.05453026):0.00613586,(A.griseofolia:0.08648412,(A.rhacopus:0.05113933,A.liquii:0.07084414):0.01336186):0.01592971):0.00719293,A.drummondii:0.08112228):0.00479849,A.griseocaerulea:0.04323164):0.03527613,A.calida:0.10272265):0.01495296,(A.minima:0.17498878,(A.fulvopulverulenta:0.16894419,(A.vernicoccora:0.13168734,A.goossensfontanae:0.17571715):0.00854798):0.01142453):0.01164006):0.00950753,(((A.submelleialba:0.12349877,A.bingensis:0.15120395):0.01583520,A.kalasinensis:0.15737323):0.01081793,(A.pallidoverruca:0.16195382,(A.robusta:0.10479585,(A.sinensis:0.08508934,A.ravicrocina:0.10983791):0.00815110):0.02939613):0.00274861):0.01146414):0.01073844,(A.ballerina:0.10803523,((A.molliuscula:0.05853378,A.griseorosea:0.07178017):0.03812872,((((A.subpallidorosea:0.05461553,A.subfuliginea:0.07663470):0.01044801,A.pallidorosea:0.04468423):0.00875680,A.rimosa:0.08054336):0.00433501,((A.fuligineoides:0.06467381,A.brunneitoxicaria:0.08299499):0.00853616,((A.millsii:0.04431407,A.gardneri:0.01133412):0.01158407,(A.harkoneniana:0.02273708,A.bweyeyensis:0.01871750):0.01010805):0.03543300):0.00559756):0.00422254):0.03906184):0.02618893):0.00447108,((A.wadulawitu:0.05563651,A.lesueurii:0.02798780):0.06810291,((((A.sabulosa_4:0.00819460,A.sabulosa_3:0.01436327):0.00516027,(A.sabulosa_2:0.00890918,A.sabulosa_1:0.00147483):0.01829013):0.08439046,((A.pupatju_4:0.00078466,A.pupatju_3:0.00439159):0.00219649,(A.pupatju_2:0.00426348,A.pupatju_1:0.00092340):0.00127862):0.09390663):0.01402404,(((A.compacta_3:0.00000000,A.compacta_2:0.00457843):0.00061855,A.compacta_1:0.00468112):0.05648131,(A.arenarioides:0.07054172,(A.pseudoarenaria_4:0.02745061,(A.pseudoarenaria_3:0.01048861,(A.pseudoarenaria_2:0.00295979,A.pseudoarenaria_1:0.00227974):0.00919717):0.01198716):0.02173716):0.00723149):0.02474670):0.05275944):0.02846739,(A.quenda:0.19741084,(((((A.brunneola_5:0.00754376,A.brunneola_4:0.00257218):0.00439174,A.brunneola_3:0.00578529):0.00925165,(A.brunneola_2:0.00087885,A.brunneola_1:0.00415833):0.00976583):0.15805804,A.heishidingensis:0.14319692):0.00434084,(((A.cretaceaverruca_7:0.01593990,A.cretaceaverruca_4:0.00772619):0.00324128,A.cretaceaverruca_2:0.00353222):0.00588864,((A.cretaceaverruca_8:0.00064111,A.cretaceaverruca_3:0.00275741):0.00806665,(A.cretaceaverruca_5:0.00911831,(A.cretaceaverruca_6:0.00346111,A.cretaceaverruca_1:0.00155082):0.00207371):0.00188128):0.00195448):0.14598532):0.02768546):0.02205341);'

Tree <- read.tree(text=nkk)

2.8 Creating a design for grouping the tree construction in R

2.8.1 checking the data to be used for the grouping

group = read.table('newick design.txt', sep = '\t',
                   col.names = c('Taxon', 'Country'),
                   header = FALSE, stringsAsFactors = FALSE)

group <- as_tibble(group)
group

2.8.2 Convert Tree from phylo to a structured data

Tree1 <- as_tibble(Tree)
Tree1

2.9 Full join of the structured tree data with the design

using the full join , we joined the tree data with the design we created by the label.

Join_Tree <- full_join(Tree1, group, by = c('label' = 'Taxon'))

2.10 Clean the new joined data

we ensure to eliminate branch length without figure.

Join_Tree1 <- Join_Tree %>%
  filter(branch.length != "NA")
Join_Tree1

2.11 convert the structured tree to treedata

Join_Tree2 <- as.treedata(Join_Tree1)

2.12 Plotting with ggtree

ggtree(Join_Tree2, layout = 'circular') + geom_treescale(fontsize=3, linesize=0.2, offset=0, color = 'red') +
  geom_tiplab(aes(color = Country)) + 
  theme(legend.position = 'left') +
  geom_highlight(node = 1:36, fill='red', alpha=.2)+
  geom_highlight(node = 37:54, fill='green', alpha=.2)+
  geom_highlight(node = 56:69, fill='skyblue', alpha=.2)

2.13 Conclusion

The similarities between the sequences has been shown, as indicated by the country from where the Amanita originated from or submitted. the phylogenetic tree showed three major clades.

2.14 Just Exploring

Joseph is an ‘R for evolution biology’ learner, still figuring out how this works please connect with me here. comment and help become better.

THANKS