It is already 2019, and people are shifting to a more eco-friendly way of living. One way to do so is to reduce the use of plastics, which are predominantly used for packaging. Scientists have been looking for a way to replace plastics with friendlier biomaterials that are biodegradable yet still strong. And they found one! It is called biocellulose. They found that the bacterium Gluconacetobacter xylinus produces this cellulose, and it is now being analyzed for its potential for industrial-scale production.
Knowing this, Indonesian packaging companies also want to apply this technology to their packaging. But the bacterium currently being analyzed is already patented and quite expensive. Furthermore, it originated from a four-season region, meaning that it might be harder to grow in our lab, which has a tropical temperature.
Can we find our own biocellulose bacteria and produce our own healthy packaging? Of course! Young scientists in Indonesia have already sampled a few potential bacteria. But which of the samples has the highest potential to produce biocellulose, so that the young scientists can focus on analyzing that particular superbug?
# if (!requireNamespace("BiocManager", quietly=TRUE)) # package for installing packages from the Bioconductor repository
# install.packages("BiocManager")
# BiocManager::install("msa")
# multiple sequence alignment (comparing DNA sequences with one another)
library(msa)
library(dplyr)
library(ggplot2)
library(tibble)
library(stringr)
library(msaR)
library(seqinr)
We already have the DNA sequence data of the biocellulose bacteria. We will try to decode these data and look for the potential genes responsible for desirable properties.
bio_seq %>%
as.data.frame() %>%
rownames_to_column(var = "sample") %>%
mutate(sample_length = str_length(x)) %>%
rename(dna_seq = x) %>%
arrange(-sample_length) %>%
select(sample, sample_length, dna_seq)
The bio_seq dataset above has 3 columns:
sample : name of the sample or bacterium
sample_length : length of the DNA sequence
dna_seq : DNA sequence of the bacterium
bio_msa <- msa(bio_seq) # multiple sequence alignment; call reconstructed from the "Call:" line in the output below
bio_msa
## use default substitution matrix
## CLUSTAL 2.1
##
## Call:
## msa(bio_seq)
##
## MsaDNAMultipleAlignment with 14 rows and 1497 columns
## aln names
## [1] ---------AGAGTTTGATCCTGGC...CTGCGGCTGGATCACCTCCTT---- Komagataeibacter ...
## [2] -------------------------...CTGC--------------------- Komagataeibacter ...
## [3] ----------GAGTTTGATCMTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [4] ---------TGAGTTTGATCCTGGC...------------------------- Komagataeibacter ...
## [5] ATGAACCTGAGAGTTTGATCCTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [6] ---------TGAGTTTGATCCTGGC...------------------------- Sample3
## [7] --------TTGAGTTTGATCCTGGC...------------------------- Komagataeibacter ...
## [8] ---------TGAGTTTGATCCTGGC...------------------------- Sample2
## [9] ----------GAGTTTGATCATGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [10] ----------GAGTTTCATCCTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [11] -------------------------...CTGCGGC------------------ Komagataeibacter ...
## [12] -----------------ATCCTGGC...CTGCGGCTGGA-------------- Sample
## [13] ----------GAGTTTGATNNTGGC...CTGCGGCTGGATCACCTCCTTTCAA Gluconacetobacter...
## [14] ----------GAGTTTGATTATGGC...CTGCGGCTGGATCACCTCCTTT--- Sample4
## Con ----------GAGTTTGATCCTGGC...CTGCGGCTGGA??????????---- Consensus
The DNA sequences have been aligned against each other so that corresponding positions line up across all samples. Each sequence consists of nitrogenous bases (A, C, G, T), and dashes mark the gaps introduced by the alignment.
After computing the multiple alignment, we visualize it to make the sequences easier to inspect.
To find which bacteria are most similar to one another, we can visualize the alignment as a phylogenetic tree.
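To make the idea of aligned columns concrete, here is a minimal base R sketch that compares two aligned sequences position by position and reports the fraction of matching bases. The helper name `pairwise_identity` and the two toy sequences are illustrative assumptions, not part of the original analysis. Skipping every column that contains a gap is only one possible convention and may differ from how `seqinr` handles gaps.

```r
# Hypothetical helper: fraction of identical bases between two aligned sequences.
# Assumes both strings come from the same alignment (equal length, "-" marks a gap).
pairwise_identity <- function(seq_a, seq_b) {
  a <- strsplit(seq_a, "")[[1]]
  b <- strsplit(seq_b, "")[[1]]
  stopifnot(length(a) == length(b))
  # Keep only the columns where both sequences carry a base
  keep <- a != "-" & b != "-"
  if (!any(keep)) return(NA_real_)
  mean(a[keep] == b[keep])
}

# Toy aligned sequences (made up for illustration)
aln_a <- "--GAGTTTGATCCTGGC"
aln_b <- "TGGAGTTTGATCATGGC"

identity_ab <- pairwise_identity(aln_a, aln_b)
identity_ab
```

A higher identity means the two sequences are more alike. An identity-based distance, such as the one used for the tree in this article, is derived from this kind of score, so a high identity corresponds to a short distance.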
bio_align <- msaConvert(bio_msa, type="seqinr::alignment") # convert the msa result into a seqinr alignment object
bio_ident <- dist.alignment(bio_align, "identity") # compute the identity-based distance matrix
#as.matrix(d)[2:5, "C.annum Resistance", drop=FALSE]
library(ape)
bioTree <- nj(bio_ident) # neighbor joining
plot(bioTree, main="Phylogenetic Tree of Gluconacetobacter xylinus Sequences", cex=0.6)
From the phylogenetic tree, we can see that Sample and Sample3 have the shortest distance to Komagataeibacter nataicola. So if researchers ever need a substitute for that bacterium, these two samples are the most promising candidates. However, a laboratory study is still needed to confirm that the samples are similar enough to it.
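Reading the nearest neighbor off a tree plot by eye can be error-prone, so the same question can also be asked of the distance matrix directly. The sketch below is a base R illustration under stated assumptions: `nearest_reference` is a hypothetical helper, and the toy matrix with its labels and values is made up. The smallest pairwise distance does not always coincide with the sister branch in a neighbor-joining tree, so treat the result as a cross-check and not a replacement for the tree.

```r
# Hypothetical helper: for each query label, find the closest non-query label
# in a distance object (for example, the output of dist.alignment()).
nearest_reference <- function(d, queries) {
  m <- as.matrix(d)
  refs <- setdiff(rownames(m), queries)
  data.frame(
    sample = queries,
    nearest = vapply(queries, function(q) refs[which.min(m[q, refs])],
                     character(1), USE.NAMES = FALSE),
    distance = vapply(queries, function(q) min(m[q, refs]),
                      numeric(1), USE.NAMES = FALSE),
    row.names = NULL
  )
}

# Toy distance matrix (labels and values made up for illustration)
labels <- c("Sample", "Sample3", "Reference A", "Reference B")
toy <- matrix(
  c(0.00, 0.05, 0.10, 0.30,
    0.05, 0.00, 0.12, 0.28,
    0.10, 0.12, 0.00, 0.25,
    0.30, 0.28, 0.25, 0.00),
  nrow = 4, dimnames = list(labels, labels)
)
toy_dist <- as.dist(toy)

nearest <- nearest_reference(toy_dist, c("Sample", "Sample3"))
nearest
```

With the real data, a call such as `nearest_reference(bio_ident, c("Sample", "Sample2", "Sample3", "Sample4"))` should work, assuming the labels stored in `bio_ident` match the sample names shown in the alignment output.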