It is already 2019, and people are shifting toward a more eco-friendly way of living. One way to do so is to reduce the use of plastics, which are predominantly used for packaging. Scientists are looking for ways to replace plastics with friendlier biomaterials that are biodegradable yet still strong. And they have found one! It is called biocellulose. They found that a bacterium, Gluconacetobacter xylinus, produces this cellulose, and it is now being analyzed for its potential for industrial-scale production.

Knowing this, Indonesian packaging companies also want to apply this technology to their packaging. However, the bacterium currently being analyzed is already patented and quite expensive. Furthermore, it originated from a four-season region, which means it might be harder to grow in our lab, where the temperature is tropical.

Can we find our own biocellulose bacteria and produce our own healthy packaging? Of course! Young scientists in Indonesia have already sampled a few potential bacteria. But which of the samples has the highest potential to produce biocellulose, so that the young scientists can focus their analysis on that particular superbug?

2 Exploratory Data Analysis

We already have DNA sequence data for the biocellulose-producing bacteria. We will try to decode these data and find the potential genes responsible for desirable properties.

The bio_seq dataset above has 3 columns, described in the following glossary:

  • sample : name of the sample and bacteria
  • sample_length : length of the DNA sequence
  • dna_seq : DNA sequence of the bacteria
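
As a minimal sketch of how a table with these three columns could be assembled in base R, the example below derives sample_length from dna_seq with nchar(). The sample names and sequences are invented for illustration and are not the real bio_seq data; the name toy_seq is likewise hypothetical.

```r
# Toy data for illustration only: names and sequences are hypothetical,
# not taken from the real bio_seq dataset.
toy_seq <- data.frame(
  sample  = c("ToySampleA", "ToySampleB"),
  dna_seq = c("GAGTTTGATCCTGGC", "ATCCTGGC"),
  stringsAsFactors = FALSE
)

# sample_length is the number of characters (bases) in each sequence
toy_seq$sample_length <- nchar(toy_seq$dna_seq)

# Reorder the columns to match the glossary: sample, sample_length, dna_seq
toy_seq <- toy_seq[, c("sample", "sample_length", "dna_seq")]
print(toy_seq)
```

Deriving the length from the sequence itself, rather than typing it in, keeps the two columns consistent if a sequence is later trimmed or corrected.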

3 Multiple Alignment

## use default substitution matrix
## CLUSTAL 2.1  
## 
## Call:
##    msa(bio_seq)
## 
## MsaDNAMultipleAlignment with 14 rows and 1497 columns
##      aln                                                   names
##  [1] ---------AGAGTTTGATCCTGGC...CTGCGGCTGGATCACCTCCTT---- Komagataeibacter ...
##  [2] -------------------------...CTGC--------------------- Komagataeibacter ...
##  [3] ----------GAGTTTGATCMTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
##  [4] ---------TGAGTTTGATCCTGGC...------------------------- Komagataeibacter ...
##  [5] ATGAACCTGAGAGTTTGATCCTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
##  [6] ---------TGAGTTTGATCCTGGC...------------------------- Sample3
##  [7] --------TTGAGTTTGATCCTGGC...------------------------- Komagataeibacter ...
##  [8] ---------TGAGTTTGATCCTGGC...-------------------------  Sample2
##  [9] ----------GAGTTTGATCATGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [10] ----------GAGTTTCATCCTGGC...CTGCGGCTGGATCACCTCCTTT--- Komagataeibacter ...
## [11] -------------------------...CTGCGGC------------------ Komagataeibacter ...
## [12] -----------------ATCCTGGC...CTGCGGCTGGA-------------- Sample
## [13] ----------GAGTTTGATNNTGGC...CTGCGGCTGGATCACCTCCTTTCAA Gluconacetobacter...
## [14] ----------GAGTTTGATTATGGC...CTGCGGCTGGATCACCTCCTTT--- Sample4
##  Con ----------GAGTTTGATCCTGGC...CTGCGGCTGGA??????????---- Consensus

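The bacterial DNA sequences have been aligned against one another to produce a multiple alignment across all samples. Each sequence consists of nitrogenous bases (A, C, G, T). A few positions also carry ambiguity codes, such as the M in row 3 and the N in row 13, and the gaps introduced by the alignment are shown as dashes.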
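
To make the symbols in an alignment concrete, the sketch below counts standard bases, gaps, and any other symbols (such as ambiguity codes) in each aligned row using base R only. The three rows are hypothetical strings loosely modeled on the output above, and count_symbols is a helper name introduced here for illustration; it is not part of the msa package.

```r
# Toy aligned sequences (hypothetical, loosely modeled on the output above)
aligned <- c(
  row_a = "--GAGTTTGATCMTGGC",
  row_b = "--GAGTTTGATNNTGGC",
  row_c = "TTGAGTTTGATCCTGGC"
)

# Count standard bases, alignment gaps, and everything else (ambiguity codes)
count_symbols <- function(s) {
  chars <- strsplit(s, "")[[1]]
  c(
    bases     = sum(chars %in% c("A", "C", "G", "T")),
    gaps      = sum(chars == "-"),
    ambiguous = sum(!chars %in% c("A", "C", "G", "T", "-"))
  )
}

# One row per sequence, one column per symbol class
symbol_counts <- t(sapply(aligned, count_symbols))
print(symbol_counts)
```

Because every row of an alignment has the same number of columns, the three counts in each row should add up to the same total.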

4 Visualize MSA Result

After finding the multiple alignment of the DNA sequences, we visualize the result to make the alignment more engaging and easier to inspect.

6 Conclusion

From the phylogenetic tree, we know that Sample and Sample3 have the shortest distance to Komagataeibacter nataicola. So, if researchers someday want to replace that bacterium, these two samples are candidates. However, a laboratory study is still needed to confirm that the samples are similar enough to it.
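
To illustrate what "distance" between aligned sequences can mean, here is a minimal base R sketch of a simple p-distance: the proportion of mismatching positions among the columns where neither sequence has a gap. This is an assumption chosen for illustration, not necessarily the distance measure behind the tree in this analysis. The function name p_distance and the toy sequences are hypothetical.

```r
# Proportion of mismatching positions between two aligned sequences,
# ignoring columns where either sequence has a gap ("-").
p_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))  # aligned sequences share one length
  keep <- x != "-" & y != "-"
  mean(x[keep] != y[keep])
}

# Hypothetical toy alignment, not the real samples
toy_aln <- c(
  reference = "GAGTTTGATCCTGGC",
  toy_near  = "-AGTTTGATCATGGC",
  toy_far   = "GAGTTTCATTATGGC"
)

# Distance of each toy sequence to the reference; smaller means more similar
d <- sapply(toy_aln, p_distance, b = toy_aln[["reference"]])
print(round(d, 3))
```

In this toy case, toy_near has the smaller distance to the reference. The conclusion above relies on the same kind of comparison when it singles out the samples closest to Komagataeibacter nataicola.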