Sometimes we need to convert genetic sequences stored in tabular format into plain-text FASTA format, and often we need to go the other way, from FASTA to a table. DNA sequences stored in a table can be handled as a vector in R, which makes it easy to run operations such as extracting part of each sequence. I have written two functions that perform these conversions.
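
As a minimal illustration of why the tabular form is convenient, the sketch below builds a small table by hand and extracts part of every sequence in one vectorized call. The data frame `dna_table`, its column names, and the sequences are made up for this example; only base R functions (`substr()`, `nchar()`) are used.

```r
# Hypothetical example data: names in the first column, sequences in the second.
dna_table <- data.frame(
  name = c("seq1", "seq2", "seq3"),
  sequence = c("ATGCGTACGT", "TTGACCGTAA", "GGCATCGATC"),
  stringsAsFactors = FALSE
)

# substr() is vectorized, so this takes bases 1 to 5 of every sequence at once.
dna_table$first_five <- substr(dna_table$sequence, 1, 5)

# nchar() returns the length of each sequence.
dna_table$length <- nchar(dna_table$sequence)

print(dna_table)
```

The same pattern works with any start and stop positions, which is harder to do when the sequences are spread over several lines of a FASTA file.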

These functions can be loaded into RStudio directly from GitHub.

library(devtools)   # provides source_url()
library(tidyverse)
# Load both conversion functions from the raw script on GitHub
source_url("https://raw.githubusercontent.com/lrjoshi/FastaTabular/master/fasta_and_tabular.R")

Fasta to tabular format

Suppose our DNA sequences are in a file named dna_fasta.fasta.

Convert this FASTA file to a table with the following function. The output will be stored as dna_table.csv in the current working directory.

FastaToTabular("dna_fasta.fasta")
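
To show the general idea behind this kind of conversion, here is a rough base R sketch. It is not the code of `FastaToTabular()`; the helper name `fasta_to_table_sketch` and the column names `name` and `sequence` are assumptions made for illustration. It treats every line starting with `>` as a header and joins the lines that follow it into one sequence.

```r
# Hypothetical helper, not the FastaToTabular() implementation.
fasta_to_table_sketch <- function(fasta_path) {
  lines <- readLines(fasta_path)
  lines <- lines[nzchar(trimws(lines))]   # drop blank lines

  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)          # record number for every line

  seq_names <- sub("^>", "", lines[is_header])
  seq_lines <- lines[!is_header]
  seq_ids <- record_id[!is_header]

  # Join the (possibly wrapped) sequence lines of each record into one string.
  sequences <- vapply(
    seq_along(seq_names),
    function(i) paste(seq_lines[seq_ids == i], collapse = ""),
    character(1)
  )

  data.frame(name = seq_names, sequence = sequences, stringsAsFactors = FALSE)
}

# Try it on a small temporary FASTA file with made-up sequences.
demo_fasta <- tempfile(fileext = ".fasta")
writeLines(c(">seq1", "ATGC", "GGTA", ">seq2", "TTAACC"), demo_fasta)

demo_table <- fasta_to_table_sketch(demo_fasta)
print(demo_table)

# The table can then be saved as CSV, for example to a temporary file.
demo_csv <- tempfile(fileext = ".csv")
write.csv(demo_table, demo_csv, row.names = FALSE)
```

Wrapped sequences (one record spread over several lines) end up as a single string per row, which is what makes the later vector operations possible.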

Tabular to Fasta format

To convert a CSV file to FASTA format, the one restriction is that the sequence names must be in the first column and the sequences themselves in the second column. Then use the following function. It will store the output as dna_table.fasta in the current working directory. Remember that any pre-existing file with the same name will be overwritten.


TabularToFasta("gene.csv")
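
The reverse direction can be sketched in a few lines of base R as well. Again, this is only an illustration under assumptions, not the code of `TabularToFasta()`: the helper name `table_to_fasta_sketch`, its two arguments, and the sample data are hypothetical. It relies only on the column order described above (names first, sequences second).

```r
# Hypothetical helper, not the TabularToFasta() implementation.
table_to_fasta_sketch <- function(csv_path, fasta_path) {
  # Read every column as text so sequences are never converted to other types.
  tab <- read.csv(csv_path, colClasses = "character")

  if (file.exists(fasta_path)) {
    message("Overwriting existing file: ", fasta_path)
  }

  headers <- paste0(">", tab[[1]])   # first column: sequence names
  sequences <- tab[[2]]              # second column: sequences

  # rbind() plus c() interleaves the vectors: header1, seq1, header2, seq2, ...
  writeLines(c(rbind(headers, sequences)), fasta_path)
  invisible(fasta_path)
}

# Try it on a small temporary CSV file with made-up genes.
demo_in <- tempfile(fileext = ".csv")
write.csv(
  data.frame(name = c("geneA", "geneB"), sequence = c("ATGCCC", "GGGTTT")),
  demo_in,
  row.names = FALSE
)

demo_out <- tempfile(fileext = ".fasta")
table_to_fasta_sketch(demo_in, demo_out)

fasta_lines <- readLines(demo_out)
print(fasta_lines)
```

Because the columns are picked by position rather than by name, the header row of the CSV can use any labels as long as the order is kept.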

If you get a permission error while writing files, try creating a new directory and setting it as the working directory.
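
One possible way to do that from the R console is sketched below. The directory name `fasta_output` and its location under `tempdir()` are placeholders chosen so the example runs anywhere; in practice you would likely pick a folder in your own home or project directory.

```r
# Placeholder location: replace with a folder you own, e.g. "~/fasta_output".
out_dir <- file.path(tempdir(), "fasta_output")

# Create the directory (no warning if it already exists).
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# file.access() with mode = 2 returns 0 when the directory is writable.
writable <- unname(file.access(out_dir, mode = 2) == 0)
print(writable)

# setwd() returns the previous working directory, so it can be restored later.
old_wd <- setwd(out_dir)
print(getwd())

# ... run the conversion functions here ...

setwd(old_wd)
```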

Sample files and code are available in my GitHub repository (https://github.com/lrjoshi/FastaTabular).
