SAM/BAM Documentation

Here is the documentation for SAM.

Below is a snippet of a SAM file:

SAMsnippet


Linux-based tools

github-samtools

bioconda-samtools

github-bamtools

bioconda-bamtools

github-sambamba

bioconda-sambamba

Linux commands for analysis of a SAM file

The CIGAR string is column 6 of each alignment line. Header lines (starting with @) are removed first so that their fields are not counted.

number of alignments containing insertions

grep -v '^@' xxxxxx.sam | cut -f6 | grep -c 'I'

number of alignments containing deletions

grep -v '^@' xxxxxx.sam | cut -f6 | grep -c 'D'

number of alignments containing both insertions and deletions

grep -v '^@' xxxxxx.sam | cut -f6 | grep 'I' | grep -c 'D'

number of alignments containing insertions and/or deletions

grep -v '^@' xxxxxx.sam | cut -f6 | grep -c '[ID]'
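The four counts above can also be collected in a single pass over the file. The following is a minimal sketch, assuming a tab-delimited SAM file with the CIGAR string in column 6; the function name count_indels is made up for illustration and is not part of samtools.

```shell
#!/bin/sh
# count_indels FILE.sam   (hypothetical helper, not a samtools command)
# Prints four numbers: insertions deletions both either
count_indels() {
    awk -F '\t' '
        /^@/ { next }                  # skip header lines
        {
            has_i = ($6 ~ /I/)         # CIGAR contains an insertion
            has_d = ($6 ~ /D/)         # CIGAR contains a deletion
            if (has_i) ins++
            if (has_d) del++
            if (has_i && has_d) both++
            if (has_i || has_d) either++
        }
        END { printf "%d %d %d %d\n", ins, del, both, either }
    ' "$1"
}
```

Called as count_indels xxxxxx.sam, it reads the file once instead of four times, which may matter for large files.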

total number of reads, with a breakdown by flag category (for example paired, mapped, properly paired, singletons)

samtools flagstat xxxxxx.sam

Linux commands for analysis of a BAM file

number of reference sequences in the file

samtools view -H xxxxxx.bam | grep -c '^@SQ'

length of the first reference sequence in the file (the LN: field of the first @SQ line)

samtools view -H xxxxxx.bam | grep '^@SQ' | head -1

name of the alignment tool

samtools view -H xxxxxx.bam | grep '^@PG'
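To read the name and length directly rather than picking them out of the header line by eye, the SN: and LN: tags of each @SQ line can be parsed. This is a minimal sketch, assuming header text on standard input; ref_lengths is a hypothetical helper name.

```shell
#!/bin/sh
# ref_lengths   (hypothetical helper, not a samtools command)
# Reads SAM header text on stdin and prints "name<TAB>length"
# for every @SQ line, whatever order the tags appear in.
ref_lengths() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)   # reference name
                if ($i ~ /^LN:/) len  = substr($i, 4)   # reference length
            }
            print name "\t" len
        }
    '
}
```

For the first reference sequence only, the output can be cut short with head, for example: samtools view -H xxxxxx.bam | ref_lengths | head -1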

identifier (name) for the first alignment

samtools view xxxxxx.bam | head -1 | cut -f1

number of alignments in the file

samtools flagstat xxxxxx.bam

samtools view xxxxxx.bam | wc -l

number of unmapped reads (FLAG bit 0x4 set)

samtools view -c -f 4 xxxxxx.bam
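Whether a read is unmapped is recorded in the FLAG column (column 2, bit 0x4). The same count can be taken from SAM text with awk alone, which is sometimes handy when samtools is not installed. A minimal sketch, with count_unmapped as a hypothetical helper name:

```shell
#!/bin/sh
# count_unmapped   (hypothetical helper, not a samtools command)
# Reads SAM text on stdin and prints the number of records
# whose FLAG has bit 0x4 (read unmapped) set.
count_unmapped() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        int($2 / 4) % 2 == 1 { n++ }        # test bit 0x4 without bitwise operators
        END { print n + 0 }                 # print 0 when nothing matched
    '
}
```

Example use: count_unmapped < xxxxxx.sam, or samtools view xxxxxx.bam | count_unmapped.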

number of spliced alignments

samtools view xxxxxx.bam | cut -f6 | grep 'N' | wc -l

samtools view xxxxxx.bam | cut -f6 | grep -c 'N'

number of alignments containing a deletion (D)

samtools view xxxxxx.bam | cut -f6 | grep 'D' | wc -l

samtools view xxxxxx.bam | cut -f6 | grep -c 'D'

number of alignments with an unmapped mate (*)

samtools view xxxxxx.bam | cut -f7 | grep -c -F '*'

number of alignments where the read’s mate mapped to the same chromosome

samtools view xxxxxx.bam | cut -f7 | grep -c '='

Data extraction from a defined chromosomal region: First, sort and index the file

samtools sort -o xxxxxx.sorted.bam xxxxxx.bam

samtools index xxxxxx.sorted.bam

number of alignments in a specified chromosomal region

samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | wc -l

number of alignments with an unmapped mate in a specified chromosomal region

samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f7 | grep -c -F '*'

number of alignments containing a deletion in a specified chromosomal region

samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f6 | grep -c 'D'

number of alignments whose mate mapped to the same chromosome in a specified chromosomal region

samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f7 | grep -c '='

number of spliced alignments in a specified chromosomal region

samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f6 | grep -c 'N'

An alternative approach to extract data from a particular chromosomal region

samtools view -b xxxxxx.sorted.bam "Chr3:11777000-11794000" > xxxxxx.region.bam

samtools flagstat xxxxxx.region.bam

samtools view xxxxxx.region.bam | cut -f6 | grep -c 'N'

samtools view xxxxxx.region.bam | cut -f6 | grep -c 'D'

samtools view xxxxxx.region.bam | cut -f7 | grep -c -F '*'

samtools view xxxxxx.region.bam | cut -f7 | grep -c '='
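The per-region counts can also be gathered in one pass over the extracted alignments. The following is a minimal sketch, assuming SAM text on standard input with the CIGAR string in column 6 and the mate reference name (RNEXT) in column 7; region_summary is a hypothetical helper name, and an RNEXT of * is counted the way the text above treats it, as an unmapped mate.

```shell
#!/bin/sh
# region_summary   (hypothetical helper, not a samtools command)
# Reads SAM text on stdin and prints five numbers:
#   total spliced with_deletion mate_unmapped mate_same_chromosome
region_summary() {
    awk -F '\t' '
        /^@/ { next }                       # skip header lines
        {
            total++
            if ($6 ~ /N/)  spliced++        # CIGAR has a skipped region (splice)
            if ($6 ~ /D/)  deleted++        # CIGAR has a deletion
            if ($7 == "*") mate_star++      # RNEXT is "*"
            if ($7 == "=") mate_same++      # RNEXT is "=" (same reference)
        }
        END {
            printf "%d %d %d %d %d\n", total, spliced, deleted, mate_star, mate_same
        }
    '
}
```

Example use: samtools view xxxxxx.region.bam | region_summary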
