SAM/BAM Documentation
Here is the documentation for SAM: http://samtools.github.io/hts-specs/SAMv1.pdf
Below is a snippet of a SAM file:
LINUX commands for analysis of SAM file
number of alignments containing insertions (header lines starting with @ are skipped first)
grep -v "^@" xxxxxx.sam | cut -f6 | grep "I" | wc -l
number of alignments containing deletions
grep -v "^@" xxxxxx.sam | cut -f6 | grep "D" | wc -l
number of alignments containing both insertions and deletions
grep -v "^@" xxxxxx.sam | cut -f6 | grep "I" | grep "D" | wc -l
number of alignments containing insertions and/or deletions
grep -v "^@" xxxxxx.sam | cut -f6 | grep -c "[ID]"
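The CIGAR counts above can be checked on a small example. The sketch below builds a made-up five-read SAM file (the read names, the `@PG` line, and the temporary file are illustrative assumptions, not real data) and shows why header lines should be dropped before cutting column 6: the sixth field of a `@PG` line can itself contain a `D` or an `I`.

```shell
# Build a tiny, hypothetical SAM file (tab-separated) whose counts can be checked by hand.
sam_demo=$(mktemp)
{
  printf '@HD\tVN:1.6\tSO:unsorted\n'
  printf '@PG\tID:demo\tPN:demo\tVN:1\tCL:demo in.fq\tDS:DEMO\n'
  printf 'r1\t0\tChr1\t100\t60\t10M\t*\t0\t0\t*\t*\n'
  printf 'r2\t0\tChr1\t200\t60\t5M2I3M\t*\t0\t0\t*\t*\n'
  printf 'r3\t0\tChr1\t300\t60\t4M1D6M\t*\t0\t0\t*\t*\n'
  printf 'r4\t0\tChr1\t400\t60\t3M1I2M2D4M\t*\t0\t0\t*\t*\n'
  printf 'r5\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
} > "$sam_demo"

# CIGAR strings (column 6) of alignment records only.
cigars=$(grep -v '^@' "$sam_demo" | cut -f6)

n_ins=$(printf '%s\n' "$cigars" | grep -c 'I')               # insertions
n_del=$(printf '%s\n' "$cigars" | grep -c 'D')               # deletions
n_both=$(printf '%s\n' "$cigars" | grep 'I' | grep -c 'D')   # both
n_either=$(printf '%s\n' "$cigars" | grep -c '[ID]')         # either

# Without the header filter, the DS:DEMO field of the @PG line is counted too.
naive_del=$(cut -f6 "$sam_demo" | grep -c 'D')

echo "insertions=$n_ins deletions=$n_del both=$n_both either=$n_either naive_deletions=$naive_del"
rm -f "$sam_demo"
```

On a real file the same pipelines apply with `xxxxxx.sam` in place of the temporary file.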
total number of reads, with a breakdown by category (mapped, paired, properly paired, singletons, and so on)
samtools flagstat xxxxxx.sam
LINUX commands for analysis of BAM file
number of reference sequences in the file (one SN: entry per @SQ header line)
samtools view -H xxxxxx.bam | grep -c "SN:"
length of the first reference sequence in the file (the LN: field of the first @SQ line)
samtools view -H xxxxxx.bam | grep "SN:" | head -1
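To pull the name and length out of the header rather than reading them by eye, the `@SQ` lines can be parsed with awk. This is a minimal sketch on a made-up header (the sequence names and lengths are placeholders); with a real file, the output of `samtools view -H xxxxxx.bam` would be piped in instead.

```shell
# A hypothetical header, as printed by "samtools view -H" (tab-separated).
hdr=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:Chr1\tLN:1000\n@SQ\tSN:Chr2\tLN:2000\n@PG\tID:demo\tPN:demo\n')

# Number of reference sequences: one @SQ line each.
n_ref=$(printf '%s\n' "$hdr" | grep -c '^@SQ')

# Name and length of the first reference sequence.
# The tags on an @SQ line may come in any order, so each field is checked by its prefix.
first_ref=$(printf '%s\n' "$hdr" | awk -F'\t' '
    $1 == "@SQ" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len  = substr($i, 4)
        }
        print name, len
        exit
    }')

echo "references=$n_ref first=$first_ref"
```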
identifier (name) for the first alignment
samtools view xxxxxx.bam | head -1 | cut -f1
number of alignments in the file
samtools flagstat xxxxxx.bam
samtools view xxxxxx.bam | wc -l
number of unmapped reads (FLAG bit 0x4 set)
samtools view -c -f 4 xxxxxx.bam
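Whether a read is unmapped is recorded in bit 0x4 of the FLAG column (column 2), not reliably in the reference name: an unmapped read whose mate is mapped is often given the mate's reference name and position. The sketch below illustrates the difference on a made-up SAM file (read names and flags are illustrative assumptions), testing the bit with plain integer arithmetic so that it runs in any awk.

```shell
# A tiny, hypothetical SAM body. r4 is unmapped (flag 133 = 1 + 4 + 128)
# but carries its mate's reference name and position.
flag_sam=$(mktemp)
{
  printf 'r1\t0\tChr1\t100\t60\t10M\t*\t0\t0\t*\t*\n'
  printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n'
  printf 'r3\t73\tChr1\t100\t60\t10M\t=\t100\t0\t*\t*\n'
  printf 'r4\t133\tChr1\t100\t0\t*\t=\t100\t0\t*\t*\n'
  printf 'r5\t16\tChr2\t500\t60\t10M\t*\t0\t0\t*\t*\n'
} > "$flag_sam"

# Count by FLAG: bit 0x4 is set when int(FLAG / 4) is odd.
unmapped_by_flag=$(awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1 { n++ } END { print n + 0 }' "$flag_sam")

# Count by reference name: only reads whose RNAME (column 3) is exactly "*".
unmapped_by_rname=$(awk -F'\t' '!/^@/ && $3 == "*" { n++ } END { print n + 0 }' "$flag_sam")

echo "by_flag=$unmapped_by_flag by_rname=$unmapped_by_rname"
rm -f "$flag_sam"
```

Here the FLAG test finds both unmapped reads, while the reference-name test misses r4.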
number of spliced alignments
samtools view xxxxxx.bam | cut -f6 | grep "N" | wc -l
samtools view xxxxxx.bam | cut -f6 | grep -c 'N'
number of alignments containing a deletion (D)
samtools view xxxxxx.bam | cut -f6 | grep "D" | wc -l
samtools view xxxxxx.bam | cut -f6 | grep -c 'D'
number of alignments with an unmapped mate (RNEXT is *)
samtools view xxxxxx.bam | cut -f7 | grep -F "*" | wc -l
number of alignments where the read’s mate mapped to the same chromosome
samtools view xxxxxx.bam | cut -f7 | grep "=" | wc -l
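The two mate counts above both read column 7 (RNEXT), where `=` means the mate is on the same reference, `*` means no mate information is available, and any other value names a different reference. As a sketch, all three cases can be tallied in one pass with awk; the SAM records below are made up for illustration.

```shell
# A tiny, hypothetical SAM body with different RNEXT values in column 7.
mate_sam=$(mktemp)
{
  printf 'p1\t99\tChr1\t100\t60\t10M\t=\t300\t210\t*\t*\n'
  printf 'p2\t147\tChr1\t300\t60\t10M\t=\t100\t-210\t*\t*\n'
  printf 'p3\t0\tChr1\t500\t60\t10M\t*\t0\t0\t*\t*\n'
  printf 'p4\t65\tChr1\t700\t60\t10M\tChr2\t900\t0\t*\t*\n'
  printf 'p5\t99\tChr3\t100\t60\t10M\t=\t250\t160\t*\t*\n'
} > "$mate_sam"

# One pass over the alignment records, classifying each by its RNEXT value.
counts=$(awk -F'\t' '
    /^@/      { next }              # skip header lines, if any
    $7 == "=" { same++;  next }     # mate on the same reference
    $7 == "*" { none++;  next }     # no mate information
              { other++ }           # mate on a different reference
    END { printf "same=%d unavailable=%d other=%d\n", same, none, other }
' "$mate_sam")

echo "$counts"
rm -f "$mate_sam"
```

With a BAM file, `samtools view xxxxxx.bam` would be piped into the same awk program instead of reading the temporary file.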
Data extraction from a defined chromosomal region: First, sort and index the file
samtools sort -o xxxxxx.sorted.bam xxxxxx.bam
samtools index xxxxxx.sorted.bam
number of alignments in a specified chromosomal region
samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | wc -l
number of alignments with an unmapped mate in a specified chromosomal region
samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f7 | grep -F "*" | wc -l
number of alignments containing a deletion in a specified chromosomal region
samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f6 | grep "D" | wc -l
number of alignments whose mate mapped to the same chromosome, in a specified chromosomal region
samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f7 | grep "=" | wc -l
number of spliced alignments in a specified chromosomal region
samtools view xxxxxx.sorted.bam "Chr3:11777000-11794000" | cut -f6 | grep "N" | wc -l
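A region query returns the alignments that overlap the region, not only those that start inside it. The sketch below imitates that selection with awk on a made-up, uncompressed SAM file, computing each alignment's reference span from its CIGAR string (M, D, N, = and X consume reference bases). It is only an illustration of the overlap rule under those assumptions, not a replacement for the indexed lookup, and the records are hypothetical.

```shell
# A tiny, hypothetical SAM body around the region Chr3:11777000-11794000.
region_sam=$(mktemp)
{
  printf 'a1\t0\tChr3\t11776990\t60\t20M\t*\t0\t0\t*\t*\n'         # overlaps the region start
  printf 'a2\t0\tChr3\t11780000\t60\t10M500N10M\t*\t0\t0\t*\t*\n'  # spliced, inside
  printf 'a3\t0\tChr3\t11793995\t60\t5M2D5M\t*\t0\t0\t*\t*\n'      # deletion, overlaps the end
  printf 'a4\t0\tChr3\t11795000\t60\t10M\t*\t0\t0\t*\t*\n'         # after the region
  printf 'a5\t0\tChr2\t11780000\t60\t10M\t*\t0\t0\t*\t*\n'         # other chromosome
  printf 'a6\t0\tChr3\t11776000\t60\t10M2I10M\t*\t0\t0\t*\t*\n'    # before the region
} > "$region_sam"

# Keep records on chr whose reference interval [POS, POS + span - 1] overlaps [beg, stop].
hits=$(awk -F'\t' -v chr=Chr3 -v beg=11777000 -v stop=11794000 '
    /^@/ { next }
    $3 == chr {
        cigar = $6; span = 0
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op  = substr(cigar, RLENGTH, 1)
            if (op ~ /[MDN=X]/) span += len      # operations that consume the reference
            cigar = substr(cigar, RLENGTH + 1)
        }
        last = $4 + span - 1
        if ($4 <= stop && last >= beg) print
    }' "$region_sam")

n_region=$(printf '%s\n' "$hits" | grep -c .)
n_spliced=$(printf '%s\n' "$hits" | cut -f6 | grep -c 'N')
n_deleted=$(printf '%s\n' "$hits" | cut -f6 | grep -c 'D')

echo "in_region=$n_region spliced=$n_spliced with_deletion=$n_deleted"
rm -f "$region_sam"
```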