Package Installation and Similarity Calculation

install.packages("seqinr")

Load the package

library(seqinr)

Import the alignment

seqs <- read.alignment("~/Downloads/alignment_real.fa", format = "fasta")
If this fails with Error: could not find function "read.alignment", the seqinr package is not loaded yet; run library(seqinr) first.

Shorten the sequence names

n1 <- paste("A", 1:length(seqs$nam), sep = "")  # short labels A1, A2, ...
seqs$nam <- n1  # n1 is a plain vector, so no [, 1] indexing is needed
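The renaming step can be tried on its own without an alignment file. This is only a minimal sketch: the long names below are hypothetical placeholders standing in for seqs$nam, and name_key is a helper introduced here so the short labels can be mapped back to the originals later.

```r
# Hypothetical long sequence names standing in for seqs$nam
long_names <- c("gene_one_species_x", "gene_two_species_x", "gene_one_species_y")

# Build one short label per sequence: A1, A2, ...
short_names <- paste0("A", seq_along(long_names))

# Keep a lookup table (hypothetical helper) to recover the original names
name_key <- data.frame(short = short_names, original = long_names)
name_key
```

seq_along() is used instead of 1:length() because it also behaves sensibly when the vector is empty.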

Compute the similarity-based distance matrix (0 means identical sequences, so the diagonal is 0)

aa <- dist.alignment(seqs, matrix = "similarity")
aa1 <- as.matrix(aa)
aa1
           A1        A2        A3        A4         A5         A6
A1  0.0000000 0.5420691 0.6449453 0.6595797 0.67889164 0.68033605
A2  0.5420691 0.0000000 0.6610407 0.6718091 0.68669114 0.68824720
A3  0.6449453 0.6610407 0.0000000 0.6098214 0.67437405 0.67366598
A4  0.6595797 0.6718091 0.6098214 0.0000000 0.68920244 0.69068141
A5  0.6788916 0.6866911 0.6743741 0.6892024 0.00000000 0.03632164
A6  0.6803361 0.6882472 0.6736660 0.6906814 0.03632164 0.00000000
A7  0.6317128 0.6215816 0.6750942 0.6473389 0.63575246 0.63921995
A8  0.2166757 0.5494423 0.6360491 0.6572855 0.68611376 0.68770231
A9  0.5260345 0.4542568 0.6752970 0.6996503 0.65979636 0.66585711
A10 0.6558959 0.6757090 0.1655212 0.6023386 0.68199434 0.68145429
A11 0.6666667 0.6733895 0.5952414 0.1831474 0.68855969 0.69016166
A12 0.6758879 0.6818100 0.6724555 0.6946222 0.15759632 0.15842688
           A7        A8        A9       A10       A11       A12
A1  0.6317128 0.2166757 0.5260345 0.6558959 0.6666667 0.6758879
A2  0.6215816 0.5494423 0.4542568 0.6757090 0.6733895 0.6818100
A3  0.6750942 0.6360491 0.6752970 0.1655212 0.5952414 0.6724555
A4  0.6473389 0.6572855 0.6996503 0.6023386 0.1831474 0.6946222
A5  0.6357525 0.6861138 0.6597964 0.6819943 0.6885597 0.1575963
A6  0.6392200 0.6877023 0.6658571 0.6814543 0.6901617 0.1584269
A7  0.0000000 0.6242366 0.5835585 0.6750942 0.6286796 0.6302480
A8  0.6242366 0.0000000 0.5323333 0.6471502 0.6643478 0.6831301
A9  0.5835585 0.5323333 0.0000000 0.6801035 0.6892024 0.6587581
A10 0.6750942 0.6471502 0.6801035 0.0000000 0.5873571 0.6800973
A11 0.6286796 0.6643478 0.6892024 0.5873571 0.0000000 0.6944702
A12 0.6302480 0.6831301 0.6587581 0.6800973 0.6944702 0.0000000
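The seqinr documentation for dist.alignment describes the returned values as the square root of the pairwise distance (an identity of 0.8 gives sqrt(1 - 0.8)). Under that reading, which is an assumption here for the "similarity" option, a similarity score could be recovered as 1 - d^2. The sketch below uses three entries copied from the matrix above.

```r
# Entries for A1, A2 and A8 copied from the printed matrix
labs <- c("A1", "A2", "A8")
d <- matrix(c(0.0000000, 0.5420691, 0.2166757,
              0.5420691, 0.0000000, 0.5494423,
              0.2166757, 0.5494423, 0.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(labs, labs))

# Assumption: d = sqrt(1 - s), hence s = 1 - d^2
s <- 1 - d^2
round(s, 3)
```

On this scale the diagonal becomes 1 and closer pairs, such as A1 and A8, get the higher scores.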

Check whether any value is greater than 1

range(aa1) 
[1] 0.0000000 0.6996503

All values are less than 1. Pretty good!
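The range check can be extended into a few more sanity checks. The function is_valid_dist below is a hypothetical helper, not part of seqinr; it assumes a valid matrix is square, symmetric, zero on the diagonal, and bounded by 0 and 1.

```r
# Entries for A1, A2 and A8 copied from the printed matrix
labs <- c("A1", "A2", "A8")
d <- matrix(c(0.0000000, 0.5420691, 0.2166757,
              0.5420691, 0.0000000, 0.5494423,
              0.2166757, 0.5494423, 0.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(labs, labs))

# Hypothetical helper: TRUE only if every check passes
is_valid_dist <- function(m, tol = 1e-8) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    isSymmetric(unname(m), tol = tol) &&  # same value above and below the diagonal
    all(abs(diag(m)) < tol) &&            # each sequence is at distance 0 from itself
    all(m >= 0 & m <= 1)                  # nothing outside the expected range
}

is_valid_dist(d)
```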

Create a heatmap of the whole similarity matrix

heatmap(aa1, cexRow = 0.5, cexCol = 0.5)

cexRow and cexCol adjust the font size of the row and column labels.

You can see that the sequences from the same genes cluster together.
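The grouping can also be read off as explicit cluster labels instead of by eye. Note that heatmap() by default clusters on dist() applied to the rows of the matrix, whereas this sketch treats the values themselves as distances via as.dist(), so it is a related but not identical clustering. The five entries are copied from the matrix above, and the choice of k = 2 is an assumption made for illustration.

```r
# Entries for A1, A8, A5, A6 and A12 copied from the printed matrix
labs <- c("A1", "A8", "A5", "A6", "A12")
d <- matrix(c(0.0000000, 0.2166757, 0.6788916, 0.6803361, 0.6758879,
              0.2166757, 0.0000000, 0.6861138, 0.6877023, 0.6831301,
              0.6788916, 0.6861138, 0.0000000, 0.03632164, 0.1575963,
              0.6803361, 0.6877023, 0.03632164, 0.0000000, 0.1584269,
              0.6758879, 0.6831301, 0.1575963, 0.1584269, 0.0000000),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# Hierarchical clustering on the matrix itself (complete linkage by default)
hc <- hclust(as.dist(d))

# Cut the tree into two groups and list the members of each
groups <- cutree(hc, k = 2)
split(names(groups), groups)
```

To have heatmap() cluster on the matrix values in the same way, distfun = as.dist can be passed for a square symmetric matrix.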

You can also draw the heatmap without the dendrograms

heatmap(aa1, cexRow = 0.5, cexCol = 0.5, Rowv = NA, Colv = NA)
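A small self-contained version of the dendrogram-free plot is sketched below, using three entries copied from the matrix above. The scale = "none" argument is an addition not used in the original call; by default heatmap() standardises each row of a non-symmetric call before colouring, which may not be wanted for distance values.

```r
# Entries for A1, A2 and A8 copied from the printed matrix
labs <- c("A1", "A2", "A8")
d <- matrix(c(0.0000000, 0.5420691, 0.2166757,
              0.5420691, 0.0000000, 0.5494423,
              0.2166757, 0.5494423, 0.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(labs, labs))

# Rowv = NA and Colv = NA suppress both dendrograms and any reordering;
# scale = "none" colours the raw values instead of row-standardised ones
res <- heatmap(d, Rowv = NA, Colv = NA, scale = "none",
               cexRow = 0.5, cexCol = 0.5)

# heatmap() invisibly returns the row and column order it used
res$rowInd
res$colInd
```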

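Now keep only the AT.. sequences as rows and the ARAL… sequences as columns, and draw the heatmap

aa2 <- aa1[1:7, 8:12]
heatmap(aa2, cexRow = 0.5, cexCol = 0.5)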
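Selecting rows and columns by label rather than by position is less fragile if the sequence order ever changes. In the sketch below the split into row_ids and col_ids is purely hypothetical and chosen for illustration; the four entries are copied from the matrix above. It also reports, for each row, the column with the smallest distance.

```r
# Entries for A1, A2, A8 and A9 copied from the printed matrix
labs <- c("A1", "A2", "A8", "A9")
d <- matrix(c(0.0000000, 0.5420691, 0.2166757, 0.5260345,
              0.5420691, 0.0000000, 0.5494423, 0.4542568,
              0.2166757, 0.5494423, 0.0000000, 0.5323333,
              0.5260345, 0.4542568, 0.5323333, 0.0000000),
            nrow = 4, byrow = TRUE, dimnames = list(labs, labs))

# Hypothetical grouping: which labels go on rows and which on columns
row_ids <- c("A1", "A2")
col_ids <- c("A8", "A9")

# Subset by name instead of by index
sub <- d[row_ids, col_ids]

# For each row, the column label with the smallest distance
closest <- colnames(sub)[apply(sub, 1, which.min)]
names(closest) <- rownames(sub)
closest
```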
