install.packages("seqinr")
Load the package
library(seqinr)
Import the alignment
seqs <- read.alignment("~/Downloads/alignment_real.fa", format = "fasta")
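Shorten the sequence names
seqs$nam <- n1[,1]
Error in n1[, 1] : incorrect number of dimensions
This error means n1 is not a two-dimensional object (for example, it is a plain vector); n1 needs to be a matrix with the short names in its first column.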
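One way to get short names without going through n1 is to trim each name directly. A minimal sketch with base R, assuming each FASTA header starts with a short ID followed by a space or a "|". The example names and the delimiters are hypothetical, so adjust the pattern to your own headers:

```r
# Hypothetical long names standing in for seqs$nam; the real headers may differ.
nam <- c("A1 some description", "A2|another description", "A3")

# Keep everything before the first space or "|" (assumed delimiters).
short <- sub("[ |].*$", "", nam)
short
```

With the real alignment, the same pattern would be applied as seqs$nam <- sub("[ |].*$", "", seqs$nam).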
Compute the similarity-based distance matrix
aa <- dist.alignment(seqs, matrix = "similarity")
aa1 <- as.matrix(aa)
aa1
           A1        A2        A3        A4         A5         A6
A1  0.0000000 0.5420691 0.6449453 0.6595797 0.67889164 0.68033605
A2  0.5420691 0.0000000 0.6610407 0.6718091 0.68669114 0.68824720
A3  0.6449453 0.6610407 0.0000000 0.6098214 0.67437405 0.67366598
A4  0.6595797 0.6718091 0.6098214 0.0000000 0.68920244 0.69068141
A5  0.6788916 0.6866911 0.6743741 0.6892024 0.00000000 0.03632164
A6  0.6803361 0.6882472 0.6736660 0.6906814 0.03632164 0.00000000
A7  0.6317128 0.6215816 0.6750942 0.6473389 0.63575246 0.63921995
A8  0.2166757 0.5494423 0.6360491 0.6572855 0.68611376 0.68770231
A9  0.5260345 0.4542568 0.6752970 0.6996503 0.65979636 0.66585711
A10 0.6558959 0.6757090 0.1655212 0.6023386 0.68199434 0.68145429
A11 0.6666667 0.6733895 0.5952414 0.1831474 0.68855969 0.69016166
A12 0.6758879 0.6818100 0.6724555 0.6946222 0.15759632 0.15842688
           A7        A8        A9       A10       A11       A12
A1  0.6317128 0.2166757 0.5260345 0.6558959 0.6666667 0.6758879
A2  0.6215816 0.5494423 0.4542568 0.6757090 0.6733895 0.6818100
A3  0.6750942 0.6360491 0.6752970 0.1655212 0.5952414 0.6724555
A4  0.6473389 0.6572855 0.6996503 0.6023386 0.1831474 0.6946222
A5  0.6357525 0.6861138 0.6597964 0.6819943 0.6885597 0.1575963
A6  0.6392200 0.6877023 0.6658571 0.6814543 0.6901617 0.1584269
A7  0.0000000 0.6242366 0.5835585 0.6750942 0.6286796 0.6302480
A8  0.6242366 0.0000000 0.5323333 0.6471502 0.6643478 0.6831301
A9  0.5835585 0.5323333 0.0000000 0.6801035 0.6892024 0.6587581
A10 0.6750942 0.6471502 0.6801035 0.0000000 0.5873571 0.6800973
A11 0.6286796 0.6643478 0.6892024 0.5873571 0.0000000 0.6944702
A12 0.6302480 0.6831301 0.6587581 0.6800973 0.6944702 0.0000000
Check whether any distance is greater than 1
range(aa1)
[1] 0.0000000 0.6996503
All distances are below 1. Pretty good!
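As far as I understand the seqinr documentation, dist.alignment returns the square root of the pairwise distance, so an identity of 0.8 would show up as sqrt(1 - 0.8), about 0.447. Under that assumption, the values can be mapped back to a similarity-like score. The sketch below uses a made-up 2 x 2 matrix rather than aa1, and the interpretation is cleanest for matrix = "identity":

```r
# Toy distance matrix on the same scale as aa1 (values are made up).
d <- matrix(c(0, sqrt(1 - 0.8),
              sqrt(1 - 0.8), 0),
            nrow = 2, dimnames = list(c("s1", "s2"), c("s1", "s2")))

# Assumption: distance = sqrt(1 - similarity), so similarity = 1 - distance^2.
sim <- 1 - d^2
sim
```

Applied to the real matrix this would be 1 - aa1^2, with larger values meaning more similar sequences.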
cexRow and cexCol set the font size of the row and column labels
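The plotting call itself is not shown here. A minimal sketch, assuming base R's heatmap() is the function in use and with a made-up matrix standing in for aa1:

```r
# Toy symmetric distance matrix standing in for aa1 (values are made up).
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(24), nrow = 6)))
rownames(m) <- colnames(m) <- paste0("A", 1:6)

# symm = TRUE tells heatmap() the matrix is symmetric.
# cexRow / cexCol scale the row and column label text.
hm <- heatmap(m, symm = TRUE, cexRow = 0.8, cexCol = 0.8)
```

With the real data, m would be replaced by aa1. Values below 1 shrink the labels, values above 1 enlarge them.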
You can see that the sequences from the same genes cluster together.
You can also draw the heatmap without the dendrogram
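A minimal sketch of a heatmap without dendrograms, again assuming base R's heatmap() and a made-up matrix in place of aa1:

```r
# Toy symmetric distance matrix standing in for aa1 (values are made up).
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(24), nrow = 6)))
rownames(m) <- colnames(m) <- paste0("A", 1:6)

# Rowv = NA and Colv = NA skip the clustering, so no dendrograms are drawn
# and the rows and columns are not reordered.
hm <- heatmap(m, Rowv = NA, Colv = NA, symm = TRUE, scale = "none")
```

Without clustering, related sequences only sit next to each other if the matrix is already ordered that way.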
Now draw the heatmap with only the AT.. sequences as rows and the ARAL… sequences as columns
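A minimal sketch of that subsetting step, assuming the two groups can be told apart by a name prefix. The names "AT_1", "ARAL_1" and so on are hypothetical placeholders, and the values are made up:

```r
# Toy matrix with hypothetical name prefixes ("AT" and "ARAL").
nm <- c("AT_1", "AT_2", "ARAL_1", "ARAL_2", "ARAL_3")
set.seed(1)
m <- as.matrix(dist(matrix(rnorm(20), nrow = 5)))
rownames(m) <- colnames(m) <- nm

# Keep rows whose names start with "AT" and columns whose names start with "ARAL".
sub_m <- m[startsWith(rownames(m), "AT"),
           startsWith(colnames(m), "ARAL"),
           drop = FALSE]
dim(sub_m)

# The submatrix is rectangular, so symm = TRUE no longer applies.
heatmap(sub_m, scale = "none", cexRow = 0.8, cexCol = 0.8)
```

heatmap() needs at least two rows and two columns, so each group must contain at least two sequences.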