Position Weight Matrix

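## read the file; colClasses keeps the sequences as character strings
a = read.table("pwm.txt", colClasses = "character")
a

Next, we convert the strings into vectors of single letters. A toy example first:

astring = "KALIMERA"
a.tmp = strsplit(x = astring, split = "") ## split on the empty string, i.e. between every character
a.tmp
## It's a list!
unlist(a.tmp)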
[1] "K" "A" "L" "I" "M" "E" "R" "A"
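Why a list? strsplit() is vectorised: it takes a character vector and splits each element separately, returning one list element per input string. A small sketch with two illustrative strings (not taken from pwm.txt):

```r
## two illustrative strings, not from the data
x = c("KALIMERA", "ACGT")
parts = strsplit(x, split = "")

length(parts)    ## [1] 2: one list element per input string
parts[[2]]       ## [1] "A" "C" "G" "T"
unlist(parts)    ## all 12 letters in a single character vector
```

unlist() flattens that list, which is why it is needed even when only one string is split.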

Where do we store the new vectors? In a matrix with one row per sequence and one column per position.

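## empty character matrix: one row per sequence, one column per position (9 positions)
m1 = matrix("", nrow=nrow(a), ncol=9)
m1

for(i in 1:nrow(a)){
  v = unlist(strsplit(a[i, 1], split=""))  ## letters of sequence i
  m1[i,] = v
}
m1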
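Because strsplit() already works on a whole vector, the row-by-row loop can also be replaced by stacking the resulting list with do.call(rbind, ...). The sketch below assumes, as the notebook does, that all sequences have the same length; the three 9-mers are made up and stand in for pwm.txt.

```r
## hypothetical 9-mers standing in for the contents of pwm.txt
seqs = c("TAGGTAAGT", "AAGGTGAGC", "CCAGTAGTT")

## loop version, as in the notebook
m1 = matrix("", nrow = length(seqs), ncol = 9)
for (i in seq_along(seqs)) {
  m1[i, ] = strsplit(seqs[i], split = "")[[1]]
}

## one-step version: split everything, then bind the letter vectors as rows
m2 = do.call(rbind, strsplit(seqs, split = ""))

dim(m2)          ## [1] 3 9
all(m1 == m2)    ## [1] TRUE
```

If the sequences differ in length, rbind() recycles the shorter vectors instead of failing, so it is worth checking nchar() first.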

## count how often each nucleotide occurs at each position
countmat = matrix(0, nrow=4, ncol=9)
rownames(countmat) = c("A", "C", "G", "T")
for(i in 1:nrow(m1)){
  for(j in 1:ncol(m1)){
    mychar = m1[i,j]
    ## the letter itself selects the row, thanks to the rownames
    countmat[mychar, j] = countmat[mychar, j] + 1
  }
}
countmat
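The same counts can be obtained without the double loop by tabulating each column. Turning the letters into a factor with all four levels first makes table() report a 0 for a nucleotide that never occurs at a position. A sketch on three made-up 9-mers; the helper count_column() is hypothetical, introduced only for this example:

```r
## hypothetical 9-mers, for illustration only
seqs = c("TAGGTAAGT", "AAGGTGAGC", "CCAGTAGTT")
m1 = do.call(rbind, strsplit(seqs, split = ""))

## hypothetical helper: counts of A/C/G/T in one column, absent letters as 0
count_column = function(col) table(factor(col, levels = c("A", "C", "G", "T")))

## sapply() binds the nine length-4 count vectors into a 4 x 9 matrix
countmat = sapply(seq_len(ncol(m1)), function(j) count_column(m1[, j]))
rownames(countmat) = c("A", "C", "G", "T")
countmat
colSums(countmat)   ## every column sums to the number of sequences (3)
```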

Now let's convert countmat into frequencies by dividing each column by its total, i.e. the number of sequences.

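cs = colSums(countmat)  ## column totals = number of sequences
## freqmat = countmat/cs runs, but it is wrong: R recycles cs down the rows, not across the columns
freqmat = t(t(countmat)/cs)  ## divide column j by cs[j]
freqmat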
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A  0.3  0.6  0.1    0    0  0.6  0.7  0.2  0.1
C  0.2  0.2  0.1    0    0  0.2  0.1  0.1  0.2
G  0.1  0.1  0.7    1    0  0.1  0.1  0.5  0.1
T  0.4  0.1  0.1    0    1  0.1  0.1  0.2  0.6
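Why is countmat/cs wrong? R stores a matrix column by column and recycles the shorter vector along that order, so successive entries down a column are divided by successive elements of cs rather than by that column's total. In the notebook every column totals the number of sequences, so the mistake happens to give the right numbers there. A sketch with a made-up 4 x 2 count matrix whose columns have different totals shows the difference, and that t(t(countmat)/cs), sweep() and prop.table() agree:

```r
## made-up counts with unequal column totals (4 and 8)
countmat = matrix(c(1, 1, 1, 1,
                    2, 2, 2, 2), nrow = 4,
                  dimnames = list(c("A", "C", "G", "T"), NULL))
cs = colSums(countmat)                ## 4 8

wrong  = countmat / cs                ## cs recycled down the rows: 4, 8, 4, 8, ...
right  = t(t(countmat) / cs)          ## column j divided by cs[j]
right2 = sweep(countmat, 2, cs, "/")  ## same, dividing along margin 2 (columns)
right3 = prop.table(countmat, 2)      ## same, column proportions

colSums(wrong)   ## 0.75 1.50: not frequencies
colSums(right)   ## 1 1
```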

Convert to log2 ratios relative to the expected values (a uniform background of 0.25 per nucleotide)

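tmp.mat = freqmat/0.25  ## how many times more often we see a letter than expected by chance
pwm = log2(tmp.mat)
pwm  ## oops: log2(0) is -Inf for letters never observed at a position

tmp.mat = freqmat/0.25 + 1e-6  ## a tiny pseudocount keeps the log finite
pwm = log2(tmp.mat)
pwm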
        [,1]       [,2]      [,3]      [,4]      [,5]       [,6]      [,7]
A  0.2630356  1.2630350 -1.321924 -19.93157 -19.93157  1.2630350  1.485427
C -0.3219263 -0.3219263 -1.321924 -19.93157 -19.93157 -0.3219263 -1.321924
G -1.3219245 -1.3219245  1.485427   2.00000 -19.93157 -1.3219245 -1.321924
T  0.6780728 -1.3219245 -1.321924 -19.93157   2.00000 -1.3219245 -1.321924
        [,8]       [,9]
A -0.3219263 -1.3219245
C -1.3219245 -0.3219263
G  1.0000007 -1.3219245
T -0.3219263  1.2630350
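Written out, each entry of pwm is the log2 ratio of the observed frequency f_{b,j} of nucleotide b at position j to its expected frequency p_b, with the pseudocount ε added after the division as in the code (p_b = 0.25 for every nucleotide, ε = 10^{-6}). Three entries of the output above, worked by hand:

```latex
W_{b,j} = \log_2\!\left(\frac{f_{b,j}}{p_b} + \varepsilon\right),
\qquad p_b = 0.25,\ \varepsilon = 10^{-6}

\begin{aligned}
W_{A,1} &= \log_2(0.3/0.25 + 10^{-6}) = \log_2(1.200001) \approx 0.263\\
W_{G,4} &= \log_2(1/0.25 + 10^{-6}) \approx 2\\
W_{A,4} &= \log_2(0 + 10^{-6}) \approx -19.93
\end{aligned}
```

Positive scores mark letters seen more often than chance, negative scores less often; without ε, a zero frequency would give log2(0) = -Inf.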

Visualize it as a sequence logo with the Bioconductor package seqLogo:

library(seqLogo)
seqLogo(freqmat)
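By default seqLogo() scales each letter stack by the information content of its position (ic.scale = TRUE), which for DNA ranges from 0 bits (all four letters equally likely) to 2 bits (a single letter). A sketch of that quantity, assuming the standard DNA definition IC_j = 2 + Σ_b f_{b,j} log2 f_{b,j} with 0 · log2 0 taken as 0, computed on the freqmat values printed above; the helper info_content() is hypothetical and not part of seqLogo:

```r
## frequency matrix copied from the freqmat output above (one column per position)
freqmat = matrix(c(0.3, 0.2, 0.1, 0.4,
                   0.6, 0.2, 0.1, 0.1,
                   0.1, 0.1, 0.7, 0.1,
                   0.0, 0.0, 1.0, 0.0,
                   0.0, 0.0, 0.0, 1.0,
                   0.6, 0.2, 0.1, 0.1,
                   0.7, 0.1, 0.1, 0.1,
                   0.2, 0.1, 0.5, 0.2,
                   0.1, 0.2, 0.1, 0.6),
                 nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

## hypothetical helper: information content (bits) of one column of frequencies
info_content = function(f) {
  f = f[f > 0]                 ## drop zeros, since 0 * log2(0) is taken as 0
  2 + sum(f * log2(f))         ## 2 bits = log2(4) for a 4-letter alphabet
}

ic = apply(freqmat, 2, info_content)
round(ic, 3)
```

Positions 4 and 5, where a single nucleotide is always observed, reach the full 2 bits and therefore give the tallest stacks in the logo.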
