Position Weight Matrix
Next, we convert the strings into vectors.
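astring = "KALIMERA"
a.tmp = strsplit(astring, split = "")  ## an empty split yields one element per character
## It's a list!
unlist(a.tmp)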
[1] "K" "A" "L" "I" "M" "E" "R" "A"
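`strsplit()` returns a list because it is vectorised over its input: each input string gets its own list element. A minimal sketch of what that list looks like, and how `[[ ]]` or `unlist()` gets at the letters (the vector `words` is made up for illustration):

```r
## hypothetical input: two strings instead of one
words = c("KALIMERA", "ACGT")
parts = strsplit(words, split = "")

class(parts)      ## a list with one element per input string
lengths(parts)    ## number of letters in each string
parts[[2]]        ## letters of the second string only
unlist(parts)     ## everything flattened into a single character vector
```

For a single string, `unlist()` and `[[1]]` give the same character vector.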
Where shall we store the new vectors? In a matrix.
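## empty character matrix: one row per sequence, one column per position
m1 = matrix("", nrow=nrow(a), ncol=9)
m1
for(i in seq_len(nrow(a))){
  ## split the i-th sequence into single letters and store them as row i
  v = unlist(strsplit(a[i,], split=""))
  m1[i,] = v
}
m1
## count matrix: one row per nucleotide, one column per position
countmat = matrix(0, nrow=4, ncol=9)
rownames(countmat) = c("A", "C", "G", "T")
for(i in seq_len(nrow(m1))){
  for(j in seq_len(ncol(m1))){
    mychar = m1[i,j]
    ##print(mychar)
    ## the letter itself selects the row, via the row names
    countmat[mychar, j] = countmat[mychar, j] + 1
  }
}
countmat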
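The two loops above can also be written without explicit indexing. The following is only a sketch of one alternative, not a replacement for the code above; the three sequences in `a` are hypothetical stand-ins for the real input, and `nuc` is a helper name introduced here:

```r
## hypothetical data: three 9-letter sequences in a one-column data frame
a = data.frame(V1 = c("ACGGTAAGT", "TAGGTAAGT", "CAGGTGAGT"),
               stringsAsFactors = FALSE)

## one row per sequence: stack the per-sequence letter vectors
m1 = do.call(rbind, strsplit(a[, 1], split = ""))

## for each nucleotide, count per column how often it occurs;
## sapply() returns positions x nucleotides, so transpose the result
nuc = c("A", "C", "G", "T")
countmat = t(sapply(nuc, function(n) colSums(m1 == n)))
countmat
```

Because the counts are built per nucleotide, letters that never occur at a position still get a row entry of 0, and the rows always come out in the order A, C, G, T.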
Now let's convert countmat into frequencies.
freqmat = t(t(countmat)/colSums(countmat))  ## divide each column by its own total
freqmat
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A  0.3  0.6  0.1    0    0  0.6  0.7  0.2  0.1
C  0.2  0.2  0.1    0    0  0.2  0.1  0.1  0.2
G  0.1  0.1  0.7    1    0  0.1  0.1  0.5  0.1
T  0.4  0.1  0.1    0    1  0.1  0.1  0.2  0.6
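A point worth care when normalising: dividing a matrix by a vector in R recycles the vector down the columns, element by element, so a plain `counts / cs` pairs column totals with the wrong cells unless all totals happen to be equal. A small sketch with made-up counts (the matrix `counts` is hypothetical and has unequal column totals on purpose) shows the difference, along with three base-R ways to divide column-wise:

```r
## hypothetical 2 x 3 count matrix with unequal column totals
counts = matrix(c(1, 3,   2, 2,   5, 15), nrow = 2,
                dimnames = list(c("A", "C"), NULL))
cs = colSums(counts)                 ## 4, 4, 20

wrong = counts / cs                  ## cs is recycled down the columns, not across them
colSums(wrong)                       ## 1, 0.6, 2: not proper frequencies

right1 = t(t(counts) / cs)           ## transpose so that recycling lines up with columns
right2 = sweep(counts, 2, cs, "/")   ## same division, with the margin stated explicitly
right3 = prop.table(counts, 2)       ## column-wise proportions in one call
colSums(right1)                      ## every column sums to 1
```

When every position has the same number of sequences, the plain division gives the right numbers by coincidence, which makes the mistake easy to miss.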
Conversion to log2 ratios relative to the expected values
pwm = log2(freqmat/0.25 + 1e-6)  ## log2 ratio vs. 0.25 expected by chance; 1e-6 avoids log2(0)
pwm
        [,1]       [,2]      [,3]      [,4]      [,5]       [,6]      [,7]
A  0.2630356  1.2630350 -1.321924 -19.93157 -19.93157  1.2630350  1.485427
C -0.3219263 -0.3219263 -1.321924 -19.93157 -19.93157 -0.3219263 -1.321924
G -1.3219245 -1.3219245  1.485427   2.00000 -19.93157 -1.3219245 -1.321924
T  0.6780728 -1.3219245 -1.321924 -19.93157   2.00000 -1.3219245 -1.321924
        [,8]       [,9]
A -0.3219263 -1.3219245
C -1.3219245 -0.3219263
G  1.0000007 -1.3219245
T -0.3219263  1.2630350
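The entries of -19.93157 are what a frequency of exactly 0 becomes when a small pseudocount is added before taking the logarithm; without it, log2(0) is -Inf. A minimal sketch for a single position, assuming a uniform background of 0.25 per nucleotide and a pseudocount of 1e-6, which together reproduce the values shown (the names `f`, `background` and `pseudo` are introduced here for illustration):

```r
## column 4 of the frequency matrix above: only G is ever observed
f = c(A = 0, C = 0, G = 1, T = 0)
background = 0.25    ## assumed: all four nucleotides equally likely by chance
pseudo = 1e-6        ## assumed: small constant that keeps log2 finite

raw = log2(f / background)            ## zeros become -Inf
safe = log2(f / background + pseudo)  ## zeros become about -19.93, G stays at about 2
raw
safe
```

A positive entry means the letter is seen more often than expected by chance at that position, a negative entry means less often.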
Visualize it
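One option for drawing a sequence logo is the Bioconductor package `seqLogo`. The sketch below assumes that package is installed and skips the plot otherwise. `makePWM()` expects a 4-row matrix (A, C, G, T) whose columns each sum to 1, so it takes the frequency matrix rather than the log2 matrix:

```r
## frequency matrix copied from the output above, filled column by column
freqmat = matrix(c(0.3, 0.2, 0.1, 0.4,
                   0.6, 0.2, 0.1, 0.1,
                   0.1, 0.1, 0.7, 0.1,
                   0.0, 0.0, 1.0, 0.0,
                   0.0, 0.0, 0.0, 1.0,
                   0.6, 0.2, 0.1, 0.1,
                   0.7, 0.1, 0.1, 0.1,
                   0.2, 0.1, 0.5, 0.2,
                   0.1, 0.2, 0.1, 0.6),
                 nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

## plot only if the (assumed) seqLogo package is available
if (requireNamespace("seqLogo", quietly = TRUE)) {
  p = seqLogo::makePWM(freqmat)   ## validates the matrix and wraps it in a pwm object
  seqLogo::seqLogo(p)             ## by default, letter heights are scaled by information content
}
```

In the resulting logo, positions 4 and 5 should stand out, since a single letter (G, then T) accounts for every sequence there.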
