Lecturer: Prof. Suhartono - UIN MALIKI MALANG

Author: Lia Wahyuliningtyas
Master's student in Informatics






1. Description

In molecular biology, many situations involve counting events: how many codons use a certain spelling, how many reads of DNA match a reference, how many CG digrams are observed in a DNA sequence. These counts give us discrete variables, as opposed to quantities such as mass and intensity that are measured on continuous scales.

2. The Rate Parameter of the Poisson Distribution

We compute the probability of seeing x = 3 events when the rate parameter of the Poisson distribution, called lambda, is set to 5.

# P(X = 3) for X ~ Poisson(lambda = 5)
dpois(x = 3, lambda = 5)
## [1] 0.1403739
# Probabilities of observing 0, 1, ..., 12 events
dpois(x = 0:12, lambda = 5)
##  [1] 0.006737947 0.033689735 0.084224337 0.140373896 0.175467370 0.175467370
##  [7] 0.146222808 0.104444863 0.065278039 0.036265577 0.018132789 0.008242177
## [13] 0.003434240
# Bar plot of these Poisson(5) probabilities
barplot(dpois(0:12, 5), names.arg = 0:12, col = "red")
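
The value returned by dpois can be checked by hand against the Poisson probability mass function, P(X = x) = exp(-lambda) * lambda^x / x!. The following is a minimal sketch; the variable names lambda, x and manual are chosen here for illustration and are not part of the original notes.

```r
# Check dpois() against the Poisson formula exp(-lambda) * lambda^x / x!
lambda <- 5
x <- 3
manual <- exp(-lambda) * lambda^x / factorial(x)
manual
dpois(x, lambda)
# TRUE when the two values agree up to floating-point rounding
all.equal(manual, dpois(x, lambda))
```

Both calls should reproduce the value 0.1403739 printed above.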

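# Tabulate the 19 observed genotypes
genotype = c("AA","AO","BB","AO","OO","AO","AA","BO","BO",
             "AO","BB","AO","BO","AB","OO","AB","BB","AO","AO")
table(genotype)
## genotype
## AA AB AO BB BO OO 
##  2  2  7  3  3  2
# 15 Bernoulli trials (size = 1) with success probability 0.5
rbinom(15, prob = 0.5, size = 1)
##  [1] 1 0 0 1 0 1 1 1 0 1 0 0 1 1 1
# 12 Bernoulli trials with success probability 2/3
rbinom(12, prob = 2/3, size = 1)
##  [1] 1 1 0 1 1 1 0 1 0 0 0 1
# One binomial draw: the number of successes in 12 such trials
rbinom(1, prob = 2/3, size = 12)
## [1] 10
# Fix the seed so that the next draw is reproducible
set.seed(235569515)
rbinom(1, prob = 0.3, size = 15)
## [1] 5
# Binomial probabilities of 0, 1, ..., 15 successes in 15 trials with p = 0.3
probabilities = dbinom(0:15, prob = 0.3, size = 15)
round(probabilities, 2)
##  [1] 0.00 0.03 0.09 0.17 0.22 0.21 0.15 0.08 0.03 0.01 0.00 0.00 0.00 0.00 0.00
## [16] 0.00
barplot(probabilities, names.arg = 0:15, col = "red")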
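
The binomial probabilities above follow the formula P(X = k) = choose(n, k) * p^k * (1 - p)^(n - k), and a binomial count can be viewed as the sum of n independent Bernoulli trials. The sketch below illustrates both points; the names n, p, k, manual and bernoulli_draws, as well as the seed, are illustrative assumptions rather than part of the original notes.

```r
# Binomial probabilities computed directly from the formula
n <- 15
p <- 0.3
k <- 0:n
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(manual, dbinom(k, size = n, prob = p))
sum(manual)  # the probabilities over all possible k add up to 1

# A binomial count is the sum of n independent Bernoulli (size = 1) trials
set.seed(1)  # arbitrary seed, used only for reproducibility
bernoulli_draws <- rbinom(n, size = 1, prob = p)
sum(bernoulli_draws)
```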

3. Bernoulli trials

Simulate a mutation process along 10,000 positions with a per-position mutation rate of 5e-4 (the prob value in the code below) and count the number of mutations. Repeat this many times and plot the distribution of the counts with the barplot function.

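# One simulated count of mutations among 10,000 positions
rbinom(1, prob = 5e-4, size = 10000)
## [1] 6
# Repeat the simulation 300,000 times and plot how often each count occurs
simulations = rbinom(n = 300000, prob = 5e-4, size = 10000)
barplot(table(simulations), col = "lavender")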
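
With 10,000 positions and a mutation rate of 5e-4, the expected number of mutations is 10000 * 5e-4 = 5, so the simulated counts should be close to a Poisson distribution with lambda = 5. The following is a rough sketch of that comparison; the variable names, the seed, and the smaller number of repetitions are assumptions made here to keep the example short.

```r
set.seed(1)        # arbitrary seed for reproducibility
n_pos <- 10000     # number of positions
p_mut <- 5e-4      # per-position mutation rate
n_rep <- 100000    # fewer repetitions than above, to keep the run short

sims <- rbinom(n = n_rep, size = n_pos, prob = p_mut)

# Relative frequency of each count from 0 to 12 in the simulation
counts <- 0:12
empirical <- as.numeric(table(factor(sims, levels = counts))) / n_rep

# Poisson probabilities with lambda = n * p
poisson <- dpois(counts, lambda = n_pos * p_mut)

comparison <- round(cbind(count = counts, empirical, poisson), 4)
comparison
max(abs(empirical - poisson))  # largest discrepancy between the two columns
```

The two columns should differ only by simulation noise, which shrinks as n_rep grows.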

A generative model for epitope detection

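# Load the epitope data; this relative path depends on the author's local folders
load("../Documents/data/data/e100.RData")
# Plot the value observed at each position
barplot(e100, ylim = c(0, 7), width = 0.7, xlim = c(-0.5, 100.5),
  names.arg = seq_along(e100), col = "darkolivegreen")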

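# P(X >= 7) for X ~ Poisson(0.5), computed as 1 - P(X <= 6)
1 - ppois(6, 0.5)
## [1] 1.00238e-06
# The same upper-tail probability, requested directly
ppois(6, 0.5, lower.tail = FALSE)
## [1] 1.00238e-06
# Simulate the maximum of 100 Poisson(0.5) counts, 100,000 times
maxes = replicate(100000, {
  max(rpois(100, 0.5))
})
table(maxes)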
## maxes
##     1     2     3     4     5     6     7     9 
##     7 23027 60840 14365  1604   141    15     1
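
The simulated maxima can be compared with an exact calculation. If the 100 counts are independent Poisson(0.5) variables, the maximum is at most 6 exactly when all 100 counts are at most 6, so P(max >= 7) = 1 - P(X <= 6)^100. The sketch below relies on that independence assumption; the names p_single, p_max_exact and maxes_sim, the seed, and the reduced number of replicates are illustrative.

```r
# Probability of a count of 7 or more at a single position
p_single <- ppois(6, 0.5, lower.tail = FALSE)

# Probability that the maximum over 100 independent positions is 7 or more
p_max_exact <- 1 - ppois(6, 0.5)^100
p_max_exact

# Monte Carlo estimate of the same probability (noisy, since the event is rare)
set.seed(1)  # arbitrary seed for reproducibility
maxes_sim <- replicate(20000, max(rpois(100, 0.5)))
mean(maxes_sim >= 7)
```

In the table above, 16 of the 100,000 simulated maxima were 7 or larger, a frequency of 1.6e-4, which is of the same order of magnitude as the exact value of roughly 1e-4.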
# One multinomial draw: 8 items allocated to 4 equally likely categories
pvec = rep(1/4, 4)
t(rmultinom(1, prob = pvec, size = 8))
##      [,1] [,2] [,3] [,4]
## [1,]    2    2    3    1
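
The probability of one particular multinomial outcome, such as the counts (2, 2, 3, 1) drawn above, is n! / (x1! x2! x3! x4!) * p1^x1 * p2^x2 * p3^x3 * p4^x4. A minimal sketch of this calculation, checked against the base R function dmultinom, follows; the names obs and manual are introduced here only for illustration.

```r
pvec <- rep(1/4, 4)
obs <- c(2, 2, 3, 1)   # the counts shown in the output above

# Multinomial coefficient times the product of the category probabilities
manual <- factorial(sum(obs)) / prod(factorial(obs)) * prod(pvec^obs)
manual                       # 1680 / 4^8, about 0.0256
dmultinom(obs, prob = pvec)  # same value from the built-in function
```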
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biostrings", "BSgenome.Celegans.UCSC.ce2"))

4. Summary of this chapter

We have used mathematical formulæ and R to compute probabilities of various discrete events that can be modeled with a few basic distributions. The Bernoulli distribution was our most basic building block: it represents a single binary trial such as a coin flip. We can code the outcomes as 0 and 1, and we call p the probability of success (the 1 outcome).

The binomial distribution is used for the number of 1s in n binary trials, and we compute the probability of seeing k successes using the R function dbinom. We also saw that we could simulate an n-trial binomial experiment using the function rbinom.

The Poisson distribution is most appropriate for cases when p is small (the 1s are rare). It has only one parameter λ, and the Poisson distribution for λ = n p is approximately the same as the binomial distribution for (n,p) if p is small. We used the Poisson distribution to model the number of randomly occurring false positives in an assay that tested for epitopes along a sequence, presuming that the per-position false positive rate p was small. We saw how such a parametric model enabled us to compute the probabilities of extreme events, as long as we knew all the parameters.
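
The quality of the Poisson approximation can be illustrated by holding lambda = n p fixed and letting p shrink. The sketch below compares the exact binomial and Poisson probabilities for a few values of n; the choice lambda = 5, the values of n, and all variable names are assumptions made for this illustration.

```r
lambda <- 5
k <- 0:20
n_values <- c(10, 100, 10000)

# Largest absolute difference between binomial(n, lambda / n) and Poisson(lambda)
max_diff <- sapply(n_values, function(n) {
  p <- lambda / n
  max(abs(dbinom(k, size = n, prob = p) - dpois(k, lambda)))
})

data.frame(n = n_values, p = lambda / n_values, max_diff = signif(max_diff, 3))
```

The discrepancy shrinks as n grows and p becomes small, which is the regime in which the two distributions are nearly interchangeable.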

The multinomial distribution is used for discrete events that have more than two possible outcomes or levels. The power example showed us how to use Monte Carlo simulations to decide how much data we need to collect if we want to test whether a multinomial model with equal probabilities is consistent with the data. We used probability distributions and probabilistic models to evaluate hypotheses about how our data were generated, by making assumptions about the generative models. We term the probability of seeing data at least as extreme as those observed, given a hypothesis, a p-value. This is not the same as the probability that the hypothesis is true!

