SweeD, OmegaPlus and RAiSD detect complete, strong and recent selective sweeps. The selective sweep should be complete, i.e., the beneficial mutation should be fixed in the population. Furthermore, the selective sweep should be recent, i.e., the beneficial mutation should not be fixed for too long. How old can it be? Based on simulated data, we can still detect selective sweeps that are 0.05 N generations old. This means that if N = 10,000 (as it is typically in humans), we are able to detect selective sweeps that are 100 generations old, i.e. sweeps that got complete 2,500 years ago. Third, sweeps should be strong (in relation to recombination). This is needed because to detect selective sweeps we utilize the neutral genomic regions that are around the beneficial mutation. The stronger the sweep is in relation to recombination, the more signal' we have to detect the selective sweep.
SweeD detects selective sweeps based on the Site Frequency Spectrum (SFS).
SweeD accepts VCF, ms-like, FASTA etc formats. Here, we will use the VCF format, which is used for the 1000Genomes (human) data. A typical VCF file looks like:
head -10 example.VCF
## ##fileformat=VCFv4.1
## ##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
## ##OTHER COMMENTS HERE.....
## #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA19197 NA19214 NA18861 NA18912 NA19129 NA18910 NA19117 NA18865 NA18501 NA19099 NA18924 NA19210 NA19130 NA19098 NA18502 NA19143 NA19114 NA19093 NA18507 NA19238 NA19147 NA18519 NA19153 NA18876 NA19248 NA19092 NA18522 NA18917 NA19095 NA18871 NA19257 NA19207 NA19160 NA19235 NA18934 NA19225 NA19096 NA19149 NA19185 NA18510 NA18907 NA19131 NA18915 NA18867 NA18499 NA18523 NA18504 NA19200 NA19190 NA18879 NA18873 NA18853 NA19107 NA19144 NA19198 NA19138 NA18874 NA19118 NA19213 NA19159 NA19206 HG00268 HG00384 HG00182 HG00280 HG00356 HG00323 HG00345 HG00349 HG00329 HG00290 HG00362 HG00381 HG00346 HG00351 HG00332 HG00315 HG00306 HG00358 HG00382 HG00181 HG00320 HG00319 HG00378 HG00188 HG00373 HG00171 HG00288 HG00285 HG00176 HG00342 HG00178 HG00328 HG00267 HG00179 HG00369 HG00376 HG00371 HG00273 HG00173 HG00344 HG00284 HG00379 HG00282 HG00366 HG00324 HG00330 HG00355 HG00341 HG00276 HG00183 HG00269 HG00361 HG00274 HG00353 HG00339 HG00336 HG00177 HG00334 HG00309 HG00364 HG00372
## 21 9411302 . G T 100 PASS AA=.|||;AC=2;AF=2.522e-03;AFR_AF=0.0098;AMR_AF=0;AN=244;DP=30980;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0 GT 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21 9411410 rs78200054 C T 100 PASS AA=.|||;AC=115;AF=0.470;AFR_AF=0.4554;AMR_AF=0.4841;AN=244;DP=36256;EAS_AF=0.4534;EUR_AF=0.4642;NS=2504;SAS_AF=0.4867 GT 0|1 1|0 0|1 0|1 1|0 0|1 1|0 0|1 1|0 0|1 1|0 0|0 0|1 1|0 0|1 0|1 0|1 0|1 0|1 1|0 1|0 1|0 0|1 0|1 0|1 0|1 1|0 1|0 1|0 1|0 0|1 1|0 1|0 0|1 1|0 0|1 0|1 0|1 1|0 1|0 1|0 1|0 1|0 1|0 1|0 1|0 0|1 0|0 0|1 1|0 0|1 1|0 0|1 1|0 1|0 1|0 1|0 0|0 1|1 0|1 0|1 0|1 0|0 1|0 0|1 0|1 1|0 1|0 0|1 1|0 0|1 0|1 0|1 1|0 0|0 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|1 1|0 0|1 1|0 1|0 0|1 0|1 0|1 0|1 1|0 1|0 1|0 0|1 0|1 0|1 0|1 0|1 1|0 1|0 0|1 1|0 0|0 0|0 1|0 1|0 0|1 0|0 0|0 0|1 0|1 1|0 1|1 0|1 0|1 1|0 0|1 1|0 0|1 0|1 1|0
## 21 9411446 . C T 100 PASS AA=.|||;AC=2;AF=7.566e-03;AFR_AF=0.0272;AMR_AF=0;AN=244;DP=36220;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21 9411497 . A G 100 PASS AA=.|||;AC=6;AF=7.251e-03;AFR_AF=0.0265;AMR_AF=0.0014;AN=244;DP=35268;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0 GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21 9411500 rs71235073 G T 100 PASS AA=.|||;AC=114;AF=0.461;AFR_AF=0.4433;AMR_AF=0.4813;AN=244;DP=34681;EAS_AF=0.4494;EUR_AF=0.4503;NS=2504;SAS_AF=0.4836 GT 1|0 0|1 1|0 0|1 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|0 1|0 1|1 0|0 0|0 1|0 0|1 0|1 1|0 0|1 0|1 1|0 1|0 1|0 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|1 1|0 1|0 0|1 1|0 0|0 1|0 1|0 1|0 0|1 0|1 1|0 1|0 1|0 0|1 1|0 0|1 0|1 0|0 0|1 0|1 0|1 1|0 1|0 0|1 0|0 1|1 1|0 0|1 0|1 0|0 0|1 0|1 0|1 1|0 0|1 0|1 0|1 0|1 0|1 0|1 1|0 0|0 0|1 0|1 1|0 1|0 1|0 0|1 1|1 0|1 1|0 1|0 1|0 1|0 1|0 0|1 0|1 0|0 1|0 1|0 1|0 0|1 0|0 0|1 0|1 0|1 0|0 1|0 0|1 0|1 1|0 1|0 1|0 1|0 0|1 0|0 0|0 0|1 1|0 1|0 1|1 0|1 1|0 1|0 0|1 1|1 0|1 0|1 1|0
## 21 9411602 rs368646645 T C 100 PASS AA=.|||;AC=89;AF=0.363;AFR_AF=0.3729;AMR_AF=0.3026;AN=244;DP=40367;EAS_AF=0.3641;EUR_AF=0.3588;NS=2504;SAS_AF=0.408 GT 1|0 1|0 0|1 0|1 1|0 0|0 1|0 0|0 0|1 0|1 1|1 1|0 0|0 0|1 0|0 0|1 0|0 0|1 0|1 0|0 0|0 0|1 1|1 0|0 1|0 1|0 0|1 0|0 1|0 1|0 0|1 1|0 0|0 1|0 0|1 1|0 0|1 0|1 0|0 0|1 0|0 0|1 1|0 0|0 0|0 1|0 0|0 0|1 0|1 0|1 0|1 0|1 1|0 0|1 0|0 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 1|0 0|0 1|0 0|1 0|1 0|1 0|0 1|0 0|0 0|0 0|1 1|0 0|0 0|1 0|0 1|0 1|0 1|0 0|0 0|1 0|1 1|0 0|0 1|0 1|0 0|0 1|0 1|0 0|1 1|0 0|1 1|0 1|0 0|0 0|0 1|0 0|1 0|1 1|0 0|1 1|0 0|1 1|0 0|1 1|0 0|1 0|1 1|0 0|0 0|0 0|1 1|0 0|0 0|1 1|0 0|0 0|0 1|1 0|1
The first line specifies that the fileformat is VCF. This line is obligatory. SweeD understand that the format is VCF using the keyword fileformat=VCF Next, there are lines that start with ##'. In VCF such lines contain comments and are neglected by the file parser. Next, there is a line starting with a single#' and the word CHROM. This line represents the header line of the VCF. It describes the fields for the following lines as well as the sample names. For example, in the file given below, the sample “NA19197” is the name of the first sample.
For each individual is provided. Here, the genotype “0|0” means that both of the chromosomes of a sample contain the reference allele. The genotype “0|1” means that one chromosome (the first) contain the reference allele, whereas the second chromosome contains the alternative allele.
The input file consists of samples from the YRI (Yoruban) and the FIN (Finnish) population. Here, we will use the flag “-sampleNames” to run SweeD only on a part of the input data, i.e., only on the Yoruban population.
This is needed only if we want to run SweeD only in a sub-part of the input file, i.e., in a subset of the samples. We just need a 1-column file containing the names of the sample IDs.
cat sampleList_yri.txt
## NA19197
## NA19214
## NA18861
## NA18912
## NA19129
## NA18910
## NA19117
## NA18865
## NA18501
## NA19099
## NA18924
## NA19210
## NA19130
## NA19098
## NA18502
## NA19143
## NA19114
## NA19093
## NA18507
## NA19238
## NA19147
## NA18519
## NA19153
## NA18876
## NA19248
## NA19092
## NA18522
## NA18917
## NA19095
## NA18871
## NA19257
## NA19207
## NA19160
## NA19235
## NA18934
## NA19225
## NA19096
## NA19149
## NA19185
## NA18510
## NA18907
## NA19131
## NA18915
## NA18867
## NA18499
## NA18523
## NA18504
## NA19200
## NA19190
## NA18879
## NA18873
## NA18853
## NA19107
## NA19144
## NA19198
## NA19138
## NA18874
## NA19118
## NA19213
## NA19159
## NA19206
To run SweeD we use:
SweeD -input chr21_61YRI_61FIN.vcf -grid 100 -name yri -sampleList sampleList_yri.txt
The main output file of SweeD is the SweeD_Report.
cat SweeD_Report.yri
##
## //1
## Position Likelihood Alpha
## 9411302.0000 2.942610e-01 6.794537e+00
## 9802295.9062 7.545230e+00 3.119609e-05
## 10193289.8125 7.801314e-01 7.912694e-04
## 10584283.7188 2.584731e+00 1.798863e-04
## 10975277.6250 1.870629e-01 4.007882e-03
## 11366271.5312 5.166759e+01 3.808030e-06
## 11757265.4375 1.421117e+02 1.334205e-06
## 12148259.3438 1.928312e+02 8.268725e-07
## 12539253.2500 1.975667e+02 6.227276e-07
## 12930247.1562 1.203569e+02 5.888149e-07
## 13321241.0625 4.022116e+01 8.609349e-07
## 13712234.9688 4.471091e+01 1.717378e-06
## 14103228.8750 6.264191e+01 2.799948e-06
## 14494222.7812 2.387694e+00 1.684648e-04
## 14885216.6875 1.239055e-01 1.039490e-02
## 15276210.5938 1.248893e+00 3.943076e-03
## 15667204.5000 1.413087e-01 6.488796e-02
## 16058198.4062 5.140785e-01 6.070352e-03
## 16449192.3125 2.422404e-01 8.147364e-03
## 16840186.2188 8.358145e-01 6.124865e-03
## 17231180.1250 2.013594e+00 1.316748e-03
## 17622174.0312 3.333415e-01 6.826687e-02
## 18013167.9375 3.414266e-01 1.020584e-02
## 18404161.8438 6.049239e-01 1.311506e-03
## 18795155.7500 3.177763e-01 1.175475e-02
## 19186149.6562 9.427886e-01 3.010246e-03
## 19577143.5625 2.009574e-01 3.069691e-02
## 19968137.4688 9.766829e-02 9.965705e-03
## 20359131.3750 0.000000e+00 2.448980e-01
## 20750125.2812 3.523593e+00 3.316956e-04
## 21141119.1875 5.224145e-02 4.318501e-02
## 21532113.0938 8.115595e-03 3.176828e-02
## 21923107.0000 2.870023e-01 1.691630e-03
## 22314100.9062 4.984588e-01 2.028945e-03
## 22705094.8125 8.813348e-02 1.504414e-02
## 23096088.7188 2.103364e-01 1.493048e-02
## 23487082.6250 8.629216e-03 2.827470e-03
## 23878076.5312 2.097173e+00 1.275489e-03
## 24269070.4375 3.552862e-03 3.464503e-03
## 24660064.3438 1.789904e+00 1.372286e-03
## 25051058.2500 3.334141e-01 3.086371e-03
## 25442052.1562 2.164880e-01 2.388698e-01
## 25833046.0625 0.000000e+00 1.445783e-01
## 26224039.9688 5.045911e-01 2.771124e-02
## 26615033.8750 2.498790e-02 6.285291e-02
## 27006027.7812 2.892342e+00 2.865442e-03
## 27397021.6875 9.571987e-03 8.590662e-02
## 27788015.5938 9.133939e-04 5.009871e-01
## 28179009.5000 2.489853e-01 1.018124e-02
## 28570003.4062 1.220107e+00 1.063710e-02
## 28960997.3125 1.363651e+00 7.670632e-04
## 29351991.2188 3.144017e-02 1.493048e-02
## 29742985.1250 7.107725e-01 3.568572e-03
## 30133979.0312 5.034027e-01 7.195754e-03
## 30524972.9375 3.532104e-02 8.299538e-02
## 30915966.8438 8.879825e-01 9.096274e-02
## 31306960.7500 1.183413e-02 5.455854e-02
## 31697954.6562 1.558678e-01 4.235741e-02
## 32088948.5625 2.011614e-01 2.154686e-02
## 32479942.4688 1.715048e-01 1.566921e-03
## 32870936.3750 1.006537e-01 8.856547e-02
## 33261930.2812 7.899415e-02 1.092033e-02
## 33652924.1875 9.874991e-04 4.714188e-02
## 34043918.0938 9.264586e-02 6.469189e-03
## 34434912.0000 4.555837e-03 3.257143e-02
## 34825905.9062 2.242278e+00 5.421803e-03
## 35216899.8125 1.026312e+00 2.933082e-03
## 35607893.7188 3.563805e-01 1.091604e-03
## 35998887.6250 2.793871e-01 2.328092e-03
## 36389881.5312 2.595883e-04 7.773937e-02
## 36780875.4375 1.933757e+00 5.938954e-03
## 37171869.3438 1.113052e+00 2.124799e-03
## 37562863.2500 3.853937e-02 3.981162e-01
## 37953857.1562 0.000000e+00 1.875000e-01
## 38344851.0625 4.982148e-02 1.361864e-02
## 38735844.9688 1.105129e+00 5.129712e-04
## 39126838.8750 2.064621e-01 5.617320e-03
## 39517832.7812 5.402033e+00 1.834172e-04
## 39908826.6875 8.818938e-02 2.371934e-02
## 40299820.5938 0.000000e+00 1.224490e-01
## 40690814.5000 1.876212e+00 4.876569e-03
## 41081808.4062 2.097833e-02 2.376147e-02
## 41472802.3125 0.000000e+00 1.043478e-01
## 41863796.2188 2.299419e+00 8.724487e-04
## 42254790.1250 1.176823e-01 7.016558e-02
## 42645784.0312 0.000000e+00 2.553193e-01
## 43036777.9375 4.089042e-01 3.897858e-03
## 43427771.8438 1.458386e+00 1.779111e-03
## 43818765.7500 9.172638e-02 6.243654e-02
## 44209759.6562 6.512178e-01 1.077650e-02
## 44600753.5625 1.058757e+00 2.483738e-03
## 44991747.4688 3.924195e-01 1.557676e-02
## 45382741.3750 1.596262e-01 1.174020e-02
## 45773735.2812 3.190855e-01 3.260463e-03
## 46164729.1875 2.294537e-01 4.388228e-03
## 46555723.0938 2.679241e-01 1.581842e-02
## 46946717.0000 1.352751e+00 4.013845e-03
## 47337710.9062 1.485471e+00 1.025182e-03
## 47728704.8125 2.194863e-02 8.116321e-02
## 48119700.0000 8.175306e-01 4.187638e-02
This file will contain as many lines as specified by the -grid (+ 2 lines: the initial and the last SNP). The first column represents the position of the evaluation. Remember that we divided the region between the first and the last SNP in 100 pieces, thus the points of each location may be floating point numbers. This might seem meaningless, but actually it's not so important. The important thing is to remember that these positions do not necessarily correspond to SNPs. They correspond to positions at which we evaluate the score of the SweeD (i.e. we assess if a selective sweep has occurred). The second column is the score of the SweeD, formally a Composite Likelihood Ratio value (CLR) that compares the likelihood of a sweep model to the likelihood of a neutral model. The higher the value in the second column, the more evidence we have for a selective sweep. The third column is inversely proportional to the strength of selection. The smaller the value, the stronger the selection.
A typical question that arises from such analyses is “How high a value should be to assume that there is positive selection”. There are two approaches to define a threshold:
In my opinion, the first approach is more correct given that we can specify accurately the neutral demographic model. This is not an easy task, however. If we can specify the neutral demographic model, then we follow the next steps 1. Use Hudson's ms to perform neutral about 1,000 simulations with the neutral demographic model. 2. For each of the simulated datasets, run SweeD with exactly the same way that you ran it for the real datasets. 3. Get the maximum value for each simulated dataset. Thus, eventually you will have 1,000 maximum values. 4. The threshold is the 95th percentile of the maxima.
This approach is easier. However, often it might be wrong because when we apply it we make a crucial assumption: “Some percentage of the dataset is evolved under positive selection”.
OmegaPlus is based on the Linkage Disequilibrium signature of a complete selective sweep (this is different than the LD signature of an incomplete selective sweep).
Ideally, you need phased data, i.e., to have a “|” in the VCF. However, based on our experience, OmegaPlus runs adequately well with unphased data as well (those that have a “/” instead of an “|”).
The command line of OmegaPlus is the same as the SweeD. However it needs to additional arguments the “-minwin” and the “-maxwin”
OmegaPlus -input chr21_61YRI_61FIN.vcf -grid 100 -name yri -sampleList sampleList_yri.txt -minwin 10000-maxwin 100000 -seed 9382
cat OmegaPlus_Report.yri
##
## //1
## 9411302.0000 0.000000
## 9802296.0000 10.603788
## 10193290.0000 3.586661
## 10584284.0000 4.881438
## 10975278.0000 1.549859
## 11366272.0000 0.000000
## 11757266.0000 0.000000
## 12148260.0000 0.000000
## 12539254.0000 0.000000
## 12930248.0000 0.000000
## 13321242.0000 0.000000
## 13712236.0000 0.000000
## 14103230.0000 0.000000
## 14494224.0000 3.064002
## 14885218.0000 2.124483
## 15276212.0000 3.281871
## 15667206.0000 5.553235
## 16058200.0000 5.766411
## 16449194.0000 3.435114
## 16840188.0000 3.789333
## 17231182.0000 1.348858
## 17622176.0000 3.215232
## 18013170.0000 2.453793
## 18404164.0000 1.926232
## 18795158.0000 2.120644
## 19186152.0000 3.864396
## 19577146.0000 3.353946
## 19968140.0000 1.609868
## 20359134.0000 4.033772
## 20750128.0000 1.951695
## 21141122.0000 4.111059
## 21532116.0000 1.828176
## 21923110.0000 6.343532
## 22314104.0000 1.789305
## 22705098.0000 1.549660
## 23096092.0000 2.645991
## 23487086.0000 1.921136
## 23878080.0000 2.600808
## 24269074.0000 2.544547
## 24660068.0000 3.329068
## 25051062.0000 4.944273
## 25442056.0000 3.101130
## 25833050.0000 2.057784
## 26224044.0000 1.841587
## 26615038.0000 2.397617
## 27006032.0000 2.869011
## 27397026.0000 2.165385
## 27788020.0000 2.636054
## 28179014.0000 2.678717
## 28570008.0000 4.418076
## 28961002.0000 2.438923
## 29351996.0000 1.612001
## 29742990.0000 1.881970
## 30133984.0000 1.926288
## 30524978.0000 1.949290
## 30915972.0000 3.777412
## 31306966.0000 2.400773
## 31697960.0000 1.990340
## 32088954.0000 3.092060
## 32479948.0000 3.126595
## 32870942.0000 1.672112
## 33261936.0000 2.595280
## 33652928.0000 2.740213
## 34043920.0000 2.026564
## 34434912.0000 2.536452
## 34825904.0000 1.204156
## 35216896.0000 3.717905
## 35607888.0000 2.778045
## 35998880.0000 5.192340
## 36389872.0000 2.966054
## 36780864.0000 1.861009
## 37171856.0000 1.594015
## 37562848.0000 2.075480
## 37953840.0000 2.013700
## 38344832.0000 2.293964
## 38735824.0000 1.976199
## 39126816.0000 3.352223
## 39517808.0000 1.334976
## 39908800.0000 2.771131
## 40299792.0000 2.860386
## 40690784.0000 1.837032
## 41081776.0000 4.724890
## 41472768.0000 1.898254
## 41863760.0000 1.945970
## 42254752.0000 3.385714
## 42645744.0000 2.337899
## 43036736.0000 2.211709
## 43427728.0000 2.376973
## 43818720.0000 6.017141
## 44209712.0000 3.707568
## 44600704.0000 2.106304
## 44991696.0000 1.154177
## 45382688.0000 2.012803
## 45773680.0000 2.343243
## 46164672.0000 3.215011
## 46555664.0000 1.883302
## 46946656.0000 2.503688
## 47337648.0000 1.927489
## 47728640.0000 2.671232
## 48119632.0000 0.000000
Again, the first column is the position that the score is evaluated, whereas the second column is the score of the OmegaPlus.
The following R script plots the results from the SweeD, OmegaPlus analysis and also it creates a plot that combines both results. With red dots we represent the points that are outliers for both of the tests (at 90th percentile).
getMultiRun <- function(fileName, header=TRUE){
allData <- list()
con <- file(fileName, "r")
line = readLines(con)
close(con)
flag <- FALSE
myHeader <- c("Position", "Likelihood", "Alpha")
for(ln in line){
tokenized <- strsplit(ln, split="\t")[[1]]
if( ln == "" | length(tokenized) < 1 ){
next
}
if( grepl("Position", x=tokenized)[1] == TRUE){
myHeader <- tokenized
next
}
if( grepl("//", x=tokenized)[1] == TRUE ){
newData <- data.frame()
if(header == FALSE)
{
newData <- data.frame(Position=0, Score=0)
colnames(newData) <- c("Position", "Score")
}
else{
newData <- data.frame(Position=0, Likelihood=0, Alpha=1)
colnames(newData) <- myHeader
}
allData <- append(allData, list(newData))
flag <-TRUE
}
else if(flag == TRUE){
lastObject <- length(allData)
allData[[lastObject]] <- rbind(allData[[lastObject]], as.numeric(tokenized))
}
}
return(allData)
}
allDataOP <- getMultiRun("OmegaPlus_Report.yri", header=FALSE)
allDataSW <- getMultiRun("SweeD_Report.yri", header=TRUE)
## find the best Region
maxValsOP <- unlist(lapply(allDataOP, function(x){ max(x[,2])}))
maxValsSW <- unlist(lapply(allDataSW, function(x){ max(x[,2])}))
ord1 <- order(maxValsOP, decreasing=TRUE)
ord2 <- order(maxValsSW, decreasing = TRUE)
el <- NULL
for( i in 1:length(ord1)){
if(length(intersect(ord1[1:i], ord2[1:i])) > 0){
el <- intersect(ord1[1:i], ord2[1:i])
break
}
}
sw <- allDataSW[[el]][,2]
om <- allDataOP[[el]][,2]
pos <- allDataSW[[el]][,1]
#modify this to change significance cutoff
thr <- 0.1
assign.pvalues <- function(array){
#array <- sample(sw, 1000)
pvalues <- array(0, length(array))
ordered.indexes <- order(array)
j <- length(array)
for( i in ordered.indexes ){
pvalues[i] <- j/length(array)
j <- j-1
}
return(pvalues)
}
om.pval <- assign.pvalues(om)
sw.pval <- assign.pvalues(sw)
outliers <- which(om.pval < thr & sw.pval < thr)
## UNCOMMENT TO PLOT TO A FILE
##pdf("combined_plot_sweed_omegaplus.pdf", width=6.5, height=5)
layout(matrix(c(1,2,3,3), 2, 2, byrow=F), widths=c(3.2,3), heights=c(2,2))
par(mar=c(4,4,1,0))
plot(pos, sw, col="darkgray", pch=16, ylab="", xlab="")
mtext(side=1, text="Position", 2)
mtext(side=2, text="SweeD", 2)
points(pos[outliers], sw[outliers], col="red", pch=16, cex=1.2)
points(pos[outliers], sw[outliers], col="black", pch=1, cex=1.2)
mtext(side=3, text="A", adj=0.5, 0.)
plot(pos, om, col="darkgray", pch=16, ylab="", xlab="")
mtext(side=1, text="Position", 2)
mtext(side=2, text="OmegaPlus", 2)
points(pos[outliers], om[outliers], col="red", pch=16, cex=1.2)
points(pos[outliers], om[outliers], col="black", pch=1, cex=1.2)
plot(sw, om, main="", xlab="", ylab="",axes=F)
mtext(side=1, text="SweeD", 2)
mtext(side=2, text="OmegaPlus", 2)
points(sw[outliers], om[outliers], col="red", pch=16)
axis(side=1)
axis(side=2)
mtext(side=3, text="B", adj=0.5, 0.)
## UNCOMMENT TO PLOT TO A FILE
## dev.off()