Detecting positive selection using SweeD, OmegaPlus, RAiSD

Introduction

SweeD, OmegaPlus and RAiSD detect complete, strong and recent selective sweeps. The selective sweep should be complete, i.e., the beneficial mutation should be fixed in the population. Furthermore, the selective sweep should be recent, i.e., the beneficial mutation should not be fixed for too long. How old can it be? Based on simulated data, we can still detect selective sweeps that are 0.05 N generations old. This means that if N = 10,000 (as it is typically in humans), we are able to detect selective sweeps that are 100 generations old, i.e. sweeps that got complete 2,500 years ago. Third, sweeps should be strong (in relation to recombination). This is needed because to detect selective sweeps we utilize the neutral genomic regions that are around the beneficial mutation. The stronger the sweep is in relation to recombination, the more signal' we have to detect the selective sweep.

SweeD

SweeD detects selective sweeps based on the Site Frequency Spectrum (SFS).

Input

SweeD accepts VCF, ms-like, FASTA etc formats. Here, we will use the VCF format, which is used for the 1000Genomes (human) data. A typical VCF file looks like:

head -10 example.VCF
## ##fileformat=VCFv4.1
## ##ALT=<ID=CN0,Description="Copy number allele: 0 copies">
## ##OTHER COMMENTS HERE.....
## #CHROM   POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NA19197 NA19214 NA18861 NA18912 NA19129 NA18910 NA19117 NA18865 NA18501 NA19099 NA18924 NA19210 NA19130 NA19098 NA18502 NA19143 NA19114 NA19093 NA18507 NA19238 NA19147 NA18519 NA19153 NA18876 NA19248 NA19092 NA18522 NA18917 NA19095 NA18871 NA19257 NA19207 NA19160 NA19235 NA18934 NA19225 NA19096 NA19149 NA19185 NA18510 NA18907 NA19131 NA18915 NA18867 NA18499 NA18523 NA18504 NA19200 NA19190 NA18879 NA18873 NA18853 NA19107 NA19144 NA19198 NA19138 NA18874 NA19118 NA19213 NA19159 NA19206 HG00268 HG00384 HG00182 HG00280 HG00356 HG00323 HG00345 HG00349 HG00329 HG00290 HG00362 HG00381 HG00346 HG00351 HG00332 HG00315 HG00306 HG00358 HG00382 HG00181 HG00320 HG00319 HG00378 HG00188 HG00373 HG00171 HG00288 HG00285 HG00176 HG00342 HG00178 HG00328 HG00267 HG00179 HG00369 HG00376 HG00371 HG00273 HG00173 HG00344 HG00284 HG00379 HG00282 HG00366 HG00324 HG00330 HG00355 HG00341 HG00276 HG00183 HG00269 HG00361 HG00274 HG00353 HG00339 HG00336 HG00177 HG00334 HG00309 HG00364 HG00372
## 21   9411302 .   G   T   100 PASS    AA=.|||;AC=2;AF=2.522e-03;AFR_AF=0.0098;AMR_AF=0;AN=244;DP=30980;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0 GT  0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21   9411410 rs78200054  C   T   100 PASS    AA=.|||;AC=115;AF=0.470;AFR_AF=0.4554;AMR_AF=0.4841;AN=244;DP=36256;EAS_AF=0.4534;EUR_AF=0.4642;NS=2504;SAS_AF=0.4867   GT  0|1 1|0 0|1 0|1 1|0 0|1 1|0 0|1 1|0 0|1 1|0 0|0 0|1 1|0 0|1 0|1 0|1 0|1 0|1 1|0 1|0 1|0 0|1 0|1 0|1 0|1 1|0 1|0 1|0 1|0 0|1 1|0 1|0 0|1 1|0 0|1 0|1 0|1 1|0 1|0 1|0 1|0 1|0 1|0 1|0 1|0 0|1 0|0 0|1 1|0 0|1 1|0 0|1 1|0 1|0 1|0 1|0 0|0 1|1 0|1 0|1 0|1 0|0 1|0 0|1 0|1 1|0 1|0 0|1 1|0 0|1 0|1 0|1 1|0 0|0 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|1 1|0 0|1 1|0 1|0 0|1 0|1 0|1 0|1 1|0 1|0 1|0 0|1 0|1 0|1 0|1 0|1 1|0 1|0 0|1 1|0 0|0 0|0 1|0 1|0 0|1 0|0 0|0 0|1 0|1 1|0 1|1 0|1 0|1 1|0 0|1 1|0 0|1 0|1 1|0
## 21   9411446 .   C   T   100 PASS    AA=.|||;AC=2;AF=7.566e-03;AFR_AF=0.0272;AMR_AF=0;AN=244;DP=36220;EAS_AF=0;EUR_AF=0;NS=2504;SAS_AF=0 GT  0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21   9411497 .   A   G   100 PASS    AA=.|||;AC=6;AF=7.251e-03;AFR_AF=0.0265;AMR_AF=0.0014;AN=244;DP=35268;EAS_AF=0;EUR_AF=0.001;NS=2504;SAS_AF=0    GT  0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0
## 21   9411500 rs71235073  G   T   100 PASS    AA=.|||;AC=114;AF=0.461;AFR_AF=0.4433;AMR_AF=0.4813;AN=244;DP=34681;EAS_AF=0.4494;EUR_AF=0.4503;NS=2504;SAS_AF=0.4836   GT  1|0 0|1 1|0 0|1 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|0 1|0 1|1 0|0 0|0 1|0 0|1 0|1 1|0 0|1 0|1 1|0 1|0 1|0 1|0 0|1 0|1 1|0 0|1 0|1 0|1 0|1 1|0 1|0 0|1 1|0 0|0 1|0 1|0 1|0 0|1 0|1 1|0 1|0 1|0 0|1 1|0 0|1 0|1 0|0 0|1 0|1 0|1 1|0 1|0 0|1 0|0 1|1 1|0 0|1 0|1 0|0 0|1 0|1 0|1 1|0 0|1 0|1 0|1 0|1 0|1 0|1 1|0 0|0 0|1 0|1 1|0 1|0 1|0 0|1 1|1 0|1 1|0 1|0 1|0 1|0 1|0 0|1 0|1 0|0 1|0 1|0 1|0 0|1 0|0 0|1 0|1 0|1 0|0 1|0 0|1 0|1 1|0 1|0 1|0 1|0 0|1 0|0 0|0 0|1 1|0 1|0 1|1 0|1 1|0 1|0 0|1 1|1 0|1 0|1 1|0
## 21   9411602 rs368646645 T   C   100 PASS    AA=.|||;AC=89;AF=0.363;AFR_AF=0.3729;AMR_AF=0.3026;AN=244;DP=40367;EAS_AF=0.3641;EUR_AF=0.3588;NS=2504;SAS_AF=0.408 GT  1|0 1|0 0|1 0|1 1|0 0|0 1|0 0|0 0|1 0|1 1|1 1|0 0|0 0|1 0|0 0|1 0|0 0|1 0|1 0|0 0|0 0|1 1|1 0|0 1|0 1|0 0|1 0|0 1|0 1|0 0|1 1|0 0|0 1|0 0|1 1|0 0|1 0|1 0|0 0|1 0|0 0|1 1|0 0|0 0|0 1|0 0|0 0|1 0|1 0|1 0|1 0|1 1|0 0|1 0|0 1|0 0|1 0|0 0|0 0|1 1|0 0|0 0|0 1|0 0|0 1|0 0|1 0|1 0|1 0|0 1|0 0|0 0|0 0|1 1|0 0|0 0|1 0|0 1|0 1|0 1|0 0|0 0|1 0|1 1|0 0|0 1|0 1|0 0|0 1|0 1|0 0|1 1|0 0|1 1|0 1|0 0|0 0|0 1|0 0|1 0|1 1|0 0|1 1|0 0|1 1|0 0|1 1|0 0|1 0|1 1|0 0|0 0|0 0|1 1|0 0|0 0|1 1|0 0|0 0|0 1|1 0|1

The first line specifies that the fileformat is VCF. This line is obligatory. SweeD understand that the format is VCF using the keyword fileformat=VCF Next, there are lines that start with ##'. In VCF such lines contain comments and are neglected by the file parser. Next, there is a line starting with a single#' and the word CHROM. This line represents the header line of the VCF. It describes the fields for the following lines as well as the sample names. For example, in the file given below, the sample “NA19197” is the name of the first sample.

For each individual is provided. Here, the genotype “0|0” means that both of the chromosomes of a sample contain the reference allele. The genotype “0|1” means that one chromosome (the first) contain the reference allele, whereas the second chromosome contains the alternative allele.

The input file consists of samples from the YRI (Yoruban) and the FIN (Finnish) population. Here, we will use the flag “-sampleNames” to run SweeD only on a part of the input data, i.e., only on the Yoruban population.

Preparing the sample names of the population of interest

This is needed only if we want to run SweeD only in a sub-part of the input file, i.e., in a subset of the samples. We just need a 1-column file containing the names of the sample IDs.

cat sampleList_yri.txt
## NA19197
## NA19214
## NA18861
## NA18912
## NA19129
## NA18910
## NA19117
## NA18865
## NA18501
## NA19099
## NA18924
## NA19210
## NA19130
## NA19098
## NA18502
## NA19143
## NA19114
## NA19093
## NA18507
## NA19238
## NA19147
## NA18519
## NA19153
## NA18876
## NA19248
## NA19092
## NA18522
## NA18917
## NA19095
## NA18871
## NA19257
## NA19207
## NA19160
## NA19235
## NA18934
## NA19225
## NA19096
## NA19149
## NA19185
## NA18510
## NA18907
## NA19131
## NA18915
## NA18867
## NA18499
## NA18523
## NA18504
## NA19200
## NA19190
## NA18879
## NA18873
## NA18853
## NA19107
## NA19144
## NA19198
## NA19138
## NA18874
## NA19118
## NA19213
## NA19159
## NA19206

Running SweeD

To run SweeD we use:

SweeD -input chr21_61YRI_61FIN.vcf -grid 100 -name yri -sampleList sampleList_yri.txt

The output files

The main output file of SweeD is the SweeD_Report., where is the sting provided by the -name flag. For example, here we will have the output stored in the SweeD_Report.yri.

cat SweeD_Report.yri
## 
## //1
## Position Likelihood  Alpha
## 9411302.0000 2.942610e-01    6.794537e+00
## 9802295.9062 7.545230e+00    3.119609e-05
## 10193289.8125    7.801314e-01    7.912694e-04
## 10584283.7188    2.584731e+00    1.798863e-04
## 10975277.6250    1.870629e-01    4.007882e-03
## 11366271.5312    5.166759e+01    3.808030e-06
## 11757265.4375    1.421117e+02    1.334205e-06
## 12148259.3438    1.928312e+02    8.268725e-07
## 12539253.2500    1.975667e+02    6.227276e-07
## 12930247.1562    1.203569e+02    5.888149e-07
## 13321241.0625    4.022116e+01    8.609349e-07
## 13712234.9688    4.471091e+01    1.717378e-06
## 14103228.8750    6.264191e+01    2.799948e-06
## 14494222.7812    2.387694e+00    1.684648e-04
## 14885216.6875    1.239055e-01    1.039490e-02
## 15276210.5938    1.248893e+00    3.943076e-03
## 15667204.5000    1.413087e-01    6.488796e-02
## 16058198.4062    5.140785e-01    6.070352e-03
## 16449192.3125    2.422404e-01    8.147364e-03
## 16840186.2188    8.358145e-01    6.124865e-03
## 17231180.1250    2.013594e+00    1.316748e-03
## 17622174.0312    3.333415e-01    6.826687e-02
## 18013167.9375    3.414266e-01    1.020584e-02
## 18404161.8438    6.049239e-01    1.311506e-03
## 18795155.7500    3.177763e-01    1.175475e-02
## 19186149.6562    9.427886e-01    3.010246e-03
## 19577143.5625    2.009574e-01    3.069691e-02
## 19968137.4688    9.766829e-02    9.965705e-03
## 20359131.3750    0.000000e+00    2.448980e-01
## 20750125.2812    3.523593e+00    3.316956e-04
## 21141119.1875    5.224145e-02    4.318501e-02
## 21532113.0938    8.115595e-03    3.176828e-02
## 21923107.0000    2.870023e-01    1.691630e-03
## 22314100.9062    4.984588e-01    2.028945e-03
## 22705094.8125    8.813348e-02    1.504414e-02
## 23096088.7188    2.103364e-01    1.493048e-02
## 23487082.6250    8.629216e-03    2.827470e-03
## 23878076.5312    2.097173e+00    1.275489e-03
## 24269070.4375    3.552862e-03    3.464503e-03
## 24660064.3438    1.789904e+00    1.372286e-03
## 25051058.2500    3.334141e-01    3.086371e-03
## 25442052.1562    2.164880e-01    2.388698e-01
## 25833046.0625    0.000000e+00    1.445783e-01
## 26224039.9688    5.045911e-01    2.771124e-02
## 26615033.8750    2.498790e-02    6.285291e-02
## 27006027.7812    2.892342e+00    2.865442e-03
## 27397021.6875    9.571987e-03    8.590662e-02
## 27788015.5938    9.133939e-04    5.009871e-01
## 28179009.5000    2.489853e-01    1.018124e-02
## 28570003.4062    1.220107e+00    1.063710e-02
## 28960997.3125    1.363651e+00    7.670632e-04
## 29351991.2188    3.144017e-02    1.493048e-02
## 29742985.1250    7.107725e-01    3.568572e-03
## 30133979.0312    5.034027e-01    7.195754e-03
## 30524972.9375    3.532104e-02    8.299538e-02
## 30915966.8438    8.879825e-01    9.096274e-02
## 31306960.7500    1.183413e-02    5.455854e-02
## 31697954.6562    1.558678e-01    4.235741e-02
## 32088948.5625    2.011614e-01    2.154686e-02
## 32479942.4688    1.715048e-01    1.566921e-03
## 32870936.3750    1.006537e-01    8.856547e-02
## 33261930.2812    7.899415e-02    1.092033e-02
## 33652924.1875    9.874991e-04    4.714188e-02
## 34043918.0938    9.264586e-02    6.469189e-03
## 34434912.0000    4.555837e-03    3.257143e-02
## 34825905.9062    2.242278e+00    5.421803e-03
## 35216899.8125    1.026312e+00    2.933082e-03
## 35607893.7188    3.563805e-01    1.091604e-03
## 35998887.6250    2.793871e-01    2.328092e-03
## 36389881.5312    2.595883e-04    7.773937e-02
## 36780875.4375    1.933757e+00    5.938954e-03
## 37171869.3438    1.113052e+00    2.124799e-03
## 37562863.2500    3.853937e-02    3.981162e-01
## 37953857.1562    0.000000e+00    1.875000e-01
## 38344851.0625    4.982148e-02    1.361864e-02
## 38735844.9688    1.105129e+00    5.129712e-04
## 39126838.8750    2.064621e-01    5.617320e-03
## 39517832.7812    5.402033e+00    1.834172e-04
## 39908826.6875    8.818938e-02    2.371934e-02
## 40299820.5938    0.000000e+00    1.224490e-01
## 40690814.5000    1.876212e+00    4.876569e-03
## 41081808.4062    2.097833e-02    2.376147e-02
## 41472802.3125    0.000000e+00    1.043478e-01
## 41863796.2188    2.299419e+00    8.724487e-04
## 42254790.1250    1.176823e-01    7.016558e-02
## 42645784.0312    0.000000e+00    2.553193e-01
## 43036777.9375    4.089042e-01    3.897858e-03
## 43427771.8438    1.458386e+00    1.779111e-03
## 43818765.7500    9.172638e-02    6.243654e-02
## 44209759.6562    6.512178e-01    1.077650e-02
## 44600753.5625    1.058757e+00    2.483738e-03
## 44991747.4688    3.924195e-01    1.557676e-02
## 45382741.3750    1.596262e-01    1.174020e-02
## 45773735.2812    3.190855e-01    3.260463e-03
## 46164729.1875    2.294537e-01    4.388228e-03
## 46555723.0938    2.679241e-01    1.581842e-02
## 46946717.0000    1.352751e+00    4.013845e-03
## 47337710.9062    1.485471e+00    1.025182e-03
## 47728704.8125    2.194863e-02    8.116321e-02
## 48119700.0000    8.175306e-01    4.187638e-02

This file will contain as many lines as specified by the -grid (+ 2 lines: the initial and the last SNP). The first column represents the position of the evaluation. Remember that we divided the region between the first and the last SNP in 100 pieces, thus the points of each location may be floating point numbers. This might seem meaningless, but actually it's not so important. The important thing is to remember that these positions do not necessarily correspond to SNPs. They correspond to positions at which we evaluate the score of the SweeD (i.e. we assess if a selective sweep has occurred). The second column is the score of the SweeD, formally a Composite Likelihood Ratio value (CLR) that compares the likelihood of a sweep model to the likelihood of a neutral model. The higher the value in the second column, the more evidence we have for a selective sweep. The third column is inversely proportional to the strength of selection. The smaller the value, the stronger the selection.

Specifying a threshold for SweeD

A typical question that arises from such analyses is “How high a value should be to assume that there is positive selection”. There are two approaches to define a threshold:

  1. Perform neutral simulations
  2. Use the results themselves and highlight the outliers

Perform neutral simulations

In my opinion, the first approach is more correct given that we can specify accurately the neutral demographic model. This is not an easy task, however. If we can specify the neutral demographic model, then we follow the next steps 1. Use Hudson's ms to perform neutral about 1,000 simulations with the neutral demographic model. 2. For each of the simulated datasets, run SweeD with exactly the same way that you ran it for the real datasets. 3. Get the maximum value for each simulated dataset. Thus, eventually you will have 1,000 maximum values. 4. The threshold is the 95th percentile of the maxima.

Use the dataset itself to find the threshold

This approach is easier. However, often it might be wrong because when we apply it we make a crucial assumption: “Some percentage of the dataset is evolved under positive selection”.

  1. In this case simply specify the 95th percentile of the results. This is the threshold. Notice that by definition some points will be over the threshold. Thus, you assume that some part of the dataset is evolved under positive selection

OmegaPlus

OmegaPlus is based on the Linkage Disequilibrium signature of a complete selective sweep (this is different than the LD signature of an incomplete selective sweep).

How to run OmegaPlus

Ideally, you need phased data, i.e., to have a “|” in the VCF. However, based on our experience, OmegaPlus runs adequately well with unphased data as well (those that have a “/” instead of an “|”).

The command line of OmegaPlus is the same as the SweeD. However it needs to additional arguments the “-minwin” and the “-maxwin”

OmegaPlus -input chr21_61YRI_61FIN.vcf -grid 100 -name yri -sampleList sampleList_yri.txt -minwin 10000-maxwin 100000 -seed  9382

Output of OmegaPlus

cat OmegaPlus_Report.yri
## 
## //1
## 9411302.0000 0.000000
## 9802296.0000 10.603788
## 10193290.0000    3.586661
## 10584284.0000    4.881438
## 10975278.0000    1.549859
## 11366272.0000    0.000000
## 11757266.0000    0.000000
## 12148260.0000    0.000000
## 12539254.0000    0.000000
## 12930248.0000    0.000000
## 13321242.0000    0.000000
## 13712236.0000    0.000000
## 14103230.0000    0.000000
## 14494224.0000    3.064002
## 14885218.0000    2.124483
## 15276212.0000    3.281871
## 15667206.0000    5.553235
## 16058200.0000    5.766411
## 16449194.0000    3.435114
## 16840188.0000    3.789333
## 17231182.0000    1.348858
## 17622176.0000    3.215232
## 18013170.0000    2.453793
## 18404164.0000    1.926232
## 18795158.0000    2.120644
## 19186152.0000    3.864396
## 19577146.0000    3.353946
## 19968140.0000    1.609868
## 20359134.0000    4.033772
## 20750128.0000    1.951695
## 21141122.0000    4.111059
## 21532116.0000    1.828176
## 21923110.0000    6.343532
## 22314104.0000    1.789305
## 22705098.0000    1.549660
## 23096092.0000    2.645991
## 23487086.0000    1.921136
## 23878080.0000    2.600808
## 24269074.0000    2.544547
## 24660068.0000    3.329068
## 25051062.0000    4.944273
## 25442056.0000    3.101130
## 25833050.0000    2.057784
## 26224044.0000    1.841587
## 26615038.0000    2.397617
## 27006032.0000    2.869011
## 27397026.0000    2.165385
## 27788020.0000    2.636054
## 28179014.0000    2.678717
## 28570008.0000    4.418076
## 28961002.0000    2.438923
## 29351996.0000    1.612001
## 29742990.0000    1.881970
## 30133984.0000    1.926288
## 30524978.0000    1.949290
## 30915972.0000    3.777412
## 31306966.0000    2.400773
## 31697960.0000    1.990340
## 32088954.0000    3.092060
## 32479948.0000    3.126595
## 32870942.0000    1.672112
## 33261936.0000    2.595280
## 33652928.0000    2.740213
## 34043920.0000    2.026564
## 34434912.0000    2.536452
## 34825904.0000    1.204156
## 35216896.0000    3.717905
## 35607888.0000    2.778045
## 35998880.0000    5.192340
## 36389872.0000    2.966054
## 36780864.0000    1.861009
## 37171856.0000    1.594015
## 37562848.0000    2.075480
## 37953840.0000    2.013700
## 38344832.0000    2.293964
## 38735824.0000    1.976199
## 39126816.0000    3.352223
## 39517808.0000    1.334976
## 39908800.0000    2.771131
## 40299792.0000    2.860386
## 40690784.0000    1.837032
## 41081776.0000    4.724890
## 41472768.0000    1.898254
## 41863760.0000    1.945970
## 42254752.0000    3.385714
## 42645744.0000    2.337899
## 43036736.0000    2.211709
## 43427728.0000    2.376973
## 43818720.0000    6.017141
## 44209712.0000    3.707568
## 44600704.0000    2.106304
## 44991696.0000    1.154177
## 45382688.0000    2.012803
## 45773680.0000    2.343243
## 46164672.0000    3.215011
## 46555664.0000    1.883302
## 46946656.0000    2.503688
## 47337648.0000    1.927489
## 47728640.0000    2.671232
## 48119632.0000    0.000000

Again, the first column is the position that the score is evaluated, whereas the second column is the score of the OmegaPlus.

Plotting SweeD and OmegaPlus scores

The following R script plots the results from the SweeD, OmegaPlus analysis and also it creates a plot that combines both results. With red dots we represent the points that are outliers for both of the tests (at 90th percentile).

getMultiRun <- function(fileName, header=TRUE){

    allData <- list()

    con <- file(fileName, "r")
    line = readLines(con)
    close(con)
    flag <- FALSE
    myHeader <- c("Position", "Likelihood", "Alpha")

    for(ln in line){

        tokenized <- strsplit(ln, split="\t")[[1]]
        if( ln == "" | length(tokenized) < 1 ){
            next
        }

        if( grepl("Position", x=tokenized)[1] == TRUE){
            myHeader <- tokenized
            next
        }

        if( grepl("//", x=tokenized)[1] == TRUE ){
            newData <- data.frame()

            if(header == FALSE)
                {
                    newData <- data.frame(Position=0, Score=0)
                    colnames(newData) <- c("Position", "Score")
                }
            else{
                newData <- data.frame(Position=0, Likelihood=0, Alpha=1)
                colnames(newData) <- myHeader
            }

            allData <- append(allData, list(newData))
            flag <-TRUE
        }
        else if(flag == TRUE){
            lastObject <- length(allData)
            allData[[lastObject]] <- rbind(allData[[lastObject]], as.numeric(tokenized))
        }

    }
    return(allData)
}



allDataOP <- getMultiRun("OmegaPlus_Report.yri", header=FALSE)
allDataSW <- getMultiRun("SweeD_Report.yri", header=TRUE)


## find the best Region
maxValsOP <- unlist(lapply(allDataOP, function(x){ max(x[,2])}))
maxValsSW <- unlist(lapply(allDataSW, function(x){ max(x[,2])}))

ord1 <- order(maxValsOP, decreasing=TRUE)
ord2 <- order(maxValsSW, decreasing = TRUE)

el <- NULL
for( i in 1:length(ord1)){
    if(length(intersect(ord1[1:i], ord2[1:i])) > 0){
        el <- intersect(ord1[1:i], ord2[1:i])
        break
    }
}


sw <- allDataSW[[el]][,2]
om <- allDataOP[[el]][,2]
pos <- allDataSW[[el]][,1]


#modify this to change significance cutoff
thr <- 0.1

assign.pvalues <- function(array){
  #array <- sample(sw, 1000)
  pvalues <- array(0, length(array))
  ordered.indexes <- order(array)

  j <- length(array)
  for( i in ordered.indexes ){
    pvalues[i] <- j/length(array)
    j <- j-1
  }

  return(pvalues)  
}


om.pval <- assign.pvalues(om)
sw.pval <- assign.pvalues(sw)
outliers <- which(om.pval < thr & sw.pval < thr)

## UNCOMMENT TO PLOT TO A FILE
##pdf("combined_plot_sweed_omegaplus.pdf", width=6.5, height=5)

layout(matrix(c(1,2,3,3), 2, 2, byrow=F), widths=c(3.2,3), heights=c(2,2))

par(mar=c(4,4,1,0))

plot(pos, sw, col="darkgray", pch=16, ylab="", xlab="")
mtext(side=1, text="Position", 2)
mtext(side=2, text="SweeD", 2)
points(pos[outliers], sw[outliers], col="red", pch=16, cex=1.2)
points(pos[outliers], sw[outliers], col="black", pch=1, cex=1.2)
mtext(side=3, text="A", adj=0.5, 0.)

plot(pos, om, col="darkgray", pch=16, ylab="", xlab="")
mtext(side=1, text="Position", 2)
mtext(side=2, text="OmegaPlus", 2)
points(pos[outliers], om[outliers], col="red", pch=16, cex=1.2)
points(pos[outliers], om[outliers], col="black", pch=1, cex=1.2)

plot(sw, om, main="", xlab="", ylab="",axes=F)
mtext(side=1, text="SweeD", 2)
mtext(side=2, text="OmegaPlus", 2)

points(sw[outliers], om[outliers], col="red", pch=16)
axis(side=1)
axis(side=2)
mtext(side=3, text="B", adj=0.5, 0.)

plot of chunk unnamed-chunk-7

## UNCOMMENT TO PLOT TO A FILE
## dev.off()