1 Finding the best methods

1.1 ABC Algorithms

I used the population call WUG5 as the main test population, because both the dadi and Tajima's D results clearly show a strong expansion for it, so I expect ABC to detect this as well. Later, I briefly checked whether the different algorithms behave similarly in the other populations. The main task was model selection, so I focused on that; parameter estimation with the best set of methods comes later. I also start here with the unmasked data, meaning that no missing data were introduced into the simulated data and no filtering was applied to them.

To get an idea of how well the real (target) data and the simulated data agree in their summary statistics, I first ran a PCA on all the summary statistics.
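A minimal sketch of such a check in base R, assuming the simulated summary statistics sit in a numeric matrix with one row per simulation. The data are synthetic, and the names `sim_stats`, `scenario`, and `target` are placeholders rather than the objects used in this analysis.

```r
set.seed(1)

# Synthetic stand-in for the reference table: 300 simulations x 4 summary statistics.
scenario  <- rep(c("con", "exp", "neu"), each = 100)
shift     <- c(con = -1, exp = 1, neu = 0)[scenario]
sim_stats <- matrix(rnorm(300 * 4), ncol = 4,
                    dimnames = list(NULL, c("s1", "s2", "s3", "s4")))
sim_stats[, "s1"] <- sim_stats[, "s1"] + shift
sim_stats[, "s2"] <- sim_stats[, "s1"] + rnorm(300, sd = 0.1)  # deliberately collinear with s1

# Placeholder for the observed (target) summary statistics.
target <- matrix(c(1, 1, 0, 0), nrow = 1,
                 dimnames = list(NULL, colnames(sim_stats)))

# PCA on the centred and scaled simulations only; the target is projected
# afterwards so that it does not influence the axes.
pca        <- prcomp(sim_stats, center = TRUE, scale. = TRUE)
target_pcs <- predict(pca, newdata = target)

# Simulations coloured by scenario, target drawn on top as a cross.
plot(pca$x[, 1:2], col = as.integer(factor(scenario)), pch = 16, cex = 0.6)
points(target_pcs[, 1], target_pcs[, 2], pch = 4, cex = 2, lwd = 3)

# Loadings: near-identical rows point to collinear statistics.
round(pca$rotation[, 1:2], 2)
```

Projecting the target with `predict()` instead of including it in the PCA is a design choice of this sketch; it keeps a single observation from shifting the components.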

The PCA shows that the summary statistics of the target are not completely off from the simulated data, so the simulated scenarios probably have at least some similarity to the demographic process behind the real data. As expected, the target also falls within the summary stats simulated under the expansion scenario. On a less positive note, there is a large overlap in summary statistics between the three scenarios (expansion, contraction, and neutral). The loadings also indicate little variety between the summary statistics; more precisely, many of them are highly collinear, which means they all describe more or less the same aspects of the data. This is not ideal. There are also a lot of outliers in the contraction simulations, which make up a large part of the variation in the data set. This issue is addressed later by running fewer simulations with extreme priors.

I first went with the ABC random forest algorithm that I usually used for all the bat experiments. Its advantage is that it is fairly robust to the choice of summary statistics, so in theory the high collinearity among the summary stats shouldn’t be a large issue for this method.

First, I ran a cross-validation, which treats randomly selected simulations as the target to see how well the model selection works in general.

## Warning in lda.default(x, grouping, ...): variables are collinear

##     con exp neu class.error
## con 415 225 295   0.5561497
## exp 188 410 337   0.5614973
## neu 221 262 452   0.5165775

A classification error between 0.52 and 0.56 is obviously not very good, but at least the correct model receives the most assignments in each row, and no two models are completely indistinguishable. So I would say the resolution is poor and many cases can’t be distinguished, but there is some signal. This is likely due to the many simulations that overlap in their summary statistics, as shown in the PCA.
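As a quick check, the per-model errors can be recomputed from the confusion matrix in the output above. This is a small base R sketch; it assumes that rows are the true model and columns the prediction, which is consistent with the reported `class.error` column.

```r
# Confusion matrix from the cross-validation output
# (rows = true model, columns = predicted model).
conf <- matrix(c(415, 225, 295,
                 188, 410, 337,
                 221, 262, 452),
               nrow = 3, byrow = TRUE,
               dimnames = list(true = c("con", "exp", "neu"),
                               predicted = c("con", "exp", "neu")))

# Per-model error: share of each row that lands off the diagonal.
class_error <- 1 - diag(conf) / rowSums(conf)
round(class_error, 4)

# Overall error across all cross-validated simulations.
overall_error <- 1 - sum(diag(conf)) / sum(conf)
round(overall_error, 4)
```

With three equally sized models, random guessing would give an error of about 0.67, so an overall error of roughly 0.54 is better than chance, but not by much.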

Next, I attempt the model selection with the real target using the random forest.

##   selected model votes model1 votes model2 votes model3 post.proba
## 1            neu          111           98          791  0.7423667

The neutral model was selected with 791 of 1,000 votes and a posterior probability of ~74 %. The other two models, contraction and expansion, received a very similar number of votes (111 and 98). This is likely not correct, and repeating this with other populations and species shows that the random forest algorithm always tends to favor the neutral model. Looking again at the PCA, the neutral model seems to sit somewhat in the middle between the two other models. My hypothesis is therefore that the random forest is bad at differentiating a target from the “middle” of the summary stats, even if it is clearly different from it, like the target of call WUG5 here. If it can’t differentiate the target from this “middle”, it will always pick neutral, because the neutral model is in the majority there.

The collinearity of the summary stats may also play a role, but reducing or transforming them was not enough to get a better result. I will explain in a later section how I chose the summary stats; with a more suitable choice, the random forest still picked the neutral model, but now with a low posterior probability of ~30 %. So at least it noticed that it was probably wrong.

Because of this observation and some further testing, I decided to try the more classical ABC methods, which always include a rejection step. These algorithms first exclude from the prediction all simulations whose summary stats are too far away from the target. My hope is that this rejection step removes the “middle” simulations and then produces an accurate model selection for targets that are extreme enough. In the most classical ABC, the best model is the one that occurs most often among the accepted (not rejected) simulations. The R package used for this also implements two extensions of this baseline algorithm: one where a logistic regression model predicts the best model from the accepted simulations, and one where a neural network does this. I tried all three and found that the logistic regression variant gave the most accurate and clear results, both in the cross-validation and in the model selection for WUG5.
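The plain rejection step can be sketched in a few lines of base R. This is an illustration on synthetic data, not the package's implementation: the function `reject_model_choice`, the scaling by standard deviation, and the Euclidean distance are assumptions made for this sketch.

```r
set.seed(42)

# Synthetic reference table: 3 models x 1000 simulations, 2 summary statistics.
n_per_model <- 1000
model   <- rep(c("con", "exp", "neu"), each = n_per_model)
centre  <- c(con = -1, exp = 1, neu = 0)[model]
sumstat <- cbind(s1 = rnorm(3 * n_per_model, mean = centre),
                 s2 = rnorm(3 * n_per_model, mean = centre / 2))
target  <- c(s1 = 2.5, s2 = 1.2)  # placeholder observation, far on the "exp" side

reject_model_choice <- function(target, model, sumstat, tol) {
  # Scale every statistic by its spread so that no single one dominates the distance
  # (scaling by sd is an arbitrary choice for this sketch).
  scale_by <- apply(sumstat, 2, sd)
  z_sim    <- sweep(sumstat, 2, scale_by, "/")
  z_tar    <- target / scale_by

  # Euclidean distance of every simulation to the target.
  dist <- sqrt(rowSums(sweep(z_sim, 2, z_tar, "-")^2))

  # Rejection step: keep only the closest `tol` fraction of simulations.
  n_keep   <- ceiling(tol * length(dist))
  accepted <- order(dist)[seq_len(n_keep)]

  # Posterior model probability = share of each model among the accepted simulations.
  table(factor(model[accepted], levels = unique(model))) / n_keep
}

post <- reject_model_choice(target, model, sumstat, tol = 0.05)
post
```

Because only the nearest simulations are counted, a target far from the overlapping centre is compared mainly with simulations from its own side, which is the effect hoped for here.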

Before I show the results, there are two caveats with the classical ABC. The first is that it depends heavily on the choice of summary statistics. Given their high collinearity, I had to reduce them to a set that still represents the different scenarios well. I used the PCA and pairwise correlation tests to find a set that does not contain too little information but keeps the collinearity to a minimum. Here is a PCA of the chosen set to show the effect of the reduction.

The chosen set is: the number of sites fixed for the alternative allele (fs), the number of sites fixed for the reference allele (sfs_0), the rate of heterozygosity (he), and Tajima's D (TD).
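One way to screen for collinearity is a greedy filter on the pairwise correlation matrix. The sketch below is an illustration on synthetic data; the helper `drop_collinear`, the Spearman correlation, and the cutoff of 0.9 are assumptions, not the procedure actually used to arrive at the four statistics above.

```r
set.seed(7)

# Synthetic statistics: "b" is almost a copy of "a", "d" is almost a mirrored copy of "c".
n     <- 500
base1 <- rnorm(n)
base2 <- rnorm(n)
stats <- cbind(a = base1,
               b = base1 + rnorm(n, sd = 0.05),
               c = base2,
               d = -base2 + rnorm(n, sd = 0.05),
               e = rnorm(n))

drop_collinear <- function(x, cutoff = 0.9) {
  r    <- abs(cor(x, method = "spearman"))
  keep <- character(0)
  for (name in colnames(x)) {
    # Keep a statistic only if it is not strongly correlated with one already kept.
    if (all(r[name, keep] < cutoff)) keep <- c(keep, name)
  }
  keep
}

drop_collinear(stats)
```

The result of such a greedy pass depends on the column order, so in practice it makes sense to put the statistics that are easiest to interpret first.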

With these, I ran the cross-validation again for this algorithm, but with different tolerances. The tolerance is the proportion of simulations closest to the target that the algorithm accepts. I used 1, 2, and 5 %.
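A cross-validation of this kind can be set up with `cv4postpr()` from the `abc` package, which is the package suggested by the `postpr()` call shown further down. The sketch runs on a synthetic reference table, and `nval = 10` is chosen only to keep it fast; it is guarded so that it also runs where `abc` is not installed.

```r
set.seed(3)

# Synthetic reference table (placeholder for the real simulations).
n_per_model <- 1000
model   <- rep(c("con", "exp", "neu"), each = n_per_model)
centre  <- c(con = -1, exp = 1, neu = 0)[model]
sumstat <- cbind(s1 = rnorm(3 * n_per_model, mean = centre),
                 s2 = rnorm(3 * n_per_model, mean = centre / 2))
tols <- c(0.01, 0.02, 0.05)

# Approximate number of accepted simulations per tolerance. The regression step
# needs enough of them, and from more than one model.
n_accepted <- ceiling(tols * nrow(sumstat))
n_accepted

if (requireNamespace("abc", quietly = TRUE)) {
  library(abc)
  # nval = number of simulations per model that are held out and treated as targets.
  cv <- cv4postpr(index = model, sumstat = sumstat, nval = 10,
                  tols = tols, method = "mnlogistic")
  summary(cv)  # confusion matrices and mean posterior probabilities per tolerance
}
```

At small tolerances and with few simulations, the accepted set can contain only one or two of the three models, in which case the regression cannot be fitted.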

## Warning: There are 3 models but only 1 for which simulations have been accepted.
## No regression is performed, method is set to rejection.
## Consider increasing the tolerance rate.TRUE

## Warning: There are 3 models but only 2 for which simulations have been accepted.
## No regression is performed, method is set to rejection.
## Consider increasing the tolerance rate.TRUE

## Warning in matrix(pred[pred != 0], nmod, nmod, byrow = T): data length [2] is
## not a sub-multiple or multiple of the number of rows [3]

## Warning in matrix(pred[pred != 0], nmod, nmod, byrow = F): data length [2] is
## not a sub-multiple or multiple of the number of rows [3]

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  39  31  30
## exp  23  46  31
## neu  25  34  41
## 
## $tol0.02
##     con exp neu
## con  43  27  30
## exp  16  39  45
## neu  24  27  49
## 
## $tol0.05
##     con exp neu
## con  33  29  38
## exp  22  39  39
## neu  21  31  48
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4432 0.2774 0.2793
## exp 0.2495 0.4338 0.3167
## neu 0.2788 0.3415 0.3797
## 
## $tol0.02
##        con    exp    neu
## con 0.4369 0.2848 0.2783
## exp 0.2692 0.4000 0.3308
## neu 0.2930 0.3303 0.3767
## 
## $tol0.05
##        con    exp    neu
## con 0.4300 0.2809 0.2892
## exp 0.2798 0.3920 0.3282
## neu 0.3008 0.3291 0.3701

These results are similarly bad to the random forest results, if anything a bit worse. An interesting observation, though, is that the cross-validation seems to get slightly better at smaller tolerances, most visibly in the mean posterior probabilities. However, it also throws a lot of warnings to increase the tolerance, because too few simulations are accepted to run the second step of the algorithm, the regression. This is the second caveat of the classical ABC: it needs many more simulations, or at least depends more strongly on their number. We later added more simulations to address this and to be able to choose a lower tolerance.
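To put a single number on each tolerance, the confusion matrices above can be reduced to the share of held-out simulations assigned to their true model. This is a small base R sketch using the values from the output; the helper `as_conf` is introduced only for this example.

```r
# Confusion matrices copied from the cross-validation output (rows = true model).
models  <- c("con", "exp", "neu")
as_conf <- function(v) matrix(v, nrow = 3, byrow = TRUE,
                              dimnames = list(models, models))
cv <- list(
  tol0.01 = as_conf(c(39, 31, 30,  23, 46, 31,  25, 34, 41)),
  tol0.02 = as_conf(c(43, 27, 30,  16, 39, 45,  24, 27, 49)),
  tol0.05 = as_conf(c(33, 29, 38,  22, 39, 39,  21, 31, 48))
)

# Share of held-out simulations assigned to their true model, per tolerance.
accuracy <- sapply(cv, function(m) sum(diag(m)) / sum(m))
round(accuracy, 3)
```

This gives 0.42, 0.437, and 0.40 for tolerances of 1, 2, and 5 %, so the differences between tolerances are small and all three are only modestly above the 0.33 expected by chance.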

With that, next comes the actual model selection for the real target, call WUG5, here with a tolerance of 5 % because this was the lowest I could go with the roughly 1,000 simulations per model used initially.

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.05, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (141 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3617 0.3262 0.3121 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.1087 1.1591
## exp 0.9020 1.0000 1.0455
## neu 0.8627 0.9565 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.2454 0.4026 0.3519 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.6096 0.6974
## exp 1.6404 1.0000 1.1440
## neu 1.4339 0.8741 1.0000

The results show that the first step of the algorithm, the rejection, accepted roughly the same number of simulations from each model, with slightly more from the contraction model. Overall, 141 simulations were accepted. The second step, the regression, was then able to correctly predict the expansion model for this target, but not with much confidence. What I like about the output of this package is that it also calculates Bayes factors, which are very helpful for interpreting the results. They show that the expansion model is 1.6 times more likely to be correct for this target than the contraction model, but only 1.1 times more likely than the neutral model.
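The Bayes factors in the output are consistent with simple ratios of the posterior model probabilities, which holds when all models have equal prior weight (here, equal numbers of simulations per model). A short base R check using the reported probabilities:

```r
# Posterior model probabilities reported for WUG5 (mnlogistic, tol = 0.05).
post <- c(con = 0.2454, exp = 0.4026, neu = 0.3519)

# With equal prior probabilities for all models, the Bayes factor of model i over
# model j reduces to the ratio of their posterior probabilities.
bf <- outer(post, post, "/")
round(bf, 2)
```

Row `exp` reproduces the reported values of about 1.64 against contraction and 1.14 against neutral, up to rounding of the printed probabilities.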

1.2 To mask or not to mask

There were initially two ways to create simulations that are comparable to the target data. The first was to simulate data sets with roughly the same number of SNPs as the filtered target data set (no_mask). The second was to simulate data sets with roughly the same number of SNPs as the unfiltered (?) target data set, introduce artificial missingness into them by randomly (?) masking genotypes, and then filter for different proportions of missingness (?) (mask). The question is which approach is more realistic and, hopefully, better at producing an accurate model selection.
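To make the mask idea concrete, here is a rough base R sketch of masking and filtering on a made-up genotype matrix. Everything in it is an assumption for illustration: the function `mask_and_filter`, the 0/1/2 coding, masking completely at random, and the thresholds. It does not claim to reproduce the actual masking pipeline.

```r
set.seed(11)

# Hypothetical genotype matrix: 200 SNPs (rows) x 20 individuals (columns), coded 0/1/2.
geno <- matrix(sample(0:2, 200 * 20, replace = TRUE), nrow = 200)

mask_and_filter <- function(geno, miss_rate, max_missing) {
  # Mask genotypes completely at random (an assumption; real missingness is
  # rarely uniform across sites and individuals).
  masked <- geno
  masked[runif(length(geno)) < miss_rate] <- NA

  # Drop SNPs whose share of missing genotypes exceeds the threshold.
  snp_missing <- rowMeans(is.na(masked))
  masked[snp_missing <= max_missing, , drop = FALSE]
}

filtered <- mask_and_filter(geno, miss_rate = 0.2, max_missing = 0.2)
c(before = nrow(geno), after = nrow(filtered))
```

The number of SNPs that survive depends on both the masking rate and the filter threshold, which is why the mask simulations start from a larger, unfiltered-sized data set.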

A good way to see how well the simulated summary statistics fit the target summary statistics is to look at a PCA biplot of them. Below are two of them side by side for each of the call populations, one from the mask approach and one from the no_mask approach. It's a lot of plots to look at, so I collapsed them.


Looking at them, it seems that the masking always pushes the target further in the direction of contraction. Another indicator I was curious about is the size of the data sets: which of the approaches leads to data sets more similar in size to the real data? Here is an overview for the call populations again, showing the number of SNPs in the targets and the mean number of SNPs in the simulations of all three demographic scenarios.

	GH	WUG1	WUG2	WUG3	WUG4	WUG5
target	1549.000	7236.000	8732.00	6336.000	6018.000	12344.00
mask_con	13426.177	31453.085	25704.25	8796.583	22028.267	14929.48
mask_exp	8577.369	17799.025	15152.86	6446.554	13926.624	11491.44
mask_neu	8822.756	19822.839	16360.98	6888.942	13848.016	11521.73
no_mask_con	11995.624	17883.867	18885.73	10434.312	16057.820	15550.39
no_mask_exp	6990.708	9387.704	10537.31	7126.544	9193.808	10128.25
no_mask_neu	7499.442	10364.788	11247.07	7592.429	9571.133	10769.53

The target has the lowest number of SNPs in every population except WUG5, and mask in most cases gives higher numbers of SNPs than no_mask. This is still a bit inconclusive, so I generated results from both methods. Here is an overview of the model selection for pall and call together with some more details about the data sets. The last six columns show the posterior probabilities of the model selection.

From these results, and from the fact that the number of SNPs is more often closer to the target, the no_mask simulations seemed to be the better fit for the analysis.
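The SNP-count comparison can be made explicit by checking, for every population and scenario, which approach has a mean SNP count closer to the target. A base R sketch using the values from the table above:

```r
pops   <- c("GH", "WUG1", "WUG2", "WUG3", "WUG4", "WUG5")
target <- c(1549, 7236, 8732, 6336, 6018, 12344)

# Mean SNP counts per scenario, copied from the table above.
mask <- rbind(
  con = c(13426.177, 31453.085, 25704.25,  8796.583, 22028.267, 14929.48),
  exp = c( 8577.369, 17799.025, 15152.86,  6446.554, 13926.624, 11491.44),
  neu = c( 8822.756, 19822.839, 16360.98,  6888.942, 13848.016, 11521.73))
no_mask <- rbind(
  con = c(11995.624, 17883.867, 18885.73, 10434.312, 16057.820, 15550.39),
  exp = c( 6990.708,  9387.704, 10537.31,  7126.544,  9193.808, 10128.25),
  neu = c( 7499.442, 10364.788, 11247.07,  7592.429,  9571.133, 10769.53))
colnames(mask) <- colnames(no_mask) <- pops

# Absolute distance of each simulated mean to the target of the same population.
d_mask    <- abs(sweep(mask,    2, target, "-"))
d_no_mask <- abs(sweep(no_mask, 2, target, "-"))

# TRUE where the no_mask simulations are closer to the target.
no_mask_closer <- d_no_mask < d_mask
no_mask_closer
colSums(no_mask_closer)
```

no_mask is closer in all three scenarios for GH, WUG1, WUG2, and WUG4 (12 of 18 comparisons), while mask is closer for WUG3 and WUG5.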

2 Model selection for all species and populations

After choosing the ABC rejection+regression algorithm and a set of summary statistics, generating more simulations for no_mask, and tweaking the priors a little, I ran the model selection for all of the populations and species.

Here are all the results and how they were generated in detail, including the cross-validation. Just expand what you are interested in. A summary table is at the end.

call GH

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  42  26  32
## exp  17  53  30
## neu  32  27  41
## 
## $tol0.02
##     con exp neu
## con  40  27  33
## exp  15  48  37
## neu  31  30  39
## 
## $tol0.05
##     con exp neu
## con  40  27  33
## exp  17  47  36
## neu  27  35  38
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4192 0.2713 0.3095
## exp 0.2042 0.4733 0.3225
## neu 0.3146 0.2855 0.3999
## 
## $tol0.02
##        con    exp    neu
## con 0.4449 0.2697 0.2853
## exp 0.2094 0.4593 0.3312
## neu 0.3027 0.3192 0.3782
## 
## $tol0.05
##        con    exp    neu
## con 0.4495 0.2572 0.2933
## exp 0.2347 0.4356 0.3297
## neu 0.2902 0.3416 0.3682

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (113 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3097 0.3717 0.3186 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.8333 0.9722
## exp 1.2000 1.0000 1.1667
## neu 1.0286 0.8571 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.4461 0.2709 0.2830 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.6467 1.5763
## exp 0.6073 1.0000 0.9573
## neu 0.6344 1.0446 1.0000

call WUG1

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  64  14  22
## exp  24  44  32
## neu  20  30  50
## 
## $tol0.02
##     con exp neu
## con  53  18  29
## exp  23  49  28
## neu  25  22  53
## 
## $tol0.05
##     con exp neu
## con  57  18  25
## exp  24  38  38
## neu  24  19  57
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.6321 0.1414 0.2265
## exp 0.2197 0.4500 0.3303
## neu 0.2101 0.3220 0.4679
## 
## $tol0.02
##        con    exp    neu
## con 0.5355 0.2041 0.2604
## exp 0.2575 0.4560 0.2866
## neu 0.2881 0.2496 0.4623
## 
## $tol0.05
##        con    exp    neu
## con 0.5331 0.2163 0.2506
## exp 0.2569 0.4120 0.3312
## neu 0.2678 0.2849 0.4473

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (115 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.2000 0.3217 0.4783 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.6216 0.4182
## exp 1.6087 1.0000 0.6727
## neu 2.3913 1.4865 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.1343 0.0983 0.7675 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.3660 0.1749
## exp 0.7320 1.0000 0.1281
## neu 5.7163 7.8087 1.0000

call WUG2

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  49  21  30
## exp  14  57  29
## neu  15  28  57
## 
## $tol0.02
##     con exp neu
## con  43  22  35
## exp  16  65  19
## neu  20  27  53
## 
## $tol0.05
##     con exp neu
## con  36  29  35
## exp  18  55  27
## neu  25  34  41
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4882 0.2151 0.2967
## exp 0.1535 0.5572 0.2894
## neu 0.1779 0.2909 0.5312
## 
## $tol0.02
##        con    exp    neu
## con 0.4467 0.2472 0.3061
## exp 0.1928 0.5691 0.2381
## neu 0.2042 0.3077 0.4881
## 
## $tol0.05
##        con    exp    neu
## con 0.4321 0.2745 0.2934
## exp 0.2119 0.5048 0.2833
## neu 0.2615 0.3428 0.3956

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (116 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3362 0.2759 0.3879 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.2188 0.8667
## exp 0.8205 1.0000 0.7111
## neu 1.1538 1.4062 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.4123 0.0391 0.5486 
## 
## Bayes factors:
##         con     exp     neu
## con  1.0000 10.5445  0.7516
## exp  0.0948  1.0000  0.0713
## neu  1.3305 14.0291  1.0000

call WUG3

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  52  28  20
## exp  31  39  30
## neu  24  29  47
## 
## $tol0.02
##     con exp neu
## con  45  34  21
## exp  22  44  34
## neu  24  38  38
## 
## $tol0.05
##     con exp neu
## con  37  37  26
## exp  22  47  31
## neu  18  40  42
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4712 0.3189 0.2098
## exp 0.2882 0.3957 0.3161
## neu 0.2680 0.3193 0.4127
## 
## $tol0.02
##        con    exp    neu
## con 0.4446 0.3153 0.2401
## exp 0.2512 0.4042 0.3446
## neu 0.2763 0.3595 0.3642
## 
## $tol0.05
##        con    exp    neu
## con 0.4266 0.3080 0.2654
## exp 0.2575 0.4224 0.3201
## neu 0.2747 0.3568 0.3684

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (117 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3162 0.4188 0.2650 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.7551 1.1935
## exp 1.3243 1.0000 1.5806
## neu 0.8378 0.6327 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.2978 0.2631 0.4391 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.1320 0.6784
## exp 0.8834 1.0000 0.5993
## neu 1.4742 1.6687 1.0000

call WUG4

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  44  32  24
## exp  14  54  32
## neu  25  40  35
## 
## $tol0.02
##     con exp neu
## con  49  28  23
## exp  18  56  26
## neu  26  35  39
## 
## $tol0.05
##     con exp neu
## con  46  23  31
## exp  20  46  34
## neu  28  36  36
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4362 0.3045 0.2593
## exp 0.1480 0.5364 0.3156
## neu 0.2627 0.3596 0.3777
## 
## $tol0.02
##        con    exp    neu
## con 0.4840 0.2665 0.2495
## exp 0.2074 0.5024 0.2902
## neu 0.2903 0.3563 0.3534
## 
## $tol0.05
##        con    exp    neu
## con 0.4813 0.2339 0.2848
## exp 0.2289 0.4775 0.2936
## neu 0.2849 0.3553 0.3598

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (117 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.2051 0.2906 0.5043 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.7059 0.4068
## exp 1.4167 1.0000 0.5763
## neu 2.4583 1.7353 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.1974 0.4497 0.3529 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.4390 0.5594
## exp 2.2780 1.0000 1.2743
## neu 1.7877 0.7848 1.0000

call WUG5

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  40  27  33
## exp  27  48  25
## neu  27  30  43
## 
## $tol0.02
##     con exp neu
## con  33  27  40
## exp  25  50  25
## neu  22  36  42
## 
## $tol0.05
##     con exp neu
## con  37  30  33
## exp  23  53  24
## neu  21  39  40
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4086 0.2764 0.3150
## exp 0.2622 0.4625 0.2754
## neu 0.2758 0.3012 0.4230
## 
## $tol0.02
##        con    exp    neu
## con 0.3685 0.2900 0.3415
## exp 0.2576 0.4893 0.2531
## neu 0.2648 0.3323 0.4029
## 
## $tol0.05
##        con    exp    neu
## con 0.3893 0.3062 0.3046
## exp 0.2666 0.4527 0.2807
## neu 0.2790 0.3677 0.3533

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (113 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3186 0.3009 0.3805 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.0588 0.8372
## exp 0.9444 1.0000 0.7907
## neu 1.1944 1.2647 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.1144 0.2473 0.6382 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.4627 0.1793
## exp 2.1614 1.0000 0.3876
## neu 5.5771 2.5803 1.0000

pall GH

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  49  21  30
## exp  15  49  36
## neu  16  33  51
## 
## $tol0.02
##     con exp neu
## con  46  18  36
## exp  13  51  36
## neu   8  34  58
## 
## $tol0.05
##     con exp neu
## con  43  17  40
## exp  15  46  39
## neu  15  29  56
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4753 0.2165 0.3081
## exp 0.2019 0.4670 0.3311
## neu 0.2213 0.3209 0.4578
## 
## $tol0.02
##        con    exp    neu
## con 0.4658 0.2226 0.3116
## exp 0.2019 0.4813 0.3168
## neu 0.2189 0.3341 0.4470
## 
## $tol0.05
##        con    exp    neu
## con 0.4770 0.2327 0.2903
## exp 0.2239 0.4580 0.3181
## neu 0.2491 0.3457 0.4052

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (116 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3879 0.3103 0.3017 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.2500 1.2857
## exp 0.8000 1.0000 1.0286
## neu 0.7778 0.9722 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.3818 0.3061 0.3121 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.2474 1.2231
## exp 0.8017 1.0000 0.9806
## neu 0.8176 1.0198 1.0000

pall WUG1

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  51  23  26
## exp  26  49  25
## neu  24  27  49
## 
## $tol0.02
##     con exp neu
## con  49  31  20
## exp  17  58  25
## neu  22  31  47
## 
## $tol0.05
##     con exp neu
## con  48  31  21
## exp  17  56  27
## neu  17  37  46
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.5288 0.2194 0.2518
## exp 0.2493 0.4901 0.2606
## neu 0.2608 0.3059 0.4333
## 
## $tol0.02
##        con    exp    neu
## con 0.4989 0.2621 0.2391
## exp 0.2150 0.5157 0.2693
## neu 0.2749 0.3257 0.3994
## 
## $tol0.05
##        con    exp    neu
## con 0.4866 0.2614 0.2520
## exp 0.2115 0.5074 0.2811
## neu 0.2683 0.3481 0.3836

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (115 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.2522 0.4087 0.3391 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.6170 0.7436
## exp 1.6207 1.0000 1.2051
## neu 1.3448 0.8298 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.0420 0.6673 0.2906 
## 
## Bayes factors:
##         con     exp     neu
## con  1.0000  0.0630  0.1447
## exp 15.8720  1.0000  2.2964
## neu  6.9117  0.4355  1.0000

pall WUG2

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  53  24  23
## exp  21  51  28
## neu  19  38  43
## 
## $tol0.02
##     con exp neu
## con  55  21  24
## exp  19  46  35
## neu  16  32  52
## 
## $tol0.05
##     con exp neu
## con  49  27  24
## exp  21  44  35
## neu  16  42  42
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.5238 0.2482 0.2280
## exp 0.2117 0.5097 0.2785
## neu 0.2056 0.3668 0.4275
## 
## $tol0.02
##        con    exp    neu
## con 0.5317 0.2368 0.2315
## exp 0.2356 0.4430 0.3214
## neu 0.2218 0.3222 0.4560
## 
## $tol0.05
##        con    exp    neu
## con 0.5291 0.2462 0.2248
## exp 0.2458 0.4293 0.3249
## neu 0.2551 0.3575 0.3874

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (117 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.3248 0.2479 0.4274 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 1.3103 0.7600
## exp 0.7632 1.0000 0.5800
## neu 1.3158 1.7241 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.0876 0.1467 0.7656 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.5972 0.1145
## exp 1.6743 1.0000 0.1917
## neu 8.7357 5.2174 1.0000

plib GH

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  47  23  30
## exp  25  40  35
## neu  13  35  52
## 
## $tol0.02
##     con exp neu
## con  46  20  34
## exp  24  41  35
## neu  20  30  50
## 
## $tol0.05
##     con exp neu
## con  49  17  34
## exp  21  42  37
## neu  21  26  53
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.4881 0.2211 0.2908
## exp 0.2687 0.4279 0.3034
## neu 0.2263 0.3528 0.4209
## 
## $tol0.02
##        con    exp    neu
## con 0.4830 0.2379 0.2791
## exp 0.2917 0.4094 0.2989
## neu 0.2705 0.3284 0.4012
## 
## $tol0.05
##        con    exp    neu
## con 0.4890 0.2344 0.2766
## exp 0.2789 0.4196 0.3015
## neu 0.2898 0.3151 0.3952

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (116 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.1983 0.3879 0.4138 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.5111 0.4792
## exp 1.9565 1.0000 0.9375
## neu 2.0870 1.0667 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.2015 0.3269 0.4715 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.6164 0.4274
## exp 1.6223 1.0000 0.6933
## neu 2.3400 1.4423 1.0000
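
The rejection proportions in the output above are plain fractions of the accepted simulations, so multiplying them by the number of posterior samples recovers the per-model counts. A minimal sketch for plib GH, assuming the 116 posterior samples and the rounded proportions shown in the output; the object names are illustrative.

```r
# plib GH: accepted simulations and rejection proportions from the output
n_accepted <- 116
prop <- c(con = 0.1983, exp = 0.3879, neu = 0.4138)

# Recover integer counts per model; they should add up to n_accepted
counts <- round(prop * n_accepted)
print(counts)
stopifnot(sum(counts) == n_accepted)

# Rejection-based Bayes factors are then simple count ratios
print(round(counts[["exp"]] / counts[["con"]], 4))
print(round(counts[["neu"]] / counts[["con"]], 4))
```

With only a few dozen accepted simulations per model, each count carries noticeable sampling noise, which is worth keeping in mind when comparing the rejection results with the mnlogistic results.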

plib WUG1

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  52  20  28
## exp  35  36  29
## neu  27  30  43
## 
## $tol0.02
##     con exp neu
## con  44  21  35
## exp  37  27  36
## neu  33  23  44
## 
## $tol0.05
##     con exp neu
## con  45  14  41
## exp  35  24  41
## neu  30  14  56
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.5067 0.2198 0.2735
## exp 0.3456 0.3510 0.3034
## neu 0.2764 0.2964 0.4272
## 
## $tol0.02
##        con    exp    neu
## con 0.4667 0.2295 0.3038
## exp 0.3155 0.3356 0.3490
## neu 0.2967 0.2786 0.4246
## 
## $tol0.05
##        con    exp    neu
## con 0.4637 0.2133 0.3230
## exp 0.3347 0.3181 0.3472
## neu 0.3271 0.2681 0.4048

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (115 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.2783 0.3565 0.3652 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.7805 0.7619
## exp 1.2812 1.0000 0.9762
## neu 1.3125 1.0244 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.2238 0.0683 0.7079 
## 
## Bayes factors:
##         con     exp     neu
## con  1.0000  3.2761  0.3162
## exp  0.3052  1.0000  0.0965
## neu  3.1625 10.3609  1.0000

ppli GH

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  73   9  18
## exp   7  68  25
## neu  17  20  63
## 
## $tol0.02
##     con exp neu
## con  73   9  18
## exp   9  66  25
## neu  20  21  59
## 
## $tol0.05
##     con exp neu
## con  73  11  16
## exp   6  72  22
## neu  16  19  65
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.7298 0.1024 0.1678
## exp 0.0882 0.6547 0.2570
## neu 0.1724 0.2681 0.5595
## 
## $tol0.02
##        con    exp    neu
## con 0.7282 0.1227 0.1490
## exp 0.1101 0.6614 0.2285
## neu 0.1984 0.2640 0.5376
## 
## $tol0.05
##        con    exp    neu
## con 0.7247 0.1374 0.1379
## exp 0.1165 0.6639 0.2196
## neu 0.1955 0.2661 0.5384
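
To compare the cross validation results between data sets, the confusion matrices can be condensed into classification rates. Each row sums to 100 (the number of samples per model), so rows are taken to be the true model and columns the assigned model. The sketch below does this for ppli GH (tol = 0.01) and plib WUG1 (tol = 0.02); the helper name `classification_rates` is hypothetical and the matrices are typed in by hand from the output.

```r
# Hypothetical helper: per-model and overall classification rates
# from a confusion matrix whose rows are the true models.
classification_rates <- function(cm) {
  list(per_model = diag(cm) / rowSums(cm),
       overall   = sum(diag(cm)) / sum(cm))
}

models <- c("con", "exp", "neu")

# ppli GH, tol = 0.01 (copied from the output above)
cm_ppli_gh <- matrix(c(73,  9, 18,
                        7, 68, 25,
                       17, 20, 63),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(true = models, assigned = models))

# plib WUG1, tol = 0.02 (copied from the earlier output)
cm_plib_wug1 <- matrix(c(44, 21, 35,
                         37, 27, 36,
                         33, 23, 44),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(true = models, assigned = models))

r_ppli <- classification_rates(cm_ppli_gh)
r_plib <- classification_rates(cm_plib_wug1)

print(r_ppli)
print(r_plib)
```

With three models, random assignment would give about one third correct. The ppli GH matrix comes out at roughly two thirds correct overall, while plib WUG1 at tol = 0.02 is only slightly above chance.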

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (115 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.0174 0.0957 0.8870 
## 
## Bayes factors:
##         con     exp     neu
## con  1.0000  0.1818  0.0196
## exp  5.5000  1.0000  0.1078
## neu 51.0000  9.2727  1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.0172 0.0033 0.9794 
## 
## Bayes factors:
##          con      exp      neu
## con   1.0000   5.1997   0.0176
## exp   0.1923   1.0000   0.0034
## neu  56.7857 295.2708   1.0000

ppli WUG1

** PCA: **

** Cross validation: **

## Confusion matrix based on 100 samples for each model.
## 
## $tol0.01
##     con exp neu
## con  78  10  12
## exp  10  68  22
## neu  19  22  59
## 
## $tol0.02
##     con exp neu
## con  80   9  11
## exp   9  63  28
## neu  21  16  63
## 
## $tol0.05
##     con exp neu
## con  74  10  16
## exp  13  68  19
## neu  18  19  63
## 
## 
## Mean model posterior probabilities (mnlogistic)
## 
## $tol0.01
##        con    exp    neu
## con 0.7741 0.1181 0.1078
## exp 0.0989 0.6660 0.2351
## neu 0.2133 0.2293 0.5574
## 
## $tol0.02
##        con    exp    neu
## con 0.7697 0.1172 0.1131
## exp 0.1016 0.6674 0.2311
## neu 0.2251 0.2090 0.5659
## 
## $tol0.05
##        con    exp    neu
## con 0.7430 0.1217 0.1353
## exp 0.1085 0.6672 0.2243
## neu 0.2320 0.2457 0.5223

** Model Selection: **

## Call: 
## postpr(target = tar, index = model, sumstat = sumstat, tol = 0.02, 
##     method = "mnlogistic")
## Data:
##  postpr.out$values (117 posterior samples)
## Models a priori:
##  con, exp, neu
## Models a posteriori:
##  con, exp, neu
## 
## Proportion of accepted simulations (rejection):
##    con    exp    neu 
## 0.1197 0.2051 0.6752 
## 
## Bayes factors:
##        con    exp    neu
## con 1.0000 0.5833 0.1772
## exp 1.7143 1.0000 0.3038
## neu 5.6429 3.2917 1.0000
## 
## 
## Posterior model probabilities (mnlogistic):
##    con    exp    neu 
## 0.0000 0.2091 0.7909 
## 
## Bayes factors:
##            con        exp        neu
## con     1.0000     0.0001     0.0000
## exp 14011.9688     1.0000     0.2644
## neu 52995.1978     3.7821     1.0000
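
The very large Bayes factors against con in the mnlogistic output for ppli WUG1 come from dividing by a posterior probability that is close to zero, so they are sensitive to small changes in that value. A small worked example, using the probabilities from the summary table in section 2.1; the object names are illustrative.

```r
# ppli WUG1, mnlogistic posterior probabilities from the summary table
p_exp <- 0.2091084
p_con <- 0.0000149          # rounded to 7 decimals in the table

# Bayes factor exp vs con as printed in the output above
bf_printed <- 14011.9688

# Ratio computed from the rounded table values
bf_from_table <- p_exp / p_con
print(bf_from_table)

# Range of ratios compatible with p_con rounding to 0.0000149
bf_range <- p_exp / c(0.00001495, 0.00001485)
print(bf_range)
```

The printed value lies inside this range, and a change in the seventh decimal of the con probability already shifts the ratio by several dozen. For such cases the posterior probabilities themselves are probably easier to interpret than the Bayes factors.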

2.1 Summary Table

species_population  con        exp        neu
call_GH_pred        0.4460928  0.2709064  0.2830008
call_WUG1_pred      0.1342582  0.0982820  0.7674598
call_WUG2_pred      0.4123209  0.0391028  0.5485763
call_WUG3_pred      0.2978334  0.2631150  0.4390517
call_WUG4_pred      0.1974067  0.4496898  0.3529035
call_WUG5_pred      0.1144357  0.2473456  0.6382187
pall_GH_pred        0.3817845  0.3060761  0.3121394
pall_WUG1_pred      0.0420456  0.6673467  0.2906078
pall_WUG2_pred      0.0876423  0.1467432  0.7656145
plib_GH_pred        0.2015201  0.3269322  0.4715478
plib_WUG1_pred      0.2238246  0.0683200  0.7078554
ppli_GH_pred        0.0172479  0.0033171  0.9794350
ppli_WUG1_pred      0.0000149  0.2091084  0.7908767
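
The summary table can be reduced to the best supported model per species and population by taking the column with the highest probability in each row. A minimal sketch in base R, with the table typed in by hand; the column label `id` and the object names are illustrative and not part of the original analysis.

```r
# Summary table copied from above; "id" is an illustrative column label.
tab <- read.table(header = TRUE, row.names = 1, text = "id con exp neu
call_GH_pred 0.4460928 0.2709064 0.2830008
call_WUG1_pred 0.1342582 0.0982820 0.7674598
call_WUG2_pred 0.4123209 0.0391028 0.5485763
call_WUG3_pred 0.2978334 0.2631150 0.4390517
call_WUG4_pred 0.1974067 0.4496898 0.3529035
call_WUG5_pred 0.1144357 0.2473456 0.6382187
pall_GH_pred 0.3817845 0.3060761 0.3121394
pall_WUG1_pred 0.0420456 0.6673467 0.2906078
pall_WUG2_pred 0.0876423 0.1467432 0.7656145
plib_GH_pred 0.2015201 0.3269322 0.4715478
plib_WUG1_pred 0.2238246 0.0683200 0.7078554
ppli_GH_pred 0.0172479 0.0033171 0.9794350
ppli_WUG1_pred 0.0000149 0.2091084 0.7908767")

probs <- as.matrix(tab)

# Each row should sum to 1 up to rounding
stopifnot(all(abs(rowSums(probs) - 1) < 1e-6))

# Best supported model and its probability for each row
best <- colnames(probs)[max.col(probs, ties.method = "first")]
res <- data.frame(best_model = best,
                  prob = apply(probs, 1, max),
                  row.names = rownames(probs))
print(res)
print(table(res$best_model))
```

The highest probability alone does not say how decisive a result is. Several rows have a winning probability below 0.5, so the `prob` column should be read together with the cross validation results for the same data set.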

Marios Frog ABC
