Inference of demography using R and ABC

The task we need to solve is: **Infer the expansion model parameters for each of the datasets and decide which is the “mild” and which is the “rapid” expansion.

First, we clean the memory in R and we load the necessary libraries.

## clean the memory
rm(list=ls())
## load the abc package
library(abc)

Next, we need to read in the datasets. You have two datasets, ms1 and ms2. One of them is coming from a mild expansion. The other one is coming from a more rapid expansion. The task we need to solve is: **Infer the expansion model parameters for each of the datasets and decide which is the “mild” and which is the “rapid” expansion.

## read in the observed summary statistics                                                                                                                               
obs.stat <- read.table("ms2.stat", header=TRUE)
## read in the simulation output                                                                                                                                         
sim.info <- read.table("sims.data", header=TRUE)
## parameters are the first two columns
sim.param <- sim.info[, 1:2]
## Get sumstat. We remove summary statistic 1, it's colinear with another one
sim.stat <- sim.info[, 4:ncol(sim.info)]
## also from the observed dataset
obs.stat <- obs.stat[,2:ncol(obs.stat)]

Let's see how many parameters and summary statistics we have

dim(obs.stat)
## [1] 1 9
dim(sim.stat)
## [1] 2000    9
dim(sim.param)
## [1] 2000    2

Summary statistics are correlated between themselves and with the parameters

plot(sim.info[1:100, 3:ncol(sim.info)], cex=0.6)
}
## Error: <text>:2:1: unexpected '}'
## 1: plot(sim.info[1:100, 3:ncol(sim.info)], cex=0.6)
## 2: }
##    ^

Summary statistics capture only some information that is related to our parameters. Also, not all summary statistics are equally informative. Some are more related to parameters (or some of them). Ideally we would like to use the most informative statistics. The rest are just 'producing noise'.

\( \color{red}{\text{Which statistics are the most informative}}? \)

#layout(matrix(1:(2*9), ncol=2, byrow=F), widths=3, heights=30)
par(mfrow=c(9,2), las=1)
par(mar=c(5,3,3,1))
for(i in 1:2){
  for(j in 1:ncol(sim.stat)){
    plot(sim.stat[1:200,j], sim.param[1:200,i], pch=19, col='blue', cex=0.6, xlab="", ylab="", main=paste(colnames(sim.stat)[j], "\nvs ", colnames(sim.param)[i]) )
  }
}