Bottleneck For The Win!

Get Files & Stuff

setwd("~/projects/R_reports/")
system("rsync -nc farm:~/projects/bneck/results/\\* ~/projects/R_reports/data")
library(ggplot2)

Functions

# pass file vector of files names, vector of s, vector of h
make.frame <- function(x, sel, h) {
    namex <- paste("./data/neutral.", x, ".txt.stats", sep = "")
    z = data.frame()
    for (i in 1:length(namex)) {
        file <- read.table(namex[i], header = T)
        s <- rep(sel[i], length(file$pi))
        h <- rep(h[i], length(file$pi))
        y = cbind(file, s)
        y = cbind(y, h)
        z = rbind(z, y)
    }
    return(z)
}

Round 1

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 18 12 18 s 0.1 150 5000 100000 900 20 10 78762

and s of 0, -1e-06, -0.001, -0.01

Data

files <- c(190078, 191078, 192078, 193078)
s <- c(0, -0.001, -1e-06, -0.01)
h <- rep(0.1, 4)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.190078.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at Neutral Sites

ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

Conclusions Round 1:

Striking difference in D in direction observed in maize. BUT burn-in totally inadequate unless I'm misunderstanding fwdpp.

Not much difference for large values of s.

To try:

Round 2: longer burn-in

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 18 12 18 s 0.1 15000 5000 100000 900 20 10 78762

and s of 0, -1e-05

Data

files = c(229003, 230003)
s = c(0, -1e-05)
h = c(0.1, 0.1)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.229003.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at neutral sites

ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

Conclusions Round 2:

Burn-in matters (duh). Difference in D much less striking.

To try:

Round 3: Serious Burn-in

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 18 12 18 s 0.1 90000 5000 100000 900 20 10 78762

to allow 6N generations of burn-in to reach equilibrium.

Data

files <- c(231007, 233007, 234008, 235008)
s = c(0, -0.001, -1e-05, -1e-07)
h = rep(0.1, 4)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.231007.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at Neutral Sites

library(ggplot2)
ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
library(ggplot2)
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

Conclusions Round 3:

D seems to depend on s. For N3*s ~ 1 we get more positive D with BGS, but weak signal. Results from Round #1 make me think that having less variation to start with (i.e. stronger bottleneck) might be important. Will eventually want to ramp burn-in up to 10N generations.

To try:

Round 4: tighter bneck

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 18 12 18 s 0.1 90000 5000 100000 900 20 10 78762

to allow 6N generations of burn-in to reach equilibrium.

Data

files <- c(236030, 237030, 238030, 239030)
s = c(0, -1e-05, -1e-04, -1e-06)
h = rep(0.1, 4)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.236030.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at Neutral Sites

ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

D significanlty depends on s

summary(lm(dat$tajd ~ dat$s))
## Error: object 'dat' not found

Round 5: 10N gens, range of s, redo theta for neutral/selected

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 6 12 18 s h 150000 1500 100000 900 20 10 78762

to allow 10N generations of burn-in to reach equilibrium. Redid \( \theta \) so that 1/3 of mutations are neutral (\( \theta_S=6 \)), 2/3 deleterious (\( \theta_N=12 \)) with (\( \theta_T=\theta_N+\theta_S \)) . \( \rho=\theta_T=18 \) sApproximates a ~1Kb region.

Data

files <- c(275286, 276286, 277286, 278286, 279286)
s = c(0, -0.001, -1e-04, -1e-05, -1e-06)
h = rep(0.1, 5)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.275286.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at Neutral Sites

ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

D significanlty depends on s

summary(lm(dat$tajd ~ dat$s))
## Error: object 'dat' not found

Round 6: 10N gens, range of s, redo theta for neutral/selected, teosinte i.e. no size change.

Sims were done with:

bneck_selection $N $neutral_theta $selected_theta $rho $s $h $g1 $N2 $N3 $g2 $n $nreps $seed  

which corresponds to

bneck_selection 15000 6 12 18 s h 150000 1500 100000 900 20 10 78762

to allow 10N generations of burn-in to reach equilibrium. Redid \( \theta \) so that 1/3 of mutations are neutral (\( \theta_S=6 \)), 2/3 deleterious (\( \theta_N=12 \)) with (\( \theta_T=\theta_N+\theta_S \)) . \( \rho=\theta_T=18 \) sApproximates a ~1Kb region.

Data

files <- c(271286, 272286, 273286, 274286)
s = c(0, -1e-05, -1e-04, -0.001)
h = rep(0.1, 7)
dat <- make.frame(files, s, h)
## Warning: cannot open file './data/neutral.271286.txt.stats': No such file
## or directory
## Error: cannot open the connection

Plot D and Pi at Neutral Sites

ggplot(data = dat, aes(factor(s), pi)) + geom_boxplot() + xlab("s") + ylab(expression(pi))
## Error: object 'dat' not found
ggplot(data = dat, aes(factor(s), tajd)) + geom_boxplot() + xlab("s") + ylab("Tajima's D")
## Error: object 'dat' not found

D significanlty depends on s

summary(lm(dat$tajd ~ dat$s))
## Error: object 'dat' not found