Hollie asked me to use the bioconductor package qvalue() to try and identify the false detection rate for the Geoduck DMR/Ls we’ve been identifying. I attempted to read the paper behind the q-value located here. It sort of, almost made sense, so I figured I’d progress to actually playing with the package.

First, we need to install the package. It comes from the Bioconductor repo. There were a lot of other things that neeed updated, so I went ahead and did that. I won’t make you suffer through the output, because it’s long and very boring.

library(readr)
library(BiocInstaller)
#biocLite("qvalue")
library(qvalue)

before we explore our Geoduck data, I want to explore their test dataset “hedenfalk” that they use in their documentation to see what an “expected” result looks like. Hedenfalk is a gene expression study on breastcancer with 3226 genes and 3170 two sample t-tests run between the genes.

stat is a vector of the t-test t-statistics, and stat0 is simulated t-tests from permutations on the labels, which generates a null distribution based on the data.

empPvals calculates p-values from supplied obsered test-statistics and simulated null distribution

data(hedenfalk)
stat <- hedenfalk$stat
stat0 <- hedenfalk$stat0
p.pooled <- empPvals(stat = stat, stat0 = stat0)
p.testspecific <- empPvals(stat = stat, stat0=stat0, pool= FALSE)

next we use a quantile-quantile plot to make sure that our observed and our simulated null distributions come from the same distribution.

qqplot(p.pooled, p.testspecific);abline(0,1)

And they do! so hooray. Lets make some q scores.

qobj <- qvalue(p.pooled)
summary(qobj)

Call:
qvalue(p = p.pooled)

pi0:    0.669926    

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1   <1
p-value       15     76   265    424   605  868 3170
q-value        0      0     1     73   162  319 3170
local FDR      0      0     3     30    85  167 2241

The summary above gives us cumulative sigificant results for different q-value cutoffs, as well as a theoretical number of false detectections at a given q-value cutoff.

hist(qobj)

plot(qobj)

Lets look at our data now

results_10x <- read_delim("~/Documents/Genewiz_hdd/Day10/macau/output/result.assoc.txt", 
    "\t", escape_double = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
  id = col_character(),
  n = col_integer(),
  acpt_rate = col_double(),
  beta = col_double(),
  se_beta = col_double(),
  pvalue = col_double(),
  h = col_double(),
  se_h = col_double(),
  sigma2 = col_double(),
  se_sigma2 = col_double(),
  alpha0 = col_double(),
  se_alpha0 = col_double()
)
head(results_10x)

We care about the pvalue column, it represents the p-value that the beta parameter adds meaningful information to the overall model developed by program. Beta is the parameter for the treatment variable, so if it’s significant, that means the treatment is likely meaningful.

test.q <- qvalue(results_10x$pvalue)
summary(test.q)

Call:
qvalue(p = results_10x$pvalue)

pi0:    1   

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1   <1
p-value        0      1    11     22    41  103 1974
q-value        0      0     0      0     0    0 1974
local FDR      0      0     0      0     0    0    1
hist(test.q)

plot(test.q)

Those… don’t look anything like the example data.

Lets try it again, with a more liberal cutoff point applied to trimming the data.

results_5x <- read_delim("~/Documents/Genewiz_hdd/Day10/macauv2/output/5x.assoc.txt", 
    "\t", escape_double = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
  id = col_character(),
  n = col_integer(),
  acpt_rate = col_double(),
  beta = col_double(),
  se_beta = col_double(),
  pvalue = col_double(),
  h = col_double(),
  se_h = col_double(),
  sigma2 = col_double(),
  se_sigma2 = col_double(),
  alpha0 = col_double(),
  se_alpha0 = col_double()
)
nrow(results_5x)
[1] 8936
head(results_5x)
qval.5x <- qvalue(results_5x$pvalue)
summary(qval.5x)

Call:
qvalue(p = results_5x$pvalue)

pi0:    1   

Cumulative number of significant calls:

          <1e-04 <0.001 <0.01 <0.025 <0.05 <0.1   <1
p-value        3     16    64    144   300  625 8936
q-value        0      0     1      1     2    2 8936
local FDR      0      0     1      1     1    1   18

That’s… a little more heartening. Though still it seems like a very low significance rate via q-value.

hist(qval.5x)

plot(qval.5x)

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

LS0tCnRpdGxlOiAiRnVuIHdpdGggcXZhbHVlKCkiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCkhvbGxpZSBhc2tlZCBtZSB0byB1c2UgdGhlIGJpb2NvbmR1Y3RvciBwYWNrYWdlIHF2YWx1ZSgpIHRvIHRyeSBhbmQgaWRlbnRpZnkgdGhlIGZhbHNlIGRldGVjdGlvbiByYXRlIGZvciB0aGUgR2VvZHVjayBETVIvTHMgd2UndmUgYmVlbiBpZGVudGlmeWluZy4gSSBhdHRlbXB0ZWQgdG8gcmVhZCB0aGUgcGFwZXIgYmVoaW5kIHRoZSBxLXZhbHVlIGxvY2F0ZWQgW2hlcmVdKGh0dHA6Ly93d3cuZ2Vub21pbmUub3JnL3BhcGVycy9kaXJlY3RmZHIucGRmKS4gSXQgc29ydCBvZiwgYWxtb3N0IG1hZGUgc2Vuc2UsIHNvIEkgZmlndXJlZCBJJ2QgcHJvZ3Jlc3MgdG8gYWN0dWFsbHkgcGxheWluZyB3aXRoIHRoZSBwYWNrYWdlLgoKRmlyc3QsIHdlIG5lZWQgdG8gaW5zdGFsbCB0aGUgcGFja2FnZS4gSXQgY29tZXMgZnJvbSB0aGUgQmlvY29uZHVjdG9yIHJlcG8uIFRoZXJlIHdlcmUgYSBsb3Qgb2Ygb3RoZXIgdGhpbmdzIHRoYXQgbmVlZWQgdXBkYXRlZCwgc28gSSB3ZW50IGFoZWFkIGFuZCBkaWQgdGhhdC4gSSB3b24ndCBtYWtlIHlvdSBzdWZmZXIgdGhyb3VnaCB0aGUgb3V0cHV0LCBiZWNhdXNlIGl0J3MgbG9uZyBhbmQgdmVyeSBib3JpbmcuIAoKYGBge3J9CgpsaWJyYXJ5KHJlYWRyKQpsaWJyYXJ5KEJpb2NJbnN0YWxsZXIpCgojYmlvY0xpdGUoInF2YWx1ZSIpCgpsaWJyYXJ5KHF2YWx1ZSkKYGBgCgpiZWZvcmUgd2UgZXhwbG9yZSBvdXIgR2VvZHVjayBkYXRhLCBJIHdhbnQgdG8gZXhwbG9yZSB0aGVpciB0ZXN0IGRhdGFzZXQgImhlZGVuZmFsayIgdGhhdCB0aGV5IHVzZSBpbiB0aGVpciBkb2N1bWVudGF0aW9uIHRvIHNlZSB3aGF0IGFuICJleHBlY3RlZCIgcmVzdWx0IGxvb2tzIGxpa2UuIEhlZGVuZmFsayBpcyBhIGdlbmUgZXhwcmVzc2lvbiBzdHVkeSBvbiBicmVhc3RjYW5jZXIgd2l0aCAzMjI2IGdlbmVzIGFuZCAzMTcwIHR3byBzYW1wbGUgdC10ZXN0cyBydW4gYmV0d2VlbiB0aGUgZ2VuZXMuCgpzdGF0IGlzIGEgdmVjdG9yIG9mIHRoZSB0LXRlc3QgdC1zdGF0aXN0aWNzLCBhbmQgc3RhdDAgaXMgc2ltdWxhdGVkIHQtdGVzdHMgZnJvbSBwZXJtdXRhdGlvbnMgb24gdGhlIGxhYmVscywgd2hpY2ggZ2VuZXJhdGVzIGEgbnVsbCBkaXN0cmlidXRpb24gYmFzZWQgb24gdGhlIGRhdGEuCgplbXBQdmFscyBjYWxjdWxhdGVzIHAtdmFsdWVzIGZyb20gc3VwcGxpZWQgb2JzZXJlZCB0ZXN0LXN0YXRpc3RpY3MgYW5kIHNpbXVsYXRlZCBudWxsIGRpc3RyaWJ1dGlvbgpgYGB7cn0KCmRhdGEoaGVkZW5mYWxrKQoKc3RhdCA8LSBoZWRlbmZhbGskc3RhdApzdGF0MCA8LSBoZWRlbmZhbGskc3RhdDAKCnAucG9vbGVkIDwtIGVtcFB2YWxzKHN0YXQgPSBzdGF0LCBzdGF0MCA9IHN0YXQwKQpwLnRlc3RzcGVjaWZpYyA8LSBlbXBQdmFscyhzdGF0ID0gc3RhdCwgc3RhdDA9c3RhdDAsIHBvb2w9IEZBTFNFKQoKCmBgYAoKbmV4dCB3ZSB1c2UgYSBxdWFudGlsZS1xdWFudGlsZSBwbG90IHRvIG1ha2Ugc3VyZSB0aGF0IG91ciBvYnNlcnZlZCBhbmQgb3VyIHNpbXVsYXRlZCBudWxsIGRpc3RyaWJ1dGlvbnMgY29tZSBmcm9tIHRoZSBzYW1lIGRpc3RyaWJ1dGlvbi4KCmBgYHtyfQpxcXBsb3QocC5wb29sZWQsIHAudGVzdHNwZWNpZmljKTthYmxpbmUoMCwxKQpgYGAKCkFuZCB0aGV5IGRvISBzbyBob29yYXkuIExldHMgbWFrZSBzb21lIHEgc2NvcmVzLgoKYGBge3J9CnFvYmogPC0gcXZhbHVlKHAucG9vbGVkKQoKc3VtbWFyeShxb2JqKQpgYGAKClRoZSBzdW1tYXJ5IGFib3ZlIGdpdmVzIHVzIGN1bXVsYXRpdmUgc2lnaWZpY2FudCByZXN1bHRzIGZvciBkaWZmZXJlbnQgcS12YWx1ZSBjdXRvZmZzLCBhcyB3ZWxsIGFzIGEgdGhlb3JldGljYWwgbnVtYmVyIG9mIGZhbHNlIGRldGVjdGVjdGlvbnMgYXQgYSBnaXZlbiBxLXZhbHVlIGN1dG9mZi4gCgpgYGB7cn0KCmhpc3QocW9iaikKCmBgYAoKCmBgYHtyfQoKcGxvdChxb2JqKQoKYGBgCgpMZXRzIGxvb2sgYXQgb3VyIGRhdGEgbm93CgpgYGB7cn0KCnJlc3VsdHNfMTB4IDwtIHJlYWRfZGVsaW0oIn4vRG9jdW1lbnRzL0dlbmV3aXpfaGRkL0RheTEwL21hY2F1L291dHB1dC9yZXN1bHQuYXNzb2MudHh0IiwgCiAgICAiXHQiLCBlc2NhcGVfZG91YmxlID0gRkFMU0UsIHRyaW1fd3MgPSBUUlVFKQoKCmhlYWQocmVzdWx0c18xMHgpCmBgYAoKV2UgY2FyZSBhYm91dCB0aGUgcHZhbHVlIGNvbHVtbiwgaXQgcmVwcmVzZW50cyB0aGUgcC12YWx1ZSB0aGF0IHRoZSBiZXRhIHBhcmFtZXRlciBhZGRzIG1lYW5pbmdmdWwgaW5mb3JtYXRpb24gdG8gdGhlIG92ZXJhbGwgbW9kZWwgZGV2ZWxvcGVkIGJ5IHByb2dyYW0uIEJldGEgaXMgdGhlIHBhcmFtZXRlciBmb3IgdGhlIHRyZWF0bWVudCB2YXJpYWJsZSwgc28gaWYgaXQncyBzaWduaWZpY2FudCwgdGhhdCBtZWFucyB0aGUgdHJlYXRtZW50IGlzIGxpa2VseSBtZWFuaW5nZnVsLgoKYGBge3J9Cgp0ZXN0LnEgPC0gcXZhbHVlKHJlc3VsdHNfMTB4JHB2YWx1ZSkKCnN1bW1hcnkodGVzdC5xKQpgYGAKCgoKYGBge3J9CgpoaXN0KHRlc3QucSkKCmBgYAoKYGBge3J9CgpwbG90KHRlc3QucSkKCmBgYAoKVGhvc2UuLi4gZG9uJ3QgbG9vayBhbnl0aGluZyBsaWtlIHRoZSBleGFtcGxlIGRhdGEuCgpMZXRzIHRyeSBpdCBhZ2Fpbiwgd2l0aCBhIG1vcmUgbGliZXJhbCBjdXRvZmYgcG9pbnQgYXBwbGllZCB0byB0cmltbWluZyB0aGUgZGF0YS4KCmBgYHtyfQoKcmVzdWx0c181eCA8LSByZWFkX2RlbGltKCJ+L0RvY3VtZW50cy9HZW5ld2l6X2hkZC9EYXkxMC9tYWNhdXYyL291dHB1dC81eC5hc3NvYy50eHQiLCAKICAgICJcdCIsIGVzY2FwZV9kb3VibGUgPSBGQUxTRSwgdHJpbV93cyA9IFRSVUUpCgpucm93KHJlc3VsdHNfNXgpCgpoZWFkKHJlc3VsdHNfNXgpCmBgYAoKYGBge3J9CgpxdmFsLjV4IDwtIHF2YWx1ZShyZXN1bHRzXzV4JHB2YWx1ZSkKCnN1bW1hcnkocXZhbC41eCkKYGBgCgpUaGF0J3MuLi4gYSBsaXR0bGUgbW9yZSBoZWFydGVuaW5nLiBUaG91Z2ggc3RpbGwgaXQgc2VlbXMgbGlrZSBhIHZlcnkgbG93IHNpZ25pZmljYW5jZSByYXRlIHZpYSBxLXZhbHVlLgoKYGBge3J9CgpoaXN0KHF2YWwuNXgpCgpgYGAKCmBgYHtyfQoKcGxvdChxdmFsLjV4KQoKYGBgCgpBZGQgYSBuZXcgY2h1bmsgYnkgY2xpY2tpbmcgdGhlICpJbnNlcnQgQ2h1bmsqIGJ1dHRvbiBvbiB0aGUgdG9vbGJhciBvciBieSBwcmVzc2luZyAqQ3RybCtBbHQrSSouCgpXaGVuIHlvdSBzYXZlIHRoZSBub3RlYm9vaywgYW4gSFRNTCBmaWxlIGNvbnRhaW5pbmcgdGhlIGNvZGUgYW5kIG91dHB1dCB3aWxsIGJlIHNhdmVkIGFsb25nc2lkZSBpdCAoY2xpY2sgdGhlICpQcmV2aWV3KiBidXR0b24gb3IgcHJlc3MgKkN0cmwrU2hpZnQrSyogdG8gcHJldmlldyB0aGUgSFRNTCBmaWxlKS4K