curatedMetagenomicData and MultiAssayExperiment
“We understand meta-analysis as being the use of statistical techniques to combine the results of studies addressing the same question into a summary measure.”
Villar et al. (2001)
GEOquery::getGEO() is a workhorse
getGEO2(): consolidate and simplify getGEO() output
geoPmidLookup(): look up experiment and publication data from GEO and PubMed, put in a data frame
## biocLite("lwaldron/LeviRmisc")
library(LeviRmisc)
df <- geoPmidLookup(c("GSE26712", "PMID18593951"))
## [1] "WARNING: please set your email using Sys.setenv(email='name@email.com')"
df[, c(1:3, 15, 16)]
##              pubMedIds platform_accession platform_summary     journal
## GSE26712      18593951              GPL96          hgu133a Cancer Res.
## PMID18593951  18593951               <NA>             <NA> Cancer Res.
##              volume
## GSE26712         68
## PMID18593951     68
C: Matching RNAseq to microarray, D: matching cell lines between CCLE and NCI-60
Waldron L, et al.: The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles. J. Natl. Cancer Inst. 2016, 108.
Q-test: Under the null hypothesis of no heterogeneity between studies (\(\tau = 0\)), \[ Q \sim \chi^2_{K-1} \]
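For reference, a standard form of Cochran's Q statistic can be written as follows. The notation is an assumption introduced here, not taken from the slides: \(\hat\theta_k\) and \(\mathrm{SE}_k\) denote the effect estimate and its standard error from study \(k\), and \(K\) is the number of studies.

\[ w_k = \frac{1}{\mathrm{SE}_k^2}, \qquad \hat\theta_{FE} = \frac{\sum_{k=1}^{K} w_k \hat\theta_k}{\sum_{k=1}^{K} w_k}, \qquad Q = \sum_{k=1}^{K} w_k \left(\hat\theta_k - \hat\theta_{FE}\right)^2 \]

A value of Q that is large relative to the \(\chi^2_{K-1}\) distribution is evidence of heterogeneity between studies.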
Load the curatedOvarianData package, look at available datasets:
library(curatedOvarianData)
data(package="curatedOvarianData")
Load (and check out) rules defined in default configuration file:
downloader::download("https://bitbucket.org/lwaldron/ovrc4_sigvalidation/raw/tip/input/patientselection.config",
destfile="patientselection.config")
source("patientselection.config")
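impute.missing <- TRUE
keep.common.only <- TRUE
## Assumed step, not shown in the original: build the list "esets" used below
## with the script shipped in curatedOvarianData
source(system.file("extdata", "createEsetList.R", package="curatedOvarianData"))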
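runCox <- function(eset, probeset="CXCL12"){
  library(survival)
  ## Overall survival: time to death, with death as the event
  eset$y <- Surv(eset$days_to_death, eset$vital_status == "deceased")
  if(probeset %in% featureNames(eset)){
    ## Cox model on the standardized expression of a single gene
    obj <- coxph(eset$y ~ scale(t(exprs(eset[probeset, ]))[, 1]))
    ## Return the log hazard ratio and its standard error
    output <- c(obj$coefficients, sqrt(obj$var))
    names(output) <- c("log.HR", "SE")
  }else{output <- NULL}  # gene not measured in this dataset
  output}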
runCox(esets[[1]])
## log.HR SE
## 0.1080378 0.1167063
(study.coefs <- t(sapply(esets, runCox)))
##                             log.HR         SE
## E.MTAB.386_eset        0.108037829 0.11670634
## GSE13876_eset         -0.015533625 0.12165106
## GSE17260_eset          0.196604844 0.22132140
## GSE18520_eset          0.004334577 0.15785733
## GSE19829.GPL8300_eset  0.072413433 0.19658498
## GSE26193_eset         -0.035518891 0.16886806
## GSE26712_eset          0.205703027 0.09889057
## GSE32062.GPL6480_eset -0.035661806 0.17253159
## GSE49997_eset          0.386074941 0.17795245
## GSE51088_eset          0.208534008 0.11565319
## GSE9891_eset          -0.015481600 0.11555760
## PMID17290060_eset      0.356194786 0.14969168
## TCGA_eset              0.102434252 0.07029190
## TCGA.RNASeqV2_eset     0.077791413 0.10215517
library(metafor)
res.fe <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="FE")
forest.rma(res.fe, slab=gsub("_eset$","",rownames(study.coefs)), atransf=exp)
(res.re <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="DL"))
##
## Random-Effects Model (k = 14; tau^2 estimator: DL)
##
## tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0062)
## tau (square root of estimated tau^2 value): 0
## I^2 (total heterogeneity / total variability): 0.00%
## H^2 (total variability / sampling variability): 1.00
##
## Test for Heterogeneity:
## Q(df = 13) = 11.2219, p-val = 0.5922
##
## Model Results:
##
## estimate      se    zval    pval   ci.lb   ci.ub
##   0.1108  0.0329  3.3664  0.0008  0.0463  0.1754  ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
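The pooled estimate and the heterogeneity test in the output above can be checked by hand. The sketch below uses base R only and re-enters the log hazard ratios and standard errors printed in the study.coefs table. The variable names (log.hr, se, w, denom and so on) are illustrative and not from the original. It assumes inverse-variance weights and the DerSimonian-Laird moment estimator of tau^2, which is what method="DL" requests.

```r
## Log hazard ratios and standard errors copied from the study.coefs table
log.hr <- c(0.108037829, -0.015533625, 0.196604844, 0.004334577,
            0.072413433, -0.035518891, 0.205703027, -0.035661806,
            0.386074941, 0.208534008, -0.015481600, 0.356194786,
            0.102434252, 0.077791413)
se <- c(0.11670634, 0.12165106, 0.22132140, 0.15785733,
        0.19658498, 0.16886806, 0.09889057, 0.17253159,
        0.17795245, 0.11565319, 0.11555760, 0.14969168,
        0.07029190, 0.10215517)

K <- length(log.hr)                     # number of studies
w <- 1 / se^2                           # inverse-variance weights
est.fe <- sum(w * log.hr) / sum(w)      # fixed-effects pooled log HR
se.fe <- sqrt(1 / sum(w))               # standard error of the pooled log HR
Q <- sum(w * (log.hr - est.fe)^2)       # Cochran's Q
p.Q <- pchisq(Q, df = K - 1, lower.tail = FALSE)

## DerSimonian-Laird estimate of tau^2, truncated at zero
denom <- sum(w) - sum(w^2) / sum(w)
tau2 <- max(0, (Q - (K - 1)) / denom)

round(c(estimate = est.fe, se = se.fe, Q = Q, p = p.Q, tau2 = tau2), 4)

## Pooled hazard ratio with a 95% confidence interval
exp(est.fe + c(HR = 0, lower = -1, upper = 1) * qnorm(0.975) * se.fe)
```

Because Q is smaller than its degrees of freedom (K - 1 = 13), the truncated estimate of tau^2 is 0, so the random-effects and fixed-effects results coincide for these data, in line with the rma output.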
Leave-one-dataset-out validation of a survival signature. (Riester et al. JNCI 2014)
“Improvement over random signatures (IOR)” score of gene signatures relative to random gene signatures, equalizing the influences of authors’ algorithms for generating risk scores, quality of the original training data, and gene signature size (Waldron et al. JNCI 2014).
curatedMetagenomicData, available through ExperimentHub in bioc-devel
ExpressionSet objects per dataset: … phyloseq)
Pasolli E et al.: Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights. PLoS Comput. Biol. 2016, 12:e1004977.
library(ExperimentHub)
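eh <- ExperimentHub()
## Find all ExperimentHub records from curatedMetagenomicData
myquery <- query(eh, "curatedMetagenomicData")
myquery
## Browse the record metadata interactively
View(mcols(myquery))
subquery <- display(myquery)
## Download one resource by its ExperimentHub ID
taxabund <- eh[["EH2"]]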
taxabund
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 3302 features, 38 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: H10 H11 ... IT8 (38 total)
## varLabels: dataset_name sampleID ... group (211 total)
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## pubMedIds: 25981789
## Annotation: NA