Meta-analysis

Levi Waldron
CUNY School of Public Health
http://www.waldronlab.org

July 15, 2016

Outline

Scope: what is meta-analysis?

“We understand meta-analysis as being the use of statistical techniques to combine the results of studies addressing the same question into a summary measure.”

Villar et al. (2001)

Preparation: downloading datasets

Preparation: downloading datasets (cont’d)

## biocLite("lwaldron/LeviRmisc")
library(LeviRmisc)
df <- geoPmidLookup(c("GSE26712", "PMID18593951"))
## [1] "WARNING: please set your email using Sys.setenv(email='name@email.com')"
df[, c(1:3, 15, 16)]
##              pubMedIds platform_accession platform_summary     journal
## GSE26712      18593951              GPL96          hgu133a Cancer Res.
## PMID18593951  18593951               <NA>             <NA> Cancer Res.
##              volume
## GSE26712         68
## PMID18593951     68

Preparation: curation

Preparation: preprocessing and gene mapping

Preparation: duplicate checking

C: Matching RNAseq to microarray, D: matching cell lines between CCLE and NCI-60

Waldron L, et al.: The Doppelgänger Effect: Hidden Duplicates in Databases of Transcriptome Profiles. J. Natl. Cancer Inst. 2016, 108.
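
The idea behind duplicate checking can be illustrated with a small base R sketch: correlate every sample in one study with every sample in another and flag pairs whose correlation is far above the bulk. This is a simplified illustration on simulated toy data, not the published method; the matrices `study1` and `study2`, the planted duplicate, and the 4-standard-deviation cutoff are all assumptions made for the example.

```r
set.seed(42)
genes <- paste0("gene", 1:200)
# Hypothetical toy expression matrices (genes x samples) for two studies
study1 <- matrix(rnorm(200 * 10), nrow = 200,
                 dimnames = list(genes, paste0("s1_", 1:10)))
study2 <- matrix(rnorm(200 * 12), nrow = 200,
                 dimnames = list(genes, paste0("s2_", 1:12)))
# Plant one hidden duplicate: sample 3 of study1 reappears, with
# measurement noise, as sample 7 of study2
study2[, 7] <- study1[, 3] + rnorm(200, sd = 0.3)

# Pearson correlation between every cross-study pair of samples
cors <- cor(study1, study2)
# Fisher z-transform, then flag pairs far above the rest (assumed cutoff)
z <- atanh(cors)
cutoff <- mean(z) + 4 * sd(z)
flagged <- which(z > cutoff, arr.ind = TRUE)
data.frame(sample1 = rownames(cors)[flagged[, "row"]],
           sample2 = colnames(cors)[flagged[, "col"]],
           correlation = cors[flagged])
```

In real data, batch effects and shared biology shift the whole correlation distribution, so a fixed cutoff like this one would need to be replaced by a more careful outlier rule.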

Fixed and Random Effects Synthesis

Assessing Heterogeneity

Example 1: Is the CXCL12 gene a prognostic factor for ovarian cancer?

Load the curatedOvarianData package, look at available datasets:

library(curatedOvarianData)
data(package="curatedOvarianData")

Load (and inspect) the rules defined in the default configuration file:

downloader::download("https://bitbucket.org/lwaldron/ovrc4_sigvalidation/raw/tip/input/patientselection.config", 
                     destfile="patientselection.config")
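source("patientselection.config")
impute.missing <- TRUE
keep.common.only <- TRUE
# Build the list of ExpressionSets (`esets`) according to the rules above
source(system.file("extdata", "createEsetList.R", package="curatedOvarianData"))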

Example 1 (cont’d)

library(survival)
# Univariate Cox regression of overall survival on one scaled gene.
# Returns the log hazard ratio per standard deviation and its standard error.
runCox <- function(eset, probeset="CXCL12"){
  if(!probeset %in% featureNames(eset))
    return(c(log.HR=NA_real_, SE=NA_real_))  # gene not measured in this dataset
  y <- Surv(eset$days_to_death, eset$vital_status == "deceased")
  x <- scale(exprs(eset)[probeset, ])[, 1]
  obj <- coxph(y ~ x)
  c(log.HR=unname(coef(obj)), SE=sqrt(obj$var[1, 1]))
}
runCox(esets[[1]])
##    log.HR        SE 
## 0.1080378 0.1167063

Example 1 (cont’d)

(study.coefs <- t(sapply(esets, runCox)))
##                             log.HR         SE
## E.MTAB.386_eset        0.108037829 0.11670634
## GSE13876_eset         -0.015533625 0.12165106
## GSE17260_eset          0.196604844 0.22132140
## GSE18520_eset          0.004334577 0.15785733
## GSE19829.GPL8300_eset  0.072413433 0.19658498
## GSE26193_eset         -0.035518891 0.16886806
## GSE26712_eset          0.205703027 0.09889057
## GSE32062.GPL6480_eset -0.035661806 0.17253159
## GSE49997_eset          0.386074941 0.17795245
## GSE51088_eset          0.208534008 0.11565319
## GSE9891_eset          -0.015481600 0.11555760
## PMID17290060_eset      0.356194786 0.14969168
## TCGA_eset              0.102434252 0.07029190
## TCGA.RNASeqV2_eset     0.077791413 0.10215517

Example 1 (cont’d): forest plot

library(metafor)
res.fe <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="FE")
forest(res.fe, slab=gsub("_eset$","",rownames(study.coefs)), atransf=exp)
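
The fixed-effects summary drawn at the bottom of the forest plot is an inverse-variance weighted average, which can be reproduced in a few lines of base R. The sketch below re-enters the log hazard ratios and standard errors from the `study.coefs` printout above; the names `log.hr`, `se`, `w` and `pooled` are illustrative and not part of metafor.

```r
# log HR and SE for the 14 datasets, copied from the study.coefs printout
log.hr <- c(0.108037829, -0.015533625, 0.196604844, 0.004334577,
            0.072413433, -0.035518891, 0.205703027, -0.035661806,
            0.386074941, 0.208534008, -0.015481600, 0.356194786,
            0.102434252, 0.077791413)
se <- c(0.11670634, 0.12165106, 0.22132140, 0.15785733,
        0.19658498, 0.16886806, 0.09889057, 0.17253159,
        0.17795245, 0.11565319, 0.11555760, 0.14969168,
        0.07029190, 0.10215517)

w <- 1 / se^2                        # inverse-variance weights
est <- sum(w * log.hr) / sum(w)      # pooled log hazard ratio
est.se <- sqrt(1 / sum(w))           # standard error of the pooled estimate
ci <- est + c(-1, 1) * qnorm(0.975) * est.se
pooled <- c(log.HR = est, SE = est.se, ci.lb = ci[1], ci.ub = ci[2],
            pval = 2 * pnorm(-abs(est / est.se)))
round(pooled, 4)
# Back-transform to the hazard-ratio scale, as atransf=exp does in the plot
round(exp(pooled[c("log.HR", "ci.lb", "ci.ub")]), 3)
```

On these inputs the pooled log HR is about 0.111, a hazard ratio of about 1.12 per standard deviation of CXCL12 expression. Large studies with small standard errors, such as TCGA, dominate the weights.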

Example 1 (cont’d): FE vs. RE

(res.re <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="DL"))
## 
## Random-Effects Model (k = 14; tau^2 estimator: DL)
## 
## tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0062)
## tau (square root of estimated tau^2 value):      0
## I^2 (total heterogeneity / total variability):   0.00%
## H^2 (total variability / sampling variability):  1.00
## 
## Test for Heterogeneity: 
## Q(df = 13) = 11.2219, p-val = 0.5922
## 
## Model Results:
## 
## estimate       se     zval     pval    ci.lb    ci.ub          
##   0.1108   0.0329   3.3664   0.0008   0.0463   0.1754      *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
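
The heterogeneity lines of this output follow from Cochran's Q and its degrees of freedom. As a rough check, assuming the standard DerSimonian-Laird and Higgins formulas (the variable names are illustrative), the printed values can be recovered by hand:

```r
Q  <- 11.2219   # Cochran's Q from the output above
df <- 13        # k - 1, with k = 14 studies

# p-value of the heterogeneity test: Q follows a chi-squared(df)
# distribution when all studies share one true effect
p.het <- pchisq(Q, df = df, lower.tail = FALSE)

# The DerSimonian-Laird estimate is tau^2 = max(0, (Q - df) / C), where C is
# a positive constant computed from the study weights, so whether it is
# truncated at zero depends only on the sign of Q - df
tau2.is.zero <- (Q - df) <= 0

# Higgins' I^2: share of total variability attributed to heterogeneity
I2 <- 100 * max(0, (Q - df) / Q)

round(c(p.het = p.het, I2 = I2), 4)
tau2.is.zero
```

Because Q is smaller than its degrees of freedom, tau^2 is truncated at 0, and the random-effects fit then coincides with a fixed-effects fit of the same data.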

Example 1 (cont’d): closing comments

Example 2: Leave-one-dataset-out validation

Leave-one-dataset-out validation of a survival signature (Riester et al. JNCI 2014).
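
The leave-one-dataset-out loop can be sketched as follows: each dataset is held out in turn, a model is trained on the pooled remaining datasets, and the resulting risk score is evaluated in the held-out dataset. This is a minimal sketch on simulated toy data, not the pipeline of the cited paper; `simulateStudy`, `studies`, `lodo`, the two-gene Cox model and all simulation settings are hypothetical.

```r
library(survival)  # recommended package distributed with R

set.seed(1)
# Hypothetical toy study: two genes, survival time driven by g1 only
simulateStudy <- function(n) {
  g1 <- rnorm(n)
  g2 <- rnorm(n)
  time <- rexp(n, rate = exp(0.8 * g1))   # higher g1 means higher hazard
  cens <- rexp(n, rate = 0.3)             # independent censoring times
  data.frame(g1 = g1, g2 = g2,
             days = pmin(time, cens),
             dead = as.numeric(time <= cens))
}
studies <- lapply(c(A = 80, B = 100, C = 120, D = 90), simulateStudy)

lodo <- t(sapply(names(studies), function(heldout) {
  # Train on all datasets except the held-out one
  train <- do.call(rbind, studies[names(studies) != heldout])
  test  <- studies[[heldout]]
  fit   <- coxph(Surv(days, dead) ~ g1 + g2, data = train)
  # Score the held-out samples and standardise the risk score
  test$risk <- as.numeric(scale(predict(fit, newdata = test, type = "lp")))
  # Validate: hazard ratio of the risk score in the held-out dataset
  val <- coxph(Surv(days, dead) ~ risk, data = test)
  c(log.HR = unname(coef(val)), SE = sqrt(val$var[1, 1]))
}))
lodo
```

The resulting table has one log hazard ratio and standard error per held-out dataset, the same shape as `study.coefs` in Example 1, so the per-dataset validation results can be summarized with the same fixed- or random-effects synthesis.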

Example 3: Leave-one-dataset-in validation

Leave-one-dataset-in validation (cont’d)

“Improvement over random signatures (IOR)” score of gene signatures relative to random gene signatures, equalizing the influences of authors’ algorithms for generating risk scores, quality of the original training data, and gene signature size (Waldron et al. JNCI 2014).

Meta-analysis summary

Resources in Bioconductor

Pasolli E et al.: Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights. PLoS Comput. Biol. 2016, 12:e1004977.

curatedMetagenomicData and ExperimentHub

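library(ExperimentHub)
eh <- ExperimentHub()
# List the ExperimentHub records contributed by curatedMetagenomicData
myquery <- query(eh, "curatedMetagenomicData")
myquery
# Interactive only: browse the record metadata in a spreadsheet-style viewer
View(mcols(myquery))
# Interactive only: pick records in a browser and return them as a subset
subquery <- display(myquery)
# Retrieve one record by its ExperimentHub ID
taxabund <- eh[["EH2"]]
taxabund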
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 3302 features, 38 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: H10 H11 ... IT8 (38 total)
##   varLabels: dataset_name sampleID ... group (211 total)
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
##   pubMedIds: 25981789 
## Annotation: NA