author: Levi Waldron
date: August 1, 2014
Narrow definition: a synthesis of per-study estimates
Not: pooling of per-patient data
“We understand meta-analysis as being the use of statistical techniques to combine the results of studies addressing the same question into a summary measure.”
Villar et al. (2001)
Classic meta-analysis: Third-generation agents compared with best supportive care. Differences in 1-year survival proportions (Baggstrom et al. 2007).
library(LeviRmisc)
df <- geoPmidLookup(c("GSE26712", "PMID18593951"))
Fixed-effects model: \[ \hat{\theta_k} = \theta + \sigma_k \epsilon_k, \; \epsilon_k \stackrel{iid}{\sim} N(0, 1) \]
Random-effects model: \[ \hat{\theta_k} = \theta + \mu_k + \sigma_k \epsilon_k, \; \epsilon_k \stackrel{iid}{\sim} N(0, 1); \mu_k \stackrel{iid}{\sim} N(0, \tau^2) \]
\[ \begin{align*} \textrm{where: }\hat{\theta_k} &= \textrm{effect size estimate from study}\, k \\ \theta &= \textrm{synthesized effect size} \\ \sigma_k &= \textrm{standard error of effect size in study}\, k \\ \epsilon_k &= \textrm{within-study error term}\\ \mu_k &= \textrm{between-study (random-effects) error term in RE model}\\ \tau^2 &= \textrm{variance of "true effects" across studies in RE model} \end{align*} \]
\[ \hat{\theta_k} = \theta + \sigma_k \epsilon_k, \; \epsilon_k \stackrel{iid}{\sim} N(0, 1) \]
Maximum-likelihood estimate of \( \theta \) under fixed-effects model is:
\[ \hat{\theta}_F = \frac{\sum\limits_{k=1}^{k}w_k \hat{\theta}_k}{\sum\limits_{k=1}^k w_k}\\ \textrm{where: } w_k = 1 / \sigma_k^2 \]
i.e., the synthesized effect is the average of the study-specific effects, weighted by the inverse squared standard error
\[ \begin{align*} \textrm{where }w_k &= 1 / \sigma_k^2,\\ \sigma_k &= \textrm{study-specific standard errors} \end{align*} \]
\[ \begin{align*} \scriptsize \hat{\theta}_F &= \scriptsize \frac{\sum\limits_{k=1}^{k}w^*_k \hat{\theta}_k}{\sum\limits_{k=1}^k w^*_k}\\ \textrm{where: } w^*_k &= 1 / (\tau^2 + \sigma_k^2)\\ \end{align*} \]
Standard hypothesis test for heterogeneity: under the null hypothesis of no heterogeneity between studies (\( \tau = 0 \)), \[ Q \sim \chi^2_{K-1} \]
Standard descriptions of heterogeneity:
For further info:
Load the curatedOvarianData package, look at available datasets:
library(curatedOvarianData)
data(package="curatedOvarianData")
Load (and check out) rules defined in default configuration file:
download.file("https://bitbucket.org/lwaldron/ovrc4_sigvalidation/raw/tip/input/patientselection.config", method="wget", destfile="patientselection.config")
source("patientselection.config")
impute.missing <- TRUE
keep.common.only <- TRUE
Create list of ExpressionSets meeting criteria:
download.file("https://bitbucket.org/lwaldron/ovrc4_sigvalidation/raw/tip/src/createEsetList_source.R", method="wget", destfile="createEsetList.R")
source("createEsetList.R")
length(esets)
[1] 10
runCox <- function(eset, probeset="CXCL12"){
library(survival)
eset$y <- Surv(eset$days_to_death, eset$vital_status == "deceased")
if(probeset %in% featureNames(eset)){
obj <- coxph(eset$y ~ scale(t(exprs(eset[probeset, ]))[, 1]))
output <- c(obj$coefficients, sqrt(obj$var))
names(output) <- c("log.HR", "SE")
}else{output <- NULL}
output}
runCox(esets[[1]])
log.HR SE
0.1080 0.1167
study.coefs <- t(sapply(esets, runCox)); head(study.coefs)
log.HR SE
E.MTAB.386_eset 0.108038 0.11671
GSE13876_eset -0.015534 0.12165
GSE17260_eset 0.196605 0.22132
GSE18520_eset 0.004335 0.15786
GSE19829.GPL8300_eset 0.072413 0.19658
GSE26712_eset 0.192926 0.09192
library(metafor)
res.fe <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="FE")
forest.rma(res.fe, slab=gsub("_eset$","",rownames(study.coefs)), atransf=exp)
(res.re <- metafor::rma(yi=study.coefs[, 1], sei=study.coefs[, 2], method="DL"))
Random-Effects Model (k = 10; tau^2 estimator: DL)
tau^2 (estimated amount of total heterogeneity): 0 (SE = 0.0073)
tau (square root of estimated tau^2 value): 0
I^2 (total heterogeneity / total variability): 0.00%
H^2 (total variability / sampling variability): 1.00
Test for Heterogeneity:
Q(df = 9) = 5.3487, p-val = 0.8029
Model Results:
estimate se zval pval ci.lb ci.ub
0.0966 0.0381 2.5334 0.0113 0.0219 0.1714 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if( !require("survHD") || package.version("survHD") != "0.5.0" ){
library(devtools)
install_url("https://bitbucket.org/lwaldron/survhd/downloads/survHD_0.5.0.tar.gz")
}
download.file("https://bitbucket.org/lima1/ovrc4_signew/raw/tip/src/metaCMA.R", destfile="metaCMA.R", method="wget")
source("metaCMA.R")
gene.coefs <- metaCMA.coefs(esets)
FE.res <- metaCMA.opt(esets=esets, coefs=gene.coefs, rma.method="FE", n=200)
Riester et al. JNCI 2014 - see Bitbucket page to reproduce paper and find helpful tools
LODO.res <- metaCMA(esets,coefs=gene.coefs,n=200, rma.method="FE")
Leave-one-dataset-out validation of a survival signature. (Riester et al. JNCI 2014)
“Improvement over random signatures (IOR)” score of gene signatures relative to random gene signatures, equalizing the influences of authors’ algorithms for generating risk scores, quality of the original training data, and gene signature size (Waldron et al. JNCI 2014).
source scripts: genMumatrix.R and analyseMumatrix.R