JGL

Tim Triche, Jr.
November 21st, 2013

The problem

RAM usage: 60GB for 33400 features x 12 samples
Speed: ~24 hours when using 'memory.efficient'
Data: not published yet, so let's look at a different dataset

The Setup (blood stem cells)

require(GEOquery)
#require(genefilter)
#require(hgu133plus2.db)
#agingHSCs <- getGEO('GSE32719')[[1]]
#agingHSCs$agegroup <- relevel(factor(agingHSCs$source_name_ch1), 3)
#annotation(agingHSCs) <- 'hgu133plus2'
#aging <- nsFilter(agingHSCs)$eset
theFile <- url("http://www.anamnetic.com/JGL/agingHSCs.rds")
aging <- readRDS(theFile)
close(theFile)
byAgeGroup <- split.data.frame(t(exprs(aging)), aging$agegroup)

The fun part

require(JGL)
lambda1 <- 0.99
lambda2 <- 0.01
# running the next step seems to take 
# upwards of an hour on a box with 48GB RAM
# solution <- JGL(byAgeGroup, 'fused', lambda1, lambda2, weights='sample.size')
# cool!  It worked! (slowly)

Suggestions from the author

JGL is equivalent to glasso when lambda2=0, so…
Pre-prune edges with 'glasso' (note penalize.diag differences!)
Then use at least as high a lambda1 as for glasso to feed JGL.
In an ideal world, first break apart the data, THEN fit JGL.
The next step takes a few hours to fit.

Feeding the MLE of cov(X) to glasso

require(glasso)
# sigmaHat <- cov(t(exprs(aging))) * ((ncol(aging)-1)/ncol(aging))
# soln.glasso <- glasso(sigmaHat, rho=lambda1, penalize.diagonal=FALSE)
# inspect the glasso solution:

Plotting the initial results

require(JGL)
theOtherFile <- url("http://www.anamnetic.com/JGL/JGL.soln.rds", open="r")
solution <- readRDS(theOtherFile)
close(theOtherFile)
plot(solution) # kinda slow

plot of chunk unnamed-chunk-4

The fun part (the sequel)

# all entries in theta have same size...
features <- intersect(rownames(solution$theta[[1]]), colnames(solution$theta[[1]]))
aging2 <- t(exprs(aging)[features,])
byAgeGroup2 <- split.data.frame(t(exprs(aging)[features,]), aging$agegroup)
unlist(lapply(byAgeGroup2, nrow)) ## subjects

 Young HSC Middle HSC    Old HSC 
        14          5          8

unlist(lapply(byAgeGroup2, ncol)) ## features

 Young HSC Middle HSC    Old HSC 
      1677       1677       1677

The (smallified) solution

# solution.shrunkMore <- JGL(Y=byAgeGroup2, lambda1=0.999, lambda2=0.1, 
#                           warm=solution$theta, weights='sample.size')
# starting ADMM
# Error in while ((iter == 0) || (iter < maxiter && diff_value > tol)) { : 
#   missing value where TRUE/FALSE needed

Oops

The results

At the moment, until I feed an MLE for the overall covariance matrix to glasso() and shrink out substantial numbers of edges, I can't claim to be Doing It Right… but the results so far are already somewhat promising and I hope to resolve this soon.