7/26/2020

Outline

  • The Problem
  • Current Approaches (and Their Limitations)
  • Our Method
  • Simulation Results
  • Practical Example

Motivating Problem

  • Modern genomics research suffers greatly from the so-called “curse of dimensionality”
  • Matrix decomposition techniques are often employed to reduce the feature space from the tens of thousands to something more manageable
  • Such techniques, taken naively, may ignore vast amounts of known biological information concerning relationships between features
  • Pathway analysis seeks to group genomic features by biological information
  • pathwayPCA is one of many tools that bridge matrix decomposition techniques and pathway-level biological information

The pathwayPCA Package

Current Approaches

There are two major approaches to incorporating multiple levels of data (clinical, genetic, proteomic, etc.).

One-Platform Supervised

  • Use single-platform data to find pathways associated with a phenotype, predict class membership or survival, or build semi-supervised networks
  • Allows for in-depth inspection of a single data type, prioritized by clinical relevance
  • Cannot incorporate other platforms to answer systems-level questions



Two-Platform Unsupervised

  • Use multi-platform data to find concurrently highly active pathways across multiple types of genomic data, cluster features across platforms by correlation, or build unsupervised networks
  • Ideal for targeting systems-level questions
  • Cannot incorporate clinical response information directly to prioritize cross-platform information

The pathwayPCA Global Test

  • For a binary or classification response, fit the model \(g(\textbf{y}) \sim \beta_1\text{PC}_1 + \beta_2\text{PC}_2\) plus any biologically relevant interactions and/or control factors (such as age or sex). The link function \(g(\cdot)\) depends on the outcome type.
  • For a survival response, fit the Cox PH model \(h(t) = h_0(t)\exp\left[\beta_1\text{PC}_1 + \beta_2\text{PC}_2 + \ldots\right]\), including any biologically relevant interactions and/or control factors (such as age or sex).
  • We test the global hypothesis \(\left[\beta_1\ \beta_2\right]^T = \textbf{0}\) with 2 degrees of freedom.
  • For a detailed walkthrough of this process in practice, see our published supplemental vignette.
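
As a rough illustration of the global hypothesis test, the sketch below fits a logistic model on two principal components and compares it against an intercept-only model with a likelihood-ratio test on 2 degrees of freedom. It is a minimal sketch under stated assumptions, not the pathwayPCA implementation: the data are simulated, the helper names (`solve`, `logistic_loglik`, `global_test`) are hypothetical, no control factors are included, and the likelihood-ratio statistic is an assumed choice (the package may use a different test statistic).

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def logistic_loglik(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; return the maximised log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return loglik


def global_test(pc1, pc2, y):
    """Likelihood-ratio test of H0: beta_1 = beta_2 = 0 (2 degrees of freedom)."""
    full = [[1.0, a, b] for a, b in zip(pc1, pc2)]  # intercept + both PCs
    null = [[1.0] for _ in y]                       # intercept only
    lr_stat = 2.0 * (logistic_loglik(full, y) - logistic_loglik(null, y))
    lr_stat = max(lr_stat, 0.0)  # guard against tiny negative rounding error
    # The chi-square survival function with 2 df is exactly exp(-x / 2).
    return lr_stat, math.exp(-lr_stat / 2.0)


# Simulated stand-ins for the two PCs and a binary outcome (assumed effect sizes).
random.seed(1)
n = 200
pc1 = [random.gauss(0, 1) for _ in range(n)]
pc2 = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(0.8 * a - 0.6 * b))) else 0
     for a, b in zip(pc1, pc2)]

lr_stat, p_value = global_test(pc1, pc2, y)
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.3g}")
```

A small p-value here says only that at least one of the two coefficients is non-zero; it does not indicate which component drives the association.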

Simulation Study Overview

The test sizes for the three methods compared were well controlled: 0.0493 (Global Test), 0.0156 (NMF), and 0.0148 (sCCA).
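
For readers unfamiliar with the term, a test's size is its rejection rate when the null hypothesis is true, usually estimated by simulation. The sketch below shows the general idea only. It assumes a nominal level of 0.05 and uses a simple two-sided z-test as a stand-in; neither the nominal level nor the simulation design of the study is stated here, and the function names are hypothetical.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided z-test of H0: mean = 0, assuming unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_size(n_sim=4000, n_obs=30, alpha=0.05, seed=42):
    """Fraction of null simulations in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Data generated under the null: the true mean is exactly 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if z_test_p_value(sample) < alpha:
            rejections += 1
    return rejections / n_sim


size = empirical_size()
print(f"Empirical test size at alpha = 0.05: {size:.4f}")
```

An estimate near the nominal level indicates a well-calibrated test, while an estimate far below it indicates a conservative one.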

Simulation Results

Practical Example

We wanted to highlight dysregulated pathways related to overall survival of ovarian cancer patients. We use the TCGA data as hosted by LinkedOmics, and our source dictionary is available on GitHub.

  • We performed pathway analysis independently on RNAseq and proteomics data using WikiPathways, but no pathways were significant after FDR adjustment (\(q = 0.1\)).
  • After performing integrative pathway analysis using our Global Test, the pathways “T Cell Receptor Signaling” and “cell death signaling via NRAGE, NRIF, and NADE” were found to be significant.
  • This example is shown in Section 3.3 of the paper.
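
The FDR adjustment step can be sketched as follows. This assumes the Benjamini-Hochberg procedure, which is a common choice but is not named in these slides; the p-values are made up for illustration and are not results from the study, and `bh_adjust` is a hypothetical helper name.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up pathway p-values, for illustration only.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adj_p = bh_adjust(raw_p)
significant = [q <= 0.1 for q in adj_p]  # FDR threshold q = 0.1

for p, q, keep in zip(raw_p, adj_p, significant):
    print(f"p = {p:.3f}  adjusted = {q:.5f}  significant = {keep}")
```

Because the adjustment scales with the number of pathways tested, a pathway with a small raw p-value can still fail the threshold when many pathways are tested at once.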

Wrapping Up

  • The pathwayPCA Global Test allows for simple but powerful cross-platform integration of eigen-genes and eigen-proteins (and other platforms) within a supervised linear model framework.
  • Future work: with 3+ platforms, requiring matched samples is often prohibitive. Our next method relaxes this requirement.
  • We will be building this functionality and our next method into pathwayPCA version 2.

Check out our lab’s current and future work at https://transbioinfolab.org/.

Acknowledgements

  • Thank you to Tiago Silva, Antonio Colaprico, Shirley Sun, Lizhong Liu, Alex Pico, Bing Zhang, and the helpful reviewers at PROTEOMICS!
  • Connect on Twitter: @RevDocGabriel
  • Check out our work on the COVID-19 Pandemic in South Florida: http://miamicovidproject.com/