1 Setup

This vignette runs the same analysis as TCGA_5studies.Rmd, but with a PCAmodel that has fewer clusters.

1.1 Load packages

suppressPackageStartupMessages({
  library(PCAGenomicSignatures)
  library(dplyr)
  library(ggplot2)
})

1.2 TCGA datasets

load("data/TCGA_validationDatasets.rda")
datasets <- TCGA_validationDatasets[1:5]

1.3 PCAmodel

This model was built from 1,399 studies, each with more than 20 successfully imported samples. We collected the top 20 PCs from each study and subset them to the 7,951 genes shared among the top 90% varying genes of each study. The number of clusters was set with d = 2.25.
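
The cluster count can be sanity-checked with a little arithmetic. The sketch below assumes that d acts as a divisor on the total number of collected PCs (studies × PCs per study). That reading of the parameter is an assumption, not something stated in this vignette, but under it the result agrees with the 12,436 RAVs reported in the model dimensions.

```r
# Assumption: number of clusters = round(total PCs / d)
n_studies <- 1399   # studies used to build the model
n_pcs     <- 20     # top PCs kept per study
d         <- 2.25   # clustering parameter quoted in the text

total_pcs  <- n_studies * n_pcs      # total PCs pooled across studies
n_clusters <- round(total_pcs / d)   # clusters (RAVs) under the assumed rule
n_clusters
```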

PCAmodel
## class: PCAGenomicSignatures 
## dim: 7951 12436 
## metadata(6): cluster size ... MeSH_freq updateNote
## assays(1): model
## rownames(7951): 5S_rRNA 7SK ... SLC16A3 SLC38A2
## rowData names(0):
## colnames(12436): RAV1 RAV2 ... RAV12435 RAV12436
## colData names(4): RAV studies silhouetteWidth gsea
## trainingData(2): PCAsummary MeSH
## trainingData names(1399): DRP000499 ERP023890 ... SRP188526 SRP189762
updateNote(PCAmodel)
## [1] "1,399 refine.bio studies/ top 90% varying genes/ GSEA with MSigDB C2"

2 Multi-comparison (only TCGA)

2.1 heatmapTable (all)

# This process takes a little time due to the size of the datasets.
val_all <- validate(datasets, PCAmodel)
heatmapTable(val_all, scoreCutoff = 0.75) 

RAV656 appears to be specific to BRCA, while RAV5964 is strongly associated with lung cancer. Unlike the PCAmodels built from 536 studies, this model has no RAV specific to colon/rectal cancer.
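
To make the role of scoreCutoff concrete, here is a toy illustration in base R. It assumes the cutoff keeps any RAV whose validation score reaches the threshold in at least one dataset. The score matrix, its values, and the RAV names are made up, and this is a sketch of the filtering idea rather than the package's implementation.

```r
# Toy validation scores: rows are datasets, columns are RAVs.
# All names and values here are hypothetical.
scores <- matrix(
  c(0.81, 0.20, 0.35,
    0.30, 0.78, 0.40,
    0.25, 0.33, 0.52),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("BRCA", "LUAD", "COAD"),
                  c("RAV_a", "RAV_b", "RAV_c"))
)

score_cutoff <- 0.75

# Keep only RAVs whose best score across datasets reaches the cutoff
keep     <- apply(scores, 2, max) >= score_cutoff
filtered <- scores[, keep, drop = FALSE]

# For each retained RAV, report the dataset with the highest score
best_dataset <- rownames(filtered)[apply(filtered, 2, which.max)]
names(best_dataset) <- colnames(filtered)
best_dataset
```

A RAV that scores highly in only one dataset, as RAV_a and RAV_b do here, is the pattern read as "dataset-specific" in the heatmap.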

2.2 heatmapTable - BRCA

val_brca <- validate(datasets[["BRCA"]], PCAmodel)
heatmapTable(val_brca)

2.3 heatmapTable - LUAD

val_luad <- validate(datasets[["LUAD"]], PCAmodel)
heatmapTable(val_luad)
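
For intuition about what a per-dataset validation score can look like, the following base R sketch correlates the gene loadings of a dataset's top PCs with each RAV and keeps the strongest absolute correlation. The use of 8 PCs, Pearson correlation, and the maximum absolute value are assumptions made for illustration, and the data are simulated; this is not the code behind validate().

```r
set.seed(1)

# Toy data: 50 genes x 12 samples, plus 3 hypothetical RAVs over the same genes
genes <- paste0("gene", 1:50)
expr  <- matrix(rnorm(50 * 12), nrow = 50, dimnames = list(genes, NULL))
ravs  <- matrix(rnorm(50 * 3), nrow = 50,
                dimnames = list(genes, paste0("RAV_", 1:3)))

# PCA with samples as observations; `rotation` holds gene loadings per PC
pca     <- prcomp(t(expr), center = TRUE, scale. = FALSE)
top_pcs <- pca$rotation[, 1:8, drop = FALSE]   # loadings of the top 8 PCs

# Pearson correlation between every PC's loadings and every RAV
cors <- cor(top_pcs, ravs)                     # 8 x 3 matrix

# Score each RAV by its strongest absolute correlation with any top PC
score   <- apply(abs(cors), 2, max)
best_pc <- apply(abs(cors), 2, which.max)
data.frame(score = score, PC = best_pc)
```

With random toy data the scores are low; a real dataset that shares signal with a RAV would show a score near 1 for that RAV.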

2.4 MeSH terms and associated studies

2.4.1 BRCA-associated

ind <- 656
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
##      studyName
## 279  ERP016798
## 918  SRP019936
## 4265 SRP111102
## 5047 SRP133401
## 5240 SRP141444
##                                                                                                                         title
## 279                                                                 Whole transcriptome profiling of 63 breast cancer tumours
## 918                                         An Integrated Model of the Transcriptome Landscape of HER2-Positive Breast Cancer
## 4265 Genome-wide multi-omics profiling reveals extensive genetic complexity in 8p11-p12 amplified breast carcinomas [RNA-seq]
## 5047            Expression profiling by RNA-Seq of breast cancer samples from patients in walnut-consuming and control groups
## 5240                                                                                      RNAseq of Breast cancer PDX samples
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
##                             RAV656
## Up_1  There is no enriched pathway
## Up_2  There is no enriched pathway
## Up_3  There is no enriched pathway
## Up_4  There is no enriched pathway
## Up_5  There is no enriched pathway
## Up_6  There is no enriched pathway
## Up_7  There is no enriched pathway
## Up_8  There is no enriched pathway
## Up_9  There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)

ind <- 655
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
##      studyName
## 279  ERP016798
## 4275 SRP111343
##                                                                          title
## 279                  Whole transcriptome profiling of 63 breast cancer tumours
## 4275 RNAseq analysis of chemotherapy and radiation therapy-naïve breast tumors
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
##                             RAV655
## Up_1  There is no enriched pathway
## Up_2  There is no enriched pathway
## Up_3  There is no enriched pathway
## Up_4  There is no enriched pathway
## Up_5  There is no enriched pathway
## Up_6  There is no enriched pathway
## Up_7  There is no enriched pathway
## Up_8  There is no enriched pathway
## Up_9  There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)

ind <- 6183
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
##      studyName
## 2710 SRP072176
## 2907 SRP075396
## 3202 SRP081022
## 3620 SRP095644
## 4880 SRP128653
## 5412 SRP149865
## 5713 SRP158625
## 5884 SRP165962
##                                                                                                                                    title
## 2710                                    Ridaforolimus (MK-8669) synergizes with Dalotuzumab (MK-0646) in hormone-sensitive breast cancer
## 2907                                                    Xenograft sequencing to elucidate molecular events following anti-CD44 treatment
## 3202                 MicroRNAs underlie genome-wide transcriptome and translatome regulation in asthma as revealed by Frac-seq (RNA-Seq)
## 3620                                        Gene expression profiling of histone deacetylase inhibition in triple-negative breast cancer
## 4880 Human iPS-derived astroglia from a stable neural precursor state; improved functionality compared to conventional astrocytic models
## 5412                                                                                    Knockout of ER membrane protein complex subunits
## 5713                                                                   Sequencing of polysome-associated mRNA in VSV infected HeLa cells
## 5884                  Genome-wide transcriptional analysis of human iPSC-derived healthy control vs. schizophrenia cortical interneurons
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
##                            RAV6183
## Up_1  There is no enriched pathway
## Up_2  There is no enriched pathway
## Up_3  There is no enriched pathway
## Up_4  There is no enriched pathway
## Up_5  There is no enriched pathway
## Up_6  There is no enriched pathway
## Up_7  There is no enriched pathway
## Up_8  There is no enriched pathway
## Up_9  There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)

2.4.2 LUAD-associated

ind <- 5964
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
##      studyName
## 2603 SRP069760
## 2855 SRP074349
## 4692 SRP125001
##                                                                                                         title
## 2603 RNA Sequencing Facilitates Quantitative Analysis of Transcriptomes in Human Normal and Cancerous Tissues
## 2855                                        Next Generation Sequencing (RNAseq) of non-small cell lung cancer
## 4692              Multi-Omic Molecular Profiling of Lung Cancer Risk in Chronic Obstructive Pulmonary Disease
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
##                                   RAV5964
## Up_1  FOURNIER_ACINAR_DEVELOPMENT_LATE_DN
## Up_2                 KEGG_RNA_DEGRADATION
## Up_3                                 <NA>
## Up_4                                 <NA>
## Up_5                                 <NA>
## Up_6                                 <NA>
## Up_7                                 <NA>
## Up_8                                 <NA>
## Up_9                                 <NA>
## Up_10                                <NA>
drawWordcloud(PCAmodel, ind)

3 Multi-comparison (TCGA + others)

3.1 heatmapTable (all)

Here, we added an SLE-WB microarray dataset and 4 colon cancer microarray datasets to the 5 TCGA datasets, and set scoreCutoff to 0.68 instead of the default 0.7.

## Warning: NaNs produced

## Warning: NaNs produced
names(new_datasets)
##  [1] "COAD"     "BRCA"     "LUAD"     "READ"     "UCEC"     "SLE"     
##  [7] "GSE14095" "GSE17536" "GSE2109"  "GSE39582"

Based on this multi-dataset validation table:
- RAV55 is SLE-specific
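
The chunk that assembled new_datasets is not echoed in this vignette. One plausible way to build such a named list is to concatenate the TCGA list with a list of the additional datasets, as sketched below. The toy() helper and the tcga and others objects are hypothetical stand-ins filled with random numbers, not the actual data.

```r
set.seed(1)

# Hypothetical stand-in: each dataset is a genes x samples matrix
toy <- function(n_samples) matrix(rnorm(10 * n_samples), nrow = 10)

tcga   <- list(COAD = toy(4), BRCA = toy(5))      # stand-in for `datasets`
others <- list(SLE = toy(3), GSE14095 = toy(6))   # stand-in for the microarray sets

# c() concatenates named lists, preserving names and order
new_datasets <- c(tcga, others)
names(new_datasets)
```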

3.2 MeSH terms and associated studies

3.2.1 SLE-associated

ind <- 55
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
##      studyName
## 16   DRP001953
## 174  ERP009992
## 411  ERP106451
## 539  ERP115010
## 1749 SRP051848
## 2024 SRP059039
## 2209 SRP062966
## 2367 SRP065865
## 4113 SRP107901
## 5004 SRP132709
## 5289 SRP144588
## 6074 SRP175005
## 6231 SRP187105
##                                                                                                                                                title
## 16                                           Interactive Transcriptome Analysis of Malaria Patients and Infecting Plasmodium falciparum in Indonesia
## 174                           The definition of SCA2 blood RNA biomarkers highlights Ataxin-2 as strong modifier of mitochondrial factors like PINK1
## 411                                Dual RNA-seq of peripheral blood from Gambian children with severe or uncomplicated Plasmodium falciparum malaria
## 539                                                                                                              Blood RNAseq of TB contacts samples
## 1749                                                      Gene Networks Specific for Innate Immunity Define Post-traumatic Stress Disorder [RNA-Seq]
## 2024                                   Elucidating the etiology and molecular pathogenicity of infectious diarrhea by high throughput RNA sequencing
## 2209                                                                                                                               SLE lupus RNA-seq
## 2367                     Gene Networks and Blood Biomarkers of Methamphetamine-Associated Psychosis: A Preliminary Integrative RNA-Sequencing Report
## 4113 Identification of reference genes for normalizing the levels of circulating RNA transcripts in pregnant women based on whole-transcriptome data
## 5004                                                 Whole blood transcriptome analysis of Septic shock patients according to early therapy response
## 5289                            Targeted sequencing based maternal whole blood expression changes with gestational age and labor in normal pregnancy
## 6074                                                                     Transcriptomic Responses to Lumacaftor/Ivacaftor Therapy in Cystic Fibrosis
## 6231                                                          NCBI GEO Submission of human whole blood transcriptomes in response to a high-fat meal
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
##                            RAV55
## Up_1  REACTOME_SIGNALING_BY_GPCR
## Up_2                        <NA>
## Up_3                        <NA>
## Up_4                        <NA>
## Up_5                        <NA>
## Up_6                        <NA>
## Up_7                        <NA>
## Up_8                        <NA>
## Up_9                        <NA>
## Up_10                       <NA>
drawWordcloud(PCAmodel, ind)