This vignette runs the same analysis as TCGA_5studies.Rmd, but with a PCAmodel
that has fewer clusters.
suppressPackageStartupMessages({
library(PCAGenomicSignatures)
library(dplyr)
library(ggplot2)
})
load("data/TCGA_validationDatasets.rda")
datasets <- TCGA_validationDatasets[1:5]
This model was built from 1,399 studies, each with more than 20 successfully imported samples.
We collected the top 20 PCs from each study and restricted them to the 7,951 genes shared by
the top 90% most variable genes of every study. The number of clusters was set with d = 2.25.
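The per-study preprocessing described above can be sketched in base R on simulated data. This assumes that "top 90% varying genes" means genes whose variance is at or above the 10th percentile within a study, and that the common genes are the intersection across studies. `topVaryingGenes` is a hypothetical helper, and this is an illustration rather than the package's actual training code.

```r
# Minimal sketch on simulated data; NOT the PCAGenomicSignatures training pipeline.
set.seed(1)

# Hypothetical helper: names of genes whose variance is in the top `prop` fraction.
topVaryingGenes <- function(expr, prop = 0.9) {
  v <- apply(expr, 1, var)
  names(v)[v >= quantile(v, 1 - prop)]
}

# Two simulated studies (genes x samples) that share the same gene names.
genes <- paste0("G", 1:200)
study1 <- matrix(rnorm(200 * 30), nrow = 200, dimnames = list(genes, NULL))
study2 <- matrix(rnorm(200 * 25), nrow = 200, dimnames = list(genes, NULL))

# Genes that are among the top 90% varying genes in every study.
common <- Reduce(intersect, lapply(list(study1, study2), topVaryingGenes))

# Top 20 PCs (gene loadings) of one study, restricted to the common genes.
pca <- prcomp(t(study1[common, ]), center = TRUE, scale. = TRUE)
loadings <- pca$rotation[, 1:20]
dim(loadings)
```

Repeating the PCA step for every study and stacking the loadings gives the pool of PCs that the clustering step works on.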
PCAmodel
## class: PCAGenomicSignatures
## dim: 7951 12436
## metadata(6): cluster size ... MeSH_freq updateNote
## assays(1): model
## rownames(7951): 5S_rRNA 7SK ... SLC16A3 SLC38A2
## rowData names(0):
## colnames(12436): RAV1 RAV2 ... RAV12435 RAV12436
## colData names(4): RAV studies silhouetteWidth gsea
## trainingData(2): PCAsummary MeSH
## trainingData names(1399): DRP000499 ERP023890 ... SRP188526 SRP189762
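The printed dimensions are consistent with one reading of the d = 2.25 setting. Assuming (not confirmed in this vignette) that the number of clusters is the total number of PCs divided by d and then rounded, the arithmetic reproduces the 12,436 RAVs listed above.

```r
# Assumption: number of clusters = total number of PCs / d, rounded.
n_studies <- 1399
pcs_per_study <- 20
d <- 2.25
n_pcs <- n_studies * pcs_per_study   # 27,980 PCs in total
k <- round(n_pcs / d)                # 27980 / 2.25 = 12435.56, rounds to 12436
k
```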
updateNote(PCAmodel)
## [1] "1,399 refine.bio studies/ top 90% varying genes/ GSEA with MSigDB C2"
# This step may take a while because of the size of the datasets.
val_all <- validate(datasets, PCAmodel)
heatmapTable(val_all, scoreCutoff = 0.75)
RAV656 appears to be specific to BRCA, while RAV5964 is strongly associated with lung cancer. Unlike the PCAmodels built from 536 studies, this model has no colon/rectal cancer-specific RAV.
val_brca <- validate(datasets[["BRCA"]], PCAmodel)
heatmapTable(val_brca)
val_luad <- validate(datasets[["LUAD"]], PCAmodel)
heatmapTable(val_luad)
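The calls above validate one dataset at a time. A minimal sketch for doing the same for every TCGA dataset, assuming `validate()` accepts a single dataset as it does for BRCA and LUAD; `validateEach` is a hypothetical helper, not part of the package. Calling `validate()` through `::` means the function can be defined before anything is loaded.

```r
# Hypothetical helper: validate each dataset in a named list separately.
# lapply() keeps the list names, so results can be indexed by cancer type.
validateEach <- function(datasets, model) {
  lapply(datasets, function(ds) PCAGenomicSignatures::validate(ds, model))
}

# Usage in this vignette's session (not run here):
# val_each <- validateEach(datasets, PCAmodel)
# heatmapTable(val_each[["LUAD"]])
```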
ind <- 656
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
## studyName
## 279 ERP016798
## 918 SRP019936
## 4265 SRP111102
## 5047 SRP133401
## 5240 SRP141444
## title
## 279 Whole transcriptome profiling of 63 breast cancer tumours
## 918 An Integrated Model of the Transcriptome Landscape of HER2-Positive Breast Cancer
## 4265 Genome-wide multi-omics profiling reveals extensive genetic complexity in 8p11-p12 amplified breast carcinomas [RNA-seq]
## 5047 Expression profiling by RNA-Seq of breast cancer samples from patients in walnut-consuming and control groups
## 5240 RNAseq of Breast cancer PDX samples
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
## RAV656
## Up_1 There is no enriched pathway
## Up_2 There is no enriched pathway
## Up_3 There is no enriched pathway
## Up_4 There is no enriched pathway
## Up_5 There is no enriched pathway
## Up_6 There is no enriched pathway
## Up_7 There is no enriched pathway
## Up_8 There is no enriched pathway
## Up_9 There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)
ind <- 655
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
## studyName
## 279 ERP016798
## 4275 SRP111343
## title
## 279 Whole transcriptome profiling of 63 breast cancer tumours
## 4275 RNAseq analysis of chemotherapy and radiation therapy-naïve breast tumors
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
## RAV655
## Up_1 There is no enriched pathway
## Up_2 There is no enriched pathway
## Up_3 There is no enriched pathway
## Up_4 There is no enriched pathway
## Up_5 There is no enriched pathway
## Up_6 There is no enriched pathway
## Up_7 There is no enriched pathway
## Up_8 There is no enriched pathway
## Up_9 There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)
ind <- 6183
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
## studyName
## 2710 SRP072176
## 2907 SRP075396
## 3202 SRP081022
## 3620 SRP095644
## 4880 SRP128653
## 5412 SRP149865
## 5713 SRP158625
## 5884 SRP165962
## title
## 2710 Ridaforolimus (MK-8669) synergizes with Dalotuzumab (MK-0646) in hormone-sensitive breast cancer
## 2907 Xenograft sequencing to elucidate molecular events following anti-CD44 treatment
## 3202 MicroRNAs underlie genome-wide transcriptome and translatome regulation in asthma as revealed by Frac-seq (RNA-Seq)
## 3620 Gene expression profiling of histone deacetylase inhibition in triple-negative breast cancer
## 4880 Human iPS-derived astroglia from a stable neural precursor state; improved functionality compared to conventional astrocytic models
## 5412 Knockout of ER membrane protein complex subunits
## 5713 Sequencing of polysome-associated mRNA in VSV infected HeLa cells
## 5884 Genome-wide transcriptional analysis of human iPSC-derived healthy control vs. schizophrenia cortical interneurons
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
## RAV6183
## Up_1 There is no enriched pathway
## Up_2 There is no enriched pathway
## Up_3 There is no enriched pathway
## Up_4 There is no enriched pathway
## Up_5 There is no enriched pathway
## Up_6 There is no enriched pathway
## Up_7 There is no enriched pathway
## Up_8 There is no enriched pathway
## Up_9 There is no enriched pathway
## Up_10 There is no enriched pathway
drawWordcloud(PCAmodel, ind)
ind <- 5964
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
## studyName
## 2603 SRP069760
## 2855 SRP074349
## 4692 SRP125001
## title
## 2603 RNA Sequencing Facilitates Quantitative Analysis of Transcriptomes in Human Normal and Cancerous Tissues
## 2855 Next Generation Sequencing (RNAseq) of non-small cell lung cancer
## 4692 Multi-Omic Molecular Profiling of Lung Cancer Risk in Chronic Obstructive Pulmonary Disease
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
## RAV5964
## Up_1 FOURNIER_ACINAR_DEVELOPMENT_LATE_DN
## Up_2 KEGG_RNA_DEGRADATION
## Up_3 <NA>
## Up_4 <NA>
## Up_5 <NA>
## Up_6 <NA>
## Up_7 <NA>
## Up_8 <NA>
## Up_9 <NA>
## Up_10 <NA>
drawWordcloud(PCAmodel, ind)
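To summarize pathway tables like the ones above, a small helper can count real pathway names while skipping both `NA` entries and the "There is no enriched pathway" placeholder. This sketch assumes the data frame has a single column of pathway names, as in the printed outputs; `countEnrichedPathways` is hypothetical, and the toy inputs only mirror the RAV656 and RAV5964 outputs.

```r
# Hypothetical helper: count real pathway names in the first column,
# ignoring NA and the "There is no enriched pathway" placeholder.
countEnrichedPathways <- function(df) {
  x <- as.character(df[[1]])
  sum(!is.na(x) & x != "There is no enriched pathway")
}

# Toy inputs mirroring the printed outputs for RAV656 and RAV5964.
rav656 <- data.frame(RAV656 = rep("There is no enriched pathway", 10))
rav5964 <- data.frame(RAV5964 = c("FOURNIER_ACINAR_DEVELOPMENT_LATE_DN",
                                  "KEGG_RNA_DEGRADATION", rep(NA, 8)))
countEnrichedPathways(rav656)   # 0
countEnrichedPathways(rav5964)  # 2
```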
Here, we added an SLE-WB microarray dataset and four colon cancer microarray datasets
to the five TCGA datasets, and set scoreCutoff to 0.68 instead of the default 0.7.
## Warning: NaNs produced
## Warning: NaNs produced
names(new_datasets)
## [1] "COAD" "BRCA" "LUAD" "READ" "UCEC" "SLE"
## [7] "GSE14095" "GSE17536" "GSE2109" "GSE39582"
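The code that builds `new_datasets` and draws the validation table is not shown above. A minimal sketch of the validation step, assuming `new_datasets` is a named list like `datasets` and reusing the same `validate()` and `heatmapTable()` calls as for the five TCGA datasets. The `exists()` guard is only there so the snippet stays runnable outside this vignette's session.

```r
# Sketch: validate all ten datasets against the model and draw the table
# with the lowered cutoff. Runs only when the vignette objects are present.
if (exists("new_datasets") && exists("PCAmodel")) {
  val_new <- validate(new_datasets, PCAmodel)
  heatmapTable(val_new, scoreCutoff = 0.68)
}
```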
Based on this multi-dataset validation table:
- RAV55 is SLE-specific
ind <- 55
findStudiesInCluster(PCAmodel, ind, studyTitle = TRUE)
## studyName
## 16 DRP001953
## 174 ERP009992
## 411 ERP106451
## 539 ERP115010
## 1749 SRP051848
## 2024 SRP059039
## 2209 SRP062966
## 2367 SRP065865
## 4113 SRP107901
## 5004 SRP132709
## 5289 SRP144588
## 6074 SRP175005
## 6231 SRP187105
## title
## 16 Interactive Transcriptome Analysis of Malaria Patients and Infecting Plasmodium falciparum in Indonesia
## 174 The definition of SCA2 blood RNA biomarkers highlights Ataxin-2 as strong modifier of mitochondrial factors like PINK1
## 411 Dual RNA-seq of peripheral blood from Gambian children with severe or uncomplicated Plasmodium falciparum malaria
## 539 Blood RNAseq of TB contacts samples
## 1749 Gene Networks Specific for Innate Immunity Define Post-traumatic Stress Disorder [RNA-Seq]
## 2024 Elucidating the etiology and molecular pathogenicity of infectious diarrhea by high throughput RNA sequencing
## 2209 SLE lupus RNA-seq
## 2367 Gene Networks and Blood Biomarkers of Methamphetamine-Associated Psychosis: A Preliminary Integrative RNA-Sequencing Report
## 4113 Identification of reference genes for normalizing the levels of circulating RNA transcripts in pregnant women based on whole-transcriptome data
## 5004 Whole blood transcriptome analysis of Septic shock patients according to early therapy response
## 5289 Targeted sequencing based maternal whole blood expression changes with gestational age and labor in normal pregnancy
## 6074 Transcriptomic Responses to Lumacaftor/Ivacaftor Therapy in Cystic Fibrosis
## 6231 NCBI GEO Submission of human whole blood transcriptomes in response to a high-fat meal
subsetEnrichedPathways(PCAmodel, ind) %>% as.data.frame
## RAV55
## Up_1 REACTOME_SIGNALING_BY_GPCR
## Up_2 <NA>
## Up_3 <NA>
## Up_4 <NA>
## Up_5 <NA>
## Up_6 <NA>
## Up_7 <NA>
## Up_8 <NA>
## Up_9 <NA>
## Up_10 <NA>
drawWordcloud(PCAmodel, ind)
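Each RAV above was inspected with the same `findStudiesInCluster()` and `subsetEnrichedPathways()` calls. A minimal sketch that bundles them, assuming both behave as in this vignette; `summarizeRAV` is a hypothetical helper, and `drawWordcloud()` is left out so plotting stays an explicit, separate call.

```r
# Hypothetical helper: collect the training studies and enriched pathways
# for one RAV. Package functions are called with `::` so nothing is
# attached when the helper is defined.
summarizeRAV <- function(model, ind) {
  list(
    studies  = PCAGenomicSignatures::findStudiesInCluster(model, ind, studyTitle = TRUE),
    pathways = as.data.frame(PCAGenomicSignatures::subsetEnrichedPathways(model, ind))
  )
}

# Usage in this vignette's session (not run here):
# ids <- c(656, 655, 6183, 5964, 55)
# ravs <- setNames(lapply(ids, function(i) summarizeRAV(PCAmodel, i)),
#                  paste0("RAV", ids))
```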