This script performs pairwise testing for gene-expression results via Affymetrix gene chip, visualizes the results, places the differentially-expressed genes into pathways via GO and Reactome, and performs pathway enrichment analyses.
Sources: The second half of this script (pathway enrichment ) draws from https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html
You should already have prepared test matrix files according to limma protocols
Annotation <- read.csv("HTA-2_0.na35.2.hg19.transcript.csv/HTA-2_0.na35.2.hg19.transcript.csv", header=TRUE, skip=19)
probe.gene.map <- read.csv("probe.gene.map.csv")
Differential abundance can only be performed on pairs of sample sets. Each pairwise test has to be configured as a separate file. Importing the test matrices:
targets <- read.csv("targets.noDrug.Strain.txt", as.is=T)
all of the samples and metadata are listed here:
targets
## FileName Strain Treatment Experiment
## 1 20160505_HTA2.0_0316_WTH_1R_(HTA-2_0).CEL GF control 1
## 2 20160505_HTA2.0_0318_WTH_3R_(HTA-2_0).CEL GFR control 1
## 3 20160616_HTA2.0_0333_WTH_1_R1_(HTA-2_0).CEL GF control 2
## 4 20160616_HTA2.0_0335_WTH_3_R2_(HTA-2_0).CEL GFR control 2
dat <- read.celfiles(targets$FileName)
## Reading in : 20160505_HTA2.0_0316_WTH_1R_(HTA-2_0).CEL
## Reading in : 20160505_HTA2.0_0318_WTH_3R_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0333_WTH_1_R1_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0335_WTH_3_R2_(HTA-2_0).CEL
project.bgcorrect.norm.avg <- rma(dat, background=TRUE, normalize=TRUE)
## Background correcting
## Normalizing
## Calculating Expression
project.bgcorrect.norm.avg.filt <- exprs(project.bgcorrect.norm.avg)[which(rownames(project.bgcorrect.norm.avg) %in% Annotation[,1]),]
Exp <- factor(targets$Experiment)
Strain <- factor(targets$Strain, levels=c("GF","GFR"))
Treatment <- factor(targets$Treatment, levels=c("control","drug"))
This is important decision; these chips were ran from different experimental batches, thus I have controlled for experiment, essentially running a paired analysis. In an earlier iteration, I did not control for experiment and found very few differences due to the high variance between experimental replicates
design <- model.matrix(~Exp+Strain)
eset <- project.bgcorrect.norm.avg.filt
fit <- lmFit(eset, design)
fit <- eBayes(fit, trend=T, robust=T)
## Warning: 65 very small variances detected, have been offset away from zero
results <- decideTests(fit, p.value = 0.05, method = "separate", adjust.method = "fdr")
write.table(summary(results), "pairwise_test_results/cGFR.cGF/summary.cGFR.cGF.csv", quote=F)
summary(results)
## (Intercept) Exp2 StrainGFR
## Down 0 0 29
## NotSig 4 70523 70450
## Up 70519 0 44
This script also generates a temporary differential expression table for ALL probes which can be used later for comparisons with other pairwise-test results
library(data.table)
library(dplyr)
#full table
fit.cGFR.cGF <- topTable(fit, n = nrow(fit))
#diff table only, must be customized based on summary results
diff <- topTable(fit, p.value = 0.05, n = Inf, coef="StrainGFR")
setDT(diff, keep.rownames = TRUE)[]
colnames(diff)[1] <- "probe"
diff <- dplyr::left_join(diff,probe.gene.map,by="probe")
## Warning: Column `probe` joining character vector and factor, coercing into
## character vector
write.table(diff, "pairwise_test_results/cGFR.cGF/diff.cGFR.cGF", quote = F)
write.table(diff$name, "pairwise_test_results/cGFR.cGF/diff.cGFR.cGF.gene.list", quote = F)
kable(diff) %>%
kable_styling() %>%
scroll_box(width = "100%", height = "400px")
| probe | logFC | AveExpr | t | P.Value | adj.P.Val | B | name | description |
|---|---|---|---|---|---|---|---|---|
| TC05001205.hg.1 | -4.628246 | 5.478085 | -51.37863 | 3.00e-07 | 0.0226388 | 4.343165 | CDH10 | cadherin 10, type 2 (T2-cadherin) |
| TC18000084.hg.1 | -3.559838 | 5.536229 | -38.69108 | 1.20e-06 | 0.0226388 | 4.141456 | ANKRD30B | ankyrin repeat domain 30B |
| TC09000323.hg.1 | 4.167048 | 7.927503 | 34.93740 | 1.80e-06 | 0.0226388 | 4.041932 | MAMDC2 | MAM domain containing 2 |
| TC19000805.hg.1 | 3.678938 | 5.928768 | 35.70503 | 1.80e-06 | 0.0226388 | 3.995222 | ERVV-2 | endogenous retrovirus group V, member 2 |
| TC03000117.hg.1 | 2.870035 | 4.204782 | 32.22194 | 2.60e-06 | 0.0226388 | 3.950694 | ZNF385D-AS1 | ZNF385D antisense RNA 1 |
| TC0X000272.hg.1 | -2.896188 | 5.694877 | -31.54888 | 2.80e-06 | 0.0226388 | 3.924952 | GAGE12F | G antigen 12F |
| TC0X000273.hg.1 | -2.896188 | 5.694877 | -31.54888 | 2.80e-06 | 0.0226388 | 3.924952 | GAGE12F | G antigen 12F |
| TC07001184.hg.1 | 2.736362 | 5.172051 | 31.36573 | 2.90e-06 | 0.0226388 | 3.917706 | CDCA7L | cell division cycle associated 7-like |
| TC19000804.hg.1 | 3.146994 | 5.332210 | 30.61629 | 3.60e-06 | 0.0226388 | 3.803272 | ERVV-1 | endogenous retrovirus group V, member 1 |
| TC06000999.hg.1 | -4.086888 | 5.285906 | -30.30939 | 3.70e-06 | 0.0226388 | 3.790327 | EYA4 | EYA transcriptional coactivator and phosphatase 4 |
| TC01001099.hg.1 | 2.409066 | 4.707651 | 27.31432 | 5.30e-06 | 0.0226388 | 3.725890 | PDZK1 | PDZ domain containing 1 |
| TC02002606.hg.1 | -2.758627 | 6.018420 | -26.73367 | 6.20e-06 | 0.0226388 | 3.631533 | TFPI | tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor) |
| TC17001279.hg.1 | 2.584618 | 2.707918 | 26.15430 | 6.60e-06 | 0.0226388 | 3.622087 | — | — |
| TC0X002226.hg.1 | 3.164775 | 5.597893 | 26.42792 | 6.70e-06 | 0.0226388 | 3.594009 | SMARCA1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1 |
| TC0X002352.hg.1 | -2.580665 | 5.921162 | -25.48702 | 7.50e-06 | 0.0226388 | 3.563792 | GAGE12C | G antigen 12C |
| TC0X002349.hg.1 | -2.620039 | 6.898765 | -24.50655 | 8.40e-06 | 0.0226388 | 3.547016 | GAGE12F | G antigen 12F |
| TC0X002354.hg.1 | -2.512567 | 6.483804 | -24.44983 | 8.50e-06 | 0.0226388 | 3.542905 | GAGE12C | G antigen 12C |
| TC0X002353.hg.1 | -2.511968 | 6.484104 | -24.43050 | 8.50e-06 | 0.0226388 | 3.541499 | GAGE12C | G antigen 12C |
| TC0X002355.hg.1 | -2.471585 | 6.204097 | -24.46811 | 8.50e-06 | 0.0226388 | 3.539114 | GAGE2D | G antigen 2D |
| TC0X000274.hg.1 | -2.523305 | 6.497707 | -24.27791 | 8.70e-06 | 0.0226388 | 3.530308 | GAGE12C | G antigen 12C |
| TC0X002356.hg.1 | -2.541908 | 4.784199 | -25.25511 | 8.10e-06 | 0.0226388 | 3.520203 | GAGE13 | G antigen 13 |
| TC10001136.hg.1 | 2.669551 | 5.743115 | 24.97706 | 8.50e-06 | 0.0226388 | 3.501508 | MPP7 | membrane protein, palmitoylated 7 (MAGUK p55 subfamily member 7) |
| TC04000170.hg.1 | -2.228417 | 3.448540 | -23.62693 | 9.80e-06 | 0.0226388 | 3.480702 | RP11-552M14.1 | novel transcript |
| TC08001062.hg.1 | 3.530057 | 5.032105 | 24.53266 | 9.20e-06 | 0.0226388 | 3.470601 | STC1 | stanniocalcin 1 |
| TC16001410.hg.1 | 4.400137 | 5.923346 | 24.10786 | 9.90e-06 | 0.0226388 | 3.439822 | — | — |
| TC0X000334.hg.1 | 3.026448 | 4.168199 | 24.03499 | 1.00e-05 | 0.0226388 | 3.434416 | FOXR2 | forkhead box R2 |
| TC18000033.hg.1 | 2.873209 | 5.529335 | 23.96108 | 1.01e-05 | 0.0226388 | 3.428895 | ARHGAP28 | Rho GTPase activating protein 28 |
| TC01002163.hg.1 | 2.069735 | 5.581921 | 22.90347 | 1.12e-05 | 0.0226388 | 3.421802 | TNFRSF9 | tumor necrosis factor receptor superfamily, member 9 |
| TC0X002348.hg.1 | -2.410759 | 6.902910 | -22.66891 | 1.17e-05 | 0.0226388 | 3.401796 | GAGE2D | G antigen 2D |
| TC01003130.hg.1 | 2.457507 | 4.530527 | 23.29069 | 1.14e-05 | 0.0226388 | 3.376985 | PDZK1P2 | PDZ domain containing 1 pseudogene 2 |
| TC16000857.hg.1 | 2.913084 | 6.625634 | 23.26734 | 1.15e-05 | 0.0226388 | 3.375115 | EMP2 | epithelial membrane protein 2 |
| TC01003115.hg.1 | 2.512097 | 4.687943 | 23.20682 | 1.16e-05 | 0.0226388 | 3.370250 | PDZK1 | PDZ domain containing 1 |
| TC0X002351.hg.1 | -2.564517 | 7.208601 | -22.43019 | 1.24e-05 | 0.0226388 | 3.368584 | GAGE2B | G antigen 2B |
| TC12001572.hg.1 | -2.918966 | 4.109037 | -23.17832 | 1.17e-05 | 0.0226388 | 3.367949 | GTSF1 | gametocyte specific factor 1 |
| TC17001282.hg.1 | 2.075472 | 2.676655 | 22.09676 | 1.31e-05 | 0.0226388 | 3.351010 | — | — |
| TC02002612.hg.1 | -2.094084 | 5.836062 | -22.06609 | 1.32e-05 | 0.0226388 | 3.348205 | COL5A2 | collagen, type V, alpha 2 |
| TC10001559.hg.1 | 1.978422 | 5.501996 | 22.06182 | 1.32e-05 | 0.0226388 | 3.347814 | PIK3AP1 | phosphoinositide-3-kinase adaptor protein 1 |
| TC0X002350.hg.1 | -2.767751 | 7.172888 | -22.76734 | 1.26e-05 | 0.0226388 | 3.336760 | GAGE2D | G antigen 2D |
| TC01005783.hg.1 | 2.607030 | 4.493940 | 22.74484 | 1.26e-05 | 0.0226388 | 3.332151 | PDZK1P2 | PDZ domain containing 1 pseudogene 2 |
| TC10000771.hg.1 | 2.321318 | 5.973386 | 22.45138 | 1.31e-05 | 0.0226388 | 3.324560 | INA | internexin neuronal intermediate filament protein, alpha |
| TC11000149.hg.1 | 2.353825 | 5.552275 | 22.54415 | 1.31e-05 | 0.0226388 | 3.315053 | OLFML1 | olfactomedin-like 1 |
| TC0X001704.hg.1 | -2.528942 | 8.288097 | -21.40757 | 1.50e-05 | 0.0226936 | 3.285849 | GAGE1 | G antigen 1 |
| TC02004836.hg.1 | 1.969937 | 4.283770 | 21.50581 | 1.50e-05 | 0.0226936 | 3.279227 | COL4A4 | collagen, type IV, alpha 4 |
| TC17000383.hg.1 | 3.248247 | 6.971808 | 22.09219 | 1.43e-05 | 0.0226936 | 3.275279 | CCL2 | chemokine (C-C motif) ligand 2 |
| TC0X001917.hg.1 | 2.386292 | 4.159331 | 22.04135 | 1.45e-05 | 0.0226936 | 3.270691 | GPR50 | G protein-coupled receptor 50 |
| TC0X001353.hg.1 | 2.706157 | 5.292266 | 22.00855 | 1.45e-05 | 0.0226936 | 3.267720 | SMARCA1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1 |
| TC06003061.hg.1 | -4.559864 | 5.397600 | -21.80603 | 1.51e-05 | 0.0226936 | 3.249151 | EYA4 | EYA transcriptional coactivator and phosphatase 4 |
| 47424360_st | -2.330965 | 3.813760 | -21.55866 | 1.59e-05 | 0.0232931 | 3.225949 | — | — |
| TC0X002315.hg.1 | -2.796348 | 7.124280 | -21.46068 | 1.62e-05 | 0.0232931 | 3.216597 | GAGE2A | G antigen 2A |
| TC15000391.hg.1 | 2.811151 | 5.713701 | 21.33499 | 1.66e-05 | 0.0234032 | 3.204463 | SLC27A2 | solute carrier family 27 (fatty acid transporter), member 2 |
| TC0X002314.hg.1 | -2.364330 | 6.175474 | -21.06156 | 1.75e-05 | 0.0242348 | 3.177522 | GAGE1 | G antigen 1 |
| TC18000230.hg.1 | 2.001510 | 4.306799 | 20.47761 | 1.97e-05 | 0.0267788 | 3.117380 | SERPINB7 | serpin peptidase inhibitor, clade B (ovalbumin), member 7 |
| TC12000656.hg.1 | -1.748827 | 4.519903 | -19.66398 | 2.15e-05 | 0.0285715 | 3.098885 | NAV3 | neuron navigator 3 |
| TC0X002358.hg.1 | -2.280241 | 6.979847 | -19.84562 | 2.19e-05 | 0.0285715 | 3.073760 | GAGE12J | G antigen 12J |
| TC05003420.hg.1 | 1.660110 | 4.677979 | 19.22849 | 2.37e-05 | 0.0298965 | 3.046592 | TAS2R1 | taste receptor, type 2, member 1 |
| TC0X001567.hg.1 | 2.123993 | 7.233006 | 19.22431 | 2.37e-05 | 0.0298965 | 3.046078 | — | — |
| TC05002946.hg.1 | 1.855209 | 3.750657 | 19.21772 | 2.44e-05 | 0.0302081 | 3.022631 | ADAMTS12 | ADAM metallopeptidase with thrombospondin type 1 motif, 12 |
| TC01003409.hg.1 | 1.627664 | 4.666166 | 18.74092 | 2.65e-05 | 0.0316426 | 2.985066 | ITLN1 | intelectin 1 (galactofuranose binding) |
| TC17001289.hg.1 | 2.388848 | 2.777901 | 19.16648 | 2.61e-05 | 0.0316426 | 2.968129 | — | — |
| TC05000914.hg.1 | 1.912424 | 5.757070 | 18.59170 | 2.97e-05 | 0.0349450 | 2.895785 | TENM2 | teneurin transmembrane protein 2 |
| TC08000127.hg.1 | 1.888398 | 6.471544 | 17.96572 | 3.21e-05 | 0.0371469 | 2.869049 | SLC7A2 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 2 |
| TC0X000056.hg.1 | 1.699452 | 5.117462 | 17.85644 | 3.47e-05 | 0.0389575 | 2.810004 | RP11-1L9.1 | putative novel transcript |
| TC04002799.hg.1 | 1.643756 | 5.296234 | 17.71051 | 3.52e-05 | 0.0389575 | 2.805919 | PDGFC | platelet derived growth factor C |
| TC01002849.hg.1 | -1.516580 | 5.229775 | -17.34431 | 3.69e-05 | 0.0389575 | 2.789475 | GBP4 | guanylate binding protein 4 |
| TC17001342.hg.1 | -3.181241 | 4.689182 | -17.79054 | 3.58e-05 | 0.0389575 | 2.787024 | EVI2A | ecotropic viral integration site 2A |
| TC07002625.hg.1 | 2.233647 | 4.814454 | 17.70976 | 3.65e-05 | 0.0389575 | 2.775510 | TMEM178B | transmembrane protein 178B |
| TC0X000121.hg.1 | 1.836808 | 6.067730 | 17.57681 | 3.70e-05 | 0.0389575 | 2.772030 | SCARNA23 | small Cajal body-specific RNA 23 |
| TC0X001301.hg.1 | 3.471041 | 6.170682 | 17.29918 | 4.03e-05 | 0.0418177 | 2.715348 | RP11-232D9.3 | putative novel transcript |
| TC10001544.hg.1 | 1.894721 | 5.457671 | 17.17982 | 4.15e-05 | 0.0420208 | 2.697327 | PLCE1-AS1 | PLCE1 antisense RNA 1 |
| TC17001341.hg.1 | -3.198643 | 3.758366 | -17.16121 | 4.17e-05 | 0.0420208 | 2.694495 | EVI2B | ecotropic viral integration site 2B |
| TC0Y000006.hg.1 | 1.865103 | 7.304595 | 16.65119 | 4.39e-05 | 0.0435698 | 2.680417 | CSF2RA | colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) |
| TC08000482.hg.1 | 1.827821 | 6.974642 | 16.47493 | 4.62e-05 | 0.0446949 | 2.646437 | RDH10 | retinol dehydrogenase 10 (all-trans) |
| TC0X000006.hg.1 | 1.694680 | 6.609224 | 16.44449 | 4.63e-05 | 0.0446949 | 2.646182 | CSF2RA | colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage) |
This table can be regarded as a final output of the pairwise differential expression pipeline.
up <- diff[diff$logFC > 0,]
write.csv(up,"pairwise_test_results/cGFR.cGF/up.cGFR.cGF",row.names = F,quote = F)
write.csv(up$name,"pairwise_test_results/cGFR.cGF/up.cGFR.cGF.gene.list",row.names = F,quote = F)
down <- diff[diff$logFC < 0,]
write.csv(down,"pairwise_test_results/cGFR.cGF/down.cGFR.cGF",row.names = F,quote = F)
write.csv(down$name,"pairwise_test_results/cGFR.cGF/down.cGFR.cGF.gene.list",row.names = F,quote = F)
png("pairwise_test_results/cGFR.cGF/cGFR.cGF.MD.png")
plotMD(fit,status = results[,3], legend = F, hl.col = c("red","blue"), main = "cGFR vs. cGF", xlim=c(0,14))
abline(0,0,col="red",lty=3)
dev.off()
## png
## 2
plotMD(fit,status = results[,3], legend = F, hl.col = c("red","blue"), main = "cGFR vs. cGF", xlim=c(0,14))
abline(0,0,col="red",lty=3)
We now have full tables for differentially expressed genes. We can also generate tables of up- and down-regulated pathways via the Reactome.db. These will be handled in a separate script.
This pipeline is adapted from https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html (this required a good deal of modification to current pipeline)
First, we get a “control set” of genes, those which have similar expression levels but are not differentially expressed. this is handled by a package “genefilter”.
library(genefilter)
library(Biobase)
#make a list of diff genes
diff.probe <- subset(x = diff)$probe
background.genes <- genefilter::genefinder(eset, as.character(diff.probe), method = "manhattan", scale = "none")
background.genes <- sapply(background.genes, function(x) x$indices)
background.genes <- as.vector(background.genes)
back.genes <- featureNames(project.bgcorrect.norm.avg)[background.genes]
We now remove any differentially expressed genes from the background gene set and verify that it is correct: the diff set and the background set should have nothing in common.
back.genes <- setdiff(back.genes, diff.probe)
intersect(back.genes, diff.probe)
## character(0)
library(cancerTiming)
multidensity(list(
all = fit.cGFR.cGF[,"AveExpr"],
fore = diff[,"AveExpr"],
back = fit.cGFR.cGF[rownames(fit.cGFR.cGF) %in% back.genes, "AveExpr"]),
col = c("#e46981", "#ae7ee2", "#a7ad4a"),
xlab = "mean expression",
main = "cGFR vs cGF: Differentially Expressed, Background, and All gene expression levels"
)
Is the differential expression set enriched in particular functional categories? This is why we have created a background gene set, so that we know the “typical” functional distribution of genes expressed at similar levels. We will connect the probe names to GO categories and run a few tests for pathway enrichment.
library(topGO)
gene.IDs <- rownames(eset)
in.universe <- gene.IDs %in% c(diff.probe,back.genes)
in.selection <- gene.IDs %in% diff.probe
all.genes <- in.selection[in.universe]
all.genes <- factor(as.integer(in.selection[in.universe]))
names(all.genes) <- gene.IDs[in.universe]
#this builds a database with all GO information related to our two test sets AND whether the gene is differentially expressed
top_GO_data <- new("topGOdata", ontology = "BP", allGenes = all.genes,
nodeSize = 10, annot = annFUN.db, affyLib = "hta20transcriptcluster.db")
result_top_GO_elim <-
runTest(top_GO_data, algorithm = "elim", statistic = "Fisher")
result_top_GO_classic <-
runTest(top_GO_data, algorithm = "classic", statistic = "Fisher")
library(stringr)
res_top_GO <- GenTable(top_GO_data, Fisher.elim = result_top_GO_elim,
Fisher.classic = result_top_GO_classic,
orderBy = "Fisher.elim" , topNodes = 100)
genes_top_GO <- printGenes(top_GO_data, whichTerms = res_top_GO$GO.ID,
chip = "hta20transcriptcluster.db", geneCutOff = 1000)
res_top_GO$sig_genes <- sapply(genes_top_GO, function(x){
str_c(paste0(x[x$'raw p-value' == 2, "Symbol.id"],";"),
collapse = "")
})
kable(res_top_GO) %>%
kable_styling() %>%
scroll_box(width = "100%", height = "400px")
| GO.ID | Term | Annotated | Significant | Expected | Rank in Fisher.classic | Fisher.elim | Fisher.classic | sig_genes |
|---|---|---|---|---|---|---|---|---|
| GO:0071345 | cellular response to cytokine stimulus | 27 | 9 | 6.39 | 1 | 0.15 | 0.15 | TNFRSF9;GBP4;TFPI;ADAMTS12;CSF2RA;CSF2RA;CSF2RA;INA;CCL2; |
| GO:0006952 | defense response | 39 | 12 | 9.23 | 2 | 0.17 | 0.17 | GBP4;ADAMTS12;SLC7A2;GAGE1;GAGE1;NA;NA;NA;NA;NA;PIK3AP1;CCL2; |
| GO:0007600 | sensory perception | 10 | 4 | 2.37 | 3 | 0.19 | 0.19 | TAS2R1;EYA4;EYA4;RDH10; |
| GO:0044087 | regulation of cellular component biogene… | 10 | 4 | 2.37 | 4 | 0.19 | 0.19 | TENM2;MPP7;NAV3;ARHGAP28; |
| GO:0050877 | nervous system process | 11 | 4 | 2.60 | 5 | 0.25 | 0.25 | TAS2R1;EYA4;EYA4;RDH10; |
| GO:0043270 | positive regulation of ion transport | 11 | 4 | 2.60 | 6 | 0.25 | 0.25 | PDZK1;NA;STC1;CCL2; |
| GO:0006820 | anion transport | 16 | 5 | 3.79 | 7 | 0.32 | 0.32 | PDZK1;NA;SLC7A2;STC1;SLC27A2; |
| GO:0050896 | response to stimulus | 132 | 33 | 31.23 | 8 | 0.34 | 0.34 | PDZK1;TNFRSF9;GBP4;NA;ITLN1;TFPI;COL5A2;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;SLC7A2;STC1;CSF2RA;CSF2RA;GAGE1;GPR50;GAGE1;NA;NA;NA;NA;NA;CSF2RA;INA;MPP7;PIK3AP1;SLC27A2;EMP2;CCL2;ARHGAP28; |
| GO:0034097 | response to cytokine | 33 | 9 | 7.81 | 9 | 0.37 | 0.37 | TNFRSF9;GBP4;TFPI;ADAMTS12;CSF2RA;CSF2RA;CSF2RA;INA;CCL2; |
| GO:0006325 | chromatin organization | 14 | 4 | 3.31 | 10 | 0.43 | 0.43 | EYA4;EYA4;SMARCA1;SMARCA1; |
| GO:0061448 | connective tissue development | 10 | 3 | 2.37 | 11 | 0.43 | 0.43 | ADAMTS12;STC1;SERPINB7; |
| GO:0048568 | embryonic organ development | 10 | 3 | 2.37 | 12 | 0.43 | 0.43 | COL5A2;PDGFC;RDH10; |
| GO:0001501 | skeletal system development | 19 | 5 | 4.50 | 13 | 0.48 | 0.48 | COL5A2;PDGFC;ADAMTS12;RDH10;STC1; |
| GO:0019221 | cytokine-mediated signaling pathway | 19 | 5 | 4.50 | 14 | 0.48 | 0.48 | TNFRSF9;CSF2RA;CSF2RA;CSF2RA;CCL2; |
| GO:0018212 | peptidyl-tyrosine modification | 15 | 4 | 3.55 | 15 | 0.49 | 0.49 | PDGFC;CSF2RA;CSF2RA;CSF2RA; |
| GO:0018108 | peptidyl-tyrosine phosphorylation | 15 | 4 | 3.55 | 16 | 0.49 | 0.49 | PDGFC;CSF2RA;CSF2RA;CSF2RA; |
| GO:0006886 | intracellular protein transport | 11 | 3 | 2.60 | 17 | 0.51 | 0.51 | PDZK1;NA;SLC27A2; |
| GO:0050727 | regulation of inflammatory response | 11 | 3 | 2.60 | 18 | 0.51 | 0.51 | ADAMTS12;SLC7A2;PIK3AP1; |
| GO:0051276 | chromosome organization | 16 | 4 | 3.79 | 19 | 0.55 | 0.55 | EYA4;EYA4;SMARCA1;SMARCA1; |
| GO:0006954 | inflammatory response | 16 | 4 | 3.79 | 20 | 0.55 | 0.55 | ADAMTS12;SLC7A2;PIK3AP1;CCL2; |
| GO:0030336 | negative regulation of cell migration | 12 | 3 | 2.84 | 21 | 0.57 | 0.57 | STC1;NAV3;CCL2; |
| GO:0001822 | kidney development | 12 | 3 | 2.84 | 22 | 0.57 | 0.57 | COL4A4;RDH10;SERPINB7; |
| GO:0072001 | renal system development | 12 | 3 | 2.84 | 23 | 0.57 | 0.57 | COL4A4;RDH10;SERPINB7; |
| GO:0001655 | urogenital system development | 12 | 3 | 2.84 | 24 | 0.57 | 0.57 | COL4A4;RDH10;SERPINB7; |
| GO:2000146 | negative regulation of cell motility | 12 | 3 | 2.84 | 25 | 0.57 | 0.57 | STC1;NAV3;CCL2; |
| GO:0071417 | cellular response to organonitrogen comp… | 12 | 3 | 2.84 | 26 | 0.57 | 0.57 | COL5A2;PDGFC;STC1; |
| GO:0006996 | organelle organization | 43 | 10 | 10.17 | 27 | 0.60 | 0.60 | EYA4;EYA4;SMARCA1;SMARCA1;INA;NAV3;SLC27A2;EMP2;CCL2;ARHGAP28; |
| GO:0006259 | DNA metabolic process | 17 | 4 | 4.02 | 28 | 0.60 | 0.60 | EYA4;EYA4;SMARCA1;SMARCA1; |
| GO:0046907 | intracellular transport | 17 | 4 | 4.02 | 29 | 0.60 | 0.60 | PDZK1;NA;SLC27A2;EMP2; |
| GO:0008283 | cell proliferation | 39 | 9 | 9.23 | 30 | 0.61 | 0.61 | PDZK1;TNFRSF9;NA;PDGFC;CDCA7L;STC1;EMP2;CCL2;SERPINB7; |
| GO:0071310 | cellular response to organic substance | 52 | 12 | 12.30 | 31 | 0.61 | 0.61 | TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CSF2RA;CSF2RA;CSF2RA;INA;CCL2; |
| GO:0030036 | actin cytoskeleton organization | 13 | 3 | 3.08 | 32 | 0.63 | 0.63 | INA;EMP2;ARHGAP28; |
| GO:0051271 | negative regulation of cellular componen… | 13 | 3 | 3.08 | 33 | 0.63 | 0.63 | STC1;NAV3;CCL2; |
| GO:0034762 | regulation of transmembrane transport | 13 | 3 | 3.08 | 34 | 0.63 | 0.63 | PDZK1;NA;ITLN1; |
| GO:0007186 | G protein-coupled receptor signaling pat… | 13 | 3 | 3.08 | 35 | 0.63 | 0.63 | TAS2R1;GPR50;CCL2; |
| GO:0044703 | multi-organism reproductive process | 13 | 3 | 3.08 | 36 | 0.63 | 0.63 | STC1;GTSF1;EMP2; |
| GO:0010469 | regulation of signaling receptor activit… | 13 | 3 | 3.08 | 37 | 0.63 | 0.63 | PDGFC;STC1;CCL2; |
| GO:0040013 | negative regulation of locomotion | 13 | 3 | 3.08 | 38 | 0.63 | 0.63 | STC1;NAV3;CCL2; |
| GO:0032101 | regulation of response to external stimu… | 22 | 5 | 5.21 | 39 | 0.63 | 0.63 | TFPI;ADAMTS12;SLC7A2;PIK3AP1;CCL2; |
| GO:0030029 | actin filament-based process | 18 | 4 | 4.26 | 40 | 0.66 | 0.66 | STC1;INA;EMP2;ARHGAP28; |
| GO:0034613 | cellular protein localization | 18 | 4 | 4.26 | 41 | 0.66 | 0.66 | PDZK1;NA;SLC27A2;EMP2; |
| GO:0070727 | cellular macromolecule localization | 18 | 4 | 4.26 | 42 | 0.66 | 0.66 | PDZK1;NA;SLC27A2;EMP2; |
| GO:0006950 | response to stress | 71 | 16 | 16.80 | 43 | 0.67 | 0.67 | GBP4;TFPI;ADAMTS12;EYA4;EYA4;SLC7A2;STC1;GAGE1;GAGE1;NA;NA;NA;NA;NA;PIK3AP1;CCL2; |
| GO:0000165 | MAPK cascade | 23 | 5 | 5.44 | 44 | 0.68 | 0.68 | PDGFC;CSF2RA;CSF2RA;CSF2RA;CCL2; |
| GO:0023014 | signal transduction by protein phosphory… | 23 | 5 | 5.44 | 45 | 0.68 | 0.68 | PDGFC;CSF2RA;CSF2RA;CSF2RA;CCL2; |
| GO:0007010 | cytoskeleton organization | 23 | 5 | 5.44 | 46 | 0.68 | 0.68 | INA;NAV3;EMP2;CCL2;ARHGAP28; |
| GO:0006812 | cation transport | 23 | 5 | 5.44 | 47 | 0.68 | 0.68 | PDZK1;NA;SLC7A2;STC1;CCL2; |
| GO:0044092 | negative regulation of molecular functio… | 14 | 3 | 3.31 | 48 | 0.69 | 0.69 | TFPI;ARHGAP28;SERPINB7; |
| GO:1901699 | cellular response to nitrogen compound | 14 | 3 | 3.31 | 49 | 0.69 | 0.69 | COL5A2;PDGFC;STC1; |
| GO:0035295 | tube development | 28 | 6 | 6.62 | 50 | 0.70 | 0.70 | COL4A4;PDGFC;ADAMTS12;RDH10;EMP2;CCL2; |
| GO:0043269 | regulation of ion transport | 19 | 4 | 4.50 | 51 | 0.70 | 0.70 | PDZK1;NA;STC1;CCL2; |
| GO:0046942 | carboxylic acid transport | 10 | 2 | 2.37 | 52 | 0.73 | 0.73 | SLC7A2;SLC27A2; |
| GO:0002237 | response to molecule of bacterial origin | 10 | 2 | 2.37 | 53 | 0.73 | 0.73 | TFPI;CCL2; |
| GO:0052548 | regulation of endopeptidase activity | 10 | 2 | 2.37 | 54 | 0.73 | 0.73 | TFPI;SERPINB7; |
| GO:0014068 | positive regulation of phosphatidylinosi… | 10 | 2 | 2.37 | 55 | 0.73 | 0.73 | PDGFC;PIK3AP1; |
| GO:0051222 | positive regulation of protein transport | 10 | 2 | 2.37 | 56 | 0.73 | 0.73 | PDZK1;NA; |
| GO:0050865 | regulation of cell activation | 10 | 2 | 2.37 | 57 | 0.73 | 0.73 | SLC7A2;CCL2; |
| GO:0032496 | response to lipopolysaccharide | 10 | 2 | 2.37 | 58 | 0.73 | 0.73 | TFPI;CCL2; |
| GO:0016311 | dephosphorylation | 10 | 2 | 2.37 | 59 | 0.73 | 0.73 | EYA4;EYA4; |
| GO:0015849 | organic acid transport | 10 | 2 | 2.37 | 60 | 0.73 | 0.73 | SLC7A2;SLC27A2; |
| GO:0002009 | morphogenesis of an epithelium | 10 | 2 | 2.37 | 61 | 0.73 | 0.73 | ADAMTS12;RDH10; |
| GO:0043542 | endothelial cell migration | 10 | 2 | 2.37 | 62 | 0.73 | 0.73 | STC1;EMP2; |
| GO:1904062 | regulation of cation transmembrane trans… | 10 | 2 | 2.37 | 63 | 0.73 | 0.73 | PDZK1;NA; |
| GO:0010594 | regulation of endothelial cell migration | 10 | 2 | 2.37 | 64 | 0.73 | 0.73 | STC1;EMP2; |
| GO:1904951 | positive regulation of establishment of … | 10 | 2 | 2.37 | 65 | 0.73 | 0.73 | PDZK1;NA; |
| GO:0019725 | cellular homeostasis | 10 | 2 | 2.37 | 66 | 0.73 | 0.73 | STC1;CCL2; |
| GO:0032386 | regulation of intracellular transport | 10 | 2 | 2.37 | 67 | 0.73 | 0.73 | PDZK1;NA; |
| GO:0071495 | cellular response to endogenous stimulus | 29 | 6 | 6.86 | 68 | 0.73 | 0.73 | TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CCL2; |
| GO:0070887 | cellular response to chemical stimulus | 56 | 12 | 13.25 | 69 | 0.73 | 0.73 | TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CSF2RA;CSF2RA;CSF2RA;INA;CCL2; |
| GO:0001525 | angiogenesis | 15 | 3 | 3.55 | 70 | 0.73 | 0.73 | COL4A4;EMP2;CCL2; |
| GO:0098655 | cation transmembrane transport | 15 | 3 | 3.55 | 71 | 0.73 | 0.73 | PDZK1;NA;SLC7A2; |
| GO:0034622 | cellular protein-containing complex asse… | 15 | 3 | 3.55 | 72 | 0.73 | 0.73 | PDGFC;NAV3;ARHGAP28; |
| GO:0071396 | cellular response to lipid | 15 | 3 | 3.55 | 73 | 0.73 | 0.73 | TFPI;STC1;CCL2; |
| GO:0007165 | signal transduction | 78 | 17 | 18.46 | 74 | 0.74 | 0.74 | TNFRSF9;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;STC1;CSF2RA;CSF2RA;GPR50;CSF2RA;MPP7;PIK3AP1;EMP2;CCL2;ARHGAP28; |
| GO:0008284 | positive regulation of cell proliferatio… | 20 | 4 | 4.73 | 75 | 0.74 | 0.74 | PDGFC;CDCA7L;EMP2;SERPINB7; |
| GO:0060548 | negative regulation of cell death | 20 | 4 | 4.73 | 76 | 0.74 | 0.74 | EYA4;EYA4;CCL2;EVI2B; |
| GO:0000003 | reproduction | 20 | 4 | 4.73 | 77 | 0.74 | 0.74 | RDH10;STC1;GTSF1;EMP2; |
| GO:0033554 | cellular response to stress | 20 | 4 | 4.73 | 78 | 0.74 | 0.74 | EYA4;EYA4;STC1;CCL2; |
| GO:0043066 | negative regulation of apoptotic process | 20 | 4 | 4.73 | 79 | 0.74 | 0.74 | EYA4;EYA4;CCL2;EVI2B; |
| GO:0043069 | negative regulation of programmed cell d… | 20 | 4 | 4.73 | 80 | 0.74 | 0.74 | EYA4;EYA4;CCL2;EVI2B; |
| GO:0022414 | reproductive process | 20 | 4 | 4.73 | 81 | 0.74 | 0.74 | RDH10;STC1;GTSF1;EMP2; |
| GO:0035239 | tube morphogenesis | 25 | 5 | 5.92 | 82 | 0.75 | 0.75 | COL4A4;ADAMTS12;RDH10;EMP2;CCL2; |
| GO:0030334 | regulation of cell migration | 25 | 5 | 5.92 | 83 | 0.75 | 0.75 | PDGFC;STC1;NAV3;EMP2;CCL2; |
| GO:0051050 | positive regulation of transport | 25 | 5 | 5.92 | 84 | 0.75 | 0.75 | PDZK1;NA;ITLN1;STC1;CCL2; |
| GO:0065003 | protein-containing complex assembly | 25 | 5 | 5.92 | 85 | 0.75 | 0.75 | ITLN1;PDGFC;MPP7;NAV3;ARHGAP28; |
| GO:0051716 | cellular response to stimulus | 96 | 21 | 22.71 | 86 | 0.76 | 0.76 | TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;STC1;CSF2RA;CSF2RA;GPR50;CSF2RA;INA;MPP7;PIK3AP1;EMP2;CCL2;ARHGAP28; |
| GO:0006793 | phosphorus metabolic process | 48 | 10 | 11.36 | 87 | 0.76 | 0.76 | ITLN1;PDGFC;EYA4;EYA4;CSF2RA;CSF2RA;CSF2RA;PIK3AP1;EMP2;CCL2; |
| GO:0006796 | phosphate-containing compound metabolic … | 48 | 10 | 11.36 | 88 | 0.76 | 0.76 | ITLN1;PDGFC;EYA4;EYA4;CSF2RA;CSF2RA;CSF2RA;PIK3AP1;EMP2;CCL2; |
| GO:0014070 | response to organic cyclic compound | 16 | 3 | 3.79 | 89 | 0.78 | 0.78 | TFPI;STC1;CCL2; |
| GO:0071229 | cellular response to acid chemical | 11 | 2 | 2.60 | 90 | 0.78 | 0.78 | COL5A2;PDGFC; |
| GO:0070374 | positive regulation of ERK1 and ERK2 cas… | 11 | 2 | 2.60 | 91 | 0.78 | 0.78 | PDGFC;CCL2; |
| GO:0048015 | phosphatidylinositol-mediated signaling | 11 | 2 | 2.60 | 92 | 0.78 | 0.78 | PDGFC;PIK3AP1; |
| GO:0048017 | inositol lipid-mediated signaling | 11 | 2 | 2.60 | 93 | 0.78 | 0.78 | PDGFC;PIK3AP1; |
| GO:0014065 | phosphatidylinositol 3-kinase signaling | 11 | 2 | 2.60 | 94 | 0.78 | 0.78 | PDGFC;PIK3AP1; |
| GO:0014066 | regulation of phosphatidylinositol 3-kin… | 11 | 2 | 2.60 | 95 | 0.78 | 0.78 | PDGFC;PIK3AP1; |
| GO:0052547 | regulation of peptidase activity | 11 | 2 | 2.60 | 96 | 0.78 | 0.78 | TFPI;SERPINB7; |
| GO:0080135 | regulation of cellular response to stres… | 11 | 2 | 2.60 | 97 | 0.78 | 0.78 | EYA4;EYA4; |
| GO:0045786 | negative regulation of cell cycle | 11 | 2 | 2.60 | 98 | 0.78 | 0.78 | CCL2;EVI2B; |
| GO:0070838 | divalent metal ion transport | 11 | 2 | 2.60 | 99 | 0.78 | 0.78 | STC1;CCL2; |
| GO:0048598 | embryonic morphogenesis | 11 | 2 | 2.60 | 100 | 0.78 | 0.78 | COL5A2;RDH10; |
write.csv(res_top_GO, "pairwise_test_results/cGFR.cGF/cGFR.cGF.GO.enrichment.csv")
This is a ranking of which GO terms are enriched according to two different testing strategies, elim and Fisher.classic. these are, roughly, probability scores and can be broadly interpreted the same way as p-scores.
nodes <- showSigOfNodes(top_GO_data, score(result_top_GO_elim), firstSigNodes = 3,
useInfo = 'def')
png("pairwise_test_results/cGFR.cGF/tGFR.tGFC.GO.nodes.png")
nodes
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 16
## Number of Edges = 19
##
## $complete.dag
## [1] "A graph with 16 nodes."
dev.off()
## png
## 2
nodes
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 16
## Number of Edges = 19
##
## $complete.dag
## [1] "A graph with 16 nodes."
Reactome maps
library(reactome.db)
library(DOSE)
library(ReactomePA)
entrez_ids <- mapIds(hta20transcriptcluster.db,
keys = rownames(eset),
keytype = "PROBEID",
column = "ENTREZID",
multiVals = "first")
reactome_enrich <- enrichPathway(gene = entrez_ids[diff.probe],
universe = entrez_ids[c(diff.probe,
back.genes)],
organism = "human",
pvalueCutoff = 1,
qvalueCutoff = 1,
readable = TRUE)
reactome_enrich@result$Description <- paste0(str_sub(
reactome_enrich@result$Description, 1, 20),
"...")
write.csv(as.data.frame(reactome_enrich), "pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.enrichment.csv")
kable(reactome_enrich) %>%
kable_styling() %>%
scroll_box(width = "100%", height = "400px")
| ID | Description | GeneRatio | BgRatio | pvalue | p.adjust | qvalue | geneID | Count | |
|---|---|---|---|---|---|---|---|---|---|
| R-HSA-1280215 | R-HSA-1280215 | Cytokine Signaling i… | 4/17 | 11/114 | 0.0583678 | 0.7388291 | 0.7388291 | TNFRSF9/CCL2/CSF2RA/GBP4 | 4 |
| R-HSA-168256 | R-HSA-168256 | Immune System… | 7/17 | 31/114 | 0.1345785 | 0.7388291 | 0.7388291 | TNFRSF9/PIK3AP1/CCL2/SLC27A2/CSF2RA/ITLN1/GBP4 | 7 |
| R-HSA-162582 | R-HSA-162582 | Signal Transduction… | 7/17 | 34/114 | 0.2031929 | 0.7388291 | 0.7388291 | COL5A2/PIK3AP1/COL4A4/TAS2R1/CSF2RA/PDGFC/RDH10 | 7 |
| R-HSA-9006934 | R-HSA-9006934 | Signaling by Recepto… | 3/17 | 11/114 | 0.2110940 | 0.7388291 | 0.7388291 | COL5A2/COL4A4/PDGFC | 3 |
| R-HSA-1643685 | R-HSA-1643685 | Disease… | 3/17 | 16/114 | 0.4395847 | 0.9104492 | 0.9104492 | PIK3AP1/CSF2RA/ADAMTS12 | 3 |
| R-HSA-422475 | R-HSA-422475 | Axon guidance… | 2/17 | 12/114 | 0.5647665 | 0.9104492 | 0.9104492 | COL5A2/COL4A4 | 2 |
| R-HSA-392499 | R-HSA-392499 | Metabolism of protei… | 3/17 | 19/114 | 0.5705701 | 0.9104492 | 0.9104492 | CCL2/CSF2RA/ADAMTS12 | 3 |
| R-HSA-1266738 | R-HSA-1266738 | Developmental Biolog… | 2/17 | 16/114 | 0.7345116 | 0.9104492 | 0.9104492 | COL5A2/COL4A4 | 2 |
| R-HSA-1474244 | R-HSA-1474244 | Extracellular matrix… | 2/17 | 16/114 | 0.7345116 | 0.9104492 | 0.9104492 | COL5A2/COL4A4 | 2 |
| R-HSA-168249 | R-HSA-168249 | Innate Immune System… | 2/17 | 16/114 | 0.7345116 | 0.9104492 | 0.9104492 | SLC27A2/ITLN1 | 2 |
| R-HSA-109582 | R-HSA-109582 | Hemostasis… | 1/17 | 11/114 | 0.8454171 | 0.9104492 | 0.9104492 | TFPI | 1 |
| R-HSA-556833 | R-HSA-556833 | Metabolism of lipids… | 1/17 | 11/114 | 0.8454171 | 0.9104492 | 0.9104492 | SLC27A2 | 1 |
| R-HSA-597592 | R-HSA-597592 | Post-translational p… | 1/17 | 11/114 | 0.8454171 | 0.9104492 | 0.9104492 | ADAMTS12 | 1 |
| R-HSA-1430728 | R-HSA-1430728 | Metabolism… | 1/17 | 23/114 | 0.9845429 | 0.9845429 | 0.9845429 | SLC27A2 | 1 |
Barplots, showing reaction categories represented and their respective p-values:
png("pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.barplot.png")
barplot(reactome_enrich)
dev.off()
## png
## 2
barplot(reactome_enrich)
png("pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.emapplot.png")
emapplot(reactome_enrich, showCategory = 10)
dev.off()
## png
## 2
emapplot(reactome_enrich, showCategory = 10)
In this plot, color is p-value, size is proportional to the number of DE genes that mapped to the pathway, and the edge thickness is proportional to the number of shared genes between the units. This is useful for examining the interdependence of the pathways; which results should be taken more serioiusly, are independent? This particular plot shows the top ten most significant pathways, regardless of their p-values