Differential Gene Expressionand Pathway Enrichment: cGFR vs. cGF

Pipeline overview

This script performs pairwise testing for gene-expression results via Affymetrix gene chip, visualizes the results, places the differentially-expressed genes into pathways via GO and Reactome, and performs pathway enrichment analyses.

Sources: The second half of this script (pathway enrichment ) draws from https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html

Preparation for adaptation to a different pairwise comparison within the project

You should already have prepared test matrix files according to limma protocols

  1. The strings “cGFR.cGF” and “cGFR vs. cGF” are used throughout the script to denote the test being performed. These should be bulk find-and-replaced throughout the script
  2. The “targets” matrix in section “3. Set up test matrices” must be changed to the appropriate test
  3. Sections “5. Load in the metadata” and “6. Set up the test” must be customized to the desired test
  4. In Section 8. chance the “coef=” to the appropriate category

Results files;

1. Prepare the base environment

2. Prepare Annotation Table to yield useful mappings of probe to gene

Annotation <- read.csv("HTA-2_0.na35.2.hg19.transcript.csv/HTA-2_0.na35.2.hg19.transcript.csv", header=TRUE, skip=19)
probe.gene.map <- read.csv("probe.gene.map.csv")

3. Set up test matrices

Differential abundance can only be performed on pairs of sample sets. Each pairwise test has to be configured as a separate file. Importing the test matrices:

targets <- read.csv("targets.noDrug.Strain.txt", as.is=T)

all of the samples and metadata are listed here:

targets 
##                                      FileName Strain Treatment Experiment
## 1   20160505_HTA2.0_0316_WTH_1R_(HTA-2_0).CEL     GF   control          1
## 2   20160505_HTA2.0_0318_WTH_3R_(HTA-2_0).CEL    GFR   control          1
## 3 20160616_HTA2.0_0333_WTH_1_R1_(HTA-2_0).CEL     GF   control          2
## 4 20160616_HTA2.0_0335_WTH_3_R2_(HTA-2_0).CEL    GFR   control          2

4. Define the subset of .CEL files, read them in, and normalize them

dat <- read.celfiles(targets$FileName)
## Reading in : 20160505_HTA2.0_0316_WTH_1R_(HTA-2_0).CEL
## Reading in : 20160505_HTA2.0_0318_WTH_3R_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0333_WTH_1_R1_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0335_WTH_3_R2_(HTA-2_0).CEL
project.bgcorrect.norm.avg <- rma(dat, background=TRUE, normalize=TRUE)
## Background correcting
## Normalizing
## Calculating Expression
project.bgcorrect.norm.avg.filt <- exprs(project.bgcorrect.norm.avg)[which(rownames(project.bgcorrect.norm.avg) %in% Annotation[,1]),]

5. Load in the metadata

Exp <- factor(targets$Experiment)
Strain <- factor(targets$Strain, levels=c("GF","GFR"))
Treatment <- factor(targets$Treatment, levels=c("control","drug"))

6. Define the test

This is important decision; these chips were ran from different experimental batches, thus I have controlled for experiment, essentially running a paired analysis. In an earlier iteration, I did not control for experiment and found very few differences due to the high variance between experimental replicates

design <- model.matrix(~Exp+Strain)

7. Run “fit” algorithms and separate out the significantly different probes

eset <- project.bgcorrect.norm.avg.filt
fit <- lmFit(eset, design)
fit <- eBayes(fit, trend=T, robust=T)
## Warning: 65 very small variances detected, have been offset away from zero
results <- decideTests(fit, p.value = 0.05, method = "separate", adjust.method = "fdr")
write.table(summary(results), "pairwise_test_results/cGFR.cGF/summary.cGFR.cGF.csv", quote=F)

This is a simple summary of the differences, only the far right column

summary(results)
##        (Intercept)  Exp2 StrainGFR
## Down             0     0        29
## NotSig           4 70523     70450
## Up           70519     0        44

8. Generate tables of the differentially expressed genes and attach gene name/function information

This script also generates a temporary differential expression table for ALL probes which can be used later for comparisons with other pairwise-test results

library(data.table)
library(dplyr)
#full table
fit.cGFR.cGF <- topTable(fit, n = nrow(fit))
#diff table only, must be customized based on summary results
diff <- topTable(fit, p.value = 0.05,  n = Inf, coef="StrainGFR")
setDT(diff, keep.rownames = TRUE)[]
colnames(diff)[1] <- "probe"
diff <- dplyr::left_join(diff,probe.gene.map,by="probe")
## Warning: Column `probe` joining character vector and factor, coercing into
## character vector
write.table(diff, "pairwise_test_results/cGFR.cGF/diff.cGFR.cGF", quote = F)
write.table(diff$name, "pairwise_test_results/cGFR.cGF/diff.cGFR.cGF.gene.list", quote = F)
kable(diff) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
probe logFC AveExpr t P.Value adj.P.Val B name description
TC05001205.hg.1 -4.628246 5.478085 -51.37863 3.00e-07 0.0226388 4.343165 CDH10 cadherin 10, type 2 (T2-cadherin)
TC18000084.hg.1 -3.559838 5.536229 -38.69108 1.20e-06 0.0226388 4.141456 ANKRD30B ankyrin repeat domain 30B
TC09000323.hg.1 4.167048 7.927503 34.93740 1.80e-06 0.0226388 4.041932 MAMDC2 MAM domain containing 2
TC19000805.hg.1 3.678938 5.928768 35.70503 1.80e-06 0.0226388 3.995222 ERVV-2 endogenous retrovirus group V, member 2
TC03000117.hg.1 2.870035 4.204782 32.22194 2.60e-06 0.0226388 3.950694 ZNF385D-AS1 ZNF385D antisense RNA 1
TC0X000272.hg.1 -2.896188 5.694877 -31.54888 2.80e-06 0.0226388 3.924952 GAGE12F G antigen 12F
TC0X000273.hg.1 -2.896188 5.694877 -31.54888 2.80e-06 0.0226388 3.924952 GAGE12F G antigen 12F
TC07001184.hg.1 2.736362 5.172051 31.36573 2.90e-06 0.0226388 3.917706 CDCA7L cell division cycle associated 7-like
TC19000804.hg.1 3.146994 5.332210 30.61629 3.60e-06 0.0226388 3.803272 ERVV-1 endogenous retrovirus group V, member 1
TC06000999.hg.1 -4.086888 5.285906 -30.30939 3.70e-06 0.0226388 3.790327 EYA4 EYA transcriptional coactivator and phosphatase 4
TC01001099.hg.1 2.409066 4.707651 27.31432 5.30e-06 0.0226388 3.725890 PDZK1 PDZ domain containing 1
TC02002606.hg.1 -2.758627 6.018420 -26.73367 6.20e-06 0.0226388 3.631533 TFPI tissue factor pathway inhibitor (lipoprotein-associated coagulation inhibitor)
TC17001279.hg.1 2.584618 2.707918 26.15430 6.60e-06 0.0226388 3.622087
TC0X002226.hg.1 3.164775 5.597893 26.42792 6.70e-06 0.0226388 3.594009 SMARCA1 SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1
TC0X002352.hg.1 -2.580665 5.921162 -25.48702 7.50e-06 0.0226388 3.563792 GAGE12C G antigen 12C
TC0X002349.hg.1 -2.620039 6.898765 -24.50655 8.40e-06 0.0226388 3.547016 GAGE12F G antigen 12F
TC0X002354.hg.1 -2.512567 6.483804 -24.44983 8.50e-06 0.0226388 3.542905 GAGE12C G antigen 12C
TC0X002353.hg.1 -2.511968 6.484104 -24.43050 8.50e-06 0.0226388 3.541499 GAGE12C G antigen 12C
TC0X002355.hg.1 -2.471585 6.204097 -24.46811 8.50e-06 0.0226388 3.539114 GAGE2D G antigen 2D
TC0X000274.hg.1 -2.523305 6.497707 -24.27791 8.70e-06 0.0226388 3.530308 GAGE12C G antigen 12C
TC0X002356.hg.1 -2.541908 4.784199 -25.25511 8.10e-06 0.0226388 3.520203 GAGE13 G antigen 13
TC10001136.hg.1 2.669551 5.743115 24.97706 8.50e-06 0.0226388 3.501508 MPP7 membrane protein, palmitoylated 7 (MAGUK p55 subfamily member 7)
TC04000170.hg.1 -2.228417 3.448540 -23.62693 9.80e-06 0.0226388 3.480702 RP11-552M14.1 novel transcript
TC08001062.hg.1 3.530057 5.032105 24.53266 9.20e-06 0.0226388 3.470601 STC1 stanniocalcin 1
TC16001410.hg.1 4.400137 5.923346 24.10786 9.90e-06 0.0226388 3.439822
TC0X000334.hg.1 3.026448 4.168199 24.03499 1.00e-05 0.0226388 3.434416 FOXR2 forkhead box R2
TC18000033.hg.1 2.873209 5.529335 23.96108 1.01e-05 0.0226388 3.428895 ARHGAP28 Rho GTPase activating protein 28
TC01002163.hg.1 2.069735 5.581921 22.90347 1.12e-05 0.0226388 3.421802 TNFRSF9 tumor necrosis factor receptor superfamily, member 9
TC0X002348.hg.1 -2.410759 6.902910 -22.66891 1.17e-05 0.0226388 3.401796 GAGE2D G antigen 2D
TC01003130.hg.1 2.457507 4.530527 23.29069 1.14e-05 0.0226388 3.376985 PDZK1P2 PDZ domain containing 1 pseudogene 2
TC16000857.hg.1 2.913084 6.625634 23.26734 1.15e-05 0.0226388 3.375115 EMP2 epithelial membrane protein 2
TC01003115.hg.1 2.512097 4.687943 23.20682 1.16e-05 0.0226388 3.370250 PDZK1 PDZ domain containing 1
TC0X002351.hg.1 -2.564517 7.208601 -22.43019 1.24e-05 0.0226388 3.368584 GAGE2B G antigen 2B
TC12001572.hg.1 -2.918966 4.109037 -23.17832 1.17e-05 0.0226388 3.367949 GTSF1 gametocyte specific factor 1
TC17001282.hg.1 2.075472 2.676655 22.09676 1.31e-05 0.0226388 3.351010
TC02002612.hg.1 -2.094084 5.836062 -22.06609 1.32e-05 0.0226388 3.348205 COL5A2 collagen, type V, alpha 2
TC10001559.hg.1 1.978422 5.501996 22.06182 1.32e-05 0.0226388 3.347814 PIK3AP1 phosphoinositide-3-kinase adaptor protein 1
TC0X002350.hg.1 -2.767751 7.172888 -22.76734 1.26e-05 0.0226388 3.336760 GAGE2D G antigen 2D
TC01005783.hg.1 2.607030 4.493940 22.74484 1.26e-05 0.0226388 3.332151 PDZK1P2 PDZ domain containing 1 pseudogene 2
TC10000771.hg.1 2.321318 5.973386 22.45138 1.31e-05 0.0226388 3.324560 INA internexin neuronal intermediate filament protein, alpha
TC11000149.hg.1 2.353825 5.552275 22.54415 1.31e-05 0.0226388 3.315053 OLFML1 olfactomedin-like 1
TC0X001704.hg.1 -2.528942 8.288097 -21.40757 1.50e-05 0.0226936 3.285849 GAGE1 G antigen 1
TC02004836.hg.1 1.969937 4.283770 21.50581 1.50e-05 0.0226936 3.279227 COL4A4 collagen, type IV, alpha 4
TC17000383.hg.1 3.248247 6.971808 22.09219 1.43e-05 0.0226936 3.275279 CCL2 chemokine (C-C motif) ligand 2
TC0X001917.hg.1 2.386292 4.159331 22.04135 1.45e-05 0.0226936 3.270691 GPR50 G protein-coupled receptor 50
TC0X001353.hg.1 2.706157 5.292266 22.00855 1.45e-05 0.0226936 3.267720 SMARCA1 SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a, member 1
TC06003061.hg.1 -4.559864 5.397600 -21.80603 1.51e-05 0.0226936 3.249151 EYA4 EYA transcriptional coactivator and phosphatase 4
47424360_st -2.330965 3.813760 -21.55866 1.59e-05 0.0232931 3.225949
TC0X002315.hg.1 -2.796348 7.124280 -21.46068 1.62e-05 0.0232931 3.216597 GAGE2A G antigen 2A
TC15000391.hg.1 2.811151 5.713701 21.33499 1.66e-05 0.0234032 3.204463 SLC27A2 solute carrier family 27 (fatty acid transporter), member 2
TC0X002314.hg.1 -2.364330 6.175474 -21.06156 1.75e-05 0.0242348 3.177522 GAGE1 G antigen 1
TC18000230.hg.1 2.001510 4.306799 20.47761 1.97e-05 0.0267788 3.117380 SERPINB7 serpin peptidase inhibitor, clade B (ovalbumin), member 7
TC12000656.hg.1 -1.748827 4.519903 -19.66398 2.15e-05 0.0285715 3.098885 NAV3 neuron navigator 3
TC0X002358.hg.1 -2.280241 6.979847 -19.84562 2.19e-05 0.0285715 3.073760 GAGE12J G antigen 12J
TC05003420.hg.1 1.660110 4.677979 19.22849 2.37e-05 0.0298965 3.046592 TAS2R1 taste receptor, type 2, member 1
TC0X001567.hg.1 2.123993 7.233006 19.22431 2.37e-05 0.0298965 3.046078
TC05002946.hg.1 1.855209 3.750657 19.21772 2.44e-05 0.0302081 3.022631 ADAMTS12 ADAM metallopeptidase with thrombospondin type 1 motif, 12
TC01003409.hg.1 1.627664 4.666166 18.74092 2.65e-05 0.0316426 2.985066 ITLN1 intelectin 1 (galactofuranose binding)
TC17001289.hg.1 2.388848 2.777901 19.16648 2.61e-05 0.0316426 2.968129
TC05000914.hg.1 1.912424 5.757070 18.59170 2.97e-05 0.0349450 2.895785 TENM2 teneurin transmembrane protein 2
TC08000127.hg.1 1.888398 6.471544 17.96572 3.21e-05 0.0371469 2.869049 SLC7A2 solute carrier family 7 (cationic amino acid transporter, y+ system), member 2
TC0X000056.hg.1 1.699452 5.117462 17.85644 3.47e-05 0.0389575 2.810004 RP11-1L9.1 putative novel transcript
TC04002799.hg.1 1.643756 5.296234 17.71051 3.52e-05 0.0389575 2.805919 PDGFC platelet derived growth factor C
TC01002849.hg.1 -1.516580 5.229775 -17.34431 3.69e-05 0.0389575 2.789475 GBP4 guanylate binding protein 4
TC17001342.hg.1 -3.181241 4.689182 -17.79054 3.58e-05 0.0389575 2.787024 EVI2A ecotropic viral integration site 2A
TC07002625.hg.1 2.233647 4.814454 17.70976 3.65e-05 0.0389575 2.775510 TMEM178B transmembrane protein 178B
TC0X000121.hg.1 1.836808 6.067730 17.57681 3.70e-05 0.0389575 2.772030 SCARNA23 small Cajal body-specific RNA 23
TC0X001301.hg.1 3.471041 6.170682 17.29918 4.03e-05 0.0418177 2.715348 RP11-232D9.3 putative novel transcript
TC10001544.hg.1 1.894721 5.457671 17.17982 4.15e-05 0.0420208 2.697327 PLCE1-AS1 PLCE1 antisense RNA 1
TC17001341.hg.1 -3.198643 3.758366 -17.16121 4.17e-05 0.0420208 2.694495 EVI2B ecotropic viral integration site 2B
TC0Y000006.hg.1 1.865103 7.304595 16.65119 4.39e-05 0.0435698 2.680417 CSF2RA colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage)
TC08000482.hg.1 1.827821 6.974642 16.47493 4.62e-05 0.0446949 2.646437 RDH10 retinol dehydrogenase 10 (all-trans)
TC0X000006.hg.1 1.694680 6.609224 16.44449 4.63e-05 0.0446949 2.646182 CSF2RA colony stimulating factor 2 receptor, alpha, low-affinity (granulocyte-macrophage)

This table can be regarded as a final output of the pairwise differential expression pipeline.

9.split this into up- and down-regulated gene tables and lists for analysis and pathway mapping

up <- diff[diff$logFC > 0,]
write.csv(up,"pairwise_test_results/cGFR.cGF/up.cGFR.cGF",row.names = F,quote = F)
write.csv(up$name,"pairwise_test_results/cGFR.cGF/up.cGFR.cGF.gene.list",row.names = F,quote = F)
down <- diff[diff$logFC < 0,]
write.csv(down,"pairwise_test_results/cGFR.cGF/down.cGFR.cGF",row.names = F,quote = F)
write.csv(down$name,"pairwise_test_results/cGFR.cGF/down.cGFR.cGF.gene.list",row.names = F,quote = F)

10. Plotting intensity and variance, highlighting significant differences (plot shown at start of document also)

png("pairwise_test_results/cGFR.cGF/cGFR.cGF.MD.png")
plotMD(fit,status = results[,3], legend = F, hl.col = c("red","blue"), main = "cGFR vs. cGF", xlim=c(0,14))
abline(0,0,col="red",lty=3)
dev.off()
## png 
##   2
plotMD(fit,status = results[,3], legend = F, hl.col = c("red","blue"), main = "cGFR vs. cGF", xlim=c(0,14))
abline(0,0,col="red",lty=3)

We now have full tables for differentially expressed genes. We can also generate tables of up- and down-regulated pathways via the Reactome.db. These will be handled in a separate script.

11. Linking to GO enrichment analysis: step 1, gather a set of background genes to compare

This pipeline is adapted from https://bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html (this required a good deal of modification to current pipeline)

First, we get a “control set” of genes, those which have similar expression levels but are not differentially expressed. this is handled by a package “genefilter”.

library(genefilter)
library(Biobase)
#make a list of diff genes
diff.probe <- subset(x = diff)$probe
background.genes <- genefilter::genefinder(eset, as.character(diff.probe), method = "manhattan", scale = "none")
background.genes <- sapply(background.genes, function(x) x$indices)
background.genes <- as.vector(background.genes)
back.genes <- featureNames(project.bgcorrect.norm.avg)[background.genes]

We now remove any differentially expressed genes from the background gene set and verify that it is correct: the diff set and the background set should have nothing in common.

back.genes <- setdiff(back.genes, diff.probe)
intersect(back.genes, diff.probe)
## character(0)

12. Examine mean intensity of the diff set, background set and all probes

library(cancerTiming)
multidensity(list(
    all = fit.cGFR.cGF[,"AveExpr"],
    fore = diff[,"AveExpr"],  
    back = fit.cGFR.cGF[rownames(fit.cGFR.cGF) %in% back.genes, "AveExpr"]),
  col = c("#e46981", "#ae7ee2", "#a7ad4a"),
  xlab = "mean expression",
  main = "cGFR vs cGF: Differentially Expressed, Background, and All gene expression levels"
)

13. GO enrichment analysis

Is the differential expression set enriched in particular functional categories? This is why we have created a background gene set, so that we know the “typical” functional distribution of genes expressed at similar levels. We will connect the probe names to GO categories and run a few tests for pathway enrichment.

library(topGO)

gene.IDs <- rownames(eset)
in.universe <- gene.IDs %in% c(diff.probe,back.genes)
in.selection <- gene.IDs %in% diff.probe

all.genes <- in.selection[in.universe]
all.genes <- factor(as.integer(in.selection[in.universe]))
names(all.genes) <- gene.IDs[in.universe]

#this builds a database with all GO information related to our two test sets AND whether the gene is differentially expressed
top_GO_data <- new("topGOdata", ontology = "BP", allGenes = all.genes,
 nodeSize = 10, annot = annFUN.db, affyLib = "hta20transcriptcluster.db")

result_top_GO_elim <- 
  runTest(top_GO_data, algorithm = "elim", statistic = "Fisher")
result_top_GO_classic <- 
  runTest(top_GO_data, algorithm = "classic", statistic = "Fisher")
library(stringr)
res_top_GO <- GenTable(top_GO_data, Fisher.elim = result_top_GO_elim,
        Fisher.classic = result_top_GO_classic,
        orderBy = "Fisher.elim" , topNodes = 100)

genes_top_GO <- printGenes(top_GO_data, whichTerms = res_top_GO$GO.ID,
    chip = "hta20transcriptcluster.db", geneCutOff = 1000)

res_top_GO$sig_genes <- sapply(genes_top_GO, function(x){
                str_c(paste0(x[x$'raw p-value' == 2, "Symbol.id"],";"), 
                      collapse = "")
    })
kable(res_top_GO) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
GO.ID Term Annotated Significant Expected Rank in Fisher.classic Fisher.elim Fisher.classic sig_genes
GO:0071345 cellular response to cytokine stimulus 27 9 6.39 1 0.15 0.15 TNFRSF9;GBP4;TFPI;ADAMTS12;CSF2RA;CSF2RA;CSF2RA;INA;CCL2;
GO:0006952 defense response 39 12 9.23 2 0.17 0.17 GBP4;ADAMTS12;SLC7A2;GAGE1;GAGE1;NA;NA;NA;NA;NA;PIK3AP1;CCL2;
GO:0007600 sensory perception 10 4 2.37 3 0.19 0.19 TAS2R1;EYA4;EYA4;RDH10;
GO:0044087 regulation of cellular component biogene… 10 4 2.37 4 0.19 0.19 TENM2;MPP7;NAV3;ARHGAP28;
GO:0050877 nervous system process 11 4 2.60 5 0.25 0.25 TAS2R1;EYA4;EYA4;RDH10;
GO:0043270 positive regulation of ion transport 11 4 2.60 6 0.25 0.25 PDZK1;NA;STC1;CCL2;
GO:0006820 anion transport 16 5 3.79 7 0.32 0.32 PDZK1;NA;SLC7A2;STC1;SLC27A2;
GO:0050896 response to stimulus 132 33 31.23 8 0.34 0.34 PDZK1;TNFRSF9;GBP4;NA;ITLN1;TFPI;COL5A2;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;SLC7A2;STC1;CSF2RA;CSF2RA;GAGE1;GPR50;GAGE1;NA;NA;NA;NA;NA;CSF2RA;INA;MPP7;PIK3AP1;SLC27A2;EMP2;CCL2;ARHGAP28;
GO:0034097 response to cytokine 33 9 7.81 9 0.37 0.37 TNFRSF9;GBP4;TFPI;ADAMTS12;CSF2RA;CSF2RA;CSF2RA;INA;CCL2;
GO:0006325 chromatin organization 14 4 3.31 10 0.43 0.43 EYA4;EYA4;SMARCA1;SMARCA1;
GO:0061448 connective tissue development 10 3 2.37 11 0.43 0.43 ADAMTS12;STC1;SERPINB7;
GO:0048568 embryonic organ development 10 3 2.37 12 0.43 0.43 COL5A2;PDGFC;RDH10;
GO:0001501 skeletal system development 19 5 4.50 13 0.48 0.48 COL5A2;PDGFC;ADAMTS12;RDH10;STC1;
GO:0019221 cytokine-mediated signaling pathway 19 5 4.50 14 0.48 0.48 TNFRSF9;CSF2RA;CSF2RA;CSF2RA;CCL2;
GO:0018212 peptidyl-tyrosine modification 15 4 3.55 15 0.49 0.49 PDGFC;CSF2RA;CSF2RA;CSF2RA;
GO:0018108 peptidyl-tyrosine phosphorylation 15 4 3.55 16 0.49 0.49 PDGFC;CSF2RA;CSF2RA;CSF2RA;
GO:0006886 intracellular protein transport 11 3 2.60 17 0.51 0.51 PDZK1;NA;SLC27A2;
GO:0050727 regulation of inflammatory response 11 3 2.60 18 0.51 0.51 ADAMTS12;SLC7A2;PIK3AP1;
GO:0051276 chromosome organization 16 4 3.79 19 0.55 0.55 EYA4;EYA4;SMARCA1;SMARCA1;
GO:0006954 inflammatory response 16 4 3.79 20 0.55 0.55 ADAMTS12;SLC7A2;PIK3AP1;CCL2;
GO:0030336 negative regulation of cell migration 12 3 2.84 21 0.57 0.57 STC1;NAV3;CCL2;
GO:0001822 kidney development 12 3 2.84 22 0.57 0.57 COL4A4;RDH10;SERPINB7;
GO:0072001 renal system development 12 3 2.84 23 0.57 0.57 COL4A4;RDH10;SERPINB7;
GO:0001655 urogenital system development 12 3 2.84 24 0.57 0.57 COL4A4;RDH10;SERPINB7;
GO:2000146 negative regulation of cell motility 12 3 2.84 25 0.57 0.57 STC1;NAV3;CCL2;
GO:0071417 cellular response to organonitrogen comp… 12 3 2.84 26 0.57 0.57 COL5A2;PDGFC;STC1;
GO:0006996 organelle organization 43 10 10.17 27 0.60 0.60 EYA4;EYA4;SMARCA1;SMARCA1;INA;NAV3;SLC27A2;EMP2;CCL2;ARHGAP28;
GO:0006259 DNA metabolic process 17 4 4.02 28 0.60 0.60 EYA4;EYA4;SMARCA1;SMARCA1;
GO:0046907 intracellular transport 17 4 4.02 29 0.60 0.60 PDZK1;NA;SLC27A2;EMP2;
GO:0008283 cell proliferation 39 9 9.23 30 0.61 0.61 PDZK1;TNFRSF9;NA;PDGFC;CDCA7L;STC1;EMP2;CCL2;SERPINB7;
GO:0071310 cellular response to organic substance 52 12 12.30 31 0.61 0.61 TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CSF2RA;CSF2RA;CSF2RA;INA;CCL2;
GO:0030036 actin cytoskeleton organization 13 3 3.08 32 0.63 0.63 INA;EMP2;ARHGAP28;
GO:0051271 negative regulation of cellular componen… 13 3 3.08 33 0.63 0.63 STC1;NAV3;CCL2;
GO:0034762 regulation of transmembrane transport 13 3 3.08 34 0.63 0.63 PDZK1;NA;ITLN1;
GO:0007186 G protein-coupled receptor signaling pat… 13 3 3.08 35 0.63 0.63 TAS2R1;GPR50;CCL2;
GO:0044703 multi-organism reproductive process 13 3 3.08 36 0.63 0.63 STC1;GTSF1;EMP2;
GO:0010469 regulation of signaling receptor activit… 13 3 3.08 37 0.63 0.63 PDGFC;STC1;CCL2;
GO:0040013 negative regulation of locomotion 13 3 3.08 38 0.63 0.63 STC1;NAV3;CCL2;
GO:0032101 regulation of response to external stimu… 22 5 5.21 39 0.63 0.63 TFPI;ADAMTS12;SLC7A2;PIK3AP1;CCL2;
GO:0030029 actin filament-based process 18 4 4.26 40 0.66 0.66 STC1;INA;EMP2;ARHGAP28;
GO:0034613 cellular protein localization 18 4 4.26 41 0.66 0.66 PDZK1;NA;SLC27A2;EMP2;
GO:0070727 cellular macromolecule localization 18 4 4.26 42 0.66 0.66 PDZK1;NA;SLC27A2;EMP2;
GO:0006950 response to stress 71 16 16.80 43 0.67 0.67 GBP4;TFPI;ADAMTS12;EYA4;EYA4;SLC7A2;STC1;GAGE1;GAGE1;NA;NA;NA;NA;NA;PIK3AP1;CCL2;
GO:0000165 MAPK cascade 23 5 5.44 44 0.68 0.68 PDGFC;CSF2RA;CSF2RA;CSF2RA;CCL2;
GO:0023014 signal transduction by protein phosphory… 23 5 5.44 45 0.68 0.68 PDGFC;CSF2RA;CSF2RA;CSF2RA;CCL2;
GO:0007010 cytoskeleton organization 23 5 5.44 46 0.68 0.68 INA;NAV3;EMP2;CCL2;ARHGAP28;
GO:0006812 cation transport 23 5 5.44 47 0.68 0.68 PDZK1;NA;SLC7A2;STC1;CCL2;
GO:0044092 negative regulation of molecular functio… 14 3 3.31 48 0.69 0.69 TFPI;ARHGAP28;SERPINB7;
GO:1901699 cellular response to nitrogen compound 14 3 3.31 49 0.69 0.69 COL5A2;PDGFC;STC1;
GO:0035295 tube development 28 6 6.62 50 0.70 0.70 COL4A4;PDGFC;ADAMTS12;RDH10;EMP2;CCL2;
GO:0043269 regulation of ion transport 19 4 4.50 51 0.70 0.70 PDZK1;NA;STC1;CCL2;
GO:0046942 carboxylic acid transport 10 2 2.37 52 0.73 0.73 SLC7A2;SLC27A2;
GO:0002237 response to molecule of bacterial origin 10 2 2.37 53 0.73 0.73 TFPI;CCL2;
GO:0052548 regulation of endopeptidase activity 10 2 2.37 54 0.73 0.73 TFPI;SERPINB7;
GO:0014068 positive regulation of phosphatidylinosi… 10 2 2.37 55 0.73 0.73 PDGFC;PIK3AP1;
GO:0051222 positive regulation of protein transport 10 2 2.37 56 0.73 0.73 PDZK1;NA;
GO:0050865 regulation of cell activation 10 2 2.37 57 0.73 0.73 SLC7A2;CCL2;
GO:0032496 response to lipopolysaccharide 10 2 2.37 58 0.73 0.73 TFPI;CCL2;
GO:0016311 dephosphorylation 10 2 2.37 59 0.73 0.73 EYA4;EYA4;
GO:0015849 organic acid transport 10 2 2.37 60 0.73 0.73 SLC7A2;SLC27A2;
GO:0002009 morphogenesis of an epithelium 10 2 2.37 61 0.73 0.73 ADAMTS12;RDH10;
GO:0043542 endothelial cell migration 10 2 2.37 62 0.73 0.73 STC1;EMP2;
GO:1904062 regulation of cation transmembrane trans… 10 2 2.37 63 0.73 0.73 PDZK1;NA;
GO:0010594 regulation of endothelial cell migration 10 2 2.37 64 0.73 0.73 STC1;EMP2;
GO:1904951 positive regulation of establishment of … 10 2 2.37 65 0.73 0.73 PDZK1;NA;
GO:0019725 cellular homeostasis 10 2 2.37 66 0.73 0.73 STC1;CCL2;
GO:0032386 regulation of intracellular transport 10 2 2.37 67 0.73 0.73 PDZK1;NA;
GO:0071495 cellular response to endogenous stimulus 29 6 6.86 68 0.73 0.73 TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CCL2;
GO:0070887 cellular response to chemical stimulus 56 12 13.25 69 0.73 0.73 TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;ADAMTS12;STC1;CSF2RA;CSF2RA;CSF2RA;INA;CCL2;
GO:0001525 angiogenesis 15 3 3.55 70 0.73 0.73 COL4A4;EMP2;CCL2;
GO:0098655 cation transmembrane transport 15 3 3.55 71 0.73 0.73 PDZK1;NA;SLC7A2;
GO:0034622 cellular protein-containing complex asse… 15 3 3.55 72 0.73 0.73 PDGFC;NAV3;ARHGAP28;
GO:0071396 cellular response to lipid 15 3 3.55 73 0.73 0.73 TFPI;STC1;CCL2;
GO:0007165 signal transduction 78 17 18.46 74 0.74 0.74 TNFRSF9;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;STC1;CSF2RA;CSF2RA;GPR50;CSF2RA;MPP7;PIK3AP1;EMP2;CCL2;ARHGAP28;
GO:0008284 positive regulation of cell proliferatio… 20 4 4.73 75 0.74 0.74 PDGFC;CDCA7L;EMP2;SERPINB7;
GO:0060548 negative regulation of cell death 20 4 4.73 76 0.74 0.74 EYA4;EYA4;CCL2;EVI2B;
GO:0000003 reproduction 20 4 4.73 77 0.74 0.74 RDH10;STC1;GTSF1;EMP2;
GO:0033554 cellular response to stress 20 4 4.73 78 0.74 0.74 EYA4;EYA4;STC1;CCL2;
GO:0043066 negative regulation of apoptotic process 20 4 4.73 79 0.74 0.74 EYA4;EYA4;CCL2;EVI2B;
GO:0043069 negative regulation of programmed cell d… 20 4 4.73 80 0.74 0.74 EYA4;EYA4;CCL2;EVI2B;
GO:0022414 reproductive process 20 4 4.73 81 0.74 0.74 RDH10;STC1;GTSF1;EMP2;
GO:0035239 tube morphogenesis 25 5 5.92 82 0.75 0.75 COL4A4;ADAMTS12;RDH10;EMP2;CCL2;
GO:0030334 regulation of cell migration 25 5 5.92 83 0.75 0.75 PDGFC;STC1;NAV3;EMP2;CCL2;
GO:0051050 positive regulation of transport 25 5 5.92 84 0.75 0.75 PDZK1;NA;ITLN1;STC1;CCL2;
GO:0065003 protein-containing complex assembly 25 5 5.92 85 0.75 0.75 ITLN1;PDGFC;MPP7;NAV3;ARHGAP28;
GO:0051716 cellular response to stimulus 96 21 22.71 86 0.76 0.76 TNFRSF9;GBP4;TFPI;COL5A2;PDGFC;TENM2;ADAMTS12;TAS2R1;EYA4;EYA4;STC1;CSF2RA;CSF2RA;GPR50;CSF2RA;INA;MPP7;PIK3AP1;EMP2;CCL2;ARHGAP28;
GO:0006793 phosphorus metabolic process 48 10 11.36 87 0.76 0.76 ITLN1;PDGFC;EYA4;EYA4;CSF2RA;CSF2RA;CSF2RA;PIK3AP1;EMP2;CCL2;
GO:0006796 phosphate-containing compound metabolic … 48 10 11.36 88 0.76 0.76 ITLN1;PDGFC;EYA4;EYA4;CSF2RA;CSF2RA;CSF2RA;PIK3AP1;EMP2;CCL2;
GO:0014070 response to organic cyclic compound 16 3 3.79 89 0.78 0.78 TFPI;STC1;CCL2;
GO:0071229 cellular response to acid chemical 11 2 2.60 90 0.78 0.78 COL5A2;PDGFC;
GO:0070374 positive regulation of ERK1 and ERK2 cas… 11 2 2.60 91 0.78 0.78 PDGFC;CCL2;
GO:0048015 phosphatidylinositol-mediated signaling 11 2 2.60 92 0.78 0.78 PDGFC;PIK3AP1;
GO:0048017 inositol lipid-mediated signaling 11 2 2.60 93 0.78 0.78 PDGFC;PIK3AP1;
GO:0014065 phosphatidylinositol 3-kinase signaling 11 2 2.60 94 0.78 0.78 PDGFC;PIK3AP1;
GO:0014066 regulation of phosphatidylinositol 3-kin… 11 2 2.60 95 0.78 0.78 PDGFC;PIK3AP1;
GO:0052547 regulation of peptidase activity 11 2 2.60 96 0.78 0.78 TFPI;SERPINB7;
GO:0080135 regulation of cellular response to stres… 11 2 2.60 97 0.78 0.78 EYA4;EYA4;
GO:0045786 negative regulation of cell cycle 11 2 2.60 98 0.78 0.78 CCL2;EVI2B;
GO:0070838 divalent metal ion transport 11 2 2.60 99 0.78 0.78 STC1;CCL2;
GO:0048598 embryonic morphogenesis 11 2 2.60 100 0.78 0.78 COL5A2;RDH10;
write.csv(res_top_GO, "pairwise_test_results/cGFR.cGF/cGFR.cGF.GO.enrichment.csv")

This is a ranking of which GO terms are enriched according to two different testing strategies, elim and Fisher.classic. these are, roughly, probability scores and can be broadly interpreted the same way as p-scores.

14. Visualizing the results:

nodes <- showSigOfNodes(top_GO_data, score(result_top_GO_elim), firstSigNodes = 3,
               useInfo = 'def')

png("pairwise_test_results/cGFR.cGF/tGFR.tGFC.GO.nodes.png")
nodes
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 16 
## Number of Edges = 19 
## 
## $complete.dag
## [1] "A graph with 16 nodes."
dev.off()
## png 
##   2
nodes
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 16 
## Number of Edges = 19 
## 
## $complete.dag
## [1] "A graph with 16 nodes."

15. Pathway enrichment analysis via Reactome

Reactome maps

library(reactome.db)
library(DOSE)
library(ReactomePA)
entrez_ids <- mapIds(hta20transcriptcluster.db, 
      keys = rownames(eset), 
      keytype = "PROBEID",
      column = "ENTREZID",
      multiVals = "first")

reactome_enrich <- enrichPathway(gene = entrez_ids[diff.probe], 
                                universe = entrez_ids[c(diff.probe, 
                                                        back.genes)],
                                organism = "human",
                                pvalueCutoff = 1,
                                qvalueCutoff = 1, 
                                readable = TRUE)

reactome_enrich@result$Description <- paste0(str_sub(
                                    reactome_enrich@result$Description, 1, 20),
                                    "...")

write.csv(as.data.frame(reactome_enrich), "pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.enrichment.csv")

kable(reactome_enrich) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "400px")
ID Description GeneRatio BgRatio pvalue p.adjust qvalue geneID Count
R-HSA-1280215 R-HSA-1280215 Cytokine Signaling i… 4/17 11/114 0.0583678 0.7388291 0.7388291 TNFRSF9/CCL2/CSF2RA/GBP4 4
R-HSA-168256 R-HSA-168256 Immune System… 7/17 31/114 0.1345785 0.7388291 0.7388291 TNFRSF9/PIK3AP1/CCL2/SLC27A2/CSF2RA/ITLN1/GBP4 7
R-HSA-162582 R-HSA-162582 Signal Transduction… 7/17 34/114 0.2031929 0.7388291 0.7388291 COL5A2/PIK3AP1/COL4A4/TAS2R1/CSF2RA/PDGFC/RDH10 7
R-HSA-9006934 R-HSA-9006934 Signaling by Recepto… 3/17 11/114 0.2110940 0.7388291 0.7388291 COL5A2/COL4A4/PDGFC 3
R-HSA-1643685 R-HSA-1643685 Disease… 3/17 16/114 0.4395847 0.9104492 0.9104492 PIK3AP1/CSF2RA/ADAMTS12 3
R-HSA-422475 R-HSA-422475 Axon guidance… 2/17 12/114 0.5647665 0.9104492 0.9104492 COL5A2/COL4A4 2
R-HSA-392499 R-HSA-392499 Metabolism of protei… 3/17 19/114 0.5705701 0.9104492 0.9104492 CCL2/CSF2RA/ADAMTS12 3
R-HSA-1266738 R-HSA-1266738 Developmental Biolog… 2/17 16/114 0.7345116 0.9104492 0.9104492 COL5A2/COL4A4 2
R-HSA-1474244 R-HSA-1474244 Extracellular matrix… 2/17 16/114 0.7345116 0.9104492 0.9104492 COL5A2/COL4A4 2
R-HSA-168249 R-HSA-168249 Innate Immune System… 2/17 16/114 0.7345116 0.9104492 0.9104492 SLC27A2/ITLN1 2
R-HSA-109582 R-HSA-109582 Hemostasis… 1/17 11/114 0.8454171 0.9104492 0.9104492 TFPI 1
R-HSA-556833 R-HSA-556833 Metabolism of lipids… 1/17 11/114 0.8454171 0.9104492 0.9104492 SLC27A2 1
R-HSA-597592 R-HSA-597592 Post-translational p… 1/17 11/114 0.8454171 0.9104492 0.9104492 ADAMTS12 1
R-HSA-1430728 R-HSA-1430728 Metabolism… 1/17 23/114 0.9845429 0.9845429 0.9845429 SLC27A2 1

16. Visualizing reactome results

Barplots, showing reaction categories represented and their respective p-values:

png("pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.barplot.png")
barplot(reactome_enrich)
dev.off()
## png 
##   2
barplot(reactome_enrich)

png("pairwise_test_results/cGFR.cGF/cGFR.cGF.reactome.emapplot.png")
emapplot(reactome_enrich, showCategory = 10)
dev.off()
## png 
##   2
emapplot(reactome_enrich, showCategory = 10)

In this plot, color is p-value, size is proportional to the number of DE genes that mapped to the pathway, and the edge thickness is proportional to the number of shared genes between the units. This is useful for examining the interdependence of the pathways; which results should be taken more serioiusly, are independent? This particular plot shows the top ten most significant pathways, regardless of their p-values