We previously went over some of the top genes we charted in Tableau for the non-EBV associated pathologies and compared them with the top genes of many studies. We found the top genes by fold change values and tested their predictive accuracy, which was better than 70%, and in some cases better than 90%, at classifying the samples by class within their own study.
We now want to compare these genes, whose fold change relationships we viewed, against acute infectious mononucleosis (AIM) and chronic active Epstein-Barr virus (CAEBV). We compared 12 genes that showed positive or negative correlations, meaning how large the gene expression fold change was, and whether it moved in the same or the opposite direction, in each pathology relative to its baseline. Here are those genes and what we found.
MT1G — very up regulated positive correlation with AIM & CAEBV, Fibromyalgia (FM) and chronic fatigue syndrome (CFS), slightly up regulated in Lyme disease (LD)
PRIM1 — very up regulated in FM & LD, slightly up regulated in CFS & CAEBV
TNFAIP6 — very down regulated in AIM, FM, LD, and uterine leiomyoma (UL), but up regulated in CFS, and slightly up in CAEBV
ANKRD22 — very upregulated in CAEBV & UL, slightly up in AIM, down regulated in CFS, LD, & Autism
ASPM — very up in AIM & UL, slight up in CAEBV, CFS, & LD
HISTIH3B — very up in AIM and UL, slight up in CAEBV & LD
OLR1 — very down regulated in AIM & CAEBV & LD, and very up regulated in UL & CFS
IRG1 — not in any sample
KIF11 — very up in AIM & CAEBV & UL & FM, slight up in CFS
ILG — very down in AIM & CAEBV & FM, slight down in LD, and slight up in CFS
ILIA — very down in AIM & CAEBV, slight down in CFS & LD
DTL — very up in AIM & FM, slight up in UL & LD & CFS
FFAR2 — very down in AIM & CAEBV & LD & CFS, but very up in FM & UL
GPR84 — very down in AIM, slight down in CAEBV & CFS, slight up in UL
CCNA2 — very up in AIM & UL & FM, slight up in LD & CFS
CCL20 — very down in AIM & CAEBV & LD, but very up in CFS
Let's make a character vector of 12 of these genes to pull from the datasets.
relationalGenes <- c("ASPM","HISTIH3B","OLR1","IRG1","KIF11",
"ILG", "ILIA", "DTL", "FFAR2", "GPR84",
"CCNA2", "CCL20")
relationalGenes
## [1] "ASPM" "HISTIH3B" "OLR1" "IRG1" "KIF11" "ILG"
## [7] "ILIA" "DTL" "FFAR2" "GPR84" "CCNA2" "CCL20"
Let's load some packages.
library(rmarkdown)
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
You can retrieve these data sets here at these links:
Let's read in the mono and EBV dataset first.
pathMono <- "path to CAEBV_genes_32670_FCs.csv"
setwd(pathMono)
monoEBV <- read.csv("CAEBV_genes_32670_FCs.csv") #32670 X 24
paged_table(monoEBV[1:10,])
Now let's read in the fibromyalgia dataset, followed by the other non-EBV pathology datasets: chronic fatigue syndrome, Lyme disease, and uterine leiomyoma.
pathFM <- "path to GeneSymbols_FM_FCs_filtered.csv"
setwd(pathFM)
FM <- read.csv("GeneSymbols_FM_FCs_filtered.csv") # 20142 X 17
paged_table(FM[1:10,])
Now read in the chronic fatigue syndrome dataset.
pathCFS <- "path to CFS_data_filtered_ordered_GSE293840.csv"
setwd(pathCFS)
CFS <- read.csv("CFS_data_filtered_ordered_GSE293840.csv") # 39378 X 174
paged_table(CFS[1:10,])
Let's read in the Lyme disease dataset.
pathLyme <- "path to LymeDiseaseNormalizedFCsMeansAdded_June4th2026_ABS_min-x.csv"
setwd(pathLyme)
Lyme <- read.csv("LymeDiseaseNormalizedFCsMeansAdded_June4th2026_ABS_min-x.csv") #19526 X 95
paged_table(Lyme[1:10,])
Now we will add in the uterine leiomyoma dataset.
pathUL <- "path to UL_all_FCs_58735_notFiltered_hasNaNs_hasINf.csv"
setwd(pathUL)
UL <- read.csv("UL_all_FCs_58735_notFiltered_hasNaNs_hasINf.csv") #58735X36
paged_table(UL[1:10,])
Now we have 5 datasets to combine, and from each one we will pull the genes in our relationalGenes vector.
colnames(monoEBV)
## [1] "ID" "gene" "GSM2279022_AIM"
## [4] "GSM2279023_AIM" "GSM2279024_AIM" "GSM2279025_CAEBV"
## [7] "GSM2279026_AIM" "GSM2279027_CAEBV" "GSM2279028_CAEBV"
## [10] "GSM2279029_CAEBV" "GSM2279030_CAEBV" "GSM2279031_healthy"
## [13] "GSM2279032_healthy" "GSM2279033_healthy" "GSM2279034_healthy"
## [16] "GSM2279035_healthy" "GSM2279036_healthy" "GSM2279037_AIM"
## [19] "GSM2279038_AIM" "AIM_mean" "CAEBV_mean"
## [22] "healthy_mean" "FC_AIM_healthy" "FC_CAEBV_healthy"
We only want the gene name and the actual samples from each dataset, so we will omit the means and fold changes, as well as the Ensembl ID and any other column that is neither a sample nor the gene name.
monoEBV_strict <- monoEBV[which(monoEBV$gene %in% relationalGenes), c(2:19)]
paged_table(monoEBV_strict) #9 of the 12 relational genes
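Only 9 of the 12 names matched, so it is worth checking which ones dropped out. A small sketch with setdiff(), using the nine symbols this document prints for monoEBV_strict$gene; the object names foundMono and missingMono are made up for this example. The three unmatched names might be misspellings of similar-looking symbols (HIST1H3B, IL6, and IL1A exist and look close), but that is only an assumption and not something the data confirms.

```r
relationalGenes <- c("ASPM", "HISTIH3B", "OLR1", "IRG1", "KIF11",
                     "ILG", "ILIA", "DTL", "FFAR2", "GPR84",
                     "CCNA2", "CCL20")

# Symbols that matched in the mono/EBV data (copied from the printed output).
foundMono <- c("KIF11", "ASPM", "CCNA2", "DTL", "FFAR2",
               "IRG1", "GPR84", "CCL20", "OLR1")

# Requested names with no matching row in the dataset.
missingMono <- setdiff(relationalGenes, foundMono)
missingMono
```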
Let's make a class vector for the monoEBV data.
aim <- grep("AIM",colnames(monoEBV_strict))
caebv <- grep("CAEBV",colnames(monoEBV_strict))
healthy <- grep("healthy", colnames(monoEBV_strict))
classMono <- "gene"
classMono[aim] <- "AIM"
classMono[caebv] <- "CAEBV"
classMono[healthy] <- "healthy mono caebv"
colnames(monoEBV_strict)
## [1] "gene" "GSM2279022_AIM" "GSM2279023_AIM"
## [4] "GSM2279024_AIM" "GSM2279025_CAEBV" "GSM2279026_AIM"
## [7] "GSM2279027_CAEBV" "GSM2279028_CAEBV" "GSM2279029_CAEBV"
## [10] "GSM2279030_CAEBV" "GSM2279031_healthy" "GSM2279032_healthy"
## [13] "GSM2279033_healthy" "GSM2279034_healthy" "GSM2279035_healthy"
## [16] "GSM2279036_healthy" "GSM2279037_AIM" "GSM2279038_AIM"
classMono
## [1] "gene" "AIM" "AIM"
## [4] "AIM" "CAEBV" "AIM"
## [7] "CAEBV" "CAEBV" "CAEBV"
## [10] "CAEBV" "healthy mono caebv" "healthy mono caebv"
## [13] "healthy mono caebv" "healthy mono caebv" "healthy mono caebv"
## [16] "healthy mono caebv" "AIM" "AIM"
The class labels line up with the column names, so there is no need to rearrange the samples by type.
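The labels above are built by assigning into a vector at the positions grep() returns. A possible variation, sketched here on a few of the real column names, is a small helper that always returns one label per column and falls back to a default when nothing matches. The helper name labelFromName and its arguments are hypothetical, not part of any package.

```r
# A few column names taken from the mono/EBV data.
sampleCols <- c("gene", "GSM2279022_AIM", "GSM2279025_CAEBV",
                "GSM2279031_healthy")

# patterns: named character vector, name = text to look for, value = class label.
labelFromName <- function(cols, patterns, default = "gene") {
  labels <- rep(default, length(cols))
  for (p in names(patterns)) {
    labels[grepl(p, cols, fixed = TRUE)] <- patterns[[p]]
  }
  labels
}

classToy <- labelFromName(sampleCols,
                          c(AIM = "AIM", CAEBV = "CAEBV",
                            healthy = "healthy mono caebv"))
classToy
```

Because the result starts at full length, a column that matches no pattern gets the default instead of an NA.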
Let's now get the fibromyalgia dataset samples.
colnames(FM)
## [1] "gene_id" "gene_name" "Healthy1" "Healthy2"
## [5] "Healthy3" "Healthy4" "Healthy5" "myo1"
## [9] "myo2" "myo3" "myo4" "myo5"
## [13] "myo6" "myo7" "healthy_Mean" "myo_Mean"
## [17] "FC_myo_healthy"
FM_strict <- FM[which(FM$gene_name %in% relationalGenes),c(2:14)]
paged_table(FM_strict) #4 of the 12 relational genes
colnames(FM_strict)
## [1] "gene_name" "Healthy1" "Healthy2" "Healthy3" "Healthy4" "Healthy5"
## [7] "myo1" "myo2" "myo3" "myo4" "myo5" "myo6"
## [13] "myo7"
classFM <- "gene"
healthyFM <- grep("Healthy", colnames(FM_strict))
fibromyalgia <- grep("myo",colnames(FM_strict))
classFM[healthyFM] <- 'healthy FM'
classFM[fibromyalgia] <- 'fibromyalgia'
classFM
## [1] "gene" "healthy FM" "healthy FM" "healthy FM" "healthy FM"
## [6] "healthy FM" "fibromyalgia" "fibromyalgia" "fibromyalgia" "fibromyalgia"
## [11] "fibromyalgia" "fibromyalgia" "fibromyalgia"
Let's now get the chronic fatigue syndrome data into the same format.
colnames(CFS)
## [1] "gene_id" "gene_name" "Ensembl_transcript"
## [4] "control_1" "control_2" "control_3"
## [7] "case_4" "control_5" "case_6"
## [10] "control_7" "control_8" "case_11"
## [13] "case_12" "case_13" "case_14"
## [16] "control_15" "case_16" "control_17"
## [19] "case_18" "control_21" "control_22"
## [22] "case_23" "control_24" "case_25"
## [25] "case_26" "case_27" "case_28"
## [28] "case_31" "control_32" "case_33"
## [31] "control_34" "case_35" "control_36"
## [34] "control_37" "control_38" "case_41"
## [37] "case_42" "control_43" "control_44"
## [40] "control_45" "case_46" "control_47"
## [43] "control_48" "control_51" "case_52"
## [46] "case_53" "control_54" "control_55"
## [49] "case_56" "control_57" "case_58"
## [52] "control_59" "control_60" "case_63"
## [55] "case_64" "case_65" "control_66"
## [58] "case_67" "control_68" "case_69"
## [61] "case_70" "case_71" "control_72"
## [64] "case_139" "case_140" "case_141"
## [67] "case_142" "control_143" "control_145"
## [70] "control_146" "control_147" "case_148"
## [73] "case_150" "control_151" "control_152"
## [76] "case_153" "case_154" "control_155"
## [79] "case_156" "case_157" "case_159"
## [82] "case_160" "control_161" "control_162"
## [85] "case_163" "case_164" "control_165"
## [88] "case_166" "case_167" "control_168"
## [91] "control_169" "case_170" "case_171"
## [94] "case_173" "case_174" "case_177"
## [97] "case_178" "case_179" "control_181"
## [100] "case_182" "control_183" "control_184"
## [103] "control_185" "case_186" "control_187"
## [106] "control_188" "control_189" "control_190"
## [109] "case_192" "control_193" "control_194"
## [112] "control_195" "case_196" "case_197"
## [115] "case_198" "control_199" "case_200"
## [118] "case_201" "case_202" "case_204"
## [121] "case_205" "case_206" "control_207"
## [124] "control_208" "control_209" "case_211"
## [127] "control_212" "case_213" "control_214"
## [130] "control_215" "case_219" "control_220"
## [133] "case_221" "case_222" "case_223"
## [136] "case_224" "case_225" "case_226"
## [139] "case_230" "case_231" "control_232"
## [142] "case_233" "case_235" "control_236"
## [145] "case_240" "case_241" "case_242"
## [148] "control_243" "control_244" "case_245"
## [151] "control_246" "case_247" "case_248"
## [154] "case_251" "control_252" "case_253"
## [157] "case_254" "control_255" "control_256"
## [160] "control_257" "case_258" "case_259"
## [163] "control_260" "control_264" "case_265"
## [166] "case_266" "control_267" "control_268"
## [169] "case_270" "control_271" "case_272"
## [172] "healthy_mean" "CSF_mean" "foldchange_CSF_healthy"
CFS_strict <- CFS[which(CFS$gene_name %in% relationalGenes),c(2,4:171)]
colnames(CFS_strict) #8 genes of the 12 relational genes
## [1] "gene_name" "control_1" "control_2" "control_3" "case_4"
## [6] "control_5" "case_6" "control_7" "control_8" "case_11"
## [11] "case_12" "case_13" "case_14" "control_15" "case_16"
## [16] "control_17" "case_18" "control_21" "control_22" "case_23"
## [21] "control_24" "case_25" "case_26" "case_27" "case_28"
## [26] "case_31" "control_32" "case_33" "control_34" "case_35"
## [31] "control_36" "control_37" "control_38" "case_41" "case_42"
## [36] "control_43" "control_44" "control_45" "case_46" "control_47"
## [41] "control_48" "control_51" "case_52" "case_53" "control_54"
## [46] "control_55" "case_56" "control_57" "case_58" "control_59"
## [51] "control_60" "case_63" "case_64" "case_65" "control_66"
## [56] "case_67" "control_68" "case_69" "case_70" "case_71"
## [61] "control_72" "case_139" "case_140" "case_141" "case_142"
## [66] "control_143" "control_145" "control_146" "control_147" "case_148"
## [71] "case_150" "control_151" "control_152" "case_153" "case_154"
## [76] "control_155" "case_156" "case_157" "case_159" "case_160"
## [81] "control_161" "control_162" "case_163" "case_164" "control_165"
## [86] "case_166" "case_167" "control_168" "control_169" "case_170"
## [91] "case_171" "case_173" "case_174" "case_177" "case_178"
## [96] "case_179" "control_181" "case_182" "control_183" "control_184"
## [101] "control_185" "case_186" "control_187" "control_188" "control_189"
## [106] "control_190" "case_192" "control_193" "control_194" "control_195"
## [111] "case_196" "case_197" "case_198" "control_199" "case_200"
## [116] "case_201" "case_202" "case_204" "case_205" "case_206"
## [121] "control_207" "control_208" "control_209" "case_211" "control_212"
## [126] "case_213" "control_214" "control_215" "case_219" "control_220"
## [131] "case_221" "case_222" "case_223" "case_224" "case_225"
## [136] "case_226" "case_230" "case_231" "control_232" "case_233"
## [141] "case_235" "control_236" "case_240" "case_241" "case_242"
## [146] "control_243" "control_244" "case_245" "control_246" "case_247"
## [151] "case_248" "case_251" "control_252" "case_253" "case_254"
## [156] "control_255" "control_256" "control_257" "case_258" "case_259"
## [161] "control_260" "control_264" "case_265" "case_266" "control_267"
## [166] "control_268" "case_270" "control_271" "case_272"
classCFS <- "gene"
cfs <- grep('case',colnames(CFS_strict))
healthyCFS <- grep('control',colnames(CFS_strict))
classCFS[cfs] <- "Chronic Fatigue Syndrome"
classCFS[healthyCFS] <- "healthy CFS"
classCFS
## [1] "gene" "healthy CFS"
## [3] "healthy CFS" "healthy CFS"
## [5] "Chronic Fatigue Syndrome" "healthy CFS"
## [7] "Chronic Fatigue Syndrome" "healthy CFS"
## [9] "healthy CFS" "Chronic Fatigue Syndrome"
## [11] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [13] "Chronic Fatigue Syndrome" "healthy CFS"
## [15] "Chronic Fatigue Syndrome" "healthy CFS"
## [17] "Chronic Fatigue Syndrome" "healthy CFS"
## [19] "healthy CFS" "Chronic Fatigue Syndrome"
## [21] "healthy CFS" "Chronic Fatigue Syndrome"
## [23] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [25] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [27] "healthy CFS" "Chronic Fatigue Syndrome"
## [29] "healthy CFS" "Chronic Fatigue Syndrome"
## [31] "healthy CFS" "healthy CFS"
## [33] "healthy CFS" "Chronic Fatigue Syndrome"
## [35] "Chronic Fatigue Syndrome" "healthy CFS"
## [37] "healthy CFS" "healthy CFS"
## [39] "Chronic Fatigue Syndrome" "healthy CFS"
## [41] "healthy CFS" "healthy CFS"
## [43] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [45] "healthy CFS" "healthy CFS"
## [47] "Chronic Fatigue Syndrome" "healthy CFS"
## [49] "Chronic Fatigue Syndrome" "healthy CFS"
## [51] "healthy CFS" "Chronic Fatigue Syndrome"
## [53] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [55] "healthy CFS" "Chronic Fatigue Syndrome"
## [57] "healthy CFS" "Chronic Fatigue Syndrome"
## [59] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [61] "healthy CFS" "Chronic Fatigue Syndrome"
## [63] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [65] "Chronic Fatigue Syndrome" "healthy CFS"
## [67] "healthy CFS" "healthy CFS"
## [69] "healthy CFS" "Chronic Fatigue Syndrome"
## [71] "Chronic Fatigue Syndrome" "healthy CFS"
## [73] "healthy CFS" "Chronic Fatigue Syndrome"
## [75] "Chronic Fatigue Syndrome" "healthy CFS"
## [77] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [79] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [81] "healthy CFS" "healthy CFS"
## [83] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [85] "healthy CFS" "Chronic Fatigue Syndrome"
## [87] "Chronic Fatigue Syndrome" "healthy CFS"
## [89] "healthy CFS" "Chronic Fatigue Syndrome"
## [91] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [93] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [95] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [97] "healthy CFS" "Chronic Fatigue Syndrome"
## [99] "healthy CFS" "healthy CFS"
## [101] "healthy CFS" "Chronic Fatigue Syndrome"
## [103] "healthy CFS" "healthy CFS"
## [105] "healthy CFS" "healthy CFS"
## [107] "Chronic Fatigue Syndrome" "healthy CFS"
## [109] "healthy CFS" "healthy CFS"
## [111] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [113] "Chronic Fatigue Syndrome" "healthy CFS"
## [115] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [117] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [119] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [121] "healthy CFS" "healthy CFS"
## [123] "healthy CFS" "Chronic Fatigue Syndrome"
## [125] "healthy CFS" "Chronic Fatigue Syndrome"
## [127] "healthy CFS" "healthy CFS"
## [129] "Chronic Fatigue Syndrome" "healthy CFS"
## [131] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [133] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [135] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [137] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [139] "healthy CFS" "Chronic Fatigue Syndrome"
## [141] "Chronic Fatigue Syndrome" "healthy CFS"
## [143] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [145] "Chronic Fatigue Syndrome" "healthy CFS"
## [147] "healthy CFS" "Chronic Fatigue Syndrome"
## [149] "healthy CFS" "Chronic Fatigue Syndrome"
## [151] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [153] "healthy CFS" "Chronic Fatigue Syndrome"
## [155] "Chronic Fatigue Syndrome" "healthy CFS"
## [157] "healthy CFS" "healthy CFS"
## [159] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [161] "healthy CFS" "healthy CFS"
## [163] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [165] "healthy CFS" "healthy CFS"
## [167] "Chronic Fatigue Syndrome" "healthy CFS"
## [169] "Chronic Fatigue Syndrome"
Let's apply the same process to the UL dataset.
colnames(UL)
## [1] "GeneID" "GeneSymbol" "GeneBiotype"
## [4] "MyoF.348_S12_white" "MyoF.428_S11_white" "MyoF.483_S8_black"
## [7] "MyoF.526_S10_white" "MyoF.UI.10_S7_black" "MyoF.UI.13_S9_black"
## [10] "MyoN.432_S4_white" "MyoN.514_S2_black" "MyoN.549_S5_white"
## [13] "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black" "MyoN.UI.8_S6_white"
## [16] "UF.372_S18_white" "UF.428_S17_white" "UF.483_S14_black"
## [19] "UF.526_S16_white" "UF.UI.13_S15_black" "UF.UI.23_S13_black"
## [22] "normal_all_mean" "UF_all_mean" "UF_all_risk_mean"
## [25] "normal_white_mean" "UF_white_mean" "UF_risk_white_mean"
## [28] "normal_black_mean" "UF_black_mean" "UF_risk_black_mean"
## [31] "UF_normal_all_FC" "UF_risk_normal_all_FC" "UF_normal_white_FC"
## [34] "UF_risk_white_FC" "UF_normal_black_FC" "UF_risk_black_FC"
UL_strict <- UL[which(UL$GeneSymbol %in% relationalGenes),c(2,4:21)]
colnames(UL_strict)
## [1] "GeneSymbol" "MyoF.348_S12_white" "MyoF.428_S11_white"
## [4] "MyoF.483_S8_black" "MyoF.526_S10_white" "MyoF.UI.10_S7_black"
## [7] "MyoF.UI.13_S9_black" "MyoN.432_S4_white" "MyoN.514_S2_black"
## [10] "MyoN.549_S5_white" "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black"
## [13] "MyoN.UI.8_S6_white" "UF.372_S18_white" "UF.428_S17_white"
## [16] "UF.483_S14_black" "UF.526_S16_white" "UF.UI.13_S15_black"
## [19] "UF.UI.23_S13_black"
Note that in this study MyoF is at-risk tissue next to the uterine fibroid, MyoN is normal myometrial tissue from a different person, and UF is the uterine fibroid itself.
classUL <- "gene"
healthyUL <- grep("MyoN",colnames(UL_strict))
ul <- grep("UF", colnames(UL_strict))
ulRisk <- grep("MyoF", colnames(UL_strict))
classUL[healthyUL] <- 'healthy uterine tissue'
classUL[ul] <- 'uterine leiomyoma'
classUL[ulRisk] <- 'UL surrounding tissue'
classUL
## [1] "gene" "UL surrounding tissue" "UL surrounding tissue"
## [4] "UL surrounding tissue" "UL surrounding tissue" "UL surrounding tissue"
## [7] "UL surrounding tissue" "healthy uterine tissue" "healthy uterine tissue"
## [10] "healthy uterine tissue" "healthy uterine tissue" "healthy uterine tissue"
## [13] "healthy uterine tissue" "uterine leiomyoma" "uterine leiomyoma"
## [16] "uterine leiomyoma" "uterine leiomyoma" "uterine leiomyoma"
## [19] "uterine leiomyoma"
Now we arrange the Lyme disease data the same way as the others.
colnames(Lyme)
## [1] "Gene" "healthyControl_1"
## [3] "healthyControl_2" "healthyControl_3"
## [5] "healthyControl_4" "healthyControl_5"
## [7] "healthyControl_6" "healthyControl_7"
## [9] "healthyControl_8" "healthyControl_9"
## [11] "healthyControl_10" "healthyControl_11"
## [13] "healthyControl_12" "healthyControl_13"
## [15] "healthyControl_14" "healthyControl_15"
## [17] "healthyControl_16" "healthyControl_17"
## [19] "healthyControl_18" "healthyControl_19"
## [21] "healthyControl_20" "healthyControl_21"
## [23] "acuteLymeDisease_1" "acuteLymeDisease_2"
## [25] "acuteLymeDisease_3" "acuteLymeDisease_4"
## [27] "acuteLymeDisease_5" "acuteLymeDisease_6"
## [29] "acuteLymeDisease_7" "acuteLymeDisease_8"
## [31] "acuteLymeDisease_9" "acuteLymeDisease_10"
## [33] "acuteLymeDisease_11" "acuteLymeDisease_12"
## [35] "acuteLymeDisease_13" "acuteLymeDisease_14"
## [37] "acuteLymeDisease_15" "acuteLymeDisease_16"
## [39] "acuteLymeDisease_17" "acuteLymeDisease_18"
## [41] "acuteLymeDisease_19" "acuteLymeDisease_20"
## [43] "acuteLymeDisease_21" "acuteLymeDisease_22"
## [45] "acuteLymeDisease_23" "acuteLymeDisease_24"
## [47] "acuteLymeDisease_25" "acuteLymeDisease_26"
## [49] "acuteLymeDisease_27" "acuteLymeDisease_28"
## [51] "Antibodies_1month_1" "Antibodies_1month_2"
## [53] "Antibodies_1month_3" "Antibodies_1month_4"
## [55] "Antibodies_1month_5" "Antibodies_1month_6"
## [57] "Antibodies_1month_7" "Antibodies_1month_8"
## [59] "Antibodies_1month_9" "Antibodies_1month_10"
## [61] "Antibodies_1month_11" "Antibodies_1month_12"
## [63] "Antibodies_1month_13" "Antibodies_1month_14"
## [65] "Antibodies_1month_15" "Antibodies_1month_16"
## [67] "Antibodies_1month_17" "Antibodies_1month_18"
## [69] "Antibodies_1month_19" "Antibodies_1month_20"
## [71] "Antibodies_1month_21" "Antibodies_1month_22"
## [73] "Antibodies_1month_23" "Antibodies_1month_24"
## [75] "Antibodies_1month_25" "Antibodies_1month_26"
## [77] "Antibodies_1month_27" "Antibodies_6months_1"
## [79] "Antibodies_6months_2" "Antibodies_6months_3"
## [81] "Antibodies_6months_4" "Antibodies_6months_5"
## [83] "Antibodies_6months_6" "Antibodies_6months_7"
## [85] "Antibodies_6months_8" "Antibodies_6months_9"
## [87] "Antibodies_6months_10" "healthy_mean"
## [89] "acute_mean" "month1_mean"
## [91] "month6_mean" "foldchange_acute_healthy"
## [93] "foldchange_1month_healthy" "foldchange_6month_healthy"
## [95] "foldchange_6month_acute"
Lyme_strict <- Lyme[which(Lyme$Gene %in% relationalGenes),c(1:87)]
colnames(Lyme_strict) #8 genes of the 12
## [1] "Gene" "healthyControl_1" "healthyControl_2"
## [4] "healthyControl_3" "healthyControl_4" "healthyControl_5"
## [7] "healthyControl_6" "healthyControl_7" "healthyControl_8"
## [10] "healthyControl_9" "healthyControl_10" "healthyControl_11"
## [13] "healthyControl_12" "healthyControl_13" "healthyControl_14"
## [16] "healthyControl_15" "healthyControl_16" "healthyControl_17"
## [19] "healthyControl_18" "healthyControl_19" "healthyControl_20"
## [22] "healthyControl_21" "acuteLymeDisease_1" "acuteLymeDisease_2"
## [25] "acuteLymeDisease_3" "acuteLymeDisease_4" "acuteLymeDisease_5"
## [28] "acuteLymeDisease_6" "acuteLymeDisease_7" "acuteLymeDisease_8"
## [31] "acuteLymeDisease_9" "acuteLymeDisease_10" "acuteLymeDisease_11"
## [34] "acuteLymeDisease_12" "acuteLymeDisease_13" "acuteLymeDisease_14"
## [37] "acuteLymeDisease_15" "acuteLymeDisease_16" "acuteLymeDisease_17"
## [40] "acuteLymeDisease_18" "acuteLymeDisease_19" "acuteLymeDisease_20"
## [43] "acuteLymeDisease_21" "acuteLymeDisease_22" "acuteLymeDisease_23"
## [46] "acuteLymeDisease_24" "acuteLymeDisease_25" "acuteLymeDisease_26"
## [49] "acuteLymeDisease_27" "acuteLymeDisease_28" "Antibodies_1month_1"
## [52] "Antibodies_1month_2" "Antibodies_1month_3" "Antibodies_1month_4"
## [55] "Antibodies_1month_5" "Antibodies_1month_6" "Antibodies_1month_7"
## [58] "Antibodies_1month_8" "Antibodies_1month_9" "Antibodies_1month_10"
## [61] "Antibodies_1month_11" "Antibodies_1month_12" "Antibodies_1month_13"
## [64] "Antibodies_1month_14" "Antibodies_1month_15" "Antibodies_1month_16"
## [67] "Antibodies_1month_17" "Antibodies_1month_18" "Antibodies_1month_19"
## [70] "Antibodies_1month_20" "Antibodies_1month_21" "Antibodies_1month_22"
## [73] "Antibodies_1month_23" "Antibodies_1month_24" "Antibodies_1month_25"
## [76] "Antibodies_1month_26" "Antibodies_1month_27" "Antibodies_6months_1"
## [79] "Antibodies_6months_2" "Antibodies_6months_3" "Antibodies_6months_4"
## [82] "Antibodies_6months_5" "Antibodies_6months_6" "Antibodies_6months_7"
## [85] "Antibodies_6months_8" "Antibodies_6months_9" "Antibodies_6months_10"
classLyme <- "gene"
healthyLyme <- grep('healthy',colnames(Lyme_strict))
acute <- grep('acute', colnames(Lyme_strict))
lyme1 <- grep('1month', colnames(Lyme_strict))
lyme6 <- grep('6month', colnames(Lyme_strict))
classLyme[healthyLyme] <- "healthy before lyme disease"
classLyme[acute] <- "lyme disease acute"
classLyme[lyme1] <- "lyme disease 1 month"
classLyme[lyme6] <- "lyme disease 6 months"
classLyme
## [1] "gene" "healthy before lyme disease"
## [3] "healthy before lyme disease" "healthy before lyme disease"
## [5] "healthy before lyme disease" "healthy before lyme disease"
## [7] "healthy before lyme disease" "healthy before lyme disease"
## [9] "healthy before lyme disease" "healthy before lyme disease"
## [11] "healthy before lyme disease" "healthy before lyme disease"
## [13] "healthy before lyme disease" "healthy before lyme disease"
## [15] "healthy before lyme disease" "healthy before lyme disease"
## [17] "healthy before lyme disease" "healthy before lyme disease"
## [19] "healthy before lyme disease" "healthy before lyme disease"
## [21] "healthy before lyme disease" "healthy before lyme disease"
## [23] "lyme disease acute" "lyme disease acute"
## [25] "lyme disease acute" "lyme disease acute"
## [27] "lyme disease acute" "lyme disease acute"
## [29] "lyme disease acute" "lyme disease acute"
## [31] "lyme disease acute" "lyme disease acute"
## [33] "lyme disease acute" "lyme disease acute"
## [35] "lyme disease acute" "lyme disease acute"
## [37] "lyme disease acute" "lyme disease acute"
## [39] "lyme disease acute" "lyme disease acute"
## [41] "lyme disease acute" "lyme disease acute"
## [43] "lyme disease acute" "lyme disease acute"
## [45] "lyme disease acute" "lyme disease acute"
## [47] "lyme disease acute" "lyme disease acute"
## [49] "lyme disease acute" "lyme disease acute"
## [51] "lyme disease 1 month" "lyme disease 1 month"
## [53] "lyme disease 1 month" "lyme disease 1 month"
## [55] "lyme disease 1 month" "lyme disease 1 month"
## [57] "lyme disease 1 month" "lyme disease 1 month"
## [59] "lyme disease 1 month" "lyme disease 1 month"
## [61] "lyme disease 1 month" "lyme disease 1 month"
## [63] "lyme disease 1 month" "lyme disease 1 month"
## [65] "lyme disease 1 month" "lyme disease 1 month"
## [67] "lyme disease 1 month" "lyme disease 1 month"
## [69] "lyme disease 1 month" "lyme disease 1 month"
## [71] "lyme disease 1 month" "lyme disease 1 month"
## [73] "lyme disease 1 month" "lyme disease 1 month"
## [75] "lyme disease 1 month" "lyme disease 1 month"
## [77] "lyme disease 1 month" "lyme disease 6 months"
## [79] "lyme disease 6 months" "lyme disease 6 months"
## [81] "lyme disease 6 months" "lyme disease 6 months"
## [83] "lyme disease 6 months" "lyme disease 6 months"
## [85] "lyme disease 6 months" "lyme disease 6 months"
## [87] "lyme disease 6 months"
Let's look at which genes the datasets have in common across all these pathologies.
CFS_strict$gene_name
## [1] "CCL20" "DTL" "KIF11" "CCNA2" "ASPM" "OLR1" "FFAR2" "GPR84"
FM_strict$gene_name
## [1] "FFAR2" "KIF11" "DTL" "CCNA2"
Lyme_strict$Gene
## [1] "OLR1" "CCL20" "KIF11" "FFAR2" "GPR84" "CCNA2" "ASPM" "DTL"
monoEBV_strict$gene
## [1] "KIF11" "ASPM" "CCNA2" "DTL" "FFAR2" "IRG1" "GPR84" "CCL20" "OLR1"
UL_strict$GeneSymbol
## [1] "ASPM" "DTL" "CCL20" "CCNA2" "KIF11" "OLR1" "GPR84" "FFAR2"
It looks like the fibromyalgia data is the limiting set with only 4 genes, and since all 4 are present in every other dataset, they can be used to predict the class of each sample.
genes4 <- FM_strict$gene_name
We are using the 4 genes from the fibromyalgia (FM) data, all of which are common to the other datasets.
CFS4 <- CFS_strict[which(CFS_strict$gene_name %in% genes4),]
mono4 <- monoEBV_strict[which(monoEBV_strict$gene %in% genes4),]
Lyme4 <- Lyme_strict[which(Lyme_strict$Gene %in% genes4),]
UL4 <- UL_strict[which(UL_strict$GeneSymbol %in% genes4),]
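The common genes were read off by eye from the printed lists. As a check, the same set can be computed with Reduce() and intersect(). This sketch uses the gene lists exactly as printed above; the names geneSets and commonGenes are made up for the example.

```r
# Gene symbols found in each dataset, copied from the output above.
geneSets <- list(
  CFS  = c("CCL20", "DTL", "KIF11", "CCNA2", "ASPM", "OLR1", "FFAR2", "GPR84"),
  FM   = c("FFAR2", "KIF11", "DTL", "CCNA2"),
  Lyme = c("OLR1", "CCL20", "KIF11", "FFAR2", "GPR84", "CCNA2", "ASPM", "DTL"),
  mono = c("KIF11", "ASPM", "CCNA2", "DTL", "FFAR2", "IRG1", "GPR84",
           "CCL20", "OLR1"),
  UL   = c("ASPM", "DTL", "CCL20", "CCNA2", "KIF11", "OLR1", "GPR84", "FFAR2")
)

# Intersect the sets pairwise, left to right, and sort the result.
commonGenes <- sort(Reduce(intersect, geneSets))
commonGenes
```

This gives the same 4 genes as the fibromyalgia list, which confirms FM is the limiting dataset.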
Let's make our matrices for each of these and add in the class labels we just made.
CFS4_t <- data.frame(t(CFS4[,2:169]))
colnames(CFS4_t) <- CFS4$gene_name
CFS4_t$class <- classCFS[2:length(classCFS)]
paged_table(CFS4_t[1:10,])
CFS4_t2 <- CFS4_t[,c(3,1,4,2,5)]
colnames(CFS4_t2)
## [1] "CCNA2" "DTL" "FFAR2" "KIF11" "class"
The above is the chronic fatigue syndrome matrix; next is the fibromyalgia matrix.
FM4_t <- data.frame(t(FM_strict[,2:13]))
colnames(FM4_t) <- FM_strict$gene_name
FM4_t$class <- classFM[2:length(classFM)]
paged_table(FM4_t)
FM4_t2 <- FM4_t[,c(4,3,1,2,5)]
colnames(FM4_t2)
## [1] "CCNA2" "DTL" "FFAR2" "KIF11" "class"
Now for the Lyme disease data matrix. We just made the CFS and FM matrices and alphabetized the gene features.
Lyme4_t <- data.frame(t(Lyme4[,2:87]))
colnames(Lyme4_t) <- Lyme4$Gene
Lyme4_t$class <- classLyme[2:length(classLyme)]
paged_table(Lyme4_t[1:10,])
Lyme4_t2 <- Lyme4_t[,c(3,4,2,1,5)]
colnames(Lyme4_t2)
## [1] "CCNA2" "DTL" "FFAR2" "KIF11" "class"
Next is the UL matrix.
UL4_t <- data.frame(t(UL4[,2:19]))
colnames(UL4_t) <- UL4$GeneSymbol
UL4_t$class <- classUL[2:length(classUL)]
paged_table(UL4_t[1:10,])
UL4_t2 <- UL4_t[,c(2,1,4,3,5)]
colnames(UL4_t2)
## [1] "CCNA2" "DTL" "FFAR2" "KIF11" "class"
Last is the matrix of the mono and EBV genes.
mono4_t <- data.frame(t(mono4[,2:18]))
colnames(mono4_t) <- mono4$gene
mono4_t$class <- classMono[2:length(classMono)]
paged_table(mono4_t[1:10,])
mono4_t2 <- mono4_t[,c(2,3,4,1,5)]
colnames(mono4_t2)
## [1] "CCNA2" "DTL" "FFAR2" "KIF11" "class"
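Each matrix above was alphabetized with hard-coded column positions, which have to be worked out again for every dataset. Here is a sketch of one helper that transposes a table and orders the gene columns by name instead. The function toSampleMatrix and the toy values are hypothetical; only the gene symbols come from this analysis.

```r
# Toy table shaped like the *_strict data frames (values are made up).
toyStrict <- data.frame(
  gene = c("KIF11", "CCNA2", "DTL", "FFAR2"),
  s1 = c(1.1, 2.0, 0.5, 3.2),
  s2 = c(0.9, 1.8, 0.7, 2.9),
  stringsAsFactors = FALSE
)
toyClass <- c("AIM", "healthy")

# Samples become rows, genes become alphabetized columns, class goes last.
toSampleMatrix <- function(strict, geneCol, classes) {
  out <- data.frame(t(strict[, setdiff(colnames(strict), geneCol)]))
  colnames(out) <- strict[[geneCol]]
  out <- out[, sort(colnames(out)), drop = FALSE]
  out$class <- classes
  out
}

toyMatrix <- toSampleMatrix(toyStrict, "gene", toyClass)
colnames(toyMatrix)
```

Sorting by name keeps the five matrices aligned for rbind() even if a dataset returns its genes in a different row order.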
Let's row bind all these samples together now that they share the same gene feature names and a class column.
matrix5sets <- rbind(mono4_t2,FM4_t2,CFS4_t2,UL4_t2,Lyme4_t2)
paged_table(matrix5sets[c(1:10,50:75,100:125),])
write.csv(matrix5sets,'matrix4genes.csv', row.names=F)
Now let's relabel the healthy samples from every study so they all share the single class name healthy.
table(matrix5sets$class)
##
## AIM CAEBV
## 6 5
## Chronic Fatigue Syndrome fibromyalgia
## 93 7
## healthy before lyme disease healthy CFS
## 21 75
## healthy FM healthy mono caebv
## 5 6
## healthy uterine tissue lyme disease 1 month
## 6 27
## lyme disease 6 months lyme disease acute
## 10 28
## UL surrounding tissue uterine leiomyoma
## 6 6
healthy5 <- grep('healthy',matrix5sets$class)
matrix5sets$class[healthy5] <- 'healthy'
table(matrix5sets$class)
##
## AIM CAEBV Chronic Fatigue Syndrome
## 6 5 93
## fibromyalgia healthy lyme disease 1 month
## 7 113 27
## lyme disease 6 months lyme disease acute UL surrounding tissue
## 10 28 6
## uterine leiomyoma
## 6
write.csv(matrix5sets,'matrix5sets_healthy5into1healthy.csv', row.names=F)
matrix5sets$class <- as.factor(matrix5sets$class)
set.seed(125)
inTrain <- sample(1:301, .8*301)
training <- matrix5sets[inTrain,]
testing <- matrix5sets[-inTrain,]
table(training$class)
##
## AIM CAEBV Chronic Fatigue Syndrome
## 5 5 69
## fibromyalgia healthy lyme disease 1 month
## 6 91 26
## lyme disease 6 months lyme disease acute UL surrounding tissue
## 8 19 6
## uterine leiomyoma
## 5
table(testing$class)
##
## AIM CAEBV Chronic Fatigue Syndrome
## 1 0 24
## fibromyalgia healthy lyme disease 1 month
## 1 22 1
## lyme disease 6 months lyme disease acute UL surrounding tissue
## 2 9 0
## uterine leiomyoma
## 1
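The random split left CAEBV and UL surrounding tissue with no testing samples at all. One way around that, sketched below on toy labels, is to sample within each class so every class keeps at least one sample on each side whenever it has two or more. The function stratifiedTrain is a hypothetical helper and the class counts are made up; this is not the split used in this document.

```r
set.seed(125)

# Toy class labels with two small classes and one large class.
toyClass <- factor(rep(c("AIM", "CAEBV", "healthy"), times = c(6, 5, 20)))

stratifiedTrain <- function(classes, frac = 0.8) {
  idx <- split(seq_along(classes), classes)
  picked <- lapply(idx, function(i) {
    # Take about frac of the class, but leave at least one sample for testing.
    n <- max(1, min(length(i) - 1, floor(frac * length(i))))
    i[sample.int(length(i), n)]
  })
  sort(unlist(picked, use.names = FALSE))
}

inTrainToy <- stratifiedTrain(toyClass)
table(toyClass[-inTrainToy])
```

The per-class counts no longer depend on the seed, only which samples are drawn does.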
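# randomForest() has no confusion argument; for classification the out-of-bag confusion matrix is always stored in rf1$confusion
rf1 <- randomForest(training[1:4], training$class, mtry=3, ntree=5000)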
rf1$confusion
## AIM CAEBV Chronic Fatigue Syndrome fibromyalgia
## AIM 5 0 0 0
## CAEBV 0 3 0 0
## Chronic Fatigue Syndrome 0 0 35 0
## fibromyalgia 0 0 0 2
## healthy 0 1 30 1
## lyme disease 1 month 0 0 0 0
## lyme disease 6 months 0 0 0 0
## lyme disease acute 0 0 0 0
## UL surrounding tissue 0 0 1 0
## uterine leiomyoma 0 0 1 0
## healthy lyme disease 1 month lyme disease 6 months
## AIM 0 0 0
## CAEBV 2 0 0
## Chronic Fatigue Syndrome 34 0 0
## fibromyalgia 2 0 0
## healthy 39 9 3
## lyme disease 1 month 7 13 1
## lyme disease 6 months 3 4 0
## lyme disease acute 5 6 0
## UL surrounding tissue 3 0 0
## uterine leiomyoma 3 0 0
## lyme disease acute UL surrounding tissue
## AIM 0 0
## CAEBV 0 0
## Chronic Fatigue Syndrome 0 0
## fibromyalgia 2 0
## healthy 5 1
## lyme disease 1 month 5 0
## lyme disease 6 months 1 0
## lyme disease acute 8 0
## UL surrounding tissue 0 0
## uterine leiomyoma 0 0
## uterine leiomyoma class.error
## AIM 0 0.0000000
## CAEBV 0 0.4000000
## Chronic Fatigue Syndrome 0 0.4927536
## fibromyalgia 0 0.6666667
## healthy 2 0.5714286
## lyme disease 1 month 0 0.5000000
## lyme disease 6 months 0 1.0000000
## lyme disease acute 0 0.5789474
## UL surrounding tissue 2 1.0000000
## uterine leiomyoma 1 0.8000000
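The class.error column is, for each row, the share of that actual class that landed off the diagonal. A small sketch on a made-up 3-class matrix shows the arithmetic; the counts and the names conf, classError, and overallError are illustrative only.

```r
# Toy confusion matrix: rows are actual classes, columns are predictions.
conf <- matrix(c(5, 0, 0,
                 0, 3, 2,
                 1, 4, 5),
               nrow = 3, byrow = TRUE,
               dimnames = list(actual = c("A", "B", "C"),
                               predicted = c("A", "B", "C")))

# Per-class error: 1 minus correct predictions over the row total.
classError <- 1 - diag(conf) / rowSums(conf)

# Overall error: 1 minus all correct predictions over all samples.
overallError <- 1 - sum(diag(conf)) / sum(conf)

round(classError, 3)
overallError
```

To apply this to the model, the last column of rf1$confusion (class.error) would need to be dropped first, for example with rf1$confusion[, -ncol(rf1$confusion)].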
The out-of-bag confusion matrix above shows a 0% class error for acute infectious mono in training: all 5 AIM samples were classified correctly. In this model the true chronic fatigue syndrome samples were never mistaken for mono, EBV, Lyme disease, or UL; the only error for CFS samples was being classified as healthy (34 of 69). There are other interesting insights in the model above as well. Let's see if it predicts any better on the testing set.
prediction1 <- predict(rf1,testing)
results1 <- data.frame(predicted=prediction1, actual=testing$class)
results1
## predicted actual
## GSM2279024_AIM AIM AIM
## GSM2279035_healthy healthy healthy
## Healthy4 healthy healthy
## myo6 healthy fibromyalgia
## control_3 Chronic Fatigue Syndrome healthy
## control_24 Chronic Fatigue Syndrome healthy
## case_27 healthy Chronic Fatigue Syndrome
## control_37 healthy healthy
## case_42 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_46 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_47 Chronic Fatigue Syndrome healthy
## control_51 healthy healthy
## case_58 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_63 healthy Chronic Fatigue Syndrome
## case_67 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_140 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_143 Chronic Fatigue Syndrome healthy
## control_146 Chronic Fatigue Syndrome healthy
## control_147 healthy healthy
## case_148 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_152 healthy healthy
## case_153 healthy Chronic Fatigue Syndrome
## case_157 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_159 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_164 healthy Chronic Fatigue Syndrome
## case_173 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_181 Chronic Fatigue Syndrome healthy
## control_185 Chronic Fatigue Syndrome healthy
## control_189 Chronic Fatigue Syndrome healthy
## control_190 healthy healthy
## case_192 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_198 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_200 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_214 healthy healthy
## case_221 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_223 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_224 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_225 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_230 healthy Chronic Fatigue Syndrome
## case_233 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_245 healthy Chronic Fatigue Syndrome
## case_254 Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_255 healthy healthy
## control_264 Chronic Fatigue Syndrome healthy
## control_267 Chronic Fatigue Syndrome healthy
## MyoN.549_S5_white healthy healthy
## UF.UI.13_S15_black UL surrounding tissue uterine leiomyoma
## healthyControl_6 lyme disease 6 months healthy
## healthyControl_11 lyme disease acute healthy
## acuteLymeDisease_3 lyme disease acute lyme disease acute
## acuteLymeDisease_6 healthy lyme disease acute
## acuteLymeDisease_8 lyme disease acute lyme disease acute
## acuteLymeDisease_11 healthy lyme disease acute
## acuteLymeDisease_14 lyme disease acute lyme disease acute
## acuteLymeDisease_16 lyme disease 1 month lyme disease acute
## acuteLymeDisease_21 lyme disease 6 months lyme disease acute
## acuteLymeDisease_22 lyme disease acute lyme disease acute
## acuteLymeDisease_28 lyme disease acute lyme disease acute
## Antibodies_1month_7 healthy lyme disease 1 month
## Antibodies_6months_7 healthy lyme disease 6 months
## Antibodies_6months_10 lyme disease 1 month lyme disease 6 months
Interesting results: the actual Lyme disease samples were predicted either as healthy or as Lyme disease at the same or a different time point after exposure. Many CFS samples appear in the results, but that is because we had many more samples in our CFS data than in the other datasets.
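To put a number on results like these, the predicted and actual columns can be cross-tabulated and compared. A minimal sketch on made-up predictions follows; the five rows and the names results, testTable, accuracy, and perClass are illustrative and are not the model's output.

```r
# Toy predictions (made up) in the same layout as results1.
results <- data.frame(
  predicted = c("healthy", "CFS", "CFS", "healthy", "AIM"),
  actual    = c("healthy", "healthy", "CFS", "CFS", "AIM"),
  stringsAsFactors = FALSE
)

# Use one shared set of levels so the table is square.
lv <- sort(unique(c(results$predicted, results$actual)))
testTable <- table(actual = factor(results$actual, levels = lv),
                   predicted = factor(results$predicted, levels = lv))

# Overall accuracy and the share of each actual class predicted correctly.
accuracy <- mean(results$predicted == results$actual)
perClass <- diag(testTable) / rowSums(testTable)

testTable
accuracy
perClass
```

The same two lines for accuracy and perClass should work on results1, assuming its predicted and actual columns are factors with identical levels.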
That's it for today.
We can keep doing this with the GIT and lymphoma EBV-associated pathologies as well, but at a later time. You can look through each study on my RPubs site, and use the dataset links at the top of this document to run these chunks of code in R Markdown with knitr.