We previously went over some genes in Tableau: our top genes from non-EBV-associated pathologies compared with the top genes from several studies. In each study we selected the top genes by fold change value and tested their predictive accuracy, which was better than 70%, and in some cases better than 90%, at classifying samples into their respective classes within that study.

We now want to compare these genes with the genes whose fold change relationships we viewed in acute infectious mononucleosis (AIM) and chronic active Epstein-Barr virus (CAEBV). We compared 12 genes that showed positive or negative correlations, that is, genes whose expression fold change relative to baseline moved with a similar magnitude in the same or the opposite direction across pathologies. Here are those genes and what we found.

Let's make a character vector of these gene symbols so we can pull them from each dataset.

relationalGenes <- c("ASPM","HISTIH3B","OLR1","IRG1","KIF11",
                     "ILG", "ILIA", "DTL", "FFAR2", "GPR84",
                     "CCNA2", "CCL20")

relationalGenes
##  [1] "ASPM"     "HISTIH3B" "OLR1"     "IRG1"     "KIF11"    "ILG"     
##  [7] "ILIA"     "DTL"      "FFAR2"    "GPR84"    "CCNA2"    "CCL20"

Let's load some packages.

library(rmarkdown)
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.

You can retrieve these data sets here at these links:

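Let's read in the mono and EBV dataset first.

# Placeholder: set this to the directory that contains CAEBV_genes_32670_FCs.csv
# (setwd() expects a directory, not a file)
pathMono <- "path to CAEBV_genes_32670_FCs.csv"
setwd(pathMono)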

monoEBV <- read.csv("CAEBV_genes_32670_FCs.csv") #32670 X 24

paged_table(monoEBV[1:10,])
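
As an aside, the same read can be done without changing the working directory by building the full path with `file.path()`. The sketch below is only an illustration: `dataDir` and `example_FCs.csv` are hypothetical stand-ins (a temporary directory and a two-row toy file), not the real dataset.

```r
# Hypothetical stand-ins: a temp directory and a tiny toy CSV, not the real data
dataDir  <- tempdir()
fileName <- "example_FCs.csv"

# Write a toy file so the example is self-contained
toy <- data.frame(gene = c("KIF11", "DTL"), sample1 = c(1.5, 2.5))
write.csv(toy, file.path(dataDir, fileName), row.names = FALSE)

# Read it back using the full path; no setwd() needed
example <- read.csv(file.path(dataDir, fileName), stringsAsFactors = FALSE)
dim(example)
```

With the real files, `dataDir` would be replaced by the folder that holds each CSV.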

Now let's read in the fibromyalgia dataset, followed by the other non-EBV pathology datasets: chronic fatigue syndrome, Lyme disease, and uterine leiomyoma.

pathFM <- "path to GeneSymbols_FM_FCs_filtered.csv"
setwd(pathFM)

FM <- read.csv("GeneSymbols_FM_FCs_filtered.csv") # 20142 X 17

paged_table(FM[1:10,])

Now read in the chronic fatigue syndrome dataset.

pathCFS <- "path to CFS_data_filtered_ordered_GSE293840.csv"
setwd(pathCFS)

CFS <- read.csv("CFS_data_filtered_ordered_GSE293840.csv") # 39378 X 174

paged_table(CFS[1:10,])

Let's read in the Lyme disease dataset.

pathLyme <- "path to LymeDiseaseNormalizedFCsMeansAdded_June4th2026_ABS_min-x.csv"
setwd(pathLyme)

Lyme <- read.csv("LymeDiseaseNormalizedFCsMeansAdded_June4th2026_ABS_min-x.csv") #19526 X 95

paged_table(Lyme[1:10,])

Now we will add in the uterine leiomyoma dataset.

pathUL <- "path to UL_all_FCs_58735_notFiltered_hasNaNs_hasINf.csv"
setwd(pathUL)

UL <- read.csv("UL_all_FCs_58735_notFiltered_hasNaNs_hasINf.csv") #58735X36

paged_table(UL[1:10,])

Now we have 5 datasets to combine, and from each one we need to pull the genes listed in our relationalGenes vector.

colnames(monoEBV)
##  [1] "ID"                 "gene"               "GSM2279022_AIM"    
##  [4] "GSM2279023_AIM"     "GSM2279024_AIM"     "GSM2279025_CAEBV"  
##  [7] "GSM2279026_AIM"     "GSM2279027_CAEBV"   "GSM2279028_CAEBV"  
## [10] "GSM2279029_CAEBV"   "GSM2279030_CAEBV"   "GSM2279031_healthy"
## [13] "GSM2279032_healthy" "GSM2279033_healthy" "GSM2279034_healthy"
## [16] "GSM2279035_healthy" "GSM2279036_healthy" "GSM2279037_AIM"    
## [19] "GSM2279038_AIM"     "AIM_mean"           "CAEBV_mean"        
## [22] "healthy_mean"       "FC_AIM_healthy"     "FC_CAEBV_healthy"

We only want the actual samples and the gene name from each of these datasets, so we will drop the mean and fold change (FC) columns, as well as the Ensembl ID and any other column that is neither a sample nor the gene name.

monoEBV_strict <- monoEBV[which(monoEBV$gene %in% relationalGenes), c(2:19)]

paged_table(monoEBV_strict) #9 of the 12 relational genes
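
The subsetting above picks columns by position (`2:19`). A minimal sketch of the same idea using column names instead, so that it does not depend on column order, is shown below. The toy data frame and the `^GSM` pattern are assumptions for illustration only; the pattern would need to match each dataset's own sample naming.

```r
# Toy data frame mimicking the layout: ID, gene, samples, then summary columns
toy <- data.frame(
  ID             = c("id1", "id2", "id3"),
  gene           = c("KIF11", "ACTB", "DTL"),
  GSM1_AIM       = c(1.2, 3.4, 0.5),
  GSM2_healthy   = c(0.9, 3.1, 0.7),
  AIM_mean       = c(1.2, 3.4, 0.5),
  FC_AIM_healthy = c(1.33, 1.10, 0.71),
  stringsAsFactors = FALSE
)
keepGenes <- c("KIF11", "DTL")

# Select sample columns by name pattern rather than by position
sampleCols <- grep("^GSM", colnames(toy), value = TRUE)

# Keep only the genes of interest, with the gene name plus sample columns
toy_strict <- toy[toy$gene %in% keepGenes, c("gene", sampleCols)]
colnames(toy_strict)
```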

Let's make a class label vector for the monoEBV samples.

aim <- grep("AIM",colnames(monoEBV_strict))
caebv <- grep("CAEBV",colnames(monoEBV_strict))
healthy <- grep("healthy", colnames(monoEBV_strict))

classMono <- "gene"

classMono[aim] <- "AIM"
classMono[caebv] <- "CAEBV"
classMono[healthy] <- "healthy mono caebv"
colnames(monoEBV_strict)
##  [1] "gene"               "GSM2279022_AIM"     "GSM2279023_AIM"    
##  [4] "GSM2279024_AIM"     "GSM2279025_CAEBV"   "GSM2279026_AIM"    
##  [7] "GSM2279027_CAEBV"   "GSM2279028_CAEBV"   "GSM2279029_CAEBV"  
## [10] "GSM2279030_CAEBV"   "GSM2279031_healthy" "GSM2279032_healthy"
## [13] "GSM2279033_healthy" "GSM2279034_healthy" "GSM2279035_healthy"
## [16] "GSM2279036_healthy" "GSM2279037_AIM"     "GSM2279038_AIM"
classMono
##  [1] "gene"               "AIM"                "AIM"               
##  [4] "AIM"                "CAEBV"              "AIM"               
##  [7] "CAEBV"              "CAEBV"              "CAEBV"             
## [10] "CAEBV"              "healthy mono caebv" "healthy mono caebv"
## [13] "healthy mono caebv" "healthy mono caebv" "healthy mono caebv"
## [16] "healthy mono caebv" "AIM"                "AIM"

The class labels line up with the column names, so there is no need to reorder the samples by type.
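
The labeling step above relies on R extending a length-one vector when values are assigned by index. A hedged alternative is a small helper that builds the label vector in one pass from a named vector of patterns. `labelSamples` and the shortened column list are hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: map each sample name to a label via grepl() patterns.
# `patterns` is a named character vector: names are labels, values are patterns.
labelSamples <- function(sampleNames, patterns) {
  labels <- rep(NA_character_, length(sampleNames))
  for (lab in names(patterns)) {
    labels[grepl(patterns[[lab]], sampleNames)] <- lab
  }
  labels
}

# A few of the monoEBV column names from the output above
cols <- c("GSM2279022_AIM", "GSM2279025_CAEBV", "GSM2279031_healthy")

monoLabels <- labelSamples(
  cols,
  c("AIM" = "AIM", "CAEBV" = "CAEBV", "healthy mono caebv" = "healthy")
)
monoLabels
```

Any sample that matches no pattern stays `NA`, which makes unlabeled columns easy to spot.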

Let's now get the fibromyalgia dataset samples.

colnames(FM)
##  [1] "gene_id"        "gene_name"      "Healthy1"       "Healthy2"      
##  [5] "Healthy3"       "Healthy4"       "Healthy5"       "myo1"          
##  [9] "myo2"           "myo3"           "myo4"           "myo5"          
## [13] "myo6"           "myo7"           "healthy_Mean"   "myo_Mean"      
## [17] "FC_myo_healthy"
FM_strict <- FM[which(FM$gene_name %in% relationalGenes),c(2:14)]

paged_table(FM_strict) #4 of the 12 relational genes
colnames(FM_strict)
##  [1] "gene_name" "Healthy1"  "Healthy2"  "Healthy3"  "Healthy4"  "Healthy5" 
##  [7] "myo1"      "myo2"      "myo3"      "myo4"      "myo5"      "myo6"     
## [13] "myo7"
classFM <- "gene"

healthyFM <- grep("Healthy", colnames(FM_strict))
fibromyalgia <- grep("myo",colnames(FM_strict))

classFM[healthyFM] <- 'healthy FM'
classFM[fibromyalgia] <- 'fibromyalgia'

classFM
##  [1] "gene"         "healthy FM"   "healthy FM"   "healthy FM"   "healthy FM"  
##  [6] "healthy FM"   "fibromyalgia" "fibromyalgia" "fibromyalgia" "fibromyalgia"
## [11] "fibromyalgia" "fibromyalgia" "fibromyalgia"

Let's now get the chronic fatigue syndrome data into the same format.

colnames(CFS)
##   [1] "gene_id"                "gene_name"              "Ensembl_transcript"    
##   [4] "control_1"              "control_2"              "control_3"             
##   [7] "case_4"                 "control_5"              "case_6"                
##  [10] "control_7"              "control_8"              "case_11"               
##  [13] "case_12"                "case_13"                "case_14"               
##  [16] "control_15"             "case_16"                "control_17"            
##  [19] "case_18"                "control_21"             "control_22"            
##  [22] "case_23"                "control_24"             "case_25"               
##  [25] "case_26"                "case_27"                "case_28"               
##  [28] "case_31"                "control_32"             "case_33"               
##  [31] "control_34"             "case_35"                "control_36"            
##  [34] "control_37"             "control_38"             "case_41"               
##  [37] "case_42"                "control_43"             "control_44"            
##  [40] "control_45"             "case_46"                "control_47"            
##  [43] "control_48"             "control_51"             "case_52"               
##  [46] "case_53"                "control_54"             "control_55"            
##  [49] "case_56"                "control_57"             "case_58"               
##  [52] "control_59"             "control_60"             "case_63"               
##  [55] "case_64"                "case_65"                "control_66"            
##  [58] "case_67"                "control_68"             "case_69"               
##  [61] "case_70"                "case_71"                "control_72"            
##  [64] "case_139"               "case_140"               "case_141"              
##  [67] "case_142"               "control_143"            "control_145"           
##  [70] "control_146"            "control_147"            "case_148"              
##  [73] "case_150"               "control_151"            "control_152"           
##  [76] "case_153"               "case_154"               "control_155"           
##  [79] "case_156"               "case_157"               "case_159"              
##  [82] "case_160"               "control_161"            "control_162"           
##  [85] "case_163"               "case_164"               "control_165"           
##  [88] "case_166"               "case_167"               "control_168"           
##  [91] "control_169"            "case_170"               "case_171"              
##  [94] "case_173"               "case_174"               "case_177"              
##  [97] "case_178"               "case_179"               "control_181"           
## [100] "case_182"               "control_183"            "control_184"           
## [103] "control_185"            "case_186"               "control_187"           
## [106] "control_188"            "control_189"            "control_190"           
## [109] "case_192"               "control_193"            "control_194"           
## [112] "control_195"            "case_196"               "case_197"              
## [115] "case_198"               "control_199"            "case_200"              
## [118] "case_201"               "case_202"               "case_204"              
## [121] "case_205"               "case_206"               "control_207"           
## [124] "control_208"            "control_209"            "case_211"              
## [127] "control_212"            "case_213"               "control_214"           
## [130] "control_215"            "case_219"               "control_220"           
## [133] "case_221"               "case_222"               "case_223"              
## [136] "case_224"               "case_225"               "case_226"              
## [139] "case_230"               "case_231"               "control_232"           
## [142] "case_233"               "case_235"               "control_236"           
## [145] "case_240"               "case_241"               "case_242"              
## [148] "control_243"            "control_244"            "case_245"              
## [151] "control_246"            "case_247"               "case_248"              
## [154] "case_251"               "control_252"            "case_253"              
## [157] "case_254"               "control_255"            "control_256"           
## [160] "control_257"            "case_258"               "case_259"              
## [163] "control_260"            "control_264"            "case_265"              
## [166] "case_266"               "control_267"            "control_268"           
## [169] "case_270"               "control_271"            "case_272"              
## [172] "healthy_mean"           "CSF_mean"               "foldchange_CSF_healthy"
CFS_strict <- CFS[which(CFS$gene_name %in% relationalGenes),c(2,4:171)]

colnames(CFS_strict) #8 genes of the 12  relational genes
##   [1] "gene_name"   "control_1"   "control_2"   "control_3"   "case_4"     
##   [6] "control_5"   "case_6"      "control_7"   "control_8"   "case_11"    
##  [11] "case_12"     "case_13"     "case_14"     "control_15"  "case_16"    
##  [16] "control_17"  "case_18"     "control_21"  "control_22"  "case_23"    
##  [21] "control_24"  "case_25"     "case_26"     "case_27"     "case_28"    
##  [26] "case_31"     "control_32"  "case_33"     "control_34"  "case_35"    
##  [31] "control_36"  "control_37"  "control_38"  "case_41"     "case_42"    
##  [36] "control_43"  "control_44"  "control_45"  "case_46"     "control_47" 
##  [41] "control_48"  "control_51"  "case_52"     "case_53"     "control_54" 
##  [46] "control_55"  "case_56"     "control_57"  "case_58"     "control_59" 
##  [51] "control_60"  "case_63"     "case_64"     "case_65"     "control_66" 
##  [56] "case_67"     "control_68"  "case_69"     "case_70"     "case_71"    
##  [61] "control_72"  "case_139"    "case_140"    "case_141"    "case_142"   
##  [66] "control_143" "control_145" "control_146" "control_147" "case_148"   
##  [71] "case_150"    "control_151" "control_152" "case_153"    "case_154"   
##  [76] "control_155" "case_156"    "case_157"    "case_159"    "case_160"   
##  [81] "control_161" "control_162" "case_163"    "case_164"    "control_165"
##  [86] "case_166"    "case_167"    "control_168" "control_169" "case_170"   
##  [91] "case_171"    "case_173"    "case_174"    "case_177"    "case_178"   
##  [96] "case_179"    "control_181" "case_182"    "control_183" "control_184"
## [101] "control_185" "case_186"    "control_187" "control_188" "control_189"
## [106] "control_190" "case_192"    "control_193" "control_194" "control_195"
## [111] "case_196"    "case_197"    "case_198"    "control_199" "case_200"   
## [116] "case_201"    "case_202"    "case_204"    "case_205"    "case_206"   
## [121] "control_207" "control_208" "control_209" "case_211"    "control_212"
## [126] "case_213"    "control_214" "control_215" "case_219"    "control_220"
## [131] "case_221"    "case_222"    "case_223"    "case_224"    "case_225"   
## [136] "case_226"    "case_230"    "case_231"    "control_232" "case_233"   
## [141] "case_235"    "control_236" "case_240"    "case_241"    "case_242"   
## [146] "control_243" "control_244" "case_245"    "control_246" "case_247"   
## [151] "case_248"    "case_251"    "control_252" "case_253"    "case_254"   
## [156] "control_255" "control_256" "control_257" "case_258"    "case_259"   
## [161] "control_260" "control_264" "case_265"    "case_266"    "control_267"
## [166] "control_268" "case_270"    "control_271" "case_272"
classCFS <- "gene"

cfs <- grep('case',colnames(CFS_strict))
healthyCFS <- grep('control',colnames(CFS_strict))

classCFS[cfs] <- "Chronic Fatigue Syndrome"
classCFS[healthyCFS] <- "healthy CFS"

classCFS
##   [1] "gene"                     "healthy CFS"             
##   [3] "healthy CFS"              "healthy CFS"             
##   [5] "Chronic Fatigue Syndrome" "healthy CFS"             
##   [7] "Chronic Fatigue Syndrome" "healthy CFS"             
##   [9] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [11] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [13] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [15] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [17] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [19] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [21] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [23] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [25] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [27] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [29] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [31] "healthy CFS"              "healthy CFS"             
##  [33] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [35] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [37] "healthy CFS"              "healthy CFS"             
##  [39] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [41] "healthy CFS"              "healthy CFS"             
##  [43] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [45] "healthy CFS"              "healthy CFS"             
##  [47] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [49] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [51] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [53] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [55] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [57] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [59] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [61] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [63] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [65] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [67] "healthy CFS"              "healthy CFS"             
##  [69] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [71] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [73] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [75] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [77] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [79] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [81] "healthy CFS"              "healthy CFS"             
##  [83] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [85] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [87] "Chronic Fatigue Syndrome" "healthy CFS"             
##  [89] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [91] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [93] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [95] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
##  [97] "healthy CFS"              "Chronic Fatigue Syndrome"
##  [99] "healthy CFS"              "healthy CFS"             
## [101] "healthy CFS"              "Chronic Fatigue Syndrome"
## [103] "healthy CFS"              "healthy CFS"             
## [105] "healthy CFS"              "healthy CFS"             
## [107] "Chronic Fatigue Syndrome" "healthy CFS"             
## [109] "healthy CFS"              "healthy CFS"             
## [111] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [113] "Chronic Fatigue Syndrome" "healthy CFS"             
## [115] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [117] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [119] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [121] "healthy CFS"              "healthy CFS"             
## [123] "healthy CFS"              "Chronic Fatigue Syndrome"
## [125] "healthy CFS"              "Chronic Fatigue Syndrome"
## [127] "healthy CFS"              "healthy CFS"             
## [129] "Chronic Fatigue Syndrome" "healthy CFS"             
## [131] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [133] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [135] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [137] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [139] "healthy CFS"              "Chronic Fatigue Syndrome"
## [141] "Chronic Fatigue Syndrome" "healthy CFS"             
## [143] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [145] "Chronic Fatigue Syndrome" "healthy CFS"             
## [147] "healthy CFS"              "Chronic Fatigue Syndrome"
## [149] "healthy CFS"              "Chronic Fatigue Syndrome"
## [151] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [153] "healthy CFS"              "Chronic Fatigue Syndrome"
## [155] "Chronic Fatigue Syndrome" "healthy CFS"             
## [157] "healthy CFS"              "healthy CFS"             
## [159] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [161] "healthy CFS"              "healthy CFS"             
## [163] "Chronic Fatigue Syndrome" "Chronic Fatigue Syndrome"
## [165] "healthy CFS"              "healthy CFS"             
## [167] "Chronic Fatigue Syndrome" "healthy CFS"             
## [169] "Chronic Fatigue Syndrome"

Let's apply the same process to the UL dataset.

colnames(UL)
##  [1] "GeneID"                "GeneSymbol"            "GeneBiotype"          
##  [4] "MyoF.348_S12_white"    "MyoF.428_S11_white"    "MyoF.483_S8_black"    
##  [7] "MyoF.526_S10_white"    "MyoF.UI.10_S7_black"   "MyoF.UI.13_S9_black"  
## [10] "MyoN.432_S4_white"     "MyoN.514_S2_black"     "MyoN.549_S5_white"    
## [13] "MyoN.UI.20_S1_black"   "MyoN.UI.43_S3_black"   "MyoN.UI.8_S6_white"   
## [16] "UF.372_S18_white"      "UF.428_S17_white"      "UF.483_S14_black"     
## [19] "UF.526_S16_white"      "UF.UI.13_S15_black"    "UF.UI.23_S13_black"   
## [22] "normal_all_mean"       "UF_all_mean"           "UF_all_risk_mean"     
## [25] "normal_white_mean"     "UF_white_mean"         "UF_risk_white_mean"   
## [28] "normal_black_mean"     "UF_black_mean"         "UF_risk_black_mean"   
## [31] "UF_normal_all_FC"      "UF_risk_normal_all_FC" "UF_normal_white_FC"   
## [34] "UF_risk_white_FC"      "UF_normal_black_FC"    "UF_risk_black_FC"
UL_strict <- UL[which(UL$GeneSymbol %in% relationalGenes),c(2,4:21)]
colnames(UL_strict)
##  [1] "GeneSymbol"          "MyoF.348_S12_white"  "MyoF.428_S11_white" 
##  [4] "MyoF.483_S8_black"   "MyoF.526_S10_white"  "MyoF.UI.10_S7_black"
##  [7] "MyoF.UI.13_S9_black" "MyoN.432_S4_white"   "MyoN.514_S2_black"  
## [10] "MyoN.549_S5_white"   "MyoN.UI.20_S1_black" "MyoN.UI.43_S3_black"
## [13] "MyoN.UI.8_S6_white"  "UF.372_S18_white"    "UF.428_S17_white"   
## [16] "UF.483_S14_black"    "UF.526_S16_white"    "UF.UI.13_S15_black" 
## [19] "UF.UI.23_S13_black"

Note that in this study MyoF is at-risk tissue next to the uterine fibroid, MyoN is normal myometrial tissue from a completely different person, and UF is the uterine fibroid itself.

classUL <- "gene"

healthyUL <- grep("MyoN",colnames(UL_strict))
ul <- grep("UF", colnames(UL_strict))
ulRisk <- grep("MyoF", colnames(UL_strict))

classUL[healthyUL] <- 'healthy uterine tissue'
classUL[ul] <- 'uterine leiomyoma'
classUL[ulRisk] <- 'UL surrounding tissue'

classUL
##  [1] "gene"                   "UL surrounding tissue"  "UL surrounding tissue" 
##  [4] "UL surrounding tissue"  "UL surrounding tissue"  "UL surrounding tissue" 
##  [7] "UL surrounding tissue"  "healthy uterine tissue" "healthy uterine tissue"
## [10] "healthy uterine tissue" "healthy uterine tissue" "healthy uterine tissue"
## [13] "healthy uterine tissue" "uterine leiomyoma"      "uterine leiomyoma"     
## [16] "uterine leiomyoma"      "uterine leiomyoma"      "uterine leiomyoma"     
## [19] "uterine leiomyoma"

Now we arrange the Lyme disease data in the same way as the others.

colnames(Lyme)
##  [1] "Gene"                      "healthyControl_1"         
##  [3] "healthyControl_2"          "healthyControl_3"         
##  [5] "healthyControl_4"          "healthyControl_5"         
##  [7] "healthyControl_6"          "healthyControl_7"         
##  [9] "healthyControl_8"          "healthyControl_9"         
## [11] "healthyControl_10"         "healthyControl_11"        
## [13] "healthyControl_12"         "healthyControl_13"        
## [15] "healthyControl_14"         "healthyControl_15"        
## [17] "healthyControl_16"         "healthyControl_17"        
## [19] "healthyControl_18"         "healthyControl_19"        
## [21] "healthyControl_20"         "healthyControl_21"        
## [23] "acuteLymeDisease_1"        "acuteLymeDisease_2"       
## [25] "acuteLymeDisease_3"        "acuteLymeDisease_4"       
## [27] "acuteLymeDisease_5"        "acuteLymeDisease_6"       
## [29] "acuteLymeDisease_7"        "acuteLymeDisease_8"       
## [31] "acuteLymeDisease_9"        "acuteLymeDisease_10"      
## [33] "acuteLymeDisease_11"       "acuteLymeDisease_12"      
## [35] "acuteLymeDisease_13"       "acuteLymeDisease_14"      
## [37] "acuteLymeDisease_15"       "acuteLymeDisease_16"      
## [39] "acuteLymeDisease_17"       "acuteLymeDisease_18"      
## [41] "acuteLymeDisease_19"       "acuteLymeDisease_20"      
## [43] "acuteLymeDisease_21"       "acuteLymeDisease_22"      
## [45] "acuteLymeDisease_23"       "acuteLymeDisease_24"      
## [47] "acuteLymeDisease_25"       "acuteLymeDisease_26"      
## [49] "acuteLymeDisease_27"       "acuteLymeDisease_28"      
## [51] "Antibodies_1month_1"       "Antibodies_1month_2"      
## [53] "Antibodies_1month_3"       "Antibodies_1month_4"      
## [55] "Antibodies_1month_5"       "Antibodies_1month_6"      
## [57] "Antibodies_1month_7"       "Antibodies_1month_8"      
## [59] "Antibodies_1month_9"       "Antibodies_1month_10"     
## [61] "Antibodies_1month_11"      "Antibodies_1month_12"     
## [63] "Antibodies_1month_13"      "Antibodies_1month_14"     
## [65] "Antibodies_1month_15"      "Antibodies_1month_16"     
## [67] "Antibodies_1month_17"      "Antibodies_1month_18"     
## [69] "Antibodies_1month_19"      "Antibodies_1month_20"     
## [71] "Antibodies_1month_21"      "Antibodies_1month_22"     
## [73] "Antibodies_1month_23"      "Antibodies_1month_24"     
## [75] "Antibodies_1month_25"      "Antibodies_1month_26"     
## [77] "Antibodies_1month_27"      "Antibodies_6months_1"     
## [79] "Antibodies_6months_2"      "Antibodies_6months_3"     
## [81] "Antibodies_6months_4"      "Antibodies_6months_5"     
## [83] "Antibodies_6months_6"      "Antibodies_6months_7"     
## [85] "Antibodies_6months_8"      "Antibodies_6months_9"     
## [87] "Antibodies_6months_10"     "healthy_mean"             
## [89] "acute_mean"                "month1_mean"              
## [91] "month6_mean"               "foldchange_acute_healthy" 
## [93] "foldchange_1month_healthy" "foldchange_6month_healthy"
## [95] "foldchange_6month_acute"
Lyme_strict <- Lyme[which(Lyme$Gene %in% relationalGenes),c(1:87)]

colnames(Lyme_strict) #8 genes of the 12
##  [1] "Gene"                  "healthyControl_1"      "healthyControl_2"     
##  [4] "healthyControl_3"      "healthyControl_4"      "healthyControl_5"     
##  [7] "healthyControl_6"      "healthyControl_7"      "healthyControl_8"     
## [10] "healthyControl_9"      "healthyControl_10"     "healthyControl_11"    
## [13] "healthyControl_12"     "healthyControl_13"     "healthyControl_14"    
## [16] "healthyControl_15"     "healthyControl_16"     "healthyControl_17"    
## [19] "healthyControl_18"     "healthyControl_19"     "healthyControl_20"    
## [22] "healthyControl_21"     "acuteLymeDisease_1"    "acuteLymeDisease_2"   
## [25] "acuteLymeDisease_3"    "acuteLymeDisease_4"    "acuteLymeDisease_5"   
## [28] "acuteLymeDisease_6"    "acuteLymeDisease_7"    "acuteLymeDisease_8"   
## [31] "acuteLymeDisease_9"    "acuteLymeDisease_10"   "acuteLymeDisease_11"  
## [34] "acuteLymeDisease_12"   "acuteLymeDisease_13"   "acuteLymeDisease_14"  
## [37] "acuteLymeDisease_15"   "acuteLymeDisease_16"   "acuteLymeDisease_17"  
## [40] "acuteLymeDisease_18"   "acuteLymeDisease_19"   "acuteLymeDisease_20"  
## [43] "acuteLymeDisease_21"   "acuteLymeDisease_22"   "acuteLymeDisease_23"  
## [46] "acuteLymeDisease_24"   "acuteLymeDisease_25"   "acuteLymeDisease_26"  
## [49] "acuteLymeDisease_27"   "acuteLymeDisease_28"   "Antibodies_1month_1"  
## [52] "Antibodies_1month_2"   "Antibodies_1month_3"   "Antibodies_1month_4"  
## [55] "Antibodies_1month_5"   "Antibodies_1month_6"   "Antibodies_1month_7"  
## [58] "Antibodies_1month_8"   "Antibodies_1month_9"   "Antibodies_1month_10" 
## [61] "Antibodies_1month_11"  "Antibodies_1month_12"  "Antibodies_1month_13" 
## [64] "Antibodies_1month_14"  "Antibodies_1month_15"  "Antibodies_1month_16" 
## [67] "Antibodies_1month_17"  "Antibodies_1month_18"  "Antibodies_1month_19" 
## [70] "Antibodies_1month_20"  "Antibodies_1month_21"  "Antibodies_1month_22" 
## [73] "Antibodies_1month_23"  "Antibodies_1month_24"  "Antibodies_1month_25" 
## [76] "Antibodies_1month_26"  "Antibodies_1month_27"  "Antibodies_6months_1" 
## [79] "Antibodies_6months_2"  "Antibodies_6months_3"  "Antibodies_6months_4" 
## [82] "Antibodies_6months_5"  "Antibodies_6months_6"  "Antibodies_6months_7" 
## [85] "Antibodies_6months_8"  "Antibodies_6months_9"  "Antibodies_6months_10"
classLyme <- "gene"

healthyLyme <- grep('healthy',colnames(Lyme_strict))
acute <- grep('acute', colnames(Lyme_strict))
lyme1 <- grep('1month', colnames(Lyme_strict))
lyme6 <- grep('6month', colnames(Lyme_strict))

classLyme[healthyLyme] <- "healthy before lyme disease"
classLyme[acute] <- "lyme disease acute"
classLyme[lyme1] <- "lyme disease 1 month"
classLyme[lyme6] <- "lyme disease 6 months"

classLyme
##  [1] "gene"                        "healthy before lyme disease"
##  [3] "healthy before lyme disease" "healthy before lyme disease"
##  [5] "healthy before lyme disease" "healthy before lyme disease"
##  [7] "healthy before lyme disease" "healthy before lyme disease"
##  [9] "healthy before lyme disease" "healthy before lyme disease"
## [11] "healthy before lyme disease" "healthy before lyme disease"
## [13] "healthy before lyme disease" "healthy before lyme disease"
## [15] "healthy before lyme disease" "healthy before lyme disease"
## [17] "healthy before lyme disease" "healthy before lyme disease"
## [19] "healthy before lyme disease" "healthy before lyme disease"
## [21] "healthy before lyme disease" "healthy before lyme disease"
## [23] "lyme disease acute"          "lyme disease acute"         
## [25] "lyme disease acute"          "lyme disease acute"         
## [27] "lyme disease acute"          "lyme disease acute"         
## [29] "lyme disease acute"          "lyme disease acute"         
## [31] "lyme disease acute"          "lyme disease acute"         
## [33] "lyme disease acute"          "lyme disease acute"         
## [35] "lyme disease acute"          "lyme disease acute"         
## [37] "lyme disease acute"          "lyme disease acute"         
## [39] "lyme disease acute"          "lyme disease acute"         
## [41] "lyme disease acute"          "lyme disease acute"         
## [43] "lyme disease acute"          "lyme disease acute"         
## [45] "lyme disease acute"          "lyme disease acute"         
## [47] "lyme disease acute"          "lyme disease acute"         
## [49] "lyme disease acute"          "lyme disease acute"         
## [51] "lyme disease 1 month"        "lyme disease 1 month"       
## [53] "lyme disease 1 month"        "lyme disease 1 month"       
## [55] "lyme disease 1 month"        "lyme disease 1 month"       
## [57] "lyme disease 1 month"        "lyme disease 1 month"       
## [59] "lyme disease 1 month"        "lyme disease 1 month"       
## [61] "lyme disease 1 month"        "lyme disease 1 month"       
## [63] "lyme disease 1 month"        "lyme disease 1 month"       
## [65] "lyme disease 1 month"        "lyme disease 1 month"       
## [67] "lyme disease 1 month"        "lyme disease 1 month"       
## [69] "lyme disease 1 month"        "lyme disease 1 month"       
## [71] "lyme disease 1 month"        "lyme disease 1 month"       
## [73] "lyme disease 1 month"        "lyme disease 1 month"       
## [75] "lyme disease 1 month"        "lyme disease 1 month"       
## [77] "lyme disease 1 month"        "lyme disease 6 months"      
## [79] "lyme disease 6 months"       "lyme disease 6 months"      
## [81] "lyme disease 6 months"       "lyme disease 6 months"      
## [83] "lyme disease 6 months"       "lyme disease 6 months"      
## [85] "lyme disease 6 months"       "lyme disease 6 months"      
## [87] "lyme disease 6 months"

Let's look at which genes are present in each dataset, to see which ones all of these pathologies have in common.

CFS_strict$gene_name
## [1] "CCL20" "DTL"   "KIF11" "CCNA2" "ASPM"  "OLR1"  "FFAR2" "GPR84"
FM_strict$gene_name
## [1] "FFAR2" "KIF11" "DTL"   "CCNA2"
Lyme_strict$Gene
## [1] "OLR1"  "CCL20" "KIF11" "FFAR2" "GPR84" "CCNA2" "ASPM"  "DTL"
monoEBV_strict$gene
## [1] "KIF11" "ASPM"  "CCNA2" "DTL"   "FFAR2" "IRG1"  "GPR84" "CCL20" "OLR1"
UL_strict$GeneSymbol
## [1] "ASPM"  "DTL"   "CCL20" "CCNA2" "KIF11" "OLR1"  "GPR84" "FFAR2"
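
The overlap among the five lists printed above can also be checked programmatically. The sketch below copies the gene symbols from that output into a list (`geneLists`, `commonGenes`, and `neverFound` are names introduced here for illustration) and uses `Reduce()` with `intersect()` and `union()`.

```r
# Gene symbols copied from the printed output above
geneLists <- list(
  CFS  = c("CCL20", "DTL", "KIF11", "CCNA2", "ASPM", "OLR1", "FFAR2", "GPR84"),
  FM   = c("FFAR2", "KIF11", "DTL", "CCNA2"),
  Lyme = c("OLR1", "CCL20", "KIF11", "FFAR2", "GPR84", "CCNA2", "ASPM", "DTL"),
  mono = c("KIF11", "ASPM", "CCNA2", "DTL", "FFAR2", "IRG1", "GPR84", "CCL20", "OLR1"),
  UL   = c("ASPM", "DTL", "CCL20", "CCNA2", "KIF11", "OLR1", "GPR84", "FFAR2")
)

relationalGenes <- c("ASPM", "HISTIH3B", "OLR1", "IRG1", "KIF11",
                     "ILG", "ILIA", "DTL", "FFAR2", "GPR84",
                     "CCNA2", "CCL20")

# Genes present in every dataset
commonGenes <- sort(Reduce(intersect, geneLists))

# Requested symbols that do not appear in any dataset
neverFound <- setdiff(relationalGenes, Reduce(union, geneLists))

commonGenes
neverFound
```

The second result lists requested symbols that matched nothing in any of the five datasets. This check cannot say why; one thing worth verifying is the spelling of those symbols against each dataset's gene annotation.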

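It looks like the fibromyalgia dataset is the limiting one: it contains only 4 of the relational genes, and all 4 are also present in the other datasets, so these 4 genes can be used to predict the class of a sample.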

genes4 <- FM_strict$gene_name

We use the 4 genes from the fibromyalgia (FM) data because they are common to all the other datasets.

CFS4 <- CFS_strict[which(CFS_strict$gene_name %in% genes4),]
mono4 <- monoEBV_strict[which(monoEBV_strict$gene %in% genes4),]
Lyme4 <- Lyme_strict[which(Lyme_strict$Gene %in% genes4),]
UL4 <- UL_strict[which(UL_strict$GeneSymbol %in% genes4),]

Let's make a matrix for each of these, with samples as rows and genes as columns, and add the class labels we just made.

CFS4_t <- data.frame(t(CFS4[,2:169]))
colnames(CFS4_t) <- CFS4$gene_name
CFS4_t$class <- classCFS[2:length(classCFS)]

paged_table(CFS4_t[1:10,])
CFS4_t2 <- CFS4_t[,c(3,1,4,2,5)]
colnames(CFS4_t2)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"

The above is the chronic fatigue syndrome matrix; next is the fibromyalgia matrix.

FM4_t <- data.frame(t(FM_strict[,2:13]))
colnames(FM4_t) <- FM_strict$gene_name
FM4_t$class <- classFM[2:length(classFM)]

paged_table(FM4_t)
FM4_t2 <- FM4_t[,c(4,3,1,2,5)]
colnames(FM4_t2)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"

Now for the Lyme disease data matrix. We just made the CFS and FM matrices and alphabetized the gene features.

Lyme4_t <- data.frame(t(Lyme4[,2:87]))
colnames(Lyme4_t) <- Lyme4$Gene
Lyme4_t$class <- classLyme[2:length(classLyme)]

paged_table(Lyme4_t[1:10,])
Lyme4_t2 <- Lyme4_t[,c(3,4,2,1,5)]

colnames(Lyme4_t2)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"

Next is the UL matrix.

UL4_t <- data.frame(t(UL4[,2:19]))
colnames(UL4_t) <- UL4$GeneSymbol
UL4_t$class <- classUL[2:length(classUL)]

paged_table(UL4_t[1:10,])
UL4_t2 <- UL4_t[,c(2,1,4,3,5)]

colnames(UL4_t2)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"

The last matrix is for the mono and EBV genes.

mono4_t <- data.frame(t(mono4[,2:18]))
colnames(mono4_t) <- mono4$gene
mono4_t$class <- classMono[2:length(classMono)]

paged_table(mono4_t[1:10,])
mono4_t2 <- mono4_t[,c(2,3,4,1,5)]

colnames(mono4_t2)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"
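
The five blocks above reorder the columns by hand-picked index vectors such as `c(2,3,4,1,5)`, which have to be worked out again for every data set. A minimal sketch of doing the same thing by name is shown here. The helper `tidy_by_gene` and the `toy` data frame are hypothetical and only illustrate the idea; they are not part of the original analysis.

```r
# Hypothetical helper: transpose so samples are rows, name the columns by gene,
# sort the gene columns alphabetically, and put the class label last.
tidy_by_gene <- function(expr, genes, classes) {
  out <- data.frame(t(expr))
  colnames(out) <- genes
  out <- out[, sort(genes), drop = FALSE]
  out$class <- classes
  out
}

# Toy stand-in for one of the 4-gene subsets: genes in rows, two samples in columns.
toy <- data.frame(
  gene = c("KIF11", "CCNA2", "DTL", "FFAR2"),
  s1   = c(1, 2, 3, 4),
  s2   = c(5, 6, 7, 8)
)

toy_t <- tidy_by_gene(toy[, 2:3], toy$gene, c("AIM", "healthy"))
colnames(toy_t)
## [1] "CCNA2" "DTL"   "FFAR2" "KIF11" "class"
```

Selecting columns by name means the result does not depend on the row order of each filtered data set.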

Let's row bind all these samples together now that they share the same gene and class features.

matrix5sets <- rbind(mono4_t2,FM4_t2,CFS4_t2,UL4_t2,Lyme4_t2)

paged_table(matrix5sets[c(1:10,50:75,100:125),])
write.csv(matrix5sets,'matrix4genes.csv', row.names=F)
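
For data frames, `rbind()` matches columns by name rather than by position, so the stacking works as long as the column names agree. A small sketch with made-up values shows this, along with a guard that checks the names before binding. The frames `a` and `b` and the helper `same_columns` are hypothetical.

```r
# Two toy frames with the same columns in a different order (values are made up).
a <- data.frame(CCNA2 = 1, DTL = 2, class = "AIM")
b <- data.frame(DTL = 20, class = "healthy", CCNA2 = 10)

# rbind() on data frames aligns columns by name; the result keeps a's column order.
stacked <- rbind(a, b)
print(stacked)

# Hypothetical guard: TRUE only if every frame has the same set of column names.
same_columns <- function(...) {
  dfs <- list(...)
  all(vapply(dfs, function(d) setequal(colnames(d), colnames(dfs[[1]])), logical(1)))
}
stopifnot(same_columns(a, b))
```

Alphabetizing the columns first, as done above, is still useful because it makes the printed tables easy to compare across data sets.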

Now let's relabel the study-specific healthy classes so that every healthy sample has the single class name 'healthy'.

table(matrix5sets$class)
## 
##                         AIM                       CAEBV 
##                           6                           5 
##    Chronic Fatigue Syndrome                fibromyalgia 
##                          93                           7 
## healthy before lyme disease                 healthy CFS 
##                          21                          75 
##                  healthy FM          healthy mono caebv 
##                           5                           6 
##      healthy uterine tissue        lyme disease 1 month 
##                           6                          27 
##       lyme disease 6 months          lyme disease acute 
##                          10                          28 
##       UL surrounding tissue           uterine leiomyoma 
##                           6                           6
healthy5 <- grep('healthy',matrix5sets$class)

matrix5sets$class[healthy5] <- 'healthy'

table(matrix5sets$class)
## 
##                      AIM                    CAEBV Chronic Fatigue Syndrome 
##                        6                        5                       93 
##             fibromyalgia                  healthy     lyme disease 1 month 
##                        7                      113                       27 
##    lyme disease 6 months       lyme disease acute    UL surrounding tissue 
##                       10                       28                        6 
##        uterine leiomyoma 
##                        6
write.csv(matrix5sets,'matrix5sets_healthy5into1healthy.csv', row.names=F)
matrix5sets$class <- as.factor(matrix5sets$class)
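
The relabeling can be sanity-checked against the counts in the first table: the five study-specific healthy classes should add up to the 113 healthy samples in the second table. This is a minimal sketch that rebuilds a label vector from those printed counts; `counts` and `labels` are hypothetical names, and only AIM is included from the non-healthy classes to keep it short.

```r
# Class counts copied from the first table(matrix5sets$class) output above.
counts <- c(
  "healthy before lyme disease" = 21,
  "healthy CFS"                 = 75,
  "healthy FM"                  = 5,
  "healthy mono caebv"          = 6,
  "healthy uterine tissue"      = 6,
  "AIM"                         = 6
)

# Rebuild one label per sample, then collapse every label containing 'healthy'.
labels <- rep(names(counts), times = counts)
labels[grepl("healthy", labels, fixed = TRUE)] <- "healthy"

table(labels)
sum(labels == "healthy")
## [1] 113
```

Note that `grep('healthy', ...)` matches any label containing that string, so it only works because no disease class name includes the word healthy.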

set.seed(125)

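# 80/20 split: 301 samples in total, so 240 go to training and 61 to testing
inTrain <- sample(seq_len(nrow(matrix5sets)), floor(.8*nrow(matrix5sets)))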

training <- matrix5sets[inTrain,]
testing <- matrix5sets[-inTrain,]
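
A purely random split can leave a small class, such as CAEBV with only 5 samples, out of the testing set entirely. One alternative, not used in this analysis, is to sample within each class. The sketch below runs on toy labels; `classes`, `train_idx` and `test_idx` are hypothetical names, and the 80% fraction mirrors the split above.

```r
set.seed(125)

# Toy class labels standing in for matrix5sets$class (counts are made up except AIM and CAEBV).
classes <- factor(rep(c("AIM", "CAEBV", "healthy"), times = c(6, 5, 20)))

# For each class, draw about 80% of its row indices, keeping at least one
# sample for training and at least one for testing.
train_idx <- unlist(lapply(split(seq_along(classes), classes), function(idx) {
  n_train <- min(length(idx) - 1, max(1, round(0.8 * length(idx))))
  idx[sample.int(length(idx), n_train)]
}), use.names = FALSE)

test_idx <- setdiff(seq_along(classes), train_idx)

table(classes[train_idx])
table(classes[test_idx])
```

With this scheme every class with at least two samples appears in both the training and the testing set.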

table(training$class)
## 
##                      AIM                    CAEBV Chronic Fatigue Syndrome 
##                        5                        5                       69 
##             fibromyalgia                  healthy     lyme disease 1 month 
##                        6                       91                       26 
##    lyme disease 6 months       lyme disease acute    UL surrounding tissue 
##                        8                       19                        6 
##        uterine leiomyoma 
##                        5
table(testing$class)
## 
##                      AIM                    CAEBV Chronic Fatigue Syndrome 
##                        1                        0                       24 
##             fibromyalgia                  healthy     lyme disease 1 month 
##                        1                       22                        1 
##    lyme disease 6 months       lyme disease acute    UL surrounding tissue 
##                        2                        9                        0 
##        uterine leiomyoma 
##                        1
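# mtry=3 tries 3 of the 4 genes at each split; the out-of-bag confusion matrix is always stored in rf1$confusion
rf1 <- randomForest(training[1:4], training$class, mtry=3, ntree=5000)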

rf1$confusion
##                          AIM CAEBV Chronic Fatigue Syndrome fibromyalgia
## AIM                        5     0                        0            0
## CAEBV                      0     3                        0            0
## Chronic Fatigue Syndrome   0     0                       35            0
## fibromyalgia               0     0                        0            2
## healthy                    0     1                       30            1
## lyme disease 1 month       0     0                        0            0
## lyme disease 6 months      0     0                        0            0
## lyme disease acute         0     0                        0            0
## UL surrounding tissue      0     0                        1            0
## uterine leiomyoma          0     0                        1            0
##                          healthy lyme disease 1 month lyme disease 6 months
## AIM                            0                    0                     0
## CAEBV                          2                    0                     0
## Chronic Fatigue Syndrome      34                    0                     0
## fibromyalgia                   2                    0                     0
## healthy                       39                    9                     3
## lyme disease 1 month           7                   13                     1
## lyme disease 6 months          3                    4                     0
## lyme disease acute             5                    6                     0
## UL surrounding tissue          3                    0                     0
## uterine leiomyoma              3                    0                     0
##                          lyme disease acute UL surrounding tissue
## AIM                                       0                     0
## CAEBV                                     0                     0
## Chronic Fatigue Syndrome                  0                     0
## fibromyalgia                              2                     0
## healthy                                   5                     1
## lyme disease 1 month                      5                     0
## lyme disease 6 months                     1                     0
## lyme disease acute                        8                     0
## UL surrounding tissue                     0                     0
## uterine leiomyoma                         0                     0
##                          uterine leiomyoma class.error
## AIM                                      0   0.0000000
## CAEBV                                    0   0.4000000
## Chronic Fatigue Syndrome                 0   0.4927536
## fibromyalgia                             0   0.6666667
## healthy                                  2   0.5714286
## lyme disease 1 month                     0   0.5000000
## lyme disease 6 months                    0   1.0000000
## lyme disease acute                       0   0.5789474
## UL surrounding tissue                    2   1.0000000
## uterine leiomyoma                        1   0.8000000
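
The `class.error` column is one minus the fraction of each class's training samples that were classified correctly out-of-bag. The sketch below recomputes it from numbers copied from the output above: the diagonal of the confusion matrix and the training class counts. The vector names are hypothetical.

```r
# Correctly classified training samples per class (diagonal of rf1$confusion above).
correct <- c(AIM = 5, CAEBV = 3, CFS = 35, fibromyalgia = 2, healthy = 39,
             lyme_1_month = 13, lyme_6_months = 0, lyme_acute = 8,
             UL_surrounding = 0, uterine_leiomyoma = 1)

# Training samples per class, from table(training$class) above.
total <- c(5, 5, 69, 6, 91, 26, 8, 19, 6, 5)

# Per-class error, matching the class.error column (e.g. CFS: 34/69).
class_error <- 1 - correct / total
round(class_error, 7)

# Overall out-of-bag accuracy across all 240 training samples.
oob_accuracy <- sum(correct) / sum(total)
round(oob_accuracy, 3)
## [1] 0.442
```

So the model gets 106 of the 240 training samples right out-of-bag, with AIM the only class classified without error.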

Let's see whether the model predicts any better on the testing samples. In the out-of-bag confusion matrix above, acute infectious mono was classified with 100% accuracy in training (5 of 5). Chronic fatigue syndrome samples were never classified as mono, CAEBV, Lyme disease, or UL: the only errors were CFS samples classified as healthy (34 of 69). There are other interesting insights in the above model as well.

prediction1 <- predict(rf1,testing)

results1 <- data.frame(predicted=prediction1, actual=testing$class)

results1
##                                      predicted                   actual
## GSM2279024_AIM                             AIM                      AIM
## GSM2279035_healthy                     healthy                  healthy
## Healthy4                               healthy                  healthy
## myo6                                   healthy             fibromyalgia
## control_3             Chronic Fatigue Syndrome                  healthy
## control_24            Chronic Fatigue Syndrome                  healthy
## case_27                                healthy Chronic Fatigue Syndrome
## control_37                             healthy                  healthy
## case_42               Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_46               Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_47            Chronic Fatigue Syndrome                  healthy
## control_51                             healthy                  healthy
## case_58               Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_63                                healthy Chronic Fatigue Syndrome
## case_67               Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_140              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_143           Chronic Fatigue Syndrome                  healthy
## control_146           Chronic Fatigue Syndrome                  healthy
## control_147                            healthy                  healthy
## case_148              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_152                            healthy                  healthy
## case_153                               healthy Chronic Fatigue Syndrome
## case_157              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_159              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_164                               healthy Chronic Fatigue Syndrome
## case_173              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_181           Chronic Fatigue Syndrome                  healthy
## control_185           Chronic Fatigue Syndrome                  healthy
## control_189           Chronic Fatigue Syndrome                  healthy
## control_190                            healthy                  healthy
## case_192              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_198              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_200              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_214                            healthy                  healthy
## case_221              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_223              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_224              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_225              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_230                               healthy Chronic Fatigue Syndrome
## case_233              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## case_245                               healthy Chronic Fatigue Syndrome
## case_254              Chronic Fatigue Syndrome Chronic Fatigue Syndrome
## control_255                            healthy                  healthy
## control_264           Chronic Fatigue Syndrome                  healthy
## control_267           Chronic Fatigue Syndrome                  healthy
## MyoN.549_S5_white                      healthy                  healthy
## UF.UI.13_S15_black       UL surrounding tissue        uterine leiomyoma
## healthyControl_6         lyme disease 6 months                  healthy
## healthyControl_11           lyme disease acute                  healthy
## acuteLymeDisease_3          lyme disease acute       lyme disease acute
## acuteLymeDisease_6                     healthy       lyme disease acute
## acuteLymeDisease_8          lyme disease acute       lyme disease acute
## acuteLymeDisease_11                    healthy       lyme disease acute
## acuteLymeDisease_14         lyme disease acute       lyme disease acute
## acuteLymeDisease_16       lyme disease 1 month       lyme disease acute
## acuteLymeDisease_21      lyme disease 6 months       lyme disease acute
## acuteLymeDisease_22         lyme disease acute       lyme disease acute
## acuteLymeDisease_28         lyme disease acute       lyme disease acute
## Antibodies_1month_7                    healthy     lyme disease 1 month
## Antibodies_6months_7                   healthy    lyme disease 6 months
## Antibodies_6months_10     lyme disease 1 month    lyme disease 6 months

Interesting results. Overall, 34 of the 61 testing samples were predicted correctly. The actual Lyme disease samples were predicted either as healthy or as Lyme disease at the same or a different stage of the timeline. Many of the rows in the results are CFS samples, but that is because the CFS data contributed many more samples than the other data sets.
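
Reading accuracy off a long printed table is error-prone, so it helps to compute it from the predicted and actual columns. This is a minimal sketch on a made-up 6-row data frame shaped like `results1`; the `results` object and its values are hypothetical.

```r
# Toy stand-in for results1 (values are made up, not taken from the model).
results <- data.frame(
  predicted = c("AIM", "healthy", "healthy", "CFS", "CFS", "healthy"),
  actual    = c("AIM", "healthy", "CFS",     "CFS", "healthy", "healthy")
)

# as.character() guards against factors whose level sets differ.
hit <- as.character(results$predicted) == as.character(results$actual)

# Overall accuracy: share of rows where prediction and truth agree.
accuracy <- mean(hit)

# Per-class accuracy: share of each actual class that was predicted correctly.
per_class <- tapply(hit, results$actual, mean)

accuracy
per_class

# Cross-tabulation of predicted versus actual classes.
table(predicted = results$predicted, actual = results$actual)
```

The same three lines applied to `results1` would give the overall and per-class accuracy for the testing set.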

That's it for today.

We can keep doing this with the GIT and lymphoma EBV-associated pathologies as well, but at a later time. You can look through each study on my RPubs site, and use the data set links at the top of this document to run these chunks of code in R Markdown with knitr.