Top 41 genes common to the two participating multiple sclerosis patients and the commercially purchased multiple sclerosis comparison sample, taken from the top 50 (enhancer) and bottom 50 (silencer) complementary DNA fragments, each 20 base pairs long and called barcodes, in NCBI study GSE293036.
This analysis finds the top 100 complementary DNA fragments, each 20 base pairs long, ranked on the mean values for two MS patients, a commercial MS patient sample for comparison, and a healthy control. The data come from study GSE293036; details of the data extraction step that produced this very large data frame (8.8 million rows by 19 features) are available here.
data <- read.csv('allSampleRepeatsControlsMS1MS2Commercial.csv',sep=',',header=T, na.strings=c('',' ','na','NA'))
str(data)
## 'data.frame': 8838657 obs. of 19 variables:
## $ ID_REF : chr "TTTTTTTTTTTTTTCGTCCC" "TTTTTTTTTTTTTCCTTGCT" "TTTTTTTTTTTTGCAGTGAT" "TTTTTTTTTTTTCTGCTATG" ...
## $ control1.4362 : int 4 2 4 4 5 1 4 3 2 5 ...
## $ control2.4363 : int 4 1 1 1 5 2 3 1 3 2 ...
## $ control3.4364 : int 3 5 1 2 8 4 1 2 3 3 ...
## $ MS1_r1_4370 : int 3 5 3 3 15 23 2 4 11 11 ...
## $ MS1_r2_4371 : int 1 4 4 8 23 15 10 4 4 6 ...
## $ MS1_r3_4372 : int 3 3 3 2 16 7 5 10 7 1 ...
## $ MS1_r4_4373 : int 5 9 3 4 43 17 18 22 24 17 ...
## $ MS1_r5_4374 : int 6 12 5 9 26 12 21 8 5 8 ...
## $ MS2_r1_4375 : int 4 5 3 1 19 9 8 12 12 6 ...
## $ MS2_r2_4376 : int 8 3 7 8 27 19 6 7 13 5 ...
## $ MS2_r3_4377 : int 11 10 8 7 25 19 6 6 8 6 ...
## $ MS2_r4_4378 : int 3 8 4 4 19 22 11 8 20 4 ...
## $ MS2_r5_4379 : int 4 9 5 5 17 21 9 8 14 8 ...
## $ commercial1o.commercial_r1_4365: int 5 5 6 9 24 14 6 4 7 5 ...
## $ commercial2o.commercial_r2_4366: int 8 8 8 13 16 16 8 6 7 6 ...
## $ commercial3o.commercial_r3_4367: int 5 3 5 6 33 17 8 4 13 6 ...
## $ commercial4o.commercial_r4_4368: int 9 8 4 3 29 12 4 7 8 4 ...
## $ commercial5o.commercial_r5_4369: int 1 8 2 2 16 10 6 5 18 1 ...
colnames(data)
## [1] "ID_REF" "control1.4362"
## [3] "control2.4363" "control3.4364"
## [5] "MS1_r1_4370" "MS1_r2_4371"
## [7] "MS1_r3_4372" "MS1_r4_4373"
## [9] "MS1_r5_4374" "MS2_r1_4375"
## [11] "MS2_r2_4376" "MS2_r3_4377"
## [13] "MS2_r4_4378" "MS2_r5_4379"
## [15] "commercial1o.commercial_r1_4365" "commercial2o.commercial_r2_4366"
## [17] "commercial3o.commercial_r3_4367" "commercial4o.commercial_r4_4368"
## [19] "commercial5o.commercial_r5_4369"
data$controlMeans <- rowMeans(data[,2:4],na.rm=F,dims=1)
data$MS1_Means <- rowMeans(data[,5:9], na.rm=F, dims=1)
data$MS2_Means <- rowMeans(data[,10:14], na.rm=F, dims=1)
data$commercial_Means <- rowMeans(data[,15:19],na.rm=F, dims=1)
summary(data)
## ID_REF control1.4362 control2.4363 control3.4364
## Length:8838657 Min. : 1.00 Min. : 1.00 Min. : 1.00
## Class :character 1st Qu.: 6.00 1st Qu.: 5.00 1st Qu.: 6.00
## Mode :character Median : 10.00 Median : 10.00 Median : 10.00
## Mean : 12.06 Mean : 11.66 Mean : 11.76
## 3rd Qu.: 16.00 3rd Qu.: 16.00 3rd Qu.: 16.00
## Max. :724.00 Max. :634.00 Max. :693.00
## MS1_r1_4370 MS1_r2_4371 MS1_r3_4372 MS1_r4_4373
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 12.00 1st Qu.: 14.00 1st Qu.: 10.00 1st Qu.: 21.00
## Median : 25.00 Median : 26.00 Median : 18.00 Median : 37.00
## Mean : 33.24 Mean : 33.42 Mean : 23.49 Mean : 47.73
## 3rd Qu.: 45.00 3rd Qu.: 44.00 3rd Qu.: 31.00 3rd Qu.: 62.00
## Max. :2287.00 Max. :2734.00 Max. :2089.00 Max. :3993.00
## MS1_r5_4374 MS2_r1_4375 MS2_r2_4376 MS2_r3_4377
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 18.00 1st Qu.: 16.00 1st Qu.: 16.00 1st Qu.: 14.00
## Median : 32.00 Median : 28.00 Median : 27.00 Median : 25.00
## Mean : 40.83 Mean : 34.06 Mean : 33.71 Mean : 30.51
## 3rd Qu.: 53.00 3rd Qu.: 45.00 3rd Qu.: 44.00 3rd Qu.: 40.00
## Max. :3215.00 Max. :2398.00 Max. :2412.00 Max. :2127.00
## MS2_r4_4378 MS2_r5_4379 commercial1o.commercial_r1_4365
## Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 16.00 1st Qu.: 15.00 1st Qu.: 14.00
## Median : 28.00 Median : 25.00 Median : 25.00
## Mean : 34.03 Mean : 31.16 Mean : 31.17
## 3rd Qu.: 45.00 3rd Qu.: 41.00 3rd Qu.: 41.00
## Max. :2298.00 Max. :2173.00 Max. :2496.00
## commercial2o.commercial_r2_4366 commercial3o.commercial_r3_4367
## Min. : 1.00 Min. : 1.00
## 1st Qu.: 13.00 1st Qu.: 15.00
## Median : 23.00 Median : 26.00
## Mean : 29.57 Mean : 34.86
## 3rd Qu.: 39.00 3rd Qu.: 45.00
## Max. :2226.00 Max. :3084.00
## commercial4o.commercial_r4_4368 commercial5o.commercial_r5_4369
## Min. : 1.00 Min. : 1.00
## 1st Qu.: 11.00 1st Qu.: 11.00
## Median : 19.00 Median : 20.00
## Mean : 25.01 Mean : 25.54
## 3rd Qu.: 33.00 3rd Qu.: 34.00
## Max. :1908.00 Max. :1908.00
## controlMeans MS1_Means MS2_Means commercial_Means
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 6.00 1st Qu.: 17.20 1st Qu.: 16.40 1st Qu.: 13.80
## Median : 10.00 Median : 28.20 Median : 26.80 Median : 23.00
## Mean : 11.83 Mean : 35.74 Mean : 32.69 Mean : 29.23
## 3rd Qu.: 15.67 3rd Qu.: 45.60 3rd Qu.: 42.20 3rd Qu.: 37.60
## Max. :683.67 Max. :2853.20 Max. :2281.60 Max. :2324.40
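The group means above pick their replicate columns by position (2:4, 5:9, and so on), which silently breaks if the column order ever changes. A minimal sketch of the same calculation that selects columns by name prefix instead; the data frame `toy` and its values are made up for illustration, and only the column-naming scheme is taken from the real data.

```r
# Toy data frame using the same column-naming scheme as the real data (values are made up).
toy <- data.frame(
  ID_REF        = c("AAAA", "CCCC", "GGGG"),
  control1.4362 = c(4, 2, 4),
  control2.4363 = c(4, 1, 1),
  MS1_r1_4370   = c(3, 5, 3),
  MS1_r2_4371   = c(1, 4, 4),
  stringsAsFactors = FALSE
)

# Select replicate columns by name prefix rather than by position.
# "^control[0-9]" avoids matching a derived column such as controlMeans.
control_cols <- grep("^control[0-9]", colnames(toy), value = TRUE)
ms1_cols     <- grep("^MS1_", colnames(toy), value = TRUE)

toy$controlMeans <- rowMeans(toy[, control_cols])
toy$MS1_Means    <- rowMeans(toy[, ms1_cols])
print(toy[, c("ID_REF", "controlMeans", "MS1_Means")])
```

The same pattern would extend to the MS2 and commercial columns with prefixes such as `^MS2_` and `^commercial[0-9]`.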
Let's use the fold change of the MS1, MS2, and commercial MS patient sample means relative to the control mean to measure how the pathology samples differ from healthy.
data$foldchange_MS1_vs_control <- data$MS1_Means/data$controlMeans
summary(data$foldchange_MS1_vs_control)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.06076 2.14054 2.92258 3.31576 4.03333 161.40000
data$foldchange_MS2_vs_control <- data$MS2_Means/data$controlMeans
summary(data$foldchange_MS2_vs_control)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0625 2.1600 2.7120 2.9620 3.4645 110.4000
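# Note: this ratio is control / commercial, the inverse of the MS1 and MS2 ratios above,
# so values below 1 mean higher counts in the commercial MS sample than in the control.
data$foldchange_commercialMS_vs_control <- data$controlMeans/data$commercial_Means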
summary(data$foldchange_commercialMS_vs_control)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.007023 0.302144 0.415225 0.470008 0.575540 17.916667
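The MS1 and MS2 fold changes divide the patient mean by the control mean, while the commercial fold change divides the control mean by the commercial mean, which is why its summary sits below 1. A short sketch with made-up means (not values from the study) shows that the two directions carry the same information and differ only in sign on the log2 scale.

```r
# Made-up group means for three barcodes (illustration only).
control_mean    <- c(1, 4, 20)
commercial_mean <- c(64, 4, 5)

fc_case_over_control <- commercial_mean / control_mean  # direction used for MS1 and MS2
fc_control_over_case <- control_mean / commercial_mean  # direction used for the commercial sample

# On the log2 scale, flipping the ratio only flips the sign.
print(log2(fc_case_over_control))  #  6  0 -2
print(log2(fc_control_over_case))  # -6  0  2
```

Because all counts here are positive and both tails are kept later, flipping the ratio swaps which 50 barcodes appear first and which appear last in the ordering, but the combined set of 100 should be the same apart from ties.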
Next, take the top 50 and bottom 50 genes for each fold change column, ordered from highest to lowest value.
top50bottom50_MS1_FC <- data[order(data$foldchange_MS1_vs_control,decreasing=T)[c(1:50,8838608:8838657)],]
top100_MS1_cDNA <- top50bottom50_MS1_FC$ID_REF
top100_MS1_cDNA
## [1] "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCCATAGCGATCTAACTTTT"
## [4] "CTCCAGAGCCGTTTTCGGTG" "TTTAGAGTCGGTGGTAGATC" "GGCTCGGAGTCGCTGAAAAT"
## [7] "GTGATTCCACAGTCGTTAAT" "TCTGCTCTCTTTACTTATAC" "GTAGAGTCGTTACCCGACAC"
## [10] "GTACCGTCGGTTGCTCGTGC" "CCAGTCGATTCTTTTCATAT" "CCTCTCACCAGTCGTTTTGG"
## [13] "AGAGTCGATTTGTCCAATCG" "TTTCGGGGAACCGAGTCGAT" "ATCGTCGGTCTTAGCGGTCA"
## [16] "AGAGTCGCTCGTTAGGATCT" "GGAGTCGTCTTTTTATCCCC" "GCTTCGCAGTCGTTAGAGTT"
## [19] "CCAGCAGAGTCGCTCGAAAT" "TAGACATGCAGTCGTTTCGA" "ATCTTCGTTTTTCTTTCGGA"
## [22] "AGATTAACCCAATACATTAT" "CGGTTAGAGTCGATAGCTTT" "CTATCAACAGAGTCGCTAAT"
## [25] "GGTGTTGTCAGAGTCGTTAA" "GTGAGGATACAGTCGGTTTT" "ACCCCCGTCGTTAATTCGAC"
## [28] "ATCGTCGTTTTAGCCGTAGG" "AGAGTCGCTCAACTCCGACT" "CTTGCTCCAGGTCAGAGCGA"
## [31] "ATGAGTCGTTTCGTGTTTGG" "GGATGATACTGTCGTTTTCG" "TTAGAGTCGTCGGTTTTACT"
## [34] "TTAATGTCGTTTTTTGGCGA" "CGAGTCGTTTGACCGGCGCA" "ATTCGGGTACCGTCGGTTTT"
## [37] "CATTGTCGTTTTGAGACCGG" "AGACCCGAGCCGTTTTCTTC" "AGAGGCGTTCGATCTTAGAC"
## [40] "AACATTTAAGATCCGGGTTG" "TGAATTTTAGAGTCGGTTTC" "GCCGAGTCGTTATGGACCCA"
## [43] "AGGAGTCGTTAATTCTGATC" "GAGTCGTTCTCGTTTCGCAG" "ACCGTCGCTTGAGGTCAGAT"
## [46] "CAGCGTCGTTTCTTCGTAGT" "GACACCGGTCGTTTGTCAGC" "GGGGGAGTCGTTTCGCTCCA"
## [49] "ACTGTCGTTTCAACGTTGAA" "GGTGCGTGGTCGTTTTGAGA" "TAGAGTACCGTTTTTGAACT"
## [52] "AACCAGAGTAGCGTTTGCTT" "ACCCGGACCCTTGACTCACC" "GCATCCCGGGCGCGTCTAAC"
## [55] "TGCCCGCGCCTACAGTAGTG" "AAGTGTGGCCCTTTGGGTTT" "CATGCCGGTCCCTTTATCTT"
## [58] "ACTTCCGGGTCCCTTTCGTC" "GGCTTTTTTTTTTCTTTGTG" "GCAGTCCTGTTTTACTCCCG"
## [61] "TAGGGCCCTTTCTTCGCCAG" "TCCGGTCGCGTGGCTCATAC" "CGCCTCCCCGGGCCTTAATT"
## [64] "CCGTGGCTTTTTTTCTTACG" "TGAGAGTACCCGGGCCTTTC" "ATCCGGGTGGCGCTTTTTTC"
## [67] "ACGGCCCCTCTTTGCCCATT" "CTGTCCGGCCCTGTCTTATT" "GCACAATTTTCATGTGGGAC"
## [70] "GGGCGTGTTTTTCTGGAGTA" "CAGGGGCGTGAGCTTTCTGT" "CTATGGTCCCTTAGTGTTTA"
## [73] "CGGGCCTTTCTAGTCATCAG" "GGCGGGTCTTGTGTTTTGCT" "GCCGTCCTGTCTTTCTCATT"
## [76] "GCGGTCCCTTAGCTCTTCCG" "ACCGGCCTTTTTGGCAGGTC" "GCCCGGGCTTGTAGGTCTTT"
## [79] "ATGCTGGCCTTTGTATTTAC" "CGGGTGGCGGGGTTTTTATC" "AACGCACGGGCGTGTTAGTC"
## [82] "GTGCGGGCCCTTCGTCCTGT" "ACACTGGCGCGTTTTTCCCA" "ACGGTGGCTTCTCTTACGTG"
## [85] "ATGTCGCGGCGTGTGGTTTT" "TAGTGGCGTGAGATTTGCGT" "TAGACACGGGCCTTTGCTAC"
## [88] "TGCGGTCGCGACCTTTCAGC" "GGGGTCCTTTTATCCTAATC" "GTGGCACAGGGTCGCGTAAA"
## [91] "TTGGTGTGGTGTTTGTTCCA" "GGTCCTGTCTTTTCTGCTGA" "CGGGCCTGAGTTTTCTACGC"
## [94] "GTGGGCCCCTTTGATTCTTC" "AGGGTCCTTTGGGGTCAGAA" "TTACGGCCGCGGTTTTACTG"
## [97] "TGTCGCGTATTTTCTCCAAA" "CCATGGTCGTGTACCGTTAA" "GGCCGGCCCTTTAGGCTTGA"
## [100] "TTCACGGTCCTTTTGGTCAC"
top50bottom50_MS2_FC <- data[order(data$foldchange_MS2_vs_control,decreasing=T)[c(1:50,8838608:8838657)],]
top100_MS2_cDNA <- top50bottom50_MS2_FC$ID_REF
top100_MS2_cDNA
## [1] "GAGTCGTTTAAAGGCTCTCT" "ATCGTCGGTCTTAGCGGTCA" "GTAGAGTCGTTACCCGACAC"
## [4] "CTCCAGAGCCGTTTTCGGTG" "TAGACATGCAGTCGTTTCGA" "GGCTCGGAGTCGCTGAAAAT"
## [7] "ACCCCCGTCGTTAATTCGAC" "AGAGTCGCTCGTTAGGATCT" "CCAGTCGATTCTTTTCATAT"
## [10] "ACCGCGAGTCGCTTGAACTC" "GGAGTCGTCTTTTTATCCCC" "CACCGTCGTTTTTGTGACCG"
## [13] "GGTGTTGTCAGAGTCGTTAA" "AGAGTCGATTTGTCCAATCG" "CCAGCAGAGTCGCTCGAAAT"
## [16] "CCTCTCACCAGTCGTTTTGG" "AGACCCGAGCCGTTTTCTTC" "TTTAGAGTCGGTGGTAGATC"
## [19] "GAGTCGTTCTCGTTTCGCAG" "CGGTTAGAGTCGATAGCTTT" "AGAGTCGCTCAACTCCGACT"
## [22] "TGTATCCACCCCCGCCCTAT" "CAGCGTCGTTTCTTCGTAGT" "CGAGTCGTTTGACCGGCGCA"
## [25] "TCCGAGTCGATTTCGCTAAC" "CGACCAGTCGTTTATACACC" "GTGATTCCACAGTCGTTAAT"
## [28] "TAACGGAGTCGTTTTTCAAG" "AGATTAACCCAATACATTAT" "ATCGTCGTTTTAGCCGTAGG"
## [31] "GTGAGGATACAGTCGGTTTT" "GCTTCGCAGTCGTTAGAGTT" "TAGAGTCGTTCTCTACGCGA"
## [34] "GTACCGTCGGTTGCTCGTGC" "GGGTTCCGAGTCGTTCAAGT" "GCTATCGGCGTTTTCGTATT"
## [37] "ATGAGTCGTTTCGTGTTTGG" "TTTCGGGGAACCGAGTCGAT" "ACTGTCGTTTCAACGTTGAA"
## [40] "TAGCGCCGTTGTTGTTCTTA" "TTAGAGTCGTCGGTTTTACT" "AGAGGCGTTCGATCTTAGAC"
## [43] "ACCGTCGCTTGAGGTCAGAT" "GCCGAGTCGTTATGGACCCA" "AGATGCCAGTCGTTTCTCTT"
## [46] "TGAATTTTAGAGTCGGTTTC" "CTAAAGCGTCGCTTGTAGTT" "TTTACCGGGGCCGAGTCGCT"
## [49] "CTATCAACAGAGTCGCTAAT" "ATTCGGGTACCGTCGGTTTT" "TTGTTATCGTTATAGGCGTG"
## [52] "TGAAAAGTGGCGAGTCTATT" "GGTGGCGGGCCTTTATACCT" "TGCGTATGGTCGCGTCTTGC"
## [55] "CTCGATGGCGTGTAGTGTAG" "CTTTATCTGATACAGTAGTG" "TAATAAACCCGATAGTGTAG"
## [58] "TGCGCGGGCGCGTTTCGATA" "GCCAGGGCCCCTTTCGTCAT" "GGTCACAGTAGTGTCGAGCT"
## [61] "GGAACCAGTGTAGTGAAGAG" "TGCGGTCGCGACCTTTCAGC" "ACTTCCACTTTTTAGTGGCG"
## [64] "ACGGCCCCTCTTTGCCCATT" "TGTAGTGCTATTGGCGTGTC" "TTGTAGGCGTGTATTTTCTA"
## [67] "TCGGTGTATTTTTAGCGGCG" "GGCTACCTCGAAGAGTAGTG" "GGGCGTGTTTTTCTGGAGTA"
## [70] "GTCAGTGGCCTGTACGTTTC" "CGCTCGGGCCTGTTTTCTCA" "TCATAGCGTAGTGTGGCTTA"
## [73] "TGAAGTGTAGTGGATCATTT" "GCTGATACCGCGTAGTGTAG" "AATTGCGGCCCTTCCATTTT"
## [76] "TAGAGTACCGTTTTTGAACT" "TAGTGAAGTGTCCCATCGCA" "AGCTCTAGGGCCCCTTTTCG"
## [79] "CGGTCTGTAGGAGTGTCGTG" "GGTCCTGTCTTTTCTGCTGA" "TCTATGTACTTACCGTAGTG"
## [82] "AACGCACGGGCGTGTTAGTC" "CGTTCCATGGTAGTCTAGTG" "CTATCCCAAGTAGTGTATTG"
## [85] "ACCGGCCTTTTTGGCAGGTC" "TGTAGATTACTGTAGTGGCG" "TAGTGGCGTGAGATTTGCGT"
## [88] "GTGGGCCCCTTTGATTCTTC" "GCCGTCCTGTCTTTCTCATT" "TCATACTTACCTGCCTTTAA"
## [91] "TTTGCCACGGGCGCGTTTCA" "TGAAATACGTCAGTGTAGTG" "TCCCGGGGCCTCTGTTTTAT"
## [94] "AACCAGAGTAGCGTTTGCTT" "TACAGTCCTTTCTGTTGACG" "TGCCCGCGCCTACAGTAGTG"
## [97] "TTCTAGTAGTGTCCTGTACC" "CTATGGTCCCTTAGTGTTTA" "TTCACGGTCCTTTTGGTCAC"
## [100] "CTTCTGTTAGTGTAGTGTTG"
top50bottom50_commercial_FC <- data[order(data$foldchange_commercialMS_vs_control,decreasing=T)[c(1:50,8838608:8838657)],]
top100_commercialMS_cDNA <- top50bottom50_commercial_FC$ID_REF
top100_commercialMS_cDNA
## [1] "TTACGGCCGCGGTTTTACTG" "TTAGCGACGTGTACAGCCTG" "TCTGCTTACGGTCCCTTTTA"
## [4] "TTCACGGTCCTTTTGGTCAC" "GCCAGGGCCCCTTTCGTCAT" "TGAAAAGTGGCGAGTCTATT"
## [7] "TGCGGTCGCGACCTTTCAGC" "GTCAGTGGCCTGTACGTTTC" "GACAGTGTAGTGAATATTGT"
## [10] "TCGGTGGTAGGGTCCTTTTC" "TAGTGGCGTGAGATTTGCGT" "TGTAGATTACTGTAGTGGCG"
## [13] "TGCCCGCGCCTACAGTAGTG" "GCATTCAGAGTAGTGTGTCT" "GGCGCCTAAATTTATCTTTT"
## [16] "GCCGTCCTGTCTTTCTCATT" "CAAATCAACCCTTAGTGGCG" "AACGCACGGGCGTGTTAGTC"
## [19] "ACAGGCCTGTCTTATGTTTG" "CGGTCTGTAGGAGTGTCGTG" "ACAGTAGGGTCTTGGCTGCT"
## [22] "GGTCCTGTCTTTTCTGCTGA" "TAGAGTACCGTTTTTGAACT" "ATGGACCTGTTTTCTTTTAG"
## [25] "CGCTCGGGCCTGTTTTCTCA" "CTTCTGTTAGTGTAGTGTTG" "CTGTCCGGCCCTGTCTTATT"
## [28] "CAATATCGGTCCTGTTTTTT" "CGGGCCTTTCTAGTCATCAG" "CGGGTGGCGGGGTTTTTATC"
## [31] "TGCGTATGGTCGCGTCTTGC" "GGTCACAGTAGTGTCGAGCT" "GGGCGTGTTTTTCTGGAGTA"
## [34] "GCTCGTGGGCCCTTTTTCGT" "ACTTCCGGGTCCCTTTCGTC" "ATCCGGGTGGCGCTTTTTTC"
## [37] "GGGGGTCCTTTTTGAATTCG" "ATTGGCCTGTATTATTGCGC" "ACGGGCCTCTTTGCTCGTGT"
## [40] "ACCCGGACCCTTGACTCACC" "AATACGGGCCCGTGTTACCC" "GTGCGGGCCCTTCGTCCTGT"
## [43] "CAAGCAGTCCTTTCTTTTAA" "CGCACCGGGGTCCCGTTTTT" "CGGACCCGGTAGTGTAGCTT"
## [46] "AGTTCAGGGGCCCTTTCTCG" "GCCCTGGCCCTTTATCTTGA" "GCCTCCGGCCCTTTTCCTTC"
## [49] "AAACCGCGGGCCCTTTAGGA" "CTATGGTCCCTTAGTGTTTA" "AACATTTAAGATCCGGGTTG"
## [52] "AGTGCACATTTTAACCGATC" "GGGGGAGTCGTTTCGCTCCA" "GAGTCGTTCTCGTTTCGCAG"
## [55] "AGTCTGTGGGCGGAAAGATG" "TTTTACAGTCGTTCGGATGT" "TGTCAAGTCGTTTGTGTTGA"
## [58] "GTGAGGATACAGTCGGTTTT" "TAACGGAGTCGTTTTTCAAG" "AGCTTCGTTTTTCGTTACGG"
## [61] "ACCGCGAGTCGCTTGAACTC" "CGGACCCGGTCGATTCGGTA" "CCAGTCGTTTTGACTAGGCC"
## [64] "TGAATTTTAGAGTCGGTTTC" "CCTCTCACCAGTCGTTTTGG" "CGGTTAGAGTCGATAGCTTT"
## [67] "ATCGTCGTTTTAGCCGTAGG" "TTAATGTCGTTTTTTGGCGA" "AGAGGCGTTCGATCTTAGAC"
## [70] "CGAGTCGTTTGACCGGCGCA" "GCCGAGTCGTTATGGACCCA" "AGATTAACCCAATACATTAT"
## [73] "GTGATTCCACAGTCGTTAAT" "CAAGGGATATCCACTTGCGT" "TAGAGTCGTTCTCTACGCGA"
## [76] "CTATCAACAGAGTCGCTAAT" "TCCGAGTCGATTTCGCTAAC" "ACTGTCGTTTCAACGTTGAA"
## [79] "CCAGTCGATTCTTTTCATAT" "GCTTCGCAGTCGTTAGAGTT" "CGGGGCTAGGTACAGTGATC"
## [82] "ACCCCCGTCGTTAATTCGAC" "GGAGTCGTCTTTTTATCCCC" "TTTCGGGGAACCGAGTCGAT"
## [85] "GGTCATGACCGTTCCGTTAA" "AGAGTCGATTTGTCCAATCG" "TTTAGAGTCGGTGGTAGATC"
## [88] "GGCTCGGAGTCGCTGAAAAT" "AGAGTCGCTCGTTAGGATCT" "GTACCGTCGGTTGCTCGTGC"
## [91] "GGTGTTGTCAGAGTCGTTAA" "ATCGTCGGTCTTAGCGGTCA" "TAGACATGCAGTCGTTTCGA"
## [94] "CTCCAGAGCCGTTTTCGGTG" "GTAGAGTCGTTACCCGACAC" "CCAGCAGAGTCGCTCGAAAT"
## [97] "ATGTTTTAATTGCTATAAGA" "CACCGTCGTTTTTGTGACCG" "CTATCCCGAGATCCGGCTGG"
## [100] "GAGTCGTTTAAAGGCTCTCT"
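The three selections above repeat the same ordering step and hard-code the last row indices (8838608:8838657), which only work for a data frame of exactly 8838657 rows. A minimal sketch of a reusable version follows; the helper name `top_bottom_ids` and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: IDs of the n largest and n smallest values in a fold change column.
top_bottom_ids <- function(df, fc_col, n = 50, id_col = "ID_REF") {
  ord  <- order(df[[fc_col]], decreasing = TRUE)  # row indices, highest fold change first
  keep <- c(head(ord, n), tail(ord, n))           # first n and last n of that ordering
  df[[id_col]][keep]
}

# Made-up example with six barcodes.
toy <- data.frame(
  ID_REF = c("a", "b", "c", "d", "e", "f"),
  fc     = c(2.0, 9.5, 0.1, 1.0, 40.0, 0.5),
  stringsAsFactors = FALSE
)
print(top_bottom_ids(toy, "fc", n = 2))  # "e" "b" "f" "c"
```

With the real data, a call such as `top_bottom_ids(data, "foldchange_MS1_vs_control")` would be expected to return the same 100 barcodes as `top100_MS1_cDNA`, without depending on the exact row count.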
Are there any strands in the fold change groups common to all 3 sets of top 100 genes?
common1 <- top100_commercialMS_cDNA[which(top100_commercialMS_cDNA %in% top100_MS1_cDNA)]
common2 <- top100_MS1_cDNA[which(top100_MS1_cDNA %in% top100_MS2_cDNA)]
commonAll3 <- common1[which(common1 %in% common2 )]
commonAll3
## [1] "TTCACGGTCCTTTTGGTCAC" "TGCGGTCGCGACCTTTCAGC" "TAGTGGCGTGAGATTTGCGT"
## [4] "TGCCCGCGCCTACAGTAGTG" "GCCGTCCTGTCTTTCTCATT" "AACGCACGGGCGTGTTAGTC"
## [7] "GGTCCTGTCTTTTCTGCTGA" "TAGAGTACCGTTTTTGAACT" "GGGCGTGTTTTTCTGGAGTA"
## [10] "CTATGGTCCCTTAGTGTTTA" "GAGTCGTTCTCGTTTCGCAG" "GTGAGGATACAGTCGGTTTT"
## [13] "TGAATTTTAGAGTCGGTTTC" "CCTCTCACCAGTCGTTTTGG" "CGGTTAGAGTCGATAGCTTT"
## [16] "ATCGTCGTTTTAGCCGTAGG" "AGAGGCGTTCGATCTTAGAC" "CGAGTCGTTTGACCGGCGCA"
## [19] "GCCGAGTCGTTATGGACCCA" "AGATTAACCCAATACATTAT" "GTGATTCCACAGTCGTTAAT"
## [22] "CTATCAACAGAGTCGCTAAT" "ACTGTCGTTTCAACGTTGAA" "CCAGTCGATTCTTTTCATAT"
## [25] "GCTTCGCAGTCGTTAGAGTT" "ACCCCCGTCGTTAATTCGAC" "GGAGTCGTCTTTTTATCCCC"
## [28] "TTTCGGGGAACCGAGTCGAT" "AGAGTCGATTTGTCCAATCG" "TTTAGAGTCGGTGGTAGATC"
## [31] "GGCTCGGAGTCGCTGAAAAT" "AGAGTCGCTCGTTAGGATCT" "GTACCGTCGGTTGCTCGTGC"
## [34] "GGTGTTGTCAGAGTCGTTAA" "ATCGTCGGTCTTAGCGGTCA" "TAGACATGCAGTCGTTTCGA"
## [37] "CTCCAGAGCCGTTTTCGGTG" "GTAGAGTCGTTACCCGACAC" "CCAGCAGAGTCGCTCGAAAT"
## [40] "CACCGTCGTTTTTGTGACCG" "GAGTCGTTTAAAGGCTCTCT"
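The three-way overlap above is built from two pairwise `%in%` filters. As a sketch of an equivalent and shorter form, base R's `intersect` can be folded over a list of sets with `Reduce`; the vectors below are made-up stand-ins for the three top-100 lists.

```r
# Made-up stand-ins for the three top-100 barcode vectors.
set_a <- c("AAAA", "CCCC", "GGGG", "TTTT")
set_b <- c("CCCC", "GGGG", "ACGT")
set_c <- c("GGGG", "CCCC", "TTTT")

# Fold pairwise intersection across all sets: ((a and b) and c).
common_all <- Reduce(intersect, list(set_a, set_b, set_c))
print(common_all)  # "CCCC" "GGGG"
```

Applied to the real vectors, this would read `Reduce(intersect, list(top100_commercialMS_cDNA, top100_MS1_cDNA, top100_MS2_cDNA))`. Note that `intersect` also drops duplicate values, which the `%in%` approach does not.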
Great, 41 genes are common to all three sets of top 100 genes by fold change. Let's put these 41 common 20-base-pair cDNA strands into their own data frame.
top41 <- data[which(data$ID_REF %in% commonAll3),]
summary(top41)
## ID_REF control1.4362 control2.4363 control3.4364
## Length:41 Min. : 1.000 Min. : 1.000 Min. : 1.000
## Class :character 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Mode :character Median : 2.000 Median : 1.000 Median : 2.000
## Mean : 7.512 Mean : 7.634 Mean : 7.927
## 3rd Qu.: 4.000 3rd Qu.: 6.000 3rd Qu.: 4.000
## Max. :33.000 Max. :46.000 Max. :38.000
## MS1_r1_4370 MS1_r2_4371 MS1_r3_4372 MS1_r4_4373
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1
## 1st Qu.: 17.00 1st Qu.: 31.00 1st Qu.: 34.00 1st Qu.: 61
## Median : 78.00 Median : 76.00 Median : 55.00 Median :103
## Mean : 78.76 Mean : 79.66 Mean : 55.98 Mean :115
## 3rd Qu.:120.00 3rd Qu.:114.00 3rd Qu.: 75.00 3rd Qu.:180
## Max. :225.00 Max. :247.00 Max. :206.00 Max. :298
## MS1_r5_4374 MS2_r1_4375 MS2_r2_4376 MS2_r3_4377
## Min. : 1.0 Min. : 1.00 Min. : 1.00 Min. : 2.00
## 1st Qu.: 42.0 1st Qu.: 34.00 1st Qu.: 43.00 1st Qu.: 21.00
## Median : 99.0 Median : 69.00 Median : 71.00 Median : 77.00
## Mean :100.1 Mean : 67.07 Mean : 67.17 Mean : 64.29
## 3rd Qu.:134.0 3rd Qu.: 93.00 3rd Qu.: 89.00 3rd Qu.: 91.00
## Max. :311.0 Max. :256.00 Max. :228.00 Max. :200.00
## MS2_r4_4378 MS2_r5_4379 commercial1o.commercial_r1_4365
## Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 36.00 1st Qu.: 26.00 1st Qu.: 50.00
## Median : 75.00 Median : 64.00 Median : 71.00
## Mean : 69.49 Mean : 60.73 Mean : 72.17
## 3rd Qu.: 95.00 3rd Qu.: 87.00 3rd Qu.: 98.00
## Max. :241.00 Max. :177.00 Max. :251.00
## commercial2o.commercial_r2_4366 commercial3o.commercial_r3_4367
## Min. : 1.00 Min. : 1.00
## 1st Qu.: 23.00 1st Qu.: 54.00
## Median : 70.00 Median : 94.00
## Mean : 68.54 Mean : 85.85
## 3rd Qu.: 98.00 3rd Qu.:111.00
## Max. :200.00 Max. :254.00
## commercial4o.commercial_r4_4368 commercial5o.commercial_r5_4369
## Min. : 1.00 Min. : 1.00
## 1st Qu.: 24.00 1st Qu.: 29.00
## Median : 67.00 Median : 60.00
## Mean : 60.68 Mean : 57.95
## 3rd Qu.: 86.00 3rd Qu.: 77.00
## Max. :207.00 Max. :185.00
## controlMeans MS1_Means MS2_Means commercial_Means
## Min. : 1.000 Min. : 1.6 Min. : 1.40 Min. : 1.80
## 1st Qu.: 1.000 1st Qu.: 51.6 1st Qu.: 42.20 1st Qu.: 41.60
## Median : 1.667 Median : 88.8 Median : 73.60 Median : 71.80
## Mean : 7.691 Mean : 85.9 Mean : 65.75 Mean : 69.04
## 3rd Qu.: 4.333 3rd Qu.:125.0 3rd Qu.: 87.00 3rd Qu.: 91.60
## Max. :34.333 Max. :248.0 Max. :220.40 Max. :212.20
## foldchange_MS1_vs_control foldchange_MS2_vs_control
## Min. : 0.06076 Min. : 0.06835
## 1st Qu.: 51.60000 1st Qu.: 41.00000
## Median : 64.40000 Median : 48.60000
## Mean : 55.81724 Mean : 43.15910
## 3rd Qu.: 75.40000 3rd Qu.: 61.40000
## Max. :161.40000 Max. :110.40000
## foldchange_commercialMS_vs_control
## Min. : 0.007023
## 1st Qu.: 0.016129
## Median : 0.019707
## Mean : 2.523062
## 3rd Qu.: 0.024039
## Max. :14.629630
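The summary above mixes both tails: the MS1 and MS2 fold changes range from well below 1 to over 100, so the 41 common strands include both strongly raised and strongly lowered ones. A small sketch of how the two groups could be separated; the data frame `toy41`, its values, and the `direction` labels are made up for illustration, and the cutoff of 1 is an assumption.

```r
# Made-up rows mimicking the structure of top41 (illustration only).
toy41 <- data.frame(
  ID_REF = c("AAAA", "CCCC", "GGGG"),
  foldchange_MS1_vs_control = c(64.4, 0.06, 51.6),
  foldchange_MS2_vs_control = c(48.6, 0.07, 41.0),
  stringsAsFactors = FALSE
)

# Label a strand as higher in MS only if both patient fold changes exceed 1.
up_in_both <- toy41$foldchange_MS1_vs_control > 1 & toy41$foldchange_MS2_vs_control > 1
toy41$direction <- ifelse(up_in_both, "higher in MS", "not higher in both")
print(table(toy41$direction))
```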
Let's write the full data and the top 41 genes out to CSV files to use as needed.
write.csv(data,'foldchange3setsVsControlMeans.csv',row.names=F)
write.csv(top41,'top41genesCommonToAllFoldchangeValues3groupsMS.csv',row.names=F)
Next we will test these genes in Bioconductor, provided it is working today, and later with random forest modeling to see if they can be used to predict whether a sample is healthy or shows multiple sclerosis pathology.