This post extends the Multiple Sclerosis gene expression barcode analysis done earlier in the week by adding a couple of fields: the amino acids translated from the cDNA and the amino acids translated from the mRNA, using Bioconductor’s Biostrings library.
I added gene summaries for the top 41 genes after using the online BLAST tool, with the correct settings, to find genes close to each 20 base pair cDNA target sequence. We will read in our table of Multiple Sclerosis genes and accompanying features from the GSE293036 study we have worked on over the last few weeks. The inverse fold change (control means over commercial MS line means) is the reciprocal of the fold change of commercial MS means over control means.
barcodes <- read.csv('MS_top41genes_FCs_summariesAdded.csv', header=T, sep=',', na.strings=c('',' ','na','NA'))
str(barcodes)
## 'data.frame': 41 obs. of 34 variables:
## $ ID_REF : chr "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCAGCAGAGTCGCTCGAAAT" "GTAGAGTCGTTACCCGACAC" ...
## $ geneSynonyms : chr "CLINT1" "IGFBP7" "FAM20C" "SMARCA2" ...
## $ genecardsSummary : chr "GeneCards Symbol: CLINT1 \nClathrin Interactor 1\nClathrin Interacting Protein Localized In The Trans-Golgi Reg"| __truncated__ "GeneCards Symbol: IGFBP7\nInsulin Like Growth Factor Binding Protein 7\nGeneCards Summary for IGFBP7 Gene\nIGFB"| __truncated__ "GeneCards Symbol: FAM20C \nFAM20C Golgi Associated Secretory Pathway Kinase\nGeneCards Summary for FAM20C Gene\"| __truncated__ "GeneCards Symbol: SMARCA2\nSWI/SNF Related BAF Chromatin Remodeling Complex Subunit ATPase 2 \nGeneCards Summar"| __truncated__ ...
## $ NCBI_Summary : chr "NCBI Gene Summary for CLINT1 Gene \nThis gene encodes a protein with similarity to the epsin family of endocyti"| __truncated__ "NCBI Gene Summary for IGFBP7 Gene \nThis gene encodes a member of the insulin-like growth factor (IGF)-binding "| __truncated__ "This gene encodes a member of the family of secreted protein kinases. The encoded protein binds calcium and pho"| __truncated__ "NCBI Gene Summary for SMARCA2 Gene \nThe protein encoded by this gene is a member of the SWI/SNF family of prot"| __truncated__ ...
## $ uniProtSummary : chr "UniProtKB/Swiss-Prot Summary for CLINT1 Gene\nBinds to membranes enriched in phosphatidylinositol 4,5-bisphosph"| __truncated__ "UniProtKB/Swiss-Prot Summary for IGFBP7 Gene\nBinds IGF1 and IGF2 with a relatively low affinity. Stimulates pr"| __truncated__ "UniProtKB/Swiss-Prot Summary for FAM20C Gene\nGolgi serine/threonine protein kinase that phosphorylates secreto"| __truncated__ "UniProtKB/Swiss-Prot Summary for SMARCA2 Gene\nATPase involved in transcriptional activation and repression of "| __truncated__ ...
## $ Ensembl_Name : chr "ENSG00000113282" "ENSG00000163453" "ENSG00000177706" "ENSG00000080503" ...
## $ control1.4362 : int 1 1 1 1 1 1 1 1 3 2 ...
## $ control2.4363 : int 1 1 1 1 2 1 1 1 2 1 ...
## $ control3.4364 : int 1 1 1 1 1 2 1 2 3 1 ...
## $ MS1_r1_4370 : int 125 103 85 82 141 61 17 102 225 94 ...
## $ MS1_r2_4371 : int 185 63 47 67 84 114 81 62 167 101 ...
## $ MS1_r3_4372 : int 97 85 65 55 61 65 70 58 113 49 ...
## $ MS1_r4_4373 : int 209 124 83 99 190 103 102 93 269 152 ...
## $ MS1_r5_4374 : int 191 142 67 97 156 116 103 111 260 98 ...
## $ MS2_r1_4375 : int 102 55 53 69 69 101 66 67 148 74 ...
## $ MS2_r2_4376 : int 122 61 60 51 88 110 110 76 143 85 ...
## $ MS2_r3_4377 : int 111 52 58 77 110 87 89 87 115 88 ...
## $ MS2_r4_4378 : int 114 75 61 94 105 110 74 79 94 95 ...
## $ MS2_r5_4379 : int 103 64 58 67 105 68 48 99 112 81 ...
## $ commercial1o.commercial_r1_4365 : int 132 96 55 78 104 88 68 101 154 70 ...
## $ commercial2o.commercial_r2_4366 : int 155 65 84 82 92 105 46 70 200 98 ...
## $ commercial3o.commercial_r3_4367 : int 203 123 96 94 109 97 85 111 178 103 ...
## $ commercial4o.commercial_r4_4368 : int 123 64 67 51 89 86 73 77 130 77 ...
## $ commercial5o.commercial_r5_4369 : int 99 107 57 46 64 64 56 77 185 70 ...
## $ controlMeans : num 1 1 1 1 1.33 ...
## $ MS1_Means : num 161.4 103.4 69.4 80 126.4 ...
## $ MS2_Means : num 110.4 61.4 58 71.6 95.4 ...
## $ commercial_Means : num 142.4 91 71.8 70.2 91.6 ...
## $ foldchange_MS1_vs_control : num 161.4 103.4 69.4 80 94.8 ...
## $ foldchange_MS2_vs_control : num 110.4 61.4 58 71.6 71.5 ...
## $ RealFC_commercial_control_means : num 142.4 91 71.8 70.2 68.7 ...
## $ inverse_FC_control_over_commercial_means: num 0.00702 0.01099 0.01393 0.01425 0.01456 ...
## $ mRNA : chr "CUCAGCAAAUUUCCGAGAGA" "GUGGCAGCAAAAACACUGGC" "GGUCGUCUCAGCGAGCUUUA" "CAUCUCAGCAAUGGGCUGUG" ...
## $ aminoAcids_simulated : chr "-Leu-A-Ala-AA-Phe-CC-Glu-Arg-" "#NAME?" "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "#NAME?" ...
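As a quick check on the fold change columns in the listing above, the inverse fold change is just the reciprocal of the fold change. The sketch below uses the first five control and commercial means from the listing (row 5 has control counts of 1, 2, and 1, so its mean is 4/3); the variable names are illustrative and not from the original analysis.

```r
# First five rows, taken from the str() listing above.
control_means    <- c(1, 1, 1, 1, 4/3)
commercial_means <- c(142.4, 91, 71.8, 70.2, 91.6)

fc_commercial_vs_control <- commercial_means / control_means
fc_inverse               <- control_means / commercial_means

# The two columns are reciprocals of each other.
all.equal(fc_inverse, 1 / fc_commercial_vs_control)
round(fc_commercial_vs_control, 1)
signif(fc_inverse, 3)
```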
IDs <- barcodes$ID_REF
IDs
## [1] "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCAGCAGAGTCGCTCGAAAT"
## [4] "GTAGAGTCGTTACCCGACAC" "CTCCAGAGCCGTTTTCGGTG" "TAGACATGCAGTCGTTTCGA"
## [7] "ATCGTCGGTCTTAGCGGTCA" "GGTGTTGTCAGAGTCGTTAA" "GTACCGTCGGTTGCTCGTGC"
## [10] "AGAGTCGCTCGTTAGGATCT" "GGCTCGGAGTCGCTGAAAAT" "TTTAGAGTCGGTGGTAGATC"
## [13] "AGAGTCGATTTGTCCAATCG" "TTTCGGGGAACCGAGTCGAT" "GGAGTCGTCTTTTTATCCCC"
## [16] "ACCCCCGTCGTTAATTCGAC" "GCTTCGCAGTCGTTAGAGTT" "CCAGTCGATTCTTTTCATAT"
## [19] "ACTGTCGTTTCAACGTTGAA" "CTATCAACAGAGTCGCTAAT" "GTGATTCCACAGTCGTTAAT"
## [22] "AGATTAACCCAATACATTAT" "GCCGAGTCGTTATGGACCCA" "CGAGTCGTTTGACCGGCGCA"
## [25] "AGAGGCGTTCGATCTTAGAC" "ATCGTCGTTTTAGCCGTAGG" "CGGTTAGAGTCGATAGCTTT"
## [28] "CCTCTCACCAGTCGTTTTGG" "TGAATTTTAGAGTCGGTTTC" "GTGAGGATACAGTCGGTTTT"
## [31] "GAGTCGTTCTCGTTTCGCAG" "CTATGGTCCCTTAGTGTTTA" "GGGCGTGTTTTTCTGGAGTA"
## [34] "TAGAGTACCGTTTTTGAACT" "GGTCCTGTCTTTTCTGCTGA" "AACGCACGGGCGTGTTAGTC"
## [37] "GCCGTCCTGTCTTTCTCATT" "TGCCCGCGCCTACAGTAGTG" "TAGTGGCGTGAGATTTGCGT"
## [40] "TGCGGTCGCGACCTTTCAGC" "TTCACGGTCCTTTTGGTCAC"
Load the Biostrings package from the earlier installation.
library(Biostrings)
Look at the mRNA column in our table, which we derived earlier from the cDNA.
barcodes$mRNA
## [1] "CUCAGCAAAUUUCCGAGAGA" "GUGGCAGCAAAAACACUGGC" "GGUCGUCUCAGCGAGCUUUA"
## [4] "CAUCUCAGCAAUGGGCUGUG" "GAGGUCUCGGCAAAAGCCAC" "AUCUGUACGUCAGCAAAGCU"
## [7] "UAGCAGCCAGAAUCGCCAGU" "CCACAACAGUCUCAGCAAUU" "CAUGGCAGCCAACGAGCACG"
## [10] "UCUCAGCGAGCAAUCCUAGA" "CCGAGCCUCAGCGACUUUUA" "AAAUCUCAGCCACCAUCUAG"
## [13] "UCUCAGCUAAACAGGUUAGC" "AAAGCCCCUUGGCUCAGCUA" "CCUCAGCAGAAAAAUAGGGG"
## [16] "UGGGGGCAGCAAUUAAGCUG" "CGAAGCGUCAGCAAUCUCAA" "GGUCAGCUAAGAAAAGUAUA"
## [19] "UGACAGCAAAGUUGCAACUU" "GAUAGUUGUCUCAGCGAUUA" "CACUAAGGUGUCAGCAAUUA"
## [22] "UCUAAUUGGGUUAUGUAAUA" "CGGCUCAGCAAUACCUGGGU" "GCUCAGCAAACUGGCCGCGU"
## [25] "UCUCCGCAAGCUAGAAUCUG" "UAGCAGCAAAAUCGGCAUCC" "GCCAAUCUCAGCUAUCGAAA"
## [28] "GGAGAGUGGUCAGCAAAACC" "ACUUAAAAUCUCAGCCAAAG" "CACUCCUAUGUCAGCCAAAA"
## [31] "CUCAGCAAGAGCAAAGCGUC" "GAUACCAGGGAAUCACAAAU" "CCCGCACAAAAAGACCUCAU"
## [34] "AUCUCAUGGCAAAAACUUGA" "CCAGGACAGAAAAGACGACU" "UUGCGUGCCCGCACAAUCAG"
## [37] "CGGCAGGACAGAAAGAGUAA" "ACGGGCGCGGAUGUCAUCAC" "AUCACCGCACUCUAAACGCA"
## [40] "ACGCCAGCGCUGGAAAGUCG" "AAGUGCCAGGAAAACCAGUG"
RNAs <- RNAStringSet(barcodes$mRNA)
RNAs
## RNAStringSet object of length 41:
## width seq
## [1] 20 CUCAGCAAAUUUCCGAGAGA
## [2] 20 GUGGCAGCAAAAACACUGGC
## [3] 20 GGUCGUCUCAGCGAGCUUUA
## [4] 20 CAUCUCAGCAAUGGGCUGUG
## [5] 20 GAGGUCUCGGCAAAAGCCAC
## ... ... ...
## [37] 20 CGGCAGGACAGAAAGAGUAA
## [38] 20 ACGGGCGCGGAUGUCAUCAC
## [39] 20 AUCACCGCACUCUAAACGCA
## [40] 20 ACGCCAGCGCUGGAAAGUCG
## [41] 20 AAGUGCCAGGAAAACCAGUG
The above took the mRNA we made earlier from the cDNA and stored it as an RNAStringSet object. The input has to already be in RNA nucleotide form, with T (thymine) replaced by U (uracil). RNAStringSet() won’t convert the cDNA into mRNA for you.
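The step that produced the mRNA column is not shown in this post. As one possible sketch, base R’s chartr() can swap each cDNA base for its RNA complement, which reproduces the first two mRNA strings in the table. The helper name cdna_to_mrna is hypothetical.

```r
# Hypothetical helper: base-by-base complement of a cDNA string in RNA letters.
# A -> U, C -> G, G -> C, T -> A (the string is complemented, not reversed).
cdna_to_mrna <- function(cdna) chartr("ACGT", "UGCA", toupper(cdna))

cdna <- c("GAGTCGTTTAAAGGCTCTCT", "CACCGTCGTTTTTGTGACCG")
mrna <- cdna_to_mrna(cdna)
mrna
```

Because the string is complemented but not reversed, the result reads in the same left-to-right order as the barcode.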
AminoAcids <- translate(RNAs)
AminoAcids
## AAStringSet object of length 41:
## width seq
## [1] 6 LSKFPR
## [2] 6 VAAKTL
## [3] 6 GRLSEL
## [4] 6 HLSNGL
## [5] 6 EVSAKA
## ... ... ...
## [37] 6 RQDRKS
## [38] 6 TGADVI
## [39] 6 ITAL*T
## [40] 6 TPALES
## [41] 6 KCQENQ
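Each 20-nucleotide string gives a width of 6 because only the first 18 bases form complete codons, leaving 2 bases over. The base R sketch below walks through that arithmetic with the standard genetic code. It illustrates the idea only and is not how Biostrings implements translate(); the names codon_table and translate_rna are hypothetical.

```r
# Standard genetic code, third base varying fastest, in U, C, A, G order.
bases <- c("U", "C", "A", "G")
grid  <- expand.grid(third = bases, second = bases, first = bases,
                     stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
amino  <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]
codon_table <- setNames(amino, codons)

# Hypothetical helper: translate complete codons only, ignoring leftover bases.
translate_rna <- function(rna) {
  n_codons <- nchar(rna) %/% 3
  starts   <- seq(1, by = 3, length.out = n_codons)
  triplets <- substring(rna, starts, starts + 2)
  paste(codon_table[triplets], collapse = "")
}

nchar("CUCAGCAAAUUUCCGAGAGA") %/% 3   # number of complete codons
nchar("CUCAGCAAAUUUCCGAGAGA") %% 3    # leftover bases
translate_rna("CUCAGCAAAUUUCCGAGAGA")
translate_rna("AUCACCGCACUCUAAACGCA")
```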
AAs <- as.character(AminoAcids)
AAs
## [1] "LSKFPR" "VAAKTL" "GRLSEL" "HLSNGL" "EVSAKA" "ICTSAK" "*QPESP" "PQQSQQ"
## [9] "HGSQRA" "SQRAIL" "PSLSDF" "KSQPPS" "SQLNRL" "KAPWLS" "PQQKNR" "WGQQLS"
## [17] "RSVSNL" "GQLRKV" "*QQSCN" "DSCLSD" "H*GVSN" "SNWVM*" "RLSNTW" "AQQTGR"
## [25] "SPQARI" "*QQNRH" "ANLSYR" "GEWSAK" "T*NLSQ" "HSYVSQ" "LSKSKA" "DTRESQ"
## [33] "PAQKDL" "ISWQKL" "PGQKRR" "MRARTI" "RQDRKS" "TGADVI" "ITAL*T" "TPALES"
## [41] "KCQENQ"
The above are the amino acids translated from each mRNA string, converted to a character vector so they can be stored as a data frame column (the AminoAcids object is an AAStringSet, not a plain character vector). Let’s add this amino acid field, AAs, to a data frame keyed by ID_REF that keeps only the fields derived from the 20 base pair cDNA strings.
proteinStrings_DF <- barcodes[,c(1,33,34)]
str(proteinStrings_DF)
## 'data.frame': 41 obs. of 3 variables:
## $ ID_REF : chr "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCAGCAGAGTCGCTCGAAAT" "GTAGAGTCGTTACCCGACAC" ...
## $ mRNA : chr "CUCAGCAAAUUUCCGAGAGA" "GUGGCAGCAAAAACACUGGC" "GGUCGUCUCAGCGAGCUUUA" "CAUCUCAGCAAUGGGCUGUG" ...
## $ aminoAcids_simulated: chr "-Leu-A-Ala-AA-Phe-CC-Glu-Arg-" "#NAME?" "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "#NAME?" ...
proteinStrings_DF$cDNA <- barcodes$ID_REF
proteinStrings_DF$RNA_AminoAcids <- AAs
str(proteinStrings_DF)
## 'data.frame': 41 obs. of 5 variables:
## $ ID_REF : chr "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCAGCAGAGTCGCTCGAAAT" "GTAGAGTCGTTACCCGACAC" ...
## $ mRNA : chr "CUCAGCAAAUUUCCGAGAGA" "GUGGCAGCAAAAACACUGGC" "GGUCGUCUCAGCGAGCUUUA" "CAUCUCAGCAAUGGGCUGUG" ...
## $ aminoAcids_simulated: chr "-Leu-A-Ala-AA-Phe-CC-Glu-Arg-" "#NAME?" "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "#NAME?" ...
## $ cDNA : chr "GAGTCGTTTAAAGGCTCTCT" "CACCGTCGTTTTTGTGACCG" "CCAGCAGAGTCGCTCGAAAT" "GTAGAGTCGTTACCCGACAC" ...
## $ RNA_AminoAcids : chr "LSKFPR" "VAAKTL" "GRLSEL" "HLSNGL" ...
cDNA <- DNAStringSet(barcodes$ID_REF)
Pro <- translate(cDNA)
Pro
## AAStringSet object of length 41:
## width seq
## [1] 6 ESFKGS
## [2] 6 HRRFCD
## [3] 6 PAESLE
## [4] 6 VESLPD
## [5] 6 LQSRFR
## ... ... ...
## [37] 6 AVLSFS
## [38] 6 CPRLQ*
## [39] 6 *WREIC
## [40] 6 CGRDLS
## [41] 6 FTVLLV
Those are the one-letter abbreviations for the 20 standard amino acids, with * marking a stop codon, translated directly from the cDNA seen above. They are not the same as the amino acids made from the mRNA, because the mRNA is complementary to the cDNA and is a copy of the forward DNA strand.
Add in the protein amino acids to the proteinStrings_DF data frame.
proAAs <- as.character(Pro)
proAAs
## [1] "ESFKGS" "HRRFCD" "PAESLE" "VESLPD" "LQSRFR" "*TCSRF" "IVGLSG" "GVVRVV"
## [9] "VPSVAR" "RVAR*D" "GSESLK" "FRVGGR" "RVDLSN" "FRGTES" "GVVFLS" "TPVVNS"
## [17] "ASQSLE" "PVDSFH" "TVVSTL" "LSTESL" "VIPQSL" "RLTQYI" "AESLWT" "RVV*PA"
## [25] "RGVRS*" "IVVLAV" "RLESIA" "PLTSRF" "*ILESV" "VRIQSV" "ESFSFR" "LWSLSV"
## [33] "GRVFLE" "*STVFE" "GPVFSA" "NARAC*" "AVLSFS" "CPRLQ*" "*WREIC" "CGRDLS"
## [41] "FTVLLV"
proteinStrings_DF$cDNA_AminoAcids <- proAAs
head(proteinStrings_DF)
## ID_REF mRNA aminoAcids_simulated
## 1 GAGTCGTTTAAAGGCTCTCT CUCAGCAAAUUUCCGAGAGA -Leu-A-Ala-AA-Phe-CC-Glu-Arg-
## 2 CACCGTCGTTTTTGTGACCG GUGGCAGCAAAAACACUGGC #NAME?
## 3 CCAGCAGAGTCGCTCGAAAT GGUCGUCUCAGCGAGCUUUA G-Val-GU-Leu-A-Ala-AGC-Phe-A
## 4 GTAGAGTCGTTACCCGACAC CAUCUCAGCAAUGGGCUGUG #NAME?
## 5 CTCCAGAGCCGTTTTCGGTG GAGGUCUCGGCAAAAGCCAC #NAME?
## 6 TAGACATGCAGTCGTTTCGA AUCUGUACGUCAGCAAAGCU AU-Leu-Tyr-Val-A-Ala-AA-Ala-
## cDNA RNA_AminoAcids cDNA_AminoAcids
## 1 GAGTCGTTTAAAGGCTCTCT LSKFPR ESFKGS
## 2 CACCGTCGTTTTTGTGACCG VAAKTL HRRFCD
## 3 CCAGCAGAGTCGCTCGAAAT GRLSEL PAESLE
## 4 GTAGAGTCGTTACCCGACAC HLSNGL VESLPD
## 5 CTCCAGAGCCGTTTTCGGTG EVSAKA LQSRFR
## 6 TAGACATGCAGTCGTTTCGA ICTSAK *TCSRF
cDNA_mRNA_AAs_both <- proteinStrings_DF[,c(1,4,6,2,5)]
head(cDNA_mRNA_AAs_both)
## ID_REF cDNA cDNA_AminoAcids
## 1 GAGTCGTTTAAAGGCTCTCT GAGTCGTTTAAAGGCTCTCT ESFKGS
## 2 CACCGTCGTTTTTGTGACCG CACCGTCGTTTTTGTGACCG HRRFCD
## 3 CCAGCAGAGTCGCTCGAAAT CCAGCAGAGTCGCTCGAAAT PAESLE
## 4 GTAGAGTCGTTACCCGACAC GTAGAGTCGTTACCCGACAC VESLPD
## 5 CTCCAGAGCCGTTTTCGGTG CTCCAGAGCCGTTTTCGGTG LQSRFR
## 6 TAGACATGCAGTCGTTTCGA TAGACATGCAGTCGTTTCGA *TCSRF
## mRNA RNA_AminoAcids
## 1 CUCAGCAAAUUUCCGAGAGA LSKFPR
## 2 GUGGCAGCAAAAACACUGGC VAAKTL
## 3 GGUCGUCUCAGCGAGCUUUA GRLSEL
## 4 CAUCUCAGCAAUGGGCUGUG HLSNGL
## 5 GAGGUCUCGGCAAAAGCCAC EVSAKA
## 6 AUCUGUACGUCAGCAAAGCU ICTSAK
Let’s add these to the barcodes data frame. But first let’s remove the mRNA and aminoAcids_simulated features from the data frame before adding the new features.
Barcodes <- barcodes[,-c(33,34)]
colnames(Barcodes)
## [1] "ID_REF"
## [2] "geneSynonyms"
## [3] "genecardsSummary"
## [4] "NCBI_Summary"
## [5] "uniProtSummary"
## [6] "Ensembl_Name"
## [7] "control1.4362"
## [8] "control2.4363"
## [9] "control3.4364"
## [10] "MS1_r1_4370"
## [11] "MS1_r2_4371"
## [12] "MS1_r3_4372"
## [13] "MS1_r4_4373"
## [14] "MS1_r5_4374"
## [15] "MS2_r1_4375"
## [16] "MS2_r2_4376"
## [17] "MS2_r3_4377"
## [18] "MS2_r4_4378"
## [19] "MS2_r5_4379"
## [20] "commercial1o.commercial_r1_4365"
## [21] "commercial2o.commercial_r2_4366"
## [22] "commercial3o.commercial_r3_4367"
## [23] "commercial4o.commercial_r4_4368"
## [24] "commercial5o.commercial_r5_4369"
## [25] "controlMeans"
## [26] "MS1_Means"
## [27] "MS2_Means"
## [28] "commercial_Means"
## [29] "foldchange_MS1_vs_control"
## [30] "foldchange_MS2_vs_control"
## [31] "RealFC_commercial_control_means"
## [32] "inverse_FC_control_over_commercial_means"
Barcodes1 <- merge(Barcodes, cDNA_mRNA_AAs_both,by.x="ID_REF",by.y='ID_REF')
colnames(Barcodes1)
## [1] "ID_REF"
## [2] "geneSynonyms"
## [3] "genecardsSummary"
## [4] "NCBI_Summary"
## [5] "uniProtSummary"
## [6] "Ensembl_Name"
## [7] "control1.4362"
## [8] "control2.4363"
## [9] "control3.4364"
## [10] "MS1_r1_4370"
## [11] "MS1_r2_4371"
## [12] "MS1_r3_4372"
## [13] "MS1_r4_4373"
## [14] "MS1_r5_4374"
## [15] "MS2_r1_4375"
## [16] "MS2_r2_4376"
## [17] "MS2_r3_4377"
## [18] "MS2_r4_4378"
## [19] "MS2_r5_4379"
## [20] "commercial1o.commercial_r1_4365"
## [21] "commercial2o.commercial_r2_4366"
## [22] "commercial3o.commercial_r3_4367"
## [23] "commercial4o.commercial_r4_4368"
## [24] "commercial5o.commercial_r5_4369"
## [25] "controlMeans"
## [26] "MS1_Means"
## [27] "MS2_Means"
## [28] "commercial_Means"
## [29] "foldchange_MS1_vs_control"
## [30] "foldchange_MS2_vs_control"
## [31] "RealFC_commercial_control_means"
## [32] "inverse_FC_control_over_commercial_means"
## [33] "cDNA"
## [34] "cDNA_AminoAcids"
## [35] "mRNA"
## [36] "RNA_AminoAcids"
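One thing to keep in mind with merge() is that, by default, it sorts the result by the key column, so the row order of the merged table will generally differ from the original. A minimal sketch with toy data frames (left and right are made up for illustration, with shortened keys) shows this along with a quick check that no rows were lost.

```r
left  <- data.frame(ID_REF = c("GAGTCG", "CACCGT", "AACGCA"),
                    count  = c(125, 103, 1),
                    stringsAsFactors = FALSE)
right <- data.frame(ID_REF = c("GAGTCG", "CACCGT", "AACGCA"),
                    aa     = c("LSKFPR", "VAAKTL", "MRARTI"),
                    stringsAsFactors = FALSE)

merged <- merge(left, right, by = "ID_REF")
merged$ID_REF                 # sorted by the key, not in input order
nrow(merged) == nrow(left)    # every row found a match
```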
Let’s shorten the column names for the commercial line replicates.
colnames(Barcodes1)[20:24] <- c("commercial_r1_4365"
,"commercial_r2_4366"
,"commercial_r3_4367"
,"commercial_r4_4368"
,"commercial_r5_4369")
colnames(Barcodes1)
## [1] "ID_REF"
## [2] "geneSynonyms"
## [3] "genecardsSummary"
## [4] "NCBI_Summary"
## [5] "uniProtSummary"
## [6] "Ensembl_Name"
## [7] "control1.4362"
## [8] "control2.4363"
## [9] "control3.4364"
## [10] "MS1_r1_4370"
## [11] "MS1_r2_4371"
## [12] "MS1_r3_4372"
## [13] "MS1_r4_4373"
## [14] "MS1_r5_4374"
## [15] "MS2_r1_4375"
## [16] "MS2_r2_4376"
## [17] "MS2_r3_4377"
## [18] "MS2_r4_4378"
## [19] "MS2_r5_4379"
## [20] "commercial_r1_4365"
## [21] "commercial_r2_4366"
## [22] "commercial_r3_4367"
## [23] "commercial_r4_4368"
## [24] "commercial_r5_4369"
## [25] "controlMeans"
## [26] "MS1_Means"
## [27] "MS2_Means"
## [28] "commercial_Means"
## [29] "foldchange_MS1_vs_control"
## [30] "foldchange_MS2_vs_control"
## [31] "RealFC_commercial_control_means"
## [32] "inverse_FC_control_over_commercial_means"
## [33] "cDNA"
## [34] "cDNA_AminoAcids"
## [35] "mRNA"
## [36] "RNA_AminoAcids"
I also want to shorten the names of the means and fold change fields.
colnames(Barcodes1)[25] <- "control_Means"
colnames(Barcodes1)[28] <- "comml_Means"
colnames(Barcodes1)[29:30] <- gsub('foldchange','FC',colnames(Barcodes1)[29:30])
colnames(Barcodes1)[31] <- "FC_comml_vs_control"
colnames(Barcodes1)[32] <- "FC_inverse_control_vs_comml"
colnames(Barcodes1)
## [1] "ID_REF" "geneSynonyms"
## [3] "genecardsSummary" "NCBI_Summary"
## [5] "uniProtSummary" "Ensembl_Name"
## [7] "control1.4362" "control2.4363"
## [9] "control3.4364" "MS1_r1_4370"
## [11] "MS1_r2_4371" "MS1_r3_4372"
## [13] "MS1_r4_4373" "MS1_r5_4374"
## [15] "MS2_r1_4375" "MS2_r2_4376"
## [17] "MS2_r3_4377" "MS2_r4_4378"
## [19] "MS2_r5_4379" "commercial_r1_4365"
## [21] "commercial_r2_4366" "commercial_r3_4367"
## [23] "commercial_r4_4368" "commercial_r5_4369"
## [25] "control_Means" "MS1_Means"
## [27] "MS2_Means" "comml_Means"
## [29] "FC_MS1_vs_control" "FC_MS2_vs_control"
## [31] "FC_comml_vs_control" "FC_inverse_control_vs_comml"
## [33] "cDNA" "cDNA_AminoAcids"
## [35] "mRNA" "RNA_AminoAcids"
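Renaming by position works here, but it would rename the wrong column if the column order ever changed. As an alternative sketch, columns can be matched by their current names. The lookup vector new_names and the toy data frame are illustrative only.

```r
df <- data.frame(controlMeans = 1, commercial_Means = 142.4,
                 foldchange_MS1_vs_control = 161.4)

# Named vector: names are the old column names, values are the new ones.
new_names <- c(controlMeans     = "control_Means",
               commercial_Means = "comml_Means")

hits <- match(names(new_names), colnames(df))
colnames(df)[hits] <- unname(new_names)
colnames(df) <- sub("^foldchange", "FC", colnames(df))
colnames(df)
```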
The data frame looks good to me, so now we write it out to csv.
write.csv(Barcodes1,'MS_top41genes_AAs_both_cDNA_mRNA.csv', row.names=F)
==========================================
Let’s look at our table. We put in summaries and Ensembl names for the top 41 cDNA strings of 20 base pairs each, but didn’t paraphrase what the genes do. Let’s order the table we wrote out from Barcodes1 by the inverse fold change of the commercial MS line, column 32, named “FC_inverse_control_vs_comml”, in decreasing order.
First clear the environment and read back in the file we wrote out to csv with all features thus far, naming it cDNA instead of barcodes.
cDNA <- read.csv('MS_top41genes_AAs_both_cDNA_mRNA.csv',header=T, sep=',', na.strings=c('',' ','na','NA'))
str(cDNA)
## 'data.frame': 41 obs. of 36 variables:
## $ ID_REF : chr "AACGCACGGGCGTGTTAGTC" "ACCCCCGTCGTTAATTCGAC" "ACTGTCGTTTCAACGTTGAA" "AGAGGCGTTCGATCTTAGAC" ...
## $ geneSynonyms : chr "TSNARE1" "CACNA1E" "STX12" "PDE4DIPP5" ...
## $ genecardsSummary : chr "GeneCards Symbol: TSNARE1 \nT-SNARE Domain Containing 1 \nGeneCards Summary for TSNARE1 Gene\nTSNARE1 (T-SNARE "| __truncated__ "GeneCards Symbol: CACNA1E \nCalcium Voltage-Gated Channel Subunit Alpha1 E\nGeneCards Summary for CACNA1E Gene\"| __truncated__ "GeneCards Symbol: STX12 \nSyntaxin 12\nGeneCards Summary for STX12 Gene\nSTX12 (Syntaxin 12) is a Protein Codin"| __truncated__ "GeneCards Symbol: PDE4DIPP5 \nPDE4DIP Pseudogene 5\nGeneCards Summary for PDE4DIPP5 Gene\nPDE4DIPP5 (PDE4DIP Ps"| __truncated__ ...
## $ NCBI_Summary : chr "NCBI Gene Summary for TSNARE1 Gene \nPredicted to enable SNAP receptor activity and SNARE binding activity. Pre"| __truncated__ "NCBI Gene Summary for CACNA1E Gene \nVoltage-dependent calcium channels are multisubunit complexes consisting o"| __truncated__ "NCBI Gene Summary for STX12 Gene \nPredicted to enable SNAP receptor activity and SNARE binding activity. Invol"| __truncated__ "No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene "| __truncated__ ...
## $ uniProtSummary : chr "No data available for CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , Rfam cl"| __truncated__ "UniProtKB/Swiss-Prot Summary for CACNA1E Gene\nVoltage-sensitive calcium channels (VSCC) mediate the entry of c"| __truncated__ "UniProtKB/Swiss-Prot Summary for STX12 Gene\nSNARE promoting fusion of transport vesicles with target membranes"| __truncated__ "No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene "| __truncated__ ...
## $ Ensembl_Name : chr "ENSG00000171045" "ENSG00000198216" "ENSG00000117758" "ENSG00000275064" ...
## $ control1.4362 : int 25 1 1 3 2 2 2 1 1 1 ...
## $ control2.4363 : int 46 1 1 4 2 1 1 1 1 1 ...
## $ control3.4364 : int 32 1 1 1 2 1 2 1 1 1 ...
## $ MS1_r1_4370 : int 1 25 49 63 70 94 160 17 78 103 ...
## $ MS1_r2_4371 : int 2 104 31 180 172 101 66 81 34 63 ...
## $ MS1_r3_4372 : int 1 34 35 107 68 49 75 70 50 85 ...
## $ MS1_r4_4373 : int 3 75 72 203 224 152 147 102 72 124 ...
## $ MS1_r5_4374 : int 8 70 71 179 220 98 123 103 70 142 ...
## $ MS2_r1_4375 : int 6 52 34 118 107 74 88 66 41 55 ...
## $ MS2_r2_4376 : int 6 60 51 94 126 85 75 110 44 61 ...
## $ MS2_r3_4377 : int 3 68 21 122 120 88 81 89 53 52 ...
## $ MS2_r4_4378 : int 3 75 67 112 132 95 75 74 49 75 ...
## $ MS2_r5_4379 : int 3 66 47 123 119 81 86 48 50 64 ...
## $ commercial_r1_4365 : int 5 65 56 143 108 70 62 68 69 96 ...
## $ commercial_r2_4366 : int 4 50 52 133 117 98 93 46 23 65 ...
## $ commercial_r3_4367 : int 3 79 65 153 181 103 102 85 66 123 ...
## $ commercial_r4_4368 : int 2 42 24 107 126 77 79 73 47 64 ...
## $ commercial_r5_4369 : int 4 46 67 113 74 70 85 56 35 107 ...
## $ control_Means : num 34.33 1 1 2.67 2 ...
## $ MS1_Means : num 3 61.6 51.6 146.4 150.8 ...
## $ MS2_Means : num 4.2 64.2 44 113.8 120.8 ...
## $ comml_Means : num 3.6 56.4 52.8 129.8 121.2 ...
## $ FC_MS1_vs_control : num 0.0874 61.6 51.6 54.9 75.4 ...
## $ FC_MS2_vs_control : num 0.122 64.2 44 42.675 60.4 ...
## $ FC_comml_vs_control : num 0.105 56.4 52.8 48.675 60.6 ...
## $ FC_inverse_control_vs_comml: num 9.537 0.0177 0.0189 0.0205 0.0165 ...
## $ cDNA : chr "AACGCACGGGCGTGTTAGTC" "ACCCCCGTCGTTAATTCGAC" "ACTGTCGTTTCAACGTTGAA" "AGAGGCGTTCGATCTTAGAC" ...
## $ cDNA_AminoAcids : chr "NARAC*" "TPVVNS" "TVVSTL" "RGVRS*" ...
## $ mRNA : chr "UUGCGUGCCCGCACAAUCAG" "UGGGGGCAGCAAUUAAGCUG" "UGACAGCAAAGUUGCAACUU" "UCUCCGCAAGCUAGAAUCUG" ...
## $ RNA_AminoAcids : chr "MRARTI" "WGQQLS" "*QQSCN" "SPQARI" ...
cDNA_ordered <- cDNA[order(cDNA$FC_inverse_control_vs_comml, decreasing=T),]
head(cDNA_ordered[,c(2:6,29:36)])
## geneSynonyms
## 39 DDAH1
## 38 KLHL29
## 35 RNU6-280P
## 37 RPL31P30
## 22 MIR4432HG
## 1 TSNARE1
## genecardsSummary
## 39 GeneCards Symbol: DDAH1\nDimethylarginine Dimethylaminohydrolase 1\nGeneCards Summary for DDAH1 Gene\nDDAH1 (Dimethylarginine Dimethylaminohydrolase 1) is a Protein Coding gene. Diseases associated with DDAH1 include Hyperhomocysteinemia and Pulmonary Hypertension. Among its related pathways are Metabolism of nitric oxide: NOS3 activation and regulation and Metabolism. Gene Ontology (GO) annotations related to this gene include amino acid binding and dimethylargininase activity. An important paralog of this gene is DDAH2.
## 38 GeneCards Symbol: KLHL29 \nKelch Like Family Member 29\nGeneCards Summary for KLHL29 Gene\nKLHL29 (Kelch Like Family Member 29) is a Protein Coding gene. Diseases associated with KLHL29 include Bardet-Biedl Syndrome 7. An important paralog of this gene is KLHL24.
## 35 GeneCards Symbol: RNU6-280P\nRNA, U6 Small Nuclear 280, Pseudogene\nGeneCards Summary for RNU6-280P Gene\nRNU6-280P (RNA, U6 Small Nuclear 280, Pseudogene) is a Pseudogene.
## 37 GeneCards Symbol: RPL31P30\nRibosomal Protein L31 Pseudogene 30\nGeneCards Summary for RPL31P30 Gene\nRPL31P30 (Ribosomal Protein L31 Pseudogene 30) is a Pseudogene.
## 22 GeneCards Symbol: MIR4432HG\nMIR4432 Host Gene\nGeneCards Summary for MIR4432HG Gene\nMIR4432HG (MIR4432 Host Gene) is an RNA Gene, and is affiliated with the lncRNA class.\n
## 1 GeneCards Symbol: TSNARE1 \nT-SNARE Domain Containing 1 \nGeneCards Summary for TSNARE1 Gene\nTSNARE1 (T-SNARE Domain Containing 1) is a Protein Coding gene. Diseases associated with TSNARE1 include Lodder-Merla Syndrome, Type 1, With Impaired Intellectual Development And Cardiac Arrhythmia and Pitt-Hopkins-Like Syndrome 1. Gene Ontology (GO) annotations related to this gene include SNARE binding and SNAP receptor activity. An important paralog of this gene is STX12.\n
## NCBI_Summary
## 39 NCBI Gene Summary for DDAH1 Gene \nThis gene belongs to the dimethylarginine dimethylaminohydrolase (DDAH) gene family. The encoded enzyme plays a role in nitric oxide generation by regulating cellular concentrations of methylarginines, which in turn inhibit nitric oxide synthase activity. [provided by RefSeq, Jul 2008]
## 38 NCBI Gene Summary for KLHL29 Gene \nPredicted to enable ubiquitin-like ligase-substrate adaptor activity. Predicted to be involved in proteasome-mediated ubiquitin-dependent protein catabolic process. Predicted to be part of Cul3-RING ubiquitin ligase complex. Predicted to be active in cytoplasm. [provided by Alliance of Genome Resources, Jul 2025]
## 35 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for RNU6-280P Gene
## 37 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for RPL31P30 Gene
## 22 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for MIR4432HG Gene
## 1 NCBI Gene Summary for TSNARE1 Gene \nPredicted to enable SNAP receptor activity and SNARE binding activity. Predicted to be involved in intracellular protein transport; vesicle docking; and vesicle fusion. Predicted to be located in membrane. Predicted to be part of SNARE complex. Predicted to be active in synaptic vesicle. [provided by Alliance of Genome Resources, Jul 2025]
## uniProtSummary
## 39 UniProtKB/Swiss-Prot Summary for DDAH1 Gene\nHydrolyzes N(G),N(G)-dimethyl-L-arginine (ADMA) and N(G)-monomethyl-L-arginine (MMA) which act as inhibitors of NOS. Has therefore a role in the regulation of nitric oxide generation. ( DDAH1_HUMAN,O94760 )
## 38 No data available for CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , Rfam classification and piRNA Summary for KLHL29 Gene
## 35 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for RNU6-280P Gene
## 37 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for RPL31P30 Gene
## 22 No data available for NCBI Gene Summary , CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , PharmGKB Summary , Rfam classification and piRNA Summary for MIR4432HG Gene
## 1 No data available for CIViC Summary , UniProtKB/Swiss-Prot Summary , Tocris Summary , Gene Wiki entry , Rfam classification and piRNA Summary for TSNARE1 Gene
## Ensembl_Name FC_MS1_vs_control FC_MS2_vs_control FC_comml_vs_control
## 39 ENSG00000153904 0.06075949 0.06835443 0.06835443
## 38 ENSG00000119771 0.08219178 0.14794520 0.08219178
## 35 ENSG00000201015 0.08250000 0.11250000 0.09000000
## 37 ENSG00000230702 0.10843374 0.09397590 0.09397590
## 22 ENSG00000228590 0.08955224 0.10746269 0.09850746
## 1 ENSG00000171045 0.08737864 0.12233010 0.10485437
## FC_inverse_control_vs_comml cDNA cDNA_AminoAcids
## 39 14.629630 TTCACGGTCCTTTTGGTCAC FTVLLV
## 38 12.166667 TGCGGTCGCGACCTTTCAGC CGRDLS
## 35 11.111111 TAGTGGCGTGAGATTTGCGT *WREIC
## 37 10.641026 TGCCCGCGCCTACAGTAGTG CPRLQ*
## 22 10.151515 GCCGTCCTGTCTTTCTCATT AVLSFS
## 1 9.537037 AACGCACGGGCGTGTTAGTC NARAC*
## mRNA RNA_AminoAcids
## 39 AAGUGCCAGGAAAACCAGUG KCQENQ
## 38 ACGCCAGCGCUGGAAAGUCG TPALES
## 35 AUCACCGCACUCUAAACGCA ITAL*T
## 37 ACGGGCGCGGAUGUCAUCAC TGADVI
## 22 CGGCAGGACAGAAAGAGUAA RQDRKS
## 1 UUGCGUGCCCGCACAAUCAG MRARTI
The first 10 genes listed have an inverse fold change (control means over commercial means) greater than 1, while their fold changes of the MS1 and MS2 replicate means over the control means are less than 1. This means these top 10 genes are silenced (lower than control) in multiple sclerosis patients, while the remaining 31 of the 41 genes are enhanced in multiple sclerosis patients.
We can look at the top genes and their summary information to get an idea of why they are silenced, and then compare the amino acids in the mRNA-derived strings to see whether any amino acid stands out as deficient or overabundant. The idea is to see if diet could help supply amino acids that are lacking or correct imbalances from overabundance. But nothing really pops out.
Let’s just pull up an online search of the one-letter abbreviations for the 20 standard amino acids, each of which is encoded by a codon of 3 ribonucleotides.
The one-letter abbreviations of amino acids are as follows:
* A - Alanine
* C - Cysteine
* D - Aspartic acid
* E - Glutamic acid
* F - Phenylalanine
* G - Glycine
* H - Histidine
* I - Isoleucine
* K - Lysine
* L - Leucine
* M - Methionine
* N - Asparagine
* P - Proline
* Q - Glutamine
* R - Arginine
* S - Serine
* T - Threonine
* V - Valine
* W - Tryptophan
* Y - Tyrosine
The codon chart showing which three-letter RNA codons, including the alternates, form each amino acid is below.
(Figure: Amino Acid Chart)
Let’s look at the 10 silenced genes in multiple sclerosis patients by their mRNA amino acids.
silenced10genes <- cDNA_ordered[1:10,]
silenced10genes[,c(2,36)]
## geneSynonyms RNA_AminoAcids
## 39 DDAH1 KCQENQ
## 38 KLHL29 TPALES
## 35 RNU6-280P ITAL*T
## 37 RPL31P30 TGADVI
## 22 MIR4432HG RQDRKS
## 1 TSNARE1 MRARTI
## 27 ACTN4 PGQKRR
## 34 TFPI ISWQKL
## 26 WDR35 PAQKDL
## 17 LZIC DTRESQ
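To check whether any residue stands out, the letters in these ten strings can be tallied. This is a base R sketch using the RNA_AminoAcids values printed above; the variable names are illustrative.

```r
silenced_aa <- c("KCQENQ", "TPALES", "ITAL*T", "TGADVI", "RQDRKS",
                 "MRARTI", "PGQKRR", "ISWQKL", "PAQKDL", "DTRESQ")

# Split every string into single letters and count each one.
residues    <- unlist(strsplit(silenced_aa, ""))
composition <- sort(table(residues), decreasing = TRUE)
composition
```

The * in the tally is the stop codon symbol rather than an amino acid.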
Now let’s get the other 31 genes that are enhanced in multiple sclerosis patients, rows 11 through 41 of the ordered table, and print the first 21 of them.
enhanced31genes <- cDNA_ordered[11:41,]
head(enhanced31genes[,c(2,36)], 21)
## geneSynonyms RNA_AminoAcids
## 19 AQP12B LSKSKA
## 31 CAMKMT HSYVSQ
## 36 ATP5MC3 T*NLSQ
## 13 SCHLAP1 GEWSAK
## 15 HDAC4 ANLSYR
## 9 ST8SIA4 *QQNRH
## 4 PDE4DIPP5 SPQARI
## 14 SLIRPP1 AQQTGR
## 21 KCNA6-AS1 RLSNTW
## 7 PIK3C2A SNWVM*
## 32 CDH8 H*GVSN
## 16 CRLF3P3 DSCLSD
## 3 STX12 *QQSCN
## 12 NDUFB5P1 GQLRKV
## 23 TMEM200A RSVSNL
## 2 CACNA1E WGQQLS
## 24 ERBB4 PQQKNR
## 41 CPA6 KAPWLS
## 5 PLSCR5 SQLNRL
## 25 ESRP1 PSLSDF
## 40 RPL7P61 KSQPPS
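Selecting the two groups by row position relies on knowing in advance how many genes sit above 1. A sketch of the same split done by threshold avoids hard-coding the row ranges. It uses four gene and inverse fold change pairs from the listings above in a toy data frame (toy is a made-up name).

```r
toy <- data.frame(
  geneSynonyms = c("TSNARE1", "CACNA1E", "STX12", "PDE4DIPP5"),
  FC_inverse_control_vs_comml = c(9.537, 0.0177, 0.0189, 0.0205),
  stringsAsFactors = FALSE
)

# Inverse fold change above 1 means control counts exceed commercial counts.
silenced <- toy[toy$FC_inverse_control_vs_comml > 1, ]
enhanced <- toy[toy$FC_inverse_control_vs_comml < 1, ]

silenced$geneSynonyms
enhanced$geneSynonyms
nrow(silenced) + nrow(enhanced) == nrow(toy)
```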
Branched-chain amino acids (BCAAs), which are leucine, isoleucine, and valine (VIL), are protein building blocks for skeletal muscle. We can see how many of the silenced or enhanced strings contain these BCAAs. The essential amino acids, remembered by the mnemonic PVT TIM HALL, are the BCAAs of VIL plus 6 more: histidine, phenylalanine, tryptophan, methionine, lysine, and threonine. (The A in the mnemonic stands for arginine, which is only conditionally essential.) These are essential to human metabolism and need to be consumed in the diet because our bodies cannot make them. All other amino acids are made in the body from what we have and from recycled products of cellular processes. This was part of nutrition and human biochemistry coursework.
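As a sketch of how the BCAA comparison could be run, the number of V, I, and L letters in each string can be counted with a regular expression. The example uses the ten silenced strings listed earlier; count_residues is a hypothetical helper, not part of Biostrings.

```r
# Hypothetical helper: count how many letters in each string belong to a set.
count_residues <- function(x, letters_set) {
  pattern <- paste0("[", letters_set, "]")
  lengths(regmatches(x, gregexpr(pattern, x)))
}

silenced_aa <- c("KCQENQ", "TPALES", "ITAL*T", "TGADVI", "RQDRKS",
                 "MRARTI", "PGQKRR", "ISWQKL", "PAQKDL", "DTRESQ")

bcaa_counts <- count_residues(silenced_aa, "VIL")
bcaa_counts            # BCAA letters per string
sum(bcaa_counts > 0)   # strings with at least one BCAA
```

The same helper could be applied to the enhanced strings to compare the two groups.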
In neurology and scientific basis of chiropractic, we learned about the amino acids that act as neurotransmitters. They are essential for optimal central and peripheral nervous system activity, but an overabundance of some can be toxic and lead to chronic pain and inflammation in the body.
Multiple sclerosis patients lose optimal CNS and PNS function as the disease progresses, due to demyelination of the nerve tissue in the brain or the spinal cord. In the central nervous system, oligodendrocytes help maintain proper nerve activity by keeping neurons myelinated, while in the peripheral nervous system that role belongs to Schwann cells. We had a mnemonic called COPS to remember which cell type myelinates which nervous tissue: CNS is oligodendrocytes, and PNS is Schwann cells. These are glial cells rather than macrophages or white blood cells. They are responsible for keeping the type A and type B nerve fibers myelinated. The type C fibers are unmyelinated and thinner, and are seen in chronic pain patterns that reach the cortex, as in CRPS (Complex Regional Pain Syndrome), a newer diagnosis that used to be the gaslighting diagnosis for women who complained of pain all the time but had no physical or clinical signs of pain other than redness and swelling when lightly touched in a sensory test.
Getting back to the amino acids that scientific basis of chiropractic covered as neurotransmitters, I had to look them up. Most of them aren’t amino acids found in proteins. The list is dopamine, adrenaline/epinephrine, noradrenaline/norepinephrine, acetylcholine, endorphins, gamma-aminobutyric acid (GABA), and glutamate.
Glutamate is an amino acid or part of the amino acids glutamine or glutamic acid. It is needed for memory and brain development, found in the brain but too much is toxic and it can damage brain neurons and is found to be in excess after a stroke which can lead to brain neuron cell death from excess glutamate. The one letter amino acid for glutamic acid is E and the one letter amino acid for glutamine is Q. An internet search said that glutamate is essentially the same thing as glutamic acid but not the acid form, while glutamine is a completely different amino acid related to gut metabolism and glutamate is related to brain neurotransmitter functions. So we only look at the E for glutamic acid in our amino acid strings from mRNA.
Let’s see how many of the 10 silenced genes have E for glutamic acid in their mRNA amino acid strings. Then we will see how many of the 31 enhanced genes in multiple sclerosis patients have E.
E_silenced <- grep("E",silenced10genes$RNA_AminoAcids)
EgenesSilenced <- silenced10genes[E_silenced,]
EgenesSilenced[,c(2,3,36)]
## geneSynonyms
## 39 DDAH1
## 38 KLHL29
## 17 LZIC
## genecardsSummary
## 39 GeneCards Symbol: DDAH1\nDimethylarginine Dimethylaminohydrolase 1\nGeneCards Summary for DDAH1 Gene\nDDAH1 (Dimethylarginine Dimethylaminohydrolase 1) is a Protein Coding gene. Diseases associated with DDAH1 include Hyperhomocysteinemia and Pulmonary Hypertension. Among its related pathways are Metabolism of nitric oxide: NOS3 activation and regulation and Metabolism. Gene Ontology (GO) annotations related to this gene include amino acid binding and dimethylargininase activity. An important paralog of this gene is DDAH2.
## 38 GeneCards Symbol: KLHL29 \nKelch Like Family Member 29\nGeneCards Summary for KLHL29 Gene\nKLHL29 (Kelch Like Family Member 29) is a Protein Coding gene. Diseases associated with KLHL29 include Bardet-Biedl Syndrome 7. An important paralog of this gene is KLHL24.
## 17 GeneCards Symbol: LZIC\nLeucine Zipper And CTNNBIP1 Domain Containing\nGeneCards Summary for LZIC Gene\nLZIC (Leucine Zipper And CTNNBIP1 Domain Containing) is a Protein Coding gene. Diseases associated with LZIC include Schnyder Corneal Dystrophy. Gene Ontology (GO) annotations related to this gene include beta-catenin binding. An important paralog of this gene is CTNNBIP1.
## RNA_AminoAcids
## 39 KCQENQ
## 38 TPALES
## 17 DTRESQ
These 3 genes, DDAH1, KLHL29, and LZIC, are silenced in people with multiple sclerosis and have glutamic acid in their fragments.
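As a small cross-check of the grep above, here is a minimal sketch that rebuilds the count from the ten silenced strings printed in this post. The vector `silenced_aa` and the other variable names are my own for illustration and are not part of the original data frame; `fixed = TRUE` is used so the letter is matched literally rather than as a regular expression.

```r
# The ten silenced fragments, copied by hand from the output printed in this post.
silenced_aa <- c("KCQENQ", "TPALES", "ITAL*T", "TGADVI", "RQDRKS",
                 "MRARTI", "PGQKRR", "ISWQKL", "PAQKDL", "DTRESQ")

# grepl() gives one TRUE/FALSE per fragment, so sum() counts the genes containing E.
has_E <- grepl("E", silenced_aa, fixed = TRUE)
n_genes_with_E <- sum(has_E)

# Count every E in each fragment, not just whether one is present.
e_per_fragment <- lengths(regmatches(silenced_aa,
                                     gregexpr("E", silenced_aa, fixed = TRUE)))

n_genes_with_E
which(has_E)
e_per_fragment
```

Presence per gene and total occurrences can differ when a letter repeats inside one fragment, so it helps to look at both.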
Now let’s see which enhanced genes in multiple sclerosis patients have glutamic acid in the mRNA amino acid string made from the 20 base pair cDNA.
E_enhanced <- grep('E',enhanced31genes$RNA_AminoAcids)
glutamicAcidEnhancedGenes <- enhanced31genes[E_enhanced,]
glutamicAcidEnhancedGenes[,c(2,3,36)]
## geneSynonyms
## 13 SCHLAP1
## genecardsSummary
## 13 GeneCards Symbol: SCHLAP1\nSWI/SNF Complex Antagonist Associated With Prostate Cancer 1\nGeneCards Summary for SCHLAP1 Gene\nSCHLAP1 (SWI/SNF Complex Antagonist Associated With Prostate Cancer 1) is an RNA Gene, and is affiliated with the lncRNA class. Diseases associated with SCHLAP1 include Prostate Cancer and Glioblastoma.
## RNA_AminoAcids
## 13 GEWSAK
Not many of the top 41 genes actually have glutamate in their fragment strands. Glutamate is a memory neurotransmitter that in excess is capable of destroying neurons but in adequate amounts repairs them and develops the brain. We only saw glutamate in 3 of the 10 silenced genes in multiple sclerosis and in 1 of the 31 enhanced genes.
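Let’s see if the essential amino acids of PVT TIM HALL are present in our silenced genes just by an eye scan of the 10 silenced genes, and then look at the enhanced genes. Here are the one-letter amino acid abbreviations again, divided into essential and non-essential amino acids the way the code below groups them. Strictly, the A in PVT TIM HALL is arginine (R), which is only conditionally essential, and alanine (A) is non-essential.
Essential: F-phenylalanine, V-valine, W-tryptophan, T-threonine, I-isoleucine, M-methionine, H-histidine, A-alanine, K-lysine, L-leucine
Non-essential: C-cysteine, D-aspartic acid, E-glutamic acid, G-glycine, N-asparagine, P-proline, Q-glutamine, R-arginine, S-serine, Y-tyrosine
Let’s glance at the silenced genes to search for our essential amino acids.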
silenced10genes$RNA_AminoAcids
## [1] "KCQENQ" "TPALES" "ITAL*T" "TGADVI" "RQDRKS" "MRARTI" "PGQKRR" "ISWQKL"
## [9] "PAQKDL" "DTRESQ"
Essential amino acids are present in all of the top silenced genes in MS patients from looking above: K-lysine in 5 genes, A-alanine in 5 genes, T-threonine in 5 genes, I-isoleucine in 4 genes, L-leucine in 4 genes, one V-valine, one W-tryptophan, and one M-methionine, but no F-phenylalanine or H-histidine. These are the essential amino acids that a person must consume in the diet to obtain (alanine is counted with this group here, although strictly the body can make it).
The non-essential amino acids are made by the body, so they are always available without consuming foods to obtain them. From the list of fragments above, C, E, N, S, G, R, Q, D, and P are seen in the 10 silenced genes, but not Y for tyrosine.
So none of the top 10 silenced genes in Multiple Sclerosis patients have tyrosine, which is abundant in the body but made from phenylalanine. Phenylalanine has to be consumed and is also not seen in our top 10 silenced genes, nor is histidine. An internet search said tyrosine is needed for memory and thyroid health, and that histidine is essential in infants but semi-essential in adults, who do not need to consume as much of it from the diet. Tyrosine is also needed for mood stability and mental cognition. Histidine is needed for building proteins and RBCs as well as many metabolic processes.
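The eye scan above can also be checked in code. This is a minimal sketch that assumes the ten strings are copied by hand from the printed output; `silenced_aa`, `essential`, and `non_essential` are illustrative names, and the two groups follow the grouping used in this post.

```r
# The ten silenced fragments, copied by hand from the output printed in this post.
silenced_aa <- c("KCQENQ", "TPALES", "ITAL*T", "TGADVI", "RQDRKS",
                 "MRARTI", "PGQKRR", "ISWQKL", "PAQKDL", "DTRESQ")

# Split every fragment into single characters and keep each letter once.
letters_seen <- sort(unique(unlist(strsplit(silenced_aa, ""))))

# Groups as used in this post.
essential     <- c("F", "V", "W", "T", "I", "M", "H", "A", "K", "L")
non_essential <- c("C", "D", "E", "G", "N", "P", "Q", "R", "S", "Y")

# Letters from each group that never appear in any silenced fragment.
missing_essential     <- setdiff(essential, letters_seen)
missing_non_essential <- setdiff(non_essential, letters_seen)

# "*" marks a stop codon in the translated string, so it is not an amino acid.
fragments_with_stop <- which(grepl("*", silenced_aa, fixed = TRUE))

missing_essential
missing_non_essential
fragments_with_stop
```

The split also shows that the third fragment, "ITAL*T", contains a stop codon, which is worth keeping in mind when reading the counts.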
Let’s put this into a couple of data frames for inference later.
F_silenced <- grep("F", silenced10genes$RNA_AminoAcids)
V_silenced <- grep("V", silenced10genes$RNA_AminoAcids)
W_silenced <- grep("W", silenced10genes$RNA_AminoAcids)
T_silenced <- grep("T", silenced10genes$RNA_AminoAcids)
I_silenced <- grep("I", silenced10genes$RNA_AminoAcids)
M_silenced <- grep("M", silenced10genes$RNA_AminoAcids)
H_silenced <- grep("H", silenced10genes$RNA_AminoAcids)
A_silenced <- grep("A", silenced10genes$RNA_AminoAcids)
K_silenced <- grep("K", silenced10genes$RNA_AminoAcids)
L_silenced <- grep("L", silenced10genes$RNA_AminoAcids)
countsEssentialsilenced <- data.frame(AminoAcid=c("F","V","W","T","I","M","H","A","K","L"),genesPresent=c(length(F_silenced),length(V_silenced),length(W_silenced),length(T_silenced),length(I_silenced),length(M_silenced),length(H_silenced),length(A_silenced),length(K_silenced),length(L_silenced)))
countsES <- countsEssentialsilenced[order(countsEssentialsilenced$genesPresent,decreasing=T),]
countsES
## AminoAcid genesPresent
## 4 T 5
## 8 A 5
## 9 K 5
## 5 I 4
## 10 L 4
## 2 V 1
## 3 W 1
## 6 M 1
## 1 F 0
## 7 H 0
The table above ranks the essential amino acids by how many of our 10 silenced genes they appear in. These 10 genes are part of the top 41 most changed genes in expression profiling of MS patients compared to healthy control patients.
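The ten separate grep calls can be folded into one small helper. This is only a sketch of an alternative, not the code that produced the table above; `count_genes_with` and `silenced_aa` are hypothetical names, and the strings are copied by hand from the printed output.

```r
# The ten silenced fragments, copied by hand from the output printed in this post.
silenced_aa <- c("KCQENQ", "TPALES", "ITAL*T", "TGADVI", "RQDRKS",
                 "MRARTI", "PGQKRR", "ISWQKL", "PAQKDL", "DTRESQ")

# Hypothetical helper: for each letter, count the fragments that contain it,
# then rank the letters from most to fewest genes.
count_genes_with <- function(aa_letters, fragments) {
  counts <- vapply(aa_letters,
                   function(aa) sum(grepl(aa, fragments, fixed = TRUE)),
                   integer(1))
  out <- data.frame(AminoAcid = aa_letters,
                    genesPresent = unname(counts),
                    stringsAsFactors = FALSE)
  out[order(out$genesPresent, decreasing = TRUE), ]
}

essential <- c("F", "V", "W", "T", "I", "M", "H", "A", "K", "L")
countsES_sketch <- count_genes_with(essential, silenced_aa)
countsES_sketch
```

The same helper could be called with the non-essential letters or with the enhanced fragments, which avoids keeping forty separate index vectors around.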
C_Silenced <- grep("C", silenced10genes$RNA_AminoAcids)
D_Silenced <- grep("D", silenced10genes$RNA_AminoAcids)
E_Silenced <- grep("E", silenced10genes$RNA_AminoAcids)
G_Silenced <- grep("G", silenced10genes$RNA_AminoAcids)
N_Silenced <- grep("N", silenced10genes$RNA_AminoAcids)
P_Silenced <- grep("P", silenced10genes$RNA_AminoAcids)
Q_Silenced <- grep("Q", silenced10genes$RNA_AminoAcids)
R_Silenced <- grep("R", silenced10genes$RNA_AminoAcids)
S_Silenced <- grep("S", silenced10genes$RNA_AminoAcids)
Y_Silenced <- grep("Y", silenced10genes$RNA_AminoAcids)
countsNonEssentialSilenced <- data.frame(AminoAcid=c("C","D","E","G","N","P","Q","R","S","Y"),genesPresent=c(length(C_Silenced),length(D_Silenced),length(E_Silenced),length(G_Silenced),length(N_Silenced),length(P_Silenced),length(Q_Silenced),length(R_Silenced),length(S_Silenced),length(Y_Silenced)))
countsNS <- countsNonEssentialSilenced[order(countsNonEssentialSilenced$genesPresent,decreasing=T),]
countsNS
## AminoAcid genesPresent
## 7 Q 6
## 2 D 4
## 8 R 4
## 9 S 4
## 3 E 3
## 6 P 3
## 4 G 2
## 1 C 1
## 5 N 1
## 10 Y 0
The table above ranks the non-essential amino acids, the ones already in our bodies without consuming them in our diets, from most to least abundant by the number of the 10 silenced genes each one is found in.
Let’s do the same for the enhanced genes now. Let’s grep the number of our 31 enhanced genes that each essential amino acid is in.
F_enhanced <- grep("F", enhanced31genes$RNA_AminoAcids)
V_enhanced <- grep("V", enhanced31genes$RNA_AminoAcids)
W_enhanced <- grep("W", enhanced31genes$RNA_AminoAcids)
T_enhanced <- grep("T", enhanced31genes$RNA_AminoAcids)
I_enhanced <- grep("I", enhanced31genes$RNA_AminoAcids)
M_enhanced <- grep("M", enhanced31genes$RNA_AminoAcids)
H_enhanced <- grep("H", enhanced31genes$RNA_AminoAcids)
A_enhanced <- grep("A", enhanced31genes$RNA_AminoAcids)
K_enhanced <- grep("K", enhanced31genes$RNA_AminoAcids)
L_enhanced <- grep("L", enhanced31genes$RNA_AminoAcids)
countsEssentialEnhanced <- data.frame(AminoAcid=c("F","V","W","T","I","M","H","A","K","L"),genesPresent=c(length(F_enhanced),length(V_enhanced),length(W_enhanced),length(T_enhanced),length(I_enhanced),length(M_enhanced),length(H_enhanced),length(A_enhanced),length(K_enhanced),length(L_enhanced)))
countsEE <- countsEssentialEnhanced[order(countsEssentialEnhanced$genesPresent,decreasing=T),]
countsEE
## AminoAcid genesPresent
## 10 L 11
## 8 A 6
## 9 K 6
## 2 V 5
## 3 W 5
## 4 T 3
## 7 H 3
## 1 F 1
## 5 I 1
## 6 M 1
We can see in the 31 enhanced top genes in MS patients that Leucine is the most present essential amino acid, found in 11 genes. Then there is a tie between Alanine and Lysine in 6 genes each, a tie between Valine and Tryptophan in 5 genes each, and a tie between Threonine and Histidine in 3 genes each. Phenylalanine, Isoleucine, and Methionine are each in only 1 of the enhanced genes.
Now let’s get a count of each of the non-essential amino acids by abundance in the 31 enhanced top genes in Multiple Sclerosis patients.
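Because the silenced group has 10 genes and the enhanced group has 31, the raw gene counts in the two tables are not on the same scale. A minimal sketch of putting them side by side as fractions follows; the counts are copied by hand from the two essential amino acid tables printed in this post, and the variable names are my own.

```r
# Counts copied from the two essential amino acid tables printed in this post.
aa <- c("F", "V", "W", "T", "I", "M", "H", "A", "K", "L")
silenced_counts <- c(0, 1, 1, 5, 4, 1, 0, 5, 5, 4)    # out of 10 silenced genes
enhanced_counts <- c(1, 5, 5, 3, 1, 1, 3, 6, 6, 11)   # out of 31 enhanced genes

# Divide by the group size so both columns read as "share of genes in the group".
essential_fractions <- data.frame(
  AminoAcid = aa,
  silencedFraction = round(silenced_counts / 10, 2),
  enhancedFraction = round(enhanced_counts / 31, 2),
  stringsAsFactors = FALSE
)
essential_fractions
```

For example, leucine is in 4 of 10 silenced genes (0.40) and 11 of 31 enhanced genes (about 0.35), so its larger raw count in the enhanced group mostly reflects the larger group.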
C_Enhanced <- grep("C", enhanced31genes$RNA_AminoAcids)
D_Enhanced <- grep("D", enhanced31genes$RNA_AminoAcids)
E_Enhanced <- grep("E", enhanced31genes$RNA_AminoAcids)
G_Enhanced <- grep("G", enhanced31genes$RNA_AminoAcids)
N_Enhanced <- grep("N", enhanced31genes$RNA_AminoAcids)
P_Enhanced <- grep("P", enhanced31genes$RNA_AminoAcids)
Q_Enhanced <- grep("Q", enhanced31genes$RNA_AminoAcids)
R_Enhanced <- grep("R", enhanced31genes$RNA_AminoAcids)
S_Enhanced <- grep("S", enhanced31genes$RNA_AminoAcids)
Y_Enhanced <- grep("Y", enhanced31genes$RNA_AminoAcids)
countsNonEssentialEnhanced <- data.frame(AminoAcid=c("C","D","E","G","N","P","Q","R","S","Y"),genesPresent=c(length(C_Enhanced),length(D_Enhanced),length(E_Enhanced),length(G_Enhanced),length(N_Enhanced),length(P_Enhanced),length(Q_Enhanced),length(R_Enhanced),length(S_Enhanced),length(Y_Enhanced)))
countsNE <- countsNonEssentialEnhanced[order(countsNonEssentialEnhanced$genesPresent,decreasing=T),]
countsNE
## AminoAcid genesPresent
## 9 S 17
## 7 Q 11
## 5 N 10
## 8 R 9
## 4 G 5
## 6 P 5
## 1 C 2
## 2 D 2
## 10 Y 2
## 3 E 1
The non-essential amino acids, the ones already present in the body without having to consume them in the diet, all show up in the top 31 enhanced genes, using the amino acid strings made with the Biostrings library from the 20 base pair cDNA barcodes turned into mRNA. The most abundant non-essential amino acid is S-Serine, in 17 of the 31 genes. The next most abundant is Q-Glutamine in 11 genes, then N-Asparagine in 10 genes, R-Arginine in 9 genes, G-Glycine tied with P-Proline in 5 genes each, and C-Cysteine, D-Aspartic Acid, and Y-Tyrosine in 2 genes each. The least abundant non-essential amino acid in our top 31 enhanced MS genes is E-Glutamic Acid, in 1 gene.
If it is true what the author said about MS, that she almost cured it by slowing her deterioration with diet management, getting her amino acids by eating the colors of the rainbow in veggies and fruits, organ meat once a week, and meats daily, then following this logic of amino acids and silenced genes, we should eat foods with the amino acids most common in the silenced genes, like Alanine, Lysine, and Threonine, to build them up. And for the enhanced genes, we should limit foods with Leucine, Lysine, Alanine, and Tryptophan.
It seemed to work for that author, Terry Wahls, M.D., to help her manage, not cure, her MS symptoms.
That was an alternate analysis of the gene expression data we gathered, using Biostrings to make amino acid strings from the mRNA and cDNA we started with. We have the genes and more knowledge about the abundant amino acids found in the silenced and enhanced genes of our top 41 genes by upregulation and downregulation, with 10 genes downregulated and 31 genes upregulated in Multiple Sclerosis patients. You can get the csv file that we created or added to here
We will still keep these genes to add to our machine to see if it can predict pathologies associated with Epstein-Barr Virus (EBV) later. You will have to go back to the machine learning post on this same study data, which found, with the 15 MS sample repeats and 3 healthy control repeats, 100% accuracy in class prediction of healthy or MS pathology using 10,000 trees in cross validation of a simple random forest with the randomForest library.
Thanks.