This document extends the previous project, in which we applied machine learning to Multiple Sclerosis (MS) barcode gene expression data. The features were 20-base-pair complementary DNA (cDNA) ID_REF sequences, and the models predicted with 100% accuracy whether a sample was healthy or from an MS patient.
This time Bioconductor wasn't working with the code I had found online earlier. Instead, I experimented with the cDNA ID_REF barcodes to get the mRNA strings, then ran code over the 64 triplet codons, including the start codon and the 3 stop codons. The stop codons were searched first, before replacing any of the triplets with their amino acids.
That search found 18 barcode and stop-codon matches (17 distinct barcodes, because one barcode contains both UAA and UAG). A stop codon signals the ribosome to end translation, so a premature stop codon in the reading frame could cut a protein short and leave that person deficient in it, possibly, if every cell carries the same stop codon in that gene. After replacing the codons with their respective amino acids, there were leftover stretches of one or two ribonucleotides that formed neither an amino acid codon nor a stop codon.
This could be useful if the amino acids lined up in frame and split cleanly, showing which amino acids are being replaced because of DNA damage or failed repair. Allele variants and copy-number variants of genes often account for differences in genetics and pathology, whether inherited or caused by disturbances such as environmental exposure, viruses, radiation, illness, chemotherapy, carcinogens, and so on.
Feel free to continue exploring this experiment to see if you get any ideas, or find a way to code for triplets that stay in frame rather than being separated by 1-2 leftover ribonucleotides that cannot form a codon and so cannot be translated at the ribosome into a protein the body needs.
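As a starting point for the in-frame idea, here is a minimal sketch in base R that cuts a string into consecutive, non-overlapping triplets for each of the three possible reading frames. The helper name split_codons is hypothetical (it is not part of the original analysis), and the example string is the complement of the first barcode used later in this post.

```r
# Hypothetical helper: split an RNA string into consecutive triplets,
# starting at position `frame` (1, 2, or 3). Leftover bases at the end
# that cannot fill a whole triplet are dropped.
split_codons <- function(rna, frame = 1) {
  if (nchar(rna) < frame + 2) return(character(0))
  starts <- seq(frame, nchar(rna) - 2, by = 3)
  substring(rna, starts, starts + 2)
}

example_rna <- "AAAGCCCCUUGGCUCAGCUA"

# One character vector of codons per reading frame
frames <- lapply(1:3, function(f) split_codons(example_rna, f))
names(frames) <- paste0("frame", 1:3)
frames
```

Because a 20-base barcode is only a fragment, the true reading frame is unknown, which is why the sketch keeps all three frames rather than picking one.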
ML41data <- read.csv("ML_data_15MS_3Healthy.csv", header=T, sep=',', na.strings=c('',' ','na','NA'))
ML41data$class <- as.factor(ML41data$class)
head(ML41data)
## TTTCGGGGAACCGAGTCGAT TTTAGAGTCGGTGGTAGATC TTCACGGTCCTTTTGGTCAC
## 1 1 1 31
## 2 1 1 23
## 3 3 1 25
## 4 119 99 2
## 5 106 89 2
## 6 101 49 2
## TGCGGTCGCGACCTTTCAGC TGCCCGCGCCTACAGTAGTG TGAATTTTAGAGTCGGTTTC
## 1 21 27 2
## 2 22 24 4
## 3 30 32 2
## 4 1 1 135
## 5 2 1 124
## 6 2 4 99
## TAGTGGCGTGAGATTTGCGT TAGAGTACCGTTTTTGAACT TAGACATGCAGTCGTTTCGA
## 1 28 20 1
## 2 29 26 1
## 3 23 13 2
## 4 1 1 61
## 5 1 2 114
## 6 3 4 65
## GTGATTCCACAGTCGTTAAT GTGAGGATACAGTCGGTTTT GTAGAGTCGTTACCCGACAC
## 1 2 2 1
## 2 1 1 1
## 3 4 2 1
## 4 209 76 82
## 5 176 124 67
## 6 135 52 55
## GTACCGTCGGTTGCTCGTGC GGTGTTGTCAGAGTCGTTAA GGTCCTGTCTTTTCTGCTGA
## 1 3 1 33
## 2 2 1 30
## 3 3 2 37
## 4 225 102 2
## 5 167 62 2
## 6 113 58 2
## GGGCGTGTTTTTCTGGAGTA GGCTCGGAGTCGCTGAAAAT GGAGTCGTCTTTTTATCCCC
## 1 32 1 2
## 2 33 1 1
## 3 38 1 1
## 4 7 70 155
## 5 4 69 78
## 6 2 53 44
## GCTTCGCAGTCGTTAGAGTT GCCGTCCTGTCTTTCTCATT GCCGAGTCGTTATGGACCCA
## 1 3 28 1
## 2 1 16 1
## 3 1 23 1
## 4 123 1 44
## 5 123 3 52
## 6 59 2 52
## GAGTCGTTTAAAGGCTCTCT GAGTCGTTCTCGTTTCGCAG CTCCAGAGCCGTTTTCGGTG
## 1 1 1 1
## 2 1 1 2
## 3 1 1 1
## 4 125 70 141
## 5 185 59 84
## 6 97 38 61
## CTATGGTCCCTTAGTGTTTA CTATCAACAGAGTCGCTAAT CGGTTAGAGTCGATAGCTTT
## 1 15 2 1
## 2 15 2 1
## 3 22 2 3
## 4 1 139 89
## 5 2 105 107
## 6 1 86 58
## CGAGTCGTTTGACCGGCGCA CCTCTCACCAGTCGTTTTGG CCAGTCGATTCTTTTCATAT
## 1 4 1 1
## 2 6 4 1
## 3 3 1 2
## 4 178 120 105
## 5 247 152 76
## 6 206 81 72
## CCAGCAGAGTCGCTCGAAAT CACCGTCGTTTTTGTGACCG ATCGTCGTTTTAGCCGTAGG
## 1 1 1 1
## 2 1 1 1
## 3 1 1 1
## 4 85 103 78
## 5 47 63 34
## 6 65 85 50
## ATCGTCGGTCTTAGCGGTCA AGATTAACCCAATACATTAT AGAGTCGCTCGTTAGGATCT
## 1 1 2 2
## 2 1 1 1
## 3 1 2 1
## 4 17 160 94
## 5 81 66 101
## 6 70 75 49
## AGAGTCGATTTGTCCAATCG AGAGGCGTTCGATCTTAGAC ACTGTCGTTTCAACGTTGAA
## 1 2 3 1
## 2 2 4 1
## 3 2 1 1
## 4 70 63 49
## 5 172 180 31
## 6 68 107 35
## ACCCCCGTCGTTAATTCGAC AACGCACGGGCGTGTTAGTC class
## 1 1 25 healthy
## 2 1 46 healthy
## 3 1 32 healthy
## 4 25 1 MS
## 5 104 2 MS
## 6 34 1 MS
Get the data here.
FC41 <- read.csv('top41genesCommonToAllFoldchangeValues3groupsMS.csv', header=T, sep=',', na.strings=c('',' ','na','NA'))
head(FC41)
## ID_REF control1.4362 control2.4363 control3.4364 MS1_r1_4370
## 1 TTTCGGGGAACCGAGTCGAT 1 1 3 119
## 2 TTTAGAGTCGGTGGTAGATC 1 1 1 99
## 3 TTCACGGTCCTTTTGGTCAC 31 23 25 2
## 4 TGCGGTCGCGACCTTTCAGC 21 22 30 1
## 5 TGCCCGCGCCTACAGTAGTG 27 24 32 1
## 6 TGAATTTTAGAGTCGGTTTC 2 4 2 135
## MS1_r2_4371 MS1_r3_4372 MS1_r4_4373 MS1_r5_4374 MS2_r1_4375 MS2_r2_4376
## 1 106 101 200 99 75 65
## 2 89 49 99 108 85 49
## 3 2 2 1 1 3 1
## 4 2 2 2 3 5 5
## 5 1 4 4 5 3 2
## 6 124 99 235 133 122 109
## MS2_r3_4377 MS2_r4_4378 MS2_r5_4379 commercial1o.commercial_r1_4365
## 1 63 78 87 128
## 2 53 36 43 83
## 3 3 1 1 3
## 4 2 3 3 2
## 5 3 1 4 5
## 6 97 123 98 106
## commercial2o.commercial_r2_4366 commercial3o.commercial_r3_4367
## 1 98 109
## 2 52 60
## 3 2 2
## 4 3 3
## 5 1 1
## 6 116 156
## commercial4o.commercial_r4_4368 commercial5o.commercial_r5_4369 controlMeans
## 1 93 57 1.666667
## 2 67 48 1.000000
## 3 1 1 26.333333
## 4 1 1 24.333333
## 5 5 1 27.666667
## 6 122 99 2.666667
## MS1_Means MS2_Means commercial_Means foldchange_MS1_vs_control
## 1 125.0 73.6 97.0 75.00000000
## 2 88.8 53.2 62.0 88.80000000
## 3 1.6 1.8 1.8 0.06075949
## 4 2.0 3.6 2.0 0.08219178
## 5 3.0 2.6 2.6 0.10843373
## 6 145.2 109.8 119.8 54.45000000
## foldchange_MS2_vs_control foldchange_commercialMS_vs_control
## 1 44.16000000 0.01718213
## 2 53.20000000 0.01612903
## 3 0.06835443 14.62962963
## 4 0.14794521 12.16666667
## 5 0.09397590 10.64102564
## 6 41.17500000 0.02225932
#install.packages("refseqR")
#install.packages("Biostrings")
# install.packages() doesn't work for Biostrings; install it through BiocManager instead
#install.packages("BiocManager")
#BiocManager::install("Biostrings")
library(refseqR)
GeneID <- FC41[1,1]
transcript <- refseq_fromGene(GeneID,sequence="transcript")
protein <- refseq_fromGene(GeneID, sequence="protein")
barcodes_cDNA <- FC41$ID_REF
barcodes_cDNA
## [1] "TTTCGGGGAACCGAGTCGAT" "TTTAGAGTCGGTGGTAGATC" "TTCACGGTCCTTTTGGTCAC"
## [4] "TGCGGTCGCGACCTTTCAGC" "TGCCCGCGCCTACAGTAGTG" "TGAATTTTAGAGTCGGTTTC"
## [7] "TAGTGGCGTGAGATTTGCGT" "TAGAGTACCGTTTTTGAACT" "TAGACATGCAGTCGTTTCGA"
## [10] "GTGATTCCACAGTCGTTAAT" "GTGAGGATACAGTCGGTTTT" "GTAGAGTCGTTACCCGACAC"
## [13] "GTACCGTCGGTTGCTCGTGC" "GGTGTTGTCAGAGTCGTTAA" "GGTCCTGTCTTTTCTGCTGA"
## [16] "GGGCGTGTTTTTCTGGAGTA" "GGCTCGGAGTCGCTGAAAAT" "GGAGTCGTCTTTTTATCCCC"
## [19] "GCTTCGCAGTCGTTAGAGTT" "GCCGTCCTGTCTTTCTCATT" "GCCGAGTCGTTATGGACCCA"
## [22] "GAGTCGTTTAAAGGCTCTCT" "GAGTCGTTCTCGTTTCGCAG" "CTCCAGAGCCGTTTTCGGTG"
## [25] "CTATGGTCCCTTAGTGTTTA" "CTATCAACAGAGTCGCTAAT" "CGGTTAGAGTCGATAGCTTT"
## [28] "CGAGTCGTTTGACCGGCGCA" "CCTCTCACCAGTCGTTTTGG" "CCAGTCGATTCTTTTCATAT"
## [31] "CCAGCAGAGTCGCTCGAAAT" "CACCGTCGTTTTTGTGACCG" "ATCGTCGTTTTAGCCGTAGG"
## [34] "ATCGTCGGTCTTAGCGGTCA" "AGATTAACCCAATACATTAT" "AGAGTCGCTCGTTAGGATCT"
## [37] "AGAGTCGATTTGTCCAATCG" "AGAGGCGTTCGATCTTAGAC" "ACTGTCGTTTCAACGTTGAA"
## [40] "ACCCCCGTCGTTAATTCGAC" "AACGCACGGGCGTGTTAGTC"
mRNA is reverse transcribed into cDNA, so each cDNA barcode is complementary to the mRNA it was copied from. In translation, the ribosome reads the mRNA three nucleotides (one codon) at a time, and tRNAs deliver the matching amino acids to build a protein. Start and stop codons mark where translation begins and ends. There are 20 standard amino acids, and most are encoded by more than one codon, so different triplets can still produce the same amino acid. These barcodes are 20 base pairs of cDNA each, so the mRNA we recover from them will also be a fragment. We can look at the amino acids in these top barcodes to see which codon variants of the amino acids are more common in MS.
Rather than building a for loop with iterations (even with AI help), a few lines of gsub() are enough to get the mRNA, because there are only 4 nucleotides: T in the cDNA pairs with A, A pairs with U in RNA, and C and G pair with each other. The order of the substitutions matters. Replacing A with U first and T with A afterwards keeps the newly created A's from being converted again. C and G replace each other, so a plain swap would undo itself: once every G became C, the next call would turn all of them into G. Parking G as a lower-case c, converting C to G, and then capitalizing the parked c back to C returns the completed mRNA strand. Note that the result is the base-by-base complement in the same left-to-right order as the barcode; the string is not reversed.
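mRNA <- barcodes_cDNA
mRNA <- gsub("A", "U", mRNA)  # A pairs with U; run first so the A's made from T below are not converted
mRNA <- gsub("G", "c", mRNA)  # park G as lower-case c so it is not swapped back
mRNA <- gsub("T", "A", mRNA)  # T pairs with A
mRNA <- gsub("C", "G", mRNA)  # C pairs with G
mRNA <- gsub("c", "C", mRNA)  # restore the parked c as C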
cDNA_mRNA <- data.frame(cDNA=barcodes_cDNA,mRNA=mRNA)
cDNA_mRNA
## cDNA mRNA
## 1 TTTCGGGGAACCGAGTCGAT AAAGCCCCUUGGCUCAGCUA
## 2 TTTAGAGTCGGTGGTAGATC AAAUCUCAGCCACCAUCUAG
## 3 TTCACGGTCCTTTTGGTCAC AAGUGCCAGGAAAACCAGUG
## 4 TGCGGTCGCGACCTTTCAGC ACGCCAGCGCUGGAAAGUCG
## 5 TGCCCGCGCCTACAGTAGTG ACGGGCGCGGAUGUCAUCAC
## 6 TGAATTTTAGAGTCGGTTTC ACUUAAAAUCUCAGCCAAAG
## 7 TAGTGGCGTGAGATTTGCGT AUCACCGCACUCUAAACGCA
## 8 TAGAGTACCGTTTTTGAACT AUCUCAUGGCAAAAACUUGA
## 9 TAGACATGCAGTCGTTTCGA AUCUGUACGUCAGCAAAGCU
## 10 GTGATTCCACAGTCGTTAAT CACUAAGGUGUCAGCAAUUA
## 11 GTGAGGATACAGTCGGTTTT CACUCCUAUGUCAGCCAAAA
## 12 GTAGAGTCGTTACCCGACAC CAUCUCAGCAAUGGGCUGUG
## 13 GTACCGTCGGTTGCTCGTGC CAUGGCAGCCAACGAGCACG
## 14 GGTGTTGTCAGAGTCGTTAA CCACAACAGUCUCAGCAAUU
## 15 GGTCCTGTCTTTTCTGCTGA CCAGGACAGAAAAGACGACU
## 16 GGGCGTGTTTTTCTGGAGTA CCCGCACAAAAAGACCUCAU
## 17 GGCTCGGAGTCGCTGAAAAT CCGAGCCUCAGCGACUUUUA
## 18 GGAGTCGTCTTTTTATCCCC CCUCAGCAGAAAAAUAGGGG
## 19 GCTTCGCAGTCGTTAGAGTT CGAAGCGUCAGCAAUCUCAA
## 20 GCCGTCCTGTCTTTCTCATT CGGCAGGACAGAAAGAGUAA
## 21 GCCGAGTCGTTATGGACCCA CGGCUCAGCAAUACCUGGGU
## 22 GAGTCGTTTAAAGGCTCTCT CUCAGCAAAUUUCCGAGAGA
## 23 GAGTCGTTCTCGTTTCGCAG CUCAGCAAGAGCAAAGCGUC
## 24 CTCCAGAGCCGTTTTCGGTG GAGGUCUCGGCAAAAGCCAC
## 25 CTATGGTCCCTTAGTGTTTA GAUACCAGGGAAUCACAAAU
## 26 CTATCAACAGAGTCGCTAAT GAUAGUUGUCUCAGCGAUUA
## 27 CGGTTAGAGTCGATAGCTTT GCCAAUCUCAGCUAUCGAAA
## 28 CGAGTCGTTTGACCGGCGCA GCUCAGCAAACUGGCCGCGU
## 29 CCTCTCACCAGTCGTTTTGG GGAGAGUGGUCAGCAAAACC
## 30 CCAGTCGATTCTTTTCATAT GGUCAGCUAAGAAAAGUAUA
## 31 CCAGCAGAGTCGCTCGAAAT GGUCGUCUCAGCGAGCUUUA
## 32 CACCGTCGTTTTTGTGACCG GUGGCAGCAAAAACACUGGC
## 33 ATCGTCGTTTTAGCCGTAGG UAGCAGCAAAAUCGGCAUCC
## 34 ATCGTCGGTCTTAGCGGTCA UAGCAGCCAGAAUCGCCAGU
## 35 AGATTAACCCAATACATTAT UCUAAUUGGGUUAUGUAAUA
## 36 AGAGTCGCTCGTTAGGATCT UCUCAGCGAGCAAUCCUAGA
## 37 AGAGTCGATTTGTCCAATCG UCUCAGCUAAACAGGUUAGC
## 38 AGAGGCGTTCGATCTTAGAC UCUCCGCAAGCUAGAAUCUG
## 39 ACTGTCGTTTCAACGTTGAA UGACAGCAAAGUUGCAACUU
## 40 ACCCCCGTCGTTAATTCGAC UGGGGGCAGCAAUUAAGCUG
## 41 AACGCACGGGCGTGTTAGTC UUGCGUGCCCGCACAAUCAG
The data frame above shows each cDNA barcode next to its complementary mRNA string: for every T there is an A, for every A a U, for every C a G, and for every G a C. The bases are adenine, thymine, guanine, and cytosine; uracil takes the place of thymine in RNA.
Let's save the mRNA character strings we created from the cDNA, and use mRNA to find the amino acids.
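The same complement can be produced without parking a lower-case letter, because base R's chartr() translates every character in a single pass, so no substitution ever sees the output of another. The sketch below is an alternative, not the code that produced the table above; the names cdna, complement_rna and reverse_string are illustrative, and the reversal step is only needed if the complementary strand should be read in its own 5' to 3' direction.

```r
# Two barcodes taken from the table above
cdna <- c("TTTCGGGGAACCGAGTCGAT", "TGCGGTCGCGACCTTTCAGC")

# Single-pass complement: A->U, C->G, G->C, T->A
complement_rna <- chartr("ACGT", "UGCA", cdna)
complement_rna

# Optional: reverse each string to get the reverse complement
reverse_string <- function(x) {
  vapply(strsplit(x, ""), function(ch) paste(rev(ch), collapse = ""), character(1))
}
reverse_complement_rna <- reverse_string(complement_rna)
reverse_complement_rna
```

The first two values of complement_rna match rows 1 and 4 of the cDNA_mRNA data frame.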
mRNA_cDNA_reversed <- mRNA
Now let's look online for start codons and stop codons. If a stop codon occurs early in the reading frame of an mRNA, translation ends early and the protein that mRNA strand was meant to make will be truncated or deficient.
Start codons are listed in this [codon chart](https://microbenotes.com/codon-chart-table-amino-acids/).
Amino Acid Chart
The chart shows AUG as the only start codon. Note that mRNAs are flanked by non-coding regions, such as the poly-A tail at the 3' end, that protect the coding sequence from being degraded at the ends. There are 20 amino acids, including methionine, which the start codon encodes.
Methionine = start codon AUG
Stop Codons: UAA,UAG,UGA
stop_UAA <- gsub("UAA","---", mRNA)
stop_UAG <- gsub("UAG","***", mRNA)
stop_UGA <- gsub("UGA", "???", mRNA)
stops <- data.frame(UAA= stop_UAA, UAG=stop_UAG, UGA=stop_UGA)
stops
## UAA UAG UGA
## 1 AAAGCCCCUUGGCUCAGCUA AAAGCCCCUUGGCUCAGCUA AAAGCCCCUUGGCUCAGCUA
## 2 AAAUCUCAGCCACCAUCUAG AAAUCUCAGCCACCAUC*** AAAUCUCAGCCACCAUCUAG
## 3 AAGUGCCAGGAAAACCAGUG AAGUGCCAGGAAAACCAGUG AAGUGCCAGGAAAACCAGUG
## 4 ACGCCAGCGCUGGAAAGUCG ACGCCAGCGCUGGAAAGUCG ACGCCAGCGCUGGAAAGUCG
## 5 ACGGGCGCGGAUGUCAUCAC ACGGGCGCGGAUGUCAUCAC ACGGGCGCGGAUGUCAUCAC
## 6 ACU---AAUCUCAGCCAAAG ACUUAAAAUCUCAGCCAAAG ACUUAAAAUCUCAGCCAAAG
## 7 AUCACCGCACUC---ACGCA AUCACCGCACUCUAAACGCA AUCACCGCACUCUAAACGCA
## 8 AUCUCAUGGCAAAAACUUGA AUCUCAUGGCAAAAACUUGA AUCUCAUGGCAAAAACU???
## 9 AUCUGUACGUCAGCAAAGCU AUCUGUACGUCAGCAAAGCU AUCUGUACGUCAGCAAAGCU
## 10 CAC---GGUGUCAGCAAUUA CACUAAGGUGUCAGCAAUUA CACUAAGGUGUCAGCAAUUA
## 11 CACUCCUAUGUCAGCCAAAA CACUCCUAUGUCAGCCAAAA CACUCCUAUGUCAGCCAAAA
## 12 CAUCUCAGCAAUGGGCUGUG CAUCUCAGCAAUGGGCUGUG CAUCUCAGCAAUGGGCUGUG
## 13 CAUGGCAGCCAACGAGCACG CAUGGCAGCCAACGAGCACG CAUGGCAGCCAACGAGCACG
## 14 CCACAACAGUCUCAGCAAUU CCACAACAGUCUCAGCAAUU CCACAACAGUCUCAGCAAUU
## 15 CCAGGACAGAAAAGACGACU CCAGGACAGAAAAGACGACU CCAGGACAGAAAAGACGACU
## 16 CCCGCACAAAAAGACCUCAU CCCGCACAAAAAGACCUCAU CCCGCACAAAAAGACCUCAU
## 17 CCGAGCCUCAGCGACUUUUA CCGAGCCUCAGCGACUUUUA CCGAGCCUCAGCGACUUUUA
## 18 CCUCAGCAGAAAAAUAGGGG CCUCAGCAGAAAAA***GGG CCUCAGCAGAAAAAUAGGGG
## 19 CGAAGCGUCAGCAAUCUCAA CGAAGCGUCAGCAAUCUCAA CGAAGCGUCAGCAAUCUCAA
## 20 CGGCAGGACAGAAAGAG--- CGGCAGGACAGAAAGAGUAA CGGCAGGACAGAAAGAGUAA
## 21 CGGCUCAGCAAUACCUGGGU CGGCUCAGCAAUACCUGGGU CGGCUCAGCAAUACCUGGGU
## 22 CUCAGCAAAUUUCCGAGAGA CUCAGCAAAUUUCCGAGAGA CUCAGCAAAUUUCCGAGAGA
## 23 CUCAGCAAGAGCAAAGCGUC CUCAGCAAGAGCAAAGCGUC CUCAGCAAGAGCAAAGCGUC
## 24 GAGGUCUCGGCAAAAGCCAC GAGGUCUCGGCAAAAGCCAC GAGGUCUCGGCAAAAGCCAC
## 25 GAUACCAGGGAAUCACAAAU GAUACCAGGGAAUCACAAAU GAUACCAGGGAAUCACAAAU
## 26 GAUAGUUGUCUCAGCGAUUA GA***UUGUCUCAGCGAUUA GAUAGUUGUCUCAGCGAUUA
## 27 GCCAAUCUCAGCUAUCGAAA GCCAAUCUCAGCUAUCGAAA GCCAAUCUCAGCUAUCGAAA
## 28 GCUCAGCAAACUGGCCGCGU GCUCAGCAAACUGGCCGCGU GCUCAGCAAACUGGCCGCGU
## 29 GGAGAGUGGUCAGCAAAACC GGAGAGUGGUCAGCAAAACC GGAGAGUGGUCAGCAAAACC
## 30 GGUCAGC---GAAAAGUAUA GGUCAGCUAAGAAAAGUAUA GGUCAGCUAAGAAAAGUAUA
## 31 GGUCGUCUCAGCGAGCUUUA GGUCGUCUCAGCGAGCUUUA GGUCGUCUCAGCGAGCUUUA
## 32 GUGGCAGCAAAAACACUGGC GUGGCAGCAAAAACACUGGC GUGGCAGCAAAAACACUGGC
## 33 UAGCAGCAAAAUCGGCAUCC ***CAGCAAAAUCGGCAUCC UAGCAGCAAAAUCGGCAUCC
## 34 UAGCAGCCAGAAUCGCCAGU ***CAGCCAGAAUCGCCAGU UAGCAGCCAGAAUCGCCAGU
## 35 UC---UUGGGUUAUG---UA UCUAAUUGGGUUAUGUAAUA UCUAAUUGGGUUAUGUAAUA
## 36 UCUCAGCGAGCAAUCCUAGA UCUCAGCGAGCAAUCC***A UCUCAGCGAGCAAUCCUAGA
## 37 UCUCAGC---ACAGGUUAGC UCUCAGCUAAACAGGU***C UCUCAGCUAAACAGGUUAGC
## 38 UCUCCGCAAGCUAGAAUCUG UCUCCGCAAGC***AAUCUG UCUCCGCAAGCUAGAAUCUG
## 39 UGACAGCAAAGUUGCAACUU UGACAGCAAAGUUGCAACUU ???CAGCAAAGUUGCAACUU
## 40 UGGGGGCAGCAAU---GCUG UGGGGGCAGCAAUUAAGCUG UGGGGGCAGCAAUUAAGCUG
## 41 UUGCGUGCCCGCACAAUCAG UUGCGUGCCCGCACAAUCAG UUGCGUGCCCGCACAAUCAG
The table above marks every stop codon found in the barcodes. Let's grab the nucleotide strands with stop codons.
UAA <- grep("---",stop_UAA)
stop_UAA[UAA]
## [1] "ACU---AAUCUCAGCCAAAG" "AUCACCGCACUC---ACGCA" "CAC---GGUGUCAGCAAUUA"
## [4] "CGGCAGGACAGAAAGAG---" "GGUCAGC---GAAAAGUAUA" "UC---UUGGGUUAUG---UA"
## [7] "UCUCAGC---ACAGGUUAGC" "UGGGGGCAGCAAU---GCUG"
There are 8 strings with the stop codon UAA, and one of those has 2 instances of this stop codon.
UAG <- grep("\\*\\*\\*",stop_UAG)
stop_UAG[UAG]
## [1] "AAAUCUCAGCCACCAUC***" "CCUCAGCAGAAAAA***GGG" "GA***UUGUCUCAGCGAUUA"
## [4] "***CAGCAAAAUCGGCAUCC" "***CAGCCAGAAUCGCCAGU" "UCUCAGCGAGCAAUCC***A"
## [7] "UCUCAGCUAAACAGGU***C" "UCUCCGCAAGC***AAUCUG"
There are also 8 instances of the stop codon UAG, but none of the strings have more than one instance of that stop codon.
UGA <- grep("\\?\\?\\?",stop_UGA)
stop_UGA[UGA]
## [1] "AUCUCAUGGCAAAAACU???" "???CAGCAAAGUUGCAACUU"
There are only 2 strings with one instance each of the stop codon UGA.
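The counts above can also be tabulated in one step. The sketch below counts every occurrence of each stop codon per string with gregexpr(), which makes strings with repeated or mixed stop codons easy to spot. It uses four of the mRNA strings from this post; the names mrna and count_hits are illustrative. As with the gsub() calls, a match is counted wherever the triplet occurs, regardless of reading frame.

```r
# Four mRNA strings from the table above (the last one has no stop codon)
mrna <- c("ACUUAAAAUCUCAGCCAAAG", "UCUAAUUGGGUUAUGUAAUA",
          "UCUCAGCUAAACAGGUUAGC", "AAAGCCCCUUGGCUCAGCUA")
stop_codons <- c("UAA", "UAG", "UGA")

# Count occurrences of a fixed pattern in each string;
# gregexpr() returns -1 when there is no match, so count positions > 0
count_hits <- function(pattern, x) {
  matches <- gregexpr(pattern, x, fixed = TRUE)
  vapply(matches, function(pos) sum(pos > 0), integer(1))
}

# Rows are strings, columns are stop codons
counts <- sapply(stop_codons, count_hits, x = mrna)
rownames(counts) <- mrna
counts

# Strings that contain at least one stop codon of any kind
has_stop <- rowSums(counts) > 0
sum(has_stop)
```

Counting distinct strings this way avoids counting a barcode twice when it carries two different stop codons.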
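Let's go ahead and see if we can gsub all 64 triplet codons into their respective amino acids. Because gsub() matches a triplet wherever it occurs, these replacements are not tied to a reading frame, and the result depends on the order in which the codons are replaced. Here is the list again.
1. Phenylalanine = UUU, UUC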
mRNA <- gsub("UUU","-Phe-", mRNA)
mRNA <- gsub("UUC", "-Phe-",mRNA)
mRNA
## [1] "AAAGCCCCUUGGCUCAGCUA" "AAAUCUCAGCCACCAUCUAG" "AAGUGCCAGGAAAACCAGUG"
## [4] "ACGCCAGCGCUGGAAAGUCG" "ACGGGCGCGGAUGUCAUCAC" "ACUUAAAAUCUCAGCCAAAG"
## [7] "AUCACCGCACUCUAAACGCA" "AUCUCAUGGCAAAAACUUGA" "AUCUGUACGUCAGCAAAGCU"
## [10] "CACUAAGGUGUCAGCAAUUA" "CACUCCUAUGUCAGCCAAAA" "CAUCUCAGCAAUGGGCUGUG"
## [13] "CAUGGCAGCCAACGAGCACG" "CCACAACAGUCUCAGCAAUU" "CCAGGACAGAAAAGACGACU"
## [16] "CCCGCACAAAAAGACCUCAU" "CCGAGCCUCAGCGAC-Phe-UA" "CCUCAGCAGAAAAAUAGGGG"
## [19] "CGAAGCGUCAGCAAUCUCAA" "CGGCAGGACAGAAAGAGUAA" "CGGCUCAGCAAUACCUGGGU"
## [22] "CUCAGCAAA-Phe-CCGAGAGA" "CUCAGCAAGAGCAAAGCGUC" "GAGGUCUCGGCAAAAGCCAC"
## [25] "GAUACCAGGGAAUCACAAAU" "GAUAGUUGUCUCAGCGAUUA" "GCCAAUCUCAGCUAUCGAAA"
## [28] "GCUCAGCAAACUGGCCGCGU" "GGAGAGUGGUCAGCAAAACC" "GGUCAGCUAAGAAAAGUAUA"
## [31] "GGUCGUCUCAGCGAGC-Phe-A" "GUGGCAGCAAAAACACUGGC" "UAGCAGCAAAAUCGGCAUCC"
## [34] "UAGCAGCCAGAAUCGCCAGU" "UCUAAUUGGGUUAUGUAAUA" "UCUCAGCGAGCAAUCCUAGA"
## [37] "UCUCAGCUAAACAGGUUAGC" "UCUCCGCAAGCUAGAAUCUG" "UGACAGCAAAGUUGCAACUU"
## [40] "UGGGGGCAGCAAUUAAGCUG" "UUGCGUGCCCGCACAAUCAG"
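For comparison, the codon-by-codon substitutions can also be written as a single lookup table. The sketch below builds the standard genetic code as a named character vector and translates one reading frame at a time. It is an alternative under stated assumptions, not the method used in this post: it uses one-letter amino acid codes rather than the three-letter labels here, "*" marks a stop codon, and the names codon_table and translate_frame are hypothetical.

```r
bases <- c("U", "C", "A", "G")

# Standard genetic code in UCAG order: first base varies slowest,
# third base fastest. "*" marks the stop codons UAA, UAG and UGA.
aa <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]

# expand.grid() varies its first column fastest, so list the third base first
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codon_table <- setNames(aa, paste0(grid$first, grid$second, grid$third))

# Translate one reading frame (1, 2, or 3); trailing bases that do not
# fill a whole codon are ignored
translate_frame <- function(rna, frame = 1) {
  if (nchar(rna) < frame + 2) return("")
  starts <- seq(frame, nchar(rna) - 2, by = 3)
  codons <- substring(rna, starts, starts + 2)
  paste(codon_table[codons], collapse = "")
}

translate_frame("AAAGCCCCUUGGCUCAGCUA", frame = 1)
```

Because every codon is looked up at a fixed position, the result does not depend on the order in which amino acids are handled, and no stray one- or two-base gaps are left between codons.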
mRNA <- gsub("UUA", "-Leu-", mRNA)
mRNA <- gsub("UUG", "-Leu-", mRNA)
mRNA <- gsub("CUU", "-Leu-", mRNA)
mRNA <- gsub("CUC", "-Leu-", mRNA)
mRNA <- gsub("CUA", "-Leu-", mRNA)
mRNA <- gsub("CUG", "-Leu-", mRNA)
mRNA
## [1] "AAAGCCCC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-AGCCACCAU-Leu-G"
## [3] "AAGUGCCAGGAAAACCAGUG" "ACGCCAGCG-Leu-GAAAGUCG"
## [5] "ACGGGCGCGGAUGUCAUCAC" "AC-Leu-AAAU-Leu-AGCCAAAG"
## [7] "AUCACCGCA-Leu-UAAACGCA" "AU-Leu-AUGGCAAAAAC-Leu-A"
## [9] "AU-Leu-UACGUCAGCAAAGCU" "CA-Leu-AGGUGUCAGCAA-Leu-"
## [11] "CA-Leu--Leu-UGUCAGCCAAAA" "CAU-Leu-AGCAAUGGG-Leu-UG"
## [13] "CAUGGCAGCCAACGAGCACG" "CCACAACAGU-Leu-AGCAAUU"
## [15] "CCAGGACAGAAAAGACGACU" "CCCGCACAAAAAGAC-Leu-AU"
## [17] "CCGAGC-Leu-AGCGAC-Phe-UA" "C-Leu-AGCAGAAAAAUAGGGG"
## [19] "CGAAGCGUCAGCAAU-Leu-AA" "CGGCAGGACAGAAAGAGUAA"
## [21] "CGG-Leu-AGCAAUAC-Leu-GGU" "-Leu-AGCAAA-Phe-CCGAGAGA"
## [23] "-Leu-AGCAAGAGCAAAGCGUC" "GAGGU-Leu-GGCAAAAGCCAC"
## [25] "GAUACCAGGGAAUCACAAAU" "GAUAG-Leu-U-Leu-AGCGA-Leu-"
## [27] "GCCAAU-Leu-AG-Leu-UCGAAA" "G-Leu-AGCAAA-Leu-GCCGCGU"
## [29] "GGAGAGUGGUCAGCAAAACC" "GGUCAG-Leu-AGAAAAGUAUA"
## [31] "GGUCGU-Leu-AGCGAGC-Phe-A" "GUGGCAGCAAAAACA-Leu-GC"
## [33] "UAGCAGCAAAAUCGGCAUCC" "UAGCAGCCAGAAUCGCCAGU"
## [35] "U-Leu-A-Leu-GG-Leu-UGUAAUA" "U-Leu-AGCGAGCAAUC-Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-CGCAAG-Leu-GAAU-Leu-"
## [39] "UGACAGCAAAG-Leu-CAA-Leu-" "UGGGGGCAGCAA-Leu-AG-Leu-"
## [41] "-Leu-CGUGCCCGCACAAUCAG"
mRNA <- gsub("AUU", "-Ile-", mRNA)
mRNA <- gsub("AUC", "-Ile-", mRNA)
mRNA <- gsub("AUA", "-Ile-", mRNA)
mRNA
## [1] "AAAGCCCC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-AGCCACCAU-Leu-G"
## [3] "AAGUGCCAGGAAAACCAGUG" "ACGCCAGCG-Leu-GAAAGUCG"
## [5] "ACGGGCGCGGAUGUC-Ile-AC" "AC-Leu-AAAU-Leu-AGCCAAAG"
## [7] "-Ile-ACCGCA-Leu-UAAACGCA" "AU-Leu-AUGGCAAAAAC-Leu-A"
## [9] "AU-Leu-UACGUCAGCAAAGCU" "CA-Leu-AGGUGUCAGCAA-Leu-"
## [11] "CA-Leu--Leu-UGUCAGCCAAAA" "CAU-Leu-AGCAAUGGG-Leu-UG"
## [13] "CAUGGCAGCCAACGAGCACG" "CCACAACAGU-Leu-AGCA-Ile-"
## [15] "CCAGGACAGAAAAGACGACU" "CCCGCACAAAAAGAC-Leu-AU"
## [17] "CCGAGC-Leu-AGCGAC-Phe-UA" "C-Leu-AGCAGAAAA-Ile-GGGG"
## [19] "CGAAGCGUCAGCAAU-Leu-AA" "CGGCAGGACAGAAAGAGUAA"
## [21] "CGG-Leu-AGCA-Ile-C-Leu-GGU" "-Leu-AGCAAA-Phe-CCGAGAGA"
## [23] "-Leu-AGCAAGAGCAAAGCGUC" "GAGGU-Leu-GGCAAAAGCCAC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-AGCGA-Leu-"
## [27] "GCCAAU-Leu-AG-Leu-UCGAAA" "G-Leu-AGCAAA-Leu-GCCGCGU"
## [29] "GGAGAGUGGUCAGCAAAACC" "GGUCAG-Leu-AGAAAAGU-Ile-"
## [31] "GGUCGU-Leu-AGCGAGC-Phe-A" "GUGGCAGCAAAAACA-Leu-GC"
## [33] "UAGCAGCAAA-Ile-GGC-Ile-C" "UAGCAGCCAGA-Ile-GCCAGU"
## [35] "U-Leu-A-Leu-GG-Leu-UGUA-Ile-" "U-Leu-AGCGAGCA-Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-CGCAAG-Leu-GAAU-Leu-"
## [39] "UGACAGCAAAG-Leu-CAA-Leu-" "UGGGGGCAGCAA-Leu-AG-Leu-"
## [41] "-Leu-CGUGCCCGCACA-Ile-AG"
mRNA <- gsub("GUU", "-Val-", mRNA)
mRNA <- gsub("GUC", "-Val-", mRNA)
mRNA <- gsub("GUA", "-Val-", mRNA)
mRNA <- gsub("GUG", "-Val-", mRNA)
mRNA
## [1] "AAAGCCCC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-AGCCACCAU-Leu-G"
## [3] "AA-Val-CCAGGAAAACCA-Val-" "ACGCCAGCG-Leu-GAAA-Val-G"
## [5] "ACGGGCGCGGAU-Val--Ile-AC" "AC-Leu-AAAU-Leu-AGCCAAAG"
## [7] "-Ile-ACCGCA-Leu-UAAACGCA" "AU-Leu-AUGGCAAAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-AGCAAAGCU" "CA-Leu-AGGU-Val-AGCAA-Leu-"
## [11] "CA-Leu--Leu-U-Val-AGCCAAAA" "CAU-Leu-AGCAAUGGG-Leu-UG"
## [13] "CAUGGCAGCCAACGAGCACG" "CCACAACAGU-Leu-AGCA-Ile-"
## [15] "CCAGGACAGAAAAGACGACU" "CCCGCACAAAAAGAC-Leu-AU"
## [17] "CCGAGC-Leu-AGCGAC-Phe-UA" "C-Leu-AGCAGAAAA-Ile-GGGG"
## [19] "CGAAGC-Val-AGCAAU-Leu-AA" "CGGCAGGACAGAAAGA-Val-A"
## [21] "CGG-Leu-AGCA-Ile-C-Leu-GGU" "-Leu-AGCAAA-Phe-CCGAGAGA"
## [23] "-Leu-AGCAAGAGCAAAGC-Val-" "GAGGU-Leu-GGCAAAAGCCAC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-AGCGA-Leu-"
## [27] "GCCAAU-Leu-AG-Leu-UCGAAA" "G-Leu-AGCAAA-Leu-GCCGCGU"
## [29] "GGAGA-Val--Val-AGCAAAACC" "G-Val-AG-Leu-AGAAAAGU-Ile-"
## [31] "G-Val-GU-Leu-AGCGAGC-Phe-A" "-Val-GCAGCAAAAACA-Leu-GC"
## [33] "UAGCAGCAAA-Ile-GGC-Ile-C" "UAGCAGCCAGA-Ile-GCCAGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-AGCGAGCA-Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-CGCAAG-Leu-GAAU-Leu-"
## [39] "UGACAGCAAAG-Leu-CAA-Leu-" "UGGGGGCAGCAA-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCCGCACA-Ile-AG"
mRNA <- gsub("GCU", "-Ala-", mRNA)
mRNA <- gsub("GCC", "-Ala-", mRNA)
mRNA <- gsub("GCA", "-Ala-", mRNA)
mRNA <- gsub("GCG", "-Ala-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAGGAAAACCA-Val-" "AC-Ala-A-Ala--Leu-GAAA-Val-G"
## [5] "ACGG-Ala-CGGAU-Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu-AGGU-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-AUGGG-Leu-UG"
## [13] "CAUG-Ala--Ala-AACGA-Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAGGACAGAAAAGACGACU" "CCC-Ala-CAAAAAGAC-Leu-AU"
## [17] "CCGAGC-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala-GAAAA-Ile-GGGG"
## [19] "CGAAGC-Val-A-Ala-AU-Leu-AA" "CG-Ala-GGACAGAAAGA-Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu-GGU" "-Leu-A-Ala-AA-Phe-CCGAGAGA"
## [23] "-Leu-A-Ala-AGA-Ala-AAGC-Val-" "GAGGU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UCGAAA" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "GGAGA-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-AGAAAAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile-GGC-Ile-C" "UA-Ala--Ala-AGA-Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-C-Ala-AG-Leu-GAAU-Leu-"
## [39] "UGACA-Ala-AAG-Leu-CAA-Leu-" "UGGGG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("GAU", "-Asp-", mRNA)
mRNA <- gsub("GAC", "-Asp-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAGGAAAACCA-Val-" "AC-Ala-A-Ala--Leu-GAAA-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu-AGGU-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-AUGGG-Leu-UG"
## [13] "CAUG-Ala--Ala-AACGA-Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAG-Asp-AGAAAA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CCGAGC-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala-GAAAA-Ile-GGGG"
## [19] "CGAAGC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-AGAAAGA-Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu-GGU" "-Leu-A-Ala-AA-Phe-CCGAGAGA"
## [23] "-Leu-A-Ala-AGA-Ala-AAGC-Val-" "GAGGU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UCGAAA" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "GGAGA-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-AGAAAAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile-GGC-Ile-C" "UA-Ala--Ala-AGA-Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-C-Ala-AG-Leu-GAAU-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "UGGGG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("GAA", "-Glu-", mRNA)
mRNA <- gsub("GAG", "-Glu-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAG-Glu-AACCA-Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu-AGGU-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-AUGGG-Leu-UG"
## [13] "CAUG-Ala--Ala-AACGA-Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAG-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile-GGGG"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu-AGA-Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu-GGU" "-Leu-A-Ala-AA-Phe-CC-Glu-AGA"
## [23] "-Leu-A-Ala-AGA-Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile-GGC-Ile-C" "UA-Ala--Ala-AGA-Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "UGGGG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("UGG", "-Trp-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAG-Glu-AACCA-Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu-AGGU-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AACGA-Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAG-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile-GGGG"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu-AGA-Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu-GGU" "-Leu-A-Ala-AA-Phe-CC-Glu-AGA"
## [23] "-Leu-A-Ala-AGA-Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CCAGGGA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile-GGC-Ile-C" "UA-Ala--Ala-AGA-Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AACAGG-Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
# Arginine codons: CGU, CGC, CGA, CGG, AGA, AGG
mRNA <- gsub("CGU", "-Arg-", mRNA)
mRNA <- gsub("CGC", "-Arg-", mRNA)
mRNA <- gsub("CGA", "-Arg-", mRNA)
mRNA <- gsub("CGC", "-Arg-", mRNA)  # repeats CGC, so CGG is not replaced in this run
mRNA <- gsub("AGA", "-Arg-", mRNA)
mRNA <- gsub("AGG", "-Arg-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAG-Glu-AACCA-Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAG-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile-GGGG"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu-GGU" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile-GGC-Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
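# GGU, GGC, GGA and GGG encode glycine (Gly), not glutamine (Gln);
# the -Gln- label is left as it was run so that it matches the printed output
mRNA <- gsub("GGU", "-Gln-", mRNA)
mRNA <- gsub("GGC", "-Gln-", mRNA)
mRNA <- gsub("GGA", "-Gln-", mRNA)
mRNA <- gsub("GGG", "-Gln-", mRNA)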
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-ACCAU-Leu-G"
## [3] "AA-Val-CCAG-Glu-AACCA-Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "CCACAACAGU-Leu-A-Ala--Ile-"
## [15] "CCAG-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
# Proline codons: CCU, CCC, CCA, CCG
mRNA <- gsub("CCU", "-Pro-", mRNA)
mRNA <- gsub("CCA", "-Pro-", mRNA)
mRNA <- gsub("CCU", "-Pro-", mRNA)  # repeats CCU, so CCC is not replaced in this run
mRNA <- gsub("CCG", "-Pro-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "ACGG-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile-ACC-Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CAACAGU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile-ACAAAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AAACC" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAAACA-Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("ACU", "-Thr-", mRNA)
mRNA <- gsub("ACC", "-Thr-", mRNA)
mRNA <- gsub("ACA", "-Thr-", mRNA)
mRNA <- gsub("ACG", "-Thr-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile--Thr--Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAA-Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("UCU", "-Ser-", mRNA)
mRNA <- gsub("UCC", "-Ser-", mRNA)
mRNA <- gsub("UCA", "-Ser-", mRNA)
mRNA <- gsub("UCG", "-Ser-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile--Thr--Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu-UAC-Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAA-Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("UAU", "-Tyr-", mRNA)
mRNA <- gsub("UAC", "-Tyr-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile--Thr--Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAA-Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("UGU", "-Cys-", mRNA)
mRNA <- gsub("UGC", "-Cys-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile--Thr--Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "CAU-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "CAUG-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAA-Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("CAU", "-His-", mRNA)
mRNA <- gsub("CAC", "-His-", mRNA)
mRNA
## [1] "AAA-Ala-CC-Leu-G-Leu-AG-Leu-" "AAAU-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu-AAAU-Leu-A-Ala-AAAG"
## [7] "-Ile--Thr--Ala--Leu-UAAAC-Ala-" "AU-Leu-AUG-Ala-AAAAC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala-AAAA" "-His--Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-CAAAAA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala-AAGC-Val-" "-Glu-GU-Leu-G-Ala-AAA-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-AAGU-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala-AAA-Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala-AAG-Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("AAA", "-Lys-", mRNA)
mRNA <- gsub("AAG", "-Lys-", mRNA)
mRNA
## [1] "-Lys--Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu--Lys-U-Leu-A-Ala--Lys-G"
## [7] "-Ile--Thr--Ala--Leu-U-Lys-C-Ala-" "AU-Leu-AUG-Ala--Lys-AC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala--Lys-A" "-His--Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-C-Lys-AA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala--Lys-C-Val-" "-Glu-GU-Leu-G-Ala--Lys--Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu--Lys-U-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala--Lys--Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala--Lys--Leu-CAA-Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("CAA", "-Gln-", mRNA)
mRNA <- gsub("CAG", "-Gln-", mRNA)
mRNA
## [1] "-Lys--Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu--Lys-U-Leu-A-Ala--Lys-G"
## [7] "-Ile--Thr--Ala--Leu-U-Lys-C-Ala-" "AU-Leu-AUG-Ala--Lys-AC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala--Lys-A" "-His--Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-C-Lys-AA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala--Lys-C-Val-" "-Glu-GU-Leu-G-Ala--Lys--Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr-AAU" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-AAU-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu--Lys-U-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala--Lys--Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu-AAC-Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala--Lys--Leu--Gln--Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
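# AAU and AAC encode asparagine (Asn), not aspartate (Asp, which is GAU/GAC). The output below was printed with the "-Asp-" label for these codons.
mRNA <- gsub("AAU", "-Asn-", mRNA)
mRNA <- gsub("AAC", "-Asn-", mRNA)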
mRNA
## [1] "-Lys--Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu--Lys-U-Leu-A-Ala--Lys-G"
## [7] "-Ile--Thr--Ala--Leu-U-Lys-C-Ala-" "AU-Leu-AUG-Ala--Lys-AC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala--Lys-A" "-His--Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-C-Lys-AA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala--Lys-C-Val-" "-Glu-GU-Leu-G-Ala--Lys--Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr--Asp-" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala--Asp--Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu--Lys-U-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala--Lys--Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu--Asp--Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala--Lys--Leu--Gln--Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
mRNA <- gsub("AUG", "-Met-", mRNA)
mRNA
## [1] "-Lys--Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val--Pro-G-Glu-AA-Pro--Val-" "AC-Ala-A-Ala--Leu--Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp--Val--Ile-AC" "AC-Leu--Lys-U-Leu-A-Ala--Lys-G"
## [7] "-Ile--Thr--Ala--Leu-U-Lys-C-Ala-" "AU-Leu--Met--Ala--Lys-AC-Leu-A"
## [9] "AU-Leu--Tyr--Val-A-Ala-AA-Ala-" "CA-Leu--Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu--Leu-U-Val-A-Ala--Lys-A" "-His--Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala--Ala-AA-Arg--Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala--Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp--Asp-U" "CCC-Ala-C-Lys-AA-Asp--Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala--Glu-AA-Ile--Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu--Arg--Val-A"
## [21] "CGG-Leu-A-Ala--Ile-C-Leu--Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu--Arg-"
## [23] "-Leu-A-Ala--Arg--Ala--Lys-C-Val-" "-Glu-GU-Leu-G-Ala--Lys--Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile--Thr--Asp-" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala--Asp--Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu--Ala--Ala-U"
## [29] "G-Glu-A-Val--Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu--Lys-U-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val--Ala--Ala--Lys--Thr--Leu-GC"
## [33] "UA-Ala--Ala-AA-Ile--Gln--Ile-C" "UA-Ala--Ala--Arg--Ile--Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val--Ile-" "U-Leu-A-Ala-A-Ala--Ile--Leu-GA"
## [37] "U-Leu-AG-Leu--Asp--Arg--Leu-GC" "U-Leu-C-Ala-AG-Leu--Glu-U-Leu-"
## [39] "U-Asp-A-Ala--Lys--Leu--Gln--Leu-" "-Trp-GG-Ala--Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
Stop codons: UAA, UAG, UGA
mRNA <- gsub("UAA", "-STOP-", mRNA)
mRNA <- gsub("UAG", "-STOP-", mRNA)
mRNA <- gsub("UGA", "-STOP-", mRNA)
mRNA <- gsub("--", "-", mRNA, fixed = TRUE) # collapse doubled hyphens between adjacent labels
mRNA
## [1] "-Lys-Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G"
## [3] "AA-Val-Pro-G-Glu-AA-Pro-Val-" "AC-Ala-A-Ala-Leu-Glu-A-Val-G"
## [5] "-Thr-G-Ala-CG-Asp-Val-Ile-AC" "AC-Leu-Lys-U-Leu-A-Ala-Lys-G"
## [7] "-Ile-Thr-Ala-Leu-U-Lys-C-Ala-" "AU-Leu-Met-Ala-Lys-AC-Leu-A"
## [9] "AU-Leu-Tyr-Val-A-Ala-AA-Ala-" "CA-Leu-Arg-U-Val-A-Ala-A-Leu-"
## [11] "CA-Leu-Leu-U-Val-A-Ala-Lys-A" "-His-Leu-A-Ala-A-Trp-G-Leu-UG"
## [13] "-His-G-Ala-Ala-AA-Arg-Ala-CG" "-Pro-CA-Thr-GU-Leu-A-Ala-Ile-"
## [15] "-Pro-G-Asp-A-Glu-AA-Asp-Asp-U" "CCC-Ala-C-Lys-AA-Asp-Leu-AU"
## [17] "CC-Glu-C-Leu-A-Ala-AC-Phe-UA" "C-Leu-A-Ala-Glu-AA-Ile-Gln-G"
## [19] "C-Glu-GC-Val-A-Ala-AU-Leu-AA" "CG-Ala-G-Asp-A-Glu-Arg-Val-A"
## [21] "CGG-Leu-A-Ala-Ile-C-Leu-Gln-" "-Leu-A-Ala-AA-Phe-CC-Glu-Arg-"
## [23] "-Leu-A-Ala-Arg-Ala-Lys-C-Val-" "-Glu-GU-Leu-G-Ala-Lys-Ala-AC"
## [25] "G-Ile-CC-Arg-GA-Ile-Thr-Asp-" "G-Ile-G-Leu-U-Leu-A-Ala-A-Leu-"
## [27] "-Ala-Asp-Leu-AG-Leu-UC-Glu-A" "G-Leu-A-Ala-AA-Leu-Ala-Ala-U"
## [29] "G-Glu-A-Val-Val-A-Ala-AA-Thr-" "G-Val-AG-Leu-A-Glu-Lys-U-Ile-"
## [31] "G-Val-GU-Leu-A-Ala-AGC-Phe-A" "-Val-Ala-Ala-Lys-Thr-Leu-GC"
## [33] "UA-Ala-Ala-AA-Ile-Gln-Ile-C" "UA-Ala-Ala-Arg-Ile-Ala-AGU"
## [35] "U-Leu-A-Leu-GG-Leu-U-Val-Ile-" "U-Leu-A-Ala-A-Ala-Ile-Leu-GA"
## [37] "U-Leu-AG-Leu-Asp-Arg-Leu-GC" "U-Leu-C-Ala-AG-Leu-Glu-U-Leu-"
## [39] "U-Asp-A-Ala-Lys-Leu-Gln-Leu-" "-Trp-GG-Ala-Ala-A-Leu-AG-Leu-"
## [41] "-Leu-C-Val-CCC-Ala-CA-Ile-AG"
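Substituting each triplet codon with its amino acid name via gsub leaves no STOP codons in the barcodes, but it does leave gaps of 1-2 nucleotides that are not part of any codon. Both results depend on the order of the substitutions: each gsub call matches its codon at any position in the string, not only in one reading frame, and the characters it consumes are no longer available to later calls. For example, the second barcode's mRNA, AAAUCUCAGCCACCAUCUAG, ends in UAG, but the earlier leucine substitution of CUA had already consumed the U and A by the time the stop codon substitutions ran last.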
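The chain of gsub calls in this section can be condensed into a loop over a named list. This is only a sketch of the same order-dependent substitution, not a reading-frame-aware translation. The names `codon_groups` and `label_codons` are made up for the illustration, and the list holds only the codon groups handled in this part of the document, using the standard assignments (CCC for proline, Asn for AAU/AAC).

```r
# Codon groups in the order they are substituted in this section.
# Hypothetical names; only the groups from this part of the document.
codon_groups <- list(
  Pro  = c("CCU", "CCC", "CCA", "CCG"),
  Thr  = c("ACU", "ACC", "ACA", "ACG"),
  Ser  = c("UCU", "UCC", "UCA", "UCG"),
  Tyr  = c("UAU", "UAC"),
  Cys  = c("UGU", "UGC"),
  His  = c("CAU", "CAC"),
  Lys  = c("AAA", "AAG"),
  Gln  = c("CAA", "CAG"),
  Asn  = c("AAU", "AAC"),
  Met  = "AUG",
  STOP = c("UAA", "UAG", "UGA")
)

# Replace every codon with "-Label-", one group at a time, then
# collapse the doubled hyphens left between adjacent labels.
label_codons <- function(x, groups) {
  for (aa in names(groups)) {
    for (codon in groups[[aa]]) {
      x <- gsub(codon, paste0("-", aa, "-"), x, fixed = TRUE)
    }
  }
  gsub("--", "-", x, fixed = TRUE)
}

# Barcode 7 as it stood before the proline step
labeled <- label_codons("-Ile-ACC-Ala--Leu-UAAAC-Ala-", codon_groups)
labeled  # "-Ile-Thr-Ala-Leu-U-Lys-C-Ala-"
```

In this example the UAA inside UAAAC never gets labeled as a stop, because the lysine substitution of AAA runs earlier and consumes two of its bases.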
As the methods of the experiment describe, the barcodes are fragments cut to 20 nucleotides long and stored as identifiers for their location on the DNA and the gene they are associated with.
No gene is only 20 base pairs long. The codon labels above look useful, but a proper translation should start at the beginning of the string, take every consecutive set of 3 nucleotides to look up the respective amino acid, and then mark the stop codon. I am not aware of how to split a string by a count of characters, only on a symbol or into single characters. We saw from the codon chart above that every combination of RNA nucleotides is accounted for by a triplet codon. Even if we broke each string into sets of 3 from beginning to end, some nucleotides would still be left over, since 18 and 21 are divisible by 3 but 20 is not. We don’t really know where along the gene the barcode starts or ends. We could look and see
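One way to split a string by character count in base R is `substring()` with a vector of start positions. The sketch below uses it to read a barcode in consecutive triplets and look each one up in the standard genetic code. The names `codon_table` and `translate_frame` are hypothetical, and the choice of reading frame is an assumption, since we do not know where the barcode sits within the gene.

```r
# Standard genetic code, built in U, C, A, G order (third base varies fastest).
bases <- c("U", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)
one_letter <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                              "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
three_letter <- c(F = "Phe", L = "Leu", S = "Ser", Y = "Tyr", "*" = "STOP",
                  C = "Cys", W = "Trp", P = "Pro", H = "His", Q = "Gln",
                  R = "Arg", I = "Ile", M = "Met", T = "Thr", N = "Asn",
                  K = "Lys", V = "Val", A = "Ala", D = "Asp", E = "Glu",
                  G = "Gly")
codon_table <- setNames(unname(three_letter[one_letter]), codons)

# Split x into consecutive triplets starting at position `frame` (1, 2 or 3)
# and report the 0-2 trailing bases that do not complete a codon.
translate_frame <- function(x, frame = 1) {
  starts <- seq(frame, nchar(x) - 2, by = 3)
  triplets <- substring(x, starts, starts + 2)
  list(codons = triplets,
       amino_acids = unname(codon_table[triplets]),
       leftover = substring(x, max(starts) + 3))
}

# First barcode's mRNA, read from its first base
first_barcode <- translate_frame("AAAGCCCCUUGGCUCAGCUA")
first_barcode$amino_acids  # "Lys" "Ala" "Pro" "Trp" "Leu" "Ser"
first_barcode$leftover     # "UA"
```

Calling `translate_frame(x, frame = 2)` or `frame = 3` gives the other two readings of the same strand, which is a cheap way to compare all three when the true frame is unknown.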
Let's go ahead and add the mRNA strings and the decoded amino acid strings for the top 41 ID_REF cDNA barcodes in MS to the data frame.
FC41$mRNA <- mRNA_cDNA_reversed
FC41$aminoAcids_simulated <- mRNA
str(FC41)
## 'data.frame': 41 obs. of 28 variables:
## $ ID_REF : chr "TTTCGGGGAACCGAGTCGAT" "TTTAGAGTCGGTGGTAGATC" "TTCACGGTCCTTTTGGTCAC" "TGCGGTCGCGACCTTTCAGC" ...
## $ control1.4362 : int 1 1 31 21 27 2 28 20 1 2 ...
## $ control2.4363 : int 1 1 23 22 24 4 29 26 1 1 ...
## $ control3.4364 : int 3 1 25 30 32 2 23 13 2 4 ...
## $ MS1_r1_4370 : int 119 99 2 1 1 135 1 1 61 209 ...
## $ MS1_r2_4371 : int 106 89 2 2 1 124 1 2 114 176 ...
## $ MS1_r3_4372 : int 101 49 2 2 4 99 3 4 65 135 ...
## $ MS1_r4_4373 : int 200 99 1 2 4 235 3 2 103 242 ...
## $ MS1_r5_4374 : int 99 108 1 3 5 133 3 2 116 225 ...
## $ MS2_r1_4375 : int 75 85 3 5 3 122 1 3 101 104 ...
## $ MS2_r2_4376 : int 65 49 1 5 2 109 3 3 110 114 ...
## $ MS2_r3_4377 : int 63 53 3 2 3 97 4 4 87 107 ...
## $ MS2_r4_4378 : int 78 36 1 3 1 123 6 1 110 138 ...
## $ MS2_r5_4379 : int 87 43 1 3 4 98 1 2 68 111 ...
## $ commercial1o.commercial_r1_4365 : int 128 83 3 2 5 106 3 3 88 124 ...
## $ commercial2o.commercial_r2_4366 : int 98 52 2 3 1 116 4 2 105 114 ...
## $ commercial3o.commercial_r3_4367 : int 109 60 2 3 1 156 2 2 97 156 ...
## $ commercial4o.commercial_r4_4368 : int 93 67 1 1 5 122 1 2 86 95 ...
## $ commercial5o.commercial_r5_4369 : int 57 48 1 1 1 99 2 2 64 103 ...
## $ controlMeans : num 1.67 1 26.33 24.33 27.67 ...
## $ MS1_Means : num 125 88.8 1.6 2 3 ...
## $ MS2_Means : num 73.6 53.2 1.8 3.6 2.6 ...
## $ commercial_Means : num 97 62 1.8 2 2.6 ...
## $ foldchange_MS1_vs_control : num 75 88.8 0.0608 0.0822 0.1084 ...
## $ foldchange_MS2_vs_control : num 44.16 53.2 0.0684 0.1479 0.094 ...
## $ foldchange_commercialMS_vs_control: num 0.0172 0.0161 14.6296 12.1667 10.641 ...
## $ mRNA : chr "AAAGCCCCUUGGCUCAGCUA" "AAAUCUCAGCCACCAUCUAG" "AAGUGCCAGGAAAACCAGUG" "ACGCCAGCGCUGGAAAGUCG" ...
## $ aminoAcids_simulated : chr "-Lys-Ala-CC-Leu-G-Leu-AG-Leu-" "-Lys-U-Leu-A-Ala-A-Pro-U-Leu-G" "AA-Val-Pro-G-Glu-AA-Pro-Val-" "AC-Ala-A-Ala-Leu-Glu-A-Val-G" ...
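The mRNA values shown in this output are consistent with a base-by-base complement of ID_REF (A to U, T to A, C to G, G to C) without reversing the order of the bases. A minimal sketch that reproduces the four visible values with `chartr()` follows. How `mRNA_cDNA_reversed` was actually built is not shown here, so this is an assumption about an equivalent step, and `cdna` and `mrna_check` are made-up names.

```r
# The four ID_REF barcodes visible in the str(FC41) output
cdna <- c("TTTCGGGGAACCGAGTCGAT", "TTTAGAGTCGGTGGTAGATC",
          "TTCACGGTCCTTTTGGTCAC", "TGCGGTCGCGACCTTTCAGC")

# Complement each base and write the result in RNA letters:
# A -> U, T -> A, C -> G, G -> C
mrna_check <- chartr("ATCG", "UAGC", cdna)
mrna_check[1]  # "AAAGCCCCUUGGCUCAGCUA"
```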
I was hoping the Bioconductor code would work. When it did not, I tried deriving the mRNA from the complementary DNA and labeling amino acids with a simple gsub over all 64 codons, but that leaves gaps of single and double nucleotides between the labeled triplets. It was neat to try. The alternative, splitting each string into consecutive sets of 3, requires some fancier coding and still leaves 2 extra nucleotides from each 20 bp barcode that do not make a set. Before running the amino acid substitutions we saw stop codons in 2-8 of these top 41 barcodes. Afterward none of the stop codons were present.
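To see where stop codons sit in the raw mRNA before any substitution touches it, each string can be scanned at every position and each hit tagged with its reading frame. This is a sketch under assumptions: `find_stops` is a hypothetical helper, and it is run only on the four mRNA strings visible in the str(FC41) output.

```r
# The four raw mRNA strings visible in the str(FC41) output
mrna_raw <- c("AAAGCCCCUUGGCUCAGCUA", "AAAUCUCAGCCACCAUCUAG",
              "AAGUGCCAGGAAAACCAGUG", "ACGCCAGCGCUGGAAAGUCG")
stop_codons <- c("UAA", "UAG", "UGA")

# Return the start position, codon and reading frame (1-3) of every
# stop codon found at any position in x.
find_stops <- function(x) {
  starts <- seq_len(nchar(x) - 2)
  triplets <- substring(x, starts, starts + 2)
  is_stop <- triplets %in% stop_codons
  data.frame(start = starts[is_stop],
             codon = triplets[is_stop],
             frame = (starts[is_stop] - 1) %% 3 + 1,
             stringsAsFactors = FALSE)
}

stop_hits <- lapply(mrna_raw, find_stops)
stop_hits[[2]]  # one hit: UAG starting at position 18, in frame 3
```

Of these four strings only the second contains a stop codon, and it is in the third reading frame, so whether it would matter depends on which frame the gene is actually read in.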
The next best thing is to manually look up each barcode in the respective gene database to find which gene and chromosome location it maps to, how it ranks among the other barcodes for that gene, and whether it lies on the up or down strand, toward or away from the centromere, which some call the forward and reverse strands.
Thanks for reading. Maybe you have ideas or know how to tackle this.
write.csv(FC41, 'MS_41genes_15samples_Foldchange_mRNA_AAs.csv',
row.names=FALSE)
Get this file here.