RNA editing is a process that changes an RNA transcript so that it no longer corresponds to the DNA sequence in the genome. A-to-I RNA editing is widespread in animals and converts an adenosine to an inosine, which is read as a guanosine. A-to-I editing in messenger RNA (mRNA) can change the amino acid sequence of a protein (amino acid recoding). It was recently discovered that RNA editing and the resulting amino acid changes are widespread in cephalopods (octopus and squid). Article: Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods, Cell 169, 191–202 (2017).
(RNA is a copy of one specific part of DNA, but after it is created it can still be changed in ways that the DNA did not encode. There are many types of changes. We are looking at the one where nucleotide A is changed so that it is read as nucleotide G, which changes the codon. Each codon (64 combinations in total) codes for a specific amino acid, and one amino acid can correspond to more than one codon. The nucleotides are changed by enzymes that catalyze the editing.)
Codons
Amino acids
Description of the data:
The data set consists of the calculated expected distribution (“Expected_amount”, “Expected_frequency”) and the observed distribution (“edits”, “frequency”) of amino acid changes for humans, four cephalopod species, and the edits conserved across those species.
- KR = a change from amino acid K to amino acid R
- syn = synonymous: no change to the amino acid, so likely no effect
- stop-W = stop codon to W: causes a significant change in the protein sequence
Types of amino acid changes:
The ratio of radical to conservative changes indicates how many changes are likely to have a negative effect.
Amino acid changes can also be random or non-random:
- Random edits = likely to be slightly bad = do not make the animal better
- Non-random edits = likely to be good = likely to be actively preserved in the population = seen more in conserved edits

In humans, most changes are random, so they are the most likely to have a negative effect. In individual cephalopod species, amino acid changes are much less random and less likely to have a negative effect. In the conserved set, changes are the least random (an assumption) and the most likely to have a positive effect. This gives two comparisons:
1. Human vs. individual cephalopod species
2. Individual cephalopod species vs. conserved
| Change | Expected_amount | Expected_frequency | edits | frequency | difference | fold_difference | sp_name |
|---|---|---|---|---|---|---|---|
| KR | 2 | 0.0606061 | 220 | 0.1424870 | -0.0818810 | 2.3510363 | human |
| SG | 2 | 0.0606061 | 151 | 0.0977979 | -0.0371919 | 1.6136658 | human |
| IM | 1 | 0.0303030 | 38 | 0.0246114 | 0.0056916 | 0.8121762 | human |
| TA | 4 | 0.1212121 | 148 | 0.0958549 | 0.0253572 | 0.7908031 | human |
| IV | 3 | 0.0909091 | 101 | 0.0654145 | 0.0254946 | 0.7195596 | human |
| MV | 1 | 0.0303030 | 31 | 0.0200777 | 0.0102253 | 0.6625648 | human |
| HR | 2 | 0.0606061 | 42 | 0.0272021 | 0.0334040 | 0.4488342 | human |
| NS | 2 | 0.0606061 | 93 | 0.0602332 | 0.0003729 | 0.9938472 | human |
| ND | 2 | 0.0606061 | 87 | 0.0563472 | 0.0042589 | 0.9297280 | human |
| QR | 2 | 0.0606061 | 128 | 0.0829016 | -0.0222955 | 1.3678756 | human |
| KE | 2 | 0.0606061 | 176 | 0.1139896 | -0.0533836 | 1.8808290 | human |
| EG | 2 | 0.0606061 | 114 | 0.0738342 | -0.0132281 | 1.2182642 | human |
| RG | 2 | 0.0606061 | 99 | 0.0641192 | -0.0035131 | 1.0579663 | human |
| YC | 2 | 0.0606061 | 64 | 0.0414508 | 0.0191553 | 0.6839378 | human |
| DG | 2 | 0.0606061 | 47 | 0.0304404 | 0.0301656 | 0.5022668 | human |
| stop-W | 2 | 0.0606061 | 5 | 0.0032383 | 0.0573677 | 0.0534326 | human |
| KR | 2 | 0.0416667 | 14503 | 0.1362949 | -0.0946282 | 3.2710767 | specific_sepia |
| SG | 2 | 0.0416667 | 6856 | 0.0644306 | -0.0227640 | 1.5463354 | specific_sepia |
| NS | 2 | 0.0416667 | 5836 | 0.0548450 | -0.0131783 | 1.3162796 | specific_sepia |
| KE | 2 | 0.0416667 | 5552 | 0.0521760 | -0.0105094 | 1.2522249 | specific_sepia |
| YC | 2 | 0.0416667 | 4843 | 0.0455131 | -0.0038464 | 1.0923136 | specific_sepia |
| syn | 15 | 0.3125000 | 36310 | 0.3412305 | -0.0287305 | 1.0919377 | specific_sepia |
| ND | 2 | 0.0416667 | 4807 | 0.0451748 | -0.0035081 | 1.0841940 | specific_sepia |
| MV | 1 | 0.0208333 | 2396 | 0.0225169 | -0.0016836 | 1.0808108 | specific_sepia |
| RG | 2 | 0.0416667 | 4086 | 0.0383990 | 0.0032677 | 0.9215762 | specific_sepia |
| IM | 1 | 0.0208333 | 2003 | 0.0188236 | 0.0020097 | 0.9035326 | specific_sepia |
| QR | 2 | 0.0416667 | 3678 | 0.0345647 | 0.0071019 | 0.8295539 | specific_sepia |
| IV | 3 | 0.0625000 | 5041 | 0.0473738 | 0.0151262 | 0.7579810 | specific_sepia |
| EG | 2 | 0.0416667 | 2902 | 0.0272721 | 0.0143945 | 0.6545311 | specific_sepia |
| TA | 4 | 0.0833333 | 5523 | 0.0519035 | 0.0314298 | 0.6228421 | specific_sepia |
| DG | 2 | 0.0416667 | 1103 | 0.0103657 | 0.0313010 | 0.2487759 | specific_sepia |
| HR | 2 | 0.0416667 | 947 | 0.0088996 | 0.0327670 | 0.2135910 | specific_sepia |
| stop-W | 2 | 0.0416667 | 23 | 0.0002161 | 0.0414505 | 0.0051875 | specific_sepia |
| KR | 2 | 0.0416667 | 243 | 0.2120419 | -0.1703752 | 5.0890052 | conserved_cephalopods |
| SG | 2 | 0.0416667 | 106 | 0.0924956 | -0.0508290 | 2.2198953 | conserved_cephalopods |
| YC | 2 | 0.0416667 | 66 | 0.0575916 | -0.0159250 | 1.3821990 | conserved_cephalopods |
| RG | 2 | 0.0416667 | 64 | 0.0558464 | -0.0141798 | 1.3403141 | conserved_cephalopods |
| QR | 2 | 0.0416667 | 63 | 0.0549738 | -0.0133072 | 1.3193717 | conserved_cephalopods |
| IM | 1 | 0.0208333 | 30 | 0.0261780 | -0.0053447 | 1.2565445 | conserved_cephalopods |
| MV | 1 | 0.0208333 | 26 | 0.0226876 | -0.0018543 | 1.0890052 | conserved_cephalopods |
| NS | 2 | 0.0416667 | 46 | 0.0401396 | 0.0015271 | 0.9633508 | conserved_cephalopods |
| KE | 2 | 0.0416667 | 43 | 0.0375218 | 0.0041449 | 0.9005236 | conserved_cephalopods |
| ND | 2 | 0.0416667 | 40 | 0.0349040 | 0.0067627 | 0.8376963 | conserved_cephalopods |
| IV | 3 | 0.0625000 | 58 | 0.0506108 | 0.0118892 | 0.8097731 | conserved_cephalopods |
| syn | 15 | 0.3125000 | 259 | 0.2260035 | 0.0864965 | 0.7232112 | conserved_cephalopods |
| EG | 2 | 0.0416667 | 32 | 0.0279232 | 0.0137435 | 0.6701571 | conserved_cephalopods |
| TA | 4 | 0.0833333 | 44 | 0.0383944 | 0.0449389 | 0.4607330 | conserved_cephalopods |
| HR | 2 | 0.0416667 | 13 | 0.0113438 | 0.0303229 | 0.2722513 | conserved_cephalopods |
| DG | 2 | 0.0416667 | 11 | 0.0095986 | 0.0320681 | 0.2303665 | conserved_cephalopods |
| stop-W | 2 | 0.0416667 | 2 | 0.0017452 | 0.0399215 | 0.0418848 | conserved_cephalopods |
| KR | 2 | 0.0416667 | 8304 | 0.1327048 | -0.0910381 | 3.1849141 | specific_oct_bim |
| SG | 2 | 0.0416667 | 4503 | 0.0719616 | -0.0302950 | 1.7270795 | specific_oct_bim |
| NS | 2 | 0.0416667 | 3484 | 0.0556772 | -0.0140105 | 1.3362525 | specific_oct_bim |
| IM | 1 | 0.0208333 | 1543 | 0.0246584 | -0.0038251 | 1.1836037 | specific_oct_bim |
| KE | 2 | 0.0416667 | 3020 | 0.0482621 | -0.0065954 | 1.1582901 | specific_oct_bim |
| ND | 2 | 0.0416667 | 3006 | 0.0480384 | -0.0063717 | 1.1529205 | specific_oct_bim |
| syn | 15 | 0.3125000 | 21881 | 0.3496764 | -0.0371764 | 1.1189644 | specific_oct_bim |
| YC | 2 | 0.0416667 | 2657 | 0.0424610 | -0.0007944 | 1.0190651 | specific_oct_bim |
| MV | 1 | 0.0208333 | 1225 | 0.0195765 | 0.0012568 | 0.9396724 | specific_oct_bim |
| RG | 2 | 0.0416667 | 2315 | 0.0369956 | 0.0046711 | 0.8878945 | specific_oct_bim |
| QR | 2 | 0.0416667 | 2014 | 0.0321854 | 0.0094813 | 0.7724491 | specific_oct_bim |
| IV | 3 | 0.0625000 | 2986 | 0.0477187 | 0.0147813 | 0.7634998 | specific_oct_bim |
| TA | 4 | 0.0833333 | 3287 | 0.0525290 | 0.0308044 | 0.6303476 | specific_oct_bim |
| EG | 2 | 0.0416667 | 1267 | 0.0202477 | 0.0214190 | 0.4859449 | specific_oct_bim |
| DG | 2 | 0.0416667 | 576 | 0.0092050 | 0.0324617 | 0.2209189 | specific_oct_bim |
| HR | 2 | 0.0416667 | 498 | 0.0079584 | 0.0337082 | 0.1910028 | specific_oct_bim |
| stop-W | 2 | 0.0416667 | 9 | 0.0001438 | 0.0415228 | 0.0034519 | specific_oct_bim |
| KR | 2 | 0.0416667 | 8647 | 0.1334393 | -0.0917726 | 3.2025432 | specific_squid |
| SG | 2 | 0.0416667 | 4518 | 0.0697211 | -0.0280545 | 1.6733075 | specific_squid |
| NS | 2 | 0.0416667 | 3451 | 0.0532554 | -0.0115887 | 1.2781284 | specific_squid |
| syn | 15 | 0.3125000 | 22614 | 0.3489761 | -0.0364761 | 1.1167235 | specific_squid |
| KE | 2 | 0.0416667 | 2886 | 0.0445363 | -0.0028697 | 1.0688724 | specific_squid |
| MV | 1 | 0.0208333 | 1387 | 0.0214040 | -0.0005707 | 1.0273916 | specific_squid |
| RG | 2 | 0.0416667 | 2706 | 0.0417586 | -0.0000919 | 1.0022068 | specific_squid |
| ND | 2 | 0.0416667 | 2678 | 0.0413265 | 0.0003401 | 0.9918365 | specific_squid |
| YC | 2 | 0.0416667 | 2622 | 0.0404623 | 0.0012043 | 0.9710961 | specific_squid |
| IM | 1 | 0.0208333 | 1252 | 0.0193207 | 0.0015126 | 0.9273931 | specific_squid |
| QR | 2 | 0.0416667 | 2372 | 0.0366044 | 0.0050623 | 0.8785050 | specific_squid |
| IV | 3 | 0.0625000 | 2984 | 0.0460487 | 0.0164513 | 0.7367788 | specific_squid |
| EG | 2 | 0.0416667 | 1910 | 0.0294749 | 0.0121918 | 0.7073965 | specific_squid |
| TA | 4 | 0.0833333 | 3410 | 0.0526226 | 0.0307107 | 0.6314717 | specific_squid |
| DG | 2 | 0.0416667 | 782 | 0.0120677 | 0.0295990 | 0.2896252 | specific_squid |
| HR | 2 | 0.0416667 | 569 | 0.0087807 | 0.0328859 | 0.2107375 | specific_squid |
| stop-W | 2 | 0.0416667 | 13 | 0.0002006 | 0.0414661 | 0.0048147 | specific_squid |
| KR | 2 | 0.0416667 | 12722 | 0.1315330 | -0.0898663 | 3.1567912 | specific_oct_vul |
| SG | 2 | 0.0416667 | 6870 | 0.0710290 | -0.0293624 | 1.7046970 | specific_oct_vul |
| NS | 2 | 0.0416667 | 5197 | 0.0537319 | -0.0120652 | 1.2895648 | specific_oct_vul |
| KE | 2 | 0.0416667 | 4652 | 0.0480971 | -0.0064304 | 1.1543305 | specific_oct_vul |
| syn | 15 | 0.3125000 | 33868 | 0.3501618 | -0.0376618 | 1.1205178 | specific_oct_vul |
| ND | 2 | 0.0416667 | 4511 | 0.0466393 | -0.0049726 | 1.1193433 | specific_oct_vul |
| IM | 1 | 0.0208333 | 2221 | 0.0229630 | -0.0021296 | 1.1022219 | specific_oct_vul |
| YC | 2 | 0.0416667 | 4237 | 0.0438064 | -0.0021397 | 1.0513539 | specific_oct_vul |
| MV | 1 | 0.0208333 | 1970 | 0.0203679 | 0.0004655 | 0.9776574 | specific_oct_vul |
| RG | 2 | 0.0416667 | 3692 | 0.0381716 | 0.0034950 | 0.9161196 | specific_oct_vul |
| QR | 2 | 0.0416667 | 3083 | 0.0318752 | 0.0097915 | 0.7650045 | specific_oct_vul |
| IV | 3 | 0.0625000 | 4565 | 0.0471976 | 0.0153024 | 0.7551618 | specific_oct_vul |
| TA | 4 | 0.0833333 | 5101 | 0.0527393 | 0.0305940 | 0.6328719 | specific_oct_vul |
| EG | 2 | 0.0416667 | 2287 | 0.0236453 | 0.0180213 | 0.5674879 | specific_oct_vul |
| DG | 2 | 0.0416667 | 963 | 0.0099565 | 0.0317102 | 0.2389553 | specific_oct_vul |
| HR | 2 | 0.0416667 | 767 | 0.0079300 | 0.0337366 | 0.1903206 | specific_oct_vul |
| stop-W | 2 | 0.0416667 | 15 | 0.0001551 | 0.0415116 | 0.0037220 | specific_oct_vul |
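
The derived columns of the table can be reproduced from its two input columns. The following is a minimal sketch, assuming (from checking the human rows, not from anything stated in the text) that `Expected_frequency` is `Expected_amount` divided by its per-species total, `frequency` is `edits` divided by its per-species total, `difference` is expected minus observed frequency, and `fold_difference` is observed over expected frequency. The name `derive_columns` is illustrative and does not come from the original analysis.

```python
# Human rows copied from the table: (change, expected_amount, edits).
HUMAN = [
    ("KR", 2, 220), ("SG", 2, 151), ("IM", 1, 38), ("TA", 4, 148),
    ("IV", 3, 101), ("MV", 1, 31), ("HR", 2, 42), ("NS", 2, 93),
    ("ND", 2, 87), ("QR", 2, 128), ("KE", 2, 176), ("EG", 2, 114),
    ("RG", 2, 99), ("YC", 2, 64), ("DG", 2, 47), ("stop-W", 2, 5),
]


def derive_columns(rows):
    """Return {change: (expected_freq, observed_freq, difference, fold)}.

    Assumed definitions (inferred from the table, not stated in the text):
      expected_freq = expected_amount / sum(expected_amount)
      observed_freq = edits / sum(edits)
      difference    = expected_freq - observed_freq
      fold          = observed_freq / expected_freq
    """
    total_expected = sum(expected for _, expected, _ in rows)
    total_edits = sum(edits for _, _, edits in rows)
    derived = {}
    for change, expected, edits in rows:
        expected_freq = expected / total_expected
        observed_freq = edits / total_edits
        derived[change] = (
            expected_freq,
            observed_freq,
            expected_freq - observed_freq,
            observed_freq / expected_freq,
        )
    return derived


human = derive_columns(HUMAN)
for change, (exp_f, obs_f, diff, fold) in human.items():
    print(f"{change:7s} {exp_f:.7f} {obs_f:.7f} {diff:+.7f} {fold:.7f}")
```

Under these definitions, a fold difference above 1 means a change is observed more often than expected (KR in every group), and a value below 1 means it is observed less often (stop-W in every group).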
With this plot we are trying to explore three questions: 1) Which amino acid changes differ the most across species, and which are the most similar? 2) How and why are they different or similar? 3) What is the general pattern?
First observation:
Humans show a different pattern from every one of the cephalopod species, while the cephalopod species share a similar pattern with each other. This suggests that the amino acid preference is connected to what makes cephalopods special.
Second observation:
The EG, TA, and KE changes in humans differ the most from those in cephalopods, so they are the most likely to have a negative effect. They may also contribute to what makes cephalopods special.
If we check these three changes, it turns out that EG and KE are radical changes and TA is conservative.
Observation:
Squid and sepia have numbers of edits similar to those of oct_bim and oct_vul, which are two different species of octopus.
How are the distributions different or similar across species?
First, we notice that KR and synonymous changes are conservative. They have a higher frequency in the cephalopod species, which means that the ratio of radical to conservative changes in the cephalopod species is lower than the ratio in humans and in conserved_cephalopods.
This matches the difference in the ratio of radical to conservative changes across all the species reported in the research:
Ratio of radical to conservative
The part of an RNA that was changed is called an editing site. Here we wanted to see whether, for any two editing sites, the distance between them is correlated with the difference in their editing levels. Distance is the number of nucleotides (letters) between two editing sites, and the editing level is the percentage of RNA copies that are edited at that site.
Therefore, for every possible pair of editing sites, their distance and difference in editing level were recorded.
Data sets consist of three columns: the first column is the gene name, the second column is the distance, the third column is the difference in editing level.
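
The construction of these three-column data sets can be sketched as follows. This is only an illustration under stated assumptions: the input values are made up, pairs are formed only between sites of the same gene (suggested by the gene-name column, but not stated), and distance is taken as the absolute difference of the two site positions. The names `SITES` and `pairwise_table` are hypothetical.

```python
from itertools import combinations

# Hypothetical input: gene name -> list of (position, editing_level_percent).
# These values are invented for illustration only.
SITES = {
    "geneA": [(100, 5.0), (160, 30.0), (900, 12.5)],
    "geneB": [(40, 50.0), (45, 48.0)],
}


def pairwise_table(sites_by_gene):
    """Build (gene, distance, diff_in_editing_level) rows for every site pair.

    Assumptions: pairs are formed only within a gene, and distance is the
    absolute difference of the two positions.
    """
    rows = []
    for gene, sites in sites_by_gene.items():
        for (pos_a, level_a), (pos_b, level_b) in combinations(sites, 2):
            rows.append((gene, abs(pos_a - pos_b), abs(level_a - level_b)))
    return rows


rows = pairwise_table(SITES)
for row in rows:
    print(row)
```

A gene with n editing sites contributes n(n-1)/2 rows, which is one reason the resulting data sets contain so many points.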
Because these graphs contain so many data points, it is difficult to see any correlation. For Octopus vulgaris and Octopus bimaculoides, however, we can see that the bigger the difference in editing level, the smaller the distance between the editing sites. So, let’s look at the part with distance > 5000 and difference in editing level > 10 to see the trend.
[Plot: distance vs. difference in editing level for the filtered subset, with a `geom_smooth()` trend line (method = 'gam')]
Or randomly choosing 1500 data points:
[Plot: distance vs. difference in editing level for the 1500 randomly chosen points, with a `geom_smooth()` trend line (method = 'gam')]
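
The two ways of thinning the data described above (keeping only rows with distance > 5000 and difference in editing level > 10, or drawing 1500 rows at random) can be sketched as follows. This is a minimal sketch on synthetic stand-in data, not the original analysis code; the function names, the fixed seed, and the row layout `(gene, distance, diff_in_editing_level)` are assumptions.

```python
import random


def filter_rows(rows, min_distance=5000, min_diff=10):
    """Keep rows with distance > min_distance and difference > min_diff."""
    return [r for r in rows if r[1] > min_distance and r[2] > min_diff]


def sample_rows(rows, n=1500, seed=0):
    """Draw n rows without replacement (all rows if fewer than n exist)."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(rows, min(n, len(rows)))


# Synthetic stand-in data: (gene, distance, diff_in_editing_level).
rng = random.Random(1)
rows = [("gene", rng.randint(1, 20000), rng.uniform(0, 100)) for _ in range(10000)]

subset = filter_rows(rows)
subsample = sample_rows(rows)
print(len(subset), len(subsample))
```

Filtering changes which region of the plot is shown, while random subsampling keeps the overall shape of the distribution and only reduces overplotting.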
As distance increases, there are fewer data points and the difference in editing level is smaller.
Now it is clear that smaller distances and smaller differences in editing level occur more often than larger values. Also, the “Difference in editing level in octopus bimaculoides” histogram explains the high concentration of points at differences of around 0, 25/2 (12.5), and 25 on the “distance vs. diff_in_editing_level in octopus bimaculoides” graph: these difference values simply occur more often.