Introduction

What is RNA editing?

RNA editing is a process that changes an RNA transcript so that it no longer corresponds to the DNA sequence in the genome. A-to-I RNA editing is widespread in animals: an adenosine (A) is modified to inosine (I), which is read as guanosine (G). A-to-I editing in messenger RNA (mRNA) can change the amino acid sequence of a protein (amino acid recoding). It was recently discovered that RNA editing and the resulting amino acid changes are widespread in cephalopods (octopus and squid). Article: Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods, Cell 169, 191–202 (2017).

(RNA is a copy of one specific part of DNA, but after it is made it can still be changed in ways that the DNA does not encode. There are many types of such changes. We are looking at the one where nucleotide A is changed so that it is read as nucleotide G, which changes the codon. Codons (64 combinations in total) encode specific amino acids, and one amino acid can be encoded by more than one codon. The changes are made by enzymes that catalyze the editing.)
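A minimal Python sketch of amino acid recoding, assuming the standard genetic code and a coding sequence written in DNA letters (T rather than U). Inosine is modeled simply as G, since that is how it is read. The helper names `translate` and `edit_a_to_g` and the example sequence are hypothetical, chosen only for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are listed TTT, TTC, TTA, TTG, TCT, ..., GGG
# and '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate a coding sequence (length divisible by 3) codon by codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def edit_a_to_g(seq, pos):
    """Return seq with the A at 0-based position pos read as G (A-to-I editing)."""
    if seq[pos] != "A":
        raise ValueError(f"position {pos} is {seq[pos]!r}, not 'A'")
    return seq[:pos] + "G" + seq[pos + 1:]


mrna = "ATGAAAGAT"              # codons ATG AAA GAT -> M K D
edited = edit_a_to_g(mrna, 4)   # middle A of AAA (K) -> AGA (R)
print(translate(mrna), "->", translate(edited))
```

Running it prints `MKD -> MRD`: editing the middle A of the lysine codon AAA gives AGA, an arginine codon, which is a K-to-R recoding.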

Codons

[Figure: Codons]

Amino acids

[Figure: Amino acids]

Description of the data:

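The data set consists of the calculated expected distribution (Expected_amount, Expected_frequency) and the observed distribution (edits, frequency) of amino acid changes in humans, in four cephalopod species, and in edits conserved across those species. In the table, frequency is the share of a species' edits that fall in a given change, difference is Expected_frequency - frequency, and fold_difference is frequency / Expected_frequency.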

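  • KR: a change from amino acid K (lysine) to amino acid R (arginine); the other two-letter codes follow the same pattern.
  • syn: synonymous, with no change to the amino acid; likely has no effect.
  • stop-W: a stop codon changed to W (tryptophan); causes a significant change in the protein sequence, because translation continues past the original stop.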
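The Expected_amount column appears to count, over all 64 codons, the positions that hold an A, grouped by the change an A-to-G edit at that position would cause. This reading is our inference from the numbers, not something the source states, but it reproduces the table: KR = 2, TA = 4, IV = 3, syn = 15 (stop-to-stop counted as synonymous), 48 A positions in total (2/48 = 0.0416667 for the cephalopods), and 48 - 15 = 33 for the human rows, which have no syn row (2/33 = 0.0606061). A Python sketch, assuming the standard genetic code; `change_label` and `expected_amounts` are hypothetical helper names.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def change_label(before, after):
    """Label an amino acid change with the naming used in the data table."""
    if before == after:
        return "syn"               # includes stop -> stop
    if before == "*":
        return f"stop-{after}"     # e.g. "stop-W"
    return before + after          # e.g. "KR"


def expected_amounts():
    """Count (codon, A position) pairs by the change an A->G edit there causes."""
    counts = Counter()
    for codon, aa in CODON_TABLE.items():
        for i, base in enumerate(codon):
            if base == "A":
                edited = codon[:i] + "G" + codon[i + 1:]
                counts[change_label(aa, CODON_TABLE[edited])] += 1
    return counts


amounts = expected_amounts()
total = sum(amounts.values())            # every A position in the 64 codons
nonsyn_total = total - amounts["syn"]    # denominator if syn is left out
print(amounts["KR"], amounts["TA"], amounts["syn"], total, nonsyn_total)
```

It reports 48 A positions in total and 33 once the 15 synonymous ones are removed, matching the two denominators used in the table.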

Types of amino acid changes:

  • All amino acid changes are categorized as either radical or conservative.
  • Radical changes alter the physicochemical properties of the amino acid, while conservative changes do not.

The ratio of radical to conservative changes indicates how many changes are likely to have a negative effect.
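The ratio itself is simple arithmetic once every change has a class. The source does not list the full radical/conservative classification, so the sketch below takes it as an input (`classes`, a hypothetical parameter). The example uses only the four changes classified elsewhere in this document (EG and KE radical, KR and TA conservative) with their human edit counts from the data table; it illustrates the calculation and is not the actual human ratio.

```python
def radical_to_conservative_ratio(edits, classes):
    """Ratio of edits in radical changes to edits in conservative changes.

    edits:   change label -> number of edits, e.g. {"KR": 220, ...}
    classes: change label -> "R" (radical) or "C" (conservative).
             Labels missing from classes (for example "syn") are ignored,
             so whether syn counts as conservative is the caller's choice.
    """
    radical = sum(n for change, n in edits.items() if classes.get(change) == "R")
    conservative = sum(n for change, n in edits.items() if classes.get(change) == "C")
    return radical / conservative


# Illustrative subset only: human edit counts for four changes.
human_edits = {"KR": 220, "TA": 148, "EG": 114, "KE": 176}
classes = {"KR": "C", "TA": "C", "EG": "R", "KE": "R"}
print(round(radical_to_conservative_ratio(human_edits, classes), 3))
```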

Amino acid changes: R = radical, C = conservative.

Amino acid changes can also be random or non-random:
  • Random edits are likely to be slightly deleterious; they do not make the animal fitter.
  • Non-random edits are likely to be beneficial, so they are likely to be actively preserved in the population; they are seen more often among conserved edits.

In humans most changes are random and are therefore the most likely to have a negative effect. In individual cephalopod species, amino acid changes are much less random and less likely to have a negative effect. Among conserved edits, changes are assumed to be the least random and the most likely to have a positive effect. This leads to two comparisons:
  1. Humans vs. individual cephalopod species
  2. Individual cephalopod species vs. conserved edits

Data

Change Expected_amount Expected_frequency edits frequency difference fold_difference sp_name
KR 2 0.0606061 220 0.1424870 -0.0818810 2.3510363 human
SG 2 0.0606061 151 0.0977979 -0.0371919 1.6136658 human
IM 1 0.0303030 38 0.0246114 0.0056916 0.8121762 human
TA 4 0.1212121 148 0.0958549 0.0253572 0.7908031 human
IV 3 0.0909091 101 0.0654145 0.0254946 0.7195596 human
MV 1 0.0303030 31 0.0200777 0.0102253 0.6625648 human
HR 2 0.0606061 42 0.0272021 0.0334040 0.4488342 human
NS 2 0.0606061 93 0.0602332 0.0003729 0.9938472 human
ND 2 0.0606061 87 0.0563472 0.0042589 0.9297280 human
QR 2 0.0606061 128 0.0829016 -0.0222955 1.3678756 human
KE 2 0.0606061 176 0.1139896 -0.0533836 1.8808290 human
EG 2 0.0606061 114 0.0738342 -0.0132281 1.2182642 human
RG 2 0.0606061 99 0.0641192 -0.0035131 1.0579663 human
YC 2 0.0606061 64 0.0414508 0.0191553 0.6839378 human
DG 2 0.0606061 47 0.0304404 0.0301656 0.5022668 human
stop-W 2 0.0606061 5 0.0032383 0.0573677 0.0534326 human
KR 2 0.0416667 14503 0.1362949 -0.0946282 3.2710767 specific_sepia
SG 2 0.0416667 6856 0.0644306 -0.0227640 1.5463354 specific_sepia
NS 2 0.0416667 5836 0.0548450 -0.0131783 1.3162796 specific_sepia
KE 2 0.0416667 5552 0.0521760 -0.0105094 1.2522249 specific_sepia
YC 2 0.0416667 4843 0.0455131 -0.0038464 1.0923136 specific_sepia
syn 15 0.3125000 36310 0.3412305 -0.0287305 1.0919377 specific_sepia
ND 2 0.0416667 4807 0.0451748 -0.0035081 1.0841940 specific_sepia
MV 1 0.0208333 2396 0.0225169 -0.0016836 1.0808108 specific_sepia
RG 2 0.0416667 4086 0.0383990 0.0032677 0.9215762 specific_sepia
IM 1 0.0208333 2003 0.0188236 0.0020097 0.9035326 specific_sepia
QR 2 0.0416667 3678 0.0345647 0.0071019 0.8295539 specific_sepia
IV 3 0.0625000 5041 0.0473738 0.0151262 0.7579810 specific_sepia
EG 2 0.0416667 2902 0.0272721 0.0143945 0.6545311 specific_sepia
TA 4 0.0833333 5523 0.0519035 0.0314298 0.6228421 specific_sepia
DG 2 0.0416667 1103 0.0103657 0.0313010 0.2487759 specific_sepia
HR 2 0.0416667 947 0.0088996 0.0327670 0.2135910 specific_sepia
stop-W 2 0.0416667 23 0.0002161 0.0414505 0.0051875 specific_sepia
KR 2 0.0416667 243 0.2120419 -0.1703752 5.0890052 conserved_cephalopods
SG 2 0.0416667 106 0.0924956 -0.0508290 2.2198953 conserved_cephalopods
YC 2 0.0416667 66 0.0575916 -0.0159250 1.3821990 conserved_cephalopods
RG 2 0.0416667 64 0.0558464 -0.0141798 1.3403141 conserved_cephalopods
QR 2 0.0416667 63 0.0549738 -0.0133072 1.3193717 conserved_cephalopods
IM 1 0.0208333 30 0.0261780 -0.0053447 1.2565445 conserved_cephalopods
MV 1 0.0208333 26 0.0226876 -0.0018543 1.0890052 conserved_cephalopods
NS 2 0.0416667 46 0.0401396 0.0015271 0.9633508 conserved_cephalopods
KE 2 0.0416667 43 0.0375218 0.0041449 0.9005236 conserved_cephalopods
ND 2 0.0416667 40 0.0349040 0.0067627 0.8376963 conserved_cephalopods
IV 3 0.0625000 58 0.0506108 0.0118892 0.8097731 conserved_cephalopods
syn 15 0.3125000 259 0.2260035 0.0864965 0.7232112 conserved_cephalopods
EG 2 0.0416667 32 0.0279232 0.0137435 0.6701571 conserved_cephalopods
TA 4 0.0833333 44 0.0383944 0.0449389 0.4607330 conserved_cephalopods
HR 2 0.0416667 13 0.0113438 0.0303229 0.2722513 conserved_cephalopods
DG 2 0.0416667 11 0.0095986 0.0320681 0.2303665 conserved_cephalopods
stop-W 2 0.0416667 2 0.0017452 0.0399215 0.0418848 conserved_cephalopods
KR 2 0.0416667 8304 0.1327048 -0.0910381 3.1849141 specific_oct_bim
SG 2 0.0416667 4503 0.0719616 -0.0302950 1.7270795 specific_oct_bim
NS 2 0.0416667 3484 0.0556772 -0.0140105 1.3362525 specific_oct_bim
IM 1 0.0208333 1543 0.0246584 -0.0038251 1.1836037 specific_oct_bim
KE 2 0.0416667 3020 0.0482621 -0.0065954 1.1582901 specific_oct_bim
ND 2 0.0416667 3006 0.0480384 -0.0063717 1.1529205 specific_oct_bim
syn 15 0.3125000 21881 0.3496764 -0.0371764 1.1189644 specific_oct_bim
YC 2 0.0416667 2657 0.0424610 -0.0007944 1.0190651 specific_oct_bim
MV 1 0.0208333 1225 0.0195765 0.0012568 0.9396724 specific_oct_bim
RG 2 0.0416667 2315 0.0369956 0.0046711 0.8878945 specific_oct_bim
QR 2 0.0416667 2014 0.0321854 0.0094813 0.7724491 specific_oct_bim
IV 3 0.0625000 2986 0.0477187 0.0147813 0.7634998 specific_oct_bim
TA 4 0.0833333 3287 0.0525290 0.0308044 0.6303476 specific_oct_bim
EG 2 0.0416667 1267 0.0202477 0.0214190 0.4859449 specific_oct_bim
DG 2 0.0416667 576 0.0092050 0.0324617 0.2209189 specific_oct_bim
HR 2 0.0416667 498 0.0079584 0.0337082 0.1910028 specific_oct_bim
stop-W 2 0.0416667 9 0.0001438 0.0415228 0.0034519 specific_oct_bim
KR 2 0.0416667 8647 0.1334393 -0.0917726 3.2025432 specific_squid
SG 2 0.0416667 4518 0.0697211 -0.0280545 1.6733075 specific_squid
NS 2 0.0416667 3451 0.0532554 -0.0115887 1.2781284 specific_squid
syn 15 0.3125000 22614 0.3489761 -0.0364761 1.1167235 specific_squid
KE 2 0.0416667 2886 0.0445363 -0.0028697 1.0688724 specific_squid
MV 1 0.0208333 1387 0.0214040 -0.0005707 1.0273916 specific_squid
RG 2 0.0416667 2706 0.0417586 -0.0000919 1.0022068 specific_squid
ND 2 0.0416667 2678 0.0413265 0.0003401 0.9918365 specific_squid
YC 2 0.0416667 2622 0.0404623 0.0012043 0.9710961 specific_squid
IM 1 0.0208333 1252 0.0193207 0.0015126 0.9273931 specific_squid
QR 2 0.0416667 2372 0.0366044 0.0050623 0.8785050 specific_squid
IV 3 0.0625000 2984 0.0460487 0.0164513 0.7367788 specific_squid
EG 2 0.0416667 1910 0.0294749 0.0121918 0.7073965 specific_squid
TA 4 0.0833333 3410 0.0526226 0.0307107 0.6314717 specific_squid
DG 2 0.0416667 782 0.0120677 0.0295990 0.2896252 specific_squid
HR 2 0.0416667 569 0.0087807 0.0328859 0.2107375 specific_squid
stop-W 2 0.0416667 13 0.0002006 0.0414661 0.0048147 specific_squid
KR 2 0.0416667 12722 0.1315330 -0.0898663 3.1567912 specific_oct_vul
SG 2 0.0416667 6870 0.0710290 -0.0293624 1.7046970 specific_oct_vul
NS 2 0.0416667 5197 0.0537319 -0.0120652 1.2895648 specific_oct_vul
KE 2 0.0416667 4652 0.0480971 -0.0064304 1.1543305 specific_oct_vul
syn 15 0.3125000 33868 0.3501618 -0.0376618 1.1205178 specific_oct_vul
ND 2 0.0416667 4511 0.0466393 -0.0049726 1.1193433 specific_oct_vul
IM 1 0.0208333 2221 0.0229630 -0.0021296 1.1022219 specific_oct_vul
YC 2 0.0416667 4237 0.0438064 -0.0021397 1.0513539 specific_oct_vul
MV 1 0.0208333 1970 0.0203679 0.0004655 0.9776574 specific_oct_vul
RG 2 0.0416667 3692 0.0381716 0.0034950 0.9161196 specific_oct_vul
QR 2 0.0416667 3083 0.0318752 0.0097915 0.7650045 specific_oct_vul
IV 3 0.0625000 4565 0.0471976 0.0153024 0.7551618 specific_oct_vul
TA 4 0.0833333 5101 0.0527393 0.0305940 0.6328719 specific_oct_vul
EG 2 0.0416667 2287 0.0236453 0.0180213 0.5674879 specific_oct_vul
DG 2 0.0416667 963 0.0099565 0.0317102 0.2389553 specific_oct_vul
HR 2 0.0416667 767 0.0079300 0.0337366 0.1903206 specific_oct_vul
stop-W 2 0.0416667 15 0.0001551 0.0415116 0.0037220 specific_oct_vul
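The derived columns in the table are consistent with these formulas (inferred from the numbers, not stated in the source): Expected_frequency = Expected_amount / total Expected_amount, frequency = edits / total edits for that species, difference = Expected_frequency - frequency, and fold_difference = frequency / Expected_frequency. A Python sketch that recomputes them from the human rows above; `derived_columns` is a hypothetical helper name.

```python
# (change, Expected_amount, edits) for the human rows of the table
human = [
    ("KR", 2, 220), ("SG", 2, 151), ("IM", 1, 38), ("TA", 4, 148),
    ("IV", 3, 101), ("MV", 1, 31), ("HR", 2, 42), ("NS", 2, 93),
    ("ND", 2, 87), ("QR", 2, 128), ("KE", 2, 176), ("EG", 2, 114),
    ("RG", 2, 99), ("YC", 2, 64), ("DG", 2, 47), ("stop-W", 2, 5),
]


def derived_columns(rows):
    """Recompute Expected_frequency, frequency, difference and fold_difference."""
    total_expected = sum(amount for _, amount, _ in rows)
    total_edits = sum(edits for _, _, edits in rows)
    out = {}
    for change, amount, edits in rows:
        expected_freq = amount / total_expected
        freq = edits / total_edits
        out[change] = {
            "Expected_frequency": expected_freq,
            "frequency": freq,
            "difference": expected_freq - freq,
            "fold_difference": freq / expected_freq,
        }
    return out


cols = derived_columns(human)
print({name: round(value, 7) for name, value in cols["KR"].items()})
```

A fold_difference above 1 means the change is edited more often than its share of possible A-to-G positions would predict; below 1 means less often.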

Visualization and Exploration

Expected frequency vs. actual frequency

The comparison of frequencies of changes in all species

With this plot we explore: 1) which amino acid changes differ the most across species and which are the most similar; 2) how and why they differ or agree; and 3) what the general pattern is.

First observation:

Humans show a different pattern from each of the cephalopod species, while the cephalopod species share a similar pattern. This suggests that this amino acid preference is connected to what makes cephalopods special.

Second observation:

The EG, TA, and KE changes differ the most between humans and the cephalopods, and in humans they are therefore more likely to have a negative effect. They also contribute to what makes the cephalopod species special.

Checking these three changes: EG and KE are radical changes, while TA is conservative.

The comparison of amount of changes in cephalopod species

Observation:

Squid and sepia have more similar numbers of edits than oct_bim and oct_vul do, because oct_bim and oct_vul are just different kinds of octopus.

Frequencies in different species

How do the distributions differ between species, and where are they similar?

First, note that KR and synonymous changes are conservative. Both have higher frequencies in the individual cephalopod species, which means that the ratio of radical to conservative changes in cephalopod species is lower than in humans and in conserved_cephalopods.

This matches the differences in the radical-to-conservative ratios across species in the research:

Ratio of radical to conservative changes

[Figure: Ratio of radical to conservative changes]

Modeling

A position in an RNA molecule that is edited is called an editing site. Here we wanted to see whether, for any two editing sites, the distance between them is correlated with the difference in their editing levels. Distance is the number of nucleotides (letters) between the two editing sites, and the editing level is the percentage of RNA copies of that site that are edited.

[Figure: Example of a gene sequence]

Therefore, for every possible pair of editing sites, their distance and difference in editing level were recorded.

Each data set has three columns: the gene name, the distance, and the difference in editing level.
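A minimal Python sketch of how such a three-column table could be built for one gene, assuming each editing site is given as a (position, editing level in %) pair. Two details are assumptions, since the source does not pin them down: distance is taken as the absolute difference of positions, and the level difference as an absolute value. The gene name and sites in the example are hypothetical.

```python
from itertools import combinations


def site_pairs(gene, sites):
    """Return (gene, distance, diff_in_editing_level) for every pair of sites.

    sites: list of (position, editing_level) tuples; position in nucleotides,
    editing_level in percent. Both differences are absolute (an assumption).
    """
    rows = []
    for (pos1, level1), (pos2, level2) in combinations(sites, 2):
        rows.append((gene, abs(pos1 - pos2), abs(level1 - level2)))
    return rows


# Hypothetical gene with three editing sites
for row in site_pairs("geneX", [(100, 20.0), (130, 45.0), (900, 25.0)]):
    print(row)
```

A gene with n editing sites yields n(n - 1)/2 pairs, so the number of rows grows quadratically; this is why the plots below contain so many points.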

Distance vs. Difference in editing level

Because these graphs contain many data points, it is hard to see any correlation. For Octopus vulgaris and Octopus bimaculoides, however, we can see that the bigger the difference in editing level, the smaller the distance between editing sites. To see the trend, let's look at the part of the data with distance > 5000 and difference in editing level > 10.

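[Figure: distance vs. difference in editing level for distance > 5000 and difference > 10, with a trend line from geom_smooth() using method = 'gam']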

Or randomly choosing 1500 data points:

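[Figure: distance vs. difference in editing level for 1500 randomly chosen data points, with a trend line from geom_smooth() using method = 'gam']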
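The plots themselves appear to come from R (ggplot2's geom_smooth), but the two subsetting steps are easy to sketch in Python. This assumes rows shaped as (gene, distance, difference) tuples. The thresholds 5000 and 10 and the sample size 1500 come from the text; the function names and the fixed seed are hypothetical choices added so the subsample is reproducible.

```python
import random


def threshold_subset(rows, min_distance=5000, min_diff=10):
    """Keep rows with distance > min_distance and difference > min_diff."""
    return [r for r in rows if r[1] > min_distance and r[2] > min_diff]


def random_subset(rows, k=1500, seed=0):
    """Draw k rows without replacement (all rows if there are fewer than k)."""
    rng = random.Random(seed)  # fixed seed so the same subsample is drawn each run
    return rng.sample(rows, min(k, len(rows)))
```

Thresholding zooms in on one region of the plot, while random subsampling thins out overplotting everywhere without changing the overall shape of the point cloud.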

As distance increases, there are fewer data points and the difference in editing level is smaller.
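The trend can also be summarized with a single number. The sketch below uses Spearman's rank correlation, a technique we chose here (the source does not name one) because it measures any monotonic trend and is not dominated by extreme values. It is pure Python; if SciPy is available, scipy.stats.spearmanr computes the same statistic. One caveat: each site appears in many pairs, so the pairs are not independent and standard p-values would be overly optimistic.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks.

    Assumes x and y have the same length and neither is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For a data set of (gene, distance, difference) rows, `spearman_rho([r[1] for r in rows], [r[2] for r in rows])` would give a negative value if larger distances tend to go with smaller differences in editing level.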

Distribution of a distance variable

Distribution of difference in editing levels

The histograms make it clear that small distances and small differences in editing level occur more often than large ones. The "Difference in editing level in octopus bimaculoides" histogram also explains why, on the "distance vs. diff_in_editing_level in octopus bimaculoides" graph, data points concentrate around differences of 0, 12.5 (25/2), and 25: these difference values simply occur more often.
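Spikes like these can also be listed directly instead of read off a histogram. A minimal Python sketch, assuming the differences are available as a list of numbers (for example, the third column of a data set); `most_common_values` and its rounding parameter are hypothetical choices for illustration.

```python
from collections import Counter


def most_common_values(diffs, n=3, ndigits=1):
    """Return the n most frequent values of diffs after rounding to ndigits.

    Rounding groups values that differ only by floating-point noise, so that
    spikes such as 0, 12.5 and 25 show up as single entries with large counts.
    """
    counts = Counter(round(d, ndigits) for d in diffs)
    return counts.most_common(n)
```

The result is a list of (value, count) pairs sorted from most to least frequent.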