WORKFLOW

-For each cassette exon, find the upstream and downstream exons
-BLAST the sequences with the cassette exon spliced in and spliced out against the five fish species (e-value <= 0.1, percent identity >= 30, query coverage >= 70%)
-If there are BLAST hits for both the spliced-in and spliced-out sequences, the splicing event is considered conserved
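
The filtering and conservation call in the workflow could be sketched in base R as follows. This is a minimal sketch, not the pipeline actually used: the column names (`event`, `form`, `evalue`, `pident`, `qcov`) and the toy hits are made up for illustration, and the e-value threshold is read as an upper bound.

```r
# Toy BLAST hits for one species; all values and column names are hypothetical.
hits <- data.frame(
  event  = c("ev1", "ev1", "ev2", "ev3", "ev3"),
  form   = c("in", "out", "in", "in", "out"),   # spliced in / spliced out
  evalue = c(1e-20, 1e-15, 1e-8, 0.5, 1e-30),
  pident = c(65, 58, 40, 35, 80),               # percent identity
  qcov   = c(90, 85, 75, 95, 99),               # percent query coverage
  stringsAsFactors = FALSE
)

# Keep hits that pass all three thresholds from the workflow.
passing <- hits[hits$evalue <= 0.1 & hits$pident >= 30 & hits$qcov >= 70, ]

# An event counts as conserved only if both forms still have a passing hit.
events_in  <- unique(passing$event[passing$form == "in"])
events_out <- unique(passing$event[passing$form == "out"])
conserved  <- intersect(events_in, events_out)

# Events seen in the hit table that did not pass for both forms.
non_conserved <- setdiff(unique(hits$event), conserved)
```

Events with no hit at all never appear in a hit table, so in practice the non-conserved set would have to be taken relative to the full list of events that were searched.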

Number of total cassette exons: 4919
Number of genes with cassette exons: 2957

Number of total cassette splicing events: 19009

BLAST RESULTS:

Species        Conserved cassette    Non-conserved cassette    Genes with conserved
               splicing events       splicing events           splicing events
lamprey        5994                  7686                      1028
spotted gar    11046                 4260                      1525
zebrafish      9546                  5891                      1194
fugu           9284                  5316                      1285
coelacanth     7424                  6685                      1315
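
To compare species on a common scale, the counts can be turned into a conserved fraction per species. The sketch below assumes that the conserved and non-conserved counts together cover every event that received a call for that species; the derived column names (`called`, `frac_conserved`) are illustrative.

```r
# Counts copied from the BLAST results table.
counts <- data.frame(
  species       = c("lamprey", "spotted gar", "zebrafish", "fugu", "coelacanth"),
  conserved     = c(5994, 11046, 9546, 9284, 7424),
  non_conserved = c(7686, 4260, 5891, 5316, 6685),
  stringsAsFactors = FALSE
)

# Number of events with a call, and the share of those that are conserved.
counts$called         <- counts$conserved + counts$non_conserved
counts$frac_conserved <- round(counts$conserved / counts$called, 3)

# Sort from the most to the least conserved species.
ranked <- counts[order(-counts$frac_conserved),
                 c("species", "called", "frac_conserved")]
print(ranked)
```

Note that the per-species totals differ from one another and are all below the 19009 total events; the notes do not say why, so the fractions are relative to called events only.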

Splicing Event Conservation:

library(UpSetR)
upset(fromList(listInput), order.by = "freq")

  • i.e., 550 mouse genes have cassette splicing events that are conserved in all 5 fish species
    TODO: BLAST against other mammals
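
A count read off the UpSet plot can be cross-checked by intersecting the per-species sets directly. In this sketch `listInput` is assumed to be a named list with one vector of gene IDs per species, which is the shape `fromList` expects; the gene IDs are toy values, not real results.

```r
# Hypothetical gene IDs with conserved events in each species (toy data).
listInput <- list(
  lamprey       = c("geneA", "geneB"),
  `spotted gar` = c("geneA", "geneB", "geneC"),
  zebrafish     = c("geneA", "geneC"),
  fugu          = c("geneA", "geneB", "geneC"),
  coelacanth    = c("geneA", "geneD")
)

# Genes that appear in every species' set.
in_all <- Reduce(intersect, listInput)

# Number of genes conserved in all five species.
n_in_all <- length(in_all)
print(n_in_all)
```

With the real `listInput`, `n_in_all` should match the bar for the five-way intersection in the plot.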

-Take the first hit (the one with the highest bit score)
-Compare alignment length vs. bit score
-Play around with the data
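
The first two follow-up steps could look like the base R sketch below. It assumes the hits are in a data frame with columns named after the BLAST tabular fields `qseqid`, `sseqid`, `length` and `bitscore`; the rows are toy values for illustration.

```r
# Toy BLAST hits; values are made up.
hits <- data.frame(
  qseqid   = c("q1", "q1", "q2", "q2", "q3"),
  sseqid   = c("s1", "s2", "s3", "s4", "s5"),
  length   = c(120, 80, 200, 150, 60),
  bitscore = c(150.0, 95.5, 310.0, 180.2, 70.1),
  stringsAsFactors = FALSE
)

# Sort by query, then by descending bit score, and keep the first row per query.
ordered  <- hits[order(hits$qseqid, -hits$bitscore), ]
best_hit <- ordered[!duplicated(ordered$qseqid), ]

# Correlation between alignment length and bit score among the best hits.
len_vs_bit <- cor(best_hit$length, best_hit$bitscore)
print(best_hit)
print(len_vs_bit)
```

A scatter plot of the same two columns, for example `plot(best_hit$length, best_hit$bitscore)`, is a natural next step for exploring the data.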