WORKFLOW

BLASTX against the GenBank protein database

-For each cassette exon, find the upstream and downstream exons
-BLAST the sequences with the cassette exon spliced in and spliced out against the five fish species (E-value threshold = 0.1, percent identity >= 30, and query coverage >= 70)
-If there are BLAST results for both the spliced-in and spliced-out sequences, then that splicing event is considered conserved in that species
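
The filtering and conservation call described above can be sketched in base R. This is a minimal illustration, not the script actually used: the toy data frame, the `event` and `isoform` bookkeeping columns, and the reading of the thresholds as "E-value <= 0.1" and "coverage in percent" are all assumptions. The `pident`, `evalue`, and `qcovs` column names follow BLAST tabular output field names.

```r
# Toy BLAST hits for one target species (made-up values for illustration).
# event/isoform are hypothetical labels; "in" = cassette exon spliced in,
# "out" = cassette exon spliced out.
hits <- data.frame(
  event   = c("ev1", "ev1", "ev2", "ev3", "ev3"),
  isoform = c("in", "out", "in", "in", "out"),
  pident  = c(45.0, 52.3, 80.1, 28.0, 60.0),
  evalue  = c(1e-20, 1e-15, 1e-30, 1e-5, 1e-8),
  qcovs   = c(90, 85, 95, 88, 75),
  stringsAsFactors = FALSE
)

# Keep hits that pass all three thresholds (assumed to be inclusive bounds).
passing <- hits[hits$evalue <= 0.1 & hits$pident >= 30 & hits$qcovs >= 70, ]

# An event is called conserved when both isoforms have a passing hit.
events_in  <- unique(passing$event[passing$isoform == "in"])
events_out <- unique(passing$event[passing$isoform == "out"])
conserved  <- intersect(events_in, events_out)

print(conserved)
```

In this toy input, ev2 has no spliced-out hit and the spliced-in hit of ev3 fails the identity cutoff, so only ev1 is called conserved.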

Number of total cassette exons: 5051
Number of genes with cassette exons: 3037

Number of total cassette splicing events: 19015

BLAST RESULTS:

(events = cassette splicing events)

Species       Conserved events   Non-conserved events   Genes with conserved events
lamprey                     73                  18506                            20
spotted gar               7257                   7960                           906
zebrafish                 6647                   8553                           892
fugu                      3045                  13216                           541
coelacanth                6978                   8070                          1008
human                    10322                   4975                          1560
C. elegans                1817                  15520                           146

Splicing Event Conservation:

library(UpSetR)  # provides upset() and fromList()
upset(fromList(listInput), nsets = 7, order.by = "freq")

  • e.g., 10 mouse genes have cassette splicing events that are conserved in all 5 fish species, human, and C. elegans
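
As a rough sketch of where such a count comes from: the bar for all sets in the plot corresponds to the genes present in every species' set. The example below assumes `listInput` is a named list with one character vector of gene identifiers per species; the species subset and gene names are placeholders, not real results.

```r
# Hypothetical gene sets: genes with at least one conserved cassette
# splicing event in each species (toy identifiers only).
listInput <- list(
  zebrafish = c("geneA", "geneB", "geneC"),
  human     = c("geneA", "geneC", "geneD"),
  fugu      = c("geneA", "geneC")
)

# Genes shared by every set in the list.
shared <- Reduce(intersect, listInput)

print(shared)
print(length(shared))
```

With the full seven-species list, `length(shared)` would be the number reported for the all-species intersection.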

TODO: BLAST against other mammals
-take the first hit (the one with the highest bit score)
-length vs bit score
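
One possible way to take the top hit per query, sketched in base R. The data frame is toy data, and the column names (`qseqid`, `sseqid`, `length`, `bitscore`) are assumed to follow BLAST tabular output field names; the real hit table may be laid out differently.

```r
# Toy hit table with several hits per query (made-up values).
hits <- data.frame(
  qseqid   = c("q1", "q1", "q2", "q2", "q2"),
  sseqid   = c("s1", "s2", "s3", "s4", "s5"),
  length   = c(120, 95, 200, 180, 60),
  bitscore = c(210.5, 98.2, 55.0, 310.0, 40.1),
  stringsAsFactors = FALSE
)

# Sort by query, then by descending bit score, and keep the first row
# of each query, i.e. its highest-scoring hit.
ordered  <- hits[order(hits$qseqid, -hits$bitscore), ]
top_hits <- ordered[!duplicated(ordered$qseqid), ]

print(top_hits)
```

The length vs bit score comparison could then start from something like `plot(top_hits$length, top_hits$bitscore)`.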