WORKFLOW

blastx against the NCBI non-redundant (nr) database

-For each cassette exon, find the upstream and downstream exons
-Blast the spliced-in (cassette exon included) and spliced-out (cassette exon skipped) sequences against the five fish species, human and C. elegans, keeping hits with e-value <= 0.1, identity >= 30% and query coverage >= 70%
-If both the spliced-in and the spliced-out sequence have hits passing these thresholds in a species, the splicing event is counted as conserved in that species
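The three steps above can be sketched in R, the language of the plotting code in these notes. This is a minimal sketch, not the pipeline that produced the counts below. It assumes BLAST+ tabular output for one target species read into a data.frame with columns qseqid, pident, evalue and qcovs (for example from blastx with -outfmt "6 qseqid sseqid pident evalue bitscore qcovs"), treats the e-value threshold as an upper bound, and assumes hypothetical query IDs of the form <event>_in / <event>_out. The helper names isoform_seqs, passing_hits and conserved_events are illustrative, and the hit values are made up.

```r
# Build the two query sequences for one event from its exon sequences
# (upstream exon, cassette exon, downstream exon) given as character strings.
isoform_seqs <- function(up, cassette, down) {
  c(spliced_in  = paste0(up, cassette, down),
    spliced_out = paste0(up, down))
}

# Keep hits passing the workflow thresholds; the e-value cutoff is
# treated as an upper bound (assumption).
passing_hits <- function(hits, max_evalue = 0.1, min_pident = 30, min_qcov = 70) {
  hits[hits$evalue <= max_evalue &
         hits$pident >= min_pident &
         hits$qcovs >= min_qcov, , drop = FALSE]
}

# An event is conserved in this species when both its spliced-in and
# spliced-out queries keep at least one passing hit.
# Query IDs are hypothetically assumed to look like "<event>_in" / "<event>_out".
conserved_events <- function(hits) {
  ok <- passing_hits(hits)
  event <- sub("_(in|out)$", "", ok$qseqid)
  isoform <- sub("^.*_(in|out)$", "\\1", ok$qseqid)
  intersect(event[isoform == "in"], event[isoform == "out"])
}

# Toy hits for one target species (values made up for illustration).
hits <- data.frame(
  qseqid = c("ev1_in", "ev1_out", "ev2_in", "ev2_out", "ev3_in"),
  pident = c(85, 80, 25, 90, 70),
  evalue = c(1e-20, 1e-15, 1e-5, 1e-10, 0.01),
  qcovs  = c(95, 90, 80, 85, 75),
  stringsAsFactors = FALSE
)

isoform_seqs("ATG", "CCC", "TAA")
# ev2_in fails the identity cutoff and ev3 has no spliced-out hit
conserved_events(hits)  # returns "ev1"
```

Running conserved_events() once per target species would give the per-species conserved/non-conserved split tabulated below.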

Total number of cassette exons: 5051
Number of genes with cassette exons: 3037

Total number of cassette splicing events: 19015

BLAST RESULTS:

Species       Conserved events   Non-conserved events   Genes with conserved events
lamprey                     76                  18504                            23
spotted gar               7681                   7784                          1101
zebrafish                 7017                   8366                          1070
fugu                      3284                  13127                           660
coelacanth                7443                   7855                          1210
human                    10901                   4735                          1764
C. elegans                1863                  15494                           172
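A quick way to compare species is the share of events that are conserved. Note that for every species the two counts add up to less than the 19015 total (e.g. 7681 + 7784 = 15465 for spotted gar), so the sketch below assumes the denominator is conserved + non-conserved events; whether that is the intended denominator is not stated in these notes.

```r
# Counts copied from the BLAST RESULTS table.
blast_summary <- data.frame(
  species = c("lamprey", "spotted gar", "zebrafish", "fugu",
              "coelacanth", "human", "C. elegans"),
  conserved = c(76, 7681, 7017, 3284, 7443, 10901, 1863),
  non_conserved = c(18504, 7784, 8366, 13127, 7855, 4735, 15494),
  stringsAsFactors = FALSE
)

# Assumed denominator: events classified as either conserved or non-conserved.
blast_summary$classified <- blast_summary$conserved + blast_summary$non_conserved
blast_summary$pct_conserved <- round(100 * blast_summary$conserved / blast_summary$classified, 1)

# Species ordered from highest to lowest share of conserved events.
print(blast_summary[order(-blast_summary$pct_conserved),
                    c("species", "classified", "pct_conserved")])
```

With these numbers human comes out highest (about 70%) and lamprey lowest (under 1%).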

Splicing Event Conservation:

library(UpSetR)
upset(fromList(listInput), nsets = 7, order.by = "freq")

  • e.g. 11 mouse genes have cassette splicing events that are conserved in all five fish species, human and C. elegans
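listInput itself is not defined in these notes. UpSetR's fromList() takes a named list of vectors, one per set, so one possible way to build it is sketched below, assuming a hypothetical data.frame `conserved` with one row per (gene, species) pair where the gene has at least one conserved event. The number of genes conserved in every set, like the 11 genes mentioned above, is the size of the intersection of all the sets. The gene names are placeholders.

```r
# Hypothetical input: one row per (gene, species) pair in which the gene has
# at least one conserved cassette splicing event.
conserved <- data.frame(
  gene    = c("geneA", "geneA", "geneB", "geneA", "geneB", "geneA", "geneC"),
  species = c("zebrafish", "human", "human", "fugu", "fugu",
              "C. elegans", "C. elegans"),
  stringsAsFactors = FALSE
)

# Named list (species -> unique genes), the shape fromList() expects.
listInput <- lapply(split(conserved$gene, conserved$species), unique)

# Genes present in every set: the all-sets intersection shown as one bar
# in the UpSet plot.
in_all_sets <- Reduce(intersect, listInput)
length(in_all_sets)
```

With the real data, listInput would hold seven sets and length(in_all_sets) would correspond to the all-species bar.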

TODO: blast against other mammals
-take the top hit per query (the one with the highest bit score)
-plot alignment length vs. bit score
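The two TODO items could be prototyped as follows. This is a minimal sketch assuming tabular BLAST hits with qseqid, sseqid, length and bitscore columns (the values here are made up). It sorts explicitly by bit score instead of relying on the row order of the BLAST output, keeps the first row per query, and plots alignment length against bit score with base R graphics.

```r
# Hypothetical BLAST tabular hits (made-up values).
hits <- data.frame(
  qseqid   = c("q1", "q1", "q2", "q2", "q3"),
  sseqid   = c("s1", "s2", "s3", "s4", "s5"),
  length   = c(120, 150, 80, 60, 200),
  bitscore = c(210, 250, 90, 95, 400),
  stringsAsFactors = FALSE
)

# Sort by query, then by decreasing bit score; the first row per query
# is its top hit.
ordered <- hits[order(hits$qseqid, -hits$bitscore), ]
top_hits <- ordered[!duplicated(ordered$qseqid), ]

# Alignment length vs. bit score for the top hits.
plot(top_hits$length, top_hits$bitscore,
     xlab = "Alignment length", ylab = "Bit score",
     main = "Top BLAST hit per query")
```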