WORKFLOW

blastx against the GenBank protein database

-For each cassette exon, find the upstream and downstream exons
-BLAST the sequences with the cassette exon spliced in and spliced out against the five fish species (e-value threshold = 0.1, identity >= 30%, query coverage >= 70%, and gaps introduced < 30% of query length)
-If there are BLAST hits for both the spliced-in and spliced-out sequences, that splicing event is counted as conserved
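
A minimal Python sketch of the steps above, not the script used for these notes. Assumptions: the spliced-in query is upstream + cassette + downstream and the spliced-out query is upstream + downstream; the e-value cutoff is inclusive; query coverage is a percentage; gaps and query length are in the same units. The names Hit, build_queries, passes_filter and is_conserved are hypothetical.

```python
from dataclasses import dataclass

# Thresholds from the workflow notes; treating the e-value cutoff
# as inclusive (<=) is an assumption.
MAX_EVALUE = 0.1
MIN_IDENTITY_PCT = 30.0
MIN_QUERY_COVERAGE_PCT = 70.0
MAX_GAP_FRACTION = 0.30


@dataclass
class Hit:
    """One BLAST hit (hypothetical field names)."""
    evalue: float
    identity_pct: float
    query_coverage_pct: float
    gaps: int
    query_length: int


def build_queries(upstream: str, cassette: str, downstream: str) -> dict:
    """Build the two query sequences for one cassette exon (assumed layout)."""
    return {
        "spliced_in": upstream + cassette + downstream,
        "spliced_out": upstream + downstream,
    }


def passes_filter(hit: Hit) -> bool:
    """True if the hit meets all four thresholds."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.identity_pct >= MIN_IDENTITY_PCT
        and hit.query_coverage_pct >= MIN_QUERY_COVERAGE_PCT
        and hit.gaps < MAX_GAP_FRACTION * hit.query_length
    )


def is_conserved(spliced_in_hits, spliced_out_hits) -> bool:
    """Conserved: at least one passing hit for BOTH query forms."""
    return any(passes_filter(h) for h in spliced_in_hits) and any(
        passes_filter(h) for h in spliced_out_hits
    )
```

Each splicing event is checked once per target species, with the hit lists restricted to that species.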

Number of total cassette exons: 4926
Number of genes with cassette exons: 2961

Number of total cassette splicing events: 18354

Non-conserved means there is a BLAST hit for either the spliced-in or the spliced-out sequence, but not both (exclusive or)
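
The conserved / non-conserved definitions can be written as a small Python function. This is a sketch; the function name and the "no hit" label for events where neither sequence has a hit are assumptions, since the notes do not name that case. For every species in the results, the conserved and non-conserved counts sum to less than 18354, which is consistent with such a remaining group.

```python
def classify_event(spliced_in_has_hit: bool, spliced_out_has_hit: bool) -> str:
    """Label one splicing event in one target species."""
    if spliced_in_has_hit and spliced_out_has_hit:
        return "conserved"
    if spliced_in_has_hit != spliced_out_has_hit:  # exclusive or
        return "non-conserved"
    return "no hit"  # assumed label; not defined in the notes
```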

BLAST RESULTS:

Species        Conserved cassette splicing    Non-conserved cassette splicing    Genes with at least one
               events (# of genes)            events (# of genes)                conserved isoform
spotted gar    5274 (933)                     7727 (1630)                        2128
zebrafish      5304 (948)                     7625 (1578)                        2102
fugu           2093 (462)                     6258 (1187)                        1447
coelacanth     4995 (883)                     8192 (1725)                        2185
human          6731 (1343)                    8337 (1733)                        2493
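
Event and gene counts of this kind can be tallied from per-event labels. A sketch, assuming one (event_id, gene_id, label) record per splicing event for a single species; the record layout and the function name tally are hypothetical.

```python
from collections import defaultdict


def tally(records):
    """Count distinct events and distinct genes per label for one species.

    records: iterable of (event_id, gene_id, label) tuples.
    Returns {label: (n_events, n_genes)}.
    """
    events = defaultdict(set)
    genes = defaultdict(set)
    for event_id, gene_id, label in records:
        events[label].add(event_id)
        genes[label].add(gene_id)
    return {label: (len(events[label]), len(genes[label])) for label in events}
```

A gene with several splicing events can be counted under more than one label, so the per-label gene counts need not add up to a total number of genes.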

Splicing Event Conservation:

library(UpSetR)  # provides upset() and fromList()
upset(fromList(listInput), nsets = 5, order.by = "freq")

Denominator: 2961 genes
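
Gene counts can be expressed as a percentage of this denominator. The sketch below assumes the denominator applies to the "genes with at least one conserved isoform" counts from the BLAST results; the notes do not state this explicitly, and the variable names are hypothetical.

```python
TOTAL_GENES = 2961  # genes with cassette exons (the denominator)

# Counts copied from the BLAST results above.
genes_with_conserved_isoform = {
    "spotted gar": 2128,
    "zebrafish": 2102,
    "fugu": 1447,
    "coelacanth": 2185,
    "human": 2493,
}


def percent_of_denominator(count: int, total: int = TOTAL_GENES) -> float:
    """Express a gene count as a percentage of the denominator."""
    return 100.0 * count / total


percentages = {
    species: percent_of_denominator(n)
    for species, n in genes_with_conserved_isoform.items()
}

for species, pct in percentages.items():
    print(f"{species}: {pct:.1f}%")
```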

25 cassette splicing events are conserved in all 5 fish species (not human)
13 genes have cassette splicing events that are conserved in all 5 fish species (not human)
1750 cassette splicing events are conserved in all 5 fish species and human
373 genes have cassette splicing events that are conserved in all 5 fish species and human
6464 cassette splicing events from 1161 genes are conserved in at least one fish species
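
The three kinds of counts above correspond to set operations on the per-species sets of conserved events. A sketch with toy data, assuming "(not human)" means the event is not conserved in human; the function names and species labels in the example are hypothetical, and each species list must be non-empty.

```python
def in_all(sets_by_species, species):
    """Events conserved in every listed species (intersection)."""
    return set.intersection(*(sets_by_species[s] for s in species))


def in_any(sets_by_species, species):
    """Events conserved in at least one listed species (union)."""
    return set.union(*(sets_by_species[s] for s in species))


def in_all_fish_not_human(sets_by_species, fish, human="human"):
    """Events conserved in every fish species but not in human."""
    return in_all(sets_by_species, fish) - sets_by_species[human]


# Toy example; not real data.
toy = {
    "fish_a": {"e1", "e2", "e3"},
    "fish_b": {"e1", "e2"},
    "human": {"e1", "e4"},
}
fish = ["fish_a", "fish_b"]

all_fish_and_human = in_all(toy, fish + ["human"])
fish_only = in_all_fish_not_human(toy, fish)
any_fish = in_any(toy, fish)
```

Gene-level counts follow by mapping each event in the resulting set to its gene and counting distinct genes.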