WORKFLOW

Corrections made:

-Take the reverse complement of negative-strand sequences before running BLAST (this was not done for the previous results)

-Use a different identifier (FastDBID) instead of the gene symbol, because some rows have no gene symbol
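
The strand correction can be sketched as follows. This is a minimal illustration, assuming exon sequences are extracted from the forward genomic strand and each row carries a strand field of "+" or "-"; the function names are hypothetical and not taken from the actual pipeline.

```python
# Complement table for DNA bases, including N; lower-case input is handled too.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_corrected(seq: str, strand: str) -> str:
    """Return seq unchanged for '+' and reverse-complemented for '-'."""
    if strand == "+":
        return seq
    if strand == "-":
        return reverse_complement(seq)
    raise ValueError(f"unexpected strand: {strand!r}")
```

Raising on an unexpected strand value, rather than passing the sequence through, makes rows with missing strand information visible before they reach BLAST.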

-For each cassette exon, find the upstream and downstream exons
-BLAST the sequences with the cassette exon spliced in and spliced out against the five fish species (e-value threshold = 0.1, identity >= 30%, query coverage >= 70%)
-If both the spliced-in and the spliced-out sequences have BLAST hits, the splicing event is considered conserved
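
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the spliced-in query is taken to be upstream + cassette + downstream and the spliced-out query upstream + downstream, the e-value threshold is treated as an inclusive upper bound, and the `Hit` record and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str   # hypothetical id, e.g. "<event>|in" or "<event>|out"
    pident: float   # percent identity
    qcov: float     # query coverage, in percent
    evalue: float


def spliced_sequences(upstream: str, cassette: str, downstream: str):
    """Return (spliced_in, spliced_out) query sequences for one cassette exon."""
    spliced_in = upstream + cassette + downstream
    spliced_out = upstream + downstream
    return spliced_in, spliced_out


def passes(hit: Hit, max_evalue=0.1, min_pident=30.0, min_qcov=70.0) -> bool:
    """Apply the thresholds listed above (assumed inclusive)."""
    return (hit.evalue <= max_evalue
            and hit.pident >= min_pident
            and hit.qcov >= min_qcov)


def is_conserved(in_hits, out_hits) -> bool:
    """An event counts as conserved only if both isoforms have a passing hit."""
    return any(passes(h) for h in in_hits) and any(passes(h) for h in out_hits)
```

For example, an event whose spliced-in query has a passing hit but whose spliced-out query has none (or only hits below the thresholds) would be counted as non-conserved.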

Total number of cassette exons: 5051
Number of genes with cassette exons: 3037

Total number of cassette splicing events: 19015

BLAST RESULTS:

Species       Conserved cassette   Non-conserved cassette   Genes with conserved
              splicing events      splicing events          splicing events
lamprey                     6208                     7415                   1086
spotted gar                11557                     3898                   1643
zebrafish                  10136                     5514                   1375
fugu                        9767                     4895                   1409
coelacanth                  7892                     6317                   1448
human                      15478                     1743                   2242
C. elegans                  1983                    14389                    242
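
To compare species on one scale, the counts above can be turned into a conserved fraction per species. This is a small sketch using only the numbers in the table; the C. elegans non-conserved count is read as 1.4389 x 10^4 = 14389, which is an assumption about the original formatting.

```python
# (conserved, non_conserved, genes_with_conserved_events), copied from the table.
counts = {
    "lamprey": (6208, 7415, 1086),
    "spotted gar": (11557, 3898, 1643),
    "zebrafish": (10136, 5514, 1375),
    "fugu": (9767, 4895, 1409),
    "coelacanth": (7892, 6317, 1448),
    "human": (15478, 1743, 2242),
    "C. elegans": (1983, 14389, 242),  # non-conserved read as 1.4389 x 10^4
}


def conserved_fraction(conserved: int, non_conserved: int) -> float:
    """Fraction of classified events that are conserved."""
    return conserved / (conserved + non_conserved)


for species, (cons, non_cons, _genes) in counts.items():
    total = cons + non_cons
    frac = conserved_fraction(cons, non_cons)
    print(f"{species:12s} total={total:6d} conserved={frac:.1%}")
```

The per-species totals (conserved + non-conserved) differ from one another and from the 19015 total events, so the fractions are relative to the events classified for each species, not to a common denominator.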

Splicing Event Conservation:

library(UpSetR)
upset(fromList(listInput), nsets = 7, order.by = "freq")

  • i.e., 164 mouse genes have cassette splicing events that are conserved in all 5 fish species, human, and C. elegans
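
The count of genes conserved in every species is the size of the intersection of the per-species gene sets, which can be checked independently of the plot. The sketch below assumes one set of gene identifiers per species; the toy identifiers are made up and the function name is hypothetical.

```python
def genes_in_all(gene_sets: dict) -> set:
    """Return the genes present in every species' set."""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Toy, made-up input: species -> genes with at least one conserved event.
toy = {
    "zebrafish": {"g1", "g2", "g3"},
    "human": {"g2", "g3", "g4"},
    "C. elegans": {"g2", "g3", "g5"},
}
shared = genes_in_all(toy)
print(len(shared), sorted(shared))
```

Applied to the real per-species sets, the length of the result should match the bar for the all-species intersection in the plot.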

TODO: BLAST against other mammals
-take the first hit (the one with the highest bit score)
-length vs bit score
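
A possible way to implement the best-hit selection is sketched below. It assumes hits are available as (query_id, subject_id, alignment_length, bit_score) tuples, for example parsed from tabular BLAST output; that row layout and the function name are assumptions for illustration.

```python
def best_hits(rows):
    """Keep, for each query, the hit with the highest bit score.

    rows: iterable of (query_id, subject_id, alignment_length, bit_score).
    Ties keep the hit seen first.
    """
    best = {}
    for query_id, subject_id, length, bits in rows:
        if query_id not in best or bits > best[query_id][2]:
            best[query_id] = (subject_id, length, bits)
    return best


# Toy, made-up hits for two queries.
toy_rows = [
    ("e1|in", "s1", 120, 85.0),
    ("e1|in", "s2", 200, 140.5),
    ("e2|out", "s3", 90, 60.2),
]
top = best_hits(toy_rows)
```

Keeping the alignment length next to the bit score in the result leaves the length vs bit score comparison possible on the selected hits.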