-For each cassette exon, find the upstream and downstream exons
-Blast the sequences with the cassette exon spliced in and spliced out against the five fish species (e-value threshold = 0.1, identity >= 30%, query coverage >= 70%)
-If there are blast hits for both the spliced-in and spliced-out sequences, the splicing event is counted as conserved
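A minimal sketch of the filtering step above, in Python for illustration (the notes do not say which language the pipeline uses). It assumes the hits were written in tabular form with `-outfmt "6 std qcovs"`, that the e-value threshold means e-value <= 0.1, and that queries are named `<event>_in` and `<event>_out`. The naming scheme, the function name `passing_queries`, and the example rows are all hypothetical.

```python
import csv
import io

# Column order assumed for: -outfmt "6 std qcovs"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "qcovs"]

MAX_EVALUE = 0.1           # e-value threshold from the notes
MIN_IDENTITY = 30.0        # identity % >= 30
MIN_QUERY_COVERAGE = 70.0  # query coverage >= 70


def passing_queries(handle):
    """Return the query IDs that have at least one hit passing all thresholds."""
    passed = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if (float(hit["evalue"]) <= MAX_EVALUE
                and float(hit["pident"]) >= MIN_IDENTITY
                and float(hit["qcovs"]) >= MIN_QUERY_COVERAGE):
            passed.add(hit["qseqid"])
    return passed


# Made-up rows, only to exercise the filter.
example = (
    "evt1_in\tsubjA\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-30\t150\t95\n"
    "evt1_out\tsubjA\t80.0\t100\t20\t0\t1\t100\t5\t104\t1e-25\t130\t90\n"
    "evt2_in\tsubjB\t45.0\t90\t49\t0\t1\t90\t7\t96\t0.05\t40\t75\n"
    "evt2_out\tsubjB\t28.0\t80\t57\t0\t1\t80\t7\t86\t0.01\t42\t80\n"  # fails identity
    "evt3_in\tsubjC\t60.0\t40\t16\t0\t1\t40\t3\t42\t1e-5\t55\t40\n"   # fails coverage
)
passed = passing_queries(io.StringIO(example))

# An event is conserved when both of its sequences have a passing hit.
events = {query.rsplit("_", 1)[0] for query in passed}
conserved = {e for e in events if f"{e}_in" in passed and f"{e}_out" in passed}
```

Here only `evt1` ends up in `conserved`, because `evt2_out` and `evt3_in` each fail one threshold.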
Number of total cassette exons: 5051
Number of genes with cassette exons: 3037
Number of total cassette splicing events: 19015
Non-conserved means there are blast hits for either the spliced-in or the spliced-out sequence, but not both (exclusive or)
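The two definitions can be written as one small function, sketched here in Python. The label "no hit" for events where neither sequence has a hit is an assumption, since the notes do not name that category, and `classify_event` is a hypothetical name.

```python
def classify_event(hit_spliced_in: bool, hit_spliced_out: bool) -> str:
    """Classify one cassette splicing event from its two blast outcomes."""
    if hit_spliced_in and hit_spliced_out:
        return "conserved"        # hits for both sequences
    if hit_spliced_in != hit_spliced_out:
        return "non-conserved"    # exclusive or: exactly one sequence has a hit
    return "no hit"               # assumed label: neither sequence has a hit


# All four combinations of (spliced in, spliced out).
labels = {
    (has_in, has_out): classify_event(has_in, has_out)
    for has_in in (True, False)
    for has_out in (True, False)
}
```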
| Species | Number of Conserved Cassette Splicing Events (# of genes) | Number of Non-conserved Cassette Splicing Events (# of genes) | Number of genes with at least one conserved isoform |
|---|---|---|---|
| lamprey | 3499 (427) | 3260 (697) | 969 |
| spotted gar | 7358 (899) | 3875 (1036) | 1628 |
| zebrafish | 6783 (889) | 3806 (972) | 1559 |
| fugu | 6372 (825) | 3865 (938) | 1486 |
| coelacanth | 5496 (938) | 4071 (1016) | 1619 |
| human | 10495 (1559) | 3672 (994) | 2128 |
| C. elegans | 1794 (145) | 1655 (427) | 506 |
library(UpSetR)
upset(fromList(listInput), nsets = 7, order.by = "freq")
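For reference, a rough Python sketch of the quantity an UpSet plot shows when ordered by frequency: the number of elements belonging to exactly each combination of sets, largest first. This is not the UpSetR implementation. It assumes `listInput` maps each species to its set of event IDs, which the notes do not state; the example sets and the name `exclusive_intersections` are made up.

```python
from collections import Counter


def exclusive_intersections(sets_by_name):
    """Count elements per exact combination of sets, largest combination first."""
    membership = {}
    for name, items in sets_by_name.items():
        for item in items:
            membership.setdefault(item, set()).add(name)
    counts = Counter(frozenset(names) for names in membership.values())
    # Sort by size (descending); break ties by set names for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Hypothetical event IDs per species.
example = {
    "zebrafish": {"e1", "e2", "e3", "e5"},
    "fugu": {"e2", "e3", "e5"},
    "human": {"e3", "e4"},
}
ranked = exclusive_intersections(example)
```

In this toy input the combination zebrafish + fugu comes first with two events (`e2`, `e5`), and every other combination holds one event.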
TODO: blast against other mammals
-take the first hit (the one with the highest bit score)
-length vs bit score
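A possible starting point for these two items, sketched in Python: keep the highest-bit-score hit per query, then pair length with bit score for plotting. It assumes "length" means the alignment length reported by blast and that hits are already parsed into dicts; the field names follow the blast tabular column names, and the example values are made up.

```python
def best_hits(hits):
    """Keep, for each query, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical parsed hits.
hits = [
    {"qseqid": "q1", "length": 120, "bitscore": 150.0},
    {"qseqid": "q1", "length": 90, "bitscore": 200.0},
    {"qseqid": "q2", "length": 60, "bitscore": 45.0},
]

top = best_hits(hits)
# (alignment length, bit score) pairs, one per query, ready for a scatter plot.
length_vs_bits = [(h["length"], h["bitscore"]) for h in top.values()]
```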