-For each cassette exon, find the upstream and downstream exons
-BLAST the sequences with the cassette exon spliced in and spliced out against the five fish species (e-value threshold = 0.1, identity >= 30%, query coverage >= 70%, and gaps introduced < 30% of query length)
-If there are BLAST hits for both the spliced-in and spliced-out sequences, that splicing event is considered conserved
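
A minimal sketch of the hit filter described above, in Python. It assumes the BLAST results were written in tabular format with the columns `qseqid sseqid pident evalue bitscore qlen qcovs gaps` (an assumed column order, not necessarily the one used here), and it treats the e-value threshold as inclusive, which is also an assumption. The names `Hit`, `parse_hit` and `passes_filters` are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query_id: str
    subject_id: str
    pct_identity: float  # percent identity (pident)
    evalue: float
    bitscore: float
    query_len: int       # query length (qlen)
    query_cov: float     # percent query coverage (qcovs)
    gaps: int            # total number of gaps in the alignment


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line in the assumed column order."""
    q, s, pident, evalue, bits, qlen, qcovs, gaps = line.rstrip("\n").split("\t")
    return Hit(q, s, float(pident), float(evalue), float(bits),
               int(qlen), float(qcovs), int(gaps))


def passes_filters(hit: Hit,
                   max_evalue: float = 0.1,
                   min_identity: float = 30.0,
                   min_query_cov: float = 70.0,
                   max_gap_frac: float = 0.30) -> bool:
    """Apply the four thresholds from the notes to a single hit."""
    return (hit.evalue <= max_evalue          # assumed inclusive
            and hit.pct_identity >= min_identity
            and hit.query_cov >= min_query_cov
            and hit.gaps < max_gap_frac * hit.query_len)
```

A spliced-in or spliced-out sequence would then count as having a hit in a species if at least one of its hits passes `passes_filters`.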
Number of total cassette exons: 5051
Number of genes with cassette exons: 3037
Number of total cassette splicing events: 19015
Number of genes with conserved spliced in OR (inclusive) spliced out isoforms in at least one species: 2367
Non-conserved means BLAST hits for either the spliced-in or the spliced-out sequence, but not both (exclusive or)
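
The conserved / non-conserved definitions can be sketched as a small classifier. This is an illustration only: the `_in` / `_out` query ID suffixes, the "no hit" label and the function names are hypothetical, and `hit_ids` is assumed to hold the query IDs that have at least one hit passing the filters in a given species.

```python
from collections import Counter


def classify_event(has_in_hit: bool, has_out_hit: bool) -> str:
    """Classify one cassette splicing event in one species."""
    if has_in_hit and has_out_hit:
        return "conserved"        # hits for both isoforms
    if has_in_hit != has_out_hit:
        return "non-conserved"    # exclusive or: exactly one isoform has a hit
    return "no hit"               # neither isoform has a hit


def tally(event_ids, hit_ids) -> Counter:
    """Count events per class, given the set of query IDs with a passing hit."""
    counts = Counter()
    for ev in event_ids:
        label = classify_event(f"{ev}_in" in hit_ids, f"{ev}_out" in hit_ids)
        counts[label] += 1
    return counts
```

Running `tally` once per species would give the per-species event counts; the gene counts would need an extra event-to-gene mapping that is not shown here.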
| Species | Number of Conserved Cassette Splicing Events (# of genes) | Number of Non-conserved Cassette Splicing Events (# of genes) | Number of genes with at least one conserved isoform |
|---|---|---|---|
| lamprey | 73 (20) | 436 (174) | 184 |
| spotted gar | 7257 (906) | 3798 (987) | 1590 |
| zebrafish | 6647 (892) | 3815 (972) | 1564 |
| fugu | 3045 (541) | 2754 (737) | 1107 |
| coelacanth | 6978 (1008) | 3967 (990) | 1647 |
| human | 10322 (1560) | 3718 (992) | 2134 |
| C. elegans | 1817 (146) | 1678 (452) | 534 |
upset(fromList(listInput), nsets = 7, order.by = "freq")
TODO: blast against other mammals
-take the first hit (the one with the highest bit score)
-length vs bit score
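
A sketch of the "take the first hit" rule, assuming hits are available as `(query_id, subject_id, bitscore)` tuples (a hypothetical layout). Picking the maximum bit score explicitly avoids relying on the order of the BLAST output; ties keep the hit seen first.

```python
def best_hits(hits):
    """Return {query_id: (subject_id, bitscore)} for the top-scoring hit per query."""
    best = {}
    for query_id, subject_id, bitscore in hits:
        # Strict '>' keeps the earlier hit when two bit scores are equal.
        if query_id not in best or bitscore > best[query_id][1]:
            best[query_id] = (subject_id, bitscore)
    return best
```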