Large-scale genomic mutations mark significant steps in bacterial evolution. Inversions of large portions of the genome may serve as markers of evolutionary pathways when they occur within inversions that occurred previously.
Chromosomal inversions occur during bacterial replication and have been observed to favour symmetry relative to the origin of replication (ori) and the terminus (ter). This symmetry can be visualized through the pairwise alignment of maximal unique matches (MUMs) between complete genomes, where it appears as a roughly even X-shape in a segment plot. Inversions nested within other inversions mark successive mutation events along an evolutionary pathway.
All complete genomes for an organism are collected from the NCBI database for prokaryotic organisms. To meaningfully identify inversion positions at the alignment step, all genomes are “synchronized” such that each complete sequence begins at the reference ori sequence on the same strand. Alignments are generated and filtered using nucmer and delta-filter from MUMmer version 3. This generates an alignment file that reports alignment regions unique to the reference and query sequences. The alignment is further filtered so that the minimum length of a continuous alignment and the maximum gap size within it are both set to 10 kb. Inversions are therefore identified as alignment blocks that span at least 10 kb. Symmetric inversions are identified as regions that lie at most 500 kb from the diagonal line of symmetry and whose breakpoints are nearly equidistant, differing in distance by at most 1 Mbp.
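The symmetry filter described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Block` fields, the anti-diagonal distance measure, and the placement of the symmetry axis at the midpoint of the synchronized reference are all assumptions made for the example. Only the 10 kb, 500 kb, and 1 Mbp thresholds come from the text.

```python
from dataclasses import dataclass
from math import sqrt

MIN_BLOCK_LEN = 10_000           # minimum alignment block length (from the text)
MAX_DIAGONAL_DIST = 500_000      # maximum distance from the line of symmetry
MAX_BREAKPOINT_DIFF = 1_000_000  # maximum difference between breakpoint distances


@dataclass(frozen=True)
class Block:
    """One alignment block. Field names are hypothetical."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int

    @property
    def inverted(self) -> bool:
        # Assumption: an inverted block runs in opposite directions
        # on the reference and the query.
        return (self.ref_end - self.ref_start) * (self.qry_end - self.qry_start) < 0

    @property
    def ref_length(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def antidiagonal_distance(ref_pos: int, qry_pos: int, ref_len: int, qry_len: int) -> float:
    """Perpendicular distance from a point to the line ref + qry = ref_len."""
    # Assumption: query coordinates are rescaled to the reference length.
    qry_scaled = qry_pos * ref_len / qry_len
    return abs(ref_pos + qry_scaled - ref_len) / sqrt(2)


def is_symmetric_inversion(block: Block, ref_len: int, qry_len: int) -> bool:
    """Apply the length, diagonal-distance, and breakpoint-balance checks."""
    if not block.inverted or block.ref_length < MIN_BLOCK_LEN:
        return False
    dists = (
        antidiagonal_distance(block.ref_start, block.qry_start, ref_len, qry_len),
        antidiagonal_distance(block.ref_end, block.qry_end, ref_len, qry_len),
    )
    if max(dists) > MAX_DIAGONAL_DIST:
        return False
    # Assumption: the symmetry axis sits at the midpoint of the reference.
    axis = ref_len / 2
    left, right = sorted((block.ref_start, block.ref_end))
    return abs((axis - left) - (right - axis)) <= MAX_BREAKPOINT_DIFF
```

For instance, on a 5 Mbp genome pair, `is_symmetric_inversion(Block(1_000_000, 4_000_000, 4_000_000, 1_000_000), 5_000_000, 5_000_000)` passes all three checks, whereas a block spanning 0.5 to 2 Mbp is rejected because it lies far from the anti-diagonal. Inversions that span ori wrap around the ends of the linearized sequence and would need extra handling that this sketch omits.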
Groups of colinear genomes are identified from alignments generated by comparing one genome against all others in the collection. To identify the best candidate reference genome, 5 randomly sampled genomes are aligned to 5 other randomly sampled genomes, and the candidate that belongs to the largest colinear group is selected. Genomes are sorted based on their relationship to the selected reference genome, and symmetric inversion events are used to construct evolutionary pathways.
Up to 5 inversions were observed when comparing each of the 775 genomes to the reference. Inversions are filtered for near X-symmetry, such that the breakpoints are nearly equidistant and the distance from the diagonal of the X is less than 500 kb. Genoforms are groups of genomes that share the same inversion pattern relative to the reference genome. The following table shows how many genoforms and genomes pass the symmetry filter out of the total:
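One way to read the sampling step is sketched below. The function `pick_reference` and the caller-supplied predicate `is_colinear` are hypothetical names; in practice the predicate would wrap an alignment and report whether two genomes show no inversions relative to each other. Scoring each candidate by the number of sampled genomes it is colinear with is an assumed stand-in for "part of the largest colinear group".

```python
import random
from typing import Callable, Optional, Sequence


def pick_reference(
    genomes: Sequence[str],
    is_colinear: Callable[[str, str], bool],
    n_candidates: int = 5,
    n_probes: int = 5,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick the sampled candidate that is colinear with the most probe genomes."""
    rng = rng or random.Random()
    # Draw candidates and probes together so that the two sets never overlap.
    sample = rng.sample(list(genomes), n_candidates + n_probes)
    candidates, probes = sample[:n_candidates], sample[n_candidates:]
    # Assumption: the size of a candidate's colinear group is approximated
    # by how many probe genomes it is colinear with.
    scores = {c: sum(is_colinear(c, p) for p in probes) for c in candidates}
    # Ties are broken by sampling order.
    return max(candidates, key=lambda c: scores[c])
```

Because the choice depends on a random sample, passing a seeded `random.Random` makes a run reproducible.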
| Inversions | Symmetric Genoforms | Symmetric Genomes | Total Genomes |
|---|---|---|---|
| 0 | NA | 394 | 394 |
| 1 | 15 | 190 | 231 |
| 2 | 11 | 14 | 23 |
| 3 | 6 | 105 | 107 |
| 4 | 4 | 11 | 16 |
| 5 | 3 | 5 | 5 |
| Total | 39 | 719 | 776 |
Each genome is screened against the reference genome “A”, and is sorted into groups or “genoforms” based on the positions of inversions observed relative to A. Here are the representatives, populations, and aliases associated with each genoform for Salmonella:
| Representative genome | Population | Inversions | Alias |
|---|---|---|---|
| NZ_CP019403.1 | 394 | 0 | A |
| CP006876.1 | 156 | 1 | B |
| NC_006511.1 | 9 | 1 | C |
| NZ_CP016406.1 | 4 | 1 | D |
| NZ_CP018642.1 | 4 | 1 | E |
| CP029989.1 | 3 | 1 | F |
| LR133909.1 | 3 | 1 | G |
| NC_021820.1 | 3 | 1 | H |
| NC_010067.1 | 1 | 1 | I |
| NZ_CP017728.1 | 1 | 1 | J |
| NZ_CP018219.1 | 1 | 1 | K |
| NZ_CP022139.1 | 1 | 1 | L |
| NZ_CP028199.1 | 1 | 1 | M |
| NZ_CP030231.1 | 1 | 1 | N |
| NZ_LN890522.1 | 1 | 1 | O |
| NZ_LR134158.1 | 1 | 1 | P |
| NC_011274.1 | 2 | 2 | Q |
| NC_021902.1 | 2 | 2 | R |
| NZ_LT904887.1 | 2 | 2 | S |
| NC_012125.1 | 1 | 2 | T |
| NC_021151.1 | 1 | 2 | U |
| NZ_CP018648.1 | 1 | 2 | V |
| NZ_CP025453.1 | 1 | 2 | W |
| NZ_CP030838.1 | 1 | 2 | X |
| NZ_CP033352.2 | 1 | 2 | Y |
| NZ_LN890520.1 | 1 | 2 | Z |
| NZ_LT904872.1 | 1 | 2 | AA |
| NC_003198.1 | 74 | 3 | AB |
| LT904885.1 | 22 | 3 | AC |
| CP003047.1 | 4 | 3 | AD |
| NZ_LT904875.1 | 3 | 3 | AE |
| NZ_CP009768.1 | 1 | 3 | AF |
| NZ_CP019409.1 | 1 | 3 | AG |
| NC_004631.1 | 6 | 4 | AH |
| NZ_CP030936.1 | 3 | 4 | AI |
| NZ_CP022963.1 | 1 | 4 | AJ |
| NZ_LT904777.1 | 1 | 4 | AK |
| LT905141.1 | 3 | 5 | AL |
| NZ_CP034074.1 | 1 | 5 | AM |
| NZ_LT904883.1 | 1 | 5 | AN |
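The grouping into genoforms and the alias scheme in the table can be sketched as below. This is an illustration under assumptions: each genome's inversion pattern is taken to be an already comparable tuple of `(start, end)` positions (for example, binned to a fixed resolution), the representative is simply the first accession in sorted order, and the ordering by inversion count and then population is inferred from the table rather than stated in the text. The names `alias` and `group_genoforms` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[Tuple[int, int], ...]  # one (start, end) pair per inversion


def alias(index: int) -> str:
    """Spreadsheet-style label: 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB, ..."""
    letters = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def group_genoforms(patterns: Dict[str, Pattern]) -> List[dict]:
    """Group genomes that share an inversion pattern and label each group."""
    groups = defaultdict(list)
    for genome, pattern in patterns.items():
        groups[pattern].append(genome)
    # Assumed ordering: fewest inversions first, then largest population,
    # then accession as a tie-breaker.
    ordered = sorted(
        groups.items(),
        key=lambda kv: (len(kv[0]), -len(kv[1]), min(kv[1])),
    )
    return [
        {
            "alias": alias(i),
            "representative": min(members),  # assumption: arbitrary but stable
            "population": len(members),
            "inversions": len(pattern),
        }
        for i, (pattern, members) in enumerate(ordered)
    ]
```

With this labelling, the genoform without inversions receives alias A, and the 40th genoform receives AN, matching the range of aliases in the table.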
Inversion events between genomes can be nested in two different ways: (1) a single inversion is nested between two others, meaning that the inverted alignment of the single inversion spans the length of the middle colinear alignment between the two inversions of the double, or (2) a double inversion is nested within a larger single inversion. Each nested inversion suggests an event predicated on the previous inversion event.
The example illustrates both cases: case (1) is arranged top to bottom, and case (2) left to right. Genomes C and D are separate inversion events that occurred after the first inversion occurred in B. The three inversions in F could have occurred within the existing inversion of D or as a new inversion outside of C.
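The two nesting cases can be expressed as interval tests on reference coordinates. The sketch below is an assumption-laden illustration: `Inversion`, `classify_nesting`, the returned labels, and the 10 kb breakpoint tolerance are all hypothetical choices, not part of the described pipeline.

```python
from typing import NamedTuple, Optional, Sequence


class Inversion(NamedTuple):
    start: int  # reference coordinate of the first breakpoint
    end: int    # reference coordinate of the second breakpoint


def classify_nesting(
    single: Inversion,
    double: Sequence[Inversion],
    tol: int = 10_000,  # assumed tolerance for matching breakpoints
) -> Optional[str]:
    """Return which nesting case relates a single and a double inversion."""
    first, second = sorted(double)
    # Case (1): the single inversion fills the colinear segment that lies
    # between the two inversions of the double.
    if abs(single.start - first.end) <= tol and abs(single.end - second.start) <= tol:
        return "single_between_double"
    # Case (2): both inversions of the double lie inside the single inversion.
    if single.start <= first.start and second.end <= single.end:
        return "double_within_single"
    return None
```

For example, a single inversion spanning 2 to 3 Mbp paired with a double inversion at 1 to 2 Mbp and 3 to 4 Mbp is classified as case (1), while a single inversion spanning 1 to 4 Mbp that encloses inversions at 1.5 to 2 Mbp and 3 to 3.5 Mbp is classified as case (2).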
Below is a working sketch of the inversion pathways that are generated from the reference genome NZ_CP019403.1. Each line represents an individual inversion between two genomes. Paths are only considered for inversions nested within others:
Estimated completion time: Before the break