Background

Large-scale genomic mutations mark significant steps in bacterial evolution. Inversions of large portions of the genome can serve as markers of evolutionary pathways when they occur within inversions that have occurred previously.

Chromosomal inversions occur during bacterial replication and have been observed to favour symmetry relative to the origin of replication (ori) and the terminus (ter). This symmetry can be visualized through the pairwise alignment of maximal unique matches (MUMs) between complete genomes, where it appears as a roughly even X-shape in a segment plot. Inversions that are nested within other inversions mark successive mutation events along an evolutionary pathway.

Methods

All complete genomes for an organism are collected from the NCBI database for prokaryotic organisms. To meaningfully identify inversion positions at the alignment step, all genomes are “synchronized” such that each complete sequence begins at the reference ori sequence on the same strand. Alignments are generated and filtered using nucmer and delta-filter from MUMmer version 3. This produces an alignment file that reports the alignment regions that are unique in both the reference and query sequences. The alignment is further filtered so that a continuous alignment has a minimum length of 10 kb and a maximum gap size of 10 kb. Inversions are therefore identified as inverted alignment blocks that span at least 10 kb. Symmetric inversions are identified as inversions that lie at most 500 kb away from the diagonal line of symmetry and whose lengths to each breakpoint differ by at most 1 Mbp.
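The symmetry filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline's actual code: the `Block` container and all function names are hypothetical, coordinates are assumed to be measured from the ori at position 0 after synchronization, the two genomes are assumed to have similar lengths, and the "line of symmetry" is taken to be the anti-diagonal where reference position plus query position equals the genome length. Each block is tested on its own, so an inversion that spans the ori and is split into two blocks is not merged here.

```python
from dataclasses import dataclass

MIN_BLOCK = 10_000        # minimum inverted block length (10 kb, from the text)
MAX_OFFSET = 500_000      # maximum distance from the line of symmetry (500 kb)
MAX_ARM_DIFF = 1_000_000  # maximum difference between breakpoint distances (1 Mbp)


@dataclass(frozen=True)
class Block:
    """One alignment block (hypothetical container, not a MUMmer type)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int  # smaller than qry_start when the match is on the reverse strand

    @property
    def inverted(self) -> bool:
        # Reference and query coordinates run in opposite directions.
        return (self.ref_end - self.ref_start) * (self.qry_end - self.qry_start) < 0

    @property
    def length(self) -> int:
        return abs(self.ref_end - self.ref_start)


def ori_distance(pos: int, genome_len: int) -> int:
    """Shortest distance from pos to the ori (coordinate 0) on a circular genome."""
    pos %= genome_len
    return min(pos, genome_len - pos)


def is_symmetric_inversion(block: Block, genome_len: int) -> bool:
    """Apply the three thresholds from the text to a single alignment block."""
    if not block.inverted or block.length < MIN_BLOCK:
        return False
    # Assumed definition: offset of the block midpoint from the anti-diagonal.
    mid_sum = (block.ref_start + block.ref_end + block.qry_start + block.qry_end) / 2
    offset = abs(mid_sum - genome_len)
    # Assumed definition: difference in ori distance between the two breakpoints.
    arm_diff = abs(ori_distance(block.ref_start, genome_len)
                   - ori_distance(block.ref_end, genome_len))
    return offset <= MAX_OFFSET and arm_diff <= MAX_ARM_DIFF


genome_len = 4_800_000  # illustrative length, not taken from the source
centred = Block(2_000_000, 2_800_000, 2_800_000, 2_000_000)
off_centre = Block(1_000_000, 1_500_000, 1_500_000, 1_000_000)
print(is_symmetric_inversion(centred, genome_len))
print(is_symmetric_inversion(off_centre, genome_len))
```

The first example block is centred on the terminus and passes the filter. The second is inverted in place but sits about 2.3 Mbp from the assumed line of symmetry, so it is rejected.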

Groups of colinear genomes are identified from alignments generated by comparing one genome versus all others in the collection. To identify the best candidate reference genome, 5 randomly sampled genomes are aligned to 5 other randomly sampled genomes, and the genome that is part of the largest colinear group is selected. Genomes are sorted based on their relationship to the selected reference genome, and symmetric inversion events are used to construct evolutionary pathways.
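The reference selection step might look roughly like the following sketch. It assumes a hypothetical `is_colinear(a, b)` callback that reports whether two genomes align without inversions (in practice this would come from the filtered alignments), and it scores each candidate by how many of the sampled probe genomes are colinear with it. The function name and parameters are illustrative only.

```python
import random


def pick_reference(genomes, is_colinear, n_candidates=5, n_probes=5, seed=None):
    """Pick the sampled candidate that is colinear with the most probe genomes."""
    if len(genomes) < n_candidates + n_probes:
        raise ValueError("not enough genomes to sample candidates and probes")
    rng = random.Random(seed)
    candidates = rng.sample(genomes, n_candidates)
    # Probes are drawn from the genomes that were not chosen as candidates.
    remaining = [g for g in genomes if g not in candidates]
    probes = rng.sample(remaining, n_probes)

    def score(candidate):
        return sum(1 for probe in probes if is_colinear(candidate, probe))

    return max(candidates, key=score)


# Toy data: seven genomes share arrangement "x", three share arrangement "y".
arrangement = {f"g{i}": ("x" if i < 7 else "y") for i in range(10)}
same_arrangement = lambda a, b: arrangement[a] == arrangement[b]
reference = pick_reference(list(arrangement), same_arrangement, seed=1)
print(reference, arrangement[reference])
```

With this toy data the selected genome always belongs to the larger group, whatever the seed, because any candidate from the smaller group is colinear with fewer probes.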

Salmonella enterica

Up to 5 inversions were observed when each of the 775 genomes was compared to the reference. Inversions are filtered to have near X-symmetry such that the breakpoints are nearly equidistant, and the distance from the diagonal X is less than 500 kb. Genoforms are groups of genomes that share the same inversion pattern relative to the reference genome. The following table shows, for each number of inversions, how many genoforms and genomes pass the symmetry filter out of the total number of genomes:

Inversions  Symmetric Genoforms  Symmetric Genomes  Total Genomes
0           NA                   394                394
1           15                   190                231
2           11                   14                 23
3           6                    105                107
4           4                    11                 16
5           3                    5                  5
Total       39                   719                776

Sorting

Each genome is screened against the reference genome “A”, and is sorted into groups or “genoforms” based on the positions of inversions observed relative to A. Here are the representatives, populations, and aliases associated with each genoform for Salmonella:

Representative genome  Population  Inversions  Alias
NZ_CP019403.1          394         0           A
CP006876.1             156         1           B
NC_006511.1            9           1           C
NZ_CP016406.1          4           1           D
NZ_CP018642.1          4           1           E
CP029989.1             3           1           F
LR133909.1             3           1           G
NC_021820.1            3           1           H
NC_010067.1            1           1           I
NZ_CP017728.1          1           1           J
NZ_CP018219.1          1           1           K
NZ_CP022139.1          1           1           L
NZ_CP028199.1          1           1           M
NZ_CP030231.1          1           1           N
NZ_LN890522.1          1           1           O
NZ_LR134158.1          1           1           P
NC_011274.1            2           2           Q
NC_021902.1            2           2           R
NZ_LT904887.1          2           2           S
NC_012125.1            1           2           T
NC_021151.1            1           2           U
NZ_CP018648.1          1           2           V
NZ_CP025453.1          1           2           W
NZ_CP030838.1          1           2           X
NZ_CP033352.2          1           2           Y
NZ_LN890520.1          1           2           Z
NZ_LT904872.1          1           2           AA
NC_003198.1            74          3           AB
LT904885.1             22          3           AC
CP003047.1             4           3           AD
NZ_LT904875.1          3           3           AE
NZ_CP009768.1          1           3           AF
NZ_CP019409.1          1           3           AG
NC_004631.1            6           4           AH
NZ_CP030936.1          3           4           AI
NZ_CP022963.1          1           4           AJ
NZ_LT904777.1          1           4           AK
LT905141.1             3           5           AL
NZ_CP034074.1          1           5           AM
NZ_LT904883.1          1           5           AN
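The sorting and alias scheme in the table above can be illustrated with a short Python sketch. It assumes each genome has already been reduced to a list of inversion intervals relative to A, with breakpoints normalized so that identical patterns compare equal, which is a simplification. The function names, the choice of the alphabetically first accession as representative, and the tie-breaking order are assumptions chosen to mimic the table, not the pipeline's documented behaviour.

```python
from collections import defaultdict


def alias(index):
    """Spreadsheet-style label: 0 -> A, 25 -> Z, 26 -> AA, 27 -> AB."""
    label = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def sort_into_genoforms(patterns):
    """Group genomes by inversion pattern and label the groups.

    patterns maps a genome name to a list of (start, end) inversion intervals.
    Groups are ordered by inversion count, then by descending population.
    """
    groups = defaultdict(list)
    for genome, inversions in patterns.items():
        groups[tuple(sorted(inversions))].append(genome)
    rows = [
        {"representative": min(members), "population": len(members),
         "inversions": len(pattern)}
        for pattern, members in groups.items()
    ]
    rows.sort(key=lambda r: (r["inversions"], -r["population"], r["representative"]))
    for i, row in enumerate(rows):
        row["alias"] = alias(i)
    return rows


# Toy input with made-up genome names and coordinates.
toy = {
    "g1": [], "g2": [],
    "g3": [(100, 200)], "g4": [(100, 200)],
    "g5": [(300, 400)],
    "g6": [(100, 200), (500, 600)],
}
for row in sort_into_genoforms(toy):
    print(row["alias"], row["representative"], row["population"], row["inversions"])
```

In the toy input, the two genomes without inversions form genoform A, and the remaining groups are labelled B to D in order of inversion count and population.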

[Figures: the 15 symmetric genoforms with one inversion, the 11 with two, and the 6 with three]

Identification of nested inversions

Inversion events between genomes can be nested in two ways: (1) a single inversion is nested between two others, meaning that the inverted alignment of the single inversion has the same length as the middle colinear alignment between the two inversions of the double inversion, or (2) a double inversion is nested within a larger single inversion. Each nested inversion suggests an event predicated on the previous inversion event.
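The two nesting cases can be expressed as simple interval checks. The sketch below is only one possible reading: it assumes every inversion is given as a (start, end) interval in reference coordinates, and it allows a 10 kb tolerance on breakpoint positions, which is an assumption that echoes the 10 kb alignment filter rather than a value stated for this step. The function names are hypothetical.

```python
Interval = tuple[int, int]  # (start, end) in reference coordinates

TOL = 10_000  # assumed breakpoint tolerance in bp


def within(outer: Interval, inner: Interval, tol: int = TOL) -> bool:
    """True if inner lies inside outer, allowing tol bp of slack at each end."""
    return outer[0] - tol <= inner[0] and inner[1] <= outer[1] + tol


def single_between_double(single: Interval,
                          double: tuple[Interval, Interval],
                          tol: int = TOL) -> bool:
    """Case (1): the single inversion matches the colinear gap between the two."""
    left, right = sorted(double)
    gap = (left[1], right[0])
    return abs(single[0] - gap[0]) <= tol and abs(single[1] - gap[1]) <= tol


def double_within_single(double: tuple[Interval, Interval],
                         single: Interval,
                         tol: int = TOL) -> bool:
    """Case (2): both inversions of the double lie inside the larger single one."""
    return all(within(single, inv, tol) for inv in double)


# Made-up coordinates for illustration.
double = ((1_000_000, 1_400_000), (3_400_000, 3_800_000))
print(single_between_double((1_400_000, 3_400_000), double))
print(double_within_single(double, (900_000, 3_900_000)))
```

Both example calls satisfy their respective case: the single inversion fills the gap between the two inverted blocks exactly, and the larger single inversion encloses both blocks.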

Here is an example of both cases, where case (1) is top-bottom and case (2) is left-right. Genomes C and D represent separate inversion events that occurred after the first inversion occurred in B. The three inversions in F could have occurred within the existing inversion of D or as a new inversion outside of C.

Below is a working sketch of the inversion pathways that are generated from the reference genome NZ_CP019403.1. Each line represents an individual inversion between two genomes. Paths are only considered for inversions nested within others:

Next Steps

Complete the inversion pathway for Salmonella

Steps:
  1. Reformat the completed table to show populations and relationships between branches
  2. Automate nesting detection to build the tree as part of the computational pipeline
    Estimated completion time: Early next week (Monday/Tuesday)
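One possible starting point for automating step 2 is sketched below. It swaps in a technique that is not described in this report: each genoform is modelled as a signed permutation of alignment blocks (a negative number means the block is inverted relative to A), and two genoforms are linked when a single inversion converts one into the other. The block numbering, genoform names, and function names are hypothetical, and the sketch does not yet restrict links to nested inversions.

```python
from itertools import combinations


def invert(perm, i, j):
    """Invert blocks i..j (inclusive): reverse their order and flip their signs."""
    return perm[:i] + tuple(-b for b in reversed(perm[i:j + 1])) + perm[j + 1:]


def one_inversion_apart(p, q):
    """True if a single inversion turns arrangement p into arrangement q."""
    if len(p) != len(q) or p == q:
        return False
    changed = [k for k in range(len(p)) if p[k] != q[k]]
    # The only candidate inversion spans the first to the last changed block.
    return invert(p, changed[0], changed[-1]) == q


def pathway_edges(genoforms):
    """Link every pair of genoforms that differ by exactly one inversion."""
    return [
        (a, b)
        for (a, pa), (b, pb) in combinations(genoforms.items(), 2)
        if one_inversion_apart(pa, pb)
    ]


# Toy arrangements over five blocks (illustrative, not the Salmonella data).
toy_genoforms = {
    "A": (1, 2, 3, 4, 5),      # reference
    "B": (1, -4, -3, -2, 5),   # one large inversion
    "C": (1, -4, 3, -2, 5),    # block 3 inverted again inside B's inversion
    "D": (1, 2, -3, 4, 5),     # one small inversion
}
print(pathway_edges(toy_genoforms))
```

In the toy data, C has two inverted blocks flanking a colinear middle and is linked to both B and D, which corresponds to case (1) above. A nesting filter could then be applied to the resulting edges.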

Other Main figures

Show replication pathways in other organisms

Compare nested inversion pathway with other measures of genomic diversity

Draft of paper

Estimated completion time: Before the break