This script describes a loop-based approach for generating pairwise gene syteny maps.

The general pipeline:

prepare input files
- export from IMG: the gene info map, containing gene name, gene coordinates, etc. can contain many genes, not just the ones you are working with.
- edit the gene info file to have a “cassette” column, a code for each genome
- a single protein multifasta containing all proteins from all cassettes
- ideally these will be targetted by the R script from the command line, one is base one is target, they are targetted using the “cassette” name
calculate gene relative abundance based on single-copy-conserved-protein gene copy numbers
import and format files
- change fasta file headers, make appropriate names, remove everything after first space, make sure it matches with “locus_tag” from IMG
  - use only locus tags for everything
- format the gene info table to have only crucial information, target actual IMG terms in the program, not column numbers
write sub-multifastas for the target gene clusters
perform blastP searches in the format compatible with genoplotR
- make blastdb with the template
- blast the query against the database
Import and parse the blastP results
Create the genoplot synteny maps, one for each gene cassette compared to the Defo mec cassette

1 Building full IMG cassette sets for DCM cassette

1.1 metagenomes and genomes

This follows identification of cmuA-orthologs. Add all ortholog scaffolds to cart and then add all genes from those scaffolds, save as a new set.

Gather any other orthologs into another set (the ABCD orthologs), add them to DCM_cassette_allorgs gene set

We now have two gene sets, those from isolates and those from metagenomes (note that the DCMF cassette has to be added manually to the final exports)

Combine the two sets into the gene cart and download the proteins.

Manually reformat the gene info table two show

start Coord
End Coord
Strand
DNA seq. length
AA seq. length
Scaffold read depth (used for relative abundance calculation)

Export the table

Save the two new files in project directory

1.2 manual amendment of gene.info table

The main goal here is to

clean the Gene_IDs such that they are unique and can be linked to the fasta headers (which will be cleaned accordingly)
add a code-word for each cassette
add strain DCMF to the list

In the table, these exist in two forms

Goal “1” is accomplished by using only the locus-tags. This is already a column in the gene info table.

2 Calculation of gene-frequency information

Gather an average total genomes count from 10 single-copy conserved proteins;

sccp <- read.csv("DCM.cassettes.all.proteins/single.copy.conserved.functions.tsv",header = T,sep = "\t")
datatable(sccp)

write.csv(sccp,"sccp.function.list.csv")

Genes with these functions are gathered from the metagenomes.

2.1 unique single-copy conserved protein counts

sccp.unique.counts <- read.csv("DCM.cassettes.all.proteins/mgenome.sccp.unique.counts.tsv", header = T, sep = "\t")

write.csv(sccp.unique.counts,"sccp.unique.counts.csv")
datatable(sccp.unique.counts)

Average counts are very consistent.

2.2 calculate average genome count for each metagenome

2.2.1 Build a combination set for each metagenome

Gather the COG proteins into the cart, display read-depths, and export. No need to display the COG names, I will be summing all read depths and dividing by 10

(use the workaround for this: place funcitons in cart > profile and alignment tab > gene cart (at bottom)… this has to be done in batches since it will overload the gene cart)

2.2.2 import and combine the sccp tables

sccp.full.table <- data.frame()
files <- list.files("DCM.cassettes.all.proteins/mgenome.sccp.tables/")
for (x in (1:length(files))) {
  chunk <- read.csv(paste("DCM.cassettes.all.proteins/mgenome.sccp.tables/",files[x],sep=""), header = T, sep = "\t")
  sccp.full.table <- rbind(sccp.full.table,chunk)
}

sccp.full.table <- sccp.full.table[-1,]
write.csv(sccp.full.table, "sccp.full.table.csv")
datatable(head(sccp.full.table))

2.2.3 do summaries of each COG by the genome ID

library(dplyr)
sccp.average.read.depth <- sccp.full.table %>%
  group_by(Genome.Name) %>%
  summarize(avg.sccp.read.depth = sum(Scaffold.Read.Depth)/10)
write.csv(sccp.average.read.depth,"sccp.average.read.depth.csv")
datatable(sccp.average.read.depth)

2.3 calculate gene frequencies for each DCM cassette

The average sccp gene depths are joined into an overall gene info table

2.3.1 import custom gene info table, add sccp.depth, and calculate proportion

gene.info.all <- read.csv("DCM.cassettes.all.proteins/dcm.cassette.gene.info.all.tsv", header = T, sep = "\t")
gene.info.all <- gene.info.all[-8]
gene.info.all <- left_join(gene.info.all, sccp.average.read.depth, by="Genome.Name")
gene.info.all$genes.per.genome.copy <- gene.info.all$read.depth / gene.info.all$avg.sccp.read.depth
write.csv(gene.info.all, "gene.info.all.with.proportions.csv")

DCM cassettes per genome copy reach up to 1% in the deep subsurface, close in Santa Barbara oil seep and in some peat samples

3 Splitting the multifasta by cassette

3.1 cleaning the fasta headers

The goal is to only have the locus tags as fasta headers… the problem is that isolates have genome.id followed by locustag, while mgenome fasta headers start iwth locus tag. This can be handled programmatically but it is a PiTA. So, I start by manually removing genome ID’s from the fasta file in text editor.

library(Biostrings)
dcm.prot.all <- readAAStringSet("DCM.cassettes.all.proteins/dcm.cassette.all.proteins.faa")
name <- names(dcm.prot.all)
seq <- paste(dcm.prot.all)
dcm.prot.all <- data.frame(name,seq,stringsAsFactors = F)

#split by space and take only the first item from each
## strategy is to use lapply across the column, define a simple split function first
splitfun <- function(x){
  strsplit(x," ")[[1]][1]
}
dcm.prot.all$name <- unlist(lapply(dcm.prot.all$name, splitfun))

# custom function for writing fasta files from data frames
writeFasta<-function(data, filename){
  fastaLines = c()
  for (rowNum in 1:nrow(data)){
    fastaLines = c(fastaLines, as.character(paste(">", data[rowNum,"name"], sep = "")))
    fastaLines = c(fastaLines,as.character(data[rowNum,"seq"]))
  }
  fileConn<-file(filename)
  writeLines(fastaLines, fileConn)
  close(fileConn)
}

writeFasta(dcm.prot.all, "DCM.cassettes.all.proteins/dcm.cassette.all.proteins.clean.faa")

3.2 write a multifasta for each cassette

add the gene.info.all table to the left, filter down to columns with only cassette name, split it, and write the multiple fasta files

#change the name column to be compatible with gene info table
colnames(dcm.prot.all)[1] <- "Locus.Tag"
#join the full table
dcm.prot.all.cass <- left_join(dcm.prot.all, gene.info.all, by="Locus.Tag")
#remove irrelevant columns
dcm.prot.all.cass <- dcm.prot.all.cass[c(1,2,4)]
#change name column back so that it is compatible with writeFasta function
colnames(dcm.prot.all.cass)[1] <- "name"
#split by cassette and write each cassette multifasta
cass <- unique(dcm.prot.all.cass$DCM.cassette)
for (x in (1:length(cass))) {
  chunk <- dcm.prot.all.cass[dcm.prot.all.cass$DCM.cassette == cass[x], ]
  chunk <- chunk[c(1,2)]
  writeFasta(chunk, paste("DCM.cassettes.all.proteins/single.cassettes/",cass[x],".faa",sep=""))
}

There is now a separate multifasta for each cassette. This is the pre-requisite for setting up the cassette to cassette blastP searches.

4 Generating blastP results tables

For this, I am doing every cassette against DEFO cassette.

This script was adapted from my analysis of archaeal RDase synteny. It is executed from the folder:

/Dropbox/Projects/DCM/DCM.cassette.synteny/DCM.cassettes.all.proteins/

note that DEFO.faa must be copied into the reference directory for this script to work

makeblastdb -in DEFO.reference/DEFO.faa -parse_seqids \
-dbtype prot

mkdir blast_results

for f in single.cassettes/*
    do 
    name=${f##*/}
    blastp -db DEFO.reference/DEFO.faa -query "$f" \
    -out blast_results/"${name}.out" \
  -outfmt 6 \
  -evalue 0.0000000001
done

These results are all placed into directory DCM.cassettes.all.proteins/blast_results (relative to R working directory)

5 Generating synteny plots iteratively

5.1 overview

The plot itself is generated from two objects:

5.1.0.1 dnaseg table

These tables are used by genoplotR to determine where the genes are and in what orientation they lay. This data is used in combination with the similarity information.

The information needed by this table is: 1. names1 : name of the gene, in this case locus tag (unless I want to go back and change it more agressively) 2. starts1 : start location in DNA 3. ends1 : stop location in DNA 4. strands1 : orientation, + or - 5. cols1 : arbitrary color assignment for the thing, can use some standard colors like “blue” or “grey”, maybe hex codes I am not sure

These will be generated from the gene.info.all table

5.1.0.2 comparison object

This is read from the blast output file.

5.1.1 Merging the two objects

The most important issue is that the blast coordinates are in terms of AA and the gene coordinates are in terms of DNA. This makes for a few problems

AA coords must be multiplied by 3
BLAST AA coords must be added to gene coords for this to be workable

Additionally, any cassettes that are in reverse orientation have to be flipped… luckily, DEFO is in + orientation

First, write a script that will make necessary adjustments with a conditional orientation flip.

Some of the scaffolds have genes in multiple orientations. This script detects the “correct” orientation and adjusts accordingly. Correct is defined as the orientation as it matches cmuA, since that was the core of the search..

library(genoPlotR)

#define the cmuA/dcmE gene name; this is used in the conditional flip
core <- "Ga0196314_1154"

#dnaseg generate.. this does not need to be in the loop
cass <- unique(gene.info.all$DCM.cassette)
#reference dnaseg for DEFO
dnaseg.ref <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
names1 <- dnaseg.ref$Locus.Tag
starts1 <- dnaseg.ref$Start.Coord
ends1 <- dnaseg.ref$End.Coord
strands1 <- dnaseg.ref$Strand
cols1 <- "blue"
dnaseg.ref <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
dnaseg.ref <- dna_seg(dnaseg.ref)
#change coords so it starts at 1
adjust <- dnaseg.ref[1,2] - 1
dnaseg.ref$start <- dnaseg.ref$start - adjust
dnaseg.ref$end <- dnaseg.ref$end - adjust

#import the query
#this is where loop will begin
dnaseg.q <- gene.info.all[gene.info.all$DCM.cassette == cass[18], ]
names1 <- dnaseg.q$Locus.Tag
starts1 <- dnaseg.q$Start.Coord
ends1 <- dnaseg.q$End.Coord
strands1 <- dnaseg.q$Strand
cols1 <- "blue"
dnaseg.q <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
dnaseg.q <- dna_seg(dnaseg.q)

#adjusting DNA coordinates
##first we must account for coordinates for genes that are not restricted to a scaffold, which have very large numbers.  This will screw up images.
##make sure the first relevant gene starts at position 1 by subtracting first gene start-1 from all starts and stops
adjust <- dnaseg.q[1,2] - 1
dnaseg.q$start <- dnaseg.q$start - adjust
dnaseg.q$end <- dnaseg.q$end - adjust


#comparison generate
comp <- read_comparison_from_blast(paste("DCM.cassettes.all.proteins/blast_results/",cass[18],".faa.out",sep = ""),color_scheme = "grey", filt_low_per_id = 25)
comp$name2 <- as.character(comp$name2)

##then we have to adjust the comp file to actually map to the gene coords
##wherever names match, start and stop have to be added to gene starts and stops
##this is where the flip can take place?  sort of... but not if genes are out of order

##define functions that will make the necessary adjustments
##the idea is to make series of functions that can be applied via lapply across the comp file

cadStart1 <- function(x) {
  return(comp[comp$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}

cadEnd1 <- function(x) {
  return(comp[comp$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}

cadStart2 <- function(x) {
  return(comp[comp$name2 == x,3]  *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
 
cadEnd2 <- function(x) {
  return(comp[comp$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
 
comp$start1 <- unlist(lapply(comp$name1, cadStart1))
comp$start2 <- unlist(lapply(comp$name2, cadStart2))
comp$end1 <- unlist(lapply(comp$name1, cadEnd1))
comp$end2 <- unlist(lapply(comp$name2, cadEnd2))

# the comp file is now fixed along with the dnasegs

# build a genoplot

segs <- list(dnaseg.q,dnaseg.ref )
comps <- list(comp)

svg(paste("DCM.cassettes.all.proteins/pairwise.maps/",cass[15],"_to_DEFO.svg",sep = ""), h=4, w=8)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = T,
              dna_seg_labels = c(paste(cass[15]),"DEFO"),
              gene_type = "arrows")
dev.off()

## png 
##   2

#override_color_schemes = T, 
              #global_color_scheme = c("auto","auto","grey",0.4),

6 Looping plot creation across all gene cassettes

for this loop to work, duplicate alignments have to be manually removed from the blast table;; I manually removed the weaker hit of any duplicates. For example, methyltransferases all have some weak homology to one another, but only the strongest is relevant.

library(genoPlotR)
library(ggplot2)


 #testing a refined cad function
cadStart1 <- function(x) {
  return(cline[cline$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}
cadEnd1 <- function(x) {
  return(cline[cline$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}
cadStart2 <- function(x) {
  return(cline[cline$name2 == x,3]  *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
 cadEnd2 <- function(x) {
  return(cline[cline$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
 
 
mid_pos <- middle(dnaseg.ref)
annot.ref <- c("dcmA","dcmB","dcmC","dcmD","dcmE","dcmF","dcmG","dcmH","dcmI","dcmJ")
annot1 <- genoPlotR::annotation(x1=mid_pos, text=annot.ref, rot = 30)
annot <- list (NULL, annot1)
 
#color.codes
urod <- "#ff4949"
reg <- "#f7f020"
mtrH <- "#048c0b"
fedox <- "#f7ebc0"
trans <- "#ecc4ff"
corr <- "#87f3ff"
 
 
#dnaseg generate.. this does not need to be in the loop
cass <- unique(gene.info.all$DCM.cassette)
#reference dnaseg for DEFO
dnaseg.ref <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
names1 <- dnaseg.ref$Locus.Tag
starts1 <- dnaseg.ref$Start.Coord
ends1 <- dnaseg.ref$End.Coord
strands1 <- dnaseg.ref$Strand
cols1 <- "black"
fill1 <- c(reg, corr, urod, reg, urod, mtrH, fedox, fedox, urod, trans)
dnaseg.ref <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1, fill=fill1)
dnaseg.ref <- dna_seg(dnaseg.ref)
#change coords so it starts at 1
adjust <- dnaseg.ref[1,2] - 1
dnaseg.ref$start <- dnaseg.ref$start - adjust
dnaseg.ref$end <- dnaseg.ref$end - adjust

#create annotations for reference
mid_pos <- middle(dnaseg.ref) - 0
annot.ref <- c("dcmA","dcmB","dcmC","dcmD","dcmE","dcmF","dcmG","dcmH","dcmI","dcmJ")
annot1 <- genoPlotR::annotation(x1=mid_pos, text=annot.ref, rot = 0)
annot <- list (NULL, annot1)

#create lists for iterating over
cass <- unique(gene.info.all$DCM.cassette)
pfun <- function(x) { paste(x,".faa",sep="")}
files <- unlist(lapply(cass,pfun))

#import the query
#this is where loop will begin

for (x in c(1:length(files))) {
  #first generate query dnaseg
  dnaseg.q <- gene.info.all[gene.info.all$DCM.cassette == cass[x], ]
  names1 <- dnaseg.q$Locus.Tag
  starts1 <- dnaseg.q$Start.Coord
  ends1 <- dnaseg.q$End.Coord
  strands1 <- dnaseg.q$Strand
  cols1 <- "black"
  fill1 <- "#1f7df7"
  dnaseg.q <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1, fill=fill1)
  dnaseg.q <- dna_seg(dnaseg.q)
  #
  
  #import comparison file
  comp <- read_comparison_from_blast(paste("DCM.cassettes.all.proteins/blast_results/",cass[x],".faa.out",sep = ""),color_scheme = "grey", filt_low_per_id = 25)
  comp$name2 <- as.character(comp$name2)
  comp$col <- apply_color_scheme(comp$per_id, color_scheme = "gray", rng = c(25,100), transparency = 0.5)
 
  #direction test relative to cmuA/dcmE
    #define the cmuA/dcmE gene name; this is used in the conditional flip
    core <- "Ga0196314_1154"
    #check the orientation of the gene which aligns with cmuA/dcmE
    ## pull the matching gene name
    fliptest <- comp[comp$name2 == core, 5]
    direction <- 0
    # a small subset of cassettes have no dcmE homolog, these must be skipped over
    if (length(fliptest) != 0) {
    direction <- dnaseg.q[dnaseg.q$name == fliptest, 4]
    }
  #make necessary adjustments for - orientation
  
  if (direction == -1 & cass[x] == "DCMF") {
    adjust <- dnaseg.q[nrow(dnaseg.q),2] - 1
    dnaseg.q$start <- abs(dnaseg.q$start - adjust)
    dnaseg.q$end <- abs(dnaseg.q$end - adjust)
    dnaseg.q$strand <- 1
     for (y in c(1:nrow(comp))) {
      cline <- comp[y,]
      comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
      comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
      comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
      comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
    }
  }
  #make necesary start coordinate adjustment  for reverse orientation
  if (direction == -1 & cass[x] != "DCMF" | direction == 0) {
    adjust <- dnaseg.q[1,2] - 1
    dnaseg.q$start <- dnaseg.q$start - adjust
    dnaseg.q$end <- dnaseg.q$end - adjust
    star <- abs(dnaseg.q$end - dnaseg.q[nrow(dnaseg.q),3] -1)
    en <- abs(dnaseg.q$start - dnaseg.q[nrow(dnaseg.q),3] -1)
    dnaseg.q$start <- star
    dnaseg.q$end <- en
    dnaseg.q$strand <- -dnaseg.q$strand
    #make adjustments for gene location and AA to DNA
    for (y in c(1:nrow(comp))) {
      cline <- comp[y,]
      comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
      comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
      comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
      comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
    }
  }
  #this handles cassettes in positive orientation
  if (direction > 0) {
    adjust <- dnaseg.q[1,2] - 1
    dnaseg.q$start <- dnaseg.q$start - adjust
    dnaseg.q$end <- dnaseg.q$end - adjust
    #make adjustments for gene location and AA to DNA
    for (y in c(1:nrow(comp))) {
      cline <- comp[y,]
      comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
      comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
      comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
      comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
    }
  }
  
  #to add back the colors that are stripped in the final plot
  fill1 <- c(reg, corr, urod, reg, urod, mtrH, fedox, fedox, urod, trans)
  dnaseg.ref$fill <- fill1
    
  #build the genoplot
  segs <- list(dnaseg.q,dnaseg.ref )
  comps <- list(comp)
  pdf(paste("DCM.cassettes.all.proteins/pairwise.maps/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=8)
  plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F, annotations = annot,
              dna_seg_labels = c(paste(cass[x]),"DEFO"),
              gene_type = "arrows")
  dev.off()
  
  #build wordy maps
  segs <- list(dnaseg.q,dnaseg.ref )
  comps <- list(comp)
  pdf(paste("DCM.cassettes.all.proteins/pairwise.maps.extensive/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=8)
  plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F, annotations = annot,
              dna_seg_labels = c(paste(cass[x]),"DEFO"),
              gene_type = "arrows",
              main = paste(gene.info.all[gene.info.all$DCM.cassette == cass[x],7][1],
                           " (copies per genome = ",
                           gene.info.all[gene.info.all$DCM.cassette == cass[x],15][1],")",sep = ""))
        
    dev.off()
    
  #build compact maps
  dnaseg.ref$fill <- "blue"
  segs <- list(dnaseg.q,dnaseg.ref )
  comps <- list(comp)
  pdf(paste("DCM.cassettes.all.proteins/pairwise.maps.compact/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=4)
  plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F,
              gene_type = "arrows",
              main = paste("(copies per genome = ",
                           round(gene.info.all[gene.info.all$DCM.cassette == cass[x],15][1],4),")",sep = ""))
  dev.off()
}
#, override_color_schemes = T, 
              #global_color_scheme = c("auto","auto","grey",0.4)

example compact plot

Individual maps are knitted together in Adobe Illustrator for Figure S2:

7 Pairwise synteny map

This section describes the pipeline for generating a sequential synteny map, which compares many cassettes to one another in a step-wise manner

The sequence will be:

DIELB
DIELA
DEFO
UNSWDHB
DMCF

#!/bin/bash

#DIEL B to DIEL A

makeblastdb -in DIEL_A/DIEL_A.faa -parse_seqids \
-dbtype prot

blastp -db DIEL_A/DIEL_A.faa -query DIEL_B/DIEL_B.faa \
    -out blast_results/DIEL_B_to_DIEL_B.out \
  -outfmt 6 \
  -evalue 0.0000000001

#DIEL A to DEFO

makeblastdb -in DEFO/DEFO.faa -parse_seqids \
-dbtype prot

blastp -db DEFO/DEFO.faa -query DIEL_A/DIEL_A.faa \
    -out blast_results/DIEL_A_to_DEFO.out \
  -outfmt 6 \
  -evalue 0.0000000001


#DEFO to UNSWDHB

makeblastdb -in UNSWDHB/UNSWDHB.faa -parse_seqids \
-dbtype prot

blastp -db UNSWDHB/UNSWDHB.faa -query DEFO/DEFO.faa \
    -out blast_results/DEFO_to_UNSWDHB.out \
  -outfmt 6 \
  -evalue 0.0000000001

#UNSWDHB to DCMF

makeblastdb -in DCMF/DCMF.faa -parse_seqids \
-dbtype prot

blastp -db DCMF/DCMF.faa -query UNSWDHB/UNSWDHB.faa \
    -out blast_results/UNSWDHB_to_DCMF.out \
  -outfmt 6 \
  -evalue 0.0000000001

Now I have to read in each comp and create a dnaseg for each, then line them all up and make a plot

7.1 dnasegs

#function for turning gene coordinate table into dnaseg format
t2dnaseg <- function(x) {
  names1 <- x$Locus.Tag
  starts1 <- x$Start.Coord
  ends1 <- x$End.Coord
  strands1 <- x$Strand
  cols1 <- "blue"
  dnaseg <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
  dnaseg <- dna_seg(dnaseg)
  #change coords so it starts at 1
  adjust <- dnaseg[1,2] - 1
  dnaseg$start <- dnaseg$start - adjust
  dnaseg$end <- dnaseg$end - adjust
  return(dnaseg)
}

#function to flip orientation only if strand = -1, i.e. if all genes are in reverse orientation
cflip <- function(x){
  if (unique(x$strand == -1)){
    star <- abs(x$end - x[nrow(x),3] -1)
    en <- abs(x$start - x[nrow(x),3] -1)
    x$start <- star
    x$end <- en
    x$strand <- -x$strand
    return(x)
  }
  else{
    return(x)
  }  
}

#reference dnaseg for DIEL_A
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DIEL_A", ]
dnaseg.diela <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DIEL_B", ]
dnaseg.dielb <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
dnaseg.defo <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "UNSWDHB", ]
dnaseg.unswdhb <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DCMF", ]
starts <- dnaseg$Start.Coord
stops <- dnaseg$End.Coord
dnaseg$Start.Coord <- stops
dnaseg$End.Coord <- starts
dnaseg.dcmf <- cflip(t2dnaseg(dnaseg))

7.2 comps

 #comp adjust function
cadStart1 <- function(x) {
  return(cline[cline$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}
cadEnd1 <- function(x) {
  return(cline[cline$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}
cadStart2 <- function(x) {
  return(cline[cline$name2 == x,3]  *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
cadEnd2 <- function(x) {
  return(cline[cline$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}

#comp b to a
comp <- read_comparison_from_blast("DCM.cassettes.all.proteins/sequential.reference/blast_results/DIEL_B_to_DIEL_A.out",color_scheme = "grey", filt_low_per_id = 25)
comp$col <- apply_color_scheme(comp$per_id, color_scheme = "gray", rng = c(25,100), transparency = 0.5)

#adjust coordinates for mapping, do orientation flips if needed
dnaseg.q <- dnaseg.dielb
dnaseg.ref <- dnaseg.diela

#make adjustments for gene location and AA to DNA
for (y in c(1:nrow(comp))) {
  cline <- comp[y,]
  comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
  comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
  comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
  comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
}

    
  segs <- list(dnaseg.dielb,dnaseg.diela )
  comps <- list(comp)
  plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F, 
              gene_type = "arrows")

Pairwise comparisons are knitted together in Adobe Illustrator, recolored, and annotated:

SCCP calculations and gene synteny maps