SCCP calculations and gene synteny maps
This script describes a loop-based approach for generating pairwise gene syteny maps.
The general pipeline:
- prepare input files
- export from IMG: the gene info map, containing gene name, gene coordinates, etc. can contain many genes, not just the ones you are working with.
- edit the gene info file to have a “cassette” column, a code for each genome
- a single protein multifasta containing all proteins from all cassettes
- ideally these will be targetted by the R script from the command line, one is base one is target, they are targetted using the “cassette” name
- calculate gene relative abundance based on single-copy-conserved-protein gene copy numbers
- import and format files
- change fasta file headers, make appropriate names, remove everything after first space, make sure it matches with “locus_tag” from IMG
- use only locus tags for everything
- format the gene info table to have only crucial information, target actual IMG terms in the program, not column numbers
- change fasta file headers, make appropriate names, remove everything after first space, make sure it matches with “locus_tag” from IMG
- write sub-multifastas for the target gene clusters
- perform blastP searches in the format compatible with genoplotR
- make blastdb with the template
- blast the query against the database
- Import and parse the blastP results
- Create the genoplot synteny maps, one for each gene cassette compared to the Defo mec cassette
1 Building full IMG cassette sets for DCM cassette
1.1 metagenomes and genomes
This follows identification of cmuA-orthologs. Add all ortholog scaffolds to cart and then add all genes from those scaffolds, save as a new set.
Gather any other orthologs into another set (the ABCD orthologs), add them to DCM_cassette_allorgs gene set
We now have two gene sets, those from isolates and those from metagenomes (note that the DCMF cassette has to be added manually to the final exports)
Combine the two sets into the gene cart and download the proteins.
Manually reformat the gene info table two show
- start Coord
- End Coord
- Strand
- DNA seq. length
- AA seq. length
- Scaffold read depth (used for relative abundance calculation)
Export the table
Save the two new files in project directory
1.2 manual amendment of gene.info table
The main goal here is to
- clean the Gene_IDs such that they are unique and can be linked to the fasta headers (which will be cleaned accordingly)
- add a code-word for each cassette
- add strain DCMF to the list
In the table, these exist in two forms
Goal “1” is accomplished by using only the locus-tags. This is already a column in the gene info table.
2 Calculation of gene-frequency information
Gather an average total genomes count from 10 single-copy conserved proteins;
sccp <- read.csv("DCM.cassettes.all.proteins/single.copy.conserved.functions.tsv",header = T,sep = "\t")
datatable(sccp)Genes with these functions are gathered from the metagenomes.
2.1 unique single-copy conserved protein counts
sccp.unique.counts <- read.csv("DCM.cassettes.all.proteins/mgenome.sccp.unique.counts.tsv", header = T, sep = "\t")
write.csv(sccp.unique.counts,"sccp.unique.counts.csv")
datatable(sccp.unique.counts)Average counts are very consistent.
2.2 calculate average genome count for each metagenome
2.2.1 Build a combination set for each metagenome
Gather the COG proteins into the cart, display read-depths, and export. No need to display the COG names, I will be summing all read depths and dividing by 10
(use the workaround for this: place funcitons in cart > profile and alignment tab > gene cart (at bottom)… this has to be done in batches since it will overload the gene cart)
2.2.2 import and combine the sccp tables
sccp.full.table <- data.frame()
files <- list.files("DCM.cassettes.all.proteins/mgenome.sccp.tables/")
for (x in (1:length(files))) {
chunk <- read.csv(paste("DCM.cassettes.all.proteins/mgenome.sccp.tables/",files[x],sep=""), header = T, sep = "\t")
sccp.full.table <- rbind(sccp.full.table,chunk)
}
sccp.full.table <- sccp.full.table[-1,]
write.csv(sccp.full.table, "sccp.full.table.csv")
datatable(head(sccp.full.table))2.3 calculate gene frequencies for each DCM cassette
The average sccp gene depths are joined into an overall gene info table
2.3.1 import custom gene info table, add sccp.depth, and calculate proportion
gene.info.all <- read.csv("DCM.cassettes.all.proteins/dcm.cassette.gene.info.all.tsv", header = T, sep = "\t")
gene.info.all <- gene.info.all[-8]
gene.info.all <- left_join(gene.info.all, sccp.average.read.depth, by="Genome.Name")
gene.info.all$genes.per.genome.copy <- gene.info.all$read.depth / gene.info.all$avg.sccp.read.depth
write.csv(gene.info.all, "gene.info.all.with.proportions.csv")DCM cassettes per genome copy reach up to 1% in the deep subsurface, close in Santa Barbara oil seep and in some peat samples
3 Splitting the multifasta by cassette
3.1 cleaning the fasta headers
The goal is to only have the locus tags as fasta headers… the problem is that isolates have genome.id followed by locustag, while mgenome fasta headers start iwth locus tag. This can be handled programmatically but it is a PiTA. So, I start by manually removing genome ID’s from the fasta file in text editor.
library(Biostrings)
dcm.prot.all <- readAAStringSet("DCM.cassettes.all.proteins/dcm.cassette.all.proteins.faa")
name <- names(dcm.prot.all)
seq <- paste(dcm.prot.all)
dcm.prot.all <- data.frame(name,seq,stringsAsFactors = F)
#split by space and take only the first item from each
## strategy is to use lapply across the column, define a simple split function first
splitfun <- function(x){
strsplit(x," ")[[1]][1]
}
dcm.prot.all$name <- unlist(lapply(dcm.prot.all$name, splitfun))
# custom function for writing fasta files from data frames
writeFasta<-function(data, filename){
fastaLines = c()
for (rowNum in 1:nrow(data)){
fastaLines = c(fastaLines, as.character(paste(">", data[rowNum,"name"], sep = "")))
fastaLines = c(fastaLines,as.character(data[rowNum,"seq"]))
}
fileConn<-file(filename)
writeLines(fastaLines, fileConn)
close(fileConn)
}
writeFasta(dcm.prot.all, "DCM.cassettes.all.proteins/dcm.cassette.all.proteins.clean.faa")3.2 write a multifasta for each cassette
add the gene.info.all table to the left, filter down to columns with only cassette name, split it, and write the multiple fasta files
#change the name column to be compatible with gene info table
colnames(dcm.prot.all)[1] <- "Locus.Tag"
#join the full table
dcm.prot.all.cass <- left_join(dcm.prot.all, gene.info.all, by="Locus.Tag")
#remove irrelevant columns
dcm.prot.all.cass <- dcm.prot.all.cass[c(1,2,4)]
#change name column back so that it is compatible with writeFasta function
colnames(dcm.prot.all.cass)[1] <- "name"
#split by cassette and write each cassette multifasta
cass <- unique(dcm.prot.all.cass$DCM.cassette)
for (x in (1:length(cass))) {
chunk <- dcm.prot.all.cass[dcm.prot.all.cass$DCM.cassette == cass[x], ]
chunk <- chunk[c(1,2)]
writeFasta(chunk, paste("DCM.cassettes.all.proteins/single.cassettes/",cass[x],".faa",sep=""))
}There is now a separate multifasta for each cassette. This is the pre-requisite for setting up the cassette to cassette blastP searches.
4 Generating blastP results tables
For this, I am doing every cassette against DEFO cassette.
This script was adapted from my analysis of archaeal RDase synteny. It is executed from the folder:
/Dropbox/Projects/DCM/DCM.cassette.synteny/DCM.cassettes.all.proteins/
note that DEFO.faa must be copied into the reference directory for this script to work
makeblastdb -in DEFO.reference/DEFO.faa -parse_seqids \
-dbtype prot
mkdir blast_results
for f in single.cassettes/*
do
name=${f##*/}
blastp -db DEFO.reference/DEFO.faa -query "$f" \
-out blast_results/"${name}.out" \
-outfmt 6 \
-evalue 0.0000000001
done
These results are all placed into directory DCM.cassettes.all.proteins/blast_results (relative to R working directory)
5 Generating synteny plots iteratively
5.1 overview
The plot itself is generated from two objects:
5.1.0.1 dnaseg table
These tables are used by genoplotR to determine where the genes are and in what orientation they lay. This data is used in combination with the similarity information.
The information needed by this table is: 1. names1 : name of the gene, in this case locus tag (unless I want to go back and change it more agressively) 2. starts1 : start location in DNA 3. ends1 : stop location in DNA 4. strands1 : orientation, + or - 5. cols1 : arbitrary color assignment for the thing, can use some standard colors like “blue” or “grey”, maybe hex codes I am not sure
These will be generated from the gene.info.all table
5.1.0.2 comparison object
This is read from the blast output file.
5.1.1 Merging the two objects
The most important issue is that the blast coordinates are in terms of AA and the gene coordinates are in terms of DNA. This makes for a few problems
- AA coords must be multiplied by 3
- BLAST AA coords must be added to gene coords for this to be workable
Additionally, any cassettes that are in reverse orientation have to be flipped… luckily, DEFO is in + orientation
First, write a script that will make necessary adjustments with a conditional orientation flip.
Some of the scaffolds have genes in multiple orientations. This script detects the “correct” orientation and adjusts accordingly. Correct is defined as the orientation as it matches cmuA, since that was the core of the search..
library(genoPlotR)
#define the cmuA/dcmE gene name; this is used in the conditional flip
core <- "Ga0196314_1154"
#dnaseg generate.. this does not need to be in the loop
cass <- unique(gene.info.all$DCM.cassette)
#reference dnaseg for DEFO
dnaseg.ref <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
names1 <- dnaseg.ref$Locus.Tag
starts1 <- dnaseg.ref$Start.Coord
ends1 <- dnaseg.ref$End.Coord
strands1 <- dnaseg.ref$Strand
cols1 <- "blue"
dnaseg.ref <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
dnaseg.ref <- dna_seg(dnaseg.ref)
#change coords so it starts at 1
adjust <- dnaseg.ref[1,2] - 1
dnaseg.ref$start <- dnaseg.ref$start - adjust
dnaseg.ref$end <- dnaseg.ref$end - adjust
#import the query
#this is where loop will begin
dnaseg.q <- gene.info.all[gene.info.all$DCM.cassette == cass[18], ]
names1 <- dnaseg.q$Locus.Tag
starts1 <- dnaseg.q$Start.Coord
ends1 <- dnaseg.q$End.Coord
strands1 <- dnaseg.q$Strand
cols1 <- "blue"
dnaseg.q <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
dnaseg.q <- dna_seg(dnaseg.q)
#adjusting DNA coordinates
##first we must account for coordinates for genes that are not restricted to a scaffold, which have very large numbers. This will screw up images.
##make sure the first relevant gene starts at position 1 by subtracting first gene start-1 from all starts and stops
adjust <- dnaseg.q[1,2] - 1
dnaseg.q$start <- dnaseg.q$start - adjust
dnaseg.q$end <- dnaseg.q$end - adjust
#comparison generate
comp <- read_comparison_from_blast(paste("DCM.cassettes.all.proteins/blast_results/",cass[18],".faa.out",sep = ""),color_scheme = "grey", filt_low_per_id = 25)
comp$name2 <- as.character(comp$name2)
##then we have to adjust the comp file to actually map to the gene coords
##wherever names match, start and stop have to be added to gene starts and stops
##this is where the flip can take place? sort of... but not if genes are out of order
##define functions that will make the necessary adjustments
##the idea is to make series of functions that can be applied via lapply across the comp file
cadStart1 <- function(x) {
return(comp[comp$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}
cadEnd1 <- function(x) {
return(comp[comp$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}
cadStart2 <- function(x) {
return(comp[comp$name2 == x,3] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
cadEnd2 <- function(x) {
return(comp[comp$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
comp$start1 <- unlist(lapply(comp$name1, cadStart1))
comp$start2 <- unlist(lapply(comp$name2, cadStart2))
comp$end1 <- unlist(lapply(comp$name1, cadEnd1))
comp$end2 <- unlist(lapply(comp$name2, cadEnd2))
# the comp file is now fixed along with the dnasegs
# build a genoplot
segs <- list(dnaseg.q,dnaseg.ref )
comps <- list(comp)
svg(paste("DCM.cassettes.all.proteins/pairwise.maps/",cass[15],"_to_DEFO.svg",sep = ""), h=4, w=8)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = T,
dna_seg_labels = c(paste(cass[15]),"DEFO"),
gene_type = "arrows")
dev.off()## png
## 2
6 Looping plot creation across all gene cassettes
for this loop to work, duplicate alignments have to be manually removed from the blast table;; I manually removed the weaker hit of any duplicates. For example, methyltransferases all have some weak homology to one another, but only the strongest is relevant.
library(genoPlotR)
library(ggplot2)
#testing a refined cad function
cadStart1 <- function(x) {
return(cline[cline$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}
cadEnd1 <- function(x) {
return(cline[cline$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}
cadStart2 <- function(x) {
return(cline[cline$name2 == x,3] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
cadEnd2 <- function(x) {
return(cline[cline$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
mid_pos <- middle(dnaseg.ref)
annot.ref <- c("dcmA","dcmB","dcmC","dcmD","dcmE","dcmF","dcmG","dcmH","dcmI","dcmJ")
annot1 <- genoPlotR::annotation(x1=mid_pos, text=annot.ref, rot = 30)
annot <- list (NULL, annot1)
#color.codes
urod <- "#ff4949"
reg <- "#f7f020"
mtrH <- "#048c0b"
fedox <- "#f7ebc0"
trans <- "#ecc4ff"
corr <- "#87f3ff"
#dnaseg generate.. this does not need to be in the loop
cass <- unique(gene.info.all$DCM.cassette)
#reference dnaseg for DEFO
dnaseg.ref <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
names1 <- dnaseg.ref$Locus.Tag
starts1 <- dnaseg.ref$Start.Coord
ends1 <- dnaseg.ref$End.Coord
strands1 <- dnaseg.ref$Strand
cols1 <- "black"
fill1 <- c(reg, corr, urod, reg, urod, mtrH, fedox, fedox, urod, trans)
dnaseg.ref <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1, fill=fill1)
dnaseg.ref <- dna_seg(dnaseg.ref)
#change coords so it starts at 1
adjust <- dnaseg.ref[1,2] - 1
dnaseg.ref$start <- dnaseg.ref$start - adjust
dnaseg.ref$end <- dnaseg.ref$end - adjust
#create annotations for reference
mid_pos <- middle(dnaseg.ref) - 0
annot.ref <- c("dcmA","dcmB","dcmC","dcmD","dcmE","dcmF","dcmG","dcmH","dcmI","dcmJ")
annot1 <- genoPlotR::annotation(x1=mid_pos, text=annot.ref, rot = 0)
annot <- list (NULL, annot1)
#create lists for iterating over
cass <- unique(gene.info.all$DCM.cassette)
pfun <- function(x) { paste(x,".faa",sep="")}
files <- unlist(lapply(cass,pfun))
#import the query
#this is where loop will begin
for (x in c(1:length(files))) {
#first generate query dnaseg
dnaseg.q <- gene.info.all[gene.info.all$DCM.cassette == cass[x], ]
names1 <- dnaseg.q$Locus.Tag
starts1 <- dnaseg.q$Start.Coord
ends1 <- dnaseg.q$End.Coord
strands1 <- dnaseg.q$Strand
cols1 <- "black"
fill1 <- "#1f7df7"
dnaseg.q <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1, fill=fill1)
dnaseg.q <- dna_seg(dnaseg.q)
#
#import comparison file
comp <- read_comparison_from_blast(paste("DCM.cassettes.all.proteins/blast_results/",cass[x],".faa.out",sep = ""),color_scheme = "grey", filt_low_per_id = 25)
comp$name2 <- as.character(comp$name2)
comp$col <- apply_color_scheme(comp$per_id, color_scheme = "gray", rng = c(25,100), transparency = 0.5)
#direction test relative to cmuA/dcmE
#define the cmuA/dcmE gene name; this is used in the conditional flip
core <- "Ga0196314_1154"
#check the orientation of the gene which aligns with cmuA/dcmE
## pull the matching gene name
fliptest <- comp[comp$name2 == core, 5]
direction <- 0
# a small subset of cassettes have no dcmE homolog, these must be skipped over
if (length(fliptest) != 0) {
direction <- dnaseg.q[dnaseg.q$name == fliptest, 4]
}
#make necessary adjustments for - orientation
if (direction == -1 & cass[x] == "DCMF") {
adjust <- dnaseg.q[nrow(dnaseg.q),2] - 1
dnaseg.q$start <- abs(dnaseg.q$start - adjust)
dnaseg.q$end <- abs(dnaseg.q$end - adjust)
dnaseg.q$strand <- 1
for (y in c(1:nrow(comp))) {
cline <- comp[y,]
comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
}
}
#make necesary start coordinate adjustment for reverse orientation
if (direction == -1 & cass[x] != "DCMF" | direction == 0) {
adjust <- dnaseg.q[1,2] - 1
dnaseg.q$start <- dnaseg.q$start - adjust
dnaseg.q$end <- dnaseg.q$end - adjust
star <- abs(dnaseg.q$end - dnaseg.q[nrow(dnaseg.q),3] -1)
en <- abs(dnaseg.q$start - dnaseg.q[nrow(dnaseg.q),3] -1)
dnaseg.q$start <- star
dnaseg.q$end <- en
dnaseg.q$strand <- -dnaseg.q$strand
#make adjustments for gene location and AA to DNA
for (y in c(1:nrow(comp))) {
cline <- comp[y,]
comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
}
}
#this handles cassettes in positive orientation
if (direction > 0) {
adjust <- dnaseg.q[1,2] - 1
dnaseg.q$start <- dnaseg.q$start - adjust
dnaseg.q$end <- dnaseg.q$end - adjust
#make adjustments for gene location and AA to DNA
for (y in c(1:nrow(comp))) {
cline <- comp[y,]
comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
}
}
#to add back the colors that are stripped in the final plot
fill1 <- c(reg, corr, urod, reg, urod, mtrH, fedox, fedox, urod, trans)
dnaseg.ref$fill <- fill1
#build the genoplot
segs <- list(dnaseg.q,dnaseg.ref )
comps <- list(comp)
pdf(paste("DCM.cassettes.all.proteins/pairwise.maps/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=8)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F, annotations = annot,
dna_seg_labels = c(paste(cass[x]),"DEFO"),
gene_type = "arrows")
dev.off()
#build wordy maps
segs <- list(dnaseg.q,dnaseg.ref )
comps <- list(comp)
pdf(paste("DCM.cassettes.all.proteins/pairwise.maps.extensive/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=8)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F, annotations = annot,
dna_seg_labels = c(paste(cass[x]),"DEFO"),
gene_type = "arrows",
main = paste(gene.info.all[gene.info.all$DCM.cassette == cass[x],7][1],
" (copies per genome = ",
gene.info.all[gene.info.all$DCM.cassette == cass[x],15][1],")",sep = ""))
dev.off()
#build compact maps
dnaseg.ref$fill <- "blue"
segs <- list(dnaseg.q,dnaseg.ref )
comps <- list(comp)
pdf(paste("DCM.cassettes.all.proteins/pairwise.maps.compact/",cass[x],"_to_DEFO.pdf",sep = ""), h=2, w=4)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F,
gene_type = "arrows",
main = paste("(copies per genome = ",
round(gene.info.all[gene.info.all$DCM.cassette == cass[x],15][1],4),")",sep = ""))
dev.off()
}
#, override_color_schemes = T,
#global_color_scheme = c("auto","auto","grey",0.4)example compact plot
Individual maps are knitted together in Adobe Illustrator for Figure S2:
7 Pairwise synteny map
This section describes the pipeline for generating a sequential synteny map, which compares many cassettes to one another in a step-wise manner
The sequence will be:
- DIELB
- DIELA
- DEFO
- UNSWDHB
- DMCF
#!/bin/bash
#DIEL B to DIEL A
makeblastdb -in DIEL_A/DIEL_A.faa -parse_seqids \
-dbtype prot
blastp -db DIEL_A/DIEL_A.faa -query DIEL_B/DIEL_B.faa \
-out blast_results/DIEL_B_to_DIEL_B.out \
-outfmt 6 \
-evalue 0.0000000001
#DIEL A to DEFO
makeblastdb -in DEFO/DEFO.faa -parse_seqids \
-dbtype prot
blastp -db DEFO/DEFO.faa -query DIEL_A/DIEL_A.faa \
-out blast_results/DIEL_A_to_DEFO.out \
-outfmt 6 \
-evalue 0.0000000001
#DEFO to UNSWDHB
makeblastdb -in UNSWDHB/UNSWDHB.faa -parse_seqids \
-dbtype prot
blastp -db UNSWDHB/UNSWDHB.faa -query DEFO/DEFO.faa \
-out blast_results/DEFO_to_UNSWDHB.out \
-outfmt 6 \
-evalue 0.0000000001
#UNSWDHB to DCMF
makeblastdb -in DCMF/DCMF.faa -parse_seqids \
-dbtype prot
blastp -db DCMF/DCMF.faa -query UNSWDHB/UNSWDHB.faa \
-out blast_results/UNSWDHB_to_DCMF.out \
-outfmt 6 \
-evalue 0.0000000001
Now I have to read in each comp and create a dnaseg for each, then line them all up and make a plot
7.1 dnasegs
#function for turning gene coordinate table into dnaseg format
t2dnaseg <- function(x) {
names1 <- x$Locus.Tag
starts1 <- x$Start.Coord
ends1 <- x$End.Coord
strands1 <- x$Strand
cols1 <- "blue"
dnaseg <- data.frame(name=names1, start=starts1, end=ends1, strand=strands1, col=cols1)
dnaseg <- dna_seg(dnaseg)
#change coords so it starts at 1
adjust <- dnaseg[1,2] - 1
dnaseg$start <- dnaseg$start - adjust
dnaseg$end <- dnaseg$end - adjust
return(dnaseg)
}
#function to flip orientation only if strand = -1, i.e. if all genes are in reverse orientation
cflip <- function(x){
if (unique(x$strand == -1)){
star <- abs(x$end - x[nrow(x),3] -1)
en <- abs(x$start - x[nrow(x),3] -1)
x$start <- star
x$end <- en
x$strand <- -x$strand
return(x)
}
else{
return(x)
}
}
#reference dnaseg for DIEL_A
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DIEL_A", ]
dnaseg.diela <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DIEL_B", ]
dnaseg.dielb <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DEFO", ]
dnaseg.defo <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "UNSWDHB", ]
dnaseg.unswdhb <- cflip(t2dnaseg(dnaseg))
dnaseg <- gene.info.all[gene.info.all$DCM.cassette == "DCMF", ]
starts <- dnaseg$Start.Coord
stops <- dnaseg$End.Coord
dnaseg$Start.Coord <- stops
dnaseg$End.Coord <- starts
dnaseg.dcmf <- cflip(t2dnaseg(dnaseg))7.2 comps
#comp adjust function
cadStart1 <- function(x) {
return(cline[cline$name1 == x,1] *3 + dnaseg.q[dnaseg.q$name == x,2])
}
cadEnd1 <- function(x) {
return(cline[cline$name1 == x,2] *3+ dnaseg.q[dnaseg.q$name == x,2])
}
cadStart2 <- function(x) {
return(cline[cline$name2 == x,3] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
cadEnd2 <- function(x) {
return(cline[cline$name2 == x,4] *3 + dnaseg.ref[dnaseg.ref$name == x,2])
}
#comp b to a
comp <- read_comparison_from_blast("DCM.cassettes.all.proteins/sequential.reference/blast_results/DIEL_B_to_DIEL_A.out",color_scheme = "grey", filt_low_per_id = 25)
comp$col <- apply_color_scheme(comp$per_id, color_scheme = "gray", rng = c(25,100), transparency = 0.5)
#adjust coordinates for mapping, do orientation flips if needed
dnaseg.q <- dnaseg.dielb
dnaseg.ref <- dnaseg.diela
#make adjustments for gene location and AA to DNA
for (y in c(1:nrow(comp))) {
cline <- comp[y,]
comp[y,1] <- unlist(lapply(cline$name1, cadStart1))
comp[y,3] <- unlist(lapply(cline$name2, cadStart2))
comp[y,2] <- unlist(lapply(cline$name1, cadEnd1))
comp[y,4] <- unlist(lapply(cline$name2, cadEnd2))
}
segs <- list(dnaseg.dielb,dnaseg.diela )
comps <- list(comp)
plot_gene_map(segs, comparisons = comps, scale = F, dna_seg_scale = F,
gene_type = "arrows")Pairwise comparisons are knitted together in Adobe Illustrator, recolored, and annotated: