There are currently two DCM proteomes: one for D. formicoaceticum (DEFO) and one for the DCM enrichment culture RM, in which Ca. D. elyunquensis (DIEL) is the dominant strain.

Both proteomes have been processed as follows:
1. Gene names were updated to the current IMG gene names.
2. The mec cassette genes were flagged.
3. The data were combined into a single “long” table with three columns: 1) genome, 2) gene number, 3) class (mec, genome, metagenome).
4. The data were plotted as one box plot per genome, overlaid with the individual proteins as points, with the mec cassette genes highlighted in color.
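Steps 2 and 3 can be sketched in base R as follows. This is a hypothetical illustration, not the procedure used here (the real tables were prepared by hand): the gene IDs, the `origin` column (assumed to already separate genome from metagenome proteins) and the `mec.ids` list are all invented. The plotting step additionally expects a `Log2.Mean.Abund` column, so it is kept alongside the three classification columns.

```r
# Hypothetical toy tables: gene IDs, origins and abundances are invented
defo <- data.frame(gene = c("g001", "g002", "g003"),
                   origin = "genome",
                   Log2.Mean.Abund = c(21.4, 25.0, 30.2),
                   stringsAsFactors = FALSE)
diel <- data.frame(gene = c("h001", "h002", "m001"),
                   origin = c("genome", "genome", "metagenome"),
                   Log2.Mean.Abund = c(19.8, 28.1, 17.5),
                   stringsAsFactors = FALSE)

# Hypothetical IDs of the genes that make up the mec cassettes
mec.ids <- c("g003", "h002")

# Label each table with its genome, then stack them into one long table
defo$genome <- "DEFO"
diel$genome <- "DIEL"
long <- rbind(defo, diel)

# mec genes override the genome/metagenome label
long$class <- ifelse(long$gene %in% mec.ids, "mec", long$origin)
long <- long[, c("genome", "gene", "class", "Log2.Mean.Abund")]
long
```

Flagging by ID membership (`%in%`) keeps the mec list as the single source of truth, so re-flagging after an annotation update only requires editing `mec.ids`.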

1 Import and prepare for ggplot2

The data were manually curated in Excel to fix redundancy errors and mismatches, and then converted to long format.
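The curation itself was done by hand in Excel. As a hedged sketch only, assuming that "redundancy" means the same gene appearing in several rows and "mismatches" means gene names absent from the current IMG annotation, the same checks could be scripted in base R. The table, the IDs and `img.ids` are hypothetical, and keeping the maximum abundance for duplicated genes is an arbitrary choice.

```r
# Hypothetical single-proteome table: "g002" appears twice (a redundant entry)
# and "old_17" is an outdated gene name (a mismatch)
prot <- data.frame(gene = c("g001", "g002", "g002", "g003", "old_17"),
                   Log2.Mean.Abund = c(21.4, 25.0, 24.1, 30.2, 18.3),
                   stringsAsFactors = FALSE)

# Hypothetical list of current IMG gene IDs for this genome
img.ids <- c("g001", "g002", "g003", "g004")

# Redundancy: keep one row per gene; taking the maximum is an arbitrary choice
dedup <- aggregate(Log2.Mean.Abund ~ gene, data = prot, FUN = max)

# Mismatches: gene names that no longer exist in the IMG annotation
unmatched <- setdiff(dedup$gene, img.ids)
unmatched
```

Scripting these checks makes the curation reproducible when the IMG annotation changes again, at the cost of having to decide explicitly how duplicates are resolved.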

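library(readxl)  # read_xlsx()
library(DT)      # datatable(), used below for the interactive table
# Sheet 1 = DEFO proteome, sheet 2 = DIEL proteome (both already in long format)
defo.long <- read_xlsx("new.protomes.ggplot/both.xlsx", sheet = 1)
diel.long <- read_xlsx("new.protomes.ggplot/both.xlsx", sheet = 2)
# Stack the two sheets; rbind() requires identical column names in both
prot.long <- rbind(defo.long, diel.long)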

print(paste(nrow(defo.long),"proteins were detected in the DEFO proteome"))
## [1] "1781 proteins were detected in the DEFO proteome"
print(paste(nrow(diel.long),"proteins were detected in the DIEL proteome"))
## [1] "1743 proteins were detected in the DIEL proteome"
datatable(prot.long)
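Before plotting, a quick sanity check on the combined table can catch problems that would otherwise surface silently in the figure; in particular, any class label spelled differently from the three expected values becomes `NA` in the `factor(..., levels = ...)` call of the plotting step. The sketch below runs on a hypothetical miniature table named `toy.long` so it does not overwrite `prot.long`; on the real data, substitute `prot.long`.

```r
# Hypothetical miniature of prot.long; the real table comes from both.xlsx
toy.long <- data.frame(
  genome = c("DEFO", "DEFO", "DEFO", "DIEL", "DIEL"),
  class  = c("genome", "mec", "genome", "metagenome", "mec"),
  Log2.Mean.Abund = c(22.1, 29.4, 18.0, 16.7, 27.3),
  stringsAsFactors = FALSE)

# Columns the plotting step relies on
stopifnot(all(c("genome", "class", "Log2.Mean.Abund") %in% names(toy.long)))

# Any label outside these three would become NA in factor(..., levels = ...)
stopifnot(all(toy.long$class %in% c("metagenome", "genome", "mec")))

# Proteins per genome and class; the mec column counts the flagged cassette proteins
counts <- table(toy.long$genome, toy.long$class)
counts
```

The row totals of `counts` should match the protein numbers printed above for each proteome.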

2 Plotting

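library(ggplot2)

df <- prot.long
# Fix the class order; sorting by it draws the mec points last, i.e. on top
df$class <- factor(df$class, levels = c("metagenome", "genome", "mec"))
df <- df[order(df$class), ]

theme_set(theme_minimal())
p <- ggplot(df, aes(x = genome, y = Log2.Mean.Abund)) +
  # one point per protein; jitter horizontally only so abundances are not shifted
  geom_jitter(aes(color = class), size = 1, alpha = 0.6,
              position = position_jitter(width = 0.1, height = 0, seed = 23)) +
  # grey for background proteins, red for the mec cassette
  scale_color_manual(values = c(metagenome = "#b5b5b5", genome = "#b5b5b5",
                                mec = "#ff0000")) +
  stat_boxplot(geom = "errorbar", width = 0.25) +  # whisker caps
  geom_boxplot(outlier.alpha = 0, alpha = 0.1, width = 0.5) +
  # mean per genome as a white dot (fun and show.legend replace the
  # deprecated fun.y and show_guide)
  stat_summary(fun = mean, colour = "black", geom = "point",
               shape = 21, fill = "white", size = 2, show.legend = FALSE) +
  labs(y = "log2 abundance", x = "Genome", color = "remove") +
  guides(color = guide_legend(override.aes = list(size = 5))) +
  # zoom without dropping data; ylim() would discard values outside 0-40
  # before the box-plot statistics are computed
  coord_cartesian(ylim = c(0, 40))
# device = "svg" requires the svglite package
ggsave(filename = "new.protomes.ggplot/prot.dodge.box.svg", plot = p,
       width = 10, height = 8, device = "svg")
p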

The resulting SVG figure is imported into Adobe Illustrator for clean-up and annotation.
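A note on axis limits in ggplot2: `ylim(0, 40)` is shorthand for `scale_y_continuous(limits = c(0, 40))`, which turns out-of-range values into `NA` before the box-plot statistics and the mean are computed, whereas `coord_cartesian(ylim = c(0, 40))` only zooms the view. For this data set every log2 abundance may well fall inside the window, in which case both give the same figure; the base-R sketch below, on hypothetical values, shows how much the drawn median can move when one point lies outside it.

```r
# Hypothetical abundances; one value falls outside the 0-40 window
y <- c(12.5, 18.0, 22.3, 25.1, 41.7)
limits <- c(0, 40)

# Values that scale limits such as ylim() would turn into NA
outside <- y < limits[1] | y > limits[2]
sum(outside)

# Median over all values (what coord_cartesian() would draw)
median(y)

# Median over the retained values only (what ylim() would draw)
median(y[!outside])
```

Running the same `outside` check on `prot.long$Log2.Mean.Abund` shows whether the choice matters for the actual figure.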