A) Setting the experimental conditions

First, we selected the appropriate concentrations of arabinose based on the PCN and the fitness.

##Arabinose selection

##              Df Sum Sq Mean Sq F value Pr(>F)    
## ara           4 168900   42225   94.17 <2e-16 ***
## Residuals   243 108965     448                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Growth curves at different arabinose concentrations

B) Monitoring the evolution experiment

After the evolution experiment, we calculated the number of generations for each line each day of the evolution.

Analysis of generations

Generations statistics

## 
##  Pearson's product-moment correlation
## 
## data:  datalimpia$generations and datalimpia$day
## t = 2.9393, df = 448, p-value = 0.00346
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.04569129 0.22709974
## sample estimates:
##       cor 
## 0.1375488

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0$generations and data_ara0$day
## t = 0.27202, df = 90, p-value = 0.7862
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1771963  0.2321162
## sample estimates:
##        cor 
## 0.02866135

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0.0005$generations and data_ara0.0005$day
## t = 0.56582, df = 69, p-value = 0.5734
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1680085  0.2965608
## sample estimates:
##        cor 
## 0.06795902

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0.05$generations and data_ara0.05$day
## t = 0.48033, df = 97, p-value = 0.6321
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1501432  0.2437800
## sample estimates:
##        cor 
## 0.04871251

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0$generations and data_ara0$day
## t = 1.5921, df = 57, p-value = 0.1169
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.05251048  0.43921970
## sample estimates:
##       cor 
## 0.2063465

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0.0005$generations and data_ara0.0005$day
## t = 1.4182, df = 63, p-value = 0.1611
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07105853  0.40252027
## sample estimates:
##       cor 
## 0.1758889

## 
##  Pearson's product-moment correlation
## 
## data:  data_ara0.05$generations and data_ara0.05$day
## t = 3.2985, df = 62, p-value = 0.001613
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1553190 0.5773521
## sample estimates:
##       cor 
## 0.3863769

## 
##  Spearman's rank correlation rho
## 
## data:  correlation.AUCvsgenerations$generations and correlation.AUCvsgenerations$AUC
## S = 1.4647e+10, p-value = 0.0001668
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.0558932

Analysis of PCN

Also, we calculated the PCN by qPCR for the days 10, 20 and 30 for all samples (we run 3 qPCRs for each line, i.e. 9 technical replicates) and plot the PCN results against each line, by day and by arabinose concentration:

## [1] 0e+00 0e+00 0e+00 0e+00 5e-04 5e-04

### PCN statistics

This is the statistic analysis for the PCN along the evolution experiment. To see data distribution, we do a Shapiro test. I do an anova to see if there are significant differences, by day and conc. from now With the ANOVA analysis we see that there are significant differences between the arabinose concentrations (p-value < 2e-16), but not with respect to the day or the interaction of the day and the arabinose concentration.

## 
##  Pearson's product-moment correlation
## 
## data:  alldata_filtered$pcn and alldata_filtered$day
## t = -1.9568, df = 24, p-value = 0.0621
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.66302404  0.01916846
## sample estimates:
##        cor 
## -0.3709377

## 
##  Shapiro-Wilk normality test
## 
## data:  pcn
## W = 0.81074, p-value < 2.2e-16

##                Df Sum Sq Mean Sq F value   Pr(>F)    
## ara             2 549718  274859 4434.09  < 2e-16 ***
## day             1   1106    1106   17.84 2.65e-05 ***
## linea          11  10956     996   16.07  < 2e-16 ***
## ara:day         2   2634    1317   21.24 9.85e-10 ***
## ara:linea      22  22052    1002   16.17  < 2e-16 ***
## day:linea      11   7135     649   10.46  < 2e-16 ***
## ara:day:linea  22  15745     716   11.55  < 2e-16 ***
## Residuals     866  53681      62                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

C) Sequencing analysis

BLAST Analysis

Once the strains from the last day of evolution and the ancestors have been sequenced, we proceed to analyze the mutations that have accumulated throughout the evolution experiment.

First we do a BLAST of the contig where the plasmid is found in each strain, against the contig where the plasmid is almost closed (contig 66 - pTAday0). With the coverage of these contigs we can calculate the PCN.

## 
##  Pearson's product-moment correlation
## 
## data:  PCN_qPCR_cov$PCN_qPCR and PCN_qPCR_cov$PCN_cov
## t = 8.024, df = 28, p-value = 9.742e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6786587 0.9187601
## sample estimates:
##       cor 
## 0.8348156

## Breseq with populations - all the info

MUTATION ANALYSIS: In Breseq outputs, mutations can be divided by: - RA evidence: that supports a mutated base or indel more than the reference sequence, but without sufficient support to pass the cutoff threshold - JC evidence: for a set number of the highest scoring junctions that do not pass all test criteria.

So, we select only the RA mutations

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

# Breseq por poblaciones de todas las cepas de pTA frente a pTAday0 (chr y plasmido)

library(readxl)

breseq.pop.ptaday0 <- read_excel("~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Breseq_results_pop/quality20/summary_table_Breseq_20_pop_p.xlsx", sheet = 2)

# breseq.pop.ptaday0 = read.delim(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Breseq_results_pop/quality30/page2", header = TRUE, sep = "", dec = ",") 

breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>% 
  pivot_longer(cols = c(WTCHG_914003_71215097:WTCHG_914003_71825158)) 

cepas <- read.table(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/cepas", header = TRUE, sep = "", dec = ",")

cepas <- cepas %>% separate(Sample, into = c("Cosa", "Sample"), sep = (":"))

breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>%
  mutate(strain = name) %>% 
  left_join(cepas)

breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>% mutate(name = Sample) %>% select(-Cosa, -Count)

#Mutaciones totales. Para que entienda los signos hay que poner expresiones regulares, con esto les dices que no es una expresión regular (\\simbol)
 # Si queremos analizar un tipo de mutaciones en concreto, lo corremos con el filtro en mutation_type
Filter_pop=0
breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>%  
  filter(value != "0") %>% 
  mutate(mutation_type = ifelse(grepl("→", Mutation), yes = "SNP", 
                                no = ifelse(grepl("\\+", Mutation), yes = "INS", 
                                            no = ifelse(grepl("bp", Mutation), yes = "DEL",
                                                        no= NA)))) 
# %>%  filter(mutation_type=="SNP")

# quitamos las mutaciones no fijadas del chr
 # Tenemos que haber cambiado el nombre de la columna Seq ID por ID. En este caso he cambiado Position para llamarlo ID porque en ID solo me salia el nombre de un contig.
chr.pop = breseq.pop.ptaday0 %>% 
  filter(ID != "D_66") %>% 
   mutate(locus = "chr") %>% filter(Annotation>Filter_pop)


plasmid.pop  = breseq.pop.ptaday0 %>% 
  filter(ID == "D_66" ) %>% 
  mutate(locus = "plasmid") %>%   filter(Annotation>Filter_pop)

mutationsporlocus.pop = rbind(chr.pop, plasmid.pop)

mutationsporlocus.pop = mutationsporlocus.pop %>% mutate(genotype = ifelse(grepl("p", name), yes = "pta", no = ifelse(grepl("w", name), yes = "wt", 
                                                                                                          no = ifelse(grepl("W", name), yes = "wt", no = NA))))


mutationsporlocus.pop = mutationsporlocus.pop %>% mutate(ara = ifelse(grepl("p0", name), yes = "0", no = ifelse(grepl("p50", name), yes = "0.05", 
                                                                                                    no = ifelse(grepl("p5", name), yes = "0.0005", 
                                                                                                    no = ifelse(grepl("pTAday0", name), yes = "ancestro", 
                                                                                                    no =  ifelse(grepl("w0", name), yes = "0", 
                                                                                                    no = ifelse(grepl("w50", name), yes = "0.05", 
                                                                                                    no = ifelse(grepl("w5", name), yes = "0.0005", 
                                                                                                    no = ifelse(grepl("WTday0", name), yes = "ancestro", no = NA)))))))))


mutationsporlocus.pop = mutationsporlocus.pop %>% 
  filter(name != "p5B1" & name != "p0B3" & name != "w0B1" & name != "w50A4" & name != "w5B4" & name != "p0B1" & name != "p5A2" & name != "p50A4"  & name != "w0A2" & name != "p5C4" & name != "w0A4" & name != "p5B3" ) %>% 
  separate(ID, into = c("cosa", "contig"), sep = "_", remove = F) %>% 
  select(-cosa) 

mutationsporlocus.pop$Position= str_remove(mutationsporlocus.pop$Position, ",")
mutationsporlocus.pop$Position = as.numeric(mutationsporlocus.pop$Position)
mutationsporlocus.pop$contig = as.numeric(mutationsporlocus.pop$contig)



 # Limpiamos las mutaciones compartidas con los ancestros:

mutationsporlocus.pop.ancestro = mutationsporlocus.pop %>% 
  filter(ara == "ancestro") 

mutationsporlocus.pop %>% 
  filter(ara != "ancestro")

# Para filtrar en tabla_1_total_pop por porcentaje de fijacion:

mutation.pop.limpias = anti_join(mutationsporlocus.pop, mutationsporlocus.pop.ancestro, by=c("ID", "Position", "Description", "Gene")) %>% 
    mutate(value= as.numeric(value)) 
    # %>% filter(Evidence == "RA")
    # filter(value > 5.9)

genome_size=4639675
# pTA44 del dia cero cerrado por nanopore
plasmid_size=10319
# Mediana PCN día 30 (qPCR)
pcn_ara_0<- 1.160
pcn_ara_0.0005<- 11.865
pcn_ara_0.05<- 56.400

 # Calculamos el numero de replicas de cada condición 
replicas.pop = mutation.pop.limpias %>% 
  select(genotype, ara, name) %>% 
  distinct() %>% 
  group_by(genotype, ara) %>%
  summarise(N=n())

Mutation.per.replica.pop <- left_join(mutation.pop.limpias, replicas.pop) 

Mutation.per.replica.pop<- Mutation.per.replica.pop %>% 
  full_join(data.frame(contigs=names(myfiles[["ptaday0_illum.fasta"]])) %>%
  separate(contigs, into=c("Node", "cosa"), sep="_length_") %>% 
  separate(cosa, into=c("node_length", "cov"), sep="_cov_") %>% 
              separate(Node, into=c("cosa", "contig")) %>%
              select(contig, node_length) %>%
              mutate(contig=as.numeric(contig)))

# Aquí calculamos las posiciones sumando al contig siguiente el numero de bases del anterior y limpiamos los contigs pequeños
Mutation.per.replica.pop <- Mutation.per.replica.pop %>% 
  filter(Evidence == "RA") %>% 
  filter(contig!=66) %>% 
  select(contig, node_length) %>%
  arrange(contig) %>% 
  unique() %>% 
  mutate(cumulative_contig=cumsum(node_length)) %>% 
  mutate(cumulative_contig=cumulative_contig-as.numeric(node_length)) %>% 
  right_join(Mutation.per.replica.pop) %>% 
  mutate(cumulative_contig=ifelse(contig==66, 0, cumulative_contig)) %>% 
  mutate(cumulative_pos=cumulative_contig+Position) %>% 
  filter(contig < 116)

# We filter only SNPs with the Evidence 
total.mutaciones.pop = Mutation.per.replica.pop %>% 
  filter(Evidence == "RA") %>% 
  # # Para filtrar el porcentaje de fijación
  # filter(value > 90) %>% 
  group_by(ara, locus, genotype,N) %>% 
  filter(!is.na(ara)) %>% 
  summarise(Mutaciones.totales = n()) %>% 
  left_join(replicas.pop)

mutationsporlocus.pop$Annotation = as.numeric(mutationsporlocus.pop$Annotation)
Mutation.per.replica.pop$value = as.numeric(Mutation.per.replica.pop$value)

mut.total.pop = Mutation.per.replica.pop %>%
filter(Evidence == "RA") %>% 
  # filter(value > 6) %>% 
  group_by(ara, locus, genotype, N) %>%
  filter(!is.na(mutation_type)) %>%
  summarise(Mutaciones.totales = n()) 
#
mut.total.pop = mut.total.pop %>%
  mutate(SNPs.totales.per.rep = Mutaciones.totales/N)

Breseq - THRESHOLD to clean low quality mutations

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

mutation.pop.limpias %>%  filter(locus == "chr") %>% 
  ggplot(aes (x= value)) +
  geom_histogram() +
  facet_wrap(~ara)

mut.threshold <- mutation.pop.limpias %>% 
  # mutate(strain = Sample) %>% 
  # filter(locus == "plasmid") %>% 
  mutate(ara = as.factor(ara)) 

PCN_qPCR_cov <- PCN_qPCR_cov %>% 
  mutate(ara = as.factor(ara)) %>% 
  mutate(name = strain)

mut.threshold <- mut.threshold %>%  
  left_join(PCN_qPCR_cov, by =c("name")) %>% 
  mutate(ara = ara.x) %>% 
  # mutate(Sample = Sample.x) %>% 
  # select(-Sample.x) %>% 
  mutate(strain = strain.x) %>% 
  select(-strain.x)

mut.threshold <- mut.threshold %>% mutate(threshold = 1/PCN_qPCR) %>% mutate(threshold.porc = threshold*100)

mut.threshold <-mut.threshold %>% mutate(mayor.threshold = ifelse(value>threshold.porc, yes = "SI", no = "NO")) %>% mutate(ara = as.character(ara))

mut.threshold %>% 
ggplot(aes (x= value)) +
  geom_histogram() +
  facet_wrap(~ara.x~mayor.threshold)

mut.threshold %>% group_by(locus, ara, Sample) %>% summarise(N=n())

Statistics - lines with 0 mutations

We also did statistical analysis to see if there are significant differences between the lines with 0 mutations. There are statistically significant differences between 0-0.0005% and 0-0.05%

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

# Vamos a ver si la proporcion de no mutantes es diferente a medida que aumentas el PCN.

mutants <- read.table(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/contingencia", header = TRUE, sep = "", dec = ",")

ara0vs.0.0005 <- mutants %>% select(X0,X0.0005) 
ara0.0005vs.0.05 <- mutants %>% select(X0.0005,X0.05)  
ara0vs.0.05 <- mutants %>% select(X0,X0.05) 

# Chi-square 

# 0 vs 0.0005 
chisq.test(ara0vs.0.0005)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  ara0vs.0.0005
## X-squared = 3.8503, df = 1, p-value = 0.04974

# 0 vs 0.05 
chisq.test(ara0.0005vs.0.05)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  ara0.0005vs.0.05
## X-squared = 0.026989, df = 1, p-value = 0.8695

#0.0005 vs 0.05
chisq.test(ara0vs.0.05)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  ara0vs.0.05
## X-squared = 8.6148, df = 1, p-value = 0.003334

# Fisher

# 0 vs 0.0005 
fisher.test(ara0vs.0.0005)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  ara0vs.0.0005
## p-value = 0.02482
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    1.033394 832.598699
## sample estimates:
## odds ratio 
##   13.51875

# 0 vs 0.05 
fisher.test(ara0.0005vs.0.05)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  ara0.0005vs.0.05
## p-value = 0.4211
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.03525615        Inf
## sample estimates:
## odds ratio 
##        Inf

#0.0005 vs 0.05
fisher.test(ara0vs.0.05)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  ara0vs.0.05
## p-value = 0.001032
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.8631    Inf
## sample estimates:
## odds ratio 
##        Inf

Breseq - Total mutations per generation

Breseq - Total mutations per strain

THE plot

# Mutations per generation and locus
p1 <- ggplot(data = prueba %>% filter(locus=="plasmid", variable =="Mutations.per.gen.and.locus", Copy_number<2000), aes(x = Copy_number, y = value))+
  # geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
  geom_smooth(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value),
                 method="lm", 
               formula=y~log(x),
              color="grey", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
    # geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
  geom_smooth(method="lm", formula=y~log(x),color="grey", fill="lightgrey", alpha=0.4, fullrange=TRUE)+
  # geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
  geom_point(aes(color=ara, fill=ara), size=2)+ 
  scale_y_log10(limits=c(3e-8, 3e-5))+
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_x_continuous(breaks=c(0,20,40,60, 80, 100, 120, 140), limits=c(0,100))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(angle = 0,  hjust=0.5),
        legend.position = "none") +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
  labs( y="Mutations per generation and locus", x="Copy number")

# Mutations per generation and nt
a <- ggplot(data= prueba %>% filter(locus=="plasmid", variable =="Mutation.rate.per.gen.and.nt", Copy_number<2000), aes(x = Copy_number, y = value))+
  # geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
  geom_smooth(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutation.rate.per.gen.and.nt"), aes(x = Copy_number, y = value),
                 method="lm", formula=y~log(x),color="gray", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
  geom_smooth(method="lm", formula=y~log(x),color="gray", fill="lightgrey", alpha=0.2, fullrange=TRUE)+
  # geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
  geom_point(aes(color=ara, fill=ara), size=2)+ 
  # geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.NJs.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
  scale_y_log10(limits=c(3e-8, 1e-5))+
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_x_continuous(breaks=c(0,20,40,60, 80, 100, 120, 140), limits=c(0,120))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(angle = 0,  hjust=0.5),
        legend.position = "none") +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
  labs(y="Mutation rate (Mutations per generation and nt)", x="Copy number")

# CHR
# geom.chr <- ggplot(data = prueba %>% mutate(ara=as.numeric(ara)) %>% filter(locus=="chr", variable =="Mutations.per.gen.and.locus", Copy_number<2000), aes(x = ara, y = value))+
#   # geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
#   # geom_smooth(data= prueba %>% filter(locus=="chr", variable =="expected.mutations.per.gen.and.locus"), aes(x = ara, y = value),
#               #    method="lm", 
#               #  formula=y~log(x),
#               # color="grey", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
#     # geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
#   geom_smooth(method="lm", formula=y~log(x),color="grey", fill="lightgrey", alpha=0.4, fullrange=TRUE)+
#   # geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
#   geom_point(aes(color=ara, fill=ara), size=2)+ 
#   scale_y_log10(limits=c(3e-8, 3e-5))+
#   annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
#   scale_x_continuous() +
#   theme_bw() +
#   theme(panel.background = element_blank(), panel.grid = element_blank(), 
#         aspect.ratio = 1, 
#         #legend.position = "none",
#         strip.background = element_blank(),
#         axis.text.x = element_text(angle = 45,  hjust=1),
#         legend.position = "none") +
#   scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
#   scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
#   labs(title="Mutation rate", y="Mutations per generation and locus", x="Copy number")

boxplot.chr <- table_1_good_total_pop %>% filter(locus=="chr") %>% 
  ggplot() +
  geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, outlier.color = NULL, outlier.fill = NULL) +
  geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15,  size=1.5) +
  scale_y_log10()+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(size = 10, hjust=0.5),
        axis.text.y = element_text(size = 10)) +
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
  xlab("Arabinose (%)") + ylab("Mutations per generation and locus") 

boxplot.chr

# Boxplot con el chr de fondo
boxplot_chr_plasmid <- table_1_good_total_pop %>% filter(locus=="plasmid") %>% 
  ggplot() +
  geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3) +
  geom_boxplot(data = table_1_good_total_pop %>% filter(locus=="chr"), aes(x=ara, y= Mutations.per.gen.and.locus, color="Chromosome", fill="Chromosome"), alpha=0.3, outlier.color = NULL, outlier.fill = NULL)+
  geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15,  size=1.5) +
  scale_y_log10(limits=c(3e-8, 1e-5))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
         axis.text.x = element_text(size = 10, hjust=0.5),
        axis.text.y = element_text(size = 10)) +
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_fill_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray")) +
  scale_color_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray"))+
  labs(y="Mutations per generation and locus", x = "Arabinose (%)")
boxplot_chr_plasmid

b <- table_1_good_total_pop %>% filter(locus=="plasmid") %>% 
  ggplot() +
  geom_boxplot(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color=ara, fill=ara), alpha=0.3) +
  geom_jitter(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color=ara, fill=ara), alpha=0.3, width=0.15,  size=1.5) +
  scale_y_log10(limits=c(3e-8, 1e-5))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(hjust=0.5),
             legend.position = "none") +
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
  labs(y="Mutation rate (Mutations per generation and nt)", x = "Arabinose (%)")

ggarrange(b, a, align = "hv", nrow = 1)

# ggarrange(p1,b1)

# Boxplots Mutations.per.gen.nt y Mutations.per.gen.locus
table_1_good_total_pop %>% filter(locus=="plasmid") %>% 
  ggplot() +
  geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.4) +
  geom_boxplot(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color="lightgray", fill="lightgray"), alpha=0.2, outlier.color = NULL, outlier.fill = NULL)+
  geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.4, width=0.15,  size=1.5) +
  geom_jitter(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color="lightgray", fill="lightgray"), alpha=0.2, width=0.15,  size=1.5) +
  scale_y_log10(limits=c(3e-8, 1e-5))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
         axis.text.x = element_text(size = 10, hjust=0.5),
        axis.text.y = element_text(size = 10)) +
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_fill_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray")) +
  scale_color_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray"))+
  labs(y="Mutations per generation and locus", x = "Arabinose (%)")

## Breseq - Correlation PCN vs mutation rate The mutation rate is correlated with the PCN:

# Correlation mutation rate vs copy number

cor.test(table_1_good_total_pop$Mutations.per.gen.and.locus, table_1_good_total_pop$Copy_number, method= c("pea"))

## 
##  Pearson's product-moment correlation
## 
## data:  table_1_good_total_pop$Mutations.per.gen.and.locus and table_1_good_total_pop$Copy_number
## t = 8.9298, df = 56, p-value = 2.343e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6335884 0.8554025
## sample estimates:
##       cor 
## 0.7664519

cor.test(table_1_good_total_pop$Mutations.per.gen.and.locus, table_1_good_total_pop$Copy_number, method= c("spea"))

## 
##  Spearman's rank correlation rho
## 
## data:  table_1_good_total_pop$Mutations.per.gen.and.locus and table_1_good_total_pop$Copy_number
## S = 7192.4, p-value = 6.158e-13
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##      rho 
## 0.778756

# 
# cor.test(table_1_good_total_pop$Mutation.rate.per.gen.and.nt_Norm, table_1_good_total_pop$Copy_number, method= c("pea")) 


# write.table(table_1_good_total_pop, file="~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Tabla_1_total_breseq", 
#            sep="\t", quote = F, row.names = T, col.names = T)
# 
# write.table(table_1_good_pop, file="~/MEGA/Tesis/Mutation_accumulation/Bioinfo/ConAdaptadores/Tabla_1_breseq", 
#             sep="\t", quote = F, row.names = T, col.names = T)

Breseq with populations - only fixed mutations

Now, we analyze only the fixed mutations (with a fixation percentage of 100%)

## Breseq pop - Mutation rate statistics

There are also statistically significant differences in the mutation rate of the plasmid between 0-0.0005 and 0-0.05 and no differences between the conditions in the chromosome:

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

# statatistics.breseq <- table_1_good_total_pop %>% group_by(ara) %>% summarise(mean.mut.rate=mean(Mutation.rate.per.gen.and.nt)) %>% view()

statatistics.breseq <- table_1_good_total_pop %>% filter(locus == "plasmid") %>% 
group_by(ara) %>%
  summarise(
    count = n(),
    mean = mean(Mutations.per.gen.and.locus, na.rm = TRUE),
    sd = sd(Mutation.rate.per.gen.and.nt, na.rm = TRUE),
    median = median(Mutation.rate.per.gen.and.nt, na.rm = TRUE),
    IQR = IQR(Mutation.rate.per.gen.and.nt, na.rm = TRUE)
  ) 
# install.packages("ggpubr")
library(ggpubr)

b1 <- table_1_good_total_pop %>% filter(locus=="plasmid") %>% 
  ggplot() +
  geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3) +
  geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15,  size=1.5) +
  scale_y_log10(limits=c(3e-8, 3e-5))+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(hjust=1)) +
  annotation_logticks(sides = "l", outside = F,  long = unit(0.05, "cm")) +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
  labs(y="Mutations per generation and locus", xlab = "Ara")

b1

# Kruskal test para ver si hay diferencias significativas entre las distintas concentraciones de Ara en el plásmido
kruskal.test(Mutations.per.gen.and.locus ~ ara, data = table_1_good_total_pop %>% filter(locus == "plasmid"))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Mutations.per.gen.and.locus by ara
## Kruskal-Wallis chi-squared = 19.584, df = 2, p-value = 5.591e-05

pairwise.wilcox.test(table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "plasmid"], table_1_good_total_pop$ara[table_1_good_total_pop$locus == "plasmid"],
                 p.adjust.method = "BH")

## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "plasmid"] and table_1_good_total_pop$ara[table_1_good_total_pop$locus == "plasmid"] 
## 
##       0       5e-04
## 5e-04 6.9e-05 -    
## 0.05  1.7e-05 0.27 
## 
## P value adjustment method: BH

# Model

lm<- glm(log10(Mutations.per.gen.and.locus) ~ Copy_number , data=table_1_good_total_pop %>% filter(locus=="plasmid"))
summary(lm)

## 
## Call:
## glm(formula = log10(Mutations.per.gen.and.locus) ~ Copy_number, 
##     data = table_1_good_total_pop %>% filter(locus == "plasmid"))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.70672  -0.11163  -0.00462   0.24110   0.44292  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.143247   0.075589 -81.271  < 2e-16 ***
## Copy_number  0.007451   0.001831   4.069 0.000368 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.09053332)
## 
##     Null deviance: 3.9437  on 28  degrees of freedom
## Residual deviance: 2.4444  on 27  degrees of freedom
## AIC: 16.567
## 
## Number of Fisher Scoring iterations: 2

summary(anova(lm))

##        Df       Deviance       Resid. Df       Resid. Dev   
##  Min.   :1   Min.   :1.499   Min.   :27.00   Min.   :2.444  
##  1st Qu.:1   1st Qu.:1.499   1st Qu.:27.25   1st Qu.:2.819  
##  Median :1   Median :1.499   Median :27.50   Median :3.194  
##  Mean   :1   Mean   :1.499   Mean   :27.50   Mean   :3.194  
##  3rd Qu.:1   3rd Qu.:1.499   3rd Qu.:27.75   3rd Qu.:3.569  
##  Max.   :1   Max.   :1.499   Max.   :28.00   Max.   :3.944  
##  NA's   :1   NA's   :1

coefficients(lm)

##  (Intercept)  Copy_number 
## -6.143247050  0.007451323

# Kruskal test para el chromosome
kruskal.test(Mutations.per.gen.and.locus ~ ara, data = table_1_good_total_pop %>% filter(locus == "chr"))

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Mutations.per.gen.and.locus by ara
## Kruskal-Wallis chi-squared = 0.4806, df = 2, p-value = 0.7864

pairwise.wilcox.test(table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "chr"], table_1_good_total_pop$ara[table_1_good_total_pop$locus == "chr"],
                 p.adjust.method = "BH")

## 
##  Pairwise comparisons using Wilcoxon rank sum exact test 
## 
## data:  table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "chr"] and table_1_good_total_pop$ara[table_1_good_total_pop$locus == "chr"] 
## 
##       0    5e-04
## 5e-04 0.76 -    
## 0.05  0.76 0.76 
## 
## P value adjustment method: BH

Breseq - Total mutations and PCN per strain

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

# PLOT SNPs per gen / PCN

PCN_qPCR_cov$ara<- as.factor(PCN_qPCR_cov$ara)
table_1_good_total_pop$ara<- as.factor(table_1_good_total_pop$ara)

# Mutations.and.PCNs.pop <- left_join(table_1_good_total_pop, PCN_qPCR_cov, by = c("ara", "strain")) %>% 
#   filter(genotype == "pta")

Mutations.and.PCNs.pop <- table_1_good_total_pop %>%
  filter(genotype == "pta")

# Plot con PCN COVERAGE
Mutations.and.PCNs.pop %>%
  filter(locus=="plasmid") %>%
               group_by(ara, locus) %>%
               summarise(cosa=sum(Mutations.per.generation, na.rm=T),
                         mean_PCN=mean(PCN_cov, na.rm=T),
                         sd_pcn=sd(PCN_cov, na.rm=T),
                         mean_mutations=mean(Mutations.per.generation, na.rm=T),
                         sd_mut=sd(Mutations.per.generation, na.rm=T))%>%
  mutate(ara = as.factor(ara)) %>%
  ggplot() +
  geom_point(aes(x=PCN_cov, y=Mutations.per.generation, color=ara),
             data=Mutations.and.PCNs.pop %>% filter(locus=="plasmid"),
             alpha=.5) +
  geom_point(aes(x=mean_PCN, y=mean_mutations, color=ara, fill=ara),
             shape=23,
           #  alpha=.7,
             size=4)+
  geom_errorbar(aes(x=mean_PCN, ymin=mean_mutations-sd_mut, ymax=mean_mutations+sd_mut, color=ara),
             width=0)+
  geom_errorbarh(aes(xmin=mean_PCN-sd_pcn,xmax=mean_PCN+sd_pcn, y=mean_mutations, color=ara),
             height=0)+
  scale_x_log10()+
  scale_y_log10()+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(),
        aspect.ratio = 1,
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(angle = 45,  hjust=1)) +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))

# Plot con PCN qPCR
Mutations.and.PCNs.pop %>%  
  filter(locus=="plasmid") %>% 
               group_by(ara, locus) %>% 
               summarise(cosa=sum(Mutations.per.generation, na.rm=T), 
                         mean_PCN=mean(PCN_qPCR, na.rm=T),
                         sd_pcn=sd(PCN_qPCR, na.rm=T),
                         mean_mutations=mean(Mutations.per.generation, na.rm=T),
                         sd_mut=sd(Mutations.per.generation, na.rm=T))%>% 
  mutate(ara = as.factor(ara)) %>% 
  ggplot() +
  geom_point(aes(x=PCN_qPCR, y=Mutations.per.generation, color=ara),
             data=Mutations.and.PCNs.pop %>% filter(locus=="plasmid"),
             alpha=.5) +
  geom_point(aes(x=mean_PCN, y=mean_mutations, color=ara, fill=ara), 
             shape=23, 
           #  alpha=.7, 
             size=4)+
  geom_errorbar(aes(x=mean_PCN, ymin=mean_mutations-sd_mut, ymax=mean_mutations+sd_mut, color=ara),
             width=0)+
  geom_errorbarh(aes(xmin=mean_PCN-sd_pcn,xmax=mean_PCN+sd_pcn, y=mean_mutations, color=ara),
             height=0)+
  scale_x_log10()+
  scale_y_log10()+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(angle = 45,  hjust=1)) +
  scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))

## Mutational Spectrum The largest type of mutations that accumulate are SNPs, both in the plasmid and in the chromosome. In the mutational spectrum, there are more transitions than transversions, this is typical of -MutS strains.

Transitions: A:G, G:A, C:T, T:C
Transversions: A:C, C:A, G:T, T:G

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

snps.pop <- mutation.pop.limpias %>% 
  filter(mutation_type == "SNP")

snps.pop %>% 
  group_by(genotype, locus, Mutation) %>% 
  summarise(N=n())

AT.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus, Mutation) %>%
  filter(Mutation=="A→T") 
  # summarise(Nmutations=n()) 

AG.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="A→G") 
  # summarise(Nmutations=n()) 

AC.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="A→C") 
  # summarise(Nmutations=n()) 

TC.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="T→C") 
  # summarise(Nmutations=n()) 

TA.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="T→A") 
  # summarise(Nmutations=n()) 

TG.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="T→G") 
  # summarise(Nmutations=n())

GA.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="G→A") 
  # summarise(Nmutations=n()) 

GC.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="G→C")
  # summarise(Nmutations=n()) 

GT.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="G→T") 
  # summarise(Nmutations=n()) 

CT.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="C→T") 
  # summarise(Nmutations=n())

CA.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="C→A") 
  # summarise(Nmutations=n()) 

CG.pop<- snps.pop %>% 
  # group_by(ara, genotype, locus,Mutation) %>%
  filter( Mutation=="C→G")  
  # summarise(Nmutations=n()) 

MutationalSpectrum.pop <- rbind(AT.pop, AC.pop, TA.pop, TG.pop, CA.pop, CG.pop, GC.pop, GT.pop, AG.pop, CT.pop, GA.pop, TC.pop)

summary.mutational.spectrum <- snps.pop %>% 
  group_by(genotype, ara, locus) %>% 
  summarise(N.snps.por.ara=n()) 

# position<-c("A:G","C:T", "G:A", "T:C", "T:G", "A:C", "A:T", "C:A", "C:G", "G:C", "G:T","T:A")
  

# MutationalSpectrum.pop %>% ungroup() %>% 
#   # filter(locus =="plasmid") %>% 
#   # left_join(summary.mutational.spectrum) %>%
#   # mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
#   ggplot() +
#   geom_density(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
#   facet_wrap(~locus, scales= "free") +
#   # scale_x_discrete(limits=position) +
#   theme_bw() +
#   theme(
#     panel.background = element_blank(), 
#     panel.grid = element_blank(), 
#     aspect.ratio = 1, 
#     # legend.position = "none",
#     strip.background = element_blank(),
#     axis.text.x = element_text(size=11, hjust=0.5),
#     axis.text.y = element_text(size=11)) +
#   scale_color_manual(values=c("#a666cbff", "#95cea4ff")) +
#   scale_fill_manual(values=c("#a666cbff", "#9fd3adff")) +
#   coord_flip() +
#   xlab("Type of mutation") +
#   ylab("Count") 

MS <- MutationalSpectrum.pop %>% ungroup() %>% 
  # filter(locus =="plasmid") %>% 
  # left_join(summary.mutational.spectrum) %>%
  # mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
  ggplot() +
  geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
  facet_wrap(~locus, scales= "free_x") +
  # scale_x_discrete(limits=position) +
  theme_bw() +
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(), 
    aspect.ratio = 1, 
    legend.position = "none",
    strip.background = element_blank(),
    axis.text.x = element_text(size=11, hjust=0.5),
    axis.text.y = element_text(size=11)) +
  scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
  scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
  coord_flip() +
  xlab("Type of mutation") +
  ylab(" ") +
  labs(title = "All mutations")

# Only fixed mutations: 

fixed.MS <- MutationalSpectrum.pop %>% ungroup() %>% 
  filter(value == 100) %>%
  # left_join(summary.mutational.spectrum) %>%
  # mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
  ggplot() +
  geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
  facet_wrap(~locus, scales= "free_x") +
  # scale_x_discrete(limits=position) +
  theme_bw() +
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(), 
    aspect.ratio = 1, 
    legend.position = "none",
    strip.background = element_blank(),
    axis.text.x = element_text(size=11, hjust=0.5),
    axis.text.y = element_text(size=11)) +
  scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
  scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
  coord_flip() +
  xlab("Type of mutation") +
  ylab("Count") +
  labs(title = "Only fixed mutations")


fixed.MS <- MutationalSpectrum.pop %>% ungroup() %>% 
  filter(value == 100) %>%
  # left_join(summary.mutational.spectrum) %>%
  # mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
  ggplot() +
  geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
  facet_wrap(~locus, scales= "free_x") +
  # scale_x_discrete(limits=position) +
  theme_bw() +
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(), 
    aspect.ratio = 1, 
    legend.position = "none",
    strip.background = element_blank(),
    axis.text.x = element_text(size=11, hjust=0.5),
    axis.text.y = element_text(size=11)) +
  scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
  scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
  coord_flip() +
  xlab("Type of mutation") +
  ylab("Count") +
  labs(title = "Only fixed mutations")

# total mutaciones chr=3873
MS_fixed_chr <- MutationalSpectrum.pop %>% 
  filter(value == 100) %>%
  filter(locus=="chr") %>% 
  group_by(locus, Mutation) %>% 
  summarise(N=n()) %>% 
  mutate(percen=(N/3873)*100) 

# total mutaciones plasmid=15
MS_fixed_plasmid <- MutationalSpectrum.pop %>% 
  filter(value == 100) %>%
  filter(locus=="plasmid") %>% 
  group_by(locus, Mutation) %>% 
  summarise(N=-n()) %>% 
  mutate(percen=(N/15)*100) 

MS_a <- MS_fixed_chr %>% rbind(MS_fixed_plasmid) %>% 
  ggplot() +
  geom_col(aes(x=Mutation, y=percen, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
  # facet_wrap(~locus, scales= "free_x") +
  scale_y_continuous(limits=c(-40,40))+
  theme_bw() +
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(), 
    aspect.ratio = 1, 
    legend.position = "none",
    strip.background = element_blank(),
    axis.text.x = element_text(size=11, hjust=0.5),
    axis.text.y = element_text(size=11)) +
  scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
  scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
  coord_flip() +
  xlab("Type of mutation") +
  ylab("Percentage of total mutations") +
  labs(title = "Only fixed mutations")

# ALL MUTATIONS

# total mutations in chr= 7196
MS_chr <- MutationalSpectrum.pop %>% 
  filter(locus=="chr") %>% 
  group_by(locus, Mutation) %>% 
  summarise(N=n()) %>% 
  mutate(percen=(N/7196)*100) 

# total mutaciones plasmid=278
MS_plasmid <- MutationalSpectrum.pop %>% 
  filter(locus=="plasmid") %>% 
  group_by(locus, Mutation) %>% 
  summarise(N=-n()) %>% 
  mutate(percen=(N/278)*100) 

MS_b <- MS_chr %>% rbind(MS_plasmid) %>% 
  ggplot() +
  geom_col(aes(x=Mutation, y=percen, color=locus, fill=locus), alpha=0.4) +  # Intercambio x y y
  # facet_wrap(~locus, scales= "free_x") +
  # scale_x_discrete(limits=position) +
    scale_y_continuous(limits=c(-40,40))+
  theme_bw() +
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(), 
    aspect.ratio = 1, 
    legend.position = "none",
    strip.background = element_blank(),
    axis.text.x = element_text(size=11, hjust=0.5),
    axis.text.y = element_text(size=11)) +
  scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
  scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
  coord_flip() +
  xlab("Type of mutation") +
  ylab("Percentage of total mutations") +
  labs(title = "All mutations")

ggarrange(MS_b, MS_a, align = "hv")

Mutation distribution

Now, we analyse the mutations distribution:

knitr::opts_chunk$set(warning = FALSE, message = FALSE) 

muchas.muts.plasmid.breseq <- Mutation.per.replica.pop %>% group_by(locus, Gene, cumulative_pos, ara) %>% 
  summarise(Nmuts=n()) 
  # filter(N>3) 

Mutation.per.replica.pop %>% 
   filter(Evidence == "RA") %>% 
#  filter(contig == 66) %>%
  filter(!is.na(Position)) %>% 
ggplot(aes(x = cumulative_pos, y = ara)) +
  geom_density_ridges(aes(color= ara, fill = ara), bandwidth= 50, alpha=0.5) +
 facet_wrap(~locus, scales="free") +
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(),
        aspect.ratio = 1,
        legend.position = "none",
        strip.background = element_blank(),
        # strip.background = element_rect(fill = "black"),
        # strip.text = element_text(color = "white"),
        axis.text.x = element_text( vjust = 0.5))+
    scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
    xlab("Position (bp)") + ylab("Density")

mut.threshold  %>% 
  filter(contig!=66) %>%
  filter(value == 100) %>% 
  select(contig, Length) %>%
  arrange(contig) %>% 
  unique() %>% 
  mutate(cumulative_contig=cumsum(Length)) %>% 
  mutate(cumulative_contig=cumulative_contig-as.numeric(Length)) %>% 
  right_join(Mutation.per.replica.pop) %>% 
  mutate(cumulative_contig=ifelse(contig==66, 0, cumulative_contig)) %>% 
  mutate(cumulative_pos=cumulative_contig+Position) %>% 
  filter(contig < 116)

# Genes con más mutaciones en plasmidos
muchas.muts.plasmid.breseq %>% filter(locus == "plasmid") %>% 
  ggplot() +
  geom_bar(mapping = aes(x=Gene, fill=ara, color=ara), alpha=0.3) +
  facet_wrap(~ara, scales="fixed")+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
        axis.text.x = element_text(angle = 45,  hjust=1))+
      scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))

muchas.muts.plasmid.breseq %>% filter(locus == "chr")  %>% filter(Nmuts>3) %>% 
  ggplot() +
  geom_bar(mapping = aes(x=Gene, fill=ara, color=ara), alpha=0.3) +
  facet_wrap(~ara, scales="fixed")+
  theme_bw() +
  theme(panel.background = element_blank(), panel.grid = element_blank(), 
        aspect.ratio = 1, 
        #legend.position = "none",
        strip.background = element_blank(),
         axis.text.x = element_text(angle = 90,  vjust=0.5))+
      scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
  scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))

# muchas.muts.plasmid.breseq %>% filter(locus == "chr") %>% filter(Nmuts>4) %>% 
#   filter(value == 100) %>% 
#   ggplot() +
#   geom_col(mapping = aes(x=Gene, y=Nmuts, fill=ara, color=ara), alpha=0.3) +
#   facet_wrap(~ara, scales="fixed")+
#   theme_bw() +
#   theme(panel.background = element_blank(), panel.grid = element_blank(), 
#         aspect.ratio = 1, 
#         #legend.position = "none",
#         strip.background = element_blank(),
#         axis.text.x = element_text(angle = 90,  vjust=0.5))+
#       scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
#   scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))

MA