First, we selected the appropriate concentrations of arabinose based on the PCN and the fitness.
##Arabinose selection
## Df Sum Sq Mean Sq F value Pr(>F)
## ara 4 168900 42225 94.17 <2e-16 ***
## Residuals 243 108965 448
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After the evolution experiment, we calculated the number of generations for each line each day of the evolution.
##
## Pearson's product-moment correlation
##
## data: datalimpia$generations and datalimpia$day
## t = 2.9393, df = 448, p-value = 0.00346
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.04569129 0.22709974
## sample estimates:
## cor
## 0.1375488
##
## Pearson's product-moment correlation
##
## data: data_ara0$generations and data_ara0$day
## t = 0.27202, df = 90, p-value = 0.7862
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1771963 0.2321162
## sample estimates:
## cor
## 0.02866135
##
## Pearson's product-moment correlation
##
## data: data_ara0.0005$generations and data_ara0.0005$day
## t = 0.56582, df = 69, p-value = 0.5734
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1680085 0.2965608
## sample estimates:
## cor
## 0.06795902
##
## Pearson's product-moment correlation
##
## data: data_ara0.05$generations and data_ara0.05$day
## t = 0.48033, df = 97, p-value = 0.6321
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1501432 0.2437800
## sample estimates:
## cor
## 0.04871251
##
## Pearson's product-moment correlation
##
## data: data_ara0$generations and data_ara0$day
## t = 1.5921, df = 57, p-value = 0.1169
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.05251048 0.43921970
## sample estimates:
## cor
## 0.2063465
##
## Pearson's product-moment correlation
##
## data: data_ara0.0005$generations and data_ara0.0005$day
## t = 1.4182, df = 63, p-value = 0.1611
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.07105853 0.40252027
## sample estimates:
## cor
## 0.1758889
##
## Pearson's product-moment correlation
##
## data: data_ara0.05$generations and data_ara0.05$day
## t = 3.2985, df = 62, p-value = 0.001613
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1553190 0.5773521
## sample estimates:
## cor
## 0.3863769
##
## Spearman's rank correlation rho
##
## data: correlation.AUCvsgenerations$generations and correlation.AUCvsgenerations$AUC
## S = 1.4647e+10, p-value = 0.0001668
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.0558932
Also, we calculated the PCN by qPCR for the days 10, 20 and 30 for all samples (we run 3 qPCRs for each line, i.e. 9 technical replicates) and plot the PCN results against each line, by day and by arabinose concentration:
## [1] 0e+00 0e+00 0e+00 0e+00 5e-04 5e-04
### PCN statistics
This is the statistic analysis for the PCN along the evolution experiment. To see data distribution, we do a Shapiro test. I do an anova to see if there are significant differences, by day and conc. from now With the ANOVA analysis we see that there are significant differences between the arabinose concentrations (p-value < 2e-16), but not with respect to the day or the interaction of the day and the arabinose concentration.
##
## Pearson's product-moment correlation
##
## data: alldata_filtered$pcn and alldata_filtered$day
## t = -1.9568, df = 24, p-value = 0.0621
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.66302404 0.01916846
## sample estimates:
## cor
## -0.3709377
##
## Shapiro-Wilk normality test
##
## data: pcn
## W = 0.81074, p-value < 2.2e-16
## Df Sum Sq Mean Sq F value Pr(>F)
## ara 2 549718 274859 4434.09 < 2e-16 ***
## day 1 1106 1106 17.84 2.65e-05 ***
## linea 11 10956 996 16.07 < 2e-16 ***
## ara:day 2 2634 1317 21.24 9.85e-10 ***
## ara:linea 22 22052 1002 16.17 < 2e-16 ***
## day:linea 11 7135 649 10.46 < 2e-16 ***
## ara:day:linea 22 15745 716 11.55 < 2e-16 ***
## Residuals 866 53681 62
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Once the strains from the last day of evolution and the ancestors have been sequenced, we proceed to analyze the mutations that have accumulated throughout the evolution experiment.
First we do a BLAST of the contig where the plasmid is found in each strain, against the contig where the plasmid is almost closed (contig 66 - pTAday0). With the coverage of these contigs we can calculate the PCN.
##
## Pearson's product-moment correlation
##
## data: PCN_qPCR_cov$PCN_qPCR and PCN_qPCR_cov$PCN_cov
## t = 8.024, df = 28, p-value = 9.742e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6786587 0.9187601
## sample estimates:
## cor
## 0.8348156
## Breseq with populations - all the info
MUTATION ANALYSIS: In Breseq outputs, mutations can be divided by: - RA evidence: that supports a mutated base or indel more than the reference sequence, but without sufficient support to pass the cutoff threshold - JC evidence: for a set number of the highest scoring junctions that do not pass all test criteria.
So, we select only the RA mutations
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# Breseq por poblaciones de todas las cepas de pTA frente a pTAday0 (chr y plasmido)
library(readxl)
breseq.pop.ptaday0 <- read_excel("~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Breseq_results_pop/quality20/summary_table_Breseq_20_pop_p.xlsx", sheet = 2)
# breseq.pop.ptaday0 = read.delim(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Breseq_results_pop/quality30/page2", header = TRUE, sep = "", dec = ",")
breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>%
pivot_longer(cols = c(WTCHG_914003_71215097:WTCHG_914003_71825158))
cepas <- read.table(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/cepas", header = TRUE, sep = "", dec = ",")
cepas <- cepas %>% separate(Sample, into = c("Cosa", "Sample"), sep = (":"))
breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>%
mutate(strain = name) %>%
left_join(cepas)
breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>% mutate(name = Sample) %>% select(-Cosa, -Count)
#Mutaciones totales. Para que entienda los signos hay que poner expresiones regulares, con esto les dices que no es una expresión regular (\\simbol)
# Si queremos analizar un tipo de mutaciones en concreto, lo corremos con el filtro en mutation_type
Filter_pop=0
breseq.pop.ptaday0 <- breseq.pop.ptaday0 %>%
filter(value != "0") %>%
mutate(mutation_type = ifelse(grepl("→", Mutation), yes = "SNP",
no = ifelse(grepl("\\+", Mutation), yes = "INS",
no = ifelse(grepl("bp", Mutation), yes = "DEL",
no= NA))))
# %>% filter(mutation_type=="SNP")
# quitamos las mutaciones no fijadas del chr
# Tenemos que haber cambiado el nombre de la columna Seq ID por ID. En este caso he cambiado Position para llamarlo ID porque en ID solo me salia el nombre de un contig.
chr.pop = breseq.pop.ptaday0 %>%
filter(ID != "D_66") %>%
mutate(locus = "chr") %>% filter(Annotation>Filter_pop)
plasmid.pop = breseq.pop.ptaday0 %>%
filter(ID == "D_66" ) %>%
mutate(locus = "plasmid") %>% filter(Annotation>Filter_pop)
mutationsporlocus.pop = rbind(chr.pop, plasmid.pop)
mutationsporlocus.pop = mutationsporlocus.pop %>% mutate(genotype = ifelse(grepl("p", name), yes = "pta", no = ifelse(grepl("w", name), yes = "wt",
no = ifelse(grepl("W", name), yes = "wt", no = NA))))
mutationsporlocus.pop = mutationsporlocus.pop %>% mutate(ara = ifelse(grepl("p0", name), yes = "0", no = ifelse(grepl("p50", name), yes = "0.05",
no = ifelse(grepl("p5", name), yes = "0.0005",
no = ifelse(grepl("pTAday0", name), yes = "ancestro",
no = ifelse(grepl("w0", name), yes = "0",
no = ifelse(grepl("w50", name), yes = "0.05",
no = ifelse(grepl("w5", name), yes = "0.0005",
no = ifelse(grepl("WTday0", name), yes = "ancestro", no = NA)))))))))
mutationsporlocus.pop = mutationsporlocus.pop %>%
filter(name != "p5B1" & name != "p0B3" & name != "w0B1" & name != "w50A4" & name != "w5B4" & name != "p0B1" & name != "p5A2" & name != "p50A4" & name != "w0A2" & name != "p5C4" & name != "w0A4" & name != "p5B3" ) %>%
separate(ID, into = c("cosa", "contig"), sep = "_", remove = F) %>%
select(-cosa)
mutationsporlocus.pop$Position= str_remove(mutationsporlocus.pop$Position, ",")
mutationsporlocus.pop$Position = as.numeric(mutationsporlocus.pop$Position)
mutationsporlocus.pop$contig = as.numeric(mutationsporlocus.pop$contig)
# Limpiamos las mutaciones compartidas con los ancestros:
mutationsporlocus.pop.ancestro = mutationsporlocus.pop %>%
filter(ara == "ancestro")
mutationsporlocus.pop %>%
filter(ara != "ancestro")
# Para filtrar en tabla_1_total_pop por porcentaje de fijacion:
mutation.pop.limpias = anti_join(mutationsporlocus.pop, mutationsporlocus.pop.ancestro, by=c("ID", "Position", "Description", "Gene")) %>%
mutate(value= as.numeric(value))
# %>% filter(Evidence == "RA")
# filter(value > 5.9)
genome_size=4639675
# pTA44 del dia cero cerrado por nanopore
plasmid_size=10319
# Mediana PCN día 30 (qPCR)
pcn_ara_0<- 1.160
pcn_ara_0.0005<- 11.865
pcn_ara_0.05<- 56.400
# Calculamos el numero de replicas de cada condición
replicas.pop = mutation.pop.limpias %>%
select(genotype, ara, name) %>%
distinct() %>%
group_by(genotype, ara) %>%
summarise(N=n())
Mutation.per.replica.pop <- left_join(mutation.pop.limpias, replicas.pop)
Mutation.per.replica.pop<- Mutation.per.replica.pop %>%
full_join(data.frame(contigs=names(myfiles[["ptaday0_illum.fasta"]])) %>%
separate(contigs, into=c("Node", "cosa"), sep="_length_") %>%
separate(cosa, into=c("node_length", "cov"), sep="_cov_") %>%
separate(Node, into=c("cosa", "contig")) %>%
select(contig, node_length) %>%
mutate(contig=as.numeric(contig)))
# Aquí calculamos las posiciones sumando al contig siguiente el numero de bases del anterior y limpiamos los contigs pequeños
Mutation.per.replica.pop <- Mutation.per.replica.pop %>%
filter(Evidence == "RA") %>%
filter(contig!=66) %>%
select(contig, node_length) %>%
arrange(contig) %>%
unique() %>%
mutate(cumulative_contig=cumsum(node_length)) %>%
mutate(cumulative_contig=cumulative_contig-as.numeric(node_length)) %>%
right_join(Mutation.per.replica.pop) %>%
mutate(cumulative_contig=ifelse(contig==66, 0, cumulative_contig)) %>%
mutate(cumulative_pos=cumulative_contig+Position) %>%
filter(contig < 116)
# We filter only SNPs with the Evidence
total.mutaciones.pop = Mutation.per.replica.pop %>%
filter(Evidence == "RA") %>%
# # Para filtrar el porcentaje de fijación
# filter(value > 90) %>%
group_by(ara, locus, genotype,N) %>%
filter(!is.na(ara)) %>%
summarise(Mutaciones.totales = n()) %>%
left_join(replicas.pop)
mutationsporlocus.pop$Annotation = as.numeric(mutationsporlocus.pop$Annotation)
Mutation.per.replica.pop$value = as.numeric(Mutation.per.replica.pop$value)
mut.total.pop = Mutation.per.replica.pop %>%
filter(Evidence == "RA") %>%
# filter(value > 6) %>%
group_by(ara, locus, genotype, N) %>%
filter(!is.na(mutation_type)) %>%
summarise(Mutaciones.totales = n())
#
mut.total.pop = mut.total.pop %>%
mutate(SNPs.totales.per.rep = Mutaciones.totales/N)
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
mutation.pop.limpias %>% filter(locus == "chr") %>%
ggplot(aes (x= value)) +
geom_histogram() +
facet_wrap(~ara)
mut.threshold <- mutation.pop.limpias %>%
# mutate(strain = Sample) %>%
# filter(locus == "plasmid") %>%
mutate(ara = as.factor(ara))
PCN_qPCR_cov <- PCN_qPCR_cov %>%
mutate(ara = as.factor(ara)) %>%
mutate(name = strain)
mut.threshold <- mut.threshold %>%
left_join(PCN_qPCR_cov, by =c("name")) %>%
mutate(ara = ara.x) %>%
# mutate(Sample = Sample.x) %>%
# select(-Sample.x) %>%
mutate(strain = strain.x) %>%
select(-strain.x)
mut.threshold <- mut.threshold %>% mutate(threshold = 1/PCN_qPCR) %>% mutate(threshold.porc = threshold*100)
mut.threshold <-mut.threshold %>% mutate(mayor.threshold = ifelse(value>threshold.porc, yes = "SI", no = "NO")) %>% mutate(ara = as.character(ara))
mut.threshold %>%
ggplot(aes (x= value)) +
geom_histogram() +
facet_wrap(~ara.x~mayor.threshold)
mut.threshold %>% group_by(locus, ara, Sample) %>% summarise(N=n())
We also did statistical analysis to see if there are significant differences between the lines with 0 mutations. There are statistically significant differences between 0-0.0005% and 0-0.05%
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# Vamos a ver si la proporcion de no mutantes es diferente a medida que aumentas el PCN.
mutants <- read.table(file = "~/MEGA/Tesis/Mutation_accumulation/Bioinfo/contingencia", header = TRUE, sep = "", dec = ",")
ara0vs.0.0005 <- mutants %>% select(X0,X0.0005)
ara0.0005vs.0.05 <- mutants %>% select(X0.0005,X0.05)
ara0vs.0.05 <- mutants %>% select(X0,X0.05)
# Chi-square
# 0 vs 0.0005
chisq.test(ara0vs.0.0005)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ara0vs.0.0005
## X-squared = 3.8503, df = 1, p-value = 0.04974
# 0 vs 0.05
chisq.test(ara0.0005vs.0.05)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ara0.0005vs.0.05
## X-squared = 0.026989, df = 1, p-value = 0.8695
#0.0005 vs 0.05
chisq.test(ara0vs.0.05)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ara0vs.0.05
## X-squared = 8.6148, df = 1, p-value = 0.003334
# Fisher
# 0 vs 0.0005
fisher.test(ara0vs.0.0005)
##
## Fisher's Exact Test for Count Data
##
## data: ara0vs.0.0005
## p-value = 0.02482
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.033394 832.598699
## sample estimates:
## odds ratio
## 13.51875
# 0 vs 0.05
fisher.test(ara0.0005vs.0.05)
##
## Fisher's Exact Test for Count Data
##
## data: ara0.0005vs.0.05
## p-value = 0.4211
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.03525615 Inf
## sample estimates:
## odds ratio
## Inf
#0.0005 vs 0.05
fisher.test(ara0vs.0.05)
##
## Fisher's Exact Test for Count Data
##
## data: ara0vs.0.05
## p-value = 0.001032
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.8631 Inf
## sample estimates:
## odds ratio
## Inf
# Mutations per generation and locus
p1 <- ggplot(data = prueba %>% filter(locus=="plasmid", variable =="Mutations.per.gen.and.locus", Copy_number<2000), aes(x = Copy_number, y = value))+
# geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
geom_smooth(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value),
method="lm",
formula=y~log(x),
color="grey", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
# geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
geom_smooth(method="lm", formula=y~log(x),color="grey", fill="lightgrey", alpha=0.4, fullrange=TRUE)+
# geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
geom_point(aes(color=ara, fill=ara), size=2)+
scale_y_log10(limits=c(3e-8, 3e-5))+
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_x_continuous(breaks=c(0,20,40,60, 80, 100, 120, 140), limits=c(0,100))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 0, hjust=0.5),
legend.position = "none") +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
labs( y="Mutations per generation and locus", x="Copy number")
# Mutations per generation and nt
a <- ggplot(data= prueba %>% filter(locus=="plasmid", variable =="Mutation.rate.per.gen.and.nt", Copy_number<2000), aes(x = Copy_number, y = value))+
# geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
geom_smooth(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutation.rate.per.gen.and.nt"), aes(x = Copy_number, y = value),
method="lm", formula=y~log(x),color="gray", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
geom_smooth(method="lm", formula=y~log(x),color="gray", fill="lightgrey", alpha=0.2, fullrange=TRUE)+
# geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
geom_point(aes(color=ara, fill=ara), size=2)+
# geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.NJs.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
scale_y_log10(limits=c(3e-8, 1e-5))+
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_x_continuous(breaks=c(0,20,40,60, 80, 100, 120, 140), limits=c(0,120))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 0, hjust=0.5),
legend.position = "none") +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
labs(y="Mutation rate (Mutations per generation and nt)", x="Copy number")
# CHR
# geom.chr <- ggplot(data = prueba %>% mutate(ara=as.numeric(ara)) %>% filter(locus=="chr", variable =="Mutations.per.gen.and.locus", Copy_number<2000), aes(x = ara, y = value))+
# # geom_smooth( method="gam",color="grey", fill="lightgrey", alpha=0.2, fullrange=TRUE) +
# # geom_smooth(data= prueba %>% filter(locus=="chr", variable =="expected.mutations.per.gen.and.locus"), aes(x = ara, y = value),
# # method="lm",
# # formula=y~log(x),
# # color="grey", alpha=0.02, fullrange=TRUE, linetype="dotted", size=.7) +
# # geom_point(data= prueba %>% filter(locus=="plasmid", variable =="expected.mutations.per.gen.and.locus"), aes(x = Copy_number, y = value), color="grey") +
# geom_smooth(method="lm", formula=y~log(x),color="grey", fill="lightgrey", alpha=0.4, fullrange=TRUE)+
# # geom_line(aes(x=Copy_number, y=value), color="lightgrey",linetype=3)+
# geom_point(aes(color=ara, fill=ara), size=2)+
# scale_y_log10(limits=c(3e-8, 3e-5))+
# annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
# scale_x_continuous() +
# theme_bw() +
# theme(panel.background = element_blank(), panel.grid = element_blank(),
# aspect.ratio = 1,
# #legend.position = "none",
# strip.background = element_blank(),
# axis.text.x = element_text(angle = 45, hjust=1),
# legend.position = "none") +
# scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
# scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
# labs(title="Mutation rate", y="Mutations per generation and locus", x="Copy number")
boxplot.chr <- table_1_good_total_pop %>% filter(locus=="chr") %>%
ggplot() +
geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, outlier.color = NULL, outlier.fill = NULL) +
geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15, size=1.5) +
scale_y_log10()+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size = 10, hjust=0.5),
axis.text.y = element_text(size = 10)) +
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
xlab("Arabinose (%)") + ylab("Mutations per generation and locus")
boxplot.chr
# Boxplot con el chr de fondo
boxplot_chr_plasmid <- table_1_good_total_pop %>% filter(locus=="plasmid") %>%
ggplot() +
geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3) +
geom_boxplot(data = table_1_good_total_pop %>% filter(locus=="chr"), aes(x=ara, y= Mutations.per.gen.and.locus, color="Chromosome", fill="Chromosome"), alpha=0.3, outlier.color = NULL, outlier.fill = NULL)+
geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15, size=1.5) +
scale_y_log10(limits=c(3e-8, 1e-5))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size = 10, hjust=0.5),
axis.text.y = element_text(size = 10)) +
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_fill_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray")) +
scale_color_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray"))+
labs(y="Mutations per generation and locus", x = "Arabinose (%)")
boxplot_chr_plasmid
b <- table_1_good_total_pop %>% filter(locus=="plasmid") %>%
ggplot() +
geom_boxplot(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color=ara, fill=ara), alpha=0.3) +
geom_jitter(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color=ara, fill=ara), alpha=0.3, width=0.15, size=1.5) +
scale_y_log10(limits=c(3e-8, 1e-5))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(hjust=0.5),
legend.position = "none") +
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
labs(y="Mutation rate (Mutations per generation and nt)", x = "Arabinose (%)")
ggarrange(b, a, align = "hv", nrow = 1)
# ggarrange(p1,b1)
# Boxplots Mutations.per.gen.nt y Mutations.per.gen.locus
table_1_good_total_pop %>% filter(locus=="plasmid") %>%
ggplot() +
geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.4) +
geom_boxplot(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color="lightgray", fill="lightgray"), alpha=0.2, outlier.color = NULL, outlier.fill = NULL)+
geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.4, width=0.15, size=1.5) +
geom_jitter(aes(x=ara, y= Mutation.rate.per.gen.and.nt, color="lightgray", fill="lightgray"), alpha=0.2, width=0.15, size=1.5) +
scale_y_log10(limits=c(3e-8, 1e-5))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size = 10, hjust=0.5),
axis.text.y = element_text(size = 10)) +
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_fill_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray")) +
scale_color_manual(values=c("#d02543ff", "#ddb13cff", "#5ebdb2ff", "lightgray"))+
labs(y="Mutations per generation and locus", x = "Arabinose (%)")
## Breseq - Correlation PCN vs mutation rate The mutation rate is
correlated with the PCN:
# Correlation mutation rate vs copy number
cor.test(table_1_good_total_pop$Mutations.per.gen.and.locus, table_1_good_total_pop$Copy_number, method= c("pea"))
##
## Pearson's product-moment correlation
##
## data: table_1_good_total_pop$Mutations.per.gen.and.locus and table_1_good_total_pop$Copy_number
## t = 8.9298, df = 56, p-value = 2.343e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6335884 0.8554025
## sample estimates:
## cor
## 0.7664519
cor.test(table_1_good_total_pop$Mutations.per.gen.and.locus, table_1_good_total_pop$Copy_number, method= c("spea"))
##
## Spearman's rank correlation rho
##
## data: table_1_good_total_pop$Mutations.per.gen.and.locus and table_1_good_total_pop$Copy_number
## S = 7192.4, p-value = 6.158e-13
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.778756
#
# cor.test(table_1_good_total_pop$Mutation.rate.per.gen.and.nt_Norm, table_1_good_total_pop$Copy_number, method= c("pea"))
# write.table(table_1_good_total_pop, file="~/MEGA/Tesis/Mutation_accumulation/Bioinfo/Tabla_1_total_breseq",
# sep="\t", quote = F, row.names = T, col.names = T)
#
# write.table(table_1_good_pop, file="~/MEGA/Tesis/Mutation_accumulation/Bioinfo/ConAdaptadores/Tabla_1_breseq",
# sep="\t", quote = F, row.names = T, col.names = T)
Now, we analyze only the fixed mutations (with a fixation percentage of 100%)
## Breseq pop - Mutation rate statistics
There are also statistically significant differences in the mutation rate of the plasmid between 0-0.0005 and 0-0.05 and no differences between the conditions in the chromosome:
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# statatistics.breseq <- table_1_good_total_pop %>% group_by(ara) %>% summarise(mean.mut.rate=mean(Mutation.rate.per.gen.and.nt)) %>% view()
statatistics.breseq <- table_1_good_total_pop %>% filter(locus == "plasmid") %>%
group_by(ara) %>%
summarise(
count = n(),
mean = mean(Mutations.per.gen.and.locus, na.rm = TRUE),
sd = sd(Mutation.rate.per.gen.and.nt, na.rm = TRUE),
median = median(Mutation.rate.per.gen.and.nt, na.rm = TRUE),
IQR = IQR(Mutation.rate.per.gen.and.nt, na.rm = TRUE)
)
# install.packages("ggpubr")
library(ggpubr)
b1 <- table_1_good_total_pop %>% filter(locus=="plasmid") %>%
ggplot() +
geom_boxplot(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3) +
geom_jitter(aes(x=ara, y= Mutations.per.gen.and.locus, color=ara, fill=ara), alpha=0.3, width=0.15, size=1.5) +
scale_y_log10(limits=c(3e-8, 3e-5))+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(hjust=1)) +
annotation_logticks(sides = "l", outside = F, long = unit(0.05, "cm")) +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))+
labs(y="Mutations per generation and locus", xlab = "Ara")
b1
# Kruskal test para ver si hay diferencias significativas entre las distintas concentraciones de Ara en el plásmido
kruskal.test(Mutations.per.gen.and.locus ~ ara, data = table_1_good_total_pop %>% filter(locus == "plasmid"))
##
## Kruskal-Wallis rank sum test
##
## data: Mutations.per.gen.and.locus by ara
## Kruskal-Wallis chi-squared = 19.584, df = 2, p-value = 5.591e-05
pairwise.wilcox.test(table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "plasmid"], table_1_good_total_pop$ara[table_1_good_total_pop$locus == "plasmid"],
p.adjust.method = "BH")
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "plasmid"] and table_1_good_total_pop$ara[table_1_good_total_pop$locus == "plasmid"]
##
## 0 5e-04
## 5e-04 6.9e-05 -
## 0.05 1.7e-05 0.27
##
## P value adjustment method: BH
# Model
lm<- glm(log10(Mutations.per.gen.and.locus) ~ Copy_number , data=table_1_good_total_pop %>% filter(locus=="plasmid"))
summary(lm)
##
## Call:
## glm(formula = log10(Mutations.per.gen.and.locus) ~ Copy_number,
## data = table_1_good_total_pop %>% filter(locus == "plasmid"))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.70672 -0.11163 -0.00462 0.24110 0.44292
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.143247 0.075589 -81.271 < 2e-16 ***
## Copy_number 0.007451 0.001831 4.069 0.000368 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.09053332)
##
## Null deviance: 3.9437 on 28 degrees of freedom
## Residual deviance: 2.4444 on 27 degrees of freedom
## AIC: 16.567
##
## Number of Fisher Scoring iterations: 2
summary(anova(lm))
## Df Deviance Resid. Df Resid. Dev
## Min. :1 Min. :1.499 Min. :27.00 Min. :2.444
## 1st Qu.:1 1st Qu.:1.499 1st Qu.:27.25 1st Qu.:2.819
## Median :1 Median :1.499 Median :27.50 Median :3.194
## Mean :1 Mean :1.499 Mean :27.50 Mean :3.194
## 3rd Qu.:1 3rd Qu.:1.499 3rd Qu.:27.75 3rd Qu.:3.569
## Max. :1 Max. :1.499 Max. :28.00 Max. :3.944
## NA's :1 NA's :1
coefficients(lm)
## (Intercept) Copy_number
## -6.143247050 0.007451323
# Kruskal test para el chromosome
kruskal.test(Mutations.per.gen.and.locus ~ ara, data = table_1_good_total_pop %>% filter(locus == "chr"))
##
## Kruskal-Wallis rank sum test
##
## data: Mutations.per.gen.and.locus by ara
## Kruskal-Wallis chi-squared = 0.4806, df = 2, p-value = 0.7864
pairwise.wilcox.test(table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "chr"], table_1_good_total_pop$ara[table_1_good_total_pop$locus == "chr"],
p.adjust.method = "BH")
##
## Pairwise comparisons using Wilcoxon rank sum exact test
##
## data: table_1_good_total_pop$Mutations.per.gen.and.locus[table_1_good_total_pop$locus == "chr"] and table_1_good_total_pop$ara[table_1_good_total_pop$locus == "chr"]
##
## 0 5e-04
## 5e-04 0.76 -
## 0.05 0.76 0.76
##
## P value adjustment method: BH
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# PLOT SNPs per gen / PCN
PCN_qPCR_cov$ara<- as.factor(PCN_qPCR_cov$ara)
table_1_good_total_pop$ara<- as.factor(table_1_good_total_pop$ara)
# Mutations.and.PCNs.pop <- left_join(table_1_good_total_pop, PCN_qPCR_cov, by = c("ara", "strain")) %>%
# filter(genotype == "pta")
Mutations.and.PCNs.pop <- table_1_good_total_pop %>%
filter(genotype == "pta")
# Plot con PCN COVERAGE
Mutations.and.PCNs.pop %>%
filter(locus=="plasmid") %>%
group_by(ara, locus) %>%
summarise(cosa=sum(Mutations.per.generation, na.rm=T),
mean_PCN=mean(PCN_cov, na.rm=T),
sd_pcn=sd(PCN_cov, na.rm=T),
mean_mutations=mean(Mutations.per.generation, na.rm=T),
sd_mut=sd(Mutations.per.generation, na.rm=T))%>%
mutate(ara = as.factor(ara)) %>%
ggplot() +
geom_point(aes(x=PCN_cov, y=Mutations.per.generation, color=ara),
data=Mutations.and.PCNs.pop %>% filter(locus=="plasmid"),
alpha=.5) +
geom_point(aes(x=mean_PCN, y=mean_mutations, color=ara, fill=ara),
shape=23,
# alpha=.7,
size=4)+
geom_errorbar(aes(x=mean_PCN, ymin=mean_mutations-sd_mut, ymax=mean_mutations+sd_mut, color=ara),
width=0)+
geom_errorbarh(aes(xmin=mean_PCN-sd_pcn,xmax=mean_PCN+sd_pcn, y=mean_mutations, color=ara),
height=0)+
scale_x_log10()+
scale_y_log10()+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 45, hjust=1)) +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))
# Plot con PCN qPCR
Mutations.and.PCNs.pop %>%
filter(locus=="plasmid") %>%
group_by(ara, locus) %>%
summarise(cosa=sum(Mutations.per.generation, na.rm=T),
mean_PCN=mean(PCN_qPCR, na.rm=T),
sd_pcn=sd(PCN_qPCR, na.rm=T),
mean_mutations=mean(Mutations.per.generation, na.rm=T),
sd_mut=sd(Mutations.per.generation, na.rm=T))%>%
mutate(ara = as.factor(ara)) %>%
ggplot() +
geom_point(aes(x=PCN_qPCR, y=Mutations.per.generation, color=ara),
data=Mutations.and.PCNs.pop %>% filter(locus=="plasmid"),
alpha=.5) +
geom_point(aes(x=mean_PCN, y=mean_mutations, color=ara, fill=ara),
shape=23,
# alpha=.7,
size=4)+
geom_errorbar(aes(x=mean_PCN, ymin=mean_mutations-sd_mut, ymax=mean_mutations+sd_mut, color=ara),
width=0)+
geom_errorbarh(aes(xmin=mean_PCN-sd_pcn,xmax=mean_PCN+sd_pcn, y=mean_mutations, color=ara),
height=0)+
scale_x_log10()+
scale_y_log10()+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 45, hjust=1)) +
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))
## Mutational Spectrum The largest type of mutations that accumulate are
SNPs, both in the plasmid and in the chromosome. In the mutational
spectrum, there are more transitions than transversions, this is typical
of -MutS strains.
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
snps.pop <- mutation.pop.limpias %>%
filter(mutation_type == "SNP")
snps.pop %>%
group_by(genotype, locus, Mutation) %>%
summarise(N=n())
AT.pop<- snps.pop %>%
# group_by(ara, genotype, locus, Mutation) %>%
filter(Mutation=="A→T")
# summarise(Nmutations=n())
AG.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="A→G")
# summarise(Nmutations=n())
AC.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="A→C")
# summarise(Nmutations=n())
TC.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="T→C")
# summarise(Nmutations=n())
TA.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="T→A")
# summarise(Nmutations=n())
TG.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="T→G")
# summarise(Nmutations=n())
GA.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="G→A")
# summarise(Nmutations=n())
GC.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="G→C")
# summarise(Nmutations=n())
GT.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="G→T")
# summarise(Nmutations=n())
CT.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="C→T")
# summarise(Nmutations=n())
CA.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="C→A")
# summarise(Nmutations=n())
CG.pop<- snps.pop %>%
# group_by(ara, genotype, locus,Mutation) %>%
filter( Mutation=="C→G")
# summarise(Nmutations=n())
MutationalSpectrum.pop <- rbind(AT.pop, AC.pop, TA.pop, TG.pop, CA.pop, CG.pop, GC.pop, GT.pop, AG.pop, CT.pop, GA.pop, TC.pop)
summary.mutational.spectrum <- snps.pop %>%
group_by(genotype, ara, locus) %>%
summarise(N.snps.por.ara=n())
# position<-c("A:G","C:T", "G:A", "T:C", "T:G", "A:C", "A:T", "C:A", "C:G", "G:C", "G:T","T:A")
# MutationalSpectrum.pop %>% ungroup() %>%
# # filter(locus =="plasmid") %>%
# # left_join(summary.mutational.spectrum) %>%
# # mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
# ggplot() +
# geom_density(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
# facet_wrap(~locus, scales= "free") +
# # scale_x_discrete(limits=position) +
# theme_bw() +
# theme(
# panel.background = element_blank(),
# panel.grid = element_blank(),
# aspect.ratio = 1,
# # legend.position = "none",
# strip.background = element_blank(),
# axis.text.x = element_text(size=11, hjust=0.5),
# axis.text.y = element_text(size=11)) +
# scale_color_manual(values=c("#a666cbff", "#95cea4ff")) +
# scale_fill_manual(values=c("#a666cbff", "#9fd3adff")) +
# coord_flip() +
# xlab("Type of mutation") +
# ylab("Count")
MS <- MutationalSpectrum.pop %>% ungroup() %>%
# filter(locus =="plasmid") %>%
# left_join(summary.mutational.spectrum) %>%
# mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
ggplot() +
geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
facet_wrap(~locus, scales= "free_x") +
# scale_x_discrete(limits=position) +
theme_bw() +
theme(
panel.background = element_blank(),
panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size=11, hjust=0.5),
axis.text.y = element_text(size=11)) +
scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
coord_flip() +
xlab("Type of mutation") +
ylab(" ") +
labs(title = "All mutations")
# Only fixed mutations:
fixed.MS <- MutationalSpectrum.pop %>% ungroup() %>%
filter(value == 100) %>%
# left_join(summary.mutational.spectrum) %>%
# mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
ggplot() +
geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
facet_wrap(~locus, scales= "free_x") +
# scale_x_discrete(limits=position) +
theme_bw() +
theme(
panel.background = element_blank(),
panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size=11, hjust=0.5),
axis.text.y = element_text(size=11)) +
scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
coord_flip() +
xlab("Type of mutation") +
ylab("Count") +
labs(title = "Only fixed mutations")
fixed.MS <- MutationalSpectrum.pop %>% ungroup() %>%
filter(value == 100) %>%
# left_join(summary.mutational.spectrum) %>%
# mutate(percen.mutations = (Nmutations/N.snps.por.ara)*100) %>% ungroup() %>%
ggplot() +
geom_bar(aes(x=Mutation, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
facet_wrap(~locus, scales= "free_x") +
# scale_x_discrete(limits=position) +
theme_bw() +
theme(
panel.background = element_blank(),
panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size=11, hjust=0.5),
axis.text.y = element_text(size=11)) +
scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
coord_flip() +
xlab("Type of mutation") +
ylab("Count") +
labs(title = "Only fixed mutations")
# total mutaciones chr=3873
MS_fixed_chr <- MutationalSpectrum.pop %>%
filter(value == 100) %>%
filter(locus=="chr") %>%
group_by(locus, Mutation) %>%
summarise(N=n()) %>%
mutate(percen=(N/3873)*100)
# total mutaciones plasmid=15
MS_fixed_plasmid <- MutationalSpectrum.pop %>%
filter(value == 100) %>%
filter(locus=="plasmid") %>%
group_by(locus, Mutation) %>%
summarise(N=-n()) %>%
mutate(percen=(N/15)*100)
MS_a <- MS_fixed_chr %>% rbind(MS_fixed_plasmid) %>%
ggplot() +
geom_col(aes(x=Mutation, y=percen, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
# facet_wrap(~locus, scales= "free_x") +
scale_y_continuous(limits=c(-40,40))+
theme_bw() +
theme(
panel.background = element_blank(),
panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size=11, hjust=0.5),
axis.text.y = element_text(size=11)) +
scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
coord_flip() +
xlab("Type of mutation") +
ylab("Percentage of total mutations") +
labs(title = "Only fixed mutations")
# ALL MUTATIONS
# total mutations in chr= 7196
MS_chr <- MutationalSpectrum.pop %>%
filter(locus=="chr") %>%
group_by(locus, Mutation) %>%
summarise(N=n()) %>%
mutate(percen=(N/7196)*100)
# total mutaciones plasmid=278
MS_plasmid <- MutationalSpectrum.pop %>%
filter(locus=="plasmid") %>%
group_by(locus, Mutation) %>%
summarise(N=-n()) %>%
mutate(percen=(N/278)*100)
MS_b <- MS_chr %>% rbind(MS_plasmid) %>%
ggplot() +
geom_col(aes(x=Mutation, y=percen, color=locus, fill=locus), alpha=0.4) + # Intercambio x y y
# facet_wrap(~locus, scales= "free_x") +
# scale_x_discrete(limits=position) +
scale_y_continuous(limits=c(-40,40))+
theme_bw() +
theme(
panel.background = element_blank(),
panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(size=11, hjust=0.5),
axis.text.y = element_text(size=11)) +
scale_color_manual(values=c("#8366cbff", "#95cea4ff")) +
scale_fill_manual(values=c("#8366cbff", "#9fd3adff")) +
coord_flip() +
xlab("Type of mutation") +
ylab("Percentage of total mutations") +
labs(title = "All mutations")
ggarrange(MS_b, MS_a, align = "hv")
Now, we analyse the mutations distribution:
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
muchas.muts.plasmid.breseq <- Mutation.per.replica.pop %>% group_by(locus, Gene, cumulative_pos, ara) %>%
summarise(Nmuts=n())
# filter(N>3)
Mutation.per.replica.pop %>%
filter(Evidence == "RA") %>%
# filter(contig == 66) %>%
filter(!is.na(Position)) %>%
ggplot(aes(x = cumulative_pos, y = ara)) +
geom_density_ridges(aes(color= ara, fill = ara), bandwidth= 50, alpha=0.5) +
facet_wrap(~locus, scales="free") +
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
legend.position = "none",
strip.background = element_blank(),
# strip.background = element_rect(fill = "black"),
# strip.text = element_text(color = "white"),
axis.text.x = element_text( vjust = 0.5))+
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
xlab("Position (bp)") + ylab("Density")
mut.threshold %>%
filter(contig!=66) %>%
filter(value == 100) %>%
select(contig, Length) %>%
arrange(contig) %>%
unique() %>%
mutate(cumulative_contig=cumsum(Length)) %>%
mutate(cumulative_contig=cumulative_contig-as.numeric(Length)) %>%
right_join(Mutation.per.replica.pop) %>%
mutate(cumulative_contig=ifelse(contig==66, 0, cumulative_contig)) %>%
mutate(cumulative_pos=cumulative_contig+Position) %>%
filter(contig < 116)
# Genes con más mutaciones en plasmidos
muchas.muts.plasmid.breseq %>% filter(locus == "plasmid") %>%
ggplot() +
geom_bar(mapping = aes(x=Gene, fill=ara, color=ara), alpha=0.3) +
facet_wrap(~ara, scales="fixed")+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 45, hjust=1))+
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))
muchas.muts.plasmid.breseq %>% filter(locus == "chr") %>% filter(Nmuts>3) %>%
ggplot() +
geom_bar(mapping = aes(x=Gene, fill=ara, color=ara), alpha=0.3) +
facet_wrap(~ara, scales="fixed")+
theme_bw() +
theme(panel.background = element_blank(), panel.grid = element_blank(),
aspect.ratio = 1,
#legend.position = "none",
strip.background = element_blank(),
axis.text.x = element_text(angle = 90, vjust=0.5))+
scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))
# muchas.muts.plasmid.breseq %>% filter(locus == "chr") %>% filter(Nmuts>4) %>%
# filter(value == 100) %>%
# ggplot() +
# geom_col(mapping = aes(x=Gene, y=Nmuts, fill=ara, color=ara), alpha=0.3) +
# facet_wrap(~ara, scales="fixed")+
# theme_bw() +
# theme(panel.background = element_blank(), panel.grid = element_blank(),
# aspect.ratio = 1,
# #legend.position = "none",
# strip.background = element_blank(),
# axis.text.x = element_text(angle = 90, vjust=0.5))+
# scale_fill_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff")) +
# scale_color_manual(values=c("#d02543ff", "#5ebdb2ff", "#ddb13cff"))