The purpose of this script is to perform variant analysis on MAF (Mutation Annotation Format) files with maftools. The maftools documentation is available here: https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html
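# First time only (maftools comes from Bioconductor, the other packages from CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("maftools")
install.packages(c("knitr", "dplyr", "VennDiagram", "ggplot2"))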
# Load every time
library(knitr)
library(maftools)
library(dplyr)
library(VennDiagram)
library(ggplot2)
Input: One concatenated .maf file of all 96 canine CD4+ PTCL samples and one concatenated .maf file of all control samples (5 samples of sorted nodal CD4+ T cells and 2 samples of sorted CD4+ thymocytes). These input files were generated from RNA-seq data with the GATK pipeline. SNPs and INDELs were hard-filtered following the recommendations in the GATK documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants. SNPs with QD < 2, SOR > 3, FS > 60, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8, and INDELs with QD < 2, FS > 200, or ReadPosRankSum < -20, were excluded. After filtering, VCF files were annotated with the Ensembl Variant Effect Predictor (VEP) and converted to MAF files with vcf2maf. Variants annotated as intronic, splice_site, or splice_region, or with LOW or MODIFIER impact, were also excluded.
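The exclusion rule above can be sketched as a vectorised R predicate. This is a hypothetical helper, not the GATK implementation: it assumes the INFO annotations have already been extracted into a data frame with one column per annotation, flags a variant when it fails any single threshold, and, like the GATK VariantFiltration default, treats a missing annotation as passing.

```r
# Hypothetical helper (not part of GATK or maftools): TRUE for variants that
# fail at least one of the hard-filter thresholds listed above.
# `df` is assumed to hold one row per variant with columns named after the
# VCF INFO annotations (QD, SOR, FS, MQ, MQRankSum, ReadPosRankSum).
fails_hard_filter <- function(df, type = c("SNP", "INDEL")) {
  type <- match.arg(type)
  # A comparison on a missing annotation yields NA; count that as passing
  fails <- function(x) !is.na(x) & x
  if (type == "SNP") {
    fails(df$QD < 2) | fails(df$SOR > 3) | fails(df$FS > 60) |
      fails(df$MQ < 40) | fails(df$MQRankSum < -12.5) |
      fails(df$ReadPosRankSum < -8)
  } else {
    fails(df$QD < 2) | fails(df$FS > 200) | fails(df$ReadPosRankSum < -20)
  }
}

# Toy example: four SNPs, two of which fail (low QD; high SOR)
snps <- data.frame(
  QD = c(10, 1.5, 12, NA),
  SOR = c(1, 1, 4, 1),
  FS = c(5, 5, 5, 5),
  MQ = c(60, 60, 60, 60),
  MQRankSum = c(0, 0, 0, NA),
  ReadPosRankSum = c(0, 0, 0, 0)
)
kept_snps <- snps[!fails_hard_filter(snps, "SNP"), ]
print(nrow(kept_snps))
```

The fourth SNP is kept even though two annotations are missing, because missing values never trigger a failure in this sketch.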
# set working directory
setwd("C:/Users/edlarsen/Documents/PTCLRNASeq")
# read PTCL maf file
mymaf <- read.maf(maf = 'Cohort_2/Input/AllPTCLs.CanFam31.QD2.vep.refiltered.maf')
# read CTRL maf file
mymaf_ctrl <- read.maf(maf = 'Cohort_2/Input/AllCTRLs.CanFam31.QD2.vep.refiltered.maf')
# Tabulate the 20 samples with the most variants (the 20 with the fewest are stored in last20Samples)
sampleSummary <- getSampleSummary(mymaf)
first20Samples <- head(sampleSummary, 20)
last20Samples <- tail(sampleSummary, 20)
kable(first20Samples, caption = "Top 20 Canine CD4 PTCL Samples with Highest Number of Variants")
Tumor_Sample_Barcode | Frame_Shift_Del | Frame_Shift_Ins | In_Frame_Del | In_Frame_Ins | Missense_Mutation | Nonsense_Mutation | Nonstop_Mutation | Translation_Start_Site | total |
---|---|---|---|---|---|---|---|---|---|
CI162673 | 144 | 598 | 101 | 265 | 1089 | 209 | 21 | 3 | 2430 |
CI165644 | 155 | 697 | 122 | 313 | 739 | 234 | 11 | 7 | 2278 |
CI166556 | 238 | 635 | 118 | 302 | 721 | 240 | 6 | 7 | 2267 |
CI155427 | 143 | 709 | 119 | 338 | 704 | 239 | 8 | 2 | 2262 |
CI124711 | 147 | 613 | 114 | 327 | 831 | 204 | 12 | 5 | 2253 |
CI165189 | 128 | 661 | 132 | 309 | 765 | 232 | 9 | 8 | 2244 |
CI153070 | 192 | 640 | 128 | 302 | 729 | 236 | 11 | 4 | 2242 |
CI124777 | 118 | 707 | 119 | 300 | 722 | 243 | 8 | 5 | 2222 |
CI104689 | 139 | 648 | 130 | 304 | 766 | 220 | 7 | 6 | 2220 |
CI153051 | 161 | 621 | 127 | 300 | 723 | 235 | 11 | 5 | 2183 |
CI154958 | 149 | 652 | 130 | 334 | 695 | 208 | 9 | 5 | 2182 |
CI164934 | 149 | 551 | 111 | 257 | 856 | 233 | 11 | 4 | 2172 |
CI165411 | 123 | 639 | 123 | 305 | 765 | 200 | 10 | 6 | 2171 |
CI161277 | 151 | 634 | 131 | 332 | 661 | 240 | 8 | 4 | 2161 |
CI171487 | 118 | 547 | 107 | 309 | 823 | 228 | 11 | 9 | 2152 |
CI152603 | 128 | 667 | 112 | 316 | 691 | 220 | 7 | 4 | 2145 |
CI149694 | 156 | 671 | 132 | 307 | 645 | 218 | 10 | 2 | 2141 |
CI150689 | 138 | 630 | 118 | 326 | 654 | 254 | 14 | 6 | 2140 |
CI171323 | 134 | 576 | 133 | 288 | 762 | 219 | 10 | 15 | 2137 |
CI152139 | 143 | 638 | 130 | 347 | 623 | 237 | 10 | 8 | 2136 |
# Print a table of the top 20 mutated genes
geneSummary <- getGeneSummary(mymaf)
topVariantGenes <- head(geneSummary, 20)
kable(topVariantGenes, caption = "Top 20 Mutated Genes in Canine CD4 PTCL")
Hugo_Symbol | Frame_Shift_Del | Frame_Shift_Ins | In_Frame_Del | In_Frame_Ins | Missense_Mutation | Nonsense_Mutation | Nonstop_Mutation | Translation_Start_Site | total | MutatedSamples | AlteredSamples |
---|---|---|---|---|---|---|---|---|---|---|---|
RBBP6 | 1 | 266 | 7 | 46 | 4 | 1 | 0 | 0 | 325 | 96 | 96 |
ENSCAFT00000002983 | 3 | 0 | 0 | 0 | 251 | 1 | 0 | 1 | 256 | 96 | 96 |
SZT2 | 5 | 1 | 186 | 0 | 38 | 3 | 0 | 0 | 233 | 96 | 96 |
METTL26 | 94 | 0 | 56 | 0 | 10 | 0 | 70 | 0 | 230 | 96 | 96 |
PPP4R3A | 1 | 97 | 0 | 14 | 3 | 46 | 0 | 0 | 161 | 96 | 96 |
CTCF | 0 | 143 | 0 | 0 | 3 | 0 | 0 | 0 | 146 | 96 | 96 |
SHPRH | 0 | 8 | 96 | 8 | 11 | 13 | 0 | 0 | 136 | 96 | 96 |
KIF1C | 9 | 4 | 98 | 0 | 6 | 2 | 0 | 0 | 119 | 96 | 96 |
ENSCAFT00000014195 | 13 | 0 | 1 | 0 | 103 | 1 | 0 | 0 | 118 | 96 | 96 |
ERCC2 | 96 | 1 | 0 | 0 | 4 | 4 | 0 | 0 | 105 | 96 | 96 |
ITFG2 | 96 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 100 | 96 | 96 |
ENSCAFT00000074160 | 1 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 97 | 96 | 96 |
ENSCAFT00000043707 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 96 | 96 | 96 |
REEP4 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 96 | 96 |
ENSCAFT00000071963 | 1 | 90 | 0 | 0 | 157 | 0 | 0 | 0 | 248 | 95 | 95 |
EP300 | 3 | 3 | 0 | 96 | 6 | 135 | 0 | 0 | 243 | 95 | 95 |
HTT | 0 | 119 | 0 | 4 | 16 | 85 | 0 | 0 | 224 | 95 | 95 |
POR | 73 | 57 | 1 | 0 | 4 | 0 | 89 | 0 | 224 | 95 | 95 |
NCL | 0 | 91 | 2 | 47 | 19 | 0 | 0 | 0 | 159 | 95 | 95 |
DOCK11 | 0 | 101 | 0 | 47 | 3 | 0 | 0 | 0 | 151 | 95 | 95 |
# Write maf summary to an output file
write.mafSummary(maf = mymaf, basename = 'Cohort_2/Output/CD4PTCL_maftools')
# Tabulate control samples by number of variants (fewer than 20 rows exist, so head() and tail() both return every row)
sampleSummary_ctrl <- getSampleSummary(mymaf_ctrl)
first20Samples_ctrl <- head(sampleSummary_ctrl, 20)
last20Samples_ctrl <- tail(sampleSummary_ctrl, 20)
kable(first20Samples_ctrl, caption = "Canine CD4 Control Samples Ranked by Number of Variants")
Tumor_Sample_Barcode | Frame_Shift_Del | Frame_Shift_Ins | In_Frame_Del | In_Frame_Ins | Missense_Mutation | Nonsense_Mutation | Nonstop_Mutation | Translation_Start_Site | total |
---|---|---|---|---|---|---|---|---|---|
CI157953 | 142 | 578 | 133 | 254 | 682 | 228 | 10 | 5 | 2032 |
CI80400 | 206 | 524 | 121 | 278 | 662 | 202 | 9 | 8 | 2010 |
CI80397 | 194 | 618 | 119 | 284 | 536 | 204 | 10 | 7 | 1972 |
CI80399 | 191 | 496 | 115 | 275 | 652 | 216 | 7 | 9 | 1961 |
CI157907 | 171 | 589 | 114 | 269 | 536 | 209 | 10 | 2 | 1900 |
CI156615 | 143 | 554 | 120 | 276 | 586 | 195 | 8 | 5 | 1887 |
CI156616 | 163 | 541 | 106 | 261 | 576 | 202 | 6 | 7 | 1862 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
Tumor_Sample_Barcode | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
# Print a table of the top 20 mutated genes
geneSummary_ctrl <- getGeneSummary(mymaf_ctrl)
topVariantGenes_ctrl <- head(geneSummary_ctrl, 20)
kable(topVariantGenes_ctrl, caption = "Top 20 Mutated Genes in Canine CD4 Controls")
Hugo_Symbol | Frame_Shift_Del | Frame_Shift_Ins | In_Frame_Del | In_Frame_Ins | Missense_Mutation | Nonsense_Mutation | Nonstop_Mutation | Translation_Start_Site | total | MutatedSamples | AlteredSamples |
---|---|---|---|---|---|---|---|---|---|---|---|
ENSCAFT00000093377 | 0 | 0 | 0 | 0 | 35 | 0 | 0 | 0 | 35 | 7 | 7 |
MACF1 | 2 | 18 | 0 | 6 | 9 | 0 | 0 | 0 | 35 | 7 | 7 |
SAMD9L | 3 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 32 | 7 | 7 |
ENSCAFT00000037518 | 0 | 0 | 3 | 0 | 23 | 0 | 0 | 0 | 26 | 7 | 7 |
RBBP6 | 0 | 20 | 0 | 3 | 0 | 1 | 0 | 0 | 24 | 7 | 7 |
ENSCAFT00000071963 | 0 | 7 | 0 | 0 | 13 | 0 | 0 | 0 | 20 | 7 | 7 |
FAM214A | 1 | 15 | 0 | 0 | 4 | 0 | 0 | 0 | 20 | 7 | 7 |
PRRC2C | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 7 | 7 |
BDP1 | 0 | 9 | 0 | 10 | 0 | 0 | 0 | 0 | 19 | 7 | 7 |
EP300 | 0 | 0 | 0 | 9 | 1 | 9 | 0 | 0 | 19 | 7 | 7 |
SZT2 | 0 | 0 | 13 | 1 | 3 | 2 | 0 | 0 | 19 | 7 | 7 |
HTT | 0 | 9 | 0 | 0 | 3 | 6 | 0 | 0 | 18 | 7 | 7 |
PLXNB3 | 6 | 1 | 6 | 0 | 4 | 0 | 0 | 0 | 17 | 7 | 7 |
ENSCAFT00000002983 | 0 | 0 | 0 | 0 | 16 | 0 | 0 | 0 | 16 | 7 | 7 |
MYH11 | 0 | 12 | 0 | 2 | 2 | 0 | 0 | 0 | 16 | 7 | 7 |
POR | 6 | 4 | 0 | 0 | 0 | 0 | 6 | 0 | 16 | 7 | 7 |
USP24 | 0 | 6 | 0 | 7 | 3 | 0 | 0 | 0 | 16 | 7 | 7 |
ENSCAFT00000029608 | 0 | 0 | 0 | 0 | 15 | 0 | 0 | 0 | 15 | 7 | 7 |
WDR90 | 7 | 3 | 1 | 0 | 4 | 0 | 0 | 0 | 15 | 7 | 7 |
ZNF148 | 0 | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 15 | 7 | 7 |
# Write maf summary to an output file
write.mafSummary(maf = mymaf_ctrl, basename = 'Cohort_2/Output/CD4CTRL_maftools')
# set colors to annotate mutation types
var_cols = RColorBrewer::brewer.pal(n = 9, name = 'Paired')
names(var_cols) = c(
'In_Frame_Ins',
'Missense_Mutation',
'In_Frame_Del',
'Frame_Shift_Ins',
'Translation_Start_Site',
'Nonstop_Mutation',
'Frame_Shift_Del',
'Multi_Hit',
'Nonsense_Mutation'
)
titvcols = RColorBrewer::brewer.pal(n = 6, name = 'Set3')
names(titvcols) = c("C>T", "C>G", "C>A", "T>A", "T>C", "T>G")
Displays the number of variants in each sample as a stacked barplot and variant types as a boxplot summarized by Variant_Classification.
### PTCLs
plotmafSummary(maf = mymaf,
color = var_cols,
titvColor = titvcols,
rmOutlier = TRUE,
addStat = 'median',
dashboard = TRUE,
titvRaw = FALSE)
plotmafSummary(maf = mymaf_ctrl,
color = var_cols,
titvColor = titvcols,
rmOutlier = TRUE,
addStat = 'median',
dashboard = TRUE,
titvRaw = FALSE)
par(mar = c(5, 0.1, 4, 2))
mafbarplot(
mymaf,
color = var_cols,
n = 20,
genes = NULL,
fontSize = 0.6,
includeCN = FALSE,
legendfontSize = 1,
borderCol = "#34495e",
showPct = TRUE
)
par(mar = c(5, 0.1, 4, 2))
mafbarplot(
mymaf_ctrl,
color = var_cols,
n = 20,
genes = NULL,
fontSize = 0.6,
includeCN = FALSE,
legendfontSize = 1,
borderCol = "#34495e",
showPct = TRUE
)
hPTCLgenes = c("TET2", "DNMT3A", "PTEN", "TP53", "CDKN2A", "MYC", "STAT3", "BCL11B", "BCL6", "CD244", "CD247", "FASLG", "TP63", "TPRG1", "FYN", "IBTK", "LATS1", "ZC3H12D", "TNFAIP3", "RHOA", "KMT2C", "KMT2D", "PTPN13", "IDH2", "SETD1B", "YTHDF2", "PDCD1", "IKZF2", "CD274", "NOTCH1", "ARID1A", "TSC2", "ITPR3", "PIK3R1")
GATA3PTCLgenes = c("DNMT3A", "PTEN", "TP53", "CDKN2A", "MYC", "STAT3")
TBX21PTCLgenes = c("DNMT3A", "BCL11B", "BCL6", "CD244", "CD247", "FASLG", "TP63", "TPRG1", "FYN", "IBTK", "LATS1", "ZC3H12D", "TNFAIP3")
# subset PTCL maf for only these genes
mymaf_hPTCL <- subsetMaf(mymaf, genes = hPTCLgenes)
## -Processing clinical data
mymaf_gata3PTCL <- subsetMaf(mymaf, genes = GATA3PTCLgenes)
## -Processing clinical data
mymaf_tbx21PTCL <- subsetMaf(mymaf, genes = TBX21PTCLgenes)
## -Processing clinical data
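`subsetMaf()` and `oncoplot()` can only show genes whose symbols actually occur in the MAF, and in this canine annotation some loci carry Ensembl transcript IDs (e.g. ENSCAFT00000002983) instead of gene symbols, so a requested gene may simply be absent. A minimal, hypothetical check (`gene_coverage()` is not a maftools function) compares a gene list against the observed `Hugo_Symbol` values:

```r
# Hypothetical helper: which requested genes have at least one variant row?
gene_coverage <- function(genes, observed_symbols) {
  genes <- unique(genes)
  data.frame(gene = genes, present = genes %in% unique(observed_symbols))
}

# Toy example; with the real data one might call
# gene_coverage(hPTCLgenes, mymaf@data$Hugo_Symbol)
coverage <- gene_coverage(c("TP53", "MYC", "NOTCH1"),
                          c("TP53", "TP53", "MYC", "RBBP6"))
print(coverage[!coverage$present, "gene"])
```

Note that `mymaf@data` holds the non-silent variants only; maftools keeps silent variants in a separate slot.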
# draw oncoplots
oncoplot(
maf = mymaf_hPTCL,
genes = hPTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in Human PTCL"
)
oncoplot(
maf = mymaf_gata3PTCL,
genes = GATA3PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in GATA3-PTCL"
)
oncoplot(
maf = mymaf_tbx21PTCL,
genes = TBX21PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in TBX21-PTCL"
)
# subset control maf for the same human PTCL gene sets
mymaf_CTRL_hPTCL <- subsetMaf(mymaf_ctrl, genes = hPTCLgenes)
## -Processing clinical data
mymaf_CTRL_gata3PTCL <- subsetMaf(mymaf_ctrl, genes = GATA3PTCLgenes)
## -Processing clinical data
mymaf_CTRL_tbx21PTCL <- subsetMaf(mymaf_ctrl, genes = TBX21PTCLgenes)
## -Processing clinical data
# draw oncoplots
oncoplot(
maf = mymaf_CTRL_hPTCL,
genes = hPTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in Human PTCL"
)
oncoplot(
maf = mymaf_CTRL_gata3PTCL,
genes = GATA3PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in GATA3-PTCL"
)
oncoplot(
maf = mymaf_CTRL_tbx21PTCL,
genes = TBX21PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in TBX21-PTCL"
)
# define list of genes
genes <- c("PTEN", "SATB1", "MAP2K1", "EEF1A1", "NLRP14", "KCND2", "PSMA1", "MET", "KDR", "STK11", "BRAF", "SMAD4", "TET2", "ATM", "EGFR", "JAK1", "MYC", "NOTCH1", "SMO", "TP53", "PLCG1")
# subset PTCL maf for only these genes
mymaf_PTCLgenes <- subsetMaf(mymaf, genes=genes)
## -Processing clinical data
# draw oncoplot
oncoplot(
maf = mymaf_PTCLgenes,
genes = genes,
colors = var_cols,
titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in Canine TCL"
)
# subset control maf for only these genes
mymaf_ctrl_PTCLgenes <- subsetMaf(mymaf_ctrl, genes=genes)
## -Processing clinical data
# draw oncoplot
oncoplot(
maf = mymaf_ctrl_PTCLgenes,
genes = genes,
colors = var_cols,
titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in Canine TCL"
)
# define list of fusion gene partners
fusiongenes = c("GATD3A", "LMO4", "PTMA", "NCL", "JPT1", "MROH1", "TPD52L2", "TOX2", "REV3L", "FYN", "HMGB1", "BZW1", "HSPD1", "CHD3", "PER1", "EIF5A", "GRB10", "IKZF1", "MYC", "TRIB1", "YWHAZ", "KLF10", "SRSF5")
# subset PTCL maf for only fusion gene partners
mymaf_PTCLfusion <- subsetMaf(mymaf, genes = fusiongenes)
## -Processing clinical data
# draw oncoplot
oncoplot(
maf = mymaf_PTCLfusion,
genes = fusiongenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL Variants Called in Fusion Partner Genes"
)
# subset control maf for only fusion gene partners
mymaf_CTRLfusion <- subsetMaf(mymaf_ctrl, genes = fusiongenes)
## -Processing clinical data
# draw oncoplot
oncoplot(
maf = mymaf_CTRLfusion,
genes = fusiongenes,
colors = var_cols,
titleText = "Canine CD4+ CTRL Variants Called in Fusion Partner Genes"
)
The boxplot summarizes the overall distribution of the different conversions, and the stacked barplot shows the fraction of each conversion in each sample.
### PTCLs
mymaf.titv = titv(maf = mymaf,
plot = FALSE,
useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymaf.titv, color = titvcols)
mymaf_ctrl.titv = titv(maf = mymaf_ctrl,
plot = FALSE,
useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymaf_ctrl.titv, color = titvcols)
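The six conversion classes plotted by `plotTiTv()` (the names given to `titvcols` above) collapse each SNV onto a pyrimidine reference base, so G>A is counted as C>T. The sketch below illustrates that grouping and the transition/transversion split in base R; `classify_snv()` is a hypothetical helper, and maftools' own implementation may differ in details such as how non-SNP rows are handled.

```r
# Reverse-complement lookup for single bases
complement <- c(A = "T", C = "G", G = "C", T = "A")

# Hypothetical helper: map SNV ref/alt bases to one of the six
# pyrimidine-centred classes and label each as transition (Ti) or
# transversion (Tv).
classify_snv <- function(ref, alt) {
  ref <- toupper(ref)
  alt <- toupper(alt)
  # Purine reference bases (A, G) are reported on the opposite strand
  flip <- ref %in% c("A", "G")
  ref[flip] <- complement[ref[flip]]
  alt[flip] <- complement[alt[flip]]
  conversion <- paste0(ref, ">", alt)
  # After collapsing, only C>T and T>C stay within one base type: transitions
  ti_tv <- ifelse(conversion %in% c("C>T", "T>C"), "Ti", "Tv")
  data.frame(conversion = conversion, ti_tv = ti_tv)
}

snv <- classify_snv(ref = c("C", "G", "A", "T", "C"),
                    alt = c("T", "A", "G", "G", "A"))
print(snv)
# Ti/Tv ratio for the toy calls
print(sum(snv$ti_tv == "Ti") / sum(snv$ti_tv == "Tv"))
```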
tumor_data <- mymaf@data
ctrl_data <- mymaf_ctrl@data
# select columns for matching
tumor_vars <- tumor_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
ctrl_vars <- ctrl_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
# find common variants
shared_variants <- merge(tumor_vars, ctrl_vars,
by = c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type"),
allow.cartesian = TRUE)
##### Venn Diagram #####
# build unique variant keys for each group
tumor_vars_unique <- unique(paste(tumor_vars$Chromosome, tumor_vars$Start_Position, tumor_vars$End_Position, tumor_vars$Variant_Classification, tumor_vars$Variant_Type))
ctrl_vars_unique <- unique(paste(ctrl_vars$Chromosome, ctrl_vars$Start_Position, ctrl_vars$End_Position, ctrl_vars$Variant_Classification, ctrl_vars$Variant_Type))
venn1 <- venn.diagram(
x = list(tumor_vars_unique, ctrl_vars_unique),
category.names = c("CD4+ PTCL", "CD4+ CTRL"),
# Output features
filename = NULL,
disable.logging = TRUE,
# Title
main = "Variants Shared Between CD4+ PTCL and \nControl CD4+ Lymphocytes and Thymocytes",
main.cex = 1.5,
main.fontfamily = "sans",
main.fontface = "bold",
# Circles
fill = c(alpha("#440154ff", 0.3), alpha('#21908dff', 0.3)),
lwd = 1,
col = c("#440154ff", '#21908dff'),
# Numbers
cex=1.5,
fontfamily = "sans",
# Categories
cat.cex = 1.5,
cat.fontfamily = "sans",
cat.fontface = "bold",
cat.dist = c(0.05, 0.05),
cat.pos = c(-27, 27),
cat.default.pos = "outer",
cat.col = c("#440154ff", '#21908dff'),
scaled = FALSE
)
grid.newpage()
grid.draw(venn1)
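The three regions of the Venn diagram are plain set operations on the unique variant keys. A minimal sketch (`venn_counts()` is a hypothetical helper, not a VennDiagram function) that returns the region sizes, so they can also be reported as text:

```r
# Hypothetical helper: sizes of the three regions of a two-set Venn diagram
venn_counts <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  c(only_a = length(setdiff(a, b)),
    shared = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Toy keys in the same "Chromosome Start End Classification Type" form
tumor_keys <- c("1 100 100 Missense_Mutation SNP",
                "1 200 200 Nonsense_Mutation SNP",
                "2 300 300 Missense_Mutation SNP")
ctrl_keys <- c("1 100 100 Missense_Mutation SNP",
               "5 900 901 Frame_Shift_Del DEL")
print(venn_counts(tumor_keys, ctrl_keys))
```

With the real data, `venn_counts(tumor_vars_unique, ctrl_vars_unique)` should give the numbers drawn in the figure, and its `only_a` value is the number of tumor-specific keys that the filtering step retains.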
# filter tumor maf for shared variants
shared_condition <- with(tumor_data,
paste(Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type) %in%
paste(shared_variants$Chromosome, shared_variants$Start_Position, shared_variants$End_Position,
shared_variants$Variant_Classification, shared_variants$Variant_Type))
tumor_data_filtered <- tumor_data[!shared_condition, ]
mymaf_filtered <- read.maf(tumor_data_filtered)
## -Validating
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 2.520s elapsed (1.550s cpu)
# export
write.table(tumor_data_filtered, file = "ptcl_unique_vars_only.maf", sep = "\t", quote = FALSE, row.names = FALSE)
filtered_tumor_data <- mymaf_filtered@data
filtered_tumor_vars <- filtered_tumor_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
filtered_tumor_vars_unique <- unique(paste(filtered_tumor_vars$Chromosome, filtered_tumor_vars$Start_Position, filtered_tumor_vars$End_Position, filtered_tumor_vars$Variant_Classification, filtered_tumor_vars$Variant_Type))
paste("Unique variant calls:", length(filtered_tumor_vars_unique), sep=" ")
## [1] "Unique variant calls: 67486"
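The merge-then-filter steps above amount to an anti-join on a five-column variant key: a tumor row is dropped whenever the same key occurs anywhere in the controls, so the intermediate `merge(..., allow.cartesian = TRUE)` is not strictly needed. A hypothetical base-R sketch of the same rule, written for data frames (`make_key()` and `remove_shared()` are illustrative, not maftools functions):

```r
# Columns that define "the same variant" in the filtering step above
key_cols <- c("Chromosome", "Start_Position", "End_Position",
              "Variant_Classification", "Variant_Type")

# Collapse the key columns into one string per row
make_key <- function(df, cols) {
  do.call(paste, c(lapply(cols, function(col) df[[col]]), sep = "|"))
}

# Hypothetical helper: keep only tumor rows whose key never occurs in ctrl
remove_shared <- function(tumor, ctrl, cols = key_cols) {
  in_ctrl <- make_key(tumor, cols) %in% make_key(ctrl, cols)
  tumor[!in_ctrl, , drop = FALSE]
}

# Toy example: the first tumor call is also present in the controls
tumor <- data.frame(
  Chromosome = c("1", "1", "2"),
  Start_Position = c(100, 200, 300),
  End_Position = c(100, 200, 300),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation"),
  Variant_Type = "SNP",
  Tumor_Sample_Barcode = c("S1", "S2", "S3")
)
ctrl <- data.frame(
  Chromosome = "1", Start_Position = 100, End_Position = 100,
  Variant_Classification = "Missense_Mutation", Variant_Type = "SNP"
)
tumor_only <- remove_shared(tumor, ctrl)
print(tumor_only)
```

Because the key has no allele columns, two different alternate alleles at the same position count as the same variant; adding the standard MAF columns `Reference_Allele` and `Tumor_Seq_Allele2` to `cols` would make the match allele-specific.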
Data visualization of only those variants that were called in tumor samples and not in control samples.
Displays the number of variants in each PTCL sample as a stacked barplot and variant types as a boxplot summarized by Variant_Classification.
plotmafSummary(maf = mymaf_filtered,
color = var_cols,
titvColor = titvcols,
rmOutlier = TRUE,
addStat = 'median',
dashboard = TRUE,
titvRaw = FALSE)
par(mar = c(5, 0.1, 4, 2))
mafbarplot(
mymaf_filtered,
n = 20,
genes = NULL,
color = var_cols,
fontSize = 0.7,
includeCN = FALSE,
legendfontSize = 1,
borderCol = "#34495e",
showPct = TRUE
)
Oncoplot for the top 20 mutated genes after filtering out variants also called in control samples. Note: cells annotated as Multi_Hit mark genes that are mutated more than once in the same sample.
#par(mar = c(5, 2, 4, 2))
oncoplot(maf = mymaf_filtered,
fontSize = 0.5,
top = 20,
colors = var_cols,
titleText = "Top Canine CD4+ PTCL-Specific Variants")
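The Multi_Hit note above can be made concrete with a small sketch that assigns one label per gene/sample cell. `oncoplot_labels()` and the toy calls are hypothetical, not maftools code. The `distinct = TRUE` default assumes a cell becomes Multi_Hit only when it carries more than one distinct Variant_Classification, which is our reading of the maftools source; `distinct = FALSE` follows the literal "mutated more than once" wording.

```r
# Hypothetical helper: one oncoplot-style label per gene/sample pair.
# distinct = TRUE:  "Multi_Hit" only when the pair has >1 distinct class
# distinct = FALSE: "Multi_Hit" whenever the pair has >1 variant row
oncoplot_labels <- function(df, distinct = TRUE) {
  key <- paste(df$Hugo_Symbol, df$Tumor_Sample_Barcode, sep = "|")
  classes <- split(as.character(df$Variant_Classification), key)
  label <- vapply(classes, function(x) {
    n <- if (distinct) length(unique(x)) else length(x)
    if (n > 1) "Multi_Hit" else x[1]
  }, character(1))
  parts <- strsplit(names(label), "|", fixed = TRUE)
  data.frame(Hugo_Symbol = vapply(parts, `[`, character(1), 1),
             Tumor_Sample_Barcode = vapply(parts, `[`, character(1), 2),
             label = unname(label))
}

calls <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "TP53", "MYC", "MYC"),
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S1", "S1"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Missense_Mutation", "Missense_Mutation",
                             "Missense_Mutation")
)
print(oncoplot_labels(calls))
print(oncoplot_labels(calls, distinct = FALSE))
```

The two modes differ only for MYC in S1, which has two missense calls.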
oncoplot(maf = mymaf_filtered,
colors = var_cols,
pathways = "sigpw",
titleText = "Top 10 Mutated Oncogenic Signaling Pathways in Canine CD4+ PTCL",
gene_mar = 8,
fontSize = 0.8,
topPathways = 10,
collapsePathway = TRUE)
oncoplot(maf = mymaf_filtered,
colors = var_cols,
pathways = "sigpw",
titleText = "Details of Top Mutated Oncogenic Signaling Pathway in Canine CD4+ PTCL",
gene_mar = 8,
fontSize = 0.8,
topPathways = 1)
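With `collapsePathway = TRUE`, each pathway is drawn as a single row. A minimal sketch of that collapsing idea, assuming a sample counts as altered in a pathway when at least one member gene is mutated; `pathway_altered()` and the toy gene sets are hypothetical and are not the `sigpw` definitions shipped with maftools.

```r
# Hypothetical helper: collapse a binary gene x sample mutation matrix into a
# pathway x sample logical matrix (TRUE = at least one member gene mutated)
pathway_altered <- function(mut, pathways) {
  hits <- vapply(pathways, function(genes) {
    genes <- intersect(genes, rownames(mut))
    if (length(genes) == 0) return(rep(FALSE, ncol(mut)))
    colSums(mut[genes, , drop = FALSE]) > 0
  }, logical(ncol(mut)))
  # vapply gives samples x pathways (a vector for one sample); rebuild and transpose
  t(matrix(hits, nrow = ncol(mut),
           dimnames = list(colnames(mut), names(pathways))))
}

mut <- matrix(c(1, 0, 0, 0,
                0, 1, 0, 0,
                0, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("PTEN", "PIK3CA", "TP53"), paste0("S", 1:4)))
pathways <- list(PI3K = c("PTEN", "PIK3CA", "AKT1"),
                 TP53 = c("TP53", "MDM2"),
                 WNT = c("APC"))
altered <- pathway_altered(mut, pathways)
# Fraction of samples altered in each pathway
print(rowSums(altered) / ncol(altered))
```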
pi3k_genes <- c("PIK3CA", "PTEN", "AKT1", "AKT2", "AKT3", "MTOR", "TSC1", "TSC2", "RPS6", "RPS6KB1", "EIF4E")
oncoplot(
maf = mymaf_filtered,
genes = pi3k_genes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants in Genes of the PI3K-AKT-MTOR Pathway"
)
# draw oncoplots
oncoplot(
maf = mymaf_filtered,
genes = hPTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in Human PTCL"
)
oncoplot(
maf = mymaf_filtered,
genes = GATA3PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in GATA3-PTCL"
)
oncoplot(
maf = mymaf_filtered,
genes = TBX21PTCLgenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in TBX21-PTCL"
)
oncoplot(
maf = mymaf_filtered,
genes = genes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in Canine TCL"
)
oncoplot(
maf = mymaf_filtered,
genes = fusiongenes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants Called in Fusion Partner Genes"
)
gata3_genes <- c("GATA3", "IL4", "IL4R", "IL2", "IL2R", "STAT6", "PE8", "STAT5", "MTOR", "SATB1", "CTNNB1", "EP300", "TCF1")
oncoplot(
maf = mymaf_filtered,
genes = gata3_genes,
colors = var_cols,
titleText = "Canine CD4+ PTCL-Specific Variants Called in GATA3 Activators"
)
## Transitions and Transversions
The boxplot summarizes the overall distribution of the different conversions, and the stacked barplot shows the fraction of each conversion in each sample.
mymafFiltered.titv = titv(maf = mymaf_filtered,
plot = FALSE,
useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymafFiltered.titv, color = titvcols)
rainfallPlot visualizes hypermutated genomic regions by plotting the inter-variant distance on a linear genomic scale. “Kataegis” are defined as genomic segments containing six or more consecutive mutations with an average inter-mutation distance of less than or equal to 1,000 bp. If tsb = NULL, the most mutated sample is plotted.
top5 <- head(sampleSummary, 5)
top5 <- as.character(top5$Tumor_Sample_Barcode[1:5])
top5
## [1] "CI162673" "CI165644" "CI166556" "CI155427" "CI124711"
for (barcode in top5){
rainfallPlot(maf = mymaf_filtered,
detectChangePoints = TRUE,
tsb = barcode,
width = 10,
height = 5,
pointSize = 0.8)
}
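The kataegis definition above can be sketched with a sliding window over the sorted mutation positions of one sample on one chromosome. This is a simplified, hypothetical rule (`find_kataegis()` is not a maftools function): maftools' `detectChangePoints = TRUE` instead segments the inter-event distances with change-point detection, so results will not match exactly.

```r
# Hypothetical helper: flag runs of >= min_muts consecutive mutations whose
# mean inter-mutation distance is <= max_mean_dist bp, on ONE chromosome of
# ONE sample. Overlapping qualifying windows are merged into one cluster
# (a merged cluster's own mean gap can exceed the threshold).
find_kataegis <- function(pos, min_muts = 6, max_mean_dist = 1000) {
  pos <- sort(unique(pos))
  n <- length(pos)
  empty <- data.frame(start = numeric(0), end = numeric(0), n_muts = numeric(0))
  if (n < min_muts) return(empty)
  starts <- seq_len(n - min_muts + 1)
  ends <- starts + min_muts - 1
  # Mean of consecutive gaps telescopes to (last - first) / (number of gaps)
  mean_gap <- (pos[ends] - pos[starts]) / (min_muts - 1)
  ok <- which(mean_gap <= max_mean_dist)
  if (length(ok) == 0) return(empty)
  clusters <- list()
  seg_start <- ok[1]
  seg_end <- ends[ok[1]]
  for (k in ok[-1]) {
    if (k <= seg_end) {
      seg_end <- max(seg_end, ends[k])  # window shares mutations: extend
    } else {
      clusters[[length(clusters) + 1]] <- c(seg_start, seg_end)
      seg_start <- k
      seg_end <- ends[k]
    }
  }
  clusters[[length(clusters) + 1]] <- c(seg_start, seg_end)
  idx <- do.call(rbind, clusters)
  data.frame(start = pos[idx[, 1]], end = pos[idx[, 2]],
             n_muts = idx[, 2] - idx[, 1] + 1)
}

# Toy positions: seven mutations 200 bp apart plus two isolated ones
kat <- find_kataegis(c(100000 + 200 * (0:6), 5e6, 1e7))
print(kat)
```

With a MAF, the function would be applied separately to each sample and chromosome, for example over `split(Start_Position, list(Tumor_Sample_Barcode, Chromosome))`.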
The somaticInteractions function performs pairwise Fisher’s exact tests to detect significant pairs of mutually exclusive or co-occurring genes.
#exclusive/co-occurrence event analysis on top 25 tumor-specific mutated genes.
somaticInteractions(maf = mymaf_filtered,
top = 25,
pvalue = c(0.05, 0.1),
fontSize = 0.6)
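The idea behind the pairwise test can be sketched in base R: for each gene pair, build a 2×2 table of samples (mutated or not in each gene) and run `fisher.test()`; an odds ratio above 1 suggests co-occurrence and below 1 mutual exclusivity. `pairwise_fisher()` and the toy matrix are hypothetical; maftools adds its own thresholds (the `pvalue = c(0.05, 0.1)` above) and plotting, and no multiple-testing correction is applied here.

```r
# Hypothetical helper: pairwise Fisher's exact tests on a binary
# gene x sample matrix (1 = gene mutated in that sample)
pairwise_fisher <- function(mut) {
  pairs <- combn(rownames(mut), 2)
  res <- data.frame(gene1 = pairs[1, ], gene2 = pairs[2, ],
                    odds_ratio = NA_real_, p_value = NA_real_)
  for (k in seq_len(ncol(pairs))) {
    # Fixed factor levels keep every table 2 x 2
    a <- factor(mut[pairs[1, k], ] == 1, levels = c(FALSE, TRUE))
    b <- factor(mut[pairs[2, k], ] == 1, levels = c(FALSE, TRUE))
    ft <- fisher.test(table(a, b))
    res$odds_ratio[k] <- unname(ft$estimate)
    res$p_value[k] <- ft$p.value
  }
  res$event <- ifelse(res$odds_ratio > 1, "co-occurrence",
                      ifelse(res$odds_ratio < 1, "mutual exclusivity", "none"))
  res
}

mut <- rbind(geneA = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
             geneB = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
             geneC = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1))
interactions <- pairwise_fisher(mut)
print(interactions)
```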
oncokb <- read.csv("cancerGeneList.csv")
oncokb_genes <- oncokb$Hugo.Symbol
mymaf_filtered_oncokb <- subsetMaf(maf = mymaf_filtered, genes = oncokb_genes)
## -Processing clinical data
filtered_oncokb <- mymaf_filtered_oncokb@data
filtered_oncokb <- filtered_oncokb[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
filtered_oncokb_unique <- unique(paste(filtered_oncokb$Chromosome, filtered_oncokb$Start_Position, filtered_oncokb$End_Position, filtered_oncokb$Variant_Classification, filtered_oncokb$Variant_Type))
# length() of a data.table returns its number of columns, so count the unique keys instead
paste("Unique variant calls:", length(filtered_oncokb_unique), sep = " ")
## MAF summary
plotmafSummary(maf = mymaf_filtered_oncokb,
color = var_cols,
titvColor = titvcols,
rmOutlier = TRUE,
addStat = 'median',
dashboard = TRUE,
titvRaw = FALSE)
## Barplot
mafbarplot(
mymaf_filtered_oncokb,
n = 20,
genes = NULL,
color = var_cols,
fontSize = 0.7,
includeCN = FALSE,
legendfontSize = 1,
borderCol = "#34495e",
showPct = TRUE
)
## Oncoplot for top 20 mutated cancer-associated genes
oncoplot(maf = mymaf_filtered_oncokb,
fontSize = 0.5,
top = 20,
colors = var_cols,
titleText = "Top Cancer-associated genes mutated in canine CD4+ PTCL")
sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplot2_3.5.1 VennDiagram_1.7.3 futile.logger_1.4.3
## [4] dplyr_1.1.4 maftools_2.20.0 knitr_1.49
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-0 gtable_0.3.6 jsonlite_1.8.9
## [4] compiler_4.4.0 tidyselect_1.2.1 jquerylib_0.1.4
## [7] scales_1.3.0 splines_4.4.0 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 DNAcopy_1.78.0
## [13] R6_2.5.1 generics_0.1.3 tibble_3.2.1
## [16] munsell_0.5.1 bslib_0.8.0 pillar_1.10.1
## [19] RColorBrewer_1.1-3 rlang_1.1.3 cachem_1.1.0
## [22] xfun_0.49 sass_0.4.9 cli_3.6.2
## [25] withr_3.0.2 magrittr_2.0.3 formatR_1.14
## [28] futile.options_1.0.1 digest_0.6.35 rstudioapi_0.17.1
## [31] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.3
## [34] glue_1.7.0 data.table_1.16.4 lambda.r_1.2.4
## [37] codetools_0.2-20 survival_3.5-8 colorspace_2.1-1
## [40] rmarkdown_2.29 tools_4.4.0 pkgconfig_2.0.3
## [43] htmltools_0.5.8.1
citation()
## To cite R in publications use:
##
## R Core Team (2024). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2024},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.