Introduction

This script performs variant analysis on MAF files with maftools. The maftools documentation is available here: https://www.bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html

Software

# First time only
BiocManager::install("maftools")
# Load every time
library(knitr)
library(maftools)
library(dplyr)
library(VennDiagram)
library(ggplot2)

Data input

Input: One concatenated .maf file of all 96 canine CD4+ PTCL samples and one concatenated .maf file of all control samples (5 samples of sorted nodal CD4+ T cells and 2 samples of sorted CD4+ thymocytes). These input files were generated from RNA-seq data with the GATK pipeline. SNPs and INDELs were filtered based on the recommendations for hard filtering in the GATK documentation: https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants. SNPs with QD < 2, SOR > 3, FS > 60, MQ < 40, MQRankSum < -12.5, or ReadPosRankSum < -8 and INDELs with QD < 2, FS > 200, or ReadPosRankSum < -20 were excluded. After filtering, VCF files were annotated with the Ensembl Variant Effect Predictor (VEP) and converted to MAF files with vcf2maf. Variants annotated as intronic, splice_site, or splice_region, or as having low or modifier impact, were also excluded.
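The SNP hard-filtering thresholds described above can be sketched in R as a logical mask over a table of annotations. This only illustrates the logic; the actual filtering was done upstream with GATK. The data frame `snps`, its values, and the helper `fails_snp_filter` are hypothetical, and treating a missing annotation as "not failing" is an assumption of this sketch.

```r
# Toy SNP annotations; all values are invented for illustration.
snps <- data.frame(
  id             = c("snp1", "snp2", "snp3", "snp4"),
  QD             = c(12.0, 1.5, 20.0, 8.0),
  SOR            = c(0.9, 1.2, 3.5, 1.0),
  FS             = c(2.0, 10.0, 5.0, 61.0),
  MQ             = c(60, 60, 60, 60),
  MQRankSum      = c(0.1, NA, -0.5, 0.0),
  ReadPosRankSum = c(0.3, NA, 0.2, -1.0)
)

# Hypothetical helper: TRUE when a SNP violates at least one threshold.
# `%in% TRUE` turns NA comparisons into FALSE, so a missing annotation
# never causes a failure by itself (an assumption of this sketch).
fails_snp_filter <- function(d) {
  (d$QD < 2) %in% TRUE |
    (d$SOR > 3) %in% TRUE |
    (d$FS > 60) %in% TRUE |
    (d$MQ < 40) %in% TRUE |
    (d$MQRankSum < -12.5) %in% TRUE |
    (d$ReadPosRankSum < -8) %in% TRUE
}

# Keep only the SNPs that pass every threshold.
snps_kept <- snps[!fails_snp_filter(snps), ]
snps_kept$id
```

A variant is dropped as soon as any single threshold is violated, which is why the conditions are combined with a logical OR.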

# set working directory
setwd("C:/Users/edlarsen/Documents/PTCLRNASeq")

# read PTCL maf file
mymaf <- read.maf(maf = 'Cohort_2/Input/AllPTCLs.CanFam31.QD2.vep.refiltered.maf')

# read CTRL maf file
mymaf_ctrl <- read.maf(maf = 'Cohort_2/Input/AllCTRLs.CanFam31.QD2.vep.refiltered.maf')

Inspect maf objects and export a summary

PTCLs

# Print a table of the 20 samples with the highest number of variants (last20Samples holds the 20 with the lowest)
sampleSummary <- getSampleSummary(mymaf)
first20Samples <- head(sampleSummary, 20)
last20Samples <- tail(sampleSummary, 20)
kable(first20Samples, caption = "Top 20 Canine CD4 PTCL Samples with Highest Number of Variants")
Top 20 Canine CD4 PTCL Samples with Highest Number of Variants
Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation Translation_Start_Site total
CI162673 144 598 101 265 1089 209 21 3 2430
CI165644 155 697 122 313 739 234 11 7 2278
CI166556 238 635 118 302 721 240 6 7 2267
CI155427 143 709 119 338 704 239 8 2 2262
CI124711 147 613 114 327 831 204 12 5 2253
CI165189 128 661 132 309 765 232 9 8 2244
CI153070 192 640 128 302 729 236 11 4 2242
CI124777 118 707 119 300 722 243 8 5 2222
CI104689 139 648 130 304 766 220 7 6 2220
CI153051 161 621 127 300 723 235 11 5 2183
CI154958 149 652 130 334 695 208 9 5 2182
CI164934 149 551 111 257 856 233 11 4 2172
CI165411 123 639 123 305 765 200 10 6 2171
CI161277 151 634 131 332 661 240 8 4 2161
CI171487 118 547 107 309 823 228 11 9 2152
CI152603 128 667 112 316 691 220 7 4 2145
CI149694 156 671 132 307 645 218 10 2 2141
CI150689 138 630 118 326 654 254 14 6 2140
CI171323 134 576 133 288 762 219 10 15 2137
CI152139 143 638 130 347 623 237 10 8 2136
# Print a table of the top 20 mutated genes
geneSummary <- getGeneSummary(mymaf)
topVariantGenes <- head(geneSummary, 20)
kable(topVariantGenes, caption = "Top 20 Mutated Genes in Canine CD4 PTCL")
Top 20 Mutated Genes in Canine CD4 PTCL
Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation Translation_Start_Site total MutatedSamples AlteredSamples
RBBP6 1 266 7 46 4 1 0 0 325 96 96
ENSCAFT00000002983 3 0 0 0 251 1 0 1 256 96 96
SZT2 5 1 186 0 38 3 0 0 233 96 96
METTL26 94 0 56 0 10 0 70 0 230 96 96
PPP4R3A 1 97 0 14 3 46 0 0 161 96 96
CTCF 0 143 0 0 3 0 0 0 146 96 96
SHPRH 0 8 96 8 11 13 0 0 136 96 96
KIF1C 9 4 98 0 6 2 0 0 119 96 96
ENSCAFT00000014195 13 0 1 0 103 1 0 0 118 96 96
ERCC2 96 1 0 0 4 4 0 0 105 96 96
ITFG2 96 0 2 0 2 0 0 0 100 96 96
ENSCAFT00000074160 1 0 0 0 96 0 0 0 97 96 96
ENSCAFT00000043707 0 0 0 0 96 0 0 0 96 96 96
REEP4 96 0 0 0 0 0 0 0 96 96 96
ENSCAFT00000071963 1 90 0 0 157 0 0 0 248 95 95
EP300 3 3 0 96 6 135 0 0 243 95 95
HTT 0 119 0 4 16 85 0 0 224 95 95
POR 73 57 1 0 4 0 89 0 224 95 95
NCL 0 91 2 47 19 0 0 0 159 95 95
DOCK11 0 101 0 47 3 0 0 0 151 95 95
# Write maf summary to an output file
write.mafSummary(maf = mymaf, basename = 'Cohort_2/Output/CD4PTCL_maftools')

Controls

# Print a table of the control samples with the highest number of variants (last20Samples_ctrl holds those with the lowest)
sampleSummary_ctrl <- getSampleSummary(mymaf_ctrl)
first20Samples_ctrl <- head(sampleSummary_ctrl, 20)
last20Samples_ctrl <- tail(sampleSummary_ctrl, 20)
kable(first20Samples_ctrl, caption = "Top 20 Canine CD4 Control Samples with Highest Number of Variants")
Top 20 Canine CD4 Control Samples with Highest Number of Variants
Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation Translation_Start_Site total
CI157953 142 578 133 254 682 228 10 5 2032
CI80400 206 524 121 278 662 202 9 8 2010
CI80397 194 618 119 284 536 204 10 7 1972
CI80399 191 496 115 275 652 216 7 9 1961
CI157907 171 589 114 269 536 209 10 2 1900
CI156615 143 554 120 276 586 195 8 5 1887
CI156616 163 541 106 261 576 202 6 7 1862
0 0 0 0 0 0 0 0 0
Tumor_Sample_Barcode 0 0 0 0 0 0 0 0 0
# Print a table of the top 20 mutated genes
geneSummary_ctrl <- getGeneSummary(mymaf_ctrl)
topVariantGenes_ctrl <- head(geneSummary_ctrl, 20)
kable(topVariantGenes_ctrl, caption = "Top 20 Mutated Genes in Canine CD4 Controls")
Top 20 Mutated Genes in Canine CD4 Controls
Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins Missense_Mutation Nonsense_Mutation Nonstop_Mutation Translation_Start_Site total MutatedSamples AlteredSamples
ENSCAFT00000093377 0 0 0 0 35 0 0 0 35 7 7
MACF1 2 18 0 6 9 0 0 0 35 7 7
SAMD9L 3 29 0 0 0 0 0 0 32 7 7
ENSCAFT00000037518 0 0 3 0 23 0 0 0 26 7 7
RBBP6 0 20 0 3 0 1 0 0 24 7 7
ENSCAFT00000071963 0 7 0 0 13 0 0 0 20 7 7
FAM214A 1 15 0 0 4 0 0 0 20 7 7
PRRC2C 0 20 0 0 0 0 0 0 20 7 7
BDP1 0 9 0 10 0 0 0 0 19 7 7
EP300 0 0 0 9 1 9 0 0 19 7 7
SZT2 0 0 13 1 3 2 0 0 19 7 7
HTT 0 9 0 0 3 6 0 0 18 7 7
PLXNB3 6 1 6 0 4 0 0 0 17 7 7
ENSCAFT00000002983 0 0 0 0 16 0 0 0 16 7 7
MYH11 0 12 0 2 2 0 0 0 16 7 7
POR 6 4 0 0 0 0 6 0 16 7 7
USP24 0 6 0 7 3 0 0 0 16 7 7
ENSCAFT00000029608 0 0 0 0 15 0 0 0 15 7 7
WDR90 7 3 1 0 4 0 0 0 15 7 7
ZNF148 0 14 0 0 1 0 0 0 15 7 7
# Write maf summary to an output file
write.mafSummary(maf = mymaf_ctrl, basename = 'Cohort_2/Output/CD4CTRL_maftools')

Initial data visualization

# set colors to annotate mutation types
var_cols = RColorBrewer::brewer.pal(n = 9, name = 'Paired')
names(var_cols) = c(
  'In_Frame_Ins',
  'Missense_Mutation',
  'In_Frame_Del',
  'Frame_Shift_Ins',
  'Translation_Start_Site',
  'Nonstop_Mutation',
  'Frame_Shift_Del',
  'Multi_Hit',
  'Nonsense_Mutation'
)

titvcols = RColorBrewer::brewer.pal(n = 6, name = 'Set3')
names(titvcols) = c("C>T", "C>G", "C>A", "T>A", "T>C", "T>G")

Plotting MAF summaries

Displays the number of variants in each sample as a stacked barplot and variant types as a boxplot summarized by Variant_Classification.

PTCLs

plotmafSummary(maf = mymaf,
               color = var_cols,
               titvColor = titvcols,
               rmOutlier = TRUE, 
               addStat = 'median', 
               dashboard = TRUE, 
               titvRaw = FALSE)

Controls

plotmafSummary(maf = mymaf_ctrl,
               color = var_cols,
               titvColor = titvcols,
               rmOutlier = TRUE, 
               addStat = 'median', 
               dashboard = TRUE, 
               titvRaw = FALSE)

Barplots of mutated genes

PTCLs

par(mar = c(5, 0.1, 4, 2))

mafbarplot(
  mymaf,
  color = var_cols,
  n = 20,
  genes = NULL,
  fontSize = 0.6,
  includeCN = FALSE,
  legendfontSize = 1,
  borderCol = "#34495e",
  showPct = TRUE
)

Controls

par(mar = c(5, 0.1, 4, 2))

mafbarplot(
  mymaf_ctrl,
  color = var_cols,
  n = 20,
  genes = NULL,
  fontSize = 0.6,
  includeCN = FALSE,
  legendfontSize = 1,
  borderCol = "#34495e",
  showPct = TRUE
)

Oncoplot of genes commonly mutated in human PTCL

Canine CD4+ PTCLs

hPTCLgenes = c("TET2", "DNMT3A", "PTEN", "TP53", "CDKN2A", "MYC", "STAT3", "BCL11B", "BCL6", "CD244", "CD247", "FASLG", "TP63", "TPRG1", "FYN", "IBTK", "LATS1", "ZC3H12D", "TNFAIP3", "RHOA", "KMT2C", "KMT2D", "PTPN13", "IDH2", "SETD1B", "YTHDF2", "PDCD1", "IKZF2", "CD274", "NOTCH1", "ARID1A", "TSC2", "ITPR3", "PIK3R1")
GATA3PTCLgenes = c("DNMT3A", "PTEN", "TP53", "CDKN2A", "MYC", "STAT3")
TBX21PTCLgenes = c("DNMT3A", "BCL11B", "BCL6", "CD244", "CD247", "FASLG", "TP63", "TPRG1", "FYN", "IBTK", "LATS1", "ZC3H12D", "TNFAIP3")

# subset PTCL maf for only these genes
mymaf_hPTCL <- subsetMaf(mymaf, genes = hPTCLgenes)
## -Processing clinical data
mymaf_gata3PTCL <- subsetMaf(mymaf, genes = GATA3PTCLgenes)
## -Processing clinical data
mymaf_tbx21PTCL <- subsetMaf(mymaf, genes = TBX21PTCLgenes)
## -Processing clinical data
# draw oncoplots
oncoplot(
  maf = mymaf_hPTCL,
  genes = hPTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in Human PTCL"
)

oncoplot(
  maf = mymaf_gata3PTCL,
  genes = GATA3PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in GATA3-PTCL"
)

oncoplot(
  maf = mymaf_tbx21PTCL,
  genes = TBX21PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in TBX21-PTCL"
)

Canine CD4+ control lymphocytes and thymocytes

# subset control maf for only genes commonly mutated in human PTCL
mymaf_CTRL_hPTCL <- subsetMaf(mymaf_ctrl, genes = hPTCLgenes)
## -Processing clinical data
mymaf_CTRL_gata3PTCL <- subsetMaf(mymaf_ctrl, genes = GATA3PTCLgenes)
## -Processing clinical data
mymaf_CTRL_tbx21PTCL <- subsetMaf(mymaf_ctrl, genes = TBX21PTCLgenes)
## -Processing clinical data
# draw oncoplots
oncoplot(
  maf = mymaf_CTRL_hPTCL,
  genes = hPTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in Human PTCL"
)

oncoplot(
  maf = mymaf_CTRL_gata3PTCL,
  genes = GATA3PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in GATA3-PTCL"
)

oncoplot(
  maf = mymaf_CTRL_tbx21PTCL,
  genes = TBX21PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in TBX21-PTCL"
)

Oncoplot of genes commonly mutated in canine PTCL

PTCLs

# define list of genes
genes <- c("PTEN", "SATB1", "MAP2K1", "EEF1A1", "NLRP14", "KCND2", "PSMA1", "MET", "KDR", "STK11", "BRAF", "SMAD4", "TET2", "ATM", "EGFR", "JAK1", "MYC", "NOTCH1", "SMO", "TP53", "PLCG1")

# subset PTCL maf for only these genes
mymaf_PTCLgenes <- subsetMaf(mymaf, genes=genes)
## -Processing clinical data
# draw oncoplot
oncoplot(
  maf = mymaf_PTCLgenes,
  genes = genes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL Variants in Genes Commonly Mutated in Canine TCL"
)

Controls

# subset control maf for only these genes
mymaf_ctrl_PTCLgenes <- subsetMaf(mymaf_ctrl, genes=genes)
## -Processing clinical data
# draw oncoplot
oncoplot(
  maf = mymaf_ctrl_PTCLgenes,
  genes = genes,
  colors = var_cols,
  titleText = "Canine CD4+ CTRL Variants in Genes Commonly Mutated in Canine TCL"
)

Oncoplot of canine PTCL fusion gene partners

PTCLs

# define list of fusion gene partners
fusiongenes = c("GATD3A", "LMO4", "PTMA", "NCL", "JPT1", "MROH1", "TPD52L2", "TOX2", "REV3L", "FYN", "HMGB1", "BZW1", "HSPD1", "CHD3", "PER1", "EIF5A", "GRB10", "IKZF1", "MYC", "TRIB1", "YWHAZ", "KLF10", "SRSF5")

# subset PTCL maf for only fusion gene partners
mymaf_PTCLfusion <- subsetMaf(mymaf, genes = fusiongenes)
## -Processing clinical data
# draw oncoplot
oncoplot(
  maf = mymaf_PTCLfusion,
  genes = fusiongenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL Variants Called in Fusion Partner Genes"
)

Controls

# subset control maf for only fusion gene partners
mymaf_CTRLfusion <- subsetMaf(mymaf_ctrl, genes = fusiongenes)
## -Processing clinical data
# draw oncoplot
oncoplot(
  maf = mymaf_CTRLfusion,
  genes = fusiongenes,
  colors = var_cols,
  titleText = "Canine CD4+ CTRL Variants Called in Fusion Partner Genes"
)

Transitions and Transversions

Boxplot summarizes the overall distribution of different conversions, and stacked barplot shows fraction of conversions in each sample.

PTCLs

mymaf.titv = titv(maf = mymaf,
                  plot = FALSE,
                  useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymaf.titv, color = titvcols)

Controls

mymaf_ctrl.titv = titv(maf = mymaf_ctrl,
                  plot = FALSE,
                  useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymaf_ctrl.titv, color = titvcols)
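The six conversion classes named in `titvcols` come from collapsing each single-nucleotide change onto a pyrimidine reference base. The sketch below shows that collapsing and the transition/transversion split on invented data. The helper `classify_conversion` is hypothetical and is not taken from maftools.

```r
# Hypothetical helper: collapse a reference/alternate base pair into one of
# the six pyrimidine-centred conversion classes.
classify_conversion <- function(ref, alt) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  purine_ref <- ref %in% c("A", "G")
  # Complement both bases when the reference is a purine.
  ref_py <- ifelse(purine_ref, comp[ref], ref)
  alt_py <- ifelse(purine_ref, comp[alt], alt)
  paste0(ref_py, ">", alt_py)
}

# Toy substitutions (invented data).
ref <- c("C", "G", "A", "T", "G")
alt <- c("T", "A", "G", "G", "T")
conv <- classify_conversion(ref, alt)
conv

# Transitions swap a pyrimidine for a pyrimidine (or a purine for a purine);
# after collapsing, these are exactly C>T and T>C. Everything else is a
# transversion.
is_transition <- conv %in% c("C>T", "T>C")
table(ifelse(is_transition, "Ti", "Tv"))
```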

Compare variants called in both tumor and control samples

Venn Diagram

tumor_data <- mymaf@data
ctrl_data <- mymaf_ctrl@data

# select columns for matching
tumor_vars <- tumor_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
ctrl_vars <- ctrl_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]

# find common variants
shared_variants <- merge(tumor_vars, ctrl_vars, 
                         by = c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type"),
                         allow.cartesian = TRUE)


##### Venn Diagram #####
# Build a deduplicated list of variant keys for each group
tumor_vars_unique <- unique(paste(tumor_vars$Chromosome, tumor_vars$Start_Position, tumor_vars$End_Position, tumor_vars$Variant_Classification, tumor_vars$Variant_Type))

ctrl_vars_unique <- unique(paste(ctrl_vars$Chromosome, ctrl_vars$Start_Position, ctrl_vars$End_Position, ctrl_vars$Variant_Classification, ctrl_vars$Variant_Type))

venn1 <- venn.diagram(
  x = list(tumor_vars_unique, ctrl_vars_unique),
  category.names = c("CD4+ PTCL", "CD4+ CTRL"),
  
  # Output features
  filename = NULL,
  disable.logging = TRUE,
  
  # Title
  main = "Variants Shared Between CD4+ PTCL and \nControl CD4+ Lymphocytes and Thymocytes",
  main.cex = 1.5,
  main.fontfamily = "sans",
  main.fontface = "bold",
  
  # Circles
  fill = c(alpha("#440154ff", 0.3), alpha('#21908dff', 0.3)),
  lwd = 1,
  col = c("#440154ff", '#21908dff'),
  
  # Numbers
  cex=1.5,
  fontfamily = "sans",
  
  # Categories
  cat.cex = 1.5,
  cat.fontfamily = "sans",
  cat.fontface = "bold",
  cat.dist = c(0.05, 0.05),
  cat.pos = c(-27, 27),
  cat.default.pos = "outer",
  cat.col = c("#440154ff", '#21908dff'),
  scaled = FALSE
)
grid.newpage()
grid.draw(venn1)
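The three numbers drawn in the Venn diagram can also be computed directly with set operations, which is a useful cross-check on the plot. The sketch below uses invented toy tables; `tumor_toy`, `ctrl_toy`, and `make_key` are hypothetical names, and the key is built from the same five columns used above.

```r
# Toy variant tables; coordinates are invented for illustration.
tumor_toy <- data.frame(
  Chromosome = c("1", "1", "2", "2"),
  Start_Position = c(100, 200, 300, 300),
  End_Position = c(100, 200, 300, 300),
  Variant_Classification = "Missense_Mutation",
  Variant_Type = "SNP"
)
ctrl_toy <- data.frame(
  Chromosome = c("1", "3"),
  Start_Position = c(200, 400),
  End_Position = c(200, 400),
  Variant_Classification = "Missense_Mutation",
  Variant_Type = "SNP"
)

# Build one composite key per variant, then deduplicate across samples.
make_key <- function(d) {
  unique(paste(d$Chromosome, d$Start_Position, d$End_Position,
               d$Variant_Classification, d$Variant_Type))
}
tumor_keys <- make_key(tumor_toy)
ctrl_keys <- make_key(ctrl_toy)

# Sizes of the three Venn regions.
venn_counts <- c(
  tumor_only = length(setdiff(tumor_keys, ctrl_keys)),
  shared     = length(intersect(tumor_keys, ctrl_keys)),
  ctrl_only  = length(setdiff(ctrl_keys, tumor_keys))
)
venn_counts
```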

Filter variants from tumor samples that were also called in control samples

# filter tumor maf for shared variants
shared_condition <- with(tumor_data, 
                         paste(Chromosome, Start_Position, End_Position, Variant_Classification, Variant_Type) %in%
                           paste(shared_variants$Chromosome, shared_variants$Start_Position, shared_variants$End_Position, 
                                 shared_variants$Variant_Classification, shared_variants$Variant_Type))

tumor_data_filtered <- tumor_data[!shared_condition, ]

mymaf_filtered <- read.maf(tumor_data_filtered)
## -Validating
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 2.520s elapsed (1.550s cpu)
# export
write.table(tumor_data_filtered, file = "ptcl_unique_vars_only.maf", sep = "\t", quote = FALSE, row.names = FALSE)



filtered_tumor_data <- mymaf_filtered@data
filtered_tumor_vars <- filtered_tumor_data[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
filtered_tumor_vars_unique <- unique(paste(filtered_tumor_vars$Chromosome, filtered_tumor_vars$Start_Position, filtered_tumor_vars$End_Position, filtered_tumor_vars$Variant_Classification, filtered_tumor_vars$Variant_Type))
paste("Unique variant calls:", length(filtered_tumor_vars_unique), sep=" ")
## [1] "Unique variant calls: 67486"
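The filter above matches tumor and control calls on position, classification, and type, so two different substitutions at the same position are treated as one variant. The sketch below contrasts that key with a stricter one that also includes the alleles, using the standard MAF columns `Reference_Allele` and `Tumor_Seq_Allele2`. It is an alternative shown for comparison on invented data, not the method used in this report; `variant_key` and the toy tables are hypothetical.

```r
# Toy tumor and control calls (invented data).
tumor_toy <- data.frame(
  Tumor_Sample_Barcode = c("T1", "T2", "T3"),
  Chromosome = c("1", "1", "2"),
  Start_Position = c(100, 100, 300),
  End_Position = c(100, 100, 300),
  Reference_Allele = c("C", "C", "G"),
  Tumor_Seq_Allele2 = c("T", "A", "A"),
  Variant_Classification = "Missense_Mutation",
  Variant_Type = "SNP"
)
ctrl_toy <- data.frame(
  Tumor_Sample_Barcode = "C1",
  Chromosome = "1",
  Start_Position = 100,
  End_Position = 100,
  Reference_Allele = "C",
  Tumor_Seq_Allele2 = "T",
  Variant_Classification = "Missense_Mutation",
  Variant_Type = "SNP"
)

position_cols <- c("Chromosome", "Start_Position", "End_Position",
                   "Variant_Classification", "Variant_Type")
allele_cols <- c(position_cols, "Reference_Allele", "Tumor_Seq_Allele2")

# Hypothetical helper: paste the chosen columns into one key per row.
variant_key <- function(d, cols) do.call(paste, c(d[cols], sep = "\t"))

# Position-level key (as in the report): both calls at 1:100 are removed.
keep_position <- !(variant_key(tumor_toy, position_cols) %in%
                     variant_key(ctrl_toy, position_cols))
# Allele-level key: only the C>T call that matches the control is removed.
keep_allele <- !(variant_key(tumor_toy, allele_cols) %in%
                   variant_key(ctrl_toy, allele_cols))

tumor_toy$Tumor_Sample_Barcode[keep_position]
tumor_toy$Tumor_Sample_Barcode[keep_allele]
```

The toy tables are plain data frames. The `@data` slot of a MAF object is a data.table, where the column selection inside `variant_key` would need `d[, cols, with = FALSE]` instead of `d[cols]`.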

Data visualization after filtering

Visualization of only those variants that were called in tumor samples and not in control samples.

Plotting MAF summary in PTCLs

Displays the number of variants in each PTCL sample as a stacked barplot and variant types as a boxplot summarized by Variant_Classification.

plotmafSummary(maf = mymaf_filtered,
               color = var_cols,
               titvColor = titvcols,
               rmOutlier = TRUE, 
               addStat = 'median', 
               dashboard = TRUE, 
               titvRaw = FALSE)

Barplot of mutated genes in PTCLs

par(mar = c(5, 0.1, 4, 2))
mafbarplot(
  mymaf_filtered,
  n = 20,
  genes = NULL,
  color = var_cols,
  fontSize = 0.7,
  includeCN = FALSE,
  legendfontSize = 1,
  borderCol = "#34495e",
  showPct = TRUE
)

Oncoplots

Top tumor-specific mutated genes

Oncoplot for the top 20 mutated genes after filtering out variants also called in control samples. Note: a gene is annotated as Multi_Hit when it is mutated more than once in the same sample.
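The Multi_Hit rule in the note can be illustrated by collapsing a variant table to one label per gene and sample. This is a minimal sketch on invented data; `calls` and `collapse_hits` are hypothetical names, and the code is not taken from maftools, whose internal handling may differ in detail.

```r
# Toy variant table: one row per variant call (invented data).
calls <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S2"),
  Hugo_Symbol = c("TP53", "TP53", "PTEN", "TP53", "PTEN"),
  Variant_Classification = c("Missense_Mutation", "Nonsense_Mutation",
                             "Frame_Shift_Del", "Missense_Mutation",
                             "Missense_Mutation")
)

# Collapse to one label per gene and sample: keep the single classification,
# or use "Multi_Hit" when the gene carries more than one variant.
collapse_hits <- function(x) if (length(x) > 1) "Multi_Hit" else x
onco_labels <- aggregate(
  Variant_Classification ~ Hugo_Symbol + Tumor_Sample_Barcode,
  data = calls, FUN = collapse_hits
)
onco_labels
```

In this toy table only TP53 in sample S1 has two calls, so it is the only cell labelled Multi_Hit.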

#par(mar = c(5, 2, 4, 2))
oncoplot(maf = mymaf_filtered,
         fontSize = 0.5,
         top = 20,
         colors = var_cols,
         titleText = "Top Canine CD4+ PTCL-Specific Variants")

Oncoplot of oncogenic signaling pathway genes in PTCLs

oncoplot(maf = mymaf_filtered,
         colors = var_cols,
         pathways = "sigpw",
         titleText = "Top 10 Mutated Oncogenic Signaling Pathways in Canine CD4+ PTCL",
         gene_mar = 8, 
         fontSize = 0.8, 
         topPathways = 10,
         collapsePathway = TRUE)

oncoplot(maf = mymaf_filtered,
         colors = var_cols,
         pathways = "sigpw",
         titleText = "Details of Top Mutated Oncogenic Signaling Pathway in Canine CD4+ PTCL",
         gene_mar = 8, 
         fontSize = 0.8, 
         topPathways = 1)

pi3k_genes <- c("PIK3CA", "PTEN", "AKT1", "AKT2", "AKT3", "MTOR", "TSC1", "TSC2", "RPS6", "RPS6KB1", "EIF4E")

oncoplot(
  maf = mymaf_filtered,
  genes = pi3k_genes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants in Genes of the PI3K-AKT-MTOR Pathway"
)

Oncoplot of genes commonly mutated in human PTCL

# draw oncoplots
oncoplot(
  maf = mymaf_filtered,
  genes = hPTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in Human PTCL"
)

oncoplot(
  maf = mymaf_filtered,
  genes = GATA3PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in GATA3-PTCL"
)

oncoplot(
  maf = mymaf_filtered,
  genes = TBX21PTCLgenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in TBX21-PTCL"
)

Oncoplot of commonly mutated genes in canine PTCL

oncoplot(
  maf = mymaf_filtered,
  genes = genes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants in Genes Commonly Mutated in Canine TCL"
)

Oncoplot of fusion gene partners in PTCLs

oncoplot(
  maf = mymaf_filtered,
  genes = fusiongenes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants Called in Fusion Partner Genes"
)

Oncoplot of GATA3 activators

gata3_genes <- c("GATA3", "IL4", "IL4R", "IL2", "IL2R", "STAT6", "PE8", "STAT5", "MTOR", "SATB1", "CTNNB1", "EP300", "TCF1")

oncoplot(
  maf = mymaf_filtered,
  genes = gata3_genes,
  colors = var_cols,
  titleText = "Canine CD4+ PTCL-Specific Variants Called in GATA3 Activators"
)

Transitions and Transversions

Boxplot summarizes the overall distribution of different conversions, and stacked barplot shows fraction of conversions in each sample.

mymafFiltered.titv = titv(maf = mymaf_filtered,
                  plot = FALSE,
                  useSyn = TRUE)
# plot titv summary
plotTiTv(res = mymafFiltered.titv, color = titvcols)

Rainfall plots of the 5 tumors with the most mutations

Visualizes hypermutated genomic regions in cancer genomes by plotting inter-variant distance on a linear genomic scale. “Kataegis” are defined as genomic segments containing 6 or more consecutive mutations with an average inter-mutation distance of less than or equal to 1,000 bp. If tsb = NULL, the most mutated sample is plotted.

top5 <- head(sampleSummary, 5)
top5 <- as.character(top5$Tumor_Sample_Barcode[1:5])
top5
## [1] "CI162673" "CI165644" "CI166556" "CI155427" "CI124711"
for (barcode in top5){
  rainfallPlot(maf = mymaf_filtered,
             detectChangePoints = TRUE,
             tsb = barcode,
             width = 10,
             height = 5,
             pointSize = 0.8)
}
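The kataegis definition given above (6 or more consecutive mutations with an average inter-mutation distance of at most 1,000 bp) can be checked by hand with a simple sliding window. The sketch below is a simplification for intuition only and does not reproduce what `rainfallPlot` does internally when `detectChangePoints = TRUE`. The helper `find_dense_windows` and the positions are hypothetical.

```r
# Hypothetical helper: flag windows of `min_muts` consecutive mutations whose
# mean inter-mutation distance is at most `max_mean_dist` bp.
find_dense_windows <- function(pos, min_muts = 6, max_mean_dist = 1000) {
  pos <- sort(pos)
  n_windows <- length(pos) - min_muts + 1
  if (n_windows < 1) {
    return(data.frame(start = numeric(0), end = numeric(0),
                      mean_dist = numeric(0)))
  }
  starts <- seq_len(n_windows)
  ends <- starts + min_muts - 1
  # The mean of the (min_muts - 1) gaps equals window span / (min_muts - 1).
  mean_dist <- (pos[ends] - pos[starts]) / (min_muts - 1)
  hits <- mean_dist <= max_mean_dist
  data.frame(start = pos[starts][hits], end = pos[ends][hits],
             mean_dist = mean_dist[hits])
}

# Toy positions on a single chromosome: a tight cluster, then sparse calls.
positions <- c(10000, 10200, 10500, 11000, 11300, 11900, 500000, 900000)
dense <- find_dense_windows(positions)
dense
```

Positions must come from one chromosome of one sample; overlapping windows would need to be merged before reporting segments.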

Somatic interactions

The somaticInteractions function performs pair-wise Fisher’s exact tests to detect significant pairs of mutually exclusive or co-occurring genes.

# exclusive/co-occurrence event analysis on top 25 tumor-specific mutated genes.
somaticInteractions(maf = mymaf_filtered, 
                    top = 25, 
                    pvalue = c(0.05, 0.1),
                    fontSize = 0.6)
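For a single gene pair, the underlying test is a Fisher's exact test on a 2 x 2 table of mutated versus wild-type status across samples. The sketch below runs that test with base R on invented data. The vectors `geneA` and `geneB` are hypothetical, and the code only illustrates the test, not the maftools implementation.

```r
# Toy mutation status of two genes across 12 samples (invented data).
geneA <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE,
           FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
geneB <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE,
           TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)

# 2 x 2 contingency table of wild-type / mutated status for both genes.
tab <- table(geneA = factor(geneA, levels = c(FALSE, TRUE)),
             geneB = factor(geneB, levels = c(FALSE, TRUE)))
tab

res <- fisher.test(tab)

# An odds ratio below 1 points towards mutual exclusivity, above 1 towards
# co-occurrence; the p-value indicates whether the association is significant.
res$estimate
res$p.value
```

With 25 genes there are 300 such pairs, so the p-values are best read with multiple testing in mind.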

Variants in cancer-associated genes

oncokb <- read.csv("cancerGeneList.csv")
oncokb_genes <- oncokb$Hugo.Symbol

mymaf_filtered_oncokb <- subsetMaf(maf = mymaf_filtered, genes = oncokb_genes)
## -Processing clinical data
filtered_oncokb <- mymaf_filtered_oncokb@data
filtered_oncokb <- filtered_oncokb[, c("Chromosome", "Start_Position", "End_Position", "Variant_Classification", "Variant_Type")]
filtered_oncokb_unique <- unique(paste(filtered_oncokb$Chromosome, filtered_oncokb$Start_Position, filtered_oncokb$End_Position, filtered_oncokb$Variant_Classification, filtered_oncokb$Variant_Type))
# length(filtered_oncokb) counts the table's 5 columns, not variants; count the unique variant keys instead
paste("Unique variant calls:", length(filtered_oncokb_unique), sep=" ")
## MAF summary
plotmafSummary(maf = mymaf_filtered_oncokb,
               color = var_cols,
               titvColor = titvcols,
               rmOutlier = TRUE, 
               addStat = 'median', 
               dashboard = TRUE, 
               titvRaw = FALSE)

## Barplot
mafbarplot(
  mymaf_filtered_oncokb,
  n = 20,
  genes = NULL,
  color = var_cols,
  fontSize = 0.7,
  includeCN = FALSE,
  legendfontSize = 1,
  borderCol = "#34495e",
  showPct = TRUE
)

## Oncoplot for top 20 mutated cancer-associated genes
oncoplot(maf = mymaf_filtered_oncokb,
         fontSize = 0.5,
         top = 20,
         colors = var_cols,
         titleText = "Top Cancer-associated genes mutated in canine CD4+ PTCL")

Citations

sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/Denver
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ggplot2_3.5.1       VennDiagram_1.7.3   futile.logger_1.4.3
## [4] dplyr_1.1.4         maftools_2.20.0     knitr_1.49         
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-0         gtable_0.3.6         jsonlite_1.8.9      
##  [4] compiler_4.4.0       tidyselect_1.2.1     jquerylib_0.1.4     
##  [7] scales_1.3.0         splines_4.4.0        yaml_2.3.10         
## [10] fastmap_1.2.0        lattice_0.22-6       DNAcopy_1.78.0      
## [13] R6_2.5.1             generics_0.1.3       tibble_3.2.1        
## [16] munsell_0.5.1        bslib_0.8.0          pillar_1.10.1       
## [19] RColorBrewer_1.1-3   rlang_1.1.3          cachem_1.1.0        
## [22] xfun_0.49            sass_0.4.9           cli_3.6.2           
## [25] withr_3.0.2          magrittr_2.0.3       formatR_1.14        
## [28] futile.options_1.0.1 digest_0.6.35        rstudioapi_0.17.1   
## [31] lifecycle_1.0.4      vctrs_0.6.5          evaluate_1.0.3      
## [34] glue_1.7.0           data.table_1.16.4    lambda.r_1.2.4      
## [37] codetools_0.2-20     survival_3.5-8       colorspace_2.1-1    
## [40] rmarkdown_2.29       tools_4.4.0          pkgconfig_2.0.3     
## [43] htmltools_0.5.8.1
citation()
## To cite R in publications use:
## 
##   R Core Team (2024). _R: A Language and Environment for Statistical
##   Computing_. R Foundation for Statistical Computing, Vienna, Austria.
##   <https://www.R-project.org/>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2024},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.