MWAS Visualization Module

The MWAS package provides several visualization methods: beeswarm, gradient, scatterplot, violin*, boxplot*, and heatmap*.


Preprocessing Raw Input Data

Raw input data may need preprocessing before the relationships in it can be shown appropriately. The function preprocess.mwas performs the following steps (Table 1: Preprocessing options):

Remove samples that do not overlap across the multiple input tables.
Convert OTU abundance from absolute to relative values, so that the OTU abundances of each sample sum to 1 (if option is.relative.conversion=TRUE, or if the -r option is on, which sets suppress_relative_abundance_conversion=FALSE).
Remove rare features (OTUs) whose mean abundance is smaller than the given threshold (if option -p or min_prevalence is given a value).
Transform the data with one of three methods: asin_sqrt, norm_asin_sqrt, or none (through option transform_type or -t).
Collapse the OTU table at a correlation threshold of 0.95 (if option is.collapse or -b is on).
Filter the lineage table (KEGG table) if option is.filter.kegg or -K is on (used in the customized heatmap).
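The first four steps above can be sketched in base R on a toy table. This is only an illustration under stated assumptions, not the code of preprocess.mwas: the table layout (samples in rows, OTUs in columns), the variable names, and the threshold value are all hypothetical, and only the asin_sqrt transform is shown.

```r
# Toy OTU table: samples in rows, OTUs in columns (absolute counts).
otu <- matrix(c(90, 10, 0,
                50, 49, 1,
                70, 30, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3"),
                              c("otu1", "otu2", "otu3")))

# Hypothetical mapping table that lacks sample s3.
map <- data.frame(COUNTRY = c("A", "B"), row.names = c("s1", "s2"))

# 1. Keep only the samples present in both tables.
shared <- intersect(rownames(otu), rownames(map))
otu <- otu[shared, , drop = FALSE]
map <- map[shared, , drop = FALSE]

# 2. Convert to relative abundance so that each sample sums to 1.
rel <- sweep(otu, 1, rowSums(otu), "/")

# 3. Drop OTUs whose mean relative abundance is below a threshold.
min_mean_abundance <- 0.01  # illustrative value, not an MWAS default
kept <- rel[, colMeans(rel) >= min_mean_abundance, drop = FALSE]

# 4. Arcsine square-root transform of the remaining abundances.
transformed <- asin(sqrt(kept))
print(round(transformed, 3))
```

In this toy table, sample s3 is dropped because it has no mapping entry, and otu3 is dropped because its mean relative abundance (0.005) falls below the threshold.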

Table 1: Preprocessing options

Visualization Options and Parameters

Command-line version: all visualization-related options use capital letters, except the input options shared with other modules (shown in blue in Table 2). The next few sections give examples that use the data in the directory test/data/.

Table 2: Visualization options
(* Not available in the current version.)

Plot a gradient effect across samples for each individual OTU/taxon

Command-line version (in Terminal)

Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M gradient -i test/data/taxa/GG_100nt_even10k-adults_L7.biom -o example/plot_otu_gradient -S

-w: plot mode
-M: gradient plot
-i: input file path; either a .biom format table or a .txt format OTU or taxon table
-o: output directory; the gradient plot is saved as a .pdf file
-S: shorten the taxonomy names in order to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)
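To make the effect of -S concrete, here is a minimal base R sketch of one way to shorten a taxonomy string. The function name shorten_taxon is hypothetical and is not an MWAS function; it assumes semicolon-separated levels with rank prefixes such as k__ or g__, and the actual MWAS behavior may differ in details.

```r
# Hypothetical helper (not part of MWAS): return the lowest non-empty
# taxonomic level of each lineage string, without its rank prefix.
shorten_taxon <- function(taxa) {
  vapply(strsplit(taxa, ";", fixed = TRUE), function(levels) {
    levels <- trimws(levels)
    levels <- sub("^[a-z]__", "", levels)  # drop prefixes such as k__
    levels <- levels[nzchar(levels)]       # drop unassigned levels
    if (length(levels) == 0) NA_character_ else levels[length(levels)]
  }, character(1), USE.NAMES = FALSE)
}

lineages <- c("k__Bacteria; p__Firmicutes; g__Blautia; s__",
              "k__Bacteria")
print(shorten_taxon(lineages))
```

An unassigned trailing level such as s__ is skipped, so the first lineage is labelled by its genus.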

R version (in R Console)

If you are familiar with R, you can manipulate your data more flexibly. Here is the same example as shown in the command-line version.

Set the working directory

setwd("~/Documents/LabProjects/mwas_git/")

Load MWAS functions

file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))

Set visualization parameters

opts <- list()
opts$mode <- "plot"
opts$method <- "gradient"
opts$input_fp <- "test/data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$transform_type <- "none"
opts$suppress_relative_abundance_conversion <- FALSE
opts$min_prevalence <- NULL
opts$collapse_table <- FALSE
opts$outdir <- "example/plot_otu_gradient"
opts$shorten_taxa <- TRUE
opts$multiple_axes <- FALSE
opts$filter_kegg <- FALSE

Create the output directory if needed

if(opts$outdir != ".") dir.create(opts$outdir,showWarnings=FALSE, recursive=TRUE)

Parse input parameters and plot the corresponding figure type

mwas.obj <- import.plot.params(opts)
plot(mwas.obj)

The steps above are exactly what the command-line version does. Alternatively, you can call the inner functions directly instead of the wrapper functions. More detail on the other functions can be found in the learn module tutorial.

Draw a beeswarm plot for each individual OTU/taxon

The command format is very similar to that of the gradient plot; only some options change. A beeswarm plot resembles a scatter plot, but the points are closely packed without overlapping one another. It is another way to visualize the distribution of samples.
Beeswarm plot example
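For readers who have not seen this kind of plot, the sketch below draws a rough base R analogue with stripchart. It is an assumption-laden illustration, not the MWAS plotting code: the data are simulated, and jittering places points randomly rather than packing them, so it only approximates a true beeswarm layout (the beeswarm package on CRAN provides one).

```r
set.seed(1)

# Simulated relative abundance of one taxon in two hypothetical groups.
abundance <- c(runif(15, 0, 0.2), runif(15, 0.1, 0.4))
group <- factor(rep(c("A", "B"), each = 15))

# Write the figure to a temporary PDF, one column of points per group.
out_file <- file.path(tempdir(), "beeswarm_sketch.pdf")
pdf(out_file, width = 4, height = 4)
stripchart(abundance ~ group, method = "jitter", jitter = 0.15,
           vertical = TRUE, pch = 16,
           xlab = "Category", ylab = "Relative abundance")
invisible(dev.off())
```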

Command-line version (in Terminal)

Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M beeswarm -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -o example/beeswarm2 -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -A 0.05 -N 20 -S

-w: plot mode
-M: beeswarm plot
-i: input file path; either a .biom format table or a .txt format OTU or taxon table
-o: output directory; the beeswarm plot is saved as a .pdf file
-m: mapping file; a category name must be given as well
-c: category name
-F: table of taxon statistical test results, including p-values and q-values (adjusted p-values; false discovery rate control); required if the -i option is empty
-A: false discovery rate control cutoff
-N: number of taxa to plot; if omitted, all selected taxa are plotted
-S: shorten the taxonomy names in order to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)
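The interplay of -A and -N can be sketched as follows. This is a hedged illustration only: the p-values are made up, the variable names are hypothetical, and the Benjamini-Hochberg adjustment is an assumption, since the statistical test and adjustment method that MWAS applies are not stated here.

```r
# Hypothetical per-taxon p-values from some statistical test.
p_values <- c(taxonA = 0.001, taxonB = 0.02, taxonC = 0.04, taxonD = 0.5)

fdr_cutoff <- 0.05  # plays the role of -A
n_plot <- 20        # plays the role of -N

# Benjamini-Hochberg adjusted p-values (q-values).
q_values <- p.adjust(p_values, method = "fdr")

# Keep taxa below the cutoff, most significant first, at most n_plot.
significant <- sort(q_values[q_values < fdr_cutoff])
selected <- names(head(significant, n_plot))
print(selected)
```

Here taxonC has a raw p-value below 0.05 but an adjusted value above it, so only taxonA and taxonB would be plotted.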

R version (in R Console)

If you are familiar with R, you can manipulate your data more flexibly. Here is the same example as shown in the command-line version.

Set the working directory (this step is needed only once)

setwd("~/Documents/LabProjects/mwas_git/")

Load MWAS functions (this step is needed only once)

file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))

Set visualization parameters

opts <- list()
opts$mode <- "plot"
opts$method <- "beeswarm"
opts$input_fp <- "test/data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$map_fp <- "test/data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/plot-beeswarm"
opts$shorten_taxa <- TRUE
opts$fdr <- 0.05
opts$nplot <- 20

Create the output directory if needed

if(opts$outdir != ".") dir.create(opts$outdir,showWarnings=FALSE, recursive=TRUE)

Parse input parameters and plot the corresponding figure type

mwas.obj <- import.plot.params(opts)
plot(mwas.obj)

Draw a scatter plot for each individual OTU/taxon

The command format is similar to that of the beeswarm plot; only some options change. The scatter plot shows the correlation between two categories for each selected taxon. If there are more than two categories, the function outputs pairwise comparisons, as in the example below, which has three categories.

scatter plot example
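The pairwise comparison mentioned above can be enumerated with combn from base R. The category labels below are hypothetical placeholders, and the sketch shows only how the pairs are formed, not how MWAS draws each panel.

```r
# Hypothetical category labels; three categories give three pairs.
categories <- c("A", "B", "C")

# All unordered pairs of categories, one scatter plot per pair.
category_pairs <- combn(categories, 2, simplify = FALSE)

for (pair in category_pairs) {
  cat(pair[1], "vs", pair[2], "\n")
}
```

In general, k categories yield choose(k, 2) pairs, so the number of panels grows quickly as categories are added.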

Command-line version (in Terminal)

Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M scatterplot -i test/data/taxa/merged-taxa.txt -o example/scatterplot -m test/data/gg-map-adults.txt -c COUNTRY -A 0.01 -N 20 -S

-w: plot mode
-M: scatterplot
-i: input file path; either a .biom format table or a .txt format OTU or taxon table
-o: output directory; the scatter plot is saved as a .pdf file
-m: mapping file; a category name must be given as well
-c: category name
-A: false discovery rate control cutoff
-N: number of taxa to plot; if omitted, all selected taxa are plotted
-S: shorten the taxonomy names in order to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)

R version (in R Console)

If you are familiar with R, you can manipulate your data more flexibly. Here is the same example as shown in the command-line version.

Set the working directory (this step is needed only once)

setwd("~/Documents/LabProjects/mwas_git/")

Load MWAS functions (this step is needed only once)

file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))

Set visualization parameters

opts <- list()
opts$mode <- "plot"
opts$method <- "scatterplot"
opts$input_fp <- "test/data/taxa/merged-taxa.txt"
opts$map_fp <- "test/data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/scatter-plot"
opts$shorten_taxa <- TRUE
opts$fdr <- 0.01
opts$nplot <- 20

Create the output directory if needed

if(opts$outdir != ".") dir.create(opts$outdir,showWarnings=FALSE, recursive=TRUE)

Parse input parameters and plot the corresponding figure type

mwas.obj <- import.plot.params(opts)
plot(mwas.obj)
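The beeswarm and scatter plot examples differ only in a few option values, so the shared options can be kept in one place. The sketch below is a convenience under stated assumptions: make_plot_opts and base_opts are hypothetical names that are not part of MWAS, and only option fields that appear in the examples above are used.

```r
# Options shared by the beeswarm and scatter plot examples.
base_opts <- list(
  mode = "plot",
  input_fp = "test/data/taxa/GG_100nt_even10k-adults_L7.biom",
  map_fp = "test/data/gg-map-adults.txt",
  category = "COUNTRY",
  shorten_taxa = TRUE,
  nplot = 20
)

# Hypothetical helper (not part of MWAS): start from the shared options
# and override or add the fields that differ between plot types.
make_plot_opts <- function(method, outdir, ...) {
  modifyList(base_opts, list(method = method, outdir = outdir, ...))
}

beeswarm_opts <- make_plot_opts("beeswarm", "example/plot-beeswarm",
                                fdr = 0.05)
scatter_opts <- make_plot_opts("scatterplot", "example/scatter-plot",
                               input_fp = "test/data/taxa/merged-taxa.txt",
                               fdr = 0.01)
str(scatter_opts)
```

Either list can then be passed to import.plot.params and plot exactly as in the examples above.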

Reference

Hu Huang, Emmanuel Montassier, Pajau Vangay, Gabe Al Ghalith, Dan Knights. “Robust statistical models for microbiome phenotype prediction with the MWAS package” (in preparation)