The MWAS package provides several visualization methods: beeswarm, gradient, scatterplot, violin*, boxplot*, and heatmap*.
The raw input data may need to be preprocessed before its relationships can be shown appropriately. The function preprocess.mwas performs the following steps (Table 1: Preprocessing options):
Remove nonoverlapping samples across multiple input tables
Convert OTU abundances from absolute to relative values, so that the OTU abundances of each sample sum to 1 (applied if is.relative.conversion=TRUE, or if the -r option is on, which sets suppress_relative_abundance_conversion=FALSE)
Remove rare features (OTUs) whose mean abundance is smaller than the given threshold (applied if option -p or min_prevalence is given a value)
Apply one of three data transformation methods: asin_sqrt, norm_asin_sqrt, or none (set through option transform_type or -t)
Collapse the OTU table by correlation, using a threshold of 0.95 (if option is.collapse or -b is on)
Filter the lineage table (KEGG table) if option is.filter.kegg or -K is on (used in the customized heatmap)
Table 1: Preprocessing options
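The preprocessing steps listed above can be sketched in base R. This is a minimal illustration on a toy table, not the code of preprocess.mwas itself: the toy data, the variable names, the threshold value, and the definition of norm_asin_sqrt as the arcsine square root divided by asin(sqrt(1)) are all assumptions made for the example. Collapsing by correlation and KEGG filtering are left out.

```r
# Toy inputs (hypothetical): rows are samples, columns are OTUs (absolute counts).
otu <- matrix(c(10, 0, 90,
                 5, 5, 90,
                 0, 1, 99,
                 7, 3, 90),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("s1", "s2", "s3", "s4"),
                              c("otuA", "otuB", "otuC")))
# Mapping table that lacks sample s4.
map <- data.frame(COUNTRY = c("X", "Y", "X"),
                  row.names = c("s1", "s2", "s3"))

# Step 1: keep only the samples present in both tables.
shared <- intersect(rownames(otu), rownames(map))
otu <- otu[shared, , drop = FALSE]
map <- map[shared, , drop = FALSE]

# Step 2: absolute -> relative abundance, so each sample (row) sums to 1.
rel.all <- sweep(otu, 1, rowSums(otu), "/")

# Step 3: drop rare OTUs whose mean relative abundance is below a threshold.
min.mean <- 0.03  # assumed value, chosen only for this toy example
rel.kept <- rel.all[, colMeans(rel.all) >= min.mean, drop = FALSE]

# Step 4: arcsine square-root transform, plus a variant rescaled to [0, 1]
# (assumed here to correspond to asin_sqrt and norm_asin_sqrt).
asin.sqrt <- asin(sqrt(rel.kept))
norm.asin.sqrt <- asin.sqrt / asin(sqrt(1))

print(colnames(rel.kept))
```

In this toy table otuB has a mean relative abundance of 0.02, so it is the only feature removed by the threshold.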
Command-line version: All visualization-related options use capital letters, except the input options shared with other modules (shown in blue in Table 2). Examples follow in the next few sections, using the data in the directory test/data/.
Table 2: Visualization options
(* Not available in the current version.)
Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M gradient -i test/data/taxa/GG_100nt_even10k-adults_L7.biom -o example/plot_otu_gradient -S
-w: plot mode
-M: gradient plot
-i: input file; it can be a .biom format table or a .txt format OTU or taxon table
-o: output directory; the gradient plot is saved as a .pdf file
-S: shorten the taxonomy names to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)
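To make the effect of -S concrete, here is a small sketch of how a full taxonomy string could be reduced to its lowest named level. The function shorten.taxon is hypothetical and written only for this illustration; it assumes semicolon-separated levels with single-letter prefixes such as k__ or g__, and it is not the implementation used by MWAS.

```r
# Hypothetical helper: return the lowest non-empty taxon level without its prefix.
shorten.taxon <- function(taxon) {
  levels <- trimws(strsplit(taxon, ";", fixed = TRUE)[[1]])
  levels <- sub("^[a-z]__", "", levels)   # strip prefixes such as k__, p__, g__
  levels <- levels[nzchar(levels)]        # ignore unnamed levels such as "s__"
  if (length(levels) == 0) return(taxon)  # nothing usable: keep the original
  levels[length(levels)]
}

taxa <- c("k__Bacteria;p__Firmicutes;g__Faecalibacterium;s__prausnitzii",
          "k__Bacteria;p__Firmicutes;g__Faecalibacterium;s__")
short <- vapply(taxa, shorten.taxon, character(1), USE.NAMES = FALSE)
print(short)
```

When the species level is unnamed, as in the second string, the sketch falls back to the genus name.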
If you are familiar with R, you could manipulate your data in a more flexible way. Here is the same example as shown in the command-line version.
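# Set the working directory to your local copy of the MWAS repository (adjust the path).
setwd("~/Documents/LabProjects/mwas_git/")
# Load every R source file in lib/ into the global environment.
file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))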
opts <- list()
opts$mode <- "plot"
opts$method <- "gradient"
opts$input_fp <- "test/data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$transform_type <- "none"
opts$suppress_relative_abundance_conversion <- FALSE
opts$min_prevalence <- NULL
opts$collapse_table <- FALSE
opts$outdir <- "example/plot_otu_gradient"
opts$shorten_taxa <- TRUE
opts$multiple_axes <- FALSE
opts$filter_kegg <- FALSE
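# Create the output directory if it does not exist yet.
if(opts$outdir != ".") dir.create(opts$outdir, showWarnings=FALSE, recursive=TRUE)
# Build the plot object from the options, then draw it.
mwas.obj <- import.plot.params(opts)
plot(mwas.obj)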
The steps above are identical to the command-line version. Alternatively, you can call the inner functions directly rather than the wrapper functions. More detail on other functions can be found in the learn module tutorial.
The command format for the beeswarm plot is very similar to that of the gradient plot, apart from a few options. A beeswarm plot resembles a scatter plot, but the points are packed closely together without overlapping one another. It is another way to visualize the distribution of samples.
Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M beeswarm -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -o example/beeswarm2 -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -A 0.05 -N 20 -S
-w: plot mode
-M: beeswarm plot
-i: input file; it can be a .biom format table or a .txt format OTU or taxon table
-o: output directory; the plot is saved as a .pdf file
-m: mapping file; a category name should be given as well
-c: category name
-F: taxon statistical test result table, including p-values and q-values (adjusted p-values; False Discovery Rate control); required if the -i option is empty
-A: False Discovery Rate control cutoff
-N: number of taxa to be considered; if omitted, all selected taxa are plotted
-S: shorten the taxonomy names to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)
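The roles of -A and -N can be illustrated with a short base R sketch: test each taxon against the category, adjust the p-values to q-values, keep the taxa below the FDR cutoff, and cap the list at N. The simulated data and the choice of a Kruskal-Wallis test are assumptions made for this example; the test MWAS applies internally may differ.

```r
set.seed(1)
n <- 30
# Hypothetical category with three levels, 10 samples each.
group <- factor(rep(c("A", "B", "C"), each = n / 3))
# Simulated abundances for five taxa; only taxon1 differs between groups.
taxa <- matrix(runif(n * 5), nrow = n,
               dimnames = list(NULL, paste0("taxon", 1:5)))
taxa[group == "A", "taxon1"] <- taxa[group == "A", "taxon1"] + 2

# One p-value per taxon (Kruskal-Wallis is an assumed choice of test).
pvals <- apply(taxa, 2, function(x) kruskal.test(x, group)$p.value)
# q-values: p-values adjusted for False Discovery Rate control.
qvals <- p.adjust(pvals, method = "fdr")

fdr.cutoff <- 0.05  # plays the role of -A
n.plot <- 20        # plays the role of -N

# Keep taxa under the cutoff, most significant first, at most n.plot of them.
selected <- head(names(sort(qvals[qvals <= fdr.cutoff])), n.plot)
print(selected)
```

If fewer taxa pass the cutoff than the requested number, all of them are kept, which mirrors the behaviour described for -N.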
If you are familiar with R, you could manipulate your data more flexibly. Here is the same example as shown in the command-line version.
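# Set the working directory to your local copy of the MWAS repository (adjust the path).
setwd("~/Documents/LabProjects/mwas_git/")
# Load every R source file in lib/ into the global environment.
file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))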
opts <- list()
opts$mode <- "plot"
opts$method <- "beeswarm"
opts$input_fp <- "test/data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$map_fp <- "test/data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/plot-beeswarm"
opts$shorten_taxa <- TRUE
opts$fdr <- 0.05
opts$nplot <- 20
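# Create the output directory if it does not exist yet.
if(opts$outdir != ".") dir.create(opts$outdir, showWarnings=FALSE, recursive=TRUE)
# Build the plot object from the options, then draw it.
mwas.obj <- import.plot.params(opts)
plot(mwas.obj)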
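For readers who want to see a beeswarm layout outside the MWAS wrappers, the following sketch draws one for a single simulated taxon. It assumes the CRAN package beeswarm may be installed and falls back to a jittered base R stripchart if it is not; the data, labels, and output file name are hypothetical.

```r
set.seed(42)
# Simulated relative abundance of one taxon in two hypothetical categories.
df <- data.frame(
  abundance = c(rnorm(20, mean = 0.2, sd = 0.05),
                rnorm(20, mean = 0.4, sd = 0.05)),
  category = factor(rep(c("X", "Y"), each = 20))
)

# Write the figure to a temporary PDF.
out.file <- file.path(tempdir(), "beeswarm_sketch.pdf")
pdf(out.file)
if (requireNamespace("beeswarm", quietly = TRUE)) {
  # Points are shifted sideways just enough that they do not overlap.
  beeswarm::beeswarm(abundance ~ category, data = df, pch = 16,
                     main = "Beeswarm sketch")
} else {
  # Fallback: random jitter approximates the idea but does not prevent overlap.
  stripchart(abundance ~ category, data = df, vertical = TRUE,
             method = "jitter", pch = 16, main = "Jittered stripchart")
}
invisible(dev.off())
```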
The command format for the scatter plot is similar to that of the beeswarm plot, apart from a few options. The scatter plot shows the correlation between two categories for each selected taxon. If there are more than two categories, the function outputs pairwise comparison results; the example below has three categories.
Rscript $MWAS_DIR/bin/mwas_analysis.R -w plot -M scatterplot -i test/data/taxa/merged-taxa.txt -o example/scatterplot -m test/data/gg-map-adults.txt -c COUNTRY -A 0.01 -N 20 -S
-w: plot mode
-M: scatterplot
-i: input file; it can be a .biom format table or a .txt format OTU or taxon table
-o: output directory; the plot is saved as a .pdf file
-m: mapping file; a category name should be given as well
-c: category name
-A: False Discovery Rate control cutoff
-N: number of taxa to be considered; if omitted, all selected taxa are plotted
-S: shorten the taxonomy names to show only the lowest taxon level name (removing k__ etc. to simplify the taxon names on the plots)
If you are familiar with R, you could manipulate your data more flexibly. Here is the same example as shown in the command-line version.
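# Set the working directory to your local copy of the MWAS repository (adjust the path).
setwd("~/Documents/LabProjects/mwas_git/")
# Load every R source file in lib/ into the global environment.
file.sources <- list.files("lib", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))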
opts <- list()
opts$mode <- "plot"
opts$method <- "scatterplot"
opts$input_fp <- "test/data/taxa/merged-taxa.txt"
opts$map_fp <- "test/data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/scatter-plot"
opts$shorten_taxa <- TRUE
opts$fdr <- 0.01
opts$nplot <- 20
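# Create the output directory if it does not exist yet.
if(opts$outdir != ".") dir.create(opts$outdir, showWarnings=FALSE, recursive=TRUE)
# Build the plot object from the options, then draw it.
mwas.obj <- import.plot.params(opts)
plot(mwas.obj)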
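The pairwise behaviour for a category with more than two levels can be sketched with combn, which enumerates every unordered pair. The level names below are placeholders rather than the values found in the test mapping file, and the sketch only shows how the pairs arise, not how MWAS draws each comparison.

```r
# Hypothetical levels of the chosen category (three, as in the example above).
category.levels <- c("A", "B", "C")

# Every unordered pair of levels; three levels give choose(3, 2) = 3 pairs.
pairs <- combn(category.levels, 2, simplify = FALSE)
pair.labels <- vapply(pairs, paste, character(1), collapse = " vs ")
print(pair.labels)

# For one pair, the samples that would enter that comparison.
sample.category <- factor(c("A", "B", "C", "A", "C", "B"))
in.first.pair <- sample.category %in% pairs[[1]]
print(which(in.first.pair))
```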
Hu Huang, Emmanuel Montassier, Pajau Vangay, Gabe Al Ghalith, Dan Knights. “Robust statistical models for microbiome phenotype prediction with the MWAS package” (in preparation)