MWAS feature statistics

Two feature selection methods are provided. The options (case-insensitive) are:

FDR: rank and select features by their false discovery rate (q-value)
RF: rank and select features by their variable importance from the random forest algorithm

The output is a list of feature statistics: the ranked features and their corresponding statistical scores (individual hypothesis test p-values and false discovery rate adjusted q-values).
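To make the FDR option concrete, here is a minimal R sketch of ranking features by q-value. It is not the MWAS implementation. The per-feature Kruskal-Wallis test, the Benjamini-Hochberg adjustment from base R's p.adjust (standing in for Storey's q-value), the function name fdr_feature_stats, and the simulated data are all assumptions made for illustration.

```r
# Minimal sketch of FDR-based feature ranking; NOT the MWAS implementation.
# Assumptions: one Kruskal-Wallis test per feature across category groups, and
# Benjamini-Hochberg adjusted p-values (base R) in place of Storey's q-value.
fdr_feature_stats <- function(feature_table, groups) {
  # feature_table: numeric matrix, samples in rows, features in columns
  # groups: factor giving each sample's category (e.g. COUNTRY)
  p_values <- apply(feature_table, 2, function(x) kruskal.test(x, groups)$p.value)
  q_values <- p.adjust(p_values, method = "BH")
  stats <- data.frame(feature = colnames(feature_table),
                      p_value = p_values,
                      q_value = q_values,
                      stringsAsFactors = FALSE)
  # Rank by q-value, breaking ties by the raw p-value
  stats <- stats[order(stats$q_value, stats$p_value), ]
  rownames(stats) <- NULL
  stats
}

# Simulated example (made-up data): 30 samples in 3 groups, one informative feature
set.seed(1)
groups <- factor(rep(c("G1", "G2", "G3"), each = 10))
feature_table <- cbind(
  taxon_A = c(rnorm(10, 0), rnorm(10, 10), rnorm(10, 20)),  # differs by group
  matrix(rnorm(30 * 4), ncol = 4, dimnames = list(NULL, paste0("noise_", 1:4)))
)
fdr_stats <- fdr_feature_stats(feature_table, groups)
selected <- fdr_stats[fdr_stats$q_value <= 0.05, ]  # keep features with q <= alpha
print(fdr_stats)
```

Keeping rows with q_value <= 0.05 is analogous to passing -A 0.05 with -M FDR on the command line.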


Command-line version (in Terminal)

Rscript $MWAS_DIR/bin/mwas_analysis.R -w statistics -M FDR -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -o example/feat_set -A 0.05

Or if using the randomForest method:

Rscript $MWAS_DIR/bin/mwas_analysis.R -w statistics -M RF -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -o example/feat_set -A 0.05
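For intuition about what the RF ranking measures, the sketch below computes random forest variable importance on simulated data. It assumes the randomForest package is installed and uses its permutation importance (MeanDecreaseAccuracy). MWAS may use a different importance measure or scale, so the values here are not directly comparable to an -A threshold. The data and variable names are hypothetical.

```r
# Minimal sketch of random-forest variable importance; NOT the MWAS code.
# Assumption: permutation importance (MeanDecreaseAccuracy) from randomForest.
library(randomForest)

set.seed(42)
groups <- factor(rep(c("G1", "G2", "G3"), each = 20))
features <- data.frame(
  taxon_signal = c(rnorm(20, 0), rnorm(20, 5), rnorm(20, 10)),  # separates groups
  noise_1 = rnorm(60),
  noise_2 = rnorm(60)
)

rf <- randomForest(x = features, y = groups, ntree = 500, importance = TRUE)
imp <- importance(rf, type = 1)  # one column: MeanDecreaseAccuracy (scaled)

# Rank features from most to least important
rf_importance <- data.frame(feature = rownames(imp),
                            importance = imp[, 1],
                            stringsAsFactors = FALSE)
rf_importance <- rf_importance[order(-rf_importance$importance), ]
print(rf_importance)
```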

-w: statistics mode
-M: selection method, either FDR (false discovery rate) or RF (random forest)
-i: input file; either a .biom table or a .txt OTU or taxon table
-m: mapping file
-c: category name (a column in the mapping file)
-o: output directory where the feature statistics are saved
-A: feature selection threshold; the alpha value for FDR (maximum q-value) or the minimum importance value for RF
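The -A threshold works in opposite directions for the two methods: FDR keeps features at or below it, and RF keeps features at or above it. The hypothetical helper select_features below (not part of MWAS) sketches that rule on made-up statistics tables, assuming columns named q_value and importance.

```r
# Hypothetical helper (not part of MWAS) showing how the -A threshold
# applies to each method.
select_features <- function(stats, method, alpha) {
  method <- tolower(method)  # method names are case-insensitive
  if (method == "fdr") {
    ranked <- stats[order(stats$q_value), , drop = FALSE]      # smallest q-value first
    ranked[ranked$q_value <= alpha, , drop = FALSE]             # alpha = maximum q-value
  } else if (method == "rf") {
    ranked <- stats[order(-stats$importance), , drop = FALSE]  # largest importance first
    ranked[ranked$importance >= alpha, , drop = FALSE]          # alpha = minimum importance
  } else {
    stop("method must be 'FDR' or 'RF' (case-insensitive)")
  }
}

# Toy statistics tables (made-up values)
toy_fdr <- data.frame(feature = c("t1", "t2", "t3"),
                      q_value = c(0.20, 0.01, 0.04),
                      stringsAsFactors = FALSE)
toy_rf <- data.frame(feature = c("t1", "t2", "t3"),
                     importance = c(0.02, 0.30, 0.07),
                     stringsAsFactors = FALSE)

select_features(toy_fdr, "FDR", 0.05)$feature  # [1] "t2" "t3"
select_features(toy_rf, "rf", 0.05)$feature    # [1] "t2" "t3"
```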

R version (in R Console)

# 1. Set the working directory (only needed once)
setwd("~/Documents/LabProjects/mwas_git/")

# 2. Load the MWAS functions (only needed once)
# Source every .R file in lib/ into the global environment
file.sources <- list.files("lib/", pattern = "\\.R$", full.names = TRUE, ignore.case = TRUE)
invisible(lapply(file.sources, source, local = .GlobalEnv))

## This is vegan 2.2-1
## Loaded glmnet 1.9-8
## 
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## Type 'citation("pROC")' for a citation.
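If sourcing the lib/ files fails, a missing package is one possible cause. This small pre-flight check assumes the dependencies are the packages named in the start-up messages above (vegan, glmnet, randomForest, pROC); the actual list of MWAS dependencies may be longer.

```r
# Check that the packages reported at start-up are installed.
# Assumption: this list is inferred from the messages above and may be incomplete.
required_pkgs <- c("vegan", "glmnet", "randomForest", "pROC")
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- required_pkgs[!installed]
if (length(not_installed) > 0) {
  message("Please install: ", paste(not_installed, collapse = ", "))
}
```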

# 3. Set the feature statistics parameters
opts <- list()
opts$mode <- "statistics"
opts$method <- "fdr"  # or 'rf', case-insensitive
opts$input_fp <- "data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$map_fp <- "data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/feat_test"
opts$fdr <- 0.05      # feature selection threshold (same role as -A on the command line)

# 4. Calculate the feature statistics and write them to the output directory
feat_stats <- import.stats.params(opts)
statistical.test.mwas(feat_stats)

## The feature statistics listed in the directory  example/feat_test

References

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479-498.
Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology, 27(12), 1135-1137.
Storey, J. D. (2010). False discovery rate. Retrieved on Feb. 1, 2015, from http://www.genomine.org/papers/Storey_FDR_2010.pdf
Huang, H., Montassier, E., Vangay, P., Al Ghalith, G., & Knights, D. Robust statistical models for microbiome phenotype prediction with the MWAS package (in preparation).


Hu Huang (huan0764 (AT) umn.edu)

March 1, 2015
