Two feature selection methods are provided. The options (case-insensitive) are:
- FDR: rank and select features by their false discovery rate (q-value)
- RF: rank and select features by the variable importance value from the random forest algorithm
The output is a list of feature statistics, including the ranked feature list and the corresponding statistical scores (individual hypothesis test p-values and false discovery rate adjusted q-values).
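The FDR ranking described above can be sketched in base R. This is a minimal illustration on simulated data, not the MWAS implementation: the toy data, the choice of the Kruskal-Wallis test, and the use of `p.adjust(method = "BH")` as a stand-in for the package's q-value estimate are all assumptions.

```r
# Hypothetical toy data: 20 samples in two groups, 50 features.
set.seed(1)
n_samples <- 20
n_features <- 50
group <- factor(rep(c("A", "B"), each = n_samples / 2))
x <- matrix(rnorm(n_samples * n_features), nrow = n_samples,
            dimnames = list(NULL, paste0("feature_", seq_len(n_features))))
# Shift the first five features in group B so they carry a real signal.
x[group == "B", 1:5] <- x[group == "B", 1:5] + 3

# One hypothesis test per feature (Kruskal-Wallis is an assumption here).
p_values <- apply(x, 2, function(v) kruskal.test(v, group)$p.value)

# Adjust for multiple testing; Benjamini-Hochberg stands in for the q-value.
q_values <- p.adjust(p_values, method = "BH")

# Rank features by q-value and keep those at or below alpha.
alpha <- 0.05
feat_table <- data.frame(feature = names(p_values),
                         p_value = p_values,
                         q_value = q_values)
feat_table <- feat_table[order(feat_table$q_value, feat_table$p_value), ]
selected <- feat_table[feat_table$q_value <= alpha, ]
print(selected)
```

The resulting table has the same shape as the output described above: one row per feature, ordered by adjusted score, with the raw p-value alongside.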
Rscript $MWAS_DIR/bin/mwas_analysis.R -w statistics -M FDR -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -o example/feat_set -A 0.05
Or, to use the random forest method:
Rscript $MWAS_DIR/bin/mwas_analysis.R -w statistics -M RF -i $MWAS_DIR/test/data/taxa/GG_100nt_even10k-adults_L7.biom -m $MWAS_DIR/test/data/gg-map-adults.txt -c COUNTRY -o example/feat_set -A 0.05
- -w: statistics mode
- -M: method, FDR (statistics) or RF (randomForest)
- -i: input file path; either a .biom format table or a .txt format OTU or taxon table
- -m: mapping file
- -c: category name
- -o: output directory; the gradient plot is saved as a .pdf file
- -A: threshold for feature selection: alpha value for FDR (maximum q-value) or minimum importance value for RF
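The `-A` threshold works in opposite directions for the two methods: it is an upper bound on the q-value for FDR and a lower bound on the importance value for RF. The sketch below illustrates that difference. `select_by_threshold` is a hypothetical helper, not an MWAS function, the score vectors are made-up values, and treating the bounds as inclusive is an assumption.

```r
# Hypothetical helper (not part of MWAS): apply a threshold to named scores.
select_by_threshold <- function(scores, method, threshold) {
  method <- toupper(method)  # method names are case-insensitive
  if (method == "FDR") {
    # q-values: smaller is better, so keep scores at or below the threshold.
    ranked <- sort(scores[scores <= threshold])
  } else if (method == "RF") {
    # Importance: larger is better, so keep scores at or above the threshold.
    ranked <- sort(scores[scores >= threshold], decreasing = TRUE)
  } else {
    stop("Unknown method: ", method)
  }
  ranked
}

# Made-up scores for three features.
q_values   <- c(otu_a = 0.001, otu_b = 0.20, otu_c = 0.03)
importance <- c(otu_a = 0.12,  otu_b = 0.01, otu_c = 0.07)

fdr_selected <- select_by_threshold(q_values, "fdr", 0.05)
rf_selected  <- select_by_threshold(importance, "rf", 0.05)
print(names(fdr_selected))
print(names(rf_selected))
```

The same numeric value (0.05 here) therefore means different things depending on `-M`, so it is worth choosing it separately for each method.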
# 1. Set the working directory (**This step is only needed once.**)
setwd("~/Documents/LabProjects/mwas_git/")
# 2. Load MWAS functions (**This step is only needed once.**)
file.sources = list.files("lib/", pattern="\\.R$", full.names=TRUE, ignore.case=TRUE)
invisible(sapply(file.sources, source, .GlobalEnv))
## This is vegan 2.2-1
## Loaded glmnet 1.9-8
##
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.
## Type 'citation("pROC")' for a citation.
# 3. Set the feature statistics parameters
opts <- list()
opts$mode <- "statistics"
opts$method <- "fdr" # or 'rf', case-insensitive
opts$input_fp <- "data/taxa/GG_100nt_even10k-adults_L7.biom"
opts$map_fp <- "data/gg-map-adults.txt"
opts$category <- "COUNTRY"
opts$outdir <- "example/feat_test"
opts$fdr <- 0.05 # statistics threshold
# 4. Calculate the feature statistics and write them to the output directory
feat_stats <- import.stats.params(opts)
statistical.test.mwas(feat_stats)
## The feature statistics listed in the directory example/feat_test
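Because the options list is built by hand, a quick sanity check before calling `import.stats.params` can catch a missing field or a misspelled method name early. The following is a minimal sketch: `check_stats_opts` is a hypothetical helper that is not part of MWAS, and the set of required fields is assumed from the example above.

```r
# Hypothetical helper (not part of MWAS): sanity-check an options list.
check_stats_opts <- function(opts) {
  # Field names assumed from the example options list.
  required <- c("mode", "method", "input_fp", "map_fp", "category", "outdir")
  missing_fields <- setdiff(required, names(opts))
  if (length(missing_fields) > 0) {
    stop("Missing option(s): ", paste(missing_fields, collapse = ", "))
  }
  # The method name is case-insensitive, so normalize it before checking.
  opts$method <- tolower(opts$method)
  if (!opts$method %in% c("fdr", "rf")) {
    stop("method must be 'fdr' or 'rf'")
  }
  # For FDR the threshold is a maximum q-value, so it should lie in [0, 1].
  if (opts$method == "fdr" && !is.null(opts$fdr)) {
    if (!is.numeric(opts$fdr) || opts$fdr < 0 || opts$fdr > 1) {
      stop("fdr threshold must be a number between 0 and 1")
    }
  }
  invisible(opts)
}

example_opts <- list(mode = "statistics", method = "FDR",
                     input_fp = "data/taxa/GG_100nt_even10k-adults_L7.biom",
                     map_fp = "data/gg-map-adults.txt",
                     category = "COUNTRY", outdir = "example/feat_test",
                     fdr = 0.05)
checked <- check_stats_opts(example_opts)
print(checked$method)
```

The helper only inspects the list itself; it does not check that the input files exist.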
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(3), 479-498.
Noble, W. S. (2009). How does multiple testing correction work? Nature Biotechnology, 27(12), 1135-1137.
Storey, J. D. (2010). False discovery rate. Retrieved on Feb. 1, 2015, from http://www.genomine.org/papers/Storey_FDR_2010.pdf
Huang, H., Montassier, E., Vangay, P., Al Ghalith, G., & Knights, D. Robust statistical models for microbiome phenotype prediction with the MWAS package (in preparation).