Practicing Corncob

Gordon Ober

2024-07-28

The purpose of this lab is to introduce you all to a useful tool for analyzing and visualizing relative abundance. This is the analysis that came from the paper you all read. The tutorial below is written by the author of the paper. Use the following link to access the tutorial and the code:

https://cran.r-project.org/web/packages/corncob/vignettes/corncob-intro-no-phyloseq.html

As you go through the tutorial, copy and paste code below to perform the tasks. Additionally, I have a few questions you need to answer.

remotes::install_github("statdivlab/corncob")
## Downloading GitHub repo statdivlab/corncob@HEAD
## glue (1.7.0  -> 1.8.0 ) [CRAN]
## ps   (1.7.7  -> 1.8.0 ) [CRAN]
## slam (0.1-52 -> 0.1-54) [CRAN]
## VGAM (1.1-11 -> 1.1-12) [CRAN]
## Installing 4 packages: glue, ps, slam, VGAM
## Installing packages into 'C:/Users/JourdanHourican/AppData/Local/R/win-library/4.4'
## (as 'lib' is unspecified)
## Warning in download.file(url, destfile, method, mode = "wb", ...): downloaded
## length 0 != reported length 18143
## Warning in download.file(url, destfile, method, mode = "wb", ...): cannot open
## URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/glue_1.8.0.zip': HTTP
## status was '403 Forbidden'
## Error in download.file(url, destfile, method, mode = "wb", ...) : 
##   cannot open URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/glue_1.8.0.zip'
## Warning in download.packages(pkgs, destdir = tmpd, available = available, :
## download of package 'glue' failed
## Warning in download.file(url, destfile, method, mode = "wb", ...): downloaded
## length 0 != reported length 18135
## Warning in download.file(url, destfile, method, mode = "wb", ...): cannot open
## URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/ps_1.8.0.zip': HTTP
## status was '403 Forbidden'
## Error in download.file(url, destfile, method, mode = "wb", ...) : 
##   cannot open URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/ps_1.8.0.zip'
## Warning in download.packages(pkgs, destdir = tmpd, available = available, :
## download of package 'ps' failed
## Warning in download.file(url, destfile, method, mode = "wb", ...): downloaded
## length 0 != reported length 18147
## Warning in download.file(url, destfile, method, mode = "wb", ...): cannot open
## URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/slam_0.1-54.zip': HTTP
## status was '403 Forbidden'
## Error in download.file(url, destfile, method, mode = "wb", ...) : 
##   cannot open URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/slam_0.1-54.zip'
## Warning in download.packages(pkgs, destdir = tmpd, available = available, :
## download of package 'slam' failed
## Warning in download.file(url, destfile, method, mode = "wb", ...): downloaded
## length 0 != reported length 18147
## Warning in download.file(url, destfile, method, mode = "wb", ...): cannot open
## URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/VGAM_1.1-12.zip': HTTP
## status was '403 Forbidden'
## Error in download.file(url, destfile, method, mode = "wb", ...) : 
##   cannot open URL 'https://cloud.r-project.org/bin/windows/contrib/4.4/VGAM_1.1-12.zip'
## Warning in download.packages(pkgs, destdir = tmpd, available = available, :
## download of package 'VGAM' failed
## ── R CMD build ─────────────────────────────────────────────────────────────────
##          checking for file 'C:\Users\JourdanHourican\AppData\Local\Temp\RtmpCuAdvo\remotes22541bad355\statdivlab-corncob-a0309c5/DESCRIPTION' ...     checking for file 'C:\Users\JourdanHourican\AppData\Local\Temp\RtmpCuAdvo\remotes22541bad355\statdivlab-corncob-a0309c5/DESCRIPTION' ...   ✔  checking for file 'C:\Users\JourdanHourican\AppData\Local\Temp\RtmpCuAdvo\remotes22541bad355\statdivlab-corncob-a0309c5/DESCRIPTION' (968ms)
##       ─  preparing 'corncob': (1.2s)
##    checking DESCRIPTION meta-information ...     checking DESCRIPTION meta-information ...   ✔  checking DESCRIPTION meta-information
##       ─  checking for LF line-endings in source and make files and shell scripts (1.1s)
##       ─  checking for empty or unneeded directories
##       ─  building 'corncob_0.4.1.tar.gz'
##      
## 
## Installing package into 'C:/Users/JourdanHourican/AppData/Local/R/win-library/4.4'
## (as 'lib' is unspecified)
library(corncob)
library(magrittr)
data(soil_phylo_sample)
data(soil_phylo_otu)
data(soil_phylo_taxa)

head(soil_phylo_sample)
##      Plants DayAmdmt Amdmt ID Day
## S009      1       01     1  D   0
## S204      1       21     1  D   2
## S112      0       11     1  B   1
## S247      0       22     2  F   2
## S026      0       00     0  A   0
## S023      1       00     0  C   0
soil_phylo_otu[1:5, 1:5]
##         S009 S204 S112 S247 S026
## OTU.43   350   74  300   70   43
## OTU.2   1796 4204 1752  695  945
## OTU.187  280  709  426  100  139
## OTU.150   33  151   18   13   28
## OTU.91     0    0  184    0    0
soil_phylo_taxa[1:3, ]
##         Kingdom    Phylum           Class                 Order             
## OTU.43  "Bacteria" "Nitrospirae"    "Nitrospira"          "Nitrospirales"   
## OTU.2   "Bacteria" "Proteobacteria" "Alphaproteobacteria" "Rhizobiales"     
## OTU.187 "Bacteria" "Acidobacteria"  "Acidobacteriia"      "Acidobacteriales"
##         Family              Genus            Species
## OTU.43  "Nitrospiraceae"    "Nitrospira"     ""     
## OTU.2   "Bradyrhizobiaceae" "Bradyrhizobium" ""     
## OTU.187 "Koribacteraceae"   ""               ""
data(soil_phylum_small_sample)
sample_data <- soil_phylum_small_sample
data(soil_phylum_small_otu)
data <- soil_phylum_small_otu

pro_data <- cbind(sample_data, 
                  W = unlist(data["Proteobacteria", ]),
                  M = colSums(data))

corncob <- bbdml(formula = cbind(W, M - W) ~ 1,
             phi.formula = ~ 1,
             data = pro_data)

plot(corncob, B = 50)

plot(corncob, total = TRUE, B = 50)

plot(corncob, total = TRUE, color = "DayAmdmt", B = 50)

plot(corncob, color = "DayAmdmt", B = 50)

corncob_da <- bbdml(formula = cbind(W, M - W) ~ DayAmdmt,
             phi.formula = ~ DayAmdmt,
             data = pro_data)
plot(corncob_da, color = "DayAmdmt", total = TRUE, B = 50)

plot(corncob_da, color = "DayAmdmt", B = 50)

lrtest(mod_null = corncob, mod = corncob_da)
## [1] 4.550571e-05
summary(corncob_da)
## 
## Call:
## bbdml(formula = cbind(W, M - W) ~ DayAmdmt, phi.formula = ~DayAmdmt, 
##     data = pro_data)
## 
## 
## Coefficients associated with abundance:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.44595    0.03604 -12.375 7.18e-13 ***
## DayAmdmt21  -0.16791    0.04067  -4.129 0.000297 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Coefficients associated with dispersion:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.3077     0.3537 -15.008 6.44e-15 ***
## DayAmdmt21   -1.3518     0.5029  -2.688    0.012 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Log-likelihood: -286.53
set.seed(1)
da_analysis <- differentialTest(formula = ~ DayAmdmt,
                                 phi.formula = ~ DayAmdmt,
                                 formula_null = ~ 1,
                                 phi.formula_null = ~ DayAmdmt,
                                 test = "Wald", boot = FALSE,
                                 data = data,
                                 sample_data = sample_data,
                                 taxa_are_rows = TRUE, 
                                 fdr_cutoff = 0.05)
da_analysis
## Object of class differentialTest 
## 
## $p: p-values 
## $p_fdr: FDR-adjusted p-values 
## $significant_taxa: taxa names of the statistically significant taxa 
## $significant_models: model summaries of the statistically significant taxa 
## $all_models: all model summaries 
## $restrictions_DA: covariates tested for differential abundance 
## $restrictions_DV: covariates tested for differential variability 
## $discriminant_taxa_DA: taxa for which at least one covariate associated with the abundance was perfectly discriminant 
## $discriminant_taxa_DV: taxa for which at least one covariate associated with the dispersion was perfectly discriminant 
## 
## plot( ) to see a plot of tested coefficients from significant taxa
da_analysis$significant_taxa
##  [1] "Proteobacteria"   "Gemmatimonadetes" "Bacteroidetes"    "Cyanobacteria"   
##  [5] "Firmicutes"       "Planctomycetes"   "Armatimonadetes"  "Spirochaetes"    
##  [9] "Elusimicrobia"    "BRC1"             "OP3"              "FBP"             
## [13] "Chlorobi"         "TM6"
set.seed(1)
dv_analysis <- differentialTest(formula = ~ DayAmdmt,
                                 phi.formula = ~ DayAmdmt,
                                 formula_null = ~ DayAmdmt,
                                 phi.formula_null = ~ 1,
                                 test = "LRT", boot = FALSE,
                                 data = data,
                                 sample_data = sample_data,
                                 taxa_are_rows = TRUE, 
                                 fdr_cutoff = 0.05)
dv_analysis$significant_taxa
## [1] "Acidobacteria" "Cyanobacteria" "Spirochaetes"  "Elusimicrobia"
## [5] "FBP"
da_analysis$p[1:5]
##    Acidobacteria   Proteobacteria Gemmatimonadetes   Actinobacteria 
##     6.509417e-01     3.642734e-05     3.270448e-13     3.703096e-01 
##         [Thermi] 
##     1.031419e-01
da_analysis$p_fdr[1:5]
##    Acidobacteria   Proteobacteria Gemmatimonadetes   Actinobacteria 
##     7.811301e-01     1.457094e-04     3.924537e-12     5.172744e-01 
##         [Thermi] 
##     2.062838e-01
plot(da_analysis)

which(is.na(da_analysis$p)) %>% names
## [1] "GN04"   "GN02"   "MVP-21"
data["GN04", ]
##      S204 S112 S134 S207 S202 S139 S122 S212 S117 S104 S214 S109 S217 S229 S132
## GN04    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
##      S209 S227 S107 S237 S224 S127 S137 S114 S124 S119 S219 S232 S129 S102 S234
## GN04    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
##      S222 S239
## GN04    0    0
ex1 <- differentialTest(formula = ~ Day,
                        phi.formula = ~ 1,
                        formula_null = ~ 1,
                        phi.formula_null = ~ 1,
                        data = data,
                        taxa_are_rows = TRUE,
                        sample_data = sample_data, 
                        test = "Wald", boot = FALSE,
                        fdr_cutoff = 0.05)
plot(ex1)

ex2 <- differentialTest(formula = ~ Day,
                        phi.formula = ~ Day,
                        formula_null = ~ 1,
                        phi.formula_null = ~ Day,
                        data = data,
                        taxa_are_rows = TRUE,
                        sample_data = sample_data, 
                        test = "Wald", boot = FALSE,
                        fdr_cutoff = 0.05)
plot(ex2)

ex3 <- differentialTest(formula = ~ Day,
                        phi.formula = ~ Day,
                        formula_null = ~ 1,
                        phi.formula_null = ~ 1,
                        data = data,
                        taxa_are_rows = TRUE,
                        sample_data = sample_data, 
                        test = "Wald", boot = FALSE,
                        fdr_cutoff = 0.05)
plot(ex3)

QUESTIONS: 1. How do you interpret the first relative abundance plot? (Black and White one)

This relative abundance plot shows relative abundance by sample with 95% prediction intervals. This includes 50 bootstrapping samples. The data points do not show a clear trend and relative abundance is quite similar between all of the samples.

  1. Why do we add covariates and what does it do to the relative abundance plot (the red and blue one at the end of the section, compare to previous red and blue plot)

We add covariates to increase precision by explaining some variability between the data. Covariates also help account for confounding variables. As a result, data points shift, error bars shorten (due to increased precision) and we see better fit to the data on the relative abundance plot.

  1. In the final section of the tutorial “Examples of Answering Scientific Questions”, why is it useful/helpful to make plots like these? How would you interpret these plots?

These plots are useful when testing for differential abundance. In this case it is helpful when you want to compare original data with data controlling for different variables. These plots are interpreted by looking at the relative abundance (x-axis) for each individual Taxa (y-axis) and comparing it to the other Taxa.In this case the condition of the right tells us how significant differential abundance is being determined, and that if a taxon’s abundance data point or error bars cross zero it is considered not significantly different.