Analysis objectives

  1. Import, recode, and subset data from bugsigdb.org
  2. Create a table of studies
  3. Create a clustered heatmap showing similarity of signatures from independent studies
  4. Calculate the frequency of appearance of each taxa in independent signatures, and identify the most frequently reported taxa
  5. Estimate the probability of the most frequently identified taxa occuring so frequently by chance

Packages installation

Install packages (not evaluated in vignette)

install.packages(c("devtools", "tidyverse", "kableExtra", "gt", "glue"))
devtools::install_github("waldronlab/bugSigSimple")
devtools::install_github("waldronlab/BugSigDBStats")
devtools::install_github("waldronlab/bugsigdbr")

Data import, recoding, and subset

library(bugSigSimple)
dat <- bugsigdbr::importBugSigDB(cache = TRUE) 
## Using cached version from 2021-08-08 18:48:08
dim(dat)
## [1] 2021   45
names(dat)
##  [1] "Study"                      "Study design"              
##  [3] "PMID"                       "DOI"                       
##  [5] "URL"                        "Authors"                   
##  [7] "Title"                      "Journal"                   
##  [9] "Year"                       "Abstract"                  
## [11] "Experiment"                 "Location of subjects"      
## [13] "Host species"               "Body site"                 
## [15] "Condition"                  "Group 0 name"              
## [17] "Group 1 name"               "Group 1 definition"        
## [19] "Group 0 sample size"        "Group 1 sample size"       
## [21] "Antibiotics exclusion"      "Sequencing type"           
## [23] "16s variable region"        "Sequencing platform"       
## [25] "Statistical test"           "Significance threshold"    
## [27] "MHT correction"             "LDA Score above"           
## [29] "Matched on"                 "Confounders controlled for"
## [31] "Pielou"                     "Shannon"                   
## [33] "Chao1"                      "Simpson"                   
## [35] "Inverse Simpson"            "Richness"                  
## [37] "Signature page name"        "Source"                    
## [39] "Curated date"               "Curator"                   
## [41] "Revision editor"            "Description"               
## [43] "Abundance in Group 1"       "MetaPhlAn taxon names"     
## [45] "NCBI Taxonomy IDs"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
condsnew <- c("COVID-19")
covid_all <- subsetByCondition(dat, condsnew) %>%
  mutate(studyexp = paste(Study, Experiment, sep = "_")) %>%
  mutate(
    site = recode(`Body site`,
      "feces" = "Gut",
      "rectal" = "Gut",
      "nasopharynx" = "aURT",
      "oropharynx" = "aURT",
      "nasopharynx,oropharynx" = "aURT",
      "tongue" = "aURT"
    )
  ) %>%
  mutate(comparison1 = paste(`Group 0 name`, `Group 1 name`, sep = " vs "))

Table of studies

bugSigSimple::createStudyTable(covid_all)
Study Condition Cases Controls Study Design
Braun 2021 COVID-19 26 29 cross-sectional observational, not case-control
Cao 2021 COVID-19 13 8 case-control
Chen 2021 COVID-19 30 30 time series / longitudinal observational
De 2020 COVID-19 18 22 cross-sectional observational, not case-control
Engen 2021 COVID-19 9 10 prospective cohort
Gaibani 2021 COVID-19 24 24 case-control
Gu 2020 COVID-19 30 30 cross-sectional observational, not case-control
Hoque 2021 COVID-19 11 10 case-control
Khan 2021 COVID-19 10 10 cross-sectional observational, not case-control
Kim 2021 COVID-19 12 36 time series / longitudinal observational
Liu 2021 COVID-19 6 3 cross-sectional observational, not case-control
Ma 2021 COVID-19 31 29 cross-sectional observational, not case-control
Mazzarelli 2021 COVID-19 9 9 cross-sectional observational, not case-control
Miao 2021 COVID-19 50 54 cross-sectional observational, not case-control
Mostafa 2020 COVID-19 40 10 cross-sectional observational, not case-control
NANA COVID-19 79 67 cross-sectional observational, not case-control
Nardelli 2021 COVID-19 18 12 cross-sectional observational, not case-control
Newsome 2021 COVID-19 50 50 case-control
Ren 2021 COVID-19 48 150 cross-sectional observational, not case-control
Rhoades 2021 COVID-19 68 66 case-control
Rueca 2021 COVID-19 11 11 case-control
Salazar 2021 COVID-19 NA NA case-control
Tao 2020 COVID-19 62 40 cross-sectional observational, not case-control
Tian 2021 COVID-19 7 7 case-control
Ventero 2021 COVID-19 19 19 cross-sectional observational, not case-control
Wu 2021 COVID-19 140 44 cross-sectional observational, not case-control
Xiong 2021 COVID-19 11 11 case-control
Xu 2021 COVID-19 48 15 time series / longitudinal observational
Yeoh 2021 COVID-19 87 78 case-control
Zhong 2021 COVID-19 15 7 cross-sectional observational, not case-control
Zhou 2021 COVID-19 20 11 cross-sectional observational, not case-control
Zuo 2020 COVID-19 8 15 case-control
Zuo 2021 COVID-19 15 15 prospective cohort

Taxon frequency tables by body site

library(dplyr)
gut_sigs <- filter(covid_all, 
                           site == "Gut") %>%
    drop_na(Source)

naso_sigs <- filter(covid_all, 
                           site == "aURT") %>%
  drop_na(Source)

In this table, the Binomial Test p-value corresponds to the null hypothesis

H0: the proportion of signatures in which the taxon is reported increased or decreased, relative to the total number of signatures in which it is reported, is equal to 0.5

kableExtra::kbl(bugSigSimple::createTaxonTable(gut_sigs))
Taxon Name Taxonomic Level total_signatures increased_signatures decreased_signatures Binomial Test pval kingdom phylum class order family genus species n_signatures metaphlan_name
Anaerostipes genus 14 4 10 0.18000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Anaerostipes NA 16 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes
Enterococcus genus 11 11 0 0.00098 Bacteria Firmicutes Bacilli Lactobacillales Enterococcaceae Enterococcus NA 18 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Enterococcaceae|g__Enterococcus
Streptococcus genus 11 10 1 0.01200 Bacteria Firmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus NA 17 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus
Faecalibacterium genus 11 1 10 0.01200 Bacteria Firmicutes Clostridia Eubacteriales Oscillospiraceae Faecalibacterium NA 17 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium
Lactobacillaceae family 9 5 4 1.00000 Bacteria Firmicutes Bacilli Lactobacillales Lactobacillaceae NA NA 20 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae
Clostridium genus 9 4 5 1.00000 Bacteria Firmicutes Clostridia Eubacteriales Clostridiaceae Clostridium NA 17 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Clostridiaceae|g__Clostridium
Anaerobutyricum hallii species 9 3 6 0.51000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Anaerobutyricum Anaerobutyricum hallii 9 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerobutyricum|s__Anaerobutyricum hallii
Blautia genus 8 3 5 0.73000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Blautia NA 12 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Blautia
Coprococcus genus 8 0 8 0.00780 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Coprococcus NA 15 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Coprococcus
Roseburia genus 8 0 8 0.00780 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Roseburia NA 13 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia
kableExtra::kbl(bugSigSimple::createTaxonTable(naso_sigs))
Taxon Name Taxonomic Level total_signatures increased_signatures decreased_signatures Binomial Test pval kingdom phylum class order family genus species n_signatures metaphlan_name
Veillonella genus 11 8 3 0.230 Bacteria Firmicutes Negativicutes Veillonellales Veillonellaceae Veillonella NA 12 k__Bacteria|p__Firmicutes|c__Negativicutes|o__Veillonellales|f__Veillonellaceae|g__Veillonella
Rothia genus 8 2 6 0.290 Bacteria Actinobacteria Actinomycetia Micrococcales Micrococcaceae Rothia NA 10 k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Rothia
Oribacterium genus 7 1 6 0.130 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Oribacterium NA 7 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Oribacterium
Filifactor genus 7 1 6 0.130 Bacteria Firmicutes Clostridia Eubacteriales Peptostreptococcaceae Filifactor NA 7 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Filifactor
Acinetobacter genus 7 3 4 1.000 Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales Moraxellaceae Acinetobacter NA 7 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter
Prevotella genus 6 3 3 1.000 Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae Prevotella NA 17 k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae|g__Prevotella
Streptococcus genus 6 1 5 0.220 Bacteria Firmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus NA 10 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus
Arachnia genus 5 0 5 0.062 Bacteria Actinobacteria Actinomycetia Propionibacteriales Propionibacteriaceae Arachnia NA 5 k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Propionibacteriales|f__Propionibacteriaceae|g__Arachnia
Selenomonas genus 5 2 3 1.000 Bacteria Firmicutes Negativicutes Selenomonadales Selenomonadaceae Selenomonas NA 5 k__Bacteria|p__Firmicutes|c__Negativicutes|o__Selenomonadales|f__Selenomonadaceae|g__Selenomonas
Serratia genus 5 5 0 0.062 Bacteria Proteobacteria Gammaproteobacteria Enterobacterales Yersiniaceae Serratia NA 5 k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Yersiniaceae|g__Serratia

gut microbiota analysis

Look specifically at case-control comparisons

healthy <- grepl(".*(healthy.*COVID|COVID.*healthy).*", gut_sigs$comparison1, ignore.case = TRUE)

cc_gut_sigs1 <- gut_sigs %>%
      filter(healthy == TRUE) 

Overall frequencies of taxa increased in cases for case/control feces studies

Identifying a taxon reported consistently in 8 out of 8 studies is much more compelling that the FDR value here would suggest, since this taxon also passed a significance threshold in every one of those studies.

cc_gut_sigs1_taxontable <- bugSigSimple::createTaxonTable(cc_gut_sigs1) %>% 
  mutate(FDR =  p.adjust(p = `Binomial Test pval`, method="fdr")) %>%
  relocate(FDR, .after = `Binomial Test pval`)
kableExtra::kbl(cc_gut_sigs1_taxontable)
Taxon Name Taxonomic Level total_signatures increased_signatures decreased_signatures Binomial Test pval FDR kingdom phylum class order family genus species n_signatures metaphlan_name
Enterococcus genus 8 8 0 0.0078 0.0390000 Bacteria Firmicutes Bacilli Lactobacillales Enterococcaceae Enterococcus NA 11 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Enterococcaceae|g__Enterococcus
Lactobacillaceae family 8 5 3 0.7300 0.7300000 Bacteria Firmicutes Bacilli Lactobacillales Lactobacillaceae NA NA 12 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae
Streptococcus genus 8 8 0 0.0078 0.0390000 Bacteria Firmicutes Bacilli Lactobacillales Streptococcaceae Streptococcus NA 10 k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus
Anaerostipes genus 8 1 7 0.0700 0.1000000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Anaerostipes NA 9 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes
Coprococcus genus 7 0 7 0.0160 0.0533333 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Coprococcus NA 12 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Coprococcus
Clostridium genus 6 2 4 0.6900 0.7300000 Bacteria Firmicutes Clostridia Eubacteriales Clostridiaceae Clostridium NA 8 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Clostridiaceae|g__Clostridium
Anaerobutyricum hallii species 6 2 4 0.6900 0.7300000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Anaerobutyricum Anaerobutyricum hallii 6 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerobutyricum|s__Anaerobutyricum hallii
Roseburia genus 6 0 6 0.0310 0.0620000 Bacteria Firmicutes Clostridia Eubacteriales Lachnospiraceae Roseburia NA 8 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia
Faecalibacterium genus 6 0 6 0.0310 0.0620000 Bacteria Firmicutes Clostridia Eubacteriales Oscillospiraceae Faecalibacterium NA 11 k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium
Rothia genus 5 5 0 0.0620 0.1000000 Bacteria Actinobacteria Actinomycetia Micrococcales Micrococcaceae Rothia NA 6 k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Rothia

Monte-Carlo simulation for increased abundance taxa

Just for the increased cc_gut_sigs1 for now. I am inclined to skip this analysis in favor of the clustering and binomial test analysis.

library(bugSigSimple)
gut.sigs.increased <- filter(cc_gut_sigs1, `Abundance in Group 1` == "increased") %>% 
  bugsigdbr::getSignatures(tax.id.type = "taxname")
 my.siglengths.inc <- sapply(gut.sigs.increased, length)

getCriticalN(gut.sigs.increased, my.siglengths.inc)
## 95% 
##  10
# Compare to observed - enterococcus & streptococcus are the only taxa that equal the critical limit
frequencySigs(gut.sigs.increased)
##            Enterococcus           Streptococcus      Enterobacteriaceae 
##                       8                       8                       5 
## Enterocloster citroniae         Enterococcaceae        Lactobacillaceae 
##                       5                       5                       5 
##                  Rothia               Atopobium           Lactobacillus 
##                       5                       4                       4 
##                Shigella 
##                       4

Overall frequencies of taxa decreased in cases for case/control feces studies

createTaxonTable(cc_gut_sigs1, n=40)

nasopharyngeal microbiota analysis

Look specifically at case-control comparisons

library(dplyr)
healthy <- grepl(".*(control.*COVID|COVID.*control).*", naso_sigs$comparison1, ignore.case = TRUE)

cc_naso_sigs1 <- naso_sigs %>%
      filter(healthy == TRUE) %>%
  subset(Study != "Study 458")

Cluster analysis

Note, this EDA should really be done before hypothesis testing.

First calculate pairwise overlaps for all signatures of length > 1:

allsigs <- bugsigdbr::getSignatures(covid_all, tax.id.type = "taxname")
allsigs <- allsigs[sapply(allsigs, length) > 1] #require length > 1
dim(allsigs)
## NULL
mydists <- BugSigDBStats::calcPairwiseOverlaps(allsigs)
dim(mydists)
## [1] 6670    8

Create a matrix of Jaccard similarities (0 for no overlap, 1 for 100% overlap)

signames <- unique(c(mydists$name1, mydists$name2))
jmat <- matrix(NA, nrow=length(signames), ncol=length(signames), dimnames=list(signames, signames))
diag(jmat) <- 1
for (i in 1:nrow(mydists)){
  jmat[mydists[i, "name2"], mydists[i, "name1"]] <- mydists[i, "jaccard"]
  jmat[mydists[i, "name1"], mydists[i, "name2"]] <- mydists[i, "jaccard"]
}
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.8.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
siglengths <- sapply(allsigs, length)
ha <- HeatmapAnnotation(`Signature Length` = anno_barplot(siglengths))
hr <- rowAnnotation(`Signature Length` = anno_barplot(siglengths))
hm <- Heatmap(
  jmat,
  top_annotation = ha, left_annotation = hr,
  row_names_max_width = unit(20, "cm"),
  column_names_max_height = unit(20, "cm"),
  row_labels = sub(".+:", "", rownames(jmat)),
  column_labels = sub(".+:", "", colnames(jmat))
)
hm

Use this interactively to make an interactive heatmap. Some expanding of the default size is required to see anything. Creating a sub-heatmap, then exporting it as a table, allows in-depth identification of the subgroups.

library(InteractiveComplexHeatmap)
hm <- draw(hm)
htShiny(hm)
hc <- hclust(as.dist(jmat))
plot(hc)

This tree can be cut to show the clusters, for example. The clusters of more than 1 signature but less than ~10 signatures are most likely to be something interesting.

clusts <- sort(cutree(hc, h = 0.05))
lapply(unique(clusts), function(i) names(clusts)[clusts == i])
## [[1]]
##  [1] "bsdb:428/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"                                                                                    
##  [2] "bsdb:428/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"                                                                                  
##  [3] "bsdb:428/3/1_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_UP"                                                                          
##  [4] "bsdb:430/1/1_COVID-19:High-SARS-CoV-2-infectivity_vs_Low-to-none-SARS-CoV-2-infectivity_UP"                                                     
##  [5] "bsdb:441/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"                                                                                    
##  [6] "bsdb:441/3/2_COVID-19:COVID-19-cases_vs_H1N1-cases_DOWN"                                                                                        
##  [7] "bsdb:449/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"                                                                                  
##  [8] "bsdb:453/3/1_COVID-19:Recovered-COVID-19-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_UP"                    
##  [9] "bsdb:453/3/2_COVID-19:Recovered-COVID-19-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_DOWN"                  
## [10] "bsdb:458/1/1_COVID-19:COVID-19-cases_vs_Non-intubated-controls-with-non-incubation-viral-pneumonia-or-non-incubation-non-infectious-diseases_UP"
## [11] "bsdb:459/1/2_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"                                                              
## [12] "bsdb:464/1/2_COVID-19:Confirmed-COVID-19-patients_vs_Healthy-controls_UP"                                                                       
## [13] "bsdb:474/1/2_COVID-19:COVID-19-patients_vs_Non-COVID-patients-with-URTI-or-COPD_DOWN"                                                           
## [14] "bsdb:479/2/2_COVID-19:Mild-moderate-COVID-19-patients_vs_Healthy-controls_UP"                                                                   
## [15] "bsdb:482/1/2_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_UP"                                                                    
## [16] "bsdb:483/1/1_COVID-19:Antibiotic-naive-COVID-19-patients_vs_Healthy-controls_UP"                                                                
## [17] "bsdb:483/3/1_COVID-19:Pneumonia-controls_vs_Healthy-controls_DOWN"                                                                              
## [18] "bsdb:484/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-with-fever-and-cough_UP"                                         
## [19] "bsdb:485/1/1_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_UP"                                                       
## [20] "bsdb:511/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                                               
## [21] "bsdb:512/3/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-healthcare-workers-(controls)_UP"                                         
## 
## [[2]]
##  [1] "bsdb:428/2/1_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_UP"                                                         
##  [2] "bsdb:428/2/2_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_DOWN"                                                       
##  [3] "bsdb:441/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"                                                                 
##  [4] "bsdb:441/4/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"                                                                   
##  [5] "bsdb:453/1/1_COVID-19:SARS-CoV-2-infected-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_UP"  
##  [6] "bsdb:464/2/2_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_UP"                                                      
##  [7] "bsdb:469/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-19-controls_DOWN"                                                         
##  [8] "bsdb:474/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-patients-with-URTI-or-COPD_UP"                                            
##  [9] "bsdb:476/1/1_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_UP"                                                      
## [10] "bsdb:478/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"                                                                
## [11] "bsdb:479/2/1_COVID-19:Mild-moderate-COVID-19-patients_vs_Healthy-controls_DOWN"                                                
## [12] "bsdb:481/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"                                                                
## [13] "bsdb:487/1/1_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Infected-COVID-19-samples-(respiratory-positive)_UP"
## [14] "bsdb:498/1/2_COVID-19:COVID-19-patients_vs_Asymptomatic-uninfected-controls_DOWN"                                              
## [15] "bsdb:511/3/2_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_DOWN"                                 
## [16] "bsdb:511/4/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                              
## [17] "bsdb:512/2/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"                                             
## 
## [[3]]
##  [1] "bsdb:428/3/2_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_DOWN"                                                                          
##  [2] "bsdb:453/2/1_COVID-19:SARS-CoV-2-recovered-patients_vs_SARS-CoV-2-infected-patients_DOWN"                                                         
##  [3] "bsdb:458/1/2_COVID-19:COVID-19-cases_vs_Non-intubated-controls-with-non-incubation-viral-pneumonia-or-non-incubation-non-infectious-diseases_DOWN"
##  [4] "bsdb:478/2/1_COVID-19:COVID-19-patients_vs_Flu-patients_DOWN"                                                                                     
##  [5] "bsdb:479/1/1_COVID-19:ICU-COVID-19-patients_vs_Healthy-controls_DOWN"                                                                             
##  [6] "bsdb:479/4/2_COVID-19:ICU-COVID-19-patients_vs_Patients-infected-with-other-human-coronaviruses_UP"                                               
##  [7] "bsdb:481/2/1_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_UP"                                              
##  [8] "bsdb:481/3/1_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_UP"                                                                      
##  [9] "bsdb:485/2/2_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_DOWN"                                                       
## [10] "bsdb:487/2/1_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Healthy-controls_DOWN"                                                 
## 
## [[4]]
##  [1] "bsdb:428/4/1_COVID-19:COVID-19-patients-treated-with-antibiotics_vs_Healthy-controls_DOWN"                                     
##  [2] "bsdb:428/4/2_COVID-19:COVID-19-patients-treated-with-antibiotics_vs_Healthy-controls_UP"                                       
##  [3] "bsdb:430/1/2_COVID-19:High-SARS-CoV-2-infectivity_vs_Low-to-none-SARS-CoV-2-infectivity_DOWN"                                  
##  [4] "bsdb:453/1/2_COVID-19:SARS-CoV-2-infected-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_DOWN"
##  [5] "bsdb:453/4/2_COVID-19:Positive-for-COVID-19-viral-RNA-in-feces_vs_Negative-for-COVID-19-viral-RNA-in-feces_DOWN"               
##  [6] "bsdb:464/2/1_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_DOWN"                                                    
##  [7] "bsdb:477/1/1_COVID-19:Severe-COVID-19-patients_vs_COVID-19-negative-controls_UP"                                               
##  [8] "bsdb:479/3/1_COVID-19:ICU-COVID-19-patients_vs_Mild-Moderate-COVID-19-patients_DOWN"                                           
##  [9] "bsdb:484/4/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-with-fever-and-cough_UP"                        
## [10] "bsdb:498/1/1_COVID-19:COVID-19-patients_vs_Asymptomatic-uninfected-controls_UP"                                                
## 
## [[5]]
## [1] "bsdb:441/3/1_COVID-19:COVID-19-cases_vs_H1N1-cases_UP"                                                
## [2] "bsdb:473/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_DOWN"                  
## [3] "bsdb:478/3/1_COVID-19:Flu-patients_vs_Healthy-controls_UP"                                            
## [4] "bsdb:479/1/2_COVID-19:ICU-COVID-19-patients_vs_Healthy-controls_UP"                                   
## [5] "bsdb:481/2/2_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_DOWN"
## [6] "bsdb:500/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-19-pneumonia-patients_UP"                        
## [7] "bsdb:511/2/1_COVID-19:COVID-19-patients-(severe/critical)_vs_COVID-19-patients-(mild/moderate)_DOWN"  
## 
## [[6]]
## [1] "bsdb:449/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"                                                         
## [2] "bsdb:453/2/2_COVID-19:SARS-CoV-2-recovered-patients_vs_SARS-CoV-2-infected-patients_UP"                              
## [3] "bsdb:459/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_DOWN"                                 
## [4] "bsdb:478/5/1_COVID-19:Flu-patients_vs_Healthy-controls_DOWN"                                                         
## [5] "bsdb:481/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                    
## [6] "bsdb:486/1/2_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_DOWN"                                       
## [7] "bsdb:486/2/1_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_UP"                                         
## [8] "bsdb:489/1/1_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_UP"
## [9] "bsdb:513/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"                                   
## 
## [[7]]
## [1] "bsdb:453/4/1_COVID-19:Positive-for-COVID-19-viral-RNA-in-feces_vs_Negative-for-COVID-19-viral-RNA-in-feces_UP"                                            
## [2] "bsdb:464/3/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"                                                                                           
## [3] "bsdb:476/1/2_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_DOWN"                                                                               
## [4] "bsdb:489/2/1_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_UP"
## [5] "bsdb:498/2/1_COVID-19:COVID-19-patients-with-high-viral-load_vs_COVID-19-patients-with-low-viral-load_UP"                                                 
## 
## [[8]]
## [1] "bsdb:464/1/1_COVID-19:Confirmed-COVID-19-patients_vs_Healthy-controls_DOWN"                           
## [2] "bsdb:478/4/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"                                       
## [3] "bsdb:479/3/2_COVID-19:ICU-COVID-19-patients_vs_Mild-Moderate-COVID-19-patients_UP"                    
## [4] "bsdb:483/2/1_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_DOWN"
## [5] "bsdb:485/1/2_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_DOWN"           
## 
## [[9]]
## [1] "bsdb:464/3/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                                          
## [2] "bsdb:478/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                                          
## [3] "bsdb:478/5/2_COVID-19:Flu-patients_vs_Healthy-controls_UP"                                                                                 
## [4] "bsdb:489/3/2_COVID-19:COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_DOWN"
## [5] "bsdb:511/5/1_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_UP"                                               
## 
## [[10]]
## [1] "bsdb:478/3/2_COVID-19:Flu-patients_vs_Healthy-controls_DOWN"                                         
## [2] "bsdb:479/4/1_COVID-19:ICU-COVID-19-patients_vs_Patients-infected-with-other-human-coronaviruses_DOWN"
## [3] "bsdb:482/1/1_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_DOWN"                       
## [4] "bsdb:485/2/1_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_UP"            
## 
## [[11]]
## [1] "bsdb:478/4/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                          
## [2] "bsdb:486/1/1_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_UP"                               
## [3] "bsdb:498/2/2_COVID-19:COVID-19-patients-with-high-viral-load_vs_COVID-19-patients-with-low-viral-load_DOWN"
## [4] "bsdb:511/3/1_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_UP"               
## [5] "bsdb:511/5/2_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_DOWN"             
## 
## [[12]]
## [1] "bsdb:486/2/2_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_DOWN"                           
## [2] "bsdb:511/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"                                          
## [3] "bsdb:512/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-and-healthcare-workers_UP"
## 
## [[13]]
## [1] "bsdb:486/3/1_COVID-19:Severe-COVID-19-patients_vs_Healthy-controls_UP"                                                 
## [2] "bsdb:489/1/2_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_DOWN"
## [3] "bsdb:496/2/1_COVID-19:COVID-19-patients_vs_Seasonal-flu-patients_UP"                                                   
## 
## [[14]]
## [1] "bsdb:486/3/2_COVID-19:Severe-COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:511/4/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"         
## 
## [[15]]
## [1] "bsdb:487/2/2_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Healthy-controls_UP"                                          
## [2] "bsdb:487/3/1_COVID-19:Infected-COVID-19-samples-(respiratory-positive)_vs_Healthy-controls_DOWN"                                         
## [3] "bsdb:496/2/2_COVID-19:COVID-19-patients_vs_Seasonal-flu-patients_DOWN"                                                                   
## [4] "bsdb:513/2/1_COVID-19:COVID-19-positive-patients-with-breathing-assistance_vs_COVID-19-positive-patients-without-breathing-assistance_UP"
## 
## [[16]]
## [1] "bsdb:487/3/2_COVID-19:Infected-COVID-19-samples-(respiratory-positive)_vs_Healthy-controls_UP"                                           
## [2] "bsdb:489/3/1_COVID-19:COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_UP"
## [3] "bsdb:496/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"                                                                        
## [4] "bsdb:500/1/2_COVID-19:COVID-19-patients_vs_Non-COVID-19-pneumonia-patients_DOWN"                                                         
## 
## [[17]]
## [1] "bsdb:489/2/2_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_DOWN"
## [2] "bsdb:496/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"

Create a wide-format dataframe

This would be suitable for regression analysis.

covid_withsigs <- filter(covid_all, !is.na(covid_all$`NCBI Taxonomy IDs`))
sigs <- bugsigdbr::getSignatures(covid_withsigs, tax.id.type = "taxname")
cmat <- t(safe::getCmatrix(sigs, as.matrix = TRUE, min.size = 0, prune = FALSE))
## WARNING: rows are sorted elements of keyword.list
## 125 categories formed
cdf <- data.frame(cmat, stringsAsFactors = FALSE, check.names = FALSE)
cdf <- cbind(covid_withsigs, cdf)
colnames(cdf)[1:54]
##  [1] "Study"                            "Study design"                    
##  [3] "PMID"                             "DOI"                             
##  [5] "URL"                              "Authors"                         
##  [7] "Title"                            "Journal"                         
##  [9] "Year"                             "Abstract"                        
## [11] "Experiment"                       "Location of subjects"            
## [13] "Host species"                     "Body site"                       
## [15] "Condition"                        "Group 0 name"                    
## [17] "Group 1 name"                     "Group 1 definition"              
## [19] "Group 0 sample size"              "Group 1 sample size"             
## [21] "Antibiotics exclusion"            "Sequencing type"                 
## [23] "16s variable region"              "Sequencing platform"             
## [25] "Statistical test"                 "Significance threshold"          
## [27] "MHT correction"                   "LDA Score above"                 
## [29] "Matched on"                       "Confounders controlled for"      
## [31] "Pielou"                           "Shannon"                         
## [33] "Chao1"                            "Simpson"                         
## [35] "Inverse Simpson"                  "Richness"                        
## [37] "Signature page name"              "Source"                          
## [39] "Curated date"                     "Curator"                         
## [41] "Revision editor"                  "Description"                     
## [43] "Abundance in Group 1"             "MetaPhlAn taxon names"           
## [45] "NCBI Taxonomy IDs"                "studyexp"                        
## [47] "site"                             "comparison1"                     
## [49] "[Brevibacterium] frigoritolerans" "[Clostridium] colinum"           
## [51] "[Clostridium] scindens"           "[Clostridium] spiroforme"        
## [53] "[Clostridium] symbiosum"          "[Clostridium] viride"

Note this has a number of columns that are mostly zeros, it could be filtered significantly for any regression or machine learning analysis:

table(cdf[["[Brevibacterium] frigoritolerans"]])
## 
##   0   1 
## 124   1

Create another heatmap on correlations of presence/absence of taxa. This is not necessary because the previous Jaccard Index heatmap is probably better, it is just a demonstration of doing something with the taxa presence/absence directly.

sigcors <- cor(t(cmat))
siglengths <- sapply(sigs, length)
ha <- HeatmapAnnotation(`Signature Length` = anno_barplot(siglengths))
hr <- rowAnnotation(`Signature Length` = anno_barplot(siglengths))
hm <- Heatmap(
  sigcors,
  top_annotation = ha, left_annotation = hr,
  row_names_max_width = unit(20, "cm"),
  column_names_max_height = unit(20, "cm"),
  row_labels = sub(".+:", "", rownames(sigcors)), ##removing study just to make signature names legible
  column_labels = sub(".+:", "", colnames(sigcors))
)
hm

Use this interactively to make an interactive heatmap:

library(InteractiveComplexHeatmap)
hm <- draw(hm)
htShiny(hm)