Install packages (not evaluated in vignette)
install.packages(c("devtools", "tidyverse", "kableExtra", "gt", "glue"))
devtools::install_github("waldronlab/bugSigSimple")
devtools::install_github("waldronlab/BugSigDBStats")
devtools::install_github("waldronlab/bugsigdbr")
library(bugSigSimple)
dat <- bugsigdbr::importBugSigDB(cache = TRUE)
## Using cached version from 2021-08-08 18:48:08
dim(dat)
## [1] 2021 45
names(dat)
## [1] "Study" "Study design"
## [3] "PMID" "DOI"
## [5] "URL" "Authors"
## [7] "Title" "Journal"
## [9] "Year" "Abstract"
## [11] "Experiment" "Location of subjects"
## [13] "Host species" "Body site"
## [15] "Condition" "Group 0 name"
## [17] "Group 1 name" "Group 1 definition"
## [19] "Group 0 sample size" "Group 1 sample size"
## [21] "Antibiotics exclusion" "Sequencing type"
## [23] "16s variable region" "Sequencing platform"
## [25] "Statistical test" "Significance threshold"
## [27] "MHT correction" "LDA Score above"
## [29] "Matched on" "Confounders controlled for"
## [31] "Pielou" "Shannon"
## [33] "Chao1" "Simpson"
## [35] "Inverse Simpson" "Richness"
## [37] "Signature page name" "Source"
## [39] "Curated date" "Curator"
## [41] "Revision editor" "Description"
## [43] "Abundance in Group 1" "MetaPhlAn taxon names"
## [45] "NCBI Taxonomy IDs"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
condsnew <- c("COVID-19")
covid_all <- subsetByCondition(dat, condsnew) %>%
mutate(studyexp = paste(Study, Experiment, sep = "_")) %>%
mutate(
site = recode(`Body site`,
"feces" = "Gut",
"rectal" = "Gut",
"nasopharynx" = "aURT",
"oropharynx" = "aURT",
"nasopharynx,oropharynx" = "aURT",
"tongue" = "aURT"
)
) %>%
mutate(comparison1 = paste(`Group 0 name`, `Group 1 name`, sep = " vs "))
bugSigSimple::createStudyTable(covid_all)
Study | Condition | Cases | Controls | Study Design |
---|---|---|---|---|
Braun 2021 | COVID-19 | 26 | 29 | cross-sectional observational, not case-control |
Cao 2021 | COVID-19 | 13 | 8 | case-control |
Chen 2021 | COVID-19 | 30 | 30 | time series / longitudinal observational |
De 2020 | COVID-19 | 18 | 22 | cross-sectional observational, not case-control |
Engen 2021 | COVID-19 | 9 | 10 | prospective cohort |
Gaibani 2021 | COVID-19 | 24 | 24 | case-control |
Gu 2020 | COVID-19 | 30 | 30 | cross-sectional observational, not case-control |
Hoque 2021 | COVID-19 | 11 | 10 | case-control |
Khan 2021 | COVID-19 | 10 | 10 | cross-sectional observational, not case-control |
Kim 2021 | COVID-19 | 12 | 36 | time series / longitudinal observational |
Liu 2021 | COVID-19 | 6 | 3 | cross-sectional observational, not case-control |
Ma 2021 | COVID-19 | 31 | 29 | cross-sectional observational, not case-control |
Mazzarelli 2021 | COVID-19 | 9 | 9 | cross-sectional observational, not case-control |
Miao 2021 | COVID-19 | 50 | 54 | cross-sectional observational, not case-control |
Mostafa 2020 | COVID-19 | 40 | 10 | cross-sectional observational, not case-control |
NANA | COVID-19 | 79 | 67 | cross-sectional observational, not case-control |
Nardelli 2021 | COVID-19 | 18 | 12 | cross-sectional observational, not case-control |
Newsome 2021 | COVID-19 | 50 | 50 | case-control |
Ren 2021 | COVID-19 | 48 | 150 | cross-sectional observational, not case-control |
Rhoades 2021 | COVID-19 | 68 | 66 | case-control |
Rueca 2021 | COVID-19 | 11 | 11 | case-control |
Salazar 2021 | COVID-19 | NA | NA | case-control |
Tao 2020 | COVID-19 | 62 | 40 | cross-sectional observational, not case-control |
Tian 2021 | COVID-19 | 7 | 7 | case-control |
Ventero 2021 | COVID-19 | 19 | 19 | cross-sectional observational, not case-control |
Wu 2021 | COVID-19 | 140 | 44 | cross-sectional observational, not case-control |
Xiong 2021 | COVID-19 | 11 | 11 | case-control |
Xu 2021 | COVID-19 | 48 | 15 | time series / longitudinal observational |
Yeoh 2021 | COVID-19 | 87 | 78 | case-control |
Zhong 2021 | COVID-19 | 15 | 7 | cross-sectional observational, not case-control |
Zhou 2021 | COVID-19 | 20 | 11 | cross-sectional observational, not case-control |
Zuo 2020 | COVID-19 | 8 | 15 | case-control |
Zuo 2021 | COVID-19 | 15 | 15 | prospective cohort |
library(dplyr)
gut_sigs <- filter(covid_all,
site == "Gut") %>%
drop_na(Source)
naso_sigs <- filter(covid_all,
site == "aURT") %>%
drop_na(Source)
In this table, the Binomial Test p-value corresponds to the null hypothesis
H0: the proportion of signatures in which the taxon is reported increased or decreased, relative to the total number of signatures in which it is reported, is equal to 0.5
kableExtra::kbl(bugSigSimple::createTaxonTable(gut_sigs))
Taxon Name | Taxonomic Level | total_signatures | increased_signatures | decreased_signatures | Binomial Test pval | kingdom | phylum | class | order | family | genus | species | n_signatures | metaphlan_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Anaerostipes | genus | 14 | 4 | 10 | 0.18000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Anaerostipes | NA | 16 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes |
Enterococcus | genus | 11 | 11 | 0 | 0.00098 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Enterococcaceae | Enterococcus | NA | 18 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Enterococcaceae|g__Enterococcus |
Streptococcus | genus | 11 | 10 | 1 | 0.01200 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Streptococcaceae | Streptococcus | NA | 17 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus |
Faecalibacterium | genus | 11 | 1 | 10 | 0.01200 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Oscillospiraceae | Faecalibacterium | NA | 17 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium |
Lactobacillaceae | family | 9 | 5 | 4 | 1.00000 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Lactobacillaceae | NA | NA | 20 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae |
Clostridium | genus | 9 | 4 | 5 | 1.00000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Clostridiaceae | Clostridium | NA | 17 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Clostridiaceae|g__Clostridium |
Anaerobutyricum hallii | species | 9 | 3 | 6 | 0.51000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Anaerobutyricum | Anaerobutyricum hallii | 9 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerobutyricum|s__Anaerobutyricum hallii |
Blautia | genus | 8 | 3 | 5 | 0.73000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Blautia | NA | 12 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Blautia |
Coprococcus | genus | 8 | 0 | 8 | 0.00780 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Coprococcus | NA | 15 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Coprococcus |
Roseburia | genus | 8 | 0 | 8 | 0.00780 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Roseburia | NA | 13 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia |
kableExtra::kbl(bugSigSimple::createTaxonTable(naso_sigs))
Taxon Name | Taxonomic Level | total_signatures | increased_signatures | decreased_signatures | Binomial Test pval | kingdom | phylum | class | order | family | genus | species | n_signatures | metaphlan_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Veillonella | genus | 11 | 8 | 3 | 0.230 | Bacteria | Firmicutes | Negativicutes | Veillonellales | Veillonellaceae | Veillonella | NA | 12 | k__Bacteria|p__Firmicutes|c__Negativicutes|o__Veillonellales|f__Veillonellaceae|g__Veillonella |
Rothia | genus | 8 | 2 | 6 | 0.290 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | NA | 10 | k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Rothia |
Oribacterium | genus | 7 | 1 | 6 | 0.130 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Oribacterium | NA | 7 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Oribacterium |
Filifactor | genus | 7 | 1 | 6 | 0.130 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Peptostreptococcaceae | Filifactor | NA | 7 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Peptostreptococcaceae|g__Filifactor |
Acinetobacter | genus | 7 | 3 | 4 | 1.000 | Bacteria | Proteobacteria | Gammaproteobacteria | Pseudomonadales | Moraxellaceae | Acinetobacter | NA | 7 | k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales|f__Moraxellaceae|g__Acinetobacter |
Prevotella | genus | 6 | 3 | 3 | 1.000 | Bacteria | Bacteroidetes | Bacteroidia | Bacteroidales | Prevotellaceae | Prevotella | NA | 17 | k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Prevotellaceae|g__Prevotella |
Streptococcus | genus | 6 | 1 | 5 | 0.220 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Streptococcaceae | Streptococcus | NA | 10 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus |
Arachnia | genus | 5 | 0 | 5 | 0.062 | Bacteria | Actinobacteria | Actinomycetia | Propionibacteriales | Propionibacteriaceae | Arachnia | NA | 5 | k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Propionibacteriales|f__Propionibacteriaceae|g__Arachnia |
Selenomonas | genus | 5 | 2 | 3 | 1.000 | Bacteria | Firmicutes | Negativicutes | Selenomonadales | Selenomonadaceae | Selenomonas | NA | 5 | k__Bacteria|p__Firmicutes|c__Negativicutes|o__Selenomonadales|f__Selenomonadaceae|g__Selenomonas |
Serratia | genus | 5 | 5 | 0 | 0.062 | Bacteria | Proteobacteria | Gammaproteobacteria | Enterobacterales | Yersiniaceae | Serratia | NA | 5 | k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Enterobacterales|f__Yersiniaceae|g__Serratia |
Look specifically at case-control comparisons
healthy <- grepl(".*(healthy.*COVID|COVID.*healthy).*", gut_sigs$comparison1, ignore.case = TRUE)
cc_gut_sigs1 <- gut_sigs %>%
filter(healthy == TRUE)
Identifying a taxon reported consistently in 8 out of 8 studies is much more compelling that the FDR value here would suggest, since this taxon also passed a significance threshold in every one of those studies.
cc_gut_sigs1_taxontable <- bugSigSimple::createTaxonTable(cc_gut_sigs1) %>%
mutate(FDR = p.adjust(p = `Binomial Test pval`, method="fdr")) %>%
relocate(FDR, .after = `Binomial Test pval`)
kableExtra::kbl(cc_gut_sigs1_taxontable)
Taxon Name | Taxonomic Level | total_signatures | increased_signatures | decreased_signatures | Binomial Test pval | FDR | kingdom | phylum | class | order | family | genus | species | n_signatures | metaphlan_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Enterococcus | genus | 8 | 8 | 0 | 0.0078 | 0.0390000 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Enterococcaceae | Enterococcus | NA | 11 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Enterococcaceae|g__Enterococcus |
Lactobacillaceae | family | 8 | 5 | 3 | 0.7300 | 0.7300000 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Lactobacillaceae | NA | NA | 12 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Lactobacillaceae |
Streptococcus | genus | 8 | 8 | 0 | 0.0078 | 0.0390000 | Bacteria | Firmicutes | Bacilli | Lactobacillales | Streptococcaceae | Streptococcus | NA | 10 | k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus |
Anaerostipes | genus | 8 | 1 | 7 | 0.0700 | 0.1000000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Anaerostipes | NA | 9 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerostipes |
Coprococcus | genus | 7 | 0 | 7 | 0.0160 | 0.0533333 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Coprococcus | NA | 12 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Coprococcus |
Clostridium | genus | 6 | 2 | 4 | 0.6900 | 0.7300000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Clostridiaceae | Clostridium | NA | 8 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Clostridiaceae|g__Clostridium |
Anaerobutyricum hallii | species | 6 | 2 | 4 | 0.6900 | 0.7300000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Anaerobutyricum | Anaerobutyricum hallii | 6 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Anaerobutyricum|s__Anaerobutyricum hallii |
Roseburia | genus | 6 | 0 | 6 | 0.0310 | 0.0620000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Lachnospiraceae | Roseburia | NA | 8 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Lachnospiraceae|g__Roseburia |
Faecalibacterium | genus | 6 | 0 | 6 | 0.0310 | 0.0620000 | Bacteria | Firmicutes | Clostridia | Eubacteriales | Oscillospiraceae | Faecalibacterium | NA | 11 | k__Bacteria|p__Firmicutes|c__Clostridia|o__Eubacteriales|f__Oscillospiraceae|g__Faecalibacterium |
Rothia | genus | 5 | 5 | 0 | 0.0620 | 0.1000000 | Bacteria | Actinobacteria | Actinomycetia | Micrococcales | Micrococcaceae | Rothia | NA | 6 | k__Bacteria|p__Actinobacteria|c__Actinomycetia|o__Micrococcales|f__Micrococcaceae|g__Rothia |
Just for the increased cc_gut_sigs1 for now. I am inclined to skip this analysis in favor of the clustering and binomial test analysis.
library(bugSigSimple)
gut.sigs.increased <- filter(cc_gut_sigs1, `Abundance in Group 1` == "increased") %>%
bugsigdbr::getSignatures(tax.id.type = "taxname")
my.siglengths.inc <- sapply(gut.sigs.increased, length)
getCriticalN(gut.sigs.increased, my.siglengths.inc)
## 95%
## 10
# Compare to observed - enterococcus & streptococcus are the only taxa that equal the critical limit
frequencySigs(gut.sigs.increased)
## Enterococcus Streptococcus Enterobacteriaceae
## 8 8 5
## Enterocloster citroniae Enterococcaceae Lactobacillaceae
## 5 5 5
## Rothia Atopobium Lactobacillus
## 5 4 4
## Shigella
## 4
createTaxonTable(cc_gut_sigs1, n=40)
Look specifically at case-control comparisons
library(dplyr)
healthy <- grepl(".*(control.*COVID|COVID.*control).*", naso_sigs$comparison1, ignore.case = TRUE)
cc_naso_sigs1 <- naso_sigs %>%
filter(healthy == TRUE) %>%
subset(Study != "Study 458")
Note, this EDA should really be done before hypothesis testing.
First calculate pairwise overlaps for all signatures of length > 1:
allsigs <- bugsigdbr::getSignatures(covid_all, tax.id.type = "taxname")
allsigs <- allsigs[sapply(allsigs, length) > 1] #require length > 1
dim(allsigs)
## NULL
mydists <- BugSigDBStats::calcPairwiseOverlaps(allsigs)
dim(mydists)
## [1] 6670 8
Create a matrix of Jaccard similarities (0 for no overlap, 1 for 100% overlap)
signames <- unique(c(mydists$name1, mydists$name2))
jmat <- matrix(NA, nrow=length(signames), ncol=length(signames), dimnames=list(signames, signames))
diag(jmat) <- 1
for (i in 1:nrow(mydists)){
jmat[mydists[i, "name2"], mydists[i, "name1"]] <- mydists[i, "jaccard"]
jmat[mydists[i, "name1"], mydists[i, "name2"]] <- mydists[i, "jaccard"]
}
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.8.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
##
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
##
## The new InteractiveComplexHeatmap package can directly export static
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
siglengths <- sapply(allsigs, length)
ha <- HeatmapAnnotation(`Signature Length` = anno_barplot(siglengths))
hr <- rowAnnotation(`Signature Length` = anno_barplot(siglengths))
hm <- Heatmap(
jmat,
top_annotation = ha, left_annotation = hr,
row_names_max_width = unit(20, "cm"),
column_names_max_height = unit(20, "cm"),
row_labels = sub(".+:", "", rownames(jmat)),
column_labels = sub(".+:", "", colnames(jmat))
)
hm
Use this interactively to make an interactive heatmap. Some expanding of the default size is required to see anything. Creating a sub-heatmap, then exporting it as a table, allows in-depth identification of the subgroups.
library(InteractiveComplexHeatmap)
hm <- draw(hm)
htShiny(hm)
hc <- hclust(as.dist(jmat))
plot(hc)
This tree can be cut to show the clusters, for example. The clusters of more than 1 signature but less than ~10 signatures are most likely to be something interesting.
clusts <- sort(cutree(hc, h = 0.05))
lapply(unique(clusts), function(i) names(clusts)[clusts == i])
## [[1]]
## [1] "bsdb:428/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"
## [2] "bsdb:428/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"
## [3] "bsdb:428/3/1_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_UP"
## [4] "bsdb:430/1/1_COVID-19:High-SARS-CoV-2-infectivity_vs_Low-to-none-SARS-CoV-2-infectivity_UP"
## [5] "bsdb:441/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"
## [6] "bsdb:441/3/2_COVID-19:COVID-19-cases_vs_H1N1-cases_DOWN"
## [7] "bsdb:449/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"
## [8] "bsdb:453/3/1_COVID-19:Recovered-COVID-19-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_UP"
## [9] "bsdb:453/3/2_COVID-19:Recovered-COVID-19-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_DOWN"
## [10] "bsdb:458/1/1_COVID-19:COVID-19-cases_vs_Non-intubated-controls-with-non-incubation-viral-pneumonia-or-non-incubation-non-infectious-diseases_UP"
## [11] "bsdb:459/1/2_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"
## [12] "bsdb:464/1/2_COVID-19:Confirmed-COVID-19-patients_vs_Healthy-controls_UP"
## [13] "bsdb:474/1/2_COVID-19:COVID-19-patients_vs_Non-COVID-patients-with-URTI-or-COPD_DOWN"
## [14] "bsdb:479/2/2_COVID-19:Mild-moderate-COVID-19-patients_vs_Healthy-controls_UP"
## [15] "bsdb:482/1/2_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_UP"
## [16] "bsdb:483/1/1_COVID-19:Antibiotic-naive-COVID-19-patients_vs_Healthy-controls_UP"
## [17] "bsdb:483/3/1_COVID-19:Pneumonia-controls_vs_Healthy-controls_DOWN"
## [18] "bsdb:484/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-with-fever-and-cough_UP"
## [19] "bsdb:485/1/1_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_UP"
## [20] "bsdb:511/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [21] "bsdb:512/3/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-healthcare-workers-(controls)_UP"
##
## [[2]]
## [1] "bsdb:428/2/1_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_UP"
## [2] "bsdb:428/2/2_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_DOWN"
## [3] "bsdb:441/1/2_COVID-19:COVID-19-cases_vs_Healthy-controls_DOWN"
## [4] "bsdb:441/4/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"
## [5] "bsdb:453/1/1_COVID-19:SARS-CoV-2-infected-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_UP"
## [6] "bsdb:464/2/2_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_UP"
## [7] "bsdb:469/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-19-controls_DOWN"
## [8] "bsdb:474/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-patients-with-URTI-or-COPD_UP"
## [9] "bsdb:476/1/1_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_UP"
## [10] "bsdb:478/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
## [11] "bsdb:479/2/1_COVID-19:Mild-moderate-COVID-19-patients_vs_Healthy-controls_DOWN"
## [12] "bsdb:481/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
## [13] "bsdb:487/1/1_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Infected-COVID-19-samples-(respiratory-positive)_UP"
## [14] "bsdb:498/1/2_COVID-19:COVID-19-patients_vs_Asymptomatic-uninfected-controls_DOWN"
## [15] "bsdb:511/3/2_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_DOWN"
## [16] "bsdb:511/4/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [17] "bsdb:512/2/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"
##
## [[3]]
## [1] "bsdb:428/3/2_COVID-19:Recovered-COVID-19-cases_vs_Healthy-controls_DOWN"
## [2] "bsdb:453/2/1_COVID-19:SARS-CoV-2-recovered-patients_vs_SARS-CoV-2-infected-patients_DOWN"
## [3] "bsdb:458/1/2_COVID-19:COVID-19-cases_vs_Non-intubated-controls-with-non-incubation-viral-pneumonia-or-non-incubation-non-infectious-diseases_DOWN"
## [4] "bsdb:478/2/1_COVID-19:COVID-19-patients_vs_Flu-patients_DOWN"
## [5] "bsdb:479/1/1_COVID-19:ICU-COVID-19-patients_vs_Healthy-controls_DOWN"
## [6] "bsdb:479/4/2_COVID-19:ICU-COVID-19-patients_vs_Patients-infected-with-other-human-coronaviruses_UP"
## [7] "bsdb:481/2/1_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_UP"
## [8] "bsdb:481/3/1_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_UP"
## [9] "bsdb:485/2/2_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_DOWN"
## [10] "bsdb:487/2/1_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Healthy-controls_DOWN"
##
## [[4]]
## [1] "bsdb:428/4/1_COVID-19:COVID-19-patients-treated-with-antibiotics_vs_Healthy-controls_DOWN"
## [2] "bsdb:428/4/2_COVID-19:COVID-19-patients-treated-with-antibiotics_vs_Healthy-controls_UP"
## [3] "bsdb:430/1/2_COVID-19:High-SARS-CoV-2-infectivity_vs_Low-to-none-SARS-CoV-2-infectivity_DOWN"
## [4] "bsdb:453/1/2_COVID-19:SARS-CoV-2-infected-patients_vs_Non-infected-patients-with-unrelated-respiratory-medical-conditions_DOWN"
## [5] "bsdb:453/4/2_COVID-19:Positive-for-COVID-19-viral-RNA-in-feces_vs_Negative-for-COVID-19-viral-RNA-in-feces_DOWN"
## [6] "bsdb:464/2/1_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_DOWN"
## [7] "bsdb:477/1/1_COVID-19:Severe-COVID-19-patients_vs_COVID-19-negative-controls_UP"
## [8] "bsdb:479/3/1_COVID-19:ICU-COVID-19-patients_vs_Mild-Moderate-COVID-19-patients_DOWN"
## [9] "bsdb:484/4/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-with-fever-and-cough_UP"
## [10] "bsdb:498/1/1_COVID-19:COVID-19-patients_vs_Asymptomatic-uninfected-controls_UP"
##
## [[5]]
## [1] "bsdb:441/3/1_COVID-19:COVID-19-cases_vs_H1N1-cases_UP"
## [2] "bsdb:473/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_DOWN"
## [3] "bsdb:478/3/1_COVID-19:Flu-patients_vs_Healthy-controls_UP"
## [4] "bsdb:479/1/2_COVID-19:ICU-COVID-19-patients_vs_Healthy-controls_UP"
## [5] "bsdb:481/2/2_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_DOWN"
## [6] "bsdb:500/1/1_COVID-19:COVID-19-patients_vs_Non-COVID-19-pneumonia-patients_UP"
## [7] "bsdb:511/2/1_COVID-19:COVID-19-patients-(severe/critical)_vs_COVID-19-patients-(mild/moderate)_DOWN"
##
## [[6]]
## [1] "bsdb:449/1/1_COVID-19:COVID-19-cases_vs_Healthy-controls_UP"
## [2] "bsdb:453/2/2_COVID-19:SARS-CoV-2-recovered-patients_vs_SARS-CoV-2-infected-patients_UP"
## [3] "bsdb:459/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_DOWN"
## [4] "bsdb:478/5/1_COVID-19:Flu-patients_vs_Healthy-controls_DOWN"
## [5] "bsdb:481/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [6] "bsdb:486/1/2_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_DOWN"
## [7] "bsdb:486/2/1_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_UP"
## [8] "bsdb:489/1/1_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_UP"
## [9] "bsdb:513/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls_UP"
##
## [[7]]
## [1] "bsdb:453/4/1_COVID-19:Positive-for-COVID-19-viral-RNA-in-feces_vs_Negative-for-COVID-19-viral-RNA-in-feces_UP"
## [2] "bsdb:464/3/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
## [3] "bsdb:476/1/2_COVID-19:Recovered-COVID-19-patients_vs_Healthy-controls_DOWN"
## [4] "bsdb:489/2/1_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_UP"
## [5] "bsdb:498/2/1_COVID-19:COVID-19-patients-with-high-viral-load_vs_COVID-19-patients-with-low-viral-load_UP"
##
## [[8]]
## [1] "bsdb:464/1/1_COVID-19:Confirmed-COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:478/4/2_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
## [3] "bsdb:479/3/2_COVID-19:ICU-COVID-19-patients_vs_Mild-Moderate-COVID-19-patients_UP"
## [4] "bsdb:483/2/1_COVID-19:Antibiotic-treated-COVID-19-patients_vs_Antibiotic-naive-COVID-19-patients_DOWN"
## [5] "bsdb:485/1/2_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_DOWN"
##
## [[9]]
## [1] "bsdb:464/3/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:478/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [3] "bsdb:478/5/2_COVID-19:Flu-patients_vs_Healthy-controls_UP"
## [4] "bsdb:489/3/2_COVID-19:COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_DOWN"
## [5] "bsdb:511/5/1_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_UP"
##
## [[10]]
## [1] "bsdb:478/3/2_COVID-19:Flu-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:479/4/1_COVID-19:ICU-COVID-19-patients_vs_Patients-infected-with-other-human-coronaviruses_DOWN"
## [3] "bsdb:482/1/1_COVID-19:Severe-COVID-19-patients_vs_Mild-COVID-19-patients_DOWN"
## [4] "bsdb:485/2/1_COVID-19:COVID-19-patients-with-fever_vs_COVID-19-patients-without-fever_UP"
##
## [[11]]
## [1] "bsdb:478/4/1_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:486/1/1_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_UP"
## [3] "bsdb:498/2/2_COVID-19:COVID-19-patients-with-high-viral-load_vs_COVID-19-patients-with-low-viral-load_DOWN"
## [4] "bsdb:511/3/1_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_UP"
## [5] "bsdb:511/5/2_COVID-19:COVID-19-patients-not-treated-with-antibiotics_vs_Healthy-controls_DOWN"
##
## [[12]]
## [1] "bsdb:486/2/2_COVID-19:Asymptomatic-COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:511/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
## [3] "bsdb:512/1/1_COVID-19:COVID-19-positive-patients_vs_COVID-19-negative-controls-and-healthcare-workers_UP"
##
## [[13]]
## [1] "bsdb:486/3/1_COVID-19:Severe-COVID-19-patients_vs_Healthy-controls_UP"
## [2] "bsdb:489/1/2_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_DOWN"
## [3] "bsdb:496/2/1_COVID-19:COVID-19-patients_vs_Seasonal-flu-patients_UP"
##
## [[14]]
## [1] "bsdb:486/3/2_COVID-19:Severe-COVID-19-patients_vs_Healthy-controls_DOWN"
## [2] "bsdb:511/4/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
##
## [[15]]
## [1] "bsdb:487/2/2_COVID-19:Recovered-COVID-19-samples-(respiratory-negative)_vs_Healthy-controls_UP"
## [2] "bsdb:487/3/1_COVID-19:Infected-COVID-19-samples-(respiratory-positive)_vs_Healthy-controls_DOWN"
## [3] "bsdb:496/2/2_COVID-19:COVID-19-patients_vs_Seasonal-flu-patients_DOWN"
## [4] "bsdb:513/2/1_COVID-19:COVID-19-positive-patients-with-breathing-assistance_vs_COVID-19-positive-patients-without-breathing-assistance_UP"
##
## [[16]]
## [1] "bsdb:487/3/2_COVID-19:Infected-COVID-19-samples-(respiratory-positive)_vs_Healthy-controls_UP"
## [2] "bsdb:489/3/1_COVID-19:COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_vs_Pneumonia-controls-(COVID-negative)_UP"
## [3] "bsdb:496/1/2_COVID-19:COVID-19-patients_vs_Healthy-controls_DOWN"
## [4] "bsdb:500/1/2_COVID-19:COVID-19-patients_vs_Non-COVID-19-pneumonia-patients_DOWN"
##
## [[17]]
## [1] "bsdb:489/2/2_COVID-19:COVID-positive-patients-admitted-to-ICU-(i-COVID-19)_vs_COVID-positive-patients-admitted-to-infectious-disease-ward-(w-COVID-19)_DOWN"
## [2] "bsdb:496/1/1_COVID-19:COVID-19-patients_vs_Healthy-controls_UP"
This would be suitable for regression analysis.
covid_withsigs <- filter(covid_all, !is.na(covid_all$`NCBI Taxonomy IDs`))
sigs <- bugsigdbr::getSignatures(covid_withsigs, tax.id.type = "taxname")
cmat <- t(safe::getCmatrix(sigs, as.matrix = TRUE, min.size = 0, prune = FALSE))
## WARNING: rows are sorted elements of keyword.list
## 125 categories formed
cdf <- data.frame(cmat, stringsAsFactors = FALSE, check.names = FALSE)
cdf <- cbind(covid_withsigs, cdf)
colnames(cdf)[1:54]
## [1] "Study" "Study design"
## [3] "PMID" "DOI"
## [5] "URL" "Authors"
## [7] "Title" "Journal"
## [9] "Year" "Abstract"
## [11] "Experiment" "Location of subjects"
## [13] "Host species" "Body site"
## [15] "Condition" "Group 0 name"
## [17] "Group 1 name" "Group 1 definition"
## [19] "Group 0 sample size" "Group 1 sample size"
## [21] "Antibiotics exclusion" "Sequencing type"
## [23] "16s variable region" "Sequencing platform"
## [25] "Statistical test" "Significance threshold"
## [27] "MHT correction" "LDA Score above"
## [29] "Matched on" "Confounders controlled for"
## [31] "Pielou" "Shannon"
## [33] "Chao1" "Simpson"
## [35] "Inverse Simpson" "Richness"
## [37] "Signature page name" "Source"
## [39] "Curated date" "Curator"
## [41] "Revision editor" "Description"
## [43] "Abundance in Group 1" "MetaPhlAn taxon names"
## [45] "NCBI Taxonomy IDs" "studyexp"
## [47] "site" "comparison1"
## [49] "[Brevibacterium] frigoritolerans" "[Clostridium] colinum"
## [51] "[Clostridium] scindens" "[Clostridium] spiroforme"
## [53] "[Clostridium] symbiosum" "[Clostridium] viride"
Note this has a number of columns that are mostly zeros, it could be filtered significantly for any regression or machine learning analysis:
table(cdf[["[Brevibacterium] frigoritolerans"]])
##
## 0 1
## 124 1
Create another heatmap on correlations of presence/absence of taxa. This is not necessary because the previous Jaccard Index heatmap is probably better, it is just a demonstration of doing something with the taxa presence/absence directly.
sigcors <- cor(t(cmat))
siglengths <- sapply(sigs, length)
ha <- HeatmapAnnotation(`Signature Length` = anno_barplot(siglengths))
hr <- rowAnnotation(`Signature Length` = anno_barplot(siglengths))
hm <- Heatmap(
sigcors,
top_annotation = ha, left_annotation = hr,
row_names_max_width = unit(20, "cm"),
column_names_max_height = unit(20, "cm"),
row_labels = sub(".+:", "", rownames(sigcors)), ##removing study just to make signature names legible
column_labels = sub(".+:", "", colnames(sigcors))
)
hm
Use this interactively to make an interactive heatmap:
library(InteractiveComplexHeatmap)
hm <- draw(hm)
htShiny(hm)