Two main approaches (2)

treat as count data, log-linear generalized linear model (GLM)
- log-linear systematic component
- typically negative binomially-distributed random component
- model can include an “offset” term to account for different sequencing depths

Advantages
- GLM framework provides great flexibility to deal with sequencing depth, over-dispersion
- coefficients are readily interpretable in “multiplicative” models
- phyloseq and DESeq2 packages simplify the process
Disadvantages
- models are more complicated

Multiple Linear Regression Model (approach 1)

Systematic part of model:

\[ E[y|x] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p \]

\(E[y|x]\) is the expected value of \(y\) given \(x\)
\(y\) is the outcome, response, or dependent variable
\(x\) is the vector of predictors / independent variables
\(x_p\) are the individual predictors or independent variables
\(\beta_p\) are the regression coefficients

Multiple Linear Regression Model (cont’d)

Random part of model:

\(y_i = E[y_i|x_i] + \epsilon_i\)

\(y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} + \epsilon_i\)

\(x_{pi}\) is the value of predictor \(x_j\) for observation \(i\)

Assumption: \(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)\)

Normal distribution of \(\epsilon_i\)
Mean zero at every value of predictors
Constant variance at every value of predictors
Observations are independent

Generalized Linear Models

Linear regression is a special case of a broad family of models called “Generalized Linear Models” (GLM)
This unifying approach allows to fit a large set of models using maximum likelihood estimation methods (MLE) (Nelder & Wedderburn, 1972)
Can model many types of data directly using appropriate random distribution and “link” function
- Transformations of \(Y\) not needed

Components of GLM

Random component specifies the conditional distribution for the response variable
- e.g. normal, Poisson, Negative Binomial…
Systematic component specifies linear function of predictors (linear predictor)
Link [denoted by g(.)] specifies the relationship between the expected value of the random component and the systematic component
- can be linear or nonlinear

Linear Regression as GLM

The model: \(y_i = E[y|x_i] + \epsilon_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} + \epsilon_i\)
Random component of \(y_i\) is normally distributed: \(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma_\epsilon^2)\)
Systematic component (linear predictor): \(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi}\)
Link function here is the identity link: \(g(E(y | x)) = E(y | x)\). We are modeling the mean directly, no transformation.

Log-linear models

Systematic component is:

\[ log(E[y|x_i]) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} \]

Or equivalently: \[ E[y|x_i] = exp \left( \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi} \right) \]

where \(E[y|x_i]\) is the expected number of counts for a microbe in subject i

Systematic plus random components:

\(\epsilon_i\) is typically Poisson or Negative Binomal distributed.

Note: Modeling \(log(E[y|x_i])\) is not equivalent to modeling \(E(log(y|x_i))\)

Additive vs. Multiplicative models

Linear regression is an additive model
- e.g. for two binary variables \(\beta_1 = 1.5\), \(\beta_2 = 1.5\).
- If \(x_1=1\) and \(x_2=1\), this adds 3.0 to \(E(y|x)\)
Log-linear models are multiplicative:
- If \(x_1=1\) and \(x_2=1\), this adds 3.0 to \(log(E[y_i])\)
- Expected count increases 20-fold: \(exp(1.5+1.5)\) or \(exp(1.5) * exp(1.5)\)
- Coefficients are invariant to multiplicative scaling of the data

This is a very important distinction!

Poisson model

In the Poisson distribution, the variance is equal to the mean.
i.e. if the mean number of a microbe across all samples is 4, then variance is also 4 and the standard deviation is 2.
The Poisson distribution fails when the variance exceeds the mean

Visualizing the Poisson Distribution

Poisson distribution has one parameter:
- mean \(\lambda\) is greater than 0
- variance is also \(\lambda\)

Negative binomial distribution

The binomial distribution is the number of successes in n trials:
- Roll a die ten times, how many times do you see a 6?
The negative binomial distribution is the number of successes it takes to observe r failures:
- How many times do you have to roll the die to see a 6 ten times?
- Note that the number of rolls is no longer fixed.
- In this example, p=5/6 and a 6 is a “failure”

Visualizing the Negative Binomial Distribution

Compare Poisson vs. Negative Binomial

Negative Binomial Distribution has two parameters: # of trials n, and probability of success p

Demystifying error models

In regression we model observations as coming from a random distribution with fixed parameters:
- linear regression: normal distribution, with mean and standard deviation
- log-linear models: Poisson distribution with mean \(\lambda\), or Negative Binomial distribution with parameters \(n\) and \(p\)

If there is evidence that the fixed parameters differ between two groups of interest, we say the results are statistically significant.

Zero-inflated models

Two-step model:
1. logistic model to determine whether count is zero or Poisson/NB
2. Poisson or NB regression distribution for \(y_i\) not set to zero by step 1.
Not currently supported by DESeq2, edgeR, limma (as far as I know)
Can perform one row at a time using pcsl::zeroinfl()
- Warning: this function’s default is to include covariates in both models, better to change logistic model to intercept-only (count ~ X|1)

Poisson Distribution with Zero Inflation

Example: Candela Africa dataset

Rampelli S et al.: Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota. Curr. Biol. 2015, 25:1682–1693.

indat = read.delim("data/Candela_Africa_stool.txt")
inmetadat = read.delim("data/Candela_Africa_metadat.txt")
library(phyloseq)
source("https://raw.githubusercontent.com/waldronlab/EPIC2016/master/metaphlanToPhyloseq.R")
Candela = metaphlanToPhyloseq(indat, inmetadat, simplenames = TRUE, 
                              roundtointeger = FALSE)

summary(otu_table(Candela))
summary(sample_data(Candela))

Initial exploration

phyloseq help vignettes here.

Candela

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 3302 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 3302 taxa by 8 taxonomic ranks ]

Candela = prune_taxa(taxa_sums(Candela) > 0, Candela)
Candela

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 648 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 648 taxa by 8 taxonomic ranks ]

Subsetting and pruning

rank_names(Candela)

## [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
## [8] "Strain"

subset_taxa(Candela, !is.na(Strain))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 207 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 207 taxa by 8 taxonomic ranks ]

(Candela.sp_strain = subset_taxa(Candela, !is.na(Species)))

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 444 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 444 taxa by 8 taxonomic ranks ]

taxonomy.level = apply(tax_table(Candela), 1, function(x) sum(!is.na(x)))
Candela.phy = prune_taxa(taxonomy.level==2, Candela)
taxa_names(Candela.phy)

##  [1] "p__Euryarchaeota"    "p__Acidobacteria"    "p__Actinobacteria"  
##  [4] "p__Bacteroidetes"    "p__Cyanobacteria"    "p__Firmicutes"      
##  [7] "p__Fusobacteria"     "p__Proteobacteria"   "p__Spirochaetes"    
## [10] "p__Tenericutes"      "p__Verrucomicrobia"  "p__Eukaryota_noname"
## [13] "p__Microsporidia"

Advanced pruning

Keep taxa only if they are in the most abundant 10% of taxa in at least two samples:

f1<- filterfun_sample(topp(0.1))
pru <- genefilter_sample(Candela, f1, A=2)
summary(pru)

##    Mode   FALSE    TRUE    NA's 
## logical     455     193       0

subset_taxa(Candela, pru)

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 193 taxa and 38 samples ]
## sample_data() Sample Data:       [ 38 samples by 11 sample variables ]
## tax_table()   Taxonomy Table:    [ 193 taxa by 8 taxonomic ranks ]

More help here.

Heatmap

plot_heatmap(Candela.sp_strain, method="PCoA", distance="bray")

Barplot

par(mar = c(18, 4, 0, 0) + 0.1) # make more room on bottom margin
barplot(sort(taxa_sums(Candela.sp_strain), TRUE)[1:30]/nsamples(Candela.sp_strain), las=2)

Distances in high dimensions

High-dimensional data are complex and impossible to visualize in raw form
- Thousands of dimensions, we can only visualize 2-3
Distances can simplify thousands of dimensions
Any clustering or classification of samples and/or genes involves combining or identifying objects that are close or similar.
Distances or similarities are mathematical representations of what we mean by close or similar.
The choice of distance is a subject matter-specific, qualitative decision

Euclidian distance (metric)

Remember grade school: Euclidean d = \(\sqrt{ (A_x-B_x)^2 + (A_y-B_y)^2}\).

Alpha / Beta diversity measures

From Morgan and Huttenhower Human Microbiome Analysis

SVD

These examples describe the A) sequence counts and B) relative abundances of six taxa detected in three samples. C) A collector’s curve using a richness estimator approximates the relationship between the number of sequences drawn from each sample and the number of taxa expected to be present based on detected abundances. D) Alpha diversity captures both the organismal richness of a sample and the evenness of the organisms’ abundance distribution. E) Beta diversity represents the similarity (or difference) in organismal composition between samples.

Shannon Index alpha diversity: \(H' = -\sum_{i=1}^{S} \left( p_i ln(p_i) \right )\)
Beta diversity: \(\beta = (n_1 - c) + (n_2 - c)\)

Alpha diversity estimates

Look at ?phyloseq::estimate_richness
Supported measures of alpha diversity are:
- “Observed”, “Chao1”, “ACE”, “Shannon”, “Simpson”, “InvSimpson”, “Fisher”
- more information from vegan package

Note, you can ignore warning about singletons:

Candela.int = Candela
otu_table(Candela.int) = round(otu_table(Candela)*1e4)
alpha_meas = c("Shannon", "Simpson", "InvSimpson")
(p <- plot_richness(Candela.int, "gender", "camp", measures=alpha_meas))

Comparison of alpha diversity estimates

alphas = estimate_richness(Candela.int, measures=alpha_meas)
pairs(alphas)

Beta diversity / dissimilarity

E.g. Bray-Curtis dissimilarity between all pairs of samples:

plot(hclust(phyloseq::distance(Candela, method="bray")), 
     main="Bray-Curtis Dissimilarity", xlab="", sub = "")

Dozens of distance measures are available
- see ?phyloseq::distance and ?phyloseq::distanceMethodList

Ordination

ord = ordinate(Candela, method="PCoA", distance="bray")
plot_ordination(Candela, ord, color="camp", shape="camp") + 
  ggplot2::ggtitle("Bray-Curtis Principal Coordinates Analysis")

Available methods are “DCA”, “CCA”, “RDA”, “CAP”, “DPCoA”, “NMDS”, “MDS”, “PCoA”
PCoA approximately preserves distances in a two-dimensional projection

Not much “horseshoe” effect here.

Ordination (cont’d)

plot_scree(ord) + ggplot2::ggtitle("Screeplot")

Sample selection

Not necessary, not evaluated:

Candela.pruned = subset_samples(Candela, country %in% c("italy", "tanzania"))
Candela.pruned = prune_samples(sample_sums(Candela.pruned) > 10, Candela.pruned)
Candela.pruned

Automatic independent filtering: DESeq2 automatically optimizes “non-specific” filtering / pruning of low-abundance features
filtering based on mean abundance
- optimizes the number of features meeting False Discovery Rate

Conversion to DESeq2

More help on converting to DESeq2 from various formats here.

dds.data = phyloseq_to_deseq2(Candela, ~country)

## converting counts to integer mode

Note: better to use normalized count data than relative abundance

Negative Binomial log-linear model with DESeq2

Excellent DESeq2 manual here or vignettes(package="DESeq2")

dds = DESeq(dds.data)
res = results(dds)
res = res[order(res$padj, na.last=NA), ]
alpha = 0.01
sigtab = res[(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), 
               as(tax_table(Candela)[rownames(sigtab), ], "matrix"))

head(sigtab)

##                                         baseMean log2FoldChange     lfcSE
## f__Bacteroidaceae                       4.980675      -6.015589 0.8326327
## g__Bacteroides                          4.980675      -6.015589 0.8326327
## f__Prevotellaceae                      33.286897       4.456109 0.6539972
## g__Prevotella                          33.215445       4.663931 0.6916855
## g__Phascolarctobacterium                8.684247       4.311203 0.6774475
## s__Phascolarctobacterium_succinatutens  8.684247       4.311203 0.6774475
##                                             stat       pvalue         padj
## f__Bacteroidaceae                      -7.224782 5.019067e-13 4.115635e-11
## g__Bacteroides                         -7.224782 5.019067e-13 4.115635e-11
## f__Prevotellaceae                       6.813652 9.515189e-12 5.201637e-10
## g__Prevotella                           6.742850 1.553097e-11 6.367698e-10
## g__Phascolarctobacterium                6.363892 1.967046e-10 4.608508e-09
## s__Phascolarctobacterium_succinatutens  6.363892 1.967046e-10 4.608508e-09
##                                         Kingdom        Phylum
## f__Bacteroidaceae                      Bacteria Bacteroidetes
## g__Bacteroides                         Bacteria Bacteroidetes
## f__Prevotellaceae                      Bacteria Bacteroidetes
## g__Prevotella                          Bacteria Bacteroidetes
## g__Phascolarctobacterium               Bacteria    Firmicutes
## s__Phascolarctobacterium_succinatutens Bacteria    Firmicutes
##                                                Class           Order
## f__Bacteroidaceae                        Bacteroidia   Bacteroidales
## g__Bacteroides                           Bacteroidia   Bacteroidales
## f__Prevotellaceae                        Bacteroidia   Bacteroidales
## g__Prevotella                            Bacteroidia   Bacteroidales
## g__Phascolarctobacterium               Negativicutes Selenomonadales
## s__Phascolarctobacterium_succinatutens Negativicutes Selenomonadales
##                                                    Family
## f__Bacteroidaceae                          Bacteroidaceae
## g__Bacteroides                             Bacteroidaceae
## f__Prevotellaceae                          Prevotellaceae
## g__Prevotella                              Prevotellaceae
## g__Phascolarctobacterium               Acidaminococcaceae
## s__Phascolarctobacterium_succinatutens Acidaminococcaceae
##                                                        Genus
## f__Bacteroidaceae                                       <NA>
## g__Bacteroides                                   Bacteroides
## f__Prevotellaceae                                       <NA>
## g__Prevotella                                     Prevotella
## g__Phascolarctobacterium               Phascolarctobacterium
## s__Phascolarctobacterium_succinatutens Phascolarctobacterium
##                                                                    Species
## f__Bacteroidaceae                                                     <NA>
## g__Bacteroides                                                        <NA>
## f__Prevotellaceae                                                     <NA>
## g__Prevotella                                                         <NA>
## g__Phascolarctobacterium                                              <NA>
## s__Phascolarctobacterium_succinatutens Phascolarctobacterium_succinatutens
##                                        Strain
## f__Bacteroidaceae                        <NA>
## g__Bacteroides                           <NA>
## f__Prevotellaceae                        <NA>
## g__Prevotella                            <NA>
## g__Phascolarctobacterium                 <NA>
## s__Phascolarctobacterium_succinatutens   <NA>

Bayesian estimation of dispersion

dds2 <- estimateSizeFactors(dds)
dds2 <- estimateDispersions(dds2)
plotDispEsts(dds2)

Plot results

library("ggplot2")
theme_set(theme_bw())
sigtabgen = subset(sigtab, !is.na(Family))
# Phylum order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Phylum = factor(as.character(sigtabgen$Phylum), levels=names(x))
# Family order
x = tapply(sigtabgen$log2FoldChange, sigtabgen$Family, function(x) max(x))
x = sort(x, TRUE)
sigtabgen$Family = factor(as.character(sigtabgen$Family), levels=names(x))
ggplot(sigtabgen, aes(y=Family, x=log2FoldChange, color=Phylum)) + 
  geom_vline(xintercept = 0.0, color = "gray", size = 0.5) +
  geom_point(size=6) + 
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

Correcting for gender as a potential confounder

table(sample_data(Candela)$country, sample_data(Candela)$gender)

##           
##            female male
##   italy         7    4
##   tanzania      9   18

dds.data2 = phyloseq_to_deseq2(Candela, ~country + gender)
dds2 = DESeq(dds.data)

resultsNames(dds2)

## [1] "Intercept"       "countryitaly"    "countrytanzania"

Correcting for gender as a potential confounder

## italy = numerator, tanzania = denominator
res2 = results(dds, contrast=c("country", "italy", "tanzania"))
res2 = res2[order(res$padj, na.last=NA), ]
alpha = 0.01
sigtab2 = res2[which(res2$padj < alpha), ]
sigtab2 = cbind(as(sigtab2, "data.frame"), as(tax_table(Candela)[rownames(sigtab2), ], "matrix"))
head(sigtab2)

##                                 baseMean log2FoldChange     lfcSE     stat
## p__Actinobacteria               3.339887       3.767790 0.6576299 5.729347
## c__Actinobacteria               3.339887       3.767790 0.6576299 5.729347
## o__Bifidobacteriales            2.495634       5.212101 0.9370110 5.562476
## f__Bifidobacteriaceae           2.495634       5.212101 0.9370110 5.562476
## g__Bifidobacterium              2.495634       5.212101 0.9370110 5.562476
## s__Bifidobacterium_adolescentis 1.182044       4.141045 1.0224334 4.050185
##                                       pvalue         padj  Kingdom
## p__Actinobacteria               1.008178e-08 1.271856e-07 Bacteria
## c__Actinobacteria               1.008178e-08 1.271856e-07 Bacteria
## o__Bifidobacteriales            2.659732e-08 2.726225e-07 Bacteria
## f__Bifidobacteriaceae           2.659732e-08 2.726225e-07 Bacteria
## g__Bifidobacterium              2.659732e-08 2.726225e-07 Bacteria
## s__Bifidobacterium_adolescentis 5.117714e-05 2.997518e-04 Bacteria
##                                         Phylum          Class
## p__Actinobacteria               Actinobacteria           <NA>
## c__Actinobacteria               Actinobacteria Actinobacteria
## o__Bifidobacteriales            Actinobacteria Actinobacteria
## f__Bifidobacteriaceae           Actinobacteria Actinobacteria
## g__Bifidobacterium              Actinobacteria Actinobacteria
## s__Bifidobacterium_adolescentis Actinobacteria Actinobacteria
##                                             Order             Family
## p__Actinobacteria                            <NA>               <NA>
## c__Actinobacteria                            <NA>               <NA>
## o__Bifidobacteriales            Bifidobacteriales               <NA>
## f__Bifidobacteriaceae           Bifidobacteriales Bifidobacteriaceae
## g__Bifidobacterium              Bifidobacteriales Bifidobacteriaceae
## s__Bifidobacterium_adolescentis Bifidobacteriales Bifidobacteriaceae
##                                           Genus
## p__Actinobacteria                          <NA>
## c__Actinobacteria                          <NA>
## o__Bifidobacteriales                       <NA>
## f__Bifidobacteriaceae                      <NA>
## g__Bifidobacterium              Bifidobacterium
## s__Bifidobacterium_adolescentis Bifidobacterium
##                                                      Species Strain
## p__Actinobacteria                                       <NA>   <NA>
## c__Actinobacteria                                       <NA>   <NA>
## o__Bifidobacteriales                                    <NA>   <NA>
## f__Bifidobacteriaceae                                   <NA>   <NA>
## g__Bifidobacterium                                      <NA>   <NA>
## s__Bifidobacterium_adolescentis Bifidobacterium_adolescentis   <NA>

Note can add interaction terms

MA plots

Fold-change vs. mean:

plotMA(res, main="Difference vs. Average")
legend("bottomright", legend="differentially abundant", lty=-1, pch=1, col="red", bty='n')

Plot individual counts

par(mfrow=c(1,2))
plotCounts(dds2, gene="p__Actinobacteria", intgroup="country")
plotCounts(dds2, gene="p__Actinobacteria", intgroup="gender")

Heatmap of differentially abundant taxa

select <- rownames(sigtab2)
nt <- normTransform(dds2) # defaults to log2(x+1)
log2.norm.counts <- assay(nt)[select, ]
df <- as.data.frame(colData(dds2)[,c("country", "gender")])

pheatmap::pheatmap(log2.norm.counts, annotation_col=df, main="log2(counts + 1)")

Regression on ordination vectors and alpha diversity

Prepare a data.frame:

df = data.frame(country=sample_data(Candela)$country,
                Shannon=alphas$Shannon)
df = cbind(df, ord$vectors[, 1:5])

par(mfrow=c(3,2))
for (i in 2:7){
  boxplot(df[, i] ~ df$country, main=colnames(df)[i])
}

Regression on ordination vectors and alpha diversity (cont’d)

Multivariate regression:

fit = glm(country ~ ., data=df, family=binomial("logit"))

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Univariate regression:

res <- sapply(2:ncol(df), function(i){
  fit = glm(df$country ~ df[, i], family=binomial("logit"))
  summary(fit)$coefficients[2, ]
})
colnames(res) = colnames(df)[2:ncol(df)]
res

##               Shannon        Axis.1       Axis.2    Axis.3    Axis.4
## Estimate   -2.8818298 -15.342356740 -36.21703957 4.3518071 2.7867726
## Std. Error  2.0011503   5.305383309  15.91867326 3.7931338 4.5480462
## z value    -1.4400866  -2.891846988  -2.27512928 1.1472854 0.6127406
## Pr(>|z|)    0.1498429   0.003829844   0.02289818 0.2512637 0.5400479
##                 Axis.5
## Estimate   -0.39981340
## Std. Error  4.84389021
## z value    -0.08253973
## Pr(>|z|)    0.93421752

write.csv(res, file="univariate_shannonPCoA.csv")

Regression in R: model formulae

regression functions in R such as lm(), glm(), and high-level packages like DESeq2 use a “model formula” interface.
The basic format is:

response variable ~ explanatory variables

The tilde means “is modeled by” or “is modeled as a function of.”

Model formulae tutorial

Can use Model matrices for more control of the model (?model.matrix)

Statistical analysis for metagenomic data

Outline

Learning Objectives

Properties of processed metagenomic data

Linear modeling for metagenomic data: Two main approaches (1)

Two main approaches (2)

Multiple Linear Regression Model (approach 1)

Multiple Linear Regression Model (cont’d)

Generalized Linear Models

Components of GLM

Linear Regression as GLM

Log-linear models

Additive vs. Multiplicative models

Poisson model

Visualizing the Poisson Distribution

Negative binomial distribution

Visualizing the Negative Binomial Distribution

Compare Poisson vs. Negative Binomial

Demystifying error models

Zero-inflated models

Poisson Distribution with Zero Inflation

Lab 1: exploration

Example: Candela Africa dataset

Initial exploration

Subsetting and pruning

Advanced pruning

Heatmap

Barplot

Distances in high dimensions

Euclidian distance (metric)

Alpha / Beta diversity measures

Alpha diversity estimates

Comparison of alpha diversity estimates

Beta diversity / dissimilarity

Ordination

Ordination (cont’d)

Lab 2: regression

Sample selection

Conversion to DESeq2

Negative Binomial log-linear model with DESeq2

Bayesian estimation of dispersion

Plot results

Correcting for gender as a potential confounder

Correcting for gender as a potential confounder

MA plots

Plot individual counts

Heatmap of differentially abundant taxa

Regression on ordination vectors and alpha diversity

Regression on ordination vectors and alpha diversity (cont’d)

Regression in R: model formulae

Note on read depths and offset

Links