COD_202411020_MGK_IBS7048_week12_statistical_microbiome_analysis

Minsik Kim

2024-11-20

This analysis pipe noted https://github.com/minsiksudo/BTE3207_Advanced_Biostatistics.

Roading the data

from previous lecture.. we generated a phyloseq object. it can be saved to hard disk using the below command.

saveRDS(ps, "phyloseq_example.rds")

And it can be loaded to a new R session like below!

getwd()

## [1] "/Users/minsikkim/Dropbox (Personal)"

library(phyloseq)
library(tidyverse)
library(microbiome)

## 
## microbiome R package (microbiome.github.com)
##     
## 
## 
##  Copyright (C) 2011-2022 Leo Lahti, 
##     Sudarshan Shetty et al. <microbiome.github.io>

## 
## Attaching package: 'microbiome'

## The following object is masked from 'package:ggplot2':
## 
##     alpha

## The following object is masked from 'package:base':
## 
##     transform

library(vegan)

## Loading required package: permute

## Loading required package: lattice

## This is vegan 2.6-8

## 
## Attaching package: 'vegan'

## The following object is masked from 'package:microbiome':
## 
##     diversity

library(ggplot2)
#source("https://raw.githubusercontent.com/joey711/phyloseq/master/inst/scripts/installer.R",
#       local = TRUE)


phyloseq <- readRDS("Inha/5_Lectures/2024/Advanced biostatistics/scripts/BTE3207_Advanced_Biostatistics/phyloseq_example.rds")
phyloseq %>% sample_data

##        Subject Gender Day  When
## F3D0         3      F   0 Early
## F3D1         3      F   1 Early
## F3D141       3      F 141  Late
## F3D142       3      F 142  Late
## F3D143       3      F 143  Late
## F3D144       3      F 144  Late
## F3D145       3      F 145  Late
## F3D146       3      F 146  Late
## F3D147       3      F 147  Late
## F3D148       3      F 148  Late
## F3D149       3      F 149  Late
## F3D150       3      F 150  Late
## F3D2         3      F   2 Early
## F3D3         3      F   3 Early
## F3D5         3      F   5 Early
## F3D6         3      F   6 Early
## F3D7         3      F   7 Early
## F3D8         3      F   8 Early
## F3D9         3      F   9 Early

Basics of statistical tests

MPG dataset

MPG dataset is basic data installed in ggplot

head(mpg)

## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…

Double-check the data type.

str(mpg)

## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...

t.test fucntion will conduct basic t-test

Paired t-test

t.test(mpg$cty, mpg$hwy, paired = T)

## 
##  Paired t-test
## 
## data:  mpg$cty and mpg$hwy
## t = -44.492, df = 233, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -6.872628 -6.289765
## sample estimates:
## mean difference 
##       -6.581197

Unpaired t-test

mpg$class

##   [1] "compact"    "compact"    "compact"    "compact"    "compact"   
##   [6] "compact"    "compact"    "compact"    "compact"    "compact"   
##  [11] "compact"    "compact"    "compact"    "compact"    "compact"   
##  [16] "midsize"    "midsize"    "midsize"    "suv"        "suv"       
##  [21] "suv"        "suv"        "suv"        "2seater"    "2seater"   
##  [26] "2seater"    "2seater"    "2seater"    "suv"        "suv"       
##  [31] "suv"        "suv"        "midsize"    "midsize"    "midsize"   
##  [36] "midsize"    "midsize"    "minivan"    "minivan"    "minivan"   
##  [41] "minivan"    "minivan"    "minivan"    "minivan"    "minivan"   
##  [46] "minivan"    "minivan"    "minivan"    "pickup"     "pickup"    
##  [51] "pickup"     "pickup"     "pickup"     "pickup"     "pickup"    
##  [56] "pickup"     "pickup"     "suv"        "suv"        "suv"       
##  [61] "suv"        "suv"        "suv"        "suv"        "pickup"    
##  [66] "pickup"     "pickup"     "pickup"     "pickup"     "pickup"    
##  [71] "pickup"     "pickup"     "pickup"     "pickup"     "suv"       
##  [76] "suv"        "suv"        "suv"        "suv"        "suv"       
##  [81] "suv"        "suv"        "suv"        "pickup"     "pickup"    
##  [86] "pickup"     "pickup"     "pickup"     "pickup"     "pickup"    
##  [91] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
##  [96] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [101] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [106] "subcompact" "subcompact" "subcompact" "midsize"    "midsize"   
## [111] "midsize"    "midsize"    "midsize"    "midsize"    "midsize"   
## [116] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [121] "subcompact" "subcompact" "suv"        "suv"        "suv"       
## [126] "suv"        "suv"        "suv"        "suv"        "suv"       
## [131] "suv"        "suv"        "suv"        "suv"        "suv"       
## [136] "suv"        "suv"        "suv"        "suv"        "suv"       
## [141] "suv"        "compact"    "compact"    "midsize"    "midsize"   
## [146] "midsize"    "midsize"    "midsize"    "midsize"    "midsize"   
## [151] "suv"        "suv"        "suv"        "suv"        "midsize"   
## [156] "midsize"    "midsize"    "midsize"    "midsize"    "suv"       
## [161] "suv"        "suv"        "suv"        "suv"        "suv"       
## [166] "subcompact" "subcompact" "subcompact" "subcompact" "compact"   
## [171] "compact"    "compact"    "compact"    "suv"        "suv"       
## [176] "suv"        "suv"        "suv"        "suv"        "midsize"   
## [181] "midsize"    "midsize"    "midsize"    "midsize"    "midsize"   
## [186] "midsize"    "compact"    "compact"    "compact"    "compact"   
## [191] "compact"    "compact"    "compact"    "compact"    "compact"   
## [196] "compact"    "compact"    "compact"    "suv"        "suv"       
## [201] "pickup"     "pickup"     "pickup"     "pickup"     "pickup"    
## [206] "pickup"     "pickup"     "compact"    "compact"    "compact"   
## [211] "compact"    "compact"    "compact"    "compact"    "compact"   
## [216] "compact"    "compact"    "compact"    "compact"    "compact"   
## [221] "compact"    "subcompact" "subcompact" "subcompact" "subcompact"
## [226] "subcompact" "subcompact" "midsize"    "midsize"    "midsize"   
## [231] "midsize"    "midsize"    "midsize"    "midsize"

mpg %>% filter(class == "2seater")%>% .$cty

## [1] 16 15 16 15 15

city_mpg_2seater <- mpg %>% filter(class == "2seater") %>% .$cty

city_mpg_midsize <- mpg %>% filter(class == "midsize") %>% .$cty

city_mpg_2seater

## [1] 16 15 16 15 15

city_mpg_midsize

##  [1] 15 17 16 19 22 18 18 17 18 18 21 21 18 18 19 23 23 19 19 18 19 19 18 16 17
## [26] 18 16 21 21 21 21 18 18 19 21 18 19 21 16 18 17

t.test(city_mpg_2seater,
       city_mpg_midsize,
       paired = F,
       var.equal = F)

## 
##  Welch Two Sample t-test
## 
## data:  city_mpg_2seater and city_mpg_midsize
## t = -8.5965, df = 20.862, p-value = 2.69e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.168314 -2.543882
## sample estimates:
## mean of x mean of y 
##   15.4000   18.7561

Linear model

Univariate linear model

Boxplot of categorical x, continuous y

boxplot(data = mpg, cty ~ class)

Linear model

lm(data = mpg, 
       cty ~ class)

## 
## Call:
## lm(formula = cty ~ class, data = mpg)
## 
## Coefficients:
##     (Intercept)     classcompact     classmidsize     classminivan  
##         15.4000           4.7277           3.3561           0.4182  
##     classpickup  classsubcompact         classsuv  
##         -2.4000           4.9714          -1.9000

Linear model - the output can be stored

lm_result_mpg <- 
        lm(data = mpg, 
           cty ~ class)

lm_result_mpg

## 
## Call:
## lm(formula = cty ~ class, data = mpg)
## 
## Coefficients:
##     (Intercept)     classcompact     classmidsize     classminivan  
##         15.4000           4.7277           3.3561           0.4182  
##     classpickup  classsubcompact         classsuv  
##         -2.4000           4.9714          -1.9000

The result of lm is a list having multiple caclulated factors

lm_result_mpg %>% names

##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "contrasts"     "xlevels"       "call"          "terms"        
## [13] "model"

Summary of a linear model

summary(lm_result_mpg)

## 
## Call:
## lm(formula = cty ~ class, data = mpg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3714 -1.7561 -0.1277  1.0000 14.6286 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      15.4000     1.3024  11.824  < 2e-16 ***
## classcompact      4.7277     1.3699   3.451 0.000666 ***
## classmidsize      3.3561     1.3796   2.433 0.015759 *  
## classminivan      0.4182     1.5708   0.266 0.790307    
## classpickup      -2.4000     1.3976  -1.717 0.087304 .  
## classsubcompact   4.9714     1.3923   3.571 0.000435 ***
## classsuv         -1.9000     1.3539  -1.403 0.161884    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.912 on 227 degrees of freedom
## Multiple R-squared:  0.5438, Adjusted R-squared:  0.5317 
## F-statistic:  45.1 on 6 and 227 DF,  p-value: < 2.2e-16

This result will show you how different data is. (by the factor label of class)

You have 1. Estimate (beta) 2. Standard errors 3. p-values 4. R-squared values

ANOVA

Analysis of variance for linear model can be done to this linear model result.

anova(lm_result_mpg)

## Analysis of Variance Table
## 
## Response: cty
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## class       6 2295.0  382.51  45.099 < 2.2e-16 ***
## Residuals 227 1925.3    8.48                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How can we test different multiple associations?

ggplot(mpg,
       aes(y = cty, x = class, fill = as.factor(year))) +
        geom_boxplot()

Anova - Multivariate

lm(data = mpg,
        formula = hwy ~ class + as.factor(year)) %>%
        anova

## Analysis of Variance Table
## 
## Response: hwy
##                  Df Sum Sq Mean Sq F value Pr(>F)    
## class             6 5683.2  947.21 83.4157 <2e-16 ***
## as.factor(year)   1   12.1   12.15  1.0699 0.3021    
## Residuals       226 2566.3   11.36                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

linear model - Multivariate

Multiple linear regression can be don by just adding a new variable to formula

lm(data = mpg,
        formula = hwy ~ class * as.factor(year)) %>%
        summary

## 
## Call:
## lm(formula = hwy ~ class * as.factor(year), data = mpg)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.000 -1.812 -0.500  1.341 16.080 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          24.5000     2.3822  10.285  < 2e-16 ***
## classcompact                          3.4200     2.4756   1.381  0.16854    
## classmidsize                          2.0000     2.4984   0.800  0.42429    
## classminivan                         -2.0000     2.7507  -0.727  0.46795    
## classpickup                          -7.6875     2.5267  -3.043  0.00263 ** 
## classsubcompact                       4.5000     2.5044   1.797  0.07373 .  
## classsuv                             -6.9483     2.4629  -2.821  0.00522 ** 
## as.factor(year)2008                   0.5000     3.0754   0.163  0.87100    
## classcompact:as.factor(year)2008      0.3073     3.2292   0.095  0.92428    
## classmidsize:as.factor(year)2008      1.0476     3.2505   0.322  0.74754    
## classminivan:as.factor(year)2008     -0.8000     3.6904  -0.217  0.82858    
## classpickup:as.factor(year)2008      -0.3713     3.2916  -0.113  0.91029    
## classsubcompact:as.factor(year)2008  -2.3750     3.2809  -0.724  0.46991    
## classsuv:as.factor(year)2008          0.5846     3.1927   0.183  0.85487    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.369 on 220 degrees of freedom
## Multiple R-squared:  0.6978, Adjusted R-squared:  0.6799 
## F-statistic: 39.07 on 13 and 220 DF,  p-value: < 2.2e-16

Statistical tests for alpha diversity

microbiome::alpha(phyloseq)

## Observed richness

## Other forms of richness

## Diversity

## Evenness

## Dominance

## Rarity

##        observed chao1 diversity_inverse_simpson diversity_gini_simpson
## F3D0        101   101                  28.30269              0.9646677
## F3D1         95    95                  33.63697              0.9702708
## F3D141       72    72                  20.40816              0.9510000
## F3D142       46    46                  16.18065              0.9381978
## F3D143       52    52                  18.96604              0.9472742
## F3D144       48    48                  15.64626              0.9360870
## F3D145       65    65                  15.90272              0.9371177
## F3D146       80    80                  23.17233              0.9568451
## F3D147      100   100                  17.67047              0.9434084
## F3D148       92    92                  20.35859              0.9508807
## F3D149      107   107                  23.33034              0.9571374
## F3D150       72    72                  22.03383              0.9546153
## F3D2        128   128                  14.24080              0.9297792
## F3D3         62    62                  12.77515              0.9217230
## F3D5         79    79                  26.61261              0.9624238
## F3D6         85    85                  16.09561              0.9378712
## F3D7         56    56                  12.45250              0.9196948
## F3D8         91    91                  21.97649              0.9544968
## F3D9        103   103                  23.49765              0.9574426
##        diversity_shannon diversity_fisher diversity_coverage evenness_camargo
## F3D0            3.858471        16.981200                 10        0.3427514
## F3D1            3.961659        16.721685                 13        0.4009949
## F3D141          3.443176        11.995626                  7        0.3073986
## F3D142          3.105778         8.022685                  6        0.3398586
## F3D143          3.291013         9.279171                  7        0.3553785
## F3D144          3.066893         7.815491                  6        0.3135287
## F3D145          3.108397        10.265899                  6        0.2482799
## F3D146          3.614385        14.274466                  8        0.3298440
## F3D147          3.339392        14.687532                  6        0.2132134
## F3D148          3.435338        14.040249                  7        0.2454306
## F3D149          3.633402        16.580009                  8        0.2597679
## F3D150          3.551333        12.314447                  8        0.3416144
## F3D2            3.550710        18.859482                  6        0.2410682
## F3D3            3.024484         9.826275                  5        0.2430211
## F3D5            3.730210        14.243093                  9        0.3807388
## F3D6            3.395656        13.767017                  6        0.2752713
## F3D7            2.965860         9.152918                  5        0.2539099
## F3D8            3.685304        16.216649                  8        0.3264930
## F3D9            3.755616        17.681589                  8        0.3140676
##        evenness_pielou evenness_simpson evenness_evar evenness_bulla
## F3D0         0.8360499        0.2802247     0.3897545      0.5241681
## F3D1         0.8699530        0.3540734     0.4190817      0.5753430
## F3D141       0.8051076        0.2834467     0.3541053      0.4506402
## F3D142       0.8111959        0.3517532     0.3098635      0.4712513
## F3D143       0.8329056        0.3647315     0.3993937      0.4979298
## F3D144       0.7922328        0.3259638     0.2872649      0.4349288
## F3D145       0.7446356        0.2446573     0.2375730      0.3602024
## F3D146       0.8248204        0.2896541     0.4084409      0.4796167
## F3D147       0.7251398        0.1767047     0.2718871      0.3317670
## F3D148       0.7597298        0.2212891     0.2844793      0.3685461
## F3D149       0.7775594        0.2180405     0.3070244      0.4001504
## F3D150       0.8303976        0.3060255     0.4252157      0.4987970
## F3D2         0.7317987        0.1112563     0.3100413      0.4040083
## F3D3         0.7328291        0.2060508     0.2859279      0.3663371
## F3D5         0.8537028        0.3368684     0.4427727      0.5535557
## F3D6         0.7643309        0.1893601     0.4014008      0.4165620
## F3D7         0.7367952        0.2223660     0.3044753      0.3723305
## F3D8         0.8169850        0.2414999     0.4348843      0.4997712
## F3D9         0.8103206        0.2281325     0.4133750      0.4855633
##        dominance_dbp dominance_dmn dominance_absolute dominance_relative
## F3D0      0.08851195     0.1612953                574         0.08851195
## F3D1      0.08244681     0.1540507                403         0.08244681
## F3D141    0.10252170     0.1940885                496         0.10252170
## F3D142    0.12257282     0.2382686                303         0.12257282
## F3D143    0.09322709     0.1852590                234         0.09322709
## F3D144    0.11558621     0.2143448                419         0.11558621
## F3D145    0.11232639     0.2121528                647         0.11232639
## F3D146    0.10095780     0.1827595                390         0.10095780
## F3D147    0.11298457     0.2045164               1501         0.11298457
## F3D148    0.08861532     0.1762132                871         0.08861532
## F3D149    0.08514889     0.1692513                895         0.08514889
## F3D150    0.10894118     0.2002353                463         0.10894118
## F3D2      0.20909200     0.3036655               3491         0.20909200
## F3D3      0.18227332     0.2938995                983         0.18227332
## F3D5      0.08908441     0.1646962                324         0.08908441
## F3D6      0.15355465     0.2557223               1013         0.15355465
## F3D7      0.15553412     0.2761032                645         0.15553412
## F3D8      0.12533937     0.2045249                554         0.12533937
## F3D9      0.09946417     0.1846952                594         0.09946417
##        dominance_simpson dominance_core_abundance dominance_gini
## F3D0          0.03533233                0.7822668      0.8317814
## F3D1          0.02972919                0.7172668      0.8176164
## F3D141        0.04900000                0.9096734      0.8927575
## F3D142        0.06180222                0.9538835      0.9255230
## F3D143        0.05272583                0.9294821      0.9093803
## F3D144        0.06391302                0.9544828      0.9280702
## F3D145        0.06288231                0.9366319      0.9239413
## F3D146        0.04315493                0.8480456      0.8689525
## F3D147        0.05659158                0.9075649      0.8986609
## F3D148        0.04911931                0.9119951      0.8920291
## F3D149        0.04286264                0.8665208      0.8668117
## F3D150        0.04538475                0.8687059      0.8789076
## F3D2          0.07022077                0.8069598      0.8539834
## F3D3          0.07827699                0.9404784      0.9275789
## F3D5          0.03757618                0.7951608      0.8541172
## F3D6          0.06212876                0.8450811      0.8850364
## F3D7          0.08030517                0.9266940      0.9315458
## F3D8          0.04550316                0.7635747      0.8528341
## F3D9          0.04255744                0.7436370      0.8408481
##        rarity_log_modulo_skewness rarity_low_abundance rarity_rare_abundance
## F3D0                     2.041833          0.031765613           0.088357749
## F3D1                     2.031534          0.021890344           0.046849427
## F3D141                   2.039782          0.020876395           0.010954940
## F3D142                   2.005921          0.003640777           0.010517799
## F3D143                   2.037479          0.005577689           0.006374502
## F3D144                   2.018215          0.010206897           0.005517241
## F3D145                   2.030860          0.028645833           0.012673611
## F3D146                   2.046630          0.012943308           0.040383122
## F3D147                   2.046947          0.055024464           0.022581859
## F3D148                   2.045807          0.054125547           0.022586224
## F3D149                   2.045309          0.048615736           0.041004662
## F3D150                   2.040085          0.012000000           0.029882353
## F3D2                     1.994599          0.049173455           0.053066603
## F3D3                     2.046069          0.022621917           0.017059151
## F3D5                     2.050813          0.013747594           0.032444322
## F3D6                     2.053307          0.028800970           0.031832651
## F3D7                     2.048982          0.019049916           0.008680974
## F3D8                     2.055754          0.018325792           0.063122172
## F3D9                     2.050676          0.034829203           0.074514401

sample_data_with_alpha <-
        phyloseq %>%
        sample_data() %>%
        merge(., #Merge function will do the same thing as vloopup at excel
              microbiome::alpha(phyloseq),
              by = 0) %>%
        column_to_rownames("Row.names") %>%
        sample_data

## Observed richness

## Other forms of richness

## Diversity

## Evenness

## Dominance

## Rarity

sample_data(phyloseq) <- sample_data_with_alpha
sample_data_with_alpha

##        Subject Gender Day  When observed chao1 diversity_inverse_simpson
## F3D0         3      F   0 Early      101   101                  28.30269
## F3D1         3      F   1 Early       95    95                  33.63697
## F3D141       3      F 141  Late       72    72                  20.40816
## F3D142       3      F 142  Late       46    46                  16.18065
## F3D143       3      F 143  Late       52    52                  18.96604
## F3D144       3      F 144  Late       48    48                  15.64626
## F3D145       3      F 145  Late       65    65                  15.90272
## F3D146       3      F 146  Late       80    80                  23.17233
## F3D147       3      F 147  Late      100   100                  17.67047
## F3D148       3      F 148  Late       92    92                  20.35859
## F3D149       3      F 149  Late      107   107                  23.33034
## F3D150       3      F 150  Late       72    72                  22.03383
## F3D2         3      F   2 Early      128   128                  14.24080
## F3D3         3      F   3 Early       62    62                  12.77515
## F3D5         3      F   5 Early       79    79                  26.61261
## F3D6         3      F   6 Early       85    85                  16.09561
## F3D7         3      F   7 Early       56    56                  12.45250
## F3D8         3      F   8 Early       91    91                  21.97649
## F3D9         3      F   9 Early      103   103                  23.49765
##        diversity_gini_simpson diversity_shannon diversity_fisher
## F3D0                0.9646677          3.858471        16.981200
## F3D1                0.9702708          3.961659        16.721685
## F3D141              0.9510000          3.443176        11.995626
## F3D142              0.9381978          3.105778         8.022685
## F3D143              0.9472742          3.291013         9.279171
## F3D144              0.9360870          3.066893         7.815491
## F3D145              0.9371177          3.108397        10.265899
## F3D146              0.9568451          3.614385        14.274466
## F3D147              0.9434084          3.339392        14.687532
## F3D148              0.9508807          3.435338        14.040249
## F3D149              0.9571374          3.633402        16.580009
## F3D150              0.9546153          3.551333        12.314447
## F3D2                0.9297792          3.550710        18.859482
## F3D3                0.9217230          3.024484         9.826275
## F3D5                0.9624238          3.730210        14.243093
## F3D6                0.9378712          3.395656        13.767017
## F3D7                0.9196948          2.965860         9.152918
## F3D8                0.9544968          3.685304        16.216649
## F3D9                0.9574426          3.755616        17.681589
##        diversity_coverage evenness_camargo evenness_pielou evenness_simpson
## F3D0                   10        0.3427514       0.8360499        0.2802247
## F3D1                   13        0.4009949       0.8699530        0.3540734
## F3D141                  7        0.3073986       0.8051076        0.2834467
## F3D142                  6        0.3398586       0.8111959        0.3517532
## F3D143                  7        0.3553785       0.8329056        0.3647315
## F3D144                  6        0.3135287       0.7922328        0.3259638
## F3D145                  6        0.2482799       0.7446356        0.2446573
## F3D146                  8        0.3298440       0.8248204        0.2896541
## F3D147                  6        0.2132134       0.7251398        0.1767047
## F3D148                  7        0.2454306       0.7597298        0.2212891
## F3D149                  8        0.2597679       0.7775594        0.2180405
## F3D150                  8        0.3416144       0.8303976        0.3060255
## F3D2                    6        0.2410682       0.7317987        0.1112563
## F3D3                    5        0.2430211       0.7328291        0.2060508
## F3D5                    9        0.3807388       0.8537028        0.3368684
## F3D6                    6        0.2752713       0.7643309        0.1893601
## F3D7                    5        0.2539099       0.7367952        0.2223660
## F3D8                    8        0.3264930       0.8169850        0.2414999
## F3D9                    8        0.3140676       0.8103206        0.2281325
##        evenness_evar evenness_bulla dominance_dbp dominance_dmn
## F3D0       0.3897545      0.5241681    0.08851195     0.1612953
## F3D1       0.4190817      0.5753430    0.08244681     0.1540507
## F3D141     0.3541053      0.4506402    0.10252170     0.1940885
## F3D142     0.3098635      0.4712513    0.12257282     0.2382686
## F3D143     0.3993937      0.4979298    0.09322709     0.1852590
## F3D144     0.2872649      0.4349288    0.11558621     0.2143448
## F3D145     0.2375730      0.3602024    0.11232639     0.2121528
## F3D146     0.4084409      0.4796167    0.10095780     0.1827595
## F3D147     0.2718871      0.3317670    0.11298457     0.2045164
## F3D148     0.2844793      0.3685461    0.08861532     0.1762132
## F3D149     0.3070244      0.4001504    0.08514889     0.1692513
## F3D150     0.4252157      0.4987970    0.10894118     0.2002353
## F3D2       0.3100413      0.4040083    0.20909200     0.3036655
## F3D3       0.2859279      0.3663371    0.18227332     0.2938995
## F3D5       0.4427727      0.5535557    0.08908441     0.1646962
## F3D6       0.4014008      0.4165620    0.15355465     0.2557223
## F3D7       0.3044753      0.3723305    0.15553412     0.2761032
## F3D8       0.4348843      0.4997712    0.12533937     0.2045249
## F3D9       0.4133750      0.4855633    0.09946417     0.1846952
##        dominance_absolute dominance_relative dominance_simpson
## F3D0                  574         0.08851195        0.03533233
## F3D1                  403         0.08244681        0.02972919
## F3D141                496         0.10252170        0.04900000
## F3D142                303         0.12257282        0.06180222
## F3D143                234         0.09322709        0.05272583
## F3D144                419         0.11558621        0.06391302
## F3D145                647         0.11232639        0.06288231
## F3D146                390         0.10095780        0.04315493
## F3D147               1501         0.11298457        0.05659158
## F3D148                871         0.08861532        0.04911931
## F3D149                895         0.08514889        0.04286264
## F3D150                463         0.10894118        0.04538475
## F3D2                 3491         0.20909200        0.07022077
## F3D3                  983         0.18227332        0.07827699
## F3D5                  324         0.08908441        0.03757618
## F3D6                 1013         0.15355465        0.06212876
## F3D7                  645         0.15553412        0.08030517
## F3D8                  554         0.12533937        0.04550316
## F3D9                  594         0.09946417        0.04255744
##        dominance_core_abundance dominance_gini rarity_log_modulo_skewness
## F3D0                  0.7822668      0.8317814                   2.041833
## F3D1                  0.7172668      0.8176164                   2.031534
## F3D141                0.9096734      0.8927575                   2.039782
## F3D142                0.9538835      0.9255230                   2.005921
## F3D143                0.9294821      0.9093803                   2.037479
## F3D144                0.9544828      0.9280702                   2.018215
## F3D145                0.9366319      0.9239413                   2.030860
## F3D146                0.8480456      0.8689525                   2.046630
## F3D147                0.9075649      0.8986609                   2.046947
## F3D148                0.9119951      0.8920291                   2.045807
## F3D149                0.8665208      0.8668117                   2.045309
## F3D150                0.8687059      0.8789076                   2.040085
## F3D2                  0.8069598      0.8539834                   1.994599
## F3D3                  0.9404784      0.9275789                   2.046069
## F3D5                  0.7951608      0.8541172                   2.050813
## F3D6                  0.8450811      0.8850364                   2.053307
## F3D7                  0.9266940      0.9315458                   2.048982
## F3D8                  0.7635747      0.8528341                   2.055754
## F3D9                  0.7436370      0.8408481                   2.050676
##        rarity_low_abundance rarity_rare_abundance
## F3D0            0.031765613           0.088357749
## F3D1            0.021890344           0.046849427
## F3D141          0.020876395           0.010954940
## F3D142          0.003640777           0.010517799
## F3D143          0.005577689           0.006374502
## F3D144          0.010206897           0.005517241
## F3D145          0.028645833           0.012673611
## F3D146          0.012943308           0.040383122
## F3D147          0.055024464           0.022581859
## F3D148          0.054125547           0.022586224
## F3D149          0.048615736           0.041004662
## F3D150          0.012000000           0.029882353
## F3D2            0.049173455           0.053066603
## F3D3            0.022621917           0.017059151
## F3D5            0.013747594           0.032444322
## F3D6            0.028800970           0.031832651
## F3D7            0.019049916           0.008680974
## F3D8            0.018325792           0.063122172
## F3D9            0.034829203           0.074514401

Alpha diversity boxplots

sample_data_with_alpha %>%
        ggplot(aes(x = When, y = observed)) +
        geom_boxplot() +
        xlab("Genus richness") +
        theme_classic(base_size = 15)

Statistical tests on alpha diversity (linear model)

sample_data_with_alpha %>%
        data.frame() %>% 
        lm(data = ., observed ~ When) %>%
        summary

## 
## Call:
## lm(formula = observed ~ When, data = .)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -32.89 -15.64  -1.40  13.11  39.11 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   88.889      7.226  12.301 6.88e-10 ***
## WhenLate     -15.489      9.961  -1.555    0.138    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.68 on 17 degrees of freedom
## Multiple R-squared:  0.1245, Adjusted R-squared:  0.07303 
## F-statistic: 2.418 on 1 and 17 DF,  p-value: 0.1384

The difference between those two group - is not statistically significant

How about other outcomes?

sample_data_with_alpha %>%
        data.frame() %>% 
        lm(data = ., observed ~ evenness_simpson) %>%
        summary

## 
## Call:
## lm(formula = observed ~ evenness_simpson, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.133 -13.320   3.044  14.488  32.319 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        131.10      17.17   7.637 6.83e-07 ***
## evenness_simpson  -193.25      63.79  -3.030  0.00756 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.67 on 17 degrees of freedom
## Multiple R-squared:  0.3506, Adjusted R-squared:  0.3124 
## F-statistic: 9.178 on 1 and 17 DF,  p-value: 0.007562

The evenness difference between those two group - is statistically significant

sample_data_with_alpha %>%
        data.frame() %>% 
        lm(data = ., observed ~ dominance_core_abundance) %>%
        summary

## 
## Call:
## lm(formula = observed ~ dominance_core_abundance, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.214 -12.941  -2.964   2.150  35.648 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                257.88      45.21   5.704 2.58e-05 ***
## dominance_core_abundance  -205.13      52.16  -3.932  0.00107 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.77 on 17 degrees of freedom
## Multiple R-squared:  0.4763, Adjusted R-squared:  0.4455 
## F-statistic: 15.46 on 1 and 17 DF,  p-value: 0.001073

Core abundance was also statistically different.

How can we say this difference is not because of deeper sequencing?

sample_data_with_alpha$final_reads <- phyloseq %>% sample_sums

sample_data_with_alpha %>%
        data.frame() %>% 
        ggplot(aes(x = final_reads, y = observed)) +
        geom_point()

Linear model of this association

(As deeper sequencing identified more taxa)

lm_eqn <- function(m){
    eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2, 
         list(a = format(unname(coef(m)[1]), digits = 2),
              b = format(unname(coef(m)[2]), digits = 2),
             r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq));
}


ggplot(data = sample_data_with_alpha, aes(x = final_reads, y = observed)) +
        geom_smooth(method = "lm", 
                    se = T, 
                    color = "red", 
                    linetype = "dashed"
                    ) +  # Plot regression slope
        geom_point() +
        theme_classic( base_family = "serif", base_size = 20) +
        geom_label(x = 12000, y = 60,
                  label = lm_eqn(lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads)),
                  parse = TRUE, 
                  family='serif',
                  size = 8, label.size = 0, color = "red")#, fill = alpha(c("white"),1))

## `geom_smooth()` using formula = 'y ~ x'

lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads) %>% summary

## 
## Call:
## lm(formula = observed ~ final_reads, data = data.frame(sample_data_with_alpha))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -20.443 -13.328  -1.137  10.476  23.659 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.161e+01  6.672e+00   7.736 5.75e-07 ***
## final_reads 4.643e-03  9.189e-04   5.053 9.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.65 on 17 degrees of freedom
## Multiple R-squared:  0.6003, Adjusted R-squared:  0.5768 
## F-statistic: 25.53 on 1 and 17 DF,  p-value: 9.805e-05

There is significant correlation !

ggplot(data = sample_data_with_alpha, aes(x = final_reads, y = observed)) +
        geom_smooth(method = "lm", 
                    se = T, 
                    color = "red", 
                    linetype = "dashed"
                    ) +  # Plot regression slope
        geom_point(aes(col = When)) +
        theme_classic( base_family = "serif", base_size = 20) +
        geom_label(x = 12000, y = 60,
                  label = lm_eqn(lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads)),
                  parse = TRUE, 
                  family='serif',
                  size = 8, label.size = 0, color = "red")#, fill = alpha(c("white"),1))

## `geom_smooth()` using formula = 'y ~ x'

Even for evenness, this association seem to have a significant term in general.

Multiple linear regression

lm(data = data.frame(sample_data_with_alpha),
   evenness_simpson ~ final_reads + When) %>% summary

## 
## Call:
## lm(formula = evenness_simpson ~ final_reads + When, data = data.frame(sample_data_with_alpha))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.052392 -0.025102  0.001077  0.019681  0.090052 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.348e-01  2.117e-02  15.818 3.44e-11 ***
## final_reads -1.449e-05  2.530e-06  -5.728 3.11e-05 ***
## WhenLate     3.168e-02  1.853e-02   1.710    0.107    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04027 on 16 degrees of freedom
## Multiple R-squared:  0.6972, Adjusted R-squared:  0.6593 
## F-statistic: 18.42 on 2 and 16 DF,  p-value: 7.075e-05

The association we ‘obsereved’ was not because of experimental condition, but due to different final reads.

glm(data = data.frame(sample_data_with_alpha),
   as.factor(When) ~ final_reads, family = "binomial") %>% summary

## 
## Call:
## glm(formula = as.factor(When) ~ final_reads, family = "binomial", 
##     data = data.frame(sample_data_with_alpha))
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  2.820e-01  9.144e-01   0.308    0.758
## final_reads -2.813e-05  1.259e-04  -0.223    0.823
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.287  on 18  degrees of freedom
## Residual deviance: 26.237  on 17  degrees of freedom
## AIC: 30.237
## 
## Number of Fisher Scoring iterations: 3

However, the change of dominance was due to experimental condition.

So, the TRUE change between the samples from early and late phase, was Dominance.

How would it change the association for beta diversity?

Ordination

Principal Coordinates Analysis with Bray-Curtis dist

phyloseq_rel <- phyloseq %>% transform_sample_counts(., function(x) {x/sum(x)})

sample_data(phyloseq_rel) <- sample_data_with_alpha

plot_ordination(physeq = phyloseq_rel,
                ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
                col = "When"
                ) +
        geom_point(size = 8)

These two groups seems different. Is it really different, by sequencing depth or the sampling phase?

plot_ordination(physeq = phyloseq_rel,
                ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
                col = "final_reads"
                ) +
        geom_point(size = 8) +
        scale_color_gradient(low = "blue", high = "red")

Symbols and colors at the sample time

plot_ordination(physeq = phyloseq_rel,
                ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
                shape = "When",
                col = "final_reads"
                ) +
        geom_point(size = 8) +
        scale_color_gradient(low = "blue", high = "red") +
        stat_ellipse(type = "norm", linetype = 2) +
        stat_ellipse(type = "t") +
        theme_classic()

## Warning: The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

        #scale_color_brewer(type = "qual")

PERMANOVA

As betadiversity indices have multiple distances at the same time, the calculation must accompany multiple analyses, at a random order (permutation).

This is permutational anova (PERMANOVA).

Univariate permutational anova (PERMANOVA).

Based on sampling point

set.seed(1)

adonis(formula = distance(phyloseq, method = "bray") ~ When,
       data = data.frame(sample_data_with_alpha),
       permutations = 10000) %>% 
        .$aov.tab

## Warning in adonis(formula = distance(phyloseq, method = "bray") ~ When, : 'adonis' is deprecated.
## Use 'adonis2' instead.
## See help("Deprecated") and help("vegan-deprecated").

## Permutation: free
## Number of permutations: 10000
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## When       1   0.50664 0.50664  5.6658 0.24997  4e-04 ***
## Residuals 17   1.52013 0.08942         0.75003           
## Total     18   2.02676                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There was difference. R-squared shows that the 25% of variance in beta-diversity was explained by When variable.

Multivariate anlaysis

adonis(formula = distance(phyloseq, method = "bray") ~ final_reads + When,
       data = data.frame(sample_data_with_alpha),
       permutations = 10000) %>% 
        .$aov.tab

## Warning in adonis(formula = distance(phyloseq, method = "bray") ~ final_reads + : 'adonis' is deprecated.
## Use 'adonis2' instead.
## See help("Deprecated") and help("vegan-deprecated").

## Permutation: free
## Number of permutations: 10000
## 
## Terms added sequentially (first to last)
## 
##             Df SumsOfSqs MeanSqs F.Model      R2    Pr(>F)    
## final_reads  1   0.65460 0.65460 12.0304 0.32298 9.999e-05 ***
## When         1   0.50158 0.50158  9.2182 0.24748 9.999e-05 ***
## Residuals   16   0.87059 0.05441         0.42955              
## Total       18   2.02676                 1.00000              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There was difference. R-squared shows that the 24% of variance in beta-diversity was explained by When variable, and 32% was explained by Fian reads.

–> 1% of variance in beta-diversity was ADJUSTED by sequencing depth.

Bibliography

## Computing. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also 'citation("pkgname")' for citing R packages.
## version 1.4.3, <https://CRAN.R-project.org/package=readxl>.
## graphics of microbiome census data. Paul J. McMurdie and Susan Holmes (2013) PLoS ONE 8(4):e61217.
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
## J, reikoch, Beasley W, O'Connor B, Warnes GR, Quinn M, Kamvar ZN, Gao C (2024). yaml: Methods to Convert R Data to YAML and Back_. R package version 2.3.10, <https://CRAN.R-project.org/package=yaml>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.