This analysis pipe noted https://github.com/minsiksudo/BTE3207_Advanced_Biostatistics.
Roading the data
from previous lecture.. we generated a phyloseq object. it can be saved to hard disk using the below command.
saveRDS(ps, "phyloseq_example.rds")
And it can be loaded to a new R session like below!
getwd()
## [1] "/Users/minsikkim/Dropbox (Personal)"
library(phyloseq)
library(tidyverse)
library(microbiome)
##
## microbiome R package (microbiome.github.com)
##
##
##
## Copyright (C) 2011-2022 Leo Lahti,
## Sudarshan Shetty et al. <microbiome.github.io>
##
## Attaching package: 'microbiome'
## The following object is masked from 'package:ggplot2':
##
## alpha
## The following object is masked from 'package:base':
##
## transform
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-8
##
## Attaching package: 'vegan'
## The following object is masked from 'package:microbiome':
##
## diversity
library(ggplot2)
#source("https://raw.githubusercontent.com/joey711/phyloseq/master/inst/scripts/installer.R",
# local = TRUE)
phyloseq <- readRDS("Inha/5_Lectures/2024/Advanced biostatistics/scripts/BTE3207_Advanced_Biostatistics/phyloseq_example.rds")
phyloseq %>% sample_data
## Subject Gender Day When
## F3D0 3 F 0 Early
## F3D1 3 F 1 Early
## F3D141 3 F 141 Late
## F3D142 3 F 142 Late
## F3D143 3 F 143 Late
## F3D144 3 F 144 Late
## F3D145 3 F 145 Late
## F3D146 3 F 146 Late
## F3D147 3 F 147 Late
## F3D148 3 F 148 Late
## F3D149 3 F 149 Late
## F3D150 3 F 150 Late
## F3D2 3 F 2 Early
## F3D3 3 F 3 Early
## F3D5 3 F 5 Early
## F3D6 3 F 6 Early
## F3D7 3 F 7 Early
## F3D8 3 F 8 Early
## F3D9 3 F 9 Early
Basics of statistical tests
MPG dataset
MPG dataset is basic data installed in ggplot
head(mpg)
## # A tibble: 6 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
Double-check the data type.
str(mpg)
## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
t.test fucntion will conduct basic t-test
Paired t-test
t.test(mpg$cty, mpg$hwy, paired = T)
##
## Paired t-test
##
## data: mpg$cty and mpg$hwy
## t = -44.492, df = 233, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -6.872628 -6.289765
## sample estimates:
## mean difference
## -6.581197
Unpaired t-test
mpg$class
## [1] "compact" "compact" "compact" "compact" "compact"
## [6] "compact" "compact" "compact" "compact" "compact"
## [11] "compact" "compact" "compact" "compact" "compact"
## [16] "midsize" "midsize" "midsize" "suv" "suv"
## [21] "suv" "suv" "suv" "2seater" "2seater"
## [26] "2seater" "2seater" "2seater" "suv" "suv"
## [31] "suv" "suv" "midsize" "midsize" "midsize"
## [36] "midsize" "midsize" "minivan" "minivan" "minivan"
## [41] "minivan" "minivan" "minivan" "minivan" "minivan"
## [46] "minivan" "minivan" "minivan" "pickup" "pickup"
## [51] "pickup" "pickup" "pickup" "pickup" "pickup"
## [56] "pickup" "pickup" "suv" "suv" "suv"
## [61] "suv" "suv" "suv" "suv" "pickup"
## [66] "pickup" "pickup" "pickup" "pickup" "pickup"
## [71] "pickup" "pickup" "pickup" "pickup" "suv"
## [76] "suv" "suv" "suv" "suv" "suv"
## [81] "suv" "suv" "suv" "pickup" "pickup"
## [86] "pickup" "pickup" "pickup" "pickup" "pickup"
## [91] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [96] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [101] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [106] "subcompact" "subcompact" "subcompact" "midsize" "midsize"
## [111] "midsize" "midsize" "midsize" "midsize" "midsize"
## [116] "subcompact" "subcompact" "subcompact" "subcompact" "subcompact"
## [121] "subcompact" "subcompact" "suv" "suv" "suv"
## [126] "suv" "suv" "suv" "suv" "suv"
## [131] "suv" "suv" "suv" "suv" "suv"
## [136] "suv" "suv" "suv" "suv" "suv"
## [141] "suv" "compact" "compact" "midsize" "midsize"
## [146] "midsize" "midsize" "midsize" "midsize" "midsize"
## [151] "suv" "suv" "suv" "suv" "midsize"
## [156] "midsize" "midsize" "midsize" "midsize" "suv"
## [161] "suv" "suv" "suv" "suv" "suv"
## [166] "subcompact" "subcompact" "subcompact" "subcompact" "compact"
## [171] "compact" "compact" "compact" "suv" "suv"
## [176] "suv" "suv" "suv" "suv" "midsize"
## [181] "midsize" "midsize" "midsize" "midsize" "midsize"
## [186] "midsize" "compact" "compact" "compact" "compact"
## [191] "compact" "compact" "compact" "compact" "compact"
## [196] "compact" "compact" "compact" "suv" "suv"
## [201] "pickup" "pickup" "pickup" "pickup" "pickup"
## [206] "pickup" "pickup" "compact" "compact" "compact"
## [211] "compact" "compact" "compact" "compact" "compact"
## [216] "compact" "compact" "compact" "compact" "compact"
## [221] "compact" "subcompact" "subcompact" "subcompact" "subcompact"
## [226] "subcompact" "subcompact" "midsize" "midsize" "midsize"
## [231] "midsize" "midsize" "midsize" "midsize"
mpg %>% filter(class == "2seater")%>% .$cty
## [1] 16 15 16 15 15
city_mpg_2seater <- mpg %>% filter(class == "2seater") %>% .$cty
city_mpg_midsize <- mpg %>% filter(class == "midsize") %>% .$cty
city_mpg_2seater
## [1] 16 15 16 15 15
city_mpg_midsize
## [1] 15 17 16 19 22 18 18 17 18 18 21 21 18 18 19 23 23 19 19 18 19 19 18 16 17
## [26] 18 16 21 21 21 21 18 18 19 21 18 19 21 16 18 17
t.test(city_mpg_2seater,
city_mpg_midsize,
paired = F,
var.equal = F)
##
## Welch Two Sample t-test
##
## data: city_mpg_2seater and city_mpg_midsize
## t = -8.5965, df = 20.862, p-value = 2.69e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.168314 -2.543882
## sample estimates:
## mean of x mean of y
## 15.4000 18.7561
Linear model
Univariate linear model
Boxplot of categorical x, continuous y
boxplot(data = mpg, cty ~ class)
Linear model
lm(data = mpg,
cty ~ class)
##
## Call:
## lm(formula = cty ~ class, data = mpg)
##
## Coefficients:
## (Intercept) classcompact classmidsize classminivan
## 15.4000 4.7277 3.3561 0.4182
## classpickup classsubcompact classsuv
## -2.4000 4.9714 -1.9000
Linear model - the output can be stored
lm_result_mpg <-
lm(data = mpg,
cty ~ class)
lm_result_mpg
##
## Call:
## lm(formula = cty ~ class, data = mpg)
##
## Coefficients:
## (Intercept) classcompact classmidsize classminivan
## 15.4000 4.7277 3.3561 0.4182
## classpickup classsubcompact classsuv
## -2.4000 4.9714 -1.9000
The result of lm is a list having multiple caclulated factors
lm_result_mpg %>% names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
Summary of a linear model
summary(lm_result_mpg)
##
## Call:
## lm(formula = cty ~ class, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3714 -1.7561 -0.1277 1.0000 14.6286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.4000 1.3024 11.824 < 2e-16 ***
## classcompact 4.7277 1.3699 3.451 0.000666 ***
## classmidsize 3.3561 1.3796 2.433 0.015759 *
## classminivan 0.4182 1.5708 0.266 0.790307
## classpickup -2.4000 1.3976 -1.717 0.087304 .
## classsubcompact 4.9714 1.3923 3.571 0.000435 ***
## classsuv -1.9000 1.3539 -1.403 0.161884
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.912 on 227 degrees of freedom
## Multiple R-squared: 0.5438, Adjusted R-squared: 0.5317
## F-statistic: 45.1 on 6 and 227 DF, p-value: < 2.2e-16
This result will show you how different data is. (by the factor label
of class
)
You have 1. Estimate (beta) 2. Standard errors 3. p-values 4. R-squared values
ANOVA
Analysis of variance for linear model can be done to this linear model result.
anova(lm_result_mpg)
## Analysis of Variance Table
##
## Response: cty
## Df Sum Sq Mean Sq F value Pr(>F)
## class 6 2295.0 382.51 45.099 < 2.2e-16 ***
## Residuals 227 1925.3 8.48
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How can we test different multiple associations?
ggplot(mpg,
aes(y = cty, x = class, fill = as.factor(year))) +
geom_boxplot()
Anova - Multivariate
lm(data = mpg,
formula = hwy ~ class + as.factor(year)) %>%
anova
## Analysis of Variance Table
##
## Response: hwy
## Df Sum Sq Mean Sq F value Pr(>F)
## class 6 5683.2 947.21 83.4157 <2e-16 ***
## as.factor(year) 1 12.1 12.15 1.0699 0.3021
## Residuals 226 2566.3 11.36
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linear model - Multivariate
Multiple linear regression can be don by just adding a new variable to formula
lm(data = mpg,
formula = hwy ~ class * as.factor(year)) %>%
summary
##
## Call:
## lm(formula = hwy ~ class * as.factor(year), data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.000 -1.812 -0.500 1.341 16.080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.5000 2.3822 10.285 < 2e-16 ***
## classcompact 3.4200 2.4756 1.381 0.16854
## classmidsize 2.0000 2.4984 0.800 0.42429
## classminivan -2.0000 2.7507 -0.727 0.46795
## classpickup -7.6875 2.5267 -3.043 0.00263 **
## classsubcompact 4.5000 2.5044 1.797 0.07373 .
## classsuv -6.9483 2.4629 -2.821 0.00522 **
## as.factor(year)2008 0.5000 3.0754 0.163 0.87100
## classcompact:as.factor(year)2008 0.3073 3.2292 0.095 0.92428
## classmidsize:as.factor(year)2008 1.0476 3.2505 0.322 0.74754
## classminivan:as.factor(year)2008 -0.8000 3.6904 -0.217 0.82858
## classpickup:as.factor(year)2008 -0.3713 3.2916 -0.113 0.91029
## classsubcompact:as.factor(year)2008 -2.3750 3.2809 -0.724 0.46991
## classsuv:as.factor(year)2008 0.5846 3.1927 0.183 0.85487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.369 on 220 degrees of freedom
## Multiple R-squared: 0.6978, Adjusted R-squared: 0.6799
## F-statistic: 39.07 on 13 and 220 DF, p-value: < 2.2e-16
Statistical tests for alpha diversity
microbiome::alpha(phyloseq)
## Observed richness
## Other forms of richness
## Diversity
## Evenness
## Dominance
## Rarity
## observed chao1 diversity_inverse_simpson diversity_gini_simpson
## F3D0 101 101 28.30269 0.9646677
## F3D1 95 95 33.63697 0.9702708
## F3D141 72 72 20.40816 0.9510000
## F3D142 46 46 16.18065 0.9381978
## F3D143 52 52 18.96604 0.9472742
## F3D144 48 48 15.64626 0.9360870
## F3D145 65 65 15.90272 0.9371177
## F3D146 80 80 23.17233 0.9568451
## F3D147 100 100 17.67047 0.9434084
## F3D148 92 92 20.35859 0.9508807
## F3D149 107 107 23.33034 0.9571374
## F3D150 72 72 22.03383 0.9546153
## F3D2 128 128 14.24080 0.9297792
## F3D3 62 62 12.77515 0.9217230
## F3D5 79 79 26.61261 0.9624238
## F3D6 85 85 16.09561 0.9378712
## F3D7 56 56 12.45250 0.9196948
## F3D8 91 91 21.97649 0.9544968
## F3D9 103 103 23.49765 0.9574426
## diversity_shannon diversity_fisher diversity_coverage evenness_camargo
## F3D0 3.858471 16.981200 10 0.3427514
## F3D1 3.961659 16.721685 13 0.4009949
## F3D141 3.443176 11.995626 7 0.3073986
## F3D142 3.105778 8.022685 6 0.3398586
## F3D143 3.291013 9.279171 7 0.3553785
## F3D144 3.066893 7.815491 6 0.3135287
## F3D145 3.108397 10.265899 6 0.2482799
## F3D146 3.614385 14.274466 8 0.3298440
## F3D147 3.339392 14.687532 6 0.2132134
## F3D148 3.435338 14.040249 7 0.2454306
## F3D149 3.633402 16.580009 8 0.2597679
## F3D150 3.551333 12.314447 8 0.3416144
## F3D2 3.550710 18.859482 6 0.2410682
## F3D3 3.024484 9.826275 5 0.2430211
## F3D5 3.730210 14.243093 9 0.3807388
## F3D6 3.395656 13.767017 6 0.2752713
## F3D7 2.965860 9.152918 5 0.2539099
## F3D8 3.685304 16.216649 8 0.3264930
## F3D9 3.755616 17.681589 8 0.3140676
## evenness_pielou evenness_simpson evenness_evar evenness_bulla
## F3D0 0.8360499 0.2802247 0.3897545 0.5241681
## F3D1 0.8699530 0.3540734 0.4190817 0.5753430
## F3D141 0.8051076 0.2834467 0.3541053 0.4506402
## F3D142 0.8111959 0.3517532 0.3098635 0.4712513
## F3D143 0.8329056 0.3647315 0.3993937 0.4979298
## F3D144 0.7922328 0.3259638 0.2872649 0.4349288
## F3D145 0.7446356 0.2446573 0.2375730 0.3602024
## F3D146 0.8248204 0.2896541 0.4084409 0.4796167
## F3D147 0.7251398 0.1767047 0.2718871 0.3317670
## F3D148 0.7597298 0.2212891 0.2844793 0.3685461
## F3D149 0.7775594 0.2180405 0.3070244 0.4001504
## F3D150 0.8303976 0.3060255 0.4252157 0.4987970
## F3D2 0.7317987 0.1112563 0.3100413 0.4040083
## F3D3 0.7328291 0.2060508 0.2859279 0.3663371
## F3D5 0.8537028 0.3368684 0.4427727 0.5535557
## F3D6 0.7643309 0.1893601 0.4014008 0.4165620
## F3D7 0.7367952 0.2223660 0.3044753 0.3723305
## F3D8 0.8169850 0.2414999 0.4348843 0.4997712
## F3D9 0.8103206 0.2281325 0.4133750 0.4855633
## dominance_dbp dominance_dmn dominance_absolute dominance_relative
## F3D0 0.08851195 0.1612953 574 0.08851195
## F3D1 0.08244681 0.1540507 403 0.08244681
## F3D141 0.10252170 0.1940885 496 0.10252170
## F3D142 0.12257282 0.2382686 303 0.12257282
## F3D143 0.09322709 0.1852590 234 0.09322709
## F3D144 0.11558621 0.2143448 419 0.11558621
## F3D145 0.11232639 0.2121528 647 0.11232639
## F3D146 0.10095780 0.1827595 390 0.10095780
## F3D147 0.11298457 0.2045164 1501 0.11298457
## F3D148 0.08861532 0.1762132 871 0.08861532
## F3D149 0.08514889 0.1692513 895 0.08514889
## F3D150 0.10894118 0.2002353 463 0.10894118
## F3D2 0.20909200 0.3036655 3491 0.20909200
## F3D3 0.18227332 0.2938995 983 0.18227332
## F3D5 0.08908441 0.1646962 324 0.08908441
## F3D6 0.15355465 0.2557223 1013 0.15355465
## F3D7 0.15553412 0.2761032 645 0.15553412
## F3D8 0.12533937 0.2045249 554 0.12533937
## F3D9 0.09946417 0.1846952 594 0.09946417
## dominance_simpson dominance_core_abundance dominance_gini
## F3D0 0.03533233 0.7822668 0.8317814
## F3D1 0.02972919 0.7172668 0.8176164
## F3D141 0.04900000 0.9096734 0.8927575
## F3D142 0.06180222 0.9538835 0.9255230
## F3D143 0.05272583 0.9294821 0.9093803
## F3D144 0.06391302 0.9544828 0.9280702
## F3D145 0.06288231 0.9366319 0.9239413
## F3D146 0.04315493 0.8480456 0.8689525
## F3D147 0.05659158 0.9075649 0.8986609
## F3D148 0.04911931 0.9119951 0.8920291
## F3D149 0.04286264 0.8665208 0.8668117
## F3D150 0.04538475 0.8687059 0.8789076
## F3D2 0.07022077 0.8069598 0.8539834
## F3D3 0.07827699 0.9404784 0.9275789
## F3D5 0.03757618 0.7951608 0.8541172
## F3D6 0.06212876 0.8450811 0.8850364
## F3D7 0.08030517 0.9266940 0.9315458
## F3D8 0.04550316 0.7635747 0.8528341
## F3D9 0.04255744 0.7436370 0.8408481
## rarity_log_modulo_skewness rarity_low_abundance rarity_rare_abundance
## F3D0 2.041833 0.031765613 0.088357749
## F3D1 2.031534 0.021890344 0.046849427
## F3D141 2.039782 0.020876395 0.010954940
## F3D142 2.005921 0.003640777 0.010517799
## F3D143 2.037479 0.005577689 0.006374502
## F3D144 2.018215 0.010206897 0.005517241
## F3D145 2.030860 0.028645833 0.012673611
## F3D146 2.046630 0.012943308 0.040383122
## F3D147 2.046947 0.055024464 0.022581859
## F3D148 2.045807 0.054125547 0.022586224
## F3D149 2.045309 0.048615736 0.041004662
## F3D150 2.040085 0.012000000 0.029882353
## F3D2 1.994599 0.049173455 0.053066603
## F3D3 2.046069 0.022621917 0.017059151
## F3D5 2.050813 0.013747594 0.032444322
## F3D6 2.053307 0.028800970 0.031832651
## F3D7 2.048982 0.019049916 0.008680974
## F3D8 2.055754 0.018325792 0.063122172
## F3D9 2.050676 0.034829203 0.074514401
sample_data_with_alpha <-
phyloseq %>%
sample_data() %>%
merge(., #Merge function will do the same thing as vloopup at excel
microbiome::alpha(phyloseq),
by = 0) %>%
column_to_rownames("Row.names") %>%
sample_data
## Observed richness
## Other forms of richness
## Diversity
## Evenness
## Dominance
## Rarity
sample_data(phyloseq) <- sample_data_with_alpha
sample_data_with_alpha
## Subject Gender Day When observed chao1 diversity_inverse_simpson
## F3D0 3 F 0 Early 101 101 28.30269
## F3D1 3 F 1 Early 95 95 33.63697
## F3D141 3 F 141 Late 72 72 20.40816
## F3D142 3 F 142 Late 46 46 16.18065
## F3D143 3 F 143 Late 52 52 18.96604
## F3D144 3 F 144 Late 48 48 15.64626
## F3D145 3 F 145 Late 65 65 15.90272
## F3D146 3 F 146 Late 80 80 23.17233
## F3D147 3 F 147 Late 100 100 17.67047
## F3D148 3 F 148 Late 92 92 20.35859
## F3D149 3 F 149 Late 107 107 23.33034
## F3D150 3 F 150 Late 72 72 22.03383
## F3D2 3 F 2 Early 128 128 14.24080
## F3D3 3 F 3 Early 62 62 12.77515
## F3D5 3 F 5 Early 79 79 26.61261
## F3D6 3 F 6 Early 85 85 16.09561
## F3D7 3 F 7 Early 56 56 12.45250
## F3D8 3 F 8 Early 91 91 21.97649
## F3D9 3 F 9 Early 103 103 23.49765
## diversity_gini_simpson diversity_shannon diversity_fisher
## F3D0 0.9646677 3.858471 16.981200
## F3D1 0.9702708 3.961659 16.721685
## F3D141 0.9510000 3.443176 11.995626
## F3D142 0.9381978 3.105778 8.022685
## F3D143 0.9472742 3.291013 9.279171
## F3D144 0.9360870 3.066893 7.815491
## F3D145 0.9371177 3.108397 10.265899
## F3D146 0.9568451 3.614385 14.274466
## F3D147 0.9434084 3.339392 14.687532
## F3D148 0.9508807 3.435338 14.040249
## F3D149 0.9571374 3.633402 16.580009
## F3D150 0.9546153 3.551333 12.314447
## F3D2 0.9297792 3.550710 18.859482
## F3D3 0.9217230 3.024484 9.826275
## F3D5 0.9624238 3.730210 14.243093
## F3D6 0.9378712 3.395656 13.767017
## F3D7 0.9196948 2.965860 9.152918
## F3D8 0.9544968 3.685304 16.216649
## F3D9 0.9574426 3.755616 17.681589
## diversity_coverage evenness_camargo evenness_pielou evenness_simpson
## F3D0 10 0.3427514 0.8360499 0.2802247
## F3D1 13 0.4009949 0.8699530 0.3540734
## F3D141 7 0.3073986 0.8051076 0.2834467
## F3D142 6 0.3398586 0.8111959 0.3517532
## F3D143 7 0.3553785 0.8329056 0.3647315
## F3D144 6 0.3135287 0.7922328 0.3259638
## F3D145 6 0.2482799 0.7446356 0.2446573
## F3D146 8 0.3298440 0.8248204 0.2896541
## F3D147 6 0.2132134 0.7251398 0.1767047
## F3D148 7 0.2454306 0.7597298 0.2212891
## F3D149 8 0.2597679 0.7775594 0.2180405
## F3D150 8 0.3416144 0.8303976 0.3060255
## F3D2 6 0.2410682 0.7317987 0.1112563
## F3D3 5 0.2430211 0.7328291 0.2060508
## F3D5 9 0.3807388 0.8537028 0.3368684
## F3D6 6 0.2752713 0.7643309 0.1893601
## F3D7 5 0.2539099 0.7367952 0.2223660
## F3D8 8 0.3264930 0.8169850 0.2414999
## F3D9 8 0.3140676 0.8103206 0.2281325
## evenness_evar evenness_bulla dominance_dbp dominance_dmn
## F3D0 0.3897545 0.5241681 0.08851195 0.1612953
## F3D1 0.4190817 0.5753430 0.08244681 0.1540507
## F3D141 0.3541053 0.4506402 0.10252170 0.1940885
## F3D142 0.3098635 0.4712513 0.12257282 0.2382686
## F3D143 0.3993937 0.4979298 0.09322709 0.1852590
## F3D144 0.2872649 0.4349288 0.11558621 0.2143448
## F3D145 0.2375730 0.3602024 0.11232639 0.2121528
## F3D146 0.4084409 0.4796167 0.10095780 0.1827595
## F3D147 0.2718871 0.3317670 0.11298457 0.2045164
## F3D148 0.2844793 0.3685461 0.08861532 0.1762132
## F3D149 0.3070244 0.4001504 0.08514889 0.1692513
## F3D150 0.4252157 0.4987970 0.10894118 0.2002353
## F3D2 0.3100413 0.4040083 0.20909200 0.3036655
## F3D3 0.2859279 0.3663371 0.18227332 0.2938995
## F3D5 0.4427727 0.5535557 0.08908441 0.1646962
## F3D6 0.4014008 0.4165620 0.15355465 0.2557223
## F3D7 0.3044753 0.3723305 0.15553412 0.2761032
## F3D8 0.4348843 0.4997712 0.12533937 0.2045249
## F3D9 0.4133750 0.4855633 0.09946417 0.1846952
## dominance_absolute dominance_relative dominance_simpson
## F3D0 574 0.08851195 0.03533233
## F3D1 403 0.08244681 0.02972919
## F3D141 496 0.10252170 0.04900000
## F3D142 303 0.12257282 0.06180222
## F3D143 234 0.09322709 0.05272583
## F3D144 419 0.11558621 0.06391302
## F3D145 647 0.11232639 0.06288231
## F3D146 390 0.10095780 0.04315493
## F3D147 1501 0.11298457 0.05659158
## F3D148 871 0.08861532 0.04911931
## F3D149 895 0.08514889 0.04286264
## F3D150 463 0.10894118 0.04538475
## F3D2 3491 0.20909200 0.07022077
## F3D3 983 0.18227332 0.07827699
## F3D5 324 0.08908441 0.03757618
## F3D6 1013 0.15355465 0.06212876
## F3D7 645 0.15553412 0.08030517
## F3D8 554 0.12533937 0.04550316
## F3D9 594 0.09946417 0.04255744
## dominance_core_abundance dominance_gini rarity_log_modulo_skewness
## F3D0 0.7822668 0.8317814 2.041833
## F3D1 0.7172668 0.8176164 2.031534
## F3D141 0.9096734 0.8927575 2.039782
## F3D142 0.9538835 0.9255230 2.005921
## F3D143 0.9294821 0.9093803 2.037479
## F3D144 0.9544828 0.9280702 2.018215
## F3D145 0.9366319 0.9239413 2.030860
## F3D146 0.8480456 0.8689525 2.046630
## F3D147 0.9075649 0.8986609 2.046947
## F3D148 0.9119951 0.8920291 2.045807
## F3D149 0.8665208 0.8668117 2.045309
## F3D150 0.8687059 0.8789076 2.040085
## F3D2 0.8069598 0.8539834 1.994599
## F3D3 0.9404784 0.9275789 2.046069
## F3D5 0.7951608 0.8541172 2.050813
## F3D6 0.8450811 0.8850364 2.053307
## F3D7 0.9266940 0.9315458 2.048982
## F3D8 0.7635747 0.8528341 2.055754
## F3D9 0.7436370 0.8408481 2.050676
## rarity_low_abundance rarity_rare_abundance
## F3D0 0.031765613 0.088357749
## F3D1 0.021890344 0.046849427
## F3D141 0.020876395 0.010954940
## F3D142 0.003640777 0.010517799
## F3D143 0.005577689 0.006374502
## F3D144 0.010206897 0.005517241
## F3D145 0.028645833 0.012673611
## F3D146 0.012943308 0.040383122
## F3D147 0.055024464 0.022581859
## F3D148 0.054125547 0.022586224
## F3D149 0.048615736 0.041004662
## F3D150 0.012000000 0.029882353
## F3D2 0.049173455 0.053066603
## F3D3 0.022621917 0.017059151
## F3D5 0.013747594 0.032444322
## F3D6 0.028800970 0.031832651
## F3D7 0.019049916 0.008680974
## F3D8 0.018325792 0.063122172
## F3D9 0.034829203 0.074514401
Alpha diversity boxplots
sample_data_with_alpha %>%
ggplot(aes(x = When, y = observed)) +
geom_boxplot() +
xlab("Genus richness") +
theme_classic(base_size = 15)
Statistical tests on alpha diversity (linear model)
sample_data_with_alpha %>%
data.frame() %>%
lm(data = ., observed ~ When) %>%
summary
##
## Call:
## lm(formula = observed ~ When, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.89 -15.64 -1.40 13.11 39.11
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 88.889 7.226 12.301 6.88e-10 ***
## WhenLate -15.489 9.961 -1.555 0.138
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.68 on 17 degrees of freedom
## Multiple R-squared: 0.1245, Adjusted R-squared: 0.07303
## F-statistic: 2.418 on 1 and 17 DF, p-value: 0.1384
The difference between those two group - is not statistically significant
How about other outcomes?
sample_data_with_alpha %>%
data.frame() %>%
lm(data = ., observed ~ evenness_simpson) %>%
summary
##
## Call:
## lm(formula = observed ~ evenness_simpson, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.133 -13.320 3.044 14.488 32.319
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 131.10 17.17 7.637 6.83e-07 ***
## evenness_simpson -193.25 63.79 -3.030 0.00756 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.67 on 17 degrees of freedom
## Multiple R-squared: 0.3506, Adjusted R-squared: 0.3124
## F-statistic: 9.178 on 1 and 17 DF, p-value: 0.007562
The evenness difference between those two group - is statistically significant
sample_data_with_alpha %>%
data.frame() %>%
lm(data = ., observed ~ dominance_core_abundance) %>%
summary
##
## Call:
## lm(formula = observed ~ dominance_core_abundance, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.214 -12.941 -2.964 2.150 35.648
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 257.88 45.21 5.704 2.58e-05 ***
## dominance_core_abundance -205.13 52.16 -3.932 0.00107 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.77 on 17 degrees of freedom
## Multiple R-squared: 0.4763, Adjusted R-squared: 0.4455
## F-statistic: 15.46 on 1 and 17 DF, p-value: 0.001073
Core abundance was also statistically different.
How can we say this difference is not because of deeper sequencing?
sample_data_with_alpha$final_reads <- phyloseq %>% sample_sums
sample_data_with_alpha %>%
data.frame() %>%
ggplot(aes(x = final_reads, y = observed)) +
geom_point()
Linear model of this association
(As deeper sequencing identified more taxa)
lm_eqn <- function(m){
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(unname(coef(m)[1]), digits = 2),
b = format(unname(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
ggplot(data = sample_data_with_alpha, aes(x = final_reads, y = observed)) +
geom_smooth(method = "lm",
se = T,
color = "red",
linetype = "dashed"
) + # Plot regression slope
geom_point() +
theme_classic( base_family = "serif", base_size = 20) +
geom_label(x = 12000, y = 60,
label = lm_eqn(lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads)),
parse = TRUE,
family='serif',
size = 8, label.size = 0, color = "red")#, fill = alpha(c("white"),1))
## `geom_smooth()` using formula = 'y ~ x'
lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads) %>% summary
##
## Call:
## lm(formula = observed ~ final_reads, data = data.frame(sample_data_with_alpha))
##
## Residuals:
## Min 1Q Median 3Q Max
## -20.443 -13.328 -1.137 10.476 23.659
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.161e+01 6.672e+00 7.736 5.75e-07 ***
## final_reads 4.643e-03 9.189e-04 5.053 9.81e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.65 on 17 degrees of freedom
## Multiple R-squared: 0.6003, Adjusted R-squared: 0.5768
## F-statistic: 25.53 on 1 and 17 DF, p-value: 9.805e-05
There is significant correlation !
ggplot(data = sample_data_with_alpha, aes(x = final_reads, y = observed)) +
geom_smooth(method = "lm",
se = T,
color = "red",
linetype = "dashed"
) + # Plot regression slope
geom_point(aes(col = When)) +
theme_classic( base_family = "serif", base_size = 20) +
geom_label(x = 12000, y = 60,
label = lm_eqn(lm(data = data.frame(sample_data_with_alpha), observed ~ final_reads)),
parse = TRUE,
family='serif',
size = 8, label.size = 0, color = "red")#, fill = alpha(c("white"),1))
## `geom_smooth()` using formula = 'y ~ x'
Even for evenness, this association seem to have a significant term in general.
Multiple linear regression
lm(data = data.frame(sample_data_with_alpha),
evenness_simpson ~ final_reads + When) %>% summary
##
## Call:
## lm(formula = evenness_simpson ~ final_reads + When, data = data.frame(sample_data_with_alpha))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.052392 -0.025102 0.001077 0.019681 0.090052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.348e-01 2.117e-02 15.818 3.44e-11 ***
## final_reads -1.449e-05 2.530e-06 -5.728 3.11e-05 ***
## WhenLate 3.168e-02 1.853e-02 1.710 0.107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04027 on 16 degrees of freedom
## Multiple R-squared: 0.6972, Adjusted R-squared: 0.6593
## F-statistic: 18.42 on 2 and 16 DF, p-value: 7.075e-05
The association we ‘obsereved’ was not because of experimental condition, but due to different final reads.
glm(data = data.frame(sample_data_with_alpha),
as.factor(When) ~ final_reads, family = "binomial") %>% summary
##
## Call:
## glm(formula = as.factor(When) ~ final_reads, family = "binomial",
## data = data.frame(sample_data_with_alpha))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.820e-01 9.144e-01 0.308 0.758
## final_reads -2.813e-05 1.259e-04 -0.223 0.823
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 26.287 on 18 degrees of freedom
## Residual deviance: 26.237 on 17 degrees of freedom
## AIC: 30.237
##
## Number of Fisher Scoring iterations: 3
However, the change of dominance was due to experimental condition.
So, the TRUE change between the samples from early and late
phase, was Dominance
.
How would it change the association for beta diversity?
Ordination
Principal Coordinates Analysis with Bray-Curtis dist
phyloseq_rel <- phyloseq %>% transform_sample_counts(., function(x) {x/sum(x)})
sample_data(phyloseq_rel) <- sample_data_with_alpha
plot_ordination(physeq = phyloseq_rel,
ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
col = "When"
) +
geom_point(size = 8)
These two groups seems different. Is it really different, by sequencing depth or the sampling phase?
plot_ordination(physeq = phyloseq_rel,
ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
col = "final_reads"
) +
geom_point(size = 8) +
scale_color_gradient(low = "blue", high = "red")
Symbols and colors at the sample time
plot_ordination(physeq = phyloseq_rel,
ordination = ordinate(phyloseq_rel, "PCoA", "bray"),
shape = "When",
col = "final_reads"
) +
geom_point(size = 8) +
scale_color_gradient(low = "blue", high = "red") +
stat_ellipse(type = "norm", linetype = 2) +
stat_ellipse(type = "t") +
theme_classic()
## Warning: The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
#scale_color_brewer(type = "qual")
PERMANOVA
As betadiversity indices have multiple distances at the same time, the calculation must accompany multiple analyses, at a random order (permutation).
This is permutational anova (PERMANOVA).
Univariate permutational anova (PERMANOVA).
Based on sampling point
set.seed(1)
adonis(formula = distance(phyloseq, method = "bray") ~ When,
data = data.frame(sample_data_with_alpha),
permutations = 10000) %>%
.$aov.tab
## Warning in adonis(formula = distance(phyloseq, method = "bray") ~ When, : 'adonis' is deprecated.
## Use 'adonis2' instead.
## See help("Deprecated") and help("vegan-deprecated").
## Permutation: free
## Number of permutations: 10000
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## When 1 0.50664 0.50664 5.6658 0.24997 4e-04 ***
## Residuals 17 1.52013 0.08942 0.75003
## Total 18 2.02676 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There was difference. R-squared shows that the 25% of variance in
beta-diversity was explained by When
variable.
Multivariate anlaysis
adonis(formula = distance(phyloseq, method = "bray") ~ final_reads + When,
data = data.frame(sample_data_with_alpha),
permutations = 10000) %>%
.$aov.tab
## Warning in adonis(formula = distance(phyloseq, method = "bray") ~ final_reads + : 'adonis' is deprecated.
## Use 'adonis2' instead.
## See help("Deprecated") and help("vegan-deprecated").
## Permutation: free
## Number of permutations: 10000
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## final_reads 1 0.65460 0.65460 12.0304 0.32298 9.999e-05 ***
## When 1 0.50158 0.50158 9.2182 0.24748 9.999e-05 ***
## Residuals 16 0.87059 0.05441 0.42955
## Total 18 2.02676 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There was difference. R-squared shows that the 24% of variance in
beta-diversity was explained by When
variable, and 32% was
explained by Fian reads
.
–> 1% of variance in beta-diversity was ADJUSTED by sequencing depth.
Bibliography
## Computing. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>. We have invested a lot of time and effort in creating R, please cite it when using it for data analysis. See also 'citation("pkgname")' for citing R packages.
## version 1.4.3, <https://CRAN.R-project.org/package=readxl>.
## graphics of microbiome census data. Paul J. McMurdie and Susan Holmes (2013) PLoS ONE 8(4):e61217.
## Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
## J, reikoch, Beasley W, O'Connor B, Warnes GR, Quinn M, Kamvar ZN, Gao C (2024). yaml: Methods to Convert R Data to YAML and Back_. R package version 2.3.10, <https://CRAN.R-project.org/package=yaml>. ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see 'help("citation")'.