\[S_{chao1} = S_{obs} + \frac{{n_1}\left ({n_1}-1 \right )}{2\left ({n_2}+1 \right )}\]
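where
\[S_{chao1} = \mbox{estimated number of OTUs}\]
\[S_{obs} = \mbox{number of OTUs actually observed}\]
\[n_{1} = \mbox{number of OTUs containing exactly one sequence (singletons)}\]
\[n_{2} = \mbox{number of OTUs containing exactly two sequences (doubletons)}\]
The Chao1 index estimates the number of OTUs present in a sample. It is widely used in ecology to estimate total species richness and was first proposed by Chao (1984).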
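As a minimal sketch of the formula above, the following R function computes Chao1 from a vector of per-OTU sequence counts. The function name `chao1` and the third example vector are hypothetical; `a` and `b` are the two count vectors used for the Simpson and Shannon checks later in this document.

```r
# Chao1 as written above: S_obs + n1 * (n1 - 1) / (2 * (n2 + 1)).
# `counts` holds the number of sequences in each OTU (hypothetical helper).
chao1 <- function(counts) {
  counts <- counts[counts > 0]   # OTUs with zero sequences are not observed
  s_obs  <- length(counts)       # number of observed OTUs
  n1     <- sum(counts == 1)     # singletons
  n2     <- sum(counts == 2)     # doubletons
  s_obs + n1 * (n1 - 1) / (2 * (n2 + 1))
}

a <- c(75, 6, 2, 1)
b <- c(0, 94, 2, 0)
chao1(a)                     # one singleton, so the correction term is zero
chao1(b)                     # no singletons either
chao1(c(10, 5, 2, 1, 1, 1))  # illustrative vector with three singletons
```

With a single singleton or none, the correction term vanishes and Chao1 equals the observed OTU count; the estimate only rises above `S_obs` once at least two singletons are present.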
\[S_{ACE} = \begin{cases} S_{abund} + \frac {S_{rare}}{C_{ACE}} + \frac{n_1}{C_{ACE}}{{\hat{\gamma}}_{ACE}^2}\mbox{, for }\hat{\gamma}_{ACE}<\mbox{0.80} \\ S_{abund} + \frac {S_{rare}}{C_{ACE}} + \frac{n_1}{C_{ACE}}{{\tilde{\gamma}}_{ACE}^2}\mbox{, for }\hat{\gamma}_{ACE}\geqslant\mbox{0.80} \end{cases} \]
where
\[N_{rare} = \sum_{i=1}^{abund}{in_i}\]
\[C_{ACE} = 1 - \frac {n_1}{N_{rare}}\]
\[{{\hat{\gamma}}_{ACE}^2} = max \left[ \frac {S_{rare}}{C_{ACE}} \frac{\sum_{i=1}^{abund} i \left ( i-1 \right ) n_i }{N_{rare} \left( N_{rare} - 1 \right )} - 1,0 \right ]\]
\[\tilde{\gamma}_{ACE}^2 = max \left[\hat{\gamma}_{ACE}^2 \left\{1+\frac{N_{rare}\left(1-C_{ACE}\right)\sum_{i=1}^{abund} i \left ( i-1 \right ) n_i }{N_{rare}\left(N_{rare}-C_{ACE}\right)}\right\}\mbox{, 0}\right]\]
\[var \left( S_{ACE} \right ) {\approx} \sum_{j=1}^{n} \sum_{i=1}^{n} \frac{{\partial}S_{ACE}}{{\partial}n_i} \frac{{\partial}S_{ACE}}{{\partial}n_j} cov \left( n_i, n_j \right)\]
\[cov \left( n_i, n_j \right) = n_i \left(1-n_i / S_{ACE} \right ), i = j\]
\[cov\left ( n_i, n_j \right) = -n_i n_j / {S_{ACE}}, i\ne j \]
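\[n_{i} = \mbox{number of OTUs containing exactly } i \mbox{ sequences}\]
\[S_{rare} = \mbox{number of OTUs with } abund \mbox{ or fewer sequences}\]
\[S_{abund} = \mbox{number of OTUs with more than } abund \mbox{ sequences}\]
\[abund = \mbox{threshold above which an OTU is considered abundant; the default is 10}\]
ACE is an index for estimating the number of OTUs in a community. It was proposed by Chao and is one of the estimators of total species number commonly used in ecology; its algorithm differs from that of Chao1.
Both indices are, in essence, corrections applied to the observed OTU count.
A larger value indicates higher richness, that is, more species in the community.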
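The ACE definitions above can be sketched in R as follows. This is an illustrative transcription of the formulas in this section, not mothur's own code: the function name `ace` is hypothetical, and the handling of the degenerate cases (no rare OTUs, or all rare OTUs being singletons so that C_ACE is 0) is an assumption made here to avoid division by zero.

```r
# ACE from per-OTU sequence counts, following the formulas in this section.
# Hypothetical helper; `abund` is the rare/abundant threshold (default 10).
ace <- function(counts, abund = 10) {
  counts  <- counts[counts > 0]
  rare    <- counts[counts <= abund]      # OTUs with abund or fewer sequences
  s_abund <- sum(counts > abund)
  s_rare  <- length(rare)
  if (s_rare == 0) return(s_abund)        # assumption: nothing to correct
  n_rare  <- sum(rare)                    # equals sum over i of i * n_i
  n1      <- sum(rare == 1)
  c_ace   <- 1 - n1 / n_rare
  if (c_ace == 0) return(NA_real_)        # assumption: estimator undefined
  sum_ii  <- sum(rare * (rare - 1))       # equals sum over i of i * (i - 1) * n_i
  g2_hat  <- max(s_rare / c_ace * sum_ii / (n_rare * (n_rare - 1)) - 1, 0)
  g2_til  <- max(g2_hat * (1 + n_rare * (1 - c_ace) * sum_ii /
                             (n_rare * (n_rare - c_ace))), 0)
  # Use the adjusted coefficient of variation when gamma-hat reaches 0.80
  g2 <- if (sqrt(g2_hat) < 0.8) g2_hat else g2_til
  s_abund + s_rare / c_ace + n1 / c_ace * g2
}

ace(c(75, 6, 2, 1))
ace(c(0, 94, 2, 0))
```

Summing `rare * (rare - 1)` over OTUs gives the same total as summing `i * (i - 1) * n_i` over abundance classes, which avoids building the frequency table explicitly.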
\[D_{simpson} = \frac {\sum_{i=1}^{S_{obs}} {n_i \left ( n_i - 1 \right )}}{N \left( N-1 \right )}\]
\[var\left(D_{simpson}\right)=\frac{\sum_{i=1}^{S_{obs}} \left(\frac{n_i}{N}\right)^3-\left(\sum_{i=1}^{S_{obs}} \left(\frac{n_i}{N}\right)^2\right)^2}{0.25N}\]
\[LCI_{95\%}=D_{simpson}-1.96\sqrt{var\left ( D_{simpson} \right )}\]
\[UCI_{95\%}=D_{simpson}+1.96\sqrt{var\left ( D_{simpson} \right )}\]
where
\[S_{obs} = \mbox{number of OTUs actually observed}\]
\[n_i = \mbox{number of sequences in the } i\mbox{-th OTU}\]
\[N = \mbox{total number of sequences}\]
Mothur: 0.800631 0.958772
## $a
## [1] 75 6 2 1
##
## $b
## [1] 0 94 2 0
(75*(75-1)+6*(6-1)+2*(2-1)+1*(1-1))/(84*83)
## [1] 0.8006311
(94*(94-1)+2*(1))/(96*95)
## [1] 0.9587719
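The biological meaning of Simpson's diversity index is the probability that two individuals drawn at random from a very large community belong to the same species.
A larger value therefore indicates lower diversity.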
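The hand calculations above can be wrapped in a small function that also returns the 95% confidence limits from the variance formula in this section. This is a sketch only; the function name `simpson` and the names of the returned elements are hypothetical.

```r
# Simpson index with 95% confidence limits, following the formulas above.
# `counts` holds the number of sequences in each OTU (hypothetical helper).
simpson <- function(counts) {
  counts <- counts[counts > 0]            # ignore OTUs absent from the sample
  n <- sum(counts)                        # total number of sequences, N
  d <- sum(counts * (counts - 1)) / (n * (n - 1))
  p <- counts / n                         # relative abundances n_i / N
  v <- (sum(p^3) - sum(p^2)^2) / (0.25 * n)
  half <- 1.96 * sqrt(v)
  c(D = d, LCI = d - half, UCI = d + half)
}

a <- c(75, 6, 2, 1)
b <- c(0, 94, 2, 0)
simpson(a)   # D matches the mothur value 0.800631
simpson(b)   # D matches the mothur value 0.958772
```

Sample `b` is dominated by a single OTU, so two random sequences almost always come from the same OTU and its index is the higher of the two.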
\[H_{shannon} = - \sum_{i=1}^{S_{obs}} \frac{n_i}{N} \ln \frac{n_i}{N}\]
\[var\left ( H_{shannon} \right ) = \frac {\sum_{i=1}^{S_{obs}} \frac{n_i}{N} \left ( \ln \frac{n_i}{N} \right )^2 - H_{shannon}^{2}}{N} + \frac{S_{obs} - 1}{2N^{2}}\]
\[LCI_{95\%}=H_{shannon}-1.96\sqrt{var\left ( H_{shannon} \right )}\]
\[UCI_{95\%}=H_{shannon}+1.96\sqrt{var\left ( H_{shannon} \right )}\]
Mothur: 0.431430 0.101265
-(log(1/84)*1/84+log(2/84)*2/84+log(6/84)*6/84+log(75/84)*75/84)
## [1] 0.4314304
-(log(94/96)*94/96+log(2/96)*2/96)
## [1] 0.1012648
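The Shannon index comes from information theory. Its formula shows that a larger number of species corresponds to a more complex community: the larger the value of H, the more information the community contains and the greater its species diversity.
A diversity index takes two aspects into account:
1) Richness (richness of species), that is, how many species are present.
2) Evenness (evenness of species distribution), that is, how evenly individuals are distributed among the species.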
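To illustrate how richness and evenness both enter the Shannon index, here is a minimal R sketch. The function name `shannon` and the two four-OTU example communities are hypothetical; `a` and `b` are the count vectors used in the checks above.

```r
# Shannon index H = -sum(p_i * ln(p_i)) from per-OTU sequence counts.
shannon <- function(counts) {
  counts <- counts[counts > 0]   # drop zero counts, since log(0) is undefined
  p <- counts / sum(counts)      # relative abundances n_i / N
  -sum(p * log(p))               # log() is the natural logarithm in R
}

a <- c(75, 6, 2, 1)
b <- c(0, 94, 2, 0)
shannon(a)   # matches the mothur value 0.431430
shannon(b)   # matches the mothur value 0.101265

# Same richness (4 OTUs), different evenness:
shannon(c(25, 25, 25, 25))   # perfectly even community, equals log(4)
shannon(c(97, 1, 1, 1))      # dominated by one OTU, much lower H
```

For a fixed number of OTUs the index reaches its maximum, `log(S_obs)`, when all OTUs are equally abundant.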
\[C=1-\frac{n_1}{N}\]
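where
\[n_{1} = \mbox{number of OTUs containing exactly one sequence}\]
\[N = \mbox{total number of sequences}\]
Coverage assesses how completely the observed OTUs cover the species present in the sample.
Its value lies in the interval [0, 1].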
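The coverage formula is short enough to sketch directly in R. The function name `coverage` is hypothetical, and `a` and `b` are the same count vectors used in the Simpson and Shannon checks.

```r
# Coverage C = 1 - n1 / N from per-OTU sequence counts.
coverage <- function(counts) {
  counts <- counts[counts > 0]
  n1 <- sum(counts == 1)   # OTUs represented by a single sequence
  1 - n1 / sum(counts)     # sum(counts) is the total number of sequences, N
}

a <- c(75, 6, 2, 1)
b <- c(0, 94, 2, 0)
coverage(a)   # 1 - 1/84
coverage(b)   # no singletons, so coverage is 1
```

Coverage drops as singletons make up a larger share of the sequences, which suggests that further sequencing would keep turning up new OTUs.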
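Alpha diversity reflects the species richness and species diversity of a single sample. It is measured with several indices: Chao1, ACE, Shannon, and Simpson.
The Chao1 and ACE indices measure species richness, that is, how many species are present.
The larger the Chao1 and ACE values, the higher the species richness.
The Shannon and Simpson indices measure species diversity and are affected by both the species richness and the community evenness of the sample.
At equal species richness, a community whose species are more evenly distributed is considered more diverse. A larger Shannon index and a smaller Simpson index both indicate higher species diversity in the sample.
OTU coverage is also reported. The higher its value, the higher the probability that a species in the sample was detected and the lower the probability that it was missed. This index reflects whether the sequencing results represent the true microbial composition of the sample.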
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
##
## locale:
## [1] LC_CTYPE=zh_CN.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=zh_CN.UTF-8 LC_COLLATE=zh_CN.UTF-8
## [5] LC_MONETARY=zh_CN.UTF-8 LC_MESSAGES=zh_CN.UTF-8
## [7] LC_PAPER=zh_CN.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] backports_1.0.5 magrittr_1.5 rprojroot_1.2
## [4] tools_3.3.3 htmltools_0.3.6 yaml_2.1.14
## [7] Rcpp_0.12.12 stringi_1.1.5 rmarkdown_1.6.0.9001
## [10] knitr_1.16 stringr_1.2.0 digest_0.6.12
## [13] evaluate_0.10.1