This package applies methods from ecology to estimate tumor heterogeneity (TH) from the variant allele frequencies (VAFs) of a tumor's mutated loci.
The function inferHeterogeneityPlus estimates TH using two families of indices from the vegan package:
Diversity indices
Taxonomic indices
See also http://www.coastalwiki.org/wiki/Measurements_of_biodiversity for the concepts.
The vegan function diversity computes the most commonly used diversity indices.
\(H=-\sum_{i=1}^{S} p_i \log p_i\) Shannon index (1)
\(D=\frac{1}{\sum_{i=1}^{S} p_i^2}\) Inverse Simpson index (2)
Here \(p_i\) is the proportion of species \(i\) and \(S\) is the number of species. For tumor data, the VAF of each mutated locus is assigned to one of \(S\) bins, and \(p_i\) is the proportion of mutated loci that fall into the \(i\)-th bin. Here, we set the bin size to 10 (the parameter bin_size controls the number of bins), which yields enough information to represent the distribution of VAFs.
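As an illustration of Eqs 1 and 2, the following minimal sketch bins a handful of VAFs and computes both indices in base R. The VAF values are made up for the example, and the equal-width binning on [0, 1] is an assumption; the binning actually used inside inferHeterogeneityPlus may differ.

```r
# Hypothetical VAFs (as fractions) for one tumor sample; not taken from the TCGA data.
vaf <- c(0.05, 0.12, 0.18, 0.22, 0.25, 0.31, 0.38, 0.44, 0.47, 0.52)

# Assumption: 10 equal-width bins covering [0, 1].
n_bins <- 10
breaks <- seq(0, 1, length.out = n_bins + 1)
bins <- cut(vaf, breaks = breaks, include.lowest = TRUE)

# p_i: proportion of mutated loci in bin i.
# Empty bins are dropped, since a term with p_i = 0 contributes nothing to either sum.
counts <- table(bins)
p <- as.numeric(counts[counts > 0]) / length(vaf)

shannon <- -sum(p * log(p))   # Eq. 1, natural logarithm
inv_simpson <- 1 / sum(p^2)   # Eq. 2

print(c(shannon = shannon, inv_simpson = inv_simpson))
```

Both indices grow as the loci spread more evenly over more bins, which is why they can serve as heterogeneity scores.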
The simple diversity indices above consider only species identity: all species are treated as equally different. In contrast, taxonomic diversity indices take the differences between species into account.
\(\Delta=\frac{\sum\sum_{i<j} \omega_{ij}X_iX_j}{n(n-1)/2}\) Taxonomic diversity (3)
\(\Delta^*=\frac{\sum\sum_{i<j} \omega_{ij}X_iX_j}{\sum\sum_{i<j} X_iX_j}\) Taxonomic distinctness (4)
In these equations, the summation runs over pairs of species \(i\) and \(j\), \(\omega_{ij}\) is the taxonomic distance between taxa \(i\) and \(j\), \(X_i\) is the abundance of species \(i\), and \(n\) is the total abundance for a site.
For tumor data, the distance between adjacent bins is set to 1. For example, with 5 bins, the distance between bin #1 and bin #5 is 4. If the numbers of occurrences in the 5 bins are (2,4,0,4,2) and (0,2,4,4,3), the former has a higher taxonomic diversity than the latter, because its mutated loci are spread across more distant bins.
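The comparison of the two count vectors can be checked directly from Eqs 3 and 4. The sketch below is a base R illustration, not the package's implementation: taxonomic_indices is a hypothetical helper, and it assumes the distance between bins \(i\) and \(j\) is \(|i-j|\).

```r
# Hypothetical helper: taxonomic diversity (Eq. 3) and distinctness (Eq. 4)
# for a vector of bin counts x, assuming omega_ij = |i - j|.
taxonomic_indices <- function(x) {
  k <- length(x)
  omega <- abs(outer(seq_len(k), seq_len(k), "-"))  # distances between bins
  xx <- outer(x, x)                                  # products X_i * X_j
  upper <- upper.tri(xx)                             # keep only pairs with i < j
  n <- sum(x)                                        # total abundance
  weighted <- sum(omega[upper] * xx[upper])          # shared numerator
  c(Delta = weighted / (n * (n - 1) / 2),
    Dstar = weighted / sum(xx[upper]))
}

a <- taxonomic_indices(c(2, 4, 0, 4, 2))
b <- taxonomic_indices(c(0, 2, 4, 4, 3))
print(rbind(a, b))
```

Under these assumptions the first vector gives \(\Delta = 112/66 \approx 1.70\) and the second \(\Delta = 94/78 \approx 1.21\), consistent with the statement above.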
We use the function inferHeterogeneityPlus from the THindex package to estimate tumor heterogeneity.
library(THindex)
library(maftools)
# Read the MAF data; read.maf is a function from 'maftools'.
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)
#> -Reading
#> -Validating
#> -Silent variants: 475
#> -Summarizing
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 5.800s elapsed (1.620s cpu)

The function inferHeterogeneityPlus is adapted from the function inferHeterogeneity of maftools, and the two functions share most of their input parameters.
The parameter index of inferHeterogeneityPlus selects which TH indices are computed. If index = "diversity", the Shannon and inverse Simpson indices are calculated (Eqs 1 and 2).
TCGA.ab.het <- inferHeterogeneityPlus(maf = laml, vafCol = "i_TumorVAF_WU", index = "diversity")
#> Processing TCGA-AB-3009..
#> Processing TCGA-AB-2807..
#> Processing TCGA-AB-2959..
#> Processing TCGA-AB-3002..
#> Processing TCGA-AB-2849..
knitr::kable(TCGA.ab.het$diveristy)

| Tumor_Sample_Barcode | MATH | MedianAbsoluteDeviation | shannon | simpson |
|---|---|---|---|---|
| TCGA-AB-3009 | 9.045040 | 2.789054 | 0.9556732 | 1.783951 |
| TCGA-AB-2807 | 12.086586 | 3.790000 | 1.1171469 | 2.285714 |
| TCGA-AB-2959 | 28.492504 | 8.060000 | 1.5206479 | 3.699301 |
| TCGA-AB-3002 | 8.207392 | 2.447382 | 0.9532563 | 1.860759 |
| TCGA-AB-2849 | 20.588348 | 5.170000 | 1.4956158 | 3.571429 |
If index = "taxonomic", taxonomic diversity and taxonomic distinctness are calculated (Eqs 3 and 4).
TCGA.ab.het1 <- inferHeterogeneityPlus(maf = laml, vafCol = "i_TumorVAF_WU", index = "taxonomic")
#> Processing TCGA-AB-3009..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2807..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2959..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-3002..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2849..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
knitr::kable(TCGA.ab.het1$diveristy)

| Tumor_Sample_Barcode | MATH | MedianAbsoluteDeviation | Delt | Dstar |
|---|---|---|---|---|
| TCGA-AB-3009 | 9.045040 | 2.789054 | 3.151515 | 6.960630 |
| TCGA-AB-2807 | 12.086586 | 3.790000 | 3.373188 | 5.746914 |
| TCGA-AB-2959 | 28.492504 | 8.060000 | 4.454546 | 5.839378 |
| TCGA-AB-3002 | 8.207392 | 2.447382 | 2.676191 | 5.509804 |
| TCGA-AB-2849 | 20.588348 | 5.170000 | 4.726316 | 6.236111 |