1. Concepts

In this package, we use methods from ecology to estimate tumor heterogeneity (TH) from the variant allele frequencies (VAFs) of mutated loci.

The function inferHeterogeneityPlus estimates TH with two families of indices from the package vegan:

  1. Diversity indices

  2. Taxonomic indices

See also http://www.coastalwiki.org/wiki/Measurements_of_biodiversity for the concepts.

1.1 Diversity indices

The vegan function diversity computes the most commonly used diversity indices, including:

\(H=-\sum_{i=1}^{S} p_i \log p_i\)   Shannon index (1)

\(D=\frac{1}{\sum_{i=1}^{S} p_i^2}\)   Inverse Simpson index (2)

where \(p_i\) is the proportion of species \(i\) and \(S\) is the number of species. For tumor data, the VAFs of the mutated loci are assigned to \(S\) bins, and \(p_i\) is the proportion of mutated loci whose VAF falls in bin \(i\). Here, we set bin_size to 10 (the parameter bin_size controls the number of bins), which retains enough information to represent the distribution of VAFs.
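The binning step and Eqs (1) and (2) can be sketched in base R as follows. This is a minimal illustration, not THindex's internal code: the helpers bin_vaf_counts, shannon_index and inv_simpson_index are hypothetical, and we assume VAFs on the 0 to 1 scale split into equal-width bins.

```r
# Hypothetical helper: count mutated loci per equal-width VAF bin.
# Assumes VAFs on the 0-1 scale; THindex's own binning may differ.
bin_vaf_counts <- function(vaf, n_bins = 10) {
  breaks <- seq(0, 1, length.out = n_bins + 1)
  bins <- cut(vaf, breaks = breaks, include.lowest = TRUE)
  as.vector(table(bins))  # one count per bin, empty bins included as 0
}

# Eq. (1): Shannon index with the natural log; empty bins are dropped
# because p * log(p) tends to 0 as p tends to 0.
shannon_index <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Eq. (2): inverse Simpson index.
inv_simpson_index <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Toy VAFs for ten mutated loci (illustrative values, not TCGA data).
vaf <- c(0.12, 0.15, 0.18, 0.33, 0.35, 0.41, 0.44, 0.47, 0.52, 0.88)
counts <- bin_vaf_counts(vaf, n_bins = 10)
c(shannon = shannon_index(counts), inv_simpson = inv_simpson_index(counts))
# shannon ~ 1.505, inv_simpson ~ 4.167
```

The same numbers should come out of vegan::diversity(counts, index = "shannon") and vegan::diversity(counts, index = "invsimpson"), since vegan uses the natural log by default. The sketch writes the formulas out only to make the arithmetic explicit.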

1.2 Taxonomic diversity

The simple diversity indices above consider only species identity: all species are equally different. In contrast, taxonomic diversity indices also take into account how different the species are from one another.

\(\Delta=\frac{\sum\sum_{i<j}\omega_{ij}X_iX_j}{n(n-1)/2}\)   Taxonomic diversity (3)

\(\Delta^*=\frac{\sum\sum_{i<j}\omega_{ij}X_iX_j}{\sum\sum_{i<j}X_iX_j}\)   Taxonomic distinctness (4)

In these equations, the summation goes over all pairs of species \(i\) and \(j\) with \(i<j\), \(\omega_{ij}\) is the taxonomic distance between taxa \(i\) and \(j\), \(X\) are species abundances, and \(n\) is the total abundance at a site.

For tumor data, the distance between adjacent bins is set to 1, so bins \(i\) and \(j\) are \(|i-j|\) apart. For example, with 5 bins, the distance between bin #1 and bin #5 is 4. If the numbers of loci in the 5 bins are (2,4,0,4,2) or (0,2,4,4,3), the former has higher taxonomic diversity than the latter.
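The two-sample comparison above can be checked with a short base R sketch of Eqs (3) and (4), using \(\omega_{ij}=|i-j|\) between bins. The helper taxo_indices is hypothetical; vegan's taxondive() computes these indices from a community matrix and a distance object, and this version only spells out the arithmetic for a single vector of bin counts.

```r
# Hypothetical helper: taxonomic diversity (Eq. 3) and distinctness (Eq. 4)
# for one sample's bin counts x, assuming omega_ij = |i - j| between bins.
taxo_indices <- function(x) {
  k <- length(x)
  omega <- abs(outer(seq_len(k), seq_len(k), "-"))  # bin distances |i - j|
  w <- outer(x, x)                                  # products X_i * X_j
  upper <- upper.tri(w)                             # keep only pairs with i < j
  num <- sum(omega[upper] * w[upper])               # sum over i < j of omega_ij X_i X_j
  n <- sum(x)                                       # total abundance (number of loci)
  c(Delta = num / (n * (n - 1) / 2),                # Eq. (3)
    Dstar = num / sum(w[upper]))                    # Eq. (4)
}

a <- taxo_indices(c(2, 4, 0, 4, 2))
b <- taxo_indices(c(0, 2, 4, 4, 3))
rbind(a, b)
```

For (2,4,0,4,2) the weighted pair sum is 112 and \(n(n-1)/2=66\), so \(\Delta\approx1.70\); for (0,2,4,4,3) it is 94 over 78, so \(\Delta\approx1.21\), confirming that the first sample is more diverse.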

2. Code examples

We use the function inferHeterogeneityPlus in THindex to estimate tumor heterogeneity.


library(THindex)
library(maftools)

# Read MAF data; read.maf() is a function from 'maftools'.
laml.maf <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
laml <- read.maf(maf = laml.maf)
#> -Reading
#> -Validating
#> -Silent variants: 475 
#> -Summarizing
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 5.800s elapsed (1.620s cpu)

The function inferHeterogeneityPlus is a modified version of the maftools function inferHeterogeneity, and the two functions share most of their input parameters.

The parameter index of inferHeterogeneityPlus selects which TH indices are computed. If index = "diversity", the Shannon and inverse Simpson indices (Eqs 1 and 2) are calculated and reported in the columns shannon and simpson, respectively.

TCGA.ab.het <- inferHeterogeneityPlus(maf = laml, vafCol = "i_TumorVAF_WU", index = "diversity")
#> Processing TCGA-AB-3009..
#> Processing TCGA-AB-2807..
#> Processing TCGA-AB-2959..
#> Processing TCGA-AB-3002..
#> Processing TCGA-AB-2849..

knitr::kable(TCGA.ab.het$diveristy)
| Tumor_Sample_Barcode | MATH | MedianAbsoluteDeviation | shannon | simpson |
|:---|---:|---:|---:|---:|
| TCGA-AB-3009 | 9.045040 | 2.789054 | 0.9556732 | 1.783951 |
| TCGA-AB-2807 | 12.086586 | 3.790000 | 1.1171469 | 2.285714 |
| TCGA-AB-2959 | 28.492504 | 8.060000 | 1.5206479 | 3.699301 |
| TCGA-AB-3002 | 8.207392 | 2.447382 | 0.9532563 | 1.860759 |
| TCGA-AB-2849 | 20.588348 | 5.170000 | 1.4956158 | 3.571429 |

If index = "taxonomic", the taxonomic diversity and taxonomic distinctness (Eqs 3 and 4) are calculated and reported in the columns Delt and Dstar, respectively.

TCGA.ab.het1 <- inferHeterogeneityPlus(maf = laml, vafCol = "i_TumorVAF_WU", index = "taxonomic")
#> Processing TCGA-AB-3009..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2807..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2959..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-3002..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names
#> Processing TCGA-AB-2849..
#> dimensions do not match between 'comm' and 'dis'
#> matched 'dis' labels by 'comm' names

knitr::kable(TCGA.ab.het1$diveristy)
| Tumor_Sample_Barcode | MATH | MedianAbsoluteDeviation | Delt | Dstar |
|:---|---:|---:|---:|---:|
| TCGA-AB-3009 | 9.045040 | 2.789054 | 3.151515 | 6.960630 |
| TCGA-AB-2807 | 12.086586 | 3.790000 | 3.373188 | 5.746914 |
| TCGA-AB-2959 | 28.492504 | 8.060000 | 4.454546 | 5.839378 |
| TCGA-AB-3002 | 8.207392 | 2.447382 | 2.676191 | 5.509804 |
| TCGA-AB-2849 | 20.588348 | 5.170000 | 4.726316 | 6.236111 |