Binomial distribution for required depth in next-generation sequencing

Establishing Criteria for Depth of Sequencing

The required depth of coverage can be estimated based on the required lower limit of detection, the quality of the reads, and tolerance for false-positive or false-negative results.

For example, for a given proportion of mutant alleles, the probability of detecting a minimum number of alleles can be determined using the binomial distribution equation:

\[ P(x)=\frac{n!}{x!(n-x)!} p^x(1-p)^{n-x} \]

where \(P(x)\) is the probability of observing exactly \(x\) variant reads, \(x\) is the number of variant reads, \(n\) is the total number of reads, and \(p\) is the probability of detecting a variant allele (ie, the proportion of mutant alleles in the sample).
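As a quick sanity check, the formula can be written out by hand and compared with R's built-in `dbinom()`. This is only a sketch: the helper name `binom_pmf` is made up for illustration, and the values \(n = 250\) and \(p = 0.05\) are the ones used in the example in this post.

```r
# Binomial probability mass function written out term by term
# from the formula: n! / (x! (n - x)!) * p^x * (1 - p)^(n - x).
# `binom_pmf` is a hypothetical helper name, not part of base R.
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

# Probability of exactly 0, 1, 2, 3, and 4 variant reads
# out of 250 total reads at a 5% mutant allele fraction.
manual  <- binom_pmf(0:4, n = 250, p = 0.05)
builtin <- dbinom(0:4, size = 250, prob = 0.05)

# The two vectors should agree up to floating-point tolerance.
all.equal(manual, builtin)
```

Here `choose(n, x)` supplies the binomial coefficient, so the hand-written version and `dbinom()` are expected to match.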

By calculating the binomial probability of each possible number of successes for a given number of trials and probability of success, one can define the binomial distribution.

For example, at a mutant allele frequency of 5% and a depth of 250 reads, the probability of observing four or fewer variant reads is 0.457%.

sum(dbinom(0:4,250,0.05))

## [1] 0.004570736

Therefore, the probability of observing five or more variant reads is 1 minus 0.457%, or 99.543%.

1-sum(dbinom(0:4,250,0.05))

## [1] 0.9954293
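The same two quantities can also be obtained from the cumulative distribution function `pbinom()` instead of summing `dbinom()` terms. A minimal sketch, using the same 250 reads and 5% allele frequency as above (the variable names `p_le4` and `p_ge5` are just illustrative):

```r
# P(X <= 4): probability of four or fewer variant reads.
p_le4 <- pbinom(4, size = 250, prob = 0.05)

# P(X >= 5) = P(X > 4): the upper tail, computed directly
# rather than as 1 minus the lower tail.
p_ge5 <- pbinom(4, size = 250, prob = 0.05, lower.tail = FALSE)

p_le4
p_ge5
```

Using `lower.tail = FALSE` avoids the explicit subtraction from 1, which can matter numerically when the tail probability is very small.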

Thus, if the threshold for a variant call were set at five or more variant reads, the probability of a false negative for a variant present at 5% allele frequency would be <0.5%, provided a minimum of 250 reads were obtained. For clinical NGS panels, a minimum depth of coverage of 250 reads per tested amplicon or target is strongly recommended.
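The reasoning can also be turned around: given an allele frequency, a read threshold for calling a variant, and a tolerated false-negative rate, one can search for the smallest depth that satisfies them. The function below is a rough sketch under the same binomial assumptions; `min_depth` and its arguments are hypothetical names, and the `max_n` search limit is an arbitrary choice.

```r
# Smallest depth n at which the probability of seeing fewer than
# `min_reads` variant reads (a false negative) is at most `max_fn`.
# `min_depth`, `min_reads`, `max_fn`, and `max_n` are illustrative names.
min_depth <- function(p, min_reads, max_fn, max_n = 10000) {
  for (n in min_reads:max_n) {
    # False-negative probability: P(X <= min_reads - 1) at depth n.
    fn <- pbinom(min_reads - 1, size = n, prob = p)
    if (fn <= max_fn) {
      return(n)
    }
  }
  NA_integer_  # no depth up to max_n meets the requirement
}

# Depth needed to call a 5% variant with >= 5 reads
# and a false-negative probability of at most 0.5%.
d <- min_depth(p = 0.05, min_reads = 5, max_fn = 0.005)
d
```

Because the false-negative probability falls steadily as depth increases, the first depth that passes the check is the minimum. It does not account for sequencing errors or read quality, which the criteria above also mention.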


nordhuang from HaploX

September 11, 2019
