2024-10-17

What is Sequencing?

  • Sequencing determines the order of bases in DNA; many biology and computing fields use it to investigate genomes (typically human)
  • A DNA sample is processed so that the forward and reverse strands are isolated and indexed
  • The resulting sequence data can then be analysed computationally for a wide range of purposes

DNA Extraction

  • Any type of cell can be used
  • The cell and nucleus are broken apart to expose intracellular DNA
  • The DNA is then extracted and purified to isolate it from the rest of the cell contents

Library Preparation

  • DNA is fragmented into smaller chunks, which are cheaper and more flexible to sequence
  • Known adapter sequences are attached to the ends of each chunk so that the sequencing machine can recognise and process it
  • These chunks of DNA are then processed through sequencing machines which “read” the DNA

Sequencing Specifics

  • Strands of DNA with adapters are put into machines which read off the order of bases along each strand
  • A read is the sequence obtained from one of the smaller DNA fragments
  • A read has a length and orientation associated with it
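One way to picture a read is as a small record holding its sequence, length and orientation. The sketch below is illustrative only: the field names, the example sequence and the helper `reverse_complement` are assumptions made for this example and are not tied to any particular sequencing tool or file format.

```r
# Reverse complement of a DNA string: swap each base for its partner,
# then reverse the order of the bases.
reverse_complement <- function(sequence) {
  complemented <- chartr("ACGT", "TGCA", sequence)
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

# Hypothetical read record; the field names are illustrative only.
read <- list(
  sequence    = "ATGCGTTA",
  length      = nchar("ATGCGTTA"),
  orientation = "forward"
)

read$length                        # 8
reverse_complement(read$sequence)  # "TAACGCAT"
```

The reverse complement is what the same stretch of DNA would look like if it were read from the opposite strand.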

Read Depth

  • Some sequencing technologies produce reads of the same length, regardless of fragment length
  • This means some portions of the DNA will be read multiple times
  • Depth of coverage, or read depth, is the average number of times each base in the target is read
  • The depth can be found with the following formula:

Read Depth

Where

  • C = Coverage
  • N = Number of reads
  • L = Read length
  • G = Target genome size

\[ C = \frac{N \times L}{G} \]

  • When reporting coverage as a whole number, always round down, as you cannot physically create extra reads
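The formula can be sketched in R as follows. This is a minimal illustration, not a standard library routine: the function and argument names are made up for this example, and the run sizes are hypothetical numbers chosen so the arithmetic is easy to follow. The second function simply rearranges the formula for N.

```r
# Coverage C = N * L / G, rounded down as the slide instructs.
# n_reads (N), read_length (L) and genome_size (G) are illustrative names.
coverage_depth <- function(n_reads, read_length, genome_size) {
  floor(n_reads * read_length / genome_size)
}

# Rearranged for N: reads needed to reach a target coverage.
# Rounded up here, since a fraction of a read cannot be sequenced.
reads_needed <- function(target_coverage, read_length, genome_size) {
  ceiling(target_coverage * genome_size / read_length)
}

# Hypothetical run: 1,000,000 reads of 150 bases over a 5,000,000-base genome
coverage_depth(1e6, 150, 5e6)   # 30
reads_needed(30, 150, 5e6)      # 1e+06
```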

Errors

  • Because not all DNA is the same, and sequencing technologies are not perfect, some sections of the DNA will not get sequenced
  • The Lander/Waterman model gives a good estimate of the probability that a given base is not sequenced, based on coverage (and therefore on genome size)

Where P = Probability that a base is not sequenced and C = Coverage

\[ P = e^{-C} \]
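As a quick check of the formula, substituting a few coverage values gives the following. The values of C are chosen here for illustration and do not come from a real run.

\[ C = 1: \quad P = e^{-1} \approx 0.368 \]

\[ C = 5: \quad P = e^{-5} \approx 0.0067 \]

\[ C = 10: \quad P = e^{-10} \approx 0.000045 \]

At a coverage of 1, roughly a third of bases are expected to be missed; at a coverage of 10, fewer than 1 in 20,000.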

Lander/Waterman model

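library(plotly)
# Assumed data (not shown in the original): coverage 0 to 10 and P = e^(-C)
coverage <- data.frame(myX = seq(0, 10, by = 0.1))
coverage$myY <- exp(-coverage$myX)
fig <- plot_ly(data = coverage, x = ~myX, y = ~myY, type = 'scatter',
               mode = 'lines', height = 325) %>%
  layout(paper_bgcolor = '#EDEDED', plot_bgcolor = '#EDEDED')
fig

  • The higher the coverage, the less likely it is that a base gets skipped over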
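To put the model on the scale of a whole genome, the probability can be multiplied by the genome size to estimate how many bases are missed, and the formula can be solved for C to find the coverage needed for a chosen miss probability. The sketch below uses only base R; the function names and the 5,000,000-base genome are assumptions made for this example.

```r
# P = e^(-C): probability that a given base is not sequenced
prob_missed <- function(coverage) exp(-coverage)

# Expected number of unsequenced bases in a genome of genome_size bases
expected_missed_bases <- function(coverage, genome_size) {
  genome_size * prob_missed(coverage)
}

# Solving P = e^(-C) for C gives C = -ln(P)
coverage_for_miss_prob <- function(p) -log(p)

# Hypothetical 5,000,000-base genome
expected_missed_bases(1, 5e6)    # about 1.84 million bases
expected_missed_bases(10, 5e6)   # about 227 bases
coverage_for_miss_prob(0.001)    # about 6.9
```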

Sequencing Uses

Sequencing has many uses including:

  • Associating genes with diseases
  • Diagnosing disorders and diseases
  • Identifying drug targets
  • Identifying microorganisms
  • Studying gene expression
  • Forensic use-cases