QUESTION: How can I create a random sequence of DNA?

A random DNA sequence can be built from the four bases found in DNA: A, C, T, and G.

This is done using the function sample().

Setting up the data

bases <- c("A", "C", "T", "G")

Making the sequence

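sample() has an argument called replace, which should be set to TRUE. This means that after a base is drawn from the vector, it can be drawn again later in the sequence; in other words, it is “replaced” in the pool of bases. With replace = FALSE, each base could be used only once, so the sequence could be at most four bases long.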
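To see why replace matters, the sketch below compares the two settings. The variable names perm and result are chosen only for this illustration. With replace = FALSE and size = 4, sample() should simply shuffle the four bases, while asking for 10 bases without replacement should raise an error, which tryCatch() captures here as a message.

```r
bases <- c("A", "C", "T", "G")

# Drawing 4 bases without replacement: each base appears exactly once
perm <- sample(x = bases, size = 4, replace = FALSE)
perm

# Drawing 10 bases without replacement is not possible with only 4 bases,
# so sample() signals an error; tryCatch() returns the error message instead
result <- tryCatch(
  sample(x = bases, size = 10, replace = FALSE),
  error = function(e) conditionMessage(e)
)
result
```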

Here we will make a sequence of size 10.

seq <- sample(x=bases, size=10, replace=TRUE)
seq
##  [1] "T" "C" "G" "T" "G" "T" "G" "T" "T" "C"
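Because sample() is random, the sequence will differ each time the code is run. If a reproducible sequence is needed, one option is to call set.seed() before sampling. The following is a minimal sketch; the seed value 42 and the names first and second are arbitrary choices for illustration.

```r
bases <- c("A", "C", "T", "G")

# Setting the same seed before each call gives the same random draws
set.seed(42)
first <- sample(x = bases, size = 10, replace = TRUE)

set.seed(42)
second <- sample(x = bases, size = 10, replace = TRUE)

# Check that both draws produced the same sequence
identical(first, second)
```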

sample() returns a character vector, but it is often more useful to see the sequence as a single string, so the function paste() is used to concatenate the characters. Be sure to set collapse to "" (an empty string) so that no separator is placed between the characters.

seq <- paste(seq, collapse = "")
seq
## [1] "TCGTGTGTTC"
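If sequences of several lengths are needed, the two steps above could be wrapped in a small function. The sketch below is one possible way to do this; random_dna() is a hypothetical helper name, not a built-in R function.

```r
# random_dna() is a hypothetical helper: it draws n bases with replacement
# and collapses them into a single string
random_dna <- function(n, bases = c("A", "C", "T", "G")) {
  draws <- sample(x = bases, size = n, replace = TRUE)
  paste(draws, collapse = "")
}

dna <- random_dna(25)
dna

# The string should contain exactly 25 characters
nchar(dna)
```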

Additional Reading

For more information on this topic, see

TODO: find one resource related to this topic, such as those found on https://www.statmethods.net/index.html, https://r-charts.com/, http://www.r-tutor.com/, http://www.sthda.com/. (http://www.sthda.com/ is run by the author of ggpubr and has lots of resources for it).

Keywords

  • sequence
  • DNA
  • sample()
  • sample with(out) replacement
  • paste
  • concatenation