Knitted HTML of R Markdown To Show Data Structures for Common BIN Filetypes

Part I: FASTA File Import

library(ape) # Load the ape package, which provides read.FASTA(), read.fastq() and the DNAbin class.
myFA <- read.FASTA("MT103168.fasta") # Read the FASTA file into R as a DNAbin object: a list holding 1 sequence in binary (raw-byte) format.
head(myFA) # Return up to the first six sequences in the list; here there is only one.

## 1 DNA sequence in binary format stored in a list.
## 
## Sequence length: 1560 
## 
## Label:
## MT103168.1 Bifidobacterium longum strain BB536 cell division...
## 
## Base composition:
##     a     c     g     t 
## 0.156 0.319 0.289 0.236 
## (Total: 1.56 kb)

str(myFA)  # Display the internal structure of the object in compact form.

## List of 1
##  $ MT103168.1 Bifidobacterium longum strain BB536 cell division protein FtsW (rodA) gene, complete cds: raw [1:1560] 88 18 48 88 ...
##  - attr(*, "class")= chr "DNAbin"
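The raw values in the str() output above (88 18 48 88 ...) are ape's bit-level codes for bases, not ASCII characters. As a minimal sketch, the four bytes shown can be decoded with a lookup table in base R. The `bytes` vector is a hand-typed stand-in for the start of the real sequence, and the table covers only the four unambiguous bases.

```r
# Stand-in for the first four bytes reported by str(myFA): 88 18 48 88
bytes <- as.raw(c(0x88, 0x18, 0x48, 0x88))

# ape's codes for the four unambiguous bases (ambiguity codes are omitted here)
code_to_base <- c("88" = "a", "28" = "c", "48" = "g", "18" = "t")

# as.character() on a raw vector yields two-digit hex strings such as "88"
bases <- unname(code_to_base[as.character(bytes)])
print(paste(bases, collapse = ""))
```

On the real object, as.character(myFA) from ape performs the same conversion for every sequence and also handles the ambiguity codes.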

Part II: FASTQ File Import

myFQ <- read.fastq("ERR1072710.fastq")  # Read the FASTQ file into R as a DNAbin object: a list of 3 sequences in binary format, with the quality scores stored in the "QUAL" attribute.
head(myFQ) # Return up to the first six sequences in the list; here there are three.

## 3 DNA sequences in binary format stored in a list.
## 
## Mean sequence length: 183.667 
##    Shortest sequence: 146 
##     Longest sequence: 259 
## 
## Labels:
## ERR1072710.1 10317.000001315_0 length=151
## ERR1072710.2 10317.000001315_1 length=116
## ERR1072710.4 10317.000001315_3 length=151
## 
## Base composition:
##     a     c     g     t 
## 0.318 0.208 0.254 0.219 
## (Total: 551 bases)

str(myFQ)  # Display the internal structure of the object, including the "QUAL" attribute, in compact form.

## List of 3
##  $ ERR1072710.1 10317.000001315_0 length=151: raw [1:146] 18 18 88 88 ...
##  $ ERR1072710.2 10317.000001315_1 length=116: raw [1:259] 18 28 18 28 ...
##  $ ERR1072710.4 10317.000001315_3 length=151: raw [1:146] 28 28 88 28 ...
##  - attr(*, "class")= chr "DNAbin"
##  - attr(*, "QUAL")=List of 7
##   ..$ ERR1072710.1 10317.000001315_0 length=151: num [1:11] 32 38 51 34 32 34 32 34 32 38 ...
##   ..$ ERR1072710.2 10317.000001315_1 length=116: num [1:11] 30 30 30 30 30 30 30 30 30 30 ...
##   ..$ ERR1072710.4 10317.000001315_3 length=151: num [1:42] 10 36 49 49 16 15 22 17 22 16 ...
##   ..$ NA                                       : num [1:70] 51 32 34 38 38 32 38 38 38 51 ...
##   ..$ NA                                       : num [1:67] 30 30 30 30 30 30 30 30 30 30 ...
##   ..$ NA                                       : num [1:11] 32 51 51 32 38 32 38 34 34 51 ...
##   ..$ NA                                       : num [1:11] 30 30 30 30 30 30 30 30 30 30 ...
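The quality scores sit in the "QUAL" attribute rather than in the list elements, so on the real object they would be reached with attr(myFQ, "QUAL"). The sketch below assumes these values are Phred scores and converts them to error probabilities. The `qual` vector is a hand-typed stand-in holding the first ten values printed for the first read, and `phred_to_prob` is a hypothetical helper name.

```r
# Stand-in for attr(myFQ, "QUAL")[[1]]: the first ten values shown by str(myFQ)
qual <- c(32, 38, 51, 34, 32, 34, 32, 34, 32, 38)

# Phred definition: Q = -10 * log10(p), hence p = 10^(-Q / 10)
phred_to_prob <- function(q) 10^(-q / 10)

# Per-base error probabilities under the Phred assumption
err <- phred_to_prob(qual)

# Summarise the read: mean quality and worst-case error probability
summary_row <- c(mean_q = mean(qual), max_err = max(err))
print(signif(summary_row, 3))
```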

Part III: VCF Imports

myVCF <- read.table("TwoVariants.vcf") # Read the VCF records into a data frame; lines beginning with "#" (meta-information and header) are skipped as comments by default.
myLINES <- read.csv("TwoVariants.vcf", sep="\n") # Read the whole file one line per row into a single-column data frame; read.csv() does not skip "#" lines, so the header lines stay available for inspection.
colnames(myVCF) <- c('CHROM','tPOS','tID','tREF','tALT','tQUAL','tFILTER','tINFO','tFORMAT','t__NONE__')  # Rename the columns of myVCF by hand to match the header line kept in myLINES.
head(myVCF)  # Return up to the first six rows of the data frame; here there are two.

##               CHROM tPOS tID tREF tALT tQUAL tFILTER                tINFO
## 1 NZ_BCYL01000006.1   29   .    A    G     .       .  AC=84;AF=1.0;SB=0.0
## 2 NZ_BCYL01000006.1  145   .    A    G     .       . AC=114;AF=1.0;SB=0.0
##          tFORMAT                  t__NONE__
## 1 GT:AC:AF:SB:NC  1:84:1.0:0.0:+G=37,-G=47,
## 2 GT:AC:AF:SB:NC 1:114:1.0:0.0:+G=42,-G=72,
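Typing the column names by hand works for a single file, but they can also be taken from the "#CHROM" header line itself. The following is a minimal sketch of that idea in base R. The temporary file, its two meta lines and the sample column name SAMPLE1 are hypothetical stand-ins and are not the contents of TwoVariants.vcf.

```r
# Hypothetical stand-in file: two meta lines, the header line, one record
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "##source=example",
  paste(c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
          "INFO", "FORMAT", "SAMPLE1"), collapse = "\t"),
  paste(c("NZ_BCYL01000006.1", "29", ".", "A", "G", ".", ".",
          "AC=84;AF=1.0;SB=0.0", "GT:AC:AF:SB:NC",
          "1:84:1.0:0.0:+G=37,-G=47,"), collapse = "\t")
), vcf_path)

# Find the header line and split it on tabs, dropping the leading "#"
all_lines <- readLines(vcf_path)
header_line <- grep("^#CHROM", all_lines, value = TRUE)[1]
vcf_cols <- strsplit(sub("^#", "", header_line), "\t", fixed = TRUE)[[1]]

# Read the records; every line starting with "#" is skipped as a comment
vcf <- read.table(vcf_path, sep = "\t", comment.char = "#",
                  col.names = vcf_cols, stringsAsFactors = FALSE)
print(colnames(vcf))
```

Splitting explicitly on tabs keeps each header field as one clean name, whatever the number of sample columns in the file.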

str(myVCF)   # Display the column names, types and first values of the data frame in compact form.

## 'data.frame':    2 obs. of  10 variables:
##  $ CHROM    : chr  "NZ_BCYL01000006.1" "NZ_BCYL01000006.1"
##  $ tPOS     : int  29 145
##  $ tID      : chr  "." "."
##  $ tREF     : chr  "A" "A"
##  $ tALT     : chr  "G" "G"
##  $ tQUAL    : chr  "." "."
##  $ tFILTER  : chr  "." "."
##  $ tINFO    : chr  "AC=84;AF=1.0;SB=0.0" "AC=114;AF=1.0;SB=0.0"
##  $ tFORMAT  : chr  "GT:AC:AF:SB:NC" "GT:AC:AF:SB:NC"
##  $ t__NONE__: chr  "1:84:1.0:0.0:+G=37,-G=47," "1:114:1.0:0.0:+G=42,-G=72,"
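The INFO column is stored as a single string of semicolon-separated key=value pairs, so the individual fields are not directly usable as numbers. The sketch below shows one way to split them into numeric columns using base R only. The `info` vector copies the two values shown above, `parse_info` is a hypothetical helper name, and the code assumes every record carries the same keys in the same order with numeric values.

```r
# The two INFO strings shown in the str(myVCF) output
info <- c("AC=84;AF=1.0;SB=0.0", "AC=114;AF=1.0;SB=0.0")

# Split one INFO string into a named character vector of key = value pairs
parse_info <- function(x) {
  pairs <- strsplit(strsplit(x, ";", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, `[`, character(1), 1)
  # Flag-style keys without "=value" become NA
  vals <- vapply(pairs, function(p) if (length(p) > 1) p[2] else NA_character_,
                 character(1))
  setNames(vals, keys)
}

# One row per record, one column per key, converted to numeric
info_df <- as.data.frame(do.call(rbind, lapply(info, parse_info)),
                         stringsAsFactors = FALSE)
info_df[] <- lapply(info_df, as.numeric)
print(info_df)
```

With the real data frame, the input would be myVCF$tINFO and the result could be joined back with cbind(myVCF, info_df).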

John Beliveau

2024-09-02
