Comparison of flat file reading functions in R

Reading large flat text files is one of the most common tasks in data science and in quantitative genomics. As in any programming language, there are several ways to perform the same task, and reading files in R is no exception.

Two aspects are worth considering when benchmarking file-reading functions in R: elapsed time and memory footprint. Here we focus only on the time needed to read the file, not on memory usage.
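The timings in this post are taken by subtracting two Sys.time() calls. As a minimal sketch of an alternative, base R's system.time() reports elapsed time directly. The helper name time_read and the small synthetic file are assumptions made for illustration; they are not part of the original benchmark.

```r
# Hypothetical helper (not from the original post): time any reader function.
time_read <- function(reader, path, ...) {
  # system.time() evaluates the expression and returns user/system/elapsed times
  timing <- system.time(result <- reader(path, ...))
  list(elapsed = unname(timing["elapsed"]), data = result)
}

# Small synthetic file so the sketch runs anywhere; it only stands in for real data.
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(id = 1:100, a = 0, b = "AB"), tmp,
            row.names = FALSE, col.names = FALSE, quote = FALSE)

res <- time_read(read.table, tmp, header = FALSE)
dim(res$data)   # dimensions of the object that was read
res$elapsed     # elapsed wall-clock time in seconds
```

Because the reader is passed as an argument, the same helper could be reused for each function being compared, which keeps the measurement code identical across runs.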

Read with fread

library(data.table)  # provides fread(); loaded before the timer starts
init <- Sys.time()
f1 <- fread(file = "genotypes_Hypor_JP_9010_2012-09-29.dat")
fin <- Sys.time()
print(fin - init)

## Time difference of 1.329033 secs

dim(f1)

## [1] 9010    3

class(f1)

## [1] "data.table" "data.frame"

Now let’s compare with read_delim from the readr package

library(readr)  # provides read_delim(); loaded before the timer starts
init <- Sys.time()
f2 <- read_delim("genotypes_Hypor_JP_9010_2012-09-29.dat", delim = " ", col_names = FALSE)
fin <- Sys.time()
print(fin - init)

## Time difference of 0.4439561 secs

dim(f2)

## [1] 9010    3

class(f2)

## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Finally, use the traditional read.table from base R

init <- Sys.time()
f3 <- read.table("genotypes_Hypor_JP_9010_2012-09-29.dat", header = FALSE)
fin <- Sys.time()
print(fin - init)

## Time difference of 1.142066 mins

dim(f3)

## [1] 9010    3

class(f3)

## [1] "data.frame"
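Note that print(fin - init) chooses its units automatically, so the first two timings are reported in seconds and the last one in minutes. A rough sketch of how the three results could be put on the same scale is shown below. The numbers are copied from the outputs above, while the variable names and the to_secs helper are assumptions made for illustration.

```r
# Timings copied from the printed outputs above; each difftime keeps its own units.
t_fread      <- as.difftime(1.329033,  units = "secs")
t_read_delim <- as.difftime(0.4439561, units = "secs")
t_read_table <- as.difftime(1.142066,  units = "mins")

# Hypothetical helper: convert any difftime to a plain number of seconds.
to_secs <- function(x) as.numeric(x, units = "secs")

timings <- c(fread      = to_secs(t_fread),
             read_delim = to_secs(t_read_delim),
             read.table = to_secs(t_read_table))

# Sort fastest first and express each timing as a multiple of the fastest one.
timings <- sort(timings)
round(timings / timings[[1]], 1)
```

When taking the measurements, calling difftime(fin, init, units = "secs") instead of fin - init would fix the units up front and avoid the conversion step.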
