Introduction

Readr’s primary function is to load plain-text rectangular files into R data frames. It can also write data frames to several flat-file formats, and it includes additional functions for handling different data types, unconventional delimiters, encodings, and similar details.

In this vignette, we will explain how to install and load the package and then explore its basic functionality by loading a CSV file into R. Finally, we will save our data frame as a TSV file.

By default, Readr will try to assign the appropriate data type to each column. In our example we will import a subset of the columns using cols_only, and we’ll use the col_* functions to specify each data type.

Installation

Readr can be installed along with the complete set of Tidyverse packages by using the command:

install.packages("tidyverse")

Alternatively, it can be installed by itself by running:

install.packages("readr")
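
If a script has to run on machines where the package may or may not be present, the install step can be made conditional. The following is a minimal sketch that relies only on base R’s requireNamespace() and packageVersion():

```r
# Install readr only when it is not already available.
if (!requireNamespace("readr", quietly = TRUE)) {
  install.packages("readr")
}

# Report the installed version; some argument names differ between releases.
packageVersion("readr")
```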

Loading and Basic Use

The package must be loaded before it can be used. As before, it can be loaded along with the rest of the Tidyverse packages using the command below:

library(tidyverse)
## -- Attaching packages ------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1       v purrr   0.3.2  
## v tibble  2.1.1       v dplyr   0.8.0.1
## v tidyr   0.8.3       v stringr 1.4.0  
## v readr   1.3.1       v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Or it can be loaded independently using:

library(readr)

Readr is designed to import several types of flat files and has a specific function for each. The six covered here are:

read_csv(): comma separated (CSV) files
read_tsv(): tab separated files
read_delim(): general delimited files
read_fwf(): fixed width files
read_table(): tabular files where columns are separated by white-space.
read_log(): web log files
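
As a quick illustration of how these functions relate, the sketch below reads the same made-up data with read_delim() and read_tsv(). The file contents and the object names (pipe_file, tab_file, players) are invented for the example; the files are written to temporary locations so the code runs on its own.

```r
library(readr)

# Write a small pipe-delimited file to a temporary location (made-up data).
pipe_file <- tempfile(fileext = ".txt")
writeLines(c("name|age", "Ana|31", "Luis|27"), pipe_file)

# read_delim() takes the delimiter explicitly.
players <- read_delim(pipe_file, delim = "|")

# The same data, tab separated, can be read with read_tsv().
tab_file <- tempfile(fileext = ".tsv")
writeLines(c("name\tage", "Ana\t31", "Luis\t27"), tab_file)
players_tsv <- read_tsv(tab_file)

# Both imports should expose the same column names.
names(players)
names(players_tsv)
```

read_csv() and read_tsv() can be thought of as read_delim() with the delimiter already chosen.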

By default, Readr will try to assign the appropriate data type to each column. The type can also be set explicitly with one of the col_* functions:

col_logical(): containing only T, F, TRUE or FALSE.
col_integer(): integers.
col_double(): doubles.
col_character(): strings, and everything else.
col_factor(levels, ordered): a fixed set of values.
col_date(format = ""): with the locale’s date_format.
col_time(format = ""): with the locale’s time_format.
col_datetime(format = ""): ISO8601 date times.
col_number(): numbers containing the grouping_mark.
col_skip(): don’t import this column.
col_guess(): parse using the "best" type based on the input.
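
To see several of these column types at work, here is a small sketch that writes a made-up CSV file to a temporary location and reads it back with an explicit specification. The column names and values are invented for illustration only.

```r
library(readr)

# Made-up data: the fee column uses "," as a grouping mark, so it is quoted.
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("player,active,caps,debut,fee",
    "Ana,TRUE,12,2015-03-01,\"1,250,000\"",
    "Luis,FALSE,3,2019-11-17,\"80,000\""),
  csv_file
)

typed <- read_csv(
  csv_file,
  col_types = cols(
    player = col_character(),
    active = col_logical(),
    caps   = col_integer(),
    debut  = col_date(format = ""),  # "" falls back to the locale's date_format
    fee    = col_number()            # ignores the grouping mark
  )
)

# Inspect the resulting column classes.
sapply(typed, class)
```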

Basic import with default options

Let’s import our CSV file using Readr’s default options.

fifa <- read_csv(
  file = "fifa.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   Row = col_double(),
##   ID = col_double(),
##   Age = col_double(),
##   Overall = col_double(),
##   Potential = col_double(),
##   Special = col_double(),
##   `International Reputation` = col_double(),
##   `Weak Foot` = col_double(),
##   `Skill Moves` = col_double(),
##   `Jersey Number` = col_double(),
##   Crossing = col_double(),
##   Finishing = col_double(),
##   HeadingAccuracy = col_double(),
##   ShortPassing = col_double(),
##   Volleys = col_double(),
##   Dribbling = col_double(),
##   Curve = col_double(),
##   FKAccuracy = col_double(),
##   LongPassing = col_double(),
##   BallControl = col_double()
##   # ... with 24 more columns
## )
## See spec(...) for full column specifications.

As shown above, Readr prints the column specification it used, listing the data type guessed for each variable. For wide files this output is truncated, so we’ll use the spec() function to get the full list.

spec(fifa)
## cols(
##   Row = col_double(),
##   ID = col_double(),
##   Name = col_character(),
##   Age = col_double(),
##   Photo = col_character(),
##   Nationality = col_character(),
##   Flag = col_character(),
##   Overall = col_double(),
##   Potential = col_double(),
##   Club = col_character(),
##   `Club Logo` = col_character(),
##   Value = col_character(),
##   Wage = col_character(),
##   Special = col_double(),
##   `Preferred Foot` = col_character(),
##   `International Reputation` = col_double(),
##   `Weak Foot` = col_double(),
##   `Skill Moves` = col_double(),
##   `Work Rate` = col_character(),
##   `Body Type` = col_character(),
##   `Real Face` = col_character(),
##   Position = col_character(),
##   `Jersey Number` = col_double(),
##   Joined = col_character(),
##   `Loaned From` = col_character(),
##   `Contract Valid Until` = col_character(),
##   Height = col_character(),
##   Weight = col_character(),
##   LS = col_character(),
##   ST = col_character(),
##   RS = col_character(),
##   LW = col_character(),
##   LF = col_character(),
##   CF = col_character(),
##   RF = col_character(),
##   RW = col_character(),
##   LAM = col_character(),
##   CAM = col_character(),
##   RAM = col_character(),
##   LM = col_character(),
##   LCM = col_character(),
##   CM = col_character(),
##   RCM = col_character(),
##   RM = col_character(),
##   LWB = col_character(),
##   LDM = col_character(),
##   CDM = col_character(),
##   RDM = col_character(),
##   RWB = col_character(),
##   LB = col_character(),
##   LCB = col_character(),
##   CB = col_character(),
##   RCB = col_character(),
##   RB = col_character(),
##   Crossing = col_double(),
##   Finishing = col_double(),
##   HeadingAccuracy = col_double(),
##   ShortPassing = col_double(),
##   Volleys = col_double(),
##   Dribbling = col_double(),
##   Curve = col_double(),
##   FKAccuracy = col_double(),
##   LongPassing = col_double(),
##   BallControl = col_double(),
##   Acceleration = col_double(),
##   SprintSpeed = col_double(),
##   Agility = col_double(),
##   Reactions = col_double(),
##   Balance = col_double(),
##   ShotPower = col_double(),
##   Jumping = col_double(),
##   Stamina = col_double(),
##   Strength = col_double(),
##   LongShots = col_double(),
##   Aggression = col_double(),
##   Interceptions = col_double(),
##   Positioning = col_double(),
##   Vision = col_double(),
##   Penalties = col_double(),
##   Composure = col_double(),
##   Marking = col_double(),
##   StandingTackle = col_double(),
##   SlidingTackle = col_double(),
##   GKDiving = col_double(),
##   GKHandling = col_double(),
##   GKKicking = col_double(),
##   GKPositioning = col_double(),
##   GKReflexes = col_double(),
##   `Release Clause` = col_character()
## )
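
The specification returned by spec() is an ordinary R object, so it can be stored and passed back as col_types on a later import. The sketch below shows this on a small made-up file; the names csv_file, first_read, guessed and second_read are illustrative.

```r
library(readr)

# Made-up data written to a temporary file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,Ana", "2,Luis"), csv_file)

# First import: types are guessed.
first_read <- read_csv(csv_file)

# spec() returns the column specification that was used for the import.
guessed <- spec(first_read)

# Passing the specification back as col_types skips the guessing step and
# keeps the types stable if the file is read again.
second_read <- read_csv(csv_file, col_types = guessed)
```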

There are more variables than we need, and many of them have a data type that isn’t optimal. Next we’ll import a subset of variables and manually assign data types.

Filtered import

fifa_subset <- read_csv(
  file = "fifa.csv",
  col_names = TRUE,
  # cols_only() imports just the listed columns and drops the rest
  col_types = cols_only(
    Name = col_character(),
    Age = col_integer(),
    Nationality = col_factor(),
    Club = col_factor(),
    Joined = col_date(format = "%b %d, %Y")
  )
)

We now have a data frame with our five variables, each with an appropriate data type.

head(fifa_subset)
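
When a value does not match the requested type, Readr stores NA and records the failure, which can be inspected with problems(). The sketch below demonstrates this on made-up data that imitates a few of the FIFA columns; the second row deliberately contains a date that cannot be parsed.

```r
library(readr)

# Made-up data; "unknown" does not match the date format used below.
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Name,Age,Club,Joined",
    "Ana,31,\"Club A\",\"Jul 15, 2004\"",
    "Luis,27,\"Club B\",unknown"),
  csv_file
)

small <- read_csv(
  csv_file,
  col_types = cols_only(
    Name   = col_character(),
    Joined = col_date(format = "%b %d, %Y")
  )
)

names(small)      # only the columns named in cols_only() are kept
small$Joined      # the unparseable value becomes NA
problems(small)   # lists the rows and columns that failed to parse
```

Checking problems() after an import with explicit types is a cheap way to confirm that a format string such as "%b %d, %Y" really matches the data.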

Saving data frames to file

Perhaps we have finished with our analysis and we’d like to share our data with a colleague. We can save our data to a file using Readr’s write_* functions.

In this case, we can save it as a tab-delimited file by using the following code.

write_tsv(
  fifa_subset,
  "fifa_subset.tsv"  # output file; this argument is named `path` in readr 1.3.1 and `file` from readr 1.4.0
)
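
A flat file stores only text, so column types such as factors are not preserved when the file is read back. The following sketch, using a made-up data frame and a temporary file, shows a round trip and how the types can be restored by supplying col_types again.

```r
library(readr)

# Made-up data frame with a factor and a date column.
df <- data.frame(
  Name = c("Ana", "Luis"),
  Club = factor(c("Club A", "Club B")),
  Joined = as.Date(c("2004-07-15", "2019-11-17")),
  stringsAsFactors = FALSE
)

# The output location is passed positionally so the call does not depend
# on the argument name, which differs between readr versions.
out_file <- tempfile(fileext = ".tsv")
write_tsv(df, out_file)

# Without col_types, the factor comes back as a character column.
plain <- read_tsv(out_file)
class(plain$Club)

# Supplying the specification restores the intended types.
restored <- read_tsv(
  out_file,
  col_types = cols(
    Name = col_character(),
    Club = col_factor(),
    Joined = col_date()
  )
)
class(restored$Club)
```

Sharing the col_types specification alongside the file lets a colleague reproduce the same data types on their side.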

References

https://readr.tidyverse.org/

https://readr.tidyverse.org/reference/cols.html

https://www.kaggle.com/karangadiya/fifa19

https://www.ibm.com/support/knowledgecenter/en/SSQHWE_10.1.0/com.ibm.ondemand.mp.doc/arsa0257.htm