readr vs. Base R

Base R functions are the built-in functions that are available as soon as you install R. You do not need to install or load any packages before using them.

According to the readr documentation, readr's reading functions are typically around 10× faster than their base R equivalents such as read.csv(). To use them, install the readr package once, then load it in each R session with the following commands:

  • install.packages("readr")

  • library(readr)
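The sketch below puts the two approaches side by side. It assumes readr is installed. It writes the built-in iris data to a temporary file, which stands in for a real data file, and reads that file back with base read.csv() and with readr's read_csv(). One visible difference is the return type: read.csv() gives a plain data.frame, while read_csv() gives a tibble.

```r
library(readr)

# Write the built-in iris data to a temporary CSV file (stand-in for a real file)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Base R: no package needed, returns a plain data.frame
base_df <- read.csv(csv_path)

# readr: returns a tibble, which is also a data.frame
readr_df <- read_csv(csv_path)

class(base_df)   # "data.frame"
class(readr_df)  # includes "tbl_df" and "data.frame"
```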

Reading csv Files with readr

Let's read the iris.csv file using the read_csv() function.

Make sure the package is loaded.

#install.packages("readr")
library(readr)

Load the csv file into a data frame

iris <- read_csv("iris.csv")
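By default, read_csv() guesses each column's type from the data. The minimal sketch below pins the types explicitly with cols() instead. It assumes readr is installed and uses a temporary copy of the built-in iris data in place of iris.csv, so it runs without the original file.

```r
library(readr)

# Temporary stand-in for iris.csv, built from R's bundled iris data
csv_path <- tempfile(fileext = ".csv")
write_csv(iris, csv_path)

# Declare column types up front: numeric measurements, Species as a factor
iris_typed <- read_csv(
  csv_path,
  col_types = cols(
    Sepal.Length = col_double(),
    Sepal.Width  = col_double(),
    Petal.Length = col_double(),
    Petal.Width  = col_double(),
    Species      = col_factor()
  )
)

str(iris_typed)
```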

Reading Excel Files with readxl

To read in Excel data with readxl you can use the read_excel() function.

Make sure packages are loaded

#install.packages("readxl")
library(readxl)

List the sheets in the Excel file

excel_sheets("iris.xlsx")
## [1] "iris"
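When a workbook has several sheets, the output of excel_sheets() can drive a loop that reads each one. The sketch below assumes readxl is installed. In place of iris.xlsx it uses readxl_example("datasets.xlsx"), an example workbook bundled with readxl.

```r
library(readxl)

# Example workbook shipped with readxl (stand-in for your own file)
path <- readxl_example("datasets.xlsx")

# Names of all sheets in the workbook
sheets <- excel_sheets(path)

# Read every sheet into a named list of data frames
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets

# Number of rows in each sheet
sapply(all_sheets, nrow)
```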

Load the xlsx file into a data frame

iris2 <- read_excel("iris.xlsx", sheet = "iris")

Load the data, skipping the first row of the sheet

iris3 <- read_excel("iris.xlsx", sheet = "iris", skip = 1)
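On a sheet like this one, skip = 1 drops the header row, and with the default col_names = TRUE the first data row is then used as column names. The sketch below shows two ways around that. It assumes readxl and its bundled datasets.xlsx example, and it borrows the column names from R's built-in iris data.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# Skip the header row and supply the column names ourselves,
# so no data row is consumed as a header
iris_noheader <- read_excel(
  path,
  sheet = "iris",
  skip = 1,
  col_names = names(iris)
)

# Alternatively, read only a rectangular cell range (Excel A1 notation);
# row 1 of the range is used as the header
iris_corner <- read_excel(path, sheet = "iris", range = "A1:C6")

dim(iris_noheader)  # 150 rows, 5 columns
dim(iris_corner)    # 5 rows, 3 columns
```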

Reading Statistical Data with foreign

The foreign package provides functions that help you read data files from other statistical software such as SPSS, SAS, Stata, and others into R.

To import an SPSS data file (.sav) into R, we load the foreign package and then use the read.spss() function. Similarly, to import a Stata data file (.dta), the corresponding function is read.dta(), which reads files from Stata versions 5 to 12.

Here is an example of importing an SPSS data file.

#install.packages("foreign")
library(foreign)

Read the SPSS data file and store it as a data frame

iris_spss <- read.spss("iris.sav", to.data.frame = TRUE)
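If no .sav file is at hand, the Stata route can be tried in a self-contained way. In the sketch below, foreign's write.dta() writes the built-in iris data to a temporary .dta file, and read.dta() reads it back. foreign is a recommended package that ships with standard R installations. Because Stata has its own variable-naming rules, the column names may be adjusted on export.

```r
library(foreign)  # recommended package, bundled with standard R installs

# Write the built-in iris data to a temporary Stata file
dta_path <- tempfile(fileext = ".dta")
write.dta(iris, dta_path)

# Read it back; factors are rebuilt from the Stata value labels
iris_dta <- read.dta(dta_path)

str(iris_dta)
```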

Reading From Databases

One common and reliable approach for working with data from a database is to export the data from the database to a text file (for example, a .csv file) and then import that file into R with the functions described above.
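When a database driver is available, a query result can also be read straight into a data frame through the DBI interface, without the intermediate text file. The minimal sketch below assumes the DBI and RSQLite packages are installed. An in-memory SQLite database loaded with mtcars stands in for a real database.

```r
library(DBI)

# In-memory SQLite database as a stand-in for a real database server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Populate it with a built-in data set
dbWriteTable(con, "mtcars", mtcars)

# Run a query and get the result back as a data frame
six_cyl <- dbGetQuery(con, "SELECT mpg, cyl, hp FROM mtcars WHERE cyl = 6")

# Close the connection when done
dbDisconnect(con)

head(six_cyl)
```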

Scraping Data from the Web

Reading an online .csv or .txt file works just like reading a local one. The only difference is that we provide the URL of the data instead of a file name, as follows:

csv Files

Save the URL of the online csv file

url <- "https://data.gov.au/dataset/29128ebd-dbaa-4ff5-8b86-d9f30de56452/resource/cf663ed1-0c5e-497f-aea9-e74bfda9cf44/download/otptimeseriesweb.csv"

Use read.csv to import

ontime_data <- read.csv(url, stringsAsFactors = FALSE)
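For larger files, or to avoid downloading the data every time a script runs, one option is to download the file once with download.file() and then read the local copy. The helper read_online_csv and its cache_path argument are hypothetical names introduced only for this sketch.

```r
# Hypothetical helper: download a remote CSV once, then read the local copy
read_online_csv <- function(url, cache_path) {
  if (!file.exists(cache_path)) {
    # mode = "wb" copies the bytes unchanged (important on Windows)
    download.file(url, destfile = cache_path, mode = "wb", quiet = TRUE)
  }
  read.csv(cache_path, stringsAsFactors = FALSE)
}

# Usage with the URL saved above (requires an internet connection):
# ontime_data <- read_online_csv(url, "otptimeseriesweb.csv")
```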

Scraping HTML Table Data

Sometimes a web page contains several HTML tables, and we want to read the data from one of them. The simplest way to scrape HTML table data directly into R is with the rvest package. Recall that HTML tables are contained within <table> tags; therefore, to extract the tables, we use the html_nodes() function to select the <table> nodes.

#install.packages("rvest")
library(rvest)

We use read_html() to download and parse the web page. The parsed page contains every table node on the web page, so html_nodes() can capture all of them.

# births <- read_html("https://www.ssa.gov/oact/babynames/numberUSbirths.html")

We can use length() to count how many tables html_nodes() captures. For this page, at the time of writing, it captures one HTML table.

# length(html_nodes(births, "table"))

Because the web page contains only one table, and that table holds our data, we select the first element returned by html_nodes() and convert it to a data frame with html_table().

# births_data <- html_table(html_nodes(births, "table")[[1]])
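To try the same steps without depending on a live website, note that read_html() also accepts a literal string of HTML. The small table below is made-up toy data, and the sketch assumes rvest is installed.

```r
library(rvest)

# Toy HTML page with one table (made-up values, not real data)
page <- read_html(
  "<html><body>
     <table>
       <tr><th>id</th><th>value</th></tr>
       <tr><td>1</td><td>10</td></tr>
       <tr><td>2</td><td>20</td></tr>
     </table>
   </body></html>"
)

# Select all <table> nodes, then convert the first one to a data frame
tables <- html_nodes(page, "table")
length(tables)  # 1

# The <th> row becomes the column names
toy_data <- html_table(tables[[1]])
toy_data
```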

Exporting Data to csv using readr

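Make sure readr is loaded. In the examples below, df stands for the data frame you want to export. Recent readr versions take the output location as file =; older versions used path =.

#library(readr)

Write to a csv file in the working directory

# write_csv(df, file = "cars_csv2.csv")

Write to a csv file saved in a different directory (e.g., ~/Desktop)

# write_csv(df, file = "~/Desktop/export_csv2.csv")

Write to a csv file without column names

# write_csv(df, file = "export_csv2.csv", col_names = FALSE)

Write to a space-delimited txt file in the working directory (write_delim() uses delim = " " by default)

# write_delim(df, file = "export_txt2.txt")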
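Here is a runnable version of these export steps. It assumes readr is installed and uses the built-in mtcars data with temporary file paths. write_csv() never writes row names, so the car names stored as row names in mtcars are first moved into a regular column; the column name model is our own choice.

```r
library(readr)

# write_csv() ignores row names, so keep the car names as a real column
cars_df <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# Temporary output paths (stand-ins for files in your working directory)
csv_out <- tempfile(fileext = ".csv")
tsv_out <- tempfile(fileext = ".txt")

write_csv(cars_df, csv_out)   # comma-separated, with a header row
write_tsv(cars_df, tsv_out)   # tab-separated text file

# Read the CSV back to check the round trip
cars_back <- read_csv(csv_out, col_types = cols())
```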

Exporting Data to Excel Files with xlsx

Load the library

# library(xlsx)

Write to a .xlsx file in the working directory

# write.xlsx(df, file = "cars.xlsx")

Write to a .xlsx file without row names in the working directory

# write.xlsx(df, file = "cars.xlsx", row.names = FALSE)

In some cases we may wish to create a .xlsx file that contains multiple data frames.

In this case, we can create an empty workbook and save each data frame on a separate worksheet within the same workbook.

Let's try it using the built-in mtcars and iris data sets.

Create an empty workbook with createWorkbook()

# multiple_df <- createWorkbook()

Create worksheets within the workbook

# car_sheet <- createSheet(wb = multiple_df, sheetName = "Cars")
# iris_sheet <- createSheet(wb = multiple_df, sheetName = "Iris")

We will use addDataFrame() to add the data frames to the worksheets as follows.

# addDataFrame(x = mtcars, sheet = car_sheet)

# addDataFrame(x = iris, sheet = iris_sheet)

Save as a .xlsx file in the working directory

# saveWorkbook(multiple_df, file = "combined.xlsx")
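The xlsx package depends on a working Java installation through rJava. If Java is not available, the writexl package can be used instead; it is a different package, swapped in here as an alternative. writexl writes a named list of data frames to a single workbook, one sheet per list element. The sketch below assumes writexl and readxl are installed and writes to a temporary file. Note that writexl does not write row names, so the car names in mtcars are dropped.

```r
library(writexl)
library(readxl)

out_path <- tempfile(fileext = ".xlsx")

# Each named list element becomes its own worksheet (row names are not written)
write_xlsx(list(Cars = mtcars, Iris = iris), path = out_path)

# Check the result by listing the sheets with readxl
excel_sheets(out_path)  # "Cars" "Iris"
```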

Saving Data as an R Object File

Sometimes we need to save data or other R objects outside the workspace, for example to store them, share them, or transfer them between computers.

We can use the .rda or .RData file types when we want to save several, or all, of the objects and functions that exist in the global environment.

On the other hand, if we only want to save a single R object such as a data frame, function, or statistical model result, it is best to use the .rds file type.

We can still use .rda or .RData to save a single object, but the benefit of .rds is that it saves only a representation of the object and not its name, whereas .rda and .RData save both the object and its name.

As a result, with .rds the saved object can be loaded into a named object within R that is different from the name it had when originally saved.

Generate 10 random numbers from the uniform distribution on [0, 1] and 10 from the standard normal distribution, and assign them to objects named x and y, respectively.

# x <- runif(10)
# y <- rnorm(10, 0, 1)

Save both objects in .RData format in the working directory

# save(x, y, file = "xy.RData")

Also, the save.image() function saves the entire current workspace (the global environment) to a file named .RData in the working directory.

# save.image()
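Objects saved with save() are restored with load(). The small self-contained sketch below writes to a temporary file instead of the working directory. Loading into a fresh environment, as done here, avoids silently overwriting objects with the same names in the global environment.

```r
x <- runif(10)
y <- rnorm(10, 0, 1)

rdata_path <- tempfile(fileext = ".RData")
save(x, y, file = rdata_path)

# load() restores objects under their original names and returns those names
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)

loaded_names                # "x" "y"
identical(restored$x, x)    # TRUE
```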

Save a single object to file using saveRDS()

# saveRDS(x, "x.rds")
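Because an .rds file stores only the object and not its name, readRDS() returns the object itself, and it can be assigned to any name. The sketch below shows this with a temporary file; the name uniform_draws is arbitrary.

```r
x <- runif(10)

rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)

# readRDS() returns the object, so we choose the name when loading
uniform_draws <- readRDS(rds_path)

identical(uniform_draws, x)  # TRUE
```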