Module 1 Lesson 4: Data Import and Export in R

Lesson Objectives

At the end of the lesson, the students are expected to:

Learn how to import data into R from different sources.
Learn how to export data from R to different file formats.

Importing Data into R

In this section, you will learn how to prepare your data to avoid errors during the importation of a file into R.

Preparing your Datasets

1.) Row and column names:

Use the first row as column headers (or column names). Generally, columns represent variables.
Use the first column as row names. Generally, rows represent observations.
Each row name should be unique, so remove duplicated names.

Column names should be compatible with R naming conventions. As illustrated below, our data contains some issues that should be fixed before importing:

Figure 1

2.) Naming conventions

Avoid names with blank spaces. Good column names: Long_jump or Long.jump. Bad column name: Long jump.
Avoid names with special symbols: ?, $, *, +, #, (, ), -, /, }, {, |, >, < etc.
Avoid beginning variable names with a number. Use a letter instead. Good column names: sport_100m or x100m. Bad column name: 100m.
Column names must be unique. Duplicated names are not allowed.
R is case sensitive. This means that Name is different from name or NAME.
Avoid blank rows in your data.
Delete any comments in your file.
Replace missing values with NA (for not available).
If you have a column containing dates, use a four-digit year. Good format: 01/01/2016. Bad format: 01/01/16.
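
The naming rules above can also be checked from within R. The sketch below uses the base function make.names(), which converts arbitrary strings into syntactically valid and, if requested, unique names. The header vector raw_names is made up for illustration.

```r
# Hypothetical column headers that break the rules above
raw_names <- c("Long jump", "100m", "Name", "Name")

# make.names() replaces invalid characters with "." and prefixes names
# that start with a digit with "X"; unique = TRUE de-duplicates them
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)

# Flag the original headers that were not valid R names or were duplicated
needs_fix <- raw_names != make.names(raw_names) | duplicated(raw_names)
print(raw_names[needs_fix])
```

By default, read.table() and its variants apply the same conversion to column headers (check.names = TRUE), so fixing the names in the file first keeps the imported names predictable.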

3.) Final File:

Our final file should look like this:

Figure 2

4.) Save the file.

Reading Data from TXT|CSV files

In this section, you’ll learn how to import data from .txt (tab-separated values) and .csv (comma-separated values) file formats into R.

R base functions for importing data

The R base function read.table() is a general function that can be used to read a file in table format. The data will be imported as a data frame.

Note that, depending on the format of your file, several variants of read.table() are available to make your life easier, including read.csv(), read.csv2(), read.delim() and read.delim2().

read.csv(): for reading “comma separated value” files (“.csv”).
read.csv2(): variant used in countries that use a comma “,” as the decimal point and a semicolon “;” as the field separator.
read.delim(): for reading “tab-separated value” files (“.txt”). By default, a point (“.”) is used as the decimal point.
read.delim2(): for reading “tab-separated value” files (“.txt”). By default, a comma (“,”) is used as the decimal point.

The simplified formats of these functions are as follows:

# Read tabular data into R
read.table(file, header = FALSE, sep = "", dec = ".")
# Read "comma separated value" files (".csv")
read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
# Or use read.csv2: variant used in countries that 
# use a comma as decimal point and a semicolon as field separator.
read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
# Read TAB delimited files
read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)

file: the path to the file containing the data to be imported into R.
sep: the field separator character. “\t” is used for tab-delimited files.
header: logical value. If TRUE, read.table() assumes that your file has a header row, so row 1 is the name of each column. If that’s not the case, you can add the argument header = FALSE.
dec: the character used in the file for decimal points.
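
To see how header, sep and dec work together, the following sketch reads the same small table twice: once with read.csv2() and once with read.table() with every argument spelled out. It passes the content through the text argument instead of a file, so nothing needs to exist on disk. The values are invented for illustration.

```r
# Semicolon-separated content with decimal commas (made-up values)
csv2_text <- "athlete;long_jump;x100m\nA;7,5;10,9\nB;6,8;11,2"

# read.csv2() defaults to header = TRUE, sep = ";" and dec = ","
d1 <- read.csv2(text = csv2_text)

# The same import with read.table(), spelling out every argument
d2 <- read.table(text = csv2_text, header = TRUE, sep = ";", dec = ",")

# The decimal commas are parsed as numbers, not as text
str(d1)
all.equal(d1, d2)
```

If dec were left at its default of ".", the two measurement columns would be imported as text rather than numbers, which is a common symptom of a wrong dec setting.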

Reading a local file

To import a local .txt or a .csv file, the syntax would be:

# Read a txt file, named "mtcars.txt"
my_data <- read.delim("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read.csv("mtcars.csv")

The R code above assumes that the file “mtcars.txt” or “mtcars.csv” is in your current working directory. To find your current working directory, run the function getwd() in the R console.

It’s also possible to choose a file interactively using the function file.choose(), which I recommend if you’re a beginner in R programming. For example: my_data <- read.delim(file.choose())
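
A minimal sketch of the interactive approach follows. Because file.choose() opens a dialog and only works in an interactive session, the sketch falls back to a small temporary file, created just for this example, when it is run as a script.

```r
# Create a small tab-separated file so the example also runs in scripts
example_path <- file.path(tempdir(), "mtcars_example.txt")
write.table(head(mtcars), example_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# In an interactive session, pick the file in a dialog;
# otherwise use the example file created above
path <- if (interactive()) file.choose() else example_path

# Use read.csv(path) instead if the chosen file is comma-separated
my_data <- read.delim(path)
head(my_data)
```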

Alternatively, in RStudio, go to Environment > Import Dataset > From Text. That is,

Figure 3

Reading a file from the internet

It’s possible to use the functions read.delim(), read.csv() and read.table() to import files from the web.

my_data <- read.delim("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)

Reading Data From Excel Files (xls|xlsx) into R

You can import data from Excel through Environment > Import Dataset > From Excel, then choose the file.

Figure 4

If your Excel file contains several sheets, you can specify the name of the sheet that contains the data you want to import into R.
You can also specify the range of cells to include in the import.
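
The same import can be scripted. The sketch below assumes the readxl package is installed and uses the example workbook that ships with readxl rather than a file of your own. The sheet name and cell range are chosen only for illustration.

```r
# Assumes readxl is installed: install.packages("readxl")
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl; replace with your own path
  path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets available in the workbook
  sheets <- readxl::excel_sheets(path)
  print(sheets)

  # Import one sheet by name, restricted to a range of cells
  my_data <- readxl::read_excel(path, sheet = "mtcars", range = "A1:C5")
  print(my_data)
}
```

The first row of the range is treated as the header, so the range A1:C5 yields three columns and four data rows.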

Reading Data from SPSS, Stata and SAS

Data from SPSS, Stata and SAS can also be imported through Environment > Import Dataset > From SPSS, From SAS or From Stata.
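
For scripted imports, one option is the haven package, which provides read_sav(), read_dta() and read_sas(). The sketch below assumes haven is installed. Since no SPSS, Stata or SAS file accompanies this lesson, it first writes a small Stata file to a temporary location and then reads it back.

```r
# Assumes haven is installed: install.packages("haven")
if (requireNamespace("haven", quietly = TRUE)) {
  # Create a Stata file to read back (a stand-in for a real .dta file)
  dta_path <- file.path(tempdir(), "mtcars.dta")
  haven::write_dta(mtcars, dta_path)

  # Read the Stata file; read_sav() and read_sas() are used the same
  # way for SPSS (.sav) and SAS (.sas7bdat) files
  my_data <- haven::read_dta(dta_path)
  print(head(my_data))
}
```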

Exporting Data from R

Here, you’ll learn how to export data from R to txt, csv, Excel (xls, xlsx) and R data file formats.

Figure 4

Writing Data from R to txt

You can use the base functions or the functions from the readr package to write data from R to txt.

R base functions for writing data: write.table(), write.csv(), write.csv2()
readr functions for writing data: write_tsv(), write_csv()

1.) Use the R base functions

# Loading mtcars data
data("mtcars")
# Write data to txt file: tab separated values
# sep = "\t"
write.table(mtcars, file = "mtcars.txt", sep = "\t",
            row.names = TRUE, col.names = NA)
# Write data to csv files:  
# decimal point = "." and value separators = comma (",")
write.csv(mtcars, file = "mtcars.csv")
# Write data to csv files: 
# decimal point = comma (",") and value separators = semicolon (";")
# Note: this overwrites the mtcars.csv file written above
write.csv2(mtcars, file = "mtcars.csv")
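
A quick way to confirm that an export worked is to read the file back and compare it with the original. The sketch below writes to a temporary file (the file name is arbitrary) so that nothing in your working directory is changed.

```r
# Write mtcars to a temporary csv file (row names go into the first column)
csv_path <- file.path(tempdir(), "mtcars_check.csv")
write.csv(mtcars, file = csv_path)

# Read it back, telling read.csv() that column 1 holds the row names
back <- read.csv(csv_path, row.names = 1)

# Compare dimensions, row names and values with the original
dim(back)
identical(rownames(back), rownames(mtcars))
all.equal(back, mtcars)
```

Without row.names = 1, the car names would come back as an ordinary first column named X instead of as row names.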

2.) Using the readr package functions

# Loading mtcars data
data("mtcars")
library("readr")
# Note: readr does not write row names, so the car names are dropped
# Writing mtcars data to a tsv file
write_tsv(mtcars, file = "mtcars.txt")
# Writing mtcars data to a csv file
write_csv(mtcars, file = "mtcars.csv")

Writing data from R to Excel files (xls|xlsx)

To do this, we need to install the writexl package:

install.packages("writexl")

library(writexl)
# Write the first data set in a new workbook
write_xlsx(USArrests,"myworkbook.xlsx")

You can also give it a named list of data frames, in which case each data frame becomes a sheet in the xlsx file:

write_xlsx(list(iris = iris, cars = cars, mtcars = mtcars), "mydata.xlsx")

Saving data into R data format: RDATA and RDS

Save one object to a file: saveRDS(object, file), readRDS(file)
Save multiple objects to a file: save(data1, data2, file = file), load(file)
Save your entire workspace: save.image(), load()

1.) Saving and restoring one single R object:

# Save a single object to a file
saveRDS(mtcars, "mtcars.rds")
# Restore it under a different name
my_data <- readRDS("mtcars.rds")
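
Unlike a text export, an RDS file stores the object itself, so column types, attributes and row names come back unchanged. The following sketch checks this with identical(). It writes to a temporary file, which is an assumption made to keep the example self-contained.

```r
# Save to a temporary file so nothing in the working directory is touched
rds_path <- file.path(tempdir(), "mtcars.rds")
saveRDS(mtcars, rds_path)

# readRDS() returns the object, so it can be assigned to any name
my_data <- readRDS(rds_path)

# Check that the restored object matches the original exactly
identical(my_data, mtcars)
```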

2.) Saving and restoring one or more R objects:

# Save multiple objects
save(data1, data2, file = "data.RData")
# To load the data again
load("data.RData")
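
Note that load() restores objects under their original names and silently replaces any existing objects with the same names. The sketch below shows one way to inspect a file's contents safely by loading it into a separate environment. Here data1 and data2 are hypothetical objects created only for this example.

```r
# Two hypothetical objects standing in for data1 and data2
data1 <- head(mtcars)
data2 <- head(iris)

# The file argument must be named, otherwise it is treated as an object
rdata_path <- file.path(tempdir(), "data.RData")
save(data1, data2, file = rdata_path)

# Load into a separate environment instead of the global workspace;
# load() returns the names of the restored objects
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)
print(loaded_names)
identical(restored$data1, data1)
```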

3.) Saving and restoring your entire workspace:

# Save your workspace
save.image(file = "my_work_space.RData")
# Load the workspace again
load("my_work_space.RData")