title: “R Import Export Tutorial & Examples” output: html_notebook —

Starting points

Working directory

Determin:

getwd()

getwd()

## [1] "C:/Users/sasch/dev.local/r.dev/rstudio/000_Tutorials"

Set:

setwd("C:/Users/sasch/dev.local/r.dev/rstudio/000_Tutorial")

Set Default WD in RStudio

A default working directory is a folder where RStudio goes, every time you open it. You can change the default working directory from RStudio menu under: Tools –> Global options –> click on “Browse” to select the default working directory you want.

Prepare Excel Data

Read https://www.sthda.com/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r

Regular Read tabular data into R

The R base function read.table() is a general function that can be used to read a file in table format. The data will be imported as a data frame. Note that, depending on the format of your file, several variants of read.table() are available to make your life easier, including read.csv(), read.csv2(), read.delim() and read.delim2().

read.csv(): for reading “comma separated value” files (“.csv”).
read.csv2(): variant used in countries that use a comma “,” as decimal point and a semicolon “;” as field separators.
read.delim(): for reading “tab-separated value” files (“.txt”). By default, point (“.”) is used as decimal points.
read.delim2(): for reading “tab-separated value” files (“.txt”). By default, comma (“,”) is used as decimal points.

read.table(file, header = FALSE, sep = "", dec = ".")

Read “comma separated value” files (“.csv”)

read.csv(file, header = TRUE, sep = ",", dec = ".", ...)

Or use read.csv2: variant used in countries that use a comma as decimal point and a semicolon as field separator.

read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)

Read TAB delimited files

read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)

file: the path to the file containing the data to be imported into R.
sep: the field separator character. “” is used for tab-delimited file.
header: logical value. If TRUE, read.table() assumes that your file has a header row, so row 1 is the name of each column. If that’s not the
case, you can add the argument header = FALSE.
dec: the character used in the file for decimal points.

Reading a local file

To import a local .txt or a .csv file, the syntax would be:

# Read a txt file, named "mtcars.txt"
my_data <- read.delim("mtcars.txt")

# Read a csv file, named "mtcars.csv"
my_data <- read.csv("mtcars.csv")

# Read a txt file
my_data <- read.delim(file.choose())

# Read a csv file
my_data <- read.csv(file.choose())

Surpress factoring

If your data contains column with text, R may assume that columns as a factors or grouping variables (e.g.: “good”, “good”, “bad”, “bad”, “bad”). If you don’t want your text data to be converted as factors, add stringsAsFactor = FALSE in read.delim(), read.csv() and read.table() functions. In this case, the data frame columns corresponding to string in your text file will be character.

my_data <- read.delim(file.choose(), 
                      stringsAsFactor = FALSE)

If your field separator is for example “|”, it’s possible use the general function read.table() with additional arguments:

my_data <- read.table(file.choose(), 
                      sep ="|", header = TRUE, dec =".")

Fast read usind readr

https://readr.tidyverse.org/

The easiest way to get readr is to install the whole tidyverse:

install.packages("tidyverse")

# Installing
install.packages("readr")
# Loading
library("readr")

The readr package contains functions for reading i) delimited files, ii) lines and iii) the whole file.

Functions for reading delimited files: txt|csv

The function read_delim()[in readr package] is a general function to import a data table into R. Depending on the format of your file, you can also use:

read_csv(): to read a comma (“,”) separated values
read_csv2(): to read a semicolon (“;”) separated values
read_tsv(): to read a tab separated (“”) values
read_fwf(): fixed width files
read_table(): tabular files where colums are separated by white-space.
read_log(): web log files

The simplified format of these functions are, as follow:

# General function
read_delim(file, delim, col_names = TRUE)
# Read comma (",") separated values
read_csv(file, col_names = TRUE)
# Read semicolon (";") separated values
# (this is common in European countries)
read_csv2(file, col_names = TRUE)
    
# Read tab separated values
read_tsv(file, col_names = TRUE)

To import a local .txt or .csv files, the syntax would be:

# Read a txt file, named "mtcars.txt"
my_data <- read_tsv("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read_csv("mtcars.csv")


# Read a txt file
my_data <- read_tsv(file.choose())
# Read a csv file
my_data <- read_csv(file.choose())

If your field separator is for example “|”, it’s possible to use the general function read_delim(), which reads in files with a user supplied delimiter:

my_data <- read_tsv("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)

In the case of parsing problems

my_data <- read_csv(file.choose())
problems(my_data)

Specify column types

There are different types of data: numeric, character, logical, … readr tries to guess automatically the type of data contained in each column. You might see a lot of warnings in a situation where readr has guessed the column type incorrectly. To fix these problems you can use the additional arguments col_type() to specify the data type of each column. The following column types are available:

col_integer(): to specify integer (alias = “i”)
col_double(): to specify double (alias = “d”).
col_logical(): to specify logical variable (alias = “l”)
col_character(): leaves strings as is. Don’t convert it to a factor (alias = “c”).
col_factor(): to specify a factor (or grouping) variable (alias = “f”)
col_skip(): to ignore a column (alias = “-” or “_“)
col_date() (alias = “D”), col_datetime() (alias = “T”) and col_time() (“t”) to specify dates, date times, and times.

An example is as follow (column x is an integer (i) and column treatment = “character” (c):

read_csv("my_file.csv", col_types = cols(
  x = "i", # integer column
  treatment = "c" # character column
))

Reading lines from a file

Function: read_lines().

read_lines(file, skip = 0, n_max = -1L)

file: file path
skip: Number of lines to skip before reading data
n_max: Numbers of lines to read. If n is -1, all lines in file will be read.

# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read lines
my_data <- read_lines(my_file)
head(my_data)

Read whole file

read_file(file)

# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read whole file
read_file(my_file)

Import Excel

http://www.sthda.com/english/wiki/reading-data-from-excel-files-xls-xlsx-into-r

Copying data from Excel and import into R

Windows:

Open the Excel file containing your data: select and copy the data (ctrl + c)
Type the R code below to import the copied data from the clipboard into R and store the data in a data frame (my_data):

my_data <- read.table(file = "clipboard", 
                      sep = "\t", header=TRUE)

OSX:
1. Select and copy the data (Cmd + c) 2. Use the function pipe(pbpaste) to import the data you’ve copied (with Cmd + c):

my_data <- read.table(pipe("pbpaste"), sep="\t", header = TRUE)

Read excel

install.packages("readxl")
library("readxl")

Read both xls and xlsx files

# Loading
library("readxl")

# xls files
my_data <- read_excel("my_file.xls")

# xlsx files
my_data <- read_excel("my_file.xlsx")


# Or interactively: 
my_data <- read_excel(file.choose())

Specify sheet with a number or name:

# Specify sheet by its name
my_data <- read_excel("my_file.xlsx", sheet = "data")
  
# Specify sheet by its index
my_data <- read_excel("my_file.xlsx", sheet = 2)

Case of missing values: NA (not available). If NAs are represented by something (example: “—”) other than blank cells, set the na argument:

my_data <- read_excel("my_file.xlsx", na = "---")

Exporting

http://www.sthda.com/english/wiki/exporting-data-from-r

Using readr

Installing and loading readr: install.packages(“readr”)
readr functions for writing data: write_tsv(), write_csv()
Writing data to a file

# Loading mtcars data
data("mtcars")
library("readr")
# Writing mtcars data to a tsv file
write_tsv(mtcars, path = "mtcars.txt")
# Writing mtcars data to a csv file
write_csv(mtcars, path = "mtcars.csv")

To Excel using xlsx

library("xlsx")
# Write the first data set in a new workbook
write.xlsx(USArrests, file = "myworkbook.xlsx",
      sheetName = "USA-ARRESTS", append = FALSE)
# Add a second data set in a new worksheet
write.xlsx(mtcars, file = "myworkbook.xlsx", 
           sheetName="MTCARS", append=TRUE)

To rdata format

Save one object to a file: saveRDS(object, file), readRDS(file)
Save multiple objects to a file: save(data1, data2, file), load(file)
Save your entire workspace: save.image(), load()

Saving and restoring one single R object:

# Save a single object to a file
saveRDS(mtcars, "mtcars.rds")
# Restore it under a different name
my_data <- readRDS("mtcars.rds")

Saving and restoring one or more R objects:

# Save multiple objects
save(data1, data2, file = "data.RData")
# To load the data again
load("data.RData")

Saving and restoring your entire workspace:

# Save your workspace
save.image(file = "my_work_space.RData")
# Load the workspace again
load("my_work_space.RData")