At the end of the lesson, the students are expected to:
In this section, you will learn how to prepare your data to avoid errors when importing a file into R.
1.) Row and column names:
Use the first row as column headers (or column names). Generally, columns represent variables.
Use the first column as row names. Generally rows represent observations.
Each row name should be unique, so remove duplicated names.
Column names should be compatible with R naming conventions. As illustrated below, our data contains some issues that should be fixed before importing:
Figure 1
2.) Naming conventions
Avoid names with blank spaces. Good column names: Long_jump or Long.jump. Bad column name: Long jump.
Avoid names with special symbols: ?, $, *, +, #, (, ), -, /, }, {, |, >, < etc.
Avoid beginning variable names with a number. Use a letter instead. Good column names: sport_100m or x100m. Bad column name: 100m.
Column names must be unique. Duplicated names are not allowed.
R is case sensitive. This means that name is different from Name or NAME.
Avoid blank rows in your data.
Delete any comments in your file.
Replace missing values with NA (for not available).
If you have a column containing dates, use a four-digit year. Good format: 01/01/2016. Bad format: 01/01/16.
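To see how R itself treats names that break the rules above, here is a minimal sketch using the base function make.names(). The header values in raw_names are made up for illustration; they are not taken from the data in Figure 1.

```r
# Hypothetical raw headers that break the naming rules listed above
raw_names <- c("Long jump", "100m", "High-jump", "Name", "Name")

# make.names() replaces invalid characters with "." and prefixes "X"
# to names that start with a digit; unique = TRUE de-duplicates names
clean_names <- make.names(raw_names, unique = TRUE)
print(clean_names)
# "Long.jump" "X100m" "High.jump" "Name" "Name.1"

# R is case sensitive: these count as three different names
n_distinct_names <- length(unique(c("name", "Name", "NAME")))
print(n_distinct_names)
```

Fixing the headers by hand in the file, as described above, is still preferable, because you choose the final names yourself instead of accepting the automatic ones.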
3.) Final File:
Our final file should look like this:
Figure 2
4.) Save the file.
In this section, you’ll learn how to import data from .txt (tab-separated values) and .csv (comma-separated values) file formats into R.
The R base function read.table() is a general function that can be used to read a file in table format. The data will be imported as a data frame.
Note that, depending on the format of your file, several variants of read.table() are available to make your life easier, including read.csv(), read.csv2(), read.delim() and read.delim2().
The simplified formats of these functions are as follows:
# Read tabular data into R
read.table(file, header = FALSE, sep = "", dec = ".")
# Read "comma separated value" files (".csv")
read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
# Or use read.csv2: variant used in countries that
# use a comma as decimal point and a semicolon as field separator.
read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
# Read TAB delimited files
read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)
To import a local .txt or a .csv file, the syntax would be:
# Read a txt file, named "mtcars.txt"
my_data <- read.delim("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read.csv("mtcars.csv")
The above R code assumes that the file “mtcars.txt” or “mtcars.csv” is in your current working directory. To find out your current working directory, type getwd() in the R console.
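The following sketch puts the pieces together on a small file so that it runs anywhere. The file contents and the names csv_path, my_data and my_data2 are assumptions made for this example; with your own data you would pass your file name instead.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("athlete,Long_jump,x100m",
             "anna,7.6,11.0",
             "ben,NA,10.8"), csv_path)

# Folder where R looks for relative file names such as "mtcars.csv"
print(getwd())

# header = TRUE is the default for read.csv(); row.names = 1 uses the
# first column as row names; NA in the file is read as a missing value
my_data <- read.csv(csv_path, row.names = 1)
str(my_data)

# The same import spelled out with the general read.table() function
my_data2 <- read.table(csv_path, header = TRUE, sep = ",", dec = ".",
                       row.names = 1)
```

Both calls should give the same data frame here; read.csv() is mainly a shortcut that sets header and sep for you.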
It’s also possible to choose a file interactively by passing file.choose() as the file argument, for example read.csv(file.choose()), which I recommend if you’re a beginner in R programming.
Alternatively, in RStudio you can go to Environment > Import Dataset > From Text, as shown here:
Figure 3
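As a rough sketch of the interactive option with file.choose(): the call opens a file dialog, so it is wrapped in a check that the session is interactive. The variable name chosen_path is only illustrative.

```r
# file.choose() opens a file dialog, so only call it in an interactive session
if (interactive()) {
  chosen_path <- file.choose()       # pick, for example, mtcars.csv in the dialog
  my_data <- read.csv(chosen_path)   # use read.delim() for a tab-separated .txt file
  print(head(my_data))
}
```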
You can import data from Excel through Environment > Import Dataset > From Excel, then choose the file:
Figure 4
If your Excel file contains several sheets, you can specify the name of the sheet that contains the data you want to import into R.
You can also specify the range of cells to include in the import process.
Data from SPSS, SAS and Stata can be imported in the same way, through Environment > Import Dataset > From SPSS, From SAS or From Stata.
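The same sheet and range choices can also be made in code. The sketch below assumes the readxl package is installed (it is not used elsewhere in this lesson) and reads from the example workbook that ships with that package rather than from your own file.

```r
# Assumption: the readxl package is installed; skip quietly if it is not
if (requireNamespace("readxl", quietly = TRUE)) {
  # Example workbook bundled with readxl; replace with the path to your file
  xlsx_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheets available in the workbook
  print(readxl::excel_sheets(xlsx_path))

  # Import only cells A1:C5 of the sheet named "mtcars";
  # the first row of the range is used as column names
  cars_part <- readxl::read_excel(xlsx_path, sheet = "mtcars", range = "A1:C5")
  print(cars_part)
}
```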
Here, you’ll learn how to export data from R to txt, csv, Excel (xls, xlsx) and R data file formats.
Figure 4
You can use the base functions or the functions from the readr package to write data from R to txt.
R base functions for writing data: write.table(), write.csv(), write.csv2()
readr functions for writing data: write_tsv(), write_csv()
1.) Use the R base functions
# Loading mtcars data
data("mtcars")
# Write data to txt file: tab separated values
# sep = "\t"
write.table(mtcars, file = "mtcars.txt", sep = "\t",
row.names = TRUE, col.names = NA)
# Write data to csv files:
# decimal point = "." and value separators = comma (",")
write.csv(mtcars, file = "mtcars.csv")
# Write data to csv files:
# decimal point = comma (",") and value separators = semicolon (";")
write.csv2(mtcars, file = "mtcars.csv")
2.) Using the readr package functions
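A minimal sketch of the readr route, assuming the readr package is installed. Unlike the base functions, readr does not write row names, so the car models are first copied into a regular column; the column name model and the temporary output folder are choices made for this example.

```r
# Assumption: the readr package is installed; skip quietly if it is not
if (requireNamespace("readr", quietly = TRUE)) {
  data("mtcars")

  # readr ignores row names, so keep the car models as a regular column
  cars <- cbind(model = rownames(mtcars), mtcars)

  # Write into a temporary folder so nothing in your project is overwritten
  tsv_path <- file.path(tempdir(), "mtcars.txt")
  csv_path <- file.path(tempdir(), "mtcars.csv")

  readr::write_tsv(cars, tsv_path)   # tab separated values
  readr::write_csv(cars, csv_path)   # comma separated values
}
```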
To write data to an Excel file, we need to install and load the writexl package:
install.packages("writexl")
library(writexl)
# Write the first data set in a new workbook
write_xlsx(USArrests, "myworkbook.xlsx")
You can also give it a named list of data frames, in which case each data frame becomes a sheet in the xlsx file.
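For the named-list case, a short sketch could look like this, assuming writexl is installed. The sheet names arrests and cars are arbitrary, and the file is written to a temporary folder for the sake of the example.

```r
# Assumption: the writexl package is installed; skip quietly if it is not
if (requireNamespace("writexl", quietly = TRUE)) {
  # The names of the list elements become the sheet names in the workbook
  sheets <- list(arrests = USArrests, cars = mtcars)

  xlsx_path <- file.path(tempdir(), "myworkbook.xlsx")
  writexl::write_xlsx(sheets, xlsx_path)
}
```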
Save one object to a file: saveRDS(object, file), readRDS(file)
Save multiple objects to a file: save(data1, data2, file), load(file)
Save your entire workspace: save.image(), load()
1.) Saving and restoring one single R object:
# Save a single object to a file
saveRDS(mtcars, "mtcars.rds")
# Restore it under a different name
my_data <- readRDS("mtcars.rds")
2.) Saving and restoring one or more R objects:
# Save multiple objects
save(data1, data2, file = "data.RData")
# To load the data again
load("data.RData")
3.) Saving and restoring your entire workspace:
# Save your workspace
save.image(file = "my_work_space.RData")
# Load the workspace again
load("my_work_space.RData")
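To make the difference between these functions concrete, here is a small round trip that should run as is. The objects data1 and data2 and the temporary file paths are placeholders chosen for this sketch.

```r
rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

# saveRDS()/readRDS(): one object, restored under any name you like
saveRDS(mtcars, rds_path)
my_data <- readRDS(rds_path)
print(identical(my_data, mtcars))

# save()/load(): several objects, restored under their original names
data1 <- head(mtcars)
data2 <- head(iris)
save(data1, data2, file = rdata_path)
rm(data1, data2)

# load() recreates the objects and returns their names
restored <- load(rdata_path)
print(restored)
```

Because load() silently overwrites objects that have the same names, saveRDS() and readRDS() are often the safer choice for a single data set.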