Extremely Short Intro to R… for Absolute Beginners

Introduction

This mini hands-on tutorial serves as an introduction to R, covering the following topics:

Online sources of information about R;
RStudio projects;
Packages and Repositories;
Getting help;
Basics of R syntax;
R operators: Assignment, Comparison, Logical and Arithmetic;
Loading and Saving data.

This document will guide you through your first steps with R. RStudio will be used as the development platform in this course: it is free software, available for Linux, Mac and Windows, and it integrates many features that make learning easier. You can download it directly from: https://www.rstudio.com/products/rstudio/download/

Keep Calm… and Good Work!

Online Sources and other useful Bibliography

www.r-project.org (The developers of R)
www.statmethods.net (Quick-R)
www.cookbook-r.com (R code “recipes”)
www.bioconductor.org/help/workflows (R code for pipelines of genomic analyses)
Advanced R (CRC Press, Wickham) (if you want to learn R from a programmer's point of view)
Introductory Statistics with R (Springer, Dalgaard)
A First Course in Statistical Programming with R (CUP, Braun and Murdoch)
Computational Genome Analysis: An Introduction (Springer, Deonier, Tavaré and Waterman)
R Programming for Bioinformatics (CRC Press, Gentleman)

Start/Quit RStudio

RStudio can be opened by double-clicking its icon. Alternatively, on Linux and Mac, you can start R by typing R in a terminal, which opens an interactive R console.

To quit R, either close RStudio or, in the terminal, call the q() function; you will be asked whether you want to save the workspace image (i.e. the .RData file):

q()

Save workspace image to ~/path/to/your/working/directory/.RData? [y/n/c]:

If you type y (yes), the entire R workspace is written to the .RData file, which can become very large; in future sessions this lets you quickly reload all data sets and objects without re-computing them. Often, however, it is sufficient to save the analysis protocol in an R source (script) file, which can be re-run to reproduce the results.
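A minimal sketch of both ways of keeping your work, assuming a hypothetical script file my_analysis.R that is written to a temporary folder with writeLines() only for this example (in practice you would type the script in the RStudio editor). source() re-runs a script from scratch, while save() and load() store and restore only selected objects instead of the whole workspace.

```r
# Hypothetical analysis script, written to a temporary folder for this example
script <- file.path(tempdir(), "my_analysis.R")
writeLines(c("a <- 1:10",
             "a.mean <- mean(a)"), script)

# Option 1: re-run the script to re-create its objects from scratch
source(script)
a.mean            # 5.5

# Option 2: save only selected objects (not the whole workspace) to an .RData file
rdata <- file.path(tempdir(), "selected.RData")
save(a, a.mean, file = rdata)
rm(a, a.mean)     # remove them from the workspace...
load(rdata)       # ...and restore them from the file
a.mean            # 5.5 again
```

The script is the more reproducible option; an .RData file is mainly a shortcut when objects are slow to compute.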

Create an RStudio project

To start, open RStudio. RStudio is an Integrated Development Environment (IDE) that includes a syntax-highlighting text editor (1), an R console (2), workspace and history management (3), and tools for plotting and exporting images, browsing the workspace and creating projects (4).

Figure 1: RStudio GUI

Projects are a very useful feature: they make it easy to switch between the analyses of different data sets and let you navigate quickly to each analysis/working directory. To create a new project:

File > New Project... > New Directory > Empty Project
Directory name: r-absoluteBeginners
Create project as a subdirectory of: ~/
                           Browse... (directory/folder to save the workshop data)
Create Project

Each project can be customized through the project menu in the upper-right corner. The general options (R General) are the most important to customize, since they define how RStudio behaves when the project is opened. The following settings are particularly useful:

Restore .RData at startup - Yes (for analyses with more than 1 GB of data, choose "No")
Save .RData on exit - Ask
Always save history - Yes

Figure 2: Customize Project

Package repositories

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Most packages are stored, in an organized way, in online repositories from which they can be easily retrieved and installed on your computer (R packages by Hadley Wickham). There are 2 main R repositories:

The Comprehensive R Archive Network - CRAN (nearly 9200 packages at the time of writing)
Bioconductor (>1200 packages at the time of writing; bioscience data analysis)

This huge variety of packages is one of the reasons why R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package for free.

To set the repositories that you want to use when searching and installing packages:

# To install Bioconductor, run the following code
# (biocLite.R was retired with Bioconductor 3.8; BiocManager replaces it)
install.packages("BiocManager")   # install the Bioconductor package manager from CRAN
BiocManager::install()            # install/update the core Bioconductor packages

## To set the repositories where R should look for packages:
setRepositories()
# then input the numbers corresponding to the requested repositories
# (e.g. 1 2 3 4 5 6 7 to cover most packages; the list shown depends on your R version)

Now, let's install the packages needed for the course exercises:

# Install packages from Bioconductor
# (requires the BiocManager package: install.packages("BiocManager"))
BiocManager::install(c("Biobase", "limma", "Heatplus", "Mfuzz", "cycle"))

# Install packages from CRAN
install.packages("aplpack")   # install the package called aplpack
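Before (re)installing, it can help to check which of the course packages are still missing. A small sketch, where course.pkgs and missing.pkgs are hypothetical names and the package list is the one used in this section; the install line is commented out because it needs internet access and assumes BiocManager is already installed.

```r
# Packages used in the course exercises
course.pkgs <- c("Biobase", "limma", "Heatplus", "Mfuzz", "cycle", "aplpack")

# Which of them are not installed yet?
missing.pkgs <- course.pkgs[!(course.pkgs %in% rownames(installed.packages()))]
missing.pkgs

# BiocManager::install() can install both Bioconductor and CRAN packages;
# uncomment to install only what is missing (requires internet access)
# if (length(missing.pkgs) > 0) BiocManager::install(missing.pkgs)
```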

Getting Help

R has many built-in ways of providing help regarding its functions and packages:

library("limma")          # load the package limma
help(package = "limma")   # help(package = "package_name") gives help about a specific package
browseVignettes("limma")  # list the package's vignettes (long-form guides, often PDFs)
?topTable                 # ?function_name gives quick info about a function (note the capital T)
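Besides ?, a few other base R functions are handy when you are unsure how to use a function. A short sketch using built-in functions (round() and mean() are only examples); the help.search() line is left commented because its results open in the help viewer.

```r
# Show the arguments a function accepts
args(round)

# Run the examples from a help page; local = TRUE keeps the
# example's variables out of your workspace
example(mean, local = TRUE)

# Find functions whose names contain a given word
apropos("mean")

# Search the help pages of all installed packages (same as ??"linear model")
# help.search("linear model")
```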

Basics: General Notes (about R and RStudio)

R is case sensitive - be aware of capital letters (b is different from B).
Everything that follows a # (hash) sign on a line is interpreted as a comment, and is therefore not evaluated; a line that starts with # is entirely a comment.

# This is a comment
# 3 + 4   # this code is not evaluated, so it does not print any result
2 + 3     # the code before the hash sign is evaluated, so it prints the result (value 5)

[1] 5

Expressions in R are evaluated from the innermost parentheses toward the outermost ones, just as in ordinary mathematics; without parentheses, the usual precedence rules apply (^ first, then * and /, then + and -).

# Example with parenthesis:
((2+2)/2)-2

[1] 0

# Without parenthesis:
2+2/2-2

[1] 1
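A few more worked examples of the precedence rules, which you can check by typing them in the console:

```r
2 * 3^2      # 18: the power is computed first (3^2 = 9), then 2 * 9
(2 * 3)^2    # 36: the parentheses force 2 * 3 = 6 first, then 6^2
-2^2         # -4: ^ binds tighter than the minus sign, so this is -(2^2)
(-2)^2       # 4
10 - 4 - 3   # 3: + and - have equal precedence and run left to right, (10 - 4) - 3
```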

Variable names cannot contain spaces; use a dot or an underscore instead, e.g. my.variable_name.
Spaces between variables and operators do not matter: 3+2 is the same as 3 + 2, and function (arg1 , arg2) is the same as function(arg1,arg2). One exception to watch for: x<-3 assigns 3 to x, whereas x < -3 compares x with minus three.
If you want to write 2 expressions/commands on the same line, you have to separate them with a ; (semicolon).

#Example:
3 + 2 ; 5 + 1

[1] 5

[1] 6

Recent versions of RStudio auto-complete your commands, showing possible alternatives as soon as you type 3 consecutive characters; to see the options after typing fewer characters, press Tab. Tip: use auto-complete as much as possible to avoid typing mistakes.
There are 4 main vector data types: Logical (TRUE or FALSE), Numeric (e.g. 1, 2, 3…), Character (e.g. “u”, “alg”, “arve”) and Complex (e.g. 3+2i).
Vectors are ordered sets of elements. In R, vectors are 1-based, meaning that the first index position is number 1 (as opposed to languages such as C or Python, whose indices start at zero).
R objects can be divided into two main groups: functions and data-related objects. Functions receive their arguments inside round brackets ( ), while data objects are indexed (subset) with arguments inside square brackets [ ]:

function (arguments)
data.object [arguments]
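A short sketch of the notes above, using only base R functions (class(), c(), length(), sum()); the vector v is a hypothetical example. It shows one value of each main data type, 1-based indexing, and the difference between calling a function with ( ) and indexing a data object with [ ].

```r
# class() reports the type of a value
class(TRUE)      # "logical"
class(3.5)       # "numeric"
class("alg")     # "character"
class(3+2i)      # "complex"

# c() combines values into a vector; indexing starts at 1
v <- c(10, 20, 30)
v[1]             # 10, the first element
v[3]             # 30, the last element
v[c(1, 3)]       # 10 30

# Functions take their arguments in ( ); data objects are indexed with [ ]
length(v)        # 3
sum(v)           # 60
```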

Basics: Operators

Assignment Operators

Values are assigned to named variables with an <- (arrow) or an = (equal) sign. In most cases they are interchangeable, but it is good practice to use the arrow, since it makes the direction of the assignment explicit. With the equal sign, the value on the right is always assigned to the name on the left (right to left, like x <- 7); the arrow can also point the other way (->) to assign from left to right.

x <- 7     # assign the number 7 to the variable x
x          # R will print the value associated with variable x
y <- 9     # assign the number 9 to the variable y
z = 3      # assign the value 3 to the variable z
42 -> lue  # assign the value 42 to the variable lue
x ->  xx   # assign the value of x (7) to the variable xx, which becomes 7
xx
my_variable = 5   # my_variable has the value 5
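The main case where <- and = are not interchangeable is inside a function call: there, = only matches a value to a named argument, while <- also creates (or overwrites) a variable in your workspace. A small sketch using the built-in mean() function, whose first argument happens to be called x:

```r
x <- 7
mean(x = c(2, 4, 6))    # 4: "x =" here only names mean()'s argument
x                       # still 7
mean(x <- c(2, 4, 6))   # 4: "<-" computes the mean AND assigns to x
x                       # now 2 4 6
```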

Comparison Operators

Comparison operators compare two values and return TRUE or FALSE:

Symbol	Description
`==`	exactly the same (equal)
`!=`	different (not equal)
`<`	smaller than
`>`	greater than
`<=`	smaller than or equal to
`>=`	greater than or equal to

1 == 1   # TRUE
1 != 1   # FALSE
x > 3    # TRUE (x is 7)
y <= 9   # TRUE (y is 9)
my_variable < z   # FALSE (z is 3 and my_variable is 5)
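Comparisons also work on whole vectors, element by element, which is how R selects data. A sketch with a hypothetical vector of ages:

```r
# Hypothetical ages of four people
ages <- c(15, 32, 47, 8)

ages >= 18            # FALSE  TRUE  TRUE FALSE: one result per element
ages[ages >= 18]      # 32 47: keep only the elements where the comparison is TRUE
sum(ages >= 18)       # 2: TRUE counts as 1 and FALSE as 0
```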

Logical Operators

Combine logical (TRUE/FALSE) values:

Symbol	Description
`&`	AND
`|`	OR
`!`	NOT

QUESTION: Are these TRUE, or FALSE?

x < y & x > 10   # AND means that both expressions have to be true
x < y | x > 10   # OR means that at least one expression must be true
!(x != y & my_variable <= y)  # NOT applied to an AND expression
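To check your answers, it may help to see the full truth table of the three operators. In this sketch a and b are hypothetical vectors holding the four possible pairs of logical values, combined element by element:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b    # TRUE FALSE FALSE FALSE: TRUE only where both are TRUE
a | b    # TRUE  TRUE  TRUE FALSE: TRUE where at least one is TRUE
!a       # FALSE FALSE  TRUE  TRUE: each value is flipped
```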

Arithmetic Operators

R makes calculations using the following arithmetic operators:

Symbol	Description
`+`	addition
`-`	subtraction
`*`	multiplication
`/`	division
`^`	exponentiation (power)

3 / y   ## 0.3333333

x * 2   ## 14

3 - 4   ## -1

my_variable + 2   ## 7

2^z   ## 8
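Like comparisons, arithmetic operators are applied element by element when used with vectors. A sketch with a hypothetical vector of heights:

```r
heights.cm <- c(160, 175, 182)   # hypothetical heights in centimetres
heights.cm / 100                 # 1.60 1.75 1.82: every element divided by 100
c(1, 2, 3) + c(10, 20, 30)       # 11 22 33: element-by-element sum
```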

Basics: Loading data and Saving files

Most R users need to load their data sets, usually saved as table files (e.g. comma-separated .csv files, which Excel can open and export), to be able to analyse and manipulate them. After the analysis, the results need to be exported/saved (e.g. to view them in another program).

# Inspect the esoph data set (an R built-in data set, installed together with R)
esoph
dim(esoph)       # number of rows and columns of this data set
colnames(esoph)  # the names of the columns

### Saving ###
# Save the esoph data set to a file named esophData.csv, with columns separated
# by commas and without quotes (the file is saved in the current working directory)
write.table(esoph, file = "esophData.csv", sep = ",", quote = FALSE)

# Save the esoph data set to a file named esophData.tab, with columns separated
# by tabs and without quotes (the file is saved in the current working directory)
write.table(esoph, file = "esophData.tab", sep = "\t", quote = FALSE)

### Loading ###
# Load a data file into R (the file should be in the working directory)
# read a table with columns separated by tabs
my.data.tab <- read.table("esophData.tab", sep = "\t", header = TRUE)
# read a table with columns separated by commas
my.data.csv <- read.csv("esophData.csv", header = TRUE)
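After loading a file, it is worth checking that the data came back as expected. A sketch that writes esoph to a temporary folder (so the working directory stays clean) with write.csv(), reads it back, and compares the dimensions; tmp.file and esoph.back are hypothetical names. Note that since R 4.0.0 text columns are read as character rather than factor, so str() will show the esoph factor columns as chr.

```r
# Write the data set to a temporary folder, so the working directory stays clean
tmp.file <- file.path(tempdir(), "esophData.csv")
write.csv(esoph, tmp.file, row.names = FALSE)   # write.csv always uses commas

# Read it back and compare with the original
esoph.back <- read.csv(tmp.file)
dim(esoph.back)       # 88 5, the same as dim(esoph)
str(esoph.back)       # column types after reloading
```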

Note: to load or save files in a directory other than the working directory, give the full path (inside quotes) instead of just the file name (e.g. “/home/Desktop/r_Workshop/esophData.csv”).
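To see where R is currently reading and writing files, and to build paths without typing the separators by hand, you can use getwd(), list.files() and file.path(). In this sketch the folder name r-absoluteBeginners is taken from the project created earlier and assumes it sits in your home folder (~); csv.path is a hypothetical name.

```r
getwd()         # the current working directory (in an RStudio project, the project folder)
list.files()    # the files it contains

# file.path() joins folder and file names with "/" separators
csv.path <- file.path("~", "r-absoluteBeginners", "esophData.csv")
csv.path        # "~/r-absoluteBeginners/esophData.csv"

# Uncomment once the file exists at that location ("~" is your home folder)
# my.data.csv <- read.csv(csv.path)
```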

FUTURE learning: R for absolute beginners (online tutorial)

If you are an absolute beginner, and you want to start learning R, you can follow our hands-on online tutorial.

Part 1: Syntax and Data Structures in R
rpubs.com/isabelduarte/r4ab-day1
Part 2: Summary Statistics and Graphics in R (using data from a 2015 study of the effects of seawater acidity on coral reef calcification)
rpubs.com/isabelduarte/r4ab-day2

END