This mini hands-on tutorial serves as an introduction to R.
This document will guide you through the initial steps toward using R. RStudio will be used as the development platform in this course since it is free software, available for Linux, Mac and Windows, and integrates many functionalities that facilitate the learning process. You can download it directly from: https://www.rstudio.com/products/rstudio/download/
Keep Calm… and Good Work!
Advanced R (if you want to learn R from a programmer's point of view)
R programming for Bioinformatics (CRC Press, Gentleman)
RStudio can be opened by double-clicking its icon. Alternatively, in Linux and Mac, one can start R by typing R in a terminal, and get an R interactive console.
To quit R in RStudio, just close the application; in the terminal, use the q() function. You will be asked whether you want to save the workspace image (i.e. the .RData file):
q()
Save workspace image to ~/path/to/your/working/directory/.RData? [y/n/c]:
Typing y (yes) writes the entire R workspace to the .RData file, which can become very large. Often it is sufficient to save just the analysis protocol in an R source file, from which all data sets and objects can be quickly recreated in future analyses.
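As a lighter alternative to saving the whole workspace, a single object can be written to disk and restored later. The sketch below uses the base R functions saveRDS() and readRDS(); the object name results and the temporary file are illustrative assumptions, not part of the course material.

```r
# Illustrative object standing in for the result of a long analysis
results <- data.frame(gene = c("a", "b", "c"), score = c(1.5, 2.0, 0.3))

# Write only this object to a file (a temporary file is used here)
rds_file <- tempfile(fileext = ".rds")
saveRDS(results, file = rds_file)

# Later, restore the object without recomputing it
restored <- readRDS(rds_file)
identical(results, restored)   # check that the restored copy matches the original
```

Saving individual objects this way keeps files small and makes it explicit which results a later session depends on.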
To start, we will open RStudio. This is an Integrated Development Environment (IDE) that includes a syntax-highlighting text editor (1), an R console (2), workspace and history management (3), and tools for plotting and exporting images, browsing the workspace and creating projects (4).
Figure 1: RStudio GUI
Projects are a great functionality, easing the transition between data set analyses and allowing fast navigation to your analysis/working directory. To create a new project:
File > New Project... > New Directory > Empty Project
Directory name: r-absoluteBeginners
Create project as a subdirectory of: ~/
Browse... (directory/folder to save the workshop data)
Create Project
Projects should be personalized by clicking on the menu in the upper right corner. The general options (R General) are the most important to customize, since they define the RStudio “behavior” when the project is opened. The following suggestions are particularly useful:
Restore .RData at startup - Yes (for analyses with more than 1 GB of data, you should choose "No")
Save .RData on exit - Ask
Always save history - Yes
Figure 2: Customize Project
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Most packages are stored, in an organized way, in online repositories from which they can be easily retrieved and installed on your computer (R packages by Hadley Wickham). There are 2 main R repositories: CRAN and Bioconductor.
This huge variety of packages is one of the reasons why R is so successful: the chances are that someone has already solved a problem that you’re working on, and you can benefit from their work by downloading their package for free.
To set the repositories that you want to use when searching and installing packages:
# To install Bioconductor, run the following code
# (Bioconductor packages are installed through the BiocManager package from CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install()
## To set the repositories where R should look for packages:
setRepositories()
# then input the numbers corresponding to the requested repositories
# (it is advisable to use repositories 1 2 3 4 5 6 7 to cover most packages)
Now, let's install the packages needed for the course exercises:
# Install packages from Bioconductor (requires the BiocManager package from CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("Biobase", "limma", "Heatplus", "Mfuzz", "cycle"))
# Install packages from CRAN
install.packages ("aplpack") # install the package called aplpack
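Before starting the exercises, it can be useful to check which of the course packages are still missing. The following is a minimal sketch; missing_packages() is a hypothetical helper written for this illustration, built on the base function installed.packages().

```r
# Packages used in the course (names taken from the install commands above)
course_packages <- c("Biobase", "limma", "Heatplus", "Mfuzz", "cycle", "aplpack")

# Hypothetical helper: return the requested packages that are not installed yet
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# An empty result means that every course package is already available
missing_packages(course_packages)
```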
R has many built-in ways of providing help regarding its functions and packages:
library ("limma") # load the library limma
help (package=limma) # help(package="package_name") to get help about a specific package
vignette ("limma") # open the package vignette (a PDF manual shipped with the package)
?topTable # ?function_name to get quick info about the function of interest
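When the exact name of a function is not known, base R also offers ways to search for it. The sketch below uses apropos() to find functions by name pattern and example() to run the examples from a help page; the search pattern is only an illustration. A keyword search across help pages is also possible with help.search("keyword").

```r
# List the names of available functions that start with "read."
read_functions <- apropos("^read\\.")
read_functions

# Run the examples shown at the end of the help page of mean()
example(mean)
```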
# This is a comment
# 3 + 4 # this code is not evaluated, so it does not print any result
2 + 3 # the code before the hash sign is evaluated, so it prints the result (value 5)
[1] 5
# Example with parentheses:
((2+2)/2)-2
[1] 0
# Without parentheses:
2+2/2-2
[1] 1
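The two results above differ because R follows the usual order of operations: powers first, then multiplication and division, then addition and subtraction. A few more cases, given as an illustrative sketch:

```r
2 + 3 * 4      # 14: multiplication is evaluated before addition
(2 + 3) * 4    # 20: parentheses force the addition first
-2^2           # -4: the power is evaluated before the minus sign
(-2)^2         # 4: parentheses make the base negative
2^3^2          # 512: powers are evaluated from right to left, i.e. 2^(3^2)
```

When in doubt, adding parentheses makes the intended order explicit.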
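Variable names can combine letters, numbers, dots and underscores (e.g. my.variable_name).
Spaces are ignored: 3+2 is the same as 3 + 2, and function (arg1 , arg2) is the same as function(arg1,arg2).
Several commands can be written on the same line when separated by a ; (semi-colon).
# Example: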
3 + 2 ; 5 + 1
[1] 5
[1] 6
R objects can be divided into two main groups: Functions and Data-related Objects. Functions receive arguments inside circular brackets ( ) and objects receive arguments inside square brackets [ ]:
function (arguments)
data.object [arguments]
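A small sketch of the two bracket types, assuming an illustrative numeric vector named values (created with the base function c()):

```r
values <- c(4, 8, 15)    # c() is a function: its arguments go inside ( )

total  <- sum(values)    # sum() is a function applied to the data object
second <- values[2]      # [ ] selects the second element of the data object

total                    # 27
second                   # 8
```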
Values are assigned to named variables with an <- (arrow) or an = (equal) sign. In most cases they are interchangeable; however, it is good practice to use the arrow since it is explicit about the direction of the assignment. If the equal sign is used, the assignment always occurs from right to left (the value on the right is assigned to the name on the left).
x <- 7 # assign the number 7 to the variable x
x # R will print the value associated with variable x
y <- 9 # assign the number 9 to the variable y
z = 3 # assign the value 3 to the variable z
42 -> lue # assign the value 42 to the variable lue
x -> xx # assign the value of x (7) to the variable xx, which becomes 7
xx
my_variable = 5 # my_variable has the value 5
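One case where the two assignment signs are not interchangeable is inside a function call. The sketch below uses two hypothetical helper functions, written only for this illustration, to show that = names an argument while <- creates a variable.

```r
# Hypothetical helpers, used only to inspect what each operator creates
with_equal <- function() {
  mean(x = c(2, 4, 6))   # '=' only names the argument x of mean()
  exists("x", envir = environment(), inherits = FALSE)
}
with_arrow <- function() {
  mean(x <- c(2, 4, 6))  # '<-' creates the variable x, then passes it to mean()
  exists("x", envir = environment(), inherits = FALSE)
}

with_equal()   # FALSE: no variable named x was created
with_arrow()   # TRUE: a variable named x now exists inside the function
```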
Comparison operators allow the direct comparison between values:
| Symbol | Description |
|---|---|
| == | exactly the same (equal) |
| != | different (not equal) |
| < | smaller than |
| > | greater than |
| <= | smaller or equal |
| >= | greater or equal |
1 == 1 # TRUE
1 != 1 # FALSE
x > 3 # TRUE (x is 7)
y <= 9 # TRUE (y is 9)
my_variable < z # FALSE (z is 3 and my_variable is 5)
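Comparisons with == are exact, which can be surprising for decimal numbers because they are stored with limited precision. A minimal sketch using the base function all.equal(), which compares numbers within a small tolerance:

```r
0.1 + 0.2 == 0.3                    # FALSE: the stored values differ very slightly
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance
```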
Logical operators compare logical (TRUE or FALSE) values:
| Symbol | Description |
|---|---|
| & | AND |
| \| | OR |
| ! | NOT |
QUESTION: Are these TRUE, or FALSE?
x < y & x > 10 # AND means that both expressions have to be true
x < y | x > 10 # OR means that at least one expression must be true
!(x != y & my_variable <= y) # yet another AND example
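To check how each operator behaves before answering, it may help to try them on plain logical values. In the sketch below, a and b are illustrative variables that are not used elsewhere in this tutorial.

```r
a <- TRUE
b <- FALSE

a & b     # FALSE: AND needs both values to be TRUE
a | b     # TRUE: OR needs at least one value to be TRUE
!a        # FALSE: NOT inverts the value
!(a & b)  # TRUE: the parentheses are evaluated first, then inverted
```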
R makes calculations using the following arithmetic operators:
| Symbol | Description |
|---|---|
| + | addition |
| - | subtraction |
| * | multiplication |
| / | division |
| ^ | exponentiation (power) |
3 / y ## 0.3333333
x * 2 ## 14
3 - 4 ## -1
my_variable + 2 ## 7
2^z ## 8
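Besides the operators in the table, base R also provides integer division (%/%) and the remainder of a division (%%). They are shown here as a short sketch with illustrative numbers:

```r
17 / 5     ## 3.4
17 %/% 5   ## 3  (integer division: how many times 5 fits into 17)
17 %% 5    ## 2  (remainder left after the integer division)
```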
Most R users need to load their data sets, usually saved as table files (e.g. Excel .csv files), to be able to analyse and manipulate them. After the analysis, the results need to be exported/saved (e.g. to view in another program).
# Inspect the esoph data set (an R built-in data set, installed together with R)
esoph
dim(esoph) # number of rows and columns of this dataset
colnames(esoph) # the name of the columns
### Saving ###
# Save the esoph R dataset to a file named esophData.csv, separated by commas and
# without quotes (the file will be saved in the current working directory)
write.table(esoph, file="esophData.csv", sep=",", quote=FALSE)
# Save the esoph dataset to a file named esophData.tab, separated by tabs and without
# quotes (the file will be saved in the current working directory)
write.table(esoph, file="esophData.tab", sep="\t", quote=FALSE)
### Loading ###
# Load a data file into R (the file should be in the working directory)
# read a table with columns separated by tabs
my.data.tab <- read.table ("esophData.tab", sep="\t", header=TRUE)
# read a table with columns separated by commas
my.data.csv <- read.csv ("esophData.csv", header=TRUE)
Note: if you want to load or save the files in directories different from the working directory, just use (inside quotes) the full path as the first argument, instead of just the file name (e.g. "/home/Desktop/r_Workshop/esophData.csv").
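A full path can also be built with the base function file.path(), which joins directory and file names with the correct separator. The sketch below saves the esoph data set to a temporary directory (an assumption made so that the example runs anywhere) and reads it back to confirm that nothing was lost.

```r
# Build the full path: a temporary directory plus the file name
out_dir  <- tempdir()
csv_path <- file.path(out_dir, "esophData.csv")

# Save without row names, then load the file again
write.csv(esoph, file = csv_path, row.names = FALSE)
esoph_back <- read.csv(csv_path, header = TRUE)

# The reloaded table should have the same size and column names
identical(dim(esoph), dim(esoph_back))
identical(colnames(esoph), colnames(esoph_back))
```

Comparing dimensions and column names after a save/load cycle is a quick way to catch a wrong separator or a missing header.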
If you are an absolute beginner, and you want to start learning R, you can follow our hands-on online tutorial.