Outline

In Session I of this R workshop, we plan to cover the following:

  1. Introduction to the R environment (RStudio, working directories, script files)
  2. Basic R syntax (help, load library/packages, commands)
  3. Data types (numeric, character, complex, logical)
  4. Data objects (vectors, factors, matrices, data frames, lists, functions, tables)

To return to the list of sessions at the TRIA-Net website, go to http://tria-net.srv.ualberta.ca/tria-publications.

Why use R?

Many different software packages are available for data analysis, so what makes R so special and appealing?

  • R is free to use!
  • R runs on a variety of platforms including Windows, Unix and macOS.
  • There is a large community of people who use R and create easy-to-use add-on packages.
  • R generates high-quality graphical output that can be used from analysis to publications.
  • There are many textbooks that are tailored specifically for biologists using R:
    • Ben Bolker - Ecological Models and Data in R
    • Andrew Beckerman & Owen Petchey - Getting Started with R: An Introduction for Biologists
    • Victor Bloomfield - Computer Simulation and Data Analysis in Molecular Biology and Biophysics: An Introduction Using R
    • Michael Whitlock & Dolph Schluter - The Analysis of Biological Data

Installing R

To install R on your computer you can go to the following webpage: https://cran.r-project.org/

For the Windows operating system take the following steps:

  1. Choose the correct operating system and start your download of R. For example, for Windows click on the link “Download R for Windows”.
  2. Under “Download and Install R”, click on the “Download R for Windows” link. This should take you to a new page.
  3. Under “Subdirectories” on the new page, click on the “install R for the first time” link.
  4. On the next page, you should click on a link saying something like “Download R 3.2.3 for Windows” (or R X.X.X, where X.X.X gives the version of R).
  5. You may be asked if you want to save or run a file “R-3.2.3-win.exe”. Choose “Save” and save the file on the Desktop. Then double-click on the icon for the file to run it.
  6. Follow the simple instructions for the installation wizard as they show up on the screen.
  7. When R has finished installing, you will see “Completing the R for Windows Setup Wizard” appear. Click “Finish”. R should now be installed.

Congratulations! You can now run R on your own personal computer.

Installing RStudio

RStudio is an IDE (integrated development environment) that makes R easier to use and more productive. RStudio combines a set of productivity tools into a single environment including:

  • Code Editor - syntax highlighting, code completion, indenting, and definitions
  • Debugging - debugging console, breakpoints, environment panel, and tracebacks
  • Visualization - data display, data plotting, and data manipulation

To install RStudio on your computer you can go to the following webpage: https://www.rstudio.com/products/rstudio/download/

For the Windows operating system take the following steps:

  1. Under the label “Installers for Supported Platforms” click on the link “RStudio 0.99.887 - Windows Vista/7/8/10” (or RStudio X.X.X, where X.X.X gives the version of RStudio)
  2. Once the file is downloaded to your computer, click the “RStudio-0.99.887.exe” file to start the installation process.
  3. Follow the simple instructions for the installation wizard as they show up on the screen.
  4. Now RStudio should be installed.

Congratulations! You can now use the RStudio IDE to simplify and enhance your experience with R.

Introduction to the R environment

Starting and closing R

The R GUI versions, including RStudio, under Windows and Mac OS X can be opened by double-clicking their icons.

When opening R you should see the following environment: [screenshot of the R GUI]

Alternatively, when opening RStudio you have the following environment: [screenshot of the RStudio window]

At first, the RStudio environment may look more complicated and harder to navigate, but this user interface is very easy to use and simplifies the R experience.

To close R use the command q(). R will ask you if you want to save the workspace image. You can respond with y (yes) or n (no).

If you respond with y, the entire R workspace is written to a .RData file, which can become very large. As an alternative to saving the workspace, we will discuss saving your commands in script files and rerunning them.

Working Directories

There are two ways to set your working directory in R:

  1. Through the menu
    • Under the “Session” dropdown menu select “Set Working Directory” then “Choose Directory…”. From here you can navigate through your files to find the proper place.
  2. Using the function setwd(...)
    • For example the command setwd("C:/Users/User Name/Documents/FOLDER") sets the working directory to a folder FOLDER within the Documents directory.

Check the current working directory with:

getwd()

List the contents of the current working directory with:

dir()
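
The commands above can be combined in one short session. The following is a minimal sketch that uses R's temporary directory, tempdir(), as a stand-in for your own project folder so that it runs on any machine. The names old_dir and example_file are illustrative and not part of the workshop material.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Switch to a folder that is guaranteed to exist
# (assumption: tempdir() stands in for your own project folder)
setwd(tempdir())

# Create a small file so the folder has some visible content
example_file <- "example.txt"
writeLines("hello", example_file)

# Confirm where we are and list what the folder contains
getwd()
dir()

# Go back to where we started
setwd(old_dir)
```

Restoring the original directory at the end keeps the rest of your session unaffected by the change.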

Script Files

Script files are nothing more than text files of the commands that you would otherwise enter in the R console. Some benefits of using script files over typing commands into the R console are:

  • You can execute or run your R commands directly from the script file.
  • You can run the entire file at once, executing multiple commands, or even a subset of the commands of a script file by highlighting those you wish to execute.
  • By using scripts you do not need to save your R workspace.
  • It is easier to handle longer commands.
  • You can write down comments to help document the different commands and what they accomplish.
  • These files can be easily saved and loaded.

To create a script file, select “New File” under the “File” dropdown menu, followed by “R Script”. This should open up a blank document where you can begin your work.

To load a script file, go to “File” from the drop-down menu at the R console and choose “Open Script”. Find the file and open it.

To save a script file, simply choose “File” from the drop-down menu (when you have the script file open and active) and choose “Save As”.

To add comments to a script file, use the # symbol. For example, let’s look at a function we have already discussed, getwd(), and add a comment so we remember what the function does.

# Check the current working directory
getwd()

R simply ignores anything after a # sign and does not execute it as a command.

To execute an R script file, use the following command:

source("my_script.R")

where “my_script.R” is the name of the file you wish to execute. Note: if your working directory is not set to the folder where the script file is located, R will return an error.
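
To see source() at work without depending on your working directory, here is a minimal sketch that first writes a tiny script to R's temporary folder and then runs it. The names script_path and total are illustrative; in practice you would save the script from the editor rather than create it with writeLines().

```r
# Write a two-line script to a temporary file
# (assumption: this stands in for a script saved from the editor)
script_path <- file.path(tempdir(), "my_script.R")
writeLines(c(
  "# Add two numbers and store the result",
  "total <- 2 + 3"
), script_path)

# Run every command in the script; objects it creates appear in the workspace
source(script_path)
print(total)
# [1] 5
```

Passing the full path to source(), as done here, is one way to avoid the error that occurs when the script is not in the current working directory.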

Basic R Syntax

To find help for a particular function you can use the command ?function_name or help(function_name). Both commands complete the same task.

Sometimes in R we need functions that are not available by default, so we must load the package (library) that provides them. To load a package, we use the command library("library_name").
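
As a brief illustration of both commands, the sketch below opens the help page for the built-in mean() function and loads the stats package, which ships with R. The package name ggplot2 in the commented lines is only an illustrative choice of add-on package and is not part of this session.

```r
# Open the help page for the mean() function (two equivalent forms)
?mean
help(mean)

# Load a package; "stats" is included with every R installation
library("stats")

# An add-on package has to be installed once before it can be loaded.
# "ggplot2" is only an example, so these lines are left commented out.
# install.packages("ggplot2")
# library("ggplot2")
```

Calling library() on a package that has not been installed returns an error, which is why install.packages() comes first for add-on packages.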

Data Types

When using R, you will need variables to store information of various data types, such as characters, numbers, and logical (Boolean) values. R has several basic data types, including:

  • Numeric
    • e.g. 12.5, 1, 999
  • Character
    • e.g. 'a', "hello", "TRUE"
  • Complex
    • e.g. 2+4i
  • Logical
    • e.g. TRUE, FALSE, or the result of a comparison using < (less than), == (equal), != (not equal)

To return the data type of an object, use the command typeof(). For example:

    str <- "hello"
    typeof(str)
    [1] "character"
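
The sketch below applies typeof() to one value of each kind listed above. Note that typeof() reports the internal storage type, so an ordinary number shows up as "double"; class() returns the broader label "numeric". The values are chosen only for illustration.

```r
# typeof() reports how R stores a value internally
typeof(12.5)      # "double": ordinary numbers are stored in double precision
typeof(1L)        # "integer": the L suffix requests a whole-number type
typeof("TRUE")    # "character": the quotes make it text, not a logical value
typeof(2+4i)      # "complex"
typeof(3 < 5)     # "logical": a comparison returns TRUE or FALSE

# class() gives the broader label used in the list of data types
class(12.5)       # "numeric"
```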

Data Objects

Vectors

  • When you want to create a vector with more than one element, use the c() function, which combines the elements into a vector.

    # Create a vector
    a <- c(1,2,3,4,5)
    # Display the vector
    print(a)
    [1] 1 2 3 4 5
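
Once a vector exists, you will usually want to pick out elements or compute with it. The following minimal sketch shows indexing and element-wise arithmetic; the name doubled is illustrative.

```r
# Create a numeric vector
a <- c(1, 2, 3, 4, 5)

# Extract elements by position (R counts from 1)
a[2]          # second element: 2
a[c(1, 5)]    # first and last elements: 1 5

# Arithmetic is applied to every element at once
doubled <- a * 2
print(doubled)
# [1]  2  4  6  8 10

# Summaries of the whole vector
length(a)     # 5
mean(a)       # 3
```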

Factors

  • Factors are R objects created from a vector. A factor stores the vector together with its distinct values, which serve as labels (levels). The labels are always stored as character, regardless of whether the input vector is numeric, character, or logical. Factors are useful in statistical modeling.
  • Factors are created using the factor() function. The nlevels function gives the count of levels.
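
A short sketch of both functions follows. The colour data are made up purely for illustration.

```r
# A character vector with repeated values (illustrative data)
colours <- c("green", "red", "green", "blue", "red", "green")

# Convert it to a factor; the distinct values become the levels
colour_factor <- factor(colours)
print(colour_factor)
# [1] green red   green blue  red   green
# Levels: blue green red

# Levels are sorted alphabetically by default
levels(colour_factor)
nlevels(colour_factor)   # 3

# Count how often each level occurs
table(colour_factor)
```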

Matrices

  • A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

    # Create a matrix. All elements of a matrix share one type,
    # so the numbers are converted to character here.
    M <- matrix(c('a', 'b', 'c', 1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE)
    # Display the matrix
    print(M)
         [,1] [,2] [,3]
    [1,] "a"  "b"  "c" 
    [2,] "1"  "2"  "3" 
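
Every element of a matrix has the same type, so mixing letters and numbers as above turns the numbers into text. The sketch below builds a purely numeric matrix instead and shows how to query and index it; the name N is illustrative.

```r
# A numeric matrix filled row by row
N <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(N)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

dim(N)        # number of rows and columns: 2 3
N[2, 3]       # element in row 2, column 3: 6
N[, 1]        # the whole first column: 1 4
t(N)          # transpose (3 rows, 2 columns)

# Mixing text and numbers converts everything to character
M <- matrix(c('a', 'b', 'c', 1, 2, 3), nrow = 2, ncol = 3, byrow = TRUE)
typeof(M)     # "character"
```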

Data frames

  • Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can contain a different mode of data: the first column can be numeric, the second character, and the third logical. A data frame is a list of vectors of equal length.
  • Data Frames are created using the data.frame() function.

    # Create the data frame.
    BMI <- data.frame(
      gender = c("Male", "Male", "Female"),
      height = c(72, 69.5, 62),
      weight = c(175, 210, 125),
      Age = c(25, 38, 26)
    )
    # Display the data frame
    print(BMI)
      gender height weight Age
    1   Male   72.0    175  25
    2   Male   69.5    210  38
    3 Female   62.0    125  26
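
The sketch below shows a few common ways to work with this data frame: inspecting it, pulling out a column, selecting rows, and adding a computed column. The body mass index calculation assumes that height is recorded in inches and weight in pounds, which the original data do not state; the names older and bmi are illustrative.

```r
# Recreate the data frame so this example is self-contained
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(72, 69.5, 62),
  weight = c(175, 210, 125),
  Age = c(25, 38, 26)
)

# Show the type of each column
str(BMI)

# Extract one column as a vector
BMI$height

# Keep only the rows where Age is greater than 25
older <- BMI[BMI$Age > 25, ]
print(older)

# Add a computed column
# (assumption: height in inches, weight in pounds, hence the factor 703)
BMI$bmi <- 703 * BMI$weight / BMI$height^2
print(BMI)
```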

Lists

  • A list is an R object that can contain elements of many different types, such as vectors, functions, and even other lists.

    # Create a list.
    list1 <- list(c(2,5,3),21.5,'hello', cos)
    # Print the list.
    print(list1)
    [[1]]
    [1] 2 5 3
    
    [[2]]
    [1] 21.5
    
    [[3]]
    [1] "hello"
    
    [[4]]
    function (x)  .Primitive("cos")
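
To get elements back out of a list, use double square brackets. The following sketch reuses the list from above and adds a second list with named elements; the names list2, values, and label are illustrative.

```r
# Recreate the list so this example is self-contained
list1 <- list(c(2, 5, 3), 21.5, 'hello', cos)

# Double brackets return the element itself
list1[[1]]        # the numeric vector 2 5 3
list1[[1]][2]     # second value of that vector: 5

# Single brackets return a shorter list, not the element
class(list1[1])   # "list"

# A function stored in a list can be called directly
list1[[4]](0)     # cos(0) is 1

# Elements can also be given names and accessed with $
list2 <- list(values = c(2, 5, 3), label = "hello")
list2$label       # "hello"
```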

Functions

  • A function is a set of statements organized together to perform a specific task. R has a large number of built-in functions.
  • We have already seen many examples of functions such as help(), c(), data.frame(), etc.
  • Users can create their own functions; we will discuss this in Session III.

Tables

  • Another common way to store information is in a table. The table() function counts how often each combination of values occurs.

    a <- c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes","Never")
    b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No","Yes","No")
    results <- table(a,b)
    print(results)
               b
    a           Maybe No Yes
      Always        2  0   0
      Never         0  1   1
      Sometimes     2  1   1
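
Individual counts and totals can be read from the table by name. The sketch below reuses the same two vectors and is meant only as a minimal illustration.

```r
# Recreate the table so this example is self-contained
a <- c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes","Never")
b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No","Yes","No")
results <- table(a, b)

# Look up one cell by its row and column labels
results["Always", "Maybe"]   # 2

# Totals for each row, each column, and the whole table
rowSums(results)   # Always 2, Never 2, Sometimes 4
colSums(results)   # Maybe 4, No 2, Yes 2
sum(results)       # 8
```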