1. Getting Set Up

The purpose of this tutorial is to provide a basic understanding of R. I want to cover the basic concepts you need to know and introduce widely used R packages that can help with your data analysis. The R language is vast, and a comprehensive tutorial would take many lifetimes and textbooks. As with any programming language, mastering it requires dedication, patience and long-term commitment. But even with a basic understanding of R, you will be able to automate monotonous tasks, work more efficiently with data and produce insightful, informative analyses. This tutorial is a jumping-off point to help you get started and to point you towards resources created by people far smarter and more adept in R than I am.

Note: Much of this tutorial is just a copy and paste job from different sources. No use reinventing the wheel. Credit is given in the citations at the end.

To get started, you will need to download R and RStudio. R is the language that your computer interprets to run programs and commands, while RStudio is an integrated development environment (IDE) that combines developer tools into a single graphical user interface (GUI). You can think of R as the paint, and RStudio as the tools, such as the paintbrushes, canvas and palette, that you as the painter will use to create a new piece of art (or just a simple program that makes your life easier).

2. Helpful Resources

I found these resources extremely useful when I started learning R, but they are far from the only useful ones. R has a lot of online support, particularly because it is an open-source language that regular users can extend by creating new packages (R is similar to Python in this way).

3. Getting Started with R and RStudio

R is a simpler computing language to learn than most for two reasons:

To get started learning this language, it is important to understand how to navigate the R Console and RStudio IDE.

3.1 The R Console

When you start R on your computer (not RStudio), the R console shown below will open. You can try typing some simple mathematical commands yourself. It is good to understand the console, but we will write and execute our scripts in the much more flexible and powerful RStudio IDE.
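As a minimal sketch of the kind of simple mathematical commands you might type at the console prompt, the lines below use only base R arithmetic. The specific numbers are arbitrary choices for illustration; lines beginning with ## show the output R prints.

```r
# Basic arithmetic typed directly at the console prompt
2 + 3
## [1] 5
10 / 4
## [1] 2.5
2^10
## [1] 1024
sqrt(16)
## [1] 4
```

Each command is evaluated as soon as you press Enter, and the [1] at the start of the output is the index of the first element printed on that line.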



3.2 Scripts and the RStudio IDE

One of the great advantages of R over point-and-click analysis software like Excel is that you can save your work as a script. You can edit and save scripts using any text editor. There are many IDEs available, but the RStudio IDE is the most widely used because it is developed specifically for working with R (it can also run code in other languages, such as SQL and Python). Look at the picture below, which shows a simple plot of the AirPassengers data set from the datasets package:

  • The top left quadrant is the Code Editor, where you can write and save your script. Note that the code editor is not immediately open when you open RStudio. You must start a new script (click File -> New File -> R Script).

  • The bottom left quadrant is the R Console which shows the lines of code that have been run and their outputs, as well as any error messages.

  • The top right quadrant is the Environment/History section, which displays any objects, such as variables, datasets or packages, that are active in the current R session.

  • The bottom right quadrant is the Other Panes section, which can show files, plots, packages, help and the viewer. In the picture below, the output from the plot(AirPassengers) command is currently in view.
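A script along the following lines would fill the four panes as described. This is a minimal sketch rather than the exact script behind the picture: only plot(AirPassengers) comes from the text, and the other calls are added here for illustration.

```r
# Written in the Code Editor and saved as a script

# AirPassengers ships with the built-in datasets package
data("AirPassengers")

# The object now appears in the Environment pane
class(AirPassengers)
## [1] "ts"

# Printed in the Console
summary(AirPassengers)

# Drawn in the Plots tab of the bottom right pane
plot(AirPassengers)
```

Running the script line by line (Ctrl+Enter, or Cmd+Enter on a Mac) sends each command to the console, so you can watch each pane update in turn.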

For a more in-depth discussion of RStudio, refer to the link to Getting Started with R and R Studio in section 2 above. Learning how to use RStudio effectively will be just as important as learning the language itself.



4. Data Types and Structures

To make the most of the R language, you’ll need a strong understanding of the basic data types and data structures and how to operate on them.

4.1 Data Types

R has six basic data types (in addition to the five listed below, there is also the raw data type, which is not discussed here). We will go through each data type and use the typeof() function to confirm it.

4.1.1 Character Types

The character data type stores strings, that is, text such as words and letters. Below, the variable named character_variable is assigned the value Patrick. When assigning a character variable, the value needs to be enclosed in quotation marks, as shown below.

Note: to assign a value to an object in R, we use the <- operator, as shown below. This is how you will assign a value to any variable, data structure or other object.

character_variable <- "Patrick"

typeof(character_variable)
## [1] "character"

4.1.2 Double Types (Real or Decimals)

The double data type stores real numbers, including values with a decimal part. It is the default type for any number you type in R. Below, the variable named numerical_variable is assigned the value 10.5.

numerical_variable <- 10.5

typeof(numerical_variable)
## [1] "double"

4.1.3 Integer Types

The integer data type stores integers. Below, the variable named integer_variable is assigned the value 5. Note that we assign the value as 5L. By default, R will store any numerical value as a double. The L tells R to store the variable as an integer. When the variable is called, only the number will appear (see code below).

You will often prefer integer values, such as when analysing non-divisible units (such as people). Integers also require less memory than doubles: R stores each integer in 4 bytes and each double in 8 bytes. Therefore, when your data consists of whole numbers, it is best to store it as an integer type rather than a double.

integer_variable <- 5L

typeof(integer_variable)
## [1] "integer"
integer_variable
## [1] 5
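To illustrate the default-to-double behaviour and the memory difference, here is a small sketch. The variable names and the vector length of one million are arbitrary choices for this example, and object.size() reports a few bytes of overhead on top of the raw data.

```r
# Without the L suffix, a whole number is still stored as a double
typeof(5)
## [1] "double"
typeof(5L)
## [1] "integer"

# Compare the memory used by one million integers and one million doubles
n <- 1e6
int_vec <- integer(n)   # one million integer zeros
dbl_vec <- numeric(n)   # one million double zeros

int_size <- as.numeric(object.size(int_vec))
dbl_size <- as.numeric(object.size(dbl_vec))

# The double vector takes roughly twice the space of the integer vector
dbl_size / int_size
```

For small objects the difference is negligible, so this mainly matters when working with large data sets.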

4.1.4 Logical Types

The logical data type is a binary TRUE or FALSE value. Note that R is case sensitive: when assigning a logical value, TRUE and FALSE need to be in all upper case. You can also use the shorthand T for TRUE and F for FALSE, but T and F are ordinary variables that can be reassigned, so the full names are safer.

true_variable <- TRUE

false_variable <- FALSE 

typeof(true_variable)
## [1] "logical"
typeof(false_variable)
## [1] "logical"
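The following sketch shows why spelling out TRUE and FALSE is the safer habit: T is an ordinary variable that can be overwritten, whereas TRUE is a reserved word. The names shadowed and restored are hypothetical, introduced only for this illustration.

```r
# T is just a variable that happens to hold TRUE, so it can be overwritten
T <- FALSE
shadowed <- T
shadowed
## [1] FALSE

# Removing our own T makes the built-in definition visible again
rm(T)
restored <- T
restored
## [1] TRUE

# TRUE itself is a reserved word; uncommenting the next line gives an error
# TRUE <- FALSE
```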

4.1.5 Complex Types

The complex data type stores complex numbers, which have a real part and an imaginary part. These types are not commonly used outside of STEM fields.

complex_variable <- 1 + 2i

typeof(complex_variable)
## [1] "complex"

4.2 Data Structures

This section summarizes the most basic data structures in base R. I will provide a brief overview of each data structure, and how they are connected.

R’s basic data structures are listed below. They can be organized by their dimensionality (1D, 2D, ND) and whether they are homogeneous (all contents must be of the same data type) or heterogeneous (the contents can be of different data types).

Dimensions   Homogeneous     Heterogeneous
1D           Atomic vector   List
2D           Matrix          Data frame
ND           Array

Given an object, the best way to understand which data structures it is composed of is to use the str() function. str() is short for structure, and it gives a compact, human-readable description of any R data structure.
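As a quick sketch of what str() reports, the example below applies it to an atomic vector and to a list. The objects numbers and record are hypothetical, created only for this illustration.

```r
# str() on an atomic vector: type, index range and values
numbers <- c(1.5, 2, 3.25)
str(numbers)
##  num [1:3] 1.5 2 3.25

# str() on a list: one line per element, each with its own type
record <- list(name = "Patrick", scores = c(90L, 85L), passed = TRUE)
str(record)
## List of 3
##  $ name  : chr "Patrick"
##  $ scores: int [1:2] 90 85
##  $ passed: logi TRUE
```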

4.2.1 Vectors

The basic data structure in R is the vector. Vectors come in two types: atomic vectors and lists. They have three common properties:

  • Type, typeof(), what it is.
  • Length, length(), how many elements it contains.
  • Attributes, attributes(), additional arbitrary metadata.

Atomic vectors and lists differ in the types of their elements: all elements of an atomic vector must be of the same data type, whereas lists can support elements of different data types.
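The three properties can be inspected on any vector. The sketch below uses a hypothetical named vector called scores; the names are what show up as its attributes.

```r
# A named double vector (the element names are stored as an attribute)
scores <- c(maths = 1.5, physics = 2.5, art = 4)

# Type: what it is
typeof(scores)
## [1] "double"

# Length: how many elements it contains
length(scores)
## [1] 3

# Attributes: additional metadata, here the element names
attributes(scores)
## $names
## [1] "maths"   "physics" "art"
```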

4.2.1.1 Atomic Vectors

The four common types of atomic vectors are logical, integer, double (often referred to as numeric) and character. Atomic vectors are usually created with c(), short for combine.

dbl_vector <- c(1, 2.5, 4.5, 8.5)

#With the L suffix, you get an integer rather than a double
int_vector <- c(1L, 2L, 3L, 4L)

#Use TRUE and FALSE (or T and F) to create logical vectors
logical_vector <- c(TRUE, T, FALSE, F)

#Enclose each element of a character vector in quotation marks
char_vector <- c("Hello", "World")


Types and Tests on Atomic Vectors

Given a vector you can determine its type with typeof(), or check for a specific type with an “is” function: is.character(), is.double(), is.integer(), is.logical(), or, more generally, is.atomic().

is.double(dbl_vector)
## [1] TRUE
is.integer(int_vector)
## [1] TRUE
typeof(char_vector)
## [1] "character"
is.atomic(logical_vector)
## [1] TRUE


Coercion

If you attempt to combine elements of different types into one atomic vector, R will coerce all of the elements to the most flexible type present. For example, combining a character and a number yields a character vector:

str(c("character", 1))
##  chr [1:2] "character" "1"
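The coercion order runs from least to most flexible: logical, integer, double, character. A short sketch with arbitrary values shows each step:

```r
# Logical combined with integer gives integer
typeof(c(TRUE, 1L))
## [1] "integer"

# Integer combined with double gives double
typeof(c(1L, 2.5))
## [1] "double"

# Double combined with character gives character
typeof(c(2.5, "a"))
## [1] "character"

# Mixing all four types leaves everything as character
c(TRUE, 1L, 2.5, "a")
## [1] "TRUE" "1"    "2.5"  "a"
```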

When a logical vector is coerced to a double or integer, TRUE becomes 1 and FALSE becomes zero. This is useful in conjunction with sum() and mean().

as.numeric(logical_vector)
## [1] 1 1 0 0
#Total number of TRUEs
sum(logical_vector)
## [1] 2
#Proportion of TRUEs
mean(logical_vector)
## [1] 0.5

You can also coerce a vector manually, using the “as” functions: as.character(), as.double(), as.integer() and as.logical().

as.double(int_vector)
## [1] 1 2 3 4
as.character(logical_vector)
## [1] "TRUE"  "TRUE"  "FALSE" "FALSE"

If you coerce a double type vector to an integer, R will truncate the elements towards zero, dropping the decimal part rather than rounding to the closest integer:

as.integer(dbl_vector)
## [1] 1 2 4 8
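The difference between as.integer() and rounding is easiest to see with values close to the next whole number and with negative values. In this sketch, the vector values is a hypothetical example.

```r
values <- c(1.9, -1.9, 2.5)

# as.integer() drops the decimal part (truncates towards zero)
as.integer(values)
## [1]  1 -1  2

# round() goes to the nearest whole number; exact halves go to the even one
round(values)
## [1]  2 -2  2

# To get rounded integers, round first and then convert
rounded_int <- as.integer(round(values))
typeof(rounded_int)
## [1] "integer"
```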

But be careful: if you try to coerce elements to a data type that cannot represent them, R will return NA for those elements and issue a warning. NA represents a missing value (not available).

as.double(char_vector)
## Warning: NAs introduced by coercion
## [1] NA NA



Simple Operations on Atomic Vectors

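You can conduct arithmetic operations on vectors, and R applies them element by element. When both vectors are of the same length, the operation returns a vector of that length. When one vector is shorter, as with a single number (a scalar), R recycles its values to match the longer vector. Think back to linear algebra 101.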

x <- c(1,2,3)
y <- c(4,5,6)

#You can conduct operations on vectors of the same length
x+y
## [1] 5 7 9
y/x
## [1] 4.0 2.5 2.0
#You can also conduct operations on vectors using a scalar
x+3
## [1] 4 5 6
y-2
## [1] 2 3 4
x^2
## [1] 1 4 9
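The scalar operations above work because R recycles the shorter vector until it matches the longer one. The sketch below shows this for vectors of unequal length; the names long_vec, short_vec, recycled and mismatched are hypothetical.

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20 to match the length of long_vec
recycled <- long_vec + short_vec
recycled
## [1] 11 22 13 24

# If the longer length is not a multiple of the shorter one, R still
# recycles but issues a warning (suppressed here to keep the output clean)
mismatched <- suppressWarnings(c(1, 2, 3) + short_vec)
mismatched
## [1] 11 22 13
```

Recycling is convenient for scalars but can hide mistakes when two vectors were meant to have the same length, so it is worth checking length() when results look odd.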

Operations on character vectors are also possible, but they are beyond the scope of this tutorial. There are many ways to work with character vectors. My preferred package is stringr, which is part of the tidyverse.



4.2.1.2 Lists

Lists are different from atomic vectors because their elements can be of any type, including lists. You construct lists by using list() instead of c().

x <- list(1:3, "a", c(TRUE, FALSE), c(2.3, 5.9))
str(x)
## List of 4
##  $ : int [1:3] 1 2 3
##  $ : chr "a"
##  $ : logi [1:2] TRUE FALSE
##  $ : num [1:2] 2.3 5.9

Lists are sometimes referred to as recursive vectors, because a list can contain another list.

x <- list(list(list()))
str(x)
## List of 1
##  $ :List of 1
##   ..$ : list()

c() will combine several lists into one. If given a combination of atomic vectors and lists, c() will coerce the vectors to lists before combining them. Compare the results of list() and c():

x <- list(list(1,2), c(3,4))
y <- c(list(1,2), c(3,4))
str(x)
## List of 2
##  $ :List of 2
##   ..$ : num 1
##   ..$ : num 2
##  $ : num [1:2] 3 4
str(y)
## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ : num 4

Calling typeof() on a list returns "list". You can test for a list with is.list() and coerce to a list with as.list(). You can turn a list into an atomic vector with unlist(). If the elements of a list have different types, unlist() uses the same coercion rules as c().
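A brief sketch of these functions in use, with hypothetical objects named mixed, as_list and flattened:

```r
mixed <- list(1, "a", TRUE)

typeof(mixed)
## [1] "list"
is.list(mixed)
## [1] TRUE

# as.list() turns each element of a vector into its own list element
as_list <- as.list(1:3)
length(as_list)
## [1] 3

# unlist() flattens to an atomic vector, coercing as c() would
flattened <- unlist(mixed)
flattened
## [1] "1"    "a"    "TRUE"
```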

Lists are used to build up many of the more complicated data structures in R. For example, both data frames and linear model objects (as produced by lm()) are lists.
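You can check this yourself with the built-in mtcars data set. The model formula below (mpg against wt) is an arbitrary choice for illustration, and fit is a hypothetical name.

```r
# A data frame is a list of equal-length columns
is.list(mtcars)
## [1] TRUE
typeof(mtcars)
## [1] "list"

# A fitted linear model is a list of components
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)
## [1] TRUE

# Components such as the coefficients are stored as list elements
"coefficients" %in% names(fit)
## [1] TRUE
```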

4.2.1.3 Attributes