This is an introduction to R. This document quickly goes through the syntax, variable types, data structures and functions you need to get started on your R journey.

Variables and Datatypes

A variable can store different types of values, such as numbers or characters. These different kinds of data that we can use in our code are called data types.

You can use the assignment operator <- to create new variables. Note that R is case sensitive, so A and a are different symbols and would refer to different variables.

Numeric Data

In R, the numeric data type represents all real numbers with or without decimal values. For example,

n <- 10  # a numeric
class(n)   # class() finds the data type of a variable
## [1] "numeric"
a <- 2.3 
class(a)
## [1] "numeric"
b <- -34.8
class(b)
## [1] "numeric"
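The next sketch goes slightly beyond class(). typeof() shows how R stores a numeric value internally, and numeric values can be used directly in arithmetic. The values are arbitrary.

```r
n <- 10
typeof(n)      # "double": numbers typed without the L suffix are stored as double precision
is.numeric(n)  # TRUE
n / 4          # 2.5: ordinary division keeps the decimal part
n %/% 4        # 2: integer division (%/%) discards the remainder
n^2            # 100
```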

Logical Data

The logical data type in R is also known as boolean data type. It can only have two values: TRUE and FALSE. For example,

bool1 <- TRUE
class(bool1)
## [1] "logical"
bool2 <- FALSE
class(bool2)
## [1] "logical"
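In practice, logical values usually come from comparisons rather than being typed in directly. This minimal sketch shows comparison operators producing TRUE/FALSE. It also shows the logical operators & (and), | (or) and ! (not) combining them. The variable names are illustrative only.

```r
age <- 20
is_adult <- age >= 18   # TRUE
under_20 <- age < 20    # FALSE, since 20 is not less than 20
is_adult & under_20     # FALSE: & needs both sides to be TRUE
is_adult | under_20     # TRUE: | needs at least one side to be TRUE
!is_adult               # FALSE: ! flips the value
class(is_adult)         # "logical"
```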

Character

The character data type is used to specify character or string values in a variable.

In programming, a string is a sequence of characters. For example, 'A' is a single character and "Apple" is a string.

You can use single quotes ' ' or double quotes " " to create strings. R treats them identically, and both give a value of class "character". By convention, this document uses:

  • ' ' for single characters
  • " " for longer strings

For example,

# create a string variable
fruit <- "Apple"
class(fruit)
## [1] "character"
# create a character variable
my_char <- 'A'
class(my_char)
## [1] "character"
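R makes no distinction between single and double quotes, so both variables above have the same class. A one-letter string is simply a character value of length one. A quick check, as a sketch:

```r
identical('A', "A")   # TRUE: the quote style does not matter
nchar("Apple")        # 5 characters
nchar('A')            # 1 character
length("Apple")       # 1: a single string is a character vector of length one
```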

Integer and Complex Data Types

The integer data type represents whole numbers (values with no fractional part). We use the suffix L to specify integer data. For example,

an_integer <- 25L
class(an_integer)
## [1] "integer"
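The L suffix matters: a whole number typed without it is still stored as a double. This short sketch shows the difference. It also shows how the integer type behaves under arithmetic.

```r
is.integer(25)     # FALSE: without L, 25 is stored as a double
is.integer(25L)    # TRUE
class(5L + 2L)     # "integer": adding two integers keeps the integer type
class(5L / 2L)     # "numeric": division returns a double (here 2.5)
as.integer(3.9)    # 3: as.integer() drops the decimal part
```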

The complex data type stores complex numbers, which have a real part and an imaginary part. We use the suffix i to specify the imaginary part. For example,

# 2i represents imaginary part
complex_value <- 3 + 2i
class(complex_value)
## [1] "complex"
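The base functions Re() and Im() extract the two parts of a complex value. Mod() gives its modulus, which is its distance from zero. A minimal sketch:

```r
z <- 3 + 2i
Re(z)                   # 3: the real part
Im(z)                   # 2: the imaginary part
Mod(3 + 4i)             # 5: distance from 0, i.e. sqrt(3^2 + 4^2)
sqrt(as.complex(-1))    # 0+1i; plain sqrt(-1) returns NaN with a warning
```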

If .. else Statements

In programming, an if…else statement lets us write decision-making programs.

A decision making program runs one block of code under a condition and another block of code under a different condition. For example,

  • If age is greater than 18, allow the person to vote.
  • If age is not greater than 18, don’t allow the person to vote.

If Statement

The syntax of an if statement is

if (test_expression) {
  # body of if
}

Here, the test_expression is a logical expression that evaluates to either TRUE or FALSE. If the test_expression is

  • TRUE - the body of the if statement is executed
  • FALSE - the body of the if statement is skipped

x <- 3

# check if x is greater than 0
if (x > 0) {
  print("The number is positive")
}
## [1] "The number is positive"
print("Outside if statement")
## [1] "Outside if statement"

In the above program, the test condition x > 0 is TRUE. Hence, the code inside the curly braces { } is executed.

If Else Statement

We can also use an optional else statement with an if statement. The syntax of an if…else statement is:

if (test_expression) {
  # body of if statement
} else {
  # body of else statement
}

The if statement evaluates the test_expression inside the parentheses.

If the test_expression is TRUE:

  • body of if is executed
  • body of else is skipped

If the test_expression is FALSE:

  • body of else is executed
  • body of if is skipped

Example

age <- 15

# check if age is greater than 18
if (age > 18) {
  print("You are eligible to vote.")
} else {
  print("You cannot vote.")
}
## [1] "You cannot vote."

if…else if…else Statement

If you want to test more than one condition, you can use the optional else if statement along with your if…else statements. The syntax is:

if(test_expression1) {
  # code block 1
} else if (test_expression2){
  # code block 2
} else {
  # code block 3
}

Example

x <- 0

# check if x is positive or negative or zero
if (x > 0) {
  print("x is a positive number")
} else if (x < 0) {
  print("x is a negative number")
} else {
  print("x is zero")
}
## [1] "x is zero"

ifelse() Function

In R, the ifelse() function is a shorthand vectorized alternative to the standard if…else statement.

Many R functions are vectorized: they take a vector as input and return a vector of results, one per element. In the same way, ifelse() is the vectorized equivalent of the traditional if…else block.

The syntax of the ifelse() function is:

ifelse(test_expression, x, y)

ifelse() checks test_expression element by element. Where it is TRUE, the output takes the corresponding element of x; where it is FALSE, the output takes the corresponding element of y.

Example 1

# input vector
x <- c(12, 9, 23, 14, 20, 1, 5)

# ifelse() function to determine odd/even numbers
ifelse(x %% 2 == 0, "EVEN", "ODD")
## [1] "EVEN" "ODD"  "ODD"  "EVEN" "EVEN" "ODD"  "ODD"

Example 2

# input vector of marks
marks <- c(63, 58, 12, 99, 49, 39, 41, 2)

# ifelse() function to determine pass/fail
ifelse(marks < 40, "FAIL", "PASS")
## [1] "PASS" "PASS" "FAIL" "PASS" "PASS" "FAIL" "PASS" "FAIL"
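An if…else if…else chain handles several conditions, and ifelse() calls can be nested in the same way to sort values into more than two groups. This sketch labels each number in an illustrative vector as positive, negative or zero:

```r
x <- c(5, -2, 0, 8, -7)

# the inner ifelse() only decides between "negative" and "zero"
signs <- ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
print(signs)   # "positive" "negative" "zero" "positive" "negative"
```

A plain if (x > 0) cannot be used here. In R 4.2.0 and later, an if condition of length greater than one is an error.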

for Loops

In programming, loops are used to repeat the execution of a block of code. Loops help you to save time, avoid repeatable blocks of code, and write cleaner code. The most commonly used loop (in this course) is the for loop.

A for loop is used to iterate over a list, vector or any other object of elements. The syntax of for loop is:

for (value in sequence) {
  # block of code
}

Here, sequence is an object of elements and value takes in each of those elements. In each iteration, the block of code is executed. For example,

numbers <- c(1, 2, 3, 4, 5)

# for loop to print all elements in numbers
for (x in numbers) {
  print(x)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

In the above code, we have used a for loop to iterate through a sequence of numbers called numbers. In each iteration, the variable x stores the element from the sequence and the block of code is executed.

Example 1: Count the Number of Even Numbers

# vector of numbers
num <- c(2, 3, 12, 14, 5, 19, 23, 64)

# variable to store the count of even numbers
count <- 0

# for loop to count even numbers
for (i in num) {

  # check if i is even
  if (i %% 2 == 0) {
    count <- count + 1
  }
}

print(count)
## [1] 4

In the above code, %% is the modulo operator, which calculates the remainder: x %% y gives the remainder after dividing x by y.

R has two other kinds of loops: the while loop and the repeat loop.

Example 2: for Loop With break Statement

You can use the break statement to exit from the for loop in any iteration. For example,

# vector of numbers
numbers <- c(2, 3, 12, 14, 5, 19, 23, 64)

# for loop with break
for (i in numbers) {

  # break the loop if number is 5
  if (i == 5) {
    break
  }

  print(i)
}
## [1] 2
## [1] 3
## [1] 12
## [1] 14

You can also nest one for loop inside another for loop. The syntax for a nested for loop is:

for (i in sequence_1) {
  for (j in sequence_2) {
    # code block
  }
}
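Here is a concrete instance of the nested syntax. This small sketch prints a multiplication table, with the outer loop over i and the inner loop over j. The ranges 1:2 and 1:3 are arbitrary.

```r
products <- c()  # empty vector to collect the results
for (i in 1:2) {
  for (j in 1:3) {
    # the inner loop runs through all of 1:3 for each value of i
    print(paste(i, "x", j, "=", i * j))
    products <- c(products, i * j)
  }
}
print(products)   # 1 2 3 2 4 6
```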

Functions

A function is a block of code that you can call and run from any part of your program. Functions help break code into smaller parts and avoid repeating the same code.

You can pass data into a function through its parameters, and the function can return data as a result. In R, you create a function with the function keyword. The syntax is:

func_name <- function(parameters) {
  statement
}

Here, func_name is the name of the function.

Example

# define a function to compute power
power <- function(a,b) {
    return (a^b)
}

Here, we have defined a function called power which takes two parameters, a and b. Inside the function, return() sends back the value of a raised to the power b.

After you have defined the function, you can call the function using the function name and arguments. For example,

# call the function and store the result
result <- power(3,2)
print(result)
## [1] 9
print(power(2,2))   # directly call the function
## [1] 4
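By default, arguments are matched by position, so power(3, 2) means a = 3 and b = 2. You can also name them explicitly, in which case their order no longer matters. A sketch using the same power function:

```r
power <- function(a, b) {
  return(a^b)
}

power(3, 2)          # 9: a = 3, b = 2 by position
power(b = 3, a = 2)  # 8: named arguments can be given in any order
power(2, b = 5)      # 32: positional and named arguments can be mixed
```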

Data Structures in R

Data structures are ways of organizing data in the computer. Some important data structures in R are given below:

Strings

A string is a sequence of characters. In R, we represent strings using quotation marks " ".

mystring <- "hello world!"  # an example string
print(mystring)
## [1] "hello world!"
# find length of string using nchar
nchar(mystring)
## [1] 12
# join strings using paste()
string1 <- "hello"
string2 <- "world"
paste(string1, string2)
## [1] "hello world"
# change to uppercase using toupper()
toupper(string1)
## [1] "HELLO"
# change to lowercase using tolower()
tolower("HELLO")
## [1] "hello"
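paste() separates its inputs with a space by default. This sketch shows a few related base functions:

  • the sep argument changes the separator
  • paste0() uses no separator at all
  • substr() extracts part of a string by character position

```r
string1 <- "hello"
string2 <- "world"
paste(string1, string2, sep = "-")   # "hello-world"
paste0(string1, string2)             # "helloworld"
substr("hello world!", 1, 5)         # "hello": characters 1 to 5
```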

Vector

A vector is a one-dimensional data structure. All elements of a vector must be of the same type (all numeric, all character, etc.).

In R, we use the c() function to create a vector.

# create vector of string types
employees <- c("Sabby", "Cathy", "Lucy")
print(employees)
## [1] "Sabby" "Cathy" "Lucy"
# a vector with number sequence from 1 to 5 
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
## [1] 1 2 3 4 5
# a vector with number sequence from 1 to 5 
numbers <- 1:5
print(numbers)
## [1] 1 2 3 4 5
# repeat sequence of vector 2 times
numbers <- rep(c(2,4,6), times = 2)
print(numbers)
## [1] 2 4 6 2 4 6
# loop over a vector
numbers <- c(1, 2, 3, 4, 5)
# iterate through each element of numbers
for (number in numbers) {
    print(number)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
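Every element of a vector must share one type, so c() does not raise an error when given mixed inputs. Instead, it quietly converts (coerces) them to a common type. A quick sketch:

```r
mixed <- c(1, "a", TRUE)
class(mixed)      # "character": the number and the logical became strings
print(mixed)      # "1" "a" "TRUE"

nums <- c(1, TRUE, FALSE)
class(nums)       # "numeric": TRUE becomes 1, FALSE becomes 0
```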

Access and Modify Vector Elements

You can access individual vector elements using numeric indexes 1, 2, 3, ... Note that R indexing starts at 1, not 0.

# a vector of string type
languages <- c("Python", "Java", "R")

# access first element of languages
print(languages[1])  # "Python"
## [1] "Python"
# access third element of languages
print(languages[3]) # "R"
## [1] "R"

To change a vector element, we can simply reassign a new value to the specific index.

languages[2] <- "C++" # replacing Java with C++
print(languages)
## [1] "Python" "C++"    "R"
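Indexes can also be vectors. In the sketch below, a vector of positions selects several elements at once, and a negative index drops an element. A logical condition keeps only the elements where it is TRUE. The numbers vector is illustrative.

```r
languages <- c("Python", "Java", "R")
languages[c(1, 3)]        # "Python" "R"
languages[-2]             # "Python" "R": everything except element 2

numbers <- c(4, 9, 15, 2)
numbers[numbers > 5]      # 9 15
```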

Length of a vector

We can find the length of a vector using the length() function.

languages <- c("Python", "Java", "R")
length(languages)
## [1] 3

Matrix

A matrix is a two-dimensional data structure where data are arranged into rows and columns. The basic syntax of the matrix() function in R is:

matrix(vector, nrow, ncol)

Example

# create a 2 by 3 matrix
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(matrix1)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

Since we have passed byrow = TRUE, the data items in the matrix are filled row-wise.

# access matrix elements
matrix1[1,2]  # first row, second column element
## [1] 2
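By default, matrix() fills the values column by column (byrow = FALSE), and leaving an index blank selects a whole row or column. A sketch using the same six values:

```r
matrix2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)  # default byrow = FALSE
matrix2[1, ]    # first row: 1 3 5
matrix2[, 2]    # second column: 3 4
dim(matrix2)    # 2 3: number of rows and columns
```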

List

A list is a collection of elements that can be of the same or different types, for example a combination of numeric, string and logical values.

# list with similar type of data 
list1 <- list(24, 29, 32, 34)

# list with different type of data
list2 <- list("University", 38, TRUE)

You can access and modify each element in the list by number indexing:

list1 <- list(24, "Sabby", 5.4, "Nepal")

# access 1st item
print(list1[1]) # 24
## [[1]]
## [1] 24
# access 4th item
print(list1[4]) # Nepal
## [[1]]
## [1] "Nepal"
# change element at index 2
list1[2] <- "Cathy"

# print updated list
print(list1)
## [[1]]
## [1] 24
## 
## [[2]]
## [1] "Cathy"
## 
## [[3]]
## [1] 5.4
## 
## [[4]]
## [1] "Nepal"
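Single brackets return a smaller list, which is why [[1]] appears in the output above. Double brackets [[ ]] return the element itself. Elements can also be given names and accessed with $. A sketch, where the names age, name and country are chosen for illustration:

```r
list1 <- list(24, "Sabby", 5.4, "Nepal")
class(list1[1])     # "list": single brackets keep the list wrapper
class(list1[[1]])   # "numeric": double brackets extract the value
list1[[1]] + 1      # 25; list1[1] + 1 would be an error

person <- list(age = 24, name = "Sabby", country = "Nepal")
person$name         # "Sabby"
person[["country"]] # "Nepal"
```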

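You can add items to a list by using the append() function. Note that append() returns a new list and leaves list1 unchanged. To keep the result, assign it back: list1 <- append(list1, 3.14).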

list1 <- list(24, "Sabby", 5.4, "Nepal")

# using append() function 
append(list1, 3.14)
## [[1]]
## [1] 24
## 
## [[2]]
## [1] "Sabby"
## 
## [[3]]
## [1] 5.4
## 
## [[4]]
## [1] "Nepal"
## 
## [[5]]
## [1] 3.14

You can remove list elements by using a negative index. This returns a new list without those elements; list1 itself is unchanged unless you assign the result back.

list1 <- list(24, "Sabby", 5.4, "Nepal")

# remove 4th item
print(list1[-4]) # every item except "Nepal"
## [[1]]
## [1] 24
## 
## [[2]]
## [1] "Sabby"
## 
## [[3]]
## [1] 5.4

Data Frame

A data frame is a two-dimensional data structure which can store data in tabular format.

Data frames have rows and columns. Each column is a vector, and different columns can hold different data types.

In R, we use the data.frame() function to create a Data Frame.

The syntax of the data.frame() function is:

dataframe1 <- data.frame(
   first_col  = c(val1, val2, ...),
   second_col = c(val1, val2, ...),
   ...
)

Each column is a vector type.

Example

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

print(dataframe1) 
##       Name Age  Vote
## 1     Juan  22  TRUE
## 2  Alcaraz  15 FALSE
## 3 Simantha  19  TRUE

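There are different ways to extract columns from a data frame: we can use [ ], [[ ]], or $ to access a specific column. As the output below shows, single brackets return a data frame containing that column, while [[ ]] and $ return the column itself as a vector.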

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# pass index number inside [ ] 
print(dataframe1[1])
##       Name
## 1     Juan
## 2  Alcaraz
## 3 Simantha
# pass column name inside [[  ]] 
print(dataframe1[["Name"]])
## [1] "Juan"     "Alcaraz"  "Simantha"
# use $ operator and column name 
print(dataframe1$Name)
## [1] "Juan"     "Alcaraz"  "Simantha"
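A few other base-R operations are handy for data frames, shown here as a sketch on the same dataframe1:

  • nrow() and ncol() report its size
  • $ can also create a new column
  • a logical condition in the row position keeps only the matching rows

The column name AgeNextYear is hypothetical.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

nrow(dataframe1)   # 3 rows
ncol(dataframe1)   # 3 columns

# add a new column computed from an existing one
dataframe1$AgeNextYear <- dataframe1$Age + 1

# keep rows where Age is greater than 18; the blank after the comma keeps all columns
adults <- dataframe1[dataframe1$Age > 18, ]
adults$Name        # "Juan" "Simantha"
```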

Plotting

In R, you can use the plot() function to plot points on a graph. R also supports a variety of other plot types, such as line plots, bar plots and histograms.

An example line plot

# sequence vector of values
x <- seq(-3, 3, 0.25)  # from -3 to 3 with 0.25 spacing

# respective sine value of x
y <- sin(x)

# plot y against x
plot(x,y)  # points

plot(x, y, type = "l") #line plot

plot(x, y, type = "l", main = "My Plot", xlab = "X Axis", ylab = "Y Axis", col = "blue") # additional options
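The other plot types mentioned above have their own base functions. This brief sketch uses hist() for a histogram and barplot() for a bar plot. The data are made up for illustration: random normal values for the histogram and invented fruit counts for the bar plot.

```r
# histogram of 100 normally distributed random values
set.seed(1)                # makes the random numbers reproducible
values <- rnorm(100)
h <- hist(values, main = "Histogram", xlab = "Value", col = "grey")

# bar plot of counts per category (illustrative numbers)
counts <- c(Apples = 10, Bananas = 6, Cherries = 8)
barplot(counts, main = "Fruit Counts", ylab = "Count", col = "orange")
```

hist() also returns its bin counts invisibly, so h$counts holds the number of values in each bar.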

Importing and Exporting Data

R can import and export many file types. One of the most common is CSV (comma-separated values).

In R, you can use the read.csv() function to read a CSV file available in your current directory. The syntax is

mydata <- read.csv("filename.csv")

To export, you can use write.csv() function.

write.csv(yourdata, "filename.csv")
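This self-contained round trip is a sketch. It writes a small data frame to a temporary file and reads it back. tempfile() is used so nothing in your folder is overwritten. By default, write.csv() also writes the row names as an extra first column, and row.names = FALSE avoids that.

```r
mydata <- data.frame(Name = c("Juan", "Alcaraz"), Age = c(22, 15))

path <- tempfile(fileext = ".csv")       # temporary file path
write.csv(mydata, path, row.names = FALSE)

copy <- read.csv(path)
print(copy)
names(copy)   # "Name" "Age"
nrow(copy)    # 2
```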

There are also functions to read and write Excel, Stata (.dta) and text files, to name a few. You may need to install packages to use some of these functions.

Packages

Packages are collections of R functions. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.

To install a package, the syntax is:

install.packages("packagename")

To use functions from this package, you need to load it with library(packagename).

Note that you need to install a package only once, but you must load it with library() in every new R session. You should install packages from the console rather than from inside your script.

For example, to read Excel files you need the readxl package. You can install this package by executing the command install.packages("readxl") in your console. To use functions from this package, you need to load it by using the library() function.

library(readxl)

You can then use the read_excel() function from this package, for example read_excel("yourfilename.xlsx").
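Two related base-R tools are shown in this sketch, using the stats package that ships with R:

  • requireNamespace() checks whether a package is installed without attaching it.
  • The package::function syntax calls a single function without running library() first. For example, readxl::read_excel("yourfilename.xlsx") would work once readxl is installed.

```r
# TRUE if the package is installed, FALSE otherwise (no error either way)
requireNamespace("stats", quietly = TRUE)

# call median() from the stats package explicitly
stats::median(c(5, 1, 3))   # 3
```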

Working Directory

To load data, you should either specify the directory where the file is located or place it in your current working directory.

Some useful functions:

  • getwd() - to see which directory you are currently working in
  • setwd() - to set a particular working directory. We usually start a script with this function.

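On Windows, paths are usually written with backslashes. Inside an R string, however, a single backslash is an escape character, so you need a double backslash \\ when specifying the directory, for example setwd("C:\\Users\\Desktop"). Forward slashes also work on Windows, as in setwd("C:/Users/Desktop"), and Mac OS paths already use them. Be wary of this!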
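A backslash in an R string starts an escape sequence, so "\\" stores just one backslash. The sketch below shows this, and shows that file.path() joins path parts with forward slashes. It also relies on setwd() returning the previous directory, which makes it easy to switch back. The Windows path is only an example; the code moves to tempdir() so it runs anywhere.

```r
nchar("C:\\Users")                     # 8: "\\" is stored as a single backslash
file.path("C:", "Users", "Desktop")    # "C:/Users/Desktop"

old_dir <- setwd(tempdir())   # move to a temporary folder, remembering the old one
getwd()                       # now the temporary folder
setwd(old_dir)                # go back to where we started
```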

Getting Help

R has a comprehensive built-in help system. At the console, you can use any of the following:

help(foo)   # for help about function foo
## No documentation for 'foo' in specified packages and libraries:
## you could try '??foo'
?foo        # same as help(foo)
## No documentation for 'foo' in specified packages and libraries:
## you could try '??foo'
example(foo)  # run the examples from the help page of function foo
## Warning in example(foo): no help found for 'foo'
apropos("foo")   # list all functions containing string foo
## [1] "citFooter"

R Learning Resources

Most of this document was made from Programiz.

There are a lot of online resources to learn R. I list a few below:

  • W3Schools - A popular website dedicated to teaching different programming languages.
  • freeCodeCamp - YouTube - A 2-hour YouTube video introducing R for beginners.
  • Data Exploration Using Dplyr - YouTube - Data Exploration Essentials covered.
  • R for Data Science - An ebook by Hadley Wickham and Garrett Grolemund
  • Econometrics with R - An ebook that teaches econometric methods using R.
  • R for Stata Users - Compares Stata commands with R commands.
  • R Markdown - Introduces RMarkdown, a modern approach to producing reproducible code by combining R code with text. You can use this for writing your assignments. This document was made using R Markdown.
  • CheatSheets for R - Quick Reference for popular R functions. Highly Recommended. The cheatsheets for dplyr, readr and ggplot2 are quite useful.