This is an introduction to R. This document quickly goes through the syntax, variable types, data structures and functions you need to get started on your R journey.
A variable can store different kinds of values, such as numbers or characters. The kinds of data that we can use in our code are called data types.
You can use the assignment operator <- to create new
variables. Note that R is case sensitive, so A and a are
different symbols and would refer to different variables.
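As a small illustration of assignment and case sensitivity (the variable names and values here are made up for the example), the sketch below creates two variables whose names differ only in case:

```r
# two different variables: R treats A and a as distinct names
A <- 10
a <- "ten"

print(A)
## [1] 10
print(a)
## [1] "ten"
```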
In R, the numeric data type represents all real numbers with or without decimal values. For example,
n <- 10 # a numeric
class(n) # class() returns the data type of a variable
## [1] "numeric"
a <- 2.3
class(a)
## [1] "numeric"
b <- -34.8
class(b)
## [1] "numeric"
The logical data type in R is also known as the boolean data type. It can only have two values: TRUE and FALSE. For example,
bool1 <- TRUE
class(bool1)
## [1] "logical"
bool2 <- FALSE
class(bool2)
## [1] "logical"
The character data type is used to store character or string values in a variable.
In programming, a string is a sequence of characters. For example, 'A' is a single character and "Apple" is a string.
You can use single quotes ' ' or double quotes " " to represent strings; R treats both the same way. In the examples here, double quotes are used for strings and single quotes for single characters.
For example,
# create a string variable
fruit <- "Apple"
class(fruit)
## [1] "character"
# create a character variable
my_char <- 'A'
class(my_char)
## [1] "character"
The integer data type represents whole numbers, that is, values without a decimal part. We use the suffix L to specify integer data. For example,
an_integer <- 25L
class(an_integer)
## [1] "integer"
The complex data type is used to store complex numbers, which have a real part and an imaginary part. We use the suffix i to specify the imaginary part. For example,
# 2i represents imaginary part
complex_value <- 3 + 2i
class(complex_value)
## [1] "complex"
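To look at the two parts of a complex value separately, base R provides Re() and Im(). The snippet below is only an illustrative sketch; it also shows that a whole number such as 25 is stored as numeric unless the L suffix is used.

```r
complex_value <- 3 + 2i

# real and imaginary parts
Re(complex_value)
## [1] 3
Im(complex_value)
## [1] 2

# without the L suffix, a whole number is stored as numeric
class(25)
## [1] "numeric"
class(25L)
## [1] "integer"
```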
print() function in R
In R, we use the print() function to print values and variables. For example,
# print values
print("R is fun")
## [1] "R is fun"
# print variables
x <- "I love Glasgow"
print(x)
## [1] "I love Glasgow"
In the above example, we have used the print() function
to print a string and a variable. When we use a variable inside
print(), it prints the value stored inside the
variable.
In programming, an if…else statement allows us to make decisions in a program.
A decision-making program runs one block of code under one condition and another block of code under a different condition.
The syntax of an if statement is:
if (test_expression) {
  # body of if
}
Here, the test_expression is a boolean expression. It evaluates to either TRUE or FALSE. If the test_expression is
TRUE - the body of the if statement is executed
FALSE - the body of the if statement is skipped
x <- 3
# check if x is greater than 0
if (x > 0) {
  print("The number is positive")
}
## [1] "The number is positive"
print("Outside if statement")
## [1] "Outside if statement"
In the above program, the test condition x > 0 is TRUE. Hence, the code inside the curly braces is executed.
We can also use an optional else statement with an if statement. The syntax of an if…else statement is:
if (test_expression) {
  # body of if statement
} else {
  # body of else statement
}
The if statement evaluates the test_expression inside the parentheses.
If the test_expression is TRUE, the body of the if statement is executed.
If the test_expression is FALSE, the body of the else statement is executed.
Example
age <- 15
# check if age is greater than 18
if (age > 18) {
  print("You are eligible to vote.")
} else {
  print("You cannot vote.")
}
## [1] "You cannot vote."
If you want to test more than one condition, you can use the optional else if statement along with your if…else statements. The syntax is:
if (test_expression1) {
  # code block 1
} else if (test_expression2) {
  # code block 2
} else {
  # code block 3
}
Example
x <- 0
# check if x is positive or negative or zero
if (x > 0) {
  print("x is a positive number")
} else if (x < 0) {
  print("x is a negative number")
} else {
  print("x is zero")
}
## [1] "x is zero"
In R, the ifelse() function is a shorthand vectorized
alternative to the standard if…else statement.
Many functions in R take a vector as input and return a vector as
output. Similarly, the vector equivalent of the traditional
if…else block is the ifelse() function.
The syntax of the ifelse() function is:
ifelse(test_expression, x, y)
The test_expression is checked for each element of the input in turn.
The output vector has the element x where the test_expression is TRUE,
and the element y where it is FALSE.
Example 1
# input vector
x <- c(12, 9, 23, 14, 20, 1, 5)
# ifelse() function to determine odd/even numbers
ifelse(x %% 2 == 0, "EVEN", "ODD")
## [1] "EVEN" "ODD" "ODD" "EVEN" "EVEN" "ODD" "ODD"
Example 2
# input vector of marks
marks <- c(63, 58, 12, 99, 49, 39, 41, 2)
# ifelse() function to determine pass/fail
ifelse(marks < 40, "FAIL", "PASS")
## [1] "PASS" "PASS" "FAIL" "PASS" "PASS" "FAIL" "PASS" "FAIL"
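Because ifelse() returns a vector, calls can be nested to handle more than two outcomes. The sketch below reuses the marks vector from Example 2; the cut-offs of 40 and 60 and the label "MERIT" are assumptions made purely for illustration.

```r
# input vector of marks
marks <- c(63, 58, 12, 99, 49, 39, 41, 2)

# nested ifelse(): FAIL below 40, PASS from 40 to 59, MERIT from 60 upwards
grades <- ifelse(marks < 40, "FAIL",
                 ifelse(marks < 60, "PASS", "MERIT"))
print(grades)
```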
In programming, loops are used to repeat the execution of a block of
code. Loops help you to save time, avoid repeated blocks of code, and
write cleaner code. The most commonly used loop (in this course) is the
for loop.
A for loop is used to iterate over a list, vector or any
other collection of elements. The syntax of a for loop is:
for (value in sequence) {
  # block of code
}
Here, sequence is a collection of elements and value takes each of those elements in turn. In each iteration, the block of code is executed. For example,
numbers <- c(1, 2, 3, 4, 5)
# for loop to print all elements in numbers
for (x in numbers) {
  print(x)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
In the above code, we have used a for loop to iterate
through a sequence of numbers called numbers. In each
iteration, the variable x stores the element from the
sequence and the block of code is executed.
Example 1: Count the Number of Even Numbers
# vector of numbers
num <- c(2, 3, 12, 14, 5, 19, 23, 64)
# variable to store the count of even numbers
count <- 0
# for loop to count even numbers
for (i in num) {
  # check if i is even
  if (i %% 2 == 0) {
    count <- count + 1
  }
}
print(count)
## [1] 4
In the above code, the operator %% is a modulo operator
which calculates the remainder. So x%%y finds the remainder
after dividing x by y.
R has two other kinds of loop: the while loop and the
repeat loop.
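As a rough sketch of the while and repeat loops (the counter variables and the stopping value of 3 are arbitrary choices for illustration): a while loop repeats as long as its condition is TRUE, and a repeat loop runs until a break statement is reached.

```r
# while loop: runs while the condition holds
i <- 1
while (i <= 3) {
  print(i)
  i <- i + 1
}

# repeat loop: runs until break is called
j <- 1
repeat {
  print(j)
  j <- j + 1
  if (j > 3) {
    break
  }
}
```

Both loops print the numbers 1 to 3.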
Example 2: for Loop With break Statement
You can use the break statement to exit from the for loop in any iteration. For example,
# vector of numbers
numbers <- c(2, 3, 12, 14, 5, 19, 23, 64)
# for loop with break
for (i in numbers) {
  # break the loop if number is 5
  if (i == 5) {
    break
  }
  print(i)
}
## [1] 2
## [1] 3
## [1] 12
## [1] 14
You can also nest one for loop inside another
for loop. The syntax for a nested for loop
is:
for (i in sequence_1) {
  for (j in sequence_2) {
    # code block
  }
}
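For instance, a nested loop can visit every pair of values from two vectors. The following is a minimal sketch; the ranges 1:2 and 1:3 and the variable total are arbitrary choices made for the example.

```r
total <- 0
for (i in 1:2) {
  for (j in 1:3) {
    # paste() builds a label such as "1 x 2 = 2"
    print(paste(i, "x", j, "=", i * j))
    total <- total + i * j
  }
}

# sum of all six products
print(total)
## [1] 18
```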
A function is a block of code that you can call and run from any part of your program. Functions are used to break our code into simple parts and avoid repeated code.
You can pass data into functions with the help of parameters and
return some other data as a result. You can use the
function keyword to create a function in R. The
syntax is:
func_name <- function(parameters) {
  statement
}
Here, func_name is the name of the function.
Example
# define a function to compute power
power <- function(a, b) {
  return(a^b)
}
Here, we have defined a function called power which
takes two parameters - a and b. Inside the
function, we have included code to return the value of a
raised to the power b.
After you have defined the function, you can call the function using the function name and arguments. For example,
# call the function and store the result
result <- power(3,2)
print(result)
## [1] 9
print(power(2,2)) # directly call the function
## [1] 4
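Parameters can also be given default values, which are used when the caller omits that argument. The sketch below is an illustrative variant of the function above; the name power_default and the default exponent of 2 are assumptions for the example.

```r
# b defaults to 2 when it is not supplied
power_default <- function(a, b = 2) {
  return(a^b)
}

power_default(3)         # uses the default, b = 2
## [1] 9
power_default(2, b = 3)  # arguments can also be passed by name
## [1] 8
```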
Data structures are particular ways of organizing data in the computer. Some important data structures in R are given below:
A string is a sequence of characters. In R, we represent strings
using quotation marks " ".
mystring <- "hello world!" # an example string
print(mystring)
## [1] "hello world!"
# find length of string using nchar
nchar(mystring)
## [1] 12
# join strings using paste()
string1 <- "hello"
string2 <- "world"
paste(string1, string2)
## [1] "hello world"
# change to uppercase using toupper()
toupper(string1)
## [1] "HELLO"
# change to lowercase using tolower()
tolower("HELLO")
## [1] "hello"
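A few more base R string helpers in the same spirit, shown as a brief sketch: paste0() joins strings without a separator, the sep argument of paste() changes the separator, and substr() extracts part of a string.

```r
string1 <- "hello"
string2 <- "world"

# join without any separator
paste0(string1, string2)
## [1] "helloworld"

# join with a custom separator
paste(string1, string2, sep = ", ")
## [1] "hello, world"

# extract characters 1 to 4
substr(string1, 1, 4)
## [1] "hell"
```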
A vector is a one-dimensional data structure. All elements in a vector should be of the same type (all numeric, or all character/string, etc.).
In R, we use the c() function to create a vector.
# create vector of string types
employees <- c("Sabby", "Cathy", "Lucy")
print(employees)
## [1] "Sabby" "Cathy" "Lucy"
# a vector with number sequence from 1 to 5
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
## [1] 1 2 3 4 5
# the same sequence from 1 to 5, using the : operator
numbers <- 1:5
print(numbers)
## [1] 1 2 3 4 5
# repeat sequence of vector 2 times
numbers <- rep(c(2,4,6), times = 2)
print(numbers)
## [1] 2 4 6 2 4 6
# loop over a vector
numbers <- c(1, 2, 3, 4, 5)
# iterate through each element of numbers
for (number in numbers) {
  print(number)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
Access and Modify Vector Elements
You can access individual vector elements using number indexes 1, 2, 3, ... Indexing in R starts at 1.
# a vector of string type
languages <- c("Python", "Java", "R")
# access first element of languages
print(languages[1]) # "Python"
## [1] "Python"
# access third element of languages
print(languages[3]) # "R"
## [1] "R"
To change a vector element, we can simply reassign a new value to the specific index.
languages[2] <- "C++" # replacing Java with C++
print(languages)
## [1] "Python" "C++" "R"
Length of a vector
We can find the length of a vector using the length() function.
languages <- c("Python", "Java", "R")
length(languages)
## [1] 3
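Two consequences of a vector holding a single type, sketched below with made-up values: mixing types in c() silently converts everything to a common type, and arithmetic or comparisons apply to every element at once, which also allows selecting elements by a condition.

```r
# mixing numbers and strings converts everything to character
mixed <- c(1, "two", 3)
class(mixed)
## [1] "character"

# arithmetic is applied element by element
numbers <- c(1, 2, 3, 4, 5)
numbers * 2
## [1]  2  4  6  8 10

# select the elements that satisfy a condition
numbers[numbers > 3]
## [1] 4 5
```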
A matrix is a two-dimensional data structure where data are arranged
into rows and columns. The syntax of the matrix() function in
R is:
matrix(vector, nrow, ncol)
Example
# create a 2 by 3 matrix
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
print(matrix1)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
Since we have passed byrow = TRUE, the data items in the
matrix are filled row-wise.
# access matrix elements
matrix1[1,2] # first row, second column element
## [1] 2
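Leaving one index blank selects a whole row or column, and dim() reports the shape of the matrix. A short sketch, recreating matrix1 so that it runs on its own:

```r
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

# dimensions: number of rows, then number of columns
dim(matrix1)
## [1] 2 3

# entire first row
matrix1[1, ]
## [1] 1 2 3

# entire second column
matrix1[, 2]
## [1] 2 5
```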
A list is a collection of similar or different types of data. It can be
a combination of numeric, string,
boolean values, etc.
# list with similar type of data
list1 <- list(24, 29, 32, 34)
# list with different type of data
list2 <- list("University", 38, TRUE)
You can access and modify each element in the list by number indexing.
list1 <- list(24, "Sabby", 5.4, "Nepal")
# access 1st item
print(list1[1]) # 24
## [[1]]
## [1] 24
# access 4th item
print(list1[4]) # Nepal
## [[1]]
## [1] "Nepal"
# change element at index 2
list1[2] <- "Cathy"
# print updated list
print(list1)
## [[1]]
## [1] 24
##
## [[2]]
## [1] "Cathy"
##
## [[3]]
## [1] 5.4
##
## [[4]]
## [1] "Nepal"
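Note from the output above that single brackets return a list containing the item, not the item itself. To get the item itself, use double brackets. A brief sketch with the same kind of list:

```r
list1 <- list(24, "Sabby", 5.4, "Nepal")

# single brackets return a list of length 1
class(list1[1])
## [1] "list"

# double brackets return the element itself
class(list1[[1]])
## [1] "numeric"
list1[[1]] + 1
## [1] 25
```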
You can add items to a list by using the append()
function. Note that append() returns a new list, so assign the result if you want to keep it.
list1 <- list(24, "Sabby", 5.4, "Nepal")
# using append() function
append(list1, 3.14)
## [[1]]
## [1] 24
##
## [[2]]
## [1] "Sabby"
##
## [[3]]
## [1] 5.4
##
## [[4]]
## [1] "Nepal"
##
## [[5]]
## [1] 3.14
You can remove list elements by using the index number with a negative sign.
list1 <- list(24, "Sabby", 5.4, "Nepal")
# remove 4th item
print(list1[-4]) # prints the list without "Nepal"
## [[1]]
## [1] 24
##
## [[2]]
## [1] "Sabby"
##
## [[3]]
## [1] 5.4
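Neither append() nor negative indexing changes list1 in place; both return a new list. To keep the change, assign the result back, as in this small sketch:

```r
list1 <- list(24, "Sabby", 5.4, "Nepal")

# keep the appended item
list1 <- append(list1, 3.14)
length(list1)
## [1] 5

# keep the removal of the 4th item
list1 <- list1[-4]
length(list1)
## [1] 4
```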
A data frame is a two-dimensional data structure which can store data in tabular format.
Data frames have rows and columns, and each column is a vector. Different columns can hold different data types.
In R, we use the data.frame() function to create a data
frame.
The syntax of the data.frame() function is:
dataframe1 <- data.frame(
  first_col = c(val1, val2, ...),
  second_col = c(val1, val2, ...),
  ...
)
Each column is a vector.
Example
# Create a data frame
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)
print(dataframe1)
##       Name Age  Vote
## 1     Juan  22  TRUE
## 2  Alcaraz  15 FALSE
## 3 Simantha  19  TRUE
There are different ways to extract columns from a data frame. We can
use [ ], [[ ]], or $ to access
a specific column of a data frame in R.
# Create a data frame
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)
# pass index number inside [ ]
print(dataframe1[1])
##       Name
## 1     Juan
## 2  Alcaraz
## 3 Simantha
# pass column name inside [[ ]]
print(dataframe1[["Name"]])
## [1] "Juan" "Alcaraz" "Simantha"
# use $ operator and column name
print(dataframe1$Name)
## [1] "Juan" "Alcaraz" "Simantha"
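Beyond extracting columns, rows can be selected with a condition and new columns can be added with $. The sketch below recreates dataframe1; the column name Adult and the age threshold of 18 are illustrative assumptions.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

# number of rows and columns
nrow(dataframe1)
## [1] 3
ncol(dataframe1)
## [1] 3

# rows where Age is at least 18 (rows go before the comma, columns after)
adults <- dataframe1[dataframe1$Age >= 18, ]
print(adults$Name)

# add a new logical column
dataframe1$Adult <- dataframe1$Age >= 18
```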
In R, you can use the plot() function to plot points in
a graph. A variety of plot types like line plot, bar plot, histogram etc
are available in R.
An example line plot
# sequence vector of values
x <- seq(-3, 3, 0.25) # from -3 to 3 with 0.25 spacing
# respective sine value of x
y <- sin(x)
# plot y against x
plot(x, y) # points
plot(x, y, type = "l") # line plot
plot(x, y, type = "l", main = "My Plot", xlab = "X Axis", ylab = "Y Axis", col = "blue") # additional options
R offers options to import and export many file types. The most
common file type is comma-separated values (CSV).
In R, you can use the read.csv() function to read a CSV
file available in your current directory. The syntax is:
mydata <- read.csv("filename.csv")
To export, you can use the write.csv() function.
write.csv(yourdata, "filename.csv")
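A hedged round-trip sketch: the data frame below is made up, and tempfile() is used so the example does not touch your working directory. By default write.csv() also writes the row names as an extra column, so row.names = FALSE is passed here to avoid that.

```r
# a small made-up data frame
yourdata <- data.frame(
  Name = c("Juan", "Alcaraz"),
  Age = c(22, 15)
)

# write to a temporary CSV file, without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(yourdata, csv_path, row.names = FALSE)

# read it back in
mydata <- read.csv(csv_path)
print(mydata)
```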
There are also functions to read and write Excel, Stata (.dta) and text files, to name a few. You may need to install packages to call some of these functions.
Packages are collections of R functions. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
To install a package, the syntax is:
install.packages("packagename")
To use functions from this package, you need to load it with
library(packagename).
Note that you need to install a package only once. You should install packages from the console.
For example, to read Excel files you need the readxl
package. You can install this package by executing the command
install.packages("readxl") in your console. To use
functions from this package, you need to load it by using the
library() function.
library(readxl)
You can then use the read_excel() function from this
package, for example read_excel("yourfilename.xlsx").
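If you are unsure whether a package is already installed, requireNamespace() can check without loading it. The following is just one possible pattern; it only reports the status and leaves the installation itself to you.

```r
# TRUE if readxl is installed, FALSE otherwise (nothing is installed here)
has_readxl <- requireNamespace("readxl", quietly = TRUE)

if (has_readxl) {
  print("readxl is installed; load it with library(readxl)")
} else {
  print("readxl is missing; run install.packages(\"readxl\") in the console")
}
```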
To load data, you should either specify the directory where it is located or keep it in your current working directory.
Some useful functions:
getwd() - to see which directory you are currently working in
setwd() - to set a particular working directory. We usually start with this function.
In Windows, you often need a double backslash \\ when specifying the directory. For example,
setwd("C:\\Users\\Desktop"). This is not the case
in Mac OS. Be wary of this!
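A small sketch of these functions in use. tempdir() stands in for a real project folder here, and the original directory is restored at the end. file.path() builds paths with forward slashes, which R also accepts on Windows.

```r
# remember the current working directory
old_dir <- getwd()

# switch to another directory (tempdir() is just a stand-in)
setwd(tempdir())
print(getwd())

# build a path without typing the separators by hand
data_path <- file.path("data", "filename.csv")
print(data_path)
## [1] "data/filename.csv"

# go back to where we started
setwd(old_dir)
```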
There is a comprehensive built-in help system for R. At the console, you can use any of the following:
help(foo) # for help about function foo
## No documentation for 'foo' in specified packages and libraries:
## you could try '??foo'
?foo # same as help(foo)
## No documentation for 'foo' in specified packages and libraries:
## you could try '??foo'
example(foo) # run the examples from the help page of function foo
## Warning in example(foo): no help found for 'foo'
apropos("foo") # list all functions containing string foo
## [1] "citFooter"
Most of this document is adapted from Programiz.
There are a lot of online resources to learn R. I list a few below:
The packages dplyr, readr and ggplot2 are quite useful.