Quick Overview of R

R is a programming language and software environment for statistical analysis, graphical representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team.

The core of R is an interpreted computer language which allows branching and looping as well as modular programming using functions. R allows integration with the procedures written in the C, C++, .Net, Python or FORTRAN languages for efficiency.

R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems such as Linux, Windows and Mac. The language was named R partly after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs language S.

Target Audience

This tutorial is designed for technical experts in the remote sensing domain, programmers, statisticians, GIS experts working on road networks, and data miners who want to develop statistical software using R. If you are a beginner trying to understand the R programming language, this tutorial will give you a basic understanding of almost all the concepts of the language, from where you can take yourself to higher levels of expertise.

Prerequisites

Before proceeding with this tutorial, you should have a basic understanding of computer programming terminology. A basic understanding of any programming language will help you understand the R programming concepts and move quickly along the learning track.

Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. R made its first appearance in 1993.

  • A large group of individuals has contributed to R by sending code and bug reports.

  • Since mid-1997 there has been a core group (the “R Core Team”) who can modify the R source code archive.

Features of R

As mentioned above, R is a programming language and software environment basically for statistical analysis, graphics representation and reporting. The following are the important features of R;

  • R is a well-developed, simple and effective programming language which includes conditionals, loops, user defined recursive functions and input and output facilities.

  • R has an effective data handling and storage facility.

  • R provides a suite of operators for calculations on arrays, lists, vectors and matrices.

  • R provides a large, coherent and integrated collection of tools for data analysis, more importantly, for spatial data analysis.

  • R provides graphical facilities for data analysis and display, either directly on the computer or printed on paper.

Why We Need to Use R?

  • It is a great resource for data analysis, data visualization, data science and machine learning.
  • It provides many statistical techniques (such as statistical tests, classification, clustering and data reduction)
  • It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots, etc.
  • It works on different platforms (Windows, Mac, Linux)
  • It is open-source and free
  • It has a large community support
  • It has many packages (libraries of functions) that can be used to solve different problems

In summary, R is one of the world’s most widely used statistical programming languages. It is the 3rd choice of data scientists working in the spatial domain and is supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission-critical business applications. This tutorial will teach you R programming along with suitable examples in simple and easy steps.

R - Environment Setup

Local Environment Setup

If you are still willing to set up your environment for R, you can follow the steps given below.

Windows Installation

Follow my previous tutorial in the link. R for Windows comes as an installer (.exe) named “R-version-win.exe”. You can simply double-click and run the installer, accepting the default settings. If your Windows is a 32-bit version, it installs the 32-bit version. If your Windows is 64-bit, it installs both the 32-bit and 64-bit versions.

Linux Installation

R is available as a binary for many versions of Linux at the location R Binaries.

The instructions for installing R on Linux vary from flavor to flavor. The steps are listed under each type of Linux version in the mentioned link. However, if you are in a hurry, you can use the yum command to install R as follows;

# $ yum install R

The above command installs the core functionality of R along with the standard packages. If you still need an additional package, you can launch the R prompt and install it from there.

Now we can start our job here

Now you can use install command at R prompt to install the required package. For example, the following command will install plotrix package which is required for 3D charts.

# install.packages("plotrix")   # install.packages() installs R packages, much like pip install or conda install does for Python packages
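
A common companion pattern is to install a package only when it is missing. The sketch below is one possible way to do that; the helper name ensure_package is hypothetical (it is not part of R), while requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs internet access and a configured CRAN mirror
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded in this demonstration.
ok <- ensure_package("stats")
print(ok)
```

For a package such as plotrix, the same call would trigger a download only on the first run.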

R - Basic Syntax

As a convention, we will start learning R programming by writing a “Hello, World!” program. Depending on the needs, you can program either at R command prompt or you can use an R script file to write your program. Let’s check both one by one. Then we will go through all in one using Rmarkdown.

R Command Prompt

Once you have R environment setup, then it’s easy to start your R command prompt by just typing the following command at your command prompt;

# $ R   # in linux or ubuntu environment
# > R   # in windows 

This will launch R interpreter and you will get a prompt > where you can start typing your program as follows;

myString <- "Hello, World!"
print ( myString)
## [1] "Hello, World!"

Here the first statement defines a string variable myString and assigns it the string “Hello, World!”; the next statement uses print() to print the value stored in myString.

R Script File

Usually, you will write your programs in script files and then execute those scripts at your command prompt with the help of the R interpreter called Rscript. So let’s start by writing the following code in a text file called My1stRP.R, as shown below;

# My first program in R Programming
# myString <- "Hello, World!"   # Drop the leading "#" when copying; it only comments the line out here.

# print(myString)               # Drop the leading "#" when copying.

Save the above code in a file My1stRP.R and execute it at the Linux command prompt as given below. Even if you are using Windows or another system, the syntax remains the same.

# $ Rscript My1stRP.R   # Drop the leading "# $" when copying.

When we run the above program, it produces the following result.

## [1] "Hello, World!"

Important Tips

Unlike many other programming languages, R can output a value without using a print function:

"Good Morning!"
## [1] "Good Morning!"
print("Good Morning!", quote=FALSE)
## [1] Good Morning!

However, R does have a print() function available if you want to use it. It will feel familiar if you know other programming languages, such as Python, which often use print() to output values. Inside loops and functions, results are not printed automatically, so print() is required there.

for (x in 1:10) {
  print(x)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
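
The loop above relies on print() because automatic printing only applies to top-level expressions. The following sketch makes the difference visible; capture.output() is a standard R function that collects whatever would have been printed.

```r
# Auto-printing only happens for top-level expressions.
for (x in 1:3) {
  x            # evaluated, but nothing is shown
}

for (x in 1:3) {
  print(x)     # an explicit print() is needed inside the loop
}

# Collect the printed lines of both variants to compare them.
silent <- capture.output(for (x in 1:3) x)
shown  <- capture.output(for (x in 1:3) print(x))
length(silent)   # number of lines printed without print()
length(shown)    # number of lines printed with print()
```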

Comments

Comments are like helping text in your R program and they are ignored by the interpreter while executing your actual program. Single comment is written using # in the beginning of the statement as follows;

# My first program in R Programming

R does not support multi-line comments, but you can use a trick such as the following;

if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a 
      single OR double quote"
}

myString <- "Hello, World!"
print ( myString)
## [1] "Hello, World!"

Although the code above is parsed by the R interpreter, the if(FALSE) block is never executed, so it does not interfere with your actual program. You should put such comments inside either single or double quotes.

Data Types in R

Generally, while doing programming in any programming language, you need to use various variables to store various information. Variables are nothing but reserved memory locations to store values. This means that, when you create a variable you reserve some space in memory.

You may like to store information of various data types like character, wide character, integer, floating point, double floating point, Boolean etc. Based on the data type of a variable, the operating system allocates memory and decides what can be stored in the reserved memory.

In contrast to other programming languages like C and Java, variables in R are not declared as some data type. Variables are assigned R-objects, and the data type of the R-object becomes the data type of the variable. There are many types of R-objects; the frequently used ones are:

  • Vectors
  • Lists
  • Matrices
  • Arrays
  • Factors
  • Data Frames

The simplest of these objects is the vector object and there are six data types of these atomic vectors, also termed as six classes of vectors. The other R-Objects are built upon the atomic vectors.

Data Type  Description                           Verified R code
Logical    TRUE, FALSE                           v <- TRUE
                                                 print(class(v))
                                                 ## [1] "logical"
Numeric    12.3, 5, 999                          v <- 23.5
                                                 print(class(v))
                                                 ## [1] "numeric"
Integer    2L, 34L, 0L                           v <- 2L
                                                 print(class(v))
                                                 ## [1] "integer"
Complex    3 + 2i                                v <- 2+5i
                                                 print(class(v))
                                                 ## [1] "complex"
Character  'a', "good", "TRUE", '23.4'           v <- "TRUE"
                                                 print(class(v))
                                                 ## [1] "character"
Raw        "Hello" is stored as 48 65 6c 6c 6f   v <- charToRaw("Hello")
                                                 print(class(v))
                                                 ## [1] "raw"

In R programming, the very basic data types are the R-objects called vectors which hold elements of different classes as shown above. Please note in R the number of classes is not confined to only the above six types. For example, we can use many atomic vectors and create an array whose class will become array.
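
Because an atomic vector holds a single type, mixing types in c() coerces all elements to the most general one. The short sketch below illustrates this with class() and typeof(); the variable names are chosen only for this example.

```r
# class() reports the class, typeof() the underlying storage type.
v_int <- 2L
v_num <- 23.5
class(v_int)    # "integer"
typeof(v_num)   # "double" (the storage type behind class "numeric")

# Mixing types in c() coerces everything to the most general type.
mixed_num <- c(TRUE, 2L, 3.5)   # logical and integer become numeric
mixed_chr <- c(1, "a", TRUE)    # everything becomes character
class(mixed_num)
class(mixed_chr)
```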

Vectors

When you want to create vector with more than one element, you should use c() function which means to combine the elements into a vector.

# Create a vector.
RoadType <- c('Asphalt','Coble',"Gravel","...")
print(RoadType)
## [1] "Asphalt" "Coble"   "Gravel"  "..."
# Get the class of the vector.
print(class(RoadType))
## [1] "character"

Lists

A list is an R-object which can contain many different types of elements inside it like vectors, functions and even another list inside it.

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)
## [[1]]
## [1] 2 5 3
## 
## [[2]]
## [1] 21.3
## 
## [[3]]
## function (x)  .Primitive("sin")
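
Once a list exists, its elements can be pulled out by position or by name. The sketch below uses a small named list invented for this illustration (the element names lengths, speed and f are assumptions, not part of the example above).

```r
# A list with named elements (names chosen for this illustration).
road <- list(lengths = c(2, 5, 3), speed = 21.3, f = sin)

road[["speed"]]        # [[ ]] extracts the element itself
road$lengths[2]        # $ is shorthand for [[ ]] with a name
class(road["speed"])   # single [ ] keeps the result wrapped in a list
road$f(0)              # a stored function can be called directly
```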

Do the following as an exercise

data <- data.frame(x1 = 1:5,     # Create example data object
                   x2 = letters[1:5])
data                             # Print example data object
##   x1 x2
## 1  1  a
## 2  2  b
## 3  3  c
## 4  4  d
## 5  5  e

Then create List of Data Attributes Using attributes() Functions

data_attr <- attributes(data)    # Apply attributes function
data_attr                        # Print list of attributes
## $names
## [1] "x1" "x2"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3 4 5

As you can see in the previously shown RStudio console output, we have created a new list called data_attr, which contains multiple attributes of our example data frame. More precisely, we have returned the column names, the class, and the row.names of our data frame.

Extract Certain Attribute of Data Object

In this exercise, you’ll see how to extract only one specific attribute of a data object. More precisely, the following R code accesses the class attribute of a data object:

data_attr$class                  # Extract class attribute
## [1] "data.frame"

As you can see, the class of our data object is the data.frame class.

Matrices

A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the matrix function.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,] "a"  "a"  "b" 
## [2,] "c"  "b"  "a"
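
The byrow argument controls the order in which the input vector fills the matrix. As a small sketch with illustrative values, the same vector gives two different layouts:

```r
values <- 1:6
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)   # fill row by row
by_col <- matrix(values, nrow = 2, ncol = 3)                 # default: fill column by column

by_row[1, ]   # first row when filled by row: 1 2 3
by_col[1, ]   # first row when filled by column: 1 3 5
dim(by_row)   # 2 rows, 3 columns
```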

More exercises: first, create example data as data1.

data1 <- matrix(1:20, ncol = 5)                   # Create example matrix
data1                                             # Print example matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20

Set Row Names of Data without Knowing the Number of Rows

The following R code illustrates how to set the row numbers of a matrix from 1 to the number of rows without actually knowing the number of rows.

rownames(data1) <- 1:nrow(data1)                   # Set row names
data1                                             # Print updated data
##   [,1] [,2] [,3] [,4] [,5]
## 1    1    5    9   13   17
## 2    2    6   10   14   18
## 3    3    7   11   15   19
## 4    4    8   12   16   20

As shown in the result, we have set the row names of our matrix by executing the previously shown R programming syntax.

Set Column Names of Data without Knowing the Number of Columns

The syntax below explains how to define variable names of an unknown number of columns. For this task, we can apply the colnames, paste0, and ncol functions as shown below:

colnames(data1) <- paste0("Col", 1:ncol(data1))    # Set column names
data1                                             # Print updated data
##   Col1 Col2 Col3 Col4 Col5
## 1    1    5    9   13   17
## 2    2    6   10   14   18
## 3    3    7   11   15   19
## 4    4    8   12   16   20

Extract Values from Matrix by Column & Row Names in R

In this exercise, you’ll see how to get certain column and row values using the column and row names of a matrix in R.

my_matrix <- matrix(1:15, ncol = 5)                # Create example matrix
colnames(my_matrix) <- paste0("Col", 1:5)
rownames(my_matrix) <- paste0("Row", 1:3)
my_matrix                                          # Print example matrix
##      Col1 Col2 Col3 Col4 Col5
## Row1    1    4    7   10   13
## Row2    2    5    8   11   14
## Row3    3    6    9   12   15

As you can see based on the previously shown RStudio console output, our example matrix has three rows and five columns. The rows of our matrix are named Row1 – Row3 and the variables are named Col1 – Col5. Let’s extract some values of our matrix!

Extracting Certain Columns of Matrix by Column Names

my_matrix_col <- my_matrix[ , c("Col2", "Col5")]   # Extract columns
my_matrix_col                                      # Print updated matrix
##      Col2 Col5
## Row1    4   13
## Row2    5   14
## Row3    6   15

Extracting Certain Rows of Matrix by Row Names

my_matrix_row <- my_matrix[c("Row2", "Row3"), ]    # Extract rows
my_matrix_row                                      # Print updated matrix
##      Col1 Col2 Col3 Col4 Col5
## Row2    2    5    8   11   14
## Row3    3    6    9   12   15

We have extracted all columns, but only the second and third row were kept.

Extracting Certain Columns & Rows of Matrix by Column & Row Names

The code explains how to use column and row names simultaneously to extract specific data points from our matrix:

my_matrix_col_row <- my_matrix[c("Row1", "Row3"),  # Extract columns & rows
                               c("Col1", "Col4", "Col5")]
my_matrix_col_row                                  # Print updated matrix
##      Col1 Col4 Col5
## Row1    1   10   13
## Row3    3   12   15

More tips: Select Data Frame Columns by Logical Condition in R

In this practice, you will see how to extract particular data frame columns based on a logical condition in R. First, we’ll need to create some data that we can use in the following examples:

data <- data.frame(x1 = 1:5,                      # Create example data
                   y1 = letters[1:5],
                   x2 = "x",
                   x3 = 9:5,
                   y2 = 7)
data                                              # Print example data
##   x1 y1 x2 x3 y2
## 1  1  a  x  9  7
## 2  2  b  x  8  7
## 3  3  c  x  7  7
## 4  4  d  x  6  7
## 5  5  e  x  5  7

The previous output of the RStudio console shows the structure of our example data: It has five rows and five columns. Some of the variable names start with x and some of the variable names start with y.

Extract Data Frame Variables by Logical Condition Using grepl() Function

data_new1 <- data[ , grepl("x", colnames(data))]  # Extract by logical
data_new1                                         # Print updated data
##   x1 x2 x3
## 1  1  x  9
## 2  2  x  8
## 3  3  x  7
## 4  4  x  6
## 5  5  x  5
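
Note that grepl("x", ...) matches any name that contains an x, not only names that start with one. For the example data both readings give the same columns, but they can differ. The sketch below uses illustrative column names (max_y is a hypothetical name added to show the difference).

```r
# Column names for illustration; "max_y" contains an x but does not start with one.
nms <- c("x1", "y1", "x2", "max_y")

anywhere <- grepl("x", nms)        # matches "x" anywhere in the name
anchored <- grepl("^x", nms)       # "^" anchors the match to the start
prefix   <- startsWith(nms, "x")   # same idea without a regular expression

nms[anywhere]
nms[anchored]
```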

Extract Data Frame Variables by Logical Condition Using select() & starts_with() Functions of dplyr Package

Here you’ll see how to subset data frame columns whose names match a specific prefix condition. For this, we’ll use the dplyr add-on package. First, we have to install and load the dplyr package:

#install.packages("dplyr")                         # Install dplyr package
library("dplyr")                                  # Load dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data_new2 <- data %>%                             # Using dplyr functions
  select(starts_with("x"))
data_new2                                         # Print updated data
##   x1 x2 x3
## 1  1  x  9
## 2  2  x  8
## 3  3  x  7
## 4  4  x  6
## 5  5  x  5

Subset Data Frame Rows by Logical Condition in R

In this tutorial you’ll learn how to subset rows of a data frame based on a logical condition in R.

data <- data.frame(x1 = c(3, 7, 1, 8, 5),                    # Create example data
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))
data                                                         # Print example data
##   x1 x2 group
## 1  3  a    g1
## 2  7  b    g2
## 3  1  c    g1
## 4  8  d    g3
## 5  5  e    g1

Subset Rows with ==

First, we’ll filter the rows of our data with the == operator. Have a look at the following R code:

data[data$group == "g1", ]                                   # Subset rows with ==
##   x1 x2 group
## 1  3  a    g1
## 3  1  c    g1
## 5  5  e    g1

Subset Rows with !=

We can also subset our data the other way around (compared to the == example). The following R code selects only rows where the group column is unequal to “g1”. We can do this based on the != operator:

data[data$group != "g1", ]                                   # Subset rows with !=
##   x1 x2 group
## 2  7  b    g2
## 4  8  d    g3

Subset Rows with %in%

We can also use the %in% operator to filter data by a logical vector. The %in% operator is especially helpful, when we want to use multiple conditions. In the following R syntax, we retain rows where the group column is equal to “g1” OR “g3”:

data[data$group %in% c("g1", "g3"), ]                        # Subset rows with %in%
##   x1 x2 group
## 1  3  a    g1
## 3  1  c    g1
## 4  8  d    g3
## 5  5  e    g1

Subset Rows with subset Function

Base R also provides the subset() function for the filtering of rows by a logical vector. Consider the following R code:

subset(data, group == "g1")                                  # Apply subset function
##   x1 x2 group
## 1  3  a    g1
## 3  1  c    g1
## 5  5  e    g1

The output is the same as with the == operator, but this time we used the subset function by specifying the name of our data frame and the logical criteria within the function.

Subset Rows with filter Function [dplyr Package]

Now, we can use the filter function of the dplyr package as follows:

filter(data, group == "g1")                                  # Apply filter function
##   x1 x2 group
## 1  3  a    g1
## 2  1  c    g1
## 3  5  e    g1

Compare the R syntax of the subset() example and the filter() example from the dplyr package. The subset and filter functions are very similar.
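
Conditions can also be combined. As a minimal sketch on the same example data, & keeps rows where both conditions hold and | keeps rows where at least one holds; the thresholds used here are arbitrary illustrations.

```r
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("g1", "g2", "g1", "g3", "g1"))

# & requires both conditions; | requires at least one.
both   <- data[data$group == "g1" & data$x1 > 2, ]
either <- subset(data, group == "g2" | x1 < 2)

both$x2     # rows in group g1 with x1 above 2
either$x2   # rows in group g2, or with x1 below 2
```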

Arrays

While matrices are confined to two dimensions, arrays can have any number of dimensions. The array function takes a dim attribute which creates the required number of dimensions. In the example below we create an array with two elements, each of which is a 3x3 matrix.

# Create an array.
Roads <- array(c('Class 1','Class 2', 'class 3'),dim = c(3,3,2))
print(Roads)
## , , 1
## 
##      [,1]      [,2]      [,3]     
## [1,] "Class 1" "Class 1" "Class 1"
## [2,] "Class 2" "Class 2" "Class 2"
## [3,] "class 3" "class 3" "class 3"
## 
## , , 2
## 
##      [,1]      [,2]      [,3]     
## [1,] "Class 1" "Class 1" "Class 1"
## [2,] "Class 2" "Class 2" "Class 2"
## [3,] "class 3" "class 3" "class 3"
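
In the array above, the three input values are recycled to fill all 18 cells. Individual cells and whole slices can then be selected with one index per dimension, as this short sketch shows:

```r
Roads <- array(c('Class 1', 'Class 2', 'class 3'), dim = c(3, 3, 2))

dim(Roads)              # 3 rows, 3 columns, 2 matrices
Roads[2, 1, 2]          # row 2, column 1 of the second matrix
first <- Roads[, , 1]   # the whole first 3x3 matrix
dim(first)
```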

Factors

Factors are R-objects which are created using a vector. A factor stores the vector along with the distinct values of its elements as labels. The labels are always character, irrespective of whether the input vector is numeric, character, Boolean, etc. They are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
## [1] green  green  yellow red    red    red    green 
## Levels: green red yellow
print(nlevels(factor_apple))
## [1] 3
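
To see how a factor stores its data, it can help to look at the labels and the integer codes separately. The sketch below uses the standard functions levels(), as.integer() and table() on the same example vector.

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)

levels(factor_apple)       # distinct labels, sorted alphabetically
as.integer(factor_apple)   # integer codes pointing into the levels
table(factor_apple)        # how often each level occurs
```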

Data Frames

Data frames are tabular data objects. Unlike a matrix, in a data frame each column can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical. It is a list of vectors of equal length. Data frames are created using the data.frame() function.

# Create the data frame.
Lab1 <-     data.frame(
   road_type = c("Express", "Asphalt","Coble"), 
   Length_Km = c(200, 200, 200), 
   time_min = c(40,180, 240),
   DriverAge = c(42,38,26)
)
print(Lab1)
##   road_type Length_Km time_min DriverAge
## 1   Express       200       40        42
## 2   Asphalt       200      180        38
## 3     Coble       200      240        26
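
Since every column is a vector, new columns can be computed from existing ones. The sketch below inspects the structure with str() and adds a derived column; the column name speed_kmh is a hypothetical addition for illustration.

```r
Lab1 <- data.frame(
  road_type = c("Express", "Asphalt", "Coble"),
  Length_Km = c(200, 200, 200),
  time_min  = c(40, 180, 240),
  DriverAge = c(42, 38, 26)
)

str(Lab1)   # one line per column with its type

# Hypothetical derived column: average speed in km/h.
Lab1$speed_kmh <- Lab1$Length_Km / (Lab1$time_min / 60)
Lab1[, c("road_type", "speed_kmh")]
```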

R - Variables

A variable provides us with named storage that our programs can manipulate. A variable in R can store an atomic vector, a group of atomic vectors or a combination of many R-objects. A valid variable name consists of letters, numbers and the dot or underscore characters. The variable name starts with a letter, or with a dot not followed by a number.

Variable Name        Validity  Reason
var_name2.           valid     Has letters, numbers, dot and underscore.
var_name%            invalid   Has the character '%'. Only dot (.) and underscore are allowed.
2var_name            invalid   Starts with a number.
.var_name, var.name  valid     Can start with a dot (.), but the dot should not be followed by a number.
.2var_name           invalid   The starting dot is followed by a number, making it invalid.
_var_name            invalid   Starts with _, which is not valid.
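
The rules in the table can be checked programmatically. One possible approach, sketched below, uses the base function make.names(), which rewrites a string into a syntactically valid name; a candidate that comes back unchanged was already valid. The vector name candidates is chosen for this illustration.

```r
candidates <- c("var_name2.", "var_name%", "2var_name",
                ".var_name", "var.name", ".2var_name", "_var_name")

# A name is syntactically valid when make.names() leaves it unchanged.
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)
```

Reserved words such as if or TRUE are also altered by make.names(), so they would be reported as invalid by this check as well.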

Variable Assignment

The variables can be assigned values using leftward, rightward and equal to operator. The values of the variables can be printed using print() or cat() function. The cat() function combines multiple items into a continuous print output.

# Assignment using equal operator.
var.1 = c(0,1,2,3)           

# Assignment using leftward operator.
var.2 <- c("learn","R")   

# Assignment using rightward operator.   
c(TRUE,1) -> var.3           

print(var.1)
## [1] 0 1 2 3
cat ("var.1 is ", var.1 ,"\n")
## var.1 is  0 1 2 3
cat ("var.2 is ", var.2 ,"\n")
## var.2 is  learn R
cat ("var.3 is ", var.3 ,"\n")
## var.3 is  1 1

Note − The vector c(TRUE,1) has a mix of logical and numeric class. So logical class is coerced to numeric class making TRUE as 1.

Data Type of a Variable

In R, a variable itself is not declared with any data type; rather, it gets the data type of the R-object assigned to it. So R is called a dynamically typed language, which means that we can change the data type of the same variable again and again when using it in a program.

var_x <- "Hello"
cat("The class of var_x is ",class(var_x),"\n")
## The class of var_x is  character
var_x <- 34.5
cat("  Now the class of var_x is ",class(var_x),"\n")
##   Now the class of var_x is  numeric
var_x <- 27L
cat("   Next the class of var_x becomes ",class(var_x),"\n")
##    Next the class of var_x becomes  integer

Finding Variables

To know all the variables currently available in the workspace we use the ls() function. Also the ls() function can use patterns to match the variable names.

print(ls())
##  [1] "apple_colors"      "data"              "data_attr"        
##  [4] "data_new1"         "data_new2"         "data1"            
##  [7] "factor_apple"      "Lab1"              "list1"            
## [10] "M"                 "my_matrix"         "my_matrix_col"    
## [13] "my_matrix_col_row" "my_matrix_row"     "myString"         
## [16] "Roads"             "RoadType"          "var_x"            
## [19] "var.1"             "var.2"             "var.3"            
## [22] "x"

Note − This is sample output; yours depends on what variables are declared in your environment. The ls() function can use patterns to match the variable names.

# List the variables starting with the pattern "var".
print(ls(pattern = "var")) 
## [1] "var_x" "var.1" "var.2" "var.3"

Tips here: The variables starting with dot(.) are hidden, they can be listed using “all.names = TRUE” argument to ls() function.

print(ls(all.names = TRUE))
##  [1] "apple_colors"      "data"              "data_attr"        
##  [4] "data_new1"         "data_new2"         "data1"            
##  [7] "factor_apple"      "Lab1"              "list1"            
## [10] "M"                 "my_matrix"         "my_matrix_col"    
## [13] "my_matrix_col_row" "my_matrix_row"     "myString"         
## [16] "Roads"             "RoadType"          "var_x"            
## [19] "var.1"             "var.2"             "var.3"            
## [22] "x"

Deleting Variables

Variables can be deleted by using the rm() function. Below we delete the variable var.3. Printing the variable afterwards throws an error.

#rm(var.3)
#print(var.3)

## Error in print(var.3) : object 'var.3' not found

All the variables can be deleted by using the rm() and ls() function together.

rm(list = ls())
print(ls())
## character(0)

Operators in R programming Languages

An operator is a symbol that tells the interpreter to perform specific mathematical or logical manipulations. The R language is rich in built-in operators.

Types of Operators

We have the following types of operators in R programming:

  • Arithmetic Operators
  • Relational Operators
  • Logical Operators
  • Assignment Operators
  • Miscellaneous Operators

Arithmetic Operators

Following table shows the arithmetic operators supported by R language. The operators act on each element of the vector.

Operator  Description                                 R code example
+         Adds two vectors                            v <- c(2, 5.5, 6)
                                                      t <- c(8, 3, 4)
                                                      print(v + t)
-         Subtracts the second vector from the first  print(v - t)
*         Multiplies both vectors                     print(v * t)
/         Divides the first vector by the second      print(v / t)
%%        Gives the remainder of dividing the first   print(v %% t)
          vector by the second
%/%       Gives the quotient of dividing the first    print(v %/% t)
          vector by the second
^         Raises the first vector to the exponent of  print(v ^ t)
          the second vector
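
The quotient and remainder operators are easiest to understand together. As a worked sketch with the vectors from the table, the two results recombine into the original values, and a single number is recycled across a whole vector:

```r
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)

quotient  <- v %/% t   # 0 1 1
remainder <- v %% t    # 2.0 2.5 2.0

# quotient * divisor + remainder gives back the original vector.
all.equal(quotient * t + remainder, v)

# A shorter vector is recycled: 10 is added to every element.
v + 10                 # 12.0 15.5 16.0
```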

Relational Operators

Following table shows the relational operators supported by R language. Each element of the first vector is compared with the corresponding element of the second vector. The result of comparison is a Boolean value.

Operator  Description                                 R code example
>         Checks if each element of the first         v <- c(2, 5.5, 6, 9)
          vector is greater than the corresponding    t <- c(8, 2.5, 14, 9)
          element of the second vector.               print(v > t)
<         Checks if each element of the first         print(v < t)
          vector is less than the corresponding
          element of the second vector.
==        Checks if each element of the first         print(v == t)
          vector is equal to the corresponding
          element of the second vector.
<=        Checks if each element of the first         print(v <= t)
          vector is less than or equal to the
          corresponding element of the second
          vector.
>=        Checks if each element of the first         print(v >= t)
          vector is greater than or equal to the
          corresponding element of the second
          vector.
!=        Checks if each element of the first         print(v != t)
          vector is unequal to the corresponding
          element of the second vector.

Logical Operators

The following table shows the logical operators supported by the R language. They apply only to vectors of type logical, numeric or complex. Zero is treated as FALSE and every non-zero number as TRUE.

Each element of the first vector is compared with the corresponding element of the second vector. The result of comparison is a Boolean value.

Operator  Description                                 R code example
&         Element-wise logical AND. Combines each     v <- c(3, 1, TRUE, 2+3i)
          element of the first vector with the        t <- c(4, 1, FALSE, 2+3i)
          corresponding element of the second         print(v & t)
          vector and gives TRUE if both elements
          are TRUE.
|         Element-wise logical OR. Combines each      print(v | t)
          element of the first vector with the
          corresponding element of the second
          vector and gives TRUE if at least one of
          the elements is TRUE.
!         Logical NOT. Takes each element of the      v <- c(3, 0, TRUE, 2+2i)
          vector and gives the opposite logical       print(!v)
          value.

The logical operators && and || work on single values rather than element by element: the result is always a single logical value. Older versions of R silently used only the first element of longer vectors; since R 4.3.0, passing a vector longer than one is an error, so the examples below select the first elements explicitly.

Operator  Description                                 R code example
&&        Logical AND. Takes the first element of     v <- c(3, 0, TRUE, 2+2i)
          both vectors and gives TRUE only if both    t <- c(1, 3, TRUE, 2+3i)
          are TRUE.                                   print(v[1] && t[1])
||        Logical OR. Takes the first element of      v <- c(0, 0, TRUE, 2+2i)
          both vectors and gives TRUE if one of       t <- c(0, 3, TRUE, 2+3i)
          them is TRUE.                               print(v[1] || t[1])
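
The difference between the element-wise and the single-value operators can be sketched as follows. The example also shows short-circuit evaluation, where the right-hand side of && is skipped once the left-hand side has decided the result; the variables x and safe are illustrative names.

```r
v <- c(3, 0, TRUE, 2+2i)
t <- c(1, 3, TRUE, 2+3i)

elementwise <- v & t          # one result per element
single      <- v[1] && t[1]   # one result for the selected pair
elementwise
single

# Short-circuit: x > 0 is never evaluated because the left side is already FALSE.
x <- NULL
safe <- !is.null(x) && x > 0
safe
```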

Assignment Operators

These operators are used to assign values to vectors.

Operator      Description              R code example
<-, =, <<-    Called Left Assignment   v1 <- c(3, 1, TRUE, 2+3i)
                                       v2 <<- c(3, 1, TRUE, 2+3i)
                                       v3 = c(3, 1, TRUE, 2+3i)
                                       print(v1)
                                       print(v2)
                                       print(v3)
->, ->>       Called Right Assignment  c(3, 1, TRUE, 2+3i) -> v1
                                       c(3, 1, TRUE, 2+3i) ->> v2
                                       print(v1)
                                       print(v2)
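
At the top level, <- and <<- behave the same; the difference shows up inside functions, where <<- assigns to a variable in an enclosing environment instead of creating a local one. The sketch below uses hypothetical function names to make this visible.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a local copy; the outer variable stays unchanged
  counter
}

increment_global <- function() {
  counter <<- counter + 1   # assigns in the enclosing (here: global) environment
  counter
}

local_result <- increment_local()
after_local  <- counter     # still 0
increment_global()
counter                     # now 1
```

Because <<- changes state outside the function, it is usually used sparingly.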

Miscellaneous Operators

These operators are used for specific purposes rather than general mathematical or logical computation.

  • : is the colon operator. It creates a series of numbers in sequence for a vector.

v <- 2:8
print(v)
## [1] 2 3 4 5 6 7 8

  • %in% is used to identify whether an element belongs to a vector.

v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)

  • %*% is used for matrix multiplication. Here a matrix is multiplied with its transpose.

M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
t = M %*% t(M)
print(t)
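
The %in% operator is also vectorised over its left-hand side, which makes it handy for filtering. A small sketch with made-up road type names:

```r
road.types <- c("Express", "Asphalt", "Gravel")
observed <- c("Gravel", "Dirt", "Express")

# One logical value per element of observed.
is.known <- observed %in% road.types
print(is.known)

# Keep only the observed values that are known road types.
known <- observed[is.known]
print(known)
```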

Decision making in R

Decision making structures require the programmer to specify one or more conditions to be evaluated or tested by the program, along with a statement or statements to be executed if the condition is determined to be true, and optionally, other statements to be executed if the condition is determined to be false.

R provides the following types of decision making statements.

  • if statement. An if statement consists of a Boolean expression followed by one or more statements. The basic syntax for creating an if statement in R is
# if(boolean_expression) {
#   // statement(s) will execute if the boolean expression is true.
# }

If the Boolean expression evaluates to TRUE, the block of code inside the if statement is executed. If it evaluates to FALSE, execution continues with the first statement after the closing curly brace of the if statement.

Example

x <- 30L
if(is.integer(x)) {
   print("X is an Integer")
}
## [1] "X is an Integer"
  • if…else statement. An if statement can be followed by an optional else statement, which executes when the Boolean expression is false.
#if(boolean_expression) {
#   // statement(s) will execute if the boolean expression is true.
# } else {
#   // statement(s) will execute if the boolean expression is false.
# }
x <- c("what","is","truth")

if("Truth" %in% x) {
   print("Truth is found")
} else {
   print("Truth is not found")
}
## [1] "Truth is not found"
The if…else if…else Statement

An if statement can be followed by an optional else if…else statement, which is very useful for testing several conditions in a single if…else if statement. When using if, else if and else statements, there are a few points to keep in mind.

  • An if can have zero or one else and it must come after any else if’s.

  • An if can have zero to many else if’s and they must come before the else.

  • Once an else if succeeds, none of the remaining else if’s or else’s will be tested.

The basic syntax for creating an if…else if…else statement in R is

# if(boolean_expression 1) {
#   // Executes when the boolean expression 1 is true.
# } else if( boolean_expression 2) {
#   // Executes when the boolean expression 2 is true.
# } else if( boolean_expression 3) {
#   // Executes when the boolean expression 3 is true.
# } else {
#   // executes when none of the above condition is true.
# }
x <- c("what","is","truth")

if("Truth" %in% x) {
   print("Truth is found the first time")
} else if ("truth" %in% x) {
   print("truth is found the second time")
} else {
   print("No truth found")
}
## [1] "truth is found the second time"
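
An if statement tests a single condition at a time. When the same decision has to be made for every element of a vector, the built-in ifelse() function is a vectorised alternative. The sketch below is only an illustration; describe.speed() and the speed thresholds are made up.

```r
speeds <- c(45, 80, 120)

# if…else if…else handles one value at a time.
describe.speed <- function(s) {
   if (s >= 100) {
      "fast"
   } else if (s >= 60) {
      "medium"
   } else {
      "slow"
   }
}
single <- describe.speed(speeds[2])
print(single)

# ifelse() applies the test to every element of the vector.
speed.labels <- ifelse(speeds >= 100, "fast",
   ifelse(speeds >= 60, "medium", "slow"))
print(speed.labels)
```
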
  • switch statement. A switch statement allows a variable to be tested for equality against a list of values. The basic syntax for creating a switch statement in R is −
# switch(expression, case1, case2, case3....)

The following rules apply to a switch statement −

  • If the value of expression is not a character string it is coerced to integer.

  • You can supply any number of alternatives to switch. Each alternative is a further argument of the call and may be named; R does not use case labels or colons.

  • If the value of the integer is between 1 and nargs()−1 (the maximum number of alternatives), then the corresponding alternative is evaluated and the result returned.

  • If expression evaluates to a character string then that string is matched (exactly) to the names of the elements.

  • If there is more than one match, the first matching element is returned.

  • There is no dedicated default keyword.

  • In the case of no match for a character string, if there is an unnamed element of … its value is returned. (If there is more than one such argument an error is returned.)

Example

x <- switch(
   3,
   "first",
   "second",
   "third",
   "fourth"
)
print(x)
## [1] "third"
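
The rules for character strings can be sketched as follows. The function speed.limit() and its values are hypothetical and only serve to show name matching, fall-through with an empty alternative, and the unnamed alternative acting as a default.

```r
speed.limit <- function(road.type) {
   switch(road.type,
      express = 120,
      asphalt = ,        # empty alternative: falls through to the next value
      paved   = 80,
      gravel  = 50,
      NA                 # unnamed alternative: returned when nothing matches
   )
}

print(speed.limit("express"))
print(speed.limit("asphalt"))
print(speed.limit("dirt"))
```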

Loops in R

There may be a situation when you need to execute a block of code several times. In general, statements are executed sequentially. The first statement in a function is executed first, followed by the second, and so on.

Programming languages provide various control structures that allow for more complicated execution paths.

A loop statement allows us to execute a statement or group of statements multiple times.

The R programming language provides the following kinds of loops to handle looping requirements.

  • repeat loop. Executes a block of statements again and again until a break statement inside the block stops it. The basic syntax for creating a repeat loop in R is
# repeat { 
#   commands 
#   if(condition) {
#      break
#   }
# }

Example

v <- c("Hello","loop")
cnt <- 2

repeat {
   print(v)
   cnt <- cnt+1
   
   if(cnt > 5) {
      break
   }
}
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop"
  • while loop. Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body. The basic syntax for creating a while loop in R is
# while (test_expression) {
#   statement
# }

The key point of the while loop is that the loop might never run. When the condition is tested and the result is false, the loop body is skipped and the first statement after the while loop is executed. Example,

v <- c("Hello","while loop")
cnt <- 2

while (cnt < 7) {
   print(v)
   cnt = cnt + 1
}
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
## [1] "Hello"      "while loop"
  • for loop. A for loop is a repetition control structure that runs the loop body once for each element of a vector or list, which makes it a natural fit for a loop that needs to execute a specific number of times. The basic syntax for creating a for loop statement in R is
# for (value in vector) {
#    statements
# }

R’s for loops are particularly flexible in that they are not limited to integers, or even numbers in the input. We can pass character vectors, logical vectors, lists or expressions. For example,

v <- LETTERS[1:4]
for ( i in v) {
   print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "D"
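
When the position of each element is needed as well as its value, a common idiom is to loop over seq_along(v). This is a minimal sketch; the variable names empty and runs are only illustrative.

```r
v <- LETTERS[1:4]

# seq_along(v) gives the positions 1, 2, ..., length(v).
for (i in seq_along(v)) {
   print(paste(i, v[i]))
}

# With an empty vector seq_along() yields nothing, so the body never runs,
# whereas 1:length(empty) would count down from 1 to 0.
empty <- character(0)
runs <- 0
for (i in seq_along(empty)) {
   runs <- runs + 1
}
print(runs)
```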

Loop Control Statements

Loop control statements change execution from its normal sequence. R does not create a new scope for a loop body, so variables assigned inside a loop remain available after the loop ends.

R supports the following control statements.

  • break statement. Terminates the loop statement and transfers execution to the statement immediately following the loop. When the break statement is encountered inside a loop, the loop is immediately terminated and program control resumes at the next statement following the loop. Unlike in C-like languages, break is not needed to end an alternative of a switch statement in R.

The basic syntax for creating a break statement in R is

# break

Example,

v <- c("Hello","loop")
cnt <- 2

repeat {
   print(v)
   cnt <- cnt + 1
    
   if(cnt > 5) {
      break
   }
}
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop" 
## [1] "Hello" "loop"

  • next statement. The next statement is useful when we want to skip the current iteration of a loop without terminating it. On encountering next, R skips the rest of the loop body and starts the next iteration of the loop.

The basic syntax for creating a next statement in R is,

# next

Example

v <- LETTERS[1:6]
for ( i in v) {
   
   if (i == "D") {
      next
   }
   print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "E"
## [1] "F"
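
The two control statements can be combined in one loop. The following sketch, with an arbitrary cut-off of 7, skips even numbers with next and stops the loop with break.

```r
total <- 0
for (i in 1:10) {
   if (i %% 2 == 0) {
      next      # skip even numbers
   }
   if (i > 7) {
      break     # leave the loop once i exceeds 7
   }
   total <- total + i
}
# total is the sum of 1, 3, 5 and 7.
print(total)
```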

Functions in R

A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other objects.

Function Definition

An R function is created by using the keyword function. The basic syntax of an R function definition is as follows

# function_name <- function(arg_1, arg_2, ...) {
#   Function body 
# }

Function Components

The different parts of a function are −

  • Function Name − This is the actual name of the function. It is stored in R environment as an object with this name.

  • Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.

  • Function Body − The function body contains a collection of statements that defines what the function does.

  • Return Value − The return value of a function is the last expression in the function body to be evaluated.
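
These components can be inspected on any function. The sketch below defines a hypothetical function rect.area() and uses the base functions formals() and body() to look at its arguments and body.

```r
rect.area <- function(width, height = 1) {
   width * height    # last expression evaluated, so this is the return value
}

print(names(formals(rect.area)))   # the argument names
print(body(rect.area))             # the function body
print(rect.area(3))                # height falls back to its default value
print(rect.area(3, 2))
```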

R has many in-built functions which can be directly called in the program without defining them first. We can also create and use our own functions, referred to as user-defined functions.

Built-in Function

Simple examples of in-built functions are seq(), mean(), max(), sum() and paste(). They are called directly by user-written programs. Example,

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
##  [1] 32 33 34 35 36 37 38 39 40 41 42 43 44
# Find mean of numbers from 25 to 82.
print(mean(25:82))
## [1] 53.5
# Find sum of numbers from 41 to 68.
print(sum(41:68))
## [1] 1526

User-defined Function

We can create user-defined functions in R. They are specific to what a user wants and once created they can be used like the built-in functions. Below is an example of how a function is created and used.

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}   

In this example, you’ll learn to take input from a user with the readline() function. In an interactive R session, readline() reads one line typed at the terminal and returns it as a single-element character vector, so numbers need an explicit conversion. The output shown below was produced non-interactively: both inputs were empty, so the name is blank and the age is NA.

Example: Take input from user

my.name <- readline(prompt="Enter name:")
## Enter name:
my.age <- readline(prompt="Enter age:")
## Enter age:
# convert character into integer
my.age <- as.integer(my.age)
print(paste("Hi,", my.name, "next year you will be", my.age+1, "years old. "))
## [1] "Hi,  next year you will be NA years old. "
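
Because readline() always returns text, it can help to validate the value before using it. The helper parse.age() below is a hypothetical sketch, not part of R; it takes the text as an argument so that it can be tried without an interactive session.

```r
parse.age <- function(text) {
   # as.integer() returns NA when the text is not a number.
   age <- suppressWarnings(as.integer(text))
   if (is.na(age) || age < 0) {
      stop("Please enter the age as a non-negative number.")
   }
   age
}

print(parse.age("41") + 1)

# An empty answer is rejected instead of silently becoming NA.
outcome <- tryCatch(parse.age(""), error = function(e) conditionMessage(e))
print(outcome)
```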

Calling a Function

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36

Calling a Function without an Argument

# Create a function without an argument.
new.function <- function() {
   for(i in 1:5) {
      print(i^2)
   }
}   

# Call the function without supplying an argument.
new.function()
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25

Calling a Function with Argument Values (by position and by name)

The arguments to a function call can be supplied in the same sequence as defined in the function or they can be supplied in a different sequence but assigned to the names of the arguments.

# Create a function with arguments.
new.function <- function(a,b,c) {
   result <- a * b + c
   print(result)
}

# Call the function by position of arguments.
new.function(5,3,11)
## [1] 26
# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)
## [1] 58

Calling a Function with Default Argument

We can define default values for the arguments in the function definition and call the function without supplying any argument to get the default result. We can also call such functions with new argument values to get a non-default result.

# Create a function with arguments.
new.function <- function(a = 3, b = 6) {
   result <- a * b
   print(result)
}

# Call the function without giving any argument.
new.function()
## [1] 18
# Call the function with giving new values of the argument.
new.function(9,5)
## [1] 45

Lazy Evaluation of Function

Arguments to functions are evaluated lazily, which means they are evaluated only when needed by the function body.

# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}

# Evaluate the function without supplying one of the arguments.
new.function(6)
## [1] 36
## [1] 6
## Error in print(b) : argument "b" is missing, with no default
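
The other side of lazy evaluation is that an argument which is never used is never evaluated. In this minimal sketch the hypothetical function square.first() ignores b, so neither a missing b nor a b that would raise an error causes a problem.

```r
square.first <- function(a, b) {
   a^2    # b is never touched, so it is never evaluated
}

# b is missing, but no error is raised.
print(square.first(6))

# The stop() call passed as b is never run.
print(square.first(6, stop("b was evaluated")))
```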

Strings in R

Any value written within a pair of single quotes or double quotes in R is treated as a string. Internally R stores every string within double quotes, even when you create it with single quotes.

Rules Applied in String Construction

  • The quotes at the beginning and end of a string should be both double quotes or both single quote. They can not be mixed.

  • Double quotes can be inserted into a string starting and ending with single quote.

  • Single quote can be inserted into a string starting and ending with double quotes.

  • Double quotes can not be inserted into a string starting and ending with double quotes, unless each one is escaped with a backslash.

  • Single quotes can not be inserted into a string starting and ending with single quotes, unless each one is escaped with a backslash.

The following examples clarify the rules about creating a string in R.

a <- 'Start and end with single quote'
print(a)
## [1] "Start and end with single quote"
b <- "Start and end with double quotes"
print(b)
## [1] "Start and end with double quotes"
c <- "single quote ' in between double quotes"
print(c)
## [1] "single quote ' in between double quotes"
d <- 'Double quotes " in between single quote'
print(d)
## [1] "Double quotes \" in between single quote"

Examples of Invalid String usage

#e <- 'Mixed quotes" 
#print(e)

#f <- 'Single quote ' inside single quote'
#print(f)

#g <- "Double quotes " inside double quotes"
#print(g)

## Error: unexpected symbol in:
## "print(e)
## f <- 'Single"
## Execution halted
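
A quote of the same kind as the delimiters can still be placed inside a string by escaping it with a backslash. A short sketch; the variable names are arbitrary.

```r
e <- "Double quotes \" inside double quotes"
f <- 'Single quote \' inside single quote'

# print() shows the escaped form, writeLines() shows the actual characters.
print(e)
writeLines(e)
writeLines(f)

# The backslash is not part of the string: a, the quote and b make 3 characters.
print(nchar("a\"b"))
```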

String Manipulation

Concatenating Strings - paste() function

Many strings in R are combined using the paste() function. It can take any number of arguments to be combined together.

Syntax

# paste(..., sep = " ", collapse = NULL)  # The basic syntax for paste function

Following is the description of the parameters used −

  • … represents any number of arguments to be combined.

  • sep represents any separator between the arguments. It is optional.

  • collapse is an optional string used to join the elements of the result into one single string. It does not change the spaces inside the individual strings.

Example

a <- "Hello"
b <- 'How'
c <- "are you? "

print(paste(a,b,c))
## [1] "Hello How are you? "
print(paste(a,b,c, sep = "-"))
## [1] "Hello-How-are you? "
print(paste(a,b,c, sep = "", collapse = ""))
## [1] "HelloHoware you? "
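
The effect of collapse is easier to see when one of the arguments is a vector with several elements. A minimal sketch with made-up road names:

```r
roads <- c("Express", "Asphalt", "Gravel")

# sep joins the arguments element by element and returns three strings.
labelled <- paste("Road", roads, sep = ": ")
print(labelled)

# collapse joins the elements of the result into one single string.
one.string <- paste(roads, collapse = ", ")
print(one.string)
```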

Formatting numbers & strings - format() function

Numbers and strings can be formatted to a specific style using format() function.

Syntax

# format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) # This is the basic syntax for the format() function.

Following is the description of the parameters used −

  • x is the input vector.

  • digits is the number of significant digits displayed.

  • nsmall is the minimum number of digits to the right of the decimal point.

  • scientific is set to TRUE to display scientific notation.

  • width indicates the minimum width to be displayed by padding blanks in the beginning.

  • justify is the display of the string to left, right or center.

Example,

# Total number of digits displayed. Last digit rounded off.
result <- format(23.123456789, digits = 9)
print(result)
## [1] "23.1234568"
# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)
## [1] "6.000000e+00" "1.314521e+01"
# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)
## [1] "23.47000"
# Format treats everything as a string.
result <- format(6)
print(result)
## [1] "6"
# Numbers are padded with blank in the beginning for width.
result <- format(13.7, width = 6)
print(result)
## [1] "  13.7"
# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)
## [1] "Hello   "
# Justify string to the center.
result <- format("Hello", width = 8, justify = "c")
print(result)
## [1] " Hello  "
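
When format() is given a vector, it pads all elements to a common width, which is useful for aligned output. A small sketch:

```r
# Each number becomes a string of the same width.
widths <- format(c(1, 10, 100))
print(widths)
print(nchar(widths))
```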

Counting number of characters in a string - nchar() function

This function counts the number of characters, including spaces, in a string.

Syntax

# nchar(x)   # This is the basic syntax for nchar() function.

Following is the description of the parameter used −

  • x is the vector input.

Example,

result <- nchar("Count the number of characters")
print(result)
## [1] 30

Changing the case - toupper() & tolower() functions

These functions change the case of characters of a string.

Syntax

# toupper(x)     # This is the basic syntax for toupper() & tolower() functions.
# tolower(x)

Example,

# Changing to Upper case.
result <- toupper("Changing To Upper")
print(result)
## [1] "CHANGING TO UPPER"
# Changing to lower case.
result <- tolower("Changing To Lower")
print(result)
## [1] "changing to lower"

Extracting parts of a string - substring() function

This function extracts parts of a String.

Syntax

# substring(x,first,last) # This is the basic syntax for substring() function. 

Following is the description of the parameters used −

  • x is the character input vector.

  • first is the position of the first character to be extracted.

  • last is the position of the last character to be extracted.

Example,

# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)
## [1] "act"
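
Both first and last can be vectors, so several parts can be extracted in one call. The related function substr() can also be used on the left side of an assignment to replace characters. A minimal sketch:

```r
# Extract positions 1-3, 2-4 and 3-5 in one call.
parts <- substring("Extract", 1:3, 3:5)
print(parts)

# Replace the first character in place.
word <- "Extract"
substr(word, 1, 1) <- "e"
print(word)
```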

Vectors in R

Vectors are the most basic R data objects and there are six types of atomic vectors seen above. They are logical, integer, double, complex, character and raw. In this portion we will see how to create and manipulate vectors in different forms.

Vector Creation

Single Element Vector

Even when you write just one value in R, it becomes a vector of length 1 and belongs to one of the above vector types. For example,

# Atomic vector of type character.
print("Roads")
## [1] "Roads"
# Atomic vector of type double.
print(1.5)
## [1] 1.5
# Atomic vector of type integer.
print(21L)
## [1] 21
# Atomic vector of type logical.
print(TRUE)
## [1] TRUE
# Atomic vector of type complex.
print(6+6i)
## [1] 6+6i
# Atomic vector of type raw.
print(charToRaw('Mohammed'))
## [1] 4d 6f 68 61 6d 6d 65 64

Multiple Elements Vector

Let’s see what it looks like using the colon operator with numeric data.

# Creating a sequence from 2 to 12.
v <- 2:12
print(v)
##  [1]  2  3  4  5  6  7  8  9 10 11 12
# Creating a sequence from 11.7 to 19.7.
v <- 11.7:19.7  # The colon operator steps by 1, starting from 11.7.
print(v)
## [1] 11.7 12.7 13.7 14.7 15.7 16.7 17.7 18.7 19.7
# If the final element specified does not belong to the sequence then it is discarded.
v <- 2.5:13.4  # 13.4 is discarded, so the sequence ends at 12.5.
print(v)
##  [1]  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5

Using the seq() function

# Create vector with elements from 1 to 7 incrementing by 0.2.
print(seq(1, 7, by = 0.2))
##  [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0 4.2 4.4 4.6
## [20] 4.8 5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.0
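
Instead of a step size, seq() can be given the number of values wanted through its length.out argument. A short sketch:

```r
# Five evenly spaced values from 0 to 1.
v <- seq(0, 1, length.out = 5)
print(v)
```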

Using the c() function

The non-character values are coerced to character type if one of the elements is a character. Example,

# The logical and numeric values are converted to characters.
s <- c('Express','Gravel',5,TRUE)
print(s)
## [1] "Express" "Gravel"  "5"       "TRUE"
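
The same coercion happens between the other atomic types: c() converts all elements to the most general type present. The sketch below uses typeof() to show the resulting type.

```r
print(typeof(c(TRUE, 2L)))         # logical and integer give integer
print(typeof(c(2L, 1.5)))          # integer and double give double
print(typeof(c(1.5, "Gravel")))    # double and character give character
```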

Accessing Vector Elements

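Elements of a vector are accessed using indexing. The [ ] brackets are used for indexing, and indexing starts with position 1. A negative index drops that element from the result. A logical vector of TRUE and FALSE values selects the elements at the TRUE positions. A numeric index of 0 selects nothing, so an index vector of 0s and 1s returns the first element once for each 1 rather than acting as a mask. Example;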

# Accessing vector elements using position.
t <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
u <- t[c(2,3,6)]
print(u)
## [1] "Monday"  "Tuesday" "Friday"
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
## [1] "Sunday" "Friday"
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
## [1] "Sunday"    "Tuesday"   "Wednesday" "Friday"    "Saturday"
# Indexing with 0 and 1: the zeros select nothing, the 1 selects the first element.
y <- t[c(0,0,0,0,0,0,1)]
print(y)
## [1] "Sunday"
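
In practice the logical index is usually produced by a condition rather than typed by hand. A minimal sketch with made-up road lengths; which() gives the matching positions instead of the values.

```r
lengths.km <- c(12.5, 80, 40, 3.2)

# Keep the elements for which the condition is TRUE.
long.roads <- lengths.km[lengths.km > 30]
print(long.roads)

# which() returns the positions of the TRUE values.
positions <- which(lengths.km > 30)
print(positions)
```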

Vector Manipulation

Vector arithmetic

Two vectors of same length can be added, subtracted, multiplied or divided giving the result as a vector output. Example;

# Create two vectors.
a1 <- c(3,6,4,9,0,14)
a2 <- c(7,12,0,8,1,2)

# Vector addition.
add.result <- a1+a2
print(add.result)
## [1] 10 18  4 17  1 16
# Vector subtraction.
sub.result <- a1-a2
print(sub.result)
## [1] -4 -6  4  1 -1 12
# Vector multiplication.
multi.result <- a1*a2
print(multi.result)
## [1] 21 72  0 72  0 28
# Vector division.
divi.result <- a1/a2
print(divi.result)
## [1] 0.4285714 0.5000000       Inf 1.1250000 0.0000000 7.0000000

Vector Element Recycling

If we apply arithmetic operations to two vectors of unequal length, then the elements of the shorter vector are recycled to complete the operations. Example,

a1 <- c(3,8,4,5,0,11)
a2 <- c(3,7)
# a2 becomes c(3,7,3,7,3,7)

add.result <- a1+a2
print(add.result)
## [1]  6 15  7 12  3 18
sub.result <- a1-a2
print(sub.result)
## [1]  0  1  1 -2 -3  4
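
Recycling also works when the longer length is not a multiple of the shorter one, but R then issues a warning. The sketch below catches that warning with withCallingHandlers() only so that its message can be shown alongside the result.

```r
a1 <- c(1, 2, 3, 4, 5)
a2 <- c(10, 20)

# a2 is recycled as c(10, 20, 10, 20, 10) and a warning is raised.
result <- withCallingHandlers(
   a1 + a2,
   warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      invokeRestart("muffleWarning")
   }
)
print(result)
```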

Vector Element Sorting

Elements in a vector can be sorted using the sort() function. Example;

a <- c(3,8,4,5,0,11, -9, 304)

# Sort the elements of the vector.
sort.result <- sort(a)
print(sort.result)
## [1]  -9   0   3   4   5   8  11 304
# Sort the elements in the reverse order.
revsort.result <- sort(a, decreasing = TRUE)
print(revsort.result)
## [1] 304  11   8   5   4   3   0  -9
# Sorting character vectors.
a <- c("Red","Blue","yellow","violet")
sort.result <- sort(a)
print(sort.result)
## [1] "Blue"   "Red"    "violet" "yellow"
# Sorting character vectors in reverse order.
revsort.result <- sort(a, decreasing = TRUE)
print(revsort.result)
## [1] "yellow" "violet" "Red"    "Blue"
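
To sort one vector by the values of another, the order() function returns the positions that would sort its argument. A sketch with made-up road names and lengths:

```r
road.names <- c("Gravel", "Express", "Asphalt")
road.lengths <- c(12.5, 80, 40)

# Positions of road.lengths from smallest to largest.
idx <- order(road.lengths)
print(idx)

# Road names arranged by increasing length.
print(road.names[idx])
```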

Lists Manipulation in R

Lists are the R objects which contain elements of different types such as numbers, strings, vectors and another list inside. A list can also contain a matrix or a function as its elements. List is created using list() function.

Creating a List in R

Following is an example that creates a list containing strings, numbers, a vector and a logical value.

# Create a list containing strings, numbers, a vector and a logical value.
list_data <- list("Asphalt", "Express", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)
## [[1]]
## [1] "Asphalt"
## 
## [[2]]
## [1] "Express"
## 
## [[3]]
## [1] 21 32 11
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] 51.23
## 
## [[6]]
## [1] 119.1

Naming List Elements

The list elements can be given names and they can be accessed using these names. For example,

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("January","February","March"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)
## $`1st Quarter`
## [1] "January"  "February" "March"   
## 
## $A_Matrix
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
## 
## $`A Inner list`[[2]]
## [1] 12.3

Accessing List Elements

Elements of the list can be accessed by the index of the element in the list. In case of named lists it can also be accessed using the names. Let’s repeat the above example here.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("January","February","March"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Access the first element of the list.
print(list_data[1])
## $`1st Quarter`
## [1] "January"  "February" "March"
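
Note that single brackets return a list that contains the element, while double brackets and $ return the element itself. A minimal sketch using a shortened version of the list above; the variable names are illustrative.

```r
list_data <- list(c("January","February","March"), matrix(c(3,9,5,1,-2,8), nrow = 2))
names(list_data) <- c("1st Quarter", "A_Matrix")

sub.list <- list_data[1]                   # a list of length 1
months <- list_data[["1st Quarter"]]       # the character vector itself
m <- list_data$A_Matrix                    # the matrix itself

print(class(sub.list))
print(class(months))
print(m[2,3])
```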

Manipulating List Elements in R

We can add, delete and update list elements. Assigning to the position one past the end adds an element, assigning NULL to an element removes it, and any existing element can be updated.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Express","Asphalt","Gravel"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("Main Roads", "A_Matrix", "A Inner list")

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
## [[1]]
## [1] "New element"
# Remove the last element.
list_data[4] <- NULL

# Print the 4th Element.
print(list_data[4])
## $<NA>
## NULL
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
## $`A Inner list`
## [1] "updated element"
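
Elements can also be inserted or removed in the middle of a list. The sketch below uses the base function append() with its after argument and then removes an element by assigning NULL; the road names are made up.

```r
roads <- list("Express", "Asphalt", "Gravel")

# Insert a new element after position 1.
roads <- append(roads, list("Paved"), after = 1)
print(unlist(roads))

# Remove the third element; the later elements move up.
roads[[3]] <- NULL
print(unlist(roads))
```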

Merging Lists

You can merge many lists into one list by combining them with the c() function.

# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Express","Asphalt","Gravel")

# Merge the two lists.
merged.list <- c(list1,list2)

# Print the merged list.
print(merged.list)
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] 3
## 
## [[4]]
## [1] "Express"
## 
## [[5]]
## [1] "Asphalt"
## 
## [[6]]
## [1] "Gravel"
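
The choice of c() matters here. As a quick check, this sketch compares it with list(), which would nest the two lists instead of merging them.

```r
list1 <- list(1,2,3)
list2 <- list("Express","Asphalt","Gravel")

# c() produces one flat list with six elements.
print(length(c(list1, list2)))

# list() produces a list with two elements, each of them a list.
print(length(list(list1, list2)))
```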

Converting List to Vector

A list can be converted to a vector so that the elements of the vector can be used for further manipulation. All the arithmetic operations on vectors can be applied after the list is converted into vectors. To do this conversion, we use the unlist() function. It takes the list as input and produces a vector.

# Create lists.
list1 <- list(1:5)
print(list1)
## [[1]]
## [1] 1 2 3 4 5
list2 <-list(10:14)
print(list2)
## [[1]]
## [1] 10 11 12 13 14
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
## [1] 1 2 3 4 5
print(v2)
## [1] 10 11 12 13 14
# Now add the vectors
result <- v1+v2
print(result)
## [1] 11 13 15 17 19

Matrices Manipulation in R

Matrices are the R objects in which the elements are arranged in a two-dimensional rectangular layout. They contain elements of the same atomic types. Though we can create a matrix containing only characters or only logical values, they are not of much use. In most cases we use matrices containing numeric elements to be used in mathematical calculations. A Matrix is created using the matrix() function.

Syntax

# matrix(data, nrow, ncol, byrow, dimnames) # The basic syntax of a matrix

The following statements are the description of the parameters used

  • data is the input vector which becomes the data elements of the matrix.

  • nrow is the number of rows to be created.

  • ncol is the number of columns to be created.

  • byrow is a logical value. If TRUE then the input vector elements are arranged by row.

  • dimnames is the names assigned to the rows and columns.

Example: create a matrix taking a vector of numbers as input.

# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
##      [,1] [,2] [,3]
## [1,]    3    4    5
## [2,]    6    7    8
## [3,]    9   10   11
## [4,]   12   13   14
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
##      [,1] [,2] [,3]
## [1,]    3    7   11
## [2,]    4    8   12
## [3,]    5    9   13
## [4,]    6   10   14
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)
##      col1 col2 col3
## row1    3    4    5
## row2    6    7    8
## row3    9   10   11
## row4   12   13   14

Accessing Elements of a Matrix

Elements of a matrix can be accessed by using the column and row index of the element. We consider the matrix P above to find the specific elements below.

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))

# Access the element at 3rd column and 1st row.
print(P[1,3])
## [1] 5
# Access the element at 2nd column and 4th row.
print(P[4,2])
## [1] 13
# Access only the  2nd row.
print(P[2,])
## col1 col2 col3 
##    6    7    8
# Access only the 3rd column.
print(P[,3])
## row1 row2 row3 row4 
##    5    8   11   14
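
Selecting a single row or column returns a plain vector, as the output above shows. If the result should stay a matrix, the drop = FALSE argument of the [ operator can be used. A minimal sketch:

```r
P <- matrix(c(3:14), nrow = 4, byrow = TRUE)

row.vec <- P[2,]                  # plain vector, no dimensions
row.mat <- P[2, , drop = FALSE]   # still a matrix with 1 row and 3 columns

print(dim(row.vec))
print(dim(row.mat))
```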

Matrix Computations

Various mathematical operations are performed on the matrices using the R operators. The result of the operation is also a matrix.

The dimensions (number of rows and columns) should be the same for the matrices involved in the operation.

Matrix Addition & Subtraction

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
##      [,1] [,2] [,3]
## [1,]    3   -1    2
## [2,]    9    4    6
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
##      [,1] [,2] [,3]
## [1,]    5    0    3
## [2,]    2    9    4
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
## Result of addition
print(result)
##      [,1] [,2] [,3]
## [1,]    8   -1    5
## [2,]   11   13   10
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
## Result of subtraction
print(result)
##      [,1] [,2] [,3]
## [1,]   -2   -1   -1
## [2,]    7   -5    2

Matrix Multiplication & Division

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
##      [,1] [,2] [,3]
## [1,]    3   -1    2
## [2,]    9    4    6
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
##      [,1] [,2] [,3]
## [1,]    5    0    3
## [2,]    2    9    4
# Multiply the matrices element by element (use %*% for the matrix product).
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
## Result of multiplication
print(result)
##      [,1] [,2] [,3]
## [1,]   15    0    6
## [2,]   18   36   24
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
## Result of division
print(result)
##      [,1]      [,2]      [,3]
## [1,]  0.6      -Inf 0.6666667
## [2,]  4.5 0.4444444 1.5000000
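
The * operator above multiplies element by element. The algebraic matrix product uses %*% and needs the number of columns of the first matrix to equal the number of rows of the second, so in this sketch the second matrix is transposed first.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# (2x3) %*% (3x2) gives a 2x2 matrix.
product <- matrix1 %*% t(matrix2)
print(product)
```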

Array Manipulation in R

Arrays are the R data objects which can store data in more than two dimensions. For example, if we create an array of dimension (2, 3, 4) then it creates 4 rectangular matrices each with 2 rows and 3 columns. Arrays can store only one data type.

An array is created using the array() function. It takes vectors as input and uses the values in the dim parameter to create an array. The following example creates an array of two matrices, each with 3 rows and 3 columns.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15

Naming Columns and Rows

We can give names to the rows, columns and matrices in the array by using the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names,
   matrix.names))
print(result)
## , , Matrix1
## 
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15
## 
## , , Matrix2
## 
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15

Accessing Array Elements

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,
   column.names, matrix.names))

# Print the third row of the second matrix of the array.
print(result[3,,2])
## COL1 COL2 COL3 
##    3   12   15
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
## [1] 13
# Print the 2nd Matrix.
print(result[,,2])
##      COL1 COL2 COL3
## ROW1    5   10   13
## ROW2    9   11   14
## ROW3    3   12   15
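
Because the array carries dimnames, the same elements can also be selected by name instead of by position. A small sketch, rebuilding the array from this section:

```r
result <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2),
   dimnames = list(c("ROW1", "ROW2", "ROW3"),
      c("COL1", "COL2", "COL3"),
      c("Matrix1", "Matrix2")))

# Element in ROW1, COL3 of Matrix1 (same as result[1,3,1]).
print(result["ROW1", "COL3", "Matrix1"])
## [1] 13

# Third row of the second matrix, selected by name.
third.row <- result["ROW3", , "Matrix2"]
print(unname(third.row))
## [1]  3 12 15
```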

Manipulating Array Elements

Since an array is made up of matrices stacked along additional dimensions, operations on its elements are carried out by accessing the elements of those matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))

# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim = c(3,3,2))

# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]

# Add the matrices.
result <- matrix1+matrix2
print(result)
##      [,1] [,2] [,3]
## [1,]    7   19   19
## [2,]   15   12   14
## [3,]   12   12   26

Calculations Across Array Elements

We can do calculations across the elements in an array using the apply() function.

Syntax

## apply(x, margin, fun) # This is the basic syntax

The parameters are described below:

  • x is an array.

  • margin is the dimension (or vector of dimensions) over which the function is applied: 1 for rows, 2 for columns, 3 for the matrices of a three-dimensional array.

  • fun is the function to be applied across the elements of the array.

The example below uses the apply() function to calculate the sum of the elements in each row of an array across all the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    5   10   13
## [2,]    9   11   14
## [3,]    3   12   15
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)
## [1] 56 68 60
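
Changing the margin changes which dimension is kept in the result. The following sketch applies sum() to the same array with other margins; the result names are only illustrative.

```r
new.array <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

# margin = 2: one sum per column, taken over all rows and both matrices.
col.sums <- apply(new.array, 2, sum)
print(col.sums)
## [1] 34 66 84

# margin = 3: one sum per matrix.
matrix.sums <- apply(new.array, 3, sum)
print(matrix.sums)
## [1] 92 92

# margin = c(1, 3): one sum per row within each matrix (a 3x2 result).
row.by.matrix <- apply(new.array, c(1, 3), sum)
print(dim(row.by.matrix))
## [1] 3 2
```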

Manipulation of Factors in R

Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as “Male”/“Female” or TRUE/FALSE, and they are widely used in data analysis for spatial and statistical modeling.

Factors are created using the factor() function by taking a vector as input.

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
##  [1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West" 
## [10] "East"  "North"
print(is.factor(data))
## [1] FALSE
# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East North West
print(is.factor(factor_data))
## [1] TRUE
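
Internally a factor stores one integer code per element plus a table of level labels. The sketch below inspects both for the factor created above and counts the observations per level.

```r
data <- c("East","West","East","North","North","East","West","West","West","East","North")
factor_data <- factor(data)

# The distinct levels, sorted alphabetically by default.
print(nlevels(factor_data))
## [1] 3

# The integer code of each element (1 = East, 2 = North, 3 = West).
print(as.integer(factor_data))
##  [1] 1 3 1 2 2 1 3 3 3 1 2

# Number of observations per level: East 4, North 3, West 4.
counts <- table(factor_data)
```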

R Factors in Data Frame

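In versions of R before 4.0.0, creating a data frame with a column of text data made R treat that column as categorical data and convert it to a factor. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the text column stays a character vector unless the conversion is requested explicitly, as the output below shows. For example: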

# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame.
input_data <- data.frame(height,weight,gender)
print(input_data)
##   height weight gender
## 1    132     48   male
## 2    151     49   male
## 3    162     66 female
## 4    139     53 female
## 5    166     67   male
## 6    147     52 female
## 7    122     40   male
# Test if the gender column is a factor.
print(is.factor(input_data$gender))
## [1] FALSE
# Print the gender column to see its values.
print(input_data$gender)
## [1] "male"   "male"   "female" "female" "male"   "female" "male"
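
To get a factor column in current versions of R, the conversion has to be requested. A minimal sketch showing two common ways; plain_data is a name introduced only for this illustration.

```r
height <- c(132,151,162,139,166,147,122)
gender <- c("male","male","female","female","male","female","male")

# Option 1: ask data.frame() to convert character columns to factors.
input_data <- data.frame(height, gender, stringsAsFactors = TRUE)
print(is.factor(input_data$gender))
## [1] TRUE
print(levels(input_data$gender))
## [1] "female" "male"

# Option 2: convert a single column after the data frame exists.
plain_data <- data.frame(height, gender)
plain_data$gender <- factor(plain_data$gender)
print(is.factor(plain_data$gender))
## [1] TRUE
```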

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor() function again with the new order of the levels. For example:

data <- c("East","West","East","North","North","East","West",
   "West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East North West
# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)
##  [1] East  West  East  North North East  West  West  West  East  North
## Levels: East West North

Generating Factor Levels in R

We can generate factor levels by using the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.

Syntax

# gl(n, k, labels)  # basic syntax for generating factor levels in R

The parameters are described below:

  • n is an integer giving the number of levels.

  • k is an integer giving the number of replications.

  • labels is a vector of labels for the resulting factor levels.

Example:

v <- gl(6, 2, labels = c("Addis.Ababa", "Adama","Hawassa","Mekele","Gonder", "Dessie"))
print(v)
##  [1] Addis.Ababa Addis.Ababa Adama       Adama       Hawassa     Hawassa    
##  [7] Mekele      Mekele      Gonder      Gonder      Dessie      Dessie     
## Levels: Addis.Ababa Adama Hawassa Mekele Gonder Dessie
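
gl() also accepts an optional length argument that sets the total length of the result. The sketch below contrasts block-wise and alternating patterns; the labels "Control" and "Treatment" are made up for the illustration.

```r
# Two levels, each repeated three times in a block.
blocks <- gl(2, 3, labels = c("Control", "Treatment"))
print(as.character(blocks))
## [1] "Control"   "Control"   "Control"   "Treatment" "Treatment" "Treatment"

# With k = 1 and length = 6, the two levels alternate.
alternating <- gl(2, 1, length = 6, labels = c("Control", "Treatment"))
print(as.character(alternating))
## [1] "Control"   "Treatment" "Control"   "Treatment" "Control"   "Treatment"
```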

Manipulate Data Frames in R

A data frame is a table or a two-dimensional array like structure in which each column contains values of one variable and each row contains one set of values from each column.

Following are the characteristics of a data frame.

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor or character type.
  • Each column should contain same number of data items.

Create Data Frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Abel","Abdi","Birtukan","Abenet","Mekannent"),
   salary = c(2623.3,2715.2,2811.0,2929.0,3043.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.         
print(emp.data) 
##   emp_id  emp_name  salary start_date
## 1      1      Abel 2623.30 2012-01-01
## 2      2      Abdi 2715.20 2013-09-23
## 3      3  Birtukan 2811.00 2014-11-15
## 4      4    Abenet 2929.00 2014-05-11
## 5      5 Mekannent 3043.25 2015-03-27

Get the Structure of the Data Frame

The structure of the data frame can be seen by using str() function. Let’s use the above example here,

# Get the structure of the data frame.
str(emp.data)
## 'data.frame':    5 obs. of  4 variables:
##  $ emp_id    : int  1 2 3 4 5
##  $ emp_name  : chr  "Abel" "Abdi" "Birtukan" "Abenet" ...
##  $ salary    : num  2623 2715 2811 2929 3043
##  $ start_date: Date, format: "2012-01-01" "2013-09-23" ...

Summary of Data in Data Frame

The statistical summary and nature of the data can be obtained by applying the summary() function. Let’s repeat the above example here:

# Print the summary.
print(summary(emp.data))  
##      emp_id    emp_name             salary       start_date        
##  Min.   :1   Length:5           Min.   :2623   Min.   :2012-01-01  
##  1st Qu.:2   Class :character   1st Qu.:2715   1st Qu.:2013-09-23  
##  Median :3   Mode  :character   Median :2811   Median :2014-05-11  
##  Mean   :3                      Mean   :2824   Mean   :2014-01-14  
##  3rd Qu.:4                      3rd Qu.:2929   3rd Qu.:2014-11-15  
##  Max.   :5                      Max.   :3043   Max.   :2015-03-27

Extract Data from Data Frame in R

Extract specific columns from a data frame using their column names. We use the above example again here.

# Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)
##   emp.data.emp_name emp.data.salary
## 1              Abel         2623.30
## 2              Abdi         2715.20
## 3          Birtukan         2811.00
## 4            Abenet         2929.00
## 5         Mekannent         3043.25

Extract the first two rows and then all columns

# Extract first two rows.
result <- emp.data[1:2,]   # this code will extract the 1st two rows and all columns
print(result)
##   emp_id emp_name salary start_date
## 1      1     Abel 2623.3 2012-01-01
## 2      2     Abdi 2715.2 2013-09-23

Another example:

# Extract the 3rd and 5th rows with the 2nd and 4th columns.
result <- emp.data[c(3,5),c(2,4)]
print(result)
##    emp_name start_date
## 3  Birtukan 2014-11-15
## 5 Mekannent 2015-03-27
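
Rows and columns can also be selected by column name or by a logical condition rather than by position. A short sketch on a reduced copy of emp.data; the threshold 2800 is arbitrary.

```r
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Abel","Abdi","Birtukan","Abenet","Mekannent"),
   salary = c(2623.3,2715.2,2811.0,2929.0,3043.25),
   stringsAsFactors = FALSE
)

# Select columns by name; the result keeps the original column names.
by.name <- emp.data[, c("emp_name", "salary")]
print(ncol(by.name))
## [1] 2

# Select the rows where a condition on a column holds.
high.paid <- emp.data[emp.data$salary > 2800, ]
print(high.paid$emp_name)
## [1] "Birtukan"  "Abenet"    "Mekannent"
```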

Expand Data Frame in R

A data frame can be expanded by adding columns and rows.

Add Column

Just add the column vector using a new column name. We continue with the previous example.

# Add the "dept" column.
emp.data$dept <- c("GIS","Geomatics","Mapping","LiDAR","Remote Sensing")
v <- emp.data
print(v)
##   emp_id  emp_name  salary start_date           dept
## 1      1      Abel 2623.30 2012-01-01            GIS
## 2      2      Abdi 2715.20 2013-09-23      Geomatics
## 3      3  Birtukan 2811.00 2014-11-15        Mapping
## 4      4    Abenet 2929.00 2014-05-11          LiDAR
## 5      5 Mekannent 3043.25 2015-03-27 Remote Sensing

Add Row

To add more rows permanently to an existing data frame, we need to supply the new rows in the same structure as the existing data frame and use the rbind() function.

In the example below we create a data frame with the new rows and bind it to the existing data frame to create the final data frame.

# Create the second data frame
emp.newdata <-  data.frame(
   emp_id = c (6:8), 
   emp_name = c("Surafel","Meron","Tesfaye"),
   salary = c(1578.0,1722.5,1632.8), 
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","HR","Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
##   emp_id  emp_name  salary start_date           dept
## 1      1      Abel 2623.30 2012-01-01            GIS
## 2      2      Abdi 2715.20 2013-09-23      Geomatics
## 3      3  Birtukan 2811.00 2014-11-15        Mapping
## 4      4    Abenet 2929.00 2014-05-11          LiDAR
## 5      5 Mekannent 3043.25 2015-03-27 Remote Sensing
## 6      6   Surafel 1578.00 2013-05-21             IT
## 7      7     Meron 1722.50 2013-07-30             HR
## 8      8   Tesfaye 1632.80 2014-06-17        Finance

R - Packages

R packages are collections of R functions, compiled code and sample data. They are stored under a directory called “library” in the R environment. By default, R installs a set of packages during installation. More packages can be added later, when they are needed for some specific purpose. When we start the R console, only the default packages are available. Other packages that are already installed have to be loaded explicitly before an R program can use them.

Check Available R Packages

Get library locations containing R packages

# .libPaths()   # shows where the installed package libraries are located on your computer

Get the list of all the packages installed

# library()   # lists the packages installed in those library directories

Get all packages currently loaded in the R environment

# search()   # lists the packages currently attached to the R session

Install a New Package

There are two ways to add new R packages. One is installing directly from the CRAN repository and the other is downloading the package to your local system and installing it manually.

Install directly from CRAN

The following command gets the package directly from the CRAN website and installs it in the R environment. You may be prompted to choose the nearest mirror. Choose the one appropriate to your location.

# install.packages("Package Name")
 
# Install the package named "XML".
 # install.packages("XML")

Install package manually

Go to the link R Packages to download the package needed. Save the package as a .zip file in a suitable location in the local system.

Now you can run the following command to install this package in the R environment.

# install.packages(file_name_with_path, repos = NULL, type = "source")

# Install the package named "XML"
# install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")

Load Package to Library

Before a package can be used in the code, it must be loaded into the current R environment. This also applies to a package that was installed earlier but is not yet available in the current session.

A package is loaded using the following command

# library("package name", lib.loc = "path to library")

# Load the package named "XML"
# library("XML")
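
A script can check whether a package is installed before trying to load it. The sketch below uses requireNamespace() for the check; "XML" is only the example name from this section, and the variables pkg and loaded are introduced for the illustration.

```r
# "XML" is only an example name; substitute the package you need.
pkg <- "XML"

if (requireNamespace(pkg, quietly = TRUE)) {
   # The package is installed, so attach it to the session.
   library(pkg, character.only = TRUE)
   loaded <- TRUE
} else {
   # The package is missing; tell the user how to get it instead of failing.
   message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\") first.")
   loaded <- FALSE
}
```

The character.only = TRUE argument is needed because the package name is held in a variable rather than typed literally.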

Data Reshaping In R

Data Reshaping in R is about changing the way data is organized into rows and columns. Most of the time data processing in R is done by taking the input data as a data frame. It is easy to extract data from the rows and columns of a data frame, but there are situations when we need the data frame in a format different from the one in which we received it. R has many functions to split, merge and change the rows to columns and vice versa in a data frame.

Joining Columns and Rows in a Data Frame

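We can join multiple vectors column-wise using the cbind() function. When all inputs are plain vectors, as here, the result is a character matrix rather than a data frame, and the numeric zip codes lose their leading zeros. We can then append the rows of a data frame using the rbind() function, which returns a data frame.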

# Create vector objects.
city <- c("Addis Ababa","Adama","Mekele","Bahir Dar")
state <- c("AA","OR","TG","AM")
zipcode <- c(01000,03000,04000,02000)

# Combine the three vectors column-wise (for plain vectors, cbind() returns a matrix).
addresses <- cbind(city,state,zipcode)

# Print a header.
cat("# # # # The First data frame\n") 
## # # # # The First data frame
# Print the data frame.
print(addresses)
##      city          state zipcode
## [1,] "Addis Ababa" "AA"  "1000" 
## [2,] "Adama"       "OR"  "3000" 
## [3,] "Mekele"      "TG"  "4000" 
## [4,] "Bahir Dar"   "AM"  "2000"
# Create another data frame with similar columns
new.address <- data.frame(
   city = c("Hawassa","Sheger"),
   state = c("SID","OR"),
   zipcode = c("05000","06000"),
   stringsAsFactors = FALSE
)

# Print a header.
cat("# # # The Second data frame\n") 
## # # # The Second data frame
# Print the data frame.
print(new.address)
##      city state zipcode
## 1 Hawassa   SID   05000
## 2  Sheger    OR   06000
# Combine rows from both the data frames.
all.addresses <- rbind(addresses,new.address)
# Print a header.
cat("# # # The combined data frame\n") 
## # # # The combined data frame
# Print the result.
print(all.addresses)
##          city state zipcode
## 1 Addis Ababa    AA    1000
## 2       Adama    OR    3000
## 3      Mekele    TG    4000
## 4   Bahir Dar    AM    2000
## 5     Hawassa   SID   05000
## 6      Sheger    OR   06000
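
If a real data frame with intact zip codes is wanted from the start, one option is to build it with data.frame() and store the zip codes as character strings. This is a sketch of that alternative, not the approach used above.

```r
city <- c("Addis Ababa","Adama","Mekele","Bahir Dar")
state <- c("AA","OR","TG","AM")
zipcode <- c("01000","03000","04000","02000")   # quoted, so leading zeros survive

# data.frame() keeps each column's own type, unlike cbind() on plain vectors.
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)

new.address <- data.frame(
   city = c("Hawassa","Sheger"),
   state = c("SID","OR"),
   zipcode = c("05000","06000"),
   stringsAsFactors = FALSE
)

all.addresses <- rbind(addresses, new.address)
print(is.data.frame(addresses))
## [1] TRUE
print(all.addresses$zipcode)
## [1] "01000" "03000" "04000" "02000" "05000" "06000"
```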

Merging Data Frames

We can merge two data frames by using the merge() function. The data frames must share key columns (named with by, or with by.x and by.y when the names differ) on which the merging happens.

In the example below, we consider the data sets about diabetes in Pima Indian women available in the library named “MASS”. We merge the two data sets based on the values of blood pressure (“bp”) and body mass index (“bmi”). With these two columns chosen for merging, the records whose values of both variables match in both data sets are combined to form a single data frame.

# install.packages("MASS")
# library(MASS)
# merged.Pima <- merge(x = Pima.te, y = Pima.tr,
#    by.x = c("bp", "bmi"),
#    by.y = c("bp", "bmi")
# )
# print(merged.Pima)
# nrow(merged.Pima)
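
The behaviour of merge() is easier to see on a small example. The two data frames below are made up for this sketch (road_id, surface and length_km are hypothetical columns); it compares the default inner join with a left join requested through all.x = TRUE.

```r
# Hypothetical toy data: road 1 has no length record, road 5 has no surface record.
roads <- data.frame(
   road_id = c(1, 2, 3, 4),
   surface = c("Asphalt", "Gravel", "Asphalt", "Gravel"),
   stringsAsFactors = FALSE
)
road.lengths <- data.frame(
   road_id = c(2, 3, 4, 5),
   length_km = c(120.5, 80.0, 45.2, 10.0)
)

# Default (inner join): only keys present in both data frames are kept.
inner <- merge(roads, road.lengths, by = "road_id")
print(nrow(inner))
## [1] 3

# all.x = TRUE keeps every row of the first data frame; missing matches become NA.
left <- merge(roads, road.lengths, by = "road_id", all.x = TRUE)
print(nrow(left))
## [1] 4
print(is.na(left$length_km[left$road_id == 1]))
## [1] TRUE
```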

Melting and Casting

One of the most interesting aspects of R programming is changing the shape of the data in multiple steps to get a desired shape. The functions used to do this are melt() and cast(), which come from the reshape package rather than base R, so that package must be installed and loaded first.

We consider the dataset called ships present in the library called “MASS”.

# library(MASS)
# print(ships)

Melt the Data

Now we melt the data to organize it, converting all columns other than type and year into multiple rows.

# library(reshape)   # provides melt() and cast()
# molten.ships <- melt(ships, id = c("type","year"))
# print(molten.ships)

Cast the Molten Data

We can cast the molten data into a new form where the aggregate of each type of ship for each year is created. It is done using the cast() function.

# recasted.ship <- cast(molten.ships, type+year~variable,sum)
# print(recasted.ship)
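
When the reshape package is not available, the same kind of per-group totals can be sketched with aggregate() from base R. This is a different technique from melt() and cast(), shown here on a made-up data frame (toy.ships and its values are hypothetical, not the MASS ships data).

```r
# Hypothetical data shaped like a small ships table.
toy.ships <- data.frame(
   type = c("A", "A", "A", "B", "B"),
   year = c(60, 60, 65, 60, 60),
   service = c(100, 50, 200, 30, 70),
   incidents = c(1, 0, 3, 2, 2)
)

# Sum every measured column for each combination of type and year.
totals <- aggregate(cbind(service, incidents) ~ type + year,
   data = toy.ships, FUN = sum)
print(totals)
```

The result has one row per type and year combination, with the summed service and incidents columns.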

Manipulation of CSV Files in R

In R, we can read data from files stored outside the R environment. We can also write data into files which will be stored and accessed by the operating system. R can read and write into various file formats such as csv, excel, xml etc.

In this exercise we will learn to read data from a csv file and then write data into a csv file. The file should be present in current working directory so that R can read it. Of course we can also set our own directory and read files from there.

Getting and Setting the Working Directory

You can check which directory the R workspace is pointing to using the getwd() function. You can also set a new working directory using setwd() function.

# Get and print current working directory.
print(getwd())
## [1] "/home/tdl/Pytho_R_AfriGeoinformation/R-Programming"
# Set current working directory.
setwd("/home/tdl/Pytho_R_AfriGeoinformation/R-Programming")
print(getwd())
## [1] "/home/tdl/Pytho_R_AfriGeoinformation/R-Programming"

Input as CSV File

The csv file is a text file in which the values in the columns are separated by a comma. Let’s consider the following data present in the file named input.csv.

You can create this file using Windows Notepad by copying and pasting this data. Save the file as input.csv using Save As with the All Files (*.*) option in Notepad. Remove the leading # from each line when copying the text.

#id,name,Salary.in.USD,start_date,dept
#1,Abel,2623.3,2012-01-01,GIS
#2,Abdi,2715.2,2013-09-23,Geomatics
#3,Birtukan,2811.0,2014-11-15,Mapping
#4,Abenet,2929.0,2014-05-11,LiDAR
#5,Mekannent,3043.25,2015-03-27,Remote Sensing
#6,Surafel,1578.0,2013-05-21,IT
#7,Meron,1722.5,2013-07-30,HR
#8,Tesfaye,1632.8,2014-06-17,Finance

Reading a CSV File

The following is a simple example of the read.csv() function reading a CSV file available in your current working directory:

data <- read.csv("input.csv")
print(data)
##   id      name Salary.in.USD start_date           dept
## 1  1      Abel       2623.30 2012-01-01            GIS
## 2  2      Abdi       2715.20 2013-09-23      Geomatics
## 3  3  Birtukan       2811.00 2014-11-15        Mapping
## 4  4    Abenet       2929.00 2014-05-11          LiDAR
## 5  5 Mekannent       3043.25 2015-03-27 Remote Sensing
## 6  6   Surafel       1578.00 2013-05-21             IT
## 7  7     Meron       1722.50 2013-07-30             HR
## 8  8   Tesfaye       1632.80 2014-06-17        Finance
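
By default read.csv() converts column headers into syntactically valid R names, which is how a header containing spaces ends up with dots. The sketch below demonstrates this with a temporary file; the header text "Salary in USD" and the name csv.path are assumptions made for the illustration.

```r
# Write a small CSV to a temporary file; the header "Salary in USD" is made up.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("id,name,Salary in USD",
   "1,Abel,2623.3",
   "2,Abdi,2715.2"), csv.path)

# Default: headers are converted to valid R names (spaces become dots).
data <- read.csv(csv.path)
print(names(data))
## [1] "id"            "name"          "Salary.in.USD"

# check.names = FALSE keeps the header text exactly as written in the file.
raw <- read.csv(csv.path, check.names = FALSE)
print(names(raw))
## [1] "id"            "name"          "Salary in USD"

unlink(csv.path)   # remove the temporary file
```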

Analyzing the CSV File

By default the read.csv() function gives the output as a data frame. This can be easily checked as follows. Also we can check the number of columns and rows.

data <- read.csv("input.csv")

print(is.data.frame(data))
## [1] TRUE
print(ncol(data))
## [1] 5
print(nrow(data))
## [1] 8

Once we read data into a data frame, we can apply all the functions applicable to data frames as explained in the previous sections. For example, get the maximum salary:

# Get the max salary from data frame.
sal <- max(data$Salary.in.USD)
print(sal)
## [1] 3043.25

Get the details of the person with max salary

We can fetch rows meeting specific filter criteria similar to a SQL where clause.

# Create a data frame.
data <- read.csv("input.csv")

# Get the max salary from data frame.
sal <- max(data$Salary.in.USD)

# Get the person detail having max salary.
retval <- subset(data, Salary.in.USD == max(Salary.in.USD))
print(retval)
##   id      name Salary.in.USD start_date           dept
## 5  5 Mekannent       3043.25 2015-03-27 Remote Sensing

Get all the people working in IT department

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset( data, dept == "IT")
print(retval)
##   id    name Salary.in.USD start_date dept
## 6  6 Surafel          1578 2013-05-21   IT

Get the persons in IT department whose salary is greater than 600

# Create a data frame.
data <- read.csv("input.csv")

info <- subset(data, Salary.in.USD > 600 & dept == "IT")
print(info)
##   id    name Salary.in.USD start_date dept
## 6  6 Surafel          1578 2013-05-21   IT

Get the people who joined after 1 January 2014

# Create a data frame.
data <- read.csv("input.csv")

retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))
print(retval)
##   id      name Salary.in.USD start_date           dept
## 3  3  Birtukan       2811.00 2014-11-15        Mapping
## 4  4    Abenet       2929.00 2014-05-11          LiDAR
## 5  5 Mekannent       3043.25 2015-03-27 Remote Sensing
## 8  8   Tesfaye       1632.80 2014-06-17        Finance
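
The same filter can be written with bracket indexing, and converting the date column once avoids repeating as.Date() in every condition. A sketch on a three-row stand-in for the CSV data (the data frame is built inline so the example does not depend on input.csv):

```r
data <- data.frame(
   name = c("Abel","Birtukan","Tesfaye"),
   start_date = c("2012-01-01","2014-11-15","2014-06-17"),
   stringsAsFactors = FALSE
)

# Convert the text column to Date once, up front.
data$start_date <- as.Date(data$start_date)

# Bracket indexing equivalent of subset(data, start_date > ...).
recent <- data[data$start_date > as.Date("2014-01-01"), ]
print(recent$name)
## [1] "Birtukan" "Tesfaye"

# Dates sort chronologically, so order() puts the earliest joiner first.
recent <- recent[order(recent$start_date), ]
print(recent$name)
## [1] "Tesfaye"  "Birtukan"
```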

Writing into a CSV File

R can create a csv file from an existing data frame. The write.csv() function is used to create the csv file. This file gets created in the working directory.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
print(newdata)
##   X id      name Salary.in.USD start_date           dept
## 1 3  3  Birtukan       2811.00 2014-11-15        Mapping
## 2 4  4    Abenet       2929.00 2014-05-11          LiDAR
## 3 5  5 Mekannent       3043.25 2015-03-27 Remote Sensing
## 4 8  8   Tesfaye       1632.80 2014-06-17        Finance

Here the column X holds the row names of retval, which write.csv() writes by default. It can be dropped by passing row.names = FALSE when writing the file.

# Create a data frame.
data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
write.csv(retval,"output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)
##   id      name Salary.in.USD start_date           dept
## 1  3  Birtukan       2811.00 2014-11-15        Mapping
## 2  4    Abenet       2929.00 2014-05-11          LiDAR
## 3  5 Mekannent       3043.25 2015-03-27 Remote Sensing
## 4  8   Tesfaye       1632.80 2014-06-17        Finance

Exercise 1: Use Federal_Roads_Afri.csv as input and do the following:

1) Identify the longest Asphalt road in the Federal_Roads_Afri.csv file.
2) Identify the longest Gravel road in the Federal_Roads_Afri.csv file.
3) Identify the longest Asphalt and Gravel road in the Federal_Roads_Afri.csv file.

#Road_csv <- read.csv("Federal_Roads_Afri.csv", header = TRUE, sep=",")
#Road_csv
#print(is.data.frame(Road_csv))
#print(ncol(Road_csv))
#print(nrow(Road_csv))

Excel File Manipulation in R

Microsoft Excel is the most widely used spreadsheet program and stores data in the .xls or .xlsx format. R can read directly from these files using Excel-specific packages such as XLConnect, xlsx and gdata. We will be using the xlsx package. R can also write to an Excel file using this package.

Install xlsx Package

You can use the following command in the R console to install the “xlsx” package. It may ask to install some additional packages on which this package depends. Use the same command with the required package name to install them. This is the same procedure we used earlier.

# install.packages("xlsx") # remove the comments to install xlsx

Verify and Load the “xlsx” Package

Use the following command to verify and load the “xlsx” package.

# Verify the package is installed.
"xlsx" %in% rownames(installed.packages()) # TRUE if the package is already installed, FALSE otherwise
## [1] FALSE
# Load the library into R workspace.
# library("xlsx")

Reading the Excel File

Save the previous file as input.xlsx. It is then read by using the read.xlsx() function as shown below, and the result is stored as a data frame in the R environment.

# Read the first worksheet in the file input.xlsx.
#data <- read.xlsx("input.xlsx", sheetIndex = 1)
#print(data)

Binary Files in R

A binary file is a file that contains information stored only in the form of bits and bytes (0’s and 1’s). Such files are not human readable, because their bytes translate to characters and symbols that include many non-printable characters. Attempting to read a binary file using a text editor will show characters like Ø and ð.

A binary file has to be read by a specific program to be usable. For example, the binary file of a Microsoft Word document can be rendered in human-readable form only by the Word program. This indicates that, besides the human-readable text, a lot more information, such as character formatting and page numbers, is stored along with the alphanumeric characters. Finally, a binary file is a continuous sequence of bytes. The line break we see in a text file is a character joining the first line to the next.

Sometimes data generated by other programs has to be processed by R as a binary file. R is also required to create binary files that can be shared with other programs.

R has two functions, writeBin() and readBin(), to create and read binary files.

Syntax

# writeBin(object, con)
# readBin(con, what, n )

The parameters are described below:

  • con is the connection object to read or write the binary file.

  • object is the R object (a vector) to be written to the binary file.

  • what is the mode, such as character or integer, of the values to be read.

  • n is the maximum number of values (not bytes) to read from the binary file.
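
A minimal round trip shows how these parameters fit together. The sketch writes three integers to a temporary file and reads them back; the names bin.path, con and values are placeholders chosen for the illustration.

```r
bin.path <- tempfile(fileext = ".dat")

# Open a connection in binary write mode and write an integer vector.
con <- file(bin.path, "wb")
writeBin(c(4L, 6L, 8L), con)
close(con)

# Open the file in binary read mode; n counts values, not bytes.
con <- file(bin.path, "rb")
values <- readBin(con, what = integer(), n = 3)
close(con)

print(values)
## [1] 4 6 8

# Each integer is stored in 4 bytes by default, so the file holds 12 bytes.
print(file.size(bin.path))
## [1] 12

unlink(bin.path)   # remove the temporary file
```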

Example

We consider the R built-in data “mtcars”. First we create a csv file from it, convert it to a binary file and store it as an OS file. Next we read the binary file back into R.

Writing the Binary File

We write the data frame “mtcars” to a csv file and then write part of it as a binary file to the OS.

# Write the "mtcars" data frame to a csv file. Only the columns
#   "cyl", "am" and "gear" are stored in the binary file later on.
# write.table(mtcars, file = "mtcars.csv", row.names = FALSE, na = "",
#    col.names = TRUE, sep = ",")

# Store 5 records from the csv file as a new data frame.
#new.mtcars <- read.table("mtcars.csv",sep = ",",header = TRUE,nrows = 5)

# Create a connection object to write the binary file using mode "wb".
#write.filename = file("/web/com/binmtcars.dat", "wb")

# Write the column names of the data frame to the connection object.
#writeBin(colnames(new.mtcars), write.filename)

# Write the records in each of the column to the file.
#writeBin(c(new.mtcars$cyl,new.mtcars$am,new.mtcars$gear), write.filename)

# Close the file for writing so that it can be read by other program.
#close(write.filename)