Introduction to R

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

I recommend that you type out and run all of the code yourself. Do NOT copy-paste; that will not help you learn. Many function options are not explained in this tutorial, so you have to explore and learn those on your own (for example, type ?plot at the console to open the help page for plot).

The Windows in RStudio

The upper-left pane is the script window, and the pane below it is the console. The pane to the right of the script window (upper right) is the environment/workspace, and the pane diagonally opposite the script window (bottom right) is the graphics and directory panel.

We use the script window to write and save code, and the console to type and execute lines of code without saving them.

In the graphics and directory panel, you can view plots, see which packages are installed, install new packages in case you need any, and see the current active folder.

Example of a script:

x = 100;
y = x+1

# This whole document is filled with comments.
# This is a comment referring to itself referring to itself...ad infinitum

Scalar Operations

Basic Math Operations:

x = 100; y <- 69;
x + y -> z;
sum = x+y;
diff = x-y;
prod = x*y;
quo = x/y;
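
The lines above mix three assignment forms. As a small illustration (the names a1, a2 and a3 are made up for this sketch), all three create ordinary variables:

```r
# Three equivalent ways to assign at the top level of a script
a1 = 100     # '=' assignment
a2 <- 100    # leftward arrow, the form commonly recommended in R style guides
100 -> a3    # rightward arrow, the value comes first

identical(a1, a2) && identical(a2, a3)   # TRUE
```

Whichever form you pick, using it consistently makes scripts easier to read.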

Other math operations:

z = x^2; a = log(x); root = sqrt(x);

Complex Numbers, finding modulus and argument:

comp = 3+4i;
mod = abs(comp); argument = Arg(comp)
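
As a quick check of what abs() and Arg() compute for a complex number, the sketch below rebuilds both values from the real and imaginary parts. The names mod_by_hand and arg_by_hand are made up for this example.

```r
comp <- 3 + 4i
re <- Re(comp)   # real part, 3
im <- Im(comp)   # imaginary part, 4

mod_by_hand <- sqrt(re^2 + im^2)   # modulus, same value as abs(comp)
arg_by_hand <- atan2(im, re)       # argument in radians, same value as Arg(comp)

all.equal(mod_by_hand, abs(comp))  # TRUE
all.equal(arg_by_hand, Arg(comp))  # TRUE
```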

Printing values to the console:

z; # Typing a name at the console prints its value

## [1] 10000

print(z)

## [1] 10000

Vector Operations

Creating a Vector:

x = c(4,2,6,8,9);
y <- c(1,2,3,4,5);
z = c(x,y)

Accessing elements of a vector:

p = x[4];

Taking a subset of the elements in the vector:

q = x[c(2,5)]; # Take out only certain values from the vector
r = x[x>5]; # Taking out only those elements greater than 5
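
To see why x[x>5] works, it can help to look at the intermediate logical vector. The following is a minimal sketch; the name mask is made up for this example.

```r
x <- c(4, 2, 6, 8, 9)

mask <- x > 5      # FALSE FALSE TRUE TRUE TRUE
x[mask]            # 6 8 9, the same result as x[x > 5]
x[-1]              # 2 6 8 9, a negative index drops that position
x[x > 3 & x < 9]   # 4 6 8, conditions combine with & (and) or | (or)
```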

Mathematical Operations:

a = x+y; b = x*y; c = x^2; d = log(x);

All of the above are element-wise operations: each one is applied to corresponding elements of the vectors.
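
A small sketch of what element-wise means in practice, including what R does when the two operands have different lengths (the shorter one is recycled):

```r
x <- c(4, 2, 6, 8, 9)
y <- c(1, 2, 3, 4, 5)

x * y            # 4 4 18 32 45, products of matching positions
x + 10           # 14 12 16 18 19, the single value 10 is reused for every element
1:6 * c(1, 2)    # 1 4 3 8 5 12, the shorter vector c(1, 2) is repeated three times
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.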

All Summary Statistics:

summary(x);

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0     4.0     6.0     5.8     8.0     9.0

Specific stats:

mu = mean(x); sigma = sqrt(var(x));
maxi = max(x); mini = min(x)
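
The standard deviation above is computed as sqrt(var(x)). As a sketch, the built-in sd() gives the same value, and var() uses the sample (n - 1) denominator:

```r
x <- c(4, 2, 6, 8, 9)
n <- length(x)

sd(x)                                               # sample standard deviation
all.equal(sd(x), sqrt(var(x)))                      # TRUE
all.equal(var(x), sum((x - mean(x))^2) / (n - 1))   # TRUE, var() divides by n - 1
```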

Characteristics of a vector: Length and Names of elements

n = length(x); # Number of elements in the vector
n1 = length(x <- rnorm(4)); # Assigns 4 new random values to x (overwriting the old x) and takes the length in one step

names(x) = c('First','Second','Third','Fourth')
print(x);

##      First     Second      Third     Fourth 
## -0.5628921 -0.7803335 -0.2367322 -0.8598477
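
Once elements have names, they can be selected by name as well as by position. A minimal sketch with made-up values in a vector v:

```r
v <- c(4, 2, 6, 8)
names(v) <- c("First", "Second", "Third", "Fourth")

v["Third"]                # the element labelled "Third"
v[c("First", "Fourth")]   # several elements by name
unname(v["Third"])        # 6, the bare value without its label
```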

Matrix Operations

Creating a Matrix:

A = matrix(1:20, nrow = 4, ncol = 5) # This fills the elements column-wise
B = matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE) # This fills the elements row-wise

Concatenation, both row- and column-wise:

rbind(A,B); # "Binds" the rows together

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20
## [5,]    1    2    3    4    5
## [6,]    6    7    8    9   10
## [7,]   11   12   13   14   15
## [8,]   16   17   18   19   20

cbind(A,B); # "Binds" the columns together

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    5    9   13   17    1    2    3    4     5
## [2,]    2    6   10   14   18    6    7    8    9    10
## [3,]    3    7   11   15   19   11   12   13   14    15
## [4,]    4    8   12   16   20   16   17   18   19    20

Accessing elements in a matrix:

p = A[2,3]; # Single element

q = B[1:3,2:4]; # A subset of the matrix.
r = A[3,]; # This gives the entire 3rd row.
t = B[,4] # This gives the entire 4th column.

VERY IMPORTANT: Vectors are NOT 1-D matrices. We need to convert the vectors into matrix objects so that we can perform matrix operations on them!
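
The difference between a vector and a matrix can be inspected with dim(). The sketch below converts a vector into a column matrix and a row matrix; the names v, col_mat and row_mat are made up for this example.

```r
v <- c(1, 2, 3)
dim(v)          # NULL, a plain vector has no dimensions
is.matrix(v)    # FALSE

col_mat <- matrix(v, ncol = 1)   # 3 x 1 column matrix
row_mat <- matrix(v, nrow = 1)   # 1 x 3 row matrix
dim(col_mat)    # 3 1
dim(row_mat)    # 1 3

row_mat %*% col_mat   # 1 x 1 matrix holding the inner product, 14
col_mat %*% row_mat   # 3 x 3 outer product
```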

Element-wise mathematical operations:

These work the same way as for scalars and vectors.

C = A+B; D = A*B; E = A^2; F = log(B)/A

Matrix Multiplication:

M = A %*% t(B) # t() is the matrix transpose

Inverse of a Matrix:

A = matrix(rnorm(16), nrow = 4, ncol = 4);
In = solve(A);
Inv = qr.solve(A) # Same inverse computed via a QR decomposition; qr.solve() can also handle least-squares problems
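
One way to convince yourself that an inverse is correct is to multiply it back. The sketch below does that, and also shows solving a linear system directly; the seed, the vector b and the names A_inv and sol are arbitrary choices for this example.

```r
set.seed(1)   # arbitrary seed so the run is repeatable
A <- matrix(rnorm(16), nrow = 4, ncol = 4)
A_inv <- solve(A)

# A %*% A_inv should match the identity matrix up to floating-point error
all.equal(A %*% A_inv, diag(4), tolerance = 1e-6)

# To solve A x = b, pass b to solve() instead of forming the inverse first
b <- c(1, 2, 3, 4)
sol <- solve(A, b)
all.equal(as.vector(A %*% sol), b, tolerance = 1e-6)
```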

We need to learn how to save and load RData files, as the data in the assignments will be given to you in the form of RData files.

Saving the workspace as an RData file:

save.image(file = 'var.RData') # Saves the workspace in the current directory

Loading an RData file from the current directory:

load('var.RData') # Loads the saved variables into the workspace
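
As a sketch of a full save-and-load round trip, the example below uses save() to store only selected objects rather than the whole workspace. The file name, the temporary directory and the object names are made up for this illustration.

```r
rdata_path <- file.path(tempdir(), "example.RData")   # hypothetical file location

scores <- c(10, 20, 30)
label <- "demo"
save(scores, label, file = rdata_path)   # saves only the named objects

rm(scores, label)                # remove them from the workspace
loaded <- load(rdata_path)       # restores the objects and returns their names
loaded                           # names of the restored objects
scores                           # 10 20 30
```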

Time Series Objects

These are different from vectors in the sense that the notion of time is incorporated in these objects directly.

Creation:

x = ts(rnorm(100), start = 2, frequency = 1)

y = rnorm(1000); # Ordinary numeric vector
z = ts(y) # Converted into a ts object

ts() also converts matrices to multivariate ts objects.
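
A minimal sketch of the matrix case, assuming each column holds one series and each row one time point. The seed and the column names are made up for this example.

```r
set.seed(2)   # arbitrary seed
m <- matrix(rnorm(30), nrow = 10, ncol = 3)   # 10 time points, 3 series
colnames(m) <- c("a", "b", "c")               # hypothetical series names

mts_obj <- ts(m, start = 1, frequency = 1)
class(mts_obj)   # includes "mts" and "ts"
dim(mts_obj)     # 10 3
is.ts(mts_obj)   # TRUE
```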

Attributes of a TS object:

time(x) # Gives the times of measurement of each value

## Time Series:
## Start = 2 
## End = 101 
## Frequency = 1 
##   [1]   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
##  [18]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
##  [35]  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52
##  [52]  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69
##  [69]  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86
##  [86]  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101

start(x) # The first number gives time of start of measurement (first observation)

## [1] 2 1

end(x) # The first number gives time of end of measurement (last observation)

## [1] 101   1

frequency(x) # Gives the number of observations per unit of time

## [1] 1
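
The second number returned by start() and end() is easier to interpret when the frequency is larger than 1. Here is a sketch with made-up monthly data, where the unit of time is a year and there are 12 observations per year:

```r
monthly <- ts(1:24, start = c(2020, 1), frequency = 12)

start(monthly)       # 2020 1, i.e. period 1 (January) of 2020
end(monthly)         # 2021 12, i.e. period 12 (December) of 2021
frequency(monthly)   # 12 observations per unit of time
time(monthly)[1:3]   # 2020.000 2020.083 2020.167
```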

Test whether an object is a time series or not:

is.ts(x)

## [1] TRUE

Yes, this is a time series object.

Plotting in R

R has strong built-in support for data visualization. All plots appear in the graphics window at the bottom right. Several plot commands can be run at once; the plots are then drawn one after the other.

x = rnorm(100); # Creating vectors
y = rnorm(100);

The default is a scatter plot:

plot(x);

plot(x,y)

There are other kinds of plots also:

plot(x,type = 'l'); # Line plot

plot(x,type = 'p'); # Point (scatter)

plot(x,type = 'b'); # Both points and lines

plot(x,type = 'h'); # Vertical lines (histogram-like)

plot(x,type = 's'); # Step type

We can change the x-axis label, y-axis label, title, and colour:

plot(x,y,type='p',xlab='x axis',ylab='y axis',main='title',col = 'red')
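
Plots can also be placed side by side in the same window instead of one after the other. The following is a sketch using par(mfrow = ...); the seed, titles and labels are arbitrary choices for this example.

```r
set.seed(3)   # arbitrary seed
x <- rnorm(100)
y <- rnorm(100)

old_par <- par(mfrow = c(1, 2))   # layout with 1 row and 2 columns of plots
plot(x, type = "l", main = "Line plot", xlab = "Index", ylab = "x")
plot(x, y, main = "Scatter plot", xlab = "x", ylab = "y", col = "red")
par(old_par)                      # restore the previous layout
```

Saving the old settings and restoring them afterwards keeps later plots from inheriting the two-panel layout.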

For plotting time series objects, the time is automatically incorporated:

x = ts(x);
plot.ts(x,type = 'l',col = 'blue');

Checking the structure of a Time Series object

str(EuStockMarkets) # Default dataset available in R

##  mts [1:1860, 1:4] 1629 1614 1607 1621 1618 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "DAX" "SMI" "CAC" "FTSE"
##  - attr(*, "tsp")= num [1:3] 1991 1999 260
##  - attr(*, "class")= chr [1:3] "mts" "ts" "matrix"

Plotting Multivariate TS Data

data("EuStockMarkets"); # Multivariate time series available as a predefined dataset
# EuStockMarkets ships with R's built-in datasets package; more time series datasets are available in add-on packages such as TSA
plot.ts(EuStockMarkets,col = 'steelblue')

User Defined Functions

Functions are reusable chunks of code that need not be rewritten; they are used by referencing the function name and supplying the input arguments, both of which we will see below.

Functions can be defined anywhere: at the start of a script, in the middle of it, or in a separate script file. Functions are stored like variables, so the definition must be run (or the file containing it sourced with source()) before the function can be used.
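
To illustrate keeping a function in a separate script file, the sketch below writes a one-line definition to a temporary file and loads it with source(). The function name double_it and the file name helpers.R are made up for this example.

```r
# Write a tiny function definition to a temporary script file
script_path <- file.path(tempdir(), "helpers.R")
writeLines("double_it <- function(v) 2 * v", script_path)

source(script_path)   # runs the file, which defines double_it
double_it(21)         # 42
```

In practice the file would be written by hand in the script window and saved; writeLines() is used here only so the example is self-contained.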

The basic format is:

Defining a function:

myfunction <- function(arg1, arg2, ... ){
  statements;
  return(object)
}

Calling a function:

val = myfunction(arg1value, arg2value, ... )

An example of a function:

I am going to show you a function that takes two values and outputs the sum:

summation <- function(x,y=0){
  z = x+y;
  return(z)
}

We will call this using its name:

summation(5,4);

## [1] 9

summation(5); # Since y has a default value of 0, this does not give an error

## [1] 5
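
A few more ways of calling the same function, as a sketch: arguments can be passed by name, and because + is element-wise the function also accepts vectors.

```r
summation <- function(x, y = 0) {
  z <- x + y
  return(z)
}

summation(y = 4, x = 5)     # 9, named arguments can come in any order
summation(c(1, 2, 3), 10)   # 11 12 13, works on vectors too
summation(x = 1:3)          # 1 2 3, y falls back to its default of 0
```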

We can do anything within a function. Make sure you are comfortable writing them, because you will be asked to write functions for certain algorithms, such as the Durbin-Levinson algorithm for finding the PACF.

Common pre-defined functions in R:

#Generates set of numbers from the normal distribution
x = rnorm(100,mean=0,sd=1)

#Data/Estimation Statistics
summary(x)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.19700 -0.71860 -0.03625 -0.06766  0.50720  3.14000

#Uniformly sampled random numbers
y = runif(100,min=0,max=1)

#Transpose of a matrix
A = matrix(x,nrow = 10,ncol=10);
A

##              [,1]        [,2]          [,3]       [,4]       [,5]
##  [1,]  0.08419173  0.94312390  0.2673332192 -0.4118098 -2.0465639
##  [2,]  0.36704998 -1.13954421  0.7275996409 -1.8335537 -1.3909775
##  [3,]  0.08957699  0.05390569 -1.0997736646  0.3654702 -1.4440367
##  [4,]  1.01358033 -0.95973825  0.8608864522 -0.9702803 -0.6266463
##  [5,]  0.42056653  1.11790217 -0.3581301023 -0.6207176 -0.7522858
##  [6,] -0.48439900 -0.03252393 -0.0005890898  2.7499970  0.1200942
##  [7,] -0.99414954 -0.53377011 -0.4001906285  0.6301813  0.3553984
##  [8,]  0.42447353  0.85443221  0.4030093941  1.6378169 -0.2563641
##  [9,] -0.11102767  0.11880671  0.2195147983  0.8524258  1.2554500
## [10,] -0.79445793 -0.17884740 -0.3351544091 -1.3045567  3.1399525
##              [,6]        [,7]       [,8]       [,9]       [,10]
##  [1,] -0.93712953 -0.80181190 -2.0982629 -1.1788544  0.56471639
##  [2,]  0.94432014 -1.61808815 -0.7074049 -0.3265361  1.22151907
##  [3,]  1.00571002 -0.58593644  0.1182375 -1.1166347 -0.60227075
##  [4,] -0.01737095 -1.47900967  0.5490843  0.3329637  0.32924063
##  [5,] -0.14251162  0.13013662 -2.1965043  1.0360963  2.46570366
##  [6,]  0.53193726 -0.51233560  2.4253834 -1.0007336 -0.23967009
##  [7,] -1.87932711  0.10015420 -0.2053209 -0.2907052 -0.63068067
##  [8,]  0.49889801 -0.85278570  0.2626400 -0.3524037  1.15412119
##  [9,]  0.04676556 -0.02083691  0.6087533 -1.1616830  0.81846534
## [10,]  1.28245018 -0.45335503 -1.0943573 -0.6431191 -0.03998163

t(A)

##              [,1]       [,2]        [,3]        [,4]       [,5]
##  [1,]  0.08419173  0.3670500  0.08957699  1.01358033  0.4205665
##  [2,]  0.94312390 -1.1395442  0.05390569 -0.95973825  1.1179022
##  [3,]  0.26733322  0.7275996 -1.09977366  0.86088645 -0.3581301
##  [4,] -0.41180982 -1.8335537  0.36547023 -0.97028028 -0.6207176
##  [5,] -2.04656388 -1.3909775 -1.44403666 -0.62664633 -0.7522858
##  [6,] -0.93712953  0.9443201  1.00571002 -0.01737095 -0.1425116
##  [7,] -0.80181190 -1.6180882 -0.58593644 -1.47900967  0.1301366
##  [8,] -2.09826295 -0.7074049  0.11823751  0.54908430 -2.1965043
##  [9,] -1.17885441 -0.3265361 -1.11663472  0.33296371  1.0360963
## [10,]  0.56471639  1.2215191 -0.60227075  0.32924063  2.4657037
##                [,6]       [,7]       [,8]        [,9]       [,10]
##  [1,] -0.4843990043 -0.9941495  0.4244735 -0.11102767 -0.79445793
##  [2,] -0.0325239269 -0.5337701  0.8544322  0.11880671 -0.17884740
##  [3,] -0.0005890898 -0.4001906  0.4030094  0.21951480 -0.33515441
##  [4,]  2.7499970121  0.6301813  1.6378169  0.85242584 -1.30455672
##  [5,]  0.1200942080  0.3553984 -0.2563641  1.25544996  3.13995252
##  [6,]  0.5319372567 -1.8793271  0.4988980  0.04676556  1.28245018
##  [7,] -0.5123356039  0.1001542 -0.8527857 -0.02083691 -0.45335503
##  [8,]  2.4253833773 -0.2053209  0.2626400  0.60875333 -1.09435730
##  [9,] -1.0007336043 -0.2907052 -0.3524037 -1.16168298 -0.64311905
## [10,] -0.2396700923 -0.6306807  1.1541212  0.81846534 -0.03998163

#Lists all objects from the environment
ls()

##  [1] "a"              "A"              "argument"       "b"             
##  [5] "B"              "c"              "C"              "comp"          
##  [9] "d"              "D"              "diff"           "E"             
## [13] "EuStockMarkets" "F"              "In"             "Inv"           
## [17] "M"              "maxi"           "mini"           "mod"           
## [21] "mu"             "n"              "n1"             "p"             
## [25] "prod"           "q"              "quo"            "r"             
## [29] "root"           "sigma"          "sum"            "summation"     
## [33] "t"              "x"              "y"              "z"

#Removes specified objects from the environment.
rm(list = c('A')) 

#Replicates elements of vectors.
rep(x<-rnorm(4),2)

## [1] -0.9037245 -0.3484089  0.6884484  0.1151878 -0.9037245 -0.3484089
## [7]  0.6884484  0.1151878

#Returns the indices of a vector that satisfy a logical condition
which(x>0)

## [1] 3 4
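
A short sketch of how which() relates to indexing, together with two related helpers. The values in v are made up, chosen to have the same sign pattern as the output above.

```r
v <- c(-0.9, -0.3, 0.7, 0.1)

which(v > 0)      # 3 4, positions where the condition holds
v[which(v > 0)]   # 0.7 0.1, the values at those positions
which.max(v)      # 3, position of the largest value
which.min(v)      # 1, position of the smallest value
```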
