R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Source file ⇒ Applied_TimeSeries_Analysis_NPTEL_IITM_ArunTangirala.rmd
At the outset, it is useful to understand the interface that RStudio presents you with. The upper left iscthe script window, and the one below it is the console, where all the user commands are keyed in. The onecon the top right shows the environment/workspace with tabs for viewing the history as well as the files in the current directory. Finally, the one on the bottom right is the part where you would see the plots being displayed. In addition, there are tabs to view the list of packages that are available and active for the current R session; to look up the Help on a certain routine, which includes a searchable window.
One of the first things that we need to learn is to ask for directions when in doubt, i.e., asking for help in R:
#`?`(plot)
# help(sum)
#`?`(help)Example of a script:
x = 100
y = x + 1
# This whole document is filled with comments. This is a comment referring
# to itself referring to itself...ad infinitumBasic Math Operations:
y <- 69
sumxy = x + y
diffxy = x - y
prodxy = x * y
quoxy = x/yOther math operations:
xsq = x^2
lx = log(x)
xroot = sqrt(xsq)Complex Numbers, finding modulus and argument:
z = x + (0+1i) * y
magz = abs(z)
argz = Arg(z)
lz = log10(z)Printing values to the console:
z## [1] 100+69i
print(z)## [1] 100+69i
Creating a Vector:
xvec = c(4, 2, 6, 8, 9)
yvec <- c(1, 2, 3, 4, 5)
zvec = c(xvec, yvec)Accessing elements of a vector:
x4 = xvec[4]Taking a subset of the elements in the vector:
x25 = xvec[c(2,5)] # Take out only certain values from the vector
r = xvec[xvec > 5] # Taking out only those elements greater than 5Mathematical Operations:
xyadd = xvec + yvec
xyprod = xvec * yvec
c = xvec^2
d = log(x)All the above operations are element-wise mathematical operations
All Summary Statistics:
summary(xvec)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.0 4.0 6.0 5.8 8.0 9.0
Specific stats:
mux = mean(xvec)
sigmax = sqrt(var(xvec))
maxix = max(xvec)
minix = min(xvec)Characteristics of a vector: Length and Names of elements
n = length(x); # Number of elements in the vector
n1 = length(xvec2 <- rnorm(4)); # This can be done dynamically also. Might be useful
names(xvec2) = c('First','Second','Third','Fourth')
print(xvec2)## First Second Third Fourth
## 0.2647422 1.5002891 -1.4166086 -0.2589104
Creating a Matrix:
A = matrix(1:20, nrow = 4, ncol = 5) # This fills the elements column-wise
B = matrix(1:20, nrow = 4, ncol = 5, byrow = "T") # This fills the elements row-wiseConcatenation, both row- and column-wise
rbind(A,B); # "Binds" the rows together## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 5 9 13 17
## [2,] 2 6 10 14 18
## [3,] 3 7 11 15 19
## [4,] 4 8 12 16 20
## [5,] 1 2 3 4 5
## [6,] 6 7 8 9 10
## [7,] 11 12 13 14 15
## [8,] 16 17 18 19 20
cbind(A,B) # "Binds" the rows together## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 5 9 13 17 1 2 3 4 5
## [2,] 2 6 10 14 18 6 7 8 9 10
## [3,] 3 7 11 15 19 11 12 13 14 15
## [4,] 4 8 12 16 20 16 17 18 19 20
Accessing elements in a matrix:
p = A[2,3]; # Single element
q = B[1:3,2:4]; # A subset of the matrix.
r = A[3,]; # This gives the entire 3rd row.
t = B[,4] # This gives the entire 4th column.Some more examples including creating a 3-D array
B2 = array(c(1:3),c(2,3))
B2 # Print B## [,1] [,2] [,3]
## [1,] 1 3 2
## [2,] 2 1 3
as.vector(B2) # presents B as a vector## [1] 1 2 3 1 2 3
dim(B2) # Gets the dimensions## [1] 2 3
C2 = array(seq(1,3,length=12),c(2,3,2)); # Create a 3-D array
dim(C2) # Prints the dimension of C## [1] 2 3 2
C2[,,1]## [,1] [,2] [,3]
## [1,] 1.000000 1.363636 1.727273
## [2,] 1.181818 1.545455 1.909091
** VERY IMPORTANT: Vectors are NOT 1-D matrices. We need to convert the vectors into matrix objects so that we can perform matrix operations on them! **
Element wise mathematical operations:
This is the same as the previous two cases
C = A + B
D = A * B
E = A^2
F = log(B)/AMatrix Multiplication:
M = A %*% t(B) # t() is the matrix transposeInverse of a Matrix:
A = matrix(rnorm(16), nrow = 4, ncol = 4)
Ainv = solve(A)
Ainv = qr.solve(A) # More efficient than the previous oneSaving your data
We need to learn how to save and load RData files, as the data in the assignments will be given to you in the form of RData files.
Saving the workspace as an RData file:
save.image(file = 'var.RData') saves the workspace in the current directory
Loading an RData file from the current directory:
load('var.RData') loads the saved variables in the workspace
R has very good visualization capabilities. All plots appear in the graphic window at the bottom right of RStudio. Multiple plots can be plotted at once, and they are drawn one after the other. Overlays are also, of course, possible.
x = rnorm(100); # Creating vectors
y = rnorm(100)Default are scatter plots:
plot(x)plot(x, y)There are other kinds of plots also:
plot(x,type = 'l'); # Line plotplot(x,type = 'p'); # Point (scatter)plot(x,type = 'b'); # Both the aboveplot(x,type = 'h'); # Bar plotplot(x,type = 's') # Step typeWe can change the axes labels, title, colour, font size, etc.
plot(sort(x), y, type = "l", xlab = "x axis", ylab = "y axis", main = "Line plot",col = "red", font.main = 2)R offers powerful features for plot customization including overlays, multiple graphs on a single screen, etc.
Overlays
Overlay the scatter and line plots
# Plot a sine wave
kvec = (0:99)
xk = sin(2 * pi * 0.1 * kvec)
plot(kvec, xk)
lines(kvec, xk, col = "red", lwd = 2)Note that using the ‘o’ option in plot produces the same figure.
Now, for another example, where some prior customization is required for overlaying both plots.
kvec = (0:99)
xk1 = sin(2 * pi * 0.1 * kvec) # Sinusoid
xk2 = 0.02 * kvec # Line
plot(kvec, xk1, type = "b", pch = 19, ylim = c(-1, 2.5), xlab = "Time", ylab = "Signal", col = "blue")
lines(kvec, xk2, col = "red", lty = 2, lwd = 2)
legstr = c("Sine", "Line")
legend(x = 0, y = 2.4, legend = c("Sine", "Line"), col = c("blue", "red"), lty = 1:2,pch = c(19, NA), lwd = 1:2, text.width = 1.2 * max(strwidth("Line Types")),cex = 0.9, y.intersp = 1.3, title = "Line Types")Observe that we have adjusted the limits of the y-axis prior to overlaying the lines. Further, we have added a legend to the plot. Pay attention to the options passed on to the legend routine.
Next, we shall learn how to draw multiple figures in the same screen.
Drawing multiple plots in same screen
There are three commands for this purpose, namely, par, layout and split.screen, in that order of sophistication.
Our example consists of plotting the weekly SO2 series provided in the astsa package (library). For this purpose, install the astsa package either from the command line (using the install.packages routine) or using the RStudio GUI (in the “Packages” pane). Make sure the dependencies (if any) are also installed.
After installing the package, it is necessary to first “attach” the package. This is done using the library (or alternatively the require) command.
library(astsa)The series of interest is so2, which contains the weekly sulphur dioxide levels in Los Angeles, USA obtained from a pollution study during the years 1970-80. We shall plot this series, its Box plot and histogram.
Note that the data is in the time-series format.
The par with the mfrow and/or mfcol option offer some of the simplest solutions to grid plotting.
par(mfrow = c(3, 1))
par(mar = c(3, 3, 3, 3)) # L, R, T and B widths of each plot in no. of lines
par(oma = c(1, 1, 1, 1)) # Outer margins for the figure
plot(so2, main = "SO2 series", xlab = "Year")
boxplot(so2, main = "Box plot")
hist(so2, col = "grey")Let’s realize these plots, in fact, with a more fancy arrangement using the layout command. Our target is to plot the series across the entire width, but the Box and the histogram plots share the bottom panel.
M <- rbind(c(1, 1), c(2, 3)) # Create a matrix of plot layout
print(M)## [,1] [,2]
## [1,] 1 1
## [2,] 2 3
layout(M) # Lay out the plots
layout.show(3)par(mar = c(3, 3, 3, 3))
plot(so2, main = "SO2 series", xlab = "Year", bty = "l")
boxplot(so2, main = "Box plot")
hist(so2, col = "grey", probability = T)This arrangement looks more elegant, doesn’t it? Now finally, let’s show how to achieve the same arrangement using split.screen command
split.screen(c(2, 1))## [1] 1 2
screen(1)
par(mar = c(3, 3, 3, 3))
plot(so2, main = "SO2 series", xlab = "Year", bty = "l")
screen(2)
split.screen(c(1, 2), screen = 2)## [1] 3 4
par(mar = c(3, 3, 3, 3))
boxplot(so2, main = "Box plot")
screen(4)
par(mar = c(3, 3, 3, 3))
hist(so2, col = "grey", probability = T, xlab = "", ylab = "Density")close.screen(all.screens = TRUE)For more features and its implementations, refer to the excellent tutorials and literature availabe on the web.
We shall close this tutorial with a few commonly used functions.
# Generate a set of numbers from the normal distribution
x = rnorm(100, mean = 0, sd = 1)
# Data/Estimation Statistics
summary(x)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.4260 -0.8208 -0.2155 -0.1951 0.4601 1.9900
# Uniformly sampled random numbers
y = runif(100, min = 0, max = 1)
# Transpose a matrix
A = matrix(x[1:25], nrow = 5, ncol = 5)
A## [,1] [,2] [,3] [,4] [,5]
## [1,] -1.0405070 -0.1557340 -1.214630605 -0.3730367 -0.4372162
## [2,] 0.3192112 1.9904163 -1.105352030 0.4555316 0.4020222
## [3,] -0.8010222 -1.0313627 -1.854447743 -1.7982363 -0.7312137
## [4,] 1.4509594 0.8849008 0.245738810 -0.8137032 0.4811592
## [5,] -1.1852694 -0.5178932 -0.002710641 0.6556761 0.4888438
t(A)## [,1] [,2] [,3] [,4] [,5]
## [1,] -1.0405070 0.3192112 -0.8010222 1.4509594 -1.185269411
## [2,] -0.1557340 1.9904163 -1.0313627 0.8849008 -0.517893197
## [3,] -1.2146306 -1.1053520 -1.8544477 0.2457388 -0.002710641
## [4,] -0.3730367 0.4555316 -1.7982363 -0.8137032 0.655676084
## [5,] -0.4372162 0.4020222 -0.7312137 0.4811592 0.488843768
# Get the diagonal elements
diag(A)## [1] -1.0405070 1.9904163 -1.8544477 -0.8137032 0.4888438
# Eigenvalues of A
eigen(A)$values## [1] 2.3900887+0.000000i -2.1413270+0.996405i -2.1413270-0.996405i
## [4] 0.8257935+0.000000i -0.1626260+0.000000i
# Lists all objects from the environment
ls()## [1] "A" "Ainv" "argz" "B" "B2" "c" "C"
## [8] "C2" "d" "D" "diffxy" "E" "F" "kvec"
## [15] "legstr" "lx" "lz" "M" "magz" "maxix" "minix"
## [22] "mux" "n" "n1" "p" "prodxy" "q" "quoxy"
## [29] "r" "sigmax" "sumxy" "t" "x" "x25" "x4"
## [36] "xk" "xk1" "xk2" "xroot" "xsq" "xvec" "xvec2"
## [43] "xyadd" "xyprod" "y" "yvec" "z" "zvec"
# Remove specified objects from the environment.
rm(list = c("A"))
# Replicate elements of vectors
rep(x <- rnorm(4), 2)## [1] 0.18788169 -0.92023386 -0.13698138 -0.08318267 0.18788169 -0.92023386
## [7] -0.13698138 -0.08318267
# Print with control on digits
print(x[2], digits = 2)## [1] -0.92
# Return those indices of a vector that satisfy a logical condition
which(x > 0)## [1] 1
| . | . |
|---|---|
| 1 | 2 |
| 3 | 4 |
| 5 | 6 |
| 7 | 8 |
| 9 | 10 |
| 11 | 12 |
| 13 | 14 |
| 15 | 16 |
| 17 | 18 |
| 19 | 20 |
| 21 | |
| . | . |
|---|---|
| 1 | 2 |
| 3 | 4 |
| 5 | 6 |
| 7 | 8 |
| 9 | 10 |
| 11 | 12 |
| 13 | 14 |
| 15 | 16 |
| 17 | 18 |
| 19 | 20 |
| 21 | 22 |
| 23 | 24 |
| 25 | 26 |
| 27 | 28 |
| 29 | 30 |
| 31 | |
devtools::session_info()## Session info --------------------------------------------------------------
## setting value
## version R version 3.3.2 (2016-10-31)
## system x86_64, mingw32
## ui RTerm
## language (EN)
## collate English_India.1252
## tz Asia/Calcutta
## date 2017-01-29
## Packages ------------------------------------------------------------------
## package * version date
## assertthat 0.1 2013-12-06
## astsa * 1.7 2016-12-22
## backports 1.0.4 2016-10-24
## base64enc * 0.1-3 2015-07-28
## colorspace 1.3-2 2016-12-14
## curl * 2.3 2016-11-24
## DataComputing * 0.8.3 2016-03-19
## DBI 0.5-1 2016-09-10
## devtools 1.12.0 2016-06-24
## digest 0.6.11 2017-01-03
## dplyr * 0.5.0 2016-06-24
## evaluate 0.10 2016-10-11
## ggdendro 0.1-20 2016-04-27
## ggplot2 * 2.2.1 2016-12-30
## gridExtra 2.2.1 2016-02-29
## gtable 0.2.0 2016-02-26
## htmltools 0.3.5 2016-03-21
## knitr * 1.15.1 2016-11-22
## lattice * 0.20-34 2016-09-06
## lazyeval 0.2.0 2016-06-12
## lubridate * 1.6.0 2016-09-13
## magrittr 1.5 2014-11-22
## manipulate * 1.0.1 2014-12-24
## MASS 7.3-45 2016-04-21
## Matrix * 1.2-7.1 2016-09-01
## memoise 1.0.0 2016-01-29
## mime 0.5 2016-07-07
## mosaic * 0.14.4 2016-07-29
## mosaicData * 0.14.0 2016-06-17
## munsell 0.4.3 2016-02-13
## plyr 1.8.4 2016-06-08
## R6 2.2.0 2016-10-05
## Rcpp 0.12.8 2016-11-17
## rmarkdown 1.3 2016-12-21
## rprojroot 1.1 2016-10-29
## scales 0.4.1 2016-11-09
## stringi 1.1.2 2016-10-01
## stringr * 1.1.0 2016-08-19
## tibble 1.2 2016-08-26
## tidyr * 0.6.1 2017-01-10
## withr 1.0.2 2016-06-20
## yaml 2.1.14 2016-11-12
## source
## CRAN (R 3.3.0)
## CRAN (R 3.3.2)
## CRAN (R 3.3.1)
## CRAN (R 3.3.0)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## Github (DataComputing/DataComputing@fb983c7)
## CRAN (R 3.3.1)
## CRAN (R 3.3.1)
## CRAN (R 3.3.2)
## CRAN (R 3.3.1)
## CRAN (R 3.3.1)
## CRAN (R 3.2.5)
## CRAN (R 3.3.2)
## CRAN (R 3.2.3)
## CRAN (R 3.2.3)
## CRAN (R 3.2.4)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## CRAN (R 3.3.0)
## CRAN (R 3.3.1)
## CRAN (R 3.2.1)
## CRAN (R 3.2.0)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## CRAN (R 3.2.3)
## CRAN (R 3.3.1)
## CRAN (R 3.3.1)
## CRAN (R 3.3.0)
## CRAN (R 3.2.3)
## CRAN (R 3.3.0)
## CRAN (R 3.3.1)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## CRAN (R 3.3.2)
## CRAN (R 3.3.1)
## CRAN (R 3.3.1)
## CRAN (R 3.3.1)
## CRAN (R 3.3.2)
## CRAN (R 3.3.0)
## CRAN (R 3.3.2)