Analysis of Spatio-Temporal Data: Lab1

Avipsa Roy, MSc Geoinformatics, WISE 2015

Lecture Wrap Up

How is the mean for a random variable X defined?
The mean of a random variable X = (x₁,x₂,….) is defined as follows:
Give the equation for covariance between two variables x and y.
The equation of ovariance for x and y is given by,
Give the equation for correlation between two variables x and y.
The equation of correaltion between variables x and y is,
For the samples x and y, compute the sample mean, the variance and the standard deviation.

x = c(1,2,4,5)
y = c(2,1,5,4)
mx = mean(x)
mx

## [1] 3

my = mean(y)
my

## [1] 3

vx = var(x)
vx

## [1] 3.333333

vy = var(y)
vy

## [1] 3.333333

sdx = sd(x)
sdx

## [1] 1.825742

sdy = sd(y)
sdy

## [1] 1.825742

Draw the data in a scatter plot (by hand)
From the table above, compute the covariance between x and y, and the correlation between x and y.

df <- data.frame(x,y)
df

##   x y
## 1 1 2
## 2 2 1
## 3 4 5
## 4 5 4

cov(x,y)

## [1] 2.666667

cor(x,y)

## [1] 0.8

Suppose x is a time series data set, and the rows indicate the time index. Compute the lag 0, lag 1 and lag 2 autocorrelations of x.

 head(acf(x))

## 
## Autocorrelations of series 'x', by lag
## 
##    1    2    3   NA   NA   NA 
##  0.3 -0.4 -0.4   NA   NA   NA

 plot(acf(x))

plot of chunk unnamed-chunk-3

What is the general equation of an autoregressive AR(p) process?
An autoregressive process of order p: Y_t is a linear combination of p preceding values:
What is the general equation of a moving average MA(q) process?
The moving average process of order q: Y_t is a linear combination of q preceding error terms:

alt text
10) Up to which lag is an AR(p) process correlated?

The lag is limited automatically to one less than the number of observations in the series. The default value for lag.max is __10*log10(N/m)__ where N is the number of observations and m the number of series.

Up to which lag is a MA(q) process correlated?

The default value for lag.max is __10*log10(N/m)__ where N is the number of observations and m the number of series.

What can be said about partial correlations of an AR(2) process?
The partial correlation coefficient is estimated by fitting autoregressive models of successively higher orders up to lag.max.

Computer Exercises

Create a random walk time series of length 1000 using rnorm and cumsum, and plot it

x <- rnorm(1000)
head(x)

## [1]  0.3826126 -0.9535073 -0.3763547  0.4392412 -0.4325646 -0.3112970

y <- cumsum(x)
head(y)

## [1]  0.3826126 -0.5708948 -0.9472495 -0.5080083 -0.9405729 -1.2518699

plot(y)

plot of chunk unnamed-chunk-5

Compute and plot the autocorrelation with acf and partial autocorrelation with pacf of this series.

y.acf1 <- acf(y,plot = FALSE)
y.acf1

## 
## Autocorrelations of series 'y', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 1.000 0.995 0.989 0.984 0.979 0.974 0.968 0.963 0.957 0.952 0.946 0.941 
##    12    13    14    15    16    17    18    19    20    21    22    23 
## 0.935 0.929 0.924 0.918 0.913 0.907 0.902 0.897 0.891 0.887 0.881 0.876 
##    24    25    26    27    28    29    30 
## 0.871 0.865 0.860 0.855 0.850 0.844 0.839

plot(y.acf1)

plot of chunk unnamed-chunk-6

y.pacf1 <- pacf(y,plot=FALSE)
y.pacf1

## 
## Partial autocorrelations of series 'y', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.995 -0.023  0.018 -0.001  0.018 -0.034 -0.015 -0.014  0.001  0.016 
##     11     12     13     14     15     16     17     18     19     20 
## -0.028 -0.025  0.026 -0.003 -0.020  0.014 -0.007  0.009  0.038 -0.005 
##     21     22     23     24     25     26     27     28     29     30 
##  0.009 -0.035 -0.004  0.001 -0.010 -0.013  0.023 -0.008 -0.018 -0.019

plot(y.pacf1)

plot of chunk unnamed-chunk-6

Generate an ARIMA model with coefficient 0.9 and 0.1, determine acf and pacf

ar1 <- arima.sim(list(ar = 0.9) ,n = 1000)
head(ar1)

## [1] 3.212650 4.242610 6.965067 6.672971 6.455681 5.020726

plot(ar1)

plot of chunk unnamed-chunk-7

ar2 <- arima.sim(list(ar = 0.1) ,n = 1000)
head(ar2)

## [1]  0.1343806  1.2841450 -0.4630326 -0.4183226 -0.0467525 -1.2421204

plot(ar2)

plot of chunk unnamed-chunk-7

acf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

pacf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

Generate a model AR(3) of length 10000 using coefficients c(0.7,0.15,0.05) and calculate acf, pacf

ar3 <- arima.sim(list(ar = c(0.7,0.15,0.05)) ,n = 10000)
head(ar3)

## [1]  1.7151568  0.3530036 -1.4396374 -1.7142441 -1.4319962 -2.2411801

plot(ar3)

plot of chunk unnamed-chunk-8

ar3.acf <- acf(ar3,plot = FALSE)
ar3.acf

## 
## Autocorrelations of series 'ar3', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.877  0.813  0.758  0.703  0.651  0.603  0.557  0.515  0.474 
##     10     11     12     13     14     15     16     17     18     19 
##  0.434  0.398  0.368  0.341  0.319  0.295  0.271  0.249  0.233  0.216 
##     20     21     22     23     24     25     26     27     28     29 
##  0.199  0.186  0.170  0.156  0.141  0.127  0.117  0.104  0.093  0.080 
##     30     31     32     33     34     35     36     37     38     39 
##  0.069  0.055  0.050  0.041  0.034  0.025  0.017  0.012  0.009  0.006 
##     40 
## -0.001

plot(ar3.acf)

plot of chunk unnamed-chunk-8

ar3.pacf <- pacf(ar3, plot = FALSE)
ar3.pacf

## 
## Partial autocorrelations of series 'ar3', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.877  0.191  0.060  0.004 -0.010  0.001 -0.010  0.000 -0.009 -0.016 
##     11     12     13     14     15     16     17     18     19     20 
## -0.002  0.012  0.011  0.018 -0.012 -0.009 -0.004  0.015 -0.002 -0.005 
##     21     22     23     24     25     26     27     28     29     30 
##  0.006 -0.012 -0.003 -0.011 -0.001  0.005 -0.009 -0.003 -0.016 -0.006 
##     31     32     33     34     35     36     37     38     39     40 
## -0.014  0.019 -0.005 -0.001 -0.013 -0.007  0.006  0.010  0.005 -0.020

plot(ar3.pacf)

plot of chunk unnamed-chunk-8

Save the series and fit an AR model to it using ar. Why are the parameters not identical to those used for simulation? What can you do to make the estimated parameters closer to those used for simulation?

ar(ar3)

## 
## Call:
## ar(x = ar3)
## 
## Coefficients:
##      1       2       3  
## 0.6977  0.1491  0.0597  
## 
## Order selected 3  sigma^2 estimated as  0.9874

arima.sim simulates the given model and creates a series of values even though these values are based on a specific model. You can increase the number of elements (from n=1000 to n=10000 or more) in the series.

Simulate an MA(10) process with constant weights rep(1,10), and plot autocorrelation functions.

sim.ma<-arima.sim(list(ma=c(1,10)),n=1000)
head(sim.ma)

## [1] -6.479210 -5.632104 -8.920058 -5.765782  8.918624 24.830047

acf(sim.ma,main="ACF of MA(10) process")

plot of chunk unnamed-chunk-10

pacf(sim.ma,main="PACF of MA(10) process")

plot of chunk unnamed-chunk-10

Consider the plot generated by plot(stl(co2.sub,s.window = “per”)) and describe in some detail the components into which the original time series has been decomposed. What do the grey bars at the right hand side of the sub-plots reflect?

co2.sub <- ts(co2[1:400], 1959, 1959+399/12, 12)
head(co2.sub)

## [1] 315.42 316.31 316.50 317.56 318.13 318.00

plot(stl(co2.sub,s.window = "per"))

plot of chunk unnamed-chunk-11

A: The grey bars reflect the data series that has been composed into seasonal, trend and remainder components.The seasonal compoenent looks at reaccuring patterns, the trend component looks at overall trend component and the remainder defines all the patterns that are not explained by the aforementionend components.

Fit an AR process to the remainder of this decomposition, obtained by stl(co2.sub, s.window = “per”)$time.series[,“remainder”]

ts.test <- ar(stl(co2.sub, s.window = "per")$time.series[,"remainder"]) 
ts.test

## 
## Call:
## ar(x = stl(co2.sub, s.window = "per")$time.series[, "remainder"])
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.2285  -0.0804  -0.2241  -0.1614  -0.1631  -0.1600  -0.1416  -0.1429  
##       9       10  
##  0.0205  -0.1108  
## 
## Order selected 10  sigma^2 estimated as  0.04636

Simulate a new process with these fitted coefficients for the dropped years, also plot the 95% confidence levels.

ts.sim <- arima.sim(n=10000,model = list(ar=ts.test$ar))
head(ts.sim)

## [1] -2.70232667 -0.43528506 -0.62958179  0.09579876  0.98405062  2.24704356

ts.plot(ts.sim)

plot of chunk unnamed-chunk-13

require(forecast)
fit.conf <- forecast(ts.sim,level = 95)
fit.conf

##       Point Forecast     Lo 95    Hi 95
## 10001      0.2371253 -2.249450 2.723701
## 10002      0.2371253 -2.578974 3.053224
## 10003      0.2371253 -2.873786 3.348036
## 10004      0.2371253 -3.142981 3.617232
## 10005      0.2371253 -3.392265 3.866516
## 10006      0.2371253 -3.625494 4.099745
## 10007      0.2371253 -3.845421 4.319672
## 10008      0.2371253 -4.054091 4.528342
## 10009      0.2371253 -4.253075 4.727325
## 10010      0.2371253 -4.443606 4.917857

plot(fit.conf)

plot of chunk unnamed-chunk-13