Analysis of Spatio-Temporal Data: Lab1

Avipsa Roy, MSc Geoinformatics, WISE 2015

Lecture Wrap Up

  1. How is the mean for a random variable X defined?
    The mean of a random variable X = (x1,x2,….) is defined as follows:
    alt text
  2. Give the equation for covariance between two variables x and y.
    The equation of ovariance for x and y is given by,
    alt text
  3. Give the equation for correlation between two variables x and y.
    The equation of correaltion between variables x and y is,
    alt text
  4. For the samples x and y, compute the sample mean, the variance and the standard deviation.
x = c(1,2,4,5)
y = c(2,1,5,4)
mx = mean(x)
mx
## [1] 3
my = mean(y)
my
## [1] 3
vx = var(x)
vx
## [1] 3.333333
vy = var(y)
vy
## [1] 3.333333
sdx = sd(x)
sdx
## [1] 1.825742
sdy = sd(y)
sdy
## [1] 1.825742
  1. Draw the data in a scatter plot (by hand)
    alt text
  2. From the table above, compute the covariance between x and y, and the correlation between x and y.
df <- data.frame(x,y)
df
##   x y
## 1 1 2
## 2 2 1
## 3 4 5
## 4 5 4
cov(x,y)
## [1] 2.666667
cor(x,y)
## [1] 0.8
  1. Suppose x is a time series data set, and the rows indicate the time index. Compute the lag 0, lag 1 and lag 2 autocorrelations of x.
 head(acf(x))
## 
## Autocorrelations of series 'x', by lag
## 
##    1    2    3   NA   NA   NA 
##  0.3 -0.4 -0.4   NA   NA   NA
 plot(acf(x))

plot of chunk unnamed-chunk-3

  1. What is the general equation of an autoregressive AR(p) process?
    An autoregressive process of order p: Yt is a linear combination of p preceding values:
    alt text
  2. What is the general equation of a moving average MA(q) process?
    The moving average process of order q: Yt is a linear combination of q preceding error terms:

alt text
10) Up to which lag is an AR(p) process correlated?

The lag is limited automatically to one less than the number of observations in the series. The default value for lag.max is __10*log10(N/m)__ where N is the number of observations and m the number of series.

  1. Up to which lag is a MA(q) process correlated?

The default value for lag.max is __10*log10(N/m)__ where N is the number of observations and m the number of series.

  1. What can be said about partial correlations of an AR(2) process?
    The partial correlation coefficient is estimated by fitting autoregressive models of successively higher orders up to lag.max.

Computer Exercises

  1. Create a random walk time series of length 1000 using rnorm and cumsum, and plot it
x <- rnorm(1000)
head(x)
## [1]  0.3826126 -0.9535073 -0.3763547  0.4392412 -0.4325646 -0.3112970
y <- cumsum(x)
head(y)
## [1]  0.3826126 -0.5708948 -0.9472495 -0.5080083 -0.9405729 -1.2518699
plot(y)

plot of chunk unnamed-chunk-5

  1. Compute and plot the autocorrelation with acf and partial autocorrelation with pacf of this series.
y.acf1 <- acf(y,plot = FALSE)
y.acf1
## 
## Autocorrelations of series 'y', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 1.000 0.995 0.989 0.984 0.979 0.974 0.968 0.963 0.957 0.952 0.946 0.941 
##    12    13    14    15    16    17    18    19    20    21    22    23 
## 0.935 0.929 0.924 0.918 0.913 0.907 0.902 0.897 0.891 0.887 0.881 0.876 
##    24    25    26    27    28    29    30 
## 0.871 0.865 0.860 0.855 0.850 0.844 0.839
plot(y.acf1)

plot of chunk unnamed-chunk-6

y.pacf1 <- pacf(y,plot=FALSE)
y.pacf1
## 
## Partial autocorrelations of series 'y', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.995 -0.023  0.018 -0.001  0.018 -0.034 -0.015 -0.014  0.001  0.016 
##     11     12     13     14     15     16     17     18     19     20 
## -0.028 -0.025  0.026 -0.003 -0.020  0.014 -0.007  0.009  0.038 -0.005 
##     21     22     23     24     25     26     27     28     29     30 
##  0.009 -0.035 -0.004  0.001 -0.010 -0.013  0.023 -0.008 -0.018 -0.019
plot(y.pacf1)

plot of chunk unnamed-chunk-6

  1. Generate an ARIMA model with coefficient 0.9 and 0.1, determine acf and pacf
ar1 <- arima.sim(list(ar = 0.9) ,n = 1000)
head(ar1)
## [1] 3.212650 4.242610 6.965067 6.672971 6.455681 5.020726
plot(ar1)

plot of chunk unnamed-chunk-7

ar2 <- arima.sim(list(ar = 0.1) ,n = 1000)
head(ar2)
## [1]  0.1343806  1.2841450 -0.4630326 -0.4183226 -0.0467525 -1.2421204
plot(ar2)

plot of chunk unnamed-chunk-7

acf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

pacf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

  1. Generate a model AR(3) of length 10000 using coefficients c(0.7,0.15,0.05) and calculate acf, pacf
ar3 <- arima.sim(list(ar = c(0.7,0.15,0.05)) ,n = 10000)
head(ar3)
## [1]  1.7151568  0.3530036 -1.4396374 -1.7142441 -1.4319962 -2.2411801
plot(ar3)

plot of chunk unnamed-chunk-8

ar3.acf <- acf(ar3,plot = FALSE)
ar3.acf
## 
## Autocorrelations of series 'ar3', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.877  0.813  0.758  0.703  0.651  0.603  0.557  0.515  0.474 
##     10     11     12     13     14     15     16     17     18     19 
##  0.434  0.398  0.368  0.341  0.319  0.295  0.271  0.249  0.233  0.216 
##     20     21     22     23     24     25     26     27     28     29 
##  0.199  0.186  0.170  0.156  0.141  0.127  0.117  0.104  0.093  0.080 
##     30     31     32     33     34     35     36     37     38     39 
##  0.069  0.055  0.050  0.041  0.034  0.025  0.017  0.012  0.009  0.006 
##     40 
## -0.001
plot(ar3.acf)

plot of chunk unnamed-chunk-8

ar3.pacf <- pacf(ar3, plot = FALSE)
ar3.pacf
## 
## Partial autocorrelations of series 'ar3', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.877  0.191  0.060  0.004 -0.010  0.001 -0.010  0.000 -0.009 -0.016 
##     11     12     13     14     15     16     17     18     19     20 
## -0.002  0.012  0.011  0.018 -0.012 -0.009 -0.004  0.015 -0.002 -0.005 
##     21     22     23     24     25     26     27     28     29     30 
##  0.006 -0.012 -0.003 -0.011 -0.001  0.005 -0.009 -0.003 -0.016 -0.006 
##     31     32     33     34     35     36     37     38     39     40 
## -0.014  0.019 -0.005 -0.001 -0.013 -0.007  0.006  0.010  0.005 -0.020
plot(ar3.pacf)

plot of chunk unnamed-chunk-8

  1. Save the series and fit an AR model to it using ar. Why are the parameters not identical to those used for simulation? What can you do to make the estimated parameters closer to those used for simulation?
ar(ar3)
## 
## Call:
## ar(x = ar3)
## 
## Coefficients:
##      1       2       3  
## 0.6977  0.1491  0.0597  
## 
## Order selected 3  sigma^2 estimated as  0.9874

arima.sim simulates the given model and creates a series of values even though these values are based on a specific model. You can increase the number of elements (from n=1000 to n=10000 or more) in the series.

  1. Simulate an MA(10) process with constant weights rep(1,10), and plot autocorrelation functions.
sim.ma<-arima.sim(list(ma=c(1,10)),n=1000)
head(sim.ma)
## [1] -6.479210 -5.632104 -8.920058 -5.765782  8.918624 24.830047
acf(sim.ma,main="ACF of MA(10) process")

plot of chunk unnamed-chunk-10

pacf(sim.ma,main="PACF of MA(10) process")

plot of chunk unnamed-chunk-10

  1. Consider the plot generated by plot(stl(co2.sub,s.window = “per”)) and describe in some detail the components into which the original time series has been decomposed. What do the grey bars at the right hand side of the sub-plots reflect?
co2.sub <- ts(co2[1:400], 1959, 1959+399/12, 12)
head(co2.sub)
## [1] 315.42 316.31 316.50 317.56 318.13 318.00
plot(stl(co2.sub,s.window = "per"))

plot of chunk unnamed-chunk-11

A: The grey bars reflect the data series that has been composed into seasonal, trend and remainder components.The seasonal compoenent looks at reaccuring patterns, the trend component looks at overall trend component and the remainder defines all the patterns that are not explained by the aforementionend components.

  1. Fit an AR process to the remainder of this decomposition, obtained by stl(co2.sub, s.window = “per”)$time.series[,“remainder”]
ts.test <- ar(stl(co2.sub, s.window = "per")$time.series[,"remainder"]) 
ts.test
## 
## Call:
## ar(x = stl(co2.sub, s.window = "per")$time.series[, "remainder"])
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.2285  -0.0804  -0.2241  -0.1614  -0.1631  -0.1600  -0.1416  -0.1429  
##       9       10  
##  0.0205  -0.1108  
## 
## Order selected 10  sigma^2 estimated as  0.04636
  1. Simulate a new process with these fitted coefficients for the dropped years, also plot the 95% confidence levels.
ts.sim <- arima.sim(n=10000,model = list(ar=ts.test$ar))
head(ts.sim)
## [1] -2.70232667 -0.43528506 -0.62958179  0.09579876  0.98405062  2.24704356
ts.plot(ts.sim)

plot of chunk unnamed-chunk-13

require(forecast)
fit.conf <- forecast(ts.sim,level = 95)
fit.conf
##       Point Forecast     Lo 95    Hi 95
## 10001      0.2371253 -2.249450 2.723701
## 10002      0.2371253 -2.578974 3.053224
## 10003      0.2371253 -2.873786 3.348036
## 10004      0.2371253 -3.142981 3.617232
## 10005      0.2371253 -3.392265 3.866516
## 10006      0.2371253 -3.625494 4.099745
## 10007      0.2371253 -3.845421 4.319672
## 10008      0.2371253 -4.054091 4.528342
## 10009      0.2371253 -4.253075 4.727325
## 10010      0.2371253 -4.443606 4.917857
plot(fit.conf)

plot of chunk unnamed-chunk-13