Analysis of Spatio-Temporal Data: Lab1

Avipsa Roy, MSc Geoinformatics, WISE 2015

Lecture Wrap Up

1) How is the mean for a random variable X defined?
The mean of a random variable X = (x₁,x₂,….) is defined as follows:
alt text
2) Give the equation for covariance between two variables x and y.
The equation of ovariance for x and y is given by,

3) Give the equation for correlation between two variables x and y.
The equation of correaltion between variables x and y is,

4) For the samples x and y, compute the sample mean, the variance and the standard deviation.

x = c(1,2,4,5)
y = c(2,1,5,4)
mx = mean(x)
mx

## [1] 3

my = mean(y)
my

## [1] 3

vx = var(x)
vx

## [1] 3.333333

vy = var(y)
vy

## [1] 3.333333

sdx = sd(x)
sdx

## [1] 1.825742

sdy = sd(y)
sdy

## [1] 1.825742

5) Draw the data in a scatter plot (by hand)
alt text
6) From the table above, compute the covariance between x and y, and the correlation between x and y.

df <- data.frame(x,y)
df

##   x y
## 1 1 2
## 2 2 1
## 3 4 5
## 4 5 4

cov(x,y)

## [1] 2.666667

cor(x,y)

## [1] 0.8

7) Suppose x is a time series data set, and the rows indicate the time index. Compute the lag 0, lag 1 and lag 2 autocorrelations of x.

 head(acf(x))

## 
## Autocorrelations of series 'x', by lag
## 
##    1    2    3   NA   NA   NA 
##  0.3 -0.4 -0.4   NA   NA   NA

 plot(acf(x))

plot of chunk unnamed-chunk-3

8) What is the general equation of an autoregressive AR(p) process?
An autoregressive process of order p: Y_t is a linear combination of p preceding values:
alt text
9) What is the general equation of a moving average MA(q) process?
The moving average process of order q: Y_t is a linear combination of q preceding error terms:

10) Up to which lag is an AR(p) process correlated?
The lag is limited automatically to one less than the number of observations in the series. The default value for lag.max is 10*log10(N/m) where N is the number of observations and m the number of series.
11) Up to which lag is a MA(q) process correlated?
The default value for lag.max is 10*log10(N/m) where N is the number of observations and m the number of series.

 #' </br>

12) What can be said about partial correlations of an AR(2) process?
The partial correlation coefficient is estimated by fitting autoregressive models of successively higher orders up to lag.max.

Computer Exercises

1) Create a random walk time series of length 1000 using rnorm and cumsum, and plot it

x <- rnorm(1000)
head(x)

## [1] -0.5322932  0.6371624 -0.4848977 -1.2789985  0.4184086  1.1485540

y <- cumsum(x)
head(y)

## [1] -0.53229320  0.10486924 -0.38002841 -1.65902691 -1.24061830 -0.09206432

plot(y)

plot of chunk unnamed-chunk-5

2) Compute and plot the autocorrelation with acf and partial autocorrelation with pacf of this series. Explain what you see

y.acf1 <- acf(y,plot = FALSE)
y.acf1

## 
## Autocorrelations of series 'y', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 1.000 0.982 0.965 0.950 0.934 0.917 0.901 0.885 0.869 0.855 0.840 0.827 
##    12    13    14    15    16    17    18    19    20    21    22    23 
## 0.813 0.799 0.785 0.772 0.760 0.747 0.734 0.723 0.713 0.703 0.695 0.689 
##    24    25    26    27    28    29    30 
## 0.683 0.677 0.669 0.662 0.654 0.644 0.633

plot(y.acf1)

plot of chunk unnamed-chunk-6

y.pacf1 <- pacf(y,plot=FALSE)
y.pacf1

## 
## Partial autocorrelations of series 'y', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.982  0.012  0.026 -0.013 -0.032  0.015  0.000 -0.017  0.030  0.001 
##     11     12     13     14     15     16     17     18     19     20 
##  0.018 -0.002 -0.034  0.004  0.009  0.022 -0.008 -0.009  0.024  0.033 
##     21     22     23     24     25     26     27     28     29     30 
##  0.025  0.029  0.045  0.010  0.016 -0.056 -0.001 -0.019 -0.047 -0.039

plot(y.pacf1)

plot of chunk unnamed-chunk-6

3) Generate an ARIMA model with coefficient 0.9 and 0.1, determine acf and pacf

ar1 <- arima.sim(list(ar = 0.9) ,n = 1000)
head(ar1)

## [1] -3.634793 -3.867896 -4.923998 -4.903015 -4.362268 -3.690888

plot(ar1)

plot of chunk unnamed-chunk-7

ar2 <- arima.sim(list(ar = 0.1) ,n = 1000)
head(ar2)

## [1] -1.6669862 -0.2338517  1.1162852 -0.3139022  0.5387906  2.3854389

plot(ar2)

plot of chunk unnamed-chunk-7

acf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

pacf(arima.sim(list(ar = 0.1) ,n = 1000))

plot of chunk unnamed-chunk-7

4) Generate a model AR(3) of length 10000 using coefficients c(0.7,0.15,0.05) and calculate acf, pacf

ar3 <- arima.sim(list(ar = c(0.7,0.15,0.05)) ,n = 10000)
head(ar3)

## [1]  0.1228688 -0.6714625 -0.5739529  0.3848259 -1.3420960  0.6117727

plot(ar3)

plot of chunk unnamed-chunk-8

ar3.acf <- acf(ar3,plot = FALSE)
ar3.acf

## 
## Autocorrelations of series 'ar3', by lag
## 
##      0      1      2      3      4      5      6      7      8      9 
##  1.000  0.869  0.799  0.735  0.673  0.615  0.564  0.517  0.474  0.433 
##     10     11     12     13     14     15     16     17     18     19 
##  0.399  0.368  0.336  0.306  0.275  0.244  0.217  0.193  0.172  0.154 
##     20     21     22     23     24     25     26     27     28     29 
##  0.137  0.124  0.107  0.093  0.079  0.063  0.052  0.041  0.030  0.021 
##     30     31     32     33     34     35     36     37     38     39 
##  0.013  0.004 -0.004 -0.013 -0.022 -0.028 -0.033 -0.041 -0.046 -0.052 
##     40 
## -0.054

plot(ar3.acf)

plot of chunk unnamed-chunk-8

ar3.pacf <- pacf(ar3, plot = FALSE)
ar3.pacf

## 
## Partial autocorrelations of series 'ar3', by lag
## 
##      1      2      3      4      5      6      7      8      9     10 
##  0.869  0.178  0.040 -0.003 -0.007  0.003  0.001 -0.001 -0.005  0.012 
##     11     12     13     14     15     16     17     18     19     20 
##  0.008 -0.010 -0.013 -0.015 -0.020 -0.006  0.000  0.002  0.003  0.002 
##     21     22     23     24     25     26     27     28     29     30 
##  0.006 -0.016 -0.008 -0.006 -0.017  0.003 -0.001 -0.005 -0.002 -0.002 
##     31     32     33     34     35     36     37     38     39     40 
## -0.009 -0.010 -0.007 -0.016  0.006  0.001 -0.019 -0.001 -0.007  0.006

plot(ar3.pacf)

plot of chunk unnamed-chunk-8

5) Save the series and fit an AR model to it using ar. Why are the parameters not identical to those used for simulation? What can you do to make the estimated parameters closer to those used for simulation?

ar(ar3)

## 
## Call:
## ar(x = ar3)
## 
## Coefficients:
##      1       2       3  
## 0.7069  0.1495  0.0402  
## 
## Order selected 3  sigma^2 estimated as  0.9912

arima.sim simulates the given model and creates a series of values even though these values are based on a specific model. You can increase the number of elements (from n=1000 to n=10000 or more) in the series.

6) Simulate an MA(10) process with constant weights rep(1,10), and plot autocorrelation functions; explain what you see.

sim.ma<-arima.sim(list(ma=c(1,10)),n=1000)
head(sim.ma)

## [1]   9.526457  -9.712811  -9.851704 -15.965739  19.234959  11.512739

acf(sim.ma,main="ACF of MA(10) process")

plot of chunk unnamed-chunk-10

pacf(sim.ma,main="PACF of MA(10) process")

plot of chunk unnamed-chunk-10

7) Consider the plot generated by plot(stl(co2.sub,s.window = “per”)) and describe in some detail the components into which the original time series has been decomposed. What do the grey bars at the right hand side of the sub-plots reflect?

co2.sub <- ts(co2[1:400], 1959, 1959+399/12, 12)
head(co2.sub)

## [1] 315.42 316.31 316.50 317.56 318.13 318.00

plot(stl(co2.sub,s.window = "per"))

plot of chunk unnamed-chunk-11

A: The grey bars reflect the data series that has been composed into seasonal, trend and remainder components.The seasonal compoenent looks at reaccuring patterns, the trend component looks at overall trend component and the remainder defines all the patterns that are not explained by the aforementionend components.

8) Fit an AR process to the remainder of this decomposition, obtained by stl(co2.sub, s.window = “per”)$time.series[,“remainder”]

ts.test <- ar(stl(co2.sub, s.window = "per")$time.series[,"remainder"]) 
ts.test

## 
## Call:
## ar(x = stl(co2.sub, s.window = "per")$time.series[, "remainder"])
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  0.2285  -0.0804  -0.2241  -0.1614  -0.1631  -0.1600  -0.1416  -0.1429  
##       9       10  
##  0.0205  -0.1108  
## 
## Order selected 10  sigma^2 estimated as  0.04636

9) Simulate a new process with these fitted coefficients for the dropped years, also plot the 95% confidence levels.

ts.sim <- arima.sim(n=10000,model = list(ar=ts.test$ar))
head(ts.sim)

## [1] -0.2172221 -1.1313935 -2.4871699 -0.1323190 -2.0552569  2.0968131

ts.plot(ts.sim)

plot of chunk unnamed-chunk-13

require(forecast)
fit.conf <- forecast(ts.sim,level = 95)
fit.conf

##       Point Forecast     Lo 95    Hi 95
## 10001   -0.006174289 -2.303404 2.291055
## 10002   -0.006174289 -2.303404 2.291055
## 10003   -0.006174289 -2.303404 2.291055
## 10004   -0.006174289 -2.303404 2.291055
## 10005   -0.006174289 -2.303404 2.291055
## 10006   -0.006174289 -2.303404 2.291055
## 10007   -0.006174289 -2.303404 2.291055
## 10008   -0.006174289 -2.303404 2.291055
## 10009   -0.006174289 -2.303404 2.291055
## 10010   -0.006174289 -2.303404 2.291055

plot(fit.conf)

plot of chunk unnamed-chunk-13