When a variable is measured sequentially in time at a fixed interval, known as the sampling interval, the resulting data form a time series. Observations that have been collected over fixed sampling intervals form a historical time series.

A sequence of random variables defined at fixed sampling intervals is sometimes referred to as a discrete-time stochastic process, though the shorter name time series model is often preferred.

The theory of stochastic processes is vast and may be studied without necessarily fitting any models to data.

The main features of many time series are trends and seasonal variations that can be modelled deterministically with mathematical functions of time. Another important feature of most time series is that observations close together in time tend to be correlated (serially dependent). Much of the methodology in a time series analysis is aimed at explaining this correlation and the main features in the data using appropriate statistical models and descriptive methods. Once a good model is found and fitted to data, the analyst can use the model to forecast future values, or generate simulations, to guide planning decisions. Fitted models are also used as a basis for statistical tests.

Sampling intervals differ in their relation to the data. The data may have been aggregated or sampled. If data are sampled, the sampling interval must be short enough for the time series to provide a very close approximation to the original continuous signal when it is interpolated.
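As a small illustration of aggregation, a monthly series can be summed to annual totals in R. This sketch uses the built-in AirPassengers series, which is not otherwise part of this discussion:

data("AirPassengers")
AP.annual <- aggregate(AirPassengers)  # aggregate.ts sums the twelve months within each year
plot(AP.annual)  # annual totals of the monthly series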

Decomposition of series

Decomposition in R

In R, the function decompose estimates trends and seasonal effects using a moving average method. Nesting the function within plot (e.g., using plot(stl())) produces a single figure showing the original series xt and the decomposed series mt, st, and zt. For example, with the electricity data, additive and multiplicative decomposition plots are given by the commands below; the last plot, which uses lty to give different line types, is the superposition of the seasonal effect on the trend.

plot(decompose(elec.ts))  # additive decomposition (the default)

elec.decom <- decompose(elec.ts, type = "mult")  # multiplicative decomposition
plot(elec.decom)

trend <- elec.decom$trend
seasonal <- elec.decom$seasonal
ts.plot(cbind(trend, trend * seasonal), lty = 1:2)  # seasonal effect superimposed on the trend

In this example, the multiplicative model would seem more appropriate than the additive model because the variance of the original series and trend increase with time. However, the random component, which corresponds to zt, also has an increasing variance, which indicates that a log-transformation may be more appropriate for this series. The random series obtained from the decompose function is not precisely a realisation of the random process zt but rather an estimate of that realisation. It is an estimate because it is obtained from the original time series using estimates of the trend and seasonal effects. This estimate of the realisation of the random process is the residual error series.
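One quick way to follow up the log-transformation suggestion is to decompose the logged series additively; a sketch, again assuming the elec.ts series used above:

plot(decompose(log(elec.ts)))  # additive decomposition on the log scale
# if the log scale is appropriate, the random component should show
# roughly constant variance over time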

CORRELATION

Purpose

Once we have identified any trend and seasonal effects, we can deseasonalise the time series and remove the trend. If we use the additive decomposition method, we first calculate the seasonally adjusted time series and then remove the trend by subtraction; a sketch of these two steps is given below. This leaves the random component, but the random component is not necessarily well modelled by independent random variables. In many cases, consecutive variables will be correlated. If we identify such correlations, we can improve our forecasts, quite dramatically if the correlations are high. We also need to estimate correlations if we are to generate realistic time series for simulations. The correlation structure of a time series model is defined by the correlation function, and we estimate this from the observed time series.
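The sketch assumes the elec.ts series from the decomposition example above:

elec.add <- decompose(elec.ts)            # additive decomposition
adjusted <- elec.ts - elec.add$seasonal   # step 1: seasonally adjust
random <- adjusted - elec.add$trend       # step 2: remove the trend by subtraction
# `random` matches elec.add$random, apart from the NA values at the ends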

www4 <- "Herald.dat"

herald.dat <- read.table(www4, header = T)
attach(herald.dat)

We now use R to calculate the covariance for the Herald Square pairs in three different ways:

x <- CO; y <- Benzoa; n <- length(x)

sum((x - mean(x)) * (y - mean(y))) / (n - 1)  # covariance with divisor n - 1
## [1] 5.511042
mean((x - mean(x)) * (y - mean(y)))           # covariance with divisor n
## [1] 5.166602
cov(x, y)                                     # R's cov uses divisor n - 1
## [1] 5.511042

The correspondence between the R code above and the expectation definition of covariance should be noted:

mean((x - mean(x)) * (y - mean(y)))   corresponds to   E[(x − μx)(y − μy)]

Correlation is a dimensionless measure of the linear association between a pair of variables (x, y) and is obtained by standardising the covariance by dividing it by the product of the standard deviations of the variables. Correlation takes a value between −1 and +1; a value of −1 or +1 indicates an exact linear association, with the (x, y) pairs falling on a straight line of negative or positive slope, respectively. The correlation between the CO and benzoapyrene measurements at Herald Square is now calculated both from the definition and using cor.

cov(x, y) / (sd(x) * sd(y))
## [1] 0.3550973
cor(x, y)
## [1] 0.3550973

Although the correlation is small, there is nevertheless a physical explanation for it, because both products are a result of incomplete combustion. A correlation of 0.36 typically corresponds to a slight visual impression that y tends to increase as x increases, although the points will be well scattered.

The sample acf is defined as rk = ck / c0, where ck is the sample autocovariance at lag k,

ck = (1/n) * sum from t = 1 to n − k of (xt − x̄)(xt+k − x̄).

Note the divisor n rather than n − k; this is the convention used by R's acf.
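The definition can be checked directly in R. The following sketch uses the built-in LakeHuron series purely as an example of a roughly stationary series:

x <- as.numeric(LakeHuron)
n <- length(x); k <- 1
c0 <- sum((x - mean(x))^2) / n  # lag 0 sample autocovariance
ck <- sum((x[1:(n - k)] - mean(x)) * (x[(1 + k):n] - mean(x))) / n  # lag k
ck / c0                          # rk from the definition
acf(x, plot = FALSE)$acf[k + 1]  # agrees with R's acf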

We will demonstrate the calculations in R using a time series of wave heights (mm relative to still water level) measured at the centre of a wave tank. There is no trend and no seasonal period, so it is reasonable to suppose the time series is a realisation of a stationary process.

www5 <- "wave.dat"

wave.dat <- read.table(www5, header = T); attach(wave.dat)

plot(ts(wave.dat)) 

plot(ts(waveht[1:60]))

The upper plot shows the entire time series. There are no outlying values. The lower plot is of the first sixty wave heights. We can see that there is a tendency for consecutive values to be relatively similar and that the form is like a rough sea, with a quasi-periodicity but no fixed frequency.

The autocorrelations of x are stored in the vector acf(x)$acf, with the lag k autocorrelation located in acf(x)$acf[k + 1]. For example, the lag 1 autocorrelation for waveht is

acf(waveht)$acf[2]

## [1] 0.4702564

Note that the first entry, acf(waveht)$acf[1], is r0 and equals 1. A scatter plot, such as the one for the Herald Square data, complements the calculation of the correlation and alerts us to any non-linear patterns. In a similar way, we can draw a scatter plot corresponding to each autocorrelation. For example, for lag 1 we plot(waveht[1:396], waveht[2:397]) to obtain the figure below.

plot(waveht[1:396], waveht[2:397])

Autocovariances are obtained by adding an argument to acf. The lag 1 autocovariance is given by

acf(waveht, type = "covariance")$acf[2]

## [1] 33328.39
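As a consistency check (a sketch, relying only on the divisor-n convention noted above), the lag 0 autocovariance is the sample variance computed with divisor n rather than n − 1:

acf(waveht, type = "covariance", plot = FALSE)$acf[1]  # lag 0 autocovariance, divisor n
var(waveht) * (length(waveht) - 1) / length(waveht)    # var() rescaled from n - 1 to n
# the two values agree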

Correlogram

Example based on the air passenger series

Although we want to know about trends and seasonal patterns in a time series, we do not necessarily rely on the correlogram to identify them. The main use of the correlogram is to detect autocorrelations in the time series after we have removed an estimate of the trend and seasonal variation. In the code below, the air passenger series is seasonally adjusted and the trend removed using decompose.

data("AirPassengers")
ap <- AirPassengers

ap.decomp <- decompose(ap, "multiplicative")

plot(ts(ap.decomp$random[7:138]))

acf(ap.decomp$random[7:138])

To plot the random component and draw the correlogram, we need to remember that a consequence of using a centred moving average of 12 months to smooth the time series, and thereby estimate the trend, is that the first six and last six terms in the random component cannot be calculated and are thus stored in R as NA. The random component and correlogram are shown below.
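This can be verified with a quick check: the 144-month series loses six values at each end, so the random component contains twelve NAs.

sum(is.na(ap.decomp$random))            # 12: six NAs at each end
range(which(!is.na(ap.decomp$random)))  # 7 138, hence the [7:138] subscripts above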

The correlogram in the last figure above suggests either a damped cosine shape that is characteristic of an autoregressive model of order 2 or that the seasonal adjustment has not been entirely effective. The latter explanation is unlikely because the decomposition does estimate twelve independent monthly indices. If we investigate further, we see that the standard deviation of the original series from July until June is 109, the standard deviation of the series after subtracting the trend estimate is 41, and the standard deviation after seasonal adjustment is just 0.03.

sd(ap[7:138])
## [1] 109.4187
sd(ap[7:138] - ap.decomp$trend[7:138])
## [1] 41.11491
sd(ap.decomp$random[7:138])
## [1] 0.0333884

The reduction in the standard deviation shows that the seasonal adjustment has been very effective.

Example based on the Font Reservoir series

Monthly effective inflows (m³ s⁻¹) to the Font Reservoir in Northumberland for the period from January 1909 until December 1980 have been provided by Northumbrian Water PLC. It can be observed that there was a slight decreasing trend over this period, and substantial seasonal variation. The trend and seasonal variation have been estimated and removed by regression, leaving the residual series (adflow). The main difference between the regression approach and using decompose is that the former assumes a linear trend, whereas the latter smooths the time series without assuming any particular form for the trend.
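The regression step itself is not shown here; the following sketch indicates one way it could be done, assuming a monthly ts object named inflow (a hypothetical name, since the raw inflow series is not given in this section):

fit <- lm(inflow ~ time(inflow) + factor(cycle(inflow)))  # linear trend plus monthly indices
adflow.sketch <- resid(fit)  # residual series analogous to adflow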

www6 <- "Fontdsdt.dat"

fontdsdt.dat <- read.table(www6, header = T)

attach(fontdsdt.dat)

plot(ts(adflow), ylab = 'adflow')

acf(adflow, xlab = 'lag (months)', main="")

There is a statistically significant correlation at lag 1. The physical interpretation is that the inflow next month is more likely than not to be above average if the inflow this month is above average. Similarly, if the inflow this month is below average, it is more likely than not that next month's inflow will be below average. The explanation is that the groundwater supply can be thought of as a slowly discharging reservoir. If groundwater is high one month, it will augment inflows and is likely to do so next month as well. Given this explanation, you may be surprised that the lag 1 correlation is not higher. The reason is that most of the inflow is runoff following rainfall, and in Northumberland there is little correlation between seasonally adjusted rainfall in consecutive months.

An exponential decay in the correlogram is typical of a first-order autoregressive model, and the correlogram of the adjusted inflows is consistent with an exponential decay; a simulated illustration is sketched below. However, given the sampling errors for a time series of this length, estimates of autocorrelation at higher lags are unlikely to be statistically significant. This is not a practical limitation, because such low correlations are inconsequential. When we come to identify suitable models, we should remember that there is no one correct model and that there will often be a choice of suitable models. We may make use of a specific statistical criterion, such as Akaike's information criterion (see below), to choose a model, but this does not imply that the model is correct.
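The sketch uses simulated data, with an illustrative autoregressive parameter of 0.5 rather than anything estimated from the Font series:

set.seed(1)
x.sim <- arima.sim(n = 864, model = list(ar = 0.5))  # 864 months of an AR(1) with alpha = 0.5
acf(x.sim, main = "")  # the sample correlogram decays roughly like 0.5^k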